
2025-03
ESP32 Distributed Camera System
Multi-camera surveillance with real-time streaming over WebSockets
Overview
Off-the-shelf surveillance systems are closed platforms: you get what the vendor decided to build. This project takes ESP32-CAM modules (roughly $10 microcontroller boards with an onboard camera) and builds a fully custom distributed surveillance system on top of them: real-time streaming, motion detection, remote camera control, and a centralized management interface, all running over WebSockets.
The system spans two layers: C++ (Arduino) firmware running on the ESP32-CAM modules, and a Python WebSocket server with a browser-based control interface.
System Architecture
Browser (control interface)
│
▼
Python WebSocket Server (server.py)
│ ┌──────────────────────────────┐
├─► ESP32-CAM Module 1 (WiFi) │
├─► ESP32-CAM Module 2 (WiFi) │
└─► ESP32-CAM Module N (WiFi) │
└──────────────────────────────┘
Each ESP32-CAM module maintains a persistent WebSocket connection to the Python server. The server acts as a hub: it proxies video frames to the browser interface, routes control commands to the appropriate camera, and manages reconnection when a module drops off the network.
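The hub role can be sketched without any networking code. The following is a minimal illustration, not the project's actual server.py: the `Camera` and `Hub` classes, their method names, and the two-frame viewer buffer are all assumptions made for the example. It shows the three responsibilities named above: relaying frames to viewers, routing commands by camera ID, and keeping a camera's record alive across disconnects.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Camera:
    """Server-side record for one ESP32-CAM module (hypothetical structure)."""
    camera_id: str
    online: bool = False
    pending_commands: deque = field(default_factory=deque)  # outbound to the module
    viewers: list = field(default_factory=list)  # one bounded frame buffer per browser


class Hub:
    """Routes frames from cameras to viewers and commands from viewers to cameras."""

    def __init__(self):
        self.cameras = {}

    def camera_connected(self, camera_id):
        # Reuse the existing record on reconnect so viewers stay attached.
        cam = self.cameras.setdefault(camera_id, Camera(camera_id))
        cam.online = True
        return cam

    def camera_disconnected(self, camera_id):
        # Mark offline instead of deleting, so a reconnect is transparent.
        if camera_id in self.cameras:
            self.cameras[camera_id].online = False

    def add_viewer(self, camera_id, max_buffered=2):
        # A bounded deque drops the oldest frame when the browser falls behind.
        viewer = deque(maxlen=max_buffered)
        self.cameras.setdefault(camera_id, Camera(camera_id)).viewers.append(viewer)
        return viewer

    def relay_frame(self, camera_id, jpeg_bytes):
        # Fan one frame out to every viewer of this camera; return the viewer count.
        cam = self.cameras.get(camera_id)
        if cam is None:
            return 0
        for viewer in cam.viewers:
            viewer.append(jpeg_bytes)
        return len(cam.viewers)

    def send_command(self, camera_id, command):
        # Commands are only queued for cameras that are currently online.
        cam = self.cameras.get(camera_id)
        if cam is None or not cam.online:
            return False
        cam.pending_commands.append(command)
        return True
```

In a real server, each WebSocket handler would call `camera_connected` on connect and `camera_disconnected` in its cleanup path, and a sender task would drain `pending_commands` and the viewer buffers onto the sockets.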
Firmware (C++ / Arduino)
The ESP32-CAM firmware handles camera initialization, frame capture, WiFi connection, and the WebSocket client loop. Each module:
- Connects to WiFi on boot and establishes a WebSocket connection to the server
- Streams JPEG frames continuously over the connection
- Accepts control commands: LED on/off, resolution changes, quality adjustment, motor control
- Handles automatic reconnection if the WebSocket connection drops
The firmware uses the ESP32 camera library directly for frame capture, giving precise control over resolution, compression quality, and exposure. Frames are transmitted as binary WebSocket messages.
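The write-up does not specify the wire format, so the sketch below assumes one plausible convention: binary WebSocket messages carry a single JPEG frame, and text messages carry JSON control commands. The `cmd` key and the helper names are hypothetical. The JPEG check relies only on the standard start-of-image (FF D8) and end-of-image (FF D9) markers.

```python
import json

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(payload: bytes) -> bool:
    # Cheap sanity check, not a full decode. It is strict about the trailing
    # marker; relax it if the encoder pads its output buffer.
    return (
        len(payload) >= 4
        and payload.startswith(JPEG_SOI)
        and payload.endswith(JPEG_EOI)
    )


def encode_command(name: str, **params) -> str:
    # Compact JSON keeps parsing on the microcontroller side cheap.
    return json.dumps({"cmd": name, **params}, separators=(",", ":"), sort_keys=True)


def classify_message(message):
    """Return (kind, payload), where kind is 'frame', 'command', or 'invalid'."""
    if isinstance(message, (bytes, bytearray)):
        data = bytes(message)
        return ("frame", data) if looks_like_jpeg(data) else ("invalid", None)
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        return ("invalid", None)
    if isinstance(parsed, dict) and "cmd" in parsed:
        return ("command", parsed)
    return ("invalid", None)
```

For example, `encode_command("led", on=True)` produces `{"cmd":"led","on":true}`, and `classify_message` lets one receive loop handle both directions of traffic by message type.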
Server (Python)
The Python server manages all connected cameras and serves the browser interface:
- Multi-camera coordination: tracks each connected ESP32 by a persistent identifier and routes commands to the correct module
- Motion detection: uses OpenCV frame differencing to detect motion per camera, with a configurable sensitivity threshold
- Stream relay: receives binary frames from ESP32 modules and forwards them to the browser clients watching that camera
- Settings persistence: camera names and configuration survive server restarts
- Automatic reconnection handling: cameras that drop off the network are marked offline and picked up again transparently when they reconnect
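Per-camera frame differencing can be sketched as follows. The server uses OpenCV for this; to keep the example dependency-light, plain NumPy arithmetic stands in for the equivalent OpenCV calls (`cv2.absdiff` followed by `cv2.threshold`), and frames are assumed to be already decoded to grayscale arrays. The `MotionDetector` class, its parameter names, and the default values are illustrative assumptions.

```python
import numpy as np


def motion_fraction(prev_gray, curr_gray, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than pixel_threshold."""
    # Widen to int16 so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return float(np.mean(diff > pixel_threshold))


class MotionDetector:
    """Keeps the previous frame per camera and flags motion between frames."""

    def __init__(self, sensitivity=0.01, pixel_threshold=25):
        self.sensitivity = sensitivity          # minimum changed fraction to report
        self.pixel_threshold = pixel_threshold  # per-pixel brightness delta
        self._previous = {}

    def update(self, camera_id, gray_frame):
        previous = self._previous.get(camera_id)
        self._previous[camera_id] = gray_frame
        # First frame, or a resolution change: nothing comparable yet.
        if previous is None or previous.shape != gray_frame.shape:
            return False
        changed = motion_fraction(previous, gray_frame, self.pixel_threshold)
        return changed >= self.sensitivity
```

Lowering `sensitivity` makes the detector fire on smaller changed regions; raising `pixel_threshold` makes it ignore sensor noise and small lighting shifts.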
Why WebSockets
The ESP32 has limited memory and little room to buffer video locally. Polling with one HTTP request per frame would add request/response overhead and a round trip to every frame, which is too slow and expensive for real-time video. WebSockets give a persistent bidirectional channel: the ESP32 pushes frames as fast as it can capture them, and the server can send control commands back on the same connection without the ESP32 needing to implement an HTTP server.
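To put a number on the per-frame cost, the sketch below computes the WebSocket framing header size defined in RFC 6455: 2 base bytes, plus 2 or 8 bytes of extended length for larger payloads, plus a 4-byte masking key on client-to-server frames (the ESP32 is the client here). The function name and the frame size in the usage note are assumptions for illustration.

```python
def ws_header_bytes(payload_len: int, masked: bool) -> int:
    """Size of the WebSocket frame header for one unfragmented message."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    header = 2                   # FIN/opcode byte + mask bit/7-bit length byte
    if payload_len > 65535:
        header += 8              # 64-bit extended payload length
    elif payload_len > 125:
        header += 2              # 16-bit extended payload length
    if masked:
        header += 4              # masking key, required from client to server
    return header


def overhead_fraction(payload_len: int, masked: bool) -> float:
    """Header bytes as a fraction of the payload size."""
    return ws_header_bytes(payload_len, masked) / payload_len
```

For an assumed 20,000-byte JPEG sent by the module, `ws_header_bytes(20_000, masked=True)` is 8 bytes, or 0.04% of the payload, and no new request has to be issued for the next frame.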
Hardware Constraints
Working with the ESP32-CAM imposed real constraints that shaped the design:
- Memory: the ESP32-CAM has 4MB of PSRAM, so the frame buffer size is bounded. Higher resolutions at high quality fill the buffer faster than WiFi can drain it, causing dropped frames.
- WiFi reliability: cheap modules on busy 2.4GHz networks drop connections regularly. Reconnection logic on both the firmware and server sides was a requirement, not an optional extra.
- Programming interface: the ESP32-CAM has no onboard USB port. Programming requires a USB-to-TTL converter wired to the UART pins, with GPIO0 pulled to ground during flash.
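The memory constraint comes down to simple arithmetic: frames are dropped whenever capture outpaces what the WiFi link can send. The sketch below works that out; the function names are hypothetical, and the frame size, throughput, and buffer count in the usage note are illustrative assumptions, not measurements from this project.

```python
def max_sustainable_fps(frame_bytes: int, throughput_mbps: float) -> float:
    """Highest frame rate the link can drain at a given average frame size."""
    bytes_per_second = throughput_mbps * 1_000_000 / 8
    return bytes_per_second / frame_bytes


def seconds_until_overflow(buffer_slots: int, capture_fps: float, drain_fps: float):
    """Time until a frame buffer with buffer_slots entries fills up.

    Returns None when the link keeps up and the buffer never overflows.
    """
    backlog_rate = capture_fps - drain_fps  # frames accumulating per second
    if backlog_rate <= 0:
        return None
    return buffer_slots / backlog_rate
```

With an assumed 40,000-byte frame and 4 Mbit/s of usable throughput, the link sustains 12.5 frames per second. Capturing at 25 fps into a two-slot buffer would then overflow after about 0.16 seconds, so the practical fix is to lower resolution or quality until the sustainable rate matches the capture rate.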
These constraints make the project more interesting than a cloud camera integration: every decision has a physical consequence.