Guide to Going Component-Free: Implementing Pure Vanilla JavaScript WebSockets for Remote ESP32-CAM AV Control

Control an ESP32‑CAM from a browser without pulling in React, Vue, or any third‑party library. Using the native WebSocket API and plain HTML5, you can stream video, trigger snapshots, and send pan/tilt/zoom (PTZ) commands in real time while keeping the page lightweight and SEO‑friendly.

Why Go Vanilla?

  • Performance: No bundle size overhead; the browser handles everything.
  • SEO & Accessibility: Content is directly crawlable; no client‑side rendering delays.
  • Maintainability: One HTML file, one script – easy to debug and extend.
  • Portability: Works on any modern browser, from desktops to mobile.
The tutorial focuses on the core logic. You can later wrap the code in a framework if you wish, but the foundation stays the same.

Prerequisites

Hardware

  • ESP32‑CAM module (AI‑Thinker board recommended)
  • Power supply (5 V / 2 A)
  • Wi‑Fi network (same subnet as your client)

Software

  • Arduino IDE (or PlatformIO) with ESP32 board support
  • Basic knowledge of JavaScript, HTML, and the ESP32‑CAM API
  • A modern browser (Chrome, Edge, Firefox, Safari)

1️⃣ Setting Up the ESP32‑CAM WebSocket Server

The ESP32‑CAM runs a tiny WebSocket server that pushes JPEG frames and receives control commands. Below is a minimal sketch.

#include <WiFi.h>
#include <WebServer.h>
#include <WebSocketsServer.h>
#include <esp_camera.h>

// ---- Wi‑Fi credentials ----
const char* ssid = "YOUR_SSID";
const char* password = "YOUR_PASSWORD";

// ---- WebSocket on port 81 ----
WebSocketsServer webSocket = WebSocketsServer(81);

// ---- Camera configuration (AI‑Thinker) ----
// Fields are listed in the order camera_config_t declares them, which C++
// compilers generally require for designated initializers.
camera_config_t camConfig = {
  .pin_pwdn     = 32,
  .pin_reset    = -1,
  .pin_xclk     = 0,
  .pin_sscb_sda = 26,
  .pin_sscb_scl = 27,
  .pin_d7       = 35,
  .pin_d6       = 34,
  .pin_d5       = 39,
  .pin_d4       = 36,
  .pin_d3       = 21,
  .pin_d2       = 19,
  .pin_d1       = 18,
  .pin_d0       = 5,
  .pin_vsync    = 25,
  .pin_href     = 23,
  .pin_pclk     = 22,
  .xclk_freq_hz = 20000000, // 20 MHz camera clock
  .ledc_timer   = LEDC_TIMER_0,
  .ledc_channel = LEDC_CHANNEL_0,
  .pixel_format = PIXFORMAT_JPEG,
  .frame_size   = FRAMESIZE_QVGA, // 320x240
  .jpeg_quality = 12,
  .fb_count     = 2,
};

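void handleWebSocketMessage(void *arg, uint8_t *data, size_t len) {
  // Simple command parser (e.g., "TAKE_SNAPSHOT").
  // Build the string from len rather than relying on a trailing '\0'.
  String msg;
  for (size_t i = 0; i < len; i++) msg += (char)data[i];
  if (msg == "TAKE_SNAPSHOT") {
    camera_fb_t *fb = esp_camera_fb_get();
    if (fb) {
      // Replies to client 0 (the first connection). Pass the client number
      // through from onWebSocketEvent if several browsers connect at once.
      webSocket.sendBIN(0, fb->buf, fb->len);
      esp_camera_fb_return(fb);
    }
  }
  // Add PTZ commands here (if you have a servo board attached)
}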

void onWebSocketEvent(uint8_t num, WStype_t type, uint8_t * payload, size_t length){
  switch(type){
    case WStype_TEXT:
      handleWebSocketMessage(nullptr, payload, length);
      break;
    default: break;
  }
}

void setup() {
  Serial.begin(115200);
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) delay(500);
  Serial.println("WiFi connected: " + WiFi.localIP().toString());

  // Init camera
  esp_err_t err = esp_camera_init(&camConfig);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x\n", err);
    return;
  }

  // Start WebSocket server
  webSocket.begin();
  webSocket.onEvent(onWebSocketEvent);
}

void loop() {
  webSocket.loop();

  // Broadcast live JPEG frames at ~10 FPS
  static uint32_t lastMs = 0;
  if (millis() - lastMs > 100) {
    camera_fb_t *fb = esp_camera_fb_get();
    if (fb) {
      webSocket.broadcastBIN(fb->buf, fb->len);
      esp_camera_fb_return(fb);
      lastMs = millis();
    }
  }
}

This sketch does three things:

  1. Connects to Wi‑Fi.
  2. Initialises the camera in JPEG mode.
  3. Starts a WebSocket server on port 81 that continuously streams frames and listens for text commands.

2️⃣ Building the Pure Vanilla JavaScript Client

The client consists of a single HTML file with embedded CSS and JavaScript. No external libraries are loaded, which keeps the page fast and SEO‑friendly.

<!-- index.html – place this file on any web server (or open locally) -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>ESP32‑CAM Remote Control</title>
  <meta name="description" content="Pure vanilla JavaScript WebSocket interface for ESP32‑CAM live streaming and AV control.">
  <style>
    body{font-family:Arial,Helvetica,sans-serif;background:#fafafa;margin:0;padding:0;}
    .container{max-width:960px;margin:auto;padding:20px;}
    .video-box{position:relative;width:100%;padding-top:56.25%;background:#000;border-radius:8px;overflow:hidden;box-shadow:0 4px 12px rgba(0,0,0,0.08);}
    .video-box img{position:absolute;top:0;left:0;width:100%;height:100%;object-fit:cover;}
    .controls{margin-top:15px;display:flex;flex-wrap:wrap;gap:10px;}
    .btn{background:#6B7C3A;color:#fff;padding:10px 15px;border:none;border-radius:6px;cursor:pointer;transition:background .2s;}
    .btn:hover{background:#566024;}
    .status{margin-top:10px;font-size:0.9rem;color:#555;}
  </style>
</head>
<body>
  <div class="container">
    <h1>ESP32‑CAM Remote Control</h1>
    <div class="video-box">
      <img id="liveFeed" src="" alt="Live stream from ESP32‑CAM">
    </div>

    <div class="controls">
      <button class="btn" id="snapshotBtn">Take Snapshot</button>
      <button class="btn" id="panLeftBtn">Pan Left</button>
      <button class="btn" id="panRightBtn">Pan Right</button>
      <button class="btn" id="tiltUpBtn">Tilt Up</button>
      <button class="btn" id="tiltDownBtn">Tilt Down</button>
    </div>

    <div class="status" id="statusMsg">Connecting...</div>
  </div>

  <script>
    // ----- Configuration -----
    const ESP_IP = '192.168.1.45'; // replace with your ESP32‑CAM IP
    const WS_PORT = 81;
    const WS_URL = `ws://${ESP_IP}:${WS_PORT}`;

    // ----- UI references -----
    const liveFeed = document.getElementById('liveFeed');
    const statusMsg = document.getElementById('statusMsg');
    const snapshotBtn = document.getElementById('snapshotBtn');
    const panLeftBtn   = document.getElementById('panLeftBtn');
    const panRightBtn  = document.getElementById('panRightBtn');
    const tiltUpBtn    = document.getElementById('tiltUpBtn');
    const tiltDownBtn  = document.getElementById('tiltDownBtn');

    // ----- WebSocket handling -----
    let ws;

    function initWebSocket() {
      ws = new WebSocket(WS_URL);
      ws.binaryType = 'blob'; // receive JPEG as Blob

      ws.onopen = () => {
        statusMsg.textContent = '✅ Connected to ESP32‑CAM';
      };

      ws.onmessage = event => {
        // Binary messages carry one JPEG frame each
        if (event.data instanceof Blob) {
          const previousUrl = liveFeed.src;
          liveFeed.src = URL.createObjectURL(event.data);
          // Release the previous frame's object URL so Blobs do not pile up in memory
          if (previousUrl.startsWith('blob:')) URL.revokeObjectURL(previousUrl);
        }
      };

      ws.onerror = err => {
        console.error('WebSocket error:', err);
        statusMsg.textContent = '⚠️ Connection error';
      };

      ws.onclose = () => {
        statusMsg.textContent = '🔌 Disconnected – retrying...';
        // Auto‑reconnect after 2 seconds
        setTimeout(initWebSocket, 2000);
      };
    }

    // ----- Command helpers -----
    function sendCommand(cmd) {
      if (ws && ws.readyState === WebSocket.OPEN) {
        ws.send(cmd);
      } else {
        console.warn('WebSocket not ready – command ignored');
      }
    }

    // ----- Button actions -----
    snapshotBtn.onclick = () => sendCommand('TAKE_SNAPSHOT');
    panLeftBtn.onclick   = () => sendCommand('PAN_LEFT');
    panRightBtn.onclick  = () => sendCommand('PAN_RIGHT');
    tiltUpBtn.onclick    = () => sendCommand('TILT_UP');
    tiltDownBtn.onclick  = () => sendCommand('TILT_DOWN');

    // ----- Initialise -----
    initWebSocket();
  </script>
</body>
</html>

Key points in the script:

  • Set binaryType = 'blob' so binary JPEG frames arrive as Blob objects rather than ArrayBuffers ('blob' is also the browser default).
  • Use URL.createObjectURL for fast image rendering without base64 conversion.
  • Automatic reconnection logic keeps the UI alive if the ESP32‑CAM restarts.
Because the page is static HTML with no client‑side rendering step, search engines can index its headings and meta description directly.
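The fixed 2‑second retry in initWebSocket keeps knocking at the same rate even when the camera is offline for a long time. One common alternative is exponential backoff with a cap. The sketch below is illustrative only: reconnectDelay, BASE_DELAY_MS, and MAX_DELAY_MS are hypothetical names that do not appear in the page above, and the 30‑second cap is an arbitrary assumption.

```javascript
// Hypothetical helper: how long to wait before reconnect attempt N (N starts at 0).
const BASE_DELAY_MS = 2000;   // assumption: same 2 s starting point as the page's retry
const MAX_DELAY_MS = 30000;   // assumption: never wait longer than 30 s

function reconnectDelay(attempt) {
  // Double the wait after every failed attempt, capped at MAX_DELAY_MS
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Delays for the first six attempts
console.log([0, 1, 2, 3, 4, 5].map(reconnectDelay));
// → [ 2000, 4000, 8000, 16000, 30000, 30000 ]
```

To use it, the page would keep a counter (say, attempt), call setTimeout(initWebSocket, reconnectDelay(attempt++)) in onclose, and reset the counter to 0 in onopen.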

3️⃣ Adding PTZ (Pan/Tilt/Zoom) Support

If your ESP32‑CAM is paired with a servo board, extend the Arduino sketch to react to the new commands. Below is a quick addition.

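// Assume two SG90 servos on GPIO 13 (pan) and GPIO 14 (tilt).
// The stock Arduino Servo library does not target the ESP32, so this uses
// the ESP32Servo library (install it from the Library Manager).
#include <ESP32Servo.h>
Servo panServo, tiltServo;

void setupServos() {    // call once from setup()
  panServo.attach(13);
  tiltServo.attach(14);
  panServo.write(90);   // center position
  tiltServo.write(90);
}

// Move a servo by delta degrees, clamped to the 0-180 degree range
void stepServo(Servo &servo, int delta) {
  servo.write(constrain(servo.read() + delta, 0, 180));
}

void handleWebSocketMessage(void *arg, uint8_t *data, size_t len) {
  // Build the command from len rather than relying on a trailing '\0'
  String cmd;
  for (size_t i = 0; i < len; i++) cmd += (char)data[i];
  if (cmd == "PAN_LEFT")       stepServo(panServo, -10);
  else if (cmd == "PAN_RIGHT") stepServo(panServo, +10);
  else if (cmd == "TILT_UP")   stepServo(tiltServo, -10);
  else if (cmd == "TILT_DOWN") stepServo(tiltServo, +10);
  else if (cmd == "TAKE_SNAPSHOT") {
    // existing snapshot logic
  }
}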

Adjust the angle step (10°) to suit your hardware. The same sendCommand function on the client side works without modification.
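Since the commands are plain strings, other input methods can reuse them. As one possible extension, the sketch below maps the arrow keys to the same four PTZ commands. KEY_COMMANDS and commandForKey are hypothetical names introduced here, and the browser wiring assumes the sendCommand function from the client page is in scope.

```javascript
// Hypothetical key map: arrow keys to the command strings the sketch understands.
const KEY_COMMANDS = {
  ArrowLeft: 'PAN_LEFT',
  ArrowRight: 'PAN_RIGHT',
  ArrowUp: 'TILT_UP',
  ArrowDown: 'TILT_DOWN',
};

function commandForKey(key) {
  // hasOwnProperty keeps inherited names such as "toString" from matching
  return Object.prototype.hasOwnProperty.call(KEY_COMMANDS, key)
    ? KEY_COMMANDS[key]
    : null;
}

// Browser wiring; skipped when no DOM is available.
// Assumes sendCommand() from the client page above.
if (typeof document !== 'undefined') {
  document.addEventListener('keydown', event => {
    const cmd = commandForKey(event.key);
    if (cmd) {
      event.preventDefault();  // stop the arrow keys from scrolling the page
      sendCommand(cmd);
    }
  });
}
```

Holding a key down fires repeated keydown events, so each repeat sends another 10° step; the servo clamping on the ESP32 side is what stops the motion at the end of travel.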

4️⃣ Security & Performance Best Practices

Secure the Connection

  • Prefer wss:// when serving the page over HTTPS; you can terminate TLS on a reverse proxy (e.g., Nginx) and proxy to the ESP32‑CAM.
  • Implement a simple token handshake: the client sends a secret string right after onopen, and the ESP validates before broadcasting.
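On the client, both points above could look roughly like the following. This is a sketch under assumptions, not a finished scheme: buildWsUrl and attachAuth are hypothetical helpers, the "AUTH " message prefix is invented for illustration, and the ESP32 sketch would need matching code that checks the token before it sends frames to that client.

```javascript
// Pick ws:// or wss:// to match how the page itself was loaded.
function buildWsUrl(pageProtocol, host, port) {
  const scheme = pageProtocol === 'https:' ? 'wss' : 'ws';
  return `${scheme}://${host}:${port}`;
}

// Hypothetical handshake: send "AUTH <token>" as the first text message.
// The "AUTH " prefix is an assumption, not something the sketch above parses yet.
function attachAuth(socket, token) {
  socket.addEventListener('open', () => socket.send(`AUTH ${token}`));
  return socket;
}

// Browser usage, assuming a reverse proxy at the placeholder host
// cam.example.com terminates TLS on port 443:
// const ws = attachAuth(
//   new WebSocket(buildWsUrl(location.protocol, 'cam.example.com', 443)),
//   'my-secret-token'
// );
```

A token sent over plain ws:// travels unencrypted, so this handshake only offers real protection when combined with wss://.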

Reduce Bandwidth

  • Set .frame_size = FRAMESIZE_QVGA or even QQVGA for low‑speed networks.
  • Adjust .jpeg_quality (10‑20 range) to balance quality vs. size; a higher number means stronger compression, so smaller frames at lower quality.
  • Throttle the broadcast interval (e.g., 150 ms for ~7 fps).
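To see how these settings interact, a back-of-the-envelope estimate helps: bandwidth is roughly frame size times frame rate. The helper below is a hypothetical illustration; the 12 kB frame size is an assumed figure, not a measurement, and WebSocket and TCP overhead are ignored.

```javascript
// Rough bandwidth estimate for a JPEG-over-WebSocket stream.
// frameBytes is an assumed average frame size; log event.data.size in
// onmessage to get a real figure for your scene and settings.
function estimateStream(frameBytes, intervalMs) {
  const fps = 1000 / intervalMs;                // frames per second
  const kbps = (frameBytes * 8 * fps) / 1000;   // kilobits per second
  return { fps, kbps };
}

// Assumption: 12 kB per frame, broadcast every 150 ms
const { fps, kbps } = estimateStream(12000, 150);
console.log(`${fps.toFixed(1)} fps, ~${Math.round(kbps)} kbit/s`);
// → 6.7 fps, ~640 kbit/s
```

Halving either the frame size or the frame rate halves the estimate, which is why dropping to a smaller frame size is usually the quickest fix on a weak link.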

5️⃣ Debugging Tips

Scenario Check Solution
WebSocket fails to open