Voice Command Recognition with ESP32 and TinyML (Using Edge Impulse)

Ever wanted to make your ESP32 respond to voice commands like “ON”, “OFF”, or “LIGHT”? Thanks to TinyML, we can now run machine-learning models directly on microcontrollers — no cloud needed. In this tutorial, I’ll walk you through building a voice command recognition project using an ESP32, Edge Impulse, and a small digital microphone.

This guide is perfect if you're exploring IoT or home automation, or if you want to experiment with embedded AI.


What You’ll Learn

  • How to collect and prepare audio data for machine learning
  • How to train a keyword-spotting model in Edge Impulse
  • How to deploy TinyML models to an ESP32
  • How to trigger actions using recognized voice commands

Why Use TinyML on ESP32?

The ESP32 is a powerhouse for a microcontroller — dual-core, WiFi, BLE, and enough memory to run compact ML models. With TinyML, we can make it “listen” for keywords locally, meaning:

  • No internet needed
  • Faster response
  • More private (no cloud recordings)
  • Lower power than constantly streaming audio to a server

TinyML + ESP32 = smarter projects without depending on a server.


Hardware Requirements

  • ESP32 dev board (any ESP32 WROOM/WROVER board works)
  • I2S microphone such as:
    • INMP441
    • SPH0645
    • ICS-43434
  • USB cable
  • Jumper wires

Optional:

  • LED, relay, or any output you want to control with voice.

Step 1: Create an Edge Impulse Project

  1. Go to https://edgeimpulse.com
  2. Create a new project → choose Audio as the domain
  3. Set the labeling mode to “Keyword Spotting” (or just Speech)

Edge Impulse makes the workflow super clean: collect → train → deploy.


Step 2: Collect Audio Samples

You can collect samples in one of two ways:

A) Using your laptop/microphone

Go to Data Acquisition → Record new audio
Record different command words such as:

  • on
  • off
  • stop
  • go

Record at least 30 samples per class, each ideally about 1 second long and spoken at normal volume.

B) Using ESP32 (optional)

You can install the Edge Impulse CLI and stream audio from ESP32, but for beginners, I usually suggest using the browser mic — it's easier and faster.
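If you're curious, the CLI route looks roughly like this (a sketch, assuming you have Node.js installed; exact prompts vary between CLI versions):

npm install -g edge-impulse-cli
edge-impulse-daemon

edge-impulse-daemon logs into your Edge Impulse account, connects a board running Edge Impulse's firmware, and makes it show up as a device under Data Acquisition.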


Step 3: Generate MFCC Features

TinyML audio models usually use MFCC (Mel-Frequency Cepstral Coefficients).
Good news: Edge Impulse handles everything automatically.

  1. Go to Impulse Design
  2. Set window size: 1000ms
  3. Add “MFCC” as the processing block
  4. Add “Classification (Keras)” as the learning block
  5. Save the impulse, then open the MFCC section
  6. Generate features → Save Features

You'll see a nice feature graph with clear shapes for each word.


Step 4: Train the Neural Network

  1. Go to Training
  2. Start training
  3. Aim for at least 90% accuracy
  4. Tweak epochs or learning rate if needed

The default network in Edge Impulse is usually enough:

  • 1D CNN
  • Small footprint
  • Good performance on microcontrollers

Once done, check the Confusion Matrix to confirm your commands aren’t overlapping too much.


Step 5: Deploy the Model to ESP32

This is the coolest part.

Option 1: Export as an Arduino Library

  1. Go to Deployment
  2. Choose Arduino Library
  3. Download the .zip file
  4. Import into Arduino IDE using Sketch → Include Library → Add .ZIP Library

Option 2: Use the Edge Impulse ESP32 firmware

Edge Impulse provides a ready-made ESP32 firmware for inference, but for full project customization, I recommend the Arduino library.
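If you do try the firmware route, the CLI's edge-impulse-run-impulse tool streams live classification results over serial, which makes for a quick sanity test before writing any code of your own:

edge-impulse-run-impulse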


Step 6: Connect the I2S Microphone to ESP32

Example wiring for INMP441:

INMP441 pin    ESP32 pin
VDD            3.3V
GND            GND
SCK            GPIO 14
WS             GPIO 15
SD             GPIO 32
L/R            GND (selects the left channel)

Make sure your microphone is 3.3V-compatible!
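Before moving on, it's worth confirming the microphone actually produces data. Here's a minimal mic-check sketch (same I2S settings and pins as the full example in Step 7); if the printed values sit at 0 while you talk, re-check the wiring:

#include <driver/i2s.h>

void setup() {
  Serial.begin(115200);

  // Same I2S configuration the full sketch in Step 7 uses
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = 1024
  };
  i2s_pin_config_t pin_config = {
    .bck_io_num = 14,
    .ws_io_num = 15,
    .data_out_num = -1,
    .data_in_num = 32
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
}

void loop() {
  int16_t samples[256];
  size_t bytes_read;
  i2s_read(I2S_NUM_0, samples, sizeof(samples), &bytes_read, portMAX_DELAY);

  // Print one value per batch; it should swing visibly when you speak
  Serial.println(samples[0]);
}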


Step 7: Arduino Code for Voice Recognition

Below is a clean starter sketch using Edge Impulse's library. Replace your-project-name_inferencing.h with the header generated for your project.

#include <your-project-name_inferencing.h>
#include <driver/i2s.h>

#define SAMPLE_RATE     16000
#define I2S_NUM         I2S_NUM_0

// Raw audio buffer holding one inference window; the size macro
// comes from the Edge Impulse library
static int16_t audio_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// I2S config
i2s_config_t i2s_config = {
  .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
  .sample_rate = SAMPLE_RATE,
  .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
  .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
  .communication_format = I2S_COMM_FORMAT_I2S,
  .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
  .dma_buf_count = 4,
  .dma_buf_len = 1024
};

// I2S pin config
i2s_pin_config_t pin_config = {
  .bck_io_num = 14,
  .ws_io_num = 15,
  .data_out_num = -1,
  .data_in_num = 32
};

void setup() {
  Serial.begin(115200);

  // Init I2S
  i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM, &pin_config);

  Serial.println("Ready to recognize commands...");
}

// Callback run_classifier() uses to pull audio from our buffer as floats
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  numpy::int16_to_float(&audio_buffer[offset], out_ptr, length);
  return 0;
}

void loop() {
  // Fill one full inference window from the microphone
  size_t bytes_read;
  i2s_read(I2S_NUM, audio_buffer, sizeof(audio_buffer), &bytes_read, portMAX_DELAY);

  // Wrap the buffer in the signal_t structure the classifier expects
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result;
  EI_IMPULSE_ERROR err = run_classifier(&signal, &result, false);

  if (err != EI_IMPULSE_OK) {
    Serial.println("Error running classifier");
    return;
  }

  // Print predictions
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.print(result.classification[i].label);
    Serial.print(": ");
    Serial.println(result.classification[i].value);
  }

  // Trigger actions based on predictions. Classes follow your project's
  // label order, so check the printed labels above -- here we assume
  // index 0 is the "on" command
  if (result.classification[0].value > 0.8) {
    Serial.println("Command recognized: TURN ON");
  }
}

You can now add LEDs, relays, motors — whatever action you want linked to the detected word.


Step 8: Add Actions (LED Example)

This snippet goes inside the prediction loop from Step 7 (and remember to call pinMode(LED_BUILTIN, OUTPUT); in setup()):

for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
  if (strcmp(result.classification[i].label, "on") == 0 &&
      result.classification[i].value > 0.8) {
    digitalWrite(LED_BUILTIN, HIGH);
  }

  if (strcmp(result.classification[i].label, "off") == 0 &&
      result.classification[i].value > 0.8) {
    digitalWrite(LED_BUILTIN, LOW);
  }
}

Simple and powerful!


Improving Accuracy

Here are some optional tweaks:

✔ Record samples in different tones and volumes
✔ Add “noise” samples (fan noise, background talking)
✔ Use a better microphone
✔ Add more training epochs
✔ Add a “silence” class
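If you do add "noise" and "silence" classes, you can also tighten the trigger logic at runtime: only act when a command clearly beats both background classes. A minimal sketch of that check, assuming those exact label names (adjust to whatever you trained with), dropped into the loop from Step 7:

float noise_score = 0, silence_score = 0, best_score = 0;
const char *best_label = "";

for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
  const char *label = result.classification[i].label;
  float value = result.classification[i].value;
  if (strcmp(label, "noise") == 0) {
    noise_score = value;              // background class, never triggers
  } else if (strcmp(label, "silence") == 0) {
    silence_score = value;            // background class, never triggers
  } else if (value > best_score) {
    best_score = value;               // best-scoring command so far
    best_label = label;
  }
}

// Fire only when the winning command beats both background classes
if (best_score > 0.8 && best_score > noise_score && best_score > silence_score) {
  Serial.print("Command recognized: ");
  Serial.println(best_label);
}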


Conclusion

Voice command recognition on ESP32 used to sound impossible — but now, with Edge Impulse and TinyML, it’s incredibly easy to build your own offline voice assistant. You can expand this project to:

  • Smart home voice control
  • Kids’ toys with voice interaction
  • Hands-free control panels
  • Voice-activated robots
