Voice Command Recognition with ESP32 and TinyML (Using Edge Impulse)

Ever wanted to make your ESP32 respond to voice commands like “ON”, “OFF”, or “LIGHT”? Thanks to TinyML, we can now run machine-learning models directly on microcontrollers — no cloud needed. In this tutorial, I’ll walk you through building a voice command recognition project using an ESP32, Edge Impulse, and a small digital microphone.

This guide is perfect if you're exploring IoT or home automation, or if you want to experiment with embedded AI.


What You’ll Learn

  • How to collect and prepare audio data for machine learning
  • How to train a keyword-spotting model in Edge Impulse
  • How to deploy TinyML models to an ESP32
  • How to trigger actions using recognized voice commands

Why Use TinyML on ESP32?

The ESP32 is a powerhouse for a microcontroller — dual-core, WiFi, BLE, and enough memory to run compact ML models. With TinyML, we can make it “listen” for keywords locally, meaning:

  • No internet needed
  • Faster response
  • More private (no cloud recordings)
  • Lower power than constantly streaming audio to a server

TinyML + ESP32 = smarter projects without depending on a server.


Hardware Requirements

  • ESP32 dev board (any ESP32 WROOM/WROVER board works)
  • I2S microphone such as:
    • INMP441
    • SPH0645
    • ICS-43434
  • USB cable
  • Jumper wires

Optional:

  • LED, relay, or any output you want to control with voice.

Step 1: Create an Edge Impulse Project

  1. Go to https://edgeimpulse.com
  2. Create a new project → choose Audio as the domain
  3. Set the labeling mode to “Keyword Spotting” (or just Speech)

Edge Impulse makes the workflow super clean: collect → train → deploy.


Step 2: Collect Audio Samples

You can collect samples in one of two ways:

A) Using your laptop/microphone

Go to Data Acquisition → Record new audio
Record different command words such as:

  • on
  • off
  • stop
  • go

Record at least 30 samples per class, each ideally about 1 second long and spoken at normal volume.

B) Using ESP32 (optional)

You can install the Edge Impulse CLI and stream audio from ESP32, but for beginners, I usually suggest using the browser mic — it's easier and faster.
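If you're curious, the CLI route looks roughly like this (a sketch, assuming you have Node.js installed; exact prompts vary between CLI versions):

npm install -g edge-impulse-cli
edge-impulse-daemon

edge-impulse-daemon logs into your Edge Impulse account, connects a board running Edge Impulse's firmware, and makes it show up as a device under Data Acquisition.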


Step 3: Generate MFCC Features

TinyML audio models usually use MFCC (Mel-Frequency Cepstral Coefficients).
Good news: Edge Impulse handles everything automatically.

  1. Go to Impulse Design
  2. Set window size: 1000ms
  3. Add “MFCC” as the processing block
  4. Add “Classification (Keras)” as the learning block
  5. Save the impulse, then open the MFCC section
  6. Generate features → Save Features

You'll see a nice feature graph with clear shapes for each word.


Step 4: Train the Neural Network

  1. Go to Training
  2. Start training
  3. Aim for at least 90% accuracy
  4. Tweak epochs or learning rate if needed

The default network in Edge Impulse is usually enough:

  • 1D CNN
  • Small footprint
  • Good performance on microcontrollers

Once done, check the Confusion Matrix to confirm your commands aren’t overlapping too much.


Step 5: Deploy the Model to ESP32

This is the coolest part.

Option 1: Export as an Arduino Library

  1. Go to Deployment
  2. Choose Arduino Library
  3. Download the .zip file
  4. Import into Arduino IDE using Sketch → Include Library → Add .ZIP Library

Option 2: Use the Edge Impulse ESP32 firmware

Edge Impulse provides a ready-made ESP32 firmware for inference, but for full project customization, I recommend the Arduino library.
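If you do try the firmware route, the CLI's edge-impulse-run-impulse tool streams live classification results over serial, which makes for a quick sanity test before writing any code of your own:

edge-impulse-run-impulse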


Step 6: Connect the I2S Microphone to ESP32

Example wiring for INMP441:

INMP441 pin    ESP32 pin
VDD            3.3V
GND            GND
SCK            GPIO 14
WS             GPIO 15
SD             GPIO 32
L/R            GND (selects the left channel)

Make sure your microphone is 3.3V-compatible!
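Before moving on, it's worth confirming the microphone actually produces data. Here's a minimal mic-check sketch (same I2S settings and pins as the full example in Step 7); if the printed values sit at 0 while you talk, re-check the wiring:

#include <driver/i2s.h>

void setup() {
  Serial.begin(115200);

  // Same I2S configuration the full sketch in Step 7 uses
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = 1024
  };
  i2s_pin_config_t pin_config = {
    .bck_io_num = 14,
    .ws_io_num = 15,
    .data_out_num = -1,
    .data_in_num = 32
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
}

void loop() {
  int16_t samples[256];
  size_t bytes_read;
  i2s_read(I2S_NUM_0, samples, sizeof(samples), &bytes_read, portMAX_DELAY);

  // Print one value per batch; it should swing visibly when you speak
  Serial.println(samples[0]);
}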


Step 7: Arduino Code for Voice Recognition

Below is a clean starter sketch using Edge Impulse's library. Replace your-project-name_inferencing.h with the header generated for your project.

#include <your-project-name_inferencing.h>
#include <driver/i2s.h>

#define SAMPLE_RATE     16000
#define I2S_NUM         I2S_NUM_0

// Raw audio buffer holding one inference window; the size macro
// comes from the Edge Impulse library
static int16_t audio_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// I2S config
i2s_config_t i2s_config = {
  .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
  .sample_rate = SAMPLE_RATE,
  .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
  .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
  .communication_format = I2S_COMM_FORMAT_I2S,
  .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
  .dma_buf_count = 4,
  .dma_buf_len = 1024
};

// I2S pin config
i2s_pin_config_t pin_config = {
  .bck_io_num = 14,
  .ws_io_num = 15,
  .data_out_num = -1,
  .data_in_num = 32
};

void setup() {
  Serial.begin(115200);

  // Init I2S
  i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM, &pin_config);

  Serial.println("Ready to recognize commands...");
}

// Callback run_classifier() uses to pull audio from our buffer as floats
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  numpy::int16_to_float(&audio_buffer[offset], out_ptr, length);
  return 0;
}

void loop() {
  // Fill one full inference window from the microphone
  size_t bytes_read;
  i2s_read(I2S_NUM, audio_buffer, sizeof(audio_buffer), &bytes_read, portMAX_DELAY);

  // Wrap the buffer in the signal_t structure the classifier expects
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result;
  EI_IMPULSE_ERROR err = run_classifier(&signal, &result, false);

  if (err != EI_IMPULSE_OK) {
    Serial.println("Error running classifier");
    return;
  }

  // Print predictions
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.print(result.classification[i].label);
    Serial.print(": ");
    Serial.println(result.classification[i].value);
  }

  // Trigger actions based on predictions. Classes follow your project's
  // label order, so check the printed labels above -- here we assume
  // index 0 is the "on" command
  if (result.classification[0].value > 0.8) {
    Serial.println("Command recognized: TURN ON");
  }
}

You can now add LEDs, relays, motors — whatever action you want linked to the detected word.


Step 8: Add Actions (LED Example)

This snippet goes inside the prediction loop from Step 7 (and remember to call pinMode(LED_BUILTIN, OUTPUT); in setup()):

for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
  if (strcmp(result.classification[i].label, "on") == 0 &&
      result.classification[i].value > 0.8) {
    digitalWrite(LED_BUILTIN, HIGH);
  }

  if (strcmp(result.classification[i].label, "off") == 0 &&
      result.classification[i].value > 0.8) {
    digitalWrite(LED_BUILTIN, LOW);
  }
}

Simple and powerful!


Improving Accuracy

Here are some optional tweaks:

✔ Record samples in different tones and volumes
✔ Add “noise” samples (fan noise, background talking)
✔ Use a better microphone
✔ Add more training epochs
✔ Add a “silence” class
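If you do add "noise" and "silence" classes, you can also tighten the trigger logic at runtime: only act when a command clearly beats both background classes. A minimal sketch of that check, assuming those exact label names (adjust to whatever you trained with), dropped into the loop from Step 7:

float noise_score = 0, silence_score = 0, best_score = 0;
const char *best_label = "";

for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
  const char *label = result.classification[i].label;
  float value = result.classification[i].value;
  if (strcmp(label, "noise") == 0) {
    noise_score = value;              // background class, never triggers
  } else if (strcmp(label, "silence") == 0) {
    silence_score = value;            // background class, never triggers
  } else if (value > best_score) {
    best_score = value;               // best-scoring command so far
    best_label = label;
  }
}

// Fire only when the winning command beats both background classes
if (best_score > 0.8 && best_score > noise_score && best_score > silence_score) {
  Serial.print("Command recognized: ");
  Serial.println(best_label);
}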


Conclusion

Voice command recognition on ESP32 used to sound impossible — but now, with Edge Impulse and TinyML, it’s incredibly easy to build your own offline voice assistant. You can expand this project to:

  • Smart home voice control
  • Kids’ toys with voice interaction
  • Hands-free control panels
  • Voice-activated robots
