
How I Built an Alexa-Like Voice Assistant Using Raspberry Pi 5 and an LLM

In this tutorial, I’m going to walk you through how I built my own voice assistant—similar to Alexa or Google Home—using only a Raspberry Pi 5, a USB microphone, a speaker, and an LLM with text-to-speech (TTS) and speech-to-text (STT) support.

By the end of this guide, you’ll have a Raspberry-Pi-powered device that:

  • Listens for your voice
  • Converts your speech into text
  • Sends that text to an LLM
  • Converts the LLM’s response into natural speech
  • Speaks it aloud

It feels surprisingly close to a commercial smart assistant—just without the Amazon ecosystem behind it.


Why I Chose Raspberry Pi 5 for My Voice Assistant

The Raspberry Pi 5 is powerful enough to handle:

  • Live audio recording
  • Streaming data to an LLM
  • Playing back voice responses
  • Handling wake-word models like Porcupine

Its quad-core CPU and USB 3.0 bandwidth make it perfect for a responsive, low-latency voice assistant.


What You Need

Hardware

  • Raspberry Pi 5 (4GB or 8GB recommended)
  • USB microphone (Blue Snowball, Fifine, or any generic USB mic)
  • Speakers or a USB sound card with speakers
  • Internet connection (Wi-Fi or Ethernet)

Software & APIs

  • Raspberry Pi OS (Bookworm), fully updated
  • Python 3.11+ (pre-installed on Pi OS)
  • OpenAI Realtime API (or any LLM with STT + TTS endpoints)
  • Picovoice Porcupine (optional wake-word detection)
  • Python libraries: sounddevice, pyaudio, websockets, requests

Step 1 — Updating My Raspberry Pi 5

I always begin by updating the Pi:

sudo apt update && sudo apt upgrade -y

Then install the required audio tooling (the PortAudio headers are needed to build the Python audio libraries, and FFmpeg provides the ffplay command used later for playback):

sudo apt install python3-pip portaudio19-dev ffmpeg -y

Step 2 — Setting Up My USB Microphone

I plugged in my mic and checked if the Pi detected it:

arecord -l

To test recording:

arecord --format=S16_LE --duration=3 --rate=16000 --file-type=wav test.wav
aplay test.wav

If I heard my voice, the microphone and speaker were good to go.
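If the mic isn't the default capture device, arecord can be pointed at it explicitly. The card and device numbers come from the arecord -l output above; the 1,0 here is just an example and may differ on your setup:

arecord -D plughw:1,0 --format=S16_LE --duration=3 --rate=16000 --file-type=wav test.wav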


Step 3 — Installing Python Libraries

I installed everything I needed for recording audio, playing audio, and talking to the LLM:

pip3 install sounddevice pyaudio numpy websockets requests

If I planned to add a wake word:

pip3 install pvporcupine
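Note: Raspberry Pi OS Bookworm marks the system Python as externally managed, so a bare pip3 install may be refused with an "externally-managed-environment" error. A virtual environment avoids that; the ~/voice-env path below is just the name I picked:

python3 -m venv ~/voice-env
source ~/voice-env/bin/activate
pip install sounddevice pyaudio numpy websockets requests pvporcupine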

Step 4 — Setting Up the LLM (OpenAI Realtime)

To get real-time conversational audio, I used the OpenAI Realtime WebSocket endpoint.
This gives me:

  • Streaming speech-to-text
  • Streaming conversation
  • Streaming text-to-speech

All in one pipeline.

I simply created an API key in my OpenAI account and stored it:

nano ~/.openai_key
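Since the key grants access to my account, I also locked down the file's permissions:

chmod 600 ~/.openai_key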

Step 5 — Creating My Simple Push-to-Talk Voice Assistant (MVP)

Before getting fancy with wake words, I built a press-Enter-to-talk version.

This script:

  1. Records my voice
  2. Sends the audio to an OpenAI transcription (STT) endpoint
  3. Sends text to an LLM
  4. Converts reply text to speech
  5. Plays the audio

Create the file:

nano assistant.py

Paste this:

import io
import json
import subprocess
import tempfile
import wave

import requests
import sounddevice as sd

OPENAI_KEY = open("/home/pi/.openai_key").read().strip()

# Record a few seconds of mono 16 kHz audio and return it as WAV bytes
def record_audio(duration=4, fs=16000):
    print("🎙️ Speak now...")
    audio = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype='int16')
    sd.wait()
    print("⏹️ Recording stopped")
    # Wrap the raw PCM samples in a WAV container; the transcription
    # endpoint expects an actual audio file, not bare sample data
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(fs)
        wf.writeframes(audio.tobytes())
    return buf.getvalue()

# Send audio -> STT
def speech_to_text(audio_bytes):
    response = requests.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {OPENAI_KEY}"},
        files={"file": ("speech.wav", audio_bytes, "audio/wav")},
        data={"model": "gpt-4o-mini-transcribe"}
    )

    return response.json()["text"]

# LLM call
def ask_llm(text):
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": text}]
    }

    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_KEY}"},
        json=payload  # requests sets the Content-Type header for us
    )

    return response.json()["choices"][0]["message"]["content"]

    return response.json()["choices"][0]["message"]["content"]

# Text-to-speech
def tts(text):
    response = requests.post(
        "https://api.openai.com/v1/audio/speech",
        headers={
            "Authorization": f"Bearer {OPENAI_KEY}"
        },
        json={
            "model": "gpt-4o-mini-tts",
            "voice": "alloy",
            "input": text
        }
    )

    return response.content

# Play sound through ffplay (installed with ffmpeg in Step 1)
def play_audio(audio_bytes):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as f:
        f.write(audio_bytes)
        f.flush()
        subprocess.run(["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", f.name])

# Main loop
while True:
    input("\n👉 Press Enter to talk...")
    audio = record_audio()
    text = speech_to_text(audio)
    print("🗣️ You said:", text)

    reply = ask_llm(text)
    print("🤖 Assistant:", reply)

    audio_reply = tts(reply)
    play_audio(audio_reply)

Run it:

python3 assistant.py

I now had a functional voice AI that:

  • Listens when I press Enter
  • Understands what I say
  • Responds using natural speech

Step 6 — Adding Wake-Word Detection (Optional Upgrade)

To get that Alexa-like "always listening" experience, I added Porcupine.

Install the engine:

pip3 install pvporcupine

I then created a second script, wake_assistant.py, that:

  • Continuously listens for the wake word (e.g., “Jarvis”)
  • When triggered, runs the same record → STT → LLM → TTS pipeline (sketched below)

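Here is a minimal sketch of that loop. Porcupine needs a free AccessKey from the Picovoice Console; the ~/.picovoice_key path is just where I chose to store mine, and "jarvis" is one of Porcupine's built-in keywords. Treat this as an outline rather than a polished script:

import struct

import pvporcupine
import pyaudio

ACCESS_KEY = open("/home/pi/.picovoice_key").read().strip()

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keywords=["jarvis"])

pa = pyaudio.PyAudio()
stream = pa.open(
    rate=porcupine.sample_rate,          # 16 kHz
    channels=1,
    format=pyaudio.paInt16,
    input=True,
    frames_per_buffer=porcupine.frame_length,
)

print("👂 Listening for 'Jarvis'...")
try:
    while True:
        frame = stream.read(porcupine.frame_length, exception_on_overflow=False)
        pcm = struct.unpack_from("h" * porcupine.frame_length, frame)
        if porcupine.process(pcm) >= 0:
            print("🔔 Wake word detected")
            # Hand off to the push-to-talk pipeline from assistant.py:
            # record_audio() -> speech_to_text() -> ask_llm() -> tts() -> play_audio()
finally:
    stream.close()
    pa.terminate()
    porcupine.delete()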


Step 7 — (Optional) Using OpenAI Realtime for Streaming Audio

For truly Alexa-like behavior (barge-in, interruptions, low latency), the OpenAI Realtime WebSocket API is the natural next step. It allows:

  • Speaking to the Pi without waiting for a stop signal
  • The LLM to start responding while I'm still talking
  • Beautifully fluid conversation

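The skeleton below shows only the connection and event loop; the microphone streaming and audio playback still have to be wired in. Two assumptions worth flagging: the gpt-4o-realtime-preview model name, and the websockets library's header keyword, which is additional_headers in websockets 14+ but extra_headers in older releases:

import asyncio
import json

import websockets

OPENAI_KEY = open("/home/pi/.openai_key").read().strip()

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {OPENAI_KEY}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the session for both audio and text output
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))
        # From here: stream mic chunks up as input_audio_buffer.append
        # events and play the audio deltas from the server's responses
        async for message in ws:
            event = json.loads(message)
            print(event["type"])

asyncio.run(main())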


Step 8 — Autostarting My Voice Assistant on Boot

I made my AI device boot straight into "always listening" mode using systemd. Note that the unit points at the wake-word script, since the push-to-talk version needs a keyboard:

sudo nano /etc/systemd/system/voice.service

Service file:

[Unit]
Description=Raspberry Pi Voice Assistant
After=network.target

[Service]
ExecStart=/usr/bin/python3 /home/pi/wake_assistant.py
WorkingDirectory=/home/pi
Restart=always
User=pi

[Install]
WantedBy=multi-user.target

Enable it:

sudo systemctl enable voice.service
sudo systemctl start voice.service
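To check on it later:

sudo systemctl status voice.service
journalctl -u voice.service -f

One caveat: if the Python libraries live in a virtual environment, point ExecStart at that environment's interpreter instead, e.g. /home/pi/voice-env/bin/python.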

Now my Pi automatically behaves like a smart speaker.


Final Thoughts

Building my own Alexa-style voice assistant on the Raspberry Pi 5 turned out to be:

  • Easier than expected
  • Surprisingly powerful
  • Fully customizable
  • Not locked into any ecosystem

I can choose any wake word, any personality, any LLM, and I fully own the device.

If you want to extend this project, here are ideas:

  • Add home automation (MQTT, Home Assistant)
  • Display responses on a small touchscreen
  • Allow multi-room audio
  • Build a custom housing with 3D printing
  • Add LED rings like Alexa Echo
