Voice assistant pipeline using VOSK & ffmpeg
Overview
First step - Set up the complete flow.
-
Using ffmpeg, listen on a port for the incoming RTP stream and pipe the decoded audio to the Python code. (server)
ffmpeg -loglevel quiet -protocol_whitelist file,udp,rtp -i something.sdp -ar <SAMPLE_RATE> -ac 1 -f s16le -
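The command above needs an SDP file describing the incoming RTP stream. A minimal sketch of something.sdp, assuming the mu-law 8000 Hz sender below (RTP payload type 0 is PCMU/8000); `<Port>` is a placeholder for the listening port:

```
v=0
o=- 0 0 IN IP4 127.0.0.1
s=VoiceStream
c=IN IP4 0.0.0.0
t=0 0
m=audio <Port> RTP/AVP 0
a=rtpmap:0 PCMU/8000
```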
-
Using ffmpeg, stream the hellohowareyou.mp3 file in a loop to the server to mimic mic input. (local)
ffmpeg -stream_loop -1 -re -i /Users/harims/code/archived/vosk_code/voice_processing/input/audiofiles/hellohowareyou.mp3 -ar 8000 -ac 1 -acodec pcm_mulaw -f rtp rtp://<Server IP>:<Port>
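The server side of this step can be sketched as follows: spawn the listening ffmpeg command from above and feed its raw PCM output to a VOSK recognizer. This is a minimal sketch, assuming `pip install vosk` and a downloaded model in a local "model" directory; `extract_text` is a hypothetical helper for pulling the sentence out of VOSK's JSON result.

```python
import json
import subprocess

SAMPLE_RATE = 8000  # must match the incoming stream


def extract_text(result_json):
    """Pull the recognized sentence out of a VOSK result JSON string."""
    return json.loads(result_json).get("text", "")


def main():
    # vosk is imported here so extract_text stays usable without it.
    from vosk import Model, KaldiRecognizer  # assumes `pip install vosk`
    model = Model("model")  # path to a downloaded VOSK model (assumption)
    rec = KaldiRecognizer(model, SAMPLE_RATE)

    # Re-use the listening command from above: decode the RTP stream
    # to 16-bit mono PCM on stdout.
    proc = subprocess.Popen(
        ["ffmpeg", "-loglevel", "quiet",
         "-protocol_whitelist", "file,udp,rtp", "-i", "something.sdp",
         "-ar", str(SAMPLE_RATE), "-ac", "1", "-f", "s16le", "-"],
        stdout=subprocess.PIPE,
    )
    while True:
        chunk = proc.stdout.read(4000)
        if not chunk:
            break
        if rec.AcceptWaveform(chunk):
            sentence = extract_text(rec.Result())
            if sentence:
                print(sentence)  # next step: publish to MQTT

# To start the recognition loop: main()
```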
-
Set up the Mosquitto MQTT broker
sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
sudo apt-get update
sudo apt install mosquitto
-
From the server, publish the recognized sentence to MQTT.
import paho.mqtt.client as mqtt

# The callback for when the client receives a CONNACK response from the broker.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))

client = mqtt.Client()
client.on_connect = on_connect
client.connect("localhost", 1883, 60)
client.loop_start()  # background thread handles network traffic and reconnects

def sendtomqtt(data):
    client.publish("identifiedtext", data)
-
Subscribe to this topic and invoke a Python function to process the message.
import paho.mqtt.client as mqtt

# The callback for when the client receives a CONNACK response from the broker.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))
    # Subscribing in on_connect() means that if we lose the connection and
    # reconnect then subscriptions will be renewed.
    client.subscribe("identifiedtext")

# The callback for when a PUBLISH message is received from the broker.
def on_message(client, userdata, msg):
    print(msg.topic + " " + str(msg.payload))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883, 60)

# Blocking call that processes network traffic, dispatches callbacks and
# handles reconnecting.
# Other loop*() functions are available that give a threaded interface and a
# manual interface.
client.loop_forever()
- Set up NLP-based systems like DeepPavlov.
- Get the output and publish it to MQTT.
- Subscribe to that topic and execute the requested functionality. Publish the result code back to MQTT.
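The "subscribe and execute" step above can be sketched as a small dispatcher. This is a minimal sketch, not the actual implementation: the topic names (`nlpoutput`, `resultcode`), the NLP payload shape, and the handler names are all illustrative assumptions.

```python
import json

# Map intents to handlers; names are illustrative assumptions.
def turn_on_lights():
    return 0  # 0 = success

HANDLERS = {
    "lights_on": turn_on_lights,
}

def execute(intent):
    """Run the handler for an intent; return 1 for an unknown intent."""
    handler = HANDLERS.get(intent)
    return handler() if handler else 1

def on_message(client, userdata, msg):
    # Assumes the NLP output is JSON with an "intent" field.
    intent = json.loads(msg.payload).get("intent", "")
    code = execute(intent)
    client.publish("resultcode", str(code))  # push the result code back to MQTT

def main():
    import paho.mqtt.client as mqtt  # imported here so execute() is testable alone
    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883, 60)
    client.subscribe("nlpoutput")  # topic name is an assumption
    client.loop_forever()

# To start the worker: main()
```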
-
Subscribe to this topic and invoke a Python function to prepare the response and generate a sound file using TTS or vosk-tts.
tts --text "Text for TTS" --model_name "tts_models/en/ek1/tacotron2" --vocoder_name "vocoder_models/universal/libri-tts/wavegrad" --out_path ~/sound.wav
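From the subscriber callback, the `tts` CLI shown above can be invoked with subprocess. A minimal sketch; `build_tts_cmd` and `synthesize` are hypothetical helpers, and the model/vocoder defaults simply mirror the command above.

```python
import subprocess

def build_tts_cmd(text, out_path,
                  model="tts_models/en/ek1/tacotron2",
                  vocoder="vocoder_models/universal/libri-tts/wavegrad"):
    """Assemble the `tts` CLI call from above as an argument list."""
    return ["tts", "--text", text,
            "--model_name", model,
            "--vocoder_name", vocoder,
            "--out_path", out_path]

def synthesize(text, out_path):
    # Runs the CLI; assumes the `tts` package is installed and on PATH.
    subprocess.run(build_tts_cmd(text, out_path), check=True)

# Example: synthesize("I am fine, thank you", "/tmp/sound.wav")
```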
- Using ffmpeg, stream it back to the source.
To stream the mic (avfoundation is macOS-only; on Linux use -f alsa -i default) - ffmpeg -f avfoundation -i ":0" -acodec libmp3lame -ab 32k -ac 1 -f rtp rtp://192.168.1.148:8009