Voice assistant pipeline using VOSK & ffmpeg
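First step - Set up the complete flow.
On the server, use ffmpeg (launched from the Python code) to listen on a port for the incoming stream. Replace <SAMPLE_RATE> with the sample rate the recognizer expects; in the Python code this argument is str(SAMPLE_RATE).
ffmpeg -loglevel quiet -protocol_whitelist file,udp,rtp -i something.sdp -ar <SAMPLE_RATE> -ac 1 -f s16le -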
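The server command above can be launched from Python and its raw PCM output read in fixed-size chunks. This is a minimal sketch, assuming a 16000 Hz sample rate and a 4000-byte chunk size (both illustrative values, not taken from these notes); build_ffmpeg_cmd, iter_pcm_chunks and run are hypothetical helper names. Each chunk would then be handed to the Vosk recognizer.

```python
import subprocess
from typing import BinaryIO, Iterator, List

SAMPLE_RATE = 16000  # assumption: should match the rate the Vosk model expects


def build_ffmpeg_cmd(sdp_path: str, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build the ffmpeg argument list that decodes the RTP stream to raw PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "quiet",
        "-protocol_whitelist", "file,udp,rtp",
        "-i", sdp_path,
        "-ar", str(sample_rate),
        "-ac", "1",
        "-f", "s16le", "-",
    ]


def iter_pcm_chunks(stream: BinaryIO, chunk_size: int = 4000) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data


def run(sdp_path: str) -> None:
    """Start ffmpeg and pass each PCM chunk to a consumer (here: print its length)."""
    with subprocess.Popen(build_ffmpeg_cmd(sdp_path), stdout=subprocess.PIPE) as proc:
        for chunk in iter_pcm_chunks(proc.stdout):
            print(len(chunk))  # placeholder: feed the chunk to the recognizer here
```

Calling run("something.sdp") blocks until ffmpeg exits, so in practice it would run in its own process or thread.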
On the local machine, use ffmpeg to stream the hellohowareyou.mp3 file in a loop to the server, mimicking microphone input.
ffmpeg -stream_loop -1 -re -i /Users/harims/code/archived/vosk_code/voice_processing/input/audiofiles/hellohowareyou.mp3 -ar 8000 -acodec pcm_mulaw -f rtp rtp://<Server IP>:<Port>
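The server reads its input from an SDP file (something.sdp in the server command), and that file has to describe the stream sent here. ffmpeg prints a suitable SDP when it starts an RTP output; the sketch below writes a hand-made equivalent for 8 kHz mu-law audio (RTP static payload type 0, PCMU). The helper names make_sdp and write_sdp are hypothetical, and the address on the c= line is assumed to be the server's own address.

```python
def make_sdp(listen_ip: str, port: int) -> str:
    """Return a minimal SDP description for one PCMU (mu-law, 8 kHz) audio stream."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {listen_ip}",
        "s=voice-assistant-input",
        f"c=IN IP4 {listen_ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0",  # payload type 0 = PCMU
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\n".join(lines) + "\n"


def write_sdp(path: str, listen_ip: str, port: int) -> None:
    """Write the SDP description to a file that ffmpeg can open with -i."""
    with open(path, "w", encoding="ascii") as f:
        f.write(make_sdp(listen_ip, port))
```

For example, write_sdp("something.sdp", "127.0.0.1", 5004) would describe a stream arriving on port 5004; both values are placeholders and must match the <Server IP> and <Port> used by the sender.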
Set up the Mosquitto MQTT broker.
sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
sudo apt-get update
sudo apt install mosquitto
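From the server, publish the recognized sentence to MQTT.
import paho.mqtt.client as mqtt

# Uses the paho-mqtt 1.x API; paho-mqtt 2.x additionally requires a
# callback_api_version argument to mqtt.Client().

def on_pre_connect(client, data):
    return

# The callback for when the client receives a CONNACK response from the server.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))

client = mqtt.Client()
client.on_pre_connect = on_pre_connect
client.on_connect = on_connect  # register the callback so the CONNACK is reported
client.connect("localhost", 1883, 60)
client.loop_start()  # background network loop: handles keepalives and outgoing messages

def sendtomqqt(data):
    # Publish the recognized text on the "identifiedtext" topic.
    client.publish("identifiedtext", data)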
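Vosk returns each final recognition result as a JSON string with a "text" field, so the sentence has to be extracted before it is published. A minimal sketch follows; extract_text and publish_if_recognized are hypothetical helpers, and publish stands for any callable that takes the text, such as sendtomqqt.

```python
import json
from typing import Callable, Optional


def extract_text(result_json: str) -> Optional[str]:
    """Return the recognized sentence from a Vosk result string, or None if there is none."""
    try:
        parsed = json.loads(result_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    text = parsed.get("text", "")
    if not isinstance(text, str):
        return None
    text = text.strip()
    return text or None


def publish_if_recognized(result_json: str, publish: Callable[[str], object]) -> bool:
    """Publish the sentence if one was recognized; return True when something was sent."""
    text = extract_text(result_json)
    if text is None:
        return False
    publish(text)
    return True
```

Skipping empty results keeps silence in the audio stream from producing a flood of empty MQTT messages.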
Subscribe to this topic and invoke a Python function to process each message.
import paho.mqtt.client as mqtt

def on_pre_connect(client, data):
    return

# The callback for when the client receives a CONNACK response from the server.
def on_connect(client, userdata, flags, rc):
    print("Connected with result code " + str(rc))
    # Subscribing in on_connect() means that if we lose the connection and
    # reconnect then subscriptions will be renewed.
    client.subscribe("identifiedtext")

# The callback for when a PUBLISH message is received from the server.
def on_message(client, userdata, msg):
    print(msg.topic + " " + str(msg.payload))

client = mqtt.Client()
client.on_pre_connect = on_pre_connect
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883, 60)

# Blocking call that processes network traffic, dispatches callbacks and
# handles reconnecting.
# Other loop*() functions are available that give a threaded interface and a
# manual interface.
client.loop_forever()
- Set up an NLP system such as DeepPavlov.
- Get its output and publish it to MQTT.
- Subscribe to that message and execute the requested functionality, then publish the result code back to MQTT.
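The three steps above amount to: turn the recognized text into an intent, run the matching action, and produce a result code to publish. The sketch below uses a plain keyword lookup as a stand-in for the NLP system (it is not DeepPavlov), and the intent names, handlers and result codes are all invented for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword -> intent table; a real NLP model would replace this lookup.
KEYWORDS: Dict[str, str] = {
    "hello": "greet",
    "time": "tell_time",
}


def detect_intent(text: str) -> str:
    """Return the first intent whose keyword occurs as a word in the text, else 'unknown'."""
    words = text.lower().split()
    for keyword, intent in KEYWORDS.items():
        if keyword in words:
            return intent
    return "unknown"


def greet() -> int:
    return 0  # 0 = success (illustrative result code)


def tell_time() -> int:
    return 0  # 0 = success (illustrative result code)


HANDLERS: Dict[str, Callable[[], int]] = {"greet": greet, "tell_time": tell_time}


def process(payload: bytes) -> Tuple[str, int]:
    """Decode an MQTT payload, run the matching handler, return (intent, result code)."""
    intent = detect_intent(payload.decode("utf-8", errors="replace"))
    handler = HANDLERS.get(intent)
    if handler is None:
        return intent, 1  # 1 = no handler for this intent (illustrative)
    return intent, handler()
```

Inside an on_message callback, process(msg.payload) would be called and the returned pair published on a separate result topic of your choosing.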
Subscribe to the result message and invoke a Python function that prepares the response and generates a sound file using TTS or vosk-tts.
tts --text "Text for TTS" --model_name "tts_models/en/ek1/tacotron2" --vocoder_name "vocoder_models/universal/libri-tts/wavegrad" --out_path ~/sound.wav
- Use ffmpeg to stream it back to the source.
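Generating the reply and streaming it back can be driven from Python with subprocess. This is a minimal sketch, assuming the reply is sent as 8 kHz mu-law over RTP like the input stream (an assumption, the notes do not specify the return format); the helper names, the destination host and the port are hypothetical, and the model names are the ones from the tts command above.

```python
import subprocess
from typing import List


def build_tts_cmd(text: str, out_path: str) -> List[str]:
    """Build the tts command line that synthesizes text into a WAV file."""
    return [
        "tts", "--text", text,
        "--model_name", "tts_models/en/ek1/tacotron2",
        "--vocoder_name", "vocoder_models/universal/libri-tts/wavegrad",
        "--out_path", out_path,
    ]


def build_stream_cmd(wav_path: str, host: str, port: int) -> List[str]:
    """Build the ffmpeg command line that streams a WAV file to host:port over RTP."""
    return [
        "ffmpeg", "-re", "-i", wav_path,
        "-ar", "8000", "-ac", "1", "-acodec", "pcm_mulaw",
        "-f", "rtp", f"rtp://{host}:{port}",
    ]


def speak(text: str, host: str, port: int, wav_path: str = "sound.wav") -> None:
    """Synthesize the reply, then stream it; raises CalledProcessError on failure."""
    subprocess.run(build_tts_cmd(text, wav_path), check=True)
    subprocess.run(build_stream_cmd(wav_path, host, port), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the reply text contains spaces or quotes.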
To stream from the microphone instead (macOS, avfoundation input):
ffmpeg -f avfoundation -i ":0" -acodec libmp3lame -ab 32k -ac 1 -f rtp rtp://