
First step - Set up the complete flow.

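  • From the Python code, use ffmpeg to listen on a port for the incoming stream and write the decoded audio to stdout (server). Here str(SAMPLE_RATE) is a Python expression that is filled in when the command is built:

      ffmpeg -loglevel quiet -protocol_whitelist file,udp,rtp -i something.sdp -ar str(SAMPLE_RATE) -ac 1 -f s16le -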

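  • Using ffmpeg, stream the hellohowareyou.mp3 file in a loop to the server to mimic mic input (local). The mu-law codec is selected with -acodec pcm_mulaw, because when -f is given twice only the last one (rtp) takes effect:

      ffmpeg -stream_loop -1 -re -i /Users/harims/code/archived/vosk_code/voice_processing/input/audiofiles/hellohowareyou.mp3 -ar 8000 -acodec pcm_mulaw -f rtp rtp://<Server IP>:<Port>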

  • Set up the Mosquitto MQTT broker:

      sudo apt-add-repository ppa:mosquitto-dev/mosquitto-ppa
      sudo apt-get update
      sudo apt install mosquitto

  • From the server, publish the identified sentence to MQTT.

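      import paho.mqtt.client as mqtt

      # Placeholder; not registered on the client.
      def on_pre_connect(client, data):
          pass

      # The callback for when the client receives a CONNACK response from the server.
      def on_connect(client, userdata, flags, rc):
          print("Connected with result code "+str(rc))

      # Written against the paho-mqtt 1.x API; with paho-mqtt 2.x use
      # mqtt.Client(mqtt.CallbackAPIVersion.VERSION1) for these callback signatures.
      client = mqtt.Client()
      client.on_connect = on_connect
      client.connect("localhost", 1883, 60)
      client.loop_start()  # run the network loop in a background thread

      def sendtomqqt(data):
          client.publish("identifiedtext", data)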
  • Subscribe to this topic and invoke a Python function to process each message.

      import paho.mqtt.client as mqtt

      # Placeholder; not registered on the client.
      def on_pre_connect(client, data):
          pass

      # The callback for when the client receives a CONNACK response from the server.
      def on_connect(client, userdata, flags, rc):
          print("Connected with result code "+str(rc))
          # Subscribing in on_connect() means that if we lose the connection and
          # reconnect then subscriptions will be renewed.
          client.subscribe("identifiedtext")

      # The callback for when a PUBLISH message is received from the server.
      def on_message(client, userdata, msg):
          print(msg.topic+" "+str(msg.payload))

      # Written against the paho-mqtt 1.x API; with paho-mqtt 2.x use
      # mqtt.Client(mqtt.CallbackAPIVersion.VERSION1) for these callback signatures.
      client = mqtt.Client()
      client.on_connect = on_connect
      client.on_message = on_message
      client.connect("localhost", 1883, 60)
      # Blocking call that processes network traffic, dispatches callbacks and
      # handles reconnecting.
      # Other loop*() functions are available that give a threaded interface and a
      # manual interface.
      client.loop_forever()
  • Set up an NLP system such as DeepPavlov.
  • Take its output and publish it to MQTT.
  • Subscribe to this topic and execute the requested functionality. Publish the result code back to MQTT.
  • Subscribe to this topic and invoke a Python function to prepare the response and generate a sound file using TTS or Vosk-TTS:

      tts --text "Text for TTS" --model_name "tts_models/en/ek1/tacotron2" --vocoder_name "vocoder_models/universal/libri-tts/wavegrad" --out_path ~/sound.wav
  • Using ffmpeg, stream the generated sound file back to the source.

To stream the mic instead of the file (macOS, via avfoundation):

      ffmpeg -f avfoundation -i ":0" -acodec libmp3lame -ab 32k -ac 1 -f rtp rtp://
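
The receiving ffmpeg reads something.sdp, but these notes do not show what that file contains. Below is a sketch of how a minimal one could be generated, assuming the sender is the 8 kHz mono mu-law file stream from the list above (static RTP payload type 0, PCMU). The helper name make_sdp and the IP address and port in the example are hypothetical. The mic command uses MP3 and would need a different description. The sending ffmpeg also prints an SDP description when it starts, which can be saved to the file instead.

```python
def make_sdp(listen_ip, port):
    """Return a minimal SDP description for one mono PCMU (mu-law) RTP stream."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {listen_ip}",
        "s=audio stream",
        f"c=IN IP4 {listen_ip}",
        "t=0 0",
        # Payload type 0 is the static RTP/AVP type for PCMU at 8000 Hz.
        f"m=audio {port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; substitute the real server address and port.
    print(make_sdp("127.0.0.1", 5004), end="")
```

The printed text can be saved as something.sdp on the server; the port must match the one in the sender's rtp:// URL.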