The Listener service acts as the ears. This daemon captures audio from a microphone and detects speech using a combination of Voice Activity Detection (VAD) and DeepSpeech models, which convert the raw audio into text. The result is then sent to the Brain for further processing.
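
The capture, detect, convert, and send flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Listener's actual code: a simple energy threshold stands in for the real VAD, the `transcribe` callback stands in for the DeepSpeech model, and every name here (`frame_rms`, `segment_speech`, `listen`, `send_to_brain`, `threshold`, `max_silence`) is hypothetical.

```python
import array
import math


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit PCM samples (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segment_speech(frames, threshold=500.0, max_silence=3):
    """Group loud frames into utterances.

    Assumption: an energy threshold stands in for the real VAD model.
    An utterance ends after `max_silence` consecutive quiet frames.
    Quiet frames are dropped rather than kept in the utterance.
    """
    buffer = []
    silence = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            buffer.append(frame)
            silence = 0
        elif buffer:
            silence += 1
            if silence >= max_silence:
                yield b"".join(buffer)
                buffer, silence = [], 0
    if buffer:
        # Flush whatever speech is left when the frame source ends.
        yield b"".join(buffer)


def listen(frames, transcribe, send_to_brain):
    """Convert each detected utterance to text and forward only the text.

    `transcribe` and `send_to_brain` are hypothetical callbacks standing in
    for the speech model and the connection to the Brain.
    """
    for utterance in segment_speech(frames):
        text = transcribe(utterance)
        if text:
            send_to_brain(text)
```

Only the text leaves `listen`; the audio itself stays inside the Listener.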

The Listener also receives commands from the Brain. The main one is a notification that the Brain is about to respond audibly to an input. The Listener uses this notification to temporarily stop taking in new audio samples, so that it does not pick up the speaker output and create a feedback loop. When the speech is complete, the Brain sends a follow-up command telling the Listener to resume normal operation.
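
One way to picture this suspend and resume behaviour is a small gate that drops audio frames while the Brain is speaking. The sketch below is an assumption-laden illustration, not the service's real implementation: the `ListenerGate` class and the command names `AUDIO_OUT_START` and `AUDIO_OUT_END` are hypothetical.

```python
import threading


class ListenerGate:
    """Hypothetical helper that tracks whether the Listener should accept audio."""

    def __init__(self):
        self._accepting = threading.Event()
        self._accepting.set()  # accept audio by default

    def handle_command(self, command):
        # The command names are assumed for illustration only.
        if command == "AUDIO_OUT_START":
            self._accepting.clear()  # Brain is about to speak: suspend intake
        elif command == "AUDIO_OUT_END":
            self._accepting.set()  # Brain has finished speaking: resume intake

    def accept(self, frame):
        """Return the frame while listening, or None while suspended."""
        return frame if self._accepting.is_set() else None
```

A `threading.Event` is used here so that the thread handling commands from the Brain and the thread capturing audio can share the flag safely.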

It’s important to note that all of the “recognition” of speech in the audio stream is done inside the Listener service. The Brain receives only the converted text results.

The minimum start options for the Listener are:

# Start the Listener service with standard HTTP
python3 --listener-start --use-http \
  --listener-model /path/to/speech/model.pbmm

# Start the Listener service with HTTPS (SSL)
python3 --listener-start \
  --ssl-keypath /path/to/key.pem \
  --ssl-certfile /path/to/cert.pem \
  --listener-model /path/to/speech/model.pbmm