Python speech recognition when you are offline
August 10, 2022
In the first article, we talked about building a speech recognition system, but it used the internet to connect to Google and rely on its speech recognition algorithm. Today, in this article, we are going to build a speech recognition system that works when you are offline.
Let's get started.
First of all, there is a Python library called Vosk. To install it on your computer, type this command:
pip3 install vosk
For more details, please visit:
https://alphacephei.com/vosk/install
Now we need to install PyAudio:
pip install pyaudio
Now our prerequisites are done, and we can build the offline speech recognition program. First, we need to import everything necessary:
from vosk import Model, KaldiRecognizer
import pyaudio
Here we import Model and KaldiRecognizer from the vosk library. Along with that, we import pyaudio.
Now we have to download a model. Go to this website, choose your preferred model, and download it:
https://alphacephei.com/vosk/models
Here I use "vosk-model-small-en-us-0.15" as my model.
After downloading, you will see that it is a compressed file. Unzip it in your project's root folder, like this:
speech-recognition/
├─ vosk-model-small-en-us-0.15 ( unzipped folder )
└─ offline-speech-recognition.py ( Python file )
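If you prefer to do the unzip step from Python, here is a minimal sketch using only the standard library. The helper name unzip_model is made up for this example and is not part of Vosk, and it assumes the download is an ordinary .zip archive.

```python
import zipfile
from pathlib import Path


def unzip_model(zip_path, dest_dir):
    """Extract a downloaded model archive into dest_dir.

    Hypothetical helper for illustration; not part of Vosk.
    Returns the top-level paths that the archive created.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Collect the top-level names, e.g. the model folder itself.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return [dest / name for name in top_level]
```

Calling unzip_model("vosk-model-small-en-us-0.15.zip", ".") from the project folder should leave the model folder next to your Python file, assuming the archive was saved under that name.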
Now create a variable called "model" and type this:
model = Model(r"C:\Users\User\Desktop\python practice\ai\vosk-model-small-en-us-0.15")
What we do here is create a variable called "model" and pass the location of our unzipped folder to Model. Replace the path with the location of the folder on your own computer.
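A hard-coded absolute path only works on one machine. Since the model folder sits next to the Python file, one option is to build the path relative to the script. This is just a sketch: the helper name model_path_beside is invented for the example, and it assumes the folder layout shown earlier.

```python
from pathlib import Path

MODEL_DIR_NAME = "vosk-model-small-en-us-0.15"


def model_path_beside(script_file, model_dir_name=MODEL_DIR_NAME):
    """Return the path of the model folder located next to script_file.

    Hypothetical helper for illustration; not part of Vosk.
    """
    return Path(script_file).resolve().parent / model_dir_name


# In the real script you would pass __file__ instead of this sample path:
#     model = Model(str(model_path_beside(__file__)))
example = model_path_beside("speech-recognition/offline-speech-recognition.py")
print(example.name)         # vosk-model-small-en-us-0.15
print(example.parent.name)  # speech-recognition
```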
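Now we are going to create a variable called "recognizer" and assign a KaldiRecognizer to it. The second argument, 16000, is the audio sample rate in Hz, and it must match the rate we use for the microphone: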
recognizer = KaldiRecognizer(model, 16000)
As usual, we need access to the microphone in our code, which is why we imported pyaudio. Now we are going to use PyAudio to open the microphone and capture the voice:
mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()
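To make the numbers in these settings more concrete, here is a small arithmetic sketch. It shows how much audio a read of 4096 frames (the chunk size used in the loop of this article) represents at 16000 Hz, and how many bytes it takes up. The function names are made up for this example.

```python
SAMPLE_RATE = 16000    # frames per second, as in rate=16000
CHANNELS = 1           # mono, as in channels=1
BYTES_PER_SAMPLE = 2   # paInt16 stores each sample in 16 bits


def chunk_seconds(frames, rate=SAMPLE_RATE):
    """Duration of a chunk of audio, in seconds."""
    return frames / rate


def chunk_bytes(frames, channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Size of a chunk of audio, in bytes."""
    return frames * channels * bytes_per_sample


print(chunk_seconds(4096))  # 0.256
print(chunk_bytes(4096))    # 8192
```

So each read hands the recognizer roughly a quarter of a second of sound.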
Now we enter the fun part:
while True:
    data = stream.read(4096)
    if recognizer.AcceptWaveform(data):
        text = recognizer.Result()
        print(f"' {text[14:-3]} '")
What we do here is write a while loop. Inside it, we create a variable called "data" and assign it the result of "stream.read(4096)", which reads the next 4096 frames of audio from the microphone:
data = stream.read(4096)
Next comes an "if" check. AcceptWaveform(data) returns True once the recognizer has detected the end of an utterance in the audio it has received so far. When that happens, we assign the result to the variable called "text" and print it out:
if recognizer.AcceptWaveform(data):
    text = recognizer.Result()
    print(text)
This is the output we got:
{
  "text" : "hello"
}
{
  "text" : "hello"
}
{
  "text" : "hi"
}
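To remove the { and "text" parts, change the print command to this. The slice [14:-3] cuts off the first 14 characters and the last 3 characters of the string: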
print(f"' {text[14:-3]} '")
Now the output will look like this, clean and more readable:
' '
' hello '
' hi '
' '
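The slice only works as long as the result string has exactly this layout. Since the result is JSON, a sturdier option is to parse it with Python's json module and read the "text" field. This is a small sketch; the helper name extract_text is made up for this example, and the sample strings are shaped like the output shown above.

```python
import json


def extract_text(result_json):
    """Return the "text" field of a recognizer result, or "" if it is absent."""
    return json.loads(result_json).get("text", "")


# Sample strings shaped like the recognizer output shown above.
print(repr(extract_text('{\n  "text" : "hello"\n}')))  # 'hello'
print(repr(extract_text('{\n  "text" : ""\n}')))       # ''
```

In the loop, this would mean printing extract_text(text) instead of the sliced string.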
The complete code:
from vosk import Model, KaldiRecognizer
import pyaudio

# Load the unzipped model folder and create a recognizer for 16 kHz audio
model = Model(r"C:\Users\User\Desktop\python practice\ai\vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)

# Open the microphone as a 16-bit mono stream at the same 16 kHz rate
mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()

while True:
    data = stream.read(4096)
    # AcceptWaveform returns True once the end of an utterance is detected
    if recognizer.AcceptWaveform(data):
        text = recognizer.Result()
        print(f"' {text[14:-3]} '")
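The loop above runs until you kill the program, and the microphone stream is never closed. Here is one possible sketch of a loop that stops cleanly on Ctrl+C. The function name listen, the on_text callback, and the max_chunks limit are all made up for this example. It only assumes that the stream has read, stop_stream, and close, and that the recognizer has AcceptWaveform and Result, as the objects in this article do.

```python
def listen(stream, recognizer, on_text, chunk_frames=4096, max_chunks=None):
    """Feed audio chunks to the recognizer until interrupted, then clean up.

    Hypothetical helper for illustration. on_text is called with each raw
    result string. max_chunks is an optional limit, handy for testing.
    """
    chunks = 0
    try:
        while max_chunks is None or chunks < max_chunks:
            data = stream.read(chunk_frames)
            chunks += 1
            if recognizer.AcceptWaveform(data):
                on_text(recognizer.Result())
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the loop instead of printing a traceback
    finally:
        # Always release the microphone stream, even after an error.
        stream.stop_stream()
        stream.close()
```

With the objects from the complete code, you could call listen(stream, recognizer, print) and then mic.terminate() to release PyAudio once the loop ends.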
Visit https://alphacephei.com/vosk/models for more languages and models.