Offline Speech Recognition
Introduction
Most businesses try to reach their customers in a way that fits their culture, because it gets customers to interact more with the business. Speaking to customers in their own language is an excellent way to build that connection, so recognizing a customer's intent in their own language is very important. Here we'll mainly focus on what speech recognition is, the options we have for doing it, and speech recognition in English and Arabic. We'll also provide enough guidance to switch to other languages easily.
Natural Language Processing (NLP) is the core of speech recognition. It’s like teaching a language to a small child: we train a model on how the language is used and what its rules are by providing sample data, the model learns from it, and then we can get predictions from that model. NLP is a trending machine learning field used in many areas to give people a better experience.
Let’s start with the content!
Speech Recognition
- Microsoft Azure Cognitive Services - Speech Recognition
- Google Cloud - Speech-to-text
( Documentation: Azure Speech Recognition, Google Speech-to-Text ) can be considered huge platforms for speech recognition, and they are very accurate in their predictions. However, some organizations are put off by the cost of these services, some don't have cloud subscriptions, and some prefer not to use cloud platforms at all because they don't want to share sensitive business data with third parties. At this point, organizations look for applications that can run in offline mode, connected to their own model. In some cases, of course, it doesn't matter whether the application uses the internet or not.
Offline Speech Recognition
We are going to focus on offline speech recognition systems here. These are the methods we're going to demonstrate:
- Mozilla DeepSpeech
- Arabic Speech Recognition with Klaam
- Vosk-API (built on the Kaldi project)
Let's go through them one by one.
1. Mozilla DeepSpeech
Step 1: Create and activate a virtual environment
python3 -m venv $HOME/tmp/deepspeech-venv
source $HOME/tmp/deepspeech-venv/bin/activate
Step 2: Install Tensorflow
pip install --upgrade pip
pip install tensorflow
If you need further support, you can refer to the TensorFlow official documentation.
Step 3: Install DeepSpeech
pip3 install deepspeech
Step 4: Download English Model Files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
Step 5: Download the Sample Audio file
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
tar xvf audio-0.9.3.tar.gz
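The released DeepSpeech models expect 16 kHz, mono, 16-bit PCM WAV input, and the sample files above are already in that format. Before transcribing your own recordings, a quick sanity check with the standard library can save some confusion (the check_wav_format helper below is my own, not part of DeepSpeech):

```python
import wave

def check_wav_format(path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM --
    the format the released DeepSpeech models expect."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == 16000
                and w.getnchannels() == 1
                and w.getsampwidth() == 2)
```

If the check fails, convert the file first (for example with sox or ffmpeg) before feeding it to DeepSpeech.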
Step 6: Transcribe an audio file
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav

2. Arabic Speech Recognition with Klaam
Klaam is a powerful project that supports Natural Language Understanding in the Arabic language. It provides Speech Recognition, Text-To-Speech, and Speech Classification.
Let’s dive into how to create a klaam project.
Step 2: Clone the klaam project from GitHub (the repository lives under the ARBML organization).
git clone https://github.com/ARBML/klaam.git
Note that if you run into issues when installing packages, make sure your Python version is 3.7 or above.
Some Python environments don't include the IPython library by default. If you hit this issue, run the following command.
pip install IPython
2.1. Speech Recognition
speech_recognition.py
from IPython.display import Audio
from klaam import SpeechRecognition
model = SpeechRecognition()
data = model.transcribe('samples/demo.wav')
print(data)
Run your application: python speech_recognition.py
2.2. Speech Classification
As with Speech Recognition, place your “.wav” file in your project directory and replace the file path below with the path to your audio file.
speech_classification.py
from IPython.display import Audio
from klaam import SpeechClassification

# Classify the recording instead of transcribing it
model = SpeechClassification()
data = model.classify('samples/demo.wav')
print(data)
Run your application: python speech_classification.py
2.3. Text-To-Speech
Here you have to consider the root path: make sure the paths below point to where the “cfgs” directory is located. Replace “arabic_sentence” with your own Arabic sentence.
text_to_speech.py
from klaam import TextToSpeech

prepare_tts_model_path = "cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "data/model_weights/hifigan/generator_universal.pth.tar"

arabic_sentence = "..."  # put your Arabic text here
model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)
model.synthesize(arabic_sentence)
This will create a .wav file called “sample.wav” containing the synthesized speech.
3. VOSK-API
The Vosk API is a powerful tool for offline speech recognition that supports many spoken languages and provides bindings for many programming languages, such as Python, C#, Java, Node.js, and Ruby.
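As a taste of the Python bindings, here is a minimal sketch of transcribing a WAV file with a local Vosk model. The transcribe helper and the "model" directory path are my own assumptions; you need `pip install vosk` and a model unpacked from the Vosk model downloads:

```python
import json
import wave

def transcribe(wav_path, model_dir):
    """Transcribe a 16 kHz mono PCM WAV file with a local Vosk model."""
    from vosk import Model, KaldiRecognizer  # third-party: pip install vosk
    model = Model(model_dir)
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        # Feed the audio to the recognizer in small chunks
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    # FinalResult() returns a JSON string like {"text": "..."}
    return json.loads(rec.FinalResult())["text"]

if __name__ == "__main__":
    print(transcribe("audio/test.wav", "model"))
```

Feeding the recognizer in chunks like this is also what makes Vosk suitable for streaming input from a microphone, not just files.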
Check out my post on Speech Recognition with VOSK-API.