Build a speech recognizer with Python
--
We need a database of speech files to build our speech recognizer. We will use the database
available at https://code.google.com/archive/p/hmm-speech-recognition/downloads.
It contains seven different words, with 15 audio files for each word. The dataset is small,
but it is sufficient to understand how to build a speech recognizer that can distinguish
between seven words. We need to build a separate HMM model for each class (word). To identify
the word in a new input file, we run all the models on that file and pick the one with the
best score. Each model is wrapped in an HMM class, which appears as HMMTrainer in the code.
How to do it…
1. Create a new Python file, and import the following packages:
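import os
import argparse

import numpy as np
from scipy.io import wavfile
from hmmlearn import hmm
# The MFCC package was originally importable as "features";
# current releases are importable as python_speech_features
from python_speech_features import mfcc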
2. Define a function to parse the input arguments in the command line:
# Function to parse input arguments
def build_arg_parser():
    parser = argparse.ArgumentParser(
        description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder",
                        required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser
3. Define the main function, and parse the input arguments:
if __name__ == '__main__':
    args = build_arg_parser().parse_args()
    input_folder = args.input_folder
4. Initialize the list that will hold all the HMM models:
    hmm_models = []
5. Parse the input directory that contains all the database’s audio files:
    # Parse the input directory
    for dirname in os.listdir(input_folder):
6. Build the path to each subfolder, skipping entries that are not directories:
        # Get the path of the subfolder
        subfolder = os.path.join(input_folder, dirname)
        if not os.path.isdir(subfolder):
            continue
7. The name of the subfolder is the label of this class. Extract it using the following:
        # Extract the label (the last path component, independent
        # of the platform's path separator)
        label = os.path.basename(subfolder)
8. Initialize the variables for training:
        # Initialize variables
        X = np.array([])
        y_words = []
9. Iterate through the list of audio files in each subfolder:
        # Iterate through the audio files (leaving 1 file for
        # testing in each class); sorted() makes the held-out
        # file deterministic, since os.listdir() order is arbitrary
        for filename in sorted(x for x in os.listdir(subfolder)
                               if x.endswith('.wav'))[:-1]:
10. Read each audio file, as follows:
            # Read the input file
            filepath = os.path.join(subfolder, filename)
            sampling_freq, audio = wavfile.read(filepath)
11. Extract the MFCC features:
            # Extract MFCC features
            mfcc_features = mfcc(audio, sampling_freq)
12. Keep appending this to the X variable:
            # Append to the variable X
            if len(X) == 0:
                X = mfcc_features
            else:
                X = np.append(X, mfcc_features, axis=0)
13. Append the corresponding label too:
            # Append the label
            y_words.append(label)
14. Once you have extracted features from all the files in the current class, train and save
the HMM model. Because an HMM is a generative model that is trained without supervision,
we don’t need labels to fit it. Instead, we train a separate HMM on the data of each class
and store it together with that class's label:
        # Train and save HMM model
        hmm_trainer = HMMTrainer()
        hmm_trainer.train(X)
        hmm_models.append((hmm_trainer, label))
        hmm_trainer = None
15. Get a list of test files that were not used for training:
    # Test files
    input_files = [
        'data/pineapple/pineapple15.wav',
        'data/orange/orange15.wav',
        'data/apple/apple15.wav',
        'data/kiwi/kiwi15.wav'
    ]
16. Parse the input files, as follows:
    # Classify input data
    for input_file in input_files:
17. Read in each audio file:
        # Read input file
        sampling_freq, audio = wavfile.read(input_file)
18. Extract the MFCC features:
        # Extract MFCC features
        mfcc_features = mfcc(audio, sampling_freq)
19. Define variables to store the maximum score and the output label:
        # Define variables
        max_score = None
        output_label = None
20. Iterate through all the models and run the input file through each of them:
        # Iterate through all HMM models and pick
        # the one with the highest score
        for item in hmm_models:
            hmm_model, label = item
21. Extract the score and store the maximum score:
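            score = hmm_model.get_score(mfcc_features)
            # max_score starts as None, which Python 3 cannot compare
            # with a number, so handle the first model explicitly
            if max_score is None or score > max_score:
                max_score = score
                output_label = label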
22. Print the true and predicted labels:
        # Print the output
        print("\nTrue:", input_file[input_file.find('/') + 1:input_file.rfind('/')])
        print("Predicted:", output_label)
23. The full code is in the speech_recognizer.py file. If you run this code, passing the
dataset directory with --input-folder, you will see the following on your terminal: