Build a speech recognizer with Python

Itexamtools
3 min read · Jan 16

We need a database of speech files to build our speech recognizer. We will use the database available at https://code.google.com/archive/p/hmm-speech-recognition/downloads. It contains seven different words, with 15 audio files for each word. This is a small dataset, but it is sufficient to understand how to build a speech recognizer that can distinguish seven words. We need to build one HMM model for each class (word). To identify the word in a new input file, we run every model on the file and pick the one with the best score. Each model is wrapped in an HMM class, which appears as HMMTrainer in the code below.

How to do it…

1. Create a new Python file, and import the following packages:

import os
import argparse

import numpy as np
from scipy.io import wavfile
from hmmlearn import hmm
# Older releases of this package were imported as `from features import mfcc`
from python_speech_features import mfcc

2. Define a function to parse the input arguments in the command line:

# Function to parse input arguments
def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser
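To check the parser without going through the shell, `parse_args` also accepts an explicit list of strings. The sketch below repeats the function from this step so that it runs on its own; the folder name `data` is only an illustration.

```python
import argparse


def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser


# Equivalent to running: python speech_recognizer.py --input-folder data
args = build_arg_parser().parse_args(["--input-folder", "data"])
print(args.input_folder)
```

Because the option is marked `required=True`, calling the script without `--input-folder` makes argparse print a usage message and exit.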

3. Define the main function, and parse the input arguments:

if __name__ == '__main__':
    args = build_arg_parser().parse_args()
    input_folder = args.input_folder

4. Initialize the variable that will hold all the HMM models:

    hmm_models = []

5. Parse the input directory that contains all the database’s audio files:

    # Parse the input directory
    for dirname in os.listdir(input_folder):

6. Extract the name of the subfolder:

        # Get the name of the subfolder
        subfolder = os.path.join(input_folder, dirname)
        if not os.path.isdir(subfolder):
            continue

7. The name of the subfolder is the label of this class. Extract it using the following:

        # Extract the label (the subfolder's own name, on any platform)
        label = os.path.basename(subfolder)
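The folder walk in steps 5 to 7 can be tried on a throwaway directory tree before pointing it at real data. This is a minimal sketch; the word folders and the stray `README.txt` are made up for illustration, and `sorted` is added only to make the printed order predictable.

```python
import os
import tempfile

# Build a temporary tree that mimics the expected layout:
# <input_folder>/<word>/<recordings>.wav  (names here are illustrative)
input_folder = tempfile.mkdtemp()
for word in ("apple", "kiwi"):
    os.makedirs(os.path.join(input_folder, word))
# A stray file at the top level should be skipped, as in step 6
open(os.path.join(input_folder, "README.txt"), "w").close()

labels = []
for dirname in sorted(os.listdir(input_folder)):
    subfolder = os.path.join(input_folder, dirname)
    if not os.path.isdir(subfolder):
        continue
    # basename() follows the platform's separator rules, so it also works
    # where os.path.join inserts a backslash instead of '/'
    labels.append(os.path.basename(subfolder))

print(labels)  # ['apple', 'kiwi']
```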

8. Initialize the variables for training:

        # Initialize variables
        X = np.array([])
        y_words = []

9. Iterate through the list of audio files in each subfolder:

        # Iterate through the audio files (leaving 1 file for testing in each class)
        for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')][:-1]:
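One caveat about the hold-out: `os.listdir` returns names in arbitrary order, so which file the `[:-1]` slice leaves out can vary between machines. A small sketch of a deterministic split follows; `split_train_test` is a hypothetical helper, the listing is invented, and the helper assumes at least one `.wav` file is present.

```python
def split_train_test(filenames):
    """Return (training_files, held_out_file) from one class folder's listing."""
    wav_files = sorted(f for f in filenames if f.endswith('.wav'))
    return wav_files[:-1], wav_files[-1]


# Hypothetical listing, deliberately out of order as os.listdir may return it
listing = ["apple03.wav", "notes.txt", "apple01.wav", "apple02.wav"]
train_files, test_file = split_train_test(listing)
print(train_files, test_file)  # ['apple01.wav', 'apple02.wav'] apple03.wav
```

Sorting first means the held-out file is always the one that sorts last, which makes it easier to keep the training loop and a hard-coded list of test files in agreement.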

10. Read each audio file, as follows:

            # Read the input file
            filepath = os.path.join(subfolder, filename)
            sampling_freq, audio = wavfile.read(filepath)
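To see what `wavfile.read` hands back without downloading the dataset, we can write a synthetic tone to a temporary file and read it again. The tone frequency, sample rate, and file name below are arbitrary choices for this sketch.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Synthesize one second of a 440 Hz tone as 16-bit PCM (illustrative values)
sampling_freq = 16000
t = np.arange(sampling_freq) / sampling_freq
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

filepath = os.path.join(tempfile.mkdtemp(), "tone.wav")
wavfile.write(filepath, sampling_freq, tone)

# wavfile.read returns the sample rate and a NumPy array of samples
rate, audio = wavfile.read(filepath)
print(rate, audio.dtype, audio.shape)  # 16000 int16 (16000,)
```

A mono file comes back as a one-dimensional array; a stereo file would have one column per channel.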

11. Extract the MFCC features:

            # Extract MFCC features
            mfcc_features = mfcc(audio, sampling_freq)

12. Keep appending this to the X variable:

            # Append to the variable X
            if len(X) == 0:
                X = mfcc_features
            else:
                X = np.append(X, mfcc_features, axis=0)
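The accumulation pattern is easier to follow on stand-in data. In this sketch, random matrices play the role of per-file MFCC output, assuming 13 coefficients per frame and made-up frame counts. It also shows an alternative that collects the matrices first and concatenates once, while keeping the per-file lengths around.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-file MFCC matrices: (num_frames, 13), frame counts differ
per_file_features = [rng.normal(size=(n, 13)) for n in (40, 55, 32)]

# Pattern from the step above: grow X one file at a time
X = np.array([])
for mfcc_features in per_file_features:
    if len(X) == 0:
        X = mfcc_features
    else:
        X = np.append(X, mfcc_features, axis=0)

# Alternative: collect first, concatenate once
X_once = np.concatenate(per_file_features, axis=0)
lengths = [len(f) for f in per_file_features]

print(X.shape, np.array_equal(X, X_once), lengths)  # (127, 13) True [40, 55, 32]
```

Both routes give the same matrix. The `lengths` list is optional here, but hmmlearn's `fit` accepts it as a second argument to mark where one recording ends and the next begins, instead of treating all frames as one long sequence.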

13. Append the corresponding label too:

            # Append the label
            y_words.append(label)

14. Once you have extracted features from all the files in the current class, train and save the HMM model. Because an HMM is a generative model that is trained without supervision, we don’t need labels to fit it. Instead, we build a separate HMM model for each class, so the label is implied by which model the data was used to train:

        # Train and save HMM model
        hmm_trainer = HMMTrainer()
        hmm_trainer.train(X)
        hmm_models.append((hmm_trainer, label))
        hmm_trainer = None

15. Get a list of test files that were not used for training:

    # Test files
    input_files = [
        'data/pineapple/pineapple15.wav',
        'data/orange/orange15.wav',
        'data/apple/apple15.wav',
        'data/kiwi/kiwi15.wav'
    ]

16. Parse the input files, as follows:

    # Classify input data
    for input_file in input_files:

17. Read in each audio file:

        # Read input file
        sampling_freq, audio = wavfile.read(input_file)

18. Extract the MFCC features:

        # Extract MFCC features
        mfcc_features = mfcc(audio, sampling_freq)

19. Define variables to store the maximum score and the output label:

        # Define variables
        max_score = None
        output_label = None

20. Iterate through all the models and run the input file through each of them:

        # Iterate through all HMM models and pick
        # the one with the highest score
        for item in hmm_models:
            hmm_model, label = item

21. Extract the score and store the maximum score:

            score = hmm_model.get_score(mfcc_features)
            # The first score always wins; comparing a number with None fails in Python 3
            if max_score is None or score > max_score:
                max_score = score
                output_label = label
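Comparing a number with `None` raises a `TypeError` in Python 3, so the loop needs either a `None` check or a different starting value. Another option is to let the built-in `max` do the bookkeeping. In the sketch below, `ConstantScorer` is a hypothetical stand-in for a trained model and the scores are invented.

```python
class ConstantScorer:
    """Hypothetical stand-in for a trained model; returns a fixed log-likelihood."""

    def __init__(self, score):
        self._score = score

    def get_score(self, features):
        return self._score


def classify(hmm_models, features):
    """Return the label whose model gives the highest score."""
    best_model, best_label = max(hmm_models,
                                 key=lambda item: item[0].get_score(features))
    return best_label


# Log-likelihoods are usually negative, so the "largest" is the least negative
hmm_models = [(ConstantScorer(-950.2), 'apple'),
              (ConstantScorer(-310.7), 'kiwi'),
              (ConstantScorer(-1204.9), 'orange')]
print(classify(hmm_models, features=None))  # kiwi
```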

22. Print the true and predicted labels:

        # Print the output
        print("\nTrue:", input_file[input_file.find('/') + 1:input_file.rfind('/')])
        print("Predicted:", output_label)
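The slice that recovers the true label takes the text between the first and the last `/` in the path. A quick check on one of the test paths, next to an alternative that reads the parent folder name instead (the deeper `corpus/...` path is made up to show where the two differ):

```python
import os

input_file = 'data/apple/apple15.wav'

# Slice used above: text between the first and the last '/'
label_by_slice = input_file[input_file.find('/') + 1:input_file.rfind('/')]

# Alternative: name of the file's parent folder, independent of path depth
label_by_dirname = os.path.basename(os.path.dirname(input_file))

print(label_by_slice, label_by_dirname)  # apple apple

# With one more folder level, only the parent-folder version still gives the word
deeper = 'corpus/data/apple/apple15.wav'
print(deeper[deeper.find('/') + 1:deeper.rfind('/')])  # data/apple
print(os.path.basename(os.path.dirname(deeper)))       # apple
```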

23. The full code is in the speech_recognizer.py file. If you run this code, the true and predicted labels for each test file are printed on your Terminal.

