Authors Name/s per 1st Affiliation (Author)
line 1 (of Affiliation): dept. name of organization
line 2-name of organization, acronyms acceptable
line 3-City, Country
line 4-e-mail address if desired

Authors Name/s per 2nd Affiliation (Author)
line 1 (of Affiliation): dept. name of organization
line 2-name of organization, acronyms acceptable
line 3-City, Country
line 4-e-mail address if desired
Abstract— This paper describes a project that aims to control a
wheelchair using voice commands. The program recognizes the basic voice
commands required to move the wheelchair. Recognition is performed by a
convolutional neural network trained on a dataset of spoken commands.
The voice input is matched against the predefined command set, and the
wheelchair moves according to the best match.

the neuron activations in each layer are given by the following
matrix. [1]
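The matrix referred to in the text is not reproduced in this copy. As an illustration only, one common matrix-form expression for the activations of a feed-forward layer is given below. The symbols $a^{(l)}$, $W^{(l)}$, $b^{(l)}$, $f$ and $L$ are assumptions introduced here and may differ from the notation originally intended.

```latex
% Assumed notation (not from the original paper):
%   a^{(0)}  input feature vector
%   W^{(l)}  weight matrix of layer l,  b^{(l)} its bias vector
%   f        element-wise nonlinearity,  L the number of layers
a^{(l)} = f\left( W^{(l)} a^{(l-1)} + b^{(l)} \right), \qquad l = 1, \dots, L
```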
This image is then given to a multilayered convolutional neural
network. [8]

<insert the spectrogram images>

In each layer we use a rectified linear unit (ReLU) nonlinearity. In
the output layer, there is one output target for each of the sounds in
the keyword phrase. [5]
Using asynchronous gradient descent, we train the network weights to
optimize a cross-entropy criterion.
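As a minimal sketch of the two ingredients named above, the following NumPy code applies a ReLU nonlinearity in each hidden layer and evaluates a cross-entropy loss at the output layer. Fully connected layers stand in for the convolutional layers to keep the example short, and the asynchronous gradient descent training loop is omitted. The function names, the layer sizes and the random weights are illustrative assumptions, not the trained network.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: negative activations are clipped to zero."""
    return np.maximum(x, 0.0)


def forward(features, layers):
    """Run a stack of dense layers: ReLU on hidden layers, raw scores at the output."""
    a = features
    for W, b in layers[:-1]:
        a = relu(W @ a + b)
    W, b = layers[-1]
    return W @ a + b  # one score (logit) per output target


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]


# Hypothetical sizes: 40 input features, 16 hidden units, 3 output targets.
COMMANDS = ["left", "right", "stop"]
rng = np.random.default_rng(0)
layers = [
    (0.1 * rng.standard_normal((16, 40)), np.zeros(16)),
    (0.1 * rng.standard_normal((len(COMMANDS), 16)), np.zeros(len(COMMANDS))),
]
features = rng.standard_normal(40)
logits = forward(features, layers)
loss = cross_entropy(logits, COMMANDS.index("stop"))
print(f"cross-entropy loss for target 'stop': {loss:.3f}")
```

Training would adjust the weights in `layers` to reduce this loss over the whole dataset.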
V. CONFUSION MATRIX
To analyse the mistakes that the network makes, we use the confusion
matrix. [9] Each column represents the set of samples that were
predicted to be each label, and each row represents their ground-truth
value. For example, in this model the first column represents all clips
that were predicted as "stop" and the first row contains all the clips
that were actually "stop". If the model is ideal, the confusion matrix
will contain all zeroes except for the diagonal elements.

VII. RESULT
<insert image of result>
<write some conclusion from result>

VIII. CONCLUSION
Thus, in this project we have successfully used convolutional neural
networks to build a model that can recognize basic voice commands. The
model was trained to identify commands such as "left", "right" and
"stop". It was then interfaced with an Arduino to control the motors of
the wheelchair.
Acknowledgment
Perfect and precise guidance, hard work, dedication and full
encouragement are needed to complete a project successfully. In the
life of every student, the illumination of project work is like
engraving a diamond.
We take this opportunity, on the successful completion of our project,
to thank all the staff members for their valuable guidance, for
devoting their precious time, for sharing their knowledge, and for
their co-operation throughout the development of our project and our
academic years of education.
We owe deep gratitude to our project guide, Dr. M. Deshpande, whose
valuable guidance has been a key factor in the successful completion of
the project.

VI. STREAMING ACCURACY
Our model is trained on individual clips, but audio recognition
applications run on a continuous stream. Therefore, a general way to
use the model in such a system is to apply it repeatedly at different
offsets in time. By averaging the results over a short window, a smooth
prediction can be produced.
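The repeated application and averaging described above can be sketched as follows. This is only an illustration under stated assumptions: `model` is a hypothetical callable that maps one fixed-length clip to a vector of per-command scores, and `window_size`, `stride` and `average_window` are assumed parameter names, not values from our system.

```python
import numpy as np


def window_offsets(num_samples, window_size, stride):
    """Start index of every full window that fits inside the stream."""
    return list(range(0, num_samples - window_size + 1, stride))


def smooth_scores(scores, average_window):
    """Causal moving average: each row is the mean of the last `average_window` rows."""
    scores = np.asarray(scores, dtype=float)
    smoothed = np.empty_like(scores)
    for t in range(len(scores)):
        start = max(0, t - average_window + 1)
        smoothed[t] = scores[start:t + 1].mean(axis=0)
    return smoothed


def streaming_predictions(audio, model, window_size, stride, average_window):
    """Apply `model` at successive offsets and return (offsets, smoothed label indices)."""
    offsets = window_offsets(len(audio), window_size, stride)
    if not offsets:
        # The stream is shorter than one window, so there is nothing to classify.
        return [], np.array([], dtype=int)
    raw = [model(audio[o:o + window_size]) for o in offsets]
    smoothed = smooth_scores(raw, average_window)
    return offsets, smoothed.argmax(axis=1)
```

A smaller `stride` samples the stream more densely, and a larger `average_window` gives smoother but slower-reacting predictions.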
Our input is like an image; therefore, we need a series of images
sampled at a high rate to increase the chances of having an alignment
that captures most of the spoken word in a single time window that we
feed into the model.

<insert pics of accuracy >

By modifying the signal averaging parameters, we can tune the results
to suit our application. For example, some applications may require
high recall, whereas others may require high precision. Generating an
ROC curve can aid in understanding this trade-off. [10]

References
[1] A. K. Jain, Jianchang Mao and K. M. Mohiuddin, "Artificial neural
networks: a tutorial," in Computer, vol. 29, no. 3, pp. 31-44, Mar 1996.
doi: 10.1109/2.485891
[2] H. Bahi and N. Benati, "A new keyword spotting approach," 2009
International Conference on Multimedia Computing and Systems,
Ouarzazate, 2009, pp. 77-80.
doi: 10.1109/MMCS.2009.5256728
[3] S. K. Kopparapu and M. Laxminarayana, "Choice of Mel filter bank in
computing MFCC of a resampled speech," 10th International
Conference on Information Science, Signal Processing and their
Applications (ISSPA 2010), Kuala Lumpur, 2010, pp. 121-124.
doi: 10.1109/ISSPA.2010.5605491
[4] L. Thomas, Manoj Kumar M V and Annappa B, "Discovery of optimal
neurons and hidden layers in feed-forward Neural Network," 2016 IEEE
International Conference on Emerging Technologies and Innovative
Business Practices for the Transformation of Societies (EmergiTech),
Balaclava, 2016, pp. 286-291.
[5] M. D. Zeiler et al., "On rectified linear units for speech processing,"
2013 IEEE International Conference on Acoustics, Speech and Signal
Processing, Vancouver, BC, 2013, pp. 3517-3521.
doi: 10.1109/ICASSP.2013.6638312.
[6] S. Soldo, M. Magimai-Doss, J. Pinto and H. Bourlard, "Posterior
features for template-based ASR," 2011 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011, pp.
4864-4867.
doi: 10.1109/ICASSP.2011.5947445
[7] J. Dennis, H. D. Tran and H. Li, "Spectrogram Image Feature for Sound
Event Classification in Mismatched Conditions," in IEEE Signal
Processing Letters, vol. 18, no. 2, pp. 130-133, Feb. 2011.
doi: 10.1109/LSP.2010.2100380
[8] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a
convolutional neural network," 2017 International Conference on
Engineering and Technology (ICET), Antalya, 2017, pp. 1-6.
doi: 10.1109/ICEngTechnol.2017.8308186
[9] N. D. Marom, L. Rokach and A. Shmilovici, "Using the confusion
matrix for improving ensemble classifiers," 2010 IEEE 26-th
Convention of Electrical and Electronics Engineers in Israel, Eilat,
2010, pp. 000555-000559.
doi: 10.1109/EEEI.2010.5662159
[10] R. C. Prati, G. E. A. P. A. Batista and M. C. Monard, "Evaluating
Classifiers Using ROC Curves," in IEEE Latin America Transactions,
vol. 6, no. 2, pp. 215-222, June 2008.
doi: 10.1109/TLA.2008.4609920.
[11] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, D. Yu,
"Convolutional neural networks for speech recognition," IEEE
Transactions on Audio, Speech, and Language Processing, vol. 22, no. 1,
pp. 1533-1545, 2014.
[12] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep
convolutional neural networks for LVCSR," in Proc IEEE ICASSP,
2013.
[13] T. N. Sainath, B. Kingsbury, A. Mohamed, G. E. Dahl, G. Saon, H.
Soltau, T. Beran, A. Y. Aravkin, and B. Ramabhadran, "Improvements to
deep convolutional neural networks for LVCSR," in Proc IEEE ASRU,
2013.