
Music Recognition System

Jingyu Zhang

Jiarui Wang

ECE Department, UC Davis

ECE Department, UC Davis

wsshelly1037@gmail.com

jiarui.wang523@gmail.com

ABSTRACT
Language is the most important, common, and direct way to
exchange information, and speech recognition is a key
technology for human-machine communication. Speech
recognition is widely applied in voice-dialing telephones,
remote control of home appliances, industrial control, and
other fields, and it has important practical value.
Our project uses speech recognition to build an application
that lets users control an Android smartphone to play songs
by speaking the name or the lyrics of a song. Our application
translates the speech into a string, then searches for that
string in a database of strings containing the lyrics and
other information of each song. If there is a match, it plays
the song.

Keywords
recognition

1. INTRODUCTION

Recently, some applications have appeared that can record a
short part of a song and identify its name. We tried to write
an Android app with a similar function. However, processing
and matching sound data is very difficult, so we took a
different approach: matching strings of song names or lyrics.
First, we use a lyric-parsing package to process the sample
lyric files into strings containing the title, lyrics, artist,
album, and other information, and we build a database of these
strings for our Android application. Figure 1 shows the
procedure of parsing lyrics to build the database.
After the user tells the smartphone the name or lyrics of a
song, the application records the audio of the speech and uses
the speech library package Iflytek (developed by IFLYTEK CO.,
LTD.) to translate the speech into a string.
Figure 1: database

The overall procedure of our application is shown in Figure 2.
After the user presses the start button, the application invokes
the phone's listener using the AudioRecord class of the Android
API. If nobody is speaking, the application returns null and goes
back to the start page to wait for the next operation. If the
listener detects that someone is speaking, it records the speech,
and the audio is passed to the translation model, which
translates the speech into an output string. Then, using this
string, the application searches the database song by song for a
match with the name or lyrics of any song. If the application
cannot find a match, it displays "no match" and goes back to the
start page. If it finds a match, it displays the name, artist,
and album of the song and plays the song, which has already been
stored in the memory of the smartphone.
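The search-and-report step described above can be sketched as follows. This is a minimal illustration in plain Java, not the authors' code: the Song class, the method names findSong and audioFileFor, the case-insensitive substring test, and the sample entries are all hypothetical. It follows the text in checking songs one by one and treating a missing match as "no match", and it derives the audio file name from the song title, as described in Section 3.4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified database entry: the information the application
// displays (name, artist, album) plus the song's lyric lines.
class Song {
    final String title;
    final String artist;
    final String album;
    final List<String> lyrics;

    Song(String title, String artist, String album, List<String> lyrics) {
        this.title = title;
        this.artist = artist;
        this.album = album;
        this.lyrics = lyrics;
    }
}

public class SongSearch {
    // Checks the songs one by one and returns the first whose title or any
    // lyric line contains the recognized text (case-insensitive substring
    // test, an assumption of this sketch). Returns null for "no match".
    static Song findSong(List<Song> database, String recognized) {
        if (recognized == null) {
            return null; // nothing was recognized: back to the start page
        }
        String query = recognized.trim().toLowerCase(Locale.ROOT);
        if (query.isEmpty()) {
            return null;
        }
        for (Song song : database) {
            if (song.title.toLowerCase(Locale.ROOT).contains(query)) {
                return song;
            }
            for (String line : song.lyrics) {
                if (line.toLowerCase(Locale.ROOT).contains(query)) {
                    return song;
                }
            }
        }
        return null;
    }

    // Assumption: the .mp3 file shares its base name with the song title.
    static String audioFileFor(Song song) {
        return song.title + ".mp3";
    }

    public static void main(String[] args) {
        // Placeholder entries; everything except the title "Let It Go"
        // is invented for illustration.
        List<Song> db = new ArrayList<>();
        db.add(new Song("Let It Go", "Artist A", "Album A",
                Arrays.asList("first placeholder line", "second placeholder line")));
        db.add(new Song("Song B", "Artist B", "Album B",
                Arrays.asList("a placeholder line about the queen")));

        Song match = findSong(db, "Queen");
        if (match == null) {
            System.out.println("no match");
        } else {
            System.out.println(match.title + " / " + match.artist + " / "
                    + match.album + " -> play " + audioFileFor(match));
        }
    }
}
```

A substring test accepts both a spoken title and a spoken fragment of a lyric line. It also exhibits the weakness discussed in the Conclusions: a word that appears in several songs matches whichever of them is checked first.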
The Iflytek package, developed by IFLYTEK CO., LTD., provides
automatic speech recognition. It has powerful functions such as
speech synthesis, speech recognition, semantic understanding, and
speech search; we use only the speech recognition function. We
mainly use four essential components: InitListener,
SpeechRecognizer, setParameter, and RecognizerDialogListener.

InitListener is used to initialize the listener. setParameter
is used to set the begin point, end point, and saving path of
the speech. RecognizerDialogListener is used to receive the
result after recognition. We also use some other classes in the
program.

Figure 2: procedure

2. FIRST ATTEMPT FAILED

In the beginning, we planned to write an application like Shazam,
which can record a short part of a piece of music and find its
name in a huge database over the Internet. To implement a similar
function, we needed to process the data of the samples and songs
with the fast Fourier transform (FFT), converting it from the time
domain to the frequency domain. Because the frequency content of
each song is unique, we could then try to match the sample against
a database of the songs' frequencies.
The first step was to record part of a song by calling the
smartphone's listener through the AudioRecord class of the API.
We succeeded in this step.
The second step was to read the sample data into a short[]
audioData buffer with the read() method. The audio was sampled at
11025 Hz.
The third step was to feed the sample data into the FFT unit to
convert it from the time domain to the frequency domain. However,
we could not obtain correct data in the frequency domain. When we
tried to find where the error was, we found that the data was
already wrong when we read it out of the audio sample. When
plotted, the data should look like the music signal in Figure 3.

Figure 3: data of audio

However, when we plotted the data read from our samples in
Matlab, the figures were different, as shown in Figure 4.
We spent almost two weeks trying to solve this problem, but
failed. We think the buffer from which the application read data
was not the buffer that contained the sample data. Thus, we had
to implement the function another way, by lyrics matching.

3. IMPLEMENTATION

There are basically four steps: first, we create a database of
lyric files; second, we translate the audio data into a text
message; third, we match the message to find the song; and last,
we play the music.

3.1 Resolution of Lyric file

Figure 5: lyric file resolution

To get the music information, we first need to know the format of the .lrc file (Figure 5).
LRC is a text-based format used to synchronize lyrics with a song
while the audio file is playing. A file can be divided into two
parts: the ID part and the lyric part (Figure 6).
Usually, the ID information is placed at the head of the file,
and the lyric contents follow it.
The first part can contain many tags. The three most common are ti (song title), al (album
name), and ar (artist). These tags can appear in any order.
In the lyric part, each line contains a time tag and a
sentence of lyrics.
Thus, we can create a class for lyric files, as shown in
Figure 7.
Figure 4: wrong data, panels (a) and (b)

Figure 6: Lyric files

Thus, to parse a .lrc file, we can do the following:

1. Create an LRC object.
2. Open the file.
3. Wrap it in a BufferedReader.
4. Use the readLine() method to read the file, getting a new
String for each line.
5. Resolve each String to get the values for the object.
6. Set the values of all the attributes.
For step 5, to get the values of al, ar, ti, and by, we
can simply combine the String.startsWith(String prefix) and
String.substring(int beginIndex, int endIndex) methods.
For example, to obtain the value of al:

    if (str.startsWith("[al:")) {
        // Drop the 4-character prefix "[al:" and the closing ']'
        String album = str.substring(4, str.length() - 1);
        newLRC.setAl(album);
    }

Figure 7: UML for LRC class

Since we do not care about the time tag, we can get the lyrics
in a similar way by skipping the time tag and reading the string
after it. To do that, we check whether the character after '['
is a digit using the Character.isDigit(char) method.
In this way, we can create an LRC object for each lyric file in
the database.
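Putting steps 1 to 6 together, a minimal parser might look like the following. It is a sketch under stated assumptions, not the authors' implementation: the field layout of the LRC class below is a simplified guess (the actual class is the one in Figure 7, with setters such as setAl), and the names LrcParser, resolveLine, isTimeTagged, and tagValue are hypothetical. It reads from any java.io.Reader so that it can be tried on an in-memory string; on the phone, a FileReader for the .lrc file would be passed instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the LRC class of Figure 7 (field layout assumed).
class LRC {
    String ti = "";  // song title
    String ar = "";  // artist
    String al = "";  // album
    String by = "";  // creator of the .lrc file
    final List<String> lyrics = new ArrayList<>();  // one entry per lyric line
}

public class LrcParser {
    // Steps 1-6: create the object, wrap the input in a BufferedReader,
    // read it line by line and resolve each line into the object.
    static LRC parse(Reader source) {
        LRC lrc = new LRC();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                resolveLine(line.trim(), lrc);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lrc;
    }

    // Step 5: ID tags are recognized by prefix; lyric lines start with '['
    // followed by a digit, i.e. a time tag such as "[00:12.34]".
    static void resolveLine(String str, LRC lrc) {
        if (str.startsWith("[ti:")) {
            lrc.ti = tagValue(str);
        } else if (str.startsWith("[ar:")) {
            lrc.ar = tagValue(str);
        } else if (str.startsWith("[al:")) {
            lrc.al = tagValue(str);
        } else if (str.startsWith("[by:")) {
            lrc.by = tagValue(str);
        } else if (isTimeTagged(str)) {
            String text = str;
            // A line may carry several time tags, e.g. "[00:05.50][01:10.00]...".
            while (isTimeTagged(text) && text.indexOf(']') > 0) {
                text = text.substring(text.indexOf(']') + 1);
            }
            text = text.trim();
            if (!text.isEmpty()) {
                lrc.lyrics.add(text);
            }
        }
        // Anything else (unknown tags, blank lines) is ignored.
    }

    static boolean isTimeTagged(String s) {
        return s.length() > 1 && s.charAt(0) == '[' && Character.isDigit(s.charAt(1));
    }

    // "[al:Album]" -> "Album": drop the 4-character prefix and the closing ']'.
    static String tagValue(String str) {
        int end = str.endsWith("]") ? str.length() - 1 : str.length();
        return str.substring(4, end).trim();
    }

    public static void main(String[] args) {
        // Hypothetical file content; on the phone a FileReader would be used.
        String sample = "[ti:Let It Go]\n[ar:Artist A]\n[al:Album A]\n"
                + "[00:01.00]first placeholder line\n"
                + "[00:05.50][01:10.00]repeated placeholder line\n";
        LRC lrc = parse(new StringReader(sample));
        System.out.println(lrc.ti + " / " + lrc.ar + " / " + lrc.al);
        System.out.println(lrc.lyrics);
    }
}
```

Lines that carry several time tags are handled by stripping tags in a loop, and lines that are neither a known ID tag nor a time-tagged lyric are skipped, so one LRC object per file can be built without special cases for blank lines.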

3.2 Audio translation

As mentioned in the Introduction, here we use the
com.iflytek.cloud package to turn the audio data into a text message.
More specifically, we first create a SpeechRecognizer object and a
RecognizerDialog object, and then attach a RecognizerDialogListener
to the RecognizerDialog object. Then, in the listener's callback
method onResults(ArrayList results, boolean islast), we can use the
method RecognizerResult.getResultString() to get the recognition
result from the server, which is returned as a message in JSON
format.
To parse the JSON message into a String, we can use the org.json
library.

3.3 Match

Finally, with the message and the lyric database in hand, the
next step is matching. To do this, we iterate over the lyrics to
find whether the message is part of them. The lyrics field
contains a list of Strings, each generated from one line of the
lyric file. We compare the message with each String in the list,
one by one, to find a match. If we find a match, we obtain the
corresponding LRC object, and using its get methods we can
retrieve the information we need: album, artist, title, etc.

3.4 Play the music

The .lrc file and the .mp3 file of a song have the same file
name. For example, the lyric file and the audio file for the song
"Let It Go" are "let it go.lrc" and "let it go.mp3". Therefore,
the title of the LRC object points us to the corresponding .mp3
file, which we can then play using MediaPlayer.

4. CONCLUSIONS

Our project basically fulfilled our goals. It can recognize
spoken song names or lyrics, find the songs in the database, and
play them. However, the database is small and contains only five
songs. The two packages we used, one converting the audio of
speech into a string and the other converting lyric files into
strings, are the essential parts of our application. The
remaining work was to connect the components and invoke the
listener and the speaker of the smartphone. This is basically our
project.
There is a problem with lyric matching: many lyrics share common
words. If the user says words that are common in songs, the
application may match the wrong song. In our example, we use the
word "Queen", which is unique to one song, so the application
could find the right match. We think this is a fundamental
disadvantage of lyric matching, no matter how the algorithm and
the application are improved.
Another problem of our application is that it can be severely
affected by noise while the smartphone's listener is recording.
To solve this problem, we would have to go back to the fast
Fourier transform: converting the time domain to the frequency
domain and eliminating the low-frequency components, which are
noise, would improve the robustness of the application. If we can
achieve this, we can also try our first idea again and improve
the application with this frequency matching method. Frequency
matching is a better way to implement our idea, even though it is
hard for us.
Despite the problems above, since the application works
successfully with a small database of lyrics, we think our
project is basically complete and successful. Figures 8, 9, and
10 show results of correct lyrics matching.

Figure 8: result1

Figure 9: result2

Figure 10: result3
