
Sound Classification

EASHWAR N N, Kamaraj College of Engineering and Technology, eashwar408@gmail.com
ACHALA S PANDIT, JSS Academy of Technical Education, achpices@gmail.com
H NEHA MALLYA, Canara Engineering College, mallyaneha@gmail.com
C. V. VAMSI KRISHNA YADAV, Madanapalle Institute of Technology and Science, vamsikrishnavk098@gmail.com
HARSHIL JAIN, Chandigarh Engineering College, harshiljain203@gmail.com
SONAL PRIYADARSHAN, Dayananda Sagar College of Engineering, sonalp2010@gmail.com
PRIYAM PODDAR, JSS Academy of Technical Education, Bangalore, priyampoddar89@gmail.com
SHREEDEVI B OLEKAR, KLE Technological University, Hubballi, shrideviolekar123@gmail.com
ANSHULA RANJIT, Acharya Institute of Technology, anshula.beis.16@acharya.ac.in
KASHINATH W, KLE College of Engineering and Technology, Chikodi, kashinathwanegaon7441@gmail.com

Abstract— Classification of audio based on key features is a growing area of research with abundant real-world applications; at the same time, it is a fundamental problem in the field of audio processing to identify which class a sound belongs to. The area of music genre classification receives considerable attention from both the research and music industries, with the aim of minimizing human involvement by developing a streamlined model and method for segregating and identifying the genre. The first step towards this development is determining the type manually. The subsequent tasks are to pre-process the audio inputs, extract the key features, and apply a classification algorithm. We focus primarily on the separation of the essential music from the disturbance (noise). Machine learning and deep learning techniques are extensively used and have proven quite successful in extracting patterns for classification from large pools of data. This project emphasizes using available open-source data, recent development techniques, and efficient models, along with suitable algorithms for training and testing the system, to achieve our target of classifying sound. It thereby implements the concept of audio tagging with machine learning and deep learning to build a system that can be classified as an intelligent system, ultimately eliminating most manual involvement.

Keywords— Sound classification, audio tagging, machine learning, deep learning, MLP, CNN.

I. INTRODUCTION

We can state that sounds are all around us [1]. We are always in constant contact with audio data, knowingly and unknowingly. From personal security to critical surveillance, sound is a key element in developing automated systems for these fields [2]. Classification of audio based on key features is a growing area of research with abundant real-world applications. Identification of sound consists of stages such as pre-processing of signals, extraction of specific features, and their classification. To pre-process the input signal, it is divided into segments, which are then used for extracting the related features.

The human brain is continuously processing the environment around us and giving us information about it. Automatic environmental sound classification is a growing area, but work on it is scarce. Observing the recent advancements in the field of image classification, where CNNs are used to classify images with high accuracy and at scale, we apply the same concept to sound classification, where sound is discrete over time. To achieve classification, we use an MLP and a CNN.

As shown in Fig. 1.1, a spectrogram is a way to visualize the frequency spectrum of a sound wave. In simple words, it is a photograph of the frequency spectrum present in the sound wave [3].

Figure 1.1: Generated spectrogram of a sound wave
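As a hedged illustration of how a spectrogram like the one in Fig. 1.1 can be generated (this is a minimal sketch, not the exact code used in this project), the following Python snippet uses the librosa and matplotlib libraries; "sample.wav" is a placeholder file name.

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("sample.wav")                       # audio as a float time series
D = librosa.stft(y)                                      # short-time Fourier transform
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)    # amplitude to decibels

fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
ax.set_title("Spectrogram of the sound wave")
plt.show()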
II. RELATED WORKS

Concerning sound classification, numerous works are available as open source on the internet.

Mike Smales' project, 'Classifying Urban Sounds Using Deep Learning' [1].

'Sound Classification Using Convolutional Neural Network and Tensor Deep Stacking Network' by Aditya Khamparia and team [2].

The datasets used in these projects differ from each other.

III. DATASETS

The dataset used in this project is an environmental sound dataset rather than a speech dataset. Environmental sound datasets are very limited, which is a major obstacle to developing a good sound classification system [1].

UrbanSound dataset sample: a subsection of the data is used within the project, so the full dataset does not need to be downloaded. The dataset being used can be downloaded from: https://urbansounddataset.weebly.com/urbansound8k.html

The audio files are stored in .wav format and are associated with metadata. There are 8732 audio files of urban sounds in WAV format. The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).

Metadata: UrbanSound8k.csv

This file contains metadata about every audio file in the dataset. This includes:

1. slice_file_name: The name of the audio file. The name takes the following format: [fsID]-[classID]-[occurrenceID]-[sliceID].wav, where:
[fsID] = the Freesound ID of the recording from which this excerpt (slice) is taken
[classID] = a numeric identifier of the sound class (see the description of classID below for further details)
[occurrenceID] = a numeric identifier to distinguish different occurrences of the sound within the original recording
[sliceID] = a numeric identifier to distinguish different slices taken from the same occurrence

2. fsID: The Freesound ID of the recording from which this excerpt (slice) is taken

3. start: The start time of the slice in the original Freesound recording

4. end: The end time of the slice in the original Freesound recording

5. salience: A (subjective) salience rating of the sound. 1 = foreground, 2 = background

6. fold: The fold number (1-10) to which this file has been allocated

7. classID: A numeric identifier of the sound class:
0 = air_conditioner
1 = car_horn
2 = children_playing
3 = dog_bark
4 = drilling
5 = engine_idling
6 = gun_shot
7 = jackhammer
8 = siren
9 = street_music

8. class: The class name: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music
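As a hedged illustration of how this metadata can be consumed, the sketch below loads UrbanSound8k.csv with the pandas library and splits one slice_file_name into the four identifiers described above; the column names follow the field list in this section.

import pandas as pd

meta = pd.read_csv("UrbanSound8k.csv")

# slice_file_name encodes [fsID]-[classID]-[occurrenceID]-[sliceID].wav
name = meta.loc[0, "slice_file_name"]                    # e.g. "100032-3-0-0.wav"
fs_id, class_id, occurrence_id, slice_id = name[:-4].split("-")

print(meta["class"].value_counts())   # number of examples per class
print(meta.groupby("fold").size())    # size of each of the 10 folds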
Figure 2.1: 10 folds of classes
Figure 2.2: UrbanSound8k.csv

IV. ARCHITECTURE AND METHODOLOGY

Fig 4.1. Block diagram

GUI: The graphical user interface is a form of user interface that allows users to interact with electronic devices through graphical icons and visual indicators such as secondary notation, instead of text-based user interfaces, typed command labels, or text navigation.

Input: The user inputs the audio files.

Mp3 to Wav: Downloaded music is usually in MP3 format, so for classification the audio files have to be converted to WAV format (a sketch of this conversion follows this list).

Test Data: A list of testable datasets that are pre-loaded into the model to test it. It is very difficult to find converted audio files online, hence a dataset is prepared and loaded into the model.

Training Data: A set of examples used to fit the parameters of the model.

Segregation: The model has varied versions:
Version 1: An audio file can be classified as music or noise.
Version 2: Identification of the noise.

Accuracy / Prediction: The accuracy of the model is checked.

Cross validation: Comparing the testing and training accuracy.

Output: Determining the required target.
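The MP3-to-WAV step above can be implemented in a few lines. The sketch below is one possible approach, assuming the pydub library (which requires ffmpeg to be installed); "song.mp3" and "song.wav" are placeholder file names, not paths from this project.

from pydub import AudioSegment

audio = AudioSegment.from_mp3("song.mp3")   # decode the MP3 input
audio.export("song.wav", format="wav")      # write it back out as WAV for the classifier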

V. WORKING OF THE MODEL

Fig 5.1. Run jupyter_notebook (menu)
Fig 5.2. List of existing audio files
Fig 5.3. User input of the audio name
Fig 5.4. If the file exists
Fig 5.5. mp3 to wav format
Fig 5.6. Data normalization
Fig 5.7. Accuracy rate for MLP
Fig 5.8. Accuracy rate for CNN
Fig 5.9. Input given as bark; the output predicted is dog_bark with the highest predictability ratio
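The figures above walk through feature extraction, training, and accuracy comparison for the MLP and the CNN. The sketch below is a hedged reconstruction of such a pipeline, not the exact code behind the figures: MFCC features are extracted with librosa, and two small Keras models are defined; all layer sizes and hyperparameters here are illustrative assumptions.

import numpy as np
import librosa
from tensorflow.keras import layers, models

NUM_CLASSES = 10  # the ten UrbanSound8K classes listed in Section III

def extract_mfcc(path, n_mfcc=40, max_frames=174):
    """Load a WAV slice and return a fixed-size (n_mfcc, max_frames) MFCC matrix."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    pad = max_frames - mfcc.shape[1]
    if pad > 0:
        mfcc = np.pad(mfcc, ((0, 0), (0, pad)))   # zero-pad short clips
    return mfcc[:, :max_frames]                   # truncate long ones

# MLP: operates on a flat 40-dimensional feature vector (e.g. the mean MFCC per clip)
mlp = models.Sequential([
    layers.Input(shape=(40,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# CNN: treats the full MFCC matrix as a single-channel image
cnn = models.Sequential([
    layers.Input(shape=(40, 174, 1)),
    layers.Conv2D(16, 2, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 2, activation="relu"),
    layers.MaxPooling2D(2),
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

for model in (mlp, cnn):
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=...)
# Comparing the resulting training and validation accuracies corresponds to the
# cross-validation step described in Section IV.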

VI. RESULTS

In this way, we can predict the type of an audio file given as input. This audio analysis is done for entertainment media. The music industry can effectively utilize this when categorization becomes hard and a reliable result cannot be predicted manually; companies can look into these predictions and then perform other operations according to their needs and the work scenario. This is less time consuming and more efficient. The program uses a machine learning approach, which is more accurate for analysing audio; deep learning techniques and the Python programming language are used alongside it. As a result, the program categorizes audio into the type it belongs to, as defined in the dataset.
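As a usage illustration of this prediction step (cf. Fig. 5.9), the following sketch reuses the illustrative cnn model and extract_mfcc helper from the earlier sketch and prints the most probable class for a placeholder file "bark.wav"; it is an assumed reconstruction, not the project's exact prediction code.

import numpy as np

CLASS_NAMES = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
               "drilling", "engine_idling", "gun_shot", "jackhammer",
               "siren", "street_music"]

# shape the MFCC matrix as a batch of one single-channel image
features = extract_mfcc("bark.wav")[np.newaxis, ..., np.newaxis]
probs = cnn.predict(features)[0]              # one probability per class
print(CLASS_NAMES[int(np.argmax(probs))])     # expected output: dog_bark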

VII. CONCLUSION

Automatic tagging is a step towards a better understanding of music by machines. One road map to achieve this goal consists of two steps: describing audio with meaningful labels [4] and using these labels to manage and discover new products. The second step has been intensively developed by web search engines, and it is the greatest incentive to create a good automatic tagging algorithm. We state that sounds are all around us and that it is impossible to ignore them. However, it should be kept in mind that no machine learning algorithm can provide an accuracy of 100% due to the complexity and limitations of the technology. So, despite some of its limitations, audio tagging has proven to be a valuable opportunity for businesses to grow. In this work, we have proposed both an MLP and a CNN [5].

VIII. REFERENCES

[1] M. Smales, "Classifying Urban Sounds Using Deep Learning," Udacity ML Capstone. From: https://github.com/mikesmales/Udacity-ML-Capstone/blob/master/Proposal/Proposal.pdf

[2] A. Khamparia et al., "Sound Classification Using Convolutional Neural Network and Tensor Deep Stacking Network," IEEE. From: https://ieeexplore.ieee.org/document/8605515

[3] R. A. Altes, "Detection, estimation, and classification with spectrograms," J. Acoust. Soc. Amer., vol. 67, no. 4, pp. 1232-1246, 1980. From: https://asa.scitation.org/doi/10.1121/1.384165

[4] A. Das, R. Bhai, S. Sachdev, T. Anand, and U. Kumar, "Automatic Tagging of Songs Using Machine Learning." From: http://article.nadiapub.com/IJDTA/vol9_no5/13.pdf

[5] K. Xu, B. Zhu, Q. Kong, H. Mi, B. Ding, D. Wang, and H. Wang, "General Audio Tagging with Ensembling Convolutional Neural Networks and Statistical Features." From: https://arxiv.org/pdf/1810.12832v1.pdf

[6] https://towardsdatascience.com/urban-sound-classification
