
Introduction
SGN-14006 / A.K.

SGN–14006
Audio and Speech Processing
Lectures, Fall 2015
Pasi Pertilä
Tampere University of Technology
(slides by Anssi Klapuri)

Course goals
•  Learn basics of audio signal processing
   –  Basic operations and their underlying ideas and principles
   –  Provides basic skills, although not all of the latest cutting-edge algorithms can be covered
•  Learn fundamentals of speech processing
   –  Speech production and its computational modeling
   –  Acoustic features to represent speech signals
   –  Some applications: speech coding, synthesis
•  Learn the basics of acoustics and human hearing
   –  These form the foundation for technical applications

Lecture timeline (some changes may still take place)
•  Sound, audio signals, acoustics
•  Hearing
•  Basic audio signal processing operations
   –  AD/DA-conversion, filters and filter banks, dynamic control, etc.
•  Sound synthesis
•  Audio coding
•  Speech production anatomy, phonetics
•  Linear prediction, MFCCs, and cepstrum
•  Speech coding
•  Speech synthesis

What is not covered by this course
•  Speech recognition, audio content analysis, and acoustic pattern recognition
   →  Course SGN-24006 ”Analysis of Audio, Speech and Music Signals” (period 4)
•  Analog audio
   –  Electroacoustics, microphone and loudspeaker design
   →  See the course ”Akustiikan mittaukset” (Acoustic Measurements)
•  Hardware implementations

Practical arrangements
•  Course homepage: http://www.cs.tut.fi/~sgn14006
•  Lectures
   –  Mondays 12-14 in TB219
   –  Thursdays 14-16 in TB222
   –  Pasi Pertilä, pasi.pertila @ tut.fi
•  Lecture slides will be available as pdf on the course page
   –  Course is not based on any individual textbook. Lectures, lecture notes and exercises will be sufficient to take the exam.
   –  Some recommended textbooks are mentioned at the end of this introduction
•  Requirements: exam and project work
•  5 cr

Exercises
•  Exercises start one week after the lectures (2.9.2015)
•  Assistants: Shriram Nandakumar, Emre Cakir
•  Contents: math and Matlab exercises related to the lectures
•  Two alternative groups
   –  Tuesday 10-12 in TC303 (updated!)
   –  Friday 12-14 in TC303
   –  Register for either group online at 14:00 today: www.tut.fi/pop
•  Math problems are to be solved in advance, Matlab exercises are done during the exercise sessions
•  Active completion of the exercises and participation in the exercise sessions is credited with up to 3 points in the exam (equivalent to one mark)
•  Project work will be discussed at the exercises too

Project work
•  Implementing an audio signal processing algorithm in Matlab
   –  In two-person groups
•  Topic(s) will be introduced later during the lectures
•  Requirements:
   –  Choosing the topic
   –  Implementing the algorithm
   –  Final report by 28.10.
•  More detailed instructions will appear on the course home page

Reference material
•  Gold, Morgan, Ellis. ”Speech and audio signal processing,” Wiley, 2011.
•  Zölzer. ”Digital audio signal processing,” Wiley & Sons, 2nd ed., 2008.
   –  Including AD/DA-conversion, dynamic control, equalization, filter banks
•  Quatieri. ”Discrete-time speech signal processing: principles and practice,” Prentice Hall PTR, 2002.
•  Rossing. ”The science of sound,” Addison-Wesley, 1990.
   –  Acoustics, hearing
•  Brandenburg, Kahrs. ”Applications of digital signal processing to audio and acoustics,” Kluwer Academic Publishers, 1998.
   –  Chapter on perceptual audio coding
•  Pulkki, Karjalainen. ”Communication acoustics,” Wiley, 2015.

Introduction to audio signals and their representation

Audio signals
•  Audio = related to sound or hearing
•  The word sound may mean
   1.  a sensation perceived by the auditory system, or
   2.  longitudinal pressure waves in a material medium (such as air) that may cause a hearing sensation
   –  Due to human hearing, we usually consider the frequency range 20 Hz – 20 kHz and air as the medium (although hearing also works underwater, for example)
•  Sound signal – audio signal
   –  Numerical representation of sound
   –  Sound pressure as a function of time, measured using a microphone for example
•  Note: audio signal is often understood as non-speech audio signal, although speech signals are audio too

Audio and speech processing
•  Where is audio and speech processing needed?
•  Examples:
   –  Convert a musical piece into compressed mp3 format and store it on a hard disc for playback later (audio coding)
   –  Encode a speech signal on a mobile phone before transmission
   –  Add reverberation to a sound, correct the pitch of a singer (studio technology)
   –  Enhance the quality of a speech signal (denoising, echo cancellation)
   –  Compensate for loudspeaker non-idealities by digital equalization
•  Typical digital signal processing system:
   1.  Digitize a signal (sampling, quantization)
   2.  Process in digital form (store, manipulate, etc.)
       –  digital representation enables a variety of algorithms
   3.  Convert back to an analog signal

Audio signal representations
•  Different applications employ different representations
   –  Time domain representation
   –  Frequency domain representation
   –  Time-frequency domain representation
•  In this course we consider mainly music and speech
   –  Music signals involve a wide variety of sounds; billions of people listen to music worldwide
   –  Speech signals are an important special category of sound signals due to their importance for communication
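
The three-step chain of a typical digital signal processing system (digitize, process, convert back) can be sketched in a few lines of Python, standard library only. This is an illustrative sketch rather than course material: the function names, the 16-bit word length, the 440 Hz test tone and the gain-halving step are all assumptions made for the example. Step 3 is idealized as a plain rescaling and does not model a real D/A converter with a reconstruction filter.

```python
import math


def sample_sine(freq_hz, fs_hz, duration_s, amplitude=1.0):
    """Step 1a: sample a (conceptually analog) sine wave at rate fs_hz."""
    n_samples = int(round(duration_s * fs_hz))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_samples)]


def quantize(samples, n_bits):
    """Step 1b: uniform quantization of values in [-1, 1] to signed integers."""
    max_code = 2 ** (n_bits - 1) - 1
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the representable range
        codes.append(int(round(x * max_code)))
    return codes


def apply_gain(codes, gain, n_bits):
    """Step 2: a trivial digital operation (scaling with saturation)."""
    max_code = 2 ** (n_bits - 1) - 1
    return [max(-max_code, min(max_code, int(round(c * gain))))
            for c in codes]


def dequantize(codes, n_bits):
    """Step 3 (idealized): map integer codes back to values in [-1, 1]."""
    max_code = 2 ** (n_bits - 1) - 1
    return [c / max_code for c in codes]


fs = 8000                              # assumed sampling rate in Hz
x = sample_sine(440.0, fs, 0.01)       # 10 ms of a 440 Hz tone
codes = quantize(x, 16)                # 16-bit integer samples
y = dequantize(apply_gain(codes, 0.5, 16), 16)
```

With 16 bits the rounding error per sample is at most half a quantization step, which is why `y` follows `0.5 * x` very closely.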

Time domain signal
•  Air pressure as a function of time (zero level = normal air pressure) is a natural representation for audio
   –  An analog signal is easy to record using a microphone and play back using a loudspeaker
•  For music, typical sampling rates are 44.1 or 48 kHz
   –  Allows for representing the frequency range of human hearing (approximately 20 Hz – 20 kHz)
•  For speech
   –  8 kHz: Narrowband
      •  the conventional telephone rate (fricatives such as /s/ and /f/ are distorted)
   –  16 kHz: Wideband
      •  voice over IP, bandwidth extension
•  Other rates are also widely used: 96, 32, 22.05 kHz etc.
•  Most of the energy (and information) of natural sounds is at low frequencies (around 200 Hz – 5 kHz)

Time domain signal (1)
•  Analog signal (solid line) can be represented with discrete samples (dots) without loss of information, if the sampling frequency is more than twice the highest frequency component in the signal
   –  Remember from introductory signal processing courses
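
The sampling condition can be checked numerically. The sketch below (Python, standard library only) samples a 5 kHz cosine at the 8 kHz narrowband speech rate, which violates the condition, and shows that the resulting samples cannot be told apart from those of a 3 kHz cosine. The helper names `sample_cosine` and `alias_frequency` are hypothetical, introduced only for this illustration.

```python
import math


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_samples)]


def alias_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs/2] at which a real sinusoid appears after sampling."""
    f = freq_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


fs = 8000.0
tone_5k = sample_cosine(5000.0, fs, 16)   # above fs/2: will alias
tone_3k = sample_cosine(3000.0, fs, 16)   # below fs/2: represented correctly

# Largest sample-by-sample difference between the two sampled tones.
max_diff = max(abs(a - b) for a, b in zip(tone_5k, tone_3k))
apparent_hz = alias_frequency(5000.0, fs)
```

Here `apparent_hz` is 3000.0 and `max_diff` is zero up to floating-point rounding, so the information needed to distinguish the two tones is lost once the signal has been sampled.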

Time domain signal (2)
•  Large time scale illustrates the sound amplitude envelope
•  Example signal: one note from the oboe
   –  Amplitude is zero before the sound starts
   –  The oboe has continuous excitation, therefore the sound’s amplitude envelope remains nearly constant throughout its duration

Time domain signal (3)
•  Zoom-in of the same oboe signal at time t = 0.45 s
•  90 ms frame illustrates the periodic waveform
   –  Many sounds are periodic, for example most musical instrument sounds and vowels in speech
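
Periodicity of a short frame can also be measured rather than just seen. One common approach, which is an assumption of this sketch and not a method given in the slides, is to pick the lag that maximizes the autocorrelation of the frame. The 90 ms frame below is a synthetic three-harmonic tone standing in for the oboe recording, and the names `autocorrelation` and `estimate_period` are hypothetical.

```python
import math


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at a given lag (in samples)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_period(frame, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest autocorrelation value."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = autocorrelation(frame, lag)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


fs = 8000        # assumed sampling rate in Hz
f0 = 200.0       # fundamental frequency of the synthetic tone (period: 40 samples)
n_samples = 720  # 90 ms at 8 kHz

# Synthetic periodic frame: fundamental plus two weaker harmonics.
frame = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / fs)
         for n in range(n_samples)]

period = estimate_period(frame, 20, 100)   # search lags of 2.5 ms to 12.5 ms
f0_estimate = fs / period
```

The lag search range sets which fundamental frequencies can be found (here 80 Hz to 400 Hz); real recordings usually need a normalized autocorrelation and some care with octave errors.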

Frequency domain representation – spectrum
•  Obtained by computing the discrete Fourier transform (for example) of the time-domain signal, usually in a short frame
•  Many perceptually important properties are more clearly visible in the frequency domain
•  Decibel scale for amplitude is useful from the viewpoint of human hearing and the dynamics of natural sounds
   –  Due to Fechner’s law (subjective sensation is proportional to the logarithm of the stimulus intensity)
•  Phases are perceptually less important – often omitted

Consider log-frequency and dB-magnitude
•  Linear scale
   –  usually hard to ”see” anything
•  Log-frequency
   –  each octave is approximately equally important perceptually
•  Log-magnitude
   –  perceived change from 50 dB to 60 dB is about the same as from 60 dB to 70 dB
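
As a minimal sketch of the spectrum computation, the Python code below (standard library only) takes the discrete Fourier transform of one short frame, discards the phases and expresses the magnitudes in decibels relative to the strongest bin. The naive O(N²) DFT, the −120 dB floor and the two-tone test frame are assumptions made for illustration; in practice an FFT routine would be used.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real or complex frame."""
    n_points = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def magnitude_db(spectrum, floor_db=-120.0):
    """Magnitudes of bins 0..N/2 in dB relative to the largest bin (phase dropped)."""
    half = [abs(v) for v in spectrum[: len(spectrum) // 2 + 1]]
    ref = max(half)
    if ref == 0.0:                      # silent frame: everything at the floor
        return [floor_db] * len(half)
    threshold = 10.0 ** (floor_db / 20.0)
    return [20.0 * math.log10(m / ref) if m / ref > threshold else floor_db
            for m in half]


fs = 8000          # assumed sampling rate in Hz
n_points = 64      # frame length; bin spacing is fs / n_points = 125 Hz

# Test frame: 1 kHz tone plus a 2 kHz tone at one tenth of the amplitude.
frame = [math.sin(2 * math.pi * 1000.0 * n / fs)
         + 0.1 * math.sin(2 * math.pi * 2000.0 * n / fs)
         for n in range(n_points)]

db = magnitude_db(dft(frame))
peak_bin = max(range(len(db)), key=lambda k: db[k])
peak_hz = peak_bin * fs / n_points
```

The weaker tone has one tenth of the amplitude and therefore sits 20 dB below the stronger one, which illustrates how the decibel scale turns amplitude ratios into differences.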

Time-frequency representation – spectrogram
•  Shows sound intensity as a function of time and frequency
•  Obtained by blocking the signal into short analysis frames and by computing their spectra
•  For audio, the frame size is typically 10–100 ms: sound spectra are often nearly stationary at that time scale

Example audio signals: guitar
•  Sound decays gradually after the onset
•  Instantaneous excitation: string is plucked at onset
•  Periodic sound (vibrating string, covered in the Acoustics lecture)
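
The spectrogram recipe (block the signal into short frames, then compute a spectrum per frame) can be sketched as follows in Python, standard library only. The Hann window, the 20 ms frame length, the 50 % overlap and the synthetic test signal that jumps from 500 Hz to 1500 Hz are assumptions of this example, not values prescribed by the slides.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points)
            for n in range(n_points)]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive DFT, phases discarded)."""
    n_points = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                    for n in range(n_points)))
            for k in range(n_points // 2 + 1)]


def spectrogram(signal, frame_len, hop):
    """List of magnitude spectra, one per windowed analysis frame."""
    window = hann(frame_len)
    return [magnitude_spectrum([w * x for w, x in zip(window, frame)])
            for frame in frame_signal(signal, frame_len, hop)]


def peak_bin(spectrum):
    """Index of the strongest bin in one magnitude spectrum."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])


fs = 8000
frame_len = 160    # 20 ms at 8 kHz, within the typical 10-100 ms range
hop = 80           # 50 % overlap between consecutive frames

# Test signal: 500 Hz during the first 100 ms, 1500 Hz during the next 100 ms.
signal = [math.sin(2 * math.pi * (500.0 if n < 800 else 1500.0) * n / fs)
          for n in range(1600)]

S = spectrogram(signal, frame_len, hop)
first_peak_hz = peak_bin(S[0]) * fs / frame_len
last_peak_hz = peak_bin(S[-1]) * fs / frame_len
```

Each row of `S` is one time frame and each column one frequency bin (50 Hz apart here), so the change of tone shows up as the peak moving from one column to another. A shorter frame would localize the change better in time at the cost of coarser frequency resolution.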

Example audio signals: snare drum (1)
•  Instantaneous excitation, exponentially decaying amplitude envelope

Example audio signals: snare drum (2)
•  Zoom-in of the snare drum waveform
•  The signal also contains non-periodic components

Example audio signals: snare drum (3)
•  Spectrum is noise-like too: not as clear a structure as in the oboe’s spectrum

Example audio signals: snare drum (4)
•  Spectrogram

Polyphonic music (1)
•  Polyphonic music consists of a mix of several sound sources (linear superposition)

Polyphonic music (2)
•  Spectrogram reveals e.g. the rhythmic structure

Speech: time domain signal (1)
•  One sentence (”He knew what taboos he was violating.”)
•  Speech can be viewed as a sequence of phonemes

Speech: time domain signal (2)
•  Zooming in to different phonemes
   –  Left: vowel ”e” in ”He” (voiced: periodic)
   –  Right: ”t” in ”taboos” (unvoiced: ”noisy”)
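
The difference between a periodic (voiced) and a noisy (unvoiced) segment can be quantified in a simple way. The zero-crossing rate used below is a common heuristic and an assumption of this sketch; the slides do not name a specific measure. The two frames are synthetic stand-ins (a 120 Hz sine and uniform random noise), not the actual speech recording.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


fs = 8000                 # assumed sampling rate in Hz
n_samples = 400           # 50 ms frame
rng = random.Random(0)    # fixed seed so the example is repeatable

# Stand-in for a voiced sound: a low-frequency periodic waveform.
voiced_like = [math.sin(2 * math.pi * 120.0 * n / fs)
               for n in range(n_samples)]

# Stand-in for an unvoiced sound: white noise.
unvoiced_like = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

zcr_voiced = zero_crossing_rate(voiced_like)
zcr_unvoiced = zero_crossing_rate(unvoiced_like)
```

The periodic frame changes sign only about twice per period, so its rate stays low, while the noise frame changes sign on roughly every second sample. Real speech is less clear-cut, and practical voicing detectors usually combine several features.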
Speech spectrogram
•  Each phoneme has its characteristic spectral shape
•  Transitions between phonemes are continuous rather than step-like

AD#1
“NERDS MEET ARTISTS”
2015-2016 Joint Course Module of Signal Processing, School of Architecture and Civil Engineering

This course module invites students from signal processing, architecture and civil engineering.

GOAL:
Help signal processing engineers to understand the needs of urban design and help architects and civil engineers to understand the potential of modern ICT in quantitative analysis of urban spaces. With the help of camera and microphone systems, automatic analysis is provided for quantitative urban space monitoring. The quantitative data is used for boosting architectural and civil engineering design of future urban spaces.

COURSES (depends on your home department):
ARK-53806 Sustainable Design Studio
RAK-13106 Sustainable Development Studio
SGN-81006 Signal Processing Innovation Project

PARTICIPATION:
Enroll in one of the above courses and come to the Opening Session, August 25 2015, 10:00-12:00, RO104, where the overall description is given and the project groups will be formed. The work will be supervised by researchers from the Department of Signal Processing, School of Architecture and Department of Civil Engineering.

FOR MORE INFORMATION:
Harry Edelman (School of Architecture / Dept. of Civil Engineering)
Joni Kämäräinen (Dept. of Signal Processing - video processing)
Tuomas Virtanen (Dept. of Signal Processing - audio processing)

AD#2, Participate in a study, get a movie ticket!
Invitation to Data Collection Campaign
•  A project in the Department of Signal Processing needs speech data for research purposes.
•  Your task is to read out simple English sentences from a script. Takes 25 minutes.
•  Reward: a movie ticket.
•  How to participate?
   –  We need two persons per recording, so come with a friend. If you are alone, we could try to pair you.
   –  Sign up via email: aleksandr.diment@tut.fi
   –  The sessions take place on 24-28.8 during office hours, or at a different time upon agreement.
