Course goals
SGN-14006 / A.K.
Introduction
Lecture timeline (some changes may still take place)
• Sound, audio signals, acoustics
• Hearing
• Basic audio signal processing operations
  – AD/DA conversion, filters and filter banks, dynamic control, etc.
• Sound synthesis
• Audio coding
• Speech production anatomy, phonetics
• Linear prediction, MFCCs, and cepstrum
• Speech coding
• Speech synthesis

What is not covered by this course
• Speech recognition, audio content analysis, and acoustic pattern recognition
  – Course SGN-24006 "Analysis of Audio, Speech and Music Signals" (period 4)
• Analog audio
  – Electroacoustics, microphone and loudspeaker design
  – See the course "Akustiikan mittaukset" (Acoustic Measurements)
• Hardware implementations
Exercises
Practical arrangements
Project work
• Implementing an audio signal processing algorithm in Matlab
  – In two-person groups
• Topic(s) will be introduced later during the lectures
• Requirements:
  – Choosing the topic
  – Implementing the algorithm
  – Final report by 28.10.
• More detailed instructions will appear on the course home page

Reference material
• Gold, Morgan, Ellis, "Speech and audio signal processing," Wiley, 2011.
• Zölzer, "Digital audio signal processing," Wiley & Sons, 2nd ed., 2008.
  – Including AD/DA conversion, dynamic control, equalization, filter banks
• T.F. Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice," Prentice Hall PTR, 2002.
• Rossing, "The science of sound," Addison-Wesley, 1990.
  – Acoustics, hearing
• Brandenburg, Kahrs, "Applications of digital signal processing to audio and acoustics," Kluwer Academic Publishers, 1998.
  – Chapter on perceptual audio coding
• Pulkki, Karjalainen, "Communication acoustics," Wiley, 2015.
Audio signals
• Where is audio and speech processing needed?
• Examples:
  – Convert a musical piece into compressed mp3 format and store it on a hard disc for playback later (audio coding)
  – Encode a speech signal on a mobile phone before transmission
  – Add reverberation to a sound, correct the pitch of a singer (studio technology)
  – Enhance the quality of a speech signal (denoising, echo cancellation)
  – Compensate for loudspeaker non-idealities by digital equalization
• Typical digital signal processing system:
  1. Digitize a signal (sampling, quantization)
  2. Process in digital form (store, manipulate, etc.)
     – digital representation enables a variety of algorithms
  3. Convert back to an analog signal

• Different applications employ different representations
  – Time domain representation
  – Frequency domain representation
  – Time-frequency domain representation
• On this course we consider mainly music and speech
  – Music signals involve a wide variety of sounds, and billions of people listen to music worldwide
  – Speech signals are an important special category of sound signals due to their importance for communication
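The digitization step of the typical system listed above (sampling followed by quantization) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `sample_sine` and `quantize`, the 8 kHz sampling rate, and the choice of a uniform rounding quantizer over [-1, 1) are not from the slides.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples, amplitude=1.0):
    """Sampling: evaluate a sine wave at the instants n / fs_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_samples)]

def quantize(x, n_bits):
    """Uniform quantization of samples to n_bits (assumed range [-1, 1)).

    Each sample is rounded to the nearest multiple of the step size and
    clipped to the representable range of a signed n_bits integer code.
    """
    half_levels = 2 ** (n_bits - 1)   # e.g. 128 for 8 bits
    step = 1.0 / half_levels          # quantization step size
    out = []
    for v in x:
        code = round(v / step)
        code = max(-half_levels, min(half_levels - 1, code))  # clip
        out.append(code * step)
    return out

# 10 ms of a 440 Hz tone sampled at 8 kHz, then quantized to 8 bits.
x = sample_sine(440.0, 8000.0, 80, amplitude=0.9)
xq = quantize(x, 8)
max_error = max(abs(a - b) for a, b in zip(x, xq))
print("largest quantization error:", max_error)
```

As long as the signal stays inside the quantizer range, the error of a rounding quantizer is at most half a step, which is why adding bits lowers the quantization noise.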
• Large time scale illustrates the sound amplitude envelope
• Example signal: one note from the oboe
  – Amplitude is zero before the sound starts
  – The oboe has continuous excitation, therefore the sound's amplitude envelope remains nearly constant throughout its duration

• Zoom-in of the same oboe signal at time t = 0.45 s
• 90 ms frame illustrates the periodic waveform
  – Many sounds are periodic, for example most musical instrument sounds and vowels in speech
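One simple way to obtain an amplitude envelope like the one described for the large time scale is to compute the RMS level of consecutive short frames. The sketch below is an assumption-laden illustration: it uses a synthetic 250 Hz tone preceded by silence as a stand-in for the oboe note (the recording itself is not available here), and the helper name `rms_envelope` and the 20 ms frame length are hypothetical choices.

```python
import math

def rms_envelope(x, frame_len):
    """RMS level of consecutive, non-overlapping frames of x."""
    env = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        env.append(math.sqrt(sum(v * v for v in frame) / frame_len))
    return env

fs = 8000                                    # assumed sampling rate in Hz
silence = [0.0] * 800                        # 0.1 s before the sound starts
tone = [0.5 * math.sin(2 * math.pi * 250 * n / fs)
        for n in range(2400)]                # 0.3 s steady tone
env = rms_envelope(silence + tone, 160)      # 20 ms frames

for i, level in enumerate(env):
    print(f"{i * 160 / fs:.2f} s  {level:.3f}")
```

The envelope is zero before the onset and then stays constant, which mirrors the behaviour described for a continuously excited instrument.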
Frequency domain representation – spectrum
• Obtained by computing discrete Fourier transform (for example) of the time-domain signal, usually in a short frame
• Many perceptually important properties are more clearly visible in the frequency domain
• Decibel scale for amplitude is useful from the viewpoint of the human hearing and the dynamics of natural sounds
  – Due to Fechner's law (subjective sensation is proportional to the logarithm of the stimulus intensity)
• Phases are perceptually less important – often omitted

Consider log-frequency and dB-magnitude
• Linear scale
  – usually hard to "see" anything
• Log-frequency
  – each octave is approximately equally important perceptually
• Log-magnitude
  – perceived change from 50 dB to 60 dB about the same as from 60 dB to 70 dB
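The spectrum computation described above (DFT of a short frame, magnitude on a decibel scale, phase discarded) might look roughly as follows. This is a minimal sketch, not the course's implementation: the function name `magnitude_spectrum_db`, the Hann window, and the 64-sample test frame are assumptions, and a direct DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math

def magnitude_spectrum_db(frame, floor=1e-12):
    """dB magnitude spectrum (bins 0..N/2) of one Hann-windowed frame.

    The phase of each DFT coefficient is discarded; only abs() is kept.
    """
    n = len(frame)
    # The Hann window reduces leakage caused by cutting out a short frame.
    windowed = [v * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, v in enumerate(frame)]
    spectrum_db = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(windowed))
        # floor avoids log10(0) for bins that contain no energy
        spectrum_db.append(20 * math.log10(max(abs(coeff), floor)))
    return spectrum_db

fs = 8000                                    # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(64)]
spec = magnitude_spectrum_db(frame)
peak_bin = spec.index(max(spec))
print("peak at", peak_bin * fs / 64, "Hz")

# On the dB scale a 10 dB step is always the same amplitude ratio,
# whether it goes from 50 dB to 60 dB or from 60 dB to 70 dB.
print("amplitude ratio for 10 dB:", 10 ** (10 / 20))
```

Plotting `spec` against a logarithmic frequency axis would give the log-frequency, dB-magnitude view that the slide recommends.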
• Shows sound intensity as a function of time and frequency
• Obtained by blocking the signal into short analysis frames and by computing their spectra
• For audio, the frame size is typically 10–100 ms: sound spectra are often nearly stationary at that time scale

• Sound decays gradually after the onset
• Instantaneous excitation: string is plucked at onset
• Periodic sound (vibrating string, covered in the Acoustics lecture)
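The frame-blocking procedure behind the time-frequency representation described above can be sketched as follows. The details are assumptions made for illustration: the function name `spectrogram`, the 20 ms Hann-windowed frames with 50 % overlap, and the synthetic test signal whose frequency jumps from 500 Hz to 1500 Hz. A direct DFT keeps the example free of external libraries; a real implementation would normally use an FFT.

```python
import cmath
import math

def spectrogram(x, frame_len, hop):
    """Magnitude spectra (bins 0..frame_len/2) of overlapping frames of x."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    # Complex exponentials of the DFT, computed once and reused.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len)
               for m in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(seg[i] * twiddle[(k * i) % frame_len]
                        for i in range(frame_len))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames

fs = 8000                                    # assumed sampling rate in Hz
x = ([math.sin(2 * math.pi * 500 * n / fs) for n in range(800)] +
     [math.sin(2 * math.pi * 1500 * n / fs) for n in range(800)])
S = spectrogram(x, frame_len=160, hop=80)    # 20 ms frames, 10 ms hop

for t in (0, len(S) - 1):
    peak = S[t].index(max(S[t]))
    print(f"frame {t}: strongest component near {peak * fs / 160:.0f} Hz")
```

Each row of `S` is the spectrum of one frame, so reading down the rows shows how the frequency content changes over time.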
Example audio signal: snare drum
Example audio signals: snare drum (2)
Example audio signals: snare drum (3)
Example audio signals: snare drum (4)
• Polyphonic music consists of a mix of several sound sources (linear superposition)

• Spectrogram reveals e.g. the rhythmic structure
Speech: time domain signal (1)
• One sentence ("He knew what taboos he was violating.")
• Speech can be viewed as a sequence of phonemes

Speech: time domain (2)
• Zooming in to different phonemes
  – Left: vowel "e" in "He" (voiced: periodic)
  – Right: "t" in "taboos" (unvoiced: "noisy")
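The difference between a periodic (voiced) and a noise-like (unvoiced) segment can be made measurable with a normalized autocorrelation, which is close to 1 at a lag of one period for periodic signals and stays small for noise. This technique is not named on the slides; it is introduced here only as an illustration. The function name `periodicity`, the lag range, and both synthetic test signals (three harmonics of 125 Hz, and Gaussian noise) are assumptions standing in for the real speech segments.

```python
import math
import random

def periodicity(frame, min_lag, max_lag):
    """Return (best_lag, peak) of the normalized autocorrelation.

    For each lag the frame is correlated with a shifted copy of itself
    and normalized by the energies of the two overlapping parts.
    """
    n = len(frame)
    best_lag, peak = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[:n - lag]
        b = frame[lag:]
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        r = num / den if den > 0 else 0.0
        if r > peak:
            best_lag, peak = lag, r
    return best_lag, peak

fs = 8000                                    # assumed sampling rate in Hz
# Voiced-like stand-in: harmonics of 125 Hz, i.e. a period of 64 samples.
voiced = [sum(math.sin(2 * math.pi * 125 * h * n / fs) / h for h in (1, 2, 3))
          for n in range(480)]
# Unvoiced-like stand-in: Gaussian noise with a fixed seed.
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(480)]

# Lags 20..160 correspond to fundamental frequencies of 400 Hz down to 50 Hz.
lag_v, peak_v = periodicity(voiced, 20, 160)
lag_u, peak_u = periodicity(unvoiced, 20, 160)
print("voiced-like:   lag", lag_v, "peak", round(peak_v, 3))
print("unvoiced-like: lag", lag_u, "peak", round(peak_u, 3))
```

The best lag for the periodic signal is a multiple of its period, so `fs / lag` gives an estimate of the fundamental frequency or one of its subharmonics.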
COURSES (depends on your home department):
ARK-53806 Sustainable Design Studio
RAK-13106 Sustainable Development Studio
SGN-81006 Signal Processing Innovation Project

This course module invites students from signal processing, architecture and civil engineering.

GOAL:
Help signal processing engineers to understand the needs of urban design and help architects and civil engineers to understand the potential of modern ICT in quantitative analysis of urban spaces. With the help of camera and microphone systems, automatic analysis is provided for quantitative urban space monitoring. The quantitative data is used for boosting architectural and civil engineering design of future urban spaces.

PARTICIPATION:
Enroll in one of the above courses and come to the Opening Session August 25 2015 10:00-12:00 RO104 where the overall description is given and the project groups will be formed. The works will be supervised by the researchers from Department of Signal Processing, School of Architecture and Department of Civil Engineering.

FOR MORE INFORMATION:
Harry Edelman (School of Architecture / Dept. of Civil Engineering)
Joni Kämäräinen (Dept. of Signal Processing - video processing)
Tuomas Virtanen (Dept. of Signal Processing - audio processing)