
Chapter 1: Introduction

• General introduction
• Communication by sound and voice

– Examples of communication situations

• Systems approach to communication
• Modeling and theory formation in research


M. Karjalainen

Information Transmission by Sound
Environmental orientation by sound

Communication by Speech
Speech communication via acoustic medium

Communication by Music

Music via acoustic medium

Communication by Music

Origins of speech and music?

Speech has been important in evolution, but what about music? Role of music: just a side product or an important factor? - Charles Darwin: important for mating etc.

Two interesting recent books:

Steven Mithen: “The Singing Neanderthals: The Origins of Music, Language, Mind, and Body”, Harvard University Press, 2006

Daniel J. Levitin: “This Is Your Brain on Music: The Science of a Human Obsession”, Plume, 2006

Speech Transmission
Speech communication via electronic medium
Virtual Acoustic Reality
Virtual instrument in virtual space

Man-Machine Communication by Speech
Speech synthesis and recognition

A Black-Box Approach

Input-output relationship

A Systems Approach

A multi-level system

Systemic Concepts

• System
• Element (part of a whole, entity)
• Relation / property
• Structure (relatively permanent properties of a system)
• Function(ality) (relatively variant properties of a system)
• Event (a relatively discrete change, typically in time)
• Process
• Organization
• Hierarchy / heterarchy
• Data / information / knowledge (communication, language)
• State
• Object
• Type (class)
• Control

Abstraction in Modeling and Theory Formation

Abstraction hierarchy

Communication by Sound and Voice

[Figure: diagram relating Analysis and Synthesis to the levels Physics, Signals, Information, and Cognition, with the labels hardware, software, contentware, and functionware]


Chapter 2: Acoustics

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.


Chapter 2: Acoustics

Sound as physical phenomenon

When a tree falls in a forest and there is no one to listen, does it make a sound?

• Vibration – generation of sound
• Sound radiation
• Sound propagation
• Reflection, absorption
• Diffraction, refraction
• Standing waves
• Resonance, resonators

Vibrating systems

• Simple vibration: mass–spring system
• Undamped and damped oscillation
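As a rough illustration of the mass–spring system, the sketch below computes the undamped natural frequency f0 = sqrt(k/m) / (2π) and evaluates one underdamped solution x(t) = A·exp(−αt)·cos(ω_d·t). The function names, parameter names, and numeric values are assumptions chosen for illustration; they are not taken from the slides.

```python
import math

def natural_frequency(mass, stiffness):
    """Undamped natural frequency in Hz: f0 = sqrt(k/m) / (2*pi)."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)

def damped_displacement(t, mass, stiffness, damping, amplitude=1.0):
    """One underdamped solution x(t) = A * exp(-alpha*t) * cos(omega_d*t).

    alpha = damping / (2*mass) is the decay rate and
    omega_d = sqrt(k/m - alpha^2) is the damped angular frequency.
    """
    alpha = damping / (2.0 * mass)
    omega0_sq = stiffness / mass
    if alpha * alpha >= omega0_sq:
        raise ValueError("system is not underdamped")
    omega_d = math.sqrt(omega0_sq - alpha * alpha)
    return amplitude * math.exp(-alpha * t) * math.cos(omega_d * t)

# Hypothetical example: 1 kg mass, 100 N/m spring, light damping.
f0 = natural_frequency(1.0, 100.0)
samples = [damped_displacement(n * 0.01, 1.0, 100.0, 0.5) for n in range(100)]
```

Setting `damping` to zero gives the undamped case, in which the amplitude stays constant.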
Resonance

Mass-spring resonator
Helmholtz resonator

Two-mass vibrating system

Transversal and longitudinal vibration of a two-mass system

Vibration modes of a string

Wave propagation

Wave equation:
D’Alembert:
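In standard notation (the symbols y, x, t, c, f, and g are an assumption and may differ from the slide's own), the one-dimensional wave equation and D'Alembert's traveling-wave solution can be written as:

```latex
\frac{\partial^2 y}{\partial t^2} = c^2 \, \frac{\partial^2 y}{\partial x^2},
\qquad
y(x,t) = f(x - ct) + g(x + ct)
```

Here c is the propagation speed, and f and g are arbitrary waveforms traveling in the positive and negative x directions.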
Sound pressure, sound pressure level, decibel

Sound pressure: p [Pa]
Sound pressure level:
Reference:
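A minimal sketch of the decibel conversion, assuming the standard definition L_p = 20·log10(p / p0) with the usual reference pressure in air, p0 = 20 µPa. The function names are hypothetical.

```python
import math

P_REF = 20e-6  # assumed reference sound pressure in air: 20 micropascals

def spl_db(pressure_pa, p_ref=P_REF):
    """Sound pressure level in dB for an RMS sound pressure in pascals."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / p_ref)

def pressure_from_spl(level_db, p_ref=P_REF):
    """Inverse conversion: RMS sound pressure in pascals from a level in dB."""
    return p_ref * 10.0 ** (level_db / 20.0)
```

With these definitions an RMS pressure of 1 Pa corresponds to about 94 dB, and every tenfold increase in pressure adds 20 dB.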
Wave phenomena: spherical wave

Spherical wave:

Sound velocity in the air:
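The two quantities named on this slide can be sketched numerically. The code assumes the common linear approximation c ≈ 331.3 + 0.606·T (T in °C, dry air) and the free-field 1/r pressure decay of a spherical wave; both are textbook approximations rather than formulas taken from the slide.

```python
import math

def speed_of_sound(temp_celsius):
    """Approximate speed of sound in dry air (m/s), linear in temperature."""
    return 331.3 + 0.606 * temp_celsius

def level_change_db(r1, r2):
    """Level change in dB when moving from distance r1 to r2 from a point
    source, assuming sound pressure decays as 1/r."""
    return -20.0 * math.log10(r2 / r1)
```

Doubling the distance from the source lowers the level by about 6 dB under this assumption.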

Wave phenomena: planar wave

Planar wave in a tube:
Reflection (and transmission):

Lowest resonance modes in a tube
Open ends
One end closed
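The resonance frequencies of an ideal tube follow from fitting standing waves to the end conditions. The sketch below assumes a lossless tube without end corrections: f_n = n·c/(2L) for two open ends and f_n = (2n − 1)·c/(4L) for one closed end. Function names and the default speed of sound are illustrative assumptions.

```python
def open_tube_modes(length_m, count=3, c=343.0):
    """Tube open at both ends: f_n = n * c / (2 * L)."""
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]

def closed_tube_modes(length_m, count=3, c=343.0):
    """Tube closed at one end: f_n = (2n - 1) * c / (4 * L), odd harmonics only."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]
```

For the same length, the closed tube has its lowest mode an octave below that of the open tube.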
Spectral content of string vibration

Bar and membrane modes

Bar
Membrane

Reflection and refraction (bending)

Diffraction

Sound propagation paths in a room

Sound field decay in a room

Tapiola-sali (Tapiola Hall)

Sound field in a room, computer simulation

Sound field level in a reverberant room

Modal behavior in a room

L_i = dimensions of a rectangular room
n_i = integer indices 0, 1, 2, …

Measured magnitude response in a room
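With the symbols L_i and n_i defined above, the mode frequencies of a rectangular room are commonly written as f = (c/2)·sqrt(Σ (n_i / L_i)²). The sketch below assumes this standard rigid-wall formula; the function names and the example room are hypothetical.

```python
import itertools
import math

def room_mode_frequency(indices, dimensions, c=343.0):
    """Mode frequency in Hz for integer indices (n_x, n_y, n_z) and room
    dimensions (L_x, L_y, L_z) in metres, assuming rigid walls."""
    return (c / 2.0) * math.sqrt(
        sum((n / length) ** 2 for n, length in zip(indices, dimensions)))

def lowest_modes(dimensions, max_index=2, count=5, c=343.0):
    """Return the 'count' lowest (frequency, indices) pairs, skipping (0, 0, 0)."""
    modes = []
    for indices in itertools.product(range(max_index + 1), repeat=len(dimensions)):
        if any(indices):
            modes.append((room_mode_frequency(indices, dimensions, c), indices))
    return sorted(modes)[:count]

# Hypothetical 5 m x 4 m x 3 m room.
modes = lowest_modes((5.0, 4.0, 3.0), c=340.0)
```

The lowest modes are axial modes along the longest dimensions, which is why small rooms show widely spaced resonances at low frequencies.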
Sound propagation by image source model
Solid line = real path; dotted line = virtual path

Electroacoustics: Loudspeaker
Dynamic loudspeaker
principle
driver structure
enclosure

Electroacoustics: Microphone
Condenser microphone
principle
construction

Chapter 3: Sound and Voice as Signals

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.


Sound and Voice as Signals

In signal representations a physical or abstract variable is typically represented as a function of time, such as:

Signal as a mathematical function:

Pure tone:

Random signal:

Discrete-time numeric sequence

Sound and Voice as Signals

Graphical presentations:

[Figure panels: sine wave, sample sequence, random noise, speech waveform, unit impulse, unit pulse]

Linear and time-invariant (LTI) systems

Properties of LTI systems:

• Any (stable) LTI system can be fully represented by its impulse response

• Output cannot include any frequencies that are not in the input (no nonlinear distortion)

• Any bandlimited LTI system can be approximated by digital filters with arbitrary accuracy (theoretically)
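The first property can be made concrete with discrete convolution: once the impulse response h(n) is known, the output for any input x(n) is y(n) = Σ_k x(k)·h(n − k). The following is a minimal sketch with a hypothetical function name, not an optimized implementation.

```python
def convolve(x, h):
    """Direct-form discrete convolution of input x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk   # each input sample launches a scaled copy of h
    return y

# Feeding a unit impulse returns the impulse response itself.
impulse_response = [0.5, 0.25]
assert convolve([1.0], impulse_response) == impulse_response
```

Each input sample contributes a scaled, delayed copy of the impulse response, and the output is the sum of these copies.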
Signal processing algorithms

Convolution
Fourier analysis

Signal processing algorithms

Fourier synthesis
Convolution vs. Fourier transform

Decomposition of sawtooth waveform
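The sawtooth decomposition can be reproduced by Fourier synthesis. The sketch assumes a sawtooth that rises from −1 to 1 over each period, whose Fourier series is s(t) = (2/π)·Σ_k (−1)^(k+1)·sin(2π·k·f0·t)/k. The function name and arguments are illustrative.

```python
import math

def sawtooth_partial_sum(t, f0, harmonics):
    """Sum the first 'harmonics' sine components of a rising sawtooth
    with fundamental frequency f0 (Hz) at time t (s)."""
    total = 0.0
    for k in range(1, harmonics + 1):
        total += (-1) ** (k + 1) * math.sin(2.0 * math.pi * k * f0 * t) / k
    return 2.0 / math.pi * total

# One period of a 1 Hz sawtooth approximated with 10 harmonics.
approximation = [sawtooth_partial_sum(n / 100.0, 1.0, 10) for n in range(100)]
```

Adding harmonics sharpens the ramp, while the overshoot near the discontinuity (Gibbs phenomenon) does not vanish.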
Spectrum analysis
Magnitude spectrum
Phase spectrum
Phase delay
Group delay

Fourier analysis with windowing

• Rectangular window
• Hamming window
• Hann(ing) window
• Kaiser window
• Blackman (Blackman-Harris) window

Spectrum analysis using Fourier analysis with windowing
[Figure panels: sine wave; sine wave windowed synchronously; sine wave windowed non-synchronously; sine wave, Hamming-windowed]
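The effect of synchronous and non-synchronous windowing can be checked with a small experiment. This sketch uses a plain DFT and a Hann window (one of the windows listed above, chosen here for simplicity) on a 64-sample frame; the frame length, the frequencies, and all names are assumptions made for illustration.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n_samples = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples)))
            for k in range(n_samples // 2 + 1)]

def hann_window(n_samples):
    """Periodic Hann window: w(n) = 0.5 - 0.5 * cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples)
            for n in range(n_samples)]

N = 64
# 10 full periods in the frame: synchronous, energy in a single bin.
sync_sine = [math.sin(2.0 * math.pi * 10 * n / N) for n in range(N)]
# 10.5 periods in the frame: non-synchronous, energy leaks to all bins.
async_sine = [math.sin(2.0 * math.pi * 10.5 * n / N) for n in range(N)]

sync_spectrum = dft_magnitude(sync_sine)
rect_spectrum = dft_magnitude(async_sine)
hann_spectrum = dft_magnitude(
    [s * w for s, w in zip(async_sine, hann_window(N))])
```

Comparing `rect_spectrum` and `hann_spectrum` far from the peak shows that the tapered window trades a wider main lobe for much lower leakage.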
Vowel spectra

Time-frequency representations: Spectrogram

Word: /kaksi/

Auto- and cross-correlation

Autocorrelation

Cross-correlation

Cepstrum

• Compute Fourier transform
• Logarithm of (power) spectrum
• Inverse Fourier transform
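The three cepstrum steps listed above can be sketched as follows. A direct DFT is used instead of an FFT to keep the example self-contained, and a small floor value (an assumption of this sketch) avoids taking the logarithm of zero.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct discrete Fourier transform (or its inverse) of a sequence."""
    n_samples = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n_samples):
        acc = sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        out.append(acc / n_samples if inverse else acc)
    return out

def power_cepstrum(x, floor=1e-12):
    """Cepstrum as inverse transform of the log power spectrum."""
    spectrum = dft(x)                                    # 1. Fourier transform
    log_power = [math.log(abs(value) ** 2 + floor)       # 2. log of power spectrum
                 for value in spectrum]
    return [c.real for c in dft(log_power, inverse=True)]  # 3. inverse transform
```

For a scaled unit impulse the power spectrum is flat, so only the zeroth cepstral coefficient is non-zero; periodic structure in the spectrum shows up as peaks at higher indices.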
Digital signal processing: DSP systems

• Analog-to-digital (A/D) converter

• Digital signal processor (+ software)

• Digital-to-analog (D/A) converter

Signal quantization: A/D conversion

• Linear quantization (PCM-coding)

• Discrete levels: 2^n (n = number of bits)

• 16–24 bits/sample in audio (about 96 dB SNR at 16 bits)

• Sample rate: 44100 or 48000 samples/sec
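The relation between word length and SNR can be sketched numerically. The rule of thumb of about 6 dB per bit gives 96 dB for 16 bits; the function below also includes the 1.76 dB term that applies to a full-scale sine wave, which is an assumption beyond what the slide states. The quantizer is a simple rounding model with hypothetical names.

```python
import math

def quantization_levels(bits):
    """Number of discrete levels for an n-bit linear quantizer: 2**n."""
    return 2 ** bits

def ideal_snr_db(bits):
    """Theoretical SNR for a full-scale sine: about 6.02 * bits + 1.76 dB."""
    return 20.0 * math.log10(2.0) * bits + 10.0 * math.log10(1.5)

def quantize(x, bits):
    """Round x in [-1, 1) to the nearest of 2**bits uniformly spaced levels."""
    step = 2.0 / quantization_levels(bits)
    q = math.floor(x / step + 0.5) * step
    return max(-1.0, min(1.0 - step, q))   # clip to the representable range
```

Each additional bit halves the quantization step and therefore improves the SNR by about 6 dB.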
Z-transform

Unit delay as building element:

Linear transform of sequence x(n):

Digital filtering can be expressed as a rational function (or polynomial) of z^-1

Digital filtering: FIR filters

FIR = finite impulse response filter

Digital filtering: IIR filters

IIR = infinite impulse response filter
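The difference between the two filter types can be seen from their difference equations. The sketch assumes the usual direct forms, y(n) = Σ b_k·x(n − k) for FIR and y(n) = Σ b_k·x(n − k) − Σ a_k·y(n − k) with a_0 = 1 for IIR; coefficient names follow common convention rather than the slides.

```python
def fir_filter(b, x):
    """FIR filter: output depends only on current and past input samples."""
    y = []
    for n in range(len(x)):
        y.append(sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0))
    return y

def iir_filter(b, a, x):
    """IIR filter with feedback coefficients a (a[0] is assumed to be 1)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
fir_response = fir_filter([0.5, 0.5], impulse)          # ends after two samples
iir_response = iir_filter([1.0], [1.0, -0.5], impulse)  # decays but never ends
```

The FIR response is exactly as long as the coefficient list, while the one-pole IIR response halves at every sample and in principle continues indefinitely.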
Linear prediction (AR-modeling)

Modeling of signal generation with flat spectrum excitation (impulse or noise) and IIR (all-pole) filter. Speech example:

[Figure panels: signal, windowed signal, FFT spectrum, LP spectra]

Neural networks

MLF = multilayer feedforward network
= multilayer perceptron
Input layer + hidden and output layer nodes
with sigmoidal nonlinearity
Backpropagation algorithm for training

Hidden Markov models (HMM)

For probabilistic modeling of state sequences
Used especially in speech recognition

Audio reproduction: loudspeaker response

Magnitude response of a non-ideal loudspeaker

Group delay response of a loudspeaker
Reproduction quality: Distortion and SNR

Nonlinearity results in distortion: Sine wave input results in generation of harmonic components A(i)

Distortion (usually given in %):

Distortion in general is discussed in later chapters

Signal-to-noise ratio (SNR):
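A small numeric sketch of both measures. It assumes one common definition of harmonic distortion, the RMS sum of the harmonics A(2), A(3), ... relative to the fundamental A(1); the slide's own formula may instead normalize by the total signal. The SNR is taken as the ratio of RMS values in decibels. Function names are hypothetical.

```python
import math

def thd_percent(harmonic_amplitudes):
    """Harmonic distortion in percent.

    harmonic_amplitudes[0] is the fundamental A(1); the remaining entries
    are the harmonic components A(2), A(3), ...
    """
    fundamental = harmonic_amplitudes[0]
    distortion = math.sqrt(sum(a * a for a in harmonic_amplitudes[1:]))
    return 100.0 * distortion / fundamental

def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)
```

For example, harmonics at 3 % and 4 % of the fundamental combine to 5 % distortion, because the components add in power rather than in amplitude.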
Response equalization

Non-flat magnitude response can be equalized (flattened) by digital filtering. Example using so-called frequency-warped filters

Chapter 4: Speech and Music

• Speech communication
• Speech production:
– Speech production mechanism
– Vocal cords – phonation
– Vocal and nasal tract – articulation
– Units and notation of speech: vowels, consonants
– Prosody of speech
– Modeling of speech production
• Singing voice
• Speech processing: analysis, synthesis, coding, recognition
• Musical instruments as sound sources
• Music signal processing
– Sound synthesis techniques
– Physical modeling
– Digital audio vs. music

Speech communication chain

Speech production mechanism

Phonation and articulation

• Vocal cords (vocal folds): phonation
– Generation and control of voiced sound at the glottis
• Vocal tract and nasal tract: articulation
– Control of voice features by the articulation organs
• Concepts:
– Glottis (vocal cord opening)
– Voiced / unvoiced / combined
– Constriction
– Formant (and antiformant)
– Vowel / consonant
– Prosodic features
Units and notation of speech – Phonetics

• Phonetics: study and description of spoken language
• Languages and language families
– Indo-European, Finno-Ugric, …
• Phonetic alphabet:
– IPA (International Phonetic Alphabet)
– Computerized: SAMPA, Worldbet, …
• Units of spoken language:
– Phoneme (smallest linguistic unit), abstract unit class
– Allophone (variant of a phoneme)
– Phone (äänne in Finnish), a concrete unit of speech
– Diphone (from the middle of one phone via the transition to the middle of the next one)
– Triphone (similar combination of three successive phones)
– Speech segment (typically a subunit of a phone)
Vowels (Finnish)
• Front–back (etisyys: etu–taka)
• Open–closed (suppeus: väljä–suppea)
• Rounded–unrounded (pyöreä–lavea)

Consonants (Finnish)
• Articulation place (ääntämispaikka):
– Labial, dental, palatal, velar, laryngeal
• Articulation manner (ääntämistapa)
– Stop consonant (klusiili), fricative (frikatiivi), nasal (nasaali), tremulant (tremulantti), lateral (lateraali), semivowel (puolivokaali)

Prosody (suprasegmental features)

• Intonation (intonaatio)
– Primarily by fundamental frequency trajectory
• Stress (paino)
– Primarily by intensity (loudness) of pronunciation
• Timing (ajoitus)
– Rhythmic pattern (primarily by segment durations)
Modeling of speech production
• Simplification of the speech production mechanism
– Acoustic model

Circuit model (transmission-line model)
• Glottal oscillator
– Varying cross-section between vocal cords
• Vocal tract as a transmission line
– Two-directional wave propagation
• Lip radiation (acoustic load)
• Variables: pressure and volume velocity

Signal model = Source-Filter model
• Source = excitation
– (a) voiced = quasiperiodic excitation
– (b) unvoiced = noiselike excitation
• Filter = vocal and nasal tract

Glottal oscillation
• Phonation = vibration of vocal folds
– Glottal opening is a function of time:
• Open phase, closed phase
• Glottal closure event generates the main excitation to the vocal tract
Formants (tract resonances)
• Example: resonances of a homogeneous tube
– Volume velocity transfer function
– A 17 cm tube corresponds to a typical male vocal tract
– Quarter-wavelength resonator with resonances at odd multiples of c/(4L)
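The homogeneous-tube example can be worked through numerically. The sketch assumes a lossless uniform tube, closed at the glottis and open at the lips, for which the volume velocity transfer function has magnitude 1/|cos(2π·f·L/c)| and resonances at (2n − 1)·c/(4L). The speed of sound of 340 m/s and the function names are assumptions of this example.

```python
import math

def tube_transfer_magnitude(freq_hz, length_m=0.17, c=340.0):
    """Magnitude of the lips-to-glottis volume velocity ratio for a
    lossless uniform tube: 1 / |cos(2*pi*f*L/c)|."""
    return 1.0 / abs(math.cos(2.0 * math.pi * freq_hz * length_m / c))

def neutral_formants(length_m=0.17, count=3, c=340.0):
    """Quarter-wavelength resonances f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]
```

For a 17 cm tube this gives formants near 500, 1500, and 2500 Hz, which is the familiar pattern of a neutral vowel.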
Vocal tract transfer functions: vowel /i/

Inhomogeneous vocal tract area profile /i/

– Constriction in frontal tract
– Cavity in the rear part of tract
– First formant down from neutral position
– Second formant up from neutral position

Radiation directivity of speech
• Omnidirectional at low frequencies
• Increased frontal directivity at high frequencies
[Figure panels: azimuth, elevation]

Singing voice

• Classical singing style
– ‘Singer’s formant’ around 3 kHz makes the voice more audible
– In soprano singing the high fundamental frequency or a harmonic component should match a formant
• Singing in popular music
– Style and way of voice production are free, since amplification makes the voice loud anyway
– Personality of voice is important
Speech processing

• Speech analysis
– Feature analysis of speech signals
• Speech synthesis
– Typically synthesis from text
• Speech recognition
– From speech to text or commands
• Speech coding
– Compression for transmission or storage
• Speech enhancement
– Improving degraded speech signals

Formant synthesis models
• Cascaded and parallel filter models

Synthesis by waveform concatenation

Overlap-add reconstruction of voiced speech

– Fundamental frequency (pitch) can be changed

Text-to-speech synthesis
• Transforming text to speech signal
– Language-dependent text processing
– Speech signal production quite language-independent
Speech coding

• Speech signal analysis
– Typically model-based (linear prediction), where source and filter parameters are analyzed from the speech signal
• Quantization of the parameters (bit compression)
• Transmission or storage of parametrized speech
• Reconstruction of parameters
• Reconstruction of speech signal
• Encoding -> transmission -> decoding

Speech recognition

• Feature analysis of signal
– Typically mel cepstral coefficients
– Compression of data & redundancy removal
• Pattern recognition
– Comparison to speech units
– Typically by Hidden Markov Models (HMM)
• Possible postprocessing
– Language modeling
• Formal grammar
• Unlimited text is difficult

Musical instrument sounds

• String instruments
– Plucked string instruments
– Struck string instruments
– Bowed string instruments
• Wind instruments
– Brass instruments
– Woodwind instruments
• Percussion instruments
– Drums etc.
Modeling of musical instruments (string modeling)
• String model
– Two-directional waveguide (transmission line)
– Excitation (pluck) inserted to both delay lines
– Wave reflections at terminations modeled as filters
– Output is taken at bridge or pickup, sum of both lines
– The same model is applicable to wind instrument bores (but there is a nonlinear oscillating feedback in them)

Simplified string modeling
• String model reduction (signal model)
– Two delay lines can be combined to one
– Filters in the loop can be combined to a single loop filter
– Computation is more efficient
– So-called Karplus-Strong model is a simplified case where an initial random noise is inserted in the delay line before synthesis and the loop filter is a simple two-tap FIR filter
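The Karplus-Strong case described above can be sketched in a few lines. The delay line is filled with random noise and each new sample is the average of two adjacent delayed samples, y(n) = 0.5·(y(n − N) + y(n − N − 1)), which is the two-tap FIR loop filter. The function name, default sample rate, and seed handling are assumptions of this sketch.

```python
import random

def karplus_strong(frequency_hz, duration_s, sample_rate=44100, seed=0):
    """Plucked-string tone by the basic Karplus-Strong algorithm."""
    rng = random.Random(seed)
    delay = max(2, int(round(sample_rate / frequency_hz)))  # delay line length N
    total = int(duration_s * sample_rate)
    # Initial random noise fills the delay line (N + 1 samples are needed
    # because the loop filter reads y(n - N) and y(n - N - 1)).
    out = [rng.uniform(-1.0, 1.0) for _ in range(delay + 1)]
    while len(out) < total:
        # Two-tap averaging loop filter: a gentle lowpass in the feedback loop.
        out.append(0.5 * (out[-delay] + out[-delay - 1]))
    return out[:total]

# Hypothetical usage: half a second of a 200 Hz tone at 8 kHz sample rate.
tone = karplus_strong(200.0, 0.5, sample_rate=8000)
```

Because the averaging filter attenuates high frequencies on every pass around the loop, the noisy attack turns into a decaying, nearly harmonic tone.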
Impulse response of a simple string model

• Impulse and magnitude responses of the previous model

Body response modeling

• String instrument body works like an LTI system (filter)
[Figure panels: impulse response, magnitude response (low frequencies)]

Chapter 5: Structure and Function of Hearing

• Peripheral hearing
– External ear
– Middle ear
– Inner ear (cochlea)
• Basilar membrane
• Hair cells
• Auditory nerve
• Active cochlea and nonlinearities
• Higher levels of the auditory system
• Basic properties of human hearing
– Effective hearing area (level vs. frequency)
– Equal loudness curves
– Technical measures related to hearing
• Sound level and frequency weighting functions
Approaches to hearing research

• Anatomy of hearing
– The structure of hearing organs is studied
• Physiology of hearing
– The (physiological) responses of hearing to physical sound stimuli are studied
• Psychology of hearing
– Functional properties of auditory perception are studied as subjects’ reactions to physical sound stimuli

The main interest here is ‘engineering psychoacoustics’ and computational models of auditory functions

Peripheral hearing

• External ear (outer ear)
• Middle ear
• Inner ear

Schematic of peripheral hearing
• External ear (outer ear)
• Middle ear
• Inner ear
External ear and ear canal transmission

• Transfer functions
– Frontal sound source to the eardrum (solid line)
– Entrance of ear canal to the eardrum (dotted line)
• Head-related transfer functions (HRTFs) discussed later

Middle ear: Bone conduction
• Ossicles
– Malleus (hammer-shaped bone)
– Incus (anvil-shaped bone)
– Stapes (stirrup-shaped bone)
• Impedance match from air to liquid (1:3000)

Animations of middle ear function

Animations: University of Wisconsin

http://www.neurophys.wisc.edu/~ychen/auditory/fs-auditory.html

Middle ear conduction and features

• Signal transfer function is a bandpass filter
• Other middle ear features:
– Acoustic reflex
– Eustachian tube
Inner ear: the cochlea

• The cochlea is a spiral-shaped, liquid-filled tube of about 2.7 turns, about 35 mm long
• Stapes vibration enters the cochlea through the oval window
• Another window to the middle ear is called the round window
• The basilar membrane divides the cochlea into two parts

[Figure: cochlea linearized]

Cross-section of the cochlea

• Basilar membrane between bony shelves
– Division to scala vestibuli and scala tympani
• Reissner’s membrane separates scala media
• Organ of Corti: hair cells
• Tectorial membrane

Basilar membrane motion: traveling waves

• Basilar membrane is a nonhomogeneous transmission line:
– Wider and more massive towards the apex
– Sound pressure entering the liquid of the cochlea generates a traveling wave along the basilar membrane
– The traveling wave has its maximum vibration amplitude at a position that depends on the frequency of the wave (characteristic frequency = C.F.)
– High frequencies resonate close to the oval window and low frequencies close to the helicotrema

Animation of basilar membrane motion
Basilar membrane response to a square-wave signal

• Time–position–amplitude pattern of basilar membrane movement as a response to a square-wave signal

Hair cells

• Inner hair cells, in one row
• Outer hair cells, in 3–5 rows
• Together about 15000–16000 hair cells
• Each hair cell is equipped on top with u-, v-, or w-shaped filaments called stereocilia
• Neural fibers are connected to hair cells

Hair cells in the organ of Corti

Stereocilia (= ‘hair bundles’ of hair cells)

Movement of the organ of Corti

Movement and activation of hair cells
Hair cells: neural conduction

• Vibration of the basilar membrane causes bending of the stereocilia, which opens ion channels and modulates the potential within the cell
• Activation of the cell releases neurotransmitter to the synaptic junctions between the hair cell and neural fibers of the auditory nerve
• A neural spike is generated that propagates in the auditory nerve fiber
• The next spike is possible only after at least 1 ms

Activation and inhibition of hair cells

• Cochlear potentials
• Asymmetrical effect of stereocilia bending on firing rate

Phase-locking and synchrony of neural firing

• Statistically phase-locked within half cycle
• Statistical synchrony of neural firing

Passive vs. active cochlea

• Georg von Békésy found basilar membrane behavior by experimentation with ears from dead animals => reduced frequency resolution
• Explanation: second filter needed
• Now it is known that the cochlea is active:
– Especially at low signal levels the outer hair cells amplify basilar membrane motion
• Outer hair cells receive many efferent neural fibers from higher neural levels
• Outer hair cells are able to change their length very rapidly (in synchrony with high audio frequencies)
• Otoacoustic emission (cochlear echo) as a response to external stimulus, recordable in the ear canal, is related to this phenomenon
Auditory nerve responses: firing rate

• Steady-state firing rate is a saturating function of sound level, starting from the spontaneous rate (= firing rate without sound excitation)
• There are fibers with different sensitivity (and spontaneous rate)

Poststimulus time histogram (PST)

• Firing rate overshoot and undershoot with onset and offset of excitation
– Works like automatic gain control

PST with steady-state sinusoidal excitation

• Statistically, half-wave rectification appears along with automatic gain control

Firing rate saturation for a vowel excitation

• For increasing level of excitation, the firing rate profile (’neural activation spectrum’) saturates

Tuning curves for constant firing level

• If the firing rate of a neural fiber is kept constant for varying excitation frequency, a tuning curve is obtained
• This characterizes the frequency selectivity of the cochlea

Effects of active cochlea

• Low-level signals are amplified substantially by active cochlea:
– Sensitivity of hearing is increased
– Due to AGC-like compression, the narrow dynamic range (about 25 dB) of hair cells is expanded to more than 100 dB
• Selectivity (frequency resolution) is increased (especially at low signal levels) due to active function
• If outer hair cells are damaged, the active amplification is degraded or disappears
– Loss of auditory sensitivity
– Tuning curves are broadened
– Otoacoustic emissions disappear

Cochlear nonlinearity: Two-tone suppression

• Addition of another tone (shaded area in the figure) suppresses the activation due to the probe tone at its characteristic frequency (= a kind of masking)

Cochlear nonlinearity: Combination tones

• Nonlinear interaction of two tones generates new tones that are perceived:
– Difference tone: $f_\mathrm{diff} = f_2 - f_1$
  • E.g.: 1.1 kHz and 1.0 kHz => 100 Hz
– Cubic difference tone: $f_\mathrm{cubic} = 2f_1 - f_2$
  • E.g.: 1.0 kHz and 1.1 kHz => 900 Hz
• Appears already at low level of excitation

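The two combination-tone relations on this slide can be checked with a few lines of Python. This is only a sketch: the function name `combination_tones` is made up for illustration, and it assumes the lower tone is taken as f1 and the higher as f2.

```python
def combination_tones(f_a_hz, f_b_hz):
    """Return (difference tone, cubic difference tone) in Hz.

    Assumption: f1 is the lower and f2 the higher of the two primaries.
    """
    f1, f2 = sorted((f_a_hz, f_b_hz))
    difference_tone = f2 - f1          # f_diff = f2 - f1
    cubic_difference_tone = 2 * f1 - f2  # f_cubic = 2*f1 - f2
    return difference_tone, cubic_difference_tone


# The slide's example: 1.0 kHz and 1.1 kHz primaries
print(combination_tones(1000.0, 1100.0))  # (100.0, 900.0)
```

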
Central auditory system

• Higher-level functions are not well known.
• Cochlear nucleus has specific cells, such as ’chopper cells’, that do temporal processing. Spectral information is recovered unsaturated.
• Binaural hearing starts at the superior olive level.
• Auditory cortex is the center for processing perceptions and integrating the sound scene.
• Interaction with other senses (vision) is strong.

Dynamic range of hearing

[Figure: sound level ’thermometer’; level changes in 6 dB, 3 dB, and 1 dB steps]

Equal loudness curves and threshold of hearing

• Equal loudness level perception; the unit is the phon (loudness level in phons = SPL of an equally loud 1 kHz tone)

Sound level and frequency weighting curves

• Weighting filters for sound level measurement (A-weighting is the most common)

Recommended frequencies and bands

• Recommended frequencies and frequency bands for measurements and technical applications:
• Octave = 2:1
• 1/2 octave
• 1/3 octave
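
As a rough illustration of octave and fractional-octave bands, the sketch below computes exact base-2 band centers around a 1 kHz reference and the edges of one band. The function names and the 1 kHz reference are assumptions made for this example; the nominal frequencies printed in standards are rounded versions of values like these.

```python
def band_centers(reference_hz=1000.0, fraction=3, steps=range(-3, 4)):
    """Exact base-2 center frequencies of 1/fraction-octave bands."""
    return [reference_hz * 2.0 ** (n / fraction) for n in steps]


def band_edges(center_hz, fraction=3):
    """Return (lower edge, upper edge) of a 1/fraction-octave band."""
    half_band = 2.0 ** (1.0 / (2 * fraction))
    return center_hz / half_band, center_hz * half_band


# Seven 1/3-octave centers from one octave below to one octave above 1 kHz
for center in band_centers():
    low, high = band_edges(center)
    print(f"{low:8.1f} {center:8.1f} {high:8.1f}")
```

For `fraction=1` the upper edge is exactly twice the lower edge, which is the 2:1 octave ratio.
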

Filtered noise demo

• White noise
• Low-pass filtered noise, decreasing cutoff frequency
• High-pass filtered noise, increasing cutoff frequency
• 1/3 octave noise, increasing center frequency
• White and pink noise

Chapter 6: Fundamentals of Psychoacoustics

• Psychoacoustics = auditory psychophysics
• Sound events vs. auditory events
– Sound stimuli types, psychophysical experiments
– Psychophysical functions
• Basic phenomena and concepts
• Masking effect
– Spectral masking, temporal masking
• Pitch perception and pitch scales
– Different pitch phenomena and scales
• Loudness formation
– Static and dynamic loudness
• Timbre
– as a multidimensional perceptual attribute
• Subjective duration of sound

Psychophysical experimentation

• Sound events ($s_i$) = physical (objective) events
• Auditory events ($h_i$) = subject’s internal events
– Need to be studied indirectly from reactions ($b_i$)
• Psychophysical function h=f(s)
• Reaction function b=f(h)

Sound events: Stimulus signals

• Elementary sounds
– Sinusoidal tones
– Amplitude- and frequency-modulated tones
– Sinusoidal bursts
– Sine-wave sweeps, chirps, and warble tones
– Single impulses and pulses, pulse trains
– Noise (white, pink, uniform masking noise)
– Modulated noise, noise bursts
– Tone combinations (consisting of partials)
• Complex sounds
– Combination tones, noise, and pulses
– Speech sounds (natural, synthetic)
– Musical sounds (natural, synthetic)
– Reverberant sounds
– Environmental sounds (nature, man-made noise)

Sound generation and experiment environment

• Reproduction techniques
– Natural acoustic sounds (repeatability problems)
– Loudspeaker reproduction
– Headphone reproduction
• Reproduction environment
– Not critical in headphone reproduction
– Anechoic chamber (free field)
  • Room effects minimized
  • Not a natural environment
– Listening room
  • Carefully designed, relatively normal acoustics
– Reverberation chamber
  • Special experiments with diffuse sound field

Psychophysical functions

• Sound event property to auditory event property mapping
– $h = a \log(s)$ (Weber, Weber-Fechner law)
– $h = c\, s^{k}$ (power law, e.g., loudness)

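The difference between the logarithmic and power-law mappings can be illustrated numerically. In this sketch the function names and the constants `a`, `c`, and `k` are placeholder assumptions (an exponent of about 0.3 is often quoted for loudness versus intensity, but any value can be substituted).

```python
import math


def log_law(s, a=1.0):
    """Weber-Fechner type mapping h = a * log(s)."""
    return a * math.log(s)


def power_law(s, c=1.0, k=0.3):
    """Power-law mapping h = c * s**k (k = 0.3 is an assumed example value)."""
    return c * s ** k


# Doubling the stimulus adds a constant under the log law ...
print(log_law(20.0) - log_law(10.0), log_law(200.0) - log_law(100.0))
# ... and multiplies the percept by a constant factor under the power law.
print(power_law(20.0) / power_law(10.0), power_law(200.0) / power_law(100.0))
```

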
Experimental concepts: Thresholds

• Threshold values
– Absolute thresholds (e.g., threshold of hearing)
– Difference thresholds (just noticeable difference, JND)
• Example: Threshold of perception:
– 50%, 75%, etc. thresholds

Experimental concepts

• Comparison of percepts
– Magnitude estimation
– Magnitude production
• Probe tone method
– Generation of a probe tone to make test tone audible/noticeable
– Modulation, canceling, interference
• Classification and scaling of percepts
– Nominal scale (rough, sharp, reverberant, …)
– Ordinal scale (percepts have ordering)
– Interval scale (numeric scale, no zero point defined)
– Ratio scale (numeric scale, zero point defined)
• Multidimensional scaling
– Semantic differentials: low – high, dull – sharp,

Psychoacoustic experiments

• Description of auditory events
– Oral or written description
• Method of adjustment
– Adjusting a stimulus to correspond to a reference
• Selection methods
– Forced choice methods (select one!):
  • Two alternative forced choice (TAFC, 2AFC)
• Method of tracking
– Tracking with varying stimulus
  • Békésy audiometry
• Bracketing method
– Descending and ascending bracketing
• Yes/no answering
• Reaction time measurement
– Indicates the difficulty of decision task

Békésy audiometry

• Slow frequency sweep and level tracking

Typical psychoacoustical test types

• AB test
– Set in preference order / select one
– AB hidden reference (one must be recognized)
• AB scale test
– As AB but assign numeric values for A and B
• ABC test
– A is fixed reference (anchor point) for assigning values for B and C
• ABX test
– Which one, A or B, is equal to X?
• TAFC (2AFC)
– Two alternative forced choice
• Formation of a listening test panel
• Formation of a description language

Masking effect

• ”A loud sound makes a weaker sound imperceptible”
• Categories and aspects of masking
– Frequency masking
– Temporal masking
– Time-frequency masking
– Frequency selectivity of the auditory system
– Psychophysical tuning curves
– Critical band
  • Bark bandwidth
  • ERB bandwidth
• Masking tone and test tone

Frequency masking

• Masking by white noise

Frequency masking

• Masking by narrow-band noise (0.25, 1, 4 kHz)

Frequency masking

• Frequency masking as a function of masker level

Frequency masking

• Frequency masking by lowpass and highpass noise

Frequency masking

• Frequency masking by 1 kHz sinusoidal signal

Frequency masking

• Frequency masking by a complex tone (harmonic complex)

Temporal masking

• Masking before and after a noise signal

Temporal masking

• Beginning of postmasking

Temporal masking

• Postmasking as a function of time
– For 200 ms long masker
– For 5 ms long masker

Time-frequency masking

• Masking of a tone burst in time and frequency by a time-frequency block of noise

Temporal masking

• Masking due to an impulse train

Frequency selectivity of hearing

• Masking curves tell much about auditory selectivity
• Psychophysical tuning curves match with physiological curves

Critical band experiment

• Experiment: loudness vs. bandwidth of noise

Critical band

• Loudness vs. bandwidth of noise
– Loudness increases when bandwidth exceeds a critical band

Critical band (Bark band) vs. frequency

• Critical band (Bark band) $\Delta f_G$ vs. mid frequency
• Ref: just noticeable tone frequency change vs. frequency

Critical band: 24 Bark bands (Zwicker)

ERB band experiment

• ERB = Equivalent Rectangular Bandwidth
• Loudness of a tone is measured as a function of frequency gap in masking noise around the test tone
• ERB band is narrower than Bark band, especially at low frequencies

Pitch scales

• Pitch = subjective measure of tone height
• Mel scale
• Bark scale
• ERB scale

[Equations: mel scale (two alternative forms), Bark scale (two alternative forms with inverse function), ERB scale (with inverse)]

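The equations for the mel, Bark, and ERB scales did not survive in the text here. As a stand-in, the sketch below uses approximations that are widely quoted in the literature: the 2595·log10 mel formula, the Zwicker-Terhardt and Traunmüller Bark approximations, and the Glasberg-Moore ERB-rate formula. Whether these are the exact forms shown on the original slide is an assumption, and all function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Mel scale, common 2595*log10(1 + f/700) approximation (assumed form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_bark(f_hz):
    """Bark scale, Zwicker-Terhardt arctan approximation (assumed form)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def hz_to_bark_traunmueller(f_hz):
    """Bark scale, Traunmueller approximation, which has a closed-form inverse."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz_traunmueller(z_bark):
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def hz_to_erb_rate(f_hz):
    """ERB-rate scale, Glasberg-Moore approximation (assumed form)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


for f in (100.0, 1000.0, 4000.0):
    print(f, hz_to_mel(f), hz_to_bark(f), hz_to_erb_rate(f), erb_bandwidth_hz(f))
```

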
Logarithmic pitch scale

• Logarithmic scale used in music and audio
• Frequency ratios more important than absolute frequencies
• Octave and ratios of small integers important

Comparison of pitch scales

• Pitch scales are related to place coding on the basilar membrane, although they are measured by psychoacoustic experiments

Comparison of pitch scales

• Comparison (log reference) of:
– logarithmic scale
– ERB scale
– Bark scale
– linear scale

Comparison of pitch scales

• Comparison (linear reference) of:
– logarithmic scale
– ERB scale
– Bark scale
– linear scale

Pitch

• Continues in file KA6b

Pitch phenomena

Cont’d from file 6a

• Pitch of a pure tone as a function of amplitude
– Individually varying property

JND of frequency modulation

• Frequency modulation JND threshold
– As a function of carrier frequency
– As a function of modulation frequency
– About 4 Hz modulation most easily perceivable

Minimum duration of a tone for pitch percept

• Duration to make pitch perceivable
– Duration in milliseconds
– Duration of two cycles as a reference

JND pitch change vs. tone duration

• Threshold of perceived pitch variation increases below 200 ms duration

Pitch strength

• How strong or weak is a pitch percept?

Pitch phenomena and theories

• Place (spectral) pitch vs. temporal pitch theories
• Spectral pitch (due to spectral peak)
• Temporal pitch (periodicity)
• Missing fundamental
• Virtual pitch
• Repetition pitch
• Pitch of inharmonic signals
• Absolute pitch (memory)

Loudness

• Loudness is the perceived subjective ’strength’ (’volume’, ’intensity’, etc.) of a sound
– Subjective scale defined in relation to physical scale
– Unit is sone: 1 sone corresponds to 40 dB SPL at 1 kHz

Loudness of a sinusoidal tone

• Loudness N vs. SPL of a 1 kHz tone
– Power law found to match best

[Equations: loudness vs. loudness level as a power law, and a more precise form]

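The relation between loudness in sones and loudness level in phons is commonly approximated by a doubling of loudness for every 10 phon. The sketch below uses that textbook rule, anchored at 1 sone = 40 phon as stated on the slide; it is an assumption that the slide's missing equations had this form, and the rule is usually considered valid only above roughly 40 phon.

```python
import math


def phon_to_sone(loudness_level_phon):
    """Loudness N in sones, assuming N doubles for every +10 phon above 40."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone):
    """Inverse of phon_to_sone under the same assumption."""
    return 40.0 + 10.0 * math.log2(loudness_sone)


for level in (40, 50, 60, 80):
    print(level, "phon ->", phon_to_sone(level), "sone")
```

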
Partial loudness (by noise masking)

• Partial loudness of 1 kHz tone in presence of masking noise
– As a function of tone level and masking noise level

Loudness example: two tones

• Loudness of a pair of tones as a function of frequency difference
– Slow beat range: loudness due to peaks (6 dB over 60 dB)
– Medium rate fluctuation: power doubled => 3 dB increase
– Fast fluctuation: wideband signal => loudness doubled (10 dB)

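The 6 dB and 3 dB figures in the two-tone example follow from basic decibel arithmetic: at the beat peaks two equal amplitudes add (amplitude ratio 2), while for faster fluctuation the powers add (power ratio 2). A small sketch, with illustrative function names:

```python
import math


def db_from_amplitude_ratio(ratio):
    """Level change in dB for a given amplitude (pressure) ratio."""
    return 20.0 * math.log10(ratio)


def db_from_power_ratio(ratio):
    """Level change in dB for a given power (intensity) ratio."""
    return 10.0 * math.log10(ratio)


peak_increase_db = db_from_amplitude_ratio(2.0)   # two equal tones in phase
power_increase_db = db_from_power_ratio(2.0)      # two equal tones, powers added
print(round(peak_increase_db, 2), round(power_increase_db, 2))
```

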
Loudness density

• Loudness computation (Zwicker formulation)
• Excitation signal => power spectral density on the Bark scale
• Spreading function B(z) [equation]
• Convolution by spreading function [equation]
• Total loudness [equation]

Loudness computation, examples

• Left: excitation level for sinusoidal tone and white noise
• Right: loudness density for sinusoidal tone and white noise

Loudness graphically

• Graphical chart determination of loudness (Zwicker)

JND of loudness level

• Just noticeable difference by amplitude modulation
– Modulation of 1 kHz tone
– Modulation of white noise
– Modulation frequency 4 Hz

JND of loudness level

• Just noticeable difference by amplitude modulation
– As a function of modulation frequency
– Modulation of 1 kHz tone
– Modulation of white noise

Modulation detection

• Detection of amplitude and frequency modulation
– Amplitude modulation easily detectable by ’off-band listening’ (loudness modulated due to upper spreading slope variation)
– No slope variation in frequency modulation

Loudness vs. duration

• Temporal integration of loudness for duration < 200 ms
– Loudness level decreases 10 phon for a 10-fold decrease in duration

Loudness formation temporally

• Loudness formation for different durations of a tone burst
– Peak value of total loudness is tracked in time-varying cases

Timbre (perceived ’sound color’)

• Timbre is a multidimensional attribute of sound
– For stationary sounds:
  • Spectrum (loudness spectrum)
  • Periodicity (periodic, multiperiodic, noise-like)
  • Repetitiveness (reflections, reverberation, spatialness)
– For time-varying signals
  • Amplitude envelope important
  • Amplitude envelope at each critical band
– For transients and onsets
  • Changes are more prominent than steady-state parts, especially onsets

Subjective duration

• Subjective vs. objective duration

Auditory Demonstrations 1

1 Cancelled harmonics
2-6 Critical bands by masking
7 C.B. by loudness comparison
8-11 The decibel scale
12-16 Filtered noise
17-18 Frequency response of the ear
19-20 Loudness scaling
21 Temporal integration
22 Asymmetry of masking by pulsed tones
23-25 Backward and forward masking
26 Pulsation threshold

Auditory Demonstrations 2

27-28 Dependence of pitch on intensity
29 Pitch salience and tone duration
30 Influence of masking noise on pitch
31 Octave matching
32 Stretched and compressed scales
33 Frequency difference limen
34-35 Log and lin frequency scales
36 Pitch streaming
37 Virtual pitch (missing fundamental)
38-39 Shift of virtual pitch
40-42 Masking spectral and virtual pitch

Auditory Demonstrations 3

43-45 Virtual pitch with random harmonics
46-47 Strike note of chime
48 Analytic vs synthetic pitch
49-51 Scales with repetition pitch
52 Circularity in pitch judgment
53 Effect of spectrum on timbre
54-56 Effect of tone envelope on timbre
57 Change in timbre with transposition
58-61 Tones and tuning with stretched partials
62-63 Primary and secondary beats

Chapter 7: Other psychoacoustic concepts

• Sharpness
– Spectral center of gravity
• Fluctuation strength
– Perception of slow modulations (beats)
• Impulsiveness
• Roughness
– Perception of fast modulations
• Tonality
– Periodic vs. random excitation
• Sensory pleasantness
• Psychoacoustic concepts and music
– Sensory consonance and dissonance
– Intervals, scales, and tunings
– Rhythm, tempo, bar, measure
• Perceptual organization of sound

Sharpness

• Perceived sharpness is proportional to spectral center of gravity
• Unit of sharpness is 1 acum ~ a 1 Bark wide noise band at 1 kHz with a level of 60 dB
• Sharpness for 1 Bark wide noise, lowpass noise, and highpass noise
• Increase of level from 30 dB to 90 dB doubles the sharpness

[Figure: sharpness of bandpass noises]

Computation of sharpness

• Sharpness can be estimated (without level effect) from [equation], where the weighting function is defined by a curve [figure]
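
The idea that sharpness follows the spectral center of gravity can be illustrated with a plain spectral centroid. This is a deliberately simplified sketch: it works on a linear frequency axis with arbitrary magnitudes, not on the Bark-scale specific loudness with a weighting curve that the sharpness computation on the slide refers to, and the function name is illustrative.

```python
def spectral_centroid(frequencies_hz, magnitudes):
    """Magnitude-weighted mean frequency (center of gravity) of a spectrum."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    weighted = sum(f * m for f, m in zip(frequencies_hz, magnitudes))
    return weighted / total


freqs = [100.0, 200.0, 300.0]
print(spectral_centroid(freqs, [1.0, 1.0, 1.0]))  # flat spectrum
print(spectral_centroid(freqs, [1.0, 2.0, 4.0]))  # more high-frequency energy
```

A spectrum tilted towards high frequencies gives a higher centroid, which corresponds to a sharper sound in this simplified picture.
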

Fluctuation strength

• Perception of relatively slow modulations: fluctuation strength
• Highest sensitivity to modulation at 4 Hz
• Unit of fluctuation strength is 1 vacil ~ 4 Hz 100 % modulation of 1 kHz 60 dB tone
• Figure: (a) AM broadband noise, (b) AM sinusoidal tone, (c) FM sinusoidal tone

[Examples at modulation frequencies 1 Hz, 4 Hz, and 16 Hz]

Fluctuation strength

• Left: fluctuation strength for AM (4 Hz) wideband noise (60 dB)
• Right: sine tone, 1.5 kHz, 70 dB, modulated at 4 Hz, as a function of FM deviation

Fluctuation strength

• Fluctuation strength computation: [equation]

Impulsiveness

• There is no clearly defined psychoacoustic concept of impulsiveness
• Impulsiveness is related to rapid onsets in the signal
• If the repetition rate of impulses is > 10–15 Hz, roughness is perceived
• In noise control, impulsiveness is considered to increase hearing damage risk compared to non-impulsive sound of the same energy

Roughness

• Fast (> 15 Hz) modulation is perceived as roughness
• Addition of two tones of different frequencies creates envelope fluctuation
• When the frequency difference increases, tones start to segregate
• When the frequency difference is larger than a critical band, roughness disappears

[Examples: 1 kHz + Δf, with Δf = 7 Hz, 70 Hz, and 300 Hz]

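The envelope fluctuation created by adding two tones follows from a trigonometric identity: the sum of two equal-amplitude sinusoids equals a carrier at the mean frequency multiplied by a slowly varying envelope, whose magnitude repeats at the difference frequency. The sketch below checks this numerically; the function names and the 1000 Hz / 1070 Hz example values are assumptions chosen to match the 70 Hz case.

```python
import math


def two_tone_sum(t, f1_hz, f2_hz):
    """Direct sum of two unit-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)


def envelope_times_carrier(t, f1_hz, f2_hz):
    """Same signal written as envelope 2*cos(pi*(f2-f1)*t) times a carrier."""
    envelope = 2.0 * math.cos(math.pi * (f2_hz - f1_hz) * t)
    carrier = math.sin(math.pi * (f1_hz + f2_hz) * t)
    return envelope * carrier


def beat_rate_hz(f1_hz, f2_hz):
    """Rate at which the envelope magnitude repeats."""
    return abs(f2_hz - f1_hz)


print(beat_rate_hz(1000.0, 1070.0))  # 70.0
```

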
Roughness

• Unit of roughness is 1 asper ~ 1 kHz tone, 60 dB, 100 % AM modulated at 70 Hz.
• Towards lower and higher modulation frequencies the roughness decreases

Roughness

• Roughness for different carrier frequencies as a function of AM modulation frequency with 100 % modulation.

[Examples: 1 kHz + Δf, with Δf = 7 Hz, 70 Hz, and 300 Hz]

Tonality

• Tonality (tonalness) = sound exhibits voiced component(s), periodicity
• Non-tonal sound is noise-like, non-periodic
• Non-tonal (noisy) signal masks a tonal one more easily than vice versa
• For tonality index α, critical band index i, the masking threshold is: [equation]
– (α = 0.0: non-tonal, α = 0.5: half-tonal, α = 1: fully tonal)
• Tonality with varying modal density, log. distribution of frequencies (approx/critical band): 10/CB, 20/CB, 40/CB, 80/CB

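The masking threshold equation for the tonality index is missing from the text. One widely cited form, used in perceptual audio coding (Johnston's transform coder), defines a masking offset in dB that interpolates between tone-masking-noise and noise-masking-tone. It is given here as an assumption about what the slide showed; the symbols $O_i$, $C_i$, and $T_i$ are introduced for this sketch only.

```latex
% Assumed form (Johnston-style tonality-dependent masking offset)
% \alpha : tonality index, 0 (non-tonal) ... 1 (fully tonal)
% i      : critical band index (Bark)
% C_i    : spread excitation energy in band i
O_i = \alpha\,(14.5 + i) + (1 - \alpha)\cdot 5.5 \quad [\mathrm{dB}]
\qquad
T_i = 10^{\,\log_{10} C_i \;-\; O_i/10}
```

With $\alpha = 0$ the threshold lies 5.5 dB below the excitation, and with $\alpha = 1$ it lies $14.5 + i$ dB below, consistent with a noisy masker masking a tone more easily than the reverse.
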
Sensory pleasantness

• Sensory pleasantness (example by Zwicker): [equation]
– P = sensory pleasantness
– S = sharpness
– R = roughness
– T = tonality
– N = loudness
• Product sound quality measures are often constructed by similar techniques.

Sensory consonance and dissonance

• Consonance and dissonance are closely related to roughness
• Consonance vs. dissonance of two partials: [figure]

Consonance and dissonance of harmonic tones

• Roughness due to interaction of partials in a sound contributes to dissonance
• Ratios of small integers are most consonant (just intonation)
• Consonance vs. dissonance of two harmonic complexes: [figure]

Examples of intervals

• Pythagoras noticed that intervals 2:1, 3:2, and 4:3 sound ”pleasant”
• Consonant intervals (decreasing order of consonance):
– 2:1 octave
– 3:2 perfect fifth
– 4:3 perfect fourth
– 5:3 major sixth
– 5:4 major third
– 8:5 minor sixth
– 6:5 minor third
– 16/15 (dissonant)
– 40/27 (dissonant)
• Equally tempered intervals: 1.4983 (fifth), 1.2599 (third)

Examples of intervals

• Octave and its partitioning
• Log and lin uniformly spaced scales
• Which one is the best octave?
• Stretched and compressed scales
• Circularity of pitch
• Shepard effect

Intervals, scales, tuning

• Just intonation, Pythagorean scale, (equally) tempered scale
• On a tempered scale a semitone is 1:1.05946
• 1 cent is 1/100 of a semitone
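
The semitone ratio, the cent, and the equally tempered fifth and third quoted in these slides can be reproduced directly. The function names below are illustrative; the numbers follow from dividing the 2:1 octave into 12 equal logarithmic steps.

```python
import math

SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)  # equally tempered semitone, about 1.05946


def tempered_ratio(semitones):
    """Frequency ratio of an interval spanning the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def ratio_to_cents(ratio):
    """Interval size in cents (1 cent = 1/100 of a tempered semitone)."""
    return 1200.0 * math.log2(ratio)


print(round(SEMITONE_RATIO, 5))                 # 1.05946
print(round(tempered_ratio(7), 4), 3 / 2)       # tempered vs. just fifth
print(round(tempered_ratio(4), 4), 5 / 4)       # tempered vs. just major third
print(round(ratio_to_cents(3 / 2), 3))          # just fifth in cents
print(round(ratio_to_cents(5 / 4), 3))          # just major third in cents
```

The just fifth lies about 2 cents above the tempered fifth (700 cents), while the just major third lies about 14 cents below the tempered third (400 cents).
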

Non-western scales and tunings

• The (tempered) western scale is adapted to a multitude of harmonic timbres of western instruments
• For example, Balinese gamelan music is quite different
– W. A. Sethares: Tuning, Timbre, Spectrum, Scale. Springer 1998
• Example of tuning where octave is a very dissonant interval!
• Tunings and musical scales are strongly bound with spectral properties of musical instruments

Temporal structures in music: Rhythm, tempo

• Rhythm: periodicity and repeated structure in music
• Tempo: rate of main events in music
• Beat: positioning of emphasis on some events
• Measure: basic rhythmic sequence
• Duration of a note or another basic unit

Perception of magnitude and phase spectrum

• Magnitude
– 1 dB deviation per critical band noticeable in direct comparison. Even smaller deviations can be noticed by trained ”golden ears”
– Even ±3–5 dB deviations are not easy to ”perceive” when there is no immediate reference (except for well trained listeners)
– Magnitude response deviations = spectral coloration
• Phase and time differences
– The auditory system is relatively insensitive to phase (Helmholtz) in general: magnitude spectrum more important than phase spectrum, but sometimes phase is important
– Phase functions from Fourier analysis are circular and difficult to analyze and interpret
– Group delay (phase derivative) is a relatively good perceptual measure which describes the delay of modulation (not the carrier)

Perception of phase: extreme cases

• Special phase effects:
– The following two signals have the same magnitude spectrum but sound (as well as look) different
• This is what the response looks like in a single critical band
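
For reference, the group delay mentioned in connection with phase perception is the negative derivative of the phase response with respect to angular frequency, while the phase delay describes the delay of the carrier. The symbols $\varphi$, $\tau_g$, and $\tau_p$ are notation chosen for this note.

```latex
% \varphi(\omega): phase response of the system, \omega = 2\pi f
\tau_g(\omega) = -\frac{d\varphi(\omega)}{d\omega}
\qquad \text{(group delay: delay of the modulation envelope)}

\tau_p(\omega) = -\frac{\varphi(\omega)}{\omega}
\qquad \text{(phase delay: delay of the carrier)}
```

For a pure delay of $T$ seconds, $\varphi(\omega) = -\omega T$, so both delays equal $T$ at all frequencies.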

Perceptual organization of sound

• Streaming (sequential grouping) of pitch sequences:
– Slow repetition: one stream perceived
– Fast repetition: segregation into two separate streams

[Figure: tone sequence D B F C E A vs. time; (a) one stream, (b) two streams]

Perceptual organization of sound

• Streaming may also change the perceived rhythm:
– Large separation: B-D-F vs. A-C-E
– Small separation: B-D vs. A-C-E-F

[Figure: tones A to F vs. time, grouped into an upper stream and a lower stream]

Perceptual organization of sound

• Streaming with increasing tempo

[Figure: increasing tempo or frequency difference leads to segregation of multiple streams over time; label TIMBRE/TEXTURE]

Perceptual organization of sound

• Streaming or segregation as a function of frequency difference and repetition period

[Figure: regions ’always separated’, ’separated or coherent’, and ’always coherent’; horizontal axis: repetition period (msec), 0 to 500]

• – • – – • – – –

Auditory scene analysis

• Bregman: Auditory scene analysis (MIT Press, 1990)

• Sequential integration and segregation

– Spectral vs. temporal relations

– Spatial cues in segregation

• Integration and segregation of simultaneous auditory components

– Spectral vs. temporal relations

– The "old-plus-new" heuristic

– Spatial cues in segregation

• Primitive auditory organization

– Built-in and low-level mechanisms

• Schema-based auditory organization

– Learning of stream integration and segregation


Computational auditory scene analysis (CASA)

• Computational auditory scene analysis (CASA) is an attempt to computationally simulate and model human auditory scene analysis

– Sound source segregation (separation)

– Multipitch signal analysis of harmonic sound mixtures

– Bottom-up vs. top-down driven processing

– Prediction-driven processing

– Spatial source separation (cocktail-party effect)

• Applications:

– Audio content analysis and content-based coding

– Automatic music transcription

– Speech recognition


Spatial hearing

Ville Pulkki

Laboratory of Acoustics and Audio Signal Processing

Helsinki University of Technology, Espoo, Finland

http://www.acoustics.hut.fi/

Ville.Pulkki@hut.fi

TKK, Laboratory of Acoustics and Audio Signal Processing, 26.3.2002

Sound in space, Ville Pulkki (Ville.Pulkki@hut.fi)
Spatial hearing

• Directional hearing

• Accuracy of directional hearing

• Theory of directional hearing

• Distance hearing

• Perception of space

• Spatial sound reproduction

Transfer function from the sound source to the ear canal

Head-Related Impulse Response (HRIR)

Head-Related Transfer Function (HRTF)

© Duda: http://interface.cipic.ucdavis.edu/CIL tutorial/
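The HRTF is the Fourier transform of the HRIR, and a source direction can be imposed on a mono signal by convolving it with the left-ear and right-ear HRIRs. The sketch below illustrates both steps in plain Python. The toy HRIRs (a pure delay and attenuation for the far ear) and all function names are assumptions for illustration; measured HRIRs are much longer and direction-dependent.

```python
import cmath

def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with the HRIR of each ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

def hrtf(hrir):
    """HRTF as the discrete Fourier transform of the HRIR."""
    n = len(hrir)
    return [sum(h * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, h in enumerate(hrir))
            for k in range(n)]

# Toy HRIRs for a source on the left (hypothetical values):
# the right ear receives the sound 3 samples later and at half amplitude.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, 0.5, 0.25]
left, right = render_binaural(mono, hrir_left, hrir_right)
# left  == [1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
# right == [0.0, 0.0, 0.0, 0.5, 0.25, 0.125]
```

In this toy case the interaural time difference appears as the delay in `hrir_right` and the interaural level difference as its smaller amplitude; the magnitude of `hrtf(hrir_right)` is 0.5 at every frequency bin because the impulse response is a scaled, delayed impulse.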