phenomena, theories
and computational models
July 1998
Dan Ellis
International Computer Science Institute, Berkeley CA
<dpwe@icsi.berkeley.edu>
Outline
4 Big issues
[Figure: time-frequency plot; frequency ticks 200 to 4000, time ticks 0 to 9; listener labels for each event follow]
Horn1 (10/10): S9 "horn 2"; S10 "car horn"; S4 "horn1"; S6 "double horn"; S2 "first double horn"; S7 "horn"; S7 "horn2"; S3 "1st horn"; S5 "Honk"; S8 "car horns"; S1 "honk, honk"
Crash (10/10): S7 "gunshot"; S8 "large object crash"; S6 "slam"; S9 "door Slam?"; S2 "crash"; S4 "crash"; S10 "door slamming"; S5 "Trash can"; S3 "crash (not car)"; S1 "slam"
Horn2 (5/10): S9 "horn 5"; S8 "car horns"; S2 "horn during crash"; S6 "doppler horn"; S7 "horn3"
Truck (7/10): S8 "truck engine"; S2 "truck accelerating"; S5 "Acceleration"; S1 "rev up/passing"; S6 "acceleration"; S3 "closeup car"; S10 "wheels on road"
Horn3 (5/10): S7 "horn4"; S9 "horn 3"; S8 "car horns"; S3 "2nd horn"; S10 "car horn"
Computational theory: ‘what’ and ‘why’; the overall goal. In ASA: sound source organization.
Algorithm: ‘how’; an approach to meeting the goal. In ASA: auditory grouping.
Implementation: practical realization of the process. In ASA: feature calculation & binding.
Computational theory: frequency-domain processing, X(f)
Algorithm: discrete-time filtering (subtraction)
Implementation: neurons with GABAergic inhibitions
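To make the link between the first two levels concrete, here is a minimal sketch. It assumes that the "subtraction" is a first difference, y[n] = x[n] - x[n-1]; the slide does not say which filter is meant, and the function names are illustrative only. The sketch checks that the time-domain subtraction and the frequency-domain gain 2|sin(ω/2)| describe the same operation.

```python
import cmath
import math

def first_difference(x):
    """Algorithm level: y[n] = x[n] - x[n-1], taking x[-1] = 0 (assumed filter)."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def dft_magnitude(x, k):
    """Magnitude of the k-th DFT bin of x (direct sum, fine for short signals)."""
    n_pts = len(x)
    return abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                   for n in range(n_pts)))

def first_difference_gain(omega):
    """Computational-theory level: |1 - exp(-j*omega)| = 2*|sin(omega/2)|."""
    return 2.0 * abs(math.sin(omega / 2.0))

# Measure the filter's response from its impulse response and compare it
# with the closed-form frequency-domain description.
n_pts = 32
impulse = [1.0] + [0.0] * (n_pts - 1)
h = first_difference(impulse)
max_mismatch = max(
    abs(dft_magnitude(h, k) - first_difference_gain(2 * math.pi * k / n_pts))
    for k in range(n_pts // 2 + 1)
)
print(max_mismatch)  # tiny: the two descriptions agree
```

The implementation level (inhibitory neurons) is not modeled here; the point is only that one computation can be described at more than one level.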
? exhaustive search
Implementation
• evolution in time
[Figure: feature maps over time and frequency: onset, period, frq.mod]
- feature maps
- periodicity cue
- common-onset boost
- resynthesis
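As one illustration of a periodicity cue, the sketch below picks the lag that maximizes a plain autocorrelation. This is a stand-in technique chosen for brevity: the slides do not say how the cue is computed, and the function names, sample rate and test signal are all assumptions.

```python
import math

def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at a single lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_period(x, min_lag, max_lag):
    """Lag (in samples) within [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))

# Hypothetical test signal: three harmonics of 200 Hz sampled at 8 kHz,
# so the true period is 8000 / 200 = 40 samples.
sample_rate = 8000
f0 = 200.0
signal = [
    sum(math.sin(2 * math.pi * h * f0 * n / sample_rate) / h for h in (1, 2, 3))
    for n in range(800)
]
period = estimate_period(signal, 20, 60)
print(period, sample_rate / period)
```

In a grouping system, frequency channels that share the same estimated period would be candidates for belonging to one source. The search range matters: a range that includes multiples of the true period would need extra handling.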
[Figure: two time-frequency panels; frequency ticks 100 to 3000, time/s 0.2 to 1.0]
- harmonic usually groups by onset & periodicity
- can alter frequency and/or onset time
- ‘degree of grouping’ from overall pitch match
• Gradual, various results:
[Figure: pitch shift plotted against mistuning, with 3% marked]
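The mistuned-harmonic manipulation can be sketched as a stimulus generator. This is a minimal illustration, not the procedure of any particular experiment: the parameter names, the 200 Hz fundamental, the six harmonics and the 20 ms delay are assumptions chosen for the example.

```python
import math

def component_frequencies(f0, n_harmonics, mistuned_harmonic=None, mistuning=0.0):
    """Frequencies of a harmonic complex, with one harmonic optionally mistuned.

    mistuning is a fraction, e.g. 0.03 raises the chosen harmonic by 3%.
    """
    freqs = []
    for h in range(1, n_harmonics + 1):
        freq = h * f0
        if h == mistuned_harmonic:
            freq *= 1.0 + mistuning
        freqs.append(freq)
    return freqs

def synthesize(freqs, duration, sample_rate, delayed_index=None, onset_delay=0.0):
    """Sum of sinusoids; the component at delayed_index starts onset_delay seconds late."""
    n_samples = int(round(duration * sample_rate))
    out = [0.0] * n_samples
    for i, freq in enumerate(freqs):
        start = int(round(onset_delay * sample_rate)) if i == delayed_index else 0
        for n in range(start, n_samples):
            out[n] += math.sin(2 * math.pi * freq * (n - start) / sample_rate)
    return out

# Hypothetical stimulus: 4th harmonic of 200 Hz mistuned by 3% and delayed by 20 ms.
freqs = component_frequencies(200.0, 6, mistuned_harmonic=4, mistuning=0.03)
stimulus = synthesize(freqs, duration=0.1, sample_rate=8000,
                      delayed_index=3, onset_delay=0.02)
print(freqs)
```

Sweeping `mistuning` or `onset_delay` gives a family of stimuli in which the altered harmonic is more or less likely to group with the rest of the complex.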
[Figure: time-frequency plot; freq/kHz 0 to 2, time/s 0.0 to 1.2]
• competing time-frequency affinity weights...
Implementation
[Figure: time-frequency plot; frequency ticks 1000 to 4000]
• Phonemic restoration
[Figure: time-frequency plot; frequency ticks 0 to 3000, time/s 1.2 to 1.7]
• Temporal compounds
[Figure: "Temporal compound (1998jul10)"; panel "S1−env.pf:0"; f/Bark axis; time ticks 0.0 to 1.8]
speech (duplex?)
[Figure: block diagram. input mixture -> Front end -> signal features -> Compare & reconcile -> prediction errors -> Hypothesis management -> hypotheses (Noise components, Periodic components) -> Predict & combine -> predicted features -> Compare & reconcile]
Time t1: initial element created
Time t2: additional element required
Time t3: second element finished
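The t1/t2/t3 sequence can be mimicked with a toy predict, compare and reconcile loop over a one-dimensional energy envelope. This is a deliberately simplified sketch and not the system described in the talk: the constant-level elements, the `tolerance` threshold and the rule for choosing which element to finish are all assumptions made for illustration.

```python
def explain_envelope(observed, tolerance=0.5):
    """Greedy predict/compare/reconcile loop (toy model, assumptions throughout).

    Each element is a dict with 'start', 'end' (None while still active)
    and a constant 'level'. The prediction at each frame is the sum of the
    levels of all active elements.
    """
    elements = []
    for t, energy in enumerate(observed):
        active = [e for e in elements if e["end"] is None]
        predicted = sum(e["level"] for e in active)
        error = energy - predicted
        if error > tolerance:
            # Unexplained energy: an additional element is required.
            elements.append({"start": t, "end": None, "level": error})
        elif error < -tolerance and active:
            # Prediction too high: finish the active element whose level
            # best accounts for the shortfall (an assumed heuristic).
            closest = min(active, key=lambda e: abs(e["level"] + error))
            closest["end"] = t
    return elements

# Frame 0: initial element created; frame 2: additional element required;
# frame 4: second element finished.
elements = explain_envelope([1.0, 1.0, 3.0, 3.0, 1.0, 1.0])
for element in elements:
    print(element)
```

A fuller version would re-check the prediction after each change and track several competing hypotheses instead of committing greedily to one.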
[Figure: elements of the analyzed mixture with listener scores; f/Hz against time/s (0 to 9), level scale −70 to −40 dB]
Wefts1−4, Weft5, Wefts6,7, Weft8, Wefts9−12: Horn1 (10/10), Horn2 (5/10), Horn3 (5/10), Horn4 (8/10), Horn5 (10/10)
Noise2, Click1: Crash (10/10)
Noise1: Squeal (6/10), Truck (7/10)
Implementation ???
4 Big issues
- the state of ASA and CASA
- outstanding issues
- discussion points
• Plausibility
- correct level for human correspondence?
- which phenomena are important to match?
- how to implement symbolic-style processing?
• Top-down vs. bottom-up
- different approaches to ambiguity, latency
- how far down for top-down?
- how far ‘up’ for high level?
- choice between extraction & inference?
• Integrating multiple cues (e.g. binaural)
• Other debates:
- what is the real goal?
- resynthesis
- evaluation
ASA - Dan Ellis 1998jul11 - 36
Big issues in ASA & CASA:
• Knowledge:
how to acquire, represent & store ...
- short-term: context
- long-term: memories
- abstract: classes, generalities
• Attention:
- what does it mean in these models?
- limitation or important principle?