The University of Sheffield
COM3502-4502-6502 Speech Processing: Lecture 1, slide 1
COM3502-4502-6502
SPEECH PROCESSING
Lecture 1
Introduction
© The University of Sheffield
The Course
• Speech
– speaking and hearing
– acoustics and sound
– the nature of speech
– sounds and symbols
– phonetics
– phonology
– prosody
• Speech Processing
– signals and spectra
– sampling
– waveform processing
– the Fourier transform
– filters and linear prediction
– cepstral analysis
Python & Jupyter Notebooks
• Python
Picture © 2017 Project Jupyter Contributors
Python & Jupyter Notebooks
• Python for exercises
– Jupyter Notebooks
– Anaconda
Recommended Textbooks
Teaching Staff
Lecturer: Stefan Goetze
Graduate Teaching Assistants: George L. Close, Robbie Sutherland, Jason Clarke
Logistics
Lecture material on Blackboard:
Logistics
• Lectures
– 20 × 50 mins (with breaks)
– 2 per week
• Practical work
– Weekly lab sheets (first in week 2)
– Main programming assignment (~9 weeks' work)
• Feedback/Interaction with Teaching Staff
– Ask questions during lecture! Beneficial for everyone
– Blackboard Discussion Group (lectures + practical work)
– Sample Solutions for Lab Sheets
– Staff contact details on Blackboard (https://vle.shef.ac.uk)
• Assessment
– Main programming assignment (worth 55%)
– Blackboard exam (worth 45%)
• Lecture notes (these slides)
– available on Blackboard prior to each lecture
What is Speech Processing?
“… the study of speech signals and the
processing methods of these signals”
“… a special case of digital signal
processing applied to speech signals”
What is Speech Processing For?
2020: 9,200,000,000
1876: 2
Source: …style/gadgets-and-tech/news/there-are-officially-more-mobile-devices-than-people-in-the-world-9780518.html
Speech Processing Technologies
• Automatic Speech Recognition
• Text-to-Speech Synthesis
• Spoken Language Dialogue Systems
• Digital Speech Coding
Extracting Information from Speech
Speech Signal
• Speech Recognition – Words: “How are you?”
• Language Recognition – Language: English
• Speaker Recognition – Speaker: John Smith
• Accent Recognition – Accent: Sheffield
• Emotion Recognition – Emotion: ‘happy’
Speech Enhancement
[Diagram: Speech + Noise → Speech+Noise → Speech Processing → Recovered Speech]
Significant Market Penetration
Why Speech Processing?
Lots of applications
… especially in Science Fiction!
Star Trek IV: The Voyage Home (1986)
Why use Speech?
“Speech is the ‘natural’ way to
interact with your computer.”
Speech may be a more intuitive way of
accessing information, controlling
things and communicating …
but there may be viable alternatives
Why use Speech?
Some alternatives can be problematic…
The Advantages of Speech
• hands-free
• eyes-free
• fast
• intuitive
“You have been learning since birth the only
skill needed to operate our equipment.”
Fran Capo, World Record Holder: 603.32 wpm
Command & Control: Vehicle
Robust Speaker-Independent Small-Vocabulary Automatic Speech Recognition
Intelligence: Voice Stress Analysis
Vocal Emotion Detection
Processing: Speaker Localisation
Direction/Position Estimation
Processing: Special Effects
Vocal Manipulation
Processing: Audio Alignment
(Some) Notation & Basics
Recap: Real Numbers
• Natural numbers ℕ = {1, 2, 3, …}
• Whole numbers ℕ₀ = {0, 1, 2, 3, …}
• Integer numbers ℤ = {…, −2, −1, 0, 1, 2, 3, …}
• Rational numbers ℚ = {a/b : a ∈ ℤ and b ∈ ℕ}
every number resulting from a ratio
• Real numbers ℝ (includes e.g. π = 3.14159…)
[Figure labels: negative numbers, positive numbers]
Complex Numbers
• Complex numbers ℂ
– Add “imaginary” dimension
– Real part + j · Imaginary part
– z = Re{z} + j · Im{z}
– Adding by interpretation as vectors
[Figure labels: complex number z; real part Re{z}; imaginary part Im{z}; imaginary unit j (sometimes i)]
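The vector interpretation of complex addition can be checked numerically. The following is a minimal sketch using Python's built-in complex type, which writes the imaginary unit as `j`; the values of `z1` and `z2` are arbitrary illustrations and do not come from the slides.

```python
# Complex numbers in Python: the imaginary unit is written j
z1 = 3 + 4j   # arbitrary example values
z2 = 1 - 2j

# Real and imaginary parts, Re{z} and Im{z}
print(z1.real, z1.imag)   # 3.0 4.0

# Addition works component-wise, like adding two 2-D vectors
z_sum = z1 + z2
vector_sum = (z1.real + z2.real, z1.imag + z2.imag)
print(z_sum)        # (4+2j)
print(vector_sum)   # (4.0, 2.0)

# The magnitude is the length of the vector (Re{z}, Im{z})
print(abs(z1))      # 5.0
```

Treating each number as the point (Re{z}, Im{z}) in the plane gives the same result as the built-in `+`, which is the sense in which complex addition is vector addition.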
Continuous / Discrete Time
[Figure: continuous (time) waveform versus discrete (time) index]
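The relation between a continuous-time waveform and its discrete-time index can be sketched in a few lines. This example assumes a sine wave and NumPy; the sampling frequency `fs`, the tone frequency `f0`, and the names `x_continuous` and `x_discrete` are hypothetical choices made for illustration only.

```python
import numpy as np

fs = 8000.0   # sampling frequency in Hz (assumed value)
f0 = 440.0    # frequency of the example sine wave in Hz (assumed value)

def x_continuous(t):
    """Continuous-time signal x(t): defined for any real time t in seconds."""
    return np.sin(2 * np.pi * f0 * t)

# Discrete-time index k = 0, 1, 2, ...; sample k is taken at time t = k / fs
k = np.arange(16)
x_discrete = x_continuous(k / fs)   # x[k] = x(k / fs)

print(k[:4])
print(x_discrete[:4])
```

The function can be evaluated at any real-valued time, whereas the array only holds values at integer indices, which is why the two are usually written with different brackets.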
Vector Notation
• Scalars:
– Signals:
• Vectors:
– Signal vectors:
Notation
• Scalars:
– Signals:
• Vectors:
– Signal vectors:
• Matrices:
Notation
[Table: rows Scalar, Vector, Matrix; columns Continuous time domain, Discrete time domain, Continuous freq. domain, Discrete freq. domain]
• Scalars: ‘normal’ letters
• Vectors: bold letters
• Matrices: bold capital letters
• (Time) continuous signal: (round) parentheses
• Discrete signal: (square) brackets
Notation – Part 1
• Scalars:
– Impulse responses (time-varying):
– Example for a (real) convolution:
• Vectors:
– Signal vectors:
– Impulse response vectors (time-varying):
– Example for a (real) convolution:
• Matrices:
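A real, discrete convolution can be sketched as follows. The formulas on this slide did not survive extraction, so the symbols are assumptions: `x` is the input signal, `h` the impulse response, and `y[k] = Σ_i h[i] · x[k − i]` the output. For simplicity the impulse response here is time-invariant, which is a special case of the time-varying one named on the slide.

```python
import numpy as np

def convolve_direct(x, h):
    """Direct-form convolution y[k] = sum_i h[i] * x[k - i] for real sequences."""
    y = np.zeros(len(x) + len(h) - 1)
    for k in range(len(y)):
        for i in range(len(h)):
            # Only use samples of x that actually exist
            if 0 <= k - i < len(x):
                y[k] += h[i] * x[k - i]
    return y

x = np.array([1.0, 2.0, 3.0])   # example input signal (assumed values)
h = np.array([1.0, 0.5])        # example impulse response (assumed values)

y = convolve_direct(x, h)
print(y.tolist())               # [1.0, 2.5, 4.0, 1.5]

# NumPy's built-in routine gives the same result
print(np.allclose(y, np.convolve(x, h)))   # True
```

The explicit double loop mirrors the sum in the scalar notation; in practice `np.convolve` would normally be used instead.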
Notation – Part 2
• Auto and cross correlation with real, stationary stochastic processes:
– Autocorrelation function:
– Cross-correlation function:
– Auto-power density spectrum:
– Cross-power density spectrum:
[Annotation: continuous frequency index]
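The link between the autocorrelation function and the auto-power density spectrum can be illustrated numerically. This is only a sketch under stated assumptions: the process is replaced by one finite white-noise realisation, the autocorrelation is a biased time-average estimate, and the names `r_xx` and `S_xx` are hypothetical since the slide's own symbols did not survive. The spectrum is computed on a discrete frequency grid with the FFT, whereas the slide refers to a continuous frequency index.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # example: zero-mean white noise, variance 1
N = len(x)

# Biased autocorrelation estimate r_xx[l] for lags l = -(N-1), ..., N-1
r_xx = np.correlate(x, x, mode="full") / N
lags = np.arange(-(N - 1), N)

# The value at lag 0 is the average signal power
print(r_xx[lags == 0])

# Power density spectrum estimate: Fourier transform of the autocorrelation.
# ifftshift moves lag 0 to the first position, as the FFT expects.
S_xx = np.fft.fft(np.fft.ifftshift(r_xx)).real
print(S_xx.min() > -1e-9)       # the estimate is non-negative up to rounding
```

A cross-correlation estimate between two signals `x` and `y` could be formed in the same way with `np.correlate(x, y, mode="full")`, and its Fourier transform would give a cross-power density spectrum estimate.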
This lecture has covered …
• What speech processing is
• What speech processing is for
• Speech processing technologies
• The advantages of speech
• Types of application
• Some math notation
Any Questions?
Ask during lecture (e.g., now),
or post in the Blackboard Discussion group
Next lecture …
Sound