代写辅导接单-CSC401/2511 2024 A3 Tutorial 02

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top

Dynamic Programming

in Speech

CSC401/2511 2024 A3 Tutorial 02. Presenter: Ken Shi

1

Agenda

A few remarks for GMM from Piazza

-

Dynamic Programming

-

Word Error Rate with Levenshtein Distance

-

Dynamic Time Warping in Speaker Verification

-

2

GMM: Practical Tips cont’d

How do we get the in from ?

A good module that does this:

scipy.special.logsumexp

-

3

GMM: Takeaways!!

Probability of observing x in the mth Gaussian:

-

t

Prior probability of the mth Gaussian:

-

Probability of the mth Gaussian, given x :

-

t

Probability of x in the GMM:

-

t

4

Dynamic Programming (DP) Recap

Applicable problem: one that can be recursively defined into sub-problem in a way

-

that satisfies Bellman Equation.

In human language: Problems that can be divided into smaller problem of the same

-

type via induction.

E.g. : Fibonacci Sequence: A[n] = A[n - 1] + A[n - 2]

-

- Use Recursion: A[n] = fib(n - 1) + fib(n - 2) —>O(n2)

- Use DP: A[n] = A[n - 1] + A[n - 2] —>O(n)

5

2D DP

Maintain a 2D table

-

Find a formula for each cell of the table. This formula can depend on itself of an earlier

-

step

Initialize state independent cells (Those that do not depend on others)

-

Iterate through the table and complete the rest

-

Take the desired value (usually the final step of the iteration)

-

6

Word Error Rate

For ASR Evaluation

7

Levenshtein Distance

In its origin:

8

Levenshtein in Word Error Rate (WER)

Two strings: Reference Sentences (Groundtruth) and Hypothesis Sentences (Transcript)

-

Four states describing a pair of words, one from each:

-

- Match: Two words are the same

- Insertion: A new hypothesis word added to the sequence

- Deletion: An existing reference word missing

- Substitution: A complete replacement of the word

WER formula:

-

9

WER Example

Consider this sentence pair:

Reference: how to recognize speech

-

Hypothesis: how to wreck a nice beach

-

How To Wreck A Nice Beach

0 1 2 3 4 5 6

How 1 0 1 2 3 4 5

To 2 1 0 1 2 3 4

Recognize 3 2 1 1 2 3 4

Speech 4 3 2 2 2 3 4

10

WER Example

Substitution Deletion

Consider this sentence pair: / Match

Reference: how to recognize speech

-

Insertion Destination

Hypothesis: how to wreck a nice beach

-

How To Wreck A Nice Beach

S = 2

0 1 2 3 4 5 6 I = 2

D = 0

How 1 0 1 2 3 4 5 WER = (2 + 2 + 0) / 4

= 100%!

To 2 1 0 1 2 3 4

Recognize 3 2 1 1 2 3 4

Speech 4 3 2 2 2 3 4

11

Recall: A Monotonic Forward Algorithm

Remember this from the lecture?

12

Your Job

Implement initialize, step and finalize

-

- Initialize: defines the table, initializes the helper row/column

- Step: forward calculate values of the table

- Finalize: backtrace which operation has been done through the iteration and

computes WER

13

Practical Tip: Priority Operations

What happen when there is a tie?

-

Convention: Match > Substitution > Insertion > Deletion

-

It is really a choice, but for the purpose of the assignment, we want you to follow this

-

particular convention.

14

Your Task

WER is used as a measure to evaluate different ASR system (in this case, Kaldi and

-

Google)

You shouldn’t have to modify the main function to see how it is used in action

-

(provided that your implementation is correct)

Please follow the instruction in the assignment sheet carefully, as our test may look

-

into every component of your function to evaluate your operations

Interesting thing to think about: Does the cost has to be one across the board? What if

-

they don’t?

15

Dynamic Time

Warping

In Speaker Verification

16

Task: Speaker Verification

Given: Audio samples of somebody saying a specific line/word

-

Goal: Identify if the samples are coming from the same speaker.

-

Approach: Dynamic Time Warping!

-

17

Dynamic Time Warping (DTW)

Given: two temporal sequence X and Y

-

Goal: find the optimal warping path that minimizes the cost that the path take

-

Approach: Use DP to maintain an Accumulative Cost Matrix

-

Three Conditions (Constraints):

-

- Boundary condition: start by pairing the first elements from both, end by pairing

the last elements from both.

- Monotonicity condition: every step must push forward

- Step size condition: increment in forward should be bounded by certain range

18

DTW in Speaker Verification

Follows the exact recipe like WER, but instead of dealing with strings, we deal with

-

MFCCs.

The above should automatically maintain Boundary condition, Monotonicity condition

-

The above should also make the step size 1 (upper, left or upper left)

-

In terms of cost, we will use the euclidean distance between each MFCC frame

-

For the specific usage, see example from Lecture Slides!

-

19

A few tips

The main function is written for you. It chops a few voice segments from the dataset

-

and runs the algorithm. You should be able to tell what is the correct answer just by

looking at how the segments are loaded.

The default test given to you should give you an intuitive result if it does work.

-

You are welcomed to investigate further by playing around with different segments

-

you find or listen to segments of the specified time stamp

Obviously this is not a perfect algorithm. Based on lecture material, can you think of

-

why it could sometimes underperform? Does it have to do with how this task is set up?

20

51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: Fudaojun0228