
Dynamic Programming in Speech

CSC401/2511 2024 A3 Tutorial 02. Presenter: Ken Shi

Agenda

A few remarks on GMMs from Piazza

Dynamic Programming

Word Error Rate with Levenshtein Distance

Dynamic Time Warping in Speaker Verification

GMM: Practical Tips cont’d

How do we get the log of a sum of probabilities from the individual log-probabilities?

A good function that does this:

scipy.special.logsumexp
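A minimal sketch of the idea behind log-sum-exp, written in pure Python so it runs without SciPy. The helper name `log_sum_exp` and the sample values are illustrative assumptions; in the assignment you would more likely call `scipy.special.logsumexp` directly on an array (it accepts an `axis` argument).

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) for the given log-domain values, without underflow."""
    peak = max(log_values)
    if peak == -math.inf:
        # Every term is log(0), so the sum is 0 and its log is -inf.
        return -math.inf
    # Shift by the largest value so the biggest exponent is exp(0) = 1.
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


# Hypothetical per-component log-probabilities, far too small for exp().
log_terms = [-1000.0, -1001.0, -1002.0]

naive_sum = sum(math.exp(v) for v in log_terms)  # underflows to 0.0
stable = log_sum_exp(log_terms)                  # finite, slightly above -1000
```

Taking `math.log(naive_sum)` would fail because the sum has underflowed to zero, which is why the shift by the maximum is needed.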

GMM: Takeaways!!

Probability of observing x in the mth Gaussian:

Prior probability of the mth Gaussian:

Probability of the mth Gaussian, given x:

Probability of x in the GMM:
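For reference, the four quantities above are commonly written as follows. The notation (b_m, ω_m, μ_m, σ_m, d) and the diagonal-covariance form of the Gaussian are assumptions made here, so check them against the lecture slides and the assignment handout.

```latex
% Probability (density) of observing x in the m-th Gaussian, diagonal covariance assumed
b_m(\mathbf{x}) = \frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{d}\frac{(x_i-\mu_{m,i})^2}{\sigma_{m,i}^2}\right)}
                       {(2\pi)^{d/2}\sqrt{\prod_{i=1}^{d}\sigma_{m,i}^2}}

% Prior probability of the m-th Gaussian
\omega_m = P(m), \qquad \sum_{m=1}^{M}\omega_m = 1

% Probability of the m-th Gaussian, given x
p(m \mid \mathbf{x}) = \frac{\omega_m\, b_m(\mathbf{x})}{\sum_{k=1}^{M}\omega_k\, b_k(\mathbf{x})}

% Probability of x in the GMM
p(\mathbf{x}) = \sum_{m=1}^{M}\omega_m\, b_m(\mathbf{x})
```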

Dynamic Programming (DP) Recap

Applicable problems: those that can be defined recursively in terms of sub-problems in a way that satisfies the Bellman equation.

In plain language: problems that can be divided into smaller problems of the same type via induction.

E.g.: Fibonacci sequence: A[n] = A[n - 1] + A[n - 2]

- Use recursion: fib(n) = fib(n - 1) + fib(n - 2) -> O(2^n), because the same sub-problems are recomputed many times

- Use DP: A[n] = A[n - 1] + A[n - 2] -> O(n), because each entry is computed once and stored
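The two approaches can be sketched side by side. The function names and the base cases A[0] = 0, A[1] = 1 are assumptions, since the slide does not state them.

```python
def fib_recursive(n):
    """Naive recursion: recomputes the same sub-problems, exponential time."""
    if n < 2:
        return n  # assumed base cases: A[0] = 0, A[1] = 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n):
    """Bottom-up DP: each table entry is computed exactly once, O(n) time."""
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Both return the same values, but `fib_recursive(50)` is impractically slow while `fib_dp(50)` returns immediately.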

2D DP

Maintain a 2D table

Find a formula for each cell of the table. This formula can depend on cells filled in at an earlier step.

Initialize state-independent cells (those that do not depend on others)

Iterate through the table and complete the rest

Take the desired value (usually the final step of the iteration)
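As a toy illustration of the recipe above (not a problem from the tutorial), the sketch below counts the paths from the top-left to the bottom-right cell of a grid when only right and down moves are allowed. The function name and the problem itself are assumptions chosen only because the table is easy to check by hand.

```python
def count_grid_paths(rows, cols):
    """Count right/down paths from the top-left to the bottom-right cell."""
    # Maintain a 2D table: table[r][c] = number of paths reaching cell (r, c).
    table = [[0] * cols for _ in range(rows)]

    # Initialize state-independent cells: the first row and first column
    # can each be reached in exactly one way.
    for r in range(rows):
        table[r][0] = 1
    for c in range(cols):
        table[0][c] = 1

    # Iterate through the table; each cell depends on the cell above it
    # and the cell to its left, both filled in at an earlier step.
    for r in range(1, rows):
        for c in range(1, cols):
            table[r][c] = table[r - 1][c] + table[r][c - 1]

    # Take the desired value: the last cell filled in.
    return table[rows - 1][cols - 1]
```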

Word Error Rate

For ASR Evaluation

Levenshtein Distance

In its original form:
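For reference, the standard recurrence for the Levenshtein distance between the first i symbols of a sequence a and the first j symbols of a sequence b can be written as below. The symbol names are chosen here for illustration and may differ from the lecture's notation.

```latex
\mathrm{lev}(i, j) =
\begin{cases}
  \max(i, j) & \text{if } \min(i, j) = 0, \\[4pt]
  \min \begin{cases}
    \mathrm{lev}(i-1, j) + 1 \\
    \mathrm{lev}(i, j-1) + 1 \\
    \mathrm{lev}(i-1, j-1) + [a_i \neq b_j]
  \end{cases} & \text{otherwise,}
\end{cases}
```

where [a_i ≠ b_j] is 1 when the two symbols differ and 0 when they match.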

Levenshtein in Word Error Rate (WER)

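Two word sequences: the reference sentence (ground truth) and the hypothesis sentence (transcript)

Four states describing a pair of words, one from each:

- Match: The two words are the same

- Insertion: A new hypothesis word is added to the sequence

- Deletion: An existing reference word is missing from the hypothesis

- Substitution: A reference word is replaced by a different hypothesis word

WER formula: WER = (S + I + D) / N, where S, I and D are the numbers of substitutions, insertions and deletions, and N is the number of words in the reference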

WER Example

Consider this sentence pair:

Reference: how to recognize speech

Hypothesis: how to wreck a nice beach

               How  To   Wreck  A   Nice  Beach
           0   1    2    3      4   5     6
How        1   0    1    2      3   4     5
To         2   1    0    1      2   3     4
Recognize  3   2    1    1      2   3     4
Speech     4   3    2    2      2   3     4

WER Example cont’d

[Figure: the distance table for the same sentence pair, with the backtrace path marked. Legend: Substitution / Match, Deletion, Insertion, Destination.]

S = 2

I = 2

D = 0

WER = (2 + 2 + 0) / 4 = 100%!

Recall: A Monotonic Forward Algorithm

Remember this from the lecture?

Your Job

Implement initialize, step and finalize

- Initialize: defines the table and initializes the helper row/column

- Step: computes the values of the table in a forward pass

- Finalize: backtraces to recover which operations were performed during the iteration, and computes the WER

Practical Tip: Priority Operations

What happens when there is a tie?

Convention: Match > Substitution > Insertion > Deletion

It is really a choice, but for the purpose of the assignment, we want you to follow this particular convention.
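The initialize / step / finalize split and the tie-breaking convention above can be sketched as follows. The function names, signatures and the backpointer table are assumptions made for illustration; they are not the assignment's starter-code interface, so follow the handout for the required structure.

```python
def initialize(ref, hyp):
    """Build the (len(ref)+1) x (len(hyp)+1) distance and backpointer tables."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # helper column: every reference word deleted
        dist[i][0], back[i][0] = i, "del"
    for j in range(1, m + 1):  # helper row: every hypothesis word inserted
        dist[0][j], back[0][j] = j, "ins"
    return dist, back


def step(ref, hyp, dist, back, i, j):
    """Fill cell (i, j) from its upper-left, left and upper neighbours."""
    same = ref[i - 1] == hyp[j - 1]
    candidates = [
        (dist[i - 1][j - 1] + (0 if same else 1), "match" if same else "sub"),
        (dist[i][j - 1] + 1, "ins"),
        (dist[i - 1][j] + 1, "del"),
    ]
    # min() keeps the first minimal candidate, so ties resolve in list order:
    # match/substitution, then insertion, then deletion.
    dist[i][j], back[i][j] = min(candidates, key=lambda c: c[0])


def finalize(ref, dist, back):
    """Backtrace from the last cell, count operations and compute WER."""
    counts = {"match": 0, "sub": 0, "ins": 0, "del": 0}
    i, j = len(dist) - 1, len(dist[0]) - 1
    while i > 0 or j > 0:
        op = back[i][j]
        counts[op] += 1
        if op in ("match", "sub"):
            i, j = i - 1, j - 1
        elif op == "ins":
            j -= 1
        else:
            i -= 1
    errors = counts["sub"] + counts["ins"] + counts["del"]
    return errors / len(ref), counts  # assumes a non-empty reference


def word_error_rate(ref, hyp):
    """Run the three stages on two lists of words."""
    dist, back = initialize(ref, hyp)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            step(ref, hyp, dist, back, i, j)
    return finalize(ref, dist, back)


wer, counts = word_error_rate(
    "how to recognize speech".split(),
    "how to wreck a nice beach".split(),
)
```

On the example sentence pair this gives two substitutions, two insertions and no deletions, for a WER of 1.0, which agrees with the worked example.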

Your Task

WER is used as a measure to evaluate different ASR systems (in this case, Kaldi and Google).

You shouldn’t have to modify the main function to see how it is used in action (provided that your implementation is correct).

Please follow the instructions in the assignment sheet carefully, as our tests may look into every component of your function to evaluate your operations.

Interesting thing to think about: does the cost have to be one across the board? What if it isn’t?

Dynamic Time Warping

In Speaker Verification

Task: Speaker Verification

Given: Audio samples of somebody saying a specific line/word

Goal: Determine whether the samples come from the same speaker.

Approach: Dynamic Time Warping!

Dynamic Time Warping (DTW)

Given: two temporal sequences X and Y

Goal: find the optimal warping path, i.e. the one that minimizes the total cost accumulated along the path

Approach: Use DP to maintain an accumulated cost matrix

Three Conditions (Constraints):

- Boundary condition: start by pairing the first elements of both sequences, end by pairing the last elements of both.

- Monotonicity condition: every step must move forward in time (never backward)

- Step size condition: the forward increment of each step must stay within a bounded range

DTW in Speaker Verification

Follows exactly the same recipe as WER, but instead of dealing with strings, we deal with MFCCs.

This recipe automatically satisfies the boundary condition and the monotonicity condition.

It also fixes the step size at 1 (coming from the cell above, to the left, or to the upper left).

In terms of cost, we will use the Euclidean distance between each pair of MFCC frames.

For the specific usage, see the example from the lecture slides!
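A minimal sketch of the accumulated cost matrix described above, in pure Python. The function name `dtw_cost` and the toy two-dimensional frames are assumptions; real MFCC frames have more coefficients and would usually be NumPy arrays. How the final cost is turned into a same-speaker decision is not specified here and is left out.

```python
import math


def dtw_cost(x, y):
    """Accumulated cost of the optimal warping path between frame sequences x and y."""
    n, m = len(x), len(y)
    inf = math.inf
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Local cost: Euclidean distance between the two frames.
            local = math.dist(x[i], y[j])
            if i == 0 and j == 0:
                best_prev = 0.0  # boundary condition: the path starts at (0, 0)
            else:
                # Step size 1: arrive from the upper-left, upper or left cell.
                best_prev = min(
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                )
            acc[i][j] = local + best_prev
    # Boundary condition: the path ends by pairing the last frames.
    return acc[n - 1][m - 1]


# Toy frames standing in for MFCC vectors; slow_take repeats the first frame.
take = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slow_take = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
stretched_cost = dtw_cost(take, slow_take)
```

The stretched copy aligns at zero cost because the warping path can pair the repeated frame with the same frame of the shorter sequence, which is the behaviour that makes DTW tolerant of speaking-rate differences. `math.dist` requires Python 3.8 or later.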

A few tips

The main function is written for you. It chops a few voice segments from the dataset and runs the algorithm. You should be able to tell what the correct answer is just by looking at how the segments are loaded.

The default test given to you should give an intuitive result if your implementation works.

You are welcome to investigate further by trying different segments you find, or by listening to the segments at the specified timestamps.

Obviously this is not a perfect algorithm. Based on the lecture material, can you think of why it could sometimes underperform? Does it have to do with how this task is set up?