CE314/887 Assignment 1

Issued date: November 5th 2020
FASER due date: November 20th 2020
Latest date for no-penalty submission: November 27th 2020

You should provide code for Parts 1 and 2; no coding is needed for Part 3.

Part 1: Regular expressions (40%)
(You can store your code and output in part1_regularexpression_studentID.py)

• (20%) Write a regular expression that can find all amounts of money in a text. Your expression should be able to deal with different formats and currencies, for example £50,000 and £117.3m as well as 30p, 500m euro, 338bn euros, $15bn and $92.88. Make sure that you can at least detect amounts in Pounds, Dollars and Euros. For full marks, include the output of a Python program that applies your regular expression to the following BBC News web page: https://www.bbc.co.uk/news/business-41779341 (An illustrative starter sketch is given after Part 2.)

• (20%) Write a regular expression that matches all of the phone numbers listed below. (You can write a Python program to check the matching results; see the sketch after Part 2.)

555.123.4565
+1-(800)-545-2468
2-(800)-545-2468
3-800-545-2468
555-123-3456
555 222 3342
(234) 234 2442
(243)-234-2342
1234567890
123.456.7890
123.4567
123-4567
1234567900
12345678900

Part 2: NLTK (10%)

• Find the 50 highest-frequency words in the Wall Street Journal corpus in nltk.book (text7), with all punctuation removed and all words lowercased. Submit your code as part2_NLTK_studentID.py. (A minimal sketch is given below.)
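For illustration only, here is a minimal sketch of how a money regex could be applied to the BBC page. This is not a model answer: the use of the requests library is an assumption, and the pattern shown is a deliberately incomplete starting point.

# Starter sketch only: fetch the page and apply an (incomplete) money regex.
import re
import requests

url = "https://www.bbc.co.uk/news/business-41779341"
html = requests.get(url).text

# Starting-point pattern: a currency symbol followed by digits, commas and an
# optional decimal part, plus an optional m/bn multiplier. Extending it to the
# remaining formats (e.g. 30p, 500m euro, 338bn euros) is part of the exercise.
money_pattern = re.compile(r"[£$€]\d[\d,]*(?:\.\d+)?(?:m|bn)?")

for match in money_pattern.findall(html):
    print(match)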
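Similarly, a sketch of how a phone-number regex could be checked against the list in Part 1. The candidate pattern below is a placeholder assumption, not a correct answer; it will not match every number as given.

# Sketch: test a candidate phone-number regex against the numbers from Part 1.
import re

numbers = [
    "555.123.4565", "+1-(800)-545-2468", "2-(800)-545-2468", "3-800-545-2468",
    "555-123-3456", "555 222 3342", "(234) 234 2442", "(243)-234-2342",
    "1234567890", "123.456.7890", "123.4567", "123-4567",
    "1234567900", "12345678900",
]

# Placeholder pattern: refine it until every number above is matched in full.
candidate = re.compile(r"\+?\d{0,2}[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")

for n in numbers:
    result = "match" if candidate.fullmatch(n) else "no match"
    print(f"{n:20s} {result}")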
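For Part 2, a minimal sketch of one possible approach. It assumes the NLTK book collection has already been downloaded (e.g. via nltk.download('book')) and uses str.isalpha() as one simple way of dropping punctuation tokens; note that this also drops numeric tokens.

# Sketch for Part 2: 50 most frequent words in text7 (Wall Street Journal),
# lowercased, with punctuation tokens filtered out.
from nltk import FreqDist
from nltk.book import text7  # requires the NLTK "book" collection

# Keep only purely alphabetic tokens, lowercased, so punctuation is excluded.
words = [w.lower() for w in text7 if w.isalpha()]

fdist = FreqDist(words)
for word, count in fdist.most_common(50):
    print(word, count)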
Part 3: Language modeling (50%; paper-based, no coding needed for this part)

Exercise 1

Consider the following toy example.

Training data:
I am Sam
Sam I am
Sam I like
Sam I do like
do I like Sam

Assume that we use a bigram language model based on the above training data.

1. What is the most probable next word predicted by the model for each of the following word sequences? (10%)
(a) Sam ...
(b) Sam I do ...
(c) Sam I am Sam ...
(d) do I like ...

2. Which of the following sentences is better, i.e., gets a higher probability with this model? (10%)
(a) Sam I do I like
(b) Sam I am
(c) I do like Sam I am

Exercise 2

Consider again the same training data and the same bigram model. Compute the perplexity of the following sequence: (10%)

I do like

(The standard bigram and perplexity formulas are recalled in the reference note at the end of this sheet.)

Exercise 3

Take again the same training data. This time, we use a bigram LM with Laplace smoothing.

1. Give the following bigram probabilities estimated by this model: (10%)

P(do|<s>)   P(do|Sam)   P(Sam|<s>)   P(Sam|do)
P(I|Sam)    P(I|do)     P(like|I)

Note that for each word w_{n-1}, we count one additional bigram for each possible continuation w_n. Consequently, we have to take into consideration not only the words but also the symbol </s>.

2. Calculate the probabilities of the following sequences according to this model:
(a) do Sam I like
(b) Sam do I like

Which of the two sequences is more probable according to our LM (language model)? (10%)
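Reference note (a reminder of standard definitions only, not additional assignment text): Exercises 1 and 2 rely on the maximum-likelihood bigram estimate and on perplexity, as defined in standard textbooks such as Jurafsky and Martin, with w_0 = <s> as the sentence-start symbol:

  P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}

  PP(W) = P(w_1 w_2 \dots w_N)^{-1/N} \approx \left( \prod_{i=1}^{N} P(w_i \mid w_{i-1}) \right)^{-1/N}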
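Likewise, Exercise 3 relies on the add-one (Laplace) smoothed bigram estimate, where V is the vocabulary size (which, as noted in Exercise 3, includes the symbol </s>):

  P_{Laplace}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n) + 1}{C(w_{n-1}) + V}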