
ECON6087 Spring 2023 Assignment 1


For Assignment 1, we will use a new corpus, the “A Million News Headlines” corpus, covering all the news headlines published by the Australian news source ABC (Australian Broadcasting Corporation, http://www.abc.net.au) over a period of 19 years. The data can be accessed from the following Kaggle page: https://www.kaggle.com/datasets/therohk/million-headlines. You can also learn more about the dataset and find some coding examples on the same page. Please use this data to complete the following tasks:

1. Train word embeddings using word2vec on this corpus, and perform a sentiment analysis based on the word embeddings and the “positivity” vector. We construct this vector in the same way as Luca Bellodi (2022):

$$
\begin{aligned}
\overrightarrow{\text{positivity}} ={} & \overrightarrow{\text{success}} + \overrightarrow{\text{good}} + \overrightarrow{\text{happy}} + \overrightarrow{\text{perfect}} + \overrightarrow{\text{important}} + \overrightarrow{\text{worth}} + \overrightarrow{\text{rich}} \\
& - \overrightarrow{\text{failure}} - \overrightarrow{\text{bad}} - \overrightarrow{\text{sad}} - \overrightarrow{\text{terrible}} - \overrightarrow{\text{bad}} - \overrightarrow{\text{regret}} - \overrightarrow{\text{poor}}
\end{aligned}
$$

• Use whatever pre-processing steps you consider appropriate;

• Decide on the number of dimensions, the number of iterations, and which model you would like to train;

• Choose a reasonable distance (or similarity) measure;

• Find a reasonable way to aggregate the word-level sentiment scores to the document level.

2. Plot the article-level sentiment scores by year-month.

3. Try to construct sentiment scores toward different countries or international organizations, such as “US”, “UK”, “Russia”, “Iran”, “NATO”, and “UN”.
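The scoring logic behind the three tasks can be sketched as below. This is a minimal illustration in Python using only the standard library. It is not a reference solution, and it is not the R Markdown submission the assignment asks for. Everything in it is an assumption made for illustration: the two-dimensional toy embeddings stand in for vectors from a trained word2vec model, cosine similarity is one possible similarity measure, the mean over in-vocabulary tokens is one possible document-level aggregation, and the `YYYYMMDD` date strings assume a date format that should be checked against the actual Kaggle file. The seed word lists follow the positivity formula as written above, including the repeated “bad”.

```python
import math
from collections import defaultdict

# Seed words copied from the positivity formula ("bad" appears twice there).
POSITIVE = ["success", "good", "happy", "perfect", "important", "worth", "rich"]
NEGATIVE = ["failure", "bad", "sad", "terrible", "bad", "regret", "poor"]


def positivity_vector(emb):
    """Sum the positive seed vectors and subtract the negative ones."""
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for sign, words in ((1.0, POSITIVE), (-1.0, NEGATIVE)):
        for word in words:
            if word in emb:  # skip seed words missing from the vocabulary
                vec = [a + sign * b for a, b in zip(vec, emb[word])]
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def headline_score(tokens, emb, pos):
    """Mean similarity to the positivity vector over in-vocabulary tokens."""
    sims = [cosine(emb[t], pos) for t in tokens if t in emb]
    return sum(sims) / len(sims) if sims else None


def monthly_means(rows, emb, pos, target=None):
    """Average headline scores by year-month.

    rows: iterable of (date string 'YYYYMMDD', headline text).
    target: if given, keep only headlines whose tokens include this word.
    """
    buckets = defaultdict(list)
    for date, text in rows:
        tokens = text.lower().split()  # naive whitespace tokenisation
        if target is not None and target not in tokens:
            continue
        score = headline_score(tokens, emb, pos)
        if score is not None:
            buckets[date[:4] + "-" + date[4:6]].append(score)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


# Toy embeddings (hypothetical values, NOT trained on the corpus).
EMB = {w: [1.0, 0.0] for w in POSITIVE}
EMB.update({w: [-1.0, 0.0] for w in NEGATIVE})
EMB.update({
    "wins": [1.0, 0.0], "crisis": [-1.0, 0.0],
    "talks": [0.0, 1.0], "us": [0.0, 1.0], "russia": [0.0, 1.0],
})

# Made-up headlines, for illustration only.
rows = [
    ("20030219", "us wins talks"),
    ("20030220", "russia crisis"),
    ("20030301", "good talks"),
]

pos = positivity_vector(EMB)
overall = monthly_means(rows, EMB, pos)                      # task 2 input
toward_russia = monthly_means(rows, EMB, pos, target="russia")  # task 3 input
print(overall)
print(toward_russia)
```

Filtering on headlines that mention a target word is only one way to approach task 3. Scoring just the words near the target, or comparing the target's own embedding with the positivity vector, are alternatives worth considering.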

Please submit your R Markdown file, including both the code that completes the above tasks and the plots it produces as output. The deadline is 8 March, before class (6:15 pm).
