605.646 Natural Language Processing: Class Project

Overview

The class project allows you the opportunity to investigate a particular topic in greater depth

than we can cover in the classroom. Projects are individual endeavors that require advance

planning and progressive effort to complete successfully. Most projects involve writing or

using software to conduct an experiment or process a textual dataset. It is acceptable and

encouraged to use open-source toolkits and source code (with citation) for components of

your project. Successful projects usually have the following characteristics:

● involve working with two or more distinct NLP techniques (not necessarily ones

specifically covered in the class lectures)

● work with language data

● include meaningful experiments and report quantitative results

● are scoped appropriately for completion in a one-semester course

We provide below specifications for three diverse projects: (a) answering factoid questions,

(b) cross-language information retrieval, and (c) detecting adverse drug reactions; you may

choose any one of them. If there is a different project that interests you, perhaps from a

hobby or professional interest, you may propose to do that instead. Projects focused on

topics like sentiment analysis, text retrieval, information extraction, authorship attribution,

spam filtering, detecting fake reviews or fake news, dialog systems, large language models,

or translation are all reasonable ideas.

To earn a grade of A- or higher in the course, students must complete and submit a project;

however, completing a project does not guarantee receiving an A- or higher. Students may

opt out of submitting a project, in which case the other coursework will determine the final

grade, as discussed in the course syllabus.

Grading Criteria

Project grades are based on the work performed and documented in the written report

(70%), and a presentation to the class (30%) that is a shared, pre-recorded video. Criteria

that we use to score presentations are:

1. Were the project’s goals and motivation sufficiently explained? (1-10)

2. Was suitable and meaningful background information presented (e.g., prior work)? (1-10)

3. Did the presentation provide sufficient technical detail (1-5) and articulate a contribution or

insight? (1-5)

4. Clarity of the presentation, quality of slides or materials, appropriate length (1-10).

5. Was the work well thought out? Did conclusions follow from the argument or experimental

results? (1-10)

Proposal

Irrespective of whether you are doing one of the projects provided by the instructors or one

of your own devising, you must submit a written proposal in Canvas for approval by the

instructors no later than end of Module 6. The proposal should have a title, must identify the

project topic, should briefly motivate why this is an interesting or important natural language

problem, identify some relevant scientific literature for the problem of interest, identify

sources of data, and outline planned work for the project. Sufficient details about data,

experimental design, and evaluation methodology should be provided. (For the

instructor-provided projects, some of this information will be easy to compile.) Proposals are usually

less than a page in length. If you have a project topic that interests you, but you have

questions or are not sure how to proceed, you are strongly encouraged to contact us

informally for ideas or feedback in advance of submitting the proposal.

Written Report

The written report is the most significant project deliverable – it is where you document the

work that you have performed, and it counts for most of the project grade. Reports should

be scientifically oriented and should include an abstract, an introduction to the problem, a

brief review of related work, details about experimental design (e.g., how training/dev/test

data is used, what evaluation metrics are reported, etc.), experimental results with analysis,

findings supported by the work, and appropriate references. You have flexibility in the style

of formatting; however, do include headings, and use a font between 10 and 12 points.

Suitable tables and figures are highly encouraged.

You should take care to clearly communicate the scale and quality of your work. We leave

the length of the report up to you, but as a rough guideline, five pages is probably too short,

and over 10 pages is getting long. You do not need to include source code (but details about

the amount of code you wrote, or which packages you used can be informative). And we

repeat that tables, charts, sample data, and figures that help explain experimental results

and observations are valued.

Reports are due on the last day of the final Module and should be submitted in Canvas as a

single PDF file.

Presentation

You will share an approximately ten-minute video presentation about your project during the

last Module. A suggested format is voice-annotated slides created using PowerPoint,

Keynote, OpenOffice, etc. The presentation should focus on describing the problem, your

approach, any difficulties encountered, qualitative and quantitative results, and any

interesting observations and findings. Experiments are not always successful, and you can

achieve a good score on the project, even with negative results; however, your design should

be good and you need to articulate what was learned.

Schedule

Module 6: By Day 7 of Module 6, select a topic and submit a proposal in Canvas (as

PDF). One page is enough. Earlier is okay. You are welcome to contact the

instructors ahead of time to informally discuss ideas.

Anytime: You are welcome to contact the instructors for advice if you have questions

about projects. We are available during office hours or by email.

Module 11: There will be no new lecture material or assigned readings this week. We

will set up times on a calendar when students may meet with the

instructors for individual project consulting. If you prefer not to meet over

Zoom to discuss your project, please do send us a brief status update by

email to both instructors by the end of the Module. We mainly want to

know if you are making progress, and if you discover any serious

impediments to the project that might require a late change in plans.

Module 14: By Day 1 of Module 14, create a discussion post titled "Project video: BRIEF

TITLE" with an attached video or a link to your video online.

Module 14: By Day 7 of Module 14, upload your written report as a PDF file in Canvas.

Literature

Numerous resources are available to you. Research papers can be found via Google

Scholar, the ACL Anthology, arXiv, CiteSeer, or websites for various conferences. JHU

libraries can provide access to the ACM and IEEE digital libraries. (You may have to be VPN'd

into the JHU network to use some of these resources.)

Datasets

There are several shared tasks with available datasets. Data from Kaggle or HuggingFace

may be a good starting place for some projects. The computational linguistics community

also runs many shared tasks at conferences. One of the more popular evaluation

workshops is SemEval, which has run tasks for many years. The websites for the most

recent completed campaigns are:

https://semeval.github.io/SemEval2024/

https://semeval.github.io/SemEval2023/

https://semeval.github.io/SemEval2022/

https://semeval.github.io/SemEval2021/

Citation

The source of any code not written by you must be cited. This includes online tutorials, code

completion software, other students, etc.

Project A: Answering Factoid Questions

This project revolves around building and evaluating a system that attempts to automatically answer

a question whose answer is generally a short noun phrase. The question should be answered based

on information from the documents in the provided collection, not from general world knowledge,

an existing knowledge graph, or Internet sources.

We are providing a small collection of approximately 227k English sentences. The sentence collection is based

on news articles written by the Southeast European Times, a now-defunct news portal that closed in

2015. The site published content covering the Balkans. We are not providing you with any training

data; however, we are providing an evaluation set of 50 questions that predominantly seek a person

or a location as the response. Because the expected answers are short noun phrases, you should

generate one response per question that is no longer than 100 characters in length. You can score

a set of responses using the provided ScoreAnswers script on a file of answers, one per line.
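
As a minimal sketch of preparing a file of responses, the snippet below collapses each response onto a single line, caps it at 100 characters, and writes one response per line. The function names and output path are hypothetical, and we assume here that the line order follows the question order; check the provided ScoreAnswers script for the exact format it expects.

```python
MAX_LEN = 100  # length limit stated in the project description


def format_answer(text, max_len=MAX_LEN):
    """Collapse all whitespace (so the answer stays on one line) and cap its length."""
    one_line = " ".join(text.split())
    return one_line[:max_len]


def write_answers(answers, path):
    """Write one formatted answer per line (assumed: same order as the questions)."""
    with open(path, "w", encoding="utf-8") as f:
        for answer in answers:
            f.write(format_answer(answer) + "\n")
```

Truncating by character count may cut a phrase mid-word, so it is usually better to keep candidate answers short in the first place and treat the cap as a safety net.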

The factoid QA task was popularized at the NIST TREC-8 evaluation in 1999. With the more recent

advent of deep learning, additional datasets such as NewsQA and SQuAD have become available.

Many early QA systems followed the same general architecture, which consists of three or four

primary components in a pipeline:

• Analyze the question (and determine the answer type)

• Document / Passage search

• Candidate answer extraction (possibly exploiting NER)

• Optional validation and selection of the top-ranked response
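
To make the pipeline above concrete, here is a deliberately simple sketch using only the Python standard library. Everything in it is an illustrative assumption rather than a recommended design: the stopword list, the word-overlap retrieval, the capitalized-span heuristic standing in for NER, and the toy sentences (which are not taken from the provided collection). The answer type is computed but not used to filter candidates here; a real system would typically do so.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "who", "where",
             "what", "which", "did", "to", "and"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_type(question):
    """Step 1: guess the expected answer type from the question word."""
    first = question.strip().lower().split()[0]
    return {"who": "PERSON", "where": "LOCATION"}.get(first, "OTHER")


def retrieve(question, sentences, k=3):
    """Step 2: rank sentences by the number of shared non-stopword tokens."""
    q = set(tokenize(question)) - STOPWORDS
    scored = [(len(q & set(tokenize(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


def candidates(sentence, question):
    """Step 3: capitalized word runs that are not already part of the question.

    This is a crude stand-in for NER; it also picks up sentence-initial words.
    """
    q = set(tokenize(question))
    spans = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", sentence)
    return [s for s in spans if not set(tokenize(s)) <= q]


def answer(question, sentences):
    """Step 4: return the candidate seen most often in the top sentences."""
    votes = Counter()
    for s in retrieve(question, sentences):
        votes.update(candidates(s, question))
    return votes.most_common(1)[0][0][:100] if votes else ""


toy_sentences = [
    "Boris Tadic was elected president of Serbia in 2004.",
    "The weather in Belgrade was sunny.",
    "Serbia elected Boris Tadic as president.",
]
print(answer("Who was elected president of Serbia?", toy_sentences))
```

A baseline along these lines gives you something to score early, after which each stage can be replaced independently (for example, a proper search engine for retrieval, or an NER tagger for candidate extraction).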

A more recent approach is based on using document search and LLMs to perform question

answering. This method is briefly described in J&M Chapter 14.

We expect that you will implement a baseline approach, evaluate performance quantitatively, and

then conduct experiments to try to improve performance on the task. Your work should make use of

at least two distinct HLT technologies; a solution based solely on LLM extraction (if you attempt that)

is not enough, but a combination of LLM with RAG would be. It is acceptable to use general purpose

NLP tools; however, you should not merely run a QA system that has been created by others. (It is

permissible to run someone else's QA system for the purpose of comparing your performance to

previous results.) Your analysis could include a comparison of different approaches, a measure of

the benefit of data augmentation or fine-tuning, or other areas you find of interest.

You should measure and report system performance using the script and test data that we provide.

However, you may use other datasets to help develop your system or to conduct your experiments.

NLPProgress.com hosts a QA leaderboard page that is a good place to look for English language QA

resources.

Project B: Cross-Language Information Retrieval

Finding information in a language that you speak is usually straightforward. For example, if you speak

English and are trying to find Web documents you can use a variety of search engines, such as Bing,

Google, or DuckDuckGo, to find information on a topic of interest, then read those documents

directly. But what if the information you are seeking is only published in a language that you can’t

read? This used to be a rare use case, but with the advent of high-quality machine translation,

searching such content makes much more sense. A Cross-language Information Retrieval (CLIR) system takes queries in one

language and returns relevant documents that are written in a different language.

You are to build a CLIR system. Your system will:

• Index a large document collection in Chinese, Russian, or Persian

• Take English queries as input

• Use machine translation to translate queries into the language of the documents

• Use your index to retrieve the top 1000 most relevant documents for each query

• Evaluate your system using nDCG@10, recall, and average precision
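
The steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the word-by-word lexicon is a hypothetical placeholder for a real machine translation system, the three Russian "documents" are invented, tokenization is plain whitespace splitting, and the BM25 parameters (k1=1.2, b=0.75) are common defaults rather than tuned values. For the actual project you would use an established retrieval toolkit and MT system.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real MT system.
TOY_LEXICON = {"black": "черная", "cat": "кошка", "dog": "собака"}


def translate_query(query, lexicon):
    """Placeholder for MT: word-by-word lookup, dropping unknown words."""
    return [lexicon[w] for w in query.lower().split() if w in lexicon]


class BM25Index:
    """A tiny in-memory BM25 index over whitespace-tokenized documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tf = {d: Counter(text.lower().split()) for d, text in docs.items()}
        self.length = {d: sum(c.values()) for d, c in self.tf.items()}
        self.avg_len = sum(self.length.values()) / len(self.length)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for c in self.tf.values() for term in c)

    def idf(self, term):
        n = len(self.tf)
        return math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))

    def search(self, terms, k=1000):
        """Return up to k (doc_id, score) pairs, best first."""
        scores = {}
        for doc_id, counts in self.tf.items():
            norm = self.k1 * (1 - self.b + self.b * self.length[doc_id] / self.avg_len)
            score = sum(
                self.idf(t) * counts[t] * (self.k1 + 1) / (counts[t] + norm)
                for t in terms if counts[t] > 0
            )
            if score > 0:
                scores[doc_id] = score
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]


docs = {
    "d1": "черная кошка спит",
    "d2": "собака лает",
    "d3": "кошка и собака",
}
index = BM25Index(docs)
results = index.search(translate_query("black cat", TOY_LEXICON), k=1000)
print(results)
```

Whitespace tokenization is a reasonable first cut for Russian but not for Chinese, which is one reason tokenization of the document language is a natural place to look for improvements.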

Data

You will use a CLIR dataset from the TREC NeuCLIR collection (use of a different collection is

possible with permission of the instructors). These datasets include:

• Documents in a non-English language, either Chinese, Russian, or Persian

• Topics. These are English expressions of a user information need that you will convert to a

query provided as input to your retrieval engine. NeuCLIR topics include a two- or three-word

title and a sentence-length description. Here is an example:

title: Iranian female athletes refugees

description: I am looking for stories about Iranian female athletes who seek asylum in other countries.

• Judgments. Usually called qrels for historical reasons, these are decisions about the

relevance of given documents to the topics. Each qrels entry includes a query ID query_id, a

document ID doc_id, and a relevance judgment relevance. NeuCLIR has three relevance levels:

‘3’ meaning the document is highly relevant to the topic, ‘1’ meaning it’s somewhat relevant,

and ‘0’ meaning it’s not relevant.

The NeuCLIR collections are available from https://ir-datasets.com/neuclir.html. This site hosts

many information retrieval datasets with easy-to-use Python interfaces. You will need to pip install

ir_datasets.

Retrieval Systems

You may use any monolingual information retrieval system you like, including one of your own

construction if you so choose. Options you might consider include: Terrier, Anserini, Lucene,

ColBERT, etc. We recommend starting with a statistical system, as such systems are fast, reliable, and do not

typically require training. Then, if you have time, you are welcome to experiment with a neural system.

Translation Systems

You may use a machine translation system of your choice. Options you might consider include

EasyNMT, TranslateShell (which can call Bing or Google APIs), or NLLB.

Evaluation

For evaluation you should report nDCG@10, recall, and average precision. These measures are all

easily available through IR Measures. You will need to pip install ir_measures.
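
To clarify what these measures compute, the following standard-library sketch evaluates a single query. It is meant for building intuition, not for reporting: the numbers in your report should come from ir_measures. We assume here that the graded relevance values (3, 1, 0) are used directly as gains with a log2 rank discount, and that any judgment above 0 counts as relevant for recall and average precision; the document IDs are invented.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking, qrels, k=10):
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranking, qrels, k=1000):
    """Fraction of relevant documents that appear in the top-k results."""
    relevant = {d for d, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranking[:k])) / len(relevant)


def average_precision(ranking, qrels):
    """Mean of precision values at the rank of each retrieved relevant document."""
    relevant = {d for d, rel in qrels.items() if rel > 0}
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy judgments for one query: doc_id -> relevance level.
qrels = {"d1": 3, "d2": 1, "d3": 0, "d4": 3}
ranking = ["d1", "d3", "d2"]  # system output, best first
print(ndcg_at_k(ranking, qrels), recall_at_k(ranking, qrels), average_precision(ranking, qrels))
```

Per-query values like these are averaged over all topics to obtain the collection-level scores.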

Tasks

We expect you to create a baseline system and assess its performance on your chosen NeuCLIR

collection. Then you should attempt to improve your baseline system and report the results of your

enhancement(s) compared to the baseline approach. For example, you might focus on better

tokenization of the document language, improved query translation, query expansion, or a different

retrieval algorithm.

To perform error analysis, you might choose to use machine translation to convert some of the top

ranked documents to English and display them. This requires translation in the opposite direction

from the queries, but the mechanism should be the same.

Your project should conform to the requirements in the Class Project handout. Reminder: the source

of any code not written by you must be cited. This includes online tutorials, code completion

software, other students, etc.

Project C: Detecting Adverse Drug Reactions

You are to build a system that takes in portions of English text (i.e., sentences or short paragraphs)

and extracts any tuples (drug-or-intervention, adverse-consequence) that are supported by the text.

For example, given the passage: “Children are not permitted to take aspirin because of Reye’s

syndrome”, the system should extract (aspirin, Reye’s syndrome). You will be given some training

and evaluation data for this task, and you should evaluate your system performance using precision,

recall, and a composite F1 score.
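
As a small sketch of the scoring, precision, recall, and F1 can be computed over sets of extracted tuples as shown below. Two choices here are assumptions rather than requirements: a predicted tuple counts as correct only on an exact match after lowercasing and trimming, and all tuples are pooled into one set (when scoring a whole file, you would add a passage identifier to each tuple so that identical pairs from different passages stay distinct). The second and third tuples in the example are invented toy data.

```python
def normalize(pair):
    """Lowercase and trim both elements so trivial differences do not count as errors."""
    drug, effect = pair
    return (drug.strip().lower(), effect.strip().lower())


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of (drug, effect) tuples."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [("aspirin", "Reye's syndrome"), ("lithium", "tremor")]
predicted = [("Aspirin", "Reye's syndrome"), ("aspirin", "nausea")]
print(precision_recall_f1(predicted, gold))
```

Whichever matching criterion you adopt (exact, case-insensitive, or partial span overlap), state it in your report, since it can change the scores considerably.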

We are providing training and evaluation data using the ade_corpus_v2 benchmark, a version of

which is described in a paper by Gurulingappa et al., and which can be found on Hugging Face. We

have created an 80%/10%/10% partition for you to use. The data are in TSV format with three

columns: text, drug, and effect.
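
A minimal sketch of reading such a file is shown below. It assumes a header row and that a passage with several drug/effect pairs appears on several rows; neither assumption is confirmed by this handout, so inspect the files first. The function name and the in-memory sample are hypothetical.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["text", "drug", "effect"]


def read_ade(lines, has_header=True):
    """Group (drug, effect) pairs by passage text from tab-separated lines."""
    # QUOTE_NONE keeps quotation marks and apostrophes in the text untouched.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumed header row: text, drug, effect
    pairs_by_text = defaultdict(set)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guessing their meaning
        text, drug, effect = row
        pairs_by_text[text].add((drug, effect))
    return dict(pairs_by_text)


sample = io.StringIO(
    "text\tdrug\teffect\n"
    "Children are not permitted to take aspirin because of Reye's syndrome\t"
    "aspirin\tReye's syndrome\n"
)
data = read_ade(sample)
print(data)
```

For a file on disk, pass an open file object instead, for example `open(path, encoding="utf-8", newline="")`.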

Some ideas:

• Detecting ADRs in natural text is a well-studied problem. You may want to review the existing

literature for suggestions about auxiliary data sources or techniques. Searching the ACL

Anthology is a good place to start.

• You may benefit from utilizing an existing list of drug names, or adverse reactions, or

automatically learning them yourself from unlabeled corpora.

• There are some databases of known drug/side-effect pairs which could potentially be of use,

for example:

o https://www.canada.ca/en/health-canada/services/drugs-health-products/medeffect-canada/adverse-reaction-database.html

o https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html

And there may be other databases that are easier to use.

• A variant of the problem you could explore would be to take the English language training

data that we have provided, and using translation or multilingual embeddings, try to adapt

this to extract ADRs on non-English data (e.g., Russian:

https://github.com/cimm-kzn/RuDReC or Spanish: https://github.com/isegura/ADR).

We expect that you will create a baseline approach, evaluate performance quantitatively, and then

conduct experiments to try to improve performance on the task. Your work should make use of at

least two distinct HLT technologies. It is acceptable to use general purpose NLP tools; however, you

should not merely run an ADR system that has been created by others. (It is permissible to run an

ADR system of others for the purpose of comparing your performance to previous results.) Your

analysis could include a comparison of different machine learning approaches, the utility of different

features, measuring the benefit of data augmentation, or other areas you find of interest.

Some approaches you might consider exploring include:

• First detect drug names (or other medical interventions), and adverse reactions using regular

expressions, gazetteers, or sequence taggers. Then use supervised machine learning to

extract possible pairs given the drugs/reactions.

• Try various prompts and call an LLM such as ChatGPT to take a passage of text and extract

desired pairs.
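
As one possible starting point for the first approach, the sketch below finds mentions with small gazetteers and then enumerates every drug/reaction combination in a passage as a candidate pair. The gazetteer entries are toy placeholders, not a real resource, and the function names are hypothetical. The supervised step is not shown: a classifier would still need to decide which candidates the text actually supports.

```python
import re
from itertools import product

# Toy gazetteers (assumptions for illustration only).
DRUGS = {"aspirin", "warfarin"}
EFFECTS = {"reye's syndrome", "bleeding"}


def find_mentions(text, gazetteer):
    """Return gazetteer terms found in the text, in order of appearance."""
    found = []
    for term in gazetteer:
        # Require non-word characters (or string edges) around the term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0)))
    return [mention for _, mention in sorted(found)]


def candidate_pairs(text, drugs=DRUGS, effects=EFFECTS):
    """Pair every drug mention with every reaction mention in the passage."""
    return list(product(find_mentions(text, drugs), find_mentions(text, effects)))


passage = "Children are not permitted to take aspirin because of Reye's syndrome"
print(candidate_pairs(passage))
```

Enumerating all combinations over-generates when a passage mentions several drugs and reactions, which is exactly the case the subsequent classifier is meant to resolve.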