代写辅导接单-COMP5425/4425 -python代写

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top

School of Computer Science

Page 1 of 3

COMP5425/4425 Multimedia Retrieval 2025S1

Homework

Homework is an individual assessment and worth 15% of the total assessment of this unit

of study. It is due in Week 08.

You need to

1)

Complete a programming task, write a report of your implementation and results,

produce a video to explain your implementation and results.

2)

make online submission of your source codes, your report, and your video in a zip

file named with your unikey.

Due: Tue, 15/04/2025 (Week 08)

Note that this homework aims to provide students opportunity to learn and implement a

text retrieval algorithm of a specific application and you must complete the task by yourself.

Using Artificial Intelligence (AI) tools is allowed and needs to be acknowledged, if it

complies with the University’s policies.

Some codes of this task may have been available publicly (e.g., on GitHub) or the task has

been implemented as a set of function calls in some programming libraries. You must not

use such resources or complete the project directly with those function calls, while you can

still use programming libraries for components which are not directly relevant to the task to

be completed.

What you can ONLY use:

- Python language itself

- NLTK for tokenization, lemmatization.

- NumPy, SciPy.

- built-in library from python (as listed https://docs.python.org/3/library/)

- Any IDE/Editor

Task – Passage Retrieval/Ranking

Your need to implement a simple passage retrieval/ranking system. Your system needs to

accept a text as query and retrieve a list of documents.

Inputs:

1. query(str): a textual query, it can be:

- A simple word, phrase, sentence OR

- Boolean expression.

A Boolean expression combines keywords with Boolean operators (i.e. "AND", "OR",

"NOT"). An example query is: `"Australia AND (Housing OR Property) NOT Car"`

The words (or keywords) can be exact words or wildcards, you only need to handle trailing

wildcards (e.g. Austra* matches all words starting with Austra with at least 0 any trailing

characters). You do not need to implement all the query parsing, refer to the requirements

paragraph for details.

COMP5425 Multimedia Retrieval PROJECT

School of Computer Science

Page 2 of 3

2. K(integer): the number of top K documents in your ranking/retrieval.

Outputs:

1. Your output needs to contain top K documents id.

2. Your code can get the documents content with documents id.

3. You results needs to be in JSON format

An example output is:

Dataset:

You can choose any retrieval dataset and sub-sample them to obtain your own dataset.

Some examples are MS Marco(https://microsoft.github.io/msmarco/Datasets.html), TREC- CAR (https://github.com/BRIGHT-benchmark/BRIGHT). You can choose your own

appropriate processing methods.

Requirements:

You can choose several features for your query system.

1. Queries and documents processing

1. You need to process the text appropriately, such as tokenization,

lemmization, etc.

2. You need at least to implement: simple key words query.

3. You can implement Boolean expression [3, 4], wildcard query for higher

scores.

2. Ranking criteria/score

1. You need to implement ranking criteria to determine the similarity between

your query and passages.

2. You need to implement TD-IDF score.

3. You can implement BM25 (Best Matching 25) [1] for higher scores.

3. UI

1. You can implement Command Line Interface (CLI).

Marking Scheme (15 marks)

1. [12 marks] Algorithm implementation

Up to 6 marks

Implement basic function. Exact keywords matching. TF-IDF score implemented.

COMP5425 Multimedia Retrieval PROJECT

School of Computer Science

Page 3 of 3

Up to 9 marks

In addition to above features, implemented inverted-index, wildcard query.

Up to 12 marks

In addition to above features, implemented advanced scoring (e.g. BM25), Boolean query.

[3 marks] Report and Video

Please prepare a brief report up to 4 pages to describe your method and results. It could

contain key steps or details of the algorithm implemented, experimental settings, and

experimental results (or text cases), and several key references. A clear report will allow

readers to reproduce your work.

Please produce a video up to 3 minutes and 30 MB in MP4 format to explain and

demonstrate the implementation of your project. You may be asked to demonstrate and

explain your implementation to your tutor, if the tutor needs more information. Marks may

be deducted if you are not able to explain your work clearly.

References

1. Robertson, Stephen; Zaragoza, Hugo (2009). The Probabilistic Relevance Framework:

BM25 and Beyond (PDF). NOW Publishers, Inc.

2. Yuanhua Lv and ChengXiang Zhai. Lower-bounding term frequency normalization. In

Proceedings of CIKM'2011, pages 7-16

3. Aho, Alfred Vaino; Lam, Monica Sin-Ling; Sethi, Ravi; Ullman, Jeffrey David (2006).

Compilers: Principles, Techniques, and Tools (2 ed.). Boston, Massachusetts, US:

Addison-Wesley. ISBN 0-321-48681-1. OCLC 70775643

4. Reverse Polish Notation https://mathworld.wolfram.com/ReversePolishNotation.html

51作业君版权所有

51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468