INFS7410 Part 2 Assignment Analysis


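Task:

Implement information retrieval techniques based on query reduction, and compare them with the methods implemented in part 1.

Analysis:

Implement query reduction retrieval using the IDF-r and KLI methods. For the former, explore how different values of r (the retention rate) affect results on the training and test sets; for the latter, use MLE to compute, for each query term, the probability of it occurring in the retrieved set. Queries must be created from the title field of each topic. For example: Topic: CD008122 Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries

Query: 1. Exp Malaria/ 2. Exp Plasmodium/ 3. Malaria.ti,ab 4. 1or2or3 5. Exp Reagent kits, diagnostic/ 6. rapid diagnos* test*.ti,ab 7. RDT.ti,ab 8. Dipstick*.ti,ab. The query keywords are: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries. Taking BM25 as the baseline, compare gains and losses topic by topic for the 2017 and 2018 results, and evaluate using trec_eval.

Key concepts:

Information retrieval, IDF-r, KLI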


INFS7410 Project - Part 2
Preamble
The due date for this assignment is 19 September 2019 17:00 Eastern Australia Standard Time,
together with part 1.
This part of the project is worth 10% of the overall mark for INFS7410 (part 1 is worth 5% -- and
thus the whole submission of part 1 + 2 is worth 15%). A detailed marking sheet for this
assignment is provided at the end of this document.
Aim
Project aim: The aim of this project is to implement a state-of-the-art information retrieval
method, evaluate it and compare it to the baseline and rank fusion methods obtained in part 1 in
the context of a real use-case.
Project Part 2 aim
The aim of part 2 is to:
• use the evaluation infrastructure set up for part 1
• implement state-of-the-art information retrieval methods, based on query reduction
• evaluate, compare and analyse the developed state-of-the-art methods against baseline and
  rank fusion methods
The Information Retrieval Task: Ranking of studies for Systematic Reviews
Part 2 of the project considers the same problem described in part 1: re-rank a set of documents
retrieved for the compilation of a systematic review. A description of the wider task is provided in
part 1.
What we provide you with (same as part 1)
We provide:
• for each dataset, a list of topics to be used for training. Each topic is organised into a file.
  Each topic contains a title and a Boolean query.
• for each dataset, a list of topics to be used for testing. Each topic is organised into a file. Each
  topic contains a title and a Boolean query.
• each topic file (both those for training and those for testing) includes a list of retrieved
  documents in the form of their PMIDs: these are the documents that you have to rank. Take
  note: you do not need to perform the retrieval from scratch (i.e. execute the query against
  the whole index); instead you need to rank (order) the provided documents.
• for each dataset, and for each train and test partition, a qrels file, containing relevance
  assessments for the documents to be ranked. This is to be used for evaluation.
• for each dataset, and for test partitions, a set of runs from retrieval systems that
  participated in CLEF 2017/2018 to be considered for fusion.
• a Terrier index of the entire PubMed collection. This index has been produced using the
  Terrier stopword list and Porter stemmer.
• a Java Maven project that contains the Terrier dependencies and a skeleton code to give you
  a start. NOTE: Tip #1 provides you with a restructured skeleton code to make the processing
  of queries more efficient.
• a template for your project report.
What you need to produce
You need to produce:
• correct implementations of the state-of-the-art methods required by this project
  specification
• correct evaluation, analysis and comparison of the state-of-the-art methods, including
  comparison with the methods implemented in part 1. This should be written up into a
  report following the provided template.
• a project report that, following the provided template, details: an explanation of the
  state-of-the-art retrieval methods used (in your own words), an explanation of the evaluation
  settings followed, the evaluation of results (as described above), inclusive of analysis, and a
  discussion of the findings. Note that you will need to provide a single report that
  encompasses both part 1 and part 2.
Required methods to implement
In part 2 of the project you are required to implement the following query reduction retrieval
methods:
• Query reduction using IDF-r. We have discussed this method in the week 6 lecture (online
  video) and in the week 6 tutorial. This method is described in Koopman, Bevan, Liam
  Cripwell, and Guido Zuccon, "Generating clinical queries from patient narratives: A
  comparison between machines and humans." Proceedings of the 40th international ACM SIGIR
  conference on Research and development in information retrieval. ACM, 2017. (see the first
  paragraph of section 3.1 if you want a description from the literature -- ignore the settings
  described in that publication). You may have already implemented this for part 1 for
  reducing the Boolean queries (tip 4), and in the relevant tutorial.
• Query reduction using Kullback-Leibler informativeness (KLI). This reduction method is
  partially described in Daniel Locke, Guido Zuccon, and Harrisen Scells, "Automatic Query
  Generation from Legal Texts for Case Law Retrieval." Asia Information Retrieval Symposium.
  Springer, Cham, 2017. (top of page 187)
For IDF-r, we ask you to explore reduction on the query formed from the topic title. Queries will be
reduced at a retention rate of $r$, i.e. $r = 0.85$ means retaining 85%
of the original terms. We ask you to explore three retention rates on the training set: 85%, 50% and
30%. When rounding the number of query terms to retain to an integer number, use the ceiling
function.
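The retention rule above can be sketched as follows. This is a minimal illustration rather than the skeleton project's code: the class `IdfReduction`, its method names, the `ln(N / df)` IDF variant and the choice to give unseen terms an IDF of 0 are all assumptions made for the example. The document frequencies would come from the provided index, which is not queried here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative IDF-r reduction; class and method names are hypothetical. */
final class IdfReduction {

    /** Number of terms to retain: ceil(r * n), following the ceiling rule. */
    static int termsToRetain(int queryLength, double r) {
        return Math.min(queryLength, (int) Math.ceil(r * queryLength));
    }

    /** Assumed IDF variant: ln(N / df). Terms with df = 0 get IDF 0 here. */
    static double idf(long numDocs, long docFreq) {
        return docFreq > 0 ? Math.log((double) numDocs / docFreq) : 0.0;
    }

    /** Returns the ceil(r * n) highest-IDF terms, highest first. */
    static List<String> reduce(List<String> terms, Map<String, Double> idfByTerm, double r) {
        List<String> ranked = new ArrayList<>(terms);
        // List.sort is stable, so tied terms keep their original query order.
        ranked.sort(Comparator.comparingDouble(
                (String t) -> idfByTerm.getOrDefault(t, 0.0)).reversed());
        return new ArrayList<>(ranked.subList(0, termsToRetain(terms.size(), r)));
    }
}
```

For a nine-term query, the three rates give ceil(7.65) = 8, ceil(4.5) = 5 and ceil(2.7) = 3 retained terms.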
For implementing KLI, consider the following, revised definition of this method. The KLI of a term
$t$ is formally defined by
$$KLI(t) = P(t|D) \log \frac{P(t|D)}{P(t|C)}$$
where $D$ is the set of documents provided to rank (i.e. the documents initially retrieved by the
Boolean query), and $C$ is the entire collection as indexed in the provided index. Thus, you need to
compute, for each query term, the probability of the term appearing in the provided retrieved set
(i.e. term frequency in the set -- note, here $D$ does not represent one document, but the set
of initially retrieved documents): use MLE to compute this. Similarly, use MLE to compute the
probability of the term appearing in the collection. Query reduction is then performed by ranking
query terms in decreasing value of $KLI(t)$, and applying the retention rate $r$. For KLI, perform a
similar exploration of retention rates as for IDF-$r$.
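A minimal sketch of the KLI computation, assuming the standard Kullback-Leibler informativeness form P(t|D) * log(P(t|D) / P(t|C)), where D denotes the set of documents to rank and C the whole collection (the symbol names are chosen here for the example). The class `KliReduction`, its method names, the use of the natural logarithm and the choice to score a term as 0 when it is absent from D or C are assumptions. The four counts would be gathered from the provided index and the topic's document list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative KLI reduction; class and method names are hypothetical. */
final class KliReduction {

    /** Maximum likelihood estimate: occurrences of the term over all tokens. */
    static double mle(long termCount, long totalTokens) {
        return totalTokens > 0 ? (double) termCount / totalTokens : 0.0;
    }

    /** KLI(t) = P(t|D) * ln(P(t|D) / P(t|C)), with D the retrieved set and C the collection. */
    static double kli(long tfInSet, long tokensInSet, long tfInColl, long tokensInColl) {
        double pSet = mle(tfInSet, tokensInSet);
        double pColl = mle(tfInColl, tokensInColl);
        if (pSet == 0.0 || pColl == 0.0) {
            return 0.0; // assumption: treat an unseen term as uninformative
        }
        return pSet * Math.log(pSet / pColl);
    }

    /** Returns the ceil(r * n) highest-KLI terms, highest first. */
    static List<String> reduce(List<String> terms, Map<String, Double> kliByTerm, double r) {
        int keep = Math.min(terms.size(), (int) Math.ceil(r * terms.size()));
        List<String> ranked = new ArrayList<>(terms);
        // Stable sort: tied terms keep their original query order.
        ranked.sort(Comparator.comparingDouble(
                (String t) -> kliByTerm.getOrDefault(t, 0.0)).reversed());
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

The base of the logarithm scales every score by the same positive constant, so it does not change the term ranking.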
For both methods, rank documents according to the reduced queries using BM25 with the best
parameters found from part 1 for the dataset you are experimenting in.
When tuning, tune with respect to MAP.
We strongly recommend you use and extend the Maven project provided for part 1 to implement
these methods. You should have already attempted the implementation of IDF-$r$ as part of the
relevant tutorial exercise.
In the report, detail how the methods were implemented, including which formula you
implemented.
What queries to use
For part 2, we ask you to consider the queries for each topic created from the title field of each
topic. For example, consider the example (partial) topic listed below: the query will be Rapid
diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic
countries (you may consider performing text processing). This is the same query type used in
part 1.
Above: example topic file
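One possible form of the optional text processing is sketched below. It lowercases the title, splits on non-alphanumeric characters, and drops single-character tokens and a few stopwords. The class `TitleQuery` and its tiny stopword list are hypothetical. The provided index was built with the Terrier stopword list and the Porter stemmer, so real processing would likely need to mirror that pipeline, which this sketch does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative title-to-query processing; names and stopword list are hypothetical. */
final class TitleQuery {

    // Tiny stand-in list for illustration only; not the Terrier stopword list.
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "for", "in", "of", "the", "to"));

    /** Lowercases, splits on non-alphanumerics, drops short tokens and stopwords. */
    static List<String> tokenize(String title) {
        List<String> terms = new ArrayList<>();
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.length() >= 2 && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

Applied to the example title, this yields nine terms: rapid, diagnostic, tests, diagnosing, uncomplicated, falciparum, malaria, endemic, countries.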
Required evaluation to perform
In part 2 of the project you are required to perform the following evaluation:
1. For all methods, train on the training set for the 2017 topics with respect to the retention
rate $r$ and test on the testing set for the 2017 topics (using the parameter value you selected
from the training set). Report the results of every method on the training (the best selected)
and on the testing set, separately, into one table. Perform statistical significance analysis
across the results of the methods.
2. Comment on the results reported in the previous table by comparing the methods on the
2017 dataset.
3. For all methods, train on the training set for the 2018 topics with respect to the retention
rate $r$ and test on the testing set for the 2018 topics (using the parameter value you selected
from the training set). Report the results of every method on the training (the best selected)
and on the testing set, separately, into one table. Perform statistical significance analysis
across the results of the methods.
4. Comment on the results reported in the previous table by comparing the methods on the
2018 dataset.
5. Perform a topic-by-topic gains/losses analysis for both 2017 and 2018 results on the testing
datasets, using (tuned) BM25 as the baseline.
6. Comment on trends and differences observed when comparing the findings from 2017 and
2018 results. Is there a query reduction method that consistently outperforms the others?
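The topic-by-topic gains/losses analysis in item 5 can be sketched as a per-topic difference in average precision between a method and the baseline. The class `GainLoss` and its method names are hypothetical, and the per-topic AP values are assumed to have been read from per-query evaluation output beforehand.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative per-topic gain/loss computation; names are hypothetical. */
final class GainLoss {

    /** Method AP minus baseline AP, for every topic present in both maps. */
    static Map<String, Double> deltas(Map<String, Double> baselineAp,
                                      Map<String, Double> methodAp) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : baselineAp.entrySet()) {
            Double method = methodAp.get(e.getKey());
            if (method != null) {
                out.put(e.getKey(), method - e.getValue());
            }
        }
        return out;
    }

    /** Returns {gains, losses, ties} over the per-topic differences. */
    static int[] summarize(Map<String, Double> deltas) {
        int gains = 0;
        int losses = 0;
        int ties = 0;
        for (double d : deltas.values()) {
            if (d > 0) {
                gains++;
            } else if (d < 0) {
                losses++;
            } else {
                ties++;
            }
        }
        return new int[] {gains, losses, ties};
    }
}
```

Plotting the differences as one bar per topic, sorted by value, gives the usual gain-loss plot.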
In terms of evaluation measures, evaluate the retrieval methods with respect to mean average
precision (MAP) using trec_eval. Remember to set the cut-off value (-M, i.e. the maximum
number of documents per topic to use in evaluation) to the number of documents to be
re-ranked for each of the queries. Using trec_eval, also compute R-precision (Rprec), which is the
precision after R documents have been retrieved (by default, R is the total number of relevant
docs for the topic).
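trec_eval reads runs in the standard six-column TREC format: topic id, the literal `Q0`, document id, rank, score and run tag. A minimal sketch of producing such lines for one topic follows. The class `RunWriter`, the method name and the run tag used in the usage note are hypothetical, and the scores are assumed to be already sorted in decreasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative TREC run-line formatter; names are hypothetical. */
final class RunWriter {

    /** Formats one line per document: "topic Q0 docid rank score tag". */
    static List<String> toRunLines(String topicId, List<String> rankedPmids,
                                   List<Double> scores, String runTag) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rankedPmids.size(); i++) {
            // Ranks start at 1; Locale.ROOT keeps the decimal separator a dot.
            lines.add(String.format(Locale.ROOT, "%s Q0 %s %d %.6f %s",
                    topicId, rankedPmids.get(i), i + 1, scores.get(i), runTag));
        }
        return lines;
    }
}
```

For example, topic CD008122 with a made-up PMID `111`, score 12.5 and tag `idfr85` would give the line `CD008122 Q0 111 1 12.500000 idfr85`.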
For all statistical significance analysis, use a paired t-test; distinguish between p<0.05 and p<0.01.
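The paired t-test works on per-topic score differences between two methods. The sketch below computes only the t statistic, t = mean(d) / (sd(d) / sqrt(n)). Turning it into a p-value requires the Student's t distribution with n - 1 degrees of freedom, which is not in the Java standard library. A statistics library such as Apache Commons Math (its `TTest.pairedTTest`) is one option, mentioned here as a suggestion rather than a project requirement. The class `PairedTTest` and its method name are hypothetical.

```java
/** Illustrative paired t statistic; class and method names are hypothetical. */
final class PairedTTest {

    /** t = mean(d) / (sd(d) / sqrt(n)), where d[i] = a[i] - b[i] for each topic i. */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples with n >= 2");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += (a[i] - b[i]) / n;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (a[i] - b[i]) - mean;
            sumSquares += deviation * deviation;
        }
        // Sample variance uses n - 1; the standard error divides by n once more.
        double standardError = Math.sqrt(sumSquares / (n - 1) / n);
        // If all differences are identical the standard error is 0 and the result is not finite.
        return mean / standardError;
    }
}
```

The two arrays must list the same topics in the same order, for instance per-topic AP for a reduction method and for tuned BM25.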
Topic: CD008122
Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum
malaria in endemic countries
Query:
1. Exp Malaria/
2. Exp Plasmodium/
3. Malaria.ti,ab
4. 1or2or3
5. Exp Reagent kits, diagnostic/
6. rapid diagnos* test*.ti,ab
7. RDT.ti,ab
8. Dipstick*.ti,ab
How to submit
You will have to submit 3 files:
1. the report, formatted according to the provided template, saved as PDF or MS Word
document. Note, write the report by combining part 1 (the previous assignment) and part 2
(this assignment) results and methods. Make sure you clearly label methods and results that
belong to the different assignments.
2. a zip file containing a folder called runs-part2, which itself contains the runs (result files)
you have created for the implemented methods.
3. a zip file containing a folder called code-part2, which itself contains all the code to re-run
your experiments. You do not need to include in this zip file the runs we have given to you.
You may need to include additional files e.g. if you manually process the topic files into an
intermediate format (rather than automatically process them from the files we provide you),
so that we can re-run your experiments to confirm your results and implementation.
If your set of runs is too big, please do the following:
• include in the zip the test run
• include in the zip the best train run you used to decide upon the parameter tuning
• create a separate zip file with all the runs; upload it to a file sharing service like Dropbox or
  Google Drive (or similar), then make sure it is visible without login and add the link to it to
  your report. Please ensure that the link to the resources is available for at least 6 days after
  the submission of the assignment.
All items need to be submitted via the relevant Turnitin link in the INFS7410 Blackboard site, by 19
September 2019 17:00 Eastern Australia Standard Time, together with part 1, unless you have
been given an extension (according to UQ policy), before the due date of the assignment. Note:
appropriate, separate links are provided in the Assignment 2 folder in Blackboard for submission
of the report, runs-part1, runs-part2, code-part1, and code-part2.
INFS 7410 Project Part 2 – Marking Sheet

Criterion: IMPLEMENTATION (4%)
The ability to:
• Understand, implement and execute common IR baseline
• Understand, implement and execute rank fusion methods
• Perform text processing
7 (100%):
• Correctly implements both query reduction methods
4 (50%):
• Correctly implements only one of the specified query reduction methods
FAIL 1 (0%):
• No implementation

Criterion: EVALUATION (5%)
The ability to:
• Empirically evaluate and compare IR methods
• Analyse the results of empirical IR evaluation
• Analyse the statistical significance difference between IR methods' effectiveness
7 (100%):
• Correct empirical evaluation has been performed
• Uses all required evaluation measures
• Correct handling of the tuning regime (train/test)
• Reports all results for the provided query sets into appropriate tables
• Provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• Provides correct statistical significance analysis within the result table; and correctly describes the statistical analysis performed
• Provides a written understanding and discussion of the results with respect to the methods
• Provides examples of where query reduction works, and where it does not, and why, e.g., discussion with respect to queries, runs.
4 (50%):
• Correct empirical evaluation has been performed
• Uses all required evaluation measures
• Correct handling of the tuning regime (train/test)
• Reports all results for the provided query sets into appropriate tables
• Provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• Does not perform statistical significance analysis, or errors are present in the analysis
FAIL 1 (0%):
• No or only partial empirical evaluation has been conducted, e.g. only on a topic set, or a subset of topics
• Only reports a partial set of evaluation measures
• Fails to correctly handle training and testing partitions, e.g. train on test, reports only overall results

Criterion: WRITE UP (1%), binary score: 0/2
The ability to:
• use fluent language with correct grammar, spelling and punctuation
• use appropriate paragraph, sentence structure
• use appropriate style and tone of writing
• produce a professionally presented document, according to the provided template
7 (100%):
• Structure of the document is appropriate and meets expectations
• Clarity promoted by consistent use of standard grammar, spelling and punctuation
• Sentences are coherent
• Paragraph structure effectively developed
• Fluent, professional style and tone of writing.
• No proofreading errors
• Polished professional appearance
FAIL 1 (0%):
• Written expression and presentation are incoherent, with little or no structure, well below required standard
• Structure of the document is not appropriate and does not meet expectations
• Meaning unclear as grammar and/or spelling contain frequent errors.
• Disorganised or incoherent writing.