COMP6714 Review
Wei Wang
weiw AT cse.unsw.edu.au
School of Computer Science and Engineering
University of New South Wales
November 11, 2020
Course Logistics
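I The formula:
$$
\text{mark} =
\begin{cases}
0.25 \cdot (\text{ass1} + \text{proj1}) + 0.50 \cdot \text{exam}, & \text{if } \text{exam} \ge 40,\\
39\ \text{FL}, & \text{otherwise.}
\end{cases}
$$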
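As a quick sanity check of the formula, here is a minimal Python sketch. It assumes each component (ass1, proj1, exam) is a mark out of 100, which the slide does not state; the function name `final_mark` is hypothetical.

```python
from typing import Union


def final_mark(ass1: float, proj1: float, exam: float) -> Union[float, str]:
    """Apply the course formula; each component is assumed to be out of 100."""
    if exam >= 40:  # exam hurdle met: weighted sum of all components
        return 0.25 * (ass1 + proj1) + 0.50 * exam
    return "39 FL"  # hurdle failed: capped at 39 with grade FL


print(final_mark(80, 70, 60))    # 0.25 * 150 + 0.50 * 60 = 67.5
print(final_mark(100, 100, 35))  # exam below 40 -> 39 FL
```

Note that strong assignment and project marks cannot compensate for an exam mark below 40.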
I Exam date: 2 Dec (Wed) afternoon; exact time to be announced.
I Pre-exam consultations:
I TBA
I TBA
I Sample exam papers to be released soon.
I Feedback: via the course survey or private messages to me on the forum.
(1) The final exam mark is important: you must achieve at least
40! (2) The supplementary exam is only for those who cannot attend the final
exam. (3) Apply for UNSW Special Consideration (SC) with sufficient
evidence; the SC team will make the final decision.
About the Final Exam
I Time: 10 minutes reading time + 2-hour open-book exam + 15
minutes for scanning, uploading, and submission.
I It is very important that you know how to scan, upload, and submit.
Practise beforehand! We will run a practice session before the
exam.
I Designed to test your understanding of, and familiarity with, the core
content of the course.
I Total: 100 marks (8 questions).
I Questions are similar to those in the assignment.
Special Note on the Final Exam
I We trust that every student will uphold academic integrity.
I Severe consequences for any misconduct in the final exam.
About the Final Exam . . .
I Read the instructions carefully.
I You can answer the questions in any order.
I Some of the “advanced” methods/algorithms/systems are not
required unless explicitly mentioned here.
Tip: Write down intermediate steps, so that we can give you partial
marks even if the final answer is wrong.
Disclaimer: We will go through the main content of each lecture.
However, this review is by no means exhaustive.
Boolean Model
I incidence vector
I semantics of the query model (AND/OR/NOT, and other operators,
e.g., /k, /S)
I inverted index, positional inverted index
I query processing methods for basic and advanced boolean queries
(including phrase query, queries with /S operator, etc.)
I query optimization methods (list merge order, skip pointers)
I Not required: next-word index
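Two of the query-processing ideas above, sketched in Python under simplifying assumptions: postings are plain sorted lists of integer docIDs, skip pointers are simulated by index arithmetic with the common √L spacing heuristic rather than stored in the index, and a positional index entry is a dict from docID to sorted word positions. The function names are hypothetical, not taken from the course materials.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Tuple


def has_skip(i: int, step: int, n: int) -> bool:
    """Simulated skip pointer: every `step`-th position points `step` entries ahead."""
    return i % step == 0 and i + step < n


def intersect_with_skips(p1: List[int], p2: List[int]) -> List[int]:
    """AND-merge two sorted docID lists, following a skip only if it does not overshoot."""
    s1 = max(1, math.isqrt(len(p1)))  # sqrt(L) skip spacing heuristic
    s2 = max(1, math.isqrt(len(p2)))
    answer: List[int] = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            if has_skip(i, s1, len(p1)) and p1[i + s1] <= p2[j]:
                while has_skip(i, s1, len(p1)) and p1[i + s1] <= p2[j]:
                    i += s1
            else:
                i += 1
        else:
            if has_skip(j, s2, len(p2)) and p2[j + s2] <= p1[i]:
                while has_skip(j, s2, len(p2)) and p2[j + s2] <= p1[i]:
                    j += s2
            else:
                j += 1
    return answer


def positional_intersect(
    pos1: Dict[int, List[int]], pos2: Dict[int, List[int]], k: int
) -> List[Tuple[int, int, int]]:
    """t1 /k t2: return (docID, pos_t1, pos_t2) with |pos_t1 - pos_t2| <= k (either order)."""
    answer: List[Tuple[int, int, int]] = []
    for doc in intersect_with_skips(sorted(pos1), sorted(pos2)):
        window: Deque[int] = deque()  # t2 positions within k of the current t1 position
        q = pos2[doc]
        j = 0
        for p in pos1[doc]:
            while j < len(q):
                if abs(p - q[j]) <= k:
                    window.append(q[j])
                elif q[j] > p:
                    break  # too far right for this p; revisit it for the next p
                j += 1
            # Drop t2 positions that are now too far to the left.
            while window and abs(window[0] - p) > k:
                window.popleft()
            answer.extend((doc, p, qpos) for qpos in window)
    return answer


print(intersect_with_skips([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101]))  # [2, 31]
print(positional_intersect({1: [3, 7], 2: [1]}, {1: [1, 5, 9], 3: [2]}, k=2))
# [(1, 3, 1), (1, 3, 5), (1, 7, 5), (1, 7, 9)]
```

For a phrase query "t1 t2", one would keep only the triples with pos_t2 - pos_t1 == 1, since /k as sketched here is unordered.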
Preprocessing
I typical preprocessing steps: tokenization, stopword removal,
stemming/lemmatization
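A toy version of the preprocessing pipeline listed above. The stopword list and the suffix-stripping rules are assumptions made for illustration only; `stem` is a crude stand-in, not the Porter stemmer, and no lemmatization is performed.

```python
import re
from typing import List

# Tiny illustrative stopword list (assumed; real systems use larger lists or none).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Toy suffix rules, tried in order. NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> List[str]:
    """Lowercase, then keep maximal runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


print(preprocess("The crawlers are indexing the documents of the Web."))
# -> ['crawler', 'index', 'document', 'web']
```

The same function must be applied to both documents and queries, otherwise query terms will not match index terms.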
Index Construction
I Why do we need dedicated algorithms to build the index?
I BSBI: Blocked sort-based indexing
I SPIMI: Single-pass in-memory indexing
I Dynamic indexing: Immediate merge, no merge, logarithmic merge
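A compact SPIMI-style sketch. Assumptions: the input is a stream of (term, docID) pairs in increasing docID order, the memory budget is modelled as a maximum number of postings per block, and the "disk" blocks are ordinary Python lists; a real implementation writes each sorted block to disk and merges them with buffered reads. Unlike BSBI, there is no global termID mapping and no sort of (termID, docID) pairs: each block's dictionary grows on the fly and only its terms are sorted when it is written.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Block = List[Tuple[str, List[int]]]  # term-sorted (term, postings) pairs


def spimi_invert(stream: Iterable[Tuple[str, int]], max_postings: int) -> List[Block]:
    """Build one in-memory dictionary per block; flush it, sorted by term, when full."""
    blocks: List[Block] = []
    dictionary: Dict[str, List[int]] = defaultdict(list)
    size = 0
    for term, doc in stream:
        postings = dictionary[term]  # new terms are added on the fly
        if not postings or postings[-1] != doc:  # docIDs arrive in order, so dedupe cheaply
            postings.append(doc)
            size += 1
        if size >= max_postings:  # memory budget reached: "write" the block
            blocks.append(sorted(dictionary.items()))
            dictionary, size = defaultdict(list), 0
    if dictionary:
        blocks.append(sorted(dictionary.items()))
    return blocks


def merge_blocks(blocks: List[Block]) -> Dict[str, List[int]]:
    """k-way merge of term-sorted blocks into the final index."""
    index: Dict[str, List[int]] = {}
    # Equal terms come out in block order, so postings stay in docID order.
    for term, postings in heapq.merge(*blocks, key=lambda entry: entry[0]):
        merged = index.setdefault(term, [])
        for doc in postings:
            if not merged or merged[-1] != doc:  # a doc may straddle two blocks
                merged.append(doc)
    return index


stream = [("b", 1), ("a", 1), ("b", 2), ("c", 2), ("a", 3), ("b", 3)]
print(merge_blocks(spimi_invert(stream, max_postings=2)))
# -> {'a': [1, 3], 'b': [1, 2, 3], 'c': [2]}
```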
Vector Space Model
I What is ranked retrieval, and why do we need it?
I raw and normalized tf, idf
I cosine similarity
I tf-idf variants (using SMART notation): e.g., lnc.ltc
I basic query processing method: document-at-a-time vs
term-at-a-time
I exact & approximate query optimization methods (heap-based top-k
algorithm, MaxScore and WAND algorithms, etc.)
I Not required: Query processing methods based on advanced or
tiered inverted indexes (e.g., high/low lists, impact-oriented lists,
etc.)
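A minimal sketch of lnc.ltc scoring with term-at-a-time accumulators and a heap-based top-k, assuming base-10 logarithms and a made-up three-document collection. `heapq.nlargest` stands in for a hand-maintained bounded heap; MaxScore and WAND are not shown.

```python
import heapq
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine_normalize(weights: Vector) -> Vector:
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def lnc(tokens: List[str]) -> Vector:
    """Document weights: l = 1 + log10(tf), n = no idf, c = cosine normalization."""
    return cosine_normalize({t: 1 + math.log10(tf) for t, tf in Counter(tokens).items()})


def ltc(tokens: List[str], df: Dict[str, int], n_docs: int) -> Vector:
    """Query weights: l = 1 + log10(tf), t = log10(N / df), c = cosine normalization."""
    return cosine_normalize({
        t: (1 + math.log10(tf)) * math.log10(n_docs / df[t])
        for t, tf in Counter(tokens).items()
        if df.get(t, 0) > 0  # unseen terms cannot match any document
    })


def top_k_term_at_a_time(
    query: Vector, index: Dict[str, List[Tuple[int, float]]], k: int
) -> List[Tuple[int, float]]:
    """One accumulator per document, filled term by term, then a heap for the top k."""
    scores: Dict[int, float] = defaultdict(float)
    for term, wq in query.items():
        for doc_id, wd in index.get(term, []):
            scores[doc_id] += wq * wd
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


docs = {1: "car insurance auto insurance".split(), 2: "best car".split(), 3: "auto repair".split()}
df = Counter(t for tokens in docs.values() for t in set(tokens))
index: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
for doc_id, tokens in sorted(docs.items()):
    for term, w in lnc(tokens).items():
        index[term].append((doc_id, w))

print(top_k_term_at_a_time(ltc("car insurance".split(), df, len(docs)), index, k=2))
# doc 1 (score about 0.82) ranks above doc 2 (about 0.24)
```

Document-at-a-time processing would instead walk all query postings in docID order in parallel and finish each document's score before moving on, which is the setting MaxScore and WAND build on.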
Evaluation
I Existing methods for preparing the benchmark dataset, queries, and
ground truth
I For unranked results: Precision, recall, F-measure
I For ranked results: precision-recall graph, 11-point interpolated
precision, MAP, etc.
I Not required: NDCG, Kappa (κ) measure for inter-judge
(dis)agreement
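A minimal sketch of the required measures for a single query, assuming documents are identified by strings and relevance is binary. AP divides by the total number of relevant documents, so relevant documents that are never retrieved contribute 0; the 11-point values use the usual interpolation rule (highest precision at any recall level at or above r).

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Unranked evaluation; F1 is the balanced F-measure (harmonic mean of P and R)."""
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over ranks k holding a relevant doc; missed ones count as 0."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of AP over queries, each given as (ranking, relevant set)."""
    return sum(average_precision(rank, rel) for rank, rel in runs) / len(runs)


def eleven_point_interpolated(ranking: Sequence[str], relevant: Set[str]) -> List[float]:
    """p_interp(r) = highest precision at any rank with recall >= r, for r = 0.0, ..., 1.0."""
    points = []  # (recall, precision) after each rank
    hits = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return [max((p for r, p in points if r >= level / 10), default=0.0) for level in range(11)]


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
print(precision_recall_f1(set(ranking), relevant))  # P = 0.4, R ~ 0.667, F1 ~ 0.5
print(average_precision(ranking, relevant))          # (1/1 + 2/3) / 3
print(eleven_point_interpolated(ranking, relevant))
```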
Probabilistic Model and Language Model
I Probability ranking principle (intuitively, how to rank documents and
when to stop)
I derivation of the ranking formula of the probabilistic model
I the BM25 method
I Query-likelihood unigram language model with Jelinek-Mercer
smoothing.
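A minimal sketch of BM25 and of query likelihood with Jelinek-Mercer smoothing on a small two-document example. Assumptions: BM25 uses idf = log10(N/df) with no query-term-frequency factor (other idf variants, such as log((N - df + 0.5)/(df + 0.5)), are also widely used), k1 = 1.2 and b = 0.75 are common defaults rather than prescribed values, and λ = 0.5 weights the document model against the collection model.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25(query: List[str], doc: List[str], df: Dict[str, int], n_docs: int,
         avg_dl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Sum over distinct query terms of idf * (k1 + 1) tf / (k1 ((1 - b) + b dl / avg_dl) + tf)."""
    tf = Counter(doc)
    length_norm = k1 * ((1 - b) + b * len(doc) / avg_dl)  # document-length normalization
    score = 0.0
    for t in set(query):
        if tf[t] > 0 and df.get(t, 0) > 0:
            idf = math.log10(n_docs / df[t])
            score += idf * (k1 + 1) * tf[t] / (length_norm + tf[t])
    return score


def jm_log_likelihood(query: List[str], doc: List[str], cf: Dict[str, int],
                      collection_len: int, lam: float = 0.5) -> float:
    """log P(q | d), with P(t | d) = lam * tf(t, d) / |d| + (1 - lam) * cf(t) / |C|."""
    tf = Counter(doc)
    log_p = 0.0
    for t in query:  # a repeated query term contributes once per occurrence
        p = lam * tf[t] / len(doc) + (1 - lam) * cf.get(t, 0) / collection_len
        if p == 0.0:
            return float("-inf")  # term absent from the whole collection
        log_p += math.log(p)
    return log_p


d1 = "xyzzy reports a profit but revenue is down".split()
d2 = "quorus narrows quarter loss but revenue decreases further".split()
cf = Counter(d1 + d2)                        # collection frequencies, |C| = 16
df = Counter(set(d1)) + Counter(set(d2))     # document frequencies, N = 2
query = "revenue down".split()
for name, d in (("d1", d1), ("d2", d2)):
    print(name, bm25(query, d, df, 2, 8.0), math.exp(jm_log_likelihood(query, d, cf, 16)))
# d1 outranks d2 under both models; the JM likelihoods are 3/256 and 1/256
```

Smoothing is what keeps d2 from scoring zero under the language model even though it never mentions "down".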
Web Search Basics
I Difference between Web search and Information Retrieval.
I Estimation of relative sizes of two search engines.
I Near duplicate detection: the shingling method
I Not required: the SimHash method.
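A minimal sketch of shingling and of the relative-size estimate, assuming word 4-shingles and exact Jaccard similarity on the full shingle sets (in practice the Jaccard coefficient is estimated from small sketches). The near-duplicate threshold is left open here.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 4) -> Set[Tuple[str, ...]]:
    """Set of word k-shingles: every run of k consecutive words, duplicates removed."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # two empty sets: treat as identical


def relative_size(x: float, y: float) -> float:
    """Estimate |E1| / |E2|, where x = fraction of E1's sample found in E2 and
    y = fraction of E2's sample found in E1: x|E1| ~ |E1 ∩ E2| ~ y|E2|, so |E1|/|E2| ~ y/x."""
    return y / x


s1 = shingles("a rose is a rose is a rose")
s2 = shingles("a rose is a flower which is a rose")
print(len(s1), len(s2), jaccard(s1, s2))  # 3 6 0.125
print(relative_size(0.5, 0.25))           # 0.5: E1 estimated at half the size of E2
```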
Crawling
I Understand the requirements and the current architecture of
crawlers (e.g., the Mercator architecture).
I Not required: optimization for age, finding content blocks, etc.
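A heavily simplified sketch of the politeness half of a Mercator-style URL frontier (the back queues): one FIFO queue per host plus a min-heap recording when each host may next be contacted. Assumptions: a fixed per-host delay (Mercator instead sets the gap relative to, e.g., how long the last fetch from that host took), a caller-supplied clock, unbounded per-host queues, and no front queues for prioritization, DNS resolution, robots.txt handling, or duplicate-URL elimination. The class name `PoliteFrontier` is hypothetical.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple
from urllib.parse import urlsplit


class PoliteFrontier:
    """Per-host FIFO queues plus a heap of (earliest next contact time, host)."""

    def __init__(self, delay: float) -> None:
        self.delay = delay                       # assumed fixed politeness gap
        self.queues: Dict[str, Deque[str]] = {}  # host -> pending URLs
        self.ready_at: Dict[str, float] = {}     # host -> earliest next contact
        self.heap: List[Tuple[float, str]] = []  # holds exactly the hosts with pending URLs

    def add(self, url: str) -> None:
        host = urlsplit(url).netloc
        queue = self.queues.setdefault(host, deque())
        if not queue:  # host becomes active: schedule it, respecting any earlier fetch
            heapq.heappush(self.heap, (self.ready_at.get(host, 0.0), host))
        queue.append(url)

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL whose host may be contacted at time `now`, else None."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, host = heapq.heappop(self.heap)
        url = self.queues[host].popleft()
        self.ready_at[host] = now + self.delay
        if self.queues[host]:  # more URLs for this host: wait out the delay first
            heapq.heappush(self.heap, (self.ready_at[host], host))
        return url


frontier = PoliteFrontier(delay=2.0)
for u in ("http://a.com/1", "http://a.com/2", "http://b.com/1"):
    frontier.add(u)
print([frontier.next_url(t) for t in (0.0, 0.0, 1.0, 2.0)])
# -> ['http://a.com/1', 'http://b.com/1', None, 'http://a.com/2']
```

The heap lets a fetcher thread find the next host it may politely contact without scanning every queue.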
Link Analysis
I The PageRank algorithm: theory and practice
I Not required: topic-specific/personalized PageRank
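A minimal power-iteration sketch of PageRank with teleportation. Assumptions: `alpha` is the teleport probability (some texts instead write a damping factor d = 1 - alpha), 0.15 is a common choice rather than a prescribed value, and a page with no out-links teleports uniformly with probability 1. The three-page graph is made up.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], alpha: float = 0.15,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration on the teleporting random walk.

    With probability alpha the surfer jumps to a uniformly random page; otherwise
    it follows a uniformly chosen out-link. Dead ends (no out-links) always jump.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: alpha / n for p in pages}  # teleport mass from every page
        for p in pages:
            outs = links.get(p, [])
            targets = outs if outs else pages  # dead end: spread uniformly
            share = (1 - alpha) * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
# roughly a = 0.388, b = 0.215, c = 0.397
```

Each iteration multiplies the current distribution by the transition matrix without building it explicitly; the ranks always sum to 1.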
Thanks and Good Luck!
