Tutoring case: IFN647 Assignment 2

IFN647 ASSIGNMENT2.201 cont/…
IFN647 – Assignment 2 Requirements
Weighting: 35% of the assessment for IFN647

Items required to be submitted through IFN647 Blackboard:

1. A PDF or Word file that includes both:
• A statement of completeness, with your name(s) and student ID(s), on a cover page.
• Solutions to questions Q1, Q2, Q4 and Q7, and a README paragraph describing how to
execute your Python code in a terminal or in IDLE, the structure of your data folder,
and the packages you import.
2. Your source code for all other questions, containing all files necessary to run the solutions
and perform the evaluation (source code only, no executables), and a main Python file
(“script.py”) that runs all the source code you defined for all questions (put them together
in a zip file, “code.zip”).
3. A zip file “result.zip” containing all “result” data files (as text).

Please note that you do not need to include the dataset folder generated by “dataset101-150.zip” in
your submission. Zip all the above files as “student ID_Surname_Asm2.zip” and submit it
in Blackboard before 11.59pm on 29 May 2020.

Due date of Blackboard Submission: Friday week 12 (29th May 2020)

Individual working/pair: You may work on this assignment individually or in a pair (please
note the different requirements for individual and pairs as indicated in the questions).

Currently, a major challenge is to build communication between users and Web search systems.
However, most Web search systems use user queries rather than user information needs due to
the difficulty of automatically acquiring user information needs. The first reason for this is that
users may not know how to represent their topics of interest. The second reason is that users
may not wish to invest a great deal of effort to dig out relevant pages from hundreds of
thousands of candidates provided by a Web search system.

In this assignment, you are expected to design a system, “Weak Supervision Model (WSM)”, to
provide a solution for this challenging issue. The system is broken up into three parts: Part I
(Training Set Discovery), Part II (IF model) and Part III (Evaluation). In Part I, the major task is
to present an approach in order to automatically discover a training set for a specified topic (we
will provide you 50 topics), which includes both positive documents (e.g., labelled as “1”) and
negative documents (e.g., labelled as “0”). You may need to use the topic title, description or
narratives, Pseudo-Relevance Feedback technique (or clustering technique) and an IR model for
this part to find a training set D which includes both D+ (positive: likely relevant documents)
and D- (negative: likely irrelevant documents) in a given unlabelled document set U. Part II is
to select more terms in D and discover weights for them, and then to use the selected terms and
their weights to rank documents in U. Part III is the evaluation: you are required to show that your
solution is better than the query-based method (“the baseline model”), which uses only the topic
titles to rank U.
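The flow described above (rank U with an IR model, then treat the top of the ranking as likely relevant and the bottom as likely irrelevant) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `bm25_scores` and `discover_training_set`, the toy documents, the cut-offs `n_pos` and `n_neg`, and the BM25 variant with parameters `k1=1.2` and `b=0.75`, which may differ from the one used in the workshop. It is one possible pseudo-relevance approach, not the required solution.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 variant.

    `docs` maps a document id to its list of already tokenised terms.
    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        length_norm = k1 * ((1 - b) + b * len(terms) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # The "1 +" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def discover_training_set(query_terms, docs, n_pos=2, n_neg=2):
    """Label the top-ranked documents 1 (D+) and the bottom-ranked 0 (D-)."""
    scores = bm25_scores(query_terms, docs)
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    labels = {doc_id: 1 for doc_id in ranked[:n_pos]}
    for doc_id in ranked[max(0, len(ranked) - n_neg):]:
        labels.setdefault(doc_id, 0)  # never relabel a positive document
    return labels


# Toy, hand-made documents (hypothetical; not taken from the dataset).
toy_docs = {
    "d1": ["convict", "released", "repeat", "offender", "crime"],
    "d2": ["repeat", "offender", "arrested", "again"],
    "d3": ["stock", "market", "report"],
    "d4": ["weather", "forecast", "rain"],
}
training = discover_training_set(["convict", "repeat", "offender"], toy_docs)
for doc_id, label in training.items():
    print("R102", doc_id, label)
```

Using `setdefault` for the negatives keeps D+ and D- disjoint even when the document set is smaller than `n_pos + n_neg`.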

Example of topic102 - “Convicts, repeat offenders” is described as follows:

Number: R102
Convicts, repeat offenders

<desc> Description:
Search for information pertaining to crimes committed by people who have been previously convicted and later released or paroled from prison.

<narr> Narrative:
Relevant documents are those which cite actual crimes committed by "repeat offenders" or ex-convicts. Documents which only generally discuss the topic or efforts to prevent its occurrence with no specific cases cited are irrelevant.
</top>

Part I: Training Set Discovery

Part I requires obtaining a complete training set D, which consists of a set of positive documents D+ and a set of negative documents D-. In this part, you present an approach (or two approaches for a pair) for finding a complete training set D in U (a given unlabelled document set, e.g., the set of documents in the Training102 folder), which includes at least some likely relevant documents (the positive part) and some likely irrelevant documents (the negative part). The proposed approach depends on the knowledge you have acquired in this unit. You may discuss your approach with your tutor before you implement it.

Q1) (6 marks) Write an algorithm (or two algorithms for a pair) in plain English to show your approach for the discovery of a complete training set for 50 topics and the corresponding 50 datasets (Training101 to Training150). Your approach should be generic, meaning that it is feasible to use for all (or most) topics. For each topic, e.g., Topic102, you should use the following inputs and generate the following output.
Inputs: query Q = a topic (you may use the title, e.g., ‘Convicts repeat offenders’, or all information including the <desc> and <narr>); and U = folder “Training102”.
Output: D = D+ ∪ D-, where D+ ∩ D- = ∅ and D ⊆ U.
The following is a possible output in D (not the answer) for topic 102:
R102 73038 1
R102 26061 1
R102 65414 1
R102 57914 1
R102 58476 1
R102 76635 1
R102 12769 1
R102 12767 1
R102 25096 1
R102 78836 1
R102 82227 1
R102 26611 1
R102 15200 0
R102 13320 0
R102 54745 0
R102 15082 0
R102 53523 0
R102 65306 0
R102 68419 0
R102 29920 0
R102 30456 0
R102 75563 0
R102 28657 0
R102 65394 0
R102 85372 0

Q2) (6 marks) Implement the algorithm (two algorithms for a pair) in Python. You also need to discuss the output to justify why the proposed algorithm is likely to generate high-quality training sets. You may use figures to support the justification.

Q3) (3 marks) BM25-based baseline model implementation (see week 8 workshop): please use the titles as queries to rank documents for each topic, and save the results into 50 files, e.g., BaselineResult1.dat, …, BaselineResult50.dat, where each row includes the document number and the corresponding relevance degree or ranking (in descending order).

The following is a possible result (not the answer) for topic 102 (in BaselineResult2.dat):
73038 5.898798484774149
26061 4.273638903483098
65414 4.1414522450167475
57914 3.967136888209526
58476 3.708467957856744
76635 3.5867337114200843
12769 3.4341129093591456
12767 3.352170358051889
25096 2.7646308089876177
78836 2.6823617071618404
82227 2.6056189593652537
26611 2.3595327588643613
24515 2.2258395867976226
33172 2.218657303566887
33203 2.2027873338265396
29908 2.188504022701605
…

Part II: Information Filtering Model

Q4) (5 marks) Design an information filtering model (your WSM) that includes both a training algorithm and a testing algorithm (for an individual), or two information filtering models (for a pair), in plain English, illustrating your idea for using the training set discovered in Part I to learn the model.
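As one hedged illustration of what a training algorithm and a testing algorithm for Q4 could look like, the sketch below weights each term by how much more often it appears in D+ than in D-, using the Robertson/Sparck Jones relevance weight, and then scores a document by summing the weights of the selected terms it contains. The choice of this weight, the names `select_features` and `rank_documents`, the `top_k` cut-off, and the toy data are all assumptions for illustration; the assignment leaves the model design to you.

```python
import math
from collections import Counter


def select_features(docs, labels, top_k=2):
    """Training step: pick the top_k terms with the highest positive weight.

    `docs` maps doc id -> list of terms; `labels` maps doc id -> 1 (D+) or 0 (D-).
    """
    n_total = len(labels)                                   # N: size of D
    n_rel = sum(1 for lab in labels.values() if lab == 1)   # R: size of D+
    df_all, df_rel = Counter(), Counter()
    for doc_id, lab in labels.items():
        terms = set(docs[doc_id])
        df_all.update(terms)          # n: documents in D containing the term
        if lab == 1:
            df_rel.update(terms)      # r: documents in D+ containing the term

    weights = {}
    for term, n in df_all.items():
        r = df_rel[term]
        weights[term] = math.log(
            ((r + 0.5) / (n_rel - r + 0.5))
            / ((n - r + 0.5) / (n_total - n - n_rel + r + 0.5))
        )
    best = sorted(weights, key=lambda t: (-weights[t], t))[:top_k]
    return {t: weights[t] for t in best if weights[t] > 0}


def rank_documents(docs, features):
    """Testing step: score each document in U and sort in descending order."""
    scores = {
        doc_id: sum(features.get(t, 0.0) for t in set(terms))
        for doc_id, terms in docs.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy, hand-made data (hypothetical; not taken from the dataset).
toy_docs = {
    "d1": ["convict", "released", "repeat", "offender", "crime"],
    "d2": ["repeat", "offender", "arrested", "again"],
    "d3": ["stock", "market", "report"],
    "d4": ["weather", "forecast", "rain"],
}
toy_labels = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}

features = select_features(toy_docs, toy_labels)
ranking = rank_documents(toy_docs, features)
for doc_id, score in ranking:
    print(doc_id, score)
```

Ties are broken by document id so that the output order is deterministic.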
Please note that the keywords (terms) you select from the discovered training set should be very important for each given topic.

You will use the following input for the training algorithm to select some useful features:
Input: D = D+ ∪ D-
Output: Features

For the testing algorithm, you will have the following input and output:
Input: U (e.g., folder “Training102”).
Output: sorted U

Q5) (5 marks) Implement your WSM (or two models for a pair) in Python. You need to find useful features (e.g., terms) and their weights for every topic using the proposed training algorithm (in Q4) and store them in a data structure or a file. For all documents in U, you also need to calculate the relevance score for each document using the proposed testing algorithm, sort the documents in U for each topic according to their relevance scores, and save the results into the files “result1.dat” to “result50.dat” for the 50 topics, where each row includes the document number and the corresponding relevance score or ranking (in descending order).

The following is a possible result (not the answer) for topic 102 (in result2.dat):
73038 5.898798484774149
26061 4.273638903483098
65414 4.1414522450167475
57914 3.967136888209526
58476 3.708467957856744
76635 3.5867337114200843
12769 3.4341129093591456
12767 3.352170358051889
25096 2.7646308089876177
78836 2.6823617071618404
…

Part III: Evaluation

Q6) (5 marks) Implement a Python program to calculate top-10 precision, recall and F1 (you may use extra measures, e.g., average precision) for both the baseline model and your WSM on all topics, using the provided relevance judgements for each topic, and save the results into “EvaluationResult.dat”. Please note that you can use the evaluation results to update your WSM.
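The three measures named in Q6 can be sketched as follows. This is a minimal example under stated assumptions: all three measures are computed at the same top-k cut-off (the assignment text could also be read as asking for recall and F1 over a different cut-off), the function name `evaluate_top_k` is hypothetical, and the ranking and relevant set are made-up toy data rather than real judgements.

```python
def evaluate_top_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Return (precision, recall, F1) for the first k documents of a ranking."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_doc_ids) if relevant_doc_ids else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # avoid dividing by zero when nothing relevant was retrieved
    return precision, recall, f1


# Toy ranking of 12 documents and a hypothetical set of relevant documents.
ranked = [f"d{i}" for i in range(1, 13)]
relevant = {"d1", "d3", "d7", "d20"}

precision, recall, f1 = evaluate_top_k(ranked, relevant)
print(f"precision={precision:.6f} recall={recall:.6f} F1={f1:.6f}")
```

Three of the four relevant documents fall in the top 10 here, so the call yields a precision of 0.3 and a recall of 0.75.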
For each topic, e.g., Topic102, you should use the following inputs for your WSM; the output includes all evaluation results for the 50 topics:
Input: “result2.dat” and “Training102.txt”
Output: EvaluationResult.dat

The following is a possible result (not the answer) in a csv file:

Topic  precision  recall    F1
101    0.130435   0.428571  0.20
102    0.020100   0.029630  0.023952
103    0.046875   0.214286  0.076923
…

Q7) (5 marks) You will get the 5 marks if you can prove that your WSM is significantly better than the baseline model (you can choose any measure used in Q6); otherwise, you will lose the 5 marks. Please use a “t-test” to help you answer this question.

Please Note
• Your programs should be well laid out, easy to read and well commented.
• All items submitted should be clearly labelled with your name and student number.
• Marks will be awarded for programs (correctness, programming style, elegance, commenting) and evaluation results, according to the marking guide.
• You will lose marks for missing or inaccurate statements of completeness, and for missing files or items.
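For the t-test mentioned in Q7, one reasonable reading is a paired t-test, since both models are evaluated on the same topics. The sketch below computes the paired t statistic and its degrees of freedom with the standard library only. The function name `paired_t_statistic` and the per-topic scores are made up for illustration and are not results from any model. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics


def paired_t_statistic(model_scores, baseline_scores):
    """Return (t, degrees_of_freedom) for a paired t-test on per-topic scores."""
    if len(model_scores) != len(baseline_scores):
        raise ValueError("both models must be evaluated on the same topics")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up per-topic scores for five topics (hypothetical values only).
wsm_scores = [0.5, 0.4, 0.6, 0.3, 0.7]
baseline_scores = [0.3, 0.3, 0.4, 0.2, 0.4]

t_value, dof = paired_t_statistic(wsm_scores, baseline_scores)
print(f"t = {t_value:.4f}, degrees of freedom = {dof}")
```

A positive t means the first model scores higher on average; whether the difference is significant depends on comparing t with the critical value (or p-value) for the reported degrees of freedom at your chosen significance level.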
END OF ASSIGNMENT 2