MCEN90048 Artificial Intelligence for Mechatronics
Project 2: Recent Research and Applications of Artificial Intelligence
Contents
1 Summary
1.1 Topics
1.2 Submission and Due Dates
2 Project Background
2.1 Literature Review
2.2 Method Research
2.3 Industrial and Clinical Applications
3 Project Protocols
4 Expected Deliverables
5 Marking Criteria
Appendix A Topic Description
A.1 LR. Literature Review Projects
LR.1. Reinforcement Learning in Mechatronics
LR.2. Continual and Incremental Learning
LR.3. Recent Advances in Neuro Fuzzy Systems
LR.4. Fundamentals and Recent Advances of Transfer Learning
LR.5. Causality Inference in Deep Learning - Towards Artificial General Intelligence
LR.6. Neural Architecture Search - Identifying Optimal Neural Networks
LR.7. Anomaly Detection using Deep Neural Networks
A.2 MR. Method Research
MR.1. Data Generation using Generative Adversarial Networks
MR.2. Data Visualization with Unsupervised Learning
A.3 IC. Industrial and Clinical Applications
IC.1. & IC.2. COVID-19 Related Projects
IC.3. Incremental Learning in Industrial Mechatronics
IC.4. Positive Unlabeled Learning in Drug Repositioning
A.4 SS. Student-Suggested Projects
1 Summary
1.1 Topics
In this project, we invite you to form a group of one to four students to participate in the recent research
and real-world applications of artificial intelligence. We offer a selection of topics from which each group
may choose one to form a project. The topics are from four categories:
1. LR: Literature Review - investigate the theoretical and mathematical foundations, recent advances,
current challenges and future directions of a topic.
2. MR: Method Research - investigate existing methods for specific tasks, benchmark the methods, and
explore possible improvements.
3. IC: Industrial and Clinical Applications of AI - apply cutting-edge artificial intelligence methods to
real-world problems.
4. SS: Suggested by Students - propose your own projects, set up your own goals and accomplish them.
The proposed projects may belong to any of the above categories.
In the table below, we list 13 topics in three categories (LR, MR, IC). Your group needs to
choose one of these 13 topics to form a project, or you may propose your own project (SS). Please read Section
2 for the background of LR, MR, and IC projects. For more details on each specific project and related
resources, please read Appendix A.
Type | No. | Topic (listed in no particular order) | Coordinator
LR | 1 | Reinforcement Learning in Mechatronics | Saman
LR | 2 | Continual and Incremental Learning | Damith
LR | 3 | Recent Advances in Neuro Fuzzy Systems | Saman
LR | 4 | Fundamentals and Recent Advances of Transfer Learning | Saman
LR | 5 | Causality Inference in Deep Learning - Towards Artificial General Intelligence | Richard
LR | 6 | Neural Architecture Search - Identifying Optimal Neural Networks | Richard
LR | 7 | Anomaly Detection using Deep Neural Networks | Saman
MR | 1 | Data Generation using Generative Adversarial Networks | Richard
MR | 2 | Data Visualization with Unsupervised Learning | Damith
IC | 1 | COVID-19 Pneumonia Diagnosis Using Chest X-Rays | Richard
IC | 2 | Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases | Richard
IC | 3 | Incremental Learning in Industrial Mechatronics | Damith
IC | 4 | Positive Unlabeled Learning in Drug Repositioning | Saman
SS | - | Projects Suggested by Students | -*
* For SS projects, you may invite any one of the three coordinators to be your group coordinator.
1.2 Submission and Due Dates
Group formation This project is expected to be completed by a group of one to four students. The group
should be finalized by 6:00 pm on Wednesday 6th May 2020. To register, please go to
Canvas - People - Groups, join one of the groups with your teammates, and post the name of your group on
Canvas - Discussion. If you wish to propose your own project, the topic should be ready
by this time. A short project proposal on the topic is appreciated so that we can evaluate the applicability of the
proposed project.
Expected deliverables The outcomes of this project depend on the topic you choose. They may include:
1. A project proposal, half-page to one page, applicable to SS projects only; due by 6:00 pm on Wednesday
6th May 2020.
2. An initial report, one to two pages, applicable to all projects; due by 6:00 pm on Friday 15th May
2020; the initial report may be extended into the final report.
3. A short presentation submitted as a video recording, ideally six to eight minutes, applicable to all projects;
due by 6:00 pm on Sunday 24th May 2020.
4. Python code and result files, applicable to MR and IC projects; due by 6:00 pm on Friday 5th June
2020.
5. The submission result on the Kaggle website, only applicable to Project IC.2 Kaggle Competition:
Forecasting Global Daily Confirmed COVID-19 Cases; due at 6:00 pm on Friday 5th June 2020.
6. A final report, four to five pages, applicable to all projects; due by 6:00 pm on Friday 5th June
2020. This may be extended from the initial report.
Outcomes 2 to 6 are submitted via Canvas – Assessment. The Outcome 1 project proposal (only for
SS projects) should be sent via email to both Richard and Damith. A detailed explanation of each topic is
provided in Appendix A.
2 Project Background
This assignment aims to provide students with experience of research and real-world applications of
artificial intelligence related to Mechatronics and adjacent fields. In this section, we introduce the background
for each type of project in this assignment.
2.1 Literature Review
A literature review is a critical analysis of published literature that presents the current knowledge, including
substantive findings as well as theoretical and methodological contributions to a particular topic. It is an
assessment of the literature that provides a summary, classification, comparison and evaluation, and it is often the
first step towards solving any complex real-world problem. Although literature review is offered as a project
category in this assignment, all other projects include a portion of reviewing existing methods.
For the groups choosing a topic from the LR category, each student is expected to read in detail at least
five research papers from the recent decade. You are required to understand the overarching mathematical
principles and problem formulation, and to categorize the literature according to its high-level ideas. This should also
be complemented with your logical assessment of the state-of-the-art methods, the status quo of the domain,
and potential improvements or gap areas.
For MR and IC topics, the students will also read recent research papers, identify the cutting-edge
methods, and apply them in practice. The goal of the literature review in this case is to justify your
selection of methods or analysis procedures. For instance, you may find rationale in literature to use certain
pre-processing techniques, model architectures and training methods. To support the rigor of these selections,
you are free to cite existing literature and findings. Conversely, any such choice in your approaches should
be well-supported in literature where applicable.
2.2 Method Research
The second step towards solving any complex real-world problem is to select the set of tools for the problem
or similar ones. In the case of artificial intelligence, the tools are methods, algorithms or models. Any
selection should be well-supported by literature and often by your own experiments. The performance of models
is typically first evaluated on a set of benchmark datasets using standard metrics. For example, image
classification is a typical type of problem that enjoys a wide variety of real-world applications including
photo and video categorization, visual search engines, product recommendation, self-driving vehicles, and
disease diagnosis. Image classifiers are often benchmarked on datasets such as ImageNet, SVHN and ObjectNet
before being applied to the specific datasets of the problem under consideration. The benchmark results motivate
researchers to further develop their methods and also provide a degree of assurance for practitioners, especially
on problems with insufficient labels.
For MR topics, we encourage students to implement or evaluate recently proposed methods on benchmark
datasets. For model comparison, the same experiment setup should be applied to all methods for fairness.
For example, to compare model efficiency, all models should be executed on the same or similar devices. To
investigate the effects of a component of a method, e.g., the normalization method in each layer, an ablation
study that removes the component under consideration is often required. Finally, a systematic study of different
hyper-parameter settings needs to be conducted for increased rigor in your findings.
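The comparison protocol described above (identical setup for every method, an ablation that switches one component off, and a systematic hyper-parameter sweep) can be sketched in a few lines of Python. This is only an illustrative skeleton: the function evaluate, the grid values and the toy scoring rule are all hypothetical placeholders, and in a real project evaluate would train a model and return a validation metric.

```python
import itertools
import random
import statistics


def evaluate(config, seed):
    """Hypothetical stand-in for 'train a model and return a validation score'.

    The toy score below exists only so that the sketch runs end to end;
    replace the body with real training and evaluation code.
    """
    rng = random.Random(seed)  # same seed -> same noise for every configuration
    score = 0.80
    score += 0.05 if config["use_batch_norm"] else 0.0  # component under ablation
    score -= abs(config["learning_rate"] - 1e-3) * 10   # toy sensitivity to the learning rate
    return score + rng.gauss(0.0, 0.005)


# Hyper-parameter grid (the values are illustrative assumptions).
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "use_batch_norm": [True, False],  # False is the ablated variant
}
seeds = [0, 1, 2]  # identical seeds are reused for every configuration

results = []
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    scores = [evaluate(config, seed) for seed in seeds]
    results.append((config, statistics.mean(scores), statistics.stdev(scores)))

# Report configurations from best to worst mean score.
for config, mean, std in sorted(results, key=lambda r: r[1], reverse=True):
    print(f"{config}  mean={mean:.4f}  std={std:.4f}")
```

Reusing the same seeds and the same grid for every variant keeps the comparison fair, and reporting a mean and standard deviation over seeds makes it easier to tell a real improvement from run-to-run noise.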
For the MR and IC projects in this assignment, you may use existing libraries (e.g., TensorFlow, Keras)
or online resources (e.g., GitHub) to implement algorithms. Sometimes, you may need to implement the
algorithm or modify the existing implementation by yourselves.
2.3 Industrial and Clinical Applications
At this stage, we apply the selected methods to real-world industrial and clinical problems. The big difference
between real-world datasets and benchmark datasets is that benchmark datasets are often deliberately
curated to evaluate certain merits of methods, while in practice there is a greater level of uncertainty in the
data: the features may be noisy (not necessarily with Gaussian noise), the labels may be arbitrarily wrong (not
according to any distribution), and there may be outliers, which are often associated with wrong measurements.
As a result, models robust to noise and data preprocessing methods such as outlier detection may be of
significance in practice. In addition, for real datasets, domain knowledge certainly helps to build a better
model, while on benchmark datasets we sometimes ignore domain knowledge in order to achieve a general
model.
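As one small illustration of the outlier-detection preprocessing mentioned above, the sketch below flags suspicious measurements with a robust "modified z-score" based on the median and the median absolute deviation (MAD). This is just one common heuristic chosen for illustration, not a method prescribed by this assignment; the function name flag_outliers, the threshold of 3.5 and the sensor readings are assumptions made for the example.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True if it looks like an outlier.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is far
    less affected by the outliers themselves than the mean and standard
    deviation would be.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged reliably.
        return [False] * len(values)
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


# Toy sensor readings (made up for illustration) with one faulty measurement.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
flags = flag_outliers(readings)
cleaned = [r for r, is_outlier in zip(readings, flags) if not is_outlier]
print(flags)
print(cleaned)
```

Whether flagged points should be removed, corrected or kept is a domain decision; in clinical data, for instance, an extreme value may be the most informative one.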
For IC projects, students are expected to apply existing methods and come up with creative solutions for
analyzing a real-world dataset. From benchmarks in the literature, you may not find a consensus on what is the
‘best’ model for your problem. Therefore, we encourage you to find several candidate models, apply them
and modify them with any domain knowledge. As mentioned previously, students are encouraged to take
advantage of any online resources.
3 Project Protocols
Students should follow the protocols below for this group assignment:
• This project may be finished by a group of students where each student should contribute roughly
equally to the expected outcomes (report, presentation and code if applicable). Each group should
have one to at most four students.
• For LR projects, each student in the group is expected to read in detail at least five research papers
published in the recent decade (2010-2020). Additionally, you may cite other resources (papers, blogs,
GitHub repositories) from any period where they help to explain foundational concepts.
• For MR and IC projects, we encourage each group to explore several models and tune the hyper-parameters
to get as good performance as possible. Note that this would take a significant amount of
time, so do not wait until the last week to do the experiments. However, due to the limited Spartan
resources, we suggest that each group does not submit more than 4 jobs (or request more than 4 GPUs) at
the same time.
• For LR projects, students may follow any review papers to learn academic writing and organization
of materials. For MR and IC projects, students may use any pre-trained models and existing code on
GitHub, Canvas, or other platforms. However, copying published work (texts, figures, tables, etc.) into
your reports or sharing any part of the outcomes among different groups of this subject is considered
plagiarism and is strictly prohibited.
In this assignment, each group of students is also welcome to propose their own project (labelled
under SS) and invite any one of the three coordinators to be their group coordinator. The SS projects may
roughly belong to any of the three topic categories, namely, literature review, method research or applications.
The SS projects should be challenging, and ideally relevant to deep learning or machine learning techniques
introduced in this subject. For the SS projects, the group should follow the same protocols above and finish
the expected outcomes listed in Section 4 Expected Deliverables.
4 Expected Deliverables
Each group should have one to four students and finish the following tasks for the chosen project:
1. An initial report, which may be extended into the final report;
2. A short presentation submitted as a video recording;
3. A final report.
For students working on MR and IC projects, Python code and result files (e.g., saved models if applicable)
also need to be submitted. For students working on Project IC.2. Kaggle Competition: Forecasting Global
Daily Confirmed COVID-19 Cases, the submission results on the Kaggle website need to be submitted as
a screenshot. For students proposing their own projects, a project proposal needs to be submitted.
For all written submissions including project proposal, initial report and final report, we only accept
PDF file format. You may use LaTeX (Overleaf) or any office software (Microsoft Word, Google Docs, etc.)
to generate a PDF. We recommend Times New Roman or Cambria font, with 14pt for section titles, 10-12pt
size with single spacing for main text, and 8-9pt for any captions or illustrations of figures and tables. For references,
the APA Citation Style is recommended. The page limits for written submissions do not include references
(you may provide as many references as you want) and optional appendices (if applicable, you may provide
in appendices the mathematical derivations, theoretical proofs, method details, supplementary experiments,
figures and tables that cannot fit into the main text, but please note that the appendix will not be scored).
In the following, we describe each deliverable in detail.
Project proposal Students who wish to propose their own projects should submit a project proposal by
6:00 pm on Wednesday 6th May 2020. The proposal should ideally be from half-page to one-page.
It should explain the impact, scope and measurable outcomes of the SS projects (see Appendix A.4 for
definitions) and advise the name of the coordinator your group would like to nominate.
Initial report In this task, the group is required to submit an initial report including the following content:
• Description and understanding of the project topic, methods and/or datasets;
• Review of existing methods to solve this or similar problems;
• Proposed assumption, hypothesis and/or solution towards solving this problem;
• Progress and the plan (including task allocation for each student in the group).
The initial report has a maximum two-page limit which includes main text, tables and figures but excludes
references and appendix. The initial report is due at 6:00 pm on Friday 15th May 2020.
In your initial report as well as the final report, a highly logical flow should be presented in academic
English. Ideally, your reports should assume that the reader has no extensive prior knowledge of your
problem and your approach. For claims that are not supported by the results presented in your report, please
cite references. Striking a good balance between rigor and clarity will be highly rewarded in our grading.
Short presentation In this task, each group is required to give a presentation on the current progress,
results and conclusions. The presentation should cover the following topics:
• Description and understanding of the project topic, methods and/or datasets;
• Ideas, hypotheses, current progress, results and conclusions;
• If applicable, encountered problems and possible solutions;
• Future work to be done before final submission.
Each group should nominate at least one member to give the presentation and submit it before 6:00 pm
on Sunday 24th May 2020. The presenter(s) should have the full consent of the remaining member(s) of the
group. The presentation should ideally last about 6 to 8 minutes. The presentation should be submitted as a
video in any standard format, e.g., AVI, MP4, MPEG, etc. Please note that the quality or resolution
of video and audio should not be a concern here, as long as the presented content is visible on screen and
the audio can be easily understood. Thus, there is no need to use a camcorder to record the video. Instead,
we suggest using the camera and microphone of a laptop or phone and software such as Zoom. Please note that
there is no need to show the presenter(s) in the video if the focus is on the slides.
If the video turns out to be too large to submit to Canvas, please submit a link to the video shared via
any file-hosting platform like OneDrive, Google Drive, Dropbox, etc., and be sure to give the coordinators
access to it.
Code files (Where applicable) Please compress all your code and relevant result files into a zip file
and submit it to Canvas along with the final report. Although you may use any language/platform, we
highly recommend Python (3.6 or newer) and TensorFlow (2.1 or newer).
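Since saved models are expected to reload easily and reproduce the reported predictions, it can help to test this round trip yourself before zipping the submission. The sketch below shows the idea with a deliberately tiny, hypothetical model whose only parameter is stored as JSON; ThresholdModel and the file name model.json are assumptions for illustration. With TensorFlow you would use the library's own model-saving utilities instead, but the check itself stays the same.

```python
import json
import os
import tempfile


class ThresholdModel:
    """Toy stand-in for a trained model (hypothetical, for illustration only)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [int(x > self.threshold) for x in xs]

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"threshold": self.threshold}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(**json.load(f))


model = ThresholdModel(threshold=0.5)
inputs = [0.1, 0.7, 0.5, 0.9]
expected = model.predict(inputs)

# Save, reload from disk, and confirm the predictions are reproduced exactly.
with tempfile.TemporaryDirectory() as out_dir:
    model_path = os.path.join(out_dir, "model.json")
    model.save(model_path)
    reloaded = ThresholdModel.load(model_path)

reproduced = reloaded.predict(inputs)
print("reloaded predictions match:", reproduced == expected)
```

Running such a check from a fresh Python session, rather than the one used for training, is the closest approximation to what a marker will do with your zip file.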
Submission result on Kaggle For students working on Project IC.2 Kaggle Competition: Forecasting
Global Daily Confirmed COVID-19 Cases, please upload a screenshot of your final Kaggle submission results
along with your final report. Please note that, to get these submission results, you first have to register an
account on Kaggle if you have not done so previously.
Final report The final report should be a self-contained explanation of your project in four to five pages.
You may extend your initial report into the final report. Please submit the final report before 6:00 pm on
Friday 5th June 2020.
For all projects (LR, MR, IC, SS), please organize your writing in a logical structure. For LR projects,
the following section structure is recommended:
1. Section 1: Introduction: Problem definition and significance, and scope of review.
2. Section 2: Related Work: In this section, please summarize the literature and its rationale in an
organized structure. There should be a logical flow between methods and paragraphs. You should
discuss the strengths and weaknesses of the methods. You may draw contrasts as to which methods
are best suited for which specific scenarios, etc.
3. Section 3: Discussion: In this section, please summarize your findings and give an assessment of the
status quo. Ideally, you may find some gap areas where more research needs to be done.
We encourage you to construct a report in the following structure for MR or IC projects:
1. Section 1: Introduction: Problem definition and significance.
2. Section 2: Related Work: Review of methods in literature which should ideally support your choice
of approaches.
3. Section 3: Methods: Describe your approach in detail. You may provide algorithm blocks or flowcharts
if needed.
4. Section 4: Results: Concisely describe your results and observations, illustrated with figures and
tables. You may provide results for benchmark methods for comparison purposes.
5. Section 5: Discussion: Summarize your project work and findings. You may also comment on shortcomings,
challenges and how you wish to improve further.
5 Marking Criteria
This group project takes up 25% of the final marks of the subject. The marks are divided among the tasks
as follows:
1. Initial report – 5%
• The report is readable and has no obvious technical errors – 1%.
• The report shows a good understanding of the selected topic – 1%.
• The review of methods is comprehensive and reasonable – 2% for LR projects, 1% for MR and
IC projects.
• The proposed hypothesis is reasonable, and the solution is achievable – 1% for MR and IC projects;
this criterion does not apply to LR projects.
• The progress and plan are satisfactory, and the task allocation is reasonable – 1%.
2. Presentation – 5%
• The presentation is well-structured, and the idea is properly communicated – 2%
• The content (introduction, methods, progress, results, and conclusions if applicable) is presented
clearly without obvious technical errors – 3%
Please note that the presentation will be scored by three examiners (the lecturer and two tutors)
independently and the average score will be used as the final score for presentation.
3. Final report – 15% for LR projects, 5% for MR and IC projects.
• The final report extends the initial report accordingly – 2% for LR projects, 1% for MR and IC
projects.
• The idea, methods and results are clearly and logically presented in Academic English – 7% for
LR projects, 1% for MR and IC projects.
• The analysis is reasonable and does not have obvious technical errors – 4% for LR projects, 2% for
MR and IC projects.
• Based on the whole report, the observations and conclusions are correct – 2% for LR projects, 1%
for MR and IC projects.
4. Code files and submission results on Kaggle – 10% for MR and IC projects; this criterion does not
apply to LR projects.
• The code files and results are well structured, and the code is clearly commented – 1%
• The saved model can be loaded easily and reproduces the predictions faithfully – 2%
• The model predictions are accurate – 7%
Please feel free to contact Richard if you need any clarification on the tasks, requirements, and marking
criteria of Project 2.
Appendix A Topic Description
For details on the provided topics, please read the descriptions relevant to projects you are interested in.
A.1 LR. Literature Review Projects
In the following, we introduce the topics briefly. Please note that these are general topics. In your outcomes,
a good strategy is to focus on a relatively narrow aspect of the general topic. For example, in LR.1. below,
instead of reviewing papers in reinforcement learning and Mechatronics in general, you may focus on a specific
type of method (e.g., model-free reinforcement learning methods), or application (e.g., robotics control),
or both (e.g., imitation learning in robotics control).
Each student may use any online resources (tutorials, blogs, Google Scholar, etc.) to find at least five
research articles related to the chosen topic for a thorough reading. Students may also read blogs and literature
review papers to learn how to write reviews. However, plagiarism is strictly forbidden.
LR.1. Reinforcement Learning in Mechatronics
Reinforcement learning (RL) is an area of machine learning that investigates how software agents ought to take
actions in an environment in order to maximize the cumulative reward. RL deals with problems where the
models connecting actions and rewards are hard to define and differentiate. Such problems are frequently
encountered in mechatronic applications including robotics and automation. In this project, students are
expected to investigate the recent advances in the interdisciplinary area of RL and Mechatronics. Note that
Mechatronics is a broad area involving robotics, electronics, computer, telecommunications, systems, control,
and product engineering. You may choose any sub-area of Mechatronics or discuss Mechatronics in general.
LR.2. Continual and Incremental Learning
Real intelligence acquires knowledge throughout a lifetime. The continual learning capability is crucial in
stepping towards artificial general intelligence. Additionally, in the real world, data and tasks are often presented
incrementally. While models can be trained on an initial dataset or task, they can also be improved upon
with new incoming data or tasks. However, conventional models tend to encounter an issue called catastrophic
forgetting when presented with new data. Contextualizing new data with already learnt data is a significant
problem applicable to both supervised and unsupervised learning. In this project, students are expected to
investigate the state-of-the-art incremental and continual learning algorithms, which together step towards a
lifelong learning capability.
LR.3. Recent Advances in Neuro Fuzzy Systems
Neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. The main strength of neuro-fuzzy
systems is that they are universal approximators with the ability to elicit interpretable IF-THEN
rules. Combining the learning power of neural networks with the knowledge representation capabilities of fuzzy
logic makes neuro-fuzzy systems an attractive candidate for AI researchers. In this project, students are
expected to investigate the fundamental ideas of neuro-fuzzy systems, focusing on the recent advances of the
domain.
LR.4. Fundamentals and Recent Advances of Transfer Learning
Humans often learn new things by drawing on things they already know. If you have been driving a sedan for
several years, you may learn to drive a truck or a bus by relating concepts you have already learned such as
acceleration, turning and slowing down. In transfer learning, the model uses some of the already inferred
knowledge in new learning tasks rather than learning everything from scratch. This allows for highly efficient use of
data. In this project, students are expected to investigate some recent advances along with the mathematical
foundations of transfer learning.
LR.5. Causality Inference in Deep Learning - Towards Artificial General Intelligence
Causal inference is the process of drawing a conclusion about a causal connection based on the conditions
of the occurrence of an effect. The neural networks we have learnt in lectures so far deal with association
inference, i.e., connecting data to labels, images to classes. However, humans can easily reason
about causes and effects. As a result, causal inference research is considered to be the next
trend in deep learning, a step towards artificial general intelligence. In this project, students are expected
to investigate the strategies and recent methods to infer causality from data.
LR.6. Neural Architecture Search - Identifying Optimal Neural Networks
Finding the optimal structure for a neural network suitable for a given task is a labor-intensive process.
Human experts spend a lot of time on designing a good solution structure, i.e., selecting the number of
layers, number of neurons, activation functions, connections between neurons, etc., for which we often rely
on prior knowledge and empirical evidence. It is thus reasonable to explore how we can automate this
process in a more scientific manner. In this project, we investigate the recent advances, their philosophical
underpinnings and the fundamental challenges.
LR.7. Anomaly Detection using Deep Neural Networks
Anomalies, or outliers, are defined as events that deviate from the standard. For example, in DNA sequence
analysis, anomalies could be caused by contaminated samples. Anomalies happen rarely, and do not follow
the rest of the data patterns, making it hard and crucial to detect them. In traditional machine learning,
researchers have created algorithms such as Isolation Forests, One-class SVMs, Elliptic Envelopes, and Local
Outlier Factor to help detect anomalies. Can we utilize the powerful representation learning capabilities of
deep learning for this task? In this project, students are expected to investigate the challenges in anomaly
detection, recent deep learning solutions and their hypotheses, foundations and gaps.
A.2 MR. Method Research
MR.1. Data Generation using Generative Adversarial Networks
Have you ever wondered how lifelike human faces are generated by Generative Adversarial Networks
(GANs)? If not, go check this cool website: This person does not exist (clickable link). In this project, you
are required to benchmark a high-performing GAN on the CelebA dataset (downloaded here) (or CelebA-HQ,
which is a high-resolution version of CelebA) using metrics such as Inception Score, FID or others. During
the experiments, we encourage you to assess the following aspects of GANs:
• Running time. Does the image resolution significantly affect running time?
• Model stability. Is GAN training stable?
• Sensitivity to hyper-parameters. Do hyper-parameters significantly affect generated image quality?
• Mode collapse. Is there any portion of the data that never appears in the generated samples? Is there any
way to measure mode collapse?
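One possible way to approach the last question is sketched below, under a strong assumption: that you have some classifier (for example, one trained on CelebA attributes) which assigns every generated image to one of a fixed number of known modes. Given those labels, you can count how many modes the generator ever produces and how far the label distribution is from uniform. The function mode_coverage and the toy label lists are hypothetical; this is an illustrative diagnostic, not a standard benchmark metric like Inception Score or FID.

```python
import math
from collections import Counter


def mode_coverage(generated_labels, num_modes):
    """Summarise how well generated samples cover the known modes.

    generated_labels: mode index (0 .. num_modes - 1) assigned to each
        generated sample by some external classifier (assumed to exist).
    Returns (fraction of modes that appear at least once,
             KL divergence from the empirical label distribution to uniform).
    """
    counts = Counter(generated_labels)
    total = len(generated_labels)
    covered = sum(1 for mode in range(num_modes) if counts[mode] > 0)
    # KL(empirical || uniform) = sum_p p * log(p * num_modes); 0 when balanced.
    kl = sum((c / total) * math.log((c / total) * num_modes)
             for c in counts.values() if c > 0)
    return covered / num_modes, kl


# Toy label sequences (made up): a balanced generator and a collapsed one.
healthy = [i % 10 for i in range(1000)]   # all 10 modes appear equally often
collapsed = [i % 3 for i in range(1000)]  # only 3 of the 10 modes ever appear

print(mode_coverage(healthy, num_modes=10))
print(mode_coverage(collapsed, num_modes=10))
```

A low coverage fraction or a large divergence suggests that the generator is ignoring part of the data distribution, although the result is only as trustworthy as the classifier used to assign the labels.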
As a starting point, the following tutorials may be helpful:
• GAN Lab: play with GANs in your browser.
• Blog: A Beginner’s Guide to GANs.
• Papers with Code: a search engine for papers that share code on GitHub.
MR.2. Data Visualization with Unsupervised Learning
Visualization of high-dimensional data is a preliminary step used in many data analyses. We can get a sense
of the ‘structure’ of the high-dimensional data by visualizing them. There are significant recent advances
in this field of study with highly efficient algorithms being presented. In this study, you are expected to
benchmark data visualization methods on a selected set of datasets. The goal is to familiarize yourself
with the process of gaining information from unsupervised preliminary analysis that may help you with the
downstream analysis.
During the experiments, you may compare the following properties of the data visualization methods:
• Running time.
• Quality of visualizations. Are the clusters well separated? Does the method preserve class hierarchies?
You should compare methods both quantitatively and qualitatively.
• If the data are provided incrementally to the method, can the method adapt to new data increments?
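For the quantitative side of the comparison, one simple option is to measure how many of each point's nearest neighbours in the original space are still its nearest neighbours in the 2-D (or 1-D) embedding. The sketch below implements this neighbourhood-preservation score in plain Python. The function names, the choice of k and the toy data are assumptions for illustration, and the brute-force neighbour search is only practical for small datasets.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def neighbourhood_preservation(high_dim, low_dim, k=2):
    """Average fraction of k nearest neighbours shared between the two spaces."""
    high = knn_sets(high_dim, k)
    low = knn_sets(low_dim, k)
    return sum(len(a & b) / k for a, b in zip(high, low)) / len(high_dim)


# Toy data (made up): two well-separated clusters in 3-D.
data = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 0), (10, 11, 0), (11, 10, 0)]
good_embedding = [(x, y) for x, y, _ in data]          # keeps the structure
bad_embedding = [(0,), (5,), (10,), (1,), (6,), (11,)]  # mixes the two clusters

print(neighbourhood_preservation(data, good_embedding))
print(neighbourhood_preservation(data, bad_embedding))
```

A score close to 1 means local structure is preserved; it says nothing about global structure such as the relative placement of clusters, so it complements rather than replaces visual inspection.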
You are also encouraged to pay attention to the following aspects:
• Size of the dataset, including the number of data instances and dimensionality, and how they affect
the method running time.
• Uniformity of the data. Are the data instances from a single cluster with certain properties spread
around, or are there multiple clusters?
• Noise and outliers in data. Would adding noise and outliers sabotage the visualization? Which method
is the most robust?
We do encourage you to be creative and explore other aspects of methods and data not mentioned above.
The following tutorials may help you get started: (link clickable)
• Introduction to Dimensionality Reduction on Kaggle.
• Blog: Data visualization and dimensionality reduction using t-SNE.
• Nature paper: The art of using t-SNE.
A.3 IC. Industrial and Clinical Applications
IC.1. & IC.2. COVID-19 Related Projects
At the time of writing, COVID-19 has reportedly infected more than 3 million people
in the world (clickable source: ECDC), causing a huge loss to public health, well-being and the economy. In
this assignment, we provide two projects related to COVID-19 to give you an idea of how machine learning
techniques can contribute to the fight against the coronavirus. These two projects are:
• IC.1. COVID-19 Pneumonia diagnosis using chest X-Rays.
• IC.2. Kaggle Competition: Forecasting Global Daily Confirmed COVID-19 Cases
As a group, you may choose either one as your project. For Project IC.1., you will build a
sample-efficient image classifier to distinguish chest X-rays of individuals with respiratory illness who tested
positive for COVID-19 from other X-rays. Ideally, the classifier should be interpretable, to promote the discovery
of patterns in such X-rays. You may find more information and links to three datasets at this blog (clickable link).
The three continually growing datasets are provided at: (clickable links)
• COVID-19 image data collection (GitHub link).
• Figure 1 COVID-19 Chest X-ray Dataset Initiative (GitHub link).
• Kaggle competition: RSNA Pneumonia detection challenge.
For Project IC.2., you are expected to participate in a global epidemiological study and competition
initiated by the White House Office of Science and Technology Policy (OSTP), US, and build a regression
model to predict the daily spread in regions around the world. The links to view the background, datasets
and other people’s solutions are listed below: (clickable links)
• COVID19 Global Forecasting (Week 1).
• COVID19 Global Forecasting (Week 2).
• COVID19 Global Forecasting (Week 3).
• COVID19 Global Forecasting (Week 4).
As time goes on, follow-up competitions will become available. Please participate in the latest
competition in this series; you may use all data available to date.
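As a starting point for the regression model in IC.2, the sketch below fits a naive log-linear trend to cumulative case counts and scores a short forecast. Everything in it is an assumption made for illustration: the counts are synthetic (not real COVID-19 data), the function names are hypothetical, and the root mean squared logarithmic error (RMSLE) is used as an example metric; check the competition page for the official evaluation rule.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between two count series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def fit_log_linear(cases):
    """Least-squares fit of log1p(cases) = a + b * day; returns (a, b)."""
    days = np.arange(len(cases))
    b, a = np.polyfit(days, np.log1p(cases), deg=1)
    return a, b

def predict(a, b, days):
    """Invert the log1p transform to obtain predicted counts."""
    return np.expm1(a + b * np.asarray(days, dtype=float))

# Synthetic cumulative counts for one region (NOT real data).
days = np.arange(20)
cases = np.round(5 * np.exp(0.2 * days))

# Fit on the first 14 days, forecast the remaining 6.
a, b = fit_log_linear(cases[:14])
forecast = predict(a, b, days[14:])
error = rmsle(cases[14:], forecast)
print(f"growth rate per day = {b:.3f}, RMSLE on held-out days = {error:.3f}")
```

A pure exponential extrapolation like this degrades quickly once growth slows, so it is best treated as a baseline against which more careful models are compared.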
IC.3. Incremental Learning in Industrial Mechatronics
Industrial components undergo degradation over time. Machine learning provides a way to automate the
monitoring of such components to keep industrial plants running. In this Kaggle competition (clickable
link), the objective is to identify such degrading materials in an unsupervised manner and you are required
to detect any degradation as well as identify whether component replacement has happened. In
this assignment, however, we go one step further. Although the competition provides a complete year’s
worth of data, can we do this in real time? In that setting, we do not
begin with a large set of data, but gather data incrementally as the process runs. Your challenge is
to build on the work provided in the kernels of this competition, and related publication, to see if and how
you can learn to predict anomalies/degradation of components incrementally.
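To illustrate what "incrementally" means in practice, the sketch below flags anomalies in a stream of sensor readings while keeping only a running mean and variance (Welford's online update), so no history needs to be stored. It is a deliberately simple stand-in, not the method used in the competition kernels or the related publication; the class name, threshold, warm-up length and readings are all hypothetical.

```python
import math

class StreamingZScoreDetector:
    """Flags a reading whose z-score against past readings exceeds a threshold."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold
        self.warmup = warmup  # readings to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Synthetic readings: small fluctuations around 10.0, then a sudden jump.
readings = [10.0 + 0.1 * ((i * 7) % 5 - 2) for i in range(50)] + [14.0]
detector = StreamingZScoreDetector()
flags = [detector.update(r) for r in readings]
print("anomalous indices:", [i for i, f in enumerate(flags) if f])
```

Note that this sketch also folds flagged readings into the running statistics. Whether to do so, or to reset the statistics after a suspected component replacement, is a design decision worth examining in the project.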
Please check the following resources for more information: (clickable links)
• Kaggle competition: one-year-industrial-component-degradation and the kernel page.
• Previous Kaggle competition related to this project.
• Research paper on anomaly detection using Self-Organizing Maps.
• Damith’s recent manuscript on data visualization in incremental learning scenarios.
IC.4. Positive Unlabeled Learning in Drug Repositioning
Drug-Drug interactions (DDIs) can occur when two drugs are administered to a patient simultaneously.
However, verifying DDIs experimentally is a time-consuming and costly procedure. While computational
methods are great candidates for identifying DDIs from the drugs’ chemical and therapeutic properties, the
challenge is that known interactions are rare (few positive labels) and most drug pairs
have not been tested for interactions. In such a case, positive unlabeled learning is considered: a
binary classification problem in which only a small number of data points are labeled, all of them
positive, and the rest are unlabeled. Note that for such data, standard supervised learning models may
not be appropriate, as the model will simply overfit the small number of
labeled samples. In this project, you are expected to investigate appropriate machine learning methods to
predict possible DDIs by looking at the chemical and therapeutic properties of known drugs, and a small
number of known interactions.
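One well-known way to approach positive unlabeled learning is the correction proposed by Elkan and Noto (2008): train a classifier g(x) to separate labeled from unlabeled points, estimate c = P(labeled | positive) as the average of g over the labeled positives, and use g(x) / c as an estimate of P(positive | x). The sketch below illustrates this idea on synthetic 2-D data with a small logistic regression written in NumPy. It is not necessarily the method of the linked research paper; the data, function names and hyperparameters are assumptions, and c is estimated on the training points rather than on a held-out set, for brevity.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, s, lr=0.1, n_iter=5000):
    """Logistic regression by batch gradient descent; s is 1 if labeled."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - s) / len(s)
    return w

def predict_proba(w, X):
    """Estimated probability that a point is labeled, g(x)."""
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))

# Synthetic stand-in for drug-pair features (NOT real drug data).
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=(300, 2))
neg = rng.normal(loc=-2.0, size=(700, 2))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(300), np.zeros(700)])  # hidden true labels

# Only 60 of the 300 positives carry a label; everything else is unlabeled.
s = np.zeros(len(X))
s[rng.choice(300, size=60, replace=False)] = 1.0

w = fit_logistic(X, s)
g = predict_proba(w, X)
c = g[s == 1].mean()                # estimate of P(labeled | positive)
p_pos = np.clip(g / c, 0.0, 1.0)    # estimate of P(positive | x)

accuracy = float(((p_pos > 0.5) == (y == 1)).mean())
print(f"estimated c = {c:.2f}, accuracy against hidden labels = {accuracy:.2f}")
```

The hidden labels `y` are used only to check the result on synthetic data; with real DDI data they are unavailable, which is why evaluation in the positive unlabeled setting needs particular care.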
Please follow the related resources as the first step: (clickable links)
• Research paper on Positive Unlabeled Learning based DDI identification
• Drugbank Database
• Download links for preprocessed Drug Data
A.4 SS. Student-Suggested Projects
In addition to proposing the above projects, we also encourage students to be creative and propose their
own projects (SS) that can roughly be categorized as LR, MR or IC. However, SS projects need
to be approved by the subject coordinators before you commence the projects. A good project should have
the following properties:
• Impact: The project should have considerable relevance to machine learning methods, for example by
addressing a significant methodological problem or solving a problem that affects a large community.
• Scope: The project should be planned and able to be concluded before the final report submission
(Friday 5 June 2020).
• Measurable Outcomes: The project should be self-sustained, i.e., it should provide its own resources,
such as publications, blogs, code repositories and datasets, as well as outcomes such as software libraries
or conclusive reports.