B363 Final Project Suggested Topics
(Final Report Due: 12/16, Wed midnight)

1. We have learned several algorithms for predicting the replication start site in
bacterial genome sequences. We have shown that these methods can be applied
successfully to the E. coli genome, where they identify the replication start region
and the signal (k-mers) of the DnaA box. The genomes of thousands of E. coli strains
are now available at https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/167/.
You may implement the algorithms learned in class to analyze and compare the
replication start sites of a subset of these genomes.
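
As a starting point for topic 1, the following is a minimal Python sketch of two ideas of this kind: locating the minimum of the cumulative G-C skew, and counting the most frequent k-mers together with their reverse complements in a window. The function names are illustrative, the sketch ignores mismatches, and the versions covered in class may differ in detail.

```python
from collections import Counter


def min_skew_positions(genome):
    """Return the 1-based prefix lengths where cumulative (#G - #C) is minimal."""
    skew = 0
    best = 0
    positions = [0]  # the empty prefix has skew 0
    for i, base in enumerate(genome, start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        if skew < best:
            best = skew
            positions = [i]
        elif skew == best:
            positions.append(i)
    return positions


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frequent_kmers_with_rc(window, k):
    """Most frequent k-mers in window, counting each k-mer with its reverse complement.

    Exact matches only; allowing mismatches is left out of this sketch.
    """
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    combined = {}
    for kmer, n in counts.items():
        rc = reverse_complement(kmer)
        # A k-mer equal to its own reverse complement must not be counted twice.
        combined[kmer] = n if rc == kmer else n + counts[rc]
    best = max(combined.values())
    return sorted(kmer for kmer, n in combined.items() if n == best)


# Toy inputs, not real genome data.
print(min_skew_positions("CATGGGCATCGGCCATACGCC"))
print(frequent_kmers_with_rc("AAAATTTT", 3))
```

For a real genome, one would read the sequence from a FASTA file, take a window of a few hundred bases around the skew minimum, and run the k-mer search on that window.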

2. In class we have learned clustering algorithms that group genes with similar
expression patterns across a biological process such as the diauxic shift. The
dataset obtained by DeRisi and colleagues is available at
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28. You may
implement a clustering algorithm to analyze this dataset and identify the
subset of genes whose expression levels increase after the diauxic shift. You may
further implement a motif-finding algorithm to identify the carbon source
response element (CSRE) motif in the upstream regions of many of these genes.
The yeast genome sequence and gene annotations are available here:
https://www.ncbi.nlm.nih.gov/genome/15?genome_assembly_id=22535
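
For topic 2, one possible clustering choice is k-means via the Lloyd algorithm. The sketch below is a small pure-Python version under stated assumptions: each gene is a tuple of expression values over the time points, the expression values are made up for illustration, and the farthest-first initialization is only one of several reasonable ways to pick starting centers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Pick k starting centers: the first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def lloyd_kmeans(points, k, max_iter=100):
    """Lloyd algorithm: alternate assigning points to centers and recomputing centers."""
    centers = [tuple(c) for c in farthest_first_centers(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical expression profiles over four time points (not taken from GSE28).
profiles = [
    (0.0, 0.5, 1.0, 2.0),
    (0.1, 0.4, 1.2, 1.9),
    (0.0, -0.5, -1.0, -2.0),
    (-0.1, -0.6, -0.9, -2.1),
]
centers, labels = lloyd_kmeans(profiles, k=2)
# The cluster whose center rises the most from the first to the last time point.
rising = max(range(len(centers)), key=lambda j: centers[j][-1] - centers[j][0])
print(labels, rising)
```

On the real dataset, the number of clusters and the way missing values are handled would need to be chosen and justified in the report.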

3. Evolutionary studies of SARS-CoV-2 genomes. Coronavirus disease 2019 (COVID-19),
the infectious disease caused by severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2), was first identified in Wuhan, China, and has since spread across many
countries, including the United States. Since the outbreak, thousands of SARS-CoV-2
genomes have been sequenced in countries around the world. For example, the collection at NCBI
(https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049)
contains more than 30,000 genome sequences. You may use one of the algorithms learned
in class to construct a phylogenetic tree for a selected subset of these genome
sequences, in order to study how the virus has spread across the world and how new
strains have emerged during the pandemic. You may refer to a similar study by the
Nextstrain team: https://nextstrain.org/ncov/global.
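
For topic 3, a distance-based method such as UPGMA is one option; whether it is among the algorithms covered in class is an assumption here, and neighbor joining would be another candidate. The sketch below assumes the sequences are already aligned to equal length, uses the fraction of differing positions as the distance, and returns only the tree topology in Newick format, without branch lengths. The sequences and names are toy values.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, seqs):
    """Build a UPGMA tree and return its topology as a Newick string."""
    # Each cluster id maps to (Newick subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        pair = min(
            (frozenset(p) for p in combinations(sorted(clusters), 2)),
            key=lambda p: dist[p],
        )
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster is the size-weighted average.
        for m in clusters:
            d_im = dist[frozenset((i, m))]
            d_jm = dist[frozenset((j, m))]
            dist[frozenset((next_id, m))] = (n_i * d_im + n_j * d_jm) / (n_i + n_j)
        clusters[next_id] = (f"({tree_i},{tree_j})", n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


# Toy aligned sequences (hypothetical, not real viral genomes).
names = ["A", "B", "C", "D"]
seqs = ["ACGTACGT", "ACGTACGA", "TTGTCCGT", "TTGTCCGA"]
newick = upgma(names, seqs)
print(newick)
```

Full-length genomes would first need a multiple sequence alignment, and the choice of which genomes to include (by country and collection date) largely determines what the resulting tree can show.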

Requirements and Evaluation: The above are suggested topics for the final project. You may
choose one of them or another project of your own interest. You can work by yourself or in a
team of two to complete the final project. You should not simply present the data with figures
and visualization tools; you need to use algorithms learned in class to reach your
conclusions. Please contact me by email at [email protected] if you are not sure
whether your idea is appropriate for the final project. I will host discussions during class on
12/1 and 12/3 (3:15-4:30 pm). Each team is required to give a 5-10 minute presentation about its
project in class on 12/8 or 12/10. You do not need to have final results by then, but you
should present the idea and the methods you plan to pursue. We will make the presentation
arrangements in class on 12/3, so please make an effort to attend the Zoom meeting that day.

Each team should submit a final report and the related implementations of bioinformatics
algorithms on Canvas by the due date, Wednesday 12/16. We will evaluate them based on the
following questions. For team projects, you need to describe the contribution of each member
in your final report.

1) Is the formulated problem reasonable? (15%)
2) Have comprehensive data been collected? (15%)
3) Is the bioinformatics algorithm devised properly and described clearly? (15%)
4) Has the bioinformatics algorithm been implemented correctly and efficiently? (20%)
5) Is the conclusion meaningful and supported by the data? (20%)
6) Are the results presented clearly and intuitively? (15%)
