Semester	Spring Semester, 2021
Department	Artificial Intelligence, First Year; Computer Science and Engineering, First Year
Course Name	Theory and Practice of Bioinformatics
Instructor	CHANG JIA-MING
Credit	3.0
Course Type	Elective
Prerequisite

Course Objective

Course Description

Course Schedule

Week	Topic	Content and Assigned Reading	Teaching Activities and Assignments	Study Hours
1	Introduction	What is bioinformatics? Central dogma of molecular biology: DNA, mRNA, protein	The DNA Journey; 觀念生物學, vols. 1-4 (Commonwealth Publishing, 天下文化); Canadian Bioinformatics Workshops (all slides and videos are available)	6
2	Sequence alignment	Why do we need sequence alignment? Applications in structural homology and evolutionary modeling; dynamic programming	SEAVIEW: sequence alignment editor; T-Coffee documentation	6
3	Pairwise sequence alignment	Global and local alignment; linear-space algorithm; BLAST	NCBI BLAST server; BLAST by O'Reilly Media	6
4	Multiple sequence alignment	Variations of the algorithms: which one is better? Another issue: huge amounts of data	T-Coffee web server; PSI/TM-Coffee web server; PSI/TM-Coffee: Floden, E. W. et al. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic Acids Res. 44, W339–43 (2016). PSI-Coffee: Chang, J.-M., Di Tommaso, P., Taly, J.-F. & Notredame, C. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13 Suppl 4, S1 (2012).	6
5	Sequence alignment post-processing	Uncertainty and its effect on downstream analysis; how to detect uncertainty?	TCS web server; TCS: Chang, J.-M., Di Tommaso, P. & Notredame, C. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol. Biol. Evol. 31, 1625–37 (2014).	6
6	Phylogenetic tree 1/2	Probabilistic and ideal-data models; character/parsimony-based methods	Databases of rRNA sequences and associated software, summary by Manolo Gouy; The rRNA WWW Server (Antwerp, Belgium); The Ribosomal Database Project (Michigan State University)	6
7	Phylogenetic tree 2/2	Distance-based methods: UPGMA, NJ; maximum-likelihood methods: PhyML	Programs for molecular phylogeny, summary by Manolo Gouy; PHYLIP: an extensive package of programs for all platforms; PAUP: a high-performance commercial package; PHYLO_WIN: a graphical interface, for Unix only; MrBayes: Bayesian phylogenetic analysis; PhyML: fast maximum-likelihood tree building; WWW interface at Institut Pasteur, Paris	6
8	Protein secondary structure prediction	Neural network approach; knowledge-based approach	HYPROSP: Wu, K.-P., Lin, H.-N., Chang, J.-M., Sung, T.-Y. & Hsu, W.-L. HYPROSP: a hybrid protein secondary structure prediction algorithm--a knowledge-based approach. Nucleic Acids Research 32, 5059–5065 (2004). HYPROSP II: Lin, H.-N., Chang, J.-M., Wu, K.-P., Sung, T.-Y. & Hsu, W.-L. HYPROSP II--a knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence. Bioinformatics 21, 3227–3233 (2005).	6
9	Protein functional classes prediction	Machine learning; feature reduction	The Critical Assessment of protein Function Annotation algorithms (CAFA); CAFA1: Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–7 (2013). CAFA2: Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. arXiv preprint arXiv:1601.00891 (2016). PSLDoc: Chang, J.-M. et al. PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 72, 693–710 (2008). PSLDoc2: Chang, J.-M. et al. Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS ONE 8, e75542 (2013).	6
10	Midterm	One A4 page		3
11	Genomics	What are genes and genomes? How are genes expressed and regulated? The Human Genome Project; gene finding	The Assemblathon	6
12	Next-generation sequencing	RNA-Seq: large amounts of data; how to identify significant expression?	Applications of next-generation sequencing by Nature Reviews Genetics	6
13	Comparative genomics	Genome alignment; phylogenomics; single-nucleotide polymorphisms associated with diseases	The Alignathon; HaploReg: a tool for exploring annotations of the noncoding genome at variants on haplotype blocks; ClinVar: aggregates information about genomic variation and its relationship to human health	6
14	Computational epigenetics	Chromatin biology; nuclear organization	Chromosome conformation capture techniques (染色體結構捕捉技術) by 陳政儀; Hi-C contact bias: Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–65 (2011). Hi-C peak calling method: Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–80 (2014). Genome segmentation: Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012). Review: Sexton, T. & Cavalli, G. The role of chromosome domains in shaping the functional genome. Cell 160, 1049–59 (2015).	6
15	Big data - big projects	ENCODE: Encyclopedia of DNA Elements; modENCODE: model organism Encyclopedia of DNA Elements; NIH Roadmap Epigenomics; 1000 Genomes	Collected papers for Epigenome Roadmap; Epigenetics by Nature Reviews Genetics; ENCODE; modENCODE; Roadmap Epigenomics project; 1000 Genomes project	6
16	Big data visualization	UCSC/Ensembl genome browsers; WashU Epigenome Browser	UCSC Genome Browser; Ensembl genome browser; WashU Epigenome Browser	6
17	Databases	RCSB Protein Data Bank (PDB); NCBI Sequence Read Archive (SRA)	PDB; SRA; NCBI	6
18	Final project presentation	Grading rubrics

Teaching Methods

Teaching Assistant

Prepare assignments

Grade assignments

Maintain content in Moodle

Answer students' questions

Requirement/Grading

Assignments 60%

Midterm exam 15%

Final project 25%

Class participation (bonus) <= 10%

Textbook & Reference

Main textbook

Introduction to Bioinformatics

Author: Arthur Lesk.

Publisher: Oxford University Press; 4th edition (January 1, 2014)

ISBN: 0199651566

Other references

Bioinformatics For Dummies

Author: Jean-Michel Claverie, Cedric Notredame

Publisher: For Dummies; 2nd edition (December 18, 2006)

ISBN: 0470089857

Bioinformatics: Sequence and Genome Analysis

Author: David W. Mount

Publisher: 2nd Edition, Cold Spring Harbor Lab. Press

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition

Author: Andreas D. Baxevanis, B. F. Francis Ouellette

Publisher: Wiley-Liss

Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health)

Author: Warren J. Ewens and Gregory R. Grant

An Introduction to Bioinformatics Algorithms

Author: Neil C. Jones and Pavel A. Pevzner

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

Author: Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison

Course URLs

http://www.changlabtw.com/1092-bioinformatics.html

Attachment