Semester: Spring Semester, 2021
Department: MA Program of Computer Science, First Year; MA Program of Computer Science, Second Year
Course Name: Theory and Practice of Bioinformatics
Course Type: Elective
Course Objective
Course Description
Course Schedule

Week Topic Content and Assigned Reading Teaching Activities and Assignments Study Hours
1 Introduction What is bioinformatics? 

Central dogma of molecular biology: DNA, mRNA, protein
The DNA Journey 

觀念生物學 (Conceptual Biology), volumes 1-4, published by 天下文化 (Commonwealth Publishing)

Canadian Bioinformatics Workshops (all slides and video are available)
2 Sequence alignment Why do we need sequence alignment? 

Its applications in structural homology and evolutionary modeling

Dynamic programming
SEAVIEW : Sequence alignment editor 

T-Coffee documentation
3 Pairwise Sequence alignment Global & Local alignment 

Linear space algorithm 

NCBI BLAST server 

BLAST by O'Reilly Media
4 Multiple Sequence alignment Variants of the algorithms: which one is better?

Another issue: huge amounts of data
T-Coffee web server 

PSI/TM-Coffee web server

  • PSI/TM-Coffee: Floden, E. W. et al. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.  Nucleic Acids Res. 44, W339–43 (2016). 

  • PSI-Coffee: Chang, J.-M., Di Tommaso, P., Taly, J.-F. & Notredame, C. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13 Suppl 4, S1 (2012).

5 Sequence alignment post-process Uncertainty and its effect on downstream analysis 

How to detect uncertainty?
TCS web server 

TCS: Chang, J.-M., Di Tommaso, P. & Notredame, C. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol. Biol. Evol. 31, 1625–37 (2014).
6 Phylogenetic tree 1/2 Probabilistic and ideal-data models 

Character/parsimony-based methods
Databases of rRNA sequences and associated software summary by Manolo Gouy 

The rRNA WWW Server by Antwerp, Belgium 

The Ribosomal Database Project by Michigan State University
7 Phylogenetic tree 2/2 Distance-based methods: UPGMA, NJ 

Maximum-likelihood methods: PhyML
Programs for molecular phylogeny summary by Manolo Gouy 

PHYLIP: an extensive package of programs for all platforms 

PAUP: a high-performance commercial package 

PHYLO_WIN: a graphical interface, for Unix only 

MrBayes: Bayesian phylogenetic analysis 

PhyML: fast maximum likelihood tree building 

WWW-interface at Institut Pasteur, Paris
8 Protein secondary structure prediction Neural network approach 

Knowledge-based approach 

  • HYPROSP: Wu, K.-P., Lin, H.-N., Chang, J.-M., Sung, T.-Y. & Hsu, W.-L. HYPROSP: a hybrid protein secondary structure prediction algorithm—a knowledge-based approach. Nucleic Acids Research 32, 5059–5065 (2004). 

  • HYPROSPII: Lin, H.-N., Chang, J.-M., Wu, K.-P., Sung, T.-Y. & Hsu, W.-L. HYPROSP II-A knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence. Bioinformatics 21, 3227–3233 (2005).

9 Protein functional classes prediction Machine learning 

Feature reduction? 
The Critical Assessment of protein Function Annotation algorithms (CAFA) 

  • CAFA1: Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–7 (2013). 

  • CAFA2: Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. arXiv preprint arXiv:1601.00891 (2016).

PSLDoc: Chang, J.-M. et al. PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 72, 693–710 (2008).

PSLDoc2: Chang, J.-M. et al. Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS ONE 8, e75542 (2013).

10 Midterm One A4 page   3
11 Genomics What is gene and genome? How does a gene express and regulate? 

The Human Genome Project 

Gene finding 
The Assemblathon 6
12 Next generation sequencing RNA-Seq: large amounts of data 

How to identify significant expression?
Applications of next-generation sequencing by Nature Reviews Genetics 6
13 Comparative genomics Genome alignment 


Single-nucleotide polymorphisms associated with diseases
The Alignathon 

HaploReg: a tool for exploring annotations of the noncoding genome at variants on haplotype blocks 

ClinVar: aggregates information about genomic variation and its relationship to human health
14 Computational epigenetics Chromatin biology 

Nuclear organization 
Chromosome conformation capture techniques (染色體結構捕捉技術) by 陳政儀 

  • Hi-C contact bias: Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–65 (2011). 

  • Hi-C peak calling method: Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–80 (2014). 

  • Genome segmentation: Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012). 

  • Review: Sexton, T. & Cavalli, G. The role of chromosome domains in shaping the functional genome. Cell 160, 1049–59 (2015). 

15 Big data - big projects ENCODE: Encyclopedia of DNA Elements 

modENCODE: model organism Encyclopedia of DNA Elements 

NIH Roadmap Epigenomics 

1000 Genomes 
Collected papers for Epigenome Roadmap 

Epigenetics by Nature Reviews Genetics 



Roadmap Epigenomics project 

1000 Genomes project
16 Big data visualization UCSC/Ensembl genome browser 

WashU Epigenome Browser
UCSC Genome browser 

Ensembl genome browser 

WashU Epigenome browser 
17 Databases RCSB Protein Data Bank (PDB) 

NCBI Sequence Read Archive (SRA) 

18 Final project presentation Rubrics (grading scale)

Teaching Methods
Teaching Assistant

  • Prepare assignments

  • Grade assignments

  • Maintain content in Moodle

  • Answer students' questions


  • Assignments 60% 

  • Midterm exam 15% 

  • Final project 25% 

  • Class participation (bonus) <= 10%

Textbook & Reference


Introduction to Bioinformatics 

Author: Arthur Lesk. 

Publisher: Oxford University Press; 4th edition (January 1, 2014) 

ISBN: 0199651566 


Bioinformatics For Dummies 

Author: Jean-Michel Claverie, Cedric Notredame 

Publisher: For Dummies; 2nd edition (December 18, 2006) 

ISBN: 0470089857 

Bioinformatics: Sequence and Genome Analysis 

Author: David W. Mount 

Publisher: 2nd Edition, Cold Spring Harbor Lab. Press 

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition 

Author: Andreas D. Baxevanis, B. F. Francis Ouellette 

Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) 

Author: Warren J. Ewens and Gregory R. Grant 

Introduction to Bioinformatics Algorithms 

Author: Neil C. Jones and Pavel A. Pevzner 

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids 

Author: Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison

Course URLs