Semester: Fall Semester, 2023
Department: MA Program of Computer Science, First Year; Artificial Intelligence, First Year; Computer Science and Engineering, First Year
Course Name: Theory and Practice of Bioinformatics
Course Type: Elective
Course Objective
Course Description
Course Schedule

Week / Topic / Content and assigned reading / Teaching activities and assignments / Study hours

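Week 1: Introduction
Content and assigned reading:
  • What is bioinformatics?
  • The central dogma of molecular biology: DNA, mRNA, protein
Teaching activities and assignments:
  • The DNA Journey
  • 觀念生物學, volumes 1 to 4 (published by 天下文化, in Chinese)
  • Canadian Bioinformatics Workshops (all slides and videos are available)
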
Week 2: Sequence alignment
Content and assigned reading:
  • Why do we need sequence alignment?
  • Its applications in structural homology and evolutionary modeling
  • Dynamic programming
Teaching activities and assignments:
  • SEAVIEW: sequence alignment editor
  • T-Coffee documentation

Week 3: Pairwise sequence alignment
Content and assigned reading:
  • Global and local alignments
  • Linear-space algorithm
Teaching activities and assignments:
  • NCBI BLAST server
  • BLAST (O'Reilly Media)

Week 4: Multiple sequence alignment
Content and assigned reading:
  • Variants of the algorithms: which one is better?
  • Another issue: huge amounts of data
Teaching activities and assignments:
  • T-Coffee web server
  • PSI/TM-Coffee web server
  • PSI/TM-Coffee: Floden, E. W. et al. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic Acids Res. 44, W339–43 (2016).
  • PSI-Coffee: Chang, J.-M., Di Tommaso, P., Taly, J.-F. & Notredame, C. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13 Suppl 4, S1 (2012).

Week 5: Sequence alignment post-processing
Content and assigned reading:
  • Uncertainty and its effect on downstream analysis
  • How to detect uncertainty?
Teaching activities and assignments:
  • TCS web server
  • TCS: Chang, J.-M., Di Tommaso, P. & Notredame, C. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol. Biol. Evol. 31, 1625–37 (2014).

Week 6: Phylogenetic tree 1/2
Content and assigned reading:
  • Probabilistic and ideal-data models
  • Character/parsimony-based methods
Teaching activities and assignments:
  • Databases of rRNA sequences and associated software (summary by Manolo Gouy)
  • The rRNA WWW Server (Antwerp, Belgium)
  • The Ribosomal Database Project (Michigan State University)

Week 7: Phylogenetic tree 2/2
Content and assigned reading:
  • Distance-based methods: UPGMA, NJ
  • Maximum-likelihood methods: PhyML
Teaching activities and assignments:
  • Programs for molecular phylogeny (summary by Manolo Gouy)
  • PHYLIP: an extensive package of programs for all platforms
  • PAUP: a high-performing commercial package
  • PHYLO_WIN: a graphical interface, for Unix only
  • MrBayes: Bayesian phylogenetic analysis
  • PhyML: fast maximum-likelihood tree building
  • WWW interface at Institut Pasteur, Paris

Week 8: Protein secondary structure prediction
Content and assigned reading:
  • Neural network approach
  • Knowledge-based approach

  • HYPROSP: Wu, K.-P., Lin, H.-N., Chang, J.-M., Sung, T.-Y. & Hsu, W.-L. HYPROSP: a hybrid protein secondary structure prediction algorithm—a knowledge-based approach. Nucleic Acids Research 32, 5059–5065 (2004). 

  • HYPROSPII: Lin, H.-N., Chang, J.-M., Wu, K.-P., Sung, T.-Y. & Hsu, W.-L. HYPROSP II-A knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence. Bioinformatics 21, 3227–3233 (2005).

Week 9: Protein functional class prediction
Content and assigned reading:
  • Machine learning
  • Feature reduction?
Teaching activities and assignments:
  • The Critical Assessment of protein Function Annotation algorithms (CAFA)
  • CAFA1: Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–7 (2013).
  • CAFA2: Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. arXiv preprint arXiv:1601.00891 (2016).
  • PSLDoc: Chang, J.-M. et al. PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 72, 693–710 (2008).
  • PSLDoc2: Chang, J.-M. et al. Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS ONE 8, e75542 (2013).

Week 10: Review before the midterm exam

Week 11: Midterm
  • One A4 page
Study hours: 3

Week 12: Genomics
Content and assigned reading:
  • What are genes and genomes? How are genes expressed and regulated?
  • The Human Genome Project
  • Gene finding
Teaching activities and assignments:
  • The Assemblathon
Study hours: 6

Week 13: Next-generation sequencing
Content and assigned reading:
  • RNA-Seq: large amounts of data
  • How to identify significant expression?
Teaching activities and assignments:
  • Applications of next-generation sequencing (Nature Reviews Genetics)
Study hours: 6

Week 14: Comparative genomics 1
Content and assigned reading:
  • Genome alignment
Teaching activities and assignments:
  • The Alignathon
Study hours: 6

Week 15: Comparative genomics 2
Content and assigned reading:
  • Single-nucleotide polymorphisms related to diseases
Teaching activities and assignments:
  • HaploReg: a tool for exploring annotations of the noncoding genome at variants on haplotype blocks
  • ClinVar: aggregates information about genomic variation and its relationship to human health

Week 16: General review of the final project
Content and assigned reading:
  • Online discussion of issues with the final project

Week 17: Big projects and their visualizations
Content and assigned reading:
  • ENCODE: Encyclopedia of DNA Elements
  • modENCODE: model organism Encyclopedia of DNA Elements
  • NIH Roadmap Epigenomics
  • 1000 Genomes
  • UCSC/Ensembl genome browsers
  • WashU Epigenome Browser
  • RCSB Protein Data Bank (PDB)
  • NCBI Sequence Read Archive (SRA)
Teaching activities and assignments:
  • Collected papers for Epigenome Roadmap
  • Epigenetics (Nature Reviews Genetics)
  • Roadmap Epigenomics project
  • 1000 Genomes project
  • UCSC Genome Browser
  • Ensembl genome browser
  • WashU Epigenome Browser


Week 18: Final project presentation
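
As a small illustration of the dynamic programming approach to global pairwise alignment listed for Weeks 2 and 3, the following Python sketch fills a score table and traces back one optimal alignment (the Needleman-Wunsch scheme). The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for this example, not course-specified parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not course settings.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The table uses O(nm) memory; the linear-space algorithm listed for Week 3 reduces this, at the cost of a more involved traceback.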



Teaching Methods
Teaching Assistant

  • Prepare assignments

  • Grade assignments

  • Maintain content in Moodle

  • Answer students' questions


Grading

  • Assignments: 55%

  • Midterm exam: 15%

  • Final project: 20%

  • Class participation: 10%
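
For illustration, the four grading weights above (assignments 55%, midterm exam 15%, final project 20%, class participation 10%) can be combined as a weighted sum. This is a minimal sketch: the component key names are hypothetical, and it assumes each component is scored on a 0 to 100 scale, which the syllabus does not state.

```python
# Weights in percent, taken from the grading list; key names are hypothetical.
WEIGHTS = {
    "assignments": 55,
    "midterm": 15,
    "final_project": 20,
    "participation": 10,
}


def final_score(scores):
    """Return the weighted course score, assuming each component is on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"assignments": 80, "midterm": 70, "final_project": 90, "participation": 100}
    print(final_score(example))  # (4400 + 1050 + 1800 + 1000) / 100 = 82.5
```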

Textbook & Reference


Introduction to Bioinformatics 

Author: Arthur Lesk. 

Publisher: Oxford University Press; 4th edition (January 1, 2014) 

ISBN: 0199651566 


Bioinformatics For Dummies 

Author: Jean-Michel Claverie, Cedric Notredame 

Publisher: For Dummies; 2nd edition (December 18, 2006) 

ISBN: 0470089857 

Bioinformatics: Sequence and Genome Analysis 

Author: David W. Mount 

Publisher: 2nd Edition, Cold Spring Harbor Lab. Press 

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition 

Author: Andreas D. Baxevanis, B. F. Francis Ouellette 

Publisher: Wiley-Liss 

Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) 

Author: Warren J. Ewens and Gregory R. Grant 

An Introduction to Bioinformatics Algorithms 

Author: Neil C. Jones and Pavel A. Pevzner 

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids 

Author: Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison

URLs about the Course