Semester: Spring Semester, 2021
Department: Artificial Intelligence, First Year; Computer Science and Engineering, First Year
Course Name: Theory and Practice of Bioinformatics
Instructor: CHANG JIA-MING
Credit: 3.0
Course Type: Elective
Prerequisite
Course Objective
Course Description
Course Schedule

Week | Topic | Content and assigned reading | Teaching activities and assignments | Study hours
1 Introduction What is bioinformatics? 

Central dogma of molecular biology: DNA, mRNA, protein
The DNA Journey 

觀念生物學 (Conceptual Biology), vols. 1-4, Commonwealth Publishing (天下文化) 

Canadian Bioinformatics Workshops (all slides and videos are available)
6
2 Sequence alignment Why do we need sequence alignment? 

How is it applied in structural homology and evolutionary modeling? 

Dynamic programming
SEAVIEW : Sequence alignment editor 

T-Coffee documentation
6
3 Pairwise Sequence alignment Global & Local alignment 

Linear space algorithm 

BLAST
NCBI BLAST server 

BLAST by O'Reilly Media
6
4 Multiple Sequence alignment The variation of the algorithms, which one is better? 

Another issue: huge amounts of data
T-Coffee web server 

PSI/TM-Coffee web server 

  • PSI/TM-Coffee: Floden, E. W. et al. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic Acids Res. 44, W339–43 (2016). 

  • PSI-Coffee: Chang, J.-M., Di Tommaso, P., Taly, J.-F. & Notredame, C. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13 Suppl 4, S1 (2012).


6
5 Sequence alignment post-process Uncertainty and its effect on downstream analysis 

How to detect uncertainty?
TCS web server 

TCS: Chang, J.-M., Di Tommaso, P. & Notredame, C. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol. Biol. Evol. 31, 1625–37 (2014).
6
6 Phylogenetic tree 1/2 Probabilistic and ideal-data models 

Character/parsimony-based methods
Databases of rRNA sequences and associated software summary by Manolo Gouy 

The rRNA WWW Server by Antwerp, Belgium 

The Ribosomal Database Project by Michigan State University
6
7 Phylogenetic tree 2/2 Distance-based methods: UPGMA, NJ 

Maximum-likelihood methods: PhyML
Programs for molecular phylogeny summary by Manolo Gouy 

PHYLIP: an extensive package of programs for all platforms 

PAUP: a high-performance commercial package 

PHYLO_WIN: a graphical interface, for Unix only 

MrBayes: Bayesian phylogenetic analysis 

PhyML: fast maximum likelihood tree building 

WWW-interface at Institut Pasteur, Paris
6
8 Protein secondary structure prediction Neural network approach 

Knowledge-based approach 


  • HYPROSP: Wu, K.-P., Lin, H.-N., Chang, J.-M., Sung, T.-Y. & Hsu, W.-L. HYPROSP: a hybrid protein secondary structure prediction algorithm—a knowledge-based approach. Nucleic Acids Research 32, 5059–5065 (2004). 

  • HYPROSPII: Lin, H.-N., Chang, J.-M., Wu, K.-P., Sung, T.-Y. & Hsu, W.-L. HYPROSP II-A knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence. Bioinformatics 21, 3227–3233 (2005).


6
9 Protein functional classes prediction Machine learning 

Feature reduction? 
The Critical Assessment of protein Function Annotation algorithms (CAFA) 

  • CAFA1: Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–7 (2013). 

  • CAFA2: Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. arXiv preprint arXiv:1601.00891 (2016). 



PSLDoc: Chang, J.-M. et al. PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 72, 693–710 (2008). 



PSLDoc2: Chang, J.-M. et al. Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS ONE 8, e75542 (2013). 


6
10 Midterm One A4 page   3
11 Genomics What are genes and genomes? How are genes expressed and regulated? 

The Human Genome Project 

Gene finding 
The Assemblathon 6
12 Next generation sequencing RNA-Seq: large amounts of data 

How to identify significant expression?
Applications of next-generation sequencing by Nature Reviews Genetics 6
13 Comparative genomics Genome alignment 

Phylogenomics 

Single-nucleotide polymorphisms associated with diseases
The Alignathon 

HaploReg: a tool for exploring annotations of the noncoding genome at variants on haplotype blocks 

ClinVar: aggregates information about genomic variation and its relationship to human health
6
14 Computational epigenetics Chromatin biology 

Nuclear organization 
Chromosome conformation capture techniques (染色體結構捕捉技術) by 陳政儀 

  • HiC contact bias: Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–65 (2011). 

  • HiC peak calling method : Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–80 (2014). 

  • Genome segmentation : Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012). 

  • Review : Sexton, T. & Cavalli, G. The role of chromosome domains in shaping the functional genome. Cell 160, 1049–59 (2015). 


6
15 Big data - big projects ENCODE: Encyclopedia of DNA Elements 

modENCODE: model organism Encyclopedia of DNA Elements 

NIH Roadmap Epigenomics 

1000 Genomes 
Collected papers for Epigenome Roadmap 

Epigenetics by Nature Reviews Genetics 

ENCODE 

modENCODE 

Roadmap Epigenomics project 

1000 Genomes project
6
16 Big data visualization UCSC/Ensembl genome browser 

WashU Epigenome Browser
UCSC Genome browser 

Ensembl genome browser 

WashU Epigenome browser 
6
17 Databases RCSB Protein Data Bank (PDB) 

NCBI Sequence Read Archive (SRA) 
PDB 

NCBI SRA 
6
18 Final project presentation Rubrics

Teaching Methods
Teaching Assistant

  • Prepare assignments

  • Grade assignments

  • Maintain content in Moodle

  • Answer students' questions


Requirement/Grading

  • Assignments 60% 

  • Midterm exam 15% 

  • Final project 25% 

  • Class participation (bonus) <= 10%
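
As a worked example of how the weights in the list above combine, here is a minimal Python sketch. Only the percentages come from this syllabus. The function name `final_grade`, the 0-100 score scale, the treatment of the bonus as points added on top of the weighted sum, and the cap at 100 are all assumptions made for illustration; the syllabus does not specify them.

```python
# Weights taken from the grading list; all other details are illustrative.
WEIGHTS = {"assignments": 0.60, "midterm": 0.15, "final_project": 0.25}
MAX_BONUS = 10.0  # class participation bonus, at most 10 points (assumed unit)


def final_grade(scores, bonus=0.0, cap=100.0):
    """Combine component scores (each on a 0-100 scale) into one grade.

    Assumption: the bonus is added as points on top of the weighted sum
    and the result is capped at `cap`. Neither rule is stated in the
    syllabus; adjust both to match the actual course policy.
    """
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    clipped_bonus = max(0.0, min(bonus, MAX_BONUS))
    return min(weighted + clipped_bonus, cap)


# Hypothetical student: 80 on assignments, 70 on the midterm, 90 on the project.
example = {"assignments": 80, "midterm": 70, "final_project": 90}
print(round(final_grade(example, bonus=5), 2))
```

Under these assumptions the weighted part is 0.60 × 80 + 0.15 × 70 + 0.25 × 90, and the participation bonus is clipped to the 10-point limit before it is added.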


Textbook & Reference

Main textbook 

Introduction to Bioinformatics 

Author: Arthur Lesk. 

Publisher: Oxford University Press; 4th edition (January 1, 2014) 

ISBN: 0199651566 



Other reference books 

Bioinformatics For Dummies 

Author: Jean-Michel Claverie, Cedric Notredame 

Publisher: For Dummies; 2nd edition (December 18, 2006) 

ISBN: 0470089857 



Bioinformatics: Sequence and Genome Analysis 

Author: David W. Mount 

Publisher: 2nd Edition, Cold Spring Harbor Lab. Press 



Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition 

Author: Andreas D. Baxevanis, B. F. Francis Ouellette 

Publisher: Wiley 



Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) 

Author: Warren J. Ewens and Gregory R. Grant 



An Introduction to Bioinformatics Algorithms 

Author: Neil C. Jones and Pavel A. Pevzner 



Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids 

Author: Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison


Urls about Course
http://www.changlabtw.com/1092-bioinformatics.html
Attachment