Semester: Spring Semester, 2020
Department: Social Networks and Human-Centered Computing, First Year; Social Networks and Human-Centered Computing, Second Year
Course Name: Data Science
Instructor: CHANG JIA-MING
Credit: 3.0
Course Type: Elective
Prerequisite
Course Objective
Course Description
Course Schedule

Week01 - Sep. 18

Introduction



What is data science? Big data? Deep learning?

Three components: data, modeling, evaluation

Data science platforms



  • Why choose the R programming language?

  • Integrated development environment for R: RStudio


Supporting Materials


  1. Chapters 1, 2, and Appendix A







    Number of hours invested per week = 6 hours


Week02 - Sep. 25

Documentation and deployment of your code



Version control with GitHub

Supporting Materials



  1. Chapter 10, 11





    Number of hours invested per week = 6 hours



Week03 - Oct. 02

How to evaluate output?



Specificity, sensitivity, recall, F-score

Receiver operating characteristic curve, AUC

Statistical significance: p-value, false discovery rate

Supporting Materials



  1. Chapter 5

  2. ROCR package - Visualizing classifier performance in R




    Number of hours invested per week = 6 hours


Week04 - Oct. 09

How to perform evaluation?



Cross-validation

Bootstrap and jackknife sampling

Bias, variance, overfitting

Supporting Materials



  1. Chapter 6.2




    Number of hours invested per week = 6 hours


Week05 - Oct. 16

Feature selection/extraction/reduction



Principal component analysis (PCA), correspondence analysis (CA)

Probabilistic latent semantic analysis



  • maximum likelihood estimation

  • expectation–maximization algorithm


Supporting Materials


  1. A tutorial on principal component analysis by Jonathon Shlens

  2. Correspondence Analysis and Related Methods by Michael Greenacre

  3. Multivariate statistics by Michael Greenacre




    Number of hours invested per week = 6 hours


Week06 - Oct. 23

Exploring/managing data



Probabilistic and ideal-data models

Character/parsimony-based methods

Supporting Materials



  1. Chapters 3, 4




    Number of hours invested per week = 6 hours


Week07 - Oct. 30

Visualization (1/2)



Charts, graphs, networks, maps

Interactive visualizations - Shiny app


Supporting Materials



  1. Simple Graphs with R

  2. Basic Graphs by Quick R




    Number of hours invested per week = 6 hours


Week08 - Nov. 06 

Visualization (2/2)



Workflow: scripts

Exploratory Data Analysis

Workflow: projects

Data import


Supporting Materials



  • R for Data Science


    1. Ch. 6. Workflow: scripts

    2. Ch. 7. Exploratory Data Analysis

    3. Ch. 8. Workflow: projects

    4. Ch. 11. Data import






    Number of hours invested per week = 18 hours


Week09 - Nov. 13

Midterm



Closed book, except for one A4 sheet of notes


 


Week10 - Nov. 20

Unsupervised learning



Clustering analysis

Association rules


Supporting Materials



  1. Chapter 6, 8




    Number of hours invested per week = 6 hours


Week11 - Nov. 27

Supervised learning (1/6)



Memorization methods

Supporting Materials



  1. Chapter 6




    Number of hours invested per week = 6 hours


Week12 - Dec. 04

Supervised learning (2/6)



Linear regression

Supporting Materials



  1. PSDR: Chapter 7.1

  2. ISLR: Chapter 3




    Number of hours invested per week = 6 hours


Week13 - Dec. 11

Supervised learning (3/6)



Logistic regression

Supporting Materials



  1. PSDR: Chapter 7.2

  2. ISLR: Chapter 4




    Number of hours invested per week = 6 hours


Week14 - Dec. 18

Supervised learning (4/6)



Generalized Additive Models

Supporting Materials



  1. PSDR: Chapter 9.1

  2. ISLR: Chapter 7




    Number of hours invested per week = 6 hours


Week15 - Dec. 25

Supervised learning (5/6)



Decision Tree & Random Forest

Supporting Materials



  1. PSDR: Chapter 9.1

  2. ISLR: Chapter 8




    Number of hours invested per week = 6 hours


Week16 - Jan. 01, 2019 (Holiday)



 



Week17 - Jan. 08

Supervised learning (6/6)



Kernel Methods

SVM   


Supporting Materials



  1. PSDR: Chapter 9.3, 9.4

  2. ISLR: Chapter 9

  3. Support vector machines and kernel methods: status and challenges by Chih-Jen Lin

  4. Talks about Machine Learning by Chih-Jen Lin




    Number of hours invested per week = 24 hours


Week18 - Jan. 15

Final project presentation




Teaching Methods
Teaching Assistant

  • Prepare assignments

  • Grade assignments

  • Maintain content in Moodle

  • Answer students' questions


Requirement/Grading

  • Homework: 60%

  • Midterm: 15%

  • Final project: 25%

  • Attendance/Participation (bonus): ≤ 10%


Textbook & Reference

  • Required




  1. PSDR: Practical Data Science with R, by Zumel, N. & Mount, J. (Manning, 2014). ISBN-10: 1617291560

  2. ISLR: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

  3. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data, by Hadley Wickham & Garrett Grolemund (1st Edition)



 




  • Other references




  1. How to Measure Anything Workbook: Finding the Value of Intangibles in Business

  2. Additional material (credit: Thomas M. Carsey, carsey@unc.edu)

  3. Data Mining with R: Learning with Case Studies, by Torgo, http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/

  4. An Introduction to Data Science, Version 3, by Stanton, http://jsresearch.net/

  5. Machine Learning with R by Lantz, http://www.packtpub.com/machine-learning-with-r/book

  6. A Simple Introduction to Data Science, by Burlingame and Nielsen, http://newstreetcommunications.com/businesstechnical/a_simple_introduction_to_data_science

  7. Ethics of Big Data, by Davis, http://shop.oreilly.com/product/0636920021872.do

  8. Privacy and Big Data, by Craig and Ludloff, http://shop.oreilly.com/product/0636920020103.do

  9. Doing Data Science: Straight Talk from the Frontline, by O’Neil and Schutt, http://shop.oreilly.com/product/0636920028529.do

  10. Springer Textbooks Use R! Series, http://www.springer.com/series/6991

  11. Online search tool Rseek, http://www.rseek.org/

  12. The Odum Institute’s online course, http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=670


Urls about Course
http://www.changlabtw.com/1082-datascience.html
Attachment