Course Description and Syllabus

Catalog Description: A data-driven approach to building the computational and statistical understanding and expertise needed to solve the bioinformatics problems you are likely to encounter in your research. Topics will include microarray data analysis, high-throughput sequence data analysis, and SNP genotyping analysis, as well as some additional advanced topics.

Class Meetings: Monday/Wednesday 10:00-11:30 in LSS 440 (IBEST Classroom)

Course Credits: 3 cr.

Prerequisites: One of CS 120 (Computer Science I), Stat 452 (Mathematical Statistics), or Biology 456 (Computer Skills for Biologists); or by permission.

Textbook: None

Instructor: Matt Settles

Office: LSS 441C; Phone: 885-6051

Teaching Assistant: Matt Pennell

Office: CNR 212; Phone: 208-874-7539

Course Goals: After completing this course, students will be able to carry out their own data analysis projects, will understand the technical and statistical tools needed to conduct the analysis and have the computational ability to apply them, and will be able to critically review and implement the techniques and methods described in publications.

Course Format: The course will be divided into lecture sessions and lab/workshop sessions.

Topics will include:

  • Expression microarray analysis
  • CGH/ChIP-chip microarray analysis
  • Phylogenetic methods
  • High-throughput sequence assembly
  • High-throughput sequence mapping
  • RNA-seq studies
  • Metagenomics
  • Whole-genome association studies

Course Grading: A point system will be used for grading. Your semester grade will be based on a standard grading curve (90%, 80%, 70%,…) applied to the cumulative number of points you have earned by the last day of finals week. There will be 6 projects (each worth 12 points) and roughly 14 publication reviews (each worth 2 points), for a total of 100 points.
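
As a rough illustration of the point arithmetic above, the following Python sketch totals the available points and converts earned points to a percentage. It assumes exactly 14 reviews (the syllabus says "roughly 14"), and the letter labels A/B/C attached to the 90/80/70 cutoffs are an assumption; cutoffs below 70% are not listed here, so the sketch does not guess them. All function and variable names are hypothetical.

```python
# Point values taken from the grading paragraph above.
NUM_PROJECTS = 6
PROJECT_POINTS = 12
NUM_REVIEWS = 14      # assumption: "roughly 14" treated as exactly 14
REVIEW_POINTS = 2


def total_possible():
    """Total points available: 6 * 12 + 14 * 2."""
    return NUM_PROJECTS * PROJECT_POINTS + NUM_REVIEWS * REVIEW_POINTS


def percent_earned(project_scores, review_scores):
    """Percentage of the available points a student has earned."""
    earned = sum(project_scores) + sum(review_scores)
    return 100.0 * earned / total_possible()


def letter_grade(percent):
    """Map a percentage to a letter using the listed cutoffs.

    The letters A/B/C are an assumption; only the 90/80/70 cutoffs
    appear in the syllabus, so lower bands are left unspecified.
    """
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    return "below 70% (cutoff not listed)"


if __name__ == "__main__":
    # Hypothetical student: 66 project points and 26 review points.
    projects = [12, 10, 11, 12, 9, 12]
    reviews = [2] * 13 + [0]
    pct = percent_earned(projects, reviews)
    print(total_possible(), pct, letter_grade(pct))
```

With the hypothetical scores shown, the student has 92 of the 100 available points.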

Each project will be a report on the analysis of public data (or your own data) using the techniques discussed in class. The reports must be written in LaTeX with embedded R code for the complete analysis. A template with a brief introduction is provided in the documents section below.

Publication reviews will be short comments (half a page to a full page) on assigned methods papers. A template is provided in the documents section below.

Assigned Reading

Rob Jelier, Jelle Goeman, Kristina Hettne, Martin Schuemie, Johan Dunnen and Peter Hoen. Literature-aided interpretation of gene expression data with the weighted global test. Briefings in Bioinformatics, 12(5):518-529, 2010.

Yao Yu, Kang Tu, Siyuan Zheng, Yun Li, Guohui Ding, Jie Ping, Pei Hao and Yixue Li. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction. BMC Bioinformatics, 10:264, 2009.

Rafael Irizarry, Christine Ladd-Acosta, Benilton Carvalho, Hao Wu, Sheri Brandenburg, Jeffrey Jeddeloh, Bo Wen, and Andrew Feinberg. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Research, 18: 780-790, 2008.

Jonathan M. Eastman, Michael E. Alfaro, Paul Joyce, Andrew L. Hipp, and Luke J. Harmon. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution, 65(12):3578–3589, 2011.

Alexei J. Drummond, Marc A. Suchard, Dong Xie, and Andrew Rambaut. Bayesian phylogenetics with BEAUti and the BEAST 1.7. MBE Advance Access published February 25, 2012.

Wenyu Zhang, Jiajia Chen, Yang Yang, Yifei Tang, Jing Shang, Bairong Shen. A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. PLoS ONE 6(3): e17915, 2011.

Daniel R. Zerbino and Ewan Birney. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18:821-829, 2008.

Ben Langmead and Steven L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9:357–359, 2012.

Manfred G. Grabherr, Brian J. Haas, Moran Yassour, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7), 2011.

Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology, 11:R106, 2010.

Davide Risso, Katja Schwartz, Gavin Sherlock and Sandrine Dudoit. GC-content normalization for RNA-Seq data. BMC Bioinformatics, 12:480, 2011.

Chizu Tanikawa, et al. A genome-wide association study identifies two susceptibility loci for duodenal ulcer in the Japanese population. Nature Genetics, 44(4), April 2012.

Shu-Yi Su, et al. and Lachlan J.M. Coin. Inferring combined CNV/SNP haplotypes from genotype data. Bioinformatics, 26(11), 2010.