STAT 519 Homepage

STAT 519 Multivariate Analysis



Welcome to Statistics 519, Multivariate Analysis, Spring 2009.

 

    Syllabus

    Textbooks: 

1.       Analyzing Multivariate Data, by J. Lattin, J.D. Carroll, and P.E. Green, Duxbury, 2003.

2.      An R and S-PLUS Companion to Multivariate Analysis, by Brian Everitt, Springer, 2007.  (Datasets and R code used in the book are available at http://biostatistics.iop.kcl.ac.uk/publications/everitt/)

3.      Statistical Data Mining by Wiesner Vos & Ludger Evers.

 

    Grades:     

·         Homework            50%

·         Mid-term              20%    

Some areas of focus for your review:

 

Proficiency in R graphics and commands.

 

EDA graphics and summary statistics for multivariate data. Mean, variance, covariance, and correlation of standardized and original data, and the theoretical relationship between them. Rotations and orthogonal transformations. Be able to produce various plots.
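
One standard way to write the relationship between the covariance and correlation of original and standardized data is sketched below. The symbols (X for the n x p data matrix, S for its sample covariance matrix, D for the diagonal matrix of variances, Z for the standardized data, R for the correlation matrix) are notation assumed here for illustration, not taken from the course notes.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}, \qquad
D = \operatorname{diag}(s_{11}, \dots, s_{pp}), \qquad
\operatorname{Cov}(Z) = D^{-1/2} \, S \, D^{-1/2} = R = \operatorname{Cor}(X) = \operatorname{Cor}(Z).
```

In words: the covariance matrix of the standardized data is the correlation matrix of the original data, and standardizing leaves the correlations unchanged.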

 

Vectors and the geometry of matrices, scalar product, linear combinations of vectors, mean and variance of linear combinations of vectors, matrix representation, singular value decomposition.

 

PCA: methodology, matrix representation, eigenvalues, eigenvectors, and related properties. Ways to choose the number of PCs. Be able to conduct all numerical work in R, and to carry out the theoretical proofs relating PC scores, PC loadings, eigenvectors, eigenvalues, variance of PC projections, linear combinations of the original variables, orthogonal matrices, diagonal matrices, stretching, shrinkage, new space, and old space. Solving for the eigenvalues and eigenvectors algebraically in the quadratic case. Validation and bootstrapping concepts and R implementation.
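
Several of the relationships listed above can be checked numerically in R. The following is a minimal sketch, assuming the built-in USArrests data set and an arbitrary weight vector a; both are chosen here for illustration and are not taken from the course materials or homework.

```r
# Illustrative data: any numeric matrix would do (assumption: USArrests).
X <- as.matrix(USArrests)
n <- nrow(X)

# Covariance of the standardized data equals correlation of the original data.
Z <- scale(X)                       # center and scale each column
S <- cov(X)
R <- cor(X)
same_cov_cor <- isTRUE(all.equal(cov(Z), R, check.attributes = FALSE))

# Variance of a linear combination a'x equals a' S a.
a <- c(1, -1, 0.5, 2)               # arbitrary illustrative weights
v_direct  <- var(as.vector(X %*% a))
v_formula <- as.numeric(t(a) %*% S %*% a)

# PCA by eigendecomposition of the correlation matrix.
e <- eigen(R)
scores <- Z %*% e$vectors           # PC scores in the new space
score_var <- apply(scores, 2, var)  # variance of each PC equals its eigenvalue

# The eigenvector matrix is orthogonal: V'V = I.
VtV <- crossprod(e$vectors)

# Link to the singular value decomposition of the standardized data.
sv <- svd(Z)
eig_from_svd <- sv$d^2 / (n - 1)

# Comparison with the built-in routine (loadings agree up to sign).
p <- prcomp(X, scale. = TRUE)
prop_var <- e$values / sum(e$values) # proportion of variance per PC
```

The proportions in prop_var (or a scree plot of e$values) are one common basis for choosing the number of PCs.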

 

·         Final                      30%    

 

    Homework Sets:   

·         Homework 1

·         Homework 2

·         Homework 3

·         Homework 4

·         Homework 5

·         Homework 6

·         Homework 7

 

   Intended Course Coverage:  Not necessarily in the following order.

·         Introduction to R

·         R Graphics

·         Vector and Matrix Geometry

·         Principal Component Analysis

·         Factor Analysis

·         Multidimensional Scaling

·         Hierarchical Clustering

·         Distance Measures

·         K-Means Clustering

·         Canonical Correlation

·         Multivariate Normal Distribution

·         MANOVA

·         Discriminant Analysis

·         Classification Trees (time permitting)

·         Random Forests (time permitting)

 

   R example codes:  (Credits to source code authors: Brian Everitt [Bold font]; Michael Minnotte [Roman font]; Marloes Maathuis [Italic font] )

·         Introduction to R (Same as Homework 1)

·         Multivariate Data and Multivariate Analysis

·         R Graphics I

·         R Graphics II

·         Looking at Multivariate Data

·         Vector and Matrix Geometry

 

·         Principal Component Analysis

·         PCA

·         Exploratory Factor Analysis

·         EFA

·         CFA

 

·         Hierarchical Clustering

o    Hierarchical Clustering applet

·         K-Means Clustering

o    K-Means applet

·         Model-based Clustering, or Mixture models Clustering. See section 3.3.2 Gaussian mixture models of Statistical Data Mining by Wiesner Vos & Ludger Evers

o    EM algorithm for mixture model applet

o    Another mixture applet

o    MCLUST in R: Multivariate Normal Mixture Modeling and Model-Based Clustering

·         Cluster Analysis

 

·         Discriminant Analysis See chapter 4 Classification of Statistical Data Mining by Wiesner Vos & Ludger Evers

·         MANOVA and Discriminant Function Analysis

 

·         Multidimensional Scaling and Correspondence Analysis

·         Multidimensional Scaling

 

·         Canonical Correlation

 

·         Classification Trees

·         Random Forests

 

  

   R help from CRAN (The Comprehensive R Archive Network):

·         R reference cards  (4 pages)

·         "R for Beginners" by Emmanuel Paradis  (72 pages)

·         "An Introduction to R" by Venables and Smith   (93 pages)

·         Other  R 'official' Manuals

·         Other  Contributed Documentations  from CRAN

 

    R GUI:

·         R Editor:  A good text editor for writing and editing your R code is Tinn-R, from http://www.sciviews.org/Tinn-R/

·         R Commander: A nice R interface; a standard package that can be easily installed. Just type   install.packages("Rcmdr", dependencies=TRUE)  at the R prompt.

   

    Datasets:

·         http://www.webpages.uidaho.edu/~stevel/519/Data/

·         http://www.webpages.uidaho.edu/~stevel/519/text ASCII data sets/  (for the text Analyzing Multivariate Data)

·         http://www.webpages.uidaho.edu/~stevel/519/R.CMA  (for the text An R and S-PLUS Companion, plus R code and the author’s functions)

·         http://archive.ics.uci.edu/ml/  (UCI Machine Learning Repository)

 

   

     LaTeX:

·         Free implementation of LaTeX, e.g., MiKTeX is best

·         Plain text editor, e.g., Notepad. Much better: WinEdt

·         The Not So Short Introduction to LaTeX2e

   

    Literature:

·         A survey of clustering algorithms

·         Multivariate Normal Distribution

·         Multivariate Normal Distribution Exercise

·         Applied Multivariate Analysis Notes from U of Cambridge

·         Miscellaneous R worksheets from U of Cambridge

 

Go back to:
Stephen Lee Home Page
Department of Statistics Home Page
UIdaho Home Page