Statistics 5525: Data Analytics I

Basic Structure

Aug. 28, 2017

The focus of this course is on contemporary statistical methodologies that are both algorithmically and computationally oriented and especially useful for the analysis of high-dimensional data (data with both a large number of observations and a large number of variables).

We are drowning in information and starving for knowledge. -R. D. Rogers

The basic topics to be discussed in this course are as follows:

  • Machine Learning vs. Data Mining vs. Statistics,
  • Linear models/classifiers, Nearest Neighbors, and Bayes Classifiers,
  • Exploratory Data Analysis,
  • Generalized Linear Models (Logit and Probit Regression),
  • High Dimensional Analysis (Regularization and the LASSO),
  • Principal Components (Projections and Projectors, PCA, Clustering, Regression, Probabilistic PCA),
  • Tree Methods (CART, Random Forests, Neighbor Joining Trees),
  • Heuristic Clustering (K-means, Hierarchical Clustering, Biclustering),
  • Probabilistic Clustering (Model-Based Clustering and Dirichlet Processes).


We will continue with:

  • Distance Methods (MDS and the relationship to PCA, Self-Organizing Maps, Generative Topographic Mapping, Graphical Models, and Isomap),
  • Supervised Learning (Discriminant analysis, Naive Bayes, Supervised PCA, Support Vector Machines, and Kernel methods).

How are 5525 and 5526 different?

STAT 5526 will be a comprehensive course focusing on the more theoretical concepts that come up in this class. The purpose of STAT 5525 is to present many of the algorithms and techniques used in data analytics (DA) and to apply them to data. The exercises in 5525 will focus on both theoretical and practical issues. Discussions of Reproducing Kernel Hilbert Spaces (RKHS) and the mathematical details of kernel methods will be reserved for 5526.
