Syllabus
Statistics 5525 will be a comprehensive course in data mining, machine learning, and probabilistic modeling. The course covers techniques for supervised learning, unsupervised learning, and visualization in high-dimensional spaces, and addresses theoretical, probabilistic, and applied aspects of data analytics. Methods include generalized linear models in high-dimensional spaces, regularization, the lasso and related methods, principal component regression, tree methods, and random forests. Clustering methods, including K-means, hierarchical clustering, biclustering, and model-based clustering, will be thoroughly examined. Distance-based learning methods include multidimensional scaling, the self-organizing map, graphical/network models, and Isomap. Supervised learning will consist of discriminant analyses, supervised principal component analysis (PCA), support vector machines, and kernel methods.
How are 5525 and 5526 different?
STAT 5526 will be a comprehensive course focusing on the more theoretical concepts that arise in this class. The purpose of STAT 5525 is to present many of the algorithms and techniques used in data analytics and to apply them to data. The exercises in 5525 will address both theoretical and practical issues. Discussions of Reproducing Kernel Hilbert Spaces (RKHS) and the mathematical details of kernel methods will be reserved for 5526. Familiarity with a variety of techniques will be developed in 5525 and expanded upon in 5526.
We are drowning in information and starving for knowledge. -Rutherford D. Rogers
Grading policies, office hours, and general information
Course Objectives
- To develop an understanding of techniques in machine learning, data mining, and probabilistic modeling.
- To compare and contrast algorithmic and model-based learning techniques.
- To understand the theory behind these techniques and implement them.
Logistics
- Lecture Times and Location: M/W/F, 8:00 - 8:50 PM, in Hutcheson 204.
- Instructor: Professor Scotland Leman, 401A Hutcheson Building, leman(AT)vt(DOT)edu
- Instructor's Office Hours: After each class and group meetings
- Teaching Assistant: Sumin Shen
- TA's Office Hours: (TBA)
Prerequisites
Readings
The primary text is:
Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical Learning, Second Edition. Springer. ISBN: 978-0-387-84857-0.
This is a very comprehensive book on statistical learning models and algorithms; however, it should not limit your reading of other relevant texts.
A good supplementary text is:
Christopher Bishop (2006). Pattern Recognition and Machine Learning. Springer. ISBN: 978-0-387-31073-2.
This book takes Bayesian inference as its starting point and extends the theory to machine learning.
Computing
For computing, you may use any upper-level language of your choosing; for instance, C/C++, Java, Matlab, and R all make reasonable choices.
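As a taste of the kind of implementation work the course involves, here is a minimal sketch of K-means clustering (one of the course topics) in Python; the data and parameters are purely illustrative, and the naive initialization is chosen for simplicity rather than robustness.

```python
def kmeans(points, k, iters=20):
    """Minimal Lloyd's-algorithm K-means on 2-D points (illustrative sketch)."""
    centers = list(points[:k])  # naive init: use the first k points as centers
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: (p[0] - centers[j][0]) ** 2
                                + (p[1] - centers[j][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to its cluster's mean.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

# Two well-separated blobs; K-means should place one center near each blob mean.
data = [(0.0, 0.0), (5.0, 5.0), (0.1, -0.1), (-0.1, 0.1), (5.1, 4.9), (4.9, 5.1)]
centers = sorted(kmeans(data, k=2))
print([(round(x, 2), round(y, 2)) for x, y in centers])  # → [(0.0, 0.0), (5.0, 5.0)]
```

In practice you would use a library implementation (e.g., `kmeans` in R or scikit-learn in Python), but writing the two-step loop yourself makes the algorithm's alternating structure clear.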
Graded work
Graded work for the course will consist of problem sets, computational problems, and short quizzes. You may work in teams of 2-3 people on the bi-weekly homework (homework teams may change throughout the semester). For the final project, you will work in permanent teams of 4-5 people; changes will not be allowed. Your final grade will be determined as follows:
Quizzes   | 10%
Homeworks | 40%
Project   | 50%
There are no make-ups for exams, in-class problems, or homework except in the case of a medical or familial emergency, or with the prior approval of the instructor. See the instructor in advance of the relevant due dates to discuss possible alternatives.
Cumulative numerical averages of 90-100 are guaranteed at least an A-, averages of 80-89 at least a B-, averages of 70-79 at least a C-, and averages of 60-69 at least a D-. These ranges may be lowered, but they will not be raised (e.g., if everyone has averages in the 90s, everyone gets at least an A-).
Academic honesty
You are expected to abide by Virginia Tech's Community Standard for all work for this course. Violations of the Standard will result in a failing final grade for this course and will be reported to the Dean of Students for adjudication. Ignorance of what constitutes academic dishonesty is not a justifiable excuse for violations.
For the homework problems, you may work in a study group with others, but you must submit your own answers unless otherwise indicated. For exams, you are required to work alone and within the specified time period.
Procedures if you suspect your work has been graded incorrectly
Every effort will be made to mark your work accurately. You should be credited with all the points you've worked hard to earn! However, sometimes grading mistakes happen. If you believe that an error has been made on an in-class problem or exam, return the paper to the instructor immediately, stating your claim in writing.
The following claims will be considered for re-grading:
(i) points are not totaled correctly;
(ii) the grader did not see a correct answer that is on your paper;
(iii) your answer is the same as the correct answer, but in a different form (e.g., you wrote a correct answer as 1/3 and the grader was looking for .333);
(iv) your answer to a free-response question is essentially correct but stated slightly differently than the grader's interpretation.
The following claims will not be considered for re-grading:
(v) arguments about the number of points lost;
(vi) arguments about question wording.
Considering re-grades takes up valuable time and resources that the TAs and the instructor would rather spend helping you understand the material. Please be considerate and only bring claims of type (i), (ii), (iii), or (iv) to our attention.