MCS314 (Diwali Semester 2010) Special Topics in Data Mining

Advanced Classification: Ensemble Techniques (4 credit course : 3-0-2)

This is an advanced course in Data Mining, and its prerequisite is MCS104. Further prerequisites are a course on algorithms and a fair understanding of statistics and linear algebra. Strong programming skills and dexterity with complex data structures will be an advantage.

Tentative Weekly schedule

Week | Topics | Suggested Lab Tasks
Week 1 - 3 | Recapitulate Classification, Bayesian Decision Theory, Taxonomy of Classification Methods, Decision Functions and Notation building | Learn LaTeX and gnuplot; revise a scripting language; check Assignment 1
Week 4 - 5 | Evaluation of Classifiers, Occam's Razor, No Free Lunch theorem |
Week 6 - 16 | Ensemble Techniques | Check Assignment 2

Text Books:

1.      Data Mining with Decision Trees: Theory and Applications, Lior Rokach and Oded Maimon, (2008), World Scientific

2.      Pattern Classification (Second Edition), (2001) Duda, Hart and Stork, John Wiley

Supporting Texts

3.      Neural Networks (Second Edition), (1999) Simon Haykin, PHI

4.      Data Mining: Concepts and Techniques, Han and Kamber (Morgan Kaufmann, 2006)

5.      Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth (PHI)

Research Papers

1.      Introduction to ROC Analysis (A detailed version available here)


Internal Assessment  

Programming assignments (20 marks)

Minors (30 marks)

Syllabus for Minor 1: Portion of the syllabus covered up to week 6.

Syllabus for Minor 2: Portion of the syllabus covered between weeks 7 and 11.

Assignment 1 (Submission date: 5 Sept 2009)

Implement a Naive Bayes (NB) classifier. For five data sets in Weka, run two classification algorithms and note their performance. Observe the performance of your NB classifier on the same data sets. Plot the observations and write a report.
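
As a starting point, the core of a Naive Bayes classifier can be sketched as below. This is only a minimal illustration, not the expected submission: it assumes purely categorical attributes and add-one (Laplace) smoothing, neither of which the assignment specifies. The class name `NaiveBayes`, the helper `accuracy`, and the toy data are all hypothetical. Numeric attributes in the Weka data sets would need discretization or a density estimate instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing (an assumption)."""

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # value_counts[(class, feature index)][value] -> count
        self.value_counts = defaultdict(Counter)
        # distinct values seen per feature, used by the smoothing term
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(label, i)][v] += 1
                self.values[i].add(v)
        return self

    def log_posterior(self, row, label):
        # log P(c) + sum_i log P(x_i | c); the evidence term is the same
        # for every class, so it is omitted.
        score = math.log(self.class_counts[label] / self.n)
        for i, v in enumerate(row):
            num = self.value_counts[(label, i)][v] + 1
            den = self.class_counts[label] + len(self.values[i])
            score += math.log(num / den)
        return score

    def predict(self, row):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the given label."""
    hits = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Toy data, made up for illustration: (outlook, windy) -> play
train_rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"),
              ("rain", "yes"), ("overcast", "no"), ("overcast", "yes")]
train_labels = ["yes", "no", "yes", "no", "yes", "yes"]

model = NaiveBayes().fit(train_rows, train_labels)
prediction = model.predict(("sunny", "no"))
train_accuracy = accuracy(model, train_rows, train_labels)
```

Working in log space avoids numerical underflow when there are many attributes. For the report, accuracy should be measured on held-out data or by cross-validation rather than on the training set as in this toy run.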

 Assignment 2: As announced in class. Submit by Oct 10. Evaluation schedule to be announced.