MCS314 (Diwali Semester 2009) Special Topics in Data Mining

Advanced Classification Techniques (4-credit course: 3-0-2)

 

This is an advanced course in Data Mining, and the prerequisite is MCS104. The other prerequisites are a course on algorithms and a fair understanding of statistics and linear algebra. Strong programming skills and dexterity with complex data structures will be an advantage.

 

Tentative Weekly schedule

| Week | Topics | Suggested Tasks |
|---|---|---|
| Week 1 | Recapitulate Classification, Taxonomy of Classification Methods, Decision Functions and Notation building | Download C4.5 and get familiarized; revise scripting language |
| Week 2-3 | Evaluation of Classifiers, No Free Lunch theorem | Use gnuplot to draw graphs; check Assignment 1 |
| Week 4-5 | Decision Trees: Splitting criteria, Pruning, Induction Algorithms | Carry on with the assignment. All the best!! |
| Week 6-9 | Support Vector Machines | Download an SVM implementation and get familiarized; check Assignment 2 |
| Week 10-12 | Classification Ensembles | Check Assignment 3 |
| Week 13-14 | Hybrid Classifiers | Good luck with the assignment |

 

Books:

  1. Data Mining with Decision Trees: Theory and Applications, Lior Rokach and Oded Maimon, (2008), World Scientific.
  2. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Nello Cristianini and John Shawe-Taylor, (2000), Cambridge University Press.

Supporting Texts

  1. Neural Networks (Second Edition), Simon Haykin, (1999), PHI
  2. Pattern Classification (Second Edition), Duda, Hart and Stork, (2001), John Wiley
  3. Data Mining: Concepts and Techniques, Han and Kamber, (2006), Morgan Kaufmann
  4. Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, PHI

Research Papers

(To be announced)

Internal Assessment  

There will be three programming assignments (10 marks each) for the course, to be done in groups of two. The deadlines are strict.

Two minors (15 marks each) will be held as per the schedule displayed on the department notice board.

Syllabus for Minor 1: Portion of the syllabus covered up to week 6.

Syllabus for Minor 2: Portion of the syllabus covered in weeks 7 to 12.

 

Assignment 1 (Submission date 1 Sept 2009)

Write a script/program for cross-validation. Use a dataset from the UCI ML repository and the C4.5 classifier with default parameters, and do the following:

i) For each fold, compute precision, recall, specificity and F-measure, and plot a graph (measure vs. fold).

ii) Compute the cross-validation measures for different values of k and plot a graph (measure vs. k).

iii) Draw the ROC curve by varying the pruning confidence level.

iv) Write a report containing the three graphs and your observations.
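As a starting point for the bookkeeping involved, the following is a minimal Python sketch of two pieces of the task: splitting record indices into k folds and computing the four measures from the actual and predicted labels of one fold. The function names `kfold_indices` and `binary_measures` are hypothetical, the sketch assumes a two-class problem with one label designated as positive, and it assumes the records have already been shuffled. Running C4.5 on each training fold is deliberately left out, since its invocation depends on the installation.

```python
from typing import Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    Assumes the records were shuffled beforehand.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)  # the first `extra` folds get one more record
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def binary_measures(actual: Sequence[str], predicted: Sequence[str],
                    positive: str) -> Dict[str, float]:
    """Compute precision, recall, specificity and F-measure for one fold."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num: float, den: float) -> float:
        # Report 0.0 when a measure is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure}


if __name__ == "__main__":
    # Toy labels for a single fold, only to show the call pattern.
    actual = ["yes", "yes", "no", "no", "yes", "no"]
    predicted = ["yes", "no", "no", "yes", "yes", "no"]
    for name, value in binary_measures(actual, predicted, "yes").items():
        print(f"{name}\t{value:.3f}")
```

Writing one line of measures per fold (or per value of k) to a text file in this tab-separated form gives a data file that gnuplot can plot directly. Contiguous folds are used here for simplicity; stratified folds that preserve the class proportions are a common alternative.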

 

Assignment 2: As announced in class. Submission deadline: Sept 28th.

 

Assignment 3: As announced in class. Evaluation to take place in lab on Oct 27th.