MCS 203 (Jan - May 2019)  Data Mining

Data has been accumulating throughout the computer age in many forms, including database systems, spreadsheets, text files, and, more recently, web pages. Data mining aims to search through data for hidden relationships and patterns. It has application potential in both commercial and scientific domains. Data mining has emerged as a field of interdisciplinary study in Computer Science.

This is an introductory course on Data Mining. It is a 4-credit course (3-0-2) that requires intensive programming. Prerequisites for this course are a good understanding of statistics, C++ programming on Linux, and good practice in using and designing complex data structures.

Text Book:

1. Data Mining and Analysis: Fundamental Concepts and Algorithms, Mohammed J. Zaki and Wagner Meira Jr. (Cambridge University Press, 2014) [Click here to download a draft copy of the book]

Ref. Books:

2. Introduction to Data Mining, Tan, Steinbach and Vipin Kumar (Pearson Education, 2010)

3. Data Mining: Concepts and Techniques, Han and Kamber (Morgan Kaufmann, 2010)

4. Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth (PHI)

Course Plan 

Week | Topics from Text Book
1 | Chapter 1
2-7 | Clustering Algorithms: Hierarchical clustering (Chapter 14), Partition-based clustering - K-means and Expectation-Maximization (Chapter 13), Kernel trick (Chapter 5), Density-based clustering - DBSCAN (Chapter 15), Cluster validation
8-12 | Supervised Learning: Bayes Classification, KNN classifier (Chapter 18), Decision Tree (Chapter 19), Evaluation of classifiers (Chapter 22)
13-15 | Frequent Itemset Mining (Chapter 8)

Lab Work

Week | Tasks | Remarks
1-3 | Download and install Weka. Familiarize yourself with Weka. Explore data using Weka. Learn to use attribute filters for discretization, etc. | Practice use of filters. Click here for Weka
4 | Introduction to R, Visualization in R | Click here to download R
4-6 | Start working on the project | Exploratory data analysis
7-14 | Project work | Submit report by 15 Apr 2019

30 marks are for internal assessment. The breakup is:

10 Minor 1, as per schedule displayed on the notice board.

15 Project

05 Assignments

Project

You have to work in groups of four. Diversity within the group will be an advantage.

Visit the open data portal of the Government of India, data.gov.in. Choose a sector of your interest. Study the available data sets for the sector.

i) Write a one-page project proposal mentioning: the sector, the data sets intended to be used, the business questions that can be answered, and the broad data mining techniques that may be used to answer the questions. Submit the hard copy (single page) by 31st January 2019.

ii) Explore the data sets, write preliminary observations about the data quality, re-examine the business questions, and revise them if required. Use visualization techniques liberally to support your observations. Ensure that the visualizations are meaningful and have appropriate legends, captions, etc. Submit the soft copy of the report by 15th February 2019. No page limit.

iii) Submit a detailed report of the patterns observed in the data. Clearly state the data mining technique used, the algorithm used, the reason for selecting the algorithm, and the pre-processing done. How do the observed patterns answer your business questions? Submit the soft copy of the report by 15th April 2019. No page limit.
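
For the preliminary data-quality observations in step ii), a first pass might simply count missing values per column. The sketch below is illustrative only and is not a course requirement: it is written in Python, uses a small made-up sample in place of a real data.gov.in download, and the helper name `missing_value_report` and the set of missing-value markers are assumptions that you would adapt to your own data set. The same check can equally be done in R or Weka.

```python
import csv
import io

# Assumed markers for "missing"; real data sets may use different ones.
MISSING_MARKERS = {"", "NA", "N/A", "-", "null"}


def missing_value_report(csv_file):
    """Return (row_count, {column: number_of_missing_cells}) for a CSV stream."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in reader.fieldnames}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in reader.fieldnames:
            value = row[name]
            # DictReader fills short rows with None, which also counts as missing.
            if value is None or value.strip() in MISSING_MARKERS:
                missing[name] += 1
    return row_count, missing


# Made-up sample standing in for a downloaded data set.
SAMPLE = """region,year,value
A,2018,12.5
B,2018,NA
C,,7.0
"""

rows, report = missing_value_report(io.StringIO(SAMPLE))
for column, count in report.items():
    print(f"{column}: {count} of {rows} values missing")
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` sample. Columns with many missing values are worth mentioning in the preliminary report, since they may force a revision of the business questions.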

Syllabus for Minor 1: TBA