MCS 203 (Jan - May 2019) Data Mining
Data has been accumulating throughout the computer age in many forms, including database systems, spreadsheets, text files, and, more recently, web pages. Data mining aims to search through data for hidden relationships and patterns. It has applications in both commercial and scientific domains. Data mining has emerged as an interdisciplinary field of study within Computer Science.
This is an introductory course on Data Mining. It is a 4-credit course (3-0-2) that requires intensive programming. Prerequisites for this course are a good understanding of statistics, C++ programming on Linux, and good practice in using and designing complex data structures.
Text Book:
1. Data Mining and Analysis: Fundamental Concepts and Algorithms, Mohammed J. Zaki and Wagner Meira Jr. (Cambridge University Press, 2014) [Click here to download draft copy of the book]
Ref. Books:
2. Introduction to Data Mining, Tan, Steinbach and Vipin Kumar (Pearson Education, 2010)
3. Data Mining: Concepts and Techniques, Han and Kamber (Morgan Kaufmann, 2010)
4. Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth (PHI)
Course Plan
Week | Topics from Text Book |
1 | Chapter 1 |
2-7 | Clustering Algorithms: Hierarchical clustering (Chapter 14), Partition-based clustering: K-means and Expectation-Maximization (Chapter 13), Kernel trick (Chapter 5), Density-based clustering: DBSCAN (Chapter 15), Cluster validation |
8-12 | Supervised Learning: Bayes classification, KNN classifier (Chapter 18), Decision tree (Chapter 19), Evaluation of classifiers (Chapter 22) |
13-15 | Frequent itemset mining (Chapter 8) |
Lab Work
Week | Tasks | Remarks |
1-3 | Download and install Weka. Familiarize yourself with Weka. Explore data using Weka. Learn to use attribute filters for discretization, etc. Practice use of filters. | Click here for Weka |
4 | Introduction to R, Visualization in R | Click here to download R |
4-6 | Start working on the project | Exploratory data analysis |
7-14 | Project work | Submit report by 15 Apr 2019 |
30 marks are for internal assessment. The breakup is:
10 Minor 1, as per the schedule displayed on the notice board
15 Project
05 Assignments
Project
You have to work in groups of four. Diversity within the group is an advantage.
Visit the open data portal of the Government of India, data.gov.in. Choose a sector of your interest and study the data sets available for that sector.
i) Write a one-page project proposal mentioning: the sector, the data sets intended to be used, the business questions that can be answered, and the broad data mining techniques that may be used to answer the questions. Submit the hard copy (single page) by 31st January 2019.
ii) Explore the data sets, write preliminary observations about the data quality, re-examine the business questions and revise them if required. Use visualization techniques liberally to support your points. Ensure that the visualizations are meaningful and have appropriate legends, captions, etc. Submit the soft copy of the report by 15th February 2019. No page limit.
iii) Submit a detailed report of the patterns observed in the data. Clearly state the data mining technique used, the algorithm used, the reason for selecting the algorithm, and the pre-processing done. How do the observed patterns answer your business questions? Submit the soft copy of the report by 15th April 2019. No page limit.
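As an illustration of the data-quality observations asked for in step ii), the following is a minimal sketch of one possible check: the fraction of missing values in each column of a CSV file. It is written in Python with only the standard library; the course itself does not prescribe this language or approach. The function name `missing_value_report`, the set `MISSING_MARKERS`, and the sample rows are all hypothetical and are not taken from any data.gov.in data set.

```python
import csv
import io

# Hypothetical strings treated as "missing"; adjust for the chosen data set.
MISSING_MARKERS = {"", "NA", "N/A", "-"}


def missing_value_report(csv_text):
    """Return {column_name: fraction of rows whose value is missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None for absent trailing fields.
            if value is None or value.strip() in MISSING_MARKERS:
                missing[name] += 1
    if total_rows == 0:
        return {name: 0.0 for name in columns}
    return {name: count / total_rows for name, count in missing.items()}


# Made-up sample rows, only to show the call; not real data.
SAMPLE = """region,year,value
A,2017,790
B,2017,NA
C,,3050
D,2017,
"""

report = missing_value_report(SAMPLE)
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%} missing")
```

A column with a high missing fraction may call for imputation, for dropping the column, or for revising the business question, and any such decision belongs in the pre-processing section of the report.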
Syllabus for Minor 1: TBA