

Number of Units: 4
Schedule: Three hours of lecture and one hour of
discussion per week.
Prerequisites: Basic concepts and algorithms
from probability and statistics
Catalog Description:
Knowledge
discovery is the process of discovering useful
regularities in large and complex data sets.
The field encompasses techniques from
artificial intelligence (representation and
search), statistics (inference), and databases
(data storage and access). When integrated
into useful systems, these techniques can help
human analysts make sense of vast stores of
digital information. This course presents the
fundamental principles of the field and
familiarizes students with the technical
details of representative algorithms.
Expanded Description:
➢ Data preprocessing
  • Data cleaning
  • Data transformation
  • Data reduction
  • Discretization
➢ Association rules and sequential patterns
  • Basic concepts
  • Apriori algorithm
  • Mining association rules with multiple minimum supports
  • Mining class association rules
  • Sequential pattern mining
➢ Supervised learning (Classification)
  • Basic concepts
  • Decision trees
  • Classifier evaluation
  • Rule induction
  • Classification based on association rules
  • Naive Bayesian learning
  • Naive Bayesian learning for text classification
  • Support vector machines
  • K-nearest neighbor
➢ Unsupervised learning (Clustering)
  • Basic concepts
  • K-means algorithm
  • Representation of clusters
  • Hierarchical clustering
  • Distance functions
  • Data standardization
  • Handling mixed attributes
  • Which clustering algorithm to use?
  • Cluster evaluation
  • Discovering holes and data regions
➢ Postprocessing
  • Objective interestingness
  • Subjective interestingness
➢ Information retrieval and Web search
  • Basic text processing and representation
  • Cosine similarity
  • Relevance feedback and Rocchio algorithm
➢ Partially supervised learning
  • Semi-supervised learning
  • Learning from labeled and unlabeled examples using EM
  • Learning from labeled and unlabeled examples using co-training
  • Learning from positive and unlabeled examples
➢ Link analysis
  • Social network analysis
  • Citation analysis: co-citation and bibliographic coupling
  • The PageRank algorithm (of Google)
  • The HITS algorithm: authorities and hubs
  • Mining communities on the Web
➢ Data extraction and information integration
Course Objectives & Role in the Program:
This course
has three objectives. First, to provide
students with a sound basis in data mining
tasks and techniques. Second, to ensure that
students are able to read and critically
evaluate data mining research papers. Third,
to ensure that students are able to implement
and use some of the important data mining
and text mining algorithms.
Method of Evaluation:
• Midterm: 25%
• Final Exam: 40%
• Projects:
  ➢ Project 1: Algorithm implementation (15%)
  ➢ Project 2: Research project (including implementation) (20%)
Required Books:
Textbooks:
➢ Building an Intelligent Web: Theory & Practice, by R. Akerkar and P. Lingras, Jones & Bartlett, 2007.
➢ Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, ISBN 1558604898.
Reference books:
➢ Principles of Data Mining, by David Hand, Heikki Mannila, and Padhraic Smyth, The MIT Press, ISBN 026208290X.
➢ Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison-Wesley, ISBN 0321321367.
➢ Data mining resource site: KDnuggets Directory
Useful Links:


