Fall 2011   CSE4334 / 5334   Data Mining


Course Information:

Instructor: Chengkai Li

  • Office hours: Tue/Thu 11am - 1pm
  • Office: ERB 652
  • Phone: (817) 272-0162
  • E-mail: cli [AT] uta [DOT] edu
  • Homepage: http://ranger.uta.edu/~cli

TA: Saravanan Thirumuruganathan

  • Office Hours: Friday 10am-12pm
  • Office: ERB 504
  • Phone: 817-201-5046
  • E-mail: saravanan.thirumuruganathan@gmail.com

Course Description: This is an introductory course on data mining. Data mining refers to the automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cubes, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining for Web and text data.

Prerequisites: CSE 3330/5330 Database Systems I, or CSE 4331/5331 Database Systems II, or similar courses, or consent of instructor

Textbook

Grades


Announcements: Stay tuned and make sure to check Blackboard frequently. Important announcements will be posted there.

Assignments and Deadlines

Regrading: Regrading requests must be made within 7 days after we post scores on Blackboard. The TA will handle regrade requests. If you are not satisfied with the regrading result, you have 7 days to request another regrade. The instructor will then regrade, and that decision is final.


Ethics Policies and Academic Integrity: The College cannot and will not tolerate any form of academic dishonesty by its students. This includes, but is not limited to, cheating on examinations, plagiarism, or collusion (explained in the document below). Students are required to read the following document carefully, sign it, return the signed copy to the instructor, and keep a copy for their own records. Hard copies of this document will be provided to students in the first class and can also be picked up at the instructor's office. If you print it yourself, please print it double-sided.

Statement on Ethics, Professionalism, and Conduct for Engineering Students

Miscellaneous: If you require an accommodation based on disability, I would like to meet with you in the privacy of my office during the first week of the semester to ensure that you are appropriately accommodated. Please also read the web page of the Office for Students with Disabilities.


Schedule:

Date   #   Lecture   Assignment (Out)   Assignment (Due)   Lecture Notes   Extra Reading
08/25 1 Course Overview     [PPT]  
08/30 2 Introduction (Chapter 1)     [PPT]  
09/01 3 Prominent Streak Discovery HW1   [PPT] Prominent Streak paper
Data Warehousing, OLAP, Data Cube (Chapter 3, 4)
09/06 4 Prominent Streak Discovery        
09/08 5 Course Project        
09/13 6 Data Warehousing, OLAP, Data Cube     [PPT]  
09/15 7 Data Warehousing, OLAP, Data Cube   HW1    
09/20 8 Data Warehousing, OLAP, Data Cube        
Classification and Prediction (Chapter 6)
09/22 9 Decision Tree     [PPT]  
09/27 10 Decision Tree (cont'd)        
09/29 11 Bayesian Classifiers HW2   [PPT]  
10/04 12 Bayesian Classifiers (cont'd)        
10/06 13 Rule-Based Classifiers     [PPT]  
10/11 14 Nearest Neighbor Classifiers   HW2, Project Proposal [PPT]
10/13   Midterm Exam (Thursday, Oct. 13th, 9:30am-10:50am, ERB 130)
10/18 15 Evaluating Classification Models     [PPT]  
10/20 16 Evaluating Classification Models (cont'd)   HW3      
10/25 17 Support Vector Machine     [PPT]  
Clustering (Chapter 7)
10/27 18 Overview of Clustering, Similarity/Dissimilarity Measure [PPT]  
Text and Web Mining (1)
11/01 19 Vector Space Model     [PDF] textbook excerpt
11/03 20 Document Classification/Document Clustering     document camera  
Clustering (Chapter 7)
11/08 21 K-means   Project Progress [PPT], document camera
11/10 22 K-means (cont'd)   Project Progress    
11/15 23 Hierarchical clustering     [PPT]  
11/17 24 Hierarchical clustering (cont'd)       textbook excerpt (in Blackboard)
Frequent Pattern and Association Rule Mining (Chapter 5)
11/22 25 Association Rule Mining   HW3 [PPT]  
11/24   Thanksgiving Holidays        
11/29 26 Correlation Analysis     [PPT]  
Text and Web Mining (2)
12/01 27 Link Analysis: PageRank     [PDF] textbook excerpt
12/06 28 MapReduce     [PPT]  
12/08 29 Final Review   Project Report [PPT]  
12/09   Project Demo/Presentation (Friday, Dec. 9th, ERB 501)
12/15   Final Exam (Thursday, Dec. 15th, 8am-10:30am, ERB 130)