Fall 2010   CSE4392 / 5334   Data Mining


Resources: Google       Google Scholar      CiteSeer       DBLP Bibliography    ACM Digital Library       IEEE Xplore       Other Computer Science articles


Course Information:

Instructor: Chengkai Li

TA: Ning Yan

  • Office hours: Wednesdays 3-5pm
  • Office: GeoScience 237
  • Phone: 817-272-0896
  • E-mail: ning.yan@mavs.uta.edu
  • Homepage: http://idir.uta.edu/~nyan/

Course Description: This is an introductory course on data mining. Data mining refers to the automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cubes, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining for Web and text data.

Prerequisites: CSE 3330/5330 Database Systems I, or CSE 4331/5331 Database Systems II, or similar courses, or consent of instructor

Textbook

Grades


Announcements: Stay tuned and make sure to check WebCT frequently. Important announcements will be posted there.

Assignments and Deadlines

Regrading: Regrade requests must be made within 7 days after scores are posted on WebCT. The TA will handle regrade requests. If you are not satisfied with the regrading result, you have 7 days to submit a second request. The instructor will then regrade, and that decision is final.

WebCT: Log in to the WebCT page http://www.uta.edu/webct with your NetID and password. We use WebCT for: (1) Announcements; (2) Assignment Submission; (3) Discussion Group;  (4) Releasing materials, assignments, scores and grades. Follow these steps exactly during electronic assignment submission.


Ethics Policies and Academic Integrity: The College cannot and will not tolerate any form of academic dishonesty by its students. This includes, but is not limited to, cheating on examinations, plagiarism, or collusion (explained in the document below). Students are required to read the following document carefully, sign it, return the signed copy to the instructor, and keep a copy for their own records. Hard copies of this document will be provided to students in the first class and can also be picked up at the instructor's office. If you print it yourself, please print it double-sided.

Statement on Ethics, Professionalism, and Conduct for Engineering Students

Miscellaneous: If you require accommodation based on disability, I would like to meet with you in the privacy of my office during the first week of the semester to ensure that you are appropriately accommodated. Please read the web page of the Office for Students with Disabilities.


Schedule:

Date # Lecture Assignment Out Assignment Due Lecture Notes
08/26 1 Course Overview     [PDF]
08/31 2 Introduction (Chapter 1)     [PDF]
Data Warehousing, OLAP, Data Cube (Chapter 3, 4)
09/02 3 Data Warehousing and OLAP HW1   [PDF]
09/07 4 Data Cube      
Classification and Prediction (Chapter 6)
09/09 5 Decision Tree     [PDF]
09/14 6 Decision Tree (cont'd)      
09/16 7 Evaluating Classification Models P1 HW1 [PDF]
09/21 8 Evaluating Classification Models (cont'd)      
09/23 9 Bayesian Classifiers     [PDF]
09/28 10 Nearest Neighbor Classifiers     [PDF]
09/30 11 Support Vector Machine HW2   [PDF]
Frequent Pattern and Association Rule Mining (Chapter 5)
10/05 12 Association Rule Mining     [PDF] [PPT]
10/07 13 Correlation Analysis     [PDF] [PPT]
Data Preprocessing (Chapter 2)
10/12 14 Data, Data Quality, Data Preprocessing   HW2  
10/14   Midterm Exam (Thursday, Oct. 14th, 2:00pm-3:20pm, WH210)
Clustering (Chapter 7)
10/19 15 Overview of Clustering, Similarity/Dissimilarity Measure     [PDF] [PPT]
10/21 16     P1 (due Oct. 24)  
10/26 17 K-means     [PDF] [PPT]
10/28 18 K-means (cont'd)      
11/02 19 Hierarchical     [PDF] [PPT]
11/04 20 Hierarchical (cont'd) P2, HW3    
Text and Web Mining
11/09 21 Vector Space Model     [PDF]
11/11 22 Document Classification     [PDF]
11/16 23 Document Clustering     [PDF]
11/18 24 MapReduce P3 HW3 (due Nov. 19) [PDF] [PPT]
11/23 25 MapReduce   P2  
11/25   Thanksgiving Holidays      
11/30 26 MapReduce      
12/02 27 Link Analysis: PageRank     [PDF]
12/07 28 Link Analysis (cont'd)      
12/09 29 Final Review   P3 (due Dec. 11) [PDF]
12/14   Final Exam (Tuesday, Dec. 14th, 2:00pm-4:30pm, WH210)

University calendar: Fall 2010