Course Information:
Instructor: Chengkai Li
TA: Nandish Jayaram
Course Description: This is an introductory course on data mining. Data mining refers to the process of automatically discovering patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in the Web, text, big data, social networks, and computational journalism.
Student Learning Outcomes: A solid understanding of the basic concepts, principles, and techniques in data mining; an ability to analyze real-world applications, to model data mining problems, and to assess different solutions; an ability to design, implement, and evaluate data mining software.
Prerequisites:
Course Project 30% (Independently or in a group of 2)
The final letter grades will be based on the curve of students' performance. Undergraduate and graduate students will be compared in two separate groups.
Attendance: Students are required to attend lectures.
Announcements: Stay tuned and make sure to check Blackboard frequently. Important announcements will be posted there.
Regrading: Regrade requests must be made within 7 days after scores are posted on Blackboard. The TA will handle regrade requests. If you are not satisfied with the regrading result, you have 7 days to request again. The instructor will then regrade, and that decision is final.
Drop Policy: Students may drop or swap (adding and dropping a class concurrently) classes through self-service in MyMav from the beginning of the registration period through the late registration period. After the late registration period, students must see their academic advisor to drop a class or withdraw. Undeclared students must see an advisor in the University Advising Center. Drops can continue through a point two-thirds of the way through the term or session. It is the student's responsibility to officially withdraw if they do not plan to attend after registering. Students will not be automatically dropped for non-attendance. Repayment of certain types of financial aid administered through the University may be required as the result of dropping classes or withdrawing. For more information, contact the Office of Financial Aid and Scholarships (http://wweb.uta.edu/ses/fao).
Americans with Disabilities Act: The University of Texas at Arlington is on record as being committed to both the spirit and letter of all federal equal opportunity legislation, including the Americans with Disabilities Act (ADA). All instructors at UT Arlington are required by law to provide "reasonable accommodations" to students with disabilities, so as not to discriminate on the basis of that disability. Any student requiring an accommodation for this course must provide the instructor with official documentation in the form of a letter certified by the staff in the Office for Students with Disabilities, University Hall 102. Only those students who have officially documented a need for an accommodation will have their request honored. Information regarding diagnostic criteria and policies for obtaining disability-based academic accommodations can be found at www.uta.edu/disability or by calling the Office for Students with Disabilities at (817) 272-3364.
Academic Integrity: All students enrolled in this course are expected to adhere to the UT Arlington Honor Code:
I pledge, on my honor, to uphold UT Arlington’s tradition of academic integrity,
a tradition that values hard work and honest effort in the pursuit of academic
excellence.
I promise that I will submit only
work that I personally create or contribute to group collaborations, and I will
appropriately reference any work from other sources. I will follow the highest
standards of integrity and uphold the spirit of the Honor Code.
Instructors may employ the Honor Code as they see fit in their courses,
including (but not limited to) having students acknowledge the honor code as
part of an examination or requiring students to incorporate the honor code into
any work submitted. Per UT System Regents’ Rule 50101, §2.2, suspected
violations of university’s standards for academic integrity (including the Honor
Code) will be referred to the Office of Student Conduct. Violators will be
disciplined in accordance with University policy, which may result in the
student’s suspension or expulsion from the University.
Student Support Services: UT Arlington provides a variety of resources and programs designed to help students develop academic skills, deal with personal situations, and better understand concepts and information related to their courses. Resources include tutoring, major-based learning centers, developmental education, advising and mentoring, personal counseling, and federally funded programs. For individualized referrals, students may visit the reception desk at University College (Ransom Hall), call the Maverick Resource Hotline at 817-272-6107, send a message to resources@uta.edu, or view the information at www.uta.edu/resources.
Electronic Communication: UT Arlington has adopted MavMail as its official means to communicate with students about important deadlines and events, as well as to transact university-related business regarding financial aid, tuition, grades, graduation, etc. All students are assigned a MavMail account and are responsible for checking the inbox regularly. There is no additional charge to students for using this account, which remains active even after graduation. Information about activating and using MavMail is available at http://www.uta.edu/oit/cs/email/mavmail.php.
Student Feedback Survey: At the end of each term, students enrolled in classes categorized as lecture, seminar, or laboratory shall be directed to complete a Student Feedback Survey (SFS). Instructions on how to access the SFS for this course will be sent directly to each student through MavMail approximately 10 days before the end of the term. Each student’s feedback enters the SFS database anonymously and is aggregated with that of other students enrolled in the course. UT Arlington’s effort to solicit, gather, tabulate, and publish student feedback is required by state law; students are strongly urged to participate. For more information, visit http://www.uta.edu/sfs.
Final Review Week: A period of five class days prior to the first day of final examinations in the long sessions shall be designated as Final Review Week. The purpose of this week is to allow students sufficient time to prepare for final examinations. During this week, there shall be no scheduled activities such as required field trips or performances; and no instructor shall assign any themes, research problems or exercises of similar scope that have a completion date during or following this week unless specified in the class syllabus. During Final Review Week, an instructor shall not give any examinations constituting 10% or more of the final grade, except makeup tests and laboratory examinations. In addition, no instructor shall give any portion of the final examination during Final Review Week. During this week, classes are held as scheduled. In addition, instructors are not required to limit content to topics that have been previously covered; they may introduce new concepts as appropriate.
Schedule:
As the instructor for this course, I reserve
the right to adjust this schedule in any way that serves the educational needs
of the students enrolled in this course. –Chengkai Li
Date | # | Lecture | Assignment (Out/Due) | Lecture Notes | Extra Reading
08/27 | 1 | Course Overview | | [PPT] |
08/29 | 2 | Introduction (Chapter 1) | | [PPT] |
09/03 | | Labor Day Holiday | | |
Data Warehousing, OLAP, Data Cube (Chapter 3, 4)
09/05 | 3 | Data Warehousing, OLAP, Data Cube | | [PPT] |
09/10 | 4 | Data Warehousing, OLAP, Data Cube | | |
09/12 | 5 | One-of-the-Few Objects | | [PPT] | One-of-the-Few paper
09/17 | 6 | Prominent Streak Discovery | | [PPT] | Prominent Streak paper
09/19 | 7 | Course Project | HW1 | |
Classification and Prediction (Chapter 6)
09/24 | 8 | Decision Tree | | [PPT] |
09/26 | 9 | Decision Tree | | |
10/01 | 10 | Bayesian Classifiers | | [PPT] |
10/03 | 11 | Bayesian Classifiers (cont'd) | HW1 | |
Text and Web Mining (1)
10/08 | 12 | Vector Space Model | | [PDF] | textbook excerpt
10/10 | 13 | | | [PPT] |
10/15 | | Midterm Exam (Monday, Oct. 15th, 4-5:20pm, NH 202) | | |
Classification and Prediction (Chapter 6)
10/17 | 14 | | | [PPT] |
10/22 | 15 | Evaluating Classification Models | HW2 | [PPT] |
10/24 | 16 | Evaluating Classification Models | | |
10/29 | 17 | Support Vector Machine | | [PPT] |
Clustering (Chapter 7)
10/31 | 18 | Overview of Clustering, Similarity/Dissimilarity Measure | | [PDF] |
11/05 | 19 | K-means | Project Progress Report | [PPT] |
11/07 | 20 | K-means | | |
11/12 | 21 | Hierarchical | HW2 | [PPT] |
11/14 | 22 | Hierarchical | HW3 | | textbook excerpt (in Blackboard)
Frequent Pattern and Association Rule Mining (Chapter 5)
11/19 | 23 | Association Rule Mining | | |
11/21 | 24 | Association Rule Mining | | [PPT] |
11/26 | 25 | Correlation Analysis | | |
11/28 | 26 | Skyline Groups | HW3 | [PPT] | skyline group paper
Text and Web Mining (2)
12/03 | 27 | Link Analysis: PageRank | | [PDF] | textbook excerpt
12/05 | 28 | MapReduce | Project Report | [PPT] |
TBD | | Project Demo/Presentation (Time and Location TBD) | | |
12/12 | | Final Exam (Wednesday, Dec. 12th, 2-4:30pm, NH 202) | | |