Spring 2015 CSE 4334/5334

Data Mining

Course Information:

Instructor: Chengkai Li

  • Office hours: Thu 10:30-12:30pm
  • Office: ERB 628
  • Phone: (817) 272-0162
  • E-mail: cli [AT] uta [DOT] edu
  • Homepage: http://ranger.uta.edu/~cli

TA: Chinmay Srinath

  • Office hours: Mon 12-2pm
  • Office: ERB 504
  • E-mail: chinmay [DOT] srinath [AT] mavs [DOT] uta [DOT] edu

Course Description: This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

Student Learning Outcomes: A solid understanding of the basic concepts, principles, and techniques in data mining; an ability to analyze real-world applications, to model data mining problems, and to assess different solutions; an ability to design, implement, and evaluate data mining software.

Prerequisites:

Textbook

Grades

The final letter grades will be based on students' performance. There are no pre-defined cutoffs or distribution of grades.

Attendance: At The University of Texas at Arlington, taking attendance is not required. Rather, each faculty member is free to develop his or her own methods of evaluating students’ academic performance, which includes establishing course-specific policies on attendance. As the instructor of this section, I require all students to attend lectures.


Announcements: Stay tuned and make sure to check Blackboard frequently. Important announcements will be posted there.

Assignments and Deadlines

Regrading: Regrading requests must be made within 7 days after scores are posted on Blackboard. The TA will handle regrade requests. If you are not satisfied with the regrading result, you have 7 days to request again. The instructor will then regrade, and that decision is final.

Drop Policy: Students may drop or swap (adding and dropping a class concurrently) classes through self-service in MyMav from the beginning of the registration period through the late registration period. After the late registration period, students must see their academic advisor to drop a class or withdraw. Undeclared students must see an advisor in the University Advising Center. Drops can continue through a point two-thirds of the way through the term or session. It is the student's responsibility to officially withdraw if they do not plan to attend after registering. Students will not be automatically dropped for non-attendance. Repayment of certain types of financial aid administered through the University may be required as the result of dropping classes or withdrawing. For more information, contact the Office of Financial Aid and Scholarships (http://wweb.uta.edu/ses/fao).


Americans with Disabilities Act: The University of Texas at Arlington is on record as being committed to both the spirit and letter of all federal equal opportunity legislation, including the Americans with Disabilities Act (ADA). All instructors at UT Arlington are required by law to provide "reasonable accommodations" to students with disabilities, so as not to discriminate on the basis of that disability. Any student requiring an accommodation for this course must provide the instructor with official documentation in the form of a letter certified by the staff in the Office for Students with Disabilities, University Hall 102. Only those students who have officially documented a need for an accommodation will have their request honored. Information regarding diagnostic criteria and policies for obtaining disability-based academic accommodations can be found at www.uta.edu/disability or by calling the Office for Students with Disabilities at (817) 272-3364.

Title IX: The University of Texas at Arlington is committed to upholding U.S. Federal Law “Title IX” such that no member of the UT Arlington community shall, on the basis of sex, be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any education program or activity. For more information, visit www.uta.edu/titleIX.

Academic Integrity: All students enrolled in this course are expected to adhere to the UT Arlington Honor Code:

        I pledge, on my honor, to uphold UT Arlington’s tradition of academic integrity, a tradition that values hard work and honest effort in the pursuit of academic excellence.
        I promise that I will submit only work that I personally create or contribute to group collaborations, and I will appropriately reference any work from other sources. I will follow the highest standards of integrity and uphold the spirit of the Honor Code.


Instructors may employ the Honor Code as they see fit in their courses, including (but not limited to) having students acknowledge the Honor Code as part of an examination or requiring students to incorporate the Honor Code into any work submitted. Per UT System Regents’ Rule 50101, §2.2, suspected violations of the University’s standards for academic integrity (including the Honor Code) will be referred to the Office of Student Conduct. Violators will be disciplined in accordance with University policy, which may result in the student’s suspension or expulsion from the University.

Student Support Services: UT Arlington provides a variety of resources and programs designed to help students develop academic skills, deal with personal situations, and better understand concepts and information related to their courses. Resources include tutoring, major-based learning centers, developmental education, advising and mentoring, personal counseling, and federally funded programs. For individualized referrals, students may visit the reception desk at University College (Ransom Hall), call the Maverick Resource Hotline at 817-272-6107, send a message to resources@uta.edu, or view the information at www.uta.edu/resources.

Electronic Communication: UT Arlington has adopted MavMail as its official means to communicate with students about important deadlines and events, as well as to transact university-related business regarding financial aid, tuition, grades, graduation, etc. All students are assigned a MavMail account and are responsible for checking the inbox regularly. There is no additional charge to students for using this account, which remains active even after graduation. Information about activating and using MavMail is available at http://www.uta.edu/oit/cs/email/mavmail.php.

Student Feedback Survey: At the end of each term, students enrolled in classes categorized as lecture, seminar, or laboratory shall be directed to complete a Student Feedback Survey (SFS). Instructions on how to access the SFS for this course will be sent directly to each student through MavMail approximately 10 days before the end of the term. Each student’s feedback enters the SFS database anonymously and is aggregated with that of other students enrolled in the course. UT Arlington’s effort to solicit, gather, tabulate, and publish student feedback is required by state law; students are strongly urged to participate. For more information, visit http://www.uta.edu/sfs.

Final Review Week: A period of five class days prior to the first day of final examinations in the long sessions shall be designated as Final Review Week. The purpose of this week is to allow students sufficient time to prepare for final examinations. During this week, there shall be no scheduled activities such as required field trips or performances; and no instructor shall assign any themes, research problems or exercises of similar scope that have a completion date during or following this week unless specified in the class syllabus. During Final Review Week, an instructor shall not give any examinations constituting 10% or more of the final grade, except makeup tests and laboratory examinations. In addition, no instructor shall give any portion of the final examination during Final Review Week. During this week, classes are held as scheduled. In addition, instructors are not required to limit content to topics that have been previously covered; they may introduce new concepts as appropriate.

Emergency Exit Procedures: Should we experience an emergency event that requires us to vacate the building, students should exit the room and move toward the nearest exit. When exiting the building during an emergency, one should never take an elevator but should use the stairwells. Faculty members and instructional staff will assist students in selecting the safest route for evacuation and will make arrangements to assist individuals with disabilities.



Schedule

As the instructor for this course, I reserve the right to adjust this schedule in any way that serves the educational needs of the students enrolled in this course.

Date # Lecture Assignment (Out / Due) Lecture Notes Required Reading
01/20 1 Course Overview     [PDF]  
Everything About Data
01/22 2 Data Mining, Big Data, Data Science, Applications, Tools, Datasets     [PDF]  
01/27 3 The Life-Cycle of Data: data types, data extraction, curation, integration, wrangling, retrieval, mining P1   [PDF] TSK ch2
01/29 4 Modeling Text Data: vector space model, search engine   [PDF] MRS ch6
02/03 5 Similarity Measures     [PDF]  
02/05 6 Data Visualization        
Classification and Prediction (1) (Chapter 6)
02/10 7 Decision Tree     [PDF] TSK ch4
02/12 8 Decision Tree        
02/17 9 Bayesian Classifiers     [PDF] TSK ch5, MRS ch13
02/19 10 Bayesian Classifiers P2, SLP P1 (02/23)    
02/24 11 Support Vector Machine, Nearest Neighbor Classifiers     [PDF] TSK ch5, MRS ch14, MRS ch15
02/26 12 Text Mining: classification     [PDF] MRS ch13, MRS ch14, MRS ch15
03/03 13 Evaluating Classification Models     [PDF] TSK ch4
03/05 14 Evaluating Classification Models   Project Proposal [PDF]  
03/10   Spring Break      
03/12   Spring Break P2    
Web and Graph Mining, Large-Scale Data Processing
03/17 15 Web Mining: link analysis (PageRank) P3   [PPT]
03/19 16 Graph Mining    
03/24 Midterm Exam (Tuesday, March 24th, 2:00pm-3:20pm, UH 121)  
03/26 17 Large-Scale Data Processing (MapReduce)     [PPT]  
03/31 18 Large-Scale Data Processing (MapReduce)     [PPT]  
Clustering (Chapter 7)
04/02 19 Overview of Clustering   Project Progress Report [PPT]  
04/03 Last day to drop class  
04/07 20 K-means P4 P3    
04/09 21 Hierarchical clustering     [PPT]  
04/14 22 Text Mining: clustering        
Frequent Pattern and Association Rule Mining (Chapter 5)
04/16  23 Association Rule Mining        
04/21 24 Association Rule Mining        
04/23 25 Correlation Analysis   P4    
04/28 26 overflow        
Research and Application: Computational Journalism
04/30 27  Incremental Discovery of Prominent Situational Facts (guest lecture by Afroza Sultana)     [PDF] situational fact paper
05/05 28  Prominent Streak Discovery (guest lecture by Gensheng Zhang)   Project Final Deliverables [PDF] prominent streak paper
05/07 29 Showcase of SLP      
05/12 Final Exam (Tuesday, May 12th, 2:00pm-4:30pm, UH 121)  

University calendar: Spring 2015