Resources:
Google
Google Scholar
CiteSeer
DBLP Bibliography
Course Information:
Instructor: Chengkai Li
TA: Ning Yan
Course Description: We will study papers on Web Search, Mining, and Integration, covering topics in databases, data mining, information retrieval, and the intersections of these areas. The goals of the course are to expose graduate students to the cutting edge of research in these areas; to equip them with the skills needed for finding jobs; to help them identify research topics and produce preliminary work through course projects; and to prepare new students for doing research with faculty in databases, data mining, and information retrieval. Detailed topics include:
Prerequisites: CSE 3330/5330 Database Systems I or CSE 5334 Data Mining, or similar courses, or consent of instructor.
There is no exam. We will focus on paper review, presentation, and project.
Announcements: Stay tuned and make sure to check WebCT frequently. Important announcements will be posted there.
Every student must contribute equally to the group project. Only one student in a group needs to upload project-related assignments to WebCT.
Regrading: Regrading requests must be made within 7 days after we post scores on WebCT. The TA will handle regrade requests. If you are not satisfied with the regrading results, you have 7 days to request again. The instructor will then regrade, and that decision is final.
WebCT: (WebCT is not ready yet. You will be notified when it is ready.)
Log in to the WebCT page http://www.uta.edu/webct with your NetID and password. We use WebCT for: (1) Announcements; (2) Assignment Submission; (3) Discussion; (4) Releasing materials, assignments, scores and grades. Follow these steps exactly during electronic assignment submission.
Ethics Policies and Academic Integrity: The College cannot and will not tolerate any form of academic dishonesty by its students. This includes, but is not limited to, cheating on examinations, plagiarism, or collusion (explained in the document below). Students are required to read the following document carefully, sign it, return the signed copy to the instructor, and keep a copy for their own records. Hard copies of this document will be provided to students in the first class and can also be picked up in the instructor's office. If you print it yourself, please print it double-sided.
Statement on Ethics, Professionalism, and Conduct for Engineering Students
Miscellaneous: If you require accommodation based on disability, I would like to meet with you in the privacy of my office during the first week of the semester to ensure that you are appropriately accommodated. Please read the page of the Office for Students with Disabilities.
Date | Lecture/Activities | Presenter | Due | Lecture Notes
Introduction
01/19 | Course Overview | Chengkai Li | | [PDF]
01/21 | Paper Review, Presentation, Research Resources | Chengkai Li | | [PDF]
Basics
01/26 | Paper Review, Presentation, Research Resources (cont'd) | Chengkai Li | |
01/28 | Course Project Topics | Chengkai Li | | [slides in WebCT]
02/02 | Boolean Query Model, Vector Space Model, Inverted Index and Distributed Index | Chengkai Li | | [PDF]
02/04 | Web Search Basics | Abdus Salam | | [PPT]
02/09 | Text Clustering | Mahashweta Das | | [PPT]
02/11 | (Rescheduled due to snow) | | |
02/16 | Text Classification | Rakesh Ramegowda | | [PPT]
Semantic Web
02/18 | SemTag | Shahina Ferdous | Proposal | [PPT]
02/23 | YAGO. Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum: YAGO: A Large Ontology from Wikipedia and WordNet. J. Web Sem. 6(3): 203-217 (2008) | Quazi Hasan | | [PDF]
02/26 | (make-up class for 02/11) Simone Paolo Ponzetto, Michael Strube: Deriving a Large-Scale Taxonomy from Wikipedia. AAAI 2007: 1440-1445; Cäcilia Zirn, Vivi Nastase, Michael Strube: Distinguishing between Instances and Classes in the Wikipedia Taxonomy. ESWC 2008: 376-387 | Chengkai Li | | [PDF]
Entity Recognition and Disambiguation
02/25 | D. Milne and I. H. Witten. Learning to link with Wikipedia. In CIKM ’08, pages 509–518, 2008; R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In CIKM ’07, pages 233–242, 2007. | Abhijit Tendulkar | | [PDF]
03/02 | S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of Wikipedia entities in Web text. In KDD ’09, pages 457–466, 2009; X. Han and J. Zhao. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In CIKM ’09, pages 215–224, 2009. | Avinash Bharadwaj | | [PDF]
Information Extraction
03/04 | Machine Learning Approach, Wrapper | Shanshan Lu | | [PDF]
03/09 | KnowItAll | Chengkai Li | | [PDF]
03/11 | TextRunner and Open Information Extraction | Ning Yan | Essay 1 (03/12) | [PDF]
03/16 | Spring break | | |
03/18 | | | |
Structured Querying Over the Web, Entity Search and Ranking
03/23 | ExDB. Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni: Structured Querying of Web Text Data: A Technical Challenge. CIDR 2007: 225-234 | Shahina Ferdous | | [PDF]
03/25 | EntityRank. Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang: EntityRank: Searching Entities Directly and Holistically. VLDB 2007: 387-398 | Abdus Salam | | [PDF]
03/30 | Soumen Chakrabarti, Kriti Puniyani, Sujatha Das: Optimizing scoring functions and indexes for proximity search in type-annotated corpora. WWW 2006: 717-726 | Xiaonan Li | Progress Report | [PDF]
04/01 | SQoUT. Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, Luis Gravano: To search or to crawl?: towards a query optimizer for text-centric tasks. SIGMOD Conference 2006: 265-276 | Avinash Bharadwaj | | [PDF]
04/02 | Last day to drop class | | |
Guest Lectures
04/06 | | Xiaonan Li | Essay 2 | [PDF]
04/08 | Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, Gautam Das. To appear in Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, North Carolina, April 2010. | Ning Yan | | [PDF]
Web Data Mining (cont'd)
04/13 | Clustering Web Search Results | Mahashweta Das | | [PDF]
04/15 | E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In IJCAI ’07, pages 1606–1611, 2007. | Quazi Hasan | | [PDF]
04/20 | D. Milne and I. H. Witten. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI ’08, 2008. | Shanshan Lu | | [PDF]
Social Networks
04/22 | Tagging | Abhijit Tendulkar | | [PDF]
04/27 | Shenghua Bao, Gui-Rong Xue, Xiaoyuan Wu, Yong Yu, Ben Fei, Zhong Su: Optimizing web search using social annotations. WWW 2007: 501-510 | Rakesh Ramegowda | | [PDF]
04/29 | Overflow lecture | | Essay 3 | [PDF]
05/11 | Project presentation and Demo | Salam; Shanshan; Rakesh/Avinash | Final Report, Presentation and Demo Slides, source code (due on 05/10) | [PDF]
05/13 | Project presentation and Demo | Sunny/Shahina; Mahashweta/Abhijit | | [PDF]
University calendar: Spring 2010