Spring 2011 CSE6339 (Section 001, #23309)

Special Topics in Advanced Database Systems

Web Search, Mining, and Integration


Resources: Google, Google Scholar, CiteSeer, DBLP Bibliography, ACM Digital Library, IEEE Xplore, Other Computer Science articles


Course Information:

Instructor: Chengkai Li

TA: Yuanzhe Cai

  • Office hours: 
  • Office:
  • Phone:
  • E-mail: yuanzhe.cai [AT] mavs [DOT] uta [DOT] edu

Course Description: We will study papers on Web Search, Mining, and Integration, covering topics in databases, data mining, information retrieval, and the intersections of these areas. The goals of the course are: to expose graduate students to the cutting edge of research in these areas; to equip them with the necessary skill sets for finding jobs; to help them identify research topics and produce preliminary work through course projects; and to prepare new students for doing research with faculty in databases, data mining, and information retrieval. Detailed topics include:

Prerequisites: CSE 3330/5330 Database Systems I, or CSE 5334 Data Mining, or similar courses, or consent of instructor

Reference Textbook (Not Required)

Grades

There is no exam. We will focus on paper reviews, presentations, and the course project.


Announcements: Stay tuned and make sure to check BlackBoard frequently. Important announcements will be posted there.

Assignments and Deadlines

Regrading: Regrading requests must be made within 7 days after we post scores in BlackBoard. The TA will handle regrade requests. If you are not satisfied with the regrading result, you have 7 days to request again. The instructor will then regrade, and that decision is final.

BlackBoard:

Log in to BlackBoard with your NetID and password. We use BlackBoard for: (1) Announcements; (2) Assignment Submission; (3) Discussion; (4) Releasing materials, assignments, scores and grades.


Ethics Policies and Academic Integrity: The College cannot and will not tolerate any form of academic dishonesty by its students. This includes, but is not limited to, cheating on examinations, plagiarism, or collusion (explained in the document below). Students are required to read the following document carefully, sign it, return the signed copy to the instructor, and keep a copy for their own records. Hard copies of this document will be provided to students in the first class and can also be picked up in the instructor's office. If you print it yourself, please print it double-sided.

Statement on Ethics, Professionalism, and Conduct for Engineering Students

Miscellaneous: If you require accommodation based on disability, I would like to meet with you in the privacy of my office during the first week of the semester to ensure that you are appropriately accommodated. Please also read the web page of the Office for Students with Disabilities.


Schedule

Date    Lecture/Activities    Presenter    Due    Lecture Notes

01/19 To be rescheduled      
Introduction
01/24 Course Overview Chengkai Li   [PDF]
01/26

Entity-Relationship Queries over Wikipedia. Xiaonan Li, Chengkai Li, Cong Yu. In Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents (SMUC 2010), pages 21-28, Toronto, Canada, October 2010. (Co-located with CIKM 2010)

Xiaonan Li    
01/31

Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, Gautam Das. In Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, North Carolina, April 2010.

Ning Yan    
02/02

Course Project Topics

Chengkai Li
02/07 Paper Review, Presentation, Research Resources Chengkai Li    
02/09

Paper Review, Presentation, Research Resources (cont'd)

Chengkai Li    
Semantic Web
02/14

SemTag

  • Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, Ramanathan V. Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, Jason Y. Zien: SemTag and seeker: bootstrapping the semantic web via automated semantic annotation. WWW 2003: 178-186
    02/16 YAGO

    Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum: YAGO: A Large Ontology from Wikipedia and WordNet. J. Web Sem. 6(3): 203-217 (2008)
         
    02/21 Simone Paolo Ponzetto, Michael Strube: Deriving a Large-Scale Taxonomy from Wikipedia. AAAI 2007: 1440-1445

    Cäcilia Zirn, Vivi Nastase, Michael Strube: Distinguishing between Instances and Classes in the Wikipedia Taxonomy. ESWC 2008: 376-387
    Due: Proposal
    Entity Recognition and Disambiguation
    02/23 D. Milne and I. H. Witten. Learning to link with Wikipedia. In CIKM ’08, pages 509–518, 2008.

    R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In CIKM ’07, pages 233–242, 2007.
         
    02/28

    S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of Wikipedia entities in Web text. In KDD ’09, pages 457–466, 2009.

    X. Han and J. Zhao. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In CIKM ’09, pages 215–224, 2009.

     

    QUIZ

     

         
    Information Extraction
    03/02 Machine Learning Approach, Wrapper
  • Andrew McCallum: Information Extraction: Distilling Structured Data from Unstructured Text. ACM Queue, Volume 3, Number 9, November 2005.
  • Craig A. Knoblock, Kristina Lerman, Steven Minton, Ion Muslea: Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach. IEEE Data Eng. Bull. 23(4): 33-41 (2000)
    03/07

    KnowItAll

  • Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Unsupervised named-entity extraction from the Web: An experimental study. Artif. Intell. 165(1): 91-134 (2005)
    03/09

    TextRunner and Open Information Extraction

  • Oren Etzioni, Michele Banko, Stephen Soderland, Daniel S. Weld: Open information extraction from the web. Commun. ACM 51(12): 68-74 (2008)
  • Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007: 2670-2676
  • Michele Banko and Oren Etzioni: The Tradeoffs Between Open and Traditional Relation Extraction. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008)
    03/14

    Spring break

    03/16
    Structured Querying Over the Web, Entity Search and Ranking
    03/21 ExDB

    Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni: Structured Querying of Web Text Data: A Technical Challenge. CIDR 2007: 225-234
         
    03/23 EntityRank

    Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang: EntityRank: Searching Entities Directly and Holistically. VLDB 2007: 387-398
         
    03/28 Soumen Chakrabarti, Kriti Puniyani, Sujatha Das: Optimizing scoring functions and indexes for proximity search in type-annotated corpora. WWW 2006: 717-726
    Due: Progress Report
    03/30 SQoUT

    Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay Jain, Luis Gravano: To search or to crawl?: towards a query optimizer for text-centric tasks. SIGMOD Conference 2006: 265-276

    QUIZ

         
    04/01 Last day to drop class
    Guest Lectures
    04/04        
    04/06        
    Web Data Mining (cont'd)
    04/11

    Clustering Web Search Results

    04/13

    E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In IJCAI ’07, pages 1606–1611, 2007.

         
    04/18 D. Milne and I. H. Witten. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI ’08, 2008.
    Social Networks
    04/20

    Tagging

  • Michael Benedikt, Sihem Amer-Yahia, Laks Lakshmanan, Julia Stoyanovich. Efficient Network-Aware Search in Collaborative Tagging Sites. VLDB 2008.
    04/25 Shenghua Bao, Gui-Rong Xue, Xiaoyuan Wu, Yong Yu, Ben Fei, Zhong Su: Optimizing web search using social annotations. WWW 2007: 501-510      
    04/27 QUIZ      
    05/02 Due: Final Report, Presentation and Demo Slides, Source Code
    05/04 Project presentation and Demo      
    05/09 5:30-8pm, Project presentation and Demo      

    University calendar: Spring 2011