Chengkai Li

Name

[Li, Chengkai]
  • Associate Professor, Department of Computer Science & Engineering

Biography

Dr. Chengkai Li is Associate Professor and Director of the Innovative Database and Information Systems Research Laboratory (IDIR) in the Department of Computer Science and Engineering at the University of Texas at Arlington. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2007, and an M.E. and a B.S. degree in Computer Science from Nanjing University, in 2000 and 1997, respectively. After graduation in 2007, he worked as Assistant Professor in the CSE Department of UT Arlington and was promoted to Associate Professor with tenure in 2013. Dr. Li's research interests are in several areas related to big data and data science, including database, data mining, Web data management, and natural language processing. His current research focuses on building large-scale human-assisting and human-assisted data and information systems with high usability, low cost and applications for social good. His research projects are on computational journalism, crowdsourcing and human computation, data exploration by ranking (top-k), skyline and preference queries, database testing, entity query, and usability challenges in querying graph data. Dr. Li's papers have appeared in prestigious database, data mining and Web conferences including SIGMOD, VLDB, ICDE, EDBT, CIDR, KDD, WWW, CIKM and WSDM, as well as in several leading journals such as TODS, TKDD and TKDE. He has served as General Co-Chair and Program Co-Chair of IEEE IPCCC, and he has also served on the organizing committees of SIGMOD and WAIM. He served on the program committees of premier conferences such as SIGMOD, VLDB, ICDE, EDBT, KDD, ICDM, WWW, IJCAI, CIKM, ICWSM, ISWC, and ICDCS. He is on the editorial board of a couple of journals. He has also been a reviewer for multiple prestigious journals, e.g., TODS, TOIS, TKDE and VLDB Journal. Dr. Li is a recipient of the 2011 and 2012 HP Labs Innovation Research Award.

Professional Preparation

    • 2007 Ph.D. in Computer Science, University of Illinois at Urbana-Champaign
    • 2000 M.Eng. in Computer Science, Nanjing University
    • 1997 B.S. in Computer Science, Nanjing University

Appointments

    • Sept 2017 to Present Associate Chair
      University of Texas at Arlington
    • Sept 2013 to Present Associate Professor
      University of Texas at Arlington
    • Sept 2007 to Aug 2013 Assistant Professor
      University of Texas at Arlington
    • June 2006 to Aug 2006 Research Intern
      IBM T. J. Watson Research Center
    • May 2003 to Aug 2003 Research Intern
      Bell Labs, Lucent Technologies
    • May 2002 to Aug 2002 Research Intern
      Bell Labs, Lucent Technologies

Memberships

  • Membership
    • June 2011 to Present Institute of Electrical and Electronics Engineers (IEEE)
    • June 2007 to Present SIGMOD, SIGKDD
    • June 2007 to Present Association for Computing Machinery (ACM)

Awards and Honors

    • Sep 2014 VLDB 2014 Excellent Demonstration Award sponsored by VLDB Endowment
    • Mar 2013 UT-Arlington Faculty Development Leave, Fall 2015 sponsored by University of Texas at Arlington
    • Sep 2012 HP Labs Innovation Research Award (2012) sponsored by HP
    • Sep 2011 HP Labs Innovation Research Award (2011) sponsored by HP
    • Jan 2011 3rd place in the best Outrageous Ideas and Vision (OIV) Track paper competition at the 5th Biennial Conference on Innovative Data Systems Research (CIDR 2011) sponsored by Computing Community Consortium
    • Dec 2005 Excellent TA Award sponsored by Department of Computer Science, UIUC

Research and Expertise

  • Big Data and Data Science

    Dr. Chengkai Li is Associate Professor and Director of the Innovative Database and Information Systems Research Laboratory (IDIR) in the Department of Computer Science and Engineering at the University of Texas at Arlington. Dr. Li's research interests are in several areas related to big data and data science, including database, data mining, Web data management, and natural language processing. His current research focuses on building large-scale human-assisting and human-assisted data and information systems with high usability, low cost and applications for social good. His current research projects are on computational journalism, crowdsourcing and human computation, data exploration by ranking (top-k), skyline and preference queries, database testing, entity query, and usability challenges in querying graph data.

Publications

      Journal Article Accepted
      • You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. Computational Fact-Checking through Query Perturbations. In ACM Transactions on Database Systems (TODS), to appear. 

        {Peer Reviewed }

      Conference Proceeding 2016
      • Ning Yan, Sona Hasani, Abolfazl Asudeh, Chengkai Li. Generating Preview Tables for Entity Graphs. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1797-1811, San Francisco, June 2016. (116 papers accepted)

        {Peer Reviewed }
      2016
      • Naeemul Hassan, Mark Tremayne, Fatma Arslan, and Chengkai Li. Comparing Automated Factual Claim Detection Against Judgments of Journalism Organizations. In Proceedings of the 2016 Computation+Journalism Symposium, 5 pages, Stanford, California, USA, September 2016.

        {Peer Reviewed }

      Technical Report 2016
      • Nandish Jayaram, Rohit Bhoopalam, Chengkai Li, and Vassilis Athitsos. Orion: Enabling Suggestions in a Visual Query Builder for Ultra-Heterogeneous Graphs. Technical Report, arXiv:1605.06856, May 2016.

        {Technical Report }

      Conference Paper 2016
      • Nandish Jayaram, Arijit Khan, Chengkai Li, Xifeng Yan, Ramez Elmasri. Querying Knowledge Graphs by Example Entity Tuples. In Proceedings of the 32nd International Conference on Data Engineering (ICDE), TKDE poster track, pages 1494-1495, Helsinki, Finland, May 2016.

        {Peer Reviewed }

      Conference Paper 2015
      • Naeemul Hassan, Bill Adair, James Hamilton, Chengkai Li, Mark Tremayne, Jun Yang and Cong Yu. The Quest to Automate Fact-Checking. In Proceedings of the 2015 Computation+Journalism Symposium, 5 pages, New York City, USA, October 2015.

        {Conference Paper }
      2015
      • Nandish Jayaram, Sidharth Goyal, Chengkai Li. VIIQ: Auto-suggestion Enabled Visual Interface for Interactive Graph Query Formulation. In Proceedings of the VLDB Endowment (PVLDB), 8(12): 1940-1943, August 2015. Demonstration description. (acceptance rate 49/150=32.7%)

        {Conference Paper }
      2015
      • Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, and Qing He. Online Frequent Episode Mining. In Proceedings of the 31st International Conference on Data Engineering (ICDE), pages 891-902, Seoul, Korea, April 2015. 

        {Conference Paper }
      2015
      • Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, and Gergely Zaruba. Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 753-762, Melbourne, Australia, October 2015. (DB track full Paper, acceptance rate 35/166=21.1%)

        {Conference Paper }
      2015
      • Naeemul Hassan, Chengkai Li, and Mark Tremayne. Detecting Check-worthy Factual Claims in Presidential Debates. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 1835-1838, Melbourne, Australia, October 2015. (KM track short paper, full paper acceptance rate 87/484=18.0%, short paper acceptance rate 36/484=7.4%)

        {Conference Paper }

      Journal Article 2015
      • Nandish Jayaram, Arijit Khan, Chengkai Li, Xifeng Yan, Ramez Elmasri. Querying Knowledge Graphs by Example Entity Tuples. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 27(10): 2797-2811, October 2015. 

        {Journal Article }

      Journal Article 2014
      • Chengkai Li, Bin He, Ning Yan, and Muhammad Assad Safiullah. Set Predicates in SQL: Enabling Set-Level Comparisons for Dynamically Formed Groups. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(2):438-452, February 2014.

        {Journal Article }
      2014
      • You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. Toward Computational Fact-Checking. In Proceedings of the VLDB Endowment (PVLDB), 7(7):589-600, March 2014.

        {Journal Article }
      2014
      • Nan Zhang, Chengkai Li, Naeemul Hassan, Sundaresan Rajasekaran, and Gautam Das. On Skyline Groups. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(4):942-956, April 2014. 

        {Journal Article }
      2014
      • Gensheng Zhang, Xiao Jiang, Ping Luo, Min Wang, and Chengkai Li. Discovering General Prominent Streaks in Sequence Data. In ACM Transactions on Knowledge Discovery from Data (TKDD), 8(2):article 9, June 2014.

        {Journal Article }

      Conference Paper 2014
      • Nandish Jayaram, Mahesh Gupta, Arijit Khan, Chengkai Li, Xifeng Yan, and Ramez Elmasri. GQBE: Querying Knowledge Graphs by Example Entity Tuples. In Proceedings of the 30th International Conference on Data Engineering (ICDE), pages 1250-1253, Chicago, Illinois, USA, March 2014. Demonstration description. (acceptance rate 28/65=43%)

        {Conference Paper }
      2014
      • Afroza Sultana, Naeemul Hassan, Chengkai Li, Jun Yang, and Cong Yu. Incremental Discovery of Prominent Situational Facts. In Proceedings of the 30th International Conference on Data Engineering (ICDE), pages -, Chicago, Illinois, USA, March 2014. (acceptance rate 89/446=20%)

        {Conference Paper }
      2014
      • Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. Discovering and Learning Sensational Episodes of News Events. In Proceedings of the 23rd International World Wide Web Conference (WWW), pages 217-218, Seoul, Korea, April 2014. (poster paper, acceptance rate 110/226=48.7%)

        {Conference Paper }
      2014
      • You Wu, Brett Walenz, Peggy Li, Andrew Shim, Emre Sonmez, Pankaj Agarwal, Chengkai Li, Jun Yang, and Cong Yu. iCheck: Computationally Combating "Lies, D---ned Lies, and Statistics". In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 1063-1066, Snowbird, Utah, USA, June 2014. Demonstration description. (acceptance rate 29/76=38%)

        {Conference Paper }
      2014
      • Brett Walenz, You (Will) Wu, Seokhyun (Alex) Song, Emre Sonmez, Eric Wu, Kevin Wu, Pankaj K. Agarwal, Jun Yang, Naeemul Hassan, Afroza Sultana, Gensheng Zhang, Chengkai Li, Cong Yu. Finding, Monitoring, and Checking Claims Computationally Based on Structured Data. In Proceedings of the 2014 Computation+Journalism Symposium, 5 pages, New York City, USA, October 2014.

        {Conference Paper }
      2014
      • Naeemul Hassan, Huadong Feng, Ramesh Venkataraman, Gautam Das, Chengkai Li, Nan Zhang. Anything You Can Do, I Can Do Better: Finding Expert Teams by CrewScout. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM), pages 2030-2032, Shanghai, China, November 2014. Demonstration description. (acceptance rate 29/73=39%) 

        {Conference Paper }
      2014
      • Naeemul Hassan, Afroza Sultana, You Wu, Gensheng Zhang, Chengkai Li, Jun Yang, and Cong Yu. Data In, Fact Out: Automated Monitoring of Facts by FactWatcher. In Proceedings of the VLDB Endowment (PVLDB), pages 1557-1560, 2014. Demonstration description. (acceptance rate 42/115=36.5%) (excellent demonstration award)

        {Conference Paper }
      2014
      • Nandish Jayaram, Arijit Khan, Chengkai Li, Xifeng Yan, Ramez Elmasri. Towards a Query-by-Example System for Knowledge Graphs. In Proceedings of the 2nd International Workshop on Graph Data Management Experiences and Systems (GRADES), 6 pages, Snowbird, Utah, USA, June 2014. (Co-located with ACM SIGMOD 2014)

        {Conference Paper }

      Technical Report 2014
      • Ning Yan, Abolfazl Asudeh, and Chengkai Li. Generating Preview Tables for Entity Graphs. Technical Report, arXiv:1403.5006, March 2014.

        {Technical Report }
      2014
      • Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, and Gergely V. Zaruba. Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons. Technical Report, arXiv:1409.4161, September 2014.

        {Technical Report }

      Technical Report 2013
      • Nandish Jayaram, Mahesh Gupta, Arijit Khan, Chengkai Li, Xifeng Yan, and Ramez Elmasri. Querying Knowledge Graphs by Example Entity Tuples. Technical Report, arXiv:1311.2100, November 2013.

        {Technical Report }

      Conference Paper 2013
      • Lijiang Chen, Yibing Zhao, Shimin Chen, Hui Fang, Chengkai Li, and Min Wang. iPLUG: Personalized List Recommendation in Twitter. In Proceedings of the 14th International Conference on Web Information System Engineering (WISE), pages 88-103, Nanjing, China, October 2013. (acceptance rate 49/198=24.7%)

        {Conference Paper }
      2013
      • Peng Jiang, Huiman Hou, Lijiang Chen, Shimin Chen, Conglei Yao, Chengkai Li, and Min Wang. Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM), pages 345-354, Rome, Italy, February 2013. (acceptance rate 73/387=19%) 

        {Conference Paper }

      Journal Article 2013
      • Chengkai Li. On Contextual Ranking Queries in Databases. In Information Systems, Volume 38, Issue 4, Pages 509–523, June 2013.

        {Journal Article }
      2013
      • Aditya Telang, Sharma Chakravarthy, and Chengkai Li. Personalized Ranking in Web Databases: Establishing and Utilizing an Appropriate Workload. In Distributed and Parallel Databases (DPD), 31(1):47-70, March 2013. 

        {Journal Article }

      Conference Paper 2012
      • Kulsawasd Jitkajornwanich, Ramez Elmasri, John Mcenery, and Chengkai Li. Extracting Storm-Centric Characteristics from Raw Rainfall Data for Storm Analysis and Mining. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial), pages 91-99, Redondo Beach, California, USA, November 2012. (Co-located with ACM SIGSPATIAL GIS 2012)

        {Conference Paper }
      2012
      • Afroza Sultana, Quazi Hasan, Ashis Biswas, Soumyava Das, Habibur Rahman, Chris Ding, and Chengkai Li. Infobox Suggestion for Wikipedia Entities. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), pages 2307-2310, Maui, Hawaii, October 2012. (poster paper, acceptance rate 106/228=46.5%)

        {Conference Paper }
      2012
      • Chengkai Li, Nan Zhang, Naeemul Hassan, Sundaresan Rajasekaran, and Gautam Das. On Skyline Groups. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), pages 2119-2123, Maui, Hawaii, October 2012. (short paper, full paper acceptance rate 146/1088=13.4%, short paper acceptance rate 156/1088=14.3%)

        {Conference Paper }
      2012
      • You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. On "One of the Few" Objects. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1487-1495, Beijing, China, August 2012. (acceptance rate 133/755=17.6%)

        {Conference Paper }
      2012
      • Leonidas Fegaras, Chengkai Li, and Upa Gupta. An Optimization Framework for Map-Reduce Queries. In Proceedings of the 15th International Conference on Extending Database Technology (EDBT), pages 26-37, Berlin, Germany, March 2012. (acceptance rate 43/193=22.5%)

        {Conference Paper }

      Journal Article 2012
      • Aditya Telang, Chengkai Li, and Sharma Chakravarthy. One Size Does Not Fit All: Towards User- and Query-Dependent Ranking For Web Databases. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 24(9):1671-1685, September 2012. 

        {Journal Article }
      2012
      • Xiaonan Li, Chengkai Li, and Cong Yu. Entity-Relationship Queries over Wikipedia. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 3, no. 4, article 70:1-20, September 2012.

        {Journal Article }

      Conference Paper 2011
      • Computational Journalism: A Call to Arms to Database Researchers. Sarah Cohen, Chengkai Li, Jun Yang, Cong Yu. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR), pages 148-151, Asilomar, California, USA, January 2011. (3rd place in best Outrageous Ideas and Vision (OIV) Track paper competition)
        {Conference Paper }
      2011
      • XML Query Optimization in Map-Reduce. Leonidas Fegaras, Chengkai Li, Upa Gupta, Jijo Philip. In Proceedings of the 14th International Workshop on the Web and Databases (WebDB), 6 pages, Athens, Greece, June 2011. (Co-located with SIGMOD 2011) (acceptance rate 12/43=27.9%)

        {Conference Paper }
      2011
      • Formalization of 2-D Spatial Ontology and OWL/Protégé Realization. Kulsawasd Jitkajornwanich, Ramez Elmasri, Chengkai Li and John Mcenery. In Proceedings of the 3rd International Workshop on Semantic Web Information Management (SWIM), 6 pages, Athens, Greece, June 2011. (Co-located with SIGMOD 2011)

        {Conference Paper }
      2011
      • Christoph Csallner, Leonidas Fegaras, and Chengkai Li. Testing MapReduce-Style Programs. In Proceedings of the 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), New Ideas Track, pages 504-507, Szeged, Hungary, September 2011. (acceptance rate 11/43=25.6%)

        {Conference Paper }
      2011
      • Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, and Yong Yu. Prominent Streak Discovery in Sequence Data. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1280-1288, San Diego, California, USA, August 2011. (full paper, poster presentation, acceptance rate 125/714=17.5%)

        {Conference Paper }

      Conference Paper 2010
      • Entity-Relationship Queries over Wikipedia. Xiaonan Li, Chengkai Li, Cong Yu. In Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents (SMUC), pages 21-28, Toronto, Canada, October 2010. (Co-located with CIKM 2010) (acceptance rate 8/32=25%)
        {Conference Paper }
      2010
      • EntityEngine: Answering Entity-Relationship Queries using Shallow Semantics. Xiaonan Li, Chengkai Li, Cong Yu. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1925-1926, Toronto, Canada, October 2010. Demonstration description.
        {Conference Paper }
      2010
      • Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia. Ning Yan, Chengkai Li, Senjuti B. Roy, Rakesh Ramegowda, Gautam Das. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1927-1928, Toronto, Canada, October 2010. Demonstration description.
        {Conference Paper }
      2010
      • Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, Gautam Das. In Proceedings of the 19th International World Wide Web Conference (WWW), pages 651-660, Raleigh, North Carolina, USA, April 2010.
        {Conference Paper }
      2010
      • Dynamic Symbolic Database Application Testing. Chengkai Li, Christoph Csallner. In Proceedings of the Third International Workshop on Testing Database Systems (DBTest), pages -, Indianapolis, Indiana, USA, June 2010. (Co-located with SIGMOD 2010)
        {Conference Paper }

      Technical Report 2010
      • Xiaonan Li, Chengkai Li, and Cong Yu. Structured Querying of Annotation-Rich Web Text with Shallow Semantics. Technical Report, Department of Computer Science and Engineering, University of Texas at Arlington, March 2010.

        {Technical Report }

      Conference Paper 2009
      • Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback. Aditya Telang, Sharma Chakravarthy, and Chengkai Li. In Proceedings of the 2009 International Conference on Conceptual Modeling (ER), pages 191-204, 2009. (acceptance rate 31/162=19%)
        {Conference Paper }

      Conference Paper 2008
      • Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query? Aditya Telang, Sharma Chakravarthy, and Chengkai Li. In Proceedings of the 2008 International Conference on Management of Data (COMAD), pages 245-248, Bombay, India, December 2008.
        {Conference Paper }

      Conference Paper 2007
      • Supporting Ranking and Clustering as Generalized Order-By and Group-By. Chengkai Li, Min Wang, Lipyeow Lim, Haixun Wang, and Kevin Chen-Chuan Chang. In Proceedings of the 2007 ACM SIGMOD Conference (SIGMOD), pages 127-138, Beijing, China, June 2007. (acceptance rate 69/480=14%)
        {Conference Paper }

      Conference Paper 2006
      • Supporting Ad-hoc Ranking Aggregates. Chengkai Li, Kevin Chen-Chuan Chang, and Ihab F. Ilyas. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD), pages 61-72, Chicago, Illinois, USA, June 2006. (acceptance rate 58/446=13%)
        {Conference Paper }

      Technical Report 2006
      • Bin He, Chengkai Li, David Killian, Mitesh Patel, Yuping Tseng, and Kevin Chen-Chuan Chang. A Structure-Driven Yield-Aware Web Form Crawler: Building a Database of Online Databases. Technical Report UIUCDCS-R-2006-2752, Department of Computer Science, UIUC, July 2006.

        {Technical Report }

      Conference Paper 2005
      • RankSQL: Query Algebra and Optimization for Relational Top-k Queries. Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, and Sumin Song. In Proceedings of the 2005 ACM SIGMOD Conference (SIGMOD), pages 131-142, Baltimore, Maryland, USA, June 2005. (acceptance rate 65/431=15%)
        {Conference Paper }
      2005
      • RankSQL: Supporting Ranking Queries in Relational Database Management Systems. Chengkai Li, Mohamed Ali, Kevin Chen-Chuan Chang, and Ihab F. Ilyas. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), pages 1342-1345, Trondheim, Norway, August 2005. Demonstration description. (acceptance rate 29/69=42%)
        {Conference Paper }
      2005
      • Query Routing: Finding Ways in the Maze of the Deep Web. Govind Kabra, Chengkai Li, and Kevin Chen-Chuan Chang. In Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), Tokyo, Japan, April 2005. (In conjunction with ICDE 2005) (acceptance rate 14/47=30%)
        {Conference Paper }

      Journal Article 2004
      • Structured Databases on the Web: Observations and Implications. K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. SIGMOD Record, 33(3):61-70, September 2004.
        {Journal Article }

      Conference Paper 2003
      • Composing XSL Transformations with XML Publishing Views. Chengkai Li, Philip Bohannon, Henry F. Korth, and PPS Narayan. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD), pages 515-526, San Diego, California, USA, June 2003. (acceptance rate 52/342=15%)
        {Conference Paper }
      2003
      • Relational On-Line Exchange with XML. Philip Bohannon, Xin (Luna) Dong, Sumit Ganguly, Henry F. Korth, Chengkai Li, P.P.S. Narayan, and Pradeep Shenoy. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD), page 673, San Diego, California, USA, June 2003. Demonstration description.
        {Conference Paper }

      Technical Report 2003
      • Chengkai Li and Kevin Chen-Chuan Chang. Discovering Attribute Locality across the Deep Web: an Ordering-Based Approach. Technical Report UIUCDCS-R-2003-2323, Department of Computer Science, UIUC, February 2003.

        {Technical Report }

Support & Funding

This data is entered manually by the author of the profile and may duplicate data in the Sponsored Projects section.
    • Sept 2014 to Aug 2017 III: Medium: Collaborative Research: From Answering Questions to Questioning Answers (and Questions)---Perturbation Analysis of Database Queries (as UTA PI; collaborative grant with Duke (lead institute) and Stanford, totaling $1.2 million) sponsored by  - $241778
    • Nov 2015 to May 2016 I-Corps Team: ClaimBuster: Automated, Live Fact-Checking (as PI) sponsored by  - $50000
    • Nov 2015 to Apr 2016 Knight Prototype Fund, ClaimBuster (as PI) sponsored by  - $35000
    • Sept 2011 to Aug 2014 SHF: Small: Testing Large-Scale Database-Centric Applications (as co-PI) sponsored by  - $500000
    • Sept 2011 to Aug 2014 Entity-Centric Querying of Enterprise Information for IT Management (as PI) sponsored by  - $80000
    • Sept 2010 to Aug 2013 III: Small: EntityEngine: A Query Engine for Entity-Relationship Queries over Web Text (as PI) sponsored by  - $500000
    • Sept 2008 to Aug 2009 UTA REP: Mashing Up Information on the Web sponsored by  - $10000

Students Supervised

  • Doctoral
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • Aug 2016

      Ph.D. dissertation: Toward Automated Fact Monitoring And Checking, August 2016; interned at AT&T and Qatar Computing Research Institute (QCRI); first-employment after graduation: Assistant Professor, University of Mississippi, Oxford, MS

    • May 2016

      Ph.D. dissertation: Toward Better Usability of Query Systems for Massive Ultra-Heterogeneous Graphs: Novel Approaches of Query Formulation and Query Specification, May 2016; co-advised with Ramez Elmasri; interned at IBM Research India and HP Labs; first-employment after graduation: Member of Technical Staff, Pivotal, Palo Alto, CA

    • Dec 2013

      Ph.D. dissertation: Novel Methods for Entity-centric Information Exploration, December 2013; first-employment after graduation: Research Scientist, Huawei R&D Center, Santa Clara, CA

    • Aug 2011

      Ph.D. dissertation: A Holistic, Similarity-based Approach for Personalized Ranking in Web Databases, August 2011; co-advised with Sharma Chakravarthy; first-employment after graduation: Researcher, IBM Research India

  • Master's
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • Present
    • May 2016

      M.S. thesis: Comparison of Machine Learning Algorithms in Suggesting Candidate Edges to Construct a Query on Heterogeneous Graphs, May 2015; first-employment after graduation: Akamai, Boston, MA

    • Dec 2015

      M.S. thesis: Detecting Real-time Check-worthy Factual Claims in Tweets Related to U.S. Politics, December 2015, UTA Ph.D. program

    • Dec 2015

      M.S. thesis: Speaker Identification in Live Events Using Twitter, December 2015

    • Aug 2014

      M.S. thesis: Linking Entity Profiles, August 2014; first-employment after graduation: Amazon, Irving, TX

    • Dec 2012

      M.S. thesis: Prominent Streaks Discovery on Blog Articles, December 2012; first-employment after graduation: Cerner Corporation, Kansas City, MO

    • Dec 2012

      M.S. thesis: Querying Entity-relationship Graphs by Example Tuples: Experimental Evaluation and User Study, December 2012; first-employment after graduation: Electronic Arts (EA), Los Angeles, CA

    • Dec 2011

      M.S. thesis: Automatic Discovery of Significant Events from Databases, December 2011; first-employment after graduation: Copper Labs, Irving, TX

    • Dec 2010

      M.S. thesis: Measuring Named Entity Similarity Through Wikipedia Category Hierarchies, December 2010; employment after graduation: Ambit Energy, Dallas, TX

    • Dec 2010

      M.S. thesis: Typifying Wikipedia Articles, December 2010; first-employment after graduation: Dematic, Madison, WI

    • Aug 2008

      M.S. thesis: Efficient Processing of Set Queries Using Bitmap Indexes, August 2008; first-employment after graduation: Microsoft, Seattle, WA

  • Undergraduate

Courses

      • CSE 4334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2018
      • CSE 5334-005 Data Mining

        Fall - Regular Academic Session - 2018
      • CSE 4334-001 Data Mining

        Fall - Regular Academic Session - 2017
      • CSE 5334-001 Data Mining

        Fall - Regular Academic Session - 2017
      • CSE 4334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2016
      • CSE 5334-001 DATA MINING

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2016
      • CSE 5334-002 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Spring - Regular Academic Session - 2016
      • CSE 4334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Spring - Regular Academic Session - 2016
      • CSE 5334-001 DATA MINING

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Spring - Regular Academic Session - 2015
      • CSE 6339-001 Data Science and Computational Journalism

        Claims of "fact" are constantly made from data--by journalists, politicians, lobbyists, public relations specialists, sports fans, etc. Wherever numbers and data are involved, they can be laden with "lies, d--ed lies, and statistics." Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning the quality of the resulting claims, or to formulating good queries from the outset. There is demand for research that fills this void in many domains where decisions are increasingly driven by data, particularly in journalism. Data-driven fact-checking and lead-finding are growing in importance, as more data become publicly available in the movement of "democratizing data." This course is project-driven. We will build Websites, applications, and systems to support public interest journalism. We plan to make our systems available for public use after the course concludes. In this process, we will learn, apply, and invent techniques for database systems, text mining, data mining, Web applications, data visualization, social media and social computational systems, cloud computing, and so on. We will also be exposed to the modern practice of journalism.

        Spring - Regular Academic Session - 2015
      • CSE 4334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Spring - Regular Academic Session - 2015
      • CSE 5334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2014
      • CSE 3330-001 Database Systems and File Structures

        This is an introductory course on databases. We will study database system architecture; file structures for databases, including indexing, hashing, and B+-trees; the relational model and algebra; the SQL database language; Entity-Relationship data modeling; functional dependencies and basic normalization.

        Spring - Regular Academic Session - 2014
      • CSE 6339-001 Research Frontiers in Crowdsourcing, Knowledge Graphs and Computational Journalism

        Our society has entered the era of big data. It is estimated that we produce 2.5 exabytes of data every day. The open data movement in governments, organizations and scientific communities has brought a significant portion of the big data within reach of all Web users, consumers, journalists, scientists, and executives. Users and developers are trying hard to tap into the large amount of data for numerous applications. In this course, we investigate data management and data mining techniques for taming big data, and we focus on the research frontiers in three inter-related areas: crowdsourcing, knowledge graphs and computational journalism.

        Spring - Regular Academic Session - 2014
      • CSE 5334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2013
      • CSE 4334-001 Data Mining

        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.

        Fall - Regular Academic Session - 2013
      • CSE 3330-001 Database Systems and File Structures
        This is an introductory course on databases. We will study database system architecture; file structures for databases, including indexing, hashing, and B+-trees; the relational model and algebra; the SQL database language; Entity-Relationship data modeling; functional dependencies and basic normalization.
        Spring - Regular Academic Session - 2013
      • CSE 6339-001 Web Search, Mining, and Integration
        Our society has entered the era of big data. It is estimated that we produce 2.5 exabytes of data every day. In the wave of big data, we see an unprecedented proliferation of entity data graphs. This is not really surprising given that the graph, as a fundamental data abstraction, gracefully models real-world data in many domains, including knowledge bases, social networks, citation graphs, gene and protein databases, mobile networks, and program analysis graphs, to name just a few. In an entity data graph, vertices represent entities (e.g., persons, products, organizations) and edges represent relationships between entities. The open data movement in governments, organizations and scientific communities has brought a significant portion of the big data within reach of all Web users, consumers, journalists, scientists, and executives. Users and developers are trying hard to tap into the large amount of graph data for numerous applications. In this course, students will learn, apply, and invent techniques for graph data management and mining. Specific topics include graph data model, graph query languages, graph index, graph query algorithms, graph mining, keyword search, large-scale data processing, and applications.
        Spring - Regular Academic Session - 2013
      • CSE 4334-001 Data Mining
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.
        Fall - Regular Academic Session - 2012
      • CSE 5334-001 DATA MINING
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web, text, big data, social networks, and computational journalism.
        Fall - Regular Academic Session - 2012
      • CSE 3330-001 Database Systems and File Structures
        This is an introductory course on databases. We will study database system architecture; file structures for databases, including indexing, hashing, and B+-trees; the relational model and algebra; the SQL database language; Entity-Relationship data modeling; functional dependencies and basic normalization.
        Spring - Regular Academic Session - 2012
      • CSE 6339-001 Web Search, Mining, and Integration
        "The Mavericks overwhelmed the Kings, 99-60, on Saturday night, holding Sacramento to 22 field goals in 86 attempts--that's 25.6 percent. Mercy: There have been just over 48,000 NBA games played since the introduction of the shot clock in the fall of 1954--and the Kings became just the third of those 96,000-plus teams to have a game in which they produced both so few points and such a low shooting percentage." This is what appears in the "Elias Says" column of ESPN.com today, Jan. 15, 2012. The game took place last night. A record was established -- the Dallas Mavericks were only the third team in NBA history to force their opponent to give such an embarrassing performance. Records like this fascinate sports fans. The experts at Elias Sports Bureau, Inc. were skillful and insightful to discover what had happened. We want to do the same and more, but by building an automatic system to do it.
        "When I was mayor of New York City, I encouraged adoptions. Adoptions went up 65 to 70 percent; abortions went down 16 percent." During a Republican presidential candidates' debate in 2007, Rudy Giuliani made the above claim. The city's Administration for Children's Services (ACS), established by Giuliani in 1996, made a similar claim by comparing the total number of adoptions during 1996-2001 to that during 1990-1995. But according to the investigation by FactCheckED.org**, "in fact adoptions at the end of Giuliani's tenure were only 17 percent higher than at the start, and falling. ... it is a classic case of how candidates and public officials sometimes use data selectively to create a false impression." We want to build systems to automatically check factual statements in news.
        ** http://www.factcheck.org/2007/05/levitating-numbers/
        Above are two example problems that we will investigate when exploring the young field of computational journalism in this course. Journalism is at the crossroads. In the past, we have come to rely on traditional news organizations for investigative reporting to hold governments, corporations, and individuals accountable to society. The decline of traditional media in recent years has led to dwindling support for this style of journalism, which has profound impact on the well-being of democracy. At the same time, there is also an opportunity. With technological advances and the movement towards transparency, the amount of data available to the public is ever increasing. However, the potential of this "democratization of data" cannot be fully realized with the widening divide between the growing amount of data on one hand, and the shrinking number of investigative journalists on the other. Computing is a key to bridge this divide. Computational journalism aims at developing computational techniques and tools to increase effectiveness and broaden participation for journalism -- especially public interest journalism -- to help preserve its watchdog tradition.
        This course is project-driven. We will build Websites, applications, and systems to support public interest journalism. We plan to make our systems available for public use after the course concludes. In this process, we will learn, apply, and invent techniques for database systems, text mining, data mining, Web applications, data visualization, social media and social computational systems, cloud computing, and so on. We will also be exposed to the modern practice of journalism.
        Spring - Regular Academic Session - 2012
      • CSE 4334-001 Data Mining
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web and text.
        Fall - Regular Academic Session - 2011
      • CSE 5334-001 DATA MINING
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web and text.
        Fall - Regular Academic Session - 2011
      • CSE 3330-001 Database Systems and File Structures
        This is an introductory course on databases. We will study database system architecture; file structures for databases, including indexing, hashing, and B+-trees; the relational model and algebra; the SQL database language; Entity-Relationship data modeling; functional dependencies and basic normalization.
        Spring - Regular Academic Session - 2011
      • CSE 6339-001 Web Search, Mining, and Integration
        We will study papers on Web Search, Mining, and Integration, covering topics in databases, data mining, information retrieval, and the intersections of these areas. The goals of the course are: to expose graduate students to the cutting edge of research in these areas; to equip them with the necessary skill sets for finding jobs; to help them identify research topics and develop preliminary work through course projects; and to prepare new students for doing research with faculty in databases, data mining, and information retrieval.
        Spring - Regular Academic Session - 2011
      • CSE 5334-001 DATA MINING
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web and text.
        Fall - Regular Academic Session - 2010
      • CSE 4392-001 Information Security
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web and text.
        Fall - Regular Academic Session - 2010
      • CSE 6339-002 Web Search, Mining, and Integration
        We will study papers on Web Search, Mining, and Integration, covering topics in databases, data mining, information retrieval, and the intersections of these areas. The goals of the course are: to expose graduate students to the cutting edge of research in these areas; to equip them with the necessary skill sets for finding jobs; to help them identify research topics and develop preliminary work through course projects; and to prepare new students for doing research with faculty in databases, data mining, and information retrieval.
        Spring - Regular Academic Session - 2010
      • CSE 5334-001 DATA MINING
        This is an introductory course on data mining. Data Mining refers to the process of automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cube, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining in Web and text.
        Fall - Regular Academic Session - 2009
      • CSE 6339-001 Web Search, Mining, and Integration
        No Description Provided.
        Spring - Regular Academic Session - 2009

Service to the Profession