Data Analytics over Hidden Databases
Project Duration: 9/1/2008 - 2/28/2010
Amount: $136,001
Investigators: Gautam Das (Principal Investigator), Nan Zhang (Co-Principal Investigator)
Collaborator: Surajit Chaudhuri, Microsoft Research
Participating Students:
Arjun Dasgupta, PhD student, University of Texas at Arlington
Kirankumar Mehta, PhD student, George Washington University
Anirban Maiti, Master's student, University of Texas at Arlington
Abstract: Structured hidden databases are prevalent on the Web. They provide restricted, form-like search interfaces that let users specify desired attribute values of the sought-after tuples; the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. Although these search interfaces are designed with focused search queries in mind, for certain applications it may be advantageous to infer more aggregated views of the data from the returned results. Such aggregated information facilitates learning data distributions or building mining models, which can then be used to power and optimize a multitude of emerging data analytics applications.
This research develops effective techniques for performing data analytics, especially sampling, over hidden structured databases via their public interfaces. The outcomes include efficient algorithms for sampling hidden databases with a heterogeneous mix of data types, achievability results for sampling over different types of search interfaces, and a prototype toolset that demonstrates the sampling of real-world hidden databases. The research results of this project have a broader impact on the nation's higher education system and high-tech industries. The ability to pose high-level analytical queries over hidden databases is needed by knowledge workers in a wide variety of corporations, governments, and security agencies. Parts of this project will be integrated into teaching and carried out by students as part of advanced class projects, which may attract motivated students to pursue doctoral degrees.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in the materials listed here are those of the PIs and do not necessarily reflect the views of the National Science Foundation.