List of Projects

Sampling and Approximate Queries

 

·         Approximate Query Processing

(Gautam Das, Arjun Dasgupta, Zubin Joseph)

In many OLAP and decision-support environments, it is often desirable to answer complex, long-running aggregate queries approximately, provided that an estimate of the error is also given. For example, when a sales manager asks, "Give me the aggregate sales of Product X, grouped by US state," he or she is probably not interested in answers accurate to the nearest cent. We approach this difficult problem using statistical sampling-based techniques. Our objective is to propose practical solutions that require minimal changes to the underlying DBMS.
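As a rough illustration of the general idea (not the project's actual method), the sketch below answers a grouped SUM from a uniform random sample and reports an approximate 95% confidence half-width for each group. The function name `approximate_group_sums`, the `(state, amount)` row layout, and the normal-approximation error bound are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def approximate_group_sums(rows, sample_size, z=1.96, seed=0):
    """Estimate SUM(amount) GROUP BY state from a uniform random sample.

    rows: list of (state, amount) tuples standing in for a sales table
          (a hypothetical layout chosen for this sketch).
    Returns {state: (estimate, half_width)}, where half_width is an
    approximate confidence half-width (z=1.96 corresponds to ~95%).
    Requires 2 <= sample_size <= len(rows).
    """
    n_total = len(rows)
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)  # uniform, without replacement

    # For each group g, define y_i = amount if row i belongs to g, else 0.
    # Only the sum and sum of squares of the non-zero y_i are needed.
    sums = defaultdict(float)
    sq_sums = defaultdict(float)
    for state, amount in sample:
        sums[state] += amount
        sq_sums[state] += amount * amount

    # Finite population correction: the error vanishes as the sample
    # approaches the full table.
    fpc = 1.0 - sample_size / n_total

    result = {}
    for state in sums:
        mean = sums[state] / sample_size
        var = (sq_sums[state] - sample_size * mean * mean) / (sample_size - 1)
        var = max(var, 0.0)  # guard against tiny negative rounding error
        estimate = n_total * mean  # scale the sample mean up to the table
        half_width = z * n_total * math.sqrt(fpc * var / sample_size)
        result[state] = (estimate, half_width)
    return result
```

Groups that happen to be absent from the sample are simply missing from the result, which is one reason a uniform sample can serve small groups poorly.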

·         P2P sampling

 



(Gautam Das, Zubin Joseph)

This project focuses on sampling and statistics gathering in unstructured peer-to-peer networks. We are currently working on the DiVE-DeeP project (Distinct Value Estimation with Duplicates across Peers), which deals specifically with distinct value estimation when data is duplicated across the peers in the network. We hope to extend this work to related problems, such as approximating the amount of duplication in the network in order to determine trends in the popularity of data.

 

·         Data Analytics over hidden databases

Recent Publications

  • Surajit Chaudhuri, Gautam Das, Vivek Narasayya. Optimized Stratified Sampling for Approximate Query Processing. To appear in ACM Transactions on Database Systems 2007.
  • Arjun Dasgupta, Gautam Das, Heikki Mannila. A random walk approach to sampling Hidden Databases. To appear in SIGMOD 2007.
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos and Vana Kalogeraki. Approximating Aggregations in Peer-to-Peer Databases. HDMS 2006.
  • Gautam Das. Approximate Query Processing. Tutorial, SBBD 2005.
  • Gautam Das. Sampling Methods in Approximate Query Answering Systems. Invited Book Chapter, Encyclopedia of Data Warehousing and Mining. Editor John Wang, Information Science Publishing, 2005.
  • Gautam Das. Approximate Query Processing Techniques. Invited Tutorial at the 11th International Conference on Management of Data (COMAD) 2005.
