List of Projects

Ranking and Top-k Queries

  • Searching Structured Objects



    (Gautam Das, Muhammed Miah, Lekhendro Singh, Arjun Dasgupta)

    Recent years have seen a proliferation of data objects on the web for virtually every domain of interest. These objects are stored in a variety of repositories, or data sources, ranging from unstructured web pages to semistructured and highly structured (relational) databases accessible through form-based interfaces. While web search engines have enjoyed unprecedented success for unstructured text data, ad-hoc search for information across structured data sources is still a nascent field holding enormous promise for the future. Our vision is to develop innovative and principled frameworks and methodologies for building Search Engines for Structured Objects.



  • Attribute Ordering

    (Gautam Das, Nishant Kapoor)

    In recent years there has been a great deal of interest in developing effective techniques for ad-hoc search and retrieval in structured repositories such as relational databases, e.g., searching online databases of homes, used cars, and electronic goods. In many of these applications the user experiences "information overload", which occurs when the system responds to an under-specified query by returning an overwhelming number of tuples, each displayed with a large number of features (or attributes). We have developed a search and retrieval system that tackles this problem from two angles. First, we show how to automatically rank and display the top-n most relevant tuples. Our ranking functions are either based on traditional distance-based metrics or use probabilistic information retrieval principles that learn user preferences by exploiting past query workloads. Second, our system offers techniques for ordering the attributes of the returned tuples in decreasing order of "usefulness" and selects only a few of the most useful attributes to display. We have built demos of the system on a used-car dataset and a homes-for-sale dataset. User surveys have shown that our system improves the user's query experience.
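    The two steps described above can be illustrated with a minimal sketch. It is not the system's actual ranking or ordering function: it assumes a normalized L1 distance for the distance-based ranking, and it uses the number of distinct values an attribute takes among the returned tuples as a stand-in "usefulness" score. The names `rank_tuples`, `order_attributes`, `ranges`, and the sample data are hypothetical.

```python
def rank_tuples(rows, query, ranges, n):
    """Return the n rows closest to the query under a normalized L1 distance.

    `ranges` gives a scale per attribute so that price and mileage are
    comparable (an assumption of this sketch)."""
    def distance(row):
        return sum(abs(row[a] - v) / ranges[a] for a, v in query.items())
    return sorted(rows, key=distance)[:n]


def order_attributes(rows, attributes, m):
    """Keep the m attributes that take the most distinct values in `rows`.

    Distinct-value count is only a stand-in for "usefulness": an attribute
    that is identical across all returned rows tells the user little."""
    def distinct_count(attr):
        return len({row[attr] for row in rows})
    # sorted() is stable, so ties keep their original attribute order.
    return sorted(attributes, key=lambda a: -distinct_count(a))[:m]


cars = [
    {"id": 1, "price": 9000, "mileage": 60000, "year": 2003, "doors": 4},
    {"id": 2, "price": 12000, "mileage": 30000, "year": 2005, "doors": 4},
    {"id": 3, "price": 7000, "mileage": 90000, "year": 2001, "doors": 4},
    {"id": 4, "price": 9500, "mileage": 55000, "year": 2003, "doors": 4},
]
query = {"price": 9000, "mileage": 50000}
ranges = {"price": 5000, "mileage": 60000}

top = rank_tuples(cars, query, ranges, n=2)
shown = order_attributes(top, ["price", "mileage", "year", "doors"], m=2)
print([car["id"] for car in top], shown)
```

    In this example the two returned cars share the same year and number of doors, so only price and mileage are kept for display.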


  • Top-k over web services

    (Gautam Das, Arjun Dasgupta)

    Traditional web services offer users limited query capability: users can only ask the questions supported by the front-end interface. This can be improved by a middleware layer that queries the back-end database through the web service and returns a filtered selection of results to the user. This project tackles problems related to filtering and scheduling web responses from single or multiple sources.

  • Top-k over Distributed Databases




    (Gautam Das, Amrita Tamrakar)

    Unlike the centralized setting, we consider autonomous databases that are distributed across a network, with data replication and hybrid fragmentation among them. The goal is to build a search engine that retrieves the top-k matching results from these databases using efficient top-k algorithms.
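    As a rough illustration of the setting, the sketch below merges local top-k answers from several sites. It is a simplification and not the project's algorithm: it assumes horizontal fragmentation only (each site holds complete tuples), that replicas share an `id`, and that every site can evaluate the same scoring function. Under those assumptions any tuple in the global top-k must also appear in the local top-k of each site that stores it, so merging the local answers is sufficient. The function names and sample data are hypothetical.

```python
import heapq


def local_top_k(fragment, score, k):
    """Top-k rows of one site's fragment, best first."""
    return heapq.nlargest(k, fragment, key=score)


def global_top_k(fragments, score, k):
    """Merge per-site top-k answers into a global top-k.

    Replicated tuples are collapsed by id before the final selection."""
    candidates = {}
    for fragment in fragments:
        for row in local_top_k(fragment, score, k):
            candidates[row["id"]] = row  # a replica overwrites its twin
    return heapq.nlargest(k, candidates.values(), key=score)


site_a = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.4}, {"id": 3, "score": 0.7}]
site_b = [{"id": 3, "score": 0.7}, {"id": 4, "score": 0.8}, {"id": 5, "score": 0.1}]

result = global_top_k([site_a, site_b], score=lambda row: row["score"], k=2)
print([row["id"] for row in result])
```

    Vertical or hybrid fragmentation, where a tuple's attributes are split across sites, would need a different strategy, since no single site can compute the full score.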



  • Top-k Queries over Exact and Fuzzy Data

    (Gautam Das, Bhushan Chaudhari)

    We initiated research on the anytime behavior of top-k algorithms on exact and fuzzy data. Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In particular, given a specific top-k algorithm, we were interested in studying its progress towards identifying the correct result at any point during its execution. We adopted a probabilistic approach in which, at any point, we report the scores of the top-k results the algorithm has identified so far and associate a confidence with this prediction. Such functionality is a valuable asset when one wants to reduce the runtime cost of top-k computations. We showed analytically that this probability and confidence are monotone in expectation, and we presented a thorough experimental evaluation that validates our techniques on both synthetic and real data sets.
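    The idea of inspecting a top-k algorithm while it runs can be sketched as follows. The description above does not name a specific algorithm, so this sketch uses the well-known Threshold Algorithm with a sum aggregate as a stand-in, and it does not reproduce the probabilistic confidence estimate: it only exposes the current top-k and the deterministic threshold bound after each round. The name `anytime_top_k` and the sample data are hypothetical.

```python
import heapq


def anytime_top_k(objects, k):
    """Threshold Algorithm that yields its intermediate state.

    `objects` maps an id to a tuple of per-attribute scores; the aggregate
    score is their sum. After each round of sorted access this yields
    (depth, current_top_k, threshold, final), where `final` is True once
    the k-th best score seen is at least the threshold."""
    m = len(next(iter(objects.values())))
    # One list per attribute, sorted by that attribute in descending order.
    lists = [
        sorted(objects, key=lambda o, i=i: objects[o][i], reverse=True)
        for i in range(m)
    ]
    seen = {}
    for depth in range(len(objects)):
        for i in range(m):
            obj = lists[i][depth]
            seen[obj] = sum(objects[obj])  # random access for the full score
        # No unseen object can score higher than this bound.
        threshold = sum(objects[lists[i][depth]][i] for i in range(m))
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        final = len(top) == k and top[-1][1] >= threshold
        yield depth + 1, top, threshold, final
        if final:
            return


objects = {
    "a": (0.9, 0.8),
    "b": (0.7, 0.9),
    "c": (0.3, 0.95),
    "d": (0.95, 0.1),
    "e": (0.2, 0.2),
}
steps = list(anytime_top_k(objects, k=2))
for depth, top, threshold, final in steps:
    print(depth, [obj for obj, _ in top], final)
```

    In this example the correct answer is already in hand after the second round, but the threshold only certifies it after the third. That gap between finding the answer and being able to guarantee it is where an anytime confidence estimate becomes useful.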


  • Attribute Recommendation

    (Gautam Das, Muhammed Miah)

    Recommending Top-m Features/Attributes For Product Sellers

    When advertising a product in the e-marketplace, it is important that the advertisement be attractive to potential buyers and that the product stand out against competing products in the market. We wish to design methods that assist the seller in deciding which attributes of the product (e.g., product features, keywords, etc.) should be emphasized or recommended when preparing the advertisement. We have developed algorithms for several variants of the problem across different application domains, e.g., car and home sales, product advertising in newspapers, creating catchy titles for an article, and discovering useless attributes of an object (e.g., a homebuilder can find out that "adding a fireplace does not make the home more desirable in this market").
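    One way to make the top-m selection concrete is sketched below. The objective is an assumption of this sketch, not a statement of the project's formulation: past buyer queries are modeled as sets of required attributes, and the seller picks the m attributes of the product that satisfy the largest number of those queries. The search is brute force over all size-m subsets, which is only practical for small attribute sets. The name `best_attributes` and the sample query log are hypothetical.

```python
from itertools import combinations


def best_attributes(product_attrs, query_log, m):
    """Pick m of the product's attributes that satisfy the most past queries.

    A query (a set of required attributes) is satisfied when every attribute
    it asks for is among the advertised ones."""
    m = min(m, len(product_attrs))

    def satisfied(chosen):
        return sum(1 for query in query_log if query <= chosen)

    candidates = (frozenset(c) for c in combinations(sorted(product_attrs), m))
    return max(candidates, key=satisfied)


product = {"ac", "sunroof", "leather", "turbo", "gps"}
query_log = [
    {"ac"},
    {"ac", "gps"},
    {"sunroof"},
    {"ac", "gps"},
    {"leather", "turbo"},
    {"gps"},
    {"heated seats"},  # asks for something the product does not have
]

best = best_attributes(product, query_log, m=2)
print(sorted(best))
```

    With this log, advertising "ac" and "gps" satisfies four of the seven past queries, more than any other pair.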

Recent Publications
  • The STAR system, by Nishant Kapoor, Gautam Das, Vagelis Hristidis, S. Sudarshan, and Gerhard Weikum, will be demonstrated at the upcoming ICDE 2007 conference.
  • Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Information Retrieval Approach for Ranking of Database Query Results. To appear in ACM Transactions on Database Systems, 2006.
  • Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views. VLDB 2006.
  • Gautam Das, Vagelis Hristidis, Nishant Kapoor and S. Sudarshan. Ordering the Attributes of Query Results. SIGMOD 2006.
  • The paper "Answering Top-k Queries Using Views" by Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Dimitris Tsirogiannis was presented at VLDB 2006.
  • The paper "Ordering the Attributes of Query Results", by Gautam Das, Vagelis Hristidis, Nishant Kapoor and S. Sudarshan has been accepted by SIGMOD 2006.