List of Projects
Data Mining (Gautam Das, Muhammed Miah)
We are involved in several data mining projects, particularly on similarity models for categorical databases and time-series databases. Similarity between complex objects is a central notion in data mining, with applications in segmentation and clustering. We are also involved in mining interesting generalizations of market-basket databases (i.e., binary, or 0/1, databases).
-
Similarity in Categorical Databases: Traditional similarity measures such as Euclidean distance are often inadequate for categorical data, where there is no natural numeric notion of distance. For example, in market-basket databases, two products (such as Coke and Pepsi) can be essentially similar, but how do we discover this similarity automatically? We have explored notions of context-sensitive similarity for such categorical data.
-
Time-Series Similarity: Similarity problems involving time-series data are equally interesting. For example, we want to quickly answer questions such as "which stocks are similar to stock X over the last two weeks?", and also automatically infer rules such as "if stock X goes up and stock Y remains the same, then stock Z will shortly go down". We have developed similarity and retrieval/indexing models that are more sophisticated than traditional Euclidean distance models.
-
Mining of Market-Basket Databases: In this project we investigate interesting data mining problems on the widely applicable market-basket databases (binary databases). For example, we investigate interesting generalizations such as mining chains of binary relations, and we develop efficient algorithms for correlation discovery and data compression in binary databases.
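As a rough illustration of the Coke and Pepsi example above, the following Python sketch shows one possible reading of context-sensitive similarity: two items are treated as similar when the other items bought alongside them are similar. This is only a sketch under stated assumptions, not the models developed in the projects. The function names, the toy baskets, and the choice of cosine similarity over co-occurrence counts are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def context_vector(baskets, item):
    """Count how often every other item appears in baskets containing `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(other for other in basket if other != item)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_similarity(baskets, a, b):
    """Similarity of items a and b based only on what is bought with them."""
    ctx_a = context_vector(baskets, a)
    ctx_b = context_vector(baskets, b)
    # Ignore direct co-purchases of a and b; only the shared context counts.
    ctx_a.pop(b, None)
    ctx_b.pop(a, None)
    return cosine(ctx_a, ctx_b)


# Toy market-basket data (hypothetical): each basket is a set of items.
baskets = [
    {"coke", "chips", "salsa"},
    {"pepsi", "chips", "salsa"},
    {"coke", "chips"},
    {"pepsi", "chips"},
    {"milk", "bread"},
    {"milk", "cereal"},
]

print(context_similarity(baskets, "coke", "pepsi"))
print(context_similarity(baskets, "coke", "milk"))
```

In this toy data, Coke and Pepsi never appear in the same basket, yet they score as highly similar because both are bought with chips and salsa, whereas Coke and milk share no context and score zero.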
Recent Publications
-
Pauli Miettinen, Taneli Mielikäinen, Aristides Gionis, Gautam Das,
Heikki Mannila. The Discrete Basis Problem. Best Paper,
PKDD 2006.
-
Foto Afrati, Gautam Das, Aris Gionis, Heikki
Mannila, Taneli Mielikäinen, Panayiotis Tsaparas. Mining Chains of
Relations. ICDM 2005.
-
Chotirat Ann Ratanamahatana, Jessica Lin, Dimitrios
Gunopulos, Eamonn Keogh, Michail Vlachos, Gautam Das. Mining
Time Series Data. In O. Maimon and L. Rokach (eds.), Data Mining and
Knowledge Discovery Handbook: A Complete Guide for Practitioners and
Researchers, Kluwer Academic Publishers, 2005.