Building on years of research into data mining, we have developed an OLAP-based data mining system, DBMiner, in which OLAP mining supports not only data characterization but also other data mining functions, including association, classification, prediction, clustering, and sequencing. This integration increases the flexibility of mining and helps users find the knowledge they are looking for. In this talk, we introduce the concept of OLAP mining and discuss how OLAP mining should be implemented in a data mining system.
Jiawei Han received his Ph.D. from the University of Wisconsin, Madison, in 1985. He is Director of the Intelligent Database Systems Research Laboratory, and a Professor in the School of Computing Science, at Simon Fraser University in British Columbia, Canada. He has conducted research in the areas of data mining and data warehousing, deductive and object-oriented databases, spatial databases, multimedia databases, and logic programming, with over 100 journal and conference publications. He is an editor of IEEE Transactions on Knowledge and Data Engineering, the Journal of Intelligent Information Systems, and Data Mining and Knowledge Discovery: An International Journal. He has served or is currently serving on the program committees of over 30 international conferences and workshops, including ICDE'95 (Program Committee Vice-Chair), DOOD'95, ACM-SIGMOD'96, VLDB'96, KDD'96 (Program Co-Chair), CIKM'97, SSD'97, KDD'97, and ICDE'98.
Professor Chris Wallace received his PhD from the University of
Sydney in 1959. He is a Fellow of the Association for Computing
Machinery and the Australian Computer Society. His main research
interests are in information theory and computer architecture. Among
other distinguished contributions, Professor Wallace conceived and
developed (initially with D. Boulton) a new theory for multivariate
analysis based upon information and coding theory. This technique is
now embodied in a large computer program used by research workers in
several biological and social science disciplines in Australia and
overseas. The basic theory and technique have also been applied to the
testing and refinement of extremely complex hypotheses in archeology,
and is currently being developed (with J. Patrick and P.R. Freeman) as
a new and very general method for statistical and inductive inference.
Some of the research results have been published in the Machine
Learning journal (in 1993), the 1996 International Conference on
Machine Learning, and the 1997 International Joint Conference on
Artificial Intelligence.
Chris Wallace, Monash University, Australia
Intrinsic Classification with Spatial Correlation
The Snob program(s) implement a Minimum Message Length method for
intrinsic classification (or clustering or unsupervised classification
if you prefer). This method has been successfully used for over 25
years for a wide variety of problems and, with the very similar
Autoclass family of programs by Cheeseman, is perhaps the most
successful current approach. It does have limitations, one of which
is that the cases or things to be classified are treated as
INDEPENDENT random instances drawn from an unknown multi-class
population. This talk addresses a recent attempt to allow the method
to use prior knowledge of spatial relations among the things in
domains where it is reasonable to expect the class of a thing to be
correlated with the classes of its spatial neighbours. Two domains
are discussed. In one, the "things" to be classified are the backbone
sites of a protein, the hoped-for classes relate to the secondary
structure of the protein, and the "space" is the one-dimensional space
of the site sequence. Tim Edgoose has modified Snob to allow the
prior probability of the class of the next site to be derived from a
first-order Markov chain sensitive to the class of the previous site.
Snob itself learns the transition probabilities of the Markov chain.
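To make the role of the Markov-chain prior concrete, here is a minimal Python sketch (not Snob's or Edgoose's code) that counts the bits needed to encode a given sequence of site classes under two priors: independent draws from the overall class frequencies, and a first-order Markov chain whose transition probabilities are estimated from the sequence itself. The function names and the toy class labels H, E and C are hypothetical. The sketch also assumes simple maximum-likelihood frequency estimates and omits the cost of stating the parameters, which a full Minimum Message Length treatment would include.

```python
import math
from collections import Counter


def independent_code_length(labels):
    """Bits to encode a label sequence when each label is treated as an
    independent draw from the empirical class frequencies.
    Maximum-likelihood estimates; no cost is charged for the parameters."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def markov_code_length(labels):
    """Bits to encode the same (non-empty) sequence when each label's
    probability depends on the previous label (first-order Markov chain).
    The first label is coded with the empirical class frequencies; every
    later label uses a maximum-likelihood transition estimate."""
    n = len(labels)
    bits = -math.log2(Counter(labels)[labels[0]] / n)
    pair_counts = Counter(zip(labels, labels[1:]))  # (previous, next) counts
    from_counts = Counter(labels[:-1])              # how often each class is a "previous"
    for (prev, nxt), c in pair_counts.items():
        bits -= c * math.log2(c / from_counts[prev])
    return bits
```

For a sequence with long runs of one class, such as `"H" * 8 + "E" * 8 + "C" * 8`, the independent coding costs about 38 bits while the Markov coding costs about 10.3 bits. That saving is what a neighbour-sensitive prior can buy when classes really are correlated along the sequence.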
In the second domain, the "things" to be classified are the pixels of
a multi-spectral image, and the hoped-for classes might relate to
useful segments of the image, or (for a satellite image of the Earth)
terrain and vegetation type. It is reasonable to expect the classes
of neighbouring pixels to be correlated, probably positively.
Edgoose's approach does not readily generalize to the 2-dimensional
space of an image without the unrealistic assumption that pixel
correlations are tied to a raster scan. A more radical
modification of the Snob approach is needed and is described. Early
results are encouraging.
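The raster-scan objection can be checked by counting. The short Python sketch below illustrates the argument only. It is not the modified Snob method, which the abstract does not detail. It enumerates the 4-neighbour pixel pairs of a small grid and compares them with the consecutive pairs that a raster-scan Markov chain would link. The function name `grid_pairs` and the choice of 4-neighbour adjacency are assumptions made for this example.

```python
def grid_pairs(height, width):
    """Compare true 4-neighbour adjacencies in a height x width pixel grid
    with the consecutive pairs a raster-scan (row-by-row) chain would link.
    Pixels are numbered in raster order: index = row * width + col."""
    def idx(r, c):
        return r * width + c

    neighbours = set()
    for r in range(height):
        for c in range(width):
            if c + 1 < width:                      # horizontal neighbour
                neighbours.add((idx(r, c), idx(r, c + 1)))
            if r + 1 < height:                     # vertical neighbour
                neighbours.add((idx(r, c), idx(r + 1, c)))

    # A first-order chain along the raster scan links pixel i to pixel i + 1.
    raster = {(i, i + 1) for i in range(height * width - 1)}

    return {
        "neighbour_pairs": len(neighbours),
        "raster_pairs": len(raster),
        "raster_pairs_that_are_neighbours": len(raster & neighbours),
        "raster_pairs_across_row_wrap": len(raster - neighbours),
        "neighbour_pairs_missed_by_raster": len(neighbours - raster),
    }
```

For a 3-by-4 image it reports 17 neighbour pairs, but the raster chain links only 11 consecutive pairs. Of these, 9 are genuine horizontal neighbours and 2 join the end of one row to the start of the next. All 8 vertical neighbour pairs are missed.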