PAKDD-98 Invited Speakers

Jiawei Han, Simon Fraser University, Canada

ACSys Keynote Talk: OLAP Mining: An Integration of Data Mining and Data Warehousing Technologies

Data mining and data warehousing are two important database applications with great potential. OLAP mining is a mechanism that integrates on-line analytical processing (OLAP) with data mining, so that OLAP and mining can be interleaved and mining can be performed on different portions of a data warehouse and at different levels of abstraction, at the user's fingertips. With the rapid development of data warehouse and OLAP technologies in the database industry, developing OLAP mining mechanisms is a promising direction.

Building on our years of research into data mining, we have developed an OLAP-based data mining system, DBMiner, in which OLAP mining serves not only data characterization but also other data mining functions, including association, classification, prediction, clustering, and sequencing. Such an integration increases the flexibility of mining and helps users find desired knowledge. In this talk, we introduce the concept of OLAP mining and discuss how OLAP mining should be implemented in a data mining system.
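As a rough illustration of the interleaving described in this abstract, the sketch below rolls a tiny fact table up to a chosen level of abstraction, slices one portion of it, and then runs a simple characterization step on that slice. It is not DBMiner code: the fact table, the city-to-country hierarchy, and the function names `roll_up`, `slice_location`, and `characterize` are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Melbourne": "Australia",
    "Sydney": "Australia",
}

# Hypothetical fact table: (city, product, units_sold).
SALES = [
    ("Vancouver", "laptop", 5),
    ("Toronto", "laptop", 3),
    ("Toronto", "phone", 4),
    ("Melbourne", "phone", 6),
    ("Sydney", "laptop", 2),
]


def roll_up(rows, level):
    """OLAP step: total units per (location, product) at the given level."""
    cube = Counter()
    for city, product, units in rows:
        location = city if level == "city" else CITY_TO_COUNTRY[city]
        cube[(location, product)] += units
    return cube


def slice_location(cube, location):
    """OLAP step: keep only the cells that belong to one location."""
    return {
        product: units
        for (loc, product), units in cube.items()
        if loc == location
    }


def characterize(cells):
    """Mining step: describe a slice by each product's share of total units."""
    total = sum(cells.values())
    return {product: units / total for product, units in cells.items()}


# The same mining step applied at two levels of abstraction.
canada_profile = characterize(slice_location(roll_up(SALES, "country"), "Canada"))
toronto_profile = characterize(slice_location(roll_up(SALES, "city"), "Toronto"))
```

The same `characterize` call works on the country-level slice and on the city-level slice, which is the sense in which the OLAP operations and the mining step can be interleaved. A real system would substitute association, classification, or clustering for the toy mining step.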

Jiawei Han received his Ph.D. from the University of Wisconsin, Madison, in 1985. He is Director of the Intelligent Database Systems Research Laboratory, and a Professor in the School of Computing Science, at Simon Fraser University in British Columbia, Canada. He has conducted research in the areas of data mining and data warehousing, deductive and object-oriented databases, spatial databases, multimedia databases, and logic programming, with over 100 journal and conference publications. He is an editor of IEEE Transactions on Knowledge and Data Engineering, the Journal of Intelligent Information Systems, and Data Mining and Knowledge Discovery: An International Journal. He has served or is currently serving on the program committees of over 30 international conferences and workshops, including ICDE'95 (Program Committee Vice-Chair), DOOD'95, ACM-SIGMOD'96, VLDB'96, KDD'96 (Program Co-Chair), CIKM'97, SSD'97, KDD'97, and ICDE'98.

Chris Wallace, Monash University, Australia

Intrinsic Classification with Spatial Correlation

The Snob programs implement a Minimum Message Length method for intrinsic classification (or clustering, or unsupervised classification, if you prefer). This method has been used successfully for over 25 years on a wide variety of problems and, together with the very similar Autoclass family of programs by Cheeseman, is perhaps the most successful current approach. It does have limitations, one of which is that the cases or things to be classified are treated as INDEPENDENT random instances drawn from an unknown multi-class population. This talk addresses a recent attempt to allow the method to use prior knowledge of spatial relations among the things in domains where it is reasonable to expect the class of a thing to be correlated with the classes of its spatial neighbours. Two domains are discussed. In one, the "things" to be classified are the backbone sites of a protein, the hoped-for classes relate to the secondary structure of the protein, and the "space" is the one-dimensional space of the site sequence. Tim Edgoose has modified Snob to allow the prior probability of the class of the next site to be derived from a first-order Markov chain sensitive to the class of the previous site. Snob itself learns the transition probabilities of the Markov chain. In the second domain, the "things" to be classified are the pixels of a multi-spectral image, and the hoped-for classes might relate to useful segments of the image or, for a satellite image of the Earth, to terrain and vegetation type. It is reasonable to expect the classes of neighbouring pixels to be correlated, probably positively. Edgoose's approach does not readily generalize to the two-dimensional space of an image unless one makes the unrealistic assumption that the pixel correlations are tied to a raster scan. A more radical modification of the Snob approach is needed and is described. Early results are encouraging.
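To make the one-dimensional case concrete, the sketch below shows how a first-order Markov chain can supply the class prior for each site in a sequence. It is not Snob and does not use Minimum Message Length: the transition probabilities are fixed by hand rather than learned, the per-site likelihoods are made-up numbers, and the function name `forward_class_posteriors` is hypothetical. The computation is ordinary forward filtering, assumed here as a stand-in for the idea.

```python
def forward_class_posteriors(likelihoods, transition, initial):
    """Filtered class probabilities for a sequence of sites.

    likelihoods[t][k]: probability of the data at site t under class k.
    transition[j][k]:  probability that a site is class k given that the
                       previous site is class j (first-order Markov chain).
    initial[k]:        prior class probabilities for the first site.
    """
    n_classes = len(initial)
    posteriors = []
    prior = list(initial)
    for site_likelihood in likelihoods:
        # Combine the chain-derived prior with the evidence at this site.
        unnormalised = [prior[k] * site_likelihood[k] for k in range(n_classes)]
        total = sum(unnormalised)
        posterior = [u / total for u in unnormalised]
        posteriors.append(posterior)
        # Prior for the next site: push this posterior through the chain.
        prior = [
            sum(posterior[j] * transition[j][k] for j in range(n_classes))
            for k in range(n_classes)
        ]
    return posteriors


# Two hypothetical classes with "sticky" transitions.
transition = [[0.9, 0.1], [0.2, 0.8]]
initial = [0.5, 0.5]
# The second site carries no evidence of its own (equal likelihoods).
likelihoods = [[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]]
posteriors = forward_class_posteriors(likelihoods, transition, initial)
```

With independent sites, the second site would be assigned equal class probabilities because its own data are uninformative. With the chain prior it inherits a preference for the first class from its neighbour, which is the kind of spatial correlation the abstract describes.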

Professor Chris Wallace received his PhD from the University of Sydney in 1959. He is a Fellow of the Association for Computing Machinery and the Australian Computer Society. His main research interests are in information theory and computer architecture. Among other distinguished contributions, Professor Wallace conceived and developed (initially with D. Boulton) a new theory for multivariate analysis based upon information and coding theory. This technique is now embodied in a large computer program used by research workers in several biological and social science disciplines in Australia and overseas. The basic theory and technique have also been applied to the testing and refinement of extremely complex hypotheses in archaeology, and are currently being developed (with J. Patrick and P.R. Freeman) as a new and very general method for statistical and inductive inference. Some of the research results have been published in the Machine Learning journal (in 1993), the 1996 International Conference on Machine Learning, and the 1997 International Joint Conference on Artificial Intelligence.

Bhavani Thuraisingham, MITRE Corporation, USA

Data Warehousing, Data Mining, and Security

Having a data warehouse for managing data is becoming a necessity for many enterprises. Several organizations are building their own data warehouses. Commercial database system vendors are marketing data warehousing products. In addition, some companies are specializing in developing data warehouses. The idea behind a data warehouse is that it is often cumbersome to access data from multiple and possibly heterogeneous databases: several processing modules need to cooperate with each other to process a query in a heterogeneous environment. A data warehouse therefore brings together the essential data from these diverse data sources, so that users need to query only the warehouse. A data warehouse also contains information such as summary reports and aggregates, which are determined by the applications using the warehouse and the queries posed.

A related technology, data mining, is used to convert the data in the warehouse into useful information. That is, data mining is the process of posing a series of appropriate queries to extract information, often previously unknown, from large quantities of data in the database or the data warehouse. Data mining technology is a combination of various other technologies, including machine learning, database management, statistics, and parallel processing.

This presentation will focus on security aspects of data warehousing and mining. Data warehousing security issues include security architectures, integrating multiple security policies for the warehouse, the inference problem, and administering and auditing the warehouse. Data mining security issues include preventing unauthorized disclosure of information from mining, as well as privacy issues for data mining. On the other hand, data mining techniques could also be used to help with security, including auditing and intrusion detection. The presentation will cover these various aspects of security for both warehousing and mining.

Dr. Bhavani Thuraisingham is a Senior Principal Engineer with the MITRE Corporation's Advanced Information Systems Center, where she heads the Data and Information Management Department. She also heads the Corporate Initiative on Evolvable Interoperable Information Systems and in this position is responsible for the initiatives on data management, real-time systems, object technology and architectures, software reengineering, and economics analysis as they relate to information systems evolution and interoperation. She is currently working on real-time database management, data mining/knowledge discovery related to data security, and distributed object management technology. She is also a Director of MITRE's Database Specialty Group. Dr. Thuraisingham is a recipient of the IEEE Computer Society's 1997 Technical Achievement Award for her work on secure distributed database systems. She has two US patents on inference control and has published over forty journal articles. She serves on the editorial board of IEEE Transactions on Knowledge and Data Engineering and is a Senior Member of IEEE.