Knowledge Acquisition from Databases


by Xindong Wu (Monash University, Australia),
1995

Publisher: Ablex Publishing Corporation, 55 Old Post Road No. 2, P.O. Box 5297, Greenwich, CT 06831-0504, USA. (Old Address: 355 Chestnut Street, Norwood, New Jersey 07648, USA.) Telephone: +1-203-661-7602; Fax: +1-203-661-0792

ISBN 1-56750-206-7 (cloth cover); 1-56750-205-9 (paper cover).


Knowledge acquisition from databases is a research frontier for both database technology and machine learning, and has seen sustained research in recent years. It also acts as a link between the two fields, offering a dual benefit. First, because database technology has already found wide application in many fields, machine learning research stands to gain from this greater exposure and established technological foundation. Second, machine learning techniques can augment the ability of existing database systems to represent, acquire, and process bodies of expertise, such as those that form part of the semantics of many advanced applications, for example computer-aided design (CAD) and computer-aided manufacturing (CAM).

This book contains three parts. Part I surveys the area of knowledge acquisition from databases and identifies some of the major problems. Part II provides an overview of symbolic methods in machine learning and describes two types of rule induction algorithms that facilitate the acquisition of knowledge from databases: the decision tree-based ID3-like algorithms and the extension matrix-based induction algorithms. The author's own HCV induction algorithm, based on the newly developed extension matrix approach, is described as a counterpart to ID3-like algorithms. Two practical issues in knowledge acquisition from databases, noise handling and the processing of real-valued attributes, are addressed in detail. A performance comparison of different learning algorithms (ID3, C4.5, NewID, and HCV) is also provided in terms of rule compactness and accuracy on a battery of experimental data sets, including the MONK's problems, three well-known classification problems. Finally, Part III describes, with examples, an intelligent learning database system, KEshell2, which makes use of the HCV algorithm and couples machine learning techniques with database and knowledge base technology.

The parts of the book have different but interrelated objectives and suit different levels of readership. Part II can be adopted as an inductive learning module in an artificial intelligence (AI) related undergraduate or postgraduate course. Part III can be integrated into a machine learning or advanced database course. Together with the brief overview in Part I, the book as a whole should be of interest to the intelligent databases and machine learning communities and to students in machine learning, expert systems, and advanced database courses. Knowledge acquisition from databases could well form an independent honors or postgraduate course in a computer science or information systems program, for which this book could be adopted as a textbook.

The book is based on the author's papers and reports produced over the past few years. A short PostScript file containing the table of contents is available.


Copyright (c) 1995 Xindong Wu (Email: xwu@cs.uvm.edu)