Multi-Layer Induction in Large, Noisy Databases
Sponsor(s): U.S. Army Research Office, Grant No. DAAD19-02-1-0178.
Duration: June 1, 2002 - August 31, 2005.
Project Summary: This project plans to design a new data mining
algorithm, multi-layer induction, which divides a database into
subsets of approximately equal size, runs an existing induction
algorithm on the first subset to obtain a first set of rules, and then
processes each of the remaining data subsets one at a time by
incorporating the induction results from the previous subset(s).
This way, multi-layer induction will accumulate discovered rules from
each data subset at each layer and produce a final integrated output
which is expected to represent the original data more accurately.
Because partitioning spreads any noisy data in the original database
across the small data subsets, the effects of noise would be diluted
and induction efficiency can be increased.
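The layered procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual algorithm: the function names (`partition`, `induce_rules`, `multi_layer_induction`), the toy base learner that counts attribute-value/class co-occurrences, the merge step that simply accumulates those counts layer by layer, and the `min_conf` threshold are all hypothetical stand-ins, since the summary does not specify the base induction algorithm or how earlier results are incorporated.

```python
from collections import Counter, defaultdict


def partition(rows, k):
    """Split rows into k contiguous subsets whose sizes differ by at most one."""
    base, extra = divmod(len(rows), k)
    subsets, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        subsets.append(rows[start:start + size])
        start += size
    return subsets


def induce_rules(rows):
    """Toy base learner (assumption: stands in for an existing induction algorithm).

    Each row is a tuple of attribute values followed by a class label.
    Returns, for every (attribute index, value) condition, a Counter of
    the class labels observed with that condition.
    """
    stats = defaultdict(Counter)
    for *attrs, label in rows:
        for index, value in enumerate(attrs):
            stats[(index, value)][label] += 1
    return stats


def multi_layer_induction(rows, k, min_conf=0.6):
    """Process the subsets one layer at a time, accumulating rule statistics."""
    subsets = partition(rows, k)
    accumulated = induce_rules(subsets[0])      # layer 1: first subset only
    for subset in subsets[1:]:                  # remaining layers, one at a time
        layer_stats = induce_rules(subset)
        for condition, counts in layer_stats.items():
            # Assumed merge step: add this layer's counts to earlier results.
            accumulated[condition].update(counts)

    # Final integrated output: keep conditions whose majority class is
    # supported by at least min_conf of the accumulated observations.
    rules = {}
    for condition, counts in accumulated.items():
        label, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        if confidence >= min_conf:
            rules[condition] = (label, confidence)
    return rules
```

Calling `multi_layer_induction(rows, k=3)` on a list of `(attr_1, ..., attr_n, label)` tuples returns a dictionary mapping each retained `(attribute index, value)` condition to its predicted label and confidence. A condition that noise makes ambiguous in one subset can still be retained or rejected on the strength of the counts gathered across all layers.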
Research Team:
- Principal Investigator: Prof. Xindong Wu
- Research Assistant Professor: Dr. Xingquan Zhu
- Graduate Student: Qijun (Stella) Chen
Graduate Theses:
- Qijun Chen, Inductive Learning on
Partitioned Data, MSc Thesis, Department of Computer Science,
University of Vermont, 2004.
- Gong Chen, Mining Sequential Patterns Across Data Streams, MSc Thesis, Department of Computer Science,
University of Vermont, 2005.
Publications:
- Xingquan Zhu and Xindong Wu, "Class Noise Handling for
Effective Cost-Sensitive Learning by Cost-Guided Iterative
Classification Filtering", IEEE Trans. on
Knowledge and Data Engineering (TKDE), 18(2006), 10:
1435-1440. [abstract]
- Xingquan Zhu and Xindong Wu, "Bridging Local and Global Data
Cleansing: Identifying Class Noise in Large, Distributed Data
Datasets",
Data Mining and Knowledge Discovery (DMKD), 12(2006),
2: 275-308. [abstract]
- Xingquan Zhu, Ying Yang and Xindong Wu, "Effective Classification of Noisy Data Streams with Attribute-oriented Dynamic Classifier Selection
Perspective",
Knowledge and Information Systems (KAIS), vol.9, no.3, 2006,
339-363. [abstract]
- Xingquan Zhu and Xindong Wu, "Cost-Constrained Data Acquisition for Intelligent Data Preparation",
IEEE Trans. on Knowledge and Data Engineering (TKDE), vol.17,
no.11, 2005. [abstract]
- Xingquan Zhu, Ahmed K. Elmagarmid, Xiangyang Xue, Lide Wu, Ann C. Catlin.
"InsightVideo: Towards Hierarchical Video Content Organization for Efficient
Browsing, Summarization and Retrieval", IEEE
Trans. on Multimedia, vol.7, no.4, 2005. [abstract]
- Xingquan Zhu, Xindong Wu, Ahmed K. Elmagarmid, Zhe Feng, and
Lide Wu, "Video Data Mining: Semantic Indexing and Event
Detection from the Association Perspective", IEEE Trans. on
Knowledge and Data Engineering (TKDE), vol.17, no.5,
2005. [abstract]
- Xingquan Zhu and Xindong Wu, "Data Acquisition with Active Impact-Sensitive
Instance Selection", in Proceedings of the
16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI
2004), Boca Raton, FL, November 15-17, 2004. [abstract]
- Xingquan Zhu, Xindong Wu and Ying Yang, "Dynamic Classifier Selection for Effective Mining from Noisy Data Streams", in
Proceedings of the 4th IEEE
International Conference on Data Mining (ICDM 2004), Brighton, UK, November 1-4, 2004. [abstract]
- Xingquan Zhu and Xindong Wu, "Cost-guided Class Noise Handling for Effective Cost-sensitive Learning", in Proceedings of the
4th IEEE International Conference on Data Mining (ICDM 2004), Brighton,
UK, November 1-4, 2004. [abstract]
- Ying Yang, Xindong Wu and Xingquan Zhu, "Dealing with
Predictive-but-Unpredictable Attributes in Noisy Data Sources", in Proceedings of the 8th European
Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2004), Pisa, Italy, 2004. [abstract]
- Xingquan Zhu and Xindong Wu, "Class Noise vs Attribute Noise: A Quantitative Study of Their Impacts",
Artificial Intelligence Review 22 (3-4):177-210, November
2004. [abstract]
- Xingquan Zhu, Xindong Wu and Ying Yang, "Error Detection and Impact-sensitive
Instance Ranking in Noisy Datasets", in Proceedings of the 19th
National Conference on Artificial Intelligence (AAAI-04), July 25-29,
2004, San Jose, California. [abstract]
- Xingquan Zhu, Xindong Wu, Jianping Fan, Walid G. Aref, Ahmed
K. Elmagarmid, "Exploring Video Content Structure for
Hierarchical Summarization", Multimedia Systems, 10(2):98-115, 2004. [abstract]
- Xingquan Zhu, Xindong Wu and Qijun Chen, "Eliminating Class Noise in
Large Datasets", in Proceedings of the 20th International Conference on Machine Learning (ICML
2003), August 21-24, 2003, Washington D.C., 920-927. [abstract] [pdf]
- Xingquan Zhu and Xindong Wu, "Mining Video Association for Efficient
Database Management", in Proceedings of the 18th International Joint Conference
on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, August
12-15, 2003, 1422-1424. [abstract] [pdf]
- Shichao Zhang, Xindong Wu, and Chengqi Zhang, "Multi-Database Mining",
The IEEE Computational
Intelligence Bulletin, Volume 2, 5-13. [abstract] [pdf]
- Xingquan Zhu and Xindong Wu, "Sequential Association Mining for Video
Summarization", in Proceedings of the 2003 IEEE International Conference
on Multimedia & Expo (ICME 2003), Baltimore, MD, USA, July 6-9,
2003, Volume 3, 333-336. [abstract] [pdf]
- Xingquan Zhu, Jianping Fan, Ahmed K. Elmagarmid,
and Xindong Wu, "Hierarchical
Video Summarization and Content Description Joint Semantic and Visual Similarity",
Multimedia Systems, 9(2003), 1: 31-53. [abstract] [pdf].
Last modified: August 18, 2006.