ICDM 2003 Accepted Papers

We had a total of 501 papers submitted to ICDM this year, from which
58 regular papers, 61 short papers, and 9 industry-track papers were
selected for presentation.
Research-Track Regular Papers
-
R229 Bin Zhang, "Regression Clustering"
-
R240 Dmitry Pavlov, "Sequence Modeling with Mixtures of Conditional
Maximum Entropy Distributions"
-
R253 Fengzhan Tian and Yuchang Lu, "Learning Bayesian Networks from
Incomplete Data Based on EMI Method"
-
R256 Levon Lloyd and Steven Skiena, "Parsing without a Grammar:
Making Sense of Unknown File Formats"
-
R267 Jian Pei, Xiaoling Zhang, Moonjung Cho, Haixun Wang, and Philip
S. Yu, "MaPle: A Fast Algorithm for Maximal Pattern-based
Clustering"
-
R281 Jessica Lin, Eamonn Keogh, and Wagner Truppel, "Clustering of
Streaming Time Series is Meaningless: Implications for Previous and
Future Research"
-
R297 Saharon Rosset and Einat Neumann, "Integrating Customer Value
Considerations into Predictive Modeling"
-
R314 Huidong Jin, Man-Leung Wong, and Kwong-Sak Leung, "Scalable
Model-based Clustering by Working on Data Summaries"
-
R338 Zehang Sun, George Bebis, and Ronald Miller, "Evolutionary
Gabor Filter Optimization with Application to Vehicle Detection"
-
R339 Raymond Chan, Qiang Yang, and Yi-Dong Shen, "Mining High
Utility Itemsets"
-
R341 Wei Fan, Haixun Wang, Philip Yu, and Sheng Ma, "Is random
model better? Its accuracy and efficiency"
-
R342 Raymond Chi-Wing Wong, Ada Wai-Chee Fu, and Ke Wang, "MPIS:
Maximal-Profit Item Selection with Cross-Selling Considerations"
-
R347 Shou-de Lin and Hans Chalupsky, "Unsupervised Link Discovery
in Multi-relational Data via Rarity Analysis"
-
R356 Yongqiao Xiao, Jenq-Foung Yao, Zhigang Li, and Margaret
Dunham, "Efficient Data Mining for Maximal Frequent Subtrees"
-
R370 Jiuyong Li and Yanchun Zhang, "Generate Interesting Rules
Directly"
-
R373 Xingzhi Sun, Maria E. Orlowska, and Xue Li, "Introducing
Uncertainty into Pattern Discovery in Temporal Event Sequences"
-
R380 Mukund Deshpande, Michihiro Kuramochi, and George Karypis,
"Frequent Sub-Structure-Based Approaches for Classifying Chemical
Compounds"
-
R387 Robert Munro, Sanjay Chawla, and Pei Sun, "Complex Spatial
Relationships"
-
R390 Noriaki Kawamae, "Semantic Log Analysis Based on A User's
Query Behavior Model"
-
R401 Ran Wolff and Assaf Schuster, "Association Rule Mining in
Peer-to-Peer Systems"
-
R405 Ran Wolff, Assaf Schuster, and Dan Trock, "A High-Performance
Distributed Algorithm for Mining Association Rules"
-
R419 Jinze Liu and Wei Wang, "OP-Cluster: Clustering by Tendency in
High Dimensional Space"
-
R432 Jörg Walter, Jörg Ontrup, and Helge Ritter, "Interactive
Visualization and Navigation in Large Data Collections using the
Hyperbolic Space"
-
R433 Fabien De Marchi and Jean-Marc Petit, "Zigzag: a new algorithm
for mining large inclusion dependencies in databases"
-
R435 Jeremy Kubica and Andrew Moore, "Probabilistic Noise
Identification and Data Cleaning"
-
R442 Jeremy Kolter and Marcus Maloof, "Dynamic Weighted Majority: A
New Ensemble Method for Tracking Concept Drift"
-
R449 Qiang Yang and Hong Cheng,
"Mining Plans for Customer-Class Transformation"
-
R459 Akihiro Inokuchi and Hisashi Kashima, "Mining Significant
Pairs of Patterns from Graph Structures with Class Labels"
-
R462 Hua-Jun Zeng, Xuan-Hui Wang, Zheng Chen, and Wei-Ying Ma,
"CBC: Clustering Based Text Classification Requiring Minimal Labeled
Data"
-
R472 Chihli Hung and Stefan Wermter, "A Dynamic Adaptive
Self-Organising Hybrid Model for Text Clustering"
-
R484 Jieping Ye, Ravi Janardan, Cheong Hee Park, and Haesun Park,
"A new optimization criterion for generalized discriminant analysis on
undersampled problems"
-
R493 Petre Tzvetkov, Xifeng Yan, and Jiawei Han, "TSP: Mining Top-K
Closed Sequential Patterns"
-
R502 Sau Dan Lee and Luc De Raedt, "An Algebra for Inductive Query
Evaluation"
-
R522 Alexander Topchy, Anil Jain, and William Punch, "Combining
Multiple Weak Clusterings"
-
R527 Amihood Amir, Reuven Kashi, and Nathan Netanyahu, "Efficient
Multidimensional Quantitative Hypotheses Generation"
-
R528 Francesco Bonchi, Fosca Giannotti, Alessio Mazzanti, and Dino
Pedreschi, "ExAMiner: Optimized Level-wise Frequent Pattern Mining
with Monotone Constraints"
-
R535 Taneli Mielikäinen, "Change Profiles"
-
R542 Kang Peng, Slobodan Vucetic, Bo Han, Hongbo Xie, and Zoran
Obradovic, "Exploiting Unlabeled Data for Improving Accuracy of
Predictive Data Mining"
-
R553 Shi Zhong and Joydeep Ghosh, "Model-based Clustering with Soft
Balancing"
-
R557 Guizhen Yang, Saikat Mukherjee, and I. V. Ramakrishnan, "On
Precision and Recall of Multi-Attribute Data Extraction from
Semistructured Sources"
-
R558 Olfa Nasraoui, Cesar Cardona, Carlos Rojas, and Fabio
Gonzalez, "Mining Evolving Clusters in Noisy Data with a Scalable
Immune System Learning Model"
-
R560 Yinghui Yang and Balaji Padmanabhan, "Segmenting Customer
Transactions Using a Pattern-Based Clustering Approach"
-
R565 Robert Gwadera, Mikhail Atallah, and Wojciech Szpankowski,
"Reliable Detection of Episodes in Event Sequences"
-
R575 Cheong Hee Park and Haesun Park,
"Efficient Nonlinear Dimension Reduction for Clustered Data Using
Kernel Functions"
-
R577 Srujana Merugu and Joydeep Ghosh, "Privacy-preserving
Distributed Clustering using Generative Models"
-
R587 Qi Li, Jieping Ye, and Chandra Kambhamettu, "Spatial Interest
Pixels (SIPs): Useful Low-Level Features of Visual Media Data"
-
R588 Lewis Frey, Douglas Fisher, Ioannis Tsamardinos, Constantin
Aliferis, and Alexander Statnikov, "Identifying Markov Blankets with
Decision Tree Induction"
-
R598 Jeonghee Yi, Tetsuya Nasukawa, Razvan Bunescu, and Wayne
Niblack, "Sentiment Analyzer: Extracting Sentiments About A Given
Topic Using Natural Language Processing Techniques"
-
R618 J Elble, C Heeren, and L Pitt, "Optimized Disjunctive
Association Rules via Sampling"
-
R619 Bianca Zadrozny, John Langford, and Naoki Abe, "Cost-sensitive
learning by cost-proportionate example weighting"
-
R620 Hillol Kargupta, Souptik Datta, Qi Wang, and Krishnamoorthy
Sivakumar, "On the Privacy Preserving Properties of Random Data
Perturbation Techniques"
-
R622 Alexandrin Popescul, Lyle Ungar, Steve Lawrence, and David
Pennock, "Statistical Relational Learning for Document Mining"
-
R631 Eren Manavoglu, Dmitry Pavlov, and C. Lee Giles, "Probabilistic
User Behavior Models"
-
R637 Shusaku Tsumoto, "Visualization of Rules Similarity using
Multidimensional Scaling"
-
R654 Hui Xiong, Pang-Ning Tan, and Vipin Kumar, "Mining Strong
Affinity Association Patterns in Data Sets with Skewed Support
Distribution"
-
R670 Bing Liu, Xiaoli Li, Wee Sun Lee, and Philip Yu, "Building
Text Classifiers Using Positive and Unlabeled Data"
-
R676 Einoshin Suzuki, Takeshi Watanabe, Hideto Yokoi, and Katsuhiko
Takabayashi, "Detecting Interesting Exceptions from Medical Test Data
with Visual Summarization"
-
R688 Aleksandar Lazarevic, Ramdev Kanapady, Chandrika Kamath, Vipin
Kumar, and Kumar Tamma, "Localized Prediction of Continuous Target
Variables Using Hierarchical Clustering"
Research-Track Short Papers
-
R211 Frederic Maire, "Balancing Board Machines"
-
R244 Jiwen Guan and David Bell, "Rough set theory finding maximal
association rules in mining for keyword co-occurrences"
-
R259 Man Lung Yiu and Nikos Mamoulis, "Frequent-Pattern based
Iterative Projected Clustering"
-
R268 Joarder Kamruzzaman and Ruhul Sarker, "SVM based models for
predicting foreign currency exchange rates"
-
R269 Reda Alhajj and Mehmet Kaya, "Integrating Fuzziness into OLAP
for Multidimensional Fuzzy Association Rules Mining"
-
R288 Jin Huang and Charles Ling, "Accuracy vs AUC: Comparing Naive
Bayes, Decision Trees and SVM"
-
R290 Mehmet Kaya and Reda Alhajj, "Facilitating Fuzzy Association
Rules Mining by Using Multi-Objective Genetic Algorithms for Automated
Clustering"
-
R298 Chun-Nan Hsu, Hao-Hsiang Chung, and Han-Shen Huang, "The
Hybrid Poisson Aspect Model for Personalized Shopping Recommendation"
-
R306 Yun Chi, Yirong Yang, and Richard R. Muntz, "Indexing and
Mining Free Trees"
-
R311 Jinyan Li and Huiqing Liu, "Ensembles of Cascading Trees"
-
R320 Kai Ming Ting and Regina Jing Ying Quek, "Model
Stability: A key factor in determining whether an algorithm produces
an optimal model from a matching distribution"
-
R348 Longin Jan Latecki, Rajagopal Venugopal, Marc Sobel, and Steve
Horvath, "Tree-structured Partitioning Based on Splitting Histograms
of Distances"
-
R350 Raz Tamir and Reinhard Rapp, "Mining the Web to Discover the
Meanings of an Ambiguous Word"
-
R352 Lawrence Hall and Kevin Bowyer, "Comparing Pure Parallel
Ensemble Creation Techniques against Bagging"
-
R355 Hwanjo Yu, "General MC: Estimating Boundary of Positive Class
from Small Positive Data"
-
R358 Ping Chen, Chenyi Hu, Wei Ding, Heloise Lynn, and Simon Yves,
"Icon-based Visualization of Large High-Dimensional Datasets"
-
R360 Tao Li, "Using Discriminant Analysis for Multi-class
Classification"
-
R368 Stanley Oliveira and Osmar Zaiane, "Protecting Sensitive
Knowledge By Data Sanitization"
-
R381 Juan Velasquez, Hiroshi Yasuda, and Terumasa Aoki, "Combining
the web content and usage mining to understand the visitor behavior in
a web site"
-
R382 Jyh-Jong Tsay, "Enhancing Techniques for Efficient Topic
Hierarchy Integration"
-
R399 Frans Coenen, Paul Leng, and Ahmed Shakil, "T-trees, Vertical
Partitioning and Distributed Association Rule Mining"
-
R403 Lemuel Waitman, Douglas Fisher, and Paul King, "Bootstrapping
Rule Induction"
-
R406 Rajaraman Kanagasabai and Ah-Hwee Tan, "Mining Semantic
Networks for Knowledge Discovery"
-
R437 Andreas Hotho, Steffen Staab, and Gerd Stumme, "Ontologies
Improve Text Document Clustering"
-
R438 Sriharsha Veeramachaneni and Paolo Avesani, "Active Sampling
for Feature Selection"
-
R443 Arkadiusz Wojna, "Center-Based Indexing for Nearest Neighbors
Search"
-
R451 Ed Heierman and Diane Cook, "Improving Home Automation by
Discovering Regularly Occurring Device Usage Patterns"
-
R452 Mark Krogel and Tobias Scheffer, "Effectiveness of Information
Extraction, Multi-Relational, and Semi-Supervised Learning for Mining
Microarray Data"
-
R457 Hongwei Zhu and Otman Basir, "A K-NN Associated Fuzzy
Evidential Reasoning Classifier With Adaptive Neighbor Selection"
-
R465 Horia Nicolai Teodorescu and Lucian Iulian Fira, "A Hybrid
Data-Mining Approach in Genomics and Text Structures"
-
R469 Qiang Yang, Jie Yin, Charles Ling, and Tielin Chen,
"Postprocessing Decision Trees to Extract Actionable Knowledge"
-
R486 Pasi Fränti, Olli Virmajoki, and Ville Hautamäki, "Fast
PNN-Based Clustering Using K-Nearest Neighbor Graph"
-
R494 Julien Blanchard, Fabrice Guillet, and Henri Briand, "A
user-driven and quality-oriented visualization for mining association
rules"
-
R496 Jeremy Kubica, Andrew Moore, and Jeff Schneider, "Tractable
Group Detection on Large Data Sets"
-
R503 Daniel Keim, Stephen North, Christian Panse, and Mike Sips,
"PixelMaps: A New Visual Data Mining Approach for Analyzing Large
Spatial Data Sets"
-
R512 Matthew V. Mahoney and Philip K. Chan, "Learning Rules for
Anomaly Detection of Hostile Network Traffic"
-
R516 Francois Fouss, Jean-Michel Renders, and Marco Saerens, "Links
between Kleinberg's hubs and authorities, correspondence analysis, and
Markov Chains"
-
R517 Hanchuan Peng and Chris Ding, "Structure Search and Stability
Enhancement of Bayesian Networks"
-
R525 Tassos Argyros and Charis Ermopoulos, "Efficient Subsequence
Matching in Time Series Databases Under Time and Amplitude
Transformations"
-
R537 Amihood Amir, Reuven Kashi, Daniel Keim, Nathan Netanyahu, and
Markus Wawryniuk, "Analyzing High-Dimensional Data by Subspace
Validity"
-
R541 Daniel Barbara, Carlotta Domeniconi, and Ning
Kang, "Mining Relevant Text from Unlabelled Documents"
-
R547 Keke Chen and Ling Liu, "Validating and Refining Clusters via
Visual Rendering"
-
R551 Zhaohui Zheng, Rohini Srihari, and Sargur Srihari, "A Feature
Selection Framework for Text Filtering"
-
R555 Chang-Tien Lu, Dechang Chen, and Yufeng Kou, "Algorithms for
Spatial Outlier Detection"
-
R556 Ching-Huang Yun, Kun-Ta Chuang, and Ming-Syan Chen,
"Clustering Item Data Sets with Association-Taxonomy Similarity"
-
R568 Michele Sebag and Jerome Aze, "Evolutionary Optimization of
the ROC Curve: Application to Medical Data Mining"
-
R572 Young-Koo Lee, Won-Young Kim, Y. Dora Cai, and Jiawei Han,
"CoMine: Efficient Mining of Correlated Patterns"
-
R586 Ricardo Vilalta, Murali-Krishna Achari, and Christoph Eick,
"Class Decomposition Via Clustering: A New Framework For Low-Variance
Classifiers"
-
R604 Doina Caragea, Dianne Cook, and Vasant Honavar, "Towards
Simple, Easy-to-Understand, yet Accurate Classifiers"
-
R606 Huseyin Polat and Wenliang Du, "Privacy-Preserving
Collaborative Filtering using Randomized Perturbation Techniques"
-
R610 Tomoyuki Shibata, Takekazu Kato, and Toshikazu Wada, "K-D
Decision Tree: An Accelerated and Memory Efficient Nearest Neighbor
Classifier"
-
R617 Jennifer Neville, David Jensen, and Brian Gallagher, "Simple
Estimators for Relational Bayesian Classifiers"
-
R621 Peng Zhang, Jing Peng, and Carlotta Domeniconi,
"Dimensionality Reduction Using Kernel Pooled Local Discriminant
Information"
-
R629 Jun Huan, Wei Wang, and Jan Prins, "Efficient Mining of
Frequent Subgraph in the Presence of Isomorphism"
-
R634 Shusaku Tsumoto, "Pattern Discovery based on Rule Induction
and Taxonomy Generation"
-
R641 Aijun An, Shakil Khan, and Xiangji Huang, "Objective and
Subjective Algorithms for Grouping Association Rules"
-
R645 Matthew Otey, Adriano Veloso, Chao Wang, Srinivasan
Parthasarathy, and Wagner Meira Jr., "Incremental Techniques for
Mining Dynamic and Distributed Databases"
-
R657 Inderjit Dhillon and Yuqiang Guan, "Information Theoretic
Clustering of Sparse Co-Occurrence Data"
-
R673 Yuefeng Li and Ning Zhong, "Interpretations of Association
Rules by Granular Computing"
-
R677 James Bailey, Thomas Manoukian, and Kotagiri Ramamohanarao, "A
Fast Algorithm for Computing Hypergraph Transversals and its
Application in Mining Emerging Patterns"
-
R684 Sameer Pradhan, Kadri Hacioglu, Wayne Ward, James Martin, and
Dan Jurafsky, "Semantic Role Parsing: Adding Semantic Structure to
Unstructured Text"
Industry-Track Papers
-
I203 Mingkun Li, Shuo Feng, Ishwar Sethi, Jason Luciow, and Keith
Wagner, "Mining Production Data with Neural Network & CART"
-
I208 Kaidi Zhao, Bing Liu, Tom Tirpak, and Andreas Schaller,
"Detecting Patterns of Change Using Enhanced Parallel Coordinate
Visualization"
-
I213 Steve Selvaggio, Zach Zakharian, Jutta Kreyss, and Michael
White, "Text Mining for a Clear Picture of Defect Reports: A Praxis
Report"
-
R220 Rajat Gupta, B.V.L. Narayana, P. Krishna Reddy, G.V. Ranga Rao,
C.L.L. Gowda, Y.U.R. Reddy, and G. Rama Murthy, "Understanding
Helicoverpa armigera Pest Population Dynamics related to Chickpea Crop
Using Neural Networks"
-
R285 Frank Dellmann, Holger Wulff, and Stefan Schmitz, "Statistical
Analysis of Web Log Files of a German Automobile Producer: Findings
from a Practical Project Concerning Web Usage Mining"
-
R303 Qinghua Guo, Maggi Kelly, and Catherine Graham, "One-class
support vector machines for predicting distribution of Sudden Oak
Death in California"
-
R402 Phuong Minh Tu, Doheon Lee, and Kwang-Hyung Lee, "Regulatory
element discovery using tree-structured models"
-
R531 Byung-Hoon Park, George Ostrouchov, Gong-Xin Yu, Al Geist,
Andrey Gorin, and Nagiza Samatova, "Inference of Protein-Protein
Interactions by Unlikely Profile Pair"
-
R574 Choh Man Teng, "Applying Noise Handling Techniques to Genomic
Data: A Case Study"