University of Vermont
Computer Science Technical Report CS-05-04
Mining Sequential Patterns Across Data Streams
by
Gong Chen, Xindong Wu, and Xingquan Zhu
Abstract
Extensive effort has gone into mining frequent items or itemsets
in a single data stream, but few efforts have been made to explore
sequential patterns among literals in different data streams. In this
paper, we define the challenging problem of mining frequent sequential
patterns across multiple data streams. We propose an efficient
algorithm, MILE (MIning from muLtiple strEams), to
manage the mining process. The proposed
algorithm recursively uses knowledge of existing patterns to
speed up the mining of new patterns. We also apply a state-of-the-art
sequential pattern mining algorithm, PrefixSpan, which was designed for
transaction databases, to solve our problem. Extensive empirical
results show that MILE is significantly faster than PrefixSpan. One
unique feature of our algorithm is that when some prior knowledge of the
data distribution in the data streams is available, it can be
incorporated into the mining process to further improve the
performance of MILE. As MILE consumes more memory than PrefixSpan, we
also propose a solution to balance memory usage and time
efficiency in memory-limited environments.
Last updated: March 1, 2005