Data Mining Assignment 1
Spring 2009
Submission Deadline: Wednesday, February 4, at 4:00pm
Experiments with C4.5
To get a copy of the C4.5 source code, download it from the following
website:
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/c4.5r8.tar.gz
- Download at least three databases from the UCI Machine Learning
Repository (http://www1.ics.uci.edu/~mlearn/MLSummary.html) and run
C4.5 on each of them. If necessary, format the .names file and create
a .test file for each database.
- Submit a report in plain text by e-mail with the following
information:
  - The names of the three databases you downloaded from the UCI
  Machine Learning Repository, and the number of files you downloaded
  for each.
  - The predictive accuracy of C4.5 on one of these databases under
  different combinations of the following switches: -g, -m, -c, and
  -s. Try at least two different values for each of -m and -c.
  - The best predictive accuracy from the above experiments and the
  switch settings that achieve it, together with the number of
  examples in each of the training and testing files. Note that the
  best accuracy might not require every switch to be present.
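
The preparation and experiment steps above could be organized along the
following lines. This is only a minimal sketch under stated assumptions,
not a required method: the helper names split_examples and c45_commands
are hypothetical, the 30% test fraction is an arbitrary choice, and the
-f (file stem) and -u (evaluate on the .test file) switches are not
mentioned in this assignment, so check them against the manual page
shipped with your C4.5 build. The sketch only builds the command lines;
it does not run C4.5 or parse its output.

```python
import itertools
import random


def split_examples(lines, test_fraction=0.3, seed=0):
    """Shuffle the examples and return (train, test) lists.

    `lines` are the rows of a UCI data file; blank lines are dropped.
    The test fraction and seed are illustrative assumptions.
    """
    examples = [ln for ln in lines if ln.strip()]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(examples)
    n_test = int(round(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]


def c45_commands(stem, m_values, c_values):
    """Build one argument list per combination of -g, -s, -m and -c.

    A value of None in m_values or c_values leaves that switch off,
    so the combinations include runs without it.
    """
    commands = []
    for use_g, use_s, m, c in itertools.product(
        (False, True), (False, True), m_values, c_values
    ):
        # Assumption: -f names the file stem, -u evaluates on stem.test.
        args = ["c4.5", "-f", stem, "-u"]
        if use_g:
            args.append("-g")
        if use_s:
            args.append("-s")
        if m is not None:
            args += ["-m", str(m)]
        if c is not None:
            args += ["-c", str(c)]
        commands.append(args)
    return commands


if __name__ == "__main__":
    # "mydata" is a placeholder stem; the -m and -c values are examples.
    for args in c45_commands("mydata", [None, 5], [None, 10]):
        print(" ".join(args))
```

Writing the two lists returned by split_examples to stem.data and
stem.test gives the example counts asked for in the report, and each
printed command line corresponds to one row of the accuracy results.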
Please e-mail questions to xwu@cs.uvm.edu.