Project 2: Classification
Start date 20 February; due 5 March at the beginning of class.
Your task for this project is to extend the ID3 classifier
(provided in the Weka package) to support postpruning.
Use the UCI Machine Learning Repository Iris and Adult datasets for this
task. You are welcome to try other datasets, but the results you turn in
should be based on these two.
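As one illustration of what postpruning can look like, the sketch below shows reduced-error pruning, which is only one of several possible methods and is not required. It assumes a hypothetical stand-alone tree type (TreeNode) with integer-coded nominal attributes and the class label in the last column of each row. It does not use Weka's Id3 internals, so you would need to adapt the idea to whatever tree representation your own class exposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal tree node; NOT Weka's internal Id3 representation. */
class TreeNode {
    int attribute = -1;   // index of the nominal attribute tested here; -1 marks a leaf
    int majorityClass;    // most frequent training class at this node
    final Map<Integer, TreeNode> children = new HashMap<>(); // attribute value -> subtree

    static TreeNode leaf(int cls) {
        TreeNode n = new TreeNode();
        n.majorityClass = cls;
        return n;
    }

    static TreeNode split(int attribute, int majorityClass) {
        TreeNode n = leaf(majorityClass);
        n.attribute = attribute;
        return n;
    }

    boolean isLeaf() { return attribute < 0; }

    /** Unseen attribute values fall back to this node's majority class. */
    int classify(int[] row) {
        if (isLeaf()) return majorityClass;
        TreeNode child = children.get(row[attribute]);
        return child == null ? majorityClass : child.classify(row);
    }
}

class ReducedErrorPruner {
    /** Counts misclassified rows; the last column of each row holds the class label. */
    static int errors(TreeNode n, List<int[]> rows) {
        int wrong = 0;
        for (int[] r : rows) {
            if (n.classify(r) != r[r.length - 1]) wrong++;
        }
        return wrong;
    }

    /** Bottom-up pruning: collapse a subtree when a leaf does no worse on the validation rows. */
    static void prune(TreeNode n, List<int[]> validation) {
        if (n.isLeaf()) return;
        // Prune each child first, using only the validation rows that reach it.
        for (Map.Entry<Integer, TreeNode> e : n.children.entrySet()) {
            List<int[]> subset = new ArrayList<>();
            for (int[] r : validation) {
                if (r[n.attribute] == e.getKey()) subset.add(r);
            }
            prune(e.getValue(), subset);
        }
        int subtreeErrors = errors(n, validation);
        int leafErrors = 0;
        for (int[] r : validation) {
            if (r[r.length - 1] != n.majorityClass) leafErrors++;
        }
        if (leafErrors <= subtreeErrors) {   // ties favor the smaller tree
            n.attribute = -1;
            n.children.clear();
        }
    }

    /** Toy tree: the root splits on attribute 0; its value-1 branch splits on attribute 1. */
    static TreeNode exampleTree() {
        TreeNode inner = TreeNode.split(1, 1);
        inner.children.put(0, TreeNode.leaf(1));
        inner.children.put(1, TreeNode.leaf(0));
        TreeNode root = TreeNode.split(0, 0);
        root.children.put(0, TreeNode.leaf(0));
        root.children.put(1, inner);
        return root;
    }

    /** Toy validation rows in the form {attribute 0, attribute 1, class label}. */
    static List<int[]> exampleValidation() {
        List<int[]> rows = new ArrayList<>();
        rows.add(new int[] {0, 0, 0});
        rows.add(new int[] {0, 1, 0});
        rows.add(new int[] {1, 0, 1});
        rows.add(new int[] {1, 1, 1});
        rows.add(new int[] {1, 1, 1});
        return rows;
    }

    public static void main(String[] args) {
        TreeNode root = exampleTree();
        List<int[]> validation = exampleValidation();
        System.out.println("errors before pruning: " + errors(root, validation));
        prune(root, validation);
        System.out.println("errors after pruning:  " + errors(root, validation));
    }
}
```

On the toy data, the inner split on attribute 1 is collapsed into a leaf because the leaf makes fewer validation errors, while the root split is kept. The validation rows should be held out from the data used to grow the tree; otherwise the pruning decision is biased toward keeping every split.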
Project Report
The project report should contain the following:
- Description of the method used (e.g., cost-based pruning).
- Documentation for how to use your class (should probably inherit from
weka.classifiers.trees.Id3).
- Sample run and results.
- Commentary: Does it work well (e.g., accuracy, efficiency)?
What do you think are the advantages/disadvantages?
If you were to do it again, what would you do differently?
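For the accuracy part of the commentary, a simple held-out comparison of the unpruned and pruned classifiers is one option. The sketch below is a minimal, library-free illustration. The class name HoldoutEvaluation, the integer-coded row format (class label in the last column), and the fixed-seed split are assumptions made for the example, not part of the assignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToIntFunction;

/** Hypothetical helper for comparing classifiers on held-out rows. */
class HoldoutEvaluation {
    /** Fraction of rows whose predicted class equals the label in the last column. */
    static double accuracy(ToIntFunction<int[]> classifier, List<int[]> rows) {
        if (rows.isEmpty()) return 0.0;
        int correct = 0;
        for (int[] r : rows) {
            if (classifier.applyAsInt(r) == r[r.length - 1]) correct++;
        }
        return (double) correct / rows.size();
    }

    /** Shuffles a copy of the rows with a fixed seed and returns [train, test]. */
    static List<List<int[]>> split(List<int[]> rows, double trainFraction, long seed) {
        List<int[]> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<int[]>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }
}
```

Reporting the accuracy of both the unpruned and pruned trees on the same test rows, together with tree size and running time, makes the tradeoff discussion concrete. Weka also ships its own weka.classifiers.Evaluation class, which supports cross-validation and may be more convenient once your classifier plugs into Weka.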
Also turn in your code (obviously).
Scoring
Scoring will be based on:
- Correctness of execution (1-2 points)
- Quality of interface defined (1-2 points)
- Quality of documentation (1-2 points)
- Quality/readability of code (1 point)
- Difficulty of pruning method used (1 point)
- Demonstration of understanding of tradeoffs/issues (1 point)
Turning in the project
Electronic submission is preferred. Please use the turnin command
(on mentor.ics.purdue.edu: turnin -c cs490d -p proj2 directoryname).
If that doesn't work, you can tar/zip and email to
.
PDF is the safest format for capturing non-text content.
Hard copy is acceptable; please hand it in at the beginning of class.
