Projects for CS590D
Themes for Projects: These are just a sampler. You can work in groups of 2-3, depending on
the complexity/magnitude of the project.
- Enhancing the expressiveness of query languages by embedding "mining" functions in SQL etc.
This would entail adding user-defined functions/operators that call a data mining
routine, either written by you or available as a standard off-the-shelf component. It also
involves performance evaluation of your implementation.
Medium to Heavy Programming.
- Complete case study of a particular application as a domain for data mining. This would involve
feasibility analysis, data cleaning, deciding on the right tool/software to use, conducting data mining,
and reporting/visualization of the results. Light programming.
- Research into the design/improvement of a specific data mining algorithm, or delving into parallelization of an existing
algorithm. You would need to generate datasets for performance evaluation and conduct comparative experiments. Medium
Programming.
- Any ideas for theoretical/analytic work that lies at the intersection of one or more sub-areas/sub-disciplines
of data mining and its constituent topics.
- Integration of data mining systems into several kinds of "host" environments. Tight coupling of inductive functionality
into existing applications. Preferably using a commercial RDBMS or OLAP tool. Medium to Heavy programming.
- "Closing the loop" - Data mining is never a single step, but rather an iterative process involving several modules. Novel
ideas and suggestions for building a closed-loop data mining system are also welcome. Would involve
interfacing tools with one another. Medium programming.