CS 590D Course Information

Data mining has emerged as one of the most exciting and dynamic fields in computer science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the data mining software market alone has grown to the point where it is expected to exceed $5 billion by the end of this year.

Simply stated, data mining refers to a family of techniques used to detect 'interesting' nuggets of relationships and knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.

There have been several success stories in this relatively young area: the SKICAT system for automatic cataloguing of sky surveys (JPL), the Advanced Scout system for mining NBA data (IBM), the QuakeFinder system for geoscientific data mining (UCLA/JPL) and the PYTHIA system for mining information from performance evaluation of scientific software (Purdue). Almost all major vendors, such as Microsoft and IBM, have jumped onto the data mining bandwagon, and Kluwer Academic Publishers has started a new journal specifically devoted to this topic.

CS 590D is designed to provide graduate students with a broad background in the practice and use of serial, distributed and parallel data mining algorithms and tools, along with specialized expertise in applying these ideas to a real-life situation. Case studies will be provided using practical examples of data mining systems. The projects will hone students' skills in programming, database systems, parallel computing and the use of mining tools in a database environment.

Prerequisites: Background knowledge of databases and algorithms. Graduate student standing or consent of instructor(s).