CS 590D Course Info.
Data mining has emerged as one of the most exciting and dynamic fields in computer
science. The driving force for data mining is the presence of petabyte-scale online
archives that potentially contain valuable bits of information hidden
in them. Commercial enterprises have been quick to recognize the value of
this concept; consequently, within the span of a few years, the software
market for data mining alone is expected to exceed
$5 billion by the end of this year.
Simply stated, data mining refers to a family of techniques
used to detect `interesting' nuggets of relationships/knowledge in data.
While the theoretical underpinnings of the field have been
around for quite some time (in the form of pattern recognition, statistics, data analysis
and machine learning), the practice and use of these techniques
have been largely ad hoc. With the availability of large databases to store, manage
and assimilate data, the new thrust of data mining lies at the intersection of database systems,
artificial intelligence and algorithms that efficiently analyze data.
The distributed nature of several databases, their size and the high complexity
of many techniques present interesting computational challenges.
There have been several success stories in this relatively young area:
the SKICAT system for automatic cataloguing of sky surveys (JPL), the Advanced Scout system for mining NBA
data (IBM), the QuakeFinder system for geoscientific data mining (UCLA/JPL) and the PYTHIA system
for mining information from performance evaluation of scientific software (Purdue). Almost all major vendors,
such as Microsoft and IBM, have jumped onto the data mining bandwagon, and Kluwer Academic Publishers
has started a new journal specifically devoted to this topic.
CS 590D is designed to provide graduate students with a broad background in the practice
and use of serial, distributed and parallel data mining algorithms and tools, along with specialized
expertise in applying these ideas to real-life situations. Case studies will be provided using
practical examples of data mining systems. The projects will hone students' skills in programming,
database systems, parallel computing and the use of mining tools in a database
environment.
Prerequisites: Background knowledge of databases and algorithms. Graduate student standing or
consent of instructor(s).