Security Issues in Data Mining |
Tuesdays and Thursdays, 9:00-10:15 |
Heavilon Hall 123 |
Chris Clifton |
Email: |
Data mining, the discovery of new and interesting patterns in large datasets, is an exploding field. Recently there has been a realization that data mining has an impact on security (including a workshop on Data Mining for Security Applications.) One aspect is the use of data mining to improve security, e.g., for intrusion detection. A second aspect is the potential security hazards posed when an adversary has data mining capabilities.
This seminar will explore the field of data mining from a security perspective. My goal is that on completing the course you will have a solid background in the area, such that you will be ready to pursue research on some aspect of data mining security.
The course will begin with a tutorial on data mining. The contents and scope of this tutorial will depend on the background and preparation of the students. The bulk of the course will concentrate on exploring recent advances in the field through investigation of the research literature.
The workload in the course will be as follows:
There may be additional/alternative work assigned, especially during the tutorial portion of the class.
Ideally, students in the course would have a good background in data mining, some database experience, a knowledge of probability and statistics, and a good background in computer security. However, I doubt many students will have such a background. What I consider a reasonable set of prerequisites is two of the following three:
Permission of instructor is of course a sufficient prerequisite. If you are interested in the course, but do not have two of the above three (or you are unsure if you have sufficient background), please email me with why you are interested and what you consider to be your relevant background.
Please read the above link to the policy written by Professor Spafford. This will be followed unless I provide written documentation of exceptions.
Late work will only be accepted in case of documented emergency (e.g., medical emergency), or by prior arrangement if doing the work in advance is impossible due to fault of the instructor (e.g., you can't do a review early because the paper hasn't been assigned yet.)
Reviews should be an independent analysis of the paper - collusion between reviewers is poor practice. Therefore I ask that reviewers of a paper not discuss the paper with the other reviewers before writing their own review. This will help bring a healthy difference of opinion into classroom discussions. One exception to this: If you are presenting a paper and have difficulty understanding it, you are encouraged to talk to the people reviewing the paper to see if they have insights that may help you in your presentation.
Evaluation will be a subjective process; however, it will be based primarily on your understanding of the material as evidenced in:
I will evaluate presentations and reviews on a five point scale:
If the number of students is in the right range (allowing between two and three classroom presentations for each student), you will have the option of doing a final project: a research proposal for work in this area. This will be done instead of presentations and reviews in the final two weeks of the course. Students opting not to write a proposal will have one additional presentation and additional review (during the final two weeks) giving equal opportunity to demonstrate knowledge of the material.
Note: The time may be changed to 7:30-8:45 or 4:30-5:45 if there are no conflicts. This would be done to get a room where the lectures can be videotaped - I'd like you to have a chance to see yourself presenting. This will be done later in the term, if necessary. For now, the 9:00-10:15 slot is the one we will use.
Please add yourself to the course mailing list. Send mail to mailer@cs.purdue.edu containing the line:
add your email to cs590m
You may want to use the Purdue Libraries proxy server to get on-line access to more papers.
Date | Paper | Presenter | Reviewers |
---|---|---|---|
9/11 | Charles Elkan, KDD Cup '99 | Chris Clifton (slides) | Jaideep Shrikant Vaidya, Amit J. Shirsat |
Wenke Lee, Sal Stolfo. ``Data Mining Approaches for Intrusion Detection'' In Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998. | Murat Kantarcioglu | Addam Schroll, Eirik Herskedal, Ann-Sofie Nystrom | |
9/13 | James Cannady. The Application of Artificial Neural Networks to Misuse Detection: Initial Results First International Workshop on the Recent Advances in Intrusion Detection (RAID98), September 14-16, 1998, Louvain-la-Neuve, Belgium. | Evimaria Dimitrios Terzi | Rajeev Gopalkrishna, Xiaodong Lin |
Wenke Lee, Sal Stolfo, and Kui Mok. ``A Data Mining Framework for Building Intrusion Detection Models'' In Proceedings of the 1999 IEEE Symposium on Security and Privacy, Oakland, CA, May 1999 | Pat Gorman | Benjamin Lee, Mohamed Galal Elfeky, James Joshi | |
9/18 | Wenke Lee, Sal Stolfo, and Kui Mok. ``Mining Audit Data to Build Intrusion Detection Models'' In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD '98), New York, NY, August 1998 | Rajeev Gopalakrishna | James Joshi, Murat Kantarcioglu, Evimaria Dimitrios Terzi |
Wenke Lee, Sal Stolfo, and Kui Mok. ``Mining in a Data-flow Environment: Experience in Network Intrusion Detection'' In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '99), San Diego, CA, August, 1999 | Benjamin Lee | Mohamed Galal Elfeky, Pat Gorman | |
9/20 | Filippo Neri, Mining TCP/IP Traffic for Network Intrusion Detection by Using a Distributed Genetic Algorithm, in Proceedings of the 11th European Conference on Machine Learning, Barcelona, Catalonia, Spain, May 31-June 2, 2000. (on reserve in the Math Library) | Eirik Herskedal | Jaideep Shrikant Vaidya, Xiaodong Lin |
R. Lippman and S. Cunningham, ``Improving Intrusion Detection Performance using Keyword Selection and Neural Networks'', In Proceedings of the Second International Workshop on Recent Advances in Intrusion Detection (RAID99), September 7-9, 1999, West Lafayette, Indiana. | Ann-Sofie Nystrom | Addam Schroll, Amit J. Shirsat | |
9/25 | Wenke Lee, Sal Stolfo, and Kui Mok, ``Adaptive Intrusion Detection: a Data Mining Approach'' Artificial Intelligence Review, Kluwer Academic Publishers, 14(6):533-567, December 2000. | Addam Schroll | Pat Gorman, |
Wenke Lee and Sal Stolfo, A Framework for Constructing Features and Models for Intrusion Detection Systems ACM Transactions on Information and System Security 3(4) (November 2000). | Jaideep Shrikant Vaidya | Xiaodong Lin, Evimaria Dimitrios Terzi | |
9/27 | Stefanos Manganaris, Marvin Christensen, Dan Zerkle, Keith Hermiz, ``A Data Mining Analysis of RTID Alarms'', First International Workshop on the Recent Advances in Intrusion Detection (RAID98), September 14-16, 1998, Louvain-la-Neuve, Belgium. Better yet, see Computer Networks, Volume 34 for a later version. | Mohamed Galal Elfeky | James Joshi, Ann-Sofie Nystrom, Murat Kantarcioglu |
Daniel Barbará, Ningning Wu, Julia Couto, and Sushil Jajodia, ``Mining Unexpected Rules in Network Audit Trails'', journal article in preparation/review. | Amit J. Shirsat | Rajeev Gopalkrishna, Eirik Herskedal, Benjamin Lee | |
10/2 | Stefan Axelsson, ``The Base-Rate Fallacy and the Difficulty of Intrusion Detection'', In Proceedings of the 6th ACM Conference on Computer and Communications Security, pp. 1-7, November 1-4, 1999, Kent Ridge Digital Labs, Singapore. See also his licentiate of engineering thesis. | Xiaodong Lin | Amit J. Shirsat, Murat Kantarcioglu, Eirik Herskedal, Rajeev Gopalkrishna |
Wenke Lee, Wei Fan, Matt Miller, Sal Stolfo, and Erez Zadok ``Toward Cost-Sensitive Modeling for Intrusion Detection and Response'' to appear in Journal of Computer Security, 2001. | James Joshi | Evimaria Dimitrios Terzi, Jaideep Shrikant Vaidya, Benjamin Lee, Pat Gorman | |
10/4 | Corinna Cortes, Kathleen Fisher, Daryl Pregibon and Anne Rogers, ``Hancock: a language for extracting signatures from data streams'', Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining August 20 - 23, 2000, Boston, MA USA. | Kathleen Fisher | Mohamed Galal Elfeky, Ann-Sofie Nystrom |
10/9 | October Break. Students presenting or turning in a review for the papers on October 11 will not be expected to do one the week of Thanksgiving vacation. I would like to have at least six students presenting/reviewing on October 11 and November 20, so if you have a preference for which week you have "off", get your requests in early. | ||
10/11 | Xinzhou Qin, Wenke Lee, Lundy Lewis and Joao B. D. Cabrera ``Using MIB II Variables For Network Anomaly Detection: A Feasibility Study'' Workshop on Data Mining for Security Applications. | Pat Gorman | Benjamin Lee, Amit J. Shirsat |
Jianxiong Luo and Susan M. Bridges, ``Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection'', International Journal of Intelligent Systems 15(8), 2000. Pages: 687-70 | Murat Kantarcioglu | Jaideep Shrikant Vaidya, Mohamed Galal Elfeky, Ann-Sofie Nystrom | |
10/16 | Zheng Zhang, Jun Li, Constantine Manikopoulos, Jay Jorgenson and Jose Ucles ``HIDE: a Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification'', 2001 IEEE Man Systems and Cybernetics Information Assurance Workshop. | Robert Gwadera | Addam Schroll, Murat Kantarcioglu |
Discussion of the progress of a research project from first publication to Ph.D. For additional reading see Wenke Lee, Sal Stolfo, and Phil Chan, ``Learning Patterns from Unix Process Execution Traces for Intrusion Detection'', AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, July 1997; and Wenke Lee's dissertation. | Ben Kuperman | (none) | |
10/18 | Oliver M. Dain and Robert K. Cunningham, ``Fusing Heterogeneous Alert Streams into Scenarios'', Workshop on Data Mining for Security Applications. | Addam Schroll | Eirik Herskedal, Evimaria Dimitrios Terzi, Amit J. Shirsat, James Joshi, Pat Gorman, Rajeev Gopalkrishna |
Maloof, M.A. and Michalski, R.S., ``A Partial Memory Incremental Learning Methodology and its Application to Intrusion Detection'' Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, 1995. (For more details on the learning algorithm used, see: Maloof, M.A. and Michalski, R.S., ``Selecting examples for partial memory learning'', Machine Learning 41:27-52, 2000.) | Chris Clifton | Mohamed Galal Elfeky, Jaideep Shrikant Vaidya, Xiaodong Lin, Ann-Sofie Nystrom | |
10/23 | Nong Ye and Xiangyang Li, ``A Scalable Clustering Technique for Intrusion Signature Recognition'' 2001 IEEE Man Systems and Cybernetics Information Assurance Workshop, West Point, NY, June 5-6, 2001. | Evimaria Dimitrios Terzi | Benjamin Lee, James Joshi |
Leonid Portnoy, Eleazar Eskin, and Sal Stolfo ``Intrusion detection with unlabeled data using clustering'' Workshop on Data Mining for Security Applications. (Leonid Portnoy's thesis is also available.) | Xiaodong Lin | , | |
10/25 | Matthew G. Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J. Stolfo, ``Data Mining Methods for Detection of New Malicious Executables'', IEEE Symposium on Security and Privacy, Oakland, CA, May 2001. | Mohamed Galal Elfeky | Pat Gorman, Murat Kantarcioglu, Addam Schroll, Ann-Sofie Nystrom |
Christoph Michael and Anup K. Ghosh, ``Using Finite Automata to Mine Execution Data for Intrusion Detection: A Preliminary Report'', Third International Workshop on the Recent Advances in Intrusion Detection, October 2-4, 2000, Toulouse, France. (also available here). | Jaideep Shrikant Vaidya | Eirik Herskedal, Amit J. Shirsat, Rajeev Gopalkrishna | |
10/30 | O. de Vel, A. Anderson, M. Corney, and G. Mohay, ``Multi-Topic E-mail Authorship Attribution Forensics'' Workshop on Data Mining for Security Applications. | Ann-Sofie Nystrom | Pat Gorman, Addam Schroll |
Terran Lane and Carla E. Brodley, ``Temporal sequence learning and data reduction for anomaly detection'', ACM Transactions on Information and System Security 2(3) (Aug. 1999), Pages 295 - 331. | Benjamin Lee | , | |
11/1 | J. Hale, J. Threet, S. Shenoi, ``A Practical Formalism for Imprecise Inference Control'', Proceedings of the 8th IFIP WG11.3 Workshop on Database Security. | James Joshi | Eirik Herskedal, Evimaria Dimitrios Terzi, Mohamed Galal Elfeky |
S. Rath, D. Jones, J. Hale, S. Shenoi, ``A Tool for Inference Detection and Knowledge Discovery in Databases'', in Proceedings of the 9th IFIP WG11.3 Workshop on Database Security. | Amit J. Shirsat | Murat Kantarcioglu, Jaideep Shrikant Vaidya, Xiaodong Lin | |
11/6 | J. Hale and S. Shenoi, ``Analyzing FD Inference in Relational Databases'', Data and Knowledge Engineering Journal, vol. 18, pp. 167-183, 1996 | Eirik Herskedal | Benjamin Lee, James Joshi |
Chris Clifton and Don Marks, ``Security and Privacy Implications of Data Mining'', ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, Montreal, Canada, June 2, 1996. | Rajeev Gopalkrishna | Pat Gorman, Jaideep Shrikant Vaidya | |
11/8 | M. Atallah, E. Bertino, A. K. Elmagarmid, M. Ibrahim, and V. S. Verykios, ``Disclosure Limitation of Sensitive Rules'', In Proceedings of 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX'99) pp. 45-52, November 1999, Chicago, IL. | Amit J. Shirsat | Mohamed Galal Elfeky, Ann-Sofie Nystrom, Addam Schroll, Xiaodong Lin |
Chris Clifton ``Using Sample Size to Limit Exposure to Data Mining'', Journal of Computer Security 8(4), IOS Press, November 2000. | Chris Clifton | Rajeev Gopalkrishna, Evimaria Dimitrios Terzi | |
11/13 | T. D. Johnsten and V. V. Raghavan, ``Impact of decision-region based classification mining algorithms on database security'', In V. Atluri and J. Hale, editors, Research Advances in Database and Information Systems Security, pages 171-191. Kluwer Academic, Norwell, MA, 2000. (See also the conference preproceedings version for a slightly longer treatment: ``Impact of decision-region based classification mining algorithms on database security'', In Proc. of Thirteenth IFIP WG 11.3 Working Conference on Database Security, Seattle, WA, July 1999.) | Benjamin Lee | Pat Gorman, Eirik Herskedal |
T. D. Johnsten and V. V. Raghavan, ``Security Procedures for Classification Mining Algorithms'', Fifteenth Annual IFIP WG 11.3 Working Conference on Database and Application Security, Niagara on the Lake, Ontario, CANADA, July 15-18, 2001. | Chris Clifton | Addam Schroll, | |
11/15 | Y. Lindell and B. Pinkas, ``Privacy Preserving Data Mining'', In Crypto 2000, Springer-Verlag (LNCS 1880), pages 36-54, 2000. | Jaideep Shrikant Vaidya | Murat Kantarcioglu, James Joshi, Xiaodong Lin |
R. Agrawal and R. Srikant, "Privacy-Preserving Data Mining", Proc. of the ACM SIGMOD Conference on Management of Data, Dallas, May 2000. | Mohamed Galal Elfeky | Ann-Sofie Nystrom, Evimaria Dimitrios Terzi, Amit J. Shirsat, Rajeev Gopalkrishna | |
11/20 | Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman and Cheng Yang ``Finding Interesting Associations without Support Pruning'', in Proceedings of the 16th International Conference on Data Engineering, 28 February - 3 March, 2000, San Diego, California. | Murat Kantarcioglu | Xiaodong Lin, |
Yucel Saygin, Vassilios S. Verykios, and Chris Clifton, ``Using Unknowns to Prevent Discovery of Association Rules'', Submitted to ACM SIGMOD Record special issue on Data Mining and Security. | James Joshi | Evimaria Dimitrios Terzi, Addam Schroll, Eirik Herskedal, Rajeev Gopalkrishna | |
11/22 | Thanksgiving. | ||
11/27 | Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava, ``Grouping Web Page References into Transactions for Mining World Wide Web Browsing Patterns'', in Proceedings of the 1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97), November 1997. | Evimaria Dimitrios Terzi | Benjamin Lee, James Joshi |
Wai Chiu Wong and A. Fu, ``Incremental Document Clustering for Web Page Classification'', IEEE 2000 Int. Conf. on Info. Society in the 21st century: emerging technologies and new challenges (IS2000), Nov 5-8, 2000, Japan. | Amit J. Shirsat | Mohamed Galal Elfeky, Jaideep Shrikant Vaidya | |
11/29 | S. Hofmeyr, S. Forrest, and A. Somayaji, ``Intrusion Detection Using Sequences of System Calls'', Journal of Computer Security Vol. 6, pp. 151-180 (1998). | Ann-Sofie Nystrom | Eirik Herskedal, Rajeev Gopalkrishna |
Christopher J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery 2(2): 121-167, June 1998. | Xiaodong Lin | , | |
Bonus session: 10:30am, LAEB B254 (not mandatory) | Privacy Preserving Association Rule Mining in Vertically Partitioned Data. | Jaideep Shrikant Vaidya | None |
12/4 | Data Mining applied to File Integrity. | Pat Gorman | None |
Dakshi Agrawal and Charu C. Aggarwal, ``On the design and quantification of privacy preserving data mining algorithms'', in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2001. | Eirik Herskedal | Murat Kantarcioglu, Amit J. Shirsat, Benjamin Lee | |
12/6 | Classifying disk blocks into file type. | Addam Schroll | None |
Bing Liu, Yiming Ma, Philip S. Yu, ``Discovering Unexpected Information from your Competitors' Web Sites'' in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2001). | Rajeev Gopalkrishna | Evimaria Dimitrios Terzi, Ann-Sofie Nystrom, James Joshi, Xiaodong Lin, Mohamed Galal Elfeky |