Intrusion Detection and Database Systems



A Flattened Schema

Summary:

      This approach uses one table per event class of audit data. The events we represent in the database can have different attributes attached to them, so we subdivide the total set of possible events into groups based upon the attributes that they can contain. Each group becomes one table. The result is a reasonably subdivided set of audit data that we can still search effectively for intrusions.
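      The grouping step described above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the event names, attribute names, per-attribute fields, and the event_class_N table names are all assumptions chosen to resemble BSM audit records, and SQLite stands in for whatever relational database is used.

```python
import sqlite3
from collections import defaultdict

# Hypothetical events and the attributes each can carry (illustrative only).
EVENT_ATTRIBUTES = {
    "open_read": ("subject", "path", "return"),
    "open_write": ("subject", "path", "return"),
    "execve": ("subject", "path", "exec_args", "return"),
    "setuid": ("subject", "argument", "return"),
}

# Hypothetical fields that make up each attribute.
ATTRIBUTE_FIELDS = {
    "subject": ("audit_id", "uid", "gid", "pid"),
    "path": ("name",),
    "exec_args": ("args",),
    "argument": ("number", "value"),
    "return": ("errno", "value"),
}


def group_events(event_attributes):
    """Group event names by the exact set of attributes they can contain."""
    classes = defaultdict(list)
    for event, attrs in event_attributes.items():
        classes[tuple(sorted(attrs))].append(event)
    return dict(classes)


def create_table_sql(table_name, attrs):
    """Build one wide table: every field of every attribute becomes a column."""
    columns = [
        "event_id INTEGER PRIMARY KEY",
        "event_name TEXT NOT NULL",
        "event_time TEXT",
    ]
    for attr in attrs:
        for field in ATTRIBUTE_FIELDS[attr]:
            columns.append(f"{attr}_{field} TEXT")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


def build_schema(conn):
    """Create one table per event class; return {table name: event names}."""
    tables = {}
    groups = sorted(group_events(EVENT_ATTRIBUTES).items())
    for index, (attrs, events) in enumerate(groups):
        name = f"event_class_{index}"
        conn.execute(create_table_sql(name, attrs))
        tables[name] = events
    return tables


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for table, events in build_schema(connection).items():
        print(table, events)
```

      With these assumed inputs, the two open events share one table because they carry the same attributes, while execve and setuid each get a table of their own.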

Advantages:

      A wider, flatter table structure leads to fewer joins and, we expect, faster searches over the data. We recognize that relaxing normalization in this way has some disadvantages in terms of redundancy, but we hope to gain some rewards in overall speed. It should also be noted that this schema is the most specialized in terms of its closeness to BSM. While the others demonstrate that intrusion detection can be successfully performed with a relational database management system, this schema really only shows that BSM can be efficiently inspected for evidence of intrusions. This might seem to reduce the value of this schema, but we believe that it is quite important: it demonstrates that specialized designs can be generated to match specific audit logging systems, designs which can have reasonable performance and closely match the type of data being output by the audit logging system.
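      To make the "fewer joins" point concrete, the sketch below stores the same three events twice: once in a normalized layout with a separate table per attribute, and once in a single flattened table. All table names, column names, and sample rows are hypothetical, and the normalized layout is only a stand-in for the other schema designs, not a copy of them. The same search (writes to /etc/passwd by a non-root user) needs two joins in the first layout and none in the second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized layout: one row per event plus one row per attached attribute.
CREATE TABLE event (event_id INTEGER PRIMARY KEY, event_name TEXT);
CREATE TABLE subject_attr (event_id INTEGER, uid INTEGER, pid INTEGER);
CREATE TABLE path_attr (event_id INTEGER, name TEXT);

-- Flattened layout: the same attributes stored as columns of a single row.
CREATE TABLE file_event (
    event_id INTEGER PRIMARY KEY,
    event_name TEXT,
    subject_uid INTEGER,
    subject_pid INTEGER,
    path_name TEXT
);
""")

# Hypothetical sample events: (event_id, event_name, uid, pid, path).
SAMPLE_EVENTS = [
    (1, "open_write", 1001, 400, "/etc/passwd"),
    (2, "open_read", 1001, 401, "/tmp/notes"),
    (3, "open_write", 0, 402, "/etc/passwd"),
]

for event_id, name, uid, pid, path in SAMPLE_EVENTS:
    conn.execute("INSERT INTO event VALUES (?, ?)", (event_id, name))
    conn.execute("INSERT INTO subject_attr VALUES (?, ?, ?)", (event_id, uid, pid))
    conn.execute("INSERT INTO path_attr VALUES (?, ?)", (event_id, path))
    conn.execute(
        "INSERT INTO file_event VALUES (?, ?, ?, ?, ?)",
        (event_id, name, uid, pid, path),
    )

# Normalized search: two joins are needed to reassemble one event.
NORMALIZED_QUERY = """
SELECT e.event_id
FROM event AS e
JOIN subject_attr AS s ON s.event_id = e.event_id
JOIN path_attr AS p ON p.event_id = e.event_id
WHERE e.event_name = 'open_write' AND s.uid <> 0 AND p.name = '/etc/passwd'
ORDER BY e.event_id
"""

# Flattened search: every field is already in the same row.
FLAT_QUERY = """
SELECT event_id
FROM file_event
WHERE event_name = 'open_write' AND subject_uid <> 0 AND path_name = '/etc/passwd'
ORDER BY event_id
"""

normalized = conn.execute(NORMALIZED_QUERY).fetchall()
flat = conn.execute(FLAT_QUERY).fetchall()
print(normalized, flat)
```

      Both queries return the same event. Whether the flattened form is actually faster on realistic data is exactly what the benchmarks are meant to establish; this sketch shows only the difference in query shape.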

Disadvantages:

      The tables will be quite wide. There will be many columns, because each event classification may require several attributes to represent it, and each attribute has several fields. This gives a bulky structure which may prove rather unwieldy, and parts of the structure will be duplicated between some tables. While these flaws may at first glance seem fatal, the tables actually turn out to be not all that difficult to conceptualize, comprehend, and generate queries for.

Questions:

      Going into this project and this schema design, there are two vital questions that we need to address. They will be the basis for judging the value of each of our schema designs as we go forward with research and expand the set of intrusions we consider. Below are the questions, with a brief discussion of their importance and of the results we expect to see.
  • Will this method show any speed improvement over the other available structures?
      At the start of this design, our belief was that this method would provide real advantages in terms of performance over the other schema designs. The elimination of joins and the locality of reference created by retrieving the data from a single row seem to imply a real benefit to be gained from this approach. Below we will analyze the schema under varying data loads to determine whether or not these assumptions are borne out.
  • Will certain tables become heavily loaded when real BSM data is loaded into the database? Will this be a bigger problem in this type of schema than in the others?
      The synthetic data loading presented below may help us understand this, but the limited set of intrusions that are currently detected and the contrived nature of synthetic testing may make this a difficult question to answer. The benchmark results will hopefully provide some direction in answering it, but it is clear that further research (in real-world trials) will be required before we can truly assess the impact of these factors upon the performance of all of our schema designs.
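      Both questions can be explored with a small harness like the one sketched here. It is an illustration only, not the benchmark used in this project: the table names, the relative event frequencies, the load sizes, and the query are all assumptions, and an in-memory SQLite database stands in for the real system. For each load size it fills the event-class tables with a skewed synthetic mix, reports how many rows land in each table, and times one search.

```python
import random
import sqlite3
import time

# Hypothetical event-class tables with assumed relative frequencies.
# A real BSM audit trail would supply its own mix.
TABLE_WEIGHTS = {"file_event": 8, "process_event": 3, "identity_event": 1}


def load_synthetic(conn, total_rows, seed=0):
    """Create the tables and insert total_rows rows, skewed by TABLE_WEIGHTS."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    names = list(TABLE_WEIGHTS)
    weights = [TABLE_WEIGHTS[name] for name in names]
    for name in names:
        conn.execute(
            f"CREATE TABLE {name} (event_id INTEGER PRIMARY KEY, subject_uid INTEGER)"
        )
    for _ in range(total_rows):
        table = rng.choices(names, weights=weights)[0]
        conn.execute(
            f"INSERT INTO {table} (subject_uid) VALUES (?)",
            (rng.randrange(0, 2000),),
        )
    conn.commit()


def table_loads(conn):
    """Return the row count of every event-class table."""
    return {
        name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
        for name in TABLE_WEIGHTS
    }


def time_query(conn, sql, repeats=5):
    """Return the best wall-clock time, in seconds, over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


def run_trials(sizes=(1000, 10000)):
    """Load each size into a fresh database; collect table loads and timing."""
    results = []
    for size in sizes:
        conn = sqlite3.connect(":memory:")
        load_synthetic(conn, size)
        loads = table_loads(conn)
        elapsed = time_query(
            conn, "SELECT COUNT(*) FROM file_event WHERE subject_uid = 0"
        )
        results.append((size, loads, elapsed))
        conn.close()
    return results


if __name__ == "__main__":
    for size, loads, elapsed in run_trials():
        print(size, loads, f"{elapsed:.6f}s")
```

      Timings from a toy harness like this say little about a production database, and the skew is only as realistic as the assumed weights, which is the same caveat raised above about synthetic testing.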