A Flattened Schema
Summary:
This approach uses one table per event class of audit data. The events we
represent in the database can have different attributes attached to them, so
we subdivide the total set of possible events into groups based upon
the attributes that they can contain. The result is a reasonably
subdivided set of audit data that we can still search effectively for
intrusions.
Advantages:
A wider, flatter table structure leads to fewer joins and faster
searches over the data. We recognize that relaxing normalization
in this case can have some disadvantages in terms of redundancy, but we
hope to gain some rewards in overall speed. It should also be noted that this
schema is the most specialized in terms of its closeness to BSM. While the
others are wonderful demonstrations that intrusion detection can be
successfully performed with a relational database management system, this
schema in particular really only shows that BSM can be efficiently inspected
for evidence of intrusions. This might seem to reduce the value of this
schema, but in fact we believe that it is quite important, as it demonstrates
that specialized designs can be generated to match specific audit logging
systems, designs which can have reasonable performance and closely match the
type of data being output by the audit logging system.
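To make the join argument concrete, the following is a minimal sketch using Python's standard sqlite3 module. The table and column names (event, subject_attr, path_attr, return_attr, file_access_event) are hypothetical and only loosely modeled on the kind of attributes a BSM record carries; they are not the tables of the schema described here. The sketch loads the same three events into a normalized layout and into one wide table, then asks both for failed "open" events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# All table and column names below are illustrative assumptions.
cur.executescript("""
-- Normalized layout: one row per event plus one row per attribute.
CREATE TABLE event (event_id INTEGER PRIMARY KEY, event_name TEXT, event_time INTEGER);
CREATE TABLE subject_attr (event_id INTEGER, auid INTEGER, pid INTEGER);
CREATE TABLE path_attr (event_id INTEGER, path TEXT);
CREATE TABLE return_attr (event_id INTEGER, status INTEGER);

-- Flattened layout: one wide table for a single event class.
CREATE TABLE file_access_event (
    event_id INTEGER PRIMARY KEY,
    event_name TEXT,
    event_time INTEGER,
    subject_auid INTEGER,
    subject_pid INTEGER,
    path TEXT,
    return_status INTEGER
);
""")

# Made-up sample events: (id, name, time, auid, pid, path, return status).
rows = [
    (1, "open", 1000, 501, 4242, "/etc/passwd", 0),
    (2, "open", 1001, 502, 4300, "/etc/shadow", -1),
    (3, "unlink", 1002, 502, 4300, "/tmp/x", 0),
]
for eid, name, t, auid, pid, path, status in rows:
    cur.execute("INSERT INTO event VALUES (?, ?, ?)", (eid, name, t))
    cur.execute("INSERT INTO subject_attr VALUES (?, ?, ?)", (eid, auid, pid))
    cur.execute("INSERT INTO path_attr VALUES (?, ?)", (eid, path))
    cur.execute("INSERT INTO return_attr VALUES (?, ?)", (eid, status))
    cur.execute(
        "INSERT INTO file_access_event VALUES (?, ?, ?, ?, ?, ?, ?)",
        (eid, name, t, auid, pid, path, status),
    )

# Normalized: finding failed opens requires three joins.
normalized = cur.execute("""
    SELECT e.event_id, s.auid, p.path
    FROM event e
    JOIN subject_attr s ON s.event_id = e.event_id
    JOIN path_attr p ON p.event_id = e.event_id
    JOIN return_attr r ON r.event_id = e.event_id
    WHERE e.event_name = 'open' AND r.status <> 0
    ORDER BY e.event_id
""").fetchall()

# Flattened: the same question is answered from a single row per event.
flattened = cur.execute("""
    SELECT event_id, subject_auid, path
    FROM file_access_event
    WHERE event_name = 'open' AND return_status <> 0
    ORDER BY event_id
""").fetchall()
```

Both queries return the same rows; the flattened form simply avoids the joins, at the cost of repeating the subject and return columns in every event-class table that needs them.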
Disadvantages:
The tables will be quite wide, that is, there will be many columns,
as each event classification may require several attributes to represent, and
each attribute will have several fields. This means that we will have a
bulky structure which may prove rather unwieldy, and additionally, there will
be duplication of parts of the structure between some tables. While these flaws
may at first glance seem to be fatal, the tables actually turn out to be not
all that difficult to conceptualize, comprehend, and generate queries for.
Questions:
Going into this project and into this schema design, there are two
vital questions that we need to address. These will be the basis for
judging the value of each of our schema designs as we go forward with
research, and as we expand the set of intrusions we consider. Below are the
questions and a brief discussion of their importance as well as the assumptions
we make as to the actual results that we will see.
- Will this method show any speed improvement over the other available
structures?
- At the start of this design, our belief was that this method would
provide real advantages in terms of performance over the other schema
designs. The elimination of joins and the locality of reference created
in retrieving the data from a single row seem to imply a real benefit
to be gained from this approach. Below we will analyze the schema under
varying data loads to determine whether or not these assumptions are
borne out.
- Will certain tables become heavily loaded when real BSM data is loaded
into the database? Will this be a bigger problem in this type of schema
than in the others?
- This is an interesting question. The synthetic data loading presented
below may help us gain an understanding of this, but in fact, the limited
set of intrusions that are currently detected and the contrived nature of
synthetic testing may make this a difficult question to answer. The
benchmark results will hopefully provide some direction in the task of
answering this question, but it is clear that further research (in real
world trials) will be required before we can truly assess the impact of
these factors upon the performance of all of our schema designs.
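One simple way to look at the second question is to compare row counts across the event-class tables after a load. The following is a rough sketch, again using Python's sqlite3 module; the helper names (table_row_counts, load_shares), the three tables, and the sample row counts are all hypothetical and are not measurements from this project.

```python
import sqlite3

def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Table names come from the catalog itself, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }

def load_shares(counts):
    """Convert row counts into each table's fraction of all rows."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}

conn = sqlite3.connect(":memory:")
# Hypothetical event-class tables with deliberately uneven, made-up loads.
conn.executescript("""
CREATE TABLE file_access_event (event_id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE process_event (event_id INTEGER PRIMARY KEY, pid INTEGER);
CREATE TABLE login_event (event_id INTEGER PRIMARY KEY, auid INTEGER);
""")
conn.executemany(
    "INSERT INTO file_access_event (path) VALUES (?)",
    [(f"/tmp/f{i}",) for i in range(8)],
)
conn.executemany("INSERT INTO process_event (pid) VALUES (?)", [(100,)])
conn.executemany("INSERT INTO login_event (auid) VALUES (?)", [(500,)])

counts = table_row_counts(conn)
shares = load_shares(counts)
heaviest = max(shares, key=shares.get)  # table holding the largest share of rows
```

A heavily skewed share for one table would suggest that queries touching that event class dominate the cost, though row counts alone say nothing about how often each table is actually searched.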