CS542 -- Project Description
1. Implement Centralized Two
Phase Locking (2PL)
You will implement the
centralized 2PL (Global 2PL) approach to Concurrency Control and be able to
detect/resolve deadlocks.
You may use Centralized
Control for global decisions.
You will generate at least
three sites; you will maintain a lock table and a wait-for (dependency graph)
at one site!
This means that all
transactions arriving at different sites are sent to the centralized site.
Queues for requesting /releasing locks will be maintained at this site.
Processing must take place at the site where transaction is submitted. Locks
will be released after waiting on the database at Centralized site. Updates for
other sites will be sent (we assume fully replicated database) and they may
arrive in different order. Your program must ensure that all updates at all
sites are posted in the same order.
You may implement the
algorithm of section 11.3.1 on page 318 of our text book (Ozsu)
or a variation of it.
2. Local vs
Global Transactions in Optimistic CC
More details are available
in the note I handed out in class and a paper Resilient Concurrency Control in Distributed Database Systems present
in the Chapter 10 of Prof. Bhargava's book (on reserve in Math library).
You will implement the
optimistic approach to concurrency control and maintain the data structures.
You will do local and global validation for each transaction.
You will send to a site, a
mix of local transactions (that require and update at only this site) and
global transaction
(that require an update on all copes).
You will generate at least
three sites and maintain a list of logical objects and directory of copies as
in Site Recovery in Replicated
Distributed Database Systems description (Chapter 11 of Prof. Bhargava's book. Its
on reserve in Math library).
You may assume that no
failures occur in the system.
One of the problems you will
have to figure out is how to keep the list of committed transactions against
which a global transaction must validate fixed
at each site without delaying the local transaction.
3. Replicated Copy Control:
Correct Read-From Precedence
Transactions are validated
and committed in the concurrency control server, while the database on
different sites is posted by the I/O server (or access manager). There is an
interval between the time of commit of a transaction
Ti, and the time of update by it on the database. During this interval, a new
transaction can read an old value.
For concurrency control, the
new transaction is serialized after the commitment of Ti, while for accessing
the database it is before Ti.
The research and
implementation question is:
a ) How to eliminate this
interval?,
and
b ) How to establish the
correct read-from relation for each pair of transactions for all servers on a
site?
You should design a policy
(different than total blocking of the new transaction), and implement it.
You will need to have at
least two servers for each site. You need to generate at least two sites.
You will need data
structures that contain:
a) transaction id and time
stamp of commit,
b) transaction id and time stamp of reading an object,
c) transaction id and time stamp of writing an object,
d) read-from relation for each transaction w.r.t. all
other transactions (conflict graph).
You will implement a
protocol that will produce the same read-from relation for all transactions on
each sites.
You may be able to measure
the delays caused by your policy.
4. Adaptability
More information is
available in the paper ``A Model for
Adaptable Transaction Processing'', in IEEE Transaction on Knowledge &
Data Engineering, December 1989 (Given as handout in the class).
You will set up a data
structure to capture the read/write set and time stamps for transactions that
arrive on the site. This information will be used by three types of concurrency
control;
You will output the serializable history as accepted
by each of the above protocols. In addition you should be able to switch from
one protocol to another. Certain conditions must be satisfied that are given in
the technical report and need further study.
For distributed concurrency
control you may want to use the decentralized protocol of Rosenkrantz,
Stearn and Lewis (ACM Transactions on Database
Systems, June 1988).
You will implement the
a) would-wait protocol,
b) wait-die protocol,
and should be able to switch
among them.
If you like, you may choose
to adapt from distributed control to centralized control protocol or
vice-versa. (This can be done for concurrency control as well as commit
protocols.)
(Group project is possible.)
5. Semantics
The objective of this project
is to identify semantics useful for concurrency control and/or recovery. The
semantics will be provided by the user as a part of each transaction (e.g.,
breakpoints suggested by Lynch in ACM Transactions in Database Systems,
December 1983, types of locks, commutativity of
operation, user defined dependencies among transactions, types of objects and
operations, etc.).
The concurrency control
protocol will use this semantics to serialize transactions and eliminate
partial executions.
You will need to implement a
concurrency control method and change the transaction generator to add the
semantics.
6. Three Phase Commit
Protocol
More details are available
in the Ph.D. thesis of Dale Skeen Chapter 5, Section 3, on page 102 of his
paper (available from Prof. Bhargava).
A minimum of three sites
should be generated. Each site will contain data structures:
Your program will initiate rounds of messages to terminate or commit a
transaction and update the data structures. After two or three rounds, a
decision will be made for the transaction. You will initiate transactions at
each site. You will also simulate failures of different types.
You do not have to update
the database.
7a. Site Failure in
Partially Replicated Database
More details are available
in the paper "Site Recovery in
Replicated Distributed Database Systems", 6th IEEE Distributed
Computing Systems Conference, May 1986 (also Chapter 11 of Prof. Bhargava's book on reserve in Math library).
You will implement the
protocol ``Read one copy, write all available copies.''
A minimum of three sites
should be generated. Each site will contain data structures for:
Transactions will arrive at each site. A copy will be read from this site or
another site that is up and contains a copy. All copies at up sites will be
update. The up/down status is determined by session vector.
You will simulate the
up/down status of sites. You may assume initially that at least one site is
always up. You will use a control transaction to announce the failure. You will
recover a site by updating the data structures correctly,and will announce the recovery via another control
transaction.
Initially you do not have to
worry about recovering the database.
(Note that reading or
writing of database will be simulated.
The data structures will be updated in reality.)
7b. Network Partitions (Two
Partitions)
Several protocols are given
in Chapter 1 of Prof. Bhargava's book (on reserver in Math library) to deal with this problem.
You would need to implement
an optimistic or a pessimistic approach.
For the pessimistic
approach, you may read the technical report "Maintaining Mutual Consistency of Replicated Files in Partitioned
Distributed Database System", which is available from Prof. Bhargava.
You will need to make
several data structures:
You
will generate at least three sites. You will maintain a set of logical files
and a directory of
physical copies (it could be the same as site name vector). You will simulate
two partitions.
(Note: You will not do any
actual reading or writing of files for any transaction.)
You will implement the
following procedures:
These procedures will allow you to do all the necessary work to deal with
transaction processing during network partitions.
8. Fail Locks and Database
Recovery (Multiple Failures)
(See notes for site failure
in partially replicated database.)
Instead of recovering the
site, you will concentrate on ensuring that the database is fully recovered at
the failed site. The operational sites will maintain fail-locks (for each
failed site) on copies that are updated. The recovering site will collect
fail-lock table from each site and mark the corresponding data-items as
unavailable. It will allow processing of transactions on other data items. If
any site is down, you will maintain the fail-lock table on the
recovered site if any other site is down.
The unavailable data-items
will become available by one of the two approaches:
We may be able to measure by simulating the time it takes to make all
unavailable copies up-to-date.
9. Peer-to-Peer
Multimedia Distribution
Most of the contents distributed over the current P2P systems such as Kazaa and Gnutella are
multimedia files, e.g., music files and videos. Nonetheless, current
systems distribute these multimedia files using
bulk-file download, which does not consider the time inter-dependency among
different segments of a multimedia file. This means that a client has to download the entire file before
starting playing it out. Consider for example, a one-hour movie
recorded at 1 Mb/s, and being downloaded by a
client with an in-bound bandwidth of 1.5 Mb/s. Ignoring all protocols
overhead and retransmissions, the client will
have to wait for 40 minutes to start watching the movie!
In this project, you will
develop a real-time media streaming system on
top of a P2P lookup service. You can use any lookup service
you want. Examples include, JXTA (jxta.org),
Gnutella, and FreePastry. The real-time
streaming system will start playing out the requested movie after a short
(e.g., order of seconds) waiting period. For more information, check the paper co-authored by professor Bhargava and available at (http://www.cs.purdue.edu/homes/bb/jmms03.pdf).
You should do the following:
A minimum of four sites (a client and at least two senders
concurrently sending and one backup) are required.
If interested, first read the above paper. Then, discuss with the TA your ideas and the scope of the project.
10. Verifying Data Integrity in Peer-to-Peer Video
Streaming
In P2P video streaming,
multiple senders serve one client. Each sender provides different segments of
the video. The segments are assembled and ordered by the client. A major
concern in the P2P environment is whether a peer provides genuine data. If a
peer provides corrupt data, it severely impact the quality of the streaming
session. In this project you will implement distributed data verification
protocols in P2P environment. The verification protocol has to run in
real-time, i.e., during the streaming session. Two data verification protocols
are proposed in the following paper:
Ahsan Habib, Dongyan
Xu, Mikhail Atallah, Bharat
Bhargava, "Verifying
Data Integrity in Peer-to-Peer Video Streaming," CERIAS TR 2002-39,
Purdue University, Revised 2003, submitted for review.
You should do the following:
A minimum of four sites (a client and at three senders) are required.
If interested, first read the above paper. Then, discuss with the TA your ideas and the scope of the project.