In this assignment, you will implement a simplified version of the Buffer Manager layer, without support for concurrency control or recovery. You will be given the code for the lower layer, the Disk Space Manager.
You should begin by reading the chapter on Disks and Files, to get an overview of buffer management.
This material will also be covered in class. In addition, HTML documentation is available for Minibase.
There is a link to the Minibase home page here.
In particular, you should read the description of the DB class, which you will call extensively in this assignment.
The Java documentation for the diskmgr
package can be found here.
You should also read the code under diskmgr/ carefully to learn how package is declared and how Exceptions are handled in Minibase.
The simplified Buffer Manager interface that you will implement in this assignment allows a client (a higher level program that calls the Buffer Manager) to allocate/de-allocate pages on disk, to bring a disk page into the buffer pool and pin it, and to unpin a page in the buffer pool.
The methods that you have to implement are described below:
public class BufMgr {
/**
* Create the BufMgr object.
* Allocate pages (frames) for the buffer pool in main memory and
* make the buffer manage aware that the replacement policy is
* specified by replacerArg (i.e. LH, Clock, LRU, MRU, LFU, etc.).
*
* @param numbufs number of buffers in the buffer pool
* @param lookAheadSize number of pages to be looked ahead
* @param replacementPolicy Name of the replacement policy
*/
public BufMgr(int numbufs, int lookAheadSize, String replacementPolicy) {};
/**
* Pin a page.
* First check if this page is already in the buffer pool.
* If it is, increment the pin_count and return a pointer to this
* page.
* If the pin_count was 0 before the call, the page was a
* replacement candidate, but is no longer a candidate.
* If the page is not in the pool, choose a frame (from the
* set of replacement candidates) to hold this page, read the
* page (using the appropriate method from {\em diskmgr} package) and pin it.
* Also, must write out the old page in chosen frame if it is dirty
* before reading new page.__ (You can assume that emptyPage==false for
* this assignment.)
*
* @param pageno page number in the Minibase.
* @param page the pointer point to the page.
* @param emptyPage true (empty page); false (non-empty page)
*/
public void pinPage(PageId pageno, Page page, boolean emptyPage) {};
/**
* Unpin a page specified by a pageId.
* This method should be called with dirty==true if the client has
* modified the page.
* If so, this call should set the dirty bit
* for this frame.
* Further, if pin_count>0, this method should
* decrement it.
* If pin_count=0 before this call, throw an exception
* to report error.
* (For testing purposes, we ask you to throw
* an exception named PageUnpinnedException in case of error.)
*
* @param pageno page number in the Minibase.
* @param dirty the dirty bit of the frame
*/
public void unpinPage(PageId pageno, boolean dirty) {};
/**
* Allocate new pages.
* Call DB object to allocate a run of new pages and
* find a frame in the buffer pool for the first page
* and pin it. (This call allows a client of the Buffer Manager
* to allocate pages on disk.) If buffer is full, i.e., you
* can't find a frame for the first page, ask DB to deallocate
* all these pages, and return null.
*
* @param firstpage the address of the first page.
* @param howmany total number of allocated new pages.
*
* @return the first page id of the new pages.__ null, if error.
*/
public PageId newPage(Page firstpage, int howmany) {};
/**
* This method should be called to delete a page that is on disk.
* This routine must call the method in diskmgr package to
* deallocate the page.
*
* @param globalPageId the page number in the data base.
*/
public void freePage(PageId globalPageId) {};
/**
* Used to flush a particular page of the buffer pool to disk.
* This method calls the write_page method of the diskmgr package.
*
* @param pageid the page number in the database.
*/
public void flushPage(PageId pageid) {};
/**
* Used to flush all dirty pages in the buffer poll to disk
*
*/
public void flushAllPages() {};
/**
* Gets the total number of buffer frames.
*/
public int getNumBuffers() {}
/**
* Gets the total number of unpinned buffer frames.
*/
public int getNumUnpinned() {}
};
3. Internal Design
The buffer pool is a collection of frames (page-sized sequence of main memory bytes) that is managed by the Buffer Manager.
Conceptually, it should be stored as an array bufPool[numbuf]
of Page
objects.
However, due to the limitation of Java language, it is not feasible to declare an array of Page
objects
and later on writing string (or other primitive values) to the defined Page
.
To get around the problem, we have defined our Page
as an array of bytes and deal with buffer pool at the byte array level.
Note that the size of Minibase pages is defined in the interface GlobalConst
of the global
package.
Before jump into coding, please make sure that you understand how Page
object is defined and manipulated in Java Minibase.
In addition, you should maintain an array bufDescr[numbuf]
of descriptors, one per frame.
Each descriptor is a record with the following fields:
page_number, pin_count, dirtybit.
The pin_count field is an integer, page_number is a PageId
object, and dirtybit is a boolean.
This describes the page that is stored in the corresponding frame.
A page is identified by a page_number that is generated by the DB class when the page is allocated,
and is unique over all pages in the database.
The PageId
type is defined as an integer type in global
package.
A simple hash table should be used to figure out what frame a given disk page occupies. The hash table should be implemented (entirely in main memory) by using an array of pointers to lists of < page_number, frame_number > pairs. The array is called the directory and each list of pairs is called a bucket. Given a page number, you should apply a hash function to find the directory entry pointing to the bucket that contains the frame number for this page, if the page is in the buffer pool. If you search the bucket and don't find a pair containing this page number, the page is not in the pool. If you find such a pair, it will tell you the frame in which the page resides. This is illustrated in the next figure.
The hash function must distribute values in the domain of the search field uniformly over the collection of buckets. If we have HTSIZE buckets, numbered 0 through M-1, a hash function h of the form h(value) = (a*value+b) mod HTSIZE works well in practice. HTSIZE should be chosen to be a prime number.
When a page is requested the buffer manager should do the following:
bufDescr
array to reflect these changes.
Theoretically, the best candidate page for replacement is the page that will be last requested in the future. Since implementing such policy requires a future-predicting oracle, all buffer replacement policies try to approximate it one way or another. The LFU policy, for example, uses the past access information as an indication for the future.
The usage frequency of a page is defined to be the ratio between the number of accesses over the time since that page was brought into the buffer pool. For simplicity, the number of accesses of a page is counted as the number of calls to pin that page.
For example, a page P1 was brought into the buffer pool at the time T1. Since T1, page P1 has been pinned C1 times (and also unpinned C1 times so that P1 can be a candidate for replacement). If the current time is Tc, then the usage frequency of P1 at the time Tc is C1/(Tc-T1).
At a particular time, there can be a number of candidates, and you are free to choose how to maintain those candidates provided that when a candidate is needed to be replaced, the candidate chosen for replacement must be the one with the smallest usage frequency.
In this assignment you are supposed to implement the LFU with look ahead policy, a modified version of LFU. With look ahead policy, when a page is pinned, the buffer manager assumes that the subsequent pages will be pinned next. Thus if those subsequent pages are already in the buffer pool, we should avoid replacing those pages (otherwise we need more I/O operations to place them back to the buffer pool). If some subsequent pages are not in the buffer pool or not candidates, you can skip them.
How to maintain the candidate list and implement is your own choice. We inform you that if you keep the candidates in a priority queue (sorted by usage frequency), the order inside the queue may change over time. For example, page P1 was brought into buffer pool at time T1 = 0, and was pinned (and unpinned) 3 times. Page P2 was brought into buffer pool at time T2 = 3 and was pinned/unpinned twice.
Tc = 5 Tc = 15 P1[T1=0,C1=3] 3/5 3/15 P2[T2=3,C2=2] 2/2 2/12
The the time Tc=5, P1 has smallest usage frequency, but at the time Tc=15, P2 has the smallest usage frequency.
You may consider the following ways to implement look ahead (or devise your own way)
In the BufMgr
’s constructor, the look-ahead size LAH_SIZE is provided.
It is the number of pages to be checked and applied the LFU-LA policy.
Specifically, if a page with PageId
PID is pinned, the pages with PageId
from PID+1 to PID+LAH_SIZE will be considered
if they are already loaded in the buffer pool.
You can ignore those pages with PageId
s in that range but not in the buffer pool.
Please document your design decision in your readme file.
4. Error Protocol
Though the Throwable
class in Java contains a snapshot of the execution stack of its thread at the time
it was created and also a message string that gives more information about the error (exception),
we have decided to maintain a copy of our own stack to have more control over the error handling.
We provide the chainexception
package to handle the Minibase exception stack.
Every exception created in your bufmgr
package should extend the ChainException
class.
The exceptions are thrown according to the following rule:
Error caused by an exception caused by another layer (for example when try to pin page from diskmgr
layer):
try { Minibase.BufferManager.pinPage(pageId, page, true); } catch (Exception e) { throw new DiskMgrException(e, "DB.java: pinPage() failed"); }Error not caused by exceptions of others, but needs to be acknowledged (for example when try to unpin a page that is not pinned)
if (pin\_count == 0) { throw new PageUnpinnedException (null, "BUFMGR: PAGE_NOT_PINNED."); }Basically, the
ChainException
class keeps track of all the exceptions thrown across the different layers.
Any exceptions that you decide to throw in your bufmgr
package should extend the ChainException
class.
For testing purposes, we ask you to throw the following exceptions in case of error (use the exact same name, please):
BufferPoolExceededException
BufferPoolExceededException
when try to pin a page to the buffer pool with no unpinned frame left.
PagePinnedException
PagePinnedException
when try to free a page that is still pinned.
HashEntryNotFoundException
HashEntryNotFoundException
when page specified by PageId
is not found in the buffer pool.
Feel free to throw other new exceptions as you see fit.
But make sure that you follow the error protocol when you catch an exception.
Also, think carefully about what else you need to do in the catch
phase.
Sometimes you do need to unroll the previous operations when failure happens.
Please copy this file (proj21.zip) into your own Unix local directory. The directory lib contains the jar file that implements the disk manager, you will use this file to compile your code. The content of the directory src is:
bufmgr
package implemented under this directory.
diskmgr
package carefully before you implement the bufmgr
package.
Please refer to the java documentation of the packages.
We provide a sample Makefile to compile your project. You will have to set up any dependencies by editing this file. You may need to design your own Makefile. Whatever you do, please make sure that the classpath is correct.
You can find other useful information by reading the java documentation on other packages,
such as the global
and diskmgr
package.
You should turn in copies of your code together with the Makefile and a readme file. All files need to be zipped in a file named:
your_pucc_login1_your_pucc_login2_bufmgr.zip.
In the Readme file, put the name of your group members, your design, and the feature you would like me to know. We should be able to compile/run your program using make on CS department’s Linux machines. You don’t need to include the library file and other files provided.
The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your pucc login above). Your grade may be reduced 5% if you don’t follow this.
In this part, you will implement the Heap File layer. You will be given the documentation for the lower layers (Buffer Manager and Disk Space Manager). You can find the package index for the above here.
This assignment has three parts. You have to do the following:
HFPage
in Minibase;
you may find it helpful (although it isn’t strictly necessary) to implement this class to help in building HeapFile
.
The major methods of HeapFile.java that you will implement include:
public HeapFile(String name) public RID insertRecord(byte[] record) public Tuple getRecord(RID rid) public void updateRecord(RID rid, Tuple newRecord) public void deleteRecord(RID rid) public int getRecCnt() //get number of records in the file public HeapScan openScan()2. Available Documentation
You should begin by reading textbook the chapter on Disks and Files to get an overview of the Heap File layer and buffer management. In addition, HTML documentation is available for Minibase.
3. Classes to Familiarize Yourself With FirstThere are three main packages with which you should familiarize yourself: heap, bufmgr, diskmgr. A Heap file is seen as a collection of records. Internally, records are stored on a collection of Page objects.
4. Compiling Your Code and Running the TestsCopy this file (proj22.zip) to your own local directory and study them carefully. The files are:
You should turn in copies of your code together with the Makefile and a readme file. All need to be zipped in a file named:
your_pucc_login1_your_pucc_login2_heapfile.zip.
In the readme file, put the name of your group members, your design, and the feature you would like me to know. We should be able to compile/run your program using make on CS department’s Linux machines. You don’t need to include the library file and other files provided.
The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your pucc login above). Your grade may be reduced 5% if you don’t follow this.