Please go over the concepts in chapters 1-3.
Read about various types of, data model, schema/subschema, data independence, Fig. 1.1, Fig 2.3, Fig 3.1.
Read about benefits of using a DBMS.
Read appendix B on disk parameters.
Main ideas that we have discussed in first week of class.
Need to think about user, database, schema/catalog, DBMS software/utilities.
Need to think about implementing query processing, recovery, privacy, concurrency sotware.
Parameters for performance evaluation and how to implement ideas that can improve performance (query optimization, I/O optimization, reducing I/Os by getting more data in one I/O operation for use later).
Relationship among data, implementing hierarchical and network model that query/update one record at a time vs relation model that deal with set of records.
Type of queries for different models. Hierarchical model needs, commands such get next, get next within parent. Network model goes for one entity to another entity via links and uses the command find next. I introduced these briefly, but more in "other materials" after slides for Chapter 29.
We move on to discuss Chapter 17 briefly, so we know how to think of implementing I/O.
A B-tree (Bayer & McCreight 1972) of order 5 (Knuth 1998). Image by Wikipedia
Difference between B Tree and B+ Tree.
See https://en.wikipedia.org/wiki/B%2B_tree.
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves.
See tree below.
A simple B+ tree example linking the keys 1-7 to data values d1-d7.
The linked list (red) allows rapid in-order traversal. This particular tree's branching factor is b=4.
Image by Wikipedia
Time to search a sorted file.
Large databases have historically been kept on disk drives. The time to read a record on a disk drive far exceeds the time needed to compare keys once the record is available. The time to read a record from a disk drive involves a seek time and a rotational delay. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period. For a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds. For simplicity, assume reading from disk takes about 10 milliseconds.
Naively, then, the time to locate one record out of a million would take 20 disk reads times 10 milliseconds per disk read, which is 0.2 seconds.
An index speeds the search.
See an example of Insertion and the tree below.
A B Tree insertion example with each iteration.
The nodes of this B tree have at most 3 children (Knuth order 3).
Image by Wikipedia
If students get a chance, please go over the example 1 on page 605, example 2 on page 606, example 3 on page 609, example 4 on page 614.
Please read about constraints in relational model.
Please think about how to store constraints(in schema or in another relation). Also, think about how to store schema and subschema (again in a relational model).
How to verify constraints (after each action or after a set of actions)? What overheads in terms of disk block accesses are caused? How to minimize the overhead?
In chapter 19, please see how index structures are used when sorting a large file.
I have discussed material covered, types of questions etc in class. I will be available before and after the class (outside Physics 223) and you may come by to learn more. We can go across the bridge to MSEE and sit on a desk.
More in Class.
There are five questions in midterm examination.
Question 1 is worth 4 points and check the vocabulary of database management system from various chapters.
Question 2 is worth 3 points and asks about integrity assertions, types of attributes, and finding the key of a relation.
Question 3 is worth 7 points and asks questions about relational algebra and SQL queries. It has questions on disk storage, index structures, and access methods based on chapters.
Question 4 is worth 7 points and is multiple choice.Question 5 is worth 4 points and includes questions on normalization of relations.
Some questions will be based on projects and home works as shown in class.
You should read chapters 1, 2, some ideas (relationships, ER diagram, weak entities) from chapter 3.
Chapters 5, 6, some understanding of chapter 8, 14 ( 1st, 2nd, 3rd, Boyce-Codd), and 15 (inference rules, equivalence, decomposition). Chapters 16 and 17 as covered in class.
Please read handouts
VLSI for Relational Algebra. Kung, H. T., and Philip L. Lehman. Proceedings of the 1980 ACM SIGMOD international conference on Management of data. ACM, 1980. (pdf) (Just for Knowledge)
Design of intelligent query systems for large databases, Bharat Bhargava, in Lecture notes in Computer Science 2005 (pdf)
Please see example of conflict graph and 2PL, crash recovery, time stamp, optimistic from Jeff Ullman's book: (pdf)
Theorem 11.2 proves that if you follow 2PL, you will have serializable schedules.
Basic concepts of locking and serializability can also be learnt from: pdf
Protecting aginst crashes, time stamp, optimistic CC are discussed in: pdf
Dear Students:
Please read about the following topics.
Chapter 20
What is a transaction and desirable properties
What is read and write set and conflicts
Why CC is needed, performance of CC
Lost update, dirty read, incorrect summary etc
Why recovery is needed, types of failures
REDO, UNDO logs
Commit point of Transactions
Schedule or history of a set of transactions
Non recoverable and recoverable schedules
Cascade roll back, serial, serializable schedules
Equivalence of schedules
Testing for serializability, conflict or precedence graphs
View equivalence, view serializability
Chapter 21
Locking, two phase locking, time stamps, validation, optimistic protocols
Types of locks, binary, shared ( Read/Write Locks) Deadlock detection, wait for graphs, live lock, granularity of locking.
2PL guarantees serializability (how to provide a proof –see handout from Ullman’s book).
Types of 2PL ( basic, conservative, strict, rigorous).
Where to store lock tables, how to implement 2PL.
Deadlocks, Live locks.
Wait-die, wound-wait ideas in time stamps.
Basic TO.
Optimistic , validation techniques for CC, overheads.
Concurrency Control in Database Systems, Bharat Bhargava, IEEE Trans on Knowledge and Data Engineering,11(1), Jan.-Feb. 1999. (pdf)
The Transaction Concept: Virtues and Limitations , Jim Gray, VLDB, 1981. (pdf)
Preparing for Final examination of CS 448 (These are just suggestions but the exam is comprehensive. Any topic not discussed in class or PSO is not in examination). Many questions appropriate for final exam have been mentioned in class). I am asking TAs to send me any questions form PSOs. Students may also propose questions. We are identifying multiple choice questions.
Please think about questions that arise from implementation of a database system.
Chapters covered briefly relational model, relational algebra and calculus ( tuple, domain, SQL etc).
Chapters with many questions: normalization, dependency theory, concurrency control, recovery, privacy (all topics covered after midterm)
Material coved in final exam BRIEFLY:
Chapter 1 - Introduction (many definition type of questions) (all sections)
Chapter 2 - Database System Concepts and Architecture (basic questions) (all sections)
Chapter 5 - The Relational Data Model Relational Database Constraints (must know in depth)
Chapters 6 and 8- The Relational Algebra (chapter 8 section 8.1-8.6) and Basic SQL ( must know in depth)
Optimization ideas for query processing and algebra expressions, different algebra operators, outer/inner joins, time spent in I/O on disks (chapter 16 section 16.1-16.4), and steps in processing queries.
Chapter 7 - SQL-99: Schema Definition, Constraints, and Queries and Views
Material covered in final exam EXTENSIVELY:
Chapter 14 - Basics of Functional Dependencies and Normalization for Rational Databases
(pdf) (Extensive)
Chapter 15 - Relational Database Design and Algorithms and Further Dependencies
(pdf) (Extensive)
Feature analysis of selected database recovery techniques, Bharat Bhargava and Leszek Lilien, Proceedings of the May 4-7, 1981, National Computer Conference. ACM, 1981.(pdf)
I have mention some questions in class in last class.
Please read all definitions that are in bold wording in book. Please go over some review questions at the end of chapters.