Time: Tue., Thu. 10:30am - 11:45am
Room: Lawson 1106
Instructor: Prof. Bharat Bhargava . Office: LWSN 2116F, Tel: 7654137322, Email: bbshail@purdue.edu, Available hours: By appointment only preferably on zoom or face time.
TA: Shafkat Islam (islam59@purdue.edu). Office: LWSN B132(#9). Available hours: Tue: 12-1pm (LWSN Commons), Fri: 9.30am-10.30am (HAAS 175) or By appointment.
Midterm Exam: March 22 at 8pm in Lawson B155
Final Exam: May 04 at 1pm in FRNY B124
Teaching Material
This course will deal with the fundamental issues in large distributed systems which are motivated by the computer networking and distribution of processors, and control. The theory, design, specification, implementation, and performance of large systems will be discussed. Concurrency, Consistency, Integrity, Reliability, Privacy, and Security in distributed systems will be included.
A special feature of the course includes interesting problems in Mobile Ad hoc networks and Cloud Computing that can benefit from research ideas in distributed systems. Research related to Mobile Computing, Streaming databases, Video conferencing, Peer to Peer systems, Cloud computing will be covered.
Textbooks:
- Principles of Distributed Database Systems, Prentice Hall, Tamer Oszu and Patrick Valduriez, (Copy on reserve in LWSN reception office book shelf and on the Springer website from the university network) (MAIN TEXT)
- Concurrency Control and Reliability in Distributed Systems, Van Nostrand and Reinhold Publishers by Bharat Bhargava (Ed.), 1987 (Copy on reserve in LWSN reception office book shelf)
- Transaction Processing: Concepts and Techniques, Morgan Kaufmann, Jim Gray and Andreas Reuter, 1992 (Copy on reserve in LWSN reception office book shelf)
Course Outline:
- Architecture, General Systems Issues, Example Systems ....(3 Hrs.)
- Distributed control for synchronization and concurrency (will include the models for concurrent processing and transactions, theory of serializability, classes of concurrency control approaches, performance evaluation of these classes, centralized control vs. decentralized control).........(12 Hrs.)
- Distributed commitment/termination (involves preservation of atomicity of transaction execution, blocking/non-blocking protocols).........(6 Hrs.)
- Resiliency in distributed systems (involves design of protocols for site failure, network partitioning, loss of messages or variable transmission delays, consistent recovery)..........(3-6 hours.)
- Security in distributed systems. (involves study of a variety of attacks on the components of system (such as on routing protocols in ad hoc networks), privacy issues in Peer to Peer systems, trusted collaboration and dissemination of data among cooperative entities).........(9-12 hours)
- Design and Implementation of Prototype/commercial systems, Experimental Evaluations. Details of peer to peer system developed at Purdue and several commercial systems .....(9 Hrs)
- Research Issues in Cloud Computing (see web site under CS 590: http://www.cs.purdue.edu/homes/bb/#teaching)
Assignments and Grading Policy:
The following assignments/exams etc. are planned.Non-programming assignments (4/5) Once every two weeks. |
20% |
Programming assignment (1) A programming project that you select from a choice of 12-15 projects. You may suggest a project based on your interest. |
30% |
Class contributions (e.g., class attendance and participation, discussions, outside reading/presentation of research papers). You should attend all classes for highest class contribution credit. |
10% |
Midterm Exam | 20% |
Final Exam | 20% |
You can expect a grade of A- if you score 90-93, an A if you score 94 or higher. Same scale for B and C grade. If you do not do any part of the project or home works or score less than 75%, a C grade is possible.
Programming Assignment
- Your programming project can involve implementing some components for integrity, security, reliability, or privacy policy. Communication facility for LAN, or routing in mobile ad-hoc networks can also be developed.
- You can conduct experiments on ns2, or peer to peer system prototype, Java RMI. I suggest using Java and Java RMI for communications, but other programming languages (e.g., Javascript, C, etc) and communication frameworks are acceptable. If you haven't used RMI, this tutorial available on oracle is a good starting point.
Class contribution
Some things that contribute to class participation:- Attending classes, outside seminars of interest, interacting inside or outside class with professor and TA, contributing to other students learning, helping with projects, contributing to Piazza.
- If you contribute by reading extra material form other books, research papers, it will count towards class participation.
- Some of these items may not be possible for online students.
Using Piazza
We will use piazza.com for the class. I encourage you to register there if you are not registered yet since it is the best way to ask questions related to the course. Besides, some announcements related to assignments will be done via Piazza. Once you are registered, you will be able to find the class "CS 54200 Distributed Databases" and join it as a student. If you can't find the class then use the following link to sign-up for it: Join CS 54200 (Spring 2022) in Piazza.Late Submission Policy
Penalty for late homework is 10% for each class day after the due date. Problems on the grading of assignments/exams/project must be resolved within one week after the graded work is returned or score is posted in Blackboard. The grades will not be modified after the one week time period. Please, refer to Piazza for further explanation of the Policy.Absence from Class
You must attend all the classes. Absence in five or more classes will lower your final grade. Please inform TA or instructor in advance of absence due to illness, job interview or family issue. However, we will accomodate all who are required to be quarantined due to covid or other health related issues. Please contact prior to the class in such situation.Academic Dishonesty
One can not use any part of another person's work in his/her assignments and projects. If an overlap in any submission for grading is detected, an automatic grade of F in course will be assigned to the student and it will be reported to graduate school and deans. You are welcomed to discuss and learn from others. If you use any material in your homework, you have to put a reference to it. If you do a group project, specify individual contributions clearly at the time of submission and let me and TA know in advance of collaborating. Read the following links: Course Policies, Academic Integrity.Comprehensive Reading list
Along the semester, try reading at least one paper for each section.Big Data
- What is Big Data?, Jennifer Dutcher, September 2014.
- Big Data Means at Leat Three Different Things, Michael Stonebraker (Slides).
- Data Science and Prediction, Dhar, Vasant. Data science and prediction. Communications of the ACM 56.12 (2013): 64-73.
- Bigtable: A Distributed Storage System for Structured Data. J. Fay Chang, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra & R. Gruber. (2006, November). In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (Vol. 7, pp. 15-15).
- H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. VLDB, 2008
Distributed Systems
- Time, Clocks, and the Ordering of Events in a Distributed System, Lamport, Leslie. Communications of the ACM 21.7 (1978): 558-565.
- Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. J. Mace, R. Roelke, & R. Fonseca. ACM Transactions on Computer Systems (TOCS), 35(4), 1-28. 2020, March.
Implementation
- Building Distributed Database Systems, Bharat Bhargava.
- The Raid Distributed Database System, Bharat Bhargava and John Riedl, IEEE Trans on Software Engineering, 15(6), June 1989.
- PROMISE:Peer-to-Peer Media Streaming Using CollectCast,M. Hefeeda, A. Habib, B.Botey, D. Xu, and B. Bhargava, in Proc. of ACM Multimedia, 45-54, Berkeley, CA,November 2003.
- Adaptability Experiments in the RAID Distributed Database System , B. Bhargava, K. Friesen, A. Helal, and J. Riedl. Adaptability Experiments in the Raid Distributed Database Systems, in Proceedings of Ninth IEEE Symposium on Reliability in Distributed Systems, Huntsville, AL, October 1990
- Operating System Support for Database Management, Michael Stonebraker, Communications of the ACM, Volume 24 Issue 7, Pages 412-418, July 1981.
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. 2017. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD 17).
- Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership ChangesAlexandre Verbitski, Anurag Gupta, Debanjan Saha, James Corey, Kamal Gupta, Murali Brahmadesam, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvilli, and Xiaofeng Bao. 2018. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD 18).
- Amazon Aurora
- Amazon Aurora Global Database
Useful Links:
Concurrency
- Concurrency Control in Database Systems Bharat Bhargava, IEEE Trans on Knowledge and Data Engineering,11(1), Jan.-Feb. 1999
- A Model for Adaptable Systems for Transaction Processing, Bharat Bhargava and John Riedl, IEEE Transactions on Knowledge and Data Engineering, 1(4), Dec 1989.
- A Causal Model for Analyzing Distributed Concurrency Control Algorithms, B. Bhargava and C. Hua, IEEE Trans. on Software Engineering, SE-9, 470-486, 1983.
- Global scheduling for flexible transactions in heterogeneous distributed database systems, A. Zhang, M. Nodine, and B. Bhargava. IEEE TKDE, 13(3), 2001.
- Concurrency Control in Distributed Database Systems , P. Bernstein, N. Goodman, ACM Computer Survey, 13(2), 1981.
- Concurrency control in a system for distributed databases (SDD-1), P. Bernstein, D.Rothnie, ACM Transactions on Database Systems, 5(1), 1980.
- The Transaction Concept: Virtues and Limitations , Jim Gray, VLDB, 1981.
- On Optimistic Methods for Concurrency Control , H.T. Kung and John T. Robinson, ACM Trans. Database Systems, 6(2), 1981.
- The serializability of concurrent database updates, C. Papadimitriou, Journal of the ACM (JACM), 26(4), 1979.
- Granularity of Locks and Degrees of Consistency in a Shared Data Base, J. Gray, R. Lorie, G. Putzolu, I. Traiger. Modelling in Data Base Management Systems, G.M. Nijssen (ed). North Holland Publishing Company, 1976.
- Real-time lock-based concurrency control in distributed database systems. O. Ulusoy, G. Belford. Proceedings of the 12th International Conference on Distributed Computing Systems, 1992.
- 2PL and Conflict Graph (Proof of 2PL). [pdf]
- Locking-Serializability. [pdf]
- Optimistic-timestamps-Failure-Commitment. [pdf]
Useful handouts (Jeff Ullman's Principles of Database book):
Communication Network
- Communication Facilities for Distributed Transaction Processing Systems, E. Mafla, and B. Bhargava, IEEE Computer, 24(8), 1991. [slides]
- A Communication Framework for Digital Libraries, B. Bhargava and M. Annamalai, Multimedia Tools and Applications International Journal, Volume 10, No 2, pp 205-236, April 2000.
- Evolution of a communication system for distributed transaction processing in RAID, B. Bhargava, Y. Zhang, and E. Mafla, Computing Systems Journal, 4(3), 1991.
- WANCE: Wide area network communication emulation systems, Y. Zhang and B. Bhargava, IEEE workshop on Parallel and Distributed Systems, 1993 [slides]
- Bandwidth Measurements for VMs in Cloud, Amit Gupta, Rohit Ranchal [slides]
- A Facility For Experiments Distributed Software in the Internet. Y. Zhang and . Bhargava. In Proceedings of the IEEE Workshop in Advances in Parallel and Distributed Systems.
- Improve Operations of Data Center Networks with Physical-Layer Programmability. Yiting Xia. April, 2020. [slides] [video]
- High Speed Trading, When milliseconds mean millions - March 1, 2017.
- High Speed Trading, Why Michael Lewis thinks the stock market is rigged by high-frequency trading - April 9, 2014
- High Speed Trading, Is the U.S. stock market rigged? - March 30, 2014.
- High Speed Trading, Wall Street's Speed War - September 9, 2010.
Talks:
Recommended articles:
Security
- A Taxonomy of Security Attacks and Issues in Vehicular Ad-Hoc Networks (VANETs), Parul Tyagi, Deepak Dembla, International Journal of Computer Applications (0975 - 8887) Volume 91 - No.7, April 2014.
- Detecting Service Violation in Internet and Mobile Ad Hoc Networks [slides] [abstract]
- An MTD-based Self-Adaptive Resilience Approach for Cloud Systems. M. Villarreal-Vasquez, B. Bhargava, P. Angin, N. Ahmed, D. Goodwin, K.Brin and J. Kobes. IEEE CLOUD, June 2017. [slides] [demo]
- Attacks on Networks
- Vulnerabilities and Threats in Distributed Systems. ICDCIT, Dec. 2004. [slides] [extended slides]
- Collaborative Attack in Wireless Ad Hoc Networks. Bharat Bhargava. CERIAS, 2007. [slides]
- Secure / Resilient Systems and Data Dissemination / Provenance. Bharat Bhargava. NGCRC Symposium, Apr. 25-26, 2016. [slides] [demo#1 video]
- Talk for Secure Data Warehouse. [slides]
Talks:
Privacy
- Trust-Based Privacy Preservation for Peer-to-peer Media Streaming, Y. Lu, W, Wang, D. Xu, and B. Bhargava, in Proceedings of Secure Knowledge Management (SKM) Amherst, NY, September 2004 [slides]
- Private and Trusted Collaborations, B. Bhargava and L. Lilien, in Proceedings of Secure KnowledgeManagement (SKM) Amherst, NY, Septemeber 2004.
- Privacy - Preserving Data Dissemination in Untrusted Cloud. Denis Ulybyshev, Bharat Bhargava, Miguel Villarreal-Vasquez, Aala Oqab Alsalem. IEEE Cloud, 2017. [slides] [video]
- Innovative Ideas in Privacy Research. Bharat Bhargava. Sept. 2006. [slides]
- Active Bundle for Privacy
Talks:
Useful Resources:
Reliability
- An Experimental Analysis of Replicated Copy Control During Site Failure and Recovery, B. Bhargava, P. Noll, and D. Sabo. In Proceedings of International Conference on Data Engineering, p.82-91, Feb. 1988.
- Resilient Concurrency Control in Distributed Database Systems, B. Bhargava, IEEE Trans. on Reliability, R-31(5): 437-443, 1984.
- Optimism and Consistency in partitioned distributed database systems, S. B. Davidson, ACM Trans. on Database Systems, 9(3): 456-481, 1984.
- Consistency in Partitioned Networks, S. B. Davidson, H. Garcia-Molina, and D. Skeen, ACM Computer Survey, 17(3): 341-370, 1985.
- Nonblocking commit protocols, D. Skeen, ACM SIGMOD, 1981.
- A Decentralized Termination Protocol, D. Skeen, IEEE Symposium on Reliability in Distributed Software and Database Systems, July, 1981.
- A Formal Model of Crash Recovery in a Distributed System, D. Skeen and M. Stonebraker, IEEE Trans. on Software Engineering, 9(3): 219-228, 1983.
- Detection of Mutual Inconsistency in Distributed Systems, Jr. D. Parker, et al., IEEE Trans. on Software Engineering, SE-9, 1983.
- How to Assign Votes in a Distributed System, Hector Garcia-Molina, Daniel Barbara, J. ACM 32(4): 841-860, 1985.
- A History of the Virtual Synchrony Replication Model, Ken Birman. Appears in Replication:Theory and Practice. B. Charron-Bost, F. Pedone, A. Schiper (Eds), Replication, LNCS 5959, pp. 91–120, 2010.
- Reliable communication in the presence of failures, K.Birman, T. Joseph, ACM Transactions on Computer Systems (TOCS), Volume 5 Issue 1, Feb. 1987.
- Transaction Processing and Consistency Control of Replicated Copies during Failures in Distributed Databases, B. Bhargava, Journal of Management Information Systems, Vol. 4 No. 2, Fall 1987.
- Site Recovery in Replicated Distributed Database Systems, B. Bhargava, Computer Science Technical Reports, Report Number: 85-564, 1985.
- Consistency in Partitioned Networks, S. B. Davidson, H. Garcia-Molina, and D. Skeen, ACM Computer Survey, 17(3): 341-370, 1985.
- A Dynamic Majority Determination Algorithm for Reconfiguration of Network Partitions, B. Bhargava and P.L. Ng, Information Sciences 46, pp 27-45, 1987.
- PLANET: Making Progress with Commit Processing in Unpredictable Environments, Gene Pang, Tim Kraska, Michael Franklin, Alan Fekete, SIGMOD 2014.
Mobile
- Peer-to-Peer File-sharing over Mobile Ad Hoc Networks, G. Ding and B. Bhargava, in the first International Workshop on Mobile Peer-to-Peer Computing, Orlando, Florida, March 2004.
- Data Consistency in Intermittently Connected Distributed Systems, E. Pitouri and B. Bhargava, IEEE TKDE, 11(6), 1999.
- Maintaining Consistency of Data in Mobile Distributed Environments, Evaggelia Pitoura, Bharat Bhargava, ICDCS, 1995.
- Optimal File Allocation in Multiple Computer System, W. W. Chu, IEEE Transaction on Computers, 885-889, October 1969.
Cloud Computing
- http://www.cs.purdue.edu/homes/bb/cs590/
- Hey, You, Get Off of My Cloud
- Amazon's response to "Hey, you, get off my cloud"
- An Entity-centric Approach for Privacy and Identity Management in Cloud Computing
- A Mobile-Cloud Collaborative Traffic Lights Detector for Blind Navigation
- Paper- Above the Clouds: A Berkeley View of Cloud Computing
- PPT- Above the Clouds: A Berkeley View of Cloud Computing
- NIST Cloud Computing
NISTCloud Computing Slides - Chunxiao Li, Anand Raghunathan, Niraj K. Jha Secure Virtual Machine Execution under an Untrusted Management OS(Slides) (Presenter: Pelin Angin or Prof Raghunathan of ECE)
- Security and Privacy in Cloud Computing
- Mehdi Azarmi, et. al., End to End security in Service Oriented Architecture (Slides) (Presenter: Mehdi Azarmi)
- A guardian Angel in your car. Tekla Perry. IEEE Spectrum, 2019.
Additional Reading Material
- Though there are many distributed databases to choose from, some examples of distributed databases include Apache Ignite, Apache Cassandra, Apache HBase, Couchbase Server, Amazon SimpleDB, Clusterpoint, and FoundationDB. Apache Ignite specializes in storing and computing large volumes of data across clusters of nodes.
- Conceptually the one the comes to mind is your old school common telephone book. Every area has phone numbers specific to their area. Collectively it is not feasible to store all the possible numbers in one place. The book would be about the size of a small skyscraper. But divided up a telephone book provides 99% of the commonly needed numbers for your local community and can be easily published and traversed.
- In distributed systems, data is distributed among different database systems of an organization. These database systems are connected via communication links. Such links help the end-users to access the data easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite, etc.
- Apache Cassandra
- Apache HBase
- Distributed Database For High Performance Applications With In Memory Speed
- Distributed Database System
- Distributed DBMS
- Distributed Databases
- SQL vs NoSQL: What is the Difference Between SQL and NoSQL
- MongoDB
Student Proposed Papers
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing - USENIX
- The Case for Shared Nothing
- Bigtable: A Distributed Storage System for Structured Data
- Dynamo: Amazon's Highly Available Key-value Store
- Practical Software Model Checking via Dynamic Interface Reduction
Homework
Homework assignments are generally due on Thursday midnight (EST). Check individual homework for the exact due date. The homework should be typed and submitted via BrightSpace.- Homework 1 Due on 02/01/2022 11:59 PM (EST)
- Homework 2 Due on 02/23/2022 11:59 PM (EST)
- Homework 3 Due on 03/30/2022 11:59 PM (EST)
- Homework 4 Due on 04/20/2022 11:59 PM (EST)
Project
The programming assignment may involve implementing some algorithm or communication facility for transaction processing.
The demonstration of projects will be scheduled on the next week after submission. Please discuss your project with the Professor or TA soon and get started.
You may want to talk about the project in class and see if you can partner with some other student.
The easy projects are on Two Phase Locking and Distributed Commit protocols. You will emulate 3-4 sites and communicate among them to accomplish transaction processing. For the complete list of projects and more information, please click here
Lecture slides
- Introduction, DDBMS, Distributed DB background
- Dist. DB Architecture, Dist. DB Design - Fragmentation, Dist. DB Design - Data Location
- Implementation issues and RAID system, Dist. Query Processing, Dist. Query Optimization
- Dist. Transaction Management: Transaction Concepts, ACID, Transaction Models
- Concurrency Control Ideas, Concurrency Control Algorithms, Concurrency-Control Research, Formal Concurrency Control, Deadlocks
- Optimistic CC, Distributed & Centralized, Performance
- Commit/Termination Protocols: 2PC, 3PC, Degrees of Commitment, Termination
- Reliability and Fault-Tolerant Mechanisms, Failure and Recovery, Logging
- Site Failure and Recovery, Partitioning
- Reliability Partition, Mutual Consistency
- Distributed Version Management
- Mobile Database Systems
- Privacy, Trust and Authentication, Privacy-Anonymity
- P2P systems
Book Slides
Please find the slides with the following links.- Ch1-Introduction
- Ch2-Background
- Ch3-Distribution Design
- Ch4-Data Integration
- Ch5-Semantic
- Ch6-Query Intro
- Ch7-Query Localization
- Ch8-Query Optimization
- Ch9-Query MDB
- Ch10-Transactions
- Ch11-Conc Control
- Ch12-Reliability
- Ch13-Replication
- Ch14-Parallel DBMS
- Ch15-Object DBMS
- Ch16-P2P
- Ch17-Web
- Ch18-Current Issues
Protect Purdue
Please refer to the protect purdue link.
Any student who has substantial reason to believe that another person is threatening the safety of others by not complying with Protect Purdue protocols is encouraged to report the behavior to and discuss the next steps with their instructor. Students also have the option of reporting the behavior to the Office of the Student Rights and Responsibilities. See also Purdue University Bill of Student Rights and the Violent Behavior Policy under University Resources in Brightspace.
Mental Health/Wellness Statement
If you find yourself beginning to feel some stress, anxiety and/or feeling slightly overwhelmed, try WellTrack. Sign in and find information and tools at your fingertips, available to you at any time.
If you need support and information about options and resources, please contact or see the Office of the Dean of Students. Call 765-494-1747. Hours of operation are M-F, 8 am- 5 pm..
If you find yourself struggling to find a healthy balance between academics, social life, stress, etc., sign up for free one-on-one virtual or in-person sessions with a Purdue Wellness Coach at RecWell. Student coaches can help you navigate through barriers and challenges toward your goals throughout the semester. Sign up is completely free and can be done on BoilerConnect. If you have any questions, please contact Purdue Wellness at evans240@purdue.edu.
If you’re struggling and need mental health services: Purdue University is committed to advancing the mental health and well-being of its students. If you or someone you know is feeling overwhelmed, depressed, and/or in need of mental health support, services are available. For help, such individuals should contact Counseling and Psychological Services (CAPS) at 765-494-6995 during and after hours, on weekends and holidays, or by going to the CAPS office on the second floor of the Purdue University Student Health Center (PUSH) during business hours.
In the event of Quarantined/Isolated
If you must miss class at any point in time during the semester, please reach out to me via Purdue email so that we can communicate about how you can maintain your academic progress. If you find yourself too sick to progress in the course, notify your adviser and notify me via email or Brightspace. We will make arrangements based on your particular situation. Please note that, according to Details for Students on Normal Operations for Fall 2021 announced on the Protect Purdue website, individuals who test positive for COVID-19 are not guaranteed remote access to all course activities, materials, and assignments
.
Nondiscrimination Statement
Purdue University is committed to maintaining a community that recognizes and values the inherent worth and dignity of every person; fosters tolerance, sensitivity, understanding, and mutual respect among its members; and encourages each individual to strive to reach his or her potential. In pursuit of its goal of academic excellence, the University seeks to develop and nurture diversity. The University believes that diversity among its many members strengthens the institution, stimulates creativity, promotes the exchange of ideas, and enriches campus life. A hyperlink to Purdue's full Nondiscrimination Policy Statement is included in our course Brightspace under University Policies.