Purdue CS592: Databases for AI: Vector Databases
(Spring 2026)




Course Description

Vector databases have recently emerged as a hot topic in the broader realm of databases for AI. The surge of interest is largely fueled by large language models (LLMs), where vector databases help overcome inherent limitations such as hallucinations, lack of domain expertise, and the inability to incorporate real-time information. This is enabled by the new paradigm of Retrieval-Augmented Generation (RAG), in which vector databases act as external knowledge bases, delivering relevant context to LLMs via vector search. While vector search itself is not new, modern vector databases face a host of system-level challenges, which we will explore in depth in this course.
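To make the RAG retrieval step above concrete, here is a minimal sketch of top-k vector search feeding context into an LLM prompt. Everything in it is an illustrative assumption rather than course material: `embed` is a toy bag-of-words stand-in for a real embedding model, the brute-force scan stands in for the approximate indexes a real vector database would use, and the names `embed`, `cosine`, `top_k`, `build_prompt`, and the sample `docs` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): sparse bag-of-words term counts.

    A real RAG pipeline would call a learned embedding model instead.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Brute-force vector search: score every document, keep the k best."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(top_k(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base, used only for illustration.
docs = [
    "DiskANN is a graph-based index for vector search on SSDs",
    "pgvector adds vector search to PostgreSQL",
    "Spring break is in March",
]

if __name__ == "__main__":
    print(build_prompt("vector search in PostgreSQL", docs, k=2))
```

The exhaustive scan in `top_k` costs time linear in the number of documents. Replacing it with an index that scales, while handling updates, filtering, and storage, is one way to view the system-level challenges mentioned above.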

In this seminar, we will first cover the fundamentals of vector databases. Then we will feature a series of invited talks and student presentations covering recent advances in vector databases. By the end of the course, students are expected to gain a solid understanding of the challenges, the state-of-the-art techniques, and the open problems in vector databases.




Instructor




Logistics

  • When: MW 4:30-5:45pm
  • Where: GRIS 133
  • Office hours: after class or by appointment
  • Prerequisites: No prior experience with vector databases is required. However, familiarity with data structures (e.g., CS251), databases (e.g., CS348 or CS448), and introductory AI/ML (e.g., CS242 or CS243) will be helpful.



Online communications

  • We'll use Piazza for announcements, discussions, and Q&A.
  • We'll NOT use Brightspace, except to send occasional emails.



Schedule (More Speakers to Be Invited)

Lecture | Talk/Paper Title | Presenter
Lec 01 (01/12) | Introduction to Vector Databases | Jianguo Wang
Lec 02 (01/14) | Introduction to Vector Databases | Jianguo Wang
Lec 03 (01/19) | Cancelled due to MLK Day | N/A
Lec 04 (01/21) | Cancelled due to business trip | N/A
Lec 05 (01/26) | Introduction to Vector Databases | Jianguo Wang
Lec 06 (01/28) | VectorChord - Scalable Vector Search in PostgreSQL | Allen Zhou @ VectorChord
Lec 07 (02/02) | Elasticsearch: world's most downloaded vector database | Mayya Sharipova @ Elastic
Lec 08 (02/04) | TBD | TBD
Lec 09 (02/09) | The DiskANN Vector Indexing Library: From Ideas to Microsoft Scale | Harsha Simhadri @ Microsoft
Lec 10 (02/11) | Database Systems in the AI Era | Qi Chen @ Microsoft
Lec 11 (02/16) | MyVector: Vector Search & AI Database Tools for MySQL | Alkin Tezuysal @ Altinity
Lec 12 (02/18) | RAG on Financial Documents | Xinyu Wang @ McGill University
Lec 13 (02/23) | RetroInfer Vector Store for LLM Inference | Baotong Lu @ Microsoft
Lec 14 (02/25) | Building a search system for AI workloads | Tanuj Nayak @ Chroma
Lec 15 (03/02) | RaBitQ | Cheng Long @ NTU
Lec 16 (03/04) | pgvector Deep Dive | Jonathan Katz @ Databricks
Lec 17 (03/09) | BANG: Billion-Scale Approximate Nearest Neighbour Search using a Single GPU | Karthik V. and Jyothi Vedurada @ IIT
Lec 18 (03/11) | Building Stateless Serverless Vector DBs via Block-based Data Partitioning | Daniel Barcelona-Pons @ URV
Lec 19 (03/16) | Cancelled due to spring break | N/A
Lec 20 (03/18) | Cancelled due to spring break | N/A
Lec 21 (03/23) | GPU-Accelerated ANNS: Quantized for Speed, Built for Change | Hunter McCoy @ Northeastern Univ.
Lec 22 (03/25) | Building Vector Search in MongoDB: Systems Challenges at Scale | Chunbin Lin @ MongoDB
Lec 23 (03/30) | Beyond Search: Full Embedding Lifecycle in Cloud SQL for MySQL | Shu Zhou @ Google
Lec 24 (04/01) | Vector Database Systems for AI Workloads | Yunan Zhang @ Purdue
Lec 25 (04/06) | Index Maintenance in Vector Database Systems | Chenzhe Jin @ Purdue
Lec 26 (04/08) | Vector Database for Aviation Safety Narrative Report Analysis | Shuo Liu @ Purdue
Lec 27 (04/13) | Cancelled due to business trip | N/A
Lec 28 (04/15) | Cancelled due to business trip | N/A
Lec 29 (04/20) | Auto Tuning in Vector Databases | Yibo Wang @ Purdue
Lec 30 (04/22) | Final Project Presentation | N/A
Lec 31 (04/27) | Final Project Presentation | N/A
Lec 32 (04/29) | ByteDance Vector Search | Silu Huang @ ByteDance