ReactDB: Fast and Real-Time Analytical Database

Analytical database queries are critical to supporting business decisions. Because these queries involve complicated computation over a large corpus of data, they typically take minutes to hours to execute. When information in the database is updated, the user must re-execute the query on the current snapshot of the database, which again takes a long time, so the result reflects a stale snapshot by the time it arrives. In this rapidly changing world, business intelligence should react to information updates in real time.

To this end, we design ReactDB, a new database that executes analytical queries quickly and reacts to database updates.

ReactDB is reactive in two ways. First, cached analytical queries are reactive to updates in the database. We observe that many analytical queries are repetitive, so we cache the intermediate results of frequent analytical queries. When data is updated, the cached results and ongoing transactions are updated incrementally in real time. This enables cached queries to complete immediately. The user may even subscribe to an analytical query and receive an updated query result whenever the database is updated.
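The incremental update of a cached query can be illustrated with a minimal sketch. It assumes a single cached aggregate of the form SELECT key, SUM(value) ... GROUP BY key. The class name CachedGroupSum, its methods, and the "insert"/"delete" operation names are hypothetical and chosen only for illustration; they are not ReactDB's interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class CachedGroupSum:
    """Hypothetical cached result of: SELECT key, SUM(value) FROM t GROUP BY key."""

    def __init__(self) -> None:
        self.sums: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)
        self.subscribers: List[Callable[[Dict[str, float]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, float]], None]) -> None:
        """Register a callback that receives the result after every update."""
        self.subscribers.append(callback)

    def result(self) -> Dict[str, float]:
        """Return the cached result without rescanning any base data."""
        return dict(self.sums)

    def apply(self, op: str, key: str, value: float) -> None:
        """Fold one base-table change into the cached result."""
        if op not in ("insert", "delete"):
            raise ValueError(f"unsupported operation: {op}")
        sign = 1 if op == "insert" else -1
        self.sums[key] += sign * value
        self.counts[key] += sign
        if self.counts[key] == 0:
            # The last row of this group was deleted, so the group disappears.
            del self.sums[key]
            del self.counts[key]
        snapshot = self.result()
        for callback in self.subscribers:
            callback(snapshot)


view = CachedGroupSum()
view.subscribe(lambda snapshot: print("updated result:", snapshot))
view.apply("insert", "eu", 10.0)
view.apply("insert", "us", 7.0)
view.apply("delete", "eu", 10.0)
```

Each update costs time proportional to the change rather than to the table size, which is why a cached query can be answered immediately. Aggregates such as MIN or MAX need extra state to handle deletions and are outside this sketch.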

Second, in ReactDB, the physical data layout and indexes are reactive to the data access pattern. Different queries need different physical data layouts and indexes for efficient access. Traditionally, these are tuned manually by the DBA, and the resulting configuration may be suboptimal for certain workloads.

We rethink the data layout of the database and consider the redo log as the ground truth. Both row-based and column-based data layouts are caches, optimized for point queries and analytical queries respectively. Indexes are also considered caches. Views and intermediate results of analytical queries may be materialized as caches. Each tuple in a base table may exist in zero or more caches. Maintaining a cache speeds up certain reads but penalizes all writes. The optimal balance between read and write operations depends on the data access pattern. We leverage reinforcement learning to explore the solution space and determine a set of caches according to historical queries. During online serving, the reinforcement learning procedure continues to monitor changes in the access pattern and updates caching decisions accordingly.
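The read/write trade-off behind cache selection can be made concrete with a toy cost model. The sketch below is an illustration under stated assumptions, not ReactDB's method: it replaces reinforcement learning with an exhaustive search over cache subsets, which is feasible only for a handful of candidates, and every cost constant, cache name, and workload count is invented for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

BASE_READ_COST = 100.0   # assumed cost of answering a read from the redo log alone
BASE_WRITE_COST = 1.0    # assumed cost of appending one write to the redo log


@dataclass
class Cache:
    name: str
    write_cost: float            # assumed extra cost this cache adds to every write
    read_cost: Dict[str, float]  # assumed cost per query kind when served from this cache


def workload_cost(caches: Sequence[Cache], workload: Dict[str, int]) -> float:
    """Total cost of a workload given as operation counts per kind."""
    total = 0.0
    for kind, count in workload.items():
        if kind == "write":
            # Every maintained cache must be updated on every write.
            total += count * (BASE_WRITE_COST + sum(c.write_cost for c in caches))
        else:
            # A read is served by the cheapest cache that supports its kind.
            best = min(
                (c.read_cost.get(kind, BASE_READ_COST) for c in caches),
                default=BASE_READ_COST,
            )
            total += count * best
    return total


def choose_caches(
    candidates: Sequence[Cache], workload: Dict[str, int]
) -> Tuple[Set[str], float]:
    """Exhaustively pick the cheapest cache subset (stand-in for the learned policy)."""
    best_set: Tuple[Cache, ...] = ()
    best_cost = workload_cost(best_set, workload)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            cost = workload_cost(subset, workload)
            if cost < best_cost:
                best_set, best_cost = subset, cost
    return {c.name for c in best_set}, best_cost


candidates = [
    Cache("row_layout", write_cost=1.0, read_cost={"point": 2.0}),
    Cache("column_layout", write_cost=3.0, read_cost={"scan": 20.0}),
]
read_heavy = {"point": 100, "scan": 10, "write": 10}
write_heavy = {"point": 10, "scan": 1, "write": 1000}
print(choose_caches(candidates, read_heavy))
print(choose_caches(candidates, write_heavy))
```

With these made-up numbers, the read-heavy workload is cheapest when both layouts are maintained, while the write-heavy workload is cheapest with no caches at all, because the maintenance cost on every write outweighs the read speedup. A learned policy would additionally have to estimate such costs from observed queries and adapt as the access pattern drifts.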