
Building Efficient OLAP Index With Roaring Bitmap

  • Writer: Rishaab
  • 6 min read


Today, data is growing at an exponential rate, and the need to analyze and execute queries in near real time on petabytes of data has pushed OLAP (Online Analytical Processing) engines to evolve well beyond their batch-oriented origins. Historically, OLAP systems were designed to process large datasets in batch mode, with efficient query execution treated as a secondary concern. Modern workloads, however, increasingly demand that OLAP engines behave more like OLTP (Online Transaction Processing) systems, providing fast query responses with predictable performance. OLTP systems achieve this through efficient indexing structures such as B-trees or B+-trees. Unfortunately, these structures do not translate well into OLAP environments because the access patterns are fundamentally different. OLTP workloads are point-lookup heavy, whereas OLAP workloads involve scanning extremely large volumes of data. Using B-trees in OLAP systems results in excessive random I/O and poor performance.


Because OLAP workloads operate at massive scale, they require fundamentally different indexing strategies. An ideal OLAP index must be simple, compressible, highly dense, and capable of packing a large amount of information into a small memory footprint. Reducing memory footprint also reduces I/O, since each index page can store more information. Bitmaps naturally satisfy many of these requirements. Bitmaps can encode large numbers of values compactly, and their compressibility makes them even more attractive.


To understand how OLAP systems use bitmaps effectively, consider that OLAP queries often access only a few columns but must process millions or billions of rows. For this reason, we will focus on a columnar OLAP engine.



Consider a dataset named Employee which contains billions of rows. Only a few rows are shown below for illustration.

Row   Name
0     Bill
1     Jeff
2     Sam
3     Sundar
4     Satya
5     Bill
6     Sam
7     Sundar
8     Sundar
9     Bill


Suppose we have to execute the following SQL query on this dataset:

SELECT COUNT(*) FROM Employee WHERE Name = 'Bill' OR Name = 'Sundar'

One simple way to execute this query is to go over all the billions of rows, filter out those that contain Bill or Sundar, and count them. This approach works, but it is very expensive and won't deliver fast query performance. Since we filter on the Name column, we can build an index on it using a bitmap approach.


Let's introduce a bitmap for each unique name in the Name column. Each bit position corresponds to a row number, and a 1 indicates the presence of that name in that row. The MSB (most significant bit) is on the left and the LSB (least significant bit) is on the right.

Index for Bill:   (MSB)[1, 0, 0, 0, 1, 0, 0, 0, 0, 1](LSB)
Index for Jeff:   (MSB)[0, 0, 0, 0, 0, 0, 0, 0, 1, 0](LSB)
Index for Sam:    (MSB)[0, 0, 0, 1, 0, 0, 0, 1, 0, 0](LSB)
Index for Sundar: (MSB)[0, 1, 1, 0, 0, 0, 1, 0, 0, 0](LSB)
Index for Satya:  (MSB)[0, 0, 0, 0, 0, 1, 0, 0, 0, 0](LSB)

To evaluate the query, we simply OR the bitmaps for Bill and Sundar, which gives [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]. To count the matching rows, we count the number of 1's in the result, which is 6, and that's the result of the query.


Note that we never scanned the actual rows while executing the query. The result came entirely from the index, which makes the query very fast.
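The index construction and query evaluation above can be sketched in Python, using plain integers as bit vectors (a simplification for illustration; bit i set means the name appears in row i):

```python
# Sample of the Employee dataset from the table above.
names = ["Bill", "Jeff", "Sam", "Sundar", "Satya",
         "Bill", "Sam", "Sundar", "Sundar", "Bill"]

# Build one bitmap per unique name: set bit `row` when the name occurs there.
index = {}
for row, name in enumerate(names):
    index[name] = index.get(name, 0) | (1 << row)

# SELECT COUNT(*) FROM Employee WHERE Name = 'Bill' OR Name = 'Sundar'
combined = index["Bill"] | index["Sundar"]   # bitwise OR of the two bitmaps
count = bin(combined).count("1")             # population count of set bits
print(count)  # 6
```

The query never touches the `names` data at execution time; it only combines precomputed bitmaps and counts bits.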



But what if the name Jeff occurred only once in those 1 billion rows?

With the current approach we would still allocate a fixed bitmap of approximately 120 MB (10^9 bits / 8 / 1024 / 1024 ≈ 119 MB), even though we needed only one set bit. This is certainly not efficient. Bitmap indexes are extremely effective for dense data but wasteful for sparse data, where most bits are zero, as in this case.



Can we do better?

Certainly yes. A natural improvement is to use a hybrid approach: bitmaps for names that occur frequently in the dataset, and a different data structure for names that occur rarely, like Jeff in our case.


For Jeff, we could store the row number(s) in a sorted array of 32-bit integers. A single entry then consumes only 4 bytes, which is very efficient.


This approach can be formalized as follows: if a name occurs more than X times, use a bitmap; otherwise, use a sorted array.
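A minimal sketch of this rule is below. Real Roaring containers switch from array to bitmap at 4096 elements; the sketch uses the same cutoff, though any threshold illustrates the idea (the function name and return format are hypothetical):

```python
# Hybrid rule sketch: few occurrences -> sorted array, many -> bitmap.
THRESHOLD = 4096  # the cutoff Roaring uses for its 16-bit containers

def make_posting(rows):
    """Choose a representation for a name's row numbers."""
    rows = sorted(rows)
    if len(rows) <= THRESHOLD:
        return ("array", rows)        # 4 bytes per int-32 entry
    bitmap = 0
    for r in rows:
        bitmap |= 1 << r              # dense bit-per-row layout
    return ("bitmap", bitmap)

kind, _ = make_posting([1])           # Jeff appears only in row 1
print(kind)  # array
```

With this rule, Jeff's single row costs 4 bytes instead of a 120 MB bitmap, while frequent names keep the fast bitwise operations.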


Process of creating a bitmap or sorted array based on the size of element in an index



Can we do even better?

Yes. Consider this bitmap,

[0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]

This bitmap contains long sequences of consecutive 1's and 0's. We can optimize it using RLE (Run Length Encoding). The idea is simple: rather than storing every consecutive 1 in the bitmap, we store only the start and end positions of each run of consecutive 1's. Applying RLE changes the above bitmap to,

[(1, 5), (10, 14)]

This can drastically reduce the memory footprint for large clusters of 1's.
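A simple sketch of this encoding (function name illustrative) emits a (start, end) pair for each maximal run of 1's:

```python
def rle_encode(bits):
    """Run-length encode a bitmap as (start, end) pairs of 1-runs."""
    runs, start = [], None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i                      # a run of 1's begins
        elif not b and start is not None:
            runs.append((start, i - 1))    # run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(bits) - 1))  # run reaches the end
    return runs

print(rle_encode([0,1,1,1,1,1,0,0,0,0,1,1,1,1,1]))  # [(1, 5), (10, 14)]
```

Fifteen bits collapse to two pairs; the longer the runs, the larger the saving.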


Note that applying RLE compression may require scanning the existing bitmap, which can be costly. Systems therefore often run RLE compression as an asynchronous index-optimization task.


So far so good. What is a Roaring Bitmap then?

What we saw above is essentially how a Roaring Bitmap works. A Roaring Bitmap is a structure that dynamically switches between a sorted array and a bitmap based on how many elements fall into a given Container (discussed below). Production-ready Roaring implementations include more features, but the core idea stays the same; they also use techniques like RLE, described earlier, to compress the data layout.


Roaring bitmaps organize data into structures called Containers. These are the components that hold the actual values. For the Employee dataset example, they are the structures that store the row positions for each name. Each unique name will have its own container(s). Roaring uses three types of containers,

  • Array Container - stores values in a sorted array.

  • Bitmap Container - stores values in a bitmap.

  • Run Container - stores values using RLE.


Roaring Bitmap dynamically converts one container type to another depending on the count and distribution of elements within the container. These conversions are generally inexpensive, but OLAP access patterns should be designed so that the cost of these transitions is naturally amortized over large batches of operations, where the performance benefits far outweigh the occasional container conversion.


When inserting a value, Roaring Bitmap divides the value's bits into upper and lower halves. So, if a value is a 32-bit integer, Roaring splits it into its most significant 16 bits and least significant 16 bits.


If the 32-bit value is, for example, 0x0012ABCD, the upper 16 bits are 0x0012 and the lower 16 bits are 0xABCD. The upper half tells Roaring which container this value belongs to, and the lower half is the actual position stored inside that container.

Upper bits (0x0012) = 0000000000010010
Lower bits (0xABCD) = 1010101111001101

When inserting this value, Roaring looks up container 0x0012. If it exists, the value 0xABCD is added to it. If it does not, a new container is created for 0x0012. The container then stores the lower 16 bits in its optimal format. Think of it like maintaining a hash map of Containers, where the key is the upper 16 bits. Once the container is found, we insert the lower 16 bits into it. Roaring may then convert that container to another type depending on the number and distribution of elements after the insertion.
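The split-and-route step can be sketched as follows (a simplification: a real implementation stores each container in array, bitmap, or run form, while this sketch just uses a set):

```python
def split(value):
    """Split an int-32 into (container key, position within container)."""
    upper = value >> 16        # most significant 16 bits -> container key
    lower = value & 0xFFFF     # least significant 16 bits -> stored value
    return upper, lower

containers = {}                # hash map: upper 16 bits -> container contents

def insert(value):
    upper, lower = split(value)
    # Create the container on first use, then add the lower half to it.
    containers.setdefault(upper, set()).add(lower)

insert(0x0012ABCD)
u, l = split(0x0012ABCD)
print(hex(u), hex(l))  # 0x12 0xabcd
```

Every value whose upper 16 bits are 0x0012 lands in the same container, so nearby values cluster together, which is exactly what makes the per-container encodings effective.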



Insertion process in Roaring Bitmap


But why does Roaring split the bit representation into two halves?

Roaring does this for the following reasons,

  • The upper 16-bit space is usually very sparse in real datasets. If we tried to maintain a single flat 32-bit bitmap, almost all bits would be zero and the bitmap would be huge and wasteful. Splitting solves this problem by turning the sparse upper region into a small list of keys, and pushing all activity into small 16-bit chunks where the distribution is far more compressible.

  • A flat bitmap over the full 32-bit space would require 512 MB to accommodate all possible values; a 16-bit space, on the other hand, requires only 8 KB. An 8 KB container fits very well in modern CPU caches and provides good cache locality and performance, whereas a 512 MB bitmap will not fit in cache and results in poor performance.
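The sizes quoted above follow directly from bits / 8 = bytes, which a quick check confirms:

```python
# Bytes needed for a bit-per-value bitmap over each value space.
flat_32 = (2**32) // 8         # every possible 32-bit value
per_container = (2**16) // 8   # every possible 16-bit value

print(flat_32 // (1024**2), "MB,", per_container // 1024, "KB")  # 512 MB, 8 KB
```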




By combining the strengths of arrays, bitmaps, and run-length encoding, Roaring Bitmaps provide a highly efficient indexing technique. High-performance systems like Apache Druid use Roaring Bitmaps to deliver fast, scalable, and memory-efficient query processing at massive scale. The line between OLAP and OLTP is steadily blurring, and data structures like Roaring Bitmaps are playing a key role in closing that gap.


I hope you learned something new. See you next time and happy learning 🙂.


©2023 by Rishab Joshi.
