Building Efficient OLAP Index With Roaring Bitmap
Today, data is growing at an exponential rate, and the need to analyze and execute queries in near real time on petabytes of data has pushed OLAP (Online Analytical Processing) engines to evolve well beyond their batch-oriented origins. Historically, OLAP systems were designed to process large datasets in batch mode, with efficient query execution treated as a secondary concern. Modern workloads, however, increasingly demand that OLAP engines behave more like OLTP (Online Tra
Nov 28, 2025 · 6 min read
Part 1: Deep Dive - Spark Window Functions
This is Part 1 of a deep-dive series on the internals of Apache Spark window functions. The full series is based on Spark version 3.1x. Introduction A window function in query processing is a function that processes a subset of data, often called a window frame, over a large dataset. A window frame defines the boundaries (i.e., the start and end rows) within which a window function is applied to a set of rows. Consider column values as [1, 2, 3, 4, 5] and we want to calc
Nov 16, 2024 · 3 min read
The Design of Causal Consistent Databases
Today's distributed systems are complex and varied, and they require different data consistency guarantees. Amongst all,...
Jul 18, 2023 · 9 min read
Erasure codes for Distributed Storage
Data is growing exponentially, and so is the need for massively scalable distributed storage systems. A distributed storage...
Nov 10, 2022 · 8 min read
