Part 5: Deep Dive - Spark Window Functions
This is the final part of the deep dive series on understanding the internals of Apache Spark Window Functions. In earlier parts, we focused on the implementation details of the individual components that Window functions need in order to operate. We used the rank window function as an example and examined each component thoroughly. We explored various row representations such as UnsafeRow, SpecificInternalRow, and JoinedRow, and how they improve data processing in Spark. W
Dec 2, 2024 · 6 min read
Part 4: Deep Dive - Spark Window Functions
In part 3, we explored the internal workings of AggregateProcessor and how it evaluates expressions using our rank() window function example. In this post, we will take a close look at the WindowFunctionFrame. Since Spark provides several different types of frames, it's not feasible to cover each one in this post. Therefore, we will concentrate on one specific frame type called UnboundedPrecedingWindowFunctionFrame, also used by our rank function example from part
Nov 29, 2024 · 6 min read
Part 3: Deep Dive - Spark Window Functions
In the previous post, we briefly alluded to AggregateProcessor without going into depth. In this part, we will focus on its implementation details. To set things up, an AggregateProcessor is a component that facilitates computing aggregate operations such as SUM and RANK. To ensure high efficiency in a scalable system like Spark, this aggregation process sacrifices immutability, a fundamental concept in the Scala language. AggregateProcessor uses mutable cl
Nov 22, 2024 · 7 min read
Part 2: Deep Dive - Spark Window Functions
This is part 2 of the deep dive series on understanding the internals of Apache Spark Window Functions. Following our discussion of Spark's rank()...
Nov 19, 2024 · 4 min read
Part 1: Deep Dive - Spark Window Functions
This is part 1 of the deep dive series on understanding the internals of Apache Spark Window Functions. The full series is based on Spark version 3.1x. Introduction: A window function in query processing is a function that processes a subset of data, often called a window frame, over a large dataset. A window frame defines the boundaries (i.e., the start and end rows) within which a window function is applied to a set of rows. Consider column values as [1, 2, 3, 4, 5] and we want to calc
Nov 16, 2024 · 3 min read
