Stats
Learn more about multi-stage stats and how to use them to improve your queries.
Last updated
Was this helpful?
Learn more about multi-stage stats and how to use them to improve your queries.
Last updated
Was this helpful?
Multi-stage stats (MSE) are more complex but also more expressive than single-stage stats. While in single-stage stats Apache Pinot returns a single set of statistics for the query, in multi-stage stats Apache Pinot returns a set of statistics for each operator of the query execution. These stats are collected by default and included in the response of any MSE query.
Each operator has its own set of statistics, which are collected during the execution of the query. See the section to learn more about the different operator types and their statistics.
The recommended way to analyze the multi-stage stats is to use the visualizer included in the Pinot UI. It can be accessed by running a query in the Pinot controller UI and clicking on the Visual
button.
Then, the view is changed to only show the multi-stage stats in a graph format like the following, where each node represents an operator. Inside each node, you can see the operator type and the statistics collected for that operator. Nodes are connected with edges that represent the relationship between the operators. Parent operators are above their children, and the edges' width represents the time spent on the child operator.
For example, the following query in ColocatedJoinQuickStart:
Creates the following graph:
By default, the visualizer will only show the most important stats. To show all the stats, click on Show details
button in the bottom left corner of the visualizer.
The Pinot UI stats visualizer is a convenient way to see the multi-stage stats, but sometimes you may want to see the raw JSON format. For example, you may want to analyze the stats programmatically or use a different visualization tool. To do so, you can read the stageStats
field in the JSON response of the query.
For example, the same query used in the previous section returns: Returns the following stageStats
:
Each node in the tree represents an operation that is executed and the tree structure form is similar (but not equal) to the logical plan of the query that can be obtained with the EXPLAIN PLAN
command.
Here we can see there are 5 stages (one for each MAILBOX_SEND operator). A significant part of the time is spent in HASH_JOIN on stage 1, followed by the read on userFactEvents
. We can also see that stage 5, the one that reads from userFactEvents
, returns 40000 rows while the other stage returns 2494 rows, so as explained in , it is better to have the smaller table on the right side of the join, so the query would be faster if written as:
The graph being drawn is usually a tree-like structure, but it can be a directed acyclic graph (DAG) in some cases, like when using .
The stats are always a tree structure when using the JSON format, even when are used. In that case, the spooled stages will be included more than once in the tree. You will need to create the DAG yourself by looking at the stage
field for each operator and connect the operators with the same stage ID.