GROUP BY reordering
A commit caught my attention during my technical watch:
1commit 0452b461bc405e6d35d8a14c02813c15e28ae516
2Author: Alexander Korotkov <akorotkov@postgresql.org>
3AuthorDate: Sun Jan 21 22:21:36 2024 +0200
4Commit: Alexander Korotkov <akorotkov@postgresql.org>
5CommitDate: Sun Jan 21 22:21:36 2024 +0200
6
7 Explore alternative orderings of group-by pathkeys during optimization.
8
9 When evaluating a query with a multi-column GROUP BY clause, we can minimize
10 sort operations or avoid them if we synchronize the order of GROUP BY clauses
11 with the ORDER BY sort clause or sort order, which comes from the underlying
12 query tree. Grouping does not imply any ordering, so we can compare
13 the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
14 we simply compared keys in the order specified in the query. This commit
15 explores alternative ordering of the keys, trying to find a cheaper one.
16
17 The ordering of group keys may interact with other parts of the query, some of
18 which may not be known while planning the grouping. For example, there may be
19 an explicit ORDER BY clause or some other ordering-dependent operation higher up
20 in the query, and using the same ordering may allow using either incremental
21 sort or even eliminating the sort entirely.
22
23 The patch always keeps the ordering specified in the query, assuming the user
24 might have additional insights.
25
26 This introduces a new GUC enable_group_by_reordering so that the optimization
27 may be disabled if needed.
28
29 Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
30 Author: Andrei Lepikhov, Teodor Sigaev
31 Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
32 Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
33 Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
We notice the explicit commit message mentioning peoples involved (12 reviewers!) as well as the link to the discussion : POC: GROUP BY optimization
Work began in 2018! It took 5 and a half years to reach this commit. It’s the result of many exchanges to reach a consensus taking into consideration the multiple ideas.
To test this patch, I compiled Postgres from source. Let’s take a simple example:
1create table t1 (c1 int, c2 int, c3 int);
2create index on t1 (c1,c2);
3insert into t1 SELECT i%100, i%1000, i from generate_series(1,10_000_000) i;
4vacuum analyze t1;
We got a table of 422MB and an index of 66MB.
You may see, that I’ve used a Postgres 16 new feature by using underscores in the generate_series1.
If we do a GROUP BY on c1,c2, we get this plan:
1explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c1,c2 order by c1,c2;
2 QUERY PLAN
3--------------------------------------------------------------------------------------------------------------------------------------------------
4 GroupAggregate (cost=0.43..235342.27 rows=100000 width=16) (actual time=3.605..3501.887 rows=1000 loops=1)
5 Group Key: c1, c2
6 Buffers: shared hit=10447
7 -> Index Only Scan using t1_c1_c2_idx on t1 (cost=0.43..159340.96 rows=10000175 width=8) (actual time=0.070..1900.730 rows=10000000 loops=1)
8 Heap Fetches: 0
9 Buffers: shared hit=10447
10 Settings: enable_group_by_reordering = 'off', random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
11 Planning:
12 Buffers: shared hit=1
13 Planning Time: 0.191 ms
14 Execution Time: 3502.036 ms
You’ll notice that I’ve disabled this feature (enable_group_by_reordering=off). I’ve also disabled parallelization for clarity.
We have the expected plan, Postgres reads 10447 blocks with an Index Only Scan path. It uses the plan that manipulates the fewest blocks possible.
However, if we change the group by order:
1explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c2,c1 order by c1,c2;
2 QUERY PLAN
3------------------------------------------------------------------------------------------------------------------------------
4 Sort (cost=602422.32..602672.32 rows=100000 width=16) (actual time=5393.446..5393.503 rows=1000 loops=1)
5 Sort Key: c1, c2
6 Sort Method: quicksort Memory: 79kB
7 Buffers: shared hit=96 read=53959
8 -> HashAggregate (cost=514992.50..594117.50 rows=100000 width=16) (actual time=5392.351..5392.894 rows=1000 loops=1)
9 Group Key: c2, c1
10 Batches: 1 Memory Usage: 3217kB
11 Buffers: shared hit=96 read=53959
12 -> Seq Scan on t1 (cost=0.00..154055.00 rows=10000000 width=8) (actual time=0.033..1186.171 rows=10000000 loops=1)
13 Buffers: shared hit=96 read=53959
14 Settings: enable_group_by_reordering = 'off', random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
15 Planning:
16 Buffers: shared hit=1
17 Planning Time: 0.189 ms
18 Execution Time: 5394.566 ms
In this case, Postgres will read the entire table (seqscan) and sort it, although the result is identical.
It reads 422MB of data versus 80MB, i.e. 5 times more. The result can be disastrous, given storage performance. In this case, we’re lucky: my instance is on a ramdisk, so the query isn’t much slower. With mechanical storage or cloud disks, response times can increase drastically.
The order of columns in a group by is important, and is a fairly simple optimization as long as you know the database schema and are able to control the queries executed on the server.
Unfortunately, with ORMs, this understanding can be lost or corrections missed. That’s where this feature comes in. Let’s see what it does:
1postgres=# set enable_group_by_reordering to on;
2SET
3postgres=# explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c2,c1 order by c1,c2;
4 QUERY PLAN
5--------------------------------------------------------------------------------------------------------------------------------------------------
6 GroupAggregate (cost=0.43..235338.33 rows=100000 width=16) (actual time=4.502..3658.168 rows=1000 loops=1)
7 Group Key: c1, c2
8 Buffers: shared hit=10447
9 -> Index Only Scan using t1_c1_c2_idx on t1 (cost=0.43..159338.33 rows=10000000 width=8) (actual time=0.081..1923.553 rows=10000000 loops=1)
10 Heap Fetches: 0
11 Buffers: shared hit=10447
12 Settings: random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
13 Planning:
14 Buffers: shared hit=1
15 Planning Time: 0.230 ms
16 Execution Time: 3658.337 ms
We got back the first plan, much more performant.
This is a simple feature that should improve execution times for certain queries whose group by clause has not been optimized.
It’s a recurring demand to improve the planner in order to handle apparently simple cases. Bear in mind that the risk is to increase the number of operations in the planner. But we want this step to be as quick as possible: “we don’t want to increase the complexity of the planner when we can correct the query”.
However, this type of optimization can be accepted if we know that it won’t be costly for the planner.
Let’s just hope this feature won’t be removed by the time Postgres 17 is released :)
-
A little story: to add this feature, the author made the SQL standard evolve: Grouping digits in SQL . ↩︎
For seven years, I held various positions as a systems and network engineer. And it was in 2013 that I first started working with PostgreSQL. I held positions as a consultant, trainer, and also production DBA (Doctolib, Peopledoc).
This blog is a place where I share my findings, knowledge and talks.