GROUP BY reordering

Jan 26, 2024·
Adrien Nayrat
Adrien Nayrat
· 5 min read
post Postgres

A commit caught my attention during my technical watch:

 1commit 0452b461bc405e6d35d8a14c02813c15e28ae516
 2Author:     Alexander Korotkov <akorotkov@postgresql.org>
 3AuthorDate: Sun Jan 21 22:21:36 2024 +0200
 4Commit:     Alexander Korotkov <akorotkov@postgresql.org>
 5CommitDate: Sun Jan 21 22:21:36 2024 +0200
 6
 7    Explore alternative orderings of group-by pathkeys during optimization.
 8
 9    When evaluating a query with a multi-column GROUP BY clause, we can minimize
10    sort operations or avoid them if we synchronize the order of GROUP BY clauses
11    with the ORDER BY sort clause or sort order, which comes from the underlying
12    query tree. Grouping does not imply any ordering, so we can compare
13    the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
14    we simply compared keys in the order specified in the query. This commit
15    explores alternative ordering of the keys, trying to find a cheaper one.
16
17    The ordering of group keys may interact with other parts of the query, some of
18    which may not be known while planning the grouping. For example, there may be
19    an explicit ORDER BY clause or some other ordering-dependent operation higher up
20    in the query, and using the same ordering may allow using either incremental
21    sort or even eliminating the sort entirely.
22
23    The patch always keeps the ordering specified in the query, assuming the user
24    might have additional insights.
25
26    This introduces a new GUC enable_group_by_reordering so that the optimization
27    may be disabled if needed.
28
29    Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
30    Author: Andrei Lepikhov, Teodor Sigaev
31    Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
32    Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
33    Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina

We notice the explicit commit message mentioning peoples involved (12 reviewers!) as well as the link to the discussion : POC: GROUP BY optimization

Work began in 2018! It took 5 and a half years to reach this commit. It’s the result of many exchanges to reach a consensus taking into consideration the multiple ideas.

To test this patch, I compiled Postgres from source. Let’s take a simple example:

1create table t1 (c1 int, c2 int, c3 int);
2create index on t1 (c1,c2);
3insert into t1 SELECT i%100, i%1000, i from generate_series(1,10_000_000) i;
4vacuum analyze t1;

We got a table of 422MB and an index of 66MB.

You may see, that I’ve used a Postgres 16 new feature by using underscores in the generate_series1.

If we do a GROUP BY on c1,c2, we get this plan:

 1explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c1,c2 order by c1,c2;
 2                                                                    QUERY PLAN
 3--------------------------------------------------------------------------------------------------------------------------------------------------
 4 GroupAggregate  (cost=0.43..235342.27 rows=100000 width=16) (actual time=3.605..3501.887 rows=1000 loops=1)
 5   Group Key: c1, c2
 6   Buffers: shared hit=10447
 7   ->  Index Only Scan using t1_c1_c2_idx on t1  (cost=0.43..159340.96 rows=10000175 width=8) (actual time=0.070..1900.730 rows=10000000 loops=1)
 8         Heap Fetches: 0
 9         Buffers: shared hit=10447
10 Settings: enable_group_by_reordering = 'off', random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
11 Planning:
12   Buffers: shared hit=1
13 Planning Time: 0.191 ms
14 Execution Time: 3502.036 ms

You’ll notice that I’ve disabled this feature (enable_group_by_reordering=off). I’ve also disabled parallelization for clarity.

We have the expected plan, Postgres reads 10447 blocks with an Index Only Scan path. It uses the plan that manipulates the fewest blocks possible.

However, if we change the group by order:

 1explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c2,c1 order by c1,c2;
 2                                                          QUERY PLAN
 3------------------------------------------------------------------------------------------------------------------------------
 4 Sort  (cost=602422.32..602672.32 rows=100000 width=16) (actual time=5393.446..5393.503 rows=1000 loops=1)
 5   Sort Key: c1, c2
 6   Sort Method: quicksort  Memory: 79kB
 7   Buffers: shared hit=96 read=53959
 8   ->  HashAggregate  (cost=514992.50..594117.50 rows=100000 width=16) (actual time=5392.351..5392.894 rows=1000 loops=1)
 9         Group Key: c2, c1
10         Batches: 1  Memory Usage: 3217kB
11         Buffers: shared hit=96 read=53959
12         ->  Seq Scan on t1  (cost=0.00..154055.00 rows=10000000 width=8) (actual time=0.033..1186.171 rows=10000000 loops=1)
13               Buffers: shared hit=96 read=53959
14 Settings: enable_group_by_reordering = 'off', random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
15 Planning:
16   Buffers: shared hit=1
17 Planning Time: 0.189 ms
18 Execution Time: 5394.566 ms

In this case, Postgres will read the entire table (seqscan) and sort it, although the result is identical.

It reads 422MB of data versus 80MB, i.e. 5 times more. The result can be disastrous, given storage performance. In this case, we’re lucky: my instance is on a ramdisk, so the query isn’t much slower. With mechanical storage or cloud disks, response times can increase drastically.

The order of columns in a group by is important, and is a fairly simple optimization as long as you know the database schema and are able to control the queries executed on the server.

Unfortunately, with ORMs, this understanding can be lost or corrections missed. That’s where this feature comes in. Let’s see what it does:

 1postgres=# set enable_group_by_reordering to on;
 2SET
 3postgres=# explain (settings, analyze,buffers) select count(*),c1,c2 from t1 group by c2,c1 order by c1,c2;
 4                                                                    QUERY PLAN
 5--------------------------------------------------------------------------------------------------------------------------------------------------
 6 GroupAggregate  (cost=0.43..235338.33 rows=100000 width=16) (actual time=4.502..3658.168 rows=1000 loops=1)
 7   Group Key: c1, c2
 8   Buffers: shared hit=10447
 9   ->  Index Only Scan using t1_c1_c2_idx on t1  (cost=0.43..159338.33 rows=10000000 width=8) (actual time=0.081..1923.553 rows=10000000 loops=1)
10         Heap Fetches: 0
11         Buffers: shared hit=10447
12 Settings: random_page_cost = '1.1', max_parallel_workers_per_gather = '0'
13 Planning:
14   Buffers: shared hit=1
15 Planning Time: 0.230 ms
16 Execution Time: 3658.337 ms

We got back the first plan, much more performant.

This is a simple feature that should improve execution times for certain queries whose group by clause has not been optimized.

It’s a recurring demand to improve the planner in order to handle apparently simple cases. Bear in mind that the risk is to increase the number of operations in the planner. But we want this step to be as quick as possible: “we don’t want to increase the complexity of the planner when we can correct the query”.

However, this type of optimization can be accepted if we know that it won’t be costly for the planner.

Let’s just hope this feature won’t be removed by the time Postgres 17 is released :)


  1. A little story: to add this feature, the author made the SQL standard evolve: Grouping digits in SQL↩︎

Adrien Nayrat
Authors
PostgreSQL DBA Freelance

For seven years, I held various positions as a systems and network engineer. And it was in 2013 that I first started working with PostgreSQL. I held positions as a consultant, trainer, and also production DBA (Doctolib, Peopledoc).

This blog is a place where I share my findings, knowledge and talks.