
Interestingly, "aggregate first, join later" has been the standard way of joining fact tables in BI tools for a long time. Since fact tables are typically big and also share common dimensions, multi-fact joins for drill-across are best done by first aggregating on those common dimensions, then joining on them.
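
A rough sketch of that pattern (the table and column names here are made up for illustration): aggregate each fact table down to the shared dimension grain first, then join the two much smaller result sets.

    -- Pre-aggregate each fact table on the common dimensions...
    WITH sales_agg AS (
        SELECT date_key, product_key, SUM(amount) AS total_sales
        FROM sales
        GROUP BY date_key, product_key
    ),
    returns_agg AS (
        SELECT date_key, product_key, SUM(amount) AS total_returns
        FROM returns
        GROUP BY date_key, product_key
    )
    -- ...then drill across by joining the aggregates, not the raw fact rows.
    SELECT date_key, product_key, total_sales, total_returns
    FROM sales_agg
    JOIN returns_agg USING (date_key, product_key);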

Makes you wonder how many optimizations out there feel almost second nature in one domain but have never been applied in other domains, simply because no one thought to carry them over.



Something I really appreciate about PostgreSQL is that features don't land in a release until they are rock solid.

I don't think it's that nobody thought of it for PostgreSQL - I think it's that making sure it worked completely reliably, across the entire scope of existing PostgreSQL features and to their required level of quality, took a great deal of effort.


It's not that nobody thought of it. Group pushdown has been a thing in papers for ~10 years at least, but it's hard to plan; your search space (which was already large) explodes, and it's always hard to know exactly how many rows come out of a given grouping. I have no idea how Postgres deals with these. Hopefully, they're doing something good (enough) :-)

Next up would hopefully be groupjoin, where you combine grouping and hash join into one operation if they are on the same or compatible keys (which is surprisingly often).
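
For illustration (hypothetical customers/orders tables): in a query like the one below, the grouping key is the same as the join key, so an engine with a groupjoin operator could maintain the running aggregate directly in the hash table it builds for the join, instead of joining first and grouping afterwards.

    -- Grouping key == join key (customer_id), so grouping could be folded
    -- into the hash table built for the join.
    SELECT o.customer_id, SUM(o.total) AS lifetime_value
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY o.customer_id;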


I wonder if PG will ever implement plan caching the way MSSQL does, so that optimizer speed is less of a concern and it can spend more time finding better plans rather than replanning on every execution of the same statement.


Postgres used to have plan caching inside the same session, and that was so disastrous that it was limited severely by default.

Plan caching is very much a two-edged sword; cache too aggressively, and the data or parameters may have shifted between runs so the cached plan no longer fits. Cache too little, and your hit rate is too low to be worth anything.


Not sure how that makes sense; if the stats change significantly, the cached plans would be evicted when statistics are regathered.

I believe popular connection poolers and clients attempt to do plan caching through prepared statements and by keeping the connection open.

My understanding is that it's not easy to do in PG, since connections are process-based instead of thread-based and query plans are not serializable between processes, so they cannot be shared across connections.

MSSQL has been doing statement plan caching for at least 20 years and it did stored procedure plan caching before that.
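
For reference, per-session plan caching in Postgres today goes through prepared statements: the first few executions get custom plans, after which the server may switch to a cached generic plan, and plan_cache_mode (PostgreSQL 12+) lets you force either behaviour. The table and parameter below are just for illustration.

    -- A prepared statement is the unit of plan caching within a session.
    PREPARE orders_for_customer (int) AS
        SELECT * FROM orders WHERE customer_id = $1;

    EXECUTE orders_for_customer(42);  -- early executions use custom plans;
                                      -- later ones may reuse a generic plan

    -- Override the default heuristic if it picks the wrong side of the trade-off:
    SET plan_cache_mode = force_custom_plan;    -- always replan with the actual value
    -- SET plan_cache_mode = force_generic_plan; -- always reuse the cached generic plan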


Probably quite a lot; being a specialist in multiple domains is getting more difficult.


It's not about not knowing about an optimization. The challenge is knowing when to apply it, so that it doesn't cause regressions for cases that can't benefit from it. That may be less risky in specialized systems: BI systems, for example, typically don't need to worry about regressing OLTP workloads. Postgres absolutely needs to be careful about that.

I believe that's one of the reasons it took roughly eight years (the original patch was proposed in 2017).



