Import gp_relsizes_stats extension from Greenplum #1757
Hi, @Vlasdislav welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
Pull request overview
This PR ports the gp_relsizes_stats extension into the Cloudberry monorepo under gpcontrib/, integrating it into the in-tree build and adding regression tests.
Changes:
- Adds the new `gp_relsizes_stats` extension (C code, SQL install/upgrade scripts, control file, docs).
- Integrates the extension into `gpcontrib/Makefile` recursion targets and adds a dual-mode (PGXS vs in-tree) Makefile.
- Adds regression tests and expected outputs for functionality and privilege behavior.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| gpcontrib/Makefile | Adds gp_relsizes_stats to the gpcontrib build recurse targets. |
| gpcontrib/gp_relsizes_stats/Makefile | Builds/installs the extension in-tree or via PGXS; wires up regression tests. |
| gpcontrib/gp_relsizes_stats/src/gp_relsizes_stats.c | Implements background worker orchestration and filesystem stats collection. |
| gpcontrib/gp_relsizes_stats/sql/gp_relsizes_stats--1.3.sql | Creates schema/tables/views and declares C functions. |
| gpcontrib/gp_relsizes_stats/sql/gp_relsizes_stats--1.2--1.3.sql | Upgrade script (grants). |
| gpcontrib/gp_relsizes_stats/sql/gp_relsizes_stats--1.1--1.2.sql | Upgrade script (view changes). |
| gpcontrib/gp_relsizes_stats/sql/gp_relsizes_stats--1.0--1.1.sql | Upgrade script (function signature change). |
| gpcontrib/gp_relsizes_stats/gp_relsizes_stats.control | Registers extension metadata (version, module path, trusted flag). |
| gpcontrib/gp_relsizes_stats/test/sql/grants.sql | Adds privileges/regression coverage for creator + granting to another role. |
| gpcontrib/gp_relsizes_stats/test/sql/gp_relsizes_stats.sql | Adds functional regression coverage for stats collection and size reporting. |
| gpcontrib/gp_relsizes_stats/test/expected/grants.out | Expected output for grants.sql. |
| gpcontrib/gp_relsizes_stats/test/expected/gp_relsizes_stats.out | Expected output for gp_relsizes_stats.sql. |
| gpcontrib/gp_relsizes_stats/README.md | Extension documentation. |
| gpcontrib/gp_relsizes_stats/LICENCE | Bundled Apache 2.0 license text for the extension. |
| gpcontrib/gp_relsizes_stats/.gitignore | Ignores local build artifacts in the extension directory. |
| gpcontrib/gp_relsizes_stats/.clang-format | Local formatting configuration for the extension sources. |
Hi @Vlasdislav thanks for your work. The following are my comments for your reference:
3514aa1 to 6ee3710
c0217c2 to 9c9866d
Hi @Vladislav, thank you for your work. Here are my comments for your consideration:
You can refer to this PR for guidance: #1629.
16c18fd to 6f1a976
Hi @tuhaihe, please check whether I understood you correctly.
leborchuk left a comment
Checked. All looks good to me, much better than the original version.
There is a regression in the tests; see https://github.com/apache/cloudberry/actions/runs/26833894082/job/79174723029?pr=1757. There are some issues with the history stats and partition info.
6f1a976 to f07716d
If I understood your message correctly: I looked at the error, and the problem is not in the code (it passes locally) but in the configuration used to run the CI tests. This commit may fix it: f07716d.
Overall, LGTM now from the non-technical side, such as ASF compliance. I would like to have your technical comments on this PR, cc @my-ship-it, @Smyatkin-Maxim, @yjhjstz.
0e88a45 to 438746c
Will take a look.
438746c to 070b61e
Smyatkin-Maxim left a comment
LGTM. I think we should squash this into just two commits: @AndrewOvvv's initial commit, to keep authorship, and a second one with @Vlasdislav's changes.
fbce008 to 21be9f2
Hi @Vlasdislav we are unable to complete the |
9126b24 to 21be9f2
Hi @tuhaihe, it should be OK now.
34400ca to 047c005
What does this PR do?
This PR ports the `gp_relsizes_stats` extension from the standalone open-gpdb/gp_relsizes_stats repository into the Cloudberry monorepo under `gpcontrib/` and adapts it to Cloudberry (PostgreSQL 14 base).

The extension collects and stores statistics about table and file sizes on coordinator and segment hosts. It supports automatic collection via a Background Worker and manual one-shot collection via `relsizes_stats_schema.relsizes_collect_stats_once()`.

Build / integration changes:
- `Makefile`: added dual-mode build support, with `USE_PGXS=1` for standalone builds (original behavior) and an in-tree build via `$(top_builddir)/src/Makefile.global` + `contrib-global.mk` for building inside the Cloudberry source tree.
- `gpcontrib/Makefile`: added `gp_relsizes_stats` to `recurse_targets` for both release and debug (`enable_debug_extensions=yes`) build configurations.
- `README.md`: updated terminology (master → coordinator, GPDB → Cloudberry) and installation instructions for in-tree build.

SQL adaptation for Cloudberry / PG14 (`sql/gp_relsizes_stats--1.3.sql` and `sql/gp_relsizes_stats--1.1--1.2.sql`):
- Rewrote the `table_files` view to use a `WITH RECURSIVE` walk over `pg_inherits` instead of the GP6-only `pg_partition` / `pg_partition_rule` catalogs, which do not exist in Cloudberry.
- Changed the join with `segment_file_sizes` from `INNER` to `LEFT JOIN` (with `COALESCE(size, 0)` / `COALESCE(mtime, 0)`): on Cloudberry/PG14 the partition root and intermediate partitioned tables have `relfilenode = 0` and therefore have no row in `segment_file_sizes`; we still want them surfaced with size = 0.
- Removed `OIDS=FALSE` from `segment_file_sizes` (unsupported since PG12).
- Removed the `TRUNCATE TABLE segment_file_sizes` / `TRUNCATE TABLE table_sizes_history` calls that were executed during extension install, so re-installing the extension no longer wipes previously collected stats.
- Removed `EXECUTE ON MASTER` from `relsizes_collect_stats_once()` (the notion of MASTER is replaced by COORDINATOR in Cloudberry; the default dispatch is sufficient).

Type of Change
Breaking Changes
Not applicable — this is a new extension; nothing existing in the tree changes behavior.
Impact
Performance:
No impact on existing functionality. The background worker is disabled by default (`gp_relsizes_stats.enabled = false`). When enabled, naptimes between databases / files are tunable to spread the load.

User-facing changes:
New extension `gp_relsizes_stats` available under `gpcontrib`. Must be added to `shared_preload_libraries` to enable the background worker. Manual collection via `SELECT relsizes_stats_schema.relsizes_collect_stats_once();` does not require preload.

Dependencies:
None beyond standard PostgreSQL / Cloudberry server headers (`bgworker`, `shmem`, `cdbvars`, `pg_appendonly`).

Additional Context
Upstream repository: https://github.com/open-gpdb/gp_relsizes_stats
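
For reviewers who want to see the intent of the `table_files` change without reading the SQL, here is a minimal standalone C model of it. It is not the extension's code: the structs `InheritsRow`, `SizeRow`, `ViewRow` and the function `walk_partitions` are hypothetical names made up for illustration, and sizes are keyed by relation id rather than by `relfilenode` to keep the sketch short. It walks a parent/child list shaped like `pg_inherits` (`inhrelid`, `inhparent`) from a root downwards and reports a size of 0 for any relation that has no size row, which is roughly the effect of the `LEFT JOIN` with `COALESCE(size, 0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one pg_inherits row. */
typedef struct {
    unsigned inhrelid;  /* child relation */
    unsigned inhparent; /* parent relation */
} InheritsRow;

/* Hypothetical stand-in for one collected size row (keyed by relation id
 * here for simplicity; this keying is an assumption of the sketch). */
typedef struct {
    unsigned relid;
    long long size;
} SizeRow;

/* One output row of the modeled view. */
typedef struct {
    unsigned relid;
    long long size;
} ViewRow;

/* Models LEFT JOIN + COALESCE(size, 0): no matching size row means 0. */
static long long size_or_zero(const SizeRow *sizes, size_t n_sizes,
                              unsigned relid)
{
    for (size_t i = 0; i < n_sizes; i++)
        if (sizes[i].relid == relid)
            return sizes[i].size;
    return 0;
}

/*
 * Depth-first walk from `root` over the inheritance rows. Writes the root
 * and every descendant to `out` and returns the number of rows written,
 * never more than `cap`. Assumes the hierarchy is acyclic.
 */
size_t walk_partitions(const InheritsRow *inh, size_t n_inh,
                       const SizeRow *sizes, size_t n_sizes,
                       unsigned root, ViewRow *out, size_t cap)
{
    size_t written = 0;

    if (cap == 0)
        return 0;
    assert(out != NULL);

    /* The root itself is always surfaced, even when it has no files. */
    out[written].relid = root;
    out[written].size = size_or_zero(sizes, n_sizes, root);
    written++;

    /* Recurse into each direct child of `root`. */
    for (size_t i = 0; i < n_inh; i++) {
        if (inh[i].inhparent == root)
            written += walk_partitions(inh, n_inh, sizes, n_sizes,
                                       inh[i].inhrelid,
                                       out + written, cap - written);
    }
    return written;
}
```

With a root `10` that has children `20` and `30`, where `30` is itself partitioned into `31`, and size rows only for the leaves `20` and `31`, the walk yields four rows; `10` and `30` appear with size 0 instead of being dropped, as they would be with an inner join.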