Grouped Index Tuples for PostgreSQL

Overview

Grouped Index Tuples is essentially a lossy compression scheme for B-tree indexes that takes advantage of clustering in the table. On a fully clustered table, it can achieve impressive compression by effectively storing only one index tuple per heap page instead of one per heap tuple. That can be a 100-fold reduction in index size in the best case of very narrow rows and full clustering. On a completely non-ordered table, it degrades to a normal B-tree. Smaller index size is important because it means less I/O which means more overall throughput.

Other DBMSs achieve similar effect with Clustered Indexes. Traditional clustered indexes in PostgreSQL would require moving heap tuples around, which is difficult. GIT doesn't enforce ordering of the heap, but instead gradually degrades to a normal b-tree as the table becomes less clustered.

Because the key of each index tuple and the order of tuples within a heap page is not stored, index scans need to do some extra processing with GIT. Therefore GIT is a tradeoff between CPU and I/O.

Download

The patches below are against PostgreSQL CVS head.

The main GIT patch: groupeditems-43-pghead.patch.gz

The regression tests included in the GIT patch depend on a custom indexcontents tool that I've been using to peek into index pages. The regression tests are "white box" tests that look into the contents of the index, because I found it difficult to write a reliable black box test suite. There's a similar tool in contrib, pgstattuple, which I could've used instead, but I ended up reinventing the wheel because I found out about pgstattuple only after writing the regression tests. Here's the tool: indexcontents-20061205.tar.gz

Here's a patch to maintain cluster order on a table as new rows are inserted. This patch is independent from GIT, but I've included it here because the performance of GIT depends on clustering of the heap. maintain_cluster_order_v5.patch

A performance unit test suite that tests various simple test cases, with and without GIT: git-perfunittests-20070222.tar.gz

A crude crash recovery test suite: crashtest.sh. Modify the path to your PostgreSQL installation in the script.

Usage

After applying the GIT patch, the feature will automatically be enabled for any index that is defined as the clustered index for a table. Use "CREATE INDEX ... WITH (groupthreshold=2)" to enable the feature explicitly. You can also set the default_groupthreshold GUC parameter to enable the feature for all indexes by default.

The patch also introduces a number of new columns in pgstat_*_indexes views, to count various b-tree operations.

Design

See git-readme.txt. The same text is also in src/backend/access/nbtree/README in the patch.

For now, I've refrained from doing any code changes that aren't absolutely necessary for GIT to work. There's a number of things that should be changed, including dramatic changes to the indexam API, to get full performance gains and to make the patch less ugly

TODO

correctness/completeness

Performance

API/Code style

Testing

Performance test results

Unit tests, CPU bound

These tests were executed on a dual-core Dell Precision 690

Results show the index size benefits and the CPU overhead of GIT.

Test nameidxpages (unpatched)idxpages (patched)patched / unpatchedduration (unpatched)duration (patched)patched / unpatched
bitmapscan_range.sql8228971.18%00:00:03.9700:00:07.01176.31%
create_uniq.sql8228971.18%00:00:04.8400:00:04.3890.51%
delete_insert.sql944244.68%00:00:47.7400:00:48.05100.64%
insert_backwards_monotonic.sql139551661.19%00:00:21.3600:00:17.2680.79%
insert_duplicates_wide.sql324006181.91%00:03:47.5000:03:42.4697.78%
insert_duplicates.sql104011651.59%00:00:15.4600:00:16.52106.85%
insert_monotonic_nonuniq.sql8228971.18%00:00:14.7600:00:15.88107.60%
insert_monotonic_uniq.sql8228971.18%00:00:14.5900:00:15.76108.00%
insert_monotonic_wide.sql493978681.76%00:02:36.1300:02:02.8178.66%
insert_piecewise_monotonic_nonuniq.sql15025171811.43%00:00:21.5000:00:20.4194.96%
insert_prefilled.sql11634133811.50%00:00:15.6300:00:07.4547.67%
insert_random.sql1062310639100.15%00:00:31.1300:00:27.2787.59%
repeated_update.sql34482.33%00:06:04.8800:06:17.40103.43%
select_range_char10.sql115541201.04%00:00:11.1200:00:23.97215.56%
select_range_timestamp.sql8228971.18%00:00:05.2400:00:07.17136.71%
select_range.sql8228971.18%00:00:04.4400:00:07.00157.66%
select_single_char10.sql115541201.04%00:00:04.2200:00:04.1698.59%
select_single.sql8228971.18%00:00:02.3900:00:03.39141.81%
vacuum.sql8228971.18%00:00:12.4000:00:00.987.93%

Unit tests, I/O bound

These results show the I/O benefits of the patch

[Needs to be run again. I ran some tests with an earlier version of the patch, and they did show a gain as expected from the reduced index size, but I don't have the numbers now]

[DBT-2 numbers to be added. I'll just say that there's significant reduction in I/O initially, but it vanishes over time as the tables gets less clustered as a result of updates... The HOT patch stops the declustering of the tables, and makes the benefit permanent. ]


Heikki Linnakangas <heikki@enterprisedb.com.nospam>
Updated: 25.7.2007