PostgreSQL extensions — pg_visibility

Tomasz Gintowt
5 min readApr 25, 2024

--

Some time ago, I’ve decided to prepare series of short notes about interesting and most common used PG extensions. I called it #PostgreSQL Thursday’s .

What is Visibility Map ?

Visibility Map is a bitmap file which is created for every table. This file is named with suffix _vm. File contains entry for each page in the table along with two flags: all_frozen,all_visible. The first flag say “all rows in the block are visible to all transactions”, second say “all rows in the block are frozen”. The second bit “all_frozen” is set only when “all_visible” is set.

Why Visibility Map is useful ?

Improve vacuum performance — The vacuum process check for the bits of the page. If the first bit — “all rows are visible” — is set then vacuum process no need to process the page.
Improve freezing performance — The vacuum process while freezing the complete table (transaction id wraparound) if the second bit — “all rows are frozen” — is set then vacuum process no need to process the page.
Improve query performance — during Index-Only scan (instead of seeing in table page, get data from index if all columns in select are part of index) if the all_visible is set then it don’t access the table page to check the visibility of row, which reduce I/O.

Who create/modify this file ?

During the first vacuum process the file will be created by vacuum process and it will be modified by subsequent user processes who do DML(update, delete) on the table. There are two bits which say “all rows are visible” one is in visibility-map file and second is in the page of table it self(PD_ALL_VISIBLE). Any DML operation need to update both the bits(all_visible and PD_ALL_VISIBLE).

That’s theory, let’s play PostgreSQL.

postgres=# create extension pg_visibility ;
CREATE EXTENSION
postgres=# insert into test ( SELECT generate_series(1,1000) AS id);
SELECT 1000
postgres=# SELECT * FROM pg_visibility('test'::regclass);
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
1 | f | f | f
2 | f | f | f
3 | f | f | f
4 | f | f | f
(5 rows)

Where blkno is a block number from CTID column.(More about PostgreSQL system columns). Now let’s check what is blkno from record id=1.

postgres=# SELECT ctid, id from test WHERE id =50;
ctid | id
--------+----
(0,50) | 50
(1 row)

So, address for record id=50 is (0,50), where 0 is block number. Let’s check visibility map.

postgres=#  SELECT * FROM pg_visibility_map('test')
WHERE blkno = 0;
blkno | all_visible | all_frozen
-------+-------------+------------
0 | f | f
(1 row)

If we back to top of this article, you will find explanation of all_visible and all_frozen column. Both are false.

postgres=# SELECT * FROM pg_visibility_map('test');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | f | f
1 | f | f
2 | f | f
3 | f | f
4 | f | f
(5 rows)

postgres=# vacuum test;
VACUUM
postgres=# SELECT * FROM pg_visibility_map('test');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | f
1 | t | f
2 | t | f
3 | t | f
4 | t | f
(5 rows)

Like you see vacuum has changes all_visible from false to true. Next step, pretty simple update.

postgres=# UPDATE test set id=15 where id=50;
UPDATE 1
postgres=# SELECT * FROM pg_visibility_map('test');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | f | f
1 | t | f
2 | t | f
3 | t | f
4 | f | f
(5 rows)

If you look carefully all_visible has changed from block 0 and 4. After updating a value in the row with id 50, the visibility map shows the all_visible column as false for the corresponding page.

Visibility percentage

SELECT
x.pages::text,
round(avg(x.percent), 1)::int AS percentage
FROM (
SELECT
CASE WHEN all_visible THEN 'visible'
WHEN NOT all_visible THEN 'no_visible'
END pages,
100 * COUNT(all_visible) OVER (PARTITION BY all_visible) / COUNT(all_visible) OVER () AS percent
FROM
pg_visibility_map ('tab_visible')) x
GROUP BY
x.pages
UNION ALL
SELECT
'no_visible'::text,
0
WHERE NOT EXISTS (SELECT 1 FROM pg_visibility_map ('tab_visible') WHERE NOT all_visible);

pages | percentage
------------+------------
visible | 60
no_visible | 40
(2 rows)

In a freshly table or right after the vacuum we can get an output similar to the following:

postgres=# vacuum test;
VACUUM
postgres=# SELECT
x.pages::text,
round(avg(x.percent), 1)::int AS percentage
FROM (
SELECT
CASE WHEN all_visible THEN 'visible'
WHEN NOT all_visible THEN 'no_visible'
END pages,
100 * COUNT(all_visible) OVER (PARTITION BY all_visible) / COUNT(all_visible) OVER () AS percent
FROM
pg_visibility_map ('test')) x
GROUP BY
x.pages
UNION ALL
SELECT
'no_visible'::text,
0
WHERE NOT EXISTS (SELECT 1 FROM pg_visibility_map ('test') WHERE NOT all_visible);
pages | percentage
------------+------------
visible | 100
no_visible | 0
(2 rows)

postgres=# SELECT * FROM pg_visibility_map('test');
blkno | all_visible | all_frozen
-------+-------------+------------
0 | t | f
1 | t | f
2 | t | f
3 | t | f
4 | t | f
(5 rows)

Now all_visible is true and all_frozen is false.

Why it is so important ?

PostgreSQL will keep trying to perform the index-only scan, but when the pages are marked as all_visible false in the visibility map, it’ll have to choose a different technique and lose the performance gains.

An index-only scan is a super speed approach to getting the best possible performance, however, before adding indexes, we also need to keep our eyes on how the tables behave regarding their updates and page visibility.

pg_visibility extension is worth to know tool, with some queries like the one we saw in this blog, we can monitor tables and decide if they are good to work with the index-only scan or if some tuning for the vacuum needs to be done.

Corrupted visibility map

Below you can find quires how to check and fix corrupted visibility map.


postgres=# select * from pg_check_visible('test');
t_ctid
--------
(0 rows)

If above query returns non-zero visibility map is corrupted, usually due to database crash. You can fix it by:

postgres=# select pg_truncate_visibility_map('test');
pg_truncate_visibility_map
----------------------------

(1 row)

I’ve been an Architect, a DevOps, a DataOps, and a DBA. I’m focused on databases, real-time data and streaming technologies. I work with Apache Kafka, RabbitMQ, Elastic Stack and PostgreSQL/MySQL. After hours I love to share my knowledge and experience. I’m founder of DataOps Poland group, it’s an exchange place of experience, concepts and ideas from Data Engineering world.

https://www.dataops-academy.pl

https://www.meetup.com/dataops-poland/

https://www.linkedin.com/in/tomasz-gintowt/

--

--

Tomasz Gintowt

Architect, DevOps, SysOps, and DBA. Currently, I’m an IT Systems Engineer working with Apache Kafka, RabbitMQ, PostgreSQL, Elastic Stack in Real-Time Data Team.