PostgreSQL is a fantastic database, but if you’re storing images, video, audio files, or other large data objects, you need to “toast” them to get optimal performance. This post will look at using The Oversized-Attribute Storage Technique (TOAST) to improve performance and scalability.
PostgreSQL uses fixed-sized pages, which makes it challenging to store huge data values directly. To solve this problem, large data values are compressed and broken into multiple smaller chunks. This process is done automatically and does not significantly impact how the database is used. This technique, called TOAST, improves how large data values are stored and used in the database.
Therefore, TOAST is a storage technique used in PostgreSQL to handle large data objects such as images, videos, and audio files. The TOAST technique allows for the efficient storage of large data objects by breaking them into smaller chunks and storing them separately from the main table. This can improve the performance of queries and indexing and reduce the amount of disk space required to store the data.
TOAST tables are created automatically by PostgreSQL when a table contains a column of type OID, bytea, or any other data type with the TOASTable storage class. The TOAST table is then used to store the large data objects, while the main table stores a reference to the TOAST table.
Here’s an example of using the TOAST technique in PostgreSQL:
|
1 |
CREATE TABLE images ( id SERIAL PRIMARY KEY, data BYTEA ); |
|
1 |
INSERT INTO images (data) VALUES (E'x...'); |
|
1 |
SELECT relname, relkind FROM pg_class WHERE relname LIKE 'pg_toast%'; |
In the example, the images table contains a column named data of type bytea, which can hold extensive binary data. When a large image is inserted into the table, PostgreSQL automatically creates a TOAST table to store the image data separately from the main table. The pg_class system catalog table is then queried to show that a TOAST table has been created.
It’s important to note that while TOAST tables help store large data objects, they can add complexity to the database and should be used with care. Also, in some cases, the query performance will decrease when the data is spread among different tables, depending on the query conditions.
If you have a lot of large data, you don’t need to query/index; another option you could consider is to store it outside the DB in the file system and store a reference to it in DB, similar to how a TOAST table works.
In PostgreSQL, you can use the different TOAST storage strategies by setting the “storage” attribute on a column.
|
1 |
CREATE TABLE mytable ( id serial primary key, large_column dat);<br>postgres=# d+ mytable<br>Table "public.mytable"<br>Column | Type | Collation | Nullable | Default | Storage <br>-------------+---------+-----------+----------+-------------------------------------+----------<br>id | integer | | not null | nextval('mytable_id_seq'::regclass) | plain <br>large_column | bytea | | | | extended <br>Indexes:<br>"mytable_pkey" PRIMARY KEY, btree (id)<br>Access method: heap<br> |
|
1 |
postgres=# ALTER TABLE mytable ALTER COLUMN large_column SET STORAGE PLAIN;<br>ALTER TABLE<br>postgres=# d+ mytable<br>Table "public.mytable"<br>Column | Type | Collation | Nullable | Default | Storage <br>-------------+---------+-----------+----------+-------------------------------------+----------<br>id | integer | | not null | nextval('mytable_id_seq'::regclass) | plain <br>large_column | bytea | | | | plain <br>Indexes:<br>"mytable_pkey" PRIMARY KEY, btree (id)<br>Access method: heap<br> |
|
1 |
postgres=# ALTER TABLE mytable ALTER COLUMN large_column SET STORAGE MAIN; <br>ALTER TABLE<br>postgres=# d+ mytable<br>Table "public.mytable"<br>Column | Type | Collation | Nullable | Default | Storage <br>-------------+---------+-----------+----------+-------------------------------------+----------<br>id | integer | | not null | nextval('mytable_id_seq'::regclass) | plain <br>large_column | bytea | | | | main <br>Indexes:<br>"mytable_pkey" PRIMARY KEY, btree (id)<br>Access method: heap<br> |
Where “data_type” is the data type of the column (e.g., text, bytea), and “strategy” is one of the four TOAST storage strategies (PLAIN, EXTENDED, EXTERNAL, MAIN).
You can also use the pg_attribute table in the system catalog to check which strategy a column uses.
|
1 |
SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'tablename'::regclass AND attnum > 0;<br><br>postgres=# SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'mytable'::regclass AND attnum > 0;<br>attname | attstorage <br>--------------+------------<br>id | p<br>large_column | m<br>(2 rows) |
It’s worth noting that the default storage strategy for most columns is ‘EXTENDED’ (compression and out-of-line storage) and that you can change the storage strategy of a table column at any time. However, it’s important to note that changing a column’s storage strategy may affect your queries’ performance and table size. Thus, testing your specific use case with different storage strategies is advisable to determine which provides the best performance.
While the TOAST technique can be useful to handle large data objects in PostgreSQL, you might hit a few problems. Here are a few common issues and ways to address them:
A common issue with the TOAST system in PostgreSQL is that the size of the TOAST table can grow out of control. This can happen when large amounts of data are inserted into the table, causing the table to become larger than the available disk space. There are several ways to address this issue:
In conclusion, the TOAST system in PostgreSQL is a powerful feature that allows the database to handle large column values that would not fit in a single database block. The system uses a variety of strategies for storing these columns, including PLAIN, EXTENDED, EXTERNAL, and MAIN. Each strategy has its advantages and uses cases, and the appropriate strategy will depend on the specific requirements of your application.
For example, you may use the EXTERNAL strategy if you have a table with a large column of text and want to improve performance when substring operations are needed. When designing a table, consider the size and type of data stored in the columns and choose an appropriate storage strategy that will meet your application’s performance and space requirements. It’s also possible to change the storage strategy of a column at any time, although this may affect the performance of your queries and the size of your table. Thus, testing different strategies before settling on the best one is highly recommended.
Ultimately, understanding the TOAST system and how to use it effectively can significantly improve the performance and scalability of your PostgreSQL applications.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
Resources
RELATED POSTS