PostgreSQL extensions are great. Simply by adding an extension, one can transform an otherwise vanilla general-purpose database management system into one capable of processing data requirements in a highly optimized fashion. Some extensions, like pg_repack, simplify and enhance existing features, while other extensions, such as PostGIS and pgvector, add completely new capabilities.
I would like to introduce the PostgreSQL extension TimescaleDB. As the name implies, TimescaleDB is all about data recorded against time. In database systems, time series data is often a relatively small data point of one or more dimensions. A good example is weather data: time versus air pressure, UV index, wind speed, temperature, humidity, and related measurements.
Time series data processing can be taxing to any database system:
While PostgreSQL is quite capable of ingesting, managing, and analyzing time series data, there are other solutions, many of them proprietary, that can perform data ingestion and generate actionable insights at a faster rate.
This brings us back to TimescaleDB.
TimescaleDB was created to improve the rate of time series data ingestion into PostgreSQL by simplifying and automating many of the steps that an experienced DBA uses to optimize and maintain the health of the database over its lifecycle.
The best way to appreciate the possibilities is to install the extension and try it out.
Because my Linux machine is Ubuntu, the following commands are used to install the extension:
|
1 2 3 |
echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - |
|
1 2 3 4 5 |
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo apt-key add - apt update && apt upgrade -y |
As I am using Ubuntu Jammy, I have the following available packages:
|
1 2 |
# This command returns the package list. apt search postgresql | grep -A 3 -E '^timescaledb-2-postgresql' |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
# A collection of packages covering all currently supported versions of Postgres. timescaledb-2-postgresql-12/jammy 2.11.2~ubuntu22.04 amd64 An open-source time-series database based on PostgreSQL, as an extension. timescaledb-2-postgresql-13/jammy 2.13.1~ubuntu22.04 amd64 An open-source time-series database based on PostgreSQL, as an extension. timescaledb-2-postgresql-14/jammy 2.13.1~ubuntu22.04 amd64 An open-source time-series database based on PostgreSQL, as an extension. timescaledb-2-postgresql-15/jammy 2.13.1~ubuntu22.04 amd64 An open-source time-series database based on PostgreSQL, as an extension. timescaledb-2-postgresql-16/jammy 2.13.1~ubuntu22.04 amd64 An open-source time-series database based on PostgreSQL, as an extension. |
|
1 |
apt install timescaledb-2-postgresql-16 |
The available packages include:
|
1 2 3 4 5 6 7 8 9 10 |
timescaledb-2-loader-postgresql-16 # The loader for TimescaleDB. # timescaledb-2-postgresql-16 # The TimescaleDB extension for Postgres 16. # timescaledb-toolkit-postgresql-16 # A library of analytical hyperfunctions, # time-series pipelining, and other SQL utilities. # timescaledb-tools # A suite of tools that can be used with TimescaleDB. # PGBIN/timescaledb-parallel-copy # PGBIN/timescaledb-tune |
The CLI utility timescaledb-tune is useful because it can generate an optimized set of Postgres runtime parameters. However, it assumes that Postgres is the only major process running on the host and is not competing for RAM or CPU.
Suggested test invocation:
|
1 |
/usr/bin/timescaledb-tune --help 2>&1 >/dev/stdout | less |
Here is the invocation that saves the results in postgresql.auto.conf:
|
1 2 3 4 5 |
/usr/bin/timescaledb-tune \ --conf-path /var/lib/postgresql/16/main/postgresql.auto.conf \ -pg-version 16 \ -quiet \ -yes |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
shared_preload_libraries = 'timescaledb' shared_buffers = 5967MB effective_cache_size = 17901MB maintenance_work_mem = 2047MB work_mem = 5091kB timescaledb.max_background_workers = 16 max_worker_processes = 31 max_parallel_workers_per_gather = 6 max_parallel_workers = 12 wal_buffers = 16MB min_wal_size = 512MB max_wal_size = 1GB default_statistics_target = 100 random_page_cost = 1.1 checkpoint_completion_target = 0.9 max_connections = 100 max_locks_per_transaction = 256 autovacuum_max_workers = 10 autovacuum_naptime = 10 effective_io_concurrency = 256 timescaledb.last_tuned = '2024-03-01T20:49:53Z' timescaledb.last_tuned_version = '0.15.0' |
The service can now be restarted:
|
1 |
systemctl restart postgresql@16-main |
|
1 2 3 4 5 6 7 |
$ netstat -tlnp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 6786/systemd-resolv tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 6579/sshd: /usr/sbi tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 11447/postgres tcp6 0 0 :::22 :::* LISTEN 6579/sshd: /usr/sbi |
Pretty standard stuff: create your database and extension.
|
1 2 3 4 5 6 7 8 |
createdb db01 psql db01 <<'_eof_' CREATE EXTENSION timescaledb; -- Optional: turn off telemetry. ALTER SYSTEM SET timescaledb.telemetry_level = off; _eof_ |
Once created, the TimescaleDB extension installs a number of functions, views, and tables in the database:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
\dn List of schemas Name | Owner --------------------------+------------------- _timescaledb_cache | postgres _timescaledb_catalog | postgres _timescaledb_config | postgres _timescaledb_functions | postgres _timescaledb_internal | postgres public | pg_database_owner timescaledb_experimental | postgres timescaledb_information | postgres |
Now that everything has been set up, we are ready to explore.
Let’s create two sets of tables. One is the standard heap table found in PostgreSQL, while the second is TimescaleDB’s hypertable.
Creating a TimescaleDB hypertable automatically defines the partitioning rule using any column of date/time type found in the table. In this case, the key is column t_stamp, which has the timestamptz data type, or timestamp with time zone.
|
1 2 3 4 5 6 7 8 9 10 11 |
-- Create a standard Postgres table. CREATE TABLE t_timescale ( id uuid, c2 int DEFAULT random() * 1E6, c3 int DEFAULT random() * 1E6, c4 int DEFAULT random() * 1E6, c5 int DEFAULT random() * 1E6, c6 int DEFAULT random() * 1E6, c7 int DEFAULT random() * 1E6, t_stamp timestamptz NOT NULL DEFAULT clock_timestamp() ); |
This is the command required to create the hypertable:
|
1 |
SELECT create_hypertable('t_timescale', 't_stamp'); |
The ordinary heap table is created as follows:
|
1 |
CREATE TABLE t_standard (LIKE t_timescale); |
TimescaleDB partitions its tables into chunks. Although the default size is constrained to seven days, it can be varied to any desired time range.
For the purposes of this blog, and because I am using small data sets, the chunk interval for table t_timescale is set to 10 minutes:
|
1 |
SELECT set_chunk_time_interval('t_timescale', INTERVAL '10 minutes'); |
Chunk size best practices:
Prior to populating the tables, the psql meta-command timing is invoked:
|
1 |
\timing |
While the standard table did not have any index at table creation, which accelerated data population, the hypertable added the index at the time of hypertable creation. Note that these numbers will vary widely according to the hardware used. Mine is relatively low-end.
|
1 2 3 4 5 6 7 8 |
-- Hypertable. -- 44:14 minutes: 54 GB. INSERT INTO t_timescale(id) SELECT gen_random_uuid() FROM generate_series(1, 5E8); -- Determine table size. SELECT * FROM pg_size_pretty(hypertable_size('t_timescale')); |
|
1 2 3 4 5 6 7 |
-- Heap. -- 08:58 minutes: 36 GB. INSERT INTO t_standard(id) SELECT id FROM t_timescale; -- 04:18 minutes: 10 GB. CREATE INDEX ON t_standard(t_stamp); |
Taking a closer look at the t_timescale schema definition, we see that five partitioned child tables have been created. Recalling the chunk interval setting of 10 minutes prior to data population, new partitions were created as the time interval incremented by 10 minutes:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
Table Structure: t_timescale Column | Type ---------+-------------------------- id | uuid c2 | integer c3 | integer c4 | integer c5 | integer c6 | integer c7 | integer t_stamp | timestamp with time zone Indexes: "t_timescale_t_stamp_idx" btree (t_stamp DESC) Triggers: ts_insert_blocker BEFORE INSERT ON t_timescale FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker() Child tables: _timescaledb_internal._hyper_2_2_chunk, _timescaledb_internal._hyper_2_2_chunk, _timescaledb_internal._hyper_2_3_chunk, _timescaledb_internal._hyper_2_5_chunk, _timescaledb_internal._hyper_2_6_chunk Access method: heap |
Examination of one of the chunks confirms the partition is set at 10 minutes:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
db01=# \d _hyper_2_6_chunk Table "_timescaledb_internal._hyper_2_6_chunk" Column | Type | Collation | Nullable | Default ---------+--------------------------+-----------+----------+--------------------------------------------------- id | uuid | | | c2 | integer | | | (random() * '1000000'::numeric::double precision) c3 | integer | | | (random() * '1000000'::numeric::double precision) c4 | integer | | | (random() * '1000000'::numeric::double precision) c5 | integer | | | (random() * '1000000'::numeric::double precision) c6 | integer | | | (random() * '1000000'::numeric::double precision) c7 | integer | | | (random() * '1000000'::numeric::double precision) t_stamp | timestamp with time zone | | not null | clock_timestamp() Indexes: "_hyper_2_6_chunk_t_timescale_t_stamp_idx" btree (t_stamp DESC) Check constraints: "constraint_6" CHECK (t_stamp >= '2024-01-30 19:20:00+00'::timestamp with time zone AND t_stamp < '2024-01-30 19:30:00+00'::timestamp with time zone) Inherits: public.t_timescale |
The inconsistent sizes shown below are because the number of records varied within the assigned 10-minute intervals:
|
1 2 3 4 5 6 7 8 9 10 11 |
List of Tables Schema | Name | Size -----------------------+------------------------+-------------- public | t_standard | 36000 MB | TOTAL 36 GB -----------------------+------------------------+-------------- _timescaledb_internal | _hyper_2_2_chunk | 4292 MB | _timescaledb_internal | _hyper_2_3_chunk | 8540 MB | _timescaledb_internal | _hyper_2_4_chunk | 8712 MB | _timescaledb_internal | _hyper_2_5_chunk | 8204 MB | _timescaledb_internal | _hyper_2_6_chunk | 6769 MB | TOTAL 36.5 GB |
You will notice the TimescaleDB indexes are significantly larger than the B-tree index created for table t_standard. Evidently, they contain more information/data:
|
1 2 3 4 5 6 7 8 9 10 |
List of Indexes Schema | Name | Type | Size -----------------------+--------------------------------------------+-------------------+--------- public | _t_standard_t_stamp_idx | index | 10 GB TOTAL 10 GB ----------------------------------------------------------------------------------------------------- _timescaledb_internal | _hyper_2_2_chunk_t_timescale_t_stamp_idx | _hyper_2_2_chunk | 2262 MB _timescaledb_internal | _hyper_2_3_chunk_t_timescale_t_stamp_idx | _hyper_2_3_chunk | 4500 MB _timescaledb_internal | _hyper_2_4_chunk_t_timescale_t_stamp_idx | _hyper_2_4_chunk | 4591 MB _timescaledb_internal | _hyper_2_5_chunk_t_timescale_t_stamp_idx | _hyper_2_5_chunk | 4323 MB _timescaledb_internal | _hyper_2_6_chunk_t_timescale_t_stamp_idx | _hyper_2_6_chunk | 3567 MB TOTAL 19.2 GB |
The following focuses solely on simple chunk administration.
These TimescaleDB functions are used to administer chunks:
|
1 2 3 4 5 6 7 8 9 |
Schema | Name --------+------------------------- public | chunks_detailed_size public | drop_chunks public | move_chunk public | reorder_chunk public | set_adaptive_chunking public | set_chunk_time_interval public | show_chunks |
These TimescaleDB functions are used to administer table compression:
|
1 2 3 4 5 6 7 8 9 |
Schema | Name --------+------------------------------ public | add_compression_policy public | chunk_compression_stats public | compress_chunk public | decompress_chunk public | hypertable_compression_stats public | recompress_chunk public | remove_compression_policy |
Similar to the general collection of Postgres runtime parameters, some TimescaleDB parameters operate across the data cluster while others can be more fine-tuned for specific relations and even sessions.
|
1 2 3 4 5 6 7 8 9 |
SELECT name, setting FROM pg_settings WHERE name ~ '^timescale' AND name ~ 'compress' UNION SELECT name, setting FROM pg_settings WHERE name ~ '^timescale' AND name ~ 'chunk'; |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
name | setting -------------------------------------------------+--------- timescaledb.enable_bulk_decompression | on timescaledb.enable_chunk_append | on timescaledb.enable_chunkwise_aggregation | on timescaledb.enable_compression_indexscan | on timescaledb.enable_decompression_logrep_markers | off timescaledb.enable_decompression_sorted_merge | on timescaledb.enable_dml_decompression | on timescaledb.enable_hypertable_compression | on timescaledb.enable_parallel_chunk_append | on timescaledb.enable_transparent_decompression | on timescaledb.max_cached_chunks_per_hypertable | 1024 timescaledb.max_open_chunks_per_insert | 1024 |
Taking a small subset of the aforementioned functions and runtime parameters, this next scenario demonstrates how one can compress everything from individual chunks to setting a comprehensive policy for a table based on chunk age under normal production conditions.
Table compression is one of those features best labeled as a killer feature. What is especially useful is that one can not only reduce space consumption but also query and perform DML operations on a hypertable.
|
1 2 3 4 5 6 7 8 9 10 |
-- Get current state of chunks. db01=# \dt+ _timescaledb_internal._hyper*chunk List of relations Schema | Name | Type | Owner | Persistence | Access method | Size -----------------------+------------------+-------+----------+-------------+---------------+---------+ _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent | heap | 4292 MB | _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent | heap | 8540 MB | _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent | heap | 8712 MB | _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent | heap | 8204 MB | _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent | heap | 6769 MB | |
|
1 2 3 4 5 6 |
-- Enable compression. ALTER TABLE t_timescale SET ( timescaledb.compress, timescaledb.compress_orderby = 't_stamp asc' ); |
|
1 2 |
-- Time: 163081.593 ms (02:43.082). SELECT * FROM compress_chunk('_timescaledb_internal._hyper_2_4_chunk', true); |
Notice how the chunk’s size has been zeroed. To get the true size of the now-compressed relation, you need to use a function call. See below for an example invocation.
|
1 2 3 4 5 6 7 8 |
db01=# \dt+ _timescaledb_internal._hyper*chunk Schema | Name | Type | Owner | Persistence | Access method | Size -----------------------+------------------+-------+----------+-------------+---------------+---------- _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent | heap | 4292 MB _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent | heap | 8540 MB _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent | heap | 0 bytes _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent | heap | 8204 MB _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent | heap | 6769 MB |
This function gives a complete set of metrics describing the now-compressed chunk:
|
1 2 |
-- Get current state of compression. SELECT * FROM hypertable_compression_stats('t_timescale'); |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
db01=# SELECT * FROM hypertable_compression_stats('t_timescale'); -[ RECORD 1 ]------------------+------------ total_chunks | 5 number_compressed_chunks | 1 before_compression_table_bytes | 9135104000 before_compression_index_bytes | 4813725696 before_compression_toast_bytes | 0 before_compression_total_bytes | 13948829696 after_compression_table_bytes | 24150016 after_compression_index_bytes | 0 after_compression_toast_bytes | 4946141184 after_compression_total_bytes | 4970291200 node_name | |
It is just as easy to decompress a chunk as it is to compress one. The typical reason is to maximize performance when DML operations must be performed.
|
1 2 3 |
-- Decompressing chunk size. -- Time: 267639.222 ms (04:27.639). SELECT * FROM decompress_chunk('_timescaledb_internal._hyper_2_4_chunk'); |
As previously demonstrated, setting the chunk interval is straightforward. So is setting the compression policy. One chooses the hypertable and how long after a chunk has been created before it is compressed. Using a time-based argument for compression recognizes the need to process the most recent data in the shortest amount of time.
|
1 2 3 4 5 6 |
-- Recall: enable compression. ALTER TABLE t_timescale SET ( timescaledb.compress, timescaledb.compress_orderby = 't_stamp asc' ); |
This compression policy compresses chunks that are older than 30 minutes:
|
1 2 3 4 5 |
SELECT add_compression_policy('t_timescale', compress_after => INTERVAL '30m'); add_compression_policy ------------------------ 1000 |
Testing the newly set compression policy is accomplished by inserting new records:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
-- Populate new records. INSERT INTO t_timescale(id) SELECT gen_random_uuid() FROM generate_series(1, 2E8); INSERT 0 200000000 Time: 1100742.301 ms (18:20.742) SELECT * FROM pg_size_pretty(hypertable_size('t_timescale')); pg_size_pretty ---------------- 41 GB |
Note the partition size differences between the old and new chunks:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
\dt+ _timescaledb_internal._hyper*chunk List of relations Schema | Name | Type | Owner | Persistence | Access method | Size ----------------------+-------------------+-------+----------+-------------+---------------+-------- _timescaledb_internal | _hyper_2_12_chunk | table | postgres | permanent | heap | 3231 MB _timescaledb_internal | _hyper_2_14_chunk | table | postgres | permanent | heap | 8265 MB _timescaledb_internal | _hyper_2_15_chunk | table | postgres | permanent | heap | 3111 MB _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent | heap | 0 bytes _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent | heap | 0 bytes _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent | heap | 0 bytes _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent | heap | 0 bytes _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent | heap | 0 bytes |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
db01=# SELECT * FROM hypertable_compression_stats('t_timescale'); -[ RECORD 1 ]------------------+------------ total_chunks | 8 number_compressed_chunks | 5 before_compression_table_bytes | 38291226624 before_compression_index_bytes | 20177428480 before_compression_toast_bytes | 0 before_compression_total_bytes | 58468655104 after_compression_table_bytes | 101351424 after_compression_index_bytes | 0 after_compression_toast_bytes | 20732502016 after_compression_total_bytes | 20833853440 node_name | |
Disabling the compression policy is accomplished as follows:
|
1 |
SELECT remove_compression_policy('t_timescale'); |
As with all technologies, especially new ones, there are always limitations. TimescaleDB is no exception:
https://docs.timescale.com/
https://docs.timescale.com/api/latest/
https://docs.timescale.com/api/latest/compression/alter_table_compression/
https://docs.timescale.com/use-timescale/latest/compression/about-compression/
Resources
RELATED POSTS