PostgreSQL extensions are great! Simply by adding an extension, one transforms an otherwise vanilla general-purpose database management system into one capable of processing specialized data requirements in a highly optimized fashion. Some extensions, like pg_repack, simplify and enhance existing features, while other extensions, such as PostGIS and pgvector, add completely new capabilities.
I’d like to introduce you to the PostgreSQL extension TimescaleDB. As the name implies, TimescaleDB is all about data recorded against time. In database systems, time series data typically consists of a large number of small records, each pairing a timestamp with one or more measured dimensions. A good example is the gathering of weather data, i.e., time versus air pressure, UV index, wind speed, temperature, humidity, etc.
Time series data processing can be taxing to any database system.
While PostgreSQL is quite capable of ingesting, managing, and analyzing time series data, there are other solutions, many of them proprietary, that can perform data ingestion and generate actionable insights at a faster rate.
Which brings us back to TimescaleDB.
TimescaleDB was created to improve the rate of ingestion of time series data into PostgreSQL by simplifying and automating many of the steps an experienced DBA would perform to optimize and maintain the health of the database over its lifecycle.
The best way to appreciate the possibilities is, of course, to install the extension and try it out!
Because my Linux machine runs Ubuntu, the following commands are used to install the extension:
```shell
echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
```
```shell
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list

wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo apt-key add -

sudo apt update && sudo apt upgrade -y
```
As I’m using Ubuntu “Jammy,” I have the following available packages:
```shell
# this command returns the package list
apt search postgresql | grep -A 3 -E '^timescaledb-2-postgresql'
```
```
# A nice collection of packages covering all currently supported versions of postgres
timescaledb-2-postgresql-12/jammy 2.11.2~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-13/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-14/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-15/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.

timescaledb-2-postgresql-16/jammy 2.13.1~ubuntu22.04 amd64
  An open-source time-series database based on PostgreSQL, as an extension.
```
```shell
sudo apt install timescaledb-2-postgresql-16
```
The available packages include:
```
timescaledb-2-loader-postgresql-16  # The loader for TimescaleDB
                                    #
timescaledb-2-postgresql-16         # The timescale extension for postgres 16
                                    #
timescaledb-toolkit-postgresql-16   # A library of analytical hyperfunctions,
                                    # time-series pipelining, and other SQL utilities
                                    #
timescaledb-tools                   # A suite of tools that can be used with
                                    # TimescaleDB:
                                    #   PGBIN/timescaledb-parallel-copy
                                    #   PGBIN/timescaledb-tune
```
The CLI utility timescaledb-tune is great, as it can be used to generate an optimized set of Postgres runtime parameters. Be aware, however, that it assumes Postgres is the only major process running on the host and is not competing with anything else for RAM or CPU.
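If Postgres does have to share the host, the tool's resource detection can be overridden rather than accepted blindly. A sketch, assuming the `--memory`, `--cpus`, and `--dry-run` flags present in recent timescaledb-tune releases:

```shell
# Tune for only a slice of the machine: 4GB of RAM and 2 CPUs,
# printing the recommendations without writing any file.
# (Flag availability depends on your timescaledb-tune version.)
timescaledb-tune --memory=4GB --cpus=2 --dry-run
```

Reviewing the dry-run output before committing it to a config file is a good habit on shared hosts.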
Suggested test invocation:
```shell
/usr/bin/timescaledb-tune --help 2>&1 | less
```
And here’s the invocation saving the results in postgresql.auto.conf:
```shell
/usr/bin/timescaledb-tune \
    --conf-path /var/lib/postgresql/16/main/postgresql.auto.conf \
    --pg-version 16 \
    --quiet \
    --yes
```
The resulting settings, written to postgresql.auto.conf:

```
shared_preload_libraries = 'timescaledb'
shared_buffers = 5967MB
effective_cache_size = 17901MB
maintenance_work_mem = 2047MB
work_mem = 5091kB
timescaledb.max_background_workers = 16
max_worker_processes = 31
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
wal_buffers = 16MB
min_wal_size = 512MB
max_wal_size = 1GB
default_statistics_target = 100
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_connections = 100
max_locks_per_transaction = 256
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 256
timescaledb.last_tuned = '2024-03-01T20:49:53Z'
timescaledb.last_tuned_version = '0.15.0'
```
The service can now be restarted:
```shell
systemctl restart postgresql@16-main
```
A quick check confirms the server is back up and listening:

```
$ netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address   State    PID/Program name
tcp        0      0 127.0.0.53:53      0.0.0.0:*         LISTEN   6786/systemd-resolv
tcp        0      0 0.0.0.0:22         0.0.0.0:*         LISTEN   6579/sshd: /usr/sbi
tcp        0      0 127.0.0.1:5432     0.0.0.0:*         LISTEN   11447/postgres
tcp6       0      0 :::22              :::*              LISTEN   6579/sshd: /usr/sbi
```
Next comes pretty standard stuff: create your database and the extension.
```shell
createdb db01

psql db01 <<_eof_
    create extension timescaledb;
    -- optional; turn off telemetry
    alter system set timescaledb.telemetry_level=off;
_eof_
```
Once created, the TimescaleDB extension installs a number of functions, views, and tables in the database:
```
\dn
          List of schemas
           Name           |       Owner
--------------------------+-------------------
 _timescaledb_cache       | postgres
 _timescaledb_catalog     | postgres
 _timescaledb_config      | postgres
 _timescaledb_functions   | postgres
 _timescaledb_internal    | postgres
 public                   | pg_database_owner
 timescaledb_experimental | postgres
 timescaledb_information  | postgres
```
Now that everything has been set up, we’re ready to explore.
Let’s create two sets of tables: one is the standard heap table found in PostgreSQL, while the second type will, of course, be TimescaleDB’s hypertable.
Creating a TimescaleDB hypertable defines the partitioning rule on a chosen column of one of the Date/Time types found in the table. In this case, the key is column t_stamp, which is of datatype timestamptz (timestamp with time zone).
```sql
-- create the table that will become the hypertable
create table t_timescale(
     id uuid
    ,c2 int default random()*1E6
    ,c3 int default random()*1E6
    ,c4 int default random()*1E6
    ,c5 int default random()*1E6
    ,c6 int default random()*1E6
    ,c7 int default random()*1E6
    ,t_stamp timestamptz not null default clock_timestamp()
);
```
This is the command required to create the hypertable:
```sql
select create_hypertable('t_timescale', 't_stamp');
```
The ordinary heap table is created thusly:
```sql
create table t_standard (like t_timescale);
```
TimescaleDB partitions its hypertables into chunks. The default chunk interval is seven days, but it can be varied to any time range desired; note that a new interval applies only to chunks created after the change.
For the purposes of this blog, and because I’m using small data sets, the chunk interval for table t_timescale is set at 10 minutes:
```sql
select set_chunk_time_interval('t_timescale', interval '10 minutes');
```
For chunk-sizing best practices, consult the TimescaleDB documentation linked in the Resources section at the end of this post.
Prior to populating the tables, the psql meta-command \timing is invoked:
```
\timing
```
While the standard table had no index at creation time, which accelerates data population, the hypertable's index on t_stamp was created along with the hypertable itself. Please note: these numbers will vary wildly according to the hardware used, i.e., mine is relatively low-end.
```sql
-- HYPER --
-- 44:14 minutes: 54GB
insert into t_timescale(id) select gen_random_uuid() from generate_series(1, 5E8);

-- determining table size
select * from pg_size_pretty(hypertable_size('t_timescale'));
```
```sql
-- HEAP --
-- 08:58 minutes: 36 GB
insert into t_standard select * from t_timescale;

-- 04:18 minutes: 10 GB
create index on t_standard(t_stamp);
```
Taking a closer look at the t_timescale schema definition, one sees that five partitioned child tables have been created. Recalling the chunk interval set to 10 minutes prior to data population, a new partition was created each time the interval advanced by 10 minutes:
```
Table Structure: t_timescale

 Column  |           Type           |
---------+--------------------------+
 id      | uuid                     |
 c2      | integer                  |
 c3      | integer                  |
 c4      | integer                  |
 c5      | integer                  |
 c6      | integer                  |
 c7      | integer                  |
 t_stamp | timestamp with time zone |
Indexes:
    "t_timescale_t_stamp_idx" btree (t_stamp DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON t_timescale FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_2_chunk,
              _timescaledb_internal._hyper_2_3_chunk,
              _timescaledb_internal._hyper_2_4_chunk,
              _timescaledb_internal._hyper_2_5_chunk,
              _timescaledb_internal._hyper_2_6_chunk
Access method: heap
```
Examination of one of the chunks confirms the partition is set at 10 minutes:
```
db01=# \d _hyper_2_6_chunk
                 Table "_timescaledb_internal._hyper_2_6_chunk"
 Column  |           Type           | Collation | Nullable |      Default
---------+--------------------------+-----------+----------+---------------------------------------------------
 id      | uuid                     |           |          |
 c2      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c3      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c4      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c5      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c6      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 c7      | integer                  |           |          | (random() * '1000000'::numeric::double precision)
 t_stamp | timestamp with time zone |           | not null | clock_timestamp()
Indexes:
    "_hyper_2_6_chunk_t_timescale_t_stamp_idx" btree (t_stamp DESC)
Check constraints:
    "constraint_6" CHECK (t_stamp >= '2024-01-30 19:20:00+00'::timestamp with time zone
                      AND t_stamp <  '2024-01-30 19:30:00+00'::timestamp with time zone)
Inherits: public.t_timescale
```
The inconsistent sizes, as indicated in the results below, are because the number of records varied within the assigned 10-minute interval:
```
List of Tables

        Schema         |       Name        |   Size
-----------------------+-------------------+----------+---------------
 public                | t_standard        | 36000 MB | TOTAL 36 GB
-----------------------+-------------------+----------+---------------
 _timescaledb_internal | _hyper_2_2_chunk  | 4292 MB  |
 _timescaledb_internal | _hyper_2_3_chunk  | 8540 MB  |
 _timescaledb_internal | _hyper_2_4_chunk  | 8712 MB  |
 _timescaledb_internal | _hyper_2_5_chunk  | 8204 MB  |
 _timescaledb_internal | _hyper_2_6_chunk  | 6769 MB  | TOTAL 36.5 GB
```
You’ll notice the TimescaleDB indexes are significantly larger than the btree index created for table t_standard. Evidently, they contain more information:
```
List of Indexes

        Schema         |                   Name                   |      Table       |  Size
-----------------------+------------------------------------------+------------------+---------+---------------
 public                | t_standard_t_stamp_idx                   | t_standard       | 10 GB   | TOTAL 10 GB
-----------------------+------------------------------------------+------------------+---------+---------------
 _timescaledb_internal | _hyper_2_2_chunk_t_timescale_t_stamp_idx | _hyper_2_2_chunk | 2262 MB |
 _timescaledb_internal | _hyper_2_3_chunk_t_timescale_t_stamp_idx | _hyper_2_3_chunk | 4500 MB |
 _timescaledb_internal | _hyper_2_4_chunk_t_timescale_t_stamp_idx | _hyper_2_4_chunk | 4591 MB |
 _timescaledb_internal | _hyper_2_5_chunk_t_timescale_t_stamp_idx | _hyper_2_5_chunk | 4323 MB |
 _timescaledb_internal | _hyper_2_6_chunk_t_timescale_t_stamp_idx | _hyper_2_6_chunk | 3567 MB | TOTAL 19.2 GB
```
The following focuses solely on simple chunk administration.
These TimescaleDB functions are used to administer the chunks:
```
 Schema |          Name
--------+-------------------------
 public | chunks_detailed_size
 public | drop_chunks
 public | move_chunk
 public | reorder_chunk
 public | set_adaptive_chunking
 public | set_chunk_time_interval
 public | show_chunks
```
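As a quick sketch of how a couple of these are used (the interval value here is arbitrary): show_chunks lists the chunks matching a time constraint, and drop_chunks removes them outright, which is far cheaper than an equivalent DELETE followed by a vacuum:

```sql
-- list the chunks whose data is older than one hour
select show_chunks('t_timescale', older_than => interval '1 hour');

-- drop those chunks entirely; no DELETE, no vacuum needed afterwards
select drop_chunks('t_timescale', older_than => interval '1 hour');
```

Dropping whole chunks is the idiomatic way to implement a data-retention window on a hypertable.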
These TimescaleDB functions are used to administer a table's compression:
```
 Schema |             Name
--------+------------------------------
 public | add_compression_policy
 public | chunk_compression_stats
 public | compress_chunk
 public | decompress_chunk
 public | hypertable_compression_stats
 public | recompress_chunk
 public | remove_compression_policy
```
Similar to the general collection of Postgres runtime parameters, some of these TimescaleDB parameters operate across the entire data cluster, while others can be fine-tuned for specific relations and even individual sessions.
```sql
select name, setting
    from pg_settings
    where name ~ '^timescale'
      and name ~ 'compress'
union
select name, setting
    from pg_settings
    where name ~ '^timescale'
      and name ~ 'chunk';
```
```
                      name                       | setting
-------------------------------------------------+---------
 timescaledb.enable_bulk_decompression           | on
 timescaledb.enable_chunk_append                 | on
 timescaledb.enable_chunkwise_aggregation        | on
 timescaledb.enable_compression_indexscan        | on
 timescaledb.enable_decompression_logrep_markers | off
 timescaledb.enable_decompression_sorted_merge   | on
 timescaledb.enable_dml_decompression            | on
 timescaledb.enable_hypertable_compression       | on
 timescaledb.enable_parallel_chunk_append        | on
 timescaledb.enable_transparent_decompression    | on
 timescaledb.max_cached_chunks_per_hypertable    | 1024
 timescaledb.max_open_chunks_per_insert          | 1024
```
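For example, a parameter such as timescaledb.enable_chunk_append can be toggled for a single session while the cluster-wide default stays untouched. A minimal sketch, assuming sufficient privileges for the ALTER SYSTEM form:

```sql
-- session scope: affects only the current connection
set timescaledb.enable_chunk_append = off;

-- cluster scope: written to postgresql.auto.conf
-- (some timescaledb parameters require a restart rather than a reload)
alter system set timescaledb.max_cached_chunks_per_hypertable = 2048;
select pg_reload_conf();
```

This mirrors the usual Postgres SET / ALTER SYSTEM scoping rules; the extension's parameters are ordinary GUCs.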
Taking a small subset of the aforementioned functions and runtime parameters, this next scenario demonstrates how one can compress everything from an individual chunk up to an entire hypertable, via a policy based on the age of the chunks created under normal production conditions.
Table compression is one of those features best filed under the category of killer feature. What's especially cool is that one can not only reduce space consumption but also query, and even perform DML operations against, a compressed hypertable.
```
-- Get Current State of CHUNKS
db01=# \dt+ _timescaledb_internal._hyper*chunk
                                       List of relations
        Schema         |       Name       | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent   | heap          | 4292 MB
 _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent   | heap          | 8540 MB
 _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent   | heap          | 8712 MB
 _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent   | heap          | 8204 MB
 _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent   | heap          | 6769 MB
```
```sql
-- ENABLE COMPRESSION
alter table t_timescale
set (
    timescaledb.compress,
    timescaledb.compress_orderby = 't_stamp asc'
);
```
With compression enabled, a single chunk can now be compressed:

```sql
-- Time: 163081.593 ms (02:43.082)
select * from compress_chunk('_timescaledb_internal._hyper_2_4_chunk', true);
```
Notice how the chunk's size is now reported as zero. In order to get the true size of the compressed relation, you'll need to use a function call; see below for an example invocation.
```
db01=# \dt+ _timescaledb_internal._hyper*chunk
        Schema         |       Name       | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_2_chunk | table | postgres | permanent   | heap          | 4292 MB
 _timescaledb_internal | _hyper_2_3_chunk | table | postgres | permanent   | heap          | 8540 MB
 _timescaledb_internal | _hyper_2_4_chunk | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_5_chunk | table | postgres | permanent   | heap          | 8204 MB
 _timescaledb_internal | _hyper_2_6_chunk | table | postgres | permanent   | heap          | 6769 MB
```
This function gives a complete set of metrics describing the now compressed chunk:
```sql
-- Get Current State of Compression
select * from hypertable_compression_stats('t_timescale');
```
```
db01=# select * from hypertable_compression_stats('t_timescale');
-[ RECORD 1 ]------------------+------------
total_chunks                   | 5
number_compressed_chunks       | 1
before_compression_table_bytes | 9135104000
before_compression_index_bytes | 4813725696
before_compression_toast_bytes | 0
before_compression_total_bytes | 13948829696
after_compression_table_bytes  | 24150016
after_compression_index_bytes  | 0
after_compression_toast_bytes  | 4946141184
after_compression_total_bytes  | 4970291200
node_name                      |
```
It's just as easy to decompress a chunk as it is to compress one. The typical reason is to maximize the performance of DML operations that must be carried out against it.
```sql
-- Decompressing the chunk
-- Time: 267639.222 ms (04:27.639)
select * from decompress_chunk('_timescaledb_internal._hyper_2_4_chunk');
```
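Once decompressed, the chunk behaves like any ordinary heap table; when the DML is finished, the chunk can simply be compressed again. A sketch (the timestamp range here is illustrative, matching the 10-minute chunk boundaries seen earlier):

```sql
-- ordinary DML against the now-decompressed data
update t_timescale
   set c2 = 0
 where t_stamp >= '2024-01-30 19:00:00+00'
   and t_stamp <  '2024-01-30 19:10:00+00';

-- re-apply compression when done
select compress_chunk('_timescaledb_internal._hyper_2_4_chunk');
```

The decompress/modify/recompress cycle is the manual workaround when a bulk operation is too heavy to run against compressed data directly.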
As previously demonstrated, setting the chunk interval is a fairly straightforward process. So, too, is setting the compression policy: one merely chooses the hypertable and how long after a chunk's creation it should be compressed. Having a time-based trigger for compression recognizes the need to process the most recent data in the shortest amount of time:
```sql
-- RECALL: ENABLE COMPRESSION
alter table t_timescale
set (
    timescaledb.compress,
    timescaledb.compress_orderby = 't_stamp asc'
);
```
This compression policy compresses chunks that are older than 30 minutes:
```sql
select add_compression_policy('t_timescale', compress_after => INTERVAL '30m');

 add_compression_policy
------------------------
                   1000
```
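The returned value is the id of the background job that implements the policy; it can be inspected in the timescaledb_information.jobs view:

```sql
-- review the policy's schedule and configuration
select job_id, application_name, schedule_interval, config
  from timescaledb_information.jobs
 where job_id = 1000;
```

This is also a handy place to confirm that the policy is actually scheduled before waiting on it to fire.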
Testing the newly set compression policy is accomplished by inserting new records:
```
-- Populate new records
insert into t_timescale(id) select gen_random_uuid() from generate_series(1, 2E8);
INSERT 0 200000000
Time: 1100742.301 ms (18:20.742)

select * from pg_size_pretty(hypertable_size('t_timescale'));
 pg_size_pretty
----------------
 41 GB
```
Note the partition size differences between the old and new ones:
```
\dt+ _timescaledb_internal._hyper*chunk
                                        List of relations
        Schema         |       Name        | Type  |  Owner   | Persistence | Access method |  Size
-----------------------+-------------------+-------+----------+-------------+---------------+---------
 _timescaledb_internal | _hyper_2_12_chunk | table | postgres | permanent   | heap          | 3231 MB
 _timescaledb_internal | _hyper_2_14_chunk | table | postgres | permanent   | heap          | 8265 MB
 _timescaledb_internal | _hyper_2_15_chunk | table | postgres | permanent   | heap          | 3111 MB
 _timescaledb_internal | _hyper_2_2_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_3_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_4_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_5_chunk  | table | postgres | permanent   | heap          | 0 bytes
 _timescaledb_internal | _hyper_2_6_chunk  | table | postgres | permanent   | heap          | 0 bytes
```
The compression statistics confirm that the policy has processed the older chunks:

```
db01=# select * from hypertable_compression_stats('t_timescale');
-[ RECORD 1 ]------------------+------------
total_chunks                   | 8
number_compressed_chunks       | 5
before_compression_table_bytes | 38291226624
before_compression_index_bytes | 20177428480
before_compression_toast_bytes | 0
before_compression_total_bytes | 58468655104
after_compression_table_bytes  | 101351424
after_compression_index_bytes  | 0
after_compression_toast_bytes  | 20732502016
after_compression_total_bytes  | 20833853440
node_name                      |
```
Disabling the compression policy is accomplished thusly:
```sql
SELECT remove_compression_policy('t_timescale');
```
As with all technologies, especially new ones, there's always some form of limitation, and TimescaleDB is no exception; the documentation below covers the details.

Resources

https://docs.timescale.com/
https://docs.timescale.com/api/latest/
https://docs.timescale.com/api/latest/compression/alter_table_compression/
https://docs.timescale.com/use-timescale/latest/compression/about-compression/