Database administrators and developers frequently need profiling tools. While it is easy to see where a SQL statement spends most of its time using EXPLAIN or EXPLAIN ANALYZE in PostgreSQL, the same approach does not work for functions. Recently, Jobin published a blog post demonstrating how plprofiler can be useful for profiling functions. plprofiler builds call graphs and creates flame graphs, which make its reports very easy to understand. There is another interesting project, plpgsql_check, which serves a similar purpose to plprofiler, while it also examines the code and points out compilation errors. Let us see all of that in action in this blog post.
You can use yum on RedHat/CentOS to install this extension from the PGDG repository. The steps to build it from source on Ubuntu/Debian are also shown in the following logs.
On RedHat/CentOS
$ sudo yum install plpgsql_check_11
On Ubuntu/Debian
$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make
$ git clone https://github.com/okbob/plpgsql_check.git
$ cd plpgsql_check/
$ make && make install
There are three main uses of plpgsql_check: (1) checking functions for compilation errors without executing them, (2) finding the objects a function depends on, and (3) profiling functions.
When using plpgsql_check for the first two purposes, you do not need to add any entry to shared_preload_libraries. However, if you want to use it for profiling functions (3), you must add the appropriate entries to shared_preload_libraries so that it loads both plpgsql and plpgsql_check. Because of this dependency, plpgsql must appear before plpgsql_check in the shared_preload_libraries config, as in the following example:
shared_preload_libraries = 'plpgsql, plpgsql_check'
Any change to shared_preload_libraries requires a restart. If plpgsql is not listed before plpgsql_check in the shared_preload_libraries config, the server fails to start with the following error.
$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql_check'

$ pg_ctl -D /var/lib/pgsql/11/data restart -mf
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-07-07 02:07:10.104 EDT [6423]
FATAL: could not load library "/usr/pgsql-11/lib/plpgsql_check.so": /usr/pgsql-11/lib/plpgsql_check.so: undefined symbol: plpgsql_parser_setup
2019-07-07 02:07:10.104 EDT [6423] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql, plpgsql_check'

$ pg_ctl -D /var/lib/pgsql/11/data start
.....
server started
Once the PostgreSQL instance has started, create the extension in each database where you need to perform any of the three tasks discussed above.
$ psql -d percona -c "CREATE EXTENSION plpgsql_check"
CREATE EXTENSION
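Before looking at a larger real-world function, here is a minimal sketch of what this kind of checking catches (the table and function names here are illustrative, not from the original post): a function referencing a non-existent column is created without complaint, but plpgsql_check flags the problem immediately.

```sql
-- Hypothetical example: column "c" does not exist, yet CREATE FUNCTION succeeds
-- because PL/pgSQL does not validate embedded SQL at creation time.
CREATE TABLE t1 (a int, b int);

CREATE OR REPLACE FUNCTION buggy_fn() RETURNS int AS $$
DECLARE
  l_result int;
BEGIN
  SELECT c INTO l_result FROM t1;   -- "c" is not a column of t1
  RETURN l_result;
END;
$$ LANGUAGE plpgsql;

-- plpgsql_check reports the error without ever executing the function:
SELECT * FROM plpgsql_check_function('buggy_fn()');
```

plpgsql_check_function() returns the report as text lines; plpgsql_check_function_tb(), used in the rest of this post, returns the same information as a table.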
As discussed earlier, this extension can help developers and admins detect compilation errors. But why is that needed? Consider the following example, where we see no errors while creating the function. I have taken this example from my previous blog post about automatic index recommendations using hypopg and pg_qualstats. You may want to read that blog post to understand the logic of the following function.
percona=# CREATE OR REPLACE FUNCTION find_usable_indexes()
percona-# RETURNS VOID AS
percona-# $$
percona$# DECLARE
percona$# l_queries record;
percona$# l_querytext text;
percona$# l_idx_def text;
percona$# l_bef_exp text;
percona$# l_after_exp text;
percona$# hypo_idx record;
percona$# l_attr record;
percona$# /* l_err int; */
percona$# BEGIN
percona$# CREATE TABLE IF NOT EXISTS public.idx_recommendations (queryid bigint,
percona$# query text, current_plan jsonb, recmnded_index text, hypo_plan jsonb);
percona$# FOR l_queries IN
percona$# SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
percona$# pg_qualstats_example_query(t.queryid) as query
percona$# FROM
percona$# (
percona$# SELECT qs.relid::regclass AS relname, qs.relid AS relid, qs.queryid,
percona$# string_agg(DISTINCT attnames.attnames,',') AS attnames, qs.attnums
percona$# FROM pg_qualstats_all qs
percona$# JOIN pg_qualstats q ON q.queryid = qs.queryid
percona$# JOIN pg_stat_statements ps ON q.queryid = ps.queryid
percona$# JOIN pg_amop amop ON amop.amopopr = qs.opno
percona$# JOIN pg_am ON amop.amopmethod = pg_am.oid,
percona$# LATERAL
percona$# (
percona$# SELECT pg_attribute.attname AS attnames
percona$# FROM pg_attribute
percona$# JOIN unnest(qs.attnums) a(a) ON a.a = pg_attribute.attnum
percona$# AND pg_attribute.attrelid = qs.relid
percona$# ORDER BY pg_attribute.attnum) attnames,
percona$# LATERAL unnest(qs.attnums) attnum(attnum)
percona$# WHERE NOT
percona$# (
percona$# EXISTS
percona$# (
percona$# SELECT 1
percona$# FROM pg_index i
percona$# WHERE i.indrelid = qs.relid AND
percona$# (arraycontains((i.indkey::integer[])[0:array_length(qs.attnums, 1) - 1],
percona$# qs.attnums::integer[]) OR arraycontains(qs.attnums::integer[],
percona$# (i.indkey::integer[])[0:array_length(i.indkey, 1) + 1]) AND i.indisunique)))
percona$# GROUP BY qs.relid, qs.queryid, qs.qualnodeid, qs.attnums) t
percona$# GROUP BY t.relid, t.relname, t.queryid, t.attnames, t.attnums
percona$# LOOP
percona$# /* RAISE NOTICE '% : is queryid',l_queries.queryid; */
percona$# execute 'explain (FORMAT JSON) '||l_queries.query INTO l_bef_exp;
percona$# execute 'select hypopg_reset()';
percona$# execute 'SELECT indexrelid,indexname FROM hypopg_create_index(''CREATE INDEX on '||l_queries.relname||'('||l_queries.attnames||')'')' INTO hypo_idx;
percona$# execute 'explain (FORMAT JSON) '||l_queries.query INTO l_after_exp;
percona$# execute 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')' INTO l_idx_def;
percona$# INSERT INTO public.idx_recommendations (queryid,query,current_plan,recmnded_index,hypo_plan)
percona$# VALUES (l_queries.queryid,l_querytext,l_bef_exp::jsonb,l_idx_def,l_after_exp::jsonb);
percona$# END LOOP;
percona$# execute 'select hypopg_reset()';
percona$# END;
percona$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
As the above log shows, the function was created with no errors; yet until we actually call it, we do not know whether it contains compilation errors. With this extension, we can use the plpgsql_check_function_tb() function to find such errors without ever executing the function.
percona=# SELECT functionid, lineno, statement, sqlstate, message, detail, hint, level, position,
left (query,60),context FROM plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 14
statement  | FOR over SELECT rows
sqlstate   | 42P01
message    | relation "pg_qualstats_all" does not exist
detail     |
hint       |
level      | error
position   | 306
left       | SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
context    |
From the above log, it is clear that a relation used in the function does not exist. However, if the function uses dynamic SQL that is assembled at runtime, false positives are possible, as you can see in the following example. So you may want to avoid checking functions that use dynamic SQL, or comment out the code containing those statements before checking for other compilation errors.
percona=# select * from plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 50
statement  | EXECUTE
sqlstate   | 00000
message    | cannot determinate a result of dynamic SQL
detail     | There is a risk of related false alarms.
hint       | Don't use dynamic SQL and record type together, when you would check function.
level      | warning
position   |
query      |
context    |
-[ RECORD 2 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 52
statement  | EXECUTE
sqlstate   | 55000
message    | record "hypo_idx" is not assigned yet
detail     | The tuple structure of a not-yet-assigned record is indeterminate.
hint       |
level      | error
position   |
query      |
context    | SQL statement "SELECT 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')'"
This extension can be used to find dependent objects in a function. This way, it becomes easy for administrators to understand the objects being used in a function without spending a huge amount of time reading the code. The trick is to simply pass your function as a parameter to plpgsql_show_dependency_tb() as you see in the following example.
percona=# select * from plpgsql_show_dependency_tb('find_usable_indexes()');
   type   |  oid  |   schema   |            name            |  params
----------+-------+------------+----------------------------+-----------
 FUNCTION | 16547 | public     | pg_qualstats               | ()
 FUNCTION | 16545 | public     | pg_qualstats_example_query | (bigint)
 FUNCTION | 16588 | public     | pg_stat_statements         | (boolean)
 RELATION |  2601 | pg_catalog | pg_am                      |
 RELATION |  2602 | pg_catalog | pg_amop                    |
 RELATION |  1249 | pg_catalog | pg_attribute               |
 RELATION |  1262 | pg_catalog | pg_database                |
 RELATION |  2610 | pg_catalog | pg_index                   |
 RELATION | 16480 | public     | idx_recommendations        |
 RELATION | 16549 | public     | pg_qualstats               |
 RELATION | 16559 | public     | pg_qualstats_all           |
 RELATION | 16589 | public     | pg_stat_statements         |
(12 rows)
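Since plpgsql_show_dependency_tb() returns an ordinary result set, you can filter it like any other query. For example, a small sketch based on the output above, listing only the tables the function touches:

```sql
-- Show only the relations (tables and views) a function depends on:
SELECT schema, name
FROM plpgsql_show_dependency_tb('find_usable_indexes()')
WHERE type = 'RELATION';
```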
This is one of the most interesting features. Once you have added the appropriate entries to shared_preload_libraries as discussed earlier, you can easily enable or disable profiling through a GUC: plpgsql_check.profiler. This parameter can be set either globally or for a single session. Here is an example to understand it better.
percona=# ALTER SYSTEM SET plpgsql_check.profiler TO 'ON';
ALTER SYSTEM
percona=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
When you set it globally, all functions running in the database are automatically profiled and the statistics are stored. This may be fine in a development or testing environment, but you can also enable profiling only for functions called in a single session through a session-level setting, as you see in the following example.
percona=# BEGIN;
BEGIN
percona=# SET plpgsql_check.profiler TO 'ON';
SET
percona=# select salary_update(1000);
 salary_update
---------------

(1 row)

percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time | source
--------+----------+--------------------------------------------------------------------------------------------------------------
      1 |          |
      2 |          | DECLARE
      3 |          | l_abc record;
      4 |          | new_sal INT;
      5 |    0.005 | BEGIN
      6 |   144.96 | FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          | LOOP
      8 |      NaN | new_sal := l_abc.salary + sal_raise;
      9 |      NaN | UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          | END LOOP;
     11 |          | END;
(11 rows)

--- Create an index and check if it improves the execution time of the FOR loop.

percona=# CREATE INDEX idx_1 ON employee (emp_id, dept_id);
CREATE INDEX
percona=# select salary_update(1000);
 salary_update
---------------

(1 row)

percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time | source
--------+----------+--------------------------------------------------------------------------------------------------------------
      1 |          |
      2 |          | DECLARE
      3 |          | l_abc record;
      4 |          | new_sal INT;
      5 |    0.007 | BEGIN
      6 |   73.074 | FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          | LOOP
      8 |      NaN | new_sal := l_abc.salary + sal_raise;
      9 |      NaN | UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          | END LOOP;
     11 |          | END;
(11 rows)

percona=# END;
COMMIT
percona=# show plpgsql_check.profiler;
 plpgsql_check.profiler
------------------------
 on
(1 row)
In the above log, I opened a transaction block and enabled plpgsql_check.profiler inside it, so every function executed in that session is profiled, and the results can be extracted using plpgsql_profiler_function_tb(). Note that a plain SET persists for the rest of the session even after COMMIT, as the final SHOW confirms; use SET LOCAL if you want the setting to revert when the transaction ends. Once we have identified the area where the execution time is high, the immediate action is to tune that piece of code. After creating the index, I called the function again, and it performed noticeably faster than before.
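If you want to clear previously collected profile data between experiments like the one above, plpgsql_check also provides reset functions. A small sketch, assuming a plpgsql_check version that includes them (check your installed version):

```sql
-- Reset stored profile statistics for one function:
SELECT plpgsql_profiler_reset('salary_update(int)'::regprocedure);

-- Or reset statistics for all profiled functions:
SELECT plpgsql_profiler_reset_all();
```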
Special thanks to Pavel Stehule, who authored this extension, and to the contributors who have brought it to its current, usable state. It is one of the simplest extensions for checking compilation errors and dependencies, and it can be a handy profiling tool as well; a developer may find either plprofiler or plpgsql_check helpful for profiling.