Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Writing PostgreSQL Extensions is Fun – C Language

April 5, 2019

Author

Ibrar Ahmed

Insight for Developers

PostgreSQL

Share this Post:

PostgreSQL is a powerful open source relational database management system. It extends the SQL language with additional features. A DBMS is not only defined by its performance and out of the box features, but also its ability to support bespoke/additional user-specific functionality. Some of these functionalities may be in the form of database constructs or modules, like stored procedures or functions, but their scope is generally limited to the functionality being exposed by the DBMS. For instance, how will you write a custom query-analyzing application that resides within your DBMS?

To support such options, PostgreSQL provides a pluggable architecture that allows you to install extensions. Extensions may consist of a configuration (control) file, a combination of SQL files, and dynamically loadable libraries.

This means you can write your own code as per the defined guidelines of an extension and plug it in a PostgreSQL instance without changing the actual PostgreSQL code tree. An extension by very definition extends what PostgreSQL can do, but more than that, it gives you the ability to interact with external entities. These external entities can be other database management systems like ClickHouse, Mongo or HDFs (normally these are called foreign data wrappers), or other interpreters or compilers (thus allowing us to write database functions in another language like Java, Python, Perl or TCL, etc.). Another potential use case of an extension can be for code obfuscation which allows you to protect your super secret code from prying eyes.

Build your own

To build your own extension, you don’t need a complete PostgreSQL code base. You can build and install an extension using installed PostgreSQL (it may require you to install a devel RPM or Debian package). Details about extensions can be found in PostgreSQL’s official documentation[1]. There many extensions available for different features in the contrib directory of PostgreSQL source. Other than the contrib directory, people are also writing extensions readily available on the internet but currently not part of the PostgreSQL source tree. The pg_stat_statements, PL/pgSQL, and PostGIS are examples of the best known or most widely used extensions.

Generally available PostgreSQL extensions may be classified into four main categories:

- Add support of a new language extension (PL/pgSQL, PL/Python, and PL/Java)

- Data type extensions where you can introduce new ((Hstore, cube, and hstore)

- Miscellaneous extensions (contrib folder has many miscellaneous extensions)

- Foreign Data Wrapper extension (postgres_fdw, mysqldb_fdw, clickhousedb_fdw)

There are four basic file types that are required for building an extension:

- Makefile: Which uses PGXS PostgreSQL’s build infrastructure for extensions.

- Control File: Carries information about the extension.

- SQL File(s): If the extension has any SQL code, it may reside in form SQL files (optional)

- C Code: The shared object that we want to build (optional).

Extension Makefile

To compile the C code, we need a makefile. It’s a very simple makefile with the exception of “PGXS”, which is PostgreSQL’s infrastructural makefile for creating extensions. The inclusion of “PGXS” is done by invoking pg_config binary with “–pgxs” flag. The detail of that file can be found at GitHub[2].

This is a sample makefile, which can be used to compile the C code.

EXTENSION = log
MODULE_big = log
DATA = log--0.0.1.sql
OBJS = log.o 
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

EXTENSION = log

MODULE_big = log

DATA = log--0.0.1.sql

OBJS = log.o

PG_CONFIG = pg_config

PGXS := $(shell $(PG_CONFIG) --pgxs)

include $(PGXS)

Extension Control File

This file must be named as [EXTENSION NAME]. control file. The control file can contain many options which can be found at official documentation [3]. But in this example, I have used some basic options.

comments: Comments about the extension.

default_version: This is the extension SQL version. The name of the SQL file contains this information in the file name.

relocatable: Tells PostgreSQL if it is possible to move contained objects into a different schema.

module_pathname: This information is replaced with the actual lib file path.

comment = 'PostgreSQL Utility Command Logger
'default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/log'

comment = 'PostgreSQL Utility Command Logger

'default_version = '0.0.1'

relocatable = true

module_pathname = '$libdir/log'

The contents of the file can be seen using the psql command dx psql.

postgres=# dx log      
List of installed extensions Name   | Version | Schema |       Description 
------------------------------------+---------+--------+---------------------------------- 
log                                 | 0.0.1   | public | PostgreSQL Utility Command Logger

postgres=# dx log

List of installed extensions Name | Version | Schema | Description

------------------------------------+---------+--------+----------------------------------

log | 0.0.1 | public | PostgreSQL Utility Command Logger

Extension SQL File

This is a mapping file, which I used to map the PostgreSQL function with the corresponding C function. Whenever you call the SQL function then the corresponding C function is called. The name of the file must be [EXTENSION NAME]–[default-version].sql. This is the same default_version as defined in the control file.

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)
RETURNS SETOF RECORDAS 'MODULE_PATHNAME', 
'pg_all_queries'
LANGUAGE C STRICT VOLATILE;

CREATE FUNCTION pg_all_queries(OUT query TEXT, pid OUT TEXT)

RETURNS SETOF RECORDAS 'MODULE_PATHNAME',

'pg_all_queries'

LANGUAGE C STRICT VOLATILE;

Extension C Code

There are three kinds of functions you can write in c code.

The first is where you call your c code function using SQL function written in SQL file of the extension.

The second type of function is callbacks. You register that callback by assigning the pointer of the function. There is no need for an SQL function here. You call this function automatically when a specific event occurs.

The third type of function is called automatically, even without registering. These functions are called on events such as extension load/unload time etc.

This is the C file containing the definition of the C code. There is no restriction for the name, or of the number of C files.

#include "postgres.h"
/* OS Includes */
/* PostgreSQL Includes */
PG_MODULE_MAGIC;
void _PG_init(void);
void _PG_fini(void);
Datum pg_all_queries(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_all_queries);
static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest,       char *completionTag);

#include "postgres.h"

/* OS Includes */

/* PostgreSQL Includes */

PG_MODULE_MAGIC;

void _PG_init(void);

void _PG_fini(void);

Datum pg_all_queries(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pg_all_queries);

static void process_utility(PlannedStmt *pstmt, const char *queryString,ProcessUtilityContext context,ParamListInfo params,QueryEnvironment *queryEnv,DestReceiver *dest, char *completionTag);

You need to include postgres.h for the extension. You can include other PostgreSQL as needed. PG_MODULE_MAGIC is macro which you need to include in the C file for the extension. _PG_init and _PG_fini are functions that are called when the extension is loaded or unloaded, respectively.

Here is an example of the load and unload functions of an extension.

void _PG_init(void)
{
    /* ... C code here at time of extension loading ... */
    ProcessUtility_hook = process_utility;
}

Void _PG_fini(void)
{
    /* ... C code here at time of extension unloading ... */
}

void _PG_init(void)

{

/* ... C code here at time of extension loading ... */

ProcessUtility_hook = process_utility;

}

Void _PG_fini(void)

{

/* ... C code here at time of extension unloading ... */

}

Here is an example of a callback function that you can invoke whenever you call a utility statement e.g. any DDL statement. The “queryString” variable contains the actual query text.

static void process_utility(PlannedStmt *pstmt, 
                           const char *queryString,
                           ProcessUtilityContext context,
                           ParamListInfo params,
                           QueryEnvironment *queryEnv,DestReceiver *dest,
                           char *completionTag)
{
    /* ... C code here ... */
    standard_ProcessUtility(pstmt,  
                            queryString,  
                            context,  
                            params,  
                            queryEnv,  
                            dest, 
                            completionTag);
    /* ... C code here ... */
}

static void process_utility(PlannedStmt *pstmt,

const char *queryString,

ProcessUtilityContext context,

ParamListInfo params,

QueryEnvironment *queryEnv,DestReceiver *dest,

char *completionTag)

{

/* ... C code here ... */

standard_ProcessUtility(pstmt,

queryString,

context,

params,

queryEnv,

dest,

completionTag);

/* ... C code here ... */

}

Finally, here’s an example of a C function that is invoked through a user-defined SQL function. This internally calls the C function contained in our shared object.

Datum pg_all_queries(PG_FUNCTION_ARGS)
{
    /* ... C code here ... */
    tupstore = tuplestore_begin_heap(true, false, work_mem);
    /* ... C code here ... */     
    values[0] = CStringGetTextDatum(query);
    values[1] = CStringGetTextDatum(pid);
    /* ... C code here ... */
    tuplestore_donestoring(tupstore);
    return (Datum) 0;
}

Datum pg_all_queries(PG_FUNCTION_ARGS)

{

/* ... C code here ... */

tupstore = tuplestore_begin_heap(true, false, work_mem);

/* ... C code here ... */

values[0] = CStringGetTextDatum(query);

values[1] = CStringGetTextDatum(pid);

/* ... C code here ... */

tuplestore_donestoring(tupstore);

return (Datum) 0;

}

Compile and Install

Before compilation, you need to set the PATH for PostgreSQL’s bin directory if there is no pg_config available in the system paths.

export PATH=/usr/local/pgsql/bin:$PATH

1	export PATH=/usr/local/pgsql/bin:$PATH

make USE_PGXS=1

1	make USE_PGXS=1

make USE_PGXS=1 install

1	make USE_PGXS=1 install

Output

We can now use our extension through a simple SQL query. Following is the output that is coming straight out of extension written in C programming language.

postgres=# select * from pg_all_queries();          
query                      | pid 
--------------------------+|------ 
create table foo(a int);  +| 8196 
create table bar(a int);  +| 8196 
drop table foo;           +| 8196 
(3 rows)

postgres=# select * from pg_all_queries();

query | pid

--------------------------+|------

create table foo(a int); +| 8196

create table bar(a int); +| 8196

drop table foo; +| 8196

(3 rows)

I hope this example can serve as a starting point for you to create more useful extensions that will not only help you and your company, but will also give you an opportunity to share and help the PostgreSQL community grow.

The complete example can be found at Github[4].

[1]: https://www.postgresql.org/docs/current/external-extensions.html

[2]: https://github.com/postgres/postgres/blob/master/src/makefiles/pgxs.mk

[3]: https://www.postgresql.org/docs/9.1/extend-extensions.html

[4]: https://github.com/ibrarahmad/Blog-Examples/tree/master/log

—
Photo from Pexels

0 0 votes

Article Rating

5 Comments

Oldest

Newest Most Voted

RhodiumToad (@RhodiumToad)

7 years ago

Note that _PG_fini is never actually called; there is no provision for unloading modules. (Way back in the day there used to be, but it was removed as unsafe – the code is still there in the backend but #ifdef’d out.) So it’s best not to mention it in examples.

If a module has a _PG_init, then also bear in mind that it won’t normally be loaded until the first use of (or definition of) some function defined in the module, unless you also add it to an appropriate _preload_libraries config setting.

Author

Ibrar Ahmed

7 years ago

Reply to RhodiumToad (@RhodiumToad)

I think you are right about _PG_fini, but it is still there in the code. But _PG_init called in case of CREATE EXTENSION too.

Example:
postgres=# create extension log;
2019-04-05 13:05:42.572 UTC [17456] WARNING: _PG_init
psql: WARNING: _PG_init
CREATE EXTENSION

Author

Ibrar Ahmed

7 years ago

Reply to Ibrar Ahmed

I have not explained the _preload_libraries too. I have also not explained the complete loading concept of the module. As this is the basic blog about extesnion, so lets keep that simple.

RhodiumToad (@RhodiumToad)

7 years ago

Reply to Ibrar Ahmed

As I said above, defining a function that’s in some module does also load the module (in that session, only). That’s why CREATE EXTENSION ends up loading it, which calls _PG_init. If you disconnect and reconnect after doing the CREATE, you’ll find that the extension exists but that the module is not loaded until you call the function.