Performing ELT Operations Using DOMAINs

First things first, a couple of definitions to clear up:

ETL: Extract, Transform, Load. Transforms data before loading it into the target system
- extract raw data from various sources
- use a secondary processing server to transform that data
- load that data into a target database
ELT: Extra, Load, Transform. Transforms data after loading it into the target system
- extract raw data from various sources
- load that data into a target database
- transform that data within the data system itself

Because there are often differences in the manner in which data is stored between database systems, migrating the data between these DBMS can be challenging. Therefore, in such cases, one can minimize efforts refactoring the application and instead perform ELT operations on the data itself in a dynamic fashion during the migration phase.

One of the more problematic types is date and timestamps.

There are three techniques that come to mind when migrating DATE datatypes between two RDBMS from a CSV file:

Perform an ELT transformation, such as substituting the empty string for a NULL value, using a trigger while it loads into the table with the column of type “DATE”.
Craft a custom PostgreSQL command that dynamically performs the ELT operation by transforming the empty string for a NULL value and inserting the data into a standard column of type “DATE” as it is copied.
Create a custom datatype, thus permitting the direct loading of empty strings into a custom DATE column equivalent.

The first and second method migrates the data directly into standard postgres DATE columns. This method does assume, though, that empty strings will not be inserted into the database by the application in future transactions, post-migration.

Although the third method is the simplest and most straightforward method of migrating the data, some thought of administering the data post-migration may be required as well as it limits the DATE administration functionality available in PostgreSQL.

For our purposes, the third method, the use of custom datatypes and operators, is documented herein.

The steps are:

Create the function used by a CAST
Create the CAST
Create the DOMAIN

Step 1: Create function f_ex

Notice the commented RETURN values “null”, “-infinity”, “infinity”, “epoch” etc. Although this POC defaults returning NULL, investigating these other DATE constants can provide further return options:

create or replace function f_ex(in x varchar)
returns date
as
$$
begin
    raise notice 'entering function f_ex, the argument is "%" ',x;
    return null;
--  return '-infinity'::date;
--  return 'infinity'::date;
--  return 'epoch'::date;
end;
$$
language plpgsql;

create or replace function f_ex(in x varchar)

returns date

begin

raise notice 'entering function f_ex, the argument is "%" ',x;

return null;

-- return '-infinity'::date;

-- return 'infinity'::date;

-- return 'epoch'::date;

end;

language plpgsql;

Step 2: Create CAST

Using the aforementioned function, a CAST is created in the database and notice that it is defined as type “DATE”. Even in this current form, without using the DOMAIN, one creates a column storing this value however, it would have to be declared as a character datatype:

create cast (varchar as date)
    with function f_ex(varchar)
    as assignment;

create cast (varchar as date)

with function f_ex(varchar)

as assignment;

Step 3: Create DOMAIN

The DOMAIN coerces the inserted value to be saved in the table as a specific datatype, which has its own specific set of behaviors.

Two variations are demonstrated in order to highlight the different behaviors that can be introduced:

Variation 1:

create domain uk_date1 as date;

1	create domain uk_date1 as date;

Variation 2:

create domain uk_date2 as varchar;

1	create domain uk_date2 as varchar;

Validation/POC

Using the aforementioned CAST and DOMAIN, a table can now be created and populated with data.

The steps are:

Create an example table
Insert records into the example table
Query the table

Step 1: Create table

Additional columns are declared in the table in order to demonstrate administering a stored value. See what it looks like:

create table example_table(
    id serial primary key
    ,date_field1 uk_date1
    ,date_field2 uk_date2
    ,date_default date
);

create table example_table(

id serial primary key

,date_field1 uk_date1

,date_field2 uk_date2

,date_default date

);

ATTENTION: This SQL statement ensures that UK-style date inputs are implemented in the database.

alter database db01 set datestyle to 'SQL, DMY';

1	alter database db01 set datestyle to 'SQL, DMY';

Step 2: Insert records

Notice there are not only are multiple date formats accepted, but both empty strings and NULLs can be successfully inserted:

INSERT INTO example_table (date_field1,date_field2,date_default) VALUES ('24/10/2024','24/10/2024','24/10/2024');
INSERT INTO example_table (date_field1,date_field2,date_default) VALUES ('24.10.2024','24.10.2024','24.10.2024');
INSERT INTO example_table (date_field1,date_field2,date_default) VALUES (now()::date,now()::date,now()::date);
INSERT INTO example_table (date_field1,date_field2,date_default) VALUES (NULL,NULL,NULL);

INSERT INTO example_table (date_field1,date_field2,date_default) VALUES ('24/10/2024','24/10/2024','24/10/2024');

INSERT INTO example_table (date_field1,date_field2,date_default) VALUES ('24.10.2024','24.10.2024','24.10.2024');

INSERT INTO example_table (date_field1,date_field2,date_default) VALUES (now()::date,now()::date,now()::date);

INSERT INTO example_table (date_field1,date_field2,date_default) VALUES (NULL,NULL,NULL);

Step 3: Query

Example 1

Note the blank space generated by the NULL on row 4:

SELECT * FROM example_table;

1	SELECT * FROM example_table;

--
-- f_ex returns NULL
--
 id | date_field1 | date_field2 | date_default
----+-------------+-------------+--------------
  1 | 24/10/2024  | 24/10/2024  | 24/10/2024
  2 | 24.10.2024  | 24/10/2024  | 24/10/2024
  3 | 15/11/2024  | 15/11/2024  | 15/11/2024
  4 |             |             |

-- f_ex returns NULL

id | date_field1 | date_field2 | date_default

----+-------------+-------------+--------------

1 | 24/10/2024 | 24/10/2024 | 24/10/2024

2 | 24.10.2024 | 24/10/2024 | 24/10/2024

3 | 15/11/2024 | 15/11/2024 | 15/11/2024

4 | | |

Example 2

Testing the use of empty spaces using the two different DOMAINs.

Variation 1: DOMAIN “uk_date1″… Type VARCHAR

Observe that Column date_field1 is coerced to a datatype VARCHAR by the DOMAIN uk_date1 while columns date_field2, date_default are CAST from type VARCHAR to DATE:

INSERT INTO example_table (date_field1,date_field2,date_default)
    VALUES ('',''::varchar,''::varchar);

1 2	INSERT INTO example_table (date_field1,date_field2,date_default) VALUES ('',''::varchar,''::varchar);

-- empty string "" is excluded because it can't be cast with this query
select date_field1 as datatype_varchar
      ,(date_field1::text::date + '1 month'::interval)::date as date_field1_1month
      , date_default
      ,(date_default + '1 month'::interval)::date as date_default_interval
from example_table
where date_field1 != '';

-- empty string "" is excluded because it can't be cast with this query

select date_field1 as datatype_varchar

,(date_field1::text::date + '1 month'::interval)::date as date_field1_1month

, date_default

,(date_default + '1 month'::interval)::date as date_default_interval

from example_table

where date_field1 != '';

Here is the resulting output; note the NOTICE comments raised in the body of function f_ex:

psql:go2_validate.sql:39: NOTICE:  entering function f_ex, the argument is ""
psql:go2_validate.sql:39: NOTICE:  entering function f_ex, the argument is ""
INSERT 0 1

datatype_varchar | date_field1_1month | date_default | date_default_interval
------------------+--------------------+--------------+-----------------------
24/10/2024       | 24/11/2024         | 24/10/2024   | 24/11/2024
24.10.2024       | 24/11/2024         | 24/10/2024   | 24/11/2024
15/11/2024       | 15/12/2024         | 15/11/2024   | 15/12/2024
(3 rows)

psql:go2_validate.sql:39: NOTICE: entering function f_ex, the argument is ""

INSERT 0 1

datatype_varchar | date_field1_1month | date_default | date_default_interval

------------------+--------------------+--------------+-----------------------

24/10/2024 | 24/11/2024 | 24/10/2024 | 24/11/2024

24.10.2024 | 24/11/2024 | 24/10/2024 | 24/11/2024

15/11/2024 | 15/12/2024 | 15/11/2024 | 15/12/2024

(3 rows)

Variation 2: DOMAIN “uk_date2” … type DATE

In this variation, column “date_field2” is coerced to type DATE. Notice in this case how both empty string and NULL are returned by this query:

select date_field2 as datatype_date
      ,(date_field2::text::date+'1 month'::interval)::date as date_field2_1month
      , date_default
      ,(date_default + '1 month'::interval)::date as date_default_1month
from example_table;

select date_field2 as datatype_date

,(date_field2::text::date+'1 month'::interval)::date as date_field2_1month

, date_default

,(date_default + '1 month'::interval)::date as date_default_1month

from example_table;

datatype_date | date_field2_1month | date_default | date_default_1month
---------------+--------------------+--------------+---------------------
24/10/2024    | 24/11/2024         | 24/10/2024   | 24/11/2024
24/10/2024    | 24/11/2024         | 24/10/2024   | 24/11/2024
15/11/2024    | 15/12/2024         | 15/11/2024   | 15/12/2024
              |                    |              |
              |                    |              |
(5 rows)

datatype_date | date_field2_1month | date_default | date_default_1month

---------------+--------------------+--------------+---------------------

24/10/2024 | 24/11/2024 | 24/10/2024 | 24/11/2024

15/11/2024 | 15/12/2024 | 15/11/2024 | 15/12/2024

| | |

(5 rows)

Caveat

Beware, function f_ex allows the insertion of any character string, which, of course, cannot conform to the PostgreSQL DATE type implementation. This can, however be mitigated by adding the relevant conditionals in the function call, as demonstrated by the following example;

insert into example_table (date_field1,date_field2,date_default)
    values ('abc','abc'::varchar,'abc'::varchar);

1 2	insert into example_table (date_field1,date_field2,date_default) values ('abc','abc'::varchar,'abc'::varchar);

select date_field1 as datatype_varchar
      ,date_field2 as datatype_date
      ,date_default
from example_table;

select date_field1 as datatype_varchar

,date_field2 as datatype_date

,date_default

from example_table;

-- output
psql:go2_validate.sql:67: NOTICE:  entering function f_ex, the argument is "abc"
psql:go2_validate.sql:67: NOTICE:  entering function f_ex, the argument is "abc"
INSERT 0 1

datatype_varchar | datatype_date | date_default
-----------------+---------------+--------------
24/10/2024       | 24/10/2024    | 24/10/2024
24.10.2024       | 24/10/2024    | 24/10/2024
15/11/2024       | 15/11/2024    | 15/11/2024
                 |               |
                 |               |
abc              |               |
(6 rows)

-- output

psql:go2_validate.sql:67: NOTICE: entering function f_ex, the argument is "abc"

INSERT 0 1

datatype_varchar | datatype_date | date_default

-----------------+---------------+--------------

24/10/2024 | 24/10/2024 | 24/10/2024

24.10.2024 | 24/10/2024 | 24/10/2024

15/11/2024 | 15/11/2024 | 15/11/2024

| |

abc | |

(6 rows)

Conclusion

While the importation of data that includes empty strings by using a CAST and storing it as a DOMAIN is fairly straightforward, some level of consideration should be given to how one intends to manipulate the data once it’s there. At the end of the day, unless the application logic and querying are straightforward, one may have to either develop a more sophisticated set of queries or finally migrate to an altogether traditional table schema using standard data types.

Remember, it’s important to consider not only how the data gets to the database but also how it will be used.

Our PostgreSQL Performance Tuning eBook condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Inside, you’ll discover our most effective PostgreSQL performance strategies derived from real-world experience.

Download eBook: Elevate your PostgreSQL Performance

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

FFF

1 year ago

“empty strings will not be inserted into the database by the application in future transactions”

I may miss the obvious, but how should this happen for a date column?

Robert Bernier

1 year ago

Hi,

> how should this happen for a date column?

Yes, I’ve seen this! It’s really all a matter of the source database system. Work long enough in this industry and you’ll find there’s always an exception to what is otherwise a very commonsense rule that to follow.

MySQL 5.7
Support

Compare Percona to Leading Database Solutions

Software
Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Performing ELT Operations Using DOMAINs

Step 1: Create function f_ex

Step 2: Create CAST

Step 3: Create DOMAIN

Validation/POC

Step 1: Create table

Step 2: Insert records

Step 3: Query

Example 1

Example 2

Variation 1: DOMAIN “uk_date1″… Type VARCHAR

Variation 2: DOMAIN “uk_date2” … type DATE

Caveat

Conclusion

Related Blog Articles

RECOMMENDED ARTICLES

JavaScript Stored Routines in Percona Server for MySQL: A New Era for Database Programmability

Running Databases on Kubernetes: A Practical Guide to Risks, Benefits, and Best Practices

Building a Multi-Cloud Strategy: Cut Costs, Improve Resilience, and Avoid Lock-In

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7 Support

Compare Percona to Leading Database Solutions

Software Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Performing ELT Operations Using DOMAINs

Step 1: Create function f_ex

Step 2: Create CAST

Step 3: Create DOMAIN

Validation/POC

Step 1: Create table

Step 2: Insert records

Step 3: Query

Example 1

Example 2

Variation 1: DOMAIN “uk_date1″… Type VARCHAR

Variation 2: DOMAIN “uk_date2” … type DATE

Caveat

Conclusion

About the Author

Share This Post!

Stay up to date with the Percona Blog

Related Blog Articles

RECOMMENDED ARTICLES

JavaScript Stored Routines in Percona Server for MySQL: A New Era for Database Programmability

Running Databases on Kubernetes: A Practical Guide to Risks, Benefits, and Best Practices

Building a Multi-Cloud Strategy: Cut Costs, Improve Resilience, and Avoid Lock-In

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7
Support

Software
Downloads