First things first, a couple of definitions to clear up:

  • ETL: Extract, Transform, Load. Transforms data before loading it into the target system
    • extract raw data from various sources
    • use a secondary processing server to transform that data
    • load that data into a target database
  • ELT: Extra, Load, Transform. Transforms data after loading it into the target system
    • extract raw data from various sources
    • load that data into a target database
    • transform that data within the data system itself

Because there are often differences in the manner in which data is stored between database systems, migrating the data between these DBMS can be challenging. Therefore, in such cases, one can minimize efforts refactoring the application and instead perform ELT operations on the data itself in a dynamic fashion during the migration phase.

One of the more problematic types is date and timestamps.

There are three techniques that come to mind when migrating DATE datatypes between two RDBMS from a CSV file:

  1. Perform an ELT transformation, such as substituting the empty string for a NULL value, using a trigger while it loads into the table with the column of type “DATE”.
  2. Craft a custom PostgreSQL command that dynamically performs the ELT operation by transforming the empty string for a NULL value and inserting the data into a standard column of type “DATE” as it is copied.
  3. Create a custom datatype, thus permitting the direct loading of empty strings into a custom DATE column equivalent.

The first and second method migrates the data directly into standard postgres DATE columns. This method does assume, though, that empty strings will not be inserted into the database by the application in future transactions, post-migration.

Although the third method is the simplest and most straightforward method of migrating the data, some thought of administering the data post-migration may be required as well as it limits the DATE administration functionality available in PostgreSQL.

For our purposes, the third method, the use of custom datatypes and operators, is documented herein.

The steps are:

  1. Create the function used by a CAST
  2. Create the CAST
  3. Create the DOMAIN

Step 1: Create function f_ex

Notice the commented RETURN values “null”, “-infinity”, “infinity”, “epoch” etc. Although this POC defaults returning NULL, investigating these other DATE constants can provide further return options:


Step 2: Create CAST

Using the aforementioned function, a CAST is created in the database and notice that it is defined as type “DATE”. Even in this current form, without using the DOMAIN, one creates a column storing this value however, it would have to be declared as a character datatype:

Step 3: Create DOMAIN

The DOMAIN coerces the inserted value to be saved in the table as a specific datatype, which has its own specific set of behaviors. 

Two variations are demonstrated in order to highlight the different behaviors that can be introduced:

  • Variation 1: 

  • Variation 2: 

Validation/POC

Using the aforementioned CAST and DOMAIN, a table can now be created and populated with data.

The steps are:

  1. Create an example table
  2. Insert records into the example table
  3. Query the table

Step 1: Create table

Additional columns are declared in the table in order to demonstrate administering a stored value. See what it looks like:

ATTENTION: This SQL statement ensures that UK-style date inputs are implemented in the database.

Step 2: Insert records

Notice there are not only are multiple date formats accepted, but both empty strings and NULLs can be successfully inserted:


Step 3: Query

Example 1

Note the blank space generated by the NULL on row 4:

Example 2

Testing the use of empty spaces using the two different DOMAINs.

Variation 1: DOMAIN “uk_date1″… Type VARCHAR

Observe that Column date_field1 is coerced to a datatype VARCHAR by the DOMAIN uk_date1 while columns date_field2, date_default are CAST from type VARCHAR to DATE:

Here is the resulting output; note the NOTICE comments raised in the body of function f_ex:

Variation 2: DOMAIN “uk_date2” … type DATE

In this variation, column “date_field2” is coerced to type DATE. Notice in this case how both empty string and NULL are returned by this query:


Caveat

Beware, function f_ex allows the insertion of any character string, which, of course, cannot conform to the PostgreSQL DATE type implementation. This can, however be mitigated by adding the relevant conditionals in the function call, as demonstrated by the following example;

Conclusion

While the importation of data that includes empty strings by using a CAST and storing it as a DOMAIN is fairly straightforward, some level of consideration should be given to how one intends to manipulate the data once it’s there. At the end of the day, unless the application logic and querying are straightforward, one may have to either develop a more sophisticated set of queries or finally migrate to an altogether traditional table schema using standard data types.

Remember, it’s important to consider not only how the data gets to the database but also how it will be used.


Our PostgreSQL Performance Tuning eBook condenses years of database expertise into a practical guide for optimizing your PostgreSQL databases. Inside, you’ll discover our most effective PostgreSQL performance strategies derived from real-world experience.

 

Download eBook: Elevate your PostgreSQL Performance

Subscribe
Notify of
guest

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
FFF

“empty strings will not be inserted into the database by the application in future transactions”

I may miss the obvious, but how should this happen for a date column?

Robert Bernier

Hi,

> how should this happen for a date column?

Yes, I’ve seen this! It’s really all a matter of the source database system. Work long enough in this industry and you’ll find there’s always an exception to what is otherwise a very commonsense rule that to follow.