Character collations determine the sort order and classification of characters. When creating a database with initdb, PostgreSQL normally sets the collation based on the operating system’s locale settings, but other special collations, such as “C,” “POSIX,” and “ucs_basic,” are available as alternatives.
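As a small illustration of how a collation changes sort order, the sketch below sorts four made-up strings with the operating system's `sort` utility under the "C" locale. It uses the OS tool rather than PostgreSQL itself, and the helper name `sort_with_c_locale` is hypothetical. The "C" locale compares by byte value, so uppercase ASCII letters come before lowercase ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sort the given strings using the "C" collation,
# which is a plain byte-value comparison.
sort_with_c_locale() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Sample strings are invented for illustration.
sort_with_c_locale banana Apple apple Banana
# Prints, one per line: Apple, Banana, apple, banana
```

Running the same pipeline with `LC_ALL` set to a linguistic locale such as en_US.utf8 (if it is installed on the host) will usually give a different order, which is why an index built under one set of collation rules cannot be trusted under another.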
On Linux systems, updates to glibc can bring changes to collation rules. These changes are normally small, but they can lead to index corruption, so they need attention when planning upgrades and updates. PostgreSQL can be configured to use collation rules from different sources. When updating PostgreSQL or the operating system it runs on, you need to watch for changes to collation rules.
This blog post covers the lc_collate mismatch error that can appear while running pg_upgrade, and how to resolve it.
The pg_upgrade check fails with the following error:
lc_collate values for database “postgres” do not match
Example:
```
-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-14/bin -c -d /var/lib/pgsql/12/data -D /var/lib/pgsql/14/data
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok

lc_collate values for database "postgres" do not match: old "C", new "en_US.utf8"
Failure, exiting
```
The lc_collate values shown in the message depend on your environment.
When the default collation of the old and new PostgreSQL clusters does not match, the pg_upgrade consistency check fails with the message: lc_collate values for database "postgres" do not match.
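If the check is run from a script, the two conflicting values can be pulled out of the message. The following is a minimal sketch, assuming the message has the form shown in this post; `parse_lc_collate_mismatch` is a hypothetical helper name, not something pg_upgrade provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given one line of pg_upgrade output, print
# "<database> <old collation> <new collation>" if it is the lc_collate
# mismatch message; return 1 for any other line.
parse_lc_collate_mismatch() {
    local line=$1
    local re='^lc_collate values for database "([^"]*)" do not match: +old "([^"]*)", new "([^"]*)"'
    if [[ $line =~ $re ]]; then
        printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

# The message line from the example in this post.
parse_lc_collate_mismatch 'lc_collate values for database "postgres" do not match: old "C", new "en_US.utf8"'
# Prints: postgres C en_US.utf8
```

The second field is the value the new cluster has to be initialized with.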
Here is an example of the locale issue.
On the source PostgreSQL version:
```
postgres=# \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 postgres  | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 test      | postgres | UTF8     | C       | C     |
 test1     | postgres | UTF8     | C       | C     |
 test2     | postgres | UTF8     | C       | C     |
(6 rows)
```
The problem appears when we install the new PostgreSQL version and initialize its data directory with initdb without specifying a locale. In that case, initdb takes the locale from the operating system environment.
```
# sudo su - postgres
-bash-4.2$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

-bash-4.2$ /usr/pgsql-14/bin/initdb -D /var/lib/pgsql/14/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
```
When the pg_upgrade consistency checks are run against this cluster, they fail with the locale mismatch message:
```
-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-14/bin -c -d /var/lib/pgsql/12/data -D /var/lib/pgsql/14/data
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok

lc_collate values for database "postgres" do not match: old "C", new "en_US.utf8"
Failure, exiting
-bash-4.2$
```
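The locale that initdb picked up here follows the usual POSIX precedence for environment variables: LC_ALL wins if set, then LC_COLLATE, then LANG, and "C" is used if none is set. The sketch below spells that rule out; `effective_lc_collate` is a hypothetical helper written for illustration and is not how initdb is implemented internally.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PostgreSQL): given the values of LC_ALL,
# LC_COLLATE and LANG, print the collation locale selected by the POSIX
# precedence LC_ALL > LC_COLLATE > LANG, falling back to "C".
effective_lc_collate() {
    local lc_all=$1 lc_collate=$2 lang=$3
    if [ -n "$lc_all" ]; then
        printf '%s\n' "$lc_all"
    elif [ -n "$lc_collate" ]; then
        printf '%s\n' "$lc_collate"
    elif [ -n "$lang" ]; then
        printf '%s\n' "$lang"
    else
        printf 'C\n'
    fi
}

# Environment from the listing: LC_ALL is empty and LANG is en_US.utf8.
effective_lc_collate "" "" "en_US.utf8"
# Prints: en_US.utf8
```

To check the current shell before running initdb, the same function could be called as `effective_lc_collate "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}"`.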
To fix this, reinitialize the new version's PGDATA directory with the same --encoding and --locale as the original PostgreSQL cluster by specifying them explicitly as command-line arguments to initdb, as shown in the example below.
```
-bash-4.2$ rm -rf /var/lib/pgsql/14/data/*
-bash-4.2$ /usr/pgsql-14/bin/initdb -D /var/lib/pgsql/14/data/ --encoding=UTF8 --locale=C
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".
```
Running the pg_upgrade check again now succeeds:
```
-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-14/bin -c -d /var/lib/pgsql/12/data -D /var/lib/pgsql/14/data
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

*Clusters are compatible*
-bash-4.2$
```
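Rather than typing the locale by hand, the initdb flags could be derived from the values reported by the old cluster. This is a rough sketch under that assumption; `initdb_locale_flags` is a hypothetical helper, and it only covers encoding, collation, and character type, which are the settings this post deals with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the old cluster's encoding, collation and ctype
# into initdb command-line flags. When collation and ctype are identical,
# the shorter --locale form used in this post is printed.
initdb_locale_flags() {
    local encoding=$1 collate=$2 ctype=$3
    if [ "$collate" = "$ctype" ]; then
        printf -- '--encoding=%s --locale=%s\n' "$encoding" "$collate"
    else
        printf -- '--encoding=%s --lc-collate=%s --lc-ctype=%s\n' \
            "$encoding" "$collate" "$ctype"
    fi
}

# Values from the source cluster's database list in this post.
initdb_locale_flags UTF8 C C
# Prints: --encoding=UTF8 --locale=C
```

On the old cluster, the three input values can be read with a query such as `SELECT pg_encoding_to_char(encoding), datcollate, datctype FROM pg_database WHERE datname = 'template0';`. Note that --locale also sets the other locale categories, so use the --lc-collate and --lc-ctype form if those should stay at the OS defaults.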
The actual pg_upgrade run now completes without issues as well:
```
-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade -b /usr/pgsql-12/bin/ -B /usr/pgsql-14/bin -d /var/lib/pgsql/12/data -D /var/lib/pgsql/14/data
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting oldest XID for new cluster                          ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Copying user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to delete old cluster                       ok
Checking for extension updates                              ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/pgsql-14/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
-bash-4.2$
```
Start the new version of PostgreSQL and verify the encoding and collation:
```
-bash-4.2$ /usr/pgsql-14/bin/pg_ctl -D /var/lib/pgsql/14/data -l logfile start
waiting for server to start.... done
server started
-bash-4.2$ psql
psql (14.11)
Type "help" for help.

postgres=# \l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 postgres  | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | postgres=CTc/postgres+
           |          |          |         |       | =c/postgres
 test      | postgres | UTF8     | C       | C     |
 test1     | postgres | UTF8     | C       | C     |
 test2     | postgres | UTF8     | C       | C     |
(6 rows)
```
The encoding and collation of every database are the same as they were in the old version of PostgreSQL.
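With many databases, the same verification could be scripted instead of read off the list by eye. Below is a minimal sketch, assuming the input is one `name|collate|ctype` line per database (the format produced by unaligned, tuples-only psql output); `check_collations` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name|collate|ctype" lines on stdin and report
# every database whose collation or ctype differs from the expected values.
# Returns 0 if all databases match, 1 otherwise.
check_collations() {
    local expected_collate=$1 expected_ctype=$2 status=0
    local name collate ctype
    while IFS='|' read -r name collate ctype; do
        if [ -z "$name" ]; then
            continue  # skip blank lines
        fi
        if [ "$collate" != "$expected_collate" ] || [ "$ctype" != "$expected_ctype" ]; then
            echo "MISMATCH: $name collate=$collate ctype=$ctype"
            status=1
        fi
    done
    return "$status"
}

# Sample input mirroring the database list in this post.
check_collations C C <<'EOF' && echo "all databases use the expected collation"
postgres|C|C
template0|C|C
template1|C|C
test|C|C
EOF
```

Against a running server, the input could come from `psql -At -c "SELECT datname, datcollate, datctype FROM pg_database" | check_collations C C`.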