Postgres-XL

Nodes Concept

Postgres-XL is composed of three major components called the GTM, Coordinator and Datanode.

The GTM is responsible to provide ACID property of transactions.

The Datanode stores table data and handle SQL statements locally.

The Coordinator handles each SQL statements from applications, determines which Datanode to go, and sends plans on to the appropriate Datanodes.

The Coordinators and Datanodes of Postgres-XL are essentially PostgreSQL database servers

You usually should run GTM on a separate server because GTM has to take care of transaction requirements from all the Coordinators and Datanodes. To group multiple requests and responses from Coordinator and Datanode processes running on the same server, you can configure GTM-Proxy. GTM-Proxy reduces the number of interactions and the amount of data to GTM. GTM-Proxy also helps handle GTM failures.

It is often good practice to run both Coordinator and Datanode on the same server because we don’t have to worry about workload balance between the two, and you can often get at data from replicated tables locally without sending an additional request out on the network. You can have any number of servers where these two components are running. Because both Coordinator and Datanode are essentially PostgreSQL instances, you should configure them to avoid resource conflict. It is very important to assign them different working directories and port numbers.

https://www.postgres-xl.org/documentation/tutorial-arch.html

Table distributing concept

CREATE TABLE DISTRIBUTE BY ...

REPLICATION

Each row of the table will be replicated to all the Datanode of the Postgres-XL database cluster.

ROUNDROBIN

Each row of the table will be placed in one of the Datanodes in a round-robin manner. The value of the row will not be needed to determine what Datanode to go.

HASH ( column_name )

Each row of the table will be placed based on the hash value of the specified column. Following type is allowed as distribution column: INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR, CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR, NUMERIC, MONEY, ABSTIME, RELTIME, DATE, TIME,TIMESTAMP, TIMESTAMPTZ, INTERVAL, and TIMETZ. Please note that floating point is not allowed as a basis of the distribution column.

MODULO ( column_name )

Each row of the table will be placed based on the modulo of the specified column. Following type is allowed as distribution column: INT8, INT2, INT4, BOOL, ABSTIME, RELTIME, DATE.

Please note that floating point is not allowed as a basis of the distribution column.

If DISTRIBUTE BY is not specified, columns with UNIQUE constraint will be chosen as the distribution key. If no such column is specified, distribution column is the first eligible column in the definition. If no such column is found, then the table will be distributed by ROUNDROBIN.

You could Alter a replicated table to make it a distributed table.

https://www.postgres-xl.org/documentation/sql-createtable.html

https://www.postgres-xl.org/documentation/tutorial-createcluster.html

Shard limitation

  • (…) in distributed tables, UNIQUE constraints must include the distribution column of the table

  • (…) the distribution column must be included in PRIMARY KEY

  • (…) column with REFERENCES (FK) should be the distribution column.

  • (…) PRIMARY KEY must be the distribution column as well.

In Postgres-XL, in distributed tables, UNIQUE constraints must include the distribution column of the table. This is because Postgres-XL currently only allows that it can push down to the Datanodes to be enforced locally. If we include the distribution column in unique constraints, it stands to reason that it can be enforced locally. If a table is distributed by ROUNDROBIN, we cannot enforce UNIQUE constraints because it does not have a distribution column; it is possible that the same value for a column exists on multiple nodes. There’s no restriction in UNIQUE constraint in replicated tables. When an expression is used on a UNIQUE constraint, this expression must contain the distribution column of its parent table. It cannot use other columns as well.

As mentioned when discussing UNIQUE constraint, the distribution column must be included in PRIMARY KEY. Other restrictions apply to the PRIMARY KEY as well. When an expression is used on a PRIMARY KEY constraint, this expression must contain the distribution column of its parent table. It cannot use other columns as well.

Please note that column with REFERENCES should be the distribution column. This limitation is introduced because constraints are enforced only locally in each Datanode.

In Postgres-XL, you cannot specify both PRIMARY KEY and REFERENCES key for different columns.

In Postgres-XL, you cannot omit the column name in REFERENCES clause.

In Postgres-XL, you cannot specify more than one foreign key constraints.

Postgres-XL does not support exclusion constraints.

Postgres-XL does not allow to modify the value of distribution column.

https://www.postgres-xl.org/documentation/ddl-constraints.html

https://www.postgres-xl.org/documentation/dml-update.html

https://stackoverflow.com/questions/28547437/migrating-from-postgresql-to-postgres-xl-distributed-tables-design

https://www.postgres-xl.org/documentation/upgrading.html

High Availability

You can add slaves for each node analogous to PostgreSQL’s streaming replication. In addition, the cluster can be configured such that the Global Transaction Manager (GTM) can have a GTM Standby.

In terms of automatic failover, it is currently not part of the core project, but Corosync/Pacemaker has been used for this purpose.

https://www.postgres-xl.org/documentation/warm-standby-failover.html

https://www.postgres-xl.org/faq/

https://github.com/ClusterLabs/PAF

https://github.com/bitnine-oss/postgres-xl-ha

Download

https://www.postgres-xl.org/download/

Setting up Postgres-XL cluster

Install Postgres-XL

On each hosts:

  • postgres-xl-gtm (192.168.0.140)

  • postgres-xl-cr1 (192.168.0.141)

  • postgres-xl-dn1 (192.168.0.142)

  • postgres-xl-dn2 (192.168.0.143)

Do the following commands:

# Install requirements
$ sudo apt-get upgrade
$ sudo apt-get install build-essential
$ sudo apt-get install libreadline-dev
$ apt-get install zlib1g-dev
$ apt-get install flex
# Download postgres-xl
$ wget https://www.postgres-xl.org/downloads/postgres-xl-9.5r1.6.tar.bz2
$ tar -xvjpf postgres-xl-9.5r1.6.tar.bz2
$ cd postgres-xl-9.5r1.6
# Install postgres-xl
$ ./configure
$ make
    All of Postgres-XL successfully made. Ready to install.
$ sudo make install
    Postgres-XL installation complete.
# Install pgxc_ctl
$ cd contrib
$ make
$ sudo make install
$ sudo adduser postgres
$ su postgres
$ vim /home/postgres/.bashrc
    export PATH=/usr/local/pgsql/bin:$PATH

$ mkdir ~/.ssh

To fix these probable errors:

bash: gtm_ctl: command not found
bash: pg_ctl: command not found

initdb: invalid locale settings; check LANG and LC_* environment variables

Add these lines to /etc/environment:

$ vim /etc/environment

    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/pgsql/bin:"
    export LANG=en_US.utf-8
    export LC_ALL=en_US.utf-8

On postgres-xl-gtm host:

$ su postgres
$ ssh-keygen -t rsa
    Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
$  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$ scp ~/.ssh/authorized_keys postgres@192.168.0.141:~/.ssh/
$ scp ~/.ssh/authorized_keys postgres@192.168.0.142:~/.ssh/
$ scp ~/.ssh/authorized_keys postgres@192.168.0.143:~/.ssh/

On every hosts:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys

On postgres-xl-gtm host check ssh connecting to other hosts:

$ ssh postgres@192.168.0.141
$ ssh postgres@192.168.0.142
$ ssh postgres@192.168.0.143

Configure Postgres-XL

Configure pgxc_ctl.conf on postgres-xl-gtm host:

$ export dataDirRoot=$HOME/DATA/pgxl/nodes
$ mkdir $HOME/pgxc_ctl
$ pgxc_ctl

    /bin/bash
    Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash.
    ERROR: File "/home/postgres/pgxc_ctl/pgxc_ctl.conf" not found or not a regular file. No such file or directory
    Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /home/postgres/pgxc_ctl/pgxc_ctl_bash --home /home/postgres/pgxc_ctl --configuration /home/postgres/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
       ******** PGXC_CTL START ***************

    Current directory: /home/postgres/pgxc_ctl

Create empty configuration file, on the PGXC console:

PGXC$  prepare config empty
PGXC$  exit
$ vim ~/pgxc_ctl/pgxc_ctl.conf

    pgxcOwner=postgres
    coordPgHbaEntries=(192.168.0.0/24)
    datanodePgHbaEntries=(192.168.0.0/24)

Configure gtm master node:

$ pgxc_ctl
PGXC$  add gtm master gtm 192.168.0.140 20001 $dataDirRoot/gtm
PGXC$  monitor all
"""
Running: gtm master
"""

Configure coordinator nodes:

PGXC$  add coordinator master cr1 192.168.0.141 30001 30011 $dataDirRoot/cr_master.1 none none
    """
    Success.
    Starting coordinator master cr1
    LOG:  redirecting log output to logging collector process
    HINT:  Future log output will appear in directory 'pg_log'.
    Done.
    """

PGXC$  monitor all
    """
    Running: gtm master
    Running: coordinator master cr1
    """

PGXC$  add coordinator master cr2 192.168.0.142 30002 30012 $dataDirRoot/cr_master.2 none none
    """
    Success.
    Starting coordinator master cr2
    LOG:  redirecting log output to logging collector process
    HINT:  Future log output will appear in directory 'pg_log'.
    Done.
    """

PGXC$  monitor all
    """
    Running: gtm master
    Running: coordinator master cr1
    Running: coordinator master cr2
    """

Configure data nodes:

PGXC$  add datanode master dn1 192.168.0.143 40001 40011 $dataDirRoot/dn_master.1 none none none
    """
    Success.
    Starting datanode master dn1.
    LOG:  redirecting log output to logging collector process
    HINT:  Future log output will appear in directory 'pg_log'.
    Done.
    """
PGXC$  monitor all
    """
    Running: gtm master
    Running: coordinator master cr1
    Running: coordinator master cr2
    Running: datanode master dn1
    """

PGXC$  add datanode master dn2 192.168.0.144 40002 40012 $dataDirRoot/dn_master.2 none none none
    """
    Success
    Starting datanode master dn2.
    LOG:  redirecting log output to logging collector process
    HINT:  Future log output will appear in directory 'pg_log'.
    Done.
    """
PGXC$  monitor all
    """
    Running: gtm master
    Running: coordinator master cr1
    Running: coordinator master cr2
    Running: datanode master dn1
    Running: datanode master dn2
    """

# To stop
PGXC stop gtm master
PGXC stop  coordinator master cr1
PGXC stop coordinator master cr2
PGXC stop datanode master dn1
PGXC stop datanode master dn2

# To start
PGXC start gtm master
PGXC start  coordinator master cr1
PGXC start coordinator master cr2
PGXC start datanode master dn1
PGXC start datanode master dn2

https://stackoverflow.com/questions/29225743/installing-postgres-xl-in-linux-in-distributed-environment

https://ruihaijiang.wordpress.com/2015/09/17/postgres-xl-installation-example-on-linux/

Docker

https://github.com/tiredpixel/postgres-xl-docker

Ansible

https://gitlab.com/ansible-postgres-xl/postgres-xl-cluster/tree/master

Django

https://github.com/omidraha/django-postgres-xl-example