Postgres-XL ----------- Nodes Concept ------------- Postgres-XL is composed of three major components called the GTM, Coordinator and Datanode. The GTM is responsible to provide ACID property of transactions. The Datanode stores table data and handle SQL statements locally. The Coordinator handles each SQL statements from applications, determines which Datanode to go, and sends plans on to the appropriate Datanodes. The Coordinators and Datanodes of Postgres-XL are essentially PostgreSQL database servers You usually should run GTM on a separate server because GTM has to take care of transaction requirements from all the Coordinators and Datanodes. To group multiple requests and responses from Coordinator and Datanode processes running on the same server, you can configure GTM-Proxy. GTM-Proxy reduces the number of interactions and the amount of data to GTM. GTM-Proxy also helps handle GTM failures. It is often good practice to run both Coordinator and Datanode on the same server because we don't have to worry about workload balance between the two, and you can often get at data from replicated tables locally without sending an additional request out on the network. You can have any number of servers where these two components are running. Because both Coordinator and Datanode are essentially PostgreSQL instances, you should configure them to avoid resource conflict. It is very important to assign them different working directories and port numbers. https://www.postgres-xl.org/documentation/tutorial-arch.html Table distributing concept -------------------------- .. code-block:: bash CREATE TABLE DISTRIBUTE BY ... REPLICATION Each row of the table will be replicated to all the Datanode of the Postgres-XL database cluster. ROUNDROBIN Each row of the table will be placed in one of the Datanodes in a round-robin manner. The value of the row will not be needed to determine what Datanode to go. HASH ( column_name ) Each row of the table will be placed based on the hash value of the specified column. Following type is allowed as distribution column: INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR, CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR, NUMERIC, MONEY, ABSTIME, RELTIME, DATE, TIME,TIMESTAMP, TIMESTAMPTZ, INTERVAL, and TIMETZ. Please note that floating point is not allowed as a basis of the distribution column. MODULO ( column_name ) Each row of the table will be placed based on the modulo of the specified column. Following type is allowed as distribution column: INT8, INT2, INT4, BOOL, ABSTIME, RELTIME, DATE. Please note that floating point is not allowed as a basis of the distribution column. If DISTRIBUTE BY is not specified, columns with UNIQUE constraint will be chosen as the distribution key. If no such column is specified, distribution column is the first eligible column in the definition. If no such column is found, then the table will be distributed by ROUNDROBIN. You could Alter a replicated table to make it a distributed table. https://www.postgres-xl.org/documentation/sql-createtable.html https://www.postgres-xl.org/documentation/tutorial-createcluster.html Shard limitation ----------------- * (...) in distributed tables, UNIQUE constraints must include the distribution column of the table * (...) the distribution column must be included in PRIMARY KEY * (...) column with REFERENCES (FK) should be the distribution column. * (...) PRIMARY KEY must be the distribution column as well. In Postgres-XL, in distributed tables, UNIQUE constraints must include the distribution column of the table. This is because Postgres-XL currently only allows that it can push down to the Datanodes to be enforced locally. If we include the distribution column in unique constraints, it stands to reason that it can be enforced locally. If a table is distributed by ROUNDROBIN, we cannot enforce UNIQUE constraints because it does not have a distribution column; it is possible that the same value for a column exists on multiple nodes. There's no restriction in UNIQUE constraint in replicated tables. When an expression is used on a UNIQUE constraint, this expression must contain the distribution column of its parent table. It cannot use other columns as well. As mentioned when discussing UNIQUE constraint, the distribution column must be included in PRIMARY KEY. Other restrictions apply to the PRIMARY KEY as well. When an expression is used on a PRIMARY KEY constraint, this expression must contain the distribution column of its parent table. It cannot use other columns as well. Please note that column with REFERENCES should be the distribution column. This limitation is introduced because constraints are enforced only locally in each Datanode. In Postgres-XL, you cannot specify both PRIMARY KEY and REFERENCES key for different columns. In Postgres-XL, you cannot omit the column name in REFERENCES clause. In Postgres-XL, you cannot specify more than one foreign key constraints. Postgres-XL does not support exclusion constraints. Postgres-XL does not allow to modify the value of distribution column. https://www.postgres-xl.org/documentation/ddl-constraints.html https://www.postgres-xl.org/documentation/dml-update.html https://stackoverflow.com/questions/28547437/migrating-from-postgresql-to-postgres-xl-distributed-tables-design https://www.postgres-xl.org/documentation/upgrading.html High Availability ----------------- You can add slaves for each node analogous to PostgreSQL’s streaming replication. In addition, the cluster can be configured such that the Global Transaction Manager (GTM) can have a GTM Standby. In terms of automatic failover, it is currently not part of the core project, but Corosync/Pacemaker has been used for this purpose. https://www.postgres-xl.org/documentation/warm-standby-failover.html https://www.postgres-xl.org/faq/ https://github.com/ClusterLabs/PAF https://github.com/bitnine-oss/postgres-xl-ha Download -------- https://www.postgres-xl.org/download/ Setting up Postgres-XL cluster ------------------------------ Install Postgres-XL +++++++++++++++++++ On each hosts: * postgres-xl-gtm (192.168.0.140) * postgres-xl-cr1 (192.168.0.141) * postgres-xl-dn1 (192.168.0.142) * postgres-xl-dn2 (192.168.0.143) Do the following commands: .. code-block:: bash # Install requirements $ sudo apt-get upgrade $ sudo apt-get install build-essential $ sudo apt-get install libreadline-dev $ apt-get install zlib1g-dev $ apt-get install flex # Download postgres-xl $ wget https://www.postgres-xl.org/downloads/postgres-xl-9.5r1.6.tar.bz2 $ tar -xvjpf postgres-xl-9.5r1.6.tar.bz2 $ cd postgres-xl-9.5r1.6 # Install postgres-xl $ ./configure $ make All of Postgres-XL successfully made. Ready to install. $ sudo make install Postgres-XL installation complete. # Install pgxc_ctl $ cd contrib $ make $ sudo make install .. code-block:: bash $ sudo adduser postgres $ su postgres $ vim /home/postgres/.bashrc export PATH=/usr/local/pgsql/bin:$PATH $ mkdir ~/.ssh To fix these probable errors: .. code-block:: bash bash: gtm_ctl: command not found bash: pg_ctl: command not found initdb: invalid locale settings; check LANG and LC_* environment variables Add these lines to ``/etc/environment``: .. code-block:: bash $ vim /etc/environment PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/pgsql/bin:" export LANG=en_US.utf-8 export LC_ALL=en_US.utf-8 On ``postgres-xl-gtm`` host: .. code-block:: bash $ su postgres $ ssh-keygen -t rsa Enter file in which to save the key (/home/postgres/.ssh/id_rsa): $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys $ scp ~/.ssh/authorized_keys postgres@192.168.0.141:~/.ssh/ $ scp ~/.ssh/authorized_keys postgres@192.168.0.142:~/.ssh/ $ scp ~/.ssh/authorized_keys postgres@192.168.0.143:~/.ssh/ On every hosts: .. code-block:: bash $ chmod 700 ~/.ssh $ chmod 600 ~/.ssh/authorized_keys On ``postgres-xl-gtm`` host check ssh connecting to other hosts: .. code-block:: bash $ ssh postgres@192.168.0.141 $ ssh postgres@192.168.0.142 $ ssh postgres@192.168.0.143 Configure Postgres-XL +++++++++++++++++++++ Configure ``pgxc_ctl.conf`` on ``postgres-xl-gtm`` host: .. code-block:: bash $ export dataDirRoot=$HOME/DATA/pgxl/nodes $ mkdir $HOME/pgxc_ctl $ pgxc_ctl /bin/bash Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash. ERROR: File "/home/postgres/pgxc_ctl/pgxc_ctl.conf" not found or not a regular file. No such file or directory Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash. Reading configuration using /home/postgres/pgxc_ctl/pgxc_ctl_bash --home /home/postgres/pgxc_ctl --configuration /home/postgres/pgxc_ctl/pgxc_ctl.conf Finished reading configuration. ******** PGXC_CTL START *************** Current directory: /home/postgres/pgxc_ctl Create empty configuration file, on the ``PGXC`` console: .. code-block:: bash PGXC$ prepare config empty PGXC$ exit .. code-block:: bash $ vim ~/pgxc_ctl/pgxc_ctl.conf pgxcOwner=postgres coordPgHbaEntries=(192.168.0.0/24) datanodePgHbaEntries=(192.168.0.0/24) Configure gtm master node: .. code-block:: bash $ pgxc_ctl PGXC$ add gtm master gtm 192.168.0.140 20001 $dataDirRoot/gtm PGXC$ monitor all """ Running: gtm master """ Configure coordinator nodes: .. code-block:: bash PGXC$ add coordinator master cr1 192.168.0.141 30001 30011 $dataDirRoot/cr_master.1 none none """ Success. Starting coordinator master cr1 LOG: redirecting log output to logging collector process HINT: Future log output will appear in directory 'pg_log'. Done. """ PGXC$ monitor all """ Running: gtm master Running: coordinator master cr1 """ PGXC$ add coordinator master cr2 192.168.0.142 30002 30012 $dataDirRoot/cr_master.2 none none """ Success. Starting coordinator master cr2 LOG: redirecting log output to logging collector process HINT: Future log output will appear in directory 'pg_log'. Done. """ PGXC$ monitor all """ Running: gtm master Running: coordinator master cr1 Running: coordinator master cr2 """ Configure data nodes: .. code-block:: bash PGXC$ add datanode master dn1 192.168.0.143 40001 40011 $dataDirRoot/dn_master.1 none none none """ Success. Starting datanode master dn1. LOG: redirecting log output to logging collector process HINT: Future log output will appear in directory 'pg_log'. Done. """ PGXC$ monitor all """ Running: gtm master Running: coordinator master cr1 Running: coordinator master cr2 Running: datanode master dn1 """ PGXC$ add datanode master dn2 192.168.0.144 40002 40012 $dataDirRoot/dn_master.2 none none none """ Success Starting datanode master dn2. LOG: redirecting log output to logging collector process HINT: Future log output will appear in directory 'pg_log'. Done. """ PGXC$ monitor all """ Running: gtm master Running: coordinator master cr1 Running: coordinator master cr2 Running: datanode master dn1 Running: datanode master dn2 """ # To stop PGXC stop gtm master PGXC stop coordinator master cr1 PGXC stop coordinator master cr2 PGXC stop datanode master dn1 PGXC stop datanode master dn2 # To start PGXC start gtm master PGXC start coordinator master cr1 PGXC start coordinator master cr2 PGXC start datanode master dn1 PGXC start datanode master dn2 https://stackoverflow.com/questions/29225743/installing-postgres-xl-in-linux-in-distributed-environment https://ruihaijiang.wordpress.com/2015/09/17/postgres-xl-installation-example-on-linux/ Docker ------ https://github.com/tiredpixel/postgres-xl-docker Ansible ------- https://gitlab.com/ansible-postgres-xl/postgres-xl-cluster/tree/master Django ------ https://github.com/omidraha/django-postgres-xl-example Links ----- https://www.postgres-xl.org/faq/ https://github.com/bitnine-oss/postgres-xl-ha https://github.com/systemapic/wu/wiki/Installing-Postgresql-XL https://www.postgres-xl.org/documentation/admin.html https://stackoverflow.com/questions/42431018/can-postgres-xl-shard-replicate-and-auto-balance-at-the-same-time