PG High Availability: Building and Using a Citus Distributed Cluster


Introduction to Citus Clusters

Citus is an open-source extension to Postgres that turns Postgres into a distributed database, distributing data and queries across the nodes of a cluster. It provides features such as sharding, a distributed SQL engine, reference (replicated) tables, and distributed tables.

Because Citus is an extension of Postgres (rather than a separate code fork), using Citus means you are still using Postgres and can take advantage of the latest Postgres features, tools, and ecosystem.


Citus is an open-source distributed database built on PostgreSQL, so it automatically inherits PostgreSQL's powerful SQL support and application ecosystem (not only client-protocol compatibility, but full compatibility with server-side extensions and management tools). Citus is a PostgreSQL extension (not a fork) with a shared-nothing architecture: nodes share no data, and a coordinator node plus worker nodes together form one database cluster. It focuses on high-performance distributed HTAP workloads.

Compared with single-node PostgreSQL, Citus can use more CPU cores and more memory, and can store more data. The database can be scaled out simply by adding nodes to the cluster. Compared with other PostgreSQL-based distributed solutions such as Greenplum or PostgreSQL-XL, the biggest difference is that Citus is a PostgreSQL extension rather than an independent code fork. Citus can therefore follow PostgreSQL's version evolution quickly and at low cost, while preserving the stability and compatibility of the database as much as possible.

Citus supports the features of new PostgreSQL versions and stays compatible with existing tools. It uses sharding and replication to scale PostgreSQL horizontally across multiple machines. Its query engine parallelizes SQL across those servers, enabling real-time (sub-second) responses on large datasets.

A Citus cluster consists of one central coordinator node (CN) and a number of worker nodes.

Coordinator: the coordinator node, usually called the CN. It stores all the metadata but no actual data. This node is exposed directly to users and is what clients connect to; from the workers' point of view it effectively acts as a client.

Worker: a worker node. It stores no metadata, only the actual data, and executes the query requests sent by the coordinator. Workers are generally not exposed directly to users.

Environment

# Create the Docker network
docker network create --subnet=172.72.6.0/24 pg-network

# PG CN (coordinator)
docker rm -f lhrpgcituscn80
docker run -d --name lhrpgcituscn80 -h lhrpgcituscn80 \
  --net=pg-network --ip 172.72.6.80 \
  -p 64380:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

# PG DN (workers)
docker rm -f lhrpgcitusdn81
docker run -d --name lhrpgcitusdn81 -h lhrpgcitusdn81 \
  --net=pg-network --ip 172.72.6.81 \
  -p 64381:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

docker rm -f lhrpgcitusdn82
docker run -d --name lhrpgcitusdn82 -h lhrpgcitusdn82 \
  --net=pg-network --ip 172.72.6.82 \
  -p 64382:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

docker rm -f lhrpgcitusdn83
docker run -d --name lhrpgcitusdn83 -h lhrpgcitusdn83 \
  --net=pg-network --ip 172.72.6.83 \
  -p 64383:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

docker rm -f lhrpgcitusdn84
docker run -d --name lhrpgcitusdn84 -h lhrpgcitusdn84 \
  --net=pg-network --ip 172.72.6.84 \
  -p 64384:5432 \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true lhrbest/lhrpgall:2.0 \
  /usr/sbin/init

[root@docker35 ~]# docker ps
CONTAINER ID   IMAGE                  COMMAND            CREATED          STATUS          PORTS                                         NAMES
0183e7a9704a   lhrbest/lhrpgall:2.0   "/usr/sbin/init"   6 seconds ago    Up 3 seconds    0.0.0.0:64384->5432/tcp, :::64384->5432/tcp   lhrpgcitusdn84
877d897a5a76   lhrbest/lhrpgall:2.0   "/usr/sbin/init"   8 seconds ago    Up 6 seconds    0.0.0.0:64383->5432/tcp, :::64383->5432/tcp   lhrpgcitusdn83
98dafcefc505   lhrbest/lhrpgall:2.0   "/usr/sbin/init"   10 seconds ago   Up 7 seconds    0.0.0.0:64382->5432/tcp, :::64382->5432/tcp   lhrpgcitusdn82
04510e0bfa96   lhrbest/lhrpgall:2.0   "/usr/sbin/init"   11 seconds ago   Up 10 seconds   0.0.0.0:64381->5432/tcp, :::64381->5432/tcp   lhrpgcitusdn81
8cf991b0633f   lhrbest/lhrpgall:2.0   "/usr/sbin/init"   13 seconds ago   Up 11 seconds   0.0.0.0:64380->5432/tcp, :::64380->5432/tcp   lhrpgcituscn80
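Before continuing, it is worth confirming that the containers can reach each other over the Docker network and that PostgreSQL is running inside each of them. A minimal sketch, assuming the lhrbest/lhrpgall image ships ping and a PostgreSQL 13 instance running under the postgres OS user:

# CN can reach a worker over the 172.72.6.0/24 network
docker exec -it lhrpgcituscn80 ping -c 2 172.72.6.81

# PostgreSQL is up inside the container
docker exec -it lhrpgcituscn80 su - postgres -c "psql -c 'select version();'"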

Access Control (pg_hba.conf) Changes

The pg_hba.conf configuration on the coordinator node:

cat >> /var/lib/pgsql/13/data/pg_hba.conf <<"EOF"
host all all 0.0.0.0/0 md5
EOF

The pg_hba.conf configuration on the worker nodes:

cat >> /var/lib/pgsql/13/data/pg_hba.conf <<"EOF"
host all all 172.72.6.0/24 trust
EOF
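The new pg_hba.conf rules only take effect after the configuration is reloaded. A minimal sketch, run as the postgres user on each node:

# Reload the configuration so the new pg_hba.conf rules take effect
pg_ctl reload -D /var/lib/pgsql/13/data

# Alternatively, from within psql:
#   select pg_reload_conf();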

Installing Citus

Install Citus on every node, both the CN and the DNs (workers).

The Citus source code can be downloaded from:

https://github.com/citusdata/citus/releases

https://pgxn.org/dist/citus/10.2.4/
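If you prefer building from the downloaded source instead of installing packages, Citus follows the usual PostgreSQL-extension build flow. A minimal sketch, assuming the PostgreSQL 13 server plus postgresql13-devel and a build toolchain are installed (the archive name below is only illustrative):

# Build Citus from source (sketch)
tar zxvf citus-10.2.4.tar.gz
cd citus-10.2.4
PG_CONFIG=/usr/pgsql-13/bin/pg_config ./configure
make
make install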

The latest version at the time of writing is 10.2.4. Installation via yum is as follows:

# Install directly via yum
yum list | grep citus
yum install -y citus_13

su - postgres
psql
create database lhrdb;
\c lhrdb
alter system set shared_preload_libraries='citus';
select * from pg_available_extensions where name='citus';
\q

# shared_preload_libraries requires a restart
pg_ctl restart

psql -d lhrdb
create extension citus;
\dx
\dx+ citus
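After the extension is created, a quick check in psql confirms that the Citus library is preloaded and shows the installed version:

-- Verify that the citus library is preloaded and check its version
show shared_preload_libraries;
select citus_version();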

Cluster Configuration

Adding Worker Nodes on the Coordinator

Administrative operations are performed only on the coordinator node (CN):

[postgres@lhrpgcituscn80 ~]$ psql -d lhrdb
psql (13.3)
Type "help" for help.

lhrdb=# -- Nodes can be added by IP address or DNS name
SELECT * from master_add_node('172.72.6.81', 5432);
SELECT * from master_add_node('172.72.6.82', 5432);
SELECT * from master_add_node('172.72.6.83', 5432);
SELECT * from master_add_node('172.72.6.84', 5432);

-- List the active worker nodes:
lhrdb=# SELECT * FROM master_get_active_worker_nodes();
  node_name  | node_port
-------------+-----------
 172.72.6.81 |      5432
 172.72.6.83 |      5432
 172.72.6.84 |      5432
 172.72.6.82 |      5432
(4 rows)

lhrdb=# select * from pg_dist_node;
 nodeid | groupid |  nodename   | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       1 | 172.72.6.81 |     5432 | default  | f           | t        | primary  | default     | f              | t
      2 |       2 | 172.72.6.82 |     5432 | default  | f           | t        | primary  | default     | f              | t
      3 |       3 | 172.72.6.83 |     5432 | default  | f           | t        | primary  | default     | f              | t
      4 |       4 | 172.72.6.84 |     5432 | default  | f           | t        | primary  | default     | f              | t
(4 rows)
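Workers can also be removed from the cluster metadata on the coordinator. A minimal sketch, assuming the node no longer holds any shard placements (Citus rejects the call otherwise):

-- Remove a worker from the cluster metadata (sketch)
SELECT master_remove_node('172.72.6.84', 5432);

-- Add it back
SELECT * from master_add_node('172.72.6.84', 5432);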

Creating a Distributed (Sharded) Table

lhrdb=# create table test(id int primary key, name varchar);

-- Configure the sharding policy
-- Set the shard count: with 4 hosts, use 4 shards, one per host
lhrdb=# set citus.shard_count=4;
-- Set the number of replicas per shard
lhrdb=# set citus.shard_replication_factor=2;

lhrdb=# SELECT create_distributed_table('test', 'id', 'hash');
lhrdb=# insert into test select id,md5(random()::text) from generate_series(1,500) as id;

-- Check the shard distribution
lhrdb=# select * from citus_tables;
 table_name | citus_table_type | distribution_column | colocation_id | table_size | shard_count | table_owner | access_method
------------+------------------+---------------------+---------------+------------+-------------+-------------+---------------
 test       | distributed      | id                  |             1 | 384 kB     |           4 | postgres    | heap
(1 row)

lhrdb=# select * from master_get_table_metadata('test');
 logical_relid | part_storage_type | part_method | part_key | part_replica_count | part_max_size | part_placement_policy
---------------+-------------------+-------------+----------+--------------------+---------------+-----------------------
         16995 | t                 | h           | id       |                  2 |    1073741824 |                     2
(1 row)

lhrdb=# select * from pg_dist_placement where shardid in (select shardid from pg_dist_shard where logicalrelid='test'::regclass);
 placementid | shardid | shardstate | shardlength | groupid
-------------+---------+------------+-------------+---------
           1 |  102008 |          1 |           0 |       1
           2 |  102008 |          1 |           0 |       2
           3 |  102009 |          1 |           0 |       2
           4 |  102009 |          1 |           0 |       3
           5 |  102010 |          1 |           0 |       3
           6 |  102010 |          1 |           0 |       4
           7 |  102011 |          1 |           0 |       4
           8 |  102011 |          1 |           0 |       1
(8 rows)

lhrdb=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
 shardid | shardstate | shardlength |  nodename   | nodeport | placementid
---------+------------+-------------+-------------+----------+-------------
  102008 |          1 |           0 | 172.72.6.81 |     5432 |           1
  102008 |          1 |           0 | 172.72.6.82 |     5432 |           2
  102009 |          1 |           0 | 172.72.6.82 |     5432 |           3
  102009 |          1 |           0 | 172.72.6.83 |     5432 |           4
  102010 |          1 |           0 | 172.72.6.83 |     5432 |           5
  102010 |          1 |           0 | 172.72.6.84 |     5432 |           6
  102011 |          1 |           0 | 172.72.6.84 |     5432 |           7
  102011 |          1 |           0 | 172.72.6.81 |     5432 |           8
(8 rows)

lhrdb=# select count(*) from test;
 count
-------
   500
(1 row)

-- View the shard tables on each node
[postgres@lhrpgcitusdn84 ~]$ psql -U postgres -h 172.72.6.80 -d lhrdb -c "\dt";
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | test | table | postgres
(1 row)

[postgres@lhrpgcitusdn84 ~]$ psql -U postgres -h 172.72.6.81 -d lhrdb -c "\dt";
            List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | test_102008 | table | postgres
 public | test_102011 | table | postgres
(2 rows)

[postgres@lhrpgcitusdn84 ~]$ psql -U postgres -h 172.72.6.82 -d lhrdb -c "\dt";
            List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | test_102008 | table | postgres
 public | test_102009 | table | postgres
(2 rows)

[postgres@lhrpgcitusdn84 ~]$ psql -U postgres -h 172.72.6.83 -d lhrdb -c "\dt";
            List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | test_102009 | table | postgres
 public | test_102010 | table | postgres
(2 rows)

[postgres@lhrpgcitusdn84 ~]$ psql -U postgres -h 172.72.6.84 -d lhrdb -c "\dt";
            List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | test_102010 | table | postgres
 public | test_102011 | table | postgres
(2 rows)

There are 4 workers, so the data is split into 4 shards, and each shard has two replicas (4 shards × 2 replicas = 8 placements, matching the 8 rows in pg_dist_shard_placement above).

From the shard placement, shard 102008 is placed on 172.72.6.81 and 172.72.6.82, and likewise 102009 is placed on 172.72.6.82 and 172.72.6.83.

Suppose the 172.72.6.81 machine goes down. Accesses to shard 102008 that would previously have gone to 6.81 are now automatically served by the copy of 102008 on 6.82. In other words, the cluster keeps working after a single data-node failure, and by configuring more replicas it can tolerate multiple node failures as well.
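To see exactly which shard a given row hashes to and where its replicas live, the shard can be looked up on the coordinator. A small sketch (the value 100 is just an example distribution-key value):

-- Find the shard that the row with id = 100 hashes to,
-- and list the nodes holding its replicas
SELECT get_shard_id_for_distribution_column('test', 100);

SELECT nodename, nodeport
FROM pg_dist_shard_placement
WHERE shardid = get_shard_id_for_distribution_column('test', 100);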

Processes on the CN node:

[root@lhrpgcituscn80 /]# ps -ef|grep post
postgres  1589     0  0 10:27 ?  00:00:00 /usr/pgsql-13/bin/postgres -D /var/lib/pgsql/13/data/
postgres  1590  1589  0 10:27 ?  00:00:00 postgres: logger
postgres  1592  1589  0 10:27 ?  00:00:00 postgres: checkpointer
postgres  1593  1589  0 10:27 ?  00:00:00 postgres: background writer
postgres  1594  1589  0 10:27 ?  00:00:00 postgres: walwriter
postgres  1595  1589  0 10:27 ?  00:00:00 postgres: autovacuum launcher
postgres  1596  1589  0 10:27 ?  00:00:00 postgres: stats collector
postgres  1597  1589  0 10:27 ?  00:00:00 postgres: logical replication launcher
postgres  1641  1589  0 10:28 ?  00:00:03 postgres: Citus Maintenance Daemon: 16430/10

Processes on a DN (worker) node:

[root@lhrpgcitusdn81 /]# ps -ef|grep post
postgres  8661     0  0 11:09 ?  00:00:00 /usr/pgsql-13/bin/postgres -D /var/lib/pgsql/13/data/
postgres  8662  8661  0 11:09 ?  00:00:00 postgres: logger
postgres  8665  8661  0 11:09 ?  00:00:00 postgres: checkpointer
postgres  8666  8661  0 11:09 ?  00:00:00 postgres: background writer
postgres  8667  8661  0 11:09 ?  00:00:00 postgres: walwriter
postgres  8668  8661  0 11:09 ?  00:00:00 postgres: autovacuum launcher
postgres  8669  8661  0 11:09 ?  00:00:00 postgres: stats collector
postgres  8670  8661  0 11:09 ?  00:00:00 postgres: logical replication launcher
postgres  8710  8661  0 11:10 ?  00:00:00 postgres: Citus Maintenance Daemon: 13255/10
postgres  8720  8661  0 11:10 ?  00:00:00 postgres: Citus Maintenance Daemon: 16430/10
postgres  9591  8661  0 11:25 ?  00:00:00 postgres: postgres lhrdb 172.72.6.80(58852) idle
postgres 13145  8661  0 12:27 ?  00:00:00 postgres: postgres lhrdb 172.72.6.80(58998) idle

All Citus parameters can be listed in psql by pressing the Tab key for auto-completion:

lhrdb=# set citus.
citus.all_modifications_commutative       citus.count_distinct_error_rate
citus.enable_binary_protocol              citus.enable_local_execution
citus.enable_repartition_joins            citus.explain_analyze_sort_method
citus.local_hostname                      citus.log_remote_commands
citus.max_cached_connection_lifetime      citus.max_intermediate_result_size
citus.multi_shard_modify_mode             citus.node_connection_timeout
citus.propagate_set_commands              citus.shard_count
citus.shard_placement_policy              citus.task_assignment_policy
citus.values_materialization_threshold    citus.writable_standby_coordinator
citus.coordinator_aggregation_strategy    citus.defer_drop_after_shard_move
citus.enable_deadlock_prevention          citus.enable_local_reference_table_foreign_keys
citus.explain_all_tasks                   citus.limit_clause_row_fetch_count
citus.local_table_join_policy             citus.max_adaptive_executor_pool_size
citus.max_cached_conns_per_worker         citus.multi_shard_commit_protocol
citus.multi_task_query_log_level          citus.partition_buffer_size
citus.remote_task_check_interval          citus.shard_max_size
citus.shard_replication_factor            citus.task_executor_type
citus.worker_min_messages

lhrdb=# set citus.shard_
citus.shard_count               citus.shard_max_size
citus.shard_placement_policy    citus.shard_replication_factor

lhrdb=# show citus.shard_count ;
 citus.shard_count
-------------------
 32
(1 row)
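The SET commands above change a parameter only for the current session. To change the cluster-wide default persistently, ALTER SYSTEM on the coordinator followed by a configuration reload works as for any PostgreSQL GUC. A small sketch (64 is only an example value):

alter system set citus.shard_count = 64;   -- 64 is only an example value
select pg_reload_conf();
-- reconnect, then verify:
-- show citus.shard_count;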

Viewing All Task Execution Plans

By default, EXPLAIN in Citus omits most of the per-node plans because they are largely identical across nodes and shows only one task. To see the complete plan for every task, set the following in the session:

lhrdb=# explain select count(*) from test;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=250.00..250.02 rows=1 width=8)
   ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=8)
         Task Count: 4
         Tasks Shown: One of 4
         ->  Task
               Node: host=172.72.6.81 port=5432 dbname=lhrdb
               ->  Aggregate  (cost=2.49..2.50 rows=1 width=8)
                     ->  Seq Scan on test_102008 test  (cost=0.00..2.19 rows=119 width=0)
(8 rows)

lhrdb=# SET citus.explain_all_tasks = 'TRUE';
SET
lhrdb=# explain select count(*) from test;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=250.00..250.02 rows=1 width=8)
   ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=8)
         Task Count: 4
         Tasks Shown: All
         ->  Task
               Node: host=172.72.6.81 port=5432 dbname=lhrdb
               ->  Aggregate  (cost=2.49..2.50 rows=1 width=8)
                     ->  Seq Scan on test_102008 test  (cost=0.00..2.19 rows=119 width=0)
         ->  Task
               Node: host=172.72.6.82 port=5432 dbname=lhrdb
               ->  Aggregate  (cost=3.73..3.73 rows=1 width=8)
                     ->  Seq Scan on test_102009 test  (cost=0.00..3.38 rows=138 width=0)
         ->  Task
               Node: host=172.72.6.83 port=5432 dbname=lhrdb
               ->  Aggregate  (cost=2.47..2.48 rows=1 width=8)
                     ->  Seq Scan on test_102010 test  (cost=0.00..2.18 rows=118 width=0)
         ->  Task
               Node: host=172.72.6.84 port=5432 dbname=lhrdb
               ->  Aggregate  (cost=3.56..3.57 rows=1 width=8)
                     ->  Seq Scan on test_102011 test  (cost=0.00..3.25 rows=125 width=0)
(20 rows)
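For comparison, a query that filters on the distribution column is routed to a single shard, so its plan contains only one task. A small sketch (exact costs and the chosen shard will differ):

-- A filter on the distribution column is routed to a single shard
lhrdb=# explain select * from test where id = 100;
-- Expect a "Custom Scan (Citus Adaptive)" node with "Task Count: 1",
-- executed against just one test_1020xx shard.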

