Redis in practice: MIGRATE fails with NOAUTH Authentication required.

When a Redis Cluster has authentication enabled, MIGRATE must be given the password, otherwise it fails with:

(error) ERR Target instance replied with error: NOAUTH Authentication required.

Early versions of MIGRATE had no password support; since Redis 4.0.7 an AUTH option can be added to MIGRATE for authentication:

127.0.0.1:6380> migrate 192.168.0.33 6380 "" 0 2000 auth mypassword keys user:{info}:age user:{info}:id
OK

Note: do not put auth mypassword at the end, after the key list. MIGRATE would then treat auth and mypassword as key names and fail with the error below, which says the keys in the request do not hash to the same slot:

(error) CROSSSLOT Keys in request don't hash to the same slot
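
The hash-tag rule behind both the KEYS option and the CROSSSLOT error can be checked offline. Below is a minimal sketch of the slot computation Redis Cluster uses (CRC16/XMODEM modulo 16384, hashing only the substring inside the first non-empty {...} when present); the function names are mine:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses: poly 0x1021, init 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Slot for a key: if the key has a non-empty {tag}, only the tag is hashed."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:          # the tag must be non-empty
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# user:{info}:age and user:{info}:id share the tag "info", so they hash
# to the same slot and may travel in a single MIGRATE ... KEYS call.
print(keyslot('user:{info}:age') == keyslot('user:{info}:id'))  # True
```

Stray tokens such as auth and mypassword parsed as key names share no hash tag, so their slots will generally differ, which is exactly why the server answers CROSSSLOT.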

Redis Cluster: Deployment, Management, and Testing

Background:

Redis 3.0 introduced Cluster, greatly improving Redis's ability to scale horizontally. Redis Cluster is the official clustering solution. Third-party cluster solutions such as Twemproxy and Codis predate it; unlike those, Redis Cluster does not connect to cluster nodes through a proxy, but forms the cluster from equal peers with no central node. Before Cluster, only Sentinel provided high availability for Redis.

Redis Cluster shares data across multiple nodes and keeps serving requests even when some nodes fail or become unreachable. If every master has a replica, then when a master goes down or loses contact with the majority of the cluster, its replica is promoted to master and takes over, keeping the cluster running. Sharding in Redis Cluster is implemented with hash slots: every key belongs to one of the 16384 slots (0–16383), and each node serves a subset of the slots.

Environment:

Ubuntu 14.04
Redis 3.2.8
Masters: 192.168.100.134/135/136:17021
Replicas: 192.168.100.134/135/136:17022

Master/replica pairing:

  Master     Replica
134:17021  135:17022
135:17021  136:17022
136:17021  134:17022

Manual deployment:

① Installation

Install Redis as described in the Redis Sentinel high-availability deployment post; only the cluster-related configuration parameters need changing.

Once installed, start the instances; all of them run in cluster mode:

root@redis-cluster1:~# ps -ef | grep redis
redis      4292      1  0 00:33 ?        00:00:03 /usr/local/bin/redis-server 192.168.100.134:17021 [cluster]
redis      4327      1  0 01:58 ?        00:00:00 /usr/local/bin/redis-server 192.168.100.134:17022 [cluster]

② Configure the masters

Add nodes: cluster meet <ip> <port>

Connect to any of the 17021 instances (the -c flag enables cluster mode):
~# redis-cli -h 192.168.100.134 -p 17021 -c
192.168.100.134:17021> cluster meet 192.168.100.135 17021
OK
192.168.100.134:17021> cluster meet 192.168.100.136 17021
OK
Nodes added successfully.

Check the cluster state: cluster info

192.168.100.134:17021> cluster info
cluster_state:fail                        # cluster state
cluster_slots_assigned:0                  # slots assigned
cluster_slots_ok:0                        # slots in OK state
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3                     # currently 3 known nodes
cluster_size:0
cluster_current_epoch:2
cluster_my_epoch:1
cluster_stats_messages_sent:83
cluster_stats_messages_received:83

The cluster state is fail because no slots have been assigned yet; the cluster only becomes usable once all 16384 slots are assigned. So next, assign the slots: log in to each node and give it a range, e.g.:
node1: 0–5461
node2: 5462–10922
node3: 10923–16383
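
This even split is the usual contiguous partition of 16384 slots across N masters, with the early nodes absorbing the remainder. A small sketch (the helper name is mine):

```python
def partition_slots(n_nodes: int, total: int = 16384):
    """Split slots 0..total-1 into n_nodes contiguous (first, last) ranges."""
    base, extra = divmod(total, n_nodes)
    ranges, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # early nodes absorb the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(partition_slots(3))  # [(0, 5461), (5462, 10922), (10923, 16383)]
```

Note that CLUSTER ADDSLOTS accepts many slots in one call, so each per-slot loop below can likely be collapsed into a single invocation such as redis-cli -h 192.168.100.134 -p 17021 -a dxy CLUSTER ADDSLOTS $(seq 0 5461).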

Assign slots with cluster addslots <slot>. A slot can be assigned to only one node; all 16384 slots must be assigned, with no conflicts between nodes.

192.168.100.134:17021> cluster addslots 0
OK
192.168.100.135:17021> cluster addslots 0   # conflict
(error) ERR Slot 0 is already busy

This redis-cli has no range syntax for adding slots, so covering all 16384 takes a small per-node batch script (addslots.sh):

node1:
#!/bin/bash
n=0
for ((i=n;i<=5461;i++))
do
   /usr/local/bin/redis-cli -h 192.168.100.134 -p 17021 -a dxy CLUSTER ADDSLOTS $i
done

node2:
#!/bin/bash
n=5462
for ((i=n;i<=10922;i++))
do
   /usr/local/bin/redis-cli -h 192.168.100.135 -p 17021 -a dxy CLUSTER ADDSLOTS $i
done

node3:
#!/bin/bash
n=10923
for ((i=n;i<=16383;i++))
do
   /usr/local/bin/redis-cli -h 192.168.100.136 -p 17021 -a dxy CLUSTER ADDSLOTS $i
done

Run bash addslots.sh against each of the three nodes. With all slots assigned, check the cluster state again:

192.168.100.134:17021> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:2
cluster_my_epoch:1
cluster_stats_messages_sent:4193
cluster_stats_messages_received:4193

The cluster is now healthy. Now remove one slot and see what happens: cluster delslots <slot>

192.168.100.134:17021> cluster delslots 0
OK
192.168.100.134:17021> cluster info
cluster_state:fail
cluster_slots_assigned:16383
cluster_slots_ok:16383
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:2
cluster_my_epoch:1
cluster_stats_messages_sent:4482
cluster_stats_messages_received:4482

So the cluster is unhealthy whenever the 16384 slots are not fully assigned. A simple Redis Cluster is now up, but every node is a single point: one unavailable node makes the whole cluster unavailable. To keep each node highly available, give every master a replica.

Adding replicas (cluster replication): replication works the same way as in standalone Redis; the difference is that a cluster replica must itself run in cluster mode and must be added to the cluster before replicating.

① Add the replica nodes to the cluster

192.168.100.134:17021> cluster meet 192.168.100.134 17022
OK
192.168.100.134:17021> cluster meet 192.168.100.135 17022
OK
192.168.100.134:17021> cluster meet 192.168.100.136 17022
OK
192.168.100.134:17021> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6     # all nodes in the cluster, masters and replicas
cluster_size:3            # nodes with slots assigned, i.e. masters
cluster_current_epoch:5
cluster_my_epoch:1
cluster_stats_messages_sent:13438
cluster_stats_messages_received:13438

② Make each one a replica: cluster replicate <node_id>. Get the node IDs from cluster nodes, and run the command on the instance (17022) that is to become the replica.

192.168.100.134:17022> cluster nodes   # list node info
7438368ca8f8a27fdf2da52940bb50098a78c6fc 192.168.100.136:17022 master - 0 1488255023528 5 connected
e1b78bb74970d0353832b2913e9b35eba74a2a1a 192.168.100.134:17022 myself,master - 0 0 0 connected
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488255022526 2 connected 10923-16383
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488255026533 3 connected 5462-10922
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 master - 0 1488255025531 1 connected 0-5461
2b8b518324de0990ca587b47f6316e5f07b1df59 192.168.100.135:17022 master - 0 1488255024530 4 connected

# become the replica of 135:17021
192.168.100.134:17022> cluster replicate b461a30fde28409c38ee6c32db1cd267a6cfd125
OK

Do the same for the other two nodes:

# become the replica of 136:17021
192.168.100.135:17022> cluster replicate 05e72d06edec6a920dd91b050c7a315937fddb66
OK
# become the replica of 134:17021
192.168.100.136:17022> cluster replicate 11f9169577352c33d85ad0d1ca5f5bf0deba3209
OK

Check node status: cluster nodes

2b8b518324de0990ca587b47f6316e5f07b1df59 192.168.100.135:17022 slave 05e72d06edec6a920dd91b050c7a315937fddb66 0 1488255859347 4 connected
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 1 connected 0-5461
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488255860348 2 connected 10923-16383
e1b78bb74970d0353832b2913e9b35eba74a2a1a 192.168.100.134:17022 slave b461a30fde28409c38ee6c32db1cd267a6cfd125 0 1488255858344 3 connected
7438368ca8f8a27fdf2da52940bb50098a78c6fc 192.168.100.136:17022 slave 11f9169577352c33d85ad0d1ca5f5bf0deba3209 0 1488255856341 5 connected
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488255857343 3 connected 5462-10922

Each slave line includes its master's node_id, so the pairs are easy to read off; if anything in these steps goes wrong, check the logs under /var/log/redis/. That completes the sharded, highly available Redis Cluster deployment. Next up: the cluster management commands.

Management: CLUSTER commands

A few CLUSTER commands were already used above; here is a rundown of all of them.

CLUSTER info: print cluster information.
CLUSTER nodes: list all nodes currently known to the cluster and their details.
CLUSTER meet <ip> <port>: add the node at ip:port to the cluster.
CLUSTER addslots <slot> [slot ...]: assign one or more slots to the current node.
CLUSTER delslots <slot> [slot ...]: remove the assignment of one or more slots from the current node.
CLUSTER slots: list slot ranges and the nodes serving them.
CLUSTER slaves <node_id>: list the replicas of the given node.
CLUSTER replicate <node_id>: make the current node a replica of the given node.
CLUSTER saveconfig: save the cluster config file to disk manually; the cluster normally saves it automatically when the configuration changes.
CLUSTER keyslot <key>: show which slot a key hashes to.
CLUSTER flushslots: remove all slots assigned to the current node, leaving it with no slot assignments.
CLUSTER countkeysinslot <slot>: return the number of keys in the slot.
CLUSTER getkeysinslot <slot> <count>: return up to count keys from the slot.

CLUSTER setslot <slot> node <node_id>: assign the slot to the given node; if the slot is assigned to another node, that node drops it first.
CLUSTER setslot <slot> migrating <node_id>: mark this node's slot as migrating to the given node.
CLUSTER setslot <slot> importing <node_id>: import the slot from the node given by node_id into this node.
CLUSTER setslot <slot> stable: cancel an in-progress import or migration of the slot.

CLUSTER failover: trigger a manual failover.
CLUSTER forget <node_id>: remove the given node from the cluster; the removal ban lasts 60 s, after which the two nodes handshake again and rejoin.
CLUSTER reset [HARD|SOFT]: reset cluster state. SOFT clears knowledge of other nodes but keeps this node's ID; HARD also regenerates the ID. Without an argument, SOFT is used.

CLUSTER count-failure-reports <node_id>: return the number of failure reports for the given node.
CLUSTER SET-CONFIG-EPOCH: set the node's config epoch; only allowed before the node joins a cluster.


To better demonstrate the commands above, first load some data into the new cluster with a script.

Here are the management commands not covered yet:

① cluster slots: list slot ranges and the nodes serving them

192.168.100.134:17021> cluster slots
1) 1) (integer) 0
   2) (integer) 5461
   3) 1) "192.168.100.134"
      2) (integer) 17021
      3) "11f9169577352c33d85ad0d1ca5f5bf0deba3209"
   4) 1) "192.168.100.136"
      2) (integer) 17022
      3) "7438368ca8f8a27fdf2da52940bb50098a78c6fc"
2) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.100.136"
      2) (integer) 17021
      3) "05e72d06edec6a920dd91b050c7a315937fddb66"
   4) 1) "192.168.100.135"
      2) (integer) 17022
      3) "2b8b518324de0990ca587b47f6316e5f07b1df59"
3) 1) (integer) 5462
   2) (integer) 10922
   3) 1) "192.168.100.135"
      2) (integer) 17021
      3) "b461a30fde28409c38ee6c32db1cd267a6cfd125"
   4) 1) "192.168.100.134"
      2) (integer) 17022
      3) "e1b78bb74970d0353832b2913e9b35eba74a2a1a"

② cluster slaves: list the replicas of a given node

192.168.100.134:17021> cluster slaves 11f9169577352c33d85ad0d1ca5f5bf0deba3209
1) "7438368ca8f8a27fdf2da52940bb50098a78c6fc 192.168.100.136:17022 slave 11f9169577352c33d85ad0d1ca5f5bf0deba3209 0 1488274385311 5 connected"

③ cluster keyslot: show which slot a key hashes to

192.168.100.134:17021> cluster keyslot 9223372036854742675
(integer) 10310

④ cluster countkeysinslot: count the keys in a given slot

192.168.100.134:17021> cluster countkeysinslot 1
(integer) 19

⑤ cluster getkeysinslot: return up to the given number of keys from a slot

192.168.100.134:17021> cluster getkeysinslot 1 3
1) "9223372036854493093"
2) "9223372036854511387"
3) "9223372036854522344"

⑥ cluster setslot ...: manually migrate slot 0 from 192.168.100.134:17021 to 192.168.100.135:17021

1. First check each node's slot ranges
192.168.100.134:17021> cluster nodes
2b8b518324de0990ca587b47f6316e5f07b1df59 192.168.100.135:17022 slave 05e72d06edec6a920dd91b050c7a315937fddb66 0 1488295105089 4 connected
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 7 connected 0-5461
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488295107092 2 connected 10923-16383
e1b78bb74970d0353832b2913e9b35eba74a2a1a 192.168.100.134:17022 slave b461a30fde28409c38ee6c32db1cd267a6cfd125 0 1488295106090 6 connected
7438368ca8f8a27fdf2da52940bb50098a78c6fc 192.168.100.136:17022 slave 11f9169577352c33d85ad0d1ca5f5bf0deba3209 0 1488295104086 7 connected
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488295094073 6 connected 5462-10922

2. List the keys in the slot to be migrated
192.168.100.134:17021> cluster getkeysinslot 0 100
1) "9223372012094975807"
2) "9223372031034975807"

3. On the target node, start the import
192.168.100.135:17021> cluster setslot 0 importing 11f9169577352c33d85ad0d1ca5f5bf0deba3209
OK
192.168.100.135:17021> cluster nodes
...
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 myself,master - 0 0 6 connected 5462-10922 [0-<-11f9169577352c33d85ad0d1ca5f5bf0deba3209]
...

4. On the source node, mark the slot as migrating
192.168.100.134:17021> cluster setslot 0 migrating b461a30fde28409c38ee6c32db1cd267a6cfd125
OK
192.168.100.134:17021> cluster nodes
...
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 7 connected 0-5461 [0->-b461a30fde28409c38ee6c32db1cd267a6cfd125]
...

5. On the source node, migrate the slot's keys to the target: MIGRATE host port key destination-db timeout [COPY] [REPLACE]
192.168.100.134:17021> migrate 192.168.100.135 17021 9223372031034975807 0 5000 replace
OK
192.168.100.134:17021> migrate 192.168.100.135 17021 9223372012094975807 0 5000 replace
OK
192.168.100.134:17021> cluster getkeysinslot 0 100     # all keys must be moved before the next step
(empty list or set)

6. Finally, assign the slot to the target node; the command is broadcast to the other cluster nodes, recording that the slot now lives on the target
192.168.100.135:17021> cluster setslot 0 node b461a30fde28409c38ee6c32db1cd267a6cfd125
OK
192.168.100.134:17021> cluster setslot 0 node b461a30fde28409c38ee6c32db1cd267a6cfd125
OK

7. Verify the migration:
192.168.100.134:17021> cluster nodes
...
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 9 connected 1-5461 # changed
...
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488300965322 10 connected 0 5462-10922

Check the slot layout:
192.168.100.134:17021> cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.100.136"
      2) (integer) 17021
      3) "05e72d06edec6a920dd91b050c7a315937fddb66"
2) 1) (integer) 1
   2) (integer) 5461
   3) 1) "192.168.100.134"
      2) (integer) 17021
      3) "11f9169577352c33d85ad0d1ca5f5bf0deba3209"
3) 1) (integer) 0
   2) (integer) 0
   3) 1) "192.168.100.135"
      2) (integer) 17021
      3) "b461a30fde28409c38ee6c32db1cd267a6cfd125"
4) 1) (integer) 5462
   2) (integer) 10922
   3) 1) "192.168.100.135"
      2) (integer) 17021
      3) "b461a30fde28409c38ee6c32db1cd267a6cfd125"

Confirm the data moved:
192.168.100.134:17021> cluster getkeysinslot 0 100
(empty list or set)
192.168.100.135:17021> cluster getkeysinslot 0 100
1) "9223372012094975807"
2) "9223372031034975807"

For migrations involving many slots, each with many keys, the steps above can be scripted, or handled with the redis-trib tooling covered in the scripted-deployment section below.

In outline, migrating a slot works as follows:

1. On the target node, declare the slot as incoming: CLUSTER SETSLOT <slot> IMPORTING <source_node_id>
2. On the source node, declare the slot as outgoing: CLUSTER SETSLOT <slot> MIGRATING <target_node_id>
3. On the source node, fetch a batch of keys: CLUSTER GETKEYSINSLOT <slot> <count>
4. Migrate each fetched key to the target: MIGRATE <target_ip> <target_port> <key_name> 0 <timeout>
   Repeat steps 3 and 4 until no keys remain; under the hood MIGRATE transfers each key via RESTORE key ttl serialized-value REPLACE.
5. Send CLUSTER SETSLOT <slot> NODE <target_node_id> to both nodes; it is broadcast to the rest of the cluster and clears the importing/migrating state.
6. Wait for the cluster state to recover: cluster_state = ok in CLUSTER INFO.
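
The steps above are mechanical enough to script. The sketch below drives them against any two client objects exposing an execute_command method (redis-py's clients do, for instance); the helper name, the stub class, and the batch size are my own, and error handling is omitted:

```python
def migrate_slot(src, dst, slot, src_id, dst_id, dst_host, dst_port,
                 timeout_ms=5000, batch=100):
    """Move one slot from src to dst following the six-step procedure."""
    dst.execute_command('CLUSTER', 'SETSLOT', slot, 'IMPORTING', src_id)
    src.execute_command('CLUSTER', 'SETSLOT', slot, 'MIGRATING', dst_id)
    while True:
        keys = src.execute_command('CLUSTER', 'GETKEYSINSLOT', slot, batch)
        if not keys:
            break                      # slot drained, all keys moved
        for key in keys:
            src.execute_command('MIGRATE', dst_host, dst_port, key, 0, timeout_ms)
    # broadcast the new owner; this clears the importing/migrating state
    for node in (src, dst):
        node.execute_command('CLUSTER', 'SETSLOT', slot, 'NODE', dst_id)

class StubNode:
    """Records commands; pretends the slot's keys are served in one batch."""
    def __init__(self, keys=()):
        self.keys, self.log = list(keys), []
    def execute_command(self, *args):
        self.log.append(args)
        if args[:2] == ('CLUSTER', 'GETKEYSINSLOT'):
            batch, self.keys = self.keys, []
            return batch

src, dst = StubNode(['k1', 'k2']), StubNode()
migrate_slot(src, dst, 0, 'src-id', 'dst-id', '192.168.100.135', 17021)
print([c[0] for c in src.log])  # ['CLUSTER', 'CLUSTER', 'MIGRATE', 'MIGRATE', 'CLUSTER', 'CLUSTER']
```

Against real nodes with auth enabled on Redis 4.0.7+, each MIGRATE call would additionally carry the AUTH option with the password.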

Note: if the nodes have authentication enabled, MIGRATE fails with:

(error) ERR Target instance replied with error: NOAUTH Authentication required.

If the migration must go ahead, the approach taken here was to comment out masterauth and requirepass on every node, migrate, and then re-enable authentication.
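
Concretely, disabling auth for the migration means toggling two directives in each node's redis.conf (the value dxy follows the -a dxy used in the scripts above; adjust to your own):

```conf
# during the migration: comment both out (or CONFIG SET them to "")
# requirepass dxy
# masterauth dxy

# after the migration: restore them
requirepass dxy
masterauth dxy
```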

⑦ cluster forget: remove the given node from the cluster. The removal ban lasts 60 s; after that the two nodes handshake again and the node rejoins.

192.168.100.134:17021> cluster nodes
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488302330582 2 connected 10923-16383
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 9 connected 1-5461
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488302328576 10 connected 0 5462-10922
...

192.168.100.134:17021> cluster forget 05e72d06edec6a920dd91b050c7a315937fddb66
OK
192.168.100.134:17021> cluster nodes
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 9 connected 1-5461
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488302376718 10 connected 0 5462-10922
...

One minute later:
192.168.100.134:17021> cluster nodes
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488302490107 2 connected 10923-16383
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 9 connected 1-5461
b461a30fde28409c38ee6c32db1cd267a6cfd125 192.168.100.135:17021 master - 0 1488302492115 10 connected 0 5462-10922

⑧ cluster failover: trigger a manual failover (detailed in the next section). Note that it must be run on the slave that is to take over; on a non-slave node it errors:

(error) ERR You should send CLUSTER FAILOVER to a slave

⑨ cluster flushslots: must be run on a node holding no keys; it removes every slot assigned to the current node, leaving it with no slot assignments at all.

192.168.100.136:17022> cluster nodes
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 master - 0 1488255398859 2 connected 10923-16383
...

192.168.100.136:17021> cluster flushslots
OK

192.168.100.136:17021> cluster nodes
05e72d06edec6a920dd91b050c7a315937fddb66 192.168.100.136:17021 myself,master - 0 0 2 connected
...

⑩ cluster reset: must be run on a node holding no keys; resets the node's cluster state.

192.168.100.134:17021> cluster reset
OK
192.168.100.134:17021> cluster nodes
11f9169577352c33d85ad0d1ca5f5bf0deba3209 192.168.100.134:17021 myself,master - 0 0 9 connected

Scripted deployment (redis-trib.rb)

Redis Cluster ships with a set of Ruby management scripts in the source tree for creating clusters, migrating nodes, adding and removing slots, and so on. Let's redo the deployment with them.

① Create the Redis instances as required: six instances, 3 masters and 3 replicas.

② Install the required Ruby modules:

apt-get install ruby
gem install redis

③ The redis-trib.rb script (/usr/local/src/redis-3.2.8/src):

./redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>

# create a cluster
create          host1:port1 ... hostN:portN
                  --replicas <arg> # number of replicas per master
# check a cluster
check           host:port
# show cluster info
info            host:port
# fix a cluster
fix             host:port
                  --timeout <arg>
# reshard slots online
reshard         host:port       # required: any node, used as the entry point for reading cluster state
                  --from <arg>  # source node IDs to take slots from, comma-separated; --from all uses every node; prompted for if omitted
                  --to <arg>    # destination node ID (exactly one); prompted for if omitted
                  --slots <arg> # number of slots to move; prompted for if omitted
                  --yes         # accept the proposed reshard plan without prompting
                  --timeout <arg>  # timeout for the migrate command
                  --pipeline <arg> # keys fetched per cluster getkeysinslot call (default 10)
# balance slot counts across nodes
rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>
# add a new node to the cluster
add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
# remove a node from the cluster
del-node        host:port node_id
# set the heartbeat timeout between cluster nodes
set-timeout     host:port milliseconds
# run a command on every node in the cluster
call            host:port command arg arg .. arg
# import data from an external redis into the cluster
import          host:port
                  --from <arg>
                  --copy
                  --replace
# help
help            (show this help)

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.


1) Create a cluster (create): six nodes, one replica per master. One catch: you cannot choose which replica pairs with which master. If that matters, create the 3 masters first and then attach replicas to specific masters with add-node.

./redis-trib.rb create --replicas 1 192.168.100.134:17021 192.168.100.135:17021 192.168.100.136:17021 192.168.100.134:17022 192.168.100.135:17022 192.168.100.136:17022

2) Check a cluster (check ip:port): verify that all slots are assigned

./redis-trib.rb check 192.168.100.134:17021

3) Cluster info (info ip:port): slot, slave, and key-count distribution

./redis-trib.rb info 192.168.100.134:17021

4) Rebalance (rebalance ip:port): even out the slot counts across nodes

./redis-trib.rb rebalance 192.168.100.134:17021


5) Remove a node (del-node ip:port <node_id>): only nodes without slots can be removed; the instance is shut down once removed from the cluster

./redis-trib.rb del-node 192.168.100.135:17022 77d02fef656265c9c421fef425527c510e4cfcb8

6) Add a node (add-node): the new node can join as a master or as a slave of a given master.

Add a master: join 134:17022 to the cluster of 134:17021

./redis-trib.rb add-node 192.168.100.134:17022 192.168.100.134:17021

Add a slave: join 135:17022 to the cluster of 134:17021 as the replica of the given <node_id>

./redis-trib.rb add-node --slave --master-id 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.135:17022 192.168.100.134:17021

Final cluster state:

192.168.100.134:17021> cluster nodes
77d02fef656265c9c421fef425527c510e4cfcb8 192.168.100.135:17022 slave 7fa64d250b595d8ac21a42477af5ac8c07c35d83 0 1488346523944 5 connected
5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022 master - 0 1488346525949 4 connected
7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021 myself,master - 0 0 1 connected 0-5460
51bf103f7cf6b5ede6e009ce489fdeec14961be8 192.168.100.135:17021 master - 0 1488346522942 2 connected 5461-10922
0191a8b52646fb5c45323ab0c1a1a79dc8f3aea2 192.168.100.136:17021 master - 0 1488346524948 3 connected 10923-16383

7) Online slot migration (reshard): move slots between nodes while online, giving the cluster horizontal scale-out and scale-in.

Interactive mode: reshard the 134:17021 cluster

./redis-trib.rb reshard 192.168.100.134:17021
>>> Performing Cluster Check (using node 192.168.100.134:17021)
M: 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 77d02fef656265c9c421fef425527c510e4cfcb8 192.168.100.135:17022
   slots: (0 slots) slave
   replicates 7fa64d250b595d8ac21a42477af5ac8c07c35d83
M: 5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022
   slots: (0 slots) master
   0 additional replica(s)
M: 51bf103f7cf6b5ede6e009ce489fdeec14961be8 192.168.100.135:17021
   slots:5461-10922 (5462 slots) master
   0 additional replica(s)
M: 0191a8b52646fb5c45323ab0c1a1a79dc8f3aea2 192.168.100.136:17021
   slots:10923-16383 (5461 slots) master
   0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# how many slots to move?
How many slots do you want to move (from 1 to 16384)? 1 
# which node ID receives them?
What is the receiving node ID? 5476787f31fa375fda6bb32676a969c8b8adfbc2
# which node IDs are the sources?
Please enter all the source node IDs.
# type 'all' to use every node in the cluster as a source
  Type 'all' to use all the nodes as source nodes for the hash slots.
# or enter source nodes one by one, then 'done' to start the migration
  Type 'done' once you entered all the source nodes IDs.
Source node #1:7fa64d250b595d8ac21a42477af5ac8c07c35d83
Source node #2:done

Ready to move 1 slots.
  Source nodes:
    M: 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
  Destination node:
    M: 5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022
   slots: (0 slots) master
   0 additional replica(s)
  Resharding plan:
    Moving slot 0 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
# proceed with the proposed plan?
Do you want to proceed with the proposed reshard plan (yes/no)? yes 
Moving slot 0 from 192.168.100.134:17021 to 192.168.100.134:17022: ..........

Non-interactive mode: move 10 slots from the node given by --from to the node given by --to

./redis-trib.rb reshard --from 7fa64d250b595d8ac21a42477af5ac8c07c35d83 --to 5476787f31fa375fda6bb32676a969c8b8adfbc2 --slots 10 192.168.100.134:17021
>>> Performing Cluster Check (using node 192.168.100.134:17021)
M: 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021
   slots:2-5460 (5459 slots) master
   1 additional replica(s)
S: 77d02fef656265c9c421fef425527c510e4cfcb8 192.168.100.135:17022
   slots: (0 slots) slave
   replicates 7fa64d250b595d8ac21a42477af5ac8c07c35d83
M: 5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022
   slots:0-1 (2 slots) master
   0 additional replica(s)
M: 51bf103f7cf6b5ede6e009ce489fdeec14961be8 192.168.100.135:17021
   slots:5461-10922 (5462 slots) master
   0 additional replica(s)
M: 0191a8b52646fb5c45323ab0c1a1a79dc8f3aea2 192.168.100.136:17021
   slots:10923-16383 (5461 slots) master
   0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Ready to move 10 slots.
  Source nodes:
    M: 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021
   slots:2-5460 (5459 slots) master
   1 additional replica(s)
  Destination node:
    M: 5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022
   slots:0-1 (2 slots) master
   0 additional replica(s)
  Resharding plan:
    Moving slot 2 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 3 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 4 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 5 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 6 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 7 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 8 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 9 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 10 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
    Moving slot 11 from 7fa64d250b595d8ac21a42477af5ac8c07c35d83
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 2 from 192.168.100.134:17021 to 192.168.100.134:17022: ....................
Moving slot 3 from 192.168.100.134:17021 to 192.168.100.134:17022: ..........
Moving slot 4 from 192.168.100.134:17021 to 192.168.100.134:17022: ..................
Moving slot 5 from 192.168.100.134:17021 to 192.168.100.134:17022: ..
Moving slot 6 from 192.168.100.134:17021 to 192.168.100.134:17022: ..
Moving slot 7 from 192.168.100.134:17021 to 192.168.100.134:17022: ...............................
Moving slot 8 from 192.168.100.134:17021 to 192.168.100.134:17022: ..........
Moving slot 9 from 192.168.100.134:17021 to 192.168.100.134:17022: ..........................
Moving slot 10 from 192.168.100.134:17021 to 192.168.100.134:17022: ........................................
Moving slot 11 from 192.168.100.134:17021 to 192.168.100.134:17022: ..........

Slot distribution after the reshard:

192.168.100.135:17021> cluster nodes
5476787f31fa375fda6bb32676a969c8b8adfbc2 192.168.100.134:17022 master - 0 1488349695628 7 connected 0-11
7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.134:17021 master - 0 1488349698634 1 connected 12-5460
51bf103f7cf6b5ede6e009ce489fdeec14961be8 192.168.100.135:17021 myself,master - 0 0 2 connected 5461-10922
77d02fef656265c9c421fef425527c510e4cfcb8 192.168.100.135:17022 slave 7fa64d250b595d8ac21a42477af5ac8c07c35d83 0 1488349697631 1 connected
0191a8b52646fb5c45323ab0c1a1a79dc8f3aea2 192.168.100.136:17021 master - 0 1488349696631 3 connected 10923-16383

The new node's slots are not evenly distributed; the rebalance command described above can even them out.

One caveat: if the Redis servers have authentication configured, this script cannot run; it assumes password-free access between the servers. If you decide to proceed anyway, temporarily disable auth:

192.168.100.134:17022> config set masterauth ""
OK
192.168.100.134:17022> config set requirepass ""
OK
# normally there is no permission to persist this change
#192.168.100.134:17022> config rewrite

When everything is done, set the passwords back. That concludes scripted deployment. Both the manual and the scripted procedure show that servers cannot have passwords set during data migration, or authentication fails; keep that in mind when operating on clusters with auth enabled.

Failure detection and failover

The failover command appeared in the management section above; now use it to simulate failure detection and failover (stopping the Redis server outright works as a simulation too). cluster failover must run on a slave node. First look at the cluster's nodes and slaves:

192.168.100.134:17021> cluster nodes
93a030d6f1d1248c1182114c7044b204aa0ee022 192.168.100.136:17021 master - 0 1488378411940 4 connected 10923-16383
b836dc49206ac8895be7a0c4b8ba571dffa1e1c4 192.168.100.135:17022 slave 23c2bb6fc906b55fb59a051d1f9528f5b4bc40d4 0 1488378410938 1 connected
5980546e3b19ff5210057612656681b505723da4 192.168.100.134:17022 slave 93a030d6f1d1248c1182114c7044b204aa0ee022 0 1488378408935 4 connected
23c2bb6fc906b55fb59a051d1f9528f5b4bc40d4 192.168.100.134:17021 myself,master - 0 0 1 connected 0-5461
526d99b679229c8003b0504e27ae7aee4e9c9c3a 192.168.100.135:17021 master - 0 1488378412941 2 connected 5462-10922
39bf42b321a588dcd93efc4b4cc9cb3b496cacb6 192.168.100.136:17022 slave 526d99b679229c8003b0504e27ae7aee4e9c9c3a 0 1488378413942 5 connected
192.168.100.134:17021> cluster slaves 23c2bb6fc906b55fb59a051d1f9528f5b4bc40d4
1) "b836dc49206ac8895be7a0c4b8ba571dffa1e1c4 192.168.100.135:17022 slave 23c2bb6fc906b55fb59a051d1f9528f5b4bc40d4 0 1488378414945 1 connected"

Simulate a failure of 134:17021: run failover on its slave, 135:17022, and watch the logs to see how the failover proceeds.

192.168.100.135:17022> cluster failover
OK
192.168.100.135:17022> cluster nodes
39bf42b321a588dcd93efc4b4cc9cb3b496cacb6 192.168.100.136:17022 slave 526d99b679229c8003b0504e27ae7aee4e9c9c3a 0 1488378807681 5 connected
23c2bb6fc906b55fb59a051d1f9528f5b4bc40d4 192.168.100.134:17021 slave b836dc49206ac8895be7a0c4b8ba571dffa1e1c4 0 1488378804675 6 connected
526d99b679229c8003b0504e27ae7aee4e9c9c3a 192.168.100.135:17021 master - 0 1488378806679 2 connected 5462-10922
5980546e3b19ff5210057612656681b505723da4 192.168.100.134:17022 slave 93a030d6f1d1248c1182114c7044b204aa0ee022 0 1488378808682 4 connected
b836dc49206ac8895be7a0c4b8ba571dffa1e1c4 192.168.100.135:17022 myself,master - 0 0 6 connected 0-5461
93a030d6f1d1248c1182114c7044b204aa0ee022 192.168.100.136:17021 master - 0 1488378809684 4 connected 10923-16383

The slave has been promoted to master, and the old master, once restarted, came back as a slave; the logs on both nodes show the resynchronization. Feel free to simulate with a hard stop of the server as well.

That completes the cluster's deployment, management, and testing. Finally, a few data-generation test scripts:

① Writing to the cluster (cluster_write_test.py)


② Pipelined writes to the cluster (cluster_write_pipe_test.py)


③ Writing to a single instance (single_write_test.py)


④ Pipelined writes to a single instance (single_write_pipe_test.py)


Summary:

Redis Cluster is implemented without a central node and without a proxy: clients connect directly to each node of the cluster, compute a key's slot with the same hash algorithm, and run commands directly on the node serving that slot. From the CAP standpoint, Cluster chooses AP (availability and partition tolerance), turning Redis from a plain in-memory NoSQL store into a distributed NoSQL database.


[ERR] ERR Target instance replied with error: NOAUTH Authentication required.

Preface: most articles, and the bundled management tool, assume password-free scaling, but how could a production environment do without passwords? What follows is an exploration of scaling a cluster that has a password set.

Concepts

Decentralized and middleware-free: all nodes are equal peers; each stores its own data and the cluster state, and nodes stay actively interconnected.

Traditional sharding distributes data with consistent hashing; Redis Cluster distributes it with hash slots, using CRC16 as the hash function.

By default there are 16384 slots; a key's slot is computed as CRC16(key) % 16384.

At least 3 master nodes are required.
Slot moves go through the migrating and importing states.

Default slot allocation:


Deployment

Ruby:

Via yum:

/opt/rh/${RUBY_VERSION}/root/usr/local/share/gems/gems/${GEM_REDIS_VERSION}/lib/redis/client.rb
Change password => nil to password => "abc123" so the redis gem authenticates by default.

From source:

Create

redis-trib.rb create --replicas 1 172.20.133.39:7701 172.20.133.39:7702 172.20.133.39:7703 172.20.133.39:7704 172.20.133.39:7705 172.20.133.39:7706
[root@asiskskek ~]# ./redis-trib.rb create --password abc123 --replicas 1 172.20.133.39:7701 172.20.133.39:7702 172.20.133.39:7703 172.20.133.39:7704 172.20.133.39:7705 172.20.133.39:7706
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
172.20.133.39:7701
172.20.133.39:7702
172.20.133.39:7703
Adding replica 172.20.133.39:7704 to 172.20.133.39:7701
Adding replica 172.20.133.39:7705 to 172.20.133.39:7702
Adding replica 172.20.133.39:7706 to 172.20.133.39:7703
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join..
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@asiskskek ~]# 

Maintenance

1. Configuration tuning

timeout = 3000
cluster-node-timeout 3000

Slow log
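
Pulled together as a redis.conf fragment, with slow-log capture added (the slow-log thresholds here are illustrative, not recommendations from the original):

```conf
# close idle client connections after 3000 seconds (0 disables)
timeout 3000
# consider a node failing after 3000 ms without contact
cluster-node-timeout 3000
# slow log: record commands taking over 10 ms, keep the last 128 entries
slowlog-log-slower-than 10000
slowlog-max-len 128
```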

2. Analysis and statistics

Cluster management tools

  1. The bundled redis-trib.rb
  2. A fork with partial auth support:

Download: https://github.com/otherpirate/redis/blob/382867aaafeffcdd542688df49e8a1518e9738b5/src/redis-trib.rb

PR:
https://github.com/antirez/redis/pull/4288

redis-trib.rb command --password thepw xxxx

Analyzing dump files and memory usage:

https://github.com/sripathikrishnan/redis-rdb-tools

Performance analysis:

https://github.com/facebookarchive/redis-faina.git

redis-cli -c -h 172.20.133.39 -p 7701 -a abc123 MONITOR | head -10000 | ./redis-faina.py --redis-version=3.0 > faina-$(date +"%Y%m%d_%H%M%S").txt

Its timing analysis is limited: a request's duration is inferred from the gap between its start and the start of the next request, which is inaccurate, especially when traffic is light.

3. Commands to keep handy:

CLUSTER commands

CLUSTER INFO   # print cluster information
CLUSTER NODES   # list all nodes currently known to the cluster and their details

Nodes

CLUSTER MEET <ip> <port>   # add the node at ip:port to the cluster, making it a member
CLUSTER FORGET <node_id>   # remove the node given by node_id from the cluster
CLUSTER REPLICATE <node_id>   # make the current node a replica of the node given by node_id
CLUSTER SAVECONFIG   # save the node's cluster config file to disk
CLUSTER ADDSLOTS <slot> [slot ...]   # assign one or more slots to the current node
CLUSTER DELSLOTS <slot> [slot ...]   # remove the assignment of one or more slots from the current node
CLUSTER FLUSHSLOTS   # remove all slots assigned to the current node, leaving it slotless
CLUSTER SETSLOT <slot> NODE <node_id>   # assign slot to the node given by node_id
CLUSTER SETSLOT <slot> MIGRATING <node_id>   # migrate this node's slot to the node given by node_id
CLUSTER SETSLOT <slot> IMPORTING <node_id>   # import slot from the node given by node_id into this node
CLUSTER SETSLOT <slot> STABLE   # cancel the import or migration of slot

CLUSTER KEYSLOT <key>   # compute which slot key hashes to
CLUSTER COUNTKEYSINSLOT <slot>   # return the number of keys currently in slot
CLUSTER GETKEYSINSLOT <slot> <count>   # return count keys from slot

新增

CLUSTER SLAVES node-id   # 返回一个master节点的slaves 列表
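CLUSTER KEYSLOT is easy to reproduce offline: Redis hashes the key, or only the `{...}` hash-tag part when one is present (which is why `user:{info}:age` and `user:{info}:id` land in the same slot), with CRC16-CCITT (XModem) modulo 16384. A small Python sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """CLUSTER KEYSLOT in Python: hash only a non-empty {...} tag if present."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:          # only a non-empty tag replaces the key
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

For example `keyslot('b')` returns 3300, matching the `cluster keyslot b` output later in these notes.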
  • Check cluster node status
redis-cli -c -h 172.20.133.39 -p 7701 -a abc123 cluster nodes
  • Add a master node:
redis-trib.rb add-node 172.20.133.39:7706(new node) 172.20.133.39:7701(existing node)
  • Add a slave node
redis-trib.rb add-node --slave --master-id xxxxxx  172.20.133.37:7707(new slave) 172.20.133.37:7701(existing node)
  • Remove a node
redis-trib.rb del-node 172.20.133.37:7701(existing node)  172.20.133.37:7707(node to remove)
  • Reshard (answer the prompts in order):
redis-trib.rb reshard 172.20.133.39:7701
4096   (number of slots to move)
id     (receiving node ID)
all    (take slots from all masters)
  • Delete a slot
redis-cli -c -h 172.20.133.39 -p 7701 -a abc123 cluster delslots 5461

Online scaling

Core concept:

Scaling boils down to migrating slots, moving their key-value pairs along with them. Whether you use a tool or raw cluster commands, the core steps are:

> On the destination node, run cluster setslot <slot> IMPORTING <node ID>, naming the slot and the source node.
> On the source node, run cluster setslot <slot> MIGRATING <node ID>, naming the slot and the destination node.
> On the source node, run cluster getkeysinslot to fetch the slot's key list.
> On the source node, run migrate for each key; the command synchronously moves the key to the destination node.
> Repeat cluster getkeysinslot on the source node until the slot's key list is empty.
> On both the source and destination nodes, run cluster setslot <slot> NODE <node ID> to finish the migration.
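The six steps above can be sketched as one function. The client interface here (`cluster_setslot`, `cluster_getkeysinslot`, `migrate`) is hypothetical, chosen only to make the sequencing explicit; real scripts usually drive redis-cli instead.

```python
def migrate_slot(src, dst, slot, src_id, dst_id, batch=20):
    """Move one slot from src to dst, following the six-step sequence.

    src/dst are assumed client objects exposing cluster_setslot,
    cluster_getkeysinslot, and migrate (illustrative interface).
    """
    dst.cluster_setslot(slot, 'IMPORTING', src_id)     # step 1: on the target
    src.cluster_setslot(slot, 'MIGRATING', dst_id)     # step 2: on the source
    while True:
        keys = src.cluster_getkeysinslot(slot, batch)  # steps 3 and 5
        if not keys:
            break
        for key in keys:                               # step 4: move each key
            src.migrate(dst.host, dst.port, key)
    src.cluster_setslot(slot, 'NODE', dst_id)          # step 6: on both sides
    dst.cluster_setslot(slot, 'NODE', dst_id)
```

The order matters: IMPORTING must be set before MIGRATING-side key moves so that the target accepts the keys, and the final NODE assignments run only once getkeysinslot comes back empty.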

Prerequisites:

Whether migration works depends on the Redis version and whether auth is enabled:
1. Without auth, any approach works.
2. With auth, only versions after 4.0.7 support migrate ... AUTH.
3. The latest stable release, 4.0.9, ships a redis-trib.rb that does not support migrate AUTH, and no patched version seems to exist yet; a chance to contribute code to Redis 😄
4. With a password set, the simplest way to scale is to drop the password during the migration, provided the application still works without it. If you would rather keep the password, script the migration with cluster commands; see Appendix 4.

https://github.com/antirez/redis/commit/47717222b64b9437b12d76f39ac694f7454e3c7c

This commit added AUTH support to cluster MIGRATE, but redis-trib.rb was apparently not updated at the same time; support should follow before long.


1. Environment:

  • CentOS 6.8
  • Redis 4.0.9 -- { redis versions < 4.0.7 don't support the migrate command in a cluster with auth enabled }
  • Ruby 2.0.5
  • redis gem 3.3.3 -- { must be below 4.0, otherwise you get: [ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL ip:port | GETNAME | SETNAME connection-name); but v3.3.3 doesn't support a Redis cluster with a password }

2. Scaling procedure:

2.1. Add nodes
  • Add a master node:

Add 7707 to the cluster:

redis-trib.rb add-node 172.20.133.39:7707 172.20.133.39:7701
[root@asiskskek ~]# ./redis-trib.rb add-node --password abc123 172.20.133.39:7707 172.20.133.39:7701
>>> Adding node 172.20.133.39:7707 to cluster 172.20.133.39:7701
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.20.133.39:7707 to make it join the cluster.
[OK] New node added correctly.
[root@asiskskek ~]# 
  • Check cluster status
[root@asiskskek ~]# ./redis-trib.rb check --password abc123 172.20.133.39:7701    
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
M: c713e819cf41b1c79faab18c93398c510dfc314d 172.20.133.39:7707
   slots: (0 slots) master
   0 additional replica(s)
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@asiskskek ~]#
  • Add a slave node

Add 7708 to the cluster as a slave of 7707:

redis-trib.rb add-node --slave --master-id <7707's node ID>  172.20.133.39:7708 172.20.133.39:7701
[root@asiskskek ~]# ./redis-trib.rb add-node --password abc123 --slave --master-id c713e819cf41b1c79faab18c93398c510dfc314d  172.20.133.39:7708 172.20.133.39:7701
>>> Adding node 172.20.133.39:7708 to cluster 172.20.133.39:7701
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
M: c713e819cf41b1c79faab18c93398c510dfc314d 172.20.133.39:7707
   slots: (0 slots) master
   0 additional replica(s)
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.20.133.39:7708 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 172.20.133.39:7707.
[OK] New node added correctly.
[root@asiskskek ~]# 

2.2. Migrate slots

If auth is enabled, remove the passwords in bulk first; see Appendix 2.

redis-trib.rb reshard 172.20.133.39:7701
4096 (number of slots to assign to the new master)
masterid
all
[root@asiskskek ~]# ./redis-trib.rb reshard --password abc123 172.20.133.39:7701 
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
M: c713e819cf41b1c79faab18c93398c510dfc314d 172.20.133.39:7707
   slots: (0 slots) master
   1 additional replica(s)
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: d0fb6ebab8ea6d795917f5b0a385ff2736e7c9b9 172.20.133.39:7708
   slots: (0 slots) slave
   replicates c713e819cf41b1c79faab18c93398c510dfc314d
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096  <=====
What is the receiving node ID? c713e819cf41b1c79faab18c93398c510dfc314d  
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:all      <=====

Ready to move 4096 slots.
  Source nodes:
    M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
    M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
    M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
  Destination node:
    M: c713e819cf41b1c79faab18c93398c510dfc314d 172.20.133.39:7707
   slots: (0 slots) master
   1 additional replica(s)
  Resharding plan:
    Moving slot 5461 from d2f155f3ea1506c9ac26c39b78925cd31278da67
    Moving slot 5462 from d2f155f3ea1506c9ac26c39b78925cd31278da67
    ...
    Moving slot 6825 from d2f155f3ea1506c9ac26c39b78925cd31278da67
    Moving slot 6826 from d2f155f3ea1506c9ac26c39b78925cd31278da67
    Moving slot 0 from c76f037234f873f69b3dff981b82e37b7e98b7b2
    Moving slot 1 from c76f037234f873f69b3dff981b82e37b7e98b7b2
    ...
    Moving slot 1363 from c76f037234f873f69b3dff981b82e37b7e98b7b2
    Moving slot 1364 from c76f037234f873f69b3dff981b82e37b7e98b7b2
    Moving slot 10923 from 3af53debec41cf4ddace3f568538e0e5062d11a2
    Moving slot 10924 from 3af53debec41cf4ddace3f568538e0e5062d11a2
    ...
    Moving slot 12286 from 3af53debec41cf4ddace3f568538e0e5062d11a2
    Moving slot 12287 from 3af53debec41cf4ddace3f568538e0e5062d11a2
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 5461 from 172.20.133.39:7702 to 172.20.133.39:7707: 
Moving slot 5462 from 172.20.133.39:7702 to 172.20.133.39:7707: 
...
Moving slot 6825 from 172.20.133.39:7702 to 172.20.133.39:7707: 
Moving slot 6826 from 172.20.133.39:7702 to 172.20.133.39:7707: 
Moving slot 0 from 172.20.133.39:7701 to 172.20.133.39:7707: 
Moving slot 1 from 172.20.133.39:7701 to 172.20.133.39:7707: 
...
Moving slot 1363 from 172.20.133.39:7701 to 172.20.133.39:7707: 
Moving slot 1364 from 172.20.133.39:7701 to 172.20.133.39:7707: 
Moving slot 10923 from 172.20.133.39:7703 to 172.20.133.39:7707: 
Moving slot 10924 from 172.20.133.39:7703 to 172.20.133.39:7707: 
...
Moving slot 12286 from 172.20.133.39:7703 to 172.20.133.39:7707: 
Moving slot 12287 from 172.20.133.39:7703 to 172.20.133.39:7707: 
[root@asiskskek ~]# 
  • Check cluster status
[root@asiskskek ~]# ./redis-trib.rb check --password abc123 172.20.133.39:7701        
>>> Performing Cluster Check (using node 172.20.133.39:7701)
M: c76f037234f873f69b3dff981b82e37b7e98b7b2 172.20.133.39:7701
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: 3af53debec41cf4ddace3f568538e0e5062d11a2 172.20.133.39:7703
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
S: 87a3f5546ed9d8000e725edc7ebd879e219d351b 172.20.133.39:7705
   slots: (0 slots) slave
   replicates d2f155f3ea1506c9ac26c39b78925cd31278da67
M: c713e819cf41b1c79faab18c93398c510dfc314d 172.20.133.39:7707
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: 12659ccc59a3b4e28a9cbbc978f19a79fdee0fdd 172.20.133.39:7704
   slots: (0 slots) slave
   replicates c76f037234f873f69b3dff981b82e37b7e98b7b2
S: d0fb6ebab8ea6d795917f5b0a385ff2736e7c9b9 172.20.133.39:7708
   slots: (0 slots) slave
   replicates c713e819cf41b1c79faab18c93398c510dfc314d
S: c61a72b37f25542f5eaa257211a31b512d763a03 172.20.133.39:7706
   slots: (0 slots) slave
   replicates 3af53debec41cf4ddace3f568538e0e5062d11a2
M: d2f155f3ea1506c9ac26c39b78925cd31278da67 172.20.133.39:7702
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@asiskskek ~]# 

Troubleshooting:

1. gem install redis

ERROR: Loading command: install (LoadError)
cannot load such file -- zlib
ERROR: While executing gem ... (NoMethodError)
undefined method `invoke_with_build_args' for nil:NilClass

Fix:

The Ruby you compiled from source is missing the zlib extension; build it from the Ruby source tree:
cd ext/zlib (inside the Ruby source directory)
Edit the Makefile, changing $(top_srcdir)/include/ruby.h to: zlib.o: ../../include/ruby.h
ruby extconf.rb
make && make install
If openssl errors out, the fix is similar:
add top_srcdir = ../.. to its Makefile

2. redis-trib.rb xxx

[WARNING] Node 172.20.133.39:7701 has slots in migrating state (0).
[WARNING] Node 172.20.133.39:7709 has slots in importing state (0).
[WARNING] The following slots are open: 0

Fix:

redis-cli -h 172.20.133.39 -p 7701 -a abc123 cluster setslot 0 stable
redis-cli -h 172.20.133.39 -p 7709 -a abc123 cluster setslot 0 stable

[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL ip:port | GETNAME | SETNAME connection-name)

Fix:

The redis gem must be below version 4.0:

gem uninstall redis
gem install redis -v 3.3.3

3. redis-trib reshard ip:port

  • Authentication failure

Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 1365 from 172.20.133.39:7701 to 172.20.133.39:7705:
[ERR] ERR Target instance replied with error: NOAUTH Authentication required.

Fix:

For now the only option is to remove the password:

redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set masterauth ""
redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set requirepass ""
See Appendix 2.

4. redis-trib reshard --password thepw ip:port

  • Syntax error

Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 1365 from 172.20.133.39:7701 to 172.20.133.39:7705:
[ERR] Calling MIGRATE: ERR syntax error


Appendix:

Scripts:

1. Bulk-insert data into Redis:
    1. Plain shell-loop version:
#!/bin/bash
for ((i=0;i<1000000;i++))
do
echo -en "hello" | redis-cli -c -h 172.20.133.39 -p 7701 -a coocaa set name$i $i >> keyset.log
done
    2. Pipeline version (much faster):
#!/usr/bin/python
for i in range(1000000):
    print 'set name'+str(i),'helloworld'

python 1.py > redis_commands.txt

#!/bin/bash

while read CMD; do
  # each command begins with *{number arguments in command}\r\n
  XS=($CMD); printf "*${#XS[@]}\r\n"
  # for each argument, we append ${length}\r\n{argument}\r\n
  for X in $CMD; do printf "\$${#X}\r\n$X\r\n"; done
done < redis_commands.txt

bash 2.sh > redis_data.txt

cat redis_data.txt | redis-cli -c -h 172.20.133.39 -p 7701 -a abc123 --pipe

For a cluster, connect to each master and run this once.
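The bash generator above just produces the RESP wire format that --pipe feeds to the server: `*<argc>` followed by a `$<len>`-prefixed argument per token, each CRLF-terminated. The same encoding in a few lines of Python (illustrative; `resp_encode` is not a library function):

```python
def resp_encode(*args) -> bytes:
    """Encode one command in the RESP wire format expected by redis-cli --pipe:
    *<argc>\r\n, then $<byte length>\r\n<argument>\r\n for each argument."""
    out = [b'*%d\r\n' % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b'$%d\r\n%s\r\n' % (len(data), data))
    return b''.join(out)
```

Writing `resp_encode('SET', 'name0', 'helloworld')` for each line and piping the bytes to redis-cli --pipe replaces both the Python generator and the bash framing loop above.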

2. Bulk password operations
#!/bin/bash
for ((i=7701;i<7713;i++))
do
# enable auth
redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set masterauth abc123
redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set requirepass abc123
# disable auth
#redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set masterauth ""
#redis-cli -c -h 172.20.133.39 -p $i -a abc123 config set requirepass ""
done
3. Redis init script
#!/bin/sh

#
# Simple Redis init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.

REDISPORT=7701
EXEC=/usr/local/bin/redis-server
CLIEXEC=/usr/local/bin/redis-cli

PIDFILE=/tmp/redis_${REDISPORT}.pid
CONF="/etc/redis/${REDISPORT}.conf"

case "$1" in
    start)
        if [ -f $PIDFILE ]
        then
                echo "$PIDFILE exists, process is already running or crashed"
        else
                echo "Starting Redis server..."
                $EXEC $CONF
        fi
        ;;
    stop)
        if [ ! -f $PIDFILE ]
        then
                echo "$PIDFILE does not exist, process is not running"
        else
                PID=$(cat $PIDFILE)
                echo "Stopping ..."
                $CLIEXEC -p $REDISPORT -a abc123 -h 172.20.133.39 shutdown
                while [ -x /proc/${PID} ]
                do
                    echo "Waiting for Redis to shutdown ..."
                    sleep 1
                done
                echo "Redis stopped"
        fi
        ;;
    *)
        echo "Please use start or stop as first argument"
        ;;
esac
4. Redis slot-migration script
#!/bin/bash
source_ip=$1
source_port=$2
target_ip=$3
target_port=$4
startSlot=$5
endSlot=$6
password=$7

# On the target node, run the IMPORTING command: the target will import the slot from the source

for slot in `seq ${startSlot} ${endSlot}`  
do  
    redis-cli -c -h ${target_ip} -p ${target_port} -a ${password} cluster setslot ${slot} IMPORTING `redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster nodes | grep ${source_ip} | grep ${source_port} | awk '{print $1}'`  
done

# On the source node, run the MIGRATING command: the source will migrate the slot to the target

for slot in `seq ${startSlot} ${endSlot}`  
do  
    redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster setslot ${slot} MIGRATING `redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster nodes | grep ${target_ip} | grep ${target_port} | awk '{print $1}'`  
done


for slot in `seq ${startSlot} ${endSlot}`  
do  
    while [ 1 -eq 1 ]  
    do  

# On the source node, run getkeysinslot to fetch up to count keys from the slot

        allkeys=`redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster getkeysinslot ${slot} 20`  
#        if [ !-z ${allkeys} ]  
        if [ -z "${allkeys}" ]
        then  

# On both the source and target nodes, run setslot to assign the slot to the target node

            redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster setslot ${slot} NODE `redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster nodes | grep ${target_ip} | grep ${target_port} | awk '{print $1}'`  
            redis-cli -c -h ${target_ip} -p ${target_port} -a ${password} cluster setslot ${slot} NODE `redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} cluster nodes | grep ${target_ip} | grep ${target_port} | awk '{print $1}'`  
            break  
        else  
            for key in ${allkeys}  
            do  
                echo "slot ${slot} key ${key}"  

# On the source node, run migrate to move the key to the target node

                redis-cli -c -h ${source_ip} -p ${source_port} -a ${password} MIGRATE ${target_ip} ${target_port} ${key} 0 5000 AUTH ${password}
            done  
        fi
    done
done

 

(error) ERR Target instance replied with error: NOAUTH Authentication required

Deployment architecture:
192.168.65.31  M1(6379)  S2(6380)
192.168.65.32  M2(6379)  S3(6380)
192.168.65.33  M3(6379)  S1(6380)
I. Data storage test:
Connect to 6379 on .31:
redis-cli -a "abc" -h 192.168.65.31 -p 6379
192.168.65.31:6379> set b 3
OK
192.168.65.31:6379> get b
"3"
Now connect to 192.168.65.33:6379:
192.168.65.33:6379> get b
(error) MOVED 3300 192.168.65.31:6379
Keys are placed deterministically by hash slot: when a key lives on one node, querying it on another node returns a MOVED error naming the node that owns it.
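A cluster-aware client automates exactly this: on a MOVED reply it re-sends the command to the named node. A minimal parser for the reply (hypothetical helper, mirroring what redis-cli -c does internally):

```python
def parse_moved(err: str):
    """Parse a 'MOVED <slot> <host>:<port>' redirection error.
    Returns (slot, host, port), or None for any other error."""
    parts = err.replace('(error) ', '').split()
    if not parts or parts[0] != 'MOVED':
        return None
    slot, addr = int(parts[1]), parts[2]
    host, _, port = addr.rpartition(':')
    return slot, host, int(port)
```

A client would cache the slot-to-node mapping from these redirections so later commands go straight to the right node.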

II. High availability test:
1. Kill one master
Connect to a slave: redis-cli -a "abc" -h 192.168.65.32 -p 6380
192.168.65.32:6380> randomkey
"b"
192.168.65.32:6380> get b
(error) MOVED 3300 192.168.65.31:6379
The slave holds a copy of the key (randomkey returns it), but reads are redirected to the master.
Cluster state before shutting down 6379 on .31:
redis-trib.py list --password abc --addr 192.168.65.32:6379
Total 6 nodes, 3 masters, 0 fail
M  192.168.65.31:6379 master 5462
S 192.168.65.32:6380 slave 192.168.65.31:6379
M  192.168.65.32:6379 myself,master 5461
S 192.168.65.33:6380 slave 192.168.65.32:6379
M  192.168.65.33:6379 master 5461
S 192.168.65.31:6380 slave 192.168.65.33:6379
redis-trib.rb info 192.168.65.32:6379
fails with:
[apps@0782 bin]$ ./redis-trib.rb info 192.168.65.31:6379
/usr/local/ruby/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb:55:in `require': no such file to load -- redis (LoadError)
from /usr/local/ruby/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb:55:in `require'
from ./redis-trib.rb:25:in `<main>'
Fix:
gem install -l /apps/software/redis-3.2.2.gem
[apps@0782 bin]$ ./redis-trib.rb info 192.168.65.31:6379
[ERR] Sorry, can't connect to node 192.168.65.31:6379
./redis-trib.rb help shows that none of its commands accept a password yet, which makes it less practical than redis-trib.py.
Shut down 6379 on .31:
[apps@ bin]$ ps -ef|grep redis
apps     12302     1  0 16:05 ?        00:00:22 /apps/svr/redis3.0/bin/redis-server 0.0.0.0:6379 [cluster]
apps     12398     1  0 16:28 ?        00:00:19 /apps/svr/redis3.0/bin/redis-server 0.0.0.0:6380 [cluster]
apps     15166 14995  0 22:14 pts/0    00:00:00 grep redis
[apps@bin]$ kill -9 12302
[apps@bin]$ ps -ef|grep redis
apps     12398     1  0 16:28 ?        00:00:19 /apps/svr/redis3.0/bin/redis-server 0.0.0.0:6380 [cluster]
apps     15201 14995  0 22:15 pts/0    00:00:00 grep redis
Cluster state after shutting down 6379 on .31:
redis-trib.py list --password abc --addr 192.168.65.32:6379
Total 6 nodes, 4 masters, 1 fail
M  192.168.65.31:6379 master,fail 0
M  192.168.65.32:6379 myself,master 5461
S 192.168.65.33:6380 slave 192.168.65.32:6379
M  192.168.65.32:6380 master 5462
M  192.168.65.33:6379 master 5461
S 192.168.65.31:6380 slave 192.168.65.33:6379
This shows a slave was automatically promoted to master.
The value of key b can now be read successfully:
192.168.65.32:6380> get b
"3"
After restarting 6379 on .31, the cluster state becomes:
Total 6 nodes, 3 masters, 0 fail
M  192.168.65.32:6379 myself,master 5461
S 192.168.65.33:6380 slave 192.168.65.32:6379
M  192.168.65.32:6380 master 5462
S 192.168.65.31:6379 slave 192.168.65.32:6380
M  192.168.65.33:6379 master 5461
S 192.168.65.31:6380 slave 192.168.65.33:6379
The restarted 6379 is now a slave of the new master.
So with healthy slaves, killing one master gets a slave elected as the new master.

2. Kill two masters
Shut down 6379 on .32 and .33.
About 5 minutes later the state is:
redis-trib.py list --password abc --addr 192.168.65.31:6379
Total 6 nodes, 5 masters, 2 fail
M  192.168.65.31:6380 master 5461
M  192.168.65.32:6379 master,fail 0
M  192.168.65.32:6380 myself,master 5462
S 192.168.65.31:6379 slave 192.168.65.32:6380
M  192.168.65.33:6379 master,fail 0
M  192.168.65.33:6380 master 5461
After restarting the two masters:
redis-trib.py list --password abc --addr 192.168.65.31:6379
Total 6 nodes, 3 masters, 0 fail
M  192.168.65.31:6380 master 5461
S 192.168.65.33:6379 slave 192.168.65.31:6380
M  192.168.65.32:6380 master 5462
S 192.168.65.31:6379 myself,slave 192.168.65.32:6380
M  192.168.65.33:6380 master 5461
S 192.168.65.32:6379 slave 192.168.65.33:6380
So with healthy slaves, killing two masters gets slaves elected as the new masters.

3. Kill one host (one master plus one slave)
ps -ef|grep redis|grep -v grep|awk '{print $2}'|xargs kill -9
State afterwards:
redis-trib.py list --password abc --addr 192.168.65.32:6379
Total 6 nodes, 4 masters, 2 fail
M  192.168.65.31:6380 master,fail 0
M  192.168.65.32:6380 master 5462
S 192.168.65.31:6379 slave,fail 192.168.65.32:6380
M  192.168.65.33:6379 master 5461
M  192.168.65.33:6380 master 5461
S 192.168.65.32:6379 myself,slave 192.168.65.33:6380
So with a healthy slave elsewhere, killing a whole host still gets a slave elected as the new master.
cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:10
cluster_my_epoch:7
cluster_stats_messages_sent:464420
cluster_stats_messages_received:47526
info cluster
# Cluster
cluster_enabled:1

4. Kill two hosts (two masters and two slaves out of 6 nodes)
ps -ef|grep redis|grep -v grep|awk '{print $2}'|xargs kill -9
Total 6 nodes, 3 masters, 4 fail
M  192.168.65.31:6380 master,fail? 5461
S 192.168.65.33:6379 slave 192.168.65.31:6380
M  192.168.65.32:6379 master,fail? 5461
S 192.168.65.33:6380 myself,slave 192.168.65.32:6379
M  192.168.65.32:6380 master,fail? 5462
S 192.168.65.31:6379 slave,fail? 192.168.65.32:6380
192.168.65.33:6380> get a
(error) CLUSTERDOWN The cluster is down

5. Kill two hosts (two masters and two slaves out of 9 nodes)
Add more instances to the cluster.
Original deployment architecture:
192.168.65.31  M1(6379)  S2(6380)
192.168.65.32  M2(6379)  S3(6380)
192.168.65.33  M3(6379)  S1(6380)
Changed to:
192.168.65.31  M1(6379)  S2(6380)  S3(6381)
192.168.65.32  M2(6379)  S3(6380)  S1(6381)
192.168.65.33  M3(6379)  S1(6380)  S2(6381)
The three-master/three-slave layout becomes three masters with six slaves. Cluster state after adding the three extra slaves:
redis-trib.py list --password abc --addr 192.168.65.32:6379
Total 9 nodes, 3 masters, 0 fail
M  192.168.65.31:6380 master 5461
S 192.168.65.32:6381 slave 192.168.65.31:6380
S 192.168.65.33:6379 slave 192.168.65.31:6380
M  192.168.65.32:6380 master 5462
S 192.168.65.31:6379 slave 192.168.65.32:6380
S 192.168.65.33:6381 slave 192.168.65.32:6380
M  192.168.65.33:6380 master 5461
S 192.168.65.31:6381 slave 192.168.65.33:6380
S 192.168.65.32:6379 myself,slave 192.168.65.33:6380
This is clearly one master with two slaves each, cross-distributed across the three hosts.
After killing two of the hosts, the state is:
Total 9 nodes, 3 masters, 6 fail
M  192.168.65.31:6380 master,fail? 5461
S 192.168.65.32:6381 slave,fail? 192.168.65.31:6380
S 192.168.65.33:6379 myself,slave 192.168.65.31:6380
M  192.168.65.32:6380 master,fail? 5462
S 192.168.65.31:6379 slave,fail? 192.168.65.32:6380
S 192.168.65.33:6381 slave 192.168.65.32:6380
M  192.168.65.33:6380 master 5461
S 192.168.65.31:6381 slave,fail? 192.168.65.33:6380
S 192.168.65.32:6379 slave,fail? 192.168.65.33:6380
192.168.65.33:6380> get a
(error) CLUSTERDOWN The cluster is down

III. Data migration test
192.168.65.32:6380> info cluster
# Cluster
cluster_enabled:1
192.168.65.32:6380> cluster node
(error) ERR Wrong CLUSTER subcommand or number of arguments
192.168.65.32:6380> cluster nodes
b06da1f508686c326b8c65856c680ee47cdd7582 192.168.65.31:6380 master – 0 1505900948523 9 connected 10923-16383
c3af50d219c8bfbe05cd5d20cfcb78234c90faa7 192.168.65.31:6379 slave cd798125b14aafe82d7d5e3d68a2b5014a9e7dfc 0 1505900954550 6 connected
edc8139c2dff1628e94ec0a4a93d3536b3ca4440 192.168.65.32:6379 master,fail – 1505899972878 1505899967967 0 disconnected
cd798125b14aafe82d7d5e3d68a2b5014a9e7dfc 192.168.65.32:6380 myself,master – 0 0 6 connected 0-5461
4868d82eaed0fea4983609c82933e5d6a883e782 192.168.65.33:6379 master,fail – 1505900006588 1505900005081 1 disconnected
6172b5c24acc213ba6b0a983f3f497cc0658b6cd 192.168.65.33:6380 master – 0 1505900953545 7 connected 5462-10922
192.168.65.32:6380> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:9
cluster_my_epoch:6
cluster_stats_messages_sent:403235
cluster_stats_messages_received:39290
cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "192.168.65.31"
      2) (integer) 6380
      3) "b06da1f508686c326b8c65856c680ee47cdd7582"
2) 1) (integer) 0
   2) (integer) 5461
   3) 1) "192.168.65.32"
      2) (integer) 6380
      3) "cd798125b14aafe82d7d5e3d68a2b5014a9e7dfc"
   4) 1) "192.168.65.31"
      2) (integer) 6379
      3) "c3af50d219c8bfbe05cd5d20cfcb78234c90faa7"
3) 1) (integer) 5462
   2) (integer) 10922
   3) 1) "192.168.65.33"
      2) (integer) 6380
      3) "6172b5c24acc213ba6b0a983f3f497cc0658b6cd"
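A client can turn that CLUSTER SLOTS reply into a routing table in one pass; a sketch, modeling the reply as the nested lists shown above:

```python
def slots_map(reply):
    """Build {(start_slot, end_slot): 'host:port'} from a CLUSTER SLOTS reply.
    Each range entry is [start, end, master, replica...]; the first node
    listed for a range is the master serving it."""
    return {(r[0], r[1]): '%s:%d' % (r[2][0], r[2][1]) for r in reply}
```

Combined with a key-to-slot hash, this is all a smart client needs to route commands without MOVED round-trips.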

How to see which slot a key lives in
192.168.65.32:6380> keys *
1) “aaaa”
2) “b”
192.168.65.32:6380> cluster keyslot b
(integer) 3300
Migrate slot 3300 from 192.168.65.32:6380 to 192.168.65.33:6380:
a. On 192.168.65.33:6380 run: cluster setslot 3300 importing cd798125b14aafe82d7d5e3d68a2b5014a9e7dfc (source node's run ID)
b. On 192.168.65.32:6380 run: cluster setslot 3300 migrating 6172b5c24acc213ba6b0a983f3f497cc0658b6cd (target node's run ID)
c. On 192.168.65.32:6380 run: cluster getkeysinslot 3300 3 (number of keys to return)
192.168.65.32:6380> cluster getkeysinslot 3300 3
1) "b"
d. Run migrate for every key returned in step c:
migrate 192.168.65.33 6380 b 0 1599 replace
Note: if the nodes require auth, migrate fails with:
(error) ERR Target instance replied with error: NOAUTH Authentication required.
If you do want to migrate, comment out masterauth and requirepass on all nodes first, then re-enable auth once the migration is done.
e. Run: cluster setslot 3300 node 6172b5c24acc213ba6b0a983f3f497cc0658b6cd
How to confirm which node holds a key
192.168.65.33:6380> get b
(error) MOVED 3300 192.168.65.32:6380
This means key b is in slot 3300 on 192.168.65.32:6380.
Clients can follow redirections automatically with -c:
redis-cli -h 192.168.65.33 -p 6380 -c -a abc
192.168.65.33:6380> get b
-> Redirected to slot [3300] located at 192.168.65.32:6380
If a master that owns at least one slot goes down with no slave available for failover, the whole cluster goes down and stops serving. To keep the cluster working in that case, change:
cluster-require-full-coverage to no (default yes).
---------------------
Copyright notice: this is an original article by CSDN blogger "zengxuewen2045", licensed under CC 4.0 BY-SA; reproduction must include the original source link and this notice.
Original link: https://blog.csdn.net/zengxuewen2045/article/details/78065160

Setting a password on a Redis cluster

1. Setting the password (recommended approaches)
Option 1: edit redis.conf on every node in the cluster and add:

masterauth passwd123
requirepass passwd123

Note: this approach requires restarting every node.

Option 2: set it on each running instance:

./redis-cli -c -p 7000 
config set masterauth passwd123 
config set requirepass passwd123 
config rewrite

Then repeat with ./redis-cli -c -p 7001, ./redis-cli -c -p 7002, ... to set the password on every node.

Note: every node must use the same password, otherwise redirections fail. This option is recommended: config rewrite persists the password into redis.conf, and no restart is needed.

If you change the password this way, ./redis-trib.rb check 10.104.111.174:6379 may report [ERR] Sorry, can't connect to node 10.104.111.174:6379 when 6379's redis.conf has no password configured.

2. Using redis-trib.rb commands after setting a password
e.g. ./redis-trib.rb check 127.0.0.1:7000 fails with [ERR] Sorry, can't connect to node 127.0.0.1:7000
Workaround: vim /usr/local/rvm/gems/ruby-2.3.3/gems/redis-4.0.0/lib/redis/client.rb and set the password:

class Client
    DEFAULTS = {
      :url => lambda { ENV["REDIS_URL"] },
      :scheme => "redis",
      :host => "127.0.0.1",
      :port => 6379,
      :path => nil,
      :timeout => 5.0,
      :password => "passwd123",
      :db => 0,
      :driver => nil,
      :id => nil,
      :tcp_keepalive => 0,
      :reconnect_attempts => 1,
      :inherit_socket => false
    }

Note: you can locate client.rb with find: find / -name 'client.rb'

Accessing the cluster with a password

./redis-cli -c -p 7000 -a passwd123

sed: deleting matching lines and substituting

Delete lines beginning with a:

sed -i '/^a.*/d' tmp.txt

-i edits the file in place; without it sed only transforms the stream and the file is unchanged.

The /d after the address pattern deletes matching lines.

Substitute on matching lines:

sed -i 's/^a.*/haha/g' tmp.txt

The leading s/ means substitute.

The trailing /g means replace every match on the line.

MySQL: creating an index on a table column

1. Add a PRIMARY KEY (primary key index):

ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` )
`table_name` is the table name
`column` is the column

This makes column the primary key of table_name.

2. Add a UNIQUE (unique index):

ALTER TABLE `table_name` ADD UNIQUE ( `column` )

3. Add an INDEX (ordinary index):

Form 1: ALTER TABLE `table_name` ADD INDEX index_name ( `column` )
Form 2: CREATE INDEX index_name ON `table_name`(`column`)

4. Add a FULLTEXT (full-text index):

ALTER TABLE `table_name` ADD FULLTEXT ( `column`)

5. Add a multi-column (composite) index:

Form 1: ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` )

Form 2: CREATE INDEX index_name ON `table_name`(`column1`,`column2`,`column3`)

6. List a table's indexes:

SHOW INDEX FROM `table_name`;

7. Drop an index:

DROP INDEX index_name ON `table_name`;

Linux: checking a process's start time and uptime

To see when a process started and how long it has been running:

ps -eo lstart   # start time

ps -eo etime    # elapsed running time

ps -eo pid,lstart,etime | grep 5176

Finding a process's start time with ps

Common ps usages (handy for inspecting system processes):

1) ps a    show all processes on the current terminal, including other users'.
2) ps -A   show all processes.
3) ps c    show each process's actual command name, without path, arguments, or daemon markers.
4) ps -e   same effect as -A.
5) ps e    also show each process's environment variables.
6) ps f    show the process tree in ASCII art, expressing parent/child relationships.
7) ps -H   show the hierarchy, expressing parent/child relationships.
8) ps -N   show all processes except those on the terminal running ps.
9) ps s    display processes in signal format.
10) ps S   include data from terminated child processes.
11) ps -t <tty>   list processes belonging to the given terminal.
12) ps u   user-oriented output format.
13) ps x   show all processes, not separated by terminal.
The most common form is ps aux piped into grep to find a specific process, then act on it.

Common options:
-A  show all processes (same as -e)
-a  show all processes on a terminal except session leaders
-N  negate the selection
-d  show all processes but omit session leaders
-x  show processes without a controlling terminal, with each command's full path (-d and -x cannot be combined)
-p pid   select by process ID
-u uid or username   select by effective user ID or name
-g gid or groupname  show all processes of a group
U username   show all processes of that user, with each command's full path, e.g. ps U zhang
-f  full listing, usually combined with other options, e.g. ps -fa or ps -fx
-l  long format (adds F, WCHAN, C, and similar columns)
-j  jobs format
-o  user-defined output format
v   virtual memory format
s   signal format
-m  show all threads
-H  show the process hierarchy (combine with others, e.g. ps -Ha)
e   show the environment after the command (e.g. ps -d e; ps -a e)
h   suppress the header line

ps -eo lstart 启动时间

ps -eo etime 运行多长时间.

ps -eo pid,lstart,etime | grep pid
e.g.: ps -eo pid,lstart,etime | grep 4559

Use ps to inspect a group of running processes. It offers these two elapsed-time formats:

  • etime shows the time elapsed since the process started, as [[DD-]hh:]mm:ss.
  • etimes shows the time elapsed since the process started, in seconds.
ps -eo pid,lstart,etime,cmd | grep 'php'

Output:

 321 Mon Apr 22 08:10:01 2019    06:54:18 /usr/local/php7.3/bin/php -f /www/php7.3/html/wms/moudle/cron/service/cron/auto_task_cli.mdl.php act=suning_item_sync sd_id=1082 cmd_suffix=1
 485 Mon Apr 22 13:44:02 2019    01:20:17 /usr/local/php7.3/bin/php -f /www/php7.3/html/wms/moudle/cron/service/cron/auto_task_cli.mdl.php act=update_remarks sd_id=2160 cmd_suffix=1
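etime's [[DD-]hh:]mm:ss format is easy to normalize when you need to sort or alert on process uptime; a small converter (illustrative helper, not part of ps):

```python
def etime_seconds(etime: str) -> int:
    """Convert a ps etime value ([[DD-]hh:]mm:ss) into total seconds."""
    days, _, rest = etime.rpartition('-')   # days prefix is optional
    secs = 0
    for field in rest.split(':'):           # base-60 fold over hh:mm:ss / mm:ss
        secs = secs * 60 + int(field)
    return secs + int(days or 0) * 86400
```

On systems where ps supports `etimes` directly, that column already gives seconds and no conversion is needed.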

ESX host CLI quick reference (hardware info and more)

Basic commands
  1. vmware -v # show your ESX version
  2. VMware ESXi 5.0.0 build-469512
  3. esxcfg-info -a # show all ESX-related information
  4. esxcfg-info -w # show the host's hardware information
  5. service mgmt-vmware restart # restart the vmware management service
  6. esxcfg-vmknic -l # show the host's IP addresses
  7. esxcli hardware cpu list # CPU info (Brand, Core Speed)
  8. esxcli hardware cpu global get # CPU info (CPU Cores)
  9. esxcli hardware memory get # memory info (Physical Memory)
  10. esxcli hardware platform get # hardware model and vendor (Product Name, Vendor Name)
  11. esxcli hardware clock get # current time
  12. esxcli system version get # ESXi host version and build number
  13. esxcli system maintenanceMode set --enable yes # put the ESXi host into maintenance mode
  14. esxcli system maintenanceMode set --enable no # take the ESXi host out of maintenance mode
  15. esxcli system settings advanced list -d # list advanced settings that differ from defaults
  16. esxcli system settings kernel list -d # list kernel settings that differ from defaults
  17. esxcli system snmp get | hash | set | test # show, test, and change SNMP settings
  18. esxcli vm process list # list the World IDs of running VMs on the ESXi server
  19. esxcli vm process kill -t soft -w WorldID # kill a VM via esxcli
  20. vim-cmd hostsvc/hostsummary # host summary information
  21. vim-cmd vmsvc/get.datastores # host datastore capacity information
  22. vim-cmd vmsvc/getallvms # list all virtual machines
  23. vim-cmd vmsvc/power.getstate VMID # state of the given VM
  24. vim-cmd vmsvc/power.shutdown VMID # shut down a VM
  25. vim-cmd vmsvc/power.off VMID # power off a VM that did not shut down
  26. vim-cmd vmsvc/get.config VMID # show a VM's configuration
  27. esxcli software vib install -d /vmfs/volumes/datastore/patches/xxx.zip # install patches and drivers on the host
  28. esxcli network nic list # list the status of all NICs on the host
  29. esxcli network vm list # list VM networking information
  30. esxcli storage nmp device list # list SATP and PSP info for devices under NMP management
  31. esxcli storage core device vaai status get # list VAAI status of registered devices
  32. esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp xxxx # change the SATP's default PSP to Round Robin
Querying information with esxcli

esxcli command help

SSH into the VMware ESX server console and query virtual machine information with the esxcli command; the output format can be plain, xml, csv, or keyvalue.

esxcli is a tool written in Python (/sbin/esxcli.py).

Use the --formatter=xml option to output results as XML, which is easier for programs to parse.

Official documentation:

  1. http://pubs.vmware.com/vsphere-50/topic/com.vmware.vcli.ref.doc_50/vcli-right.html
  2. http://pubs.vmware.com/vsphere-50/topic/com.vmware.vsphere.scripting.doc_50/GUI-522B42-78C1-43-8708-E022B82BC.html
  1. esxcli --help
  2. Usage: esxcli [options] {namespace}+ {cmd} [cmd options]
  3. Options:
  4. --formatter=FORMATTER
  5. Override the formatter to use for a given command. Available formatters: xml, csv, keyvalue
  6. --debug Enable debug or internal use options
  7. --version Display version information for the script
  8. -?, --help Display usage information for the script
  9. Available Namespaces:
  10. esxcli Commands that operate on the esxcli system itself allowing users to get additional information.
  11. fcoe VMware FCOE commands.
  12. hardware VMKernel hardware properties and commands for configuring hardware.
  13. iscsi VMware iSCSI commands.
  14. network Operations that pertain to the maintenance of networking on an ESX host. This includes a wide variety of commands to
  15. manipulate virtual networking components (vswitch, portgroup, etc) as well as local host IP, DNS and general host networking
  16. settings.
  17. software Manage the ESXi software image and packages
  18. storage VMware storage commands.
  19. system VMKernel system properties and commands for configuring properties of the kernel core system.
  20. vm A small number of operations that allow a user to Control Virtual Machine operations.

 

  1. View performance information: esxtop
  2. 9:31:31am up 35 days 7:49, 379 worlds, 16 VMs, 32 vCPUs; CPU load average: 0.02, 0.05, 0.05
  3. PCPU USED(%): 1.1 1.1 1.4 2.2 3.5 1.8 1.6 1.6 0.6 0.8 0.8 0.5 1.7 1.6 1.5 1.4 AVG: 1.4
  4. PCPU UTIL(%): 3.7 3.9 5.0 7.3 11 6.0 5.4 5.3 2.3 2.7 2.9 1.9 5.4 5.2 4.7 4.6 AVG: 4.9
  5. ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %MLMT %SWPWT
  6. 1 1 idle 16 1518.25 1600.00 0.00 0.00 1600.00 0.00 2.29 0.00 0.00 0.00
  7. 1627 1627 ESET NOD32_192. 6 4.88 14.37 0.07 578.65 0.00 0.53 183.72 0.02 0.00 0.00 0.00
  8. 1379 1379 TEST2.0_192.168. 6 4.24 11.40 0.10 581.75 0.00 0.40 187.16 0.03 0.00 0.00 0.00
  9. 1558 1558 [XMX_TEST]SP_1 6 2.56 7.45 0.11 585.88 0.00 0.26 190.68 0.03 0.00 0.00 0.00
  10. 1555 1555 [XMX_PreProd] 6 2.54 7.17 0.15 585.86 0.00 0.54 190.48 0.03 0.02 0.00 0.00
  11. 9669 9669 GEI__EMO_19 6 1.92 5.48 0.08 587.60 0.00 0.46 192.46 0.02 0.00 0.00 0.00
  12. 1682712 1682712 esxtop.1880935 1 1.18 3.54 0.00 95.39 0.00 0.00 0.00 0.00 0.00 0.00
  13. 1193230 1193230 slave1_1 6 1.02 2.86 0.06 590.45 0.00 0.28 195.30 0.01 0.00 0.00 0.00
  • Shutting down a virtual machine with the k command in ESXTOP:
  1. SSH into the ESXi host and run esxtop
  2. Press c to switch to the CPU view
  3. Press Shift+v so the current view shows only virtual machine processes
  4. Add the Leader World ID column to the display and find the Leader World ID of the VM to shut down
  5. Press k, enter the Leader World ID of the VM at the prompt, and press Enter.
  1. Disk volume information
  2. df -h # view system disk volume capacity
  3. Filesystem Size Used Available Use% Mounted on
  4. VMFS-5 1.6T 1.5T 123.7G 93% /vmfs/volumes/datastore1
  5. vfat 4.0G 25.2M 4.0G 1% /vmfs/volumes/4ee1d386-965ba574-1fd5-1cc1de17e90e
  6. vfat 249.7M 127.4M 122.3M 51% /vmfs/volumes/63850576-c5821586-5fce-4343bbbeb921
  7. vfat 249.7M 8.0K 249.7M 0% /vmfs/volumes/93d3e977-2a99c33b-6c07-1e461ce7a96e
  8. vfat 285.8M 176.2M 109.6M 62% /vmfs/volumes/4ee1d37e-1aa9294c-21f6-1cc1de17e90e
  9. esxcli storage filesystem list # volume information
  10. Mount Point Volume Name UUID Mounted Type Size Free
  11. ————————————————- ———– ———————————– ——- —— ————- ————
  12. /vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e datastore1 4ee1d386-5b79612c-d9b1-1cc1de17e90e true VMFS-5 1794491023360 132805296128
  13. /vmfs/volumes/4ee1d386-965ba574-1fd5-1cc1de17e90e 4ee1d386-965ba574-1fd5-1cc1de17e90e true vfat 4293591040 4267048960
  14. /vmfs/volumes/63850576-c5821586-5fce-4343bbbeb921 63850576-c5821586-5fce-4343bbbeb921 true vfat 261853184 128225280
  15. /vmfs/volumes/93d3e977-2a99c33b-6c07-1e461ce7a96e 93d3e977-2a99c33b-6c07-1e461ce7a96e true vfat 261853184 261844992
  16. /vmfs/volumes/4ee1d37e-1aa9294c-21f6-1cc1de17e90e 4ee1d37e-1aa9294c-21f6-1cc1de17e90e true vfat 299712512 114974720
  17. esxcli storage vmfs extent list # storage volumes used by the VMs?
  18. Volume Name VMFS UUID Extent Number Device Name Partition
  19. ———– ———————————– ————- ———————————— ———
  20. datastore1 4ee1d386-5b79612c-d9b1-1cc1de17e90e 0 naa.600508b1001030374542413430300400 3
  19. ———– ———————————– ————- ———————————— ———
  20. datastore1 4ee1d386-5b79612c-d9b1-1cc1de17e90e 0 naa.600508b1001030374542413430300400 3

 

  1. View network information
  2. esxcli network ip interface ipv4 get
  3. Name IPv4 Address IPv4 Netmask IPv4 Broadcast Address Type DHCP DNS
  4. —- ————- ————- ————– ———— ——–
  5. vmk0 192.168.0.150 255.255.255.0 192.168.0.255 STATIC false
  6. esxcfg-vmknic -l
  7. Interface Port Group/VPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type
  8. vmk0 Management Network IPv4 192.168.0.150 255.255.255.0 192.168.0.255 1c:c1:de:17:e9:0c 1500 65535 true STATIC
  9. esxcfg-route
  10. VMkernel default gateway is 192.168.0.253
  1. View network interfaces
  2. esxcli network nic list
  3. Name PCI Device Driver Link Speed Duplex MAC Address MTU Description
  4. —— ————- —— —- —– —— —————– —- ————————————————————-
  5. vmnic0 0000:004:00.0 bnx2 Up 1000 Full 00:9c:02:9b:25:2c 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
  6. vmnic1 0000:004:00.1 bnx2 Up 1000 Full 00:9c:02:9b:25:2e 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
  7. vmnic2 0000:005:00.0 bnx2 Up 1000 Full 00:9c:02:9b:25:30 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
  8. vmnic3 0000:005:00.1 bnx2 Up 1000 Full 00:9c:02:9b:25:32 1500 Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T

 

View vSwitch interface information

  1. esxcli network vswitch standard list
  2. vSwitch0 # virtual switch 0
  3. Name: vSwitch0
  4. Class: etherswitch
  5. Num Ports: 128
  6. Used Ports: 13
  7. Configured Ports: 128
  8. MTU: 1500
  9. CDP Status: listen
  10. Beacon Enabled: false
  11. Beacon Interval: 1
  12. Beacon Threshold: 3
  13. Beacon Required By:
  14. Uplinks: vmnic2, vmnic1, vmnic0 # corresponding physical NICs
  15. Portgroups: VM Network, Management Network # note
  16. vSwitch1
  17. Name: vSwitch1 # virtual switch 1
  18. Class: etherswitch
  19. Num Ports: 128
  20. Used Ports: 10
  21. Configured Ports: 128
  22. MTU: 1500
  23. CDP Status: listen
  24. Beacon Enabled: false
  25. Beacon Interval: 1
  26. Beacon Threshold: 3
  27. Beacon Required By:
  28. Uplinks: vmnic3 # corresponding physical NIC
  29. Portgroups: Vlan190 # note

List of currently running virtual machines

  1. esxcli vm process list
  2. slave1_192.168.0222
  3. World ID: 1331403
  4. Process ID: 0
  5. VMX Cartel ID: 1331402
  6. UUID: 56 4d b4 20 0a 16 b9 50-1c bd fc 7c 7b dd d5 84
  7. Display Name: slave1_192.168.0222
  8. Config File: /vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e/slave1_192.168.0222/slave1_192.168.0222.vmx
  9. TEST_192.0168.0.13
  10. World ID: 1651806
  11. Process ID: 0
  12. VMX Cartel ID: 1651805
  13. UUID: 56 4d 0a 52 6e d2 61 7a-a5 84 1b e5 35 da d1 62
  14. Display Name: TEST_192.0168.0.13
  15. Config File: /vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e/TEST_192.0168.0.15/TEST_192.0168.0.15.vmx
  16. TEST2.0_192.168.0.200
  17. World ID: 5602
  18. Process ID: 0
  19. VMX Cartel ID: 5601
  20. UUID: 56 4d 71 65 d5 83 a1 4c-9d 7e 4a 9e f4 9d e3 21
  21. Display Name: TEST2.0_192.168.0.200
  22. Config File: /vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e/TEST2.0_192.168.0.200/TEST2.0_192.168.0.200.vmx

XML-formatted output

  1. esxcli --formatter=xml vm process list
  2. <?xml version="1.0" encoding="utf-8"?>
  3. <output xmlns="http://www.vmware.com/Products/ESX/5.0/esxcli">
  4. <root>
  5. <list type="structure">
  6. <structure typeName="VirtualMachine">
  7. <field name="ConfigFile">
  8. <string>/vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e/slave1_192.168.0222/slave1_192.168.0222.vmx</string>
  9. </field>
  10. <field name="DisplayName">
  11. <string>slave1_192.168.0222</string>
  12. </field>
  13. <field name="ProcessID">
  14. <integer>0</integer>
  15. </field>
  16. <field name="UUID">
  17. <string>56 4d b4 20 0a 16 b9 50-1c bd fc 7c 7b dd d5 84</string>
  18. </field>
  19. <field name="VMXCartelID">
  20. <integer>1331402</integer>
  21. </field>
  22. <field name="WorldID">
  23. <integer>1331403</integer>
  24. </field>
  25. </structure>
  26. <structure typeName="VirtualMachine">
  27. <field name="ConfigFile">
  28. <string>/vmfs/volumes/4ee1d386-5b79612c-d9b1-1cc1de17e90e/TEST_192.0168.0.15/TEST_192.0168.0.15.vmx</string>
  29. </field>
  30. <field name="DisplayName">
  31. <string>TEST_192.0168.0.13</string>
  32. </field>
  33. <field name="ProcessID">
  34. <integer>0</integer>
  35. </field>
  36. <field name="UUID">
  37. <string>56 4d 0a 52 6e d2 61 7a-a5 84 1b e5 35 da d1 62</string>
  38. </field>
  39. <field name="VMXCartelID">
  40. <integer>1651805</integer>
  41. </field>
  42. <field name="WorldID">
  43. <integer>1651806</integer>
  44. </field>
  45. </structure>
  46. </list>
  47. </root>
  48. </output>
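When a full XML parser is not available on the host, output like the above can be scraped with ordinary text tools; a rough sketch, using a trimmed hard-coded sample in place of the live `esxcli --formatter=xml vm process list` output:

```shell
# Pull every DisplayName value out of esxcli XML output.
# The XML here is a hard-coded sample; normally it would come from:
#   xml=$(esxcli --formatter=xml vm process list)
xml='<output><root><list type="structure">
<structure typeName="VirtualMachine">
<field name="DisplayName"><string>slave1_192.168.0222</string></field>
</structure></list></root></output>'

printf '%s\n' "$xml" \
  | grep -o '<field name="DisplayName"><string>[^<]*</string>' \
  | sed 's/.*<string>\(.*\)<\/string>/\1/'
# → slave1_192.168.0222
```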

vim-cmd

  1. vim-cmd help
  2. Commands available under /:
  3. hbrsvc/ internalsvc/ solo/ vmsvc/
  4. hostsvc/ proxysvc/ vimsvc/ help
  1. List all virtual machines
  2. vim-cmd vmsvc/getallvms
  1. View the specified VM's networks
  2. vim-cmd vmsvc/get.networks 101
  1. View the specified VM's summary information
  2. This VM's configuration:
  3. Name: test_192.168.0.70
  4. CPU x2, RAM: 4096MB, DISK: SCSI (0:0) 40GB
  5. Network adapter 1: E1000, VM Network, MAC address: 00:0c:29:d8:3b:e0
  6. After VMware Tools is installed in the guest OS, the summary shows hostName and ipAddress; if Tools is not installed, the values are <unset>.
  7. vim-cmd vmsvc/get.summary 101

View the device information of a specified VM

This includes the NIC model, MAC address, and other details.

vim-cmd vmsvc/device.getdevices 101

View the configuration of a specified VM

vim-cmd vmsvc/get.config 101