Redis data migration

1. Requirement

Move a subset of keys from one Redis instance to another Redis instance.

2. Migration approaches

2.1 Source and target instances run the same version

2.1.1 Using the DUMP command
#!/bin/bash

# source redis ip
src_ip=127.0.0.1
# source redis port
src_port=6392

# destination redis ip
dest_ip=127.0.0.1
# destination redis port
dest_port=6393

# prefix of the keys to migrate
key_prefix=test

i=1

redis-cli -h $src_ip -p $src_port keys "${key_prefix}*" | while read key
do
    redis-cli -h $dest_ip -p $dest_port del "$key"
    redis-cli -h $src_ip -p $src_port --raw dump "$key" | perl -pe 'chomp if eof' | redis-cli -h $dest_ip -p $dest_port -x restore "$key" 0
    echo "$i migrate key $key"
    ((i++))
done
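Note: KEYS scans the whole keyspace in one blocking call and can stall a busy source instance. A minimal variant of the same script, assuming the same hosts/ports as above, can use redis-cli's cursor-based --scan option instead:

#!/bin/bash
# Same migration as above, but SCAN-based so the source is never blocked.
src_ip=127.0.0.1
src_port=6392
dest_ip=127.0.0.1
dest_port=6393
key_prefix=test

redis-cli -h $src_ip -p $src_port --scan --pattern "${key_prefix}*" | while read key
do
    redis-cli -h $dest_ip -p $dest_port del "$key"
    redis-cli -h $src_ip -p $src_port --raw dump "$key" | perl -pe 'chomp if eof' | redis-cli -h $dest_ip -p $dest_port -x restore "$key" 0
done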
2.1.2 Using the MIGRATE command

MIGRATE usage:

MIGRATE host port key destination-db timeout [COPY] [REPLACE]

Available since: 2.6.0
Time complexity: This command actually executes a DUMP+DEL in the source instance, and a RESTORE in the target instance. See the pages of these commands for time complexity. Also an O(N) data transfer between the two instances is performed.

Migration script

#!/bin/bash

# source redis ip
src_ip=127.0.0.1
# source redis port
src_port=6392

# destination redis ip
dest_ip=127.0.0.1
# destination redis port
dest_port=6393

# prefix of the keys to migrate
key_prefix=test

i=1

redis-cli -h $src_ip -p $src_port keys "${key_prefix}*" | while read key
do
    redis-cli -h $src_ip -p $src_port migrate $dest_ip $dest_port "$key" 0 1000 replace
    echo "$i migrate key $key"
    ((i++))
done
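If both instances are version 3.0.6 or newer, MIGRATE also accepts a KEYS clause that moves several keys per call (the key argument must then be an empty string), which cuts down on round trips. A hedged sketch, with the same addresses as above hard-coded for the subshell:

# Batch form (requires Redis >= 3.0.6): up to 100 keys per MIGRATE call.
redis-cli -h 127.0.0.1 -p 6392 --scan --pattern "test*" | \
    xargs -n 100 sh -c 'redis-cli -h 127.0.0.1 -p 6392 migrate 127.0.0.1 6393 "" 0 1000 replace keys "$@"' _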

2.2 Source and target instances run different versions

2.2.1 Approaches that do not work
  • If the versions differ, migrating with migrate fails like this:
    1935 migrate key esf_common_auth_code_18587656289
    (error) ERR Target instance replied with error: ERR DUMP payload version or checksum are wrong
  • If the versions differ, migrating with dump fails like this:
    (error) ERR DUMP payload version or checksum are wrong
  • If the versions differ, building a replica by copying the RDB file directly fails like this:
    5453:S 23 Nov 18:13:14.153 * MASTER <-> SLAVE sync: Flushing old data
    5453:S 23 Nov 18:13:14.153 * MASTER <-> SLAVE sync: Loading DB in memory
    5453:S 23 Nov 18:13:14.153 # Can't handle RDB format version 8
    5453:S 23 Nov 18:13:14.153 # Failed trying to load the MASTER synchronization DB from disk
2.2.2 A workable approach
  1. Turn on AOF persistence on the source instance
    config set appendonly yes
  2. Trigger an AOF rewrite by hand
    bgrewriteaof
  3. Set up and start a new redis instance of the same version as the target
  4. Replay the source instance's AOF file into the new instance
    redis-cli -h 127.0.0.1 -p 6395 -a password --pipe < appendonly.aof
  5. Migrate the keys from the new instance to the target with dump or migrate, e.g. as sketched below
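Put together, the whole procedure looks roughly like the following sketch. The ports and paths are assumptions for illustration: 6392 is the old source, 6394 a temporary instance built from the same version as the target, 6393 the target.

# 1-2. enable AOF on the source and rewrite it
redis-cli -h 127.0.0.1 -p 6392 config set appendonly yes
redis-cli -h 127.0.0.1 -p 6392 bgrewriteaof
# 3. start a temporary instance of the target's version (binary/config paths assumed)
/usr/local/redis-new/bin/redis-server /etc/redis/6394.conf
# 4. replay the source's AOF into the temporary instance (AOF path assumed)
redis-cli -h 127.0.0.1 -p 6394 --pipe < /data/redis-6392/appendonly.aof
# 5. dump/migrate from the now version-compatible instance to the target
redis-cli -h 127.0.0.1 -p 6394 migrate 127.0.0.1 6393 somekey 0 1000 replace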


1. Setup steps

wiki:

1:http://www.cnblogs.com/mafly/p/redis_cluster.html
2:https://blog.csdn.net/plei_yue/article/details/78791440

Link 1 focuses more on installation; link 2 describes scaling out, adding and removing (master/slave) nodes in more detail.
Both are for Linux. On a Mac the only difference is installing Ruby and RubyGems, which can be done with Homebrew.
See also the following article:

Mac OS X
Mac OS X 10.5 and later already ship with Ruby and RubyGems.
If you are on an earlier version of Mac OS, download and install the latest Ruby and RubyGems.

https://blog.csdn.net/happyteafriends/article/details/8225611

2. redis.conf cluster configuration

The main parameters to configure:

port 9001 (each node's port)
daemonize yes
bind 192.168.119.131 (bind the machine's IP)
dir /usr/local/redis-cluster/9001/data/ (data file location)
pidfile /var/run/redis_9001.pid (the 9001 in pidfile must match port)
cluster-enabled yes (enable cluster mode)
cluster-config-file nodes9001.conf (the 9001 must match port)
cluster-node-timeout 15000
appendonly yes
cluster-require-full-coverage no    # defaults to yes: if a node failure leaves any of the 16384 slots uncovered, the whole cluster stops serving, so be sure to change this to no
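For example, the six node directories can be stamped out from one template and started like this (the paths and the template file are assumptions; adjust bind to your own IP):

# Generate one directory + config per node from a template, then start each node.
for port in 9001 9002 9003 9004 9005 9006; do
    mkdir -p /usr/local/redis-cluster/${port}/data
    sed "s/9001/${port}/g" /usr/local/redis-cluster/redis.conf.template > /usr/local/redis-cluster/${port}/redis.conf
    redis-server /usr/local/redis-cluster/${port}/redis.conf
done
# then create the cluster (redis-trib.rb, for redis < 5):
# ./redis-trib.rb create --replicas 1 192.168.119.131:9001 ... 192.168.119.131:9006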

3. Pitfalls

3.1 Handling a Redis Cluster reshard failure

The rubygems bundled with Mac OS installs a newer redis gem, so resharding failed with:

[WARNING] Node 10.21.10.120:7002 has slots in importing state (3398).
[WARNING] Node 10.21.14.251:7001 has slots in migrating state (3398).
[WARNING] The following slots are open: 3398
Check slots coverage...
[OK] All 16384 slots covered.
[ERR] Calling MIGRATE ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

Solutions:

1. The redis gem installed via ruby gem must not be the latest 4.0, otherwise redis-trib.rb reshard 127.0.0.1:7000 reports a syntax error when resharding.

    1. Uninstall the latest redis gem: gem uninstall redis

    2. Install a 3.x version: gem install redis -v 3.3.5 (tested: 3.2.1 through 3.3.5 all work, 4.x fails when resharding)
2. Repair with fix, using the following command:
    ./redis-trib.rb fix 10.21.10.120:7000

Reference wikis:

https://my.oschina.net/juluking/blog/1606222
http://blog.51cto.com/hsbxxl/1978491

4. Restarting a Redis cluster

See:

https://blog.csdn.net/jing956899449/article/details/53611838
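In short: there is no single cluster-wide restart command. Each node is shut down and started again, and the nodes rediscover each other from their cluster-config-file (nodes<port>.conf). A minimal sketch using the ports assumed above:

# Stop every node, then start them again; cluster state is recovered from nodes<port>.conf.
# nosave is safe here because appendonly is on.
for port in 9001 9002 9003 9004 9005 9006; do
    redis-cli -p ${port} shutdown nosave 2>/dev/null
done
for port in 9001 9002 9003 9004 9005 9006; do
    redis-server /usr/local/redis-cluster/${port}/redis.conf
done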

 

Fixing a Redis Cluster scale-out failure

A node in the cluster died, with the following error:
------ STACK TRACE ------

EIP:
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](migrateCloseSocket+0x52)[0x4644f2]
Backtrace:
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](logStackTrace+0x3c)[0x45bd5c]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](sigsegvHandler+0xa1)[0x45cc41]
/lib64/libpthread.so.0[0x336b60f710]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](migrateCloseSocket+0x52)[0x4644f2]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](migrateCommand+0x7cd)[0x46744d]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](call+0x72)[0x424192]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](processCommand+0x365)[0x428d75]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](processInputBuffer+0x109)[0x435089]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](aeProcessEvents+0x13d)[0x41f86d]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](aeMain+0x2b)[0x41fb6b]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster](main+0x370)[0x427220]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x336b21ed1d]
/usr/local/bin/redis-server 0.0.0.0:6380 [cluster][0x41d039]

 

After the node died, we checked the cluster with the redis-trib script and found one slot stuck for a long time between importing state and migrating state.
[WARNING] Node 10.112.142.21:7210 has slots in importing state (45).
[WARNING] Node 10.112.142.20:6380 has slots in migrating state (45).
We then repaired the cluster with fix, and only after that was the whole cluster OK again.

Next we tried rebalance, letting redis-trib adjust the slot distribution of the whole cluster for us:

[root@GZ-JSQ-JP-REDIS-CLUSTER-142-21 ~]# /usr/local/bin/redis-trib.rb rebalance 10.112.142.21:7211
>>> Performing Cluster Check (using node 10.112.142.21:7211)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Rebalancing across 7 nodes. Total weight = 7
Moving 892 slots from 10.112.142.20:6380 to 10.112.142.21:7210
[ERR] Calling MIGRATE: ERR Target instance replied with error: BUSYKEY Target key name already exists.
>>> Check for open slots...
[WARNING] Node 10.112.142.20:6380 has slots in migrating state (45).
[WARNING] The following slots are open: 45
>>> Fixing open slot 45
*** Found keys about slot 45 in node 10.112.142.21:7210!
Set as migrating in: 10.112.142.20:6380
Set as importing in: 10.112.142.21:7210
Moving slot 45 from 10.112.142.20:6380 to 10.112.142.21:7210:
*** Target key exists. Replacing it for FIX.
/usr/local/lib/ruby/gems/2.3.0/gems/redis-3.3.1/lib/redis/client.rb:121:in `call': MOVED 45 10.112.142.21:6380 (Redis::CommandError)
from /usr/local/bin/redis-trib.rb:942:in `rescue in move_slot'
from /usr/local/bin/redis-trib.rb:937:in `move_slot'
from /usr/local/bin/redis-trib.rb:607:in `fix_open_slot'
from /usr/local/bin/redis-trib.rb:422:in `block in check_open_slots'
from /usr/local/bin/redis-trib.rb:422:in `each'
from /usr/local/bin/redis-trib.rb:422:in `check_open_slots'
from /usr/local/bin/redis-trib.rb:360:in `check_cluster'
from /usr/local/bin/redis-trib.rb:1140:in `fix_cluster_cmd'
from /usr/local/bin/redis-trib.rb:1696:in `<main>'

 

Clearly this wasn't going to work; it reported a rather odd error saying our key already exists. Seeing that, did our cluster contain dirty data? We pulled out all the keys from both nodes and compared them, and found no duplicate keys, i.e. no dirty data. So what now? The cluster absolutely had to be scaled out, or it would never survive Singles' Day (Nov 11).

OK, luckily the tool offers another, manual way to move slots: reshard. Once we knew about it, we happily migrated most of the nodes, but migrating slot 45 broke again, with a problem similar to the one above.

It looked like the trouble lived in slot number 45. If we didn't solve it, the load on this node would stay high and could become the bottleneck on Singles' Day. Let's look at what reshard actually does. From the redis-trib.rb source, a reshard runs:
       cluster setslot importing

       cluster setslot migrating

       cluster getkeysinslot

       migrate

       setslot node
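So a single slot move can be replayed by hand, roughly as below (addresses, node ids and the key name are placeholders):

# Manual equivalent of moving slot 45 from a source node to a target node:
redis-cli -h <target_ip> -p <target_port> cluster setslot 45 importing <source_node_id>
redis-cli -h <source_ip> -p <source_port> cluster setslot 45 migrating <target_node_id>
# repeat until getkeysinslot returns nothing: fetch a batch of keys, MIGRATE each one
redis-cli -h <source_ip> -p <source_port> cluster getkeysinslot 45 100
redis-cli -h <source_ip> -p <source_port> migrate <target_ip> <target_port> <key> 0 60000
# finally announce the new owner (on both nodes, ideally on every master):
redis-cli -h <source_ip> -p <source_port> cluster setslot 45 node <target_node_id>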

Those are the operations a reshard performs. Given how the reshard failed, it might have been a timeout during migration, or a very large key blocking redis.
OK, that's our hypothesis so far; next let's verify it and see whether slot 45 holds an unusually large key.
First, using the information above, list the keys in that slot:

10.205.142.21:6380> CLUSTER GETKEYSINSLOT 45 100
1) "JIUKUIYOU_COM_GetCouponInfo_3479num"
2) "com.juanpi.api.user_hbase_type"
3) "t2526767"
4) "t2593793"
…………

Then check how much space each key occupies when serialized:
10.205.142.21:6380> DEBUG OBJECT com.juanpi.api.user_hbase_type
Value at:0x7f973b7226d0 refcount:1 encoding:hashtable serializedlength:489435339 lru:1802371 lru_seconds_idle:3013 [466 MB!!!!]
(7.11s)
So here is a decidedly big key. Let's see how many fields it holds:

10.205.142.21:6380> HLEN com.juanpi.api.user_hbase_type
-> Redirected to slot [45] located at 10.205.142.20:6380
(integer) 6589164
10.205.142.20:6380> HSCAN com.juanpi.api.user_hbase_type 0
1) "3670016"
2) 1) "6581cc3950e071873763e4b016b66914"
2) "{\"type\":\"A1\",\"time\":1434625093}"
3) "4baca6b94be68d704f348ee0a3e45915"
4) "{\"type\":{\"A\":\"A3\",\"C\":\"C1\"},\"time\":1442105611}"
5) "b83ce222b54890bf4de03cfad2362e9e"
6) "{\"type\":\"A1\",\"time\":1434672804}"
7) "821d1f1a27c63d6d40bc1d969bcec5f6"
8) "{\"type\":{\"A\":\"A6\",\"C\":\"C3\"},\"time\":1435705301}"
9) "cbce79067ad2773b87360fb91b9a325c"
10) "{\"type\":\"A3\",\"time\":1433925484}"
11) "8eef2bc24fd819687e017f7bd1ad8e1c"
12) "{\"type\":{\"A\":\"A6\",\"C\":\"C2\"},\"time\":1435543716}"
13) "d4112308e47066f2e3d35dbcf96ba092"
14) "{\"type\":{\"A\":\"A1\",\"C\":\"C4\"},\"time\":1435141321}"
15) "2ca72205e82f5d9188cc9c56285ea161"
16) "{\"type\":{\"A\":\"A2\",\"C\":\"C3\"},\"time\":1434946176}"
17) "b259ce800a0112dbd316f54aae7679a6"
18) "{\"type\":{\"A\":\"A6\",\"C\":\"\"},\"time\":1441892174}"
19) "039b9011d1470791143563a2660d8dc2"
20) "{\"type\":{\"A\":\"A6\",\"C\":\"C3\"},\"time\":1435442319}"
21) "82def11dfe206501074eb200558fb8a5"
22) "{\"type\":\"C2\",\"time\":1434466702}"

 

At this point we had basically pinned the problem on an abnormally large key inside this slot. But was that key really the cause, and how would we verify it?
Our first thought: can we skip this slot during migration, or migrate only specified slots?
Migrating specified slots is not supported by the stock tool; implementing it would mean decomposing the steps above by hand and writing the logic ourselves.
Skipping the slot looked somewhat easier to achieve, so we patched the redis-trib.rb source:

First find the entry point: def reshard_cluster_cmd(argv,opt)    (around line 1200)
Then find this line: print "Do you want to proceed with the proposed reshard plan (yes/no)? "
A few lines below it, add the logic: if the current slot is slot 45, skip it; otherwise migrate it

if !opt['yes']
        print "Do you want to proceed with the proposed reshard plan (yes/no)? "
        yesno = STDIN.gets.chop
        exit(1) if (yesno != "yes")
    end
    reshard_table.each{|e|
        xputs "------------------------> #{e[:slot]}"
        case e[:slot]
        when 45
            puts "skipping slot 45"
        else
            move_slot(e[:source],target,e[:slot],
                      :dots=>true,
                      :pipeline=>opt['pipeline'])
        end
    }

Ok,我们就来试一把吧,看看跳过这个solt之后, 我们重新分片是否成功!那么结果符合我们的猜测,就是由于这个solt 里边有个超大的key导致的。事后通过更业务方商量,这个key是个无效的key,可以删掉的!呵呵

最后,问题到此已经顺利解决了。

总结一下:
有时候报错的信息不一定能够准确的反应问题所在,我们需要清理在报错期间我们执行了什么操作,这个操作的具体步骤有哪些,涉及道德这个系统的原理又是怎么样的。通过一步步的推理、猜测、验证最终得到问题的解答。

上边那种粗暴的方法来修改源码其实还有没有考虑到的地方,
比如有些除了这个solt之外又没有其它solt 有类似这个大的key呢?
是不是应该在迁移solt前,对整个solt的key大小进行一次扫描,检查呢?
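As a rough answer to that last question, a pre-check along the following lines could flag oversized keys in a slot before it is migrated (a sketch with assumed values; DEBUG OBJECT's serializedlength is only an approximation, and on newer redis a simpler first pass is redis-cli --bigkeys against each master):

#!/bin/bash
# Flag keys in one slot whose serialized size exceeds a threshold (values assumed).
host=10.112.142.20; port=6380; slot=45; limit=10485760   # 10 MB
redis-cli -h $host -p $port cluster getkeysinslot $slot 1000 | while read key
do
    size=$(redis-cli -h $host -p $port debug object "$key" | grep -o 'serializedlength:[0-9]*' | cut -d: -f2)
    [ -n "$size" ] && [ "$size" -gt "$limit" ] && echo "big key: $key ($size bytes)"
done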

redis-trib.rb command reference

redis-trib.rb is the official management tool for Redis Cluster. No separate download is needed; it sits in the src directory of the source package by default. Because the tool is written in Ruby, the relevant runtime dependencies must be prepared first.

 

Preparing the runtime environment for redis-trib.rb

wget https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.1.tar.gz

yum -y install zlib-devel

tar xvf ruby-2.5.1.tar.gz

cd ruby-2.5.1/

./configure --prefix=/usr/local/ruby

make

make install

cd /usr/local/ruby/

cp bin/ruby /usr/local/bin

cp bin/gem /usr/local/bin

 

Install the redis rubygem dependency

wget http://rubygems.org/downloads/redis-3.3.0.gem

gem install -l redis-3.3.0.gem

 

Operations supported by redis-trib.rb

# redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>

  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  check           host:port
  info            host:port
  fix             host:port
                  --timeout <arg>
  reshard         host:port
                  --from <arg>
                  --to <arg>
                  --slots <arg>
                  --yes
                  --timeout <arg>
                  --pipeline <arg>
  rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>
  add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  call            host:port command arg arg .. arg
  import          host:port
                  --from <arg>
                  --copy
                  --replace
  help            (show this help)

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.

The supported operations are:

1. create: create a cluster

2. check: check a cluster

3. info: show cluster information

4. fix: repair a cluster

5. reshard: migrate slots online

6. rebalance: balance the number of slots across nodes

7. add-node: add a new node

8. del-node: remove a node

9. set-timeout: set the node timeout

10. call: run a command on every node of the cluster

11. import: import external redis data into the cluster

 

Creating a cluster

redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384

The --replicas parameter sets how many replicas each master in the cluster gets; here it is set to 1.

>>> Creating cluster
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:6379
127.0.0.1:6380
127.0.0.1:6381
Adding replica 127.0.0.1:6383 to 127.0.0.1:6379
Adding replica 127.0.0.1:6384 to 127.0.0.1:6380
Adding replica 127.0.0.1:6382 to 127.0.0.1:6381
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:0-5460 (5461 slots) master
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:5461-10922 (5462 slots) master
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:10923-16383 (5461 slots) master
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   replicates d874f003257f1fb036bbd856ca605172a1741232
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   slots: (0 slots) slave
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   slots: (0 slots) slave
   replicates d874f003257f1fb036bbd856ca605172a1741232
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   slots: (0 slots) slave
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

All 16384 slots were assigned: the cluster was created successfully. Note: the node addresses passed to redis-trib.rb must be nodes holding no slots/data, otherwise it refuses to create the cluster.

>>> Creating cluster
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
[ERR] Node 127.0.0.1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

The algorithm for choosing master/slave nodes and allocating slots is as follows:

1> Group the nodes by host, so that masters can be spread across as many machines as possible.

2> Iterate over the host lists, popping one node from each host into an interleaved array, until every node has been popped.

3> Take the first <number of masters> nodes of the interleaved array as the masters array.

4> Compute how many slots each master is responsible for: 16384 divided by the number of masters, rounded down; call this N.

5> Iterate over the masters array, giving each master N slots; the last master is given whatever slots remain.

6> Next, assign slaves to the masters; the algorithm tries to keep a master and its slave on different hosts. After each master has its requested number of slaves, any leftover nodes are also assigned masters. The algorithm walks the masters array twice.

7> On the first pass, for each master it picks <replicas> slaves from the remaining node list. Each slave is the first node whose host differs from the master's; if no such node exists, it simply takes the first node of the remaining list.

8> The second pass distributes the extra nodes left over when the node count is not an exact multiple of replicas.
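Steps 4-5 are plain integer division; the following sketch mimics them for 3 masters (the stock tool's rounding differs slightly; the create output above shows it giving the extra slots to a middle node, 5461/5462/5461):

# Approximate slot-range computation of steps 4-5, for 3 masters.
masters=3
n=$(( 16384 / masters ))   # N = 5461
start=0
for i in $(seq 1 $masters); do
    if [ $i -eq $masters ]; then stop=16383; else stop=$(( start + n - 1 )); fi
    echo "master $i: slots ${start}-${stop}"
    start=$(( stop + 1 ))
done
# -> master 1: 0-5460, master 2: 5461-10921, master 3: 10922-16383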

 

Checking cluster state

redis-trib.rb check 127.0.0.1:6379

Any single node of the cluster may be given.

/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   slots: (0 slots) slave
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   slots: (0 slots) slave
   replicates d874f003257f1fb036bbd856ca605172a1741232
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   slots: (0 slots) slave
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

 

Viewing cluster information

redis-trib.rb info 127.0.0.1:6383

/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
127.0.0.1:6380 (3b27d00d...) -> 0 keys | 5462 slots | 1 slaves.
127.0.0.1:6381 (d874f003...) -> 1 keys | 5461 slots | 1 slaves.
127.0.0.1:6379 (bc775f9c...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.

 

Repairing a cluster

The fix command can currently repair two kinds of anomaly:

1. slots stuck mid-migration (in importing or migrating state) on some node;

2. slots that are unassigned.

Other anomalies cannot be repaired with the fix command.

[root@slowtech conf]# redis-trib.rb fix 127.0.0.1:6379
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
S: d826c5fd98efa8a17a880e9a90a25f06c88e6ae9 127.0.0.1:6379
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
S: 55c05d5b0dfea0d52f88548717ddf24975268de6 127.0.0.1:6383
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
M: f413fb7e6460308b17cdb71442798e1341b56cbc 127.0.0.1:6381
   slots:50-16383 (16334 slots) master
   2 additional replica(s)
S: beba753c5a63607fa66d9ec7427ed9a511ea136e 127.0.0.1:6382
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
S: 83797d518e56c235272402611477f576973e9d34 127.0.0.1:6384
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
M: a8b3d0f9b12d63dab3b7337d602245d96dd55844 127.0.0.1:6380
   slots:0-49 (50 slots) master
   2 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

 

Migrating slots online

Interactive use

For example,

redis-trib.rb reshard 127.0.0.1:6379

Any single node may be given.

/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:3225-5460 (2236 slots) master
   1 additional replica(s)
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   slots: (0 slots) slave
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:0-3224,5461-13958 (11723 slots) master
   1 additional replica(s)
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   slots: (0 slots) slave
   replicates d874f003257f1fb036bbd856ca605172a1741232
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   slots: (0 slots) slave
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:13959-16383 (2425 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 200
What is the receiving node ID? 3b27d00d13706a032a92ff6b0a914af272dcaaf2
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:

It first asks how many slots should be migrated; I entered 200 here.

Then it asks which node should receive the slots; a node ID must be entered here.

Right after, it asks which nodes the slots should be migrated out of.

If you answer all, the slots to migrate are spread evenly over the remaining nodes; here 127.0.0.1:6379 and 127.0.0.1:6381 each migrate out 100 slots.

You can also migrate out of specific nodes; in that case the source node IDs must be given, finished with done, as shown below:

Source node #1:bc775f9c4dea40820b82c9451778b1fcd42f92bc
Source node #2:done

Ready to move 200 slots.
  Source nodes:
    M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:3225-5460 (2236 slots) master
   1 additional replica(s)
  Destination node:
    M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:0-3224,5461-13958 (11723 slots) master
   1 additional replica(s)
  Resharding plan:
    Moving slot 3225 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    Moving slot 3226 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    Moving slot 3227 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    ...
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 3225 from 127.0.0.1:6379 to 127.0.0.1:6380: .
Moving slot 3226 from 127.0.0.1:6379 to 127.0.0.1:6380: 
Moving slot 3227 from 127.0.0.1:6379 to 127.0.0.1:6380: ..
Moving slot 3228 from 127.0.0.1:6379 to 127.0.0.1:6380: 
...

Finally, it asks whether to proceed.

 

Command-line use

redis-trib.rb reshard host:port --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg>

Where:

host:port: required; the address of any node in the cluster, used to fetch the full cluster information.

--from: source node id(s); separate multiple source nodes with commas; if all, the sources are every master in the cluster except the target.

--to: target node id; exactly one may be given.

--slots: total number of slots to migrate.

--yes: perform the migration without manual confirmation.

--timeout: timeout of each migrate operation, default 60000 milliseconds.

--pipeline: number of keys migrated per batch, default 10.

For example,

redis-trib.rb reshard --from a8b3d0f9b12d63dab3b7337d602245d96dd55844 --to f413fb7e6460308b17cdb71442798e1341b56cbc  --slots 10923 --yes --pipeline 20 127.0.0.1:6383

 

Balancing the slot counts of cluster nodes

rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>

Where:

--weight <arg>: node weight, in the form node_id=weight. To give several nodes weights, pass several --weight <arg> parameters, i.e. --weight b31e3a2e=5 --weight 60b8e3a1=5. node_id may be a prefix of the node name, as long as the prefix uniquely identifies the node. Nodes given no --weight default to weight 1.

--auto-weights: sets every node's weight to the default 1. If --weight and --auto-weights are both given, --auto-weights overrides the former.

--threshold <arg>: rebalance only runs when the number of slots a node would need to move exceeds the threshold.

--use-empty-masters: by default, masters with no slots assigned do not take part in rebalancing; add this parameter to include them.

--timeout <arg>: timeout of the migrate command.

--simulate: with this parameter, the tool only reports which slots would be moved, without performing the actual migration.

--pipeline <arg>: how many keys a single cluster getkeysinslot call fetches; defaults to 10 if not given.

For example,

# redis-trib.rb rebalance --weight a8b3d0f9b12d63dab3b7337d602245d96dd55844=3 --weight f413fb7e6460308b17cdb71442798e1341b56cbc=2  --use-empty-masters  127.0.0.1:6379
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Rebalancing across 2 nodes. Total weight = 5.0
Moving 3824 slots from 127.0.0.1:6380 to 127.0.0.1:6381
#########################################...

 

Removing a node

redis-trib.rb del-node host:port node_id

Before a node can be deleted, its slots must be empty, so before removing it you must migrate them away with redis-trib.rb reshard, e.g. as sketched below.

Note that once a node's slots have been fully migrated away, its slaves are updated as well, repointing at the target node of the migration.
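A typical removal of a master that still owns 4096 slots therefore boils down to two commands (the node ids are placeholders; --yes skips the interactive prompts):

# 1. hand the node's slots to another master, 2. then remove the now-empty node
redis-trib.rb reshard --from <node_id_to_remove> --to <other_master_id> --slots 4096 --yes 127.0.0.1:6379
redis-trib.rb del-node 127.0.0.1:6379 <node_id_to_remove>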

# redis-trib.rb del-node 127.0.0.1:6379 8f7836a9a14fb6638530b42e04f5e58e28de0a6c
>>> Removing node 8f7836a9a14fb6638530b42e04f5e58e28de0a6c from cluster 127.0.0.1:6379
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

 

Adding a new node

redis-trib add-node new_host:new_port existing_host:existing_port --slave --master-id <arg>

Where:

new_host:new_port: the node to add; it must be empty and not part of another cluster, otherwise the following error is reported.

[ERR] Node 127.0.0.1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

So in production it is advisable to add new nodes with redis-trib.rb, since it checks the new node's state. If you manually join a node that already belongs to another cluster with cluster meet, that node's cluster is merged into the current cluster, causing data loss and corruption; the consequences are severe, so operate with great care in production.

existing_host:existing_port: the address of any node already in the cluster.

If a master is being added, only the new node and an existing node need to be specified:

redis-trib.rb add-node 127.0.0.1:6379 127.0.0.1:6384

If a slave is being added, the syntax is as follows:

redis-trib.rb add-node --slave --master-id f413fb7e6460308b17cdb71442798e1341b56cbc 127.0.0.1:6379 127.0.0.1:6384

Note: --slave and --master-id must come first. The same parameters written in the following order produce an error:

# redis-trib.rb add-node 127.0.0.1:6379 127.0.0.1:6384 --slave --master-id f413fb7e6460308b17cdb71442798e1341b56cbc
[ERR] Wrong number of arguments for specified sub command

When adding a slave, --master-id may be omitted, in which case a master is chosen at random.

 

Setting the node timeout

redis-trib.rb set-timeout host:port milliseconds

This is simply a batch update of the cluster-node-timeout parameter on every node of the cluster.

# redis-trib.rb set-timeout 127.0.0.1:6379 20000
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Reconfiguring node timeout in every cluster node...
*** New timeout set for 127.0.0.1:6379
*** New timeout set for 127.0.0.1:6383
*** New timeout set for 127.0.0.1:6381
*** New timeout set for 127.0.0.1:6382
*** New timeout set for 127.0.0.1:6384
*** New timeout set for 127.0.0.1:6380
>>> New node timeout set. 6 OK, 0 ERR.

 

Running a command on every node of the cluster

redis-trib.rb call host:port command arg arg .. arg

For example,

[root@slowtech conf]# redis-trib.rb call 127.0.0.1:6379 set hello world
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Calling SET hello world
127.0.0.1:6379: MOVED 866 127.0.0.1:6381
127.0.0.1:6383: MOVED 866 127.0.0.1:6381
127.0.0.1:6381: OK
127.0.0.1:6382: MOVED 866 127.0.0.1:6381
127.0.0.1:6384: MOVED 866 127.0.0.1:6381
127.0.0.1:6380: MOVED 866 127.0.0.1:6381

[root@slowtech conf]# redis-trib.rb call 127.0.0.1:6379 get hello
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Calling GET hello
127.0.0.1:6379: MOVED 866 127.0.0.1:6381
127.0.0.1:6383: MOVED 866 127.0.0.1:6381
127.0.0.1:6381: world
127.0.0.1:6382: MOVED 866 127.0.0.1:6381
127.0.0.1:6384: MOVED 866 127.0.0.1:6381
127.0.0.1:6380: MOVED 866 127.0.0.1:6381

 

Importing external redis data into the cluster

redis-trib.rb import --from 127.0.0.1:6378 127.0.0.1:6379

Internally it proceeds as follows:

1> Load the cluster information via the load_cluster_info_from_node method and check cluster health via the check_cluster method.

2> Connect to the external redis node; if the external node has cluster_enabled on, report an error ([ERR] The source node should not be a cluster node.)

3> Walk the external node with the scan command, fetching 1000 entries at a time.

4> For each of those keys, compute the slot it maps to.

5> Run the migrate command with the external node as source and the cluster node owning the slot as destination. If --copy was set, pass copy, which keeps the key on the source; if --replace was set, pass replace, so that a key of the same name on the target is overwritten. Both parameters may be given together.

6> Keep issuing scan until every key has been visited.

7> Migration complete.

[root@slowtech conf]# redis-trib.rb import --from 127.0.0.1:6378 --replace  127.0.0.1:6379 
>>> Importing data from 127.0.0.1:6378 to cluster 
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
S: d826c5fd98efa8a17a880e9a90a25f06c88e6ae9 127.0.0.1:6379
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
S: 55c05d5b0dfea0d52f88548717ddf24975268de6 127.0.0.1:6383
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
M: f413fb7e6460308b17cdb71442798e1341b56cbc 127.0.0.1:6381
   slots:50-16383 (16334 slots) master
   2 additional replica(s)
S: beba753c5a63607fa66d9ec7427ed9a511ea136e 127.0.0.1:6382
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
S: 83797d518e56c235272402611477f576973e9d34 127.0.0.1:6384
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
M: a8b3d0f9b12d63dab3b7337d602245d96dd55844 127.0.0.1:6380
   slots:0-49 (50 slots) master
   2 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Connecting to the source Redis instance
*** Importing 1 keys from DB 0
Migrating key5 to 127.0.0.1:6381: OK

 


Adding nodes to a redis cluster, slot allocation, removing nodes, and [ERR] Calling MIGRATE ERR Syntax error, try CLIENT (LIST | KILL | GET……

Original link: http://www.jianshu.com/p/ba09ff851a6b
  1. redis reshard: reallocating slots
    https://github.com/antirez/redis/issues/5029 — redis upstream has confirmed this bug
  • redis cluster reshard failure
    [ERR] Calling MIGRATE ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
  • Error background
    redis version: 4.0.1
    ruby gem redis version: 4.0.0

    The redis gem installed via ruby gem must not be the latest 4.0, otherwise redis-trib.rb reshard 172.16.160.60:6377 fails with the error above when resharding

  • Solution

    a. Uninstall the latest redis gem: gem uninstall redis
    b. Install a 3.x version: gem install redis -v 3.3.5 (tested: 3.2.1 through 3.3.5 all work, 4.x fails when resharding)

  2. Fixing slots stuck in migrating/importing state
  • Error description
    [WARNING] Node 172.16.160.34:6368 has slots in migrating state (6390)
    [WARNING] Node 172.16.160.61:6377 has slots in migrating state (6390)
    [WARNING] The following slots are open: 6390
  • Fix:

    Log in to the two nodes that report the error and run the clearing command on each, as shown below
    cluster setslot 6390 stable
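For example, with the two addresses from the warnings above:

redis-cli -h 172.16.160.34 -p 6368 cluster setslot 6390 stable
redis-cli -h 172.16.160.61 -p 6377 cluster setslot 6390 stable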

  3. Adding nodes to a redis cluster
    There are two ways to add a node: as a master or as a slave
  • List the existing redis nodes
172.16.160.60:6377> cluster nodes
afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368@16368 master - 0 1537168776000 12 connected 0-1364 5461-6826 10923-12287
c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367@16367 slave e4c7c9cb80caf727cb5724af7b47ce0b462b9749 0 1537168776158 4 connected
bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367@16367 slave 92ab349e2f5723cec93e8b3e26af1d4062cd1469 0 1537168776000 5 connected
e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377@16377 master - 0 1537168778163 3 connected 12288-16383
78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377@16377 master - 0 1537168774000 11 connected 6827-10922
92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377@16377 myself,master - 0 1537168775000 1 connected 1365-5460
0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367@16367 slave 78fd5441a07f6762d821b51fa330d535239953fe 0 1537168777161 11 connected
0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369@16369 slave afa51bebb90da31a1da4912c762edfdb713411c5 0 1537168775000 12 connected
  • Add a new (master) node

    add-node <ip:port of the node to add> <ip:port identifying the target cluster>

[root@yoyo60 bin]# ./redis-trib.rb add-node 172.16.160.61:6368 172.16.160.60:6377
>>> Adding node 172.16.160.61:6368 to cluster 172.16.160.60:6377
>>> Performing Cluster Check (using node 172.16.160.60:6377)
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.16.160.61:6368 to make it join the cluster.
[OK] New node added correctly.
  • View the newly added master

    The master was added successfully with node id 117dd5c58a92c602ee6fc2df2d76a6bb3216654f. The new master has no slots assigned yet, so it is unusable (without slots, no data will be stored on this node)

[root@yoyo60 bin]# ./redis-trib.rb check 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   0 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  • Add a slave node and specify its master

    --master-id specifies the master's node id (whose slave the new node will become); the first ip:port is the slave node to add, the second ip:port identifies the target cluster

[root@yoyo60 bin]# ./redis-trib.rb add-node --slave --master-id 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6369 172.16.160.61:6368
>>> Adding node 172.16.160.61:6369 to cluster 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   0 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 172.16.160.61:6369 to make it join the cluster.
Waiting for the cluster to join...
>>> Configure node as replica of 172.16.160.61:6368.
[OK] New node added correctly.
  • Check the cluster state

    The slave was added successfully. The output shows 172.16.160.61:6369 as a slave whose replicates field records the master id 117dd5c58a92c602ee6fc2df2d76a6bb3216654f

[root@yoyo60 bin]# ./redis-trib.rb check 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   1 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: 66c7917a991442068c2741207980fb5d8f60e218 172.16.160.61:6369
   slots: (0 slots) slave
   replicates 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

But the cluster information above also shows:

M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   1 additional replica(s)

slots: (0 slots) master shows that no slots were assigned to this master, so it serves no data; we have to assign slots to the cluster node by hand.

  4. Assigning slots

As seen above, the newly added master has no slots assigned; we need to assign some manually.

  • Use the reshard command to redistribute the cluster's slots

The ip:port identifies the cluster to operate on

[root@yoyo60 bin]# ./redis-trib.rb reshard 172.16.160.60:6367
>>> Performing Cluster Check (using node 172.16.160.60:6367)
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 66c7917a991442068c2741207980fb5d8f60e218 172.16.160.61:6369
   slots: (0 slots) slave
   replicates 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)?

It asks how many slots should be moved to 172.16.160.61:6368. There are 16384 slots in total and now five masters; for an even distribution 16384/5 ≈ 3276, so we move 3276 slots.

How many slots do you want to move (from 1 to 16384)? 3276
What is the receiving node ID?

Now it asks for the nodeId of the node that will receive these slots. For 172.16.160.61:6368 the corresponding nodeId is 117dd5c58a92c602ee6fc2df2d76a6bb3216654f

How many slots do you want to move (from 1 to 16384)? 3276
What is the receiving node ID? 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:

It asks which nodes the slots should be taken from. Since we want an even distribution, we take slots from all the other masters; following the prompt, enter: all
That way every master in the cluster becomes a source node; redis-trib takes some hash slots from each source until it has 3276, then moves them to the 172.16.160.61:6368 node:

Recommended:

Source node #1: all

You can also list the node ids to take slots from, ending with done (example only, not a reference; use this when specific source nodes are wanted):

Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:117dd5c58a92c602ee6fc2df2d76a6bb3216654f
Source node #2:done

Next it asks whether to proceed with the migration; enter yes/no

Do you want to proceed with the proposed reshard plan (yes/no)? 

Enter yes and redis-trib does the resharding for us. Of course we hit a series of problems during resharding too; see items 1 and 2 above.

  • Check the cluster state

    The newly added master 172.16.160.61:6368 now has slots assigned and can serve reads and writes

[root@yoyo60 bin]# ./redis-trib.rb check 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots:0-818,1365-2183,6827-7645,12288-13106 (3276 slots) master
   1 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:2184-5460 (3277 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:7646-10922 (3277 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:13107-16383 (3277 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:819-1364,5461-6826,10923-12287 (3277 slots) master
   1 additional replica(s)
S: 66c7917a991442068c2741207980fb5d8f60e218 172.16.160.61:6369
   slots: (0 slots) slave
   replicates 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  5. Removing nodes

    Removing nodes is where I hit the most pitfalls, verified first-hand. If the redis cluster's 16384 slots are not fully covered, the cluster fails. So it is strongly recommended to first hand the slots of the node being removed over to other nodes, and only then remove the node.

  • To remove a node, first hand over its slots with reshard.
    Same as before: use reshard to redistribute the slots; the steps are basically identical, but keep the per-node slot counts balanced.

Remove the master 172.16.160.61:6368

[root@yoyo60 bin]# ./redis-trib.rb reshard 172.16.160.60:6367
>>> Performing Cluster Check (using node 172.16.160.60:6367)
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:7646-10922 (3277 slots) master
   1 additional replica(s)
S: 66c7917a991442068c2741207980fb5d8f60e218 172.16.160.61:6369
   slots: (0 slots) slave
   replicates 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:2184-5460 (3277 slots) master
   1 additional replica(s)
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:13107-16383 (3277 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:819-1364,5461-6826,10923-12287 (3277 slots) master
   1 additional replica(s)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots:0-818,1365-2183,6827-7645,12288-13106 (3276 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)?
  • Enter how many slots to move out. 172.16.160.61:6368 holds 3276 slots (0-818,1365-2183,6827-7645,12288-13106), which were moved in from four nodes when they were assigned, so we decide to move them out in four rounds.
How many slots do you want to move (from 1 to 16384)? 819
  • Enter the node id that receives the moved-out slots
How many slots do you want to move (from 1 to 16384)? 819
What is the receiving node ID? afa51bebb90da31a1da4912c762edfdb713411c5
  • Enter the node id the slots are moved out of, ending with done
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:117dd5c58a92c602ee6fc2df2d76a6bb3216654f
Source node #2:done
  • Confirm the move, yes/no
Do you want to proceed with the proposed reshard plan (yes/no)? yes

Repeat for the remaining three rounds, then check the cluster state.

[root@yoyo60 bin]# ./redis-trib.rb check 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   0 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   2 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: 66c7917a991442068c2741207980fb5d8f60e218 172.16.160.61:6369
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  • Remove the master's slave node

Removing a node differs from adding one: you give the cluster's ip:port plus the nodeId of the node being removed

[root@yoyo60 bin]# ./redis-trib.rb del-node 172.16.160.60:6377 66c7917a991442068c2741207980fb5d8f60e218
>>> Removing node 66c7917a991442068c2741207980fb5d8f60e218 from cluster 172.16.160.60:6377
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

The slave was removed successfully; check the cluster state

The slave node has been removed

[root@yoyo60 bin]# ./redis-trib.rb check 172.16.160.61:6368
>>> Performing Cluster Check (using node 172.16.160.61:6368)
M: 117dd5c58a92c602ee6fc2df2d76a6bb3216654f 172.16.160.61:6368
   slots: (0 slots) master
   0 additional replica(s)
S: c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367
   slots: (0 slots) slave
   replicates e4c7c9cb80caf727cb5724af7b47ce0b462b9749
M: 92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367
   slots: (0 slots) slave
   replicates 92ab349e2f5723cec93e8b3e26af1d4062cd1469
S: 0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367
   slots: (0 slots) slave
   replicates 78fd5441a07f6762d821b51fa330d535239953fe
M: 78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369
   slots: (0 slots) slave
   replicates afa51bebb90da31a1da4912c762edfdb713411c5
M: e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
  • Remove the master node; the steps are basically the same as for removing a slave
[root@yoyo60 bin]# ./redis-trib.rb del-node 172.16.160.60:6377 117dd5c58a92c602ee6fc2df2d76a6bb3216654f
>>> Removing node 117dd5c58a92c602ee6fc2df2d76a6bb3216654f from cluster 172.16.160.60:6377
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
  • Master removed successfully; log in to the cluster and check the node state.
[root@yoyo60 bin]# ./redis-cli -c -h 172.16.160.60 -p 6377
172.16.160.60:6377> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:4
cluster_current_epoch:17
cluster_my_epoch:15
cluster_stats_messages_ping_sent:292968
cluster_stats_messages_pong_sent:308843
cluster_stats_messages_update_sent:24
cluster_stats_messages_sent:601835
cluster_stats_messages_ping_received:308834
cluster_stats_messages_pong_received:292968
cluster_stats_messages_meet_received:9
cluster_stats_messages_update_received:1
cluster_stats_messages_received:601812
  • View node information
172.16.160.60:6377> cluster nodes
afa51bebb90da31a1da4912c762edfdb713411c5 172.16.160.34:6368@16368 master - 0 1537195995792 14 connected 0-1364 5461-6826 10923-12287
bdc98c07bdcfc5141a3a41af25ac5b1826aa9f2a 172.16.160.61:6367@16367 slave 92ab349e2f5723cec93e8b3e26af1d4062cd1469 0 1537195994000 15 connected
e4c7c9cb80caf727cb5724af7b47ce0b462b9749 172.16.160.34:6377@16377 master - 0 1537195994789 17 connected 12288-16383
92ab349e2f5723cec93e8b3e26af1d4062cd1469 172.16.160.60:6377@16377 myself,master - 0 1537195993000 15 connected 1365-5460
78fd5441a07f6762d821b51fa330d535239953fe 172.16.160.61:6377@16377 master - 0 1537195993000 16 connected 6827-10922
0b6f0cabbb8488f43a6b5c8a44c781656d3075d2 172.16.160.60:6367@16367 slave 78fd5441a07f6762d821b51fa330d535239953fe 0 1537195993000 16 connected
0a170716fa820a056a8826c63a5a4c02a9aaa34a 172.16.160.34:6369@16369 slave afa51bebb90da31a1da4912c762edfdb713411c5 0 1537195996795 14 connected
c447385f64b9294ee9fdab634254505e06dd3770 172.16.160.34:6367@16367 slave e4c7c9cb80caf727cb5724af7b47ce0b462b9749 0 1537195994000 17 connected

Data migration between a redis cluster and a standalone instance

Original link: http://www.jianshu.com/p/853ee422b6d1

1 The redis-migrate-tool migration tool

redis-migrate-tool is a redis data migration tool open-sourced by Vipshop; its GitHub page is redis-migrate-tool

1.1 Download and install

Download

git clone https://github.com/vipshop/redis-migrate-tool.git

Install

cd redis-migrate-tool
autoreconf -fvi
./configure
make
src/redis-migrate-tool -h
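The tool is driven by a configuration file. A minimal sketch for a standalone-to-cluster migration, based on the project's README (the addresses are placeholders and option details may vary by version):

[source]
type: single
servers:
 - 127.0.0.1:6379

[target]
type: redis cluster
servers:
 - 127.0.0.1:7000

[common]
listen: 0.0.0.0:8888

Run it with something like: src/redis-migrate-tool -c rmt.conf -o rmt.log -d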

1.2 Notes on redis data migration

After redis 4.0.x I found that Vipshop's open-source redis-migrate-tool no longer worked (possibly a problem on my end).
I later collected these two ways of migrating data from a standalone instance into a cluster:

  1. Use the redis-trib.rb import tool to migrate directly
/data/soft/redis/src/redis-trib.rb import --from ip1:port1 ip2:port2

This command migrates the data of the standalone instance ip1:port1 into the cluster ip2:port2.

  2. Use redis AOF persistence to migrate
    Principle: if AOF persistence is enabled, redis replays the commands in the .aof file into memory on every start.
    Note: assume a cluster of 3 masters and 3 slaves.
    1) Check the cluster's node information
/data/soft/redis/src/redis-trib.rb check ip:port
M: fbe0db1e845f6886de20096b5c46c9262057de7e 172.31.78.9:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: ebcc3a7ba0f96519de4300610bb2af522b9e4e48 172.31.78.10:6379
   slots:10922-16383 (5462 slots) master
   1 additional replica(s)
M: cd0c7659a6277a5267dc764d80dd0dd397432cd0 172.31.78.9:6479
   slots:5461-10921 (5461 slots) master
   1 additional replica(s)
S: 4d8e79b4ad96897af07aca6f5a326856c90fb520 172.31.78.10:6579
   slots: (0 slots) slave
   replicates fbe0db1e845f6886de20096b5c46c9262057de7e
S: b1f3c65cbb097b22ba43209bbbc990206a1d6bdd 172.31.78.10:6479
   slots: (0 slots) slave
   replicates ebcc3a7ba0f96519de4300610bb2af522b9e4e48
S: 91e0d4052e76968155c33e01a2d838b3db13b2d3 172.31.78.9:6579
   slots: (0 slots) slave
   replicates cd0c7659a6277a5267dc764d80dd0dd397432cd0
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

2) Shut down all slave nodes

3) Migrate all the slots on 172.31.78.10:6379 to 172.31.78.9:6379

/data/soft/redis/src/redis-trib.rb reshard --from ebcc3a7ba0f96519de4300610bb2af522b9e4e48 --to fbe0db1e845f6886de20096b5c46c9262057de7e --slots 5462 --yes 172.31.78.10:6379

4) Migrate all the slots on 172.31.78.9:6479 to 172.31.78.9:6379

/data/soft/redis/src/redis-trib.rb reshard --from cd0c7659a6277a5267dc764d80dd0dd397432cd0 --to fbe0db1e845f6886de20096b5c46c9262057de7e --slots 5461 --yes 172.31.78.10:6379

Now all the slots are concentrated on 172.31.78.9:6379.
5) Shut down the 172.31.78.9:6379 node, then copy the xxx.aof file from the directory of the standalone instance to be migrated into 172.31.78.9:6379's directory.
Note: after copying, the xxx.aof file name in 172.31.78.9:6379's directory must be the same as the name 172.31.78.9:6379 used before.
6) Start 172.31.78.9:6379

/data/soft/redis/src/redis-trib.rb info 172.31.78.9:6379
172.31.78.9:6379 (fbe0db1e...) -> 7 keys | 16384 slots | 0 slaves.
172.31.78.10:6379 (ebcc3a7b...) -> 0 keys | 0 slots | 0 slaves.
172.31.78.9:6479 (cd0c7659...) -> 0 keys | 0 slots | 0 slaves.
[OK] 7 keys in 3 masters.
0.00 keys per slot on average.

At this point the total number of keys in the cluster should equal that of the standalone instance.
Then the slots can be redistributed.
7) Move part of the slots on 172.31.78.9:6379 back to the 172.31.78.10:6379 node

/data/soft/redis/src/redis-trib.rb reshard --from fbe0db1e845f6886de20096b5c46c9262057de7e --to ebcc3a7ba0f96519de4300610bb2af522b9e4e48 --slots 5461 --yes 172.31.78.10:6379

8) Move part of the slots on 172.31.78.9:6379 back to the 172.31.78.9:6479 node

/data/soft/redis/src/redis-trib.rb reshard --from fbe0db1e845f6886de20096b5c46c9262057de7e --to cd0c7659a6277a5267dc764d80dd0dd397432cd0 --slots 5461 --yes 172.31.78.10:6379

9) Start the slave nodes
10) Check the node information

/data/soft/redis/src/redis-trib.rb info 172.31.78.9:6379
172.31.78.9:6379 (fbe0db1e...) -> 4 keys | 5461 slots | 1 slaves.
172.31.78.10:6379 (ebcc3a7b...) -> 1 keys | 5462 slots | 1 slaves.
172.31.78.9:6479 (cd0c7659...) -> 2 keys | 5461 slots | 1 slaves.
[OK] 7 keys in 3 masters.
0.00 keys per slot on average.

That completes the migration.
The method is a bit brute-force, but the idea can also be reversed: migrate a cluster's data into a standalone instance the same way.
ps: verified working first-hand.

Migrating data from a standalone Redis instance to a cluster

Environment
Standalone redis

192.168.41.101:6379
Redis cluster

192.168.41.101:7000 master
192.168.41.101:7001 master
192.168.41.101:7002

192.168.41.102:7000 master
192.168.41.102:7001
192.168.41.102:7002
Migration steps
Check the cluster state and the slot distribution across nodes
[root@blaze bin]# ./redis-cli -c -p 7000
127.0.0.1:7000> cluster nodes
ab6d9f956de325cb4cf001abc31df365a5db5234 192.168.41.102:7001 slave 923c9ea13ec8fc19bed309dfbfad094320e1ca41 0 1560304528263 7 connected
48629733acbb8a580a39403dfac92845d63c97b3 192.168.41.101:7001 master - 0 1560304529768 2 connected 5461-10921
2cec8ac00f760c45d86f7903cefad85ec36704e7 192.168.41.102:7002 slave 923c9ea13ec8fc19bed309dfbfad094320e1ca41 0 1560304531271 7 connected
923c9ea13ec8fc19bed309dfbfad094320e1ca41 192.168.41.101:7000 myself,master - 0 0 7 connected 0-5460
e65b319e83997ed6f5323a26aaccba3f35522cbd 192.168.41.101:7002 slave 923c9ea13ec8fc19bed309dfbfad094320e1ca41 0 1560304532274 7 connected
5c0888d5bcceda2904311cbd5405596217c48105 192.168.41.102:7000 master - 0 1560304530268 4 connected 10922-16383
[root@blaze src]# ./redis-trib.rb check 192.168.41.101:7000
>>> Performing Cluster Check (using node 192.168.41.101:7000)
M: 923c9ea13ec8fc19bed309dfbfad094320e1ca41 192.168.41.101:7000
slots:0-5460 (5461 slots) master
3 additional replica(s)
S: ab6d9f956de325cb4cf001abc31df365a5db5234 192.168.41.102:7001
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
M: 48629733acbb8a580a39403dfac92845d63c97b3 192.168.41.101:7001
slots:5461-10921 (5461 slots) master
0 additional replica(s)
S: 2cec8ac00f760c45d86f7903cefad85ec36704e7 192.168.41.102:7002
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
S: e65b319e83997ed6f5323a26aaccba3f35522cbd 192.168.41.101:7002
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
M: 5c0888d5bcceda2904311cbd5405596217c48105 192.168.41.102:7000
slots:10922-16383 (5462 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Move the slots of the two masters 101:7001 and 102:7000 onto 101:7000
./redis-trib.rb reshard --from 5c0888d5bcceda2904311cbd5405596217c48105 --to 923c9ea13ec8fc19bed309dfbfad094320e1ca41 --slots 5462 --yes 192.168.41.101:7000

./redis-trib.rb reshard --from 48629733acbb8a580a39403dfac92845d63c97b3 --to 923c9ea13ec8fc19bed309dfbfad094320e1ca41 --slots 5461 --yes 192.168.41.101:7000
Check the slot distribution after the move
>>> Performing Cluster Check (using node 192.168.41.101:7000)
M: 923c9ea13ec8fc19bed309dfbfad094320e1ca41 192.168.41.101:7000
slots:0-16383 (16384 slots) master
3 additional replica(s)
S: ab6d9f956de325cb4cf001abc31df365a5db5234 192.168.41.102:7001
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
M: 48629733acbb8a580a39403dfac92845d63c97b3 192.168.41.101:7001
slots: (0 slots) master
0 additional replica(s)
S: 2cec8ac00f760c45d86f7903cefad85ec36704e7 192.168.41.102:7002
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
S: e65b319e83997ed6f5323a26aaccba3f35522cbd 192.168.41.101:7002
slots: (0 slots) slave
replicates 923c9ea13ec8fc19bed309dfbfad094320e1ca41
M: 5c0888d5bcceda2904311cbd5405596217c48105 192.168.41.102:7000
slots: (0 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Copy 192.168.41.101:6379's appendonly.aof into 101:7000's aof file directory
cp <6379 dir>/appendonly.aof <7000 dir>
Restart the 101:7000 node so it loads the aof file
[root@blaze bin]# ./redis-cli -c -p 7000 shutdown

[root@blaze bin]# ./redis-server ../redis.conf
Verify the data is correct
dbsize
Distribute 101:7000's slots evenly back to the other two master nodes
./redis-trib.rb reshard --from 923c9ea13ec8fc19bed309dfbfad094320e1ca41 --to 5c0888d5bcceda2904311cbd5405596217c48105 --slots 5462 --yes 192.168.41.101:7000

./redis-trib.rb reshard --from 923c9ea13ec8fc19bed309dfbfad094320e1ca41 --to 48629733acbb8a580a39403dfac92845d63c97b3 --slots 5461 --yes 192.168.41.101:7000
Check the cluster state and slot distribution again.