Newly built TiDB cluster: only one of the three TiKV nodes starts, the other two keep restarting!

[TiDB environment] Test
[TiDB version]
A newly created TiDB cluster reports an error at startup:
Starting component tikv
Starting instance 10.42.32.4:20161
Starting instance 10.42.32.5:20160
Starting instance 10.42.32.3:20161
Start instance 10.42.32.5:20160 success
Start instance 10.42.32.4:20161 success

Error: failed to start tikv: failed to start: 10.42.32.3 tikv-20161.service, please check the instance's log(/opt/tidb-deploy/tikv-20161/log) for more detail.: timed out waiting for port 20161 to be started after 2m0s
Following the hint, I checked the TiKV log and found that the TiKV node that failed to start keeps restarting:
[2025/12/24 20:31:47.648 +08:00] [INFO] [resource_group.rs:151] ["add resource group"] [ru=2147483647] [name=default] [thread_id=1]
[2025/12/24 20:31:47.650 +08:00] [INFO] [resource_group.rs:151] ["add resource group"] [ru=2147483647] [name=default] [thread_id=26]
[2025/12/24 20:31:47.650 +08:00] [INFO] [service.rs:70] ["pd meta client creating watch stream."] [rev=5915] [path=resource_group/settings] [thread_id=26]
[2025/12/24 20:31:47.652 +08:00] [INFO] [service.rs:193] ["load controller config"] [config="RequestUnitConfig { read_base_cost: 0.125, read_cost_per_byte: 1.52587890625e-5, write_base_cost: 1.0, write_cost_per_byte: 0.0009765625, read_cpu_ms_cost: 0.3333333333333333 }"] [thread_id=25]
[2025/12/24 20:31:47.653 +08:00] [INFO] [mod.rs:130] ["encryption: none of key dictionary and file dictionary are found."] [thread_id=1]
[2025/12/24 20:31:47.654 +08:00] [INFO] [mod.rs:549] ["encryption is disabled."] [thread_id=1]
[2025/12/24 20:31:47.657 +08:00] [INFO] [engine.rs:93] ["Recovering raft logs takes 2.715525ms"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:91] ["Welcome to TiKV"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Release Version: 8.5.4"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Edition: Community"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Git Commit Hash: 4855bdccc64e7a8551d30ebbbd5be75a42929265"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Git Commit Branch: HEAD"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["UTC Build Time: 2025-11-20 06:49:28"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Rust Version: rustc 1.77.0-nightly (89e2160c4 2023-12-27)"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Enable Features: memory-engine pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine trace-async-tasks openssl-vendored"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Profile: dist_release"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [fips.rs:40] ["OpenSSL FIPS mode is disabled"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [mod.rs:125] ["cgroup quota: memory=Some(9223372036854771712), cpu=None, cores={14, 28, 30, 8, 10, 3, 16, 25, 21, 26, 13, 5, 11, 29, 23, 6, 15, 18, 22, 4, 2, 9, 31, 19, 1, 7, 20, 0, 27, 12, 24, 17}"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [mod.rs:132] ["memory limit in bytes: 135047979008, cpu cores quota: 32"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [WARN] [lib.rs:528] ["environment variable TZ is missing, using /etc/localtime"] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=32768] [param=net.core.somaxconn] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=0] [param=net.ipv4.tcp_syncookies] [thread_id=1]
[2025/12/24 20:32:03.568 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=0] [param=vm.swappiness] [thread_id=1]
[2025/12/24 20:32:03.586 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.3:2379] [thread_id=1]
[2025/12/24 20:32:03.587 +08:00] [INFO] [] ["TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter"] [thread_id=1]
[2025/12/24 20:32:03.591 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.4:2379] [thread_id=1]
[2025/12/24 20:32:03.594 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:03.595 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=http://10.42.32.3:2379] [thread_id=1]
[2025/12/24 20:32:03.597 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=http://10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:03.599 +08:00] [INFO] [util.rs:809] ["connected to PD member"] [endpoints=http://10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:03.600 +08:00] [INFO] [util.rs:631] ["all PD endpoints are consistent"] [endpoints="[\"10.42.32.3:2379\", \"10.42.32.4:2379\", \"10.42.32.5:2379\"]"] [thread_id=1]
[2025/12/24 20:32:03.602 +08:00] [INFO] [common.rs:327] ["connect to PD cluster"] [cluster_id=7587347571922148693] [thread_id=1]
[2025/12/24 20:32:03.606 +08:00] [INFO] [config.rs:438] ["using default coprocessor quota"] [quota=ReadableSize(16880997376)] [thread_id=1]
[2025/12/24 20:32:03.607 +08:00] [WARN] [mod.rs:2044] ["raft-engine.batch-compression-threshold 8KiB should be adpative to the size of async-io. Set it to 4KiB instead."] [thread_id=1]
[2025/12/24 20:32:03.608 +08:00] [INFO] [config.rs:438] ["using default coprocessor quota"] [quota=ReadableSize(16880997376)] [thread_id=1]
[2025/12/24 20:32:03.608 +08:00] [INFO] [common.rs:447] ["beginning system configuration check"] [thread_id=1]

[2025/12/24 20:32:03.618 +08:00] [INFO] [mod.rs:130] ["encryption: none of key dictionary and file dictionary are found."] [thread_id=1]
[2025/12/24 20:32:03.618 +08:00] [INFO] [mod.rs:549] ["encryption is disabled."] [thread_id=1]
[2025/12/24 20:32:03.620 +08:00] [INFO] [engine.rs:93] ["Recovering raft logs takes 1.92425ms"] [thread_id=1]
[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:91] ["Welcome to TiKV"] [thread_id=1]
[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:96] ["Release Version: 8.5.4"] [thread_id=1]
[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:96] ["Edition: Community"] [thread_id=1]
[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:96] ["Git Commit Hash: 4855bdccc64e7a8551d30ebbbd5be75a42929265"] [thread_id=1]
[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:96] ["Git Commit Branch: HEAD"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [lib.rs:96] ["UTC Build Time: 2025-11-20 06:49:28"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [lib.rs:96] ["Rust Version: rustc 1.77.0-nightly (89e2160c4 2023-12-27)"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [lib.rs:96] ["Enable Features: memory-engine pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine trace-async-tasks openssl-vendored"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [lib.rs:96] ["Profile: dist_release"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [fips.rs:40] ["OpenSSL FIPS mode is disabled"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [mod.rs:125] ["cgroup quota: memory=Some(9223372036854771712), cpu=None, cores={3, 5, 4, 31, 13, 10, 19, 7, 25, 20, 11, 16, 12, 15, 18, 0, 23, 17, 6, 22, 21, 1, 28, 8, 9, 30, 27, 14, 26, 2, 24, 29}"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [mod.rs:132] ["memory limit in bytes: 135047979008, cpu cores quota: 32"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [WARN] [lib.rs:528] ["environment variable TZ is missing, using /etc/localtime"] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=32768] [param=net.core.somaxconn] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=0] [param=net.ipv4.tcp_syncookies] [thread_id=1]
[2025/12/24 20:32:19.285 +08:00] [INFO] [config.rs:914] ["kernel parameters"] [value=0] [param=vm.swappiness] [thread_id=1]
[2025/12/24 20:32:19.304 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.3:2379] [thread_id=1]
[2025/12/24 20:32:19.304 +08:00] [INFO] [] ["TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter"] [thread_id=1]
[2025/12/24 20:32:19.308 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.4:2379] [thread_id=1]
[2025/12/24 20:32:19.310 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:19.311 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=http://10.42.32.3:2379] [thread_id=1]
[2025/12/24 20:32:19.313 +08:00] [INFO] [util.rs:639] ["connecting to PD endpoint"] [endpoints=http://10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:19.315 +08:00] [INFO] [util.rs:809] ["connected to PD member"] [endpoints=http://10.42.32.5:2379] [thread_id=1]
[2025/12/24 20:32:19.315 +08:00] [INFO] [util.rs:631] ["all PD endpoints are consistent"] [endpoints="[\"10.42.32.3:2379\", \"10.42.32.4:2379\", \"10.42.32.5:2379\"]"] [thread_id=1]
[2025/12/24 20:32:19.317 +08:00] [INFO] [common.rs:327] ["connect to PD cluster"] [cluster_id=7587347571922148693] [thread_id=1]
[2025/12/24 20:32:19.322 +08:00] [INFO] [config.rs:438] ["using default coprocessor quota"] [quota=ReadableSize(16880997376)] [thread_id=1]
[2025/12/24 20:32:19.322 +08:00] [WARN] [mod.rs:2044] ["raft-engine.batch-compression-threshold 8KiB should be adpative to the size of async-io. Set it to 4KiB instead."] [thread_id=1]
[2025/12/24 20:32:19.323 +08:00] [INFO] [config.rs:438] ["using default coprocessor quota"] [quota=ReadableSize(16880997376)] [thread_id=1]
[2025/12/24 20:32:19.324 +08:00] [INFO] [common.rs:447] ["beginning system configuration check"] [thread_id=1]
[2025/12/24 20:32:19.324 +08:00] [INFO] [config.rs:1101] ["data dir"] [mount_fs="FsInfo { tp: \"xfs\", opts: \"rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota\", mnt_dir: \"/opt\", fsname: \"/dev/vdb\" }"] [data_path=/opt/tidb-data/tikv-20162] [thread_id=1]
[2025/12/24 20:32:19.324 +08:00] [WARN] [config.rs:1104] ["not on SSD device"] [data_path=/opt/tidb-data/tikv-20162] [thread_id=1]
[2025/12/24 20:32:19.324 +08:00] [INFO] [config.rs:1101] ["data dir"] [mount_fs="FsInfo { tp: \"xfs\", opts: \"rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota\", mnt_dir: \"/opt\", fsname: \"/dev/vdb\" }"] [data_path=/opt/tidb-data/tikv-20162/raft] [thread_id=1]
[2025/12/24 20:32:19.325 +08:00] [WARN] [config.rs:1104] ["not on SSD device"] [data_path=/opt/tidb-data/tikv-20162/raft] [thread_id=1]

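The above is the TiKV log from node .3 (10.42.32.3). The startup output above shows TiKV on node .4 (10.42.32.4) as started successfully, but it goes offline right after starting, and its log likewise shows it restarting over and over. I tried changing the port in the meantime, with the same result; the port is not actually occupied by anything else.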
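
One way to confirm the restart loop from the log itself is to look at how often the "Welcome to TiKV" banner appears. The following is a minimal Python sketch, assuming the log lines start with the bracketed timestamp seen above; the function names are hypothetical and not part of any TiDB tooling.

```python
import re
from datetime import datetime

# Leading timestamp of a TiKV log line, e.g. [2025/12/24 20:32:03.568 +08:00]
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]")


def startup_times(lines):
    """Return the timestamp of every 'Welcome to TiKV' banner in the log."""
    times = []
    for line in lines:
        if "Welcome to TiKV" not in line:
            continue
        m = TS_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f %z"))
    return times


def restart_intervals(lines):
    """Seconds elapsed between consecutive startup banners."""
    times = startup_times(lines)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Three lines taken from the log above (two banners, one unrelated line).
sample = [
    '[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:91] ["Welcome to TiKV"] [thread_id=1]',
    '[2025/12/24 20:32:03.568 +08:00] [INFO] [lib.rs:96] ["Release Version: 8.5.4"] [thread_id=1]',
    '[2025/12/24 20:32:19.284 +08:00] [INFO] [lib.rs:91] ["Welcome to TiKV"] [thread_id=1]',
]
print(restart_intervals(sample))
```

For the two banners above the interval is about 15.7 seconds. Each run also stops logging within a fraction of a second of starting, with no ERROR or FATAL line, which suggests the process is being killed from outside its own error handling rather than exiting on a configuration problem.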

"not on SSD device". Are the disks SSDs?

No, none of the three servers has SSDs.

This looks like insufficient resources. How much CPU and memory do the servers have?

It seems to be a CPU issue: nodes .3 and .4 have Hygon CPUs, and node .5 has an Intel CPU.

Are they all x86 architecture?

Is there anything in stderr.log under the TiKV log directory?

Hygon is ARM.

Nothing in it.

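Check whether the operating system logs show anything.
You can log in to the target TiKV node and first stop the service so that it is no longer restarted automatically (systemctl stop tikv-20160.service), then go to the deploy directory, find the startup script there, and try running it by hand.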

[Deployment method] Machine deployment (what machine specs, what kind of disks)
[OS / CPU architecture / chip]

Please fill in this information.

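Check the stderr log for errors. TiKV generally has high requirements for memory, CPU, and disk I/O; if resources are insufficient, it gets killed right after it starts.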

CPUs of different architectures cannot be mixed in one deployment.

Solved: after switching everything to Intel CPUs, the nodes start normally.

It is probably a CPU instruction incompatibility. The system log reports:
Dec 25 10:22:13 host-10-42-32-3 kernel: [50256.828453] traps: tikv-server[183117] trap invalid opcode ip:560828c0604c sp:7fffe4d6e298 error:0 in tikv-server[560824800000+5f69000]
Dec 25 10:22:31 host-10-42-32-3 kernel: [50274.588000] traps: tikv-server[184129] trap invalid opcode ip:564e3420604c sp:7ffeb4415698 error:0 in tikv-server[564e2fe00000+5f69000]
Dec 25 10:22:48 host-10-42-32-3 kernel: [50292.086722] traps: tikv-server[185152] trap invalid opcode ip:55d8ad80604c sp:7ffd38b5f698 error:0 in tikv-server[55d8a9400000+5f69000]
Dec 25 10:23:06 host-10-42-32-3 kernel: [50309.835878] traps: tikv-server[186166] trap invalid opcode ip:55c4b120604c sp:7ffd0cf9d598 error:0 in tikv-server[55c4ace00000+5f69000]
Dec 25 10:23:24 host-10-42-32-3 kernel: [50327.591281] traps: tikv-server[187176] trap invalid opcode ip:55957260604c sp:7fff7de95818 error:0 in tikv-server[55956e200000+5f69000]
Dec 25 10:23:41 host-10-42-32-3 kernel: [50345.036584] traps: tikv-server[188131] trap invalid opcode ip:56299760604c sp:7ffd6187ac98 error:0 in tikv-server[562993200000+5f69000]
Dec 25 10:23:58 host-10-42-32-3 kernel: [50362.341914] traps: tikv-server[189036] trap invalid opcode ip:55cb95a0604c sp:7ffe516c4398 error:0 in tikv-server[55cb91600000+5f69000]
Dec 25 10:24:16 host-10-42-32-3 kernel: [50379.750578] traps: tikv-server[189167] trap invalid opcode ip:55c1a060604c sp:7fff5d5b4298 error:0 in tikv-server[55c19c200000+5f69000]
Dec 25 10:24:33 host-10-42-32-3 kernel: [50397.235844] traps: tikv-server[189294] trap invalid opcode ip:557437c0604c sp:7ffc4dec3e18 error:0 in tikv-server[557433800000+5f69000]
Dec 25 10:24:51 host-10-42-32-3 kernel: [50414.756052] traps: tikv-server[189436] trap invalid opcode ip:55ccc040604c sp:7ffcdf076098 error:0 in tikv-server[55ccbc000000+5f69000]
Dec 25 10:25:08 host-10-42-32-3 kernel: [50432.238081] traps: tikv-server[189593] trap invalid opcode ip:5594a500604c sp:7fff6a55be98 error:0 in tikv-server[5594a0c00000+5f69000]
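
To check whether every crash hits the same instruction, the trap lines can be reduced to an offset inside the mapped tikv-server image (ip minus the mapping base). Below is a small Python sketch under that assumption; the regular expression is written against the line format shown above and may need adjusting for other kernel versions, and the function name is hypothetical.

```python
import re

# Matches lines such as:
#   traps: tikv-server[183117] trap invalid opcode ip:... sp:... error:0 in tikv-server[base+size]
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(lines):
    """Return (pid, offset) pairs, where offset = faulting ip - mapping base."""
    out = []
    for line in lines:
        m = TRAP_RE.search(line)
        if m:
            out.append((int(m["pid"]), int(m["ip"], 16) - int(m["base"], 16)))
    return out


# The first two kernel log lines from above.
sample = [
    "Dec 25 10:22:13 host-10-42-32-3 kernel: [50256.828453] traps: tikv-server[183117] "
    "trap invalid opcode ip:560828c0604c sp:7fffe4d6e298 error:0 in tikv-server[560824800000+5f69000]",
    "Dec 25 10:22:31 host-10-42-32-3 kernel: [50274.588000] traps: tikv-server[184129] "
    "trap invalid opcode ip:564e3420604c sp:7ffeb4415698 error:0 in tikv-server[564e2fe00000+5f69000]",
]
for pid, offset in fault_offsets(sample):
    print(pid, hex(offset))
```

Both sample lines resolve to offset 0x440604c, so each new process dies on the same instruction; the absolute addresses differ only because the image is loaded at a different base each time. An invalid-opcode trap is raised when a running process executes an instruction the CPU does not implement, so these lines also show that the binary was loaded and running on that machine before it was killed.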

I remember the course mentioning that inconsistent CPU clock frequencies can cause problems.

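Deployed with TiUP
16 cores / 32 GB, ordinary HDDs (virtual machines)
Euler (欧拉) operating system
Cluster servers: a mix of Hygon (ARM architecture) and Intel (x86 architecture) CPUs
Problem description: TiKV failed to start on the nodes with Hygon CPUs and kept trying to restart. Investigation showed a CPU instruction incompatibility; after the CPUs were replaced with Intel ones, TiKV started normally.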
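
If the trap comes from a missing instruction-set extension, comparing the CPU flags of a working node and a failing node before deployment would show it. This is an assumption rather than something the thread confirms: Hygon server CPUs are generally documented as x86-64 parts, and a hypervisor CPU model can also hide extensions from a guest. The Python sketch below diffs the `flags` line of two /proc/cpuinfo dumps; the sample strings are hypothetical and shortened, not taken from these machines.

```python
def cpu_flags(cpuinfo_text):
    """Extract the flag set of the first CPU from /proc/cpuinfo content (x86 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(reference, candidate):
    """Flags present on the reference host but absent on the candidate host."""
    return sorted(cpu_flags(reference) - cpu_flags(candidate))


# Hypothetical, shortened /proc/cpuinfo excerpts; real flag lists are much longer.
working_host = "model name\t: example CPU A\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt avx2\n"
failing_host = "model name\t: example CPU B\nflags\t\t: fpu sse sse2 ssse3 sse4_1 popcnt\n"

print(missing_flags(working_host, failing_host))
```

On real hosts the two inputs would be the contents of /proc/cpuinfo collected from each node. A non-empty result does not prove which instruction caused the trap, but it narrows down where the nodes differ.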

A compatibility issue?

A mixed deployment definitely won't work.

Yes.
