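【TiDB environment】Production
【TiDB version】5.4.0
【Operating system】CentOS 7.9
【Deployment method】Cloud deployment (which cloud) / machine deployment (what machine configuration, what disks)
Self-managed cluster on ECS
【Cluster data volume】
3 PD / 4 TiDB / 5 TiKV nodes, 16 cores and 64 GB of memory
【Number of cluster nodes】
【Reproduction path】What operations were performed before the problem occurred
【Problem encountered: symptoms and impact】
pending-peer-region-count spiked to 5650
The backoff count spiked to 19918 and recovered after 30 seconds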
summary = TiDB tikvclient_backoff_count error
value = 19918.974358974356
[2025/04/07 13:35:15.300 +03:00] [INFO] [region_request.go:785] ["mark store's regions need be refill"] [id=303513] [addr=192.168.250.XX:20160] [error="no available connections"]
[2025/04/07 13:35:15.333 +03:00] [WARN] [client_batch.go:365] ["no available connections"] [target=192.168.250.XX:20160]
[2025/04/07 13:35:15.334 +03:00] [INFO] [region_cache.go:2199] ["[health check] check health error"] [store=192.168.250.XX:20160] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 192.168.250.XX:20160: connect: connection refused\""]
TiKV errors:
[2025/04/07 13:34:43.504 +03:00] [WARN] [apply.rs:632] ["[store 303513] handle ready 1 committed entries"] [takes=8]
[2025/04/07 13:34:43.504 +03:00] [WARN] [apply.rs:632] ["[store 303513] handle ready 1 committed entries"] [takes=8]
[2025/04/07 13:34:43.504 +03:00] [WARN] [write.rs:602] ["[store 303513] async write too slow, write_kv: 0s, write_raft: 0.008722406s, send: 0.000018097s, callback: 0s thread: sync-writer"] [takes=8]
[2025/04/07 13:34:43.504 +03:00] [WARN] [store.rs:854] ["[store 303513] handle 27 pending peers include 19 ready, 0 entries, 0 messages and 0 snapshots"] [takes=9]
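To line up the TiKV warnings with the TiDB connection errors, something like the following could be used. It is only a rough sketch: the `HEADER` pattern and the `parse_header` helper are my own names, and the regex assumes every line starts with `[timestamp] [LEVEL] [file:line]` as in the excerpts above. The two sample lines are copied from this post.

```python
import re
from datetime import datetime

# Assumed layout: "[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ] [LEVEL] [file:line] ..."
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\] "
    r"\[(?P<level>[A-Z]+)\] "
    r"\[(?P<source>[^\]]+)\]"
)


def parse_header(line):
    """Return (timestamp, level, source) for one log line, or None if it does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S.%f %z")
    return ts, m.group("level"), m.group("source")


# Sample lines taken from the TiKV and TiDB excerpts in this post.
tikv_line = (
    '[2025/04/07 13:34:43.504 +03:00] [WARN] [store.rs:854] '
    '["[store 303513] handle 27 pending peers include 19 ready, '
    '0 entries, 0 messages and 0 snapshots"] [takes=9]'
)
tidb_line = (
    '[2025/04/07 13:35:15.333 +03:00] [WARN] [client_batch.go:365] '
    '["no available connections"] [target=192.168.250.XX:20160]'
)

tikv_ts, tikv_level, tikv_source = parse_header(tikv_line)
tidb_ts, tidb_level, tidb_source = parse_header(tidb_line)

# Seconds between the last TiKV warning shown and the TiDB "no available connections" warning.
gap_seconds = (tidb_ts - tikv_ts).total_seconds()
print(tikv_source, "->", tidb_source, gap_seconds)
```

For the two sample lines, the gap comes out at roughly 32 seconds, so the TiKV warnings shown here were logged before TiDB started reporting connection failures to that store.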
【Resource configuration】Go to TiDB Dashboard > Cluster Info > Hosts and take a screenshot of this page
【Copy and paste the ERROR logs】
【Other attachments: screenshots/logs/monitoring】
