TiDB database backup issue

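[TiDB environment] Production / Test / PoC
[TiDB version]
[Operating system]
[Deployment] Cloud (which cloud) / on-premises machines (what machine spec, what disks)
[Cluster data volume]
[Number of cluster nodes]
[Reproduction steps] What operations led to the problem
[Problem: symptoms and impact]
[Resource configuration] In TiDB Dashboard, go to Cluster Info > Hosts and take a screenshot of that page
[Paste the ERROR log]
[Other attachments: screenshots / logs / monitoring]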

Cluster kind: standard
Cluster version: v7.1.8-5.2-20250630
Deploy user: tidb
SSH type: builtin

Three TiKV nodes.

Backup command:

tiup br backup db --db db20251023 --pd "10.10.10.10:2379" --storage "local:///home/tidb/beifen/snapshot-2025102300000012" --log-file 2025102300000012.log
==================================================== After running the command:
Starting component br: /home/tidb/.tiup/components/br/v7.1.8-5.2-20250630/br backup db --db db20251023 --pd 10.10.10.10:2379 --storage local:///home/tidb/shuju/snapshot-2025102300000002 --log-file 2025102300000002.log
Detail BR log in 2025102300000002.log
[2025/10/23 15:40:57.871 +08:00] [INFO] [meminfo.go:196] [“use physical memory hook”] [cgroupMemorySize=9223372036854771712] [physicalMemorySize=539617456128]
[2025/10/23 15:40:57.889 +08:00] [INFO] [tikv_driver.go:201] [“using API V1.”]
[2025/10/23 15:40:57.890 +08:00] [INFO] [tidb.go:91] [“new domain”] [store=tikv-7545408461279316828] [“ddl lease”=1s] [“stats lease”=-1ns]
[2025/10/23 15:40:57.898 +08:00] [WARN] [info.go:367] [“init TiFlashReplicaManager”]
[2025/10/23 15:40:57.906 +08:00] [INFO] [domain.go:3334] [acquireServerID] [serverID=1349] [“lease id”=71a799099568ae93]
[2025/10/23 15:40:58.422 +08:00] [INFO] [domain.go:407] [“full load InfoSchema success”] [isV2=true] [currentSchemaVersion=0] [neededSchemaVersion=57516] [“elapsed time”=506.790197ms]
[2025/10/23 15:40:58.423 +08:00] [INFO] [domain.go:810] [“full load and reset schema validator”]
[2025/10/23 15:40:58.423 +08:00] [WARN] [domain.go:822] [“loading schema takes a long time”] [“take time”=514.888832ms]
[2025/10/23 15:40:58.423 +08:00] [WARN] [domain.go:1749] [“loading schema and starting ddl take a long time, we do a new reload”] [“take time”=514.906502ms]
[2025/10/23 15:40:58.425 +08:00] [INFO] [ddl.go:979] [“change job version in use”] [category=ddl] [old=v1] [new=v2]
[2025/10/23 15:40:58.426 +08:00] [INFO] [ddl.go:810] [“start DDL”] [category=ddl] [ID=c1b4e639-a116-4cc9-9041-d3eea9a26fa3] [runWorker=false] [jobVersion=v2]
[2025/10/23 15:40:58.426 +08:00] [INFO] [ddl.go:787] [“start delRangeManager OK”] [category=ddl] [“is a emulator”=false]
[2025/10/23 15:40:58.427 +08:00] [INFO] [env.go:109] [“the ingest sorted directory”] [category=ddl-ingest] [“data path”=/tmp/tidb/tmp_ddl-4000]
[2025/10/23 15:40:58.427 +08:00] [INFO] [env.go:81] [“init global ingest backend environment finished”] [category=ddl-ingest] [“memory limitation”=269808728064] [“disk usage info”=“disk usage: 4722688/269808726016, backend usage: 0”] [“max open file number”=1000000] [“lightning is initialized”=true]
[2025/10/23 15:40:58.427 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=loadSchemaInLoop]
[2025/10/23 15:40:58.427 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=mdlCheckLoop]
[2025/10/23 15:40:58.427 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=topNSlowQueryLoop]
[2025/10/23 15:40:58.427 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=infoSyncerKeeper]
[2025/10/23 15:40:58.428 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=globalConfigSyncerKeeper]
[2025/10/23 15:40:58.428 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=auditComponentsLoop]
[2025/10/23 15:40:58.428 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=runawayStartLoop]
[2025/10/23 15:40:58.428 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=requestUnitsWriterLoop]
[2025/10/23 15:40:58.428 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=closestReplicaReadCheckLoop]
[2025/10/23 15:40:58.428 +08:00] [INFO] [runaway.go:68] [“try to start runaway manager loop”]
[2025/10/23 15:40:58.428 +08:00] [INFO] [manager.go:295] [“start campaign owner”] [ownerInfo=“[log-backup] /tidb/br-stream/owner”]
[2025/10/23 15:40:58.429 +08:00] [INFO] [wait_group_wrapper.go:133] [“background process started”] [source=domain] [process=logBackupAdvancer]
[2025/10/23 15:40:58.434 +08:00] [INFO] [delete_range.go:162] [“closing delRange”] [category=ddl]
[2025/10/23 15:40:58.435 +08:00] [INFO] [session_pool.go:94] [“closing session pool”] [category=ddl]
[2025/10/23 15:40:58.435 +08:00] [INFO] [ddl.go:1060] [“DDL closed”] [category=ddl] [ID=c1b4e639-a116-4cc9-9041-d3eea9a26fa3] [“take time”=1.13064ms]
[2025/10/23 15:40:58.435 +08:00] [INFO] [ddl.go:779] [“stop DDL”] [category=ddl] [ID=c1b4e639-a116-4cc9-9041-d3eea9a26fa3]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:3355] [“releaseServerID succeed”] [serverID=1349]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=mdlCheckLoop]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:1388] [“loadSchemaInLoop exited.”]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:1191] [“Domain stop auditing components”]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=loadSchemaInLoop]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=auditComponentsLoop]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:3419] [“serverIDKeeper exited.”]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=runawayStartLoop]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:912] [“globalConfigSyncerKeeper exited.”]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=globalConfigSyncerKeeper]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=requestUnitsWriterLoop]
[2025/10/23 15:40:58.437 +08:00] [INFO] [domain.go:886] [“infoSyncerKeeper exited.”]
[2025/10/23 15:40:58.437 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=infoSyncerKeeper]
[2025/10/23 15:40:58.437 +08:00] [INFO] [manager.go:414] [“failed to campaign”] [“owner info”=“[log-backup] /tidb/br-stream/owner ownerManager 43d83362-11e7-427d-a268-4fb41532fec5”] [error=“context canceled”]
[2025/10/23 15:40:58.438 +08:00] [INFO] [manager.go:398] [“break campaign loop, context is done”] [“owner info”=“[log-backup] /tidb/br-stream/owner ownerManager 43d83362-11e7-427d-a268-4fb41532fec5”]
[2025/10/23 15:40:58.439 +08:00] [INFO] [domain.go:858] [“topNSlowQueryLoop exited.”]
[2025/10/23 15:40:58.439 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=topNSlowQueryLoop]
[2025/10/23 15:40:58.439 +08:00] [INFO] [domain.go:1874] [“closestReplicaReadCheckLoop exited.”]
[2025/10/23 15:40:58.439 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=closestReplicaReadCheckLoop]
[2025/10/23 15:40:58.439 +08:00] [INFO] [wait_group_wrapper.go:140] [“background process exited”] [source=domain] [process=logBackupAdvancer]
[2025/10/23 15:40:58.439 +08:00] [INFO] [domain.go:1563] [“domain closed”] [“take time”=5.749433ms]
Database Backup <…> 0.00%
============================== It stays stuck here.
++++++++++++++++++++ Excerpt from 2025102300000012.log
[2025/10/23 15:43:55.243 +08:00] [INFO] [client.go:147] [“start wait store backups”] [remainingProducers=0]
[2025/10/23 15:43:55.243 +08:00] [INFO] [client.go:138] [“collect backups goroutine exits”] [round=109]
[2025/10/23 15:43:55.443 +08:00] [INFO] [client.go:184] [“This round of backup starts…”] [round=110]
[2025/10/23 15:43:55.443 +08:00] [INFO] [client.go:225] [“backup ranges”] [round=110] [incomplete-ranges=1] [cost=9.79µs]
[2025/10/23 15:43:55.444 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=110] [error=“the store last heartbeat is too far, at 4m2.896904442s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 4m2.896904442s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:43:55.444 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=110] [error=“the store last heartbeat is too far, at 3m58.66628062s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 3m58.66628062s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:43:55.444 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=110] [error=“the store last heartbeat is too far, at 3m59.502448877s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 3m59.502448877s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:43:55.444 +08:00] [INFO] [client.go:147] [“start wait store backups”] [remainingProducers=0]
[2025/10/23 15:43:55.444 +08:00] [INFO] [client.go:138] [“collect backups goroutine exits”] [round=110]
[tidb@ttidbpoc01 ~]$ vi 2025102300000003.log
[2025/10/23 15:44:22.816 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=246] [error=“the store last heartbeat is too far, at 3m56.033828539s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 3m56.033828539s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:44:22.816 +08:00] [INFO] [client.go:147] [“start wait store backups”] [remainingProducers=0]
[2025/10/23 15:44:22.816 +08:00] [INFO] [client.go:138] [“collect backups goroutine exits”] [round=246]
[2025/10/23 15:44:23.016 +08:00] [INFO] [client.go:184] [“This round of backup starts…”] [round=247]
[2025/10/23 15:44:23.016 +08:00] [INFO] [client.go:225] [“backup ranges”] [round=247] [incomplete-ranges=1] [cost=3.73µs]
[2025/10/23 15:44:23.017 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=247] [error=“the store last heartbeat is too far, at 3m57.070808905s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 3m57.070808905s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:44:23.017 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=247] [error=“the store last heartbeat is too far, at 4m0.467118771s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 4m0.467118771s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:44:23.017 +08:00] [WARN] [client.go:242] [“store not alive, skip backup it in this round”] [round=247] [error=“the store last heartbeat is too far, at 3m56.234910857s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”] [errorVerbose=“[BR:KV:ErrKVStorage]tikv storage occur I/O error\nthe store last heartbeat is too far, at 3m56.234910857s\ngithub.com/pingcap/tidb/br/pkg/utils.CheckStoreLiveness\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/utils/misc.go:144\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).RunLoop\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:240\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/backup/client.go:1126\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/pkg/task/backup.go:689\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:57\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/backup.go:164\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041\nmain.main\n\t/home/jenkins/agent/workspace/pingkai/tidb/release/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/proc.go:272\nruntime.goexit\n\t/root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.6.linux-amd64/src/runtime/asm_amd64.s:1700”]
[2025/10/23 15:44:23.017 +08:00] [INFO] [client.go:147] [“start wait store backups”] [remainingProducers=0]
[2025/10/23 15:44:23.017 +08:00] [INFO] [client.go:138] [“collect backups goroutine exits”] [round=247]

Every TiKV node has this mounted:
10.10.10.11:/ttidbnas01 6.4T 538G 5.9T 9% /home/tidb/shuju
Could someone please take a look? Thanks!


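[Reproduction steps] What operations led to the problem
[Problem: symptoms and impact]

What problem are you running into?

Can you post your resource configuration?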

              total        used        free      shared  buff/cache   available
Mem:          502Gi        11Gi       438Gi       422Mi        51Gi       487Gi

Filesystem                            Size   Used  Avail  Use%  Mounted on
devtmpfs                              252G   0     252G   0%    /dev
tmpfs                                 252G   0     252G   0%    /dev/shm
tmpfs                                 252G   418M  251G   1%    /run
tmpfs                                 252G   0     252G   0%    /sys/fs/cgroup
/dev/mapper/klas-root                 4.3T   131G  4.2T   3%    /
tmpfs                                 252G   4.6M  252G   1%    /tmp
/dev/sda2                             2.0G   166M  1.9G   9%    /boot
/dev/sda1                             1022M  7.7M  1015M  1%    /boot/efi
tmpfs                                 51G    0     51G    0%    /run/user/0
10.10.10.11:/ttidbnas01               6.4T   538G  5.9T   9%    /home/tidb/shuju
10.10.10.11:/ttidbnas01/ttidbnas0165  6.4T   538G  5.9T   9%    /home/tidb/beifen

Is TiKV healthy?

tiup cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster kind: standard
Cluster version: v7.1.8-5.2-20250630
Deploy user: tidb
SSH type: builtin
Grafana URL: http://10.10.10.10:3000
Dashboard URL: http://10.10.10.10:2379/dashboard
ID                 Role        Host         Ports        OS/Arch       Status  Data Dir                    Deploy Dir
10.10.10.10:3000   grafana     10.10.10.10  3000         linux/x86_64  Up      -                           /tidb-deploy/grafana-3000
10.10.10.10:2379   pd          10.10.10.10  2379/2380    linux/x86_64  Up|UI   /tidb-data/pd-2379          /tidb-deploy/pd-2379
10.10.20.10:2379   pd          10.10.20.10  2379/2380    linux/x86_64  Up|L    /tidb-data/pd-2379          /tidb-deploy/pd-2379
10.10.30.10:2379   pd          10.10.30.10  2379/2380    linux/x86_64  Up      /tidb-data/pd-2379          /tidb-deploy/pd-2379
10.10.10.10:9090   prometheus  10.10.10.10  9090/12020   linux/x86_64  Up      /tidb-data/prometheus-9090  /tidb-deploy/prometheus-9090
10.10.10.10:4000   tidb        10.10.10.10  4000/10080   linux/x86_64  Up      -                           /tidb-deploy/tidb-4000
10.10.20.10:4000   tidb        10.10.20.10  4000/10080   linux/x86_64  Up      -                           /tidb-deploy/tidb-4000
10.10.30.10:4000   tidb        10.10.30.10  4000/10080   linux/x86_64  Up      -                           /tidb-deploy/tidb-4000
10.10.10.10:20160  tikv        10.10.10.10  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160       /tidb-deploy/tikv-20160
10.10.20.10:20160  tikv        10.10.20.10  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160       /tidb-deploy/tikv-20160
10.10.30.10:20160  tikv        10.10.30.10  20160/20180  linux/x86_64  Up      /tidb-data/tikv-20160       /tidb-deploy/tikv-20160
Total nodes: 11

store not alive, skip backup it in this round
[BR:KV:ErrKVStorage]tikv storage occur I/O error
The TiKV nodes are probably unhealthy; check the monitoring.

Keep an eye on the backup storage space.


Looks a bit complicated.

--storage "local:///home/tidb/beifen

Isn't this protocol unsupported? I'd suggest backing up to NFS storage or to s3:// storage.

When backing up with BR, every TiKV node must be able to access the shared storage.

Check the cluster status; it looks like the TiKV nodes are unhealthy.

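The backup is stuck because TiKV uses NFS storage underneath, which causes I/O errors and heartbeat timeouts. TiKV does not support NFS; it must run on local SSDs.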
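One way to test this against numbers already posted in the thread is to resolve which filesystem actually backs each path. Below is a minimal Python sketch, assuming the df output above was captured on a host that also runs a TiKV instance (the thread does not say which host it came from). It picks the mount point that is the longest prefix of a path, so you can see whether the BR local:// target and the TiKV data directory /tidb-data/tikv-20160 (from tiup cluster display) sit on NFS or on a local volume. The helper names parse_df, backing_filesystem and looks_like_nfs are hypothetical, and the NFS check is only a naming heuristic.

```python
from typing import List, Tuple

# df -h output as posted earlier in this thread (header in English).
# Assumption: it was captured on a host that also runs a TiKV instance.
DF_OUTPUT = """\
Filesystem Size Used Avail Use% Mounted on
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 0 252G 0% /dev/shm
tmpfs 252G 418M 251G 1% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
/dev/mapper/klas-root 4.3T 131G 4.2T 3% /
tmpfs 252G 4.6M 252G 1% /tmp
/dev/sda2 2.0G 166M 1.9G 9% /boot
/dev/sda1 1022M 7.7M 1015M 1% /boot/efi
tmpfs 51G 0 51G 0% /run/user/0
10.10.10.11:/ttidbnas01 6.4T 538G 5.9T 9% /home/tidb/shuju
10.10.10.11:/ttidbnas01/ttidbnas0165 6.4T 538G 5.9T 9% /home/tidb/beifen
"""


def parse_df(text: str) -> List[Tuple[str, str]]:
    """Return (filesystem, mount point) pairs, skipping the header line."""
    mounts = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 6:  # assumes mount points contain no spaces
            mounts.append((fields[0], fields[-1]))
    return mounts


def backing_filesystem(path: str, mounts: List[Tuple[str, str]]) -> str:
    """Return the filesystem whose mount point is the longest prefix of path."""
    best_fs, best_len = "", -1
    for fs, mnt in mounts:
        prefix = mnt if mnt.endswith("/") else mnt + "/"
        if (path == mnt or path.startswith(prefix)) and len(mnt) > best_len:
            best_fs, best_len = fs, len(mnt)
    return best_fs


def looks_like_nfs(fs: str) -> bool:
    """Heuristic: NFS sources are written as host:/export, local devices as /dev/..."""
    return ":" in fs and not fs.startswith("/")


mounts = parse_df(DF_OUTPUT)
for p in ("/home/tidb/beifen/snapshot-2025102300000012", "/tidb-data/tikv-20160"):
    fs = backing_filesystem(p, mounts)
    print(p, "->", fs, "(NFS)" if looks_like_nfs(fs) else "(local)")
```

With the posted df output, the backup path resolves to the NFS export 10.10.10.11:/ttidbnas01/ttidbnas0165 and the TiKV data directory resolves to /dev/mapper/klas-root. If the same holds on every TiKV host, the TiKV data itself is on a local volume and only the backup target is on NFS, so it is worth repeating the check with df output from each TiKV host.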

Take a look at the cluster status.

Check the resource usage.

Check whether the cluster status is normal; my guess is that something is wrong with TiKV.

The key signals in the log are "[BR:KV:ErrKVStorage]tikv storage occur I/O error" and "the store last heartbeat is too far".
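The heartbeat signal carries a number worth reading: how old each store's last heartbeat was. Below is a minimal Python sketch that pulls the "last heartbeat is too far, at ..." duration out of each BR warning line and converts Go's duration notation (for example 4m2.896904442s) into seconds. The regular expressions are written against the lines pasted in this thread, not against any documented BR log format; the function names are hypothetical, and the sample lines are the round-110 warnings with timestamps and the errorVerbose stack trace trimmed.

```python
import re
from typing import List

# Go time.Duration units as they appear in BR/TiDB logs, e.g. "4m2.896904442s", "514.888832ms".
# Multi-character units come first in the alternation so "ms" is not read as "m".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")
_HEARTBEAT = re.compile(r"last heartbeat is too far, at ((?:\d+(?:\.\d+)?[a-zµ]+)+)")


def go_duration_seconds(text: str) -> float:
    """Convert a Go-style duration string into seconds."""
    return sum(float(v) * _UNIT_SECONDS[u] for v, u in _PART.findall(text))


def heartbeat_lags(lines: List[str]) -> List[float]:
    """Return the heartbeat age in seconds from each 'store not alive' warning line.

    Only the first match per line is used, because the same duration is
    repeated inside the errorVerbose field of the real log lines."""
    lags = []
    for line in lines:
        m = _HEARTBEAT.search(line)
        if m:
            lags.append(go_duration_seconds(m.group(1)))
    return lags


# Trimmed versions of the round-110 warnings from 2025102300000012.log.
SAMPLE = [
    '[WARN] [client.go:242] ["store not alive, skip backup it in this round"] [round=110] '
    f'[error="the store last heartbeat is too far, at {d}: '
    '[BR:KV:ErrKVStorage]tikv storage occur I/O error"]'
    for d in ("4m2.896904442s", "3m58.66628062s", "3m59.502448877s")
] + ['[INFO] [client.go:147] ["start wait store backups"] [remainingProducers=0]']

lags = heartbeat_lags(SAMPLE)
print(["%.1f s" % s for s in lags])  # prints ['242.9 s', '238.7 s', '239.5 s']
```

On these lines all three stores report a last heartbeat roughly 4 minutes old at the same moment, which suggests a cluster-wide condition rather than a single bad TiKV node.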


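Possibly the storage on every TiKV node has I/O errors and the heartbeats time out, so BR does not recognize any of them as alive and cannot read data to back up.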


BR detects that all 3 TiKV nodes are "not alive", so every backup round skips all of them.
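To check this against the full log rather than by eye, here is a minimal Python sketch that counts the "store not alive, skip backup it in this round" warnings per round and lists the rounds in which every store was skipped. The regular expression is written against the lines pasted in this thread and ignores the quote characters, so it matches both straight and typographic quotes; the function names are hypothetical, and n_stores=3 comes from the three TiKV nodes in this cluster. The sample lines are trimmed stand-ins for the round-246 and round-247 lines above (the excerpt starts partway through round 246, so only one of its warnings is visible).

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches BR's per-store skip warning and captures the backup round number.
_SKIP = re.compile(r"store not alive, skip backup it in this round.*?\[round=(\d+)\]")


def skipped_stores_per_round(lines: Iterable[str]) -> Dict[int, int]:
    """Count 'store not alive' warnings for each backup round."""
    counts: Counter = Counter()
    for line in lines:
        m = _SKIP.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return dict(counts)


def rounds_skipping_all(counts: Dict[int, int], n_stores: int) -> List[int]:
    """Rounds in which at least n_stores warnings were logged, i.e. every store was skipped."""
    return sorted(r for r, c in counts.items() if c >= n_stores)


# Trimmed stand-ins for the log lines (error fields and stack traces removed).
SAMPLE = (
    ['[WARN] ["store not alive, skip backup it in this round"] [round=246]']
    + ['[WARN] ["store not alive, skip backup it in this round"] [round=247]'] * 3
    + ['[INFO] ["collect backups goroutine exits"] [round=247]']
)

counts = skipped_stores_per_round(SAMPLE)
print(counts)                                  # prints {246: 1, 247: 3}
print(rounds_skipping_all(counts, n_stores=3))  # prints [247]
```

Run over every line of the file passed to --log-file, a round number that keeps climbing while each round skips all three stores is consistent with the progress bar sitting at 0.00%: in the rounds shown, BR retries every 200 ms or so but has no store it considers alive to send the backup to.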


[“collect backups goroutine exits”] [round=247]
