CDC data sync not working properly

Data is not being synced normally. The logs keep printing, but the TSO never changes.
TiDB and CDC version:
5.0.3
Sync info:


Relevant logs:
cdc.txt (7.4 MB)


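How long has the task been running? If it is still in the normal state, it is probably fine; if something were wrong, the state would change.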


The state is still normal, but errors occur during syncing.
How should I handle this? It has been stuck for two days.
Logs below:
2.tar (2).gz (3.1 MB)


The logs contain many "loadStore from PD failed" and "context deadline exceeded" messages. Could something be wrong with the cluster? Run display on the cluster and check the PD logs too.


display shows everything as normal.

The CDC sync state has now changed, reporting [CDC:ErrPDBatchLoadRegions][tikv:9001]PD server timeout
But I don't see any port 9001 anywhere...
Partial CDC logs:
[2021/07/08 17:47:18.462 +08:00] [WARN] [base_client.go:284] ["[pd] cannot update member"] [address=http://172.19.16.135:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:172.19.16.135:2379 status:READY"]
[2021/07/08 17:47:20.463 +08:00] [WARN] [owner.go:823] ["failed to update service safe point"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] [errorVerbose="rpc error: code = DeadlineExceeded desc = context deadline exceeded
github.com/tikv/pd/client.(*client).UpdateServiceGCSafePoint
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210527030735-a544782ee076/client/client.go:1253
github.com/pingcap/ticdc/cdc.(*Owner).flushChangeFeedInfos
\tgithub.com/pingcap/ticdc@/cdc/owner.go:820
github.com/pingcap/ticdc/cdc.(*Owner).run
\tgithub.com/pingcap/ticdc@/cdc/owner.go:1432
github.com/pingcap/ticdc/cdc.(*Owner).Run
\tgithub.com/pingcap/ticdc@/cdc/owner.go:1295
github.com/pingcap/ticdc/cdc.(*Server).campaignOwnerLoop
\tgithub.com/pingcap/ticdc@/cdc/server.go:228
github.com/pingcap/ticdc/cdc.(*Server).run.func2
\tgithub.com/pingcap/ticdc@/cdc/server.go:316
golang.org/x/sync/errgroup.(*Group).Go.func1
\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57
runtime.goexit
\truntime/asm_amd64.s:1357"] [since-last-update=42m24.42831357s]
[2021/07/08 17:47:21.463 +08:00] [WARN] [base_client.go:284] ["[pd] cannot update member"] [address=http://172.19.16.135:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:172.19.16.135:2379 status:READY"]
[2021/07/08 17:47:23.501 +08:00] [ERROR] [client.go:346] ["tso request is canceled due to timeout"] [dc-location=global] [error="[PD:client:ErrClientGetTSOTimeout]get TSO timeout"]
[2021/07/08 17:47:23.501 +08:00] [ERROR] [client.go:599] ["[pd] getTS error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]rpc error: code = Canceled desc = context canceled"]
[2021/07/08 17:47:23.501 +08:00] [WARN] [owner.go:84] ["Fail to update minGCSafePointCache."] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled
github.com/tikv/pd/client.(*client).processTSORequests
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210527030735-a544782ee076/client/client.go:717
github.com/tikv/pd/client.(*client).handleDispatcher
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210527030735-a544782ee076/client/client.go:587
runtime.goexit
\truntime/asm_amd64.s:1357
github.com/tikv/pd/client.(*tsoRequest).Wait
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210527030735-a544782ee076/client/client.go:913
github.com/tikv/pd/client.(*client).GetTS
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210527030735-a544782ee076/client/client.go:933
github.com/pingcap/ticdc/cdc.(*Owner).getMinGCSafePointCache
\tgithub.com/pingcap/ticdc@/cdc/owner.go:82
github.com/pingcap/ticdc/cdc.(*Owner).flushChangeFeedInfos
\tgithub.com/pingcap/ticdc@/cdc/owner.go:758
github.com/pingcap/ticdc/cdc.(*Owner).run
\tgithub.com/pingcap/ticdc@/cdc/owner.go:1432
github.com/pingcap/ticdc/cdc.(*Owner).Run
\tgithub.com/pingcap/ticdc@/cdc/owner.go:1295
github.com/pingcap/ticdc/cdc.(*Server).campaignOwnerLoop
\tgithub.com/pingcap/ticdc@/cdc/server.go:228
github.com/pingcap/ticdc/cdc.(*Server).run.func2
\tgithub.com/pingcap/ticdc@/cdc/server.go:316
golang.org/x/sync/errgroup.(*Group).Go.func1
\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57
runtime.goexit
\truntime/asm_amd64.s:1357"]

PD logs:
1.txt (4.7 MB)


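Judging from the logs, it is still a PD problem. Check whether communication between PD and CDC is working, and whether Dashboard and pd-ctl can be used normally.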


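Dashboard and pd-ctl both work normally, and querying service-gc-safepoint returns data normally.
One thing I noticed:
while it is running, the CDC machine has about 12,000 network connections to TiKV. Could that be a problem?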

Could someone take a look at this?

@Ricklee could you take a look?

1. Please share the details of this changefeed, using the following command:

cdc cli changefeed query -s --pd=http://{pd-ip}:2379 --changefeed-id={changefeed-id}

2. Please also provide the TiCDC monitoring dashboard data. Thanks.

It's sorted out now. My preliminary conclusion is that it was caused by too many tables: 3 databases each have more than 2,000 tables.
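This diagnosis fits how a changefeed checkpoint behaves: as far as I understand TiCDC, a changefeed's checkpoint is the minimum of the checkpoints of all the tables it replicates, so with thousands of tables a single slow table holds the whole changefeed's TSO in place. The sketch below is a simplified model of that rule, not TiCDC's code; `changefeedCheckpoint`, the table names, and the timestamps are all made up.

```go
package main

import (
	"fmt"
	"math"
)

// changefeedCheckpoint returns the minimum checkpoint across all tables and the
// table holding it back. In this simplified model the changefeed cannot report
// progress past its slowest table. Ties are broken by table name for determinism.
func changefeedCheckpoint(tables map[string]uint64) (uint64, string) {
	if len(tables) == 0 {
		return 0, ""
	}
	minTS, slowest := uint64(math.MaxUint64), ""
	for name, ts := range tables {
		if ts < minTS || (ts == minTS && name < slowest) {
			minTS, slowest = ts, name
		}
	}
	return minTS, slowest
}

func main() {
	// Hypothetical changefeed: 6,000 tables across 3 databases, all at ts 2000
	// except one table that is still at ts 1500.
	tables := make(map[string]uint64, 6000)
	for i := 0; i < 6000; i++ {
		tables[fmt.Sprintf("db%d.t%d", i%3, i)] = 2000
	}
	tables["db0.t42"] = 1500
	ts, slowest := changefeedCheckpoint(tables)
	fmt.Printf("changefeed checkpoint %d, held back by %s\n", ts, slowest)
}
```

The more tables a changefeed carries, the more chances one of them lags, which is one reason splitting very large table sets across changefeeds is often suggested.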

Could you share exactly what you adjusted? That way others who run into a similar problem can use it as a reference.

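I think I roughly understand the cause, though I'm not sure whether your case is the same as mine. My CDC hit the same situation: the TSO never changed, and I kept assuming nothing was being updated. Then I ran tcpdump on the CDC host and found that the port was receiving data, a huge number of DELETEs. After checking with the upstream team, I learned they had run a bulk delete. CDC split it into multiple transactions for syncing and stayed stuck on it until processing finished, after which everything was fine.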
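That explanation can be illustrated with a toy model: a bulk DELETE upstream arrives as many row changes sharing one commit TS, and the sink writes them in several batches. If the checkpoint may only move to a commit TS once every row of that transaction has been written, it stays frozen for the whole duration even though data is flowing, which matches what tcpdump showed. This is a hypothetical simulation with made-up row counts and batch sizes (`rowChange` and `replay` are illustrative names), not TiCDC's actual sink logic.

```go
package main

import "fmt"

// rowChange is a hypothetical row event carrying the commit TS of its upstream transaction.
type rowChange struct {
	commitTS uint64
}

// replay writes rows (sorted by commitTS) in fixed-size batches and records the
// checkpoint after each batch. The checkpoint moves to a commit TS only once the
// last row of that transaction has been written.
func replay(rows []rowChange, batchSize int, start uint64) []uint64 {
	if batchSize <= 0 {
		batchSize = 1
	}
	checkpoint := start
	var history []uint64
	for i := 0; i < len(rows); i += batchSize {
		end := i + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		for j := i; j < end; j++ {
			// rows[j] closes its transaction if the next row belongs to a later one.
			if j == len(rows)-1 || rows[j+1].commitTS != rows[j].commitTS {
				checkpoint = rows[j].commitTS
			}
		}
		history = append(history, checkpoint)
	}
	return history
}

func main() {
	// Hypothetical workload: two small transactions, a bulk DELETE of 10,000 rows
	// committed at TS 300, then one more small transaction.
	rows := []rowChange{{commitTS: 100}, {commitTS: 200}}
	for i := 0; i < 10000; i++ {
		rows = append(rows, rowChange{commitTS: 300})
	}
	rows = append(rows, rowChange{commitTS: 400})

	for n, cp := range replay(rows, 1000, 0) {
		fmt.Printf("batch %2d -> checkpoint %d\n", n+1, cp)
	}
}
```

In this run the checkpoint reads 200 for the first ten batches and only jumps to 400 in the last one, so a frozen TSO alone does not prove the sync has stopped.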
