After scaling out TiKV nodes in a TiDB cluster, the cluster is unstable and the nodes restart frequently

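【TiDB environment】Production
【TiDB version】8.5.1
【Problem: symptoms and impact】The TiKV nodes added in the scale-out restart frequently, always with a similar error. What is causing this?
【Copied ERROR log】
【Other attachments: screenshots / logs / monitoring】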

2026-03-26 04:39:00 (UTC+08:00)TiKV 1.1.1.1:20160[lib.rs:479] ["elapsed=454372632; when=454372477"] [backtrace="   0: tikv_util::set_panic_hook::{{closure}}\n             at /workspace/source/tikv/components/tikv_util/src/lib.rs:478:18\n   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2029:9\n      std::panicking::rust_panic_with_hook\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:783:13\n   2: std::panicking::begin_panic_handler::{{closure}}\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:657:13\n   3: std::sys_common::backtrace::__rust_end_short_backtrace\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:171:18\n   4: rust_begin_unwind\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:645:5\n   5: core::panicking::panic_fmt\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:72:14\n   6: tokio_timer::wheel::Wheel<T>::set_elapsed\n             at /workspace/.cargo/git/checkouts/tokio-8e927faba632ed16/4394380/tokio-timer/src/wheel/mod.rs:225:9\n      tokio_timer::wheel::Wheel<T>::poll\n   7: tokio_timer::timer::Timer<T,N>::process\n             at /workspace/.cargo/git/checkouts/tokio-8e927faba632ed16/4394380/tokio-timer/src/timer/mod.rs:272:33\n      <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park\n             at /workspace/.cargo/git/checkouts/tokio-8e927faba632ed16/4394380/tokio-timer/src/timer/mod.rs:379:9\n      tokio_timer::timer::Timer<T,N>::turn\n 
            at /workspace/.cargo/git/checkouts/tokio-8e927faba632ed16/4394380/tokio-timer/src/timer/mod.rs:256:21\n   8: tikv_util::timer::start_global_timer::{{closure}}\n             at /workspace/source/tikv/components/tikv_util/src/timer.rs:111:17\n      <std::thread::Builder as tikv_util::sys::thread::StdThreadBuildWrapper>::spawn_wrapper::{{closure}}\n             at /workspace/source/tikv/components/tikv_util/src/sys/thread.rs:438:13\n      std::sys_common::backtrace::__rust_begin_short_backtrace\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:155:18\n   9: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:529:17\n      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9\n      std::panicking::try::do_call\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:552:40\n      std::panicking::try\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:516:19\n      std::panic::catch_unwind\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:142:14\n      std::thread::Builder::spawn_unchecked_::{{closure}}\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:528:30\n      core::ops::function::FnOnce::call_once{{vtable.shim}}\n             at 
/root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5\n  10: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2015:9\n      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2015:9\n      std::sys::unix::thread::Thread::new::thread_start\n             at /root/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108:17\n  11: start_thread\n  12: __clone\n"] [location=/workspace/.cargo/git/checkouts/tokio-8e927faba632ed16/4394380/tokio-timer/src/wheel/mod.rs:225] [thread_name=timer] [thread_id=14]

That is a lot of Rust panic backtrace, but the only concrete clue I can see is the timer.

Could a clock anomaly have caused the panic?

TiKV 1.1.1.1:20160

This address doesn't look right…

I'm not sure whether the IP was masked.

Network problems can also prevent data from being fetched properly.

Yeah, the IPs were probably bulk-replaced for masking when the log was posted; otherwise 1.1.1.1 would be a bit odd.

Hahaha, yes, the IP was masked.

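The clock looks fine. I compared it with the other nodes and the times all match. The panic also happened only at that one moment, and I did not adjust the time manually.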

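The TiKV restarts are caused by a tokio timer-wheel defect in TiDB 8.5.1 combined with clock or load anomalies after the scale-out. Fix clock synchronization and relieve scheduling pressure first; the root fix is to upgrade to 8.5.3 or later.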

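Is there an official statement on this? I can't tell whether our cluster is hitting this defect. I have checked the clocks and they should be fine.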

Automatic restarts are mostly caused by out-of-memory conditions. Focus on checking for large transactions or complex slow SQL.

@tidb小白 Try upgrading the TiDB kernel version to v8.5.5 first and see.

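In the log, elapsed (the time the timer wheel has already recorded, 454372632) is greater than when (the new time it is asked to advance to, 454372477), which triggers an assertion failure in the Tokio timer. A possible cause: the system time went backwards or jumped, so the timer computed a current time earlier than the past and panicked.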
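To make the failing check concrete, here is a toy model of a timer wheel that only moves forward. It is a sketch, not tokio-timer's actual code: the names `Wheel` and `set_elapsed` are borrowed from the backtrace, the body is an assumption, and the two values are the ones from the panic message.

```python
class Wheel:
    """Toy timer wheel that only tracks how far it has advanced."""

    def __init__(self) -> None:
        self.elapsed = 0

    def set_elapsed(self, when: int) -> None:
        # The wheel may only move forward. A `when` smaller than the
        # recorded `elapsed` means time appeared to go backwards.
        if self.elapsed > when:
            raise AssertionError(f"elapsed={self.elapsed}; when={when}")
        self.elapsed = when


wheel = Wheel()
wheel.set_elapsed(454372632)      # the wheel has already advanced this far
try:
    wheel.set_elapsed(454372477)  # the next reading is 155 ticks earlier
except AssertionError as err:
    message = str(err)
    print(message)
```

In this model the second call fails because the new value is smaller than the one already stored, which matches the shape of the message in the TiKV log.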

Why would a time jump happen? :joy: I have NTP configured too.
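One way to check whether the wall clock on that node really steps, even with NTP configured, is to sample it alongside a monotonic clock and look for intervals where the two disagree. The sketch below is a hypothetical helper, not a TiKV or TiDB tool. The names `find_clock_jumps` and `collect_samples` and the 50 ms tolerance are assumptions, and it only shows wall-clock steps; whether such a step is what hit the timer thread would still need to be confirmed.

```python
import time
from typing import List, Tuple

Sample = Tuple[float, float]  # (wall-clock seconds, monotonic seconds)


def find_clock_jumps(samples: List[Sample], tolerance: float = 0.05) -> List[Tuple[int, float]]:
    """Return (index, offset) for each sample where the wall clock stepped.

    offset is the wall-clock delta minus the monotonic delta since the
    previous sample; a negative offset means the wall clock moved backwards
    relative to the time that really elapsed.
    """
    jumps = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        mono_delta = samples[i][1] - samples[i - 1][1]
        offset = wall_delta - mono_delta
        if abs(offset) > tolerance:
            jumps.append((i, offset))
    return jumps


def collect_samples(count: int, interval: float) -> List[Sample]:
    """Read both clocks `count` times, sleeping `interval` seconds between reads."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), time.monotonic()))
        time.sleep(interval)
    return samples


# Synthetic data: the wall clock steps back by 0.155 s before the third sample.
demo = [(100.0, 10.0), (101.0, 11.0), (101.845, 12.0), (102.845, 13.0)]
demo_jumps = find_clock_jumps(demo)
```

On a real node, something like `find_clock_jumps(collect_samples(3600, 1.0))` could be left running around the time the restarts usually occur, then compared with the NTP daemon's own logs.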