TiDB monitoring: cleanup-worker in TiKV pending tasks stays persistently high

Aunt-Shirly · February 27, 2026, 23:42

Hello, the cleanup-worker task backlog is most likely caused by the following issues:

github.com/tikv/tikv

`Task::CheckAndCompact` may accumulate seriously when there are lots regions need to be compacted

Opened 07:44AM - 17 Apr 25 UTC

Closed 06:42AM - 22 Jul 25 UTC

AndreMouche

type/enhancement

## Development Task

TiKV do check and compact `region-compact-check-step`(100)… regions every `region-compact-check-interval`(5minutes) when the region meet the compaction condition:

- region-compact-min-tombstones
- region-compact-tombstones-percent
- region-compact-min-redundant-rows
- region-compact-redundant-rows-percent

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/worker/compact.rs#L440-L455

and TiKV process the `Task::CheckAndCompact` in one task which may compact `region-compact-check-step` regions most :

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/worker/compact.rs#L411-L438
https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/worker/compact.rs#L316-L345

if this task not finished in 5 minutes, the task `Task::CheckAndCompact` will be accumulated since there is only 1 thread in `cleanup-worker`

![Image](https://github.com/user-attachments/assets/d18e2b86-4296-4a8a-99bb-8a5b79343f73)

the `compact_runner` was processed by `cleanup_worker`:

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/fsm/store.rs#L1782-L1798

and `cleanup_worker` only have 1 thread and is unconfigable:

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/fsm/store.rs#L1734-L1737

I think we need consider this scenarios for such large clusters

github.com/tikv/tikv

Task::Compact may result in a large number of repeated and useless manual::compact tasks

Opened 03:17AM - 14 May 25 UTC

Closed 03:41AM - 07 Jul 25 UTC

AndreMouche

type/enhancement affects-7.5 affects-8.1 affects-8.5 affects-9.0

## Bug Report

similar with https://github.com/tikv/tikv/issues/18403, if there …are lots pending tasks in cleanup_worker and lots regions need to be compacted, TiKV may find there are some large regions that can not be split, that is, all the mvcc versions accumulated in these regions are all belongs to a same row, in this case, TiKV will init a Task::Compact task to the `cleanup_worker`

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/peer.rs#L4562-L4588

and for `cleanup_worker`, if it is busy processing pending tasks, the duplicated Task::Compact maybe send to `cleanup_worker` again and again(interval 10s) until the first `Task::Compact` finished and clear the redundant old mvcc versions.

```
Mar 21, 2025 @ 08:03:14.277 | tikv | error | [split_observer.rs:148] ["failed to handle split req"] [err="\"no valid key found for split.\""] [region_id=157887931] [thread_id=106]
Mar 21, 2025 @ 08:03:24.268 | tikv | error | [split_observer.rs:148] ["failed to handle split req"] [err="\"no valid key found for split.\""] [region_id=157887931] [thread_id=106]
Mar 21, 2025 @ 08:03:34.277 [split_observer.rs:148] ["failed to handle split req"] [err="\"no valid key found for split.\""] [region_id=157887931] [thread_id=105]
```

At this time, these duplicate `Task::Compact` will still be in the pending tasks of `cleanup_worker`, and when it is its turn, because we do not check the redundant mvccs here, we directly send the manual compact task to rocksdb.

https://github.com/tikv/tikv/blob/0a97ca8ab18cdbcb626ee4e24b6db80ad472fd71/components/raftstore/src/store/worker/compact.rs#L380-L409

However, in fact, the duplicate manual::compact task can no longer do anything at this time, but it will take up the time of the `cleanup-worker` to handle and wait the compaction job finished, thereby blocking the execution of the other tasks that do need compaction.

### What version of TiKV are you using?
master

### Steps to reproduce
A big cluster and enable TTL, meanwhile, update some keys frequently

### What did you expect?
big region that belongs to one row could be compaction in time

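TiKV's cleanup_worker currently uses a single-threaded model and mainly handles two kinds of tasks:

Compact task
Initiated by the split-checker. It does not check whether the Region actually meets the compaction conditions; it sends a CompactRange request straight to RocksDB and waits synchronously for it to return. Expensive to execute.
CheckAndCompact task
Checks 100 Regions at a time: if a Region meets the conditions, it issues a CompactRange to RocksDB and waits for it to return.
Runs every 5 minutes by default.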

Core problems

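Problem 1: cleanup_worker cannot run compactions concurrently

Because cleanup_worker is single-threaded:

All compact requests can only run serially
Compact tasks cannot be issued to RocksDB concurrently
System CPU and IO resources are not fully used
RocksDB compaction resource utilization stays low

Symptoms:

RocksDB compaction CPU usage cannot go up
Only one manual compaction task exists at any time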

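Problem 2: with a large number of Regions, CheckAndCompact cannot keep up

When a very large number of Regions need compaction:

1) CheckAndCompact scans far too slowly

Only 100 Regions are checked every 5 minutes
If the cluster has a large number of Regions:
- a single Region may wait several days before it is checked once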
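
To put a rough number on this, here is a back-of-the-envelope sketch. It assumes the defaults quoted in the issue (100 Regions per check, one check every 5 minutes) and the best case where every round finishes on time. The function name `full_scan_days` and the Region counts are made up for illustration.

```python
import math


def full_scan_days(region_count, check_step=100, check_interval_min=5):
    """Days needed for CheckAndCompact to visit every Region on a store once.

    Best case: assumes each round finishes within its interval.
    """
    rounds = math.ceil(region_count / check_step)
    return rounds * check_interval_min / (60 * 24)


# Hypothetical Region counts on a single TiKV store.
for regions in (10_000, 100_000, 1_000_000):
    print(f"{regions:>9} regions -> {full_scan_days(regions):.1f} days")
```

With 100,000 Regions on a store, one full pass already takes about 3.5 days under these assumptions, and that is before any round overruns its interval.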

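2) CheckAndCompact cannot finish within its interval

When cleanup_worker cannot process tasks fast enough:

The CheckAndCompact task for one interval does not finish
New tasks keep entering the queue
The cleanup_worker queue backs up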
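
The backlog is easy to reproduce on paper. The following is only a toy model, not TiKV code: one worker, one new CheckAndCompact task every 300 seconds, and a fixed (hypothetical) duration per task. `pending_tasks` is a name invented for this sketch.

```python
def pending_tasks(horizon_s, task_s, interval_s=300):
    """Tasks enqueued but not yet finished after `horizon_s` seconds.

    Toy model: a single FIFO worker; a task arrives at t = 0, interval_s,
    2 * interval_s, ... and each task takes `task_s` seconds to run.
    """
    worker_free_at = 0
    arrived = 0
    finished = 0
    t = 0
    while t <= horizon_s:
        arrived += 1
        start = max(worker_free_at, t)  # wait until the single thread is free
        worker_free_at = start + task_s
        if worker_free_at <= horizon_s:
            finished += 1
        t += interval_s
    return arrived - finished


# Hypothetical task durations, observed over 24 hours.
for task_s in (200, 400, 600):
    print(f"task takes {task_s}s -> {pending_tasks(24 * 3600, task_s)} pending")
```

As long as a task finishes inside the 5-minute interval the queue stays flat; once it takes longer, the pending count grows without bound, which matches the ever-rising cleanup-worker curve in the monitoring panel.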

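Problem 3: the split-checker produces many duplicate Compact tasks

Some Regions accumulate so many MVCC versions that they grow large:

1) When the split-checker detects a large Region:

It submits a Compact task to cleanup_worker

2) The Compact task has to wait in the queue:

The wait is long

3) The split-checker fires every 10 seconds:

It resubmits a Compact task for the same Region

4) As a result, in the cleanup_worker queue:

Compact tasks for the same Region pile up in large numbers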
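
For a sense of scale, a small sketch of how many Compact tasks one Region can generate. It assumes the 10-second resubmission interval from the issue and that resubmission only stops once the first task has finished; the 30-minute figure and the name `compact_submissions` are hypothetical.

```python
import math


def compact_submissions(seconds_until_first_finishes, resubmit_interval_s=10):
    """Compact tasks submitted for one Region at t = 0, 10, 20, ...
    strictly before the first one finishes (always at least one)."""
    return max(1, math.ceil(seconds_until_first_finishes / resubmit_interval_s))


# Hypothetical: the first Compact task needs 30 minutes to get through the queue and finish.
total = compact_submissions(30 * 60)
print(f"{total} submissions, {total - 1} of them duplicates")
```

Every one of those duplicates still triggers a manual compaction when its turn comes, even though the first one already cleaned up the Region.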

...
See the issue descriptions for details.

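Ultimately, the problems above lead to:

The cleanup_worker queue stays backed up for a long time
MVCC versions accumulate
Regions cannot split
The same redundant compactions run over and over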

Workaround: find the region-id in the TiKV log entries that report "failed to handle split req", run a manual compact on those Regions, then restart the TiKV node and watch whether the cleanup-worker pending tasks keep climbing.
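
To collect the region-ids, something like the sketch below may help. It relies only on the `[region_id=...]` field visible in the log lines quoted in the issue; the helper name `regions_failing_split` is made up, and the sample lines stand in for your real TiKV log file.

```python
import re
from collections import Counter

REGION_ID = re.compile(r"\[region_id=(\d+)\]")


def regions_failing_split(log_lines):
    """Count "failed to handle split req" entries per region_id."""
    counts = Counter()
    for line in log_lines:
        if "failed to handle split req" not in line:
            continue
        match = REGION_ID.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample lines copied from the issue above; in practice, read them from the TiKV log.
sample = r'''
Mar 21, 2025 @ 08:03:14.277 | tikv | error | [split_observer.rs:148] ["failed to handle split req"] [err="\"no valid key found for split.\""] [region_id=157887931] [thread_id=106]
Mar 21, 2025 @ 08:03:24.268 | tikv | error | [split_observer.rs:148] ["failed to handle split req"] [err="\"no valid key found for split.\""] [region_id=157887931] [thread_id=106]
'''.splitlines()

for region_id, hits in regions_failing_split(sample).most_common():
    print(region_id, hits)
```

Regions that show up again and again at roughly 10-second intervals are the ones worth compacting manually first.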