压测期间,执行计划中total_suspend_time耗时严重。怎么处理?

【TiDB 使用环境】测试环境
【TiDB 版本】V8.5.4
集群没有混合部署, tidb_replica_read设置的是 ‘leader-and-follower’

SELECT
  a.id,
  a.created_time,
  a.created_by,
  a.updated_time,
  a.updated_by,
  a.created_org_id,
  a.version,
  a.is_deleted,
  a.state,
  a.simple_json_extfield,
  a.object_json_extfield,
  a.store_id,
  a.deposit_bank,
  a.deposit_bank_no,
  a.invoice_phone,
  a.invoice_address,
  a.reviewer,
  a.tax_control_type,
  a.is_digital,
  a.tonnage_type,
  a.is_has_city,
  a.is_certificate_num,
  a.is_business_num,
  a.data_source,
  b.network AS store_network,
  b.name AS store_name,
  b.base_code AS store_code,
  b.organization_code,
  a.is_new_car_elec_ticket,
  a.is_used_car_elec_ticket,
  max(
    CASE
      WHEN c.data_type = 0 THEN c.short_name
    END
  ) AS store_short_name,
  max(
    CASE
      WHEN c.data_type = 0 THEN c.code
    END
  ) AS dealer_store_code,
  GROUP_CONCAT(
    CASE
      WHEN c.data_type = 1 THEN c.code
    END
  ) AS server_store_code,
  d.company_code
FROM
  tm_finance_invoice a
  LEFT JOIN tm_store_base b ON a.store_id = b.id
  LEFT JOIN tm_store_common c ON b.id = c.store_base_id
  LEFT JOIN tm_finance_sap d ON d.store_id = a.store_id
WHERE
  a.is_deleted = 0
GROUP BY
  a.id
ORDER BY
  id DESC
LIMIT
  10;

耗时部分的执行计划

|│ └─TableReader_186(Probe)   | 152.37| 98.79  | 802| root      |                                                 | time:3.75s, loops:4, cop_task: {num: 5, max: 1.89s, min: 2.29ms, avg: 750.2ms, p95: 1.89s, max_proc_keys: 480, p95_proc_keys: 480, tot_proc: 26.2ms, tot_wait: 19.2ms, copr_cache_hit_ratio: 0.60, build_task_duration: 133.3µs, max_distsql_concurrency: 2}, rpc_info:{Cop:{num_rpc:5, total_time:3.75s}}| data:TableRangeScan_185| N/A      | N/A   |
|│   └─TableRangeScan_185     | 152.37| 479.20 | 802| cop[tikv] | table:b                                         | tikv_task:{proc max:1.88s, min:1ms, avg: 746.4ms, p80:1.88s, p95:1.88s, iters:12, tasks:5}, scan_detail: {total_process_keys: 704, total_process_keys_size: 720213, total_keys: 2009, get_snapshot_time: 9.85ms, rocksdb: {block: {cache_hit_count: 12802}}}, time_detail: {total_process_time: 26.2ms, total_suspend_time: 3.69s, total_wait_time: 19.2ms, total_kv_read_wall_time: 3.72s, tikv_wall_time: 3.74s}| range: decided by [mdm_channel.tm_finance_invoice.store_id], keep order:false| N/A      | N/A   |
|└─IndexLookUp_174(Probe)     | 400.50| 1137.29| 950| root      |                                                 | time:3.91s, loops:4, index_task: {total_time: 2.02s, fetch_handle: 2.02s, build: 1.54µs, wait: 4.04µs}, table_task: {total_time: 1.89s, num: 2, concurrency: 10}, next: {wait_index: 2.02s, wait_table_lookup_build: 445.6µs, wait_table_lookup_resp: 1.89s}|| 21.5 KB  | N/A   |
|  ├─Selection_173(Build)     | 400.50| 290.75 | 950| cop[tikv] |                                                 | time:2.02s, loops:6, cop_task: {num: 3, max: 2.01s, min: 2.06ms, avg: 672.8ms, p95: 2.01s, max_proc_keys: 792, p95_proc_keys: 792, tot_proc: 27.2ms, tot_wait: 6.75ms, copr_cache_hit_ratio: 0.67, build_task_duration: 72.4µs, max_distsql_concurrency: 2}, rpc_info:{Cop:{num_rpc:3, total_time:2.02s}}, tikv_task:{proc max:2s, min:2ms, avg: 668.7ms, p80:2s, p95:2s, iters:10, tasks:3}, scan_detail: {total_process_keys: 792, total_process_keys_size: 36432, total_keys: 1507, get_snapshot_time: 1.82ms, rocksdb: {key_skipped_count: 792, block: {cache_hit_count: 7867}}}, time_detail: {total_process_time: 27.2ms, total_suspend_time: 1.97s, total_wait_time: 6.75ms, total_kv_read_wall_time: 2s, tikv_wall_time: 2.01s}| not(isnull(mdm_channel.tm_store_common.store_base_id))| N/A| N/A|
|  │ └─IndexRangeScan_171     | 400.50| 233.50 | 950| cop[tikv] | table:c, index:idx_store_base_id(store_base_id) | tikv_task:{proc max:2s, min:2ms, avg: 668.7ms, p80:2s, p95:2s, iters:10, tasks:3}| range: decided by [eq(mdm_channel.tm_store_common.store_base_id, mdm_channel.tm_store_base.id)], keep order:false| N/A      | N/A   |
|  └─TableRowIDScan_172(Probe)| 400.50| 629.48 | 950| cop[tikv] | table:c                                         | time:1.89s, loops:4, cop_task: {num: 2, max: 1.89s, min: 5.59ms, avg: 946ms, p95: 1.89s, max_proc_keys: 838, p95_proc_keys: 838, tot_proc: 14.8ms, tot_wait: 9.19ms, copr_cache_hit_ratio: 0.00, build_task_duration: 101.3µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:2, total_time:1.89s}}, tikv_task:{proc max:1.88s, min:1ms, avg: 938ms, p80:1.88s, p95:1.88s, iters:8, tasks:2}, scan_detail: {total_process_keys: 950, total_process_keys_size: 827207, total_keys: 1172, get_snapshot_time: 1.79ms, rocksdb: {delete_skipped_count: 172, key_skipped_count: 664, block: {cache_hit_count: 6032}}}, time_detail: {total_process_time: 14.8ms, total_suspend_time: 1.86s, total_wait_time: 9.19ms, total_kv_read_wall_time: 1.88s, tikv_wall_time: 1.89s} | keep order:false| N/A| N/A|

c表的索引选择性低啊

store_base_id 这个字段区分度挺高的。为了减少c的回表,将(store_base_id)改为了覆盖索引 (store_base_id, code, short_name, data_type)。但是压测还是上不去,b表的耗时比较严重

total_suspend_time 是 TiDB 执行计划中算子因等待数据而暂停的总耗时,在压测场景下出现严重耗时,通常意味着数据读取或计算的并行度不足资源竞争数据分布不合理,导致算子需要频繁等待上游数据。