TiFlash node error: Detected overflow when decoding integer of length 22 with column type UInt64: pk_type is INT64, schema_snapshot->col_id_to_block_pos

[TiDB environment] Test environment
TiFlash crashed and keeps restarting. The error log is as follows:
[2026/01/22 13:28:11.554 +08:00] [ERROR] [RaftCommands.cpp:506] ["[region_id=85732 applied_term=7 applied_index=770343] catch exception: Detected overflow when decoding integer of length 22 with column type UInt64: pk_type is INT64, schema_snapshot->col_id_to_block_pos is [map info : (column_id=-1025, pos=2) (column_id=-1024, pos=1) (column_id=-1, pos=0) (column_id=1, pos=3) (column_id=2, pos=4) (column_id=3, pos=5) (column_id=4, pos=6) (column_id=5, pos=7) (column_id=6, pos=8) (column_id=7, pos=9) (column_id=8, pos=10) (column_id=9, pos=11) (column_id=10, pos=12) (column_id=11, pos=13) (column_id=12, pos=14) (column_id=14, pos=15) (column_id=15, pos=16) (column_id=16, pos=17) (column_id=17, pos=18) (column_id=18, pos=19) (column_id=19, pos=20) (column_id=20, pos=21) (column_id=21, pos=22) (column_id=22, pos=23) (column_id=23, pos=24) (column_id=24, pos=25) (column_id=25, pos=26) (column_id=26, pos=27) (column_id=27, pos=28) (column_id=28, pos=29) (column_id=29, pos=30) (column_id=30,…schema_snapshot->column_defines is [column define : (id=-1, name=_tidb_rowid, type=Int64) (id=-1024, name=_INTERNAL_VERSION, type=UInt64) (id=-1025, name=_INTERNAL_DELMARK, type=UInt8) (id=1, name=id, type=Int64) (id=2, name=customer_no, type=String) (id=3, name=name, type=Nullable(String)) (id=4, name=gender, type=Nullable(Int32)) (id=5, name=age, type=Nullable(Int32)) (id=6, name=birthday, type=Nullable(MyDateTime(0))) (id=7, name=phone, type=Nullable(String)) (id=8, name=union_id, type=Nulla…type=Nullable(String)) (id=141, name=wx_nickname, type=Nullable(String)) (id=142, name=bi_sync_flag, type=Int8) (id=143, name=key_identify_type, type=Nullable(String)) (id=144, name=key_identify_no, type=Nullable(String)) (id=145, name=phone_suffix, type=Nullable(String)) (id=146, name=customer_types, type=Nullable(String)) (id=149, name=_v$_idx_customer_profile_customer_types_0, type=Nullable(UInt64)) (id=150, name=_col$_other_phones_0, type=Nullable(String)) ];, decoding_snapshot_epoch is 
2,…5E0066007A008400A700A900B200B200B200D400D500D500D700D800E000E200E800E900EB00EB00EB00EB00EB00EB00EB00EB00EB00ED00EF00F700FD0002010401060108010A010C010D01110127013D01284BD007BEAB1B0259414E475F57414E47323032353130313032333030303536323430303031E4BF9EE680BB01454E435F53286E6D775A644A774B6E47536B5933596266747A676A513D3D29000000000006F0D4B719000000738678B8190A590000399213005B3135313930333834353131303237303736305D5B22E4BF9EE680BB225D5B22454E435F53286E6D775A644A774B6E47536B5933596266747A676A513D3D29225D5B34353131303237303736305D5B22E4BF9EE680BB225D5B22454E435F53286E6D775A644A774B6E47536B5933596266747A676A513D3D29225D5B5D59414E475F57414E475B2259414E475F57414E47323032353130313032333030303536323430303031225D005B5D01000000970DEFB219E8035A48495A48550020035B5D5B5D9D800000180124005A48495A485532363032375B5D5B5D5B5D5B5D5B5D0137383538030100000015000000090D00000000000000000000000301000000150000000A0D0000000000000000000000, , e.what() = DB::Exception, Stack trace:\n\n\n 0x55c0571eaef0\tDB::Exception::Exception<unsigned long&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>>(int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, unsigned long&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>&&) [tiflash+36699888]\n \tdbms/src/Common/StackTrace.cpp:23\n 0x55c0571e9a6b\tDB::ColumnVector::decodeTiDBRowV2Datum(unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, unsigned long, bool) [tiflash+36694635]\n \tdbms/src/Columns/ColumnVector.h:272\n 0x55c05d0a3c77\tDB::ColumnNullable::decodeTiDBRowV2Datum(unsigned long, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, unsigned long, bool) [tiflash+136023159]\n \tdbms/src/Columns/ColumnNullable…

The above is only part of the error output; there is also a [FATAL]-level entry with the same content.

Schema of the table that has the TiFlash replica (92 columns + 9 indexes):

CREATE TABLE `aaa` (
  `id` bigint NOT NULL,
  `customer_no` varchar(30) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `gender` int DEFAULT NULL,
  `age` int DEFAULT NULL,
  `birthday` datetime DEFAULT NULL,
  `phone` varchar(128) DEFAULT NULL,
  `union_id` varchar(128) DEFAULT NULL,
  `email` varchar(256) DEFAULT NULL,
  `identity_card` varchar(256) DEFAULT NULL,
  `address` varchar(1024) DEFAULT NULL,
  `customer_type` int DEFAULT '0',
  `is_referral` int DEFAULT '0',
  `referee_name` varchar(255) DEFAULT NULL,
  `referee_phone` varchar(128) DEFAULT NULL,
  `referee_car_number` varchar(128) DEFAULT NULL,
  `created_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `created_by` bigint DEFAULT NULL,
  `updated_by` bigint DEFAULT NULL,
  `redundant_customer_ids` varchar(2000) DEFAULT NULL,
.
.
.
.
.
.
  `key_identify_type` varchar(100) DEFAULT NULL,
  `key_identify_no` varchar(300) DEFAULT NULL,
  `phone_suffix` varchar(30) DEFAULT NULL,
  `customer_types` json DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `tm_customer_profile_info_phone_IDX` (`phone`),
  KEY `tm_customer_profile_info_union_id_IDX` (`union_id`),
  KEY `tm_customer_profile_info_customer_no_IDX` (`customer_no`),
  KEY `tm_customer_profile_info_query_IDX` (`is_latest`,`created_time`,`is_deleted`),
  KEY `tm_customer_profile_info_name_IDX` (`name`),
  KEY `tm_customer_profile_info_phone_suffix_IDX` (`phone_suffix`),
  KEY `idx_customer_profile_filter` (`is_deleted`,`is_latest`,`brand`,`id`),
  KEY `idx_customer_profile_customer_types` ((cast(`customer_types` as unsigned array))),
  KEY `tm_customer_profile_info_other_phones_IDX` (`other_phones`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

The primary key type is bigint and the largest id has 18 digits, yet the error reports a length of 22.

mysql> select max(id) from aaa;
+--------------------+
| max(id)            |
+--------------------+
| 499494239617552384 |
+--------------------+
1 row in set (0.01 sec)

The errors started after executing this SQL statement:

ALTER TABLE aaa MODIFY COLUMN other_phones varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL COMMENT '其他手机号';

It looks somewhat similar to this issue.

I haven't run into this before. It could be a bad block. Are there any disk errors at the system level?

The disks are fine. Let's try a rebuild and see.

Check the TiKV node logs to see whether any errors are reported.

2026-01-22 13:26:11 (UTC+08:00)TiKV *********:20160[endpoint.rs:975] [“cdc initialize fail: Request error message: "peer is not leader for region 127289, leader may None" not_leader { region_id: 127289 }”] [request_id=RequestId(2424)] [conn_id=ConnId(483)] [region_id=127289] [thread_id=161]

There are some CDC-related errors. Do they have any impact here?

Are you on version v8.5.4? This should have been fixed a long time ago.
https://github.com/pingcap/tidb/issues/53634

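Yes, it is v8.5.4, but I still hit it. I found a similar issue that is also marked as fixed. All I did was add a column to the table, and then the errors started, with endless restarts.

Could it be related to the table being too large?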

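The log looks fairly complicated; I haven't run into this before.

Was the cluster upgraded from an earlier version? Does dropping and recreating the TiFlash replica for this table restore it?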

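This looks like a logical bad block.

Has the table schema changed since the TiFlash replica was created?

Yes. There was originally just one TiFlash replica. After the column was added, replication started failing. Recreating the replica still produces the error.

The initial version was 8.5.1.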

The problem is caused by the multi-valued index after a varchar column is modified.
It can be reproduced with the following steps:

DROP TABLE IF EXISTS test.tm_test;

CREATE TABLE `test`.`tm_test` (
  `id` bigint NOT NULL,
  `customer_types` json DEFAULT NULL,
  `other_phones` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_customer_profile_customer_types` ((cast(`customer_types` as unsigned array)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='测试表';

INSERT INTO test.tm_test (id,customer_types,other_phones) VALUES
	 (40343892505477852,'[1, 0]',NULL),
	 (40343892505477907,'[1]',NULL),
	 (40343892505477915,'[1, 0]',NULL),
	 (40343892505477919,'[1]',NULL),
	 (40564894342709282,'[1, 0]',NULL),
	 (40564894342709318,'[1, 0]',NULL),
	 (40564894342709322,'[1]',NULL),
	 (40564894342709326,'[0]',NULL),
	 (40610284294152202,'[3]',NULL),
	 (40610284294152208,'[0]',''),
	 (40610284294152218,'[1, 0]',NULL),
	 (40610284294152242,'[1]',NULL),
	 (40617455906889734,'[1, 0]',NULL);

ALTER TABLE test.tm_test SET TIFLASH REPLICA 1;
select * from information_schema.tiflash_replica;

ALTER TABLE test.tm_test MODIFY COLUMN other_phones varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NULL;
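
The multi-valued index explanation is consistent with the error log in the first post: the message names column type UInt64, and in the visible part of the column_defines dump the only Nullable(UInt64) column is the hidden column behind idx_customer_profile_customer_types. The following Python sketch scans such a dump for UInt64 columns. The excerpt is copied from the log and is incomplete because the log itself is truncated, so other UInt64 columns may exist in the elided part; the helper name uint64_columns is hypothetical.

```python
import re

# Excerpt of the column_defines dump from the error log. The log is truncated,
# so this list is incomplete and serves only as an illustration.
COLUMN_DEFINES = (
    "(id=-1, name=_tidb_rowid, type=Int64) "
    "(id=-1024, name=_INTERNAL_VERSION, type=UInt64) "
    "(id=-1025, name=_INTERNAL_DELMARK, type=UInt8) "
    "(id=1, name=id, type=Int64) "
    "(id=146, name=customer_types, type=Nullable(String)) "
    "(id=149, name=_v$_idx_customer_profile_customer_types_0, type=Nullable(UInt64)) "
    "(id=150, name=_col$_other_phones_0, type=Nullable(String))"
)

# One "(id=..., name=..., type=...)" entry. The lookahead lets the type itself
# contain parentheses, as in Nullable(UInt64).
ENTRY = re.compile(r"\(id=(-?\d+), name=([^,]+), type=(.+?)\)(?= \(id=|$)")


def uint64_columns(dump: str) -> list[tuple[int, str, str]]:
    """Return (id, name, type) for every column whose type involves UInt64."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in ENTRY.finditer(dump)
        if "UInt64" in m.group(3)
    ]


for col_id, name, col_type in uint64_columns(COLUMN_DEFINES):
    print(col_id, name, col_type)
```

The stack trace passes through ColumnNullable::decodeTiDBRowV2Datum, which points at a nullable column rather than the non-nullable _INTERNAL_VERSION, but this is an inference from the truncated log, not a confirmed diagnosis.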

I have filed an issue: TiFlash is reporting errors and restarting continuously. [ERROR]: Detected overflow when decoding integer of length 22 with column type UInt64: pk_type is INT64, schema_snapshot->col_id_to_block_pos (Issue #10681, pingcap/tiflash)

  • Detected overflow when decoding integer of length 22 with column type UInt64
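  • Meaning: TiFlash expected to read an integer (possibly a primary key ID or another INT column), but encountered an anomalous 22-byte data fragment in the data stream, which it could not convert to UInt64.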
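
The check behind this message can be illustrated with a minimal Python sketch. It assumes, based on the error text and the ColumnVector::decodeTiDBRowV2Datum stack frame, that an encoded integer is rejected when its byte length exceeds the width of the target type (8 bytes for UInt64). The function name decode_uint and the little-endian layout are illustrative assumptions, not TiFlash's actual code. The sketch also shows why the 18-digit max(id) is unrelated to the reported length: 22 counts bytes of the encoded value, not decimal digits.

```python
UINT64_WIDTH = 8  # number of bytes in a UInt64


def decode_uint(raw: bytes, width: int = UINT64_WIDTH) -> int:
    """Decode a little-endian unsigned integer, rejecting oversized input."""
    if len(raw) > width:
        raise OverflowError(
            f"Detected overflow when decoding integer of length {len(raw)} "
            f"with column type UInt{width * 8}"
        )
    return int.from_bytes(raw, byteorder="little", signed=False)


# The largest id in the table has 18 decimal digits but still fits in 8 bytes.
max_id = 499494239617552384
assert decode_uint(max_id.to_bytes(8, "little")) == max_id

# A 22-byte payload cannot be a UInt64, whatever its content. This is what
# happens if, for example, string bytes are decoded as an integer column.
try:
    decode_uint(b"\x00" * 22)
except OverflowError as exc:
    print(exc)
```

Under these assumptions, the error indicates that the bytes handed to an integer column belong to a value of another type, which would fit a mismatch between the row data and the schema used to decode it.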