TiDB version: 8.1.1, deployed on Huawei Cloud.
TiFlash restarted. In tiflash.log I found the following error:
2131036:[2026/03/11 11:01:32.451 +08:00] [FATAL] [Exception.cpp:106] ["Code: 0, e.displayText() = DB::TiFlashException: Memory limit exceeded caused by 'RSS(Resident Set Size) much larger than limit' : process memory size would be 137.91 GiB for (attempt to allocate chunk of 2097152 bytes), limit of memory for data computing : 136.63 GiB. Memory Usage of Storage: non-query: peak=30.95 GiB, amount=1.92 MiB; kvstore: peak=904.57 MiB, amount=10.27 KiB; query-storage-task: peak=13.20 GiB, amount=12.75 GiB; fetch-pages: peak=0.00 B, amount=0.00 B; shared-column-data: peak=13.20 GiB, amount=12.75 GiB., e.what() = DB::TiFlashException, Stack trace:\n\n\n 0x1b97e0c\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+28933644]\n \tdbms/src/Common/TiFlashException.h:263\n 0x1b97120\tMemoryTracker::alloc(long, bool) [tiflash+28930336]\n \tdbms/src/Common/MemoryTracker.cpp:219\n 0x1b96cf8\tMemoryTracker::alloc(long, bool) [tiflash+28929272]\n \tdbms/src/Common/MemoryTracker.cpp:230\n 0x1ba1c0c\tAllocator<false>::alloc(unsigned long, unsigned long) [tiflash+28974092]\n \tdbms/src/Common/Allocator.cpp:68\n 0x1c084c0\tvoid DB::PODArrayBase<1ul, 4096ul, Allocator<false>, 15ul, 16ul>::alloc<>(unsigned long) [tiflash+29394112]\n \tdbms/src/Common/PODArray.h:145\n 0x7291568\tDB::ColumnString::insertRangeFrom(DB::IColumn const&, unsigned long, unsigned long) [tiflash+120132968]\n \tdbms/src/Columns/ColumnString.cpp:97\n 0x6a6a800\tDB::DM::ColumnFileInMemory::readDataForFlush() const [tiflash+111585280]\n \tdbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileInMemory.cpp:106\n 0x6a9ec44\tDB::DM::MemTableSet::buildFlushTask(DB::DM::DMContext&, unsigned long, unsigned long, unsigned long) [tiflash+111799364]\n \tdbms/src/Storages/DeltaMerge/Delta/MemTableSet.cpp:310\n 0x6a8cb90\tDB::DM::DeltaValueSpace::flush(DB::DM::DMContext&) [tiflash+111725456]\n 
\tdbms/src/Storages/DeltaMerge/Delta/DeltaValueSpace.cpp:365\n 0x695d65c\tDB::DM::Segment::flushCache(DB::DM::DMContext&) [tiflash+110483036]\n \tdbms/src/Storages/DeltaMerge/Segment.cpp:2279\n 0x690008c\tDB::DM::DeltaMergeStore::flushCache(std::__1::shared_ptr<DB::DM::DMContext> const&, DB::DM::RowKeyRange const&, bool) [tiflash+110100620]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:774\n 0x69028e0\tDB::DM::DeltaMergeStore::flushCache(DB::Context const&, DB::DM::RowKeyRange const&, bool) [tiflash+110110944]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:747\n 0x7bd939c\tDB::KVStore::tryFlushRegionCacheInStorage(DB::TMTContext&, DB::Region const&, std::__1::shared_ptr<DB::Logger> const&, bool) [tiflash+129864604]\n \tdbms/src/Storages/KVStore/KVStore.cpp:227\n 0x7c340c4\tDB::KVStore::forceFlushRegionDataImpl(DB::Region&, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long) const [tiflash+130236612]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:255\n 0x7c3363c\tDB::KVStore::canFlushRegionDataImpl(std::__1::shared_ptr<DB::Region> const&, unsigned char, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long, unsigned long, unsigned long) [tiflash+130233916]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:230\n 0x7c33dd4\tDB::KVStore::tryFlushRegionData(unsigned long, bool, bool, DB::TMTContext&, unsigned long, unsigned long, unsigned long, unsigned long) [tiflash+130235860]\n \tdbms/src/Storages/KVStore/MultiRaft/Persistence.cpp:123\n 0x7c0ea08\tTryFlushData [tiflash+130083336]\n \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:161\n 0xffff9a840f9c\t_$LT$engine_store_ffi..observer..TiFlashObserver$LT$T$C$ER$GT$$u20$as$u20$raftstore..coprocessor..AdminObserver$GT$::pre_exec_admin::h2f5bf67dbdf7c90f [libtiflash_proxy.so+26152860]\n \tcontrib/tiflash-proxy/proxy_components/engine_store_ffi/src/observer.rs:120\n 
0xffff9b665724\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::apply_raft_cmd::h9308910d47c3ade6 [libtiflash_proxy.so+40982308]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1429\n 0xffff9b67aa94\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::he5587c01a9599a25 [libtiflash_proxy.so+41069204]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1377\n 0xffff9b67cb6c\traftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h849a05848402ae24 [libtiflash_proxy.so+41077612]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:1129\n 0xffff9b65bce4\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::h915edb389d0ce878 [libtiflash_proxy.so+40942820]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4020\n 0xffff9b65ec14\traftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::hc0f710a21a8448f8 [libtiflash_proxy.so+40954900]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4351\n 0xffff9a91e7b8\t_$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::h474edac058d2c646 [libtiflash_proxy.so+27060152]\n \tcontrib/tiflash-proxy/components/raftstore/src/store/fsm/apply.rs:4633\n 0xffff9a89b618\tbatch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hdbfc86c50b98d3ed [libtiflash_proxy.so+26523160]\n \tcontrib/tiflash-proxy/components/batch-system/src/batch.rs:380\n 0xffff9a970444\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h209bcd90e7cc37ca [libtiflash_proxy.so+27395140]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121\n 0xffff9a9b2284\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h159d73113cffcc67 
[libtiflash_proxy.so+27665028]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513\n 0xffff9bd9f29c\tstd::sys::unix::thread::Thread::new::thread_start::h45f22376cc6c77f8 [libtiflash_proxy.so+48558748]\n \t/root/.rustup/toolchains/nightly-2022-11-15-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108\n 0xffff98e17d38\tstart_thread [libpthread.so.0+32056]\n 0xffff98c0f680\tthread_start [libc.so.6+915072]"] [source="uint8_t DB::TryFlushData(DB::EngineStoreServerWrap *, uint64_t, uint8_t, uint64_t, uint64_t, uint64_t, uint64_t)"] [thread_id=9637]
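It looks like a statement needed more memory than TiFlash's memory limit, which caused the restart. However, I checked the slow query log and did not find any statement that used an unusually large amount of memory. Has anyone run into a similar situation?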
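To make the numbers in the FATAL line easier to compare, here is a minimal Python sketch that pulls out the process memory size, the limit, and the per-tracker peak/amount values. The function and variable names (`parse_memory_limit_error`, `SAMPLE`) are made up for illustration, and the regular expressions assume the message format is exactly what appears in the log line above; other TiFlash versions may word it differently.

```python
import re

# Size units as they appear in the log message above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
SIZE = r"([\d.]+) (B|KiB|MiB|GiB)"  # two groups: value, unit


def to_bytes(value, unit):
    """Convert a value/unit pair such as ("12.75", "GiB") to bytes."""
    return float(value) * UNITS[unit]


def parse_memory_limit_error(message):
    """Hypothetical helper: extract sizes from a 'Memory limit exceeded' message."""
    process = re.search(r"process memory size would be " + SIZE, message)
    limit = re.search(r"limit of memory for data computing : " + SIZE, message)
    trackers = {}
    # Each tracker looks like "name: peak=<size>, amount=<size>".
    for name, pv, pu, av, au in re.findall(
        r"([\w-]+): peak=" + SIZE + r", amount=" + SIZE, message
    ):
        trackers[name] = {"peak": to_bytes(pv, pu), "amount": to_bytes(av, au)}
    return {
        "process": to_bytes(*process.groups()),
        "limit": to_bytes(*limit.groups()),
        "trackers": trackers,
    }


# Excerpt copied from the FATAL line in this post.
SAMPLE = (
    "process memory size would be 137.91 GiB for (attempt to allocate chunk "
    "of 2097152 bytes), limit of memory for data computing : 136.63 GiB. "
    "Memory Usage of Storage: non-query: peak=30.95 GiB, amount=1.92 MiB; "
    "kvstore: peak=904.57 MiB, amount=10.27 KiB; "
    "query-storage-task: peak=13.20 GiB, amount=12.75 GiB; "
    "fetch-pages: peak=0.00 B, amount=0.00 B; "
    "shared-column-data: peak=13.20 GiB, amount=12.75 GiB."
)

result = parse_memory_limit_error(SAMPLE)
gib = 1024**3
print(f"process={result['process'] / gib:.2f} GiB, limit={result['limit'] / gib:.2f} GiB")
for name, usage in result["trackers"].items():
    print(f"{name}: peak={usage['peak'] / gib:.2f} GiB, amount={usage['amount'] / gib:.2f} GiB")
```

The sketch only lists the values side by side. It makes no assumption about whether the trackers overlap, so it does not sum them.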