
I. The Symptom

A project team reported that their system was extremely slow. After logging in to MySQL we saw a large number of SQL statements in the Opening tables state. The problem was intermittent, disappearing for a while and then coming back, which was rather puzzling.

We had run into a similar symptom before: with AHI (the Adaptive Hash Index) enabled, when a large table is dropped or truncated, MySQL also removes that table's AHI entries from the buffer pool. While doing so it holds the data dictionary mutex (a very heavyweight lock), so other user threads end up in the Opening tables state because they cannot acquire the dictionary mutex.

In this case, however, we checked every session's SQL and found no DROP or TRUNCATE statements, so it should not be that problem; some other cause was producing a similar symptom.

One suspicious detail: this MySQL server's disk write performance was very poor. Could the slow disk writes be behind this behavior?

Since we could not identify the exact cause on the spot, we took a snapshot of the MySQL process so we could dig into it in detail afterwards.

II. Analyzing the Stack Frames

After processing the snapshot with pt-pmp there is a lot of output; only the most relevant stacks are shown here:

1402 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex<TTASEventMutex<GenericPolicy>(ib0mutex.h:850),dict_table_open_on_name(dict0dict.cc:1238),ha_innobase::open_dict_table(ha_innodb.cc:6250),ha_innobase::open(ha_innodb.cc:5888),handler::ha_open(handler.cc:2759),open_table_from_share(table.cc:3353),open_table(sql_base.cc:3559),open_and_process_table(sql_base.cc:5145),open_tables(sql_base.cc:5145),open_tables_for_query(sql_base.cc:6531),execute_sqlcom_select(sql_parse.cc:5127),mysql_execute_command(sql_parse.cc:2792),mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6)

36 nanosleep(libpthread.so.0),os_thread_sleep(os0thread.cc:279),buf_flush_wait_flushed(buf0flu.cc:2074),log_preflush_pool_modified_pages(log0log.cc:1531),log_checkpoint_margin(log0log.cc:1973),log_check_margins(log0log.cc:1973),log_free_check(log0log.ic:491),row_ins_sec_index_entry(log0log.ic:491),row_ins_index_entry(row0ins.cc:3460),row_ins_index_entry_step(row0ins.cc:3460),row_ins(row0ins.cc:3460),row_ins_step(row0ins.cc:3460),row_insert_for_mysql_using_ins_graph(row0mysql.cc:1738),ha_innobase::write_row(ha_innodb.cc:7566),handler::ha_write_row(handler.cc:7991),write_record(sql_insert.cc:1873),Sql_cmd_insert::mysql_insert(sql_insert.cc:769),Sql_cmd_insert::execute(sql_insert.cc:3105),mysql_execute_command(sql_parse.cc:3566),mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6)

1 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),buf_dblwr_flush_buffered_writes(buf0dblwr.cc:979),buf_dblwr_add_to_batch(buf0dblwr.cc:1154),buf_flush_write_block_low(buf0flu.cc:1099),buf_flush_page(buf0flu.cc:1099),buf_flush_try_neighbors(buf0flu.cc:1453),buf_flush_page_and_try_neighbors(buf0flu.cc:1453),buf_do_flush_list_batch(buf0flu.cc:1780),buf_flush_batch(buf0flu.cc:1780),buf_flush_do_batch(buf0flu.cc:1780),pc_flush_slot(buf0flu.cc:2874),buf_flush_page_cleaner_worker(buf0flu.cc:3504),start_thread(libpthread.so.0),clone(libc.so.6)

1 nanosleep(libpthread.so.0),os_thread_sleep(os0thread.cc:279),buf_flush_wait_flushed(buf0flu.cc:2074),log_preflush_pool_modified_pages(log0log.cc:1531),log_checkpoint_margin(log0log.cc:1973),log_check_margins(log0log.cc:1973),log_free_check(log0log.ic:491),que_run_threads_low(log0log.ic:491),que_run_threads(log0log.ic:491),trx_purge(trx0purge.cc:1883),srv_do_purge(srv0srv.cc:2631),srv_purge_coordinator_thread(srv0srv.cc:2631),start_thread(libpthread.so.0),clone(libc.so.6)

1 nanosleep(libpthread.so.0),os_thread_sleep(os0thread.cc:279),buf_flush_wait_flushed(buf0flu.cc:2074),log_preflush_pool_modified_pages(log0log.cc:1531),log_checkpoint_margin(log0log.cc:1973),log_check_margins(log0log.cc:1973),log_free_check(log0log.ic:491),que_run_threads_low(log0log.ic:491),que_run_threads(log0log.ic:491),que_eval_sql(que0que.cc:1232),dict_stats_exec_sql(dict0stats.cc:311),dict_stats_save(dict0stats.cc:2415),dict_stats_update(dict0stats.cc:3112),dict_stats_process_entry_from_recalc_pool(dict0stats_bg.cc:356),dict_stats_thread(dict0stats_bg.cc:356),start_thread(libpthread.so.0),clone(libc.so.6)

The 1402 threads are spinning: they busy-wait because they cannot acquire a certain mutex. These should be the many sessions we saw in the Opening tables state, but figuring out exactly which lock it is requires tracing the MySQL source code.
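
For readers less familiar with this kind of mutex, here is a minimal, self-contained C++ sketch of the "spin a bounded number of rounds, then park and wait to be woken" pattern that the stacks above show (spin_and_try_lock -> sync_array_wait_event -> os_event::wait_low). This is not InnoDB code; the class name and kSpinRounds are made up for illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class SpinThenWaitMutex {
 public:
  void lock() {
    // Phase 1: busy-wait for a bounded number of rounds, hoping the
    // holder releases quickly (this is the "spin" the 1402 threads do).
    for (int i = 0; i < kSpinRounds; ++i) {
      if (try_lock()) return;
      std::this_thread::yield();
    }
    // Phase 2: give up spinning and sleep until the holder wakes us,
    // analogous to sync_array_wait_event / os_event::wait_low.
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return try_lock(); });
  }

  bool try_lock() {
    return !locked_.exchange(true, std::memory_order_acquire);
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> lk(m_);
      locked_.store(false, std::memory_order_release);
    }
    cv_.notify_one();  // wake one parked waiter, if any
  }

 private:
  static constexpr int kSpinRounds = 30;
  std::atomic<bool> locked_{false};
  std::mutex m_;
  std::condition_variable cv_;
};
```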

The 36 threads are all INSERT operations, and every one of them ends up waiting in buf_flush_wait_flushed, apparently waiting for a flush to complete. The last three threads are:

1. page_cleaner: the background dirty-page flushing thread, currently inside buf_dblwr_flush_buffered_writes

2. purge_thread: the purge thread (cleans up undo), currently inside buf_flush_wait_flushed

3. dict_stats_thread: the statistics update thread, currently inside buf_flush_wait_flushed

A quick look at these threads shows that the insert threads, purge_thread and dict_stats_thread all share the same tail of the call stack:

log_free_check
 -> log_check_margins
  -> log_checkpoint_margin
   -> log_preflush_pool_modified_pages
    -> buf_flush_wait_flushed
     -> os_thread_sleep

III. Tracing the Source Code

Server version: 5.7.22-log MySQL Community Server (GPL)

I split the source code analysis into two parts. Part one: analyze the stack shared by the 1402 threads and work out which lock resource they are waiting for. Part two: find out which thread is holding that lock, and why it is not releasing it.

For part one, the process is to pick a thread with a stack like the one below from the MySQL snapshot (some arguments are stripped from the display) and then follow the function calls through the MySQL source step by step.

#0  0x00007efc61303965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000ffb03b in wait at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/os/os0event.cc:165
#2  os_event::wait_low at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/os/os0event.cc:335
#3  0x00000000010a0ce9 in sync_array_wait_event at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/sync/sync0arr.cc:475
#4  0x0000000000f8c1c4 in TTASEventMutex<GenericPolicy>::wait at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/include/ut0mutex.ic:89
#5  0x0000000000f8c33b in spin_and_try_lock at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/include/ib0mutex.h:850
#6  enter at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/include/ib0mutex.h:707
#7  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/include/ib0mutex.h:987
#8  0x0000000001152f41 in dict_table_open_on_name at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0dict.cc:1238
#9  0x0000000000f72a73 in ha_innobase::open_dict_table at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/handler/ha_innodb.cc:6250
#10 0x0000000000f8273b in ha_innobase::open at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/handler/ha_innodb.cc:5888
#11 0x000000000081b33e in handler::ha_open at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/handler.cc:2759
#12 0x0000000000dc239a in open_table_from_share at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/table.cc:3353
#13 0x0000000000cc18b9 in open_table at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_base.cc:3559
#14 0x0000000000cc52b6 in open_and_process_table at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_base.cc:5145
#15 open_tables at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_base.cc:5756
#16 0x0000000000cc5e62 in open_tables_for_query at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_base.cc:6531
#17 0x0000000000d14ff6 in execute_sqlcom_select at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_parse.cc:5127
#18 0x0000000000d18bce in mysql_execute_command at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_parse.cc:2792
#19 0x0000000000d1aaad in mysql_parse at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_parse.cc:5582
#20 0x0000000000d1bcca in dispatch_command at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_parse.cc:1458
#21 0x0000000000d1cb74 in do_command at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/sql_parse.cc:999
#22 0x0000000000dedaec in handle_connection at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:300
#23 0x0000000001256a94 in pfs_spawn_thread at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/perfschema/pfs.cc:2190
#24 0x00007efc612ffdd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007efc5fdb8ead in clone () from /lib64/libc.so.6

Walking through these calls shows that such threads end up inside dict_table_open_on_name (note the line number 1238, matching the stack), spinning and waiting because they cannot acquire dict_sys->mutex, the data dictionary mutex.
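
To illustrate why every table open can serialize on this single lock, here is a heavily simplified, self-contained model. It is not the actual dict0dict.cc code; the struct, the map and the _model suffix are stand-ins, and only the locking shape matters: the table lookup happens under dict_sys->mutex.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

struct dict_table_t { std::string name; };

struct dict_sys_t {
  std::mutex mutex;                                      // the data dictionary mutex
  std::unordered_map<std::string, dict_table_t> tables;  // dictionary cache, keyed by name
};

dict_sys_t dict_sys_instance;
dict_sys_t* dict_sys = &dict_sys_instance;

// Every table open goes through here, so a thread that holds
// dict_sys->mutex for a long time leaves all other sessions in "Opening tables".
dict_table_t* dict_table_open_on_name_model(const std::string& name) {
  std::lock_guard<std::mutex> guard(dict_sys->mutex);  // the wait the 1402 stacks show
  auto it = dict_sys->tables.find(name);
  return (it == dict_sys->tables.end()) ? nullptr : &it->second;
}
```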

Now for part two: which thread holds dict_sys->mutex, and why does it not release it? A little trick here is to only look at the stacks whose count is 1, because the data dictionary mutex can be held by only one thread at a time; that leaves mainly the last three threads. Let's start with the most suspicious one, dict_stats_thread, not least because its name also contains "dict".

#0  0x00007efc61306e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000000ffb7a0 in os_thread_sleep at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/os/os0thread.cc:279
#2  0x0000000001135779 in buf_flush_wait_flushed at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/buf/buf0flu.cc:2074
#3  0x0000000000fd4589 in log_preflush_pool_modified_pages at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/log/log0log.cc:1531
#4  0x0000000000fd7a69 in log_checkpoint_margin at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/log/log0log.cc:1973
#5  log_check_margins at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/log/log0log.cc:2065
#6  0x000000000101af4d in log_free_check at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/include/log0log.ic:491
#7  que_run_threads_low at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/que/que0que.cc:1109
#8  que_run_threads at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/que/que0que.cc:1155
#9  0x000000000101b86e in que_eval_sql at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/que/que0que.cc:1232
#10 0x0000000001176b9c in dict_stats_exec_sql at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0stats.cc:311
#11 0x0000000001179346 in dict_stats_save at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0stats.cc:2415
#12 0x000000000117a9ca in dict_stats_update at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0stats.cc:3112
#13 0x000000000117cc3a in dict_stats_process_entry_from_recalc_pool at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0stats_bg.cc:356
#14 dict_stats_thread at /export/home/pb2/build/sb_0-27500212-1520171728.22/mysql-5.7.22/storage/innobase/dict/dict0stats_bg.cc:445
#15 0x00007efc612ffdd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007efc5fdb8ead in clone () from /lib64/libc.so.6

Following this thread's calls through the MySQL source in the same way, to check whether it holds the data dictionary mutex, we are in luck: in dict_stats_save it acquires the data dictionary mutex and then enters dict_stats_exec_sql to run the SQL that updates the statistics, without releasing the mutex in between (verifying this means checking the function branches; there are many functions involved, so the details are omitted here). A simplified sketch of this ordering is shown below.
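
Below is a minimal, self-contained model of that ordering. It is not the real dict0stats.cc; the functions carry a _model suffix and empty bodies, and only the lock-then-run-SQL shape described above is kept.

```cpp
#include <mutex>
#include <string>

struct dict_sys_t { std::mutex mutex; };
dict_sys_t dict_sys_obj;
dict_sys_t* dict_sys = &dict_sys_obj;

// Stand-in for running the internal statistics SQL. In the real server this is
// where que_eval_sql -> ... -> log_free_check ended up waiting for a flush.
void dict_stats_exec_sql_model(const std::string& sql) { (void)sql; }

void dict_stats_save_model() {
  std::lock_guard<std::mutex> guard(dict_sys->mutex);  // dictionary mutex taken here
  dict_stats_exec_sql_model(
      "UPDATE mysql.innodb_table_stats SET ... WHERE database_name = ...");
  // The mutex is only released when the whole save, including the SQL above,
  // returns; this is exactly why the Opening tables sessions piled up.
}
```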

Having established that dict_stats_thread holds the data dictionary mutex, the next question is why it never releases it. Continuing down its stack, the functions from log_free_check onwards are the ones worth studying; as noted earlier, the insert threads, purge_thread and dict_stats_thread all end up running exactly this logic.

dict_stats_save
 -> dict_stats_exec_sql
  -> que_eval_sql
   -> que_run_threads
    -> que_run_threads_low
     -> log_free_check          /* Check that there is enough space in the log to accommodate possible log entries by this query step. */
      -> log_check_margins      /* Checks that there is enough free space in the log to start a new query step. Flushes the log buffer or makes a new checkpoint if necessary. */
       -> log_checkpoint_margin /* Tries to establish a big enough margin of free space in the log groups, such that a new log entry can be catenated without an immediate need for a checkpoint. */
        -> log_preflush_pool_modified_pages /* Advances the smallest lsn for which there are unflushed dirty blocks in the buffer pool. */
         -> buf_flush_wait_flushed /* Waits until a flush batch of the given lsn ends */
          -> os_thread_sleep

This logic first checks whether the log buffer has filled up to its threshold (log->buf_free > log->max_buf_free); if so, the contents of the log buffer must first be written out to the redo log.

It then checks whether the dirty pages in the InnoDB buffer pool have fallen too far behind, i.e. whether the age of the oldest unflushed modification exceeds its threshold ((log->lsn - oldest_lsn) > log->max_modified_age_sync); if so, a synchronous dirty-page flush is required, which can take a long time and blocks the other threads that are currently writing.

Finally it checks whether a checkpoint is due, by comparing log->lsn - log->last_checkpoint_lsn against max_checkpoint_age and max_checkpoint_age_async, and accordingly performs a synchronous checkpoint, an asynchronous checkpoint, or no checkpoint at all.
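
Putting the three checks together, here is a simplified, self-contained sketch of the decision order just described. It is not the actual log0log.cc logic; the field names mirror the InnoDB ones, but the struct, the enum and the _model suffix are invented for illustration.

```cpp
#include <cstdint>

struct log_model_t {
  uint64_t lsn;                       // current log sequence number
  uint64_t last_checkpoint_lsn;       // LSN of the most recent checkpoint
  uint64_t buf_free;                  // first free offset in the log buffer
  uint64_t max_buf_free;              // threshold above which the log buffer is flushed
  uint64_t max_modified_age_sync;     // dirty-page age that forces a synchronous preflush
  uint64_t max_checkpoint_age_async;  // checkpoint age that triggers an async checkpoint
  uint64_t max_checkpoint_age;        // checkpoint age that forces a sync checkpoint
};

enum class Action { kNone, kFlushLogBuffer, kSyncPreflush, kAsyncCheckpoint, kSyncCheckpoint };

// oldest_lsn: oldest modification LSN among the dirty pages in the buffer pool.
Action log_check_margins_model(const log_model_t& log, uint64_t oldest_lsn) {
  if (log.buf_free > log.max_buf_free) {
    return Action::kFlushLogBuffer;  // write the log buffer out to the redo log first
  }
  const uint64_t modified_age = log.lsn - oldest_lsn;
  if (modified_age > log.max_modified_age_sync) {
    // Synchronous dirty-page flush: the caller ends up in
    // buf_flush_wait_flushed and blocks until the page cleaner catches up.
    return Action::kSyncPreflush;
  }
  const uint64_t checkpoint_age = log.lsn - log.last_checkpoint_lsn;
  if (checkpoint_age > log.max_checkpoint_age) {
    return Action::kSyncCheckpoint;
  }
  if (checkpoint_age > log.max_checkpoint_age_async) {
    return Action::kAsyncCheckpoint;
  }
  return Action::kNone;
}
```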

In our case execution ends up in log_preflush_pool_modified_pages, i.e. the synchronous dirty-page flush. Flushing dirty pages means waking up the page cleaner thread to do the actual writes, while the caller sits in buf_flush_wait_flushed waiting for the flush to finish. This is exactly what the earlier stacks show: the insert threads, purge_thread and dict_stats_thread are all waiting for the page cleaner thread to flush dirty pages.
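
To make concrete what "waiting for the flush" means for the caller, here is a minimal model (again not the real buf0flu.cc; the globals and wake_page_cleaner are stand-ins): the waiter does no I/O itself, it only asks the page cleaner to flush and then sleep-polls until the flushed LSN catches up.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

std::atomic<uint64_t> flushed_to_lsn{0};  // advanced by the page cleaner thread

void wake_page_cleaner() { /* stand-in: signal the page cleaner's event */ }

void buf_flush_wait_flushed_model(uint64_t new_oldest) {
  wake_page_cleaner();
  // On a slow disk this loop runs for a long time, while the caller keeps
  // every lock it already holds (for dict_stats_thread, dict_sys->mutex).
  while (flushed_to_lsn.load(std::memory_order_acquire) < new_oldest) {
    std::this_thread::sleep_for(std::chrono::microseconds(100));  // os_thread_sleep
  }
}
```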

I may not understand every corner of this part perfectly; if you are interested, the article MySQL · 引擎特性 · InnoDB redo log漫遊 (linked below) covers it in great detail.

MySQL · 引擎特性 · InnoDB redo log漫遊

http://mysql.taobao.org/monthly/2015/05/01/

IV. Summary

Putting the stack analysis together, the root cause is still the poor disk write performance: the oldest dirty-page LSN in the InnoDB buffer pool fell too far behind the current LSN, and once the gap crossed the threshold it triggered MySQL's synchronous dirty-page flushing, which stalls every thread that is writing. That is why the many INSERT statements, the purge thread and dict_stats_thread (the statistics update thread) were all blocked.

The many sessions we initially saw in the Opening tables state were, in a way, collateral damage. When the amount of changed data in a table reaches a certain proportion (10% by default), the background dict_stats_thread is triggered; it takes dict_sys->mutex (the data dictionary mutex) to update that table's statistics, but it got stuck in the synchronous dirty-page flush and could not release dict_sys->mutex in time, so many SQL statements ended up in Opening tables because they could not acquire dict_sys->mutex.

Appendix:

http://mysql.taobao.org/monthly/2015/05/01/
http://blog.itpub.net/7728585/viewspace-2149106/
https://my.oschina.net/yejr/blog/4594471
https://developer.aliyun.com/article/41043
http://mysql.taobao.org/monthly/2020/03/08/
