首頁>技術>

ClickHouse作為OLAP分析引擎已經被廣泛使用,資料的匯入匯出是使用者面臨的第一個問題。由於ClickHouse本身無法很好地支援單條大批次的寫入,因此在實時同步資料方面需要藉助其他服務協助。本文給出一種結合Canal+Kafka的方案,並且給出在多個MySQL例項分庫分表的場景下,如何將多張MySQL資料表寫入同一張ClickHouse表的方法,歡迎大家批評指正

我的需求就是將mysql多張表的資料實時同步到clickhouse中,參考了很多網上的資料,寫的都不是很好,有的配置完成後,就會報錯,今天終於實現了該功能,分享給大家。

我主要利用canal來實現ClickHouse實時同步MySQL資料,本次不考慮接入kafka等mq

配置mysql canal安裝配置 canal-client安裝配置 建立clickhouse的資料庫和表 結果展示

1. 配置mysql開啟binlog
# mysql配置檔案/etc/my.cnf新增下面配置vi  /etc/my.cnf#插入下面內容server-id        = 1log_bin          = /var/lib/mysql/bin.logbinlog-format    = row # very important if you want to receive write, update and delete row events# optionalexpire_logs_days = 30max_binlog_size  = 768M# setup listen addressbind-address     = 0.0.0.0   複製程式碼
新增同步賬號
#登陸mysql,執行下面命令,建立賬號aaa,密碼為123456 CREATE USER 'maxwell'@'%' IDENTIFIED BY '123456'; GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%'; flush privileges;  複製程式碼
2. canal安裝配置下載 canal, 訪問 release頁面 , 選擇需要的包下載, 如以 1.1.4 版本為例
#下載canalan安裝包wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz#解壓到指定目錄mkdir -p /usr/tool/canaltar zxvf  canal.deployer-1.1.4.tar.gz  -C  /usr/tool/canal#解壓後進入目錄,結構如下drwxr-xr-x   7 awwzc  staff   238 12 14 23:34 bindrwxr-xr-x   9 awwzc  staff   306 12 14 23:32 confdrwxr-xr-x  83 awwzc  staff  2822 12 14 23:30 libdrwxr-xr-x   4 awwzc  staff   136 12 14 23:34 logs# canal啟動時會讀取conf目錄下面的資料夾,當作instance,進入conf目錄下複製example資料夾cp -R  example/  maxwell/#移除example資料夾mv -rf example#修改canal.properties(一定要修改)# instance列表,conf目錄下必須有同名的目錄canal.destinations = maxwell#修改maxwell資料夾下面的instance.properties檔案下面幾項為你自己的資料庫配置即可vi conf/maxwell/instance.properties# position infocanal.instance.master.address=192.168.0.102:3306# username/passwordcanal.instance.dbUsername=maxwellcanal.instance.dbPassword=123456# 啟動,安裝目錄下執行以下命令,server,instance出現下面日記說明啟動成功bin/startup.sh# 檢視server日記,會出現以下日記tail -200f  logs/canal/canal.log2019-12-14 23:34:47.247 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler2019-12-14 23:34:47.312 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations2019-12-14 23:34:47.334 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.2019-12-14 23:34:47.406 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[192.168.0.111(192.168.0.111):11111]2019-12-14 23:34:49.026 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......# 檢視instance日記,會出現以下日記tail -200f  logs/maxwell/maxwell.log2019-12-15 17:59:12.908 [destination = example , address = /192.168.0.102:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position2019-12-15 17:59:12.913 [destination = example , address = /192.168.0.102:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just last position {"identity":{"slaveId":-1,"sourceAddress":{"address":"192.168.0.102","port":3306}},"postion":{"gtid":"","included":false,"journalName":"bin.000002","position":249315,"serverId":1,"timestamp":1576282583000}}2019-12-15 17:59:13.015 [destination = example , address = /192.168.0.102:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=bin.000002,position=249315,serverId=1,gtid=,timestamp=1576282583000] cost : 105ms , the next step is binlog dump#關閉sh bin/stop.sh複製程式碼
3. canal-client安裝配置下載 canal-adapter, 訪問 release頁面 , 選擇需要的包下載, 如以 1.1.4 版本為例
#下載wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.adapter-1.1.4.tar.gz#解壓mkdir -p /usr/tool/canal-adaptertar zxvf canal.adapter-1.1.4.tar.gz  -C /usr/tool/canal-adapter#解壓後目錄如下drwxr-xr-x   7 awwzc  staff   238 12 15 13:19 bindrwxr-xr-x   9 awwzc  staff   306 12 15 13:18 confdrwxr-xr-x  87 awwzc  staff  2958 12 15 13:18 libdrwxr-xr-x   3 awwzc  staff   102 12 15 13:19 logsdrwxr-xr-x   6 awwzc  staff   204 12 15 13:09 plugin#在lib目錄下面新增clickhouse連線驅動httpcore-4.4.13.jar、httpclient-4.3.3.jar、clickhouse-jdbc-0.2.4.jar;#修改配置檔案conf/application.yml檔案,修改canalServerHost、srcDataSources、canalAdapters的配置;canal.conf:    mode: tcp    canalServerHost: 127.0.0.1:11111   # canal-server的服務地址    secretKey:    #同步資料來源配置    srcDataSources    defaultDS:    #mysql連線資訊          url: jdbc:mysql://192.168.0.102:3306/maxwell?useUnicode=true          username: root          password: 123456      canalAdapters:      - instance: maxwell # canal instance Name or mq topic name        groups:        - groupId: g1          outerAdapters:          - name: logger          - name: rdb   #rdb型別            key: mysql            properties:            #clickhouse資料看配置              jdbc.driverClassName: ru.yandex.clickhouse.ClickHouseDriver              jdbc.url: jdbc:clickhouse://127.0.0.1:8123/maxwell              jdbc.username: default              jdbc.password:              1. 源資料庫與目標資料庫名字不同,源表名與目標表名不同  #修改adapter的conf/rdb/mytest_user.yml配置檔案,指定源資料庫和目標資料庫  dataSourceKey: defaultDS  destination: maxwell  groupId: g1  outerAdapterKey: mysql1  concurrent: true  dbMapping:    database: aqi_china    table: tb_aqi_0    targetTable: aqi_china.tb_aqi    #targetPk:    #  id: id    mapAll: true    #targetColumns:    #  id:    #  name:    #  role_id:    #  c_time:    #  test1:    #etlCondition: "where c_time>={}"    commitBatch: 3000 # 批次提交的大小     2. 多個源資料庫表寫入目的端的同一張表   在conf/rdb 目錄配置多個yml檔案,分別指明不同的table名稱。複製程式碼
啟動
bin/startup.sh#檢視日記,出現以下日記說名啟動成功tail -200f logs/adapter/adapter.log複製程式碼
我在啟動的時候出現了記憶體溢位的異常,可以透過修改startup.sh的配置:

啟動後需要看日誌,成功啟動後會列印:

2019-12-15 13:19:40.457 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## start the canal client adapters.2019-12-15 13:19:40.464 [main] INFO  c.a.otter.canal.client.adapter.support.ExtensionLoader - extension classpath dir: /Users/awwzc/Documents/my_soft/tool/canal-adpater/plugin2019-12-15 13:19:40.542 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: logger succeed2019-12-15 13:19:40.546 [main] INFO  c.a.otter.canal.client.adapter.rdb.config.ConfigLoader - ## Start loading rdb mapping config ...2019-12-15 13:19:40.637 [main] INFO  c.a.otter.canal.client.adapter.rdb.config.ConfigLoader - ## Rdb mapping config loaded2019-12-15 13:19:40.640 [main] ERROR com.alibaba.druid.pool.DruidDataSource - testWhileIdle is true, validationQuery not set2019-12-15 13:19:40.951 [main] INFO  com.alibaba.druid.pool.DruidDataSource - {dataSource-2} inited2019-12-15 13:19:40.959 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: rdb succeed2019-12-15 13:19:40.986 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Start adapter for canal instance: example succeed2019-12-15 13:19:40.986 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to connect destination: example <=============2019-12-15 13:19:40.986 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## the canal client adapters are running now ......2019-12-15 13:19:40.995 [main] INFO  org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8081"]2019-12-15 13:19:41.021 [main] INFO  org.apache.tomcat.util.net.NioSelectorPool - Using a shared selector for servlet write/read2019-12-15 13:19:41.048 [main] INFO  o.s.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 8081 (http) with context path ''2019-12-15 13:19:41.053 [main] INFO  c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 7.099 seconds (JVM running for 7.832)2019-12-15 13:19:41.122 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to subscribe destination: example <=============2019-12-15 13:19:41.128 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Subscribe destination: example succeed <=============複製程式碼
4. clickhouse建立資料庫和表
#建立資料庫CREATE DATABASE aqi_china;#建立表CREATE TABLE IF NOT EXISTS  aqi_china.tb_aqi (id Int32,uuid String,uid Int32,aqi Int32,url String,co String,dew String,h String,no2 String,o3 String,p String,pm10 String,pm25 String,so2 String,t String,w String,wg String,vtime String,ftime Int32) ENGINE=MergeTree() PARTITION BY uid ORDER BY id SETTINGS index_granularity = 8192;複製程式碼
5. 結果展示我們來看一下展示結果:

這是canal客戶端列印的日誌:

展示clickhouse的資料:

19
最新評論
  • BSA-TRITC(10mg/ml) TRITC-BSA 牛血清白蛋白改性標記羅丹明
  • istio bookinfo微服務搭建流程