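Having covered Hive, we now turn to HBase, another storage engine built on Hadoop. HBase is a key-value database that uses memory as a cache and persists its data to HDFS. The in-memory cache is what gives it low-latency writes and real-time lookups, which makes HBase a staple for big-data storage in real-time scenarios. A common use case is short links: when you send a URL to a friend on WeChat, what the friend receives is a WeChat short link (QQ and Taobao do the same), and the mapping from short link to original URL is the kind of record HBase stores. HBase's throughput and response time are what make this practical.
Links:
HBase website: https://hbase.apache.org/
喵了個咪's blog: w-blog.cn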
1. Preparation
Packages to prepare:
zookeeper-3.4.10.tar.gz
hbase-1.3.1-bin.tar.gz
HBase depends on ZooKeeper and a Hadoop cluster, so we build the HBase cluster on top of the Hadoop cluster configured earlier.
Server list:
# hadoop-1 192.168.1.101 NameNode DataNode
# hadoop-2 192.168.1.102 DataNode
# hadoop-3 192.168.1.103 DataNode
Installing ZooKeeper
> cd /app/install/
> tar -zxvf zookeeper-3.4.10.tar.gz
> mv zookeeper-3.4.10 /usr/local/
Edit the configuration file
> cd /usr/local/zookeeper-3.4.10/conf/
> cp zoo_sample.cfg zoo.cfg
> vim zoo.cfg
tickTime=2000
dataDir=/usr/local/zookeeper-3.4.10/data
clientPort=2181
initLimit=10
syncLimit=5
server.1=hadoop-1:2888:3888
server.2=hadoop-2:2888:3888
server.3=hadoop-3:2888:3888
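The same zoo.cfg can be generated rather than typed by hand, which keeps the server.N numbering and the host list in step. The following is only a sketch: it writes to a scratch directory, and the variable name ZK_CONF_DIR is an assumption made for illustration, not something ZooKeeper itself reads.

```shell
# Sketch: render zoo.cfg for the three-node ensemble into a scratch directory.
# ZK_CONF_DIR is a hypothetical variable; point it at the real conf/ directory to use the result.
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"
ZK_DATA_DIR="/usr/local/zookeeper-3.4.10/data"

{
  echo "tickTime=2000"
  echo "dataDir=${ZK_DATA_DIR}"
  echo "clientPort=2181"
  echo "initLimit=10"
  echo "syncLimit=5"
  # One server.N line per host; N must match that host's myid file.
  id=1
  for host in hadoop-1 hadoop-2 hadoop-3; do
    echo "server.${id}=${host}:2888:3888"
    id=$((id + 1))
  done
} > "${ZK_CONF_DIR}/zoo.cfg"

cat "${ZK_CONF_DIR}/zoo.cfg"
```

Port 2888 is used by followers to connect to the leader and 3888 for leader election, so both need to be reachable between the three nodes.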
Set the environment variables on all nodes
> vim /etc/profile
# zookeeper
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH
> source /etc/profile
Copy the zookeeper directory to the other nodes
> scp -r /usr/local/zookeeper-3.4.10/ root@hadoop-2:/usr/local/zookeeper-3.4.10
> scp -r /usr/local/zookeeper-3.4.10/ root@hadoop-3:/usr/local/zookeeper-3.4.10
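Create the myid file (required on every node)
> cd /usr/local/zookeeper-3.4.10
> mkdir data
> echo "1" > data/myid
Note: the myid value must be different on every node and must match that node's server.N entry in zoo.cfg (1 on hadoop-1, 2 on hadoop-2, 3 on hadoop-3).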
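Because the myid value is the step most easily copied wrong between nodes, it can be derived from the hostname instead. This is a minimal sketch that assumes hostnames follow the hadoop-N pattern used in this article; the function names myid_for_host and write_myid are made up for illustration.

```shell
# Sketch: derive the ZooKeeper id from the numeric suffix of the hostname,
# so hadoop-1 -> 1, hadoop-2 -> 2, hadoop-3 -> 3 (matching server.N in zoo.cfg).
myid_for_host() {
  # Strip everything up to and including the last "-".
  echo "${1##*-}"
}

write_myid() {
  # $1 = hostname, $2 = ZooKeeper dataDir
  mkdir -p "$2"
  myid_for_host "$1" > "$2/myid"
}

# Example run against a scratch directory instead of the real dataDir.
DEMO_DATA_DIR="$(mktemp -d)"
write_myid "hadoop-2" "${DEMO_DATA_DIR}"
cat "${DEMO_DATA_DIR}/myid"
```

On a real node the call would look like `write_myid "$(hostname)" /usr/local/zookeeper-3.4.10/data`, provided the hostname really ends in the intended id.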
Start and test
# Run on each of the three machines
> zkServer.sh start
# Check the status
[root@hadoop-1 zookeeper-3.4.10]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop-2 zookeeper-3.4.10]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
[root@hadoop-3 zookeeper-3.4.10]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
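A healthy three-node ensemble reports exactly one leader and two followers. When checking this from a script, only the "Mode:" line matters; the sketch below pulls it out of the status text. The helper name zk_mode is hypothetical, and the sample text is copied from the status output above.

```shell
# Sketch: extract the "Mode:" value from `zkServer.sh status` output read on stdin.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Sample output copied from the status check on hadoop-2.
sample_status='ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader'

printf '%s\n' "${sample_status}" | zk_mode
```

Against a live node this would be used as `zkServer.sh status 2>&1 | zk_mode`.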
2. Installing HBase
> cd /app/install/
> tar -zxvf hbase-1.3.1-bin.tar.gz
> mv hbase-1.3.1 /usr/local/
Edit the configuration file
> vim /usr/local/hbase-1.3.1/conf/hbase-env.sh
# Java environment
export JAVA_HOME=/usr/local/jdk1.8
# Use the external ZooKeeper ensemble instead of the one HBase manages
export HBASE_MANAGES_ZK=false
List the RegionServer hosts
> vim /usr/local/hbase-1.3.1/conf/regionservers
hadoop-2
hadoop-3
Create the data directory in HDFS
> hdfs dfs -mkdir -p /user/hadoop/hbase
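The steps above do not show conf/hbase-site.xml. A cluster that stores its data under this HDFS path and uses an external ZooKeeper ensemble normally needs at least hbase.rootdir, hbase.cluster.distributed and hbase.zookeeper.quorum set there. The sketch below writes such a file to a scratch directory; the NameNode address hdfs://hadoop-1:9000 is an assumption (check fs.defaultFS in your core-site.xml), and the variable names are illustrative.

```shell
# Sketch: write a minimal hbase-site.xml for fully distributed mode into a scratch directory.
# Assumption: the NameNode listens on hdfs://hadoop-1:9000 (not stated in the article).
HBASE_CONF_DEMO="$(mktemp -d)"
NAMENODE_URI="hdfs://hadoop-1:9000"

cat > "${HBASE_CONF_DEMO}/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>${NAMENODE_URI}/user/hadoop/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-1,hadoop-2,hadoop-3</value>
  </property>
</configuration>
EOF

cat "${HBASE_CONF_DEMO}/hbase-site.xml"
```

After reviewing the values, the file would be copied to /usr/local/hbase-1.3.1/conf/ before the directory is distributed to the other nodes.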
Copy to the other nodes
> scp -r /usr/local/hbase-1.3.1/ root@hadoop-2:/usr/local/hbase-1.3.1
> scp -r /usr/local/hbase-1.3.1/ root@hadoop-3:/usr/local/hbase-1.3.1
# Grant ownership to the hadoop user (run on each node)
> chown -R hadoop:hadoop /usr/local/hbase-1.3.1/
Set the environment variables on all nodes
> vim /etc/profile
# hbase
export HBASE_HOME=/usr/local/hbase-1.3.1
export PATH=$HBASE_HOME/bin:$PATH
> source /etc/profile
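Start the cluster
> su hadoop
> start-hbase.sh
Running jps should show an HMaster process on the master node and an HRegionServer process on each worker node.
From inside the network, the HBase management UI is available at http://hadoop-1:16010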
3. Basic operations
The following command opens the HBase shell:
> hbase shell
hbase(main):001:0>
I. General operations
Query the server status
hbase(main):024:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Query the HBase version
hbase(main):025:0> version
1.3.1, r930b9a55528fe45d8edce7af42fef2d35e77677a, Thu Apr 6 19:36:54 PDT 2017
II. DDL operations
1. Create a table
hbase(main):011:0> create 'member','member_id','address','info'
0 row(s) in 1.2210 seconds
2. Get the table description
hbase(main):012:0> list
TABLE
member
1 row(s) in 0.0160 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                              ENABLED
 {NAME => 'member', FAMILIES => [                                        true
  {NAME => 'address', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
3. Delete a column family (alter, disable, enable)
hbase(main):003:0> alter 'member',{NAME=>'member_id',METHOD=>'delete'}
ERROR: Table member is enabled. Disable it first before altering.
Altering the table directly fails: in this output the table has to be disabled before a column family can be deleted.
hbase(main):004:0> disable 'member'
0 row(s) in 2.0390 seconds
hbase(main):005:0> alter 'member',{NAME=>'member_id',METHOD=>'delete'}
0 row(s) in 0.0560 seconds
hbase(main):006:0> describe 'member'
DESCRIPTION                                                              ENABLED
 {NAME => 'member', FAMILIES => [                                        false
  {NAME => 'address', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0230 seconds
The column family is gone, so we enable the table again:
hbase(main):008:0> enable 'member'
0 row(s) in 2.0420 seconds
4. List all tables
hbase(main):028:0> list
TABLE
member
temp_table
2 row(s) in 0.0150 seconds
5. Drop a table (it must be disabled first)
hbase(main):029:0> disable 'temp_table'
0 row(s) in 2.0590 seconds
hbase(main):030:0> drop 'temp_table'
0 row(s) in 1.1070 seconds
6. Check whether a table exists
hbase(main):021:0> exists 'member'
Table member does exist
0 row(s) in 0.1610 seconds
7. Check whether a table is enabled
hbase(main):034:0> is_enabled 'member'
true
0 row(s) in 0.0110 seconds
8. Check whether a table is disabled
hbase(main):032:0> is_disabled 'member'
false
0 row(s) in 0.0110 seconds
III. DML operations
1. Insert a few records
put 'member','scutshuxue','info:age','24'
put 'member','scutshuxue','info:birthday','1987-06-17'
put 'member','scutshuxue','info:company','alibaba'
put 'member','scutshuxue','address:contry','china'
put 'member','scutshuxue','address:province','zhejiang'
put 'member','scutshuxue','address:city','hangzhou'

put 'member','xiaofeng','info:birthday','1987-4-17'
put 'member','xiaofeng','info:favorite','movie'
put 'member','xiaofeng','info:company','alibaba'
put 'member','xiaofeng','address:contry','china'
put 'member','xiaofeng','address:province','guangdong'
put 'member','xiaofeng','address:city','jieyang'
put 'member','xiaofeng','address:town','xianqiao'
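Typing many put commands by hand is error-prone, so a batch like the one above can also be generated into a script file and passed to the HBase shell. The following is a sketch only: the input format of "row column value" triples is an assumption made for this example, and it handles only values without spaces or quotes.

```shell
# Sketch: generate a file of put commands from "row column value" triples.
PUT_SCRIPT="$(mktemp)"

while read -r row column value; do
  printf "put 'member','%s','%s','%s'\n" "${row}" "${column}" "${value}"
done > "${PUT_SCRIPT}" <<'EOF'
scutshuxue info:age 24
scutshuxue info:company alibaba
xiaofeng address:city jieyang
EOF

# End with "exit" so the HBase shell quits after running the script.
echo "exit" >> "${PUT_SCRIPT}"

cat "${PUT_SCRIPT}"
```

The generated file can then be run with `hbase shell "${PUT_SCRIPT}"` on a node where the cluster is reachable.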
2. Get a record
Get all data for one row key
hbase(main):001:0> get 'member','scutshuxue'
COLUMN                 CELL
 address:city          timestamp=1321586240244, value=hangzhou
 address:contry        timestamp=1321586239126, value=china
 address:province      timestamp=1321586239197, value=zhejiang
 info:age              timestamp=1321586238965, value=24
 info:birthday         timestamp=1321586239015, value=1987-06-17
 info:company          timestamp=1321586239071, value=alibaba
6 row(s) in 0.4720 seconds
Get all data in one column family for one row key
hbase(main):002:0> get 'member','scutshuxue','info'
COLUMN                 CELL
 info:age              timestamp=1321586238965, value=24
 info:birthday         timestamp=1321586239015, value=1987-06-17
 info:company          timestamp=1321586239071, value=alibaba
3 row(s) in 0.0210 seconds
Get a single column in one column family for one row key
hbase(main):002:0> get 'member','scutshuxue','info:age'
COLUMN                 CELL
 info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0320 seconds
3. Update a record, then get both versions by timestamp
Change the age of scutshuxue to 99. An update is simply another put to the same cell, which is stored as a new version:
hbase(main):004:0> put 'member','scutshuxue','info:age','99'
0 row(s) in 0.0210 seconds
hbase(main):005:0> get 'member','scutshuxue','info:age'
COLUMN                 CELL
 info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
Both versions can still be read by passing the timestamp explicitly:
hbase(main):010:0> get 'member','scutshuxue',{COLUMN=>'info:age',TIMESTAMP=>1321586238965}
COLUMN                 CELL
 info:age              timestamp=1321586238965, value=24
1 row(s) in 0.0140 seconds
hbase(main):011:0> get 'member','scutshuxue',{COLUMN=>'info:age',TIMESTAMP=>1321586571843}
COLUMN                 CELL
 info:age              timestamp=1321586571843, value=99
1 row(s) in 0.0180 seconds
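The timestamps in these outputs are milliseconds since the Unix epoch; when a put does not specify one, the server's current time is used. A small sketch of converting one of the values above into seconds and, where GNU date is available, into a readable date:

```shell
# Sketch: HBase cell timestamps are milliseconds since the Unix epoch.
ts_ms=1321586238965          # timestamp of the first info:age version above
ts_s=$((ts_ms / 1000))       # drop the millisecond part
echo "${ts_s}"

# GNU date can render it; other date implementations use different flags,
# so a failure here is ignored.
date -u -d "@${ts_s}" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null || true
```

This is handy when comparing versions of a cell, since the larger timestamp is always the more recent write.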
4. Full table scan:
hbase(main):013:0> scan 'member'
ROW          COLUMN+CELL
 scutshuxue  column=address:city, timestamp=1321586240244, value=hangzhou
 scutshuxue  column=address:contry, timestamp=1321586239126, value=china
 scutshuxue  column=address:province, timestamp=1321586239197, value=zhejiang
 scutshuxue  column=info:age, timestamp=1321586571843, value=99
 scutshuxue  column=info:birthday, timestamp=1321586239015, value=1987-06-17
 scutshuxue  column=info:company, timestamp=1321586239071, value=alibaba
 temp        column=info:age, timestamp=1321589609775, value=59
 xiaofeng    column=address:city, timestamp=1321586248400, value=jieyang
 xiaofeng    column=address:contry, timestamp=1321586248316, value=china
 xiaofeng    column=address:province, timestamp=1321586248355, value=guangdong
 xiaofeng    column=address:town, timestamp=1321586249564, value=xianqiao
 xiaofeng    column=info:birthday, timestamp=1321586248202, value=1987-4-17
 xiaofeng    column=info:company, timestamp=1321586248277, value=alibaba
 xiaofeng    column=info:favorite, timestamp=1321586248241, value=movie
3 row(s) in 0.0570 seconds
5. Delete the 'info:age' column of the row with key temp
hbase(main):016:0> delete 'member','temp','info:age'
0 row(s) in 0.0150 seconds
hbase(main):018:0> get 'member','temp'
COLUMN                 CELL
0 row(s) in 0.0150 seconds
6. Delete an entire row
hbase(main):001:0> deleteall 'member','xiaofeng'
0 row(s) in 0.3990 seconds
7. Count the rows in a table:
hbase(main):019:0> count 'member'
2 row(s) in 0.0160 seconds
8. Add an 'info:age' column to the row "xiaofeng" and increment it as a counter
hbase(main):057:0> incr 'member','xiaofeng','info:age'
COUNTER VALUE = 1
hbase(main):058:0> get 'member','xiaofeng','info:age'
COLUMN                 CELL
 info:age              timestamp=1321590997648, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0140 seconds
hbase(main):059:0> incr 'member','xiaofeng','info:age'
COUNTER VALUE = 2
hbase(main):060:0> get 'member','xiaofeng','info:age'
COLUMN                 CELL
 info:age              timestamp=1321591025110, value=\x00\x00\x00\x00\x00\x00\x00\x02
1 row(s) in 0.0160 seconds
Get the current counter value
hbase(main):069:0> get_counter 'member','xiaofeng','info:age'
COUNTER VALUE = 2
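The value that get prints for a counter cell looks odd because a counter is stored as an 8-byte big-endian integer rather than as text, which is why get_counter is the convenient way to read it. The sketch below decodes such a byte string with standard tools; the function name decode_counter is hypothetical.

```shell
# Sketch: decode the 8-byte big-endian value that `get` prints for a counter cell.
decode_counter() {
  # stdin: the raw 8 bytes; stdout: the decimal value
  hex="$(od -An -tx1 | tr -d ' \n')"
  echo "$((0x${hex}))"
}

# \000...\002 is the octal spelling of the \x00...\x02 bytes shown by the HBase shell.
printf '\000\000\000\000\000\000\000\002' | decode_counter   # prints 2
```

This also explains why a counter column should not be written with a plain put of the string '2': the increment expects the 8-byte binary form.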
9. Empty an entire table:
hbase(main):035:0> truncate 'member'
Truncating 'member' table (it may take a while):
 - Disabling table...
 - Dropping table...
 - Creating table...
0 row(s) in 4.3430 seconds
As the output shows, HBase implements truncate by disabling the table, dropping it, and then recreating it.
4. Miscellaneous
Exporting and importing HBase data
# Export to HDFS
> hbase org.apache.hadoop.hbase.mapreduce.Driver export member /hbase/export/member
# List the exported files
[hadoop@sunmi-hadoop-1 hbase-1.3.1]$ hdfs dfs -ls /hbase/export/member
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2017-08-01 15:11 /hbase/export/member/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 775 2017-08-01 15:11 /hbase/export/member/part-m-00000
# The target table must exist before importing (run in the HBase shell)
create 'member2','address','info'
# Import from the exported data
> hbase org.apache.hadoop.hbase.mapreduce.Driver import member2 /hbase/export/member
# Query the data (run in the HBase shell)
get 'member2','scutshuxue'
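Pre-splitting
Pre-splitting is similar in spirit to partitions and buckets in Hive: the table is created with its regions already divided at the given row-key boundaries. Usage:
> create 't1', 'cf', SPLITS => ['20150501000000000', '20150515000000000', '20150601000000000']
or
> create 't2', 'cf', SPLITS_FILE => '/home/hadoop/splitfile.txt'
where /home/hadoop/splitfile.txt contains one split point per line:
20150501000000000
20150515000000000
20150601000000000
The resulting regions of the table can be seen in the HBase Web UI.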
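A split file is just one row-key boundary per line, so it is easy to generate. The sketch below writes the three keys used above to a temporary file (rather than /home/hadoop/splitfile.txt) and reports how many regions they would produce.

```shell
# Sketch: write split points, one per line, in the format expected by SPLITS_FILE.
SPLIT_FILE="$(mktemp)"
for key in 20150501000000000 20150515000000000 20150601000000000; do
  echo "${key}"
done > "${SPLIT_FILE}"

split_points="$(wc -l < "${SPLIT_FILE}" | tr -d ' ')"
# N split points divide the row-key space into N + 1 regions.
echo "split points: ${split_points}, regions: $((split_points + 1))"
```

The split points should be sorted and should match the actual row-key format; keys that never fall between two boundaries leave the corresponding region empty.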
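Starting the Thrift service
HBase provides two Thrift interfaces, Thrift1 and Thrift2. Most open-source tools that integrate with HBase over Thrift use Thrift1, while Thrift2 is a simplified interface that is better suited to application code. The two services can run side by side as long as they listen on different ports, which is done by passing the ports explicitly, for example --infoport 9096 -p 9091.
PS: some services only support the Thrift1 protocol, for example the one covered later.
Start Thrift1 in the foreground on explicit ports:
> /usr/local/hbase-1.3.1/bin/hbase-daemon.sh --config /usr/local/hbase-1.3.1/conf foreground_start thrift --infoport 9096 -p 9091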
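Starting the Thrift2 service
# Start the Thrift service on the local machine
> hbase-daemon.sh start thrift2
# Start the Thrift service on the other machines in the cluster
> hbase-daemons.sh start thrift2
To run it in the foreground under Supervisor:
> /usr/local/hbase-1.3.1/bin/hbase-daemon.sh --config /usr/local/hbase-1.3.1/conf foreground_start thrift2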
Note: a program that keeps a long-lived connection to the HBase Thrift service will find the connection dropped after a while, because the read timeout is 60 s. This can be fixed by raising the timeout in conf/hbase-site.xml:
> vim /usr/local/hbase-1.3.1/conf/hbase-site.xml
<property>
  <name>hbase.thrift.server.socket.read.timeout</name>
  <value>6000000</value>
  <description>in milliseconds</description>
</property>
Keeping the services running
Supervisor is the usual way to keep services running: when a process exits abnormally, Supervisor starts it again. The program it runs must stay in the foreground. For HBase, running the master and the regionserver this way is enough to provide normal service.
# hbaseregionserver
> /usr/local/hbase-1.3.1/bin/hbase-daemon.sh --config /usr/local/hbase-1.3.1/conf foreground_start regionserver
# hbasemaster
> /usr/local/hbase-1.3.1/bin/hbase-daemon.sh --config /usr/local/hbase-1.3.1/conf foreground_start master
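For reference, the two foreground commands can be wrapped into Supervisor program sections along the following lines. This is a sketch under assumptions: the program names hbasemaster and hbaseregionserver simply reuse the labels above, the file is written to a scratch directory rather than to Supervisor's real include directory, and running as the hadoop user follows the earlier chown step.

```shell
# Sketch: render Supervisor program sections for the HBase master and regionserver.
SUPERVISOR_DEMO_DIR="$(mktemp -d)"
HBASE_HOME_DIR="/usr/local/hbase-1.3.1"

for role in master regionserver; do
  # Append one [program:...] section per role; autorestart brings the process back after a crash.
  cat >> "${SUPERVISOR_DEMO_DIR}/hbase.conf" <<EOF
[program:hbase${role}]
command=${HBASE_HOME_DIR}/bin/hbase-daemon.sh --config ${HBASE_HOME_DIR}/conf foreground_start ${role}
user=hadoop
autostart=true
autorestart=true

EOF
done

cat "${SUPERVISOR_DEMO_DIR}/hbase.conf"
```

On a real cluster only the section matching the node's role would be installed: hbasemaster on hadoop-1 and hbaseregionserver on hadoop-2 and hadoop-3.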
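5. Summary
This section should have given you a working understanding of HBase. In cluster mode HBase offers clear advantages in performance and capacity, but it is weak at aggregation and summary statistics. The next section shows how to integrate HBase with Hive, combining Hive's structured, query-friendly model with HBase's speed and also addressing Hive's lack of real-time writes.
Note: my knowledge is limited, so please point out anything I have got wrong. I also welcome further discussion!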