Answers
  • 1 # User4631824322

There are quite a few ways to handle this; here is one example to illustrate briefly:

Use the hadoop archive command to generate a .har archive file via a MapReduce job.
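The point of a HAR is that many small files get packed into a few large part files plus an index mapping each original path to an (offset, length) inside the part file, so the NameNode tracks a handful of objects instead of thousands. A minimal local sketch of that idea in Python (an illustration only, not the real HAR on-disk format; all file names here are hypothetical):

```python
import os
import tempfile

def create_archive(src_files, part_path):
    """Concatenate small files into one 'part' blob and build an index
    of name -> (offset, length), mimicking HAR's part files + _index."""
    index = {}
    with open(part_path, "wb") as part:
        for path in src_files:
            with open(path, "rb") as f:
                data = f.read()
            index[os.path.basename(path)] = (part.tell(), len(data))
            part.write(data)
    return index

# Demo: three tiny files, like the 1.txt/2.txt/3.txt in the listing below.
tmpdir = tempfile.mkdtemp()
names = []
for i in (1, 2, 3):
    p = os.path.join(tmpdir, f"{i}.txt")
    with open(p, "w") as f:
        f.write(f"record {i}\n")      # 9 bytes each
    names.append(p)

index = create_archive(names, os.path.join(tmpdir, "part-0"))
print(index)   # {'1.txt': (0, 9), '2.txt': (9, 9), '3.txt': (18, 9)}
```

Whatever the on-disk details, the design choice is the same: one sequential blob plus a small index replaces many NameNode entries.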

Test source files on HDFS:

    /test/lizhao/2019-01-13/*

    /test/lizhao/2019-01-14/*

Archive command syntax: hadoop archive -archiveName <NAME> -p <parent path> [-r <replication factor>] <src>* <dest>:

    >>> hadoop archive -archiveName 2019-01.har -p /test/lizhao 2019-01-13 2019-01-14 /test/lizhao/

    19/01/14 14:11:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    19/01/14 14:11:55 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032

    19/01/14 14:11:56 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032

    19/01/14 14:11:56 INFO client.RMProxy: Connecting to ResourceManager at IC-1/192.168.11.180:8032

    19/01/14 14:11:56 INFO mapreduce.JobSubmitter: number of splits:1

    19/01/14 14:11:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533867597475_0001

    19/01/14 14:11:58 INFO impl.YarnClientImpl: Submitted application application_1533867597475_0001

    19/01/14 14:11:58 INFO mapreduce.Job: The url to track the job: http://ic-1:8088/proxy/application_1533867597475_0001/

    19/01/14 14:11:58 INFO mapreduce.Job: Running job: job_1533867597475_0001

    19/01/14 14:12:07 INFO mapreduce.Job: Job job_1533867597475_0001 running in uber mode : false

    19/01/14 14:12:07 INFO mapreduce.Job: map 0% reduce 0%

    19/01/14 14:12:13 INFO mapreduce.Job: map 100% reduce 0%

    19/01/14 14:12:24 INFO mapreduce.Job: map 100% reduce 100%

    19/01/14 14:12:24 INFO mapreduce.Job: Job job_1533867597475_0001 completed successfully

    19/01/14 14:12:24 INFO mapreduce.Job: Counters: 49

    *****

    Map-Reduce Framework

    Map input records=15

    Map output records=15

    Map output bytes=1205

    Map output materialized bytes=1241

    Input split bytes=116

    Combine input records=0

    Combine output records=0

    Reduce input groups=15

    Reduce shuffle bytes=1241

    Reduce input records=15

    Reduce output records=0

    Spilled Records=30

    Shuffled Maps =1

    Failed Shuffles=0

    Merged Map outputs=1

    GC time elapsed (ms)=137

    CPU time spent (ms)=6370

    Physical memory (bytes) snapshot=457756672

    Virtual memory (bytes) snapshot=3200942080

    Total committed heap usage (bytes)=398458880

    Shuffle Errors

    BAD_ID=0

    CONNECTION=0

    IO_ERROR=0

    WRONG_LENGTH=0

    WRONG_MAP=0

    WRONG_REDUCE=0

    File Input Format Counters

    Bytes Read=995

    File Output Format Counters

    Bytes Written=0

3. View the files inside the archive:

    >>> hadoop fs -ls har:///test/lizhao/2019-01.har

    drwxr-xr-x - root supergroup 0 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13

    drwxr-xr-x - root supergroup 0 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-14

    >>> hadoop fs -ls har:///test/lizhao/2019-01.har/2019-01-13

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/1.txt

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/2.txt

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:05 har:///test/lizhao/2019-01.har/2019-01-13/3.txt

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/5.txt

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/6.txt

    -rw-r--r-- 2 root supergroup 22 2019-01-14 14:06 har:///test/lizhao/2019-01.har/2019-01-13/7.txt

4. Download files from the har archive:

    hadoop fs -get har:///test/lizhao/2019
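Reading a file back out of the archive is then just a seek to its recorded offset followed by a bounded read, which is why `hadoop fs -get` and `hadoop fs -ls` work transparently on `har://` paths. A self-contained sketch of that lookup (again a toy format standing in for HAR's part file and _index, not the real internals):

```python
import io

# Toy archive: one part blob plus an index of name -> (offset, length).
# The names and contents here are made up for illustration.
part = io.BytesIO(b"alpha\nbeta\ngamma\n")
index = {"a.txt": (0, 6), "b.txt": (6, 5), "c.txt": (11, 6)}

def read_entry(part, index, name):
    """Seek to the entry's offset and read exactly its length."""
    offset, length = index[name]
    part.seek(offset)
    return part.read(length)

print(read_entry(part, index, "b.txt"))   # b'beta\n'
```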
