by robin Published September 20, 2016 Updated September 20, 2016
Lately I tried installing the xz/lzma codec on my local VM setup. The compression ratios are pretty awesome. I won't do a benchmark here; try it out yourself :wink:
Steps

1. Download the codec JAR from https://github.com/yongtang/hadoop-xz or https://mvnrepository.com/artifact/io.sensesecure/hadoop-xz
2. Copy the downloaded JAR into HDP's lib folders (every directory that already holds the Snappy JAR):

```bash
find /usr/hdp/ -name '*snappy*jar' | xargs -L1 dirname | xargs -L1 sudo cp ~/hadoop-xz-1.4.jar
```

3. Set up the compression codec in the HDFS config using Ambari:

Ambari -> HDFS -> Configs -> Advanced core-site -> io.compression.codecs -> add 'io.sensesecure.hadoop.xz.XZCodec'

Testing with Hive

First, create a big sample file in the local dir /tmp/sample.txt.
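A small sketch of how you might check that Ambari actually pushed the codec to the node and knock together a sample file. The config path and the /etc/services trick are assumptions on my part; any standard client config location and any large text file will do.

```bash
# Assumption: /etc/hadoop/conf is the usual HDP client config dir; the codec class
# should now show up in the io.compression.codecs property after the Ambari change.
grep -A1 io.compression.codecs /etc/hadoop/conf/core-site.xml

# Quick-and-dirty sample file: repeat a handy text file a few thousand times
# (anything big and text-like works; repetitive text compresses nicely with xz).
for i in $(seq 1 5000); do cat /etc/services; done > /tmp/sample.txt
ls -lh /tmp/sample.txt
```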
Operations in Hive:

```sql
create table orig_sample(val string);
!sh hdfs dfs -put /tmp/sample.txt /tmp;
LOAD DATA INPATH '/tmp/sample.txt' OVERWRITE INTO TABLE orig_sample;

-- test lzma
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapreduce.output.fileoutputformat.compress.codec=io.sensesecure.hadoop.xz.XZCodec;

drop table test_table_lzma;
CREATE TABLE test_table_lzma
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LINES TERMINATED BY "\n"
STORED AS TEXTFILE LOCATION "/tmp/test_table_lzma"
as select * from orig_sample;
```

Checking results

```bash
hdfs dfs -du -s -h /tmp/sample.txt
hdfs dfs -du -s -h /tmp/test_table_lzma
```
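If you prefer a single number over eyeballing the two -du outputs, something like the sketch below works. It assumes the paths used above and that the first column of hdfs dfs -du -s is the size in bytes; the final hive -e call is only a sanity check that the xz-compressed table can still be read back.

```bash
# Compute the compression ratio from the raw and compressed sizes on HDFS.
orig=$(hdfs dfs -du -s /tmp/sample.txt | awk '{print $1}')
comp=$(hdfs dfs -du -s /tmp/test_table_lzma | awk '{print $1}')
awk -v o="$orig" -v c="$comp" 'BEGIN { printf "compression ratio: %.2fx\n", o / c }'

# Sanity check: the table should still be readable, i.e. the codec decompresses on read.
hive -e 'select count(*) from test_table_lzma;'
```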