Adding compression codec to Hortonworks data platform

by robin Published September 20, 2016 Updated September 20, 2016

Lately I tried installing the xz/LZMA codec on my local VM setup. The compression ratios are pretty awesome. I won't do a benchmark here; try it out yourself ;)

Steps

1. Download the codec JAR from https://github.com/yongtang/hadoop-xz or https://mvnrepository.com/artifact/io.sensesecure/hadoop-xz

2. Copy the downloaded JAR to HDP's lib folders (every directory that already holds the snappy JAR):

    find /usr/hdp/ -name '*snappy*jar' | xargs -L1 dirname | xargs -L1 sudo cp ~/hadoop-xz-1.4.jar

3. Set up compression in the HDFS config using Ambari:

    Ambari -> HDFS -> Configs -> Advanced core-site -> io.compression.codecs -> add 'io.sensesecure.hadoop.xz.XZCodec'

Testing with Hive
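Before running the Hive test, it may help to confirm that the codec really ended up in io.compression.codecs. A minimal sketch, assuming the property is a comma-separated list of class names; `has_codec` is a hypothetical helper name, not part of Hadoop or HDP:

```shell
# Hypothetical helper: succeeds if codec class $2 appears in the
# comma-separated codec list $1 (the format of io.compression.codecs).
has_codec() {
  local list="$1" codec="$2" entry
  local IFS=','
  for entry in $list; do
    # Strip whitespace, since the property value may contain spaces or line breaks.
    entry="$(printf '%s' "$entry" | tr -d '[:space:]')"
    [ "$entry" = "$codec" ] && return 0
  done
  return 1
}

# Example on a cluster node (not run here):
#   has_codec "$(hdfs getconf -confKey io.compression.codecs)" \
#     io.sensesecure.hadoop.xz.XZCodec && echo "XZCodec is registered"
```

If the check fails after the Ambari change, the affected services probably still need a restart to pick up the new core-site value.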

Create a big sample file in the local directory, at /tmp/sample.txt.
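Any large text file will do for this. One possible way to generate it is sketched below; `make_sample` and the line layout are made up for illustration, and very repetitive data like this will compress better than real data would:

```shell
# Hypothetical helper: write $2 lines of repetitive comma-separated text to file $1.
make_sample() {
  local path="$1" lines="$2"
  awk -v n="$lines" 'BEGIN {
    for (i = 1; i <= n; i++)
      printf "%d,user_%d,status_%d,some repetitive payload text\n", i, i % 1000, i % 5
  }' > "$path"
}

# Example (not run here): five million lines, on the order of a few hundred MB.
#   make_sample /tmp/sample.txt 5000000
```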

Operations in Hive

    create table orig_sample(val string);
    !sh hdfs dfs -put /tmp/sample.txt /tmp;
    LOAD DATA INPATH '/tmp/sample.txt' OVERWRITE INTO TABLE orig_sample;

    -- test lzma
    set hive.exec.compress.output=true;
    set io.seqfile.compression.type=BLOCK;
    set mapreduce.output.fileoutputformat.compress.codec=io.sensesecure.hadoop.xz.XZCodec;

    drop table test_table_lzma;
    CREATE TABLE test_table_lzma
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LINES TERMINATED BY "\n"
    STORED AS TEXTFILE LOCATION "/tmp/test_table_lzma"
    as select * from orig_sample;

Checking results

    hdfs dfs -du -s -h /tmp/sample.txt
    hdfs dfs -du -s -h /tmp/test_table_lzma

Note: LOAD DATA INPATH moves the file into the table's directory, so if /tmp/sample.txt is no longer in HDFS, check the size of the orig_sample table directory instead.
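To turn the two sizes into a single number, something like the following sketch could be used. `compression_ratio` is a hypothetical helper; it assumes the sizes are given in bytes, which is what the first field of `hdfs dfs -du -s` holds when `-h` is left out:

```shell
# Hypothetical helper: print the original/compressed size ratio with two decimals.
compression_ratio() {
  local orig_bytes="$1" comp_bytes="$2"
  awk -v o="$orig_bytes" -v c="$comp_bytes" 'BEGIN {
    # Avoid dividing by zero when the compressed size is missing or empty.
    if (c <= 0) { print "n/a"; exit }
    printf "%.2f\n", o / c
  }'
}

# Example on a cluster node (not run here). Point the first path at wherever
# the uncompressed data actually lives in HDFS.
#   orig=$(hdfs dfs -du -s /tmp/sample.txt | awk '{print $1}')
#   comp=$(hdfs dfs -du -s /tmp/test_table_lzma | awk '{print $1}')
#   compression_ratio "$orig" "$comp"
```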
