HBase: The Complete Guide

A Coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: a request runs independently and in parallel on each Region, and HBase provides a framework that lets users define their own Coprocessors flexibly.
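To make this concrete, here is a minimal sketch of a RegionObserver coprocessor, written against the 0.98-era API that the rest of this article uses; the class name and the logged message are illustrative only.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

// Hypothetical observer: HBase loads one instance per Region, so the hook
// below runs independently and in parallel on every Region of the table.
public class AuditRegionObserver extends BaseRegionObserver {

    @Override
    public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Get get, List<Cell> results) throws IOException {
        // Invoked before each Get is served by this Region.
        System.out.println("Get served by region: "
                + ctx.getEnvironment().getRegion().getRegionNameAsString());
    }
}
```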

Deployment

# Unload the old coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
# Attach the new coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att', 'coprocessor' => 'hdfs://yuzhouwan/hbase/coprocessor/coprocessor-0.0.1.jar|com.yuzhouwan.hbase.coprocessor.Aggregation|111|'
# Watch the RegionServer logs to observe how the coprocessor behaves

Programming Tips

Make full use of CellUtil

// Matching directly on byte[] is more efficient
// Bad: cf.equals(Bytes.toString(CellUtil.cloneFamily(cell)))
CellUtil.matchingFamily(cell, cf) && CellUtil.matchingQualifier(cell, col)
// Likewise, prefer `Bytes.equals` over `String#equals`

Exploit the parallel computing power of coprocessors

// When it is hard to distribute table data evenly, create pre-splits [00, 01, 02, ..., 99] and disable automatic splitting (see: Common Commands - Splits); each Region then holds only a single xx prefix. When loading data, prepend an xx prefix to the rowkey in round-robin fashion, and no Region becomes a hotspot
// Inside the coprocessor, first obtain the xx prefix, then prepend it to the startKey/endKey when building the Scan (see the sketch after this code)
static String getStartKeyPrefix(HRegion region) {
    if (region == null) throw new RuntimeException("Region is null!");
    byte[] startKey = region.getStartKey();
    if (startKey == null || startKey.length == 0) return "00";
    String startKeyStr = Bytes.toString(startKey);
    return isEmpty(startKeyStr) ? "00" : startKeyStr.substring(0, 2);
}

private static boolean isEmpty(final String s) {
    return s == null || s.length() == 0;
}
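As promised above, a hedged sketch of how getStartKeyPrefix() might be applied when the coprocessor builds its Scan; the logical key bounds are hypothetical.

```java
// Prepend the region's two-character bucket prefix to the logical key range,
// so the Scan stays inside this Region's own [xx...] keyspace.
String prefix = getStartKeyPrefix(region);
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(prefix + "20170101"));  // hypothetical logical startKey
scan.setStopRow(Bytes.toBytes(prefix + "20170301"));   // hypothetical logical endKey
```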

Handle exceptions inside the coprocessor properly

If an exception is thrown inside a coprocessor and the hbase.coprocessor.abortonerror parameter is not enabled, the coprocessor is simply removed from the environment it was loaded into. Otherwise the behavior depends on the exception type: an IOException is rethrown directly; a DoNotRetryIOException is thrown without any retry; anything else is retried 10 times by default (hard-coded in AsyncConnectionImpl#RETRY_TIMER). You therefore need to handle exceptions carefully according to your own business scenario.
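A minimal sketch of that advice, assuming you want the client to fail fast instead of burning through the default retries; the validation rule inside the try block is hypothetical.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class SafeObserver extends BaseRegionObserver {

    @Override
    public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Get get, List<Cell> results) throws IOException {
        try {
            // Hypothetical business check that may blow up at runtime.
            if (get.getRow() == null || get.getRow().length == 0) {
                throw new IllegalArgumentException("empty row key");
            }
        } catch (RuntimeException e) {
            // Fail fast: DoNotRetryIOException goes straight back to the
            // client without the default retries mentioned above.
            throw new DoNotRetryIOException("Rejected by coprocessor", e);
        }
    }
}
```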

Common Commands

Cluster

# There is no password for the hbase user; switch from root with: sudo su hbase
# From a jump host, you can log in directly as the hbase user: ssh hbase@yuzhouwan.com
$ su - hbase
$ start-hbase.sh

# HMaster + ThriftServer
$ jps | grep -v Jps
32538 ThriftServer
9383 HMaster
8423 HRegionServer

# Backup HMaster + ThriftServer
$ jps | grep -v Jps
24450 jar
21882 HMaster
2296 HRegionServer
14598 ThriftServer
5998 Jstat

# Backup HMaster + ThriftServer
$ jps | grep -v Jps
31119 Bootstrap
8775 HMaster
25289 Bootstrap
14823 Bootstrap
12671 Jstat
9052 ThriftServer
26921 HRegionServer

# HRegionServer
$ jps | grep -v Jps
29356 hbase-monitor-process-0.0.2-jar-with-dependencies.jar # monitor
11023 Jstat
26135 HRegionServer

$ export -p | egrep -i "(hadoop|hbase)"
declare -x HADOOP_HOME="/home/bigdata/software/hadoop"
declare -x HBASE_HOME="/home/bigdata/software/hbase"
declare -x PATH="/usr/local/anaconda/bin:/usr/local/R-3.2.1/bin:/home/bigdata/software/java/bin:/home/bigdata/software/hadoop/bin:/home/bigdata/software/hive/bin:/home/bigdata/software/sqoop/bin:/home/bigdata/software/hbase/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin"

$ java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
uintx MaxHeapSize := 32126271488 {product} # 29.919921875 GB
java version "1.7.0_60-ea"
Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

$ top
top - 11:37:03 up 545 days, 18:45, 5 users, load average: 8.74, 10.39, 10.96
Tasks: 653 total, 1 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.9%us, 0.7%sy, 0.0%ni, 66.3%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264484056k total, 260853032k used, 3631024k free, 2235248k buffers
Swap: 10485756k total, 10485756k used, 0k free, 94307776k cached
# Memory: 252 GB

# `hbase classpath` returns every dependency HBase needs
$ java -classpath ~/opt/hbase/soft/yuzhouwan.jar:`hbase classpath` com.yuzhouwan.hbase.MainApp

# Usage
Usage: hbase [<options>] <command> [<args>]
Options:
  --config DIR   Configuration direction to use. Default: ./conf
  --hosts HOSTS  Override the list in 'regionservers' file
Commands:
Some commands take arguments. Pass no args or -h for usage.
  shell           Run the HBase shell
  hbck            Run the hbase 'fsck' tool
  hlog            Write-ahead-log analyzer
  hfile           Store file analyzer
  zkcli           Run the ZooKeeper shell
  upgrade         Upgrade hbase
  master          Run an HBase HMaster node
  regionserver    Run an HBase HRegionServer node
  zookeeper       Run a Zookeeper server
  rest            Run an HBase REST server
  thrift          Run the HBase Thrift server
  thrift2         Run the HBase Thrift2 server
  clean           Run the HBase clean up script
  classpath       Dump hbase CLASSPATH
  mapredcp        Dump CLASSPATH entries required by mapreduce
  pe              Run PerformanceEvaluation
  ltt             Run LoadTestTool
  version         Print the version
  CLASSNAME       Run the class named CLASSNAME

# HBase version info
$ hbase version
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: HBase 0.98.8-hadoop2
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: Subversion file:///e/hbase_compile/hbase-0.98.8 -r Unknown
2017-01-13 11:05:07,581 INFO [main] util.VersionInfo: Compiled by 14074019 on Mon Dec 26 20:17:32 2016

$ hadoop fs -ls /hbase
drwxr-xr-x - hbase hbase 0 2017-03-01 00:05 /hbase/.hbase-snapshot
drwxr-xr-x - hbase hbase 0 2016-10-26 16:42 /hbase/.hbck
drwxr-xr-x - hbase hbase 0 2016-12-19 13:02 /hbase/.tmp
drwxr-xr-x - hbase hbase 0 2017-01-22 20:18 /hbase/WALs
drwxr-xr-x - hbase hbase 0 2015-09-18 09:34 /hbase/archive
drwxr-xr-x - hbase hbase 0 2016-10-18 09:44 /hbase/coprocessor
drwxr-xr-x - hbase hbase 0 2015-09-15 17:21 /hbase/corrupt
drwxr-xr-x - hbase hbase 0 2017-02-20 14:34 /hbase/data
-rw-r--r-- 2 hbase hbase 42 2015-09-14 12:10 /hbase/hbase.id
-rw-r--r-- 2 hbase hbase 7 2015-09-14 12:10 /hbase/hbase.version
drwxr-xr-x - hbase hbase 0 2016-06-28 12:14 /hbase/inputdir
drwxr-xr-x - hbase hbase 0 2017-03-01 10:40 /hbase/oldWALs
-rw-r--r-- 2 hbase hbase 345610 2015-12-08 16:54 /hbase/test_bulkload.txt
-rw-r--r-- 2 hbase hbase 665610 2015-12-08 17:30 /hbase/test_bulkload2.txt
-rw-r--r-- 2 hbase hbase 605610 2015-12-08 17:52 /hbase/test_bulkload3.txt
-rw-r--r-- 2 hbase hbase 1513010 2015-12-08 17:57 /hbase/test_bulkload4.txt

$ hadoop fs -ls /hbase/WALs
drwxr-xr-x - hbase hbase 0 2016-12-27 16:08 /hbase/WALs/yuzhouwan03,60020,1482741120018-splitting
drwxr-xr-x - hbase hbase 0 2017-03-01 10:36 /hbase/WALs/yuzhouwan03,60020,1483442645857
drwxr-xr-x - hbase hbase 0 2017-03-01 10:37 /hbase/WALs/yuzhouwan02,60020,1483491016710
drwxr-xr-x - hbase hbase 0 2017-03-01 10:37 /hbase/WALs/yuzhouwan01,60020,1483443835926
drwxr-xr-x - hbase hbase 0 2017-03-01 10:36 /hbase/WALs/yuzhouwan03,60020,1483444682422
drwxr-xr-x - hbase hbase 0 2017-03-01 10:16 /hbase/WALs/yuzhouwan04,60020,1485087488577
drwxr-xr-x - hbase hbase 0 2017-03-01 10:37 /hbase/WALs/yuzhouwan05,60020,1484790306754
drwxr-xr-x - hbase hbase 0 2017-03-01 10:37 /hbase/WALs/yuzhouwan06,60020,1484931966988

$ hadoop fs -ls /hbase/WALs/yuzhouwan01,60020,1483443835926
-rw-r--r-- 3 hbase hbase 127540109 2017-03-01 09:49 /hbase/WALs/yuzhouwan01,60020,1483443835926/yuzhouwan01%2C60020%2C1483443835926.1488330961720
# ...
-rw-r--r-- 3 hbase hbase 83 2017-03-01 10:37 /hbase/WALs/yuzhouwan01,60020,1483443835926/yuzhouwan01%2C60020%2C1483443835926.1488335822133

# Logs
$ vim /home/hbase/logs/hbase-hbase-regionserver-yuzhouwan03.log

# HBase batch processing
$ echo "<command>" | hbase shell
$ hbase shell ../script/batch.hbase

# HBase shell
$ hbase shell
$ status
1 servers, 0 dead, 41.0000 average load
$ zk_dump
HBase is rooted at /hbase
Active master address: yuzhouwan03,60000,1481009498847
Backup master addresses:
yuzhouwan02,60000,1481009591957
yuzhouwan01,60000,1481009567346
Region server holding hbase:meta: yuzhouwan03,60020,1483442645857
Region servers:
yuzhouwan02,60020,1483491016710
# ...
/hbase/replication:
/hbase/replication/peers:
/hbase/replication/peers/1: yuzhouwan03,yuzhouwan02,yuzhouwan01:2016:/hbase
/hbase/replication/peers/1/peer-state: ENABLED
/hbase/replication/rs:
/hbase/replication/rs/yuzhouwan03,60020,1483442645857:
/hbase/replication/rs/yuzhouwan03,60020,1483442645857/1:
/hbase/replication/rs/yuzhouwan03,60020,1483442645857/1/yuzhouwan03%2C60020%2C1483442645857.1488334114131: 116838271
/hbase/replication/rs/1485152902048.SyncUpTool.replication.org,1234,1:
/hbase/replication/rs/yuzhouwan06,60020,1484931966988:
/hbase/replication/rs/yuzhouwan06,60020,1484931966988/1:
# ...
Quorum Server Statistics:
yuzhouwan02:2015
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Clients:
/yuzhouwan:62003[1](queued=0,recved=625845,sent=625845)
# ...
/yuzhouwan:11151[1](queued=0,recved=8828,sent=8828)
Latency min/avg/max: 0/0/1
Received: 161
Sent: 162
Connections: 168
Outstanding: 0
Zxid: 0xc062e91c6
Mode: follower
Node count: 25428
yuzhouwan03:2015
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Clients:
/yuzhouwan:39582[1](queued=0,recved=399812,sent=399812)
# ...
/yuzhouwan:58770[1](queued=0,recved=3234,sent=3234)

$ stop-hbase.sh
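The same health check can also be scripted from Java. Below is a hedged sketch that mirrors the shell status output, assuming the 0.98-era HBaseAdmin/ClusterStatus API and an hbase-site.xml on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ClusterStatusCheck {
    public static void main(String[] args) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        try {
            ClusterStatus status = admin.getClusterStatus();
            // Mirrors the shell output: "1 servers, 0 dead, 41.0000 average load"
            System.out.printf("%d servers, %d dead, %.4f average load%n",
                    status.getServersSize(), status.getDeadServers(),
                    status.getAverageLoad());
        } finally {
            admin.close();
        }
    }
}
```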
CRUD

$ list
TABLE
mytable
yuzhouwan
# ...
20 row(s) in 1.4080 seconds

$ create 'yuzhouwan', {NAME => 'info', VERSIONS => 3}, {NAME => 'data', VERSIONS => 1}
0 row(s) in 0.2650 seconds
=> Hbase::Table - yuzhouwan

$ put 'yuzhouwan', 'rk0001', 'info:name', 'Benedict Jin'
$ put 'yuzhouwan', 'rk0001', 'info:gender', 'Man'
$ put 'yuzhouwan', 'rk0001', 'data:pic', '[picture]'

$ get 'yuzhouwan', 'rk0001', {FILTER => "ValueFilter(=, 'binary:[picture]')"}
COLUMN CELL
data:pic timestamp=1479092170498, value=[picture]
1 row(s) in 0.0200 seconds

$ get 'yuzhouwan', 'rk0001', {FILTER => "QualifierFilter(=, 'substring:a')"}
COLUMN CELL
info:name timestamp=1479092160236, value=Benedict Jin
1 row(s) in 0.0050 seconds

$ scan 'yuzhouwan', {FILTER => "QualifierFilter(=, 'substring:a')"}
ROW COLUMN+CELL
rk0001 column=info:name, timestamp=1479092160236, value=Benedict Jin
1 row(s) in 0.0140 seconds

# [rk0001, rk0003)
$ put 'yuzhouwan', 'rk0003', 'info:name', 'asdf2014'
$ scan 'yuzhouwan', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}

# row keys starting with 'rk'
$ put 'yuzhouwan', 'aha_rk0003', 'info:name', 'Jin'
$ scan 'yuzhouwan', {FILTER => "PrefixFilter('rk')"}
ROW COLUMN+CELL
rk0001 column=data:pic, timestamp=1479092170498, value=[picture]
rk0001 column=info:gender, timestamp=1479092166019, value=Man
rk0001 column=info:name, timestamp=1479092160236, value=Benedict Jin
rk0003 column=info:name, timestamp=1479092728688, value=asdf2014
2 row(s) in 0.0150 seconds

$ delete 'yuzhouwan', 'rk0001', 'info:gender'
$ get 'yuzhouwan', 'rk0001'
COLUMN CELL
data:pic timestamp=1479092170498, value=[picture]
info:name timestamp=1479092160236, value=Benedict Jin
2 row(s) in 0.0100 seconds

$ disable 'yuzhouwan'
$ drop 'yuzhouwan'
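The same CRUD flow is available from the Java client. A hedged sketch against the 0.98-era HTable API, assuming hbase-site.xml is on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CrudExample {
    public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "yuzhouwan");
        try {
            // put 'yuzhouwan', 'rk0001', 'info:name', 'Benedict Jin'
            Put put = new Put(Bytes.toBytes("rk0001"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Benedict Jin"));
            table.put(put);

            // get 'yuzhouwan', 'rk0001'
            Result result = table.get(new Get(Bytes.toBytes("rk0001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // scan 'yuzhouwan', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}
            Scan scan = new Scan(Bytes.toBytes("rk0001"), Bytes.toBytes("rk0003"));
            scan.addFamily(Bytes.toBytes("info"));
            ResultScanner scanner = table.getScanner(scan);
            for (Result row : scanner) {
                System.out.println(row);
            }
            scanner.close();

            // delete 'yuzhouwan', 'rk0001', 'info:gender'
            Delete delete = new Delete(Bytes.toBytes("rk0001"));
            delete.deleteColumn(Bytes.toBytes("info"), Bytes.toBytes("gender"));
            table.delete(delete);
        } finally {
            table.close();
        }
    }
}
```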
Row and Column Changes

# Modify a table
$ disable 'yuzhouwan'
# Add column families
$ alter 'yuzhouwan', NAME => 'f1'
$ alter 'yuzhouwan', NAME => 'f2'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.3020 seconds

# Modify a CQ (put the new qualifier, then delete the old one)
$ create 'yuzhouwan', {NAME => 'info'}
$ put 'yuzhouwan', 'rk00001', 'info:name', 'China'
$ get 'yuzhouwan', 'rk00001', {COLUMN => 'info:name'}, 'value'
$ put 'yuzhouwan', 'rk00001', 'info:address', 'value'
$ deleteall 'yuzhouwan', 'rk00001', 'info:name'
$ scan 'yuzhouwan'
ROW COLUMN+CELL
rk00001 column=info:address, timestamp=1480556328381, value=value
1 row(s) in 0.0220 seconds

# Delete column families
$ alter 'yuzhouwan', {NAME => 'f3'}, {NAME => 'f4'}
$ alter 'yuzhouwan', {NAME => 'f5'}, {NAME => 'f1', METHOD => 'delete'}, {NAME => 'f2', METHOD => 'delete'}, {NAME => 'f3', METHOD => 'delete'}, {NAME => 'f4', METHOD => 'delete'}
# This cannot go down to the CQ level; alter 'ns_rec:tb_mem_tag', {NAME => 'cf_tag:partyIdType', METHOD => 'delete'} will not work
# Delete a row
$ deleteall <table>, <rowkey>

Truncating Table Data

$ describe 'yuzhouwan'
Table yuzhouwan is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f5', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
3 row(s) in 0.0230 seconds

# Introduced in 0.98: truncates the table data while preserving its regions
$ truncate_preserve 'yuzhouwan'
# truncate performs a drop table followed by a create table
$ truncate 'yuzhouwan'
$ scan 'yuzhouwan'
ROW COLUMN+CELL
0 row(s) in 0.3170 seconds

Renaming a Table

# Note that a snapshot name may not contain characters such as ':', i.e. there is no need to qualify it with a namespace
$ disable 'yuzhouwan'
$ snapshot 'yuzhouwan', 'yuzhouwan_snapshot'
$ clone_snapshot 'yuzhouwan_snapshot', 'ns_site:yuzhouwan'
$ delete_snapshot 'yuzhouwan_snapshot'
$ drop 'yuzhouwan'
$ grant 'site', 'CXWR', 'ns_site:yuzhouwan'
$ user_permission 'yuzhouwan'
User Table,Family,Qualifier:Permission
site default,yuzhouwan,,: [Permission: actions=CREATE,EXEC,WRITE,READ]
hbase default,yuzhouwan,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
$ disable 'ns_site:yuzhouwan'
$ drop 'ns_site:yuzhouwan'
$ exists 'ns_site:yuzhouwan'
Table ns_site:yuzhouwan does not exist
0 row(s) in 0.0200 seconds
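The rename-via-snapshot recipe above maps one-to-one onto the Java admin API; a hedged sketch, assuming the 0.98-era HBaseAdmin:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RenameViaSnapshot {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        try {
            // Same sequence as the shell: disable, snapshot, clone, clean up.
            admin.disableTable("yuzhouwan");
            admin.snapshot("yuzhouwan_snapshot", "yuzhouwan");
            admin.cloneSnapshot("yuzhouwan_snapshot", "ns_site:yuzhouwan");
            admin.deleteSnapshot("yuzhouwan_snapshot");
            admin.deleteTable("yuzhouwan");
        } finally {
            admin.close();
        }
    }
}
```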
Compression

# Creating with COMPRESSION => 'SNAPPY' failed: ERROR: java.io.IOException: Compression algorithm 'snappy' previously failed test.
# Try LZ4 instead (lower compression ratio but fast; it became the default codec in Spark 2.0)
$ create 'yuzhouwan', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}, {NAME => 'v', COMPRESSION => 'LZ4', BLOOMFILTER => 'NONE', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
$ describe 'yuzhouwan'
Table yuzhouwan is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'v', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZ4', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.0280 seconds

Access Control

# ACL
# R - read
# W - write
# X - execute
# C - create
# A - admin
$ grant 'benedict', 'WRXC', 'yuzhouwan'
$ echo "scan 'hbase:acl'" | hbase shell > acl.txt
yuzhouwan column=l:benedict, timestamp=1496216745249, value=WRXC
yuzhouwan column=l:hbase, timestamp=1496216737326, value=RWXCA
$ user_permission # without a <table_name>, reads from the 'hbase:acl' table
$ user_permission 'yuzhouwan'
User Table,Family,Qualifier:Permission
hbase default,yuzhouwan,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
benedict default,yuzhouwan,,: [Permission: actions=WRITE,READ,EXEC,CREATE]
2 row(s) in 0.0510 seconds
$ revoke 'benedict', 'yuzhouwan'

Splits

# splits
$ create 'yuzhouwan', {NAME => 'f'}, SPLITS => ['1', '2', '3'] # 4 regions
$ alter 'yuzhouwan', SPLITS => ['1', '2', '3', '4', '5', '6', '7', '8', '9'] # does not work
# Disable automatic splitting
$ alter 'yuzhouwan', {METHOD => 'table_att', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}
# Controls whether the master balances the number of regions across RegionServers
# When a RegionServer is taken down for maintenance or restarted, the balancer is switched off, which can leave regions unevenly distributed; turn balancing back on manually afterwards
$ balance_switch true
$ balance_switch false

Namespaces

# namespace
$ list_namespace_tables 'hbase'
TABLE
acl
meta
namespace
3 row(s) in 0.0050 seconds
$ list_namespace
NAMESPACE
default
hbase
# ...
50 row(s) in 0.3710 seconds
$ create_namespace 'www'
$ exists 'www:yuzhouwan.site'
$ create 'www:yuzhouwan.site', {NAME => 'info', VERSIONS=> 9}, SPLITS => ['1','2','3','4','5','6','7','8','9']
$ alter_namespace 'www', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
$ drop_namespace 'www'

Practical Tips

Hive Data Import (Bulkload)

Bulkload parses the RCFile according to the Hive table's schema, generates HBase HFiles with a MapReduce job, and finally loads those HFiles into HBase through the bulkload mechanism; in other words, the files are placed directly into HDFS. This is far more efficient than importing records one by one through the API. As a rule, Hive data is loaded into HBase via bulkload.
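A hedged sketch of that pipeline: a MapReduce job writes HFiles via HFileOutputFormat2, and LoadIncrementalHFiles then moves them into the table (0.98-era API). For brevity the mapper parses tab-separated text instead of RCFile, and the paths, family and qualifier names are all hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkloadDriver {

    // Toy mapper: parses "rowkey<TAB>value" text lines; a real job would read
    // RCFile according to the Hive table's schema instead.
    static class ToKeyValueMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t", 2);
            byte[] row = Bytes.toBytes(parts[0]);
            KeyValue kv = new KeyValue(row, Bytes.toBytes("info"),
                    Bytes.toBytes("name"), Bytes.toBytes(parts[1]));
            ctx.write(new ImmutableBytesWritable(row), kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hive-to-hbase-bulkload");
        job.setJarByClass(BulkloadDriver.class);
        job.setMapperClass(ToKeyValueMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        HTable table = new HTable(conf, "yuzhouwan");
        // Wires in the reducer, partitioner and output format needed for HFiles.
        HFileOutputFormat2.configureIncrementalLoad(job, table);

        Path out = new Path("/tmp/hfiles");  // hypothetical staging path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, out);

        if (job.waitForCompletion(true)) {
            // Moves the generated HFiles into the table's regions.
            new LoadIncrementalHFiles(conf).doBulkLoad(out, table);
        }
        table.close();
    }
}
```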

Cross-Cluster Replication (CopyTable + Replication)

Related Commands

Command                  Comment
add_peer                 Adds a replication peer. ID is the identifier of the peer; CLUSTER_KEY has the format hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
list_peers               Lists all replication peers
enable_peer              Marks a peer as enabled. A peer added with add_peer is enabled by default; after it has been disabled with disable_peer, enable_peer makes it usable again
disable_peer             Marks a peer as disabled
remove_peer              Removes a peer
set_peer_tableCFs        Sets which tables a peer replicates. By default a peer added with add_peer replicates every table in the cluster; to replicate only certain tables, use set_peer_tableCFs. The granularity goes down to column families: tables are separated by ';' and column families by ',', e.g. set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB". Calling set_peer_tableCFs without a table list makes the peer replicate all tables
append_peer_tableCFs     Adds tables to the set a peer replicates
remove_peer_tableCFs     Removes tables from the set a peer replicates
show_peer_tableCFs       Shows which tables a peer replicates; an empty result means all tables are replicated
list_replicated_tables   Lists all replicated tables

Monitoring Replication

HBase Shell

$ status 'replication'

Metrics

Source side

Metrics Name          Comment
sizeOfLogQueue        How many WAL files are still unprocessed
ageOfLastShippedOp    Replication delay of the last shipped edit
shippedBatches        How many batches have been shipped
shippedKBs            How many KB of data have been shipped
shippedOps            How many entries have been shipped
logEditsRead          How many logEdits have been read
logReadInBytes        How many bytes of log data have been read
logEditsFiltered      How many logEdits were actually filtered out

Sink side

Metrics Name              Comment
sink.ageOfLastAppliedOp   Delay of the last applied edit
sink.appliedBatches       Number of batches applied
sink.appliedOps           Number of entries applied

Complete Procedure

CopyTable

# Pin down the migration window: 2017-01-01 00:00:00 (1483200000000) to 2017-05-01 00:00:00 (1493568000000)
# The times must be converted to 13-digit, millisecond-precision unix timestamps
# Online converter: http://tool.chinaz.com/Tools/unixtime.aspx
# Or use the shell
$ echo "`date -d "2017-01-01 00:00:00" +%s`000"
$ echo "`date -d "2017-05-01 00:00:00" +%s`000"
# No need to worry about boundary issues: the window is [starttime, endtime)
# Run on the source cluster
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1483200000000 --endtime=1493568000000 --peer.adr=<zk address>,<zk address>,...:2015:/<hbase parent path> <table name>
# Check data consistency (run on both clusters and compare the row counts)
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <table name> --endtime=1493568000000
# Check further (run on both clusters and compare the byte counts)
$ hadoop fs -du hdfs://<base path>/hbase/data/<namespace>/<table name>

Replication

# Run list_peers first to avoid peer id conflicts (a Java alternative is sketched after the Troubleshooting steps below)
$ list_peers
$ add_peer '<peer id>', "<zk address>,<zk address>,...:2015:/<hbase parent path>"
# Enable REPLICATION_SCOPE on the table
$ disable '<table name>'
$ alter '<table name>', {NAME => '<column family>', REPLICATION_SCOPE => '1'} # 1: open; 0: close (default)
$ enable '<table name>'

Troubleshooting

# Run on the source cluster
$ hbase hbck
# If problems are found
$ hbase hbck --repair
# Once everything is clean, run in `hbase shell`
$ balance_switch true
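As referenced in the Replication step above, peers can also be managed from Java. A hedged sketch using the 0.98-era ReplicationAdmin API, with a hypothetical cluster key:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class AddPeerExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        ReplicationAdmin replication = new ReplicationAdmin(conf);
        try {
            // Check existing peers first to avoid a peer id conflict.
            System.out.println("Existing peers: " + replication.listPeers());
            // Same cluster key format as the shell:
            // "<zk quorum>:<zk client port>:<znode parent>"
            replication.addPeer("1", "zk1,zk2,zk3:2015:/hbase");
        } finally {
            replication.close();
        }
    }
}
```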

Disabling Automatic Splitting

$ alter 'yuzhouwan', {METHOD => 'table_att', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}
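Tying this back to the pre-split tip in the coprocessor section: a hedged sketch that creates a table with the [00..99] bucket regions and this split policy in one step (0.98-era API; table and family names are hypothetical).

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
    public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("yuzhouwan"));
        desc.addFamily(new HColumnDescriptor("info"));
        // Same effect as the alter above: never split automatically.
        desc.setValue(HTableDescriptor.SPLIT_POLICY,
                "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");

        // Split points 01..99 yield 100 regions: [,01), [01,02), ..., [99,),
        // so each region holds exactly one two-digit rowkey prefix.
        byte[][] splits = new byte[99][];
        for (int i = 1; i <= 99; i++) {
            splits[i - 1] = Bytes.toBytes(String.format("%02d", i));
        }

        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        try {
            admin.createTable(desc, splits);
        } finally {
            admin.close();
        }
    }
}
```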

