
MongoDB 3.3.12 发布,分布式文档存储数据库


MongoDB 3.3.12 发布了。MongoDB 是一个介于关系数据库和非关系数据库之间的产品,是非关系数据库当中功能最丰富、最像关系数据库的。它支持的数据结构非常松散,采用类似 JSON 的 BSON 格式,因此可以存储比较复杂的数据类型。MongoDB 最大的特点是它支持的查询语言非常强大,其语法有点类似于面向对象的查询语言,几乎可以实现类似关系数据库单表查询的绝大部分功能,而且还支持对数据建立索引。

部分更新如下:
[ SERVER-4936 ] - Server support for "maxStalenessMS" read preference option
[ SERVER-7200 ] - use oplog as op buffer on secondaries
[ SERVER-13367 ] - Specialize operator<<(std::ostream&, BSONType)
[ SERVER-19507 ] - Distinct command should use covered query plan when distinct field is a non-first element of index key pattern
[ SERVER-21757 ] - ServerStatus "advisoryHostFQDNs" should be optional and not default
[ SERVER-22345 ] - jstestfuzz - run on Solaris
[ SERVER-22382 ] - mongo shell should accept mongodb:// URI for --host
[ SERVER-23100 ] - Add methods to StringBuilder/str::stream that return unowned copy of underlying string
[ SERVER-23478 ] - Update "repl.network.readersCreated" metric in the OplogFetcher
[ SERVER-23501 ] - Erroring command replies should include stringified ErrorCode

查看 完整发布说明 以了解更多

下载地址:

APOC: Database Integration, Import and Export with Awesome Procedures On Cypher


By Michael Hunger, Developer Relations | August 31, 2016

If you haven’t seen the first part of this series, make sure to check out the first article to get an introduction to Neo4j’s user defined procedures and check out our APOC procedure library .

New APOC Release

First of all I want to announce that we just released APOC version 3.0.4.1. You might notice the new versioning scheme, which became necessary with SPI changes in Neo4j 3.0.4 that caused earlier versions of APOC to break.

That’s why we decided to release APOC versions that are tied to the Neo4j version from which they are meant to work. The last number is an ever increasing APOC build number, starting with 1.

So if you are using Neo4j 3.0.4 please upgrade to the new version, which is available as usual from http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases .

Notable changes since the last release (find more details in the docs ):

Random graph generators (by Michal Bachman from GraphAware)
Added export (and import) for GraphML apoc.export.graphml.*
PageRank implementation that supports pulling the subgraph to run on with Cypher statements apoc.algo.pageRankCypher (by Atul Jangra from RightRelevance)
Basic weakly connected components implementation (by Tom Michiels and Sascha Peukert)
Better error messages for load.json and periodic.iterate
Support for leading wildcards "*foo" in apoc.index.search (by Stefan Armbruster)
apoc.schema.properties.distinct provides distinct values of indexed properties using the index (by Max de Marzi)
Timeboxed execution of Cypher statements (by Stefan Armbruster)
Linking of a collection of nodes with apoc.nodes.link in a chain
apoc.util.sleep e.g., for testing (by Stefan Armbruster)
Build switched to gradle, including release (by Stefan Armbruster)

We also got a number of documentation updates from active contributors like Dana, Chris, Kevin and Viksit.

Thanks so much to everyone for contributing to APOC. We’re now at 227 procedures and counting!

If you missed it, you can also see what was included in the previous release: APOC 1.1.0 .

But now back to demonstrating the main topics for this blog post:

Database Integration & Data Import

Besides the flexibility of the graph data model, for me personally the ability to enrich your existing graph by relating data from other data sources is a key advantage of using a graph database.

And Neo4j data import has been a very enjoyable pastime of mine, as you know if you have followed my activities over the last six years.

With APOC, I got the ability to pull data import capabilities directly into Cypher so that a procedure can act as a data source providing a stream of values (e.g., rows). Those are then consumed by your regular Cypher statement to create, update and connect nodes and relationships in whichever way you want.

apoc.load.json

Because it is so close to my heart, I first started with apoc.load.json. Then I couldn't stop anymore and added support for XML, CSV, GraphML and a lot of databases (including relational & Cassandra via JDBC, Elasticsearch, MongoDB and CouchBase (upcoming)).

All of these procedures are used in a similar manner. You provide some kind of URL or connection information and then optionally queries / statements to retrieve data in rows. Those rows are usually maps that map columns or fields to values, depending on the data source these maps can also be deeply nested documents.

Those can be processed easily with Cypher. The map and collection lookups, functions, expressions and predicates help a lot with handling nested structures.

Let’s look at apoc.load.json . It takes a URL and optionally some configuration and returns the resulting JSON as one single map value, or if the source is an array of objects, then as a stream of maps.

The mentioned docs and previous blog posts show how to use it for loading data from Stack Overflow or Twitter search. (You have to pass in your Twitter bearer token or credentials.)

Here I want to demonstrate how you could use it to load a graph from http://onodo.org , a graph visualization platform for journalists and other researchers that want to use the power of the graph to draw insights from the connections in their data.

Using Onodo to Learn Network Analysis andVisualisation https://t.co/8ZfEzsYLeA pic.twitter.com/fRjGeSAvQS

― Shawn Day (@iridium) August 16, 2016

I came across that tweet this week, and while checking out their really neat graph editing and visualization UI, I saw that both nodes and relationships for each publicly shared visualization are available as JSON.

To load the mentioned Game of Thrones graph , I just had to grab the URLs for nodes and relationships, have a quick look at the JSON structures and re-create the graph in Neo4j. Note that for creating dynamic relationship-types from the input data I use apoc.create.relationship .

call apoc.load.json("https://onodo.org/api/visualizations/21/nodes/") yield value
create (n:Person) set n+=value
with count(*) as nodes
call apoc.load.json("https://onodo.org/api/visualizations/21/relations/") yield value
match (a:Person {id:value.source_id})
match (b:Person {id:value.target_id})
call apoc.create.relationship(a,value.relation_type,{},b) yield rel
return nodes, count(*) as relationships
apoc.load.xml

The procedure for loading XML works similarly, only that I had to convert the XML into a nested map structure to be returned.

While apoc.load.xml maintains the order of the original XML, apoc.load.xmlSimple aggregates child elements into entries with the element name as a key and all the children as a value or collection value.

book.xml from Microsoft :

<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>

基于hadoop生态圈的数据仓库实践――OLAP与数据可视化(六)


六、Hue数据可视化实例

本节用Impala、DB查询示例说明Hue的数据查询和可视化功能。

1. Impala查询

在Impala OLAP实例一节中执行了一些查询,现在在Hue里执行查询,直观看一下结果的图形化表示效果。

(1)登录Hue,点击对应图标进入“我的文档”页面。

(2)点击相应图标,创建一个名为“销售订单”的新项目。

(3)点击相应图标,进入Impala查询编辑页面,创建一个新的Impala文档。

(4)在Impala查询编辑页面,选择olap库,然后在编辑窗口输入下面的查询语句。

-- 按产品分类查询销售量和销售额

select t2.product_category pro_category,

sum(order_quantity) sum_quantity,

sum(order_amount) sum_amount

from sales_order_fact t1, product_dim t2

where t1.product_sk = t2.product_sk

group by pro_category

order by pro_category;

-- 按产品查询销售量和销售额

select t2.product_name pro_name,

sum(order_quantity) sum_quantity,

sum(order_amount) sum_amount

from sales_order_fact t1, product_dim t2

where t1.product_sk = t2.product_sk

group by pro_name

order by pro_name;

点击“执行”按钮,结果显示按产品分类的销售统计,如下图所示。接着点击“下一页”按钮,结果会显示按产品的销售统计。



(5)点击 “全屏查看结果”按钮,会全屏显示查询结果。

产品统计结果如下图所示。



产品统计柱状图如下图所示。



从图中可以看到,按销售额从大到小排序的产品依次为Hard Disk Drive、Floppy Drive、Flat Panel、Keyboard和LCD Panel。

(6)回到查询编辑页,点击“另存为...”按钮,保存成名为“按产品统计”的查询。

(7)点击“新查询”按钮,按同样的方法再建立一个“按地区统计”的查询。SQL语句如下:

-- 按州查询销售量和销售额

select t3.state state,

count(distinct t2.customer_sk) sum_customer_num,

sum(order_amount) sum_order_amount

from sales_order_fact t1

inner join customer_dim t2 on t1.customer_sk = t2.customer_sk

inner join customer_zip_code_dim t3 on t1.customer_zip_code_sk = t3.zip_code_sk

group by state

order by state;

-- 按城市查询销售量和销售额

select t3.city city,

count(distinct t2.customer_sk) sum_customer_num,

sum(order_amount) sum_order_amount

from sales_order_fact t1

inner join customer_dim t2 on t1.customer_sk = t2.customer_sk

inner join customer_zip_code_dim t3 on t1.customer_zip_code_sk = t3.zip_code_sk

group by city

order by city;

城市统计饼图如下图所示。



从图中可以看到,mechanicsburg市的销售占整个销售额的一半。

(8)再建立一个“按年月统计”的查询,这次使用动态表单功能,运行时输入年份。SQL语句如下。

-- 按年月查询销售量和销售额

select t4.year*100 + t4.month ym,

sum(order_quantity) sum_quantity,

sum(order_amount) sum_amount

from sales_order_fact t1

inner join order_date_dim t4 on t1.order_date_sk = t4.date_sk

where (t4.year*100 + t4.month) between $ym1 and $ym2

group by ym

order by ym;

注意$ym1和$ym2是动态参数,执行此查询,会出现输入框要求输入参数,如下图所示。



查询2016一年的销售情况,ym1输入201601,ym2输入201612,然后点击“执行查询”,结果线形图如下图所示。



此结果按查询语句中的order by子句排序。
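作为参考,下面给出把两个动态参数代入后大致等价的查询语句(仅为示意,Hue 在执行前会对 $ym1、$ym2 做简单的文本替换):

-- ym1 输入 201601、ym2 输入 201612 时,实际执行的查询大致如下
select t4.year*100 + t4.month ym,
       sum(order_quantity) sum_quantity,
       sum(order_amount) sum_amount
  from sales_order_fact t1
 inner join order_date_dim t4 on t1.order_date_sk = t4.date_sk
 where (t4.year*100 + t4.month) between 201601 and 201612
 group by ym
 order by ym;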

至此,我们定义了三个Impala查询,进入“我的文档”页面可以看到default项目中有三个文档,而“销售订单”项目中没有文档,如下图所示。



(9)把这三个文档移动到“销售订单”项目中。

点击右面列表中的“default”按钮,会弹出“移动到某个项目”页面,点击“销售订单”,如下图所示。



将三个查询文档都如此操作后,在“销售订单”项目中会出现此三个文档,如下图所示。



以上用销售订单的例子演示了一下Hue中的Impala查询及其图形化表示。严格地说,无论是Hue还是Zeppelin,在数据可视化上与传统的BI产品相比还很初级,它们只是提供了几种常见的图表,还缺少基本的上卷、下钻、切块、切片、百分比等功能,如果只想用Hadoop生态圈里的数据可视化工具,也只能期待其逐步完善吧。

(10)最后提供一个Hue文档中通过经纬度进行地图定位的示例,其截图如下所示。



2. DB查询

缺省情况下Hue没有启用DB查询,如果点击“Query Editors” -> “DB 查询”,会提示“当前没有已配置的数据库。”,如下图所示。



按如下方法配置DB查询。

(1)进入CDH Manager的“Hue” -> “配置”页面,在“类别中选择“服务范围” -> “高级”,然后编辑“hue_safety_valve.ini 的 Hue 服务高级配置代码段(安全阀)”配置项,填写类似如下内容:

[librdbms]
  [[databases]]
    [[[mysql]]]
      # Name to show in the UI.
      nice_name="MySQL DB"
      name=hive
      engine=mysql
      host=172.16.1.102
      port=3306
      user=root
      password=mypassword

这里配置的是一个MySQL数据库,如下图所示。



(2)点击“保存更改”按钮,然后点击“操作” -> “重启”,重启Hue服务。

此时再次在Hue里点击“Query Editors” -> “DB 查询”,则会出现MySQL中hive库表,此库存放的是Hive元数据。此时就可以输入SQL进行查询了,如下图所示。
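“DB 查询”连到的这个hive库就是Hive的元数据库。作为参考,下面是一个可以在“DB 查询”中执行的示例语句,用来列出元数据中记录的库和表(假设使用的是标准的Hive metastore表结构,即DBS、TBLS两张表):

SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
  FROM TBLS t
  JOIN DBS d ON t.DB_ID = d.DB_ID
 ORDER BY d.NAME, t.TBL_NAME;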



SpringBoot下Mysql数据库的中文乱码问题分析


引言: 今天的问题将围绕Java写入mysql之时,中文在数据库中变成乱码的分析追踪过程,以此来了解和优化分析解决问题的过程。

1. 开发环境描述

Spring Boot 1.4.0.RELEASE, JDK 1.8, Mysql 5.7, CentOS 7

2. 问题描述

在Java代码中,保存中文到数据库,发现在数据库中显示为???,这是乱码的表现,剩下的问题是:哪个环节出现了问题呢?



3. 问题分析以及推理

在整个环节中,产生乱码的环节主要有以下几个:java代码, IDE, 代码所在的系统, Mysql连接, 数据库所在的操作系统,数据库层面。这里我们使用utf-8来做通用的编码格式。

接下来我们进行逐个分析与排查可能的问题:

A: IDE本身的编码, 经过排查正确, utf-8.



B. 开发所使用的操作系统

经过确认为windows 7的中文版,应该不是问题的根源。

C. Mysql的连接驱动

目前使用的连接URL为: jdbc:log4jdbc:mysql://localhost:3306/mealsystem?useUnicode=true&characterEncoding=utf-8

问号后面挂接的unicode编码的支持,设定为utf-8.
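作为补充,可以在该JDBC连接上执行下面的语句,确认驱动实际协商出来的各项字符集(示意):

SHOW VARIABLES LIKE 'character_set_client';
SHOW VARIABLES LIKE 'character_set_connection';
SHOW VARIABLES LIKE 'character_set_results';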

D. 数据库所在的操作系统

[root@flybird ~]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
[root@flybird ~]# uname -a
Linux flybird 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@flybird ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@flybird ~]#
E. 操作系统的编码以及locale:
[root@flybird ~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
经过确认,没有问题,都是遵守utf-8的格式。

F. 数据库中的表分析:

数据库表test, 表中5个字段,id, name, created_time, updated_time, version.

其中表的encode如下, 确认为utf-8.



其中目标字段name的编码格式:



故name本身的编码没有问题。

G. Spring Boot的Java代码分析

TestEntity的定义:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Table;
@Entity
@Table(name="test")
public class TestEntity extends BaseEntity {
private static final long serialVersionUID = -4437451262794760844L;
@Column
private String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}
DAO的TestRepository.java的定义:
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;
import com.rain.wx.meal.model.TestEntity;
@Repository
public interface TestRepository extends JpaRepository<TestEntity, Long> { // 泛型参数在原文中被吞掉,这里按常见写法补全(ID类型假定为Long)
}
测试代码:
@RunWith(SpringRunner.class)
@SpringBootTest
@ActiveProfiles("dev")
public class TestEntityTest {
@Autowired
private TestRepository testRepo;
@Test
public void testEntity() {
TestEntity test = new TestEntity();
test.setName("我的天空");
test = testRepo.save(test);
test = testRepo.findOne(test.getId());
System.out.println("tst info:" + test);
}
}
经过分析,由于IDE本身已经设置了UTF-8的编码,故在代码中已经无需额外的转码;而且在代码层面也进行了转码测试,比如utf-8、gb2312、gbk、iso8859-1等编码,结果皆仍为乱码。

4. 基于Mysql的客户端的验证分析

基于workbench或者Navicat之类的客户端工具,打开目标表test, 手动输入中文信息到test的name字段,保存之后,重新查询,发现仍为中文信息。 基于代码针对基于客户端输入的信息,进行查询发现,可以正常的查出中文信息来。

基于这个正确查询出来的结果,可以确认从数据中的查询是正确的;目前存在问题的路径为写入中文的过程。

5. 聚焦数据库本身

在之前排查完了操作系统的编码之后,数据库的编码也需要排查一下:


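排查数据库编码时,可以直接在MySQL中执行如下语句查看各级字符集设置(示意):

SHOW VARIABLES LIKE 'character\_set\_%';
SHOW VARIABLES LIKE 'collation%';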

忽然发现character_set_server的编码是latin1,原来问题出在这里;在基本确认问题源头之后,我们来看看如何解决。

6. 问题的解决方式

修改character_set_server的encode:

mysql> set global character_set_server = utf8;

然后重启mysql server之后,很不幸,竟然不生效(SET GLOBAL只修改运行中的变量,重启后又会恢复为配置文件或编译时的默认值,所以单靠它无法持久生效)。

那好吧,我们换一种方式来做吧,在/etc/my.cnf中进行初始化数据库的encode:

[client] # 新增客户端的编码
default-character-set=utf8
[mysql] # 新增客户端的编码,缺省
default-character-set=utf8
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
# 新增 关于character_set_server的编码设置
init-connect='SET NAMES utf8'
character-set-server = utf8
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
这里在mysql中新增了如下记录,来初始化设置mysql数据库服务器的编码:
init-connect='SET NAMES utf8'
character-set-server = utf8
然后,重新启动mysql服务:
systemctl restart mysql
重新执行测试代码,欣喜之中看到了预期中的结果:
2016-08-31 16:26:27.613 INFO 12556 --- [ main] jdbc.audit : 4. Connection.getWarnings() returned null
2016-08-31 16:26:27.614 INFO 12556 --- [ main] jdbc.audit : 4. Connection.clearWarnings() returned
2016-08-31 16:26:27.615 INFO 12556 --- [ main] jdbc.audit : 4. Connection.clearWarnings() returned
tst info:com.rain.wx.meal.model.TestEntity@578198d9[
name=我的天空
id=7
version=0
createdTime=
updatedTime=
]
2016-08-31 16:26:27.656 INFO 12556 --- [ Thread-2] o.s.w.c.s.GenericWebApplicationContext : Closing org.springframework.web.context.support.GenericWebApplicationContext@71687585: startup date [Wed Aug 31 16:26:08 CST 2016]; root of context hierarchy
2016-08-31 16:26:27.670 INFO 12556 --- [ Thread-2] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2016-08-31 16:26:27.677 INFO 12556 --- [ Thread-2] jdbc.connection : 1. Connection closed
2016-08-31 16:26:27.677 INFO 12556 --- [ Thread-2] jdbc.audit : 1. Connection.close() returned
2016-08-31 16:26:27.679 INFO 12556 --- [ Thread-2] jdbc.connection : 2. Connection closed
2016-08-31 16:26:27.680 INFO 12556 --- [ Thread-2] jdbc.audit : 2. Connection.close() returned
2016-08-31 16:26:27.680 INFO 12556 --- [ Thread-2] jdbc.connection : 3. Connection closed
2016-08-31 16:26:27.680 INFO 12556 --- [ Thread-2] jdbc.audit : 3. Connection.close() returned
2016-08-31 16:26:27.682 INFO 12556 --- [ Thread-2] jdbc.connection : 5. Connection closed
2016-08-31 16:26:27.683 INFO 12556 --- [ Thread-2] jdbc.audit : 5. Connection.close() returned
2016-08-31 16:26:27.684 INFO 12556 --- [ Thread-2] jdbc.audit : 4. PreparedStatement.close() returned
2016-08-31 16:26:27.685 INFO 12556 --- [ Thread-2] jdbc.audit : 4. PreparedStatement.close() returned
2016-08-31 16:26:27.685 INFO 12556 --- [ Thread-2] jdbc.connection : 4. Connection closed
2016-08-31 16:26:27.686 INFO 12556 --- [ Thread-2] jdbc.audit : 4. Connection.close() returned
2016-08-31 16:26:27.687 INFO 12556 --- [ Thread-2] com.alibaba.druid.pool.DruidDataSource : {dataSource-1} closed
7. 参考资料
http://stackoverflow.com/questions/3513773/change-mysql-default-character-set-to-utf-8-in-my-cnf
http://www.cnblogs.com/-1185767500/articles/3106194.html
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_character_set_database

redis事务实现原理

一:简介

Redis事务通常会使用MULTI、EXEC、WATCH等命令来完成。redis实现事务的机制与常见的关系型数据库有很大的区别,比如redis的事务不支持回滚,而且事务执行时会阻塞其它客户端的请求执行。

二:事务实现细节

redis事务从开始到结束通常会通过三个阶段:

1.事务开始
2.命令入队
3.事务执行
我们从下面的例子看下
redis > MULTI
OK
redis > SET "username" "bugall"
QUEUED
redis > SET "password" 161616
QUEUED
redis > GET "username"
QUEUED
redis > EXEC
1) OK
2) OK
3) "bugall"
redis > MULTI
标记事务的开始,MULTI命令可以将执行该命令的客户端从非事务状态切换成事务状态,这一切换是通过在客户端状态的flags属性中打开REDIS_MULTI标识完成,
我们看下redis中对应部分的源码实现
void multiCommand(client *c) {
if (c->flags & CLIENT_MULTI) {
addReplyError(c,"MULTI calls can not be nested");
return;
}
c->flags |= CLIENT_MULTI; //打开事务标识
addReply(c,shared.ok);
}

在打开事务标识的客户端里,下面这些命令都会被暂存到一个命令队列里,不会因为用户的输入而立即执行:

redis > SET "username" "bugall"
redis > SET "password" 161616
redis > GET "username"

执行事务队列里的命令。

redis > EXEC
这里需要注意的是,在客户端打开了事务标识后,只有命令:EXEC,DISCARD,WATCH,MULTI命令会被立即执行,其它命令服务器不会立即执行,而是将这些命令放入到一个事务队列里面,然后向客户端返回一个QUEUED回复
redis客户端有自己的事务状态,这个状态保存在客户端状态mstate属性中,mstate的结构体类型是multiState,我们看下multiState的定义
typedef struct multiState {
multiCmd *commands; //存放MULTI commands的数组
int count; //命令数量
} multiState;

我们再看下结构体类型multiCmd的结构

typedef struct multiCmd {
robj **argv; //参数
int argc; //参数数量
struct redisCommand *cmd; //命令指针
} multiCmd;

事务队列以先进先出的保存方法,较先入队的命令会被放到数组的前面,而较后入队的命令则会被放到数组的后面.

三:执行事务
当开启事务标识的客户端发送EXEC命令的时候,服务器就会执行,客户端对应的事务队列里的命令,我们来看下EXEC
的实现细节
void execCommand(client *c) {
int j;
robj **orig_argv;
int orig_argc;
struct redisCommand *orig_cmd;
int must_propagate = 0; //同步持久化,同步主从节点
//如果客户端没有开启事务标识
if (!(c->flags & CLIENT_MULTI)) {
addReplyError(c,"EXEC without MULTI");
return;
}
//检查是否需要放弃EXEC
//如果某些被watch的key被修改了就放弃执行
if (c->flags & (CLIENT_DIRTY_CAS|CLIENT_DIRTY_EXEC)) {
addReply(c, c->flags & CLIENT_DIRTY_EXEC ? shared.execaborterr :
shared.nullmultibulk);
discardTransaction(c);
goto handle_monitor;
}
//执行事务队列里的命令
unwatchAllKeys(c); //因为redis是单线程的所以这里,当检测watch的key没有被修改后就统一clear掉所有的watch
orig_argv = c->argv;
orig_argc = c->argc;
orig_cmd = c->cmd;
addReplyMultiBulkLen(c,c->mstate.count);
for (j = 0; j < c->mstate.count; j++) {
c->argc = c->mstate.commands[j].argc;
c->argv = c->mstate.commands[j].argv;
c->cmd = c->mstate.commands[j].cmd;
//同步主从节点,和持久化
if (!must_propagate && !(c->cmd->flags & CMD_READONLY)) {
execCommandPropagateMulti(c);
must_propagate = 1;
}
//执行命令
call(c,CMD_CALL_FULL);
c->mstate.commands[j].argc = c->argc;
c->mstate.commands[j].argv = c->argv;
c->mstate.commands[j].cmd = c->cmd;
}
c->argv = orig_argv;
c->argc = orig_argc;
c->cmd = orig_cmd;
//取消客户端的事务标识
discardTransaction(c);
if (must_propagate) server.dirty++;
handle_monitor:
if (listLength(server.monitors) && !server.loading)
replicationFeedMonitors(c,server.monitors,c->db->id,c->argv,c->argc);
}
四:watch/unwatch/discard
watch:
命令是一个乐观锁,它可以在EXEC命令执行之前,监视任意数量的数据库键,并在执行EXEC命令时判断是否至少有一个被watch的键值
被修改如果被修改就放弃事务的执行,如果没有被修改就清空watch的信息,执行事务列表里的命令。
unwatch:
顾名思义可以看出它的功能是与watch相反的,是取消对一个键值的“监听”的功能
discard:
清空客户端的事务队列里的所有命令,并取消客户端的事务标记,如果客户端在执行事务的时候watch了一些键,则discard会取消所有
键的watch.
五:redis事务的ACID特性
在传统的关系型数据库中,常常用ACID特性来检验事务功能的可靠性和安全性。
在redis中事务总是具有原子性(Atomicity),一致性(Consistency)和隔离性(Isolation),并且当redis运行在某种特定的持久化
模式下,事务也具有耐久性(Durability).
①原子性
事务具有原子性指的是,数据库将事务中的多个操作当作一个整体来执行,服务器要么就执行事务中的所有操作,要么就一个操作也不执行。
但是对于redis的事务功能来说,事务队列中的命令要么就全部执行,要么就一个都不执行,因此redis的事务是具有原子性的。我们通常会知道
两种关于redis事务原子性的说法,一种是要么事务都执行,要么都不执行。另外一种说法是redis事务当事务中的命令执行失败后面的命令还
会执行,错误之前的命令不会回滚。其实这个两个说法都是正确的。但是缺一不可。我们接下来具体分析下
我们先看一个可以正确执行的事务例子
```
redis > MULTI
OK
redis > SET username "bugall"
QUEUED
redis > EXEC
1) OK
2) "bugall"
```
与之相反,我们再来看一个事务执行失败的例子。这个事务因为命令在放入事务队列的时候被服务器拒绝,所以事务中的所有命令都不会执行,因为
前面我们有介绍到,redis的事务命令是统一先放到事务队列里,在用户输入EXEC命令的时候再统一执行。但是我们错误的使用"GET"命令,在命令
放入事务队列的时候被检测到错误,这时候还没有接收到EXEC命令,所以这个时候不牵扯到回滚的问题,在EXEC的时候发现事务队列里有命令存在
错误,所以事务里的命令就全都不执行,这样就达到了事务的原子性,我们看下例子。
```
redis > MULTI
OK
redis > GET
(error) ERR wrong number of arguments for 'get' command
redis > GET username
QUEUED
redis > EXEC
(error) EXECABORT Transaction discarded because of previous errors
```
redis的事务和传统的关系型数据库事务的最大区别在于,redis不支持事务的回滚机制,即使事务队列中的某个命令在执行期间出现错误,整个事务也会
继续执行下去,直到将事务队列中的所有命令都执行完毕为止,我们看下面的例子
```
redis > SET username "bugall"
OK
redis > MULTI
OK
redis > SADD member "bugall" "litengfe" "yangyifang"
QUEUED
redis > RPUSH username "b" "l" "y" //错误对键username使用列表键命令
QUEUED
redis > SADD password "123456" "123456" "123456"
QUEUED
redis > EXEC
1) (integer) 3
2) (error) WRONGTYPE Operation against a key holding the wrong kind of value
3) (integer) 3

redis的作者在事务功能的文档中解释说,不支持事务回滚是因为这种复杂的功能和redis追求的简单高效的设计主旨不符合;并且他认为,redis事务执行时的错误通常都是编程错误造成的,这种错误通常只会出现在开发环境中,而很少会在实际的生产环境中出现,所以他认为没有必要为redis开发事务回滚功能。所以我们在讨论redis事务回滚的时候,一定要区分命令发生错误的时机。

②一致性

事务具有一致性指的是,如果数据库在执行事务之前是一致的,那么在事务执行之后,无论事务是否执行成功,数据库也应该仍然一致的。

”一致“指的是数据符合数据库本身的定义和要求,没有包含非法或者无效的错误数据。redis通过谨慎的错误检测和简单的设计来保证事务一致性。

③隔离性

事务的隔离性指的是,即使数据库中有多个事务并发在执行,各个事务之间也不会互相影响,并且在并发状态下执行的事务和串行执行的事务产生的结果完全

相同。

因为redis使用单线程的方式来执行事务(以及事务队列中的命令),并且服务器保证,在执行事务期间不会对事物进行中断,因此,redis的事务总是以串行

的方式运行的,并且事务也总是具有隔离性的

④持久性

事务的耐久性指的是,当一个事务执行完毕时,执行这个事务所得的结果已经被保持到永久存储介质里面。

因为redis事务不过是简单的用队列包裹起来一组redis命令,redis并没有为事务提供任何额外的持久化功能,所以redis事务的耐久性由redis使用的模式

决定

- 当服务器在无持久化的内存模式下运行时,事务不具有耐久性,一旦服务器停机,包括事务数据在内的所有服务器数据都将丢失

- 当服务器在RDB持久化模式下运作的时候,服务器只会在特定的保存条件满足的时候才会执行BGSAVE命令,对数据库进行保存操作,并且异步执行的BGSAVE不

能保证事务数据被第一时间保存到硬盘里面,因此RDB持久化模式下的事务也不具有耐久性

- 当服务器运行在AOF持久化模式下,并且appedfsync的选项的值为always时,程序总会在执行命令之后调用同步函数,将命令数据真正的保存到硬盘里面,因此

这种配置下的事务是具有耐久性的。

- 当服务器运行在AOF持久化模式下,并且appedfsync的选项的值为everysec时,程序会每秒同步一次命令数据到磁盘因为停机可能会恰好发生在等待同步的那一秒内,这种可能造成事务数据丢失,所以这种配置下的事务不具有耐久性


JDBC(七)DAO设计模式

DAO设计模式概念

在Java web开发中,使用jsp+Javabean模式来开发web应用,程序对数据的操作和访问直接使用jdbc技术,在jsp页面嵌入大量的java代码来访问数据库,这样页面显示代码和java代码混合在一起,使得代码变得难以维护,同时代码大量重复,复用率低下。更好的做法是:在web前端只关心数据如何显示,而不关心数据从何而来,数据的访问由数据层来统一完成,这就是DAO。使用DAO设计模式能够很好地解决上述问题。

DAO设计模式抽象和封装了对关系型数据库的访问,DAO层负责管理数据库连接,对数据的存储做控制,使得开发者从关系型数据库中解放出来,以面向对象的思维来操作数据。

DAO层的组成部分

DAO层主要由数据库连接类,Model类,DAO接口,DAO实现类,DAO工厂类五个部分组成。

数据库连接类。主要负责获得数据库连接和释放数据库连接。
Model类。一个Model类和数据库中的一张数据表对应,Model类的属性与数据表的列对应,也就是说一个Model类的实例和数据表的一行相对应。
DAO接口。DAO接口中定义了所有操作数据库的方法,包括CRUD等,在DAO的实现类中应该给出相应的实现。
DAO实现类。即是对DAO接口中所有方法的实现,包含了对对应数据表的所有操作,同时也可以理解为从二维表格到对象的封装。
DAO工厂类。工厂类主要负责创建并返回DAO实现类,控制返回的实现类是哪一个,使得客户代码无需关心使用的是哪一个实现类(只需要使用接口中的方法即可),这样有利于维护和扩展。
DAO设计模式实例

这里设计一个DAO层的访问,数据库连接管理采用c3p0数据库连接池。

数据库信息:
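截图中的表结构此处无法展示,下面给出一个与后文User模型及DAO中SQL语句相对应的建表语句示例(字段名取自DAO中的SQL,类型和长度为假设):

CREATE TABLE user (
  user_id       INT PRIMARY KEY,
  user_name     VARCHAR(50),
  user_password VARCHAR(50),
  user_birth    DATE
);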
数据库链接类(JDBCUtils.java):
package com.aaa.utils;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;
import com.mchange.v2.c3p0.ComboPooledDataSource;
/**
* 获取数据库连接和示范数据库资源的类
*
* @author tuxianchao
*
*/
public class JDBCUtils {
private static ComboPooledDataSource comboPooledDataSource = null;
static {
try {
InputStream inputStream = JDBCUtils.class.getClassLoader().getResourceAsStream("c3p0-config.xml");
Properties properties = new Properties();
properties.load(inputStream);
comboPooledDataSource = new ComboPooledDataSource("mysql");
} catch (Exception e) {
// TODO: handle exception
e.printStackTrace();
}
}
/**
* 获取数据库连接
*
* @return
*/
public static Connection getConnection() {
try {
return comboPooledDataSource.getConnection();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
public static void releaseDB(Connection connection, PreparedStatement preparedStatement, ResultSet resultSet) {
if (connection != null) {
try {
connection.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
if (preparedStatement != null) {
try {
preparedStatement.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
if (resultSet != null) {
try {
resultSet.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}

Model类(User.java):

package com.aaa.model;
import java.sql.Date;
/**
* 模型设计。对应属数据库中的表,属性和数据库中的字段一一对应。
*
* @author tuxianchao
*
*/
public class User {
private int userId;
private String userName;
private String userPassword;
private Date userBirth;
public User() {
}
public User(int userId, String userName, String userPassword, Date userBirth) {
this.userId = userId;
this.userName = userName;
this.userPassword = userPassword;
this.userBirth = userBirth;
}
public int getUserId() {
return userId;
}
public void setUserId(int userId) {
this.userId = userId;
}
public String getUserName() {
return userName;
}
public void setUserName(String userName) {
this.userName = userName;
}
public String getUserPassword() {
return userPassword;
}
public void setUserPassword(String userPassword) {
this.userPassword = userPassword;
}
public Date getUserBirth() {
return userBirth;
}
public void setUserBirth(Date userBirth) {
this.userBirth = userBirth;
}
@Override
public String toString() {
return "User [userId=" + userId + ", userName=" + userName + ", userPassword=" + userPassword + ", userBirth="
+ userBirth + "]";
}
}

DAO(UserDAO.java):

package com.aaa.dao;
import java.util.List;
import com.aaa.model.User;
/**
* UserDAO接口,定义了CRUD等数据操作
*
* @author tuxianchao
*
*/
public interface UserDAO {
public int insert(User user);
public int update(User user);
public int delete(int userId);
public User queryById(int userId);
public List queryAll();
}

DAO实现类(UserDAOImpl.java):

package com.aaa.dao.impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.aaa.dao.UserDAO;
import com.aaa.model.User;
import com.aaa.utils.JDBCUtils;
public class UserDAOImpl implements UserDAO {
Connection connection = null;
PreparedStatement preparedStatement = null;
ResultSet resultSet = null;
@Override
public int insert(User user) {
// TODO Auto-generated method stub
int row = 0;// 影响的行数,返回0表示插入失败,返回1表示插入成功
try {
connection = JDBCUtils.getConnection();
String sql = "INSERT INTO user VALUES(?,?,?,?)";
preparedStatement = connection.prepareStatement(sql);
preparedStatement.setObject(1, user.getUserId());
preparedStatement.setObject(2, user.getUserName());
preparedStatement.setObject(3, user.getUserPassword());
preparedStatement.setObject(4, user.getUserBirth());
row = preparedStatement.executeUpdate();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return row;
}
@Override
public int update(User user) {
// TODO Auto-generated method stub
int row = 0;
try {
connection = JDBCUtils.getConnection();
String sql = "UPDATE user SET user_name = ?, user_password = ?, user_birth = ? WHERE user_id = ?";
preparedStatement = connection.prepareStatement(sql);
preparedStatement.setObject(1, user.getUserName());
preparedStatement.setObject(2, user.getUserPassword());
preparedStatement.setObject(3, user.getUserBirth());
preparedStatement.setObject(4, user.getUserId());
row = preparedStatement.executeUpdate();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return row;
}
@Override
public int delete(int userId) {
// TODO Auto-generated method stub
int row = 0;
try {
connection = JDBCUtils.getConnection();
String sql = "DELETE FROM user WHERE user_id =?";
preparedStatement = connection.prepareStatement(sql);
preparedStatement.setObject(1, userId);
row = preparedStatement.executeUpdate();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return row;
}
@Override
public User queryById(int userId) {
User user = new User();
// TODO Auto-generated method stub
try {
connection = JDBCUtils.getConnection();
String sql = "SELECT * FROM user WHERE user_id = ?";
preparedStatement = connection.prepareStatement(sql);
preparedStatement.setObject(1, userId);
resultSet = preparedStatement.executeQuery();
while (resultSet.next()) {
user.setUserId(resultSet.getInt(1));
user.setUserName(resultSet.getString(2));
user.setUserPassword(resultSet.getString(3));
user.setUserBirth(resultSet.getDate(4));
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return user;
}
@Override
public List queryAll() {
List users = new ArrayList<>();
User user = null;
// TODO Auto-generated method stub
try {
connection = JDBCUtils.getConnection();
String sql = "SELECT * FROM user";
preparedStatement = connection.prepareStatement(sql);
resultSet = preparedStatement.executeQuery();
while (resultSet.next()) {
user = new User();
user.setUserId(resultSet.getInt(1));
user.setUserName(resultSet.getString(2));
user.setUserPassword(resultSet.getString(3));
user.setUserBirth(resultSet.getDate(4));
users.add(user);
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return users;
}
}

DAO工厂类(DAOFactory.java):

package com.aaa.dao;
import com.aaa.dao.impl.UserDAOImpl;
/**
* DAO工厂类
*
* @author tuxianchao
*
*/
public class DAOFactory {
public static UserDAO getUserDAOInstance() {
return new UserDAOImpl();
}
}

总结:在本实例中,创建了对应数据表的User.java,在DAO接口中定义了基本的方法,在DAO的实现类中给出了对应的实现,在DAO工厂类中返回对应实现类的实例。可以这样创建一个DAO:UserDAO userDao = DAOFactory.getUserDAOInstance(); 然后直接调用接口中的方法,而不需要关心接口具体的实现类,这也是使用接口的好处。对于使用工厂类来生产对应的DAO实现类,好处是当添加其他的xxxDAO的时候,只需要增加一个返回对应xxxDAO实例的方法即可,这样可以统一使用DAOFactory来获取xxxDAO。

skip-external-locking与skip-locking参数详解


mysql的配置文件my.cnf中默认存在一行skip-external-locking的参数,即“跳过外部锁定”。根据MySQL开发网站的官方解释,External-locking用于多进程条件下为MyISAM数据表进行锁定。

如果你有多台服务器使用同一个数据库目录(不建议),那么每台服务器都必须开启external locking;

调整MySQL运行参数,修改/etc/my.cnf文件调整mysql运行参数重启MySQL后生效,在MySQL4版本以后,一部分内部变量可以在MySQL运行时设置,不过重启MySQL就失效了。

当外部锁定(external-locking)起作用时,每个进程若要访问数据表,则必须等待之前的进程完成操作并解除锁定。由于服务器访问数据表时经常需要等待解锁,因此在单服务器环境下external locking会让MySQL性能下降。所以在很多linux发行版的源中,MySQL配置文件中默认使用了skip-external-locking来避免external locking。

当使用了skip-external-locking后,为了使用MyISAMChk检查数据库或者修复、优化表,你必须保证在此过程中MySQL服务器没有使用需要操作的表。如果没有停止服务器,也至少需要先运行

mysqladmin flush-tables

命令,否则数据表可能出现异常。
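mysqladmin flush-tables 等价于直接在SQL客户端中执行(示意):

FLUSH TABLES;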

如果是多服务器环境,希望打开external locking特征,则注释掉这一行即可

# skip-external-locking

如果是单服务器环境,则将其禁用即可,使用如下语句

skip-external-locking

在老版本的MySQL中,此参数的写法为:

skip-locking

如果在新版本MySQL配置中依然使用此写法,则可能出现:

[Warning] '--skip-locking' is deprecated and will be removed in a future release. Please use '--skip-external-locking' instead.
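可以用下面的语句确认当前实例是否启用了该设置(skip_external_locking 为 ON 表示已跳过外部锁定,示意):

SHOW VARIABLES LIKE 'skip_external_locking';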

【优化建议】利用mysql skip-name-resolve 提高外部访问速度

mysql skip-name-resolve 提高外部访问速度

设置建议:

对于单台运行的WEB服务器,建议加上: skip-locking skip-name-resolve skip-networking

php链接数据库时使用”LOCALHOST”.这样MySQL 客户端库将覆盖之并尝试连接到本地套接字.( 我们可以从 PHP.INI 中 代码:

; Default socket name for local MySQL connects. If empty, uses the built-in ; MySQL defaults. mysql.default_socket = /tmp/mysql.sock

看出 默认情况下 UNIX 将访问/tmp/mysql.sock) * 以下是部分选项解释:

my.cnf 默认是不存在的.你可以在/usr/local/share /mysql/下看到:

1. my-huge.cnf 2. my-innodb-heavy-4G.cnf 3. my-large.cnf 4. my-medium.cnf 5. my-small.cnf

等文 件.将其中合适你机器配置的文件拷贝到/etc/my.cnf或mysql data目录/my.cnf(/var/db/mysql)下或~/.my.cnf.文件内都有详细的说明

[mysqld] port = 3306 serverid = 1 socket = /tmp/mysql.sock skip-locking # 避免MySQL的外部锁定,减少出错几率增强稳定性。

skip-name-resolve 禁止MySQL对外部 连接进行DNS解析,使用这一选项可以消除MySQL进行DNS解析的时间。但需要注意,如果开启该选项,则所有远程主机连接授权都要使用IP地址方式, 否则MySQL将无法正常处理连接请求!

back_log = 384 指定MySQL可能的连接数量。当 MySQL主线程在很短的时间内接收到非常多的连接请求,该参数生效,主线程花费很短的时间检查连接并且启动一个新线程。 back_log参数 的值指出在MySQL暂时停止响应新请求之前的短时间内多少个请求可以被存在堆栈中。 如果系统在一个短时间内有很多连接,则需要增大该参数的值,该参数值指定到来的TCP/IP连接的侦听队列的大小。不同的操作系统在这个队列大小上有它自 己的限制。 试图设定back_log高于你的操作系统的限制将是无效的。默认值为50。对于Linux系统推荐设置为小于512的整数。

key_buffer_size = 256M # key_buffer_size指定用于索引的缓冲区大小,增加它可得到更好的索引处理性能。 对于内存在 4GB左右的服务器该参数可设置为256M或384M。 注意:该参数值设置的过大反而会是服务器整体效率降低!

max_allowed_packet = 4M thread_stack = 256K table_cache = 128K sort_buffer_size = 6M 查询排序时所能使用的缓冲区大小。注意:该参数对应的分配内存是每连接独占!如果有100个连接,那么实际分配的总共排序缓冲区大 小为100 × 6 = 600MB。所以,对于内存在4GB左右的服务器推荐设置为6-8M。

read_buffer_size = 4M 读查询操作所能使用的缓冲区大小。和sort_buffer_size一样,该参数对应的分配内存也是每连接独享!

join_buffer_size = 8M 联合查询操作所能使用的缓冲区大小,和sort_buffer_size一样,该参数对应的分配内存也是每连接独享!

myisam_sort_buffer_size = 64M table_cache = 512 thread_cache_size = 64 query_cache_size = 64M 指定MySQL查询缓冲区的大小。可以通过在MySQL控制台执行以下命令观察: 代码:

# > SHOW VARIABLES LIKE ‘%query_cache%’; # > SHOW STATUS LIKE ‘Qcache%’;

如果 Qcache_lowmem_prunes的值非常大,则表明经常出现缓冲不够的情况; 如果Qcache_hits的值非常大,则表明查询缓冲使 用非常频繁,如果该值较小反而会影响效率,那么可以考虑不用查询缓冲;Qcache_free_blocks,如果该值非常大,则表明缓冲区中碎片很多。

tmp_table_size = 256M max_connections = 768 指定 MySQL允许的最大连接进程数。如果在访问论坛时经常出现Too Many Connections的错误提 示,则需要增大该参数值。

max_connect_errors = 10000000 wait_timeout = 10 指定一个请求的最大连接时间,对于4GB左右内存的服务器可以设置为 5-10。

thread_concurrency = 8 该参数取值为服务器逻辑CPU数量×2,在本例中,服 务器有2颗物理CPU,而每颗物理CPU又支持H.T超线程,所以实际取值为4 × 2 = 8

skip-networking 开 启该选项可以彻底关闭MySQL的TCP/IP连接方式,如果WEB服务器是以远程连接的方式访问MySQL数据库服务器则不要开启该选项!否则将无法正 常连接!

【问题分析】mysql中的 skip-name-resolve 问题

今天安装了个,重启时发现Error.log有下面提示:

100616 21:05:15 [Warning] 'user' entry 'root@hexuweb101' ignored in --skip-name-resolve mode.
100616 21:05:15 [Warning] 'user' entry '@hexuweb101' ignored in --skip-name-resolve mode

产生的原因是 my.cnf 中我设置了 skip-name-resolve,skip-name-resolve是禁用dns解析,所以在mysql的授权表中就不能使用主机名了,只能使用IP 。

与是我删除了user table 中的host是域名项就可以了。

如果你有一个很慢的DNS和许多主机,你可以通过用--skip-name-resolve禁用DNS查找或增加HOST_CACHE_SIZE定义(默认值:128)并重新编译mysqld来提高性能。

你可以用--skip-host-cache选项启动服务器来禁用主机名缓存。要想清除主机名缓存,执行FLUSH HOSTS语句或执行mysqladmin flush-hosts命令。

如果你想要完全禁止TCP/IP连接,用--skip-networking选项启动mysqld。

新加的一台服务器,连接内网中的一台mysql服务器的时候,经常出现超时。登陆到mysql,查看进程的信息 show processlist; 发现大量的进程的状态为 login 原来默认的时候mysql启动时是不使用 skip-name-resolve选项的,这样的话,从其它主机的连接会比较慢,因为mysql会对这个ip做dns反向查询,导致大量的连接处于 login状态….. 解决这个问题有两个办法一是加入 skip-name-resolve参数重启mysql 二是在 /etc/hosts中加入一句 192.168.0.2 server2 其中 192.168.0.2是新加的服务器的内网ip,server2是新服务器的主机名

在mysql客户端登陆mysql服务器的登录速度太慢的解决方案一篇文章中,我介绍了如何通过在my.ini文件(linux下是my.cnf文 件)中 添加”SKIP-NAME-RESOLVE”的参数设置,使得客户端在登录服务器的时候不通过主机解析这一关,直接登陆的方法,以此来提高登录速度。

这里要介绍一下这种方法的负面作用,以及不合理的时机使用这种方法会引发的不可发现的错误。

首先,回顾一下在my.ini文件中添加”SKIP-NAME-RESOLVE”参数来提高访问速度的原理:

在没有设置该参数的时候,客户端在登陆请求发出后,服务器要解析请求者是谁,经过解析,发现登录者是从另外的电脑登录的,也就是说不是服务器本机, 那么, 服务器会到mysql.user表中去查找是否有这个用户,假设服务器IP是192.168.0.1,而客户机的IP是192.168.0.2;那么查询 的顺序是先找‘root’@'192.168.0.2′这 个user是否存在,若存在,则匹配这个用户登陆,并加载权限列表。若没有该用户,则查找‘root’@'%’这个用户是否存在,若存在,则加载权限列表。否则,登 录失败。
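排查这类授权匹配问题时,可以先看一下授权表中现有的 host 条目(示意):

SELECT user, host FROM mysql.user ORDER BY user, host;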

在设置了SKIP-NAME-RESOLVE参数后,客户端的登录请求的解析式同上面一样的,但是在服务器本机的解析过程却发生了改变:服务器会把在本机 登录的用户自动解析为‘root’@'127.0.0.1′; 而不是‘root’@'localhost’;这样 一来就坏了,因为我们在服务器上登录是为了进行一些维护操作,但是显然,‘root’@'127.0.0.1′这个用户是被默认为‘root’@'%’这个用户的,这个用户还没有足够得权限去执行一些超 级管理员‘root’@'localhost’才能 执行的大作。因为未分配权限。

所以结论是:加入你在服务器本机上登录mysql服务器的话,要么先取消SKIP-NAME-RESOLVE的参数设置,重新启动服务器再登陆,设置完成 后,再设置上该参数;要么就给‘root’@'127.0.0.1′分 配超级管理员权限,但这么做显然是不明智的,因为任何人在任何机器上都可以用这个用户执行管理员操作,前提是知道了密码。

我有一次在mysql服务器上执行数据库创建脚本,并同时创建表、触发器、存储过程等。结果,总是失败,经过了一上午的折腾,最后发现时这个参数造成我以‘root’@'127.0.0.1′这个用户登陆了服 务器,这个用户没有创建触发器的权限。后来,取消了SKIP-NAME-RESOLVE参数后,执行成功,再把该参数设置回去。重启。OK。

所以,在设置这个参数的时候一定要注意时机:先用超级管理员将所有的用户创建好,再将权限分配好之后,才设置这个参数生效。

【参数说明】 [mysqld]

max_allowed_packet=16M

增加该变量的值十分安全,这是因为仅当需要时才会分配额外内存。例如,仅当你发出长查询或mysqld必须返回大的结果行时mysqld才会分配更多内存。该变量之所以取较小默认值是一种预防措施,以捕获客户端和服务器之间的错误信息包,并确保不会因偶然使用大的信息包而导致内存溢出。

如果你正是用大的BLOB值,而且未为mysqld授予为处理查询而访问足够内存的权限,也会遇到与大信息包有关的奇怪问题。如果怀疑出现了该情况,请尝试在mysqld_safe脚本开始增加ulimit -d 256000,并重启mysqld。

##########################################

##### MySQL怎样打开和关闭数据库表 #####

##########################################

table_cache, max_connections和max_tmp_tables影响服务器保持打开的文件的最大数量。如果你增加这些值的一个或两个,你可以遇到你的操作系统每个进程打开文件描述符的数量上强加的限制。然而,你可以能在许多系统上增加该限制。请教你的OS文档找出如何做这些,因为改变限制的方法各系统有很大的不同。

table_cache与max_connections有关。例如,对于200个打开的连接,你应该让一张表的缓冲至少有200 * n,这里n是一个联结(join)中表的最大数量。

打开表的缓存可以增加到一个table_cache的最大值(缺省为64;这可以用mysqld的-O table_cache=#选项来改变)。一个表绝对不被关闭,除非当缓存满了并且另外一个线程试图打开一个表时或如果你使用mysqladmin refresh或mysqladmin flush-tables。

当表缓存满时,服务器使用下列过程找到一个缓存入口来使用:

不是当前使用的表被释放,以最近最少使用(LRU)顺序。

如果缓存满了并且没有表可以释放,但是一个新表需要打开,缓存必须临时被扩大。

如果缓存处于一个临时扩大状态并且一个表从在用变为不在用状态,它被关闭并从缓存中释放。

对每个并发存取打开一个表。这意味着,如果你让2个线程存取同一个表或在同一个查询中存取表两次(用AS),表需要被打开两次。任何表的第一次打开占2个文件描述符;表的每一次额外使用仅占一个文件描述符。对于第一次打开的额外描述符用于索引文件;这个描述符在所有线程之间共享

skip-networking

不在tcp/ip端口上进行监听,所有的连接都是通过本地的socket文件连接,这样可以提高安全性,确定是不能通过网络连接数据库。

skip-locking

避免mysql的外部锁定,增强稳定性

skip-name-resolve

避免mysql对外部的连接进行DNS解析,若使用此设置,那么远程主机连接时只能使用ip,而不能使用域名

max_connections = 3000

指定mysql服务所允许的最大连接进程数,

max_connect_errors = 1000

每个主机连接允许异常中断的次数,当超过该次数mysql服务器将禁止该主机的连接请求,直到mysql服务重启,或者flushhosts命令清空host的相关信息

table_cache = 614k

表的高速缓冲区的大小,当mysql访问一个表时,如果mysql表缓冲区还有空间,那么这个表就会被打开通放入高速缓冲区,好处是可以更快速的访问表中的内容。

如果open_tables和opened_tables的值接近该值,那么久该增加该值的大小

max_allowed_packet = 4M

设定在网络传输中一次可以传输消息的最大值,系统默认为1M,最大可以是1G

sort_buffer_size = 16M

排序缓冲区用来处理类似orderby以及groupby队列所引起的排序,系统默认大小为2M,该参数对应分配内存是每个连接独占的,若有100个连接,实际分配的排序缓冲区大小为6*100;推荐设置为6M-8M

join_buffer_size 8M

联合查询操作所使用的缓冲区大小。

thread_cache_size = 64

设置threadcache池中可以缓存连接线程的最大数量,默认为0,该值表示可以重新利用保存在缓存中线程的数量,当断开连接时若缓存中还有空间,那么客户端的线程将被放到缓存中,如果线程重新被请求,那么请求将从缓存中读取,若果缓存中是空的或者是新的请求,那么线程将被重新创建。设置规律为:1G内存设置为8,2G内存设置为16,4G以上设置为64

query_cache_size = 64M

指定mysql查询缓冲区的大小,用来缓冲select的结果,并在下一次同样查询的时候不再执行查询而直接返回结果,根据Qcache_lowmem_prunes的大小,来查看当前的负载是否足够高

query_cache_limit = 4M

只有小于该值的结果才被缓冲,放置一个极大的结果将其他所有的查询结果都覆盖

tmp_table_size 256M

内存临时表的大小,如果超过该值,会将临时表写入磁盘

default_storage_engine = MYISAM

创建表时默认使用的存储引擎

log-bin=mysql-bin

打开二进制日志功能

key_buffer_size = 384M

指定索引缓冲区的大小,内存为4G时刻设置为256M或384M

read_buffer_size = 8M

用来做MYISAM表全表扫描的缓冲大小

。。。。。。。。。

MSSQL之八事务


事务(TRANSACTION)是作为单个逻辑工作单元执行的一系列操作,这些操作作为一个整体一起向系统提交,要么都执行、要么都不执行 ,事务是一个不可分割的工作逻辑单元 .
事务必须具备以下四个属性,简称ACID 属性:
原子性(Atomicity):事务是一个完整的操作。事务的各步操作是不可分的(原子的);要么都执行,要么都不执行
一致性(Consistency):当事务完成时,数据必须处于一致状态
隔离性(Isolation):对数据进行修改的所有并发事务是彼此隔离的,这表明事务必须是独立的,它不应以任何方式依赖于或影响其他事务
永久性(Durability):事务完成后,它对数据库的修改被永久保持,事务日志能够保持事务的永久性
/*--举例:为什么需要事务--*/
--同一银行,如都是农行的帐号,可以直接转账
/*---------------建表-----------------*/
--创建农行帐户表bank
IF EXISTS(SELECT * FROM sysobjects WHERE name='bank')
DROP TABLE bank
GO
CREATE TABLE bank
(
customerName CHAR(10), --顾客姓名
cardID CHAR(10) NOT NULL , --卡号
currentMoney MONEY --当前余额
)
GO
/*---添加约束:根据银行规定,帐户余额不能少于1元,除非销户----*/
ALTER TABLE bank
ADD CONSTRAINT CK_currentMoney CHECK(currentMoney>=1)
GO
/*--插入测试数据:张三开户,开户金额为1000 ;李四开户,开户金额1 ---*/
INSERT INTO bank(customerName,currentMoney,cardId) VALUES('张三',1000,'1001 0001')
INSERT INTO bank(customerName,currentMoney,cardId) VALUES('李四',1,'1002 0002')
GO
--查看结果
delete from bank
SELECT * FROM bank
GO
/*--转帐测试:张三希望通过转账,直接汇钱给李四1000元--*/
--我们可能会这样这样写代码
--张三的帐户少1000元,李四的帐户多1000元
/***************开始
UPDATE bank SET currentMoney=currentMoney-1000
WHERE customerName='张三'
UPDATE bank SET currentMoney=currentMoney+1000
WHERE customerName='李四'
*********结束*/
GO
--再次查看结果,结果发现了什么严重的错误?如何解决呢?
SELECT * FROM bank
GO
--恢复原来的数据
--UPDATE bank SET currentMoney=currentMoney-1000 WHERE customerName='李四'
SET NOCOUNT ON --不显示受影响的行数信息
print '查看转帐事务前前前前前前的余额'
SELECT * FROM bank
GO
/*--开始事务(指定事务从此处开始,后续的T-SQL语句都是一个整体--*/
BEGIN TRANSACTION
/*--定义变量,用于累计事务执行过程中的错误--*/
DECLARE @errorSum INT
SET @errorSum=0 --初始化为0,即无错误
/*--转帐:张三的帐户少200元,李四的帐户多200元*/
UPDATE bank SET currentMoney=currentMoney-200 WHERE customerName='张三'
SET @errorSum=@errorSum+@@error --累计是否有错误
UPDATE bank SET currentMoney=currentMoney+200 WHERE customerName='李四'
SET @errorSum=@errorSum+@@error --累计是否有错误
print '查看转帐事务过程中中中中中中的余额'
SELECT * FROM bank
/*--根据是否有错误,确定事务是提交还是撤销---*/
IF @errorSum<>0 --如果有错误
BEGIN
print '交易失败,回滚事务'
ROLLBACK TRANSACTION
END
ELSE
BEGIN
print '交易成功,提交事务,写入硬盘,永久的保存'
COMMIT TRANSACTION
END
GO
print '查看转帐事务后后后后后后后的余额'
SELECT * FROM bank
GO
--*******************************案例一
--@@rowcount 返回受上一语句影响的行数。
--select @@rowcount
--select @@error
create table tab1
(
stu_id int primary key,
stu_name varchar(5),
stu_age int,
stu_height int
)
create table tab2
(
stu_id int primary key,
stu_name varchar(5),
stu_age int,
stu_height int
)
-----------------------------开始事务------------------------------------------------------
begin transaction
insert into tab1 (stu_id,stu_name,stu_age,stu_height) values(1,'小明',22,180)
if @@error <> 0 --or @@rowcount <> 1
goto seed
insert into tab1 (stu_id,stu_name,stu_age,stu_height) values(1,'小黄',23,150)
if @@error <> 0 --or @@rowcount <> 1
goto seed
insert into tab1 (stu_id,stu_name,stu_age,stu_height) values(1,'小李',24,190)
if @@error <> 0 --or @@rowcount <> 1
goto seed
insert into tab1 (stu_id,stu_name,stu_age,stu_height) values(4,'小张',25,176)
seed:
--print @@error
if @@error <> 0 or @@rowcount <> 1
begin
--select @@error
--select @@rowcount as ssss
rollback transaction
print '发生错误提交无法完成!!!!'
end
else
begin
commit transaction
print '无错误发生提交正常!!!'
end
-------------------------------结束事务--------------------------------------------------
delete from tab1
select * from tab1
truncate table tab1
drop table tab1
drop table tab2
---------------------------------------------------------------------------------------------
Select a,b,c into tab1 from tab2 where a=2
If @@rowcount=0 Print 'no rows were copied'
SELECT CONVERT(char(5), 3.147) AS 'CHAR(1)',
CONVERT(char(5), 3.147) AS 'CHAR(3)',
CONVERT(char(120), 3.147) AS 'CHAR(5)'
GO
--**************************************************************案例2
use master
go
create table 物品管理数据表
(
部门 varchar(10),
物品 varchar(10),
数量 int,
CONSTRAINT CK_物品管理数据表 CHECK (数量 > 0)
)
insert 物品管理数据表 (部门,物品,数量) values('财务部','办公桌',2)
insert 物品管理数据表 (部门,物品,数量) values('业务部','办公桌',10)
insert 物品管理数据表 (部门,物品,数量) values('管理部','办公桌',5)
insert 物品管理数据表 (部门,物品,数量) values('业务部','会议桌',5)
insert 物品管理数据表 (部门,物品,数量) values('研发部','会议桌',7)
insert 物品管理数据表 (部门,物品,数量) values('生产部','会议桌',8)
go
select * from 物品管理数据表
truncate table 物品管理数据表
drop table 物品管理数据表
--显式事务
-------------------------事务开始----------------------------------------
Begin Transaction --开始事务
update 物品管理数据表
set 数量 = 数量 + 1
where 部门='业务部' and 物品='办公桌'
if @@error>0 --or @@rowcount<>1
begin
goto error1
end
update 物品管理数据表
set 数量 = 数量 - 1
where 部门='财务部' and 物品='办公桌'
error1:
if @@error>0 --or @@rowcount<>1
begin
print '毛病!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
rollback transaction --取消并回滚事务
end
else
--print '毛病!!!'
commit tran --提交事务
select @@rowcount
select @@error
-------------------------事物结束----------------------------------------
-------------------------查询结果----------------------------------------
select * from 物品管理数据表
--**********************************隐形事务
set implicit_transactions on --
--set implicit_transactions off --关闭隐含事务模式
--隐性事务一般只使用在测试或查错上,由于会占用大量资源,
--因此并不建议在数据库实际运作时使用。
--**********************************
create table 物品管理
(
物品id int not null primary key,
物品名称 char(10),
物品数量 int,
部门 char(10)
)
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(1,'桌子',12,'行政部')
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(2,'板凳',23,'学术部')
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(3,'书架',33,'市场部')
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(4,'电脑',22,'人事部')
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(5,'杯子',6,'财务部')
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(6,'鼠标',45,'组织部')
select * from 物品管理
---------------------------------------事务开始---------------------------------------------
begin transaction object
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(7,'C语言',2,'开发部')
save transaction jet
insert into 物品管理 (物品id,物品名称,物品数量,部门) values(8,'Java',9,'开发部')
if @@error <> 0
begin
rollback tran jet
print '输入记录出现问题,请重新检查!!!'
end
commit tran object
---------------------------------------事务结束---------------------------------------------
truncate table 物品管理
drop table 物品管理
select * from 物品管理
---*************************************************************事务保存点2
begin tran affair
....... -- 操作语句
save tran temptran
...... -- 操作语句
if (@@error <> 0)
rollback tran temptran -- 回滚到事务保存点
else
commit tran affair
----------------------示 例----------------------------------
create table stu_info
(
stu_id int primary key not null,
stu_name varchar(5),
stu_age int,
stu_height int
)
insert into stu_info (stu_id,stu_name,stu_age,stu_height) values(1,'小明',22,180)
insert into stu_info (stu_id,stu_name,stu_age,stu_height) values(2,'小黄',23,150)
insert into stu_info (stu_id,stu_name,stu_age,stu_height) values(3,'小张',25,176)
insert into stu_info (stu_id,stu_name,stu_age,stu_height) values(4,'小王',26,164)
insert into stu_info (stu_id,stu_name,stu_age,stu_height) values(5,'小兵',24,170)
------------------------------开始事务-------------------------------------
begin transaction stu
delete from stu_info
where stu_id = 1
save transaction protec
update stu_info
set stu_name = '红旗'
where stu_id in (3,5)
if @@error > 0 or @@rowcount <> 1
rollback tran protec
else
commit transaction stu
----------------结束事务--------------------------------------------------
select * from stu_info
truncate table stu_info
--**************************************************************锁
select suser_sid('Arwen')
select suser_sname(0x2EBCE6E90123D24AA542D8F538F278AD)
select user_name(3)
select user_id('guest')
-----------------------------------------------------------------------------------------------------------------------------
use northwind
SELECT *
FROM Employees WITH (nolock) --这个语句就提供出了所有的数据,包括正在被其它处理器使用的数据,所以,得出的数据可能是脏数据,但是对于任务而言并没有很大的影响。
UPDATE
Employees WITH (tablock)
SET Title='Test' -- 这个例子就是更新表中所有的行,所以使用了一个表锁。
/*
FASTFIRSTROW —选取结果集中的第一行,并将其优化
HOLDLOCK —持有一个共享锁直至事务完成
NOLOCK —不允许使用共享锁或独享锁。这可能会造成数据重写或者没有被确认就返回的情况;因此,就有可能使用到脏数据。这个提示只能在SELECT中使用。
PAGLOCK —使用页级锁(而不是行锁或表锁)
READCOMMITTED —只读取被事务确认的数据。这就是SQL Server的默认行为。
READPAST —跳过被其它进程锁住的行,所以返回的数据可能会忽略行的内容。这也只能在SELECT中使用。
READUNCOMMITTED —等价于NOLOCK.
REPEATABLEREAD —在查询语句中,对所有数据使用锁。这可以防止其它的用户更新数据,但是新的行可能被其它的用户插入到数据中,并且被最新访问该数据的用户读取。
ROWLOCK —按照行的级别来对数据上锁。SQL Server通常锁到页或者表级别来修改行,所以当开发者使用单行的时候,通常要重设这个设置。
SERIALIZABLE —等价于HOLDLOCK.
TABLOCK —按照表级别上锁。在运行多个有关表级别数据操作的时候,你可能需要使用到这个提示。
UPDLOCK —当读取一个表的时候,使用更新锁来代替共享锁,并且保持一直拥有这个锁直至事务结束。它的好处是,可以允许你在阅读数据的时候可以不需要锁,并且以最快的速度更新数据。
XLOCK —给所有的资源都上独享锁,直至事务结束。
*/
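--(补充示例,非原文:演示用UPDLOCK提示在读取余额时就加更新锁,避免两个会话先后读到同一旧值再更新)
BEGIN TRANSACTION
SELECT currentMoney FROM bank WITH (UPDLOCK) WHERE customerName='张三'
UPDATE bank SET currentMoney = currentMoney - 100 WHERE customerName='张三'
COMMIT TRANSACTION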
--*****************************************************************脏读
create database bank
go
use bank
go
create table student
(
stud_id int,
stud_name char(10),
grade int
)
drop table student
insert into student(stud_id,stud_name,grade)
values(1,'小贵子',79)
insert into student(stud_id,stud_name,grade)
values(2,'小春子',98)
insert into student(stud_id,stud_name,grade)
values(3,'小溜子',68)
insert into student(stud_id,stud_name,grade)
values(4,'小毛子',86)
delete from student
select * from student
===================================================
--set implicit_transactions on
--set implicit_transactions off
------------------------------------------------------------------------
--========================事务缺陷======================================
--脏读
begin transaction
update student
set grade=100
where stud_id=1
if @@error >0
rollback transaction
commit transaction
--=======================不可重复读==============================
--事务查询
set transaction isolation level
repeatable read
---============================================
set transaction isolation level
repeatable read
begin transaction
select * from student
--where stud_id=1
commit transaction
--=============================================
--事务一
set transaction isolation level
repeatable read
begin transaction
update student
set grade=220
where stud_id=1
commit transaction
--事务二
set transaction isolation level
repeatable read
begin transaction
select * from student
where stud_id=1
commit transaction
--************************隔离级别
--隔离级别
--级别一 read uncommitted
--级别二 read committed
--级别三 repeatable read
--级别四 serializable
set transaction isolation level
repeatable read
begin transaction
select * from student
where stud_id=1
commit transaction

Oracle查询技巧与优化(三)字符串操作

前言

前两篇blog分别介绍了Oracle中的单表查询(Oracle查询技巧与优化(一) 单表查询与排序)与多表查询(Oracle查询技巧与优化(二) 多表查询)的相关技巧与优化,那么接下来本篇blog就具体研究一下Oracle查询中使用频率很高的一种数据类型——“字符串”的相关操作与注意点。

常用操作符与函数

如题,先简单回顾一下我个人认为在Oracle的查询或存储过程中使用频率较高的操作符与函数,作为基础知识回顾,首先是最常用的字符串连接符“||”,即Oracle中的字符串拼接符号,例如:

select 'wlwlwlwl' || '015' || 'csdn' as title from dual;
运行结果为拼接后的字符串 wlwlwlwl015csdn。接下来看一下比较常用的字符串函数,首先是instr与substr,这两个函数经常结合使用,下面分别看一下。先来看看相对简单一些的substr函数,它的语法格式如下:SUBSTR(cExpression, nStartPosition [, nCharactersReturned]),通俗地讲,第1个参数是源字符串,第2个参数是开始截取的位置,第3个参数是截取长度,例如:
select substr('abcdefg',0,1) as newstr from dual; // 返回a
select substr('abcdefg',1,1) as newstr from dual; // 返回a,0和1都代表第1个字符
select substr('abcdefg',2,4) as newstr from dual; // 返回bcde,注意包含位置字符
select substr('abcdefg',-3,2) as newstr from dual; // 返回ef,负数表示从后往前数位置,其余不变

substr函数实在不需要做过多说明,接下来看看instr函数,它的作用是“在一个字符串中查找指定的字符,返回被查找到的指定的字符的位置”,有点类似于java中String的indexOf方法:

select instr('abcd','a') from dual; // 返回1
select instr('abcd','b') from dual; // 返回2
select instr('abcd','c') from dual; // 返回3
select instr('abcd','e') from dual; // 返回0
如上所示,如果找不到指定的子串,则返回0,以上是instr函数最简单的用法,接下来具体看一下instr函数完整的语法格式:instr(string1,string2[,start_position[,nth_appearence]]),可以看到它最多支持4个参数,下面分别解释一下每个参数的意思:
string1,即源字符串。 string2,目标字符串,即要在string1中查找的字符串。 start_position,从string1开始查找的位置。可选,默认为1,正数时,从左到右检索,负数时,从右到左检索。 nth_appearence,查找第几次出现string2。可选,默认为1,不能为负。
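例如,利用第3、4个参数可以直接定位“第n次出现”的位置(示意):

select instr('a,b,c,d', ',', 1, 2) as pos from dual; -- 返回4,即第2个逗号所在的位置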
如上所示,之所以instr函数很强大往往是依赖第4个参数的作用,在某些场合往往能起到关键作用,下面看一个我项目中的例子,首先学生表有一个字段记录了每个学生的体育考试选考科目,每人选5门待考科目,每个科目都有一个代码,在学生表存储的数据格式是“每门科目代码加逗号拼接的字符串”,形如:
如上图,每一个选考项在字典表均可对应:
现在假设我们有这样一个需求,把每个学生的每一门体育选考项目的代码和名称分别列出来,达到以下的效果:

如上图所示,该如何实现呢?首先思路很明确,把学生表记录的选考科目代码字符串进行拆分,然后再一一列举,那么此时需要注意的一个问题是代码的长度不确定,科目代码可能是1位数字,例如:1,2,3,8,9,但也可能存在两位数字,例如:1,2,3,11,12,那么直接用substr按位截取肯定是行不通的,此时我们应该想办法如何按符号截取,每个考试科目代码中间都是用逗号隔开的,如果能找到相邻两个逗号之间的数字,那么问题就迎刃而解了,这里需要用到的就是前面说的instr和substr相结合:

select t1.sid_,
t1.stuname_,
t1.km1,
t2.itemvalue_ km1name,
t1.km2,
t3.itemvalue_ km2name,
t1.km3,
t4.itemvalue_ km3name,
t1.km4,
t5.itemvalue_ km4name,
t1.km5,
t6.itemvalue_ km5name
from (select sid_,
stuname_,
substr(tysxkm_, 0, instr(tysxkm_, ',', 1, 1) - 1) as km1,
substr(tysxkm_,
instr(tysxkm_, ',', 1, 1) + 1,
instr(tysxkm_, ',', 1, 2) - instr(tysxkm_, ',', 1, 1) - 1) as km2,
substr(tysxkm_,
instr(tysxkm_, ',', 1, 2) + 1,
instr(tysxkm_, ',', 1, 3) - instr(tysxkm_, ',', 1, 2) - 1) as km3,
substr(tysxkm_,
instr(tysxkm_, ',', 1, 3) + 1,
instr(tysxkm_, ',', 1, 4) - instr(tysxkm_, ',', 1, 3) - 1) as km4,
substr(tysxkm_, instr(tysxkm_, ',', 1, 4) + 1) as km5
from t_studentinfo) t1
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t2
on t1.km1 = t2.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t3
on t1.km2 = t3.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t4
on t1.km3 = t4.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t5
on t1.km4 = t5.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t6
on t1.km5 = t6.itemkey_;

如上所示,第15行找到第1个逗号的位置,同时通过减1来算出截取长度,17-18行则是找到第2个和第3个逗号的位置,并截取出其中的考试科目代码,依此类推,而25行则是找到最后一个逗号的位置并直接截取后半段得到最后一项考试科目代码,这样就完成了每项考试科目代码的分割,核心思想是通过符号来分割,在blog后面将介绍一种更为简便的方式(正则函数REGEXP_SUBSTR),此处暂且用substr和instr组合的方式来实现,旨在回顾基础性的重点内容,接下来再看一些Oracle字符串相关的基础性常用的函数,比如求字符串长度:

select length('abcdefg') as length_ from dual; // 返回7

去空格函数trim:

select trim(' abcdefg ') from dual; // 去左右空格
select ltrim(' abcdefg') from dual; // 去左空格
select rtrim('abcdefg ') from dual; // 去右空格

还有字母大小写转换函数:

select upper('AbCdEfG') from dual; // 小写转大写 返回ABCDEFG
select lower('AbCdEfG') from dual; // 大写转小写 返回abcdefg

Oracle中基础性的常用函数先记录这么多,接下来具体研究一下Oracle中较为复杂的字符串应用场景以及相关函数。

字符串文字中包含引号
如题,单引号的转义问题,解决方法很简单,只需要把一个单引号换成两个单引号表示即可:
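例如(示意):

select 'It''s Oracle' as str from dual; -- 两个连续的单引号表示一个单引号,返回 It's Oracle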

除此之外,在Oracle10g开始引入了q-quote特性,允许按照指定规则,也就是Q或q开头,字符串前后使用界定符”’”,规则很简单:

q-quote界定符可以是除了TAB、空格、回车外的任何单字符或多字节字符。界定符可以是[ ]、{ }、<>、( ),而且必须成对出现。
举例来看一下:
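例如(示意,分别用[ ]和{ }作为界定符):

select q'[It's Oracle]' as s1, q'{It's Oracle}' as s2 from dual; -- 两列均返回 It's Oracle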

如上图所示,格式也很简单,不必再做过多解释。

计算字符在字符串中出现的次数

如题,例如有如下字符串:GREEN,RED,YELLOW,BLUE,WHITE,现在需要用SQL查询出其中包含的单词个数,通常我们只需要算出字符串中的逗号个数再加1就行了,那么如何计算字符串中某个字符的个数呢?这里就要用到Oracle的正则表达式函数了:

select length(regexp_replace('GREEN,RED,YELLOW,BLUE,WHITE', --source_char,源字符串
'[^,]', --pattern,正则表达式(替换满足正则的值)
'', --replace_string,替换后的目标字符串
1, --position,起始位置(默认为1)
0, --occurrence,替换的次数(0是无限次)
'i')) --match_parameter,匹配行为('i'是无视大小写)
+ 1 as count_
from dual;
运行结果为 5(4个逗号加1)。
如上所示,注释中简要说明了REGEXP_REPLACE函数每个参数的含义,通常只需要前三个参数即可。在Oracle的Database Online Documentation中可以看到,该函数的完整格式为:REGEXP_REPLACE(source_char, pattern [, replace_string [, position [, occurrence [, match_parameter]]]])。
再简单解释一下上面的SQL语句,可以发现正则表达式是[^,],中括号中的^符号表示“不包含以及否定的意思”(而中括号外的^则表示字符串的开始,更多的Oracle正则表达式可参考Oracle正则表达式使用介绍),所以regexp_replace('GREEN,RED,YELLOW,BLUE,WHITE', '[^,]','')就表示将“除了逗号以外的所有字符替换成空”,然后通过length函数就算出了逗号的长度(个数),最后再加1就得到单词的个数了。从Oracle 11g开始又引入了一个新的正则函数REGEXP_COUNT使得这个问题的解决方案更加简单了:
select regexp_count('GREEN,RED,YELLOW,BLUE,WHITE', ',') + 1 as count_
from dual;
运行结果同样为 5。
如上图所示,REGEXP_COUNT函数可以直接计算出逗号的个数,所以就无需再通过REGEXP_REPLACE去迂回计算了,但注意这个函数是Oracle 11g之后引入的,既然提到了REGEXP_COUNT,下面就具体看一下它的语法格式:
REGEXP_COUNT(source_char, pattern [, position [, match_parameter]])

如上图,可以看到REGEXP_COUNT相较于REGEXP_REPLACE简单一些,必选参数依然两个,用法也很简单,关于REGEXP_COUNT暂且先介绍这么多。

从字符串中删除不需要的字符
如题,这次以经典的scott.emp表为例:

比如我有这样一个需求,我想把ENAME这一列中的名字中包含的元音字母(AEIOU)去掉,那么该如何做呢?上一小节提到过REGEXP_REPLACE这个函数,显而易见,就用它就可以轻松的完成替换:

select regexp_replace(ename, '[AEIOU]') as ENAME from scott.emp;
运行结果是去掉了元音字母之后的ENAME列表(例如SMITH变成SMTH、ALLEN变成LLN)。

如上图所示,通过REGEXP_REPLACE函数很容易将AEIOU这5个字母转换为空字符串,从而避免了直接用replace函数要进行多层嵌套的问题。

字符和数字相分离

如题,回到我们的学生表,首先创建一个测试用的视图:

create or replace view v_test_11 as
select bmh_ || stuname_ as data from t_studentinfo;
select * from v_test_11;
运行结果如下:

如上所示,我们通过拼接学生的考号和姓名组成了新的一列data,现在的需求是再将这一列按考号和姓名拆分开来,如何实现呢?很明显拆分的是数字和汉字,那么自然用正则最为合适了:

select regexp_replace(data, '[0-9]') as name1,
regexp_replace(data, '[^0-9]') as zkzh
from v_test_11;
运行结果如下:
如上图,可以看到成功拆分了考号和姓名,简单解释一下上面的正则,[0-9]代表数字[0123456789],还可以表示为[[:digit:]],那么将数字全部替换后自然就剩下了所有的汉字组成的NAME1字段,同理,[^0-9]表示[0-9]的外集,即“除了0123456789”之外的所有字符,那么将除了数字之外的所有字符替换为空,剩下的自然就是纯数字组成的ZKZH字段了,还需要注意一点就是^符号,它在方括号表达式内的意思是“否定、非、相反”的意思,如果它在方括号表达式以外则表示“字符串开始”的意思,这个会在后面的例子中再做说明。
按字符串中的数值排序

如题,依旧先创建一个测试视图:

create or replace view v_test_05 as
select stuname_ || ' ' ||schoolname_ || ' ' || lqfs_ as data from t_lq order by stuname_;
select * from v_test_05;
运行结果如下:

如上图,我们再一次进行了拼接构造数据,这次拼接了三列,分别是学生姓名,所在学校以及考试总分,我们这次的需求是按照分数倒序排列(目前是按照姓名拼音字母顺序排序的),该如何处理呢?通过这几个例子应该找出规律了,这种提取字符串中某一类型的数据再做相关操作的直接用正则表达式肯定是最方便的!所以我们依然通过正则提取列中的分数再来进行排序即可:

select data from v_test_05 order by to_number(regexp_replace(data, '[^0-9]')) desc
运行结果如下所示:
如上图,依旧通过正则[^0-9]将字符串中的非数字全部替换为空,然后得到纯分数字符串后再通过to_number函数转换为数字即可排序,与上面的几个例子思路基本一致,都应用了正则函数REGEXP_REPLACE,关于REGEXP_REPLACE的函数至此就介绍的差不多了,最后再补充一个例子是今年项目中遇到的一个问题,需求是“将查询出的字符串每个字符中间加空格后返回”,比如:数据库中的字段值原本是”abc”,那么查询出来应当是”a b c”,如果是”王小明”,那么查询出来应当是”王 小 明”,解决这个问题最简单的方式依然是通过正则:
select regexp_replace('abc','(.)','\1 ') as data from dual
运行结果为 a b c(每个字符后面都被补了一个空格)。

如上所示,正则(.)表示匹配除换行符\n之外的任意单个字符,而\1则是匹配到的字符本身再加一个空格,所以就得到了我们预期的结果。

提取第n个分隔的子串
如题,这里就会用到我们blog开头用instr和substr写的那个例子的简便写法,上面也说了会用到将要介绍的第三个正则函数REGEXP_SUBSTR,所以先看看这个正则函数的语法格式以及参数说明:
REGEXP_SUBSTR(source_char, pattern [, position [, occurrence [, match_parameter]]])

如上图,看一下每个参数的含义:

source_string:源字符串 pattern:正则表达式 position:起始位置 occurrence:第n个能匹配正则的字符串

可以看到和前面的正则函数都差不多,接下来将blog开头的例子改写成REGEXP_SUBSTR的形式:

select t1.sid_,
t1.stuname_,
t1.km1,
t2.itemvalue_ km1name,
t1.km2,
t3.itemvalue_ km2name,
t1.km3,
t4.itemvalue_ km3name,
t1.km4,
t5.itemvalue_ km4name,
t1.km5,
t6.itemvalue_ km5name
from (select sid_,
stuname_,
regexp_substr(tysxkm_, '[^,]+', 1, 1) as km1,
regexp_substr(tysxkm_, '[^,]+', 1, 2) as km2,
regexp_substr(tysxkm_, '[^,]+', 1, 3) as km3,
regexp_substr(tysxkm_, '[^,]+', 1, 4) as km4,
regexp_substr(tysxkm_, '[^,]+', 1, 5) as km5
from t_studentinfo) t1
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t2
on t1.km1 = t2.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t3
on t1.km2 = t3.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t4
on t1.km3 = t4.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t5
on t1.km4 = t5.itemkey_
left join (select itemkey_, itemvalue_
from t_dict
where itemname_ = 'SportsType') t6
on t1.km5 = t6.itemkey_;
运行结果如下:
(运行结果截图从略)
如上图,可以看到通过REGEXP_SUBSTR来做“按符号截取字符串”要比instr和substr组合好用得多。正则中加号表示匹配前面的表达式1次或多次,所以[^,]+表示连续的、不包含逗号的一个或多个字符,在此也就表示被逗号分隔后的各个子串。REGEXP_SUBSTR的第3个参数表示“从第1个字符开始”,同其它正则函数的position是一样的,而重点是第4个参数,表示第n个能匹配到该正则的字符串,分隔之后按自然顺序就可以取到各个子串了。
分解IP地址

如题,例如需要将一个ip地址中的各段取出来,显而易见和上面的需求完全一样,只不过一个是逗号分隔,而ip地址是点分隔:

select regexp_substr(v.ipaddr, '[^.]+', 1, 1) as firstpart,
regexp_substr(v.ipaddr, '[^.]+', 1, 2) as secondpart,
regexp_substr(v.ipaddr, '[^.]+', 1, 3) as thirdpart,
regexp_substr(v.ipaddr, '[^.]+', 1, 4) as fourthpart
from (select '192.168.0.100' as ipaddr from dual) v
运行结果如下:
(运行结果截图从略)

和上一个例子用法一模一样,在此就不做过多赘述了。

查询只包含字母或数字型的数据
如题,说到查询中的正则,那么肯定是要用到REGEXP_LIKE这个函数了,依旧是先看一下REGEXP_LIKE的语法格式和参数说明:
(REGEXP_LIKE语法格式及参数说明截图,从略)

如上图所示,REGEXP_LIKE只有3个参数,用法类似于普通的LIKE模糊查询,下面依次看一下这每个参数的含义:

source_string:源字符串
pattern:正则表达式
match_parameter:匹配参数,例如'i',作用是忽略大小写

REGEXP_LIKE返回匹配正则的字符,依旧通过例子来看,首先创建一个测试view:

create or replace view v_test_06 as
select '123' as data from dual union all
select 'abc' from dual union all
select '123abc' from dual union all
select 'abc123' from dual union all
select 'a1b2c3' from dual union all
select 'a1b2c3#' from dual union all
select '3$' from dual union all
select 'a 2' from dual union all
select '0123456789' from dual union all
select 'b3#45' from dual;
查询一下这个view:
(查询结果截图从略)

如上所示,可以看到准备了一组测试数据,那么接下来的需求是“查询只包含字母或数字的值“,就是说只能包含字母和数字,不能有其它字符,很明显依然要通过正则来判断,下面先看一下完整的写法:

select data from v_test_06 where regexp_like(data,'^[0-9A-Za-z]+$');
运行结果如下:
(运行结果截图从略)
如上图,可以看到已经成功过滤出只包含字符和数字的值,那么接下来具体解释一下这个正则^[0-9A-Za-z]+$,首先REGEXP_LIKE对应普通的LIKE,REGEXP_LIKE(data,’[ABC]’)就相当于LIKE ‘%A%’ or LIKE ‘%B%’ or LIKE ‘%C%’,而REGEXP_LIKE(data,’[0-9A-Za-z]+’)就相当于LIKE ‘%数字%’ or LIKE ‘%小写字母%’ or LIKE ‘%大写字母%’。需要格外注意的一点就是^符号,它在中括号外面表示字符串开始(在前面的例子中均在字符串内,表示否定及相反),^[0-9A-Za-z]意思就是匹配以任意数字或者任意大小写字母开头的字符串,而正则中的$符号表示字符串结束,所以[0-9A-Za-z]$就表示匹配任意数字或大小写字母结尾的字符串,完全类似于模糊查询的LIKE ‘A%’ 和LIKE ‘%A’,但是^$在一起的时候的就是精确查找了,比如:
(示例截图从略)
那么上面这个正则^[0-9A-Za-z]+$为何能匹配到所有“只包含字母或数字的值”呢?其实加号(+)在这里也起到了关键作用,加号在正则中的意思是“匹配前面的表达式一次或多次”,所以在这里就要求从开头到结尾的每一个字符都满足[0-9A-Za-z]这个规则,这样就查出了只包含字母或数字的值。
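为了更直观地体会^和$的作用,下面基于上文的v_test_06视图补充一组对比示意(非原文示例):

-- 不加锚点:只要字符串中包含数字即可匹配(除abc外都会返回)
select data from v_test_06 where regexp_like(data, '[0-9]');
-- 加上^和$:整个字符串必须从头到尾都由数字组成(只返回123和0123456789)
select data from v_test_06 where regexp_like(data, '^[0-9]+$');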
列转行
如题,最后的一个话题,谈谈Oracle11.2版本开始提供的一个用于列传行的函数listagg,用法和postgresql9.3中的string_agg函数基本一致,我在之前的博客也专门介绍过postgres的string_agg这个函数(postgresql 9.3 自定义聚合函数实现多行数据合并成一列),但语法上比postgres的string_agg函数更繁琐一些,首先来看一下listagg这个函数的语法格式:
(LISTAGG语法格式及参数说明截图,从略)

如上图,listagg函数有4个参数,下面简单解释一下:

measure_expr:分组中每个列的表达式,也就是需要合并的列
delimiter:分隔符,不设置的话表示无分隔符
order_by_clause:进行合并时要遵守的排序顺序
query_partition_clause:表示listagg具有分析函数(analytic function)特性
所以说listagg尽管更多被用作聚集函数,但它还是具有分析函数(analytic function)的特性。下面通过一个例子具体看一下listagg的用法,首先学生体育成绩表有如下数据:
(体育考试成绩表数据截图从略)

如上图,是一张体育考试成绩表,每名学生有5门考试成绩,可以看到ZKZH_这一列是考号,SKXMDM_是之前说过的每门考试科目的代码,而HSCJ_就是考试成绩了,假设我现在的需求是按考号分组,将每名学生的SKXMDM_用逗号拼接,同时求体育总成绩,该怎么做呢?很典型的一个列转行(多行合并),这里用listagg函数就再合适不过了:

select zkzh_,
listagg(skxmdm_, ',') within group(order by zkzh_) as skxmdm,
sum(hscj_) score
from T_SPORTTESTSCORE t
where t.kz1_ = 1
group by zkzh_
运行结果如下:
(运行结果截图从略)

如上图所示,可以看到很好的完成了列的合并以及sum求和,到此为止Oracle中字符串相关的函数暂且介绍到这里,后面有机会还会陆续添加。

总结

本篇blog着重记录了Oracle中和字符串相关的一些个人认为比较常用及重要的函数和相关使用场景,重难点是那4个正则函数,如果在查询中能灵活运用正则函数的话确实能快捷的实现一些复杂的需求,最后由于个人能力有限难免有疏漏之处欢迎各位读者批评指正,同样也希望对读了本文的新手朋友们有所帮助,The End。

C#mysqlcheck检查数据库异常+修复数据库


断电等不可预期的错误会导致数据库表不能使用,所以在网上找了一下有什么可以修复数据库的方法。

1.SQL语句。

2.mysql自带的mysqlcheck工具。

网上虽然有相关介绍,但并不知道具体该如何使用。

大家都是直接贴代码,但是对于没有经验的人来说,并不知道该从哪里执行这几行代码,为此我也是费了好多时间。

下面来介绍如何使用这个语句

至于解释 随便搜一下 满地都是 关键词 mysqlcheck

1.sql语句修复

数据库还能执行sql语句时可以尝试

check table my_table;
repair table my_table;
check tables my_table1,my_table2;
这个是可以看到返回的
(运行结果截图从略)

但是只支持MyISAM格式的数据库表

可以像下面这样更换一下表的存储引擎,或者试一下其他方式:

alter table `cashier_goods` engine = MyISAM;
(运行结果截图从略)
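另外补充一个小技巧(非原文内容):在批量转换引擎之前,可以先通过information_schema确认哪些表不是MyISAM引擎,示例中的库名supercashier可换成自己的库:

select table_name, engine
from information_schema.tables
where table_schema = 'supercashier' and engine <> 'MyISAM';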

2.第二种 就是调用 mysql 中bin目录下的mysqlcheck来

我这里给出C#代码 我的数据库名为supercashier 可以根据自己的需要来改

大致步骤是

1.确定mysql的bin位置

2.调用cmd执行命令并输出日志

代码也可自己根据需要调整(给出的例子是 检测数据库是否异常 异常则尝试修复一次,没有异常直接退出)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
namespace FixMySQL
{
class Program
{
static void Main(string[] args)
{
bool checkresult = MethodC();//数据库检测
Console.WriteLine(checkresult ? "数据库正常" : "数据库损坏");
if (checkresult)
{
return;
}
Console.WriteLine("\r\n 是否需要进行尝试修复!");
Console.WriteLine("尝试修复请输入 y 并点击回车");
string isneedrepair = Console.ReadLine();
if (isneedrepair.ToUpper() == "Y")
{
MethodA();//修复并输出日志
}
Console.WriteLine("尝试修复后仍不能正常使用,请毫不犹豫联系我们!\r\n");
Console.WriteLine("按任意键退出!");
Console.ReadLine();
}
private static bool MethodC()
{
//数据库路径
var mySQLPath = Process.GetProcessesByName("mysqld");
try
{
string sqlpath = mySQLPath[0].MainModule.FileName;
//Console.WriteLine(mySQLPath[0].MainModule.FileName);
}
catch (Exception)
{
Console.WriteLine("未检测到数据库,或数据库没有开启!");
Console.WriteLine("按任意键退出! \r\n");
Console.ReadLine();
return false;
}
string path = System.IO.Path.GetDirectoryName(mySQLPath[0].MainModule.FileName);
//桌面路径
string DeskTopPath = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory);
DeskTopPath += @"\超级支付数据库自动修复" + DateTime.Now.ToString("yyyyMMddhhmmss");
if (!System.IO.Directory.Exists(DeskTopPath))
{
System.IO.Directory.CreateDirectory(DeskTopPath);
}
string B = "mysqlcheck -c --databases supercashier -uroot";//检查库或表
Directory.SetCurrentDirectory(path);
string result = ExecuteCommand(B);
//string[] anaylize = System.Text.RegularExpressions.Regex.Split(result, "\r\n");
string[] anaylize = result.Replace("\r\n", "|").TrimEnd('|').Split('|');
foreach (var item in anaylize)
{
if (!string.IsNullOrEmpty(item) && !item.Contains("OK"))
{
return false;
}
}
return true;
}
private static void MethodA()
{
//数据库路径
var mySQLPath = Process.GetProcessesByName("mysqld");
try
{
string sqlpath = mySQLPath[0].MainModule.FileName;
//Console.WriteLine(mySQLPath[0].MainModule.FileName);
}
catch (Exception)
{
Console.WriteLine("未检测到数据库,或数据库没有开启!");
Console.WriteLine("按任意键退出! \r\n");
Console.ReadLine();
return;
}
string path = System.IO.Path.GetDirectoryName(mySQLPath[0].MainModule.FileName);
//桌面路径
string DeskTopPath = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory);
DeskTopPath += @"\超级支付数据库自动修复" + DateTime.Now.ToString("yyyyMMddhhmmss");
if (!System.IO.Directory.Exists(DeskTopPath))
{
System.IO.Directory.CreateDirectory(DeskTopPath);
}
Console.WriteLine("输出路径" + DeskTopPath);
Console.WriteLine("请在桌面输出路径中查看修复日志文件");
//创建bat 使得修复代码可视化
StringBuilder batstr = new StringBuilder();
batstr.Append("数据库分析日志说明:\r\n");
batstr.Append("OK\r\n");
batstr.Append("表示数据库表正常\r\n");
batstr.Append("Table is already up to date\r\n");
batstr.Append("表示数据库表已经是最新的 \r\n\r\n");
batstr.Append("数据库检查日志说明:\r\n");
batstr.Append("OK\r\n");
batstr.Append("表示数据库表正常\r\n");
batstr.Append("broken\r\n");
batstr.Append("表示数据库表损坏\r\n\r\n");
batstr.Append("数据库修复日志说明:\r\n");
batstr.Append("OK \r\n");
batstr.Append("表示数据库表修复成功\r\n");
batstr.Append("The storage engine for the table doesn't support repair\r\n");
batstr.Append("表示数据库表类型不支持修复\r\n\r\n");
batstr.Append("数据库优化日志说明:\r\n");
batstr.Append("OK\r\n");
batstr.Append("表示数据库表优化成功\r\n");
batstr.Append("Table does not support optimize, doing recreate + analyze instead \r\n");
batstr.Append("表不支持优化,而是重新创建+分析\r\n");
batstr.Append("尝试修复后仍不能正常使用,请毫不犹豫联系我们!\r\n");
System.IO.File.WriteAllText(DeskTopPath + @"\日志阅读说明.txt", batstr.ToString());
string A = "mysqlcheck -a --databases supercashier -uroot>" + DeskTopPath + @"\数据库分析日志.txt" + " \r\n";//分析指定的表 所有数据库
string B = "mysqlcheck -c --databases supercashier -uroot>" + DeskTopPath + @"\数据库表检查日志.txt" + " \r\n";//检查库或表
string C = "mysqlcheck -r --databases supercashier -uroot>" + DeskTopPath + @"\数据库表修复日志.txt" + " \r\n";//修复库或表
string D = "mysqlcheck -o --databases supercashier -uroot>" + DeskTopPath + @"\数据库表优化日志.txt" + " \r\n";//优化指定的表
//string E = "mysqlcheck --auto-repair --databases supercashier -uroot>" + DeskTopPath + @"\ssss.txt" + " \r\n";
//string E = "mysqlcheck --auto-repair --databases supercashier -uroot>" + DeskTopPath + @"\ssss.txt" + " \r\n";
// --repair--quick 尝试快速修复
//--repair 正常修复(除非快速修复失败)
//--repair--force 强行修复
Directory.SetCurrentDirectory(path);
Console.WriteLine("1.正在进行全局分析...");
ExecuteCommand(A);
Console.WriteLine("全局分析完成! \r\n");
Console.WriteLine("2.正在进行检查...");
ExecuteCommand(B);
Console.WriteLine("检查完成! \r\n");
Console.WriteLine("3.正在进行修复...");
ExecuteCommand(C);
Console.WriteLine("修复完成! \r\n");
Console.WriteLine("4.正在进行优化...");
Console.WriteLine("优化过程所需时间较长请耐心等待! \r\n");
ExecuteCommand(D);
Console.WriteLine("优化完成! \r\n");
Console.WriteLine("按任意键退出! \r\n");
Console.ReadLine();
}
private static void MethodB()
{
//数据库路径
var mySQLPath = Process.GetProcessesByName("mysqld");
try
{
Console.WriteLine(mySQLPath[0].MainModule.FileName);
}
catch (Exception)
{
Console.WriteLine("没有数据库!");
return;
}
string path = System.IO.Path.GetDirectoryName(mySQLPath[0].MainModule.FileName);
//桌面路径
string DeskTopPath = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory);
DeskTopPath += @"\mysql_Fix";
if (!System.IO.Directory.Exists(DeskTopPath))
{
System.IO.Directory.CreateDirectory(DeskTopPath);
}
Console.WriteLine("请在桌面输出路径中查看修复日志文件");
//创建bat 使得修复代码可视化
StringBuilder batstr = new StringBuilder();
//batstr.Append(" @echo off" + "\r\n");
batstr.Append("cd " + path + "\r\n");
batstr.Append("mysqlcheck -a --databases supercashier -uroot>" + DeskTopPath + @"\A_Analysis.txt" + " \r\n");//分析指定的表 所有数据库
batstr.Append("mysqlcheck -c --databases supercashier -uroot>" + DeskTopPath + @"\B_check.txt" + " \r\n");//检查库或表
batstr.Append("mysqlcheck -r --databases supercashier -uroot>" + DeskTopPath + @"\C_repair.txt" + " \r\n");//修复库或表
batstr.Append("mysqlcheck -o --databases supercashier -uroot>" + DeskTopPath + @"\D_optimization.txt" + " \r\n");//优化指定的表
batstr.Append("mysqlcheck --auto-repair --databases supercashier -uroot>" + DeskTopPath + @"\ssss.txt" + " \r\n");
System.IO.File.WriteAllText(DeskTopPath + @"\fixmysql.bat", batstr.ToString());
//执行一次语句
Process.Start(DeskTopPath + @"\fixmysql.bat");
System.IO.File.WriteAllText(DeskTopPath + @"\fixmysql.txt", ExecuteCommand(batstr.ToString()));
//Console.WriteLine();
Console.ReadLine();
}
///
/// 使用命令行执行命令并返回结果
///
///The command.
///
private static string ExecuteCommand(string command)
{
try
{
// create the ProcessStartInfo using "cmd" as the program to be run,
// and "/c " as the parameters.
// Incidentally, /c tells cmd that we want it to execute the command that follows,
// and then exit.
var procStartInfo =
new System.Diagnostics.ProcessStartInfo("cmd", "/c " + command);
// The following commands are needed to redirect the standard output.
// This means that it will be redirected to the Process.StandardOutput StreamReader.
procStartInfo.RedirectStandardOutput = true;
procStartInfo.UseShellExecute = false;
// Do not create the black window.
procStartInfo.CreateNoWindow = true;
// Now we create a process, assign its ProcessStartInfo and start it
var proc = new Process();
proc.StartInfo = procStartInfo;
proc.Start();
// Get the output into a string
return proc.StandardOutput.ReadToEnd();
// Display the command output.
//Console.WriteLine(result);
}
catch (Exception objException)
{
// Log the exception
//MessageBox.Show(objException.Message);
Console.WriteLine(objException.Message);
return null;
}
}
}
}
代码中

A方法 是整套的检查修复流程

B方法是利用bat执行cmd命令 这样代码量少 但是会暴漏语句 可以执行完再删除

C方法是只检查数据库表是否正常

里面有个蛋疼的是字符串输出的时候会把路径加上很多字符 cmd识别不出来

所以利用下面的语句来切换工作目录

Directory.SetCurrentDirectory(path);
不过也可以看到 方法B中是通过cd 目录来实现这个效果的

反正听起来蛮简单的,做起来好多小细节需要自己把握

希望对大家有用

数据库对表的三种分割技术


一.水平分割

水平分割根据某些条件将数据放到两个或多个独立的表中,即按记录进行分割,不同的记录分开保存,每个子表的列数相同、数据行更少。例如,可以将一个包含十亿行的表水平分区成 12 个表,每个小表表示特定年份内一个月的数据。任何需要特定月份数据的查询只需引用相应月份的表。

通常用来水平分割表的条件有:日期时间维度、地区维度等,当然还有更多的业务维度。下面我举几个例子来解说一下

案例1:某个公司销售记录数据量太大了,我们可以对它按月进行水平分割,每个月的销售记录单独成一张表。

案例2:某个集团在各个地区都有分公司,该集团的订单数据表太大了,我们可以按分公司所在的地区进行水平切割。

案例3:某电信公司的话单按日期、地市水平切割后,发现数据量太大,然后他们又按品牌、号码段进行水平切割
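下面给出一个按月水平分割的简单示意(表名、字段均为假设,仅供参考),跨月统计时需要用union all把子表拼起来:

-- 每个月的销售记录单独成一张表
create table t_sales_201601 as
select * from t_sales where sale_time >= date '2016-01-01' and sale_time < date '2016-02-01';
create table t_sales_201602 as
select * from t_sales where sale_time >= date '2016-02-01' and sale_time < date '2016-03-01';
-- 跨月查询
select * from t_sales_201601 where cust_id = 1001
union all
select * from t_sales_201602 where cust_id = 1001;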

水平分割通常在下面的情况下使用:

(1)表数据量很大,分割后可以降低在查询时需要读的数据和索引的页数,同时也降低了索引的层数,加快了查询速度。

(2)表中的数据本来就有独立性,例如表中分别记录各个地区的数据或不同时期的数据,特别是有些数据常用,而另外一些数据不常用。

(3)需要把数据存放到多个介质上。

(4)需要把历史数据和当前的数据拆分开。

优点:

1:降低在查询时需要读的数据和索引的页数,同时也降低了索引的层数,加快了查询速度。

缺点:

1:水平分割会给应用增加复杂度,它通常在查询时需要多个表名,查询所有数据需要union操作。在许多数据库应用中,这种复杂性会超过它带来的优点,因为只要索引关键字不大,则在索引用于查询时,表中增加两到三倍数据量,查询时也就增加读一个索引层的磁盘次数。

二.垂直分割

垂直分割表(不破坏第三范式),把主码(主键)和一些列放到一个表,然后把主码(主键)和另外的一些列放到另一个表中。将原始表分成多个只包含较少列的表。如果一个表中某些列常用,而另外一些列不常用,则可以采用垂直分割。
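下面是一个垂直分割的简单示意(表名、字段均为假设):把常用的列和不常用的大字段拆成两张表,主键在两张表中各保留一份:

create table t_user_base (
    user_id  bigint primary key,
    username varchar(64),
    passwd   varchar(64)
);
create table t_user_ext (
    user_id    bigint primary key,
    avatar     blob,
    self_intro text
);
-- 需要完整信息时再做连接
select b.username, e.self_intro
from t_user_base b join t_user_ext e on b.user_id = e.user_id;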

优点:

1:垂直分割可以使得行数据变小,一个数据块(Block)就能存放更多的数据,在查询时就会减少I/O次数(每次查询时读取的Block 就少)。

2:垂直分割表可以达到最大化利用Cache的目的。

缺点:

1:表垂直分割后,主码(主键)出现冗余,需要管理冗余列

2:会引起表连接JOIN操作(增加CPU开销)需要从业务上规避

三. 库表散列

表散列与水平分割相似,但没有水平分割那样的明显分割界限,采用Hash算法把数据分散到各个分表中,这样IO更加均衡。一般来说,我们会按照业务或者功能模块将数据库进行分离,不同的模块对应不同的数据库或者表,再按照一定的策略对某个页面或者功能进行更小的数据库散列,比如用户表,按照用户ID进行表散列,散列成128张表,这样就能够低成本地提升系统的性能,并且有很好的扩展性。
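以用户表散列成128张表为例,给出一个简单示意(表名、字段均为假设):应用层按 user_id mod 128 决定路由到哪张子表,例如 user_id = 1000 时,1000 mod 128 = 104,落到 t_user_104;如果不想在应用层维护路由,也可以借助数据库自带的hash分区,下面是MySQL的写法:

create table t_user (
    user_id  bigint not null,
    username varchar(64),
    primary key (user_id)
) partition by hash(user_id) partitions 128;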

SAP收购云版Hadoop软件开发商Altiscale


网易科技讯8月25日消息,据VentureBeat报道,消息人士称,德国企业软件公司SAP收购了基于云的Hadoop软件开发商Altiscale,协议金额超过1.25亿美元。Hadoop是存储、处理和分析大量不同数据的开源软件。协议预计在几周后宣布。

消息人士称,Altiscale的投资者将获得3-4倍的现金回报。该公司通过风投融资至少4200万美元,其中在2014年12月就融资了3000万美元,投资者包括红杉资本等。SAP在采取行动加强云软件组合,此协议将使SAP成为Hadoop即服务领域的著名提供商之一。

Altiscale的创始人和CEO雷米埃斯泰塔(Raymie Stata)曾在2004年向雅虎出售了他的另一家公司Stata Labs,并在雅虎开发了Hadoop软件,他的团队将帮助SAP与IBM、微软等竞争。Altiscale的竞争对手包括亚马逊的AWS等。

Altiscale成立于2012年,位于加州帕罗奥托。最初该公司的名称为VertiCloud,职员超过90人。对此,SAP拒绝置评,Altiscale未立即回应评论要求。(木秀林)

本文来源:网易科技报道 责任编辑:白鑫_NT4464

基于haddop的HDFS和Excel开源库POI导出大数据报表(二)


接着上一篇 《基于haddop的HDFS和Excel开源库POI导出大数据报表(一)》 遗留的问题开始,这篇做优化处理。

优化导出流程

在一开始的时候,当我获取到订单后,遍历订单,获取用户id和用户的地址id,逐条查询,可想而知,1w条数据,我要查询数据库1w*2次,这种资源消耗是伤不起的,消耗的时间大多数花在了查询上面。

后来,做了一次优化,将用户id和地址id分别放入到list中,每500条查询一次,假如有1w条,需要执行查询 (10000 / 500) = 20 次,只需要查询20次即可;一般而言这个数目会更小,原因是用户id会重复,同一个用户有很多订单,所以选择set比起list好很多,查询次数又降低了不少。

@Component("userService") @Path("user") @Produces({ContentType.APPLICATION_JSON_UTF_8}) public class UserServiceImpl implements UserService { private static final int PER_TIMES = 500; @Resource private UserRepo repo; @Override @GET @Path("/users") public Map<Long, User> getUsersByUids(List<Long> uids) { if (uids == null || uids.size() <= 0) { return null; } Map<Long, User> map = new HashMap<>(); int times = uids.size() > PER_TIMES ? (int) Math.ceil(uids.size() * 1.0 / PER_TIMES) : 1; for (int i = 0; i < times; i++) { // 执行多少次查询 StringBuffer strUids = new StringBuffer(); strUids.append("(0"); for (int j = i * PER_TIMES; j < ((i + 1) * PER_TIMES) && j < uids.size(); j++) { // 每次查询多少条数据 strUids.append(",").append(uids.get(j)); } strUids.append(")"); String uid = strUids.toString(); // Map<Long, User> m = repo.getUserByUids(uid); if (m != null && m.size() > 0) { System.out.println("第" + i + "次循环,返回数据" + m.size()); map.putAll(m); } } return map; } // ... 其他的业务逻辑 }

在使用内部for循坏的时候,我犯了基本的算法错误,原来的代码:

// ... // size 是第一个for循坏外面的变量,初识值为 size = uids.size(); StringBuffer strUids = new StringBuffer(); strUids.append("(0"); for (int j = i * PER_TIMES; j < PER_TIMES && j < size; j++) { strUids.append(",").append(uids.get(j)); } size = size - (i + 1) * PER_TIMES; strUids.append(")"); String uid = strUids.toString(); // ...

是的,你没看错,这个错误我犯了,记在这里,是为了提醒以后少犯这样低级的错误。不管外部循环如何,里面的size值一直在减小,PER_TIMES值不变。

假如 PER_TIMES =500; i = 2; 那么里面的for是这样的,j = 2 * 500;j < 500 && j < (1000 - 500); j++;错误就在这里了,1000 < 500永远为false,再说了size的值一直在减小,j也会小于size。

这个错误造成的直接问题是数据空白,因为只会执行一次,第二次条件就为false了。

舍弃反射

在接口传来的数据类似这样的json:

{ "params": { "starttm": 1469980800000 }, "filename": "2016-08-28-订单.xlsx", "header": { "crtm": "下单时间", "paytm": "付款时间", "oid": "订单ID", "iid": "商品ID", "title": "商品标题", "type": "商品类型", "quantity": "购买数量", "user": "买家用户名", "uid": "买家ID", "pro": "省市", "city": "区县", "addr": "买家地址", "status": "订单状态", "refund": "退款状态", "pri": "单价", "realpay": "实付款", "tel": "电话", "rec": "收件人姓名", "sex": "性别", "comment": "备注" } }

按照header字段的key生成数据,所以一开始我是拿key通过反射获取对应 get+"Key" 方法的值,但是这样导致很慢。

/** * 直接读取对象属性值, 无视private/protected修饰符, 不经过getter函数. * @param obj * @param fieldName * @return */ public static Object getFieldValue(final Object obj, final String fieldName) { Field field = getAccessibleField(obj, fieldName); if (null == field) { throw new IllegalArgumentException("Could not find field [" + fieldName + "] on " + "target [" + obj + "]"); } Object result = null; try { result = field.get(obj); } catch (IllegalAccessException e) { LOGGER.error("不可能抛出的异常{}" + e); } return result; } /** * 循环向上转型, 获取对象的DeclaredField, 并强制设置为可访问. * 如向上转型到Object仍无法找到, 返回null. * @param obj * @param fieldName * @return */ public static Field getAccessibleField(final Object obj, final String fieldName) { Assert.notNull(obj, "OBJECT不能为空"); Assert.hasText(fieldName, "fieldName"); for (Class<?> superClass = obj.getClass(); superClass != Object.class; superClass = superClass.getSuperclass()) { try { Field field = superClass.getDeclaredField(fieldName); field.setAccessible(true); return field; } catch (NoSuchFieldException e) { //NOSONAR // Field不在当前类定义,继续向上转型 } } return null; }

因为这些字段来自多个不同的对象,可能某些字段注入会失败,当注入失败的时候尝试注入到另一个对象。我觉得耗时也在这地方,后来修改成直接使用getter方法获取,速度也有提升。

jar包冲突解决

错误:

javax.servlet.ServletException: Servlet execution threw an exception org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) com.alibaba.druid.support.http.WebStatFilter.doFilter(WebStatFilter.java:123)
(错误页面截图从略)
test环境和dev环境均正常,但是线上环境报错。几经波折,终于知道引起错误的原因是jar包冲突:resteasy和jersey包的冲突。项目中为何会引入jersey?这就与hadoop有关了。

研究后发现,hadoop并不一定需要jersey,因此果断舍弃掉jersey包:

compile ('org.apache.hadoop:hadoop-common:2.7.2') {
    exclude(module: 'jersey')
    exclude(module: 'contribs')
}
compile ('org.apache.hadoop:hadoop-hdfs:2.7.2') {
    exclude(module: 'jersey')
    exclude(module: 'contribs')
}
compile ('org.apache.hadoop:hadoop-client:2.7.2') {
    exclude(module: 'jersey')
    exclude(module: 'contribs')
}

尽管项目rest接口报错,但是启动不会报错,mq的执行也正常。原因大家都看到了:jar包冲突导致web中的过滤器根本不去请求路由,请求直接被过滤掉了。

HDFS优化后的封装 public class HDFSUtils { private static FileSystem fs = null; public static FileSystem getFileSystem(Configuration conf) throws IOException, URISyntaxException { if (null == fs) { fs = FileSystem.get(conf); } return fs; } /** * 判断路径是否存在 * * @param conf hadoop 配置 * @param path hadoop 文件路径 * @return 文件是否存在 * @throws IOException */ public static boolean exits(Configuration conf, String path) throws IOException, URISyntaxException { FileSystem fs = getFileSystem(conf); return fs.exists(new Path(path)); } /** * 创建文件 * * @param conf hadoop 配置 * @param filePath 本地文件路径 * @param contents 文件内容 * @throws IOException */ public static void createFile(Configuration conf, String filePath, byte[] contents) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf); FSDataOutputStream outputStream = createFromFileSystem(fs, filePath)) { outputStream.write(contents, 0, contents.length); outputStream.hflush(); } } private static FSDataOutputStream createFromFileSystem(FileSystem fs, String filePath) throws IOException { Path path = new Path(filePath); return fs.create(path); } private static FSDataInputStream openFromFileSystem(FileSystem fs, String filePath) throws IOException { Path path = new Path(filePath); return fs.open(path); } /** * 创建文件 * * @param conf hadoop 配置 * @param filePath 本地文件路径 * @param workbook excel workbook 内容 * @throws IOException */ public static void createFile(Configuration conf, String filePath, Workbook workbook) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf); FSDataOutputStream outputStream = createFromFileSystem(fs, filePath)) { ByteArrayOutputStream os = new ByteArrayOutputStream(); workbook.write(os); outputStream.write(os.toByteArray()); outputStream.hflush(); } } /** * 创建文件 * * @param conf hadoop 配置 * @param filePath 本地文件路径 * @param contents 文件内容 * @throws IOException */ public static void uploadWorkbook(Configuration conf, String filePath, byte[] contents) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf); FSDataOutputStream outputStream = createFromFileSystem(fs, filePath)) { outputStream.write(contents, 0, contents.length); outputStream.hflush(); } } /** * 创建文件 * * @param conf hadoop 配置 * @param filePath 本地文件路径 * @param fileContent 文件内容 * @throws IOException */ public static void createFile(Configuration conf, String fileContent, String filePath) throws IOException, URISyntaxException { createFile(conf, filePath, fileContent.getBytes()); } /** * 上传文件 * * @param conf hadoop 配置 * @param localFilePath 本地文件路径 * @param remoteFilePath 远程文件路径 * @throws IOException */ public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { Path localPath = new Path(localFilePath); Path remotePath = new Path(remoteFilePath); fs.copyFromLocalFile(true, true, localPath, remotePath); } } /** * 删除目录或文件 * * @param conf hadoop 配置 * @param remoteFilePath 远程文件路径 * @param recursive if the subdirectories need to be traversed recursively * @return 是否成功 * @throws IOException */ public static boolean deleteFile(Configuration conf, String remoteFilePath, boolean recursive) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { return fs.delete(new Path(remoteFilePath), recursive); } } /** * 删除目录或文件(如果有子目录,则级联删除) * * @param conf hadoop 配置 * @param remoteFilePath 远程文件路径 * @return 是否成功 * @throws IOException */ public static boolean deleteFile(Configuration 
conf, String remoteFilePath) throws IOException, URISyntaxException { return deleteFile(conf, remoteFilePath, true); } /** * 文件重命名 * * @param conf hadoop 配置 * @param oldFileName 原始文件名 * @param newFileName 新文件名 * @return 是否成功 * @throws IOException */ public static boolean renameFile(Configuration conf, String oldFileName, String newFileName) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { Path oldPath = new Path(oldFileName); Path newPath = new Path(newFileName); return fs.rename(oldPath, newPath); } } /** * 创建目录 * * @param conf hadoop 配置 * @param dirName hadoop 目录名 * @return 是否成功 * @throws IOException */ public static boolean createDirectory(Configuration conf, String dirName) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { Path dir = new Path(dirName); return fs.mkdirs(dir); } } /** * 列出指定路径下的所有文件(不包含目录) * * @param fs hadoop文件系统 * @param basePath 基础路径 * @param recursive if the subdirectories need to be traversed recursively */ public static RemoteIterator<LocatedFileStatus> listFiles(FileSystem fs, String basePath, boolean recursive) throws IOException { return fs.listFiles(new Path(basePath), recursive); } /** * 列出指定路径下的文件(非递归) * * @param conf hadoop 配置 * @param basePath 基础路径 * @return 文件状态集合 * @throws IOException */ public static RemoteIterator<LocatedFileStatus> listFiles(Configuration conf, String basePath) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { return fs.listFiles(new Path(basePath), false); } } /** * 列出指定目录下的文件\子目录信息(非递归) * * @param conf hadoop 配置 * @param dirPath 文件目录 * @return 文件状态数组 * @throws IOException */ public static FileStatus[] listStatus(Configuration conf, String dirPath) throws IOException, URISyntaxException { try (FileSystem fs = getFileSystem(conf)) { return fs.listStatus(new Path(dirPath)); } } /** * 读取文件内容并写入outputStream中 * * @param conf hadoop 配置 * @param filePath 文件路径 * @param os 输出流 * @throws IOException */ public static void readFile(Configuration conf, String filePath, OutputStream os) throws IOException, URISyntaxException { FileSystem fs = getFileSystem(conf); Path path = new Path(filePath); try (FSDataInputStream inputStream = fs.open(path)) { int c; while ((c = inputStream.read()) != -1) { os.write(c); } } } /** * 读取文件内容并返回 * @param conf hadoop 配置 * @param filePath 本地文件路径 * @return 文件内容 * @throws IOException * @throws URISyntaxException */ public static String readFile(Configuration conf, String filePath) throws IOException, URISyntaxException { String fileContent; try (FileSystem fs = getFileSystem(conf); InputStream inputStream = openFromFileSystem(fs, filePath); ByteArrayOutputStream outputStream = new ByteArrayOutputStream(inputStream.available())) { IOUtils.copyBytes(inputStream, outputStream, conf); byte[] lens = outputStream.toByteArray(); fileContent = new String(lens, "UTF-8"); } return fileContent; } }

优化1:所有的try{} finally{}均由try-with-resources代替掉了,把需要关闭的资源放到try()里面。try()是java7的特性,叫自动资源释放,具有自动关闭流的作用,不再需要手动在finally中关闭各种stream和文件句柄;前提是这些可关闭的资源必须实现 java.lang.AutoCloseable 接口。

新增了一个方法:

public static void createFile(Configuration conf, String filePath, Workbook workbook) throws IOException, URISyntaxException {
    try (FileSystem fs = getFileSystem(conf);
         FSDataOutputStream outputStream = createFromFileSystem(fs, filePath)) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        workbook.write(os);
        outputStream.write(os.toByteArray());
        outputStream.hflush();
    }
}

方法参数: hadoop配置 , 完整文件名 , Workbook 。这里通过workbook.write把Workbook写到ByteArrayOutputStream中,然后把ByteArrayOutputStream流写入到FSDataOutputStream流,再flush到磁盘。

这个优化的原因:下载文件的时候,读取流必须是POI的WorkBook的流,如果转换成其他的流,发生乱码。

POI优化后的封装 package cn.test.web.utils; import cn.common.util.Utils; import org.apache.commons.io.FilenameUtils; import org.apache.poi.hssf.record.crypto.Biff8EncryptionKey; import org.apache.poi.hssf.usermodel.HSSFFont; import org.apache.poi.hssf.usermodel.HSSFFooter; import org.apache.poi.hssf.usermodel.HSSFHeader; import org.apache.poi.hssf.usermodel.HSSFWorkbook; import org.apache.poi.openxml4j.exceptions.InvalidFormatException; import org.apache.poi.openxml4j.opc.OPCPackage; import org.apache.poi.openxml4j.opc.PackageAccess; import org.apache.poi.poifs.filesystem.POIFSFileSystem; import org.apache.poi.ss.usermodel.Cell; import org.apache.poi.ss.usermodel.CellStyle; import org.apache.poi.ss.usermodel.Font; import org.apache.poi.ss.usermodel.Footer; import org.apache.poi.ss.usermodel.Header; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.ss.usermodel.Sheet; import org.apache.poi.ss.usermodel.Workbook; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.apache.poi.xssf.streaming.SXSSFWorkbook; import org.apache.poi.xssf.usermodel.XSSFWorkbook; import java.io.BufferedInputStream; import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.io.InputStream; import java.util.List; import java.util.Properties; /** * Created with dongjia-data-presentation * User zhoujunwen * Date 16/8/11 * Time 下午5:02 */ public class POIUtils { private static final short HEADER_FONT_SIZE = 16; // 大纲字体 private static final short FONT_HEIGHT_IN_POINTS = 14; // 行首字体 private static final int MEM_ROW = 100; public static Workbook createWorkbook(String file) { String ext = FilenameUtils.getExtension(CommonUtils.getFileName(file)); Workbook wb = createSXSSFWorkbook(MEM_ROW); /*switch (ext) { case "xls": wb = createHSSFWorkbook(); break; case "xlsx": wb = createXSSFWorkbook(); break; default: wb = createHSSFWorkbook(); }*/ return wb; } public static Workbook createWorkbookByIS(String file, InputStream inputStream) { String ext = FilenameUtils.getExtension(CommonUtils.getFileName(file)); Workbook wb = null; try { OPCPackage p = OPCPackage.open(inputStream); wb = new SXSSFWorkbook(new XSSFWorkbook(p), 100); } catch (Exception ex) { try { wb = new HSSFWorkbook(inputStream, false); } catch (IOException e) { wb = new XSSFWorkbook(); } } return wb; } /** * * @param wb * @param file * @return */ public static Workbook writeFile(Workbook wb, String file) { if (wb == null || Utils.isEmpty(file)) { return null; } FileOutputStream out = null; try { out = new FileOutputStream(file); wb.write(out); } catch (IOException e) { e.printStackTrace(); } finally { if (out != null) { try { out.close(); } catch (IOException e) { e.printStackTrace(); } } } return wb; } public static Workbook createHSSFWorkbook() { //生成Workbook HSSFWorkbook wb = new HSSFWorkbook(); //添加Worksheet(不添加sheet时生成的xls文件打开时会报错) @SuppressWarnings("unused") Sheet sheet = wb.createSheet(); return wb; } public static Workbook createSXSSFWorkbook(int memRow) { Workbook wb = new SXSSFWorkbook(memRow); Sheet sheet = wb.createSheet(); return wb; } public static Workbook createXSSFWorkbook() { XSSFWorkbook wb = new XSSFWorkbook(); @SuppressWarnings("unused") Sheet sheet = wb.createSheet(); return wb; } public static Workbook openWorkbook(String file) { FileInputStream in = null; Workbook wb = null; try { in = new FileInputStream(file); wb = WorkbookFactory.create(in); } catch (InvalidFormatException | IOException e) { e.printStackTrace(); } finally { try { if (in != null) { in.close(); } } catch (IOException 
e) { e.printStackTrace(); } } return wb; } public static Workbook openEncryptedWorkbook(String file, String password) { FileInputStream input = null; BufferedInputStream binput = null; POIFSFileSystem poifs = null; Workbook wb = null; try { input = new FileInputStream(file); binput = new BufferedInputStream(input); poifs = new POIFSFileSystem(binput); Biff8EncryptionKey.setCurrentUserPassword(password); String ext = FilenameUtils.getExtension(CommonUtils.getFileName(file)); switch (ext) { case "xls": wb = new HSSFWorkbook(poifs); break; case "xlsx": wb = new XSSFWorkbook(input); break; default: wb = new HSSFWorkbook(poifs); } } catch (IOException e) { e.printStackTrace(); } return wb; } /** * 追加一个sheet,如果wb为空且isNew为true,创建一个wb * * @param wb * @param isNew * @param type 创建wb类型,isNew为true时有效 1:xls,2:xlsx * @return */ public static Workbook appendSheet(Workbook wb, boolean isNew, int type) { if (wb != null) { Sheet sheet = wb.createSheet(); } else if (isNew) { if (type == 1) { wb = new HSSFWorkbook(); wb.createSheet(); } else { wb = new XSSFWorkbook(); wb.createSheet(); } } return wb; } public static Workbook setSheetName(Workbook wb, int index, String sheetName) { if (wb != null && wb.getSheetAt(index) != null) { wb.setSheetName(index, sheetName); } return wb; } public static Workbook removeSheet(Workbook wb, int index) { if (wb != null && wb.getSheetAt(index) != null) { wb.removeSheetAt(index); } return wb; } public static void insert(Sheet sheet, int row, int start, List<?> columns) { for (int i = start; i < (row + start); i++) { Row rows = sheet.createRow(i); if (columns != null && columns.size() > 0) { for (int j = 0; j < columns.size(); j++) { Cell ceil = rows.createCell(j); ceil.setCellValue(String.valueOf(columns.get(j))); } } } } public static void insertRow(Row row, List<?> columns) { if (columns != null && columns.size() > 0) { for (int j = 0; j < columns.size(); j++) { Cell ceil = row.createCell(j); ceil.setCellValue(String.valueOf(columns.get(j))); } } } /** * 设置excel头部 * * @param wb * @param sheetName * @param columns 比如:["国家","活动类型","年份"] * @return */ public static Workbook setHeader(Workbook wb, String sheetName, List<?> columns) { if (wb == null) return null; Sheet sheet = wb.getSheetAt(0); if (sheetName == null) { sheetName = sheet.getSheetName(); } insert(sheet, 1, 0, columns); return setHeaderStyle(wb, sheetName); } /** * 插入数据 * * @param wb Workbook * @param sheetName sheetName,默认为第一个sheet * @param start 开始行数 * @param data 数据,List嵌套List ,比如:[["中国","奥运会",2008],["伦敦","奥运会",2012]] * @return */ public static Workbook setData(Workbook wb, String sheetName, int start, List<?> data) { if (wb == null) return null; if (sheetName == null) { sheetName = wb.getSheetAt(0).getSheetName(); } if (!Utils.isEmpty(data)) { if (data instanceof List) { int s = start; Sheet sheet = wb.getSheet(sheetName); for (Object rowData : data) { Row row = sheet.createRow(s); insertRow(row, (List<?>) rowData); s++; } } } return wb; } /** * 移除某一行 * * @param wb * @param sheetName sheet name * @param row 行号 * @return */ public static Workbook delRow(Workbook wb, String sheetName, int row) { if (wb == null) return null; if (sheetName == null) { sheetName = wb.getSheetAt(0).getSheetName(); } Row r = wb.getSheet(sheetName).getRow(row); wb.getSheet(sheetName).removeRow(r); return wb; } /** * 移动行 * * @param wb * @param sheetName * @param start 开始行 * @param end 结束行 * @param step 移动到那一行后(前) ,负数表示向前移动 * moveRow(wb,null,2,3,5); 把第2和3行移到第5行之后 * moveRow(wb,null,2,3,-1); 把第3行和第4行往上移动1行 * @return */ public static Workbook 
moveRow(Workbook wb, String sheetName, int start, int end, int step) { if (wb == null) return null; if (sheetName == null) { sheetName = wb.getSheetAt(0).getSheetName(); } wb.getSheet(sheetName).shiftRows(start, end, step); return wb; } public static Workbook setHeaderStyle(Workbook wb, String sheetName) { Font font = wb.createFont(); CellStyle style = wb.createCellStyle(); font.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD); font.setFontHeightInPoints(FONT_HEIGHT_IN_POINTS); font.setFontName("黑体"); style.setFont(font); if (Utils.isEmpty(sheetName)) { sheetName = wb.getSheetAt(0).getSheetName(); } int row = wb.getSheet(sheetName).getFirstRowNum(); int cell = wb.getSheet(sheetName).getRow(row).getLastCellNum(); for (int i = 0; i < cell; i++) { wb.getSheet(sheetName).getRow(row).getCell(i).setCellStyle(style); } return wb; } public static Workbook setHeaderOutline(Workbook wb, String sheetName, String title) { if (wb == null) return null; if (Utils.isEmpty(sheetName)) { sheetName = wb.getSheetAt(0).getSheetName(); } Header header = wb.getSheet(sheetName).getHeader(); header.setLeft(HSSFHeader.startUnderline() + HSSFHeader.font("宋体", "Italic") + "让传承成为潮流!" + HSSFHeader.endUnderline()); header.setCenter(HSSFHeader.fontSize(HEADER_FONT_SIZE) + HSSFHeader.startDoubleUnderline() + HSSFHeader.startBold() + title + HSSFHeader.endBold() + HSSFHeader.endDoubleUnderline()); header.setRight("时间:" + HSSFHeader.date() + " " + HSSFHeader.time()); return wb; } public static Workbook setFooter(Workbook wb, String sheetName, String copyright) { if (wb == null) return null; if (Utils.isEmpty(sheetName)) { sheetName = wb.getSheetAt(0).getSheetName(); } Footer footer = wb.getSheet(sheetName).getFooter(); if (Utils.isEmpty(copyright)) { copyright = "dongjia"; } footer.setLeft("Copyright @ " + copyright); footer.setCenter("Page:" + HSSFFooter.page() + " / " + HSSFFooter.numPages()); footer.setRight("File:" + HSSFFooter.file()); return wb; } public static Workbook create(String sheetNm, String file, List<?> header, List<?> data, String title, String copyright) { Workbook wb = createWorkbook(file); if (Utils.isEmpty(sheetNm)) { sheetNm = wb.getSheetAt(0).getSheetName(); } setHeaderOutline(wb, sheetNm, title); setHeader(wb, sheetNm, header); setData(wb, sheetNm, 1, data); setFooter(wb, sheetNm, copyright); if (wb != null) { return wb; } return null; } public static String getSystemFileCharset() { Properties pro = System.getProperties(); return pro.getProperty("file.encoding"); } // TODO 后面增加其他设置 }

这里面修复了一个bug,这个bug导致数据写入过大,耗内存,耗CPU。下面是修改后的方法。

public static Workbook setData(Workbook wb, String sheetName, int start, List<?> data) { if (wb == null) return null; if (sheetName == null) { sheetName = wb.getSheetAt(0).getSheetName(); } if (!Utils.isEmpty(data)) { if (data instanceof List) { int s = start; Sheet sheet = wb.getSheet(sheetName); for (Object rowData : data) { Row row = sheet.createRow(s); insertRow(row, (List<?>) rowData); s++; } } } return wb; } public static void insertRow(Row row, List<?> columns) { if (columns != null && columns.size() > 0) { for (int j = 0; j < columns.size(); j++) { Cell ceil = row.createCell(j); ceil.setCellValue(String.valueOf(columns.get(j))); } } }

下面是原来的写法:

public static Workbook setData(Workbook wb, String sheetName, int start, List<?> data) { if (wb == null) return null; if (sheetName == null) { sheetName = wb.getSheetAt(0).getSheetName(); } if (data != null || data.size() > 0) { if (data instanceof List) { int s = start; for (Object columns : data) { insert(wb, sheetName, data.size() - (s - 1), s, (List<?>) columns); s++; } } } return wb; } public static void insert(Sheet sheet, int row, int start, List<?> columns) { for (int i = start; i < (row + start); i++) { Row rows = sheet.createRow(i); if (columns != null && columns.size() > 0) { for (int j = 0; j < columns.size(); j++) { Cell ceil = rows.createCell(j); ceil.setCellValue(String.valueOf(columns.get(j))); } } } }

错误:for (Object columns : data)已经在遍历数据了,但是在insert中又用for (int i = start; i < (row + start); i++)遍历了一次,而且这次遍历毫无意义;尽管如此,数据还是被重复写了进去,至于写到了什么位置并不确定,反正是成倍地增大了内存和CPU消耗。

Cassandra原理介绍


我们的反欺诈SAAS服务,提供的是基于用户行为的实时数据分析服务,既然是实时服务对性能就有非常高的要求,吞吐量(Thoughput)要能满足大量并发用户的请求,响应时间(Response Time)要足够短不能影响用户正常的业务处理。而数据分析,是建立在大数据处理的基础上,会从各个维度分析用户的行为数据,因此在用户请求时势必要记录所有的用户行为数据,这是一个非常庞大的数据量,单机是完全无法满足业务的持续增长,需要一个能够支持方便水平扩展(Horizontal Scaling)的数据库系统,而Cassandra相对传统关系数据库和其它NoSQL更能贴合这两点的要求。

Cassandra 最初源自Facebook,集 Google BigTable 面向列的特性和 Amazon Dynamo(http://en.wikipedia.org/wiki/Dynamo_(storage_system)) 分布式哈希(DHT)的P2P特性于一身,具有很高的性能、可扩展性、容错、部署简单等特点。

它虽然有很多优点,但国内使用的公司貌似不多,远没有Hbase和MongoDB火,从 百度指数 上可以明显看到这三个系统在国内的热度对比。相对国内冷静的市场来说,Cassandra在国外发展得倒是如火如荼,国外这个专门对数据库进行评分的网站 DB-Engines 显示Cassandra排进了前十名,比Hbase的名次要高好几位,从2013年开始有了突飞猛进的 增长 ,目前已有超过 1500多家公司 在使用Cassandra,可惜的是没有多少国内公司,只有一家做云存储的创业公司 云诺 名列榜单。这也解释了为何网上的中文资源相对匮乏,我不得不找英文资料来看,倒是顺便加强了我的英文阅读能力,也算是失之东隅得之桑榆。

吸引我的特性

吸引我选择Cassandra作为NoSQL的原因主要有如下三点:

极高的读写性能

Cassandra写数据时,首先会将请求写入Commit Log以确保数据不会丢失,然后再写入内存中的Memtable,超过内存容量后再将内存中的数据刷到磁盘的SSTable,并定期异步对SSTable做数据合并(Compaction)以减少数据读取时的查询时间。因为写入操作只涉及到顺序写入和内存操作,因此有非常高的写入性能。而进行读操作时,Cassandra支持像 LevelDB 一样的实现机制,数据分层存储,将热点数据放在Memtable和相对小的SSTable中,所以能实现非常高的读性能。

简单的部署结构

相对Hbase等的主从结构,Cassandra是去中心化的P2P结构,所有节点完全一样没有单点,对小公司来说,我完全可以选择数据复制份数为2,先用两三台机器把Cassandra搭起来,既能保证数据的可靠性也方便今后机器的扩展,而Hbase起码得四五台机器吧。以后为了更好地支持客户可能需要在多个地方建立数据中心,而Cassandra对多数据中心的支持也很好,可以方便地部署多个数据中心,今早还看到一个俄罗斯最大电信公司的 案例 。另外我们的机器现在托管在一个小机房里,万一到时机器满了无法增加要考虑搬迁机房时,使用多数据中心的方式也能做到无缝迁移。

和Spark的结合

Cassandra作为一个底层的存储系统,能够方便地和Spark集成进行实时计算,这一点对我们的业务场景有致命的吸引力,我看到国外有很多使用案例就是用Spark+Cassandra来实现Velocity计算,比如 Ooyala (需自备梯子)。

基本架构

Cassandra没有像BigTable或Hbase那样选择中心控制节点,而选择了无中心的P2P架构,网络中的所有节点都是对等的,它们构成了一个环,节点之间通过P2P协议每秒钟交换一次数据,这样每个节点都拥有其它所有节点的信息,包括位置、状态等,如下图所示。


(集群环状拓扑示意图,从略)

客户端可以连接集群中的任一个节点,和客户端建立连接的节点叫协作者(coordinator),它相当于一个代理,负责定位该次请求要发到哪些实际拥有本次请求所需数据的节点上去获取,但如何获取并返回,主要根据客户端要求的一致性级别(Consistency Level)来定,比如:ONE指只要有一个节点返回数据就可以对客户端做出响应,QUORUM指需要超过半数的复制节点返回结果才做出响应,ALL指等于数据复制份数的所有节点都返回结果才能向客户端做出响应,对于数据一致性要求不是特别高的可以选择ONE,它是最快的一种方式。
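下面用cqlsh给出一个设置一致性级别的简单示意(表名、字段均为假设,CONSISTENCY是cqlsh的会话级命令):

CONSISTENCY QUORUM;
SELECT * FROM user_events WHERE user_id = 1001;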

Cassandra的核心组件包括:

Gossip:点对点的通讯协议,用来相互交换节点的位置和状态信息。当一个节点启动时就立即本地存储Gossip信息,但当节点信息发生变化时需要清洗历史信息,比如IP改变了。通过Gossip协议,每个节点定期每秒交换它自己和它已经交换过信息的节点的数据,每个被交换的信息都有一个版本号,这样当有新数据时可以覆盖老数据,为了保证数据交换的准确性,所有的节点必须使用同一份集群列表,这样的节点又被称作seed。

Partitioner:负责在集群中分配数据,由它来决定由哪些节点放置第一份的copy,一般情况会使用Hash来做主键,将每行数据分布到不同的节点上,以确保集群的可扩展性。

Replica placement strategy:复制策略,确定哪个节点放置复制数据,以及复制的份数。

Snitch:定义一个网络拓扑图,用来确定如何放置复制数据,高效地路由请求。

cassandra.yaml:主配置文件,设置集群的初始化配置、表的缓存参数、调优参数和资源使用、超时设定、客户端连接、备份和安全

写请求

当写事件发生时,首先由Commit Log捕获写事件并持久化,保证数据的可靠性。之后数据也会被写入到内存中,叫Memtable,当内存满了之后写入数据文件,叫SSTable,它是Log-Structured Storage Table的简称。 如果客户端配置了Consistency Level是ONE,意味着只要有一个节点写入成功,就由代理节点(Coordinator)返回给客户端写入完成。当然这中间有可能会出现其它节点写入失败的情况,Cassandra自己会通过Hinted Handoff或Read Repair 或者Anti-entropy Node Repair方式保证数据最终一致性。

对于多数据中心的写入请求,Cassandra做了优化,每个数据中心选取一个Coordinator来完成它所在数据中心的数据复制,这样客户端连接的节点只需要向数据中心的一个节点转发复制请求即可,由这个数据中心的Coordinator来完成该数据中心内的数据复制。

Cassandra的存储结构类似LSM树( Log-Structured Merge Tree )这种结构,不像传统数据一般都使用B+树,存储引擎以追加的方式顺序写入磁盘连续存储数据,写入是可以并发写入,不像B+树一样需要加锁,写入速度非常高,LevelDB、Hbase都是使用类似的存储结构。

Commit Log记录每次写请求的完整信息,此时并不会根据主键进行排序,而是顺序写入,这样进行磁盘操作时没有随机写导致的磁盘大量寻道操作,对于提升速度有极大的帮助,号称最快的本地数据库LevelDB也是采用这样的策略。Commit Log会在Memtable中的数据刷入SSTable后被清除掉,因此它不会占用太多磁盘空间,Cassandra配置时也可以为其单独设置存储区,这为使用高性能但容量小、价格昂贵的SSD硬盘存储Commit Log,使用速度慢但容量大、价格非常便宜的传统机械硬盘存储数据的混合布局提供了便利。

写入到Memtable时,Cassandra能够动态地为它分配内存空间,你也可以使用工具自己调整。当达到阈值后,Memtable中的数据和索引会被放到一个队列中,然后flush到磁盘,可以使用memtable_flush_queue_size参数来指定队列的长度。当进行flush时,会停止写请求。也可以使用nodetool flush工具手动刷新数据到磁盘,重启节点之前最好进行此操作,以减少Commit Log回放的时间。为了刷新数据,会根据partition key对Memtables进行重排序,然后顺序写入磁盘。这个过程是非常快的,因为只包含Commit Log的追加和顺序的磁盘写入。

当memtable中的数据刷到SSTable后,Commit Log中的数据将被清理掉。每个表会包含多个Memtable和SSTable,一般刷新完毕,SSTable是不再允许写操作的。因此,一个partition一般会跨多个SSTable文件,后续通过Compaction对多个文件进行合并,以提高读写性能。

这里所述的写请求不单指Insert操作,Update操作也是如此,Cassandra对Update操作的处理和传统关系数据库完全不一样,并不立即对原有数据进行更新,而是会增加一条新的记录,后续在进行Compaction时将数据再进行合并。Delete操作也同样如此,要删除的数据会先标记为Tombstone,后续进行Compaction时再真正永久删除。

读请求

读取数据时,首先检查Bloom filter,每一个SSTable都有一个Bloom filter用来检查partition key是否在这个SSTable,这一步是在访问任何磁盘IO的前面就会做掉。如果存在,再检查partition key cache,然后再做如下操作:

如果在cache中能找到索引,到compression offset map中找拥有这个数据的数据块,从磁盘上取得压缩数据并返回结果集。如果在cache中找不到索引,搜索partition summary确定索引在磁盘上的大概位置,然后获取索引入口,在SSTable上执行一次单独的寻道和一个顺序的列读取操作,下面也是到compression offset map中找拥有这个数据的数据块,从磁盘上取得压缩数据并返回结果集。读取数据时会合并Memtable中缓存的数据、多个SSTable中的数据,才返回最终的结果。比如更新用户Email后,用户名、密码等还在老的SSTable中,新的EMail记录到新的SSTable中,返回结果时需要读取新老数据并进行合并。

2.0之后的Bloom filter,compression offset map,partition summary都不放在Heap中了,只有partition key cache还放在Heap中。Bloom filter增长大约1~2G每billion partitions。partition summary是partition index的样本,你可以通过index_interval来配置样本频率。compression offset map每TB增长1~3G。对数据压缩越多,就会有越多个数的压缩块,和越大compression offset table。

读请求(Read Request)分两种,一种是Direct Read Request,根据客户端配置的Consistency Level读取到数据即可返回客户端结果。一种是Background Read Repair Request,除了直接请求到达的节点外,会被发送到其它复制节点,用于修复之前写入有问题的节点,保证数据最终一致性。客户端读取时,Coordinator首先联系Consistency Level定义的节点,发送请求到最快响应的复制节点上,返回请求的数据。如果有多个节点被联系,会在内存比较每个复制节点传过来的数据行,如果不一致选取最近的数据(根据时间戳)返回给客户端,并在后台更新过期的复制节点,这个过程被称作Read Repair。

下面是Consistency Level 为ONE的读取过程,Client连接到任意一个节点上,该节点向实际拥有该数据的节点发出请求,响应最快的节点数据回到Coordinator后,就将数据返回给Client。如果其它节点数据有问题,Coordinator会将最新的数据发送有问题的节点上,进行数据的修复。


(Consistency Level为ONE的读取流程示意图,从略)
数据整理(Compaction)

更新操作不会立即更新,这样会导致随机读写磁盘,效率不高,Cassandra会把数据顺序写入到一个新的SSTable,并打上一个时间戳以标明数据的新旧。它也不会立马做删除操作,而是用Tombstone来标记要删除的数据。Compaction时,将多个SSTable文件中的数据整合到新的SSTable文件中,当旧SSTable上的读请求一完成,会被立即删除,空余出来的空间可以重新利用。虽然Compaction没有随机的IO访问,但还是一个重量级的操作,一般在后台运行,并通过限制它的吞吐量来控制,可以通过compaction_throughput_mb_per_sec参数设置,默认是16MB/s。另外,如果key cache显示整理后的数据是热点数据,操作系统会把它放入到page cache里,以提升性能。它的合并策略有以下两种:

SizeTieredCompactionStrategy :每次更新不会直接更新原来的数据,这样会造成随机访问磁盘,性能不高,而是在插入或更新直接写入下一个sstable,这样是顺序写入速度非常快,适合写敏感的操作。但是,因为数据分布在多个sstable,读取时需要多次磁盘寻道,读取的性能不高。为了避免这样情况,会定期在后台将相似大小的sstable进行合并,这个合并速度也会很快,默认情况是4个sstable会合并一次,合并时如果没有过期的数据要清理掉,会需要一倍的空间,因此最坏情况需要50%的空闲磁盘。

LeveledCompactionStrategy:创建固定大小默认是5M的sstable,最上面一级为L0下面为L1,下面一层是上面一层的10倍大小。这种整理策略读取非常快,适合读敏感的情况,最坏只需要10%的空闲磁盘空间,它参考了LevelDB的实现,详见 LevelDB的具体实现原理 。

这里也有关于这两种方式的 详细描述 。
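整理策略可以在建表时按表指定,下面是一个CQL示意(表结构为假设):

CREATE TABLE user_events (
    user_id    bigint,
    event_time timestamp,
    event_type text,
    PRIMARY KEY (user_id, event_time)
) WITH compaction = {'class': 'LeveledCompactionStrategy'};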

数据复制和分发

数据分发和复制通常是一起的,数据用表的形式来组织,用主键来识别应该存储到哪些节点上,行的copy称作replica。当一个集群被创建时,至少要指定如下几个配置:Virtual Nodes,Partitioner,Replication Strategy,Snitch。

数据复制策略有两种,一种是SimpleStrategy,适合一个数据中心的情况,第一份数据放在Partitioner确定的节点,后面的放在顺时针找到的节点上,它不考虑跨数据中心和机架的复制。另外一种是NetworkTopologyStargegy,第一份数据和前一种一样,第二份复制的数据放在不同的机架上,每个数据中心可以有不同数据的replicas。
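两种复制策略都是在创建keyspace时指定的,下面是CQL示意(keyspace名、数据中心名均为假设):

-- 单数据中心,复制2份
CREATE KEYSPACE demo_single
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
-- 多数据中心,DC1复制3份,DC2复制2份
CREATE KEYSPACE demo_multi
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};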

Partitioner策略有三种,默认是Murmur3Partitioner,使用MurmurHash。RandomPartitioner,使用Md5 Hash。ByteOrderedPartitioner使用数据的字节进行有顺分区。Cassandra默认使用MurmurHash,这种有更高的性能。

Snitch用来确定从哪个数据中心和哪个机架上写入或读取数据,有如下几种策略:

DynamicSnitch:监控各节点的执行情况,根据节点执行性能自动调节,大部分情况推荐使用这种配置
SimpleSnitch:不会考虑数据中心和机架的情况,当使用SimpleStrategy复制策略时可以考虑使用
RackInferringSnitch:考虑数据中心和机架
PropertyFileSnitch:用cassandra-topology.properties文件来自定义
GossipPropertyFileSnitch:定义一个本地的数据中心和机架,然后使用Gossip协议将这个信息传播到其它节点,对应的配置文件是cassandra-rackdc.properties

失败检测和修复(Failure detection and recovery)

Cassandra从Gossip信息中确认某个节点是否可用,避免客户端请求路由到一个不可用的节点,或者执行比较慢的节点,这个通过dynamic snitch可以判断出来。Cassandra不是设定一个固定值来标记失败的节点,而是通过连续的计算单个节点的网络性能、工作量、以及其它条件来确定一个节点是否失败。节点失败的原因可能是硬件故障或者网络中断等,节点的中断通常是短暂的但有时也会持续比较久的时间。节点中断并不意味着这个节点永久不可用了,因此不会永久地从网络环中去除,其它节点会定期通过Gossip协议探测该节点是否恢复正常。如果想永久的去除,可以使用nodetool手工删除。

当节点从中断中恢复过来后,它会缺少最近写入的数据,这部分数据由其它复制节点暂为保存,叫做Hinted Handoff,可以从这里进行自动恢复。但如果节点中断时间超过max_hint_window_in_ms(默认3小时)设定的值,这部分数据将会被丢弃,此时需要用nodetool repair在所有节点上手工执行数据修复,以保证数据的一致性。

动态扩展

Cassandra最初版本是通过 一致性Hash 来实现节点的动态扩展的,这样的好处是每次增加或减少节点只会影响相邻的节点,但这个会带来一个问题就是造成数据不均匀,比如新增时数据都会迁移到新增的这台机器上,减少时这台机器上的数据又都会迁移到相邻的机器上,而其它机器都不能分担工作,势必会造成性能问题。从1.2版本开始,Cassandra引入了虚拟节点(Virtual Nodes)的概念,为每个真实节点分配多个虚拟节点(默认是256),这些节点并不是按Hash值顺序排列,而是随机的,这样在新增或减少一个节点时,会有很多真实的节点参与数据的迁移,从而实现了负载均衡。

Redis教程――Redis简介


Redis是一个开源的(BSD许可)基于内存的数据结构存储器。通常可作为数据库,缓存和消息中介。它支持的数据结构有:string,hash,list,set,sorted set,bitmap 和 hyperloglog。Redis有内置的复制、Lua脚本、LRU缓存、事务和不同层级的磁盘持久化功能,还通过Redis Sentinel提供了高可用性,通过Redis集群实现了自动化分割。

Redis是由ANSI C语言编写的,在无需额外依赖下,运行于大多数POSIX系统,如 linux、*BSD、OS X。Redis 是在 Linux和OS X两款操作系统下开发和充分测试的,推荐Linux为部署环境。Redis也可以运行在Solaris派生系统上,如 SmartOS,但是支持有待加强。没有官方支持的windows构建版本,但是微软开发和维护了一个64位Windows的版本。


What Every Accidental DBA Needs to Know Now: #2 Restores

Preface

In the first article in this series published last month I mentioned that I’ll be presenting a session at the 2016 IT/Dev Connections Conference titled (some) of the Top 10 Things Every Accidental DBA Needs to Know. Why “some”? Because the list is actually endless; it’s difficult to say what is considered important both for a DBA just starting their professional journey and for one who is an established database administrator moving towards Senior DBA status.

In advance of this year’s IT/Dev Connections conference I’ll elaborate on these ten points I intend to cover in my session and will extend this into a regular series of articles aimed at those IT professionals who have found themselves assigned stewardship of their company’s data.

These first two articles cover what are arguably the most important aspects of the Database Administration profession: the ability to recover lost or damaged data through the backup and restore process. I’ve always stated that a successful backup is irrelevant on its own; only a successful restore with no loss (or, at worst, acceptable loss) of data is the benchmark for a Database Administrator’s success.

We covered the basics of backups in the first article. I stress the importance of a successful restore, but of course you can’t have that without at least one viable backup. This article covers what comes next, and what can sometimes strike fear into the heart of a new DBA, particularly one who took the reins not by choice but rather by circumstance.

Types of Restores

Just as there are multiple varieties of backups in Microsoft SQL Server there are also multiple types of restore processes that can be utilized when facing the need to recover a database, or part of a database, that has gone foul.

There are four basic types of restores in Microsoft SQL Server:

Database Restores

Transaction Log Restores

File and Filegroup Restores

Page Restores

We will cover the first two restore processes here as they’re the ones a DBA is most likely to encounter. Restoring pages and files or filegroups are advanced topics that will not be covered in this series since we’re trying to keep it aimed at the new Database Administrator.

Author’s Note: I move forth with the expectation that you either have a working understanding of backing up SQL Server databases or have at least read the first article in this series .

Database Restores

A database restore involves at least one backup file, and it has to be a full backup file. Restoring a single backup file, if successful, will produce a database restored to the point in time when the backup was taken. It will be a copy of that database at that time, less any transactions that were uncommitted. Database restores are a common method of migrating copies of databases amongst servers for such things as testing or troubleshooting, so long as the database is reasonably sized (“reasonably sized” being a loose term for manageable within the constraints of time and disk space needed to move it around).

By no means though are you constrained to the single file restore-just-to-the-point-of-the-last-backup situation. You can string together multiple backups (of multiple types) in order to successfully bring a database up to a point in time right before failure or corruption should you choose and if your backup and restore strategy are proper. If your database is in a logged recovery model (Full or Bulk-Logged see the backup article in this series for explanations of recovery models) then you can perform a chain of backups to do just that recover right up to the point of failure. In this case you would do so by first applying your base database backup followed (possibly) by a differential backup and then finally by one or more transaction log backups up to a point in time of your choosing. That point in time could coincide with the end of the last transaction log backup to be applied or somewhere in the middle of that last transaction log backup.

The base backup will build your platform for the restore process. You can stop here if you don’t need to apply any further backups to meet your recovery needs. If you need to move towards a point in time recovery you’re then left to a mixture of differential and transaction log backups. As explained in the last article, differential backups capture all changed pages in the database since the last full backup. This means that if you schedule your full backups at noon each Monday and your differential backups at 11:00am every day then each sequential daily differential backup will get larger until the next full backup is cut that following Monday at noon. (Assuming there is activity in the database on a regular basis and that activity involves either changes to data via INSERT, UPDATE, or DELETE statements or modifying the structure of objects in the database.) Under this schedule if you have a failure on Wednesday at 1:00pm you would first restore the previous Monday’s full backup followed by the differential backup from Wednesday. Then you’d employ transaction log backups from that point to roll forward the database to a point in time right before 1:00pm.

Hopefully you’re backing up the transaction logs if this sort of recovery is important for you!

Conversely, let’s say the failure is instead at 9:00am on Wednesday. Now you’d have a bit more work because you could not use that Wednesday differential backup as a stepping off point before rolling forward transaction log backups since it was taken past the point of failure you’re trying to recover. Instead you’ll need to apply the Monday full backup followed by the Tuesday differential backup. Then you’ll need to apply as many transaction log backups as it takes to get to 9:00am on Wednesday from the point the differential backup completed on Tuesday.

I outlined this because you need to know two main things about differential backups: you should never have to apply more than one differential backup in a restore chain, and differential backups have no concept of when a transaction that modified data or the structure of the database took place, so you can’t stop half-way through a differential backup when trying to recover to a point in time.

Recovery, No Recovery, Standby

When discussing restores it’s important to understand the concept of the state you intend to leave your database in once you perform the restore of a single backup be it a database, transaction log, or a differential backup. There are three states to cover:

Recovery

A database in a recovered state means it’s ready for use and you’ll not be able to restore any additional backups to bring the database further along in its lifeline. In order to perform a chain of restores you can’t break the log sequence of transactions a database is comprised of. You can think of the log sequence as the story of a database. Each transaction is logged with a log sequence number or LSN. In a logged recovery model, when restoring full, differential, and transaction log backups the backups must be ordered and applied in such a manner that the order of LSNs is not broken. If you were to allow a database to come into recovery and users start creating new transactional activity against the database then you could not apply further backups, because that chain is now broken by new transactional activity logging new LSNs. To place a database in a recovered state at the end of the restore process you use the keyword RECOVERY in the WITH clause of the restore command. We will look at syntax at the end of the article. Don’t fret.

No Recovery

If a database is in a recovering state you’ll see it displayed as restoring in SQL Server Management Studio’s Object Explorer. This means that the database is in a state where at least one backup file has been restored and the database is ready to accept the next backup in the chain, be it a differential or a transaction log backup. It could also mean that whoever restored the last backup failed to place it into recovery with the RECOVERY keyword mentioned above. A database in recovery, while able to accept subsequent backup files in the restore process, is not capable of allowing activity from end users; this even includes read activity that would not modify data in any manner. Once a database has been in a recovered state you cannot place it into a recovery state to accept new restore process activity, for the reasons already stated above. To ensure a database in the midst of a restore process can accept additional differential or transaction log backups you need to ensure you have the NORECOVERY keyword included in the restore command’s WITH clause.

Standby

A database in standby mode is in limbo: it does allow read-only activity by users, and with a gentle nudge it can accept additional restore activity. In order to allow this and provide a stable and consistent state of data for querying it must roll back any uncommitted transactions. If it discarded those transactions you’d not be able to restore any further backups because the LSN chain would be broken. So what SQL Server does is store those uncommitted transactions in what is called an UNDO file. Before any further transaction logs can be restored SQL will need to place the database into a recovering state, apply the transactions stored in the UNDO file, then finally restore the next transaction log in the backup chain. To place a database into a standby state you’ll need to specify the keyword STANDBY in the restore statement’s WITH clause and also specify a path to the UNDO file.
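As a quick illustration, here is the same full backup restored into each of the three states. This is just a sketch; the database name comes from the example used later in this article, while the file paths are assumptions:

-- Leaves the database online; no further backups can be applied
RESTORE DATABASE SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_full.bak' WITH REPLACE, RECOVERY;
-- Leaves the database in a restoring state, ready for differential/log backups
RESTORE DATABASE SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_full.bak' WITH REPLACE, NORECOVERY;
-- Read-only standby; uncommitted work is preserved in the undo file
RESTORE DATABASE SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_full.bak' WITH REPLACE, STANDBY = N'C:\Backups\SQL_Cruise_undo.dat';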

Transaction Log Backups

At this point I’ve already danced around the concept of transaction log backups. You can use these backup files to roll a recovering database forward to any point in time that is covered in the timespan of the transaction log backup. Any uncommitted transactions at the end of the backup are left uncommitted with the expectation that you’ll be able to apply additional backups to the restoring database. Should you choose to bring a database into RECOVERY it will roll back those uncommitted transactions. Should you choose NORECOVERY it will leave them intact and uncommitted. Select STANDBY and you’ll end up with those uncommitted transactions rolled back and preserved in the UNDO file. Transaction log backups store all information necessary to apply the transactions in time order and without collision, as though you’re just playing back a recording of the activity in the database; essentially, that is exactly what you’re doing when you apply a transaction log backup.
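Putting it all together, a point-in-time restore chain looks something like the following sketch (file names and paths are assumptions; the stop time is chosen to land just before the accidental DROP described below):

RESTORE DATABASE SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_full.bak' WITH NORECOVERY, REPLACE;
RESTORE DATABASE SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_diff.bak' WITH NORECOVERY;
RESTORE LOG SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_log_1.trn' WITH NORECOVERY;
RESTORE LOG SQL_Cruise FROM DISK = N'C:\Backups\SQL_Cruise_log_2.trn' WITH STOPAT = N'2016-08-30 20:25:00', RECOVERY;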

Our Sample Database Backup History

This section and the following section using Transact-SQL uses the following backups of my SQL_Cruise database which is in Full recovery meaning all transactions are being logged and that if I so choose I could recover to a point in time if my backup files exist for that point in time.


[Screenshot omitted: backup history of the SQL_Cruise database]

Furthermore we’re going to say that someone (not me of course) dropped a very important table at 20:26 on 8/30/2016 with the following command:

DROP TABLE Very_Important_Table;

The Restore Process: GUI to T-SQL Process

I will be using the latest download of SQL Server Management Studio to demonstrate. Earlier supported versions will look similar but may differ slightly.

The restore forms are located through right-clicking on the database you wish to recover. Then navigate through subsequent pop-up menus until you get to the different recovery types you can employ.


[Screenshot omitted: Object Explorer right-click menus leading to the restore options]

For this example we need to use the database restore process. When the Restore Database form is displayed it will fill in necessary values to get you to the latest recovery point your backups can recover to, based upon stored backup history. In this case we start with the last full backup, the last differential backup prior to what SQL Server thinks is your restore point (the last point in time that’s recoverable), then all transaction logs needed to get you to that point.


[Screenshot: the Restore Database dialog’s General page with the suggested backup sets]

If we wanted to recover to the latest possible point in time, we would simply navigate on to the Files and Options pages in this form. If that is what you need to do now, then by all means skip ahead to where we discuss those pages. Those of you hanging on for a point-in-time recovery, join me as I click on the Timeline… button:

This brings up a form that looks something like the following in SQL Server Management Studio 2016. The only difference is that “Last backup taken” would be selected by default; I’ve already taken the step of filling in the time I want to restore to instead.


[Screenshot: the Backup Timeline dialog with a specific date and time entered]

If I now click OK we can join the rest of the group on the Files page. The Files page allows you to restore to a different location. This is valuable should you want to restore a copy of the database on a different server, perhaps with a different file structure. When it comes to the syntax, this adds a MOVE clause to the underlying code, as we will see later. Since we’re restoring over the existing database we will make no modifications here.


[Screenshot: the Restore Database dialog’s Files page]

Finally we’re left with the Options page. I’ve filled in the relevant values for this example but will explain each section below:


[Screenshot: the Restore Database dialog’s Options page with the values used in this example]

Restore options:

This determines what will happen to our existing database when the restore process completes:

Overwrite the existing database is what we want here, since we need to replace what (someone) has broken. If you select this option you will lose any transactions that took place after the point in time you’re recovering to once you bring the database into a recovered state.

If you were dealing with a database under replication you’d want to consider the Preserve the replication settings option to ensure you’ll be able to get this database back into your replication scheme.

Restrict access to the restored database will not allow general access to the database. You could use this to bring the database up and troubleshoot without letting users back in.
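For reference, a hedged sketch of what that checkbox translates to in T-SQL; the RESTRICTED_USER option recovers the database but only lets members of db_owner, dbcreator, and sysadmin connect:

RESTORE DATABASE [SQL_Cruise]
FROM DISK = N'C:\temp\SQL_Cruise_FULL.bak'
WITH RECOVERY, RESTRICTED_USER, REPLACE;

--Once troubleshooting is done, open the database back up:
ALTER DATABASE [SQL_Cruise] SET MULTI_USER;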

Recovery state:

We’ve covered this at length above. We’re selecting RESTORE WITH RECOVERY to bring the database back online. As you can see, you could instead specify an UNDO file here if you were to restore this database into STANDBY at the end of the restore process.

Tail-Log backup:

A tail-of-the-log backup is a special two-phased process: a final transaction log backup is taken, and then the database is placed into a state that prevents anyone from connecting to it other than the connection performing the tail-of-the-log backup and the subsequent recovery process. You can’t restore a database if there are connections to it. This does two things: it ensures you have a backup through to the point the database was last “up”, should someone change their mind about the point-in-time restore you’re about to perform, and it makes sure you can run the restore by kicking out any existing connections. There are seldom any reasons not to check this as a precaution if running in a logged recovery model.
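The GUI accomplishes this with single-user mode plus a log backup, as you’ll see in the generated script below. Another common pattern, sketched here with a hypothetical file name, is to take the tail-log backup WITH NORECOVERY, which both captures everything since the last log backup and leaves the database in a restoring state with users disconnected:

BACKUP LOG [SQL_Cruise]
TO DISK = N'C:\temp\SQL_Cruise_tail.trn'
WITH NORECOVERY;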

The final options allow you to force the database into single-user mode (in case you’re not performing a tail-of-the-log backup or are running in the Simple recovery model) and to have the process prompt you before each file restore step.

At this point we’re ready to bring the database up to the recovery point we’ve been given. What I want to show you before we do that is what the code that executes this process looks like. In most dialogs in SQL Server Management Studio you have the option to script out the work behind the scenes to a new query window, the clipboard, or a SQL Server Agent job to execute at a later time. If I script this to a new query window and clean it up to make it presentable, you can see all the steps taken in the code that follows (the annotations are my own; SQL is nice, but not that nice):

USE [master]

--Place the database into Single_User mode:

ALTER DATABASE [SQL_Cruise] SET SINGLE_USER WITH ROLLBACK IMMEDIATE

--Perform a tail of the log backup just in case:

BACKUP LOG [SQL_Cruise]

TO DISK = N'C:\Data\mssql12.MSSQLSERVER\MSSQL\Backup\SQL_Cruise_LogBackup_2016-08-30_20-37-27.bak'

WITH NOFORMAT, NOINIT, NAME = N'SQL_Cruise_LogBackup_2016-08-30_20-37-27', NOSKIP, NOREWIND, NOUNLOAD, STATS = 5

--Restore the first, full backup:

RESTORE DATABASE [SQL_Cruise]

FROM DISK = N'C:\temp\SQL_Cruise_FULL.bak'

WITH FILE = 1, NORECOVERY, NOUNLOAD, REPLACE, STATS = 5

--Restore the latest differential backup possible to meet our goal:

RESTORE DATABASE [SQL_Cruise]

FROM DISK = N'C:\temp\SQL_Cruise_DIFF_3.bak'

WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 5

--Restore the complete transaction log backups in order:

RESTORE LOG [SQL_Cruise]

FROM DISK = N'C:\temp\SQL_Cruise_log_03.trn'

WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 5

RESTORE LOG [SQL_Cruise]

FROM DISK = N'C:\temp\SQL_Cruise_log_04.trn'

WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 5

--Restore the last transaction log backup that includes the point in time and recover:

RESTORE LOG [SQL_Cruise]

FROM DISK = N'C:\temp\SQL_Cruise_log_05.trn'

WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 5,

STOPAT = N'2016-08-30T20:26:00'

--Bring the database out of single_user mode:

ALTER DATABASE [SQL_Cruise] SET MULTI_USER

GO

Note that there are two tape-dependent terms used within the generated code: NOUNLOAD and NOREWIND. They are ignored in non-tape operations but are still generated by default, so you can disregard them. NOSKIP, as you’ll see in the tail-of-the-log backup, controls whether a backup operation checks the expiration of a backup set on the media before overwriting it. Here it is also added by default and can be ignored.

As one can see from the code, the database is going to be placed into single-user mode, severing all sessions connected to the database and rolling back their uncommitted transactions in the process. This then allows a final transaction log backup to take place, preserving the state of the database prior to overwriting it as part of the restore process. After the tail-of-the-log backup is done we step through the full, differential, and log restores, each leaving the database in a NORECOVERY state so the next restore operation can take place. Finally we get to the transaction log backup that hosts the point in time we want to recover to. Utilizing the STOPAT option we tell the process to go no further in its transaction replay than that point in time. You’ll note we also restore WITH RECOVERY at this point so no more restores can occur. Finally we bring the database out of single-user mode to let all users get back into the database.

I mentioned that should you want to move the database to a new location you can employ the WITH MOVE option by changing the restore path on the Files page of the restore dialog. Doing so only affects the initial full backup restoration, because all subsequent restores reference the database without needing to set or create the file structure. I've isolated the code below for your review:

RESTORE DATABASE [SQL_Cruise] FROM DISK = N'C:\temp\SQL_Cruise_FULL.bak'

WITH FILE = 1,

MOVE N'SQL_Cruise'

TO N'C:\Data\MSSQL12.MSSQLSERVER\MSSQL\DATA\SQL_Cruise_NEW.mdf',

MOVE N'SQL_Cruise_log'

TO N'C:\Data\MSSQL12.MSSQLSERVER\MSSQL\DATA\SQL_Cruise_NEW_log.ldf',

NORECOVERY, NOUNLOAD, REPLACE, STATS = 5

STATS

Let me address the STATS = 5 syntax. This is the value that denotes what percentage increment of the restore's progress is messaged back to the console in SQL Server Management Studio when you execute the restore process. The default is 5, which means you'll get a message each time another 5% of the restore completes. I find this verbose and, depending on the size of the database, tend to set this value anywhere between 10 and 50.

FILE Setting

You may notice the FILE = 1 setting for each restore. This is the most common value you’ll encounter. Think of the file number as the position of a backup among multiple backups sent to the same backup file. It’s possible to back up more than once to the same file without overwriting it: each backup sent to the same file is positioned after the last, and its file number is incremented by 1. Since there is only one backup per file in this example, the file number is 1.
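If you’re unsure which position a backup occupies inside a file, here is a hedged sketch (the multi-backup file name is hypothetical): RESTORE HEADERONLY lists every backup set in the file along with its Position column, which is the value you feed to WITH FILE.

RESTORE HEADERONLY FROM DISK = N'C:\temp\SQL_Cruise_MULTI.bak';

--Restore the second backup set stored in that same file:
RESTORE DATABASE [SQL_Cruise]
FROM DISK = N'C:\temp\SQL_Cruise_MULTI.bak'
WITH FILE = 2, NORECOVERY, REPLACE;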

Simple Transact-SQL Backup Commands

Just as in the backup article, this is where I provide you with some simple, templated commands that cover the theory we just discussed. If you’re not familiar with templates in SQL Server Management Studio check out these two articles:

Introduction to SQL Server Management Studio Templates

Deeper into the SQL Server Management Studio Template Explorer

Tail of the Log Backup

--Place the database into Single_User mode:

ALTER DATABASE [<database_name,,>] SET SINGLE_USER WITH ROLLBACK IMMEDIATE

--Perform a tail of the log backup just in case:

BACKUP LOG [<database_name,,>]

TO DISK = N'<tail_of_the_log_backup_file_location,,>'

WITH NOFORMAT, NOINIT, NAME = N'<tail_of_the_log_backup_file_logical_name,,>', STATS = <stats_value,,10>

Restore Full or Differential Backup to Same Location (same syntax)

RESTORE DATABASE [<database_name,,>]

FROM DISK = N'<backup_file_location,,>'

WITH FILE = <file_number,, 1>, <recovery_state,,NORECOVERY>, <replace_setting,,REPLACE>, STATS = <stats_value,,10>

Restore Full Backup to New Location

RESTORE DATABASE [<database_name,,>]

FROM DISK = N'<backup_file_location,,>'

WITH FILE = <file_number,, 1>,

MOVE N'<logical_data_file_name,,>'

TO N'<physical_data_file_path,,>',

MOVE N'<logical_log_file_name,,>'

TO N'<physical_log_file_path,,>',

<recovery_state,,NORECOVERY>, <replace_setting,,REPLACE>, STATS = <stats_value,,10>

Note that you may have multiple data files to restore. If so you can easily modify this code to add more data files.
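As a hedged sketch of that modification (the second data file's logical and physical names below are hypothetical; RESTORE FILELISTONLY against the backup will give you the real logical names), you simply add one more MOVE clause per file:

RESTORE DATABASE [SQL_Cruise]
FROM DISK = N'C:\temp\SQL_Cruise_FULL.bak'
WITH FILE = 1,
MOVE N'SQL_Cruise' TO N'C:\Data\SQL_Cruise_NEW.mdf',
MOVE N'SQL_Cruise_data2' TO N'C:\Data\SQL_Cruise_NEW_2.ndf',
MOVE N'SQL_Cruise_log' TO N'C:\Data\SQL_Cruise_NEW_log.ldf',
NORECOVERY, REPLACE, STATS = 10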

Restore Transaction Log

RESTORE LOG [<database_name,,>]

FROM DISK = N'<backup_file_location,,>'

WITH FILE = <file_number,, 1>, <recovery_state,,NORECOVERY>, STATS = <stats_value,,10>

Things to Consider When Deciding on a Primary Key for your INNODB table


When it comes to schema design, one of the more common issues I see for INNODB tables is the selection of a primary key, or the absence of a primary key entirely. Today I would like to illustrate some of the best practices for selecting a primary key, for your consideration as you design new tables for your schema or modify existing ones.

Let’s start by reviewing how the table structure works in INNODB.

Everything in INNODB is an index, even the base table where you store your data, which is considered to be a clustered index. Indexes require some kind of node identification, and if a primary key is explicitly defined it will act as that clustered index node identifier and will determine the order of the data as it’s stored on disk. If no primary key is defined, INNODB will check to see if a non-nullable secondary unique index is available, and if so it will use that to form the identification of nodes in the clustered index. If no secondary unique index is available, INNODB will still create a node identifier on the back end called DB_ROW_ID , which is an implicit identifier that increments as data is added to the table.
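As an illustrative, hedged sketch of the second case (the table and index names are hypothetical): with no PRIMARY KEY defined, InnoDB promotes the first non-nullable unique index to organize the clustered index.

CREATE TABLE events (
    event_id INT UNSIGNED NOT NULL,
    payload  VARCHAR(100),
    UNIQUE KEY uk_event (event_id)  -- no PK, so this index acts as the clustered index key
) ENGINE=InnoDB;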

Here are a few points to consider which will illustrate why you should consider explicitly defining a primary key, and how it should be formed.

Without an explicit primary key, MySQL will create an identifier for you anyway, but you won’t be able to use it.

As noted above, if you don’t have a primary key or a non-nullable secondary unique index defined, InnoDB is going to create an identifier for you, but it’s going to create it on the back end where you won’t be able to actually use it. If you’re going to have a row identifier in your table, you might as well have it exposed so you can make use of it.
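Exposing a usable identifier on an existing table is a one-statement change. A hedged sketch against the 'test' table used in the next section:

ALTER TABLE test
    ADD COLUMN id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;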

You can save disk space by creating a column for a primary key.

When you don’t have a primary key defined, you can actually end up storing more data on disk. In the following example I’ve created a schema called ‘pktest’ and have created two tables ‘test’ and ‘testpk’. Both tables have a column where a single bit is stored, but ‘testpk’ also has a second column called ‘id’ which is mediumint, is set as the primary key for the table, and is set to auto_increment. Both tables were populated with 15M records of data…

mysql> show create table test \G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `data` bit(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> show create table testpk \G
*************************** 1. row ***************************
       Table: testpk
Create Table: CREATE TABLE `testpk` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `data` bit(1) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 15000000 |
+----------+
1 row in set (21.46 sec)

mysql> select count(*) from testpk;
+----------+
| count(*) |
+----------+
| 15000000 |
+----------+
1 row in set (4.43 sec)

mysql> select data+0 from test limit 10;
+--------+
| data+0 |
+--------+
|      0 |
|      0 |
|      0 |
|      1 |
|      1 |
|      0 |
|      1 |
|      1 |
|      0 |
|      1 |
+--------+
10 rows in set (0.06 sec)

mysql> select id, data+0 from testpk limit 10;
+----+--------+
| id | data+0 |
+----+--------+
|  1 |      1 |
|  2 |      1 |
|  3 |      0 |
|  4 |      1 |
|  5 |      0 |
|  6 |      1 |
|  7 |      1 |
|  8 |      1 |
|  9 |      1 |
| 10 |      0 |
+----+--------+
10 rows in set (0.00 sec)

mysql> exit
Bye

[root@cent1 pktest]# pwd
/var/lib/mysql/pktest
[root@cent1 pktest]# ls -lh
total 769M
-rw-rw---- 1 mysql mysql   65 May 25 08:55 db.opt
-rw-rw---- 1 mysql mysql 8.4K May 25 08:55 test.frm
-rw-rw---- 1 mysql mysql 408M May 25 09:16 test.ibd
-rw-rw---- 1 mysql mysql 8.4K May 25 09:10 testpk.frm
-rw-rw---- 1 mysql mysql 360M May 25 09:20 testpk.ibd

As you can see above, the table where the primary key is defined uses less space. The reason for this is that the implicit identifier DB_ROW_ID is a 6-byte integer, whereas mediumint only uses 3 bytes. So if you have a table where each row can be identified by a datatype smaller than a BIGINT UNSIGNED (i.e., your table is going to have 4294967295 rows or fewer), you will save disk space by using an explicitly defined integer as a primary key.

It should also be noted that this is going to save space when it comes to secondary indexes as well, since each node in a secondary index references the clustered index identifier.

The smaller the data footprint, the more you can store in the innodb buffer pool.

Some external tools may not work without a primary key.

One other item to note is that external tools like pt-online-schema-change won’t work unless there is an explicitly defined primary key or a unique index.

[root@cent1 ~]# pt-online-schema-change --alter "ENGINE=INNODB" D=pktest,t=test --user=root --password=$pw --dry-run
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Starting a dry run.  `pktest`.`test` will not be altered.  Specify --execute instead of --dry-run to alter the table.
Creating new table...
Created new table pktest._test_new OK.
Altering new table...
Altered `pktest`.`_test_new` OK.
2016-05-27T15:17:23 Dropping new table...
2016-05-27T15:17:23 Dropped new table OK.
Dry run complete.  `pktest`.`test` was not altered.
The new table `pktest`.`_test_new` does not have a PRIMARY KEY or a unique index which is required for the DELETE trigger.

Row based replication will be less efficient without an explicit primary key.

When you replicate using row-based binary logging, a full table scan is required to update each record on the slave server, due to the fact that the implicit identifier DB_ROW_ID is not included in the binary log. This also applies to mixed-mode binary logging when MySQL finds a transaction to be non-deterministic in nature and converts the binary logging event to row-based.

Try to use integers as identifiers wherever possible.

You should avoid using strings (UUIDs) as identifiers whenever you can. When you insert new data into your table it’s going to order it on disk based on the order of the identifier of the primary key. So having string values where something can be inserted in the middle of the table will likely result in excess page splitting. You can help alleviate this problem by adjusting the innodb_fill_factor variable (now available in MySQL 5.7) to allow for greater space availability in each page, but overall this is going to use a lot more space on disk and isn’t going to help you anywhere near as much as changing the identifier to an integer would.

Remember, data being inserted into the clustered index has to be pushed to disk as InnoDB moves through its cycles and flushes out dirty pages, so there will be higher I/O usage than with string data that’s added to a secondary index, where the change buffer may hold onto the data in memory (with supporting records in the system tablespace) for longer periods of time before finally flushing the secondary index data to disk.

One potential exception to the above comment about avoiding UUIDs is UUID type 1 (this is what is provided by the MySQL function UUID()): if you remove the dashes and reverse the order of the first three sections of the output and use this as your primary key, you can save space and get insert speeds similar to what you may have when using an auto-incrementing integer identifier with a secondary index to support the UUID. This method was outlined in a
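A hedged sketch of that rearrangement, assuming the 36-character string returned by UUID() and that the result will be stored in a BINARY(16) primary key column:

SET @u = UUID();  -- e.g. 'aaaaaaaa-bbbb-1ccc-dddd-eeeeeeeeeeee'

SELECT UNHEX(CONCAT(
           SUBSTRING(@u, 15, 4),  -- time_hi_and_version
           SUBSTRING(@u, 10, 4),  -- time_mid
           SUBSTRING(@u,  1, 8),  -- time_low
           SUBSTRING(@u, 20, 4),  -- clock_seq
           SUBSTRING(@u, 25)      -- node
       )) AS ordered_uuid_bin;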

Redis Servers Targeted with Fake Ransomware


A crook is hacking Internet-exposed Redis servers, adding a rogue SSH key to infected systems, deleting user data, and leaving a ransom note behind in an attempt to fool the server owner into thinking his data was encrypted by ransomware.

The attacker tells the Redis DB owner he should pay a 2 Bitcoin (~$1,100) ransom to recover his files, but in reality, all the data is gone, according to a honeypot server set up by Duo Security that has captured the crook's real actions.

The return of Crackit Ryan

The problem at the core of this issue is the fact that server owners leave crucial and very sensitive Redis databases exposed online. Duo researchers say they've found over 18,000 Redis databases available online that featured no password authentication.

Researchers say that they've identified evidence of attacks on 13,000 of these servers (around 72 percent). The evidence they're mentioning is an SSH key which the attacker has left behind after breaking into the vulnerable server.

The SSH key is named "crackit" and was found on almost all 13,000 servers. The key also has a Jabber ID attached: ryan@exploit.im.

At the start of July, in a similar report, Risk Based Security discovered the same SSH key and the Jabber ID on 6,338 servers , albeit without any clues of the attacker deleting files and asking for ransom.

Based on a user comment on our story dated August 2, fake ransomware seems to have been a recent addition to the crook's mode of operation.

Some victims paid the ransom

According to Duo, after compromising each Redis server, the crook deletes data from the /var/www/ , /usr/share/nginx/ , /var/lib/mysql/ , and /data/ folders. Based on honeypot data, there is no attempt to encrypt any of the data, or back it up on another server.

After these operations, the crook rewrites the server's MOTD and adds a file called READ_TO_DECRYPT to the server's root folder. This file points the user to a URL hosting the actual ransom note.

According to Bitcoin blockchain statistics, the crook has received three payments to the Bitcoin address listed inside the ransom note. The crook made 2.5995 Bitcoin (~$1,450).

Knowing what the hacker is actually up to, users should not pay the ransom in any way if they discover the "crackit" SSH key on their servers. Users can still recover files from off-site data stores if they regularly create backups of their servers.

IT consultants eye big data future, predict analytics trends


Increased adoption of cloud analytics applications and analytics-as-a-service platforms.
A bigger role for graph database and analytics technology in big data applications.
Consolidation in the NoSQL database market.
Some high-profile big data project failures involving still-emerging open source technologies.

Those are some of the business intelligence, analytics and data management developments predicted to happen over the next 12 months by attendees at the 2016 Pacific Northwest BI Summit . The conference, held annually in Grants Pass, Ore., brings together a small group of IT consultants and vendor executives to discuss technology issues and trends, and one of the sessions each year features round-robin predictions by the participants. Here are highlights of the view into the expected BI, analytics and big data future that emerged from this year's event.

More analytics in the cloud. Various participants said they expect to see expanding use of cloud-based analytics systems, including analytics services offerings, as companies look to accelerate the deployment of analytics applications and reduce costs in the process. Hand in hand with that, they forecast growing deployments of data warehouses and big data systems in the cloud. In fact, Gartner analyst Merv Adrian said recent surveys by the consulting and market research company show there's already "a helluva lot more" Hadoop systems running in the Amazon Web Services cloud than was thought previously. Through its Amazon Elastic MapReduce platform, AWS has "more users of Hadoop than all the other vendors in the market combined," Adrian said, putting the number of EMR installations in the thousands.

Data governance to step forward -- maybe. Calls for stronger data governance processes have accompanied the increasing role of BI and analytics applications in driving business decision making, especially with big data environments expanding the amount and types of data that organizations are collecting and using across a larger set of systems. Some of the conference attendees predicted that data governance will finally become a more central focus in many organizations over the next 12 months. Others weren't so sure, though. "Data governance will still be boring to CEOs -- they just don't get it," said Claudia Imhoff, president of consultancy Intelligent Solutions Inc. in Boulder, Colo.

Graph technology trends higher. Graph databases are already getting increased attention from technology vendors and users alike because of their ability to map relationships between different data elements for more insightful analytics in applications that fit the graph model. Donald Farmer, vice president of innovation and design at BI software vendor Qlik, predicted that graph technology will become a central component of all database platforms as part of the big data future. Graph analytics could also aid in basic data management functions, such as metadata management, added Mike Ferguson, managing director of U.K.-based consultancy Intelligent Business Strategies Ltd.

NoSQL market shakeout ahead. In a bit of hyperbole, Ferguson cited a total of 250 NoSQL databases available for use. The actual number might not be that high -- but there are more NoSQL vendors than the market can support, he said in predicting that some consolidation is ahead. Adrian agreed, saying a small number of vendors are getting out in front of the rest of the NoSQL pack. Illustrating that, Gartner included 10 NoSQL software suppliers among the top vendors of operational databases in a Magic Quadrant report published last October, with four of them making the Leaders category. SQL-on-Hadoop query engines are also ripe for a reduction in current offerings, Ferguson said. Without any hyperbole, he noted that he's tracking 23 different SQL-on-Hadoop platforms, most of them emerging technologies that are still maturing. The plethora of choices "has to get simplified," he said.

Big data deployment nightmares. Speaking of emerging technologies, Adrian predicted there will be at least one "catastrophic failure of an open source project" that mainstream enterprises have bet on and deployed early in its development process. Nonetheless, he expects development and adoption of open source technologies to continue to drive the big data future forward. "I don't think that means the fertile soup of open source creation slows down -- in fact, it accelerates," Adrian said. "It's a more important avenue now [for big data technology development] than commercial mechanisms are."

New in Couchbase Mobile v1.3: Inter-instance Replication in Sync Gateway


[Image: DNA undergoing replication. Graphic courtesy of Madprime, used with permission under license CC BY-SA 2.0]

Sync Gateway

Sync Gateway forms the "glue" between Couchbase Lite and Couchbase Server in the Couchbase Mobile stack. It's a secure web gateway that enables sync and data access over the web.

That's one way to look at it, which doesn't really do Sync Gateway justice. You can intelligently use it to link together clients without any back end, for example. In a previous post, I talked about OpenID Connect support. Sync Gateway acts there to ease the whole authorization flow. There's more, but in this post I want to focus on a new capability added in version 1.3.

A Deeper Look at Syncing

Syncing (short for synchronizing) refers to keeping data consistent across two or more instances of a database. Syncing can be a difficult problem. Any time two writers try to make conflicting changes, the architecture has to deal with it.

Some databases simply ignore the issue, forcing a single-writer only use. Others depend on detecting conflicts in real time and rejecting them. These approaches don't work in cases where different copies of the database can't always communicate to coordinate.

A good sync architecture is a critical component of a full solution that addresses uses where a device may be on a slow network connection, or be disconnected entirely part of the time. Sync Gateway implements a core piece of that architecture in Couchbase.

We refer to the full stack as Couchbase Mobile. Mobile uses are an obvious case where offline performance can be important. Really, though, you can use Couchbase Mobile, including Sync Gateway, in many other scenarios. Couchbase Lite and Sync Gateway run on a broad range of platforms, and are typically easy to port to new ones. This makes Couchbase Mobile useful for anything from desktop (or maybe I should say laptop) to IoT.

Here are some of the key features to know about Sync Gateway replications in general:

JSON configuration to specify replications
Supports multiple replications running concurrently
Can run both OneShot and Continuous replications
Does not store anything persistently
Stateless -- can be interrupted/restarted anytime without negative side effects
Filter replications using channels

Inter-instance Replication

Release 1.3 adds a new capability to Sync Gateway, the ability to replicate (sync) between Sync Gateway instances. Each replication gets configured as a uni-directional flow between two endpoints. This makes them quite flexible.

For example, a simple replication can specify two databases on the same Sync Gateway instance. This might not seem that interesting, but because each database can have its own sync function, you might use one database as a feeder for another, allowing complex business logic to manage what gets passed through.


[Diagram: two Couchbase Server clusters, each fronted by a Sync Gateway cluster, linked by a pair of uni-directional replications]

This diagram shows a more typical use case. Here, we have two Couchbase Server clusters, and two Sync Gateway clusters. The Sync Gateway clusters run a bi-directional replication (really, two uni-directional replications). This could form the basis for a globally distributed system. Standard network routing procedures would ensure clients connect to the nearest Sync Gateway, improving network performance. For more details on replication in Sync Gateway, take a look at the documentation here.

Document Revisions and Conflicts

It's important to understand how Couchbase Mobile handles conflict resolution. CBM uses what's known as multiversion concurrency control. You can think of CBM as storing not just one version of a document, but rather a tree of revisions. Two disconnected writes updating the same original revision of a document create two new revisions. Both exist in the tree. Couchbase gives you several ways to resolve the conflict. See the documentation here for more details.

More Resources

Adam Wiggins, co-founder of Heroku, wrote a great article on why syncing offers a better UX.

I'll be posting a follow-up article soon illustrating a simple example of inter-instance replication. You can find the code on GitHub. (Please note, this is an intentionally simple example. It is not intended as production quality code.)
