Flink Source Code Analysis: Starting from an Example, Understanding the Flink On Yarn Task Execution Flow
WeChat public account: 深广大数据Club
Follow the account for more big-data news. For questions or suggestions, please leave a message on the public account.
If you find 深广大数据Club helpful, you are welcome to show your appreciation.
This article analyzes the source code of the task submission and execution flow when Apache Flink runs in On Yarn mode.
For local mode and cluster mode, please read the following two articles:
Flink Source Code Analysis | Starting from an Example: Understanding the Local Task Execution Flow
Flink Source Code Analysis | Starting from an Example: Understanding the Cluster Task Execution Flow
Deployment Script Entry Point
A long-running Flink cluster on YARN is started through the yarn-session.sh script:
./bin/yarn-session.sh -n 4 -jm 1024m -tm 4096m
Let us start from the yarn-session.sh script and look at its content first.
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
# get Flink config
. "$bin"/config.sh
if [ "$FLINK_IDENT_STRING" = "" ]; then
FLINK_IDENT_STRING="$USER"
fi
JVM_ARGS="$JVM_ARGS -Xmx512m"
CC_CLASSPATH=`manglePathList $(constructFlinkClassPath):$INTERNAL_HADOOP_CLASSPATHS`
log=$FLINK_LOG_DIR/flink-$FLINK_IDENT_STRING-yarn-session-$HOSTNAME.log
log_setting="-Dlog.file="$log" -Dlog4j.configuration=file:"$FLINK_CONF_DIR"/log4j-yarn-session.properties -Dlogback.configurationFile=file:"$FLINK_CONF_DIR"/logback-yarn.xml"
export FLINK_CONF_DIR
$JAVA_RUN $JVM_ARGS -classpath "$CC_CLASSPATH" $log_setting org.apache.flink.yarn.cli.FlinkYarnSessionCli -j "$FLINK_LIB_DIR"/flink-dist*.jar "$@"
Script flow:
Load the Flink configuration
Set the JVM arguments
Set up the log configuration
Invoke FlinkYarnSessionCli to do the actual work
Script usage
Usage:
Required
-n,--container <arg> Number of YARN container to allocate (=Number of Task Managers)
Optional
-D <arg> Dynamic properties
-d,--detached Start detached
-jm,--jobManagerMemory <arg> Memory for JobManager Container with optional unit (default: MB)
-nm,--name Set a custom name for the application on YARN
-q,--query Display available YARN resources (memory, cores)
-qu,--queue <arg> Specify YARN queue.
-s,--slots <arg> Number of slots per TaskManager
-tm,--taskManagerMemory <arg> Memory per TaskManager Container with optional unit (default: MB)
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths for HA mode
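These options can be combined as needed. For example, a hypothetical invocation such as ./bin/yarn-session.sh -n 4 -s 2 -jm 1024m -tm 4096m -qu default -d would request four TaskManager containers with two slots each on the default queue, give the JobManager 1024 MB and each TaskManager 4096 MB, and detach the client once the session is up.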
FlinkYarnSessionCli.java
Entry class: org.apache.flink.yarn.cli.FlinkYarnSessionCli
The main method calls the run method to carry out the On-YARN deployment.
// main
...
final FlinkYarnSessionCli cli = new FlinkYarnSessionCli(
flinkConfiguration,
configurationDirectory,
"",
""); // no prefix for the YARN session
SecurityUtils.install(new SecurityConfiguration(flinkConfiguration));
retCode = SecurityUtils.getInstalledContext().runSecured(() -> cli.run(args));
...
In the run method, the first step is to parse the command-line arguments:
final CommandLine cmd = parseCommandLineOptions(args, true);
Once the command line has been parsed, the first check is whether help mode was requested; if so, the usage information is printed and the method returns.
if (cmd.hasOption(help.getOpt())) {
printUsage();
return 0;
}
Next, if the -q option is present, yarnClusterDescriptor.getClusterDescription() is called to print the available YARN resources, and the CLI exits:
if (cmd.hasOption(query.getOpt())) {
final String description = yarnClusterDescriptor.getClusterDescription();
System.out.println(description);
return 0;
}
If neither -q nor -h was given, the main flow begins.
The main flow first checks whether an applicationId option was passed.
This option is used as follows:
Usage:
Required
-id,--applicationId <yarnAppId> YARN application Id
# Attach to the running Flink YARN session application_1463870264508_0029
# Example: ./bin/yarn-session.sh -id application_1463870264508_0029
When an applicationId is supplied, yarnClusterDescriptor.retrieve() is used to look up the running session and obtain a clusterClient object:
if (cmd.hasOption(applicationId.getOpt())) {
yarnApplicationId = ConverterUtils.toApplicationId(cmd.getOptionValue(applicationId.getOpt()));
clusterClient = yarnClusterDescriptor.retrieve(yarnApplicationId);
}
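ConverterUtils.toApplicationId simply parses the familiar application_&lt;clusterTimestamp&gt;_&lt;sequenceNumber&gt; string (for example, application_1463870264508_0029 from the usage example above) back into a YARN ApplicationId object.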
If no applicationId is given, deploySessionCluster is called to deploy a new session:
final ClusterSpecification clusterSpecification = getClusterSpecification(cmd);
clusterClient = yarnClusterDescriptor.deploySessionCluster(clusterSpecification);
//------------------ ClusterClient deployed, handle connection details
yarnApplicationId = clusterClient.getClusterId();
try {
final LeaderConnectionInfo connectionInfo = clusterClient.getClusterConnectionInfo();
System.out.println("Flink JobManager is now running on " + connectionInfo.getHostname() +':' + connectionInfo.getPort() + " with leader id " + connectionInfo.getLeaderSessionID() + '.');
System.out.println("JobManager Web Interface: " + clusterClient.getWebInterfaceURL());
writeYarnPropertiesFile(
yarnApplicationId,
clusterSpecification.getNumberTaskManagers() * clusterSpecification.getSlotsPerTaskManager(),
yarnClusterDescriptor.getDynamicPropertiesEncoded());
} catch (Exception e) {
try {
clusterClient.shutdown();
} catch (Exception ex) {
LOG.info("Could not properly shutdown cluster client.", ex);
}
try {
yarnClusterDescriptor.killCluster(yarnApplicationId);
} catch (FlinkException fe) {
LOG.info("Could not properly terminate the Flink cluster.", fe);
}
throw new FlinkException("Could not write the Yarn connection information.", e);
}
This involves the following steps:
deploySessionCluster deploys the session and returns a clusterClient
obtain the ApplicationId
obtain the LeaderConnectionInfo through the clusterClient
write the YARN properties file (sketched below)
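The YARN properties file is what later bin/flink run invocations use to find this session. A minimal sketch of what writeYarnPropertiesFile ends up doing; the key names and the default location (a .yarn-properties-&lt;user&gt; file under java.io.tmpdir) are assumptions based on FlinkYarnSessionCli and may differ between Flink versions:
// Minimal sketch of writeYarnPropertiesFile. Key names and file location are
// assumptions based on FlinkYarnSessionCli and may vary across Flink versions.
Properties yarnProps = new Properties();
yarnProps.setProperty("applicationID", yarnApplicationId.toString());
yarnProps.setProperty("parallelism", Integer.toString(
    clusterSpecification.getNumberTaskManagers() * clusterSpecification.getSlotsPerTaskManager()));
if (dynamicPropertiesEncoded != null) {
    yarnProps.setProperty("dynamicPropertiesString", dynamicPropertiesEncoded);
}
File propertiesFile = new File(
    System.getProperty("java.io.tmpdir"),
    ".yarn-properties-" + System.getProperty("user.name"));
try (OutputStream out = new FileOutputStream(propertiesFile)) {
    yarnProps.store(out, "Generated YARN properties file");
}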
On-YARN deployment also has to consider whether the client is detached: passing -d or --detached to yarn-session.sh starts a detached YARN session, so the client does not need to keep running.
if (yarnClusterDescriptor.isDetachedMode()) {
LOG.info("The Flink YARN client has been started in detached mode. In order to stop " + "Flink on YARN, use the following command or a YARN web interface to stop it:\n" + "yarn application -kill " + yarnApplicationId);
}
Note: in this case the Flink YARN client only submits Flink to the cluster and then shuts itself down. The YARN session can then no longer be stopped through Flink; use the YARN tooling (yarn application -kill &lt;applicationId&gt;) to stop it.
If detached mode is not specified, the client must keep running; it can be stopped with Ctrl+C or by typing stop.
The code for the non-detached path is as follows:
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
final YarnApplicationStatusMonitor yarnApplicationStatusMonitor = new YarnApplicationStatusMonitor(
yarnClusterDescriptor.getYarnClient(),
yarnApplicationId,
new ScheduledExecutorServiceAdapter(scheduledExecutorService));
try {
runInteractiveCli(
clusterClient,
yarnApplicationStatusMonitor,
acceptInteractiveInput);
} finally {
try {
yarnApplicationStatusMonitor.close();
} catch (Exception e) {
LOG.info("Could not properly close the Yarn application status monitor.", e);
}
clusterClient.shutDownCluster();
try {
clusterClient.shutdown();
} catch (Exception e) {
LOG.info("Could not properly shutdown cluster client.", e);
}
// shut down the scheduled executor service
ExecutorUtils.gracefulShutdown(
1000L,
TimeUnit.MILLISECONDS,
scheduledExecutorService);
deleteYarnPropertiesFile();
ApplicationReport applicationReport;
try {
applicationReport = yarnClusterDescriptor
.getYarnClient()
.getApplicationReport(yarnApplicationId);
} catch (YarnException | IOException e) {
LOG.info("Could not log the final application report.", e);
applicationReport = null;
}
if (applicationReport != null) {
logFinalApplicationReport(applicationReport);
}
}
Create the scheduledExecutorService
Create the status monitor yarnApplicationStatusMonitor
Run the interactive client through runInteractiveCli
Perform a series of shutdown operations
Log the final application report
The runInteractiveCli method
while (continueRepl) {
final ApplicationStatus applicationStatus = yarnApplicationStatusMonitor.getApplicationStatusNow();
switch (applicationStatus) {
case FAILED:
case CANCELED:
System.err.println("The Flink Yarn cluster has failed.");
continueRepl = false;
break;
case UNKNOWN:
if (!isLastStatusUnknown) {
unknownStatusSince = System.nanoTime();
isLastStatusUnknown = true;
}
if ((System.nanoTime() - unknownStatusSince) > 5L * CLIENT_POLLING_INTERVAL_MS * 1_000_000L) {
System.err.println("The Flink Yarn cluster is in an unknown state. Please check the Yarn cluster.");
continueRepl = false;
} else {
continueRepl = repStep(in, readConsoleInput);
}
break;
case SUCCEEDED:
if (isLastStatusUnknown) {
isLastStatusUnknown = false;
}
// ------------------ check if there are updates by the cluster -----------
try {
final GetClusterStatusResponse status = clusterClient.getClusterStatus();
if (status != null && numTaskmanagers != status.numRegisteredTaskManagers()) {
System.err.println("Number of connected TaskManagers changed to " +
status.numRegisteredTaskManagers() + ". " +
"Slots available: " + status.totalNumberOfSlots());
numTaskmanagers = status.numRegisteredTaskManagers();
}
} catch (Exception e) {
LOG.warn("Could not retrieve the current cluster status. Skipping current retrieval attempt ...", e);
}
printClusterMessages(clusterClient);
continueRepl = repStep(in, readConsoleInput);
}
}
The method body is essentially a while loop whose condition is continueRepl. Inside the loop:
obtain the ApplicationStatus via yarnApplicationStatusMonitor.getApplicationStatusNow()
switch on whether the ApplicationStatus is FAILED, CANCELED, UNKNOWN or SUCCEEDED
FAILED / CANCELED: continueRepl becomes false and the loop exits
UNKNOWN: the current time is compared with the time the status first became unknown; if the gap exceeds 5L * CLIENT_POLLING_INTERVAL_MS * 1_000_000L nanoseconds, continueRepl becomes false and the loop exits, otherwise the loop continues
SUCCEEDED:
retrieve the cluster status
compare the number of registered TaskManagers with the last known count and print the new count and available slots to stderr if it changed
print the cluster messages
call repStep to obtain the next value of continueRepl
The repStep method handles interactive user input: quit, stop, help.
private static boolean repStep(
BufferedReader in,
boolean readConsoleInput) throws IOException, InterruptedException {
// wait until CLIENT_POLLING_INTERVAL is over or the user entered something.
long startTime = System.currentTimeMillis();
while ((System.currentTimeMillis() - startTime) < CLIENT_POLLING_INTERVAL_MS
&& (!readConsoleInput || !in.ready())) {
Thread.sleep(200L);
}
//------------- handle interactive command by user. ----------------------
if (readConsoleInput && in.ready()) {
String command = in.readLine();
switch (command) {
case "quit":
case "stop":
return false;
case "help":
System.err.println(YARN_SESSION_HELP);
break;
default:
System.err.println("Unknown command '" + command + "'. Showing help:");
System.err.println(YARN_SESSION_HELP);
break;
}
}
return true;
}
If the user types quit or stop, the method returns false; any other input returns true.
At the end of the run() method, yarnClusterDescriptor.close() is executed:
yarnClusterDescriptor.close();
That covers the outer flow; now let us look at how the YARN session cluster is deployed.
YARN Session Deployment
Deployment goes through AbstractYarnClusterDescriptor.deploySessionCluster, which calls deployInternal() to do the actual work:
@Override
public ClusterClient<ApplicationId> deploySessionCluster(ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
try {
return deployInternal(
clusterSpecification,
"Flink session cluster",
getYarnSessionClusterEntrypoint(),
null,
false);
} catch (Exception e) {
throw new ClusterDeploymentException("Couldn't deploy Yarn session cluster", e);
}
}
The deployInternal() method blocks until the ApplicationMaster/JobManager has been deployed on YARN.
Because the method is long, we break it into pieces for analysis.
1. Configuration validation
validateClusterSpecification(clusterSpecification);
if (UserGroupInformation.isSecurityEnabled()) {
// note: UGI::hasKerberosCredentials inaccurately reports false
// for logins based on a keytab (fixed in Hadoop 2.6.1, see HADOOP-10786),
// so we check only in ticket cache scenario.
boolean useTicketCache = flinkConfiguration.getBoolean(SecurityOptions.KERBEROS_LOGIN_USETICKETCACHE);
UserGroupInformation loginUser = UserGroupInformation.getCurrentUser();
if (loginUser.getAuthenticationMethod() == UserGroupInformation.AuthenticationMethod.KERBEROS
&& useTicketCache && !loginUser.hasKerberosCredentials()) {
LOG.error("Hadoop security with Kerberos is enabled but the login user does not have Kerberos credentials");
throw new RuntimeException("Hadoop security with Kerberos is enabled but the login user " +
"does not have Kerberos credentials");
}
}
isReadyForDeployment(clusterSpecification);
The validateClusterSpecification method reads the taskManagerMemorySize and computes the memory cutoff (a simplified sketch of the cutoff calculation follows the method):
private void validateClusterSpecification(ClusterSpecification clusterSpecification) throws FlinkException {
try {
final long taskManagerMemorySize = clusterSpecification.getTaskManagerMemoryMB();
// We do the validation by calling the calculation methods here
// Internally these methods will check whether the cluster can be started with the provided
// ClusterSpecification and the configured memory requirements
final long cutoff = ContaineredTaskManagerParameters.calculateCutoffMB(flinkConfiguration, taskManagerMemorySize);
TaskManagerServices.calculateHeapSizeMB(taskManagerMemorySize - cutoff, flinkConfiguration);
} catch (IllegalArgumentException iae) {
throw new FlinkException("Cannot fulfill the minimum memory requirements with the provided " +
"cluster specification. Please increase the memory of the cluster.", iae);
}
}
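To make the cutoff concrete: ContaineredTaskManagerParameters.calculateCutoffMB reserves a fraction of the container memory for non-heap overhead. A minimal sketch of the idea, assuming the default containerized.heap-cutoff-ratio of 0.25 and containerized.heap-cutoff-min of 600 MB (both configurable; the real method also validates these values):
// Simplified sketch of the cutoff calculation; the defaults (0.25 ratio, 600 MB minimum) are assumptions.
static long approximateCutoffMB(long taskManagerMemoryMB) {
    final double cutoffRatio = 0.25; // containerized.heap-cutoff-ratio
    final long cutoffMin = 600;      // containerized.heap-cutoff-min, in MB
    long cutoff = (long) (taskManagerMemoryMB * cutoffRatio);
    if (cutoff < cutoffMin) {
        cutoff = cutoffMin;
    }
    // e.g. a 4096 MB container yields a 1024 MB cutoff, leaving 3072 MB for the heap calculation
    return cutoff;
}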
After that, it checks whether security is enabled and runs the security validation, and finally calls isReadyForDeployment to verify that the deployment can proceed.
2. Check that the specified YARN queue exists
// ------------------ Check if the specified queue exists --------------------
checkYarnQueues(yarnClient);
3. Read the dynamic properties and apply them to the local Flink configuration
// ------------------ Add dynamic properties to local flinkConfiguraton ------
Map<String, String> dynProperties = getDynamicProperties(dynamicPropertiesEncoded);
for (Map.Entry<String, String> dynProperty : dynProperties.entrySet()) {
flinkConfiguration.setString(dynProperty.getKey(), dynProperty.getValue());
}
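For reference, the decoding in getDynamicProperties is straightforward: the -D key=value pairs were packed into a single string by the CLI and are split back apart here. A minimal sketch, assuming the @@ separator that FlinkYarnSessionCli uses for the encoded form (an implementation detail that may change):
// Minimal sketch: decode "key1=v1@@key2=v2" back into a map.
// The "@@" separator is an assumption based on FlinkYarnSessionCli's encoding.
static Map<String, String> decodeDynamicProperties(String encoded) {
    Map<String, String> properties = new HashMap<>();
    if (encoded == null || encoded.isEmpty()) {
        return properties;
    }
    for (String pair : encoded.split("@@")) {
        int idx = pair.indexOf('=');
        if (idx > 0) {
            properties.put(pair.substring(0, idx), pair.substring(idx + 1));
        }
    }
    return properties;
}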
4. Check whether the YARN cluster can satisfy the resource request.
First create the YarnClientApplication and obtain the GetNewApplicationResponse:
// Create application via yarnClient
final YarnClientApplication yarnApplication = yarnClient.createApplication();
final GetNewApplicationResponse appResponse = yarnApplication.getNewApplicationResponse();
Then validate the cluster resources (a simplified sketch of these checks follows the snippet):
Resource maxRes = appResponse.getMaximumResourceCapability();
final ClusterResourceDescription freeClusterMem;
try {
freeClusterMem = getCurrentFreeClusterResources(yarnClient);
} catch (YarnException | IOException e) {
failSessionDuringDeployment(yarnClient, yarnApplication);
throw new YarnDeploymentException("Could not retrieve information about free cluster resources.", e);
}
final int yarnMinAllocationMB = yarnConfiguration.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
final ClusterSpecification validClusterSpecification;
try {
validClusterSpecification = validateClusterResources(
clusterSpecification,
yarnMinAllocationMB,
maxRes,
freeClusterMem);
} catch (YarnDeploymentException yde) {
failSessionDuringDeployment(yarnClient, yarnApplication);
throw yde;
}
LOG.info("Cluster specification: {}", validClusterSpecification);
Finally, determine the ExecutionMode, execute startAppMaster, and return a YarnClusterClient object:
final ClusterEntrypoint.ExecutionMode executionMode = detached ?
ClusterEntrypoint.ExecutionMode.DETACHED
: ClusterEntrypoint.ExecutionMode.NORMAL;
flinkConfiguration.setString(ClusterEntrypoint.EXECUTION_MODE, executionMode.toString());
ApplicationReport report = startAppMaster(
flinkConfiguration,
applicationName,
yarnClusterEntrypoint,
jobGraph,
yarnClient,
yarnApplication,
validClusterSpecification);
String host = report.getHost();
int port = report.getRpcPort();
// Correctly initialize the Flink config
flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
flinkConfiguration.setInteger(JobManagerOptions.PORT, port);
flinkConfiguration.setString(RestOptions.ADDRESS, host);
flinkConfiguration.setInteger(RestOptions.PORT, port);
// the Flink cluster is deployed in YARN. Represent cluster
return createYarnClusterClient(
this,
validClusterSpecification.getNumberTaskManagers(),
validClusterSpecification.getSlotsPerTaskManager(),
report,
flinkConfiguration,
true);
For reasons of space we will not go deeper into startAppMaster here; interested readers can study it on their own, or it may be covered in detail in a later article.
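For orientation only: the part of startAppMaster that makes deployInternal blocking is essentially a loop that polls YARN for the application report until the ApplicationMaster reaches the RUNNING state. A simplified sketch using the standard YarnClient API; this is not the actual Flink code, and the error handling and the states checked here are illustrative:
// Simplified sketch: submit the application and wait for it to leave the ACCEPTED state.
yarnClient.submitApplication(appContext);
ApplicationReport report;
YarnApplicationState state;
do {
    Thread.sleep(250L);
    report = yarnClient.getApplicationReport(appContext.getApplicationId());
    state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new YarnDeploymentException("The YARN application unexpectedly switched to state " + state);
    }
} while (state != YarnApplicationState.RUNNING);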
Job Execution Entry Point
The entry point for running a program is again $FLINK_HOME/bin/flink run; CliFrontend.run calls runProgram() to run the job.
1. Instantiate the ClusterDescriptor
ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);
Here the customCommandLine object is a FlinkYarnSessionCli instance.
2. Get the clusterId and branch on the detached mode
if (clusterId == null && runOptions.getDetachedMode()) {
...
final JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);
final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deployJobCluster(
clusterSpecification,
jobGraph,
runOptions.getDetachedMode());
...
} else {
if (clusterId != null) {
client = clusterDescriptor.retrieve(clusterId);
shutdownHook = null;
} else {
// also in job mode we have to deploy a session cluster because the job
// might consist of multiple parts (e.g. when using collect)
final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deploySessionCluster(clusterSpecification);
// if not running in detached mode, add a shutdown hook to shut down cluster if client exits
// there's a race-condition here if cli is killed before shutdown hook is installed
if (!runOptions.getDetachedMode() && runOptions.isShutdownOnAttachedExit()) {
shutdownHook = ShutdownHookUtil.addShutdownHook(client::shutDownCluster, client.getClass().getSimpleName(), LOG);
} else {
shutdownHook = null;
}
}
...
}
The first if branch handles detached mode (the client is detached from the cluster):
create the JobGraph
obtain the ClusterSpecification
deploy a job cluster through deployJobCluster
If that condition does not hold, the else branch is taken.
In the inner if, when clusterId is not null, the client object is obtained through clusterDescriptor.retrieve.
Otherwise a session cluster is deployed through clusterDescriptor.deploySessionCluster and the client object is obtained from it.
3. Execute the job through the client object
executeProgram(program, client, userParallelism);
4. Execution obtains a JobSubmissionResult
protected void executeProgram(PackagedProgram program, ClusterClient<?> client, int parallelism) throws ProgramMissingJobException, ProgramInvocationException {
logAndSysout("Starting execution of program");
final JobSubmissionResult result = client.run(program, parallelism);
....
}
5. The client submits the job
After several more layers of run calls (roughly speaking, the client registers itself as the program's execution context and invokes the user's main method, so that env.execute() inside the program routes back into the client), YarnClusterClient.submitJob() is eventually executed:
public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
if (isDetached()) {
if (newlyCreatedCluster) {
stopAfterJob(jobGraph.getJobID());
}
return super.runDetached(jobGraph, classLoader);
} else {
return super.run(jobGraph, classLoader);
}
}
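In the detached branch above, stopAfterJob(jobGraph.getJobID()) is, roughly speaking, what configures a newly created per-job YARN cluster to shut itself down once this job reaches a terminal state, since there is no attached client left to stop it.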
The call chain from here on is the same as in the previous article, "Flink Source Code Analysis | Starting from an Example: Understanding the Cluster Task Execution Flow"; refer to that article for further details.
Upcoming articles will cover the execution flow behind start-scala-shell.sh, how the various Graphs are generated, and the actor system. After that, these articles will be consolidated into a single, complete summary of the Apache Flink task execution flow. Stay tuned.