
How to Succeed with Business Intelligence on Hadoop: Seven Essential Tips


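When enterprises adopt Hadoop, business intelligence (BI) is undoubtedly one of the top use cases. Drawing on a recently published benchmark, we have summarized which SQL-on-Hadoop engines are best suited to which kinds of workload. Let's take a look: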

1. No Single Option Works for Everything



Complex Queries

AtScale's Klahr warns that, while Impala and Presto did well on concurrency, the results shifted as queries became more complex. On complex queries, Spark SQL started to outperform Impala, Klahr told InformationWeek. "You need to have a multi-engine strategy and a mechanism that can automatically route end-user queries to the right engine without the end-user having to think about 'Am I writing a Spark query or an Impala query?'" he said, noting that AtScale performs that kind of automatic routing to the best engine.
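To make the routing idea concrete, here is a minimal sketch of what such a mechanism might look like. It is not AtScale's implementation, which the article does not describe. It assumes, purely for illustration, that the number of JOIN keywords is a usable proxy for query complexity, and that a hypothetical cut-off (`COMPLEX_JOIN_THRESHOLD`) separates "simple" queries sent to Impala from "complex" ones sent to Spark SQL. The function names, engine labels, and the threshold are all invented for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical cut-off: the article gives no number for when a query
# counts as "complex", so this value is an assumption for illustration.
COMPLEX_JOIN_THRESHOLD = 3


@dataclass(frozen=True)
class RoutingDecision:
    engine: str      # illustrative label, not a real connection string
    join_count: int  # the complexity signal the decision was based on


def count_joins(sql: str) -> int:
    """Count JOIN keywords as a crude proxy for query complexity.

    This is a text-level heuristic: it does not parse SQL, so the word
    'join' inside a string literal or comment would also be counted.
    """
    return len(re.findall(r"\bjoin\b", sql, flags=re.IGNORECASE))


def route_query(sql: str, threshold: int = COMPLEX_JOIN_THRESHOLD) -> RoutingDecision:
    """Pick an engine label so the end user never has to choose one."""
    joins = count_joins(sql)
    engine = "spark_sql" if joins >= threshold else "impala"
    return RoutingDecision(engine=engine, join_count=joins)


if __name__ == "__main__":
    simple = "SELECT region, SUM(sales) FROM orders GROUP BY region"
    complex_query = (
        "SELECT c.name, SUM(o.total) FROM orders o "
        "JOIN customers c ON o.customer_id = c.id "
        "LEFT JOIN regions r ON c.region_id = r.id "
        "JOIN products p ON o.product_id = p.id "
        "GROUP BY c.name"
    )
    print(route_query(simple))
    print(route_query(complex_query))
```

A production router would more plausibly work from a parsed query plan, table statistics, and current cluster load than from keyword counts, but the shape is the same: the user submits one query, and the routing layer decides which engine runs it.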

5. Large Data Sets

Querying big data sets generally means slower results. The fastest-performing engines for these data sets were Spark SQL, at less than 20 seconds, followed by Impala, at less than 40 seconds. Response times for both engines have improved significantly since the benchmark six months ago. Hive and Presto returned results in just over 2 minutes. Increasing the number of joins generally increased processing time, according to AtScale. Spark SQL and Impala were more likely to perform best as the number of joins increased.

6. Each Engine Has Its Own Strengths

Open Source Advances

Klahr told InformationWeek in an interview that between the first edition of the benchmark six months ago and today, the query performance of Hive improved by 3.5x, Spark by 2.5x, and Impala by 3x. "If I'm a buyer or an executive, these improvements are going to make me stop and question any investment on a proprietary Hadoop engine," Klahr said, because these open source tools are improving at a rapid pace.


