A powerful feature in Postgres called Foreign Data Wrappers lets end users integrate data from MongoDB, Hadoop and other solutions with their Postgres database and leverage it as single, seamless database using SQL.
Use of these features has skyrocketed since EDB released to the open source community new FDWs for MongoDB, Hadoop and MySQL that support both read and write capabilities. Now greatly enhanced, FDWs enable integrating data across disparate deployments to support new workloads, expanded development goals and harvesting greater value from data.
Target Audience: This presentation is intended for IT Professionals seeking to do more with Postgres in his every day projects and build new applications.
A powerful feature in Postgres called Foreign Data Wrappers lets end users integrate data from MongoDB, Hadoop and other solutions with their Postgres database and leverage it as single, seamless database using SQL.
Use of these features has skyrocketed since EDB released to the open source community new FDWs for MongoDB, Hadoop and MySQL that support both read and write capabilities. Now greatly enhanced, FDWs enable integrating data across disparate deployments to support new workloads, expanded development goals and harvesting greater value from data.
Target Audience: This presentation is intended for IT Professionals seeking to do more with Postgres in his every day projects and build new applications.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
Inefficient data workloads are all too common across enterprises - causing costly delays, breakages, hard-to-maintain complexity, and ultimately lost productivity. For a typical enterprise with multiple data warehouses, thousands of reports, and hundreds of thousands of ETL jobs being executed every day, this loss of productivity is a real problem. Add to all of this the complex handwritten SQL queries, and there can be nearly a million queries executed every month that desperately need to be optimized, especially to take advantage of the benefits of Apache Hadoop. How can enterprises dig through their workloads and inefficiencies to easily see which are the best fit for Hadoop and what’s the fastest path to get there?
Cloudera Navigator Optimizer is the solution - analyzing existing SQL workloads to provide instant insights into your workloads and turns that into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop. As the newest addition to Cloudera’s enterprise Hadoop platform, and now available in limited beta, Navigator Optimizer has helped customers profile over 1.5 million queries and ultimately save millions by optimizing for Hadoop.
Hive on spark is blazing fast or is it finalHortonworks
This presentation was given at the Strata + Hadoop World, 2015 in San Jose.
Apache Hive is the most popular and most widely used SQL solution for Hadoop. To keep pace with Hadoop’s increasingly vital role in the Enterprise, Hive has transformed from a batch-only, high-latency system into a modern SQL engine capable of both batch and interactive queries over large datasets. Hive’s momentum is accelerating: With Spark integration and a shift to in-memory processing on the horizon, Hive continues to expand the boundaries of Big Data.
In this talk the speakers examined Hive performance, past, present and future. In particular they looked at Hive’s origins as a petabyte scale SQL engine.
Through some numbers and graphs, they showed how Hive became 100x faster by moving beyond MapReduce, by vectorizing execution and by introducing a cost-based optimizer.
They detailed and discussed the challenges of scalable SQL on Hadoop.
The looked into Hive’s sub-second future, powered by LLAP and Hive on Spark.
And showed just how fast Hive on Spark really is.
24. 起動が速い
•
1秒以内で立ち上がる
•
仕組み
•
Spark avoids this problem by using a fast event-driven RPC library
to launch tasks and by reusing its worker processes. It can launch
thousands of tasks per second with only about 5 ms of over- head
per task, making task lengths of 50–100 ms and MapReduce jobs
of 500 ms viable. What surprised us is how much this affected
query performance, even in large (multi-minute) queries[2].
•
意訳: 実装がんばったら,5msec で 1タスク立ち上がるように
なったよ
•
Tuesday, October 22, 13
[2]Shark: SQL and Rich Analytics at Scale
31. Shark とは?
• Spark 上で SQL はじめました
CREATE TABLE logs_last_month_cached AS SELECT * FROM logs
WHERE time > date(...);
SELECT page, count(*) c FROM logs_last_month_cached GROUP
BY page ORDER BY c DESC LIMIT 10;
Shark(SQL)
Spark
HDFS
Tuesday, October 22, 13