51.
Hive LLAP
LLAP runs as daemons on multiple YARN containers and speeds up Tez tasks.
LLAP = Live Long And Process (a play on "Live Long and Prosper")
[Diagram: a Hive query is sent to the nodes of a YARN cluster; four containers each run an LLAP daemon, with HDFS underneath. Callouts: query fragments are executed on the daemons; data is cached in memory.]
60.
SQL used in the evaluation
Join with 4 master tables

Hit count: small (25 million rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
WHERE (("customer"."c_nation" IN ('ALGERIA', 'ARGENTINA', 'BRAZIL')) AND ("part"."p_color" IN ('black', 'blue', 'brown')))
GROUP BY 1, 2

Hit count: large (5.1 billion rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
GROUP BY 1, 2

Hit count: medium (210 million rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
WHERE ("part"."p_color" IN ('black', 'blue', 'brown'))
GROUP BY 1, 2
61.
SQL used in the evaluation
Join with 2 master tables

Hit count: small (25 million rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
WHERE (("customer"."c_nation" IN ('ALGERIA', 'ARGENTINA', 'BRAZIL')) AND ("part"."p_color" IN ('black', 'blue', 'brown')))
GROUP BY 1, 2

Hit count: large (5.1 billion rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
GROUP BY 1, 2

Hit count: medium (210 million rows)
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
WHERE ("part"."p_color" IN ('black', 'blue', 'brown'))
GROUP BY 1, 2
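The three queries above differ only in their WHERE clause, which is what drives the hit count. The sketch below runs them against a tiny in-memory SQLite database to make that visible. Everything about the data is an assumption made for illustration: the rows are invented, the "public" schema prefix is dropped, and SQLite stands in for the engines actually evaluated. The labels "large", "medium" and "small" simply rank the variants by how restrictive their filter is.

```python
import sqlite3

# Toy tables and rows invented for illustration; not the evaluation dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_nation TEXT);
CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_color TEXT);
CREATE TABLE lineorder (lo_custkey INTEGER, lo_partkey INTEGER, lo_revenue INTEGER);
INSERT INTO customer VALUES (1, 'ALGERIA'), (2, 'BRAZIL'), (3, 'JAPAN');
INSERT INTO part VALUES (10, 'black'), (11, 'blue'), (12, 'green');
INSERT INTO lineorder VALUES
    (1, 10, 100), (1, 12, 200), (2, 11, 300),
    (3, 10, 400), (3, 12, 500), (2, 12, 600);
""")

# Shared FROM/JOIN part of the 2-master queries, minus the "public" schema prefix.
JOINS = """
FROM "lineorder" "lineorder"
INNER JOIN "customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
"""

# The three WHERE variants from the slide, keyed by relative hit count.
FILTERS = {
    "large": "",
    "medium": """WHERE ("part"."p_color" IN ('black', 'blue', 'brown'))""",
    "small": """WHERE (("customer"."c_nation" IN ('ALGERIA', 'ARGENTINA', 'BRAZIL'))
                 AND ("part"."p_color" IN ('black', 'blue', 'brown')))""",
}


def hit_count(where):
    """Number of fact rows that survive the joins and the filter."""
    return conn.execute(f"SELECT COUNT(*) {JOINS} {where}").fetchone()[0]


def revenue_by_nation_and_color(where):
    """Run the slide's aggregation with the given WHERE clause."""
    sql = f"""
    SELECT "customer"."c_nation" AS "c_nation",
           "part"."p_color" AS "p_color",
           SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
    {JOINS} {where}
    GROUP BY 1, 2
    """
    # Sort in Python because GROUP BY alone does not guarantee row order.
    return sorted(conn.execute(sql).fetchall())


hit_counts = {label: hit_count(where) for label, where in FILTERS.items()}
print(hit_counts)
print(revenue_by_nation_and_color(FILTERS["small"]))
```

On this toy data the unfiltered variant matches every fact row, and each added predicate narrows the set further, mirroring the large / medium / small ordering on the slide.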
62.
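Overview of Data Platform for Hadoop
Delivers a pre-validated "common big data analytics platform" that supports batch processing, real-time processing, and diverse data analysis, enabling rapid deployment.
① Supports batch and real-time processing
Apache™ Hadoop®, which is suited to distributed processing of large-scale data, and Apache™ Spark®, which processes data efficiently in memory and makes real-time processing possible, together cover everything from batch processing of large-scale data to real-time processing.
② Well suited to both structured and unstructured data
Large-scale data is stored in its original structure. When the data is used, it can be read with a data structure specified to suit the application, which widens the freedom of data use.
③ Rapid deployment and stable operation through pre-validation
Provided as an integrated system in which the required hardware and software are combined and then designed (including sizing and tuning), validated, and built in advance. This both shortens the deployment period and stabilizes platform quality, contributing to a lower total cost.
Release information (Japan): shipping began in February 2016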
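Point ② describes what is commonly called schema-on-read: data is stored as-is and a structure is applied only when an application reads it. The following is a minimal sketch of that idea in plain Python, not a depiction of how the product implements it. The sample records, field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "u1", "amount": "120", "ts": "2016-02-01T10:00:00", "note": "first"}',
    '{"user": "u2", "amount": "80", "ts": "2016-02-01T11:30:00"}',
]


def read_with_schema(raw_lines, schema):
    """Yield one dict per record, keeping only the fields named in `schema`.

    `schema` maps field name -> conversion function. Fields missing from a
    record come back as None instead of raising.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two applications read the same stored data with different structures.
billing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": int}))
audit_view = list(read_with_schema(RAW_EVENTS, {"user": str, "ts": str, "note": str}))

print(billing_view)
print(audit_view)
```

The stored records never change; each reader decides which fields it needs and how to type them, which is the flexibility the slide refers to.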
64.
Features of the DX 1000
Power saving; high parallelism and high density
75% less space and 75% less energy (the diagram compares against traditional servers)
Per rack:
• 700 servers
• 5600 processor cores
• 22TB of memory
• 90TB of SSD storage
• Networking*2: 3.5Tbps
• By adopting the Intel® Atom™ processor C2000 series for servers and fitting SSDs as standard, among other measures, it achieves best-in-class*1 integration density and power savings.
• Early adoption of 2.5 Gigabit Ethernet, which is well balanced with the CPU performance. It uses less power than 10GbE while providing the high-speed network environment that a scale-out big data analytics platform requires.
*1: Among models with Atom C2000 and 2.5GbE networking
*2: Total network bandwidth of all downlinks in the rack (per-server network bandwidth × number of servers)
*3: RMS: Rack Management System
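As a sanity check on the per-rack figures above, they can be divided by the server count to see what each server contributes. This is a back-of-the-envelope sketch only: it assumes decimal units (1 TB = 1000 GB, 1 Tbps = 1000 Gbps) and that resources are spread evenly across the 700 servers, neither of which the slide states.

```python
# Per-rack figures quoted on the slide.
SERVERS_PER_RACK = 700
CORES_PER_RACK = 5600
MEMORY_TB_PER_RACK = 22
SSD_TB_PER_RACK = 90
NETWORK_TBPS_PER_RACK = 3.5

# Assumption: decimal units and an even split across servers.
cores_per_server = CORES_PER_RACK / SERVERS_PER_RACK
memory_gb_per_server = MEMORY_TB_PER_RACK * 1000 / SERVERS_PER_RACK
ssd_gb_per_server = SSD_TB_PER_RACK * 1000 / SERVERS_PER_RACK
network_gbps_per_server = NETWORK_TBPS_PER_RACK * 1000 / SERVERS_PER_RACK

print(f"cores per server:   {cores_per_server:.0f}")
print(f"memory per server:  {memory_gb_per_server:.1f} GB")
print(f"SSD per server:     {ssd_gb_per_server:.1f} GB")
print(f"network per server: {network_gbps_per_server:.1f} Gbps")
```

The network figure works out to 5 Gbps per server, which would match two 2.5GbE links each; that pairing is an inference from footnote *2 and is not stated on the slide.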
Our queries were already highly optimized, so we focused on other parts. A query execution is essentially made up of:
- Client execution [0s if done correctly]
- Optimization [HiveServer2] [~0.1s]
- HCatalog lookups [HCatalog, Metastore] [very fast in Hive 14]
- Application Master creation [4-5s]
- Container allocation [3-5s]
- Query execution
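The breakdown above implies a fixed start-up cost that is paid before any query work begins. The sketch below adds up the quoted ranges and shows how large that cost is relative to the query itself. HCatalog lookups are left out because the text gives no number for them, and the query durations passed to `overhead_share` are hypothetical inputs, not measurements.

```python
# (low, high) seconds per phase, taken from the breakdown above.
# HCatalog lookups are omitted: the text only says "very fast".
FIXED_PHASES = {
    "client execution": (0.0, 0.0),
    "optimization (HiveServer2)": (0.1, 0.1),
    "application master creation": (4.0, 5.0),
    "container allocation": (3.0, 5.0),
}

overhead_low = sum(low for low, _ in FIXED_PHASES.values())
overhead_high = sum(high for _, high in FIXED_PHASES.values())


def overhead_share(query_seconds, overhead_seconds):
    """Fraction of end-to-end latency spent on fixed start-up overhead."""
    return overhead_seconds / (overhead_seconds + query_seconds)


print(f"fixed overhead: {overhead_low:.1f}s to {overhead_high:.1f}s")
for query_seconds in (1.0, 10.0, 60.0):  # hypothetical query durations
    share = overhead_share(query_seconds, overhead_low)
    print(f"{query_seconds:>5.0f}s query -> at least {share:.0%} of latency is start-up")
```

For short interactive queries the start-up phases dominate, which is why attention moved from the queries themselves to Application Master creation and container allocation.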