SQL and in-memory on Hadoop with Pivotal and HAWQ

Pivotal, the Big Data platform from EMC, ships technologies for running high-performance in-memory SQL queries, and more...

Presentation by Alexandre Vasseur and Jérôme Campo of Pivotal

Transcript

  • 1. A NEW PLATFORM FOR A NEW ERA
  • 2. SQL and in-memory on Hadoop with Pivotal and HAWQ
    Alexandre Vasseur, Jérôme Campo, Field Engineering, Pivotal
    © Copyright 2013 Pivotal. All rights reserved.
  • 3. Pivotal
    – Spin-off of EMC and VMware
    – Software vendor
    – More than 1,250 employees
    – Data Science Team
    – Pivotal HD
  • 4. A 1,000-node Hadoop cluster for the community
    – 1,000 nodes, 24,000 cores
    – 48 TB of RAM
    – 24 PB of storage (12,000 disks)
    – Improve Hadoop
    – Validate the Hadoop ecosystem at scale
    http://www.analyticsworkbench.com
  • 5. Pivotal Hadoop
    – HAWQ, Advanced Database Services: ANSI SQL + Analytics, Query Optimizer, Dynamic Pipelining, Catalog Services, Xtension Framework
    – Apache components: HDFS, MapReduce, YARN (resource management & workflow), HBase, Pig, Hive, Mahout, Zookeeper, Sqoop, Flume
    – Pivotal HD Enterprise added value: Command Center (configure, deploy, monitor, manage), Data Loader, Hadoop Virtualization (HVE)
  • 6. 10 years of R&D on massively parallel databases
    – High-performance SQL engine: multi-petabyte, full ANSI SQL, standard drivers and ecosystem
    – Direct access to Hadoop formats: Text, Avro, Hive, HBase, other formats via an API
    – Massively parallel database on Hadoop: columnar, compressed, partitioned, polymorphic storage; priority and access management
    – In-database analytics (MADlib): parallelized statistics and machine-learning libraries, accessible from R or SQL
  • 7. How HAWQ works
    – Clients (JDBC/ODBC, SQL Console) send a query to the HAWQ Master Host, which runs the Query Parser and the Query Optimizer, alongside the HDFS Namenode.
    – Each HAWQ Segment Host runs a Query Executor next to an HDFS Datanode.
    – Example query: SELECT beer, price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco'
  • 8. How HAWQ works: the execution plan built by the Query Optimizer on the master
    MotionGather
      Project s.beer, s.price
        HashJoin b.name = s.bar
          Scan Sells s
          MotionRedist(b.name)
            Filter b.city = 'San Francisco'
              Scan Bars b
  • 9. How HAWQ works: parallel execution
    The same plan runs on every HAWQ Segment Host. Each segment's Query Executor scans its local Sells and Bars data, filters Bars on b.city = 'San Francisco', redistributes the filtered Bars rows by b.name (MotionRedist), runs HashJoin b.name = s.bar and projects s.beer, s.price, then sends its results to the master (MotionGather), which returns them to the client.
  • 10. 10 years of R&D on in-memory data grids (NoSQL/NewSQL)
    [Architecture diagram. Components: Sensor Data / Feeds, Online Apps, Analytic Apps, Map-Reduce, HAWQ, GPXF, DW, External Tables, I/P & O/P Formatter, Native Persistence, Shared Data (HFiles), HDFS, ICM, Model Refresh, Re-evaluate Model]
  • 11. In-memory NoSQL/NewSQL on Hadoop
    – Benefits of an in-memory data grid: data in memory when it is needed; very high availability, massive concurrency, memory-speed response times
    – Native Hadoop integration: eviction and storage to native HDFS; access to in-memory or global data via SQL/NoSQL and HAWQ
  • 12. Trying Pivotal HD
    Pivotal HD Single Node VM, or Pivotal HD with Vagrant
    – Hadoop stack components: Pig, Hive, HBase, HDFS, Mahout, YARN, MRv2
    – Multi-VM installation with VirtualBox or VMware Workstation/Fusion
    – HAWQ / PXF
    – Command Center
    – DataLoader
    – Eclipse, Maven, Ant
    – Retail data set
    http://gopivotal.com/pivotal-products/data/pivotal-hd#4
    http://blog.gopivotal.com/products/in-45-min-set-up-hadoop-pivotal-hd-on-a-multi-vm-cluster-run-test-data
  • 13. Big/Fast demo: Big Data workflow
    Main pipe: HTTP → Filter → Transform → HDFS Sink, with two taps feeding JSON Field Extract steps, a Logistic Regression step (MADlib) and Analytic Counters
  • 14. We're hiring! avasseur@gopivotal.com, jcampo@gopivotal.com. Thank you!
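The cluster figures on slide 4 imply a per-node profile. The sketch below derives it. It assumes the totals are spread evenly over identical nodes and that TB and PB are decimal units (1 PB = 1,000 TB). The slide does not state either assumption.

```python
# Cluster totals from slide 4. Assumption: identical nodes, decimal units.
NODES = 1_000
CORES = 24_000
RAM_TB = 48
STORAGE_PB = 24
DISKS = 12_000

def per_node_profile():
    """Derive per-node and per-disk figures from the cluster-wide totals."""
    return {
        "cores_per_node": CORES // NODES,
        "ram_gb_per_node": RAM_TB * 1_000 // NODES,
        "disks_per_node": DISKS // NODES,
        "tb_per_disk": STORAGE_PB * 1_000 // DISKS,
        "tb_per_node": STORAGE_PB * 1_000 // NODES,
    }
```

Under those assumptions each node has 24 cores, 48 GB of RAM and 12 disks of 2 TB, or 24 TB of raw storage per node.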
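Slide 6 mentions parallelized in-database statistics (MADlib). A common way to parallelize a statistic is to split it into three functions. A transition function folds rows on each segment, a merge function combines partial states, and a final function computes the result. The sketch below applies that pattern to simple linear regression. It is not MADlib code: the state layout and function names are hypothetical, and the segments are plain Python lists.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class LinRegState:
    """Sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

def transition(state, row):
    """Fold one (x, y) row into a segment-local state."""
    x, y = row
    return LinRegState(state.n + 1, state.sx + x, state.sy + y,
                       state.sxx + x * x, state.sxy + x * y)

def merge(a, b):
    """Combine two partial states (e.g. from two segments) on the master."""
    return LinRegState(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                       a.sxx + b.sxx, a.sxy + b.sxy)

def final(state):
    """Solve for intercept a and slope b from the merged statistics."""
    denom = state.n * state.sxx - state.sx ** 2
    if denom == 0:
        raise ValueError("x has zero variance")
    b = (state.n * state.sxy - state.sx * state.sy) / denom
    a = (state.sy - b * state.sx) / state.n
    return a, b

def parallel_linregr(segments):
    """Each segment folds its own rows; the master merges and finalizes."""
    partials = [reduce(transition, rows, LinRegState()) for rows in segments]
    return final(reduce(merge, partials, LinRegState()))
```

For example, `parallel_linregr([[(0, 1), (1, 3)], [(2, 5), (3, 7)]])` fits y = 1 + 2x. Only the small state objects move between segments and the master, never the rows themselves.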
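Slides 7 to 9 walk through how the example query runs across segments. The sketch below simulates that plan in one process. It assumes three segments, Sells rows hash-distributed by s.bar, and Bars rows placed without regard to b.name. Under those assumptions the MotionRedist(b.name) step is what brings each Bars row to the segment holding its matching Sells rows. The segment count, the crc32-based hash and the helper names are illustrative assumptions, not HAWQ internals. The segments run one after another here, whereas HAWQ runs them in parallel.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 3  # hypothetical number of segments

def segment_for(key):
    """Hash-partition a distribution key onto a segment (crc32 is deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS

def distribute(rows, key):
    """Place each row on the segment that owns hash(key(row))."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for row in rows:
        segments[segment_for(key(row))].append(row)
    return segments

def run_query(bars_by_segment, sells_by_segment, city):
    """SELECT s.beer, s.price FROM Bars b, Sells s
       WHERE b.name = s.bar AND b.city = <city>

    bars_by_segment: per-segment lists of (name, city), any distribution.
    sells_by_segment: per-segment lists of (bar, beer, price), distributed by bar.
    """
    # Scan Bars + Filter on each segment, then MotionRedist(b.name): ship each
    # surviving row to the segment owning hash(b.name), where the matching
    # Sells rows already live.
    redistributed = defaultdict(list)
    for rows in bars_by_segment:
        for name, bar_city in rows:
            if bar_city == city:
                redistributed[segment_for(name)].append((name, bar_city))

    gathered = []
    for seg in range(NUM_SEGMENTS):
        # HashJoin b.name = s.bar: build a hash table on the redistributed Bars...
        hash_table = defaultdict(list)
        for bar_row in redistributed[seg]:
            hash_table[bar_row[0]].append(bar_row)
        # ...probe it with the local Scan Sells, projecting s.beer, s.price.
        for bar, beer, price in sells_by_segment[seg]:
            for _ in hash_table.get(bar, []):
                gathered.append((beer, price))
    # MotionGather: the master concatenates every segment's output.
    return gathered
```

Only the filtered Bars rows cross the network in this plan. Redistributing the smaller side to match the existing distribution of Sells avoids moving Sells at all.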
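Slide 11 describes a data grid that keeps data in memory when needed and evicts it to HDFS. The sketch below illustrates the idea with a least-recently-used policy. It assumes a fixed entry capacity and uses a plain dict as a stand-in for HDFS. The class name and policy are hypothetical, not the product's API.

```python
from collections import OrderedDict

class EvictingRegion:
    """Hypothetical in-memory key/value region that overflows its
    least-recently-used entries to a backing store (a dict standing in for HDFS)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.memory = OrderedDict()  # ordered from least to most recently used
        self.store = backing_store

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)  # LRU entry
            self.store[old_key] = old_value  # evict to persistent storage

    def get(self, key):
        if key in self.memory:  # hot data: served from memory
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.store[key]  # cold data: read back (KeyError if unknown)
        self.put(key, value)  # and promote it into memory
        return value
```

Reads always succeed for any key held in memory or in the store, so the union of the two forms the "global" view that the slide refers to.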
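Slide 13's demo combines a main pipe (HTTP, filter, transform, HDFS sink) with taps that observe the stream without altering it. The sketch below models that in one function. Incoming HTTP messages are a list of JSON strings, and the HDFS sink is a list. One tap counts the values of an extracted JSON field. The other scores each event with a logistic model and counts the predicted classes. The field names and filter rule are hypothetical. The model weights are assumed to come from prior training, for example in-database with MADlib.

```python
import json
import math
from collections import Counter

def run_stream(messages, tap_field, weights, bias):
    """Main pipe: messages -> filter -> transform -> sink (a list here).
    Taps: a field-value counter and a logistic-regression prediction counter."""
    hdfs_sink = []
    field_counter = Counter()
    prediction_counter = Counter()
    for raw in messages:
        event = json.loads(raw)
        if "amount" not in event:  # filter: drop events without the field
            continue
        event["amount"] = float(event["amount"])  # transform: normalize type
        hdfs_sink.append(json.dumps(event))  # sink
        # Tap 1: JSON field extract feeding an analytic counter.
        field_counter[event.get(tap_field)] += 1
        # Tap 2: JSON field extract, logistic regression score, counter.
        z = bias + sum(w * float(event.get(name, 0.0)) for name, w in weights.items())
        p = 1.0 / (1.0 + math.exp(-z))
        prediction_counter["positive" if p >= 0.5 else "negative"] += 1
    return hdfs_sink, field_counter, prediction_counter
```

The taps read each event only after the transform step and never modify it, so the sink's contents are unaffected by them.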