Unified Data Analytics Platform
(with Zeppelin, Ambari, Geode, SpringXD and HAWQ)
by Christian Tzolov
@christzolov
Whoami
Christian Tzolov
Technical Architect at Pivotal
BigData, Hadoop, SpringXD
Apache Committer, Crunch PMC member
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
Contents
• DEMO
• Zeppelin Interpreters
• PSQL (to become the generic JDBC interpreter in 0.6.x)
• Geode
• SpringXD
• Apache Ambari
• Zeppelin Service
• Geode, HAWQ and Spring XD services
• Webpage Embedder View
Demo: Twitter Streams with SpringXD, Geode and HAWQ
Technical Stack
• Apache HDFS: Data lake (PHD or HDP Hadoop)
• Apache HAWQ: SQL on Hadoop (OLAP)
• Apache Geode: In-memory data grid (OLTP)
• Spring XD: Integration and streaming runtime
• Apache Ambari: Manages all clusters
• Apache Zeppelin: Web UI for interaction with data systems
[Architecture diagram: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin]
Spring XD
Orchestrates and automates all steps across multiple data stream pipelines
Sources
• HTTP
• Tail
• File
• Mail
• Twitter
• Gemfire
• Syslog
• TCP
• UDP
• JMS
• RabbitMQ
• MQTT
• Kafka
• Reactor TCP/UDP
Processors
• Filter
• Transformer
• Object-to-JSON
• JSON-to-Tuple
• Splitter
• Aggregator
• HTTP Client
• Groovy Scripts
• Java Code
• JPMML Evaluator
• Spark Streaming
Sinks
• File
• HDFS
• JDBC
• TCP
• Log
• Mail
• RabbitMQ
• Gemfire
• Splunk
• MQTT
• Kafka
• Dynamic Router
• Counters
Apache Geode
• Cache - Performance / Consistency / Resiliency
• Region - Highly available, redundant, distributed Map
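The "highly available, redundant, distributed Map" idea can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not Geode's implementation: keys hash into a fixed number of buckets, each bucket lives on a primary member plus `redundant_copies` backup members, and a read falls back to a backup when the primary is down. The class name, bucket count and round-robin placement are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Set


class ToyPartitionedRegion:
    """Toy model of a partitioned key/value region with redundant copies (not Geode's code)."""

    def __init__(self, members: List[str], num_buckets: int = 13, redundant_copies: int = 1):
        if redundant_copies >= len(members):
            raise ValueError("need more members than redundant copies")
        self.members = list(members)
        self.num_buckets = num_buckets
        self.redundant_copies = redundant_copies
        # Each member holds its own slice of the map: member -> {key: value}.
        self.stores: Dict[str, Dict[str, object]] = {m: {} for m in self.members}
        self.down: Set[str] = set()

    def _bucket(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's randomized str hash().
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def owners(self, key: str) -> List[str]:
        # Primary owner first, then redundant copies on the following members (round robin).
        n = len(self.members)
        start = self._bucket(key) % n
        return [self.members[(start + i) % n] for i in range(self.redundant_copies + 1)]

    def put(self, key: str, value: object) -> None:
        # Write the entry to the primary and to every redundant copy.
        for member in self.owners(key):
            self.stores[member][key] = value

    def get(self, key: str) -> Optional[object]:
        # Read from the first live owner, so losing the primary falls back to a copy.
        for member in self.owners(key):
            if member not in self.down and key in self.stores[member]:
                return self.stores[member][key]
        return None

    def fail(self, member: str) -> None:
        # Simulate a member crash; its data becomes unreachable.
        self.down.add(member)


if __name__ == "__main__":
    region = ToyPartitionedRegion(["server1", "server2", "server3"], redundant_copies=1)
    region.put("tweet-1", {"text": "hello"})
    region.fail(region.owners("tweet-1")[0])  # take the primary down
    print(region.get("tweet-1"))  # {'text': 'hello'}
```

Real Geode partitioned regions can also rebalance buckets and restore redundancy after a member leaves; the toy leaves that out.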
China Railway Corporation
5,700 train stations
4.5 million tickets per day
20 million daily users
1.4 billion page views per day
40,000 visits per second
Indian Railways
7,000 stations
72,000 miles of track
23 million passengers daily
120,000 concurrent users
10,000 transactions per minute
Apache HAWQ
• Built around a Greenplum MPP DB
• 100% ANSI SQL compliant: SQL-92/99/2003…
• ODBC and JDBC
• Hadoop Native: Parquet, HDFS and YARN
• Extensible - Web Tables, PXF
• Outperforms Impala by 454% overall on TPC-DS
Demo
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
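The four definitions above form one primary stream plus three taps: every tweet written to HDFS is also copied to Geode, to HAWQ (through gpfdist) and to a counter. The Python sketch below only simulates that fan-out in memory; it is not Spring XD code. `tweet_json_to_tsv` is a hypothetical stand-in for `tweetJsonToTsv.groovy`, whose contents are not shown, so its TSV columns (id, screen name, text) and the Geode key are assumptions, as are the sample tweets.

```python
import json
from typing import Callable, Dict, List


def tweet_json_to_tsv(payload: str) -> str:
    # Hypothetical stand-in for tweetJsonToTsv.groovy: the real script's columns are unknown.
    tweet = json.loads(payload)
    text = tweet.get("text", "").replace("\t", " ").replace("\n", " ")
    screen_name = tweet.get("user", {}).get("screen_name", "")
    return "\t".join([tweet["id_str"], screen_name, text])


class TweetPipeline:
    """In-memory simulation of the 'tweets' stream and its three taps."""

    def __init__(self) -> None:
        self.hdfs: List[str] = []         # hdfs sink: raw JSON messages
        self.geode: Dict[str, dict] = {}  # gemfire-json-server sink: regionTweet
        self.hawq_rows: List[str] = []    # gpfdist sink: TSV rows for table xdsink
        self.counter = 0                  # counter sink
        # Each tap receives a copy of every message that flows through the primary stream.
        self.taps: List[Callable[[str], None]] = [self._geode_tap, self._hawq_tap, self._count_tap]

    def _geode_tap(self, payload: str) -> None:
        tweet = json.loads(payload)
        self.geode[tweet["id_str"]] = tweet  # keying by id_str is an assumption

    def _hawq_tap(self, payload: str) -> None:
        self.hawq_rows.append(tweet_json_to_tsv(payload))

    def _count_tap(self, payload: str) -> None:
        # Mirrors json-to-tuple | transform --expression='payload.id_str' | counter.
        _ = json.loads(payload)["id_str"]
        self.counter += 1

    def send(self, payload: str) -> None:
        self.hdfs.append(payload)  # primary stream sink
        for tap in self.taps:
            tap(payload)


if __name__ == "__main__":
    pipeline = TweetPipeline()
    pipeline.send(json.dumps({"id_str": "1", "text": "hello zeppelin", "user": {"screen_name": "alice"}}))
    pipeline.send(json.dumps({"id_str": "2", "text": "geode\tand hawq", "user": {"screen_name": "bob"}}))
    print(pipeline.counter)    # 2
    print(pipeline.hawq_rows)  # ['1\talice\thello zeppelin', '2\tbob\tgeode and hawq']
```

Taps keep the primary stream untouched: adding or removing the Geode, HAWQ or counter branch does not change what lands in HDFS.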
SpringXD Interpreter(s)
• Prefixes: %xd.stream and %xd.job
• Multiple streams or jobs in a single paragraph
• Special Deploy/Launch Semantics
• Zeppelin Dynamic Forms (${…})
• Comprehensive Stream and Job DSL auto-completion (Ctrl+.)
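A single `%xd.stream` paragraph can hold several definitions, as in the demo, one `name = definition` per line. The sketch below shows one plausible way to split such a paragraph and order the streams so that a stream is deployed before any tap that reads from it. The line format is inferred from the demo paragraph; the ordering rule and the function names are assumptions, not the interpreter's actual deploy/launch logic.

```python
import re
from typing import Dict, List, Set

# Matches tap references such as "tap:stream:tweets".
TAP_REF = re.compile(r"tap:stream:([\w-]+)")


def parse_paragraph(text: str) -> Dict[str, str]:
    """Split a multi-stream paragraph into {name: definition}; blank lines are skipped."""
    streams: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so option values like --query=... stay intact.
        name, sep, definition = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'name = definition', got: {line!r}")
        streams[name.strip()] = definition.strip()
    return streams


def deploy_order(streams: Dict[str, str]) -> List[str]:
    """Depth-first ordering: a tapped stream comes before the streams that tap it (assumed rule)."""
    ordered: List[str] = []

    def visit(name: str, on_path: Set[str]) -> None:
        if name in ordered:
            return
        if name in on_path:
            raise ValueError(f"cyclic tap reference at {name!r}")
        on_path.add(name)
        for ref in TAP_REF.findall(streams[name]):
            if ref in streams:
                visit(ref, on_path)
        ordered.append(name)

    for name in streams:
        visit(name, set())
    return ordered


if __name__ == "__main__":
    paragraph = """
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
"""
    print(deploy_order(parse_paragraph(paragraph)))  # ['tweets', 'geodeTap']
```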
SpringXD Configuration
PSQL Interpreter
• Prefix: %psql.sql
• PostgreSQL, HAWQ/PXF, Greenplum … JDBC
• PSQL command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Comprehensive SQL/JDBC auto-completion (Ctrl+.)
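Zeppelin Dynamic Forms turn placeholders in a paragraph into input fields: `${name}` for a text box, `${name=default}` for a text box with a default, and `${name=default,option1|option2}` for a select box. The Python sketch below renders a query from such a template to show the substitution idea. It is a simplified subset written for illustration (display-name options like `value(Label)` and checkboxes are not handled), and the `tweets` table in the example is hypothetical. Values are inserted verbatim, without SQL quoting.

```python
import re
from typing import Dict

# ${name}, ${name=default} or ${name=default,opt1|opt2}
FORM = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def render(template: str, params: Dict[str, str]) -> str:
    """Replace form placeholders with user-supplied values or their defaults."""

    def substitute(match) -> str:
        name, default, options = match.group(1).strip(), match.group(2), match.group(3)
        value = params.get(name, default)
        if value is None:
            raise KeyError(f"no value or default for form {name!r}")
        # A select form only accepts one of its listed options.
        if options is not None and value not in options.split("|"):
            raise ValueError(f"{value!r} is not one of {options!r}")
        return value

    return FORM.sub(substitute, template)


if __name__ == "__main__":
    query = "select * from tweets where lang = '${lang=en,en|fr|de}' limit ${limit=10}"
    print(render(query, {"limit": "5"}))
    # select * from tweets where lang = 'en' limit 5
```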
PSQL Configuration
PSQL Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html
PSQL/HAWQ Demo
• http://10.68.58.121:9995/#/notebook/2B2ZYS18Y
Geode Interpreter
• Prefix: %geode.oql
• OQL and PDX nested access (user.name)
• Geode command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Basic OQL auto-completion (Ctrl+.)
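Nested access means an OQL query can reach into a PDX object graph with dot notation, for example `SELECT t.user.screen_name FROM /regionTweet t` against the demo's `regionTweet` region, assuming the stored tweet JSON has a nested `user` object as Twitter's does. The Python sketch below only mimics that projection over plain dictionaries to show how a dotted path is resolved; it is not Geode's query engine, and returning `None` for missing fields is this sketch's own choice.

```python
from typing import Any, Iterable, List


def resolve(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'user.screen_name'; a missing field yields None."""
    for field in path.split("."):
        if not isinstance(obj, dict) or field not in obj:
            return None
        obj = obj[field]
    return obj


def select(region_values: Iterable[dict], path: str) -> List[Any]:
    """Mimic 'SELECT t.<path> FROM /region t' over an iterable of dict values."""
    return [resolve(value, path) for value in region_values]


if __name__ == "__main__":
    tweets = [{"id_str": "1", "user": {"screen_name": "alice"}}, {"id_str": "2"}]
    print(select(tweets, "user.screen_name"))  # ['alice', None]
```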
Geode Configuration
Geode Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html
Geode Tutorial
• http://10.68.58.121:9995/#/notebook/2AW57BUN4
Apache Ambari
Zeppelin, Geode, HAWQ, SpringXD Services …
Ambari Services
• Ambari Zeppelin Service: github, rpm, blog
• Ambari Geode Service: github, rpm
• Ambari SpringXD Service: github
• Ambari HAWQ Service (Pivotal BDS dist)
Ambari Blueprint
http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
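The URL above exports the layout of an existing cluster (here `mv10`) as a reusable JSON blueprint. A minimal Python sketch that prepares that GET request with HTTP basic authentication follows; the host name and the `admin`/`admin` credentials are assumptions (Ambari's out-of-the-box defaults, which should be changed in practice). The request is only built, not sent, so the sketch runs without an Ambari server.

```python
import base64
import urllib.parse
import urllib.request


def blueprint_request(ambari_host: str, cluster: str, user: str = "admin",
                      password: str = "admin", port: int = 8080) -> urllib.request.Request:
    """Build a GET request for /api/v1/clusters/<cluster>?format=blueprint."""
    path = "/api/v1/clusters/" + urllib.parse.quote(cluster)
    url = f"http://{ambari_host}:{port}{path}?format=blueprint"
    # Ambari's REST API accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = blueprint_request("ambari.example.com", "mv10")
    print(req.full_url)  # http://ambari.example.com:8080/api/v1/clusters/mv10?format=blueprint
    # To actually fetch the blueprint (requires a reachable Ambari server):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```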
Webpage Embedder View
https://github.com/tzolov/ambari-webpage-embedder-view
Stay in touch
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
https://nl.linkedin.com/in/tzolov

Apache Zeppelin Meetup Christian Tzolov 1/21/16

Editor's Notes

  • #2 I’m glad to be here and to share a few ideas about how to combine the strengths of powerful tools like Zeppelin, Spring XD, Geode and HAWQ.
  • #3 Mention the work on the PSQL, Geode and SpringXD interpreters.
  • #5 http://10.68.58.121:9995/#/notebook/2BC41KDMZ
  • #6 http://10.68.58.121:9995/#/notebook/2BC41KDMZ
  • #7 http://10.68.58.121:8080/#/main/services/HDFS/summary
  • #8 It simplifies big data projects by orchestrating and automating all steps across multiple data stream pipelines: creating, deploying, and managing many pipelines in a unified, extensible, distributed way. A stream is composed of modules. Each module is deployed to a container and its channels are bound to the transport.
  • #11 http://10.68.58.121:9995/#/notebook/2BC41KDMZ
  • #13 add spring xd ambari container layout
  • #14 %psql.sql drop table if exists mytable; create table mytable (i int); insert into mytable select generate_series(1, 100);
    %psql.sql select * from mytable;
    %psql.sql select count(*) from mytable; select * from mytable;
  • #15 add slide showing the HAWQ deployment model (AMBARI)
  • #17 %psql.sql drop table if exists mytable; create table mytable (i int); insert into mytable select generate_series(1, 100);
    %psql.sql select * from mytable;
    %psql.sql select count(*) from mytable; select * from mytable;
  • #23 http://10.68.58.121:8080/#/main/dashboard/metrics
  • #24 Zeppelin http://blog.tzolov.net/2015/08/zeppelin-service-for-ambari.html?view=sidebar https://github.com/tzolov/zeppelin-ambari-plugin https://bintray.com/big-data/rpm/zeppelin-ambari-plugin/view Geode (Credits to Steve Shangguan!) https://github.com/tzolov/ambari-gemfire/tree/geode https://bintray.com/big-data/rpm/geode-ambari-plugin/view
  • #26 https://github.com/tzolov/ambari-webpage-embedder-view