Apache Zeppelin Meetup Christian Tzolov 1/21/16

Unified Data Analytics Platform
(with Zeppelin, Ambari, Geode, SpringXD and HAWQ)
by Christian Tzolov
@christzolov

Whoami
Christian Tzolov
Technical Architect at Pivotal,
BigData, Hadoop, SpringXD,
Apache Committer, Crunch PMC member
ctzolov@pivotal.io
blog.tzolov.net
@christzolov

Contents
• DEMO
• Zeppelin Interpreters
  • PSQL (to become JDBC in 0.6.x)
  • Geode
  • SpringXD
• Apache Ambari
  • Zeppelin Service
  • Geode, HAWQ and Spring XD services
  • Webpage Embedder View

Demo: Twitter Streams with SpringXD, Geode and HAWQ

Technical Stack
Apache HDFS: Data Lake (PHD or HDP Hadoop)
Apache HAWQ: SQL on Hadoop (OLAP)
Apache Geode: In-memory data grid (OLTP)
Spring XD: Integration and Streaming Runtime
Apache Ambari: Manages All Clusters
Apache Zeppelin: Web UI for interaction with Data Systems
[Diagram: stack of Hadoop/HDFS, Geode and HAWQ, SpringXD, Ambari, Zeppelin]

Spring XD
Orchestrates and automates all steps across multiple data stream pipelines
• Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Kafka, Reactor TCP/UDP
• Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator, Spark Streaming
• Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Kafka, Dynamic Router, Counters

Apache Geode
• Cache - Performance / Consistency / Resiliency
• Region - Highly available, redundant, distributed Map
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute

Apache HAWQ
• Built around a Greenplum MPP DB
• 100% ANSI SQL compliant: SQL-92/99/2003…
• ODBC and JDBC
• Hadoop Native: Parquet, HDFS and YARN
• Extensible - Web Tables, PXF
• TPC-DS: outperforms Impala by 454% overall

Demo
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
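
The demo definitions above use a "name = definition" form. As a minimal sketch, assuming each such line corresponds to a Spring XD shell `stream create` command with `--deploy` (how the interpreter actually submits them is not shown on the slide), the mapping could look like this. The helper names `parse_definitions` and `to_shell_command` are hypothetical.

```python
def parse_definitions(text):
    """Split 'name = definition' lines into (name, definition) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first '=' so '--query=...' options stay intact.
        name, definition = line.split("=", 1)
        pairs.append((name.strip(), definition.strip()))
    return pairs


def to_shell_command(name, definition):
    """Build the equivalent Spring XD shell command (assumed mapping)."""
    # Escape double quotes so the definition survives inside --definition "..."
    escaped = definition.replace('"', '\\"')
    return f'stream create --name {name} --definition "{escaped}" --deploy'


DEMO = """
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
"""

commands = [to_shell_command(n, d) for n, d in parse_definitions(DEMO)]
for command in commands:
    print(command)
```

The taps (`tap:stream:tweets > ...`) read from the already running `tweets` stream, so the primary stream has to be created before any of its taps.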

SpringXD Interpreter(s)
• %xd.stream and %xd.job
• Multiple streams or jobs in a paragraph.
• Special Deploy/Launch Semantics
• Zeppelin Dynamic Forms (${…})
• Comprehensive Stream and Job DSL auto-completion (Ctrl+.)
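
To illustrate the Dynamic Forms bullet: a `${name=default}` placeholder in a paragraph becomes a text input whose value is substituted before the paragraph runs. The sketch below is a simplified imitation of that substitution, not Zeppelin's implementation. It handles only text inputs, and the form name `keyword` and its default are made up for the example.

```python
import re

# Matches ${name} and ${name=default}; select-style forms are not handled.
FORM = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def render(paragraph, values):
    """Replace each form placeholder with the user's value or its default."""
    def fill(match):
        name, default = match.group(1), match.group(2) or ""
        return values.get(name, default)
    return FORM.sub(fill, paragraph)


template = ("tweets = twittersearch --query=${keyword=zeppelin} "
            "| hdfs --directory=/user/zeppelin/xd/tweets")

print(render(template, {}))                    # falls back to the default
print(render(template, {"keyword": "geode"}))  # uses the value typed in the form
```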

PSQL Interpreter
• Prefix: %psql.sql
• PostgreSQL, HAWQ/PXF, Greenplum … JDBC
• PSQL command line shell (via %sh)
• Comprehensive SQL/JDBC auto-completion (Ctrl+.)

PSQL Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html

PSQL/HAWQ Demo
• http://10.68.58.121:9995/#/notebook/2B2ZYS18Y

Geode Interpreter
• Prefix: %geode.oql
• OQL and PDX nested access (user.name)
• Geode command line shell (via %sh)
• Basic OQL auto-completion (Ctrl+.)

Geode Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html

Geode Tutorial
• http://10.68.58.121:9995/#/notebook/2AW57BUN4

Apache Ambari
Zeppelin, Geode, HAWQ, SpringXD Services …

Ambari Services
• Ambari Zeppelin Service: github, rpm, blog
• Ambari Geode Service: github, rpm
• Ambari SpringXD Service: github
• Ambari HAWQ Service (Pivotal BDS dist)

Ambari Blueprint
http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
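
The blueprint URL above can be requested with any HTTP client. A minimal sketch, assuming HTTP basic authentication and placeholder values for host and credentials (`ambari.example.com`, `admin`/`admin` are illustrative, only the cluster name `mv10` comes from the slide), builds the request without sending it:

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def blueprint_request(ambari_host, cluster, user, password, port=8080):
    """Build a GET request that exports a cluster's configuration as a blueprint."""
    url = (f"http://{ambari_host}:{port}/api/v1/clusters/"
           f"{quote(cluster)}?format=blueprint")
    # Basic auth header; the credentials here are placeholders.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = blueprint_request("ambari.example.com", "mv10", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` and parsing the response body as JSON would return the blueprint document for a reachable Ambari server.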

Webpage Embedder
https://github.com/tzolov/ambari-webpage-embedder-view

stay in touch
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
https://nl.linkedin.com/in/tzolov
