© Hortonworks Inc. 2011 – 2017 All Rights Reserved
Timothy Spann
2017 Future of Data – Princeton Meetup
June 20, 2017

Introduction to HDF 3.0

Introduction to Streaming Analytics Manager, Schema Registry and NiFi 1.2


  1. Introduction to HDF 3.0. Timothy Spann, 2017 Future of Data – Princeton Meetup, June 20, 2017. Hosted by TRAC Intermodal.
  2. • Schema Registry – Milind Pandit
     • HDF Streaming Updates – Tim Spann
     • EDW Optimization with Hadoop and HDF – Gregory C Keys, PhD
  3. Ambari Integration
  4. Format and Schema Aware Efficient Flow Management
     • Provide processors for schema-aware record structure for common processing patterns
       – Split, Enrich, Partition, Convert, Query (SQL queries powered by Apache Calcite)
       – Put/Get records between NiFi and Kafka, Elasticsearch, RDBMS (more soon)
       – Easy bridging to/from columnar data formats like ORC or Parquet
     • Separate format/schema-specific logic into extensible record readers and writers
       – Developers can write new readers/writers
       – Users can create new readers/writers with scripting, live in production!
     • So what?
       – Format and schema aware processing *with* generic reusable components
       – Maintains full provenance/lineage trail
       – Dramatic speed/efficiency increase per node
       – Integration with Hortonworks Schema Registry and extensible for others
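
The separation of format-specific readers and writers from generic processing can be sketched in a few lines. This is only an illustration in plain Python, not NiFi code: the names `csv_reader`, `json_lines_writer`, and `convert` are hypothetical, and NiFi's actual record readers and writers are configured components rather than functions.

```python
import csv
import io
import json


def csv_reader(text):
    """Format-specific reader: CSV text (header row first) -> records."""
    yield from csv.DictReader(io.StringIO(text))


def json_lines_writer(records):
    """Format-specific writer: records -> one JSON object per line."""
    return "\n".join(json.dumps(dict(r), sort_keys=True) for r in records)


def convert(text, reader, writer, transform=lambda record: record):
    """Generic processor: it never inspects the input or output format."""
    return writer(transform(record) for record in reader(text))


csv_text = "driverId,speed\n11,92\n12,55\n"
converted = convert(csv_text, csv_reader, json_lines_writer)
print(converted)
```

Swapping in a different reader or writer changes the formats without touching `convert`, which is the reuse the slide points to.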
  5. Record Reader Controller Service (CS)
  6. Record Writer Controller Service (CS)
  7. ‘QueryRecord’ Processor – Treat streaming records as tables
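
The idea of treating a batch of records as a table can be approximated as follows. This is a sketch by analogy only: NiFi's QueryRecord runs SQL through Apache Calcite, while the example below loads records into an in-memory SQLite table. The table name `FLOWFILE` follows QueryRecord's convention; the field names and sample values are made up for illustration.

```python
import sqlite3


def query_records(records, sql):
    """Load a batch of records into a temporary table and run SQL over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FLOWFILE (driverId INTEGER, speed REAL, eventType TEXT)"
    )
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:driverId, :speed, :eventType)", records
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# Hypothetical truck events standing in for the records of one flow file.
events = [
    {"driverId": 11, "speed": 92.0, "eventType": "Overspeed"},
    {"driverId": 12, "speed": 55.0, "eventType": "Normal"},
    {"driverId": 11, "speed": 61.0, "eventType": "Normal"},
]

speeding = query_records(
    events, "SELECT driverId, speed FROM FLOWFILE WHERE speed > 80"
)
print(speeding)
```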
  8. Component Versioning
  9. Stream Processing – Introducing Streaming Analytics Manager (SAM)
     • Streaming Analytics Manager: a brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop user experience
  10. SAM - Write Complex Streaming Applications With No Code
     • Streaming Analytics Manager: a brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop paradigm
       – Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
       – Give the coders the power to add key functions and extend the platform (add custom sinks, processors, spouts, etc.)
  11. SAM’s Value Proposition
     • Build and deploy complex stream applications without writing any code
     • Only open source tool in the market with a graphical programming paradigm
     • Speed time-to-market to build complex streaming analytics applications
     • Build streaming analytics applications without specialized skillsets
     • Decouple data format from the streaming application itself while being schema aware
     • Support multiple underlying streaming engines
  12. Stream Builder Module for App Developers
     • Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming applications
     • Drag and drop to build a working streaming application without writing a single line of code
     • 4 Types of Components: Sources, Processors, Sinks and Custom
  13. Stream Insight Module for Business Analysts
     • A tool to create real-time analytics dashboards, charts and graphs
     • 30+ visualization charts out of the box with customization capability
     • Druid is the Analytics Engine that powers the Stream Insight Module.
  14. Stream Ops Module for IT Operations
     • Create and manage different environments in which individual streaming applications will be built
     • Environments consist of services such as HDFS, Kafka, Storm from different service pools
     • Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module
  15. Stream Builder Module for App Developers
     • Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming apps.
     • Drag and drop to build a working streaming application without writing a single line of code.
     • 4 Types of Components: Sources, Processors, Sinks and Custom
  16. SAM is All about Doing Real-Time Analytics on the Stream
     • Real-Time Prescriptive Analytics: What should we do right now?
     • Real-Time Predictive Analytics: What could happen now/soon?
     • Real-Time Descriptive Analytics: What is happening right now?
  17. Real-Time Prescriptive Analytics
     • Question: What should we do right now?
     • Context: It is rainy, the driver has been on the road for 12 hours and he has 30 high speeding alerts over a 3-minute window in the last 2 hours.
     • Answer: Dispatch a radio call to the driver to slow down
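
A prescriptive rule of this kind boils down to combining context conditions into a recommended action. The sketch below is a plain Python illustration, not a SAM rule definition: `DriverContext`, `recommend_action`, and the default thresholds are hypothetical and simply mirror the numbers in the slide's example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriverContext:
    raining: bool
    hours_on_road: float
    speeding_alerts: int  # alerts counted over the monitored window


def recommend_action(
    ctx: DriverContext, max_hours: float = 12.0, max_alerts: int = 30
) -> Optional[str]:
    """Return an action only when every risk condition holds at once."""
    if (
        ctx.raining
        and ctx.hours_on_road >= max_hours
        and ctx.speeding_alerts >= max_alerts
    ):
        return "Dispatch a radio call to the driver to slow down"
    return None


risky = DriverContext(raining=True, hours_on_road=12, speeding_alerts=30)
calm = DriverContext(raining=False, hours_on_road=4, speeding_alerts=1)
print(recommend_action(risky))
print(recommend_action(calm))
```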
  18. Real-Time Predictive Analytics
     • Question: No violation events but what might happen that I need to be worried about?
     • My data science team has a model that can predict that based on
       – Weather
       – Roads
       – Driver HR info like driver certification status, wagePlan
       – Driver timesheet info like hours and miles logged over the last week
  19. Building the Predictive Model on HDP
     1. Explore small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations”
     2. Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data
     3. Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format
     4. Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model
  20. Logistic Regression Model
  21. Scoring the Predictive Model on HDF
     5. Model Registry: Export the Spark MLlib model and import into the HDF’s Model Registry
     6. Enrich with Features: Use SAM’s enrich/custom processors to enrich the event with the features required for the model
     7. Transform/Normalize: Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model
     8. Score Model: Use SAM’s PMML processor to score the model for each stream event with its required features
     9. Alert / Notify / Action: Use SAM’s rule and notification processors to alert, notify and take action using the results of the model
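
Steps 6 through 9 can be sketched as a small pipeline of functions. This is a conceptual illustration only: in SAM these steps are canvas processors and the model is a PMML document from the Model Registry, whereas here the logistic regression coefficients, feature names, and sample values are all invented for the example.

```python
import math

# Hypothetical coefficients for illustration; a real deployment would score
# the PMML model stored in the Model Registry instead.
INTERCEPT = -3.0
WEIGHTS = {
    "foggy": 1.2,
    "hours_logged": 0.03,
    "miles_logged": 0.0005,
    "certified": -0.8,
}


def enrich(event, driver_info, weather):
    """Step 6: attach the features the model needs to the raw event."""
    return {**event, **driver_info[event["driverId"]], "weather": weather}


def normalize(event):
    """Step 7: turn the enriched event into numeric model inputs."""
    return {
        "foggy": 1.0 if event["weather"] == "foggy" else 0.0,
        "hours_logged": float(event["hours_logged"]),
        "miles_logged": float(event["miles_logged"]),
        "certified": 1.0 if event["certified"] else 0.0,
    }


def score(features):
    """Step 8: logistic regression score, a probability between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def alert(event, probability, threshold=0.5):
    """Step 9: produce a notification when the predicted risk is high."""
    if probability >= threshold:
        return f"Driver {event['driverId']}: predicted violation risk {probability:.2f}"
    return None


driver_info = {11: {"hours_logged": 70, "miles_logged": 3000, "certified": False}}
event = enrich({"driverId": 11}, driver_info, weather="foggy")
probability = score(normalize(event))
message = alert(event, probability)
print(message)
```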
  22. SAM’s Model Registry and PMML Processor
     • Model Registry
       – SAM has a repository to store and manage PMML-based predictive models
       – First-class features like versioning, evolution policies, etc. will be added in a future release
     • PMML Processor
       – Processor that can use a model from the registry and score it against the incoming stream of events
  23. SAM Extensibility: Custom Processors, UDF, UDAFs
     • Custom Components
       – Most users will want to build custom components to meet certain requirements.
       – SAM provides the ability to build custom components using the SAM SDK
       – The jars can then be uploaded into SAM via the User Interface
     • 3 Types of Custom Components
       – Custom Processors
       – Custom UDF
         • User defined functions that are used by the Projection processor
       – Custom UDAFs
         • User defined aggregate functions that are used by the Aggregate processor.
       – SDK can be used to create custom UDF functions for windowed aggregations
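
The shape of a user defined aggregate function can be sketched as an accumulator with three operations: start, fold in one value, and produce the result. The Python below only illustrates that idea; the class and method names are hypothetical and this is not the SAM SDK interface, which is Java and packaged as jars.

```python
class AverageSpeed:
    """A hypothetical aggregate: mean of the values seen in one window."""

    def init(self):
        # Accumulator is (running total, count).
        return (0.0, 0)

    def add(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def result(self, accumulator):
        total, count = accumulator
        return total / count if count else None


def aggregate_window(udaf, values):
    """Stand-in for an aggregate step applying the function to one window."""
    accumulator = udaf.init()
    for value in values:
        accumulator = udaf.add(accumulator, value)
    return udaf.result(accumulator)


window_speeds = [60, 80, 100]
average = aggregate_window(AverageSpeed(), window_speeds)
print(average)
```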
  24. Streaming Split Join Pattern
     • 3 enrichments have to be performed on the event stream to feed into the model:
       – From Lat, Long and time, query weather conditions
       – From driverId, look up information about the driver’s certification and wage plan
       – From driverId, look up information about how many miles and hours the driver was on the road last week
     • Streaming Split Join Pattern
       – Complex pattern that allows parallel processing to decrease latency (used by Apache Metron extensively)
         1. Create a splitJoinKey
         2. Split the stream into n, where n is the number of different enrichments you want to do
         3. Join the n streams based on the splitJoinKey
     • A complex pattern to implement, which SAM allows the user to do simply with no code!
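
The three steps of the split join pattern can be sketched with a thread pool standing in for the parallel stream branches. This is a simplified, single-event illustration rather than how SAM or Storm execute it: the three lookup functions are stubs returning made-up values, and `split_join` is a hypothetical helper.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


# Stub enrichments with made-up return values; real ones would call out to
# a weather service, an HR system, and a timesheet store.
def weather_lookup(event):
    return {"weather": "foggy"}


def driver_hr_lookup(event):
    return {"certified": True, "wagePlan": "miles"}


def timesheet_lookup(event):
    return {"hours_logged": 70, "miles_logged": 3000}


def split_join(event, enrichments):
    # 1. Create a splitJoinKey that identifies this event across branches.
    keyed = {**event, "splitJoinKey": str(uuid.uuid4())}

    def run_branch(enrich):
        # Each branch carries the key alongside its enrichment fields.
        return {"splitJoinKey": keyed["splitJoinKey"], **enrich(keyed)}

    # 2. Split into n branches, one per enrichment, and run them in parallel.
    with ThreadPoolExecutor(max_workers=len(enrichments)) as pool:
        branches = list(pool.map(run_branch, enrichments))

    # 3. Join the n branch results back together on the splitJoinKey.
    joined = dict(keyed)
    for branch in branches:
        if branch["splitJoinKey"] == keyed["splitJoinKey"]:
            joined.update(branch)
    return joined


enriched = split_join(
    {"driverId": 11, "lat": 40.35, "long": -74.66},
    [weather_lookup, driver_hr_lookup, timesheet_lookup],
)
print(enriched)
```

Because the branches run concurrently, the added latency is roughly that of the slowest lookup rather than the sum of all three.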
  25. Stream Insight Module for Business Analysts
     • A tool to create time-series and real-time analytics dashboards, charts and graphs
     • 30+ visualization charts out of the box with customization capability
     • Druid is the Analytics Engine that powers the Stream Insight Module.
  26. Streaming Analytics Manager
  27. Set Up An Environment for SAM
  28. Hortonworks SAM Canvas to build the Streaming Analytics App without writing a line of code
  29. Hortonworks SAM App Dashboard
  30. Schema Registry Dashboard and Details of One Schema
  31. Contact: Timothy Spann @PaaSDeV
     • www.meetup.com/futureofdata-princeton
     • community.hortonworks.com/users/9304/tspann.html
  32. Hortonworks Community Connection
     Read access for everyone, join to participate and be recognized
     • Full Q&A Platform (like StackOverflow)
     • Knowledge Base Articles
     • Code Samples and Repositories
  33. Community Engagement
     Participate now at: community.hortonworks.com
     • 4,000+ Registered Users
     • 10,000+ Answers
     • 15,000+ Technical Assets
     • One Website!
