© 2015 DataTorrent
Akshay Gore, Bhupesh Chawda
DataTorrent
Apex Hands-on Lab - Into the code!
Getting started with your first Apex Application!
Operators
• Input adapters vs. generic operators?
• What are streams?
• What are ports?
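A rough answer to these questions: an operator receives tuples on typed input ports and emits tuples on typed output ports, and a stream connects one output port to one or more downstream input ports. An input adapter reads from an external source and has only output ports, while a generic operator has both. The sketch below models only that wiring in plain Java 8. `InputPort`, `OutputPort` and `UpperCaser` are hypothetical stand-ins, not Apex classes; in Apex the ports are `DefaultInputPort` and `DefaultOutputPort`, and streams are declared with `dag.addStream(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an input port: hands each arriving tuple to a handler.
class InputPort<T> {
  private final Consumer<T> handler;

  InputPort(Consumer<T> handler) {
    this.handler = handler;
  }

  void process(T tuple) {
    handler.accept(tuple);
  }
}

// Hypothetical stand-in for an output port; connect() plays the role of a stream.
class OutputPort<T> {
  private final List<InputPort<T>> sinks = new ArrayList<>();

  void connect(InputPort<T> sink) {
    sinks.add(sink);
  }

  // Deliver the tuple to every connected downstream input port.
  void emit(T tuple) {
    for (InputPort<T> sink : sinks) {
      sink.process(tuple);
    }
  }
}

// A generic operator: one input port, one output port.
class UpperCaser {
  final OutputPort<String> output = new OutputPort<>();
  final InputPort<String> input = new InputPort<>(s -> output.emit(s.toUpperCase()));
}
```

Connecting a second input port to the same output port is how one stream fans out to several downstream operators.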
Apex Operator Lifecycle
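As a rough sketch of the call order: the engine calls setup() once, then for every streaming window calls beginWindow(), delivers that window's tuples to the input ports, and calls endWindow(); teardown() runs when the operator shuts down. The hypothetical interface and driver below only simulate that order. Real window IDs are assigned by the platform rather than counted from 0, and the real setup() receives an OperatorContext.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the callback names of an Apex operator (simplified).
interface LifecycleOperator<T> {
  void setup();
  void beginWindow(long windowId);
  void process(T tuple);
  void endWindow();
  void teardown();
}

// Records every callback so the call order can be inspected.
class LoggingOperator implements LifecycleOperator<String> {
  final List<String> log = new ArrayList<>();

  public void setup() { log.add("setup"); }
  public void beginWindow(long windowId) { log.add("beginWindow " + windowId); }
  public void process(String tuple) { log.add("process " + tuple); }
  public void endWindow() { log.add("endWindow"); }
  public void teardown() { log.add("teardown"); }
}

// Hypothetical driver: setup once, one begin/process/end cycle per window, then teardown.
class LifecycleDriver {
  static <T> void run(LifecycleOperator<T> op, List<List<T>> windows) {
    op.setup();
    long windowId = 0; // simple counter, unlike real Apex window IDs
    for (List<T> window : windows) {
      op.beginWindow(windowId++);
      for (T tuple : window) {
        op.process(tuple);
      }
      op.endWindow();
    }
    op.teardown();
  }
}
```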
Apex Streaming Application
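import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class Application implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // Add operators to the DAG: dag.addOperator(name, operator)
    // Connect output ports to input ports with streams: dag.addStream(name, source, sinks...)
    // Optional: additional configuration and hints to YARN
  }
}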
Apex Application - FilterWords
• Problem statement: filter words in a file
ᵒ Read a file located on HDFS
ᵒ Split each line into words, drop every word that is on the forbidden list, and write the remaining words to HDFS
• Apex application DAG: HDFS → lines → application → filtered words → HDFS
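FilterWords Application DAG
• Reader (input operator, adapter) → Tokenizer → Processor → Writer (output operator, adapter)
• Tokenizer and Processor are generic operators
• Streams: Reader → Tokenizer carries lines, Tokenizer → Processor carries words, Processor → Writer carries filtered words
• Reader reads from HDFS; Writer writes to HDFS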
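The stages of this DAG can be prototyped in plain Java before they are turned into operators. In the sketch below, in-memory lists stand in for the HDFS Reader and Writer, words are split on whitespace, and the forbidden-word check is case-insensitive; all three are assumptions for illustration, and `FilterWordsPipeline` is a hypothetical class, not part of the lab code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of the FilterWords stages; not Apex or Malhar operators.
class FilterWordsPipeline {
  private final Set<String> forbidden; // assumed to be stored in lower case

  FilterWordsPipeline(Set<String> forbidden) {
    this.forbidden = forbidden;
  }

  // Tokenizer stage: one line in, zero or more words out.
  List<String> tokenize(String line) {
    List<String> words = new ArrayList<>();
    for (String w : line.trim().split("\\s+")) {
      if (!w.isEmpty()) { // an empty line splits into a single empty string
        words.add(w);
      }
    }
    return words;
  }

  // Processor stage: pass a word on only if it is not forbidden.
  boolean keep(String word) {
    return !forbidden.contains(word.toLowerCase());
  }

  // Reader -> Tokenizer -> Processor -> Writer, applied to every line.
  List<String> run(List<String> lines) {
    List<String> written = new ArrayList<>();
    for (String line : lines) {
      for (String word : tokenize(line)) {
        if (keep(word)) {
          written.add(word);
        }
      }
    }
    return written;
  }
}
```

In the Apex version, each of these methods becomes the process() method of a separate operator's input port, and the nested loops are replaced by streams between the operators.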
Prerequisites
• Java 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
ᵒ Apache Apex Core: core platform, engine
ᵒ Apache Apex Malhar: operators library
• A running Hadoop cluster
• Your favourite IDE - Eclipse / vi
Demo time!
• Apex application structure
• Application code walkthrough
• How to execute the application
• Assignment
Assignment - WordCount
• Problem statement: count the occurrences of words in a file
ᵒ Read a file located on HDFS
ᵒ At the end of every window, emit the counts and write them to HDFS
• Apex application DAG: HDFS → lines → application → <word, count> → HDFS
Assignment - WordCount Application DAG
• Reader → Tokenizer → Counter → Output operator
• Streams carry lines, then words, then <word, count> pairs
• Input is read from HDFS; output is written to HDFS
Assignment - What you need to do
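• Current DAG: Reader → Tokenizer → Processor → Writer
ᵒ Reader → Tokenizer: String (line); Tokenizer → Processor: String (words); Processor → Writer: String (words')
• Assignment DAG: Reader → Tokenizer → Counter → Writer
ᵒ Counter → Writer: Map {word: count}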
Assignment - Hints
• Create a copy of Processor.java and name it Counter.java
• Modify Counter.java as follows:
ᵒ Define a data structure that can hold the count for each word
ᵒ Count the occurrences in the process() method of the input port
ᵒ Clear the counts in beginWindow()
ᵒ Emit the counts in endWindow()
Solution - Changes to Counter.java
• Define a data structure that holds the count for each word
    private HashMap<String, Integer> counts = new HashMap<>();
• In the process() method of the input port, count the occurrences
    if (counts.containsKey(refinedWord)) {
      counts.put(refinedWord, counts.get(refinedWord) + 1);
    } else {
      counts.put(refinedWord, 1);
    }
• Clear the counts in beginWindow()
    counts.clear();
• Emit the counts in endWindow()
    output.emit(counts.toString());
• Run the application test
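Put together, the Counter logic behaves as follows. This stand-alone sketch is not an Apex operator: a `Consumer<String>` stands in for the output port, and the hypothetical class `WindowedCounter` takes the place of the operator class you write in Counter.java. `Map.merge` (Java 8+) is used as a shorter equivalent of the containsKey/put branch above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Stand-alone sketch of the per-window Counter; the Consumer plays the output port.
class WindowedCounter {
  private final Map<String, Integer> counts = new HashMap<>();
  private final Consumer<String> output;

  WindowedCounter(Consumer<String> output) {
    this.output = output;
  }

  void beginWindow(long windowId) {
    counts.clear(); // counts cover only the current window
  }

  // Same effect as the containsKey/put branch: start at 1 or add 1.
  void process(String refinedWord) {
    counts.merge(refinedWord, 1, Integer::sum);
  }

  void endWindow() {
    output.accept(counts.toString()); // emitted as a String, as in the solution
  }
}
```

Because beginWindow() clears the map, each emitted string describes only the window that just ended.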
Assignment - Are we done yet?
• Change the DAG
ᵒ Replace the Processor operator with the newly created Counter operator
Assignment - Slight change
• The Counter builds a Map, but it still emits it as a String.
ᵒ Change the type of the Counter's output port to Map
ᵒ Change the type of the Writer's input port to Map
ᵒ Update the Writer to read a Map and write it so that each line holds a single word and its count
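One way to handle both sides of this change, sketched under assumptions: the Counter emits a copy of its map, because it clears its own map in the next beginWindow() and, depending on stream locality, the downstream operator may receive the same object reference rather than a serialized copy. The Writer turns each map into one `word: count` line per word. The line format, the sorting (used only to make the output deterministic) and the helper class `MapCountsSketch` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers for the Map-typed Counter output and Writer input.
class MapCountsSketch {
  // Counter side: emit a copy so that a later clear() cannot affect what was sent.
  static Map<String, Integer> snapshot(Map<String, Integer> counts) {
    return new HashMap<>(counts);
  }

  // Writer side: one line per word, sorted by word for deterministic output.
  static List<String> toLines(Map<String, Integer> counts) {
    Map<String, Integer> sorted = new TreeMap<>(counts);
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, Integer> e : sorted.entrySet()) {
      lines.add(e.getKey() + ": " + e.getValue());
    }
    return lines;
  }
}
```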
Assignment - Final change
• Change the code so that each count is the overall count rather than the count for a single window.
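A minimal sketch of one possible answer, assuming the per-window Counter from the solution slide: stop clearing the map in beginWindow() so that counts accumulate, and emit a snapshot of the running totals at every endWindow(). `CumulativeCounter` is a hypothetical stand-alone class; in a real Apex operator the map would be a non-transient field, which makes it part of the checkpointed operator state so the totals survive a restart from a checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of the "overall count" variant: beginWindow() no longer resets the map.
class CumulativeCounter {
  private final Map<String, Integer> totals = new HashMap<>();
  private final Consumer<Map<String, Integer>> output;

  CumulativeCounter(Consumer<Map<String, Integer>> output) {
    this.output = output;
  }

  void beginWindow(long windowId) {
    // intentionally empty: totals carry over from earlier windows
  }

  void process(String word) {
    totals.merge(word, 1, Integer::sum);
  }

  void endWindow() {
    output.accept(new HashMap<>(totals)); // snapshot of the running totals
  }
}
```

The map grows with the number of distinct words seen so far, so this approach assumes a vocabulary that fits in operator memory.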
Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform
Where to go from here?
Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
Apache Apex Core Git - https://github.com/apache/incubator-apex-core
Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar
Join Users Mailing List - users-subscribe@apex.incubator.apache.org
Join Dev Mailing List - dev-subscribe@apex.incubator.apache.org
Send queries to Users Mailing List - users@apex.incubator.apache.org
Send queries to Dev Mailing List - dev@apex.incubator.apache.org
Thank You

Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 

Building YARN Applications

  • 1. © 2015 DataTorrent Akshay Gore, Bhupesh Chawda DataTorrent Apex Hands-on Lab - Into the code! Getting started with your first Apex Application!
  • 2. Operators • Input Adapter vs. Generic Operators? • What are streams? • What are ports?
  • 3. Apex Operator Lifecycle
  • 4. Apex Streaming Application: public class Application implements StreamingApplication { @Override public void populateDAG(DAG dag, Configuration conf) { /* Add operators to the DAG: dag.addOperator(args) */ /* Add streams between operators: dag.addStream(args) */ /* Optional: additional configuration and hints to YARN */ } }
  • 5. Apex Application - FilterWords: Apex Application DAG • Problem statement - Filter words in a file ᵒ Read a file located on HDFS ᵒ Split each line into words, check that each word is not one of the forbidden words, and write it to HDFS [Diagram: HDFS → Lines → FilterWords application → Filtered Words → HDFS]
  • 6. FilterWords Application DAG [Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Processor → (Filtered Words) → Writer → HDFS. Reader is the input operator (adapter), Tokenize and Processor are generic operators, and Writer is the output operator (adapter).]
  • 7. Prerequisites • Java 1.7 or above • Maven 3.0 or above • Apache Apex projects: ᵒ Apache Apex Core: core platform, engine ᵒ Apache Apex Malhar: operator library • A running Hadoop cluster • Your favourite IDE - Eclipse / vi
  • 8. Demo time! • Apex application structure • Application code walk-through • How to execute the application • Assignment
  • 9. Assignment - WordCount: Apex Application DAG • Problem statement - Count occurrences of words in a file ᵒ Read a file located on HDFS ᵒ Emit the counts at the end of every window and write them to HDFS [Diagram: HDFS → Lines → WordCount application → <Word, Count> → HDFS]
  • 10. Assignment - Word Count Application DAG [Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Counter → (<Word, Count>) → Output → HDFS]
  • 11. Assignment - What you need to do [Diagram: current DAG Reader → (Line: String) → Tokenizer → (Words: String) → Processor → (Words’: String) → Writer; target DAG Reader → Tokenizer → Counter → (Map {Word: Count}) → Writer]
  • 12. Assignment - Hints • Create a copy of Processor.java and name it Counter.java • Modify Counter.java as follows: ᵒ Define a data structure that can hold counts for words ᵒ The process() method of the input port must count the occurrences ᵒ Clear the counts in the beginWindow() call ᵒ Emit the counts in the endWindow() call
  • 13. Solution - Changes to Counter.java • Define a data structure that can hold counts for words: private HashMap<String, Integer> counts = new HashMap<>(); • The process() method of the input port must count the occurrences: if (counts.containsKey(refinedWord)) { counts.put(refinedWord, counts.get(refinedWord) + 1); } else { counts.put(refinedWord, 1); } • Clear the counts in the beginWindow() call: counts.clear(); • Emit the counts in the endWindow() call: output.emit(counts.toString()); • Run the application test
  • 14. Assignment - Are we done yet? • Change the DAG ᵒ Replace the Processor operator with the newly created Counter operator
  • 15. Assignment - Slight change • We are emitting a Map, but it is still sent as a String. ᵒ Change the type of Counter's output port to Map ᵒ Change the type of Writer's input port to Map ᵒ Modify Writer to read a Map and write it so that each line holds a single word and its count
  • 16. Assignment - Final change • Change the code so that each count is the overall count, not just the count for the current window.
  • 17. Summary - Recap • Writing Apache Apex operators • Chaining the operators into an Apache Apex application • Executing the application on the Apache Apex platform
  • 18. Where to go from here? • Apache Apex Documentation - http://apex.incubator.apache.org/docs.html • Apache Apex Core Git - https://github.com/apache/incubator-apex-core • Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar • Join the users mailing list - users-subscribe@apex.incubator.apache.org • Join the dev mailing list - dev-subscribe@apex.incubator.apache.org • Send queries to the users mailing list - users@apex.incubator.apache.org • Send queries to the dev mailing list - dev@apex.incubator.apache.org
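The Counter changes on slides 12 to 16 can be sketched in plain Java. This is a model under stated assumptions, not Apex code: ports are replaced by a direct `process(String)` call and a hypothetical `emitted` list standing in for a Map-typed output port, and `WordCounter`, `runWindow`, and `formatCounts` are illustrative names. The `cumulative` flag models the final change on slide 16: when it is set, `beginWindow()` no longer clears the counts, so every emitted map carries overall totals. `Map.merge` replaces the `containsKey`/`put` pair from slide 13, which assumes Java 8 or later rather than the Java 1.7 minimum listed on slide 7.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the assignment's Counter operator; NOT the Apex API.
// The "emitted" list is a hypothetical stand-in for a Map-typed output port.
class WordCounter {
    private final boolean cumulative;   // true = overall counts (slide 16)
    private final Map<String, Integer> counts = new HashMap<>();
    final List<Map<String, Integer>> emitted = new ArrayList<>();

    WordCounter(boolean cumulative) {
        this.cumulative = cumulative;
    }

    void beginWindow(long windowId) {
        if (!cumulative) {
            counts.clear();             // per-window mode: start every window from zero
        }
    }

    // Plays the role of the input port's process(): count one word.
    void process(String word) {
        counts.merge(word, 1, Integer::sum);  // same effect as the containsKey/put pair
    }

    void endWindow() {
        // Emit a copy, so later updates to counts cannot alter already emitted output.
        emitted.add(new HashMap<>(counts));
    }

    // Hypothetical driver for one window; in Apex the engine makes these calls.
    void runWindow(long windowId, String... words) {
        beginWindow(windowId);
        for (String word : words) {
            process(word);
        }
        endWindow();
    }

    // Writer-side formatting for slide 15: one "word: count" line per word, sorted by word.
    static String formatCounts(Map<String, Integer> window) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(window).entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

With window 0 containing apex, yarn, apex and window 1 containing apex, per-window mode emits an `apex` count of 2 and then 1, while cumulative mode emits 2 and then 3; `formatCounts` turns the first window into the lines `apex: 2` and `yarn: 1`. Emitting a copy of the map rather than the live field keeps already emitted windows unchanged while the operator keeps updating its counts.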

Editor's Notes

  1. Operators are the basic compute units. An operator processes each incoming tuple and emits zero or more tuples on its output ports, according to the business logic. Input Adapter - One of the starting points of the application DAG, responsible for getting tuples from an external system; the data may also be generated by the operator itself, without interacting with the outside world. Generic Operator - Accepts input tuples from upstream operators and passes results on to the downstream operators in the DAG. Output Adapter - One of the ending points of the application DAG, responsible for writing data out to an external system.
  2. An operator passes through several stages during its lifetime. Each stage is an API call that the Streaming Application Master makes on the operator. setup() initializes the operator and prepares it to start processing tuples. beginWindow() marks the beginning of an application window and allows any processing to be done before the window starts. process() belongs to the input port and is triggered whenever a tuple arrives at that port. emitTuples() is used by input adapters to emit tuples fetched from external systems. endWindow() marks the end of the window and allows any processing to be done after the window ends. teardown() gracefully shuts down the operator and releases any resources it holds.
  3. Skeleton for Apex application
  4. For application development or functional testing, a Hadoop cluster or Hadoop services are not required, because the application can run against the local file system as a single process with multiple threads. A distributed Hadoop cluster is recommended for benchmarking and production testing. On a single-node cluster, throughput will not be as high as on a multi-node cluster because of memory constraints.
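The lifecycle in note 2 and the Reader → Tokenizer → Processor → Writer DAG from the FilterWords slides can be modelled with a small single-threaded driver. This is a plain-Java sketch, not the Apex engine: `LifecycleOperator`, `LineReader`, `Tokenizer`, `ForbiddenWordFilter`, `CollectingWriter`, `LocalEngine`, and the `linesPerWindow` parameter are all hypothetical. Ports are collapsed into `Consumer` callbacks, the writer collects words in memory instead of writing to HDFS, and window boundaries are set by a line count rather than by the engine. In real Apex code, operators implement Apex's `Operator` interface (often via `BaseOperator`) and are wired with `dag.addOperator` and `dag.addStream` inside `populateDAG`. The sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical interface mirroring the lifecycle calls in note 2; NOT Apex's Operator.
interface LifecycleOperator {
    default void setup() {}
    default void beginWindow(long windowId) {}
    default void endWindow() {}
    default void teardown() {}
}

// Input adapter: each emitTuples() call emits at most one line.
class LineReader implements LifecycleOperator {
    private final List<String> lines;
    private int next = 0;
    Consumer<String> output;            // stand-in for an output port
    LineReader(List<String> lines) { this.lines = lines; }
    void emitTuples() {
        if (next < lines.size()) {
            output.accept(lines.get(next++));
        }
    }
    boolean done() { return next >= lines.size(); }
}

// Generic operator: splits each line into words on whitespace.
class Tokenizer implements LifecycleOperator {
    Consumer<String> output;
    void process(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.accept(word);
            }
        }
    }
}

// Generic operator: drops forbidden words, ignoring case.
class ForbiddenWordFilter implements LifecycleOperator {
    private final Set<String> forbidden = new HashSet<>();
    Consumer<String> output;
    ForbiddenWordFilter(Set<String> words) {
        words.forEach(w -> forbidden.add(w.toLowerCase()));
    }
    void process(String word) {
        if (!forbidden.contains(word.toLowerCase())) {
            output.accept(word);
        }
    }
}

// Output adapter: collects words in memory (instead of HDFS) and logs lifecycle calls.
class CollectingWriter implements LifecycleOperator {
    final List<String> written = new ArrayList<>();
    final List<String> log = new ArrayList<>();
    @Override public void setup() { log.add("setup"); }
    @Override public void beginWindow(long windowId) { log.add("beginWindow " + windowId); }
    void process(String word) { written.add(word); }
    @Override public void endWindow() { log.add("endWindow"); }
    @Override public void teardown() { log.add("teardown"); }
}

// Hypothetical single-threaded driver: wires the DAG, then runs the lifecycle.
class LocalEngine {
    static CollectingWriter run(List<String> lines, Set<String> forbidden, int linesPerWindow) {
        LineReader reader = new LineReader(lines);
        Tokenizer tokenizer = new Tokenizer();
        ForbiddenWordFilter filter = new ForbiddenWordFilter(forbidden);
        CollectingWriter writer = new CollectingWriter();
        // "Streams": each output feeds the next operator's process() (cf. dag.addStream).
        reader.output = tokenizer::process;
        tokenizer.output = filter::process;
        filter.output = writer::process;

        List<LifecycleOperator> dag = Arrays.<LifecycleOperator>asList(reader, tokenizer, filter, writer);
        dag.forEach(LifecycleOperator::setup);
        long windowId = 0;
        while (!reader.done()) {
            for (LifecycleOperator op : dag) op.beginWindow(windowId);
            for (int i = 0; i < linesPerWindow && !reader.done(); i++) {
                reader.emitTuples();
            }
            for (LifecycleOperator op : dag) op.endWindow();
            windowId++;
        }
        dag.forEach(LifecycleOperator::teardown);
        return writer;
    }
}
```

For example, with the lines "the quick brown fox" and "jumps over the lazy dog", the forbidden words {the, lazy}, and one line per window, the writer collects quick, brown, fox, jumps, over, dog, and its lifecycle log reads setup, beginWindow 0, endWindow, beginWindow 1, endWindow, teardown. Every operator sees the same window boundaries here, which loosely mirrors how window markers flow through an Apex DAG.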