© 2015 DataTorrent
Akshay Gore, Bhupesh Chawda
DataTorrent
Apex Hands-on Lab - Into the code!
Getting started with your first Apex Application!
Operators
• Input Adapter vs. Generic Operators?
• What are streams?
• What are ports?
Apex Operator Lifecycle
Apex Streaming Application
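import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import org.apache.hadoop.conf.Configuration;

public class Application implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // Add operators to the DAG - dag.addOperator(args)
    // Add streams between operators - dag.addStream(args)
    // Additional config + hints to YARN - optional
  }
}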
Apex Application - FilterWords
Apex Application DAG
• Problem statement - Filter words in the file
ᵒ Read a file located on HDFS
ᵒ Split each line into words, check that each word is not one of the forbidden words, and write it to HDFS
[Diagram: HDFS → Lines → Filtered Words → HDFS]
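The two processing steps in the problem statement can be sketched in plain Java, independent of the Apex API. The class name `FilterWordsSketch`, the whitespace-based tokenization, and the contents of the forbidden-word set are assumptions made for illustration; the lab's actual Tokenize and Processor operators may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Tokenize and Processor steps; this is not Apex code.
final class FilterWordsSketch {
  // Assumed forbidden words; the lab's real list is not given in the slides.
  static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList("the", "a", "an"));

  // Tokenize step: split one line into words on runs of whitespace.
  static List<String> tokenize(String line) {
    List<String> words = new ArrayList<>();
    for (String w : line.trim().split("\\s+")) {
      if (!w.isEmpty()) {
        words.add(w);
      }
    }
    return words;
  }

  // Processor step: keep only words that are not forbidden (case-insensitive).
  static List<String> filter(List<String> words) {
    List<String> kept = new ArrayList<>();
    for (String w : words) {
      if (!FORBIDDEN.contains(w.toLowerCase())) {
        kept.add(w);
      }
    }
    return kept;
  }
}
```

In the Apex application each step would live in its own operator, with the tokenized words travelling over a stream between them; here the two static methods are simply called one after the other.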
FilterWords Application DAG
[Diagram: HDFS → Reader (input operator / adapter) → Lines → Tokenize → Words → Processor → Filtered Words → Writer (output operator / adapter) → HDFS. Tokenize and Processor are generic operators.]
Prerequisites
• Java 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
ᵒ Apache Apex Core: core platform, engine
ᵒ Apache Apex Malhar: operators library
• Hadoop cluster in running state
• Your favourite IDE - Eclipse / vi
Demo time!
• Apex application structure
• Application code walk through
• How to execute the application
• Assignment
Assignment - WordCount
Apex Application DAG
• Problem statement - Count occurrences of words in a file
ᵒ Read a file located on HDFS
ᵒ Emit the counts at the end of every window and write them to HDFS
[Diagram: HDFS → Lines → <Word, Count> → HDFS]
Assignment - Word Count Application DAG
[Diagram: HDFS → Reader → Lines → Tokenize → Words → Counter → <Word, count> → Output → HDFS]
Assignment - What you need to do
[Diagram: existing pipeline Reader → (String: Line) → Tokenizer → (String: Words) → Processor → (String: Words') → Writer. Assignment: Counter → (Map: {Word: Count}) → Writer.]
Assignment - Hints
• Create a copy of Processor.java and name it Counter.java
• Modify Counter.java as follows:
ᵒ Define a data structure which can hold counts for words
ᵒ Process method of input port must count the occurrences
ᵒ Clear the counts in beginWindow() call
ᵒ Emit the counts in endWindow() call
Solution - Changes to Counter.java
• Need to define a data structure which can hold counts for words
    private HashMap<String, Integer> counts = new HashMap<>();
• Process method of input port must count the occurrences
    if (counts.containsKey(refinedWord)) {
      counts.put(refinedWord, counts.get(refinedWord) + 1);
    } else {
      counts.put(refinedWord, 1);
    }
• Clear the counts in beginWindow call
    counts.clear();
• Emit the counts in endWindow call
    output.emit(counts.toString());
• Run Application Test
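Assembled into one class, the fragments above might look like the following sketch. It is written in plain Java and does not extend any Apex class: `CounterSketch` and its `emitted` list (which stands in for the operator's output port) are hypothetical names, and the method signatures only mirror the lifecycle calls named on the slides. `Map.merge` requires Java 8; on Java 1.7 the `containsKey` form shown on the slide does the same job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for Counter.java; it does not extend any Apex class.
final class CounterSketch {
  private final Map<String, Integer> counts = new HashMap<>();
  // Stands in for the operator's output port: one entry per emitted tuple.
  final List<String> emitted = new ArrayList<>();

  // Called at the start of each window: reset the per-window counts.
  void beginWindow(long windowId) {
    counts.clear();
  }

  // Called once per incoming word: increment its count (Java 8 Map.merge).
  void process(String refinedWord) {
    counts.merge(refinedWord, 1, Integer::sum);
  }

  // Called at the end of each window: emit the counts as a string.
  void endWindow() {
    emitted.add(counts.toString());
  }
}
```

Because the counts are cleared in `beginWindow`, each emitted string covers only the words seen in that one window.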
Assignment - Are we done yet?
• Change the DAG
ᵒ Replace Processor operator with the newly created operator - Counter
Assignment - Slight change
• We are counting in a Map, but it is still emitted as a String.
ᵒ Change the type of the output port of Counter to Map
ᵒ Change the type of the input port of Writer to Map
ᵒ Make appropriate changes to Writer so that it reads a Map and writes it in a format where each line belongs to a single word.
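One way the Writer-side change could look is sketched below in plain Java. The class `MapLineFormatter`, the tab separator, and the alphabetical ordering are assumptions for illustration; the slides only require one line per word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: formats a {word: count} map for line-oriented output.
final class MapLineFormatter {
  // Produce one "word<TAB>count" line per word, sorted by word for stable output.
  static List<String> toLines(Map<String, Integer> counts) {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, Integer> e : new TreeMap<String, Integer>(counts).entrySet()) {
      lines.add(e.getKey() + "\t" + e.getValue());
    }
    return lines;
  }
}
```

Sorting through a `TreeMap` is optional; it only makes the output order deterministic, which is convenient when comparing results in a test.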
Assignment - Final change
• Change the code so that each count is the overall count, not just the count for each window.
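A possible approach, sketched in plain Java rather than against the Apex API, is to stop clearing the map at the start of each window so that the counts keep accumulating. `RunningCounterSketch` and its `emitted` list (a stand-in for the output port) are hypothetical names, and this is only one of several ways to solve the exercise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a Counter that keeps running totals across windows.
final class RunningCounterSketch {
  private final Map<String, Integer> counts = new HashMap<>();
  // Stands in for the operator's output port: one map per window.
  final List<Map<String, Integer>> emitted = new ArrayList<>();

  void beginWindow(long windowId) {
    // Intentionally empty: counts are not cleared, so they accumulate across windows.
  }

  // Called once per incoming word: increment its running total.
  void process(String word) {
    Integer current = counts.get(word);
    counts.put(word, current == null ? 1 : current + 1);
  }

  void endWindow() {
    // Emit a copy so that later updates do not alter tuples already emitted.
    emitted.add(new HashMap<String, Integer>(counts));
  }
}
```

The defensive copy in `endWindow` matters once the output is a Map instead of a String, since the same map object would otherwise keep changing after it has been emitted.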
Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform
Where to go from here?
Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
Apache Apex Core Git - https://github.com/apache/incubator-apex-core
Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar
Join Users Mailing List - users-subscribe@apex.incubator.apache.org
Join Dev Mailing List - dev-subscribe@apex.incubator.apache.org
Send queries to Users Mailing List - users@apex.incubator.apache.org
Send queries to Dev Mailing List - dev@apex.incubator.apache.org
Thank You

University program - writing an apache apex application


Editor's Notes

  1. Operators are the basic compute units. An operator processes each incoming tuple and emits zero or more tuples on its output ports, as the business logic requires.
     ᵒ Input Adapter - One of the starting points of the application DAG, responsible for getting tuples from an external system. Such data may also be generated by the operator itself, without interacting with the outside world.
     ᵒ Generic Operator - Accepts input tuples from the preceding operators and passes them on to the following operators in the DAG.
     ᵒ Output Adapter - One of the ending points of the application DAG, responsible for writing the data out to an external system.
  2. An operator passes through various stages during its lifetime. Each stage is an API call that the Streaming Application Master makes for an operator.
     ᵒ setup() initializes the operator and prepares it to start processing tuples.
     ᵒ beginWindow() marks the beginning of an application window and allows any processing to be done before the window starts.
     ᵒ process() belongs to the InputPort and is triggered whenever a tuple arrives at the input port of the operator.
     ᵒ emitTuples() is used by input adapters to emit tuples fetched from external systems.
     ᵒ endWindow() marks the end of the window and allows any processing to be done after the window ends.
     ᵒ teardown() is used to shut the operator down gracefully and release any resources it holds.
  3. Skeleton for Apex application
  4. For application development or functional testing, a Hadoop cluster or its services are not required, as the application can run on the local file system as a single process with multiple threads. A distributed Hadoop cluster is recommended for benchmarking and production testing. On a single-node cluster, throughput will not be as high as on a multi-node cluster because of memory constraints.
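The call order described in note 2 can be illustrated with a small plain-Java sketch. `LifecycleRecorder` and its `run` driver are hypothetical: the driver is a toy stand-in for the engine, window IDs simply count up from 0, and `emitTuples()` is left out because it applies only to input adapters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder that logs lifecycle calls in the order it receives them.
final class LifecycleRecorder {
  final List<String> calls = new ArrayList<>();

  void setup() { calls.add("setup"); }
  void beginWindow(long windowId) { calls.add("beginWindow(" + windowId + ")"); }
  void process(String tuple) { calls.add("process(" + tuple + ")"); }
  void endWindow() { calls.add("endWindow"); }
  void teardown() { calls.add("teardown"); }

  // Toy driver standing in for the engine: one inner list of tuples per window.
  static List<String> run(List<List<String>> windows) {
    LifecycleRecorder op = new LifecycleRecorder();
    op.setup();
    long windowId = 0;
    for (List<String> tuples : windows) {
      op.beginWindow(windowId++);
      for (String t : tuples) {
        op.process(t);
      }
      op.endWindow();
    }
    op.teardown();
    return op.calls;
  }
}
```

Running the driver with two windows, the second of them empty, still produces a `beginWindow`/`endWindow` pair for the empty window, which is why per-window state is reset and emitted in those two calls rather than in `process`.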