SlideShare a Scribd company logo
1 of 20
Download to read offline
The Fault-tolerant Aggregation of Massive Real-time
Data Streams
By: Timothy Farkas
Dimensions Computation
What’s The Problem?
Requirements
● Handle massive amount of data flowing into the system all the time.
● Extract meaningful statistics (aggregations) from the data in real-time.
● Isolate the effect of different variables in real-time.
● Identify trends over time, and observe changes in real time.
Who Cares?
● AdTech
● Telecom
● Industrial companies
● Appliance companies
● And many more
Dimensions Computation
An AdTech Use Case
What’s The Solution?
How Does DC Work?
Data Assumptions
● Our data is split into discrete pieces called Tuples.
● Each Tuple contains a set of Dimensions and a set of Measures.
● Measures are the pieces of information we want to collect statistics about.
● Dimensions are the variables which can impact our Measures.
● Each Tuple contains the same Dimensions and Measures.
How Does DC Work?
Processing Assumptions
● Dimensions Computation produces
Aggregations of our Measures
using Aggregators.
● Aggregators are Commutative and
Associative operations that are
performed on Measures.
● An Aggregation is represented by a
Dimension Combination.
● Dimension Combinations are
unique subsets of Dimensions.
How Does DC Work? (Example)
1. Take a Tuple
2. Extract the Dimensions
Combinations
3. Each Dimensions
Combination has an
Aggregation
4. Add the Tuple’s Measures
to the Aggregation for each
extracted Dimension
Combination
How Does DC Work? (Example)
How Does DC Work? (Example)
What About Other Aggregators?
Time Bucketing
● Aggregations every minute, hour, and day
Non-Commutative and Non-Associative Aggregators
● Average
● Standard Deviation
How Do We Build The App?
● Distributed software platform for Big Data
● Runs on Hadoop
● Real-time streaming data
● Fault-tolerant
What is Apache Apex?
● Tuple: Discrete unit of information sent from one operator to another.
● Operator: Java code that performs an operation on tuples. The code runs in a
hadoop container on a hadoop cluster.
● DAG: Operators can be connected to form an application. Tuple transfer between
operators is 1-way, so the application forms Directed Acyclic Graph.
● Window Id: An Id that is associated with Tuples and Operators, and is used for
fault tolerance.
Anatomy Of An Apache Apex App
Anatomy Of An Apache Apex App
● Partition: A Partition is a copy of an Operator, which
processes a subset of the data intended for the Operator.
● Unifier: An Operator which combines the Tuples
produced by upstream operators.
Scaling An Apache Apex App
Scaling An Apache Apex App
Apache Apex For Dimensions Computation
● Short-term aggregations
are done in-memory by
the DC Operator.
● The results of the in-
memory aggregations
are unified.
● Long-lived aggregations
are managed by the
Store Operator which
spools data to disk
(HDFS).
How Do We Manage The App?
RTS
● Dimensions Computation is available in Data Torrent RTS Enterprise Edition
● docs.datatorrent.com/operators/dimensions_computation/
● github.com/tweise/apex-samples/blob/master/mysqlDimensionsApp/
● www.datatorrent.com
● apex.apache.org
● Hadoop Summit Talk
Resources
● Widgets: Sasha Parfenov, Dean Lockgaurd
● Backend: David Yan, Bright Chen, Chinmay Kolhatkar
● Community Relationship: Jie Wu
● Apache Apex Community (1,400+ members)
Acknowledgements
Questions?

More Related Content

Similar to Dimensions computation meetup

Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop PlatformApache Apex
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache ApexPramod Immaneni
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Bhupesh Chawda
 
Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexApache Apex
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systemsNenad Bozic
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...InfluxData
 
Datadogoverview.pptx
Datadogoverview.pptxDatadogoverview.pptx
Datadogoverview.pptxssuser8bc443
 
Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Gaurav Bhardwaj
 
Workflows via Event driven architecture
Workflows via Event driven architectureWorkflows via Event driven architecture
Workflows via Event driven architectureMilan Patel
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 

Similar to Dimensions computation meetup (20)

Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Major ppt
Major pptMajor ppt
Major ppt
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
Stream Processing with Apache Apex
Stream Processing with Apache ApexStream Processing with Apache Apex
Stream Processing with Apache Apex
 
Sea of Data
Sea of DataSea of Data
Sea of Data
 
Modern Software Architectures - Overview
Modern Software Architectures - Overview Modern Software Architectures - Overview
Modern Software Architectures - Overview
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016
 
Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache Apex
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Datadogoverview.pptx
Datadogoverview.pptxDatadogoverview.pptx
Datadogoverview.pptx
 
Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)Large scale virtual Machine log collector (Project-Report)
Large scale virtual Machine log collector (Project-Report)
 
Workflows via Event driven architecture
Workflows via Event driven architectureWorkflows via Event driven architecture
Workflows via Event driven architecture
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 

Recently uploaded

GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Recently uploaded (20)

GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 

Dimensions computation meetup

  • 1. The Fault-tolerant Aggregation of Massive Real-time Data Streams By: Timothy Farkas Dimensions Computation
  • 2. What’s The Problem? Requirements ● Handle massive amount of data flowing into the system all the time. ● Extract meaningful statistics (aggregations) from the data in real-time. ● Isolate the effect of different variables in real-time. ● Identify trends over time, and observe changes in real time. Who Cares? ● AdTech ● Telecom ● Industrial companies ● Appliance companies ● And many more
  • 3. Dimensions Computation An AdTech Use Case What’s The Solution?
  • 4. How Does DC Work? Data Assumptions ● Our data is split into discrete pieces called Tuples. ● Each Tuple contains a set of Dimensions and a set of Measures. ● Measures are the pieces of information we want to collect statistics about. ● Dimensions are the variables which can impact our Measures. ● Each Tuple contains the same Dimensions and Measures.
  • 5. How Does DC Work? Processing Assumptions ● Dimensions Computation produces Aggregations of our Measures using Aggregators. ● Aggregators are Commutative and Associative operations that are performed on Measures. ● An Aggregation is represented by a Dimension Combination. ● Dimension Combinations are unique subsets of Dimensions.
  • 6. How Does DC Work? (Example) 1. Take a Tuple 2. Extract the Dimensions Combinations 3. Each Dimensions Combination has an Aggregation 4. Add the Tuple’s Measures to the Aggregation for each extracted Dimension Combination
  • 7. How Does DC Work? (Example)
  • 8. How Does DC Work? (Example)
  • 9. What About Other Aggregators? Time Bucketing ● Aggregations every minute, hour, and day Non-Commutative and Non-Associative Aggregators ● Average ● Standard Deviation
  • 10. How Do We Build The App?
  • 11. ● Distributed software platform for Big Data ● Runs on Hadoop ● Real-time streaming data ● Fault-tolerant What is Apache Apex?
  • 12. ● Tuple: Discrete unit of information sent from one operator to another. ● Operator: Java code that performs an operation on tuples. The code runs in a hadoop container on a hadoop cluster. ● DAG: Operators can be connected to form an application. Tuple transfer between operators is 1-way, so the application forms Directed Acyclic Graph. ● Window Id: An Id that is associated with Tuples and Operators, and is used for fault tolerance. Anatomy Of An Apache Apex App
  • 13. Anatomy Of An Apache Apex App
  • 14. ● Partition: A Partition is a copy of an Operator, which processes a subset of the data intended for the Operator. ● Unifier: An Operator which combines the Tuples produced by upstream operators. Scaling An Apache Apex App
  • 15. Scaling An Apache Apex App
  • 16. Apache Apex For Dimensions Computation ● Short-term aggregations are done in-memory by the DC Operator. ● The results of the in- memory aggregations are unified. ● Long-lived aggregations are managed by the Store Operator which spools data to disk (HDFS).
  • 17. How Do We Manage The App? RTS
  • 18. ● Dimensions Computation is available in Data Torrent RTS Enterprise Edition ● docs.datatorrent.com/operators/dimensions_computation/ ● github.com/tweise/apex-samples/blob/master/mysqlDimensionsApp/ ● www.datatorrent.com ● apex.apache.org ● Hadoop Summit Talk Resources
  • 19. ● Widgets: Sasha Parfenov, Dean Lockgaurd ● Backend: David Yan, Bright Chen, Chinmay Kolhatkar ● Community Relationship: Jie Wu ● Apache Apex Community (1,400+ members) Acknowledgements