Fast Data processing with RFX

Concepts and how to build a simple data product with RFX

Published in: Data & Analytics

  1. Fast Data Processing with RFX: Simplify Fast Data Processing. tantrieuf31@gmail.com, http://www.rfxlab.com
  2. The Big Picture
  3. Demo first
  4. Contents at a glance
     1. BEAM✲ methodology for agile data warehouses
     2. Introduction to Fast Data
     3. Problem: “Fast Data in web analytics”
     4. Examples of the fast data design pattern (RFX, or Reactive Function X)
        4.1. Event data actor
        4.2. Event data agent
        4.3. Event data collector
        4.4. Event data router
        4.5. Event data processor
        4.6. Event data storage
        4.7. Event data query
        4.8. Event data reactor
     5. Demo: “Fast Data in web analytics” with source code explanation
  5. 1 - BEAM✲ methodology
  6. 1 - BEAM✲ methodology for Agile Data Warehouse: BEAM✲ stands for Business Event Analysis & Modelling. It is a methodology for gathering business requirements for Agile Data Warehouses and building those warehouses. It was developed by Lawrence Corr (@LawrenceCorr) and Jim Stagnitto (@JimStag), and published in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema.
  7. Example with BEAM✲
  8. Goal: model all business events and load them into a database in an agile way
  9. 2 - Fast Data
  10. Introduction to Fast Data
  11. 3 - Problems in Practice
  12. Problems: “Fast Data in web analytics”
      1. Counting the pageviews of a website
      2. Counting the unique users of a website
      3. Sending an email when the pageview count is abnormal (simple DDoS attack detection)
  13. 4 - Thinking with RFX
  14. What is RFX (Reactive Function X)?
      ● A design pattern for solving big fast data problems
      ● A collection of open source tools
      ● The mission of RFX:
        1. Build data products quickly with design patterns
        2. Apply BEAM✲ to agile data pipelines
        3. React to critical events in near real time
  15. Philosophy of RFX
  16. How to solve problems with RFX?
  17. “Fast Data in web analytics”
      1. Counting the pageviews of a website
      2. Counting the unique users of a website
      3. Sending an email when the pageview count is abnormal (simple DDoS attack detection)
  18. Applying RFX to Pageview Analytics
      1.1. Event data actor: a web user
      1.2. Event data agent: RFX-track-js
      1.3. Event data collector: RFX-track-server
      1.4. Event data router: Apache Kafka
      1.5. Event data processor: RFX-stream
      1.6. Event data storage: Redis, MySQL
      1.7. Event data query: RFX-data-api
      1.8. Event data reactor: RFX-reactor
  19. Demo and explanation of code and concepts
  20. https://github.com/rfxlab/pageview-analytics-with-rfx
  21. Readings
      ● http://www.decisionone.co.uk/press/agile-data-warehouse-design-sampler.pdf
      ● http://www.slideshare.net/votrongdao/agile-data-warehouse-34427798
      ● Apache Kafka Installation Video | How To Setup Apache Kafka: https://youtu.be/Fg8cTsEk7Gc
      ● https://www.tutorialspoint.com/apache_kafka/
      ● https://kafka.apache.org/quickstart
      ● http://xyu.io/2015/07/13/building-a-faster-etl-pipeline-with-flume-kafka-and-hive/
      ● http://blog.cloudera.com/blog/2015/06/architectural-patterns-for-near-real-time-data-processing-with-apache-hadoop/
      ● https://www.oreilly.com/ideas/drivetrain-approach-data-products
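
The three “Fast Data in web analytics” problems from slides 12 and 17 (pageview counts, unique users, and an alert on abnormal traffic) can be illustrated with a minimal in-memory sketch. This is not the RFX implementation: the class `PageviewStats`, the one-minute buckets, and the `spike_factor` and `min_baseline` parameters are all assumptions chosen for illustration, and the "abnormal" rule used here (current minute versus the mean of the preceding minutes) is just one simple heuristic.

```python
from collections import Counter, defaultdict


class PageviewStats:
    """Hypothetical in-memory counters, bucketed by minute."""

    def __init__(self, spike_factor=3.0, min_baseline=10):
        self.pageviews = Counter()        # minute -> number of pageviews
        self.visitors = defaultdict(set)  # minute -> set of user ids seen
        self.spike_factor = spike_factor  # assumed multiplier for "abnormal"
        self.min_baseline = min_baseline  # floor so quiet sites do not alert

    def record(self, timestamp, user_id):
        """Problem 1 and 2: count one pageview and remember the user."""
        minute = int(timestamp // 60)
        self.pageviews[minute] += 1
        self.visitors[minute].add(user_id)

    def unique_users(self, minute):
        """Problem 2: number of distinct users seen in the given minute."""
        return len(self.visitors.get(minute, ()))

    def is_abnormal(self, minute, window=5):
        """Problem 3: True when this minute's pageviews exceed
        spike_factor times the mean of the previous `window` minutes."""
        previous = [self.pageviews[m] for m in range(minute - window, minute)]
        baseline = max(sum(previous) / window, self.min_baseline)
        return self.pageviews[minute] > self.spike_factor * baseline


def simulate():
    """Five quiet minutes followed by a burst from a single client."""
    stats = PageviewStats(spike_factor=3.0, min_baseline=2)
    for minute in range(5):
        stats.record(minute * 60 + 1, "u1")
        stats.record(minute * 60 + 2, "u2")
    for i in range(20):
        stats.record(5 * 60 + i, "bot")
    return stats
```

In a real deployment the `is_abnormal` check would be the point at which an email is sent; the sketch only returns a boolean so that it stays free of network dependencies.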
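
The role mapping on slide 18 (agent, collector, router, processor, storage, query, reactor) can also be sketched end to end. The sketch below only assumes how the roles might hand data to each other; it does not use the RFX components or their APIs. A `queue.Queue` stands in for the Apache Kafka router, a plain dictionary stands in for Redis/MySQL storage, and every function name is hypothetical.

```python
import queue
import time


def agent_build_event(user_id, url, now=None):
    """Agent role: turn an actor's action into a plain event record."""
    return {"user": user_id, "url": url, "ts": time.time() if now is None else now}


def collector_accept(event, router):
    """Collector role: validate the event, then hand it to the router."""
    if not event.get("user") or not event.get("url"):
        return False  # reject malformed events
    router.put(event)
    return True


def processor_drain(router, storage, reactor, threshold):
    """Processor role: consume queued events, update storage,
    and notify the reactor when a URL reaches `threshold` pageviews."""
    while True:
        try:
            event = router.get_nowait()
        except queue.Empty:
            break
        url = event["url"]
        storage["pageviews"][url] = storage["pageviews"].get(url, 0) + 1
        storage["users"].setdefault(url, set()).add(event["user"])
        if storage["pageviews"][url] == threshold:
            reactor(url, threshold)


def query_stats(storage, url):
    """Query role: read aggregated numbers back out of storage."""
    return {
        "pageviews": storage["pageviews"].get(url, 0),
        "unique_users": len(storage["users"].get(url, ())),
    }


def run_demo():
    router = queue.Queue()                    # stand-in for a Kafka topic
    storage = {"pageviews": {}, "users": {}}  # stand-in for Redis/MySQL
    alerts = []                               # stand-in for outgoing emails

    def reactor(url, count):
        """Reactor role: react to a critical event."""
        alerts.append(f"{url} reached {count} pageviews")

    for user in ["alice", "bob", "alice"]:
        collector_accept(agent_build_event(user, "/home", now=0.0), router)
    collector_accept({"user": "", "url": "/home"}, router)  # rejected: no user
    processor_drain(router, storage, reactor, threshold=3)
    return query_stats(storage, "/home"), alerts
```

Keeping each role as a separate function mirrors the separation on the slide: any stand-in (the queue, the dictionary, the alert list) could be swapped for a real service without changing the other roles.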
