Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming Stack

Why do we still need SQL for Big Data?
How can we make Big Data more responsive and faster?

Published in: Data & Analytics

  1. Apache Phoenix with Actor Model (Akka.io) for Real-time Big Data Programming Stack. Why do we still need SQL for Big Data? How can we make Big Data more responsive and faster? By http://nguyentantrieu.info, Tech Lead at eClick team, FPT Online
  2. Contents: 1. What is Big Data, and why? 2. When a standard relational database (Oracle, MySQL, ...) is not good enough. 3. Common problems in Big Data systems. 4. Introducing open-source tools for Big Data systems: a. Apache Phoenix for ad-hoc queries; b. Actor Model and Akka.io for reactive data processing.
  3. What Does Big Data Actually Mean? “Big data means data that cannot fit easily into a standard relational database.” Hal Varian, Chief Economist, Google. http://www.brookings.edu/blogs/techtank/posts/2014/09/11-big-data-definition
  4. When a standard relational database (Oracle, MySQL, ...) is not good enough: the “analytic system” MySQL database from a startup, tracking all actions in mobile games (iOS, Android, ...)
  5. Complex analytic system and the “scale” pain
  6. Definition from the crowd: “Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.” Jonathan Stuart Ward and Adam Barker. Source: http://arxiv.org/abs/1309.5821 and http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
  7. “Chaotic” fact and the demand. 80% of that data is unstructured or “chaotic”: photos, videos and social media posts. This is data that says so much about us but cannot be analyzed via traditional methods. Demand: “Finding order among chaos”
  8. 3 common problems in Big Data systems: 1. Size: the volume of the datasets is a critical factor. 2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor. 3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
  9. Introducing open-source tools for Big Data systems: Apache Phoenix as a SQL ad-hoc query engine; the Actor Model as a nano-service for reactive data computation in the dawn of “Fast Data”
  10. Some innovative tools were born at the dawn of the Big Data age
  11. But could an elephant fly without wings?
  12. But a phoenix can fly!
  13. What is Apache Phoenix? Apache Phoenix is a SQL skin over HBase. This means Phoenix scales the same way HBase does: by scaling the underlying HBase cluster up or out.
  14. Phoenix SQL Engine
  15. Interesting features of Apache Phoenix
     ● Embedded JDBC driver that implements the majority of java.sql interfaces, including the metadata APIs.
     ● Allows columns to be modeled as a multi-part row key or as key/value cells.
     ● Full query support with predicate push-down and optimal scan key formation.
     ● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
     ● Versioned schema repository. Snapshot queries use the schema that was in place when the data was written.
     ● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
     ● Limited transaction support through client-side batching.
     ● Single table only: no joins yet, and secondary indexes are a work in progress.
     ● Follows ANSI SQL standards whenever possible.
     ● Requires HBase v0.94.2 or above.
     ● 100% Java.
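The DDL and DML features listed on slide 15 can be sketched through plain JDBC. The following is a minimal sketch, not code from the deck: the `page_shares` table, its columns and the `PhoenixStatements` class are hypothetical names chosen for illustration. The composite primary key shows how several columns can form a multi-part row key, and the explicit `commit()` reflects that Phoenix connections do not auto-commit by default, so upserts are batched on the client until committed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: the table and column names are illustrative only.
final class PhoenixStatements {
    // DDL: (site, url) together form the multi-part row key in HBase.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS page_shares ("
        + " site VARCHAR NOT NULL,"
        + " url VARCHAR NOT NULL,"
        + " shares BIGINT"
        + " CONSTRAINT pk PRIMARY KEY (site, url))";

    // DML: UPSERT inserts the row, or overwrites it if the key already exists.
    static final String UPSERT =
        "UPSERT INTO page_shares (site, url, shares) VALUES (?, ?, ?)";

    // Ad-hoc query: the predicate on the leading key column can narrow the scan.
    static final String TOP_URLS =
        "SELECT url, shares FROM page_shares WHERE site = ?"
        + " ORDER BY shares DESC LIMIT 10";

    // Creates the table and writes one sample row over an open Phoenix connection.
    static void createAndLoad(Connection conn) throws SQLException {
        try (Statement ddl = conn.createStatement()) {
            ddl.execute(CREATE_TABLE);
        }
        try (PreparedStatement upsert = conn.prepareStatement(UPSERT)) {
            upsert.setString(1, "example.com");
            upsert.setString(2, "/news/1");
            upsert.setLong(3, 42L);
            upsert.executeUpdate();
        }
        // Upserts are buffered client-side until the commit.
        conn.commit();
    }
}
```

Given an open `java.sql.Connection` to Phoenix, `PhoenixStatements.createAndLoad(conn)` would create the table and store one row; `TOP_URLS` can then be run through a `PreparedStatement` like any other JDBC query.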
  16. The Phoenix table schema
  17. Setting up the Phoenix JDBC driver
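Slide 17 shows the driver setup as a screenshot. As a rough sketch of the same idea in code, assuming the Phoenix client JAR is on the classpath: the connection URL has the form `jdbc:phoenix:<ZooKeeper quorum>[:<port>[:<root znode>]]`. The host name, port and znode used in the usage note are placeholder values, and `PhoenixConnect` is a hypothetical helper class, not part of Phoenix.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for building a Phoenix JDBC URL and opening a connection.
final class PhoenixConnect {
    // zkQuorum may be a single host or a comma-separated list of ZooKeeper hosts.
    static String url(String zkQuorum, int zkPort, String rootZnode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + rootZnode;
    }

    // Needs the Phoenix client JAR on the classpath and a reachable cluster.
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }
}
```

For a local test cluster, a call might look like `PhoenixConnect.open(PhoenixConnect.url("localhost", 2181, "/hbase"))`. SQL tools that ask for a driver class name generally need `org.apache.phoenix.jdbc.PhoenixDriver`.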
  18. Phoenix and the SQL tool in Eclipse 4
  19. Phoenix vs Hive (running over HDFS and HBase): http://phoenix.apache.org/performance.html
  20. Actor Model in the dawn of “Fast Data”
  21. http://youtu.be/TnLiEWglqHk (Google I/O 2014: The dawn of "Fast Data")
  22. The paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
  23. What is the actor model? ● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation. ● A fitting model for heavily parallel processing in a cloud environment
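To make the idea on slide 23 concrete, here is a minimal sketch of an actor built only from the Java standard library. It is not Akka's API and not code from the deck; `CounterActor` and its methods are hypothetical names. It shows the core properties of the model: the actor owns private state, other code communicates with it only by sending messages to its mailbox, and messages are processed one at a time, so the state needs no locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical toy actor that counts how often each message (e.g. a hashtag) arrives.
final class CounterActor implements Runnable {
    private static final Object POISON = new Object();   // sentinel that stops the actor
    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final Map<String, Integer> counts = new HashMap<>(); // touched only by the actor thread
    private final Thread thread = new Thread(this);

    void start() {
        thread.start();
    }

    // Asynchronous send: the caller never blocks on the actor's processing.
    void tell(String message) {
        mailbox.add(message);
    }

    // Asks the actor to stop after the messages already queued, then returns its state.
    Map<String, Integer> stopAndGet() {
        mailbox.add(POISON);
        try {
            thread.join();   // join makes the actor's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping actor", e);
        }
        return counts;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Object message = mailbox.take();   // one message at a time
                if (message == POISON) {
                    return;
                }
                counts.merge((String) message, 1, Integer::sum);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would create the actor, call `start()`, send messages with `tell(...)` from any thread, and finally read the counts with `stopAndGet()`. A production framework adds supervision, routing and distribution on top of this basic mailbox loop.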
  24. What is the actor model?
  25. Akka is the framework for implementing Actor computation
  26. Inspired by Google's MillWheel and Twitter's Storm, I have developed my own framework, “Rfx” (Reactive Functor Extension), with Akka as its core
  27. The pipeline for finding social trends in real-time analytics
  28. Facebook social trending from a website
  29. Quick demo: using Akka (Rfx) and Apache Phoenix for social media real-time analytics
  30. Links for self-study and research
     Actor Model and Programming:
     ● http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model
     ● http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
     ● http://www.infoq.com/articles/reactive-cloud-actors
     ● http://www.mc2ads.com/p/rfx-for-big-data-developer.html
     Apache Phoenix:
     ● http://java.dzone.com/articles/apache-phoenix-sql-driver
     ● http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
     Big Data and Data Science:
     ● http://www.mc2ads.com and http://www.mc2ads.org
     ● http://datascience101.wordpress.com
     ● http://lambda-architecture.net
     ● http://www.bigdata-startups.com
     ● https://www.coursera.org/course/datasci