Evolving Beyond the Data Lake: A Story of Wind and Rain

Strata Singapore 2016

Published in: Data & Analytics

  1. Evolving Beyond the Data Lake: A Story of Wind and Rain (© 2016 MapR Technologies)
  2. Industry Leaders Are Investing in Disruptive Technology Now. Innovating and reducing costs at the same time. [Chart: Investment in Next-Gen vs. Legacy Technologies for Data, 2013 to 2020, in billions of dollars; series: Total $ Growth of IT Market, Next-Gen Growth, Legacy Market Growth/Shrink in $.] 90% of data is on next-gen technology in just four years. Next-gen consists of cloud, big data, software and hardware related expenses. Source: IDC, Gartner; Analysis & Estimates: MapR.
  3. Application Development and Deployment. [Diagram, 2005 vs. Today; labels: Oracle; Bulk Load; Machine Learning; Data Lake; Predictive Modeling; BI / Reporting; Insights DB; Events (Kafka); NoSQL; SQL Server; Graph DB; Microservice (.NET); Microservice (NodeJS); Microservice (Java); Customer Insights; SQL Server; IIS, ASP.NET; Desktop Browser (JavaScript, jQuery); SQL; HTML, CSS, JS; Microsoft Reporting Service; Desktop Browser (JavaScript, 20+ Frameworks); Tablet; Native Android; Native iOS; JSON; JSON, CSS, HTML, JS; Backend for Frontend (Java)]
  4. Application Development and Deployment. [Diagram, 2005 vs. Today; labels: Oracle; Bulk Load; Machine Learning; Data Lake; Predictive Modeling; BI / Reporting; Insights DB; Events (Kafka); NoSQL; SQL Server; Graph DB; Microservice (.NET); Backend for Frontend (Java); Microservice (NodeJS); Microservice (Java); Desktop Browser (JavaScript, 20+ Frameworks); Tablet; Native Android; Native iOS; Customer Insights; JSON; JSON, CSS, HTML, JS; SQL Server; IIS, ASP.NET; Desktop Browser (JavaScript, jQuery); SQL; HTML, CSS, JS; Microsoft Reporting Service]
  5. Messaging platforms
  6. What’s a Stream? A stream is an unbounded sequence of events carried from a set of producers to a set of consumers. Producers and consumers don’t have to be aware of each other; instead, they participate in shared topics. This is called publish/subscribe. [Diagram labels: Producers; /Events:Topic; Consumers]
  7. Publishers and Subscribers (pub-sub). [Diagram labels: /Events:Topic; Analytics Consumers; Stream Processors; Social Platforms; Servers (Logs, Metrics); Sensors; Mobile Apps; Other Apps & Microservices; Alerting Systems; Stream Processing Frameworks; Databases & Search Engines; Dashboards; Other Apps & Microservices]
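
The publish/subscribe model on slides 6 and 7 can be sketched in a few lines of Python. This is a toy, in-memory illustration, not the Kafka or MapR Streams API: the `Stream` class, its `publish` and `poll` methods, the subscriber names, and the sample events are all hypothetical. Only the topic name `/Events:Topic` comes from the slides.

```python
from collections import defaultdict


class Stream:
    """Toy in-memory stream: each topic is an append-only list of events."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered events
        self._cursors = defaultdict(int)   # (subscriber, topic) -> next index to read

    def publish(self, topic, event):
        # Producers only know the topic, never the consumers.
        self._topics[topic].append(event)

    def poll(self, subscriber, topic):
        # Each subscriber has its own cursor, so subscribers do not affect each other.
        start = self._cursors[(subscriber, topic)]
        events = self._topics[topic][start:]
        self._cursors[(subscriber, topic)] = start + len(events)
        return events


stream = Stream()
stream.publish("/Events:Topic", {"source": "sensor-1", "value": 21.5})
stream.publish("/Events:Topic", {"source": "mobile-app", "value": "login"})

# Two independent subscribers each see every event on the topic.
dashboard_events = stream.poll("dashboard", "/Events:Topic")
alerting_events = stream.poll("alerting", "/Events:Topic")
```

Because each subscriber keeps its own cursor, adding a new consumer requires no change to any producer, which is the decoupling the slides describe.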
  8. Considering a Messaging Platform
     • 50-100k messages per second used to be good
       – Not really good enough to handle decoupled communication between services
     • Kafka model is BLAZING fast
       – Kafka 0.9 API with message sizes at 200 bytes
       – MapR Streams on a 5 node cluster sustained 18 million events / sec
       – Throughput of 3.5GB/s and over 1.5 trillion events / day
     • Manual sharding is not a “great” solution
       – Adding more servers should be easy and foolproof, not painful
       – Yes, I have lived through this
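
The benchmark figures on slide 8 can be sanity-checked with simple arithmetic. The sketch below assumes that every event is a 200-byte message and that GB means 10^9 bytes; neither assumption is stated on the slide.

```python
EVENTS_PER_SEC = 18_000_000      # from the slide: 18 million events / sec
MESSAGE_BYTES = 200              # assumption: every event is a 200-byte message
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_sec = EVENTS_PER_SEC * MESSAGE_BYTES
gb_per_sec = bytes_per_sec / 1e9                     # assumption: decimal gigabytes
events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY

print(f"{gb_per_sec:.1f} GB/s, {events_per_day:,} events/day")
```

Under those assumptions the calculation gives 3.6 GB/s and about 1.56 trillion events per day, in line with the slide's rounded figures of 3.5GB/s and over 1.5 trillion events per day.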
  9. Goals
     • Real-time or near-time
       – Includes situations with deadlines
       – Also includes situations where delay is simply undesirable
       – Even includes situations where delay is just fine
     • Microservices
       – Streaming is a convenient idiom for design
       – Microservices … you know we wanted it
       – Service isolation is a key requirement
  10. Advantages of Messaging and Real-time Enablement
      • Fewer moving parts
        – Fewer things to go wrong
      • Better resource utilization
        – Scale any application up or down on demand
      • Common deployment model (new isolation model)
        – Repeatability between environments (dev, qa, production)
      • Improved integration testing
        – Listen to production streams in dev and qa (** this is a BIG DEAL! **)
      • Shared file system
        – Get at the data anywhere in the cluster
        – Simplifies business continuity
  11. A microservice is loosely coupled with bounded context
  12. How to Couple Services and Break micro-ness
      • Shared schemas, relational stores
      • Ad hoc communication between services
      • Enterprise service buses
      • Brittle protocols
      • Poor protocol versioning
      Don’t do this!
  13. How to Decouple Services
      • Use self-describing data
      • Private databases
      • Infrastructural communication between services
      • Use modern protocols
      • Adopt future-proof protocol practices
      • Use shared storage where necessary due to scale
  14. Decoupled Architecture. [Diagram labels: Producer Activity Handler Producer Producer Historical Interesting Data Real-time Analysis Results Dashboard Anomaly Detection]
  15. Mechanisms for Decoupling
      • Traditional message queues?
        – Message queues are classic answer
        – Key feature/flaw is out-of-order acknowledgement
        – Many implementations
        – You pay a huge performance hit for persistence
      • Kafka-esque Logs?
        – Logs are like queues, but with ordering
        – Out-of-order consumption is possible, acknowledgement not so much
        – Canonical base implementation is Kafka
        – Performance plus persistence
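
The contrast on slide 15 between queues and logs can be illustrated with a small sketch. It is a simplified model, not the API of any real message queue or of Kafka: `Queue`, `Log`, and `LogConsumer` and their methods are hypothetical names. The queue tracks an acknowledgement per message, so acknowledgements can arrive in any order; the log consumer tracks a single committed offset, so progress is acknowledged in order, yet the consumer can still seek anywhere to read.

```python
class Queue:
    """Queue model: every message is acknowledged individually."""

    def __init__(self, messages):
        self.unacked = dict(enumerate(messages))  # message id -> payload

    def ack(self, message_id):
        # Any message can be acknowledged, regardless of its position.
        del self.unacked[message_id]


class Log:
    """Log model: an ordered, append-only sequence addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class LogConsumer:
    """Consumer progress is a single offset rather than per-message acks."""

    def __init__(self, log):
        self._log = log
        self.committed = 0  # everything before this offset counts as processed

    def poll(self, max_records=10):
        return self._log.read(self.committed, max_records)

    def commit(self, count):
        self.committed += count

    def seek(self, offset):
        # Out-of-order consumption: jump to any offset, for example to replay.
        self.committed = offset


queue = Queue(["a", "b", "c"])
queue.ack(1)  # acknowledge the middle message first

log = Log()
for i in range(5):
    log.append(f"event-{i}")

consumer = LogConsumer(log)
first = consumer.poll(max_records=3)
consumer.commit(len(first))
rest = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
```

Tracking one offset per consumer instead of one acknowledgement per message is a large part of why a log can be persistent without a heavy bookkeeping cost.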
  16. Shared Resources
  17. Fraud Detection. [Diagram labels: ?; POS 1: location, t, card #, yes/no?; POS 2: location, t, card #, yes/no?]
  18. Traditional Solution. [Diagram labels: POS 1..n; Fraud detector; Last card use]
  19. What Happens Next? [Diagram labels: POS 1..n; Fraud detector; Last card use; POS 1..n; Fraud detector; POS 1..n; Fraud detector]
  20. What Happens Next? [Diagram labels: POS 1..n; Fraud detector; Last card use; POS 1..n; Fraud detector; POS 1..n; Fraud detector]
  21. How to Get Service Isolation. [Diagram labels: POS 1..n; Fraud detector; Last card use; Updater; card activity]
  22. New Uses of Data. [Diagram labels: POS 1..n; Fraud detector; Last card use; Updater; Card location history; Other card activity]
  23. Scaling Through Isolation. [Diagram labels: POS 1..n; Last card use; Updater; POS 1..n; Last card use; Updater; card activity; Fraud detector; Fraud detector]
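
One possible reading of the service-isolation diagrams on slides 21 to 23 is sketched below. The transcript only preserves the diagram labels, so the data flow is an assumption: the fraud detector reads a "last card use" table and appends each transaction to a "card activity" stream, while a separate updater consumes that stream and is the only writer of the table. The `CardEvent` type, the fraud rule (same card, different location, within ten minutes), and the sample card number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardEvent:
    card: str
    location: str
    t: float  # event time in seconds


card_activity = []   # stand-in for the "card activity" stream
last_card_use = {}   # stand-in for the "last card use" table; only the updater writes it


def fraud_detector(event, max_gap_seconds=600):
    """Answer yes/no for a transaction, then publish it to the stream."""
    previous = last_card_use.get(event.card)
    # Hypothetical rule: same card used in a different location too soon.
    suspicious = (
        previous is not None
        and previous.location != event.location
        and event.t - previous.t < max_gap_seconds
    )
    card_activity.append(event)
    return not suspicious


def updater(offset):
    """Consume new card activity and refresh the last-use table."""
    for event in card_activity[offset:]:
        last_card_use[event.card] = event
    return len(card_activity)  # next offset to read from


offset = 0
ok_first = fraud_detector(CardEvent("4111", "Singapore", 0.0))
offset = updater(offset)
ok_second = fraud_detector(CardEvent("4111", "London", 120.0))
offset = updater(offset)
```

With this split, a new consumer such as a card location history service can read the same stream without touching the fraud detector or its table.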
  24. Use Cases
  25. Event-based Data Drives Applications. [Diagram labels: Failure Alerts; Real-time application & network monitoring; Trending now; Web Personalized Offers; Real-time Fraud Detection; Ad optimization; Supply Chain Optimization]
  26. Classifiers Fighting Fraudulent Web Traffic. [Diagram labels: Activity Stream; Click Stream; Deviation from Normal; Blacklist Activities; Whitelist Activities; User Activity Profile; Known Bad Classifier; All OK Classifier; Session Alteration Stream; Notify Security]
  27. Similarities between Marketing and Fraud?
      Website Fraud
      • Build a user profile
        – What are their normal usage patterns
      • Build “segmented” profiles
        – What do real users normally do
      • Dynamically alter website
        – Prevent user functionality
      • Kick-off external workflows
        – Notify security team
      Customer 360
      • Build a user profile
        – What type of content do they like
      • Build “segmented” profiles
        – Company affiliation
      • Dynamically alter website
        – Show alternate content
      • Kick-off external workflows
        – Nurture emails
  28. Legacy Business Platforms
      • IT must integrate all the products
      • Inability to operationalize the insight rapidly
      • Can’t deal with high speed data ingestion and processing
      • Scale up architecture leads to high cost
      [Diagram labels: Message Bus; Specialized Storage; Operational Applications; J2EE AppServer; Relational Database; Specialized Storage; Analytical Applications; Analytic Database; ETL Tool; BI Tool]
  29. Converged Data Platform. [Diagram labels: Analytical Applications; Operational Applications; Converged Applications; Complete Access to Real-time and Historical Data in One Platform; Developers Creating Database and Event Based Applications; (Bottom Line Initiatives); (Top Line Initiatives); Analysts Creating BI Reports and KPIs on Data Warehouse; Historical Data; Current Data]
  30. MapR Platform Services: Open API Architecture Assures Interoperability, Avoids Lock-in. [Diagram labels: Web-Scale Storage; MapR-FS; MapR-DB; Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace; High Availability; MapR Streams; Event Streaming; Database; HDFS API; POSIX; NFS; SQL, HBase API; JSON API; Kafka API]
  31. Converged Application Benefits
      • Consumers scale horizontally with partitions
      • 1:1 mapping between consumer and partition
      • Enables predictable scaling as production needs grow
      • Data can be seamlessly replicated to another cluster
      • Enables HA with zero code changes
      • Data is indexed dynamically according to receivers, senders
      • Scales beyond the capabilities of Kafka
      • Snapshots can be taken to capture state
      • Enables faster testing and deployment of applications
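
The first two bullets on slide 31, consumers scaling with partitions up to a 1:1 mapping, can be illustrated with a minimal sketch. It is not the partitioner or assignment strategy used by Kafka or MapR Streams: `partition_for` and `assign` are hypothetical helpers, the CRC32 hash is chosen only because it is stable across runs, and the round-robin assignment is a simplifying assumption.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Spread partitions over consumers round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two consumers share four partitions, two each.
two_consumers = assign(partitions, ["c0", "c1"])

# Scaling out to four consumers reaches the 1:1 mapping.
four_consumers = assign(partitions, ["c0", "c1", "c2", "c3"])

# The same key always lands on the same partition, so per-key order is preserved.
same_partition = partition_for("card-4111", 4) == partition_for("card-4111", 4)
```

In this model the number of partitions caps useful parallelism: a fifth consumer would receive no partition, which is why capacity can be planned in terms of partitions.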
  32. Not All Data Platforms are the Same
  33. Engage with us! @kingmesal, jscott@mapr.com, kingmesal
