Data Lakes are Worth Saving

Data lakes started as a place for big data discovery and exploration for highly technical users. But leading organizations have evolved data lakes to support a wide variety of business use cases, assisted by machine learning and artificial intelligence, visual analytics, BI tools, and other methods.

The data native concept helps explain how big data should be used, particularly for data lakes. But so many attempts at building data lakes have failed due to process and technology mismatches.

Join us for this webinar with industry experts from Evolved Media, Arcadia Data, and MapR to learn:

- Why and how data lakes have failed over the years
- What key analytic and data governance capabilities you should aspire to have in your data lake
- Key challenges to overcome
- Principles to consider for your data lake strategy

Data Lakes are Worth Saving

  1. Arcadia Data. Proprietary and Confidential. March 28, 2018. Webinar: Data Lakes are Worth Saving
  2. Logistics for Today’s Webinar. Questions: log questions in the chat panel located in the GoToWebinar control panel of your screen; questions will be addressed during the Q&A session of the event. Event recording: an archived version of the event recording will be available at worth-saving-webinar/. PDF slides: a PDF of the speaker slides will be distributed to all attendees. (Screenshots: desktop control panel, browser control panel.)
  3. Agenda: (1) Meet your presenters; (2) A data native approach: key to saving the data lake; (3) Why have data lakes failed?; (4) Big data versus small data questions; (5) How to save your data lake; (6) Four key challenges; (7) The right architecture.
  4. Meet Your Presenters. Dan Woods, CEO & Chief Analyst, Early Adopter Research: “I create ideas about technology products, based on a broad technical understanding. By writing as an analyst in Forbes and working with Evolved Media’s clients, I see the magic in technology and why it matters to IT buyers.” Steve Wooledge, VP Marketing, Arcadia Data: Steve Wooledge is responsible for overall go-to-market strategy and marketing for Arcadia Data. He is a 15-year veteran of enterprise software in both large public companies and early-stage start-ups and has a passion for bringing innovative technology to market. Jack Norris, Senior VP of Data and Applications, MapR: Jack drives the adoption and understanding of applications enabled by data convergence. With 20+ years of enterprise software experience, Jack has demonstrated success in defining analytic, operational, and cloud applications.
  5. A Data Native Approach: Key to Saving the Data Lake. In a data native approach: you analyze the data in place (no extracts); you empower business users (semantic modeling); you operationalize results (dashboards of data plus business context); you view data across all endpoints and users (see what’s being used).
  6. Poll #1: Where are you with your big data / data lake deployment? (1) Gathering knowledge: thinking about Hadoop/cloud/other platforms; (2) Developing strategy: defining architecture, selecting tools; (3) Piloting: big data analytics platform in place and testing; (4) Deployed: defined use case, end users using/analyzing data; (5) Fully operational: broad use of data lake across the business.
  7. Why Have Data Lakes Failed? Use old processes; data swamps; pilot purgatory; can’t operationalize; no analyst ownership; small data questions.
  8. Big Data Versus Small Data Questions. Sales, small data question: What are the sales trends over the last two years? Sales, big data question: Who is about to churn and what do we know about their journey? How much more can we find out? Security, small data question: What are the login/logout behaviors of contractors? Security, big data question: Correlate online actions of particular contractors, analyzing multiple sources (packets, logins, IP addresses, pages, files accessed). Big data questions provide a rich and broad context as opposed to a precise answer to a precise question.
  9. How to Save Your Data Lake. Save data lakes by making them: a playground for analysts with all needed data; available for easy integration into analytic platforms and business apps; able to support real-time, interactive, and batch use; able to handle all data, including unstructured data and data read via schema on read.
  10. Four Key Challenges to Face: (1) data discovery/data organization; (2) interactive use to drive business action; (3) simplified configuration, operations, administration; (4) using the resulting insights and data in real time and in support of applications.
  11. The Right Architecture. Diversity of application processing and analytics on a common data infrastructure; interactive and in-place analytics. [Diagram labels: Variety, Volume, Scale, Blend, Speed; Data Fabric, Data Lake, Applications, APIs, Streams, Replication, Edge Support.]
  13. © 2018 MapR Technologies. Architecture Matters (on-premise or cloud infrastructure): dashboards providing visibility and insight into business event impact, with an open layer for 3rd party and customization; a converged processing and analytics layer for low-latency, broad-ranging applications; an underlying data fabric to provide scale, reliability, protection, and security.
  14. Saving Data Lakes with Business Insights. Steve Wooledge
  15. Saving the Data Lake. “Data and analytics leaders are feeling the pressure of rapidly increasing amounts of unprocessed data in data lakes. The challenge is in getting value from this data through analytics insights.” Source: Gartner, Derive Value From Data Lakes Using Analytics Design Patterns, 26 September 2017.
  16. Big Data Challenges, Trending. [Bar chart, percent of respondents, axis 0 to 60. Categories: integrating multiple data sources; defining our strategy; funding; risk and governance; skills and capabilities; determining value; integrating with existing infrastructure; leadership or org. issues; other; understanding what is “big data”; infrastructure/architecture.] Source: Gartner Big Data Adoption Survey, September 2016 (2016 survey, n = 184).
  17. “Data” and “Platforms” Have Changed. Why Haven’t BI Tools? Data, from: rows and columns; batch; smaller data volumes; limited # sources; mainly internal; tables; schema on write. Data, to: rows and columns and multi-structured; batch and interactive and real-time; small and large volumes; many sources; internal and external; tables and docs, search indexes, events; schema on write and schema on read. Platforms, from: proprietary hardware; ETL; data warehouses. Platforms, to: commodity hardware; ETL and ELT and ELDT; data warehouses and data lakes. BI tools, from: SQL queries; extracts; cubes; BI servers; small/med scale. BI tools, to: Why haven’t BI tools evolved?
  18. Poll #2: How do you give (or plan to give) users access to analyze data in the data lake? (1) Development tools (e.g., Spark, MapReduce); (2) Direct SQL access (e.g., Hive, Impala, Spark SQL, Drill); (3) Traditional BI tools (e.g., Tableau, Qlik, MicroStrategy); (4) Machine learning / AI tools; (5) Native visual analytics and BI tools.
  19. Can Your BI Stand up to Big Data Analytic Requirements?
  20. Enterprises Today Need Two Separate BI Standards
  21. Data Warehouse BI Architecture. [Diagram labels: BI Server; Analytic Process; Optimize Physical; Semantic Layer; Secure Data; Load Data; Data Warehouse (RDBMS). Big Data Requirements: Native Connection; Semi-Structured; Parallel; Real-time.]
  22. Data Lake BI Architecture: The Native BI and Analytics Way. Arcadia Data was built from inception to run natively within data lakes. [Diagram labels: BI Server; Analytic Process; Optimize Physical; Semantic Layer; Secure Data; Load Data; Data Warehouse (RDBMS); Data Lake (*DFS, Cloud Object Storage). Big Data Requirements: Native Connection; Semi-Structured; Parallel; Real-time.]
  23. Real-Time Customer Support Response. Fortune 500 communications technology company. Use cases: customer support; log analytics; identifying upsell opportunities and churn risks. Challenges: customer support needed real-time access to reports that covered a million online meetings per day with multiple attendees and multiple interactions; SQL and SQL-on-Hadoop struggled as user concurrency scaled. Results: customer service reps used to wait tens of minutes for a customer report, and with Smart Acceleration™ they run reports and respond live, using existing BI tools; significant performance gains, especially in concurrency; 300 CSRs can run ad hoc reports. “Arcadia Enterprise was at least 5-10x faster for complex queries for dozens of concurrent users. Arcadia dramatically accelerates our insights while reducing cost and complexity.” (Data Lake Service Owner)
  24. The Result: Faster BI Analytics and Higher User Concurrency. [Chart: Query 1 Performance Testing, Heavy Query. Completion time (seconds, axis 0 to 1400) versus number of concurrent jobs (1, 2, 5, 10, 15, 30) for Arcadia, Hive, Impala, and Spark.] Customer benchmark of a legacy BI tool accelerated by Arcadia Data on a MapR data lake.
  25. BI for Data Lakes Must be Architected for Scale and Performance. Data warehouse BI architecture: BI server can’t scale out; significant data movement, modeling, security management. “Big data” BI architecture: edge node BI server only scales via long planning; performance optimizations require heavy IT intervention; only passing SQL with no semantic information (e.g., filters). Native BI within data lake architecture: scales linearly with DataNodes while retaining agility; semantic model is “pushed down” and distributed; highly optimized “based on usage” physical model; no data movement; single security model. Native BI = “lossless”, high-definition analytics. [Diagram labels: Browser; BI Server; JDBC; Edge Node; DataNodes; DataNodes + Arcadia; Data Lake Cluster.]
  26. Smart Acceleration Leverages What Is Learned during Data Discovery. Query acceleration for scale, performance, and concurrency. Arcadia Enterprise makes recommendations from ad hoc queries; build these with a click. Benefits: fast query responses; minimal modeling; live acceleration (no downtime). [Diagram labels: Data Lake Cluster; All Granular Data; Analytical Views; Ad hoc queries; Accelerated application queries.]
  27. Data Warehouse BI Tools Treat Data Lakes Like Any Other Database. With traditional BI tools, the analytics process is the same for data warehouses and data lakes (BI server): (1) land/secure data; (2) semantic modeling; (3) extract to BI server; (4) secure; (5) performance modeling; (6) analytic/visual discovery; (7) production. Steps 2-6 are iterated in a feedback loop (2nd iteration through Nth iteration). Annotations: too early (use cases not fully defined yet); slow, repetitive feedback loop to refine models; too late (need to re-model based on use cases); high cost to deploy, govern, and manage.
  28. Native BI Flips Discovery and Optimization for Faster Time to Value. Analytics and BI within the data lake: (1) land/secure data; (2) analytic/visual discovery; (3) semantic modeling; (4) optimize performance; (5) production. Faster time to value: quick feedback loops; one security model; no movement of data; discover first, take action second. Performance modeling for production deployment is optional. Data warehouse or data lake with a BI server: (1) land/secure data; (2) semantic modeling; (3) extract to BI server; (4) secure; (5) performance modeling; (6) analytic/visual discovery; (7) production, iterating on steps 2-6 in a feedback loop through the Nth iteration. High cost to deploy, govern, and manage.
  29. Top Use Cases for Native BI and Analytics on Data Lakes. Categories: modern BI platform & customer intelligence; financial services and insurance; risk & security optimization; IoT analytics. Use cases listed: DW optimization; Customer 360; marketing analytics; Fundamental Review of Trading Book (FRTB); trade surveillance; anti-money laundering; location intelligence; cybersecurity; security information & event management; fraudulent behavioral analysis; data center monitoring; telematics; system log analysis; manufacturing quality assurance; predictive maintenance.
  30. Demo: See It in Action
  31. Social media. Saving the Data Lake Whitepaper; See MapR & Arcadia in Action; Download Arcadia Instant. g-data-lakes-whitepaper/
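
Slide 26 describes query acceleration that draws on what is learned from ad hoc queries and recommends analytical views. As a rough illustration of that general idea only, and not of how Arcadia Enterprise actually works, the Python sketch below counts which dimension sets appear most often in a query log, recommends the frequent ones, and pre-aggregates the top recommendation. The names `QUERY_LOG`, `ROWS`, `recommend_views`, and `build_view`, and all of the data, are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical log of ad hoc queries: the dimensions each query grouped by.
QUERY_LOG = [
    ("region", "product"),
    ("region",),
    ("region", "product"),
    ("customer",),
    ("product", "region"),  # same dimension set, different order
]

# Hypothetical granular rows sitting in the data lake.
ROWS = [
    {"region": "east", "product": "a", "customer": "c1", "sales": 10},
    {"region": "east", "product": "a", "customer": "c2", "sales": 5},
    {"region": "west", "product": "b", "customer": "c1", "sales": 7},
]


def recommend_views(query_log, min_count=2):
    """Return dimension sets queried at least min_count times, most frequent first."""
    counts = Counter(frozenset(dims) for dims in query_log)
    return [
        tuple(sorted(dims))
        for dims, n in counts.most_common()
        if n >= min_count
    ]


def build_view(rows, dims, measure):
    """Pre-aggregate (sum) one measure by the given dimensions."""
    view = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        view[key] += row[measure]
    return dict(view)


recommended = recommend_views(QUERY_LOG)
view = build_view(ROWS, recommended[0], "sales")
print(recommended)
print(view)
```

Later queries that group by the same dimensions could then read from the small pre-aggregated `view` instead of scanning every granular row, which is the general reason a pre-built aggregate helps under high concurrency.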