Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Time's Up! Getting Value from Big Data Now

317 views

Published on


The Briefing Room with Dr. Robin Bloor and CASK

We all know the promise of big data, but who gets the value? There are plenty of success stories already, and most of them involve one key ingredient: facilitated access to important data sets. Most research studies suggest that the Pareto principle applies: 80 percent goes to data integration, and only 20 to analysis. Inverting that balance is the Holy Grail.

Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why the time has finally come for turning the tables on the status quo in analytics. He'll be briefed by CASK CEO Jonathan Gray, who will showcase his company's big data integration platform, CDAP, which was specifically designed to expedite time-to-value for big data.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Time's Up! Getting Value from Big Data Now

  1. 1. Grab some coffee and enjoy the pre-show banter before the top of the hour! !
  2. 2. The Briefing Room Time's Up! Getting Value from Big Data Now
  3. 3. Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  4. 4. u  Reveal the essential characteristics of enterprise software, good and bad u  Provide a forum for detailed analysis of today s innovative technologies u  Give vendors a chance to explain their product to savvy analysts u  Allow audience members to pose serious questions... and get answers! Mission
  5. 5. Big Integration u  Old infrastructure lacking u  New pipes are needed u  Well begun is half done!
  6. 6. Analyst Robin Bloor is Chief Analyst at The Bloor Group robin.bloor@bloorgroup.com @robinbloor
  7. 7. CASK u  CASK offers a unified integration platform for big data applications and data lakes u  Its CDAP architecture provides data containers, program containers and application containers for data and applications on Hadoop u  CASK also offers Hydrator for building and managing data pipelines and data lakes, and Tracker for data lake governance
  8. 8. Guest Jonathan Gray Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in startups, open source and all things data. Prior to founding Cask, Jonathan was a software engineer at Facebook where he drove HBase engineering efforts, including Facebook Messages and several other large-scale projects from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase and is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in Electrical and Computer Engineering and Business Administration from Carnegie Mellon University.
  9. 9. Big Data on Tap cask.co November 1, 2016 The Briefing Room Jonathan Gray Founder & CEO
  10. 10. cask.co Hadoop Enables New Applications and Architectures 2 ENTERPRISE DATA LAKES BIG DATA ANALYTICS PRODUCTION DATA APPS Batch and Realtime Data Ingestion Any type of data from any type of source in any volume Batch and Streaming ETL Code-free self-service creation and management of pipelines SQL Exploration and Data Science All data is automatically accessible via SQL and client SDKs Data as a Service Easily expose generic or custom REST APIs on any data 360o Customer View Integrate data from any source and expose through queries and APIs Realtime Dashboards Perform realtime OLAP aggregations and serve them through REST APIs Time Series Analysis Store, process and serve massive volumes of time-series data Realtime Log Analytics Ingestion and processing of high-throughput streaming log events Recommendation Engines Build models in batch using historical data and serve them in realtime Anomaly Detection Systems Process streaming events and predictably compare them in realtime to historical data NRT Event Monitoring Reliably monitor large streams of data and perform defined actions within a specified time Internet of Things Ingestion, storage and processing of events that is highly-available, scalable and consistent ENTERPRISE DATA LAKES BIG DATA ANALYTICS PRODUCTION DATA APPS Batch and Realtime Data Ingestion Any type of data from any type of source in any volume Batch and Streaming ETL Code-free self-service creation and management of pipelines SQL Exploration and Data Science All data is automatically accessible via SQL and client SDKs Data as a Service Easily expose generic or custom REST APIs on any data 360o Customer View Integrate data from any source and expose through queries and APIs Realtime Dashboards Perform realtime OLAP aggregations and serve them through REST APIs Time Series Analysis Store, process and serve massive volumes of time-series data Realtime Log Analytics Ingestion and processing of high-throughput streaming log events Recommendation Engines Build models in batch using historical data and serve them in realtime Anomaly Detection Systems Process streaming events and predictably compare them in realtime to historical data NRT Event Monitoring Reliably monitor large streams of data and perform defined actions within a specified time Internet of Things Ingestion, storage and processing of events that is highly-available, scalable and consistent ENTERPRISE DATA LAKES BIG DATA ANALYTICS PRODUCTION DATA APPS Batch and Realtime Data Ingestion Any type of data from any type of source in any volume Batch and Streaming ETL Code-free self-service creation and management of pipelines SQL Exploration and Data Science All data is automatically accessible via SQL and client SDKs Data as a Service Easily expose generic or custom REST APIs on any data 360o Customer View Integrate data from any source and expose through queries and APIs Realtime Dashboards Perform realtime OLAP aggregations and serve them through REST APIs Time Series Analysis Store, process and serve massive volumes of time-series data Realtime Log Analytics Ingestion and processing of high-throughput streaming log events Recommendation Engines Build models in batch using historical data and serve them in realtime Anomaly Detection Systems Process streaming events and predictably compare them in realtime to historical data NRT Event Monitoring Reliably monitor large streams of data and perform defined actions within a specified time Internet of Things Ingestion, storage and processing of events that is highly-available, scalable and consistent Data Applications Drive Meaningful Business Value
  11. 11. cask.co3 But Getting Value from Big Data is Hard Too much focus on infrastructure and integration, rather than applications and analytics Divergence of distributions and technologies Integration silos created by narrow point solutions Proliferation of projects, services and APIs Complexity of technologies and new user learning curve
  12. 12. cask.co4 Without a consistent set of tools, IT will not be an effective data enabler for the business Developer Architecture & Programming Focused on Apps & Solutions Ops Configuring & Monitoring Focused on Infrastructure & SLA’s LOB / Product Driving Revenue & Decision Making Focused on Products & Insights Data Scientist Scripting & Machine Learning Focused on Data & Algorithms And There Are Many Faces of Hadoop
  13. 13. cask.co5 Enter Cask AT&T, Cloudera and Ericsson Strategic Investors 3.5 Cask Data Application Platform, Cask Hydrator and Cask Tracker Latest Release AT&T, Ericsson, Lotame, Salesforce, Cloudera, Hortonworks, MapR, Microsoft, IBM, Tableau… Key Customers & Partners By early Hadoop engineers from Facebook and Yahoo! Founded in 2011 Andreessen Horowitz, Safeguard, Battery Venture and Ignition Partners Raised $37+ Million Featuring Cask Market,
 the “big data app store” NEW: CDAP 4 Preview A Container Architecture that puts Big Data on Tap Why “Cask” ?
  14. 14. cask.co6 Convergence of Big Data Apps and Data Integration The Evolution of the Cask Platform Big Data Apps + Data Integration • Data ingest • Data pipelines • Workflows and metadata “WebLogic Meets Informatica” CDAP v3 Big Data App Server • Abstractions & integrations • Metrics & logs • Debugging environment “WebLogic for Hadoop” CDAP v2 Unified Integration for Big Data • Security & governance • Self-service environment • Enterprise integrations “Unified Big Data Integration” CDAP v4
  15. 15. cask.co Introducing Cask Data Application Platform (CDAP) 7 First Unified Integration Platform for Big Data
 
 Platform for distributed apps, bringing together
 application management with data integration 
 • 100% open source and built for extensibility • Supports all major Hadoop distributions and clouds • Integrates the latest open source big data technologies Data Lake Fraud Detection Recommendation Engine Sensor Data Analytics Customer 360 Modern Data Integration Distributed Application Framework Self-Service User Experience Enterprise-grade Security & Governance
  16. 16. cask.co8 • Real-time and Batch • Reliable and Scalable • Simple and Self-Service Modern Data Integration EXPLORE for analytics and data science PROCESS for ETL and machine learning SERVE any data to any destination INGEST any data from any source
  17. 17. cask.co9 Distributed Application Framework DEVELOP rapidly build applications TEST powerful test and CI framework DEPLOY run any apps in any environment SCALE horizontally scale apps and data • Real-time and Batch • Memory, Local, Distributed • Analytics and Applications
  18. 18. cask.co10 Security and Governance CAPTURE store all metadata about your data DISCOVER easily locate any of your data TRACK every audit plus lineage graphs ANALYZE understand usage patterns of data AUTHENTICATE AUTHORIZEENCRYPT
  19. 19. cask.co11 A data discovery tool to explore metadata and usageA code-free framework to build and run data pipelines Self-Service User Experience Drag & drop graphical interface Create, debug, deploy and manage Separation of logic and execution environment Native to Hadoop & Spark — scales out Rich app- level metadata Track lineage and audits Analyze usage of datasets MDM integration framework
  20. 20. cask.co The CDAP Architecture 12 Applications Programs MapReduce Spark Tigon Workflow Service Worker Metadata Datasets Table Avro Parquet Timeseries OLAP Cube Geospatial ObjectStore Metadata Metadata • Application Container Architecture • Reusable Programming Abstractions • Global User and Machine Metadata • Highly Extensible Plugin Architecture
  21. 21. cask.co13 Single framework for building and running data apps and data lakes on Hadoop and Spark Rapid Development • Standardization, deep integrations, tools and docs • Separation of app logic from data logic and integration logic • Conceptual integrity within applications and consistency across environments Production Operations & Governance • Simplified packaging, deployment and monitoring of apps on Hadoop • Enhanced security and governance with centralized metrics and logs • Tracking and exploration of metadata, data provenance, audit trails and usage analytics CDAP Enables the Full Big Data Application Lifecycle reduces time to develop and deploy big data apps by 80% reduces time to insights and accelerates business value removes barriers to innovation and future-proofs your apps
  22. 22. cask.co14 Customer Success Stories Customer
 Situation Lack of existing Hadoop expertise and frustration with hand-coding and scripting tools Cask Hydrator for rapid creation of data pipelines and Cask Tracker for data discovery POC in 2 days
 Production in 2 months Cask
 Solution Small team and significant technical challenges limit pace of development and solution scale CDAP for real-time ingestion and consistent processing with production operations support Development in 1 month
 Production in 3 months Hundreds of Users
 Thousands of Pipelines Multiple teams and technologies with widely varied skillsets and incompatible design choices CDAP for data lake management and orchestration, tightly integrated into existing systems Health Insurance Provider
 offloading clinical / immunization reporting from Netezza Leading SaaS Platform
 taking new real-time, massive scale products to market Large Telco Enterprise
 building a centralized, secured,
 multi-tenant Data Lake
  23. 23. cask.co15 Cask was Named a Gartner Cool Vendor 2016 Cask was Certified a Great Place to Work 2016 “ … for the rest of us who lack the technological chips or patience to make it all work, there’s good news: it will soon get easier, thanks to the work done by the big data pioneers, as well as vendors like Cask …” (Alex Woodie, Managing Editor, Datanami) Awards and Accolades “ … “Cask has tilted the playing field, earning a massive unfair advantage over proprietary point products for data integration and ingest …” (Nik Rouda, Senior Analyst, Enterprise Strategy Group) “ … “CDAP is a big win for us … the amount of code we needed to write was minimal with CDAP, and it was much easier and faster than we ever expected …” (Jia-Long Wu, Data Architect, Lotame)
  24. 24. cask.co16 NEW: CDAP 4 — Big Data Apps on Tap! Available for download now! Release of CDAP 4 Preview “Big Data App Store” Cask Market Interactive Data Preparation Cask Wrangler Interactive Wizards for Common Tasks Resource Center Rewrite based on React Reimagined CDAP UI
  25. 25. cask.co17 The “App Store for Big Data” Cask Market • Goal: Time to value in minutes w/ no existing experience • Application and Library Ecosystem with pre-built Hadoop solutions, reusable templates, and third-party plugins • Available from anywhere inside the CDAP UI with a click • Initially, everything in the Cask Market has been bootstrapped by Cask based on ongoing work across our customers, is 100% open source and available on GitHub • Eventually, developers and ISVs will be able to showcase and market their own applications and libraries (ex: Graylog) Cask Market includes Interactive, Guided Wizards for Configuring Pre-Built Templates NEW: CDAP 4 — Big Data Apps on Tap!
  26. 26. cask.co18 Building Data Pipelines on Hadoop with Cask Hydrator Data Lake Webinar Introduction to Cask Hydrator CDAP - Containers on Hadoop CDAP Extensions - Cask Hydrator and Cask Tracker ESG Solution Spotlight CDAP Technical Concepts (video) Cask / Cloudera Solution Brief Cask Resources
  27. 27. cask.co ● CDAP provides the first unified integration platform for big data ● Cask Hydrator and Cask Tracker are visual extensions of CDAP for self-service access ● CDAP empowers enterprise IT to deliver
 faster time to value for Hadoop and Spark, from prototype to production ● Cask Market is a “big data app store” available in CDAP 4 with pre-built apps, pipelines, plugins ● CDAP is 100% open source, highly extensible, enterprise-ready, and commercially supported Big Data on Tap Summary
  28. 28. cask.co20 For more information, go to: cask.co Thanks!
  29. 29. Perceptions & Questions Analyst: Robin Bloor
  30. 30. Big Data Foundations? Robin Bloor, PhD
  31. 31. Neither Hadoop Nor Spark Is a Solution However, both are useful and increasingly versatile components for Big Data applications
  32. 32. The Evolution of the Little Elephant u  Hortonworks: Apache pure play. No apparent vision. u  Cloudera: Some proprietary components (Cloudera Manager, Impala, Cloudera Search). Vision is corporate data hub(?) u  MapR: Also some proprietary components (MapR-FS, MapR Streams, MapR-DB) u  And then there’s the cloud.
  33. 33. The Ship of Fools Until Hadoop’s direction is controlled by a single “captain” we may have to tolerate the ship of fools
  34. 34. The “Big Data Hype Cycle” Is Misleading u  Big Data is an ecosystem, not a technology – which distorts this graph u  Some analytics applications have experienced “absurd acceleration” u  Hadoop is, in many instances, a laggard - Spark too u  Nevertheless, we seem to be exiting “the trough”
  35. 35. Data lake or Governance Hub?
  36. 36. The System Management Issue Mobile Devices DesktopsServers IoT The Cloud Archive Data Stores Data Assaying Data Capture Real-Time Streaming? Data Mgt Data Serving The Prospecting Domain Apps Data Life Cycle Mgt Staging Area (Hadoop?) System Management
  37. 37. The Fundamental Issue Big Data does not really have a foundation. Neither, imho, does the Data Lake. Luckily, there are third parties…
  38. 38. u  Regarding Hadoop, do you have any “preferred components?” u  How do you stay current with the various distros? Backward compatibility? Can a customer upgrade at will? u  How does your technology impact performance (if at all)? u  Do you provide a consultancy service?
  39. 39. u  Which companies/services do you regard as competitive? u  Do you have any specific partners? u  What does an implementation look like?
  40. 40. THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons

×