
When everybody wants Big Data, who gets it?



The accelerating supply of big data is converging with accelerating data demand from everyday business users. What does it take to get from Hadoop as a data reservoir to Hadoop as a day-to-day data source for your business and end users?

The answer to ‘what’ lies in ‘how’ and ‘who’: reducing architectural reliance on ‘small data’ technologies and broadening access to Hadoop hold the key to the big data payoff.

Join Nik Rouda, Big Data Analyst and blogger at the Enterprise Strategy Group (ESG), as he hosts this webcast featuring guest presentations from real world practitioners Tanwir Danish, VP of Product Development at Marketshare (acquired by Neustar) and Rajiv Synghal, Chief Architect, Big Data Strategy at Kaiser Permanente.

Latest research on Hadoop adoption patterns and anti-patterns
Putting users at the center of big data utilization and avoiding the data scientist paradox
Architectural misconceptions that can tank big data initiatives
Security and multi-tenancy strategies to accelerate adoption
Retooling skills and organizational thinking when big data is the rule, not the exception

Published in: Data & Analytics


  1. When everybody wants Big Data, who gets it? Webcast brought to you by Arcadia Data.
  2. Arcadia is a Hadoop-native platform that connects business users to big data. Its distributed BI & analytics engine runs on each Hadoop node; users connect via a web browser. Brought to you by Arcadia Data.
  3. Today's presenters: Nik Rouda, Senior Analyst, Enterprise Strategy Group (ESG); Tanwir Danish, VP of Product Development, MarketShare Partners (a Neustar solution); Rajiv Synghal, Chief Architect, Big Data Strategy, Kaiser Permanente.
  4. BI and Analytics on Big Data: A Market View. Nik Rouda, Senior Analyst, Enterprise Strategy Group | Getting to the bigger truth.™ © 2016 by The Enterprise Strategy Group, Inc.
  5. Big Data & Analytics: A Business Imperative. 63% will increase spending on big data & analytics in 2016. Relative to all of your organization's business and IT priorities over the next 12-18 months, how would you rate the importance of its big data analytics projects and initiatives? (Percent of respondents, N=475): our most important priority, 48%; one of our top 5 priorities, 32%; one of our top 10 priorities, 14%; one of our top 20 priorities, 3%; not among our top 20 priorities, 2%; don't know/no opinion, 1%.
  6. What Does Big Data Mean to You? Which of the following requirements are most responsible for driving your organization to evaluate new analytics solutions? (Percent of respondents, N=200, three responses accepted): speed of analytics operations (i.e., velocity), 43%; diversity of data sources (i.e., variety), 40%; want more access to analytics across lines of business, 40%; reduced total cost of ownership (TCO), 40%; scale of data (i.e., volume), 34%; support new business requirements, 32%; veracity (i.e., uncertainty of quality or results), 23%.
  7. Common Challenges for BI, Analytics, & Big Data: data sets too large for analytics; difficulty spanning disparate systems; lack of skills to manage and analyze data; limited collaboration between IT, analysts, and LoB; difficulty working with both structured and unstructured data.
  8. Business Outcomes from Big Data & Analytics: 54% want faster tactical response to customers; 54% want reduced risk on business decisions; 50% want better sales & marketing performance; 49% want improved operational efficiency.
  9. Hadoop is Steadily Gaining Market Adoption. How would you rate your organization's interest in implementing Hadoop? (Percent of respondents, N=475): already using Hadoop, 20%; very interested in Hadoop, 37%; somewhat interested in Hadoop, 27%; not at all interested in Hadoop, 5%; not familiar with Hadoop technology, 8%; don't know, 3%.
  10. Hadoop Deployment Drivers. What drove your organization's decision to implement Hadoop? (Percent of respondents, N=46, multiple responses accepted): to reduce costs of data storage and archive, 39%; to complement/offload your BI tools, 37%; to establish a centralized data lake or hub, 37%; to distribute analytics workloads across commodity servers, 37%; to accommodate more diverse and/or unstructured data …, 35%; to perform real-time and/or streaming analytics, 30%; to complement/offload your data warehouse(s), 30%; to replace your data warehouse(s), 26%; to replace your BI tools, 17%.
  11. Most Important Features When Considering Solutions to Use SQL on Hadoop. When considering solutions to use SQL on Hadoop, what are the most important features? (Percent of respondents, N=94, three responses accepted): high performance/low latency, 45%; high concurrency/parallelized, 35%; reliability with full ACID compliance, 28%; rich security controls, 28%; breadth of file types supported (e.g., Parquet, JSON, text, Avro, Hive, etc.), 26%; support for complex or nested data types, 24%; complete ANSI SQL support, 24%; schema-less or schema-on-read, 24%; manageable workloads with YARN, 13%.
  12. How Hadoop Will Fit Against the Traditional Data Warehouse Approach. How do you anticipate Hadoop will fit against your organization's traditional data warehouse approach? (Percent of respondents, N=94): Hadoop will largely replace our existing data warehouse, 26%; Hadoop will offload/optimize our existing data warehouse, 36%; Hadoop will be used only for limited data warehouse-like functions, 28%; no plans to use Hadoop for any data warehouse-like functions, 11%.
  13. Core IT Teams Expected to Implement & Manage Projects. Which of the following groups provides the skills and manpower to implement and manage the technologies supporting initiatives in the area of big data and analytics? (Percent of respondents, N=475, multiple responses accepted): IT infrastructure and operations team, 53%; IT applications team, 52%; business analyst/data scientist team, 32%; systems integrator (SI), 29%; management consultancy, 27%; business application vendor, 25%; value-added reseller (VAR), 23%; service provider, 23%.
  14. Skills Development is a Priority. 36% cite a problematic skills shortage for business intelligence & analytics. In which of the following functional areas do you believe skills development would be most beneficial to your employees (i.e., IT staff) in terms of their career path and benefit to your organization? (Percent of respondents, N=627): cybersecurity (i.e., information security), 44%; big data analytics, 26%; infrastructure management, 17%; application development and deployment, 12%.
  15. A Story of Growth with Arcadia. Tanwir Danish, Head of Product & UX, MarketShare, a Neustar solution. Copyright © 2016 Neustar, Inc. All Rights Reserved.
  16. MarketShare links marketing to revenue. We do so by analyzing customer interactions and resultant business outcomes using data science.
  17. Our Technology Stack Evolved to Support Key Goals (stages, 2012-2016). Raw data: S3 | self-managed Hadoop. Predictive analytics workflow: Hive extract | sampling | R. Reporting workflow: Hive extract | Oracle | Tableau. Decision analytics workflow: Oracle | memCacheDB.
  18. A key need: understand the customer journey and the impact of each interaction on business outcomes.
  19. What does the customer journey look like? A consumer visits a store location and makes a purchase. 10s of data sources, 10s of terabytes of data, 10s of millions of pathways, 10s of touchpoint types.
  20. Two key challenges we ran into: performance, and zooming into the customer journey (power of tens).
  21. Adoption of Arcadia (Q1'15-Q1'16). Adoption milestones: ACTION app; MAP app (internal); TV app; STRATEGY app; release 5.0 of ACTION app. Key functionality improvements: integrated Arcadia with Altiscale; parametrized reports; visualization enhancements; progressive disclosure; config-driven reports. Benefits: visibility and performance; visibility for internal team; time to market; scale; efficiency.
  22. With the Arcadia solution, MarketShare clients understand their customers better. In addition, our report development process has become more agile and client configuration more efficient.
  23. Formulating Big Data for Analytics in Health Care. Rajiv Synghal, Chief Architect, Big Data Strategy, Kaiser Permanente.
  24. 360° Member View in Healthcare. Personal behaviors (lifestyle choices, preferences, activities, QoL); social factors (friends, family, affiliations, communication, activities); demographic factors (age, address, employer, industry); family history and genetics; personal "-omics" (genomics, proteomics, transcriptomes, metabolomics); medical care (encounters, labs, Rx, medical devices, etc.); environmental factors: environment (temperature, humidity, pollen count, …) and geographic (closest hospital, pharmacy, care clinic, …).
  25. Changing Analytical Needs in Healthcare: food recommendations; environment monitoring; resource recommendations; drive recommendations; biometric monitoring; alerts/dashboards; brand sentiment; triage recommendations; expert advice; event prediction. The changing market conditions driven by healthcare reform will require Kaiser Permanente to find new innovative ways to deliver on its mission of high-quality affordable care.
  26. Future of Data and Analytics in Healthcare. Our present: massive data stores and interconnectivity (network, storage and compute; data analytics: SAS, R, proprietary, open-source, third party). Our future: analytics leveraged by users (scientist, clinician, business). Ubiquitous access: provide access to a greater breadth of data; enable data accessibility for more than data scientists and BI experts. Rapid prototyping: enable more frequent prototyping and development; faster delivery of new capabilities. Entities that are able to leverage data & analytics are becoming leaders in their respective industries.
  27. 27. FUTURE DATA LIFECYCLE MANAGEMENT IN HEALTHCARE DataClassification DataAccess SLAs DataStore Characteristics Hyperactive High frequency access to data by multiple users and applications with high temporal locality. • Milliseconds –second • Compute centric cluster • Data pinned in memory. • Number of copies is dynamically adjusted based upon frequency and recency of access Active Frequent access to data by multiple applications and users with low temporal locality. • Seconds -minutes • Compute centric cluster • Multiple copies of data on nodes with SSDs or on nodes with smaller, faster spinning disks • On-demand in-memory cache • Number of copies is dynamically adjusted based upon frequency of access Inactive Sparse access to data over long periods of time. • Minutes • Storage centric cluster • Multiple copies of data on nodes with smaller, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Dormant No access to data for long periods of time. • Hours • Storage centric cluster • Multiple copies of data on nodes with bigger, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Frequency, recency (temporal locality) and system fault-tolerance thresholds determine the tiered storage requirements for ALL enterprise data. In the Big Data Analytics World, Data is never deleted except when it is required for compliance reasons. Create Store Obsolete DeleteFour Stages: DataClassification DataAccess SLAs DataStore Characteristics Hyperactive High frequency access to data by multiple users and applications with high temporal locality. • Milliseconds –second • Compute centric cluster • Data pinned in memory. • Number of copies is dynamically adjusted based upon frequency and recency of access Active Frequent access to data by multiple applications and users with low temporal locality. 
• Seconds -minutes • Compute centric cluster • Multiple copies of data on nodes with SSDs or on nodes with smaller, faster spinning disks • On-demand in-memory cache • Number of copies is dynamically adjusted based upon frequency of access Inactive Sparse access to data over long periods of time. • Minutes • Storage centric cluster • Multiple copies of data on nodes with smaller, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Dormant No access to data for long periods of time. • Hours • Storage centric cluster • Multiple copies of data on nodes with bigger, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Frequency, recency (temporal locality) and system fault-tolerance thresholds determine the tiered storage requirements for ALL enterprise data. In the Big Data Analytics World, Data is never deleted except when it is required for compliance reasons. Create Store Obsolete DeleteFour Stages: 5 FUTURE DATA LIFECYCLE MANAGEMENT IN HEALTHCARE DataClassification DataAccess SLAs DataStore Characteristics Hyperactive High frequency access to data by multiple users and applications with high temporal locality. • Milliseconds –second • Compute centric cluster • Data pinned in memory. • Number of copies is dynamically adjusted based upon frequency and recency of access Active Frequent access to data by multiple applications and users with low temporal locality. • Seconds -minutes • Compute centric cluster • Multiple copies of data on nodes with SSDs or on nodes with smaller, faster spinning disks • On-demand in-memory cache • Number of copies is dynamically adjusted based upon frequency of access Inactive Sparse access to data over long periods of time. 
• Minutes • Storage centric cluster • Multiple copies of data on nodes with smaller, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Dormant No access to data for long periods of time. • Hours • Storage centric cluster • Multiple copies of data on nodes with bigger, slower spinning disks • Number of copies is determined by system fault-tolerance threshold requirements Frequency, recency (temporal locality) and system fault-tolerance thresholds determine the tiered storage requirements for ALL enterprise data. In the Big Data Analytics World, Data is never deleted except when it is required for compliance reasons. Create Store Obsolete DeleteFour Stages:
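The tiering policy on slide 27 can be sketched as a small classifier: given a rough access frequency and recency, pick a tier and its SLA. The tier names, SLAs, and cluster types come from the slide; the numeric thresholds below are purely illustrative assumptions, since the deck defines the tiers only qualitatively.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    access_sla: str   # response-time expectation, per the slide
    cluster: str      # compute-centric vs. storage-centric, per the slide

HYPERACTIVE = Tier("hyperactive", "milliseconds to seconds", "compute-centric")
ACTIVE      = Tier("active",      "seconds to minutes",      "compute-centric")
INACTIVE    = Tier("inactive",    "minutes",                 "storage-centric")
DORMANT     = Tier("dormant",     "hours",                   "storage-centric")

def classify(accesses_per_day: float, days_since_last_access: float) -> Tier:
    """Pick a storage tier from access frequency and recency (temporal locality).

    Thresholds are illustrative, not from the deck.
    """
    if accesses_per_day >= 100 and days_since_last_access < 1:
        return HYPERACTIVE   # hot data with high temporal locality
    if accesses_per_day >= 1:
        return ACTIVE        # frequently used, low temporal locality
    if days_since_last_access < 365:
        return INACTIVE      # sparse access over long periods
    return DORMANT           # untouched for a long time
```

In a real system the classifier would also drive the replica-count policy the slide describes: copies adjusted by frequency and recency for the hot tiers, and by fault-tolerance thresholds for the cold ones.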
  28. 28. EVOLUTION OF ANALYTICAL TOOLS IN HEALTHCARE DataClassification DataAccess SLAs Data Preparation and Analytical Tools Hyperactive High frequency access to data by multiple users and applications with high temporal locality. • Milliseconds –second • Compute centric cluster • Data pinned in memory. • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Data Integration –Trifacta, Informatica • Cubes and Visualization –Arcadia Data • SAS – LASR (Visual Statistics, Visual Analytics, In-memory Statistics) • H2O –In-memory deep machine learning, “R” Active Frequent access to data by multiple applications and users with low temporal locality. • Seconds -minutes • Compute centric cluster • On-demand in-memory caching of data • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Data Integration –Trifacta, Informatica • Cubes and Visualization –Arcadia Data • SAS – LASR (Visual Statistics, Visual Analytics, In-memory Statistics) • H2O –In-memory deep machine learning, “R” Inactive Sparse access to data over long periods of time. • Minutes • Storage centric cluster • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Cubes and Visualization –Arcadia Data • Data Visualization –Tableau Servers & Desktops • SAS Tools running on separate SAS Grid Cluster Dormant • Hours • Storage centric cluster • Data Tagging & Cataloguing –Waterline Data Science Analytical tools space is changing very fast. Data Preparation and Analytical tools that natively run “IN” the cluster provide a higher degree of security and freedom than the ones that run “WITH” the cluster. EVOLUTION OF ANALYTICAL TOOLS IN HEALTHCARE DataClassification DataAccess SLAs Data Preparation and Analytical Tools Hyperactive High frequency access to data by multiple users and applications with high temporal locality. • Milliseconds –second • Compute centric cluster • Data pinned in memory. 
• Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Data Integration –Trifacta, Informatica • Cubes and Visualization –Arcadia Data • SAS – LASR (Visual Statistics, Visual Analytics, In-memory Statistics) • H2O –In-memory deep machine learning, “R” Active Frequent access to data by multiple applications and users with low temporal locality. • Seconds -minutes • Compute centric cluster • On-demand in-memory caching of data • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Data Integration –Trifacta, Informatica • Cubes and Visualization –Arcadia Data • SAS – LASR (Visual Statistics, Visual Analytics, In-memory Statistics) • H2O –In-memory deep machine learning, “R” Inactive Sparse access to data over long periods of time. • Minutes • Storage centric cluster • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Cubes and Visualization –Arcadia Data • Data Visualization –Tableau Servers & Desktops • SAS Tools running on separate SAS Grid Cluster Dormant No access to data for long periods of time. • Hours • Storage centric cluster • Data Tagging & Cataloguing –Waterline Data Science • Data Preparation –Trifacta Tools • Data Visualization –Tableau Servers & Desktops • SAS Tools running on separate SAS Grid Cluster Analytical tools space is changing very fast. Data Preparation and Analytical tools that natively run “IN” the cluster provide a higher degree of security and freedom than the ones that run “WITH” the cluster.
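The tier-to-tool matrix on slide 28 is effectively a lookup table. A minimal sketch, with tool names taken from the slide but the lookup structure itself assumed:

```python
# Which tools serve each data tier (names from the slide; grouping is a sketch).
TOOLS_BY_TIER = {
    "hyperactive": {"Waterline Data Science", "Trifacta", "Informatica",
                    "Arcadia Data", "SAS LASR", "H2O", "R"},
    "active":      {"Waterline Data Science", "Trifacta", "Informatica",
                    "Arcadia Data", "SAS LASR", "H2O", "R"},
    "inactive":    {"Waterline Data Science", "Trifacta", "Arcadia Data",
                    "Tableau", "SAS Grid"},
    "dormant":     {"Waterline Data Science", "Trifacta", "Tableau", "SAS Grid"},
}

HOT_TO_COLD = ["hyperactive", "active", "inactive", "dormant"]

def tiers_supporting(tool: str) -> list[str]:
    """Return the tiers (hot to cold) whose toolset includes the given tool."""
    return [tier for tier in HOT_TO_COLD if tool in TOOLS_BY_TIER[tier]]
```

Inverting the matrix this way makes the slide's point visible: in-cluster tools such as Arcadia Data span the hot tiers, while with-cluster tools such as Tableau and SAS Grid appear only on the colder, storage-centric tiers.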
  29. Panel discussion: Nik Rouda, Tanwir Danish, Rajiv Synghal.
  30. Q&A: Nik Rouda, Tanwir Danish, Rajiv Synghal.
  31. When everybody wants Big Data, who gets it? Thank you.
