The accelerating supply of big data is converging with accelerating data demand from everyday business users. What does it take to get from Hadoop as a data reservoir to Hadoop as a day-to-day data source for your business and end users?
The answers lie in the 'how' and the 'who': reducing architectural reliance on 'small data' technologies and broadening access to Hadoop hold the key to a big data payoff.
Join Nik Rouda, Big Data Analyst and blogger at the Enterprise Strategy Group (ESG), as he hosts this webcast featuring guest presentations from real-world practitioners Tanwir Danish, VP of Product Development at Marketshare (acquired by Neustar), and Rajiv Synghal, Chief Architect, Big Data Strategy at Kaiser Permanente. Topics include:
Latest research on Hadoop adoption patterns and anti-patterns
Putting users at the center of big data utilization and avoiding the data scientist paradox
Architectural misconceptions that can tank big data initiatives
Security and multi-tenancy strategies to accelerate adoption
Retooling skills and organizational thinking when big data is the rule, not the exception
19. WHAT DOES THE CUSTOMER JOURNEY LOOK LIKE?
A consumer visits a store location and makes a purchase. Behind that single transaction sit:
• 10s of data sources
• 10s of terabytes of data
• 10s of millions of pathways
• 10s of touchpoint types
23. Formulating Big Data for Analytics in Health Care
Rajiv Synghal, Chief Architect, Big Data Strategy, Kaiser Permanente
24. 360° MEMBER VIEW IN HEALTHCARE
The member sits at the center of many data domains:
• Personal Behaviors (Lifestyle Choices, Preferences, Activities, QoL)
• Social Factors (Friends, Family, Affiliations, Communication, Activities)
• Demographic Factors (Age, Address, Employer, Industry)
• Family History and Genetics
• Personal “-omics” (Genomics, Proteomics, Transcriptomes, Metabolomics)
• Medical Care (Encounters, Labs, Rx, Medical Devices, etc.)
• Environmental Factors: Environment (Temperature, Humidity, Pollen Count, ...) and Geographic (Closest Hospital, Pharmacy, Care Clinic, ...)
25. CHANGING ANALYTICAL NEEDS IN HEALTHCARE
Ten emerging needs:
1. Food Recommendations
2. Environment Monitoring
3. Resource Recommendations
4. Drive Recommendations
5. Biometric Monitoring
6. Alerts/Dashboards
7. Brand Sentiment
8. Triage Recommendations
9. Expert Advice
10. Event Prediction
The changing market conditions driven by healthcare reform will require Kaiser Permanente to find new, innovative ways to deliver on its mission of high-quality, affordable care.
26. FUTURE OF DATA AND ANALYTICS IN HEALTHCARE
OUR PRESENT: massive data stores and interconnectivity (network; storage and compute; data; analytics: SAS, “R”, proprietary, open-source, third party).
OUR FUTURE: analytics leveraged by users (scientist, clinician, business).
Ubiquitous Access
§ Provide access to a greater breadth of data
§ Enable data accessibility for more than data scientists and BI experts
Rapid Prototyping
§ Enable more frequent prototyping and development
§ Faster delivery of new capabilities
Entities that are able to leverage data and analytics are becoming leaders in their respective industries.
28. FUTURE DATA LIFECYCLE MANAGEMENT IN HEALTHCARE
Four stages: Create, Store, Obsolete, Delete.

Hyperactive
• Access pattern: high-frequency access by multiple users and applications, with high temporal locality.
• Data access SLA: milliseconds to seconds.
• Data store: compute-centric cluster; data pinned in memory; number of copies dynamically adjusted based upon frequency and recency of access.

Active
• Access pattern: frequent access by multiple applications and users, with low temporal locality.
• Data access SLA: seconds to minutes.
• Data store: compute-centric cluster; multiple copies of data on nodes with SSDs or on nodes with smaller, faster spinning disks; on-demand in-memory cache; number of copies dynamically adjusted based upon frequency of access.

Inactive
• Access pattern: sparse access over long periods of time.
• Data access SLA: minutes.
• Data store: storage-centric cluster; multiple copies of data on nodes with smaller, slower spinning disks; number of copies determined by system fault-tolerance threshold requirements.

Dormant
• Access pattern: no access for long periods of time.
• Data access SLA: hours.
• Data store: storage-centric cluster; multiple copies of data on nodes with bigger, slower spinning disks; number of copies determined by system fault-tolerance threshold requirements.

Frequency, recency (temporal locality), and system fault-tolerance thresholds determine the tiered storage requirements for ALL enterprise data. In the big data analytics world, data is never deleted except when deletion is required for compliance reasons.
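The tiering rules above can be sketched in code. This is an illustrative sketch, not from the deck: the thresholds, class names, and replica counts are invented for demonstration, and a real system would tune them against the stated SLAs.

```python
from dataclasses import dataclass

@dataclass
class AccessStats:
    accesses_per_day: float        # frequency of access
    days_since_last_access: float  # recency (temporal locality proxy)

def classify_tier(stats: AccessStats) -> str:
    """Map observed frequency/recency onto the four lifecycle tiers."""
    if stats.days_since_last_access > 365:
        return "Dormant"        # no access for long periods (hours SLA)
    if stats.accesses_per_day >= 1000 and stats.days_since_last_access <= 1:
        return "Hyperactive"    # high frequency, high temporal locality
    if stats.accesses_per_day >= 1:
        return "Active"         # frequent access, low temporal locality
    return "Inactive"           # sparse access over long periods

def replica_count(tier: str, fault_tolerance_min: int = 3) -> int:
    """Hot tiers scale copies with demand; cold tiers keep only the
    fault-tolerance floor, as the slide describes."""
    dynamic = {"Hyperactive": 5, "Active": 4}  # illustrative values
    return dynamic.get(tier, fault_tolerance_min)

if __name__ == "__main__":
    hot = AccessStats(accesses_per_day=5000, days_since_last_access=0.1)
    cold = AccessStats(accesses_per_day=0.01, days_since_last_access=400)
    print(classify_tier(hot), replica_count(classify_tier(hot)))    # Hyperactive 5
    print(classify_tier(cold), replica_count(classify_tier(cold)))  # Dormant 3
```

In a Hadoop deployment, such a classifier would drive placement decisions like HDFS storage policies or replication factors, but those integration details are left out here.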
29. EVOLUTION OF ANALYTICAL TOOLS IN HEALTHCARE

Hyperactive (SLA: milliseconds to seconds; compute-centric cluster, data pinned in memory)
• Data Tagging & Cataloguing: Waterline Data Science
• Data Preparation: Trifacta tools
• Data Integration: Trifacta, Informatica
• Cubes and Visualization: Arcadia Data
• SAS LASR (Visual Statistics, Visual Analytics, In-Memory Statistics)
• H2O: in-memory deep machine learning, “R”

Active (SLA: seconds to minutes; compute-centric cluster, on-demand in-memory caching of data)
• Data Tagging & Cataloguing: Waterline Data Science
• Data Preparation: Trifacta tools
• Data Integration: Trifacta, Informatica
• Cubes and Visualization: Arcadia Data
• SAS LASR (Visual Statistics, Visual Analytics, In-Memory Statistics)
• H2O: in-memory deep machine learning, “R”

Inactive (SLA: minutes; storage-centric cluster)
• Data Tagging & Cataloguing: Waterline Data Science
• Data Preparation: Trifacta tools
• Cubes and Visualization: Arcadia Data
• Data Visualization: Tableau servers & desktops
• SAS tools running on a separate SAS Grid cluster

Dormant (SLA: hours; storage-centric cluster)
• Data Tagging & Cataloguing: Waterline Data Science
• Data Preparation: Trifacta tools
• Data Visualization: Tableau servers & desktops
• SAS tools running on a separate SAS Grid cluster

The analytical tools space is changing very fast. Data preparation and analytical tools that natively run “IN” the cluster provide a higher degree of security and freedom than the ones that run “WITH” the cluster.
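The "IN" versus "WITH" distinction can be made concrete with a toy comparison. This sketch is illustrative only (not from the deck): a with-cluster tool extracts raw records over the network before analyzing them, while an in-cluster tool ships the computation to the data and returns only a small result, so far less sensitive data crosses the cluster boundary.

```python
# Toy model of data movement for tools running "WITH" vs "IN" a cluster.
# All names and numbers are invented for illustration.
records = [{"member_id": i, "lab_value": i % 50} for i in range(10_000)]

def with_cluster_tool(data):
    """Pulls every raw record out of the cluster, then aggregates locally."""
    exported = list(data)  # every record (potentially PHI) crosses the boundary
    avg = sum(r["lab_value"] for r in exported) / len(exported)
    return avg, len(exported)  # result, rows that left the cluster

def in_cluster_tool(data):
    """Runs the aggregation where the data lives; only the result leaves."""
    avg = sum(r["lab_value"] for r in data) / len(data)
    return avg, 1  # result, a single summary row exits

avg_with, rows_with = with_cluster_tool(records)
avg_in, rows_in = in_cluster_tool(records)
assert avg_with == avg_in  # same answer either way
print(f"WITH cluster: {rows_with} rows exported; IN cluster: {rows_in} row")
```

The same answer comes back either way; what differs is how much raw data had to leave the governed environment, which is the security argument the slide makes for in-cluster tools.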