A brief tutorial on Big Data and its applications to healthcare. The discussion is centered around technical aspects related to this method of computing rather than concrete examples of its use in medical practice.
Massive-Scale Analytics Applied to Real-World Problems | inside-BigData.com
In this deck from PASC18, David Bader from Georgia Tech presents: Massive-Scale Analytics Applied to Real-World Problems.
"Emerging real-world graph problems include: detecting and preventing disease in human populations; revealing community structure in large social networks; and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for research on scalable algorithms and development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams. In this talk, Bader will discuss the opportunities and challenges in massive data-intensive computing for applications in social sciences, physical sciences, and engineering."
Watch the video: https://wp.me/p3RLHQ-iPk
Learn more: https://pasc18.pasc-conference.org/
With big data poised to change the healthcare ecosystem, organizations need to devote time and resources to understanding this phenomenon and realizing the envisioned benefits.
Big data is everywhere, although we may not immediately realize it. The first thing to recognize is that most of us do not deal with large amounts of data in daily life except in unusual circumstances. Lacking this direct experience, we often fail to understand both the opportunities and the challenges presented by big data. A number of issues and challenges remain in addressing these characteristics going forward.
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ... | Amit Sheth
Keynote given at ICDE2014, April 2014. Details at: http://ieee-icde2014.eecs.northwestern.edu/keynotes.html
A video of a version of this talk is available here: http://youtu.be/8RhpFlfpJ-A
(download to see many hidden slides).
Two versions of this talk, targeted at Smart Energy and Personalized Digital Health domains/apps at: http://wiki.knoesis.org/index.php/Smart_Data
Previous (older) version replaced by this version: http://www.slideshare.net/apsheth/big-data-to-smart-data-keynote
Baptist Health: Solving Healthcare Problems with Big Data | MapR Technologies
Editor’s Note: Download the complimentary MapR Guide to Big Data in Healthcare for more information: https://mapr.com/mapr-guide-big-data-healthcare/
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this webinar to hear how Baptist Health is using big data and advanced analytics to address a myriad of healthcare challenges, from patient to payer, through their consumer-centric approach.
MapR Technologies will cover broader big data healthcare trends and production use cases that demonstrate how to converge data and compute power to deliver data-driven healthcare applications.
Do you have doubts or questions about how to use Big Data in your organization? This presentation should clear up some of them.
Feel free to comment if you have more queries or write to us at: bigdata@xoriant.com
BIMCV, Banco de Imagen Médica de la Comunidad Valenciana (Medical Imaging Databank of the Valencia Region). María de la Iglesia | Maria de la Iglesia
According to Hal Varian (an expert in microeconomics and information economics and, since 2002, Chief Economist at Google), “In the coming years, the most attractive job will be that of statisticians: the ability to collect data, understand it, process it, extract value from it, visualize it, and communicate it will all be important skills in the coming decades. We now have free and ubiquitous data. What is still missing is the ability to understand that data.”
6. A survey on big data challenges in the context of predictive | EditorJST
Data is being produced rapidly from many sources. To understand how this data is evolving, we need predictive analytics. When the data is semi-structured or unstructured, conventional business intelligence algorithms and tools are not useful. In this paper, we attempt to point out the difficulties that arise when using business intelligence tools
The mission of the IHME is to apply rigorous measurement and analysis to help policy makers make better decisions on a range of health policy issues. Like other organizations, the IHME have embraced containers and micro-services aggressively to better support hundreds of collaborating researchers.
In addition to containerized workloads, the IHME run a wide variety of traditional analytic, simulation and high-performance computing workloads on an HPC cluster with 15,000 cores and 13PB of storage. Researchers increasingly need to combine both containerized and non-containerized elements into workflow pipelines, and a key challenge has been ensuring SLAs for various departments while avoiding duplicate infrastructure and unnecessary data movement and duplication. In collaboration with industry partners, IHME have deployed a unique solution based on Univa’s Navops technology that allows them to combine containerized and traditional analytic and high-performance application workloads on a single shared Kubernetes cluster, ensuring departmental SLAs and helping contain infrastructure costs.
In this talk Dr. Grandison will discuss IHME, their experience deploying containerized applications and how they went about using Kubernetes to support a variety of new containerized applications as well as a variety of traditional analytic applications.
User Experience - How Sensors and Big Data will change your Healthcare experi... | Mark D'Cunha
In the Hospital of the Future, Big Data is one of your doctors.
The growing use of sensors will drive huge volumes of data that will change your Healthcare experience. We must learn how to create better user experiences for monitoring, fitness and health.
This presentation is a reflection of my vision of how Big Data impacts healthcare and of the efforts that Oracle and VX Healthcare Analytics put into making Big Data work in the patient profiling space.
2013 DataCite Summer Meeting - California Digital Library (Joan Starr - Calif... | datacite
2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
Transforming Research in Collaboration with Funding Agencies | Amazon Web Services
Funding agencies constitute one of the essential pillars for research and have been the backbone for innovation. Data-driven collaborative research is an integral part of many domains. In this session, leaders from the world's largest biomedical and science research agencies, the National Institutes of Health (NIH) and the National Science Foundation (NSF), discuss their programs, including NIH Data Commons and Harnessing the Data Revolution (HDR). The goal of the NIH Data Commons is to accelerate new biomedical discoveries by providing a cloud-based platform where investigators can store, share, access, and compute on digital objects generated from biomedical research. HDR is one of the 10 "Big Ideas" for future investment from the NSF for fundamental data science research. These collaborative initiatives will enable researchers to accelerate science and engineering through improved access to data, tooling, and analytic resources in the cloud. These programs will revolutionize the way scientific data and resources are utilized by the research communities.
SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017 | Sandra Gesing
The importance of Big Data and Open Data to achieve scientific advancements in precision medicine is beyond doubt and evident in many different projects and initiatives such as the Precision Medicine Initiative (All of Us), ICTBioMed, NCIP Hub, 100K Genomics England Project, NIH Cancer Moonshot, and the Million Veterans Program. In April 2013, McKinsey & Company proclaimed that Big Data has the ability to revolutionize pharmaceutical research and development within clinical environments, by using data for better informed decision making and targeting the diverse user roles including physicians, consumers, insurers, and regulators. Companies from a wide spectrum such as Oracle Health Sciences, Google, and Data4Cure build solutions that help address efficient and secure data sharing with the patient or clinician in mind. Open data can be maintained and shared by patient communities such as PatientsLikeMe.com and build an invaluable resource for further data mining.
Even with all these advances, challenges remain. A Precision Medicine World Conference announcement in November 2016 put it this way: “We are missing easy-to-use solutions to share patient data.” Science gateways can fill this gap. They are by definition end-to-end solutions (web-based, mobile, or desktop applications) that provide intuitive access to advanced resources and allow researchers to focus on tackling today’s challenging science questions. Science gateways abstract the complex underlying computing and data infrastructure as far as feasible and desired by the stakeholder, and can be tailored to different target groups with diverse backgrounds, demands, and technical knowledge.
Science Gateways have existed for over a decade and a wide variety of frameworks and APIs have been developed to support the efficient creation of science gateways and ease the implementation of connections to Cloud infrastructures and distributed data on a large scale. The importance of science gateways has been recognized by NSF by funding the creation of a Science Gateways Community Institute (SGCI) to serve the community with free resources, services, experts, and ideas for creating and sustaining science gateways. To achieve this goal, the SGCI serves the community with five areas that have diverse foci and which also closely interact: Incubator, Extended Developer Support, Scientific Software Collaborative, Community Engagement and Exchange and Workforce Development.
The Institute is technology-agnostic and serves the community by offering a wide variety of services and using technologies that are the best fitting solution for the use case. Gateways allow for precision medicine to be more efficiently developed or adapted by lowering the barriers to data sharing and Big Data analysis.
PAARL's 1st Marina G. Dayrit Lecture Series held at UP's Melchor Hall, 5F, Proctor & Gamble Audiovisual Hall, College of Engineering, on 3 March 2017, with Albert Anthony D. Gavino of Smart Communications Inc. as resource speaker on the topic "Using Big Data to Enhance Library Services"
1. Big Data in Health Care
Strata – New York
September 28th, 2016
Sabrina Dahlgren, Director, Health care Delivery and Innovation, Kaiser Permanente - @sabridahlgren
Taposh Dutta Roy, Health Lead, Innovation and Data Science, Decision Support, Kaiser Permanente - @taposhdr
Rajiv Synghal, Chief Architect, Big Data Strategy, Kaiser Permanente - @synghalr
2. The largest integrated delivery system in the US
primary care
specialty care
home care
lab
hospital
pharmacy
optical
dental
research
insurance
integrated
Approaching 11 million members
INTEGRATED CARE DELIVERY
200K employees
19,000 physicians
38 hospitals
608 medical offices and other outpatient facilities
$61 billion operating revenue (2016)
Mission: Kaiser Permanente exists to provide high-quality, affordable health care services and to improve the health of our members and the communities we serve.
3. TECHNOLOGY INNOVATION ORIGINS
1960s | Dr. Sidney Garfield & Dr. Morris Collen
“We should begin to take advantage of electronic digital computers.”
“Continuing total health care requires a continuing life record for each individual…The content of that life record, now made possible by computer information technology, will chart the course to be taken by each individual for optimal health.”
Sidney Garfield, MD
Hospital Computer Systems, 1974
Supporting Health Care with Technology
4. OUR TECHNOLOGY JOURNEY
EHR (KP Health Connect)
Circle of Support (continuous availability, ancillary systems)
Online (KP.org)
Transforming Health Care
Foundational (Facilities)
Precision Medicine
Mobile App (Video Visits)
Machine learning
Patient 360 View (Cloud, IOT)
7. Percentage of consumers with at least one medical, health or fitness app on their mobile devices doubled from 2013 to 2015*
* HRI Consumer Survey, PwC, 2013, 2015
9. Health 2050 Vision
Figure 1, Health 2050: The Realization of Personalized Medicine through Crowdsourcing, the Quantified Self, and the Participatory Biocitizen, Journal of Personalized Medicine
Traditional Model: Population-wide demographics
Emerging Model: Cohort-relevant measures
Future Model: Individual, N=1
10. Analytics and Enterprise Information Strategy…
right insights
Business intelligence and analytics
right time and form
Delivery and decision aids
good data
Data sources and platforms
information as a strategic asset
Frame right questions
Make better decisions fast
Link decisions to action
decision maker centered
Leaders & Managers
Care and Service Providers
Patients, Members, Groups
Approach Strategy Outcome
Easy to
Frame right questions
Know and do right things
Make better decisions
right information + at the right time + in the right hands
11. Making the RIGHT Architecture Choices…
Feature | RDBMS Ecosystem | Distributed Ecosystem
Security | ✓ Easy within a system. Fragmented across the enterprise. | ✓ Five-tier security built in across all enterprise systems.
Governance | ✓ Easy within a domain. Difficult across domains. | ✓ Easy across systems. Easy to align fields and definitions.
Organizational Preparedness | ✓ Mature people and processes. | ▪ Maturing people and processes.
Operational Readiness | ✓ Mature. | ▪ Maturing. 24 x 7 support. Active-active DR. Class of Service 0/1.
Storage Formats | ▪ Single. Opaque storage formats. | ✓ Multiple. Open storage formats.
Row vs. Columnar | ▪ Maturing. | ✓ Mature.
Data Partitioning | ▪ Add-on component. Afterthought. | ✓ Partitioning and hashing are first-class citizens.
OLAP | ▪ Design-time and pre-built aggregations. | ✓ On-demand, real-time aggregations. Agility.
Alignment | ▪ Version mismatch across functional components. | ✓ Functional components are an integral part of the ecosystem.
Search & Data Mining | ▪ Missing. Add-on component. | ✓ Native.
h/w Scalability | ▪ Each tier has to scale independently (CPU/memory). | ✓ Built-in scalability.
System Scalability | ▪ Vertical. | ✓ Horizontal.
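The table describes partitioning and hashing as first-class citizens of a distributed ecosystem. As a rough illustration only (not the platform described in the deck), the sketch below hash-partitions records by key so each partition can be processed independently. The field names `member_id` and `visits`, the partition count, and the helper names are all assumptions made up for this example.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records, key_field, num_partitions):
    """Group records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        partitions[index].append(record)
    return dict(partitions)


# Synthetic records; "member_id" and "visits" are hypothetical field names.
records = [{"member_id": f"M{i:04d}", "visits": i % 5} for i in range(1000)]
parts = partition_records(records, "member_id", num_partitions=4)

# Each partition can now be aggregated on its own (and, in a real
# distributed system, on a different node), then combined.
visits_per_partition = {
    index: sum(row["visits"] for row in rows) for index, rows in parts.items()
}
```

A stable hash (rather than Python's per-process randomized `hash()`) is used so the same key always lands in the same partition across runs, which is what makes scaling out by adding partitions predictable.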
RDBMS Ecosystem - Distributed Ecosystem -
Centralized Data and Processing on a Distributed Platform.
Minimize data movement
Data and its various interaction points
Heuristic processing
Identify data linkages
Novel parallel processing techniques and algorithms
Liberate data from source systems
Knowledge silos
Fragmented Data and Processing on RDBMS Platforms.
Siloed IT departments
Fragmented data models
Data repurposing
Data rejuvenation
Fragmented profiling & data quality
Missing data linkages
Siloed business units
Fragmented data selection rules, business logic, algorithms…
13. Collect Curate Enrich
Pass
Sample Design
Collect
Ingest Pre-checks
Failed
Ingest Process
2016-08-25 5:35
2016-08-25 5:35
Curate Enrich
Pass
Verification
Pass
Data Profiling
2016-08-25 5:35
2016-08-25 5:35
Pass
Enrich Pre-checks
NA
Enrich Process
2016-08-25 5:35
Use-case
Growth of run time
Growth of storage
Total tables processed
40 tables (Q1: 35 tables)
40 tables 12 tables
Source
Systematic Data Liberation…
Solving data repurposing – single copy, multiple use, enrich & context
archive
continuous
archive
ingest
true ELT
semantic equivalence
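The slide lists gated stages (ingest pre-checks, verification, data profiling, enrichment), each with a Pass or Failed status. The sketch below shows one possible way to structure such a gated pipeline. It is an assumption-laden illustration, not Kaiser Permanente's implementation: the check rules, the `id` and `lab_value` columns, and every function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class PipelineRun:
    """Records a Pass/Failed status for every stage that was executed."""
    results: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, stage: str, ok: bool) -> bool:
        self.results.append((stage, "Pass" if ok else "Failed"))
        return ok


def ingest_prechecks(rows: List[Row]) -> bool:
    # Hypothetical rule: the batch is non-empty and every row has an "id".
    return len(rows) > 0 and all("id" in row for row in rows)


def verify(rows: List[Row]) -> bool:
    # Hypothetical rule: ids are unique after ingest.
    ids = [row["id"] for row in rows]
    return len(ids) == len(set(ids))


def profile(rows: List[Row]) -> Dict[str, int]:
    """Count missing values per column (a very small data profile)."""
    columns = sorted({name for row in rows for name in row})
    return {name: sum(1 for row in rows if row.get(name) is None) for name in columns}


def run_pipeline(rows: List[Row]) -> Tuple[PipelineRun, Optional[List[Row]]]:
    run = PipelineRun()
    if not run.record("ingest pre-checks", ingest_prechecks(rows)):
        return run, None
    ingested = [dict(row) for row in rows]  # load as-is, transform later
    if not run.record("verification", verify(ingested)):
        return run, None
    null_counts = profile(ingested)
    # Hypothetical rule: fail if any column is entirely empty.
    if not run.record("data profiling", all(n < len(ingested) for n in null_counts.values())):
        return run, None
    enriched = [{**row, "has_lab_result": row.get("lab_value") is not None} for row in ingested]
    run.record("enrich process", True)
    return run, enriched


good_run, good_rows = run_pipeline([{"id": 1, "lab_value": 5.0}, {"id": 2, "lab_value": None}])
bad_run, bad_rows = run_pipeline([{"id": 1}, {"id": 1}])  # duplicate ids fail verification
```

Stopping at the first failed gate keeps bad batches out of the curated and enriched layers, and the recorded statuses are the kind of information a stage-status dashboard could display.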
14. Consistent User Experience: One Platform
episodes, pharmacy, membership, lab, kp.org, employee, hr, system logs, help desk
HDFS
All Data Encrypted @ Rest
JDBC - Impala Lookup Query / Arcadia OLAP Query
Raw Data Zone
User Defined Zone
Refined Data Zone
Master Data Zone
Meta Data Zone
Reference Data Zone
Usage Data Zone
Exploratory Intelligence
Analyze Mine Refine Discover
Smart Data Zone (Semantic Layer)
Data Platform – Landing Zone
Visualize
episode groupings
high utilizers
actionable findings
search prescriptions
semantic
hr analytics
risk intelligence
Liberate Curate Enrich Consume Collect
17. Machine Learning - Current Process
Refining data (clean/impute)
Applying variety of algorithms – training & testing datasets
Getting relevant data
Feature engineering
Data pipeline
Dashboard for model performance
Data pipeline
Continuous improvement
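The steps named on this slide (getting data, cleaning and imputing, feature engineering, training and testing) can be sketched end to end in a few lines. This is a minimal illustration on synthetic data, not the process or models used in the deck: the columns `visits`, `years_enrolled`, and `high_utilizer`, the mean imputation, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free.

```python
import random


def impute_mean(rows, column):
    """Refine: replace missing values in a column with the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = sum(observed) / len(observed)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


def add_features(rows):
    """Feature engineering: derive a rate from two raw columns."""
    return [{**row, "visits_per_year": row["visits"] / row["years_enrolled"]} for row in rows]


def train_centroids(rows, features, label):
    """Train: compute the mean feature vector (centroid) for each class."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[label], []).append([row[f] for f in features])
    return {y: [sum(col) / len(col) for col in zip(*vectors)] for y, vectors in grouped.items()}


def predict(centroids, row, features):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    point = [row[f] for f in features]
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(point, centroids[y])))


# Getting relevant data: synthetic rows with some missing values.
raw = []
for i in range(40):
    label = i % 2
    raw.append({
        "visits": (2 if label == 0 else 20) + i % 3,
        "years_enrolled": None if i % 7 == 0 else 1 + (i // 2) % 4,
        "high_utilizer": label,
    })

clean = impute_mean(raw, "years_enrolled")
rows = add_features(clean)

# Training and testing datasets: a seeded shuffle keeps the split reproducible.
random.Random(0).shuffle(rows)
train, test = rows[:32], rows[32:]

features = ["visits", "visits_per_year"]
centroids = train_centroids(train, features, "high_utilizer")
accuracy = sum(predict(centroids, row, features) == row["high_utilizer"] for row in test) / len(test)
```

The held-out `accuracy` is the sort of number a model-performance dashboard would track over time as the data pipeline and features are improved.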
18. Our Analytics Strategy
1 Mentoring: Developed our people through training.
2 Challenging: Developed an internal crowd sourced machine learning challenge.
3 Enabling: Provided an infrastructure to explore and develop.
19. Mentoring
September 30, 2016
Training Programs In Person Events Ongoing Learning
Training our people on big data and statistical tools.
Developed learning forum and venues to support machine learning.
Lunch and learn seminars
20. The power of crowd
https://en.wikipedia.org/wiki/Board_of_Longitude
What was the first problem solved through crowd sourcing and when?
1714 – British Board of Longitude
Problem: To determine the longitude of a ship at sea.
21. Data Science Challenge
Nature (Aug 2016): Crowdsourcing biomedical research: leveraging communities as innovation engines, Julio Saez-Rodriguez et al.
22. Organizational Challenge Flow
Nature (Aug 2016): Crowdsourcing biomedical research: leveraging communities as innovation engines, Julio Saez-Rodriguez et al.
23. IMPROVED ANALYTICS INFRASTRUCTURE
Competition enabled us to test an analytics ecosystem.
Competition prompted us to develop a process to manage open source tools.
The data science team gave participants access to de-identified data, in compliance with HIPAA.
24. Results from Challenge
Fast facts
• Over 100 KP data scientists participated in the competition
• 1000 models were submitted in 6 weeks
• Model performance improved by more than 5%
• … in less than 10% of the time
Learnings
• Discussion forums were lively and collaboration increased
• New algorithm strategies were discovered
• Papers are planned to share the learnings on the algorithm strategies
27. Acknowledgement
We wish to acknowledge the contribution of many to this work:
• The Permanente Medical Group Physicians and the Permanente Federation
• Health Plan and Hospital Operations, Quality and Finance teams
• Kaiser Permanente Information Technology
It takes a “virtual” village … !