©2015 MFMER | slide-1
Big Data Platform Processes Daily
Healthcare Data for Clinic Use
at Mayo Clinic
Dequan Chen, Ph.D.
Mayo Clinic Big Data Core Team
chen.dequan@mayo.edu; 507-250-7400
San Jose Convention Center
June 9, 2015
©2015 MFMER | slide-2
Outline
• Healthcare Data Challenge at Mayo Clinic
➢ HL7 Data for Integrated Enterprise Usage
➢ Data Volumes, Types, and Processing Velocity
➢ Challenges of Existing RDBMS/Web Systems
• Big Data Implementation for Enterprise-Level
Clinical and Non-Clinical Usage
• Big Data Implementation in Support of
Colorectal Surgery Applications
• On-going & Future Direction
• Conclusion
©2015 MFMER | slide-3
Healthcare Data Challenge at Mayo Clinic
©2015 MFMER | slide-4
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• World’s largest integrated not-for-profit
healthcare system: more than 70 hospitals and clinics
➢ Enterprise Core Value: The Needs of the
Patient Come First
• Mayo Clinic Rochester, Minn. recognized as the
top hospital in the nation for 2014-2015 by U.S.
News & World Report
• Provides care for more than 1 million patients
annually (1,317,900 in 2014) from all 50 states
and more than 150 countries
©2015 MFMER | slide-5
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• Generates large amounts of EHR data
➢ Structured
➢ Semi-structured
➢ Unstructured
• HL7 messages: a mix of semi-structured and
unstructured EHR data
➢ Enterprise-level clinical usage (diagnosis, treatment,
prevention, or clinical reporting)
➢ Enterprise-level non-clinical usage (research,
business intelligence, or health information exchange)
©2015 MFMER | slide-6
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• HL7 Message Example: (Source: http://www.priorityhealth.com)
MSH|^~&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||
PID|1||94000000000^^^Priority Health||LASTNAME^FIRSTNAME^INIT||19460101|M|||||
PD1|1|||1234567890^PCPLAST^PCPFIRST^M^^^^^NPI|
OBR|1||185L29839X64489JLPF~X64489^ACC_NUM|JLPF^Lipid Panel - C||||||||||||1694^DOCLAST^DOCFIRST^^MD||||||20080511103529|||
OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40|""||""|F|||20080511103500|||^^^""|
OBX|2|NM|JTRIG^Triglyceride (CAD)|1|72|CD:289^mg/dL|35-150^35^150|""||""|F|||20080511103500|||^^^""|
OBX|3|NM|JVLDL^VLDL-C (calc - CAD)|1|14|CD:289^mg/dL||""||""|F|||20080511103500|||^^^""|
OBX|4|NM|JLDL^LDL-C (calc - CAD)|1|134|CD:289^mg/dL|0-100^0^100|H||""|F|||20080511103500|||^^^""|
OBX|5|NM|JCHO^Cholesterol (CAD)|1|210|CD:289^mg/dL|90-200^90^200|H||""|F|||20080511103500|||^^^""|
…
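To show how a message like the example above breaks down, here is a minimal Python sketch that splits an HL7 v2 message into segments and fields and pulls out the OBX results. It is an illustration only, not the parser used at Mayo Clinic: the function names are hypothetical, and it assumes the default delimiters ("|" for fields, "^" for components) and carriage-return segment separators.

```python
# Minimal HL7 v2 parsing sketch (illustrative only; not a complete HL7 parser).
# Assumes default delimiters: "|" between fields, "^" between components.

SAMPLE = "\r".join([
    'MSH|^~&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||',
    'PID|1||94000000000^^^Priority Health||LASTNAME^FIRSTNAME^INIT||19460101|M|||||',
    'OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40|""||""|F|||20080511103500|||^^^""|',
    'OBX|4|NM|JLDL^LDL-C (calc - CAD)|1|134|CD:289^mg/dL|0-100^0^100|H||""|F|||20080511103500|||^^^""|',
])


def parse_segments(message):
    """Split a message into segments, each a list of pipe-delimited fields."""
    lines = [ln for ln in message.replace("\n", "\r").split("\r") if ln.strip()]
    return [ln.split("|") for ln in lines]


def message_type(segments):
    """Return MSH-9 (message type), e.g. 'ORU^R01'."""
    msh = next(seg for seg in segments if seg[0] == "MSH")
    # MSH-1 is the field separator itself, so MSH-9 sits at list index 8.
    return msh[8]


def observations(segments):
    """Collect numeric OBX results as dictionaries."""
    results = []
    for seg in segments:
        if seg[0] != "OBX":
            continue
        code, _, name = seg[3].partition("^")   # OBX-3: identifier^text
        unit = seg[6].split("^")[-1]            # OBX-6: last component is the unit text
        flag = seg[8] if seg[8] not in ("", '""') else None  # OBX-8: abnormal flag
        results.append({
            "code": code,
            "name": name,
            "value": float(seg[5]),             # OBX-5: value (numeric for type NM)
            "unit": unit,
            "flag": flag,
        })
    return results
```

Calling `observations(parse_segments(SAMPLE))` yields one dictionary per OBX segment; the LDL-C entry carries the "H" (high) flag from the example message.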
©2015 MFMER | slide-7
Healthcare Data Challenge at Mayo Clinic
Data Volumes
• Daily enterprise-wide volume of real-time HL7
message data (msgs/day)
• Large volume of historical HL7 messages at
Mayo Clinic (~1.83 billion prior to 12-31-2014)
©2015 MFMER | slide-8
Healthcare Data Challenge at Mayo Clinic
Data Types and Processing Velocity
• 60+ document types of HL7 messages
➢ Each document type generated by an individual
healthcare source system
➢ Ex: Clinical Notes (cnote), Surgical Notes (opnote),
Radiology, Pathology, Health_Quest, ECG/EKG…
• Need for fast processing (storing, analyzing,
retrieving) of all types of HL7 data
➢ Real-time data and/or historical data
➢ Seconds: ER, ICU, and surgical care
➢ Minutes: internal or preventive medicine
©2015 MFMER | slide-9
Healthcare Data Challenge at Mayo Clinic
Challenges of Existing RDBMS/Web Systems
• For enterprise-level clinical and non-clinical
usage, the multiple existing RDBMS-based
systems cannot achieve:
➢ All real-time HL7 messages: synchronously
stored, analyzed, and retrieved
➢ All real-time and/or historical HL7 messages:
quickly analyzed and retrieved
➢ Fast free-text search on any medical term
➢ Easy, lower-cost scalability (scale-up & scale-out)
©2015 MFMER | slide-10
Big Data Implementation for Enterprise-
Level Clinical and Non-Clinical Usage
©2015 MFMER | slide-11
Mayo Clinic Big Data Platform
MC BigData Appliance (V1.0)
• Started implementation in Jan 2014
• Purchased from Teradata
• SUSE Linux Enterprise Server 11 (SLES 11)
• Integration and Production Hadoop clusters
• Each Hadoop cluster:
➢ 2 edge nodes, 2 master nodes, 6 data nodes
➢ Hadoop stack: TDH 1.3.2 (Teradata-certified and
modified HDP 1.3.2)
▪ HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, HCatalog,
WebHCat, ZooKeeper, Ganglia, Nagios, Oozie, Hue, and Ambari
©2015 MFMER | slide-12
Mayo Clinic Big Data Platform
MC BigData Appliance (V1.0)
• Each Hadoop cluster (cont’d):
➢ Built-in PostgreSQL database instance
➢ Apache Storm (version 0.9.1)
➢ ElasticSearch (ES) (version 1.0.0) cluster
➢ Instances of an in-house Storm topology,
MayoTopology (one instance per HL7 document
type)
o 1 spout
o 2+ bolts
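The spout-and-bolts shape of a topology can be pictured with a small plain-Python analogy. This is a sketch only: it does not use Storm's API, and the function names and the in-memory `store` are hypothetical stand-ins. A spout emits one tuple per incoming message, and each bolt transforms or persists the tuple before passing it on.

```python
from typing import Callable, Iterable, Iterator, List


def hl7_spout(raw_messages: Iterable[str]) -> Iterator[dict]:
    """Spout stand-in: emits one tuple (here, a dict) per incoming message."""
    for raw in raw_messages:
        yield {"raw": raw}


def parse_bolt(tup: dict) -> dict:
    """First bolt stand-in: split the raw HL7 text into segments and fields."""
    segments = [seg.split("|") for seg in tup["raw"].split("\r") if seg]
    return {**tup, "segments": segments}


def make_persist_bolt(store: List[dict]) -> Callable[[dict], dict]:
    """Second bolt stand-in: records a summary in an in-memory list
    (a placeholder for a real sink such as HDFS or an ES index)."""
    def persist_bolt(tup: dict) -> dict:
        store.append({"segment_ids": [seg[0] for seg in tup["segments"]]})
        return tup
    return persist_bolt


def run_topology(spout: Iterator[dict], bolts: List[Callable[[dict], dict]]) -> int:
    """Push every tuple from the spout through the bolts in order."""
    processed = 0
    for tup in spout:
        for bolt in bolts:
            tup = bolt(tup)
        processed += 1
    return processed
```

Running one such chain per document type mirrors the "one instance per doc-type" arrangement, with each instance reading only its own stream of messages.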
©2015 MFMER | slide-13
Data Flow Architecture
Persisting Mayo Clinic Healthcare Data into HDFS and
ElasticSearch Index inside MC BigData Appliance (V1.0)
©2015 MFMER | slide-14
Testing (Measurement) Architecture
Measurement of Data Persisting Capacity of MC BigData
Appliance (V1.0)
©2015 MFMER | slide-15
HL7 Data Processing Capability
HDFS and ES Index Data Persisting Capacity of MC
BigData Appliance (V1.0)
• Daily HL7-Persisting Capacity
• 62 ± 4 million HL7 messages/day
• No statistically significant change occurred when more
types of HL7 messages were included
• ~20-50x more capacity than current daily max
volume of all internal HL7 messages
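As a rough reading of these figures, the short Python sketch below converts the daily capacity into a per-second rate and back-calculates the daily volume range implied by the "~20-50x" multiple. The implied volumes are derived arithmetic under the assumption that the multiple applies to the mean capacity; they are not numbers stated on the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

CAPACITY_MEAN = 62_000_000  # messages/day, from the slide
CAPACITY_SD = 4_000_000     # messages/day, from the slide


def per_second(per_day: float) -> float:
    """Convert a daily message count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def implied_volume_range(capacity: float, low_multiple: float, high_multiple: float):
    """Daily volume range implied by 'capacity is low..high times the volume'.

    The larger multiple corresponds to the smaller implied volume.
    """
    return capacity / high_multiple, capacity / low_multiple


rate = per_second(CAPACITY_MEAN)
low, high = implied_volume_range(CAPACITY_MEAN, 20, 50)
print(f"mean capacity: about {rate:.0f} messages/second")
print(f"implied daily max volume: {low:,.0f} to {high:,.0f} messages/day")
```

Under these assumptions the mean capacity works out to roughly 718 messages per second, and the implied daily maximum falls between about 1.24 million and 3.1 million messages.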
©2015 MFMER | slide-16
HL7 Data Processing Capability
Ultra-Fast Free-Text Search Capacity by ES Index of MC
BigData Appliance (V1.0)
• Data on ES Index
➢ JSON documents derived from HL7 v2 messages
• Data Set size vs. Query speed (querying “Pain”)
• ES Index: ~20000 – 30000x faster than NSE
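A sketch of what "HL7-derived JSON documents" and a free-text query might look like, in Python. The document layout, the `text` field name, and the helper names are assumptions for illustration; the slides do not show the actual mapping. The query body uses Elasticsearch's standard `match` query form and is only built here, not sent to a cluster.

```python
import json


def hl7_to_document(raw: str, doc_type: str) -> dict:
    """Turn a raw HL7 v2 message into a JSON-serializable document.

    Assumption: free text to be searched is taken from OBX-5 (the observation
    value field); the real pipeline may use different fields.
    """
    segments = [seg.split("|") for seg in raw.split("\r") if seg]
    text = " ".join(seg[5] for seg in segments if seg[0] == "OBX" and len(seg) > 5)
    return {
        "doc_type": doc_type,
        "segment_ids": [seg[0] for seg in segments],
        "text": text,
    }


def match_query(term: str, field: str = "text", size: int = 10) -> dict:
    """Build a full-text 'match' query body for the given field."""
    return {"size": size, "query": {"match": {field: term}}}


doc = hl7_to_document("MSH|^~&|X\rOBX|1|TX|NOTE||Patient reports chest pain", "cnote")
print(json.dumps(doc))
print(json.dumps(match_query("Pain")))
```

Because the search runs against a prebuilt inverted index rather than scanning rows, query time grows slowly with data set size, which is the behavior the slide's comparison highlights.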
©2015 MFMER | slide-17
Production Processing HL7 Data
Reliability of MayoTopology Instances on BDProd Cluster
on MC BigData Appliance (V1.0 & V2.0)
• Production running started in May 2014
• Expanded with 3 more edge nodes and 4 more
data nodes
• Upgraded from TDH 1.3.2 to TDH 2.1, and from
ES 1.0.0 through intermediate versions to ES 1.5.2
• Successfully identified and fixed critical issues
on the appliance
• No data loss has occurred to date
©2015 MFMER | slide-18
Big Data Implementation in Support of
Colorectal Surgery Applications
©2015 MFMER | slide-19
Goals for Support of Colorectal Surgery
Applications
• Optimize an existing NLP (Natural Language
Processing) Pipeline
• Move from thousands of HL7 documents to tens of
thousands of documents processed daily
• Replace existing free-text search facility used
by Clinical Web Service supported applications
• Move from minutes to milliseconds per search
• Simplify overall architecture, increase data
volume/velocity, and reduce costs
©2015 MFMER | slide-20
Solution Architecture
In Support of Colorectal Surgery Applications
[Architecture diagram; labels recoverable from the slide:]
• Data sources: Radiology, Surgical, ECG/EKG, Pathology, Clinical Notes, Insurance Claims
• Enterprise Messaging Queues (ESB): 4-6 JMS queues, 250-300k HL7 msgs/day
• Flume
• Storm (ID, Transform, and Parse)
• Elasticsearch (indexing & free text search)
• HDFS
• NLP Discovery (MR, Hive, Pig, other)
• HL7 msgs for annotation, to Annotation Facility
• UIMA Annotators, Output Parser
• Rules Engine, RDBMS Persist
• Clinical Web Services, Services, REST API, SQL
• Colorectal Surgical Applications
• Legend: Existing Components, New Components
©2015 MFMER | slide-21
Solution Implementation
[Implementation diagram (Josh Pankratz, Apr 24, 2014); labels recoverable from the slide:]
• Queue groups: Enterprise Messaging Queues, Big Data Input Queues, NLP Input Queues, NLP Output Queues
• Queue names shown: ClinNotes, Surgery, Radiology, Rch Results, Insurance; ClinNotes, OpNotes, RadiolRpt, ECG Rpt (the second set appears twice)
• Big Data Platform: Storm, Elasticsearch, HDFS
• Storm:
1. Parse/Transform HL7
2. Persist JSON to Elasticsearch
3. Persist HL7 to HDFS
4. Route HL7 to NLP Queues
• External NLP Annotators:
➢ Parse HL7
➢ Set up UIMA resources based on message type
➢ Run UIMA pipeline
➢ Output 1:n annotation results (NLP evidence)
• Annotators: A1. CRS_BLEED, A2. CRS_ILEUS, A3. NEURO_BLEED, An. <expandable> …
• RDBMS Structured Data Store: NLP evidence (annotations), structured data (from source systems)
• SQL
• User Interfaces: CRS Point of Care Tool
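The four Storm steps named in the diagram (parse/transform HL7, persist JSON to Elasticsearch, persist HL7 to HDFS, route HL7 to NLP queues) can be sketched in Python with in-memory stand-ins for the sinks. Everything here is hypothetical scaffolding: the class and function names are invented, the lists and deques merely stand in for Elasticsearch, HDFS, and JMS queues, and the assumption that only the four listed document types are routed to NLP is a reading of the diagram labels rather than a stated rule.

```python
import json
from collections import deque

# Queue names taken from the diagram labels; treating them as the full set
# of NLP input queues is an assumption.
NLP_QUEUE_NAMES = ("ClinNotes", "OpNotes", "RadiolRpt", "ECG Rpt")


class InMemorySinks:
    """Placeholders for the real sinks (ES index, HDFS, NLP input queues)."""

    def __init__(self):
        self.es_docs = []        # stands in for the Elasticsearch index
        self.hdfs_records = []   # stands in for files on HDFS
        self.nlp_queues = {name: deque() for name in NLP_QUEUE_NAMES}


def process(raw: str, doc_type: str, sinks: InMemorySinks) -> bool:
    """Run one message through the four steps; return True if routed to NLP."""
    # 1. Parse/transform HL7 into a JSON-friendly structure.
    segments = [seg.split("|") for seg in raw.split("\r") if seg]
    doc = {"doc_type": doc_type, "segments": segments}
    # 2. Persist the JSON form (here: append to a list).
    sinks.es_docs.append(json.dumps(doc))
    # 3. Persist the original HL7 text unchanged.
    sinks.hdfs_records.append(raw)
    # 4. Route the HL7 text to the NLP queue for its document type, if any.
    queue = sinks.nlp_queues.get(doc_type)
    if queue is None:
        return False
    queue.append(raw)
    return True
```

Every message is persisted in both forms regardless of type; only the routing step depends on the document type, which keeps the annotation load separate from the storage path.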
©2015 MFMER | slide-22
Data Flow Architecture
©2015 MFMER | slide-23
CRS HL7 Data Processing Capability
MC BigData Appliance(Hadoop-ES)-NLPAnnotation(DS)-
AmalgaRDB/Web for Colorectal Surgery (CRS)
• Daily HL7-Persisting Capacity
• 535k ± 31k messages/day
• ~8-25x more capacity than current daily max
volume of CRS HL7 messages
©2015 MFMER | slide-24
Production Processing CRS HL7 Data
Reliability of MC BigData Appliance(Hadoop-ES)-
NLPAnnotation(DS)-AmalgaRDB/Web for Colorectal
Surgery (CRS)
• Production running started in July 2014
• No data loss has occurred to date
©2015 MFMER | slide-25
On-going & Future Direction
©2015 MFMER | slide-26
On-going & Future Direction
• Move current NLP Annotation Pipeline from
DataStage production server environment to
MC BigData appliance Hadoop cluster for
CRS applications
• Storm Topology
• Dedicated edge nodes
➢ Faster & more reliable NLP annotation
➢ Higher capacity of HL7 message processing
©2015 MFMER | slide-27
On-going & Future Direction
• Build a unified data architecture – Unified Data
Platform (UDP, an enterprise-integrated
system) over the next few years:
➢ Enhance the Big Data platform
➢ Utilize existing RDBMS-based replication and data
warehouse environment
➢ Create a variety of data endpoints (cubes, data
services, advanced visualizations) for enterprise
usage
➢ Integrate with non-Hadoop components for
advanced Big Data analytics: R, Revolution R ..
©2015 MFMER | slide-28
Conclusion
©2015 MFMER | slide-29
Conclusion
Take-Home Messages
• The implemented BigData platform, coupled with
DataStage (NLP) & RDBMS, exceeds current
Mayo Clinic patient-care needs:
• Reliably handles ~20-50x more capacity than the
current daily volume of all HL7 messages
• Provides ultra-fast free-text search capabilities on
medical terms
• Reliably handles ~8-25x more capacity than the current
daily volume of HL7 messages for Colorectal
Surgery Applications
• Significantly outperforms RDBMS-only-based
systems
©2015 MFMER | slide-30
Conclusion
Take-Home Messages
• Big Data is a core component of the Mayo Clinic
UDP, which can utilize the power of Big Data
technology at the enterprise level:
➢ Large data storage capability
➢ Structured, semi-structured and unstructured data
➢ Fast data exchange with RDBMS-based systems
➢ A variety of data-oriented Hadoop components:
HDFS, Pig, Hive, HBase, Spark ..
➢ In-situ non-Hadoop data-processing components:
R, Revolution R ..
©2015 MFMER | slide-31
Questions & Discussion
©2015 MFMER | slide-32
Reference Links
• Mayo Clinic: http://www.mayoclinic.org/
• HL7: http://www.hl7.org/
• Hadoop Stack: http://hortonworks.com
• Apache Storm: https://storm.apache.org/
• ElasticSearch: https://www.elastic.co/
• Teradata: http://www.teradata.com/

More Related Content

What's hot

Azure automation
Azure automationAzure automation
Azure automation
Tariq Younas
 
Introduction to GCP (Google Cloud Platform)
Introduction to GCP (Google Cloud Platform)Introduction to GCP (Google Cloud Platform)
Introduction to GCP (Google Cloud Platform)
Pulkit Gupta
 
Disaster Recovery using Azure Services
Disaster Recovery using Azure ServicesDisaster Recovery using Azure Services
Disaster Recovery using Azure Services
Anoop Nair
 
Mastering Azure Monitor
Mastering Azure MonitorMastering Azure Monitor
Mastering Azure Monitor
Richard Conway
 
How to migrate workloads to the google cloud platform
How to migrate workloads to the google cloud platformHow to migrate workloads to the google cloud platform
How to migrate workloads to the google cloud platform
actualtechmedia
 
Migrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with ConfidenceMigrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with Confidence
David J Rosenthal
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
Amazon Web Services
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
Amazon Web Services
 
Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6skonda
 
Introduction to Azure
Introduction to AzureIntroduction to Azure
Introduction to Azure
Robert Crane
 
API strategy with IBM API connect
API strategy with IBM API connectAPI strategy with IBM API connect
API strategy with IBM API connect
Kellton Tech Solutions Ltd
 
Credon - Qlik Sense Presentation
Credon - Qlik Sense PresentationCredon - Qlik Sense Presentation
Credon - Qlik Sense Presentation
Wim Wuytens
 
IT Transformation with AWS
IT Transformation with AWSIT Transformation with AWS
IT Transformation with AWS
Amazon Web Services
 
Microsoft Cloud Adoption Framework for Azure: Governance Conversation
Microsoft Cloud Adoption Framework for Azure: Governance ConversationMicrosoft Cloud Adoption Framework for Azure: Governance Conversation
Microsoft Cloud Adoption Framework for Azure: Governance Conversation
Nicholas Vossburg
 
cloud Resilience
cloud Resilience cloud Resilience
cloud Resilience
Integral university, India
 
TechEvent Cloud Governance
TechEvent Cloud GovernanceTechEvent Cloud Governance
TechEvent Cloud Governance
Trivadis
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud Economics
Amazon Web Services
 
AWS Account Best Practices
AWS Account Best PracticesAWS Account Best Practices
AWS Account Best Practices
Amazon Web Services
 
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results Amazon Web Services
 
Google Cloud Platform (GCP)
Google Cloud Platform (GCP)Google Cloud Platform (GCP)
Google Cloud Platform (GCP)
Chetan Sharma
 

What's hot (20)

Azure automation
Azure automationAzure automation
Azure automation
 
Introduction to GCP (Google Cloud Platform)
Introduction to GCP (Google Cloud Platform)Introduction to GCP (Google Cloud Platform)
Introduction to GCP (Google Cloud Platform)
 
Disaster Recovery using Azure Services
Disaster Recovery using Azure ServicesDisaster Recovery using Azure Services
Disaster Recovery using Azure Services
 
Mastering Azure Monitor
Mastering Azure MonitorMastering Azure Monitor
Mastering Azure Monitor
 
How to migrate workloads to the google cloud platform
How to migrate workloads to the google cloud platformHow to migrate workloads to the google cloud platform
How to migrate workloads to the google cloud platform
 
Migrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with ConfidenceMigrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with Confidence
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 
Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6Pentaho technical whitepaper-1-6
Pentaho technical whitepaper-1-6
 
Introduction to Azure
Introduction to AzureIntroduction to Azure
Introduction to Azure
 
API strategy with IBM API connect
API strategy with IBM API connectAPI strategy with IBM API connect
API strategy with IBM API connect
 
Credon - Qlik Sense Presentation
Credon - Qlik Sense PresentationCredon - Qlik Sense Presentation
Credon - Qlik Sense Presentation
 
IT Transformation with AWS
IT Transformation with AWSIT Transformation with AWS
IT Transformation with AWS
 
Microsoft Cloud Adoption Framework for Azure: Governance Conversation
Microsoft Cloud Adoption Framework for Azure: Governance ConversationMicrosoft Cloud Adoption Framework for Azure: Governance Conversation
Microsoft Cloud Adoption Framework for Azure: Governance Conversation
 
cloud Resilience
cloud Resilience cloud Resilience
cloud Resilience
 
TechEvent Cloud Governance
TechEvent Cloud GovernanceTechEvent Cloud Governance
TechEvent Cloud Governance
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud Economics
 
AWS Account Best Practices
AWS Account Best PracticesAWS Account Best Practices
AWS Account Best Practices
 
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results
Cloud Adoption Framework Define Your Cloud Strategy and Accelerate Results
 
Google Cloud Platform (GCP)
Google Cloud Platform (GCP)Google Cloud Platform (GCP)
Google Cloud Platform (GCP)
 

Viewers also liked

Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
DataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
DataWorks Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
DataWorks Summit
 
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...DataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
DataWorks Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
DataWorks Summit
 
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesDataWorks Summit
 
June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2DataWorks Summit
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceDataWorks Summit
 
Apache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataApache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataDataWorks Summit
 
From Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for AllFrom Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for All
DataWorks Summit
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
DataWorks Summit
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
DataWorks Summit
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application ResourcesDataWorks Summit
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
DataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
DataWorks Summit
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
DataWorks Summit
 

Viewers also liked (20)

Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
 
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
 
June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of Service
 
Apache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataApache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic Data
 
From Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for AllFrom Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for All
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resources
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 

Similar to Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic

Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP
Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP
Wolters Kluwer Improves Patient Outcomes with GigaSpaces XAP
Amnon Raviv
 
How to Restructure Active Directory with ZeroIMPACT
How to Restructure Active Directory with ZeroIMPACTHow to Restructure Active Directory with ZeroIMPACT
How to Restructure Active Directory with ZeroIMPACT
Quest
 
How to Restructure and Modernize Active Directory
How to Restructure and Modernize Active DirectoryHow to Restructure and Modernize Active Directory
How to Restructure and Modernize Active Directory
Quest
 
inmation Presentation_2017
inmation Presentation_2017inmation Presentation_2017
inmation Presentation_2017
inmation Software GmbH
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: eTRIKS - Science Driven D...
David Peyruc
 
Dennis Kehoe - ECO 15: Digital connectivity in healthcare
Dennis Kehoe - ECO 15: Digital connectivity in healthcareDennis Kehoe - ECO 15: Digital connectivity in healthcare
Dennis Kehoe - ECO 15: Digital connectivity in healthcare
Innovation Agency
 
Application of Distributed processing and Big data in agricultural DSS
Application of Distributed processing and Big data in agricultural DSSApplication of Distributed processing and Big data in agricultural DSS
Application of Distributed processing and Big data in agricultural DSS
Anusha Basavaraj
 
An Overview of the Message Broker Healthcare Connectivity Pack
An Overview of the Message Broker Healthcare Connectivity PackAn Overview of the Message Broker Healthcare Connectivity Pack
An Overview of the Message Broker Healthcare Connectivity PackAnt Phillips
 
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell World
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
Wei-Chiu Chuang
 
Partners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray BrownPartners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Partners in Technology 13 Sept 2013 HSIA CIO Ray Brown
Digital Queensland
 
CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?CTO Perspectives: What's Next for Data Management and Healthcare?
CTO Perspectives: What's Next for Data Management and Healthcare?
Health Catalyst
 
Connecting Clinical Applications with WebSphere Message Broker
Connecting Clinical Applications with WebSphere Message BrokerConnecting Clinical Applications with WebSphere Message Broker
Connecting Clinical Applications with WebSphere Message Broker
Ant Phillips
 
Solution Architecture US healthcare
Solution Architecture US healthcare Solution Architecture US healthcare
Solution Architecture US healthcare
sumiteshkr
 
Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014
Raja Chiky
 
Accelerate Healthcare Technology Modernization with Containerization and DevOps
Accelerate Healthcare Technology Modernization with Containerization and DevOpsAccelerate Healthcare Technology Modernization with Containerization and DevOps
Accelerate Healthcare Technology Modernization with Containerization and DevOps
CitiusTech
 
NHS England Open Source Event: Connecting Leeds: Open Platform
NHS England Open Source Event: Connecting Leeds: Open Platform NHS England Open Source Event: Connecting Leeds: Open Platform
NHS England Open Source Event: Connecting Leeds: Open Platform
Tony Shannon
 
Delivering fast, powerful and scalable analytics
Delivering fast, powerful and scalable analyticsDelivering fast, powerful and scalable analytics
Delivering fast, powerful and scalable analytics
MariaDB plc
 
BSA (Graham Mitchell) Polypharmacy Prescribing Comparators Overview March 17
 BSA (Graham Mitchell) Polypharmacy Prescribing Comparators Overview March 17 BSA (Graham Mitchell) Polypharmacy Prescribing Comparators Overview March 17
BSA (Graham Mitchell) Polypharmacy Prescribing Comparators Overview March 17
Health Innovation Wessex
 
Genome Analysis Pipelines, Big Data Style
Genome Analysis Pipelines, Big Data StyleGenome Analysis Pipelines, Big Data Style
Genome Analysis Pipelines, Big Data Style
Julius Remigio, CBIP
 

Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic

  • 1. ©2015 MFMER | slide-1
      Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
      Dequan Chen, Ph.D., Mayo Clinic Big Data Core Team
      chen.dequan@mayo.edu; 507-250-7400
      San Jose Convention Center, June 9, 2015
  • 2. Outline
      • Healthcare Data Challenge at Mayo Clinic
          - HL7 Data for Integrated Enterprise Usage
          - Data Volumes, Types, and Processing Velocity
          - Limitations of Existing RDBMS/Web Systems
      • Big Data Implementation for Enterprise-Level Clinical and Non-Clinical Usage
      • Big Data Implementation in Support of Colorectal Surgery Applications
      • On-going & Future Direction
      • Conclusion
  • 3. Healthcare Data Challenge at Mayo Clinic
  • 4. Healthcare Data Challenge at Mayo Clinic: HL7 Data for Integrated Enterprise Usage
      • World's largest integrated not-for-profit healthcare system, with more than 70 hospitals and clinics
          - Enterprise Core Value: The Needs of the Patient Come First
      • Mayo Clinic Rochester, Minn. recognized as the top hospital in the nation for 2014-2015 by U.S. News & World Report
      • Provides care annually for more than 1 million patients (1,317,900 in 2014) from all 50 states and more than 150 countries
  • 5. Healthcare Data Challenge at Mayo Clinic: HL7 Data for Integrated Enterprise Usage
      • Generates large amounts of EHR data
          - Structured
          - Semi-structured
          - Unstructured
      • HL7 messages: a mix of semi-structured and unstructured EHR data
          - Enterprise-level clinical usage (diagnosis, treatment, prevention, or clinical reporting)
          - Enterprise-level non-clinical usage (research, business intelligence, or health information exchange)
  • 6. Healthcare Data Challenge at Mayo Clinic: HL7 Data for Integrated Enterprise Usage
      • HL7 Message Example (Source: http://www.priorityhealth.com):

        MSH|^~&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||
        PID|1||94000000000^^^Priority Health||LASTNAME^FIRSTNAME^INIT||19460101|M|||||
        PD1|1|||1234567890^PCPLAST^PCPFIRST^M^^^^^NPI|
        OBR|1||185L29839X64489JLPF~X64489^ACC_NUM|JLPF^Lipid Panel - C||||||||||||1694^DOCLAST^DOCFIRST^^MD||||||20080511103529|||
        OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40|""||""|F|||20080511103500|||^^^""|
        OBX|2|NM|JTRIG^Triglyceride (CAD)|1|72|CD:289^mg/dL|35-150^35^150|""||""|F|||20080511103500|||^^^""|
        OBX|3|NM|JVLDL^VLDL-C (calc - CAD)|1|14|CD:289^mg/dL||""||""|F|||20080511103500|||^^^""|
        OBX|4|NM|JLDL^LDL-C (calc - CAD)|1|134|CD:289^mg/dL|0-100^0^100|H||""|F|||20080511103500|||^^^""|
        OBX|5|NM|JCHO^Cholesterol (CAD)|1|210|CD:289^mg/dL|90-200^90^200|H||""|F|||20080511103500|||^^^""|
        …
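To make the structure of the HL7 example concrete, here is a minimal Python sketch that splits an HL7 v2 message into segments and fields and pulls the observation (OBX) results out as dictionaries. It is an illustration only, not the parser used at Mayo Clinic: the function names `parse_hl7` and `observations` are hypothetical, the delimiters `|` and `^` are hard-coded rather than read from the MSH segment, and only three segments of the slide's example are included.

```python
# Abbreviated copy of the slide's example; HL7 v2 separates segments with a carriage return.
SAMPLE = "\r".join([
    "MSH|^~&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||",
    'OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40|""||""|F|||20080511103500|||^^^""|',
    'OBX|4|NM|JLDL^LDL-C (calc - CAD)|1|134|CD:289^mg/dL|0-100^0^100|H||""|F|||20080511103500|||^^^""|',
])


def parse_hl7(message):
    """Return {segment_id: [field lists]}; assumes '|' as the field separator."""
    segments = {}
    for line in message.replace("\n", "\r").split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def observations(segments):
    """Flatten OBX segments into dictionaries (assumes '^' as the component separator)."""
    results = []
    for f in segments.get("OBX", []):
        code, _, name = f[3].partition("^")
        results.append({
            "code": code,
            "name": name,
            "value": f[5],
            "units": f[6].split("^")[-1],
            # HL7 uses "" (two double quotes) for an explicitly empty value.
            "abnormal_flag": "" if f[8] == '""' else f[8],
        })
    return results


segments = parse_hl7(SAMPLE)
obs = observations(segments)
```

Dictionaries like these serialize directly with `json.dumps`, which is one plausible route to JSON documents derived from HL7 v2 messages.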
  • 7. Healthcare Data Challenge at Mayo Clinic: Data Volumes
      • Daily enterprise-wide volume of real-time HL7 message data (msgs/day)
      • Large volume of historical HL7 message data at Mayo Clinic (~1.83 billion messages prior to 12-31-2014)
  • 8. Healthcare Data Challenge at Mayo Clinic: Data Types and Processing Velocity
      • 60+ document types of HL7 messages
          - Each document type generated by an individual healthcare source system
          - Ex: Clinical Notes (cnote), Surgical Notes (opnote), Radiology, Pathology, Health_Quest, ECG/EKG…
      • Capability of fast processing (storing, analyzing, retrieving) of all types of HL7 data
          - Real-time data and/or historical data
          - Seconds: ER, ICU and Surgery Healthcare
          - Minutes: Internal or Prevention Medicine
  • 9. Healthcare Data Challenge at Mayo Clinic: Challenges of Existing RDBMS/Web Systems
      • For enterprise-level clinical and non-clinical usage, the existing multiple RDBMS-based system implementations cannot achieve:
          - All real-time HL7 messages synchronously stored, analyzed and retrieved
          - All real-time and/or historical HL7 messages quickly analyzed and retrieved
          - Fast free-text search on any medical terms
          - Easy and lower-cost scalability (scale-up & scale-out)
  • 10. Big Data Implementation for Enterprise-Level Clinical and Non-Clinical Usage
  • 11. Mayo Clinic Big Data Platform: MC BigData Appliance (V1.0)
      • Started implementation in Jan 2014
      • Purchased from Teradata
      • SUSE Linux Enterprise Server 11 (SLES 11)
      • Integration and Production Hadoop clusters
      • Each Hadoop cluster:
          - 2 edge nodes, 2 master nodes, 6 data nodes
          - Hadoop Stack TDH1.3.2: Teradata-certified and modified HDP1.3.2
          - HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, HCatalog, WebHCat, ZooKeeper, Ganglia, Nagios, Oozie, Hue, and Ambari
  • 12. Mayo Clinic Big Data Platform: MC BigData Appliance (V1.0)
      • Each Hadoop cluster (cont'd):
          - Built-in PostgreSQL database instance
          - Apache Storm (version 0.9.1)
          - Elasticsearch (ES) (version 1.0.0) cluster
          - Instances of the home-developed Storm topology, MayoTopology (one instance per doc-type of HL7 messages)
              o 1 spout
              o 2+ bolts
  • 13. Data Flow Architecture: Persisting Mayo Clinic Healthcare Data into HDFS and Elasticsearch Index inside MC BigData Appliance (V1.0)
  • 14. Testing (Measurement) Architecture: Measurement of Data Persisting Capacity of MC BigData Appliance (V1.0)
  • 15. HL7 Data Processing Capability: HDFS and ES Index Data Persisting Capacity of MC BigData Appliance (V1.0)
      • Daily HL7-persisting capacity
      • 62 ± 4 million HL7 messages/day
      • No statistically significant change occurred when more types of HL7 messages were included
      • ~20-50x more capacity than the current daily max volume of all internal HL7 messages
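The capacity figures above can be turned into a sustained per-second rate and an implied range for the current daily maximum with a little arithmetic. The sketch below only rearranges the slide's own numbers (62 million messages/day, 20-50x headroom); the helper names are hypothetical, and the implied peak range is derived from the slide, not a figure it reports.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(msgs_per_day):
    """Average sustained rate implied by a daily total."""
    return msgs_per_day / SECONDS_PER_DAY


def implied_peak_range(capacity, low_multiple, high_multiple):
    """If capacity is low..high times the current peak, return (min_peak, max_peak)."""
    return capacity / high_multiple, capacity / low_multiple


capacity = 62_000_000                      # measured mean from the slide
rate = per_second(capacity)                # about 718 messages per second
low_peak, high_peak = implied_peak_range(capacity, 20, 50)
# low_peak = 1,240,000 and high_peak = 3,100,000 messages/day
```

Read this way, "20-50x" corresponds to a current daily maximum somewhere between roughly 1.2 and 3.1 million messages.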
  • 16. HL7 Data Processing Capability: Ultra-Fast Free-Text Search Capacity by ES Index of MC BigData Appliance (V1.0)
      • Data on ES Index
          - HL7-V2-messages-derived JSON documents
      • Data set size vs. query speed (querying "Pain")
      • ES Index: ~20,000-30,000x faster than NSE
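Elasticsearch is built on Apache Lucene, which answers term queries from an inverted index rather than by scanning every document. The toy Python sketch below shows that idea for a query such as "Pain". It is not Elasticsearch code and does not reproduce the measured speedup; the three sample documents and all function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, term):
    """Look the term up directly; cost does not grow with a scan over all documents."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical note snippets, not taken from the slides.
DOCS = {
    1: "Patient reports chest pain on exertion.",
    2: "No acute distress. Lipid panel ordered.",
    3: "Pain controlled after surgery.",
}

index = build_index(DOCS)
hits = search(index, "Pain")
```

The index is paid for once at write time, which is why lookups stay fast as the data set grows.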
  • 17. Production Processing HL7 Data: Reliability of MayoTopology Instances on BDProd Cluster on MC BigData Appliance (V1.0 & V2.0)
      • Production running started in May 2014
      • Expanded with additional edge nodes (3 more) and data nodes (4 more)
      • Upgraded from TDH1.3.2 to TDH 2.1, and from ES1.0.0 to … to ES1.5.2
      • Successfully identified and fixed critical issues on the appliance
      • No data loss has occurred to date
  • 18. Big Data Implementation in Support of Colorectal Surgery Applications
  • 19. Goals for Support of Colorectal Surgery Applications
      • Optimize an existing NLP (Natural Language Processing) pipeline
      • Move from thousands of HL7 documents to tens of thousands of documents processed daily
      • Replace the existing free-text search facility used by Clinical Web Service supported applications
      • Move from minutes to milliseconds per search
      • Simplify overall architecture, increase data volume/velocity, and reduce costs
  • 20. ©2015 MFMER | slide-20 Solution Architecture In Support of Colorectal Surgery Applications [Architecture diagram; legend: Existing Components, New Components. Labels: Enterprise Messaging Queues (ESB); Radiology, Surgical, ECG/EKG, Pathology, Clinical Notes, Insurance Claims; 4-6 JMS queues, 250-300k HL7 msgs/day; Storm (ID, Transform, and Parse); HDFS; Elasticsearch (indexing & free-text search); Flume; HL7 msgs for annotation, To Annotation Facility; UIMA Annotators; Output Parser; Rules Engine; Persist; RDBMS; NLP Discovery (MR, Hive, Pig, other); Clinical Web Services (REST API, SQL); Services; Colorectal Surgical Applications.]
  • 21. ©2015 MFMER | slide-21 Solution Implementation [Diagram by Josh Pankratz, Apr 24, 2014. Labels: Enterprise Messaging Queues (ClinNotes, Surgery, Radiology, Rch Results, Insurance); Big Data Input Queues (ClinNotes, OpNotes, RadiolRpt, ECG Rpt); Big Data Platform: Storm (1. Parse/Transform HL7; 2. Persist JSON to Elasticsearch; 3. Persist HL7 to HDFS; 4. Route HL7 to NLP Queues), HDFS, Elasticsearch; NLP Input Queues (ClinNotes, OpNotes, RadiolRpt, ECG Rpt); External NLP Annotators (Parse HL7; Set up UIMA Resources based on Message Type; Run UIMA Pipeline; Output 1:n Annotation Results (NLP evidence): A1. CRS_BLEED, A2. CRS_ILEUS, A3. NEURO_BLEED, An. <expandable> …); NLP Output Queues; RDBMS Structured Data Store holding NLP Evidence (annotations) and Structured Data (from source systems); SQL; CRS Point of Care Tool User Interfaces.]
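The Storm steps named on this slide (parse/transform HL7, produce JSON, route to an NLP queue) can be illustrated with a small stand-alone parser. This is a simplified sketch, not the production code: it handles only the `|` field separator and `^` component separator, the sample message is abridged from the HL7 example earlier in the deck, and the message-type-to-queue mapping in `ROUTES` is invented for illustration.

```python
import json
from typing import Dict, List, Optional

# Abridged from the HL7 example earlier in the deck; segments are joined by carriage returns.
SAMPLE = "\r".join([
    "MSH|^~&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||",
    "PID|1||94000000000^^^Priority Health||LASTNAME^FIRSTNAME^INIT||19460101|M|||||",
])

# Hypothetical routing table: the slides do not say which message types go to which queue.
ROUTES: Dict[str, str] = {"ORU": "RadiolRpt"}


def hl7_to_doc(message: str) -> Dict[str, object]:
    """Split an HL7 v2 message into segments and fields, keyed by segment ID."""
    segments: Dict[str, List[List[str]]] = {}
    for line in filter(None, message.replace("\n", "\r").split("\r")):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields[1:])
    msh = segments["MSH"][0]
    # msh[0] holds MSH-2 (the encoding characters), so MSH-9 (message type) is at index 7.
    message_type = msh[7].split("^")[0]
    return {"message_type": message_type, "segments": segments}


def route(doc: Dict[str, object]) -> Optional[str]:
    """Pick an NLP input queue for a parsed document, or None if no route is defined."""
    return ROUTES.get(str(doc["message_type"]))


doc = hl7_to_doc(SAMPLE)
as_json = json.dumps(doc)   # the JSON form that would be indexed
queue = route(doc)
```

In the pipeline described on the slide, the JSON document would go to Elasticsearch, the raw HL7 text to HDFS, and the routed message to the selected NLP input queue; repetition separators, escape sequences, and subcomponents are deliberately ignored in this sketch.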
  • 22. ©2015 MFMER | slide-22 Data Flow Architecture
  • 23. ©2015 MFMER | slide-23 CRS HL7 Data Processing Capability MC BigData Appliance(Hadoop-ES)-NLPAnnotation(DS)- AmalgaRDB/Web for Colorectal Surgery (CRS) • Daily HL7-Persisting Capacity • 535k ± 31k messages/day • ~8-25x more capacity than current daily max volume of CRS HL7 messages
  • 24. ©2015 MFMER | slide-24 Production Processing CRS HL7 Data Reliability of MC BigData Appliance(Hadoop-ES)- NLPAnnotation(DS)-AmalgaRDB/Web for Colorectal Surgery (CRS) • Production operation began in July 2014 • No data loss has occurred to date
  • 25. ©2015 MFMER | slide-25 On-going & Future Direction
  • 26. ©2015 MFMER | slide-26 On-going & Future Direction • Move current NLP Annotation Pipeline from DataStage production server environment to MC BigData appliance Hadoop cluster for CRS applications • Storm Topology • Dedicated edge nodes  Faster & more reliable NLP annotation  Higher capacity of HL7 message processing
  • 27. ©2015 MFMER | slide-27 On-going & Future Direction • Build a unified data architecture – Unified Data Platform (UDP, an enterprise-integrated system) over the next few years:  Enhance the Big Data platform  Utilize existing RDBMS-based replication and data warehouse environment  Create a variety of data endpoints (cubes, data services, advanced visualizations) for enterprise usage  Integrate with non-Hadoop components for advanced Big Data analytics – R, Revolution R ..
  • 28. ©2015 MFMER | slide-28 Conclusion
  • 29. ©2015 MFMER | slide-29 Conclusion Take-Home Messages • The implemented BigData platform, coupled with DataStage (NLP) & RDBMS, exceeds current Mayo Clinic patient-care needs: • Reliably handles ~20-50x more capacity than the current daily volume of all HL7 messages • Provides ultra-fast free-text search capabilities on medical terms • Reliably handles ~8-25x more capacity than the current daily volume of HL7 messages for Colorectal Surgery Applications • Significantly outperforms RDBMS-only-based systems
  • 30. ©2015 MFMER | slide-30 Conclusion Take-Home Messages • Big Data is a core component of Mayo Clinic UDP, which can utilize the power of Big Data technology at enterprise-level:  Large data storage capability  Structured, semi-structured and unstructured data  Fast data exchange with RDBMS-based systems  A variety of data-oriented Hadoop components – HDFS, Pig, Hive, HBase, Spark ..  In-situ non-Hadoop data-processing components – R, Revolution R ..
  • 31. ©2015 MFMER | slide-31 Questions & Discussion
  • 32. ©2015 MFMER | slide-32 Reference Links • Mayo Clinic: http://www.mayoclinic.org/ • HL7: http://www.hl7.org/ • Hadoop Stack: http://hortonworks.com • Apache Storm: https://storm.apache.org/ • ElasticSearch: https://www.elastic.co/ • Teradata: http://www.teradata.com/