www.boozallen.com
June 5, 2018
Spark + AI Helps
the FDA Protect
the Nation
Kun Ei Kang
Booz | Allen | Hamilton
Chief Innovation Architect
Jonathan Chu
Booz | Allen | Hamilton
Chief Technologist
AGENDA
FDA AND YOU
§ The FDA oversees products which account for 20
cents of every dollar spent by consumers
§ Foreign production of FDA-regulated goods and
materials has exploded over the last decade
§ FDA-regulated food and medical products originate
from more than:
– 150 countries
– 130,000 importers
– 300,000 foreign facilities
FDA IMPORTS PROGRAM
Responsibilities:
§ Protect public health by electronically screening
all imports of FDA-regulated products
§ Identify and stop product lines/shipments that
pose significant risk
§ Obtain samples of products for further
laboratory screening as needed
Challenges
High Volume
~40 Billion shipments in 2017 with continuous annual increase of 5-10%
Performance Insight
Sheer volume of data led to increased labor to assess performance of electronic screening rules
Change Management
Need for more robust capabilities to evaluate proposed changes to screening criteria for imports
Innovation center, Washington, D.C.
OPEN DATA PLATFORM (ODP) TURNS DISPARATE DATA INTO VALUE-ADDED SOLUTIONS FOR
OUR CLIENTS
DATA SOURCES: Social media, newsfeeds, web crawlers, proprietary sources
COLLECT: Get and track data from the source
PROCESS: Give data the power of greater context
AGGREGATE: From disparate data sources, one version of the truth
EXPOSE: Abstract away complexities through a single interface
DATA CONSUMERS: Visualizations, applications, data science, analytics, business intelligence, curation & discovery
Security, Governance, Provenance, Lineage
Insights
ODP CORE TECHNOLOGY STACK
§ Best-of-breed open source technologies chosen, configured and
integrated
§ Containerized data ingest and processing pipeline using Apache Spark
and Docker
§ Automated deployment into Amazon Web Services (AWS) and
Azure
§ Ability to swap technologies in and out, and to tailor deployments,
based on use case (e.g., Elasticsearch vs. Solr; deploying search
capabilities without the Hadoop ecosystem)
§ Data Management dashboard for job management, data
lineage/provenance tracking, metadata management, and governance
§ Analytics platform that democratizes data science and enables
analytic decision making for anyone (of any skill set) in an
organization
ODP Framework
SOLUTION JOURNEY
AGGREGATION
Preparation of data for FDA
subject matter experts
PERFORMANCE ANALYTICS
Provide analytics and dashboards
for assessing the performance of
current import screening rules
SCREENING OPTIMIZATION
Improve import screening
through Machine Learning + AI
PHASE 1: AGGREGATION
CHALLENGE
Billions of records in various
database tables dating back to
2007
Spark jobs migrate data into data lake
Filter and join data set
Redistribute data for performance
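The three steps above (migrate, filter and join, redistribute) can be sketched in plain Python. This is only an illustration of the logic, not the team's Spark job: in Spark the same steps would normally be written with DataFrame operations such as `filter`, `join`, and `repartition`. The table contents, column names, and the `partition_for` helper are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical source tables; the real schemas are not given in the talk.
entries = [
    {"entry_id": 1, "importer_id": "A", "year": 2006},
    {"entry_id": 2, "importer_id": "B", "year": 2012},
    {"entry_id": 3, "importer_id": "A", "year": 2017},
]
importers = {"A": {"country": "CA"}, "B": {"country": "MX"}}


def filter_and_join(rows, lookup, min_year=2007):
    """Keep rows from min_year onward and attach importer attributes."""
    return [
        {**row, **lookup[row["importer_id"]]}
        for row in rows
        if row["year"] >= min_year and row["importer_id"] in lookup
    ]


def partition_for(key, num_partitions):
    """Stable hash partitioning (crc32 is deterministic across runs)."""
    return crc32(str(key).encode("utf-8")) % num_partitions


def redistribute(rows, key, num_partitions):
    """Group rows into partitions so that equal keys land together."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row[key], num_partitions)].append(row)
    return dict(partitions)


joined = filter_and_join(entries, importers)
partitioned = redistribute(joined, key="importer_id", num_partitions=4)
```

Partitioning on the join key is one common choice because later joins or aggregations on that key can then avoid moving data again; whether the FDA pipeline partitions this way is not stated in the slides.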
ODP IMPLEMENTATION AT FDA
[Diagram: Spark Submit, Spark SQL Job.jar, 15 Spark Packages, Connectors, Spark Cluster, Spark SQL WEB UI, JSON, Connection config, SQL]
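The diagram lists a JSON connection config and SQL next to the Spark SQL job. One plausible reading is that the job is driven by configuration rather than hard-coded queries. The sketch below illustrates that pattern only: it uses Python's built-in `sqlite3` as a stand-in engine instead of Spark, and the config keys, table, and `run_job` function are made up for the example.

```python
import json
import sqlite3

# Hypothetical job definition: a connection config plus the SQL to run.
JOB_CONFIG = json.loads("""
{
  "connection": {"database": ":memory:"},
  "sql": "SELECT country, COUNT(*) AS shipments FROM entries GROUP BY country ORDER BY country"
}
""")


def run_job(config, seed_rows):
    """Open the configured connection, load sample rows, run the configured SQL."""
    conn = sqlite3.connect(config["connection"]["database"])
    try:
        conn.execute("CREATE TABLE entries (entry_id INTEGER, country TEXT)")
        conn.executemany("INSERT INTO entries VALUES (?, ?)", seed_rows)
        return conn.execute(config["sql"]).fetchall()
    finally:
        conn.close()


result = run_job(JOB_CONFIG, [(1, "CA"), (2, "MX"), (3, "CA")])
```

Keeping the query in configuration means a new aggregation can be added by editing the JSON, without rebuilding the job artifact.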
PHASE 2: PERFORMANCE ANALYTICS
Leverage Client’s Tableau Investment
Create Tableau Dashboards
Enable conversion of Parquet to Tableau TDE format
CHALLENGE
With thousands of rules in place,
provide an intuitive user experience
and clear insight into the performance
of current rules
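As a rough illustration of what "rules performance" could mean in practice, the sketch below counts how often each screening rule fires and how often follow-up review confirms a problem. The record layout, rule names, and the choice of metric are assumptions; the talk does not say which measures the dashboards show.

```python
from collections import Counter

# Hypothetical review outcomes: which rule flagged an entry and whether
# follow-up review confirmed a problem. Field names are assumptions.
outcomes = [
    {"rule": "R1", "confirmed": True},
    {"rule": "R1", "confirmed": False},
    {"rule": "R1", "confirmed": True},
    {"rule": "R2", "confirmed": False},
]


def rule_performance(rows):
    """Return {rule: (flag_count, confirmed_rate)} for each screening rule."""
    flags = Counter(row["rule"] for row in rows)
    confirmed = Counter(row["rule"] for row in rows if row["confirmed"])
    return {rule: (n, confirmed[rule] / n) for rule, n in flags.items()}


performance = rule_performance(outcomes)
# Rank rules so the ones that rarely confirm a problem surface first.
weakest_first = sorted(performance, key=lambda rule: performance[rule][1])
```

A table shaped like `performance` is the kind of aggregate a dashboard tool can chart directly, one row per rule.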
TABLEAU CONNECTOR
PHASE 3: CURRENT PHASE - BUSINESS OPTIMIZATION
Recommend changes to model
Determine impact to organization
Determine projected performance
GOALS
Automate the use of data captured
during import entry review, along with
other data, to ascertain product risk
and improve the entry review
process
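A minimal sketch of this goal, assuming a score-then-review workflow: entries are scored for risk and the highest-scoring ones are queued for a human reviewer. The features, weights, and threshold are invented for illustration and are not a trained model or the FDA's criteria.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: int
    prior_violations: int   # hypothetical feature
    new_importer: bool      # hypothetical feature


def risk_score(entry):
    """Toy score in [0, 1]; the weights are illustrative, not a trained model."""
    score = 0.2 * min(entry.prior_violations, 3)
    score += 0.3 if entry.new_importer else 0.0
    return min(score, 1.0)


def review_queue(entries, threshold=0.4):
    """Return ids of entries recommended for human review, highest score first.

    The function only recommends; the final decision stays with a reviewer.
    """
    scored = [(risk_score(e), e) for e in entries]
    flagged = [(s, e) for s, e in scored if s >= threshold]
    return [e.entry_id for s, e in sorted(flagged, key=lambda pair: -pair[0])]


queue = review_queue([
    Entry(1, prior_violations=0, new_importer=False),
    Entry(2, prior_violations=3, new_importer=True),
    Entry(3, prior_violations=1, new_importer=True),
])
```

Returning a ranked queue rather than an accept/reject decision keeps the model in an advisory role.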
DEMONSTRATION OF ODP
WHAT WE LEARNED
§ Spark further facilitates innovation because it lets teams experiment quickly and
demonstrate business value
§ Due to regulatory mandates, there will always be a human factor in the final decision
“Information is the oil of the 21st century,
and analytics is the combustion engine.”
– Peter Sondergaard
CONTACT US
For more information about Open Data Platform, contact:
Jonathan Chu
chu_jonathan@bah.com
opendataplatform@bah.com
https://boozallen.github.io/opendataplatform/
Kun Ei Kang
kang_kun@bah.com
ODP REFERENCE SLIDES
An Open Source data management platform harvested from our Booz Allen developer community
This document is confidential and intended solely for the client to whom it is addressed.
DATA CHALLENGES CONTINUE TO GROW EXPONENTIALLY
DIGITAL UNIVERSE DATA GROWTH
• 2005: 130 exabytes
• 2020: 40,000+ exabytes
• Growth multipliers shown on the slide: x2, x300
• 5,200+ GB per person
• ~40% “touched” by cloud computing providers (by 2020)
• ~33% of the digital universe will be valuable if analyzed (by 2020)
• Forrester estimates that companies are only making use of 12% of the data they own
GROWTH OF SENSITIVE DATA REQUIRING PROTECTION
• 2010: < 1/3
• 2020: > 40%, of which < 50% is protected
• Unprotected data will grow by a factor of 26
GROWING NEED OF DATA GOVERNANCE
• Gartner estimates data lakes will still not have effective metadata management until at least 2018
• By 2020, information governance and policies will be managed by metadata alone
• Poor quality of data costs an average organization $13.5 million per year
SOURCES: IDC – The Digital Universe in 2020; The Forrester Wave: Big Data Hadoop Solutions; Gartner 2016 Metadata Management MQ; Gartner Newsroom
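The 2005 and 2020 figures on this slide (130 and 40,000+ exabytes) imply a growth rate that is easy to work out. The short calculation below assumes constant compound growth between the two years, which is a simplification and not something the slide states.

```python
import math

start_eb, end_eb = 130, 40_000      # exabytes in 2005 and 2020, from the slide
years = 2020 - 2005

growth_factor = end_eb / start_eb                   # roughly the "x300" on the slide
annual_rate = growth_factor ** (1 / years) - 1      # implied compound annual growth
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"factor: {growth_factor:.0f}x")
print(f"implied annual growth: {annual_rate:.1%}")
print(f"implied doubling time: {doubling_years:.2f} years")
```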
THE TRUTH ABOUT BIG DATA
BELIEF vs. REALITY
BELIEF: “The more data I have, the more questions I can answer, the more knowledgeable my organization will be”
REALITY: It’s about the right data, combined with the right tools, for the right problem, not how much data is stored
BELIEF: “If I combine all my data together I will have full insight into what my organization is trying to accomplish”
REALITY: Policies, restrictions, and security need to continue to be enforced, and the origin of the data and any changes to it need to be tracked throughout the life of the data asset
BELIEF: “If I leverage industry technologies, I will be successful”
REALITY: Successful organizations leverage industry technologies, but they also adopt an agile culture that enables them to try new ideas and pivot quickly based on lessons learned
BELIEF: “Once I have all my data in one place, I can focus on analytics”
REALITY: Stored data needs to be cataloged to enable analysts to exploit the data. Information such as data set name, data formats, data tagging, release-ability, retention, and other metadata must be available.
“Through 2017, 60% of big data projects will fail to go beyond piloting and experimentation, and will be abandoned.”
Gartner, http://www.gartner.com/newsroom/id/3130017
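The cataloging and provenance points on this slide can be made concrete with a small data structure. The sketch below uses the metadata fields the slide names (data set name, format, tagging, release-ability, retention) and adds an append-only change history; the class, method names, and sample values are hypothetical and do not represent ODP's actual catalog schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Catalog record using the metadata fields the slide lists."""
    name: str
    data_format: str
    tags: list
    releasability: str
    retention_until: date
    lineage: list = field(default_factory=list)  # ordered change history

    def record_change(self, actor, action):
        """Append a provenance event so changes stay traceable."""
        self.lineage.append({"actor": actor, "action": action})


def search(catalog, tag):
    """Return the names of data sets carrying the given tag."""
    return [entry.name for entry in catalog if tag in entry.tags]


# Hypothetical data set; names and values are for illustration only.
entry = CatalogEntry(
    name="import_entries",
    data_format="parquet",
    tags=["imports", "screening"],
    releasability="internal",
    retention_until=date(2030, 1, 1),
)
entry.record_change(actor="ingest-job", action="loaded from source")
found = search([entry], "imports")
```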
WE NEED TO EMPOWER ORGANIZATIONS TO MOVE
FROM DATA CHAOS TO POWERFUL ANALYTICS THAT
DRIVE INSIGHT INTO THEIR MISSION
OPEN DATA PLATFORM CAPABILITIES
DATA SECURITY
Ensure security is applied and data
is protected from the point it enters
the platform until it leaves
DATA GOVERNANCE
Gain insight into the health of your
data feeds, users of your data (both
human and system), metadata and
policies, and the lineage of data
across the entire platform
DATA GATEWAY
Access data across your enterprise
using an open common API
DATA INGEST
Fully established and orchestrated
Docker-based ingest pipelines
supporting RDBMS, Web Services,
JSON, HL7, ICPUBS, and more
DATA QUALITY
Refine your data either during or post-
processing to add greater context to
the data and enhance quality
PLATFORM AUTOMATION
Fully automated deployment of ODP,
enabling repeatable deployments
with no manual intervention
DATA ANALYTICS
Derive powerful insights quickly
through readily available analytical
capabilities
EXCHANGE – FIND & CURATE
The power to take control of your data
INTEGRATED TEAMS & APPLICATIONS: Dissolve technical and cultural
barriers by building a community for all skill levels. ODP
empowers both technical staff and non-technical staff to get
value from data.
FILE TYPE AGNOSTIC: Links directly to any URL or data file,
anywhere, using our intuitive Web extension.
CUSTOMIZED CURATION: Organizes data sets on clients’ terms.
Collects data sources into custom “nets,” then curates them
by topic.
DATA-GATHERING COMMUNITY: Users can create a profile, curate
data, and gain followers. A social data community levels out
the playing field to find quality data.
CROWD-SOURCED QUALITY: The Comment, Like, and Star Rate
features give everyone a say about the quality of data
sources.
CUSTOM SOLUTION PACKAGES: One size does not fit all. Every
organization faces unique challenges, and no one should
settle for a standard, shrink-wrapped, stand-alone tool.
EXPLORE – ANALYZE & DISCOVER
Data science & machine learning for the masses
EMBRACE MODERN ANALYTICS: Gain greater data insights through
machine learning, natural language processing, advanced data
querying, and data curation bookmarking.
GROW A DATA SCIENCE CAPABILITY: More than just tools, ODP helps
build a data science community and nurtures the growth of an
analytic capability.
API TO ANALYTICS: Downloads data locally with one click, or sends data
via API to Sailfish Explore.
NATURAL LANGUAGE INTERFACE: Users can ask “plain-English” questions
about the data—no special query language is required.
VISUAL QUERY BUILDER: Users can simply drag-and-drop to build
complex queries, without coding.
WORKFLOW MANAGEMENT: Users can think through their analytical
approach in real time, saving their workflow, sharing it with others,
and/or scheduling it ahead of time.
ODP ARCHITECTURE AND TECHNOLOGY
STACK
Booz Allen Hamilton Internal
Data:
• Streaming
• Batch
• Structured
• Unstructured
Raw Data Zone
• Unaltered Source Data
1. Collect 2. Prepare 3. Organize 4. Profile 5. Enrich 6. Aggregate
• Establish data policies
• Cleanse and normalize data
• Extract metadata
• Store and catalog
• Data Archive
• Assess Data Quality
• Manage Version Control
• Track provenance / lineage
• Enhance metadata
• Create relationships
• Combine disparate data sources, create single version of truth
Trusted Data Zone
• Master Data
• Cleaned, Processed, Organized
• Correlated Entities
• Policy Enforcement
• Version Controlled
Sandbox Data Zone
• Combination of trusted and raw data
• Data Sampling to support Development,
Experimentation, and Prototypes
Analytic Data Zone
• In-Memory
• Optimized Indexes
• Graph Analysis
• Data Warehouses
• Machine Learning
Data Sharing Services
• Search
• Query
• Mutation Services
• Enterprise Application Integration
• Publish/Subscribe
• External Data Connections
• Business Process Management
LDAP / ADFS
Data Security
• Cell Level Security
• Row Level Filtering
• Attribute Security
• Centralized Policy Enforcement
Data Governance
• Data Catalog
• Data Quality
• Data Provenance and Lineage
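To illustrate the row-level filtering and cell-level security items listed under Data Security, the sketch below drops rows a user may not see and then hides individual cells whose visibility label the user lacks. The policy model (regions on rows, optional labels on cells) and all names are assumptions made for the example, not ODP's enforcement mechanism.

```python
# Hypothetical data: each row has an owning region, and each cell carries an
# optional visibility label (None means no restriction).
rows = [
    {"region": "east", "cells": {"entry_id": (1, None), "importer": ("Acme", "pii")}},
    {"region": "west", "cells": {"entry_id": (2, None), "importer": ("Globex", None)}},
]


def visible_rows(rows, user_regions, user_labels):
    """Apply row-level filtering, then cell-level visibility checks."""
    result = []
    for row in rows:
        if row["region"] not in user_regions:           # row-level filtering
            continue
        result.append({
            name: value
            for name, (value, label) in row["cells"].items()
            if label is None or label in user_labels    # cell-level check
        })
    return result


analyst_view = visible_rows(rows, user_regions={"east"}, user_labels=set())
officer_view = visible_rows(rows, user_regions={"east", "west"}, user_labels={"pii"})
```

Putting both checks in one function mirrors the idea of centralized policy enforcement: callers never see unfiltered data.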
TYPICAL IMPLEMENTATION STRATEGY
Proof of Concept (2-3 Weeks)
• Demonstrate technical capabilities in Booz Allen environment
• Target single contextual use case
• Public data sources
Ex. For a prospective commercial client, we ingested publicly available geospatial, health, and financial information in 3 weeks to showcase our ability to respond to health events worldwide
Pilot (6-8 Weeks)
• Demonstrate technical capabilities in Client sandbox environment
• Sample client data sets ingested
• Pilot use case implemented against client needs
Ex. For a government client, we ingested multiple acquisition data sets to produce example analytics and enable the client to reduce license costs while increasing visibility across functions of the enterprise
Operational (Varies by Customer)
• Operational deployment in Client environment against full client needs
• Planning, Installation
• Accreditation
• Client data sets ingested
• Monitoring/management of Open Data Platform
• Platform optimization based on operational use
• Upgrades
Ex. For a government client, we support a multi-year deployment ingesting over 100 feeds to support strategic analysis of threats against US assets and provide Defense/Intel level data security
WHY LEVERAGE THE ODP?
§ We have packaged our experience and expertise into an automated package to give organizations a “hot-start”
to analyze data quickly
§ We have researched, integrated, and hardened best-of-breed Open Source Software using our experience
gained in Defense, Intel, Civil, and Commercial deployments
§ It is backed by a community of Big Data Professionals, Certified Cloud Architects, and Security
Professionals to ensure successful, low-risk delivery
§ It solves the challenges with deploying and managing a platform, wrangling and managing data feeds, and
integrating best-of-breed Open Source software
ODP enables Booz Allen to partner with clients in an Open Architecture
A GLIMPSE OF OUR BIG DATA EXPERIENCE
* Defense / Intel Deployments on SIPRNET and JWICS
Plus Commercial Pharmaceutical, Gas &
Oil, and Telecommunications

More Related Content

What's hot

Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)Matt Barnes
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
Shallote Dsouza
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
Hortonworks
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
Michel Dumontier
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data VirtualizationKenneth Peeples
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
Sanjay Padhi, Ph.D
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
Sanjay Padhi, Ph.D
 
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Booz Allen Hamilton
 
Unlocking big data
Unlocking big dataUnlocking big data
Kickfire: Best Of All Worlds
Kickfire: Best Of All WorldsKickfire: Best Of All Worlds
Kickfire: Best Of All Worlds
Enterprise Technology Management (ETM)
 
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...emermell
 
Dealing with Dark Data
Dealing with Dark DataDealing with Dark Data
Dealing with Dark Data
Kazoup
 
Distributed Ledger Tech Applications - Health Report V1.6
Distributed Ledger Tech Applications - Health Report V1.6Distributed Ledger Tech Applications - Health Report V1.6
Distributed Ledger Tech Applications - Health Report V1.6
Sean Manion PhD
 
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUDAUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
Indium Software
 
Delivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudDelivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudBooz Allen Hamilton
 
Analytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big DataAnalytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big Data
David Pittman
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
Tony Bain
 

What's hot (20)

End User Informatics
End User InformaticsEnd User Informatics
End User Informatics
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
 
Hadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHAHadoop and Data Virtualization - A Case Study by VHA
Hadoop and Data Virtualization - A Case Study by VHA
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data Virtualization
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
iri-highres
iri-highresiri-highres
iri-highres
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
Kickfire: Best Of All Worlds
Kickfire: Best Of All WorldsKickfire: Best Of All Worlds
Kickfire: Best Of All Worlds
 
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
 
Dealing with Dark Data
Dealing with Dark DataDealing with Dark Data
Dealing with Dark Data
 
Distributed Ledger Tech Applications - Health Report V1.6
Distributed Ledger Tech Applications - Health Report V1.6Distributed Ledger Tech Applications - Health Report V1.6
Distributed Ledger Tech Applications - Health Report V1.6
 
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUDAUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
AUTOMATED TESTING OF LAB MANAGEMENT SERVICES ON CLOUD
 
Delivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the CloudDelivering on the Promise of Big Data and the Cloud
Delivering on the Promise of Big Data and the Cloud
 
Analytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big DataAnalytics: The Real-world Use of Big Data
Analytics: The Real-world Use of Big Data
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 

Similar to Apache Spark + AI Helps and FDA Protects the Nation with Jonathan Chu and Kun Ei Kang

BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
Big Data Week
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Cambridge Semantics
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
VMware Tanzu
 
Streaming and Visual Data Discovery for the Internet of Things
Streaming and Visual Data Discovery for the Internet of ThingsStreaming and Visual Data Discovery for the Internet of Things
Streaming and Visual Data Discovery for the Internet of Things
DatawatchCorporation
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
IBM Sverige
 
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge GraphActivate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
DATAVERSITY
 
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
Big Data Week
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
Denodo
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
Cambridge Semantics
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Denodo
 
Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperativeTrillium Software
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
Denodo
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
Prof.Balakrishnan S
 
Three Dimensions of Data as a Service
Three Dimensions of Data as a ServiceThree Dimensions of Data as a Service
Three Dimensions of Data as a Service
Denodo
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
Databricks
 
How to Consume Your Data for AI
How to Consume Your Data for AIHow to Consume Your Data for AI
How to Consume Your Data for AI
DATAVERSITY
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Denodo
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Denodo
 
Unleashing the Power of Cloud-Based Big Data Analytics.pptx
Unleashing the Power of Cloud-Based Big Data Analytics.pptxUnleashing the Power of Cloud-Based Big Data Analytics.pptx
Unleashing the Power of Cloud-Based Big Data Analytics.pptx
Golu187360
 

Similar to Apache Spark + AI Helps and FDA Protects the Nation with Jonathan Chu and Kun Ei Kang (20)

BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Streaming and Visual Data Discovery for the Internet of Things
Streaming and Visual Data Discovery for the Internet of ThingsStreaming and Visual Data Discovery for the Internet of Things
Streaming and Visual Data Discovery for the Internet of Things
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge GraphActivate Your Data Lakehouse with an Enterprise Knowledge Graph
Activate Your Data Lakehouse with an Enterprise Knowledge Graph
 
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics  Dell Statisti...
BDW Chicago 2016 - John K. Thompson, GM for Advanced Analytics Dell Statisti...
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperative
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
Three Dimensions of Data as a Service
Three Dimensions of Data as a ServiceThree Dimensions of Data as a Service
Three Dimensions of Data as a Service
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
How to Consume Your Data for AI
How to Consume Your Data for AIHow to Consume Your Data for AI
How to Consume Your Data for AI
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Unleashing the Power of Cloud-Based Big Data Analytics.pptx
Unleashing the Power of Cloud-Based Big Data Analytics.pptxUnleashing the Power of Cloud-Based Big Data Analytics.pptx
Unleashing the Power of Cloud-Based Big Data Analytics.pptx
 

Apache Spark + AI Helps the FDA Protect the Nation with Jonathan Chu and Kun Ei Kang

  • 1. www.boozallen.com, June 5, 2018. Spark + AI Helps the FDA Protect the Nation. Kun Ei Kang, Chief Innovation Architect, Booz Allen Hamilton; Jonathan Chu, Chief Technologist, Booz Allen Hamilton
  • 3. FDA AND YOU: The FDA oversees products that account for 20 cents of every dollar spent by consumers. Foreign production of FDA-regulated goods and materials has exploded over the last decade. FDA-regulated food and medical products originate from more than 150 countries, 130,000 importers, and 300,000 foreign facilities.
  • 4. FDA IMPORTS PROGRAM. Responsibilities: protect public health by electronically screening all FDA-regulated product imports; determine and stop product line/shipments if they pose significant risk; obtain samples of products for further laboratory screening as needed. Challenges: High Volume (~40 billion shipments in 2017, with a continuous annual increase of 5-10%); Performance Insight (sheer volume of data led to increased labor to assess performance of electronic screening rules); Change Management (need for more robust capabilities to evaluate proposed changes to screening criteria for imports).
  • 6. ODP TURNS DISPARATE DATA INTO VALUE ADDED SOLUTION FOR OUR CLIENTS. COLLECT: get and track data from the source. PROCESS: give data the power of greater context. AGGREGATE: from disparate data sources, one version of the truth. EXPOSE: abstract away complexities through a single interface. DATA SOURCES: social media, newsfeeds, web crawlers, proprietary sources. DATA CONSUMERS: visualizations, applications, data science, analytics, business intelligence, curation & discovery. Security, Governance, Provenance, Lineage. Insights.
  • 7. ODP CORE TECHNOLOGY STACK (ODP Framework): Best-of-breed open source technologies chosen, configured, and integrated. Containerized data ingest and processing pipeline using Apache Spark and Docker. Automated deployment into Amazon Web Services (AWS) and Azure. Ability to swap technologies in and out based on use case, as well as tailor deployments based on use case (e.g., Elasticsearch vs. Solr; deploy search capabilities without the Hadoop ecosystem). Data Management dashboard for job management, data lineage/provenance tracking, metadata management, and governance. Analytics platform that democratizes data science and enables analytic decision making for anyone (of any skill set) in an organization.
  • 8. SOLUTION JOURNEY. AGGREGATION: preparation of data for FDA subject matter experts. PERFORMANCE ANALYTICS: provide analytics and dashboards for assessing current import screening rules performance. SCREENING OPTIMIZATION: improve imports screening through Machine Learning + AI.
  • 9. PHASE 1: AGGREGATION. CHALLENGE: billions of records in various database tables dating back to 2007. Spark jobs migrate data into the data lake; filter and join data set; redistribute data for performance.
  • 10. ODP IMPLEMENTATION AT FDA. [Diagram labels: Web UI; JSON connection config; SQL; Spark Submit; Spark SQL Job.jar; 15 Spark Packages; Connectors; Spark Cluster; Spark SQL]
  • 11. PHASE 2: PERFORMANCE ANALYTICS. CHALLENGE: with thousands of rules in place, provide an intuitive user experience and insight that clarifies current rules performance. Leverage client's Tableau investment; create Tableau dashboards; enable conversion from Parquet to Tableau TDE format.
  • 13. PHASE 3: CURRENT PHASE - BUSINESS OPTIMIZATION. GOALS: automate use of data captured during imports entry review, and other data, to ascertain risk of products and improve the entry review process. Recommend changes to model; determine impact to organization; determine projected performance.
  • 15. WHAT WE LEARNED: Spark further facilitates innovation because it makes it possible to experiment quickly and demonstrate business value. Due to regulatory mandates, there will always be a human factor in the final decision. “Information is the oil of the 21st century, and analytics is the combustion engine.” (Peter Sondergaard)
  • 16. CONTACT US. For more information about Open Data Platform, contact: Jonathan Chu, chu_jonathan@bah.com; Kun Ei Kang, kang_kun@bah.com; opendataplatform@bah.com; https://boozallen.github.io/opendataplatform/
  • 18. Innovation center, Washington, D.C. An Open Source data management platform harvested from our Booz Allen developer community
  • 19. DATA CHALLENGES CONTINUE TO GROW EXPONENTIALLY. DIGITAL UNIVERSE DATA GROWTH: 130 exabytes (2005) to 40,000+ exabytes (2020); growth markers x2 and x300; 5,200+ GB per person; ~40% “touched” by cloud computing providers (by 2020); ~33% of the digital universe will be valuable if analyzed (by 2020). Forrester estimates that companies are only making use of 12% of the data they own. GROWTH OF SENSITIVE DATA REQUIRING PROTECTION: < 1/3 (2010), > 40% (2020), of which < 50% is protected; unprotected data will grow by a factor of 26. GROWING NEED OF DATA GOVERNANCE: Gartner estimates data lakes will still not have effective metadata management until at least 2018. By 2020, information governance and policies will be managed by metadata alone. Poor quality of data costs an average organization $13.5 million per year. SOURCES: IDC – The Digital Universe in 2020, The Forrester Wave: Big Data Hadoop Solutions, Gartner 2016 Metadata Management MQ, Gartner Newsroom. This document is confidential and intended solely for the client to whom it is addressed.
  • 20. THE TRUTH ABOUT BIG DATA: BELIEF vs. REALITY. “Through 2017, 60% of big data projects will fail to go beyond piloting and experimentation, and will be abandoned.” Gartner, http://www.gartner.com/newsroom/id/3130017 (1) Belief: “The more data I have, the more questions I can answer, the more knowledgeable my organization will be.” Reality: It's about the right data, combined with the right tools, for the right problem, not how much data is stored. (2) Belief: “If I combine all my data together I will have full insight into what my organization is trying to accomplish.” Reality: Policies, restrictions, and security need to continue to be enforced, and the origin of the data and any changes to it need to be tracked throughout the life of the data asset. (3) Belief: “If I leverage industry technologies, I will be successful.” Reality: Successful organizations leverage industry technologies, but they also adopt an agile culture that enables them to try new ideas and pivot quickly based on lessons learned. (4) Belief: “Once I have all my data in one place, I can focus on analytics.” Reality: Stored data needs to be cataloged to enable analysts to exploit the data. Information such as data set name, data formats, data tagging, release-ability, retention, and other metadata must be available.
  • 21. WE NEED TO EMPOWER ORGANIZATIONS TO MOVE FROM DATA CHAOS TO POWERFUL ANALYTICS THAT DRIVE INSIGHT INTO THEIR MISSION
  • 22. OPEN DATA PLATFORM CAPABILITIES. DATA SECURITY: Ensure security is applied and data is protected from the point it enters the platform until it leaves. DATA GOVERNANCE: Gain insight into the health of your data feeds, users of your data (both human and system), metadata and policies, and the lineage of data across the entire platform. DATA GATEWAY: Access data across your enterprise using an open common API. DATA INGEST: Fully established and orchestrated Docker-based ingest pipelines supporting RDBMS, Web Services, JSON, HL7, ICPUBS, and more. DATA QUALITY: Refine your data either during or post-processing to add greater context to the data and enhance quality. PLATFORM AUTOMATION: Fully automated deployment of ODP, enabling repeatable deployments with no manual intervention. DATA ANALYTICS: Derive powerful insights quickly through readily available analytical capabilities.
  • 23. ODP TURNS DISPARATE DATA INTO ONE VERSION OF THE TRUTH. COLLECT: get and track data from the source. PROCESS: give data the power of greater context. AGGREGATE: from disparate data sources, one version of the truth. EXPOSE: abstract away complexities through a single interface. DATA SOURCES: social media, newsfeeds, web crawlers, proprietary sources. DATA CONSUMERS: visualizations, applications, data science, analytics, business intelligence, curation & discovery. Security, Governance, Provenance, Lineage. Insights.
  • 24. EXCHANGE – FIND & CURATE. The power to take control of your data. INTEGRATED TEAMS & APPLICATIONS: Dissolve technical and cultural barriers by building a community for all skill levels. ODP empowers both technical staff and non-technical staff to get value from data. FILE TYPE AGNOSTIC: Links directly to any URL or data file, anywhere, using our intuitive Web extension. CUSTOMIZED CURATION: Organizes data sets on clients’ terms. Collects data sources into custom “nets,” then curates them by topic. DATA-GATHERING COMMUNITY: Users can create a profile, curate data, and gain followers. A social data community levels out the playing field to find quality data. CROWD-SOURCED QUALITY: The Comment, Like, and Star Rate features give everyone a say about the quality of data sources. CUSTOM SOLUTION PACKAGES: One size does not fit all. Every organization faces unique challenges, and no one should settle for a standard, shrink-wrapped, stand-alone tool.
  • 25. EXPLORE – ANALYZE & DISCOVER. Data science & machine learning for the masses. EMBRACE MODERN ANALYTICS: Gain greater data insights through machine learning, natural language processing, advanced data querying, and data curation bookmarking. GROW A DATA SCIENCE CAPABILITY: More than just tools, ODP helps build a data science community and nurtures the growth of an analytic capability. API TO ANALYTICS: Downloads data locally with one click, or sends data via API to Sailfish Explore. NATURAL LANGUAGE INTERFACE: Users can ask “plain-English” questions about the data; no special query language is required. VISUAL QUERY BUILDER: Users can simply drag-and-drop to build complex queries, without coding. WORKFLOW MANAGEMENT: Users can think through their analytical approach in real time, saving their workflow, sharing it with others, and/or scheduling it ahead of time.
  • 26. ODP ARCHITECTURE AND TECHNOLOGY STACK (Booz Allen Hamilton Internal). Data: streaming, batch, structured, unstructured. Pipeline stages: 1. Collect, 2. Prepare, 3. Organize, 4. Profile, 5. Enrich, 6. Aggregate. Stage activities: establish data policies; cleanse and normalize data; extract metadata; store and catalog; data archive; assess data quality; manage version control; track provenance / lineage; enhance metadata; create relationships; combine disparate data sources, create single version of truth. Raw Data Zone: unaltered source data. Trusted Data Zone: master data; cleaned, processed, organized; correlated entities; policy enforcement; version controlled. Sandbox Data Zone: combination of trusted and raw data; data sampling to support development, experimentation, and prototypes. Analytic Data Zone: in-memory; optimized indexes; graph analysis; data warehouses; machine learning. Data Sharing Services: search; query; mutation services; enterprise application integration; publish/subscribe; external data connections; business process management. LDAP / ADFS. Data Security: cell level security; row level filtering; attribute security; centralized policy enforcement. Data Governance: data catalog; data quality; data provenance and lineage.
  • 27. TYPICAL IMPLEMENTATION STRATEGY. Proof of Concept (2-3 weeks): demonstrate technical capabilities in Booz Allen environment; target single contextual use case; public data sources. Ex. For a prospective commercial client, we ingested publicly available geospatial, health, and financial information in 3 weeks to showcase our ability to respond to health events worldwide. Pilot (6-8 weeks): demonstrate technical capabilities in client sandbox environment; sample client data sets ingested; pilot use case implemented against client needs. Ex. For a government client, we ingested multiple acquisition data sets to produce example analytics and enable the client to reduce license costs while increasing visibility across functions of the enterprise. Operational (varies by customer): operational deployment in client environment against full client needs; planning, installation; accreditation; client data sets ingested; monitoring/management of Open Data Platform; platform optimization based on operational use; upgrades. Ex. For a government client, we support a multi-year deployment ingesting over 100 feeds to support strategic analysis of threats against US assets and provide Defense/Intel level data security.
  • 28. WHY LEVERAGE THE ODP? We have packaged our experience and expertise into an automated package to give organizations a “hot-start” to analyze data quickly. We have researched, integrated, and hardened best-of-breed Open Source Software using our experience gained in Defense, Intel, Civil, and Commercial deployments. It is backed by a community of Big Data Professionals, Certified Cloud Architects, and Security Professionals to ensure successful, low-risk delivery. It solves the challenges with deploying and managing a platform, wrangling and managing data feeds, and integrating best-of-breed Open Source software. ODP enables Booz Allen to partner with clients in an Open Architecture.
  • 29. A GLIMPSE OF OUR BIG DATA EXPERIENCE. * Defense / Intel deployments on SIPRNET and JWICS, plus Commercial Pharmaceutical, Gas & Oil, and Telecommunications
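
Slide 9 (Phase 1: Aggregation) names three steps: filter the data, join it, and redistribute it for performance. The slides do not show the actual jobs, so what follows is only a minimal sketch of that pattern in plain Python, not the FDA implementation and not Spark code. All record fields, values, and function names are hypothetical. In PySpark the corresponding operations would typically be `DataFrame.filter`, `DataFrame.join`, and `DataFrame.repartition`.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sample records; field names are illustrative, not an FDA schema.
shipments = [
    {"entry_id": 1, "importer_id": "A", "year": 2006},
    {"entry_id": 2, "importer_id": "B", "year": 2012},
    {"entry_id": 3, "importer_id": "A", "year": 2017},
    {"entry_id": 4, "importer_id": "C", "year": 2017},
]
importers = {"A": "Acme Foods", "B": "Beta Pharma"}


def filter_records(rows, min_year):
    """Keep only rows whose year is at or after min_year."""
    return [r for r in rows if r["year"] >= min_year]


def join_importers(rows, lookup):
    """Inner join: attach the importer name, dropping rows with no match."""
    return [
        {**r, "importer_name": lookup[r["importer_id"]]}
        for r in rows
        if r["importer_id"] in lookup
    ]


def redistribute(rows, key, num_partitions):
    """Hash-partition rows by key so equal keys land in the same partition."""
    partitions = defaultdict(list)
    for r in rows:
        idx = crc32(str(r[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(r)
    return dict(partitions)


filtered = filter_records(shipments, min_year=2007)
joined = join_importers(filtered, importers)
partitioned = redistribute(joined, key="importer_id", num_partitions=4)
```

Partitioning on the join key keeps all rows for one importer together, which is the usual reason to redistribute before later per-key work. Which key and how many partitions suit the real data set is not stated in the slides.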