All contents © MuleSoft, LLC
Matt McLarty, @mattmclartybc
Sanjna Verma, @_sanjuverm
Data with a Mission
A COVID-19 API Case Study
The digital data explosion
From https://www.zdnet.com/article/by-2025-nearly-30-percent-of-data-generated-will-be-real-time-idc-says/
The API-enabled data opportunity
APIs can help...
● Provide practical access to data
● Embed data inferences into core capabilities
● Weave data-derived insights into user experiences
Value Capture: User data collected from API-powered interactions
Value Creation: Data insights produced using ML model-based analytics
Value Delivery: Captivating user experiences powered by insight-based APIs
From https://blogs.mulesoft.com/biz/api/value-from-data-with-ai-api-business-model/
APIs and data value
API value depends on context
● Why will the data be used?
● How will the data be used?
API usefulness
● Who are the consumers?
● What problems do they want to solve?
API usability
● How do consumers want to access the data?
● How can access methods be optimized for all?
From http://semanticstudios.com/user_experience_design/
Vision: Gather, unify, and deliver trusted COVID-19 data to organizations around the world.
infected?
sick?
hospitalized?
COVID-19 Data Platform
Gather, unify, and deliver trusted COVID-19 data, powered by Salesforce
Highly curated data sources
Standardized data models
Resilient data pipeline
Accessible to all
Free to everyone!
COVID-19 Data Platform: Pipeline
Empowering our ecosystem with reliable data
(Pipeline diagram; labels grouped by stage)
curate: Data Sources, validated by industry experts (two marked "Coming soon!")
ingest & normalize: MuleSoft Anypoint Platform, secure and standardized model
store: Data warehouse
deliver: Tableau Prep, Tableau, Hyper API, MuleSoft COVID Data Tracking API
consumers: Tableau Public, AWS Data Exchange, Data partners, Salesforce, Traction on Demand, SI/ISV Partners, MuleSoft Public Crisis Response Developer Portal
What data will be in the COVID-19 Data Platform?
These are the categories of global data that are critical to making informed decisions:
- Medical Resource Data
- Public Health Data
- Other Public Data
Coming soon!
Choosing the API protocol and architecture
Options: REST, OData, GraphQL
- We knew that we wanted this API to be web accessible
- Mule messages (XML)
- Future: Data has grown from a few KB to a few hundred GB. With over 100 different attributes, something like GraphQL is important to consider
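The point about 100+ attributes can be illustrated with a minimal Python sketch of field selection, the idea that makes something like GraphQL attractive here: the caller names the attributes it wants and receives only those. This is not GraphQL itself and not the platform's implementation; `ROWS`, `select_fields`, and every attribute name and value are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical rows with made-up values; the platform's real attribute
# names are not shown in the slides.
ROWS: List[Dict[str, Any]] = [
    {"region": "A", "cases": 10, "deaths": 1, "hospitalized": 3, "tests": 50},
    {"region": "B", "cases": 20, "deaths": 2, "hospitalized": 6, "tests": 80},
]


def select_fields(
    rows: Iterable[Dict[str, Any]], fields: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return each row reduced to the requested fields, in the requested order."""
    wanted = list(fields)
    selected = []
    for row in rows:
        # Reject unknown attribute names instead of silently dropping them.
        missing = [name for name in wanted if name not in row]
        if missing:
            raise KeyError(f"unknown field(s): {missing}")
        selected.append({name: row[name] for name in wanted})
    return selected
```

With wide records, a consumer that only needs `region` and `cases` would transfer two attributes per row instead of all of them.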
How to consider the data flow
Data in: JSON or CSV
Processing: Mule messages (XML)
Data out: JSON
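As a rough illustration of this flow, the sketch below accepts either JSON or CSV, maps each row to one shape, and emits JSON. It uses plain Python dictionaries for the processing step rather than Mule messages, and the function names and the `region` and `cases` fields are assumptions made for the example, not the platform's actual model.

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_source(payload: str, content_type: str) -> List[Dict[str, Any]]:
    """Data in: parse a JSON or CSV payload into a list of row dictionaries."""
    if content_type == "application/json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if content_type == "text/csv":
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError(f"unsupported content type: {content_type}")


def normalize(row: Dict[str, Any]) -> Dict[str, Any]:
    """Processing: coerce a row to one hypothetical common shape."""
    return {"region": str(row["region"]).strip(), "cases": int(row["cases"])}


def to_json(rows: List[Dict[str, Any]]) -> str:
    """Data out: serialize normalized rows as JSON."""
    return json.dumps([normalize(r) for r in rows], sort_keys=True)
```

Under these assumptions, a CSV source and a JSON source that carry the same values produce identical output, which is the purpose of normalizing before delivery.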
API-led approach: how data enters the pipeline
Experience layer: Data Contributions API
Process layer: Data Synchronization Process API (with scheduler), Data Aggregation Process API
Queues: Inbound queue (partner only), Outbound queue (to Snowflake)
System layer: NYT system API, EU CDC system API, COVID Tracking Project system API, Snowflake system API
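The layering can be sketched in a few lines of Python, assuming one thin adapter per source in the system layer, a process-layer step that pulls from each adapter onto an outbound queue, and a warehouse writer that drains the queue. All function and class names, and the stubbed rows, are hypothetical; the real components are Mule applications, and the scheduler and the partner-only inbound queue are left out.

```python
from queue import Queue
from typing import Callable, Dict, List

Row = Dict[str, object]


# System layer: one thin adapter per source (stubbed with made-up rows).
def nyt_system_api() -> List[Row]:
    return [{"source": "NYT", "region": "A", "cases": 1}]


def eu_cdc_system_api() -> List[Row]:
    return [{"source": "EU CDC", "region": "B", "cases": 2}]


class WarehouseSystemApi:
    """Stand-in for the Snowflake system API; keeps rows in memory."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: List[Row]) -> None:
        self.rows.extend(rows)


# Process layer: pull from every source adapter and enqueue the rows.
def synchronize(sources: List[Callable[[], List[Row]]], outbound: Queue) -> int:
    count = 0
    for fetch in sources:
        for row in fetch():
            outbound.put(row)
            count += 1
    return count


# Process layer: drain the outbound queue into the warehouse in one batch.
def drain(outbound: Queue, warehouse: WarehouseSystemApi) -> None:
    batch: List[Row] = []
    while not outbound.empty():
        batch.append(outbound.get())
    warehouse.write(batch)
```

In this sketch, adding a source means adding one adapter function to the list passed to `synchronize`; the process-layer code does not change.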
Designing the ingestion pipeline
- API-led connectivity for the overall application structure
- Mule application architecture for reusable components
- API design-first approach to start building each individual API
Lifecycle stages (diagram): Operate, Design, Deploy, Dev & Test, Engage
Designing API specifications
Data in
Tools used: API designer, Studio 7, Exchange, GitHub
Custom sys APIs
Reusable fragments
15 API specifications built, all RAML
2 API fragments with reusable libraries for error handling
1 master library with the rules to respect the CDM
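The slides do not show what the master library contains. As a loose illustration only, the Python sketch below treats it as a single shared rule set that every system API validates records against before they move on. It is written in Python rather than RAML, and `CDM_RULES`, `validate_record`, and the field names are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical shared rules: field name -> required Python type.
# The actual CDM fields are not given in the slides.
CDM_RULES: Dict[str, type] = {"region": str, "date": str, "cases": int}


def validate_record(
    record: Dict[str, Any], rules: Dict[str, type] = CDM_RULES
) -> List[str]:
    """Return a list of rule violations; an empty list means the record conforms."""
    errors: List[str] = []
    for field, expected in rules.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors
```

Keeping the rules in one place means a change to the model is made once and picked up by every API that imports it, which is the kind of reuse a single master library suggests.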
Implementing the ingestion pipeline
- Time to implementation: 3.5 weeks
- 65% of time spent on nailing DataWeave scripts
- 35% of time spent on testing applications and validating the flow of data through the pipeline
- Hard to quantify “reuse”
Developing the core API implementations
Tools used: Mule, Studio 7, DataWeave Playground, MUnit
DataSense loading
DataWeave scripting
Testing the core implementations
Unit testing
Acceptance testing
Tools used: Mule, Studio 7, DataWeave Playground, MUnit
Deploying the ingestion pipeline
- Deploying our APIs did not mean “going live” with the platform
- Deploying is more clicks than code, but it’s a lot of context-switching
- We rolled back twice
Comparing the outputs
API output (raw) and the visualization output
Tools used: Runtime Manager, Postman, Tableau
Sharing the APIs so users could use them
- Behavioral change: we published a live, implemented API endpoint to Anypoint Exchange
- Going live therefore meant working out how to “engage”, that is, publicize our APIs and ensure they were usable
- Many challenges in going live, including understanding how a user could actually use the API
Expectations of API usability
Request access for “open” API
Live implemented endpoint
Results since the go-live
- Survived a minor DoS attack: having a CDN gave us additional protection
- Averaged ~600K unique API requests
- Averaged ~30K views of the Tableau visualizations
API-led in action
COVID-19 Data Platform: Pipeline started
(Pipeline diagram, highlighting the deliver stage: Tableau Prep, Tableau, MuleSoft COVID Data Tracking API, Hyper API)
COVID-19 Data Platform: Pipeline today
(Pipeline diagram with the same stage and consumer labels as the earlier pipeline slide)
API-led approach: how we started
Experience layer: Data Contributions API
Process layer: Data Synchronization Process API (with scheduler), Data Aggregation Process API
Queues: Inbound queue (partner only), Outbound queue (to Snowflake)
System layer: NYT system API, EU CDC system API, COVID Tracking Project system API, Snowflake system API
API-led approach: where we are today
Experience layer: Data Contributions API
Process layer: Data Synchronization Process API (with scheduler), Data Aggregation Process API
Queues: Inbound queue (partner only), Outbound queue (to Snowflake)
System layer: NYT SYS API, EU CDC SYS API, COVID Tracking Project SYS API, Snowflake SYS API, KFF SYS API, Washington SYS API, Texas SYS API, MIT SYS API, Oxford SYS API, System Checker SYS API
Lessons learned
- Contextualized data is in high demand: make proprietary business and external data readily accessible and understandable
- Data needs to be consistent and available: data needs to be available in different ecosystems and places at once
- Developer readability needs to be maximized: ensure data can be human AND machine readable
Theme-specific data is curated for global analysis and visualization
AWS Data Exchange, Work.com Command Center, MuleSoft Exchange, Traction on Demand, Tableau Data Hub, Salesforce Core
40K users
Global reach
For more information...
Click here to book a workshop to explore your Data+API Strategy
Click here to access the COVID Data Platform APIs
Click here to learn about MuleSoft’s data integration solutions
THANK YOU!
@_sanjuverm
@mattmclartybc

apidays LIVE Paris - Data with a mission: a COVID-19 API case study by Matt McLarty & Sanjna Verma
