How Data Virtualization Enables Data Scientists at Prologis
Ryan Thompson
Director, Data Virtualization and Architecture, Prologis
About Prologis
$1.5 TRILLION
is the economic value of goods flowing through our distribution centers each year, representing:
2.8% of GDP for the 19 countries where we do business
2.0% of the world’s GDP
1983: Founded
Global 100 Most Sustainable Corporations
768 MSF (million square feet)
$87B assets under management on four continents
1.0 MILLION employees under Prologis’ roofs
Business Need
Cost Optimization
-Identify improvements in spend on Capital Deployments
Easy integration for Data Science Team
Integrate with 3rd-party data capture platform
Technical Challenge
Cloud First Philosophy
Minimize ETL
Flexible Data Consumption
Scalable Infrastructure
Solution
Architecture
Solution
Processing Capabilities
Virtualized – On-demand model scoring
Partial Cache – Historical scores backed by the database & real-time scoring of new data
Full Cache (Full Refresh) – Large datasets benefit from batched scoring
Full Cache (Incremental Refresh) – Benefits of a full cache with optimized refresh processing
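The four processing options can be illustrated with a small, framework-free Python sketch. It is not Denodo's caching engine or API: the class `ScoringView`, the `updated_at` change counter, the `sqft` feature used below, and the in-memory dict standing in for the cache database are all hypothetical. What it is meant to show is where the scoring work happens in each mode: at query time (virtualized), only for rows missing from the cache (partial cache), for every row in a batch job (full refresh), or only for rows changed since the last batch (incremental refresh).

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # hypothetical row shape: {"id": ..., "updated_at": ..., features...}


class ScoringView:
    """Hypothetical sketch of four ways to serve model scores through a data layer.

    A plain dict stands in for the cache database table; `source` returns the
    current input rows from wherever they live.
    """

    def __init__(self, source: Callable[[], List[Row]], model: Callable[[Row], float]) -> None:
        self.source = source
        self.model = model
        self.cache: Dict[Any, float] = {}       # id -> stored score (stands in for the DB)
        self.watermark: float = float("-inf")  # highest updated_at already scored
        self.model_calls = 0                   # counts how much scoring work was done

    def _score(self, row: Row) -> float:
        self.model_calls += 1
        return self.model(row)

    def virtualized(self) -> Dict[Any, float]:
        # On-demand scoring: every query scores every row; nothing is stored.
        return {r["id"]: self._score(r) for r in self.source()}

    def partial_cache(self) -> Dict[Any, float]:
        # Stored (historical) scores for known rows; rows missing from the cache
        # are scored live and, in this sketch, not written back.
        return {
            r["id"]: self.cache[r["id"]] if r["id"] in self.cache else self._score(r)
            for r in self.source()
        }

    def full_refresh(self) -> None:
        # Batch job: re-score every row and replace the whole cache.
        rows = self.source()
        self.cache = {r["id"]: self._score(r) for r in rows}
        self.watermark = max([self.watermark] + [r["updated_at"] for r in rows])

    def incremental_refresh(self) -> None:
        # Batch job: re-score only rows changed since the last refresh.
        # Deleted source rows are not handled in this sketch.
        rows = self.source()
        for r in rows:
            if r["updated_at"] > self.watermark:
                self.cache[r["id"]] = self._score(r)
        self.watermark = max([self.watermark] + [r["updated_at"] for r in rows])

    def query_full_cache(self) -> Dict[Any, float]:
        # Both full-cache modes answer queries from stored scores only.
        return dict(self.cache)
```

The `model_calls` counter makes the trade-off visible: the virtualized mode pays the scoring cost on every query, the full-cache modes pay it in batch jobs, and the incremental refresh only re-scores rows whose `updated_at` moved past the last watermark.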
Benefits
Data Scientists
-Enable access to data in one environment
-Ability to code in languages they already know
-Speed up integration into the logical data lake
Capital Deployment
-Validate assumptions with data
-Provide quicker access to insights
-Access/Distribute findings easily
Future Roadmap
Recommendations
Design reusable patterns for Data Scientists
-A template for input/output in each language you want to integrate speeds up future integrations
Utilize Standard Integration Patterns
-API endpoints
-JSON payloads
Utilize the benefits of virtualization
-Use a microservice development pattern
-Swap in new versions without impact
-Input data is not limited to a single source type (DB, API, flat file, etc.)
Increase performance with cache
-Full Cache (Incremental Refresh) enables real-time scoring plus the benefit of being backed by a database
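Several of these recommendations (a reusable input/output template, JSON payloads behind an API endpoint, and swapping model versions without impact on callers) can be sketched together in Python. This is an illustration under stated assumptions, not the Prologis implementation: `ModelRegistry`, `handle_score_request`, the payload fields `model`, `version`, `records` and `scores`, and the example model name `capex` are hypothetical. The handler is framework-neutral, so it could sit behind whatever HTTP endpoint the virtualization layer calls, and because it receives plain records it does not care whether they came from a database, an API, or a flat file.

```python
import json
from typing import Callable, Dict, Optional, Tuple

Model = Callable[[dict], float]  # hypothetical: a model scores one record


class ModelRegistry:
    """Hypothetical registry: model name -> versions, plus the version served by default."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[str, Model]] = {}
        self._live: Dict[str, str] = {}

    def register(self, name: str, version: str, fn: Model, make_live: bool = False) -> None:
        self._versions.setdefault(name, {})[version] = fn
        # The first registered version becomes live unless another is promoted.
        if make_live or name not in self._live:
            self._live[name] = version

    def promote(self, name: str, version: str) -> None:
        # Swap the served version; callers keep the same endpoint and payload shape.
        if version not in self._versions.get(name, {}):
            raise KeyError(f"unknown version {version!r} for model {name!r}")
        self._live[name] = version

    def resolve(self, name: str, version: Optional[str] = None) -> Tuple[str, Model]:
        chosen = version or self._live[name]
        return chosen, self._versions[name][chosen]


def handle_score_request(body: str, registry: ModelRegistry) -> str:
    """Hypothetical I/O template: JSON request in, JSON response out.

    Request:  {"model": "...", "version": optional, "records": [{...}, ...]}
    Response: {"model": "...", "version": "...", "scores": [...]}
    """
    request = json.loads(body)
    version, fn = registry.resolve(request["model"], request.get("version"))
    scores = [fn(record) for record in request["records"]]
    return json.dumps({"model": request["model"], "version": version, "scores": scores})
```

Promoting a new version changes what a default request returns without changing the endpoint or the payload shape, while a caller that pins `"version"` in its request keeps receiving results from the older model.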
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.