Copyright © 2015, SAS Institute Inc. All rights reserved.
Data Regions:
Modernizing Your Company’s Data Ecosystem
Evan Levy
Vice President, Data Management Programs
SAS
@EvanJayLevy
A 20 Year Old Paradigm
The Change Data Perspective
Traditional Assumption
• All data originates from internal systems
• The company runs on OLTP systems
• Users have the BI/DW to address their reporting and analysis needs
• The Data Warehouse is the data source
Today’s Reality
• Most data is internal; >35% is external
• Business Operations rely on OLTP, Data, and Analytics
• We have multiple analytical systems: data mining, exploration, sandboxes, etc.
• Users require data from many sources (and the quantity is growing)
Data Challenges…
“Why is all the data put into the
warehouse? Only 3 people need
to use the data”
“Can you tell me what data we
purchased from outside
vendors?”
“Why will it take you 30 days to
load data? I can cut and paste it
into my server in 4 minutes.”
“We have to standardize business
terminology. We’ve learned that
data governance is critical.”
“Why do I have to work around
the ‘infrastructure’? Shouldn’t it
be built for my needs?”
“You send me a file from SalesForce
every month, and the layout changes
every month. And you don’t tell me.”
“We have data all over (systems,
the cloud, external apps, etc.).
Why don’t we have a catalog of
the sources?”
“Finance wants all data reconciled.
I can’t wait. Why do I have to suffer
from their requirements?”
Data Characteristics
Data
Access
Domain
Structure
Audience
Integrity
Data Characteristics
Audience
The individual user (and their skills and data needs)
• Report Users: Reviewing data about known situations
• BI Developers: Building reports using structured data
• Business Analysts: Analyzing data to form a new hypothesis
• DW Developers: Use ETL tools to retrieve and load data
• Analytic Developers: Build analytical models to manipulate known data
• Data Scientists: Analyze any available data to identify new trends
• Application Developers: Develop code to navigate any available data source
Data Characteristics
Access
The methods, interfaces, and tools used to access the data
• A business analyst running a report on DBMS tables
• Custom code navigating a flat file (to retrieve specific values)
• Code calling platform-specific APIs for data access
• A cloud application sending transactions
• An application listening for / receiving event streams
• A data scientist playing with data in a sandbox
• SQL
Data Characteristics
Structure
The structure and organization of the data content
• Structured Data
• Semi-Structured Data
• Unstructured Data
Data Characteristics
Domain
The business context for data usage
• Enterprise
• Business Unit
• Organization
• Project
• Individual
Data Characteristics
Integrity
The format, typing, and accuracy of the data
Raw log entry:
203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] "GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 "http://datawarehouse.oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"
Formatted record:
Client        John Smith
Username      Oracleuser
Request Date  9/28/2000
Request Time  23:59:07
Status Code   OK
Browser       Netscape
Raw spreadsheet file:
P;ECalibri;M220;SB;L10
P;ECalibri;M220;L11
P;ECalibri;M220;SI;L24
P;ECalibri;M220;SB;L9 P;ECalibri;M220;L10
P;ESegoe UI;M200;L9 P;ESegoe
UI;M200;SB;L9 P;ECalibri;M180;L9
F;P0;DG0G8;M300 B;Y12;X5;D0 0 11 4
O;L;D;V0;K47;G100 0.001 F;M495;R1
F;SM24;Y1;X1 C;K"name" F;SM24;X2
C;K"Shares" F;SM24;X3 C;K"Quote/ Price"
F;SM24;X4 C;K"cost/ share" F;SM24;X5
C;K"total cost" F;SM24;Y2;X1 C;K"aapl"
F;P4;FF2G;SM24;X2 C;K1454.4024 F;SM24;X3
C;K126.85 F;SM24;X4 C;K79.006952
F;P4;FF2G;SM24;X5 C;K114907.9
F;SM24;Y3;X1 C;K"axp" F;P4;FF2G;SM24;X2
C;K1454.4108 F;SM24;X3 C;K79.27 F;SM24;X4
…
Formatted table:
name    Shares     Quote/Price  cost/share  total cost
aapl    1,454.40   126.85       79.006952   114,907.90
axp     1,454.41   79.27        84.671889   123,147.71
bmy     3,666.51   63.95        43.25259    158,586.21
brk.b   1,000      143.46       119.3527    119,352.70
celg    1,000      116.44       102.47094   102,470.94
chl     500        71.4         71.4179     35,708.95
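
The log example on this slide can be made concrete with a short sketch. The entry follows the common web server "combined" log layout, so a regular expression can pull out typed fields. The field names in `record` are assumptions chosen for illustration, and the client name and browser label shown on the slide would need extra lookups that are not attempted here.

```python
import re
from datetime import datetime

# Combined log layout: host, identity, user, [time], "request",
# status, bytes, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

raw = ('203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] '
       '"GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 '
       '"http://datawarehouse.oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"')

match = LOG_PATTERN.match(raw)
if match is None:
    raise ValueError("line does not match the expected log layout")

# Turn the text timestamp into a typed value before splitting date and time.
when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")

# Hypothetical field names for the typed record.
record = {
    "username": match["user"],
    "request_date": when.date().isoformat(),
    "request_time": when.time().isoformat(),
    "status_code": int(match["status"]),
    "bytes_sent": int(match["size"]),
}
print(record)
```

The point of the sketch is the one the slide makes: the raw text and the typed record hold the same facts, but only the second has a known format and typing.
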
The 5 Characteristics of Data
Data
Access
Domain
Structure
Audience
Integrity
Challenging the Existing Data Paradigm
• Support numerous new data sources
• Establish a shared source staging area
• Allow “trial & error” analysis for all users
• Support Self Service Data (ETL, report, analysis, etc.)
• Support different levels of data acceptance
Data Regions
[Diagram of the data regions]
Inbound Data: Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages
Regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling
Data Regions
Addressing an Enterprise Data Need
• Create an environment that fits user needs (not IT convenience)
• Support data onboarding and distribution as a production need
• Support a diverse set of data usage needs
• Address the complexities of data movement
• Reduce resource/skill overlap across the company
Data Regions
Source Onboarding
Audience: Source Onboarding developers only; receiving for the Source Data Repository
Access: Supports multiple delivery methods: transactions, messages, bulk formats
Structure: Data layout based on source system; likely dynamic and volatile
Domain: N/A. This detail is implicit with the data source and the supplier.
Integrity: N/A. Data details are defined by the data supplier.
• Manages the delivery of data from internal & external sources
• Holds data until acceptance is complete; data is then moved to the Source Data Repository
• Centralized support for sophisticated data capture methods (ESP, 3rd party data delivery, API/messaging, etc.)
• Productionalizes source data capture, identification, and sharing
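
The "hold until accepted, then move" behavior of Source Onboarding can be sketched in a few lines. This is a minimal illustration under assumed names: the `SourceOnboarding` class, its methods, and the delivery identifier are all hypothetical, and a plain dictionary stands in for the Source Data Repository.

```python
from dataclasses import dataclass, field


@dataclass
class SourceOnboarding:
    """Hypothetical holding area: deliveries wait here until accepted."""
    pending: dict = field(default_factory=dict)     # delivery_id -> payload
    repository: dict = field(default_factory=dict)  # stands in for the Source Data Repository

    def receive(self, delivery_id: str, payload: bytes) -> None:
        # Any delivery method (file drop, message, API call) ends up here.
        self.pending[delivery_id] = payload

    def accept(self, delivery_id: str) -> None:
        # Acceptance is complete: move the data out of the holding area.
        self.repository[delivery_id] = self.pending.pop(delivery_id)


area = SourceOnboarding()
area.receive("vendor-2016-05", b"raw bytes as delivered")
in_repo_before = "vendor-2016-05" in area.repository  # still held in onboarding
area.accept("vendor-2016-05")
in_repo_after = "vendor-2016-05" in area.repository   # now in the repository
```

Keeping the pending and accepted stores separate means downstream users never see a delivery that has not been accepted.
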
Data Regions
Source Data Repository
• Stores and retains all source data content; reduces enterprise storage requirements
• Establishes a centralized registry of available data sources
• Reflects a defined data layout (independent of source changes)
• Alleviates developers’ need to learn data navigation, layout, and naming conventions on dozens of source systems
Audience: Data Integration (Developers – DW, Application, Data Scientists, etc.)
Access: Usually file oriented (transaction and other access based on situation)
Structure: Company-centric, documented layout; includes structured and unstructured data
Domain: N/A. Data reflects source.
Integrity: Company-centric format; data quality and accuracy not addressed.
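
One way to picture "a defined data layout (independent of source changes)" is a per-delivery column mapping onto a fixed, documented layout. The sketch below is illustrative only: the column names, both mappings, and the `to_company_layout` helper are assumptions, not anything named in the presentation. Values are passed through untouched, in line with data quality not being addressed in this region.

```python
# Hypothetical company-centric layout for one registered source.
COMPANY_LAYOUT = ["account_id", "account_name", "amount"]

# Hypothetical per-delivery mappings: source column name -> company column name.
MAY_MAPPING = {"Id": "account_id", "Name": "account_name", "Amount": "amount"}
JUNE_MAPPING = {"AccountId": "account_id", "AcctName": "account_name", "Amt": "amount"}


def to_company_layout(record: dict, mapping: dict) -> dict:
    """Rename source columns to the documented layout; fail loudly on gaps."""
    renamed = {mapping[k]: v for k, v in record.items() if k in mapping}
    missing = [col for col in COMPANY_LAYOUT if col not in renamed]
    if missing:
        raise ValueError(f"source record is missing columns: {missing}")
    # Emit columns in the documented order; values are not cleansed or retyped.
    return {col: renamed[col] for col in COMPANY_LAYOUT}


may = to_company_layout({"Id": "001", "Name": "Acme", "Amount": "125.00"}, MAY_MAPPING)
june = to_company_layout({"AccountId": "001", "AcctName": "Acme", "Amt": "125.00"}, JUNE_MAPPING)
print(may == june)
```

When the supplier changes its layout, only the mapping changes; consumers of the repository keep reading the same column names.
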
Data Regions
Data Exploration
• Supports one-off, in-depth business analysis using any data
─ Environment is permanent but resource usage is very transient
─ Does not support production application access or deployment
• Often a general purpose platform that can support numerous technologies (Big Data, files, RDBMS, advanced analytics, etc.)
• A walled-off, protected, data scientist-centric environment
Audience: Data Scientists & Analytics Developers (whose needs cannot be supported by the Sandbox)
Access: All access methods due to the “from scratch” nature of the environment
Structure: All data layouts (unstructured likely due to focus on new concept development)
Domain: Typically enterprise or line of business level
Integrity: Data transformed/standardized to streamline exploration efforts (often ignored for new or unknown data sources)
Data Regions
Enterprise View
• Contains multiple integrated subject areas (w/ long-term history)
• Content reflects enterprise trusted (and corrected) data
• Includes metadata (terms, definitions, lineage, etc.)
• Supports query processing and data provisioning
─ Online end-user queries and reporting
─ Data provisioning to analytical and transactional systems
─ Content continually updated (where possible)
Audience: All users. Most access will occur via query tools or data manipulation/ETL tools
Access: Usually query-based access (w/ existing tools). Unstructured data requires APIs
Structure: Data is usually structured (unstructured data requires special tools/extensions)
Domain: Enterprise level. Other domains may use content for provisioning purposes
Integrity: Reflective of enterprise terminology and value standards
Data Regions
Sandbox
• Allows users to extend their analysis with custom data
─ Supports structured data and queries using existing tools/technologies
─ Focused on supporting additional (external) data
• Environment is temporary; does not support production
─ Walled-off environment; reports or data not distributable
• Allows for business-level data discovery and exploration
─ Supports one-off user data needs
Audience: Advanced business users. Requires DBMS query and data integration skills
Access: Data is accessible via SQL/table environment
Structure: Data content is structured and RDBMS oriented (goal is data variety)
Domain: Any/all domains (enterprise to individual)
Integrity: Enterprise data is standardized/corrected. Other data must be addressed by the user
Data Regions
Reporting and Business Intelligence
• Supports defined reporting and ad hoc analysis (departmental data marts)
• Supports an application- or tool-centric view of data
─ Simplifies tool access and data manipulation, or
─ Reflects unique business (organization) view of data details
• Requires additional technical staff resources
─ ETL processing for additional sources, aggregates, hierarchies, etc.
─ Query and usage support for non-enterprise data
Audience: Business users focused on using standard reports and content
Access: Usually SQL-based access. Some data may be tool-centric (e.g. OLAP cubes)
Structure: Usually structured data reflecting rows and columns
Domain: Likely to use enterprise data. Additional data may reflect a different structure or domain as needed.
Integrity: Enterprise data is standardized/corrected. Other data must be addressed by the user
Data Regions
Advanced Analytics & Modeling
• A processing environment that can support advanced analytics
─ Typically general purpose processing platforms with inexpensive directly attached storage
─ Data is structured and often stored in highly denormalized structures
─ Usually driven by a specialized tool or language
• Typically small, high-value user audience
• Production-supported environment. Data & results are distributed
Audience: Highly skilled technical staff (data scientists, developers with advanced analysis skills)
Access: Data accessed via specialized tools using standard and custom access methods
Structure: Data is usually structured; may process unstructured data into structured content
Domain: Typically enterprise-level data. Business drivers are often specific to the organization
Integrity: Data is often cleansed and standardized
Data Services
[Diagram: data services and the data regions]
Regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling
Services: Data Transformation, Data Quality, Data Governance, Metadata
Getting Started, Moving Forward…
• Evaluate the diversity of audiences and domains
− Understand the unique combinations – those dictate the complexity of your environment
− Review the external data that is already in use
• Extend your environment one region at a time
− Focus on adding (or remediating) regions based on business need
• Sharing data is not a courtesy – it’s a production need
− Data provisioning and integration is a costly activity; it should be addressed with “economies-of-scale” methods
− Establishing repositories (with card catalogs) to provide “raw” and “approved” data is a necessity
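
The first step above, finding the unique audience and domain combinations, can be sketched as a simple tally. The inventory below is invented for illustration; in practice it would come from a survey of who uses which data, and the labels are borrowed from the Audience and Domain slides.

```python
from collections import Counter

# Hypothetical inventory: one (audience, domain) pair per known data use.
usage_inventory = [
    ("Report Users", "Enterprise"),
    ("Report Users", "Business Unit"),
    ("Data Scientists", "Enterprise"),
    ("Business Analysts", "Project"),
    ("Report Users", "Enterprise"),
]

# Tally each distinct pairing; the number of distinct pairings is the
# rough complexity signal the slide describes.
combinations = Counter(usage_inventory)
unique_count = len(combinations)

for (audience, domain), uses in sorted(combinations.items()):
    print(f"{audience:18} {domain:14} {uses} use(s)")
print(f"{unique_count} unique audience/domain combinations")
```

A tally like this also shows which combinations are most common, which can help decide which region to add or remediate first.
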
THANKS!
www.EvanJLevy.com
@EvanJayLevy
Evan.Levy@SAS.com

More Related Content

What's hot

Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidDataWorks Summit
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceDataWorks Summit/Hadoop Summit
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...DataWorks Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcDataWorks Summit
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudDataWorks Summit
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 

What's hot (20)

Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
Red Hat in Financial Services - Presentation at Hortonworks Booth - Strata 2014
 
Interactive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using DruidInteractive Analytics at Scale in Apache Hive Using Druid
Interactive Analytics at Scale in Apache Hive Using Druid
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...Achieving a 360-degree view of manufacturing via open source industrial data ...
Achieving a 360-degree view of manufacturing via open source industrial data ...
 
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Ingesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache OrcIngesting Data at Blazing Speed Using Apache Orc
Ingesting Data at Blazing Speed Using Apache Orc
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloud
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 

Similar to Data Regions: Modernizing your company's data ecosystem

SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunk
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunk
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Cambridge Semantics
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingDataWorks Summit
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRyan Andhavarapu
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSAWS User Group Kochi
 
Hadoop and Manufacturing
Hadoop and ManufacturingHadoop and Manufacturing
Hadoop and ManufacturingCloudera, Inc.
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big DataFrank Kienle
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsDenodo
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesCindy Irby
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about DataBigDataExpo
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowaleCapgemini
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...DataWorks Summit
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 

Similar to Data Regions: Modernizing your company's data ecosystem (20)

SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturing
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data Warehouse
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
 
Hadoop and Manufacturing
Hadoop and ManufacturingHadoop and Manufacturing
Hadoop and Manufacturing
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural Components
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_services
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowale
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

Data Regions: Modernizing your company's data ecosystem

  • 1. Data Regions: Modernizing Your Company’s Data Ecosystem
      Evan Levy, Vice President, Data Management Programs, SAS
      @EvanJayLevy
      Copyright © 2015, SAS Institute Inc. All rights reserved.
  • 2. A 20 Year Old Paradigm: The Change Data Perspective
      Traditional Assumption
      ─ All data originates from internal systems
      ─ The company runs on OLTP systems
      ─ Users have the BI/DW to address their reporting and analysis needs
      ─ The Data Warehouse is the data source
      Today’s Reality
      ─ Most data is internal; >35% is external
      ─ Business Operations rely on OLTP, Data, and Analytics
      ─ We have multiple analytical systems: data mining, exploration, sandboxes, etc.
      ─ Users require data from many sources (and the quantity is growing)
      Copyright © 2016, SAS Institute Inc. All rights reserved.
  • 3. Data Challenges…
      ─ “Why is all the data put into the warehouse? Only 3 people need to use the data.”
      ─ “Can you tell me what data we purchased from outside vendors?”
      ─ “Why will it take you 30 days to load data? I can cut and paste it into my server in 4 minutes.”
      ─ “We have to standardize business terminology. We’ve learned that data governance is critical.”
      ─ “Why do I have to work around the ‘infrastructure’? Shouldn’t it be built for my needs?”
      ─ “You send me a file from SalesForce every month, and the layout changes every month. And you don’t tell me.”
      ─ “We have data all over (systems, the cloud, external apps, etc.). Why don’t we have a catalog of the sources?”
      ─ “Finance wants all data reconciled. I can’t wait. Why do I have to suffer from their requirements?”
  • 4. Data Characteristics: Access, Domain, Structure, Audience, Integrity
  • 5. Data Characteristics: Audience
      The individual user (and their skills and data needs)
      ─ Report users: Reviewing data about a known situation
      ─ Business Analysts: Analyzing data to form a new hypothesis
      ─ BI Developers: Building reports using structured data
      ─ DW Developers: Using ETL tools to retrieve and load data
      ─ Analytic Developers: Building analytical models to manipulate known data
      ─ Data Scientists: Analyzing any available data to identify new trends
      ─ Application Developers: Developing code to navigate any available data source
  • 6. Data Characteristics: Access
      The methods, interfaces, and tools used to access the data
      ─ A business analyst running a report on DBMS tables
      ─ Custom code navigating a flat file (to retrieve specific values)
      ─ Code calling platform-specific APIs for data access
      ─ A cloud application sending transactions
      ─ SQL
      ─ An application listening for / receiving event streams
      ─ A data scientist playing with data in a sandbox
  • 7. Data Characteristics: Structure
      The structure and organization of the data content
      ─ Structured Data
      ─ Semi Structured Data
      ─ Unstructured Data
  • 8. Data Characteristics: Domain
      The business context for data usage
      ─ Enterprise
      ─ Business Unit
      ─ Organization
      ─ Project
      ─ Individual
  • 9. Data Characteristics: Integrity
      The format, typing, and accuracy of the data
      Example 1, a raw web server log line:
      203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] "GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 "http://datawarehouse.oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"
      The same request as typed fields:
      Client: John Smith
      Username: Oracleuser
      RequestDate: 9/28/2000
      Request Time: 23:59:07
      Status Code: OK
      Browser: Netscape
      Example 2, a raw spreadsheet export (excerpt):
      F;SM24;Y2;X1 C;K"aapl" F;P4;FF2G;SM24;X2 C;K1454.4024 F;SM24;X3 C;K126.85 F;SM24;X4 C;K79.006952 F;P4;FF2G;SM24;X5 C;K114907.9 …
      The same content as a formatted table:
      name    Shares     Quote/Price   cost/share   total cost
      aapl    1,454.40   126.85        79.006952    114,907.90
      axp     1,454.41   79.27         84.671889    123,147.71
      bmy     3,666.51   63.95         43.25259     158,586.21
      brk.b   1,000      143.46        119.3527     119,352.70
      celg    1,000      116.44        102.47094    102,470.94
      chl     500        71.4          71.4179      35,708.95
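The Integrity slide contrasts a raw web server log line with the typed fields derived from it. The following is a minimal sketch of that step, assuming the line follows the widely used "combined" access-log layout. The function name `parse_log_line` and the output keys are illustrative and do not come from the slides. The slide's "Client: John Smith" value cannot be derived from the log line itself, so it is left out here.

```python
import re
from datetime import datetime

# Pattern for a "combined"-style access log line (an assumption about the layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINE = (
    '203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] '
    '"GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 '
    '"http://datawarehouse.oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"'
)


def parse_log_line(line):
    """Turn one raw log line into a dictionary of typed fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the expected log layout")
    fields = match.groupdict()
    # The timestamp becomes a real datetime so date and time can be split reliably.
    stamp = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "username": fields["user"],
        "request_date": stamp.date().isoformat(),
        "request_time": stamp.time().isoformat(),
        "status_code": int(fields["status"]),
        "user_agent": fields["agent"],
    }


record = parse_log_line(SAMPLE_LINE)
```

Typing the status code as an integer and the timestamp as a date and time is what moves the record from "format" toward "typing" in the slide's definition of integrity. Accuracy would still need separate checks.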
  • 10. The 5 Characteristics of Data: Access, Domain, Structure, Audience, Integrity
  • 11. Challenging the Existing Data Paradigm
      ─ Support numerous new data sources
      ─ Establish a shared source staging area
      ─ Allow “trial & error” analysis for all users
      ─ Support Self Service Data (ETL, report, analysis, etc.)
      ─ Support different levels of data acceptance
  • 12. Data Regions
      Inbound Data: Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages
      Regions: Source Onboarding, Source Data Repository, Enterprise View, Reporting & BI, Sandbox, Data Exploration, Advanced Analytics & Modeling
  • 13. Data Regions: Addressing an Enterprise Data Need
      ─ Create an environment that fits user needs (not IT convenience)
      ─ Support data onboarding and distribution as a production need
      ─ Support a diverse set of data usage needs
      ─ Address the complexities of data movement
      ─ Reduce resource/skill overlap across the company
  • 14. Data Regions: Source Onboarding
      • Manages the delivery of data from internal & external sources
      • Holds data until acceptance is complete; data is then moved to the Source Data Repository
      • Centralized support for sophisticated data capture methods (ESP, 3rd party data delivery, API/messaging, etc.)
      • Productionalizes source data capture, identification and sharing
      Audience: Source Onboarding developers only; receiving for Source Data Repository
      Access: Supports multiple delivery methods: transactions, messages, bulk formats
      Structure: Data layout based on source system. Likely dynamic & volatile
      Domain: N/A. This detail is implicit with the data source and the supplier
      Integrity: N/A. Data details are defined by the data supplier
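The "holds data until acceptance is complete" step can be sketched as a small gate between a staging area and the repository. This is only an illustration under stated assumptions: the slides do not define what acceptance means, so the check here (the first line of a delivered file must equal an agreed header) is hypothetical, as are the function name `accept_and_promote` and its parameters.

```python
import shutil
from pathlib import Path


def accept_and_promote(staged_file, repository_dir, expected_header):
    """Move a staged delivery into the repository only if it passes acceptance.

    Acceptance rule (hypothetical): the file's first line must equal the
    agreed header. A rejected file stays in staging for follow-up.
    """
    staged_file = Path(staged_file)
    repository_dir = Path(repository_dir)

    with staged_file.open(encoding="utf-8") as handle:
        header = handle.readline().rstrip("\n")

    if header != expected_header:
        raise ValueError(
            f"{staged_file.name}: layout changed, got header {header!r}"
        )

    # Accepted: create the repository folder if needed and move the file.
    repository_dir.mkdir(parents=True, exist_ok=True)
    target = repository_dir / staged_file.name
    shutil.move(str(staged_file), str(target))
    return target
```

A gate like this would surface the "layout changes every month and you don't tell me" complaint at delivery time rather than in a downstream report.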
  • 15. Data Regions: Source Data Repository
      • Stores and retains all source data content; reduces enterprise storage requirements
      • Establishes centralized registry of available data sources
      • Reflects a defined data layout (independent of source changes)
      • Alleviates developers’ need to learn data navigation, layout, naming conventions on dozens of source systems
      Audience: Data Integration (Developers – DW, Application, Data Scientists, etc.)
      Access: Usually file oriented (transaction and other access based on situation)
      Structure: Company-centric, documented layout; incl. structured & unstructured
      Domain: N/A. Data reflects source
      Integrity: Company-centric format; data quality and accuracy not addressed
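A "centralized registry of available data sources" can be pictured as a small catalog keyed by source name. The sketch below is an assumption-laden illustration, not a description of any SAS product: the class names, the `origin` labels ("internal" / "external"), and the sample entries are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    name: str        # registry key for the source
    origin: str      # "internal" or "external" (hypothetical labels)
    supplier: str    # owning team or outside vendor
    layout: tuple    # documented column names, independent of source changes


class SourceCatalog:
    """In-memory registry of data sources (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        if entry.name in self._entries:
            raise ValueError(f"source already registered: {entry.name}")
        self._entries[entry.name] = entry

    def by_origin(self, origin):
        """Return the names of all sources with the given origin, sorted."""
        return sorted(
            entry.name for entry in self._entries.values()
            if entry.origin == origin
        )


catalog = SourceCatalog()
catalog.register(
    SourceEntry("crm_accounts", "internal", "Sales Ops", ("account_id", "name"))
)
catalog.register(
    SourceEntry("credit_scores", "external", "Example Vendor", ("account_id", "score"))
)
external_sources = catalog.by_origin("external")
```

With entries like these, the earlier question "what data did we purchase from outside vendors?" becomes a single lookup on `origin`.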
  • 16. Data Regions: Data Exploration
      • Supports one-off, in-depth business analysis using any data
        ─ Environment is permanent but resource usage is very transient
        ─ Does not support production application access or deployment
      • Often a general purpose platform that can support numerous technologies (Big Data, files, RDBMS, advanced analytics, etc.)
      • A walled-off, protected data scientist-centric environment
      Audience: Data Scientists & Analytics Developers (whose needs cannot be supported by the Sandbox)
      Access: All access methods, due to the “from scratch” nature of the environment
      Structure: All data layouts (unstructured is likely, due to the focus on new concept development)
      Domain: Typically enterprise or line of business level
      Integrity: Data transformed/standardized to streamline exploration efforts (often ignored for new or unknown data sources)
  • 17. Data Regions: Enterprise View
      • Contains multiple integrated subject areas (w/ long-term history)
      • Content reflects enterprise trusted (and corrected) data
      • Includes metadata (terms, definitions, lineage, etc.)
      • Supports query processing and data provisioning
        ─ Online end-user queries and reporting
        ─ Data provisioning to analytical and transactional systems
        ─ Content continually updated (where possible)
      Audience: All users. Most access will occur via query tools or data manipulation/ETL tools
      Access: Usually query-based access (w/ existing tools). Unstructured requires APIs
      Structure: Data is usually structured (unstructured requires special tools/extensions)
      Domain: Enterprise level. Other domains may use content for provisioning purposes
      Integrity: Reflective of enterprise terminology and value standards
  • 18. Data Regions: Sandbox
      • Allows users to extend their analysis with custom data
        ─ Supports structured data and queries using existing tools/technologies
        ─ Focused on supporting additional (external) data
      • Environment is temporary; does not support production
        ─ Walled-off environment; reports or data not distributable
      • Allows for business-level data discovery and exploration
        ─ Supports one-off user data needs
      Audience: Advanced business users. Requires DBMS query and data integration skills
      Access: Data is accessible via SQL/table environment
      Structure: Data content is structured and RDBMS oriented (goal is data variety)
      Domain: Any/All domains (enterprise to individual)
      Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user
  • 19. Data Regions: Reporting and Business Intelligence
      • Supports defined reporting and ad hoc analysis (departmental data marts)
      • Supports an application- or tool-centric view of data
        ─ Simplifies tool access and data manipulation, or
        ─ Reflects unique business (organization) view of data details
      • Requires additional technical staff resources
        ─ ETL processing for additional sources, aggregates, hierarchies, etc.
        ─ Query and usage support for non-enterprise data
      Audience: Business users focused on using standard reports and content
      Access: Usually SQL-based access. Some data may be tool-centric (e.g. OLAP cubes)
      Structure: Usually structured data, reflecting rows and columns
      Domain: Likely to use enterprise data. Additional data may reflect different structure or domain as needed
      Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user
  • 20. Data Regions: Advanced Analytics & Modeling
      • A processing environment that can support advanced analytics
        ─ Typically general purpose processing platforms with inexpensive directly attached storage
        ─ Data is structured and often stored in highly denormalized structures
        ─ Usually driven by a specialized tool or language
      • Typically small, high-value user audience
      • Production-supported environment. Data & results are distributed
      Audience: Highly skilled technical staff (data scientists, developers with advanced analysis skills)
      Access: Data accessed via specialized tools using standard and custom access methods
      Structure: Data is usually structured; may process unstructured data into structured content
      Domain: Typically enterprise-level data. Business drivers are often specific to organization
      Integrity: Data is often cleansed and standardized
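Each region slide describes its region with the same five characteristics, so a region can be written down as one record and compared with the others. The sketch below shows one possible way to do that. The `RegionProfile` type, the `supports_production` flag, and the shortened field texts are illustrative choices. Only the three regions whose slides state their production status explicitly are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionProfile:
    name: str
    audience: str
    access: str
    structure: str
    domain: str
    integrity: str
    supports_production: bool  # taken from each region's slide text


REGIONS = [
    RegionProfile(
        name="Data Exploration",
        audience="Data scientists and analytics developers",
        access="All access methods",
        structure="All data layouts",
        domain="Enterprise or line of business",
        integrity="Transformed/standardized to streamline exploration",
        supports_production=False,
    ),
    RegionProfile(
        name="Sandbox",
        audience="Advanced business users",
        access="SQL/table environment",
        structure="Structured, RDBMS oriented",
        domain="Any/all domains",
        integrity="Enterprise data standardized; other data handled by user",
        supports_production=False,
    ),
    RegionProfile(
        name="Advanced Analytics & Modeling",
        audience="Highly skilled technical staff",
        access="Specialized tools, standard and custom access methods",
        structure="Usually structured",
        domain="Typically enterprise-level data",
        integrity="Often cleansed and standardized",
        supports_production=True,
    ),
]


def production_regions(profiles):
    """Return the names of regions that may feed production use."""
    return [p.name for p in profiles if p.supports_production]
```

Keeping the profiles as data makes it easy to see which audience and domain combinations are not yet served by any region.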
  • 21. Data Services
      Regions: Source Onboarding, Source Data Repository, Enterprise View, Reporting & BI, Sandbox, Data Exploration, Advanced Analytics & Modeling
      Services: Data Transformation, Data Quality, Data Governance, Metadata
  • 22. Getting Started, Moving Forward…
      • Evaluate the diversity of audiences and domains
        − Understand the unique combinations – those dictate the complexity of your environment
        − Review the external data that is already in use
      • Extend your environment one region at a time
        − Focus on adding (or remediating) regions based on business need
      • Sharing data is not a courtesy – it’s a production need
        − Data provisioning and integration is a costly activity; it should be addressed with “economies-of-scale” methods
        − Establishing repositories (with card catalogs) to provide “raw” and “approved” data is a necessity
  • 23. THANKS!
      www.EvanJLevy.com
      @EvanJayLevy
      Evan.Levy@SAS.com