Data Regions: Modernizing your company's data ecosystem

Published in: Technology

  1. Data Regions: Modernizing Your Company’s Data Ecosystem
     Evan Levy, Vice President, Data Management Programs, SAS
     EvanJayLevy
     Copyright © 2015, SAS Institute Inc. All rights reserved.
  2. A 20 Year Old Paradigm: The Change Data Perspective
     Traditional Assumption
     • All data originates from internal systems
     • The company runs on OLTP systems
     • Users have the BI/DW to address their reporting and analysis needs
     • The Data Warehouse is the data source
     Today’s Reality
     • Most data is internal; >35% is external
     • Business Operations rely on OLTP, Data, and Analytics
     • We have multiple analytical systems: data mining, exploration, sandboxes, etc.
     • Users require data from many sources (and the quantity is growing)
     Copyright © 2016, SAS Institute Inc. All rights reserved.
  3. Data Challenges…
     • “Why is all the data put into the warehouse? Only 3 people need to use the data.”
     • “Can you tell me what data we purchased from outside vendors?”
     • “Why will it take you 30 days to load data? I can cut and paste it into my server in 4 minutes.”
     • “We have to standardize business terminology. We’ve learned that data governance is critical.”
     • “Why do I have to work around the ‘infrastructure’? Shouldn’t it be built for my needs?”
     • “You send me a file from SalesForce every month, and the layout changes every month. And you don’t tell me.”
     • “We have data all over (systems, the cloud, external apps, etc.). Why don’t we have a catalog of the sources?”
     • “Finance wants all data reconciled. I can’t wait. Why do I have to suffer from their requirements?”
  4. Data Characteristics
     Diagram labels: Data; Access, Domain, Structure, Audience, Integrity
  5. Data Characteristics: Audience
     The individual user (and their skills and data needs)
     • Report users: Reviewing data about known situations
     • DW Developers: Use ETL tools to retrieve and load data
     • Analytic Developers: Build analytical models to manipulate known data
     • Data Scientists: Analyze any available data to identify new trends
     • BI Developers: Build reports using structured data
     • Business Analysts: Analyze data to form a new hypothesis
     • Application Developers: Develop code to navigate any available data source
  6. Data Characteristics: Access
     The methods, interfaces, and tools used to access the data
     • A business analyst running a report on DBMS tables
     • Custom code navigating a flat file (to retrieve specific values)
     • Code calling platform-specific APIs for data access
     • A cloud application sending transactions
     • SQL
     • An application listening to / receiving event streams
     • A data scientist playing with data in a sandbox
  7. Data Characteristics: Structure
     The structure and organization of the data content
     • Structured Data
     • Semi Structured Data
     • Unstructured Data
  8. Data Characteristics: Domain
     The business context for data usage
     • Enterprise
     • Business Unit
     • Organization
     • Project
     • Individual
  9. Data Characteristics: Integrity
     The format, typing, and accuracy of the data

     Parsed web request:
       Client        John Smith
       Username      Oracleuser
       RequestDate   9/28/2000
       Request Time  23:59:07
       Status Code   OK
       Browser       Netscape

     Raw web server log entry:
       203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] "GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 "http://datawarehouse.oracle.co/contents.htm" "Mozilla/4.7 [en] (WinNT; I)"

     Raw spreadsheet file content:
       P;ECalibri;M220;SB;L10 P;ECalibri;M220;L11 P;ECalibri;M220;SI;L24 P;ECalibri;M220;SB;L9
       P;ECalibri;M220;L10 P;ESegoe UI;M200;L9 P;ESegoe UI;M200;SB;L9 P;ECalibri;M180;L9
       F;P0;DG0G8;M300 B;Y12;X5;D0 0 11 4 O;L;D;V0;K47;G100 0.001 F;M495;R1
       F;SM24;Y1;X1 C;K"name" F;SM24;X2 C;K"Shares" F;SM24;X3 C;K"Quote/ Price"
       F;SM24;X4 C;K"cost/ share" F;SM24;X5 C;K"total cost"
       F;SM24;Y2;X1 C;K"aapl" F;P4;FF2G;SM24;X2 C;K1454.4024 F;SM24;X3 C;K126.85
       F;SM24;X4 C;K79.006952 F;P4;FF2G;SM24;X5 C;K114907.9
       F;SM24;Y3;X1 C;K"axp" F;P4;FF2G;SM24;X2 C;K1454.4108 F;SM24;X3 C;K79.27 F;SM24;X4 …

     Spreadsheet as displayed:
       name    Shares     Quote/Price  cost/share  total cost
       aapl    1,454.40   126.85       79.006952   114,907.90
       axp     1,454.41   79.27        84.671889   123,147.71
       bmy     3,666.51   63.95        43.25259    158,586.21
       brk.b   1,000      143.46       119.3527    119,352.70
       celg    1,000      116.44       102.47094   102,470.94
       chl     500        71.4         71.4179     35,708.95
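The Integrity slide shows the same web request twice: as a raw log entry and as typed fields (username, request date, request time, status). A minimal Python sketch of that conversion follows. It assumes the entry uses the common Apache "combined" log layout; the pattern, the function name `parse_log_line`, and the output field names are illustrative, not taken from the slides. The "Client" and "Browser" values on the slide would need lookups that the log entry alone does not provide, so they are left out.

```python
import re
from datetime import datetime
from http import HTTPStatus

# Assumed layout: host ident user [timestamp] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Turn one raw log entry into typed, named fields (hypothetical helper)."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed log layout")
    # Typing step: the timestamp text becomes a real datetime value.
    when = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "username": match["user"],
        "request_date": f"{when.month}/{when.day}/{when.year}",
        "request_time": when.strftime("%H:%M:%S"),
        # Numeric status code mapped to its standard reason phrase.
        "status": HTTPStatus(int(match["status"])).phrase,
        "user_agent": match["agent"],
    }


SAMPLE = (
    '203.93.245.97 - oracleuser [28/Sep/2000:23:59:07 -0700] '
    '"GET /files/search/search.jsp?s=driver&a=10 HTTP/1.0" 200 2374 '
    '"http://datawarehouse.oracle.co/contents.htm" '
    '"Mozilla/4.7 [en] (WinNT; I)"'
)
record = parse_log_line(SAMPLE)
print(record)
```

The point of the sketch is the integrity gap between the two forms: the raw entry is only text, while the parsed record has a format, types, and values that can be checked.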
  10. The 5 Characteristics of Data
      Diagram labels: Data; Access, Domain, Structure, Audience, Integrity
  11. Challenging the Existing Data Paradigm
      • Support numerous new data sources
      • Establish a shared source staging area
      • Allow “trial & error” analysis for all users
      • Support Self Service Data (ETL, report, analysis, etc.)
      • Support different levels of data acceptance
  12. Data Regions
      Diagram labels:
      • Inbound Data: Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages
      • Regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling
  13. Data Regions: Addressing an Enterprise Data Need
      Diagram labels:
      • Inbound Data: Internal Applications, Cloud Applications, Data Streams, Files, Services, Messages
      • Regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling
      Goals:
      • Create an environment that fits user needs (not IT convenience)
      • Support data onboarding and distribution as a production need
      • Support a diverse set of data usage needs
      • Address the complexities of data movement
      • Reduce resource/skill overlap across the company
  14. Data Regions: Source Onboarding
      • Manages the delivery of data from internal & external sources
      • Holds data until acceptance is complete; data is then moved to the Source Data Repository
      • Centralized support for sophisticated data capture methods (ESP, 3rd party data delivery, API/messaging, etc.)
      • Productionalizes source data capture, identification and sharing
      Audience: Source Onboarding developers only; receiving for Source Data Repository
      Access: Supports multiple delivery methods: txns, messages, bulk formats.
      Structure: Data layout based on source system. Likely dynamic & volatile
      Domain: N/A. This detail is implicit with the data source and the supplier.
      Integrity: N/A. Data details are defined by the data supplier.
  15. Data Regions: Source Data Repository
      • Stores and retains all source data content; reduces enterprise storage requirements
      • Establishes centralized registry of available data sources
      • Reflects a defined data layout (independent of source changes)
      • Alleviates developers’ need to learn data navigation, layout, naming conventions on dozens of source systems
      Audience: Data Integration (Developers: DW, Application, Data Scientists, etc.)
      Access: Usually file oriented (transaction and other access based on situation)
      Structure: Company-centric, documented layout; incl. structured & unstructured
      Domain: N/A. Data reflects source
      Integrity: Company-centric format; data quality and accuracy not addressed.
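The hand-off between the two regions above (Source Onboarding holds a delivery until acceptance is complete, then moves it to the Source Data Repository, which keeps a registry of available sources) can be sketched in a few lines of Python. Everything named here is hypothetical: the classes, the methods, the in-memory storage, and the sample source name are assumptions for illustration, not part of the presentation or of any product.

```python
from dataclasses import dataclass, field


@dataclass
class Delivery:
    """One inbound delivery from an internal or external source."""
    source: str
    payload: bytes
    accepted: bool = False


@dataclass
class SourceDataRepository:
    """Retains accepted source data and doubles as a registry of sources."""
    catalog: dict = field(default_factory=dict)  # source name -> list of payloads

    def store(self, delivery):
        self.catalog.setdefault(delivery.source, []).append(delivery.payload)

    def sources(self):
        # The "centralized registry": every source with retained data.
        return sorted(self.catalog)


@dataclass
class SourceOnboarding:
    """Holds deliveries until acceptance, then hands them to the repository."""
    repository: SourceDataRepository
    holding: list = field(default_factory=list)

    def receive(self, source, payload):
        delivery = Delivery(source, payload)
        self.holding.append(delivery)
        return delivery

    def accept(self, delivery):
        # Acceptance complete: release from holding, move to the repository.
        delivery.accepted = True
        self.holding.remove(delivery)
        self.repository.store(delivery)


repo = SourceDataRepository()
onboarding = SourceOnboarding(repo)
delivery = onboarding.receive("salesforce_monthly", b"id,amount\n1,100\n")
sources_before = repo.sources()   # nothing registered while data is on hold
onboarding.accept(delivery)
sources_after = repo.sources()    # the source appears once accepted
print(sources_before, sources_after)
```

In this sketch a source only becomes visible to downstream users after acceptance, which is one way to read "holds data until acceptance is complete".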
  16. Data Regions: Data Exploration
      • Supports one-off, in-depth business analysis using any data
        ─ Environment is permanent but resource usage is very transient
        ─ Does not support production application access or deployment
      • Often a general purpose platform that can support numerous technologies (Big Data, files, RDBMS, advanced analytics, etc.)
      • A walled-off, protected data scientist-centric environment
      Audience: Data Scientists & Analytics Developers (unable to be supported by sandbox)
      Access: All access methods due to the “from scratch” nature of environment
      Structure: All data layouts. (Unstructured likely due to focus on new concept development)
      Domain: Typically enterprise or line of business level
      Integrity: Data transformed/standardized to streamline exploration efforts (often ignored for new or unknown data sources)
  17. Data Regions: Enterprise View
      • Contains multiple integrated subject areas (w/ long-term history)
      • Content reflects enterprise trusted (and corrected) data
      • Includes metadata (terms, definitions, lineage, etc.)
      • Supports query processing and data provisioning
        ─ Online end-user queries and reporting
        ─ Data provisioning to analytical and transactional systems
        ─ Content continually updated (where possible)
      Audience: All users. Most access will occur via query tools or data manipulation/ETL tools
      Access: Usually query-based access (w/ existing tools). Unstructured requires APIs
      Structure: Data is usually structured. (Unstructured requires special tools/extensions)
      Domain: Enterprise level. Other domains may use content for provisioning purposes
      Integrity: Reflective of enterprise terminology and value standards
  18. Data Regions: Sandbox
      • Allows users to extend their analysis with custom data
        ─ Supports structured data and queries using existing tools/technologies
        ─ Focused on supporting additional (external) data
      • Environment is temporary; does not support production
        ─ Walled-off environment; reports or data not distributable
      • Allows for business-level data discovery and exploration
        ─ Supports one-off user data needs
      Audience: Advanced business users. Requires DBMS query and data integration skills
      Access: Data is accessible via SQL/table environment.
      Structure: Data content is structured and RDBMS oriented (goal is data variety)
      Domain: Any/All domains (enterprise to individual)
      Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user
  19. Data Regions: Reporting and Business Intelligence
      • Supports defined reporting and ad hoc analysis (departmental data marts)
      • Supports an application- or tool-centric view of data
        ─ Simplifies tool access and data manipulation, or
        ─ Reflects unique business (organization) view of data details
      • Requires additional technical staff resources
        ─ ETL processing for additional sources, aggregates, hierarchies, etc.
        ─ Query and usage support for non-enterprise data
      Audience: Business users focused on using standard reports and content
      Access: Usually SQL-based access. Some data may be tool-centric (e.g. OLAP cubes)
      Structure: Usually structured data, reflecting rows and columns
      Domain: Likely to use enterprise data. Additional data may reflect different structure or domain as needed.
      Integrity: Enterprise data is standardized/corrected. Other data must be addressed by user
  20. Data Regions: Advanced Analytics & Modeling
      • A processing environment that can support advanced analytics
        ─ Typically general purpose processing platforms with inexpensive directly attached storage
        ─ Data is structured and often stored in highly denormalized structures
        ─ Usually driven by a specialized tool or language
      • Typically small, high-value user audience
      • Production-supported environment. Data & results are distributed
      Audience: Highly skilled technical staff (data scientists, developers with advanced analysis skills)
      Access: Data accessed via specialized tools using standard and custom access methods.
      Structure: Data is usually structured; may process unstructured data into structured content
      Domain: Typically enterprise-level data. Business drivers are often specific to organization
      Integrity: Data is often cleansed and standardized
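Each region slide describes its region with the same five characteristics (Audience, Access, Structure, Domain, Integrity), which suggests a small catalog structure. The Python sketch below is one possible encoding, not something the presentation prescribes: the `RegionProfile` class, the `REGIONS` dictionary, and the `regions_for_audience` lookup are hypothetical names, and the entries are abbreviated from four of the region slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionProfile:
    """One data region described by the five characteristics."""
    audience: str
    access: str
    structure: str
    domain: str
    integrity: str


# Abbreviated from the region slides; only four regions are shown here.
REGIONS = {
    "Data Exploration": RegionProfile(
        audience="Data scientists and analytics developers",
        access="All access methods",
        structure="All data layouts",
        domain="Enterprise or line of business",
        integrity="Transformed/standardized to streamline exploration",
    ),
    "Enterprise View": RegionProfile(
        audience="All users",
        access="Usually query-based",
        structure="Usually structured",
        domain="Enterprise",
        integrity="Enterprise terminology and value standards",
    ),
    "Sandbox": RegionProfile(
        audience="Advanced business users",
        access="SQL/table environment",
        structure="Structured, RDBMS oriented",
        domain="Any/all domains",
        integrity="Enterprise data standardized; other data addressed by user",
    ),
    "Advanced Analytics & Modeling": RegionProfile(
        audience="Highly skilled technical staff (data scientists, developers)",
        access="Specialized tools, standard and custom access methods",
        structure="Usually structured",
        domain="Typically enterprise-level",
        integrity="Often cleansed and standardized",
    ),
}


def regions_for_audience(keyword):
    """Return region names whose audience description mentions the keyword."""
    needle = keyword.lower()
    return [name for name, profile in REGIONS.items()
            if needle in profile.audience.lower()]


matches = regions_for_audience("data scientists")
print(matches)
```

A catalog like this makes the unique audience and domain combinations explicit, which is the inventory the closing slide asks for before extending the environment one region at a time.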
  21. Data Services
      Diagram labels:
      • Regions: Source Onboarding, Source Data Repository, Sandbox, Reporting & BI, Enterprise View, Data Exploration, Advanced Analytics & Modeling
      • Services: Data Transformation, Data Quality, Data Governance, Metadata
  22. Getting Started, Moving Forward…
      • Evaluate the diversity of audiences and domains
        − Understand the unique combinations – those dictate the complexity of your environment
        − Review the external data that is already in use
      • Extend your environment one region at a time
        − Focus on adding (or remediating) regions based on business need
      • Sharing data is not a courtesy – it’s a production need
        − Data provisioning and integration is a costly activity; it should be addressed with “economies-of-scale” methods
        − Establishing repositories (with card catalogs) to provide “raw” and “approved” data is a necessity
  23. THANKS!
      www.EvanJLevy.com
      @EvanJayLevy
      Evan.Levy@SAS.com
