Copyright © 2015, SAS Institute Inc. All rights reserved.
DATA MANAGEMENT FOR
HIGH-PERFORMANCE ANALYTICS
DAN SOCEANU
SENIOR SOLUTIONS ARCHITECT
DATA MANAGEMENT
BEFORE WE BEGIN SAS ACKNOWLEDGEMENTS
Ron Agresta, Product Director, Data Management
Lisa Dodson, Global Technology Practice Manager, Data Management
David Pope, Pre-Sales Manager, Energy & Manufacturing
DATA MANAGEMENT WHY ARE WE HERE?
• Data is rarely fit for analytic purposes
• End-users are overwhelmed
o What data do I use?
o How do I load data?
o How can I find only the data I need?
• Real-time needs
• The rise of “self-service analytics”
CAN YOU LEVERAGE OPEN SOURCE ANALYTICS?
CAN YOU SCALE YOUR DATA AND YOUR ANALYTICS?
DO YOU GROW A CULTURE OF INNOVATION?
CAN YOU ANALYZE ALL OF YOUR DATA?
CAN YOU MODERNIZE YOUR LEGACY BI STRATEGY?
DATA MANAGEMENT BRIDGING THE GAP
[Diagram: Data Management for High Performance Analytics bridges Data Sources and High Performance Analytics. Labels: IoT, Operational, Unstructured, Web, Text, Optimization, Forecasting, Mining]
DATA MANAGEMENT CONVERGENCE OF DATA PREP, ANALYTICAL PROCESSING AND PROVISIONING
[Diagram: Data Access Tier, Data Preparation Tier, Analytical Tier, Visualization Tier; layers: Access, Preparation, Analytics, Visualization]
DATA MANAGEMENT DATA FLOW FOR HIGH PERFORMANCE ANALYTICS
[Diagram labels: SOURCES (Operational, MQ, XML, Cloud); ACCESS (ETL, Read); Data Management (Data Integration, Data Quality, Data Exploration, Data Monitoring, MDM); Data Warehouse; Analytical Data Warehouse; Data Marts; Repository; Model Development; High Performance Analytics; Dynamic Reporting; Dynamic Visualization]
ANALYTICS HISTORICAL VS. ADVANCED
Descriptive
o What happened?
o When?
o Why?
• Frequency Distributions
• Correlation Measures
• Event Study
• Association Rules
Predictive
o What will happen?
o When?
o Why?
o How does that affect us?
o What actions should I take?
• Estimation & Forecasting
• Segmentation
• Optimization
ANALYTICS
HIGH-PERFORMANCE ANALYTICS SAS SOLUTIONS
SAS High-Performance Data Mining
Predictive models using thousands of variables to produce more accurate and timely insights
SAS High-Performance Econometrics
Analytical models using complete data, not just a subset
SAS High-Performance Optimization
Model and solve optimization problems that are very large or cumbersome to solve
SAS High-Performance Statistics
Statistical models using big data to produce more accurate and timely insights
SAS High-Performance Text Mining
Better understand communications and create new value from big text data
HIGH-PERFORMANCE ANALYTICS SAS ANALYTIC PROCESSING APPROACHES
Traditional
Move data from the source to the SAS server, process it, and write back the results (single server or SAS Grid Manager).
In-Database
Move SAS processing to the data source and allow it to occur under the control of the source environment (e.g. relational database or Hadoop). The analytic code executes in the database process.
In-memory “Alongside” the Database
Move SAS processing to the data source but allow a SAS process to run “alongside”. The analytic processes and the database processes are co-located and share resources.
In-memory “Next to” the Database
Move data from the source to a dedicated SAS environment for processing. This approach does not require making a physical copy of the data before processing and, once processing is complete, the data does not need to be kept in the dedicated SAS environment. It separates the resources used for data storage and processing from those used for SAS advanced analytical processing.
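The difference between the Traditional and In-Database approaches can be sketched in a few lines. This is only an illustration, not SAS code: it uses Python's built-in sqlite3 module as a stand-in for the data source, and the `sales` table, its columns, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical sales table in an in-memory SQLite database (stand-in for a real source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("north", 4.0)],
)

def totals_traditional(conn):
    """Traditional: move every row to the client, then aggregate there."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_in_database(conn):
    """In-database: push the aggregation to the source; only results move."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)

traditional = totals_traditional(conn)
pushed_down = totals_in_database(conn)
```

Both functions return the same totals. What differs is how much data crosses the boundary between the source and the analytic process, which is the trade-off the four approaches above address in different ways.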
DATA MANAGEMENT VS. DATA PREPARATION?
DATA MANAGEMENT
Business Need
• Support analytical methods for decision making, use cases and required actions
Data Governance
• Gap assessment; people, process and technology
• Auditability, traceability, automated rules, monitoring, collaboration
Productivity
• Data preparation, provisioning, reporting
DATA PREPARATION
Identify
• Profile
• Data types
• Numeric
• Character
• Contextual
• Cardinality
Access
• ETL
• Batch
• Real-time
• Latency
• Data Movement
• Connectivity
• Data Sources
Data Quality
• De-duplicate
• Standardize
• Missing values
• Imputation
• Enrich
• Binning
• Matching
• Identify anomalies
Reshape
• Wide & flat
• Long & lean
• Transformation logic
• Transpositions
• Frequency analysis
• Appending data
• Partitioning data
• Summarization
Metadata
• Lineage
• Semantic glossary
• Data relationships
• Impact analysis
• Hierarchy management
• Collaboration
• Repeatability
• Entity management
DATA MANAGEMENT THE ROLE OF DATA GOVERNANCE
DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
DATA GOVERNANCE: Data Stewardship, Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
DG without DM = only an academic exercise
DM without DG = the continued culture of “I know a guy”
DATA MANAGEMENT THE IMPORTANCE OF DATA GOVERNANCE
POSITIONS ENTERPRISE DATA ISSUES AS CROSS-FUNCTIONAL
• Establishes guiding principles for data sharing
• Eliminates data ownership issues and “turf wars”
• Ensures appropriate stakeholders have a say in decision making
ESTABLISHES BUSINESS STAKEHOLDERS AS INFORMATION OWNERS
• Aligns data policy with business strategies and priorities
• Aligns data quality with business measures and acceptance
• Helps to identify ROI for data-related activity
FORMALIZES DATA STEWARDSHIP
• Clarifies accountability for data definitions, rules, and quality
• Ensures data is managed separately from applications
• Formalizes monitoring and measurement of critical data
FOSTERS IMPROVED ALIGNMENT BETWEEN BUSINESS AND IT
• Links IT-driven data management activities with business unit activity
PARADIGM SHIFT
DATA PREPARATION IS ABOUT THE BUSINESS NEED & USE CASE
80%: Identify, Access, Data Quality, Reshape, Metadata
20%: Business Use
DATA PREPARATION FIVE KEY FOCUS AREAS
DATA PREPARATION
Identify
• Profile
• Data types
• Numeric
• Character
• Contextual
• Cardinality
Access
• ETL
• Batch
• Real-time
• Latency
• Data Movement
• Connectivity
• Data Sources
Data Quality
• De-duplicate
• Standardize
• Missing values
• Imputation
• Enrich
• Binning
• Matching
• Identify anomalies
Reshape
• Wide & flat
• Long & lean
• Transformation logic
• Transpositions
• Frequency analysis
• Appending data
• Partitioning data
• Summarization
Metadata
• Lineage
• Semantic glossary
• Data relationships
• Impact analysis
• Hierarchy management
• Collaboration
• Repeatability
• Entity management
IDENTIFY WHAT DO I HAVE AND HOW USEFUL IS IT?
Is my data consistent?
Is my data complete?
Is my data highly unique?
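The three profiling questions above can each be turned into a simple ratio. The sketch below is one possible way to do that, assuming completeness means "share of non-missing values", uniqueness means "share of distinct values among those present", and consistency means "share of present values matching an expected pattern". The sample records, the `profile` function, and the state-code pattern are all hypothetical.

```python
import re

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "state": "NC", "phone": "919-555-0100"},
    {"id": 2, "state": "nc", "phone": "9195550101"},
    {"id": 3, "state": None, "phone": "919-555-0102"},
    {"id": 3, "state": "VA", "phone": None},
]

def profile(rows, column, pattern=None):
    """Return completeness, uniqueness and (optionally) consistency ratios."""
    values = [r[column] for r in rows]
    present = [v for v in values if v is not None]
    result = {
        "completeness": len(present) / len(values),
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
    }
    if pattern is not None:
        # Consistency: share of present values that match the expected format.
        matches = [v for v in present if re.fullmatch(pattern, str(v))]
        result["consistency"] = len(matches) / len(present) if present else 0.0
    return result

id_profile = profile(rows, "id")
state_profile = profile(rows, "state", pattern=r"[A-Z]{2}")
```

Here the duplicated `id` value lowers uniqueness, the missing `state` lowers completeness, and the lower-case `nc` lowers consistency.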
IDENTIFY WHAT DO I HAVE AND HOW USEFUL IS IT?
Is my data normal?
Is my data linear?
What are the associations in the data?
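Two quick numeric checks can give a first impression of shape and linear association. This is a rough sketch, not a formal test: skewness near zero only indicates symmetry (it does not prove normality), and Pearson correlation only measures linear association. The `spend` and `visits` series are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def skewness(xs):
    """Population skewness; values near 0 suggest a symmetric distribution."""
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    third = mean([(x - m) ** 3 for x in xs])
    return third / var ** 1.5

def pearson(xs, ys):
    """Pearson correlation; values near +1 or -1 suggest a linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical variables: visits is an exact multiple of spend.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
visits = [2.0, 4.0, 6.0, 8.0, 10.0]

spend_skew = skewness(spend)
spend_visits_r = pearson(spend, visits)
```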
ACCESS SO MANY DATA TYPES AND SOURCES
Access Excel SQLServer Oracle MySQL
Boolean Yes/No Bit Byte N/A Boolean
integer Number Int Number Int Int
float Number (single) Float Number Float Numeric
currency Currency Money NA NA Money
string NA Char Char Char Char
string Text VarChar VarChar VarChar VarChar
binary OLE Obj/Memo Binary/Varbinary/Image Long Raw Blob/Text Binary/Varbinary
DATA QUALITY THE FOUNDATION
• Standardization
• Parsing
• Casing
• Identification
• De-duplication
• “Fuzzy” matching
• Clustering
• Entity resolution
• Survivorship
• Gender Analysis
• Locale Guessing
• Address Verification
• Address Enrichment (geocoding)
Business Logic & Rules
DATA QUALITY FILLING IN THE GAPS AND STANDARDIZING
Standardizing Text
De-duplication
Standardizing Numeric
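The three operations named on this slide, plus filling in missing values, might look like the following in plain Python. This is a minimal sketch under simple assumptions: text is standardized by trimming and upper-casing, gaps are filled with the mean, and numeric values are standardized as z-scores. The function names and sample values are hypothetical.

```python
import statistics

def standardize_text(value):
    """Trim, collapse internal whitespace, and upper-case a text value."""
    return " ".join(value.split()).upper()

def deduplicate(values):
    """Keep the first occurrence of each standardized value, preserving order."""
    seen = set()
    unique = []
    for v in values:
        key = standardize_text(v)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def z_scores(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

cities = ["  cary ", "Cary", "CARY", "raleigh", "Raleigh  "]
ages = [30, None, 40, 50]

unique_cities = deduplicate(cities)
filled_ages = impute_mean(ages)
scaled_ages = z_scores(filled_ages)
```

Mean imputation is only one option; a median or a model-based fill may suit skewed data better.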
DATA QUALITY FILLING IN THE GAPS AND STANDARDIZING
Dropping outliers
Grouping or binning data
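Dropping outliers and binning can be sketched as follows. The slide does not say which outlier rule is used; the example assumes the common interquartile-range (IQR) fence with a multiplier of 1.5. The bin edges, labels, and sample values are hypothetical.

```python
import statistics

def drop_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not from the slide)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value.

    labels must contain one more entry than edges; the last label is the
    open-ended top bin.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

readings = [10, 12, 11, 13, 12, 11, 95]
cleaned = drop_outliers(readings)

age_edges = [18, 65]
age_labels = ["minor", "adult", "senior"]
binned = [bin_value(a, age_edges, age_labels) for a in (17, 18, 70)]
```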
RESHAPE FIT FOR PURPOSE?
Schema/view or flat table?
Format of data
Data quality dimensions?
RESHAPE FLATTENING THE DATA
• Efficient storage
• Fast retrieval
• Defined schema
• WIDE tables / Time series data
• Iteration (build, test, repeat)
• Schema-less
RESHAPE SUMMARIZATION
Each product category will become its own row, with each product purchased its own distinct category column.
RESHAPE TRANSPOSITION FOR DATA MINING
Add up the quantities for each product purchased, in each product category.
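The two reshape steps (adding up quantities per product and category, then giving each category one row with a column per product) can be sketched together. This is an illustrative example only; the line items and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical line items: (category, product, quantity)
purchases = [
    ("dairy", "milk", 2),
    ("dairy", "cheese", 1),
    ("dairy", "milk", 3),
    ("produce", "apples", 4),
    ("produce", "apples", 1),
]

def summarize(items):
    """Add up quantities for each (category, product) pair."""
    totals = defaultdict(int)
    for category, product, qty in items:
        totals[(category, product)] += qty
    return dict(totals)

def transpose(totals):
    """One row per category, one column per product (0 where nothing was bought)."""
    products = sorted({product for _, product in totals})
    wide = {}
    for (category, product), qty in totals.items():
        row = wide.setdefault(category, {p: 0 for p in products})
        row[product] = qty
    return wide

totals = summarize(purchases)
wide = transpose(totals)
```

The result is the "wide & flat" shape that many data mining procedures expect: one observation per row, one input variable per column.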
METADATA MANAGE DATA HIERARCHIES AND RELATIONSHIPS
[Diagram labels: Customer, Types, Hierarchy, Coverage, Products, Financial, Accounts, Address, Inquiries, Product, Party, Accounts, Transactions, Authorizations, Individual, Organization, Inquiries, Loans, Terms, Collaterals, Ratings, External, Assets]
METADATA ENTITY RESOLUTION
Example: Entity Resolution, Employer Name
EMPLOYER_NAME_GRPID: 28296
EMPLOYER_NAME = Name of the client employer (SOL0003n_Employer_Name) | cnt
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S | 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 182
ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA. A.S. | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S. | 78
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S. | 9
ČESKOSLOVENSKÁ OBCHODNÍ BANKA ,A.S. | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S | 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S . | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S. | 717
ČESKOSLOVENSKUÁ OBCHODNÍ BANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O. | 3
ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S. | 587
ČESKOSLOVENSKÁOBCHODNÍBANKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍBANKA A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANLA | 1
ČESKOSLOVENSKÁ OBCHODNÍ BÁNKA, A.S. | 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S | 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA | 27
ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S. | 1
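A grouping like the one above can be approximated with a match key plus fuzzy comparison. The sketch below is not the SAS matching algorithm: it assumes a key built by stripping accents, punctuation and spaces, compares keys with Python's `difflib.SequenceMatcher`, and picks the most frequent spelling as the survivor. The 0.9 threshold is an assumption, and the "KOMERČNÍ BANKA, A.S." row is a hypothetical second employer added to show that unrelated names stay separate.

```python
import difflib
import re
import unicodedata

def match_key(name):
    """Strip accents, punctuation, and whitespace; upper-case the rest."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^A-Z0-9]", "", ascii_only.upper())

def cluster(counts, threshold=0.9):
    """Greedy clustering: the most frequent spellings seed the clusters."""
    clusters = []  # each cluster: {"key": str, "members": {name: count}}
    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        key = match_key(name)
        for c in clusters:
            if difflib.SequenceMatcher(None, key, c["key"]).ratio() >= threshold:
                c["members"][name] = count
                break
        else:
            clusters.append({"key": key, "members": {name: count}})
    return clusters

def survivor(members):
    """Survivorship rule (assumed): keep the most frequent spelling."""
    return max(members, key=members.get)

# A few spellings and counts from the table above, plus one hypothetical employer.
counts = {
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S.": 717,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S.": 587,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 182,
    "ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S.": 1,
    "ČESKOSLOVENSKÁ OBCHODNÍ BANLA": 1,
    "KOMERČNÍ BANKA, A.S.": 5,  # hypothetical, not from the slide
}

clusters = cluster(counts)
canonical = survivor(clusters[0]["members"])
```

A threshold that is too loose would also merge legitimately different entities (for example a different legal form such as S.R.O.), so in practice the rule set needs business review.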
METADATA SEMANTIC RECONCILIATION AND BUSINESS GLOSSARY
Business Glossary and Terms
Technical Architecture Diagram
METADATA LINEAGE & TRACEABILITY
A view into existing data sources/targets, jobs and the associated ‘owners’
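Lineage can be thought of as a directed graph, and impact analysis as a walk downstream from a changed object. The sketch below assumes a simple adjacency-list representation; every object name and owner in it is hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the listed downstream objects.
lineage = {
    "crm.customers": ["job.cleanse_customers"],
    "job.cleanse_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["mart.churn_features", "report.customer_360"],
    "mart.churn_features": ["model.churn"],
}
# Hypothetical owners for some of the objects.
owners = {"job.cleanse_customers": "data steward", "model.churn": "analytics team"}

def downstream(graph, start):
    """Breadth-first walk returning everything affected by a change to start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

impacted = downstream(lineage, "crm.customers")
impacted_owners = {n: owners[n] for n in impacted if n in owners}
```

Reversing the edges gives the opposite view, traceability from a report or model back to its sources.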
METADATA COLLABORATION AND REPEATABILITY
Collaboration & Role-based Dashboarding
Workflow & Data Remediation
Process Orchestration
Unified Lineage
Job Monitoring
SAS DATA MANAGEMENT FRAMEWORK FOR SUCCESS
CORPORATE DRIVERS: Decision Making, Customer Focus, Compliance Mandates, Mergers & Acquisitions, At-Risk Projects, Operational Efficiencies
DATA GOVERNANCE: Data Stewardship, Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
METHODS: People, Process, Technology
SOLUTIONS: Data Quality, Data Integration, Reference Data Management, Master Data Management, Data Visualization, Data Monitoring, Metadata Management, Business Glossary, Data Virtualization, Data Profiling & Exploration
QUESTIONS & ANSWERS THANK YOU!
DAN.SOCEANU@SAS.COM
