The Data Architect Manifesto
Session ID#: 10144
REMINDER
Check in on the
COLLABORATE mobile app
Prepared by:
Mahesh Vallampati
Practice Principal
Keste
@mvallamp
About the Presenter
■ Mahesh Vallampati
▪ Career
— Practice Leader for Business Intelligence and Oracle Financials at
Keste
— Sales and Consulting at Oracle for 9 years
▪ Education
— Courses in Business/Accounting at Houston Community College
— Master’s in EE from Texas A&M University
■ Career Focus
▪ Used to be a DBA
▪ Now Techno-Functional (Fechnical)
Keste is an AWARD-WINNING software solutions and
development company headquartered in Plano, Texas.
We focus on the EXECUTION, DELIVERY and SUPPORT of enterprise software
& systems for the high technology, communications,
life sciences and industrial manufacturing amongst other industries.
Keste – kest n. [old world language derivative]; A culture that is agile and adaptive
I am an Architect
Contact Info
■ White Papers
▪ http://www.slideshare.net/mvallamp
■ Email:
▪ Mahesh.Vallampati@keste.com
■ Twitter: @mvallamp
■ Blogs:
▪ http://mvallamp.blogspot.com/
▪ http://oraexalytics.blogspot.com
■ LinkedIn Group Leader: DBA Manager
■ Oracle Alumni Admin for content: 5000 members
Agenda
■ Preamble
■ Manifesto
■ The declaration of the Manifesto
■ The pledge
Preamble
IT Architecture
■ The IEEE Definition
▪ Describes the fundamental organization of a system
▪ Embodies its components
▪ Describes the relationships between the components and the
environment
▪ Describes the principles governing the design and evolution
Data Architecture-Zachman
RACI roles: EA, DA, Bus, DBA
Layer | View | Data (What) | EA | DA | Bus | DBA
1 | Scope/Contextual | List of things and architectural standards important to the business | A | C | R | I
2 | Business Model/Conceptual | Semantic model or Conceptual/Enterprise Data Model | C | RA | I | I
3 | System Model/Logical | Enterprise/Logical Data Model | C | RA | I | I
4 | Technology Model/Physical | Data Model | C | C | I | RA
5 | Detailed Representations | In actual databases | I | C | I | RA
Data Architecture Drivers
Driver | Description
Enterprise Requirements | The requirements of a business system that processes data
Technology Drivers | Existing standards, software and resource knowledge
Economics | Business drivers, competitive advantage, business cycle
Business Policies | Compliance, policies and regulatory environment
Data Processing Needs | Type of data processing: transaction, data warehousing, mixed load
Conceptual, Logical and Physical
Feature | Conceptual | Logical | Physical
Entity Names | X | X |
Entity Relationships | X | X |
Attributes | | X |
Primary Keys | | X | X
Foreign Keys | | X | X
Table Names | | | X
Column Names | | | X
Column Data Types | | | X
Data Cycle
Conceptual
Logical
Physical
Manifesto
Manifesto
■ A public declaration of policy and aims
■ The two most famous manifestos of all time
▪ The Declaration of Independence
▪ The Communist Manifesto - by Karl Marx
The declaration of the manifesto
In the beginning…
■ In the Beginning there was Codd…
▪ We acknowledge the father of modern relational data theory
▪ He was a British citizen who fought in World War II
▪ He got his Ph.D. from Michigan
▪ Like many innovations, his work was at first ignored by his
employer, IBM
▪ Larry Ellison recalled reading the paper and being inspired
enough to make several billions
And then there was Date…
■ Date was an English computer scientist
■ He popularized and taught relational data theory
■ His book on relational data theory is a classic that is used
even today
■ The book is "An Introduction to Database Systems"
■ He later co-wrote, with Hugh Darwen, a book called Databases,
Types and the Relational Model, which is more popularly referred
to as The Third Manifesto.
Use The keys
■ We promise to use the key, the whole key and nothing but the
key, so help me Codd.
▪ A mnemonic that helps in verifying the third normal form
▪ A tongue in cheek obeisance to the father of relational theory
■ Keys
▪ The key – 1st Normal Form
▪ The whole Key – 2nd Normal Form
▪ Nothing but the key – 3rd Normal Form
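The "whole key" rule can be checked mechanically against sample data. What follows is a minimal sketch in Python, not something from the talk: the `holds` helper, the `order_items` rows and the composite key (order_id, product_id) are all hypothetical and serve only as illustration.

```python
def holds(rows, determinant, dependent):
    """Return True if the functional dependency determinant -> dependent holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # The first value seen for a key is remembered; any later mismatch breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows; the key of this table is (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": 10, "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": 20, "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Widget", "qty": 5},
]

# product_name is determined by product_id alone, which is only part of the key:
# a 2nd Normal Form violation ("the whole key").
partial_dependency = holds(order_items, ["product_id"], ["product_name"])

# qty is not determined by product_id alone; it needs the whole key.
qty_needs_whole_key = not holds(order_items, ["product_id"], ["qty"])
```

A dependency that holds in sample rows is only evidence, not proof; the business rule still has to confirm it. Here the sketch suggests moving product_name to a separate products table.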
Have a functional perspective
■ While most data architects think in terms of data models, it is
beneficial to think in terms of business functions
■ Having a functional or logical data model that has a business
perspective puts things into focus
■ A functional perspective gives context and business purpose
to a data model
Have a functional perspective
[Diagram of functional areas. Labels recovered: Customers, Buying Users, Clients, Shopping Lists, Order Guide, External Products, Inventory, Products/Item Master, Buying Products, Vendors, Ordering Rules, Customer Product Tags, Orders]
Feel free to comment
■ "Don't let it end like this. Tell them I said something" ~ last
words of Pancho Villa
■ Oracle offers a mechanism to store comments
▪ Tables
▪ Columns
▪ Materialized views
▪ Indextypes
▪ User Defined Operators
Comment on Tables
■ create table foo(bar number);
■ comment on table foo is 'This is a comment for foo';
■ select * from user_tab_comments where table_name='FOO';
TABLE_NAME TABLE_TYPE COMMENTS
FOO TABLE This is a comment for foo
Comment on Columns
■ comment on column foo.bar is 'This is a comment for bar';
■ select * from user_col_comments where comments is not null;
TABLE_NAME COLUMN_NAME COMMENTS
FOO BAR This is a comment for bar
He named names
■ Naming columns should be consistent across tables
■ A column that is used widely in several tables should have
the same name
■ You would not believe how often this is not the case
■ Keep abbreviations and short names consistent across table
names and column names
Always use Aliases
■ When referring to tables in queries, always use aliases
■ Also when referring to columns in queries, always prefix them
with their table alias
■ This helps reviewers, users and developers understand
what is being referred to and where it comes from
■ It is especially important in outer joins, for the
columns that are being joined.
■ My favorite table alias is for FND_USER
It is OK to be ANSI and not (+)
■ ANSI SQL is the way to go from a data architecture
perspective
■ ANSI SQL is highly portable and can make applications
potentially database neutral
■ Yes, ANSI is verbose
■ Yes, it can be confusing
■ Yes, it is painful
■ But it is worth it
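The two preceding slides (aliases, ANSI joins) can be illustrated together. The sketch below uses Python's built-in sqlite3 module purely so that it runs anywhere; the `customers` and `orders` tables and their columns are hypothetical. The same ANSI join syntax is accepted by Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nbr INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE orders (order_nbr INTEGER PRIMARY KEY, cust_nbr INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (100, 1, 250.0);
""")

# ANSI outer join: every column carries its table alias (c or o),
# so a reader can see which side of the join it comes from.
# The older Oracle-only form would put "o.cust_nbr (+) = c.cust_nbr" in the WHERE clause.
rows = conn.execute("""
    SELECT c.cust_nbr, c.cust_name, o.order_nbr
    FROM customers c
    LEFT OUTER JOIN orders o
      ON o.cust_nbr = c.cust_nbr
    ORDER BY c.cust_nbr
""").fetchall()
```

The customer without orders is still returned, with a NULL (None in Python) order number, and the join condition sits next to the join rather than being mixed into the filters.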
Know the Who
■ All tables should have the Who Columns
▪ CREATED_BY – The user who created the record
▪ UPDATED_BY – The user who updated the record
▪ CREATION_DATE – The date and time the record was created
▪ LAST_UPDATE_DATE – The date and time the record was
updated
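A minimal sketch of the Who Columns in use, again with Python's sqlite3 as a stand-in database. The `products` table, the helper functions and the user names are hypothetical; the four column names are the ones listed above. In a real system these columns are often maintained by triggers or by the application framework rather than by hand.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id       INTEGER PRIMARY KEY,
        product_name     TEXT NOT NULL,
        created_by       TEXT NOT NULL,
        creation_date    TEXT NOT NULL,
        updated_by       TEXT NOT NULL,
        last_update_date TEXT NOT NULL
    )
""")

def now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()

def insert_product(product_id, name, user):
    # On insert, the creator is also the last updater.
    ts = now()
    conn.execute("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
                 (product_id, name, user, ts, user, ts))

def rename_product(product_id, name, user):
    # On update, only the "updated" pair changes; the "created" pair is never touched.
    conn.execute(
        "UPDATE products SET product_name = ?, updated_by = ?, last_update_date = ? "
        "WHERE product_id = ?",
        (name, user, now(), product_id))

insert_product(1, "Widget", "ALICE")
rename_product(1, "Widget v2", "BOB")
who = conn.execute(
    "SELECT created_by, updated_by FROM products WHERE product_id = 1").fetchone()
```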
Master of his domain
■ Domains allow you to define and reuse a data type with
optional constraints or allowable values. You can use
domains in the Logical and Relational models.
■ The concept of domains should be adopted more by data
architects
■ Oracle SQL Developer Data Modeler now provides domain features in
its modeling capability
Know Attribute Domains
■ STATUS_INDICATOR – NUMBER
▪ 1
▪ 2
▪ 3
▪ 4
■ So what do these values mean?
■ A survey of architects produced different interpretations of their
meaning
■ Instead, have a table structure that captures these attribute
domains
FND_IT
■ Oracle’s Approach in EBS for domain values
▪ FND_LOOKUP_VALUES
■ Use a similar approach
▪ TAB_COL_DOMAIN_LOOKUPS
▪ For each distinct value in the column domain, store the value
and its meaning
▪ Eliminate any ambiguities about what the few distinct values in
the column mean
■ This has the benefit of deriving meanings for columns from
queries instead of using other sub-optimal approaches
Documenting Attribute Domains
Table Name | Column Name | Column Values | Value Meaning
PRODUCT_MASTER | STATUS_INDICATOR | 1 | Org Product
PRODUCT_MASTER | STATUS_INDICATOR | 2 | Third Party
PRODUCT_MASTER | STATUS_INDICATOR | 3 | Government Product
PRODUCT_MASTER | STATUS_INDICATOR | 4 | Discontinued
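With the domain values stored in a table, a query can derive the meaning directly. The sketch below loads the four rows from the slide into a lookup table and joins to it; it uses Python's sqlite3 as a stand-in database. The lookup table's column names and the two `product_master` rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_col_domain_lookups (
        table_name    TEXT NOT NULL,
        column_name   TEXT NOT NULL,
        column_value  INTEGER NOT NULL,
        value_meaning TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name, column_value)
    );
    INSERT INTO tab_col_domain_lookups VALUES
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 1, 'Org Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 2, 'Third Party'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 3, 'Government Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 4, 'Discontinued');
    -- Hypothetical data rows, not from the slides.
    CREATE TABLE product_master (product_id INTEGER PRIMARY KEY, status_indicator INTEGER);
    INSERT INTO product_master VALUES (501, 2), (502, 4);
""")

# Resolve each numeric status to its documented meaning.
rows = conn.execute("""
    SELECT pm.product_id, dl.value_meaning
    FROM product_master pm
    JOIN tab_col_domain_lookups dl
      ON dl.table_name = 'PRODUCT_MASTER'
     AND dl.column_name = 'STATUS_INDICATOR'
     AND dl.column_value = pm.status_indicator
    ORDER BY pm.product_id
""").fetchall()
```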
CHECK_IT
■ When a column has a small domain, say fewer than 10 distinct
values, use a check constraint
■ This eliminates the possibility that non-domain values will be
stored
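A minimal sketch of such a check constraint, using Python's sqlite3 as a stand-in database. The `product_master` table here is hypothetical, and the allowed values 1 to 4 are taken from the STATUS_INDICATOR example. Oracle accepts the same CHECK (... IN (...)) clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_master (
        product_id       INTEGER PRIMARY KEY,
        status_indicator INTEGER NOT NULL
            CHECK (status_indicator IN (1, 2, 3, 4))
    )
""")
conn.execute("INSERT INTO product_master VALUES (501, 2)")  # inside the domain

try:
    conn.execute("INSERT INTO product_master VALUES (502, 9)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    # The database itself refuses the value; no application code has to remember the rule.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM product_master").fetchone()[0]
```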
Design for the Analytic
■ A focus on mapping data to functionality should not blind us
to the analytic
■ Make sure the data model is analytic friendly
■ See if it can be modeled as a snowflake or a star
■ Or use click-stream tables
■ Always ask the question: Can I mine this data?
Know the business
■ The future demands people who know both technology and
business
■ Meet, talk and work with the users of the system
■ Live their life for a day and use the system like they do
■ Find the question behind the question
■ Design for the analytic (business insight) and the data
Know more…
■ As a Data Architect, know more
▪ Than the developer
▪ Than the user
▪ Than the business
▪ Than the business analyst
▪ Than the tester
▪ Than the PM
Data is now big
■ From a relational standpoint, Big Data is the converse
■ It is and can be counter-intuitive
■ There really is such a thing as NoSQL
■ It is a big deal
■ It is unstructured
■ It is, however, learnable
Do the Math (Financial)
■ There are always business requirements that involve using
large data sets
■ While that sounds awesome and cool, it comes with a lot of
costs
■ Large data sets impose significant overhead on IT services,
whether in infrastructure, DBA effort, licenses or development
costs
■ We did a cost benefit analysis for a customer who wanted to
use Advanced Pricing and convinced them to use Simple
Pricing
Do the Math
Probability | 50%
Discount Rate | 5%
Item | Year1 | Year2 | Year3 | Year4 | Year5
Revenue Upside | $4,000,000 | $4,000,000 | $4,000,000 | $4,000,000 | $4,000,000
Item | Value | Notes
NPV | $17,317,907 | NPV for 5 Years
Probable Revenue | $8,658,953 | NPV times the Probability
Investment Required | $15,000,000 | Capital Investment Required. Depreciation not included.
Profit | ($6,341,047) | Revenue-Cost Incurred
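The figures on this slide can be reproduced in a few lines. The sketch below assumes each year's revenue arrives at the end of the year and is discounted at the stated 5%; that assumption matches the slide's numbers, but the function and variable names are illustrative only.

```python
def npv(rate, cash_flows):
    """Present value of cash flows received at the end of years 1..n."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

probability = 0.50                  # chance that the revenue upside materializes
discount_rate = 0.05
revenue_upside = [4_000_000] * 5    # Year1..Year5
investment = 15_000_000             # capital investment, depreciation not included

npv_5_years = npv(discount_rate, revenue_upside)   # about 17,317,907
probable_revenue = npv_5_years * probability       # about 8,658,953
profit = probable_revenue - investment             # about -6,341,047
```

A negative result of this size is the kind of evidence that supports choosing the simpler option.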
Know the Stat
■ Every relational database uses some kind of statistical model
about the data
■ This data is used to determine query plans
■ Most of them assume a uniform distribution of the data
■ Any skewed distribution of the data has to be "taught" to the
system, either as a hint or through a special process that gathers it
■ Any Data Architect should be able to articulate the statistical
distribution of a column's values
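To see why skew matters, compare what a uniform assumption predicts with what the data actually holds. The sketch below uses a hypothetical 1,000-row STATUS_INDICATOR column. It illustrates the idea only and does not model any particular optimizer; in Oracle, skew of this kind is typically captured by gathering histograms with DBMS_STATS.

```python
from collections import Counter

# Hypothetical column: 1,000 rows, heavily skewed toward value 1.
column = [1] * 940 + [2] * 30 + [3] * 20 + [4] * 10

counts = Counter(column)
num_rows = len(column)
num_distinct = len(counts)

# Under a uniform assumption, every value is expected to match rows / distinct values.
uniform_estimate = num_rows / num_distinct

# Ratio of actual rows to estimated rows per value; far from 1.0 means the uniform guess is poor.
estimate_error = {value: count / uniform_estimate for value, count in counts.items()}
```

With these numbers the uniform estimate is 250 rows per value, which is almost four times too low for value 1 and 25 times too high for value 4. A plan chosen on that estimate could suit one value and be badly wrong for another.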
Know the Stat
■ Data Science or Big Data Analytics is all about statistics
■ A huge stream of data is mined to generate customer
preferences
■ These preferences are used to drive product placement and
other revenue and profit enhancing initiatives
Know the Stat
■ At a minimum, know the following
▪ Mean, Median and Mode
▪ Standard Deviation
▪ Quintile, Decile, Quartile and Percentile
▪ An awareness of Regression Analysis
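Most of the measures above are available in Python's standard statistics module. The sketch below runs them over a small made-up sample; regression analysis is left out.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample values

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value (average of the two middle values here)
mode = statistics.mode(data)        # most frequent value
std_dev = statistics.pstdev(data)   # population standard deviation

# quantiles(data, n=k) returns the k - 1 cut points that split the data into k groups.
quartiles = statistics.quantiles(data, n=4)
quintiles = statistics.quantiles(data, n=5)
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)
```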
Write it down
■ For every table in the system, have a wiki page
■ Or a note-let
■ Have a one pager or one paragraph about the table and the
business function it supports
■ For every column, have a short description as to what it
means
Write it Down (Example)
Table Name: HZ_CUST_DEPT
Customers have departments; this table tracks them and is outer-joined
from the customer table.
Column Name | Data Type | Comments
ORG_ID | NUMBER | Customer Organization
CUST_NBR | NUMBER | Customer Number
DEPT_NBR | NUMBER(38,0) | Customer Department
DEPT_NAME | VARCHAR2(25 BYTE) | Customer Department Name
DEPT_ACTV_IND | VARCHAR2(1 BYTE) | Indicates whether the department for the customer is active (Y/N)
Visualize It
■ Be comfortable in data visualization techniques
■ Be able to represent data in different formats in a way that
generates insight
■ Most BI tools provide this; be able to offer innovative
perspectives on data, results and reports
■ Information Dashboard Design by Stephen Few is particularly
insightful
Be savvy about Algorithms
■ Algorithms provide a framework to think about complex
business requirements
■ Ask whether the algorithm required will be
complex
■ If the answer is yes, costs will be high
■ You should be able to articulate in terms of O(n), O(n log n),
O(n^2) and so on
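A small illustration of why the notation matters: two ways to match keys between two lists, with the amount of work counted. This is a sketch only. The function names and data are hypothetical, keys are assumed unique, and real database join operators are considerably more sophisticated.

```python
def nested_loop_join(left, right):
    """O(n * m): compare every left key with every right key."""
    comparisons = 0
    matches = []
    for l_key in left:
        for r_key in right:
            comparisons += 1
            if l_key == r_key:
                matches.append(l_key)
    return matches, comparisons

def hash_join(left, right):
    """O(n + m): build a set from one side, then probe it with the other."""
    steps = 0
    lookup = set()
    for r_key in right:
        lookup.add(r_key)
        steps += 1
    matches = []
    for l_key in left:
        steps += 1
        if l_key in lookup:
            matches.append(l_key)
    return matches, steps

customers = list(range(1000))        # keys 0..999
orders = list(range(500, 1500))      # keys 500..1499

nl_matches, nl_cost = nested_loop_join(customers, orders)
hj_matches, hj_cost = hash_join(customers, orders)
```

Both approaches find the same 500 matching keys, but the first does 1,000,000 comparisons and the second 2,000 steps. That gap widens as the data grows, which is what drives cost.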
Mask the Data
■ As data security becomes an increasingly important topic,
masking the data from PROD to DEV becomes an important
task
■ Masking the data in PROD from users of the system also
becomes important
■ For example, salaries in Oracle HR tables are now masked and
were not a few versions ago
■ A savvy Oracle developer could pretty much know the
salaries of every employee in the company
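One way to picture PROD-to-DEV masking is a transformation applied to each sensitive column before the data leaves production. The sketch below is an assumption-laden illustration, not the mechanism Oracle uses: the key, function names, banding rule and sample row are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key; it must never reach DEV

def mask_name(name):
    """Deterministic pseudonym: the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "EMP_" + digest[:12]

def mask_salary(salary, band=10_000):
    """Keep only the salary band, not the exact figure."""
    return (salary // band) * band

prod_row = {"emp_name": "Jane Smith", "salary": 87_450}
dev_row = {"emp_name": mask_name(prod_row["emp_name"]),
           "salary": mask_salary(prod_row["salary"])}
```

Deterministic tokens keep joins between masked tables working, and banding keeps the data useful for testing reports without revealing anyone's exact salary.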
Secure the Data
■ As Data Architects, we need to be able to define secure
methods to protect the data from internal and external threats
■ Features like Oracle Database Vault and secure backups
make this possible
■ While there are security teams, as data architects we need
to be able to identify data vulnerabilities
■ Become familiar with encryption technologies like RSA
Drive towards Master Data
■ Master Data for key enterprise domains (customer, products)
is becoming commonplace
■ We need to embrace this wave and lead from the front
■ Master Data Management is here to stay
Where do your users spend time?
What Data Users Do? | How they do it? | Industry Standard (%)
Data Gathering | Users spend a lot of time gathering data | 35
Data Formatting | They then spend a lot of time formatting it | 20
Data Reconciliation | They then reconcile the data | 30
Data Analysis | They then analyze the data | 15
Get Certified
■ CDMP
▪ Certified Data Management Professional
■ Data Management Association International (DAMA)
■ Institute for Certification of Computing Professionals (ICCP)
■ Three ICCP exams:
▪ IS Core exam
▪ Data Management Core exam
▪ One elective
You will speak many tongues
■ Not just SQL or PL/SQL
▪ XML and XSLT
▪ NoSQL
▪ UML (Unified Modeling Language)
▪ Java is the COBOL of the 21st century
■ Not Just ER Data Models
▪ Logical Data Models
▪ Process flows that necessitate the entities of these logical
models
Be Responsible
■ Be Responsible for
▪ Organizing data
▪ Treating data as an asset
▪ Leveraging data to achieve the strategic goals of the enterprise
▪ Data Quality
▪ Data Governance
▪ Data Security
The pledge of the data architect
The pledge
■ We, the data architects, hereby solemnly swear that we will
safeguard the data assets of the enterprise by securing them
from external threats, masking them from internal threats,
documenting them to avoid secrecy, ensuring data quality and data
governance, committing to ongoing learning and new
approaches, and providing value to our stakeholders, so help
me Codd.
at Collaborate
Questions to @mvallamp
Text
972-804-5511
Mahesh Vallampati
Practice Leader, BI and EBS
Mahesh.Vallampati@keste.com
972-804-5511
Q and A
■ Q
Please complete the session
evaluation
We appreciate your feedback and insight
You may complete the session evaluation either
on paper or online via the mobile app
