The Data Architect Manifesto


The Data Architect role is one of the most misunderstood roles in Information Technology. It is usually performed piecemeal by several members of IT: DBAs, Application Architects and Developers each perform it in one fashion or another. But having a single resource or team own this role brings tremendous advantages in standardization, compliance, documentation and performance.

Published in: Technology

  1. 1. The Data Architect Manifesto Session ID#: 10144 REMINDER Check in on the COLLABORATE mobile app Prepared by: Mahesh Vallampati Practice Principal Keste @mvallamp
  2. 2. About the Presenter ■ Mahesh Vallampati ▪ Career — Practice Leader for Business Intelligence and Oracle Financials at Keste — Sales and Consulting at Oracle for 9 years ▪ Education — Courses in Business/Accounting at Houston Community College — Master’s in EE from Texas A&M University ■ Career Focus ▪ Used to be a DBA ▪ Now Techno-Functional (Fechnical)
  3. 3. Keste is an AWARD-WINNING software solutions and development company headquartered in Plano, Texas. We focus on the EXECUTION, DELIVERY and SUPPORT of enterprise software & systems for the high technology, communications, life sciences and industrial manufacturing industries, among others. Keste – kest n. [old world language derivative]; A culture that is agile and adaptive
  4. 4. I am an Architect
  5. 5. Contact Info ■ White Papers ▪ ■ Email: ▪ ■ Twitter: #mvallamp ■ Blogs: ▪ / ▪ ■ LinkedIn Group Leader: DBA Manager ■ Oracle Alumni Admin for content: 5000 members
  6. 6. Agenda ■ Preamble ■ Manifesto ■ The declaration of the Manifesto ■ The pledge
  7. 7. Preamble
  8. 8. IT Architecture ■ The IEEE Definition ▪ Describes the fundamental organization of a system ▪ Embodies its components ▪ Describes the relationships between the components and the environment ▪ Describes the principles governing the design and evolution
  9. 9. Data Architecture-Zachman (RACI columns: EA, DA, Bus, DBA)
      Layer | View | Data (What) | EA | DA | Bus | DBA
      1 | Scope/Contextual | List of things and architectural standards important to the business | A | C | R | I
      2 | Business Model/Conceptual | Semantic model or Conceptual/Enterprise Data Model | C | RA | I | I
      3 | System Model/Logical | Enterprise/Logical Data Model | C | RA | I | I
      4 | Technology Model/Physical | Data Model | C | C | I | RA
      5 | Detailed Representations | Actual databases | I | C | I | RA
  10. 10. Data Architecture Drivers
      Driver | Description
      Enterprise Requirements | The requirements of a business system that processes data
      Technology Drivers | Existing standards, software and resource knowledge
      Economics | Business drivers, competitive advantage, business cycle
      Business Policies | Compliance, policies and regulatory environment
      Data Processing Needs | Type of data processing: Transaction, Data Warehousing, Mixed Load
  11. 11. Conceptual, Logical and Physical
      Feature | Conceptual | Logical | Physical
      Entity Names | X | X |
      Entity Relationships | X | X |
      Attributes | | X |
      Primary Keys | | X | X
      Foreign Keys | | X | X
      Table Names | | | X
      Column Names | | | X
      Column Data Types | | | X
  12. 12. Data Cycle: Conceptual, Logical, Physical
  13. 13. Manifesto
  14. 14. Manifesto ■ A public declaration of policy and aims ■ The two most famous manifestos of all time ▪ The Declaration of Independence ▪ The Communist Manifesto - by Karl Marx and Friedrich Engels
  15. 15. The declaration of the manifesto
  16. 16. In the beginning… ■ In the Beginning there was Codd… ▪ We acknowledge the father of modern relational data theory ▪ He was a British citizen who fought in World War II ▪ He got his Ph.D. from the University of Michigan ▪ Like many innovations, his work was initially ignored by his employer, IBM ▪ Larry Ellison recalled reading the paper and being inspired enough to make several billions
  17. 17. And then there was Date… ■ Date was an English computer scientist ■ He popularized and taught relational data theory ■ His book on relational data theory is a classic that is used even today ■ The book is “An Introduction to Database Systems” ■ He later co-wrote a book called Databases, Types and the Relational Model, which is more popularly referred to as The Third Manifesto.
  18. 18. Use the keys ■ We promise to use the key, the whole key and nothing but the key, so help me Codd. ▪ A mnemonic that helps in verifying the third normal form ▪ A tongue-in-cheek obeisance to the father of relational theory ■ Keys ▪ The key – 1st Normal Form ▪ The whole key – 2nd Normal Form ▪ Nothing but the key – 3rd Normal Form
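
The "nothing but the key" rule can be probed with a small functional-dependency check. The sketch below is only an illustration: the `fd_holds` helper and the employee rows are hypothetical, and a data sample can refute a dependency but never prove one.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in this sample, the `lhs` columns determine the `rhs` columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows; emp_id is the primary key.
employees = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Finance"},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Finance"},
    {"emp_id": 3, "dept_id": 20, "dept_name": "Sales"},
]

# Every column depends on the key, as expected.
key_determines_all = fd_holds(employees, ["emp_id"], ["dept_id", "dept_name"])

# dept_name also depends on dept_id, a non-key column. That transitive
# dependency is what third normal form asks us to move into its own table.
transitive_dependency = fd_holds(employees, ["dept_id"], ["dept_name"])

print(key_determines_all, transitive_dependency)
```

When a non-key column determines another column like this, the usual remedy is a separate department table keyed on `dept_id`.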
  19. 19. Have a functional perspective ■ While most data architects think in terms of data models, it is beneficial to think in terms of business functions ■ Having a functional or logical data model that has a business perspective puts things into focus ■ A functional perspective gives context and business purpose to a data model
  20. 20. Have a functional perspective (diagram of a functional data model; entities shown: Customers, Buying Users, Clients, Shopping Lists, Order Guide, External Products, Inventory Products/Item Master, Buying Products, Vendors, Ordering Rules, Customer Product Tags, Customers X Products, Orders)
  21. 21. Feel free to comment ■ "Don't let it end like this. Tell them I said something" ~ last words of Pancho Villa ■ Oracle offers a mechanism to store comments on ▪ Tables ▪ Columns ▪ Materialized views ▪ Indextypes ▪ User-defined operators
  22. 22. Comment on Tables ■ create table foo(bar number); ■ comment on table foo is 'This is a comment for foo'; ■ select * from user_tab_comments where table_name = 'FOO'; ■ Result: TABLE_NAME = FOO, TABLE_TYPE = TABLE, COMMENTS = This is a comment for foo
  23. 23. Comment on Columns ■ comment on column foo.bar is 'This is a comment for bar'; ■ select * from user_col_comments where comments is not null; ■ Result: TABLE_NAME = FOO, COLUMN_NAME = BAR, COMMENTS = This is a comment for bar
  24. 24. He named names ■ Naming columns should be consistent across tables ■ A column that is used widely in several tables should have the same name ■ You will not believe how often it is not the case ■ Keep abbreviations and short names consistent across table names and column names
  25. 25. Always use Aliases ■ When referring to tables in queries, always use aliases ■ Also when referring to columns in queries, always prefix them with their table alias ■ This helps reviewers, users and developers understand what is being referred to and where it comes from ■ It is especially important when doing outer joins on the columns that are being joined. ■ My favorite table alias is for FND_USER
  26. 26. It is OK to be ANSI and not (+) ■ ANSI SQL is the way to go from a data architecture perspective ■ ANSI SQL is highly portable and can make applications potentially database neutral ■ Yes, ANSI is verbose ■ Yes, it can be confusing ■ Yes, it is painful ■ But it is worth it
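
A minimal sketch of the alias and ANSI join advice, run against an in-memory SQLite database so that it stays self-contained. The `customers` and `orders` tables and their rows are hypothetical; the Oracle `(+)` form appears only as a comment for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nbr INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE orders (order_nbr INTEGER PRIMARY KEY, cust_nbr INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (100, 1, 250.0);
""")

# Every table has an alias and every column carries its alias prefix,
# so a reviewer can see where each column comes from.
# Oracle-only equivalent of the join: WHERE c.cust_nbr = o.cust_nbr (+)
ansi_query = """
    SELECT c.cust_nbr, c.cust_name, o.order_nbr
    FROM customers c
    LEFT OUTER JOIN orders o
        ON o.cust_nbr = c.cust_nbr
    ORDER BY c.cust_nbr
"""
rows = conn.execute(ansi_query).fetchall()
for row in rows:
    print(row)
```

The customer without orders still appears, with a NULL order number, which is the behavior the outer join is there to guarantee.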
  27. 27. Know the Who ■ All tables should have the Who Columns ▪ CREATED_BY – The user who created the record ▪ UPDATED_BY – The user who updated the record ▪ CREATION_DATE – The date and time the record was created ▪ LAST_UPDATE_DATE – The date and time the record was updated
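
One way the Who columns might be stamped from application code is sketched below. It uses SQLite for portability; the `product_master` layout, the helper functions and the user names are assumptions for illustration, and a real system might rely on triggers or framework support instead.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_master (
        product_id       INTEGER PRIMARY KEY,
        product_name     TEXT NOT NULL,
        created_by       TEXT NOT NULL,
        updated_by       TEXT NOT NULL,
        creation_date    TEXT NOT NULL,
        last_update_date TEXT NOT NULL
    )
""")


def now():
    # Fixed-width UTC timestamps sort correctly as plain text.
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def insert_product(user, product_id, name):
    stamp = now()
    conn.execute(
        "INSERT INTO product_master VALUES (?, ?, ?, ?, ?, ?)",
        (product_id, name, user, user, stamp, stamp),
    )


def rename_product(user, product_id, name):
    # Only the update-side Who columns change; the creation columns are never touched.
    conn.execute(
        "UPDATE product_master "
        "SET product_name = ?, updated_by = ?, last_update_date = ? "
        "WHERE product_id = ?",
        (name, user, now(), product_id),
    )


insert_product("alice", 1, "Widget")
rename_product("bob", 1, "Widget v2")
who = conn.execute(
    "SELECT created_by, updated_by, creation_date <= last_update_date "
    "FROM product_master WHERE product_id = 1"
).fetchone()
print(who)
```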
  28. 28. Master of his domain ■ Domains allow you to define and reuse a data type with optional constraints or allowable values. You can use domains in the Logical and Relational models. ■ The concept of domains should be adopted more by data architects ■ Oracle SQL Developer Data Modeler now provides domain features in its modeling capability
  29. 29. Know Attribute Domains ■ STATUS_INDICATOR – NUMBER ▪ 1 ▪ 2 ▪ 3 ▪ 4 ■ So what do these values mean? ■ A survey of architects produced different interpretations of their meaning ■ Instead, have a table structure that captures these attribute domains
  30. 30. FND_IT ■ Oracle’s Approach in EBS for domain values ▪ FND_LOOKUP_VALUES ■ Use a similar approach ▪ TAB_COL_DOMAIN_LOOKUPS ▪ For each distinct value in the column domain, store the value and its meaning ▪ Eliminate any ambiguities about what the few distinct values in the column mean ■ This has the benefit of deriving meanings for columns from queries instead of using other sub-optimal approaches
  31. 31. Documenting Attribute Domains
      Table Name | Column Name | Column Value | Value Meaning
      PRODUCT_MASTER | STATUS_INDICATOR | 1 | Org Product
      PRODUCT_MASTER | STATUS_INDICATOR | 2 | Third Party
      PRODUCT_MASTER | STATUS_INDICATOR | 3 | Government Product
      PRODUCT_MASTER | STATUS_INDICATOR | 4 | Discontinued
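
The documented domain values can then be joined back to the data, so that a query returns the meaning rather than the bare number. This is a sketch on SQLite; the column layout of `TAB_COL_DOMAIN_LOOKUPS` and the sample product rows are assumptions, since the slides name the table but do not define it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab_col_domain_lookups (
        table_name    TEXT    NOT NULL,
        column_name   TEXT    NOT NULL,
        column_value  INTEGER NOT NULL,
        value_meaning TEXT    NOT NULL,
        PRIMARY KEY (table_name, column_name, column_value)
    );
    INSERT INTO tab_col_domain_lookups VALUES
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 1, 'Org Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 2, 'Third Party'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 3, 'Government Product'),
        ('PRODUCT_MASTER', 'STATUS_INDICATOR', 4, 'Discontinued');

    -- Hypothetical data table with two sample rows
    CREATE TABLE product_master (product_id INTEGER PRIMARY KEY, status_indicator INTEGER);
    INSERT INTO product_master VALUES (501, 1), (502, 4);
""")

rows = conn.execute("""
    SELECT pm.product_id, l.value_meaning
    FROM product_master pm
    JOIN tab_col_domain_lookups l
        ON l.table_name = 'PRODUCT_MASTER'
       AND l.column_name = 'STATUS_INDICATOR'
       AND l.column_value = pm.status_indicator
    ORDER BY pm.product_id
""").fetchall()
print(rows)
```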
  32. 32. CHECK_IT ■ When a column has a small domain, say fewer than 10 distinct values, use a check constraint ■ This eliminates the possibility that non-domain values will get stored
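
A minimal sketch of such a check constraint, shown on SQLite so that it runs anywhere. The table layout is assumed, with the four status values taken from the earlier slides; the constraint syntax is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_master (
        product_id       INTEGER PRIMARY KEY,
        status_indicator INTEGER NOT NULL
            CHECK (status_indicator IN (1, 2, 3, 4))
    )
""")

# A value inside the domain is accepted.
conn.execute("INSERT INTO product_master VALUES (1, 3)")

# A value outside the domain is rejected by the database itself.
try:
    conn.execute("INSERT INTO product_master VALUES (2, 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM product_master").fetchone()[0]
print(rejected, row_count)
```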
  33. 33. Design for the Analytic ■ A focus on mapping data to functionality should not blind us to the analytic ■ Make sure the data model is analytic friendly ■ See if it can be modeled as a snowflake or a star ■ Or use click-stream tables ■ Always ask the question: Can I mine this data?
  34. 34. Know the business ■ The future demands people who know both technology and business ■ Meet, talk and work with the users of the system ■ Live their life for a day and use the system like they do ■ Find the question behind the question ■ Design for the analytic (business insight) and the data
  35. 35. Know more… ■ As a Data Architect, know more ▪ Than the developer ▪ Than the user ▪ Than the business ▪ Than the business analyst ▪ Than the tester ▪ Than the PM
  36. 36. Data is now big ■ From a relational standpoint, Big Data is the converse ■ It is and can be counter-intuitive ■ There actually is a NoSQL ■ It is a big deal ■ It is unstructured ■ It is, however, learnable
  37. 37. Do the Math (Financial) ■ There are always business requirements that involve using large data sets ■ While that sounds awesome and cool, it comes with a lot of costs ■ Large data sets impose significant overhead on IT services, whether in infrastructure, DBA, license or development costs ■ We did a cost-benefit analysis for a customer who wanted to use Advanced Pricing and convinced them to use Simple Pricing
  38. 38. Do the Math
      Probability | 50%
      Discount Rate | 5%
      Revenue Upside | Year 1: $4,000,000 | Year 2: $4,000,000 | Year 3: $4,000,000 | Year 4: $4,000,000 | Year 5: $4,000,000
      NPV | $17,317,907 | NPV for 5 years
      Probable Revenue | $8,658,953 | NPV times the probability
      Investment Required | $15,000,000 | Capital investment required; depreciation not included
      Profit | ($6,341,047) | Revenue minus cost incurred
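
The figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes the revenue arrives at the end of each year and is discounted at the stated 5%; the function and variable names are illustrative.

```python
def npv(rate, cash_flows):
    """Net present value of cash flows received at the end of years 1, 2, ..."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows, start=1))


discount_rate = 0.05
probability = 0.50
revenue_upside = [4_000_000] * 5   # Year 1 through Year 5
investment = 15_000_000            # capital investment, depreciation not included

revenue_npv = npv(discount_rate, revenue_upside)
probable_revenue = revenue_npv * probability
profit = probable_revenue - investment

print(f"NPV:              {revenue_npv:,.0f}")
print(f"Probable revenue: {probable_revenue:,.0f}")
print(f"Profit:           {profit:,.0f}")
```

With these inputs the probable revenue falls well short of the investment, which is the argument the slide makes against the larger option.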
  39. 39. Know the Stat ■ Every relational database uses some kind of statistical model about the data ■ This data is used to determine query plans ■ Most of them assume a uniform distribution of the data ■ Any skewed distribution of the data has to be “taught” to the system as a hint or a special process to gather it ■ Any Data Architect should be able to articulate the statistical distribution of a column's values
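
A rough way to see whether a column is skewed is to compare each value's share of the rows with the share a uniform distribution would give it. The sample column below is made up, and the "more than twice the uniform share" threshold is an arbitrary assumption rather than a rule any optimizer uses.

```python
from collections import Counter


def value_shares(values):
    """Map each distinct value to its fraction of the rows, most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.most_common()}


# Hypothetical STATUS_INDICATOR column: 90% of the rows carry the value 1.
status_column = [1] * 90 + [2] * 6 + [3] * 3 + [4] * 1

shares = value_shares(status_column)
uniform_share = 1 / len(shares)    # what each value would get if evenly spread
skewed = max(shares.values()) > 2 * uniform_share

print(shares)
print("uniform share:", uniform_share, "skewed:", skewed)
```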
  40. 40. Know the Stat ■ Data Science or Big Data Analytics is all about statistics ■ A huge stream of data is mined to generate customer preferences ■ These preferences are used to drive product placement and other revenue and profit enhancing initiatives
  41. 41. Know the Stat ■ At a minimum, know the following ▪ Mean, Median and Mode ▪ Standard Deviation ▪ Quintile, Decile, Quartile and Percentile ▪ An awareness of Regression Analysis
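
Most of these measures are available in Python's standard `statistics` module. The sketch below uses a small made-up sample; regression analysis is left out to keep it short.

```python
import statistics

# Hypothetical sample, e.g. order line counts per order
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    "std_dev": statistics.pstdev(sample),                # population standard deviation
    "quartiles": statistics.quantiles(sample, n=4),      # 3 cut points
    "quintiles": statistics.quantiles(sample, n=5),      # 4 cut points
    "deciles": statistics.quantiles(sample, n=10),       # 9 cut points
    "percentiles": statistics.quantiles(sample, n=100),  # 99 cut points
}

for name in ("mean", "median", "mode", "std_dev", "quartiles"):
    print(name, summary[name])
```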
  42. 42. Write it down ■ For every table in the system, have a Wikipedia page ■ Or a note-let ■ Have a one-pager or one paragraph about the table and the business function it supports ■ For every column, have a short description of what it means
  43. 43. Write it Down (Example) ■ Table Name: HZ_CUST_DEPT ■ Customers have departments; this table tracks them and is outer-joined from the customer table.
      Column Name | Data Type | Comments
      ORG_ID | NUMBER | Customer Organization
      CUST_NBR | NUMBER | Customer Number
      DEPT_NBR | NUMBER(38,0) | Customer Department
      DEPT_NAME | VARCHAR2(25 BYTE) | Customer Department Name
      DEPT_ACTV_IND | VARCHAR2(1 BYTE) | Indicates whether the department for the customer is active (Y/N)
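
Written-down descriptions like these can also be pushed into the database as comments. The sketch below generates `COMMENT ON` statements from the HZ_CUST_DEPT documentation; the `comment_statements` helper is hypothetical, and it only builds the statement text rather than running anything against a database.

```python
def comment_statements(table, table_comment, column_comments):
    """Build COMMENT ON statements for a table and its columns."""

    def quote(text):
        # Double any single quotes so the text is a valid SQL string literal.
        return "'" + text.replace("'", "''") + "'"

    statements = [f"COMMENT ON TABLE {table} IS {quote(table_comment)};"]
    for column, comment in column_comments.items():
        statements.append(f"COMMENT ON COLUMN {table}.{column} IS {quote(comment)};")
    return statements


ddl = comment_statements(
    "HZ_CUST_DEPT",
    "Tracks the departments of each customer.",
    {
        "ORG_ID": "Customer Organization",
        "CUST_NBR": "Customer Number",
        "DEPT_NBR": "Customer Department",
        "DEPT_NAME": "Customer Department Name",
        "DEPT_ACTV_IND": "Indicates whether the customer's department is active (Y/N)",
    },
)
print("\n".join(ddl))
```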
  44. 44. Visualize It ■ Be comfortable with data visualization techniques ■ Be able to represent data in different formats in a way that generates insight ■ Most BI tools support this; be able to provide innovative perspectives on data, results and reports ■ Information Dashboard Design by Stephen Few is particularly insightful
  45. 45. Be savvy about Algorithms ■ Algorithms provide a framework to think about complex business requirements ■ Ask whether the algorithm required will be complex ■ If the answer is yes, costs will be high ■ You should be able to articulate in terms of O(n), O(n log n), O(n²) and so on
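
To make the difference in growth rates concrete, the sketch below joins two lists twice: once with a nested loop, which compares every pair, and once with a hash lookup, which visits each row once. The data and the step counters are illustrative; this is not how any particular database implements its joins.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: O(n * m) comparisons."""
    comparisons = 0
    matches = []
    for l_key, l_val in left:
        for r_key, r_val in right:
            comparisons += 1
            if l_key == r_key:
                matches.append((l_key, l_val, r_val))
    return matches, comparisons


def hash_join(left, right):
    """Index the right side once, then probe it per left row: O(n + m) steps."""
    steps = 0
    index = {}
    for r_key, r_val in right:
        steps += 1
        index.setdefault(r_key, []).append(r_val)
    matches = []
    for l_key, l_val in left:
        steps += 1
        for r_val in index.get(l_key, []):
            matches.append((l_key, l_val, r_val))
    return matches, steps


customers = [(i, f"cust{i}") for i in range(200)]
orders = [(i, f"order{i}") for i in range(200)]

nl_matches, nl_cost = nested_loop_join(customers, orders)
hj_matches, hj_cost = hash_join(customers, orders)
print(nl_cost, hj_cost)
```

Both joins return the same rows, but the work grows very differently as the inputs grow, and that difference is what eventually shows up as cost.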
  46. 46. Mask the Data ■ As data security becomes an increasingly important topic, masking the data when copying from PROD to DEV becomes an important task ■ Masking the data in PROD from users of the system also becomes important ■ For example, salaries in Oracle HR tables are now masked and were not a few versions ago ■ Before that, a savvy Oracle developer could find out the salary of pretty much every employee in the company
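
A simplistic sketch of masking sensitive columns while copying a row from PROD to DEV. It replaces each sensitive value with a keyed hash, so the same input always masks to the same token. The helper names, the key and the sample row are assumptions; this is not Oracle's masking feature, and low-entropy values such as salaries can still be guessed if the key leaks.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of the DEV environment.
MASKING_KEY = b"replace-with-a-real-secret"


def mask_value(value):
    """Return a short, deterministic token that stands in for the real value."""
    digest = hmac.new(MASKING_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def mask_row(row, sensitive_columns):
    """Copy a row, masking only the columns listed as sensitive."""
    return {
        column: mask_value(value) if column in sensitive_columns else value
        for column, value in row.items()
    }


prod_row = {"emp_id": 101, "emp_name": "Jane Doe", "salary": 98000}
dev_row = mask_row(prod_row, {"emp_name", "salary"})
print(dev_row)
```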
  47. 47. Secure the Data ■ As Data Architects, we need to be able to define secure methods to protect the data from internal and external threats ■ Features like Oracle Database Vault and secure backups are key to making this possible ■ While there are security teams, as data architects we need to be able to identify data vulnerabilities ■ Become familiar with encryption technologies like RSA
  48. 48. Drive towards Master Data ■ Master Data for key enterprise domains (customers, products) is becoming commonplace ■ We need to ride this wave and lead from the front ■ Master Data Management is here to stay
  49. 49. Where do your users spend time?
      What Data Users Do | How They Do It | Industry Standard (% of time)
      Data Gathering | Users spend a lot of time gathering data | 35
      Data Formatting | They then spend a lot of time formatting it | 20
      Data Reconciliation | They then reconcile the data | 30
      Data Analysis | They then analyze the data | 15
  50. 50. Get Certified ■ CDMP ▪ Certified Data Management Professional ■ Data Management Association International (DAMA) ■ Institute for Certification of Computing Professionals (ICCP) ■ Three ICCP exams: ▪ IS Core exam ▪ Data Management Core exam ▪ One elective
  51. 51. You will speak many tongues ■ Not just SQL or PL/SQL ▪ XML and XSLT ▪ NoSQL ▪ UML (Unified Modeling Language) ▪ Java is the COBOL of the 21st century ■ Not just ER Data Models ▪ Logical Data Models ▪ Process flows that necessitate the entities in these logical models
  52. 52. Be Responsible ■ Be Responsible for ▪ Organizing Data ▪ Treating Data as an Asset ▪ Leveraging Data to achieve the strategic goals of the enterprise ▪ Data Quality ▪ Data Governance ▪ Data Security
  53. 53. The pledge of the data architect
  54. 54. The pledge ■ We, the data architects, hereby solemnly swear that we will safeguard the data assets of the enterprise by securing them from external threats, masking them from internal threats, documenting them to avoid secrecy, ensuring data quality and data governance, committing to ongoing learning and new approaches, and providing value to our stakeholders, so help me Codd.
  55. 55. Questions at Collaborate ■ Twitter: @mvallamp ■ Text: 972-804-5511 ■ Mahesh Vallampati, Practice Leader, BI and EBS
  56. 56. Q and A
  57. 57. Please complete the session evaluation ■ We appreciate your feedback and insight ■ You may complete the session evaluation either on paper or online via the mobile app