Evolution of the DBA to Data Platform Administrator/Specialist

DBAs used to be relational-database centric, for instance managing Microsoft SQL Server or Oracle. In this changing world of polyglot database environments their role has expanded, not just into new platforms other than SQL but also into new legal governance, modelling techniques, architecture etc. They need a base knowledge of Kimball, Inmon, Data Vault, the CAP theorem, LAMBDA, Big Data, Data Science etc.


  1. Tony Rogerson, Microsoft Data Platform MVP. tonyrogerson@sqlserverfaq.com, @tonyrogerson
  2. • Professional
       ◦ 29 years of database experience (6 on DB2, 1 on Oracle and 23 on SQL Server)
       ◦ Freelance SQL Server and Data Platform specialist
       ◦ Fellow BCS, Masters in BI, PGCert in Data Science
       ◦ I also do F# (and the less relevant cousin C#)
     • Community
       ◦ Founder member of UK SQL User Group, SQLServerFAQ.com, DataIdol.com, DDD, SQLBits and SQL Relay
       ◦ Microsoft SQL Server MVP since 1997, and now a Data Platform MVP
       ◦ Technical blog: http://sqlblogcasts.com/blogs/tonyrogerson (legacy), http://dataidol.com/tonyrogerson (General DP blog), http://sqlserverfaq.com/tonyrogerson (MS DP blog)
  3. Group discussion – I can only discuss what I’ve seen myself over the past few years, and recently while looking for work
  4. • What’s a Data Platform?
     • Define the traditional Database Administrator
       ◦ Logical and Physical Modelling
       ◦ Data Governance
       ◦ HADR
     • The importance of a play area
     • The expanding skillset
       ◦ Beyond Relational – alternative Databases
       ◦ Polyglot Database Environment
       ◦ The Distributed Database and understanding CAP
       ◦ Alternate architectures – LAMBDA
       ◦ ETL
       ◦ Business Intelligence, Data Science, Data Platform Engineer
       ◦ What else? Audience please….
  5. • Types: Structured, Un-structured, Semi-structured
     • Applications: Fat client, Web Intranet, Mobile
     • Storage (Database Type): SQL, NoSQL, NewSQL
     • Business Intelligence: standard reporting from standard process metrics from the Data Warehouse / Reporting database
     • Business Analytics: investigative reporting over past data. Management Science
     • Data Science: investigative {Data Analytics, Business Analytics} over structured, semi-structured and unstructured data for possible patterns – use of Machine Learning and Pattern Matching algorithms
     • Data Creators, Data Contributors, Data Consumers
  6. • Business Intelligence: SSRS, Crystal, Business Objects, PowerPivot, Excel, QlikView, Tableau, reporting apps….
     • Types
       ◦ Structured – Normal Form, JSON, XML
       ◦ Un-structured – {developers think all data is like this}
       ◦ Semi-structured – JSON, XML, Key/Value Pair
     • Applications: C#, F#, Java etc. [Data sourcing]
     • Storage (Database Type)
       ◦ SQL – Oracle, DB2, Sybase, SQL Server, MySQL etc.
       ◦ NoSQL – CouchDB, Raven, Cassandra, Hadoop, MongoDB, Neo4j
       ◦ NewSQL – Postgres-XL, Postgres-XC, VoltDB, NuoDB
     • Business Analytics: SAS, SPSS, Statistica, MatLab etc.
     • Data Science: BI + BA + ‘R’, Python, Machine Learning packages, SQL, MapR, Data Extraction, ML, Visualisations, Story Boarding
     • SQL, MapR, U-SQL
     • Data Creators, Data Contributors, Data Consumers
  7. • SSIS
       ◦ Pull RSS feed and store in SQL Server
       ◦ ODATA source example
     • Azure File Share
       ◦ Storing archive data
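The "pull RSS feed and store" flow on slide 7 can be sketched in a few lines. This is not SSIS, which is a graphical ETL tool; it is a minimal Python illustration of the same extract-and-load idea, with assumptions stated plainly: the feed is an inline sample string rather than a live URL, `sqlite3` stands in for SQL Server, and the table and function names (`feed_item`, `load_feed`) are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Inline sample feed (an assumption; a real package would fetch this over HTTP).
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def load_feed(rss_text: str, conn: sqlite3.Connection) -> int:
    """Extract <item> elements and load them; returns the row count afterwards."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_item (link TEXT PRIMARY KEY, title TEXT)"
    )
    items = [
        (item.findtext("link"), item.findtext("title"))
        for item in ET.fromstring(rss_text).iter("item")
    ]
    with conn:  # commit the batch as one transaction
        # INSERT OR IGNORE makes re-running the load idempotent on the link key.
        conn.executemany("INSERT OR IGNORE INTO feed_item VALUES (?, ?)", items)
    return conn.execute("SELECT COUNT(*) FROM feed_item").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server target
row_count = load_feed(SAMPLE_RSS, conn)
```

Keying the table on the item link means the load can be scheduled repeatedly without duplicating rows, which is the usual concern when polling a feed.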
  8. Modelling, Data Governance, HADR, Releasing Stuff
  9. • Data is an Asset – Security Guard
     • Data Custodian – Compliance, ???
     • Liaison between Business and Devs
     • Liaison between Business and Infrastructure
     • What else?
  10. • Custodian of the Business Taxonomy
        ◦ Data Dictionary
      • Logical / Physical
        ◦ Normal Form
        ◦ Logical Model (relationships) vs Physical Model (vendor-dependent schema)
      • Relational vs Dimensional
        ◦ Entity Relationship modelling (tables and the relationships between them)
        ◦ Dimensional Modelling (facts and dimensions) – models for usability and performance
  11. • ICO Principles
      • Data Protection Laws – Security, Retention
      • Your responsibilities – vary within the Org
  12. • High Availability
        ◦ Understanding Latency
        ◦ Mirroring
        ◦ Availability Groups
        ◦ Log Shipping (?)
      • Disaster Recovery
        ◦ Practiced Procedures
        ◦ DR Resource misalignment
        ◦ Implementing contingency
        ◦ Dealing with Data corruption or Accidents (if I only have AGs – what’s the issue?)
  13. • Applying Database releases
        ◦ Which Databases? SQL / NoSQL etc.
      • Supportability (level of required knowledge)
      • Patching Servers
  14. • You protect the Integrity and Availability of the “Database Platform”
      • Not limited to SQL Server
        ◦ NoSQL products
        ◦ Relational “SQL” products
        ◦ NewSQL
  15. Play Areas, Knowing what to learn
  16. • Align with your company
        ◦ Talk to developers, see what they are using, take a lead with Data Technology – nurture their use of Data.
        ◦ Data is an Asset; without data your company won’t exist – make your company realise your importance and that you need to be right up there in the decision making for technology direction
      • Align with the industry
        ◦ Job boards, trends
      • Be one (ok – a couple of) steps ahead!
  17. • You can’t play in live!
      • Decent laptop – 16 GiB+ RAM, SSD / M.2 flash
      • VirtualBox
        ◦ Multiple Windows Server, build a domain, build a cluster etc.
        ◦ Multiple Linux
        ◦ Etc.
  18. • Beyond Relational – alternative Databases
      • Polyglot Database Environment
      • The Distributed Database and CAP
      • LAMBDA
      • ETL
      • MDM
      • Cloud
  19. • Business environment is “Polyglot”
      • Require understanding of
        ◦ NoSQL
        ◦ CAP Theorem
        ◦ LAMBDA (edge case)
        ◦ Big Data – what it really is
        ◦ CEP (is this a Database related tech?)
        ◦ ETL
        ◦ Data Science – what it really is
        ◦ BI
        ◦ Kimball, Inmon
        ◦ Data Vault
  20. • Really means – No NF
      • Key Value Stores (Riak, CouchDB)
      • Column (Cassandra)
      • Document (MongoDB)
      • Graph (Neo4J)
      • Object (bit niche)
      • Ironically – most have a SQL-like interface now or in development!
  21. • Consistency
        ◦ All nodes show the same value
        ◦ Eventual Consistency
      • Availability
        ◦ Node will return data
      • Partition Tolerance
        ◦ Islands form when the network fails – clients connect to local nodes, so when isolated you lose consistency.
      • You can only have two of the three, never all three.
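The trade-off on slide 21 can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not how any particular database works: two in-memory replicas, a single logical clock, last-write-wins reconciliation, and a "CP" mode that simply refuses writes while partitioned. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP cluster refuses a write during a partition."""


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)  # key -> (version, value)


class Cluster:
    """Two replicas that either stay consistent (CP) or stay writable (AP)."""

    def __init__(self, mode: str):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.nodes = [Node("A"), Node("B")]
        self.partitioned = False
        self.clock = 0  # logical clock used to order writes

    def write(self, node_index: int, key: str, value) -> None:
        if self.partitioned and self.mode == "CP":
            # Consistency is kept by giving up availability.
            raise Unavailable("cannot replicate while partitioned")
        self.clock += 1
        record = (self.clock, value)
        self.nodes[node_index].data[key] = record
        if not self.partitioned:
            for node in self.nodes:  # replicate to every node
                node.data[key] = record

    def read(self, node_index: int, key: str):
        return self.nodes[node_index].data[key][1]

    def heal(self) -> None:
        """Network is back: reconcile islands with last-write-wins."""
        self.partitioned = False
        keys = set().union(*(node.data for node in self.nodes))
        for key in keys:
            newest = max(n.data[key] for n in self.nodes if key in n.data)
            for node in self.nodes:
                node.data[key] = newest


ap = Cluster("AP")
ap.write(0, "stock", 10)
ap.partitioned = True       # islands form
ap.write(0, "stock", 9)     # client of node A
ap.write(1, "stock", 7)     # client of node B
diverged = (ap.read(0, "stock"), ap.read(1, "stock"))
ap.heal()                   # the replicas converge again (eventual consistency)
```

In AP mode both islands keep accepting writes and disagree until the partition heals; in CP mode the same write raises `Unavailable`, so the replicas never disagree but clients are turned away.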
  22. [Diagram: nodes numbered 1 to 6; data centres A, B and C, each accepting Insert, Update and Delete]
  23. • No – it’s not just Hadoop
      • Velocity, Variety, Volume
      • BD can be done in anything.
        ◦ Velocity – CEP, In-Memory, distributed computing
        ◦ Variety – varied types of data, structured / unstructured
        ◦ Volume – size of the data
      • BD is not definitive – depends on your budget, ability etc.
  24. • Processing a data stream in flight
      • Window over the stream and determine trends
      • Read the stream rather than poll the database
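"Window over the stream and determine trends" on slide 24 can be sketched with a fixed-size sliding window. This is a minimal single-process illustration, assuming a stream of numeric readings and a hypothetical definition of "trend" (every window average higher than the one before); real CEP engines add time-based windows, out-of-order handling and much more.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_averages(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield the average of the last `window_size` readings as each one arrives."""
    window = deque(maxlen=window_size)  # oldest reading drops out automatically
    for reading in stream:
        window.append(reading)
        if len(window) == window_size:
            yield sum(window) / window_size


def rising_trend(stream: Iterable[float], window_size: int) -> bool:
    """True if each full-window average is higher than the previous one."""
    averages = list(sliding_averages(stream, window_size))
    return len(averages) > 1 and all(
        later > earlier for earlier, later in zip(averages, averages[1:])
    )


readings = [1, 2, 3, 4, 5]  # sample in-flight values (an assumption)
averages = list(sliding_averages(readings, window_size=3))
```

Because `sliding_averages` is a generator, each reading is processed as it arrives and only the current window is held in memory, which is the point of reading the stream rather than polling a table.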
  25. • If you aren’t using Machine Learning / Data Mining algorithms you aren’t doing Data Science
      • If you know what you are looking for – you aren’t doing DS.
      • DS isn’t just R; you can do DS in numerous tools. R has a large library of packages to use against your data
      • DS is where you are looking for patterns in your data and trying to understand them, to then formulate standard process flows to take advantage.
  26. • Scale out – distributed – data processing architecture
      • Batch, Speed, Service layers
      • For low latency, high updates
      • Robust
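The three layers named on slide 26 can be shown in miniature. The sketch below is a single-process toy under stated assumptions: events are page views held in plain Python lists, the batch layer recomputes counts from the full history, the speed layer counts events that arrived since the last batch run, and the serving function merges the two. The names `batch_view`, `SpeedLayer` and `serve` are hypothetical.

```python
from collections import Counter


def batch_view(master_dataset: list) -> Counter:
    """Batch layer: recompute page-view counts from the full immutable history."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incremental counts for events not yet in a batch view."""

    def __init__(self) -> None:
        self.realtime_view = Counter()

    def ingest(self, event: dict) -> None:
        self.realtime_view[event["page"]] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.realtime_view.clear()


def serve(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: answer queries by merging the batch and real-time views."""
    return batch[page] + speed.realtime_view[page]


history = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
batch = batch_view(history)        # slow, complete, recomputed periodically
speed = SpeedLayer()
speed.ingest({"page": "home"})     # arrived after the batch run
home_views = serve("home", batch, speed)
```

The robustness claim comes from the batch layer: because it recomputes from the raw history, a bug in the speed layer only affects recent data until the next batch run replaces it.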
  27. • Kimball
        ◦ Dimensional modelling with star schema
        ◦ Dimensions and Facts
        ◦ Bottom up – data marts to EDW
        ◦ Aspires to Single Version of the Truth
      • Inmon
        ◦ Normal Form
        ◦ Can also use star schema
        ◦ Form the EDW and then use data marts
        ◦ Stronger approach to Single Version of the Truth
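A star schema, as named under Kimball on slide 27, is one fact table joined to its dimension tables. The sketch below is illustrative only: `sqlite3` stands in for a real warehouse engine, and the table names, columns and sample rows (`dim_date`, `dim_product`, `fact_sales`) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive attributes used to slice the facts.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_year INTEGER,
    month_name    TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Fact: one row per sale, holding measures plus a key to each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_date VALUES (20160101, 2016, 'January'), (20160201, 2016, 'February');
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES
    (20160101, 1, 2, 20.0),
    (20160101, 2, 1, 15.0),
    (20160201, 1, 3, 30.0);
""")

# A typical dimensional query: aggregate a measure, grouped by a dimension attribute.
sales_by_month = conn.execute("""
    SELECT d.month_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.date_key, d.month_name
    ORDER BY d.date_key
""").fetchall()
```

Every reporting query follows the same shape (fact joined to one or more dimensions, then grouped), which is why the slides describe dimensional modelling as modelling for usability and performance.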
  28. • Modelling method
      • Pull all your uncleansed data and store it in one place
      • Buffer between Operational Databases and the Conformed Data Warehouse
  29. • Are you really on the Cloud, or just in a managed, remotely located server environment?
      • Real cloud has immediate elasticity and hides infrastructure; new resources are easy to spin up, near immediately.
      • Marketed cloud is really managed servers – no immediate elasticity; servers are provisioned and that takes time.
  30. • True cloud offers elasticity for Distributed Database capabilities – proper scale out.
        ◦ Azure Elastic Database (Sharding)
        ◦ SQL 2016 Stretch feature
      • Remember CAP? Yep – you need to understand that.
      • On-Prem tends to be scale up, single box – single database
      • Cloud – some of your tasks will disappear because it’s done for you. But your role is a Data Centric role and not Infrastructure Centric.
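Sharding, mentioned on slide 30 as the scale-out mechanism, means routing each row to one of several databases by a key. The sketch below is a generic hash-routing illustration and not the Azure Elastic Database API, which manages its own shard maps; the shard names and the `shard_for` and `distribute` helpers are hypothetical.

```python
import hashlib

SHARDS = ["shard_0", "shard_1", "shard_2"]  # hypothetical database names


def shard_for(customer_id: str, shards: list = SHARDS) -> str:
    """Pick a shard from a stable hash of the sharding key."""
    # hashlib is used instead of hash() so the result is the same on every run.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def distribute(rows: list, shards: list = SHARDS) -> dict:
    """Group rows by the shard that owns each row's customer_id."""
    placement = {name: [] for name in shards}
    for row in rows:
        placement[shard_for(row["customer_id"], shards)].append(row)
    return placement


orders = [
    {"customer_id": "c1", "total": 10},
    {"customer_id": "c2", "total": 25},
    {"customer_id": "c1", "total": 5},
]
placement = distribute(orders)
```

All rows for one customer land on the same shard, so single-customer queries touch one database. Simple modulo routing like this reshuffles most keys when a shard is added, which is one reason managed services use lookup-based shard maps instead.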

Editor's Notes

  • 20:00 – 21:00 Tony Rogerson - “SQL Server Data Platform specialist”, who used to be known as “Database Administrator”
    The year was 1995 and I was a SQL Developer/Database Administrator designing schema, writing and optimising SQL, managing log shipping and backups. The year is now 2016 and that relatively small skill set has exploded dramatically with ETL (SSIS plus some C#), MDM, Business Intelligence (Kimball, Inmon, Lambda, hybrid), Data Science (Statistics, Business Skills, R, F#, HDInsight, Hadoop), Cloud (AWS, Azure, third-party on/off prem), Data Governance (ICO principles/rules, Security, International DP rules).
    In this session we will look at today’s SQL Server Data Platform specialists. You know who they are because, even though you are still called “DBA”, you are actually one of them!
    We will cover introductions, with demos, to the following technology areas: ETL, BI, DS and Azure, with examples of using them within a Data Platform setting.
