DATABASE THEORY AND MODELING
A CRASH COURSE DBA HOW TO
WHAT IS A DBA?
DEVELOPMENT
• Capacity Planning
• Database Design
• Database Implementation
• Migration
OPERATIONS
• Installation
• Configuration
• Monitoring
• Security and Access Management
• Troubleshooting
• Backup and Recovery
DATABASE THEORY
THE STUDY OF DATABASES AND DATA MANAGEMENT SYSTEMS
• Finite Model Theory
• Database Design Theory
• Dependency Theory
• Concurrency Control
• Deductive Databases
• Temporal and Spatial Databases
• Uncertain Data
• Query Languages and Logic
DATA MODELING
TURNING BUSINESS REQUIREMENTS INTO DATA ROADMAPS
REASONS FOR MODELING DATA
WHAT?
• Provide a definition of our data
• Provide a format for our data
WHY?
• Compatibility of data
• Lower cost to build, operate, and maintain systems
THREE KINDS OF DATA MODEL INSTANCES
• Conceptual Data Model
• Logical (External) Data Model
• Physical Data Model
CONCEPTUAL MODEL
• Entities that comprise your data
• Creating data objects
• Identifying any relationships between objects
• "Business Requirements"
PROJECT SCOPE
MY BUSINESS REQUIREMENTS
• I have a lot of video games
• I want a simple way to be able to find my video games by keywords
• And keep track of what system they are for
• And keep track of when I last played them and when someone else played them
• And keep track of if I beat them, and my kids too
CONCEPTUAL MODEL
Game
• Name
• System
Keywords
• Categories
• Type
Player
• Date
• Completed
LOGICAL MODEL – FLAT MODEL
Game Title     | System  | Liz Last Play | Pat Last Play | Liz Complete | Pat Complete | Keywords
FFX            | PS2     | 2016-05-01    | 2016-06-04    | Yes          | No           | fantasy, jrpg
Chrono Trigger | PS1     | 2014-07-05    |               | Yes          | No           | jrpg
Forza 4        | Xbox360 | 2017-03-02    |               | No           | No           | racing
HIERARCHICAL MODEL
Games
• Xbox360
– Forza4
• PS1
– Chrono Trigger
• PS2
– Final Fantasy X
RELATIONAL MODEL
• I have a system
• I have a game
• I have a player
• Each game has one system, each system can have many games
• Games can have many players, each player can have additional information
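The relationships listed above could be sketched as tables roughly like this. It is a minimal illustration using Python's built-in sqlite3 module, not the deck's actual schema: every table and column name here (system, game, player, game_player) is an assumption, and only the FFX row from the flat model is loaded.

```python
import sqlite3

# Hypothetical schema for the video game example; all names are illustrative.
SCHEMA = """
CREATE TABLE system (
    system_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    system_id INTEGER NOT NULL REFERENCES system(system_id)
);
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- One row per (game, player) pair holds the "additional information".
CREATE TABLE game_player (
    game_id     INTEGER NOT NULL REFERENCES game(game_id),
    player_id   INTEGER NOT NULL REFERENCES player(player_id),
    last_played TEXT,
    completed   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (game_id, player_id)
);
"""


def build_db():
    """Create an in-memory database and load the FFX row from the flat model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO system (system_id, name) VALUES (1, 'PS2')")
    conn.execute("INSERT INTO game (game_id, title, system_id) VALUES (1, 'FFX', 1)")
    conn.executemany(
        "INSERT INTO player (player_id, name) VALUES (?, ?)",
        [(1, "Liz"), (2, "Pat")],
    )
    conn.executemany(
        "INSERT INTO game_player (game_id, player_id, last_played, completed) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "2016-05-01", 1), (1, 2, "2016-06-04", 0)],
    )
    conn.commit()
    return conn


def players_of(conn, title):
    """Return (player, last_played, completed) rows for one game title."""
    return conn.execute(
        "SELECT p.name, gp.last_played, gp.completed "
        "FROM game g "
        "JOIN game_player gp ON gp.game_id = g.game_id "
        "JOIN player p ON p.player_id = gp.player_id "
        "WHERE g.title = ? ORDER BY p.name",
        (title,),
    ).fetchall()
```

With this shape, tracking another family member is one more row in player plus rows in game_player, rather than two new columns on every game.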
DATA STORAGE
CHOOSING DATABASE SOFTWARE
RELATIONAL DATABASES
• Relate (Link) different bits of data to each other
• Very reliable places to store data
• MUST be ACID
ACID COMPLIANCE
• Atomicity
• Consistency
• Isolation
• Durability
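Atomicity is the easiest of the four to see in a few lines. The sketch below uses Python's sqlite3 module and a hypothetical completion table; the table and function names are assumptions for illustration. The second call fails partway through, and the connection's context manager rolls back the insert that had already succeeded.

```python
import sqlite3


def mark_completed(conn, game_id, player_ids):
    """Record a completion for several players, all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for player_id in player_ids:
                conn.execute(
                    "INSERT INTO completion (game_id, player_id) VALUES (?, ?)",
                    (game_id, player_id),
                )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE completion ("
    " game_id INTEGER NOT NULL,"
    " player_id INTEGER NOT NULL,"
    " PRIMARY KEY (game_id, player_id))"
)

ok = mark_completed(conn, 1, [1, 2])
# The repeated player id violates the primary key on the second insert,
# so the first insert for game 2 is rolled back as well.
failed = mark_completed(conn, 2, [1, 1])
count = conn.execute("SELECT COUNT(*) FROM completion").fetchone()[0]
```

Only the two rows from the first call remain, which is the "all or nothing" behavior the Atomicity bullet refers to.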
DOCUMENT DATABASES
• Schemaless
• Good Performance
• Speedy and Distributed
• Consistency model is BASE
• Graph Databases are a related family that stores relationships as first-class links for traversal
BASE
• Basically Available
• Soft State
• Eventually Consistent
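A toy model can make "soft state" and "eventually consistent" more concrete. The sketch below is not how any particular product works: the Replica class, the version numbers, and the newest-version-wins rule are all assumptions chosen to keep the example small.

```python
class Replica:
    """A toy replica that keeps the newest version of each key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for source, target in ((a, b), (b, a)):
        for key, (version, value) in list(source.data.items()):
            target.write(key, version, value)


east, west = Replica(), Replica()
east.write("FFX:last_played", 1, "2016-05-01")
west.write("FFX:last_played", 2, "2016-06-04")

stale_read = east.read("FFX:last_played")  # replicas disagree for now
sync(east, west)
converged = east.read("FFX:last_played") == west.read("FFX:last_played")
```

Both replicas stay available for reads and writes the whole time; the cost is that a read before the sync can return an older value.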
DATA WAREHOUSES
• A place to aggregate and store data for reporting and analysis
• ETL
– Extract
– Transform
– Load
• Data Mart (single subject area)
• OLAP (Online Analytical Processing)
• OLTP (Online Transaction Processing)
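The three ETL steps can be sketched end to end in a few functions. This is a minimal illustration using only the standard library; the CSV layout, the play_fact table, and the choice to reduce dates to a year are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export from the operational (OLTP) side.
RAW = (
    "title,system,last_played\n"
    "FFX,PS2,2016-05-01\n"
    "Forza 4,Xbox360,2017-03-02\n"
)


def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize system codes and keep only the play year."""
    return [
        (row["title"].strip(), row["system"].strip().upper(), int(row["last_played"][:4]))
        for row in rows
    ]


def load(conn, rows):
    """Load: append the cleaned rows to a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS play_fact "
        "(title TEXT, system TEXT, play_year INTEGER)"
    )
    conn.executemany("INSERT INTO play_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW)))
plays_per_year = warehouse.execute(
    "SELECT play_year, COUNT(*) FROM play_fact GROUP BY play_year ORDER BY play_year"
).fetchall()
```

The final query is the kind of aggregate an OLAP workload runs, kept away from the tables that serve day-to-day transactions.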
SCALING DATABASES
• Horizontal (Distributed)
• Vertical
• Read Replicas
• Multi-Master
• Partitioning (Sharding)
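One common way to partition is to hash a key and take the remainder by the number of shards. The sketch below assumes that approach; the function names and the choice of SHA-256 are illustrative, and real systems often use other schemes such as range partitioning or consistent hashing.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def place(keys, shard_count):
    """Group keys by the shard they would be stored on."""
    shards = {index: [] for index in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


placement = place(["FFX", "Chrono Trigger", "Forza 4"], 4)
```

A stable hash is used instead of Python's built-in hash() because the latter is randomized per process for strings. Note that changing shard_count with this scheme moves most keys to a different shard.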
CAP THEOREM
Consistency
Availability
Partition Tolerance
CHOOSE… WISELY
• Politics will factor into this!
• You don't have to pick just one
• Choose the right solution for the right problem
• With so much available in cloud services and the ease of using containers, spinning up a lightweight Redis instance alongside your PostgreSQL server does not have to cost much more!
NORMALIZATION
ORGANIZING INTO TABLES AND COLUMNS
NO MORE ANOMALIES
• Update Anomaly
• Insertion Anomaly
• Deletion Anomaly
• Fidelity Anomaly
NO DUPLICATED DATA
MINIMIZE REDESIGN ON EXTENSION
• Store all data in only one place
• What happens if I add an additional family member I want to track in my application?
• The normalized version makes this simple
FIRST NORMAL FORM
1NF
• Has a Primary Key – can be a COMPOUND key
• Has only atomic values
• Has no repeated columns
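The "atomic values" rule is the one the flat model breaks with its comma-separated Keywords column. A minimal sketch of the fix, assuming the keywords arrive as one string per game (the function name is hypothetical):

```python
def split_keywords(flat_rows):
    """Turn (title, "kw1, kw2") pairs into one (title, keyword) row each."""
    return [
        (title, keyword.strip())
        for title, keywords in flat_rows
        for keyword in keywords.split(",")
        if keyword.strip()
    ]


rows = split_keywords([("FFX", "fantasy, jrpg"), ("Forza 4", "racing")])
```

Each resulting row holds a single keyword, so finding games by keyword becomes an equality match instead of a substring search.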
SECOND NORMAL FORM
2NF
• Table is 1NF AND
• All non-key columns depend on the whole primary key (no partial dependencies)
THIRD NORMAL FORM
3NF
• 2NF PLUS
• No Transitively dependent attributes
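Both 2NF and 3NF are statements about which columns determine which other columns. The helper below checks that property against sample rows. It is only a sketch: the manufacturer column is a hypothetical addition to the example, and a dependency that holds in sample data is a hint, not proof that it holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        value = tuple(row[column] for column in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# "manufacturer" is a hypothetical extra column used to show a transitive
# dependency: title -> system -> manufacturer.
games = [
    {"title": "FFX", "system": "PS2", "manufacturer": "Sony"},
    {"title": "Chrono Trigger", "system": "PS1", "manufacturer": "Sony"},
    {"title": "Forza 4", "system": "Xbox360", "manufacturer": "Microsoft"},
]

system_gives_manufacturer = determines(games, ["system"], ["manufacturer"])
manufacturer_gives_system = determines(games, ["manufacturer"], ["system"])
```

Because manufacturer depends on system rather than directly on the game, 3NF would move it into a separate table keyed by system.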
BUT WAIT – THERE'S MORE!
• 7 more to be exact
• They're not really that useful in most situations
• You can learn about them from Wikipedia!
DENORMALIZATION
• Wait – didn't you just say to normalize things?
• Usually has one purpose, increased performance, and should be used sparingly
• Doesn't have to be "full" denormalization
– Storing count totals of many elements in a relationship
– star schema "fact-dimension" models
– prebuilt summarizations
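The "count totals" case could look something like the following, where a stored play_count is kept in step by a trigger. The table, column, and trigger names are assumptions for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (
    game_id    INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    play_count INTEGER NOT NULL DEFAULT 0  -- denormalized total
);
CREATE TABLE play (
    play_id   INTEGER PRIMARY KEY,
    game_id   INTEGER NOT NULL REFERENCES game(game_id),
    played_on TEXT NOT NULL
);
-- Keep the stored total in step with the detail rows.
CREATE TRIGGER play_added AFTER INSERT ON play
BEGIN
    UPDATE game SET play_count = play_count + 1 WHERE game_id = NEW.game_id;
END;
""")

conn.execute("INSERT INTO game (game_id, title) VALUES (1, 'FFX')")
conn.executemany(
    "INSERT INTO play (game_id, played_on) VALUES (1, ?)",
    [("2016-05-01",), ("2016-06-04",)],
)
conn.commit()

stored_total = conn.execute(
    "SELECT play_count FROM game WHERE game_id = 1"
).fetchone()[0]
computed_total = conn.execute(
    "SELECT COUNT(*) FROM play WHERE game_id = 1"
).fetchone()[0]
```

Reading stored_total avoids counting the detail rows on every request. The trade-off is that every write path, including deletes, now has to keep the copy correct.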
RELATIONSHIPS
CARDINALITY BETWEEN ALL THE THINGS
TYPES OF RELATIONS
• One to One
• One to Many
• Many to Many
TYPES OF KEYS
• Natural Key
• Alternate Key
• GUID (UUID)
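The difference between a natural key and a generated UUID can be shown with the standard library. In this sketch the choice of (title, system) as the natural key is an assumption about the example data, and the function names are hypothetical.

```python
import uuid


def natural_key(title, system):
    """A natural key is built from attributes the data already has."""
    return (title, system)


def surrogate_key():
    """A GUID/UUID key is generated and carries no business meaning."""
    return str(uuid.uuid4())


ffx_natural = natural_key("FFX", "PS2")
ffx_surrogate = surrogate_key()
```

The natural key changes if a title is corrected, while the UUID stays stable but has to be looked up before it means anything to a person.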
OPTIMIZATION
MAKE IT GO FAST..ER
PICK TWO?
Speed
Small Size
Correct Data
PHYSICS MATTERS
• Make sure you have enough hardware
• Tune your I/O
– Block and Stripe size allocation for RAID configuration
– Transaction logs in the right spot
– Frequently joined tables on separate discs
• Tune your network protocols
• Adjust cache sizes
UPDATE ALL THE THINGS
• Update your operating system
• Update your db software
• Update your communications protocols
TUNE YOUR SYSTEMS
• Check your vendor for configuration tuning
• Perform your recommended maintenance tasks
PROFILE YOUR CODE
• Check for slow queries
• Check the execution plan on the queries
• Add Indexes to speed up joins
• Rewrite or alter queries to make them perform faster
• Create views for frequently run queries so they can be indexed separately
– This is best for common joins
• Move routines for data manipulation into stored procedures
• Create cached or denormalized versions of really slow queries
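Checking an execution plan before and after adding an index might look like this in SQLite, which ships with Python. The table and index names are assumptions, and the exact plan wording differs between SQLite versions and between database products, so the example only looks for keywords.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER PRIMARY KEY, title TEXT, system TEXT)"
)
query = "SELECT title FROM game WHERE system = ?"

before = plan(conn, query, ("PS2",))  # expected to report a full table scan
conn.execute("CREATE INDEX idx_game_system ON game(system)")
after = plan(conn, query, ("PS2",))   # expected to mention the new index
```

Other databases expose the same idea through their own EXPLAIN variants; the habit of comparing the plan before and after a change carries over.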
REFACTORING DATA
MOVING STUFF AROUND SUCKS
DO IT ANYWAY
REFERENTIAL INTEGRITY REFACTORING
• Add constraints
• Remove constraints
• Add Hard Delete
• Add Soft Delete
• Add Trigger for Calculated Column
• Add Trigger for History
• Add Indexes
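As one example from this list, "Add Soft Delete" can be sketched as adding a nullable timestamp column and filtering on it. The deleted_at column and the helper names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (game_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO game (title) VALUES (?)", [("FFX",), ("Forza 4",)])

# Refactoring step: a nullable column marks rows as deleted without removing them.
conn.execute("ALTER TABLE game ADD COLUMN deleted_at TEXT")


def soft_delete(conn, title):
    """Mark a game as deleted instead of removing the row."""
    conn.execute(
        "UPDATE game SET deleted_at = datetime('now') "
        "WHERE title = ? AND deleted_at IS NULL",
        (title,),
    )


def active_titles(conn):
    """Return titles that have not been soft deleted."""
    rows = conn.execute(
        "SELECT title FROM game WHERE deleted_at IS NULL ORDER BY title"
    )
    return [row[0] for row in rows]


soft_delete(conn, "Forza 4")
remaining = active_titles(conn)
total_rows = conn.execute("SELECT COUNT(*) FROM game").fetchone()[0]
```

Rows that reference the game stay valid because nothing is physically removed, but every query now has to remember the deleted_at filter.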
DATA QUALITY REFACTORING
• Add lookup table
• Apply Standard codes
• Apply Standard Type
• Add a column constraint
• Introduce common format
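"Add lookup table" and "Apply Standard codes" often happen together. The sketch below moves a free-text system column into a lookup table of cleaned codes. The game titles are placeholders, and the messy value 'ps2 ' is invented to show the cleanup; neither comes from the example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (
    game_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    system  TEXT NOT NULL
);
-- Placeholder rows; 'ps2 ' is deliberately messy.
INSERT INTO game (title, system)
VALUES ('Game A', 'PS2'), ('Game B', 'ps2 '), ('Game C', 'Xbox360');

-- Lookup table holding one standard code per system.
CREATE TABLE system (
    system_id INTEGER PRIMARY KEY,
    code      TEXT NOT NULL UNIQUE
);
INSERT INTO system (code)
SELECT DISTINCT UPPER(TRIM(system)) FROM game;

-- Point each game at its lookup row.
ALTER TABLE game ADD COLUMN system_id INTEGER REFERENCES system(system_id);
UPDATE game
SET system_id = (
    SELECT s.system_id FROM system s WHERE s.code = UPPER(TRIM(game.system))
);
""")

codes = [row[0] for row in conn.execute("SELECT code FROM system ORDER BY code")]
ids = dict(conn.execute("SELECT title, system_id FROM game"))
```

Once every reader uses system_id, the old free-text column can be dropped in a later step.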
STRUCTURAL REFACTORING
• Add a new element
• Delete an existing element
• Merge elements
• Change association types
• Split elements
ARCHITECTURE REFACTORING
• Replace a method with a view
• Add a calculation method
• Encapsulate a table with a view
• Add a mirror table
• Add a read only table
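"Encapsulate a table with a view" can be sketched as follows: the storage table changes shape, and a view with the old name keeps an existing query working. The game_data table and its name column are hypothetical, and the legacy query stands in for application code that cannot change yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (game_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
INSERT INTO game (title) VALUES ('FFX');
""")

# A query that existing application code is assumed to depend on.
LEGACY_QUERY = "SELECT title FROM game ORDER BY title"
before = conn.execute(LEGACY_QUERY).fetchall()

conn.executescript("""
-- New storage table with a renamed column and room for more data.
CREATE TABLE game_data (
    game_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    system  TEXT
);
INSERT INTO game_data (game_id, name) SELECT game_id, title FROM game;
DROP TABLE game;
-- The view keeps the old name and column so readers are unaffected.
CREATE VIEW game AS SELECT game_id, name AS title FROM game_data;
""")

after = conn.execute(LEGACY_QUERY).fetchall()
```

Readers keep working through the view. Writers usually need more care, since a plain view like this one is not updatable in SQLite without extra triggers.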
LEARNING MORE
• Free University Courses
– Databases are one thing colleges get RIGHT
– MIT, Stanford, and others have great database theory classes
– Warning, many use Python – it won't kill you
• Books
– http://web.cecs.pdx.edu/~maier/TheoryBook/TRD.html - The Theory of Relational Databases
– https://www.amazon.com/Database-Design-Relational-Theory-Normal/dp/1449328016 - Database Design and Relational Theory
– http://databaserefactoring.com/ - Database Refactoring
CONTACT
• auroraeosrose@gmail.com
• @auroraeosrose
• http://emsmith.net
• http://github.com/auroraeosrose
• Freenode
• #phpwomen
• #phpmentoring
• #php-gtk

Database theory and modeling

  • 1. DATABASE THEORY AND MODELING A C R A S H C O U R S E D B A H O W TO
  • 2. WHAT IS A DBA? DEVELOPMENT • Capacity Planning • Database Design • Database Implementation • Migration OPERATIONS • Installation • Configuration • Monitoring • Security and Access Management • Troubleshooting • Backup and Recovery
  • 3. DATABASE THEORY THE STUDY OF DATABASES AND DATA MANAGEMENT SYSTEMS • Finite Model Theory • Database Design Theory • Dependency Theory • Concurrency Control • Deductive Databases • Temporal and Spatial Databases • Uncertain Data • Query Languages and Logic
  • 4. DATA MODELING: TURNING BUSINESS REQUIREMENTS INTO DATA ROADMAPS
  • 5. REASONS FOR MODELING DATA WHAT? • Provide a definition of our data • Provide a format for our data WHY? • Compatibility of data • Lower cost to build, operate, and maintain systems
  • 6. THREE KINDS OF DATA MODEL INSTANCES • Conceptual Data Model • Logical (External) Data Model • Physical Data Model
  • 7. CONCEPTUAL MODEL • Entities that comprise your data • Creating data objects • Identifying any relationships between objects • "Business Requirements"
  • 8. PROJECT SCOPE MY BUSINESS REQUIREMENTS • I have a lot of video games • I want a simple way to be able to find my video games by keywords • And keep track of what system they are for • And keep track of when I last played them and when someone else played them • And keep track of if I beat them, and my kids too
  • 9. CONCEPTUAL MODEL Game • Name • System Keywords • Categories • Type Player • Date • Completed
  • 10. LOGICAL MODEL – FLAT MODEL Game Title System Liz Last Play Pat Last Play Liz Complete Pat Complete Keywords FFX PS2 2016-05-01 2016-06-04 Yes No fantasy, jrpg Chrono Trigger PS1 2014-07-05 Yes No jrpg Forza 4 Xbox360 2017-03-02 No No racing
  • 12. RELATIONAL MODEL • I have a system • I have a game • I have a player • Each game has one system, each system can have many games • Games can have many players, each player can have additional information
  • 13. DATA STORAGE: CHOOSING DATABASE SOFTWARE
  • 14. RELATIONAL DATABASES • Relate (Link) different bits of data to each other • Very reliable places to store data • MUST be ACID
  • 15. ACID COMPLIANCE • Atomicity • Consistency • Isolation • Durability
  • 16. DOCUMENT DATABASES • Schemaless • Good Performance • Speedy and Distributed • Consistency model is BASE • Graph Databases are Document Databases with relationships added for traversal
  • 17. BASE • Basic Availability • Soft State • Eventually Consistent
  • 18. DATA WAREHOUSES • A place to aggregate and store data for reporting and analysis • ETL – Extract – Transform – Load • Data Mart (single subject area) • OLAP (Online Analytical Processing) • OLTP (Online Transaction Processing)
  • 19. SCALING DATABASES • Horizontal (Distributed) • Vertical • Read Replicas • Multi-Master • Partitioning (Sharding)
  • 21. CHOOSE… WISELY • Politics will factor into this! • You don't have to pick just one • Choose the right solution for the right problem • With so much available in cloud services and the ease of using containers, spinning up lightweight places for Redis to use in addition to your PostgreSQL server is not more expensive!
  • 22. NORMALIZATION: ORGANIZING INTO TABLES AND COLUMNS
  • 23. NO MORE ANOMALIES • Update Anomaly • Insertion Anomaly • Deletion Anomaly • Fidelity Anomaly
  • 24. NO DUPLICATED DATA MINIMIZE REDESIGN ON EXTENSION • Store all data in only one place • What happens if I add an additional family member I want to track in my application • The normalized version makes this simple
  • 25. FIRST NORMAL FORM 1NF • Has a Primary Key – can be a COMPOUND key • Has only atomic values • Has no repeated columns
  • 26. SECOND NORMAL FORM 2NF • Table is 1NF AND • All non-key columns are PK dependent
  • 27. THIRD NORMAL FORM 3NF • 2NF PLUS • No Transitively dependent attributes
  • 28. BUT WAIT – THERE'S MORE! • 7 more to be exact • They're not really that useful in most situations • You can learn about them from Wikipedia!
  • 29. DENORMALIZATION • Wait – didn't you just say to normalize things? • Usually has one purpose, increased performance, and should be used sparingly • Doesn't have to be "full" denormalization – Storing count totals of many elements in a relationship – star schema "fact-dimension" models – prebuilt summarizations
  • 30. RELATIONSHIPS: CARDINALITY BETWEEN ALL THE THINGS
  • 31. TYPES OF RELATIONS • One to One • One to Many • Many to Many
  • 32. TYPES OF KEYS • Natural Key • Alternate Key • GUID (UUID)
  • 33. OPTIMIZATION: MAKE IT GO FAST..ER
  • 35. PHYSICS MATTERS • Make sure you have enough hardware • Tune your I/O – Block and Stripe size allocation for RAID configuration – Transaction logs in the right spot – Frequently joined tables on separate discs • Tune your network protocols • Adjust cache sizes
  • 36. UPDATE ALL THE THINGS • Update your operating system • Update your db software • Update your communications protocols
  • 37. TUNE YOUR SYSTEMS • Check your vendor for configuration tuning • Perform your recommended maintenance tasks
  • 38. PROFILE YOUR CODE • Check for slow queries • Check the execution plan on the queries • Add Indexes to speed up joins • Rewrite or alter queries to make them perform faster • Create Views for a query that are indexed separately – This is best for common joins • Move routines for data manipulation into stored procedures • Create cached or denormalized versions of really slow queries
  • 39. REFACTORING DATA: MOVING STUFF AROUND SUCKS, DO IT ANYWAY
  • 40. REFERENTIAL INTEGRITY REFACTORING • Add constraints • Remove constraints • Add Hard Delete • Add Soft Delete • Add Trigger for Calculated Column • Add Trigger for History • Add Indexes
  • 41. DATA QUALITY REFACTORING • Add lookup table • Apply Standard codes • Apply Standard Type • Add a column constraint • Introduce common format
  • 42. STRUCTURAL REFACTORING • Add a new element • Delete an existing element • Merge elements • Change association types • Split elements
  • 43. ARCHITECTURE REFACTORING • Replace a method with a view • Add a calculation method • Encapsulate a table with a view • Add a mirror table • Add a read only table
  • 44. LEARNING MORE • Free University Courses – Databases are one thing colleges get RIGHT – MIT, Stanford, and others have great database theory classes – Warning, many use python – it won't kill you • Books – http://web.cecs.pdx.edu/~maier/TheoryBook/TRD.html - The Theory of Relational Databases – https://www.amazon.com/Database-Design-Relational-Theory-Normal/dp/1449328016 - Database Design and Relational Theory – http://databaserefactoring.com/ Database refactoring
  • 45. CONTACT • auroraeosrose@gmail.com • @auroraeosrose • http://emsmith.net • http://github.com/auroraeosrose • Freenode • #phpwomen • #phpmentoring • #php-gtk

Editor's Notes

  1. Wouldn't it be great if everyone had a DBA to design and manage data for you? Most places don't have this luxury; instead the burden falls on the developer. Your application is awesome, and people are using it everywhere. But is your data storage designed to scale to millions of users in a way that's economical and efficient? Data modeling and theory is the process of taking your application and designing how to store and process your data in a way that won't melt down. This talk will walk through proper data modeling, choosing a data storage type, choosing database software, and architecting data relationships in your system. We'll also walk through "refactoring data" using normalization and optimization. This talk is mainly designed for people (like me) who start off developing and realize that they are not only the dev but the DBA and everything else. Tell a story about moving a website (in 1998) from storage in flat HTML files into a database and having no idea what I was doing.
  2. A DBA has a lot of hats to wear: knowledge of database queries; knowledge of database theory; knowledge of database design; knowledge about the RDBMS itself, e.g. Microsoft SQL Server or MySQL; knowledge of structured query language (SQL), e.g. SQL/PSM or Transact-SQL; a general understanding of distributed computing architectures, e.g. the client-server model; a general understanding of operating systems, e.g. Windows or Linux; a general understanding of storage technologies and networking; and a general understanding of routine maintenance, recovery, and handling failover of a database. Basically DBAs wear two hats. One has to do with day-to-day maintenance and is more of an IT position; this includes tuning systems, troubleshooting, backups, etc. Then there is the design and architecture portion of being a DBA, which is generally the part a programmer gets shoved into with little or no preparation. This talk is designed to give you a crash course in the database theory and modeling portion of being a DBA, and how to make smart choices in your code.
  3. Database theory is all the ways that we store and manage data; all these other things below it are parts of database theory. Finite model theory deals with the relation between a formal language (syntax) and its interpretations (semantics). Database design involves classifying data and identifying interrelationships. This theoretical representation of the data is called an ontology, which is the theory behind the database's design. Dependency theory studies implication and optimization problems related to logical constraints, commonly called dependencies, on databases. Concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible. A deductive database is a database system that can make deductions (i.e., conclude additional facts) based on rules and facts stored in the database (Datalog and Prolog). Temporal and spatial databases are special types storing time data and spatial data like polygons, points, and lines. Uncertain data is data that contains noise that makes it deviate from the correct, intended, or original values. How many does the audience understand or can name?
  4. Wait, why are we modeling our database before we pick what database software technology to use? We have a saying in my current position that answers those user questions of "would it be possible to?": anything is possible; how useful it is and how much effort is involved are the more important questions. Although you could make a database technology store ANY kind of data (and I've seen some pretty horrific shoehorning in my career), you and everyone else will be a lot happier if your software choices help instead of hinder what you're trying to accomplish. But first, you must figure out your data. What are you trying to store and how are you trying to store it? Or, if this isn't a shiny greenfield project, what are you currently storing and how, and what would be the ideal way to store and access the data? Yes, you can (and should!) refactor your data models! Twisting the code into knots or doing things in code the database should be doing is a recipe for downtime. (Story time: working on an unnamed project, to protect the innocent and the guilty, I ended up writing a schema on top of a MongoDB system instead of storing the data in a relational database and having the program output appropriate JSON stored in a cached format.)
  5. The quality of your data model can greatly help or severely hinder your future work. Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces. Data models for different systems are arbitrarily different. The result is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems. Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardized. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper. Another story: we are currently dealing with this structure-and-meaning-of-data problem. The people running the machines on the floor expect different things from the CNC programmers, who expect different things from the engineers. We're currently working on bundling all the data needed for each step of the process, in electronic format, in a data structure that is defined and standardized.
  6. Although this is not the ONLY way to do things, it is a very GOOD way to do things. This idea of three levels of architecture originated in the 1970s (American National Standards Institute, 1975: ANSI/X3/SPARC Study Group on Data Base Management Systems; Interim Report). Yes, SPARC, you heard right. I'll talk about this later, but database theory hasn't really changed a lot: the basic mathematical and logical theories underlying databases and how they work haven't changed. Only our implementations of these theories have changed. Are your brains bleeding yet? Let's get a little more hands on.
  7. Creating a conceptual model of your data can be the most difficult part of any process. Often you're asked to do this when you're not the "domain owner". This is not your data and you don't quite know what people do with it. The BEST way to get this information is to ASK, and then to LISTEN (and write stuff down). Drawing pictures works well too; simple diagrams help people understand.
  8. So this is a pretty basic place to start. In my "concept" I have a list of concrete things (video games) and I want to be able to keep track of information about these games. So this is my basic concept.
  9. So I have a conceptual model of my games. The game has information about it, like a name and the system it's played on. The game also has some keywords I can use for searching, like a game category such as RPG or a play-style type such as first person. Then I want to collect information about playing the game: the player name, the last date they played, and whether the game was completed or not. After the conceptual model for the data is found, we need to turn this into a logical model.
  10. So the logical model is a method of mapping this stuff into what we expect. And anyone who has ever had to deal with any type of business knows their favorite method of storing data: Excel! Because a spreadsheet is the BEST way of storing data, right? In this case we're starting with just a flat model, a way of representing stuff in a straightforward way. But this usually doesn't work really well. First of all, we have spots where there is no information: I hate racing games and first person shooters, and Patrick is not as gung ho about JRPGs. So any rows with those kinds of games will have "empty" columns. That's not very smart. Part of transitioning our conceptual model to our logical model involves dealing with relationships. But what kind of relationships are most important for our data? Well, there's one I see right now…
  11. So all the games do have the advantage of being grouped by system, so I could do a hierarchical model of that. But that doesn't really work that fantastically, does it? It does give me an idea of what kind of data I have, but remember: some types of data are not a hierarchy, and some types of data are not flat.
  12. Some types of data are not relational, but in this case my data IS. Relational data means you have things that, well, have a relationship with each other.
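The relational model described in these notes can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`system`, `game`, `system_id`) and the helper functions are assumptions made for the example, not something from the talk. It shows "each game has one system, each system can have many games" as a foreign key.

```python
import sqlite3

# Hypothetical table and column names; the talk only names the entities.
SCHEMA = """
CREATE TABLE system (
    system_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    -- Each game has exactly one system; a system can have many games.
    system_id INTEGER NOT NULL REFERENCES system(system_id)
);
"""

def build_catalog():
    """Create an in-memory catalog seeded with the games from the flat model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO system (system_id, name) VALUES (?, ?)",
        [(1, "PS1"), (2, "PS2"), (3, "Xbox360")],
    )
    conn.executemany(
        "INSERT INTO game (title, system_id) VALUES (?, ?)",
        [("Chrono Trigger", 1), ("FFX", 2), ("Forza 4", 3)],
    )
    conn.commit()
    return conn

def games_on(conn, system_name):
    """Return the titles of all games for one system, sorted by title."""
    rows = conn.execute(
        "SELECT g.title FROM game g "
        "JOIN system s ON s.system_id = g.system_id "
        "WHERE s.name = ? ORDER BY g.title",
        (system_name,),
    )
    return [row[0] for row in rows]
```

Calling `games_on(build_catalog(), "PS2")` walks the relationship from system to games. With foreign keys switched on, inserting a game that points at a system that does not exist is rejected by the database instead of by application code.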
  13. So we have an idea of the type of data we want to collect. How do we make a decision on what to use?
  14. So relational databases are the oldies but goodies, originally proposed by E. F. Codd in 1970. Almost all of them use SQL for querying and maintaining the database.
  15. ACID is a set of properties intended to guarantee validity even in the event of errors, power failures, etc. In the context of databases, a sequence of database operations that satisfies the ACID properties, and thus can be perceived as a single logical operation on the data, is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. Atomicity: Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes. Consistency: Consistency ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct. Isolation: Transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. Isolation is the main goal of concurrency control; depending on the method used, the effects of an incomplete transaction might not even be visible to other transactions. Durability: Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash). This usually means that completed transactions (or their effects) are recorded in non-volatile memory.
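The bank-transfer example in this note can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The `account` table, the names `alice` and `bob`, and the helper functions are all made up for illustration. The point is atomicity: the credit and the debit either both happen or neither does.

```python
import sqlite3

def make_bank():
    """Create two hypothetical accounts; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money as one transaction; return False if it was rolled back."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # If this debit violates the CHECK constraint, the credit above
            # is undone as well: the transaction is all or nothing.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 30 from `alice` to `bob` succeeds. A transfer of 500 fails on the debit, and the credit that had already been applied is rolled back with it, so the balances are unchanged.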
  16. A document database is designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown[1] with the use of the term NoSQL itself. XML databases are a subclass of document-oriented databases that are optimized to work with XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal. Document-oriented databases are inherently a subclass of the key-value store, another NoSQL database concept. The difference lies in the way the data is processed: in a key-value store, the data is considered to be inherently opaque to the database, whereas a document-oriented system relies on internal structure in the document in order to extract metadata that the database engine uses for further optimization.
  17. For many domains and use cases, ACID transactions are far more pessimistic (i.e., they’re more worried about data safety) than the domain actually requires. Some databases are starting to bring in some of the features of RDBMSs (schemas and ACID compliance), but there's a tradeoff in speed for that ;) Basic Availability: The database appears to work most of the time. Soft state: Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time. Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read time). Given BASE’s loose consistency, developers need to be more knowledgeable and rigorous about consistent data if they choose a BASE store for their application. It’s essential to be familiar with the BASE behavior of your chosen aggregate store and work within those constraints. On the other hand, planning around BASE limitations can sometimes be a major disadvantage when compared to the simplicity of ACID transactions. A fully ACID database is the perfect fit for use cases where data reliability and consistency are essential.
  18. A data warehouse (DW) is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place[2] and are used for creating analytical reports for workers throughout the enterprise.[3] The typical extract, transform, load (ETL)-based data warehouse[4] uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data. OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP systems typically have data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day. The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives. The three basic operations in OLAP are roll-up (consolidation), drill-down, and slicing and dicing. OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, effectiveness is measured by the number of transactions per second. OLTP databases contain detailed and current data.
Benefits: Integrate data from multiple sources into a single database and data model. Congregating more data into a single database means a single query engine can be used to present data in an ODS. Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases. Maintain data history, even if the source transaction systems do not. Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger. Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data. Present the organization's information consistently. Provide a single common data model for all data of interest regardless of the data's source. Restructure the data so that it makes sense to the business users. Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems. Add value to operational business applications, notably customer relationship management (CRM) systems. Make decision-support queries easier to write. Organize and disambiguate repetitive data.[7]
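The star schema and the roll-up and drill-down operations mentioned in this note can be illustrated with a very small sketch using Python's built-in `sqlite3` module. The fact and dimension tables (`fact_play`, `dim_date`, `dim_system`) and all of the numbers are invented for the example. A real warehouse would load these tables through an ETL process.

```python
import sqlite3

def build_mart():
    """Build a toy star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_system (system_key INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        -- Each fact row is a measurement (minutes played) keyed by its dimensions.
        CREATE TABLE fact_play (
            system_key INTEGER NOT NULL REFERENCES dim_system(system_key),
            date_key   INTEGER NOT NULL REFERENCES dim_date(date_key),
            minutes    INTEGER NOT NULL
        );
        INSERT INTO dim_system VALUES (1, 'PS2'), (2, 'Xbox360');
        INSERT INTO dim_date VALUES (201605, 2016, 5), (201606, 2016, 6), (201703, 2017, 3);
        INSERT INTO fact_play VALUES (1, 201605, 120), (1, 201606, 90), (2, 201703, 45);
    """)
    return conn

def minutes_by_year(conn):
    """Roll-up: consolidate the facts to one total per year."""
    return conn.execute("""
        SELECT d.year, SUM(f.minutes)
        FROM fact_play f JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY d.year ORDER BY d.year
    """).fetchall()

def minutes_by_month(conn):
    """Drill-down: the same measure at the finer month level."""
    return conn.execute("""
        SELECT d.year, d.month, SUM(f.minutes)
        FROM fact_play f JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY d.year, d.month ORDER BY d.year, d.month
    """).fetchall()
```

Roll-up and drill-down are the same query grouped at a coarser or finer level of a dimension, which is why the dimensions are arranged as hierarchies.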
  19. Read replicas allow data to be available for reading across any number of servers, called “slaves”. One server remains the “master” and accepts any incoming write requests, along with read requests. This technique is common for relational databases, as most vendors support replication of data to multiple read-only servers. The more read replicas installed, the more read-based queries may be scaled. While the read replica technique allows for scaling out reads, what happens if you need to scale out to a large number of writes as well? The multi-master technique may be used to allow any client to write data to any database server. This enables all read replicas to be masters rather than just slaves, so applications can scale out the number of reads and writes. However, this also requires that our applications generate universally unique identifiers, also known as “UUIDs”, or sometimes referred to as globally unique identifiers or “GUIDs”. Otherwise, two rows in the same table on two different servers might end up with the same ID, causing a data collision during the multi-master replication process. Very large data sets often produce so much data that any one server cannot access or modify the data by itself without severely impacting scale and performance. This kind of problem cannot be solved through read replicas or multi-master designs. Instead, the data must be separated in some way to allow it to be easily accessible. Horizontal partitioning, also called “sharding”, distributes data across servers. Data may be partitioned to different server(s) based on a specific customer/tenant, date range, or other sharding scheme. Vertical partitioning separates the data associated with a single table and groups it into frequently accessed and rarely accessed data. The pattern chosen allows the database and database cache to manage less information at once. In some cases, data patterns may be selected to move data across multiple filesystems for parallel reading and therefore increased performance. GDPR
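Two of the ideas in this note, application-generated UUIDs for multi-master setups and routing rows to a shard by tenant, can be sketched with the Python standard library alone. The shard names and the choice of hash-and-modulo routing are assumptions for illustration. The notes do not prescribe a particular sharding scheme.

```python
import hashlib
import uuid

# Hypothetical server names, for illustration only.
SHARDS = ["db-0", "db-1", "db-2"]

def new_row_id():
    """Generate an ID in the application so two masters never hand out the same one."""
    return str(uuid.uuid4())

def shard_for(tenant_key, shards=SHARDS):
    """Pick a shard for a customer/tenant key using a stable hash.

    Python's built-in hash() is randomized per process, so a digest is
    used instead to keep routing identical across restarts and machines.
    """
    digest = hashlib.sha256(tenant_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

The same tenant key always maps to the same shard, which is what keeps one customer's rows together. One known weakness of plain modulo routing is that changing the number of shards remaps most keys. Schemes such as consistent hashing or a lookup directory are common ways around that.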
  20. The CAP theorem, also called Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:[1][2][3] Consistency: every read receives the most recent write or an error. Availability: every request receives a (non-error) response, without guarantee that it contains the most recent write. Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes. Think of this as being a riff on "fast/cheap/good": you get two! Database systems designed with traditional ACID guarantees in mind, such as RDBMSs, choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency.[6]
  21. There are lots of considerations that come into play beyond just the technical ones. Price, availability, and what your CEO read in a magazine last week will all contribute to this. Can your IT department install this? MySQL does a middling job of everything except being easy to install and administer.
  22. So, let's talk about normalizing data. Normalizing data has a couple of purposes, but it is not the be-all and end-all of databases. Generally, however, normalization SOLVES more problems than it creates.
  23. Basically normalization exists to help get rid of anomalies in data. This means that the data is the same for all things in all places, and we aren't storing duplicated AND POSSIBLY INCORRECT data. What if you spell "checking" with "ck" in one name and "que" in another? What if Patrick moves out and I remove all his game data from my database, except for 50 rows I forgot?
  24. This may seem to be a small thing, but small data can build up over time and take up a lot more space than you'd expect! Normalization really is designed to decrease the amount of pain and suffering when iterating on the design of the database.
  25. So how would we structure my database application? Atomic values basically means you're storing only ONE value, so you can't put two telephone numbers in a telephone column. Now the atomic thing is rather interesting, since one could argue that dates or strings can be "decomposed", which would make them non-atomic by definition. In practice, atomic is currently used to mean "not XML or JSON or some other representation of complex data" … or is simply ignored.
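As a small illustration of the "only atomic values" rule, the flat model's comma-separated Keywords column can be split into one row per keyword. The sketch below is plain Python. The row data is taken from the flat-model slide, while the function name and the choice to key keywords by game title are assumptions made for the example.

```python
# Rows from the flat-model slide; "keywords" holds several values in one cell.
FLAT_ROWS = [
    {"title": "FFX", "system": "PS2", "keywords": "fantasy, jrpg"},
    {"title": "Chrono Trigger", "system": "PS1", "keywords": "jrpg"},
    {"title": "Forza 4", "system": "Xbox360", "keywords": "racing"},
]

def to_first_normal_form(flat_rows):
    """Split the multi-valued keywords cell into one (title, keyword) row each.

    Returns two lists of tuples: games as (title, system) and
    game_keywords as (title, keyword). The title stands in for the key
    here only to keep the example short.
    """
    games = []
    game_keywords = []
    for row in flat_rows:
        games.append((row["title"], row["system"]))
        for keyword in row["keywords"].split(","):
            keyword = keyword.strip()
            if keyword:  # skip empty cells
                game_keywords.append((row["title"], keyword))
    return games, game_keywords
```

After the split, finding every JRPG is an equality match on a single column instead of a substring search inside a list stored as text.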
  26. This basically means that every non-key column should depend on the table's whole primary key. Partial dependencies are removed, i.e., all non-key attributes are fully functionally dependent on the primary key. In other words, non-key attributes cannot depend on a subset of the primary key.
  27. "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key", "so help me Codd".[8] That's Edgar F. Codd, who invented relational database management while working for IBM. Requiring existence of "the key" ensures that the table is in 1NF; requiring that non-key attributes be dependent on "the whole key" ensures 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF.
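To make "nothing but the key" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `maker` attribute that is a fact about the system, not about the game, so it lives in the `system` table and not on every game row. The table names, the extra game title, and the deliberate "Sonny" typo are all invented for the example.

```python
import sqlite3

def build_normalized():
    """Games reference a system; the maker is stored once, with the system."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE system (
            name  TEXT PRIMARY KEY,
            maker TEXT NOT NULL          -- a fact about the system only
        );
        CREATE TABLE game (
            game_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            system  TEXT NOT NULL REFERENCES system(name)
        );
        INSERT INTO system VALUES ('PS2', 'Sonny'), ('Xbox360', 'Microsoft');
        INSERT INTO game (title, system) VALUES
            ('FFX', 'PS2'), ('Okami', 'PS2'), ('Forza 4', 'Xbox360');
    """)
    return conn

def set_maker(conn, system, maker):
    """Correct the maker in the single place it is stored."""
    with conn:
        conn.execute("UPDATE system SET maker = ? WHERE name = ?", (maker, system))

def makers_by_title(conn):
    """Return {game title: maker}, derived through the join."""
    return dict(conn.execute(
        "SELECT g.title, s.maker FROM game g JOIN system s ON s.name = g.system"
    ))
```

Fixing the typo with `set_maker(conn, "PS2", "Sony")` touches one row, and every PS2 game picks up the correction through the join. If the maker had been copied onto each game row, a transitive dependency, the same fix would be an update anomaly waiting to happen.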
  28. And… no one cares.
  29. Now that I've preached on how to normalize databases, I'm going to tell you it's perfectly fine to denormalize AFTER you've normalized and AS NEEDED. You may find that one or two queries or tables constitute most of your speed problems, and judicious use of denormalization can help.
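One of the partial denormalizations listed on the slide, "storing count totals of many elements in a relationship", can be sketched with Python's built-in `sqlite3` module. The `play_count` column, the `play` table, and the trigger name are assumptions for the example. The trigger keeps the stored total in step with the rows it summarizes.

```python
import sqlite3

def build_library():
    """Create a game table with a cached play_count kept current by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE game (
            game_id    INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            play_count INTEGER NOT NULL DEFAULT 0   -- denormalized total
        );
        CREATE TABLE play (
            play_id   INTEGER PRIMARY KEY,
            game_id   INTEGER NOT NULL REFERENCES game(game_id),
            played_on TEXT NOT NULL
        );
        -- Bump the cached total whenever a play is recorded.
        CREATE TRIGGER play_added AFTER INSERT ON play
        BEGIN
            UPDATE game SET play_count = play_count + 1
            WHERE game_id = NEW.game_id;
        END;
        INSERT INTO game (game_id, title) VALUES (1, 'FFX');
    """)
    return conn

def record_play(conn, game_id, played_on):
    """Insert one play; the trigger updates the cached count in the same transaction."""
    with conn:
        conn.execute("INSERT INTO play (game_id, played_on) VALUES (?, ?)",
                     (game_id, played_on))

def cached_count(conn, game_id):
    """Read the denormalized total without touching the play table."""
    return conn.execute("SELECT play_count FROM game WHERE game_id = ?",
                        (game_id,)).fetchone()[0]

def true_count(conn, game_id):
    """Count the underlying rows, the normalized source of truth."""
    return conn.execute("SELECT COUNT(*) FROM play WHERE game_id = ?",
                        (game_id,)).fetchone()[0]
```

This sketch only handles inserts. A fuller version would also need triggers for deletes and for changes to `game_id`, and that extra bookkeeping is the price of denormalizing.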
  30. Often you'll see subsets of these, such as zero or one, only one, and one to zero or many. You should be connecting tables that represent entity types. Many-to-many relations are generally done using an association table: the relationship becomes an entity in a table linking them together.
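The association-table approach can be sketched with Python's built-in `sqlite3` module. The `game_player` table and its columns are illustrative names, and the sample rows are loosely based on the flat-model slide. Which player each single date belongs to is an assumption, since the flattened table does not say.

```python
import sqlite3

def build_play_log():
    """Games and players linked many-to-many through game_player."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE game   (game_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE player (player_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
        -- The relationship is its own entity and carries its own facts.
        CREATE TABLE game_player (
            game_id     INTEGER NOT NULL REFERENCES game(game_id),
            player_id   INTEGER NOT NULL REFERENCES player(player_id),
            last_played TEXT,
            completed   INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (game_id, player_id)   -- compound key
        );
        INSERT INTO game VALUES (1, 'FFX'), (2, 'Chrono Trigger'), (3, 'Forza 4');
        INSERT INTO player VALUES (1, 'Liz'), (2, 'Pat');
        INSERT INTO game_player VALUES
            (1, 1, '2016-05-01', 1), (1, 2, '2016-06-04', 0),
            (2, 1, '2014-07-05', 1), (3, 2, '2017-03-02', 0);
    """)
    return conn

def players_of(conn, title):
    """Names of everyone who has played the given game."""
    rows = conn.execute(
        "SELECT p.name FROM game_player gp "
        "JOIN game g ON g.game_id = gp.game_id "
        "JOIN player p ON p.player_id = gp.player_id "
        "WHERE g.title = ? ORDER BY p.name", (title,))
    return [r[0] for r in rows]

def completed_by(conn, name):
    """Titles the given player has finished."""
    rows = conn.execute(
        "SELECT g.title FROM game_player gp "
        "JOIN game g ON g.game_id = gp.game_id "
        "JOIN player p ON p.player_id = gp.player_id "
        "WHERE p.name = ? AND gp.completed = 1 ORDER BY g.title", (name,))
    return [r[0] for r in rows]
```

Compared with the flat model, adding another family member is a new row in `player` instead of two new columns on every game, and a game nobody has played simply has no rows in `game_player`, so there are no empty cells.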
  31. Example of a natural key: a state's two-letter abbreviation.
  32. So on to the last part of being a DBA, which usually comes after you have stuff written. You have to optimize it! But what does it mean to "optimize" your database? What does "fast" mean for a database? The answer is always: it depends. Are you focused on your data always being correct? Or on fast load times? Or on small storage space?
  33. As in all things, you're not always going to be able to optimize for everything. Usually faster is going to mean you are storing more on disk, via caching or a denormalized layout or something else. Usually correct data is going to come about by making things less concurrent and more robust, with more checks (hence… slower). Usually small size means you're storing as little as possible in a very optimized way, which generally means more work for your application. As long as you understand the tradeoffs you can "speed things up".
  34. No matter what you do to optimize, you are going to hit physical barriers. Sometimes "speeding up your database" means throwing more hardware at the problem. There is a finite amount of processing that any system will be able to do, so the solution may be two systems instead.
  35. Most of this section tends to go a bit into "no brainer" land. You want your db to go faster? Keep your software up to date. Updates like these are "easy" in theory but possibly "expensive" to do in practice. But building in a cadence of upgrading systems will keep you and your users happier.
  36. Tune your database management system. That sounds "easy" as well, but it is made more difficult by the fact that each vendor has its own requirements for tuning. Generally this is a process of checking your vendor's best practices and benchmarking memory allocation, caches, and concurrency settings (like reserving processors or memory), and fiddling with network protocols. Maintenance tasks can involve things like vacuuming PostgreSQL databases, defragmentation, statistics updates, adjusting the size of transaction logs, and rotating and offloading logs. I had a SQL Server system running like a dog; a 50 GB transaction log from a migration will do that to you.
  37. This should be last on your list. And don't just guess; actually check which queries are slow. Almost every database has a way to log slow queries, and most frameworks and db abstraction layers have logging and timing functionality to catch exceptionally slow queries.
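"Check the execution plan" and "add indexes" can be tried out with a short sketch using Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The table, the index name `idx_game_system`, and the generated rows are made up for illustration. Other databases expose plans through their own commands, and their output looks different.

```python
import sqlite3

# Hypothetical query to profile.
QUERY = "SELECT title FROM game WHERE system = ?"

def build_games(n=1000):
    """Create a game table with n generated rows and no secondary index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE game ("
        " game_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " system TEXT NOT NULL)"
    )
    systems = ["PS1", "PS2", "Xbox360"]
    conn.executemany(
        "INSERT INTO game (title, system) VALUES (?, ?)",
        [(f"Game {i}", systems[i % 3]) for i in range(n)],
    )
    conn.commit()
    return conn

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)

def add_system_index(conn):
    """Index the column used in the WHERE clause."""
    conn.execute("CREATE INDEX idx_game_system ON game (system)")
```

Before the index exists, the plan reports a scan of the whole table. After `add_system_index(conn)`, the plan for the same query names `idx_game_system`. The exact wording of the plan text varies between SQLite versions, so it is best read by a person and not parsed by code.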
  38. The biggest issue with refactoring data is the possibility of data loss, so most people tend to shy away from large data refactors EVEN if a data refactor would cut their code in half. This is a fallacy. Think about the word refactoring: it's a small change to the database schema that improves its design without changing its semantics. The #1 issue with database refactoring is COMMUNICATION BETWEEN THOSE RESPONSIBLE FOR THE CODE AND THOSE RESPONSIBLE FOR THE DATABASE. Code refactorings only need to maintain behavioral semantics, while database refactorings must also maintain informational semantics. Database refactoring does not change the way data is interpreted or used, and does not fix bugs or add new functionality. Every refactoring to a database leaves the system in a working state, thus not causing maintenance lags, provided the meaningful data exists in the production environment.
  39. These are generally some of the easiest and most effective refactors you can do on a database. Discuss briefly how each thing could help with making your application better.
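As one example from the referential integrity slide, "Add Soft Delete" can be sketched with Python's built-in `sqlite3` module. The `deleted_at` column and the `active_game` view are names chosen for illustration. The idea is that a delete becomes an update, so the row and anything that references it stay in place.

```python
import sqlite3

def build_shelf():
    """Create a game table where deletion is recorded, not performed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE game (
            game_id    INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            deleted_at TEXT              -- NULL means the row is still live
        );
        -- Readers use the view, so they never see soft-deleted rows.
        CREATE VIEW active_game AS
            SELECT game_id, title FROM game WHERE deleted_at IS NULL;
        INSERT INTO game (title) VALUES ('FFX'), ('Forza 4');
    """)
    return conn

def soft_delete(conn, title):
    """Mark a game as deleted and keep the row."""
    with conn:
        conn.execute(
            "UPDATE game SET deleted_at = datetime('now') "
            "WHERE title = ? AND deleted_at IS NULL", (title,))

def active_titles(conn):
    """Titles that have not been soft-deleted."""
    return [r[0] for r in conn.execute(
        "SELECT title FROM active_game ORDER BY title")]

def total_rows(conn):
    """Count every row, including soft-deleted ones."""
    return conn.execute("SELECT COUNT(*) FROM game").fetchone()[0]
```

After `soft_delete(conn, "Forza 4")` the view shows only `FFX`, but the table still holds both rows, so the delete can be undone or audited later. The tradeoff is that every query has to remember the filter, which is why the sketch hides it behind a view.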
  40. A lookup table is easy. Standard codes would be making sure the same country/state codes as those in a lookup table are used. Standard type would be making sure all phone numbers are the same sized integer. Make sure your column constraint gives you logical values, like age should be > 0 but less than 200. Common format would be making sure all your phone numbers are stored as integers with no separator values. Most of these will require a few steps: change the code to make sure the values are checked properly before coming in, run a migration on the data to make sure the values are correct, and change the database if necessary. These are also less "lossy" types of refactoring, but they tend to improve the quality of the data being stored.
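The lookup table and the column constraint from this note can be sketched together with Python's built-in `sqlite3` module. The `state` and `person` tables and the sample values are hypothetical. The age rule is the one from the note (greater than 0 and less than 200).

```python
import sqlite3

def build_people():
    """Create a person table guarded by a lookup table and a CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        -- Lookup table: the only state codes the application may use.
        CREATE TABLE state (
            code TEXT PRIMARY KEY CHECK (length(code) = 2),
            name TEXT NOT NULL
        );
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            age        INTEGER NOT NULL CHECK (age > 0 AND age < 200),
            state_code TEXT NOT NULL REFERENCES state(code)
        );
        INSERT INTO state VALUES ('MI', 'Michigan'), ('OH', 'Ohio');
    """)
    return conn

def try_insert(conn, name, age, state_code):
    """Insert a person; return False if a constraint rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO person (name, age, state_code) VALUES (?, ?, ?)",
                (name, age, state_code))
        return True
    except sqlite3.IntegrityError:
        return False
```

A row with age 250 or with a state code missing from the lookup table is rejected by the database itself. Application code should still validate input, but the constraints act as a backstop for any path that skips that validation, migrations included.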
  41. By element here I mean a table, view, or column. These are the "hard" problems: the changes that might make your code much nicer, but require a good deal of work. And without tests!! and backups!! this can bite you. The best thing to do in this case is make SMALL changes a little at a time AND TEST.
  42. These are generally large changes to the actual architecture of the application, not just to the relationships or the data or the structure. These are changes that can have the greatest impact on performance.
  43. There are a lot of places to learn more about databases. But really the BEST way to learn is to DO: play around with a new system. Think of how you'd redo your present storage mechanism if you could. It might lead to actually being able to do it for real.
  44. Aurora Eos Rose is the handle I've had forever: the Greek and Roman goddesses of the dawn, and Aurora Rose from Sleeping Beauty.