The evolution of database technology (I)
Huibert Aalbers
Senior Certified Executive IT Architect
IT Insight podcast
• This podcast belongs to the IT Insight series
• You can subscribe to the podcast through iTunes.
• Additional material such as presentations in PDF format or white
papers mentioned in the podcast can be downloaded from the IT
insight section of my site at http://www.huibert-aalbers.com
• You can send questions or suggestions regarding this podcast to my
personal email, huibert_aalbers@mac.com
Hierarchical databases
• In the 60’s IBM launched the first
computers equipped with a hard disk
drive
• This spurred the development of a
technology to store, process and
retrieve data. IMS, in 1968, became
the first commercial database
software, developed by IBM to
inventory the very large bill of
materials (BOM) for the Saturn V moon
rocket and Apollo space vehicle.
• IMS was the first hierarchical database
Hierarchical databases
• Hierarchical databases have a serious
limitation. They only support 1 to n
relationships, which makes data
modeling difficult
• A parent can have multiple
children
• A child can only have a single
parent
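The single-parent constraint can be sketched in a few lines of Python. The `Node` class and the part names below are hypothetical and only illustrate the rule; they are not how IMS is implemented. Each record keeps exactly one parent reference, so a part that belongs to two assemblies cannot be attached twice and would have to be duplicated.

```python
class Node:
    """A record in a hierarchical store: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        # A child can only have a single parent, so re-attaching is rejected.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)

    def path(self):
        # Walk up the single parent chain to build the access path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical bill of materials.
rocket = Node("rocket")
stage1 = Node("stage1", rocket)
stage2 = Node("stage2", rocket)
bolt = Node("bolt", stage1)
```

Calling `stage2.add_child(bolt)` raises `ValueError`, which is the n to m relationship that this model cannot express directly.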
Hierarchical databases
• The best-known hierarchical
databases are
• IMS (still popular in large banks)
• Windows registry
• LDAP directories (depending on the
implementation)
• Hierarchical databases still have a
significant performance edge over
more modern relational databases
Relational databases
• In 1970, Ted Codd, a British
mathematician who worked at IBM,
published a paper titled “A relational
model of data for large shared data
banks”
• His groundwork generated much interest
in the information management world and
spurred the creation of new companies
such as Oracle (1977) or Informix (1980)
that implemented Codd’s ideas.
Meanwhile, IBM developed DB2, which
first appeared on mainframes (1983) and
later on distributed platforms.
Relational databases
• For over thirty years, relational
databases have ruled the database
market, based on their undeniable
strengths
• During that period, users have shaped
the evolution of the technology by
demanding new features and
increased performance
Strengths of Relational Databases
• Great technology to store large
volumes of structured data
• The consistency of the data is
guaranteed through the
implementation of the ACID properties
• Atomicity
• Consistency
• Isolation
• Durability
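As a minimal sketch of atomicity and consistency, the following uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` function and the amounts are assumptions made up for this illustration. The credit is applied first on purpose, so that a failed debit shows the whole transaction being rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move money between two accounts atomically; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit
        # made by the first statement was rolled back as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

A first `transfer(conn, "a", "b", 60)` succeeds; repeating it fails because account `a` would go negative, and the balances stay exactly as they were before the failed attempt.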
User requirements that have shaped modern
relational databases
• Increased scalability
• Ability to perform complex queries against
large data sets (Data warehousing)
• Support for new programming
languages and types of data
• Requirements inspired by trends in
modern programming languages
• Improved administration features to ease
management of large numbers of
database instances
Increased scalability
• Symmetric Multiprocessing (SMP)
• IBM System/360 Model 65 (1967)
• UNIX (starting in the mid 80’s)
• Support for multiple processor cores
• Power 4 (2001)
• Data partitioning
• SQL query optimizer improvements
• Data compression
• Increased use of RAM
• Clustering
What are the bottlenecks of relational databases?
• I/O
• SQL joins
• Transactions (Locks), Distributed
Transactions (Two phase commit)
• Concurrency
• Hardware
Data partitioning
• Hard disk drives used to be the main
bottleneck which prevented quick data
access. That is why a system was
needed to access the data from
multiple disks, in parallel.
• A partitioned table has its data spread
over multiple disks, based on:
• Expression
• Range
• Round-Robin
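The three strategies above can be sketched as small routing functions that return the index of the disk (or partition) a row goes to. The column names `customer_id` and `amount`, and the function names, are assumptions for this illustration; real products expose partitioning through DDL rather than application code.

```python
import bisect
import itertools


def by_expression(row, n_disks):
    # Expression-based: any deterministic expression over the row picks the disk.
    return row["customer_id"] % n_disks


def by_range(row, upper_bounds):
    # Range-based: upper_bounds is a sorted list of exclusive upper limits;
    # values at or above the last bound go to one extra overflow partition.
    return bisect.bisect_right(upper_bounds, row["amount"])


def round_robin(n_disks):
    # Round-robin: ignore the row content and cycle through the disks.
    counter = itertools.cycle(range(n_disks))
    return lambda row: next(counter)
```

Expression and range partitioning let the database skip disks that cannot contain matching rows, while round-robin only guarantees an even spread.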
Data compression
• Data compression allows for significant storage (and
therefore money) savings. In addition, and this may
sound counterintuitive, it also increases
performance, since data is read much faster (with
less I/O), especially when data is stored in a columnar
form. Administrators can choose to compress:
• Data
• Indices
• Blobs
• Results are spectacular
• Up to 80% less space needed to store the data
• Up to 20% less I/O
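Why columnar storage helps compression can be illustrated with run-length encoding, one simple technique among many; the slides do not say which algorithms a given product uses. The sample table and the helper names are assumptions for this sketch.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical table of (country, status) rows, sorted by country.
rows = [("MX", "open"), ("MX", "closed"), ("MX", "open"), ("US", "open"), ("US", "open")]

# Row layout interleaves the columns, so equal values are rarely adjacent.
row_layout = [field for row in rows for field in row]
# Column layout stores each column contiguously, producing long runs.
country_column = [row[0] for row in rows]
```

Here the five-value country column collapses into two runs, while the interleaved row layout has no repeated neighbours and does not shrink at all.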
In-memory databases
These databases store data in memory (RAM) instead of on hard disk drives to
scale better and support extremely high transaction volumes
• This technology was originally designed to meet the needs of specific
industries (primarily telcos and financial institutions) that required
processing unusually high volumes of transactions
Recently, the line that divided in-memory databases from traditional databases
has started to blur with the introduction of databases such as DB2 BLU, which
automatically try to make the most use of RAM to improve performance without
requiring all the data to be loaded in memory
BLU (Oracle Exalytics)
Data Warehouse
• The need to analyze vast amounts of data was the first
application that challenged the dominance of the RDBMS as
the only tool required to work with data, as performance
became a serious issue.
• In order to avoid impacting the performance of OLTP
(OnLine Transaction Processing) databases, common
sense dictated that data analysis should be performed on
a different data store. As a result, the process is as
follows:
• The data is first moved from the OLTP database to an
operational data store (ODS), a repository used to
transform the data before it can be used
• Then, the data is moved to the database in which the
information is analyzed, the Data Warehouse (DW)
• This process is at the origin of the spectacular growth
in the use of ETL (Extraction, Transformation and
Load) and data quality tools
Extraction-Transformation-Load
ETL tools
• In Data warehouse environments, it is
common to update the data regularly
(usually nightly) with the latest information
from the transactional systems (OLTP). In
general, the data needs to be transformed
before it can be loaded into the DWH.
• In addition, it is still very common to
exchange data between systems by
sending flat files from one computer to
another.
• This will probably disappear over time,
as we move to a world where systems
need to be online at all times.
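A nightly load of the kind described above can be sketched with two in-memory SQLite databases standing in for the OLTP system and the DWH. The table names, columns and transformation rules are assumptions for this illustration, not taken from any particular ETL tool.

```python
import sqlite3


def extract(oltp):
    """Read the source rows from the transactional system."""
    return oltp.execute("SELECT id, customer, amount_cents FROM orders").fetchall()


def transform(rows):
    # Normalize customer names and convert cents to a decimal amount.
    return [(oid, customer.strip().upper(), cents / 100) for oid, customer, cents in rows]


def load(dwh, rows):
    # INSERT OR REPLACE keeps the load repeatable if the job is run twice.
    with dwh:
        dwh.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)


def run_etl(oltp, dwh):
    load(dwh, transform(extract(oltp)))


oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, " acme ", 1250), (2, "Globex", 900)])
oltp.commit()

dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
```

Calling `run_etl(oltp, dwh)` copies both orders into `fact_orders` in their cleaned-up form; running it again leaves the warehouse unchanged.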
Data Replication
• In modern environments which are online 24/7,
exchanging flat files to share information among
systems is not a viable solution.
• As soon as a change happens in one
database, this needs to be reflected in the
other repositories that require the
information.
• The data replication needs to be guaranteed,
even if one of the repositories is momentarily
off-line
• Most databases include some kind of built-in
replication functionality, but it is usually
limited in scope, i.e. not allowing replication
between databases from different brands.
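The guaranteed-delivery requirement can be sketched as a per-replica queue of pending changes. The `Primary` and `Replica` classes and the `online` flag are hypothetical; a real replication product would persist the queue and handle conflicts, which this sketch does not.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}
        self.online = True

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Applies changes locally and forwards them to each replica in order."""

    def __init__(self, replicas):
        self.data = {}
        self.pending = {replica: deque() for replica in replicas}

    def write(self, key, value):
        self.data[key] = value
        for replica, queue in self.pending.items():
            queue.append((key, value))
            self.flush(replica)

    def flush(self, replica):
        # Deliver queued changes only while the replica is reachable;
        # otherwise they stay queued until flush is called again.
        queue = self.pending[replica]
        while replica.online and queue:
            replica.apply(*queue.popleft())
```

A replica that is momentarily off-line misses nothing: its changes accumulate in the queue and are applied, in the original order, once `flush` is called after it comes back.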
Data cleansing and enrichment
• Analyzing dirty data is simply not possible. It
needs to be cleansed.
• Standardize data (addresses, names, etc.)
• Eliminate duplicates, erroneous data, etc.
• Further, deeper analysis can be performed
when the data is enriched with additional data
• Geocoding (distance to store)
• Demographics (age, sex, marital status,
estimated income, house ownership,
attended college, political leaning, charitable
giving, number of cars owned, etc.)
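Standardization and duplicate removal can be sketched as follows. The record fields and the single abbreviation rule ("st" becomes "street") are deliberately naive assumptions; production data quality tools rely on much richer dictionaries and fuzzy matching.

```python
import re


def standardize(record):
    """Return a copy with name and address in a canonical form."""
    name = " ".join(record["name"].split()).title()
    address = " ".join(record["address"].split()).lower()
    # Expand a trailing or standalone "st" / "st." abbreviation.
    address = re.sub(r"\bst\.?(?=\s|$)", "street", address)
    return {"name": name, "address": address}


def deduplicate(records):
    """Standardize every record and keep the first of each identical pair."""
    seen = set()
    unique = []
    for record in map(standardize, records):
        key = (record["name"], record["address"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

With this, `"  john   SMITH"` at `"12 Main St."` and `"John Smith"` at `"12 main street"` are recognized as the same customer only after both have been standardized.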
Data Warehouse
• Data is usually kept in a star schema, a special
case of the snowflake schema, which handles
simpler DWH queries effectively
• Fact tables are at the centre of the schema and
surrounded by the dimension tables.
• These tables are usually not normalized, for
performance reasons. Referential integrity is not
a concern as the data is usually imported from
databases that enforce referential integrity.
• Specialized DWH databases can load data
very quickly, run queries very fast by using
specialized indices and execute critical
operations such as building aggregate tables in
optimized ways
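A tiny star schema can be sketched in SQLite: one fact table in the centre, joined to whichever dimension table the question requires. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and the sample figures are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount INTEGER);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_store   VALUES (10, 'north'), (20, 'south');
    INSERT INTO fact_sales  VALUES (1, 10, 30), (1, 20, 20), (2, 10, 50), (2, 10, 5);
    """
)


def sales_by(conn, dimension_table, key, attribute):
    """Total sales grouped by one attribute of one dimension table."""
    # Identifiers are interpolated, so they must come from trusted code, not users.
    sql = (
        f"SELECT d.{attribute}, SUM(f.amount) FROM fact_sales f "
        f"JOIN {dimension_table} d ON d.{key} = f.{key} "
        f"GROUP BY d.{attribute} ORDER BY d.{attribute}"
    )
    return conn.execute(sql).fetchall()
```

The same fact table answers "sales by category" and "sales by region" with a single join each, which is what makes the star layout convenient for simple DWH queries.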
BLU
Database clustering
If web servers can scale horizontally, why can’t
relational databases do the same? Couldn’t we share
the workload among multiple computer nodes?
In order to achieve that, computer scientists have
created two distinct architectures
• Shared disk
• Multiple instances of the database, all
pointing to a single copy of the data
• Shared Nothing
• Multiple instances of the database, each
one owning part of the data set (the data is
partitioned)
Shared Disk
• Pros
• If one database instance or even a computing
node fails, the system keeps working
• Good performance when reading data, even
though the shared disk can become a bottleneck
• Cons
• Write operations become the main bottleneck
(especially when using more than two nodes),
because all the nodes need to be coordinated
• This can be mitigated by partitioning the data
• If the shared disk fails, the whole system fails
• Recovery after a node fails is a lengthy operation
pureScale
Shared Nothing
XPS, EEE/DPF
• Pros
• In general, write operations are extremely fast
• Scales linearly
• Cons
• Read operations can be slower when queries
execute joins on data residing on different disks
• This also applies, to a lesser extent, to write
operations on data residing on multiple disks
• If a computing node or its disk fails, all that data
becomes unavailable
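The trade-offs listed above can be sketched with a toy cluster in which every key is owned by exactly one node. The `DataNode` and `SharedNothingCluster` classes, the modulo routing and the integer keys are assumptions for this illustration.

```python
class DataNode:
    def __init__(self):
        self.rows = {}
        self.up = True


class SharedNothingCluster:
    def __init__(self, n_nodes):
        self.nodes = [DataNode() for _ in range(n_nodes)]

    def _owner(self, key):
        # Each key belongs to exactly one node (integer keys assumed here).
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        # A write touches a single node, with no coordination with the others.
        node = self._owner(key)
        if not node.up:
            raise RuntimeError("owning node is down")
        node.rows[key] = value

    def get(self, key):
        node = self._owner(key)
        if not node.up:
            raise RuntimeError("owning node is down")
        return node.rows[key]

    def total(self):
        # A query over the whole data set has to visit every node.
        if not all(node.up for node in self.nodes):
            raise RuntimeError("part of the data set is unavailable")
        return sum(sum(node.rows.values()) for node in self.nodes)
```

Marking one node as down leaves the keys owned by the other nodes readable, but the keys it owns, and any query that spans the whole data set, become unavailable.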
Data marts
• The first DWHs grew extremely quickly
until they became too hard to manage
• That is why organizations started to
build specialized Data marts by
function (HR, finance, sales, etc.) or
department
• In order to avoid creating information
silos, all data marts should use the
same dimensions
• This is usually enforced by the ETL
tools
OLAP cubes
The data is stored in a repository using a
star schema, which in turn is used to
build a multidimensional cube to analyze
the information through multiple
dimensions (sales, regions, time periods,
etc.)
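One simple way to picture a cube is a dictionary that holds a pre-computed total for every combination of dimension values. The dimension names, the `sales` measure and both helper functions are assumptions for this sketch; real MOLAP engines use far more compact storage.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "region", "quarter")


def build_cube(facts):
    """Pre-aggregate the measure for every combination of dimensions."""
    cube = defaultdict(int)
    for fact in facts:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((d, fact[d]) for d in dims)
                cube[key] += fact["sales"]
    return dict(cube)


def query(cube, **filters):
    # Look up one pre-computed cell; dimensions not mentioned are rolled up.
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0)


# Hypothetical fact rows taken from the star schema.
facts = [
    {"product": "books", "region": "north", "quarter": "Q1", "sales": 30},
    {"product": "books", "region": "south", "quarter": "Q1", "sales": 20},
    {"product": "games", "region": "north", "quarter": "Q2", "sales": 50},
]
```

After `cube = build_cube(facts)`, a call such as `query(cube, product="books")` is a single dictionary lookup, at the cost of storing one cell per combination, which grows quickly with the number of dimensions.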
MOLAP / ROLAP tools
• MOLAP tools (Multidimensional OLAP)
load data into a cube, on which the user
can quickly execute complex queries
• ROLAP tools (Relational OLAP) transform
user queries into complex SQL queries
that are executed on a relational database
• This requires a relational database that
has been optimized to handle data
warehouse type queries
• In addition, to improve performance,
aggregate tables need to be built
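The ROLAP approach can be sketched as a function that turns an analytical request into SQL, plus an aggregate table built ahead of time. The `to_sql` helper, the `sales` table and the `sales_by_region` aggregate are assumptions for this illustration, and `group_by` is assumed to be non-empty.

```python
import sqlite3


def to_sql(table, measure, group_by, filters=None):
    """Translate a simple analytical request into a SQL string plus parameters."""
    # Table and column names are interpolated and must come from trusted code.
    filters = filters or {}
    columns = ", ".join(group_by)
    sql = f"SELECT {columns}, SUM({measure}) FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += f" GROUP BY {columns} ORDER BY {columns}"
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "Q1", 30), ("north", "Q2", 50), ("south", "Q1", 20), ("north", "Q1", 5)],
)
# Aggregate table: totals pre-computed once so repeated queries avoid the detail rows.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region"
)
conn.commit()
```

Running the generated SQL against `sales` or against `sales_by_region` returns the same regional totals; the aggregate table simply has far fewer rows to scan.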
MOLAP tools
• Pros
• In most cases delivers better performance, due
to index optimization, data cache and efficient
storage mechanisms
• Lower disk space usage due to efficient data
compression techniques
• Aggregate data is built automatically
• Cons
• Loading large data sets can be slow
• Working with models that include large amounts
of data and a large number of dimensions is
highly inefficient
ROLAP tools
• Pros
• Usually scales better (dimensions and records)
• Loading data with a robust ETL usually is much
faster
• Cons
• Generally offers worse performance when both
MOLAP and ROLAP tools can perform the job. This
can however be mitigated by using ad-hoc
database extensions (for example DB2 cubes)
• Depends on SQL, which does not translate
well to some particular use cases
(budgeting, financial reporting, etc.)
• Uses much more space on disk
HOLAP tools
HOLAP (Hybrid Online Analytical Processing)
is a combination of ROLAP and MOLAP
With this technology it becomes possible to
store part of the data in a MOLAP repository
and the remaining information in a ROLAP
one in order to choose the best strategy for
each case. For example:
• Keep large tables with the detailed data
in a relational database
• Keep aggregate data in a MOLAP
repository
Hardware (Appliances)
In order to obtain the best performance from
the software and simplify database
management, some manufacturers have
opted to develop integrated hardware
and software solutions (a.k.a. appliances)
• It simplifies configuration (loading data,
index and schema creation, etc.).
• It simplifies maintenance (standard
components and streamlined support)
• It gets the most performance out
of the hardware by using specialized
chips and optimized storage devices
Hardware (CPU)
• The IBM Power 8 microprocessor was
specifically designed to excel at data
processing applications
• Large memory cache (512 KB for each
core, 96 MB shared L3 cache and a
128 MB L4 cache outside the chip)
• 8 threads per core, 12 cores per
chip (96 threads per chip)
• Up to 5GHz
Columnar databases
BLU
In Business Analytics environments, it is very unlikely that all the columns of a
record will be required as part of the result of a query or in the WHERE clause
• Having the data organized by columns instead of by records (rows)
significantly improves query times because usually much less information
has to be read from disk
• Modern databases such as DB2 BLU have been designed to excel both in
OLTP as well as in OLAP environments. That means that DBAs can choose
at database or table creation time how the data will be stored on disk
(columns or rows)
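The difference in data read can be sketched by counting how many stored values each layout touches to sum one column. The `RowStore` and `ColumnStore` classes and the sample columns are assumptions for this illustration; they model the access pattern only, not the on-disk format of any product.

```python
class RowStore:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record

    def sum_column(self, column):
        # Every record is read in full, even though only one field is needed.
        values_read = sum(len(row) for row in self.rows)
        return sum(row[column] for row in self.rows), values_read


class ColumnStore:
    def __init__(self, rows):
        # Pivot the records into one list per column.
        self.columns = {name: [row[name] for row in rows] for name in rows[0]}

    def sum_column(self, column):
        # Only the requested column is read.
        values = self.columns[column]
        return sum(values), len(values)


# Hypothetical table with four columns and three records.
rows = [{"id": i, "price": 10 * i, "qty": i, "discount": 0} for i in range(1, 4)]
```

Both stores return the same total for `price`, but the row layout touches all twelve stored values while the column layout touches three; the gap widens as tables get wider.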
Support for new data types
During the 90s, developers started to ask for
expanded datatype support in relational databases
• Distinct types based on existing types
• STRUCT-like composite types
• Completely new data types with their own
indexing methods (videos, pictures, sound)
• Time series
• Coordinates (2D, 3D)
• Text documents
• XML, Word, PDF, etc.
• Etc.
Requirements inspired by trends in modern
programming languages
• Inheritance
• Tables and types that inherit part of
their structure from other tables/types
• Polymorphism
• More flexibility to define/overload
functions, stored procedures and
operators
• Stored procedures written in modern
programming languages
Object-relational databases
• Illustra was a company that developed an
object-relational database that pioneered
many of these interesting concepts, which came
primarily from Java and Smalltalk
• Informix acquired Illustra and integrated these
novel ideas into version 9.x of its flagship IDS
database
• Later on, DB2 and Oracle also implemented
some of those ideas
• Mapping Java objects to a relational database
(O/R mapping) is a different issue that can be
solved using object persistence libraries
Improved database management
• The more options a DBA has to tune the
system, the better the chances of getting
the most performance out of the system
• However, as we provide more knobs to tune
the system, the DBA’s job becomes more
and more complex, especially in large
datacenters where a single DBA may be
responsible for hundreds of database
instances
• The solution to this problem is Autonomic
Computing, which allows the database to
tune itself, based on rules that result from
experience
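Rule-based self-tuning can be sketched as a list of (condition, adjustment) pairs evaluated against observed metrics. The metric names, configuration keys, thresholds and increments are all hypothetical and do not correspond to the parameters of any specific database.

```python
def tune(config, metrics, rules):
    """Apply every rule whose condition holds and return the new configuration."""
    new_config = dict(config)  # leave the caller's configuration untouched
    for condition, action in rules:
        if condition(metrics):
            action(new_config)
    return new_config


def grow_buffer_pool(config):
    # Double the buffer pool, but never beyond the memory available.
    config["buffer_pool_mb"] = min(config["buffer_pool_mb"] * 2, config["max_memory_mb"])


def add_sort_memory(config):
    config["sort_heap_mb"] += 64


# Hypothetical rules of thumb: (condition on observed metrics, adjustment to apply).
RULES = [
    (lambda m: m["cache_hit_ratio"] < 0.90, grow_buffer_pool),
    (lambda m: m["sorts_spilled_to_disk"] > 0, add_sort_memory),
]
```

Each rule encodes one piece of DBA experience, so the database can apply it on its own across hundreds of instances instead of waiting for manual tuning.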
Relational databases have evolved, a lot
Just before the data explosion that resulted from the Web 2.0
phenomenon, some large enterprises still used niche databases to cope with
the limitations of relational databases in a few edge use cases. In most
cases, however, the most advanced database products (such as DB2, Oracle
and Informix) had evolved quickly enough to solve virtually all emerging
information management problems, and new products had not threatened
their privileged position in any significant way.
Contact information
On Twitter: @huibert (English), @huibert2 (Spanish)
Web site: http://www.huibert-aalbers.com
Blog: http://www.huibert-aalbers.com/blog

Introduction to Statistics Presentation.pptxAniqa Zai
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 

Recently uploaded (20)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 

ITI015En-The evolution of databases (I)

  • 1. The evolution of database technology (I) Huibert Aalbers Senior Certified Executive IT Architect
  • 2. IT Insight podcast • This podcast belongs to the IT Insight series • You can subscribe to the podcast through iTunes. • Additional material such as presentations in PDF format or white papers mentioned in the podcast can be downloaded from the IT insight section of my site at http://www.huibert-aalbers.com • You can send questions or suggestions regarding this podcast to my personal email, huibert_aalbers@mac.com
  • 3. Hierarchical databases • In the 60’s IBM launched the first computers equipped with a hard disk drive • This spurred the development of a technology to store, process and retrieve data. IMS, in 1968, became the first commercial database software, developed by IBM to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle. • IMS was the first hierarchical database
  • 4. Hierarchical databases • Hierarchical databases have a serious limitation. They only support 1 to n relationships, which makes data modeling difficult • A parent can have multiple children • A child can only have a single parent
  • 5. Hierarchical databases • The most well known hierarchical databases are • IMS (still popular in large banks) • Windows registry • LDAP directories (depending on the implementation) • Hierarchical databases still have a significant performance edge over more modern relational databases
  • 6. Relational databases • In 1970, Ted Codd, a British mathematician who worked at IBM, published a paper titled “A relational model of data for large shared data banks” • His groundwork generated much interest in the information management world and spurred the creation of new companies such as Oracle (1977) or Informix (1980) that implemented Codd’s ideas. Meanwhile, IBM developed DB2, which first appeared on mainframes (1981) and later on distributed platforms.
  • 7. Relational databases • For over thirty years, relational databases have ruled the database market, based on their undeniable strengths • During that period, users have shaped the evolution of the technology by demanding new features and increased performance
  • 8. Strengths of Relational Databases • Great technology to store large volumes of structured data • The consistency of the data is guaranteed through the implementation of the ACID properties • Atomicity • Consistency • Isolation • Durability
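Atomicity is the easiest of the four ACID properties to demonstrate in a few lines. The following is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, the `transfer` helper and the sample balances are assumptions made for illustration and do not come from the presentation.

```python
import sqlite3

# In-memory database with a constraint that forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to `bob` is executed first and then undone when the debit fails, so no partial transfer is ever visible.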
  • 9. User requirements that have shaped modern relational databases • Increased scalability • Ability to perform complex queries against large data sets (Data warehousing) • Support for new programming languages and types of data • Requirements inspired by trends in modern programming languages • Improved administration features to ease management of large numbers of database instances
  • 10. Increased scalability • Symmetric Multiprocessing (SMP) • IBM System 65 (1967) • UNIX (starting in the mid 80’s) • Support for multiple processor cores • Power 4 (2001) • Data partitioning • SQL query optimizer improvements • Data compression • Increased use of RAM • Clustering
  • 11. What are the bottlenecks of relational databases? • I/O • SQL joins • Transactions (Locks), Distributed Transactions (Two phase commit) • Concurrency • Hardware
  • 12. Data partitioning • Hard disk drives used to be the main bottleneck which prevented quick data access. That is why a system was needed to access the data from multiple disks, in parallel. • A partitioned table has its data spread over multiple disks, based on: • Expression • Range • Round-Robin
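The three partitioning strategies listed above can be sketched as small placement functions that decide which disk receives a row. This is only an illustration: the function names, the `customer_id` and `year` columns and the sample rows are hypothetical, and real databases declare partitioning in DDL rather than in application code.

```python
from itertools import count


def by_expression(row, n_disks):
    # Expression: any deterministic function of the row, here a modulo on a key.
    return row["customer_id"] % n_disks


def by_range(row, upper_bounds):
    # Range: upper_bounds are sorted, exclusive limits; the last disk takes the rest.
    for disk, upper in enumerate(upper_bounds):
        if row["year"] < upper:
            return disk
    return len(upper_bounds)


def make_round_robin(n_disks):
    # Round-robin: ignore the row content and simply cycle through the disks.
    counter = count()

    def place(_row):
        return next(counter) % n_disks

    return place


rows = [{"customer_id": cid, "year": year}
        for cid, year in [(7, 2010), (3, 2011), (12, 2012),
                          (8, 2013), (5, 2014), (10, 2015)]]

expression_disks = [by_expression(r, 3) for r in rows]
range_disks = [by_range(r, [2012, 2014]) for r in rows]
place = make_round_robin(3)
round_robin_disks = [place(r) for r in rows]
print(expression_disks, range_disks, round_robin_disks)
```

Range partitioning keeps related rows together (useful when queries filter on the range column), while round-robin spreads the load evenly but gives the optimizer no hint about where a given row lives.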
  • 13. Data compression • Data compression allows for significant storage (and therefore money) savings. In addition, and this may sound counterintuitive, it also increases performance, since data is read much faster (with less I/O), especially when data is stored in a columnar form. Administrators can choose to compress: • Data • Indices • Blobs • Results are spectacular • Up to 80% less space needed to store the data • Up to 20% less I/O
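To see why columnar data compresses well, consider run-length encoding, one simple technique among many. The slide does not say which algorithms any particular database uses, so this sketch is an assumption chosen for clarity, and the `country_column` data is made up.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# A single column stored contiguously tends to contain long runs of equal values.
country_column = ["MX"] * 4 + ["ES"] * 3 + ["US"] * 5
runs = rle_encode(country_column)
print(runs)
print(len(country_column), "values stored as", len(runs), "runs")
```

Fewer stored values also means fewer bytes to read from disk, which is the source of the I/O savings mentioned on the slide.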
  • 14. In-memory databases (BLU, Oracle Exalytics) • These databases store data in memory (RAM) instead of on hard disk drives to scale better and support extremely high transaction volumes • This technology was originally designed to meet the needs of specific industries (primarily telcos and financial institutions) that required processing unusually high volumes of transactions • Recently, the line that divided in-memory databases from traditional databases has started to blur with the introduction of databases such as DB2 BLU, which automatically try to make the most of the available RAM to improve performance without requiring all the data to be loaded in memory
  • 15. Data Warehouse • The need for analyzing vast amounts of data was the first application that challenged the dominance of the RDBMS as the only tool required to work with data, as performance became a serious issue. • In order to avoid impacting the performance of OLTP (OnLine Transaction Processing) databases, common sense dictated that data analysis should be performed on a different data store. As a result, the process is as follows: • The data is first moved from the OLTP database to an operational data store (ODS), a repository used to transform the data before it can be used • Then, the data is moved to the database in which the information is analyzed, the Data Warehouse (DW) • This process is at the origin of the spectacular growth in the use of ETL (Extraction, Transformation and Load) and data quality tools
  • 16. Extraction-Transformation-Load (ETL) tools • In data warehouse environments, it is common to update the data regularly (usually nightly) with the latest information from the transactional systems (OLTP). In general, the data needs to be transformed before it can be loaded into the DWH. • In addition, it is still very common to exchange data between systems by sending flat files from one computer to another. • This will probably disappear over time, as we move to a world where systems need to be online at all times.
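A nightly flat-file load of the kind described above can be sketched with the standard `csv` and `sqlite3` modules. The file layout, the column names and the rejection rule are assumptions made for this example; commercial ETL tools offer far richer transformations.

```python
import csv
import io
import sqlite3

# Stand-in for a flat file received from a transactional system (hypothetical layout).
FLAT_FILE = """order_id,customer,amount,order_date
1, alice ,10.50,2014-03-01
2,BOB,7.25,2014-03-01
3,alice,n/a,2014-03-02
"""


def extract(text):
    """Extraction: parse the flat file into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: normalize names, convert amounts to cents, reject bad rows."""
    clean = []
    for row in rows:
        try:
            cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # a real tool would route this row to a reject file
        clean.append((int(row["order_id"]), row["customer"].strip().lower(),
                      cents, row["order_date"]))
    return clean


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER, customer TEXT, cents INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(FLAT_FILE)))
totals = warehouse.execute(
    "SELECT customer, SUM(cents) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```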
  • 17. Data Replication • In modern environments which are online 24/7, exchanging flat files to share information among systems is not a viable solution. • As soon as a change happens in one database, this needs to be reflected in the other repositories that require the information. • The data replication needs to be guaranteed, even if one of the repositories is momentarily off-line • Most databases include some kind of built-in replication functionality, but it is usually limited in scope, i.e. not allowing replication between databases from different brands.
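One common way to guarantee delivery to a repository that is momentarily off-line is to keep an ordered change log and let each replica catch up from its last applied position. The sketch below assumes that design; the `Primary` and `Replica` classes are hypothetical and do not represent any specific product.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.online = True
        self.applied = 0  # number of log entries this replica has applied


class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.log = []  # ordered list of every change ever made
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        self.sync()

    def sync(self):
        # Push every change each online replica has not seen yet, in order.
        for replica in self.replicas:
            if not replica.online:
                continue
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)


replica = Replica()
primary = Primary([replica])
primary.write("customer:1", "Alice")

replica.online = False                    # the replica goes off-line
primary.write("customer:2", "Bob")
primary.write("customer:1", "Alice Smith")
lagging = dict(replica.data)              # still the old state

replica.online = True                     # it comes back and catches up
primary.sync()
print(lagging, replica.data)
```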
  • 18. Data cleansing and enrichment • Analyzing dirty data is simply not possible. It needs to be cleansed. • Standardize data (addresses, names, etc.) • Eliminate duplicates, erroneous data, etc. • Further, deeper analysis can be performed when the data is enriched with additional data • Geocoding (distance to store) • Demographics (age, sex, marital status, estimated income, house ownership, attended college, political leaning, charitable giving, number of cars owned, etc.)
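The standardization and de-duplication steps can be illustrated with a few lines of Python. The record layout and the choice of the e-mail address as the matching key are assumptions for this sketch; real data quality tools use much more sophisticated matching.

```python
def standardize(record):
    """Normalize whitespace and casing so equivalent records compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Keep the first standardized record seen for each e-mail address."""
    unique = {}
    for record in map(standardize, records):
        unique.setdefault(record["email"], record)
    return list(unique.values())


dirty = [
    {"name": "  john   SMITH ", "email": "John.Smith@Example.com "},
    {"name": "John Smith", "email": "john.smith@example.com"},
    {"name": "maria garcia", "email": "maria@example.com"},
]
clean = deduplicate(dirty)
print(clean)
```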
  • 19. Data Warehouse • Data is usually kept in a star schema, a special case of the snowflake schema, which is effective to handle simpler DWH queries • Fact tables are at the centre of the schema and surrounded by the dimension tables. • These tables are usually not normalized, for performance reasons. Referential integrity is not a concern as the data is usually imported from databases that enforce referential integrity. • Specialized DWH databases can load data very quickly, run queries very fast by using specialized indices and execute critical operations such as building aggregate tables in optimized ways (BLU)
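A star schema is easiest to understand from a tiny example. The sketch below builds one fact table and two dimension tables in SQLite; the table names, columns and figures are invented for illustration. Note that no foreign keys are declared, in line with the remark above that referential integrity is enforced upstream.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/where/when" of each fact.
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- The fact table sits at the centre and holds the measures.
CREATE TABLE fact_sales (store_id INTEGER, date_id INTEGER,
                         units INTEGER, revenue INTEGER);

INSERT INTO dim_store VALUES (1, 'Madrid', 'EMEA'), (2, 'Mexico City', 'LATAM');
INSERT INTO dim_date  VALUES (10, 2014, 'Q1'), (11, 2014, 'Q2');
INSERT INTO fact_sales VALUES (1, 10, 3, 300), (1, 11, 1, 100),
                              (2, 10, 5, 450), (2, 11, 2, 180);
""")

# A typical DWH query: join the fact table to its dimensions and aggregate.
revenue_by_region = conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    JOIN dim_date  AS d ON d.date_id  = f.date_id
    WHERE d.year = 2014
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(revenue_by_region)
```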
  • 20. Database clustering If web servers can scale horizontally, why can’t relational databases do the same? Couldn’t we share the workload among multiple computer nodes? In order to achieve that, computer scientists have created two distinct architectures • Shared disk • Multiple instances of the database, all pointing to a single copy of the data • Shared Nothing • Multiple instances of the database, each one owning part of the data set (the data is partitioned)
  • 21. Shared Disk (pureScale) • Pros • If one database instance or even a computing node fails, the system keeps working • Good performance when reading data, even though the shared disk can become a bottleneck • Cons • Write operations become the main bottleneck (especially when using more than two nodes), because all the nodes need to be coordinated • This can be mitigated by partitioning the data • If the shared disk fails, the whole system fails • Recovery after a node fails is a lengthy operation
  • 22. Shared Nothing (XPS, EEE/DPF) • Pros • In general, write operations are extremely fast • Scales linearly • Cons • Read operations can be slower when queries execute joins on data residing on different disks • This also applies, to a lesser extent, to write operations on data residing on multiple disks • If a computing node or its disk fails, all that data becomes unavailable
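The trade-offs of a shared-nothing cluster follow directly from how rows are routed. In this minimal sketch each node owns the rows whose key maps to it; the `Cluster` and `Node` classes and the modulo routing rule are assumptions chosen for brevity.

```python
class Node:
    def __init__(self):
        self.rows = {}  # this node's private slice of the data


class Cluster:
    def __init__(self, n_nodes):
        self.nodes = [Node() for _ in range(n_nodes)]

    def _owner(self, key):
        # Each key belongs to exactly one node; no data is shared.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, row):
        # A write touches a single node, so nodes never coordinate.
        self._owner(key).rows[key] = row

    def get(self, key):
        return self._owner(key).rows.get(key)

    def total(self, column):
        # A query over all data must fan out to every node and merge the results.
        return sum(row[column] for node in self.nodes
                   for row in node.rows.values())


cluster = Cluster(3)
for key, amount in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    cluster.put(key, {"amount": amount})

rows_per_node = [len(node.rows) for node in cluster.nodes]
print(rows_per_node, cluster.get(4), cluster.total("amount"))
```

If one node is lost, the keys it owns are simply gone until it recovers, which is the availability drawback listed among the cons.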
  • 23. Data marts • The first DWHs grew extremely quickly until they became too hard to manage • That is why organizations started to build specialized Data marts by function (HR, finance, sales, etc.) or department • In order to avoid creating information silos, all data marts should use the same dimensions • This is usually enforced by the ETL tools
  • 24. OLAP cubes The data is stored in a repository using a star schema, which in turn is used to build a multidimensional cube to analyze the information through multiple dimensions (sales, regions, time periods, etc.)
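A multidimensional cube can be thought of as a table of totals pre-computed for every combination of dimension values. The sketch below builds such a structure from a handful of facts; the dimension names, the `build_cube` helper and the sales figures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "quarter")

facts = [
    {"region": "EMEA",  "product": "DB2",      "quarter": "Q1", "sales": 100},
    {"region": "EMEA",  "product": "Informix", "quarter": "Q1", "sales": 40},
    {"region": "LATAM", "product": "DB2",      "quarter": "Q2", "sales": 70},
    {"region": "EMEA",  "product": "DB2",      "quarter": "Q2", "sales": 30},
]


def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(int)
    for fact in facts:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                cell = tuple((dim, fact[dim]) for dim in dims)
                cube[cell] += fact[measure]
    return cube


cube = build_cube(facts, DIMENSIONS, "sales")

grand_total = cube[()]                                      # no dimension fixed
emea = cube[(("region", "EMEA"),)]                          # slice on one dimension
emea_q1 = cube[(("region", "EMEA"), ("quarter", "Q1"))]     # slice on two
print(grand_total, emea, emea_q1)
```

Once the cube is built, every such question is a single lookup, which is why cube queries feel instantaneous; the price is paid when the cube is loaded.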
  • 25. MOLAP / ROLAP tools • MOLAP tools (Multidimensional OLAP) load data into a cube, on which the user can quickly execute complex queries • ROLAP tools (Relational OLAP) transform user queries into complex SQL queries that are executed on a relational database • This requires a relational database that has been optimized to handle data warehouse type queries • In addition, to improve performance, aggregate tables need to be built
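The ROLAP approach can be sketched as a function that turns the dimensions selected by the user into a GROUP BY query, plus an aggregate table built ahead of time. The `rolap_sql` helper, the `sales` table and the `agg_region` table are assumptions made for this example, not the interface of any particular tool.

```python
import sqlite3

ALLOWED_DIMENSIONS = ("region", "product", "quarter")


def rolap_sql(dimensions, table="sales"):
    """Translate a choice of dimensions into an aggregate SQL query."""
    if not dimensions or any(d not in ALLOWED_DIMENSIONS for d in dimensions):
        raise ValueError("unknown or missing dimension")
    cols = ", ".join(dimensions)
    return (f"SELECT {cols}, SUM(revenue) AS total FROM {table} "
            f"GROUP BY {cols} ORDER BY {cols}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, "
             "quarter TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("EMEA", "DB2", "Q1", 100), ("EMEA", "Informix", "Q1", 40),
    ("LATAM", "DB2", "Q2", 70), ("EMEA", "DB2", "Q2", 30),
])

# The user's request is answered by SQL executed on the relational database.
by_region_quarter = conn.execute(rolap_sql(["region", "quarter"])).fetchall()

# An aggregate table stores a frequently requested roll-up ahead of time.
conn.execute("CREATE TABLE agg_region AS " + rolap_sql(["region"]))
by_region = conn.execute(
    "SELECT region, total FROM agg_region ORDER BY region").fetchall()
print(by_region_quarter, by_region)
```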
  • 26. MOLAP tools • Pros • In most cases delivers better performance, due to index optimization, data cache and efficient storage mechanisms • Lower disk space usage due to efficient data compression techniques • Aggregate data is built automatically • Cons • Loading large data sets can be slow • Working with models that include large amounts of data and a large number of dimensions is highly inefficient
  • 27. ROLAP tools • Pros • Usually scales better (dimensions and records) • Loading data with a robust ETL usually is much faster • Cons • Generally offers worse performance when both MOLAP and ROLAP tools can perform the job. This can however be mitigated by using ad-hoc database extensions (for example DB2 cubes) • Depends on SQL, which does not translate well to some particular use cases (budgeting, financial reporting, etc.) • Uses much more space on disk
  • 28. HOLAP tools HOLAP (Hybrid Online Analytical Processing) is a combination of ROLAP and MOLAP With this technology it becomes possible to store part of the data in a MOLAP repository and the remaining information in a ROLAP one in order to choose the best strategy for each case. For example: • Keep large tables with the detailed data in a relational database • Keep aggregate data in a MOLAP repository
  • 29. Hardware (Appliances) In order to obtain the best performance from the software and simplify the database management, some manufacturers have opted for developing integrated hardware and software solutions (a.k.a. appliances) • It simplifies configuration (loading data, index and schema creation, etc). • It simplifies maintenance (standard components and streamlined support) • It makes it possible to get the most performance out of the hardware by using specialized chips and optimized storage devices
  • 30. Hardware (CPU) • The IBM Power 8 microprocessor was specially designed to excel at data processing applications • Large memory cache (512 KB for each core, 96 MB shared L3 cache and a 128 MB L4 cache outside the chip) • 8 threads per core, 12 cores per chip (96 threads per chip) • Up to 5 GHz
  • 31. Columnar databases (BLU) In Business Analytics environments, it is very unlikely that all the columns of a record will be required as part of the result of a query or in the WHERE clause • Having the data organized by columns instead of by record (rows) significantly improves query times because usually much less information has to be read from disk • Modern databases such as DB2 BLU have been designed to excel both in OLTP as well as in OLAP environments. That means that DBAs can choose at database or table creation time how the data will be stored on disk (columns or rows)
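The benefit of a columnar layout can be shown by counting how many values a simple aggregate has to read under each organization. This is a toy model with made-up data; it ignores pages, indexes and compression, and assumes a row store must read whole records.

```python
rows = [
    {"order_id": 1, "customer": "alice", "country": "MX", "amount": 120},
    {"order_id": 2, "customer": "bob",   "country": "ES", "amount": 80},
    {"order_id": 3, "customer": "carol", "country": "US", "amount": 50},
]

# The same data organized by column: one contiguous list per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Row layout: every record is read in full to reach one attribute."""
    total = values_read = 0
    for row in rows:
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


row_result = sum_row_store(rows, "amount")
column_result = sum_column_store(columns, "amount")
print(row_result, column_result)
```

Both layouts return the same total, but the columnar one reads a quarter of the values here, and the gap widens as tables gain more columns.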
  • 32. Support for new data types During the 90s, developers started to ask for expanded datatype support in relational databases • Distinct types based on existing types • STRUCT like composed types • Completely new data types with their own indexing methods (videos, pictures, sound) • Time series • Coordinates (2D, 3D) • Text documents • XML, Word, PDF, etc. • Etc.
  • 33. Requirements inspired by trends in modern programming languages • Inheritance • Tables and types that inherit part of their structure from other tables/types • Polymorphism • More flexibility to define/overload functions, stored procedures and operators • Stored procedures written in modern programming languages
  • 34. Object-relational databases • Illustra was a company that developed an object-relational database that pioneered many of these interesting concepts that came primarily from Java and Smalltalk • Informix acquired Illustra and integrated these novel ideas into version 9.x of its flagship IDS database • Later on, DB2 and Oracle also implemented some of those ideas • Mapping Java objects to a relational database (O/R mapping) is a different issue that can be solved using object persistence libraries
  • 35. Improved database management • The more options a DBA has to tune the system, the better his chances of getting the most performance out of the system • However, as we provide more knobs to tune the system, the DBA’s job becomes more and more complex, especially in large data centers where a single DBA may be responsible for hundreds of database instances • The solution to this problem is Autonomic Computing, which allows the database to tune itself, based on rules that result from experience
  • 36. Relational databases have evolved, a lot Just before the data explosion that resulted from the Web 2.0 phenomenon, some large enterprises still used niche databases to cope with the limitations of relational databases in a few edge use cases. In most cases, however, the most advanced database products (such as DB2, Oracle and Informix) had evolved quickly enough to solve virtually all emerging information management problems, which kept new products from threatening their privileged position in any significant way.
  • 37. Contact information On Twitter: @huibert (English), @huibert2 (Spanish) Web site: http://www.huibert-aalbers.com Blog: http://www.huibert-aalbers.com/blog