Five database trends - updated April 2015
1. Top 5 Trends in Database Technology
Guy Harrison, Executive Director, Information Mgt R&D, Dell Software Group
Session ID#: 995
@guyharrison
REMINDER: Check in on the COLLABORATE mobile app
11. History of databases
[Figure: timeline of database technologies by decade: 1940-50, 1950-60, 1960-70, 1970-80, 1980-90, 1990-2000, 2000-2010]
Pre-computer technologies: printing press, Dewey decimal system, punched cards
Storage and access methods: magnetic tape with “flat” (sequential) files, magnetic disk, Indexed Sequential Access Method (ISAM)
Data models: hierarchical model, network model, relational model defined
Systems shown: IMS, IDMS, ADABAS, System R, Oracle V2, Ingres, dBase, DB2, Informix, Sybase, SQL Server, Access, Postgres, MySQL, Cassandra, Hadoop, Vertica, Riak, HBase, Dynamo, MongoDB, Redis, VoltDB, Hana, Neo4J, Aerospike
12. Why?
• 3rd Platform drives new demands on the database:
– Global High Availability
– Data volumes
– Unstructured data
– Transaction rates
– Latency
• A single architecture cannot meet all those demands
13. It takes all sorts
[Figure: applications and the database platforms that serve them]
Applications: ERP & in-house CRM; Analytic/BI software (SAS, Tableau); Web Server
Platforms:
• Operational RDBMS (Oracle, SQL Server, …)
• In-memory Analytics (HANA, Exalytics …)
• In-memory processing (Spark)
• Hadoop
• Web DBMS (MySQL, Mongo, Cassandra)
• Data Warehouse RDBMS (Oracle, Teradata …)
24. More Data
• Storing all data, including machine-generated, social, community and demographic data, in its original format, for ever
To More Effect
• Smarter use of data (data science) to achieve competitive or human benefit
39. Hadoop is the most concrete Big Data technology
Toad: your companion in the Big Data revolution
42. Big Data Analytics, AKA Data Science
Machine Learning
• Programs that evolve with “experience”
Predictive Analytics
• Programs that extrapolate from past to future
Collective Intelligence
• Programs that use inputs from “crowds” to simulate intelligence
47. CAP Theorem says something has to give
• CAP (Brewer’s) Theorem says you can only have two out of three of Consistency, Partition Tolerance, Availability
Consistency
• Everyone always sees the same data
Availability
• System stays up when nodes fail
Partition Tolerance
• System stays up when the network between nodes fails
[Figure: diagram of the three properties, with regions labelled “Oracle RAC lives here”, “NO GO” and “Most NoSQL lives here”]
48. Major influences on non-relational
Amazon Dynamo
• Eventually consistent transaction model
• Consistent hashing
Google BigTable
• Column Family model for sparse distributed columnar data
OODBMS and XML DBs
• Paved the way for the document database
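Consistent hashing, listed above among the Dynamo influences, can be sketched in a few lines. This is a minimal illustration and not Dynamo's actual implementation: the HashRing class, the node names, the choice of MD5 and the figure of 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each server is placed on the ring at several pseudo-random points.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}

ring.remove("node-c")
after = {k: ring.lookup(k) for k in keys}

# Keys whose owner changed after the node was removed.
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch the only keys that change owner are the ones that lived on the removed node; keys on the surviving nodes stay where they were, which is the property that makes the technique attractive for clusters whose membership changes.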
50. BigTable Data Model
Source data:
Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230
Normalized relational form:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230
BigTable form (one sparse row per name):
Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . .            675,230
Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . .            856,767
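The sparse-row idea in this slide can be approximated with nested dictionaries. The sketch below is only an illustration of the data model, not BigTable's storage format or API; the `rows` structure and the `get` helper are hypothetical names.

```python
from collections import defaultdict

# (name, site, counter) triples taken from the slide's first table
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

# One sparse row per name: only the columns that have a value are stored,
# so different rows can carry entirely different sets of columns.
rows = defaultdict(dict)
for name, site, counter in visits:
    rows[name][site] = counter


def get(row_key, column):
    """Return the cell value, or None if the row has no such column."""
    return rows.get(row_key, {}).get(column)


dick_ebay = get("Dick", "Ebay")
jane_ebay = get("Jane", "Ebay")  # Jane's row simply has no Ebay column
```

No join is needed to answer "what are Dick's counters?": the whole answer sits in a single row, at the cost of rows that no longer share a fixed schema.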
51. OODBMS - 1990s
• The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdonik, '90)
• “A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers”
– Also SQL is ugly
• “An object database is like a closet which requires that you hang up your suit with tie, underwear, belt, socks and shoes all attached” (Dave Ensor)
http://4.bp.blogspot.com/-IPgd1Tg8ByE/UkOzH-g1FmI/AAAAAAAACB0/QYg8kEVp5_0/s1600/db4o_vs_orm.png
52. Revenge of the Object Nerds – Document databases
• Structured documents – XML and JSON (JavaScript Object Notation) become more prevalent within applications
• Web programmers start storing these in BLOBs in MySQL
• Emergence of XML and JSON databases
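The contrast between relational rows and a stored document can be shown with Python's standard json module. The order data below is invented purely for illustration, and the sketch says nothing about how any particular document database stores or indexes its JSON.

```python
import json

# Relational-style rows (hypothetical order data, invented for illustration)
order = {"order_id": 1001, "customer": "Jane"}
order_lines = [
    {"order_id": 1001, "sku": "A-1", "qty": 2},
    {"order_id": 1001, "sku": "B-7", "qty": 1},
]

# Document style: the order and its lines are stored as one nested object.
document = dict(
    order,
    lines=[{"sku": line["sku"], "qty": line["qty"]} for line in order_lines],
)
as_json = json.dumps(document, sort_keys=True)

# Reading it back needs no join: the whole aggregate returns in one piece.
restored = json.loads(as_json)
total_qty = sum(line["qty"] for line in restored["lines"])
```

Storing `as_json` in a BLOB column is essentially what the second bullet describes; a document database adds the ability to query and index inside that structure.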
63. Exadata Hybrid Columnar Compression
• Provides high compression ratio
• Manageable impact on row read/write operations
• Some optimization of analytic queries
SELECT SUM(Column4)
FROM table
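Why a columnar layout helps both compression and a query such as SELECT SUM(Column4) can be seen in a toy model. The sketch below is a generic column-store illustration with run-length encoding chosen as a stand-in compression scheme; it is not Exadata's Hybrid Columnar Compression format, and the table contents are invented.

```python
from itertools import groupby

# Row-oriented layout: each record's fields are stored together.
rows = [
    (1, "AU", "open", 100),
    (2, "AU", "open", 200),
    (3, "AU", "closed", 300),
    (4, "US", "closed", 400),
]
names = ["Column1", "Column2", "Column3", "Column4"]

# Column-oriented layout: each column's values are stored together.
columns = {name: [r[i] for r in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# In a row store the query walks whole records to reach the fourth field.
sum_from_rows = sum(r[3] for r in rows)
# In a column store the query reads only the one column it needs.
sum_from_column = sum(columns["Column4"])

# Similar values sit next to each other in a column, so they compress well.
encoded = run_length_encode(columns["Column2"])
```

Both sums give the same answer; the difference is how much data has to be read to get there, and how well each layout compresses.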
71. Tiered storage management
[Figure: storage tiers compared by $/IOP and $/GB]
Main Memory
DDR SSD
Flash SSD
Fast Disk (SAS, RAID 0+1)
Slow Disk (SATA, RAID 5)
Tape, Flat Files, Hadoop
72. In-Memory databases
• Cost of RAM falling 50% each 18 months.
• Some databases can fit entirely within the RAM of a single server or cluster of servers
[Figure: RAM cost (US$/GB) and size (GB) by year, 1990 to 2020, log scales]
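The "falling 50% each 18 months" claim on this slide is a simple exponential decay, which a short calculation makes concrete. The function name and the $10/GB starting price are assumptions chosen only to keep the numbers readable; they are not taken from the chart.

```python
def projected_cost_per_gb(cost_now, years_ahead, halving_months=18):
    """Cost per GB after years_ahead years, if the price halves every halving_months."""
    halvings = years_ahead * 12 / halving_months
    return cost_now * 0.5 ** halvings


# Hypothetical starting price of $10/GB.
in_18_months = projected_cost_per_gb(10.0, 1.5)
in_3_years = projected_cost_per_gb(10.0, 3)
in_6_years = projected_cost_per_gb(10.0, 6)
```

Under that assumption the price halves four times in six years, a 16-fold drop, which is why datasets that were far too large for RAM a few years earlier start to fit in memory.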
73. Oracle TimesTen
• In-memory transactional database
• Disk-based checkpoints and disk-based logging
• By default, COMMITs are not durable (writes to the transaction log are asynchronous).
• Can configure synchronous replication or synchronous log writes to avoid data loss
• Columnar compression and analytic functions in the Exalytics version
[Figure labels: Clients, Memory, Commits, Transaction Logs, Checkpoints (point in time snapshot)]
74. SAP Hana
[Figure: HANA architecture]
Memory: Row Store, Column store, Delta store
Persistence Layer: Savepoints, Data files, Txn logs
Note: Table must be either row or column – not both
78. VoltDB
• Single threaded access to memory: no latch/mutex waits
• Transactions in self-contained stored procedures: minimal locking
• K-Safety for COMMIT: No sync waits
[Figure: three groups of clients and six CPUs, each CPU paired with its own in-memory partition]
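The first two bullets on this slide, one thread per partition and transactions packaged as self-contained procedures, can be sketched with a queue per partition. This is a toy model under stated assumptions, not VoltDB's engine or API: the Partition class, the deposit procedure and the CRC32-based routing are hypothetical, and K-safety (replication) is not modelled.

```python
import queue
import threading
import zlib


class Partition:
    """One in-memory partition served by exactly one thread (hypothetical sketch)."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Procedures run one at a time, so self.data needs no latch or mutex.
        while True:
            item = self.inbox.get()
            if item is None:
                break
            procedure, args, result = item
            result.put(procedure(self.data, *args))


def deposit(data, account, amount):
    """A self-contained 'stored procedure': the whole transaction is one call."""
    data[account] = data.get(account, 0) + amount
    return data[account]


partitions = [Partition() for _ in range(3)]


def call(procedure, key, *args):
    """Route the call to the partition that owns the key and wait for its result."""
    owner = partitions[zlib.crc32(key.encode("utf-8")) % len(partitions)]
    result = queue.Queue(maxsize=1)
    owner.inbox.put((procedure, (key, *args), result))
    return result.get()


balances = [call(deposit, "acct-1", 50) for _ in range(4)]

for p in partitions:
    p.inbox.put(None)  # stop the worker threads
```

Because every key is owned by exactly one partition and that partition runs procedures serially, the read-modify-write inside `deposit` is safe without any locking in the procedure itself.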
79. Spark (sort of) in-memory Hadoop
• In-memory compute
• HDFS compatible
• Libraries for data processing, machine learning, streaming, SQL, etc
• Python and Scala interfaces
• Part of the Berkeley Data Analytics Stack
• Integrating into all Hadoop distributions (and Cassandra)
[Figure labels: HDFS; Tachyon (in-memory file system); Spark (in-memory distributed compute); Spark Streaming; MLlib (machine learning); SparkSQL; Mesos cluster manager]
80. Dell Software Group85
Data files
Oracle 12c in-memory
Memory (SGA)
Row store Column Store (IMCU)
OLTP Analytics
(SMU)
database Column store
Redo Logs
85. Dell In-Memory Appliances for Cloudera Enterprise
Mid-Size Configuration: 16 Node Cluster
• R720 - 4 Infrastructure Nodes
• R720XD - 12 Data Nodes
• Force10 - S4810P
• Force10 - S55
• ~528 TB (disk raw space)
• ~4.5 TB (raw memory)
Starter Configuration: 8 Node Cluster
• R720 - 4 Infrastructure Nodes
• R720XD - 4 Data Nodes
• Force10 - S55
• ~176 TB (disk raw space)
• ~1.5 TB (raw memory)
Small Enterprise Configuration: 24 Node Cluster
• R720 - 4 Infrastructure Nodes
• R720XD - 20 Data Nodes
• ~880 TB (disk raw space)
• ~7.5 TB (raw memory)
Expansion Unit: R720XD - 4 Data Nodes, Cloudera Enterprise Data Hub, Scale in Blocks
86. Dell appliances for any database
• Dell provides appliances and reference architectures specifically designed for:
– Oracle
– SQL Server
– HANA
– SSD database acceleration
– Large memory footprints
87. Big Data for the rest of us
• Success in Big Data requires capabilities at multiple technology levels: hardware, software infrastructure, business intelligence and analytics
• Only Dell can deliver capabilities at every technology layer
• Only Dell’s solutions are designed and priced to suit mid-market initial deployments and to scale to the largest enterprise
[Figure: technology layers and Dell products]
Layers: Data Integration; Hadoop and database software; Advanced Analytics; Business Intelligence; Server and Storage
Products: Boomi; Boomi, Toad Intelligence Central; Dell appliances for Hadoop, Oracle, etc; Dell servers and storage arrays; Toad Data Point; Statistica
Systems Management: Dell Foglight and TOAD
89. Please complete the session evaluation
We appreciate your feedback and insight
You may complete the session evaluation either on paper or online via the mobile app
Editor's Notes
When you think about Dell you probably think about laptops,
or servers that might run databases or a Hadoop cluster, but you probably don't think of Dell as having expertise in either Oracle or Hadoop.
But Dell now has a billion-dollar software arm, which includes the world's number one independent database tool, Toad, used by millions of users and supporting almost every data platform.
NoSQL tends to be strongly coupled with the application. Everybody else is out of luck.
In 2000 a 1 TB database would have required 200 5 GB disks, with an aggregate IO capacity of around 2000 IOs per second. Today that database could be stored on a single 1 TB disk, but that disk would support only about 200 IOs per second.
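The point of this note is that IO capacity per gigabyte has fallen even as capacity per disk has grown. A quick calculation using only the note's own round numbers shows the size of the gap; treating 1 TB as 1000 GB and the helper name `iops_per_gb` are assumptions made for the sketch.

```python
def iops_per_gb(total_iops, capacity_gb):
    """IO operations per second available for each gigabyte stored."""
    return total_iops / capacity_gb


# A 1 TB database (taken here as 1000 GB) in both cases.
year_2000 = iops_per_gb(2000, 1000)  # many small disks, about 2000 IO/s in aggregate
today = iops_per_gb(200, 1000)       # one large disk, about 200 IO/s

ratio = year_2000 / today
```

On these figures the same database has roughly a tenth of the IO capacity it had in 2000, which is the motivation for the SSD and in-memory tiers discussed in the slides.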