High Performance Databases
1. Presenter: Mehdi Varse
Varse.mehdi@gmail.com
In the name of God
2. Outline
• Performance metrics
• Explain the issue
• Database tuning
• In-memory databases
• Parallel database systems
• Distributed database systems
• New high-performance databases
• High-performance database requirements
3. The Problem
• At least 2.5 exabytes of data are produced every day
• Google processes 3.5 billion requests per day
• Wal-Mart registers one million customer transactions every hour
• Searches, updates and posts per second:
Google: 34,000 searches per second
Yahoo: 3,200 searches per second
Facebook status updates: 700 per second
Twitter tweets: 600 per second
Buzz posts: 55 per second
4. Performance metrics to monitor in enterprise applications
• Business Transactions
• Query Performance
• User and Query Conflicts
• Capacity
• Configuration
• NoSQL Databases
5. Database Tuning
Database Tuning is the activity of making a
database application run more quickly. “More
quickly” usually means higher throughput, though
it may mean lower response time for time-critical
applications.
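The difference between throughput and response time can be made concrete with a small measurement. The sketch below is an illustration only, not part of the original slides: it assumes Python's built-in sqlite3 module, and the table `accounts` and the helper `measure` are hypothetical names chosen for the example.

```python
import sqlite3
import time


def measure(conn, n_queries=1000):
    """Run n_queries point lookups; return (throughput, mean, worst) timings."""
    latencies = []
    start = time.perf_counter()
    for i in range(n_queries):
        t0 = time.perf_counter()
        conn.execute("SELECT balance FROM accounts WHERE id = ?", (i % 100,)).fetchone()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = n_queries / elapsed            # queries per second
    mean_response = sum(latencies) / n_queries  # seconds per query
    return throughput, mean_response, max(latencies)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(i, 100.0) for i in range(100)])

throughput, mean_response, worst_response = measure(conn)
print(f"throughput: {throughput:.0f} queries/s")
print(f"mean response time: {mean_response * 1e6:.1f} us")
print(f"worst response time: {worst_response * 1e6:.1f} us")
```

A tuning change can raise throughput while leaving the worst response time unchanged, which is why time-critical applications usually track both numbers.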
6. [Figure: layers of a database system and the roles that work with them]
Layers, from top to bottom: Application; Query Processor; Indexes; Storage Subsystem; Concurrency Control; Recovery; Operating System; Hardware [Processor(s), Disk(s), Memory]
Roles: Application Programmer (e.g., business analyst, data architect); Sophisticated Application Programmer (e.g., SAP admin); DBA, Tuner
7. Memory tuning
• Main memory is one of the most important factors
affecting database performance
8. Query Cache
• The query cache stores the results of SELECT queries
• It is useful when the underlying data changes rarely and the same queries recur
• Sample:
on a Linux Alpha 2×500MHz system with 2GB RAM and a 64MB query cache,
searches for a single row in a single-row table are 238% faster with the query
cache than without it
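The idea of a query cache can be sketched in a few lines. This is a minimal illustration, assuming Python's sqlite3 module; the class `QueryCache` and its methods are hypothetical and do not reproduce any particular database's cache. It invalidates everything on each write, whereas a real cache would usually invalidate more selectively.

```python
import sqlite3


class QueryCache:
    """Cache SELECT results keyed by SQL text and parameters."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}
        self.hits = 0
        self.misses = 0

    def select(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def write(self, sql, params=()):
        # Any change may make cached results stale, so drop them all.
        self.conn.execute(sql, params)
        self.results.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")
cache = QueryCache(conn)

first = cache.select("SELECT name FROM t WHERE id = ?", (1,))   # miss, runs the query
second = cache.select("SELECT name FROM t WHERE id = ?", (1,))  # hit, served from cache
cache.write("UPDATE t SET name = ? WHERE id = ?", ("b", 1))     # invalidates the cache
third = cache.select("SELECT name FROM t WHERE id = ?", (1,))   # miss again, fresh data
print(first, second, third, cache.hits, cache.misses)
```

The more often data is written, the more often the cache is emptied, which is why the benefit depends on how rarely the data changes.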
9. Database caching
• Database caching is a technique built into the design of applications
that access backend databases
• It is used to achieve high scalability and performance
• It improves scalability by distributing the query workload from the
backend to multiple cheap front-end systems
11. In-memory database
• An in-memory database system is a database management system
that stores data entirely in main memory.
• Used in applications where response time is critical
• SQLite in memory (C API): rc = sqlite3_open(":memory:", &db);
• With ordinary volatile RAM, data is lost on power failure unless it is also
persisted; with non-volatile RAM, in-memory databases can run at full speed
and still retain data after a power failure.
• Examples of in-memory databases:
Redis (VMware / Pivotal Software, 2009)
SQLite (in-memory mode)
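The same in-memory mode is available from Python's built-in sqlite3 module. The sketch below is an illustration with a hypothetical `sessions` table; it also shows the volatility of a plain in-memory database, since the data disappears when the connection is closed.

```python
import sqlite3

# ":memory:" asks SQLite for a private database held only in RAM.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, user_id INTEGER)")
db.execute("INSERT INTO sessions VALUES (?, ?)", ("abc123", 42))
row = db.execute("SELECT user_id FROM sessions WHERE token = ?", ("abc123",)).fetchone()
print(row)

# Closing the connection discards the data; a new connection starts empty.
db.close()
db2 = sqlite3.connect(":memory:")
tables = db2.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```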
12. Parallel database systems
• Goals of parallel database systems:
high performance
scalability
fault-tolerant database management
• Three key components of a high-performance parallel DBMS:
data partitioning strategies
algorithms for parallel processing of the join operator
a framework that controls the placement of data
• Examples: Oracle Parallel Server, IBM’s DB2 Parallel Edition
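Data partitioning and parallel join processing fit together: if both tables are partitioned by a hash of the join key, each pair of partitions can be joined independently. The following is a minimal sketch under that assumption, not the algorithm of any specific product; the function names are hypothetical, and Python threads merely stand in for separate processors or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key_index, n_partitions):
    """Split rows into n_partitions buckets by hashing the join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % n_partitions].append(row)
    return partitions


def local_hash_join(left, right):
    """Join one pair of partitions on their first column (build, then probe)."""
    table = {}
    for row in left:
        table.setdefault(row[0], []).append(row)
    return [l + r[1:] for r in right for l in table.get(r[0], [])]


def parallel_join(left, right, n_partitions=4):
    left_parts = hash_partition(left, 0, n_partitions)
    right_parts = hash_partition(right, 0, n_partitions)
    # Matching keys always land in the same partition number, so each pair
    # can be joined independently (here by threads, standing in for nodes).
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(local_hash_join, left_parts, right_parts)
    return [row for part in results for row in part]


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, "book"), (3, "pen"), (1, "lamp"), (4, "ink")]
joined = sorted(parallel_join(customers, orders))
print(joined)
```

Where the partitions are placed is a separate decision; in this sketch it is trivial, but in a real system it determines how much data must move between nodes.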
14. Designing distributed database systems
• A distributed database may be stored on multiple computers located in the same
physical location, or dispersed over a network of interconnected computers
• Unlike parallel systems, in which the processors are tightly coupled and
constitute a single database system, a distributed database system consists
of loosely coupled sites that share no physical components.
15. NoSQL Databases
• The name stands for “no SQL” or “not only SQL”
• Designed to address scalability and performance issues
• Many favor eventual consistency over ACID guarantees
• Divided into four categories:
I. Key-value stores such as Redis
II. Document databases such as MongoDB
III. Graph databases such as Neo4j
IV. Column-oriented databases such as Cassandra
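The four categories differ mainly in the shape of the data they store. The sketch below models those shapes with plain Python structures; it is an illustration only, uses no real database client, and all names and sample values are hypothetical.

```python
import json

# Key-value store: an opaque value looked up by key.
key_value = {"user:1": json.dumps({"name": "Ann", "city": "Oslo"})}

# Document store: the store understands the nested structure of each document.
documents = [{"_id": 1, "name": "Ann", "address": {"city": "Oslo"}, "tags": ["admin"]}]

# Column-oriented (wide-column) store: each row key maps to its own set of columns.
wide_rows = {
    "user:1": {"name": "Ann", "city": "Oslo"},
    "user:2": {"name": "Bob"},  # rows need not share the same columns
}

# Graph store: nodes plus explicit relationships between them.
nodes = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
edges = [(1, "FOLLOWS", 2)]

# "Which city does user 1 live in?" against the first three shapes,
# and "whom does user 1 follow?" against the graph.
city_kv = json.loads(key_value["user:1"])["city"]
city_doc = next(d for d in documents if d["_id"] == 1)["address"]["city"]
city_wide = wide_rows["user:1"]["city"]
followed = [nodes[dst]["name"] for src, rel, dst in edges if src == 1 and rel == "FOLLOWS"]
print(city_kv, city_doc, city_wide, followed)
```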
16. High-Performance Database requirements
• Select one or more databases that fit your data types
• Provide a hardware platform (memory, disk and CPU) suited to the
selected database
• If you use a distributed database, connect the nodes with a
high-speed network
• Tune your database for optimal use of resources
• Optimize your queries
17. Review of data stores used at Facebook
• MySQL:
Stores data such as wall posts, user information, the timeline, etc.
This data is replicated between their various data centers.
• Memcached:
Facebook makes heavy use of Memcached,
a memory caching system, to reduce read time
• Haystack:
For each uploaded photo, Facebook generates and stores four images of different sizes
Current growth rate is 220 million new photos per week
Implements an HTTP-based photo server which stores photos in a generic object store
called Haystack
18. Review of databases used at Facebook
• Cassandra:
The Apache Cassandra database is the right choice when you need scalability and
high availability without compromising performance
Facebook uses it for its Inbox search.
20. Cassandra
Architecture Overview
• Cassandra was designed with the understanding that
system and hardware failures can and do occur.
• Peer-to-peer, distributed system
• All nodes are the same
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design
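How a peer-to-peer system can place replicas without a master node can be sketched with a hash ring. This is a simplified illustration and not Cassandra's actual implementation: the class `Ring`, the node names, and the use of MD5 as the hash are assumptions made for the example.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (deterministic hash)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)

    def replicas(self, key):
        """Return the nodes holding key: walk clockwise from the key's token."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect_right(tokens, token(key))
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)

# If one replica fails, the remaining ones can still serve the key.
failed = owners[0]
survivors = [n for n in owners if n != failed]
print(survivors)
```

Because every node can compute the same placement from the key alone, any node can accept a read or write and forward it to the right replicas.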
21. Conclusion
• At small scale, relational databases perform better than NoSQL databases
• If you need to execute complex queries, relational databases are the best
choice
• If you need a large-scale or distributed database, you can use
NoSQL databases
25. Performance metrics
Business Transactions
Business transactions provide insight into real user behavior: they capture the
real-time performance that real users experience as they interact with your
application. Measuring them involves capturing the response time of each
business transaction.
26. Performance metrics
Query Performance
The most obvious place to look for poor query performance is in the query
itself. Problems can result from queries that take too long to identify the
required data or bring the data back. Look for the following issues in queries:
• Selecting More Data Than Needed
• Inefficient Joins Between Tables
• Too Few or Too Many Indexes
• Too Much Literal SQL Causing Parse Contention
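Several of the query issues on this slide can be checked by asking the database for its query plan. The sketch below is an illustration using SQLite through Python's sqlite3 module; the `orders` table, the index name, and the helper `plan` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, note) VALUES (?, ?)",
    [(i % 50, "x" * 100) for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Name only the needed column and bind the value as a parameter
# instead of pasting a literal into the SQL text.
query = "SELECT id FROM orders WHERE customer_id = ?"

before = plan(query, (7,))   # no index on customer_id: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query, (7,))    # the index can now be used

print("before:", before)
print("after: ", after)
```

Each extra index also has to be maintained on every write, which is the "too many indexes" side of the same trade-off.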
27. Performance metrics
User and Query Conflicts
Databases are designed to be multi-user, but the activities of multiple users
can cause conflicts:
• Page/row Locking Due to Slow Queries
• Transactional Locks and Deadlocks
• Batch Activities Causing Resource Contention for
Online Users
28. Performance metrics
Capacity
Not all database performance issues are database issues. Some problems
result from running the database on inadequate hardware:
• Not Enough CPUs or CPU Speed Too Slow
• Slow Disk
• Full or Misconfigured Disks
• Not Enough Memory
• Slow Network
29. Performance metrics
Configuration
Every database has a large number of configuration settings. Default values
may not be enough to give your database the performance it needs:
• Buffer Cache Too Small
• No Query Caching
• I/O Contention Due to Temporary Table Creation on
Disk
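Two of these settings have direct counterparts in SQLite, which makes for a compact illustration. The sketch below assumes Python's sqlite3 module and SQLite's `cache_size` and `temp_store` pragmas; the 64 MiB value is an arbitrary example, not a recommendation, and other databases expose equivalent settings under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]

# A negative cache_size is interpreted as KiB: here roughly 64 MiB of page cache.
conn.execute("PRAGMA cache_size = -65536")
# Keep temporary tables and indexes in memory rather than in temp files on disk.
conn.execute("PRAGMA temp_store = MEMORY")

cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]  # 2 means MEMORY
print("default cache_size:", default_cache)
print("tuned cache_size:", cache_size, "temp_store:", temp_store)
```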
30. Performance metrics
NoSQL Databases
NoSQL has much appeal because of its ability to handle large amounts of data
very rapidly. However, some disadvantages should be assessed when weighing
whether NoSQL is right for your use case:
• Finicky Transactions
• Complex Databases
• Consistent JOINS
• Flexibility in Schema Design
• Resource Intensive