As operational database schemas become complex, users resort to denormalization to handle performance issues. This includes a range of techniques, from materialized views to using MySQL as a key-value store for blobs containing full objects. While denormalization solves immediate bottlenecks, it comes at a hefty price. In this presentation, Ari will explore common denormalization approaches and tradeoffs using real-world examples. He will then present a solution under development at Akiban Technologies that addresses the same problems much more efficiently and allows users to get the best of both worlds.
Solving performance problems in MySQL without denormalization
1. Solving Performance Problems in MySQL Without Denormalization
RENORMALIZE
Akiban Technologies, Inc. Confidential & Proprietary
2. Problem Statement
Schemas scale out
Data volume grows
Joins become a real bottleneck
3. Two Common Manifestations
SQL Joins
- Queries become slower as more tables are joined.
Application Object Creations
- Constructing an object is as expensive as SELECTing the sum of its parts
Denormalize. Problem solved.
4. Application Growing Pains
[Diagram: application architecture growing across releases V1 to V6. Axis labels: Complexity & Cost, Customers, Time. Stages: Add Customers!, Get Caching, Replicate DB, De-normalize, Shard Database, Rip & Replace DB. Component labels: Web Server, Cache Server, MySQL, MySQL Slaves, MySQL Sharding, Rip & Replace Database Architecture.]
5. De·nor·mal·ize
[de-nawr-muh-lahyze]
verb, -ized, -iz·ing.
–verb (used with object)
1. The process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. (Wikipedia)
2. Denormalize means to allow redundancy in a table so that the table can remain flat. (UCSD Blink)
3. The process of restructuring a normalized data model to accommodate operational constraints or system limitations. (celiang.tongji.edu.cn)
6. Materialized Views
Persistent database object
Contains the results of a query
Store summary and pre-joined tables
Require maintenance/refresh for dynamic data
SELECT DISTINCT n.nid, n.sticky, n.title, n.created
FROM node n
INNER JOIN term_node tn0
ON n.vid = tn0.vid
WHERE n.status = 1
AND tn0.tid IN (77)
ORDER BY n.sticky DESC, n.created DESC
LIMIT 0, 25;
Result: using where, using filesort
7. Drupal Materialized View Project
CREATE TABLE `mv_drupalorg_node_by_term` (
  `entity_type` varchar(64) NOT NULL,
  `entity_id` int(10) unsigned NOT NULL DEFAULT '0',
  `term_tid` int(10) unsigned NOT NULL DEFAULT '0',
  `node_sticky` int(11) NOT NULL DEFAULT '0',
  `last_node_activity` int(11) NOT NULL DEFAULT '0',
  `node_created` int(11) NOT NULL DEFAULT '0',
  `node_title` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`entity_type`,`entity_id`,`term_tid`),
  KEY `activity` (`term_tid`,`node_sticky`,`last_node_activity`,`node_created`),
  KEY `creation` (`term_tid`,`node_sticky`,`node_created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT DISTINCT entity_id AS nid, node_sticky AS sticky, node_title AS title,
  node_created AS created
FROM mv_drupalorg_node_by_term
WHERE term_tid IN (77)
ORDER BY node_sticky DESC, node_created DESC
LIMIT 0, 25;
Result: using where, using temporary table
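The maintenance cost of a pre-joined table like the one above can be sketched in a few lines. This is a minimal illustration, not the Drupal project's code: it uses Python's built-in sqlite3 in place of MySQL, and the table `mv_node_by_term` and the helpers `refresh_mv` and `titles_for_term` are hypothetical names. It shows that a change to a base table stays invisible in the copy until someone refreshes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, title TEXT,
                       status INTEGER, sticky INTEGER, created INTEGER);
    CREATE TABLE term_node (vid INTEGER, tid INTEGER);
    -- Hypothetical pre-joined table standing in for the materialized view.
    CREATE TABLE mv_node_by_term (nid INTEGER, tid INTEGER, title TEXT,
                                  sticky INTEGER, created INTEGER,
                                  PRIMARY KEY (nid, tid));
""")


def refresh_mv(conn):
    """Rebuild the pre-joined table from the base tables (full refresh)."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM mv_node_by_term")
        conn.execute("""
            INSERT INTO mv_node_by_term (nid, tid, title, sticky, created)
            SELECT DISTINCT n.nid, tn.tid, n.title, n.sticky, n.created
            FROM node n JOIN term_node tn ON n.vid = tn.vid
            WHERE n.status = 1
        """)


def titles_for_term(conn, tid):
    """Read from the pre-joined table only: no join at query time."""
    rows = conn.execute(
        "SELECT title FROM mv_node_by_term WHERE tid = ? "
        "ORDER BY sticky DESC, created DESC", (tid,))
    return [r[0] for r in rows]


conn.execute("INSERT INTO node VALUES (1, 10, 'First post', 1, 0, 100)")
conn.execute("INSERT INTO term_node VALUES (10, 77)")
refresh_mv(conn)
before = titles_for_term(conn, 77)

# A base-table change is not reflected in the copy until the next refresh.
conn.execute("UPDATE node SET title = 'Edited post' WHERE nid = 1")
stale = titles_for_term(conn, 77)
refresh_mv(conn)
fresh = titles_for_term(conn, 77)
```

A full rebuild is the simplest refresh strategy; incremental maintenance (triggers or application-side double writes) avoids the rebuild but adds write-path complexity.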
8. Denormalization Technique Listing
Technique                                   | Pros                      | Cons
Materialized views                          | Faster queries (no joins) | Data explosion; manually keep synched
Store object as Blob                        | Fast object get           | No modeling, or querying
Denormalize 1NF: folding parent-child       | Data in one row           | Limited # of child rows; hard to query (UNION hell)
  into parent table                         |                           |
Denormalize 2NF to 1NF: repeat columns      | Avoid join                | Data explosion; manually keep synched (double writing)
  from 1 table in M table                   |                           |
Adding derived columns                      | Avoid joins, aggregation  | Manually keep synched
Property bag (RDF)                          | Schema flexibility        | Manage schema in app; hard to index or perform
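The "Store object as Blob" row can be illustrated with a small sketch. Everything here is an assumption made for illustration: sqlite3 stands in for MySQL, JSON is the serialization, and `user_blob`, `put_user`, `get_user`, and `users_with_role` are hypothetical names. Fetching by key is a single lookup, but any question about the object's contents has to read and decode every blob.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key-value table: the whole object lives in one opaque column.
conn.execute("CREATE TABLE user_blob (uid INTEGER PRIMARY KEY, body TEXT)")


def put_user(conn, user):
    """Serialize the full object and store it under its key."""
    conn.execute("INSERT OR REPLACE INTO user_blob VALUES (?, ?)",
                 (user["uid"], json.dumps(user)))


def get_user(conn, uid):
    """Fast object get: one primary-key lookup, no joins."""
    row = conn.execute(
        "SELECT body FROM user_blob WHERE uid = ?", (uid,)).fetchone()
    return json.loads(row[0]) if row else None


def users_with_role(conn, role):
    """No column to filter on, so every blob is read and decoded."""
    found = []
    for (body,) in conn.execute("SELECT body FROM user_blob"):
        user = json.loads(body)
        if role in user["roles"]:
            found.append(user["uid"])
    return sorted(found)


put_user(conn, {"uid": 1, "name": "rriegel", "roles": [1, 2]})
put_user(conn, {"uid": 2, "name": "twegner", "roles": [1]})
```

The database can no longer index, constrain, or join on anything inside `body`, which is the "no modeling, or querying" cost listed in the table.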
9. Renormalization
Join for free
- Improved performance. 10-100x!
- Retrieve an object in one request
10. Introduction to Table-Groups
Traditional SQL
Schema → Table → Column
Akiban newSQL
Schema → GROUP → Table → Column
Table-Groups are first class citizens
11. Typical Relational DB Schema
13. Table-Groups Eliminate Joins
Logical
Users (uid, name, pass): (1, rriegel, ***), (2, twegner, ***)
Users_Roles (id, rid): (1, 1), (1, 2), (2, 1)
Sessions (id, sid, timestamp): (1, 19390, 2011-10-01-06:02.00), (2, 22828, 2011-10-04-22:32.10), (1, 49377, 2011-10-04-16:07.30)
Physical
[Diagram labels: Artist Table-group, Table Group, Table, Table, bTree, bTree, bTree]
14. Benefits of Table-grouping
SQL join operations are fast
- Table Group access is equivalent to a single table access. Joins are free!
- Performance increases 10-100x
Applications do not change
- Maintain the same tables and SQL
- Objects (e.g. ORM) fetched in one request
- Akiban uses standard MySQL replication
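One way to picture why a group access can behave like a single table access is to store parent and child rows interleaved under hierarchical keys in one ordered structure. The sketch below is only an illustration of that idea under stated assumptions; the slides do not describe Akiban's storage format at this level. A sorted Python list stands in for a b-tree, and the key layout and the helpers `put` and `fetch_object` are hypothetical. The sample rows come from the Users, Users_Roles, and Sessions tables on slide 13.

```python
import bisect

keys = []   # sorted keys; stands in for the ordered leaves of one b-tree
rows = {}   # key -> row payload


def put(key, row):
    """Insert a row under a hierarchical key, keeping keys sorted."""
    if key not in rows:
        bisect.insort(keys, key)
    rows[key] = row


def fetch_object(uid):
    """Return the user row and all its child rows in one contiguous range scan."""
    lo = bisect.bisect_left(keys, (uid,))
    hi = bisect.bisect_left(keys, (uid + 1,))
    return [(k, rows[k]) for k in keys[lo:hi]]


# Parent keys are (uid,); child keys start with the parent's uid, so children
# sort directly after the parent they belong to.
put((1,), {"name": "rriegel"})
put((2,), {"name": "twegner"})
put((1, "roles", 1), {})
put((1, "roles", 2), {})
put((2, "roles", 1), {})
put((1, "sessions", 19390), {"timestamp": "2011-10-01-06:02.00"})
put((2, "sessions", 22828), {"timestamp": "2011-10-04-22:32.10"})
put((1, "sessions", 49377), {"timestamp": "2011-10-04-16:07.30"})

user_1 = fetch_object(1)
```

Because related rows are physically adjacent, assembling the object needs no join step: the range between `(uid,)` and `(uid + 1,)` already contains everything.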
15. Design Partner Sample Query
SELECT t1.id , t3.c1,
t3.c2, t3.c3, t3.c4
FROM t1
INNER JOIN t2 on t2.id = t1.id
LEFT JOIN t3 ON t1.id = t3.id
WHERE t2.region in (1297789)
AND t1.c1 = '0'
ORDER BY t1.latestLogin DESC
LIMIT 500
16. Typical MySQL EXPLAIN Plan
[Diagram: execution plan in 10 numbered steps, bottom to top]
- 3 Index Accesses
- 2 Table Accesses
- 2 Joins
- Temp Table (step 8)
- Sort (step 9)
- Project Results (step 10)
17. Efficiency for Speed and Scale
No Joins, Temp Tables or Sorts!
Akiban plan (3 steps):
1. 1 Group Index Access
2. 1 Group Access
3. Project Results
Typical MySQL EXPLAIN plan, for comparison:
- 3 Index Accesses
- 2 Table Accesses
- 2 Joins
- Temp Table
- Sort
- Project Results
19. Object Creation Query Stream
SELECT * FROM t1 WHERE uid=1387
SELECT * FROM t2 WHERE uid=1387
SELECT * FROM t3 WHERE uid=1387
SELECT * FROM t4 WHERE uid=1387
SELECT * FROM t5 WHERE uid=1387
SELECT * FROM t6 WHERE uid=1387
...
...
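The query stream above is what an application or ORM typically issues to build one object: one SELECT per table. A minimal sketch, assuming sqlite3 in place of MySQL and hypothetical table names (`users`, `users_roles`, `sessions`) and sample rows, makes the statement count explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
    CREATE TABLE sessions (uid INTEGER, sid INTEGER, ts TEXT);
    -- Hypothetical sample rows for illustration only.
    INSERT INTO users VALUES (1387, 'example');
    INSERT INTO users_roles VALUES (1387, 1), (1387, 2);
    INSERT INTO sessions VALUES (1387, 19390, '2011-10-01 06:02:00');
""")

CHILD_TABLES = ("users_roles", "sessions")  # hypothetical child tables


def load_user(conn, uid):
    """Assemble one object with one SELECT per table; return (object, statements)."""
    statements = 1
    parent = conn.execute(
        "SELECT * FROM users WHERE uid = ?", (uid,)).fetchone()
    if parent is None:
        return None, statements
    obj = dict(parent)
    for table in CHILD_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        children = conn.execute(
            f"SELECT * FROM {table} WHERE uid = ?", (uid,)).fetchall()
        obj[table] = [dict(r) for r in children]
        statements += 1
    return obj, statements


user, statements = load_user(conn, 1387)
```

The statement count grows with the number of tables that make up the object, and against a remote server each statement is also a network round trip.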
20. Becomes Single ORM Request
SELECT *,
(SELECT * FROM t2 WHERE t2.uid=t1.uid),
(SELECT * FROM t3 WHERE t3.uid=t1.uid),
...
FROM t1 WHERE t1.uid=1387;
Or simply:
get my_schema:t1:uid=1387
21. Object Access in One Request
22. Application Integration
Data replicated to Akiban
Fully independent server
HA Redirect Enabled
[Diagram labels: MySQL Master, Akiban Server, MySQL adapter, Replication, MyISAM / InnoDB Storage, Write Operations, Problem Queries]
23. Akiban is looking for Design Partners!
Do you have
• Slow multi-join read queries?
• User concurrency or data volume challenges?
http://www.akiban.com/design-partner-program
24. Ah, so you’re…
Denormalizing…no.
- Schema doesn’t change
- Data is stored once, more efficiently
Materializing Views…no.
- No triggers or post-processing
- No 2ndary logical objects
Introducing Write Latency…no.
- Previous design partner showed 2x write improvement
25. Table-Grouping: A Closer Look
Artist table (id, name, gender): (1, Lennon, M), (2, Joplin, F)
Each table maintains its own bTree
Indexes add their own bTrees
• Covering index
• Index on frequently joined columns
• Index on common sort order
How many indexes do you maintain?
• Slow updates == reduced concurrency
• More resources == more overhead
• Ongoing maintenance == high TCO
[Diagram labels: Table bTree, Covering Index, Join Cols Index, Sort Order Index]