The document discusses scalability issues with databases and proposes solutions. It introduces the concept of parallel databases which improve performance through linear scaling of reads, writes, joins and other operations. ParElastic is introduced as a parallel database architecture built on MySQL that addresses scalability through database virtualization and horizontal scaling in a way that is elastic and transparent to applications.
2. What’s this presentation about?
• Scalability and the database tier
  • What’s the problem?
  • How did we get here?
  • Some proposed solutions
  • What are parallel databases?
  • What’s ParElastic?
  • How do I get ParElastic?
• Q&A
October 3, 2013
Tweet this presentation
#parelastic
Scalability and the database tier | NYC MySQL Meetup
3. What is the scalability problem?
4. What is the scalability problem?
• Has many faces
  • Connections and Concurrency
  • Data Volume and Retention Period
  • Databases and Tenants
  • Read vs. Write
• Your problem(s)
• May be more than one
• May change over time
5. Connections and Concurrency
• More [Active] Connections
• Worse Performance
• Sizing your database
6. Data Volume and Retention Period
• Longer Retention Period
• More Data
• More Data
• Worse Performance
• Progressive deterioration
• All data in memory
• All indexes in memory
• Not enough memory
7. Databases and “Tenants”
• Common paradigm in SaaS applications
• Each tenant’s application instance has a database
• Several databases on each database instance
• More databases per instance
• Worse Performance
In one customer engagement we were informed that no more than 1000 tenants could be located on one database instance before performance became unacceptable.
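As a rough illustration of what such a ceiling means for capacity planning, the arithmetic can be sketched as below. The 1000-tenant limit is the figure quoted above from one engagement, not a general rule, and the function name `instances_needed` is hypothetical.

```python
import math

# Assumption: the per-instance ceiling reported in the customer engagement above.
# Real limits depend on workload, schema size, and hardware.
MAX_TENANTS_PER_INSTANCE = 1000


def instances_needed(tenant_count: int,
                     max_per_instance: int = MAX_TENANTS_PER_INSTANCE) -> int:
    """Return the minimum number of database instances for tenant_count tenants."""
    if tenant_count < 0 or max_per_instance <= 0:
        raise ValueError("tenant_count must be >= 0 and max_per_instance > 0")
    # Round up: a partially filled instance still has to be provisioned.
    return math.ceil(tenant_count / max_per_instance)


if __name__ == "__main__":
    for tenants in (800, 1000, 2500):
        print(tenants, "tenants ->", instances_needed(tenants), "instance(s)")
```

The point of the sketch is only that the instance count grows in steps with the tenant count, so a SaaS deployment keeps adding database instances as it signs up customers.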
8. Read vs. Write
• Simple read (SELECT) queries could scale well
• Key based lookups
• With favorable indexes
• Things that cause heartburn
• Complex joins (with large data sets)
• Sorts
• Aggregation
• Reads are easier to scale than writes
9. How did we get here?
A brief history lesson
10. How did we get here? [1]
• A combination of factors
• Changes in the application user/usage
• Driven by the Internet and mobile computing
• “News Cycles” are getting shorter
• Economics
• Commodity computing is cheap and getting cheaper
• Solutions that can “scale-out” win, others lose
• Ability to leverage higher core-densities
• Other databases do a better job at this than MySQL
• MySQL would do great if you had a 20GHz processor ;)
11. How did we get here? [2]
• The Evolution of the Database Management System
• A battle between “generalized” and “specialized”
• The Relational Database Management System (RDBMS)
• Designed for monolithic systems
• SMP
• Scale-Up
• Applications evolve quickly!
• Databases respond slowly
12. How did we get here? [3]
• Moore’s Law
• Scale-Up seemed like a fine answer
• But there are limits …
13. How did we get here? [4]
• Database architectures traditionally were
• Shared CPU/Memory/Disk
• Also known as “Shared-Everything”
• But “Shared-Everything” doesn’t scale
• At least not for databases
A server costing twice as much doesn’t always give you twice as much database “power”. You reach a point of diminishing returns.
14. How did we get here? [5]
• You can pay more but you may not get more
Source: Amazon RDS TPC-C Benchmark. Md. Borhan Uddin, Bo He, Radu Sion, Cloud Computing Center, SUNY Stony Brook. Viewed online at http://digitalpiglet.org/research/sion2010cloud-rds.pdf
16. Some proposed solutions
• Several strategies have been advocated
  • Cache, Cache, Cache, …
  • Get a bigger server [a.k.a. Scale-Up]
  • Sharding [a form of Scale-Out]
  • NoSQL or NewSQL [typically Scale-Out]
  • Replication and variants
• We look at each one in more detail
17. Cache, Cache, Cache!
“That’s easy! Do some caching!”
caching (transitive verb): to cache
cache (noun): temporary computer storage used for quick retrieval of data in order to increase processing speed.
• Caching only addresses ‘read’, not ‘write’
• Social media workloads are ‘write heavy’, ‘interactive’ and ‘highly personalized’
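To illustrate why caching helps reads but not writes, here is a minimal sketch of a read-through cache. The class name `ReadThroughCache`, the dictionary standing in for the database, and the key names are all hypothetical; the slide does not name any particular caching product or policy.

```python
class ReadThroughCache:
    """Hypothetical read-through cache in front of a slower backing store."""

    def __init__(self, store):
        self.store = store        # a dict standing in for the database
        self.cache = {}
        self.store_reads = 0      # reads that actually reached the store
        self.store_writes = 0     # writes that actually reached the store

    def get(self, key):
        if key in self.cache:     # cache hit: the store is not touched
            return self.cache[key]
        self.store_reads += 1     # cache miss: read the store once
        value = self.store.get(key)
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.store_writes += 1    # every write still reaches the store
        self.store[key] = value
        self.cache[key] = value   # keep the cached copy in step


db = {"user:1": "alice"}
cached = ReadThroughCache(db)

for _ in range(100):              # 100 reads of the same key
    cached.get("user:1")
for i in range(100):              # 100 writes to the same key
    cached.put("user:1", f"status-{i}")

print(cached.store_reads, cached.store_writes)
```

The hundred reads cost the store a single lookup, while the hundred writes all land on it, which is why a write-heavy workload sees little relief from a cache.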
18. Get a bigger server [Scale-Up]
“I will use a bigger database server.”
Can I even get a bigger server?
What if m2.4xlarge isn’t enough?
Maybe I just have too much data?
Maybe I have too many users?
19. Sharding [a form of Scale-Out]
“Sharding will solve my problem!”
shard (noun, ˈshärd): a piece or fragment of a brittle substance <shards of glass>; broadly: a small piece or part
sharding (noun, ˈshär-diŋ):
(a) to make one’s application brittle or fragmented;
(b) to take one big problem and make many small problems;
(c) to complicate an application while claiming to solve a scalability problem;
(d) to decrease developer productivity;
(e) a bad idea;
(f) sharding library: a mechanism that attempts (unsuccessfully) to hide the bad taste of sharding
20. NoSQL or NewSQL?
“You need NoSQL or NewSQL!”
• Yes, I have to rewrite my application
• Yes, not all queries will work
• No, there’s no standard query language
• No, most do not have ACID guarantees; hell, some don’t even guarantee Durability
• Yes, most are somewhat untried science experiments
• More flavors than Ben & Jerry’s Ice Cream [yes, really]
• But, all the cool kids are doing it!
21. Replication and variants
• Replication based solutions (typically called clustering)
  • Many copies of the data
  • Distribute queries across the copies
  • Keep the copies synchronized: like herding cats
  • Write bottleneck
• Read/Write splitting
  • Single Master (gets all the writes)
  • Many Slaves (share the reads)
  • Unpredictable latency
  • Write bottleneck
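Read/write splitting can be sketched as a small router that sends SELECT statements to the read copies in turn and everything else to the single master. This is only an illustration: the class `ReadWriteSplitter`, the server names, and the rule of classifying a statement by its first keyword are assumptions, not a description of any specific MySQL proxy.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: reads rotate over the replicas, writes go to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, sql):
        # Classify by the first keyword only; a real proxy needs far more care
        # (transactions, SELECT ... FOR UPDATE, stored procedures, and so on).
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._replicas)
        return self.master


router = ReadWriteSplitter("master", ["replica-1", "replica-2"])
statements = [
    "SELECT * FROM customer WHERE id = 7",
    "INSERT INTO customer VALUES (8, 'Bo')",
    "SELECT COUNT(*) FROM customer",
    "UPDATE customer SET name = 'Al' WHERE id = 7",
]
targets = [router.route(s) for s in statements]
print(targets)
```

Both writes land on the one master however many read copies are added, which is the write bottleneck the slide refers to. A SELECT issued right after a write may also reach a copy that has not yet applied it, one source of the unpredictable latency.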
22. What about MySQL Cluster?
• MySQL Cluster is a strange beast
• For best results, you must use the NDB interface
• Only supports the NDB storage engine
• Primarily a distributed in-memory Key-Value Store
• That is ACID compliant and supports joins and things if you use the SQL interface
• But no one tells you about the performance of this path!
• Published benchmarks are all “flexAsynch”, which talks directly to the NDB interface
• And READ-ONLY
For more details visit http://www.parelastic.com/blog/mysql-cluster-and-benchmarks
Or stick around after the presentation and we can chat!
24. What are parallel databases?
• A database architecture proposed in 1992 ¹
• Very successfully applied to many database problems
• Oracle Exadata, Netezza, Teradata, Greenplum, …
• An example of the “Shared Nothing” database paradigm ²
¹ Parallel Database Systems: The future of high performance database processing [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]
² The Case for Shared Nothing [1986, Stonebraker, http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf]
25. How parallel databases execute queries
Image from “Parallel Database Systems: The future of high performance database processing” [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]
26. Benefits of parallel databases
• Linear improvement in “reads”
• Linear improvements in “writes”
• Better than linear improvement in “joins”
• Better than linear improvement in “aggregation”
• Better than linear improvement in “sorts”
For more details, refer to “Parallel Database Systems: The future of high performance database processing” [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]
27. Parallel Databases vs. Sharding
• Parallel Database
  • Database architecture
  • Application is data location agnostic
  • Application perceives a single database
  • Requires no application rewrites
  • Application is not constrained by parallel database architecture
  • A parallel database handles any schema
• Sharding
  • Application architecture
  • Application is data location aware
  • Application perceives a collection of databases
  • Requires application rewrites
  • Application is constrained to the limitations of the sharding architecture
  • Not all schemas are shard’able
29. What is ParElastic?
• An approach to relational database virtualization
• Addresses issues of scalability in relational databases
• A parallel database architecture
• Built on standard MySQL or MySQL variant databases
• Horizontal Scalability
• Elastic
30. ParElastic: System Architecture
ParElastic Architecture protected by US8214356, “Apparatus for elastic database processing with heterogeneous data”
10/7/2013
Flex Your Database | ParElastic® Database Virtualization Engine
31. Data Distribution: How it works
• User data is “distributed” across multiple storage nodes
• Queries are executed in parallel by some [or all] nodes
• Multiple distribution models supported
  • Range
  • Hash
  • Broadcast
  • Random
• ParElastic guarantees co-location and query execution
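The four distribution models listed above can be sketched as functions that map a row's distribution key to the storage node(s) holding it. This is a generic illustration, not ParElastic's implementation: the node names, the CRC32 hash, and the range boundaries are all assumptions.

```python
import bisect
import random
import zlib

NODES = ["node0", "node1", "node2", "node3"]   # hypothetical storage nodes

# Hypothetical range boundaries: node0 holds keys < 1000, node1 holds
# 1000..1999, node2 holds 2000..2999, node3 holds everything from 3000 up.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def range_nodes(key):
    """Range: ordered key ranges map to nodes."""
    return [NODES[bisect.bisect_right(RANGE_UPPER_BOUNDS, key)]]


def hash_nodes(key):
    """Hash: a stable hash of the distribution key picks one node."""
    return [NODES[zlib.crc32(str(key).encode()) % len(NODES)]]


def broadcast_nodes(_key=None):
    """Broadcast: every node keeps a full copy (useful for small lookup tables)."""
    return list(NODES)


def random_nodes(_key=None, rng=random):
    """Random: any node will do; the key plays no part in placement."""
    return [rng.choice(NODES)]


# Two tables hashed on the same customer id place matching rows on the same
# node, so a join on that column needs no data movement between nodes.
customer_home = hash_nodes(42)
orders_home = hash_nodes(42)
print(range_nodes(1500), customer_home == orders_home, broadcast_nodes())
```

The last lines show the idea behind co-location: when two tables are distributed on the same key with the same function, rows that join with each other sit on the same node.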
32. Storage Elasticity: How it works
• A “generational scheme”
• Storage Nodes added over time
• Each creates a new “generation”
• Unnecessary to migrate large amounts of data
• A key drawback with “sharding” that requires “resharding”
Storage Elasticity protected by US8478790, US8386532 and other patents.
33. ParElastic: How It Works
34. ParElastic: Simple query processing example
SELECT COUNT(*)
FROM CUSTOMER;

count(*)
--------
2771
(1 row affected)

PROVISION 1 DYNAMIC NODE
ON DYNAMIC NODE
    CREATE TEMP TABLE T1 ( C INT );
ON ALL STORAGE NODES
    SELECT COUNT(*) FROM CUSTOMER
    AND REDISTRIBUTE TO T1
ON DYNAMIC NODE
    SELECT SUM(C) FROM T1;
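The plan above can be imitated with a few in-memory SQLite databases standing in for the storage nodes and the dynamic node. This is a sketch of the scatter-gather idea only: SQLite replaces MySQL, the three-node split by id is an assumption, and the loop visits the nodes one after another where a parallel database would run the partial counts concurrently.

```python
import sqlite3


def make_storage_node(customer_ids):
    """Create one stand-in storage node holding a slice of CUSTOMER."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO customer (id) VALUES (?)", [(i,) for i in customer_ids]
    )
    return conn


# 2771 customers spread over three storage nodes by id modulo 3 (assumed).
all_ids = range(2771)
storage_nodes = [
    make_storage_node([i for i in all_ids if i % 3 == n]) for n in range(3)
]

# The dynamic node only holds the temporary table of partial counts.
dynamic_node = sqlite3.connect(":memory:")
dynamic_node.execute("CREATE TEMP TABLE t1 (c INT)")

# Each storage node counts its own rows; the partial count is "redistributed"
# to T1 on the dynamic node.
for node in storage_nodes:
    (partial,) = node.execute("SELECT COUNT(*) FROM customer").fetchone()
    dynamic_node.execute("INSERT INTO t1 (c) VALUES (?)", (partial,))

# The final answer is the sum of the partial counts.
(total,) = dynamic_node.execute("SELECT SUM(c) FROM t1").fetchone()
print(total)
```

Only one small integer per storage node crosses the network, which is why COUNT-style aggregates distribute so well.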
35. ParElastic Performance Benefits
• Connection Scalability
• ParElastic Tier Elasticity; have more or fewer ParElastic servers
• Storage / Data Volume Scalability
• Add ParElastic Persistent Nodes as data volumes increase
• Multiple machines working together
• Workloads are variable
• Compute Node Elasticity; have more or fewer as required
• Databases and Tenants [SaaS applications]
• ParElastic Adaptive Multi-tenancy ™
• No application change
• Queries processed by, data stored on standard MySQL!
39. ParElastic data “ingest”
One Million rows/s!
15 Storage Nodes, 2 ParElastic Servers
Tests conducted in Amazon Cloud. Native MySQL testing on m1.xlarge server, standard MySQL, standard EBS volumes. Test driver was a c1.xlarge server to provide sufficient CPU head-room to generate load. ParElastic run with 5 and 15 persistent storage nodes identically configured, m1.xlarge, standard MySQL, standard EBS Volumes. 15 node test employed two c1.xlarge test drivers. Best ParElastic performance was with 10 threads, 10 persistent storage nodes and an insert batch size of 5,000 tuples per insert batch. Best native MySQL performance was with 2 threads and a batch size of 10,000 tuples per insert batch.
40. What’s the ParElastic Overhead?
[Diagram: a test client on Machine 1 querying mysqld directly, compared with the same client querying through ParElastic, which sits in front of several mysqld machines.]
• Query time, direct to mysqld: 15.72ms
• Query time, through ParElastic: 17.03ms
• Network RTT: 0.35ms
• ParElastic overhead ~ 1.31ms
41. Characterizing ParElastic Performance
• A “fixed cost”, the overhead per query
• A “variable cost” for query processing
• Consider this example, a simple “COUNT” query.
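As a worked illustration of the fixed cost, the overhead quoted under “40. What’s the ParElastic Overhead?” is the difference between the two measured query times. The sketch below then asks how much that fixed cost matters for queries of different lengths. It assumes the overhead stays at roughly the same value whatever the query, which the slides do not state, and it leaves out the variable cost because no figures are given for it; the function name `overhead_share` is hypothetical.

```python
DIRECT_MS = 15.72          # query time against mysqld directly
VIA_PARELASTIC_MS = 17.03  # same query routed through ParElastic

# Fixed per-query overhead: the difference between the two measurements.
fixed_overhead_ms = round(VIA_PARELASTIC_MS - DIRECT_MS, 2)


def overhead_share(native_query_ms, fixed_ms=fixed_overhead_ms):
    """Fraction of the total time spent on the fixed per-query overhead."""
    return fixed_ms / (native_query_ms + fixed_ms)


print(f"fixed overhead: {fixed_overhead_ms} ms")
for native_ms in (1.0, 15.72, 1000.0):
    share = overhead_share(native_ms)
    print(f"{native_ms:8.2f} ms query: overhead is {share:.1%} of the total")
```

Under that assumption the fixed cost dominates very short queries and becomes negligible for long-running ones.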
42. Some things to keep in mind
• Horizontal Scale-Out benefits from
• Being “stateless”, or at least having less state
• Adhering to a truly “shared nothing” approach
• Horizontal Scale-Out is impeded by
• Complex or Shared “State”
• Things that violate the “shared nothing” paradigm
43. What is ParElastic?
• An approach to relational database virtualization
• "A Hypervisor for the Database Tier"
• Scale out database capacity across many servers
• Effectively handle workloads too big for one server
• Share this pool of databases among many applications
• Efficiently allocate database capacity to workload
• An elastic, multi-tenant, parallel database architecture
• Built on standard MySQL or MySQL variant databases
• Horizontal Scalability
• Elastic
44. Some target markets
• Database Virtualization – “Hypervisor for the Database”
• Reduce capex and simplify administration for development and test
• SaaS Enablement
• Simplified deployment of SaaS applications using multi-tenancy
• High Volume Database Applications
• High traffic websites (e.g. social, ecommerce, on-line games)
• High speed data ingest (e.g. click tracking, sensor arrays, mobile)
45. Where do I get ParElastic?
46. Getting ParElastic
• For Evaluations
• Available at no charge on Amazon Marketplace
• Preconfigured for evaluation purposes; not performance testing
• Runs completely on a single EC2 instance
• For Larger Configurations
  • Contact ParElastic
  • Email: info@parelastic.com
  • Twitter: @parelastic
  • Web: http://www.parelastic.com
47. Getting ParElastic
• On the Amazon AWS Marketplace (aws.amazon.com/marketplace)
• Quick start guide and simple (two-step) setup wizard provided.
49. Conclusion
• Database Scalability is a very real problem
• The Cloud has put a very complicated wrinkle in it
• The problem was seen before with commodity servers
• Virtualization was able to address this problem
• Several “hacks” have been proposed
• Not really solutions, just hacks
• ParElastic is a database virtualization solution
• Based on standard relational databases
• Provides benefits of horizontal scalability and multi-tenancy
• ParElastic is available for evaluation on many platforms
• Free evaluation also available on Amazon Marketplace
50. Contacting ParElastic
• Look us up online
– http://www.parelastic.com
• Watch an explainer video
– http://www.parelastic.com/video
• Contact us
– Email: info@parelastic.com
52. Image Credits
• Moore’s Law
  • Wikipedia [http://commons.wikimedia.org/wiki/File%3ATransistor_Count_and_Moore's_Law_-_2011.svg]
• Hercules slays the Hydra
  • Wikipedia [http://commons.wikimedia.org/wiki/File%3AHercules_slaying_the_Hydra.jpg]
• CPU History
  • Phillip E. Ross, “Why CPU Frequency Stalled” [http://spectrum.ieee.org/computing/hardware/why-cpu-frequency-stalled]
• Herding Cats
  • Image from [http://wodongatafe.wordpress.com/2011/05/27/herding-cats-or-facilitating-a-webinar-whats-the-difference/]