This is my presentation from Percona Live! London last year. I describe my Big Data system built on MySQL/NDB and show that we can preserve the relational model for most Big Data purposes in the real world.
A Global In-memory Data System for MySQL
1. A Global In-memory Data System for MySQL
Daniel Austin, PayPal Technical Staff
Percona Live!
London, Oct. 25, 2011 v1.1
2. AGENDA
Intro: Globalizing NDB
Proposed Architecture
What We Learned
Q&A
Global In-memory MySQL Confidential and Proprietary 2
3. Our Mission
“UserBase: Develop a globally distributed DB for user-related data”
• Must Not Fail (99.999%)
• Must Not Lose Data. Period.
• Must Support Transactions
• Must Be FAST
• Must Support (some) SQL
• May Be Buzzword-compliant (RFC 2119)
4. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS
“How Do We Manage Reliable Distribution of Data Across Geographical Distances?”
5. To SQL or Not to SQL, ’Tis the Query
• Modern NoSQL Systems as solutions
– Hive, Cassandra, Mongo, 120+ more
– Mostly designed for 'Big Data' problems
– Low levels of maturity
• Trade-offs:
– Consistency vs. Availability, Fault Tolerance (dreaded CAP theorem)
– Relational Model & X-actions dropped
6. SYSTEM AVAILABILITY DEFINED
• Availability of the entire system:
$A_{sys} = 1 - \prod_{i=1}^{n} \left(1 - \prod_{j=1}^{m} r_{ij}\right)$
• Number of Parallel Components Needed to Achieve Availability $A_{min}$:
$N_{min} = \left[\ln(1 - A_{min}) / \ln(1 - r)\right]$
[Diagram: VIP in front of parallel and serial components]
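The two availability formulas on this slide can be tried out numerically. The following is a minimal sketch, not code from the talk: it assumes the system is a set of parallel chains, each made of serial components with availability r_ij, and it assumes the square brackets in the N_min formula mean rounding up to a whole component. All function names are hypothetical.

```python
import math


def system_availability(chains):
    """Availability of parallel chains, each a series of components.

    chains[i][j] is the availability r_ij of component j in chain i.
    """
    p_all_chains_down = 1.0
    for chain in chains:
        p_chain_up = math.prod(chain)            # serial: every component must be up
        p_all_chains_down *= 1.0 - p_chain_up    # parallel: down only if every chain is down
    return 1.0 - p_all_chains_down


def min_parallel_components(a_min, r):
    """Smallest whole N with 1 - (1 - r)**N >= a_min (assumes rounding up)."""
    return math.ceil(math.log(1.0 - a_min) / math.log(1.0 - r))


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365.25-day year."""
    return (1.0 - availability) * 365.25 * 24 * 60


two_chains = system_availability([[0.99, 0.99], [0.99, 0.99]])
needed = min_parallel_components(0.99999, 0.99)
budget = downtime_minutes_per_year(0.99999)
print(f"two chains of two 99% components: {two_chains:.6f}")
print(f"99% components needed in parallel for five nines: {needed}")
print(f"five nines allows about {budget:.2f} minutes of downtime per year")
```

Under these assumptions, the 99.999% target from the mission slide leaves only a few minutes of downtime per year, which is why redundancy has to be built into every layer.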
7. What about “High Performance”?
• Light-speed travel time across the maximum distance on Earth's surface (half the circumference): ~67 ms
• Target: data available worldwide in < 1000 ms
Sound Easy?
Think Again!
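The ~67 ms figure can be checked with a short calculation. This is a rough sketch rather than anything from the talk: the Earth circumference is an approximate equatorial value, and the two-thirds speed factor for optical fiber is an assumption added here to show that real links are slower than the vacuum limit.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458    # speed of light in vacuum
EARTH_CIRCUMFERENCE_KM = 40_075.0    # approximate equatorial circumference
FIBER_SPEED_FRACTION = 2.0 / 3.0     # assumption: light in glass fiber travels at about 2/3 c


def one_way_latency_ms(distance_km, speed_fraction=1.0):
    """One-way propagation delay in milliseconds over distance_km."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * speed_fraction) * 1000.0


# The farthest two points on the surface are half a circumference apart.
half_way_round_km = EARTH_CIRCUMFERENCE_KM / 2.0
vacuum_ms = one_way_latency_ms(half_way_round_km)
fiber_ms = one_way_latency_ms(half_way_round_km, FIBER_SPEED_FRACTION)

print(f"vacuum: {vacuum_ms:.1f} ms one way")
print(f"fiber (assumed 2/3 c): {fiber_ms:.1f} ms one way")
```

Every synchronous round trip between distant regions spends at least twice this delay, so a global write that needs several round trips uses up a large share of the 1000 ms target.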
8. Intro: Globalizing NDB
Proposed Architecture
What We Learned
Q&A
9. WHY NDB?
Pro
• True HA by design
– Fast recovery
• Supports (some) X-actions
• Relational Model
• In-memory architecture = high performance
• Disk storage for non-indexed data (since 5.1)
• APIs, APIs, APIs
Con
• Some semantic limitations on fields
• Size constraints (2 TB?)
– Hardware limits also
• Higher cost/byte
• Requires reasonable data partitioning
• Higher complexity
10. How NDB Works in One Slide
Graphics courtesy dev.mysql.com
12. AWS Meets NDB
• Why AWS?
– Cheap and easy infrastructure-in-a-box
(Or so we thought! Ha!)
• Services Used:
– EC2 (CentOS 5.3, small instances for mgm
& query nodes, XL for data nodes)
– Elastic IPs/ELB
– EBS Volumes
– S3
– CloudWatch
13. ARCHITECTURAL TILES
Tiling Rules
• Never separate NDB & SQL
• NDB:2-SQL:1-MGM:1
• Scale by adding more tiles
• Failover 1st to nearest AZ
• Then to nearest DC
• At least 1 replica/AZ
• Don’t share nodes
• Mgmt nodes are redundant
Limitations
• AWS is network-bound @ 250 MBPS – ouch!
• Need specific ACL across AZ boundaries
• AZs not uniform!
• No GSLB
• Dynamic IPs
• ELB sticky sessions !reliable
(Diagram: AWS Availability Zones A, B and C behind an ELB. Legend: NDB, MGM, SQL, Unused (not present in all locations).)
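The failover rule (nearest AZ first, then nearest data center) can be sketched as a sort order over tiles. This is only an illustration: the `Tile` class, the `failover_order` function, and the inter-region latency figures are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    region: str   # AWS data center, e.g. "US-E"
    az: str       # availability zone within that data center, e.g. "A"
    ndb: int = 2  # data nodes per tile (NDB:2-SQL:1-MGM:1 rule)
    sql: int = 1
    mgm: int = 1


def failover_order(failed, tiles, region_latency_ms):
    """Order surviving tiles: same data center first, then by latency to others."""
    def key(tile):
        same_region = tile.region == failed.region
        latency = 0.0 if same_region else region_latency_ms[(failed.region, tile.region)]
        return (0 if same_region else 1, latency, tile.az)

    return sorted((t for t in tiles if t != failed), key=key)


if __name__ == "__main__":
    tiles = [Tile("US-E", "A"), Tile("US-E", "B"), Tile("US-W", "A"), Tile("EU", "A")]
    # Hypothetical latencies, for illustration only.
    latency = {("US-E", "US-W"): 70.0, ("US-E", "EU"): 90.0}
    for tile in failover_order(tiles[0], tiles, latency):
        print(tile.region, tile.az)
```

Keeping the ordering in one pure function makes it easy to test against the tiling rules before wiring it into a load balancer or health checker.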
14. Architecture Stack
Scale by Tiling
(Diagram: A/B tile pairs repeated across the data centers.)
5 AWS Data Centers:
US-E, US-W, TK, EU, AS
15. Other Technologies Considered
• Paxos
– Elegant-but-complex consensus-based
messaging protocol
– Used in Google Megastore, Bing metadata
• Java Query Caching
– Queries as serialized objects
– Not yet working
• Multiple Ring Architectures
– Even more complicated = no way
16. Intro: Globalizing NDB
Proposed Architecture
What We Learned
Q&A
17. SYSTEM READ/WRITE PERFORMANCE (!)
What we tested:
• 32 & 256 byte char fields
• Reads, writes, query speed vs. volume
• Data replication speeds
Results:
• Global replication < 350 ms
• 256 byte read < 10 ms worldwide
(Chart: in-region replication tests, 06/19/2011 to 06/23/2011.)
18. Data Models and Query Optimization for NDB
• Network Latency is an obvious issue
• Data model requires all segments present
in each geo-region
• Parameterized (Linked) Joins
– SPJ technique from Clustra (see Clement Frazer’s blog for details)
19. Commit Ordering
• Why does commit ordering matter?
• Write operators are non-commutative
[W(d,t1),W(d,t2)] != 0 unless t1=t2
– Can lead to inconsistency
– Can lead to timestamp corruption
– Forcing sequential writes defeats Amdahl’s rule
• Can show up in GSLB scenarios
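Why ordering matters can be shown with two replicas that receive the same pair of writes in different orders. The sketch below is a generic illustration; the `Write` class and the last-writer-wins rule are assumptions made for the example and are not presented as the conflict handling used in the system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    origin: str  # region that issued the write
    ts: int      # commit timestamp
    value: str   # new value for the datum d


def apply_in_arrival_order(writes):
    """Naive replica: each arriving write overwrites the previous value."""
    value = None
    for w in writes:
        value = w.value
    return value


def apply_last_writer_wins(writes):
    """Order-independent alternative: keep the write with the greatest timestamp."""
    return max(writes, key=lambda w: (w.ts, w.origin)).value


if __name__ == "__main__":
    w1 = Write("US-E", ts=1, value="a")
    w2 = Write("EU", ts=2, value="b")
    # Same writes, different arrival order at two replicas.
    print(apply_in_arrival_order([w1, w2]), apply_in_arrival_order([w2, w1]))
    print(apply_last_writer_wins([w1, w2]), apply_last_writer_wins([w2, w1]))
```

The naive replicas end up with different values for the same datum, while the timestamp-based rule converges. It does so only if the timestamps themselves can be trusted, which is where the timestamp corruption concern comes in.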
20. Dark Side of AWS
• Deploying NDB at scale on AWS is hard
– Dynamic IPs (use hostfile)
– DNS issues
– Security groups (ec2-authorize)
– Inconsistent EC2 deployments
• Are availability zones independent
network segments?
– No GSLB (!) (rent or buy)
Be Prepared to struggle a bit!
21. NOTES ON GLOBAL LOAD BALANCING
• Don’t try this at home
– Rent or buy a solution
– BGP-based solutions are best
• Deployment and config for AWS are tough
• We tried both Zeus and Dyn
– Liked Dyn better, $$ & ease of use
• Absolutely crucial part of infrastructure!
Graphics courtesy http://www.oes.co.th
22. Hard Lessons, Shared
• Be Careful…
– With “Eventual Consistency”-related concepts
– ACID, CAP are not really as well-defined as we’d like considering how often we invoke them
• NDB is a good solution
– Real HA, real SQL
– Notable limitations around fields, datatypes
– Successfully competes with NoSQL systems for most use
cases – better in many cases
• NoSQL Systems
– All have relatively low levels of maturity
– More suitable for simple key-value models
– Better for very high volumes
23. Summing Up on v0.7
• It works!
• Very fast, very reliable
• Very complicated!
• AWS poses challenges that private data
centers may not experience
• You can achieve high performance and
availability without giving up relational
models and read consistency!
24. Future Directions
• Alternate solution using Pacemaker,
Heartbeat
– From Yves Trudeau @ Percona
– Uses InnoDB, not NDB
• Implement Memcached plugin
– To test NoSQL functionality, APIs
• Add simple connection-based persistence
to preserve connections during failover
• Better data node distribution
• Better testing & monitoring
25. “In the long run, we are all dead
eventually consistent.”
Maynard Keynes on NoSQL Databases
Twitter: @daniel_b_austin
Email: daaustin@paypal.com
With apologies and thanks to the real DB experts, Andrew Goodman, Yves
Trudeau, Clement Frazer, Daniel Abadi, and everyone else!
Editor's Notes
Service Reliability.
This is really the problem we want to solve. It’s one of the fundamental problems in computer science and doesn’t have a completely satisfactory solution.
Performance is response time.
NDB has a much higher maturity level than most NoSQL systems