This document summarizes a presentation about using MySQL and the NDB storage engine to build a globally distributed in-memory database system on AWS. It proposes tiling MySQL/NDB clusters across AWS availability zones to provide high availability and performance at large scale. Key challenges discussed include managing data consistency across wide geographical distances and dealing with AWS limitations such as network performance and the lack of global load balancing. The lesson learned is that NDB can successfully compete with NoSQL for most use cases by providing ACID compliance without sacrificing availability or performance.
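The lack of global load balancing mentioned above means request routing across zones has to happen in the application tier. A minimal sketch of nearest-zone replica routing, with hypothetical zone names and made-up latencies (not the presenter's actual implementation):

```python
# Sketch: route each read to the lowest-latency healthy replica.
# Zone names and latency figures below are illustrative, not measured.
def pick_replica(latencies_ms, healthy):
    """Return the healthy zone with the lowest measured round-trip latency."""
    candidates = {z: ms for z, ms in latencies_ms.items() if z in healthy}
    if not candidates:
        raise RuntimeError("no healthy replicas")
    return min(candidates, key=candidates.get)

latencies = {"us-east-1a": 2.1, "us-west-1a": 74.0, "eu-west-1a": 88.5}
print(pick_replica(latencies, healthy={"us-east-1a", "eu-west-1a"}))  # us-east-1a
```

If the nearest zone is unhealthy, the same call simply falls through to the next-best zone, which is the failover behavior a global tier needs when no managed load balancer spans regions.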
A Global In-memory Data System for MySQL (Daniel Austin)
This is my presentation from Percona Live! London last year. I describe my Big Data system built on MySQL/NDB and show that we can preserve the relational model for most Big Data purposes in the real world.
A summary of past Cassandra benchmarks performed by Netflix and a description of how Netflix uses Cassandra, interspersed with a live demo, automated using Jenkins and JMeter, that created two 12-node Cassandra clusters from scratch on AWS, one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.
Conference tutorial: MySQL Cluster as NoSQL (Severalnines)
Slides from the 'MySQL Cluster as NoSQL' tutorial at Percona Live MySQL Conference 2012 in London.
Tutorial covers:
* MySQL Cluster administration
* NoSQL options for MySQL Cluster and when to use what
* Memcached (installation and configuration)
* Cluster/J
* NDBAPI
* Benchmarking of different access methods on a live cluster
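The benchmarking step in the tutorial outline above can be sketched with Python's `timeit`; the two "access methods" here are stand-ins (a keyed lookup versus a full scan over the same data), not the tutorial's actual harness or cluster:

```python
import timeit

# Stand-in "datastore": compare two access patterns against the same data,
# mirroring the idea of benchmarking different access methods on one cluster.
data = {f"key{i}": i for i in range(10_000)}
rows = list(data.items())

def keyed_lookup():          # primary-key style access, O(1)
    return data["key9000"]

def full_scan():             # scan-style access, O(n)
    return next(v for k, v in rows if k == "key9000")

t_key = timeit.timeit(keyed_lookup, number=1000)
t_scan = timeit.timeit(full_scan, number=1000)
print(f"keyed: {t_key:.4f}s  scan: {t_scan:.4f}s")
```

The point of such a harness is that both methods must return the same answer; only then is the timing difference a fair comparison of access paths.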
From distributed caches to in-memory data grids (Max Alexejev)
A brief introduction to modern caching technologies, from distributed memcached to modern data grids like Oracle Coherence.
These slides were presented during a distributed caching tech talk in Moscow on May 17, 2012.
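The step from a plain memcached tier to a data grid hinges on how keys are partitioned across nodes. A minimal consistent-hash ring with virtual nodes, a standard technique rather than any product's actual implementation, looks like:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        # Each node contributes `vnodes` points on the ring for smoother balance.
        self._ring = sorted(
            (self._hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # A key belongs to the first ring point at or after its hash (with wraparound).
        i = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["cache1", "cache2", "cache3"])
print(ring.node_for("user:42"))
```

The payoff is stability under membership changes: removing a node only remaps the keys that lived on it, while keys on the surviving nodes keep their placement, which is why this scheme displaced naive `hash(key) % n` sharding in cache tiers.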
Replication enhancements in MySQL 5.6, including GTIDs, HA/self-healing, multi-threaded slaves and more. The slides cover the design rationale, implementation, and how to get started using these new capabilities.
Cisco UCS Integrated Infrastructure for Big Data with Cassandra (DataStax Academy)
With the growing popularity of big data, it becomes imperative for enterprises to adopt the right platform for their workload, with efficient and user-friendly management of large-scale clusters. In this session we will explore Cisco's innovations in leading-edge infrastructure, well suited to database platforms like Cassandra and purpose-built for performance and scalability. This enables customers to unlock the intelligence in their data; it not only provides a sustainable competitive advantage to their business but also scales with their growing business needs.
How LinkedIn uses memcached, a spoonful of SOA, and a sprinkle of SQL to scale (LinkedIn)
This is one of two presentations given by LinkedIn engineers at Java One 2009.
This presentation was given by David Raccah and Dhananjay Ragade from LinkedIn.
For more information, check out http://blog.linkedin.com/
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes (DataStax Academy)
Presenter: Roopa Tangirala, Senior Cloud Data Architect at Netflix
High availability is an important requirement for any online business; architecting around failures, expecting infrastructure to fail and remaining highly available even then, is the key to success. One such effort at Netflix was the Active-Active implementation, which provided region resiliency. This presentation gives a brief overview of the Active-Active implementation and how it leveraged Cassandra's architecture in the backend to achieve its goal. It covers our journey through Active-Active from Cassandra's perspective and the data validation we did to prove the backend would work without impacting the customer experience. The various problems we faced, such as long repair times and gc_grace settings, plus lessons learned and what we would do differently next time, are also discussed.
This Introduction to GlusterFS webinar provides an introduction to and review of the GlusterFS architecture and key functionalities. Learn how GlusterFS is deployed in the datacenter, in the cloud, or between the two. We'll also cover a brief update on GlusterFS v3.3, which is currently in beta.
Introduction to GlusterFS Webinar - September 2011 (GlusterFS)
Looking for a high performance, scale-out NAS file system? Or are you a new user of GlusterFS and want to learn more? This educational monthly webinar provides an introduction and review of the GlusterFS architecture and key functionalities. Learn how GlusterFS is deployed in the datacenter, in the cloud, or between the two.
White Paper - CEVA-XM4 Intelligent Vision Processor (CEVA, Inc.)
A change has come to consumer electronics. Once confined to the desktop, processing-intensive algorithms for image enhancement, computational photography and computer vision have moved en masse to camera-ready smartphones, tablets, wearables and other embedded mobile devices. This movement has already hit the limits of today's underlying hardware's ability to keep pace in terms of performance, space and energy efficiency, yet we are only seeing the tip of the iceberg.
A clear and tangible indicator of recent advances in mobile imaging and vision that are pushing these limits of design is the dual-camera smartphone, with its accompanying sensor and signal-chain processing for 3D vision and scanning, along with many other image-enhancement features. While consumers may believe they are coming closer to the ideal camera-plus-phone converged solution, designers and equipment manufacturers understand that compromises have been made as the increasingly advanced algorithms are simply relying upon the pre-existing hardware.
This hardware, typically comprising a CPU and a GPU, was not designed to support such processing-intensive imaging algorithms, forcing developers to compromise on features and image quality to match the processing capabilities of the hardware. Even so, the total application continues to consume too much power and drastically shortens battery life, more than even an unwary user will tolerate.
As newer and more-complex algorithms develop to meet both consumer demand for increased functionality as well as manufacturers’ need for differentiation, an alternate approach to the underlying vision processing architecture is required if the delicate balance between functionality and acceptable battery life is to be maintained. This alternate approach relies on the adoption of dedicated, on-chip vision processors that are able to cope with both current and future complex imaging and vision algorithms. CEVA-XM4 is exactly that, a fully programmable processor that was designed from the ground up to accelerate the most demanding image-processing and computer-vision algorithms.
This document supplies an overview of the CEVA-XM4 processor’s capabilities, architecture, features, target applications, use cases and code examples.
A generic layer that can be used with many key-value storage engines like Redis, Memcached, LMDB, etc
Focus: performance, cross-datacenter active-active replication and high availability
Features: node warmup (cold bootstrapping), tunable consistency, S3 backups/restores
Status: Open source, fully integrated with existing NetflixOSS ecosystem
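The "tunable consistency" item in the feature list above follows the usual Dynamo-style quorum rule: with N replicas, a read quorum of size R must overlap every write quorum of size W whenever R + W > N. A tiny illustration of the generic quorum math (not any product's actual configuration API):

```python
def quorum_overlaps(n, r, w):
    """True if every read quorum of size r must intersect every write quorum of size w
    among n replicas, i.e. reads are guaranteed to see the latest acknowledged write."""
    return r + w > n

# N=3 replicas: quorum reads and writes (2+2) overlap; one-replica ops do not.
print(quorum_overlaps(3, 2, 2))  # True
print(quorum_overlaps(3, 1, 1))  # False
```

Lowering R and W trades that overlap guarantee for latency, which is exactly the tuning knob the feature list refers to.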
NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together.
In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.
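One common way to wire the MySQL-plus-Redis split described above is the cache-aside pattern. In this sketch both stores are stubbed with plain dicts, so it shows the shape of the pattern rather than real MySQL/Redis client code:

```python
# Cache-aside: read through the cache, fall back to the system of record.
# `mysql` and `redis` are plain dicts standing in for the real stores.
mysql = {"user:1": {"name": "Ada"}}   # system of record
redis = {}                            # front-end cache

def get_user(key):
    if key in redis:                  # cache hit
        return redis[key]
    value = mysql.get(key)            # cache miss: read the source of truth
    if value is not None:
        redis[key] = value            # populate the cache for next time
    return value

def update_user(key, value):
    mysql[key] = value                # write the system of record first...
    redis.pop(key, None)              # ...then invalidate (not update) the cache

print(get_user("user:1"))
```

Invalidating rather than updating the cache on write is the usual choice here: it avoids a race where a concurrent stale read repopulates the cache after the update, at the cost of one extra miss.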
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer -- Ioannis ... (Redis Labs)
Dynomite is a thin, distributed Dynamo layer for different storage engines and protocols. Currently at Netflix, we are focusing on using Redis as the storage engine. Dynomite supports multi-datacenter replication and is designed for high availability. In the age of high scalability and big data, Dynomite's design goal is to turn single-server datastore solutions into peer-to-peer, linearly scalable, clustered systems while still preserving the native client/server protocols of the datastores, e.g., the Redis protocol. In this talk, we present Dynomite's recent features and the Dyno client. Both projects are open source and available to the community.
Keynote presentation for NoSQL Now! 2014 conference.
* Why there will be an Internet of Things as commonly conceived
* The IoT as a Big Data problem
* The rise of Big Metadata
* Imagining the Internet of Light Bulbs
* Systems Thinking and Light Bulb Architecture
* The 'Witnesses' Principle
* Security and Privacy with the IoT
* A Species and its Data
Reconceiving the Web as a Distributed (NoSQL) Data System (Daniel Austin)
[Slides from NoSQL Now! 2013]
Nearly every Web request is a request for information from a database or a front-end caching system for one. Based on this concept, we can reconceive the Web as a large-scale distributed data system using NoSQL query languages across high-level protocols such as HTTP. Exploring this idea further leads us to a better understanding of the structure of the Web, and invites us to apply modern NoSQL thinking toward making it better. My goal is to re-orient people’s thinking toward the Web as a big NoSQL data system and then explore the implications.
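The mapping implied above, HTTP verbs acting as operations on one big distributed key-value store, can be made explicit: URLs are keys, and GET/PUT/DELETE are the reads, writes, and deletes. A toy dispatcher illustrating that view (an illustrative mapping, not a proposed standard):

```python
# The Web as a key-value store: URL -> resource, HTTP verb -> KV operation.
store = {}

def handle(method, url, body=None):
    """Dispatch an HTTP-style request to key-value operations,
    returning (status_code, payload) pairs."""
    if method == "GET":
        return (200, store[url]) if url in store else (404, None)
    if method == "PUT":
        store[url] = body
        return (204, None)
    if method == "DELETE":
        return (204, store.pop(url)) if url in store else (404, None)
    return (405, None)

handle("PUT", "/users/42", {"name": "Tim"})
print(handle("GET", "/users/42"))  # (200, {'name': 'Tim'})
```

Even HTTP's status codes line up with KV semantics: 404 is a missing key, 204 is a successful mutation, 405 is an unsupported operation.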
This is an update of my previous presentation from 2012. I discuss several recent topics including the Witnesses Principle, the rise of Big Metadata, and the need for autonomy and possibly self-awareness for financial instruments designed to hold value for periods greater than a single human lifetime.
Keynote for HTML5 Devconf Fall 2014. I discuss the psychological and physical response times of human beings as nodes on the Internet, and answer the title question.
The Fastest Possible Search Algorithm: Grover's Search and the World of Quant... (Daniel Austin)
[Slides from NoSQL Now! 2013 Lightning Talks]
Grover’s Search is a famous quantum computing algorithm for searching unstructured databases. It’s the fastest possible search algorithm in this universe. This is what Google will look like when it grows up! This is the second of two parts, explaining the algorithm in more detail.
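Grover's speedup can be seen concretely on the smallest interesting case: for N = 4 items, a single oracle call plus one diffusion step finds the marked item with certainty, versus an expected ~2 classical probes. A plain-Python statevector simulation of the standard textbook construction:

```python
# Grover's search over N=4 states; the marked item is index 2.
N, marked = 4, 2
amp = [1 / N**0.5] * N                 # uniform superposition, amplitude 0.5 each

# Oracle: flip the sign of the marked item's amplitude.
amp[marked] = -amp[marked]

# Diffusion operator: reflect every amplitude about the mean.
mean = sum(amp) / N
amp = [2 * mean - a for a in amp]

probs = [a * a for a in amp]
print(probs)  # [0.0, 0.0, 1.0, 0.0] up to rounding: the marked item is certain
```

Working it through: after the oracle the amplitudes are [0.5, 0.5, -0.5, 0.5], the mean is 0.25, and reflecting about the mean sends the unmarked entries to 0 and the marked one to 1, so one quantum query suffices where a classical search needs up to 3.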
Always Offline: Delay-Tolerant Networking for the Internet of Things (Daniel Austin)
Discussion of IoT networks and DTNs, including some speculation on social behavior-based routing and the similarities between the IoT and the ecology of living things.
NoSQL is not a buzzword anymore. The array of non-relational technologies has found wide-scale adoption even outside Internet-scale focus areas. With the advent of the cloud, the churn has increased even more, yet there is no crystal-clear guidance on adoption techniques and architectural choices surrounding the plethora of options available. This session initiates you into the whys and wherefores, architectural patterns, caveats, and techniques that will augment your decision-making process and boost your ability to architect scalable, fault-tolerant, and distributed solutions.
Jay Kreps on Project Voldemort: Scaling Simple Storage at LinkedIn (LinkedIn)
Jay Kreps on Project Voldemort: Scaling Simple Storage at LinkedIn. This presentation was made at QCon 2009 and is embedded on LinkedIn's blog: http://blog.linkedin.com/
Relational databases are used extensively in many applications and systems, but they are not always the best data store solution to the problem at hand. In this session we discuss the limitations of RDBMS and show which NoSQL solutions can be used to overcome these limitations. We also cover migration topics, such as how to add NoSQL databases without adding complexity to your development and operations.
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB (BigDataCloud)
"Navigating the Database Universe" was the topic of the Big Data Cloud meetup held on Jan 24th 2013 in Santa Clara, CA. This is the presentation made by Mike Stonebraker & Scott Jarr of VoltDB.
This meetup was sponsored by VoltDB.
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr, ... (lisapaglia)
Webinar presentation delivered by Dr. Michael Stonebraker and Scott Jarr of VoltDB on December 11, 2012. www.voltdb.com
The design decisions you make today will have a huge performance impact down the line. Until recently, when it came to databases, the choice was easy. Essentially, you had one option: the RDBMS. Today, there's a new universe of databases being thrown into production — and not always with the greatest success. How do you make the right choice for your next application? Database pioneer Dr. Michael Stonebraker and VoltDB co-founder Scott Jarr have some thoughts.
NoSQL databases like MongoDB, Elasticsearch, and Cassandra are synonymous with scalability, search, and developer agility. But there's a downside: having to give up the ease and comfort of SQL.
Or do you?
Join this webcast to learn how the newest databases, like CrateDB and CockroachDB deliver the benefits of NoSQL with the ease of SQL by building SQL engines on top of custom NoSQL technology stacks. Database industry veteran Andy Ellicott, who helped launch Vertica, VoltDB, Cloudant, and now with Crate.io, will provide a no-BS view of current DBMS architectures and predictions for the future of data.
If you’re a DBMS user, this webcast will help you make sense of a very crowded DBMS market and make better-informed decisions for your new tech stacks.
Quantum Computing in a Nutshell: Grover's Search and the World of Quantum Com... (Daniel Austin)
[Slides from NoSQL Now! 2013 Lightning Talks]
Ever wonder about quantum computing? With the recent announcement of the first commercial deployment of a quantum computing device, we've now crossed the barrier between theory and practice. This just-for-fun talk will provide some insight into quantum computing and related topics.
This is my Ignite talk from Surge 2011. I describe Grover's Search, a quantum computing search algorithm that is the fastest possible in this universe. It's a challenge to explain something about quantum computing in 10 minutes!
Notes on a High-Performance JSON Protocol (Daniel Austin)
This is my presentation from JSConf 2011. I am proposing a new Web protocol to improve performance across the Internet. It's based on a dual-band protocol layered over TCP/IP and UDP and is backward compatible with existing HTTP-based systems.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis's slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
1. A Global In-memory Data System for MySQL
Daniel Austin, PayPal Technical Staff
Percona Live! MySQL Conference
Santa Clara, April 12th, 2012 v1.3
2. Intro: Globalizing NDB
Proposed Architecture
What We Learned
Q&A
Global In-memory MySQL Confidential and Proprietary 2
3. Mission: YesSQL
“Develop a globally distributed DB for user-related data.”
• Must Not Fail (99.999%)
• Must Not Lose Data. Period.
• Must Support Transactions
• Must Support (some) SQL
• Must Write→Read a 32-bit integer globally in < 1000 ms
• Min/Max Data Volume: 10–100 TB (working)
• Must Scale Linearly with Costs
4. THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS
“How Do We Manage Reliable Distribution of Data Across Geographical Distances?”
5. One Approach: The NoSQL Solution
• NoSQL systems provide a solution that relaxes many of the common constraints of typical RDBMS systems:
  – Slow: RDBMS performance has not scaled with CPUs
  – Often require complex data management (SOX, SOR)
  – Costly to build and maintain, slow to change and adapt
  – Intolerant of CAP models (more on this later)
• Non-relational models, usually key-value
• May be batched or streaming
• Not necessarily distributed geographically
6. Big Data Myth #1: Big Data = NoSQL
• 'Big Data' refers to a common set of problems:
  – Large Volumes
  – High Rates of Change
    • Of Data
    • Of Data Models
    • Of Data Presentation and Output
  – Often Require 'Fast Data' as well as 'Big'
    • Near-real-time Analytics
    • Mapping Complex Structures
Takeaway: Big Data is the problem; NoSQL is one (proposed) solution.
7. NoSQL Hype Cycle: Where Are We Now?
There are currently more than 120 NoSQL databases listed at nosql-database.org!
(hype-cycle diagram: “You Are Here?”)
As the pace of new technology solutions has slowed, some clear winners have emerged.
8. 3 Essential Kinds of NoSQL Systems
1. Columnar K-V Systems
– Hadoop, Hbase, Cassandra, PNUTs
2. Document-Based
– MongoDB, TerraCotta
3. Graph-Based
– FlockDB, Voldemort
Takeaway: These were originally designed as
solutions to specific problems because no
commercial solution would work.
9. Intro: Globalizing Big Data
Proposed Architecture
What We Learned
Q&A
10. OUR APPROACH: THE YESSQL SOLUTION
Use MySQL/NDB
Use AWS
Key Tradeoffs:
Cost vs. Performance
HA vs. CAP
Complexity vs. RDBMS features
11. WHY NDB?
Pro:
• True HA by design – fast recovery
• Supports (some) transactions
• Relational model
• In-memory architecture = high performance
• Disk storage for non-indexed data (since 5.1)
• APIs, APIs, APIs
Con:
• Some semantic limitations on fields
• Size constraints (2 TB?) – hardware limits also
• Higher cost/byte
• Requires reasonable data partitioning
• Higher complexity
12. How NDB Works in One Slide
Graphics courtesy dev.mysql.com
14. SYSTEM AVAILABILITY DEFINED
• Availability of the entire system of n parallel paths, each a series of m components with reliability r_ij:
  A_sys = 1 − ∏(i=1..n) [1 − ∏(j=1..m) r_ij]
• Number of parallel components of reliability r needed to achieve availability A_min:
  N_min = ⌈ln(1 − A_min) / ln(1 − r)⌉
(diagram: serial vs. parallel component arrangements)
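The N_min formula above is easy to sanity-check numerically. A minimal sketch (the function name `n_min` is mine, not from the slides):

```python
import math

def n_min(a_min, r):
    """Parallel components of individual reliability r needed to
    reach overall availability a_min: ceil(ln(1-a_min)/ln(1-r))."""
    return math.ceil(math.log(1 - a_min) / math.log(1 - r))

# Five nines (99.999%) from components that are each only 99% available:
print(n_min(0.99999, 0.99))  # 3 parallel components suffice
```

The intuition: each extra replica multiplies the probability of total failure by (1 − r), so availability improves geometrically with replica count.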
15. Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does
• Consistency, Availability, (network) Partition tolerance
  – Pick any two? Not really.
• The real story: these are not independent variables
• AP = CP (Um, what? But A != C…)
• Variations:
  – PACELC (adds latency tolerance)
  – VPAC (adds variability)
Takeaway: the real story here is about the tradeoffs made by the designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter.
16. What about “High Performance”?
• Maximum lightspeed travel time between any two points on Earth’s surface: ~67 ms
• Target: data available worldwide in < 1000 ms
Sound easy? Think again!
17. AWS Meets NDB
• Why AWS?
  – Cheap and easy infrastructure-in-a-box (or so we thought! Ha!)
• Services used:
  – EC2 (CentOS 5.3; small instances for mgm & query nodes, XL for data)
  – Elastic IPs/ELB
  – EBS volumes
  – S3
  – CloudWatch
18. ARCHITECTURAL TILES
(diagram: AWS Availability Zones A, B, C behind an ELB; legend: NDB, MGM, SQL, Unused – not present in all locations)
Tiling Rules:
• Never separate NDB & SQL
• NDB:2 – SQL:1 – MGM:1
• Scale by adding more tiles
• Failover 1st to nearest AZ, then to nearest DC
• At least 1 replica/AZ
• Don’t share nodes
• Mgmt nodes are redundant
Limitations:
• AWS is network-bound @ 250 Mbps – ouch!
• Need specific ACLs across AZ boundaries
• AZs not uniform!
• No GSLB
• Dynamic IPs
• ELB sticky sessions unreliable
19. Architecture Stack
Scale by Tiling
(diagram: A/B tile pairs replicated across regions)
7 AWS Data Centers: US-E, US-W, US-W2, TK, EU, AS, SA-B
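The tiling rules lend themselves to a data model: a tile is an indivisible bundle of nodes, and the cluster scales only by adding whole tiles. A minimal sketch (an illustration of the rules, not PayPal's actual tooling):

```python
# Each tile bundles 2 NDB data nodes, 1 SQL node, and 1 management
# node, per the NDB:2 - SQL:1 - MGM:1 rule; NDB & SQL never separate.

def make_tile(tile_id):
    return {
        "id": tile_id,
        "ndb_data_nodes": 2,   # NDB:2
        "sql_nodes": 1,        # SQL:1
        "mgm_nodes": 1,        # MGM:1
    }

def scale(tiles, extra):
    """Scale by adding whole tiles, never by resizing one."""
    start = len(tiles)
    return tiles + [make_tile(start + i) for i in range(extra)]

cluster = scale([make_tile(0)], 3)      # grow from 1 tile to 4
assert len(cluster) == 4
assert sum(t["ndb_data_nodes"] for t in cluster) == 8
```

Treating the tile as the unit of scaling is what makes capacity (and cost) grow linearly, per the mission requirements.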
20. Other Technologies Considered
• Paxos
– Elegant-but-complex consensus-based
messaging protocol
– Used in Google Megastore, Bing metadata
• Java Query Caching
– Queries as serialized objects
– Not yet working
• Multiple Ring Architectures
– Even more complicated = no way
21. Intro: Globalizing Big Data
Proposed Architecture
What We Learned
Q&A
22. SYSTEM READ/WRITE PERFORMANCE (!)
What we tested:
• 32- & 256-byte char fields
• In-region replication tests
• Reads, writes, query speed vs. volume
• Data replication speeds
Results:
• Global replication < 350 ms
• 256-byte write→read < 1000 ms worldwide
23. The Commit Ordering Problem
• Why does commit ordering matter?
• Write operators are non-commutative:
  [W(d,t1), W(d,t2)] != 0 unless t1 = t2
  – Can lead to inconsistency
  – Can lead to timestamp corruption
  – Forcing sequential writes defeats Amdahl’s rule
• Can show up in GSLB scenarios
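The non-commutativity is easy to demonstrate: two last-write-wins replicas that receive the same pair of writes in different orders end in different states. A minimal sketch (illustrative only, not NDB's actual replication protocol):

```python
# Two writes to the same datum d do not commute: replicas that apply
# them in different network arrival orders diverge unless all replicas
# agree on a single commit order.

def apply_writes(writes):
    """Last-write-wins over a sequence of (timestamp, value) pairs."""
    state = None
    for _ts, value in writes:
        state = value
    return state

w1 = (1, "set-in-region-A")
w2 = (2, "set-in-region-B")

# Different arrival orders -> divergent replica states:
assert apply_writes([w1, w2]) != apply_writes([w2, w1])

# Agreeing on a total order (here: sort by timestamp) restores convergence:
assert apply_writes(sorted([w1, w2])) == apply_writes(sorted([w2, w1]))
```

The catch, as the slide notes, is that enforcing that total order by serializing all writes defeats the parallelism the architecture depends on.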
24. Dark Side of AWS
• Deploying NDB at scale on AWS is hard
– Dynamic IPs (use hostfile)
– DNS issues
– Security groups (ec2-authorize)
– Inconsistent EC2 deployments
• Are availability zones independent
network segments?
– No GSLB (!) (rent or buy)
Be prepared to struggle a bit!
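The "use hostfile" workaround for dynamic IPs can be sketched as follows: pin a stable name for every node in /etc/hosts on each instance, and reference only those names from the NDB cluster configuration, so node identity survives an instance's IP changing. A minimal config.ini sketch (all hostnames below are hypothetical):

```ini
; config.ini on the management node; each HostName is pinned to the
; instance's current address via /etc/hosts on every cluster host
[ndb_mgmd]
HostName=ndb-mgm-1

[ndbd]
HostName=ndb-data-1

[ndbd]
HostName=ndb-data-2

[mysqld]
HostName=ndb-sql-1
```

When an EC2 instance comes back with a new address, only the /etc/hosts entries need updating; the cluster configuration itself stays untouched.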
25. Future Directions for YesSQL
• Alternate solution using Pacemaker, Heartbeat
  – From Yves Trudeau @ Percona
  – Uses InnoDB, not NDB
• Implement Memcached plugin
  – To test NoSQL functionality, APIs
• Add simple connection-based persistence to preserve connections during failover
• Can we increase the number of data nodes?
• As the number of datacenters goes up, sync times go down. What’s the relation?
• So far only limited testing of hard configurations
  – Production systems will likely need more RAM for MGMs
26. Hard Lessons, Shared
• Be careful…
  – With “eventual consistency”-related concepts
  – ACID and CAP are not really as well-defined as we’d like, considering how often we invoke them
• NDB is a good solution
  – Real HA, real performance, real SQL
  – Notable limitations around fields, datatypes
  – Successfully competes with NoSQL systems for most use cases – better in many cases
• AWS makes the hard things possible
  – But not so much the easy things easier
Takeaway: You can achieve high performance and availability without giving up relational models and read consistency!
27. “In the long run, we are all dead eventually consistent.”
– Maynard Keynes, on NoSQL databases
Twitter: @daniel_b_austin
Email: daaustin@paypal.com
With apologies and thanks to the real DB experts: Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, and everyone else!
Editor's Notes
Service reliability: must be buzzword-compliant, as in RFC 2119. Tradeoffs discussed previously.
This is really the problem we want to solve. It’s one of the fundamental problems in computer science and doesn’t have a completely satisfactory solution.
But is this all really necessary? Or just fashion?
This is Big Myth #1. They are not necessarily even related; one could have either or both.
Mike’s talk last year, only
For YesSQL, we needed a more complex set of relations, and transactional support.
The rest of this section of our talk will be working through the consequences of these tradeoffs.
NDB has a much higher maturity level than most NoSQL systems
The CAP Theorem is a limited version of the Systemic Qualities model.
Performance is response time.
“Conservation of timestamps”
Also need to consider updates from versions after 5.1