Megastore is a scalable data storage system developed by Google to meet the requirements of modern interactive online services. It blends the scalability of NoSQL datastores with the convenience of a traditional RDBMS, providing ACID transactions within entity groups. Megastore uses Bigtable for data storage and a variant of the Paxos algorithm to synchronously replicate transaction logs across datacenters, achieving high availability even when a datacenter fails.
1. MegaStore
Google Inc.
Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, Vadim Yushprakh. CIDR 2011.
Presented by: Noha Elprince
22 June, 2011
2. What is MegaStore?
§ A storage system developed to meet the requirements of today's online interactive services.
§ Megastore is the data engine supporting the Google App Engine (GAE): https://appengine.google.com/
§ GAE cloud computing technology: hosts/virtualizes web apps across multiple servers on Google's platform.
Ø Fast development and deployment.
Ø Simple administration.
Ø No need to worry about hardware patches, backups, or scalability.
3. Outline
— Motivation & Problem
— Methodology
— Design of Megastore
  — Data Model
  — Data Storage
  — Transactions and Concurrency Control
— How Megastore achieves Availability and Scalability
  — PAXOS
  — Megastore's approach
— Experience
— Related Work
— Conclusion
4. Megastore - Motivation
• Storage requirements of today's interactive online applications:
— Highly scalable
— Rapid development
— Low latency
— Durability and consistency
— Availability and fault tolerance
• These requirements are in conflict!
5. CAP Theorem – Eric Brewer 2000
"In a distributed database system, you can only have at most two of the following three characteristics:
Ø Consistency
Ø Availability
Ø Partition tolerance"
ACID = Atomicity, Consistency, Isolation, Durability.
6. Problem
§ Conflicts between available systems:
— RDBMS: a rich set of features and an expressive language help development, but it is difficult to scale.
E.g.: MySQL, PostgreSQL, MS SQL Server, Oracle RDB.
— NoSQL datastores: highly scalable, but with limited APIs and loose consistency models.
E.g.: Google's Bigtable, Apache HBase, Facebook's Cassandra.
§ The reliability of a single datacenter can't be guaranteed 100%.
["Always expect the unexpected" - James Patterson]
7. Methodology
— Megastore blends the scalability of NoSQL with the convenience of a traditional RDBMS.
— High reliability is achieved by:
Ø Keeping data in multiple datacenters.
Ø Writing to a majority of datacenters synchronously.
Ø Letting the infrastructure decide which datacenter to read from and write to.
8. Outline
þ — Motivation & Problem
þ — Methodology
— Design of Megastore
  — Data Model
  — Data Storage
  — Transactions and Concurrency Control
— How Megastore achieves Availability and Scalability
  — PAXOS
  — Megastore's approach
— Experience
— Related Work
— Conclusion
9. Design of Megastore: Data Model
— The data model is declared in a schema.
— Each schema has a set of tables: root tables or child tables.
— Entity group: a root entity along with all of its child entities.

CREATE SCHEMA PhotoApp;

CREATE TABLE User {
  required int64 user_id;
  required string name;
} PRIMARY KEY(user_id), ENTITY GROUP ROOT;

CREATE TABLE Photo {
  required int64 user_id;
  required int32 photo_id;
  required int64 time;
  required string full_url;
  optional string thumbnail_url;
  repeated string tag;
} PRIMARY KEY(user_id, photo_id),
  IN TABLE User,
  ENTITY GROUP KEY(user_id) REFERENCES User;
10. Design of Megastore: Data Model
• (Hierarchical) data is denormalized to eliminate join costs; joins are implemented at the application level.
• Outer joins are computed with parallel queries using secondary indexes.
• This provides an efficient stand-in for SQL-style joins.
11. Design of Megastore: Data Storage
How is it stored in Bigtable?
"A Bigtable is a compressed, high-performance, proprietary database system built on the Google File System (GFS), the Chubby lock service, and other Google programs."
12. Design of Megastore: Data Storage
Example:

User  { user_id: 101, name: 'John' }
Photo { user_id: 101, photo_id: 501, time: 2009, full_url: 'john-pic1', tag: 'vacation', tag: 'holiday', tag: 'Paris' }
Photo { user_id: 101, photo_id: 502, time: 2010, full_url: 'john-pic2', tag: 'office', tag: 'friends', tag: 'pub' }
User  { user_id: 102, name: 'Mary' }
Photo { user_id: 102, photo_id: 600, time: 2009, full_url: 'mary-pic1', tag: 'office', tag: 'picnic', tag: 'Paris' }
Photo { user_id: 102, photo_id: 601, time: 2011, full_url: 'mary-pic2', tag: 'birthday', tag: 'friends' }

Resulting Bigtable layout:

Row Key   User.name   Photo.time   Photo.tag                  Photo.URL
101       John
101,501                2009        vacation, holiday, Paris   …
101,502                2010        office, friends, pub       …
102       Mary
102,600                2009        office, picnic, Paris      …
102,601                2011        birthday, friends          …
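The interleaving above can be sketched in a few lines of Python (an illustrative sketch only: the `row_key`/`store` helpers and the dict-as-Bigtable model are invented for this example; Megastore's real storage code is not public). The key idea is that child rows reuse the root key as a prefix, so a whole entity group sorts contiguously:

```python
# Sketch of Megastore's row layout in Bigtable (hypothetical helpers).
# Root entities keep their own key; child entities append theirs, so
# sorting the row keys keeps every entity group contiguous.

def row_key(user_id, photo_id=None):
    """Root rows use the root key; child rows append their own key."""
    return (str(user_id),) if photo_id is None else (str(user_id), str(photo_id))

def store(table, entity):
    if "photo_id" not in entity:                       # root entity: User
        table[row_key(entity["user_id"])] = {"User.name": entity["name"]}
    else:                                              # child entity: Photo
        table[row_key(entity["user_id"], entity["photo_id"])] = {
            "Photo.time": entity["time"],
            "Photo.tag": entity["tag"],
        }

table = {}
store(table, {"user_id": 101, "name": "John"})
store(table, {"user_id": 101, "photo_id": 501, "time": 2009,
              "tag": ["vacation", "holiday", "Paris"]})
store(table, {"user_id": 102, "name": "Mary"})

# Scanning keys in sorted order yields each entity group as one run.
group_101 = [k for k in sorted(table) if k[0] == "101"]
```

Because a group occupies one contiguous key range, reading or locking an entire entity group is a single range scan rather than a scatter of lookups.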
13. Design of Megastore: Data Storage
— Indexing
— Local index: finds data within an entity group.
CREATE LOCAL INDEX PhotosByTime ON Photo(user_id, time);
— Global index: spans entity groups.
CREATE GLOBAL INDEX PhotosByTag ON Photo(tag) STORING (thumbnail_url);
— The STORING clause
Ø Faster retrieval of certain properties.
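The two index kinds can be illustrated with plain Python dictionaries (a toy model, not Megastore's implementation; the sample data and variable names are invented). A local index is keyed so lookups stay inside one entity group, while a global index is keyed across groups and, with STORING, carries copied columns so queries skip the base row:

```python
# Toy model of Megastore's local vs. global indexes (illustrative only).
photos = {
    (101, 501): {"time": 2009, "tag": ["vacation"], "thumbnail_url": "t1"},
    (101, 502): {"time": 2010, "tag": ["office"],   "thumbnail_url": "t2"},
    (102, 600): {"time": 2009, "tag": ["office"],   "thumbnail_url": "t3"},
}

# Local index PhotosByTime: the key starts with user_id, so a lookup
# touches only one entity group and updates atomically with it.
photos_by_time = sorted(((uid, p["time"]), (uid, pid))
                        for (uid, pid), p in photos.items())
user_101_by_time = [key for (uid, _), key in photos_by_time if uid == 101]

# Global index PhotosByTag: keyed by tag across all groups. The STORING
# clause copies thumbnail_url into the index entry, so the query is
# answered from the index alone.
photos_by_tag = {}
for (uid, pid), p in photos.items():
    for tag in p["tag"]:
        photos_by_tag.setdefault(tag, []).append(((uid, pid), p["thumbnail_url"]))

office = photos_by_tag["office"]   # spans entity groups 101 and 102
```

The trade-off the slide implies: local indexes are always consistent with their group's transactions, while global indexes span groups and are only updated asynchronously.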
15. Outline
þ — Motivation & Problem
þ — Methodology
þ — Design of Megastore
  ✓ — Data Model
  ✓ — Data Storage
  — Transactions and Concurrency Control
— How Megastore achieves Availability and Scalability
  — PAXOS
  — Megastore's approach
— Experience
— Related Work
— Conclusion
16. Transactions and Concurrency Control
• Each entity group acts as a mini-database and provides ACID semantics.
• Transaction management uses write-ahead logging (WAL).
• Bigtable feature: the ability to store multiple values for the same row/column with different timestamps.
• Cross-entity-group transactions are supported via two-phase commit (2PC).
• Entities in an entity group employ multiversion concurrency control (MVCC).
17. Transactions and Concurrency Control
— MVCC: multiversion concurrency control
Using timestamps, reads and writes do not block each other.
— Read consistency
— Current: wait for uncommitted writes to apply, then read the last committed value.
— Snapshot: does not wait; reads the last committed values.
— Inconsistent reads: ignore the state of the log and read the latest values directly (data may be stale).
— Write consistency
— Determine the next available log position.
— Assign the write-ahead log (WAL) mutations a timestamp higher than any previous one.
— Use Paxos to settle resource contention: select one winner to write to a given entity group; the others abort and retry their operations.
Megastore uses optimistic concurrency control (OCC) for mutations (write operations): it assumes transactions’ data do not conflict and proceeds without locks.
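The three read-consistency levels can be sketched against a toy write-ahead log. The function names are invented, and the "waiting" of a current read is only simulated by settling the in-flight entry:

```python
# Toy WAL for one entity group: (position, value, committed?).
# Position 3 is still in flight (uncommitted).
log = [(1, "v1", True), (2, "v2", True), (3, "v3", False)]

def snapshot_read(log):
    # Snapshot: don't wait; return the last committed value.
    return [v for _, v, committed in log if committed][-1]

def inconsistent_read(log):
    # Inconsistent: ignore the log state and return the latest
    # value directly (it may be stale or not yet committed).
    return log[-1][1]

def current_read(log):
    # Current: wait for pending writes to apply, then read the last
    # committed value. Waiting is modeled by settling in-flight entries.
    settled = [(p, v, True) for p, v, _ in log]
    return snapshot_read(settled)
```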
18. Transactions and Concurrency Control
q Queues
§ Provide transactional messaging between entity groups.
§ Each message is either:
Ø Synchronous: has a single sending and receiving entity group.
Ø Asynchronous: has different sending and receiving entity groups.
Fig. Operations across entity groups
Ø Useful for performing operations that affect many entity groups.
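The queue semantics can be sketched as follows: the enqueue commits atomically with the sender's transaction, and the dequeue commits atomically with the receiver's. All names are invented, and the two "transactions" are reduced to log appends:

```python
class Group:
    """Toy entity group with its own transaction log and message inbox."""
    def __init__(self, name):
        self.name = name
        self.log = []    # committed transactions for this entity group
        self.inbox = []  # messages awaiting transactional delivery

def send(sender, receiver, mutation, message):
    # One transaction in the sender's group: apply the mutation AND enqueue.
    sender.log.append((mutation, ("enqueue", message)))
    receiver.inbox.append(message)

def deliver(receiver):
    # One transaction in the receiver's group: dequeue AND apply its effect.
    message = receiver.inbox.pop(0)
    receiver.log.append((("apply", message), ("dequeue", message)))
    return message

a, b = Group("A"), Group("B")
send(a, b, "update-profile", "notify-B")
msg = deliver(b)
```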
19. Transactions and Concurrency Control
q Two-Phase Commit (2PC)
§ Coordinator: the component that receives the commit/abort request.
§ Participants: the resource managers that did work on behalf of the transaction (by reading/updating resources).
* Goal: ensure that the coordinator and all participants either all commit or all abort the transaction, so that atomicity is satisfied. Source: Ref[2]
Disadv. High latency.
Adv. Simplifies code for unique secondary key enforcement.
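The protocol can be sketched as a minimal coordinator loop. This is generic 2PC, not Megastore's implementation; the Participant class is an invented stand-in for a resource manager:

```python
class Participant:
    """Toy resource manager taking part in two-phase commit."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self):
        # Phase 1 vote: "yes" only if this participant can commit.
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.outcome = decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    decision = "commit" if all(votes) else "abort"   # unanimous yes required
    for p in participants:                           # phase 2: broadcast it
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant(), Participant()])
blocked = [Participant(), Participant(can_commit=False)]
bad = two_phase_commit(blocked)
```

Because every participant waits on the coordinator's decision, a single slow or failed node stalls everyone, which is the high-latency disadvantage the slide notes.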
20. Other Features
— Integrated Backup System
Ø Used to restore an entity group’s state to any point in time.
— Data Encryption
Ø Uses a distinct key per entity group.
21. Outline
✓ — Motivation & Problem
✓ — Methodology
✓ — Design of Megastore
✓ — Data Model
✓ — Data Storage
✓ — Transactions and Concurrency Control
— How Megastore achieves Availability and Scalability
— PAXOS
— Megastore’s approach
— Experience
— Related Work
— Conclusion
22. Megastore – Availability / Scalability
v Megastore Replication System
• Replication is done per entity group by synchronously replicating the group’s transaction log to a set of replicas.
• Reads and writes can be initiated from any replica.
• Writes require one round of inter-datacenter communication.
• ACID semantics are preserved regardless of which replica a client starts from.
Fig. Scalable Replication
23. Megastore – Replication
— PAXOS Algorithm
• A way to reach consensus among a group of replicas on a single value.
• Databases typically use Paxos to replicate a transaction log, where a separate instance of Paxos is used for each position in the log.
Source: Ref[3]
Adv. Tolerates delayed or reordered messages and replicas that fail by stopping (can tolerate up to N/2 failures).
Disadv. High latency, because it demands multiple rounds of communication; Megastore therefore uses an improved version.
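Basic single-decree Paxos, as described above, can be sketched as one proposer round against a set of in-memory acceptors. This is a simplified model with no networking, showing only the majority-quorum and value-adoption rules:

```python
class Acceptor:
    """Single-decree Paxos acceptor: remembers the highest proposal number
    it promised and any (number, value) pair it has accepted."""
    def __init__(self):
        self.promised = -1
        self.accepted = None  # (proposal_no, value) or None

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """One proposer round: prepare, adopt any previously accepted value,
    then accept. Succeeds only with a majority in both phases."""
    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for kind, acc in promises if kind == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None  # no quorum of promises
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]  # must adopt the highest-numbered accepted value
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "x")
second = propose(acceptors, 2, "y")  # adopts the already-chosen value "x"
```

Note how the second proposer cannot overwrite the chosen value; this safety property is why a separate Paxos instance per log position yields a consistent replicated log.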
24. Megastore – Replication
• Master-Based Approach
Ø A master-slave model is generally used, where the master handles all replication of writes.
Ø But the single master becomes a bottleneck.
25. Megastore – Replication
• Megastore Replication System (modified Paxos)
§ Fast Reads
- Allow local reads from any replica.
- Each replica tracks the set of entity groups for which it has observed all Paxos writes, and serves local reads for those groups.
§ Fast Writes
- A specific replica is chosen as the leader.
- The leader decides the proposal number and sends it to the other writers.
- The first writer to submit a value to the leader wins the right to ask all replicas to accept that value.
• The next write’s leader is selected with the closest-replica heuristic (aim: minimize writer-to-leader latency, based on the observation that most apps submit writes from the same region repeatedly).
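The closest-replica heuristic fits in one function. The latency table below is invented sample data, and the region/replica names are made up for illustration:

```python
# Assumed measurements: latency from a writer's region to each replica.
latency_ms = {
    ("us-east", "replica-east"): 5,
    ("us-east", "replica-west"): 70,
    ("us-east", "replica-eu"): 90,
}

def pick_next_leader(replicas, last_writer_region):
    # Choose the replica nearest to the site that issued the last write;
    # since most apps write repeatedly from the same region, the next
    # writer will likely be close to this leader too.
    return min(replicas, key=lambda r: latency_ms[(last_writer_region, r)])

leader = pick_next_leader(["replica-east", "replica-west", "replica-eu"], "us-east")
```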
26. Outline
✓ — Motivation & Problem
✓ — Methodology
✓ — Design of Megastore
✓ — Data Model
✓ — Data Storage
✓ — Transactions and Concurrency Control
✓ — How Megastore achieves Availability and Scalability
✓ — PAXOS
✓ — Megastore’s approach
— Experience
— Related Work
— Conclusion
27. Experience
² Real-world deployment
— More than 100 production applications use Megastore (e.g., Google App Engine).
— Most applications see extremely high availability.
— Most users see average write latencies of 100–400 ms.
28. Related Work
— NoSQL data storage systems
— Bigtable, Cassandra, Yahoo PNUTS, Amazon SimpleDB
— Data replication process
— HBase, CouchDB, Dynamo, …
— Extend the replication scheme of traditional RDBMS systems
— Paxos algorithm
— Scalaris, Keyspace, …
— Few have used Paxos to achieve synchronous replication
29. Conclusion
Megastore
Ø A scalable, highly available datastore for interactive internet services.
Ø Paxos is used for synchronous replication.
Ø Bigtable serves as the scalable datastore, with richer primitives (ACID transactions, indexes) added on top.
Ø Has over 100 applications in production.
31. References
— [1] Jason Baker et al. “Megastore: Providing Scalable, Highly Available Storage for Interactive Services.” CIDR 2011.
— [2] Philip A. Bernstein and Eric Newcomer. “Principles of Transaction Processing.” Morgan Kaufmann, 2009.
— [3] http://paprika.umw.edu/~ernie/cpsc321/10312006.html
— [4] Google Megastore presentation at SIGMOD 2008. http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx
32. Megastore – Replication
— Megastore Read Process
— Each replica stores mutations and metadata for the log entries.
— Read process:
— 1. Query Local: up-to-date check on the local replica.
— 2. Find position: determine the highest log position and select a replica.
— 3. Catchup: check the consensus value from the other replicas.
— 4. Validate: synchronize the local replica with the up-to-date state.
— 5. Query data: read the data with the chosen timestamp.
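The five read steps can be compressed into a toy walk-through. The Replica class and the log-position bookkeeping here are heavily simplified assumptions, not Megastore's actual metadata:

```python
class Replica:
    """Toy replica: its log is the list of committed values by position."""
    def __init__(self, log):
        self.log = list(log)

def megastore_read(local, peers):
    all_replicas = [local] + peers
    # 1. Query local: up-to-date check against the highest known position.
    highest = max(len(r.log) for r in all_replicas)
    # 2. Find position: if local lags, select a replica with the highest log.
    source = local if len(local.log) == highest else max(peers, key=lambda r: len(r.log))
    # 3. Catchup: copy the missing consensus values into the local replica.
    local.log.extend(source.log[len(local.log):])
    # 4. Validate: the local replica is now synchronized.
    assert len(local.log) == highest
    # 5. Query data: read the value at the chosen (latest) position.
    return local.log[-1]

local = Replica(["a"])
peers = [Replica(["a", "b"]), Replica(["a"])]
value = megastore_read(local, peers)
```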
33. Megastore – Replication
— Megastore Write Process
— Each replica stores mutations and metadata for the log entries.
— Write process:
— 1. Accept leader: ask the leader to accept the value as proposal number zero.
— 2. Prepare: run the Paxos prepare phase at all replicas.
— 3. Accept: ask the remaining replicas to accept the value.
— 4. Invalidate: fault handling for replicas that did not accept the value.
— 5. Apply: apply the value’s mutations at as many replicas as possible.
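Similarly, the five write steps can be compressed into a toy walk-through. Replica availability is modeled with a single `up` flag, the leader fast path and fault handling are heavily simplified, and all names are invented:

```python
class WriteReplica:
    """Toy replica for the write path."""
    def __init__(self, up=True):
        self.up = up
        self.log = []
        self.valid = True  # becomes False once the replica misses a write

def megastore_write(replicas, value):
    leader, others = replicas[0], replicas[1:]
    # 1. Accept leader: ask the leader to accept the value first.
    if not leader.up:
        return False  # a real system would fall back to a full Paxos round
    # 2. Prepare / 3. Accept: ask the remaining replicas to accept the value.
    accepted = [leader] + [r for r in others if r.up]
    if len(accepted) <= len(replicas) // 2:
        return False  # no majority, so the write cannot commit
    # 4. Invalidate: fault handling for replicas that did not accept.
    for r in replicas:
        if r not in accepted:
            r.valid = False
    # 5. Apply: apply the value's mutations at as many replicas as possible.
    for r in accepted:
        r.log.append(value)
    return True

replicas = [WriteReplica(), WriteReplica(), WriteReplica(up=False)]
committed = megastore_write(replicas, "txn-1")
```

An invalidated replica must catch up (as in the read process above) before it can serve local reads again.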