The document discusses a talk given by Ikai Lan about the App Engine Datastore. The talk aimed to explain how the datastore works under the hood and to give conceptual background for a persistence codelab. It covered the underlying Bigtable infrastructure, indexing and queries, entity groups, and the layers of the datastore: Bigtable, Megastore, and the App Engine Datastore API. It included examples of saving an entity with the low-level Java API and of querying properties through the built-in indexes. Complex queries can be resolved with a zigzag merge join across multiple indexes.
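The zigzag merge join mentioned above can be sketched in plain Java. This is a minimal illustration, not App Engine's implementation: it assumes each equality filter (for example name = "ikai" and company = "Google") has already produced its matching entity keys in ascending key order, modeled here as a `java.util.NavigableSet`. The class name `ZigzagJoin` and the sample keys are hypothetical. Each index "seeks" to the first key at or after the current candidate with `ceiling()`, the way a range scan can start at an arbitrary key, and the candidate jumps forward whenever one index skips past it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration of a zigzag merge join over sorted key sets. */
final class ZigzagJoin {

    /**
     * Returns the entity keys present in every index, in ascending order.
     * Each NavigableSet stands in for the keys matched by one equality
     * filter, sorted by entity key.
     */
    static List<String> intersect(List<? extends NavigableSet<String>> indexes) {
        List<String> matches = new ArrayList<>();
        if (indexes.isEmpty() || indexes.get(0).isEmpty()) {
            return matches;
        }
        String candidate = indexes.get(0).first();
        while (candidate != null) {
            boolean allMatch = true;
            for (NavigableSet<String> index : indexes) {
                // Seek to the first key >= candidate, like starting a range scan there.
                String found = index.ceiling(candidate);
                if (found == null) {
                    return matches; // this index is exhausted, so no more matches
                }
                if (!found.equals(candidate)) {
                    // This index skipped past the candidate: jump forward to its key.
                    candidate = found;
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                matches.add(candidate);
                // Advance past the match and keep zigzagging.
                candidate = indexes.get(0).higher(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> nameIsIkai =
                new TreeSet<>(List.of("User:1", "User:4", "User:7", "User:9"));
        NavigableSet<String> companyIsGoogle =
                new TreeSet<>(List.of("User:2", "User:4", "User:9"));
        System.out.println(intersect(List.of(nameIsIkai, companyIsGoogle)));
        // prints [User:4, User:9]
    }
}
```

With the sample data in `main`, only User:4 and User:9 appear in both sets. Because each step seeks instead of reading row by row, a long run of non-matching keys in one index can be skipped in a single jump.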
2. Hands on with the App Engine Datastore
Ikai Lan
May 9th, 2011
Thursday, May 26, 2011
3. About the speaker
• Ikai Lan - Developer Programs Engineer, Developer Relations
• Twitter: @ikai
• Google Profile: http://profiles.google.com/ikai.lan
5. Goals of this talk
• Understand a bit of how the datastore works under the hood
• Have a conceptual background for the persistence codelab
6. Understanding the datastore
• The underlying Bigtable
• Indexing and queries
• Complex queries
• Entity groups
• Underlying infrastructure
7. Datastore layers
Layer     | Complex queries | Entity group transactions | Queries on properties | Key range scan | Get and set by key
Datastore | ✓ | ✓ | ✓ | ✓ | ✓
Megastore | - | ✓ | ✓ | ✓ | ✓
Bigtable  | - | - | - | ✓ | ✓
9. What does a Bigtable row look like?
Source: http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
10. Bigtable API
• “Give me the column ‘name’ at key 123”
• “Set the column ‘name’ at key 123 to ‘ikai’”
• “Give me all columns where the key is greater than 100 and less than 200”
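To make the three Bigtable operations above concrete, here is a toy, in-memory stand-in written in Java. It is an illustrative model under stated assumptions, not a Bigtable client: the class `ToyBigtable` is hypothetical, and rows are kept in a `java.util.TreeMap` sorted by row key, which is the property that makes range scans cheap. Keys are compared as strings, so the numeric-looking sample keys only sort numerically because they all have the same number of digits.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy, in-memory stand-in for a Bigtable table (hypothetical, not a real client). */
final class ToyBigtable {
    // Row key -> (column name -> value). The TreeMap keeps rows sorted by key.
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    /** "Set the column 'name' at key 123 to 'ikai'" */
    void set(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** "Give me the column 'name' at key 123"; returns null if absent. */
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** "Give me all columns where the key is greater than low and less than high" */
    NavigableMap<String, Map<String, String>> scan(String low, String high) {
        // Both bounds exclusive; a read-only view of one contiguous slice of rows.
        return Collections.unmodifiableNavigableMap(rows.subMap(low, false, high, false));
    }

    public static void main(String[] args) {
        ToyBigtable table = new ToyBigtable();
        table.set("123", "name", "ikai");
        table.set("150", "name", "fred");
        table.set("250", "name", "bob");
        System.out.println(table.get("123", "name"));          // ikai
        System.out.println(table.scan("100", "200").keySet()); // [123, 150]
    }
}
```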
12. Megastore API
• “Give me all rows where the column ‘name’ equals ‘ikai’”
• “Transactionally write an update to this group of entities”
• “Do a cross-datacenter write of this data such that reads will be strongly consistent” (High Replication Datastore)
• Megastore paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
12
Thursday, May 26, 2011
13. Datastore layers
            Complex   Entity Group  Queries on  Key range  Get and set
            queries   Transactions  properties  scan       by key
Datastore   ✓         ✓             ✓           ✓          ✓
Megastore             ✓             ✓           ✓          ✓
Bigtable                                        ✓          ✓
14. App Engine Datastore API
• “Give me all Users for my app where the name equals ‘ikai’,
company equals ‘Google’, and sort them by the ‘awesome’
column, descending”
17. Let’s save an Entity with the low-level Java API
DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
Entity ikai = new Entity("User", "ikai@google.com");
ikai.setProperty("firstName", "ikai");
ikai.setProperty("company", "google");
ikai.setUnindexedProperty("biography",
"Ikai is a great man, a great, great man.");
datastore.put(ikai);
18. Get an instance of the DatastoreService
Fetch a client instance:
DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
19. Instantiate a new Entity
Set the Entity kind (the first constructor argument, "User"):
Entity ikai = new Entity("User", "ikai@google.com");
20. Instantiate a new Entity
Set a unique key (the second constructor argument, "ikai@google.com", is the entity's key name):
Entity ikai = new Entity("User", "ikai@google.com");
21. Set indexed properties
The first argument is the property name; the second argument is the property value:
ikai.setProperty("firstName", "ikai");
ikai.setProperty("company", "google");
22. Set unindexed properties
This property will be saved, but we will not run queries against it:
ikai.setUnindexedProperty("biography",
"Ikai is a great man, a great, great man.");
23. Commit the entity to the datastore
Save the thing!
datastore.put(ikai);
24. What happens when we save?
Make the write RPC → Write the entity → Write the indexes → Success!
25. What actually gets written?
Entities table
Bigtable key                                 Value
AppId:User:ikai@google.com                   (Protobuf-serialized entity, including the firstName, company and biography values)
Indexes table
Bigtable key                                 Value
AppId:User:firstName:ikai:ikai@google.com    (empty)
AppId:User:company:google:ikai@google.com    (empty)
Read more: http://code.google.com/appengine/articles/storage_breakdown.html
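These rows can be derived mechanically from the entity. A sketch, assuming the simplified colon-separated key layout shown on this slide (the production datastore uses a binary key encoding, not these strings); EntityRows and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the rows written on save, using the slide's readable
// colon-separated layout (the real datastore uses a binary key encoding).
class EntityRows {
    // Entities table row key: AppId:Kind:keyName
    static String entityRowKey(String appId, String kind, String keyName) {
        return String.join(":", appId, kind, keyName);
    }

    // Indexes table: one row per indexed property, AppId:Kind:property:value:keyName.
    // The row's value is empty; everything a query needs is in the key itself.
    // Properties saved with setUnindexedProperty (e.g. "biography") get no row.
    static List<String> indexRowKeys(String appId, String kind, String keyName,
                                     Map<String, String> properties, Set<String> unindexed) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> p : properties.entrySet()) {
            if (unindexed.contains(p.getKey())) {
                continue;
            }
            rows.add(String.join(":", appId, kind, p.getKey(), p.getValue(), keyName));
        }
        return rows;
    }
}
```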
26. Now let’s run a query
• If we have the key, we can fetch it right away by key
• What if we don’t? We need indexes.
27. Let’s run a query
DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
Query queryByName = new Query("User");
queryByName.addFilter("firstName",
FilterOperator.EQUAL, "ikai");
List<Entity> results = datastore.prepare(
queryByName).asList(
FetchOptions.Builder.withDefaults());
// Roughly equivalent to:
// SELECT * FROM User WHERE firstName = 'ikai';
28. Step 1: Query the indexes table
Scan the indexes table for keys >= AppId:User:firstName:ikai
29. Step 2: Start extracting keys
That gets us this row: AppId:User:firstName:ikai:ikai@google.com. Extract the key:
ikai@google.com
30. Step 3: Batch get the entities themselves
Now let's go back to the entities table and fetch that key (AppId:User:ikai@google.com). Success!
31. Key takeaways
• This isn’t a relational database
– There are no full table scans
– Indexes MUST exist for every property we want to query
– Natively, we can only query for exact matches or prefix (startsWith) matches
– Don’t index what we never need to query on
• Get by key = one step. Query on property value = 2 steps
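The query path walked through above (scan the index, extract the keys, batch get the entities) can be sketched with two sorted in-memory collections standing in for the Bigtable tables. This is an illustration under the slides' simplified key layout; TwoStepQuery is a hypothetical name, and property values containing ':' are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the two-step query path, with sorted in-memory
// collections standing in for the Bigtable entities and indexes tables.
class TwoStepQuery {
    final TreeMap<String, String> entities = new TreeMap<>(); // row key -> serialized entity
    final TreeSet<String> indexes = new TreeSet<>();          // index row keys; values are empty

    // Roughly: SELECT * FROM <kind> WHERE <property> = <value>
    List<String> queryEquals(String appId, String kind, String property, String value) {
        String prefix = appId + ":" + kind + ":" + property + ":" + value + ":";

        // Step 1: scan the indexes table for keys >= prefix, stopping once the prefix no longer matches.
        List<String> keys = new ArrayList<>();
        for (String row : indexes.tailSet(prefix, true)) {
            if (!row.startsWith(prefix)) {
                break;
            }
            // Step 2: the entity key is whatever follows the prefix.
            keys.add(row.substring(prefix.length()));
        }

        // Step 3: batch get the entities themselves from the entities table.
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            String entity = entities.get(appId + ":" + kind + ":" + key);
            if (entity != null) {
                results.add(entity);
            }
        }
        return results;
    }
}
```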
32. Let’s run a more complex query!
DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
Query queryByName = new Query("User");
queryByName.addFilter("firstName",
FilterOperator.EQUAL, "ikai");
queryByName.addFilter("company",
FilterOperator.EQUAL, "google");
List<Entity> results = datastore.prepare(
queryByName).asList(
FetchOptions.Builder.withDefaults());
// Roughly equivalent to:
// SELECT * FROM User WHERE firstName = 'ikai'
// AND company = 'google';
33. Query resolution strategies
• This query can be resolved using built-in indexes
– Zig zag merge join: we'll cover this example
• Can be optimized using composite indexes
34. Zig zag across multiple indexes
Begin by scanning indexes >= AppId:User:company:google
Bigtable key (first index: company)
AppId:User:company:acme:alfred@acme.com
AppId:User:company:google:david@google.com
AppId:User:company:google:ikai@google.com
AppId:User:company:google:max@google.com
AppId:User:company:megacorp:zed@megacorp.com
Bigtable key (second index: firstName)
AppId:User:firstName:alfred:alfred@acme.com
AppId:User:firstName:ikai:ikai@acme.com
AppId:User:firstName:ikai:ikai@google.com
AppId:User:firstName:ikai:ikai@megacorp.com
AppId:User:firstName:zed:zed@megacorp.com
35. Zig zag across multiple indexes
There's at least a partial match (david@google.com), so we "jump" to the next index
36. Zig zag across multiple indexes
Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:david@google.com
37. Zig zag across multiple indexes
Okay, so that's a twist. The first key at or after that position is AppId:User:firstName:ikai:ikai@acme.com, so david@google.com is not a match, and the smallest remaining candidate is ikai@acme.com. Does a key at or after it exist in the first index?
38. Zig zag across multiple indexes
Let's advance the original cursor to >= AppId:User:company:google:ikai@acme.com. That lands on AppId:User:company:google:ikai@google.com, so ikai@google.com becomes the candidate, and we jump back to the second index to look for AppId:User:firstName:ikai:ikai@google.com.
39. Zig zag across multiple indexes
Alright! We found a match: AppId:User:firstName:ikai:ikai@google.com. Let's add the key (ikai@google.com) to our in-memory list and go back to the first index
40. Zig zag across multiple indexes
Let's move on to see if there are any more matches. Let's start at the next key in the first index, max@google.com
41. Zig zag across multiple indexes
Are there any keys >= AppId:User:firstName:ikai:max@google.com that still have firstName = ikai?
42. Zig zag across multiple indexes
No. We're at the end of our index scans. Let's do a batch get of our list of keys: [ 'ikai@google.com' ]
43. Batch get the entities themselves
Entities table
Bigtable key                  Value
AppId:User:ikai@google.com    (Protobuf-serialized entity, including the firstName, company and biography values)
Now let’s go back to the entities table and
fetch that key. Success!
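The zig zag merge join walked through above can be sketched as follows. Assumptions: each index has already been narrowed to one property=value range (company=google, firstName=ikai) and reduced to its sorted entity keys, and NavigableSet.ceiling() stands in for a Bigtable seek to the first key >= a target. ZigZag is a hypothetical name; this illustrates the technique, not the datastore's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of a zig zag merge join. Each NavigableSet holds the
// sorted entity keys from one index range (e.g. company=google); ceiling(x)
// plays the part of a Bigtable seek to the first key >= x.
class ZigZag {
    static List<String> join(List<? extends NavigableSet<String>> indexes) {
        List<String> matches = new ArrayList<>();
        if (indexes.isEmpty() || indexes.get(0).isEmpty()) {
            return matches;
        }
        String candidate = indexes.get(0).first();
        while (candidate != null) {
            boolean everyIndexHasIt = true;
            for (NavigableSet<String> index : indexes) {
                String found = index.ceiling(candidate); // seek to first key >= candidate
                if (found == null) {
                    return matches;                      // an index is exhausted: no more matches
                }
                if (!found.equals(candidate)) {
                    candidate = found;                   // this index jumped ahead: new candidate,
                    everyIndexHasIt = false;             // so zig zag back through the indexes
                    break;
                }
            }
            if (everyIndexHasIt) {
                matches.add(candidate);                       // keep the key in the in-memory list
                candidate = indexes.get(0).higher(candidate); // move past it in the first index
            }
        }
        return matches;
    }
}
```

On the sparse data from slide 46 (where only ikai@acme.com and ikai@megacorp.com have firstName = ikai), the same loop keeps bouncing between the indexes and returns an empty list.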
44. Let’s change the shape of the data
• Zig zag performance is HIGHLY dependent on the shape of the
data
• Let’s go ahead and muck with the data a bit
46. Same query, sparsely distributed matches
Begin by scanning indexes >= AppId:User:company:google
Bigtable key (first index: company)
AppId:User:company:acme:alfred@acme.com
AppId:User:company:google:david@google.com
AppId:User:company:google:ikai@google.com
AppId:User:company:google:max@google.com
AppId:User:company:megacorp:zed@megacorp.com
Bigtable key (second index: firstName)
AppId:User:firstName:alfred:alfred@acme.com
AppId:User:firstName:igor:ikai@google.com
AppId:User:firstName:ikai:ikai@acme.com
AppId:User:firstName:ikai:ikai@megacorp.com
AppId:User:firstName:zed:zed@megacorp.com
47. Same query, sparsely distributed matches
Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:david@google.com
48. Same query, sparsely distributed matches
Oh ... no matches. Let's move back to the first index and move the cursor down
50. Same query, sparsely distributed matches
Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:ikai@google.com
51. Same query, sparsely distributed matches
Oh ... no matches here either. Let's go back to the first index.
52. Same query, sparsely distributed matches
Oh ... no matches here either. Let's go back to the first index.
... if these indexes were huge, we could be here for a while!
53. What happens in this case?
• If the zig zag has to traverse too much of the indexes, the datastore throws a DatastoreNeedIndexException
• We'll want to build a composite index
54. Composite index
Bigtable key
AppId:User:company:acme:firstName:alfred:alfred@acme.com
AppId:User:company:google:firstName:david:david@google.com
AppId:User:company:google:firstName:ikai:ikai@google.com
AppId:User:company:google:firstName:max:max@google.com
AppId:User:company:megacorp:firstName:zed:zed@megacorp.com
55. Composite index
Search for all keys >= AppId:User:company:google:firstName:ikai
56. Composite index
The single scan lands directly on AppId:User:company:google:firstName:ikai:ikai@google.com.
Well, that was much faster, wasn’t it?
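A composite index turns the two-filter query into one contiguous range scan. A sketch on (company, firstName), assuming the string layout from these slides; CompositeIndex and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a composite index on (company, firstName),
// using the slides' string layout. One sorted set holds every composite row.
class CompositeIndex {
    private final TreeSet<String> rows = new TreeSet<>();
    private final String appId;
    private final String kind;

    CompositeIndex(String appId, String kind) {
        this.appId = appId;
        this.kind = kind;
    }

    // Written at save time: one extra row per entity for this index.
    void add(String company, String firstName, String key) {
        rows.add(prefix(company, firstName) + key);
    }

    // WHERE company = ? AND firstName = ?  becomes a single range scan.
    List<String> query(String company, String firstName) {
        String p = prefix(company, firstName);
        List<String> keys = new ArrayList<>();
        for (String row : rows.tailSet(p, true)) {
            if (!row.startsWith(p)) {
                break; // left the matching range: stop scanning
            }
            keys.add(row.substring(p.length()));
        }
        return keys;
    }

    private String prefix(String company, String firstName) {
        return appId + ":" + kind + ":company:" + company + ":firstName:" + firstName + ":";
    }
}
```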
57. Composite index tradeoffs
• Index rows are written at entity save time, which uses additional datastore CPU and storage quota
• You can only create 200 composite indexes
• You need to know the possible queries ahead of time!
58. Complex Queries takeaways
• This isn’t a relational database
– There are no full table scans
– Indexes MUST exist for every property we want to query
• Performance depends on the shape of the data
• Worst case: your query matches are highly sparse
• Build composite indexes when you need them
61. Why entity groups?
• We can perform transactions within this group - but not outside
• Data locality - data are stored “near” each other
• Strongly consistent queries within the entity group when using the High Replication Datastore
62. Entity groups and transactions
• A hierarchical structuring of your data into Megastore’s unit of
atomicity
• Allows for transactional behavior - but only within a single entity
group
• Key unit of consistency when using the High Replication Datastore
63. Example: Data for a blog hosting service
User has many Blogs
Blog has many Entries
Entry has many Comments
64. Example: Data for a blog hosting service
User has many Blogs
Blog has many Entries
Entry has many Comments
This can be structured as an entity group (tree structure)!
65. Structure this data as an entity group
User (entity group root)
  Blog, Blog
    Entry, Entry, Entry
      Comment, Comment, Comment
66. How are entity groups stored?
Entities table
Bigtable key                                                 Value
AppId:User:ikai@google.com                                   (Protobuf-serialized User)
AppId:User:ikai@google.com/Blog:123                          (Protobuf-serialized Blog)
AppId:User:ikai@google.com/Blog:123/Entry:456                (Protobuf-serialized Entry)
AppId:User:ikai@google.com/Blog:123/Entry:789                (Protobuf-serialized Entry)
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:111    (Protobuf-serialized Comment)
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:222    (Protobuf-serialized Comment)
AppId:User:ikai@google.com/Blog:123/Entry:789/Comment:333    (Protobuf-serialized Comment)
Read more: http://code.google.com/appengine/docs/python/datastore/entities.html
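Because each child key embeds its parent's full key, an entity's descendants all share a key prefix and therefore sit next to each other in a sorted table, which is where the data locality comes from. A sketch, assuming the '/'-separated string layout shown above; AncestorKeys and its methods are hypothetical, and the real datastore encodes key paths differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch of ancestor-path keys in the slide's string layout:
// AppId:Kind:name/Kind:id/... (the real datastore encodes paths differently).
class AncestorKeys {
    // A child's key is its parent's full key plus one more path element.
    static String child(String parentKey, String kind, String id) {
        return parentKey + "/" + kind + ":" + id;
    }

    // The entity group root is the first element of the path.
    static String root(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    // The entity plus all of its descendants. Every descendant key starts with
    // ancestorKey + "/", and keys sharing a prefix are contiguous in sorted
    // order, so one range scan finds them all.
    static List<String> selfAndDescendants(NavigableSet<String> table, String ancestorKey) {
        List<String> out = new ArrayList<>();
        if (table.contains(ancestorKey)) {
            out.add(ancestorKey);
        }
        String prefix = ancestorKey + "/";
        for (String key : table.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            out.add(key);
        }
        return out;
    }
}
```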
67. How are entity groups stored?
Entity groups have a single root entity: here, AppId:User:ikai@google.com
68. How are entity groups stored?
Child entities embed the entire ancestry in their keys, for example AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:111
69. Let’s write an entity group transactionally
DatastoreService datastore = DatastoreServiceFactory
.getDatastoreService();
Entity ikai = new Entity("User", "ikai@google.com");
Entity blog = new Entity("Blog", "ikaisays.com",
ikai.getKey());
Entity entry = new Entity("Entry", "datastore-intro",
blog.getKey());
// Auto assign an ID
Entity comment = new Entity("Comment", entry.getKey());
Transaction tx = datastore.beginTransaction();
// Helper function for clarity
datastore.put(Arrays.asList(ikai, blog, entry, comment));
tx.commit();
70. Let’s write an entity group transactionally
Create the root entity:
Entity ikai = new Entity("User", "ikai@google.com");
71. Let’s write an entity group transactionally
This is the first child entity. Notice the third argument, which specifies the parent entity key:
Entity blog = new Entity("Blog", "ikaisays.com",
ikai.getKey());
72. Let’s write an entity group transactionally
The next deeper entity sets the blog as the parent:
Entity entry = new Entity("Entry", "datastore-intro",
blog.getKey());
73. Let’s write an entity group transactionally
We can also opt to not provide a key name and just use a parent key for a new entity:
// Auto assign an ID
Entity comment = new Entity("Comment", entry.getKey());
74. Let’s write an entity group transactionally
Start a new transaction:
Transaction tx = datastore.beginTransaction();
75. Let’s write an entity group transactionally
Put the entities in parallel:
datastore.put(Arrays.asList(ikai, blog, entry, comment));
76. Let’s write an entity group transactionally
Actually commit the changes:
tx.commit();
77. Step 1: Commit
Commit → Changes to entities visible → Changes to entities and indexes visible
Roll the timestamp forward on the root entity
78. Step 2: Entity visible
Commit → Changes to entities visible → Changes to entities and indexes visible
On read, check for the most recent timestamp on the root entity. This is the version we want since it represents a complete write
79. Step 3: Indexes updated
Commit → Changes to entities visible → Changes to entities and indexes visible
Indexes are written - now we
can query for this entity with
the new properties
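One way to picture steps 1 and 2: a minimal sketch, assuming each entity keeps timestamped versions and the root records the timestamp of the last complete commit, so readers ignore any newer, half-applied version. VersionedGroup is a hypothetical name, and the sketch does not model step 3 (index updates) or Megastore's replication.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each entity keeps timestamped versions, and the group's
// root records the timestamp of the last complete commit.
class VersionedGroup {
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    private long rootCommitTimestamp = 0; // rolled forward on commit

    // Write a version of an entity; it is not visible to readers yet.
    void write(String key, long timestamp, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Step 1: commit by rolling the root's timestamp forward.
    void commit(long timestamp) {
        rootCommitTimestamp = Math.max(rootCommitTimestamp, timestamp);
    }

    // Step 2: on read, take the newest version at or before the root's timestamp,
    // i.e. the last complete write. Newer uncommitted versions are skipped.
    String read(String key) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> latest = history.floorEntry(rootCommitTimestamp);
        return latest == null ? null : latest.getValue();
    }
}
```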
80. Entity group and transactions takeaways
• Structure data into hierarchical trees
– Large enough to be useful, small enough to maximize
transactional throughput
• Transactions are scoped to an entity group root: expect roughly 1 transaction per second per entity group
– If you write N entities that are all part of 1 entity group, it counts as 1 write
• Optimistic locking used - can be expensive with a lot of
contention
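Optimistic locking on an entity group can be sketched as a compare-and-set on a per-group version: a transaction remembers the version it read, and its commit fails if another commit got there first, so the caller retries. OptimisticGroup and its retry limit are hypothetical illustrations, not the App Engine API.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of optimistic concurrency on one entity group.
class OptimisticGroup {
    private long version = 0;   // bumped by every successful commit
    private String state = "";  // stands in for the group's entities

    synchronized long readVersion() { return version; }

    synchronized String readState() { return state; }

    // Commit only if no one else committed since we read expectedVersion.
    synchronized boolean tryCommit(long expectedVersion, String newState) {
        if (version != expectedVersion) {
            return false; // conflict: another transaction committed first
        }
        state = newState;
        version++;
        return true;
    }

    // Retry loop: read, compute, try to commit; returns the attempt that succeeded.
    // Every failed attempt is wasted work, which is why a heavily contended
    // group gets expensive.
    int update(UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seen = readVersion();
            String next = change.apply(readState());
            if (tryCommit(seen, next)) {
                return attempt;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```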
81. General datastore tips
• Denormalize as much as possible
– As much as possible, treat the datastore as a key-value store (a dictionary- or Map-like structure)
– Move large reporting to offline processing. This lets you avoid
unnecessary indexes
• Use entity groups for your data
• Build composite indexes where you need them: “need” depends on the shape of your data