The document discusses the transition from relational databases to NoSQL databases. It notes that the two main drivers for adopting NoSQL databases are the lack of flexibility in relational schemas and the inability to scale out data. It provides examples of different types of NoSQL databases like key-value, document, columnar, and graph databases. The document specifically focuses on distributed document databases, explaining their structure where each record is a self-describing JSON or XML document that can have complex, nested data structures. It compares the relational and document data models, providing an example of how user profile data would be structured in each. Finally, it demonstrates how making changes to data is simpler with a document database by embedding all related information.
Navigating the Transition from Relational to NoSQL - CloudCon Expo 2012, Dipti Borkar
For more in-depth NoSQL content from Couchbase, see http://www.couchbase.com/webinars
NoSQL databases have emerged as a better match than relational systems for modern interactive applications, offering cost-effective data management at “Big Data” scale. But there are significant differences between structured and schema-less database technology. What should architects and technical managers know as they explore NoSQL solutions for their teams?
In this workshop you will learn:
- How to evaluate NoSQL (both technical advantages and limitations) as a potential data management approach
- Critical differences between NoSQL and RDBMS for designing, building and running production applications
- Ideal use cases for NoSQL technology and sample reference architectures
3. Two big drivers for NoSQL adoption
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
6. Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high-availability

{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": [
      "SERVER",
      "US-West",
      "API"
    ]
  }
}
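The properties listed on this slide (self-describing documents, nested structure, one unique key per document) can be sketched in Python with the standard `json` module and a plain dictionary standing in for a bucket. This is a minimal illustration, not a database client. The key `event::0001` and the `put` helper are hypothetical. The truncated UUID and timestamp from the sample are left out rather than guessed.

```python
import json

# The sample event from the slide, minus the truncated "UUID" and "Time" values.
raw = """
{
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": ["SERVER", "US-West", "API"]
  }
}
"""

# A dictionary stands in for a bucket: one document per unique key.
bucket = {}


def put(store, key, document):
    """Insert a document under a new key; refuse to reuse an existing key."""
    if key in store:
        raise KeyError(f"duplicate key: {key}")
    store[key] = document


put(bucket, "event::0001", json.loads(raw))  # hypothetical key

# The document describes itself, so nested fields are read without any schema.
event = bucket["event::0001"]
print(event["Details"]["API"])                # InsertDVDQueueItem
print("US-West" in event["Details"]["Tags"])  # True
```

Calling `put` again with `event::0001` raises `KeyError`, which mirrors the rule that a key normally appears only once in a bucket. Real document stores usually also offer a write that overwrites instead of failing.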
9. Relational vs Document data model
[Diagram: a table with columns C1, C2, C3, C4 beside a collection of JSON documents]
Relational data model: Highly-structured table organization with rigidly-defined data formats and record structure.
Document data model: Collection of complex documents with arbitrary, nested data formats and varying "record" format.
10. Example: User Profile

User Info
KEY  First  Last    ZIP_id
1    Dipti  Borkar  2
2    Joe    Smith   2
3    Ali    Dodson  2
4    John   Doe     3

Address Info
ZIP_id  CITY  STATE  ZIP
1       DEN   CO     30303
2       MV    CA     94040
3       CHI   IL     60609
4       NY    NY     10010

To get information about a specific user, you perform a join across two tables.
11. Document Example: User Profile
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
All data in a single JSON document
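The contrast between the relational and document versions of the user profile can be tried out with Python's built-in `sqlite3` module. The relational version needs a join across `user_info` and `address_info`, while the document version is a single lookup by key. The table and column names are assumptions modeled on the User Info and Address Info tables. This is a sketch for comparison only; a real document database would be accessed through its own SDK.

```python
import sqlite3

# Relational model: the two tables from the "Example: User Profile" slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (user_key INTEGER PRIMARY KEY, first TEXT, last TEXT, zip_id INTEGER);
    CREATE TABLE address_info (zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT);
    INSERT INTO user_info VALUES (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
                                 (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
    INSERT INTO address_info VALUES (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
                                    (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
""")

# Reading one user's profile requires joining the two tables.
relational_row = conn.execute(
    """
    SELECT u.first, u.last, a.zip, a.city, a.state
    FROM user_info AS u JOIN address_info AS a ON u.zip_id = a.zip_id
    WHERE u.user_key = ?
    """,
    (1,),
).fetchone()

# Document model: the same profile stored as one self-contained document.
profiles = {
    1: {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
        "ZIP": "94040", "CITY": "MV", "STATE": "CA"},
}
doc = profiles[1]  # a single key lookup, no join
document_row = (doc["FIRST"], doc["LAST"], doc["ZIP"], doc["CITY"], doc["STATE"])

print(relational_row)                  # ('Dipti', 'Borkar', '94040', 'MV', 'CA')
print(relational_row == document_row)  # True
```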
12. Making a Change Using RDBMS

User Table
User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
…
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001

Photo Table
User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
…
5002     e086      Spain    133

Status Table
User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A's   001
…
5000     e34        sailing  005

Affiliations Table
User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002

Country Table
Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
…
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden
13. Making the Same Change with a Document Database
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}
Just add information to a document
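In the document model, adding the status and country shown on this slide amounts to assigning new keys. The Python sketch below assumes the profile is held as a plain dictionary. The `add_status` helper is hypothetical and returns an updated copy that would then be written back under the same key. No `ALTER TABLE` or new Status and Country tables are involved.

```python
import copy
import json

profile = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
           "ZIP": "94040", "CITY": "MV", "STATE": "CA"}


def add_status(doc, text, geo_loc, country):
    """Return a copy of a profile with a nested STATUS object and a COUNTRY field.

    Nothing outside this one document changes: there is no schema to migrate
    and no extra table to join against later.
    """
    updated = copy.deepcopy(doc)
    updated["STATUS"] = {"TEXT": text, "GEO_LOC": geo_loc}
    updated["COUNTRY"] = country
    return updated


new_profile = add_status(profile, "At Conf", "134", "USA")
print(json.dumps(new_profile, indent=2))

# Other documents in the same bucket can keep the older, smaller shape.
older_profile = {"ID": 2, "FIRST": "Joe", "LAST": "Smith", "ZIP": "94040"}
```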
14. Document modeling
When considering how to model data for a given application:
• Think of a logical container for the data
• Think of how data groups together
Questions to ask:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
15. Document Design Options
• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminate client-side joins
• Separate documents for different object types with cross references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions supported only on a document boundary
  – Most document databases do not support joins
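The two design options can be compared side by side. In this minimal Python sketch a dictionary stands in for the bucket, and the `post::1` / `user::…` key scheme and the `load_post` helper are hypothetical. The embedded option answers a read with one lookup. The referenced option has to resolve each cross reference itself, which is the client-side join mentioned above.

```python
# Option 1: one de-normalized document holding all related data.
embedded = {
    "post::1": {
        "title": "Hello NoSQL",
        "author": {"name": "Dipti Borkar"},
        "comments": [{"by": "Joe Smith", "text": "Nice post"}],
    },
}

# Option 2: separate documents per object type, linked by key.
referenced = {
    "user::dipti": {"name": "Dipti Borkar"},
    "user::joe": {"name": "Joe Smith"},
    "post::1": {"title": "Hello NoSQL", "author": "user::dipti",
                "comments": ["comment::1"]},
    "comment::1": {"by": "user::joe", "text": "Nice post"},
}


def load_post(store, key):
    """Assemble a post from referenced documents; returns (post, lookups)."""
    lookups = 1
    post = dict(store[key])
    post["author"] = store[post["author"]]
    lookups += 1
    comments = []
    for comment_key in post["comments"]:
        comment = dict(store[comment_key])
        comment["by"] = store[comment["by"]]["name"]  # resolve the commenter
        lookups += 2
        comments.append(comment)
    post["comments"] = comments
    return post, lookups


post, lookups = load_post(referenced, "post::1")
print(post == embedded["post::1"])  # True: same data either way
print(lookups)                      # 4 reads instead of 1
```

In a sharded cluster the four referenced documents may sit on different nodes. Because transactions stop at the document boundary, updating the post and one of its comments together is not atomic.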
16. Document ID / Key selection
• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Usually an ID can only appear once in a bucket
Questions to ask:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
Options:
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)
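The key options on this slide, and the point that documents are sharded by ID, can be illustrated with Python's standard library. The key formats (`user::…`, `event::…`), the example key `user::dborkar`, and the 1024-partition count are assumptions for the sketch. Hashing the key with CRC32 modulo the partition count is a generic hash-partitioning scheme, not the exact algorithm any particular product uses.

```python
import itertools
import uuid
import zlib
from datetime import datetime, timezone

# Key styles from the slide (formats are illustrative).
uuid_key = f"event::{uuid.uuid4()}"                                       # UUID
date_key = datetime.now(timezone.utc).strftime("event::%Y%m%dT%H%M%S%f")  # date-based
next_user_id = itertools.count(1)
numeric_key = f"user::{next(next_user_id)}"                               # numeric
readable_key = "user::dborkar"                                            # hand-crafted
# Matching prefixes: related objects share the owner's key as a prefix.
related_keys = [f"{readable_key}::{part}" for part in ("profile", "settings", "badge")]

NUM_PARTITIONS = 1024  # assumed number of partitions


def partition_for(key: str, partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition by hashing the key, so any client can compute a
    document's partition from its key alone."""
    return zlib.crc32(key.encode("utf-8")) % partitions


for key in [uuid_key, date_key, numeric_key, *related_keys]:
    print(f"{key:50} -> partition {partition_for(key)}")
```

Because the partition depends only on the key, a lookup by ID goes straight to one partition, which is why ID-based access is fast. Keys that share a prefix still hash to unrelated partitions, so prefixes group related objects logically rather than physically.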
17. Example: Entities for a Blog
BLOG
• User profile
  The main pointer into the user data
  – Blog entries
  – Badge settings, like a Twitter badge
• Blog posts
  Contains the blogs themselves
• Blog comments
  – Comments from other users
20. Threaded Comments
• You can imagine how to take this to a threaded list
[Diagram: a Blog linked to comment Lists, with a First comment, a Reply to comment, and More Comments]
Advantages
• Only fetch the data when you need it
  – For example, rendering part of a web page
• Spread the data and load across the entire cluster
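A threaded comment list of this kind can be sketched as documents that point to further documents, so a page render reads only what it displays. The key names, the `items`/`next`/`replies` fields, and the `get` and `comment_page` helpers below are hypothetical, and a dictionary stands in for the cluster.

```python
docs = {
    "post::1": {"title": "Hello NoSQL", "comments": "post::1::comments::1"},
    "post::1::comments::1": {
        "items": [
            {"id": "c1", "text": "First comment", "replies": "c1::replies"},
            {"id": "c2", "text": "Nice post", "replies": None},
        ],
        "next": "post::1::comments::2",  # pointer to "more comments"
    },
    "post::1::comments::2": {
        "items": [{"id": "c3", "text": "Late to the party", "replies": None}],
        "next": None,
    },
    "c1::replies": {
        "items": [{"id": "r1", "text": "Reply to comment", "replies": None}],
        "next": None,
    },
}

fetched = []  # records every document actually read


def get(key):
    """Single-key read; in a cluster each key could live on a different node."""
    fetched.append(key)
    return docs[key]


def comment_page(list_key):
    """Read one page of comments without following 'next' or 'replies'."""
    page = get(list_key)
    return [c["text"] for c in page["items"]], page["next"]


# Rendering the top of the page touches two documents; replies and later
# pages are only fetched if the reader expands them.
post = get("post::1")
texts, more = comment_page(post["comments"])
print(texts)    # ['First comment', 'Nice post']
print(fetched)  # ['post::1', 'post::1::comments::1']
```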
22. Relational Technology Scales Up
Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users for the Web/App Server Tier]
RDBMS Scales Up
Get a bigger, more complex server
[Chart: System Cost and Application Performance vs. Users for the Relational Database, marked "Won't scale beyond this point"]
Expensive and disruptive sharding, doesn't perform at web scale
23. Couchbase Server Scales Out Like App Tier
Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users for the Web/App Server Tier]
NoSQL Database Scales Out
Cost and performance mirror the app tier
[Chart: System Cost and Application Performance vs. Users for the Couchbase Distributed Data Store]
Scaling out flattens the cost and performance curves
25. The Process – From Evaluation to Go Live
No different from evaluating a relational database
1. Analyze your requirements
2. Find solutions / products that match key requirements
3. Execute a proof of concept / performance evaluation
4. Begin development of application
5. Deploy in staging and then production
New requirements → New solutions
26. Step 1: Analyze your requirements
Common application requirements
• Rapid application development
  – Changing market needs
  – Changing data needs
• Scalability
  – Unknown user demand
  – Constantly growing throughput
• Consistent Performance
  – Low response time for better user experience
  – High throughput to handle viral growth
• Reliability
  – Always online
27. Step 2: Find solutions that match key requirements
NoSQL
• Linear Scalability
• Schema flexibility
• High Performance
RDBMS
• Multi-document transactions
• Database Rollback
• Complex security needs
• Complex joins
• Extreme compression needs
RDBMS or NoSQL
• Both / depends on the data
28. Step 3: Proof of concept / Performance evaluation
Prototype a workload
• Look for consistent performance…
  – Low response times / latency
    • For better user experience
  – High throughput
    • To handle viral growth
    • For resource efficiency
• …across
  – Read heavy / Write heavy / Mixed workloads
  – Clusters of growing sizes
• …and watch for
  – Contention / heavy locking
  – Linear scalability
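A proof of concept needs a harness that replays a workload and reports latency and throughput. The Python sketch below runs read-heavy, write-heavy, and mixed workloads against any dict-like store. The plain `dict` used here is only a placeholder; a real evaluation would swap in the candidate database's client, drive it from several machines, and repeat the run as the cluster grows. The function name `run_workload` and its default sizes are assumptions.

```python
import random
import statistics
import time


def run_workload(store, read_ratio, ops=10_000, keys=1_000, seed=42):
    """Replay a random read/write mix against `store` (anything with get and
    item assignment) and return throughput and latency percentiles."""
    rng = random.Random(seed)
    for k in range(keys):  # preload the data set
        store[f"key::{k}"] = {"n": k}

    latencies_us = []
    start = time.perf_counter()
    for _ in range(ops):
        key = f"key::{rng.randrange(keys)}"
        t0 = time.perf_counter()
        if rng.random() < read_ratio:
            store.get(key)
        else:
            store[key] = {"n": rng.random()}
        latencies_us.append((time.perf_counter() - t0) * 1e6)
    elapsed = time.perf_counter() - start

    cuts = statistics.quantiles(latencies_us, n=100)  # 99 percentile cut points
    return {
        "ops_per_sec": ops / elapsed,
        "p50_us": cuts[49],
        "p99_us": cuts[98],
    }


for name, ratio in [("read heavy", 0.95), ("write heavy", 0.05), ("mixed", 0.5)]:
    result = run_workload({}, ratio)
    print(f"{name:12} {result['ops_per_sec']:>12,.0f} ops/s  "
          f"p50={result['p50_us']:.2f}us  p99={result['p99_us']:.2f}us")
```

Tracking the p99 latency, not just the average, as load and cluster size grow is what exposes contention and shows whether scaling stays close to linear.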
29. Step 3: Other considerations
Accessing data
– No standards exist yet
– Typically via SDKs or over HTTP
– Check if the programming language of your choice is supported
Consistency
– Consistent only at the document level
– Most document stores currently don't support multi-document transactions
– Analyze your application needs
Availability
– Each node stores active and replica data (Couchbase)
– Each node is either a master or slave (MongoDB)
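Because consistency stops at the document boundary, concurrent updates to a single document are commonly guarded with optimistic concurrency. A client reads the document together with a version value and writes it back only if that value is unchanged; Couchbase calls this value a CAS. The Python store below is a hypothetical single-process stand-in, not an SDK. `Store`, `replace`, `CasMismatch`, and `increment_likes` are illustrative names.

```python
import itertools


class CasMismatch(Exception):
    """The document changed after it was read."""


class Store:
    """Toy store where every write stamps the document with a new CAS value."""

    def __init__(self):
        self._docs = {}
        self._cas = itertools.count(1)

    def upsert(self, key, value):
        cas = next(self._cas)
        self._docs[key] = (dict(value), cas)
        return cas

    def get(self, key):
        value, cas = self._docs[key]
        return dict(value), cas

    def replace(self, key, value, cas):
        """Write only if nobody else has written since `cas` was read."""
        _, current = self._docs[key]
        if current != cas:
            raise CasMismatch(key)
        return self.upsert(key, value)


def increment_likes(store, key, retries=5):
    """Read-modify-write loop that retries when another writer got there first."""
    for _ in range(retries):
        doc, cas = store.get(key)
        doc["likes"] += 1
        try:
            return store.replace(key, doc, cas)
        except CasMismatch:
            continue
    raise RuntimeError(f"gave up on {key} after {retries} conflicts")


store = Store()
store.upsert("user::1", {"likes": 0})

stale_doc, stale_cas = store.get("user::1")
store.upsert("user::1", {"likes": 10})  # a concurrent writer changes the document
try:
    store.replace("user::1", {"likes": stale_doc["likes"] + 1}, stale_cas)
    conflict_detected = False
except CasMismatch:
    conflict_detected = True  # the stale write is rejected, not silently applied

increment_likes(store, "user::1")
print(conflict_detected, store.get("user::1")[0])  # True {'likes': 11}
```

This protects one document only. A change spanning two documents would leave the application to handle partial failure, since, as the slide notes, most document stores do not support multi-document transactions.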
30. Step 3: Other considerations
Operations
– Monitoring the system
– Backup and restore the system
– Upgrades and maintenance
– Support
Ease of Scaling
– Ease of adding and reducing capacity
– Single node type
– App availability on topology changes
Indexing and Querying
– Secondary indexes (Map functions)
– Aggregates Grouping (Reduce functions)
– Basic querying
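Secondary indexes built from map functions and aggregates built from reduce functions work roughly like the Python sketch below. A map function emits (key, value) rows for each document, the rows are sorted into an index that can be searched by emitted key, and a reduce step groups rows by emitted key. Couchbase views, for instance, are defined as JavaScript map and reduce functions. This Python version only mirrors the idea, and every name in it is hypothetical.

```python
import bisect
from collections import defaultdict

docs = {
    "user::1": {"type": "user", "name": "Dipti", "state": "CA"},
    "user::2": {"type": "user", "name": "Joe", "state": "CA"},
    "user::3": {"type": "user", "name": "Ali", "state": "IL"},
    "post::1": {"type": "post", "title": "Hello NoSQL"},
}


def map_by_state(key, doc):
    """Map function: emit (state, document key) for user documents only."""
    if doc.get("type") == "user":
        yield doc["state"], key


def build_index(docs, map_fn):
    """Secondary index: all emitted rows, sorted by emitted key."""
    return sorted(row for key, doc in docs.items() for row in map_fn(key, doc))


def query(index, emitted_key):
    """Seek to the first matching row using the sort order, then scan."""
    start = bisect.bisect_left(index, (emitted_key,))
    matches = []
    for emitted, doc_key in index[start:]:
        if emitted != emitted_key:
            break
        matches.append(doc_key)
    return matches


def reduce_count(rows):
    """Reduce function: count rows per emitted key (a group-by aggregate)."""
    counts = defaultdict(int)
    for emitted, _ in rows:
        counts[emitted] += 1
    return dict(counts)


index = build_index(docs, map_by_state)
print(query(index, "CA"))   # ['user::1', 'user::2']
print(reduce_count(index))  # {'CA': 2, 'IL': 1}
```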
31. Step 4: Begin development
Data Modeling and Document Design
32. Step 5: Deploying to staging and production
• Monitoring the system
  • RESTful interfaces / Easy integration with monitoring tools
• High-availability
  • Replication
  • Failover and Auto-failover
• Always Online – even for maintenance tasks
  • Database upgrades
  • Software (OS) and Hardware upgrades
  • Backup and restore
  • Index building
  • Compaction
35. So are you being impacted by these?
Schema Rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables with very few columns being used by most rows?
• Do you find that your application developers require schema changes frequently due to constantly changing data?
• Are you using your database as a key-value store?
Scalability problems
• Do you periodically need to upgrade systems to more powerful servers and scale up?
• Are you reaching the read / write throughput limit of a single database server?
• Is your server's read / write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
36. Is NoSQL the right choice for you?
Does your application need rich database functionality?
• Multi-document transactions
• Complex security needs – user roles, document level security, authentication, authorization integration
• Complex joins across bucket / collections
• BI integration
• Extreme compression needs
If so, NoSQL may not be the right choice for your application.
38. Market Adoption
Internet companies and enterprises:
• Social Gaming
• Communications
• Ad Networks
• Retail
• Social Networks
• Financial Services
• Online Business
• Health Care Services
• Automotive/Airline
• E-Commerce
• Agriculture
• Online Media
• Content Management
• Consumer Electronics
• Cloud Services
• Business Systems
39. Market Adoption - Customers
[Customer logos: Internet companies and enterprises]
More than 300 customers and 5,000 production deployments worldwide
40. Application Characteristics - Data driven
• Third-party or user-defined structure (Twitter feeds)
• Support for unlimited data growth (viral apps)
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• Variable-length documents
• Sparse data records
• Hierarchical data
Couchbase is a good fit
41. Application Characteristics - Performance driven
• Low latency critical (e.g., 1 millisecond)
• High throughput (e.g., 200,000 ops/sec)
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Read-heavy, mixed, or write-heavy workloads
Couchbase is a good fit
42. Use Case Examples
Web app or use case | Couchbase solution | Example customer
Content and metadata management system | Couchbase document store + Elasticsearch | McGraw-Hill…
Social game or mobile app | Couchbase stores game and player data | Zynga…
Ad targeting | Couchbase stores user information for fast access | AOL…
User profile store | Couchbase Server as a key-value store | TuneWiki…
Session store | Couchbase Server as a key-value store | Concur…
High availability caching tier | Couchbase Server as a memcached tier replacement | Orbitz…
Chat/messaging platform | Couchbase Server | DOCOMO…
44. Couchbase Server 2.0
NoSQL distributed document database for interactive web applications
45. Couchbase Server
Easy scalability: grow the cluster without application changes and without downtime, with a single click
Consistent, high performance: consistent sub-millisecond read and write response times with consistently high throughput
Always on 24x7x365: no downtime for software upgrades, hardware maintenance, etc.
46. Flexible Data Model
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
• No need to worry about the database when changing your application
• Records can have different structures; there is no fixed schema
• Allows painless data model changes for rapid application development
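A short sketch of what "no fixed schema" means in practice, assuming a plain Python dictionary stands in for a bucket: a second document carries a new `PHONES` field, no migration is run, and readers treat the field as optional. The second record and the `phones` helper are hypothetical.

```python
import json

# Existing record, shaped like the JSON example above.
docs = {
    "user::1": {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
                "ZIP": "94040", "CITY": "MV", "STATE": "CA"},
}

# The application starts storing a new attribute: new writes simply include it.
docs["user::2"] = {"ID": 2, "FIRST": "Jane", "LAST": "Doe",
                   "ZIP": "94040", "CITY": "MV", "STATE": "CA",
                   "PHONES": ["650-555-0100"]}

def phones(doc):
    """Readers treat the new field as optional, so old and new records coexist."""
    return doc.get("PHONES", [])

for key, doc in docs.items():
    print(key, json.dumps(doc), phones(doc))
```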
48. Couchbase Server 2.0 Architecture
[Architecture diagram]
Data Manager: Query Engine, Moxi, Memcached, Couchbase EP Engine, storage interface, new persistence layer
Cluster Manager (Erlang/OTP): REST management API/Web UI; vBucket state and replication manager; global singleton supervisor; rebalance orchestrator; configuration manager; node health monitor; process monitor; heartbeat
Ports: 8091 (HTTP: REST management API/Web UI), 8092 (Query API), 11210 (Memcapable 2.0), 11211 (Memcapable 1.0, through Moxi), 4369 (Erlang port mapper), 21100-21199 (distributed Erlang)
50. Couchbase deployment
[Diagram: web application using the Couchbase client library; data flow and cluster management connections]
51. Single node - Couchbase Write Operation
[Diagram: the app server writes Doc 1 to a Couchbase Server node; the doc goes into the managed cache and is then placed in the replication queue (to another node) and in the disk queue (to disk)]
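The write path above can be modeled as a toy class, assuming the default asynchronous behavior in which a write is acknowledged once it is in the managed cache, and disk persistence and replication happen afterwards. `Node`, `flush`, and `replicate_to` are hypothetical names for illustration, not Couchbase internals.

```python
from collections import deque

class Node:
    """Toy model of one node's write path (a sketch, not Couchbase's implementation)."""

    def __init__(self):
        self.cache = {}                  # managed cache (RAM)
        self.replication_queue = deque()
        self.disk_queue = deque()
        self.disk = {}                   # stand-in for persisted storage

    def set(self, key, doc):
        self.cache[key] = doc                      # 1. store in RAM
        self.replication_queue.append((key, doc))  # 2. queue for the replica node
        self.disk_queue.append((key, doc))         # 3. queue for persistence
        return "OK"                                # ack before any disk or replica I/O

    def flush(self):
        """Background flusher: persist all queued writes; returns how many were written."""
        n = 0
        while self.disk_queue:
            key, doc = self.disk_queue.popleft()
            self.disk[key] = doc
            n += 1
        return n

    def replicate_to(self, other):
        """Background replicator: apply queued writes to another node's cache."""
        while self.replication_queue:
            key, doc = self.replication_queue.popleft()
            other.cache[key] = doc

primary, replica = Node(), Node()
print(primary.set("doc::1", {"v": 1}))  # OK (acknowledged while only in RAM)
print(primary.flush())                  # 1
primary.replicate_to(replica)
print(replica.cache["doc::1"])          # {'v': 1}
```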
52. Single node - Couchbase Update Operation
[Diagram: the app server writes an updated Doc 1' into the managed cache, replacing Doc 1; Doc 1' is placed in the replication and disk queues, while the disk still holds the old Doc 1 until the queued write is persisted]
53. Single node - Couchbase Read Operation
[Diagram: the app server issues GET Doc 1, which is served from the managed cache; Doc 1 is also stored on disk]
54. Single node - Couchbase Cache Eviction
[Diagram: the app server writes Docs 2 to 6; all docs are persisted to disk, and as the managed cache fills up, older docs that are already on disk are evicted from the cache]
55. Single node - Couchbase Cache Miss
[Diagram: the app server issues GET Doc 1, which is no longer in the managed cache; the node reads Doc 1 from disk into the cache and returns it]
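The read, eviction, and cache-miss slides can be tied together in one toy model. All assumptions are illustrative: cache capacity is counted in documents, eviction is least-recently-used, and every write is treated as already persisted, so evicting a document never loses data. Couchbase's actual eviction policy differs in detail, and `CachedNode` is a hypothetical name.

```python
from collections import OrderedDict

class CachedNode:
    """Toy read path: serve from the managed cache, evict LRU, fall back to disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> doc, least recently used first
        self.disk = {}

    def set(self, key, doc):
        self.cache[key] = doc
        self.cache.move_to_end(key)
        self.disk[key] = doc        # assume the write has already been flushed
        self._evict()

    def get(self, key):
        if key in self.cache:       # cache hit: served from RAM
            self.cache.move_to_end(key)
            return self.cache[key]
        doc = self.disk.get(key)    # cache miss: read from disk
        if doc is not None:
            self.cache[key] = doc   # bring it back into the cache
            self._evict()
        return doc

    def _evict(self):
        # Drop least-recently-used docs; safe here because every doc is on disk.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

node = CachedNode(capacity=5)
for i in range(1, 7):              # write Docs 1..6; Doc 1 is evicted from RAM
    node.set(f"doc::{i}", {"n": i})
print("doc::1" in node.cache)      # False (still on disk)
print(node.get("doc::1"))          # {'n': 1} (cache miss, fetched from disk)
```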
56. Cluster wide - Basic Operation
[Diagram: App Server 1 and App Server 2, each running the Couchbase client library with a cluster map, send reads, writes, and updates to a Couchbase Server cluster of Servers 1 to 3; each server holds active and replica docs; user-configured replica count = 1]
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  Only one server holds the active copy of a doc at a time
• Client library provides the app with a simple interface to the database
• Cluster map tells the client library which server a doc is on
  The app never needs to know
• App reads, writes, and updates docs
• Multiple app servers can access the same document at the same time
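How a client library turns a key into a server can be sketched with vBuckets: the key is hashed to one of a fixed number of vBuckets, and the cluster map lists the active and replica server for each vBucket. The sketch assumes 1024 vBuckets and a plain CRC32 modulo (real client libraries apply extra bit manipulation to the hash), and it assigns servers round-robin. `vbucket_for`, `make_cluster_map`, and `server_for` are hypothetical helpers.

```python
import zlib

NUM_VBUCKETS = 1024  # usual Couchbase vBucket count; treated as an assumption here

def vbucket_for(key: str) -> int:
    """Hash a document key to a vBucket (simplified CRC32 modulo)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

def make_cluster_map(servers, replicas=1):
    """Assign each vBucket [active, replica, ...] servers, round-robin."""
    n = len(servers)
    if replicas >= n:
        raise ValueError("need more servers than replicas")
    return {
        vb: [servers[(vb + i) % n] for i in range(replicas + 1)]
        for vb in range(NUM_VBUCKETS)
    }

def server_for(cluster_map, key):
    """What the client does on every request: key -> vBucket -> active server."""
    return cluster_map[vbucket_for(key)][0]

cluster_map = make_cluster_map(["server1", "server2", "server3"], replicas=1)
print(server_for(cluster_map, "user::1"))
```

The app only ever calls something like `server_for` indirectly; when the cluster map changes, the same key is routed to its new server without any application change.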
57. Cluster wide - Add Nodes to Cluster
[Diagram: the same two app servers; the cluster grows from Servers 1 to 3 to Servers 1 to 5, with active and replica docs spread across all five; user-configured replica count = 1]
• Two servers added
  One-click operation
• Docs automatically rebalanced across the cluster
  Even distribution of docs
  Minimum doc movement
• Cluster map updated
• App database calls now distributed over a larger number of servers
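Rebalancing with minimum movement can be illustrated on the vBucket map: each server gets a quota, vBuckets stay where they are while their owner is under quota, and only the surplus moves to the new servers. This simplified sketch ignores replicas and data transfer, and `rebalance` is a hypothetical helper. Going from three to five servers with 1024 vBuckets, it moves 409 vBuckets, exactly the number the two new servers must receive.

```python
def rebalance(active_map, servers):
    """Reassign vBuckets to a new server list, moving as few as possible.

    active_map: dict of vBucket -> active server.
    Returns (new_map, moved), where moved lists the vBuckets that changed server.
    """
    n = len(servers)
    base, extra = divmod(len(active_map), n)
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    new_map, moved = {}, []
    for vb in sorted(active_map):
        owner = active_map[vb]
        if quota.get(owner, 0) > 0:  # owner still under quota: keep it in place
            new_map[vb] = owner
            quota[owner] -= 1
        else:                        # owner is full or no longer in the cluster
            moved.append(vb)

    for vb in moved:                 # hand surplus vBuckets to servers with spare quota
        target = next(s for s in servers if quota[s] > 0)
        new_map[vb] = target
        quota[target] -= 1
    return new_map, moved

old_map = {vb: ["server1", "server2", "server3"][vb % 3] for vb in range(1024)}
new_map, moved = rebalance(old_map, ["server1", "server2", "server3", "server4", "server5"])
print(len(moved))  # 409
```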
58. Cluster wide - Fail Over Node
[Diagram: two app servers and a five-server cluster (Servers 1 to 5) with active and replica docs; user-configured replica count = 1]
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects that the server failed
  Promotes replicas of its docs to active
  Updates the cluster map
• Requests for docs now go to the appropriate server
• Typically a rebalance would follow
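Failover amounts to editing the cluster map: for every vBucket whose active copy was on the failed server, the surviving replica becomes active. The sketch assumes a map of vBucket to `[active, replica, ...]` and a hypothetical `fail_over` helper. With a replica count of 1, promoted vBuckets are left with no replica, which is why a rebalance typically follows.

```python
def fail_over(cluster_map, failed):
    """Remove `failed` from every vBucket entry, promoting replicas where needed.

    cluster_map: dict of vBucket -> [active, replica, ...].
    Returns (new_map, lost): lost lists vBuckets with no surviving copy.
    """
    new_map, lost = {}, []
    for vb, servers in cluster_map.items():
        survivors = [s for s in servers if s != failed]
        if not survivors:
            lost.append(vb)  # no replica existed: this vBucket's data is unavailable
            continue
        # If the active copy was on the failed server, the first surviving
        # replica moves to position 0 and becomes active.
        new_map[vb] = survivors
    return new_map, lost

cluster_map = {0: ["server1", "server2"], 1: ["server2", "server3"], 2: ["server3", "server1"]}
new_map, lost = fail_over(cluster_map, "server3")
print(new_map)  # {0: ['server1', 'server2'], 1: ['server2'], 2: ['server1']}
print(lost)     # []
```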
59. Indexing and Querying
[Diagram: two app servers send queries to a three-server cluster; each server holds active and replica docs; user-configured replica count = 1]
• Indexing work is distributed amongst nodes
  Large data sets possible
  Parallelizes the effort
• Each node has an index for the data stored on it
• Queries combine the results from the required nodes
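The scatter-gather pattern behind distributed indexing can be sketched as follows, assuming each node keeps a sorted local index over only its own documents: the query is evaluated against every node's index and the sorted partial results are merged with `heapq.merge`. `build_local_index`, `query`, and the sample data are hypothetical.

```python
import heapq
from itertools import islice

def build_local_index(local_docs, field):
    """Index only the documents stored on this node, as sorted (key, doc_id) rows."""
    return sorted((doc[field], doc_id) for doc_id, doc in local_docs.items() if field in doc)

def query(node_indexes, start, end, limit=None):
    """Scatter a range query to every node, then gather by merging sorted partial results."""
    partials = [
        # On a real cluster this filter would run on each node in parallel.
        [row for row in index if start <= row[0] <= end]
        for index in node_indexes
    ]
    merged = heapq.merge(*partials)      # k-way merge preserves global key order
    return list(islice(merged, limit))   # limit=None returns every row

# Hypothetical documents spread over three nodes.
nodes = [
    {"doc::1": {"age": 31}, "doc::4": {"age": 25}},
    {"doc::2": {"age": 40}, "doc::5": {"age": 28}},
    {"doc::3": {"age": 35}, "doc::6": {"name": "no age field"}},
]
indexes = [build_local_index(n, "age") for n in nodes]
print(query(indexes, 25, 35))
# [(25, 'doc::4'), (28, 'doc::5'), (31, 'doc::1'), (35, 'doc::3')]
```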
60. Cross Data Center Replication (XDCR)
[Diagram: two Couchbase Server clusters, one in a NY data center and one in an SF data center; each has three servers (Servers 1 to 3) holding active docs in RAM and on disk]