The document provides an overview of NoSQL databases. It discusses how NoSQL databases were developed as an alternative to relational databases to address issues of scale, diversity of data types, and large data sizes. It describes some key aspects of NoSQL databases, including their use of eventual consistency, automatic partitioning of large amounts of data, and various data storage models like key-value, columnar, and document-based approaches. Examples of NoSQL databases discussed include DynamoDB, Bigtable, and CouchDB.
NoSQL
1. NoSQL Databases
Yousof Alsatom
Wirtschaftsinformatik Master Program
Humboldt-Universität zu Berlin
2012
2. Agenda
• Relational databases model
  • Advantages & Disadvantages
• NoSQL
  • Basic Concepts, Techniques and Patterns in comparison with RDBMS
    • Consistency
    • Partitioning
    • Storage Layout
3. Agenda
• NoSQL data model
  • Key – Value
    • DynamoDB
  • Bigtable – column family
    • Google Bigtable
  • Document Databases
    • CouchDB
  • GraphDB
    • Neo4j
• Conclusion
4. Database and DBMS
• In essence, a database is a collection of data that exists over a long period of time, often many years.
• Commonly, the term database refers to a collection of data that is managed by a Database Management System (DBMS).
• A DBMS is a (powerful) tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely.
5. Relational Model
• A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. [techtarget.com]
Edgar Frank "Ted" Codd (August 23, 1923 – April 18, 2003), IBM
6. Relational Database
• A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily [Wikipedia].
9. Example, Project Management System [Qian Sha, 2003]
• Possible queries
  • Give me all employees who are working in project X
  • Give me the percentage of progress for project Y
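The two queries above could be expressed in SQL roughly as follows. This is only a sketch: the slide does not show the schema, so the tables (employee, project, assignment, task), their columns, and the sample rows are all assumptions made for illustration, and "progress" is assumed to mean the share of finished tasks.

```python
import sqlite3

# Hypothetical schema and sample data; none of these names come from the slides.
SCHEMA = """
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE assignment (
    employee_id INTEGER REFERENCES employee(id),
    project_id  INTEGER REFERENCES project(id)
);
CREATE TABLE task (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    done       INTEGER NOT NULL  -- 1 = finished, 0 = open
);
INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO project  VALUES (1, 'X'), (2, 'Y');
INSERT INTO assignment VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO task VALUES (1, 2, 1), (2, 2, 1), (3, 2, 1), (4, 2, 0);
"""


def build_demo_db():
    """Create an in-memory database holding the assumed schema and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def employees_on(conn, project_name):
    """Names of all employees assigned to the given project (two joins)."""
    rows = conn.execute(
        """SELECT e.name
           FROM employee e
           JOIN assignment a ON a.employee_id = e.id
           JOIN project p    ON p.id = a.project_id
           WHERE p.name = ?
           ORDER BY e.name""",
        (project_name,),
    )
    return [row[0] for row in rows]


def progress_percent(conn, project_name):
    """Percentage of finished tasks for the project, or None if it has no tasks."""
    row = conn.execute(
        """SELECT 100.0 * SUM(t.done) / COUNT(*)
           FROM task t
           JOIN project p ON p.id = t.project_id
           WHERE p.name = ?""",
        (project_name,),
    ).fetchone()
    return row[0]
```

Both queries rely on joins between tables, which is the operation the later slides describe as hard to distribute across servers.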
10. Relational Database, Advantages
• Reliability
• ACID
  • Atomicity: all or nothing
  • Consistency
  • Isolation
    • Concurrent execution of transactions results in a system state that could have been obtained if transactions are executed serially
  • Durability
    • Means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
11. Relational Database, Limitation
• Scalability
  • Users can scale a relational database by running it on a more powerful (and expensive) computer.
  • To scale beyond a certain point, though, it must be distributed across multiple servers.
  • Relational databases don’t work easily in a distributed manner because joining their tables across a distributed system is difficult. [Jeremy Zawodny]
• Complexity
  • Convert all data into tables: complex, slow (example: Wikipedia)
  • SQL can work only with structured data [Prof. Stefan Edlich, Beuth University of Applied Sciences in Berlin]
15. NoSQL
• Not using the relational model (nor the SQL language)
• No schema, allowing fields to be added to any record without controls
• Open source
• Designed to work on large clusters
• Based on the needs of 21st century web properties
16. NoSQL, History
• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
• Johan Oskarsson has organized a meetup for folks interested in distributed structured data storage and is calling it NoSQL. The event, being held June 11th in San Francisco,
17. NoSQL
• Consistency
  • It uses eventual consistency (a consistency model used in parallel programming).
  • Weakly consistent
• Partitioning
  • Automatic partitioning (data is growing)
• Storage Layout
  • Row-Based Storage Layout
  • Columnar Storage Layout
  • …
18. NoSQL
• Data Model
  • Key / Value
  • Bigtable
  • DocumentDB
  • GraphDB
20. Hash Table
• Type: unsorted associative array
• Invented: 1953
• Time complexity in big O notation:
           Average       Worst case
  Space    O(n)          O(n)
  Search   O(1 + n/k)    O(n)
  Insert   O(1)          O(n)
  Delete   O(1 + n/k)    O(n)
Wikipedia: http://en.wikipedia.org/wiki/Hash_tables
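The complexities in the table above can be illustrated with a small hash table that resolves collisions by separate chaining. This is a minimal sketch rather than a production structure; the class name ChainedHashTable and the fixed bucket count are assumptions made for the example. Here n is the number of stored entries and k the number of buckets, so an average chain holds n/k entries.

```python
class ChainedHashTable:
    """Unsorted associative array using separate chaining (fixed bucket count)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]  # k chains
        self._size = 0                                # n entries

    def _chain(self, key):
        # The hash picks one of the k chains in O(1).
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        # Unlike the O(1) insert in the table, this version first scans the
        # chain so that an existing key is overwritten instead of duplicated.
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self._size += 1

    def search(self, key):
        # Average O(1 + n/k): hash once, then walk one chain.
        for existing, value in self._chain(key):
            if existing == key:
                return value
        raise KeyError(key)

    def delete(self, key):
        chain = self._chain(key)
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                del chain[i]
                self._size -= 1
                return
        raise KeyError(key)

    def __len__(self):
        return self._size
```

The worst case of O(n) arises when every key lands in the same chain, for example with a single bucket.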
21. Key – Value
• The infrastructure is made up of tens of thousands of servers and network components located in many datacenters around the world.
• Availability & reliability are the most important factors for Amazon
• Dynamo aims to achieve high availability with less consistency
Figure: Service-oriented architecture of Amazon’s platform
Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
22. Key – Value, Dynamo History
• Giuseppe DeCandia et al. argue against RDBMSs at Amazon
• They admit that advances have been made to scale and partition RDBMSs but state that such setups remain difficult to configure and operate, 2006
• Dynamo was built in 2007
23. Dynamo, Consistent Hashing
Data is partitioned and replicated using consistent hashing
• Goal: scalability and availability
• The output range of a hash function is treated as a fixed circular space or “ring”
• Ordered (a new node takes a random key)
• Clockwise
• The departure or arrival of a node affects only its neighbors
• Each node becomes responsible for the region in the ring between it and its predecessor node on the ring.
• “Virtual nodes”: each node can be responsible for more than one virtual node.
Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
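The ring described above can be sketched in a few lines. This is an illustrative model, not Dynamo's implementation: the class name HashRing is hypothetical, and ring positions are derived by hashing the node name plus an index (an assumption made to keep the example deterministic), whereas the slide says a new node takes a random position.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing ring with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes=8):
        self.vnodes = vnodes   # virtual nodes per physical node
        self._positions = []   # sorted positions on the ring
        self._owner = {}       # position -> physical node name

    @staticmethod
    def _hash(value):
        # MD5 output interpreted as an integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Assumption: positions come from hashing "<node>#<i>", not from randomness.
        for i in range(self.vnodes):
            position = self._hash(f"{node}#{i}")
            self._owner[position] = node
            bisect.insort(self._positions, position)

    def remove_node(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._positions:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._positions, self._hash(key))
        position = self._positions[index % len(self._positions)]  # wrap around
        return self._owner[position]
```

Adding a node only moves keys onto that node, and removing it hands them back to their previous owners; every other key keeps its owner, which is the "affects only its neighbors" property.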
24. Dynamo, Vector Clock
• Data versioning: Dynamo uses vector clocks in order to capture causality between different versions of the same object.
• A vector clock is a list of (node, counter) pairs.
• Every version of every object is associated with one vector clock.
• If the counters on the first object’s clock are less-than-or-equal to all of the nodes in the second clock, then the first is an ancestor of the second and can be forgotten.
Figure labels: Object, Node, Clock
Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
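The ancestor rule above can be made concrete with a small sketch. A clock is modelled here as a dict from node name to counter, with missing nodes counting as 0; the function names increment, descends and compare are assumptions for this example, not Dynamo's API.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify the causal relation of clock a to clock b."""
    a_before_b = descends(b, a)  # every counter in a is <= the one in b
    b_before_a = descends(a, b)
    if a_before_b and b_before_a:
        return "equal"
    if a_before_b:
        return "ancestor"        # a can be forgotten in favour of b
    if b_before_a:
        return "descendant"
    return "concurrent"          # conflicting versions, need reconciliation
```

Two versions written through different nodes from the same parent compare as "concurrent", so neither can be dropped automatically and the conflict has to be reconciled.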
26. Dynamo, Sloppy Quorum
• Handling failures: sloppy quorum
• A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. [Wikipedia]
• Sloppy quorum
  • Read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.
• Example:
  • A is down …
  • D has metadata
  • When A comes back, D will attempt to deliver the replica to A
Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
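The replica selection in the example above might be sketched as follows. The function name choose_replicas and the (node, hint) return shape are assumptions for illustration: ring_order is the preference list in clockwise order for one key, and a hint names the unavailable node that a stand-in is holding the replica for.

```python
def choose_replicas(ring_order, healthy, n):
    """Pick the first n healthy nodes and attach hints for skipped ones.

    ring_order: nodes in clockwise order starting at the key's position.
    healthy:    set of nodes currently reachable.
    Returns a list of (node, hint) pairs; hint is None for an intended
    replica, or the name of the down node the stand-in covers for.
    """
    intended = ring_order[:n]
    skipped = [node for node in intended if node not in healthy]
    chosen = []
    for node in ring_order:
        if len(chosen) == n:
            break
        if node not in healthy:
            continue
        if node in intended:
            chosen.append((node, None))
        else:
            # Stand-in node: remember whom to hand the replica back to.
            chosen.append((node, skipped.pop(0)))
    return chosen
```

With ring order A, B, C, D and N = 3, a write while A is down goes to B, C and D, and D keeps the hint "A" so it can deliver the replica once A returns.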
27. Dynamo, Gossip-based membership protocol and failure detection
• A gossip-based protocol propagates membership changes and maintains an eventually consistent view of membership.
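A toy simulation can show how such a protocol converges. The model below is an assumption, not Dynamo's wire protocol: each node keeps a view mapping member names to a version number, and in every round each node picks a random peer and the two merge their views by keeping the higher version per member.

```python
import random


def merge(view_a, view_b):
    """Combine two membership views, keeping the newest version per member."""
    merged = dict(view_a)
    for member, version in view_b.items():
        if version > merged.get(member, -1):
            merged[member] = version
    return merged


def gossip_round(views, rng):
    """Each node exchanges its view with one randomly chosen peer."""
    names = sorted(views)
    for name in names:
        peer = rng.choice([other for other in names if other != name])
        merged = merge(views[name], views[peer])
        views[name] = merged
        views[peer] = dict(merged)


def converged(views):
    """True once every node holds the same membership view."""
    snapshots = list(views.values())
    return all(snapshot == snapshots[0] for snapshot in snapshots)
```

Starting with every node knowing only itself, repeated rounds spread each membership entry to all nodes without any central registry, which is the eventually consistent view the slide refers to.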
28. Key – Value, Dynamo
Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates.
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available.
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background.
Membership and failure detection | Gossip-based membership protocol and failure detection. | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information.
Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
29. Key – Value, Dynamo
• Query Model
  • get(key): objects, context
    • Context: metadata such as the object version is stored; it is useful in case of conflict
  • put(key, context, object)
  • The key is hashed by the MD5 algorithm
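The get/put interface above can be sketched with a single-node, in-memory store. This is a deliberately simplified model: the class name TinyDynamoStore is hypothetical, and the context is reduced to a plain integer version (an assumption), whereas in Dynamo it carries version metadata such as vector clock information.

```python
import hashlib


class TinyDynamoStore:
    """In-memory sketch of get(key) / put(key, context, object)."""

    def __init__(self):
        # MD5 digest of the key -> {"version": int, "objects": [...]}
        self._data = {}

    @staticmethod
    def _hash(key):
        # The key is hashed with MD5, as stated on the slide.
        return hashlib.md5(key.encode("utf-8")).hexdigest()

    def get(self, key):
        """Return (objects, context); several objects mean a conflict."""
        entry = self._data.get(self._hash(key))
        if entry is None:
            return [], None
        return list(entry["objects"]), entry["version"]

    def put(self, key, context, obj):
        """Store obj; a stale context keeps the existing versions as siblings."""
        digest = self._hash(key)
        entry = self._data.get(digest)
        if entry is None:
            self._data[digest] = {"version": 1, "objects": [obj]}
        elif context == entry["version"]:
            # The writer saw the latest state, so its object supersedes it.
            entry["objects"] = [obj]
            entry["version"] += 1
        else:
            # The writer worked from an old read: keep both versions.
            entry["objects"].append(obj)
            entry["version"] += 1
```

A client that receives more than one object from get is expected to merge them and write the result back with the returned context, which collapses the siblings into one version again.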
30. Other Key / Value NoSQL tools
Riak makes data highly available for use in read and write-intensive web applications.
32. Bigtable
• Bigtable is described as “a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers” [Google Labs]
• Bigtable
  • Distributed, persistent multi-dimensional sorted map.
  • The map is indexed by a row key, column key, and a timestamp
  • Each value in the map is an uninterpreted array of bytes.
  • (row:string, column:string, time:int64) → string
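The mapping (row:string, column:string, time:int64) → string can be modelled with a small in-memory structure. This is only a sketch of the data model: the class name TinyBigtable and its methods are assumptions, rows are sorted at scan time rather than stored in sorted files, and nothing here is distributed or persistent.

```python
class TinyBigtable:
    """In-memory model of a map indexed by (row, column, timestamp)."""

    def __init__(self):
        # (row key, column key) -> {timestamp: value as bytes}
        self._cells = {}

    def put(self, row, column, timestamp, value):
        """Store an uninterpreted byte string under the three-part index."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at a timestamp, or the newest one if omitted."""
        versions = self._cells[(row, column)]
        if timestamp is None:
            timestamp = max(versions)
        return versions[timestamp]

    def scan(self, row_prefix):
        """Yield the newest cell of each column for rows with the prefix,
        in lexicographic order of (row, column)."""
        for row, column in sorted(self._cells):
            if row.startswith(row_prefix):
                versions = self._cells[(row, column)]
                newest = max(versions)
                yield row, column, newest, versions[newest]
```

Because rows are visited in lexicographic order, a prefix scan over a reversed domain name such as "com.cnn" touches one contiguous range of rows.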
33. Google’s Bigtable
• It is used by over sixty projects at Google as of 2006:
  • Web indexing
  • Google Earth
  • Google Analytics
  • Orkut
  • Google Docs
34. Google’s Bigtable, Data Model
• Store CNN Web pages
• Row name is the reversed URL
• Contents column family contains the page contents
• Anchor column family contains the text of any anchors that reference the page
Figure labels: Row, Column Family
A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
35. Google’s Bigtable, Data Model
• CNN’s home page is referenced by both the Sports Illustrated and the MY-look home pages.
• The row contains columns named anchor:cnnsi.com and anchor:my.look.ca.
• t3: timestamp
Figure labels: Row, Column Family
A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
36. Google’s Bigtable, Data Model
Tablet, rows from same domain
Com.google.docs
Com.google.mail
Com.google.play
Tablet, lexicographic order
37. Google’s Bigtable, Data Model
• Notes
  • Has no fixed number of rows or columns
  • Every value also has an associated timestamp
  • Each value is addressed by the triple (domain-name, column-name, timestamp)
40. Google’s Bigtable, More
• Example with Eclipse: http://www.kobu.com/appeng/index-en.htm
• Bigtable as a web service: http://bigtable.appspot.com/
• Performance and benchmarking: Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
41. Other Bigtable NoSQL tools
Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables
43. Document Databases
• Storing, retrieving, and managing document-oriented, or semi-structured, data and information
• Documents encapsulate and encode data (or information) in some standard formats or encodings.
• Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
Wikipedia: http://en.wikipedia.org/wiki/Document-oriented_database
44. CouchDB
• Distributed database system
• Previously, each document was saved as XML
• JavaScript functions (JSON for serialization) select and aggregate documents
• Current release: 1.2 (April 2012)
• Started in 2005
• Initiated by Damien Katz
45. CouchDB, Overview
• Implemented in Erlang
• Erlang
  • Functional language
  • It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications.
  • Code example (factorial; the base clause fac(0) terminates the recursion):
      fac(0) -> 1;
      fac(N) when N > 0, is_integer(N) -> N * fac(N-1).
46. CouchDB, Overview
• Documents consist of named fields
  • Each field has a key/name and a value.
  • A field name has to be unique within a document
  • A value may be a string (of arbitrary length), number, boolean, date, an ordered list or an associative map; a document could refer to another document
• Example, wiki article (document):
    "Title": "CouchDB",
    "Last editor": "172.5.123.91",
    "Last modified": "9/23/2010",
    "Categories": ["Database", "NoSQL", "Document Database"],
    "Body": "CouchDB is a ...",
    "Reviewed": false
47. CouchDB, Overview
• Each document has an id: 128 bit value
• Version number: 32 bit value
• B-trees index the documents (id, version, some metadata)
48. CouchDB
• CouchDB uses a B-tree storage engine for all internal data, documents, and views.
• Views built with MapReduce return a key or a key range; lookup complexity is O(log N)
Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
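The O(log N) key and range lookup can be illustrated with a binary search over sorted keys. This is a rough sketch, not a B-tree and not CouchDB's implementation: a sorted Python list stands in for the index, and the third and fourth entries are hypothetical sample data.

```python
from bisect import bisect_left, bisect_right

# Sorted view keys (dates) with their values, kept in key order like an index.
keys = ["2009/01/15", "2009/02/17", "2009/03/02", "2009/04/11"]
values = ["Hello World", "Bought a Cat", "Third post", "Fourth post"]

def lookup_range(start_key, end_key):
    """Return the values whose key lies in [start_key, end_key]."""
    lo = bisect_left(keys, start_key)   # binary search: O(log N)
    hi = bisect_right(keys, end_key)    # binary search: O(log N)
    return values[lo:hi]                # copying k results costs O(k)

in_range = lookup_range("2009/02/01", "2009/03/31")
```

Finding the two boundaries takes logarithmic time; only the matching rows are then read.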
49. CouchDB, Revisions
• To change a field in a specific document:
• Load the document
• Change it in the JSON, or in the corresponding object in your programming language
• To update or delete a document, CouchDB expects you to include its _rev
• When CouchDB confirms the change, it generates a new _rev
• This revision system is also called Multi-Version Concurrency Control (MVCC)
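The load, change, and save-with-_rev cycle can be sketched with a small in-memory store. This is a simplified illustration under stated assumptions: TinyStore, ConflictError, and the revision format (a counter plus a random suffix) are invented for the example and do not reproduce CouchDB's actual revision scheme.

```python
import uuid

class ConflictError(Exception):
    """Raised when an update carries a stale or missing _rev."""

class TinyStore:
    """Hypothetical in-memory stand-in for a document store with _rev checks."""

    def __init__(self):
        self.docs = {}

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        # An existing document may only be replaced by someone holding its latest _rev.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError("stale _rev for " + doc["_id"])
        number = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        stored = dict(doc, _rev=f"{number}-{uuid.uuid4().hex[:8]}")
        self.docs[doc["_id"]] = stored
        return stored["_rev"]

    def get(self, doc_id):
        return dict(self.docs[doc_id])  # return a copy, like loading the document

store = TinyStore()
rev1 = store.put({"_id": "hello-world", "title": "Hello World"})
doc = store.get("hello-world")          # load
doc["title"] = "Hello again"            # change
rev2 = store.put(doc)                   # save with the current _rev
```

A writer that still holds rev1 gets a ConflictError instead of silently overwriting the newer version, which is the point of the _rev check.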
50. CouchDB, Locking Mechanism
• Multi-Version Concurrency Control (MVCC)
• Documents in CouchDB are versioned much like files in a version control system such as Subversion
Source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater
51. CouchDB, Views
{
  "_id":"hello-world",
  "_rev":"43FBA4E7AB",
  "title":"Hello World",
  "body":"Well hello and welcome to my new blog...",
  "date":"2009/01/15 15:52:20"
}
{
  "_id":"bought-a-cat",
  "_rev":"4A3BBEE711",
  "title":"Bought a Cat",
  "body":"I went to the pet store earlier and brought home a little kitty...",
  "date":"2009/02/17 21:13:39"
}
function(doc) {
  if(doc.date && doc.title) {
    emit(doc.date, doc.title);
  }
}
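What the map function does to the two documents can be simulated in a few lines. This is a Python rendering for illustration only, not how CouchDB executes views; map_doc and build_view are hypothetical names, and the third document is an invented example of one the view skips.

```python
docs = [
    {"_id": "hello-world", "title": "Hello World", "date": "2009/01/15 15:52:20"},
    {"_id": "bought-a-cat", "title": "Bought a Cat", "date": "2009/02/17 21:13:39"},
    {"_id": "no-date", "title": "Draft without a date"},  # hypothetical extra document
]

def map_doc(doc):
    """Python version of the slide's map function: emit (date, title) pairs."""
    if doc.get("date") and doc.get("title"):
        yield doc["date"], doc["title"]

def build_view(documents):
    """Run the map function over every document and sort the rows by key."""
    rows = [(key, value, doc["_id"])
            for doc in documents
            for key, value in map_doc(doc)]
    return sorted(rows)

view = build_view(docs)
```

Because the rows are sorted by the emitted key (the date), the view can answer "posts between two dates" without scanning every document.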
52. CouchDB, Attachment
• CouchDB documents can have attachments just like an email message can have attachments.
• An attachment is identified by
• Name
• MIME type (or Content-Type), any data
• Number of bytes the attachment contains.
• Example:
• curl -vX PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689 --data-binary @artwork.jpg -H "Content-Type: image/jpg"
• Retrieve attachment:
• http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg
53. CouchDB, Replication
• CouchDB replication is a mechanism to synchronize databases.
• Replication synchronizes two databases locally or remotely.
54. CouchDB, Replication
• Create the target database (it is not automatic)
• curl -X PUT http://127.0.0.1:5984/albums-replica
• Perform replication:
• curl -vX POST http://127.0.0.1:5984/_replicate -d '{"source":"albums","target":"albums-replica"}'
• What we did is local replication; it is useful for backups or for rolling back to an earlier state
• It is important to note that replication replicates the database only as it was at the point in time when replication was started.
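The replication call above can also be prepared from a program. The sketch below only builds the POST request with Python's standard urllib and does not send it; build_replication_request is a hypothetical helper, and the explicit Content-Type header is an assumption (the curl command on the slide does not set one, but some server versions may expect JSON to be declared).

```python
import json
import urllib.request

def build_replication_request(source, target, base_url="http://127.0.0.1:5984"):
    """Build (but do not send) the POST /_replicate request shown on the slide."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )

request = build_replication_request("albums", "albums-replica")
# urllib.request.urlopen(request) would send it to a running CouchDB server.
```

Keeping construction separate from sending makes the request easy to inspect before anything touches the database.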
55. Other Document Database tools
• MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database. Written in C++,
57. Graph Databases
• A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary [Wikipedia].
58. Graph Databases
Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, Publication date: February 2008. RENZO ANGLES and CLAUDIO GUTIERREZ, University of Chile
59. Graph Databases, Data model properties
• Graph databases are often faster for associative data sets
• They scale more naturally to large data sets, as they do not typically require expensive join operations.
• As they depend less on a rigid schema, they are more suitable for managing ad-hoc and changing data with evolving schemas.
• Graph databases are a powerful tool for graph-like queries
• Computing the shortest path between two nodes in the graph.
• Other graph-like queries can be performed over a graph database in a natural way (for example, graph diameter computations or community detection).
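The shortest-path query mentioned above can be sketched with a breadth-first search over an adjacency list. This is a generic textbook technique for unweighted graphs, shown in Python for illustration; it is not the algorithm of any particular graph database, and the sample graph is hypothetical.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Each node holds direct references to its neighbours, so no join is needed.
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Hypothetical directed graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
path = shortest_path(graph, "A", "F")
```

Each step follows the neighbour list of the current node directly, which mirrors the index-free adjacency idea.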
60. Graph Databases, Neo4j
• Neo4j is an open-source graph database, implemented in Java.
• The developers describe Neo4j as "embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables".
• Neo4j version 1.0 was released in February 2010.
• Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden.
61. Neo4j, Node & Relation
• A Graph contains Nodes and Relationships
• “A Graph —records data in→ Nodes —which have→ Properties”
• “Nodes —are organized by→ Relationships —which also have→ Properties”
62. Neo4j, Traversal
• Query a Graph with a Traversal
• “A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes”
• A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like “what music do my friends like that I don’t yet own,” or “if this power supply goes down, what web services are affected?”
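The "what music do my friends like that I don't yet own" question can be sketched as a two-step traversal. This is a plain Python illustration, not the Neo4j API; the toy graph, the relationship names (FRIEND, LIKES, OWNS), and the function name are all assumptions made for the example.

```python
# Hypothetical toy graph: each node maps relationship types to neighbour nodes.
graph = {
    "me":  {"FRIEND": ["ann", "bob"], "OWNS": ["album-1"]},
    "ann": {"FRIEND": ["me"], "LIKES": ["album-1", "album-2"]},
    "bob": {"FRIEND": ["me"], "LIKES": ["album-2", "album-3"]},
}

def music_friends_like(graph, person):
    """Follow FRIEND, then LIKES relationships, skipping albums the person owns."""
    owned = set(graph[person].get("OWNS", []))
    found = set()
    for friend in graph[person].get("FRIEND", []):
        for album in graph.get(friend, {}).get("LIKES", []):
            if album not in owned:
                found.add(album)
    return sorted(found)

suggestions = music_friends_like(graph, "me")
```

The traversal starts at one node and only touches its neighbourhood, so its cost depends on the number of friends and likes rather than on the size of the whole graph.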
63. Neo4j, Indexes
• Indexes look up Nodes or Relationships
• “An Index —maps from→ Properties —to either→ Nodes or Relationships”
• Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like “find the Account for username master-of-graphs.”
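An index that maps a property value to nodes can be sketched with a dictionary. This is an illustration in plain Python and not Neo4j's indexing API; build_index and the second sample node are hypothetical.

```python
nodes = [
    {"id": 1, "type": "Account", "username": "master-of-graphs"},
    {"id": 2, "type": "Account", "username": "graph-novice"},  # hypothetical node
]

def build_index(nodes, prop):
    """Map each value of a property to the list of nodes that carry it."""
    index = {}
    for node in nodes:
        if prop in node:
            index.setdefault(node[prop], []).append(node)
    return index

username_index = build_index(nodes, "username")
# One dictionary look-up replaces a scan over every node in the graph.
matches = username_index.get("master-of-graphs", [])
```

The index answers the "find the Account for username master-of-graphs" question directly; a traversal can then continue from the node it returns.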
64. Neo4j, Database
• Neo4j is a Graph Database
• “A Graph Database —manages a→ Graph and —also manages related→ Indexes”
69. NoSQL, BASE
• NoSQL is characterized by BASE:
• Basically Available: Use replication to reduce the likelihood of data unavailability, and use sharding (partitioning the data among many different storage servers) to make any remaining failures partial. The result is a system that is always available, even if subsets of the data become unavailable for short periods of time.
• Soft state: While ACID systems assume that data consistency is a hard requirement, NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers.
• Eventually consistent: Although applications must cope with data that is inconsistent at any given instant, NoSQL systems ensure that at some future point in time the data assumes a consistent state. In contrast to ACID systems that enforce consistency at transaction commit, NoSQL guarantees consistency only at some undefined future time.
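Soft state and eventual consistency can be illustrated with two replicas that synchronize later. This is a toy sketch under stated assumptions: the Replica class, the sync function, and the last-write-wins rule based on timestamps are chosen for the example; real systems use a variety of conflict-resolution strategies.

```python
class Replica:
    """One copy of the data; each key holds a (timestamp, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(a, b):
    """Exchange entries so both replicas keep the newest timestamp per key."""
    for key in set(a.data) | set(b.data):
        candidates = [r.data[key] for r in (a, b) if key in r.data]
        newest = max(candidates, key=lambda entry: entry[0])
        a.data[key] = b.data[key] = newest

r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)
sync(r1, r2)
r1.write("cart", ["book", "pen"], timestamp=2)
stale = r2.read("cart")   # r2 has not seen the second write yet (soft state)
sync(r1, r2)
fresh = r2.read("cart")   # after synchronizing, both replicas agree
```

Between the second write and the second sync, a reader of r2 sees old data; the application has to tolerate that window.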
70. ACID vs. BASE
noSQL Databases, Prof. Walter Kriha, Stuttgart Media University
71. Statistics
• The worldwide NoSQL market is expected to reach $3.4 Billion by 2018 at a CAGR of 21% between 2013 and 2018. The NoSQL market will generate $14 Billion in revenues over the period 2013-2018.
• CAGR: Compound annual growth rate
• V(t0): start value, V(tn): finish value,
• tn - t0: number of years.
Resource: http://www.marketresearchmedia.com/2010/11/11/nosql-market/
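Using the symbols defined above, the standard definition of the compound annual growth rate is CAGR = (V(tn) / V(t0)) ^ (1 / (tn - t0)) - 1. The sketch below expresses it in Python; the function names and the sample numbers are illustrative assumptions and are not taken from the market report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (V(tn) / V(t0)) ** (1 / (tn - t0)) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value, rate, years):
    """Value reached after compounding at a constant annual rate."""
    return start_value * (1.0 + rate) ** years

# Hypothetical example: a value growing from 100 to 121 over 2 years.
rate = cagr(100.0, 121.0, 2)
```

The two functions are inverses of each other: projecting the start value forward at the computed rate reproduces the finish value.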
72. When to USE?
[Figure: Key-Value, Bigtable, Doc-DB, and GraphDB stores plotted by Size against Complexity. From neo4j]
73. When to USE?
http://paolodedios.com/blog/2010/5/19/the-visual-guide-to-nosql-systems.html
77. Papers
1. DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan; Kakulapati, Gunavardhan; Lakshman, Avinash; Pilchin, Alex; Sivasubramanian, Swaminathan; Vosshall, Peter; Vogels, Werner: Dynamo: Amazon’s Highly Available Key-value Store. September 2007.
2. Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Mike; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E.: Bigtable: A Distributed Storage System for Structured Data. November 2006. http://labs.google.com/papers/bigtable-osdi06.pdf
3. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data, 2006
4. RENZO ANGLES and CLAUDIO GUTIERREZ, University of Chile: Survey of Graph Database Models, ACM Computing Surveys, Vol. 40, No. 1, Article 1, Publication date: February 2008.
78. Papers
5. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark, D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S. Gómez-Villamor, N. Martínez-Bazán, and J.L. Larriba-Pey, Universitat Politècnica de Catalunya, 2010
6. Chad Vicknair, Michael Macias: A Comparison of a Graph Database and a Relational Database, A Data Provenance Perspective, ACMSE ’10, April 15-17, 2010, Oxford, MS, USA
7. Bradford Stephens. HBase vs. Cassandra: NoSQL Battle!, 2009. http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/comment-page-1/, last accessed on February 2011.
8. ON-LINE PROJECT MANAGEMENT SYSTEM, Qian Sha, Bachelor of Economics, Capital University of Economics and Business, 2003
Will NoSQL Databases Live Up to Their Promise? Neal Leavitt, 2010
79. Papers
9. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., and Lewin, D. 1997. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (El Paso, Texas, United States, May 04-06, 1997). STOC '97. ACM Press, New York, NY, 654-663.
10. Lamport, L. Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7), pp. 558-565, 1978.
11. André Allavena, Alan Demers, John E. Hopcroft: Correctness of a Gossip Based Membership Protocol, NY 2005, ACM 1-58113-994-2/05/0007
80. Resources, Web link
• Introduction data structure for GraphDB, Shunya Kimura: http://www.slideshare.net/skimura/graphdatabase-data-structure
• Compare nosql database: http://nosql.findthebest.com/
• Oracle White paper Sep. 2011, Oracle NoSQL Database
• CouchDB: http://www.couchbase.com/
• Open Source implementation of Big Table: HBase, http://hbase.apache.org/
• http://www.db-class.org/course/video/preview_list (Stanford university)
• http://technirvanaa.wordpress.com/tag/nosql-disadvantages/ (March 2011)
• http://www.kavistechnology.com/blog/?p=1577 (March 2010)
• http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012 (Survey 2012)
• http://www.couchbase.com/why-nosql/nosql-database
• Couch DB wiki: http://wiki.apache.org/couchdb/
• http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (Very good)
• http://neo4j.org/
• http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
• Neo4j documentation: http://components.neo4j.org/neo4j/1.8.M05/apidocs/
• SQL Databases v. noSQL Databases, Michael Stonebraker, MIT, 2010
81. Do you want to know more?
• What The Heck Are You Actually Using Nosql For? http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
• Nice Tutorials for couchDB: http://couchapp.org/page/videos
82. CouchDB, Example
• Download CouchDB from: http://couchdb.apache.org/
• Example source: CouchDB: The Definitive Guide, O'Reilly, Anderson, Lehnardt & Slater (http://guide.couchdb.org/draft/tour.html#figure/4)
• GO -> http://127.0.0.1:5984/