This document provides an introduction to NoSQL and data scalability techniques. It discusses key concepts like data scalability and horizontal scaling. The two main architectures for scalable data systems are data grids and NoSQL systems. Example NoSQL implementations discussed include MongoDB, which uses a document-oriented data model, and GigaSpaces XAP, a data grid platform. The document provides examples of using MongoDB with Python and discusses some advantages and disadvantages of MongoDB compared to SQL databases.
Imran Sarwar Bajwa, S. Irfan Hyder [2005], "PCA Based Image Classification of Single-Layered Cloud Types", in 1st IEEE International Conference on Emerging Technologies (ICET 2005), Islamabad, Pakistan, Jan 2005, pp:365-369
Imran Sarwar Bajwa, S. Irfan Hyder [2005], "PCA Based Image Classification of Single-Layered Cloud Types", in 1st IEEE International Conference on Emerging Technologies (ICET 2005), Islamabad, Pakistan, Jan 2005, pp:365-369
How to Develop True Distributed Simulations? HLA & DDS InteroperabilityJose Carlos Diaz
Nowadays there are two publish / subscribe based communications standards suitable for the simulation market: IEEE HLA (High Level Architecture), developed in origin by the US DMSO and OMG DDS (Data Distribution Service), developed by a group of companies with support of OMG organization. HLA is focused in interoperability and reusability of simulations and DDS is a more general standard, focused on real time systems. This paper presents NCWare, a new software abstraction layer developed by NEXTEL ENGINEERING that combines HLA and DDS standards providing interoperability between them.
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...IDES Editor
Clustering is an important research area for
mobile ad hoc networks (MANETs) as it increases the
capacity of network, reduces the routing overhead and
makes the network more scalable in the presence of both
high mobility and a large number of mobile nodes. In
clustering the clusterhead manage and store recent routing
information. However the frequent change of clusterhead
leads to loss of routing information stored, changes the route
between two nodes, affects the performance of the routing
protocol and makes the cluster structure unstable.
Communication overhead in terms of exchanging messages
is needed to elect a new clusterhead. The goal then would be
to keep the clusterhead change as least as possible to make
cluster structure more stable, to prevent loss of routing
information which in turn improve the performance of
routing protocol based on clustering. This can be achieved
by an efficient cluster maintenance scheme. In this work, a
novel clustering algorithm, namely Incremental
Maintenance Clustering Scheme (IMS) is proposed for
Mobile Ad Hoc Networks. The goals are yielding low
number of clusterhead and clustermember changes,
maintaining stable clusters, minimizing the number of
clustering overhead. Through simulations the performance
of IMS is compared with that of least cluster change (LCC)
and maintenance scheme of Cluster Based Routing Protocol
(CBRP) in terms of the number of clusterhead changes,
number of cluster-member changes and clustering overhead
by varying mobility and speed. The simulation results
demonstrate the superiority of IMS over LCC and
maintenance scheme of CBRP.
Today’s data centers are being asked to do more with little additional financial support forthcoming. They are under a constant barrage to do things faster, more inexpensively and with no disruption to the revenue-‐generating part of the company. Frequently, they are experiencing 24 X 7 operations, numerous new application deployments and explosive data growth. Data storage often becomes a crucial limiting factor to meeting these stringent demands.
Faced with these seemingly insurmountable challenges, CIO’s are discovering that the old way of responding to application and data growth is unacceptable. That is, the outdated “rip and replace” method to improve capacity and IO performance, which often meant disruptive migration, doesn’t work anymore. Instead, a more nimble alternative is needed. In response to this need for storage agility, NetApp has recently released a new version of their storage operating environment called Data ONTAP® 8.1. This new software update successfully eliminates many problems of typical monolithic or legacy storage systems.
Manage rising disk prices with storage virtualization webinarHitachi Vantara
Learn how storage virtualization can reclaim existing storage on the floor. Extend thin provisioning to existing storage to increase disk utilization and defer capital purchases. Take advantage of zero reclaim and write same to reclaim storage reclamation.
A new paradigm that accommodates and exploits SDN, Network Analytics (NA) and AI, and provides use cases that illustrate its applicability and benefits
How to Develop True Distributed Simulations? HLA & DDS InteroperabilityJose Carlos Diaz
Nowadays there are two publish / subscribe based communications standards suitable for the simulation market: IEEE HLA (High Level Architecture), developed in origin by the US DMSO and OMG DDS (Data Distribution Service), developed by a group of companies with support of OMG organization. HLA is focused in interoperability and reusability of simulations and DDS is a more general standard, focused on real time systems. This paper presents NCWare, a new software abstraction layer developed by NEXTEL ENGINEERING that combines HLA and DDS standards providing interoperability between them.
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...IDES Editor
Clustering is an important research area for
mobile ad hoc networks (MANETs) as it increases the
capacity of network, reduces the routing overhead and
makes the network more scalable in the presence of both
high mobility and a large number of mobile nodes. In
clustering the clusterhead manage and store recent routing
information. However the frequent change of clusterhead
leads to loss of routing information stored, changes the route
between two nodes, affects the performance of the routing
protocol and makes the cluster structure unstable.
Communication overhead in terms of exchanging messages
is needed to elect a new clusterhead. The goal then would be
to keep the clusterhead change as least as possible to make
cluster structure more stable, to prevent loss of routing
information which in turn improve the performance of
routing protocol based on clustering. This can be achieved
by an efficient cluster maintenance scheme. In this work, a
novel clustering algorithm, namely Incremental
Maintenance Clustering Scheme (IMS) is proposed for
Mobile Ad Hoc Networks. The goals are yielding low
number of clusterhead and clustermember changes,
maintaining stable clusters, minimizing the number of
clustering overhead. Through simulations the performance
of IMS is compared with that of least cluster change (LCC)
and maintenance scheme of Cluster Based Routing Protocol
(CBRP) in terms of the number of clusterhead changes,
number of cluster-member changes and clustering overhead
by varying mobility and speed. The simulation results
demonstrate the superiority of IMS over LCC and
maintenance scheme of CBRP.
Today’s data centers are being asked to do more with little additional financial support forthcoming. They are under a constant barrage to do things faster, more inexpensively and with no disruption to the revenue-‐generating part of the company. Frequently, they are experiencing 24 X 7 operations, numerous new application deployments and explosive data growth. Data storage often becomes a crucial limiting factor to meeting these stringent demands.
Faced with these seemingly insurmountable challenges, CIO’s are discovering that the old way of responding to application and data growth is unacceptable. That is, the outdated “rip and replace” method to improve capacity and IO performance, which often meant disruptive migration, doesn’t work anymore. Instead, a more nimble alternative is needed. In response to this need for storage agility, NetApp has recently released a new version of their storage operating environment called Data ONTAP® 8.1. This new software update successfully eliminates many problems of typical monolithic or legacy storage systems.
Manage rising disk prices with storage virtualization webinarHitachi Vantara
Learn how storage virtualization can reclaim existing storage on the floor. Extend thin provisioning to existing storage to increase disk utilization and defer capital purchases. Take advantage of zero reclaim and write same to reclaim storage reclamation.
A new paradigm that accommodates and exploits SDN, Network Analytics (NA) and AI, and provides use cases that illustrate its applicability and benefits
Presentation on an overview of LinkedIn data driven products and infrastructure given on 26 Oct 2012 in the big-data symposium given in honor of the retirement of my PhD advisor Dr Martin H. Schultz.
I Am A Red Raider: A Marketing Campaign Designed with Engagement in MindAllison Matherly
In many instances, social media is an after thought when creating a marketing campaign, leading to messaging that isn't a perfect fit. When creating a new marketing campaign, the marketing team at Texas Tech University decided to use a concept that would inherently lead to social media engagement through both traditional and digital marketing tactics. Texas Tech chose to tell individual stories of faculty, staff, students and alumni, while also accepting submissions from readers of their stories to continue telling the story of being a Red Raider. In addition, many departments and areas of the university were able to use the messaging to continue the campaign, including a natural translation for new students and athletics, different areas supporting current students, and the fact that alumni will always be Red Raiders. The results from this campaign have been staggering, with national press from two digital billboards in Times Square and the announcement of it on a variety of social media channels, to the sheer number of times the #IAmARedRaider hashtag has been used, and the stories being told through it, the campaign has been extremely successful. Thanks to the social media component, this campaign has worked its way into daily interaction with Texas Tech.
Irrigation with municipal waste water is a suitable disposal option in all regions where additional moisture can be effectively utilized for improved crop production. Waste water loading is to be based on the consumptive water use of the crop being grown. The primary objective should be enhancement of crop production. The root zone of productive soils can often serve as one of the most active media for the decomposition, immobilization, or utilization of wastes.
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...DINESH GARG
Beri Udyog (FIELDKING)
Manufacture and exporter of world class farm Implements. we are exportoing to 75 countries and at 500 domestic sales point.
FIELDKING is one stop shop for fulfilling need of farm equipment business and farmer.
Curious about how to take that first step from CCNP to Programmer? Hey guess what, you don't even have to make an entire career out of programming, you can use it to AUGMENT your existing job responsibilities and make your life easier. This deck won't teach you everything you need to know, but it should help making that first step a little bit easier.
Dynamo Systems - QCon SF 2012 PresentationShanley Kane
A look at Dynamo-based systems: the architectural principles, use cases and requirements; where they differ from relational databases; and where they are going.
In this paper we describe NoSQL, a series of non-relational database
technologies and products developed to address the current problems the
RDMS system are facing: lack of true scalability, poor performance on high
data volumes and low availability. Some of these products have already been
involved in production and they perform very well: Amazon’s Dynamo,
Google’s Bigtable, Cassandra, etc. Also we provide a view on how these
systems influence the applications development in the social and semantic Web
sphere.
In this paper we describe NoSQL, a series of non-relational database technologies and products developed to address the current problems the RDMS system are facing: lack of true scalability, poor performance on high data volumes and low availability. Some of these products have already been involved in production and they perform very well: Amazon’s Dynamo, Google’s Bigtable, Cassandra, etc. Also we provide a view on how these systems influence the applications development in the social and semantic Web sphere.
In this paper we introduce the MADlib project, including the background that led to its beginnings, and the motivation for its open source nature. We provide an overview of the library’s architecture and design patterns, and provide a description of various statistical methods in that context.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Neuro-symbolic is not enough, we need neuro-*semantic*
No sql and data scalability
1. #105
Get More Refcardz! Visit refcardz.com
Getting Started with
CONTENTS INCLUDE:
n
Introduction
n
Scalable Data Architecture
NoSQL and Data Scalability
n
Is NoSQL For You?
n
mongoDB
n
GigaSpaces XAP
Google App Engine Datastore and more...
By Eugene Ciurana
n
Implementations of either often share characteristics from the
INTRODUCTION other.
The DZone Refcard #43 is an introduction to system high Data Grids
availability and scalability terminology and techniques Data grids process workloads defined as independent jobs
(http://refcardz.dzone.com/refcardz/scalability). The next that don’t require data sharing among processes. Storage
logical step is the scalable handling of massive data volumes or network may be shared across all nodes of the grid, but
resulting from having these powerful processing capabilities. intermediate results have no bearing on other jobs progress or
on other nodes in the grid, such as a MapReduce cluster.
This Refcard demystifies NoSQL and data scalability
techniques by introducing some core concepts. It also offers
an overview of current technologies available in this domain
Consumer
and suggests how to apply them.
What is Data Scalability?
Master
Data scalability is the ability of a system to store, manipulate,
analyze, and otherwise process ever increasing amounts of
www.dzone.com
data without reducing overall system availability, performance, Load Balancer
or throughput.
Data scalability is achieved by a combination of more powerful
Node Node Node Node
processing capabilities and larger but efficient storage
mechanisms.
Relational and hierarchical databases scale up by adding more Load Balancer
processors, more storage, caching systems, and such. Soon
they hit either a cost or a practical scalability limit because
they are difficult or impossible to scale out. These database Node Node Node Node
management systems are designed as single units that must
maintain data integrity, and enforce schema rules to guarantee Figure 1 - Data Grid
it. This rigidity is what promotes upward, but not outward,
Getting Started with NoSQL and Data Scalability
scalability. Data grids expose their functionality through a single API
(either a Web service or native to the application programming
Oracle RAC is a cluster of multiple computers with language) that abstracts its topology and implementation from
access to a common database. This is considered its data processing consumer.
Hot only vertically scalable because the processing
Tip (usually in the form of stored procedures) may scale
out, but the shared storage facilities don’t scale with
the cluster.
Get over 90 DZone Refcardz
Data integrity and schemas are suited for handling
transactional, normalized, uniform data. They handle FREE from Refcardz.com!
unstructured or rapidly evolving data structures with difficulty
or exponentially larger costs.
Hot Data replication is not the same as data scalability!
Tip
SCALABLE DATA ARCHITECTURES
There are two general kinds of architectures used for building
scalable data systems: data grids and NoSQL systems.
DZone, Inc. | www.dzone.com
2. 2
Getting Started with NoSQL and Data Scalability
Areas of Application NoSQL vs RDBMS vs OO Analogies
• Financial modeling NoSQL RDBMS OO
• Data mining
Kind Table Class
• Click stream analytics
Entity Record Object
• Document clustering
• Distributed sorting or grepping Attribute Column Property
• Simulations
• Inverted index construction
IS NoSQL FOR YOU?
• Protein folding
NoSQL Preparation:
NoSQL describes a horizontally scalable, non-relational
• Don’t fall prey to the NoSQL fad
database with built-in replication support. Applications
• Don’t be stubborn; neither NoSQL nor traditional
interact with it through a simple API, and the data is stored in a
databases apply to all cases
“schema-free”, flat addressing repository, usually as large files
• Apply the CAP Theorem to your use cases to
or data blocks. The repository is often a custom file system
determine feasibility
designed to support NoSQL operations.
Brewer’s (CAP) Theorem
NoSQL is intended as shorthand for “not only It’s impossible for a distributed computer system to
Hot SQL.” Complete architectures almost always mix simultaneously provide all three of these guarantees:
Tip
traditional and NoSQL databases. • Consistency (all nodes see the same data at the same time)
• Availability (node failures don’t prevent survivors from
NoSQL setups are best suited for non-OLTP applications that continuing to operate)
process massive amounts of structured and unstructured data • Partition tolerance (no failures less than total network
at a lower cost and with higher efficiency than RDBMs and failures cause the system to fail)
stored procedures.
Since only two of these characteristics are guaranteed for any
given scalable system, use your functional specification and
Consumer business SLA (service level agreement) to determine what your
minimum and target goals for CAP are, pick the two that meet
your requirements, and proceed to implement the appropriate
Node Node Node Node technology.
Rule of Thumb: NoSQL’s primary goal is to achieve
Virtual File System
logical table management, load balancing, garbage collection
Hot horizontal scalability. It attains this by reducing
(HDFS, mongoFS, Hypertable) Tip
transactional semantics and referential integrity.
Tablet Tablet Tablet
Server 0 Server 1 Server n
Use Figure 3 to identify the best match between your
Distributed File System
application’s CAP requirements and the suggested SQL and
NoSQL systems listed.
FS 0 FS 1 FS 2 FS n
Figure 2 - NoSQL Topology Relational
Key-Value
Column-Oriented
NoSQL storage is highly replicated (a commit doesn’t occur Document-Oriented
until the data is successfully written to at least two separate
Consistency Availability
storage devices) and the file systems are optimized for write-
C A
RDBMs (Oracle, MySQL), Aster Data, Green Plum, Vertica
only commits. The storage devices are formatted to handle
large blocks (32 MB or more). Caching and buffering are
a,
mo
designed for high I/O throughput. The NoSQL database
dr
ng edi
an
oD s,
s
R
is implemented as a data grid for processing (mapReduce,
as
B, Be
ak C
Pick Any Two
Te rke
Ri I,
queries, CRUD, etc.)
B, , KA
rra ley
sto D
hD et
re B,
uc bin
Areas of Application
,D M
Co Ca
ata em
B, yo
• Document storage
sto cac
leD Tok
re he
• Object databases
mp t,
, H DB
Si mor
yp , S
lde
er
• Graph databases
tab cala
Vo
le, ris
• Key/value stores
o,
m
Hb
na
P
as
• Eventually consistent key/value stores
Dy
, e
This list is the implementation counterpart to the data grid Partition tolerance
areas of application; the data store feeds the computational Figure 3 - CAP Selection Chart
network and, together, they form the NoSQL database. (source: Nathan Hurst's Blog)
DZone, Inc. | www.dzone.com
3. 3
Getting Started with NoSQL and Data Scalability
encoded JSON representation. BSON is designed to be
mongoDB traversable, lightweight and efficient. Applications can map
BSON/JSON documents using native representations like
mongoDB is a document-based NoSQL database that bridges dictionaries, lists, and arrays, leaving the BSON translation
the gap between scalable key-value stores like Datastore to the native mongoDB driver specific to each programming
and Memcache DB, and RDBMS’s querying and robustness language.
capabilities. Some of its main features include:
BSON Example
• Document-oriented storage - data is manipulated as {
JSON-like documents 'name' : 'Tom',
'age' : 42
• Querying - uses JavaScript and has APIs for submitting }
queries in every major programming language
• In-place updates - atomicity Language Representation
• Indexing - any attribute in a document may be used for Python {
'name' : 'Tom',
indexing and query optimization 'age' : 42
• Auto-sharding - enables horizontal scalability }
• Map/reduce - the mongoDB cluster may run smaller Ruby {
"name" => "Tom",
MapReduce jobs than a Hadoop cluster with significant "age" => 42
}
cost and efficiency improvements
Java BasicDBObject d;
mongoDB implements its own file system to optimize I/O d = new BasicObject();
d.put("name", "Tom");
throughput by dividing larger objects into smaller chunks. d.put("age", 42);
Documents are stored in two separate collections: files PHP array( "name" => "Tom",
containing the object meta-data, and chunks that form a "age" => 42);
larger document when combined with database accounting Dynamic languages offer a closer object mapping to BSON/
information. The mongoDB API provides functions for JSON than compiled languages.
manipulating files, chunks, and indices directly. The
administration tools enable GridFS maintenance. The complete BSON specification is available from:
http://bsonspec.org/
Consumer mongoDB Programming
Programming in mongoDB requires an active server running
fail-over
the mongod and the mongos database daemons (see Figure
5), and a client application that uses one of the language-
mongoDB Server (master) mongoDB Server (slave)
mongod mongod
specific drivers.
Database Database
daemon daemon
mongos mongos
Sharding
daemon
Sharding
daemon Hot All the examples in this Refcard are written in Python
Data Data
Tip for conciseness.
Storage Storage
Starting the Server
Figure 5 - mongoDB Cluster
Log on to the master server and execute:
A mongoDB cluster consists of a master and a slave. The slave [servername:user] ./mongod
may become the master in a fail-over scenario, if necessary.
Having a master/slave configuration (also known as Active/ The server will display its status messages to the console
Passive or A/P cluster) helps ensure data integrity since only unless stdout is redirected elsewhere.
the master is allowed to commit changes to the store at any Programming Example
given time. A commit is successful only if the data is written to This example allocates a database if one doesn’t already exist,
GridFS and replicated in the slave. instantiates a collection on the server, and runs a couple of
queries.
mongoDB also supports a limited master/master
configuration. It’s useful only for inserts, queries, The mongoDB Developer Manual is available from:
Hot and deletions by specific object ID. It must not http://www.mongodb.org/display/DOCS/Manual
Tip
be used if updates of a single object may occur #!/usr/bin/env jython
concurrently. import pymongo
from pymongo import Connection
Caching
connection = Connection('servername', 27017)
mongoDB has a built-in cache that runs directly in the cluster
without external requirements. Any query is transparently db = connection['people_database']
cached in RAM to expedite data transfer rates and to reduce peopleList = db['people_list']
disk I/O. person = {
'name' : 'Tom',
Document Format 'age' : 42 }
mongoDB handles documents in BSON format, a binary-
DZone, Inc. | www.dzone.com
4. 4
Getting Started with NoSQL and Data Scalability
peopleList.insert(person) • JSON data and program objects storage - many RESTful
web services provide JSON data; they can be stored in
person = {
'name' : 'Nancy', mongoDB without language serialization overhead
'age' : 69 }
(especially when compared against XML documents)
peopleList.insert(person) • Content management systems - JSON/BSON objects can
# find first entry: represent any kind of document, including those with a
person = peopleList.find_one()
binary representation
# find a specific person:
mongoDB Drawbacks
person = peopleList.find_one({ 'name' : 'Joe'})
• No JOIN operations - each document is stand-alone
if person is None: • Complex queries - some complex queries and indices are
print "Joe isn’t here!"
else: better suited for SQL
print person['age']
• No row-level locking - unsuitable for transactional data
# bulk inserts without error prone application-level assistance
persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}]
peopleList.insert(persons) If any of these is part of the functional requirements, a SQL
database would be better suited for the application.
# queries with multiple results
for person in peopleList.find():
print person['name'] GIGASPACES XAP
for person in peopleList.find({'age' : {'$ge' : 21}}).sort('name'):
print person['name'] The GigaSpaces eXtreme Application Platform is a data
# count: grid designed to replace traditional application servers. It
nDrinkingAge = peopleList.find({'age' : {'$ge' : 21}}).count()
operates based on an event-processing model where the
application dispatches objects to the processing nodes
# indexing
associated with a given data partition. The system may be
from pymongo import ASCENDING, DESCENDING
configured so that data state on the grid may trigger events, or
peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)]) an application may dispatch specific commands imperatively.
GigaSpaces XAP also manages all threading and execution
The PyMongo documentation is available at: aspects of the operation, including thread and connection
http://api.mongodb.org/python - guides for other languages pools. GigaSpaces XAP implements Spring transactions and
are also available from this web site. auto-recovery. The system detects any failed operations
The code in the previous example performs these operations: in a computational node and automatically rolls back the
transaction; it then places it back in the space where another
• Connect to the database server started in the previous
node picks it up to complete processing.
section
• Attach a database; notice that the database is treated like
Application Frameworks
an associative array
• Get a collection (loosely equivalent to a table in a Java C++ .Net Groovy
relational database), treated like an associative array
Mule Spring JEE Jetty
• Insert one or more entities
• Query for one or more entites
Although mongoDB treats all these data as BSON internally,
most of the APIs allow the use of dictionary-style objects to
streamline the development process. XAP Deployment Virtualization
XAP
Object ID Management
A successful insertion into the database results in a valid and XAP Middleware Virtualization
Monitoring
Object ID. This is the unique identifier in the database for a (Virtualized Clustering Layer)
given document. When querying the database, a return value
will include this attribute:
{
"name" : "Tom",
"age" : 42,
"_id" : ObjectId('999999')
} RDBMS Memcache DB mongoDB
Users may override this Object ID with any argument as long as
it’s unique, or allow mongoDB to assign one automatically.
Figure 4 - GigaSpaces XAP Data Grid
Common Use Cases
• Caching - more robust capabilities, plus persistence, when GigaSpaces provides data persistence, distributed processing,
compared against a pure caching system and caching by interfacing with SQL and NoSQL data stores.
• High volume processing - RDBMS may be too expensive The GigaSpaces API abstracts all back-end operations (job
or slow to run in comparison dispatching, data persistence, and caching) and makes
DZone, Inc. | www.dzone.com
5. 5
Getting Started with NoSQL and Data Scalability
it transparent to the application. GigaSpaces XAP may
implement distributed computing operations like MapReduce GOOGLE APP ENGINE DATASTORE
and run them in its nodes, or it may dispatch them for
processing to the underlying subsystem if the functionality The Datastore is the main scalability feature of Google App
is available (e.g. mongoDB, Hadoop). The application may Engine applications. It’s not a relational database or a façade
implement transactions using the Spring API, the GigaSpaces for one. Datastore is a public API for accessing Google’s
XPA transactional facilities, or by implementing a workflow Bigtable high-performance distributed database system.
where specific NoSQL stores handle entities or groups of Think of it as a sparse array distributed across multiple servers
entities. This must be implemented explicitly in systems like that also allows an infinite number of columns and rows.
mongoDB, but may exist in other NoSQL systems like Google Applications may even define new columns “on the fly”. The
App Engine’s Datastore. Datastore scales by adding new servers to a cluster; Google
provides this functionality without user participation.
GigaSpaces XAP Programming Google Your
Applications Applications
The GigaSpaces XAP API is very rich and covers many aspects
beyond NoSQL and data scalability areas like configuration Datastore API 1 Datastore
Java (JDO, JPA) Other language Python
management, deployment, web services, third-party product
Bigtable
integration, etc. Master Server
(Logical table management, load balancing, garbage collection)
The GigaSpaces XAP documentation is at:
Tablet Tablet Tablet
http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide Server 0 Server 1 Server n
NoSQL operations may be implemented over these APIs: Google File System
• SQLQuery - allows querying a space using a SQL-like FS 0 FS 1 FS 2 FS n
syntax and regular expressions; do not confuse it with
Figure 6 - Datastore Architecture
JDBC support.
Datastore operations are defined around entities. Entities
• Persistency - mostly supports RDBMSs but may implement
can have one-to-many or many-to-many relationships. The
other persistency mechanisms through the External Data
Datastore assigns unique IDs unless the application specifies
Source Components API.
a unique key. Datastore also disallows some property names
• memcached - support for key/value pair distributed that it uses for housekeeping. The complete Datastore
dictionaries available to any client in the grid; entities documentation is available from:
are automatically made available across all nodes. The http://code.google.com/appengine/docs/python/datastore/
memcached API is implemented on top of the data
grid, and it’s interchangeable with other memcached Did you notice the parallels between Datastore
implementations. Hot and mongoDB so far? Many NoSQL database
• Task Execution - allows synchronous and asynchronous job Tip implementations have solved similar problems in
execution on specific nodes or clusters similar ways.
The GigaSpaces XAP API is in a minority of stateful NoSQL Transactions and Entity Groups
systems. Most NoSQL systems strive to achieve statelessness Datastore supports transactions. These transactions are only
to increase scalability and data consistency. effective on entities that belong to the same entity group.
Entities in a given group are guaranteed to be stored on the
Common Use Cases same server.
• Real-time analytics - dynamic data analysis and reporting
Datastore Programming
based on data entered into a system less than a minute
The programming model is based on inheritance of basic
before effective time of use
entities, db.Model and db.Expando. Persistent data is
• Map/reduce - distributed data processing of large data mapped onto an entity specialization of either of these classes.
sets across a computational grid The API provides persistence and querying instance methods
• Near-zero downtime - allows for database schema for every entity managed by the Datastore.
changes without homebrew master/slave configurations or
Programming Example
proprietary RDBMS dependencies
The Datastore API is simpler than other NoSQL APIs and is
GigaSpaces XAP NoSQL Drawbacks highly optimized to work in the App Engine environment. In
• Complexity - the server, transactional, and grid model are this example we insert data into the data store, then run a query:
more complex than for other NoSQL systems from google.appengine.ext import db
• Application server model - the API and components are class Person(db.Model):
name = db.StringProperty(required=True)
geared toward building applications and transactional age = db.IntegerProperty(require=True)
logic
person = Person(name = 'Tom', age = 42)
• Steeper learning curve
person.put()
• Higher TCO - brings a requirement of a specialized,
person = Person(name = 'Sue', age = 69)
well-trained system administration team with higher
requirements than other NoSQL systems person.put()
DZone, Inc. | www.dzone.com