The document provides an overview of SQL and NoSQL databases. It discusses the history and features of SQL, including transactions, schema definition, and commercial database examples. NoSQL databases are outlined as being non-relational, distributed, and horizontally scalable. Common NoSQL database types like document stores, column stores, and key-value stores are summarized.
The Information Technology have led us into an era where the production, sharing and use of information are now part of everyday life and of which we are often unaware actors almost: it is now almost inevitable not leave a digital trail of many of the actions we do every day; for example, by digital content such as photos, videos, blog posts and everything that revolves around the social networks (Facebook and Twitter in particular). Added to this is that with the "internet of things", we see an increase in devices such as watches, bracelets, thermostats and many other items that are able to connect to the network and therefore generate large data streams. This explosion of data justifies the birth, in the world of the term Big Data: it indicates the data produced in large quantities, with remarkable speed and in different formats, which requires processing technologies and resources that go far beyond the conventional systems management and storage of data. It is immediately clear that, 1) models of data storage based on the relational model, and 2) processing systems based on stored procedures and computations on grids are not applicable in these contexts. As regards the point 1, the RDBMS, widely used for a great variety of applications, have some problems when the amount of data grows beyond certain limits. The scalability and cost of implementation are only a part of the disadvantages: very often, in fact, when there is opposite to the management of big data, also the variability, or the lack of a fixed structure, represents a significant problem. This has given a boost to the development of the NoSQL database. The website NoSQL Databases defines NoSQL databases such as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are: distributed, open source, scalable horizontally, without a predetermined pattern (key-value, column-oriented, document-based and graph-based), easily replicable, devoid of the ACID and can handle large amounts of data. These databases are integrated or integrated with processing tools based on the MapReduce paradigm proposed by Google in 2009. MapReduce with the open source Hadoop framework represent the new model for distributed processing of large amounts of data that goes to supplant techniques based on stored procedures and computational grids (step 2). The relational model taught courses in basic database design, has many limitations compared to the demands posed by new applications based on Big Data and NoSQL databases that use to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Contact me for other informations and to download
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
This presentation explains why NoSQL databases came over SQL databases although SQL databases has been successfully technology for more than twenty years. Moreover, This presentation discuses the characteristics and classifications of NoSQL databases. Finally, These slides cover four NoSQL databases briefly.
DataStage Online Training, Job Oriented Data Stage Training Classes by Real Time Expert for India, USA, Canada, UK, Japan, Singapore , Hyderabad, Bangalore, Pune @ +91 7680813158
This session provides an introduction to using SSIS. This is an update to my older presentation on the topic: http://www.slideshare.net/rmaclean/sql-server-integration-services-2631027
Analysis and evaluation of riak kv cluster environment using basho benchStevenChike
Many institutions and companies with technological development have been producing large size of structured and unstructured data. Therefore, we need special databases to deal with these data and thus emerged NoSQL databases. They are widely used in the cloud databases and the distributed systems. In the era of big data, those databases provide a scalable high availability solution. So we need new architectures to try to meet the need to store more and more different kinds of different data. In order to arrive at a good structure of large and diverse data, this structure must be tested and analyzed in depth with the use of different benchmark tools. In this paper, we experiment the Riak key-value database to measure their performance in terms of throughput and latency, where huge amounts of data are stored and retrieved in different sizes in a distributed database environment. Throughput and latency of the NoSQL database over different types of experiments and different sizes of data are compared and then results were discussed.
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...ArangoDB Database
Even though most NoSQL databases follow
the "schemafree" data paradigma, it is still import to choose the right data model to make the best of the underlying database technology. This talk provides an overview of
the different data storage models available in popular NoSQL databases. It also introduces some best practices on how to model your data for both best performance and best querying.
UNIT : -(6)
CONNECTING DATABASE WITH ADO.NET
Content:
•ADO.NET Architecture
•Data provider and its core object
•DataSet class
•Data Binding
•SQL Data Source
Concepts of Apache Hive in Big Data.
contains:
what is hive?
why hive?
how hive works
hive Architecture
data models in hive
pros and cons of hive
hiveql
pig vs hive
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
NEWYORKSYSTRAINING are destined to offer quality IT online training and comprehensive IT consulting services with complete business service delivery orientation.
The Information Technology have led us into an era where the production, sharing and use of information are now part of everyday life and of which we are often unaware actors almost: it is now almost inevitable not leave a digital trail of many of the actions we do every day; for example, by digital content such as photos, videos, blog posts and everything that revolves around the social networks (Facebook and Twitter in particular). Added to this is that with the "internet of things", we see an increase in devices such as watches, bracelets, thermostats and many other items that are able to connect to the network and therefore generate large data streams. This explosion of data justifies the birth, in the world of the term Big Data: it indicates the data produced in large quantities, with remarkable speed and in different formats, which requires processing technologies and resources that go far beyond the conventional systems management and storage of data. It is immediately clear that, 1) models of data storage based on the relational model, and 2) processing systems based on stored procedures and computations on grids are not applicable in these contexts. As regards the point 1, the RDBMS, widely used for a great variety of applications, have some problems when the amount of data grows beyond certain limits. The scalability and cost of implementation are only a part of the disadvantages: very often, in fact, when there is opposite to the management of big data, also the variability, or the lack of a fixed structure, represents a significant problem. This has given a boost to the development of the NoSQL database. The website NoSQL Databases defines NoSQL databases such as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are: distributed, open source, scalable horizontally, without a predetermined pattern (key-value, column-oriented, document-based and graph-based), easily replicable, devoid of the ACID and can handle large amounts of data. These databases are integrated or integrated with processing tools based on the MapReduce paradigm proposed by Google in 2009. MapReduce with the open source Hadoop framework represent the new model for distributed processing of large amounts of data that goes to supplant techniques based on stored procedures and computational grids (step 2). The relational model taught courses in basic database design, has many limitations compared to the demands posed by new applications based on Big Data and NoSQL databases that use to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Contact me for other informations and to download
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
This presentation explains why NoSQL databases came over SQL databases although SQL databases has been successfully technology for more than twenty years. Moreover, This presentation discuses the characteristics and classifications of NoSQL databases. Finally, These slides cover four NoSQL databases briefly.
DataStage Online Training, Job Oriented Data Stage Training Classes by Real Time Expert for India, USA, Canada, UK, Japan, Singapore , Hyderabad, Bangalore, Pune @ +91 7680813158
This session provides an introduction to using SSIS. This is an update to my older presentation on the topic: http://www.slideshare.net/rmaclean/sql-server-integration-services-2631027
Analysis and evaluation of riak kv cluster environment using basho benchStevenChike
Many institutions and companies with technological development have been producing large size of structured and unstructured data. Therefore, we need special databases to deal with these data and thus emerged NoSQL databases. They are widely used in the cloud databases and the distributed systems. In the era of big data, those databases provide a scalable high availability solution. So we need new architectures to try to meet the need to store more and more different kinds of different data. In order to arrive at a good structure of large and diverse data, this structure must be tested and analyzed in depth with the use of different benchmark tools. In this paper, we experiment the Riak key-value database to measure their performance in terms of throughput and latency, where huge amounts of data are stored and retrieved in different sizes in a distributed database environment. Throughput and latency of the NoSQL database over different types of experiments and different sizes of data are compared and then results were discussed.
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...ArangoDB Database
Even though most NoSQL databases follow
the "schemafree" data paradigma, it is still import to choose the right data model to make the best of the underlying database technology. This talk provides an overview of
the different data storage models available in popular NoSQL databases. It also introduces some best practices on how to model your data for both best performance and best querying.
UNIT : -(6)
CONNECTING DATABASE WITH ADO.NET
Content:
•ADO.NET Architecture
•Data provider and its core object
•DataSet class
•Data Binding
•SQL Data Source
Concepts of Apache Hive in Big Data.
contains:
what is hive?
why hive?
how hive works
hive Architecture
data models in hive
pros and cons of hive
hiveql
pig vs hive
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
NEWYORKSYSTRAINING are destined to offer quality IT online training and comprehensive IT consulting services with complete business service delivery orientation.
KLC has devised Shuttle maths program that has evolved from the ABC concept of learning.This helps us identify what best works for a student and device a methodology for him.
Compare the capabilities of the Microsoft Access, Microsoft SQL Serv.pdfarihantplastictanksh
Compare the capabilities of the Microsoft Access, Microsoft SQL Server, Oracle’s MySQL, and
Oracle relational database management systems (RDBMSs). Your paper should discuss the
processing speeds, data storage capabilities, maximum users supported, platforms supported,
user interfaces, development tools, vendor support, and cost. Discuss and cite at least two
references in addition to our textbook. Your paper should be 3-5 pages in length (excluding title
and References pages)
Solution
Microsoft Access
Overview:
Microsoft Access is a part of Microsoft Office,
it is commercially available database in the market
Inexpensive/standard on most computers
users can create complex databases
database professionalas can use construct a database
customers of MS-Access:
It is mainly used in small corporate companies or IT Sectors with 1-80 endusers.
Features of MS-Access:
1.It is having GUI Interface for creating databases
2. A databae contains tables, forms, reports, queries, macros.
3. It facilitates autocontent wizards to build tables or forms or reports.
4. It acts as an interface to other DBMS using ODBC
5. It is used for small business companies
6. Provides security like password protection
7. Provides a Data dictionary
8. We can repair the database
9. We can create different views
10. External data can be imported into Access
11. We can create web pages based using the database
12. It has as built in Macro functions
13. It uses Structurered Query Language
14. We can create forms, reports etc by using Visual Basic Application programming
15. Provides Add in controls like calendars
16. It can merged into word and analysed with Excel etc.
Issues:
Security:
User level security is very difficult
Tuning:
It does not have the ability to split over multiple Hard Drives, multiple CPUs or to place tables
into memory.
Locking:
Basic handling of concurrent users Backup and recovery at basic level.
ANSI SQL standard often doesn\'t work,MS-Access has it\'s own modified version of ANSI
SQL.
MySQL
Overview
MySQL is a database engine. It has a command line interface that allows the creation of
database. It Requires Front-end applications to access it for end users. EX:- C#, PHP, Microsoft
ASP.Net.
Typical users
Small companies or workgroups, through to very large Internet databases with large numbers of
users
Ex:wikipedia,Moodle.
Features
1. Speed:One of the fastest databases available
2. Ease of use: when compared to larger databases such as Oracle Uses standard SQL
3. Capability: A multi-threaded server allowing many clients to connect at the same time Fully
networked for the Internet with built in security
4.Portability: Runs on a many operating systems and different hardware
5. Small size: when compared to other large databases e.g. Oracle
6. Availabliity and Cost: Open Source ,Free in most situations to use
7. Open distribution and source code: You can check how it works – if you have the knowledge.
8. interface to other DBMS’s using Open Database Connectivit.
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
Jan 22nd, 2010 Hadoop meetup presentation on project voldemort and how it plays well with Hadoop at linkedin. The talk focus on Linkedin Hadoop ecosystem. How linkedin manage complex workflows, data ETL , data storage and online serving of 100GB to TB of data.
Course Title: Database Programming with SQL
Course Code: DEE 431
TOPICS COVER:
Database Terminologies
Drawbacks of Traditional System
Data processing Modes
Application of DBMS
Types of Database
Histroy of Database
Characteristics of Database
Advantages and Disadvantages of Database
Types of database architecture: 1 Tier, 2 Tier, 3 Tier
This deck talks about the basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. The presentation also describes a quick overview of "Polyglot Persistency", "NewSQL" like emerging trends.
The deck is targeted to beginners who wants to get an overview of NoSQL databases.
What is NoSQL? How does it come to the picture? What are the types of NoSQL? Some basics of different NoSQL types? Differences between RDBMS and NoSQL. Pros and Cons of NoSQL.
What is MongoDB? What are the features of MongoDB? Nexus architecture of MongoDB. Data model and query model of MongoDB? Various MongoDB data management techniques. Indexing in MongoDB. A working example using MongoDB Java driver on Mac OSX.
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Mohamed Galal
Nowadays the amount of data becomes very large, every organization produces a huge amount of data daily.
Thus we want new technology to help in storing and query a huge amount of data in acceptable time.
The old relational model may help in consistency but it was not designed to deal with big data problem.
In this slides, I will describe the relational model, NoSql Models and the NewSql models with some examples.
في الفيديو ده بيتم شرح ما هي المشاكل التي انتجت ظهور هذا النوع من قواعد البيانات
انواع المشاريع التي يمكن استخدامها بها
نبذة عن تاريخها و مزاياها و عيوبها
https://youtu.be/I9zgrdCf0fY
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
How would you build a database to support sustained ingestion of several hundreds of thousands rows per second while running near real-time queries on top?
In this session I will go over some of the technical decisions and trade-offs we applied when building QuestDB, an open source time-series database developed mainly in JAVA, and how we can achieve over four million row writes per second on a single instance without blocking or slowing down the reads. There will be code and demos, of course.
We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Give you a brief overview of the product. - What is esProc SPL? And show some cases helping you to know what it uses for. Talk about why esProc works better. And overview its brief characteristics. After that, Introduce the main technical solutions which esProc is often used.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
2. The following is a short, incomplete history of the SQL
1987 – Initial ISO/IEC Standard
1989 – Referential Integrity
1992 – SQL2
1995 SQL/CLI (ODBC)
1996 SQL/PSM – Procedural Language extensions
1999 – User Defined Types
2003 – SQL/XML
2008 – Expansions and corrections
2011 (or 2012) System Versioned and Application Time
Period Tables
2
3. Data stored in columns and tables
Relationships represented by data
Data Manipulation Language
Data Definition Language
Transactions
Abstraction from physical layer
3
4. Applications specify what, not how
Query optimization engine
Physical layer can change without modifying
applications
Create indexes to support queries
In Memory databases
4
5. Data manipulated with Select, Insert, Update, &
Delete statements
Select T1.Column1, T2.Column2 …
From Table1, Table2 …
Where T1.Column1 = T2.Column1 …
Data Aggregation
Compound statements
Functions and Procedures
Explicit transaction control
5
6. Schema defined at the start
Create Table (Column1 Datatype1, Column2 Datatype
2, …)
Constraints to define and enforce relationships
Primary Key
Foreign Key
Etc.
Triggers to respond to Insert, Update , & Delete
Stored Modules
Alter …
Drop …
Security and Access Control
6
7. Atomic – All of the work in a transaction completes
(commit) or none of it completes
Consistent – A transaction transforms the database
from one consistent state to another consistent state.
Consistency is defined in terms of constraints.
Isolated – The results of any changes made during a
transaction are not visible until the transaction has
committed.
Durable – The results of a committed transaction
survive failures
7
8. Commercial
IBM DB2
Oracle RDMS
Microsoft SQL Server
Sybase SQL Anywhere
Open Source (with commercial options)
MySQL
Ingres
Significant portions of the
world’s economy use SQL databases!
8
9. From www.nosql-database.org:
Next Generation Databases mostly addressing some of the
points: being non-relational, distributed, open-source
and horizontal scalable. The original intention has been
modern web-scale databases. The movement began
early 2009 and is growing rapidly. Often more
characteristics apply as: schema-free, easy replication
support, simple API, eventually consistent / BASE
(not ACID), a huge data amount, and more.
9
11. Large data volumes
Google’s “big data”
Scalable replication and distribution
Potentially thousands of machines
Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous Inserts & Updates
Schema-less
ACID transaction properties are not needed – BASE
CAP Theorem
Open source development
11
12. Acronym contrived to be the opposite of ACID
Basically Available,
Soft state,
Eventually Consistent
Characteristics
Weak consistency – stale data OK
Availability first
Best effort
Approximate answers OK
Aggressive (optimistic)
Simpler and faster
12
13. A distributed system can support only two of the
following characteristics:
Consistency
Availability
Partition tolerance
The slides from Brewer’s July 2000 talk do not define
these characteristics.
13
14. all nodes see the same data at the same time –
Wikipedia
client perceives that a set of operations has occurred
all at once – Pritchett
More like Atomic in ACID transaction properties
14
15. node failures do not prevent survivors from
continuing to operate – Wikipedia
Every operation must terminate in an intended
response – Pritchett
15
16. the system continues to operate despite
arbitrary message loss – Wikipedia
Operations will complete, even if individual
components are unavailable – Pritchett
16
17. Discussing NoSQL databases is complicated because
there are a variety of types:
Column Store – Each storage block contains data from
only one column
Document Store – stores documents made up of
tagged elements
Key-Value Store – Hash table of keys
17
18. XML Databases
Graph Databases
Codasyl Databases
Object Oriented Databases
Etc…
Will not address these today
18
19. Each storage block contains data from only one
column
Example: Hadoop/Hbase
http://hadoop.apache.org/
Yahoo, Facebook
Example: Ingres VectorWise
Column Store integrated with an SQL database
http://www.ingres.com/products/vectorwise
19
20. More efficient than row (or document) store if:
Multiple row/record/documents are inserted at the
same time so updates of column blocks can be
aggregated
Retrievals access only some of the columns in a
row/record/document
20
22. {
"_id": "guid goes here",
"_rev": "314159",
"type": "abstract",
"author": "Keith W. Hare"
"title": "SQL Standard and NoSQL Databases",
"body": "NoSQL databases (either no-SQL or Not Only SQL)
are currently a hot topic in some parts of
computing.",
"creation_timestamp": "2011/05/10 13:30:00 +0004"
}
22
23. "_id"
GUID – Global Unique Identifier
Passed in or generated by CouchDB
"_rev"
Revision number
Versioning mechanism
"type", "author", "title", etc.
Arbitrary tags
Schema-less
Could be validated after the fact by user-written routine
23
24. Hash tables of Keys
Values stored with Keys
Fast access to small data values
Example – Project-Voldemort
http://www.project-voldemort.com/
Linkedin
Example – MemCacheDB
http://memcachedb.org/
Backend storage is Berkeley-DB
24
25. Technique for indexing and searching large data
volumes
Two Phases, Map and Reduce
Map
Extract sets of Key-Value pairs from underlying data
Potentially in Parallel on multiple machines
Reduce
Merge and sort sets of Key-Value pairs
Results may be useful for other searches
25
26. Map Reduce techniques differ across products
Implemented by application developers, not by
underlying software
26
27. Google granted US Patent 7,650,331, January 2010
System and method for efficient large-scale data processing
A large-scale data processing system and method includes one
or more application-independent map modules configured to
read input data and to apply at least one application-specific
map operation to the input data to produce intermediate data
values, wherein the map operation is automatically parallelized
across multiple processors in the parallel processing
environment. A plurality of intermediate data structures are
used to store the intermediate data values. One or more
application-independent reduce modules are configured to
retrieve the intermediate data values and to apply at least one
application-specific reduce operation to the intermediate
data values to provide output data.
27
28. Syntax varies
HTML
Java Script
Etc.
Asynchronous – Inserts and updates do not wait for
confirmation
Versioned
Optimistic Concurrency
28
29. Syntax Varies
No set-based query language
Procedural program languages such as Java, C, etc.
Application specifies retrieval path
No query optimizer
Quick answer is important
May not be a single “right” answer
29
30. Small upfront software costs
Suitable for large scale distribution on commodity
hardware
30
31. NoSQL databases reject:
Overhead of ACID transactions
“Complexity” of SQL
Burden of up-front schema design
Declarative query expression
Yesterday’s technology
Programmer responsible for
Step-by-step procedural language
Navigating access path
31
32. SQL Databases
Predefined Schema
Standard definition and interface language
Tight consistency
Well defined semantics
NoSQL Database
No predefined Schema
Per-product definition and interface language
Getting an answer quickly is more important than
getting a correct answer
32