This document compares NoSQL database options and discusses which type may suit which use case. It gives an overview of the current NoSQL landscape and data models, including key-value, document, graph and wide column stores. Specific databases (Redis, CouchBase, Neo4j and Cassandra) are compared on features such as query support, operations, and commercial options. The document recommends choosing a database based on the specific problem, weighing data size, read/write needs, and the tradeoffs between consistency, availability and partition tolerance. It also advocates starting small but with significance and considering hybrid SQL/NoSQL approaches.
3. The current world of NoSQL
RDBMS: 105+ databases
NoSQL: 122+ databases
Forecast: NoSQL market expected to reach $3.4 billion by 2018
NoSQL market revenue: $14 billion over 2013 – 2018
RDBMS are great and ... will be fine
8. NoSQL models
31% Key Value
10% Document
13% Graph
9% Column Family
6% XML
10% Object
21% Other
11. Comparison: General Information
Redis
  Language: C
  Commercial Support: Third party companies
  Customers: GitHub, Guardian Media Group
  Licenses: New BSD
CouchBase
  Language: C, C++, Erlang
  Commercial Support: Consulting & Support with Enterprise
  Customers: Zynga, AOL, BBC
  Licenses: Community & Enterprise Licenses
Neo4j
  Language: Java
  Commercial Support: Neo4j Advanced, Neo4j Enterprise
  Customers: Adobe, Cisco, StudiVZ, Deutsche Telekom, Fanbox
  Licenses: GPL or AGPLv3
Cassandra
  Language: Java
  Commercial Support: DataStax, Impetus, Acunu, Riptapo, Cubet Technologies
  Customers: Twitter, Digg, Reddit, Rackspace, Facebook
  Licenses: Apache License 2
13. Best Use
Redis
  Real-time systems where low latency is critical (games)
  High performance caching tier for web sites and other applications
  Server for backed sessions or transient data
CouchBase
  Syncing online and offline data (allows synchronization and sharing of data and applications across multiple platforms and mobile devices)
  Social and online gaming
  Data management layer for recommendation engine
Neo4j
  Cloud/network management
  Social, geospatial data
  Bioinformatics
Cassandra
  Managing large streams of non-transactional data: Apache logs, application logs, etc.
  Consistent, fast response times under writes (high volume writes)
  Real-time analytics & statistics
Also listed: Service offering some real-time statistics; Highly available solution
14. Which model should you use?
Column Oriented Store
Document Store
Key Value Store
Graph Database
More specific: which NoSQL database?
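To make the four models on this slide more concrete, here is a rough sketch of how the same small piece of data might be laid out in each one. It uses plain Python structures only; the user IDs, names, field names and the followed_by helper are invented for illustration and do not reflect the storage format or API of any particular product.

```python
import json

# Hypothetical data: user "u42" (Ann) follows user "u7" (Bob).

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:u42": json.dumps({"name": "Ann", "follows": ["u7"]})}

# Document store: the value is a structured document whose fields can be queried.
document = {"_id": "u42", "name": "Ann", "follows": ["u7"], "address": {"city": "Kyiv"}}

# Column-oriented store: row key -> column family -> columns.
wide_column = {"u42": {"profile": {"name": "Ann"}, "follows": {"u7": "2013-05-01"}}}

# Graph database: nodes plus typed relationships between them.
graph = {
    "nodes": {"u42": {"name": "Ann"}, "u7": {"name": "Bob"}},
    "edges": [("u42", "FOLLOWS", "u7")],
}


def followed_by(g, user_id):
    """Return the names of users that user_id follows by walking FOLLOWS edges."""
    return [
        g["nodes"][dst]["name"]
        for src, rel, dst in g["edges"]
        if src == user_id and rel == "FOLLOWS"
    ]


print(followed_by(graph, "u42"))
```

The relationship is a first-class edge only in the graph layout; in the other three it is just data embedded in a value, which is one way to think about why connected data tends to favour a graph database.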
18. Lessons learned from actual use
Hybrid Approach: NoSQL + RDBMS behind a Business Facade
Two Databases: NoSQL + RDBMS
Key Value Storage for Session Data + RDBMS for User Data
Column Storage for Reporting Data + RDBMS for User Data
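The hybrid approach on this slide could be sketched roughly as follows: a business facade that keeps session data in a key-value store and user data in an RDBMS. This is a minimal illustration only. The class name BusinessFacade, its methods and the table layout are assumptions, a plain dict stands in for the key-value store, and in-memory SQLite stands in for the RDBMS.

```python
import sqlite3


class BusinessFacade:
    """Single entry point that hides which store holds which data."""

    def __init__(self):
        # Stand-in for a key-value store (assumption: a real deployment
        # would use a networked store rather than a local dict).
        self._sessions = {}
        # Stand-in for the RDBMS: an in-memory SQLite database.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def register_user(self, email):
        # Durable, relational user data goes to the RDBMS.
        cur = self._db.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self._db.commit()
        return cur.lastrowid

    def start_session(self, token, user_id):
        # Transient session data goes to the key-value store.
        self._sessions[token] = {"user_id": user_id}

    def current_user_email(self, token):
        # One call for the caller, two stores behind the facade.
        session = self._sessions.get(token)
        if session is None:
            return None
        row = self._db.execute(
            "SELECT email FROM users WHERE id = ?", (session["user_id"],)
        ).fetchone()
        return row[0] if row else None


facade = BusinessFacade()
uid = facade.register_user("ann@example.com")
facade.start_session("tok-1", uid)
print(facade.current_user_email("tok-1"))
```

Callers only talk to the facade, so either store can be swapped out later without touching application code.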
Editor's Notes
It's a well-known truth that we should choose the right tool for the job. Everyone says that. Who can disagree? The problem is that this assertion is not helpful unless we can answer more specific questions: what jobs are the tools good at? Which NoSQL database should I choose out of the many available options? Here is the table of contents of today's workshop, whose aim is to help you answer this question.
1) First off, I will provide some statistics and interesting numbers from the current world of NoSQL.
2) Next, you will get some background on the NoSQL initiative, specifically the classes of NoSQL databases.
3) Then I will explain the differences between the existing models and classes of NoSQL. We will spend more time on one specific database from each class.
4) For the next five minutes I will briefly cover the differences between these databases and outline the best use cases for each.
5) When speaking about how to ultimately choose the right tool, I will recap some recommendations and good approaches.
6) NoSQL clients: a few real-world examples and stories (from Renat).
7) Finally, lessons learned from the actual usage of NoSQL.
The worldwide NoSQL market is expected to reach $3.4 billion by 2018, and the NoSQL market will generate $14 billion in revenues over the period 2013 – 2018. RDBMSs are great, and the forecast is that they will be fine. Why?
Oracle officially released a memcached daemon plugin that talks with InnoDB, and NoSQL + MySQL has become an official solution. More changes bridging the NoSQL/SQL divide:
- Neo4j recently announced a JDBC interface that forwards database queries to Neo4j and allows common applications to access the NoSQL database without modification.
- Cassandra + CQL (Cassandra Query Language, an SQL-like query language)
- Couchbase Server 2.0 comes with a NoSQL query language called UnQL
Interest in key-value pair (KVP) technology has reemerged to the point where traditional RDBMS vendors are evaluating strategies for developing in-house NoSQL solutions and integrating them into their current product offerings. It will not take long before we see acquisitions driven by emerging NoSQL technology.
By the way, the database market was in the same state in the 1970s, before SQL was invented (a lot of APIs and no single standard).
The NoSQL initiative promotes a loosely defined class of non-relational data stores that break with the ACID paradigm and with relational databases. NoSQL data management systems are inherently:
- Schema-free (no unneeded complexity; flexible data models; the variety of features and strict data consistency of an RDBMS might be unnecessary);
- Built for huge data amounts and high throughput: relational databases, which are slow and expensive in terms of performance, are given up in favor of more efficient and cheaper ways of managing data when dealing with big data and web scale;
- Eventually consistent / BASE (not ACID): basically available, soft state, eventual consistency;
- Equipped with a simple API
Core NoSQL systems can be divided into these main classes:
- Key-Value Stores (Riak, Redis, MemcacheDB)
- Wide Column Stores / Column Families (Cassandra, HBase, Amazon SimpleDB)
- Document Stores (MongoDB, CouchDB, Jackrabbit)
- Graph Databases (Neo4j, InfiniteGraph)
- XML Databases (Berkeley DB XML, eXist): communication is typically performed by means of HTTP/REST, WebDAV, SOAP, XML-RPC and XML-oriented query methods: XQuery, XPointer, XPath
- Object Databases (Objectivity, db4o): one of the main goals is to provide an easy and native persistence interface for object-oriented programming languages
- Other (unresolved and uncategorized)
Redis is an open-source, networked, in-memory, key-value data store with optional durability. It is written in ANSI C. The development of Redis is sponsored by VMware.
CouchBase is an open-source, schema-free document database that provides JavaScript-based map/reduce indexing to query and analyze data, peer-based replication, GeoCouch for creating location-aware applications, and binary packages for Red Hat and Ubuntu Linux, Windows, and Mac OS X. It combines CouchDB, Membase, and Memcached.
Neo4j is an open-source, disk-based, fully transactional Java persistence engine that runs either embedded or as a standalone server with a REST API. It stores data with multiple relationships and multiple connections in graphs rather than in tables.
Cassandra is an open-source distributed database management system designed to handle large amounts of data spread across many servers. It provides a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010.
Commercial Support:
- Redis -
- CouchBase – depends on whether it is a Community Edition or an Enterprise Edition license
- Neo4j -
- Cassandra – third-party companies provide commercial support and commercial distributions of Cassandra
Customers & some notable users:
- Redis – online hosting service GitHub, British Guardian Media Group
- CouchBase – organizations including Zynga, AOL, the BBC and thousands of others power their interactive web applications with Couchbase
- Neo4j – Adobe, Cisco, StudiVZ (the largest social network in Europe), Fanbox (social networking website)
Client Libraries (accessing your data should be easy):
- CouchBase: non-vBucket-aware ("classic" Memcached) clients or vBucket-aware "Type 2" Membase clients. A vBucket is defined as the "owner" of a subset of the key space of a Membase cluster. Every key "belongs" to a vBucket, and a mapping function is used to calculate the vBucket to which a given key belongs.
- Cassandra: the client API is built entirely on top of Thrift, for different programming languages including Python, Java, .NET, Ruby, PHP, Perl, and C++.
Map/Reduce (generally available parallel computing might be important):
- Cassandra: enables certain Hadoop functionality against Cassandra's data.
ACID transactions:
- Redis: is not a "durable" datastore, in the sense of the "D" in ACID.
- CouchBase: supports ACID transaction semantics.
- Neo4j: supports ACID transactions; the default isolation level is read committed, locks are acquired at the Node and Relationship level, and deadlock detection is built into the core transaction management.
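The vBucket idea from the client library notes (every key belongs to a vBucket, found by a mapping function) can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: the function names, the bucket count of 1024, the node names, and the choice of a CRC32 hash taken modulo the bucket count. It is not the exact function any Membase or Couchbase client uses.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed bucket count, chosen only for this illustration


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Map a key to a vBucket id: hash the key, then reduce modulo the bucket count."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def server_for_key(key, vbucket_map):
    """Look up the node that currently owns the key's vBucket."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


# Hypothetical cluster: vBuckets are spread round-robin over three nodes.
servers = ["node-a", "node-b", "node-c"]
vbucket_map = [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]

if __name__ == "__main__":
    for key in ("user:42", "session:abc", "cart:7"):
        print(key, "->", vbucket_for_key(key), "->", server_for_key(key, vbucket_map))
```

The point of the indirection is that keys map to vBuckets once and for all, while the vBucket-to-server table can change when nodes are added or removed, so a vBucket-aware client only needs a fresh copy of the table.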
Redis:
- Service offering some real-time statistics. A good example is Hurl, a tool for debugging HTTP requests that was built on Redis in 48 hours by Leah Culver and Chris Wanstrath.
- Transient data. Any transient data used by your application is also a good fit for Redis.
CouchBase:
- Sync mobile data to cloud data. Not all iPhone, iPad or iPod Touch devices are online all the time, or even within range of Internet connectivity, but the devices and their software must be useful whether online or offline.
- Social and online gaming. CouchBase can be a good option for the data management layer in social and online gaming, where predictable latency, responsiveness and automated data caching are required.
- Data management layer for a recommendation engine, such as one targeting ads and offers. Targeting algorithms and approaches can change and often require changes in input data. With schema-free data there is no need to define a database schema before inserting data.
Neo4j:
- Social, geospatial data. Neo4j allows queries to find target nodes or shortest paths. It allows indexing on node/relationship properties.
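To make the "shortest paths" use case for graph data more concrete, here is a minimal sketch of a shortest-path query over a small social graph. It uses a plain breadth-first search in Python rather than Neo4j's own API or query language; the `friends` data and the function name `shortest_path` are made up for this example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes with the fewest hops, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the chain of predecessors back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical "who knows whom" graph stored as adjacency lists.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

if __name__ == "__main__":
    print(shortest_path(friends, "alice", "erin"))
```

In a relational schema the same question needs one self-join per hop, which is why relationship-heavy data is listed as a good fit for a graph database.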
- Maturity
  - Some databases are not as proven
  - Incomplete NoSQL solutions
  - You write a larger data management tier
  - You maintain your business code and infrastructure code
  - You have to customize management and deployment technology and procedures
- Connectivity/querying
  - APIs for .NET, Java, Perl, Python, etc.
  - Some solutions have no querying
  - When available, query languages differ
  - Lack of general ad-hoc querying – "no" SQL
According to the CAP theorem, a distributed system can fully support only two of the following three characteristics at the same time: Consistency (all nodes see the same data at the same time), Availability (every operation must terminate in an intended response), Partition tolerance (operations will complete even if individual components are unavailable).
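The trade-off between these characteristics can be shown with a toy model. The sketch below is an assumption-heavy illustration, not how any real database is implemented: the `TwoNodeStore` class, its "CP" and "AP" modes, and the `partitioned` flag are all invented for this example. During a simulated network partition, the CP store refuses writes (staying consistent but giving up availability), while the AP store accepts them (staying available but letting the replicas diverge).

```python
class TwoNodeStore:
    """Toy two-replica store that behaves as 'CP' or 'AP' during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # one dict per replica
        self.partitioned = False   # True means the replicas cannot talk to each other

    def write(self, node, key, value):
        """Returns True if the write was accepted, False if it was refused."""
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse: consistency is kept, availability is lost
            self.replicas[node][key] = value  # accept locally: replicas now differ
            return True
        for replica in self.replicas:  # healthy network: update every replica
            replica[key] = value
        return True

    def read(self, node, key):
        return self.replicas[node].get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write(0, "x", 1)
        store.partitioned = True
        accepted = store.write(0, "x", 2)
        print(mode, "accepted:", accepted,
              "node0:", store.read(0, "x"), "node1:", store.read(1, "x"))
```

Real systems sit between these extremes and often let you tune the behaviour per operation, but the toy makes the choice under partition visible.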
Start small, but significant: focus on the problem you are trying to solve with NoSQL.