This presentation contains an introduction to NoSQL databases: their types with examples, how they differ from the 40-year-old relational database management system, their usage, and why we should use them.
NoSQL stands for “not only SQL.”
NoSQL databases are databases that store data in a format other than relational tables.
NoSQL (non-relational) databases do not model relationships between records as naturally as relational databases do.
MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.
The first part introduces NoSQL databases and the shift in their design paradigm, focuses a little more on document-based NoSQL databases, and draws some parallels with SQL databases.
The second part is a hands-on session with MongoDB using the mongo shell, for which the slides are only of limited help.
Finally, it touches on advanced topics such as data replication for disaster recovery and handling big data using map-reduce and sharding.
Apache HBase™ is the Hadoop database: a distributed, scalable, big data store. It is a column-oriented database management system that runs on top of HDFS.
Apache HBase is an open-source NoSQL database that provides real-time read/write access to large data sets. HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
This presentation about HBase will help you understand what HBase is, what its applications are, how it differs from an RDBMS, how HBase stores data, and what its architectural components are; at the end, we will also look at some HBase commands in a demo. HBase is an essential part of the Hadoop ecosystem. It is a column-oriented database management system, derived from Google’s NoSQL database Bigtable, that runs on top of HDFS. After watching this video, you will know how to store and process large datasets using HBase. Now, let us get started and understand HBase and what it is used for.
The following topics are explained in this HBase presentation:
1. What is HBase?
2. HBase Use Case
3. Applications of HBase
4. HBase vs RDBMS
5. HBase Storage
6. HBase Architectural Components
What is this Big Data Hadoop training course about?
Simplilearn’s Big Data Hadoop training course lets you master the concepts of the Hadoop framework and prepares you for Cloudera’s CCA175 Big Data certification. The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
MongoDB is the most famous and best-loved NoSQL database. It has many features that make it easier to work with than a conventional RDBMS. These slides cover the basics of MongoDB.
SolrCloud: the “search first” NoSQL database, extended deep dive (Lucene Revolution)
Presented by Mark Miller, Software Engineer, Cloudera
As the NoSQL ecosystem looks to integrate great search, great search is naturally beginning to expose many NoSQL features. Will these Goliaths collide? Or will they remain specialized while intermingling, two sides of the same coin?
Come learn about where SolrCloud fits into the NoSQL landscape. What can it do? What will it do? And how will the big data, NoSQL, and search ecosystem evolve? If you are interested in Big Data, NoSQL, distributed systems, the CAP theorem, and other hype-filled terms, then this talk may be for you.
The rising interest in NoSQL technology over the last few years has resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies. From this survey we create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial uses from the software engineer’s point of view.
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to complement them. In this presentation you will hear what Big Data and a Data Lake are, and which technologies are most popular in the Big Data world. We will also speak about Hadoop and Spark, how they integrate with traditional systems, and their benefits.
3. • Flat files
• Hierarchical databases
• Object-oriented databases
• Relational databases
have been in use to store & retrieve data for ages.
4. Problems associated with RDBMS
• Unable to address large volumes of data.
• Unable to handle agile sprints, quick iteration, and frequent code pushes.
• Expensive, monolithic architecture.
6. • The machines in these large clusters are individually unreliable.
• But the overall cluster keeps working even as machines die, so the overall cluster is reliable.
• The “cloud” is exactly this kind of cluster, which means relational databases don’t play well with the cloud.
8. • Web services provide an alternative to shared databases for application integration.
• They make it easier for different applications to choose their own data storage, avoiding relational databases.
• Google → Bigtable
• Amazon → Dynamo
11. What is NoSQL?
• A NoSQL database is an alternative to relational databases, with scalability, availability, and fault tolerance being key deciding factors.
• It goes well beyond the more widely understood relational databases (e.g., Oracle, SQL Server) in satisfying the needs of today’s business applications.
12. Why NoSQL?
• Big Users
• Big Data
• The Internet of Things
• Cloud computing
14. Big Users
• NoSQL offers the dynamic scalability and level of scale they need while maintaining the performance users demand.
15. Big Data
• NoSQL provides a much more flexible, schema-less data model that better maps to an application’s data organization.
• It simplifies the interaction between the application and the database, resulting in less code to write, debug, and maintain.
16. The Internet of Things
• NoSQL can
– scale concurrent data access to millions of connected devices and systems
– store billions of data points
– meet the performance requirements of mission-critical infrastructure and operations
17. Cloud Computing
• NoSQL databases are built from the ground up to be distributed, scale-out technologies.
• This gives a better fit with the highly distributed nature of the three-tier internet architecture.
18. Reasons to choose NoSQL databases for future development work
• To improve programmer productivity by using a database which matches an application's needs better.
• To improve data access performance via some combination of
– handling larger data volumes,
– reducing latency,
– improving throughput.
19. Prominent NoSQL database users
• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio
22. Common Characteristics
• Not a relational data model
– No SQL queries
• Tends to be designed to run on clusters of multiple nodes
• Tends to be open source
• No fixed schema, allowing you to store any data in any record
• Designed for data sets of web scale
• Follows the CAP theorem
23. Scale-up Database Tier with RDBMS
• To support more concurrent users and store more data, relational databases require a bigger and more expensive server with more CPUs, memory, and disk storage.
• At some point, the capacity of even the biggest server can be outstripped and the relational database cannot scale further!
24. Scale-out Database Tier with NoSQL
• NoSQL databases provide an easier, linear, and cost-effective approach to database scaling.
• As the number of concurrent users grows, simply add additional low-cost, commodity servers to your cluster.
• There’s no need to modify the application, since the application always sees a single (distributed) database.
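The scale-out idea above can be sketched as hash-based sharding: each key is hashed to pick one of several commodity nodes, and the application never needs to know which node holds which key. This is a minimal illustrative sketch in Python (the `ShardedStore` class and node names are invented for this example, not any product's API):

```python
import hashlib

class ShardedStore:
    """Route each key to one of several nodes by hashing the key."""
    def __init__(self, nodes):
        self.nodes = nodes                      # e.g. commodity servers
        self.data = {n: {} for n in nodes}      # one store per node

    def _node_for(self, key):
        # A stable hash, so the same key always lands on the same node.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self.data[self._node_for(key)][key] = value

    def get(self, key):
        return self.data[self._node_for(key)].get(key)

store = ShardedStore(["node-a", "node-b", "node-c"])
store.put("user:1", {"name": "Asha"})
print(store.get("user:1"))      # the caller never chooses a node itself
```

Adding capacity means adding nodes to the list; real systems use consistent hashing so that doing so moves only a fraction of the keys, which this naive modulo scheme does not.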
25. Performing Queries???
• RESTful interfaces (HTTP as an access API)
• Query languages other than SQL
– GQL: an SQL-like query language for Google Bigtable
– SPARQL: a query language for the Semantic Web
– Gremlin: the graph traversal language
– Sones Graph Query Language
• Query APIs
– The Google Bigtable DataStore API
– The Neo4j Traversal API
28. • Because of the variety of approaches and overlaps, it is difficult to maintain an overview of non-relational databases.
• A basic classification is based on the data model.
30. Key-Value databases
• The simplest NoSQL data store.
• Handles large amounts of data.
• Based on Amazon’s Dynamo paper.
• Key-value stores allow developers to store schema-less data as a hash table, where each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
• A key may be a string, hash, list, set, or sorted set, and values are stored against these keys.
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
31. • Examples of key-value store databases:
– Riak
– Redis
– Memcached
– Berkeley DB
– HamsterDB (especially suited for embedded use)
– Amazon DynamoDB (not open source)
– Project Voldemort
– Couchbase
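The hash-table model described above can be sketched in a few lines of Python; the `KeyValueStore` class here is illustrative only, not the API of any of the products listed. Note the last line: because values are opaque, finding a record by value means scanning every key, which is exactly why key-value stores should be avoided when you need to query data.

```python
class KeyValueStore:
    """Schema-less key-value store: unique keys, opaque values."""
    def __init__(self):
        self._table = {}

    def set(self, key, value):
        self._table[key] = value          # value may be a string, dict, bytes, ...

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)

kv = KeyValueStore()
kv.set("session:42", "logged-in")                  # string value
kv.set("cart:42", {"items": ["pen", "book"]})      # JSON-like value
kv.set("avatar:42", b"\x89PNG...")                 # binary (BLOB) value

# No query language: looking up by value requires a full scan.
hits = [k for k, v in kv._table.items() if v == "logged-in"]
```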
32. Document databases
• A collection of documents.
• Data in this model is stored inside documents.
• A document is a key-value collection where the key allows access to its value.
• Documents are not typically forced to have a schema and therefore are flexible and easy to change.
• Documents are stored in collections in order to group different kinds of data.
• Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
34. Column family stores
• Column-oriented databases primarily work on columns, and every column is treated individually.
• Data is stored in column-specific files, and query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve the performance of queries, as they can access specific column data.
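A rough sketch of the column-oriented layout described above, in Python: each column lives in its own list (standing in for a column-specific file), so an aggregate over one column never touches the others. The `ColumnStore` class is illustrative, not a real engine's API.

```python
class ColumnStore:
    """Store each column separately; a query on one column never reads the others."""
    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    def insert_row(self, **row):
        # A row is split across the per-column lists.
        for name, values in self.columns.items():
            values.append(row.get(name))

    def column(self, name):
        # Same-typed, contiguous data: cheap to scan and to compress.
        return self.columns[name]

events = ColumnStore(["ts", "user", "bytes"])
events.insert_row(ts=1, user="a", bytes=120)
events.insert_row(ts=2, user="b", bytes=3050)
events.insert_row(ts=3, user="a", bytes=99)

# Aggregating a single column scans only that column's data:
print(sum(events.column("bytes")))   # 3269
```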
36. Graph databases
• A graph database stores data in a graph.
• It is capable of elegantly representing any kind of data in a highly accessible way.
• Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.
• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Indexes are used for lookups.
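The key properties above (nodes with identifiers, labelled edges, each node knowing its neighbours so a hop costs the same regardless of graph size) can be sketched with an adjacency list in Python. The `GraphDB` class and edge label are illustrative only, loosely echoing how a property-graph database like Neo4j models data.

```python
from collections import defaultdict

class GraphDB:
    """Adjacency-list property graph: each node knows its neighbours directly."""
    def __init__(self):
        self.nodes = {}                       # node id -> properties
        self.edges = defaultdict(list)        # node id -> [(label, neighbour id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def neighbours(self, node_id, label=None):
        # A "hop" is a direct list lookup; its cost does not grow with graph size.
        return [dst for lbl, dst in self.edges[node_id]
                if label is None or lbl == label]

g = GraphDB()
g.add_node("alice", kind="student")
g.add_node("acme", kind="business")
g.add_edge("alice", "WORKS_AT", "acme")
print(g.neighbours("alice", "WORKS_AT"))   # ['acme']
```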
38. Performance

Data Model        | Performance | Scalability     | Flexibility | Complexity | Functionality
------------------|-------------|-----------------|-------------|------------|-------------------
Key-Value         | High        | High            | High        | None       | Variable (none)
Column-Oriented   | High        | High            | Moderate    | Low        | Minimal
Document-Oriented | High        | Variable (High) | High        | Low        | Variable (low)
Graph             | Variable    | Variable        | High        | High       | Graph Theory
Relational        | Variable    | Variable        | Low         | Moderate   | Relational Algebra
39. How to select your NoSQL database?

Key-value databases
• For storing session information, user profiles, preferences, shopping cart data.
• Avoid when you need to query data or to operate on multiple keys at the same time.

Document databases
• For content management systems, blogging platforms, web analytics, real-time analytics, and e-commerce applications.
• Avoid for systems that need complex transactions spanning multiple operations, or queries against varying aggregate structures.

Column family databases
• For content management systems, blogging platforms, maintaining counters, expiring usage, and heavy write volume such as log aggregation.
• Avoid for systems that are in early development with changing query patterns.

Graph databases
• For connected data networks like social networks, spatial data, routing information for goods and money, recommendation engines.
41. There are now more than 50 vendors in the NoSQL DB software and services space!!!
42. Even the most popular RDBMS vendors are pragmatic about the future of databases!!!
• Oracle → Berkeley DB (open source)
• IBM → Hadoop, MongoDB
• Microsoft → NoSQL solutions on its Windows Azure cloud-based storage solution
43. Job Market
• There is a huge opportunity for those with expertise in NoSQL databases.
44. The percentage of the job market for MySQL has been more or less flat, while for Mongo the job market has been increasing exponentially...
45. This is sure to amplify as the NoSQL databases become more and more mature.
47. Summary
• Selecting the correct database for your goal is very important.
• NoSQL offers better solutions for handling BIG DATA.
• Most NoSQL databases are also open source.
• Often, organizations will begin with a small-scale trial of a NoSQL database, which makes it possible to develop an understanding of the technology.
• Compared with other NoSQL databases, Cassandra, HBase & MongoDB are more popular among enterprise developers because they require little overhead and can be up and running quickly for prototyping new kinds of apps or data analysis.