Big Data analytics provides various advantages, such as better decision making and the prevention of fraudulent activities. This document introduces big data analytics: what big data is, its evolution, the types of data, the characteristics and applications of big data, distributed file systems, and NoSQL databases. NoSQL databases are useful for big data because they scale horizontally and support unstructured data from sources such as social media.
Introduction to Big Data
Big Data is a massive collection of data that grows exponentially over time. These data sets are so large and complex that traditional data management tools cannot store or process them efficiently.
2. Content
• What is Big Data? Evolution of Big Data
• Big Data challenges: traditional versus big data approach
• Structured, unstructured, semi-structured and quasi-structured data
• Characteristics of Big Data: the five Vs
• Big Data applications
• Basics of Distributed File Systems
• The Big Data Technology Landscape: NoSQL
3. What is Big Data?
• Big Data is a term used for collections of data sets so large and complex that they are difficult to store and process using available database management tools or traditional data processing applications.
• The challenge includes capturing, curating, storing, searching, sharing, transferring, analyzing and visualizing this data.
• Big Data analytics is the process used to extract meaningful insights such as hidden patterns, unknown correlations, market trends and customer preferences.
• Big Data analytics provides various advantages: it can be used for better decision making, for preventing fraudulent activities, and more.
6. Big Data Challenges: Traditional versus Big Data Approach
1. Traditional data is generated at the enterprise level; big data is generated both inside and outside the enterprise.
2. Traditional data volumes range from gigabytes to terabytes; big data volumes range from petabytes to zettabytes or exabytes.
3. Traditional database systems deal with structured data; big data systems deal with structured, semi-structured and unstructured data.
4. Traditional data is generated per hour, per day or less often; big data is generated far more frequently, often every second.
5. Traditional data sources are centralized and managed in a centralized form; big data sources are distributed and managed in a distributed form.
6. With traditional data, integration is very easy; with big data, integration is very difficult.
7. A normal system configuration is capable of processing traditional data; a high-end configuration is required to process big data.
8. Traditional data sets are comparatively small; big data sets are far larger.
9. Standard database tools suffice for any operation on traditional data; special kinds of database tools are required for big data.
7. Big Data Challenges: Traditional versus Big Data Approach (continued)
10. The traditional data model is strict-schema based and static; the big data model is flat-schema based and dynamic.
11. Traditional data is stable, with known inter-relationships; big data is unstable, with unknown relationships.
12. Traditional data comes in manageable volumes; big data comes in huge volumes that become unmanageable.
13. Traditional data is easy to manage and manipulate; big data is difficult to manage and manipulate.
14. Traditional data sources include ERP transaction data, CRM transaction data, financial data, organizational data and web transaction data; big data sources include social media, device data, sensor data, video, images and audio.
15. Traditional data can be handled with standard database tools; big data, being distributed, must be managed with special distributed tools.
8. Types of Big Data
• Unstructured
• Quasi-Structured
• Semi-Structured
• Structured
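The distinction among these types can be illustrated with a short sketch (the sample records below are invented for illustration; they are not from the slides):

```python
import csv
import io
import json

# Structured: a fixed schema where every record has the same fields
# (e.g. an RDBMS table or a CSV file).
csv_text = "id,name,age\n1,Asha,34\n2,Ravi,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but fields may vary per record
# (e.g. JSON or XML documents).
json_records = [
    json.loads('{"id": 1, "name": "Asha", "tags": ["vip"]}'),
    json.loads('{"id": 2, "name": "Ravi"}'),  # no "tags" field: the schema varies
]

# Unstructured: no inherent data model; a tweet, image, or audio clip
# must be mined before any structure emerges.
tweet = "Loving the new phone!! #happy"

first_name = rows[0]["name"]                 # fields addressed via the fixed schema
ravi_tags = json_records[1].get("tags", [])  # optional fields need defensive access
```

Quasi-structured data (such as web clickstream logs) sits between these extremes: it has some pattern, but needs effort and tools to parse into a usable form.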
12. Characteristics of Big Data
The five characteristics that define Big Data are Volume, Velocity, Variety, Veracity and Value.
VOLUME
• Volume refers to the amount of data, which is growing at a very fast pace day by day.
• The size of the data generated by humans, machines and their interactions on social media alone is massive.
• Researchers predicted that 40 zettabytes (40,000 exabytes) of data would be generated by 2020, a 300-fold increase over 2005.
13. Characteristics of Big Data
VELOCITY
• Velocity is the pace at which different sources generate data every day. This flow of data is massive and continuous.
• Facebook, for example, reported about 1.03 billion daily active users (DAU) on mobile, an increase of 22% year over year.
• This shows how fast the number of users on social media is growing and how fast data is generated daily.
• If we can handle the velocity, we can generate insights and take decisions based on real-time data.
14. VARIETY
• As many sources contribute to Big Data, the types of data they generate differ: data can be structured, semi-structured or unstructured.
• Hence, a variety of data is generated every day.
• Earlier we got data from spreadsheets and databases; now data arrives as images, audio, video, sensor data and more.
• This variety of largely unstructured data creates problems in capturing, storing, mining and analyzing the data.
15. VERACITY
• Veracity refers to doubt or uncertainty about the available data, due to data inconsistency and incompleteness.
• For example, a table may have missing values, and some values may be hard to accept, such as a "minimum" of 15,000 where far smaller values are expected.
• This inconsistency and incompleteness is veracity.
• Available data can get messy and may be difficult to trust.
• With many forms of big data, quality and accuracy are difficult to control, as with Twitter posts full of hashtags, abbreviations, typos and colloquial speech.
• Volume is often the reason behind the lack of quality and accuracy in the data.
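A minimal sketch of such a veracity check, run before analysis (the records and field names below are made up):

```python
# Flag records with missing or implausible fields before analysis.
records = [
    {"user": "a", "min_spend": 20, "max_spend": 120},
    {"user": "b", "min_spend": None, "max_spend": 80},    # missing value
    {"user": "c", "min_spend": 15000, "max_spend": 90},   # minimum exceeds maximum
]

def veracity_issues(rec):
    """Return a list of data-quality problems found in one record."""
    issues = []
    if rec["min_spend"] is None or rec["max_spend"] is None:
        issues.append("missing value")
    elif rec["min_spend"] > rec["max_spend"]:
        issues.append("min exceeds max")
    return issues

flagged = {r["user"]: veracity_issues(r) for r in records}
```

At big data scale, the same idea has to be automated across millions of records, which is why quality checks belong in the pipeline rather than in manual review.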
16. VALUE
• It is all well and good to have access to big data, but unless we can turn it into value it is useless.
• Turning it into value means asking: is it adding to the benefit of the organization analyzing it? Is the organization achieving a high ROI (return on investment) from its Big Data work?
• Unless it adds to their profits, working on Big Data is pointless.
17. Applications of Big Data
• Smarter healthcare: making use of petabytes of patients' data, an organization can extract meaningful information and build applications that predict a patient's deteriorating condition in advance.
• Telecom: the telecom sector collects information, analyzes it and provides solutions to different problems. Using Big Data applications, telecom companies have been able to significantly reduce data packet loss, which occurs when networks are overloaded, thus providing a seamless connection to their customers.
18. Applications of Big Data
• Retail: retail has some of the tightest margins and is one of the greatest beneficiaries of big data. The beauty of using big data in retail is understanding consumer behavior; Amazon's recommendation engine, for example, provides suggestions based on a consumer's browsing history.
• Traffic control: traffic congestion is a major challenge for many cities globally. Effective use of data and sensors will be key to managing traffic better as cities become increasingly densely populated.
19. Applications of Big Data
• Manufacturing: analyzing big data in the manufacturing industry can reduce component defects, improve product quality, increase efficiency, and save time and money.
• Search quality: every time we extract information from Google, we simultaneously generate data for it. Google stores this data and uses it to improve its search quality.
20. The Big Data Technology Landscape: NoSQL (Not Only SQL)
• The term NoSQL was first coined by Carlo Strozzi in 1998 to name his lightweight, open-source, non-relational database that did not expose the standard SQL interface.
• NoSQL, originally referring to "non-SQL" or "non-relational", describes databases that provide a mechanism for storing and retrieving data modeled in means other than the tabular relations used in relational databases.
• NoSQL databases are used in real-time web applications and big data, and their use is increasing over time.
• NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.
21. The Big Data Technology Landscape: NoSQL (Not Only SQL)
• NoSQL databases offer simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability.
• The data structures used by NoSQL databases differ from those used by default in relational databases, which makes some operations faster in NoSQL.
• The suitability of a given NoSQL database depends on the problem it must solve.
• The data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables.
22. The Big Data Technology Landscape: NoSQL (Not Only SQL)
• The concept of NoSQL databases became popular with Internet giants such as Google, Facebook and Amazon, who deal with huge volumes of data.
• System response time becomes slow when an RDBMS is used for massive volumes of data.
• To resolve this problem, we could "scale up" our systems by upgrading the existing hardware, but this process is expensive.
• The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
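A toy sketch of the scaling-out idea: partition keys across hosts by hashing. The host names are hypothetical, and real NoSQL systems use more robust schemes (such as consistent hashing) so that adding a node does not remap most keys:

```python
import hashlib

# Hypothetical pool of database hosts; adding a host spreads the load further.
HOSTS = ["db-node-1", "db-node-2", "db-node-3"]

def host_for(key: str) -> str:
    """Route a key to one host by hashing it (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return HOSTS[int(digest, 16) % len(HOSTS)]

# Every lookup for the same key is routed to the same host,
# so reads and writes for that key stay consistent within one shard.
owner = host_for("user:42")
```

The drawback of plain modulo sharding is that changing the host count remaps almost every key; consistent hashing limits that movement, which is why production systems prefer it.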
23. The Big Data Technology Landscape: NoSQL (Not Only SQL)
NoSQL databases are non-relational and designed with web applications in mind, so they scale out better than relational databases.
24. Advantages of NoSQL
1. Can easily scale up and down: NoSQL databases support rapid, elastic scaling and allow scaling into the cloud.
• Cluster scale: distribution of the database across 100+ nodes, often in multiple data centers.
• Performance scale: sustained 100,000+ database reads and writes per second.
• Data scale: housing 1 billion+ documents in the database.
2. Doesn't require a pre-defined schema: NoSQL does not require adherence to a pre-defined schema.
25. Advantages of NoSQL
3. It is pretty flexible: in MongoDB, for example, the documents in a collection can have different sets of key-value pairs.
4. Cheap and easy to implement: deploying NoSQL properly delivers its benefits (high availability, fault tolerance, etc.) while also lowering operational costs.
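The flexibility point can be sketched with plain Python dictionaries standing in for documents in one collection (no real MongoDB instance is involved, and the field names are invented):

```python
# Two "documents" in the same collection with different sets of key-value pairs,
# mimicking the schema flexibility of a document store such as MongoDB.
collection = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phone": "555-0100", "tags": ["new"]},
]

# No schema migration is needed to add a field to just one document.
collection[0]["loyalty_points"] = 120

# Each document carries its own set of fields.
field_sets = [set(doc) for doc in collection]
```

In a relational table, the same change would require an ALTER TABLE and a NULL or default value for every other row; here, only the affected document changes.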
26. Types of NoSQL Databases
• Key-value Pair Based
• Column-oriented
• Graph based
• Document-oriented
27. Types of NoSQL Databases
Key-Value Pair Based
• Data is stored in key/value pairs, designed to handle lots of data and heavy load.
• Key-value stores keep data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
• This is one of the most basic kinds of NoSQL database. It is used for collections, dictionaries, associative arrays, etc., and helps developers store schema-less data.
• Key-value stores work well for data such as shopping-cart contents.
• Redis, Amazon DynamoDB and Riak are examples of key-value databases; DynamoDB and Riak draw on Amazon's Dynamo paper.
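A minimal in-memory sketch of the key-value model, using the classic shopping-cart example (the key format and cart fields are made up; a real store such as Redis would hold the data out of process):

```python
import json

# The store maps unique keys to opaque values; here values are JSON strings,
# so the store itself needs no knowledge of their structure.
store = {}

def put(key: str, value) -> None:
    store[key] = json.dumps(value)  # serialize; the store treats it as a blob

def get(key: str):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Shopping-cart contents keyed by session id, a classic key-value use case.
put("cart:session-123", {"items": [{"sku": "A1", "qty": 2}]})
cart = get("cart:session-123")
```

All access goes through the key; there is no query over the values themselves, which is exactly the trade-off that lets key-value stores scale so simply.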
29. Column-Based
• Column-oriented databases work on columns and are based on Google's Bigtable paper.
• Every column is treated separately; the values of a single column are stored contiguously.
• They deliver high performance on aggregation queries such as SUM, COUNT, AVG and MIN, as the data is readily available in a column.
• Column-based NoSQL databases are widely used for data warehouses, business intelligence, CRM and library card catalogs.
• HBase, Cassandra and Hypertable are examples of column-based databases.
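Why contiguous column storage helps aggregation can be shown with a toy table (the sales figures are invented):

```python
# The same toy sales table in row layout and in column layout.
rows = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 50},
]

# A column store keeps each column's values contiguously, so an aggregate
# like SUM(amount) scans one compact array instead of every whole row.
columns = {
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

total = sum(columns["amount"])            # touches only the "amount" column
average = total / len(columns["amount"])
```

In a row store, the same SUM would have to read past every region value too; at billions of rows, skipping unneeded columns is the performance win the slide describes.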
30. Document-Oriented
• A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document.
• The document is stored in JSON or XML format.
• The value is understood by the database and can be queried.
31. Types of NoSQL Databases
• In a relational database we have rows and columns; a document database has a structure similar to JSON.
• For a relational database, we have to know in advance what columns we have, and so on.
• For a document database, we store data as JSON-like objects and do not need to define the structure up front, which makes it flexible.
• The document type is mostly used for CMSs, blogging platforms, real-time analytics and e-commerce applications.
• It should not be used for complex transactions that require multiple operations, or for queries against varying aggregate structures.
• Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented DBMSs.
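Unlike a key-value store, a document store can query inside the value. A toy sketch with a list of dicts standing in for a collection (the posts and fields are invented; real stores such as MongoDB provide far richer query operators):

```python
# A toy document "collection" queried by field contents.
posts = [
    {"_id": 1, "title": "Hello", "tags": ["intro"], "views": 10},
    {"_id": 2, "title": "NoSQL notes", "tags": ["db", "nosql"], "views": 42},
    {"_id": 3, "title": "More notes", "views": 7},  # no "tags" field at all
]

def find(collection, **criteria):
    """Return documents whose fields match all the given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

hits = find(posts, title="NoSQL notes")
```

Because the database understands the document's structure, queries like this run server-side; documents lacking a queried field are simply skipped, not treated as errors.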
32. Graph-Based
• A graph database stores entities as well as the relations among those entities. An entity is stored as a node, with relationships as edges.
• An edge gives the relationship between nodes; every node and edge has a unique identifier.
• Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.
• Traversing relationships is fast because they are already captured in the database, and there is no need to compute them.
• Graph databases are mostly used for social networks, logistics and spatial data.
• Neo4j, InfiniteGraph, OrientDB and FlockDB are popular graph-based databases.
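The traversal point can be sketched with an adjacency list, the structure graph databases use instead of join computations (the social network below is made up):

```python
# A tiny graph: nodes with ids, edges stored as an adjacency list,
# so following a relationship is a direct lookup rather than a join.
edges = {
    "alice": ["bob", "carol"],   # alice follows bob and carol
    "bob":   ["carol"],
    "carol": ["dave"],
    "dave":  [],
}

def second_degree(node):
    """Friends-of-friends: follow each edge from the node, then one more hop."""
    found = set()
    for neighbour in edges.get(node, []):
        found.update(edges.get(neighbour, []))
    return sorted(found)

fof = second_degree("alice")
```

In a relational schema the same query needs a self-join of a followers table per hop; in a graph store each hop is just edge-chasing, which is why multi-hop queries stay fast.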