Big Data Technologies
Presented By:
Pulkit Narwal
What are Big Data Technologies?
• Big data technologies are the software tools used for data mining,
data storage, data sharing, and data visualization. The term covers
the data itself, the data frameworks, and the tools and techniques
used to investigate and transform data.
• A big data technology can be defined as a software utility designed
to analyse, process, and extract information from extremely large and
complex data sets that traditional data processing software cannot
handle.
Types of Big Data Technologies:
• Big Data Technology is mainly classified into two types:
1. Operational Big Data Technologies
2. Analytical Big Data Technologies
Operational Big Data Technologies
• Operational big data is the normal day-to-day data that we generate:
online transactions, social media activity, or the data held by a
particular organisation. It can be thought of as the raw data that feeds
the analytical big data technologies.
• A few examples of operational big data are as follows:
Online ticket bookings, including rail tickets, flight tickets, and movie
tickets.
Online shopping on sites such as Amazon, Flipkart, Walmart, and Snapdeal.
Data from social media platforms such as Facebook, Instagram, and WhatsApp.
The employee details of a multinational company.
Analytical Big Data Technologies
• Analytical big data is the more advanced side of big data
technologies and is somewhat more complex than operational big data.
In short, this is where the actual analysis takes place and where
crucial real-time business decisions are made by analysing the
operational big data.
• A few examples of analytical big data are as follows:
Stock market analysis.
Carrying out space missions, where every single bit of information is
crucial.
Weather forecasting.
Medical fields, where a particular patient's health status can be monitored.
Types of Big Data Technologies
• The top big data technologies can be grouped into four fields:
1. Data Storage
2. Data Mining
3. Data Analytics
4. Data Visualization
Source: javatpoint
Data Storage
1. The Hadoop framework was designed to store and process data in
a distributed data processing environment on commodity
hardware, using a simple programming model. It can store and
analyse data spread across many machines at high speed
and low cost.
• Developed by: Apache Software Foundation (first released in 2006; version 1.0 followed in December 2011)
• Written in: Java
• Current stable version: Hadoop 3.1.1 (as of this presentation)
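The simple programming model Hadoop is known for is MapReduce. As a rough illustration only, the single-process Python sketch below mimics the map, shuffle, and reduce steps for a word count. The function names are made up for this example, and a real Hadoop job would spread each step across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

Because the map and reduce functions only ever see one line or one key at a time, the framework is free to run many copies of them in parallel on different machines.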
Data Storage
2. MongoDB: NoSQL document databases such as MongoDB offer a
direct alternative to the rigid schema used in relational databases.
This gives MongoDB the flexibility to handle a wide
variety of data types at large volumes and across distributed
architectures.
• Developed by: MongoDB Inc. (first released on 11 February 2009)
• Written in: C++, Go, JavaScript, Python
• Current stable version: MongoDB 4.0.10
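To make the idea of a flexible schema concrete, here is a minimal sketch that uses plain Python dictionaries in place of a real MongoDB collection. The `collection` data and the `find` helper are hypothetical and only imitate the spirit of a document query; they are not the MongoDB driver API.

```python
import json

# Two "documents" in the same hypothetical collection. They share no fixed
# schema: the second one has nested and list-valued fields the first lacks.
collection = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0100"], "address": {"city": "Pune"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(json.dumps(find(collection, name="Ravi"), indent=2))
```

In a relational table both rows would need the same columns; here each document simply carries the fields it needs.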
Data Storage
3. RainStor is a software company that developed a database
management system of the same name, designed to manage and
analyse big data for large enterprises. It uses deduplication
techniques to store large amounts of reference data
compactly.
• Developed by: RainStor, a software company founded in 2004
• Works like: SQL
• Current stable version: RainStor 5.5
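The slide does not say how RainStor deduplicates data, so the following is only a generic sketch of hash-based deduplication: identical values are stored once and referenced by their digest. The `DedupStore` class is hypothetical and is not RainStor's actual design.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one physical copy of each distinct value."""

    def __init__(self):
        self.blocks = {}    # digest -> stored bytes (one copy per distinct value)
        self.records = []   # digests in insertion order (one entry per record)

    def put(self, value: bytes) -> str:
        """Store a record and return the digest that identifies its content."""
        digest = hashlib.sha256(value).hexdigest()
        self.blocks.setdefault(digest, value)  # only stored if not seen before
        self.records.append(digest)
        return digest

    def get(self, index: int) -> bytes:
        """Return the record at the given insertion position."""
        return self.blocks[self.records[index]]


if __name__ == "__main__":
    store = DedupStore()
    for value in (b"status=OK", b"status=FAIL", b"status=OK"):
        store.put(value)
    print(len(store.records), "records,", len(store.blocks), "stored blocks")
```

The more repetitive the data, the larger the gap between the number of logical records and the number of blocks physically kept.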
Data Storage
4. Hunk lets you access data in remote Hadoop clusters through
virtual indexes and lets you use the Splunk Search Processing
Language to analyse your data. With Hunk, you can report on and
visualize large amounts of data from your Hadoop and NoSQL data sources.
• Developed by: Splunk Inc. in 2013
• Written in: Java
• Current stable version: Splunk Hunk 6.2
Data Storage
5. Cassandra: Apache Cassandra is a popular free and open-source
NoSQL database. It is distributed, uses a wide-column storage
model, and efficiently handles data across large clusters of
commodity hardware, providing high availability with no single
point of failure.
• Its main features include a distributed design, scalability, fault
tolerance, MapReduce support, tunable consistency, its own query
language (CQL), multi-data-center replication, and eventual
consistency.
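Tunable consistency is commonly explained with a simple rule: if the number of replicas that must acknowledge a read plus the number that must acknowledge a write exceeds the replication factor, every read overlaps the latest write. The helper names below are illustrative and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 out of 3."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when read and write replica sets must overlap (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum writes and quorum reads overlap on at least one replica.
    print("QUORUM/QUORUM:", reads_see_latest_write(n, q, q))
    # Writing to one replica and reading from one may return stale data.
    print("ONE/ONE:", reads_see_latest_write(n, 1, 1))
```

Lower settings trade that guarantee for lower latency and higher availability, which is where eventual consistency comes in.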
Data Mining
1. Presto is an open-source distributed SQL query engine for
running interactive analytic queries against data sources of all
sizes, from gigabytes to petabytes. Presto can query
data in Hive, Cassandra, relational databases, and proprietary data
stores.
• Developed by: Facebook (open-sourced in 2013)
• Written in: Java
• Current stable version: Presto 0.22
Data Mining
2. RapidMiner is a centralized solution with a powerful
and robust graphical user interface that lets users create,
deliver, and maintain predictive analytics. It supports building
advanced workflows and offers scripting support in several languages.
• Developed by: RapidMiner in 2001
• Written in: Java
• Current stable version: RapidMiner 9.2
Data Mining
3. Elasticsearch is a search engine based on the Lucene library. It
provides a distributed, multitenant-capable, full-text search engine
with an HTTP web interface and schema-free JSON documents.
• Developed by: Elastic NV (first released in 2010; the company was founded in 2012)
• Written in: Java
• Current stable version: Elasticsearch 7.1
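Full-text search engines built on Lucene rely on an inverted index, which maps each term to the documents that contain it. The sketch below is a deliberately simplified illustration with hypothetical helper names; real engines add analyzers, relevance scoring, and distribution across shards.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map every term to the set of document ids whose body contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokenize(doc["body"]):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing all query terms (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: {"body": "Distributed full-text search"},
        2: {"body": "Schema-free JSON documents"},
        3: {"body": "Full-text search over JSON documents"},
    }
    print(sorted(search(build_index(docs), "json documents")))
```

Looking a term up in the index is far cheaper than scanning every document, which is what makes interactive search over large collections practical.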
Data Analytics
1. Apache Kafka is a distributed streaming platform. A streaming
platform has three key capabilities:
i. Publishing and subscribing to streams of records
ii. Storing streams of records durably and fault-tolerantly
iii. Processing streams of records as they occur
The first capability is similar to a message queue or an enterprise messaging system.
• Developed by: originally LinkedIn; open-sourced in 2011 and now an Apache Software Foundation project
• Written in: Scala, Java
• Current stable version: Apache Kafka 2.2.0
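One way to picture how a streaming platform differs from a classic message queue is an append-only log that each consumer reads at its own offset. The in-memory sketch below assumes a single process and a single partition; the `Topic` and `Consumer` classes are illustrative and are not the Kafka client API.

```python
class Topic:
    """Append-only log of records; records are never removed on read."""

    def __init__(self):
        self._log = []

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._log[offset:offset + max_records]


class Consumer:
    """Subscriber that tracks its own position in the topic."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Fetch records not yet seen by this consumer and advance its offset."""
        records = self.topic.read(self.offset)
        self.offset += len(records)
        return records


if __name__ == "__main__":
    topic = Topic()
    topic.publish("order-created")
    topic.publish("order-paid")
    fast, slow = Consumer(topic), Consumer(topic)
    print(fast.poll())   # both records
    topic.publish("order-shipped")
    print(fast.poll())   # only the new record
    print(slow.poll())   # all three, read independently of the other consumer
```

Because reading does not delete anything, several independent consumers can process the same stream, and a consumer can be restarted from an earlier offset.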
Data Analytics
2. Splunk captures, indexes, and correlates real-time data in a
searchable repository, from which it can generate graphs, reports,
alerts, dashboards, and data visualizations. It is also used for
application management, security and compliance, and
business and web analytics.
• Developed by: Splunk Inc. (founded in 2003)
• Written in: AJAX, C++, Python, XML
• Current stable version: Splunk 7.3
Data Analytics
3. KNIME allows users to visually create data flows, selectively
execute some or all analysis steps, and inspect the results, models,
and interactive views. KNIME is written in Java, is based on Eclipse,
and uses the Eclipse extension mechanism to add plugins that provide
additional functionality.
• Developed by: KNIME (first released in 2006; the company was founded in 2008)
• Written in: Java
• Current stable version: KNIME 3.7.2
Data Analytics
4. Spark provides in-memory computing capabilities to deliver speed,
a generalized execution model to support a wide variety of
applications, and Java, Scala, and Python APIs for ease of
development.
• Developed by: Apache Software Foundation (originally UC Berkeley's AMPLab)
• Written in: Scala (with APIs for Java, Python, and R)
• Current stable version: Apache Spark 2.4.3
Data Analytics
5. R is a programming language and free software environment
for statistical computing and graphics. The R language is widely
used among statisticians and data miners for developing statistical
software and, above all, for data analysis.
• Developed by: Ross Ihaka and Robert Gentleman, now supported by the R Foundation; version 1.0.0 was released on 29 February 2000
• Written in: C, Fortran, and R
• Current stable version: R 3.6.0
Data Analytics
6. Blockchain, which is used in essential functions such as payment,
escrow, and title, can also reduce fraud, increase financial privacy,
speed up transactions, and internationalize markets. Blockchain can be
used to achieve the following in a business network environment:
  • Shared ledger: an append-only, distributed system of record shared across a business
network.
  • Smart contract: business terms are embedded in the transaction database and executed
with transactions.
  • Privacy: visibility is restricted appropriately, and transactions are secure, authenticated, and
verifiable.
  • Consensus: all parties in a business network agree to network-verified transactions.
• First popularized by: Bitcoin (launched in 2009)
• Written in: varies by implementation, for example JavaScript, C++, and Python
• Current generation: often referred to as Blockchain 4.0
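The shared-ledger idea can be sketched as a list of blocks in which each block stores the hash of the one before it, so changing an old record breaks every later link. This is a minimal single-node illustration with hypothetical function names; it leaves out consensus, signatures, and networking entirely.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash the block contents in a deterministic JSON encoding."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "tx": transactions}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev": prev_hash,
        "tx": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    prev_hash = GENESIS_PREV
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["tx"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_block(ledger, ["A pays B 5"])
    append_block(ledger, ["B pays C 2"])
    print(is_valid(ledger))            # chain is intact
    ledger[0]["tx"] = ["A pays B 500"]
    print(is_valid(ledger))            # tampering is detected
```

In a real network, many parties hold copies of the ledger and a consensus protocol decides which new block is accepted.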
Data Visualization
1. Tableau is a powerful and fast-growing data visualization tool
used in the business intelligence industry. Data analysis is very fast
with Tableau, and the visualizations it creates take the form of
dashboards and worksheets.
• Developed by: Tableau Software (founded in 2003; publicly listed on 17 May 2013)
• Written in: Java, C++, Python, C
• Current stable version: Tableau 8.2
Data Visualization
2. Plotly: Mainly used to make creating graphs faster and more
efficient. It offers API libraries for Python, R, MATLAB, Node.js,
Julia, and Arduino, as well as a REST API. Plotly can also be used to style
interactive graphs in Jupyter notebooks.
• Developed by: Plotly in 2012
• Written in: JavaScript
• Current stable version: Plotly 1.47.4

  • 2. What are Big Data Technologies? • Actually, Big Data Technologies is the utilized software that incorporates data mining, data storage, data sharing, and data visualization, the comprehensive term embraces data, data framework including tools and techniques used to investigate and transform data. • Big Data Technology can be defined as a Software-Utility that is designed to Analyse, Process and Extract the information from an extremely complex and large data sets which the Traditional Data Processing Software could never deal with.
  • 3. Types of Big Data Technologies: • Big Data Technology is mainly classified into two types: 1. Operational Big Data Technologies 2. Analytical Big Data Technologies
  • 4. Operational Big Data Technologies • The Operational Big Data is all about the normal day to day data that we generate. This could be the Online Transactions, Social Media, or the data from a Particular Organisation etc. You can even consider this to be a kind of Raw Data which is used to feed the Analytical Big Data Technologies. • A few examples of Operational Big Data Technologies are as follows: Online ticket bookings, which includes your Rail tickets, Flight tickets, movie tickets etc. Online shopping which is your Amazon, Flipkart, Walmart, Snap deal and many more. Data from social media sites like Facebook, Instagram, what’s app and a lot more. The employee details of any Multinational Company.
  • 5. Analytical Big Data Technologies • Analytical Big Data is like the advanced version of Big Data Technologies. It is a little complex than the Operational Big Data. In short, Analytical big data is where the actual performance part comes into the picture and the crucial real-time business decisions are made by analyzing the Operational Big Data. • Few examples of Analytical Big Data Technologies are as follows: Stock marketing Carrying out the Space missions where every single bit of information is crucial. Weather forecast information. Medical fields where a particular patients health status can be monitored.
  • 6. Types of Big Data Technologies • Top big data technologies are divided into 4 fields which are classified as follows: 1. Data Storage 2. Data Mining 3. Data Analytics 4. Data Visualization Source: javatpoint
  • 7. Data Storage 1. Hadoop: The Hadoop framework was designed to store and process data in a distributed data processing environment on commodity hardware, using a simple programming model. It can store and analyse data spread across different machines at high speed and low cost. • Developed by: Apache Software Foundation, on 10 December 2011 • Written in: Java • Current stable version: Hadoop 3.1.1
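The simple programming model most often associated with Hadoop is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) on a single machine for a word count. It is a plain-Python illustration, not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(chunks):
    pairs = []
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big", "data tools"])
print(counts)
```

Because each chunk is mapped independently, the map step can in principle be spread across many machines, which is the property the slide's "different machines" claim relies on.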
  • 8. Data Storage 2. MongoDB: NoSQL document databases like MongoDB offer a direct alternative to the rigid schema used in relational databases. This gives MongoDB the flexibility to handle a wide variety of data types at large volumes and across distributed architectures. • Developed by: MongoDB, on 11 February 2009 • Written in: C++, Go, JavaScript, Python • Current stable version: MongoDB 4.0.10
  • 9. Data Storage 3. RainStor: RainStor is a software company that developed a database management system of the same name, designed to manage and analyse Big Data for large enterprises. It uses deduplication techniques to organize the process of storing large amounts of data for reference. • Developed by: RainStor Software company in the year 2004 • Works like: SQL • Current stable version: RainStor 5.5
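To illustrate the general idea of deduplication, here is a minimal sketch that stores each distinct field value once and lets records refer to it by hash. This is a generic content-hash scheme chosen for illustration; it is not RainStor's actual algorithm, and the `DedupStore` class is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy store that keeps each distinct field value only once."""

    def __init__(self):
        self.blobs = {}    # digest -> unique value, stored once
        self.records = []  # each record is a list of digests

    def add_record(self, fields):
        digests = []
        for value in fields:
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            # setdefault stores the value only if this digest is new
            self.blobs.setdefault(digest, value)
            digests.append(digest)
        self.records.append(digests)

    def get_record(self, index):
        """Rebuild the original record from its stored digests."""
        return [self.blobs[d] for d in self.records[index]]


store = DedupStore()
store.add_record(["2019-06-01", "London", "OK"])
store.add_record(["2019-06-01", "Paris", "OK"])
print(len(store.blobs), store.get_record(1))
```

The two records contain six field values in total, but only the distinct ones are stored; repeated values such as the date and the status are shared.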
  • 10. Data Storage 4. Hunk: Hunk lets you access data in remote Hadoop clusters through virtual indexes and lets you use the Splunk Search Processing Language to analyse your data. With Hunk, you can report on and visualize large amounts of data from your Hadoop and NoSQL data sources. • Developed by: Splunk Inc. in the year 2013 • Written in: Java • Current stable version: Splunk Hunk 6.2
  • 11. Data Storage 5. Cassandra: Cassandra is a top choice among popular NoSQL databases. It is a free, open-source, distributed, wide-column store that can efficiently handle data on large commodity clusters, providing high availability with no single point of failure. • Its main features include its distributed nature, scalability, fault tolerance, MapReduce support, tunable consistency, a query language, multi-data-center replication, and eventual consistency.
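Tunable consistency comes down to simple arithmetic: with a replication factor RF, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > RF. The sketch below checks that rule; the helper names `quorum` and `is_strongly_consistent` are hypothetical and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # quorum size for RF = 3
print(is_strongly_consistent(q, q, rf))   # quorum reads with quorum writes
print(is_strongly_consistent(1, 1, rf))   # one replica for reads and writes
```

Reading and writing with a single replica favours latency and availability and leaves the cluster eventually consistent, while quorum reads with quorum writes trade some latency for reads that always see the latest acknowledged write.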
  • 12. Data Mining 1. Presto: Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto allows querying data in Hive, Cassandra, relational databases, and proprietary data stores. • Developed by: Facebook, open-sourced in the year 2013 • Written in: Java • Current stable version: Presto 0.22
  • 13. Data Mining 2. RapidMiner: RapidMiner is a centralized solution with a powerful and robust graphical user interface that enables users to create, deliver, and maintain predictive analytics. It supports building advanced workflows and scripting in several languages. • Developed by: RapidMiner in the year 2001 • Written in: Java • Current stable version: RapidMiner 9.2
  • 14. Data Mining 3. Elasticsearch: Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable, full-text search engine with an HTTP web interface and schema-free JSON documents. • Developed by: Elastic NV in the year 2012 • Written in: Java • Current stable version: Elasticsearch 7.1
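Full-text search over schema-free documents is typically built on an inverted index, the data structure Lucene is known for. The sketch below builds a tiny one in plain Python. It is only an illustration of the idea, not the Elasticsearch API; `build_index`, `search`, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map every lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Documents are schema-free: each may carry different fields.
        for field_value in doc.values():
            for token in str(field_value).lower().split():
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    1: {"title": "Big data storage", "year": 2019},
    2: {"title": "Data visualization tools"},
}
index = build_index(docs)
print(search(index, "data storage"))
```

Looking up a token is a dictionary access rather than a scan of every document, which is why this structure scales to large text collections.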
  • 15. Data Analytics 1. Apache Kafka: Apache Kafka is a distributed streaming platform. A streaming platform has three key capabilities: i. Publishing and subscribing to streams of records ii. Storing streams of records durably iii. Processing streams of records as they occur. The first capability is similar to a message queue or an enterprise messaging system. • Developed by: Apache Software Foundation in the year 2011 • Written in: Scala, Java • Current stable version: Apache Kafka 2.2.0
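The comparison with a message queue can be made concrete with a toy in-memory log. Unlike a classic queue, records stay in the log after being read, and each consumer tracks its own position (offset). This is a single-process sketch under that assumption, not the Kafka client API; the `TopicLog` class and the consumer names are hypothetical.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with an independent read offset per consumer."""

    def __init__(self):
        self.records = []                # retained even after being read
        self.offsets = defaultdict(int)  # consumer name -> next offset

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer."""
        start = self.offsets[consumer]
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for event in ["order-1", "order-2", "order-3"]:
    log.publish(event)

first = log.poll("billing", max_records=2)
second = log.poll("billing")
audit = log.poll("audit")
print(first, second, audit)
```

The `audit` consumer still receives all three records even though `billing` has already read them, because reading only advances a consumer's own offset.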
  • 16. Data Analytics 2. Splunk: Splunk captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and data visualizations. It is also used for application management, security and compliance, as well as business and web analytics. • Developed by: Splunk Inc., on 6 May 2014 • Written in: AJAX, C++, Python, XML • Current stable version: Splunk 7.3
  • 17. Data Analytics 3. KNIME: KNIME allows users to visually create data flows, selectively execute some or all analysis steps, and inspect the results, models, and interactive views. KNIME is written in Java, is based on Eclipse, and uses the Eclipse extension mechanism to add plugins that provide additional functionality. • Developed by: KNIME in the year 2008 • Written in: Java • Current stable version: KNIME 3.7.2
  • 18. Data Analytics 4. Spark provides In-Memory Computing capabilities to deliver Speed, a Generalized Execution Model to support a wide variety of applications, and Java, Scala, and Python APIs for ease of development. • Developed by: Apache Software Foundation • Written in: Java, Scala, Python, R • Current stable version: Apache Spark 2.4.3
  • 19. Data Analytics 5. R: R is a programming language and free software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. • Developed by: R Foundation, on 29 February 2000 • Written in: C, Fortran, R • Current stable version: R 3.6.0
  • 20. Data Analytics 6. Blockchain: Blockchain, used in essential functions such as payment, escrow, and title, can also reduce fraud, increase financial privacy, speed up transactions, and internationalize markets. Blockchain can be used to achieve the following in a business network environment: • Shared Ledger: an append-only distributed system of record shared across a business network. • Smart Contract: business terms are embedded in the transaction database and executed with transactions. • Privacy: ensuring appropriate visibility, so that transactions are secure, authenticated, and verifiable. • Consensus: all parties in a business network agree to network-verified transactions. • Developed by: Bitcoin • Written in: JavaScript, C++, Python • Current stable version: Blockchain 4.0
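The shared ledger idea rests on a simple mechanism: each block stores the hash of the block before it, so changing an old entry invalidates the chain. The sketch below shows only this hash chaining on one machine; it leaves out consensus, networking, and smart contracts, and the helper names `block_hash`, `append_block`, and `is_valid` are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash the block contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a block that is linked to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid(chain):
    """Check every link and recompute every block hash."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, ["alice pays bob 5"])
append_block(ledger, ["bob pays carol 2"])
valid_before = is_valid(ledger)

ledger[0]["transactions"] = ["alice pays bob 500"]  # tamper with history
valid_after = is_valid(ledger)
print(valid_before, valid_after)
```

Tampering with the first block is detected because its recomputed hash no longer matches the stored one, which is what makes the ledger effectively append-only.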
  • 21. Data Visualization 1. Tableau: Tableau is a powerful and fast-growing data visualization tool used in the business intelligence industry. Data analysis is very fast with Tableau, and the visualizations it creates take the form of dashboards and worksheets. • Developed by: Tableau, on 17 May 2013 • Written in: Java, C++, Python, C • Current stable version: Tableau 8.2
  • 22. Data Visualization 2. Plotly: Plotly is mainly used to make creating graphs faster and more efficient. It provides API libraries for Python, R, MATLAB, Node.js, Julia, and Arduino, as well as a REST API. Plotly can also be used to style interactive graphs in Jupyter notebooks. • Developed by: Plotly in the year 2012 • Written in: JavaScript • Current stable version: Plotly 1.47.4