Presentation On
Big Data Open Source
Technologies
Presented By:
Neeraj Rathore
What is Big Data?
• Big Data refers to the large
amounts of data that pour in
from various sources in
different formats (structured,
semi-structured and unstructured).
• Because of the volume and varied
nature of this data, traditional
relational database systems
are poorly suited to handling it.
What are Big Data Technologies & Why
are They Needed?
• A big data technology can be defined as a software utility designed to
analyse, process and extract information from extremely large and
complex data sets that traditional data processing software
cannot handle.
• We need big data processing technologies to analyse these
huge volumes of data, often in real time, and to draw
conclusions and make predictions that reduce future
risk.
Top Big Data Technologies
Top big data technologies are divided into four fields based on their usage:
• Data Storage: Big data storage is a storage infrastructure designed
specifically to store, manage and retrieve massive amounts of data.
It enables quick processing and retrieval of large quantities of data.
• Data Analytics: Data analytics is the process of inspecting, cleansing,
transforming and modelling data with the goal of discovering useful
information, informing conclusions and supporting decision making.
• Data Mining: Data mining involves exploring and analysing large amounts of
data to find patterns. The goal of data mining is typically either
classification or prediction.
• Data Visualisation: Data visualisation is the practice of translating
information into a visual context, such as a map or graph, to make data
easier for the human brain to understand.
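The four analytics steps named above (inspecting, cleansing, transforming, modelling) can be sketched on a tiny in-memory data set. This is only an illustration: the sensor records, the field names and the choice of a mean as the "model" are all assumptions made for the example, not part of the original slides.

```python
from statistics import mean

# Hypothetical raw readings; some values are missing or malformed.
raw_records = [
    {"sensor": "a", "temp_c": "21.5"},
    {"sensor": "b", "temp_c": ""},       # missing value
    {"sensor": "c", "temp_c": "n/a"},    # malformed value
    {"sensor": "d", "temp_c": "23.5"},
]


def parse_temp(text):
    """Return the reading as a float, or None if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def inspect_records(records):
    """Inspect: report how many records carry a usable reading."""
    usable = sum(1 for r in records if parse_temp(r["temp_c"]) is not None)
    return {"total": len(records), "usable": usable}


def cleanse(records):
    """Cleanse: drop records whose reading is missing or malformed."""
    return [r for r in records if parse_temp(r["temp_c"]) is not None]


def transform(records):
    """Transform: convert Celsius readings to Fahrenheit."""
    return [
        {"sensor": r["sensor"], "temp_f": parse_temp(r["temp_c"]) * 9 / 5 + 32}
        for r in records
    ]


def model(records):
    """Model: summarise the data with a simple average."""
    return mean(r["temp_f"] for r in records)


if __name__ == "__main__":
    print(inspect_records(raw_records))
    print(model(transform(cleanse(raw_records))))
```

Real analytics tools perform the same kind of steps, but distributed over many machines and far larger data sets.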
Open Source Big Data Technologies for
Storage & Management
• Apache Hadoop:
• The Apache Hadoop software
library is a big data framework.
Its distributed file system, HDFS,
is used for storing data.
• It allows distributed processing of
data sets across clusters of
computers.
• Developed by: Apache Software
Foundation; first released in 2006,
with version 1.0 following in
December 2011.
• Written in: Java
• Companies using it: Microsoft,
IBM, Intel, MapR, Cloudera,
Hortonworks etc.
• Cassandra:
• Apache Cassandra is a distributed
NoSQL database for managing
large amounts of data.
• Supports replication of data
across multiple data centers for
availability and scalability.
• Offers very good fault tolerance
and low latency.
• Developed by: originally Facebook,
which released it as open source in
July 2008; now an Apache Software
Foundation project.
• Written in: Java
• Companies using it: Netflix,
Walmart, Uber, McDonald's etc.
• MongoDB:
• MongoDB is an open-source,
cross-platform, document-oriented
NoSQL database with many
built-in features.
• Developed by: MongoDB Inc. (then
10gen); first released on 11 Feb 2009.
• Written in: C++, Go, JavaScript,
Python
• Apache HBase:
• Apache HBase is a popular and
highly efficient column-oriented
NoSQL database built on top of
HDFS that supports real-time
read/write operations on large
datasets stored as
key/value data.
• Developed by: Apache Software
Foundation; first released on
28 March 2008.
• Written in: Java
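To make the "column-oriented key/value" idea behind HBase more concrete, here is a toy in-memory sketch. It is not the HBase API: the class name `ToyColumnStore` and its methods are hypothetical, chosen only to show rows addressed by a row key, columns named as `family:qualifier`, and scans over a sorted key range.

```python
from collections import defaultdict


class ToyColumnStore:
    """Hypothetical stand-in: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell, creating the row if it does not exist yet."""
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Read a whole row (as a dict) or a single cell."""
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start, stop):
        """Yield rows with start <= key < stop, in sorted key order."""
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, dict(self._rows[key])


if __name__ == "__main__":
    store = ToyColumnStore()
    store.put("user#002", "info:name", "Bo")
    store.put("user#001", "info:name", "Ada")
    store.put("user#001", "stats:logins", 7)
    print(store.get("user#001"))
    print(list(store.scan("user#001", "user#003")))
```

Rows can have different sets of columns, which is one reason this layout suits sparse, semi-structured data.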
Open Source Big Data Technologies For
Data Analytics
• Apache Spark:
• Open-source big data tool that
fills the gaps in Apache Hadoop's
data processing.
• Spark can handle both batch data
and real-time data.
• Because Spark processes data in
memory, it can be much
faster than traditional disk-based
processing.
• Developed by: originally UC Berkeley's
AMPLab; now an Apache Software
Foundation project.
• Written in: Scala (with APIs for
Java, Python and R)
• Apache Hive:
• It lets programmers analyse
large data sets stored on Hadoop.
• It helps with querying and
managing large datasets using a
SQL-like language (HiveQL).
• Developed by: originally Facebook,
now an Apache Software Foundation
project; first released on 1 Oct 2010.
• Written in: Java
• Hadoop MapReduce:
• Programming model used to
process big data stored in the
Hadoop Distributed File System (HDFS).
• Facilitates processing by splitting
large data sets (up to petabytes) into
smaller chunks that are processed in parallel.
• The logic is executed on the servers
where the data already resides,
which reduces data movement and
makes the process quicker.
• Apache Kafka:
• Distributed streaming platform.
• It aims to provide a unified, high-
throughput, low-latency platform
for handling real-time data feeds.
• Developed by: originally LinkedIn,
which open-sourced it in 2011; now an
Apache Software Foundation project.
• Written in: Scala, Java
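The MapReduce model described above (split the data into chunks, run the same logic on each chunk, then combine the results) can be sketched in plain Python with the classic word-count example. This is a single-process illustration, not the Hadoop API; the function names are hypothetical, and on a real cluster the map and reduce phases run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    # Each string stands in for one chunk of a much larger file.
    print(word_count(["big data tools", "Big data storage"]))
```

Because each chunk is mapped independently, the work can be sent to whichever server already holds that chunk.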
Open Source Big data Technologies for
Data Mining
• Presto:
• Open-source distributed SQL
query engine for running analytic
queries against data sources of all
sizes, ranging from gigabytes to
petabytes.
• Developed by: Facebook, which
released it as open source in 2013.
• Written in: Java
• Elasticsearch:
• Based on the Apache Lucene library.
• It provides a distributed,
multitenant-capable, full-text
search engine with an HTTP web
interface and schema-free JSON
documents.
• Developed by: Shay Banon, first
released in 2010; now developed by
Elastic NV (founded in 2012).
• Written in: Java
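Full-text search engines built on Lucene rely on an inverted index, which maps each term to the documents that contain it. The sketch below is a toy version of that idea in plain Python. It is not the Elasticsearch or Lucene API: the function names and the whitespace tokeniser are simplifying assumptions, and real engines add analysis, ranking and distribution on top.

```python
from collections import defaultdict


def build_index(docs):
    """Build a term -> set of document ids mapping from {id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "Distributed SQL query engine",
        2: "Distributed full-text search engine",
        3: "Column-oriented NoSQL database",
    }
    index = build_index(docs)
    print(search(index, "distributed engine"))
    print(search(index, "search"))
```

Looking up a term is a dictionary access, so query time depends on the number of matching documents rather than on the total size of the collection.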
Open Source Technologies for Data
Visualisation
• Candela:
• Candela is an open-source data visualisation
package made available through
Kitware's Resonant platform.
• It sets itself apart from other tools
by providing a full suite of data
visualisation components.
• Charted:
• An open-source tool that
automatically visualises data.
• Charted is perhaps one of the
easiest data visualisation tools
around: it simply requires a
link to a .csv file or a Google Sheets
location; hit Go and Charted
creates a visual display using a bar
or line chart.
• Developed by: the Product Science
team at Medium in the year 2013
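The "CSV in, chart out" idea behind a tool like Charted can be illustrated with a few lines of Python that turn CSV text into a text bar chart. This is only a sketch and has nothing to do with Charted's actual code. The function name `bar_chart`, the column names and the sample figures are made up for the example, and values are assumed to be non-negative numbers.

```python
import csv
import io


def bar_chart(csv_text, label_col, value_col, width=20):
    """Return one text bar per CSV row, scaled to the largest value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[value_col]) for row in rows]
    top = max(values, default=0) or 1  # avoid dividing by zero
    lines = []
    for row, value in zip(rows, values):
        bar = "#" * round(value / top * width)
        lines.append(f"{row[label_col]:<10} {bar} {value:g}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = "tool,users\nHadoop,10\nSpark,20\n"
    for line in bar_chart(sample, "tool", "users"):
        print(line)
```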
Thank you
