IN THE NAME OF GOD
BIG DATA ANALYTICS
HADOOP AND CASSANDRA
Author: Samira Riki
An airline jet collects 10 terabytes of sensor data
for every 30 minutes of flying time.
NYSE generates about one terabyte of new trade
data per day to perform stock trading analytics to
determine trends for optimal trades.
• Twitter has over 500 million registered users.
• 79% of US Twitter users are more likely to recommend brands
they follow.
• 67% of US Twitter users are more likely to buy from brands
they follow.
• 57% of all companies that use social media for business use
Twitter.
“Big Data is the frontier of a firm's ability to
store, process, and access (SPA) all the data
it needs to operate effectively, make
decisions, reduce risks, and serve
customers.”
... How big is BIG?
Let’s look at
Big Data
in a different way…
Byte : one grain of rice
Kilobyte : cup of rice
Megabyte : 8 bags of rice
Gigabyte : 3 semi trucks
Terabyte : 2 container ships
Petabyte : blankets Manhattan
Exabyte : blankets the West Coast states
Zettabyte : fills the Pacific Ocean
Yottabyte : an Earth-size rice ball!
Labels added to the scale, in the order the slides introduce them:
Hobbyist, Desktop, Internet, Big Data, The Future?
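Behind the rice analogy, each prefix is a factor of 1000 over the previous one (assuming the decimal SI convention; a 1024-based binary convention also exists). The short Python sketch below spells out that arithmetic and applies it to the jet example from the opening slide. The names PREFIXES, bytes_in, and jet_bytes_per_second are illustrative and not part of the original deck.

```python
# Decimal (SI) byte prefixes: each step is a factor of 1000 (assumed convention).
PREFIXES = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
            "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit):
    """Return the number of bytes in one `unit`, using powers of 1000."""
    return 1000 ** PREFIXES.index(unit)


def jet_bytes_per_second(terabytes=10, minutes=30):
    """Average data rate for the jet example: 10 TB per 30 minutes of flight."""
    return terabytes * bytes_in("terabyte") / (minutes * 60)


if __name__ == "__main__":
    for unit in PREFIXES:
        print(f"1 {unit} = 10^{3 * PREFIXES.index(unit)} bytes")
    rate = jet_bytes_per_second()
    print(f"Jet sensor rate: about {rate / bytes_in('gigabyte'):.2f} GB/s")
```

Under these assumptions, 10 terabytes per 30 minutes averages out to roughly 5.56 gigabytes every second.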
Process data in parallel? Not so simple
• An idea: parallelism
• A problem: parallelism is hard
    Synchronization
    Deadlock
    Limited bandwidth
    Timing issues and coordination
    Split and aggregation
• Computers are complicated
    Driver failure
    Data availability
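Two of the difficulties listed above, synchronization and split-and-aggregation, can be seen even on a single machine. The following is a minimal Python sketch, not anything from the original deck: the function name parallel_sum and the default of four workers are assumptions made for illustration. Each thread sums its own chunk (split), then adds its partial result to a shared total under a lock (synchronized aggregation).

```python
import threading


def parallel_sum(values, workers=4):
    """Split `values` into chunks, sum each chunk in its own thread,
    and aggregate the partial sums under a lock."""
    total = 0
    lock = threading.Lock()

    # Split: ceil-divide so that at most `workers` chunks are produced.
    chunk_size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]

    def work(chunk):
        nonlocal total
        partial = sum(chunk)      # independent work, no coordination needed
        with lock:                # synchronization: one writer at a time
            total += partial

    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()             # wait for every worker before reading total
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Without the lock, two threads could read and write the shared total at the same time and lose an update, which is the kind of timing issue the slide refers to.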
Hey! We have distributed computing!
Yes, we have distributed computing, but it comes with its own
challenges:
• Resource sharing
• Concurrency
• Fault tolerance
• Heterogeneity
• Transparency
Hadoop comes in to address most of these challenges (but not all).
Hadoop origin
• An elephant can’t jump, but it can carry a heavy load!
• Apache Hadoop is a framework that allows for the distributed
processing of large data sets across clusters of commodity
computers using a simple programming model. It is designed to scale
up from single servers to thousands of machines, each providing
computation and storage.
• Hadoop is an open-source implementation of Google's
MapReduce and GFS (Google File System, a distributed file system).
• Hadoop was created by Doug Cutting, the creator of Apache
Lucene, the widely used text search library.
Hadoop Architecture
Hadoop is designed and built on two independent frameworks.
Hadoop = HDFS + MapReduce
HDFS (storage and file system): HDFS is a reliable distributed file system
that provides high-throughput access to data.
MapReduce (processing): MapReduce is a framework for performing
high-performance distributed data processing using the divide-and-aggregate
programming paradigm.
Hadoop has a master/slave architecture for both storage and
processing.
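The divide-and-aggregate paradigm can be sketched in a few lines of plain Python. This is not Hadoop's API: the functions map_phase, shuffle, reduce_phase, and word_count are hypothetical names, and everything runs in one process, whereas Hadoop would spread the map and reduce steps across the cluster. The classic word-count example is used here as an assumption about what a typical job looks like.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big elephant"]))
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle run on different machines.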
Hadoop Master and Slave Architecture
The components of HDFS are:
• Name Node
• Data Node
• Secondary Name Node
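As a rough illustration of how these roles divide up, the toy Python model below treats the Name Node as a directory that records which Data Nodes hold each block of a file. It is a sketch under stated assumptions, not HDFS code: the class ToyNameNode, the round-robin placement, and the 128 MB block size are all assumptions for illustration (real HDFS placement is rack-aware and block size is configurable). The Secondary Name Node, which periodically checkpoints the Name Node's metadata, is not modeled.

```python
class ToyNameNode:
    """Hypothetical stand-in for a Name Node: it stores only metadata,
    namely which Data Nodes hold each block of each file."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication cannot exceed the number of Data Nodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (filename, block index) -> list of Data Node names

    def add_file(self, filename, size_bytes, block_size=128 * 1024 * 1024):
        """Split a file into blocks and record replica locations for each.
        Returns the number of blocks. Placement is simple round-robin (assumed)."""
        num_blocks = max(1, -(-size_bytes // block_size))  # ceiling division
        count = len(self.datanodes)
        for index in range(num_blocks):
            start = index % count
            self.block_map[(filename, index)] = [
                self.datanodes[(start + r) % count]
                for r in range(self.replication)
            ]
        return num_blocks

    def locate(self, filename, index):
        """Return the Data Nodes that hold the given block."""
        return self.block_map[(filename, index)]


if __name__ == "__main__":
    namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    blocks = namenode.add_file("logs.txt", 300 * 1024 * 1024)
    for i in range(blocks):
        print(i, namenode.locate("logs.txt", i))
```

Keeping several replicas of every block is what lets a read succeed when a single Data Node is unavailable.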
The components of MapReduce are:
• Job Tracker
• Task Trackers
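The master/slave split on the processing side can be pictured with a similar toy model: one Job Tracker hands tasks to several Task Trackers and reassigns work when a tracker is lost. The class ToyJobTracker, its method names, and the round-robin assignment are hypothetical simplifications for illustration, not Hadoop's scheduler.

```python
class ToyJobTracker:
    """Hypothetical stand-in for a Job Tracker that assigns tasks to
    Task Trackers and reassigns them when a tracker fails."""

    def __init__(self, trackers):
        self.trackers = list(trackers)
        self.assignments = {}  # task id -> Task Tracker name

    def submit(self, task_ids):
        """Assign each task to a tracker in round-robin order (assumed policy)."""
        for i, task in enumerate(task_ids):
            self.assignments[task] = self.trackers[i % len(self.trackers)]

    def tracker_failed(self, name):
        """Remove a failed tracker and move its tasks to the survivors.
        Returns the list of tasks that had to be reassigned."""
        self.trackers.remove(name)
        if not self.trackers:
            raise RuntimeError("no Task Trackers left to run the job")
        orphaned = [task for task, tracker in self.assignments.items()
                    if tracker == name]
        for i, task in enumerate(orphaned):
            self.assignments[task] = self.trackers[i % len(self.trackers)]
        return orphaned


if __name__ == "__main__":
    job_tracker = ToyJobTracker(["tt1", "tt2", "tt3"])
    job_tracker.submit(["m0", "m1", "m2", "m3"])
    print(job_tracker.tracker_failed("tt1"))
    print(job_tracker.assignments)
```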
Who uses Hadoop?
• Amazon/A9
• Facebook
• Google
• IBM
• Joost
• Last.fm
• New York Times
• PowerSet
• Yahoo!
• Twitter
• LinkedIn
Cassandra
• Apache Cassandra is an open source distributed database
management system designed to handle large amounts of data
across many commodity servers, providing high availability with no
single point of failure. Cassandra offers robust support for clusters
spanning multiple datacenters.
Main features
• Cassandra places a high value on performance.
In 2012, University of Toronto researchers studying NoSQL systems
concluded that "In terms of scalability, there is a clear winner
throughout our experiments," referring to Cassandra.
• Decentralized
• Supports replication and multi-datacenter replication
• Scalability
• Fault-tolerant
• Query language
• MapReduce support
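Decentralization and replication are often explained with a hash ring, where every node owns a range of tokens and no node is special. The Python sketch below is a toy version of that idea, assuming MD5-based tokens and placement of replicas on the next nodes clockwise around the ring. The names token and ToyRing are hypothetical, and this is not Cassandra's actual code or configuration.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical peer-to-peer ring: any node can compute where a key lives,
    so there is no master and no single point of failure."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Return the nodes responsible for `key`: the first node clockwise
        from the key's token, followed by the next nodes on the ring."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user:42"))
```

Adding a node to such a ring only changes ownership of the token ranges next to it, which is one reason this layout is associated with scalability.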
The data model
New use cases
• Geographic data
• Weather data
• RFID
• Travel schedules
• Hotel reservation
Big Data isn’t big,
if you know how to
use it.
Q?

Big data

  • 1. IN THE NAME OF GOD BIG DATA ANALYTICS HADOOP AND CASSANDRA Author: Samira Riki
  • 2. A airline jet collect 10 terabytes of sensor data for every 30 minutes of flying time. NYSE generates about one terabyte of new trade data per day to perform stock trading analytics to determine trends for optimal trades.
  • 3. 3  Twitter has over 500 milion registered users.  79% of US Twitter users are more likely to buy from brands they follow.  67% of US Twitter users are more likely to buy from brands they follow.  57% of all companies that use social media for business use Twitter.
  • 4. “Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.”
  • 5.
  • 6. ... How big is BIG?
  • 7. Let’s look at Big Data in a different way…
  • 8. Byte. Byte: one grain of rice.
  • 9. Kilobyte. Byte: one grain of rice; Kilobyte: cup of rice.
  • 10. Megabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice.
  • 11. Gigabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks.
  • 12. Terabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships.
  • 13. Petabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan.
  • 14. Exabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states.
  • 15. Zettabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean.
  • 16. Yottabyte. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 17. Hobbyist. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 18. Desktop, Hobbyist. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 19. Desktop, Hobbyist, Internet. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 20. Desktop, Hobbyist, Internet, Big Data. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 21. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
  • 22. Desktop, Hobbyist, The Future?, Internet, Big Data. Byte: one grain of rice; Kilobyte: cup of rice; Megabyte: 8 bags of rice; Gigabyte: 3 semi trucks; Terabyte: 2 container ships; Petabyte: blankets Manhattan; Exabyte: blankets the West Coast states; Zettabyte: fills the Pacific Ocean; Yottabyte: an Earth-size rice ball!
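
The unit ladder on these slides climbs by a factor of 1,000 per step. As a rough sketch of that arithmetic, assuming decimal (SI) prefixes, the following converts a raw byte count into the largest fitting unit. The names `UNITS` and `humanize` are illustrative and do not come from the slides.

```python
# Decimal (SI) byte units in the order the slides list them;
# each unit is assumed to be 1000x the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    # Integer comparison avoids floating-point drift for very large counts.
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    value = num_bytes / 1000 ** index
    return f"{value:g} {UNITS[index]}"


# Figure from slide 2: 10 terabytes of sensor data per 30 minutes of flight.
bytes_per_half_hour = 10 * 1000 ** 4
print(humanize(bytes_per_half_hour))      # 10 terabyte
print(humanize(bytes_per_half_hour * 2))  # 20 terabyte (one hour of flying)
```

At that rate, a 50-hour stretch of flying time would produce 1,000 terabytes, which is one petabyte on this scale.
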
  • 23. Process data in parallel? Not simple. An idea: parallelism. A problem: parallelism is hard (synchronization, deadlock, limited bandwidth, timing issues and coordination, split and aggregation). Computers are complicated (driver failure, data availability). Hey! We have distributed computing!
  • 24. Yes, we have distributed computing, and it also comes with some challenges: resource sharing; concurrency; fault tolerance; heterogeneity; transparency. To address most of these challenges (but not all), Hadoop comes in.
  • 25. Hadoop origin. An elephant can’t jump, but it can carry a heavy load! Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each providing computation and storage. Hadoop is an open-source implementation of Google's MapReduce and GFS (its distributed file system). Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library.
  • 26. Hadoop Architecture. Hadoop is designed and built on two independent frameworks: Hadoop = HDFS + MapReduce. HDFS (storage and file system): HDFS is a reliable distributed file system that provides high-throughput access to data. MapReduce (processing): MapReduce is a framework for performing high-performance distributed data processing using the divide-and-aggregate programming paradigm. Hadoop has a master/slave architecture for both storage and processing.
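
The divide-and-aggregate paradigm named on the Hadoop Architecture slide can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions made for illustration. On a real cluster the map and reduce calls would run on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Divide: every line is mapped independently of the others.
    mapped = [pair for line in lines for pair in map_phase(line)]
    # Aggregate: group the pairs by word, then sum each group.
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each `map_phase` call depends only on its own line, the work can be split across machines without coordination. Only the shuffle step needs to move data between them.
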
  • 27. Hadoop Master and Slave Architecture. The components of HDFS are: NameNode, DataNode, Secondary NameNode.
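
To make the split between the NameNode and the DataNodes concrete, here is a toy in-memory model. It is a sketch under loud assumptions, not HDFS code: the class `ToyHdfs`, the 4-byte `BLOCK_SIZE`, the `REPLICATION` value of 2, and the round-robin placement are all invented for illustration, and the Secondary NameNode is left out.

```python
from itertools import cycle

BLOCK_SIZE = 4    # bytes per block; tiny on purpose (real HDFS blocks are far larger)
REPLICATION = 2   # copies kept of each block; an illustrative choice


class ToyHdfs:
    def __init__(self, datanode_names):
        # NameNode role: metadata only (file -> ordered blocks and their holders).
        self.namenode = {}
        # DataNode role: the actual block contents, stored per node.
        self.datanodes = {name: {} for name in datanode_names}
        # Naive round-robin placement; an assumption, not HDFS's real policy.
        self._placement = cycle(datanode_names)

    def write(self, filename, data):
        entries = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = f"{filename}#{offset // BLOCK_SIZE}"
            holders = [next(self._placement) for _ in range(REPLICATION)]
            for name in holders:
                self.datanodes[name][block_id] = data[offset:offset + BLOCK_SIZE]
            entries.append((block_id, holders))
        self.namenode[filename] = entries

    def read(self, filename, failed=()):
        parts = []
        for block_id, holders in self.namenode[filename]:
            # Use the first replica that is not on a failed node.
            source = next(name for name in holders if name not in failed)
            parts.append(self.datanodes[source][block_id])
        return b"".join(parts)


cluster = ToyHdfs(["dn1", "dn2", "dn3"])
cluster.write("log.txt", b"hello hadoop")
print(cluster.read("log.txt"))                  # b'hello hadoop'
print(cluster.read("log.txt", failed={"dn1"}))  # b'hello hadoop'
```

The second read succeeds even with `dn1` marked as failed, because every block has a second copy on another node. This is the kind of fault tolerance that replication is meant to provide.
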
  • 28.
  • 29.
  • 30. The components of MapReduce are: JobTracker, TaskTrackers.
  • 31. Who uses Hadoop? Amazon/A9, Facebook, Google, IBM, Joost, Last.fm, New York Times, PowerSet, Yahoo!, Twitter, LinkedIn.
  • 32. Cassandra. Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters.
  • 33. Main features. Cassandra places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments." Decentralized; supports replication and multi-data-center replication; scalable; fault-tolerant; query language; MapReduce support.
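
The "decentralized" and "replication" features can be pictured with a small consistent-hashing ring, where any node can compute which peers hold a given key without asking a master. This is a simplified sketch and not Cassandra's actual partitioner or replication strategy: `ToyRing`, `token`, the use of MD5, and the single token per node are assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to a position on the ring (MD5 chosen only for determinism)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    def __init__(self, nodes, replication_factor=3):
        # Never ask for more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Each node owns one position on the ring, sorted by token.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct node names; the order depends on the hash values
```

Every node that knows the member list computes the same `replicas` result, so there is no central lookup service to fail. With three copies per key, a read can still be served when one of the owning nodes is down.
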
  • 35. New use cases: geographic data, weather data, RFID, travel schedules, hotel reservations.
  • 36. Big Data isn’t big, if you know how to use it.
  • 37. References. 1. Big data: The next frontier for innovation, competition, and productivity, McKinsey & Company. 2. Big Data Meets Big Data Analytics, SAS. 3. Big Data Tutorial, Marko Grobelnik. 4. Big Data Spectrum.
  • 38. Q?