BIG DATA
WHY NOW?
World’s information totaled over 2 Zettabytes
That’s 2 Trillion Gigabytes

By 2020, this number will be 35 Zettabytes (35 Trillion Gigabytes)
“world’s data is doubling every 1.2 years”
“80% of this data is unstructured”
5V
Money
[Timeline, 2003 to today: Google File System, Map Reduce, Big Table, Amazon Dynamo, Apache Hadoop, Apache Cassandra, Dremel, Spanner?, Impala; two tracks: Analytics (Hadoop) and Realtime (“NoSql”)]
THE ECOSYSTEM
Hadoop Ecosystem
Apache Hadoop is an open-source software
framework that supports running applications on
large clusters of commodity hardware.
Replication
Fault Tolerant
Commodity Hardware
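The replication and fault-tolerance points can be illustrated with a toy model. This Python sketch is a simplification under stated assumptions, not how HDFS actually places blocks (real HDFS placement is rack-aware): the node names and the round-robin placement are hypothetical, and only the replication factor of 3 matches the HDFS default. It shows why keeping each block on three distinct commodity machines lets the data survive the loss of any two of them.

```python
# Hypothetical cluster: 4 commodity nodes, each block stored 3 times
# (3 is the HDFS default replication factor; the round-robin placement is an assumption).
NODES = ["node1", "node2", "node3", "node4"]
REPLICATION = 3

def place_blocks(num_blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement

def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())

placement = place_blocks(8)
print(placement[0])                                                   # ['node1', 'node2', 'node3']
print(readable_after_failure(placement, {"node1", "node2"}))          # True
print(readable_after_failure(placement, {"node1", "node2", "node3"})) # False
```

With three replicas on distinct nodes, any two failures still leave one copy of every block; losing three of the four nodes can lose data.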
Map Reduce
Map Reduce
Word Count
World's largest biometric identity platform
2,00,00,00,00,000 (200 billion) Biometric Matches
2 PB Data
Hadoop Stack
This is just the beginning of the “Big Data Revolution”
sameer.sawhney@gmail.com
@sameersaw at twitter

Images
Raymond Bryson
Marius B
IntelFreePress License
Pedro Moura Pinheiro

Editor's Notes

  1. We all live in the data age. While data storage capacity has increased, the speed at which data can be read has not kept pace. The amount of publicly available data is increasing at a very fast pace. Big data[1][2] is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  2. A colossal amount of data is being generated, and this has changed how data must be stored and processed.
  3. In the good old days, we used an RDBMS to store and process this data, bringing the data to the processing units. But now the data is huge, and two technologies have made processing it possible.
  4. Characteristics of big data: Gartner defined 3 Vs: Volume, Velocity and Variety. Veracity: can this data be trusted? Volume: petabytes and exabytes, not terabytes. Twitter alone accounts for around 7 TB of data every day, Facebook 10 TB and Google 20 PB. In 2013, 200 million active users were creating over 400 million tweets each day. In 2011 there were 200 million tweets every day, equivalent to a 10-million-page book that would take 31 years to read. In 2010 it was 65 million a day, and in 2009, 2 million tweets a day. Variety: different sources. Value: meaningful data.
  5. What problem does the solution solve? Technology overview. Specific solution. Challenges in the current implementation/solution, if any? Advantages and disadvantages. Any alternatives to the specific solution? Way forward for the technology/solution? (Optional)
  6. In defining big data, it’s also important to understand the mix of unstructured and multi-structured data that comprises the volume of information. Unstructured data comes from information that is not organized or easily interpreted by traditional databases or data models, and typically, it’s text-heavy. Metadata, Twitter tweets, and other social media posts are good examples of unstructured data. Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks. A great example is web log data, which includes a combination of text and visual images along with structured data like form or transactional information.
  10. Data Source → Data Repository (data persists) → Filter and Transform → Compute (distributed scale-out system). MapReduce is inevitable. 1980s: relational databases (rows/columns) and the impedance mismatch problem; relational databases as the integration mechanism (relational dominance into the 2000s). 1990s: object databases. 2000s: big Internet sites such as Amazon and Google, with lots of traffic. Bigger boxes have real limits and cost; the alternative is lots of little boxes, but SQL was designed for single-node systems. Google: Big Table. Amazon: Dynamo. NoSQL movement: the term comes from Johan Oskarsson, who, coming from London, proposed a meetup in San Francisco in the late 2000s and needed a short, unique Twitter hashtag to advertise that single meeting: #nosql. Data models: 1. Key-value. 2. Document: JSON (no schema); queries can reach portions of a document. 3. Column family: a single row key has multiple column families, where each column family is an aggregate of columns that fit together. An aggregate is about storing all related items together as one unit in the cluster.
  12. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
  13. The Mapper implementation, via its map method, processes one line at a time, as provided by the specified TextInputFormat. It splits the line into tokens separated by whitespace, via a StringTokenizer, and emits a key-value pair of <word, 1>. The Reducer implementation, via its reduce method, just sums up the values, which are the occurrence counts for each key (i.e. the words in this example).
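Note 1's point that read speed has not kept pace with storage capacity can be made concrete with back-of-the-envelope arithmetic. The figures below (a 1 TB dataset, 100 MB/s per drive, 100 drives) are assumptions chosen for illustration, not numbers from the slides; the sketch shows why reading from many disks in parallel matters.

```python
# Assumed figures for illustration only (not from the slides).
DATASET_BYTES = 1_000_000_000_000        # 1 TB, decimal units
DRIVE_READ_BYTES_PER_SEC = 100_000_000   # 100 MB/s per drive

def read_time_seconds(dataset_bytes, drives, rate=DRIVE_READ_BYTES_PER_SEC):
    """Time to read the dataset when it is split evenly across `drives` drives read in parallel."""
    return dataset_bytes / (drives * rate)

one_drive = read_time_seconds(DATASET_BYTES, 1)
hundred_drives = read_time_seconds(DATASET_BYTES, 100)
print(f"1 drive:    {one_drive / 3600:.1f} hours")  # 2.8 hours
print(f"100 drives: {hundred_drives:.0f} seconds")  # 100 seconds
```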
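The daily volumes quoted in note 4 are easier to reason about as rates. This sketch only divides two of the note's own figures (400 million tweets a day in 2013, about 7 TB a day for Twitter) by the number of seconds in a day; the helper name per_second is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day):
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY

tweets_2013 = per_second(400_000_000)   # tweets per second, 2013 figure from note 4
twitter_bytes = per_second(7 * 10**12)  # bytes per second for ~7 TB/day

print(f"{tweets_2013:,.0f} tweets per second")        # 4,630 tweets per second
print(f"{twitter_bytes / 10**6:.0f} MB per second")   # 81 MB per second
```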
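Note 6 describes web log data as multi-structured: one line of text carries several structured fields. As a minimal sketch, assuming logs in the Apache Common Log Format (the sample line is the example from the Apache httpd documentation), a regular expression can recover those fields; the function name parse_log_line is hypothetical.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return the structured fields of one CLF line, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])  # "-" means no body
    return fields

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_log_line(line))
```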
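The three data models listed in note 10 can be contrasted by storing one hypothetical order and customer in plain Python dictionaries. This illustrates the data shapes only; it is not the API of any particular NoSQL database, and every key and value below is made up.

```python
import json

# Hypothetical order data, used only to contrast the data shapes.
order = {
    "order_id": "o-1001",
    "customer": "c-42",
    "items": [{"sku": "book", "qty": 2}, {"sku": "pen", "qty": 5}],
}

# 1. Key-value: an opaque value (here a JSON string) under a key;
#    the store itself does not look inside the value.
kv_store = {"order:o-1001": json.dumps(order)}

# 2. Document: the store understands the JSON structure (no fixed schema),
#    so a query can reach into a portion of the document.
doc_store = {"o-1001": order}
first_sku = doc_store["o-1001"]["items"][0]["sku"]

# 3. Column family: one row key maps to several column families,
#    each a group of columns that belong together.
column_store = {
    "c-42": {  # row key
        "profile": {"name": "Example Customer", "city": "Example City"},
        "orders": {"o-1001": "book x2, pen x5"},
    }
}

# In each model the aggregate (everything about one order or one customer)
# sits under a single key, so it can be stored and moved as one unit.
print(json.loads(kv_store["order:o-1001"])["customer"])  # c-42
print(first_sku)                                          # book
print(column_store["c-42"]["orders"]["o-1001"])           # book x2, pen x5
```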
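The data flow in note 12 (split the input, map each chunk independently, sort and group the map outputs, then reduce) can be simulated in a single process. This sketch assumes nothing about Hadoop's actual Java API: run_mapreduce and the sales example are hypothetical, and a real cluster would run the map calls in parallel on different nodes.

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn):
    """Single-process simulation of the MapReduce data flow."""
    groups = defaultdict(list)
    # Map phase: each split is processed independently
    # (on a real cluster these calls would run in parallel).
    for split in splits:
        for key, value in map_fn(split):
            groups[key].append(value)  # shuffle: group values by key
    # Sort, then reduce: one reduce call per key, keys in sorted order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}

# Hypothetical job: total sales per store from two input splits.
splits = [["storeA,10", "storeB,5"], ["storeA,7"]]

def map_sales(split):
    for record in split:
        store, amount = record.split(",")
        yield store, int(amount)

def reduce_sales(store, amounts):
    return sum(amounts)

print(run_mapreduce(splits, map_sales, reduce_sales))  # {'storeA': 17, 'storeB': 5}
```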
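Note 13 describes the classic word count job: the mapper splits each line on whitespace and emits <word, 1>, and the reducer sums the counts per word. A plain-Python sketch of the same logic follows. It is not Hadoop code (the original example uses Hadoop's Java Mapper and Reducer classes); str.split() stands in for StringTokenizer, and a dictionary stands in for the framework's shuffle and sort. The input lines are made up.

```python
from collections import defaultdict

def wc_map(line):
    """Mapper: split one line on whitespace and emit (word, 1) for each token."""
    for word in line.split():
        yield word, 1

def wc_reduce(word, counts):
    """Reducer: sum the occurrence counts for one word."""
    return word, sum(counts)

def word_count(lines):
    grouped = defaultdict(list)  # stands in for the framework's shuffle/sort
    for line in lines:
        for word, one in wc_map(line):
            grouped[word].append(one)
    return dict(wc_reduce(word, grouped[word]) for word in sorted(grouped))

lines = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
print(word_count(lines))
# {'Bye': 1, 'Goodbye': 1, 'Hadoop': 2, 'Hello': 2, 'World': 2}
```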