Big Data Technologies
Chandra Chikkareddy
Introduction
• Big Data is data that is hard to capture, store, and analyze with
commonly used software tools because of its very large size
• “World’s nervous system—a real-time feedback loop which didn’t
exist before” - Yahoo CEO Marissa Mayer
Why should you care?
• Mobile devices, smart energy meters, remote sensing,
wireless sensors, software machine logs, cameras, RFID
readers, etc. are creating massive amounts of data
that businesses & governments now have the
opportunity to analyze and act upon.
• Every day, approximately 2.5 quintillion (2.5×10^18) bytes of
data are created.
• The business and economic possibilities of big data, and its
wider implications, are important issues that business
leaders and policy makers will tackle in the years
ahead.
Industry verticals using Big Data
Any industry vertical that accumulates a sufficient quantity of data can leverage
Big Data technologies. Here are some of the verticals:
• Digital Media & E-Commerce: Real-time ad targeting, Web analytics & trends
• Energy and Utilities: Smart meter analytics, Asset management
• Financial Services: Risk and fraud management, Portfolio
management, Customer analytics
• Government: Threat management, Law enforcement (Real-time
multimodal surveillance, Cyber security detection),
Macroeconomic analytics
• Healthcare and Life Sciences: New drug development, Medical record text
analytics, Genomic analytics
• Retail: CRM, Targeted marketing analysis, Vendor delivery
& Supply chain optimizations, Market basket
analysis, Click-stream analysis
• Telecommunications: CRM, Call detail record analysis, Least cost routing,
Fraud management
• Transportation: Logistics optimization, Traffic congestion
Big Data landscape/technologies
Source:
http://www.forbes.com/sites/oracle/2012/12/13/billions-of-reasons-to-get-ready-for-big-data/
http://www.rosebt.com/1/post/2012/6/big-data-vendor-landscape.html
http://www.dataart.com/software-outsourcing/big-data
http://www.capgemini.com/technology-blog/2012/09/big-data-vendors-technologies/
Big Data Process/Steps
Data processing at a basic level can be broken into
three stages: data, the raw indicators; information,
the meaningful interpretation of those signals; and
insight, an actionable piece of knowledge.
Why Web Analytics quickly leads to Big Data Science
• Consider 10 million page views a day on a popular
web site
• Capture the user id for every page view and store it as a
32-bit integer
• 10 million x 4 bytes = 40 MB of storage/day
• 40 MB x 30 days = 1,200 MB, or about 1.2 GB/month
(1.17 GB if 1 GB is counted as 1,024 MB)
• Data grows quickly, and so do the challenges around
storage, processing, and analytics.
[Figure: 10^7 elements from the domain of 32-bit integers, 40 MB/day]
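The storage arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the constant and variable names are illustrative assumptions, and the two month figures show how the result depends on whether 1 GB is counted as 1,000 MB or 1,024 MB.

```python
# Assumed inputs, taken from the slide's example.
PAGE_VIEWS_PER_DAY = 10_000_000   # 10^7 page views
BYTES_PER_USER_ID = 4             # one 32-bit integer per page view
DAYS_PER_MONTH = 30

bytes_per_day = PAGE_VIEWS_PER_DAY * BYTES_PER_USER_ID
bytes_per_month = bytes_per_day * DAYS_PER_MONTH

mb_per_day = bytes_per_day / 10**6        # decimal megabytes
mb_per_month = bytes_per_month / 10**6

gb_per_month_decimal = mb_per_month / 1000   # 1 GB = 1,000 MB
gb_per_month_mixed = mb_per_month / 1024     # 1 GB = 1,024 MB

print(f"{mb_per_day:.0f} MB/day")
print(f"{gb_per_month_decimal:.2f} GB/month (1 GB = 1,000 MB)")
print(f"{gb_per_month_mixed:.2f} GB/month (1 GB = 1,024 MB)")
```

Storing more than a single integer per page view (a timestamp, URL, or session id, for example) multiplies these numbers accordingly.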
Dealing with large datasets
New algorithmic techniques in traditional computing
• Probabilistic data structures
• Cardinality estimation, frequency estimation, range query,
membership query, etc.
Distributed computing / Divide and conquer
• Break the processing into equal parts, get individual results, and
aggregate them
• Distributed systems are complex to build and maintain
• Compute capacity had to be rented from academia & research labs
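As a rough illustration of both ideas on this slide, the sketch below builds a Bloom filter (one common probabilistic structure for membership queries) over each chunk of a dataset and then aggregates the partial filters with a bitwise OR. The slides do not name a specific structure, so the choice of a Bloom filter, the class and function names, the filter size, the hash scheme, and the sample user ids are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Filters built with identical parameters combine by bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_filter(user_ids):
    """Process one chunk: the 'individual result' is a partial filter."""
    bf = BloomFilter()
    for uid in user_ids:
        bf.add(uid)
    return bf


def split(items, parts):
    """Break a list into `parts` near-equal chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


page_views = list(range(1000, 2000))   # hypothetical user ids
chunks = split(page_views, 4)
partials = [build_filter(chunk) for chunk in chunks]

combined = partials[0]
for partial in partials[1:]:
    combined = combined.merge(partial)  # aggregate step

print(1500 in combined)  # True: an added id is always reported as present
```

Here the chunks are processed in a simple loop; in a distributed setting each chunk would be handled by a separate worker, and only the small partial filters would need to be exchanged. A lookup for an id that was never added usually returns False, but can occasionally return True (a false positive).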
Traditional Distributed system challenges
Data exchange requires synchronization
Temporal dependencies are complicated
Difficult to deal with partial failures of the system
Data is mostly copied to the compute nodes at compute time
Developers spend more time designing for failure than they do actually
working on the problem itself
Transferring data to compute nodes becomes a bottleneck
• Typical disk data transfer rate: 75 MB/sec. Time taken to
transfer 100 GB of data to the processor: approx. 22 minutes.
New approach is needed
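The transfer-time figure quoted above can be checked with a short calculation. This is a sketch only: the function name and the `mb_per_gb` parameter are illustrative assumptions, and the result shifts slightly depending on whether 1 GB is taken as 1,000 MB or 1,024 MB.

```python
def transfer_minutes(data_gb, rate_mb_per_sec, mb_per_gb=1000):
    """Minutes needed to move data_gb gigabytes at a sustained rate in MB/sec."""
    seconds = data_gb * mb_per_gb / rate_mb_per_sec
    return seconds / 60


# 100 GB at 75 MB/sec, as on the slide.
print(round(transfer_minutes(100, 75), 1))                  # 22.2 (1 GB = 1,000 MB)
print(round(transfer_minutes(100, 75, mb_per_gb=1024), 1))  # 22.8 (1 GB = 1,024 MB)
```

Either way, a single disk needs over 20 minutes just to deliver the data before any computation starts, which is why data transfer becomes the bottleneck.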
Ideal system for distributed computing
Partial failure support
Data recoverability
Component recoverability
Consistency
Scalability
