Big Data and its
applications
Introduction
Big Data – use cases, applications,
technologies and vendors overview
Aims to provide a high-level overview
of tools and technologies related to Big
Data
Topics covered
Introduction to Big Data
◦ Definition, need for Big Data, hype cycle
Applications of Big Data
◦ Industry-wise applications
Big Data Technologies Overview
◦ Hadoop, Pig, Hive, NoSQL, Columnar DB
Big Data Vendors Overview
◦ Amazon, Cloudera, Hortonworks, MapR,
etc.
Big Data - definition
Popular term used to describe the
exponential growth and availability of
data, both structured and
unstructured.
Collection of data from traditional and
digital sources inside and outside your
company that represents a source for
ongoing discovery and analysis.
May refer both to the volume of data and
to the tools and processes used to handle it
3 Vs of Big Data
Volume, Velocity, Variety
Need for Big Data
2.7 ZB of data in the digital universe
today
Facebook stores and analyzes 30+ PB of
data
Walmart data exceeds 2.5 PB
Better decision making and increased
operational efficiency
When to go for a Big Data
Solution
Analyze all types of data
Most or all of the data to be analyzed
Iterative and exploratory analysis
Business measures not predetermined
Traditional warehouses expect schema-compliant
data and are not suited to unstructured
data
Gartner’s Hype Cycle
Retail – Pricing Optimization
Analyze millions of items sold or for sale
Valuable insights about customers and
markets in quicker timeframes
Aggregate data from multiple channels in
multiple formats
Day long jobs complete in minutes
Retail – Smart shopping experience
Pricing data, POS, transactions, social media,
call center records, promotions
Better understanding of customer
preferences, shopping patterns
Geolocation apps deliver a
personalized marketing experience
Big Data in Finance
Customer segmentation
◦ Correlate purchase history, profile info,
behaviour on social media
◦ Generate portfolio advice
Fraud Detection systems
Wealth Management
◦ Investment Research – try out new
investment ideas, improve algorithmic trading
◦ Customer knowledge – unified view of
customer
Big Data in Finance
Regulatory Compliance
◦ Impact of the 2008 credit crisis on regulatory
compliance
◦ Stringent monitoring and reporting of data
Risk Management
◦ Better analysis of investment positions and risk
metrics
Big Data in Healthcare
EMR – Electronic Medical Records
initiative in the US
Complete digitization of a patient’s
medical info such as profile, disease
treatment, pharmacy visits, etc.
Shared across networks
Slow adoption and challenges in
aggregation
Big Data in Healthcare
Predict health issues
◦ Build a model that predicts a patient’s risk
◦ Hospital follows up with high-risk
patients to avoid hospitalization
Predicting outbreaks
◦ IBM Research project – STEM
◦ Model – correlates disease data with
climate and temperature
◦ Can predict disease outbreak for regions
expecting climatic change
Big Data – Internet of
Things
Machine-generated data, e.g. from RFID
chips embedded in devices
3 phases
◦ Data ingestion – cost
◦ Data storage - cost
◦ Analytics – real value
Outsource phases 1 and 2 to DBaaS
(Redshift, Hortonworks, Cloudera)
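The three phases above can be sketched as a tiny pipeline. This is a minimal illustration only: the `device_id,value` record format, the function names, and the in-memory dictionary standing in for storage are all assumptions made for this example, not part of any vendor's service.

```python
from statistics import mean

def ingest(raw_readings):
    """Phase 1 (ingestion): parse raw 'device_id,value' strings.

    The record format is hypothetical; malformed records are skipped.
    """
    parsed = []
    for raw in raw_readings:
        parts = raw.split(",")
        if len(parts) != 2:
            continue
        device_id, value = parts
        try:
            parsed.append((device_id.strip(), float(value)))
        except ValueError:
            continue
    return parsed

def store(readings, storage):
    """Phase 2 (storage): append each reading under its device id.

    A plain dict stands in for a real data store.
    """
    for device_id, value in readings:
        storage.setdefault(device_id, []).append(value)
    return storage

def analyze(storage):
    """Phase 3 (analytics): compute the average reading per device."""
    return {device_id: mean(values) for device_id, values in storage.items()}

raw = ["truck-1,10.0", "truck-2,7.5", "garbled", "truck-1,14.0"]
averages = analyze(store(ingest(raw), {}))
```

Only `analyze` produces business value here; `ingest` and `store` are plumbing, which is the reasoning behind outsourcing the first two phases.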
UPS – Case study
Aim
◦ Find the fastest and most fuel-efficient
way to deliver packages to customers
ORION research project
◦ Captures driver behaviour and safety
habits through GPS
◦ Sensor data on fuel emissions and
consumption
◦ Monitors deliveries and customer service
◦ Runs advanced algorithms to optimize
routes
UPS – Case Study
Early testing in 2011-2012 on 10k
routes – 1.5 million gallons of fuel
saved
Complete deployment on 55,000 routes
throughout North America by 2017
Big Data – Technologies
MapReduce
◦ Programming paradigm that allows massive
jobs to execute in parallel across thousands
of servers
◦ Map task - input dataset is converted into
a different set of key/value pairs
◦ Reduce task - several of the outputs of
the "Map" task are combined to form a
reduced set of tuples
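The Map and Reduce tasks can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop code: the function names are chosen for this example, and the `shuffle` step imitates the grouping by key that a real framework performs between the two tasks across many servers.

```python
from collections import defaultdict

def map_task(line):
    """Map: convert one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_task(line)]
    return dict(reduce_task(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data big insights", "data at scale"])
```

Because each call to `map_task` depends only on its own line, and each call to `reduce_task` only on its own key, both steps can be spread across machines, which is what makes the paradigm parallel.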
Big Data - Technologies
Hadoop
◦ Most popular open-source implementation
of MapReduce
◦ Can work with multiple forms of data
◦ Can run processor-intensive machine learning
jobs
Hive
◦ Developed by Facebook and later made
open-source
◦ SQL-like query layer on top of Hadoop
◦ Queries data stored in a Hadoop cluster
Big Data - Technologies
Pig
◦ Scripting language
◦ Transforms data present in a Hadoop
cluster
◦ Developed by Yahoo and made
open-source
NoSQL
◦ Schema-less databases
◦ Storage and retrieval of huge amounts of
unstructured data
◦ Scalable, flexible and cloud-friendly, but
often with weaker consistency guarantees
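What "schema-less" means can be sketched with a toy in-memory document store. The `DocumentStore` class and its methods are hypothetical and written only for this illustration; they do not correspond to the API of any real NoSQL product.

```python
class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        """Store a document; no schema is checked, any fields are accepted."""
        self._docs[key] = dict(doc)

    def get(self, key):
        """Return the document for a key, or None if it is absent."""
        return self._docs.get(key)

    def find(self, field, value):
        """Return keys of documents whose field equals value.

        Documents that lack the field are simply skipped.
        """
        return [k for k, d in self._docs.items() if d.get(field) == value]

store = DocumentStore()
# Each record carries a different set of fields; no table definition is needed.
store.put("c1", {"name": "Asha", "city": "Pune"})
store.put("c2", {"name": "Ben", "city": "Pune", "tweets": ["great deal!"]})
store.put("c3", {"name": "Chen", "call_notes": "asked about returns"})
pune_customers = store.find("city", "Pune")
```

A relational table would need every column declared up front; here new fields such as `tweets` or `call_notes` appear per record, which is the flexibility the slide refers to.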
Other Big Data
Technologies
Search engines – Lucene, Solr,
Elasticsearch, Amazon CloudSearch
Stream and low-latency processing
◦ Apache Storm, Apache Spark, Cloudera’s
Impala, Yahoo’s S4 and Apache Tez
Big Data – Vendors
Amazon
◦ Elastic MapReduce (EMR) – Amazon’s Hadoop
distribution, run on AWS
infrastructure
◦ “largest adoption of hadoop platforms in
the market” – Forrester report
Cloudera
◦ Uses many aspects of open-source
Hadoop
◦ Many features built on top of its Hadoop
distribution, notably Cloudera Manager and Impala
Big Data - Vendors
Hortonworks
◦ Builds the open-source Hadoop ecosystem
◦ Also innovates – Ambari, cluster
management software
IBM
◦ InfoSphere BigInsights – Analytics at rest
◦ InfoSphere Streams – Analytics in motion
◦ Hadoop-based analytics
◦ Stream computing
◦ Data Warehousing
◦ Application development
Big Data - Vendors
Intel
◦ Develops custom Hadoop version on
Xeon chips
◦ Closest affinity between hardware and
software
MapR
◦ Fastest-growing Hadoop distribution
company
◦ Highest scores for distribution
architecture and data processing
capabilities
Big Data - Vendors
Microsoft
◦ Does not encourage open-source but
promotes Hadoop
◦ HDInsight – Hadoop as a service, run on
Windows Azure and based on Hortonworks’
Hadoop distribution
◦ PolyBase – lets SQL Server query data
stored in Hadoop
◦ Big presence in other markets enables
delivering an end-to-end Hadoop solution
Big Data - Vendors
Teradata
◦ SQL and RDBMS specialization
◦ Partnered with Hortonworks
◦ Integrated Hadoop with existing SQL
offerings
◦ Existing Teradata users can use the Hadoop
platform to process warehouse data
Questions

Big Data - Applications and Technologies Overview

