Big data – a brief overview
Published in: Technology, Education
  1. Big Data – A Brief Overview: Petabytes, Hadoop, Analytics, Collaborative business intelligence, Data scientists, In-Memory Databases, NoSQL platforms
  2. Big Data
     • What is it?
     • Where does it come from?
     • How do we process it?
     • What do we do with it?
     • Who are the players?
     • What are the opportunities?
  3. What Is Big Data? Like the term Cloud, it is a bit nebulous.
  4. Attributes of Big Data
     • Volume
     • Velocity (streaming)
     • Variety
  5. Where Does It Come From? It depends.
  6. Key Drivers: Spread of cloud computing, mobile computing, and social media technologies; financial transactions
  7. Sources of Big Data
     • Chatter from social networks
     • Web server logs
     • Traffic flow sensors
     • Satellite imagery
     • Broadcast audio streams
     • Banking transactions
     • MP3s of rock music
     • The content of web pages
     • Scans of government documents
     • GPS trails
     • Telemetry from automobiles
     • Financial market data
     • …
  8. How Do We Process It?
  9. Process Pipeline (Source: http://radar.oreilly.com)
  10. Hadoop: A distributed processing framework based on Map/Reduce.
  11. Pig: A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
  12. Mahout: A machine learning library with algorithms for clustering, classification, and batch-based collaborative filtering that are implemented on top of Apache Hadoop.
  13. Hive: Data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage.
  14. Pegasus: A peta-scale graph mining system that runs in a parallel, distributed manner on top of Hadoop.
  15. Sqoop: A tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
  16. Flume: A distributed service for collecting, aggregating, and moving large amounts of log data to HDFS.
  17. Yahoo S4: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
  18. Twitter Storm: Storm can be used to process a stream of new data and update databases in real time.
  19. Trends: Funding, Companies, Applications, Jobs, IPOs
  20. Funding & IPO
     • Cloudera (commercial Hadoop): more than $75 million
     • MapR (Cloudera competitor): more than $25 million raised
     • 10Gen (maker of MongoDB): $32 million
     • DataStax (products based on Apache Cassandra): $11 million
     • Splunk: about $230 million raised through its IPO
  21. Big Data Application Domains
     • Healthcare
     • The public sector
     • Retail
     • Manufacturing
     • Personal-location data
     • Finance
  22. A Few Examples
  23. PayPal Tracking Architecture
  24. Market and Market Segments: Research Data and Predictions
  25. http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
  26. Market for big data tools will rise from $9 billion to $86 billion in 2020
  27. http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
  28. Future of Big Data
     • More powerful and expressive tools for analysis
     • Streaming data processing (Storm from Twitter and S4 from Yahoo)
     • Rise of data marketplaces (InfoChimps, Azure Marketplace)
     • Development of data science workflows and tools (Chorus, The Guardian, New York Times)
     • Increased understanding of analysis and visualization
     Source: http://www.evolven.com/blog/big-data-predictions.html
  29. http://www.evolven.com/blog/big-data-predictions.html
  30. Opportunities
  31. Skills Gap
     • Statistics
     • Operations Research
     • Math
     • Programming
     • So-called "Data Hacking"
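
The Hadoop and Storm slides contrast two processing styles: batch Map/Reduce over a complete data set, and incremental updates as new records stream in. The sketch below illustrates both ideas in plain, single-process Python. It does not use the Hadoop or Storm APIs, and every name in it (map_phase, shuffle, reduce_phase, batch_word_count, StreamingWordCount, the sample logs) is hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def batch_word_count(lines):
    """Batch style: run map, shuffle, and reduce over the full data set."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


class StreamingWordCount:
    """Streaming style: keep running counts and update them per record."""

    def __init__(self):
        self.counts = Counter()

    def on_record(self, line):
        # Update the running totals as soon as one record arrives.
        self.counts.update(word.lower() for word in line.split())
        return self.counts


# Hypothetical sample input standing in for web server log lines.
logs = ["GET /index GET /about", "POST /login GET /index"]

batch = batch_word_count(logs)

stream = StreamingWordCount()
for line in logs:
    stream.on_record(line)

print(batch)
print(dict(stream.counts))
```

Both approaches arrive at the same totals here. The difference is when results become available: the batch version produces them only after the whole input has been processed, while the streaming version has an up-to-date answer after every record. In a real cluster the map and reduce steps would run in parallel across many machines, which this single-process sketch does not attempt to show.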