What is Big Data?
@timgasper
#ffmassive

Why is everyone buzzing about Big Data? Here are some slides I presented at FFMassive at SXSW 2012 regarding what big data is, some of the stats, and some different approaches to solving the problem. This is very high level, oriented at folks who haven't encountered big data much.

Published in: Technology, Business

  • Gartner: Big data chiefly means three things: large (big) data volume, large throughput of data per second or minute, and a large variety of different types of data to handle. Variety: the prior slide has just a small subset of the data sources our clients are excited about. What do you need in order to be able to solve these problems?
  • So let’s dig into it. Big data is a pretty easy idea to explain: we produce data, all the time, constantly, and we produce a lot of it. Data centers now take up 1.3% of global energy usage, as much as the entire continent of Australia. So we have some similarly big challenges and even bigger opportunities. On the left on this slide I’ve listed just a few of the kinds of data sources that might be available to an agency, should they choose to ingest them. Everything from their own clients’ customer databases, to streams of tweets from Twitter, to Google search results and even forum posts, can be ingested in the pursuit of building something that generates insights for their clients.
  • The last few years have seen a rapid pace of innovation in big data. With any new approach, new skills are required. It wasn’t that long ago that web pioneers solved one hard problem (web search ad display) with big data. They quickly rolled big data out to meet a range of applications and create lots of value. And with that came the explosion of products and vendors vying to meet this market. The new skills required can’t be found internally or from traditional or offshore consulting firms. Partnering with a team that helps you make the right choices can help you manage innovation and accelerate your time to value.
  • Despite how well-known Hadoop is, even in the agency ecosystem, there’s still often confusion about what it actually does and what problems it solves. Let me show you an illustrative – if a little silly – example that might help you to understand exactly when and why you would use Hadoop.
  • Welcome to the Batch Sub Shop! We make sandwiches, lots of sandwiches. If we get a big order for 1,000 subs, we execute that order all at once. Hadoop has two phases in each calculation or job that executes: the map phase and the reduce phase. In the map phase, input data is modified, transformed, parsed, or otherwise altered or prepared. In our Batch Sub Shop, the map phase is when we slice our bread and our veggies and prepare our meat. In the reduce phase, transformed data from the map phase is assembled into the final output we want. In our Batch Sub Shop, the reduce phase is when we assemble all the sandwich orders from the sliced bread, veggies, and meats we prepared in the map phase. In a few hours, we’ll deliver a huge batch of sandwiches, fresh and delicious. For those of you who have started to evaluate Hadoop, I hope you’re getting the joke here. Hadoop is great in the same way that a caterer is great: if you have a big order and you don’t need it right away, it’s the perfect choice. Similarly, if you have a large amount of data that you need analyzed and you don’t need the result right away, you should use Hadoop. This is one of the reasons that Hadoop is so popular for analyzing historical data in a batch processing paradigm.
  • But say you are really hungry, and you just want to eat your sandwich now. Our Batch Sub Shop will make you wait 3 hours! Sure, you’ll get 1,000 sandwiches at the 3-hour mark, but that’s not very helpful if you just wanted one right away. Similarly, Hadoop is not the right big data tool to use when you want results right away, in real time. Not only do you have to assemble all your data in one place, as we assembled all our ingredients in one place in the Batch Sub Shop, but you also have to wait for the full computation to finish before you get any results. This can often take hours. This makes Hadoop appropriate for batch or offline calculations that can run, say, overnight, and whose results we won’t need to see till morning. But what if we need results right away?
  • Enter the Streaming Sub Shop. This sub shop works like a conveyor belt. Ingredients enter on the left and as they move through the shop, we slice ‘em, dice ‘em, assemble those sandwiches, and get them toasted and served. The first sandwich will come out in just a few minutes, and sandwiches will keep coming out for as long as ingredients are fed in. Similarly, there are technologies complementary to Hadoop which enable this kind of stream processing of big data.
  • But now let’s return to the central challenge of big data: why aren’t you doing it right now? Why aren’t your competitors? It’s because it’s hard, you lack the expertise, and you haven’t or can’t hire the necessary resources – all of whom are rare and expensive.
  • We are a big data cloud services provider for the enterprise. We bundle together all the analytics infrastructure you need, like Hadoop, real-time analytics, and powerful databases, and provide the hosting, support, and expertise, so that you can focus on analytics and on driving those business use cases and apps, not on wrangling complex systems.
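The two sub shops in the notes above can be sketched in a few lines of plain Python. This is only an illustration of the batch versus streaming idea, not Hadoop's API or any particular stream-processing product: the `ORDERS` data and the function names `map_phase`, `reduce_phase`, `batch_job`, and `streaming_job` are all hypothetical, made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: each order is a string of sandwich ingredients.
ORDERS = [
    "bread turkey lettuce",
    "bread ham lettuce tomato",
    "bread turkey tomato",
]


def map_phase(order: str) -> List[Tuple[str, int]]:
    """Map: parse one input record into (ingredient, 1) pairs."""
    return [(ingredient, 1) for ingredient in order.split()]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: sum the values for each key to assemble the final output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def batch_job(orders: List[str]) -> Dict[str, int]:
    """Batch style: no result is available until every order is processed."""
    mapped = [pair for order in orders for pair in map_phase(order)]
    return reduce_phase(mapped)


def streaming_job(orders: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: yield an updated result as each order arrives."""
    totals = Counter()
    for order in orders:
        totals.update(order.split())
        yield dict(totals)


if __name__ == "__main__":
    # One answer, delivered only at the end of the whole job.
    print(batch_job(ORDERS))
    # A running answer, available after the very first order.
    for snapshot in streaming_job(ORDERS):
        print(snapshot)
```

Both functions end with the same totals. The difference is when an answer first becomes available: the batch job returns once, after all input is consumed, while the streaming job produces a usable partial result after the first record.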

    1. What is Big Data? @timgasper #ffmassive
    2. Source: The Big Data Insight Group
    3. Source: IBM
    4. 2010 1.2 Zettabytes Per Year 2020 35.2 Zettabytes Per Year Source: IBM
    5. volume velocity variety
    6. CRM/customer support POS/purchases ERP/accounting email/documents/collab. BI & data warehouse system & network logs web logs/clickstream google analytics/omniture other SaaS products / APIs facebook/twitter/yelp/4sq experian/epsilon/acxiom mobile devices sensors machine-to-machine product reviews google search results ? many terabytes of data, sometimes many petabytes more data than ever before #ffmassive
    7. #ffmassive
    8. Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. #ffmassive Source: Wikipedia
    9. The challenges include capture, curation, storage, search, sharing, analysis, and visualization. #ffmassive Source: Wikipedia
    10. #ffmassive Source: PARC
    11. Who Can Manage Innovation and Complexity to Deliver Value Quickly? • Multiple Layers of Technology to Integrate • Can Take Months to Build One Analytic Application PROVEN VALUE FOR RANGE OF APPLICATIONS EXPLOSION OF PRODUCTS AND VENDORS Current State Source: Think Big Analytics
    12. hadoop! #ffmassive
    13. #ffmassive
    14. what about real-time? #ffmassive
    15. #ffmassive
    16. #ffmassive
    17. #1 enterprise cloud for big data some of our customers our partners #ffmassive
