https://github.com/glennblock
https://twitter.com/gblock
“I should be tweeting”
Make machine data accessible, usable
and valuable to everyone.
Platform for Machine Data
Any Machine Data
HA Indexes and Storage
Search and Investigation
Proactive Monitoring
Operational Visibility
Real-time Business Insights
Commodity Servers, Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
DATA
15,000 BC – Pictures
Lascaux, France
6000 BC – Symbols
3,500 BC – Language
1,275 BC – Papyrus
1st–13th Century – Codex
13th Century – Movable type
15th Century – Printing press
19th Century – Babbage's Analytical Engine
1936 – Turing machine
1945 – ENIAC
1947 – The first bug
1977 – ARPANET
1990s – Internet
Phones and Tablets
RFID
Cloud
Services
New consumer devices
90 percent of all the data in the
world has been generated over
the last two years
source: sciencedaily.com
Every day, 2.5 quintillion bytes of data are generated
1 quintillion = 1 followed by 18 zeros!
57.5 billion 32 GB iPads
source: storagenewsletter.com
2.7 zettabytes exist in the digital universe
1 zettabyte = 1 followed by 21 zeros!
42 ZB = all human speech, digitized
source: highscalability.com
How big is big?
That’s A LOT of data!
How do you harness it?
This is what big data
is really about.
Asking questions and
getting answers
Massive amounts of data.
Machine generated
VOLUME
Data is coming from a multitude of sources
Mix of structured and unstructured data
(JSON, XML, CSV, Plain Text)
Need a way to store it and query it (see the sketch after the list below)
VARIETY
Log files
Activity Feeds
Emails
Device Streams
Audio Files
Videos
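As a rough illustration of the variety problem, here is a minimal Python sketch that normalizes JSON, CSV and plain-text log lines into one common record shape so they can be queried together. The field names and sample records are invented for the example; they are not from any particular product.

    import csv
    import io
    import json

    def parse_json_line(line):
        # Structured input: already key/value pairs.
        return json.loads(line)

    def parse_csv_lines(text):
        # Semi-structured input: the header row gives the keys.
        return list(csv.DictReader(io.StringIO(text)))

    def parse_plain_line(line):
        # Unstructured input: keep the raw text and pull out what we can.
        level = "ERROR" if "error" in line.lower() else "INFO"
        return {"level": level, "raw": line}

    # Every source ends up as the same kind of record,
    # so one query can run across all of them.
    records = []
    records.append(parse_json_line('{"level": "ERROR", "msg": "disk full"}'))
    records.extend(parse_csv_lines("level,msg\nINFO,started\nERROR,timeout"))
    records.append(parse_plain_line("2014-05-10 14:02:11 error: connection reset"))

    errors = [r for r in records if r.get("level") == "ERROR"]
    print(len(errors))  # 3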
Data arrives at many different frequencies
Need to be able to process it in real time.
VELOCITY
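To make "process it in real time" concrete, here is a toy Python sketch, not tied to any particular streaming engine, that updates a running count as each event arrives instead of waiting for a complete batch. The event values and the alert threshold are made up for the example.

    import time
    from collections import Counter

    def event_stream():
        # Stand-in for a real-time feed (socket, message queue, tailed log):
        # events show up one at a time, at their own pace.
        for status in ["200", "200", "500", "200", "500", "500"]:
            time.sleep(0.1)
            yield status

    running = Counter()
    for status in event_stream():
        # Update state per event rather than re-scanning stored data.
        running[status] += 1
        if running["500"] >= 3:
            print("alert: error responses piling up", dict(running))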
Not all data that is stored is useful.
Need to identify the useful data
Need to wade through all the noise
VERACITY
SOLUTIONS
Map/Reduce
function map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    emit(w, 1)

function reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit(word, sum)
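The pseudocode above is the classic word-count example. A minimal single-process Python sketch of the same idea might look like the following; the function and variable names are illustrative and not taken from any particular framework.

    from collections import defaultdict

    def map_fn(name, document):
        # name: document name, document: document contents.
        # Emit an intermediate (word, 1) pair for every word.
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, partial_counts):
        # Sum the partial counts collected for one word.
        return (word, sum(partial_counts))

    def map_reduce(documents):
        # "Shuffle" step: group intermediate values by key before reducing.
        grouped = defaultdict(list)
        for name, document in documents.items():
            for word, count in map_fn(name, document):
                grouped[word].append(count)
        return sorted(reduce_fn(w, counts) for w, counts in grouped.items())

    print(map_reduce({"a.txt": "big data big answers", "b.txt": "big questions"}))
    # [('answers', 1), ('big', 3), ('data', 1), ('questions', 1)]

In a framework such as Hadoop, the map and reduce calls run in parallel across many machines and the shuffle happens over the network, but the shape of the two functions stays the same.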
High-scale, high-availability databases
Distributed processing
of large datasets
Data Visualization and analysis
End to end tools
More information
www.mongodb.org
www.memsql.com
cassandra.apache.org
hadoop.apache.org
www.tableausoftware.com
www.elasticsearch.org
splunk.com
@gblock http://github.com/glennblock
http://www.flickr.com/photos/11812960@N04/4050576435
Getting your head around big data
My talk on Big Data from Dallas Day of .NET 2014

  • At Splunk, our mission is to make machine data accessible, usable and valuable to everyone. And this overarching mission is what drives our company and product priorities.
  • Splunk is the leading platform for machine data analytics, with over 5,200 organizations using Splunk (as of 7/1/13) – from tens of GB to many tens of TBs of data PER DAY. Splunk software is optimized for real-time, low latency and interactivity. Splunk software reliably collects and indexes all the streaming data from IT systems and technology devices in real time – tens of thousands of sources in unpredictable formats and types. The value from Splunking machine data is described as Operational Intelligence. This enables organizations to: 1. Find and fix problems dramatically faster. 2. Automatically monitor to identify issues, problems and attacks. 3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions. 4. Gain real-time insight from operational data to make better-informed business decisions.
  • The Lascaux cave paintings record the first known narrative stories – telling stories through visualization of events. Mike Bostock, D3 and the New York Times viz. CSS had to come from somewhere.
  • Jiahu, China