Big Data Week - Myths and Legends

Presentation by Nick Halstead on some of the myths around Big Data.

  • Thanks, Nick, for the slides. I won't pretend that I understood everything in them, but I wanted to ask: did you record this session as a video?

    If it's available, could you post it too? We're using data for our startup, yet we are still defining our stakeholders.

Transcript

  • 1. NICK HALSTEAD, FOUNDER, DATASIFT, @NIK | Big Data “Myths and Legends” | #BDW13 | Thursday, 25 April 13
  • 2. #BDW13 | BIG DATA | SOCIAL DATA + TV MONITORING, POLITICAL TRACKING, FINANCIAL FEEDS | #DATASIFT
  • 3. #BDW13 | BIG DATA | SOCIAL DATA + TV MONITORING, POLITICAL TRACKING, FINANCIAL FEEDS | 1.5 BILLION ITEMS / DAY | #DATASIFT
  • 4. #BDW13 | BIG DATA | SOCIAL DATA + TV MONITORING, POLITICAL TRACKING, FINANCIAL FEEDS | 1.5 BILLION ITEMS / DAY | 1.5 PETABYTES OF STORAGE | #DATASIFT
  • 5. #BDW13 | BIG DATA | SOCIAL DATA + TV MONITORING, POLITICAL TRACKING, FINANCIAL FEEDS | 1.5 BILLION ITEMS / DAY | 1.5 PETABYTES OF STORAGE | 5000 CPU HADOOP CLUSTER | #DATASIFT
  • 6. Big Data “Myths and Legends” | #BDW13
  • 7. BIG DATA PERCEPTION | #GOOGLE | I THOUGHT I WOULD ASK GOOGLE....
  • 8. BIG DATA PERCEPTION | #GOOGLE | I THOUGHT I WOULD ASK GOOGLE....
  • 9. BIG DATA PERCEPTION | #GOOGLE | I THOUGHT I WOULD ASK GOOGLE....
  • 10. BIG DATA VENDOR “MYTHS”
  • 11. (no text)
  • 12. BIG DATA VENDOR “MYTHS”
  • 13. #BDW13
  • 14. 1. YOU MUST BUY ALL OF THIS (for one job!) | #BDW13
  • 15. 2. HOW BIG IS “BIG”
  • 16. #BDW13
  • 17. 20 PETABYTES IN EACH SEARCH INDEX REBUILD (this was 2 years ago) | #BDW13
  • 18. 20 PETABYTES IN EACH SEARCH INDEX REBUILD (this was 2 years ago) | 900,000 SERVERS | #BDW13
  • 19. #BDW13
  • 20. #BDW13 | 3.2 BILLION LIKES AND COMMENTS PER DAY
  • 21. #BDW13 | 3.2 BILLION LIKES AND COMMENTS PER DAY | OVER HALF A PETABYTE … EVERY 24 HOURS
  • 22. #BDW13 #HADRON
  • 23. 150 MILLION SENSORS DELIVERING DATA 40 MILLION TIMES PER SECOND | #BDW13 #HADRON
  • 24. 150 MILLION SENSORS DELIVERING DATA 40 MILLION TIMES PER SECOND | 10s OF PETABYTES PER YEAR | #BDW13 #HADRON
  • 25. A TYPICAL COMPANY
  • 26. A TYPICAL COMPANY | 100 EMPLOYEES
  • 27. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS
  • 28. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS | 25 DATABASES (customers, transactions, etc)
  • 29. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS | 1 MILLION TRANSACTION RECORDS | 25 DATABASES (customers, transactions, etc)
  • 30. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS | 1 MILLION TRANSACTION RECORDS | 5,000 BYTES PER TRANSACTION | 25 DATABASES (customers, transactions, etc)
  • 31. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS | 1 MILLION TRANSACTION RECORDS | 5,000 BYTES PER TRANSACTION | 25 DATABASES (customers, transactions, etc) | = 4 GIGABYTES (for largest database)
  • 32. A TYPICAL COMPANY | 100 EMPLOYEES | 10,000 CUSTOMERS | 1 MILLION TRANSACTION RECORDS | 5,000 BYTES PER TRANSACTION | 25 DATABASES (customers, transactions, etc) | = 4 GIGABYTES (for largest database) | = 20 GIGABYTES (for ALL company data)
  • 33. A TYPICAL HARD DRIVE | 2000 GIGABYTES (2TB)
  • 34. A TYPICAL HARD DRIVE | 2000 GIGABYTES (2TB) | 4000 GIGABYTES (4TB)
  • 35. 3. YOU NEED *LOTS* OF DATA SCIENTISTS | #DILBERT | #BDW13
  • 36. 3. YOU NEED *LOTS* OF DATA SCIENTISTS | #DILBERT | #BDW13
  • 37. 4. HOW BIG DATA IS USED | #BDW13
  • 38. 4. HOW BIG DATA IS USED | #BDW13 | BANKING
  • 39. 4. HOW BIG DATA IS USED | #BDW13 | BANKING | COMMUNICATIONS
  • 40. 4. HOW BIG DATA IS USED | #BDW13 | BANKING | COMMUNICATIONS | GOVERNMENT
  • 41. 4. HOW BIG DATA IS USED | #BDW13
  • 42. 4. HOW BIG DATA IS USED | #BDW13 | WEB LOGS 51%
  • 43. 4. HOW BIG DATA IS USED | #BDW13 | WEB LOGS 51% | CLICK STREAM 35%
  • 44. 5. HADOOP GONE BAD +SQL | #BDW13 #HADOOPGONEBAD
  • 45. RDBMS - RELATIONAL DATABASE | #BDW13
  • 46. RDBMS - RELATIONAL DATABASE | NEEDS TO BE PRE-DEFINED | #BDW13
  • 47. RDBMS - RELATIONAL DATABASE | NEEDS TO BE PRE-DEFINED | REQUIRES INDEX TO PERFORM | #BDW13
  • 48. RDBMS - RELATIONAL DATABASE | NEEDS TO BE PRE-DEFINED | REQUIRES INDEX TO PERFORM | QUERIES ARE CONSTRAINED | #BDW13
  • 49. MAP REDUCE | #MAPREDUCE | #BDW13
  • 50. MAP REDUCE | PROCESS CLOSE TO THE DATA | #MAPREDUCE | #BDW13
  • 51. MAP REDUCE | PROCESS CLOSE TO THE DATA | PARALLEL EXECUTION | #MAPREDUCE | #BDW13
  • 52. MAP REDUCE | PROCESS CLOSE TO THE DATA | PARALLEL EXECUTION | ANY TYPE OF ANALYSIS | #MAPREDUCE | #BDW13
  • 53. MAP REDUCE | PROCESS CLOSE TO THE DATA | PARALLEL EXECUTION | ANY TYPE OF ANALYSIS | HIDES DETAILS OF FAULT TOLERANCE, LOCALITY AND LOAD BALANCING | #MAPREDUCE | #BDW13
  • 54. BIG DATA SCHEMA | #NOSQL | HBASE | COLUMNS | FILES | #BDW13
  • 55. (QUICK ASIDE) | #SIDEBAR
  • 56. GOOGLE FILE SYSTEM (GFS) | GOOGLE MAPREDUCE (GMR) | GOOGLE STARTED ALL THIS....
  • 57. GOOGLE DREMEL | http://bit.ly/mS8QxX | #BDW13
  • 58. GOOGLE DREMEL | INTERACTIVE ANALYSIS | http://bit.ly/mS8QxX | #BDW13
  • 59. GOOGLE DREMEL | INTERACTIVE ANALYSIS | SCALE UP TO 10,000 SERVERS | http://bit.ly/mS8QxX | #BDW13
  • 60. GOOGLE DREMEL | INTERACTIVE ANALYSIS | SCALE UP TO 10,000 SERVERS | COLUMN STORAGE | http://bit.ly/mS8QxX | #BDW13
  • 61. OpenDremel | GOOGLE BIG QUERY | Google Big Query | #BDW13
  • 62. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL
  • 63. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL
  • 64. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL | RELATIONAL DATABASE
  • 65. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL | RELATIONAL DATABASE | GLOBALLY DISTRIBUTED
  • 66. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL | RELATIONAL DATABASE | GLOBALLY DISTRIBUTED | USE GPS / TRUETIME
  • 67. http://research.google.com/archive/spanner.html | GOOGLE SPANNER | #SPANNER #NEWSQL | RELATIONAL DATABASE | GLOBALLY DISTRIBUTED | USE GPS / TRUETIME | NO OPEN SOURCE EQUIVALENT
  • 68. (no text)
  • 69. BIG DATA IS THE NEW OIL
  • 70. NICK HALSTEAD, FOUNDER | HTTP://DATASIFT.COM | WE ARE HIRING!!
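
The "typical company" sizing on slides 25 to 34 is simple enough to check by hand. The sketch below redoes it with the figures from the slides. The choice of decimal units (1 GB = 10**9 bytes) and all function names are assumptions made for illustration. Under those assumptions the plain product for the largest database comes to 5 GB (about 4.66 GiB in binary units), so the slide's 4 GB is probably best read as a rounded figure. Either way the point stands: all of the company's data fills about 1% of a 2 TB drive.

```python
# Back-of-envelope check of the "typical company" sizing (slides 25-34).
# Input figures come from the slides; decimal units (1 GB = 10**9 bytes)
# and the function names are assumptions for illustration.

TRANSACTIONS = 1_000_000        # transaction records (slide 29)
BYTES_PER_TRANSACTION = 5_000   # bytes per transaction (slide 30)
COMPANY_DATA_GB = 20            # all company data, as stated on slide 32
DRIVE_GB = 2_000                # a 2 TB hard drive (slide 33)

GB = 10**9  # assumed decimal gigabyte


def largest_database_gb(records: int, bytes_per_record: int) -> float:
    """Size of the largest database in (decimal) gigabytes."""
    return records * bytes_per_record / GB


def fraction_of_drive(data_gb: float, drive_gb: float) -> float:
    """Share of a single drive that the data would occupy."""
    return data_gb / drive_gb


if __name__ == "__main__":
    largest = largest_database_gb(TRANSACTIONS, BYTES_PER_TRANSACTION)
    share = fraction_of_drive(COMPANY_DATA_GB, DRIVE_GB)
    print(f"largest database: {largest:.1f} GB")
    print(f"share of a 2 TB drive: {share:.1%}")
```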
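
The MapReduce slides (49 to 53) name the model's properties without showing its shape. The sketch below is a single-machine illustration of the map, shuffle and reduce steps, applied to web logs, the most common use reported on slide 42. It is not Hadoop code. The `METHOD PATH STATUS` log format, the sample lines and all function names are hypothetical, and a thread pool stands in for the cluster nodes that would each process the chunk of data stored closest to them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_chunk(lines: List[str]) -> List[Tuple[str, int]]:
    """Map: emit (path, 1) for every well-formed 'METHOD PATH STATUS' line."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # skip malformed lines
            pairs.append((parts[1], 1))
    return pairs


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_hits(chunks: List[List[str]]) -> Dict[str, int]:
    """Run the map step on each chunk in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical log lines, split into two chunks as if stored on two nodes.
    chunks = [
        ["GET /home 200", "GET /about 200"],
        ["GET /home 200", "POST /login 302", "malformed"],
    ]
    print(count_hits(chunks))
```

In a real cluster the framework handles chunk placement, retries and load balancing, which is the "hides details" point on slide 53. Here those concerns are absent because everything runs in one process.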