Big Data vs Data Warehousing

Published in: Technology
  • We are at the end of the growth curve. 9B is our total population. This is an important observation because many data estimates are based on human activity and have so far assumed exponential growth. This is NOT the case anymore!
  • This shows the development of hard drive capacity over time.
  • The calculation is not meant to be read; it just lets people know we did the calculation and what it PHYSICALLY means (see the animation). There is a real cost to storing a lot of data, and this is one of the reasons cloud makes a lot of sense. Wine bottles.
  • This is Hyde Park, from one end to the other.
  • Big Data vs Data Warehousing

    1. 1. Bigdata vs. Data Warehousing: Synergy or Conflict? Thomas Kejser, thomas@kejser.org, http://blog.kejser.org, @thomaskejser
    2. 2. Who is this Guy?
          Thomas Kejser, http://blog.kejser.org, @thomaskejser
          • Formerly: Lead SQLCAT EMEA
          • Now: CTO FusionIo EMEA
          • 15 years of database experience
          • Performance Tuner
    3. 3. Human Consciousness Doesn’t Scale
          [Chart: world population in billions of humans (y-axis, 5 to 10) by year (x-axis, 2000 to 2250)]
          Source: United Nations Projections
    4. 4. Text Messages in a Table
          CREATE TABLE AllTexts (
              Sender            BIGINT         -- 8B
            , Receiver          BIGINT         -- 8B
            , SenderLocation    BIGINT         -- 8B
            , ReceiverLocation  BIGINT         -- 8B
            , Time              DATETIME       -- 8B
            , SMS               VARCHAR(140)   -- 140B
          )
          -- Total: 180 bytes per row
    5. 5. How much do we text?
          • World Average
            • 6.1 Trillion Text Messages / year
            • About 80% cell phone coverage
            • 7 billion people
            • 3 messages/day/person
          • But:
            • Teenagers: 50 messages/day
          Source: Pew Internet Research 2010 & ITU
    6. 6. How much will we EVER text?
          • 9B people acting like teenagers (in 2050)
            • 50 texts/day
          • That’s 450 billion texts/day
            • 164 Trillion texts/year (20x today)
            • 180 bytes each
            • Assume x3 compression
          • Approximation: 10 Petabytes/year in 2050
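
The estimate on slides 4 to 6 can be replayed in a few lines. This is only a sketch of the arithmetic: the function and constant names are hypothetical, a year is assumed to be 365 days, and a petabyte is assumed to be 10^15 bytes. The inputs are the slide figures.

```python
# Row size from slide 4: five 8-byte columns plus a 140-byte SMS body.
ROW_BYTES = 5 * 8 + 140  # 180 bytes


def texts_per_year(people, texts_per_person_per_day, days_per_year=365):
    """Total messages per year (365-day year is an assumption)."""
    return people * texts_per_person_per_day * days_per_year


def yearly_volume_pb(texts, row_bytes=ROW_BYTES, compression=3):
    """Compressed storage per year in petabytes (assumed 1 PB = 10**15 bytes)."""
    return texts * row_bytes / compression / 10**15


if __name__ == "__main__":
    texts = texts_per_year(9_000_000_000, 50)  # 9B people, 50 texts/day
    print(f"{texts / 1e12:.0f} trillion texts/year")
    print(f"{yearly_volume_pb(texts):.1f} PB/year after x3 compression")
```

Under these assumptions the result lands just under 10 PB per year, in line with the slide's approximation.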
    7. 7. Moore’s Hard Drives: Can it be done?
          [Chart: hard drive capacity in GB (log scale) by year]
    8. 8. How Large is this/year?
          • Hard Disk (4TB): 2.5”
          • Wine Bottle (75cl): 4.0”
          • About 1500 Wine Bottles
    9. 9. In the Data Center
          • Calculating:
            • 2U Storage = 24 Disks (includes compute)
            • 4TB per Disk
            • 100TB in 2U (a bit less)
            • 10PB = 200U storage
          • About six racks
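
The rack estimate on slide 9 can be sketched as below. The function names are hypothetical, and the usable height of a rack is an assumption the slide does not state, so it is left as a parameter. The sketch uses the exact 96 TB per 2U enclosure rather than the slide's rounded 100 TB, which is why it arrives at 210U instead of 200U.

```python
import math


def rack_units_needed(total_tb, disks_per_enclosure=24, tb_per_disk=4,
                      units_per_enclosure=2):
    """Rack units required to hold total_tb, using the slide's 2U/24-disk box."""
    enclosure_tb = disks_per_enclosure * tb_per_disk  # 96 TB, "a bit less" than 100
    enclosures = math.ceil(total_tb / enclosure_tb)
    return enclosures * units_per_enclosure


def racks_needed(rack_units, usable_units_per_rack):
    """Racks required; usable_units_per_rack is an assumption, not a slide figure."""
    return math.ceil(rack_units / usable_units_per_rack)


if __name__ == "__main__":
    units = rack_units_needed(10_000)  # 10 PB expressed in TB
    print(units, "rack units")
    print(racks_needed(units, 36), "racks at an assumed 36 usable U per rack")
```

With an assumed 36 usable units per rack this gives six racks; a fully populated 42U rack would give five, so "about six" is a reasonable planning figure.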
    10. 10. Warehouses Serve us Well..
    11. 11. … And it is Becoming a Commodity
          • Good Management Interfaces
          • Standard SQL
            • with a few extensions
          • Appliances
          • Support system
          • Homogeneous HW
            • In chunks
    12. 12. vs.
    13. 13. PDW vs. Hive – Scan/seek
          Query 1:
            SELECT count(*)
            FROM lineitem
          Query 2:
            SELECT max(l_quantity)
            FROM lineitem
            WHERE l_orderkey > 1000 and l_orderkey < 100000
            GROUP BY l_linestatus
          [Chart: run time in seconds (y-axis, 0 to 1500) for Hive and PDW on Query 1 and Query 2]
    14. 14. PDW vs. Hive - Joins
          SELECT max(l_orderkey)
          FROM orders
          JOIN lineitem
          ON l_orderkey = o_orderkey
          PDW-U:
            • orders partitioned on c_custkey
            • lineitem partitioned on l_partkey
          PDW-P:
            • orders partitioned on o_orderkey
            • lineitem partitioned on l_orderkey
          [Chart: run time in seconds (y-axis, 0 to 4000) for Hive, PDW-U and PDW-P]
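
Why the partitioning choice on slide 14 matters can be illustrated with a toy model. This is not how PDW or Hive implement joins; it is a sketch that hash-partitions two small, made-up tables across nodes and counts how many lineitem rows sit on a different node than their matching order. All function names and the sample data are hypothetical; the column names follow the slide.

```python
from collections import defaultdict


def hash_partition(rows, key, nodes):
    """Assign each row to a node by hashing the chosen column."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return parts


def rows_to_move(orders, lineitem, order_part_key, line_part_key, nodes):
    """Count lineitem rows stored on a different node than their matching order."""
    order_node = {}
    for node, rows in hash_partition(orders, order_part_key, nodes).items():
        for row in rows:
            order_node[row["o_orderkey"]] = node
    moved = 0
    for node, rows in hash_partition(lineitem, line_part_key, nodes).items():
        moved += sum(1 for r in rows if order_node[r["l_orderkey"]] != node)
    return moved


def sample_tables(n_orders=100):
    """Made-up data: two lineitem rows per order."""
    orders = [{"o_orderkey": k, "c_custkey": k % 7} for k in range(n_orders)]
    lineitem = [{"l_orderkey": k, "l_partkey": (k * 3 + j) % 11}
                for k in range(n_orders) for j in range(2)]
    return orders, lineitem


if __name__ == "__main__":
    orders, lineitem = sample_tables()
    print("PDW-P style:", rows_to_move(orders, lineitem, "o_orderkey", "l_orderkey", 4))
    print("PDW-U style:", rows_to_move(orders, lineitem, "c_custkey", "l_partkey", 4))
```

When both tables are partitioned on the join key, every lineitem row is already next to its order and nothing has to move. With the unaligned layout, rows must be shuffled between nodes before the join can run.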
    15. 15. What does Big Data need to Catch up?
          • Thread startup times
          • Co-location awareness
          • Files vs. optimized DB memory structures
          • Column stores and other DB tech
          Generic is good… but when there is structure, make use of it!
    16. 16. What is Bigdata? Very Unstructured Data
    17. 17. How many Pictures of Cats?
          • Flickr Today:
            • 300MB/month
            • 2GB/year
            • 51M users (too small?)
          • Estimate: 102 PB / year
          • 10 x text messages
          Source: WikiPedia
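
The picture estimate on slide 17 follows from two of the slide figures: 51M users at 2GB per user per year. A minimal sketch, assuming decimal units (1 PB = 10^6 GB) and a hypothetical function name:

```python
def yearly_photo_volume_pb(users, gb_per_user_per_year):
    """Yearly upload volume in petabytes (assumed 1 PB = 10**6 GB)."""
    return users * gb_per_user_per_year / 10**6


if __name__ == "__main__":
    photos_pb = yearly_photo_volume_pb(51_000_000, 2)
    texts_pb = 10  # the text message approximation from slide 6
    print(f"{photos_pb:.0f} PB/year, about {photos_pb / texts_pb:.0f}x the text estimate")
```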
    18. 18. How big is this in wine bottles?
    19. 19. We have learned how to store it!
    20. 20. What is HDFS?
          • Distributed File System
          • Open Source
          • No more SAN
          • The Failure Unit is the Server
    21. 21. Fully unstructured data is boring… Unless you get money for storing it
    22. 22. Acquiring Personal Information: Your Semi-structured Data, the Old Fashioned Way
    23. 23. The Social Angle: Who do you talk to and how often?
    24. 24. The Reasons: Why do you own a cell phone?
    25. 25. Saturday, 1:39am - at The Pub: Your Semi-structured Data, For Free
    26. 26. Big Value: Extraction of meaning and insight from semi-structured data
    27. 27. Extracting Meaning from Humans
          Method: Examples
          • Turn semi-structure to structure: Image recognition, network proximity and super nodes, social media
          • Needle in a haystack: Extract outliers, Fraud
          • Herd behaviors: Clustering, Pattern Recognition, “Customers who bought this also bought”
          • Text classification and search: Text indexes, syntactic counting, pagerank
          • Text to structure: Semantic analysis, loose structure into structure
    28. 28. Find New Customers: “Michael, who is respected among his peers, often talks about his new, cool gadgets” [Diagram labels: Michael, Tommy, Thomas]
    29. 29. Cross Sell: “Families who own an Aston Martin will often buy a Mini Cooper too”
    30. 30. Free Information
    31. 31. Need: Lots of CPU Cores!
    32. 32. Need: Data Centers!
    33. 33. Provisioning has to be REALLY fast
    34. 34. Things to Learn for the Future
          • Get good at
            • Statistics (again)
            • Distributed Algorithms
            • Tuning
          • Understand Physical Constraints
          • Acquire deep domain knowledge
    35. 35. Something is Changing
          [Diagram labels: Today, Tomorrow, CAPEX, Hardware, OPEX, Hardware, You]
    36. 36. The Mother of All Stovepipes
    37. 37. [Diagram] Big Data / Staging (No Model): Data you are afraid to lose; Data You actually need. Delivery (Model).
    38. 38. Synergy
          [Diagram labels: “Create Structure for me”, Warehouse, “Here is a table”]
    39. 39. Applying Social Media to Structure
    40. 40. Summary
          Data Warehouse                       Big Data
          • There is a model                   • Don’t bother modeling!
          • Seek Co-location                   • Optional Co-Location
          • Respond in seconds                 • Respond in minutes
          • Calculate first, query after       • Calculate while querying
          • Expensive HW                       • Cheap HW
          • Optimise for target HW             • Good enough on all HW
          • Homogeneous HW                     • Heterogeneous HW
          • Pay vendor, expect optimised       • Free license, optimise yourself
    41. 41. &
