Data Alchemy


The “Big Data” and “Data Science” rhetoric of recent years seems to focus mostly on collecting, storing and analysing existing data. Data which many seem to think they have “too much of” already. However, the greatest discoveries in both science and business rarely come from analysing things that are already there. True innovation starts with asking Big Questions. Only then does it become apparent which data is needed to find the answers we seek.
In this session, we relive the true story of an epic voyage in search of data. A quest for knowledge that will take us around the globe and into the solar system. Along the way, we attempt to transmute lead into gold, use machine learning to optimise email marketing campaigns, experiment with sauerkraut, investigate a novel “Data Scientific” method for sentiment analysis, and discover a new continent.
This ancient adventure brings new perspectives on the Big Data and Data Science challenges we face today. Come and see how learning from the past can help you solve the problems of the future.

  1. Data Alchemy (And A Boat Filled With Sauerkraut) @lukasvermeer
  2. Lukas Vermeer Experiments at
  3. SAUERKRAUT (Introduction)
  4. HMS Endeavour: Our voyage starts when a ship sails from Plymouth on 26 August 1768.
  5. Wikipedia: “Provisions loaded at the outset of the voyage included 6,000 pieces of pork and 4,000 of beef, nine tons of bread, five tons of flour, three tons of sauerkraut, one ton of raisins and sundry quantities of cheese, salt, peas, oil, sugar and oatmeal.”
  7. REVOLUTIONS (Chapter 1)
  8. Thomas Kuhn: “Progress in science is not a simple line leading to the truth.”
  9. Aristarchus of Samos, c. 310 BC – c. 230 BC
  10. Archimedes: “[Aristarchus’] hypotheses are that the fixed stars and the Sun remain unmoved, that the Earth revolves about the Sun on the circumference of a circle, […]”
  11. George Pólya: “His fame rests on his heliocentric theory. […] Perhaps ‘theory’ is too strong a word, for his proofs were weak; yet it was a great idea.”
  12. Claudius Ptolemy, c. 100 AD – c. 170 AD
  13. Claudius Ptolemy: “But it has escaped [heliocentric proponents’] notice in the light of what happens around us in the air that such a notion would seem altogether absurd.”
  14. Claudius Ptolemy: “For the earth would always outstrip them in its eastward motion, so that all other bodies would seem to be left behind and to move towards the west.”
  15. Claudius Ptolemy: No westward motion. No stellar parallax. Geocentric math works. QED
  16. Claudius Ptolemy: No observed westward motion. No observed stellar parallax. Geocentric math works to explain what we have observed. GED
  17. Nicolaus Copernicus, 1473 – 1543
  18. Dē Revolutionibus Orbium Coelestium: Copernicus's vision of the universe, published in 1543, the year of his death, though he had formulated the theory several decades earlier.
  19. Andreas Osiander: “These hypotheses need not be true nor even probable. On the contrary, if they provide a calculus consistent with the observations, that alone is enough.”
  20. Galileo Galilei, 1564 – 1642
  21. Michael Fowler: “The real breakthrough that ultimately led to the acceptance of Copernicus’ theory was due to Galileo, but was actually a technological rather than a conceptual breakthrough.”
  22. Galileo did not invent the idea. He built a better telescope.
  23. Galileo first observed the moons of Jupiter. This observation upset the notion that all celestial bodies revolve around the Earth. Galileo published a full description in March 1610.
  24. Multiple models could probably explain the data you already have. Determining which one is closer to the truth requires a directed effort to collect new data (to the contrary).
  25. Imagine we have two observations.
  26. This theory fits the data.
  27. But so does this theory.
  28. This observation doesn’t help.
  29. But this one does.
  30. Uh oh. Time for a paradigm shift!
  31. Data You Have / Data You Need
  32. Data You Have / Data You Need / Sauerkraut Science
  33. TRANSIT (Chapter 2)
  34. “On the sizes and distances”: Aristarchus's 3rd-century BC calculations on the relative sizes of (from left) the Sun, Earth and Moon, from a 10th-century AD Greek copy.
  35. Aristarchus (3rd century BC): Distance to the Sun 380 - 1520 Earth radii
  36. Diagram from Edmund Halley's 1716 paper: Addressed to the Royal Society, showing how the Venus transit could be used to calculate the distance between the Earth and the Sun.
  37. Route of the first voyage of James Cook: An expedition to the south Pacific Ocean aboard HMS Endeavour, from 1768 to 1771. It was the first of three Pacific voyages of which Cook was the commander.
  38. Three years of travel. For two timestamps.
  39. James Cook: “Not a Clowd was to be seen the Whole day and the Air was perfectly clear, so that we had every advantage we could desire in Observing the whole of the passage of the Planet Venus over the Suns disk.”
  40. The "black drop effect": As recorded during the 1769 transit by James Cook.
  41. Right place, right time, right idea. Insufficiently accurate telescope.
  42. Jérôme Lalande (1771): Distance to the Sun 24 000 Earth radii
  43. Science is limited by data. Data is limited by engineering.
  44. Elon Musk: “In the absence of the engineering, you do not have the data. You just hit a limit. You can be real smart within the context of the limit of the data you have, but unless you have a way to get more data, you can’t make progress.”
  45. Data You Have / Data You Need
  46. Data You Have / Data You Need / Data You COULD CREATE
  47. TRANSMUTATION (Chapter 3)
  48. Michael Palmer: “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”
  49. Wikipedia: “The philosopher's stone is a legendary alchemical substance capable of turning base metals such as mercury into gold or silver.”
  50. Data Alchemy is not Data Science
  51. “Your home for data science”.
  52. Something good, something bad: Hotel reviews on
  53. Sentiment Analysis: Excerpt from “Entity Based Sentiment Analysis on Twitter” by Siddharth Batra and Deepak Rao (Stanford University).
  54. Kaggle is to real-life machine learning as chess is to war: Intellectually challenging and great mental exercise, but YOU DON'T KNOW, MAN! YOU WEREN'T THERE!
  55. Sentiment Analysis: At, we solve the sentiment analysis challenge at data collection time.
  56. Data You Have / Data You Need / Data You COULD CREATE
  57. Data You Have / Data You Need / Data You COULD CREATE / Delete or archive?
  58. Data You Have / Data You Need / Data You COULD CREATE / Keep, or recreate?
  59. Data You Have / Data You Need / Data You COULD CREATE / Proxies?
  65. Deciding which data to collect, and how, is a fundamental step in the scientific method. Limited both by available theories and engineering.
  66. Some of us are philosophers. Some of us build telescopes.
  67. Voltaire: “Judge a man by his questions rather than by his answers.”
  68. COOK’S SECRET SECOND OBJECTIVE (Appendix A)
  69. Terra Australis Nondum Cognita: 1570 map by Abraham Ortelius depicting a large continent on the bottom of the map and also an Arctic continent.
  70. Route of the first voyage of James Cook: An expedition to the south Pacific Ocean aboard HMS Endeavour, from 1768 to 1771. It was the first of three Pacific voyages of which Cook was the commander.
  71. A new map of the world: With Captain Cook's tracks, his discoveries and those of the other circumnavigators. Published in 1800 by W. Palmer.
  72. Matthew Flinders (1814): “There is no probability, that any other detached body of land, of nearly equal extent, will ever be found in a more southern latitude”
  73. EXPERIMENTING WITH SCURVY (Appendix B)
  74. Johann Bachstrom (1734): “This evil is solely owing to a total abstinence from fresh vegetable food, and greens; which is alone the primary cause of the disease”
  75. George F. M. Ball (2004): “Twelve sailors with scurvy were divided into six pairs and each pair was given a different daily concoction in addition to a common diet. Two fortunate patients were each given two oranges and one lemon every day; only these two recovered, thus demonstrating the efficacy of oranges and lemons.”
  76. Wikipedia: “Malt and wort were top of the list of the remedies Cook was ordered to investigate. The others were beer, Sauerkraut and Lind's rob. The list did not include lemons.”
  77. TL;DR: Limes cure scurvy! No they don’t. Experiment shows they do! Theory still says no. We’re eating limes now! What about these cheaper ones, though? Experiment says equally effective! So why did Scott’s expedition fail? Breaking news: limes don’t cure scurvy!? Told you so. So maybe eat more fresh meat? Works for Scott! Hooray! Science saves the day! Science also found this thing called “vitamin C”? Yes! And guess what? Limes cure scurvy again! … (Timeline: 1734, 1747, 1795, 1860, 1903, 1911, 1932)
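
Slides 24 to 30 argue that several models can explain the data you already have, and that only a deliberately chosen new observation can separate them. The Python sketch below is one way to make that argument concrete. Everything in it is an assumption made for illustration and not taken from the slides: the two theories (`linear`, `quadratic`), the observation values, the candidate measurement points, and the invented result of the new measurement.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], float]
Observation = Tuple[float, float]


def linear(x: float) -> float:
    """Hypothetical theory A: y = x."""
    return x


def quadratic(x: float) -> float:
    """Hypothetical theory B: y = x squared."""
    return x * x


def fits(model: Model, data: List[Observation], tol: float = 1e-9) -> bool:
    """True if the model reproduces every observation within tol."""
    return all(abs(model(x) - y) <= tol for x, y in data)


def disagreement(x: float) -> float:
    """How far apart the two theories' predictions are at x."""
    return abs(linear(x) - quadratic(x))


# Two made-up observations that both theories explain equally well.
observations: List[Observation] = [(0.0, 0.0), (1.0, 1.0)]
both_fit = fits(linear, observations) and fits(quadratic, observations)

# Hypothetical places where we could go and measure next.
candidates = [0.5, 1.0, 2.0, 3.0]

# Pick the measurement where the theories disagree the most.
best_x = max(candidates, key=disagreement)

# Invented outcome of the new measurement at best_x.
new_data = observations + [(best_x, 9.0)]
survivors = [m.__name__ for m in (linear, quadratic) if fits(m, new_data)]

print(both_fit, best_x, survivors)
```

Measuring at x = 1.0, where both theories predict the same value, corresponds to the observation on slide 28 that "doesn’t help". The point of largest disagreement plays the role of the observation that does, and collecting it is what rules one theory out.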