A Brief History Of Data


Published on

My presentation from QCON London 2014 entitled "A Brief History Of Data"

Published in: Data & Analytics, Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • 3 words to sum up what types of projects I am drawn to.
  • Heard all the sheep jokes , any original heckles are most welcome.
  • What am I talking about :Data Where it came from What it looks like todayHow are we going to do some useful things with itMarketing guys created itBut the underlying data , tools and technologies and fundamental core discoveries didn’t
  • Like hollywood , I like a good origin story
  • Shoutout to wolfram alpha.The invention of arithmetic provides a way to abstractly compute numbers of objects. The Ishango Bone, a baboon's fibula discovered in the 1970's, was a counting system, a list of prime numbers and even gives evidence of a multiplication tableImagine taking these to the bar ! No androids , iphones.
  • The Lascaux cave paintings record the first known narrative stories. Telling stories through visualization of eventsMike Bostock,D3 and the NewYork Times VizCSS had to come from somewhere
  • No good if they are just sitting on a wall.A central event in the emergence of civilization, written language provides a systematic way to record and transmit knowledge. Old Sumerian love poem. Oh would they be dissapointed in Bieber.
  • The first known calendar system is established, rounding the lunar month to 30 days to create a 360-day year. The “like a boss Epoch” 1970 . Pfffft.
  • The Library at Thebes is the first known effort to gather and make many sources of knowledge available in one place. Inscription above the door “medicine for the soul” , haven’t heard people say that about folks trying to troubleshoot RDBMS scalability issues 
  • The Turin Papyrus is the first known topographic map. Geostats is going to be one of the most important “dimensions of data” in the future , and I’ll show a demo.
  • The Pythagoreans promote the idea that numbers can be used to systematically understand and compute aspects of nature, music, and the world. Euclid writes his Elements, systematically presenting theorems of geometry and arithmetic. Archimedes uses mathematics to create and understand technological devices and possibly builds gear-based, mechanical astronomical calculators. Antikythera Mechanism A gear-based device that survives today is created to compute calendrical computation. Aristotle tries to systematize knowledge, first, by classifying objects in the world, and second, by inventing the idea of logic as a way to formalize human reasoning. The father of OO and the if/then/else statement ? No disrespect to the Smalltalk team at Xerox PARC
  • Pliny creates an encyclopedia that claims to summarize all knowledge with references to its sources. Tsai Lun invents paper in China. French philosopher Nicole Oresme introduces the notion of drawing graphs of values. The printing press , Moveable type makes it economical to print many kinds of documents. Codices : A codex (Latincaudex for "trunk of a tree" or block of wood, book; plural codices) is a book made up of a number of sheets of paper, vellum, papyrus, or similar, with hand-written content,[1] usually stacked and bound by fixing one edge and with covers thicker than the sheets, but sometimes continuous and folded concertina-style. The alternative to paged codex format for a long document is the continuous scroll. Examples of folded codices are the Maya codices. Sometimes the term is used for a book-style format, including modern printed books but excluding folded books.Before codices came scrolls.
  • John Napier publishes the first tables of logarithms,fundamentialmathmatic tenets behind stats , geometry , cryptography etc..Leibniz promotes the idea of answering all human questions by converting them to a universal symbolic language, then applying logic using a machine. He also tries to organize the systematic collection of knowledge to use in such a system. Graunt and others start to systematically summarize demographic and economic data using statistical ideas based on mathematics. James Watt and John Southern create (but keep secret for 24 years) a device for automatically tracing variation of pressure with volume in a steam engine.The Jacquard loom weaves patterns specified by punched cards. Babbage constructed a mechanical computer to automate the creation of mathematical knowledge. Samuel Morse sends the first public telegraph message. Paul Julius Reuter uses pigeons to fly stock prices between Aachen and Brussels. Dewey invented the Dewey Decimal System for classifying the world's knowledge and specifying how to organize books in libraries. Indexing for performance, NOSQL and key value stores ideas came from somewhere.
  • Ronald Fisher and others lay the foundations for modern statistics. Turing, Bletchly Park, shows that any reasonable computation can be done by programming a fixed universal machine—and then speculated that such a machine could emulate the brain. Fortran, COBOL, and other early computer languages defines the concept of a precise formal representation for tasks to be performed by computers. Roger Tomlinson initiates the Canada Geographic Information System, creating the first GIS system.The first two nodes of what would become the ARPANET were interconnected between Leonard Kleinrock's Network Measurement Center at the UCLA's School of Engineering and Applied Science and Douglas Engelbart's NLS system at SRI International (SRI) in Menlo Park, California, on 29 October 1969.[1The Unicode standard assigns a numerical code to every glyph in every human language.
  • Where are we heading ?Where does it come from ?What does it look like ?Is it just stuff from computers ?Is it just humans and human creations that represent data ?
  • Log filesJVM JMXSNMPTapping the wire tcpdump, ngrep etc…APIs (REST etc…)
  • Data always been hereIf a bear shits in the woodsDoes data exist without someone there to observe it ?
  • Capturing data from the past
  • Data all around us , it may not necessarily look like data at first, until you think about it as data and how to capture it
  • Data is starting to permeate our daily lives and create a lot of opportunity for data driven solutionsAnd when you have the means to correlate data together , this can lead to many possibilitys
  • Even bearded hipsters can be useful.I aspire to have that amount of hairFitbit , humans are a source of data
  • A lot of data being createdA lot of data existsA lot more data will be created.We are analysing it and are getting more powerful at analysing it.
  • The data V’s , useful for booth babing duty.VeracitySome data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather condi- tions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of big data.
  • Where are we heading ?Huge DataReally F***en astronomical dataNope
  • Just dataBut we’ll be better at generating it more optima tallyData sources might increase , IOT etc…But , volumes may taper out to a less exponential trajectoryYou need people who understand the data , to impart knowledge about the dataThat allows us to generate insightsAnd ultimately take actions , unless you just want to look at pretty charts all day.For me that is the data story.
  • Developers will be the kingmakers (to use a RedMonkism)
  • A few musings of several potential ones.
  • At Splunk, our mission is to make machine data accessible, usable and valuable to everyone. Andthis overarching mission is what drives our company and product priorities.
  • What is Splunk
  • Splunk is the leading platform for machine data analytics with over 5,200 organizations using Splunk (as of 7/1/13) – from tens of GB to many tens of TBs of data PER DAY.Splunk software is optimized for real-time, low latency and interactivity.Splunk software reliably collects and indexes all the streaming data from IT systems and technology devices in real-time - tens of thousands of sources in unpredictable formats and types.The value from Splunking machine data is described as Operational Intelligence. This enables organizations to: 1. Find and fix problems dramatically faster2. Automatically monitor to identify issues, problems and attacks3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions4. Gain real-time insight from operational data to make better-informed business decisions
  • BUILD SPLUNK APPSThe Splunk Web Framework makes building a Splunk app looks and feels like building any modern web application.  The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer “escape” to HTML with one click to do more powerful customization and integration with JavaScript. Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).EXTEND AND INTEGRATE SPLUNKSplunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code.  Developers can use the Splunk SDKs to: - Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications - Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk - Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP- Build customer-facing dashboards in your applications powered by user-specific data in Splunk - Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk- Programmatically extract data from Splunk for long-term data warehousingDevelopers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment. Splunk Enterprise offers search extensibility through: - Custom Search Commands - developers can add a custom search script (in Python) to Splunk to create own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API- Scripted Lookups: developers can programmatically script lookups via Python.- Scripted Alerts: can trigger a shell script or batch file (we provide guidance for Python and PERL).- Search Macros: make chunks of a search reuseable in multiple places, including saved and ad hoc searches.  Splunk also provides developers with other mechanisms to extend the power of the platform.-Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable. - Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.
  • Social DataTwitterTwitter REST InputShow setup for qconlondon tags and searching over raw dataCreate on the fly dashboardCreate searchtime extraction and dashboardShow precanned full dashboardShow underlying HTML/JS sourceFoursquareFoursquare REST inputShow 3 pre canned searchesShow haversine search commandShow geostats and create simple map on the fly in a dashboardPublic Data / Open Data / Geo DataFirebase , SF Muni realtime transit dataNode Scripted InputShow dashboardInputlookup for data genShow html/js sourceMobile Device signal outages
  • New Apps that take advantage of all this new data and correlations and insightsMore data more accessible , the democritization of dataHelp people : More effcient ways to run your car your household etc..Socio economic dataCyberbullyingSplunk for good
  • A Brief History Of Data

    1. 1. Copyright © 2014 Splunk Inc. A brief history of data Damien Dallimore Worldwide Developer Evangelist @ Splunk
    2. 2. Who Am I 2 Worldwide Developer Evangelist @ Splunk I code I talk about coding Community, Collaboration, Open
    3. 3. From Aotearoa (New Zealand) 3
    4. 4. Where did the “BIG DATA” come from 4
    5. 5. 5 Let’s go on a journey
    6. 6. 20,000 BC Arithmetic 6 We start to count things
    7. 7. 15,000 BC Cave Painting 7 We start to record data visually
    8. 8. 3,500 BC Written Language 8 We start to record and “transmit” knowledge
    9. 9. 2,500 BC Sumerian Calendar 9 We start to organize and track time
    10. 10. 1,250 BC Library at Thebes 10 We start to store data in mass
    11. 11. 1,150 BC Egyptian Maps 11 The origin of Google Maps
    12. 12. 1000 BC Era Math, Computation, Logic 12 We start to develop understanding through numbers 500 BC Pythagoras 300 BC Euclid And how numbers can be used to compute data 250 BC Archimedes 100 BC Antikythera Mechanism We start to classify objects and use logic to derive insights 350 BC Aristotle
    13. 13. 0 - 1600 13 78 Pliny : “all” the world’s knowledge captured 105 Paper : bulk recording of data 340 Codices : making data browseable with sections and indexes 1350 Nicole Oresme : turning data into picture 1453 Guttenberg : mass distribution of data
    14. 14. 1600 - 1900 14 1640 Napier : before logs there were logarithms 1662 Graunt : father of statistics 1796 Watt : recording data with a machine 1801 Jacquard : programming !! 1830 Babbage : the first mechanical and programmable computer 1844 Morse : data encodings 1850 Reuter : first “WAN” (the CSMA/CD was a bit messy) 1876 Dewey : data classification
    15. 15. 1900 - 2000 15 1930’s Fisher : modern statistics 1936 Turing : the universal computer 1950’s Programming Languages : Fortran et al 1962 Tomlinson : first standard for Geo Data 1963 ASCII : a standard for representing letters and numbers 1969 ARPANET and other Protocols 1970’s RDBMS : ETL , BI , Data Warehouses , I am your father 1970’s/ 80’s Personal Computing : the foundations for alot of today’s data 1982 TCP/IP standardized and the Internet came to be 1989 The Web and HTML blink tags 1991 Unicode : all languages captured
    16. 16. To infinity and beyond 16 Web 2.0 Google Social Networking IOT
    17. 17. 17 Data Today ?
    18. 18. 18
    19. 19. 19
    20. 20. 20
    21. 21. 21
    22. 22. 22
    23. 23. 23
    24. 24. 24
    25. 25. 25
    26. 26. 26
    27. 27. Data Stats Candy 27 Every day 2.5 quintillion bytes of data (1 followed by 18 zeros) are created A full 90 percent of all the data in the world has been generated over the last two years 2.7 Zetabytes of data exist in the digital universe today. Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data Akamai analyzes 75 million events per day to better target advertisements. Decoding the human genome originally took 10 years to process; now it can be achieved in one week. Data production will be 44 times greater in 2020 than it was in 2009
    28. 28. Data Characteristics 28 VOLUME VARIETY VERACITY VELOCITY
    29. 29. 29 Data Tomorrow ?
    30. 30. 30
    32. 32. 32
    33. 33. 33 How are we going wrangle this data ?
    34. 34. Release the Developers 34
    35. 35. New approaches to data platforms are needed 35 Traditional ETL / Data Warehouses = schema at write time To cope with data today = schema at read time A Data language , the new SQL Platforms that support an ecosystem of developers , content creators , data knowledge sharing Open and easily extensible to cope with a variety of data sources and use cases, API oriented Elasticity
    36. 36. 36 Make machine data accessible, usable and valuable to everyone.
    37. 37. Spelunking 37
    38. 38. Platform for Machine Data Any Machine Data HA Indexes and Storage Search and Investigation Proactive Monitoring Operational Visibility Real-time Business Insights Commodity Servers Online Services Web Services Servers Security GPS Location Storage Desktops Networks Packaged Applications Custom ApplicationsMessaging Telecoms Online Shopping Cart Web Clickstreams Databases Energy Meters Call Detail Records Smartphones and Devices RFID
    39. 39. Powerful Platform for Enterprise Developers 39 REST API Build Splunk Apps Extend and Integrate Splunk Simple XML JavaScript Django Web Framework Java JavaScript Python Ruby C# PHP Data Models Search Extensibility Modular Inputs SDKs
    40. 40. Enough talking DEMO TIME !!
    41. 41. The Developer Opportunity in Data 41 It’s fun to make cool things Get a job , build a business , make money ! Promote yourself , Promote your company Get involved in community projects Do Good Think of new data sources and tap into them Democratize data Discover new things & drive society forward We talk alot about the how , what , where and who ….. but what about the WHY
    42. 42. Contact Me ddallimore@splunk.com @damiendallimore http://dev.splunk.com http://blogs.splunk.com/dev 42
    43. 43. Thankyou !