3 words to sum up what types of projects I am drawn to.
Heard all the sheep jokes, so any original heckles are most welcome.
What am I talking about: data.
Where it came from.
What it looks like today.
How we are going to do some useful things with it.
Marketing guys created it, but they didn’t create the underlying data, tools, technologies and fundamental core discoveries.
Like Hollywood, I like a good origin story.
Shoutout to Wolfram Alpha.
The invention of arithmetic provides a way to abstractly compute numbers of objects. The Ishango Bone, a baboon's fibula discovered in the mid-20th century, has been interpreted as a counting system and a list of prime numbers, and may even give evidence of a multiplication table.
Imagine taking these to the bar! No Androids, no iPhones.
The Lascaux cave paintings record the first known narrative stories: telling stories through visualization of events.
Mike Bostock, D3 and the New York Times visualizations.
CSS had to come from somewhere.
No good if they are just sitting on a wall.
A central event in the emergence of civilization, written language provides a systematic way to record and transmit knowledge.
Old Sumerian love poem. Oh, would they be disappointed in Bieber.
The first known calendar system is established, rounding the lunar month to 30 days to create a 360-day year (12 months of 30 days).
The “like a boss” Epoch of 1970? Pfffft.
The Library at Thebes is the first known effort to gather and make many sources of knowledge available in one place. Inscription above the door: “medicine for the soul”. Haven’t heard people say that about folks trying to troubleshoot RDBMS scalability issues.
The Turin Papyrus is the first known topographic map. Geostats is going to be one of the most important “dimensions of data” in the future, and I’ll show a demo.
The Pythagoreans promote the idea that numbers can be used to systematically understand and compute aspects of nature, music, and the world.
Euclid writes his Elements, systematically presenting theorems of geometry and arithmetic.
Archimedes uses mathematics to create and understand technological devices and possibly builds gear-based, mechanical astronomical calculators.
Antikythera Mechanism: a gear-based device, which survives today, is created to perform calendrical computation.
Aristotle tries to systematize knowledge, first, by classifying objects in the world, and second, by inventing the idea of logic as a way to formalize human reasoning. The father of OO and the if/then/else statement? No disrespect to the Smalltalk team at Xerox PARC.
Pliny creates an encyclopedia that claims to summarize all knowledge with references to its sources.
Tsai Lun invents paper in China.
French philosopher Nicole Oresme introduces the notion of drawing graphs of values.
The printing press: movable type makes it economical to print many kinds of documents.
Codices: a codex (Latin caudex for "trunk of a tree" or block of wood, book; plural codices) is a book made up of a number of sheets of paper, vellum, papyrus, or similar, with hand-written content, usually stacked and bound by fixing one edge and with covers thicker than the sheets, but sometimes continuous and folded concertina-style. The alternative to paged codex format for a long document is the continuous scroll. Examples of folded codices are the Maya codices. Sometimes the term is used for a book-style format, including modern printed books but excluding folded books.
Before codices came scrolls.
John Napier publishes the first tables of logarithms, the fundamental mathematical tenets behind stats, geometry, cryptography etc.
Leibniz promotes the idea of answering all human questions by converting them to a universal symbolic language, then applying logic using a machine. He also tries to organize the systematic collection of knowledge to use in such a system.
Graunt and others start to systematically summarize demographic and economic data using statistical ideas based on mathematics.
James Watt and John Southern create (but keep secret for 24 years) a device for automatically tracing variation of pressure with volume in a steam engine.
The Jacquard loom weaves patterns specified by punched cards.
Babbage constructed a mechanical computer to automate the creation of mathematical knowledge.
Samuel Morse sends the first public telegraph message.
Paul Julius Reuter uses pigeons to fly stock prices between Aachen and Brussels.
Dewey invented the Dewey Decimal System for classifying the world's knowledge and specifying how to organize books in libraries. Indexing for performance, NoSQL and key-value store ideas came from somewhere.
Ronald Fisher and others lay the foundations for modern statistics.
Turing, Bletchley Park, shows that any reasonable computation can be done by programming a fixed universal machine, and then speculates that such a machine could emulate the brain.
Fortran, COBOL, and other early computer languages define the concept of a precise formal representation for tasks to be performed by computers.
Roger Tomlinson initiates the Canada Geographic Information System, creating the first GIS.
The first two nodes of what would become the ARPANET were interconnected between Leonard Kleinrock's Network Measurement Center at UCLA's School of Engineering and Applied Science and Douglas Engelbart's NLS system at SRI International (SRI) in Menlo Park, California, on 29 October 1969.
The Unicode standard assigns a numerical code to every glyph in every human language.
Where are we heading?
Where does it come from?
What does it look like?
Is it just stuff from computers?
Is it just humans and human creations that represent data?
Log files
JVM JMX
SNMP
Tapping the wire: tcpdump, ngrep etc…
APIs (REST etc…)
Data has always been here.
If a bear shits in the woods...
Does data exist without someone there to observe it?
Capturing data from the past
Data is all around us. It may not necessarily look like data at first, until you think about it as data and how to capture it.
Data is starting to permeate our daily lives and create a lot of opportunity for data-driven solutions.
And when you have the means to correlate data together, this can lead to many possibilities.
Even bearded hipsters can be useful.
I aspire to have that amount of hair.
Fitbit: humans are a source of data.
A lot of data is being created.
A lot of data exists.
A lot more data will be created.
We are analysing it and are getting more powerful at analysing it.
The data V’s, useful for booth babing duty.
Veracity: Some data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of big data.
Where are we heading?
Huge Data
Really F***en astronomical data
Nope
Just data.
But we’ll be better at generating it more optimally.
Data sources might increase, IoT etc…
But volumes may taper out to a less exponential trajectory.
You need people who understand the data, to impart knowledge about the data.
That allows us to generate insights.
And ultimately take actions, unless you just want to look at pretty charts all day.
For me that is the data story.
Developers will be the kingmakers (to use a RedMonkism)
A few musings of several potential ones.
At Splunk, our mission is to make machine data accessible, usable and valuable to everyone. And this overarching mission is what drives our company and product priorities.
What is Splunk
Splunk is the leading platform for machine data analytics, with over 5,200 organizations using Splunk (as of 7/1/13), from tens of GB to many tens of TBs of data PER DAY.
Splunk software is optimized for real-time, low latency and interactivity.
Splunk software reliably collects and indexes all the streaming data from IT systems and technology devices in real-time: tens of thousands of sources in unpredictable formats and types.
The value from Splunking machine data is described as Operational Intelligence. This enables organizations to:
1. Find and fix problems dramatically faster
2. Automatically monitor to identify issues, problems and attacks
3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions
4. Gain real-time insight from operational data to make better-informed business decisions
Social Data
Twitter
Twitter REST Input
Show setup for qconlondon tags and searching over raw data
Create on the fly dashboard
Create searchtime extraction and dashboard
Show precanned full dashboard
Show underlying HTML/JS source
Foursquare
Foursquare REST input
Show 3 pre canned searches
Show haversine search command
Show geostats and create simple map on the fly in a dashboard
Public Data / Open Data / Geo Data
Firebase, SF Muni realtime transit data
Node Scripted Input
Show dashboard
Inputlookup for data gen
Show HTML/JS source
Mobile Device signal outages
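For anyone reading these notes without the live demo: the haversine command in the Foursquare section is about great-circle distance between two latitude/longitude points. The snippet below is only a rough Python sketch of the haversine formula itself, not the Splunk search command shown on stage. The function name, the Earth radius constant and the sample coordinates are my own assumptions for illustration.

```python
import math

# Assumption: mean Earth radius in kilometres; other values are in use.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees.

    Hypothetical helper for illustration only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Approximate coordinates for central London and central Paris.
    print(round(haversine_km(51.5074, -0.1278, 48.8566, 2.3522), 1), "km")
```

With a distance like this per check-in, you can filter or bucket events by how far they are from a point of interest before putting them on a map.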
New apps that take advantage of all this new data and correlations and insights
More data, more accessible: the democratization of data
Help people: more efficient ways to run your car, your household etc.
Socio-economic data
Cyberbullying
Splunk for good
20,000 BC Arithmetic
We start to count things
15,000 BC Cave Painting
We start to record data visually
3,500 BC Written Language
We start to record and “transmit” knowledge
2,500 BC Sumerian Calendar
We start to organize and track time
1,250 BC Library at Thebes
We start to store data in mass
1,150 BC Egyptian Maps
The origin of Google Maps
1000 BC Era Math, Computation, Logic
We start to develop understanding through numbers
500 BC Pythagoras
300 BC Euclid
And how numbers can be used to compute data
250 BC Archimedes
100 BC Antikythera Mechanism
We start to classify objects and use logic to derive insights
350 BC Aristotle
0 - 1600
78 Pliny : “all” the world’s knowledge captured
105 Paper : bulk recording of data
340 Codices : making data browseable with sections and indexes
1350 Nicole Oresme : turning data into picture
1453 Gutenberg : mass distribution of data
1600 - 1900
1614 Napier : before logs there were logarithms
1662 Graunt : father of statistics
1796 Watt : recording data with a machine
1801 Jacquard : programming !!
1830 Babbage : the first mechanical and programmable computer
1844 Morse : data encodings
1850 Reuter : first “WAN” (the CSMA/CD was a bit messy)
1876 Dewey : data classification
1900 - 2000
1930’s Fisher : modern statistics
1936 Turing : the universal computer
1950’s Programming Languages : Fortran et al
1962 Tomlinson : first standard for Geo Data
1963 ASCII : a standard for representing letters and numbers
1969 ARPANET and other Protocols
1970’s RDBMS : ETL , BI , Data Warehouses , I am your father
1970’s/ 80’s Personal Computing : the foundations for a lot of today’s data
1982 TCP/IP standardized and the Internet came to be
1989 The Web and HTML blink tags
1991 Unicode : all languages captured
To infinity and beyond
Data Stats Candy
Every day 2.5 quintillion bytes of data are created (a quintillion is 1 followed by 18 zeros)
A full 90 percent of all the data in the world has been generated over the last two years
2.7 Zettabytes of data exist in the digital universe today.
Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data
Akamai analyzes 75 million events per day to better target advertisements.
Decoding the human genome originally took 10 years to process; now it can be achieved in one week
Data production will be 44 times greater in 2020 than it was in 2009
New approaches to data platforms are needed
Traditional ETL / Data Warehouses = schema at write time
To cope with data today = schema at read time
A Data language , the new SQL
Platforms that support an ecosystem of developers , content creators , data knowledge sharing
Open and easily extensible to cope with a variety of data sources and use cases, API oriented
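A rough sketch of what "schema at read time" means in practice, assuming made-up key=value log lines and hypothetical helper names (this is not how any particular product implements it): the raw events are stored untouched, and fields are only extracted when a search runs.

```python
import re

# Assumption: invented raw events, kept exactly as they arrived.
# Nothing is parsed or validated when the data is written.
RAW_EVENTS = [
    "2013-03-06T10:15:02Z host=web01 status=200 bytes=5120",
    "2013-03-06T10:15:03Z host=web02 status=500 bytes=312",
    "2013-03-06T10:15:04Z host=web01 status=404",
]

# The "schema" lives in the reader: a pattern applied at search time.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_line):
    """Pull key=value pairs out of one raw line at read (search) time."""
    return dict(KV_PATTERN.findall(raw_line))


def search(events, **criteria):
    """Yield the extracted fields of every event matching all criteria."""
    for line in events:
        fields = extract_fields(line)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


if __name__ == "__main__":
    for event in search(RAW_EVENTS, host="web01"):
        print(event)
```

Note that the third event has no bytes field and nothing breaks. With schema at write time, that row would have to be rejected or padded before it could be loaded; here a new field or a new extraction pattern can be added later without reloading the data.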
Make machine data accessible, usable
and valuable to everyone.
Platform for Machine Data
Any Machine Data
Powerful Platform for Enterprise Developers
Build Splunk Apps Extend and Integrate Splunk
The Developer Opportunity in Data
It’s fun to make cool things
Get a job , build a business , make money !
Promote yourself , Promote your company
Get involved in community projects
Think of new data sources and tap into them
Discover new things & drive society forward
We talk a lot about the how, what, where and who … but what about the WHY?