Strata Conference 2012
 

  • Strata in Santa Clara, California has gathered over 2,000 developers, journalists and data scientists in one place to discuss data - big and small - at what has become the data event of the year. Oh and we're there too. See where the data enthusiasts came from, what they want to talk about - and how much data they process
  • Locate untapped sources. Refine data rather than just selling it. For instance, the analysis of the georeferenced photos you have seen previously has led to the production of a new layer of information for navigation systems. Research Challenge on Visualization http://www.w3.org/2012/06/pmod/visualization.pdf Introduction and definition: As the Google CEO Eric Schmidt pointed out in 2010, every two days the world now creates as much information as it did from the appearance of man until 2003. This is due to the explosion in computing techniques, which has led to the generation of a tremendous amount of data that is stored on the internet and processed in IT systems all over the world. In fact, as predicted by Cisco, by 2015 the annual global IP traffic will reach 966 Exabytes (10^18 bytes), nearly a Zettabyte (10^21 bytes), increasing fourfold from about 900 Petabytes (10^15 bytes) back in 2000 and around 2,500 Petabytes in 2010. Data are stored not only on the internet, however, but also in an exponentially increasing number of IT infrastructures.
  • Materialize data into new services or into new ‘data products’. Some examples of new technologies for data collection are: web logs; RFID; sensor networks; social networks; social data (due to the social data revolution); Internet text and documents; Internet search indexing; call detail records; astronomy, atmospheric science, genomics, biogeochemical and biological data; military surveillance; medical records; photography archives; video archives; large-scale eCommerce. In order to manage this huge amount of data, when it comes to human-computer interaction there is a need to distil the most important information and present it in a humanly understandable and comprehensive way. This is where visualisation comes in: a way to interpret and translate data from computer-understandable formats to human ones by employing graphical models, charts, graphs and other images that are conventional for humans. In a sense we can define visualisation as any technique for creating images, diagrams, or animations to communicate a message or an idea. Since the beginning of human history, visualisation has been an effective way to communicate both abstract and concrete ideas. ----- http://www.livework.co.uk/articles/data-is-the-new-oil-part-1-business-information Data, whilst valuable, is a commodity. This is where the process of refinement comes in. We need to refine the data into services. And these services need to meet the needs and issues of the businesses that information providers hope to sell to. Data owners need to think about how to use their data to help fix their customers’ challenges rather than focusing on the number of data sets they can sell. We use information about location, weather and traffic conditions in ways that help us make decisions and fit well into our lives. We all know that information can be live, dynamic and personal to our life context.
If data providers do not adopt this kind of Service Thinking then they will be superseded by more agile providers or by Google themselves. The opportunity is there for information businesses to significantly add value to their data assets by treating the provision of information as a service. ----- http://ana.blogs.com/maestros/2006/11/data_is_the_new.html “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” ----- http://www.forbes.com/sites/perryrotella/2012/04/02/is-data-the-new-oil/ According to IBM, the digital universe will grow to eight zettabytes by 2015. The real impetus is the potential insights we can derive from this new, vast, and growing natural resource. If data is the next big thing, then companies need to think about a new business model that exploits this valuable resource.
  • Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it. Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud. The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights hidden previously by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports.
  • Noah Iliinsky’s Designing Data Visualizations. Author of Beautiful Visualization & O’Reilly’s Designing Data Visualizations. Noah Iliinsky, of Complex Diagrams and Designing Data Visualizations, takes our focus from the clear and factual to good storytelling. While data has its properties that need to be honored, he places equal emphasis on knowing your audience and being able to state exactly what it is you want to convey. In terms of design advice, Iliinsky is slightly less explicit about established rules. He borrows a quote from Moritz Stefaner, that "position is everything, color is difficult." No one wants to see arbitrarily chosen, confusing color schemes, but that is no reason to shy away from color completely. Jock Mackinlay’s The Science of Visualization (Tableau)
  • Goal: make important information pop out so it is presented effectively. Take advantage of the human visual system and its ability to compare.
  • http://www.infovis-wiki.net/index.php/Preattentive_processing http://www.csc.ncsu.edu/faculty/healey/PP/index.html
  • “The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition” http://complexdiagrams.com/2009/03/tire-chart/ Toughness axis (vertical) isn’t well-defined/ordered: “burly” vs “svelte” gives an idea but is intentionally ambiguous (loose categorical grouping). Rim sizes are preattentively differentiable. Price & special features are not included at this level of use. Other ideas: filter by rim size (and price), use icons, reduce grid lines (nominal categories?)
  • Hans Rosling: TEDTalks “Myths about the developing world” (2006)
  • When you don’t yet have a story to tell. Each color corresponds to a different group within the professional network, which can be labeled by the user. The graph should allow users to recognize connections that share mutual people, or identify areas that might be underrepresented. Zoomable interface. Select a node to see highlighted nodes that are mutual connections.
  • http://qph.cf.quoracdn.net/main-qimg-40df8574b885918dde4c2496025a323f Use visuals to think. Experience is active and involves people trying to answer questions. Task: “question answering”
  • Visual properties don’t help us compare the share of each client
  • Use defaults: timelines for timeseries, maps for geographic data
  •  This just takes technology and pours it into a periodic table-shaped box. Timelines are great — it’s a really powerful axis, that time axis, because you can see where there are clumps and trends. Pour it into a box like [the periodic table] and you get none of that.
  • Timeline is obvious. Placement is key. See departure and arrival times and flight duration in relation to one another. Time bar across the top has both time zones listed. Sort order (ranked): “agony” filter? “Agony” is a combination of price, time of day, number of stopovers. That’s the one you want! That’s really smart.
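
A ranked "agony" sort like the one described above could be sketched as follows. This is only an illustration, not the flight site's actual formula: the `Flight` fields, the weights and the choice of midday as the "least painful" departure time are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    label: str
    price: float         # ticket price, in dollars (hypothetical data)
    departure_hour: int  # local departure hour, 0-23
    stopovers: int


def agony(flight, price_weight=1.0, hour_weight=20.0, stop_weight=75.0):
    """Combine price, time of day and stopovers into one score (lower is better).

    The weights are invented for this sketch; a real service would tune them.
    """
    # Assumption: departures far from midday are more painful.
    hour_penalty = abs(flight.departure_hour - 12)
    return (price_weight * flight.price
            + hour_weight * hour_penalty
            + stop_weight * flight.stopovers)


def rank_by_agony(flights):
    """Return flights sorted from least to most agony."""
    return sorted(flights, key=agony)


flights = [
    Flight("red-eye nonstop", 250.0, 1, 0),
    Flight("midday one-stop", 230.0, 12, 1),
    Flight("morning nonstop", 280.0, 9, 0),
]
ranked = rank_by_agony(flights)
for flight in ranked:
    print(flight.label, agony(flight))
```

The point of the single score is that the default sort order already answers the question most people are asking, instead of leaving them to trade off three columns by eye.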
  • Axes give you information for free: about targets, and when searching (think grouping)
  • The top image is an example of poor use of colour to represent sea elevation and land topology. The hues have no natural order and simply disrupt the reading. The bottom map uses natural colours (blue for ocean and brown for land). It shows ordering and depth/height using varied levels of saturation and luminance.
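
The idea of keeping one hue per category and letting lightness carry the order can be sketched with Python's standard `colorsys` module. The hue values, lightness range and the simple brightness estimate below are assumptions chosen for illustration, not the palette used on the slide.

```python
import colorsys


def sequential_ramp(hue, steps, light_max=0.85, light_min=0.25, saturation=0.6):
    """Return `steps` hex colours of a single hue, ordered from light to dark."""
    colours = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        lightness = light_max - t * (light_max - light_min)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


def approx_brightness(hex_colour):
    """Rough brightness estimate: weighted sum of the gamma-encoded RGB channels."""
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Hypothetical choices: hue 0.6 is a blue (ocean depth), hue 0.08 a brown (land height).
ocean = sequential_ramp(hue=0.6, steps=5)
land = sequential_ramp(hue=0.08, steps=5)
print(ocean)
print(land)
```

Because hue stays fixed within each ramp, the reader only has to compare light against dark to read depth or height, while the blue/brown split still separates the two categories.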
  • Color is meaningful
  • http://fellinlovewithdata.com/guides/the-hidden-legacy-of-bertin-and-the-semiology-of-graphics http://mkt.tableausoftware.com/downloads/designing-great-visualizations.pdf
  • What does data tell us about ourselves and the places (cities, streets, buildings) we live in? A researcher and engineer in the domains of user experience and data science. Investigates the interplay between people and data.
  • “A good sketch is better than a long speech.”
  • We have been focusing on a specific type of data that we call ‘network data’. Network data are the byproducts of our interactions with digital infrastructures, as nicely animated here by our friend Timo Arnall in his project ‘Wireless in the world’ http://www.nearfield.org/2010/06/new-film-wireless-in-the-world-2. Practically, we have been materializing information from pretty much anything that is networked in our cities: cellphones, cars, shared bikes, digital cameras, credit cards, ... Video: making invisible wireless technologies visible, in order to better understand and communicate with and about them. Here we are creating communicative material that uses dashed-line abstractions to visualise the presence of wireless technologies in the everyday environment. What if we could see every field produced by an Oyster card or NFC-enabled mobile phone, for instance?
  • http://villevivante.ch/ Based on this conclusion, the City of Geneva decided to take up the challenge of visualizing these digital traces created by our mobile phones. The objective of this installation is to make this data visible and allow you to explore these streams of connected people around the city, in their everyday life.
  • Cumulative activity of the city per hour & per day. Size + brightness indicates aggregate activity at that hour. ----- Every mobile phone permanently leaves digital traces while interacting with the mobile infrastructure. Geneva generates approximately 15 million connections from 2 million phone calls per day. These 'digital traces' offer new insights about the city, which are of great interest from both an economic and a political perspective: an innovation opportunity for new citizen services like traffic jam detectors or nightlife buzz indicators; a way for public administration to evaluate urban planning strategies; insights for businesses on how popular certain districts are, and during what time periods; and information that is invisible in traditional visualization techniques such as cartography.
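
The per-hour, per-day aggregation behind such a display could look roughly like this. It is a sketch under assumptions: the timestamps are made-up sample records, and the mapping of counts to marker size (square-root scaling, so area tracks the count) and brightness is one plausible encoding, not the one used in the Geneva installation.

```python
from collections import Counter
from datetime import datetime


def activity_by_day_and_hour(timestamps):
    """Count connection records per (ISO date, hour of day)."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        counts[(moment.date().isoformat(), moment.hour)] += 1
    return counts


def scale_to_markers(counts, max_radius=20.0):
    """Map each count to (radius, brightness), relative to the busiest hour.

    Radius grows with the square root of the share so that marker area,
    not radius, is proportional to activity. Brightness is the plain share.
    """
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {key: (max_radius * (n / peak) ** 0.5, n / peak)
            for key, n in counts.items()}


# Hypothetical sample of connection timestamps.
records = [
    "2012-02-28T08:15:00",
    "2012-02-28T08:47:00",
    "2012-02-28T18:05:00",
    "2012-02-29T08:30:00",
]
counts = activity_by_day_and_hour(records)
markers = scale_to_markers(counts)
print(counts)
print(markers)
```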
  • The process of innovating with (network) data demands several clear steps, each with its own set of questions and answers: from the data access and collection techniques, which feed data to obfuscation algorithms and big data management systems, which are in turn interrogated by basic data mining operations or advanced statistical inquiries. Information visualization techniques are then used to build evidence and indicators used to interrogate the data further. Innovate with data: iterate through process, métiers, sketch, sketch and sketch. The process involves multiple practices and skills, from engineering to statistics, design, strategy planning, product management and law.
  • Sketches are made with the data at hand at each step. We use these sketches to answer some questions, which generate new interrogations for the next phase.
  • Sketching is not a new practice as part of a creative activity. Sketching has been widely used to innovate in drawing, painting and architecture, all domains related to visualization and communication. For instance, Le Corbusier, who changed the face of architecture, was famous for sketching while presenting his projects and ideas: “Through visual artifacts, architects can transform, manipulate, and develop architectural concepts in anticipation of future construction. It may, in fact, be through this alteration that architectural ideas find form”
  • The project gathered multiple practices, from a Network Engineer who helped access the data to a Product Manager who had to transform insights into product scenarios. Engineer's view of the data: a network of cells that distribute phone conversations. Product manager's view: sees the data through customers and their interactions. The ability to quickly sketch an interactive system is a way to develop a common language amongst varied stakeholders. It allows them to focus on tangible opportunities for products or services that are hidden within their data.
  • We produced a sketch to show the data we were trying to transform, for instance revealing the quality of the data for measuring mobility and the type of information that could be extracted (here mobility and density of activity on the network).
  • In this project, we first helped the Louvre formulate its needs for measuring occupancy levels and flows. We created an inventory of the datasets available both internally and externally, in partnership with sensor network providers. We then considered the complementarity of the information to define indicators that help facility managers, museologists and architects evaluate their strategies. We helped them design novel strategies to control hyper-congestion and ensure a good visiting experience.
  • So far, administrators of the museum only had a partial understanding of the problem based on observations and surveys. Used BitCarrier to collect empirical data on flows and densities of visitors in key areas. Based on the measures of occupancy levels, visiting times, and centrality of trails, we developed a solution that measures the influence of hyper-congestion on the visiting experience in the most popular rooms of the museum. These results can influence the remodeling of areas and the deployment of information kiosks and help evaluate strategies and policies to control hyper-congestion.
  • Limitations of quants: how to qualify how people walk, etc. Doors were closed because the crowds became too large. People in the field have the experience to help contextualize the data and early results. So we used our sketches to confront our measures and indicators with people in the field. Their *qualitative evidence* helped contextualize and qualify the early results as well as explain the detected irregularities. This qualitative view reinforced the quantitative observations and consolidated the overall knowledge on hyper-congestion. In other words, network data tell a story, not THE story.
  • Explore new roles of banks in the smart cities of the near future: need. We used maps (see examples) and an interactive proof of concept to provoke the exploration of opportunities for innovative BBVA internal and external services. This investigation process led us to co-create opportunities to exploit data in the domains of distribution strategies, audience profiling and social navigation.
  • New perspectives for innovative services. This investigation process led us to co-create opportunities to exploit data in the domains of distribution strategies, audience profiling and social navigation. As part of our consulting work, we sketched a pretty advanced dashboard for participants of the project to explore and interrogate their data with fresh perspectives (here a mix of social network and credit card activity in Madrid). The use of the dashboard helped the participants craft and tune indicators that qualify the space (e.g. the streets of a city) based on its business activity. This experience was used to develop specific scenarios involving services and products that a bank could take advantage of. The multiple perspectives extracted from the use of exploratory data visualizations are crucial to quickly answer some basic questions and provoke many better ones, which generate new interrogations for the next phase.
  • Quadrigram is an online platform with a Visual Programming Language that can be used to gather data and generate meaning through data processing and information visualization. Modular interface to design information flows, linking data resources to operators, controls and viz methods within a node-based GUI that displays the structure of your process. These modules form a data flow when you link them together. Each time you modify a module, the update is propagated throughout the flow. Access, manipulate, analyze and visualize. Freely explore multiple dimensions of a single dataset, each time generating a set of questions and answers. Additionally, these tools reduce the prototyping time necessary to sketch interactive visualizations that allow the different stakeholders of an organization to take an active part in the design of services or products.
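
The "modify a module and the update propagates through the flow" behaviour can be illustrated with a very small push-based data flow. This is not Quadrigram's API or implementation: the `Module` and `Source` classes and the traffic-speed numbers are hypothetical, and the naive propagation below would recompute a node several times in a graph with diamond-shaped dependencies.

```python
class Module:
    """A node in a data flow: recomputes from its inputs, then notifies listeners."""

    def __init__(self, name, compute, inputs=()):
        self.name = name
        self.compute = compute
        self.inputs = list(inputs)
        self.listeners = []
        self.value = None
        for upstream in self.inputs:
            upstream.listeners.append(self)
        self.refresh()

    def refresh(self):
        # Recompute this module, then push the change downstream.
        self.value = self.compute(*(m.value for m in self.inputs))
        for downstream in self.listeners:
            downstream.refresh()


class Source(Module):
    """A module with no inputs whose data can be replaced by the user."""

    def __init__(self, name, data):
        self._data = data
        super().__init__(name, lambda: self._data)

    def set(self, data):
        self._data = data
        self.refresh()


# Hypothetical flow: raw speeds -> keep slow readings -> count them.
speeds = Source("speeds", [42, 55, 12, 60])
slow = Module("slow", lambda rows: [v for v in rows if v < 30], [speeds])
count = Module("count", len, [slow])
count_before = count.value

speeds.set([10, 20, 80])  # one change at the source updates every linked module
print(count_before, slow.value, count.value)
```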
  • Real-time traffic information: their sensor networks measure the quantity and speed of the traffic in key areas of a city. Exploratory data analysis approach to create an interactive application. Five representations of a single data set: table visualizer (rows & columns); network visualization to see relationships between points; geodata to view points on a map and see context, e.g. the trajectory of traffic in a single slice of time; data in real time, with incoming up-to-the-second data showing the motion of traffic between points, moving at different velocities (data as a living material); temporal data: temperature data.
  • This example shows how multiple interrelated perspectives on the same data (temporal bar charts, quadrifications, maps, and scatter plots) can create a powerful tool that permits us to explore the activities of a company by projects, sectors, location, and profitability. This application collects and analyses the sentiment expressed in real time on Twitter. The results show the positive and negative polarities with respect to a word you define. So, we have seen that our world produces a new type of data, network data, that is now treated as a material. There are both processes and tools that help innovate with this evolution. From our experience, there is value in sketching with data, in the same way that strategists, innovators and world changers have used sketches in the past.
  • Visualization is one of the most advanced fields in policy modeling, being able to foster the design of more effective and efficient policies, as well as to make sense of large datasets, such as those provided as open government data. In fact, the huge increase in data availability is also due to the so-called "open data" movement: all across Europe and the US, governments are increasingly publishing their data repositories for other people to access and use.
  • This map visualizes crowd-sourced radiation geiger counter readings from across Japan. Click on the labels to get more information on the source of each reading. The number of locations fluctuates due to the validity of the data feeds. There are approximately 185 feeds from the official Japanese government source MEXT and the rest are from other sources such as the Tokyo hackspace, universities, local councils and concerned individuals.
  • http://cpstiers.opencityapps.org/
  • Simon Rogers is editor of The Guardian Data Blog (www.guardian.co.uk/data, @datastore) an online data resource which publishes hundreds of raw datasets and encourages its users to visualise and analyse them. He is also a news editor on the Guardian, working with the graphics team to visualise and interpret huge datasets.
  • The tools we have to analyse the data may have changed; that motivation has stayed exactly the same. How all the spending fits together: see department cuts and which programmes received big increases (nuclear, defence). Most comprehensive atlas of public spending available. Every year every government dept publishes an annual report which includes breakdowns of spending. Manually pick out data from PDF to extract specific information.
  • The data itself covers over 194,000 individual transactions, payments to suppliers and bills covered by government departments in the first five months of the life of the Coalition. There's lots excluded, though: the NHS, benefit payments, spending by quangos, information removed for "national security" and personally confidential reports. It's about £80bn of an annual spend of £670bn. We figured 170 spreadsheets is too much for most people to browse, so Guardian lead software architect Matthew Wall has built this useful spending data explorer app. It's designed to make it easier for you to search and download the key data you're interested in. We may even have done some of the analysis you're looking for already. We've combined spending for each department into single spreadsheets. Here's what you can find: • Sheet 1: Every item for the department • Sheet 2: Detailed breakdown of type of spending • Sheet 3: Broader breakdown into fewer areas • Sheet 4: Every supplier listed in alphabetical order and by size (watch out on this one for different spellings of the same supplier)
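
The warning about different spellings of the same supplier is the kind of problem a small normalisation step can reduce before totals are computed. The sketch below is an assumption about how one might do it, not the Guardian's method: the supplier names and amounts are invented, and the rules (lower-casing, dropping punctuation and suffixes such as "Ltd") will not catch every variant.

```python
import re
from collections import defaultdict

# Company suffixes ignored when comparing names (an illustrative, incomplete list).
SUFFIXES = {"ltd", "limited", "plc"}


def normalise_supplier(name):
    """Reduce a supplier name to a comparison key."""
    key = name.lower().replace("&", " and ")
    key = re.sub(r"[^a-z0-9 ]+", " ", key)  # drop punctuation
    words = [w for w in key.split() if w not in SUFFIXES]
    return " ".join(words)


def total_by_supplier(rows):
    """Sum amounts per normalised supplier name."""
    totals = defaultdict(float)
    for supplier, amount in rows:
        totals[normalise_supplier(supplier)] += amount
    return dict(totals)


# Hypothetical rows: (supplier as spelled in the spreadsheet, amount in pounds).
rows = [
    ("Example Supplies Ltd", 1200.0),
    ("EXAMPLE SUPPLIES LIMITED", 800.0),
    ("Example Supplies Ltd.", 500.0),
    ("Smith & Jones PLC", 50.0),
    ("Smith and Jones plc", 25.0),
]
totals = total_by_supplier(rows)
print(totals)
```

Anything merged this way still deserves a manual check, since two genuinely different suppliers can share a normalised name.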
  • Soldiers are good at entering data – locations where soldiers died in Afghanistan (date, what happened, # of casualties, summaries)
  • Interactive map displaying the region using WikiLeaks war log data. WikiLeaks: every IED attack, with co-ordinates, 2004-2009
  • Made it more interesting/rewarding for people: asked people to do smaller tasks with a reduced amount of data. “Zooniverse”: citizen science project to transcribe documents, visually classify images, categorize, etc. Added recognition for users: keep track of task assignments, see progress. Reward for work: identification from journalists / editorial feedback. Allowed users to skip over uninteresting docs, which led to users reviewing more docs on average. Ability to view data about your own MP.
  • BlackoutGate: massive cover-up of MPs' expenses after the Commons authorities released hundreds of thousands of claims documents and receipts with huge sections of detail blacked out, in the belief that publication would be in breach of the Data Protection Act. http://www.guardian.co.uk/politics/2009/jun/18/mps-expenses-censorship-black-out
  • http://storify.com/smfrogers/making-a-map-together http://www.guardian.co.uk/uk/datablog/2012/apr/12/deprivation-poverty-london
  • First time such a major attempt had been made to forensically examine the motivations behind a riot since the work in Detroit in 1967. Gathered qualitative data from the interviews and quantitative responses to a set of questions. UK riots: every verified incident. Collected key reported incidents from as many sources as possible. Raw data in Google spreadsheets: approx time, date, place, location details, local authority, what happened, source. Mapped with Google Fusion Tables.
  • England riots: suspects mapped and poverty mapped

Strata Conference 2012 Presentation Transcript

  • O’Reilly Strata Conference: Making Data Work. Feb 28 - Mar 1, Santa Clara, CA. Michelle Li
  • Conference Overview • 3 days of workshops, lectures, keynotes, startup showcase and a mini Maker Faire • Developers, data scientists, data analysts, and other data professionals including researchers, designers, journalists • 5 different session tracks: Data Science, Deep Data, Business & Industry, Hadoop and Big Data (Applied & Tech), Domain Data, and Visualization & Interface © 2011 Oculus Info Inc. 2
  • Evening events included: • Mini Maker Faire – showcase of innovative data-related hardware, apps, and robots • Data Crush: Where Wine and Data Meet – wine tasting event where participants provided feedback data that was compiled and analyzed to extrapolate behavioural trends and factors influencing their responses • Startup Showcase – live demo program and competition for 10 finalist startups and early-stage companies to demonstrate their innovations to judges, investors, entrepreneurs, journalists
  • Who goes to a conference about data? Over 2000 attendees from various organizations: Microsoft, Digg, Google, Groupon, Apple, PayPal, Netflix, Infochimps, IBM, Tableau, Oracle, VMware, LinkedIn, Guardian News, Facebook, The Seattle Times, Twitter, MIT Media Lab, Amazon, …
  • Data is the New Oil Source: http://www.house.gov/apps/list/press/tx08_brady/71509_hc_chart.html
  • Materialize Data Into New Services
  • Google Insights: “Infographic”
  • Google Insights: “Infographic” vs “Big Data”
  • Session Overviews o Data visualization • how we communicate information • visual analysis and principles for designing effective data views • design process and visualization tools for presenting data o Data Journalism • creating data stories to share information socially o Democratization of Data • data for the common good
  • Noah Iliinsky, Complex Diagrams; Jock Mackinlay, Tableau: DESIGNING DATA VISUALIZATIONS
  • Data Visualization: The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition
  • Science of Visualization o Humans are slow at mental math (figure: 34 x 72), but we’re faster when using the world around us o Human perception is powerful, but perception can be aided and augmented by visual prompts o Finding patterns is key to information visualization • We have a flexible pattern finder coupled with an adaptive decision-making mechanism
  • Visualization Makes Data Accessible: Allows us to easily see trends and patterns
  • Leverage the Amazing Abilities of Our Eyes and Brain. Preattentive features: length, width, size, colour, closure, number, intersection, contrast, tilt, curvature, etc.
  • Faster Access to Actionable Insights. Difficult to compare 15+ tire models with different characteristics: • rim diameters, various widths, various features, price, special features. Source: http://www.rivbike.com/Tires-Pumps-Patches-s/52.htm Chart allows customer to focus on appropriate tires based on 3 axes of data: • desired rim size • tire width • toughness/quickness. Source: http://complexdiagrams.com/2009/03/tire-chart/
  • Allows Access to Huge Amounts of Data. GapMinder: Public health data on a massive global scale. Understand data through stories. Source: gapminder.org
  • Visualization for Exploration: LinkedIn Maps
  • Visualization for Explanation
  • Visualizing Data. Data has properties • categorical, quantifiable, geographic, binary • continuous, non-continuous, ordered • timeline
  • Define Knowledge Before Structure. Donut charts: Aesthetically pleasing but not very functional in these cases. Good: Individual donuts are good for a glance at relative share of total market. Chart #1 • Comparing a series of donut charts is meaningless • Shows time series data over 7 donuts. Chart #2 • Too many wedges • Many of the wedges are similarly sized • Non-standard sort. Source: http://litmus.com/blog/email-client-market-share-infograph/email-client-market-stats-1000
  • Use Defaults. Time series data is usually best shown in a line graph. Shows sequential changes more easily than comparing wedges between donuts. Line graph shows trends more clearly
  • Simple bar graph, but it’s much easier to extract knowledge from it
  • Unless your data is periodic, don’t put your data in a periodic table. Chronological timeline; family tree; influence of different controllers; meaningful context
  • Encoding Well Position is everything. Colour is hard. /Moritz Stefaner
  • Position is Everything
  • Position is Everything
  • Colour is Difficult Colour can be used effectively in information display • Naturally codes attributes of objects • Not naturally ordered in our brain Excellent for labelling and categorization • Works well for heat maps/temperature and categorization Poor for displaying shape, rank, order, detail or space • Not effective for quantitative data
  • Colour is Difficult
  • Retinal Properties o Jacques Bertin identified that every visualization is made up of basic components o Each component has different expressive power o Each works best only in some conditions o 6 basic variables: size, value, texture, colour, orientation, shape o Jock Mackinlay applied these same principles to automatically construct visualizations out of data. Caption 1: Four dimensions of data shown effectively in a traditional scatter plot generated by computers. Caption 2: Diagram shows how each visual component works best in each case and how to use them.
  • Appropriate Encodings: http://complexdiagrams.com/properties
  • Fabien Girardin, Near Future Laboratory: SKETCHING WITH DATA
  • Napoleon: “Un bon croquis vaut mieux qu’un long discours.” (A good sketch is better than a long speech.)
  • Network Data
  • Urban Demos ‘Urban demos’ reveal how the city lives through its data. The City of Geneva visualized digital traces created from cellular network activity. They reflect mobility in a city or a street and reveal insights about a city that are of importance from an economic and political perspective.
  • Digital Traces
  • Process
  • Innovate With Data - Sketch
  • Sketching With Data: To sketch is to think, to make an idea tangible (and observe its different dimensions and implications), to tell stories, to share discoveries. A sketch is a rough version of a creative work, made to assist in reaching a coherent result. Key values of sketching: • share common language • qualify results • explore ideas
  • Sketch To Share A Common Language: Sets a common language among the different actors of the project for how they understand the data and how the data can be used. Project: explore novel services for mobile phone operators using aggregated cellular network activity. Shown: a network engineer’s view of the data and a product manager’s view of the same data.
  • Sketch To Share A Common Language: This is an early sketch to show the data they were trying to transform, which reveals the quality of the data to measure mobility and density of activity on the network
  • Sketch To Qualify Results: Project: controlling hyper-congestion at the Louvre to create an enjoyable visiting experience. Hyper-congestion refers to the situation in which the quantity of visitors in a space negatively affects the quality of their visiting experience and their security.
  • Sketch To Qualify Results o Used a network of sensors over 10 days around critical areas to collect empirical data on flows and densities of visitors in key areas o Measured occupancy levels, visiting times, and centrality of trails o Field experts (security guards) helped contextualize data and early results through sketches o These results can influence the remodeling of areas and the deployment of information kiosks, and help evaluate strategies and policies to control hyper-congestion
  • Defining Measures of Hyper-Congestion • Measures provided insights and revealed symptoms of hyper-congestion, but were insufficient to describe the cause of the issue (for example, how to qualify how people walk) • Sketches were produced after each data collection period, visualizing information about visiting sequences, travel times and how long visitors stayed in each room • Sketches were used in discussions with people in the field, who provided qualitative evidence to contextualize and qualify results and explain detected irregularities
  • Defining Measures of Hyper-Congestion: Network data tells A story, not THE story
  • Sketch To Explore Ideas: Project: explore the role of a retail bank (BBVA) in smart cities in the near future. Explored opportunities for innovative services to exploit data in the domains of distribution strategies, audience profiling and social navigation
  • Sketch To Explore Ideas: Created multiple prototypes to explore opportunities for innovative BBVA internal and external services. Project participants were able to explore and interrogate the data from multiple perspectives. Use of the dashboard helped participants develop specific scenarios involving services and products that a bank could take advantage of
  • Interactive Sketching Tool: Quadrigram. Data manipulation and visualization environment using a visual programming language. Modular, node-based interface for designing data flows, linking data resources to operators, controls and visualizations. WYSIWYG interface designed for iterative exploration and explanation, allowing us to generate new questions and provide answers with data
  • Access, Manipulate, Analyze and Visualize: Real-time traffic information. Five representations of a single data set: 1. Table visualizer (rows & columns) 2. Network visualization to see relationships between points 3. Geodata to view points on a map and see context 4. Real-time data to visualize traffic moving at different velocities 5. Temporal data
  • Access, Manipulate, Analyze and Visualize: Data as living material
  • OPEN DATA & DATA JOURNALISM
  • Data Journalism • Data is changing journalism in several ways • New ways of visualizing complexity • Provide real answers, based on evidence rather than assertion • Democratization of tools and data platforms to help people understand information and share stories • Bigger datasets about really small things o Allows you to search data o Make complex maps really quickly • Crowdsourcing o Aggregated input from the public is powerful for disaster response o Accurately depicts dynamic situations • Open data means open data journalism • Governments are increasingly publishing their data repositories for other people to access and use
  • Japanese Geiger Maps: Using Pachube to aggregate Geiger counter readings from various data sources • Geiger counter readings for the tsunami/Fukushima facility • Government was releasing information only once per day in PDF format: only numerals, nothing about what they mean • Pachube community created tutorials, then collected and aggregated measurements from various sources and hooked them up to the web • Suddenly 2000 feeds/minute across Japan • People took data and built applications to represent data in terms of health consequences and change from background radiation. http://japan.failedrobot.com/
  • Winds of Fukushima: Android app that took your geolocation, wind direction and nearby radiation monitors to infer where radiation may peak next
  • After the tsunami and earthquakes, Toyota and Honda shared their data to map out usable roads
  • Crowdsourcing Datasets • Understand trends of the data set • Help find anomalies • People measured things that might not be measured by the official network • Public visibility and accountability: get people with different domain expertise to talk about the data
  • Simon Rogers, Guardian: THE CRAFT OF DATA JOURNALISM
  • Behind the Scenes at the Guardian Datablog: Datablog started off as a small blog offering full datasets behind their stories and now publishes hundreds of raw datasets, data visualizations and data analyses. Process o Locate the data or receive it from various sources (e.g. breaking news stories, government data, journalists’ research) o Examine the data: transform for quality/purpose, tidy up, consolidate o Perform calculations and statistical inquiries to see whether there is a story o Output a story, graphic or visualization • Excel/Google charts for small line graphs and pies • Google Fusion Tables for maps • Internal dev team produces the more sophisticated graphics
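The Datablog process above (locate, examine and tidy, calculate, output) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the `RAW` table, the column names `department` and `amount`, and the helper names `tidy` and `totals_by_department` are hypothetical and are not taken from the Guardian's tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data standing in for a downloaded spreadsheet (step: locate).
RAW = """department,amount
Health, 1200
health,300
Education,"1,500"
Education,
Transport,n/a
"""


def tidy(rows):
    """Step: examine and tidy. Normalise names, skip unparseable amounts."""
    for row in rows:
        name = row["department"].strip().title()
        text = (row["amount"] or "").replace(",", "").strip()
        try:
            amount = float(text)
        except ValueError:
            continue  # blank or non-numeric cell, left out of the totals
        yield name, amount


def totals_by_department(raw_text):
    """Step: calculate. Consolidate the tidied rows into one total per name."""
    reader = csv.DictReader(io.StringIO(raw_text))
    totals = defaultdict(float)
    for name, amount in tidy(reader):
        totals[name] += amount
    return dict(totals)


# Step: output. A plain-text table; a chart or map would be the next stage.
totals = totals_by_department(RAW)
for name, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name}: {total:,.0f}")
```

Rows that cannot be parsed are silently dropped in this sketch; a real workflow would more likely log them, since a missing figure can itself be part of the story.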
  • The First Guardian Data Journalism: May 5, 1821 • Contained a table of data: a list of schools in Manchester and Salford, with the number of students at each and the average annual spending, i.e. how many pupils received free education and how many poor children there were in the city • Official statistics were collected by only 4 clergymen, which resulted in inaccurate and faulty data • Leaked by a source identified as “NH”, the data caused a huge sensation • Revealed that 25 000 children were receiving free education instead of the 8 000 that was officially estimated • Using data to show the true state of affairs to help fight for a decent education system
  • Public spending by the UK’s central government departments 2010-2011
  • Becoming Data Providers
  • Exploring the Data: 170 spreadsheets of government spending data. Guardian created a spending data explorer application, designed to make it easier for people to search and download key data. Simple analysis has already been done: combined spending for each department into single spreadsheets
  • Wikileaks Afghanistan War Logs: Wikileaks log of every IED attack with co-ordinates from 2004-2009. Soldiers are good at entering data: locations of where soldiers died in Afghanistan, including date, what happened, number of casualties, and summaries
  • Bigger Datasets Of Smaller Things: Every IED attack from 2004-2009
  • Crowdsourcing Experiment: MP Expense Scandal • Big release of MPs’ documented expense claims: 458,000+ documents • The Guardian developed a crowdsourcing application in 5 days • Within 10 minutes of the launch, 323 people were using the application to go through the documents • In the first half hour, more than 2000 pages had been reviewed • Each receipt filed by an MP was converted into an image for the public to review • Users reviewing were asked to determine and detail what entries there were on a page and flag them as unimportant, interesting, “interesting but known” or worthy of investigation. http://mps-expenses.guardian.co.uk/
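A minimal Python sketch of how reviewer flags like the four named above might be tallied per page. This is an assumption for illustration, not the Guardian's application: the page identifiers, the `min_votes` threshold and the function names `tally` and `pages_to_investigate` are all hypothetical.

```python
from collections import Counter, defaultdict

# The four flags named on the slide.
FLAGS = (
    "unimportant",
    "interesting",
    "interesting but known",
    "worthy of investigation",
)


def tally(reviews):
    """Group (page_id, flag) pairs into a Counter of flags for each page."""
    per_page = defaultdict(Counter)
    for page_id, flag in reviews:
        if flag not in FLAGS:
            raise ValueError(f"unknown flag: {flag!r}")
        per_page[page_id][flag] += 1
    return per_page


def pages_to_investigate(per_page, min_votes=2):
    """Pages that at least `min_votes` reviewers flagged for investigation."""
    return sorted(
        page
        for page, counts in per_page.items()
        if counts["worthy of investigation"] >= min_votes
    )


# Hypothetical reviews from several volunteers.
reviews = [
    ("page-001", "unimportant"),
    ("page-002", "worthy of investigation"),
    ("page-002", "worthy of investigation"),
    ("page-002", "interesting"),
    ("page-003", "worthy of investigation"),
    ("page-003", "interesting but known"),
]
per_page = tally(reviews)
print(pages_to_investigate(per_page))
```

Requiring agreement from more than one reviewer is one simple way to reduce noise from individual misjudgements; the threshold of 2 is an arbitrary choice for this sketch.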
  • What Was Revealed… • Douglas Hogg, Conservative MP for Sleaford and North Hykeham, charged £2,115 to have the moat cleared at his Lincolnshire estate and claimed bills for a "mole man". • Sir Peter Viggers, Tory MP for Gosport, claimed £1,645 for a floating "duck island" in the garden of his Hampshire home as part of £32,000 of gardening expenses over three years. • Jacqui Smith, the former home secretary, claimed £10 for two adult films which were accessed by her husband at her constituency home. • Tony Blair claimed almost £7000 for roof repairs two days before leaving office and standing down as MP.
  • London Riots: Instant data journalism, filling the hole of knowledge for anyone wanting to know what was happening where • Collected key reported incidents from as many sources as possible • Compiled a list of every incident where there was a verified report, then mapped it with Google Fusion Tables • Allowed people to download the data behind it, possibly the simplest but most popular thing they did
  • Reading the Riots o Project took a look at the riots as experienced by those who were there o A specially-recruited team interviewed around 270 people about the riots and why they had been involved
  • England Riots: Was Poverty A Factor?
  • ‘Riot Commute’ • Data from magistrates’ court records for 1,100 individuals, which included postcodes for defendants’ home and offence locations • 70% of those accused of riot-related crimes travelled from outside their area • Riots occurred in the city centre, but accused rioters lived in outlying districts • Travelled an average of 2.2 miles from home to the riot offence site • Transport mapping specialists modelled the most likely routes from home to offence
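The two headline measures above (average distance travelled and share of defendants from outside the area) can be sketched in Python. This is a simplification under stated assumptions, not the analysts' method: postcodes are assumed to be already geocoded to latitude and longitude, distance is straight-line (haversine) rather than a modelled route, and the records, area labels and function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def commute_summary(records):
    """Return (mean distance in miles, share of records from another area)."""
    distances = [
        haversine_miles(*record["home"], *record["offence"])
        for record in records
    ]
    outside = sum(
        1 for record in records
        if record["home_area"] != record["offence_area"]
    )
    return sum(distances) / len(distances), outside / len(records)


# Hypothetical, already-geocoded records (not real court data).
records = [
    {"home": (51.50, -0.10), "offence": (51.52, -0.10),
     "home_area": "A", "offence_area": "B"},
    {"home": (51.48, -0.12), "offence": (51.52, -0.10),
     "home_area": "C", "offence_area": "B"},
    {"home": (51.52, -0.11), "offence": (51.52, -0.10),
     "home_area": "B", "offence_area": "B"},
]
mean_miles, share_outside = commute_summary(records)
print(f"mean distance: {mean_miles:.2f} miles, from outside area: {share_outside:.0%}")
```

Straight-line distance understates the distance actually travelled along roads, which is one reason modelled routes, as described on the slide, give a more realistic picture.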
  • How Riot Rumours Spread On Twitter • Many people, including the PM and acting head of the Metropolitan police, blamed Twitter for spreading the disorder • Analysis of 2.6 million riot-related tweets suggested a different conclusion: the network was able to collectively dispel and clarify false information • Picked a subset of more than 10 000 tweets concerning 7 key rumours that emerged during the riots