Good afternoon, and thank you to Michael and Paul. My name is Graham Hyde and I want to give you some food for thought. Without wanting to state the obvious, times are changing within the public sector: transparency, diminishing budgets, doing more with less, localism, NHS restructuring, as I’m sure you are all aware. But what does that mean for us as an organisation that collects health-related data and also disseminates it to the wider world? I am best known for heading up the Population and Geography team here at the Information Centre, so I want to talk to you about geography, because that is my pet subject, but also about the wider initiatives and thinking on data collection, presentation and dissemination that are going on. There will be some reference to social media, which I am rather fond of, and if you haven’t found Twitter yet, I would urge you to do so.
I would like to talk to you about data, open data, linked data, free data, crowd sourced data, social data, map data, maps and geography.
BUT, as I’m a geographer I’m not going to mention John Snow but here is his map to kick things off.
There is no doubt that as a planet we are “Data Rich”, but what should we do with it all? The amount of data we collect is absolutely staggering. During 2009, the amount of digital information grew 62% from 2008, to 800 billion gigabytes.
When the Sloan Digital Sky Survey started work in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information. A successor, the Large Synoptic Survey Telescope, due to come on stream in Chile in 2016, will acquire that quantity of data every five days. Such astronomical amounts of information can be found closer to Earth too. Wal-Mart, the retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes
What is a petabyte? It’s a lot of data: 1,000 terabytes, or a million gigabytes.
The amount of digital information created in 2010 could fill 75 billion fully loaded 16 GB Apple iPads, or 1.2 zettabytes. What does 1.2 zettabytes look like? The iStack would cover Wembley Stadium’s pitch and reach approximately 4.24 miles into the sky. The iPads would have a retail value of $37.4 trillion, which is equal to about 44% of the entire world’s GDP.
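As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); the per-iPad price is not quoted anywhere in the talk, it is simply what the $37.4 trillion total implies.

```python
# Decimal (SI) units are an assumption; the talk does not say which convention it uses.
GB = 10**9
ZB = 10**21

ipads = 75 * 10**9          # 75 billion iPads
capacity_gb = 16            # 16 GB each

total_bytes = ipads * capacity_gb * GB
total_zb = total_bytes / ZB

retail_value_usd = 37.4e12  # $37.4 trillion, as quoted
implied_price_usd = retail_value_usd / ipads  # price per iPad implied by the total

print(f"{total_zb} zettabytes")
print(f"${implied_price_usd:.2f} per iPad (implied)")
```

The storage figure comes out at exactly 1.2 zettabytes, and the implied price is a little under $500 per device.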
Wikipedia suggests that Facebook, a social-networking website which was only established in 2004, has 600 million active users and is home to 40 billion photos. Twitter, five years old today, has a reported 190 million accounts sending 1 billion tweets a week, with 460,000 people joining every week. Foursquare has 6 million users. Flickr, the online photo-sharing site, has 32 million users. Analysing my own tweets, I send 2.8 tweets per day and 47 tweets per month. All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. Do they have any value? Can we make any money? The founders of these sites are certainly doing OK, but what is the point of Facebook Places, Foursquare, Gowalla and Google Places? Creating business databases, of course: to sell, and to send you marketing information.
But can we use all these social networking sites for anything? We can create maps from tweets. This is London mapped by the very clever people at CASA (UCL Centre for Advanced Spatial Analysis). Scientists have created a Twitter map of London to reveal the city’s tweeting hot spots, with areas of the capital renamed to correspond with their traffic levels. Peaks, which have been given names such as Soho Mountain, Camden Town Ridge and Piccadilly Rock, are depicted as mountains, and areas with few tweets are shown as valleys. The project was the brainchild of ‘Tweetographer’ Fabian Neuhaus, from UCL’s Centre for Advanced Spatial Analysis, who tracked the origins of the 120,000 tweets posted in London every week. Unsurprisingly, the Twitter peaks were concentrated in the centre of London. Mountainous areas were renamed accordingly as Battersea Hill, Waterloo Hang, Fulham Point and Parson’s Green Ridge. Troughs in London include Sutton Valley and South Harrow Pit. Neuhaus and his team collected the data from tweets sent within a 30km radius of London via mobile devices that include the location at the time the message is sent. Source: http://www.dailymail.co.uk/sciencetech/article-1290777/The-London-Twitter-o-Meter-Boffins-map-city-tweet-tweet.html
I want to talk about Open Data. Under the previous government we saw the Making Public Data Public initiative, and in early 2010 we witnessed the launch of a new British government website offering free access to a huge amount of public-sector data for private or commercial reuse. We know this as data.gov.uk. The data found on this website is often referred to as Open Data. Data.gov.uk launched with more than 2,500 datasets.
Open Data. At the latest count there are 5,600 datasets available. Data is available from central government, public sector bodies and local authorities. I’m sure you have all heard of the more high-profile datasets on there, expenses for one. The Department of Health has 1,060 datasets. Datasets can also be tagged to allow more intuitive searches and better results. Tag counts: Health 2,168; Care 1,548; Health-and-social-care 1,148.
With Open Data we can create and display interesting information
It is not just statistical data, expenditure and weather data that is now made freely available. In Great Britain, as a result of considerable pressure from The Guardian’s “Free Our Data” campaign and a political change of heart, mapping data is now included, and has been since 1st April 2010. Everyone who wants to can go to the Ordnance Survey website, look at the data, order it and download it. It is available free, free of licence restrictions, and can be used for commercial gain. Ordnance Survey also provide an API called OpenSpace with which you can incorporate mapping into your website, though not for profit. Ordnance Survey OpenData can be used to create and support innovative, exciting ideas and applications. Ordnance Survey have changed from a company that likes to say no to a company that likes to say yes. http://www.ordnancesurvey.co.uk/oswebsite/opendata/
Centrally funded public sector digital map supply agreement. This is a 10-year agreement between Ordnance Survey and DCLG. It will replace the existing collective purchase agreements (CPAs) covering central government, local government, health and London, and gives access to a range of Ordnance Survey digital mapping. Cost has been removed as a barrier to use: the data is free at the point of use. It also means that data can be shared between all members of the public sector, opening up cross-sector working and the sharing of best practice. A National Address Gazetteer is to be created as a best of breed from the NLPG, OS AL2 and Royal Mail PAF. This should reduce duplication: ONS had to spend millions to create an address register because one didn’t exist. The agreement goes live on 1st April 2011, and Ordnance Survey is on target to sign up 60 to 70% of existing CPA members by then.
February 2010 saw Tim Berners-Lee, the man credited with inventing the World Wide Web, deliver a talk about Open Data. I would like to show you something that I find interesting and which certainly focuses the mind on what is possible with Open Data. TED is a non-profit organisation devoted to Ideas Worth Spreading. It started out (in 1984) as a conference bringing together people from three worlds: Technology, Entertainment, Design. The message: put your data onto the web for other people to use.
One of the crucial elements of Open Data is being able to link datasets together. As a follow-up to the TED talk we saw earlier, Tim Berners-Lee now wants us all to create linked data. We have a wealth of data available, but how do the datasets relate to each other? What are the relationships? This is called Linked Data. It isn’t just about putting data on the web; it is also about making links between that data. For example, you want to attend an event and you find a list of attendees on the web. You pick a person and look that person up on the web. That person has a name and an address, or a height and weight, and we can also find out that they were born in Harrogate, that Harrogate is in North Yorkshire, and that North Yorkshire is in England. We can look up and fetch all sorts of data about the person and about the population of the place where they live. Because each of those things is given a unique http: name, or URI (Uniform Resource Identifier), we can search on them and establish relationships. How good you are as an organisation at creating linked data can be seen with the use of a good old-fashioned star chart. We are familiar with these: McDonald’s employees have badges with stars on them, which relate to the level of service, experience and quality you can expect from them when you ask for a Big Mac. As a really simple example of how data can be linked, I guess the odd one or two of you in the audience are on Facebook. Facebook tends to roll out new developments on its website without much announcement, and something I’ve come across is friend suggestions: relationships, how one person is connected to another. You may choose not to be connected to these people too! All the time on the web we are creating relationships. The same goes for photo tagging: tag the person in the photo, tag the location. This brings me back to my earlier point that we are creating data all the time.
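To make the Harrogate example concrete, here is a minimal sketch of following links between named things. Everything in it is an assumption made up for illustration: the `http://example.org/` names, the `bornIn` and `isIn` relationships and the person are hypothetical, and a real linked data set would use its publisher's own URIs.

```python
# Hypothetical URIs, invented purely for illustration.
EX = "http://example.org/"

# Each statement is (thing, relationship, other thing).
statements = [
    (EX + "person/jane", EX + "bornIn", EX + "place/harrogate"),
    (EX + "place/harrogate", EX + "isIn", EX + "place/north-yorkshire"),
    (EX + "place/north-yorkshire", EX + "isIn", EX + "place/england"),
]

def linked_from(subject, relationship):
    """Return everything the subject points at via the given relationship."""
    return [o for s, r, o in statements if s == subject and r == relationship]

def containing_places(place):
    """Follow 'isIn' links upwards until there are no more to follow."""
    chain, seen, current = [], {place}, place
    while True:
        parents = [p for p in linked_from(current, EX + "isIn") if p not in seen]
        if not parents:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)

birthplace = linked_from(EX + "person/jane", EX + "bornIn")[0]
regions = containing_places(birthplace)
print(birthplace, regions)
```

Because every thing has its own name, starting from the person is enough to reach North Yorkshire and then England without anyone having written "Jane was born in England" anywhere.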
LinkedIn is the network for connecting people through their jobs and places of work. Here, people are linked, and relationships are established based on who is connected to whom and where people work, or have worked. You can also see how people are connected, and at what degree of removal from you they are.
To try and explain linked data a bit more: in this picture, the boxes are the data, the raw data, the boring brown boxes of data. Not much going on. Each box has a plant rooted in it which feeds off the data. We get lots and lots of brown boxes, with lots and lots of things sprouting from them. Each of those plants might be a presentation, a document or some analysis of the data. Someone may be looking for patterns in the data, and they get to look at all of the data because of the way each data item is linked to the next. The more data you have connected, the better the picture and the more powerful it is.
Christian Bizer, of Freie Universität Berlin, worked out that the little square boxes on Wikipedia pages contain valuable data. He wrote some code to extract those bits of information and imported them into a linked data database on the web, which he called DBpedia. If you look up DBpedia, this is the central blob. If you extract information about London from DBpedia, there are other blobs of data which are linked. This is beginning to grow. This is happening now. People are creating linked data. Making data available could perhaps make the world run better, allowing people to know more stuff about stuff. It is important that raw data is exposed on the web, and the IC is beginning to do this. Many health datasets are locked away in individual silos. A great way of improving knowledge is to expose this data as linked data. Each person can do a little bit, everyone else does a little bit, to create a worldwide resource.
RDF: Resource Description Framework. RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. RDF can be represented in a number of different serialisations (XML, N3, JSON etc.); we are currently serialising the data as RDF/XML. For more information please refer to the W3C’s website.
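To show what a triple looks like once it is written down, here is a minimal sketch that prints triples in N-Triples, a simple line-based RDF serialisation from the W3C. This is not the RDF/XML we are producing; N-Triples is used only because it fits on a slide. The `http://example.org/` URIs are hypothetical, and the rule that any value starting with `http` is a URI is a shortcut taken for this sketch rather than part of RDF.

```python
def to_ntriples(triples):
    """Write (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            # Sketch-only shortcut: treat URL-looking values as URIs.
            obj_text = f"<{obj}>"
        else:
            # Everything else is a plain string literal; escape \ and ".
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)

# Hypothetical example data; only the rdfs:label URI is a real vocabulary term.
example = [
    ("http://example.org/place/harrogate",
     "http://example.org/isIn",
     "http://example.org/place/north-yorkshire"),
    ("http://example.org/place/harrogate",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Harrogate"),
]

print(to_ntriples(example))
```

Each output line names both ends of the link and the relationship between them, which is all a triple is.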
There are of course issues with making data available in its raw format, and we need to take care. How is this data interpreted and interrogated, and by whom? Are they qualified? Are they analysing the data properly? If raw data is put out on the web, will this create a loss of analytical skills within those organisations that are simply creator organisations? Poor analysis leads to poor decision making. Is the data fit for purpose? I would like to highlight a recent example which hit the headlines.
http://bcsmaps.blogspot.com/ Users type in a place or a unit postcode and get data on the location showing handy point symbols identifying locations of crimes disaggregated by broad types... or do they? No! The points are some sort of generic locator placed on each road segment, used as the point to symbolise crimes for that road. So what do Mr and Mrs Average do with the site? Of course, they type in their postcode and are delighted to find out that all crime happened some distance away. Unless, that is, their property is located near the geometric centre of the road, in which case they are now horrified. The use of totals and a single point is poor, to say the least. If the desire is to build in fuzziness to deal with the privacy issue, then use rates or proportions; use areas instead of points; or use some form of road-based linear symbol that varies by thickness or colour. Instead, the UK Police have been lazy: they have taken a nice standard base map service, contrived to make up points where crimes didn’t exist, and then summed the crimes locally to give a value to that location. High numbers of crimes are often recorded at police stations, and the geocoding or addressing of crimes is very often inconsistent. Often crimes are located to the postcode, but who knows the postcode of a park? Often a postcode covers a large area, certainly in rural areas. Rubbish in, rubbish out.
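The suggestion to use rates and areas rather than totals at a point can be sketched in a few lines. The area codes, crime records and populations below are made up for illustration; they are not police or census figures, and "per 1,000 residents" is just one reasonable choice of rate.

```python
from collections import Counter

# Made-up illustration data: the area each recorded crime falls in,
# and the resident population of each area.
crime_areas = ["A", "A", "B", "A", "C", "B"]
population = {"A": 1500, "B": 4000, "C": 500}

def rate_per_1000(crime_areas, population):
    """Crimes per 1,000 residents for each area, instead of raw totals."""
    counts = Counter(crime_areas)
    return {
        area: 1000 * counts.get(area, 0) / residents
        for area, residents in population.items()
    }

rates = rate_per_1000(crime_areas, population)
print(rates)
```

On totals alone, area A (three crimes) looks three times worse than area C (one crime); as a rate the two are identical, which is the kind of difference a map of totals hides.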
Data can be mapped in many ways and I’d like to show you a few examples. This map rearranges the world by correlating the population of a country to actual size. Some countries (the United States, Yemen, Brazil and Ireland) remain in their original location. India has replaced Canada on the map.
2010 Map of Online Communities. Navigating the ever-evolving communities on the web is a difficult task, so it helps to have a map. Randall Munroe, of the webcomic xkcd, has updated his (in)famous map of online communities from 2007. The 2010 version bears some striking contrasts to the original. MySpace has shrunk, and Facebook has swollen to gargantuan proportions, with farm-related games taking up considerable real estate. China’s presence, with QQ and Happy Farm, is ever growing. During the redrawing of the map Munroe took the opportunity to delve into greater detail and to place social media in the larger context of the internet as a whole. It’s a fascinating piece of art, and a great way to wow yourself with the complexities of the worlds we’ve created online. It’s crazy how much changes in just three years. http://xkcd.com/802/ (2010 version) http://xkcd.com/256/ (2007 version) While the landscape of online communities is evolving, there seems to be one consistent trend: diversity. Recognizable giants like Facebook and YouTube hold major ground, but these maps are full of groups that demonstrate a wide variety of interests and beliefs. In the next decade or two, billions more people will be coming online, many of them from developing nations. That growth will continue to diversify and change social media. Will the map of 2015 show Facebook dwarfed by QQ, or some new India-based video center looming over YouTube? Or will current leaders incorporate the influx of new users into their umbrella with automatic universal translations? No matter how our internet communities change, you can rest assured that the world will continue to immigrate online. Give it a few years and social media maps may have as much importance as traditional views of the globe. I wonder how long it will take for governments and major institutions to follow the webcomic example and start to map the new digital world.
Add explanation of what this map is
Crowdsourcing: the crowd. Crowdsourcing is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a “crowd”), through an open call. What’s in it for me? Foursquare specials; contributing open source code; online presentation and delivery tools (Open Office); Groupon (volume local discounts). Can we use this stuff in health? Patient experience, rating hospitals and GPs, league tables of providers.
With Japan suffering its most severe crisis in the 65 years since World War II, how can modern technology help with the aftermath of the 8.9 magnitude earthquake suffered on Friday and the subsequent monstrous tsunamis? Japanese Prime Minister Naoto Kan stated: “I strongly believe that we can get over this great earthquake and tsunami by joining together.” So by joining together all the information gathered during and after the events, can we also help in a humanitarian role and help make decisions that will make a real difference? Following on from the visualisation in TBL’s talk at TED about the situation in Haiti, we have similar images showing the amount of work which went into OpenStreetMap between 12th and 13th March 2011. The earthquake hit on 11th March. One day’s difference.
Google, local platforms respond to Japan’s 8.9 earthquake crisis. An earthquake with a magnitude of 8.9 hit Japan today, resulting in tsunami warnings for 20 countries, as well as California and Hawaii. Crisis mappers wasted no time responding: in under 2.5 hours Google launched its person finder application, which was also used when New Zealand’s 6.3 quake struck last month, and a local developer in Tokyo, Shu Sigashi, a member of the OpenStreetMap Foundation in Japan, quickly put up a localized Ushahidi crisis platform. Google’s person finder app is already rapidly increasing in usage. Within a couple of hours 2,000 reports had been logged. If you type the name “Yoshi” into Google’s app, results come up that indicate whether people with that name have been reported as alive or missing. The web, the cloud or just the plain internet has certainly become the hub of communications in the 21st century, with an almost unlimited list of uses.
The modern metropolis can often feel like a social archipelago: fragmented islands of social activity separated by large areas dedicated to commercial workplaces, flows of vehicles, residential sprawl or industrial sites. These islands of high-density social encounter can be mapped using emerging data from location-based networks such as Foursquare. By visualising the aggregate data produced by these social networks, we can see how social activity in a city is distributed. In these maps, activity on the Foursquare network is aggregated onto a grid of ‘walkable’ cells (each one 400×400 meters in size) represented by dots. The size of each dot corresponds to the level of activity in that cell. By this process we can see social centers emerge in each city. The popularity of thousands of social venues (bars, restaurants, cafes, galleries, parks) contributes towards these maps. Together, they show us that social hubs emerge organically in cities. If we graph venue popularities we see a handful of social hubs trailed by a long tail of low-activity locations, in a distribution akin to a power law. This emergent order is a striking aspect of self-organisation in cities. The clusters of activity are typified by small, walkable links between venues. Informally, areas such as Shoreditch in London or the Lower East Side in New York are well known as walkable social hubs, but the data shows how some cities are far more walkable than others. If we explore these walkable clusters of venues in each city, we can show how Paris contains a much more contiguously walkable structure than both New York and London. The data implies that the pedestrian network is an important infrastructural element in the social life of our cities. Despite the fact that these islands are connected by long-range transport links (subway, rail), our experience of the city as a social space seems to a great degree based on walking between places of encounter.
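The gridding step described here can be sketched roughly as follows. This is not the map makers' own code: it assumes each check-in already has x and y coordinates in metres on some projected grid (for example eastings and northings), and the check-in points are made up.

```python
from collections import Counter

CELL_SIZE_M = 400  # 400 x 400 metre 'walkable' cells, as in the maps

def cell_for(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a point."""
    return (int(x_m // cell_size), int(y_m // cell_size))

def activity_by_cell(checkins):
    """Count check-ins per cell; the count would drive the dot size."""
    return Counter(cell_for(x, y) for x, y in checkins)

# Made-up check-in coordinates in metres, for illustration only.
checkins = [(10, 20), (390, 399), (400, 0), (805.5, 1201), (820, 1300)]
counts = activity_by_cell(checkins)

for cell, n in sorted(counts.items()):
    print(cell, n)
```

Sorting the counts from largest to smallest is then enough to see the handful of busy cells followed by the long tail of quiet ones.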
http://www.rightcare.nhs.uk/atlas/ “Awareness is the first important step in identifying and addressing unwarranted variation; if the existence of variation is unknown, the debate about whether it is unwarranted cannot take place.” I wanted to show you some work that the Association of Public Health Observatories has been doing in association with Sir Muir Gray. They have produced an NHS Atlas of Variation; the paper version extends to 34 maps, and they have also produced an electronic version. It shows the variation in health care across England at PCT level, so at quite a high level. The Atlas aims to stimulate debate about why variation exists. Neighbouring organisations might have best practice to share. Patterns of health: linking to Paul’s presentation, we both believe there is a need for lower-level statistics to see differences within LAs.
new demands for health and social care analysis
Graham Hyde: NHS Information Centre Twitter: @grahamhyde Blog: http://allthingsgeo.wordpress.com
data, open data, linked data, free data, crowd sourced data, social data, map data
Every 2 days we create as much information as we did from the dawn of civilisation up until 2003.
“Data underpins our economy and our society: data about how much is being spent and where, data about how schools, hospitals and police are performing, data about where things are and data about the weather. Yet until recently not many non-technical people concerned themselves with data and how it could be used better.” Tim Berners-Lee, 2010
[Map labels, London: Greenwich, O2 Arena, London Fields, Shoreditch, Angel Islington, Kings Cross, Camden Town, Regent’s Park, Covent Garden, South Bank, Kensington]
Credits: Tim Berners-Lee; TED; OpenStreetMap; Dr Ken Field, University of Kingston; UK Police; Home Office; Randall Munroe, xkcd