We are in the middle of a world-changing shift. When I started working in the business world, all of the data for running the business would fit into a single file in today’s spreadsheet applications. Data was scarce, highly sought after, pored over in intimate detail and highly valued. Today we generate huge amounts of data every single day, and one of the biggest challenges in business is dealing with information overload. The focus is less on gathering data, and more on curating and making sense of it. Technology to the rescue…
IBM’s “portable” hard drive solution from the late 1950s. You’ve probably seen this photo, or variations of it. That drive actually holds less than 5 MB – that’s probably less memory than there is in your washing machine (at least, it is if you’ve ever washed a USB memory stick by accident!). Today, though, storage is cheap. By the end of this decade, you will likely be able to affordably store the whole of today’s internet in the palm of your hand. That’s changing how we build computer systems, and it also changes how we deal with data.
The other big, historical challenge with data was that we needed to have a good idea of what we were going to collect before we started collecting it. And if we wanted to change our minds halfway through, well, that was going to mean tears before bedtime and a lot of hard work. Today, technologies like NoSQL databases mean that we don’t have to worry about the structure of the data until we want to make sense of it. We can also collect different sorts of data in the same system, capturing more or less as our sources change. Platforms like Twitter and Facebook have led to the development of technologies that can process and analyze huge amounts of data in near-real time – restructuring and refactoring it as we go. It’s a revolution in our relationship with data.
You have to love the idea of a fictional doctor telling us that we are all liars. Of course we aren’t. But we do bend the truth occasionally. Especially if we think we are being watched (or listened to…).
Socially desirable responding is a major challenge with surveys and many other traditional data gathering methods. Certainly, experts can help to minimize its impact, and control for other issues like response set bias. It gets interesting when we turn those biases from issues into data. Technology allows us to record not just the answers, but the response times too. This is an altogether different sort of data.
We can move from ‘expressed’ measures (attitudes) to observed measures (behaviours). Of course, not all ‘behavioural’ measures are actually measures of behaviour. Foursquare check-ins, for example, are expressions – people don’t (usually) check in at every physical location that they visit. The choice of where to check in is an expression of attitudes about themselves and the location brands. Social media, contrary to popular opinion, is not about transparency. It is about continual, partial transparency. We need to get smarter about understanding the data that we collect, and learn new techniques to control for the biases in it.
Of course some data is more ‘objective’ – this is a lovely visualisation of the Autodesk organisation over time. In our early days of working with human data, we spent quite a lot of time building these sorts of visualisations. They have become cheaper and easier to produce, and they are certainly good discussion points. The bigger lesson, though, is that not all data matters, or at least much of the data we see as important actually isn’t. I can predict more about the interactions of people in an organisation based on the physical distance between their desks than I can from a hierarchical org chart. Objective information is good, but it is overly valued in business. Aggregate subjective information often tells us more. Not all opinions are meaningless!
One of the most interesting things about social media is that it gives us more access than ever before to the raw language that people use. As software algorithms have become more advanced, and our understanding of language has improved, we can create software that can analyze, in aggregate, the emotional content of communications. The hedonometer is a great example – how happy is the Twitter-verse today? But language tells us much more…
Shared vocabulary can predict social groupings and influence, in quite unexpected ways.
But blindly looking for patterns is a dangerous sport. It hits many of the weak spots in our cognitive systems, and can lead us up all sorts of blind alleys.
“Correlation is not causality” – get it printed on a t-shirt. Say it randomly in meetings. The assumption that unrelated events have causal links almost makes the business world go around. Literally. It is a much harder habit to break than you might think, for reasons I’ll come on to later. When we operate in the world of human data, it is an ever-present danger – misunderstanding how variables do (or don’t) relate. Camera tripods have cameras on them. Cameras take good pictures. Cameras don’t like getting wet. Andy here is on a camera tripod. He takes very good pictures. He hates getting wet. Andy is, of course, a photographer, not a camera. But ascertaining that from a few variables (rather than a few megabytes of data and a lifetime of learning) is a very non-trivial problem.
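To make the point concrete, here is a minimal sketch in Python of how two variables can look strongly related without either one influencing the other. Everything in it is an assumption made up for illustration: the hidden driver, the noise levels, the sample size and the names `hidden`, `observed_a` and `observed_b` are not taken from any real data set.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver that we never get to see in practice.
hidden = [rng.gauss(0, 1) for _ in range(2000)]

# Two observed variables. Each depends on the hidden driver plus its own
# noise; neither one influences the other.
observed_a = [h + rng.gauss(0, 0.5) for h in hidden]
observed_b = [h + rng.gauss(0, 0.5) for h in hidden]

raw = pearson(observed_a, observed_b)

# Strip out the hidden driver's contribution. This is only possible here
# because we simulated it; real data rarely hands you the confounder.
resid_a = [a - h for a, h in zip(observed_a, hidden)]
resid_b = [b - h for b, h in zip(observed_b, hidden)]
controlled = pearson(resid_a, resid_b)

print(f"raw correlation: {raw:.2f}")
print(f"after removing the hidden driver: {controlled:.2f}")
```

The raw correlation comes out strong, and it largely disappears once the hidden driver is removed. With real human data the hard part is knowing that the hidden driver exists at all.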
We have a sea of cognitive biases that play into one another. We tend to fixate on the first thing that we see, becoming blind to other interpretations. We are biased towards spotting evidence that supports our hypothesis, and ignoring data that doesn’t. We value and believe things based on repetition more than reliability, and when you put that in a social context, we support what we believe other people believe. It is a chain of events that leads to big mistakes, and big data is high-octane petrol for it, especially in the business context, where we value ‘objective’ data so highly. Numbers are not always objective; they are vulnerable to subjectivity.
You will have seen this video online. Think through the consequences carefully. When told to diligently observe something, we completely miss the gorilla in the room, beating its chest. Apply that to big data. Our perceptual systems keep us safe from predators, and help us locate friends and relatives. They were not specified or tested for analysing terabytes of data on a computer screen…
And, just to make it worse… We always overestimate our capabilities. When asked, on average, everyone is above average!
There is no such thing as a neutral presentation of data. We always bring something of ourselves to the presentation, even if it is unconscious. Phenomenological approaches to psychological research understand, embrace and control for the biases of the researcher. Ignoring them, or even worse, denying their existence, simply increases their impact. Understand why (at an emotional level) you are measuring what you are measuring, and the story you tell when you present it. Humans communicate at the level of stories, not at the level of data, so tell stories, and understand stories. Each story is a potential narrative. Most data has multiple potential narratives. Without a narrative, an embracing context, data is meaningless, or at least meaning-less.
By the way… Not all biased presentations are as simple and obvious as this example. But look at what is going on here! What do you learn? What can you tell about what is being said (and what is not being said)?
The biggest challenge with Human Big Data is that it breaks the scientific model. Most people working in the data processing world come from a background that draws on the epistemology of natural science. We build a hypothesis, we construct experiments, we measure things. We gradually ‘discover’ the nature of the world around us. Of course human big data doesn’t work that way. Marketing people are paid to CHANGE WHAT PEOPLE BELIEVE. So, if we are measuring what people believe (attitudes) or how they behave (which is related to what they believe – says the marketing world!), we are changing the thing that we are measuring. At a higher level, we are using the learning from big human data to architect the social construct that people operate within. Marketing has never been static. What worked yesterday won’t always work today. Humans adapt and normalise. If you use behavioural economics in your pricing, eventually you change the behaviours. Why don’t you buy the cheapest or the most expensive wine today?
When we gather and present human big data, we have to do our best to design out the biases. But we can turn this all on its head and use our biases, combined with big data, to change attitudes and outcomes.
Nudge is common parlance today. Decision architectures generally play on age-old cognitive biases. When we add social data…
We turn on the turbo button when we add social proof. The Facebook like button that you see on websites has faces on it for a reason. Instrumenting behaviours and playing back the data is powerful.
But we have to be careful. It can be too powerful. Overly rationalizing the world, and using social forces to drive compliance, can lead to an icy and brittle world. We need to use these new tools with caution. This is not a single-move chess game, and eventually there is a tipping point at which contrarian approaches become the dominant strategy. If we measure emotional responses and engineer the ultimate film script, and then every film studio follows it, suddenly it becomes bland. We have to apply our learning lightly, and with a bit of fun…
Probably the fastest way to start a fist fight at a game developers conference is to describe “gamification” as psychology for dummies. OK. Some parts of that statement might not be true. But what is true is that we can usefully borrow from the toolbox of gamification. In gaming, the players enter “the circle” of the game – temporarily adopting a perspective on how the world works. We can escape the game. When we can’t, it stops being a game. There is something else that gamification gives us: the construction of measures. Not everything that we want to measure in human big data has a metric that we can express. In the world of games, we create measures and play them back. Number of lives, energy level… Constructed measures can be used to turn the disadvantages of reification into a positive advantage. We create and combine the things that we can measure into new measures. Number of Twitter followers, number of Facebook likes. These are actually all just constructed measures, which are used to drive behaviours. The players play the game to earn the points they need to level up!
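As a rough illustration of what a constructed measure looks like, here is a small Python sketch that combines two things we can count into a score and a level. The weights, the level size and the names (`Player`, `score`, `level`) are hypothetical choices made up for this example; picking them is exactly the design decision that drives behaviour.

```python
from dataclasses import dataclass

# Hypothetical weights and level size, invented for this example.
FOLLOWER_POINTS = 1
LIKE_POINTS = 5
POINTS_PER_LEVEL = 100


@dataclass
class Player:
    name: str
    followers: int
    likes: int


def score(player: Player) -> int:
    """Constructed measure: a weighted sum of the things we can count."""
    return player.followers * FOLLOWER_POINTS + player.likes * LIKE_POINTS


def level(player: Player) -> int:
    """Play the measure back to the player as a level."""
    return score(player) // POINTS_PER_LEVEL


players = [Player("ana", 120, 10), Player("ben", 40, 30)]

# Leaderboard: highest constructed score first.
for p in sorted(players, key=score, reverse=True):
    print(p.name, score(p), level(p))
```

Change the weights and the leaderboard changes with them, even though nobody has behaved any differently. The measure is constructed, and the players will play to it.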
Hard Discs are Cheap: IBM 305 RAMAC, 5 Meg Hard Drive, 1956. “...Should the current technological and pricing trends continue, we will see 2.5-inch 40-terabyte drives sell for as little as US$40 by the end of the decade.” http://www.gizmag.com/hdd-storage-density/25004/
“It’s a basic truth of the human condition, that everybody lies. The only variable is about what... …I’ve found that when you want to know the truth about somebody, that someone is probably the last person you should ask.” Dr Gregory House
Socially Desirable Answers: People will tell you what you want to hear, what they heard and what they would like to hear… …but rarely what they think
From attitudes to behaviours: Social Signals. “Let’s do coffee”
Not everyone (thing) lies: http://www.youtube.com/watch?feature=player_embedded&v=mkJ-Uy5dt5g
But you are better than that… self-enhancement bias
Presentation is bias: "[Big Data] is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson…wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method…. "[T]hese views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning….[W]e may construe them in self-serving ways that are detached from their objective reality. "Data-driven predictions can succeed--and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves….Unless we work actively to become aware of the biases we introduce, the returns to additional information may be minimal--or diminishing." Nate Silver, The Signal and the Noise
Presentation is bias: Via “Data Visualization: a successful design process”, Andy Kirk