Trainspotting - content analytics and online research


Presented at a NewMR webinar in February 2011. This talk argues that content analytics needs to frame by theme (the train) before analysing individual carriages (keywords). It is related to the Cloud of Knowing open source project and brings together discourse analysis and online data.

Published in: Business, Technology

  • Today I want to talk about working top-down: abandoning all the bottom-up tools and starting from what we all share – culture, language and metaphor. I’m going to refer to the train metaphor a lot because I think there’s far too much nonsense about starting with the carriages and working up. Counting keywords is what it usually comes down to.
  • This thinking has been shaped by the Cloud of Knowing open source project, which has been going since September 2009. We meet every couple of months to hear presentations, and we post them on the Cloud of Knowing site, which you’re welcome to have a look at. At previous meetings we have considered online data as contextual or behavioural data rather than as a fairly basic form of research content, online data as evidence of herd thinking, using crowds to analyse and amplify online data, and applying fluid dynamic thinking to web pages. We’ve had a semiotician talking about the semantic web. Oh yes, and two members of the Cloud of Knowing group briefed us on research robots in November 2009, long before it was on the industry radar. So it has proved to be a fertile place to share ideas about how to make content analytics fit for purpose.
  • If the only tool you have is a hammer then every problem looks like a nail. The tool that is nearest to hand isn't necessarily the best one. Research methodologies need to be platform independent. It simply isn’t acceptable to use stochastic (that’s counting) tools just because lots of them have been built for Twitter. We wouldn’t accept crude counting in any other field of research. To this is added utter ignorance of sampling. A million sites is not better than half a million. If you increase the size of the sample, you increase the level of noise even faster unless you set very tight controls. Last week Boing Boing published an analysis by keyword of the Libyan situation: Tripoli and Gaddafi went up and Venezuela went down. Violence went up. Clicking on the violence graph, I found myself awash in tweets about domestic violence, nothing to do with Libya, but presented as if they were.
  • Rosie Campbell's totems paper at last year’s MRS conference argues for starting with the grand themes and belief systems which people’s conversations flow out of. In a given market there are a limited number of discourses, which Rosie describes as totems, round which the conversation circulates. Each discourse will use a different deep cultural metaphor. To discuss a topic comprehensibly, this totem needs to be understood and shared by speakers and listeners alike. Note that metaphors will make use of different words, some shared across metaphors, which means that the same words will have different connotations. This is why word counts are so misleading. Aggregating words independently of their cultural roots cuts across meaning and context. Decode the metaphors first, then identify keywords and plug them in in relation to that meaning.
  • To return to the train metaphor: different trains have similar carriages but put them in different places depending on destination and type of journey. You don't understand the train by identifying and counting carriages. First look at the train.
  • It's high time I gave you an example. In an online study about the experience of obesity, different metaphors emerged very quickly in a bulletin board discussion. One used a scale of weights which rose and fell almost independently of what that person tried to do. Alongside this was a transformation narrative which involved confronting obesity and using self-disgust and divine help to create the momentum to escape to somewhere new. Each narrative used different words and described different relationships, especially between outsiders (without a weight problem) and insiders, who usually struggled alone.
  • The brand problem. Brand managers want to demonstrate the centrality and utility of their brands when neither may be true.
  • Brands will fare better when they understand which discourses and metaphors they should be facilitating, and work to shape and amplify these. It may be better for the brand not to force its way in with a carriage sponsorship programme. Facilitating a particular metaphor is more important than slapping your name on a carriage as a major unwelcome sponsor. It's not about owning the train but joining the right one and helping to fill it.
  • To summarise where we’ve got to: use qualitative analysis, or reanalysis of existing studies, by hand to identify the trains and their constituent carriages. ONLY then set analytics engines to track by train, NOT by carriage, using automated processes.
  • Pick which station platforms you’re standing on and when to spot trains. No analytics engine tracks all the carriages. This is why sampling is critical. You need to pick your trains. I propose grading data samples by their place on 3 curves, not just using any and all data that turns up. We need to establish which class of data is most helpful for identifying the different trains. Clay Shirky in Here Comes Everybody introduces the power curve, which is very different from the population distribution curve. Some authors are much more prolific than others. Some content is much more visible than other content. We cannot treat all data as the same just because there is so much of it.
  • Curator's curve: how often the author has posted on the same topic. Frequent authors develop their own connotations and refer to their own previous postings, compared with one-time mentions. Where on the curve does the data concentrate?
  • Audience curve: how many have read this piece of data? A problem with MROCs is the over-dependence on the postings of a minority. Does a much-read posting or a dense, scarcely read one tell us more about what sort of train we have?
  • The third curve is the context curve. Are there any clues about whether the data was produced close to its context? Geotagging and other context cues can show whether content was created close to usage or experience, or is reflective, reporting, or even detached from direct experience. We may not need to sample everything, just a limited dataset from different parts of the 3 curves.
  • Take insomnia as an example of how the most useful data may bunch at different places on different curves. We don’t have to collect all of it.
  • To summarise: first, start with trains before worrying about carriages. Work TOP DOWN. Secondly, sampling: grade the data, and don't rely on the size of the dataset to bring clarity. Noise increases faster than signal. Lastly, don't use a tool because you have it or, worse, because it's free. I didn't even mention sentiment analysis. I shouldn't have to!
  • Trainspotting - content analytics and online research
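
The top-down order argued for in the notes above (sort posts into trains first, then count carriages within each train) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the theme names, the marker word lists and the example posts are all invented, and in practice the markers would come out of qualitative analysis by hand rather than a hard-coded dictionary.

```python
from collections import Counter

# Hypothetical theme markers (invented for illustration). In practice these
# would be identified by hand from qualitative material.
THEMES = {
    "libya_conflict": {"tripoli", "gaddafi", "libya", "rebels"},
    "domestic_abuse": {"partner", "home", "shelter", "abuse"},
}


def tokenize(text):
    """Lower-case a post and split it into words, trimming edge punctuation."""
    return [w.strip(".,!?;:'\"").lower() for w in text.split()]


def assign_theme(tokens):
    """Return the theme whose markers overlap most with the post, or None."""
    words = set(tokens)
    scores = {name: len(markers & words) for name, markers in THEMES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def count_keyword(posts, keyword):
    """Bottom-up: count posts that mention the keyword, ignoring context."""
    return sum(keyword in tokenize(p) for p in posts)


def count_keyword_by_theme(posts, keyword):
    """Top-down: sort posts into themes first, then count the keyword per theme."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        if keyword in tokens:
            counts[assign_theme(tokens)] += 1
    return counts


# Invented example posts, not real data.
POSTS = [
    "Violence spreads in Tripoli as Gaddafi clings on",
    "More violence reported near Tripoli tonight",
    "Violence at home: where to find a shelter",
    "My partner's violence made me leave home",
    "Violence on TV is getting worse",
]

flat_total = count_keyword(POSTS, "violence")
by_theme = count_keyword_by_theme(POSTS, "violence")
```

Here the flat count lumps every mention of "violence" together, while the per-theme count keeps the Libya posts apart from the domestic violence posts and leaves unmatched posts under `None` for a human to look at. The overlap scoring is deliberately crude; it stands in for whatever theme model the hand analysis produces.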

    1. Stop looking at carriages, start looking at trains. John Griffiths, March 8th 2011, Planning Above and Beyond
    2. Cloud of Knowing: Open source project; Remit to consider how online content can be incorporated robustly into market research; Face to face meetings; Next one due in April; Sharing papers
    3. The tool nearest to hand is NOT necessarily the best one.. Just the closest; Stochastic (counting) tools are easy to write; It doesn’t mean they tell you very much
    4. Totem talk from Rosie Campbell: Start with shared meanings; Cultural discourses; Family stories; Telltale words used; Only look at the component words when you understand the shared context; Several totems in each market – demarcate each one THEN identify component words. Source: Inside language - how to spot totem poles, Rosie Campbell, MRS conference 2010
    5. You don’t understand a train by counting carriages
    6. The meanings come from above, not below: Obese = lazy and complacent; Uses ‘obesity’ to repel her from it; Who I am – tied into my weight; Can’t bring herself to use the ‘obesity’ word; ‘people’ vs us
    7. Branding trains
    8. Attaching brands to trains.. Can be artificial – imposed; Better to discover which train the brand fits with; And find the right time to make the brand connection – in the right part of the train; Fill the train – don’t try to own it
    9. Tying this down.. Using qualitative analysis or reanalysis of existing studies to identify different trains, the mix of carriages – analysis by hand; THEN use analytic engines to track the train – analysis using automated processes
    10. Pick the right platform: Sampling
    11. The curator curve: Posts regularly on a topic; Often refers to own postings; Posts one off comments; No evidence of deeper interest. NB not the same as authority or influence
    12. The audience curve: Online hit – viewed by thousands or hundreds of thousands; Rarely if ever read by anyone!!
    13. The context curve: Actively involved/immersed in context; Reflective but away from context; Detached from context
    14. Insomnia: which gives the best picture? Detailed diary content by insomnia sufferers; Regularly accessed and rated content; When and where the content was created, e.g. video and photo content
    15. Work TOP down – sort the trains before the carriages; Sample – grade the data rather than increase the size of the dataset – to bring clarity – reduce noise, increase signal; Don’t use a tool just because it's there or because it's free..
    16. That’s all folks!
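
As a rough sketch of what grading data on the three curves could look like in practice, the Python below places each item on the curator, audience and context curves and then keeps a small sample from each grade instead of the whole dataset. Everything specific here is an assumption made for illustration: the field names, the two cut-off values and the example items are invented and do not come from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    author_posts_on_topic: int  # curator curve: how often the author posts on the topic
    views: int                  # audience curve: how many people have read the item
    context: str                # context curve: "immersed", "reflective" or "detached"


# Hypothetical cut-offs; real ones would depend on the market and the platform.
REGULAR_AUTHOR_MIN_POSTS = 5
WIDELY_READ_MIN_VIEWS = 1000


def grade(item):
    """Place an item on each of the three curves and return the grade as a tuple."""
    curator = "regular" if item.author_posts_on_topic >= REGULAR_AUTHOR_MIN_POSTS else "one-off"
    audience = "widely-read" if item.views >= WIDELY_READ_MIN_VIEWS else "rarely-read"
    return (curator, audience, item.context)


def graded_sample(items, per_grade=1):
    """Keep at most `per_grade` items from each grade rather than every item."""
    buckets = defaultdict(list)
    for item in items:
        buckets[grade(item)].append(item)
    return {g: members[:per_grade] for g, members in buckets.items()}


# Invented items for illustration only.
ITEMS = [
    Item("Sleep diary, night 14", 30, 12, "immersed"),
    Item("Sleep diary, night 15", 30, 9, "immersed"),
    Item("Ten tips for better sleep", 1, 50000, "detached"),
    Item("Couldn't sleep again last week", 1, 3, "reflective"),
]

sample = graded_sample(ITEMS)
```

With these invented items the four posts fall into three grades, so the sample keeps three of them: one diary entry from the regular author, the widely read tips piece, and the one-off reflective comment. Which grades deserve more weight for a given study remains a research judgement; the code only makes the grading explicit.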