Unstructured data to structured meaning for nyu itp camp - 6-22-12 ms


my presentation to NYU ITP Camp on 6-22-12

Published in: Technology, Business
  • If you think about it, just about everything at ITP Camp is Unstructured Data – and the story you're telling is one for you to frame, or make up, when you tell it.
  • http://www.marketingweek.co.uk/news/examining-social-media-data-can-burn-a-hole-in-your-pocket/4000881.article
  • Note: The chart says "tracking Brand Requested themes" but there are no themes the brand has suggested. I used the themes I found in my qualitative analysis instead.
  • Stop words include: -discount - buy -online -"without a prescription" -cheap

    1. 1. Moving from Unstructured Data into Structured Meanings and Data Stories Marshall Sponder WebMetricsGuru INC for NYU ITP CAMP 6-22-12
    2. 2. Introduction – about me, besides being an ITPCamper… Marshall Sponder is the CEO/Founder of WebMetricsGuru Inc., a social solution design, social media analytics, web data analysis and SEO/SEM practice focusing on cutting-edge market research and social media trend analysis. He is the author of "Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics", published by McGraw-Hill, 2012. Marshall also teaches Social Media Analytics and Art at Rutgers University and UCI Irvine, Extension and is a frequent speaker at Analytics conferences internationally and in the United States.
    3. 3. Geo-located check-in data, somehow, got much harder to accurately capture, across the board, via listening platforms after last summer (but it was always fragmentary, at best). Sysomos Map Query: 4sq.com/
    4. 4. Last year I did some data crunching using Radian6 and 4SQ check-ins, and found context / story much easier to get via geo-local data. My finding is that adding additional “dimensions” to the social data provides “context” that is often missing, because the social data is largely unstructured. I was also able to look at “influencers” by the venues they habitually visited and their Twitter following.
    5. 5. Hitting People through multiple Channels creates “meaning” and Déjà Vu. NY Lottery online site 2. NY Subway Train 1. Pandora. Having cookies tracked across sites is probably doing something similar – but the idea is that awareness, relevance and meaning are “created” by repetition across varied channels within a certain
    6. 6. Finding / Creating Meaning / Creating the Story using Social Media: Moment of Receptivity. At the moment of receptivity, your message, argument, or proposition has a chance of being received and acted on. The story you create will be a mixture of what you wish to create and what your recipients will make of it (how they will process it).
    7. 7. Some types of Data (not all inclusive), ranged from Unstructured Data to Structured Data (more work is required for unstructured data): • Big Data, including machine generated data (and Big Analytics) • Social Media, Video, Audio and Geo Local (Mobile) Data • Offline Data (verbal, observed), recorded or non-recorded • Search Engine and Web Analytics Data (structured and page based) • Large and medium business/marketing silo data (business intelligence) • Financial, Manufacturing, Legal and Legislative Data • Learning / Training Data (Education/University) • Printed / Written Data (ledgers, lists)
    8. 8. How much Unstructured Data is there?
    9. 9. Types of Unstructured Data (some of these types you deal with at ITP Camp). Examples of Structured Data (businesses, governments and educational institutions have a lot of experience with this kind of data): Databases; XML data; Data warehouses; Enterprise systems (CRM, ERP, etc). Examples of Unstructured Data (no one has cornered this type of data yet – everyone is struggling with it): Excel spreadsheets (one can argue this point, as Excel can have structure too); Word documents; Email messages; RSS feeds; Audio files; Video files; Social Media Data (tweets, posts, photos, likes, shares, Near Field, Geo-fencing); Mobile Data (check-ins, SMS, etc)
    10. 10. Add a Plan to help add Structure to Data • Identity (who are you identifying?): You; a business, non-profit, Gov, an official, Industry, etc.; someone else (depends how you want to define this) • What (are you measuring?): Check-ins, mentions, posts, clicks, pulse data, etc; Visits, Page views, Unique Visitors, etc (perhaps a specific audience type); Behavioral and Attitudinal data – much harder – some ITP experiments seem to go here • Where (are you monitoring? have enough data?): Social Media Channels; Location (where); Venue Type; Situation/mindset • When (is it happening?): Real Time / Seasonal; Specific event; Time not defined; Asynchronous • Why (is it happening?): You know why, but you want to know how much, be more tactical, effect specific changes; Exploratory (don’t know – trying to find out); Business Goals? Art Goals, Effect Changes
    11. 11. It would be nice if Social Data could interface into something like Isadora. I don’t know of anything that does that yet, or what the metaphors for the data would be (a good direction, though, if someone wants to take that on). I suspect such an interface would lend itself to “Big Data”
    12. 12. Maybe this taxonomy would work (http://behaviorgrid.org/)... But you have to code the verbatims manually – unless you can program machine learning to do it for you (it won’t be that accurate, though)
    13. 13. What Platforms Provide (not a complete list by any means) • Geolocation (location, venue type, other friends) • Social – Sentiment, Volume, age/gender, some attempt at topic (usually inadequate), text analytics • Web Analytics – Visits, Page views, Visitors (unique/new), Cookie Location, pathing (on site only), correlation tools, search keywords, links (referrers), ecommerce tracking (on site only) • Audience Measurement – via Ad Exchanges, online panels, demographics, psychographics, and geo-demographics • Census and Governmental data • Financial Data – Wall Street • Market Research – Traditional – Forums, Polls, opinion analysis based on sampling (e.g., political polls) • Market Research – New – Big Data – trying to find hidden patterns (e.g., people who fix their roofs have fewer car accidents and get cheaper car insurance, stuff like that)
    14. 14. For ITP – Suggested Taxonomy • Author / Artist / Subject / Activity • Place (location) • Time (timeframe) • Type of Action / Behavior • Persona (this will have to be defined) • Medium (i.e.: GSM/Mobile, Projection, etc) • Subject / Area • Purpose (recreational, exploratory, consciousness raising, etc) • Etc, etc, etc (these need to be further defined)
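A minimal sketch, assuming Python, of how the suggested taxonomy could be attached to unstructured items so they can be grouped along any one dimension. The class name `TaggedItem`, the helper `group_by`, and the three sample records are all hypothetical, made up for illustration; the field names simply mirror the taxonomy bullets on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaggedItem:
    """One unstructured item plus the suggested taxonomy fields (hypothetical schema)."""
    author: str
    place: str
    time: str      # free-form timeframe
    action: str    # type of action / behavior
    persona: str   # still needs a real definition, as the slide notes
    medium: str
    subject: str
    purpose: str
    text: str      # the original unstructured verbatim


def group_by(items: List[TaggedItem], dimension: str) -> Dict[str, List[TaggedItem]]:
    """Group items by one taxonomy field, e.g. 'medium' or 'place'."""
    groups: Dict[str, List[TaggedItem]] = defaultdict(list)
    for item in items:
        groups[getattr(item, dimension)].append(item)
    return dict(groups)


# Made-up sample records, for illustration only.
items = [
    TaggedItem("camper_a", "ITP floor", "2012-06-22", "demo", "maker",
               "Projection", "light", "exploratory", "Testing the projection piece tonight"),
    TaggedItem("camper_b", "ITP floor", "2012-06-22", "play", "gamer",
               "GSM/Mobile", "games", "recreational", "The SMS game is live"),
    TaggedItem("camper_c", "Washington Square Park", "2012-06-23", "demo", "maker",
               "Projection", "sound", "consciousness raising", "Projecting on the arch"),
]

by_medium = group_by(items, "medium")
for medium, group in by_medium.items():
    print(medium, [item.author for item in group])
```

Once every item carries the same fields, the same records can be regrouped by place, time or purpose without touching the raw text.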
    15. 15. Two-Tiered Segmentation. Merging customer and visit type segmentation creates a two-tiered segmentation framework that becomes the core of our data model. Visit Types: Product Directed, Category Directed, Early Stage Research. Visitor Type: Discount Shopper, High-Ticket Buyer, Prestige Giver, SMB Shopper, Brand Loyalist. Visitor Record.
    16. 16. Medium is the Message - McLuhan • How you're measuring and viewing affects what you see and what you find. Many ITP experiments are directly impacted by the method/medium being used. In emerging media, especially, due to the “unstructured” aspect of it, tools shape the data (and insights). • What tools or platforms are you using? What other tools or platforms can you use (are at your disposal)? • What is your budget for the tools, platforms and people? • Are you in control of the measurement process yourself, or are you depending on others to execute it for you? • Do you have a framework to put all this data in? That’s pretty important. IMPLICATION: Choice of tool or platform profoundly shapes the results of your experiment or project
    17. 17. What’s the Use Case? Pick One (or add a new use case). Type: Behavioral
    18. 18. Use cases from a Tools/Platform perspective. Diagram labels: Consumer Research; PR Monitoring & Support; Social Campaigns; Listening for Insights; Listening and Engagement; Automating of the engagement; Social Media Coverage*; Traditional Media Coverage*; Workflow; NLP (machine learning); Influencer Identification; Operational Metrics; Rich Topic Categorization; Low-Latency Categorization. Care of Gary Angel – Semphonic.com
    19. 19. Not so much an issue for ITP, but many organizations end up buying the same data from multiple vendors (over and over) (something to avoid, if you can) Examples: Full Service Source: Semphonic.com
    20. 20. In some cases, a high level plan (similar to a 1 minute pitch) might help to add structure and meaning to what you're going to try to do (even here at ITP Camp or ITP, in general). Template: Goal(s); among Audience; Location; Timing; through/with Vehicle (how you're going to do it); Venues (where you're going to do it); ask fans and customers to Message (Call(s) to Action); regarding our Product / Service / Program; where success will be judged by Metrics/KPIs
    21. 21. Example of a Student’s Goal – Resurrecting George Enescu’s Work. Goal: Salvage the reputation of the Romanian 20th century composer, George Enescu. Audience: Among Classical music institutions, enthusiasts, and musicians alike. Location: Ideally GeorgeEnescu.com. Timing: A 6 month campaign period. Through/with Vehicle: Online videos, online networking, podcasts, musicological research. Venues: Personal blog, radio stations, YouTube, musicological conferences, etc. Ask fans and customers to (Message): Enescu’s art ought to be enjoyed and celebrated as the work of a deserving, 20th-century master. Regarding the Program: Program to promote the musicians and orchestras who wish to explore Enescu’s work. Success will be judged by Metrics/KPIs: Popularity on Google Trends; New business connections and partnerships; New visitors to website; YouTube statistics
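As a rough sketch (Python assumed), the plan template can be held as a small record so that blank parts of a plan are easy to spot before a project starts. The `Plan` class and the `missing_fields` helper are hypothetical names; the filled-in values are copied from the student example on the slide.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Plan:
    """High-level plan template; every part defaults to empty (hypothetical schema)."""
    goal: str = ""
    audience: str = ""
    location: str = ""
    timing: str = ""
    vehicle: str = ""
    venues: str = ""
    message: str = ""
    program: str = ""
    kpis: List[str] = field(default_factory=list)


def missing_fields(plan: Plan) -> List[str]:
    """Return the names of the template parts that are still blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


enescu = Plan(
    goal="Salvage the reputation of the Romanian 20th century composer, George Enescu",
    audience="Classical music institutions, enthusiasts, and musicians",
    location="Ideally GeorgeEnescu.com",
    timing="A 6 month campaign period",
    vehicle="Online videos, online networking, podcasts, musicological research",
    venues="Personal blog, radio stations, YouTube, musicological conferences",
    message="Enescu's art ought to be enjoyed and celebrated",
    program="Promote the musicians and orchestras who wish to explore Enescu's work",
    kpis=["Popularity on Google Trends", "New business connections and partnerships",
          "New visitors to website", "YouTube statistics"],
)

draft = Plan(goal="Try out a projection piece at ITP Camp")

print(missing_fields(enescu))
print(missing_fields(draft))
```

A draft with only a goal reports every other part as missing, which is the point of the template: the gaps are the questions still to be answered.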
    22. 22. And what is a Plan, Anyway?
    23. 23. We’re drinking from the social media fire hose. Massive data to process and make sense of. But … we don’t need to boil the ocean!
    24. 24. New Solutions Lie in … • Adding additional dimensions to the data (i.e.: time, place) • Adding Custom Taxonomies, Lexicons and data mashups helps, if done well and cleanly • Customizing the source data feeds • Customizing Data Extraction from Pages Crawled • Defining what your goals are • Defining what, when, where and how you're going to accomplish your goals • Defining your Key Performance Indicators that tell you if you hit or missed your goal targets
    25. 25. Internet Abundant with Predictive Signals
    26. 26. Beyond Listening: Reinventing Social Media Monitoring. If a status update reaches a social network but no one sees it, does it exist?
    27. 27. Are people using the wrong solutions to determine what people are saying?
    28. 28. @listening as a use case … Why Bother?? “The problem with social is that there is so much data - there are 40 or 50 data points that you can measure and you have to figure out whether they are important. Some of those measurements are fundamentally not important.”
    29. 29. Pain!! • Broad listening across the internet: Blogs, Forums, Press, News Sites, Social Networks, Trade Sites • Focused on keyword matches – mentions of brand name (“Starbucks”) and product names (“Frappuccino”) • Produces valuable insights, but is exploratory in nature; as a result, it cannot answer tactical questions and is not scalable. “I’m drowning in data and documents from the internet but I need actionable insights”
    30. 30. Problems we all face with Social: No Process; Success Undefined; 90% Unstructured; Time Consuming; Hard to Scale esp. at the beginning
    31. 31. The “Lens” approach using Boolean queries and saved datasets doesn’t seem to work very well
    32. 32. Monitoring has become too complex http://www.youtube.com/watch?v=4Y-SVxnVOv8 Radian6 Query on Foreclosures in Rhode Island: "housing solution"~2 AND "rhode island" AND "foreclosure", "road home program"~3 AND "foreclosure", "home loan modification"~4 AND "foreclosure", "jobless rate"~3 AND "foreclosure", "bankrupcy" AND "foreclosure" AND "housing" AND "obama", "rhode island housing"~3 AND "forclosure", "foreclosure prevention funds"~5, "bank foreclosures rhode island"~4 AND "obama", "selling house"~4 AND "foreclosure" AND "obama", "hardest hit fund"~4, "national foreclosure mitigation"~6, "homeowner stability initiative"~5 AND "obama", "roadhome program"~2, "hud homes rhode island"~3 AND "obama", "foreclosure settlement"~4 AND "25 billion"~2 AND "obama", "fannie mae freddie mac"~10 AND "foreclosure", "keeping people in their homes"~4
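To make the query syntax a little more concrete, here is a small Python sketch of how the first clause might be evaluated against a post. It assumes that `"phrase"~N` means every word of the phrase appears, in any order, inside a window of at most (number of phrase words + N) tokens. That reading is an assumption for illustration and may differ from how Radian6 actually evaluates the operator; `tokenize`, `near` and `matches_first_clause` are hypothetical helper names.

```python
import re
from itertools import product
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text: str, phrase: str, slop: int) -> bool:
    """True if all words of `phrase` occur within a window of len(words) + slop tokens."""
    tokens = tokenize(text)
    words = tokenize(phrase)
    # Positions at which each phrase word occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == word] for word in words]
    if any(not found for found in positions):
        return False
    max_span = len(words) + slop
    # Try every combination of one position per word (fine for short posts).
    for combo in product(*positions):
        distinct = len(set(combo)) == len(combo)
        if distinct and max(combo) - min(combo) + 1 <= max_span:
            return True
    return False


def matches_first_clause(post: str) -> bool:
    """Sketch of: "housing solution"~2 AND "rhode island" AND "foreclosure"."""
    lowered = post.lower()
    return (near(post, "housing solution", 2)
            and "rhode island" in lowered
            and "foreclosure" in lowered)


print(matches_first_clause(
    "Rhode Island needs a housing and jobs solution to slow foreclosure filings"))
print(matches_first_clause("Foreclosure rates fell in Rhode Island"))
```

Even this single clause needs three conditions to hold at once, which hints at why a query with eighteen such clauses becomes hard to maintain.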
    33. 33. And we don’t get our “Pie in the Sky”
    34. 34. RecordedFuture
    35. 35. Web is Loaded with Events • Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum • Drought and malnutrition hinder next year’s development plans in Yemen... • The carrier may select partners to set up a new carrier as early as next month • “2010 is the year when Iran will kick out Islam. Ya Ahura we will.” • “...opposition organizers plan to meet on Thursday to protest...” • “... Dr Sarkar says the new facility will be operational by March 2014...” • “Excited to see Mubarak speak this weekend...” • “According to TechCrunch China’s new 4G network will be deployed by mid-2010” • “Strange new Russian worm set to unleash botnet on 4/1/2012...”
    36. 36. Recorded Future Architecture: 70,000 Real-time Sources; 100,000 future events/day; 3+ Billion Time-tagged Facts
    37. 37. Mobile and Tablets - Next three years. Huge market segments still emerging • Over 75% of businesses plan on deploying tablets by 2013 • Revolutionizing health care delivery, on-site and mobile • Disrupting software engineering and user expectations
    38. 38. Actionable Intelligence gleaned from LBS instead of exploratory insights. Geo-Location Analytics from SMM VenueLabs – “They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!” New Insights Traditional Insights Location Date /Time Topic Sentiment Staff Working Managers Influence Local Context Unit Sales Engagement Nearby Competitors
    39. 39. VenueLabs solves Local Data Gap Example They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!
    40. 40. Verified the Local Data Gap. The text of the verbatim doesn’t help much, since we can’t tell where this was actually taking place without looking at the additional short URL and creating a context – which the software, today, usually isn’t able to do.
    41. 41. New York Art Instance - VenueLabs
    42. 42. Most active Museums?
    43. 43. Local Data Analytics of Museums – adding location automatically makes info more actionable (context) Facebook & Twitter
    44. 44. Smoking Cessation Phases. Patient Journey Stage 1 (Behavioral: Cold Turkey), Stage 2 (Behavioral: Other), Stage 3 (Over The Counter), Stage 4 (RX).
        • Side Effects of Smoking Cessation
          Stage 1: Craving comfort food, nicotine, fear of weight gain, wondering how friends and family will view the decision.
          Stage 2: Respondents are actively quitting and fear the physical and psychological side effects.
          Stage 3: Respondents in stage 3 are seeking to quit smoking by settling on a treatment option presenting the fewest side effects possible, or going cold turkey.
          Stage 4: smoking cessation is the main choice at stage 4 (based on our listening) but many respondents are having problems staying on the regime due to side effects.
        • Choosing the right medication for Smoking Cessation
          Stage 1: Respondents are looking for a way to stop smoking but are confused by the options and asking for advice.
          Stage 2: Patients are suffering the side effects associated with Nicorette, nicotine patches, smoking cessation or cold turkey.
          Stage 3: In Stage 3 Smoking Cessation decisions are complicated by product bans for e-cigarettes and smoking cessation in some communities and occupations.
          Stage 4: In Stage 4 smoking cessation side effects are the main issue respondents have, with men appearing to do better with the treatment than women. Negative press over side effects of smoking cessation is upsetting, making many rethink their decision to stop smoking.
        • Available Options for Smoking Cessation are Confusing
          Stage 1: In Stage 1 respondents are seeking guidance on all the available treatment options and making a decision on which one(s) to try.
          Stage 2: In Stage 2, the overwhelming choice of Smoking Cessation treatment option is Hypnosis, with the second most popular treatment being Nicorette and then smoking cessation.
          Stage 3: In Stage 3 use of Electronic Cigarettes, followed by Nicorette gum, is the most popular treatment according to our listening reports.
          Stage 4: In Stage 4 patients struggle with the side effects of smoking cessation treatment itself. Some patients complete treatment successfully but others do not and are dissatisfied with their progress.
        • Getting advice on the right treatment options for Smoking Cessation
          Stage 1: Online respondents are going on blogs, Twitter and forums looking for people who have experience taking drugs for Smoking Cessation so they can get information on the right approach to take.
          Stage 2: In Stage 2 many side effects associated with each treatment are evident and respondents are grappling with which choice to make - often going with hypnosis first.
          Stage 3: Patients in stage 3 have tried treatments and are sharing their experiences, struggles and successes with Smoking Cessation.
          Stage 4: In Stage 4 just about all the information on smoking cessation is negative, although that does not stop many patients from taking the drug, but many are stopping once they experience side effects.
    45. 45. Defining Key Words Patient Journey Stage 1 Patient Journey Stage 2 Patient Journey Stage 3 Patient Journey Stage 4 Behavioral: Cold Turkey Behavioral: Other Over The Counter RX"quit smoking" "counseling" AND "smoking" "snus" AND "smoking" "smoking cessation""smoking cessation" "cutting back" AND "smoking" "nicotrol“ (1 mention) "buproban" AND "-order""cold turkey" AND "smoking" "nicotine free-cigarettes" "e-cigarette" AND "-buy" ANDAND "quit on my own“ "-"buying“ "homeopathic remedies" ANDOR "quit smoking" AND "stop "stop smoking gum" smoking““jotharmcoe” "" nasal sprayAND "smoking“(1“smoking cessation” “hypnosis” AND “smoking” mention)“mytimetoquit.com” AND “-weed”“smoking cessation.com” "nicoderm"“let’s quit together now” "embarrassed to" AND“get quit clinic” "doctor" AND "smoking" "lozenge" AND "smoking““qut-quit.com” "quit line" AND "smoking" "patch" AND "smoking" "herbal remedies" AND "quit "inhaler AND "smoking" smoking" AND "stop smoking" "Nicolette“(strongest "support group" AND keyword) "smoking" "nicotine replacement therapy" AND "smoking"
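The keyword lists pair required terms with exclusions such as `-buy` and `-order`, and the speaker notes list stop words (discount, buy, online, "without a prescription", cheap). A minimal Python sketch of that include/exclude filtering is below. The helper names `contains_term` and `keep_post` and the sample posts are hypothetical, and the whole-word matching rule is an assumption rather than the behavior of any particular listening platform.

```python
import re
from typing import Iterable, List

# Stop words taken from the speaker notes; posts containing any of them are dropped.
STOP_TERMS: List[str] = ["discount", "buy", "online", "without a prescription", "cheap"]


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def keep_post(text: str, required: Iterable[str],
              excluded: Iterable[str] = STOP_TERMS) -> bool:
    """Keep a post only if it has every required term and none of the excluded ones."""
    has_all_required = all(contains_term(text, term) for term in required)
    has_any_excluded = any(contains_term(text, term) for term in excluded)
    return has_all_required and not has_any_excluded


# Made-up sample posts, for illustration only.
posts = [
    "Switched to an e-cigarette to help with smoking less",
    "Buy a cheap e-cigarette online and quit smoking today",
    "Thinking about hypnosis",
]

kept = [post for post in posts if keep_post(post, ["e-cigarette", "smoking"])]
print(kept)
```

The exclusion list is what separates patient conversation from sales spam; without it the second post would count as a Stage 3 mention.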
    46. 46. Location of Conversation for Each Stage of the Patient Journey. Stage 1 – Behavioral: Cold Turkey; Stage 2 – Behavioral: Other; Stage 3 – Over The Counter; Stage 4 – RX
    47. 47. Putting targeting into action. RI Primary = Create The Story (Be The Story)? ? Over 50 80,120 RI District 1. Members of Facebook over 49 years old in RI District 1 = 102,920 (caveat: there are a few zip-codes in both districts). 1. Hit potential voters in District 1 with issue targeted sponsored stories for AWARENESS only (expect little if any Clicks) 2. Blanket Zipcodes with mailing (post office now does this) 3. Use Venuelabs to sift checkin data and find voters – cross link to voting list when possible 4. Categorize (data mining – persuadable?) 5. Reach out / Community management – etc 6. Set up tracking (i.e.: Campalyst – next slide)
    48. 48. Connecting Engagement To Conversions campalyst.com
    49. 49. Google Social Reports cannot connect the dots to ROI (yet), though Campalyst can.
    50. 50. Google Social Reports cannot connect the dots to ROI (yet), though Campalyst can. Not enough information – Google cannot connect the dots back to the original post that generated the referral, but Campalyst does.
    51. 51. Campalyst ties cause and effect for Twitter and Facebook better than any other platform I’ve yet seen – Marshall Sponder – WebMetricsGuru.com
    52. 52. Campalyst can also find the brand advocates that generate the most traction and engagement for a brand or website.
    53. 53. Summary • The Future of Analytics is with Actionable Data • Actionable Data comes from adding contextual information and metadata in meaningful ways related to your business or organizational goals. • You need a Plan (the right one) to execute, together with the metrics, audience, timing, venue, program/vehicle and KPIs to succeed with Analytics of any kind.
    54. 54. Examples of Platforms (you can play with these later): Radian6 Basic, Sysomos Map, Brandwatch, Campalyst, Infinigraph, 6dgree, Netbase, PeekAnalytics, Traackr, mPact, Venuelabs
    55. 55. WebMetricsGuru.com Marshall Sponder WebMetricsGuru INC. www.webmetricsguru.com www.smabook.com now.seo@gmail.com @webmetricsguru @smanalyticsbook WebMetricsGuru INC.