Data Strategy

Information to Collect, and What You Can Do With It. A seminar for APME/ONA Newstrain.

Published in: Business, Technology

  • Data Strategy

    1. 1. Your data strategy: What information to collect and what you can do with it. Anthony Moor, Deputy Managing Editor/Interactive, The Dallas Morning News [email_address]
    2. 2. Web 3.0: The data-driven Web (Source: Chicago Tribune)
        Feature          | Web 1.0 (1993-2003)                 | Web 2.0 (2003-2010)
        Audience         | Geeks                               | Everyone
        Unit of Content  | Page (article)                      | Data (mashup, widget)
        State            | Static (HTML)                       | Dynamic (XML, Ajax, RSS)
        Architecture     | Client/Server                       | Web services (syndication)
        Engagement       | Read (Britannica online)            | Write/Contribute (Wikipedia, Flickr)
        Ad Distribution  | A few large sites (DoubleClick)     | The entire Web (Google AdSense)
        Search Boost     | Domain name speculation (Netscape)  | SEO (Google)
        Web 3.0: The Semantic Web
        • Data powers the Web
        • Automated mashing up of personalized content
        • Intelligent-agent driven assembly and interactivity
    3. 3. Key features of a data-driven Web (Source: Chicago Tribune)
        • Data powers Web applications
            • Certain classes of data are becoming critical building blocks for Web 3.0 applications
            • Structured data records published to the Web in reusable and remotely queryable formats (widgets)
        • Leverages the Long Tail
            • Low-cost economics and broad reach enabled by the Internet
        • Becomes the geospatial Web (Geoweb)
            • Merging of geographical (location-based) information with the abstract information that currently dominates the Internet
        • Enables content remixing and repurposing (Think: mashups)
            • Increase benefits from collective adoption, not private restriction
        • Users add value
            • Users add their own data to that which we provide
    4. 4. Data centers provide utility
    5. 5. Databases are hard to build
        • Databases have three parts, built by different tech experts
            • The data warehouse, where your cleaned/converted data sits
            • The Web interface, carefully designed to be intuitive to your users
            • The production tool, so producers can amend the data
        • So consider them for projects that have a long shelf life
        • Because databases persist – the data gets old
            • So someone needs to manage and update the database
    6. 6. Acquire and build databases
    7. 7. Interactive school guides
    8. 8. Create image maps about your data
    9. 9. Property appraisal, home sale data
    10. 10. Police reports and crime statistics
    11. 11. Voter and election guides Powered by TheVoterGuide.org
    12. 12. Public employee salary databases
    13. 13. Mashups
        • You can do your own map mashup at Atlas
        • Follow mashup development at Programmable Web
    14. 14. Carefully consider what to database or map
        • Consider the time investment
        • Ask: What job do you want to get done for your user?
        • Crudely sketch out your exact idea on a piece of paper, then …
        • … walk through exactly the clicks a user will take to get your stuff
        • Maps
            • Poor: DISD bond issue map
            • Good: Home sales map
        • GuideLive.com
            • Listings
    15. 15. Data structure powers mashups
    16. 16. “Of course our stories don’t mash up very well. They aren’t data!” Yes they are. And we ignore that fact at our peril.
    17. 17. Our key competitive differentiator is the data we gather every single day
        • Our articles
        • Images
        • Ads
        • Classifieds
        • Listings
        • Video
        • User Content
        • Archives
        • Databases
        • Blogs
    18. 18. But it’s locked, hidden and unorganized
        • Names/Places: Coppell; Grapevine Mills Mall; Coppell High; OSU; Travis Masters; Emily Coker; Sarah Sanders
        • Dates/Facts: Audi slid under 18-wheeler; Friday morning
        • Concepts: Suicide; National Merit Scholarship; Motocross; Untimely deaths
    19. 19. So how do we get it unlocked and organized? We need some data about this data
    20. 20. So how do we get it unlocked and organized? We need some data about this data: metadata
    21. 21. Metadata tells us what an article is about
    22. 22. 1) So we first extract key entities and concepts
        • Names/Places: Coppell; Grapevine Mills Mall; Coppell High; OSU; Travis Masters; Emily Coker; Sarah Sanders
        • Dates/Facts: Audi slid under 18-wheeler; Friday morning
        • Concepts: Suicide; National Merit Scholarship; Motocross; Untimely deaths
    23. 23. 2) Then filter them for relevance
        • Names/Places: Coppell; Grapevine Mills Mall; Coppell High; OSU; Travis Masters; Emily Coker; Sarah Sanders
        • Dates/Facts: Audi slid under 18-wheeler; Friday morning
        • Concepts: Suicide; National Merit Scholarship; Motocross; Untimely deaths
    24. 24. 3) And finally relate them to standard categories
        Extracted terms:
        • Names/Places: Coppell; Grapevine Mills Mall; Coppell High; OSU; Travis Masters; Emily Coker; Sarah Sanders
        • Dates/Facts: Audi slid under 18-wheeler; Friday morning
        • Concepts: Suicide; National Merit Scholarship; Motocross; Untimely deaths
        Related to standard categories:
        • Names/Places: Towns > Coppell; Location > Grapevine Mills; High Schools > Coppell; OSU; People > Travis Masters; People > Emily Coker; Sarah Sanders
        • Dates/Facts: Accidents > Auto/Truck; April 25, 2008
        • Concepts: Suicide; National Merit Scholarship; Motocross; Teens > Deaths
    25. 25. The standard categories are…
        • Names/Places: Towns > Coppell; Location > Grapevine Mills; High Schools > Coppell; OSU; People > Travis Masters; People > Emily Coker; Sarah Sanders
        • Dates/Facts: Accidents > Auto/Truck; April 25, 2008
        • Concepts: Suicide; National Merit Scholarship; Motocross; Teens > Deaths
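The three steps on slides 22 to 24 (extract, filter, relate) can be sketched in a few lines of Python. This is only an illustration, not the tooling the deck describes: the gazetteer lookup stands in for real entity extraction, the mention-count threshold stands in for a real relevance filter, and the sample text, `GAZETTEER`, and all function names are assumptions made up for the example. The category paths are the ones shown on the slides.

```python
# Hypothetical gazetteer: known term -> standard category path (paths taken from the slides).
GAZETTEER = {
    "Coppell": "Towns > Coppell",
    "Grapevine Mills": "Location > Grapevine Mills",
    "Coppell High": "High Schools > Coppell",
}

# Made-up sample text, used only to exercise the functions.
SAMPLE_TEXT = (
    "Coppell residents met Friday. "
    "The Coppell council discussed traffic near Grapevine Mills."
)


def extract_entities(text, gazetteer):
    """Step 1: return every gazetteer term that appears in the text."""
    return [term for term in gazetteer if term in text]


def filter_relevant(text, entities, min_mentions):
    """Step 2: keep only entities mentioned at least min_mentions times."""
    return [entity for entity in entities if text.count(entity) >= min_mentions]


def categorize(entities, gazetteer):
    """Step 3: relate each surviving entity to its standard category."""
    return [gazetteer[entity] for entity in entities]


def tag_article(text, min_mentions=1, gazetteer=GAZETTEER):
    """Run extract -> filter -> relate and return the category paths."""
    entities = extract_entities(text, gazetteer)
    relevant = filter_relevant(text, entities, min_mentions)
    return categorize(relevant, gazetteer)


if __name__ == "__main__":
    print(tag_article(SAMPLE_TEXT, min_mentions=2))
```

Raising `min_mentions` drops terms that appear only in passing, which is one crude way to approximate the "filter for relevance" step.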
    26. 26. tax·on·o·my
        • Pronunciation: tak-sä-nə-mē
        • Function: noun
        • Etymology: French taxonomie, from tax- + -nomie -nomy
        • Date: circa 1828
        • 1: the organizational structure of categories and attributes that define how you classify, describe and manage your data
    27. 27. Taxonomy is the card catalog of our content
        • A set of index terms that we manage and apply to each piece of content
        • Terms are hierarchical: Large categories split into specific sub-categories
        • Terms are cross-referenced, so if you look for “bucket,” you also get “pail.”
        A taxonomy organizes, classifies and relates our content
        Structuring our content just like data
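As a rough illustration of the two properties named on this slide, hierarchy and cross-referencing, here is a minimal Python sketch. The data structures and the names `PARENT`, `SEE_ALSO`, `path`, and `expand` are hypothetical; the cuisine chain and the bucket/pail pair come from the slides.

```python
# Hypothetical hierarchy: each term maps to its parent (None marks a top-level category).
PARENT = {
    "Cuisine": None,
    "Asian": "Cuisine",
    "Chinese": "Asian",
    "Szechwan": "Chinese",
}

# Hypothetical cross-references: looking up one term also yields its related terms.
SEE_ALSO = {
    "bucket": {"pail"},
    "pail": {"bucket"},
}


def path(term):
    """Walk up the hierarchy and return the full category path for a term."""
    parts = []
    while term is not None:
        parts.append(term)
        term = PARENT[term]
    return " > ".join(reversed(parts))


def expand(term):
    """Return the term plus every cross-referenced term, sorted for stable output."""
    return sorted({term} | SEE_ALSO.get(term, set()))


if __name__ == "__main__":
    print(path("Szechwan"))
    print(expand("bucket"))
```

A search box could call something like `expand` before querying, so that a reader who types one term also gets content indexed under its cross-references.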
    28. 28. So why do I need it?
        • Faceted navigation: I can click in level by level to find something (Cuisine > Asian > Chinese > Szechwan)
    29. 29. So why do I need it?
        • Faceted navigation: I can click in level by level to find something (Cuisine > Asian > Chinese > Szechwan)
        • Much better site search by enabling search boxes that can restrict a search to terms of a particular type or context
    30. 30. So why do I need it?
        • Faceted navigation: I can click in level by level to find something (Cuisine > Asian > Chinese > Szechwan)
        • Much better search by enabling search boxes that can restrict a search to terms of a particular type or context
        • Related information that I may not have known about (articles, photo galleries, other listings)
    31. 31. So why do I need it?
        • Faceted navigation: I can click in level by level to find something (Cuisine > Asian > Chinese > Szechwan)
        • Much better search by enabling search boxes that can restrict a search to terms of a particular type or context
        • Related information that I may not have known about (articles, photo galleries, other listings)
        • Multiple attributes for listings (Parking, Ambience)
    32. 32. So why do I need it? (Source: Chicago Tribune)
        • Faceted navigation: I can click in level by level to find something (Cuisine > Asian > Chinese > Szechwan)
        • Much better search by enabling search boxes that can restrict a search to terms of a particular type or context
        • Related information that I may not have known about (articles, photo galleries, other listings)
        • Multiple attributes for listings (BYOB, Outdoor Dining)
        • Higher search ranking (SEO) on search engines for topic subjects, listings, classified categories
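Faceted navigation and attribute filtering, as listed above, might look like this over a small set of listings. This is a sketch under assumptions: the listings, their field names, and the `drill_down` function are invented for illustration; only the cuisine path and the BYOB and Outdoor Dining attributes come from the slides.

```python
# Made-up restaurant listings; each carries a category path and a set of attributes.
LISTINGS = [
    {"name": "Restaurant A", "cuisine": ("Asian", "Chinese", "Szechwan"), "attributes": {"BYOB"}},
    {"name": "Restaurant B", "cuisine": ("Asian", "Chinese", "Cantonese"), "attributes": {"Outdoor Dining"}},
    {"name": "Restaurant C", "cuisine": ("Italian",), "attributes": {"BYOB", "Outdoor Dining"}},
]


def drill_down(listings, cuisine_path=(), required_attributes=frozenset()):
    """Return names of listings under cuisine_path that have every required attribute."""
    wanted_path = tuple(cuisine_path)
    wanted_attributes = set(required_attributes)
    matches = []
    for listing in listings:
        # A listing is "under" a path when its own path starts with that path.
        in_category = listing["cuisine"][:len(wanted_path)] == wanted_path
        has_attributes = wanted_attributes <= listing["attributes"]
        if in_category and has_attributes:
            matches.append(listing["name"])
    return matches


if __name__ == "__main__":
    print(drill_down(LISTINGS, ("Asian", "Chinese")))
    print(drill_down(LISTINGS, required_attributes={"BYOB"}))
```

Each click in a faceted interface corresponds to extending `cuisine_path` by one level or adding one attribute, which only works because every listing was tagged against the same standard categories.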
    33. 33. ‘Hot Topic’-driven content pages provide new opportunities for keyword-targeted advertising, boost SEO rankings, increase site traffic and drive more engagement
    34. 35. Embedded links increase page views and boost SEO, which generates more site traffic
    35. 37. Geographic terms can power data mapping
    36. 38. … or create customized alerts where you define the geography where you want notifications to come from
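One simple way to picture geography-based alerts is a reader-defined bounding box checked against geotagged stories. The sketch below assumes each story already carries a latitude and longitude from its metadata; the `Area` class, the `alerts_for` function, and the coordinates are illustrative assumptions, not a description of any product named in the deck.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A reader-defined rectangle, in decimal degrees (hypothetical structure)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        """True when the point falls inside the rectangle, edges included."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def alerts_for(area, stories):
    """Return headlines of the geotagged stories that fall inside the reader's area."""
    return [story["headline"] for story in stories if area.contains(story["lat"], story["lon"])]


# Made-up stories with illustrative coordinates.
STORIES = [
    {"headline": "Story inside the area", "lat": 32.95, "lon": -97.00},
    {"headline": "Story outside the area", "lat": 32.78, "lon": -96.80},
]

if __name__ == "__main__":
    my_area = Area(south=32.90, west=-97.10, north=33.00, east=-96.90)
    print(alerts_for(my_area, STORIES))
```

A rectangle is the crudest possible geography; a real system would more likely match on neighborhoods, ZIP codes, or polygons, but the dependency is the same: the story needs a geographic term attached before any alert can fire.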
    37. 39. Whatever you call it… it’s about describing and classifying our content
        • Taxonomy – our standard, hierarchical categories
        • Metadata – the keywords describing a piece of content
        • Structured data – Information that’s been organized as above
    38. 40. How do I get structured data?
        • You can do it by hand
            • Librarians and Web producers are doing it every day
            • But they can only do so much
            • Should reporters be adding metadata to every story?
            • Should line editors and/or copy editors add metadata?
        • You can use technology
            • IPTC
            • AP Digital Exchange
            • Inform
            • Teragram
            • NStein
            • MetaCarta (geotagging)
            • Serra Media (geotagging)
            • Generate Inc. (business data)
    39. 41. Elements of a data strategy
    40. 42. Organizing for structured data
        • Do you need a taxonomy and data strategy?
        • Audit your newsroom datastream
        • Do you need a data coordinator?
        • Gannett’s Data Desks
            • Brought everyone who ‘does data’ together: Agate clerks, librarians, CAR staff
            • Responsibilities: Acquiring data, creating databases, programming them for interactivity, training for building spreadsheets
        • Do you need self-service ways the public can give you info?
            • Yes: Web forms
            • No: Faxes, phones, notes on paper, lists on someone’s computer
    41. 43. Reporting for structured data
        • Reporting with data in mind means you gather the same fact in the same way every time
            • Shoot every photo exactly the same way
            • Ask the same question of every interviewee
            • Find out all the same info from every venue
        • Save the data
        • Input the data alongside the story
        • Look for databases you can bring back with your story
        • Example: Bluegrass Instruments
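"Gather the same fact in the same way every time" amounts to agreeing on a fixed record shape before reporting starts. A minimal sketch, assuming a venue listing with four fields: the `VenueRecord` class, its field names, and the `missing_fields` helper are hypothetical, chosen only to show the idea.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class VenueRecord:
    """Hypothetical schema: the same four facts collected from every venue."""
    name: str
    address: str
    phone: str
    parking: str


def missing_fields(answers):
    """Return the schema fields that a reporter's notes omit or leave blank."""
    return [field.name for field in fields(VenueRecord) if not answers.get(field.name)]


if __name__ == "__main__":
    # Made-up notes from one venue visit; phone is blank and parking was never asked.
    notes = {"name": "Example Hall", "address": "123 Main St", "phone": ""}
    print(missing_fields(notes))

    # Once every field is filled in, the notes become a record that can be saved with the story.
    notes.update(phone="555-0100", parking="Street")
    record = VenueRecord(**notes)
    print(asdict(record))
```

Checking for gaps at input time is what lets the data be saved alongside the story and reused later, rather than reconstructed from the published text.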
    42. 44. Writing/editing for structured data
        • Editors add and apply keywords and standard categories
        • Bloggers (already?) tag and categorize blog posts
            • Should you have standard categories or go with the ‘folksonomy’?
    43. 45. Discussion http://www.slideshare.net/ajmoor