Visualization for Data Analysis: A New Way to Look at Content

Webinar presented March 28, 2012 by Marjorie M.K. Hlava of Access Innovations as part of the Special Libraries Association's Virtual Lunch series.

Published in: Education, Technology

  • Access Innovations and its software brand Data Harmony are known for the high caliber of their data: clean, well formed, and accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. At the launch of the project, the application of the terms to the IEEE content was 90% accurate (that is, 90% of the terms suggested are what well-trained indexers would choose from a controlled vocabulary) and 80% accurate on the more difficult proceedings data. Since then the rule base has improved, and the IEEE production team only needs to spot-check about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the process more fun. Because the Data Harmony software removes many of the clerical aspects of indexing, the indexers have time to think about the content, the thesaurus terms, what should be added, and what other information can be collected to continue enriching the files. The accuracy is high enough that we simply indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using Medical Subject Headings and the Defense Technical Information Center thesaurus, with similar accuracy results.
  • Visualization for Data Analysis: A New Way to Look at Content

    1. Marjorie M.K. Hlava, President, Access Innovations, Inc. mhlava@accessinn.com 505-998-0800. MARCH 28TH VIRTUAL LUNCH WEBINAR: VISUALIZATION FOR DATA ANALYSIS: A NEW WAY TO LOOK AT CONTENT
    2. A picture is worth … thousand words. As librarians, we normally look at data in lists, citations, and other text-based presentations. Increasingly, however, this data can be analyzed, manipulated, and presented as visual displays. Maps of science, places and spaces, and increased amounts of storage and computing power have made working with digital assets possible. Presenting the data in new and visual ways allows us to see trends, changes in research directions, coverage, demographic trends, data overlap, and the white spaces where data does not exist on a topic; knowledge gaps are exposed. This talk will cover how the data is prepared and options for visual display of content … 100 words x 10 = thousand words
    3. Why take a visual look? • As librarians, we normally look at data in lists, citations, and other text-based presentations. • Increasingly, however, this data can be analyzed, manipulated, and presented as visual displays. • Maps of science, places and spaces, and increased amounts of storage and computing power have made working with digital assets possible. • Presenting the data in new and visual ways allows us to see trends, changes in research directions, coverage, demographic trends, data overlap, and the white spaces where data does not exist on a topic; knowledge gaps are exposed.
    4. Visualization of data • Needs: Measurement, Metrics, Numbers • Is richer with: Linking, Semantic enrichment, Classification • Shows: Adjacency, Relationships, Trends, Co-occurrence, Conceptual distance • Supports: Forecasting, Trend analysis, Segmentation, Distribution
    5. Man’s attention to visual display to convey knowledge is ancient. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
    6. The art in maps is a longstanding tradition
    9. Superimposing data is now common. A mash-up example: Traffic Injury Map. UK Data Archive; US National Highway Safety Administration; Google Maps base. Accident categories include children, automobile, bicycle, etc. Data: time, place, type. Source: JISC TechWatch: Data Mash-ups, September 2010
    10. The most popular APIs for mashups: a) July 2009 b) October 2009. Source: JISC TechWatch: Data Mash-ups, September 2010. Programmable web data
    11. Crowdsourced visualization and mapping: Credit Crunch Mood Map. Diagram labels: Radio4 website, data source, MapTube, user, website questionnaire. Maps shown: early responses, final Credit Crunch Mood Map. Source: JISC TechWatch: Data Mash-ups, September 2010
    12. Mash-up of bird flight migrations and weather patterns: http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
    13. http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
    14. Noise Tube application uses geo-locations of SMS, like Twitter, with GPS sensing on mobile devices. Source: JISC TechWatch: Data Mash-ups, September 2010. Programmable web data
    15. Changes in our lifetime! It's only the beginning. Source: JISC TechWatch: Data Mash-ups, September 2010. Programmable web data
    16. Fine, so there are nice visual maps. What about information from databases and libraries?
    17. Start with data, like this XML file
    18. Index or tag using subject terms from a thesaurus or taxonomy: date, category, taxonomy term, frequency
    19. Many views of one set of data
    20. Load to a visualization program, like Prefuse
    21. Or Pajek
    26. National Information Center for Educational Media. Albuquerque’s own: » Sandia developed VxInsight » Access Innovations NICEM. Same data, three views. Primary and secondary education in US. Shows the US Valley of Science: little science taught in elementary years
    29. Requirements for Visualization. From a society / publisher perspective: » Which topical areas form our core? periphery? » Where is the coverage dense? thin? » Which topical areas are most active? least active? » Which topical areas seem to be emerging? declining? » Which topical areas are interrelated? isolated? » What are the overlaps between journals / segments? » Where are the potential expansion points? From a thesaurus perspective: » What terms are too broadly defined? » How do actual topical relationships differ from the thesaurus structure?
    30. Using visualization to show. From a society / publisher perspective: » Identify Core, Boundary and Cross Border » Provides Indicators: Activity, Growth, Relatedness, Centrality » Locates Journal domains. From a thesaurus perspective: » Identifies terms that are too broadly defined » Potential improvements in thesaurus structure using topic structures
    31. Case Study: Mapping IEEE thesaurus space. We are interested in an expanded map that includes adjacencies to the IEEE data » Expanded term set shows adjacent white space; opportunities for expansion. Overlaps and edges of the science » We need comparison data. Learn the directions in the field » Low occurrence rate in IEEE documents? » Linkage to terms in IEEE documents? Where do we find these terms? How can we add them?
    32. The process. Built a rule base to auto index IEEE content » “90% accuracy out of the box on journal data”* » “80% out of the box on proceedings data”*. The overlapping data sets » Auto indexed 1.2 million Xplore records » 10 years of US Patent data » 10 years of Medline. Term sets used » IEEE thesaurus terms rule base » Medical Subject Headings (MeSH) (and simple rule base) » Defense Technical Information Center (DTIC) Thesaurus (and simple rule base) » Similar level of detail to current IEEE thesaurus terms
    33. Defining expanded term space. 1. The data: select related corpus. Diagram labels: IEEE (2k terms, 1.2M documents); DTIC (14k); patents (475k); PubMed (525k docs); MeSH (24k)
    34. Defining expanded term space. 2. Identify related terms: use the IEEE Thesaurus to index the three collections (IEEE: 2k terms, 1.2M documents)
    35. Defining expanded term space. 2. Identify related terms: use MeSH and DTIC to also index the three collections (IEEE: 2k terms, 1.2M documents)
    36. Defining expanded term space. 3. Resulting term set: the co-indexed items from the three collections (IEEE: 2k terms, 1.2M documents)
    37. Defining expanded term space. 4. Term:Term Matrix: where do the articles and their indexing intersect?
    38. Visualization Strategies. Diagram labels: Matrix, Visualization Software
    39. All data up-posted to the top level
    40. Many map options: Previous Experience, IEEE Experience
    41. IEEE Portfolio [map of IEEE societies and councils]
    42. Radial Visualization
    43. Subsidiary radials [radial map of IEEE societies and councils]
    44. The research team. Access Innovations / Data Harmony » Founded in 1978 » Data enrichment and normalization » Suite of Semantic Enrichment tools. SciTechStrategies » Understanding data through visualization. IEEE Indexing & Abstracting Group
    45. Use a Thesaurus to Label Maps [map labeled with subject terms, e.g. Packaging, Semiconductors, Optics, Textiles]
    46. Questions Answered. Is there a way, using our own information, to forecast our direction? Where is the industry headed? What about by technology sector? Does our coverage match our mission and vision? Can we become smarter about our data and potential markets using our collection in new ways? Are the societies publishing and talking about what their charter indicates they cover? What are the trends: are topics emerging/cooling? Can we use technology and our own data to explore these questions while enhancing our data?
    47. Conference Strategy
    48. Publication Strategy. JASIST reference
    49. We looked at Visualization of data. Finding the Metrics » Measurement » Numbers » Terms as indicators. How to enrich with » Linking » Semantic enrichment » Classification. Ways to show » Adjacency » Relationships » Trends » Co-occurrence » Conceptual distance. Maps supporting » Forecasting » Trend analysis » Segmentation » Distribution
    50. Effective maps require: contextual data; detailed data; classification methods; at least two directions in the matrix; a little art for fun
    51. Changing the way we interact with reality. Acrossair’s augmented reality application: just point your phone at it. Source: JISC TechWatch: Data Mash-ups, September 2010. Programmable web data
    52. It just takes a little imagination. Thank you. Marjorie M.K. Hlava, President, Access Innovations, 505-998-0800, mhlava@accessinn.com
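
The tagging step on slide 18 (taxonomy term and frequency) and the term:term matrix on slide 37 can be illustrated with a small sketch. This is not the Data Harmony or SciTechStrategies implementation, which the slides do not describe; it is a minimal Python illustration that assumes each record has already been indexed with a set of thesaurus terms. The record IDs, the terms, and the function names are all hypothetical. It counts how often each term is used, how often each pair of terms is assigned to the same record, and which pairs never co-occur (the "white space" the talk refers to).

```python
from collections import Counter
from itertools import combinations

# Hypothetical indexed records: record id -> thesaurus terms assigned to it.
records = {
    "doc1": {"Antennas", "Radar", "Signal processing"},
    "doc2": {"Radar", "Signal processing"},
    "doc3": {"Antennas", "Medical imaging"},
    "doc4": {"Signal processing", "Medical imaging"},
}


def term_frequencies(indexed):
    """Count how many records each term was assigned to."""
    counts = Counter()
    for terms in indexed.values():
        counts.update(terms)
    return counts


def cooccurrence(indexed):
    """Count, for each pair of terms, the records indexed with both.

    Pairs are stored once, as (a, b) with a < b alphabetically, so the
    result is the upper triangle of a symmetric term:term matrix.
    """
    pairs = Counter()
    for terms in indexed.values():
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


def white_space(freq, pairs):
    """List term pairs that are both in use but never appear together."""
    return [p for p in combinations(sorted(freq), 2) if p not in pairs]


freq = term_frequencies(records)
matrix = cooccurrence(records)
gaps = white_space(freq, matrix)

for (a, b), n in sorted(matrix.items()):
    print(f"{a} x {b}: {n}")
print("Never co-occur:", gaps)
```

A pair count like this is one possible input for network tools such as Pajek or Prefuse, where terms become nodes and co-occurrence counts become edge weights; the export format would depend on the tool chosen.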
