Visualization for Data Analysis: A New Way to Look at Content
 

Webinar presented March 28, 2012 by Marjorie M.K. Hlava of Access Innovations as part of the Special Libraries Association's Virtual Lunch series.

  • Access Innovations and its software brand Data Harmony are known for high-caliber data: clean, well formed and accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. At the launch of the project, the application of the terms to the IEEE content was 90% accurate (that is, 90% of the terms suggested are what well-trained indexers would use from a controlled vocabulary) and 80% accurate on the more difficult proceedings data. Since then the rule base has improved, and the IEEE production team only needs to spot check about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the process more fun. Because the Data Harmony software removes many of the clerical aspects of the indexing process, the indexers have time to think about the content, the thesaurus terms, what should be added and what other information can be collected to continue to enrich the files. The accuracy is high enough that we simply indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings and Defense Technical Information Center thesauri, with similar accuracy results.

Presentation Transcript

  • March 28th Virtual Lunch Webinar: Visualization for Data Analysis: A New Way to Look at Content. Marjorie M.K. Hlava, President, Access Innovations, Inc., mhlava@accessinn.com, 505-998-0800
  • A picture is worth a thousand words. As librarians we normally look at data in lists, citations, and other text-based presentations. Increasingly, however, this data can be analyzed, manipulated and presented as visual displays. Maps of science, places and spaces, and increased amounts of storage and computing power have made working with digital assets possible. Presenting the data in new and visual ways allows us to see trends, changes in research directions, coverage, demographic trends, data overlap and the white spaces where data does not exist on a topic: knowledge gaps are exposed. This talk will cover how the data is prepared and options for visual display of content… 100 words x 10 = a thousand words
  • Why take a visual look?
    » As librarians we normally look at data in lists, citations, and other text-based presentations.
    » Increasingly, however, this data can be analyzed, manipulated and presented as visual displays.
    » Maps of science, places and spaces, and increased amounts of storage and computing power have made working with digital assets possible.
    » Presenting the data in new and visual ways allows us to see trends, changes in research directions, coverage, demographic trends, data overlap and the white spaces where data does not exist on a topic: knowledge gaps are exposed.
  • Visualization of data
    » Needs
      − Measurement
      − Metrics
      − Numbers
    » Is richer with
      − Linking
      − Semantic enrichment
      − Classification
    » Shows
      − Adjacency
      − Relationships
      − Trends
      − Co-occurrence
      − Conceptual distance
    » Supports
      − Forecasting
      − Trend analysis
      − Segmentation
      − Distribution
  • Man’s attention to visual display to convey knowledge is ancient. Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
  • The art in maps is a longstanding tradition
  • Superimposing data is now common: a mash-up example
    » Traffic Injury Map
    » Data sources: UK Data Archive, US National Highway Safety Administration
    » Base: Google Maps
    » Accident categories include children, automobile, bicycle, etc.
    » Data: time, place, type
    » Source: JISC TechWatch: Data Mash-ups, September 2010
  • The most popular APIs for mashups: a) July 2009, b) October 2009. Programmable web data. Source: JISC TechWatch: Data Mash-ups, September 2010
  • Crowdsourced visualization and mapping: the Credit Crunch Mood Map. Figure labels: Radio4 website, data source, MapTube, user website questionnaire, early responses, final Credit Crunch Mood Map. Source: JISC TechWatch: Data Mash-ups, September 2010
  • Mash-up of bird flight migrations and weather patterns: http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
  • http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
  • Noise Tube: the application uses geo-locations of SMS-like Twitter messages with GPS sensing on mobile devices. Programmable web data. Source: JISC TechWatch: Data Mash-ups, September 2010
  • Changes in our lifetime! It's only the beginning. Programmable web data. Source: JISC TechWatch: Data Mash-ups, September 2010
  • Fine, so there are nice visual maps. What about information from databases and libraries?
  • Start with data, like this XML file
  • Index or tag using subject terms from a thesaurus or taxonomy: date, category, taxonomy term, frequency
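The tagging step on this slide lists four fields: date, category, taxonomy term and frequency. A minimal sketch of how the frequency column could be derived from tagged records follows. The record layout and the sample terms are assumptions made for illustration; they are not taken from the presentation or from the Data Harmony software.

```python
from collections import Counter

# Hypothetical tagged records: one (year, category, taxonomy term) tuple
# per term assigned to a document during indexing.
records = [
    (2010, "journal", "Antennas"),
    (2010, "journal", "Antennas"),
    (2010, "proceedings", "Sensors"),
    (2011, "journal", "Antennas"),
    (2011, "journal", "Sensors"),
]


def term_frequencies(rows):
    """Count how often each (year, category, term) combination occurs."""
    return Counter(rows)


freq = term_frequencies(records)
for (year, category, term), count in sorted(freq.items()):
    print(year, category, term, count)
```

Each printed row carries the four fields named on the slide, which is the kind of table a visualization program can take as input.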
  • Many views of one set of data
  • Load to a visualization program like Prefuse
  • Or Pajek
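Loading data into Pajek means writing it in Pajek's plain-text network format: a `*Vertices` section with numbered, quoted labels, followed by an `*Edges` section with vertex-number pairs and an optional weight. The sketch below writes term pairs in that layout. The function name `to_pajek` and the sample pairs are hypothetical, and this is one possible way to prepare such a file, not the workflow used in the talk.

```python
def to_pajek(edge_weights):
    """Render {(term_a, term_b): weight} as text in Pajek .net format."""
    # Collect every term that appears in any pair and number it from 1.
    labels = sorted({term for pair in edge_weights for term in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(edge_weights.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical co-occurrence weights between thesaurus terms.
edges = {("Antennas", "Radar"): 2, ("Radar", "Signal processing"): 1}
net_text = to_pajek(edges)
print(net_text)
```

Saving `net_text` to a file with a `.net` extension gives a network that Pajek can open.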
  • National Information Center for Educational Media
    Albuquerque’s own
      » Sandia developed VxInsight
      » Access Innovations NICEM
    Same data, three views
    Primary and secondary education in the US
    Shows the US Valley of Science
    Little science taught in elementary years
  • Requirements for Visualization
    From a society / publisher perspective
      » Which topical areas form our core? periphery?
      » Where is the coverage dense? thin?
      » Which topical areas are most active? least active?
      » Which topical areas seem to be emerging? declining?
      » Which topical areas are interrelated? isolated?
      » What are the overlaps between journals / segments?
      » Where are the potential expansion points?
    From a thesaurus perspective
      » What terms are too broadly defined?
      » How do actual topical relationships differ from the thesaurus structure?
  • Using visualization to show
    From a society / publisher perspective
      » Identifies core, boundary and cross border
      » Provides indicators
        − Activity
        − Growth
        − Relatedness
        − Centrality
      » Locates journal domains
    From a thesaurus perspective
      » Identifies terms that are too broadly defined
      » Potential improvements in thesaurus structure using topic structures
  • Case Study: Mapping IEEE thesaurus space
    We are interested in an expanded map that includes adjacencies to the IEEE data
      » Expanded term set shows adjacent white space; opportunities for expansion
    Overlaps and edges of the science
      » We need comparison data
    Learn the directions in the field
      » Low occurrence rate in IEEE documents?
      » Linkage to terms in IEEE documents?
    Where do we find these terms? How can we add them?
  • The process
    Built a rule base to auto index IEEE content
      » “90% accuracy out of the box on journal data”*
      » “80% out of the box on proceedings data”*
    The overlapping data sets
      » Auto indexed 1.2 million Xplore records
      » 10 years of US Patent data
      » 10 years of Medline
    Term sets used
      » IEEE thesaurus terms rule base
      » Medical Subject Headings (MeSH) (and simple rule base)
      » Defense Technical Information Center (DTIC) Thesaurus (and simple rule base)
      » Similar level of detail to current IEEE thesaurus terms
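The accuracy figures quoted here are described elsewhere on this page as the share of suggested terms that well-trained indexers would also use. A minimal sketch of that measure follows, assuming it is computed per document as accepted suggestions divided by all suggestions. The function name and the sample terms are hypothetical, and the actual evaluation procedure is not given in the presentation.

```python
def suggestion_accuracy(suggested, accepted):
    """Fraction of auto-suggested terms that a human indexer would also use."""
    if not suggested:
        return 0.0
    accepted_set = set(accepted)
    hits = sum(1 for term in suggested if term in accepted_set)
    return hits / len(suggested)


# Hypothetical example: ten suggested terms, nine confirmed by an indexer.
suggested_terms = [f"term{i}" for i in range(10)]
indexer_terms = [f"term{i}" for i in range(9)] + ["another term"]
accuracy = suggestion_accuracy(suggested_terms, indexer_terms)
print(f"{accuracy:.0%}")
```

Under this reading the measure is a precision score: it says how many suggestions were right, not how many indexer-chosen terms the rule base missed.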
  • Defining expanded term space. 1. The data: select related corpus. Figure labels: IEEE (2k terms, 1.2M documents), DTIC (14k terms), MeSH (24k terms), patents (475k), PubMed (525k docs)
  • Defining expanded term space. 2. Identify related terms: use the IEEE Thesaurus to index the three collections (IEEE: 2k terms, 1.2M documents)
  • Defining expanded term space. 2. Identify related terms: use MeSH and DTIC to also index the three collections (IEEE: 2k terms, 1.2M documents)
  • Defining expanded term space. 3. Resulting term set: the co-indexed items from the three collections (IEEE: 2k terms, 1.2M documents)
  • Defining expanded term space. 4. Term:Term matrix: where do the articles and their indexing intersect?
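The term:term matrix in step 4 can be read as a co-occurrence count: for every pair of terms, how many documents were indexed with both. A minimal sketch follows, assuming each document is represented by the set of terms assigned to it. The document IDs and terms are invented for illustration, and the matrix is stored sparsely as pair counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical indexing results: document ID -> set of assigned terms,
# possibly drawn from more than one thesaurus.
docs = {
    "doc1": {"Antennas", "Radar", "Signal processing"},
    "doc2": {"Antennas", "Radar"},
    "doc3": {"Biomedical imaging", "Signal processing"},
}


def cooccurrence(indexed_docs):
    """Count, for each alphabetically ordered term pair, the documents carrying both."""
    counts = Counter()
    for terms in indexed_docs.values():
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


matrix = cooccurrence(docs)
for (a, b), n in sorted(matrix.items()):
    print(f"{a} x {b}: {n}")
```

Pairs that never appear together have a count of zero, which is where the white space between topics would show up on a map.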
  • Visualization Strategies. Figure labels: Matrix, Visualization Software
  • All data up-posted to the top level
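Up-posting to the top level can be sketched as rolling each term's count up to its top-level ancestor in the thesaurus hierarchy. The sketch below assumes each term has at most one broader term; the hierarchy and the counts are hypothetical and are not taken from the IEEE thesaurus.

```python
from collections import Counter

# Hypothetical narrower -> broader links; top-level terms have no entry.
broader = {
    "Phased arrays": "Antennas",
    "Antennas": "Electromagnetics",
    "Lidar": "Sensors",
}


def top_term(term, parents):
    """Follow broader-term links until a term with no parent is reached."""
    seen = set()
    while term in parents and term not in seen:  # 'seen' guards against cycles
        seen.add(term)
        term = parents[term]
    return term


def up_post(term_counts, parents):
    """Credit every term's count to its top-level ancestor."""
    rolled = Counter()
    for term, count in term_counts.items():
        rolled[top_term(term, parents)] += count
    return rolled


counts = {"Phased arrays": 3, "Antennas": 5, "Lidar": 2, "Sensors": 1}
rolled = up_post(counts, broader)
print(dict(rolled))
```

A thesaurus that allows several broader terms per entry would need a rule for splitting or duplicating counts, which this sketch leaves out.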
  • Many map options: previous experience, IEEE experience
  • IEEE Portfolio [Figure: map of IEEE societies and councils]
  • Radial Visualization
  • Subsidiary radials [Figure: radial maps; labels include Journal of Instrumentation and IEEE societies and councils]
  • The research team
    Access Innovations / Data Harmony
      » Founded in 1978
      » Data enrichment and normalization
      » Suite of semantic enrichment tools
    SciTechStrategies
      » Understanding data through visualization
    IEEE Indexing & Abstracting Group
  • Use a Thesaurus to Label Maps [Figure: map labeled with thesaurus terms, e.g. Semiconductors, Optics, Lasers, Telecom, Pharma, Textiles]
  • Questions Answered
    » Is there a way, using our own information, to forecast our direction?
    » Where is the industry headed? What about by technology sector?
    » Does our coverage match our mission and vision?
    » Can we become smarter about our data and potential markets using our collection in new ways?
    » Are the societies publishing and talking about what their charter indicates they cover?
    » What are the trends: are topics emerging or cooling?
    » Can we use technology and our own data to explore these questions while enhancing our data?
  • Conference Strategy
  • Publication Strategy (JASIST reference)
  • We looked at visualization of data
    Finding the metrics: how to enrich with
      » Measurement
      » Linking
      » Numbers
      » Semantic enrichment
      » Terms as indicators
      » Classification
    Ways to show: maps supporting
      » Adjacency
      » Relationships
      » Forecasting
      » Trends
      » Trend analysis
      » Co-occurrence
      » Segmentation
      » Conceptual distance
      » Distribution
  • Effective maps require
    » Contextual data
    » Detailed data
    » Classification methods
    » At least two directions in the matrix
    » A little art for fun
  • Changing the way we interact with reality: Acrossair’s augmented reality application, just point your phone at it. Programmable web data. Source: JISC TechWatch: Data Mash-ups, September 2010
  • It just takes a little imagination. Thank you. Marjorie M.K. Hlava, President, Access Innovations, 505-998-0800, mhlava@accessinn.com