Discover New Value from Unstructured Data


Presented at Semantic Garage Meetup San Francisco 2011. Unstructured data comes at a high cost: $37,000 per year per person in information industries. By using tools to add metadata automatically, enterprises can improve search results, speed up e-discovery and risk assessment, summarize content, and extract entities from files. Unstructured and semi-structured data represent a large component of big data. By turning unstructured content into business intelligence, enterprises can shorten time to information.

Published in: Technology, Education
  • Approximately one third of the time spent at work goes to tasks that can be automated. The time devoted to searching for and organizing documents alone costs an average company $37,000 per person per year.
  • Pingar can’t help you write emails or create presentations, but it can recover much of the time and cost spent on other tasks related to unstructured documents.
  • Examples:
      • Creating a search report without visiting each link in the search results
      • Analyzing thousands of emails released by Enron and detecting interesting patterns in them
      • A SharePoint web part that helps users formulate their queries and understand search results
      • Helping to edit documents, e.g. removing all sensitive information from them (redaction, sanitization)
      • Gathering information by summarizing documents
  • Metadata is information about documents that can be used for filtering search results. People avoid creating metadata: having already spent time creating documents, they don’t want to spend more time organizing them. Pingar generates metadata on the fly by automatically extracting keywords and identifying the names of places, companies, and people.

    1. Pingar SharePoint NZ Idol For Wave to incorporate into Peter’s presentation
    2. Time spent on information tasks. [Chart, avg. hours per week: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1; annotation: "= 37K year/person".] Source: IDC, Hidden Cost of Information (2005)
    3. Time spent on information tasks …can be rescued! [Chart, avg. hours per week: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1.] Source: IDC, Hidden Cost of Information (2005)
    4. Redaction example is from
    5. New Pingar API: Rapid Discovery; Related searches; Dynamic facets; Document preview. HCIR Workshop, 20 October 2011, Google, Mountain View
    6. New Pingar API: Entity Extraction; Named entity extraction; Taxonomy mapping; Linked Data connectors; Address detection; Invoice analysis. Mining Custom Taxonomies, Sept 2010 – Feb 2012, NZ Ministry of Science and Innovation, University of Waikato & Pingar
    7. New Pingar API: Content Analysis; query; Sanitization and redaction; Offensive content filtering; Summarization; Link to download; Report generation (an auto-generated PDF report). Exploring verticals: Legal, Bioscience, Education, Government
    8. Demo time