Mining Unstructured Data: Practical Applications, from the Strata O'Reilly Making Data Work Conference March 2012

Uploaded on Mar 08, 2012

Alyona Medelyan (Pingar), Anna Divoli (Pingar)

presented at Strata O'Reilly Making Data Work Conference on March 1, 2012

The challenge of unstructured data is a top priority for organizations looking for ways to search, sort, analyze and extract knowledge from the masses of documents they store and create daily. Text mining uses knowledge-driven algorithms to make sense of documents much as a person would by reading them. Text mining and analytics tools have recently become available via APIs, meaning that organizations can take immediate advantage of these tools. We discuss three examples of how such APIs were used to solve key business challenges.

Most organizations dream of a paperless office, but still generate and receive millions of printed documents. Digitizing these documents and sharing them intelligently is a universal enterprise challenge. Major scanning providers offer solutions that analyze scanned and OCR’d documents and then store the detected information in document management systems. This works well with pre-defined forms, but human interaction is required when scanning unstructured text. We describe a prototype built for the legal vertical that scans stacks of paper documents, categorizes them and generates meaningful metadata on the fly.

In the area of forensics, intelligence and security, manual monitoring of masses of unstructured data is not feasible. The ability to automatically identify people’s names, addresses, credit card and bank account numbers and other entities is key. We will briefly describe a case study of how a major international financial institution is taking advantage of text mining APIs in order to comply with a recent piece of legislation.
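
To make the idea of automatic entity identification concrete, here is a minimal sketch of one narrow case: spotting likely credit card numbers in free text. It is not the API discussed in the talk. The names `CARD_PATTERN`, `luhn_valid` and `find_card_numbers` are hypothetical, and the approach (a regular expression followed by the Luhn checksum) is an assumption chosen for illustration. Names and addresses would need real entity extraction rather than patterns like this.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens. This pattern is an illustrative assumption.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like valid card numbers."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111 was charged; ref 4111 1111 1111 1112."
    print(find_card_numbers(sample))
```

The checksum step filters out most digit runs that merely resemble card numbers, such as the second number in the sample, which fails the Luhn test.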

In healthcare, although Electronic Health Records (EHRs) have become increasingly available over the past two decades, patient confidentiality and privacy concerns have been obstacles to using the valuable information they contain to further medical research. Several approaches to assigning unique encrypted identifiers to patients’ IDs have been reported, but each comes with drawbacks. For a number of medical studies, consistent uniform ID mapping is not necessary, and automated text sanitization can serve as a solution. We will demonstrate how sanitization has practical use in a medical study.
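
As a rough illustration of what sanitization without ID mapping might look like, the sketch below replaces a few kinds of identifying strings with generic placeholders. It is an assumption-laden toy, not the method demonstrated in the talk: the `PATTERNS` table, the `sanitize` function and the record format (`MRN: 483920`, ISO dates, US-style phone numbers) are all hypothetical. A real system would rely on entity extraction to catch names and addresses.

```python
import re

# Illustrative patterns only (all hypothetical); each identifying string
# is replaced by a generic placeholder rather than a per-patient ID.
PATTERNS = [
    ("[DATE]", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("[RECORD_ID]", re.compile(r"\bMRN[:\s]*\d+\b")),
    ("[PHONE]", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def sanitize(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    note = "Patient MRN: 483920 seen on 2011-05-17, call 555-123-4567."
    print(sanitize(note))
```

Because every record number becomes the same `[RECORD_ID]` placeholder, sanitized notes cannot be linked back to a patient or to each other, which fits only those studies that do not need consistent ID mapping.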


Read a full interview with Alyona and Anna at http://radar.oreilly.com/2012/02/unstructured-data-analysis-tools.html

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Presentation Transcript