Extractiv

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

EXTRACTIV
The long-awaited beta!

EXTRACTIV
What it is: A semantic service that transforms unstructured web content into structured semantic data.
What it does:
  • Crawls millions of pages
  • Applies NLP tools to perform entity extraction
  • Produces marked-up files for you to use
Who it’s for:
  • Users who need high-volume text extraction
  • Users who need more types of entities
  • Users who want to go above OpenCalais limits
  • Users who don’t want to pay a ton

EXTRACTIV
Demo time

How it works
Create Extractiv Job
  • Automatically builds extraction app
  • Packages entities into data model blob
Pass Job to Crawling Service
  • Checks duplicate links against link graph
  • Packages links and app into a work unit
Pass Work Unit to Grid Server
  • Packages work units into version
  • Identifies available nodes
  • Sends version to nodes
Pass Version to Nodes
  • Downloads content of link
  • Runs app, returns result
  • Sends results back
Node Completes Work Units
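
The flow above (job, crawling service, grid server, nodes) can be sketched roughly as follows. This is only a toy, in-memory illustration: every name here (`create_job`, `crawl`, `grid_dispatch`, `WorkUnit`, `Version`, the example URLs) is hypothetical, the "extraction app" is a simple keyword matcher rather than real NLP, and splitting work units round-robin into one version per node is an assumption the slides do not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical types; the slides do not describe Extractiv's real interfaces.
@dataclass
class WorkUnit:
    links: List[str]
    app: Callable[[str], List[str]]  # extraction app: page text -> entities

@dataclass
class Version:
    work_units: List[WorkUnit]

def create_job(entity_terms: List[str]) -> Callable[[str], List[str]]:
    """Build a toy extraction app from a list of entity terms (the 'data model')."""
    terms = {t.lower() for t in entity_terms}
    def app(text: str) -> List[str]:
        words = {w.strip(".,").lower() for w in text.split()}
        return sorted(words & terms)
    return app

def crawl(app, links: List[str], link_graph: Set[str], unit_size: int = 2) -> List[WorkUnit]:
    """Drop links already in the link graph, then package the rest into work units."""
    fresh = []
    for link in links:
        if link not in link_graph:
            link_graph.add(link)
            fresh.append(link)
    return [WorkUnit(fresh[i:i + unit_size], app) for i in range(0, len(fresh), unit_size)]

def run_on_node(version: Version, fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Node side: download each link's content, run the app, return the results."""
    return {link: unit.app(fetch(link)) for unit in version.work_units for link in unit.links}

def grid_dispatch(work_units: List[WorkUnit], nodes: List[str], fetch) -> Dict[str, List[str]]:
    """Grid side: build one version per available node (assumed round-robin) and collect results."""
    results: Dict[str, List[str]] = {}
    for i in range(len(nodes)):
        version = Version(work_units[i::len(nodes)])
        results.update(run_on_node(version, fetch))
    return results

# Usage with an in-memory stand-in for the web (no network access).
PAGES = {
    "http://example.com/a": "Austin hosts the Hadoop user group.",
    "http://example.com/b": "Extractiv crawls pages.",
}
app = create_job(["Austin", "Hadoop", "Extractiv"])
link_graph: Set[str] = set()
units = crawl(app, ["http://example.com/a", "http://example.com/b", "http://example.com/a"],
              link_graph, unit_size=1)
results = grid_dispatch(units, ["node-1", "node-2"], PAGES.get)
```

The duplicate link is filtered against the link graph before any work unit is built, so each page is fetched and processed only once regardless of how many nodes are available.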

Some fun metrics
  • Max theoretical processing speed (per user): 5 million documents per hour
  • Available node pool: 50,000+ completely heterogeneous PCs
  • Back-end architecture: 12 grid service servers, 8 crawling service servers
  • Number of available entities: 239 now, 1000+ in the next few months
  • Minimum time to create new entity: 2-3 hours
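
To put the throughput figure in perspective, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that a single user's job could be spread evenly over a pool of exactly 50,000 nodes; the slides only give the two headline numbers, and the variable names are made up.

```python
# Figures taken from the slide.
DOCS_PER_HOUR = 5_000_000   # max theoretical speed per user
NODE_POOL = 50_000          # lower bound of the "50,000+" node pool

# Assumption: the load is spread evenly over the whole pool.
docs_per_second = DOCS_PER_HOUR / 3600
docs_per_node_per_hour = DOCS_PER_HOUR / NODE_POOL
seconds_per_doc_per_node = 3600 / docs_per_node_per_hour

print(f"{docs_per_second:.0f} documents per second overall")
print(f"{docs_per_node_per_hour:.0f} documents per node per hour")
print(f"{seconds_per_doc_per_node:.0f} seconds per document on each node")
```

Under that assumption the aggregate rate is about 1,389 documents per second, yet each node only has to handle 100 documents per hour, or one every 36 seconds, which is a plausible budget for downloading a page and running extraction on an ordinary PC.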

Coming soon…
New features:
  • RDF
  • Facts
  • Relations
  • Entity linking
  • Triples
Pricing plans:
  • Monthly access + per-document pricing
  • Higher document limits
  • Advanced features
API:
  • Create jobs and retrieve results
  • Integrate directly with your applications

If you want to try it out
  • Go to http://www.extractiv.com
  • Follow us @extractiv