Extractiv
 

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

    Extractiv Presentation Transcript

    • E X T R A C T I V
      The long-awaited beta!
    • E X T R A C T I V
      What it is: A semantic service that transforms unstructured web content into structured semantic data.
      What it does:
      • Crawls millions of pages
      • Applies NLP tools to perform entity extraction
      • Produces marked-up files for you to use
      Who it’s for: users who
      • Need high-volume text extraction
      • Need more types of entities
      • Want to go above OpenCalais limits
      • Don’t want to pay a ton
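To make "entity extraction" and "marked-up files" concrete, here is a minimal sketch of the idea. It is not Extractiv's method or output format: the gazetteer, the `mark_up` function, and the `<entity>` tag are all hypothetical, and a real NLP pipeline would use statistical models rather than a fixed lookup list.

```python
import re

# Hypothetical gazetteer: entity type -> known names (illustrative only).
GAZETTEER = {
    "ORGANIZATION": ["Austin Hadoop User Group", "Extractiv"],
    "CITY": ["Austin"],
}


def mark_up(text: str) -> str:
    """Wrap each known entity mention in an XML-style tag (assumed format)."""
    # Longest names first, so "Austin Hadoop User Group" wins over "Austin".
    entries = sorted(
        ((name, etype) for etype, names in GAZETTEER.items() for name in names),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    type_of = dict(entries)
    pattern = re.compile("|".join(re.escape(name) for name, _ in entries))
    return pattern.sub(
        lambda m: f'<entity type="{type_of[m.group(0)]}">{m.group(0)}</entity>',
        text,
    )
```

Calling `mark_up` on a sentence returns the same sentence with each recognized name wrapped in a typed tag, which is the general shape of a "marked-up file".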
    • E X T R A C T I V
      Demo time
    • How it works
      Create Extractiv Job
      • Automatically builds extraction app
      • Packages entities into data model blob
      Pass Job to Crawling Service
      • Checks duplicate links against link graph
      • Packages links and app into a work unit
      Pass Work Unit to Grid Server
      • Packages work units into version
      • Identifies available nodes
      • Sends version to nodes
      Pass Version to Nodes
      • Downloads content of link
      • Runs app, returns result
      • Sends results back
      Node Completes Work Units
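The flow on this slide (job, crawling service, grid server, nodes) can be sketched as a small in-process simulation. Everything here is an assumption made for illustration: the class and function names are hypothetical, a plain set stands in for the link graph, a "version" is modeled as a list of work units, and nodes run sequentially rather than on remote PCs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class WorkUnit:
    url: str
    app: Callable[[str], List[str]]  # extraction app built for the job


def build_app(entities: List[str]) -> Callable[[str], List[str]]:
    """Create Extractiv Job: build a (toy) extraction app from an entity list."""
    def app(content: str) -> List[str]:
        return [e for e in entities if e in content]
    return app


@dataclass
class CrawlingService:
    seen: Set[str] = field(default_factory=set)  # stand-in for the link graph

    def make_work_units(self, links: List[str], app) -> List[WorkUnit]:
        """Skip links already seen, package the rest with the app."""
        units = []
        for url in links:
            if url in self.seen:
                continue
            self.seen.add(url)
            units.append(WorkUnit(url, app))
        return units


def run_node(version: List[WorkUnit], fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Node: download each link's content, run the app, return the results."""
    return {unit.url: unit.app(fetch(unit.url)) for unit in version}


def grid_dispatch(units: List[WorkUnit], node_count: int,
                  fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Grid server: split work units into one version per node, collect results."""
    versions = [units[i::node_count] for i in range(node_count)]
    results: Dict[str, List[str]] = {}
    for version in versions:
        results.update(run_node(version, fetch))
    return results
```

To try it, pass a list of URLs (including a duplicate) through `CrawlingService.make_work_units`, then hand the units to `grid_dispatch` with any `fetch` function, for example a dictionary lookup over canned page text.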
    • Some fun metrics
      Max theoretical processing speed (per user):
      5 million documents per hour
      Available node pool:
      50,000+ completely heterogeneous PCs
      Back-end architecture:
      12 grid service servers
      8 crawling service servers
      Number of available entities:
      239 now, 1000+ in the next few months
      Minimum time to create new entity:
      2-3 hours
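As a rough sanity check on the first two figures, the sketch below works out what the peak rate would mean per node. It assumes the whole node pool serves a single user and that load is spread evenly, neither of which the slide states; the function name is hypothetical.

```python
DOCS_PER_HOUR = 5_000_000  # max theoretical speed per user, from the slide
NODE_POOL = 50_000         # lower bound of the "50,000+" node pool


def per_node_budget(docs_per_hour: int, nodes: int):
    """Return (documents per node per hour, seconds available per document)."""
    docs_per_node = docs_per_hour / nodes
    seconds_per_doc = 3600 / docs_per_node
    return docs_per_node, seconds_per_doc


docs_per_node, seconds_per_doc = per_node_budget(DOCS_PER_HOUR, NODE_POOL)
print(docs_per_node, seconds_per_doc)  # 100.0 36.0
```

Under those assumptions each node would handle about 100 documents per hour, or one document every 36 seconds, including download and extraction.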
    • Coming soon…
      New features:
      RDF
      Facts
      Relations
      Entity linking
      Triples
      Pricing plans:
      Monthly access + per-document pricing
      Higher document limits
      Advanced features
      API:
      Create jobs and retrieve results
      Integrate directly with your applications
    • If you want to try it out
      Go to http://www.extractiv.com
      Follow us @extractiv