Extractiv

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

Extractiv Presentation Transcript

  • E X T R A C T I V
    The long-awaited beta!
  • E X T R A C T I V
    What it is: A semantic service that transforms unstructured web content into structured semantic data.
    What it does:
    • Crawls millions of pages
    • Applies NLP tools to perform entity extraction
    • Produces marked-up files for you to use
    Who it’s for: anyone who
    • Needs high-volume text extraction
    • Needs more types of entities
    • Wants to go above OpenCalais limits
    • Doesn’t want to pay a ton
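To make the "What it does" bullets concrete, here is a toy sketch of entity extraction that produces marked-up text. It swaps in a plain dictionary lookup (a gazetteer) for real NLP tooling. The entity names, the `<entity>` tag format, and the `mark_up` function are illustrative assumptions, not Extractiv's actual output format or API.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
# These entries are placeholders, not Extractiv's real entity list.
GAZETTEER = {"Austin": "CITY", "Hadoop": "SOFTWARE"}


def mark_up(text, gazetteer=GAZETTEER):
    """Wrap each known term in an <entity> tag (assumed markup format)."""
    if not gazetteer:
        return text
    # Try longer terms first so a longer name wins over a shorter prefix.
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def tag(match):
        term = match.group(1)
        return f'<entity type="{gazetteer[term]}">{term}</entity>'

    return pattern.sub(tag, text)


if __name__ == "__main__":
    print(mark_up("Hadoop meetup in Austin"))
```

A production service would replace the dictionary lookup with trained NLP models. The point here is only the shape of the input (raw text) and the output (the same text with entities tagged).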
  • E X T R A C T I V
    Demo time
  • How it works
    Create Extractiv Job
    • Automatically builds extraction app
    • Packages entities into data model blob
    Pass Job to Crawling Service
    • Checks duplicate links against link graph
    • Packages links and app into a work unit
    Pass Work Unit to Grid Server
    • Packages work units into version
    • Identifies available nodes
    • Sends version to nodes
    Pass Version to Nodes
    • Downloads content of link
    • Runs app, returns result
    • Sends results back
    Node Completes Work Units
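The hand-offs on the "How it works" slide can be sketched as a small in-memory pipeline. This is a minimal illustration under stated assumptions, not Extractiv's implementation: the class and function names, the keyword-counting "app", the round-robin assignment of one version per node, and the injected `fetch` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

App = Callable[[str], Dict[str, int]]


@dataclass
class Job:
    entities: List[str]    # stands in for the "data model blob"
    seed_links: List[str]


@dataclass
class WorkUnit:
    links: List[str]
    app: App


@dataclass
class Version:
    work_units: List[WorkUnit] = field(default_factory=list)


def build_app(entities: List[str]) -> App:
    """Placeholder extraction app: counts exact-word occurrences of each term."""
    def app(content: str) -> Dict[str, int]:
        words = content.split()
        return {term: words.count(term) for term in entities}
    return app


def crawl_stage(job: Job, link_graph: Set[str], batch_size: int = 2) -> List[WorkUnit]:
    """Drop links already in the link graph, then batch the rest with the app."""
    app = build_app(job.entities)
    fresh = []
    for link in job.seed_links:
        if link not in link_graph:
            link_graph.add(link)
            fresh.append(link)
    return [WorkUnit(fresh[i:i + batch_size], app)
            for i in range(0, len(fresh), batch_size)]


def grid_stage(work_units: List[WorkUnit], nodes: List[str]) -> Dict[str, Version]:
    """Assume every listed node is available; assign work units round-robin."""
    if not nodes:
        raise ValueError("no available nodes")
    versions = {node: Version() for node in nodes}
    for i, unit in enumerate(work_units):
        versions[nodes[i % len(nodes)]].work_units.append(unit)
    return versions


def node_run(version: Version, fetch: Callable[[str], str]) -> Dict[str, Dict[str, int]]:
    """Download each link's content, run the app on it, and collect results."""
    return {link: unit.app(fetch(link))
            for unit in version.work_units
            for link in unit.links}
```

In this sketch, `fetch` is passed in so the node logic can be exercised without network access, for example with `fetch = pages.__getitem__` over a dictionary of canned pages.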
  • Some fun metrics
    Max theoretical processing speed (per user):
    5 million documents per hour
    Available node pool:
    50,000+ completely heterogeneous PCs
    Back-end architecture:
    12 grid service servers
    8 crawling service servers
    Number of available entities:
    239 now, 1000+ in the next few months
    Minimum time to create new entity:
    2-3 hours
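As a rough sanity check on the metrics above, the arithmetic below relates the stated per-user peak to the node pool. It assumes, purely for illustration, that the whole pool of 50,000 nodes serves a single user at peak. The slide does not say this, and since the pool is given as "50,000+", the per-node figure is an upper bound.

```python
DOCS_PER_HOUR = 5_000_000   # max theoretical speed per user (from the slide)
NODE_POOL = 50_000          # lower bound on available nodes (from the slide)


def throughput(docs_per_hour=DOCS_PER_HOUR, nodes=NODE_POOL):
    """Return (docs/second overall, docs/hour per node, seconds per doc per node)."""
    docs_per_second = docs_per_hour / 3600
    docs_per_node_per_hour = docs_per_hour / nodes
    seconds_per_doc = 3600 / docs_per_node_per_hour
    return docs_per_second, docs_per_node_per_hour, seconds_per_doc


if __name__ == "__main__":
    per_sec, per_node, secs = throughput()
    print(f"{per_sec:.0f} docs/s overall, {per_node:.0f} docs/hour per node, "
          f"{secs:.0f} s per document per node")
```

Under that assumption, each node would handle about 100 documents per hour, or one document every 36 seconds, for roughly 1,389 documents per second overall.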
  • Coming soon…
    New features:
    RDF
    Facts
    Relations
    Entity linking
    Triples
    Pricing plans:
    Monthly access + per-document pricing
    Higher document limits
    Advanced features
    API:
    Create jobs and retrieve results
    Integrate directly with your applications
  • If you want to try it out
    Go to http://www.extractiv.com
    Follow us @extractiv