Extractiv

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

Transcript

  • 1. E X T R A C T I V
    The long-awaited beta!
  • 2. E X T R A C T I V
    What it is: A semantic service that transforms unstructured web content into structured semantic data.
    What it does:
    • Crawls millions of pages
    • Applies NLP tools to perform entity extraction
    • Produces marked-up files for you to use
    Who it’s for:
    • Need high-volume text extraction
    • Need more types of entities
    • Want to go above OpenCalais limits
    • Don’t want to pay a ton
  • 3. E X T R A C T I V
    Demo time
  • 4. How it works
    Create Extractiv Job
    • Automatically builds extraction app
    • Packages entities into data model blob
    Pass Job to Crawling Service
    • Checks duplicate links against link graph
    • Packages links and app into a work unit
    Pass Work Unit to Grid Server
    • Packages work units into version
    • Identifies available nodes
    • Sends version to nodes
    Pass Version to Nodes
    • Downloads content of link
    • Runs app, returns result
    • Sends results back
    Node Completes Work Units
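
The hand-offs on this slide (job, then work units, then nodes) can be sketched roughly as below. This is only an illustration under stated assumptions, not Extractiv's actual code: the class and function names (`Job`, `WorkUnit`, `CrawlingService`, `dispatch`) are hypothetical, the link graph is reduced to a plain set of already-seen links, the "app" is a placeholder string, and round-robin assignment is one simple way a grid server might spread work over available nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job: entity types to extract plus the links to crawl."""
    entities: list[str]
    links: list[str]


@dataclass
class WorkUnit:
    """Hypothetical work unit: one link packaged with the extraction app."""
    link: str
    app: str


@dataclass
class CrawlingService:
    """Skips links that are already known (link graph modelled as a set)."""
    seen: set[str] = field(default_factory=set)

    def make_work_units(self, job: Job) -> list[WorkUnit]:
        # Placeholder for the automatically built extraction app.
        app = "extract:" + ",".join(job.entities)
        units = []
        for link in job.links:
            if link in self.seen:
                continue  # duplicate link, do not crawl it again
            self.seen.add(link)
            units.append(WorkUnit(link, app))
        return units


def dispatch(units: list[WorkUnit], nodes: list[str]) -> dict[str, list[WorkUnit]]:
    """Assign work units to nodes round-robin (an assumed policy)."""
    assignment: dict[str, list[WorkUnit]] = {node: [] for node in nodes}
    for i, unit in enumerate(units):
        assignment[nodes[i % len(nodes)]].append(unit)
    return assignment


job = Job(
    entities=["PERSON", "ORGANIZATION"],
    links=["http://a.example/1", "http://a.example/2", "http://a.example/1"],
)
units = CrawlingService().make_work_units(job)
plan = dispatch(units, ["node-1", "node-2"])
```

Here the repeated link is dropped by the crawling step, so only two work units reach the dispatch step, one per node.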
  • 5. Some fun metrics
    Max theoretical processing speed (per user):
    5 million documents per hour
    Available node pool:
    50,000+ completely heterogeneous PCs
    Back-end architecture:
    12 grid service servers
    8 crawling service servers
    Number of available entities:
    239 now, 1000+ in the next few months
    Minimum time to create new entity:
    2-3 hours
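
To put the throughput figure in perspective, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that one user's job is spread evenly over a pool of exactly 50,000 nodes; the slide does not say how the 5 million figure was derived, so the per-node numbers are an inference, not a stated metric.

```python
DOCS_PER_HOUR = 5_000_000  # from the slide: max theoretical speed per user
NODE_POOL = 50_000         # from the slide: lower bound on available PCs

# Aggregate rate in documents per second.
docs_per_second = DOCS_PER_HOUR / 3600

# Assumption: work is spread evenly over the whole pool.
docs_per_node_hour = DOCS_PER_HOUR / NODE_POOL
seconds_per_doc_per_node = 3600 / docs_per_node_hour

print(round(docs_per_second))    # 1389
print(docs_per_node_hour)        # 100.0
print(seconds_per_doc_per_node)  # 36.0
```

Under that assumption each node would only need to finish one document roughly every 36 seconds, which is plausible for heterogeneous desktop PCs.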
  • 6. Coming soon…
    New features:
    RDF
    Facts
    Relations
    Entity linking
    Triples
    Pricing plans:
    Monthly access + per-document pricing
    Higher document limits
    Advanced features
    API:
    Create jobs and retrieve results
    Integrate directly with your applications
  • 7. If you want to try it out
    Go to http://www.extractiv.com
    Follow us @extractiv