Extractiv

Shion Deysarkar presents Extractiv at the Austin Hadoop User Group

  • Extractiv
    The long-awaited beta!
  • Extractiv
    What it is: a semantics service that transforms unstructured web content into structured semantic data.
    What it does:
    • Crawls millions of pages
    • Applies NLP tools to perform entity extraction
    • Produces marked-up files for you to use
    Who it’s for: anyone who
    • Needs high-volume text extraction
    • Needs more types of entities
    • Wants to go beyond OpenCalais limits
    • Doesn’t want to pay a ton
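The slides do not say which NLP tools Extractiv applies. As a rough illustration of what "entity extraction" followed by "marked-up files" can mean, here is a minimal Python sketch using dictionary lookup (a gazetteer), a deliberately simpler technique than the NLP tools the slide refers to. The gazetteer contents, the entity type names, and the `<entity type="...">` markup are all assumptions, not Extractiv's actual data or output format.

```python
import html
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer (surface form -> entity type); not Extractiv's data.
GAZETTEER: Dict[str, str] = {
    "Austin": "City",
    "Hadoop": "Software",
    "OpenCalais": "Product",
}

Span = Tuple[int, int, str, str]  # (start, end, surface form, entity type)


def extract_entities(text: str, gazetteer: Dict[str, str]) -> List[Span]:
    """Find whole-word gazetteer matches, left to right, without overlaps."""
    # Try longer names first so a multi-word entry beats a shorter prefix of it.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
            for m in pattern.finditer(text)]


def mark_up(text: str, spans: List[Span]) -> str:
    """Wrap each span in an inline <entity type="..."> tag, escaping the rest."""
    out: List[str] = []
    cursor = 0
    for start, end, surface, etype in spans:
        out.append(html.escape(text[cursor:start], quote=False))
        out.append(f'<entity type="{html.escape(etype)}">'
                   f'{html.escape(surface, quote=False)}</entity>')
        cursor = end
    out.append(html.escape(text[cursor:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    doc = "The Austin Hadoop User Group compared Extractiv with OpenCalais."
    print(mark_up(doc, extract_entities(doc, GAZETTEER)))
```

A real service would use statistical or rule-based models with context, not exact string matches; the sketch only shows the input/output shape of tagging entities in text.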
  • Extractiv
    Demo time
  • How it works
    Create Extractiv Job
    • Automatically builds the extraction app
    • Packages entities into a data-model blob
    Pass Job to Crawling Service
    • Checks links against the link graph for duplicates
    • Packages links and app into a work unit
    Pass Work Unit to Grid Server
    • Packages work units into a version
    • Identifies available nodes
    • Sends version to nodes
    Pass Version to Nodes
    • Downloads the content of each link
    • Runs the app and returns the result
    • Sends results back
    Node Completes Work Units
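The slide names the pipeline stages but not how they are implemented. The Python sketch below walks one job through those stages under stated assumptions: every name (`Job`, `WorkUnit`, `Version`, `build_app`, `crawling_service`, `grid_server`, `run_node`) is hypothetical, the "data model blob" is reduced to a list of entity types, the link graph is a plain set of already-seen URLs, nodes are dealt work round-robin, and downloading is simulated with a dictionary lookup instead of HTTP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical types and names; the slides describe the stages, not the code.
App = Callable[[str], List[str]]


@dataclass
class Job:
    entity_types: List[str]  # stand-in for the "data model blob"
    urls: List[str]


@dataclass
class WorkUnit:
    urls: List[str]
    app: App  # extraction app built for the job


@dataclass
class Version:
    units: List[WorkUnit]


def build_app(job: Job, gazetteer: Dict[str, str]) -> App:
    """Create job: build an app that keeps only words of the job's entity types."""
    wanted = set(job.entity_types)

    def app(text: str) -> List[str]:
        return [w for w in text.split() if gazetteer.get(w) in wanted]

    return app


def crawling_service(job: Job, link_graph: Set[str], app: App,
                     unit_size: int) -> List[WorkUnit]:
    """Crawling service: drop links already in the link graph, package work units."""
    fresh: List[str] = []
    for url in job.urls:
        if url not in link_graph:
            link_graph.add(url)  # record it so later duplicates are skipped
            fresh.append(url)
    return [WorkUnit(fresh[i:i + unit_size], app)
            for i in range(0, len(fresh), unit_size)]


def grid_server(units: List[WorkUnit],
                nodes: Dict[str, bool]) -> Dict[str, List[WorkUnit]]:
    """Grid server: bundle units into a version, deal them to available nodes."""
    version = Version(units)
    available = [name for name, up in nodes.items() if up]
    if not available:
        raise RuntimeError("no available nodes")
    assignment: Dict[str, List[WorkUnit]] = {name: [] for name in available}
    for i, unit in enumerate(version.units):
        assignment[available[i % len(available)]].append(unit)  # round-robin
    return assignment


def run_node(units: List[WorkUnit],
             fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Node: download each link, run the app, return results keyed by URL."""
    results: Dict[str, List[str]] = {}
    for unit in units:
        for url in unit.urls:
            results[url] = unit.app(fetch(url))
    return results


if __name__ == "__main__":
    gazetteer = {"Austin": "City", "Hadoop": "Software", "Texas": "State"}
    pages = {"http://a.example": "Austin hosts Hadoop meetups in Texas",
             "http://b.example": "Nothing to see here"}
    job = Job(["City", "Software"],
              ["http://a.example", "http://b.example", "http://seen.example"])
    link_graph = {"http://seen.example"}  # already crawled, so skipped
    units = crawling_service(job, link_graph, build_app(job, gazetteer), unit_size=1)
    results: Dict[str, List[str]] = {}
    assignment = grid_server(units, {"node-1": True, "node-2": False, "node-3": True})
    for node_units in assignment.values():
        results.update(run_node(node_units, pages.__getitem__))
    print(results)  # {'http://a.example': ['Austin', 'Hadoop'], 'http://b.example': []}
```

Round-robin is only the simplest placeholder for "identifies available nodes"; a grid of heterogeneous PCs would more plausibly weight assignments by node capacity and retry units from nodes that drop out.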
  • Some fun metrics
    Max theoretical processing speed (per user):
    5 million documents per hour
    Available node pool:
    50,000+ completely heterogeneous PCs
    Back-end architecture:
    12 grid service servers
    8 crawling service servers
    Number of available entities:
    239 now, 1000+ in the next few months
    Minimum time to create new entity:
    2-3 hours
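To put the headline figures in perspective: 5 million documents per hour is roughly 1,389 documents per second, and spread evenly over 50,000 nodes it comes to 100 documents per node per hour, or one document every 36 seconds per node. The even spread is an assumption (the slides do not say how work is distributed), and because the pool is "50,000+", the per-node numbers are upper bounds. A small Python check of the arithmetic:

```python
# Figures from the "Some fun metrics" slide.
DOCS_PER_HOUR = 5_000_000  # max theoretical processing speed, per user
NODE_POOL = 50_000         # "50,000+" available PCs, used as a lower bound


def docs_per_second(docs_per_hour: int) -> float:
    return docs_per_hour / 3600


def docs_per_node_per_hour(docs_per_hour: int, nodes: int) -> float:
    # Assumes work is spread evenly over every node, which the slides do not state.
    return docs_per_hour / nodes


def seconds_per_doc_per_node(docs_per_hour: int, nodes: int) -> float:
    # Time budget each node has per document under the same even-spread assumption.
    return nodes * 3600 / docs_per_hour


if __name__ == "__main__":
    print(f"{docs_per_second(DOCS_PER_HOUR):.0f} docs/s")                     # 1389 docs/s
    print(f"{docs_per_node_per_hour(DOCS_PER_HOUR, NODE_POOL):.0f} docs/node/h")  # 100 docs/node/h
    print(f"{seconds_per_doc_per_node(DOCS_PER_HOUR, NODE_POOL):.0f} s/doc/node")  # 36 s/doc/node
```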
  • Coming soon…
    New features:
    RDF
    Facts
    Relations
    Entity linking
    Triples
    Pricing plans:
    Monthly access + per-document pricing
    Higher document limits
    Advanced features
    API:
    Create jobs and retrieve results
    Integrate directly with your applications
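The slide lists RDF and triples as upcoming features without specifying a format. As one plausible shape, the Python sketch below turns extracted (surface form, entity type) pairs into N-Triples lines. The `http://example.org/` namespace, the document and entity IRI scheme, and the `mentions` predicate are hypothetical; `rdf:type` and `rdfs:label` are the standard W3C vocabulary IRIs.

```python
from typing import List, Tuple
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical namespace, not Extractiv's


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    # N-Triples string literal: escape backslash, double quote, LF and CR.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def entities_to_ntriples(doc_id: str, entities: List[Tuple[str, str]]) -> List[str]:
    """Emit mentions/type/label triples for each (surface form, type) pair."""
    doc = iri(BASE + "doc/" + quote(doc_id))
    lines: List[str] = []
    for surface, etype in entities:
        ent = iri(BASE + "entity/" + quote(surface))  # percent-encode spaces etc.
        lines.append(f"{doc} {iri(BASE + 'ns#mentions')} {ent} .")
        lines.append(f"{ent} {iri(RDF_TYPE)} {iri(BASE + 'ns#' + quote(etype))} .")
        lines.append(f"{ent} {iri(RDFS_LABEL)} {literal(surface)} .")
    return lines


if __name__ == "__main__":
    for line in entities_to_ntriples("1", [("Austin", "City"), ("Hadoop", "Software")]):
        print(line)
```

Minting one IRI per surface form is the naive choice; the "entity linking" item above would instead map mentions to shared identifiers, so that two spellings of the same entity become one node.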
  • If you want to try it out
    Go to http://www.extractiv.com
    Follow us @extractiv