spaCy to the rescue
or why NLTK is not cool anymore
Anton Kasyanov | DataRobot
What is spaCy
• Natural language processing library

• Industrial strength - based on the latest research

• Fast - written using Cython
Usage
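import spacy

# 'en' is the model shortcut used in the talk; newer spaCy releases
# load models by package name instead, e.g. 'en_core_web_sm'.
nlp = spacy.load('en')
doc = nlp(
    'Hello, world. '
    'Here are two sentences.'
)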
Tokeniser
token = doc[0]
sentence = next(doc.sents)
assert token is sentence[0]
assert sentence.text == 'Hello, world.'
Word Vectors
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
apples = doc[0]
oranges = doc[2]
boots = doc[6]
hippos = doc[8]
assert apples.similarity(oranges) > boots.similarity(hippos)
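For intuition about the comparison above: spaCy's default similarity is a cosine similarity between word vectors. Below is a minimal pure-Python sketch of that measure. The `cosine_similarity` helper and the three-dimensional `toy_vectors` are made up for illustration; real vectors come from the loaded model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vectors, chosen by hand; not taken from any spaCy model.
toy_vectors = {
    "apples": [0.9, 0.8, 0.1],
    "oranges": [0.8, 0.9, 0.2],
    "boots": [0.1, 0.2, 0.9],
    "hippos": [0.7, 0.1, 0.3],
}

fruit = cosine_similarity(toy_vectors["apples"], toy_vectors["oranges"])
other = cosine_similarity(toy_vectors["boots"], toy_vectors["hippos"])
assert fruit > other
```

Vectors that point in nearly the same direction score close to 1, which is why the two fruit words beat the unrelated pair here.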
Syntactic Parser
Speed
Other features
• Part-of-Speech tagger

• Named entity recognition

• Integer IDs for words

• Multi-threading support

• Deep learning 

• German, English, French (so far)
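The "Integer IDs for words" item can be pictured as string interning: each distinct word is stored once and referred to by a number, so comparisons and lookups work on integers rather than strings. The sketch below is a toy illustration only. `ToyStringStore` is a hypothetical class, not spaCy's API, and the sequential numbering is an assumption; the slide does not say how spaCy assigns its IDs.

```python
class ToyStringStore:
    """Maps each distinct word to a small integer and back (illustration only)."""

    def __init__(self):
        self._id_by_word = {}
        self._word_by_id = []

    def add(self, word):
        """Return the ID for `word`, assigning the next free integer if it is new."""
        if word not in self._id_by_word:
            self._id_by_word[word] = len(self._word_by_id)
            self._word_by_id.append(word)
        return self._id_by_word[word]

    def lookup(self, word_id):
        """Return the word stored under `word_id`."""
        return self._word_by_id[word_id]


store = ToyStringStore()
ids = [store.add(word) for word in "the cat saw the other cat".split()]

# Repeated words share one ID, so equality checks become integer comparisons.
assert ids[0] == ids[3]
assert store.lookup(ids[1]) == "cat"
```

Storing each string once and passing integers around keeps memory use low and makes word comparisons cheap, which fits the talk's emphasis on speed.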
Thanks!
https://spacy.io
antonkasyanov.com

spaCy lightning talk for KyivPy #21
