Natural Language Understanding at Scale with Spark-Native NLP, Spark ML, and TensorFlow with Alex Thomas

Natural language processing is a key component of many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks. Ideally, all three should integrate into a single workflow, which makes development, experimentation, and deployment much easier. Spark’s MLlib provides a number of machine learning algorithms, and there are now also projects that make deep learning achievable in MLlib pipelines. The missing piece is an NLP annotation framework. Spark NLP adds NLP annotations to the MLlib ecosystem. This talk introduces Spark NLP: how to use it, its current functionality, and where it is going in the future.
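
The "single workflow" idea can be illustrated with a minimal sketch in plain Python. This is not the Spark ML or Spark NLP API: the stage names (`tokenize`, `term_counts`, `keyword_score`), the helper `run_pipeline`, and the tiny word list are all hypothetical and exist only to show annotation, feature extraction, and prediction stages sharing one row-in, row-out interface. In Spark ML the corresponding building blocks would be a `Pipeline` of transformers and estimators operating on DataFrame columns.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Stage = Callable[[Row], Row]


def tokenize(row: Row) -> Row:
    """Annotation stage (hypothetical): add a 'tokens' column from 'text'."""
    return {**row, "tokens": row["text"].lower().split()}


def term_counts(row: Row) -> Row:
    """Feature stage (hypothetical): add a 'features' column of token counts."""
    counts: Dict[str, int] = {}
    for token in row["tokens"]:
        counts[token] = counts.get(token, 0) + 1
    return {**row, "features": counts}


def keyword_score(row: Row) -> Row:
    """Stand-in for a trained model: flag rows containing a hand-picked word."""
    positive_words = {"good", "great"}  # illustrative only, not learned
    hits = sum(n for tok, n in row["features"].items() if tok in positive_words)
    return {**row, "prediction": 1 if hits > 0 else 0}


def run_pipeline(rows: List[Row], stages: List[Stage]) -> List[Row]:
    """Apply each stage in order; every stage only adds columns."""
    for stage in stages:
        rows = [stage(row) for row in rows]
    return rows


rows = run_pipeline(
    [{"text": "Great talk on Spark"}, {"text": "Slides were missing"}],
    [tokenize, term_counts, keyword_score],
)
for row in rows:
    print(row["text"], "->", row["prediction"])
```

Because every stage reads existing columns and appends a new one, an annotation stage can be swapped or extended without touching the feature or model stages, which is the property the abstract asks of an integrated workflow.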

  1. 1. NATURAL LANGUAGE UNDERSTANDING AT SCALE WITH SPARK-NATIVE NLP, SPARK ML, AND TENSORFLOW Alex Thomas | Data Scientist at indeed.com
  2. 2. Natural Language Understanding is an AI-Complete problem.
  3. 3. AT THE BEGINNING, THERE WAS SEARCH Scalable & robust Indexing pipeline Tokenizers & analyzers Synonyms, spellers & auto-suggest File formats & header boosting Rankers, link & reputation boosting
  4. 4. THEN, YOU NEED TO UNDERSTAND LANGUAGE
     • Prescribing sick days due to diagnosis of influenza. (Positive)
     • Jane complains about flu-like symptoms. (Speculative)
     • Jane’s RIDT came back clean. (Negative)
     • Jane is at risk for flu if she’s not vaccinated. (Conditional)
     • Jane’s older brother had the flu last month. (Family history)
     • Jane had a severe case of flu last year. (Patient history)
     Parts of Speech • Dependency Parsing • Co-reference Resolution • Entity Recognition
  5. 5. WHAT MAKES LANGUAGE HARD • Nuanced – Sure / I agree / Absolutely! / Whatever / Yes sir / Just to see you smile ❤ • Fuzzy – Blue, New, Tall, Child, Tell, Do • Contextual – “Patient denies alcohol abuse” • Medium specific – “SGTM c u in 15” • Domain specific – All forward-looking statements included in this document are based on information available to us on the date hereof, and we assume no obligation to revise or publicly release any revision to any such forward-looking statement, except as may otherwise be required by law.
  6. 6. NLP LIBRARIES • By Ecosystem: 1. Python: NLTK, spaCy, gensim 2. JVM: OpenNLP, CoreNLP, Spark NLP, UIMA, GATE, Mallet 3. Others: tm for R, SAS, Watson, Matlab, … • By Design: 1. Raw functionality: NLTK, OpenNLP 2. Annotation libraries: spaCy, Spark NLP, UIMA, GATE • Industrial Grade, Open Source, Supported: 1. spaCy 2. Spark NLP
  7. 7. NLP FOR SPARK • Production Grade NLP for the Apache Spark ecosystem • Design Goals 1. Performance & Scale 2. Frictionless Reuse 3. Enterprise Grade • Built on top of the Spark 2.0 ML APIs • Apache 2.0 licensed, with active development & support
  8. 8. THE PERFORMANCE BOTTLENECK Image Credit: Tim Hunter’s TensorFrames overview
  9. 9. REUSE: OUT OF THE BOX FUNCTIONALITY
     New in Spark NLP: Tokenizer • Normalizer • Stemmer • Lemmatizer • Entity extractor • Date extractor • Part of Speech tagger • Named entity recognizer • Sentence boundary detection • Sentiment analysis • Spell checker
     Reused from Spark ML: Topic modeling • Word2Vec • TF-IDF • String distance calculation • N-grams calculation • Stop words removal • Classification • Regression • Ensembles • Train/Test & Cross-Validation • Grid Search
  10. 10. Demo: NLP for Spark in Action Sample Notebook Project Homepage
  11. 11. SUMMARY: WHEN DOES IT APPLY?
      Domain | Get by with rules, search, RegEx, attribute extraction | Welcome to the world of NLP, ML and DL
      Social media | Does this social media post contain an offensive word? | Is this social media post offensive?
      Legal | Find patents with the terms ‘car’ and ‘battery’, or synonyms | Who is patenting next-gen electrical car batteries?
      Support | Find products mentioned in customer emails or phone calls | What is this customer complaining about?
      Finance | Extract the fee structure from a mutual fund prospectus | Are UK pensions allowed to invest in this fund?
      Healthcare | Extract the patient’s blood pressure reading from a note | Does this patient have high blood pressure?
  12. 12. NATURAL LANGUAGE UNDERSTANDING AT SCALE WITH SPARK-NATIVE NLP, SPARK ML, AND TENSORFLOW THANK YOU! Special thanks to • Saif Addin • Dr. David Talby • John Snow Labs
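
Slides 4 and 11 contrast what keyword search can answer with what requires language understanding. A minimal sketch of that gap, using the six sentences and labels from slide 4: a regular-expression search for "flu" or "influenza" (the pattern and the helper name `mentions_flu` are assumptions made for this illustration, not part of Spark NLP) reports whether the term occurs, but says nothing about whether the sentence asserts, speculates about, or conditions on a diagnosis.

```python
import re

# Sentences and assertion labels copied from slide 4.
EXAMPLES = [
    ("Prescribing sick days due to diagnosis of influenza.", "Positive"),
    ("Jane complains about flu-like symptoms.", "Speculative"),
    ("Jane's RIDT came back clean.", "Negative"),
    ("Jane is at risk for flu if she's not vaccinated.", "Conditional"),
    ("Jane's older brother had the flu last month.", "Family history"),
    ("Jane had a severe case of flu last year.", "Patient history"),
]

# Hypothetical keyword rule: whole-word match on either term.
FLU_TERMS = re.compile(r"\b(flu|influenza)\b", re.IGNORECASE)


def mentions_flu(sentence: str) -> bool:
    """Return True if the sentence contains 'flu' or 'influenza' as a word."""
    return FLU_TERMS.search(sentence) is not None


hits = [(label, mentions_flu(text)) for text, label in EXAMPLES]
for label, matched in hits:
    print(f"{label:16s} keyword match: {matched}")

# Labels whose sentences match the keyword rule.
matched_labels = [label for label, matched in hits if matched]
```

The keyword rule fires on five of the six sentences, yet only the one labeled "Positive" states a current diagnosis, and the "Negative" sentence is relevant to the question even though it never mentions the term. Telling these cases apart is the kind of task the right-hand column of slide 11 assigns to NLP, ML and DL.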