
Teaching AI about human knowledge

Video: https://www.facebook.com/foundersas/videos/712970348885532/

The bottleneck in AI is data, not algorithms. But how do we get data and knowledge from humans to ML systems? What will the future of data collection look like? And which skills and strategies do we need to improve the process and make our products useful?


  1. Teaching AI about human knowledge. Ines Montani, Explosion AI
  2. Explosion AI is a digital studio specialising in Artificial Intelligence and Natural Language Processing. Products: spaCy, an open-source library for industrial-strength Natural Language Processing; spaCy’s next-generation Machine Learning library for deep learning with text; and, coming soon, a Data Store with pre-trained, customisable models for a variety of languages and domains.
  3. Machine learning is programming by example.
  4. Examples are your source code, training is compilation. Draw examples from the same distribution as the runtime inputs. Goal: the system’s prediction given some input matches the label a human would have assigned. [Diagram: examples feed into training; at runtime an input yields a prediction, which is compared to the human labels]
  5. How machines “learn”: training a simple part-of-speech tagger with the perceptron algorithm (teach the model to recognise verbs, nouns, etc.)

     from collections import defaultdict
     from numpy import zeros

     def train_tagger(examples, n_tags):
         # examples = words, tags, contexts; n_tags = number of possible tags
         # the weights we'll train
         W = defaultdict(lambda: zeros(n_tags))
         for (word, prev, next), human_tag in examples:
             # score each tag given weights & context
             scores = W[word] + W[prev] + W[next]
             # get the best-scoring tag
             guess = scores.argmax()
             # if the guess was wrong, adjust weights
             if guess != human_tag:
                 for feat in (word, prev, next):
                     W[feat][guess] -= 1      # decrease score for bad tag in this context
                     W[feat][human_tag] += 1  # increase score for good tag in this context
         return W
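As a rough illustration (not from the slides), here is how a couple of toy training examples might look for the function above, with arbitrary tag indices 0 (noun) and 1 (verb):

     # hypothetical toy data: ((word, previous word, next word), correct tag index)
     examples = [
         (("dog", "the", "barks"), 0),     # 0 = noun
         (("barks", "dog", "loudly"), 1),  # 1 = verb
     ]
     W = train_tagger(examples * 10, n_tags=2)  # repeat the tiny dataset a few times
     scores = W["dog"] + W["the"] + W["barks"]
     print("predicted tag for 'dog':", scores.argmax())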
  6. The bottleneck in AI is data, not algorithms.
  7. Algorithms are general, training data is specific. Data quality, data quantity and accuracy problems are still the biggest problems in AI (source: The State of AI survey). You can extract knowledge from all kinds of sources, e.g. sentiment from emoji on Reddit 😍, but you usually need at least some data specific to your problem, annotated by humans.
  8. Where human knowledge in AI really comes from: Mechanical Turk. Human annotators at ~$5 per hour, boring tasks, low incentives. (Images: Amazon Mechanical Turk, depressing.org)
  9. Don’t expect great data if you’re boring the shit out of underpaid people. Why are we “designing around” this? (“Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers’ Experiences in Amazon Mechanical Turk”, McInnis et al., 2016.) Data collection needs the same treatment as all other human-facing processes: good UX + purpose + incentives = better quality.
  10. SOLUTION #1: UX-driven data collection with active learning. Assist the human with good UX and task structure. The things that are hard for the computer are usually easy for the human, and vice versa. Don’t waste time on what the model already knows; ask the human about what the model is most interested in (sketched in code after slide 12).
  11. Batch learning vs. active learning approach to annotation and training. Batch: the human annotates all tasks, and the annotated tasks are then used as training data for the model. [Diagram: tasks, human and model connected by batch arrows]
  12. Batch learning vs. active learning approach to annotation and training. Batch: the human annotates all tasks, and the annotated tasks are then used as training data for the model. Active: the model chooses one task, the human annotates that task, and the single annotated task influences the model’s decision on what to ask next. [Diagram: tasks, human and model connected by batch and active arrows]
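A minimal sketch of that active learning loop in plain Python. The model.confidence and model.update methods and the annotate callback are assumptions made for illustration, not an API from the slides:

     def active_learning_loop(model, tasks, annotate):
         # keep asking until there is nothing left to annotate
         while tasks:
             # pick the task the model is least confident about, instead of
             # sending everything to the annotator in one big batch
             task = min(tasks, key=model.confidence)
             tasks.remove(task)
             label = annotate(task)     # good UX and task structure matter here
             model.update(task, label)  # the single answer immediately influences
                                        # what the model asks about next
         return model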
  13. SOLUTION #2: Import knowledge with pre-trained models. Start off with general information about the language, the world etc., then fine-tune and improve to fit custom needs. Big models can work with little training data; backpropagate error signals to correct the model.
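A rough sketch of that idea, reusing the toy perceptron tagger from slide 5: start from pre-trained weights instead of zeros and keep applying the same error-driven updates on your own, much smaller set of examples. The fine_tune_tagger function and the pretrained_weights argument are assumptions for illustration, not from the slides:

     from collections import defaultdict
     from numpy import zeros

     def fine_tune_tagger(pretrained_weights, your_examples, n_tags):
         # start from general-purpose weights instead of zeros
         W = defaultdict(lambda: zeros(n_tags), pretrained_weights)
         for (word, prev, next), human_tag in your_examples:
             scores = W[word] + W[prev] + W[next]
             guess = scores.argmax()
             if guess != human_tag:
                 # correct the model wherever it gets your examples wrong,
                 # so the general weights are adapted to your custom data
                 for feat in (word, prev, next):
                     W[feat][guess] -= 1
                     W[feat][human_tag] += 1
         return W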
  14. Backpropagation: fit the meaning representations to your data. [Diagram: user input (“whats the best way to catalinas”) flows through word meanings, phrase meanings, entity labels and intent; your examples are used to fit the meaning representations]
  15. Backpropagation with a read-only model. [Same diagram as slide 14, but with a workaround added for the read-only model instead of fitting the meaning representations to your data]
  16. If you don’t own the weights of your model, you’re missing the point of deep learning.
  17. Deep learning is about reading and writing. Deep learning = learning internal representations to help solve your task. Backpropagate errors on the task you care about to adjust the internal representations. SaaS is great for read-only software, but deep learning needs read and write access.
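As a loose illustration of the read/write distinction (a sketch in PyTorch, not anything shown on the slides): read access means calling the model like a hosted prediction API; write access means backpropagating the error on your own task into the weights, which is only possible if you actually hold them.

     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     # a tiny model whose internal representations (weights) you own
     model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

     # "read" access: call the model, as a hosted prediction API would let you
     x = torch.randn(1, 4)
     prediction = model(x)

     # "write" access: backpropagate the error on the task you care about
     # to adjust the internal representations
     target = torch.tensor([1])
     loss = F.cross_entropy(prediction, target)
     loss.backward()
     with torch.no_grad():
         for param in model.parameters():
             param -= 0.01 * param.grad  # impossible without access to the weights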
  18. 1. AI systems today are not magically smart black boxes. They’re made of annotations and statistics. 2. AI has an HR problem: to improve the technology, we need to fix the way we collect data from humans. 3. AI needs write access. This challenges the current trends towards thin clients and centralised cloud computing.
  19. Thanks! 💥 Explosion AI: explosion.ai 📲 Follow us on Twitter: @explosion_ai, @_inesmontani, @honnibal
