II-SDV 2017: Auto Classification: Can/Should AI replace You?

This presentation addresses machine learning techniques that can be used to categorize information. The session discusses the types of problems that are suitable (or unsuitable) for machine learning and catalogs strengths, weaknesses and requirements of current algorithms. The presentation closes with a brief discussion of what lies beyond machine learning.

Published in: Internet

  1. 1. Auto Classification Can/Should AI replace you? NILS C. NEWMAN [newman@searchtech.com] DR. ALAN L. PORTER SEARCH TECHNOLOGY WWW.THEVANTAGEPOINT.COM
  2. 2. Background Now that “Big Data” is old news, “Automation” is the new buzzword. Within our technology space, automation means machine learning. Will machine learning help us work better… Or... force us to find new careers?
  3. 3. Machine Learning and our job As information analysts, we are a pretty highly educated group. We have a lot of skills, experience, and in-depth knowledge. How could machines possibly replace us?
  4. 4. Your role as Gatekeeper One aspect of many of our jobs is the process of classification. In that capacity, we act as filters for information. We sort and classify information to vet its value and identify its appropriate place within our organizations. You are the gatekeeper of information.
  5. 5. Classification and Machine Learning But what happens when the information flow is too great? The firehose of information has been overwhelming us for 25+ years. When you first signed up for an alerting service or started relying on data providers to screen the data, you began to hand over part of your job to machine learning.
  6. 6. Machine Learning is getting smarter Machine learning did not really take off until the Internet made vast amounts of data readily available. Most Machine Learning approaches need lots of data to get smarter. Plus we rely on Machine Learning to deal with the massive increase in information. The process has become a self-reinforcing feedback loop.
  7. 7. Types of Machine Learning There are over a dozen different major approaches to machine learning. Many have their roots in specific types of non-text information: ◦ Image Processing ◦ Numerical Data Processing. Others have been designed with an eye toward replacing you as a document classifier. Major approaches include: ◦ Association Rule Learning ◦ Artificial Neural Networks ◦ Support Vector Machines ◦ Bayesian Networks ◦ Genetic Algorithms ◦ Representation Learning ◦ Deep Learning
  8. 8. The Machines are learning In the early days of machine learning, most approaches needed active participation of humans to provide samples of the “right” answer so they could learn. (Supervised Learning) But the algorithm designers soon learned that people are lousy and lazy when it comes to teaching machines. So they designed systems that required less supervision (semi-supervised learning) and finally unsupervised systems (unsupervised learning).
  9. 9. Fear the Machines Unsupervised systems have the greatest potential to be agents of radical change in our (or any other) industry. Given enough data and sufficient computing power, they can eventually become powerful enough to drive that change.
  10. 10. Afraid yet? Unsupervised systems are the ones that will replace us. Fortunately, they are not so smart yet. A few years ago, Google built a neural network with 16,000 processors and fed it 10 million randomly selected YouTube videos (very unsupervised learning). After three days and without human interference, the deep learning system figured out that cats were a thing and could recognize them a high percentage of the time.
  11. 11. So can Machine Learning help me? If I don’t have to fear the Google cat machine, can Machine Learning actually help me? If so, which types of Machine Learning are best to assist the gatekeepers of information?
  12. 12. Supervised Machine Learning The systems which are best positioned to help you at this point in time are supervised learning systems. The greatest potential lies in systems where you are the trainer. But beware of the workload. Teaching a machine how to classify documents should not be a full-time job.
  13. 13. So which algorithm? Currently, you as an end user do not have a whole lot of choice. Most machine learning software is targeted toward developers. They take the machine learning software and embed it into their applications. ◦ Smart Search in TI ◦ KMX in Evalueserve ◦ Luxid by Expert System ◦ Auto Classifier in VantagePoint ◦ etc. Often, the software developer will not tell you they are using Machine Learning.
  14. 14. So what to do? Be a better informed customer. Ask your vendors if they use machine learning algorithms in their applications. See if they will tell you which type of algorithm they use. (Most will not tell you but it is worth a try.) Ask if the training is supervised or unsupervised. If it is supervised, ask how difficult it is for you to teach the system.
  15. 15. Take away Supervised machine learning systems can be a great tool to improve your effectiveness. They have the potential to significantly reduce the level of effort required to classify documents.
  16. 16. The Future Keep an eye on unsupervised and semi-supervised systems. Semi-supervised systems that involve you as an unsuspecting passive trainer will become more invasive (for example, Google or IBM Watson). In the near term, disruptive change will most likely come from systems where the user doesn’t know they are participating in training. In the long term, unsupervised systems have the potential to be quite disruptive. Closely read “terms of use” for online systems. You may be training a system without knowing it. This type of training is fine for public search tools. But for industrial patent systems, where competitive intelligence has real value, there are a great many unanswered questions.
  17. 17. Discussion
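
The supervised workflow described in slides 8 and 12, where a person supplies samples of the "right" answer and the machine learns to classify new documents, can be illustrated with a minimal sketch. This is not the method used by any product named in the slides. It assumes a simple naive Bayes word-count model chosen only for brevity, and the training documents, the labels ("energy", "pharma"), and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a document and split it on whitespace."""
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs supplied by a human trainer."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            count = word_counts[label][word] + 1
            score += math.log(count / (label_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: the "right" answers a human gatekeeper would supply.
TRAINING = [
    ("battery electrode lithium cell", "energy"),
    ("solar panel photovoltaic cell efficiency", "energy"),
    ("antibody protein drug therapy", "pharma"),
    ("clinical trial drug dosage", "pharma"),
]

model = train(TRAINING)
print(classify("lithium battery efficiency", model))  # prints "energy"
print(classify("drug trial protein", model))          # prints "pharma"
```

Four labeled examples are enough to run the sketch, but a real classifier would need far more, which is the workload concern raised in slide 12.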