Explainable AI in Industry (KDD 2019 Tutorial)

Aug. 6, 2019

Editor's Notes

  1. Krishna to talk about the evolution of AI
  2. Models have many interactions within a business, multiple stakeholders, and varying requirements. Businesses need a 360-degree view of all models within a unified modeling universe that allows them to understand how models affect one another and all aspects of business functions across the organization. Worth saying that some of the highlighted concerns are applicable even for decisions not involving humans. Example: “Can I trust our AI decisions?” arises in domains with high stakes, e.g., deciding where to excavate/drill for mining/oil exploration; understanding the predicted impact of certain actions (e.g., reducing carbon emission by xx%) on global warming.
  3. Krishna
  4. At the same time, explainability is important from compliance & legal perspective as well. There have been several laws in countries such as the United States that prohibit discrimination based on “protected attributes” such as race, gender, age, disability status, and religion. Many of these laws have their origins in the Civil Rights Movement in the United States (e.g., US Civil Rights Act of 1964). When legal frameworks prohibit the use of such protected attributes in decision making, there are usually two competing approaches on how this is enforced in practice: Disparate Treatment vs. Disparate Impact. Avoiding disparate treatment requires that such attributes should not be actively used as a criterion for decision making and no group of people should be discriminated against because of their membership in some protected group. Avoiding disparate impact requires that the end result of any decision making should result in equal opportunities for members of all protected groups irrespective of how the decision is made. Please see NeurIPS’17 tutorial titled Fairness in machine learning by Solon Barocas and Moritz Hardt for a thorough discussion.
  5. Renewed focus in light of privacy / algorithmic bias / discrimination issues observed recently with algorithmic systems. Examples: EU General Data Protection Regulation (GDPR), which came into effect in May, 2018, and subsequent regulations such as California’s Consumer Privacy Act, which would take effect in January, 2020. The focus is not only around privacy of users (as the name may indicate), but also on related dimensions such as algorithmic bias (or ensuring fairness), transparency, and explainability of decisions. Image credit: https://pixabay.com/en/gdpr-symbol-privacy-icon-security-3499380/
  6. European Commission Vice-President for the #DigitalSingleMarket.
  7. Retain human decision-making in order to assign responsibility. https://gdpr-info.eu/art-22-gdpr/ Bryce Goodman, Seth Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation’”: https://arxiv.org/pdf/1606.08813.pdf (Same GDPR/CCPA context and image credit as note 5.)
  8. https://gdpr-info.eu/recitals/no-71/ (Same GDPR/CCPA context and image credit as note 5.)
  9. Purpose: convey the need for adopting an “explainability by design” approach when building AI products; that is, we should take explainability into account while designing and building AI/ML systems, as opposed to treating it as an afterthought.
  10. This criterion distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc). Intrinsic interpretability refers to machine learning models that are considered interpretable due to their simple structure, such as short decision trees or sparse linear models. Post hoc interpretability refers to the application of interpretation methods after model training. (A small sketch contrasting the two appears after these notes.)
  11. Same note as slide 10: intrinsic vs. post hoc interpretability.
  12. We ask this question for a trained image-classification network.
  13. There are several different formulations of this “why” question: explain the function captured by the network in the local vicinity of the input; explain the entire function of the network; explain each prediction in terms of the training examples that influence it. And then there is attribution, which is what we consider here. (A small numerical sketch of one attribution method, Integrated Gradients, appears after these notes.)
  14. Some important methods not covered: Grad-CAM [ICCV 2017], Grad-CAM++ [arXiv 2017], SmoothGrad [arXiv 2017].
  15. Point out correlation is not causation (yet again)
  16. In such cases, it may be feasible to attribute to a small set of derived features, e.g., super pixels
  17. Attributions are useful when the network behavior entails that a strict subset of input features are important
  18. Same note as slide 10: intrinsic vs. post hoc interpretability.
  19. So why do we need relevance debugging and explaining? The most obvious reason is that we want to improve the performance of our relevance machine learning models. After days and weeks of data collection, feature engineering, and model training, the model is finally deployed, and we want to know how it actually works in production. Does it work well with other pieces of the system? Does it work well on edge cases? How can we improve it? If we only look at the metric numbers, we won't be able to answer these questions. By improving the relevance machine learning models, we provide value to our members. Our members come to LinkedIn to look for people and jobs they are interested in, and to stay connected to their professional network. We want to provide the most relevant experience to our members and thereby build trust with them. Our members will only trust us if we ourselves understand what our systems are doing. For example, if we recommend an entry-level job to a CEO, we had better understand why we are making that mistake.
  20. Now, let's take a look at the architecture of our relevance systems at LinkedIn, using search as an example. Our members use their mobile apps or browsers to find relevant information; the request goes through the frontend API layer and is sent to the federator layer. There are different types of entities the member may be looking for; for example, they could be looking for another member, or for a job. So the federator sends the query to the broker layer for the different entity types. For each entity type, the data is partitioned into different shards, and each shard has a cluster of machines located in different data centers. The request goes to the different shards, and the search results are fetched and finally returned to the member. This is where machine learning models come into play. The federator layer tries to understand what the user is looking for in order to decide which brokers to query; it could be one or more brokers depending on the search keyword, so there is a machine learning model in this layer to figure out the user's intent. In the broker and shard layers, there are machine learning models to score the search results, with a first-pass ranking and a second-pass ranking.
  21. So what could go wrong in this diagram? First of all, the services could go down. The request could fail because of a bug, or the query could take too long and time out. The data in each shard could be stale, or the data might not exist in our search index at all. When it comes to the machine learning part: (1) the quality of a feature might not be good; sometimes a feature doesn't really represent what we thought it should. (2) Features could be inconsistent between offline training and online serving due to issues in the feature generation pipeline. (3) When we ramp a new model, it doesn't always work well for all use cases, so that could also go wrong.
  22. Debugging these kinds of issues can be hard. First of all, the infrastructure is complex. As we have seen in the architecture, there are many services involved in this system; they are deployed on different hosts in different data centers, and most of the time they are owned by different teams. So for an AI engineer, it can be hard to understand all the components in the system. The issue might also be hard to reproduce locally: the production environment can be very different from the dev environment, and it is constantly changing. Some issues are time related, and those are especially hard to debug. For example, when a member reports that they saw a feed item or received a notification that is not relevant to them, it could be days later when an AI engineer picks up the issue, and they would have no idea what was going on at that time, so they have essentially no way to debug it. Also, debugging issues locally can be time consuming: you need to set up a local environment, deploy multiple services locally, and get permission to access production data to reproduce the issue, and most of the time you still won't be able to reproduce it. The whole process can take a long time, and it's really not productive.
  23. So what we do is provide the infrastructure to collect the debugging data and a web UI tool to query the system and visualize the data. This is the overall architecture of the solution we provide. Engineers use the web UI tool to query the relevance system. The request sent out from the tool identifies itself as a debugging call, so when each service receives the request, it collects data for debugging: for example, the request and response of this service, some statistics about the query, and, most importantly, the features and scores of each item. Each of these services sends out the collected data using Kafka, and the data is stored in a storage system. After the query has finished, the web UI tool queries the storage system and visualizes the call graph and data for the engineer. The reason we use Kafka instead of returning the debug data in the response is that the data can be large, which would affect the performance of the system; also, if a service fails, we would lose all the debugging data from the other services. (A minimal sketch of such a per-service debug hook appears after these notes.)
  24. Next, I'll show some parts of the UI of our debugging tool and how we can use them to debug issues. The call graph here shows how the query was sent to different services. The service call graph can differ depending on the search keyword and search vertical, so it's very important for our engineers to understand the call graph and to be able to see the request, the response, and what happened in each of these services. For example, if a service returned a 500, we would like to see the hostname, the version of the service, and also the logs. Sometimes the engineers not only want to see what was returned by the frontend API; they also want to debug a particular service in a particular data center or cluster. They can use our tool to specify which service or which host they want to debug and construct the query. In this way, they can tweak the request to see how it changes the results.
  25. Our users can also see some statistics about the query, such as how long each phase takes and how efficient the query is. This can help them locate issues or improve the latency of the system.
  26. Sometimes the engineers get questions like “Why am I seeing this feed item?” or “Why is this search result ranked first?” To answer these questions, they would like to look at the features of these items. The issue usually happens when they are experimenting with a new feature in the model, and the quality of the feature doesn't meet their expectations and doesn't perform well in the online system. In this case, the engineers are interested in this particular new feature, to see if its value is abnormal. Sometimes the issue happens when a new model is being ramped in production. Since we can have multiple models deployed for A/B tests, the engineers would also like to know which model was used at the time of scoring. This is also useful because sometimes a model has a failure and falls back to a default model, and they want to confirm which one was used. Our engineers also get questions such as “Why am I not seeing something?” It is very helpful to see what was returned and what was dropped in each layer, and to see how an item was re-ranked. Sometimes the data doesn't exist in our online store; sometimes the item was dropped because its score was not high enough. This helps the engineers find the root cause quickly.
  27. Here’re some more advanced and complex use cases when it comes to debugging and explaining relevance systems. We’ll now go over each one.
  28. The first use case is system perturbation. The idea is simple: I want to change something in the system and see how the results change. For example, I want to confirm a bug fix before making code changes, or just observe how a new model would behave before exposing it to traffic. How does it work? First, the engineer uses our tool to specify what kind of perturbation he or she wants to perform, which is injected as part of the request. Our tool then passes the perturbation to downstream services, overrides the system behavior, and surfaces the results back to the engineer. Examples of perturbation include: overriding settings or treatments of online A/B tests; selecting a new machine learning model and sanity-checking its results before ramping it up on traffic; and overriding feature values to see if an item would be ranked differently in the search results. (A sketch of what such a perturbation request might look like appears after these notes.)
  29. Another advanced use case is side-by-side comparison: we may want to compare the results of two different machine learning models, or compare the features and ranking scores of two items. The two items can come from a single query (to understand their relative ranking) or from two different queries (to understand differences between models).
  30. Here’s the user interface when comparing results from 2 different queries. The UI shows changes in ranking positions of the same item. As you can see in this example, Query 1 on the left side, using a different model than Query 2 on the right side, ranked the Lead Software Engineer job higher.
  31. And why do the models rank the same item differently? Our tool can compare this item from 2 queries and show the difference in the ranking process of the 2 models. As you can see in the table, this UI highlights the differences in feature values, both raw and computed by the models. AI engineers use this comparison information to understand model behavior and identify potential problems or improvements.
  32. The last use case is replay. Scoring and ranking relevant information can be time sensitive. For example, we may get a complaint about old items showing up in a member's feed. To debug the problem, we need to reproduce the feed items returned by the model at a specific point in time in the past. Our tool provides the capability of replaying past model inferences. How does it work? First, when making model inferences, the online system emits features and ranking scores using Kafka. These Kafka events are then consumed by our tool using Samza and stored in a database. Later, AI engineers can query the database for historical data when debugging. This is the UI for feed replay: on the left side, an AI engineer can query sessions of a given member within a time range, and when a session is selected, on the right side is the list of feed items shown to the member during that session. (A sketch of such a replay lookup appears after these notes.)
  33. So we have talked about many different use cases, and I would like to add a note on security and privacy. As you have seen, debugging and explaining the relevance systems requires access to personalized data, and we have always made privacy a central part of the tool. Like the rest of LinkedIn, our debugging tool is GDPR compliant. We control and audit all access. Debugging data has a limited retention time of about a couple of weeks, and the replay capability from the previous slide only works for our internal employees. With the debugging tool, we give our engineers as much visibility as possible while protecting our members' privacy. Here is a list of AI teams that use our tool on a daily basis: search, feed, comments, etc. As AI becomes more and more prevalent in our daily lives, it's important to understand the models and debug them when they don't behave as expected. We hope our talk has been helpful, and we're happy to answer questions. Thank you!
  34. Have these models read the question carefully? Can we say “yes”, because they achieve good accuracy? That is the question we wish to address in this paper. Let us start by looking at an example.
  35. Completely changing the question's meaning still does not change the behaviour of the model. Clearly, the model has not fully grasped the question.
  36. Same note as slide 35.
  37. Same note as slide 35.
  38. Same note as slide 35.
  39. Same note as slide 35.
  40. You may feel that integration and gradients are inverse operations of each other, and that we therefore did nothing. (The completeness identity sketched after these notes makes precise what the integral buys us.)
  41. We encourage all of you to check out KDD'19 sessions/tutorials/talks related to these topics: Tutorial: Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned (Sun); Workshop: Explainable AI/ML (XAI) for Accountability, Fairness, and Transparency (Mon); Social Impact Workshop (Wed, 8:15–11:45); Keynote: Cynthia Rudin, Do Simpler Models Exist and How Can We Find Them? (Thu, 8–9am); several papers on fairness (e.g., ADS7 (Thu, 10–12), ADS9 (Thu, 1:30–3:30)); Research Track Session RT17: Interpretability (Thu, 10am–12pm).
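
As referenced in note 10, the sketch below contrasts the two routes to interpretability on a toy regression task: a sparse linear model whose coefficients are the explanation (intrinsic), and a post-hoc analysis of a black-box model after training. The dataset, the model choices, and the use of permutation importance are illustrative assumptions, not the tutorial's own code.

```python
# Sketch: intrinsic vs. post-hoc interpretability on toy data (assumed setup).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=500, n_features=10, n_informative=3, random_state=0)

# Intrinsic: a sparse linear model is interpretable by construction;
# its (mostly zero) coefficients are the explanation.
lasso = Lasso(alpha=1.0).fit(X, y)
print("Intrinsic explanation (Lasso coefficients):", lasso.coef_)

# Post hoc: a more complex model is trained first, then analyzed afterwards,
# here with permutation feature importance.
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(gbm, X, y, n_repeats=10, random_state=0)
print("Post-hoc explanation (permutation importances):", result.importances_mean)
```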
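
As referenced in note 13, here is a minimal numerical sketch of one attribution method, Integrated Gradients. The tiny hand-written logistic model, its weights, and the all-zeros baseline are assumptions chosen so the gradient can be written analytically; they stand in for a real network.

```python
# Sketch: Integrated Gradients on a toy logistic model (assumed model/weights).
import numpy as np

w = np.array([1.5, -2.0, 0.5])           # toy model weights

def f(x):                                  # model output: sigmoid(w . x)
    return 1.0 / (1.0 + np.exp(-x @ w))

def grad_f(x):                             # analytic gradient of f w.r.t. x
    s = f(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, steps=50):
    # Average the gradients along the straight line from baseline to x,
    # then scale by (x - baseline), following Sundararajan et al. (ICML 2017).
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
print("attributions:", attr)
# With enough steps, the attributions approximately sum to f(x) - f(baseline).
print("sum vs. f(x) - f(baseline):", attr.sum(), f(x) - f(baseline))
```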
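
Note 23 describes each service emitting its debugging data over Kafka rather than returning it inline in the response. Below is a minimal sketch of such a per-service hook; the topic name, field names, and the use of the kafka-python client are assumptions for illustration, not LinkedIn's actual implementation.

```python
# Sketch of a per-service debug hook (hypothetical topic and field names).
import json
from kafka import KafkaProducer  # kafka-python client, used here as an assumption

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def maybe_emit_debug_event(request, response, features, scores):
    # Only requests flagged as debugging calls produce debug events.
    if not request.get("is_debug_call"):
        return
    event = {
        "service": "search-broker",                 # hypothetical service name
        "request": request,
        "response_summary": {"num_results": len(response)},
        "features": features,                        # per-item feature values
        "scores": scores,                            # per-item ranking scores
    }
    # Emitting via Kafka keeps large debug payloads out of the response path.
    producer.send("relevance-debug-events", value=event)
```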
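
Note 28 describes injecting perturbations (A/B-test overrides, model overrides, feature-value overrides) into a debugging request. The sketch below shows what such a request payload might look like; the endpoint URL and all field names are hypothetical.

```python
# Sketch of a perturbation request (hypothetical endpoint and fields).
import requests

perturbation = {
    "ab_test_overrides": {"search-ranking-experiment": "treatment_b"},
    "model_override": "second-pass-ranker-v42",          # sanity-check a new model
    "feature_overrides": {"job_seniority_match": 0.9},   # would this item rank differently?
}

resp = requests.post(
    "https://relevance-debugger.example.com/api/query",
    json={
        "query": "software engineer",
        "vertical": "JOBS",
        "is_debug_call": True,
        "perturbation": perturbation,
    },
)
print(resp.json())
```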
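
Note 32 describes replaying past model inferences by querying stored features and scores for a member within a time range. A sketch of such a lookup is below; the endpoint, parameters, and response shape are hypothetical.

```python
# Sketch of a feed-replay lookup (hypothetical endpoint, member id, and response shape).
import requests

resp = requests.get(
    "https://relevance-debugger.example.com/api/feed-replay",
    params={
        "member_id": 12345,                 # hypothetical member id
        "start": "2019-08-01T00:00:00Z",
        "end": "2019-08-02T00:00:00Z",
    },
)
# Each session lists the feed items shown to the member during that session.
for session in resp.json().get("sessions", []):
    print(session["session_id"], len(session["feed_items"]), "items shown")
```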
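
Note 40 anticipates the objection that integrating gradients simply "undoes" the gradient. The standard Integrated Gradients definition and its completeness property (Sundararajan et al., ICML 2017) make the point precise: the path integral does not reproduce the function; it apportions the change in output between the baseline x' and the input x across the individual features.

```latex
% Integrated Gradients attribution for feature i, input x, baseline x':
\mathrm{IG}_i(x) = (x_i - x'_i)\int_{0}^{1}
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha
% Completeness: the attributions sum to the change in the model output,
% so the integral is not a no-op.
\sum_{i} \mathrm{IG}_i(x) = F(x) - F(x')
```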