This document provides an overview of Frankbot, a machine-learning framework built at Freshdesk to auto-respond to customer support queries. It summarizes the key aspects of Frankbot, including its use of historical customer support data to train models that intercept and resolve common customer queries, its offline-training and online-processing methodology, and how it is periodically refreshed and taught by customer support agents. Metrics are presented showing how Frankbot has helped increase customer satisfaction scores while reducing average first response times for customers.
ML Framework for auto-responding to customer support queries - Varun Nathan
This presentation describes how machine learning can be used to build a bot that understands natural-language customer queries and returns suitable responses.
1. Frankbot - ML framework for auto-responding to customer support queries
2. Outline of the talk
● Introduction to Freshdesk
● Motivation and Objectives
● Datasets for model training
● Modeling Methodology
○ Offline training
○ Online processing
○ Onboarding a customer account
○ Periodic model refresh
○ Teach the bot
● Metrics and business impact
○ Understanding the metrics
○ Challenges and learnings
3. Introduction to Freshdesk
Freshdesk is a multi-channel, cloud-based customer support product that enables businesses to:
● Streamline all customer conversations in one place - conversations between the business and its end customers
● Automate repetitive work and make support agents more efficient
● Enable support agents to collaborate with other teams to resolve issues faster
● Freshdesk tickets are a record of customer conversations across channels (phone, chat, e-mail, social, etc.)
○ A typical conversation includes customer queries and agent responses
○ Frequently recurring customer queries are called T1 tickets
● Freshdesk currently has ~150,000 customers from across the world
Some statistics from companies using Freshdesk:
● Average proportion of T1 tickets - 80%
● Average proportion of tickets with answers in the knowledge base - 60%
● Average proportion of tickets with answers in the ticket conversation - 70%
4. Motivation and Objectives
● To build a machine-learning-based bot that can:
○ Intercept and auto-resolve T1 tickets that recur frequently in the support helpdesk
○ Leverage content from the business's knowledge base to answer T1 queries
○ Reduce the time support agents spend on T1 tickets, thereby enhancing their overall productivity
○ Identify historical tickets similar to a new ticket - agents can resolve tickets faster by looking up information contained in the similar tickets
● Enable support agents to understand the different types of questions raised by customers
● Help support agents create FAQs, which in turn enhance the bot's self-service potential
● Enable support agents to train the bot further by mapping customer queries to expected responses
6. Datasets for model training
● Source - Freshdesk data pertaining to customer (business) accounts
○ Includes tickets, knowledge-base articles, and FAQs
○ Includes tickets from different channels such as e-mail, portal (raised on the website), chat, social, and phone
● Accounts covered - all active, paid accounts with at least 100 tickets in the last 3 months
● Training strategy
○ One model per account, trained end-to-end
○ Embeddings trained at the industry level, models at the account level
Note: Tickets from the email, portal-direct, chat, and phone channels account for close to 95% of the ticket volume
7. Modeling Methodology - FAQ Answerbot
● Data
○ Train - historical ticket data + knowledge base, minus the test tickets
○ Test - tickets from the last 10 days (no overlap with train)
○ Candidate responses - articles/FAQs from the knowledge base
● Preprocessing
○ Email cleaning - signature cleaning, cleaning forwarded emails, removal of code constructs, non-ASCII characters, salutations, and text below the signature
○ Primary preprocessing - Unicode normalization, lowercasing, punctuation removal, stop-word removal, and stemming
○ Secondary preprocessing - bigram processing
● L1 layer - ensemble of LSA and W2V vector-space embeddings
● L1 similarity metric - cosine similarity
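The L1 scoring step can be sketched as follows. This is a minimal illustration, not the production code: the LSA and W2V encoders are assumed to have already produced the vectors, and the equal-weight averaging of the two cosine scores is an assumed ensembling rule (the slides do not specify how the ensemble combines the spaces).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def l1_scores(query_vecs, response_vecs, weights=(0.5, 0.5)):
    """Ensemble L1 score for one query against n candidate responses.

    query_vecs / response_vecs: dicts with 'lsa' and 'w2v' entries holding
    the query vector and an (n, d) matrix of candidate vectors per space.
    """
    n = response_vecs["lsa"].shape[0]
    scores = np.zeros(n)
    for w, space in zip(weights, ("lsa", "w2v")):
        q = query_vecs[space]
        for i in range(n):
            scores[i] += w * cosine(q, response_vecs[space][i])
    return scores

# Tiny worked example with 2-D toy embeddings (hypothetical values):
q = {"lsa": np.array([1.0, 0.0]), "w2v": np.array([0.0, 1.0])}
cands = {"lsa": np.array([[1.0, 0.0], [0.0, 1.0]]),
         "w2v": np.array([[0.0, 1.0], [1.0, 0.0]])}
scores = l1_scores(q, cands)  # candidate 0 matches in both spaces
```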
8. Modeling Methodology - FAQ Answerbot (contd.)
● L2 features
1. % word match between the query and candidate responses
2. % word match between words with similar part-of-speech tags
3. Word mover's distance
4. Ordered bigram and trigram counts
● L2 model - RandomForest / XGBoost
● Thresholds - based on L1 and L2 scores (with override levels)
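Two of the L2 features listed above can be sketched in plain Python. Word mover's distance and POS-based matching are omitted here because they require an embedding model and a POS tagger; the function names and the whitespace tokenization are illustrative simplifications.

```python
def pct_word_match(query, response):
    """Fraction of unique query words that also appear in the response."""
    q_words = set(query.lower().split())
    r_words = set(response.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & r_words) / len(q_words)

def ordered_ngram_overlap(query, response, n=2):
    """Count of ordered n-grams shared by query and response."""
    def ngrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return len(ngrams(query) & ngrams(response))

# Hypothetical query/candidate pair:
q = "how do i reset my password"
r = "to reset my password go to settings"
features = [pct_word_match(q, r), ordered_ngram_overlap(q, r, n=2)]
```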
9. Offline Model Training
[Pipeline diagram] Train data, candidate responses (n), and test data (m) flow through:
● Preprocessing - email cleaning, primary & secondary preprocessing
● L1 (embedding) layer - training; produces candidate-response vectors (n) and test vectors (m)
● Pick the top k responses per query based on L1 scores (m*k)
● Feature creation, followed by preprocessing - missing-value imputation, outlier treatment, scaling
● L2 (classification) layer - training on the train split (t); produces a relevance-probability vector ((m-t)*k)
● Pick the top 3 responses per query by probability ((m-t)*3), followed by evaluation
Artifacts written to Redis/S3: lookup/word vectors/idf, the classification model object, and the L1 & L2 thresholds for gating and ranking
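The final offline step, training the L2 classifier on labeled (query, candidate) feature rows, can be sketched with scikit-learn's RandomForest (one of the two model families the slides name). The feature matrix and the relevance-labeling rule here are synthetic stand-ins for the manually labeled data described later in the deck.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical feature matrix: one row per (query, candidate) pair, with
# columns like [% word match, POS-match, WMD, n-gram overlap]; labels are
# the manual 1/0 relevance annotations.
X = rng.random((200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)  # synthetic relevance rule

l2_model = RandomForestClassifier(n_estimators=50, random_state=0)
l2_model.fit(X, y)

# Relevance probabilities for the top-k candidates of one new query:
new_pairs = rng.random((5, 4))
probs = l2_model.predict_proba(new_pairs)[:, 1]
top3 = np.argsort(probs)[::-1][:3]  # indices of the 3 best candidates
```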
10. Online Processing
[Pipeline diagram] An incoming query flows through:
● Preprocessing - email cleaning, primary & secondary preprocessing
● L1 (embedding) layer - transformation; the query vector is compared against the candidate-response vectors (n)
● Pick the top k candidates by similarity (1*k)
● Feature creation, followed by preprocessing - missing-value imputation, outlier treatment, scaling
● L2 (classification) layer - prediction; produces a relevance-probability vector (1*k)
● Pick the top 3 responses by probability (1*3)
Artifacts read from Redis/S3: lookup/word vectors/idf, the classification model object, and the L1 & L2 thresholds for gating and ranking
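The gating step at the end of online processing, i.e. deciding whether the bot answers at all, can be sketched as a joint check on the L1 and L2 scores. The threshold values are hypothetical, and the per-account override levels the slides mention are not modeled here.

```python
def answer_query(l1_scores, l2_probs, l1_threshold=0.3, l2_threshold=0.6):
    """Gate the bot's response on both L1 and L2 scores.

    l1_scores / l2_probs: parallel lists, one entry per top-k candidate.
    Returns indices of up to 3 candidates to show, or [] to stay silent.
    """
    ranked = sorted(range(len(l2_probs)), key=lambda i: l2_probs[i],
                    reverse=True)
    picked = [i for i in ranked
              if l1_scores[i] >= l1_threshold and l2_probs[i] >= l2_threshold]
    return picked[:3]

# Example: candidate 1 clears both gates; candidate 0 fails the L2 gate
# and candidate 2 fails the L1 gate, so only one response is shown.
result = answer_query([0.8, 0.7, 0.1], [0.4, 0.9, 0.95])
```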
11. Onboarding a customer account
● Onboarding a new customer account involves extracting tickets and articles from the data lake and training the L1 model (LSA)
● Onboarding also involves choosing the right pre-trained word embedding corresponding to the account's industry
○ Example industries: Retail, Financial Services, SaaS, Healthcare, Education
● An ensemble of LSA and W2V embeddings is used to generate L1 scores for each (query, response) pair
● A downstream classification (L2) model is trained to generate model confidence scores for each (query, {response}) tuple
○ If not enough data is available for the account, an industry-level L2 model is used
● Thresholding, i.e. deciding whether or not to answer a given query, is based on both the L1 and L2 scores
12. Periodic model refresh
● Model refresh is key to ensuring that the models stay up to date and relevant over time
● Refresh runs once a week, or as soon as an account accumulates a sizeable number of new queries or knowledge-base updates
● It involves the following steps:
○ Retraining the LSA model after including the newly accumulated data
○ Incremental training of the word vectors with the new data
○ Retraining the L2 (classification) model on recent data
■ The L2 model is trained by manually labeling whether the responses from the L1 layer are relevant or not (1/0)
■ A third-party company is engaged to label these responses
13. Teach the bot
● Teach the bot is a feature that allows customer support agents to explicitly train the bot by ingesting Q → A mappings
● When the Answerbot fails to respond to a query (Q), the agent can point the bot to the expected response (A) that should have been returned
● If a suitable response (A) does not exist in the knowledge base, it can be created on the fly
● This expected response (A) is consumed and mapped close to the query vector (Q) in the L1 vector space
○ This ensures that article A shows up for future queries similar to Q
○ The same feature is repurposed to resolve incorrect bot responses as well
○ This feature also helps improve the overall coverage of the Answerbot
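The vector-mapping step above can be sketched as follows: the taught article is indexed at (or nudged toward) the query's L1 vector, so future similar queries retrieve it. The in-memory dict index, the class name, and the `blend` parameter are stand-ins for the real vector store and whatever update rule production uses.

```python
import numpy as np

class L1Index:
    """Toy L1 index: article id -> unit vector, searched by cosine similarity."""

    def __init__(self):
        self.vectors = {}

    def add(self, article_id, vec):
        self.vectors[article_id] = vec / np.linalg.norm(vec)

    def teach(self, article_id, query_vec, blend=1.0):
        """Map the article's vector to (or toward) the query vector.

        blend=1.0 places the article exactly at the query vector;
        smaller values nudge an existing vector toward it.
        """
        q = query_vec / np.linalg.norm(query_vec)
        old = self.vectors.get(article_id, q)
        self.add(article_id, (1 - blend) * old + blend * q)

    def best(self, query_vec):
        q = query_vec / np.linalg.norm(query_vec)
        return max(self.vectors,
                   key=lambda a: float(np.dot(self.vectors[a], q)))

index = L1Index()
index.add("faq_billing", np.array([1.0, 0.0]))
# The bot missed this query; an agent teaches it the right article:
missed_query = np.array([0.0, 1.0])
index.teach("faq_password_reset", missed_query)
match = index.best(np.array([0.1, 0.9]))  # a similar future query
```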
14. Metrics and business impact
Month | # Active Clients | # Requests | # Responded | # Helpful | # No Feedback | % Deflection
May'18 | 97 | 10,805 | 6,075 | 1,657 | 1,868 | 15.34%
Jun'18 | 151 | 22,195 | 12,969 | 2,550 | 5,981 | 11.49%
July'18 | 182 | 30,376 | 19,330 | 3,792 | 5,669 | 12.48%
Aug'18 | 242 | 50,049 | 29,948 | 5,940 | 7,839 | 11.87%
Sep'18 | 347 | 63,587 | 38,064 | 8,308 | 10,112 | 13.07%
Oct'18 | 457 | 101,493 | 56,390 | 16,589 | 33,360 | 16.34%
Nov'18 | 478 | 130,687 | 78,902 | 25,680 | 46,555 | 19.65%
Dec'18 | 480 | 137,517 | 82,366 | 23,713 | 52,772 | 17.24%
● CSAT* - 79% with bots vs. 72% without bots
● Average first response time (overall) - 13 hrs with bots vs. 19 hrs without bots
*CSAT - Customer Satisfaction Score
15. Understanding the Metrics
● # Active clients - number of customers exposing the bot to their own customers in their support portal
● # Requests - number of requests the bot receives
● # Responded - number of requests responded to/answered by the bot
● # Helpful - number of requests where the bot's responses were helpful
○ Alongside every bot response, a "Was this helpful?" message is shown and the user's feedback is solicited. This makes helpful responses trackable.
● # No Feedback - number of bot responses that received no feedback from users
● % Deflection - ratio of # Helpful to # Requests
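The % Deflection definition above can be checked against the May'18 row of the table (1,657 helpful responses out of 10,805 requests):

```python
def deflection_pct(helpful, requests):
    """% Deflection = # Helpful / # Requests, as a percentage."""
    return 100.0 * helpful / requests

# May'18 figures from the metrics table:
may_2018 = deflection_pct(helpful=1_657, requests=10_805)  # ~15.34%
```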
16. Challenges and learnings
Challenges:
● Developing a preprocessing mechanism that extracts only the salient components from messy emails
● Handling the complexity of storing and retrieving vectors of floats (idfs, SVD components, word vectors) for every account
● Serving predictions at low latency
● Handling Kafka streams for updating content in real time - Spark Streaming
● Using the right tools to monitor and find bugs in the codebase proactively
Lessons learnt:
● Start with a simple model and add incremental improvements over time
● Involve data engineers at the very beginning to create data pipelines, and front-end engineers to make changes to the UI
● Define success metrics and inform stakeholders about what a reasonable target is
18. Appendix
Why are some suggestions not helpful to the user?
● The query may relate to a new topic for which there are not yet enough FAQs or articles
● The query may relate to an existing topic but contain keywords that are not in the vocabulary - this can result in low L1 and L2 confidence scores that do not satisfy the thresholds
● The query may be a request for a particular action - example: "Can you connect me to an agent?", which is a question for a task-completion bot with intent-detection capabilities
● The query may not contain a question or issue - example: "I have an open ticket 3335924"
● The query may be ambiguous or unclear - example: "discussion"