Date: January 2018
Presenter: Edward Choi (최윤재), PhD student, Georgia Tech
Since 2012, deep learning, or representation learning, has shown impressive progress in computer vision, speech recognition, and natural language processing. The power of deep learning comes from combining expressive models with large labeled datasets, which allows machines to extract useful information from high-dimensional data, a task that was a human responsibility before the rise of deep learning.
Massive data have been collected in healthcare since the introduction of electronic health records (EHR), and the amount of data is more than human medical experts can process. It is expected that, in this regard, deep learning can play a significant role in healthcare as it did in vision and language. However, computational healthcare requires predictive models to be both accurate and interpretable.
My talk will introduce how to use recurrent neural networks (RNNs), one of the building blocks of deep learning, to process longitudinal EHR data and predict future events. Specifically, I will focus on predicting heart failure onset given a patient's 18-month record. Building on top of this, I will address the interpretability issue of deep learning models and propose a method for making predictions that are both accurate and interpretable.
Managing and Versioning Machine Learning Models in Python - Simon Frid
Practical machine learning is becoming messy: while there are lots of algorithms, a lot of infrastructure is still needed to manage and organize the models and datasets. Estimators and Django-Estimators are two Python packages that can help version datasets and models for deployment and an effective workflow.
Emergence of MongoDB as an Enterprise Data Hub - MongoDB
Emergence of MongoDB as an Enterprise Data Hub, presented by Dylan Tong, Sr. Solutions Architect, MongoDB at MongoDB Evenings Seattle at the Seattle Public Library on October 6, 2015.
How to Migrate Applications Off a Mainframe - VMware Tanzu
Ah, the mainframe. Peel back many transactional business applications at any enterprise and you’ll find a mainframe application under there. It’s often where the crown jewels of the business’ data and core transactions are processed. The tooling for these applications is dated and new code is infrequent, but moving off is seen as risky. No one. Wants. To. Touch. Mainframes.
But mainframe applications don't have to be the electric third rail. Modernizing even pieces of those mainframe workloads onto modern frameworks and platforms has huge payoffs: developers gain all the productivity benefits of modern tooling, not to mention the scaling, security, and cost benefits.
So, how do you get started modernizing applications off a mainframe? Join Rohit Kelapure, Consulting Practice Lead at Pivotal, as he shares lessons from projects with enterprises to move workloads off of mainframes. You’ll learn:
● How to decide what to modernize first by looking at business requirements AND the existing codebase
● How to take a test-driven approach to minimize risks in decomposing the mainframe application
● What to use as a replacement or evolution of mainframe schedulers
● How to include COBOL and other mainframe developers in the process to retain institutional knowledge and defuse project detractors
● How to replatform mainframe applications to the cloud leveraging a spectrum of techniques
Presenter: Rohit Kelapure, Consulting Practice Lead, Pivotal
Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provides a central point for the biomedical community to query and visualise them. OLS also provides a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j, where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph databases more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
AI, Knowledge Representation and Graph Databases - Key Trends in Data Science - Optum
Knowledge Representation is a key focus for most modern AI texts. Many AI experts feel that over half of their work is understanding how to find the right knowledge structures to build intelligent agents that can continuously learn and respond to changing events in their world. In 2012, a paper published by Google started a consolidation of the many diverse forms of knowledge representation into a single general-purpose structure called a labeled property graph.
This talk will describe the key events behind this movement and show how a new generation of data scientists will be needed to build and maintain corporate knowledge graphs that contain uniform, normalized, and highly connected data sets for use by researchers and intelligent agents. We will also discuss the challenges of transferring siloed project knowledge to reusable structures.
A tremendous backlog of predictive modeling problems in industry and a short supply of trained data scientists have spiked interest in automation over the last few years. A new academic field, AutoML, has emerged. However, there is a significant gap between the topics that are academically interesting and the automation capabilities necessary to solve real-world industrial problems end-to-end. An even greater challenge is enabling a non-expert to build a robust and trustworthy AI solution for their company. In this talk, we'll discuss what an industry-grade AutoML system consists of and the scientific and engineering challenges of building it.
Dmitry Kan, Principal AI Scientist at Silo AI and host of the Vector Podcast [1], will give an overview of the landscape of vector search databases and their role in NLP, along with the latest news and his view on the future of vector search. Further, he will share how he and his team participated in the Billion-Scale Approximate Nearest Neighbor Challenge and improved recall by 12% over a FAISS baseline.
Presented at https://www.meetup.com/open-nlp-meetup/events/282678520/
YouTube: https://www.youtube.com/watch?v=RM0uuMiqO8s&t=179s
Follow Vector Podcast to stay up to date on this topic: https://www.youtube.com/@VectorPodcast
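For readers new to the topic, a FAISS baseline of the kind the recall figure is measured against can be set up in a few lines. This is a generic, hypothetical illustration with made-up sizes, not the challenge configuration:

```python
import numpy as np
import faiss  # pip install faiss-cpu

# A minimal exact-search baseline: index database vectors, query the top 10.
d = 128
xb = np.random.random((10_000, d)).astype("float32")  # database vectors
xq = np.random.random((5, d)).astype("float32")       # query vectors

index = faiss.IndexFlatL2(d)   # exact L2 search, a common recall baseline
index.add(xb)
distances, ids = index.search(xq, 10)                 # top-10 neighbours
```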
How to Set Up a Cloud Cost Optimization Process for your Enterprise - RightScale
As cloud spend grows, enterprises need to set up internal processes to manage and optimize their cloud costs. This process will help organizations to accurately allocate and report on costs while minimizing wasted spend. In this webinar, experts from RightScale’s Cloud Cost Optimization team will share best practices in how to set up your own internal processes.
Actionable Insights with AI - Snowflake for Data Science - Harald Erb
Talk @ ScaleUp 360° AI Infrastructures DACH, 2021: Data scientists spend 80% or more of their time searching for and preparing data. This talk explains Snowflake's platform capabilities, like near-unlimited data storage and instant, near-infinite compute resources, and how the platform can be used to seamlessly integrate and support the machine learning libraries and tools data scientists rely on.
Legacy ERP is any ERP system that cannot meet changing business needs.
Prevents innovation
Inflexible
Not real-time
Cannot leverage cloud
Reduces opportunity to work with LOB
Introduction to the Snowflake data warehouse and its architecture for a big data company: centralized data management, Snowpipe and the COPY INTO command for data loading, stream loading, and batch processing.
Presentation for the Knowledge Graph Conference 2021
Abstract: Show me your schemas, and I will show you a graph! Although graph databases have become very popular in the enterprise, deep expertise in graphs is still in short supply (see "Building an Enterprise Knowledge Graph @Uber: Lessons from Reality" from KGC 2019). Developers often think of graphs as a completely different kind of thing from the rest of their company's data, and will go to great lengths to force their data into a "graph" shape. The amount of manual effort involved in building and maintaining ETL pipelines can become a bottleneck and a maintenance burden. In fact, there is usually a rich domain data model of entities, relationships, and properties which is already implicit in the company's existing schemas, be they interface descriptions for microservices, relational schemas, or various other kinds of storage schemas. Taking advantage of these schemas, and mapping conforming data into the graph, ought to require relatively little extra work, but developers need appropriate tools. In this presentation, we will illustrate such mappings with real-world examples from Uber, as well as introducing formal techniques for schema and data migration. We will also look ahead to the emerging GQL standard as the foundation for a new generation of highly interoperable graph database tools.
Data Analytics Strategies & Solutions for SAP customers - Visual_BI
SAP customers are challenged on multiple fronts today: tools and technologies have evolved rapidly, while smaller internal IT teams are left to evaluate them. In this webinar replay, Visual BI offers strategies and solutions for some of the most common challenges faced by SAP BI & Analytics leaders, managers, and architects.
Recommender systems are software tools and techniques that provide suggestions for items likely to be of interest to a user. In recent years they have proved to be a valuable means of helping Web users by providing useful and effective recommendations.
Migrating SAP Workloads to AWS: Stories and Tips - AWS Summit Sydney - Amazon Web Services
There are many options for migrating SAP workloads to the AWS Cloud. Come and hear how our customers completed their migration onto AWS, gaining increased agility and significantly reducing their operational overhead while leveraging new capabilities. Take away some top tips and lessons learnt for your critical migration project.
Thinking of moving to S/4HANA? Starting down the road? Join Capgemini to hear how the Highway to S/4HANA offering can make your journey a success. We will open our S/4 toolbox of accelerators, pre-defined scenarios, and reusable artifacts, and show how you can reduce costs, shorten time to value, and make adoption predictable and planned.
An overview of how we have approached DataOps to allow analysts and data scientists to work quickly and release frequently with high confidence. Covers:
- Cloud/multi-cloud architecture
- CI/CD in the data space
- Development, testing, and deployment
- Monitoring and alerting
In this presentation we answer the question, "Why do we need hypothesis tests in process improvement?" Then we walk you through a real, live hypothesis test direct from the Bahama Bistro!
You can find the rest of the webinar materials and questions from the webinar here:
https://goleansixsigma.com/webinar-set-run-hypothesis-tests/
Slides for three presentations at Coolblue's Behind the Scenes Data Science event on 2018-03-22.
Speakers:
- Andres Martinez (Data Science @ Coolblue)
- Matthias Schuurmans (forecasts)
- Daan Marechal (recommendations)
Golden Helix’s SNP & Variation Suite (SVS) has been used by researchers around the world to do association testing and trait analysis on large cohorts of samples in both humans and other species. As sample sizes increase for population-scale genomics, the analysis methods need to adapt to remain computable on your analysis workstation.
One of the most popular methods for determining population structure in SVS is Principal Component Analysis. In this webcast, we review the fundamentals of this methodology, as well as how we have advanced the state of the art by implementing a new “Large Data PCA” capability in SVS, handling over 10 times as many samples as previously possible at a fraction of the time. Join us as we cover:
A review of SVS association testing and trait analysis capabilities
Usage of Principal Component Analysis to discern population structure
Scaling PCA beyond the limitations of computer hardware
Other SVS improvements based on ongoing feedback from the user community
SVS continues to move forward as a flexible and powerful tool to perform genotype and Large-N variant analysis. We hope you enjoy this webcast highlighting the exciting new features and select enhancements we have made.
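As a rough illustration of the underlying computation (not SVS code, and with made-up data), PCA for population structure amounts to a singular value decomposition of the centered genotype matrix; scaling this to ever larger sample counts is exactly what a "Large Data PCA" implementation has to address:

```python
import numpy as np

# Toy PCA on a genotype matrix: rows are samples, columns are variants
# coded 0/1/2. Real pipelines also normalize per variant and use
# randomized / out-of-core SVD to stay within workstation memory.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 1_000)).astype(float)

Gc = G - G.mean(axis=0)                      # center each variant
U, S, _ = np.linalg.svd(Gc, full_matrices=False)
pcs = U[:, :2] * S[:2]                       # sample coordinates on PC1, PC2
```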
This presentation is used in a refresher course at Nuzvid; it is a one-day session of the course. It introduces research avenues in image processing and allied areas to faculty participants.
Big data vs the RCT - Derek Angus - SSAI2017 - scanFOAM
A talk by Derek Angus at the 2017 meeting of the Scandinavian Society of Anaesthesiology and Intensive Care Medicine.
All of the conference content can be found here: https://scanfoam.org/ssai2017/
Developed in collaboration between scanFOAM, SSAI and SFAI.
A3 Thinking:
A3 thinking is a structured technique of working through problems or opportunities for improvement. The ‘A3’ itself is literally just that: a piece of A3 paper summarising the logical thought processes that have been agreed by the team in defining the opportunity for improvement or solving the problem they face.
The 10th Annual Utah Health Services Research Conference: Iterative Development of Sepsis Detection Algorithms for the Emergency Department. By: Peter Haug - Intermountain Healthcare
Health Services Research Conference: March 16, 2015
Patient Centered Research Methods Core, University of Utah, CCTS
Why does airplane design need to be standardized?
Why we build a design system
Every airplane has a different purpose... how do we design them?
Pages and patterns with different contexts
The layover is still far away... when do we do maintenance?
When to adopt a design system
We need to talk to the engineers to do the maintenance... how do we repair it?
The process of applying a design system
How do we announce that the airplane design has changed?
Propagating the design system
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
24. Limitation of RNN
• Transparency
• RNN is a black box
• Feed input, receive output
• Hard to tell what caused the outcome
• Outcome: 0.9
• Was it because of “Justice”?
• Was it because of “impressive”?
• Was it because of “Christmas”?
32. Attention models
• Attention, what is it good for?
• c is an explicit combination of all past information
• α_1, α_2, ⋯, α_16 denote the usefulness of each word
• We can tell which word contributed the most/least to the outcome
[Figure: the context vector c computed as a weighted combination of the word representations, with attention weights α_1, α_2, α_3, …, α_16]
33. Attention models
• Attention, what is it good for?
• Now c is an explicit combination of all past information
• α_1, α_2, ⋯, α_16 denote the usefulness of each word
• We can tell which word contributed the most/least to the outcome
• The attentions α_i are generated using an MLP
[Figure: the context vector c computed as a weighted combination of the word representations, with attention weights α_1, α_2, α_3, …, α_16]
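The weighted-combination idea on this slide is small enough to sketch directly. Below is a minimal, hypothetical numpy version (not the talk's code): random hidden states stand in for the encoded words, a small MLP scores each one, and the softmax-normalized scores become the attention weights α that build the context vector c.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 16, 8                      # 16 words, hidden size 8
h = rng.normal(size=(T, d))       # RNN hidden states, one per word

# Score each hidden state with a small MLP, then normalize with a softmax.
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
w2, b2 = rng.normal(size=d), 0.0
scores = np.tanh(h @ W1 + b1) @ w2 + b2
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()              # attention weights, sum to 1

c = alpha @ h                     # context vector: weighted sum of all states
print(alpha.argmax())             # index of the most influential word
```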
37. Structure of EHR
• Assumption so far
• Word sequence = Dx sequence
• Justice, League, is, as, impressive, as, …
• Cough, Benzonatate, Fever, Pneumonia, Chest X-ray, Amoxicillin, ...
[Figure: the codes Cough, Benzonatate, Fever, Pneumonia, Chest X-ray, Amoxicillin laid out as a single sequence along a time axis]
39. Structure of EHR
• Assumption so far
• Word sequence = Dx sequence
• Justice, League, is, as, impressive, as, …
• Cough, Benzonatate, Fever, Pneumonia, Chest X-ray, Amoxicillin, ...
[Figure: the same codes regrouped into visits along the time axis. Visit 1: Cough, Fever; Visit 2: Fever, Chill, Pneumonia, Chest X-ray; Visit 3: Fever, Tylenol, IV fluid]
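To make the contrast with word sequences concrete, here is a toy sketch of how a visit-structured record can be encoded as a sequence of multi-hot vectors. The visit grouping is hypothetical, chosen only to mirror the figure above:

```python
import numpy as np

# A patient record is a sequence of visits; each visit is a *set* of medical
# codes, unlike the strictly ordered words of a sentence.
vocab = ["Cough", "Benzonatate", "Fever", "Chill", "Pneumonia",
         "Chest X-ray", "Tylenol", "IV fluid"]
code_idx = {c: k for k, c in enumerate(vocab)}

patient = [
    ["Cough", "Fever"],                               # visit 1
    ["Fever", "Chill", "Pneumonia", "Chest X-ray"],   # visit 2
    ["Fever", "Tylenol", "IV fluid"],                 # visit 3
]

# Each visit becomes a multi-hot vector x_i in {0,1}^r (r = vocabulary size).
X = np.zeros((len(patient), len(vocab)))
for i, visit in enumerate(patient):
    for code in visit:
        X[i, code_idx[code]] = 1.0
print(X)
```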
55. RETAIN: Model Architecture
[Figure 2: Unfolded view of RETAIN’s architecture. Given the input sequence x_1, …, x_i, two RNNs generate the attention weights, which are combined with the visit embeddings (⊙, Σ) to predict the label.]
…an RNN. To find the j-th word in the target language, we generate attentions α_i^j for each word in the original sentence. Then, we compute the context vector c_j = Σ_i α_i^j h_i and use it to predict the j-th word in the target language. In general, the attention mechanism allows the model to focus on a specific word (or words) in the given sentence when generating each word in the target language.
In this work, we define a temporal attention mechanism to provide interpretation in the context of healthcare. Doctors generally pay attention to specific clinical information (e.g., key diagnoses) and its timing when reviewing EHR data. We exploit this insight to develop a temporal attention mechanism that mimics doctors’ practice, which will be introduced next.

2.2 Reverse Time Attention Model RETAIN

Figure 2 shows the high-level overview of our model. One key idea is to delegate a considerable portion of the prediction responsibility to the attention weights generation process. RNNs become hard to interpret due to the recurrent weights feeding past information to the hidden layer. Therefore, to consider both the visit-level and the variable-level (individual coordinates of x_i) influence, we use a linear embedding of the input vector x_i. That is, we define

$$v_i = E x_i, \qquad \text{(Step 1)}$$

where v_i ∈ ℝ^m denotes the embedding of the input vector x_i ∈ ℝ^r, m the size of the embedding dimension, and E ∈ ℝ^{m×r} the embedding matrix to learn. We can easily choose a more sophisticated but still interpretable representation such as a multilayer perceptron (MLP) [13, 29], which has been used for representation learning in EHR data [10].
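Step 1 is just a linear map, which a few lines of numpy make concrete. This is a sketch under the paper's notation, not the authors' code: because x_i is multi-hot, v_i is simply the sum of the embedding columns of the codes present in the visit.

```python
import numpy as np

# Step 1: the visit embedding is the linear map v_i = E x_i.
rng = np.random.default_rng(0)
r, m = 8, 4                     # vocabulary size r, embedding size m
E = rng.normal(size=(m, r))     # embedding matrix to learn

x = np.zeros(r)
x[[0, 2]] = 1                   # a visit containing codes 0 and 2
v = E @ x                       # equals E[:, 0] + E[:, 2]
assert np.allclose(v, E[:, 0] + E[:, 2])
```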
56. RETAIN: Model Architecture
[Figure 2, repeated: unfolded view of RETAIN’s architecture.]
We use two sets of weights, one for the visit-level attention and the other for the variable-level attention. The scalars α_1, …, α_i are the visit-level attention weights that govern the influence of each visit embedding v_1, …, v_i. The vectors β_1, …, β_i are the variable-level attention weights that focus on each coordinate of the visit embeddings v_{1,1}, v_{1,2}, …, v_{1,m}, …, v_{i,1}, v_{i,2}, …, v_{i,m}.
We use two RNNs, RNN_α and RNN_β, to separately generate the α’s and β’s as follows:

$$g_i, g_{i-1}, \ldots, g_1 = \mathrm{RNN}_\alpha(v_i, v_{i-1}, \ldots, v_1)$$
$$e_j = w_\alpha^\top g_j + b_\alpha, \quad \text{for } j = 1, \ldots, i \qquad \text{(Step 2)}$$
$$\alpha_1, \alpha_2, \ldots, \alpha_i = \mathrm{Softmax}(e_1, e_2, \ldots, e_i)$$
$$h_i, h_{i-1}, \ldots, h_1 = \mathrm{RNN}_\beta(v_i, v_{i-1}, \ldots, v_1)$$
$$\beta_j = \tanh(W_\beta h_j + b_\beta), \quad \text{for } j = 1, \ldots, i \qquad \text{(Step 3)}$$

where g_i ∈ ℝ^p is the hidden layer of RNN_α at time step i, h_i ∈ ℝ^q the hidden layer of RNN_β at time step i, and w_α ∈ ℝ^p, b_α ∈ ℝ, W_β ∈ ℝ^{m×q}, and b_β ∈ ℝ^m are the parameters to learn. The hyperparameters p and q determine the hidden layer sizes of RNN_α and RNN_β, respectively.
57. RETAIN: Model Architecture
[Figure 2 and the Step 2–3 equations, repeated from the previous slide.]
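A compact PyTorch sketch of Steps 2–3 may help. It is an illustrative reading of the equations (GRUs as the RNNs, one patient, no batching), not the released RETAIN implementation:

```python
import torch
import torch.nn as nn

# Steps 2-3 as a sketch: two RNNs (here GRUs) consume the visit embeddings in
# reverse time order; one yields scalar visit-level weights alpha, the other
# vector variable-level weights beta. Sizes m, p, q follow the paper's text.
torch.manual_seed(0)
m, p, q, T = 4, 5, 6, 3                    # embedding dim, hidden dims, visits
rnn_alpha = nn.GRU(m, p, batch_first=True)
rnn_beta = nn.GRU(m, q, batch_first=True)
w_alpha = nn.Linear(p, 1)                  # e_j = w_a^T g_j + b_a
W_beta = nn.Linear(q, m)                   # beta_j = tanh(W_b h_j + b_b)

v = torch.randn(1, T, m)                   # embeddings v_1..v_T, one patient
v_rev = torch.flip(v, dims=[1])            # reverse time order, as in RETAIN

g, _ = rnn_alpha(v_rev)
alpha = torch.softmax(w_alpha(g), dim=1)   # (1, T, 1), sums to 1 over visits
h, _ = rnn_beta(v_rev)
beta = torch.tanh(W_beta(h))               # (1, T, m)

# Flip back so alpha[:, j] and beta[:, j] line up with visit j of v.
alpha, beta = torch.flip(alpha, dims=[1]), torch.flip(beta, dims=[1])
```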
58. RETAIN: Model Architecture
[Figure 2, repeated: unfolded view of RETAIN’s architecture.]
When doctors review the past records, they typically study the patient’s most recent records first, and go back in time. Computationally, running the RNN in reversed time order has several advantages as well: the reverse time order allows us to generate e’s and β’s that dynamically change their values when making predictions at different time steps i = 1, 2, …, T. It ensures that the attention vectors will be different at each timestamp and makes the attention generation process computationally more stable.
We generate the context vector c_i for a patient up to the i-th visit as follows:

$$c_i = \sum_{j=1}^{i} \alpha_j \beta_j \odot v_j, \qquad \text{(Step 4)}$$

where ⊙ denotes element-wise multiplication. We use the context vector c_i ∈ ℝ^m to predict the true label y_i ∈ {0,1}^s as follows:

$$\hat{y}_i = \mathrm{Softmax}(W c_i + b), \qquad \text{(Step 5)}$$

where W ∈ ℝ^{s×m} and b ∈ ℝ^s are parameters to learn. We use the cross-entropy to calculate the classification loss as follows:

$$\mathcal{L}(x_1, \ldots, x_T) = -\frac{1}{N} \sum_{n=1}^{N} \frac{1}{T^{(n)}} \sum_{i=1}^{T^{(n)}} \Big( y_i^\top \log(\hat{y}_i) + (1 - y_i)^\top \log(1 - \hat{y}_i) \Big) \qquad (1)$$

where we sum the cross-entropy errors from all dimensions of ŷ_i. In case of real-valued output y_i ∈ ℝ^s, we can change the cross-entropy in Eq. (1) to, for example, mean squared error.
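Continuing in the same sketch style, Steps 4–5 look as follows. This snippet assumes alpha, beta, and v shaped as in the previous sketch, and swaps the paper's Softmax for a sigmoid, since the talk's heart-failure target is a single binary label:

```python
import torch
import torch.nn as nn

# Steps 4-5 as a sketch, assuming alpha (1,T,1), beta (1,T,m), and v (1,T,m)
# were produced as in the previous snippet. For one binary label, the
# Softmax of Step 5 reduces to a sigmoid over a single logit.
torch.manual_seed(0)
T, m = 3, 4
v = torch.randn(1, T, m)
alpha = torch.softmax(torch.randn(1, T, 1), dim=1)
beta = torch.tanh(torch.randn(1, T, m))

out = nn.Linear(m, 1)                  # W and b of Step 5
c = (alpha * beta * v).sum(dim=1)      # Step 4: c = sum_j alpha_j beta_j ⊙ v_j
logit = out(c)
y = torch.tensor([[1.0]])              # toy label: heart failure onset
loss = nn.functional.binary_cross_entropy_with_logits(logit, y)  # Eq. (1) style
```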
59. RETAIN: Model Architecture
[Figure 2 and Steps 4–5, repeated from the previous slide.]
Overall, our attention mechanism can be viewed as the inverted architecture of the standard attention mechanism for NLP [2], where the words are encoded using an RNN and the attention weights are generated using an MLP. Our method, on the other hand, uses an MLP to embed the visit information to preserve interpretation, and uses RNNs to generate two sets of attention weights, recovering the sequential information as well as mimicking the behavior of physicians.
60. RETAIN: Calculating the Contributions
We propose a method to interpret the end-to-end behavior of RETAIN. By keeping the α and β values fixed as the attention of doctors, we analyze the changes in the probability of each label y_{i,1}, …, y_{i,s} in terms of the change in an original input x_{1,1}, …, x_{1,r}, …, x_{i,1}, …, x_{i,r}. The x_{j,k} that leads to the largest change in y_{i,d} will be the input variable with the highest contribution. More formally, given the sequence x_1, …, x_i, we are trying to predict the probability of the output vector y_i ∈ {0,1}^s, which can be expressed as follows:

$$p(y_i \mid x_1, \ldots, x_i) = p(y_i \mid c_i) = \mathrm{Softmax}(W c_i + b) \qquad (2)$$

where c_i ∈ ℝ^m denotes the context vector. According to Step 4, c_i is the sum of the visit embeddings v_1, …, v_i weighted by the attentions α and β. Therefore Eq. (2) can be rewritten as follows:

$$p(y_i \mid x_1, \ldots, x_i) = p(y_i \mid c_i) = \mathrm{Softmax}\!\left( W \Big( \sum_{j=1}^{i} \alpha_j \beta_j \odot v_j \Big) + b \right) \qquad (3)$$

Using the fact that the visit embedding v_j is the sum of the columns of E weighted by each element of x_j, Eq. (3) can be rewritten as follows:

$$p(y_i \mid x_1, \ldots, x_i) = \mathrm{Softmax}\!\left( W \Big( \sum_{j=1}^{i} \alpha_j \beta_j \odot \sum_{k=1}^{r} x_{j,k}\, e_{:,k} \Big) + b \right) = \mathrm{Softmax}\!\left( \sum_{j=1}^{i} \sum_{k=1}^{r} x_{j,k}\, \alpha_j W \big( \beta_j \odot e_{:,k} \big) + b \right) \qquad (4)$$

where x_{j,k} is the k-th element of the input vector x_j and e_{:,k} is the k-th column of E.
61. RETAIN: Calculating the Contributions
[Steps 4–5 and Eqs. (2)–(4), repeated from the previous slides.]
Eq. (4) tells us that the calculation of the likelihood of y_i can be completely deconstructed down to the variables at each input x_1, …, x_i. Therefore we can calculate the contribution ω of the k-th variable of the input x_j at time step j ≤ i for predicting y_i.
62. RETAIN: Calculating the Contributions
The contribution ω of the k-th variable of the input x_j at time step j ≤ i, for predicting y_i, is calculated as follows:

$$\omega(y_i, x_{j,k}) = \underbrace{\alpha_j W (\beta_j \odot e_{:,k})}_{\text{Contribution coefficient}} \;\; \underbrace{x_{j,k}}_{\text{Input value}} \qquad (5)$$

where e_{:,k} is the k-th column of E.
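Eq. (5) is easy to check numerically. The numpy sketch below (illustrative shapes and random parameters, not the authors' code) computes every contribution ω(y_i, x_{j,k}) and verifies that summing them over visits and codes recovers the pre-softmax logits of Eq. (4), up to the bias b:

```python
import numpy as np

# Contribution scores of Eq. (5):
#   omega(y_i, x_{j,k}) = alpha_j * W(beta_j ⊙ e_{:,k}) * x_{j,k}
rng = np.random.default_rng(0)
r, m, s, T = 8, 4, 2, 3                    # codes, embedding, labels, visits
E = rng.normal(size=(m, r))                # embedding matrix; e_{:,k} = E[:, k]
W = rng.normal(size=(s, m))                # output weights W of Step 5

x = rng.integers(0, 2, size=(T, r)).astype(float)  # multi-hot visits x_1..x_T
alpha = rng.dirichlet(np.ones(T))                  # visit-level weights
beta = np.tanh(rng.normal(size=(T, m)))            # variable-level weights

# omega[j, k, :] = alpha_j * W @ (beta_j * E[:, k]) * x[j, k]
omega = np.einsum("j,sm,jm,mk,jk->jks", alpha, W, beta, E, x)

# Sanity check of Eq. (4): contributions sum to the logits W c_i (minus b).
v = x @ E.T                                # Step 1: v_j = E x_j
c = (alpha[:, None] * beta * v).sum(axis=0)
assert np.allclose(omega.sum(axis=(0, 1)), W @ c)
```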
63. RETAIN: Calculating the Contributions
Inside the iteration over k in Eq. (4), each summand is the contribution of a single input variable:

$$p(y_i \mid x_1, \ldots, x_i) = \mathrm{Softmax}\!\left( \sum_{j=1}^{i} \sum_{k=1}^{r} x_{j,k}\, \alpha_j W \big( \beta_j \odot e_{:,k} \big) + b \right) \qquad (4)$$
Scalars in the front
64. RETAIN: Calculating the Contributions
e a method to interpret the end-to-end behavior of RETAIN. By keeping ↵ and values fixed
ntion of doctors, we will analyze the changes in the probability of each label yi,1, . . . , yi,s
f the change in an original input x1,1, . . . , x1,r, . . . , xi,1, . . . , xi,r. The xj,k that lead to the
ange in yi,d will be the input variable with highest contribution. More formally, given the
x1, . . . , xi, we are trying to predict the probability of the output vector yi 2 {0, 1}s
, which
pressed as follows
p(yi|x1, . . . , xi) = p(yi|ci) = Softmax (Wci + b) (2)
2 Rm
denotes the context vector. According to Step 4, ci is the sum of the visit embeddings
weighted by the attentions ↵’s and ’s. Therefore Eq (2) can be rewritten as follows,
p(yi|x1, . . . , xi) = p(yi|ci) = Softmax
✓
W
⇣ iX
j=1
↵j j vj
⌘
+ b
◆
(3)
fact that the visit embedding vi is the sum of the columns of Wemb weighted by each
f xi, Eq (3) can be rewritten as follows,
p(yi|x1, . . . , xi) = Softmax
✓
W
⇣ iX
j=1
↵j j
rX
k=1
xj,kWemb[:, k]
⌘
+ b
◆
= Softmax
✓ iX
j=1
rX
k=1
xj,k ↵jW
⇣
j Wemb[:, k]
⌘
+ b
◆
(4)
k is the k-th element of the input vector xj. Eq (4) tells us that the calculation of the
64
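To make the rewrite from Eq. (3) to Eq. (4) concrete, here is a minimal NumPy sketch, with random stand-ins for the trained parameters and attentions (shapes and names are illustrative; this is not the authors' code), checking that the two expressions yield identical logits:

```python
import numpy as np

rng = np.random.default_rng(0)
i, r, m, s = 5, 20, 8, 2    # visits, codes, embedding dim, labels (illustrative sizes)

X = rng.integers(0, 2, size=(i, r)).astype(float)  # multi-hot visit vectors x_1..x_i
W_emb = rng.normal(size=(m, r))                    # code embedding matrix W_emb
alpha = rng.dirichlet(np.ones(i))                  # visit-level attention (softmax in RETAIN; Dirichlet as stand-in)
beta = np.tanh(rng.normal(size=(i, m)))            # variable-level attention vectors (tanh output in RETAIN)
W = rng.normal(size=(s, m))                        # output projection
b = rng.normal(size=s)                             # output bias

# Eq. (3): context vector c_i = sum_j alpha_j * (beta_j ⊙ v_j), with v_j = W_emb x_j
V = X @ W_emb.T                                    # visit embeddings, shape (i, m)
c = (alpha[:, None] * beta * V).sum(axis=0)
logits_eq3 = W @ c + b

# Eq. (4)/(5): the same logits, deconstructed per code occurrence:
# omega[j, k, :] = alpha_j * W(beta_j ⊙ W_emb[:, k]) * x_{j,k}
omega = np.einsum('j,sm,jm,mk,jk->jks', alpha, W, beta, W_emb, X)
logits_eq4 = omega.sum(axis=(0, 1)) + b

assert np.allclose(logits_eq3, logits_eq4)         # the rewrite is exact
```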
65. RETAIN: Calculating the Contributions
Reading Eq. (5) term by term:

\omega(y_i, x_{j,k}) = \underbrace{\alpha_j W (\beta_j \odot W_{emb}[:, k])}_{\text{contribution coefficient}} \, \underbrace{x_{j,k}}_{\text{input value}} = \text{contribution of the } k\text{-th code in the } j\text{-th visit}

Note that the index i is omitted in \alpha_j and \beta_j. As we have described in Section 2.2, the attentions are generated anew for each prediction step i.
65
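Eq. (5) is what makes RETAIN interpretable in practice: for a chosen label, every code occurrence in every visit receives a scalar contribution score. Continuing the hypothetical sketch above (it reuses X and omega from that snippet), one could rank the top-contributing codes like this:

```python
# Continuing the sketch above: rank code occurrences by their Eq. (5)
# contribution to one label d at prediction step i.
d = 1                                   # index of the label of interest (illustrative)
contrib = omega[:, :, d]                # omega(y_i, x_{j,k}) for every visit j, code k
order = np.argsort(contrib, axis=None)[::-1]
for flat in order[:5]:
    j, k = np.unravel_index(flat, contrib.shape)
    print(f"visit {j}, code {k}: contribution {contrib[j, k]:+.4f}")
```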
68. Heart failure prediction
• Performance measure
• Area under the ROC curve (AUC)
• Competing models
• Logistic regression
• Aggregate all past codes into a fixed-size vector. Feed it to LR (see the sketch after this slide)
• MLP
• Aggregate all past codes into a fixed-size vector. Feed it to MLP
• Two-layer RNN
• Visits are fed to an RNN, whose hidden states are fed to another RNN.
• RNN+attention (Bahdanau et al. 2014)
• Visits are fed to an RNN. Visit-level attentions are generated by an MLP.
• RETAIN
68
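For reference, a minimal sketch of the aggregate-vector baseline listed above, assuming multi-hot visit vectors per patient and binary heart-failure labels; the data here are random placeholders, and scikit-learn's LogisticRegression stands in for the actual experimental pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients, n_codes = 1000, 200                  # illustrative sizes

# Each patient: several visits of multi-hot code vectors, aggregated by summation
# into one fixed-size vector (the "aggregate all past codes" baseline).
patients = [rng.integers(0, 2, size=(rng.integers(2, 10), n_codes))
            for _ in range(n_patients)]
X = np.stack([p.sum(axis=0) for p in patients]).astype(float)
y = rng.integers(0, 2, size=n_patients)          # placeholder HF labels, not real cohort data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

The MLP baseline is the same pipeline with the linear model swapped for a feed-forward network; only the RNN-based models consume the visits as a sequence.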
69. Heart failure prediction
Models              | AUC             | Training time / epoch | Test time (5K patients)
Logistic Regression | 0.7900 ± 0.0111 | 0.15s                 | 0.11s
MLP                 | 0.8256 ± 0.0096 | 0.25s                 | 0.11s
Two-layer RNN       | 0.8706 ± 0.0080 | 10.3s                 | 0.57s
RNN+attention       | 0.8624 ± 0.0079 | 6.7s                  | 0.48s
RETAIN              | 0.8705 ± 0.0081 | 10.8s                 | 0.63s
• RETAIN is as accurate as the two-layer RNN
• Requires similar training & test time
• RETAIN is interpretable!
• The RNN is a black box
69
71. Conclusion
• RETAIN: interpretable prediction framework
• As accurate as RNN
• Interpretable prediction
• Predictions can be explained
• Can be extended to general prognosis
• What diseases is the patient likely to develop in the future?
• Can be applied to any sequence data with the same two-level structure (a sequence of sets)
• E.g. online shopping
71