JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in media, and in particular some of our use cases for active learning, such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, machines and people become collaborators on shared documents.
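The notebook-as-shared-annotation-document idea can be sketched with the standard library alone, since a Jupyter notebook on disk is plain JSON in the nbformat v4 schema. The annotation metadata, labels, and 0.8 review threshold below are illustrative assumptions, not the actual `nbtransom` API:

```python
import json

# A machine annotation pass over text samples. Each sample gets a predicted
# label and confidence; low-confidence items are flagged for human review.
samples = [
    ("Spark runs on the JVM.", "apache-spark", 0.97),
    ("The bank raised its rates.", "finance", 0.55),  # ambiguous word sense
]

cells = []
for text, label, conf in samples:
    cells.append({
        "cell_type": "markdown",
        "source": text,
        "metadata": {"annotation": {
            "label": label,
            "confidence": conf,
            "needs_review": conf < 0.8,  # edge case: refer to a human expert
        }},
    })

# Minimal nbformat v4 skeleton: this is what a shared .ipynb would contain.
nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": cells}
doc = json.dumps(nb, indent=1)

# Cells flagged for review form the human-in-the-loop queue; corrected
# labels feed the next training iteration.
queue = [c["source"] for c in cells if c["metadata"]["annotation"]["needs_review"]]
```

In practice a library like `nbformat` would read and write the notebook, and both the machine annotator and the human reviewer would operate on the same document.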
Humans in the loop: AI in open source and industry
Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
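The graph-algorithm core behind pytextrank (TextRank) can be sketched as PageRank run over a word co-occurrence graph. The tiny graph and damping factor below are illustrative; pytextrank itself builds the graph from spaCy parses:

```python
# Toy word co-occurrence graph: node -> list of neighboring words.
graph = {
    "machine":  ["learning", "pipeline"],
    "learning": ["machine", "active"],
    "active":   ["learning"],
    "pipeline": ["machine"],
}

def pagerank(graph, damping=0.85, iters=50):
    """Power iteration for PageRank over an adjacency-list graph."""
    rank = {n: 1.0 / len(graph) for n in graph}
    for _ in range(iters):
        new = {}
        for node in graph:
            # Each neighbor m splits its rank evenly among its out-edges.
            incoming = sum(
                rank[m] / len(graph[m]) for m in graph if node in graph[m]
            )
            new[node] = (1 - damping) / len(graph) + damping * incoming
        rank = new
    return rank

scores = pagerank(graph)
# Highly connected words ("machine", "learning") outrank peripheral ones,
# which is what makes them keyword candidates.
```

pytextrank extends this idea to ranked key phrases by collapsing adjacent high-scoring words.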
Human-in-the-loop: a design pattern for managing teams that leverage ML
Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
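The active learning handoff described above can be sketched as a routing function: the model auto-labels items it is confident about and refers uncertain ones to human experts. The scores and the 0.4-0.6 uncertainty band are illustrative assumptions:

```python
def route(scored_items, low=0.4, high=0.6):
    """Split (item, probability) pairs into auto-labeled results
    and a human review queue."""
    auto, review = [], []
    for item, p in scored_items:
        if low < p < high:       # model is uncertain: ask an expert
            review.append(item)
        else:                    # model is confident: label automatically
            auto.append((item, p >= high))
    return auto, review

# Stand-in classifier outputs (probability of the positive class).
scored = [("doc1", 0.95), ("doc2", 0.50), ("doc3", 0.05), ("doc4", 0.58)]
auto, review = route(scored)

# Human judgements on `review` are added to the training set for the
# next model iteration, closing the loop.
```

The humans only ever see the exceptions, so expert time concentrates on exactly the examples that improve the next model most.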
Human in the loop: a design pattern for managing teams working with ML
Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
Human-in-a-loop: a design pattern for managing teams which leverage ML
Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Intro to Data Science for Enterprise Big Data
Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, plus some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143% for MAP@5. It also scales well to millions of items and users, returning recommendations in tens of milliseconds.
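The embedding-based candidate generation step can be sketched as a nearest-neighbor scan over item vectors. The vectors, item names, and dimensionality below are made up for illustration; the real system learns embeddings from interaction data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# A user embedding and a few item embeddings (toy 3-dimensional values).
user = [0.9, 0.1, 0.3]
items = {
    "vw_golf": [0.8, 0.2, 0.4],
    "bmw_m3":  [0.1, 0.9, 0.2],
    "audi_a4": [0.7, 0.1, 0.5],
}

# Rank all items by similarity to the user and keep the top k as candidates.
k = 2
candidates = sorted(items, key=lambda i: cosine(user, items[i]), reverse=True)[:k]
# candidates -> ["vw_golf", "audi_a4"]
```

At production scale the exhaustive scan would be replaced by approximate nearest-neighbor search, and a separate ranking model would order the shortlisted candidates.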
Hank Roark of H2O gives an overview on data science, machine learning, and H2O.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
I presented these slides as a keynote at the Enterprise Intelligence Workshop at KDD2016 in San Francisco.
In these slides, I describe our work towards developing a Maslow's Hierarchy for Human in the Loop Data Analytics!
Big Data and Data Intensive Computing: Use Cases
Jongwook Woo
This invited talk was hosted by the LG Data Mining Lab at the LG R&D center, Woomyun-dong, Seoul, Korea. It introduces the emerging Hadoop ecosystem (Giraph, Spark, Shark, Flume), Big Data use cases in Korea and the US, and the importance of training.
Big Data and Data Intensive Computing on Networks
Jongwook Woo
Big Data on networks with Hadoop and its ecosystem (Giraph, Flume,...), presented at the Korea Institute of Science and Technology Information. Illustrates some possible approaches to networks.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Python's Role in the Future of Data Analysis
Peter Wang
Why is "big data" a challenge, and what roles do high-level languages like Python have to play in this space?
The video of this talk is at: https://vimeo.com/79826022
This presentation was prepared by one of our renowned tutors, Suraj.
If you are interested in learning more about Big Data, Hadoop, or Data Science, join our free introduction class on 14 Jan at 11 AM GMT. To register your interest, email us at info@uplatz.com
Introduction to Big Data and AI for Business Analytics and Prediction
Jongwook Woo
Big Data has been popular for the last 10 years, using Hadoop and Spark for data analysis and prediction with large-scale data sets in distributed parallel computing systems. Its platform has expanded with NoSQL databases and search engines, and has grown more popular along with cloud computing. More recently, Deep Learning has become a buzzword, built on GPUs and Big Data: it lets even small companies and labs own supercomputers on a small budget, a "dream come true" for IT and business. This talk introduces the history and trends of Big Data and AI platforms, and how predictive analysis should be presented in business using Big Data and AI.
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
Peerasak C.
Watch video: https://youtu.be/O5xeyoRL95U
An introductory lecture for MIT course 6.S094 on the basics of deep learning including a few key ideas, subfields, and the big picture of why neural networks have inspired and energized an entire new generation of researchers. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
CONNECT:
- If you enjoyed this video, please subscribe to this channel.
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
Presentation - "Dealing with uncertainty in fintech using AI" by Evgeny Savin.
Evgeny has been a Senior Data Scientist at smava for the last 3.5 years. His work focuses on the design and implementation of the ML engine that helps customers find the best loans from one of smava's partner banks. Previously, Evgeny worked as a data scientist at HERE and PayPal.
Introduction to Machine Learning - WeCloudData
WeCloudData
In this talk, WeCloudData introduces the lifecycle of machine learning and its tools/ecosystems. For more detail about WeCloudData's machine learning course please visit: https://weclouddata.com/data-science/
Introduction to Deep Learning and AI at Scale for Managers
DataWorks Summit
Deep Learning and the new wave of AI are inevitably coming to your business area. If you are a manager trying to make sense of all the buzzwords, this session is for you. We will show you what Deep Learning is, in a way that helps you understand how it works and how you can apply it. We then expand the scope and apply deep learning and AI techniques in the Big Data context. You will learn about things that don't work out so well, and the risks and challenges in both applying and developing deep learning and AI technologies. We conclude with practical guidance on how to add exciting deep learning and AI capabilities to your next project.
Outline:
- The path to Deep Learning
- From machine learning to Deep Learning
- But how does it work?
- Deep Learning architectures
- Deep Learning applications
- Deep Learning at scale
- Running AI at scale
- Deep learning at Scale using Spark
- The trouble with AI
- Application challenges
- Development challenges
- How to start your first Deep Learning project
I had been meaning to put this talk up for grabs for some time now, but kept forgetting. I was invited to give the keynote speech at the Microstrategy World 2008 conference. The talk was very well received, so here it is.
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Big Data Spain
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc.
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Big Data Spain Conference
16th-17th November, Kinépolis Madrid
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Dr. Sunil Kr. Pandey
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled statistics and data from different sources. It may be useful for students and anyone interested in this field of study.
I presented these slides as a keynote at the Enterprise Intelligence Workshop at KDD2016 in San francisco.
In these slides, I describe our work towards developing a Maslow's Hierarchy for Human in the Loop Data Analytics!
Big Data and Data Intensive Computing: Use CasesJongwook Woo
This invited talk was held by LG Data Mining Lab at LG R&D center, Woomyun-dong, Seoul, Korea. Introduces the emerging Hadoop ecosystems: Giraph, Spark, Shark, Flume and the use cases using Big Data in Korea and US. And, illustrates the importance of taking training.
Big Data and Data Intensive Computing on NetworksJongwook Woo
Big Data on Networks with Hadoop and its ecosystems (Giraph, Flume,...) at Korea Institute of Science and Technology Information. Illustrates some possible approach on Networks
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Python's Role in the Future of Data AnalysisPeter Wang
Why is "big data" a challenge, and what roles do high-level languages like Python have to play in this space?
The video of this talk is at: https://vimeo.com/79826022
This presentation is prepared by one of our renowned tutor "Suraj"
If you are interested to learn more about Big Data, Hadoop, data Science then join our free Introduction class on 14 Jan at 11 AM GMT. To register your interest email us at info@uplatz.com
Introduction to Big Data and AI for Business Analytics and PredictionJongwook Woo
Big Data has been popular last 10 years using Hadoop and Spark for data analysis and prediction with large scale data sets in distributed parallel computing systems. Its platform has expanded using NoSQL DB and Search Engine as well and has been more popular along cloud computing. Then, Deep Learning has become a buzzword past several years using GPU and Big Data. It makes even small companies and labs to own supercomputers with a small amount of budgets, which is the situation of “Dream Comes True” in the IT and business. In this talk, the history and trends of Big Data and AI platforms are introduced and how predictive analysis should be presented in Business using Big Data & AI.
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanPeerasak C.
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
Watch video: https://youtu.be/O5xeyoRL95U
An introductory lecture for MIT course 6.S094 on the basics of deep learning including a few key ideas, subfields, and the big picture of why neural networks have inspired and energized an entire new generation of researchers. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
CONNECT:
- If you enjoyed this video, please subscribe to this channel.
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
Presentation - "Dealing with uncertainty in fintech using AI" by Evgeny Savin.
Evgeny is a Senior Data Scientist at smava for the last 3,5 years. His work focused on the design and implementation of the ML engine that helps customers find the best loans from one of the smava partner banks. Previously, Evgeny worked as a data scientist at HERE and Paypal.
Introduction to Machine Learning - WeCloudDataWeCloudData
In this talk, WeCloudData introduces the lifecycle of machine learning and its tools/ecosystems. For more detail about WeCloudData's machine learning course please visit: https://weclouddata.com/data-science/
Introduction to Deep Learning and AI at Scale for Managers – DataWorks Summit
Deep Learning and the new wave of AI are inevitably coming to your business area. If you are a manager trying to make sense of all the buzzwords, this session is for you. We will show you what Deep Learning is, in a way that helps you understand how it works and how you can apply it. We then expand the scope and apply deep learning and AI techniques in the Big Data context. You will learn about things that don't work out so well, and the risks and challenges in both applying and developing with deep learning and AI technologies. We conclude with practical guidance on how to add exciting deep learning and AI capabilities to your next project.
Outline:
- The path to Deep Learning
- From machine learning to Deep Learning
- But how does it work?
- Deep Learning architectures
- Deep Learning applications
- Deep Learning at scale
- Running AI at scale
- Deep learning at Scale using Spark
- The trouble with AI
- Application challenges
- Development challenges
- How to start your first Deep Learning project
I had been meaning to put this talk up for some time now, but kept forgetting. I was invited to give the keynote speech for the MicroStrategy World 2008 conference. The talk was very well received, so here it is.
Human-in-the-loop: a design pattern for managing teams which leverage ML by P... – Big Data Spain
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc.
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Big Data Spain Conference
16th -17th November - Kinépolis Madrid
Data Science - An emerging Stream of Science with its Spreading Reach & Impact – Dr. Sunil Kr. Pandey
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of study.
Dashboards are Dumb Data - Why Smart Analytics Will Kill Your KPIs – Luciano Pesci, PhD
Organizations of every size have access to data dashboard technology, yet none of the solutions have delivered on their hype. Right now, across the world, executives and analysts are staring at a dashboard and thinking the same thing: "so what?"
The failure of dashboards to deliver meaningful insights is inherent in their simplicity: they only show surface level information, and not the relationships between data points that really drive the fate of your organization.
But all is not lost! By combining the right mix of technology and human expertise in business strategy, research and data mining you can embrace the smart analytics movement, and start accessing insights that grow your company and your competitive position.
You can watch the accompanying webinar here: https://youtu.be/RdOcPxv9wLs
Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.
Slide deck used to foster discussion with museum colleagues about the current trends, ideas, aspirations and challenges of digital strategy and implementation. Includes a short list of concerns and (exciting or even daunting) future trends. Nothing comprehensive here, just some jumping off points for discussion and debate.
This presentation was provided by Jake Zarnegar of Silverchair, during the NFAIS Forethought event "Artificial Intelligence #2 – Processes for Media Analysis and Extraction" The webinar was held on May 20, 2020.
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Join Marcel Kurovski to explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143% for MAP@5. It also scales well to millions of items and users, returning recommendations in tens of milliseconds.
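The MAP@5 metric quoted above can be computed with average precision at k. A minimal sketch, using the common convention of normalizing by min(k, number of relevant items); the example inputs are illustrative, not from the talk:

```python
def average_precision_at_k(recommended, relevant, k=5):
    """AP@k for one user: precision at each hit position, averaged
    over min(k, |relevant|)."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(k, len(relevant)) if relevant else 0.0

def map_at_k(recs_per_user, relevant_per_user, k=5):
    """MAP@k: mean of AP@k across all users."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(recs_per_user, relevant_per_user)]
    return sum(aps) / len(aps)
```

Note that some libraries normalize by |relevant| instead; reported MAP@5 figures depend on which convention is used.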
Event: O'Reilly Artificial Intelligence Conference, New York, 18.04.2019
Speaker: Marcel Kurovski, inovex GmbH
Mehr Tech-Vorträge: inovex.de/vortraege
Mehr Tech-Artikel: inovex.de/blog
20240104 HICSS Panel on AI and Legal Ethical 20240103 v7.pptx – ISSIP
Ethical and legal implications raised by Generative AI and Augmented Reality in the workplace.
Souren Paul - https://www.linkedin.com/in/souren-paul-a3bbaa5/
Event: https://kmeducationhub.de/hawaii-international-conference-on-system-sciences-hicss/
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, it covers "Oriole" tutorials, which combine video with Jupyter notebooks and Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
Strata UK 2017. Computable content leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
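As a rough illustration of the graph-ranking idea behind TextRank (not PyTextRank's actual API), here is a toy pure-Python sketch that ranks words by PageRank over a word co-occurrence graph. The stopword list and window size are bare-bones demo choices; real implementations add POS filtering, lemmatization, and phrase chunking:

```python
from collections import defaultdict

def textrank_keywords(text, window=2, top_k=3, damping=0.85, iters=30):
    """Toy TextRank: rank words by PageRank over a co-occurrence graph."""
    stop = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "from"}
    words = [w.lower().strip(".,;:") for w in text.split()]
    cand = [w for w in words if w.isalpha() and w not in stop]
    # undirected co-occurrence edges within a sliding window
    nbrs = defaultdict(set)
    for i, w in enumerate(cand):
        for j in range(i + 1, min(i + 1 + window, len(cand))):
            if w != cand[j]:
                nbrs[w].add(cand[j])
                nbrs[cand[j]].add(w)
    # PageRank scores by power iteration
    rank = {w: 1.0 for w in nbrs}
    for _ in range(iters):
        rank = {w: (1 - damping) + damping *
                   sum(rank[v] / len(nbrs[v]) for v in nbrs[w])
                for w in nbrs}
    return sorted(rank, key=rank.get, reverse=True)[:top_k]
```

Highly connected words (those co-occurring with many distinct neighbors) bubble to the top, which is why the method surfaces keyphrases rather than merely frequent terms.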
Use of standards and related issues in predictive analytics – Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
Presented 2015-08-24 at SF Bay ACM, held at the eBay south campus in San Jose.
http://meetup.com/SF-Bay-ACM/events/221693508/
Project Jupyter https://jupyter.org/ evolved from IPython notebooks, and now supports a wide variety of programming language back-ends. Notebooks have proven to be effective tools used in Data Science, providing convenient packages for what Don Knuth coined as "literate programming" in the 1980s: code plus exposition in markdown. Results of running the code appear in-line as interactive graphics -- all packaged as collaborative, web-based documents. Some have said that the introduction of cloud-based notebooks is nearly as large of a fundamental change in software practice as the introduction of spreadsheets.
O'Reilly Media has been considering the question, "What comes after books and video?" Or, as one might imagine more pointedly, what comes after Kindle? To that point we have collaborated with Project Jupyter to integrate notebooks into our content management process, allowing authors to generate articles, tutorials, reports, and other media products as notebooks that also incorporate video segments. Code dependencies are containerized using Docker, and all of the content gets managed in Git repositories. We have added another layer, an open source project called Thebe that provides a kind of "media player" for embedding the containerized notebooks into web pages
GalvanizeU Seattle: Eleven Almost-Truisms About Data – Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study the technologies, the processes, and the people involved.
Microservices, containers, and machine learning – Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. A natural language processing service in Python (based on NLTK, TextBlob, WordNet, etc.) gets containerized and used to crawl and parse email archives. These produce JSON data sets; then we run machine learning on a Spark cluster to surface insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
QCon São Paulo: Real-Time Analytics with Spark Streaming – Paco Nathan
"Real-Time Analytics with Spark Streaming" presented at QCon São Paulo, 2015-03-26
http://qconsp.com/presentation/real-time-analytics-spark-streaming
This talk presents an overview of Spark and its history and applications, then focuses on the Spark Streaming component used for real-time analytics. We compare it with earlier frameworks such as MillWheel and Storm, and explore industry motivations for open-source micro-batch streaming at scale.
The talk will include demos for streaming apps that include machine-learning examples. We also consider public case studies of production deployments at scale.
We’ll review the use of open-source sketch algorithms and probabilistic data structures that get leveraged in streaming – for example, trading a 4% error bound on real-time metrics for a two-orders-of-magnitude reduction in the required memory footprint of a Spark app.
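As a concrete example of that trade-off, here is a minimal count-min sketch of the kind of probabilistic data structure the talk refers to: counts may overestimate but never underestimate, in exchange for a fixed, small memory footprint. The width/depth values are arbitrary demo settings:

```python
import hashlib

class CountMinSketch:
    """Approximate counting in fixed memory: counts can overestimate
    (hash collisions), never underestimate."""
    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # one independent hash per row, derived from a salted digest
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def count(self, item):
        # the minimum across rows bounds the overestimation error
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

Widening the table tightens the error bound; the memory cost stays independent of how many distinct items stream through.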
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More – Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
Microservices, Containers, and Machine Learning – Paco Nathan
Session talk for Data Day Texas 2015, showing GraphX and SparkSQL for text analytics and graph analytics of an Apache developer email list -- including an implementation of TextRank in Spark.
Databricks Meetup @ Los Angeles Apache Spark User Group – Paco Nathan
Los Angeles Apache Spark Users Group 2014-12-11 http://meetup.com/Los-Angeles-Apache-Spark-Users-Group/events/218748643/
A look ahead at Spark Streaming in Spark 1.2 and beyond, with case studies, demos, plus an overview of approximation algorithms that are useful for real-time analytics.
How Apache Spark fits into the Big Data landscape – Paco Nathan
How Apache Spark fits into the Big Data landscape http://www.meetup.com/Washington-DC-Area-Spark-Interactive/events/217858832/
2014-12-02 in Herndon, VA and sponsored by Raytheon, Tetra Concepts, and MetiStream
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Key Trends Shaping the Future of Infrastructure.pdf – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Humans in a loop: Jupyter notebooks as a front-end for AI
1. Humans in a loop: Jupyter notebooks as a front-end for AI
Paco Nathan @pacoid
Dir, Learning Group @ O’Reilly Media
JupyterCon 2017-08-24
2. Framing
Imagine having a mostly-automated system where people and machines collaborate together… May sound a bit Sci-Fi – though arguably commonplace.
One challenge is whether we can advance beyond rote tasks. Instead of simply running code libraries, can machines make difficult decisions, exercise judgement in complex situations?
Can we build systems in which people who aren’t AI experts can “teach” machines to perform complex work based on examples, not code?
3. Research questions
▪ How do we personalize learning experiences, across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.?
▪ How do we help experts — by definition, really busy people — share knowledge with their peers in industry?
▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
▪ How do we help organizations learn and transform continuously?
7. Machine learning
supervised ML:
▪ take a dataset where each element has a label
▪ train models on a portion of the data to predict the labels, test on the remainder
▪ deep learning is a popular example, though only if you have lots of labeled training data available
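The supervised workflow just described — train on a labeled portion, test on the held-out remainder — can be sketched with a toy nearest-centroid model on 1-D points. The data and model here are illustrative, not from the talk:

```python
import random

def train_test_split(data, test_frac=0.25, seed=42):
    """Hold out a portion of the labeled data for evaluation."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    """'Train' a nearest-centroid model: the mean point per label."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# toy labeled dataset: 1-D points tagged "low" or "high"
data = [(v, "low") for v in (1, 2, 3, 4)] + [(v, "high") for v in (10, 11, 12, 13)]
train, test = train_test_split(data)
model = fit_centroids(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
```

The held-out test accuracy, not training fit, is what estimates how the model will behave on unseen data.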
8. Machine learning
unsupervised ML:
▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
▪ for example, clustering algorithms such as K-means
▪ unsupervised approaches for AI are an open research question
9. Active learning
special case of semi-supervised ML:
▪ send difficult calls / edge cases to experts; let algorithms handle routine decisions
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where the cost of labeling is the expense
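The triage just described — algorithms handle routine decisions, edge cases go to experts — can be sketched as a confidence-threshold router. The thresholds and document scores here are illustrative:

```python
def route_items(items, score_fn, low=0.25, high=0.75):
    """Active-learning triage: confident predictions get handled
    automatically; uncertain ones are deferred to human experts."""
    automated, expert_queue = [], []
    for item in items:
        p = score_fn(item)  # model's probability for the positive class
        if p <= low or p >= high:
            automated.append((item, p >= high))  # machine decides
        else:
            expert_queue.append(item)            # human judgement call
    return automated, expert_queue

# hypothetical model scores for three documents
scores = {"doc_a": 0.95, "doc_b": 0.05, "doc_c": 0.55}
automated, expert_queue = route_items(list(scores), scores.get)
```

Labels the experts supply for the queued items feed back into the next round of training, which is what makes the loop "active".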
10. Active learning
Real-World Active Learning: Applications and Strategies for Human-in-the-Loop Machine Learning
radar.oreilly.com/2015/02/human-in-the-loop-machine-learning.html
Ted Cuzzillo
O’Reilly Media, 2015-02-05
Develop a policy for how human experts select exemplars:
▪ bias toward labels most likely to influence the classifier
▪ bias toward ensemble disagreement
▪ bias toward denser regions of training data
11. Active learning
Data preparation in the age of deep learning
oreilly.com/ideas/data-preparation-in-the-age-of-deep-learning
Lukas Biewald
CrowdFlower
O’Reilly Data Show, 2017-05-04
send human workers cases where machine learning algorithms signal uncertainty (low probability scores) …or when your ensemble of ML algorithms signals disagreement
12. Design pattern: Human-in-the-loop
Building a business that combines human experts and data science
oreilly.com/ideas/building-a-business-that-combines-human-experts-and-data-science-2
Eric Colson
StitchFix
O’Reilly Data Show, 2016-01-28
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
13. Design pattern: Human-in-the-loop
Strategies for integrating people and machine learning in online systems
safaribooksonline.com/library/view/oreilly-artificial-intelligence/9781491976289/video311857.html
Jason Laska
Clara Labs
The AI Conf, 2017-06-29
how to create a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
14. Design pattern: Human-in-the-loop
Building human-assisted AI applications
oreilly.com/ideas/building-human-assisted-ai-applications
Adam Marcus
B12
O’Reilly Data Show, 2016-08-25
Orchestra: a platform for building human-assisted AI applications, e.g., to create business websites
https://github.com/b12io/orchestra
example: http://www.coloradopicked.com/
15. Design pattern: Flash teams
Expert Crowdsourcing with Flash Teams
hci.stanford.edu/publications/2014/flashteams/flashteams-uist2014.pdf
Daniela Retelny, et al.
Stanford HCI
“A flash team is a linked set of modular tasks that draw upon paid experts from the crowd, often three to six at a time, on demand”
http://stanfordhci.github.io/flash-teams/
16. Weak supervision / Data programming
Creating large training data sets quickly
oreilly.com/ideas/creating-large-training-data-sets-quickly
Alex Ratner
Stanford
O’Reilly Data Show, 2017-06-08
Snorkel: “weak supervision” and “data programming” as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
17. Reinforcement learning
Reinforcement learning explained
oreilly.com/ideas/reinforcement-learning-explained
Junling Hu
AI Frontiers
O’Reilly Radar, 2016-12-08
learning to act based on long-term payoffs, often as rewards/punishments of an actor within a simulation
19. AI in Media
▪ content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
▪ labeled images get really interesting
▪ text or images within a context have inherent structure
▪ representation of that kind of structure is rare in the Media vertical – so far
21. Which parts do people or machines do best?
team goal: maintain structural correspondence between the layers
big win for AI: inferences across the graph
human scale: primary structure, control points, testability
machine generated: data products, ~80% of the graph
22. Ontology
▪ provides context which Deep Learning lacks
▪ aka, “knowledge graph” – a computable thesaurus
▪ maps the semantics of business relationships
▪ S/V/O: “nouns”, some “verbs”, a few “adjectives”
▪ difficult work, a relatively expensive investment, potentially high ROI
▪ conversational interfaces (e.g., Google Assistant) improve UX by importing ontologies
26. Disambiguating contexts
Suppose someone publishes a book which uses the term `IOS`: are they talking about an operating system for an Apple iPhone, or about an operating system for a Cisco router? We handle lots of content about both.
Disambiguating those contexts is important for good UX in personalized learning. In other words, how do machines help people distinguish that content within search?
Potentially a good case for deep learning, except for the lack of labeled data at scale.
27. Disambiguation in content discovery
Consider searching for the term `react` on Google. The first page of results:
▪ acting coaches
▪ video games
▪ student engagement
▪ children’s charities
▪ UI web components
▪ surveys
28. Active learning through Jupyter
Jupyter notebooks are used to manage ML pipelines for disambiguation, where machines and people collaborate:
▪ notebooks as one part configuration file, one part data sample, one part structured log, one part data visualization tool
▪ ML based on examples, not feature engineering, model parameters, etc.
29. Active learning through Jupyter
1. Experts use notebooks to provide examples of book chapters, video segments, etc., for each key phrase that has overlapping contexts
2. Machines build ensemble ML models based on those examples, updating notebooks with model evaluation
3. Machines attempt to annotate labels for millions of pieces of content, e.g., `AlphaGo`, `Golang`, versus a mundane use of the verb `go`
4. Disambiguation can run mostly automated, in parallel at scale – through integration with Apache Spark
5. In cases where ensembles disagree, ML pipelines defer to human experts who make judgement calls, providing further examples
6. New examples go into training ML pipelines to build better models
7. Rinse, lather, repeat
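The "defer to experts when ensembles disagree" step above can be sketched as a majority-vote check; the labels and agreement threshold here are illustrative:

```python
from collections import Counter

def ensemble_decision(votes, min_agreement=0.75):
    """Accept the majority label when the ensemble agrees strongly;
    otherwise return None, signalling 'defer to a human expert'."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# three of four models read the mention as Golang: accept the majority label
auto_label = ensemble_decision(["Golang", "Golang", "Golang", "verb"])
# a 2-2 split falls below the agreement threshold: route to an expert
deferred = ensemble_decision(["Golang", "Golang", "verb", "verb"])
```

Items that come back as None join the expert queue, and the experts' judgement calls become fresh training examples for the next model build.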
31. Active learning through Jupyter
▪ Jupyter notebooks allow human experts to access the internals of a mostly automated ML pipeline, rapidly
▪ Stated another way, both the machines and the people become collaborators on shared documents
▪ Anticipates upcoming collaborative document features in JupyterLab
32. Active learning through Jupyter
Open source nbtransom package:
▪ https://github.com/ceteri/nbtransom
▪ https://pypi.python.org/pypi/nbtransom
▪ based on use of nbformat and pandas
▪ notebook as both Py data store and analytics
▪ custom “pretty-print” helps with use of Git for diffs, commits, etc.
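Since a notebook file is just JSON, the "notebook as data store and structured log" idea can be sketched without dependencies. nbtransom builds on nbformat for this; the raw JSON shape below is only meant to show the pattern, and the cell contents (including the F1 figure) are illustrative:

```python
import json

# minimal notebook: a markdown cell plus a code cell holding expert labels
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": "## Labels provided by experts"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": 'examples = {"IOS": "cisco-router"}'},
    ],
}

def append_log_cell(notebook, message):
    """A machine appends a structured-log entry as a markdown cell."""
    notebook["cells"].append(
        {"cell_type": "markdown", "metadata": {}, "source": message})

# the pipeline writes its model evaluation back into the shared document
append_log_cell(nb, "model eval: F1=0.87")
serialized = json.dumps(nb, indent=1)  # pretty-printing keeps Git diffs readable
```

Because both people and machines read and write the same serialized document, version control gives an audit trail of the whole collaboration.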
33. Nuances
▪ No Free Lunch theorem: it’s better to err on the side of fewer false positives / more false negatives for this use case
▪ Bias toward exemplars most likely to influence the classifier
▪ Potentially, the “experts” may be Customer Service staff who review edge cases within search results or recommended content – as an integral part of our UI – then re-train the ML pipelines through examples
35. Human-in-the-loop as management strategy
personal op-ed: the “game” isn’t to replace people – instead it’s about leveraging AI to augment staff, so organizations can retain people with valuable domain expertise, making their contributions and expertise even more vital
37. Acknowledgements
Many thanks to people who’ve helped with this work:
▪ Taylor Martin
▪ Eszti Schoell
▪ Fernando Perez
▪ Andrew Odewahn
▪ Scott Murray
▪ John Allwine
▪ Pascal Honscher
38. Strata Data: NY, Sep 25-28; SG, Dec 4-7; SJ, Mar 5-8, 2018; UK, May 21-24, 2018
The AI Conf: SF, Sep 17-20; NY, Apr 29-May 2, 2018
OSCON (returns to Portland!): PDX, Jul 16-19, 2018
39.
Get Started with NLP in Python
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
updates, reviews, conference summaries…
liber118.com/pxn/
@pacoid