Alex Smola is the Manager of the Cloud Machine Learning Platform at Amazon. Before joining Amazon, Smola was a Professor in the Machine Learning Department at Carnegie Mellon University and cofounder and CEO of Marianas Labs. Earlier he worked at Google Strategic Technologies, Yahoo Research, and National ICT Australia, and before CMU he was a professor at UC Berkeley and the Australian National University. Alex obtained his PhD from TU Berlin in 1998. He has published over 200 papers and written or coauthored five books.
Abstract summary
Personalization and Scalable Deep Learning with MXNet: User return times and movie preferences are inherently time dependent. In this talk I will show how this can be modeled efficiently using deep learning by employing an LSTM (Long Short-Term Memory) network. Moreover, I will show how to train large-scale distributed parallel models efficiently using MXNet. This includes a brief overview of the key components of defining networks and optimization, and a walkthrough of the steps required to allocate machines and train a model.
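The gating mechanism at the heart of an LSTM can be sketched in a few lines of plain Python. This is an illustrative single-step cell with scalar state and made-up weights, not the MXNet code from the talk:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # One LSTM step for scalar input/state, showing the four gates.
    # w maps each gate name to an (input-weight, recurrent-weight, bias) triple.
    def gate(name, squash):
        wx, wh, b = w[name]
        return squash(wx * x + wh * h_prev + b)
    i = gate("i", sigmoid)       # input gate: how much new information to admit
    f = gate("f", sigmoid)       # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)       # output gate: how much state to expose
    g = gate("g", math.tanh)     # candidate cell state
    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # new hidden state
    return h, c

# Toy weights (arbitrary, for illustration only).
w = {k: (0.5, 0.1, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:       # a short "event" sequence
    h, c = lstm_step(x, h, c, w)
```

A real model would use vector-valued states, learned weights, and a framework such as MXNet; the gate algebra, however, is exactly this.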
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon University (MLconf)
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results e.g. in visual question answering.
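The push/pull pattern behind a Parameter Server can be sketched in a single process. The class and worker loop below are hypothetical simplifications (a real server shards keys across machines and applies worker updates asynchronously):

```python
class ParameterServer:
    """Toy in-process parameter server: workers pull weights, push gradients."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def pull(self):
        return list(self.w)              # workers receive a copy of the weights

    def push(self, grad):
        # The server applies gradients as they arrive from workers.
        self.w = [wi - self.lr * gi for wi, gi in zip(self.w, grad)]

server = ParameterServer(dim=2)
shards = [[1.0, 2.0], [3.0, 4.0]]        # each "worker" sees one data shard
for _ in range(200):
    for target in shards:                # round-robin stands in for parallelism
        w = server.pull()
        # Gradient of the quadratic loss ||w - target||^2 on this shard.
        grad = [2.0 * (wi - ti) for wi, ti in zip(w, target)]
        server.push(grad)
```

The weights settle near the average of the shard targets, illustrating how independent workers jointly optimize shared parameters.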
Le Song, Assistant Professor, College of Computing, Georgia Institute of Technology (MLconf)
Understanding Deep Learning for Big Data: The complexity and scale of big data impose tremendous challenges for their analysis. Yet, big data also offer us great opportunities. Some nonlinear phenomena, features or relations, which are not clear or cannot be inferred reliably from small and medium data, now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us, and needs to be learned from data as well. Being able to harness the nonlinear structures in big data could allow us to tackle problems which were impossible before, or obtain results far better than the previous state of the art.
Nowadays, deep neural networks are the methods of choice when it comes to large scale nonlinear learning problems. What makes deep neural networks work? Is there any general principle for tackling high dimensional nonlinear problems which we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, my machine learning group performed both theoretical and experimental analyses of existing and new deep learning architectures, investigating three crucial aspects: the usefulness of the fully connected layers, the advantage of the feature learning process, and the importance of the compositional structures. Our results point to some promising directions for future research, and provide guidelines for building new deep learning models.
Sergei Vassilvitskii, Research Scientist, Google, at MLconf NYC - 4/15/16 (MLconf)
Teaching K-Means New Tricks: Over 50 years old, the k-means algorithm remains one of the most popular clustering algorithms. In this talk we’ll cover some recent developments, including better initialization, the notion of coresets, clustering at scale, and clustering with outliers.
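One of the initialization improvements the talk alludes to, k-means++ seeding, fits in a few lines: each new center is drawn with probability proportional to its squared distance from the nearest existing center. This is a generic sketch on 1-D points, not code from the talk:

```python
import random

def kmeans_pp_init(points, k, rng=random):
    """k-means++ seeding: favor points far from the centers chosen so far."""
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest current center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        total = sum(d2)
        r = rng.random() * total         # sample proportionally to d2
        acc = 0.0
        for p, weight in zip(points, d2):
            acc += weight
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [0.0, 0.1, 0.2, 10.0, 10.1, 20.0]   # three well-separated clumps
random.seed(0)
centers = kmeans_pp_init(points, 3)
```

Compared with uniform random seeding, this makes picking one center per clump far more likely, which is what yields the well-known O(log k) approximation guarantee.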
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017 (MLconf)
Corinna Cortes is a Danish computer scientist known for her contributions to machine learning. She is currently the Head of Google Research, New York. Cortes is a recipient of the Paris Kanellakis Theory and Practice Award for her work on theoretical foundations of support vector machines.
Cortes received her M.S. degree in physics from Copenhagen University in 1989. In the same year she joined AT&T Bell Labs as a researcher and remained there for about ten years. She received her Ph.D. in computer science from the University of Rochester in 1993. Cortes currently serves as the Head of Google Research, New York. She is an Editorial Board member of the journal Machine Learning.
Cortes’ research covers a wide range of topics in machine learning, including support vector machines and data mining. In 2008, she and Vladimir Vapnik jointly received the Paris Kanellakis Theory and Practice Award for the development of support vector machines (SVM), a highly effective algorithm for supervised learning. Today, SVM is one of the most frequently used algorithms in machine learning, with many practical applications, including medical diagnosis and weather forecasting.
Abstract Summary:
Harnessing Neural Networks:
Deep learning has demonstrated impressive performance gains in many machine learning applications. However, unveiling and realizing these performance gains is not always straightforward. Discovering the right network architecture is critical for accuracy and often requires a human in the loop. Some network architectures occasionally produce spurious outputs, which have to be restricted to meet the needs of an application. Finally, realizing the performance gains in a production system can be difficult because of long inference times.
In this talk we discuss methods for making neural networks efficient in production systems. We also discuss an efficient method for automatically learning the network architecture, called AdaNet. We provide theoretical arguments for the algorithm and present experimental evidence for its effectiveness.
Introduction to Deep Learning with Python (indico data)
A presentation by Alec Radford, Head of Research at indico Data Solutions, on deep learning with Python's Theano library.
The emphasis of the presentation is high performance computing, natural language processing (using recurrent neural nets), and large scale learning with GPUs.
Video of the talk available here: https://www.youtube.com/watch?v=S75EdAcXHKk
Part 2 of the Deep Learning Fundamentals Series, this session discusses Tuning Training (including hyperparameters, overfitting/underfitting), Training Algorithms (including different learning rates, backpropagation), Optimization (including stochastic gradient descent, momentum, Nesterov Accelerated Gradient, RMSprop, Adaptive algorithms - Adam, Adadelta, etc.), and a primer on Convolutional Neural Networks. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
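The update rules this session covers can be compared on a toy quadratic. The following plain-Python sketch is illustrative only (it is not the Keras/Databricks demo code; the hyperparameter values are arbitrary):

```python
# Minimize f(w) = (w - 3)^2 with three common update rules.
def grad(w):
    return 2.0 * (w - 3.0)

def sgd_momentum(steps=100, lr=0.1, mu=0.9):
    w, v = 0.0, 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)        # velocity accumulates past gradients
        w += v
    return w

def rmsprop(steps=200, lr=0.1, rho=0.9, eps=1e-8):
    w, s = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g  # running mean of squared gradients
        w -= lr * g / ((s ** 0.5) + eps) # per-parameter scaled step
    return w

def adam(steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first moment (momentum)
        v = b2 * v + (1 - b2) * g * g    # second moment (scaling)
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero init
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / ((v_hat ** 0.5) + eps)
    return w
```

All three drive w toward the minimum at 3; the differences show up in how each scales and smooths the raw gradient, which is the point the slides elaborate.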
Daniel Shank, Data Scientist, Talla, at MLconf SF 2016 (MLconf)
Neural Turing Machines: Perils and Promise: Daniel Shank is a Senior Data Scientist at Talla, a company developing a platform for intelligent information discovery and delivery. His focus is on developing machine learning techniques to handle various business automation tasks, such as scheduling, polls, expert identification, as well as doing work on NLP. Before joining Talla as the company’s first employee in 2015, Daniel worked with TechStars Boston and did consulting work for ThriveHive, a small business focused marketing company in Boston. He studied economics at the University of Chicago.
Language translation with Deep Learning (RNN) with TensorFlow (S N)
The author is going to take you into the realm of Recurrent Neural Networks (RNNs). He will train a sequence-to-sequence model on a dataset of English and French sentences that can translate new (unseen) sentences from English to French.
This will be a walkthrough of an end-to-end technique for training a deep RNN model. You will learn to build the various components necessary for a sequence-to-sequence model.
You will learn the fundamentals of deep learning, mainly the RNN concepts required for this solution. Familiarity with deep learning concepts would be handy, but most of the concepts used in this example will be covered during the demo.
Technologies to be used:
Python, Jupyter, TensorFlow, FloydHub
Source code: https://github.com/syednasar/deeplearning/blob/master/language-translation/dlnd_language_translation.ipynb
...
Distributed implementation of an LSTM on Spark and TensorFlow (Emanuel Di Nardo)
An academic project developing an LSTM, distributing it on Spark and using TensorFlow for numerical operations.
Source code: https://github.com/EmanuelOverflow/LSTM-TensorSpark
How to win data science competitions with Deep Learning (Sri Ambati)
Note: Please download the slides first, otherwise some links won't work!
How to win Kaggle-style data science competitions and influence decisions with R, Deep Learning and H2O's fast algorithms.
We take a few public and Kaggle datasets and build models that win competitions on accuracy and scoring speed.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Slides from the presentation given at M^3 conference: http://www.mcubed.london/
The idea is to use three statements to describe and start working with the TensorFlow library.
Applying your Convolutional Neural Networks (Databricks)
Part 3 of the Deep Learning Fundamentals Series, this session starts with a quick primer on activation functions, learning rates, optimizers, and backpropagation. Then it dives deeper into convolutional neural networks discussing convolutions (including kernels, local connectivity, strides, padding, and activation functions), pooling (or subsampling to reduce the image size), and fully connected layer. The session also provides a high-level overview of some CNN architectures. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
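The convolution and pooling operations described above can be sketched in plain Python. This toy example (a hypothetical 2x2 edge-detecting kernel on a 4x4 image) is illustrative, not the Keras demo code from the slides:

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (strictly, cross-correlation, as in most
    deep learning libraries): slide the kernel over every position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(img[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling halves each spatial dimension."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A vertical-edge kernel applied to an image with an edge down the middle.
img = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
feat = conv2d_valid(img, kernel)     # each row of feat is [0, 2, 0]
pooled = max_pool2x2(feat)           # pooling keeps the strongest response
```

The same local connectivity, stride, and subsampling ideas carry over directly to the multi-channel, learned-kernel layers a real CNN uses.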
AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306) (Amazon Web Services)
For many companies, recommendation systems solve important machine learning problems. But as recommendation systems grow to millions of users and millions of items, they pose significant challenges when deployed at scale. The user-item matrix can have trillions of entries (or more), most of which are zero. To make common ML techniques practical, sparse data requires special techniques. Learn how to use MXNet to build neural network models for recommendation systems that can scale efficiently to large sparse datasets.
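The sparsity issue can be illustrated with a minimal dictionary-of-dictionaries representation: store only nonzero ratings, and make every operation cost proportional to the nonzeros rather than to the full matrix. This is a generic sketch, not MXNet's sparse API:

```python
# Store only the nonzero entries of the user-item matrix: {user: {item: rating}}.
# The dense matrix would have len(users) * len(items) entries, almost all zero.
ratings = {
    "u1": {"i1": 5.0, "i3": 2.0},
    "u2": {"i1": 4.0, "i2": 1.0},
    "u3": {"i3": 3.0},
}

def sparse_dot(row_a, row_b):
    """Dot product of two sparse rows; the cost is proportional to the
    nonzeros of the smaller row, not to the total number of items."""
    if len(row_a) > len(row_b):
        row_a, row_b = row_b, row_a      # iterate over the shorter row
    return sum(v * row_b[k] for k, v in row_a.items() if k in row_b)

# Unnormalized similarity between two users: only shared items contribute.
sim_12 = sparse_dot(ratings["u1"], ratings["u2"])   # only i1 overlaps: 5 * 4
sim_13 = sparse_dot(ratings["u1"], ratings["u3"])   # only i3 overlaps: 2 * 3
```

Production systems use compressed formats (e.g. CSR) and hardware-friendly kernels for the same idea, which is what lets recommendation models scale to trillions of potential entries.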
by Vikram Madan, Sr. Product Manager, AWS Deep Learning
In this workshop, we will cover deep learning fundamentals and focus on the powerful and scalable Apache MXNet open source deep learning framework. At the end of this tutorial you’ll be able to train your own deep neural network and fine-tune existing state-of-the-art models for image and object recognition. We’ll also take a deep dive into setting up your deep learning infrastructure on AWS and model deployment on AWS Lambda.
Slides to support Austin Machine Learning Meetup, 1/19/2015.
An overview of techniques from recent Kaggle code for performing online logistic regression with FTRL-proximal (SGD, L1/L2 regularization) and the hashing trick.
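The FTRL-proximal update with the hashing trick can be sketched in plain Python, in the spirit of the well-known Kaggle scripts. The hyperparameters and the toy feature stream below are arbitrary; this is an illustration, not the code the slides review:

```python
import math
import zlib

D = 2 ** 20                      # size of the hashed feature space

def hash_features(raw):
    """Hashing trick: map raw 'field=value' strings to column indices."""
    return [zlib.crc32(f.encode()) % D for f in raw]

class FTRLProximal:
    """Per-coordinate FTRL-proximal for online logistic regression with
    L1/L2 regularization; weights are stored lazily and sparsely."""
    def __init__(self, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}              # accumulated shifted gradients
        self.n = {}              # accumulated squared gradients

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:    # L1 keeps weak coordinates at exactly zero
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        wtx = sum(self._weight(i) for i in x)
        return 1.0 / (1.0 + math.exp(-max(min(wtx, 35.0), -35.0)))

    def update(self, x, p, y):
        g = p - y                # log-loss gradient for binary features
        for i in x:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g

model = FTRLProximal()
stream = [(["color=red"], 1.0), (["color=blue"], 0.0)]  # toy labeled events
for _ in range(200):
    for raw, y in stream:
        x = hash_features(raw)
        p = model.predict(x)     # predict first, then learn from the label
        model.update(x, p, y)
```

Because features are hashed into a fixed-size space and weights are materialized lazily, the same loop handles millions of raw feature strings in bounded memory.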
Introduction to deep learning @ Startup.ML by Andres Rodriguez (Intel Nervana)
Deep learning is unlocking tremendous economic value across various market sectors. Individual data scientists can draw from several open source frameworks and basic hardware resources during the very initial investigative phases, but quickly require significant hardware and software resources to build and deploy production models. Intel offers various software and hardware to support a diversity of workloads and user needs. Intel Nervana delivers a competitive deep learning platform to make it easy for data scientists to start from the iterative, investigatory phase and take models all the way to deployment. This platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data.
Yi Wang, Tech Lead of AI Platform, Baidu, at MLconf 2017 (MLconf)
Yi Wang is the tech lead of the AI Platform at Baidu. The team is a primary contributor to PaddlePaddle, the open source deep learning platform originally developed at Baidu. Before Baidu, he was a founding member of ScaledInference, a Palo Alto-based AI startup. Before that, he held a senior staff position at LinkedIn, was engineering director of the advertising system at Tencent, and was a researcher at Google.
Abstract Summary:
Fault-tolerable Deep Learning on General-purpose Clusters:
Researchers are used to running deep learning jobs on dedicated clusters. In industrial applications, however, AI is built on top of big data, and deep learning is only one stage of the data pipeline. That is where MPI-based clusters are not enough: general-purpose cluster management systems are necessary to run Web servers like Nginx; log collectors like Fluentd and Kafka; data processors on top of Hadoop, Spark, and Storm; and deep learning, which improves the quality of the Web service. This talk explains how we integrate PaddlePaddle and Kubernetes to provide an open source, fault-tolerable, large-scale deep learning platform.
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017 (MLconf)
Ben Lau is a quantitative researcher at a macro hedge fund in Hong Kong, where he applies mathematical models and signal processing techniques to study the financial market. Prior to joining the financial industry, he used his mathematical modelling skills to probe the mysteries of the universe at the Stanford Linear Accelerator Center, a national accelerator laboratory, where he studied the asymmetry between matter and antimatter by analysing tens of billions of collision events created by particle accelerators. Ben was awarded his Ph.D. in Particle Physics from Princeton University and his undergraduate degree (with First Class Honours) from the Chinese University of Hong Kong.
Abstract Summary:
Deep Reinforcement Learning: Developing a robotic car with the ability to form long-term driving strategies is key to enabling fully autonomous driving in the future. Reinforcement learning has been considered a strong AI paradigm that can be used to teach machines through interaction with the environment and by learning from their mistakes. In this talk, we will discuss how to apply deep reinforcement learning techniques to train a self-driving car in TORCS, an open source racing car simulator. I will share how this is implemented and discuss various challenges in this project.
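The learning principle behind this can be illustrated with tabular Q-learning on a toy task; deep RL (as used for TORCS) replaces the table with a neural network over sensor inputs, but the update is the same. Everything below (the 1-D track, the constants) is a made-up sketch:

```python
import random

# A 1-D "track": start in cell 0, reach cell 4 for a reward of +1.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3

def step(s, a):
    """Actions: 0 = move left, 1 = move right (walls clamp the position)."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def choose(q, s, rng):
    """Epsilon-greedy exploration with random tie-breaking."""
    if rng.random() < EPS or q[s][0] == q[s][1]:
        return rng.randrange(2)
    return 0 if q[s][0] > q[s][1] else 1

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(300):                     # episodes
    s = 0
    for _ in range(100):                 # cap episode length
        a = choose(q, s, rng)
        s2, r, done = step(s, a)
        # Q-learning: move Q(s, a) toward reward + discounted best next value.
        target = r if done else r + GAMMA * max(q[s2])
        q[s][a] += ALPHA * (target - q[s][a])
        s = s2
        if done:
            break

policy = [1 if q[s][1] > q[s][0] else 0 for s in range(N_STATES)]
```

The agent learns by trial and error to always drive "right"; a driving policy in TORCS is learned the same way, with a deep network estimating the values from raw observations.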
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering (MLconf)
A Friendly Introduction To Causality: Causality has been studied under several frameworks in statistics and artificial intelligence. We will briefly survey Pearl’s Structural Equation model and explain how interventions can be used to discover causality. We will also present a novel information-theoretic framework for discovering causal directions from observational data when interventions are not possible. The starting point is conditional independence in joint probability distributions, and no prior knowledge of causal inference is required.
Ashrith Barthur, Security Scientist, H2O, at MLconf Seattle 2017 (MLconf)
Ashrith Barthur is a Security Scientist at H2O, currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud, and global money movement. He has a PhD from Purdue University in the field of information security, specializing in anomalous behaviour in the DNS protocol.
Abstract summary
ML(Machine Learning) in AML (Anti Money Laundering):
AML, or anti money laundering, has been a consistent bane of multiple governments and banks. Strong pushes by countries to curb illegal money movement have resulted in only a significant yet extremely small fraction of money laundering being identified – an average success rate of about 2%. The more global a bank’s footprint, the lower the accuracy of its money laundering investigations. In the current mechanism, investigators analyse each money laundering alert and provide a subjective opinion on the case. Unfortunately this takes time, and still has a return rate of about 2% on average and 10% at the highest. What we design are AI algorithms that work on features tracking the monetary behaviour of every account. These features are essentially time-bound, making time a fundamental aspect of the algorithm design. The algorithms can improve identification to close to 70%, and certain exclusive features that are a function of time improve it much further.
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016 (MLconf)
DL4J and DataVec for Enterprise Deep Learning Workflows: Applications in NLP, sensor processing (IoT), image processing, and audio processing have all emerged as prime deep learning applications. In this session we will give a practical review of building secure Deep Learning workflows in the enterprise. We’ll see how DL4J’s DataVec tool enables scalable ETL and vectorization pipelines to be created for a single machine or scaled out to Spark on Hadoop. We’ll also see how deep networks such as recurrent neural networks are able to leverage DataVec to more quickly process data for modeling.
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017 (MLconf)
Aaron Roth is an Associate Professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.”
Abstract Summary:
Differential Privacy and Machine Learning:
In this talk, we will give a friendly introduction to Differential Privacy, a rigorous methodology for analyzing data subject to provable privacy guarantees, that has recently been widely deployed in several settings. The talk will specifically focus on the relationship between differential privacy and machine learning, which is surprisingly rich. This includes both the ability to do machine learning subject to differential privacy, and tools arising from differential privacy that can be used to make learning more reliable and robust (even when privacy is not a concern).
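As a concrete illustration of the kind of mechanism differential privacy builds on, the sketch below implements the classic Laplace mechanism for a counting query. This is a textbook example rather than code from the talk, and the data and predicate are made up:

```python
import math
import random

def laplace_noise(scale):
    """Sample from a zero-mean Laplace distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon):
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon gives epsilon-DP.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: ages of survey respondents.
ages = [23, 35, 41, 29, 52, 64, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; averaged over many releases the noisy count is unbiased, which is why the mechanism composes well with downstream learning.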
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
Comparing TensorFlow NLP Options: word2vec, GloVe, RNN/LSTM, SyntaxNet, and Penn Treebank: Through code samples and demos, we’ll compare the architectures and algorithms of the various TensorFlow NLP options. We’ll explore both feed-forward and recurrent neural networks such as word2vec, GloVe, RNN/LSTM, SyntaxNet, and Penn Treebank using the latest TensorFlow libraries.
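To make the word2vec family concrete, here is a minimal sketch of how skip-gram (center, context) training pairs are generated from a sentence. It illustrates the general technique in plain Python rather than the TensorFlow APIs covered in the talk:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for a skip-gram model.

    Each word is paired with every word within `window` positions of it;
    a word2vec-style embedding is trained to predict context from center
    over exactly these pairs.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
# includes ('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ...
```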
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...MLconf
Andrew recently joined Lucidworks to head up their Advisory practice, and is a Committer and PMC member on the Apache Mahout project.
Abstract summary
Apache Mahout: Distributed Matrix Math for Machine Learning:
Machine learning and statistics tools like R and Scikit-learn are declarative, flexible, and extensible, but they scale poorly. “Big Data” tools such as Apache Spark, Apache Flink, and H2O distribute well, but have rudimentary functionality for machine learning and are not easily extensible. In this talk we present Apache Mahout, which provides a Scala-based, R-like DSL for doing linear algebra on distributed systems, letting practitioners quickly implement algorithms on distributed matrices. We will highlight new features in version 0.13 including the hybrid CPU/GPU-optimized engine, and a new framework for user-contributed methods and algorithms similar to R’s CRAN.
We will cover some history of Mahout, introduce the R-Like Scala DSL, provide an overview of how Mahout is able to operate on matrices distributed across multiple computers, and how it takes advantage of GPUs on each computer in a cluster creating a hybrid distributed/GPU-accelerated environment; then demonstrate the kinds of normally complex or unfeasible problems users can easily solve with Mahout; show an integration which allows Mahout to leverage the visualization packages of projects such as R, Python, and D3; and lastly explain how to develop algorithms and submit them to the Mahout project for other users to use.
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017MLconf
Irina Rish is a researcher at the AI Foundations department of the IBM T.J. Watson Research Center. She received an MS in Applied Mathematics from the Moscow Gubkin Institute, Russia, and a PhD in Computer Science from the University of California, Irvine. Her areas of expertise include artificial intelligence and machine learning, with a particular focus on probabilistic graphical models, sparsity and compressed sensing, active learning, and their applications to various domains, ranging from diagnosis and performance management of distributed computer systems (“autonomic computing”) to predictive modeling and statistical biomarker discovery in neuroimaging and other biological data. Irina has published over 60 research papers, several book chapters, two edited books, and a monograph on Sparse Modeling, taught several tutorials and organized multiple workshops at machine-learning conferences, including NIPS, ICML and ECML. She holds 24 patents and several IBM awards. Irina currently serves on the editorial board of the Artificial Intelligence Journal (AIJ). As an adjunct professor at the EE Department of Columbia University, she taught several advanced graduate courses on statistical learning and sparse signal modeling.
Abstract Summary:
Learning About the Brain and Brain-Inspired Learning:
Quantifying mental states and identifying statistical biomarkers of mental disorders from neuroimaging data is an exciting and rapidly growing research area at the intersection of neuroscience and machine learning, with a particular focus on the interpretability and reproducibility of learned models. We will discuss the promises and limitations of machine-learning methods in such applications, focusing on recent applications of deep learning methods such as recurrent convnets to the analysis of “brain movies” (EEG) data. Besides this “AI to Brain” direction, we will also discuss the reverse, “Brain to AI”: borrowing ideas from neuroscience to improve machine learning, with a specific focus on adult neurogenesis and online model adaptation in representation learning.
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016MLconf
Smart Reply: Learning a Model of Conversation from Data: Smart Reply is a text assistance feature that was recently introduced to Inbox by Gmail. Given an incoming email message, the Smart Reply system analyzes its contents and suggests complete responses that the recipient can send with just one tap. This talk will cover how we built Smart Reply using a combination of deep learning and semantic clustering, as well as what we learned along the way and why we think it shows promise for the future of dialogue models.
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
Local Search Optimization for Hyper-Parameter Tuning: Many machine learning algorithms are sensitive to their hyper-parameter settings, lacking good universal rule-of-thumb defaults. In this talk we discuss the use of black-box local search optimization (LSO) for machine learning hyper-parameter tuning. Viewed as black-box objective functions of their hyper-parameters, machine learning algorithms create a difficult class of optimization problems. The corresponding objective functions tend to be nonsmooth, discontinuous, and unpredictably computationally expensive, and require support for continuous, categorical, and integer variables. Further, evaluations can fail for a variety of reasons, such as early exits due to node failure or hitting a time limit. Additionally, not all hyper-parameter combinations are compatible (creating so-called “hidden constraints”). In this context, we apply a parallel hybrid derivative-free optimization algorithm that can make progress despite these difficulties, providing significantly improved results over default settings with minimal user interaction. Further, we will address efficient parallel paradigms for different types of machine learning problems, explore the importance of validation to avoid overfitting, and emphasize that, even for small data problems, the need to perform cross validation can create computationally intense functions that benefit from a distributed/threaded environment.
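A minimal sketch of the black-box local-search idea is below. The two-parameter objective and neighborhood function are invented stand-ins for cross-validated model error; the talk's actual algorithm is a parallel hybrid derivative-free method, which this does not reproduce:

```python
import random

def local_search(objective, start, neighbors, iters=100, seed=0):
    """Black-box local search: repeatedly try a random neighbor of the
    current hyper-parameter setting and move to it if it scores better.

    `objective` may fail on some settings (hidden constraints), so
    exceptions are treated as an infinitely bad score rather than
    aborting the search.
    """
    rng = random.Random(seed)
    best, best_score = start, objective(start)
    for _ in range(iters):
        cand = neighbors(best, rng)
        try:
            score = objective(cand)
        except Exception:
            continue  # incompatible combination: skip and keep searching
        if score < best_score:
            best, best_score = cand, score
    return best, best_score

# Toy objective standing in for the cross-validated error of a model
# with two hyper-parameters (e.g. learning rate, regularization).
def objective(p):
    lr, reg = p
    return (lr - 0.1) ** 2 + (reg - 1.0) ** 2

def neighbors(p, rng):
    lr, reg = p
    return (lr + rng.uniform(-0.05, 0.05), reg + rng.uniform(-0.3, 0.3))

best, score = local_search(objective, start=(0.5, 3.0), neighbors=neighbors)
```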
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
Building a Machine Learning Platform at Quora: Each month, over 100 million people use Quora to share and grow their knowledge. Machine learning has played a critical role in enabling us to grow to this scale, with applications ranging from understanding content quality to identifying users’ interests and expertise. By investing in a reusable, extensible machine learning platform, our small team of ML engineers has been able to productionize dozens of different models and algorithms that power many features across Quora.
In this talk, I’ll discuss the core ideas behind our ML platform, as well as some of the specific systems, tools, and abstractions that have enabled us to scale our approach to machine learning.
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016MLconf
Interpreting Black-Box Models with Applications to Healthcare: Complex and highly interactive models such as Random Forests, Gradient Boosting, and Deep Neural Networks demonstrate superior predictive power compared to their high-bias counterparts, Linear and Logistic Regression. However, these more complex and sophisticated methods lack the interpretability of the simpler alternatives. In some areas of application, such as healthcare, model interpretability is crucial both to build confidence in the model predictions as well as to explain the results on individual cases. This talk will discuss recent approaches to explaining “black-box” models and demonstrate some recently developed tools that aid this effort.
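One widely used model-agnostic technique in this space is permutation feature importance. The sketch below, with a made-up toy "black-box" model, shows the idea; the talk does not necessarily cover this exact tool:

```python
import random

def permutation_importance(predict, X, y, feature_idx, metric, seed=0):
    """Model-agnostic importance of one feature: shuffle that feature's
    values across rows and measure how much the model's score drops.
    A large drop means the model relied heavily on the feature."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    col = [row[feature_idx] for row in X]
    rng.shuffle(col)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, col)]
    return base - metric(y, [predict(row) for row in X_perm])

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical "black box": predicts 1 whenever feature 0 exceeds 0.5.
predict = lambda row: 1 if row[0] > 0.5 else 0
X = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.5], [0.3, 0.9]] * 25
y = [predict(row) for row in X]

drop_f0 = permutation_importance(predict, X, y, 0, accuracy)
drop_f1 = permutation_importance(predict, X, y, 1, accuracy)
# drop_f0 > 0 (the model uses feature 0); drop_f1 == 0 (it ignores feature 1)
```

Because the method only needs predictions, it works identically for a Random Forest, a boosted ensemble, or a deep network.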
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
Xin Luna Dong is a Principal Scientist at Amazon, leading the efforts of constructing the Amazon Product Graph. She was one of the major contributors to the Knowledge Vault project, and has led the Knowledge-based Trust project, which has been called the “Google Truth Machine” by The Washington Post. She has won the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion”, and the Best Demo award at SIGMOD 2005. She has co-authored the book “Big Data Integration”, published 65+ papers in top conferences and journals, and given 20+ keynotes/invited talks/tutorials. She is the PC co-chair for SIGMOD 2018 and WAIM 2015, and serves as an area chair for SIGMOD 2017, CIKM 2017, SIGMOD 2015, ICDE 2013, and CIKM 2011.
Abstract summary
Leave No Valuable Data Behind: the Crazy Ideas and the Business:
With the mission “leave no valuable data behind”, we developed techniques for knowledge fusion to guarantee the correctness of the knowledge. This talk starts by describing a few crazy ideas we have tested. The first, known as “Knowledge Vault”, used 15 extractors to automatically extract knowledge from 1B+ webpages, obtaining 3B+ distinct (subject, predicate, object) knowledge triples and predicting well-calibrated probabilities for extracted triples. The second, known as “Knowledge-Based Trust”, estimated the trustworthiness of 119M webpages and 5.6M websites based on the correctness of their factual information. We then present how we bring these ideas to business, filling the gap between the knowledge in existing knowledge bases and the knowledge in the world.
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016MLconf
Using Bayesian Optimization to Tune Machine Learning Models: In this talk we briefly introduce Bayesian Global Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time-consuming or expensive. We will motivate the problem and give example applications.
We will also talk about our development of a robust benchmark suite for our algorithms including test selection, metric design, infrastructure architecture, visualization, and comparison to other standard and open source methods. We will discuss how this evaluation framework empowers our research engineers to confidently and quickly make changes to our core optimization engine.
We will end with an in-depth example of using these methods to tune the features and hyperparameters of a real world problem and give several real world applications.
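As a rough illustration of the Bayesian optimization loop (not SigOpt's engine), the sketch below fits a tiny Gaussian-process surrogate with an RBF kernel and chooses the next evaluation by a lower-confidence-bound rule, all in plain Python; the kernel lengthscale, acquisition rule, and toy objective are all assumptions made for the example:

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential (RBF) kernel; rbf(x, x) == 1."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def bayes_opt(f, lo, hi, n_iter=12):
    """Minimize f on [lo, hi]: fit a GP surrogate to the points seen so
    far, then evaluate where the lower confidence bound (mean - 2*std)
    is smallest, trading off exploitation against exploration."""
    xs = [lo, (lo + hi) / 2.0, hi]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / 100.0 for i in range(101)]
    for _ in range(n_iter):
        n = len(xs)
        K = [[rbf(xs[i], xs[j]) + (1e-6 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        alpha = solve(K, ys)  # K^-1 y, reused for the posterior mean
        def lcb(x):
            ks = [rbf(xi, x) for xi in xs]
            mean = sum(k * a for k, a in zip(ks, alpha))
            var = max(0.0, 1.0 - sum(k * w for k, w in zip(ks, solve(K, ks))))
            return mean - 2.0 * math.sqrt(var)
        x_next = min(grid, key=lcb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

# Toy objective standing in for an expensive model-training run.
x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

The point of the surrogate is sample efficiency: each real evaluation may take hours of training, so the cheap model decides where to spend the next one.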
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017MLconf
Ross Goodwin, Technologist – Creator, Sunspring
Ross Goodwin is a creative technologist, artist, hacker, data scientist, and former White House ghostwriter. Ross helped conceive Sunspring, a 2016 experimental science fiction short film entirely written by an artificial intelligence bot using neural networks. He employs machine learning, natural language processing, and other computational tools to realize new forms and interfaces for written language.
Abstract Summary:
Narrated Reality:
Can machine intelligence enable new forms and interfaces for written language, or does it merely reveal an “uncanny valley” of text? Join Ross Goodwin as he discusses his work with neural networks for creative applications, including expressive image captioning, narration devices for your home and car, and a film (Sunspring) created from a computer generated screenplay.
Virginia Smith, Researcher, UC Berkeley at MLconf SF 2016MLconf
A General Framework for Communication-Efficient Distributed Optimization: Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In light of this, we propose a general framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. Our framework enjoys strong convergence guarantees and exhibits state-of-the-art empirical performance in the distributed setting. We demonstrate this performance with extensive experiments in Apache Spark, achieving speedups of up to 50x compared to leading distributed methods on common machine learning objectives.
Sanjeev Satheesh, Research Scientist, Baidu at The AI Conference 2017MLconf
Sanjeev Satheesh leads the Deep Speech team at Baidu’s Silicon Valley AI Lab. Baidu SVAIL is focused on developing hard AI technologies to impact hundreds of millions of people.
The Story of End to End Models in Deep Learning
The past few years have seen the explosive entrance of end to end deep learning models - in computer vision, speech recognition, machine translation, text to speech and others. In this talk, we look at this trend to identify what has worked well, and try to make some predictions for the future based on the next set of unsolved problems.
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017MLconf
Mayur is head of the Data Analytics Group in the Global Compliance Division. He joined Goldman Sachs as a managing director in 2014.
Prior to joining the firm, Mayur worked at Google, where he designed search algorithms for more than seven years. Previously, he was an assistant professor of computer science at the University of Missouri.
Mayur earned a PhD in Computer Science from the University of Rochester in 2004 and a BTech in Computer Science and Engineering from the Indian Institute of Technology, Delhi, in 1999.
Abstract Summary:
Surveillance platforms for bank compliance
Bank compliance uses models to look for outlier events such as insider trading, spoofing, front running, etc. With the exponential increase in the size of the data and a growing need to use such models, a key question is: How do we scale these models so they run efficiently and at the same time detect outlier events with good precision and recall?
In this talk, we will describe our experience building, from scratch, a Hadoop-based platform for surveillance.
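As a toy illustration of the kind of outlier check such a surveillance platform might run at scale (the data, feature, and threshold here are all invented for the example), a simple z-score rule over per-account activity looks like:

```python
import math

def zscore_outliers(values, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard
    deviations from the mean: the simplest form of outlier detection
    a surveillance model might apply per account or per trader."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

# Hypothetical daily trade counts for one account; the last day spikes.
volumes = [100 + (i % 7) for i in range(30)] + [10_000]
flagged = zscore_outliers(volumes)  # -> [30], the spike
```

Real surveillance models are far richer than a z-score, but the scaling question in the talk applies even here: this check must run over every account, every day, with good precision and recall.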
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
Machine Learning with TensorFlow: TensorFlow has enabled cutting-edge machine learning research at the top AI labs in the world. At the same time it has made the technology accessible to a large audience leading to some amazing uses. TensorFlow is used for classification, recommendation, text parsing, sentiment analysis and more. This talk will go over the design that makes it fast, flexible, and easy to use, and describe how we continue to make it better.
Power System Simulation: History, State of the Art, and ChallengesLuigi Vanfretti
This talk will give an overview of power system simulation technology through several decades, aiming to provide an understanding of the modeling philosophy and approach that has led to the state of the art in (domain-specific) power system simulation tools. This historical perspective will contrast the de facto proprietary software development method used by the power engineering community against the open source development model. Aspects of resistance to change particular to the power system engineering community will be highlighted.
Given this particular context, power system simulation faces enormous challenges to adapt in order to satisfy simulation needs of both cyber-physical and sustainable system challenges. Such challenges will be highlighted during the talk.
There is, however, an opportunity for disruptive change in power system simulation technology emerging for the EU Smart Grid Mandate M/490, which requires "a set of consistent standards, which will support the information exchange (communication protocols and data models) and the integration of all users into the electric system operation." These regulatory aspects will be explained to highlight the importance of collaboration between the power system domain and computer system experts.
Open modeling and simulation standards may have a large role to play in the development of the European Smart Grid which will have to overcome challenges related to the design, operation and control of cyber-physical and sustainable electrical energy systems. To contribute to this role, the KTH SmarTS Lab research group has been applying the standardized Modelica language and the FMI standard for model exchange in order to couple the domain specific data exchange model (CIM) with the powerful and modern simulation technologies developed by the Modelica community. These efforts will be also discussed.
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...Luigi Vanfretti
Title:
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and GridDyn
Presenters:
Luigi Vanfretti (RPI) & Philip Top (LLNL)
luigi.vanfretti@gmail.com, top1@llnl.gov
Abstract:
The Modelica language, being standardized and equation-based, has proven valuable for model exchange, simulation, and even model validation applications in actual power systems. These important features have now been recognized by the European Network of Transmission System Operators, which has adopted the Modelica language for dynamic model exchange in the Common Grid Model Exchange Standard (v2.5, Annex F).
Following previous FP7 project results, within the ITEA 3 openCPS project, the presenters have continued the efforts of using the Modelica language for power system modeling and simulation, by developing and maintaining the OpenIPSL library: https://github.com/SmarTS-Lab/OpenIPSL
This seminar first gives an overview of the origins of OpenIPSL and its models, contrasts it against typical power system tools, and gives an introduction to the OpenIPSL library. The new project features that help with OpenIPSL maintenance (use of continuous integration, regression testing, documentation, etc.) are also described.
Finally, the seminar will present current work at LLNL that exploits OpenIPSL in coordination with other tools, including ongoing work integrating OpenIPSL models into GridDyn, an open-source power system simulation tool, as well as demos of the use of the OpenIPSL library in GridDyn.
Bios:
Luigi Vanfretti (SMIEEE’14) obtained the M.Sc. and Ph.D. degrees in electric power engineering at Rensselaer Polytechnic Institute, Troy, NY, USA, in 2007 and 2009, respectively.
He was with KTH Royal Institute of Technology, Stockholm, Sweden, as Assistant Professor (2010-2013) and as Associate Professor (Tenured) and Docent (2013-August 2017), where he led the SmarTS Lab research group. He also worked at Statnett SF, the Norwegian electric power transmission system operator, as a consultant (2011-2012) and Special Advisor in R&D (2013-2016).
He joined Rensselaer Polytechnic Institute in August 2017, to continue to develop his research at ALSETLab: http://alsetlab.com
His research interests are in the area of synchrophasor technology applications; and cyber-physical power system modeling, simulation, stability and control.
Philip Top (Lawrence Livermore National Laboratory)
PhD 2007, Purdue University. Currently a Research Engineer at Lawrence Livermore National Laboratory in Livermore, CA. Philip has been involved in several projects connected with the DOE effort on Grid Modernization, including projects on modeling and simulation, co-simulation, and smart grid data analytics. He is the principal developer of the open source power system simulation tool GridDyn, and a key contributor to the HELICS open source co-simulation framework.
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...Matteo Ferroni
Tools and applications for event stream processing and real-time analytics are receiving huge hype these days across a wide range of application scenarios, from the smallest Internet of Things (IoT) embedded sensor to the most popular Social Network feed. Unfortunately, dealing with this kind of input raises issues that can easily undermine the real-time analysis requirement due to an unexpected overload of the system; this happens because the processing time may strongly depend on the content of each event, while the event arrival rate may vary unpredictably over time. In this work, we propose Fast Forward With Degradation (FFWD), a latency-aware load shedding framework that exploits performance degradation techniques to adapt the throughput of the application to the size of the input, allowing the system to respond quickly and reliably in case of overload. Moreover, we show how different domain-specific policies can guarantee reasonable accuracy of the aggregated output metrics.
Full paper: http://ieeexplore.ieee.org/document/7982234/
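The core load-shedding idea can be sketched in a few lines: admit an event only if its expected queueing latency stays within a bound, otherwise drop it. This is a deliberately simplified toy policy for illustration, not FFWD's actual degradation mechanism:

```python
def shed_load(events, service_time, max_latency):
    """Toy latency-aware load shedder: admit an event only if the work
    already queued ahead of it can be served within `max_latency`;
    otherwise drop it (degrading output) to protect response time.

    `events` is a list of arrival times in ascending order;
    `service_time` is the per-event processing cost.
    Returns (processed, dropped) counts.
    """
    processed = dropped = 0
    backlog_done = 0.0  # time when the currently queued work finishes
    for t in events:
        backlog_done = max(backlog_done, t)  # idle time has no backlog
        if backlog_done - t + service_time <= max_latency:
            backlog_done += service_time
            processed += 1
        else:
            dropped += 1
    return processed, dropped

# A burst of 10 events at t=0, each costing 1 time unit, 3-unit budget:
# only the first 3 fit within the latency bound; the rest are shed.
result = shed_load([0.0] * 10, service_time=1.0, max_latency=3.0)  # (3, 7)
```

FFWD's domain-specific policies decide *which* events to shed so that aggregated metrics stay accurate; the sketch above sheds blindly by arrival order.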
Approaches to online quantile estimationData Con LA
Data Con LA 2020
Description
This talk will explore and compare several compact data structures for estimation of quantiles on streams, including a discussion of how they balance accuracy against computational resource efficiency. A new approach providing more flexibility in specifying how computational resources should be expended across the distribution will also be explained. Quantiles (e.g., median, 99th percentile) are fundamental summary statistics of one-dimensional distributions. They are particularly important for SLA-type calculations and characterizing latency distributions, but unlike their simpler counterparts such as the mean and standard deviation, their computation is somewhat more expensive. The increasing importance of stream processing (in observability and other domains) and the impossibility of exact online quantile calculation together motivate the construction of compact data structures for estimation of quantiles on streams. In this talk we will explore and compare several such data structures (e.g., moment-based, KLL sketch, t-digest) with an eye towards how they balance accuracy against resource efficiency, theoretical guarantees, and desirable properties such as mergeability. We will also discuss a recent variation of the t-digest which provides more flexibility in specifying how computational resources should be expended across the distribution. No prior knowledge of the subject is assumed. Some familiarity with the general problem area would be helpful but is not required.
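As a baseline against the sketches discussed (moment-based, KLL, t-digest), the simplest streaming quantile estimator just keeps a uniform random sample of the stream via reservoir sampling; this standard baseline is included here for orientation and is not one of the talk's data structures:

```python
import random

class ReservoirQuantile:
    """Baseline streaming quantile estimator: keep a fixed-size uniform
    random sample of the stream (reservoir sampling) and answer quantile
    queries from the sorted sample. Accuracy scales like
    1/sqrt(capacity); sketches such as KLL or t-digest achieve far
    better accuracy per byte, which is the trade-off discussed above."""

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.reservoir = []
        self.n = 0  # total items seen
        self.rng = random.Random(seed)

    def add(self, x):
        self.n += 1
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(x)
        else:
            # Keep x with probability capacity / n, evicting uniformly.
            j = self.rng.randrange(self.n)
            if j < self.capacity:
                self.reservoir[j] = x

    def quantile(self, q):
        s = sorted(self.reservoir)
        return s[min(len(s) - 1, int(q * len(s)))]

est = ReservoirQuantile(capacity=500)
for i in range(100_000):
    est.add(i)
median = est.quantile(0.5)  # close to 50_000, within sampling error
```

Note that two reservoirs are not cleanly mergeable into an equivalent one of the same size, which is one reason mergeability is called out as a desirable property of the purpose-built sketches.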
Speaker
Joe Ross, Splunk, Principal Data Scientist
Modeling adoptions and the stages of the diffusion of innovationsNicola Barbieri
We study the data mining problem of modeling adoptions and the stages of the diffusion of an innovation. For our aim we propose a stochastic model which decomposes a diffusion trace (sequence of adoptions) in an ordered sequence of stages, where each stage is intuitively built around two dimensions: users and relative speed at which adoptions happen. Each stage is characterized by a specific rate of adoption and it involves different users to different extent, while the sequentiality in the diffusion is guaranteed by constraining the transition probabilities among stages.
An empirical evaluation on synthetic and real-world adoption logs shows the effectiveness of the proposed framework in summarizing the adoption process, enabling several analysis tasks such as the identification of adopter categories, clustering and characterization of diffusion traces, and prediction of which users will adopt an item in the near future.
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...Soumya Banerjee
In this research we use a decentralized computing approach to allocate and schedule tasks on a massively distributed grid. Using emergent properties of multi-agent systems, the algorithm dynamically creates and dissociates clusters to serve the changing resource demands of a global task queue. The algorithm is compared to a standard First-In First-Out (FIFO) scheduling algorithm. Experiments done on a simulator show that the distributed resource allocation protocol (dRAP) algorithm outperforms the FIFO scheduling algorithm on time to empty the queue, average waiting time and CPU utilization. Such a decentralized computing approach holds promise for massively distributed processing scenarios like SETI@home and Google MapReduce.
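The FIFO baseline above can be sketched as a short simulation, assuming a single resource and tasks given as (arrival, service) pairs; this is a toy illustration of the metric, not the paper's simulator:

```python
def fifo_average_wait(tasks):
    """Simulate a single-queue FIFO scheduler.

    `tasks` is a list of (arrival_time, service_time) pairs sorted by
    arrival. Returns the average time tasks spend waiting before
    service starts, one of the metrics dRAP is compared against.
    """
    clock = 0.0
    total_wait = 0.0
    for arrival, service in tasks:
        start = max(clock, arrival)  # wait until the resource is free
        total_wait += start - arrival
        clock = start + service
    return total_wait / len(tasks)

# Three tasks arriving close together back up behind the first one.
avg = fifo_average_wait([(0, 4), (1, 2), (2, 1)])  # waits: 0, 3, 4 -> 7/3
```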
RSC: Mining and Modeling Temporal Activity in Social MediaAlceu Ferraz Costa
Presentation of the KDD 2015 paper describing the RSC model:
RSC: Mining and Modeling Temporal Activity in Social Media
Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina Jr., and Christos Faloutsos
The 21st SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015
Similar to Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
Understanding Human Impact: Social and Equity Assessments for AI Technologies
Social and Equity Impact Assessments have broad applications, but they can be a useful tool to explore and mitigate machine learning fairness issues, and can be applied to product-specific questions as a way to generate insights and learnings about users, as well as about impacts on society broadly, as a result of the deployment of new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need for community consultation and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making, and to introduce principles, methods, and processes for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
The Brain’s Guide to Dealing with Context in Language Understanding
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
Applying Computer Vision to Reduce Contamination in the Recycling Stream
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process by showing how Waste Management, a national waste hauler, has achieved a 57% reduction in contamination across nearly 2,000 containers over six months. This progress shows significant strides towards financially viable recycling services.
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
Quantum Computing: a Treasure Hunt, not a Gold Rush
Quantum computers promise a significant step up in computational power over conventional computers, but they also suffer from a number of counterintuitive limitations, both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious ExperienceMLconf
Data Labeling as Religious Experience
One of the most common places to deploy a production machine learning system is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll walk through the process and tooling we used to design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics
The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years ago, to the now extinct hominin Australopithecus afarensis. Fine-grained analysis of gait using the modern MEMS sensors found on all smartphones not only reveals a lot about a person's orthopedic and neuromuscular health status, but also carries enough idiosyncratic clues to be harnessed as a passive biometric. While the machine learning community has made many siloed attempts to model bipedal gait sensor data, these were done with small datasets, often collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest planet-scale motion-sensor-based human bipedal gait dataset ever curated. We'll also present the associated state-of-the-art results in classifying humans using novel deep neural architectures, and the related success stories we have enjoyed in transfer learning to disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurately as possible. In this talk, I will discuss the development of novel ML models that help distinguish healthy people from those who develop Alzheimer's, using short samples of human speech. As input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as production rules extracted from syntactic parse trees; (2) lexical measures, such as features of lexical richness and complexity and lexical norms; and (3) acoustic measures, such as standard Mel-frequency cepstral coefficients. I will present an ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model achieves state-of-the-art performance in both supervised and semi-supervised settings, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully automated speech-based Alzheimer's detection model, focusing mostly on the impact of a not-so-accurate automatic speech recognition (ASR) system on classification performance. To illustrate this, I will present experiments with controlled amounts of artificially generated ASR errors and explain why deletion errors affect Alzheimer's detection performance the most, due to their impact on features of syntactic and lexical complexity.
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
Optimized Image Classification on the Cheap
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning (fine-tuning and feature extraction) and the impact of hyperparameter optimization on these techniques. Once we identify the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier's performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations, using the downstream image classifier's performance as the guide. In conjunction with model performance, we will also examine the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
The Importance of Modeling Data Collection
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems that go beyond recognizing semantic relatedness and must identify specific semantic relations. In this talk, I will first present novel techniques for creating the labelled datasets required for training deep learning models to classify semantic relations between phrases. I will then present various neural network architectures that integrate morphological features into combined path-based and distributional relation detection algorithms, and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model
At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability.
We will first go through some standard recommender system models that use Matrix Factorization and Topic Models, and then compare and contrast them with more powerful, higher-capacity deep learning models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability, and how models built on the simple Maximum Likelihood principle fail to do this. We will then describe one solution we have employed to enable global deep learned models to focus their attention on capturing regional taste preferences and a changing catalog. In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that? Everything changes with time. Users' tastes change with time. What's available on Netflix and what's popular also change over time. Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training, and pipeline scheduling. Incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system adapt quickly to catalog and taste changes, and improves overall performance.
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
2. Outline
• Personalization
• Latent Variable Models
• User Engagement and Return Times
• Deep Recommender Systems
• MXNet
• Basic concepts
• Launching a cluster in a minute
• ImageNet for beginners
4. Latent Variable Models
• Temporal sequence of observations
Purchases, likes, app use, e-mails, ad clicks, queries, ratings
• Latent state to explain behavior
• Clusters (navigational, informational queries in search)
• Topics (interest distributions for users over time)
• Kalman Filter (trajectory and location modeling)
[diagram: observed action vs. latent explanation]
5. Latent Variable Models (continued)
Are the parametric models really true?
6. Latent Variable Models (continued)
• Nonparametric model / spectral
• Use data to determine shape
• Sidestep approximate inference
h_t = f(x_{t−1}, h_{t−1})
x_t = g(x_{t−1}, h_t)
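The latent-state recursion h_t = f(x_{t−1}, h_{t−1}), x_t = g(x_{t−1}, h_t) can be rolled out with any choice of transition f and emission g. A minimal sketch, where the particular f and g below are made up purely for illustration:

```python
import numpy as np

def f(x_prev, h_prev):
    # made-up transition: fold the previous observation into the latent state
    return np.tanh(0.5 * h_prev + 0.5 * x_prev)

def g(x_prev, h):
    # made-up emission: generate the next observation from the latent state
    return 0.9 * h + 0.1 * x_prev

h, x = np.zeros(3), np.ones(3)
trajectory = []
for t in range(5):
    h = f(x, h)              # h_t = f(x_{t-1}, h_{t-1})
    x = g(x, h)              # x_t = g(x_{t-1}, h_t)
    trajectory.append(x.copy())
```

The point of the nonparametric/deep view is that f and g need not be fixed by hand as above; they can be learned from data.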
7. Latent Variable Models (continued)
• Plain deep network = RNN
• Deep network with attention = LSTM / GRU …
(learn when to update state, how to read out)
8. Long Short Term Memory
Hochreiter and Schmidhuber, 1997
i_t = σ(W_i(x_t, h_t) + b_i)
f_t = σ(W_f(x_t, h_t) + b_f)
z_{t+1} = f_t · z_t + i_t · tanh(W_z(x_t, h_t) + b_z)
o_t = σ(W_o(x_t, h_t, z_{t+1}) + b_o)
h_{t+1} = o_t · tanh(z_{t+1})
9. Long Short Term Memory
Hochreiter and Schmidhuber, 1997
(z_{t+1}, h_{t+1}, o_t) = LSTM(z_t, h_t, x_t)
Treat it as a black box
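Even treated as a black box, the update equations are short enough to sketch directly in NumPy. The weight shapes and the concatenation of (x, h) below are illustrative assumptions, not a production LSTM:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(z, h, x, params):
    """One LSTM update (z_t, h_t, x_t) -> (z_{t+1}, h_{t+1}, o_t),
    following the slide's equations."""
    Wi, bi, Wf, bf, Wz, bz, Wo, bo = params
    xh = np.concatenate([x, h])
    i = sigmoid(Wi @ xh + bi)                   # input gate
    f = sigmoid(Wf @ xh + bf)                   # forget gate
    z_next = f * z + i * np.tanh(Wz @ xh + bz)  # cell state
    o = sigmoid(Wo @ np.concatenate([x, h, z_next]) + bo)  # output gate
    h_next = o * np.tanh(z_next)                # hidden state
    return z_next, h_next, o

# smoke test with random parameters; dimensions are illustrative
rng = np.random.default_rng(0)
dx, dh = 4, 3
params = (rng.normal(size=(dh, dx + dh)), np.zeros(dh),
          rng.normal(size=(dh, dx + dh)), np.zeros(dh),
          rng.normal(size=(dh, dx + dh)), np.zeros(dh),
          rng.normal(size=(dh, dx + dh + dh)), np.zeros(dh))
z, h = np.zeros(dh), np.zeros(dh)
for x in rng.normal(size=(5, dx)):
    z, h, o = lstm_step(z, h, x, params)
```

Because the output gate multiplies a tanh, the hidden state stays bounded in (−1, 1), which is part of what makes the cell stable over long sequences.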
11. User Engagement Modeling
• User engagement is gradual
• Daily average users?
• Weekly average users?
• Number of active users?
• Number of users?
• Abandonment is passive
• The last time you tweeted? Pin? Like? Skype?
• Churn models assume active abandonment
(insurance, phone, bank)
12. User Engagement Modeling
• User engagement is gradual
• Model user returns
• Context of activity
• World events (elections, Super Bowl, …)
• User habits (morning reader, night owl)
• Previous reading behavior
(poor quality content will discourage return)
13. Survival Analysis 101
• Model population where something dramatic happens
• Cancer patients (death; efficacy of a drug)
• Atoms (radioactive decay)
• Japanese women (marriage)
• Users (opens app)
• Survival probability
It is well known that the differential equation can be solved by partial integration, i.e.
Pr(t_survival ≥ T) = exp( −∫₀ᵀ λ(t) dt )   (2)
where λ(t) is the hazard rate function.
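Equation (2) is straightforward to evaluate numerically for any hazard rate λ(t). A minimal sketch using midpoint integration; the constant 0.1/hour hazard is a made-up example:

```python
import numpy as np

def survival_prob(hazard, T, n=100_000):
    """Pr(t_survival >= T) = exp(-integral_0^T lambda(t) dt),
    approximated with the midpoint rule over n sub-intervals."""
    dt = T / n
    t = (np.arange(n) + 0.5) * dt        # midpoints of the sub-intervals
    return float(np.exp(-hazard(t).sum() * dt))

# a constant hazard of 0.1 per hour gives the familiar exponential survival curve
p = survival_prob(lambda t: np.full_like(t, 0.1), T=24.0)
```

For a constant hazard this reduces to exp(−0.1 · 24); a time-varying λ(t) (e.g. one with a daily bump) just changes the integrand.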
14. Session Model
• User activity is a sequence of times
• b_i: when the app is opened
• e_i: when the app is closed
• In between, wait for the user to return
• Model the likelihood of user activity
[diagram: session timeline with start and end markers]
16. Personalized LSTM
[Fig. 2: diagram omitted. Unfolded LSTM network for 3 sessions (s−2, s−1, s), each with Input, Hidden1 and Hidden2 layers. The input vector for session s is the concatenation of user embedding, time slot embedding and the …]
• LSTM for global state update
• LSTM for individual state update
• Update both of them
• Learn using backprop and SGD
Jing and Smola, WSDM’17
17. Perplexity (quality of prediction)
[Fig. 6: histograms omitted. The histogram of the time period between two sessions (next visit time, in hours); top: Toutiao, bottom: Last.fm. The small bump around 24 hours corresponds to users having a daily habit of using the app at the same time.]
• global constant model: a static model with only one parameter, assuming that the rate is constant throughout the time frame for all users.
• global+user constant model: a static model that assumes the rate is an additive function of a global constant and a user-specific constant.
• piecewise constant model: a more flexible static model that learns a parameter for each discretized bin.
• Hawkes process: a self-exciting point process that respects past sessions.
• integrated model: a combined model with all the above components.
• DNN: a model that assumes the rate is a function of time, user and session features, parameterized by a deep neural network.
• LSTM: a recurrent neural network that incorporates past activities.
For completeness, we also report the result for Cox's model, where the hazard rate is given by
λ_u(t) = λ_0(t) exp(⟨w, x_u(t)⟩)   (28)
The perplexity is
perp = exp( −(1/M) Σ_{u=1}^{m} Σ_{i=1}^{m_u} log p({b_i, e_i}; …) )   (29)
where M is the total number of sessions in the test set. The
lower the value, the better the model is at explaining the
test data. In other words, perplexity measures the amount of surprise in a user's behavior relative to our prediction; a good model predicts well, hence there is less surprise.
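Given per-session log-likelihoods, the perplexity of equation (29) is one line of NumPy. The example probabilities below are made up for illustration:

```python
import numpy as np

def perplexity(log_probs):
    """perp = exp(-(1/M) * sum of log-likelihoods):
    average surprise per test session; lower is better."""
    log_probs = np.asarray(log_probs, dtype=float)
    return float(np.exp(-log_probs.mean()))

# a model that assigns probability 1/4 to every held-out session has perplexity 4
p = perplexity(np.log([0.25] * 8))
```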
6.6 Model Comparison
The summarized results are shown in table 1. As can be seen
from the table, there is a big gap between linear models
and the two deep models. The Cox model is inferior to
our integrated model and significantly worse than the deep
networks.
model                Toutiao   Last.fm
Cox Model            27.13     28.31
global constant      45.29     59.98
user constant        28.74     45.44
piecewise constant   26.88     26.12
Hawkes process       22.58     30.80
integrated model     21.56     26.06
DNN                  18.87     20.62
LSTM                 18.10     19.80
TABLE 1. Average perplexity evaluated on the test set for different models.
18. Perplexity (quality of prediction)
[Fig. 7: plots omitted. Top row: average test perplexity as a function of the fraction of sessions (%), comparing the global constant, user constant, piecewise constant, Hawkes process, Integrated, Cox, DNN and LSTM models. Bottom row: relative improvements (%) of the LSTM over the Integrated and the Cox models. Left column: Toutiao dataset.]
Jing and Smola, WSDM'17
19. [Fig. 9: plots omitted. Six randomly sampled learned predictive rate functions, three from Toutiao (left) and three from Last.fm (right). Each pair of figures shows the instantaneous rate λ(t) (purple), the survival function Pr(return ≥ t) (red), and the actual return time (blue). Clearly, our deep model is …]
29. Why yet another deep network tool?
• Frugality & resource efficiency
Engineered for cheap GPUs with smaller memory and slow networks
• Speed
• Linear scaling with #machines and #GPUs
• High efficiency on a single machine, too (C++ backend)
• Simplicity
Mix declarative and imperative code
[diagram: language frontends over a shared backend: a single implementation of the backend system and common operators gives a performance guarantee regardless of which frontend language is used]
30. Imperative Programs
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
print(c)
d = c + 1
Easy to tweak with Python code
Pro
• Straightforward and flexible
• Takes advantage of native language features (loops, conditions, debugger)
Con
• Hard to optimize
31. Declarative Programs
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)
Pro
• More chances for optimization
• Works across different languages
Con
• Less flexible
[diagram: computation graph with inputs A and B feeding ⨉ to produce C, then + 1 to produce D]
C can share memory with D, because C is deleted later
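The snippet above is pseudocode for the declarative style. A self-contained toy version in plain Python, where `Node`, `Variable` and `compile_graph` are made-up names for illustration (not the MXNet API), might look like:

```python
import numpy as np

class Node:
    """A node in a deferred computation graph: nothing runs at build time."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __mul__(self, other): return Node('mul', self, other)
    def __add__(self, other): return Node('add', self, other)

def Variable(name):
    return Node('var', name)

def compile_graph(node):
    """'Compile' the graph into a callable that evaluates it on fed values."""
    def run(**feed):
        def ev(n):
            if not isinstance(n, Node):
                return n                  # a plain constant such as the literal 1
            if n.op == 'var':
                return feed[n.args[0]]
            a, b = (ev(arg) for arg in n.args)
            return a * b if n.op == 'mul' else a + b
        return ev(node)
    return run

A, B = Variable('A'), Variable('B')
D = B * A + 1              # build the whole graph first ...
f = compile_graph(D)       # ... then compile ...
d = f(A=np.ones(10), B=np.ones(10) * 2)   # ... and finally feed values
```

Because the full graph is known before execution, a real backend can rewrite it, e.g. reusing C's memory for D as noted above.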
32. Imperative vs. Declarative for Deep Learning
• Computational graph of the deep architecture (forward / backward): needs heavy optimization, fits declarative programs
• Updates and interactions with the graph need mutation and more native language features, good for imperative programs:
• Iteration loops
• Parameter update: w ← w − η ∂_w f(w)
• Beam search
• Feature extraction …
33. Mixed Style Training Loop in MXNet
executor = neuralnetwork.bind()
for i in range(3):
    train_iter.reset()
    for dbatch in train_iter:
        args["data"][:] = dbatch.data[0]
        args["softmax_label"][:] = dbatch.label[0]
        executor.forward(is_train=True)
        executor.backward()
        for key in update_keys:
            args[key] -= learning_rate * grads[key]
• The executor is bound from a declarative program that describes the network
• Imperative NDArrays can be set as input nodes to the graph
• Imperative parameter update on the GPU
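The same three-part structure (forward pass, backward pass, imperative parameter update) can be reproduced in plain NumPy. This is an illustrative analogue on a made-up logistic-regression task, not the MXNet executor:

```python
import numpy as np

# made-up, linearly separable dataset standing in for train_iter
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ true_w > 0).astype(float)

w, learning_rate = np.zeros(5), 0.1
for epoch in range(3):                        # outer epoch loop
    for start in range(0, len(X), 32):        # mini-batches, like train_iter
        xb, yb = X[start:start + 32], y[start:start + 32]
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))   # "forward": predicted probabilities
        grad = xb.T @ (p - yb) / len(xb)      # "backward": gradient of the loss
        w -= learning_rate * grad             # imperative parameter update

accuracy = ((X @ w > 0).astype(float) == y).mean()
```

In MXNet the forward/backward steps run over the compiled declarative graph, while the update line stays ordinary imperative code, which is exactly the mix the slide advocates.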
34. Mixed API for Quick Extensions
• Runtime switching between different graphs depending on the input
• Useful for sequence modeling and image size reshaping
• Implemented with imperative code in Python: ~10 additional lines
• Bucketing for variable-length sentences
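Bucketing itself is simple to sketch: group variable-length sentences so that every sequence in a bucket shares one padded length, and a single unrolled graph per bucket suffices. The bucket sizes and the '<pad>' token below are illustrative choices, not MXNet's implementation:

```python
from collections import defaultdict

def bucket(sentences, bucket_sizes=(10, 20, 30)):
    """Assign each sentence to the smallest bucket it fits in, padding to
    that bucket's length."""
    buckets = defaultdict(list)
    for sentence in sentences:
        for size in bucket_sizes:
            if len(sentence) <= size:
                padded = sentence + ['<pad>'] * (size - len(sentence))
                buckets[size].append(padded)
                break
    return dict(buckets)

b = bucket([['a'], ['b'] * 12, ['c'] * 3])
# sentences of length 1 and 3 land in bucket 10; length 12 lands in bucket 20
```

At training time the runtime then switches to the unrolled graph matching the batch's bucket, which is the graph-switching use case mentioned above.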
41. Getting Started
• Website
http://mxnet.io/
• GitHub repository
git clone --recursive git@github.com:dmlc/mxnet.git
• Docker
docker pull dmlc/mxnet
• Amazon AWS Deep Learning AMI (with other toolkits & anaconda)
https://aws.amazon.com/marketplace/pp/B01M0AXXQB
http://bit.ly/deepami
• CloudFormation Template
https://github.com/dmlc/mxnet/tree/master/tools/cfn
http://bit.ly/deepcfn
42. Acknowledgements
• User engagement
How Jing, Chao-Yuan Wu
• Temporal recommenders
Chao-Yuan Wu, Alex Beutel, Amr Ahmed
• MXNet & Deep Learning AMI
Mu Li, Tianqi Chen, Bing Xu, Eric Xie, Joseph Spisak,
Naveen Swamy, Anirudh Subramanian and many more …
We are hiring
{smola, thakerb, spisakj}@amazon.com