MLconf NYC Edo Liberty


This document discusses streaming data mining. It begins by explaining the difference between single machine and distributed data mining. It then introduces the streaming model for distributed data mining where data and computation are distributed across multiple machines in parallel. The document provides examples of algorithms for mining frequent items in data streams and maintaining approximate distributions. It also discusses using the streaming model for threading machine generated emails and mentions other types of mining that can be done in the streaming model like dimensionality reduction and clustering.


Streaming Data Mining
PRESENTED BY Edo Liberty | April 11, 2014
Copyright © 2014 Yahoo! All rights reserved. No reproduction or distribution allowed without express written permission.
Parts of this presentation were given with Jelani Nelson (Harvard) as a KDD tutorial on streaming data mining.

2 Yahoo Confidential & Proprietary
Single machine data mining
[Diagram labels: The World, Data, Computation, Result]

Distributed storage
[Diagram labels: The World, four Data nodes, Computation, Result]

Distributed model (map/reduce, message passing, …)
[Diagram labels: The World, eight "Data + Compute" nodes, Computation, Result]

Distributed model (indexes, tables, databases, …)
[Diagram labels: The World, eight "Data + Compute" nodes, Computation, Query, Result]

207 big-data infographics (meta infographic)

The streaming model
[Diagram labels: The World, Computation, Sketch, Query Algorithm, Query, Result]

The parallel streaming model
[Diagram labels: The World, four "Compute + Sketch" nodes, Aggregate + Sketch, Query Algorithm, Query, Result]
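The parallel model relies on sketches that can be built independently and then aggregated. The following is a minimal Python sketch of that idea, assuming a deliberately trivial summary (count, sum, and maximum) rather than any sketch from the talk. The class name `Summary`, the helper `sketch_partition`, and the four-way split of the stream are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Constant-size sketch: item count, running sum, and maximum."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def update(self, item):
        # Per-item work is O(1); the raw item is not stored.
        self.count += 1
        self.total += item
        self.maximum = max(self.maximum, item)

    def merge(self, other):
        # Aggregation step: combine two sketches into a new one.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )


def sketch_partition(items):
    """Build a sketch from one partition in a single pass."""
    summary = Summary()
    for item in items:
        summary.update(item)
    return summary


# The stream 1 7 8 1 0 1 7 7, split across four hypothetical workers.
partitions = [[1, 7], [8, 1], [0, 1], [7, 7]]
merged = reduce(Summary.merge, (sketch_partition(p) for p in partitions))

# Sketching the whole stream on one machine gives the same summary.
single = sketch_partition([x for p in partitions for x in p])
assert merged == single
print(merged)
```

Only the small `Summary` objects travel to the aggregator, never the raw partitions, which is the property the diagram depends on.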

The streaming model (more accurately)
[Diagram labels: stream 1 7 8 1 0 1 7 7, Iterator, Computation, Sketch, Result]
O(n) items
O(polylog(n)) space
O(polylog(n)) computation per item

Communication complexity
[Diagram labels: stream 1 7 8 1 0 1 7 7, two Iterators, Sketch, Result]

Frequent items
Misra, Gries. Finding repeated elements, 1982.
Demaine, Lopez-Ortiz, Munro. Frequency estimation of internet packet streams with limited space, 2002.
Karp, Shenker, Papadimitriou. A simple algorithm for finding frequent elements in streams and bags, 2003.
The name "Lossy Counting" was used for a different algorithm by Manku and Motwani, 2002.
Metwally, Agrawal, Abbadi. Efficient computation of frequent and top-k elements in data streams, 2006.

[Figure labels: d, n, $f(x) = 5$, where $x$ stands for an item icon lost in extraction]

[Figure labels: d, $f(x) = 5$]

[Figure labels: $\ell$, $f'(x) = 0$, $f'(x) = 2$]

The proof (very short)
Assume we do this $t$ times.
First fact: $f'(x) \le f(x)$
Second fact: $f'(x) \ge f(x) - t$

The proof (very short)
Third (not so obvious) fact:
$$0 \le \sum_x f'(x) = \sum_x f(x) - t \cdot \ell = n - t \cdot \ell$$
Which gives $t \le n/\ell$. In words: we can only delete $\ell$ items $n/\ell$ times!
$$|f'(x) - f(x)| \le n/\ell \qquad \blacksquare$$
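The three facts above can be checked on a small stream. The following is a minimal Python sketch of a frequent-items counter in the style of Misra and Gries, assuming the delete step removes one copy of each of $\ell$ distinct items (the $\ell - 1$ tracked items plus the new arrival). The slides that define the delete step are missing from this transcript, so this formulation, the function name `misra_gries`, and the choice $\ell = 3$ are assumptions made for illustration.

```python
from collections import Counter


def misra_gries(stream, ell):
    """Single pass over `stream` keeping at most ell - 1 counters.

    Returns (estimates, t): the approximate counts f'(x) and the number
    of delete steps t. Assumed formulation, not taken from the slides.
    """
    counters = {}
    deletions = 0
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < ell - 1:
            counters[item] = 1
        else:
            # Delete step: drop one copy of each of ell distinct items,
            # namely the ell - 1 tracked items and the new arrival.
            deletions += 1
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, deletions


stream = [1, 7, 8, 1, 0, 1, 7, 7]  # the example stream from the slides
ell = 3
n = len(stream)
exact = Counter(stream)            # f(x), kept here only for comparison
estimates, t = misra_gries(stream, ell)

for x in exact:
    approx = estimates.get(x, 0)   # f'(x) is 0 for untracked items
    assert approx <= exact[x]                 # first fact
    assert approx >= exact[x] - t             # second fact
    assert abs(approx - exact[x]) <= n / ell  # final bound
assert sum(estimates.values()) == n - t * ell  # third fact
assert t <= n / ell
print(estimates, t)
```

The exact `Counter` is there only to verify the bounds. In a real stream it would not be available, and the sketch alone would hold at most $\ell - 1$ counters regardless of $n$.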

Useful form…
Define $p(x) = f(x)/n$
And $p'(x) = f'(x)/n$
We get that $|p'(x) - p(x)| \le 1/\ell$
This is very useful for keeping approximate distributions!
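The bound on $p'$ follows from the frequency bound in one step. A short derivation, assuming the bound $|f'(x) - f(x)| \le n/\ell$ from the proof and using $\ell = 100$ only as an illustrative value:

```latex
\[
|p'(x) - p(x)| = \frac{|f'(x) - f(x)|}{n}
\le \frac{1}{n} \cdot \frac{n}{\ell} = \frac{1}{\ell},
\qquad \text{e.g. } \ell = 100 \;\Rightarrow\; |p'(x) - p(x)| \le 0.01 .
\]
```

The error depends only on the number of counters $\ell$, not on the stream length $n$.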

Email threads
A simple email thread (that’s not very hard to do…)

Threading Machine Generated Email
Ailon, Karnin, Maarek, Liberty, Threading Machine Generated Email, WSDM 2013

Threading Machine Generated Email
[Two further slides with this title; figures not recoverable]

What else can we do in the streaming model…
Items (words, IP addresses, events, clicks, ...):
- Item frequencies
- Counting distinct elements
- Moment and entropy estimation
- Approximate set operations
Vectors (text documents, images, example features, ...):
- Dimensionality reduction
- Clustering (k-means, k-median, …)
- Linear regression
- Machine learning (some of it at least)
Matrices (text corpora, user preferences, graphs, ...):
- Covariance matrix estimation
- Low rank approximation
- Sparsification

Thanks!
Yahoo does big data algorithms, software and systems!
Speak to our Talent Team or visit Careers.Yahoo.com and explore our
career opportunities in NYC or Sunnyvale, CA
Seth Tropper
satropper@yahoo-inc.com
Doug DeSimone
desimone@yahoo-inc.com
Keith Daniels
kdnl@yahoo-inc.com
Yahoo is an equal opportunity employer.

ggplot2 Examples. References: ggplot2 Elegant Graphics for Data Analysis; Wickham, Hadley http://www.stat.wisc.edu/~larget/stat302/chap2.pdf https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html https://www3.nd.edu/~steve/computing_with_data/11_geom_examples/ggplot_examples.html http://seananderson.ca/ggplot2-FISH554/ http://ggobi.github.io/ggally/docs.html#columns_and_mapping

Este año se cumple el 10º aniversario de la publicación de uno de los papers que más impacto han tenido en la evolución de Internet. Elaborado por dos ingenieros de Google, supuso el pistoletazo de salida para el surgimiento de las tecnologías que se engloban dentro del concepto Big Data. En la charla introduciremos los conceptos básicos de este modelo de programación y realizaremos un ejemplo utilizando el lenguaje python.

A Survey Of R Graphics

Dataspora

Funções 5

KalculosOnline

Data Visualization with R.ggplot2 and its extensions examples.

Dr. Volkan OBAN

Seminar PSU 10.10.2014 mme

Vyacheslav Arbuzov

การจัดการฉากหลังของสไลด์

PomPam Comsci

คู่มือการใช้ MS Power Point 2003 การจัดการฉากหลังของสไลด์ เรียบเรียงโดย : ณัฐพร น้ำใจดี ที่มา : ศูนย์เทคโนโลศูนย์เทคโนโลยีสารสนเทศและการสื่อสาร สำนักงานปลัดกระทรวงสาธารณสุข

MS2 POwer RulesChadwick International School

CLIM Undergraduate Workshop: (Attachment) Performing Extreme Value Analysis (...

The Statistical and Applied Mathematical Sciences Institute

Exponential Functionsacwalk03

Seminar psu 20.10.2013Vyacheslav Arbuzov

Machine Learning Summer School 2016

chris wiggins

Lectures delivered Aug 8-9, 2016 at MLSS.cc (Arequipa, Peru) "Data Science @ The New York Times". Topics include: descriptive/predictive/prescriptive modeling boosting & surrogate loss functions pp103-117 causality via generative models (p141-215) POISE: “policy optimization via importance sample estimation” connection w/causal inference & matching (p188-215) brief intro to instrumental variables (p230-248) bandits (p249-316) connecting thompson sampling with generative modeling (p259-305)

Using Topological Data Analysis on your BigData

AnalyticsWeek

Synopsis: Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks. Speaker: Anthony Bak, Senior Data Scientist, Ayasdi Prior to coming to Ayasdi, Anthony was at Stanford University where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. work in algebraic geometry with applications to string theory at the University of Pennsylvania and ,along the way, he worked at the Max Planck Institute in Germany, Mount Holyoke College in Germany, and the American Institute of Mathematics in California.

Big Data On Data You Don’t Have

J On The Beach

Traditional Big Data is done on Data you have. You load the data into a repository and perform map reduce or other style calculations on the data. However, certain industries need to perform complex operations on data you might not have. Data you can acquire, Data that can be shared with you, and Data that you can model are all types of data you may not have but may need to integrate instantly into a complex data analysis. Problem is: you may not even know you need this data until deep into the execution stack at runtime. This talk discusses a new functional language paradigm for dealing naturally with data you don’t have and about how to make all data first-class citizens, regardless of whether you have it or you don’t, and we will give a demo of a project written in Scala to deal exactly with this issue.

Data Democratization at Nubank

Databricks

Nubank is the leading fintech in Latin America. Using bleeding-edge technology, design, and data, the company aims to fight complexity and empower people to take control of their finances. We are disrupting an outdated and bureaucratic system by building a simple, safe and 100% digital environment. In order to succeed, we need to constantly make better decisions in the speed of insight, and that’s what We aim when building Nubank’s Data Platform. In this talk we want to explore and share the guiding principles and how we created an automated, scalable, declarative and self-service platform that has more than 200 contributors, mostly non-technical, to build 8 thousand distinct datasets, ingesting data from 800 databases, leveraging Apache Spark expressiveness and scalability. The topics we want to explore are: – Making data-ingestion a no-brainer when creating new services – Reducing the cycle time to deploy new Datasets and Machine Learning models to production – Closing the loop and leverage knowledge processed in the analytical environment to take decisions in production – Providing the perfect level of abstraction to users You will get from this talk: – Our love for ‘The Log’ and how we use it to decouple databases from its schema and distribute the work to keep schemas up to date to the entire team. – How we made data ingestion so simple using Kafka Streams that teams stopped using databases for analytical data. – The huge benefits of relying on the DataFrame API to create datasets which made possible having tests end-to-end verifying that the 8000 datasets work without even running a Spark Job and much more. – The importance of creating the right amount of abstractions and restrictions to have the power to optimize.

UBC STAT545 2014 Cm001 intro to-course

Jennifer Bryan

F sharp - an overview

Christoph Santschi

Master Minds on Data Science - Arno Siebes

Media Perspectives

208 dataflowdgm

TCT

Data flow

Hady Saeed

How to Data Flow Diagram

جلال مصطفیٰ

Ijarcet vol-2-issue-4-1579-1582Editor IJARCET

What's hot

Mi primer map reduce

betabeers

Mi primer map reduce

Ruben Orta

A Survey Of R Graphics

Dataspora

Funções 5

KalculosOnline

Data Visualization with R.ggplot2 and its extensions examples.

Dr. Volkan OBAN

Seminar PSU 10.10.2014 mme

Vyacheslav Arbuzov

การจัดการฉากหลังของสไลด์

PomPam Comsci

MS2 POwer RulesChadwick International School

CLIM Undergraduate Workshop: (Attachment) Performing Extreme Value Analysis (...

The Statistical and Applied Mathematical Sciences Institute

Exponential Functionsacwalk03

Seminar psu 20.10.2013Vyacheslav Arbuzov

What's hot (11)

Mi primer map reduce

A Survey Of R Graphics

Funções 5

Data Visualization with R.ggplot2 and its extensions examples.

Seminar PSU 10.10.2014 mme

การจัดการฉากหลังของสไลด์

MS2 POwer Rules

CLIM Undergraduate Workshop: (Attachment) Performing Extreme Value Analysis (...

Exponential Functions

Seminar psu 20.10.2013

Similar to MLconf NYC Edo Liberty

Machine Learning Summer School 2016

chris wiggins

Using Topological Data Analysis on your BigData

AnalyticsWeek

Big Data On Data You Don’t Have

J On The Beach

Data Democratization at Nubank

Databricks

UBC STAT545 2014 Cm001 intro to-course

Jennifer Bryan

F sharp - an overview

Christoph Santschi

Master Minds on Data Science - Arno Siebes

Media Perspectives

208 dataflowdgm

TCT

Data flow

Hady Saeed

How to Data Flow Diagram

جلال مصطفیٰ

Ijarcet vol-2-issue-4-1579-1582Editor IJARCET

Applied Business Statistics ,ken black , ch 5

AbdelmonsifFadl

Applications of Machine Learning at UCSB

Sri Ambati

208 dataflowdgmmarwakhalid

Data Science, what even...

David Coallier

Intelligent Ruby + Machine LearningIlya Grigorik

Data visualization in Python

Marc Garcia

Data Science, what even?!

David Coallier

Sustainable Logging – SplunkLive! 2014

Paul Gilowey

How it works- Data Science

Edureka!

Similar to MLconf NYC Edo Liberty (20)

Machine Learning Summer School 2016

Using Topological Data Analysis on your BigData

Big Data On Data You Don’t Have

Data Democratization at Nubank

UBC STAT545 2014 Cm001 intro to-course

F sharp - an overview

Master Minds on Data Science - Arno Siebes

208 dataflowdgm

Data flow

How to Data Flow Diagram

Ijarcet vol-2-issue-4-1579-1582

Applied Business Statistics ,ken black , ch 5

Applications of Machine Learning at UCSB

208 dataflowdgm

Data Science, what even...

Intelligent Ruby + Machine Learning

Data visualization in Python

Data Science, what even?!

Sustainable Logging – SplunkLive! 2014

How it works- Data Science

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...

MLconf

Understanding Human Impact: Social and Equity Assessments for AI Technologies Social and Equity Impact Assessments have broad applications but can be a useful tool to explore and mitigate for Machine Learning fairness issues and can be applied to product specific questions as a way to generate insights and learnings about users, as well as impacts on society broadly as a result of the deployment of new and emerging technologies. In this presentation, my goal is to advocate for and highlight the need to consult community and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making and to introduce principles, methods and process for these types of impact assessments.

Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding

MLconf

The Brain’s Guide to Dealing with Context in Language Understanding Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity. In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.

Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...

MLconf

Applying Computer Vision to Reduce Contamination in the Recycling Stream With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt. Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology. Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean. Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable. In this presentation, we will walk through our ML-based contamination measurement and scoring process by showing how Waste Management, a national waste hauler, has experienced 57% contamination reduction in nearly 2,000 containers over six months, This progress shows significant strides towards financially viable recycling services.

Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush

MLconf

Quantum Computing: a Treasure Hunt, not a Gold Rush Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.

Josh Wills - Data Labeling as Religious Experience

MLconf

Data Labeling as Religious Experience One of the most common places to deploy a production machine learning systems is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll be walking through the process and tooling we used to help us design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.

Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...

MLconf

Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years ago, to the now extinct hominin Australopithecus afarensis. Fine grained analysis of gait using the modern MEMS sensors found on all smartphones not just reveals a lot about the person’s orthopedic and neuromuscular health status, but also has enough idiosyncratic clues that it can be harnessed as a passive biometric. While there were many siloed attempts made by the machine learning community to model Bipedal Gait sensor data, these were done with small datasets oft collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest ever planet-sized motion sensor based human bipedal gait dataset ever curated. We’ll also present the associated state-of-the-art results in classifying humans harnessing novel deep neural architectures and the related success stories we have enjoyed in transfer-learning into disparate domains of human kinematics analysis.

Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...

MLconf

Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurate as possible. In this talk, I will discuss development of novel ML models that help classifying healthy people from those who develop Alzheimer's, using short samples of human speech. As an input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as e.g. production rules extracted from syntactic parse trees, (2) lexical measures, such as e.g. features of lexical richness and complexity and lexical norms, and (3) acoustic measures, such as e.g. standard Mel-frequency cepstral coefficients. I will present the ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state of the art performance in both supervised and semi-supervised manner, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully-automated speech-based Alzheimer's disease detection model, focusing mostly on the analysis of the impact of a not-so-accurate automatic speech recognition (ASR) on the classification performance. To illustrate this, I will present the experiments with controlled amounts of artificially generated ASR errors and explain how the deletion errors affect Alzheimer's detection performance the most, due to their impact on the features of syntactic and lexical complexity.

Meghana Ravikumar - Optimized Image Classification on the Cheap

MLconf

Optimized Image Classification on the Cheap In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning -fine tuning and feature extraction- and the impact of hyperparameter optimization on these techniques. Once we define the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier’s performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations using the downstream image classifier’s performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier. To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?

Noam Finkelstein - The Importance of Modeling Data Collection

MLconf

The Importance of Modeling Data Collection Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have. In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set. My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.

June Andrews - The Uncanny Valley of ML

MLconf

The Uncanny Valley of ML Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.

Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks

MLconf

Deep Learning Architectures for Semantic Relation Detection Tasks Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and require to identify specific semantic relations. In this talk, I will first present novel techniques for creating labelled datasets required for training deep learning models for classifying semantic relations between phrases. I will further present various neural network architectures that integrate morphological features into integrated path-based and distributional relation detection algorithms and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.

Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...

MLconf

Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability. We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher capacity deep learning based models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models that are built on simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed in order to enable the global deep learned models to focus their attention on capturing regional taste preferences and changing catalog.In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that ? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. 
Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to nicely and quickly adapt to catalog and users’ taste changes, and improve overall performance.

Vito Ostuni - The Voice: New Challenges in a Zero UI World

MLconf

Vito Ostuni - The Voice: New Challenges in a Zero UI World The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query. We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.

Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...

MLconf

Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...

MLconf

Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...

MLconf

Neel Sundaresan - Teaching a machine to code

MLconf

Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...

MLconf

Soumith Chintala - Increasing the Impact of AI Through Better Software

MLconf

Roy Lowrance - Predicting Bond Prices: Regime Changes

MLconf
