TextCNN is a deep learning model for text classification that uses an embedding layer, 1D convolution filters, and max pooling. It was applied to moderate user-generated content (UGC) such as reviews and Q&A for an e-commerce company. The new TextCNN models improved over the old models, achieving higher ROC scores and allowing more UGC to be auto-accepted while maintaining precision. For reviews, the new model could auto-accept 30% more content while reducing bad approvals. For Q&A, it could auto-accept 377% more content while improving precision by 7%.
The Validity of CNN for the Time-Series Forecasting Problem, by Masaharu Kinoshita
In order to confirm the validity of CNN for the time-series forecasting problem, RNN, LSTM, and CNN+LSTM models are built and compared by their MSE scores.
In this report, the Google stock datasets obtained from Kaggle are used.
https://github.com/kinopee0219/capstone
Our fall 12-week Data Science bootcamp starts on Sept 21st, 2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will run a remote session with us through Google Hangouts.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist at Supstat Inc and also a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is an author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisites (if any): R / Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction to XGBoost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XGBoost Demo
Reference:
https://github.com/dmlc/xgboost
In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt, by Eugene Yan Ziyou
Our team achieved 85th position out of 3,514 at the very popular Kaggle Otto Product Classification Challenge. Here's an overview of how we did it, as well as some techniques we learnt from fellow Kagglers during and after the competition.
Top contenders in the 2015 KDD Cup include the team from DataRobot comprising Owen Zhang, the #1-ranked Kaggler, and top Kagglers Xavier Conort and Sergey Yurgenson. Get an in-depth look as Xavier describes their approach. DataRobot allowed the team to focus on feature engineering by automating model training, hyperparameter tuning, and model blending, thus giving the team a firm advantage.
RecSys 2016: Modeling Contextual Information in Session-Aware Recommender Systems, by Bartlomiej Twardowski
Modeling Contextual Information in Session-Aware Recommender Systems with Neural Networks, RecSys 2016 Boston, Bartłomiej Twardowski
Presentation for a paper:
http://dl.acm.org/citation.cfm?id=2959162
Abstract:
Preparing recommendations for unknown users or such that correctly respond to the short-term needs of a particular user is one of the fundamental problems for e-commerce. Most of the common Recommender Systems assume that user identification must be explicit. In this paper a Session-Aware Recommender System approach is presented where no straightforward user information is required. The recommendation process is based only on user activity within a single session, defined as a sequence of events. This information is incorporated in the recommendation process by explicit context modeling with factorization methods and a novel approach with Recurrent Neural Network (RNN). Compared to the session modeling approach, RNN directly models the dependency of user observed sequential behavior throughout its recurrent structure. The evaluation discusses the results based on sessions from real-life system with ephemeral items (identified only by the set of their attributes) for the task of top-n best recommendations.
Feature Engineering - Getting most out of data for predictive models - TDC 2017, by Gabriel Moreira
How should data be preprocessed for use in machine learning algorithms? How do you identify the most predictive attributes of a dataset? What features can be generated to improve the accuracy of a model?
Feature Engineering is the process of extracting and selecting, from raw data, features that can be used effectively in predictive models. As the quality of the features greatly influences the quality of the results, knowing the main techniques and pitfalls will help you to succeed in the use of machine learning in your projects.
In this talk, we will present methods and techniques that allow us to extract the maximum potential of the features of a dataset, increasing the flexibility, simplicity and accuracy of the models. We will cover the analysis of feature distributions and correlations, and the transformation of numeric attributes (such as scaling, normalization, log-based transformation, binning), categorical attributes (such as one-hot encoding, feature hashing), temporal attributes (date/time), and free-text attributes (text vectorization, topic modeling).
Python, scikit-learn, and Spark SQL examples will be presented, along with how to use domain knowledge and intuition to select and generate features relevant to predictive models.
Tensors Are All You Need: Faster Inference with Hummingbird, by Databricks
The ever-increasing interest around deep learning and neural networks has led to a vast increase in processing frameworks like TensorFlow and PyTorch. These libraries are built around the idea of a computational graph that models the dataflow of individual units. Because tensors are their basic computational unit, these frameworks can run efficiently on hardware accelerators (e.g. GPUs). Traditional machine learning (ML) models such as linear regressions and decision trees in scikit-learn cannot currently be run on GPUs, missing out on the potential accelerations that deep learning and neural networks enjoy.
In this talk, we’ll show how you can use Hummingbird to achieve 1000x speedup in inferencing on GPUs by converting your traditional ML models to tensor-based models (PyTorch and TVM). https://github.com/microsoft/hummingbird
This talk is for intermediate audiences that use traditional machine learning and want to speed up the time it takes to perform inference with these models. After watching the talk, the audience should be able to use ~5 lines of code to convert their traditional models to tensor-based models and try them out on GPUs.
Outline:
Introduction of what ML inference is (and why it’s different than training)
Motivation: Tensor-based DNN frameworks allow inference on GPU, but “traditional” ML frameworks do not
Why “traditional” ML methods are important
Introduction of what Hummingbird does and main benefits
Deep dive on how traditional ML models are built
Brief intro on how the Hummingbird converter works
Example of how Hummingbird can convert a tree model into a tensor-based model
Other models
Demo
Status
Q&A
How Criteo optimized and sped up its TensorFlow models by 10x and served them..., by Nicolas Kowalski
When you access a web page, bidders such as Criteo must determine in a few dozens of milliseconds if they want to purchase the advertising space on the page. At that moment, a real-time auction takes place, and once you remove all the communication exchange delays, it leaves a handful of milliseconds to compute exactly how much to bid. In the past year, Criteo has put a large amount of effort into reshaping its in-house machine learning stack responsible for making such predictions—in particular, opening it to new technologies such as TensorFlow.
Unfortunately, even for simple logistic regression models and small neural networks, Criteo’s initial TensorFlow implementations saw inference time increase by 100x, going from 300 microseconds to 30 milliseconds.
Nicolas Kowalski and Axel Antoniotti outline how Criteo approached this issue, discussing how Criteo profiled its model to understand its bottleneck; why commonly shared solutions such as optimizing the TensorFlow build for the target hardware, freezing and cleaning up the model, and using accelerated linear algebra (XLA) ended up being lackluster; and how Criteo rewrote its models from scratch, reimplementing cross-features and hashing functions using low-level TF operations in order to factorize as much as possible all the TensorFlow nodes in its model.
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspective, by PyData
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This talk is about *the* Moore's Law, the bull that the other "Laws" ride; and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
AI optimizing HPC simulations (presentation from the 6th EULAG Workshop), by byteLAKE
See our presentation from the 6th International EULAG Users Workshop. We talked about taking HPC to "Industry 4.0" by implementing smart techniques to optimize codes in terms of performance and energy consumption. It explains how machine learning can dynamically optimize HPC simulations and presents byteLAKE's software autotuning solution.
Find out more about byteLAKE at: www.byteLAKE.com
Convolutional Neural Networks for Image Classification (Cape Town Deep Learning Meet-up), by Alex Conway
Slides for my talk on:
"Convolutional Neural Networks for Image Classification"
...at the Cape Town Deep Learning Meet-up 20170620
https://www.meetup.com/Cape-Town-deep-learning/events/240485642/
Handwritten Digit Recognition and performance of various models, by Subhradeep Maji
This presentation is all about handwritten digit recognition for different people using Convolutional Neural Networks, comparing the performance of different models based on different sequences of layers.
Building a TensorFlow-based model that extracts the "best" frames from a video, which are then used as auto-generated thumbnails and thumbstrips. We used transfer learning on Google's Inception v3 model, which was pretrained on ImageNet data and retrained on JW Player's thumbnail library.
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and "where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
This presentation introduces Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks, followed by an Angular application that uses TypeScript to replicate the TensorFlow playground.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality, by Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024, by Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024, by Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for, or limiting, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
JMeter webinar - integration with InfluxDB and Grafana, by RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. ● Stuff you are interested in
○ TextCNN
○ Why you should try deep-learning on text data
● On real product
○ Acme review / Q&A moderation
○ Network Spec
Outline
4. TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each conv “searches for this pattern”
ii. Pattern to be searched is trainable
iii. hundreds/thousands filters per size
c. Maxpool results
i. Means “max density of certain pattern”
2. Keras code: just 10 lines
5. Let’s take an intuitive analogy!
(Keyword: Convolution, MaxPool)
6. [Figure: “I’m a Llama” (sample L) vs. “I’m an Alpaca” (sample A)]
1. You want to classify whether the right-hand animal is a Llama.
7. 2. So you find traits of a Llama as your filters.
Traits (Filters)
8. 3. Then you do a convolution to find the max match of each trait.
[Figure: Llama traits (filters) finding traits by Convolution; match scores 0%, 70%, 10%, 10%, 5% → best match of the 1st trait is 70% (by MaxPool)]
9. 3. Then you do a convolution to find the max match of each trait.
[Figure: match scores 0%, 10%, 80%, 60%, 15% → best match of the 2nd trait is 80% (by MaxPool)]
10. 3. Then you do a convolution to find the max match of each trait.
[Figure: match scores 0%, 10%, 10%, 40%, 60% → best match of the 3rd trait is 60% (by MaxPool)]
11. 4. Finally, you have similarities for each trait (features!)
[Figure: Llama traits (filters) → feature vector: 70%, 80%, 60%]
12. 5. Make the final decision (model) out of your features.
(In a neural network, a multi-layer perceptron is simplest.)
[Figure: Llama traits (filters) 70%, 80%, 60% → M: any classifier you love!]
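Those pooled trait-match scores are just a small feature vector, so step 5 really can use any classifier. A minimal sketch with scikit-learn logistic regression; the scores and labels below are illustrative toy values in the spirit of the analogy, not data from the slides:

```python
from sklearn.linear_model import LogisticRegression

# Max-pooled trait-match scores: one row per animal photo.
X = [[0.70, 0.80, 0.60],   # llama: strong match on every trait
     [0.05, 0.10, 0.15]]   # alpaca: weak match on every trait
y = [1, 0]                 # 1 = llama, 0 = not llama

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.65, 0.75, 0.55]]))  # classify a new photo's scores
```

The same idea carries over to TextCNN: convolution + max pooling turn raw text into a fixed-length vector that any downstream model can consume.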
13. Conv & MaxPool
Now you got the idea,
Let’s dive into a bit more detail.
14. TextCNN: what is a 1d convolution?
Let’s talk in Convolutions!
10*3 + 20*0 + 30*1 + 40*2 = 140
Text Data Filter
15. TextCNN: what is a 1d convolution?
Let’s talk in Convolution!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
Text Data Filter
16. TextCNN: what is a 1d convolution?
Let’s talk in Convolution!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
10*0 + 20*2 + 30*0 + 40*0 = 40
Text Data Filter
17. TextCNN: what is a 1d convolution?
Let’s talk in Convolution!
10*3 + 20*0 + 30*1 + 40*2 = 140
10*0 + 20*0 + 30*0 + 40*0 = 0
10*0 + 20*2 + 30*0 + 40*0 = 40
10*0 + 20*0 + 30*0 + 40*0 = 0
Note: you can specify an activation function in Conv1D to adjust your output,
and it seems people use ReLU in TextCNN.
Text Data Filter
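The sliding dot product above can be reproduced in a few lines of NumPy. This is an illustrative sketch: the sequence and filter values are toy numbers in the spirit of the figure, not the slide's exact matrix.

```python
import numpy as np

def conv1d_valid(x, w):
    """Slide filter w over sequence x; dot product at each position."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

x = np.array([10, 20, 30, 40, 0, 0])  # toy "text" sequence
w = np.array([3, 0, 1, 2])            # trainable filter (the "pattern")

feats = conv1d_valid(x, w)
print(feats[0])       # 10*3 + 20*0 + 30*1 + 40*2 = 140
print(feats.max())    # MaxPool: strongest match of this pattern anywhere
```

Real TextCNN filters span the full embedding width and slide only along the word axis, but the per-position multiply-and-sum is exactly this.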
18. 1. The idea is similar to n-gram BoW/TF-IDF, but:
a. No exact match is needed; words with similar
meanings also contribute.
b. Weighted among tokens/dimensions.
c. MaxPool makes only the best-matched pattern
have the final effect.
2. It folds
a. training the embedding, and
b. training the feature finder
c. into your supervised learning process,
dedicated to your data.
3. Meanwhile, most traditional feature finding & extraction
is actually an unsupervised learning process.
TextCNN: what is a 1d convolution?
Text Data Filter
19. Now you know the details.
Let's return to the full picture!
20. TextCNN: architecture
1. Architecture
a. Embedding
i. Trainable, or not (if using pretrained)
b. 1d convolution filters
i. Each “searches for this pattern”
ii. Pattern to be searched is trainable
iii. hundreds/thousands filters per size
c. Maxpool results
i. Means “max density of certain pattern”
2. Keras code: just 10 lines
21. 1. Input: 140 as the max word count per doc.
2. 60k for the English dictionary size.
3. 300 is the conventional embedding size.
4. region size = [2, 3, 4], as in the example figure.
5. filters = 2, as in the example figure.
6. dropout = 0.5 because Hinton said so.
TextCNN: code detail
21
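The "10 lines" themselves are not in the transcript; a minimal Keras sketch matching the hyperparameters above (variable names are my own, not the talk's) might look like:

```python
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(140,))               # max word count per doc
x = layers.Embedding(60000, 300)(inp)          # dictionary size, embedding size
pooled = [layers.GlobalMaxPooling1D()(         # maxpool per filter size
              layers.Conv1D(filters=2, kernel_size=k, activation="relu")(x))
          for k in (2, 3, 4)]                  # region sizes
x = layers.Dropout(0.5)(layers.Concatenate()(pooled))
out = layers.Dense(1, activation="sigmoid")(x) # binary moderation output
model = Model(inp, out)
```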
22. Advanced trick: concatenate channels from multiple features
[Figure: Text_1 (ex: resume) → textcnn_vec1 and Text_2 (ex: jobTitle) → textcnn_vec2 are concatenated and fed to an MLP. Output stage: sigmoid, softmax, linear… according to your response type.]
23. Advanced trick: then concatenate regular features you love
[Figure: Text_1 (ex: resume) → textcnn_vec1 and Text_2 (ex: jobTitle) → textcnn_vec2, plus traditional features from your fancy feature engineering on Others (ex: sex/lang); all concatenated and fed to an MLP. Output stage: sigmoid, softmax, linear… according to your response type.]
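The two tricks above can be sketched with the Keras functional API; the branch sizes and feature width here are placeholders, not the talk's actual values:

```python
from tensorflow.keras import layers, Model

def text_branch(max_len=140, vocab=60000, emb=300):
    """One TextCNN channel: embed, convolve, maxpool, concatenate."""
    inp = layers.Input(shape=(max_len,))
    x = layers.Embedding(vocab, emb)(inp)
    pooled = [layers.GlobalMaxPooling1D()(
                  layers.Conv1D(2, k, activation="relu")(x))
              for k in (2, 3, 4)]
    return inp, layers.Concatenate()(pooled)

in1, vec1 = text_branch()          # Text_1, e.g. resume
in2, vec2 = text_branch()          # Text_2, e.g. jobTitle
in3 = layers.Input(shape=(8,))     # traditional features, e.g. encoded sex/lang
x = layers.Concatenate()([vec1, vec2, in3])
x = layers.Dense(64, activation="relu")(x)      # the MLP
out = layers.Dense(1, activation="sigmoid")(x)  # output stage per response type
model = Model([in1, in2, in3], out)
```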
25. Better features = Better performance
● Doing much better than ngram
○ Embedding => better resolution than word level
○ Words with similar meanings will also work
○ Feature extraction is included in the supervised learning, for your data
● No manual feature extraction
○ No need to reproduce feature extraction in the deploy (Java) domain.
○ Feature extraction from text data can be computationally expensive:
■ Dictionary-based features are slow for large sample sizes
■ Model-based features like NMF, LDA are super slow.
● Pretrained embeddings give you a boost (word2vec, GloVe, FastText):
○ The idea is like transfer learning.
○ Your model can then fine-tune the embedding further to fit your dataset.
26. Customizable / Reusable!
● Customize your model according to your data/purpose.
○ Switch the output layer / activation function for different response types/ranges.
○ Customize the architecture according to your data characteristics.
○ Mess around with hidden layers / dropout / different activation functions.
● Merits as an online model (SGD over BGD)
○ Sustainable model: old model + new samples = new model keeping the old experience!
○ Memory friendly: no need to load all samples in memory; choose a mini-batch size
fitting your usage.
● GPUs speed you up!
27. Deployment: Tensorflow Java API (doc)
1. Load the protobuf model
2. Input tokenized text
3. Get prediction results
28. No free lunch, it will cost you ...
● Big network => expensive computing power
○ The embedding layer is expensive, since it's a huge fully-connected layer.
○ Reference: predicting throughput is ~750/sec for our deployed model (w/o GPU).
● Larger model
○ Both models we're deploying are ~100 MB with 13M total/trainable parameters, while a simple
tree-based model can be < 10 MB.
○ In our case, it takes 4.5 hours to train one epoch on 1.2M reviews with 8 CPU / 32 GB RAM.
● Solutions for throughput:
○ GPU acceleration
○ Predict in parallel in production
○ Use a coarse model to narrow the search space; use the fine-grained model only to sort
promising candidates.
30. Preliminary
● User generated content (UGC) is a valuable asset at Acme.
● Bad UGC will ruin the user experience, or get us sued.
● Today we're talking about the models moderating Reviews and Q&A.
32. Acme reviews classifier: class distribution
Somehow the old model tends to be too confident in separating samples into 1/0.
33. Acme reviews classifier
Since the prediction distribution of the NN is smooth, the precision-recall curve thresholds are
smoothly scattered across the whole range, too.
34. Acme reviews classifier
● Target: auto-accept 80% of user content, since we don't want humans moderating more than that.
● Currently NOT auto-rejecting anything, since stakeholders asked for that.
New model: auto-accepting 80%, 82% class-1 precision
Old model: auto-accepting 80%, 74% class-1 precision
(0.82 - 0.74) / (1 - 0.74) ≈ 30% less bad content being approved
[Figure: class pie chart; Bad 33%, Good 66%]
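The 30% figure above is the precision gap taken as a fraction of the old model's bad-approval rate:

```python
old_p, new_p = 0.74, 0.82                  # class-1 precision at 80% auto-accept
reduction = (new_p - old_p) / (1 - old_p)  # fraction of bad approvals eliminated
print(round(reduction * 100))              # 31; the slide rounds to 30%
```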
35. Acme Q&A answers classifier
Old model: ROC = 0.668
New model: ROC = 0.844
36. Acme Q&A answers classifier
The old model's predictions seem truncated for some reason.
37. Acme Q&A answers classifier
Since the prediction distribution of the NN is smooth, the precision-recall curve thresholds are
smoothly scattered across the whole range, too.
38. Acme QnA answers classifier
● Target: auto-accept 80% of user content, since we don't want humans moderating more than that.
● Currently NOT auto-rejecting anything, since stakeholders asked for that.
New model: auto-accepting 68%, 90% class-1 precision
Old model: auto-accepting 18%, 83% class-1 precision
Auto-moderated volume grows to 377% of the old model's; precision improved by 7%.
[Figure: class pie charts; Bad 30%, Good 66% / Good 70%]
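The 377% figure above is (approximately) the ratio of the two auto-accept rates:

```python
old_rate, new_rate = 0.18, 0.68  # share of content auto-accepted
ratio = new_rate / old_rate      # ~3.78x as much content auto-moderated
print(round(ratio * 100))        # 378; the slide reports 377%
```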
39. Learning Curve (QnA Invalidator)
● LGB performance got stuck at 0.77 @ ~350k training samples
● TextCNN performance got stuck at 0.83 @ 1.2M training samples
● LSTM seems like it might still be growing past 0.83? But we don't have more samples.