This document outlines an agenda for a CTO summit on machine learning and deep learning topics. It includes discussions on CNN and RNN architectures, word embeddings, entity embeddings, reinforcement learning, and tips for training deep neural networks. Specific applications mentioned include self-driving cars, image captioning, language modeling, and modeling store sales. It also includes summaries of papers and links to code examples.
Slides from the TensorFlow meetup at eBay NYC 06/07/2016 based on my blog https://medium.com/@st553/using-transfer-learning-to-classify-images-with-tensorflow-b0f3142b9366
At NECSTLab we are working on developing a Coursera specialization. The set of four courses will introduce students to FPGA technologies and to the concept of reconfigurability in FPGAs, presenting the available mechanisms and technologies at the device level and the tools and design methodologies required to design FPGA-based computing systems. The courses will present the different aspects of the design of FPGA-based systems, starting from basic knowledge and moving to advanced design methodologies to implement complex designs via SDAccel on Amazon AWS F1 instances. This talk will start by describing the work done so far and the future plans for realizing the specialization.
We will then focus on two research projects that will also be used during the online classes.
We will first present CAOS, a framework which helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, from the identification of the kernel functions to accelerate, to the optimization of such kernels, to the generation of the runtime management and the configuration files needed to program the FPGA. After CAOS, we will present the HUGenomics project. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. The HUGenomics framework aims at facilitating the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. Indeed, the system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.
A simplified way of approaching machine learning and deep learning from the ground up. The case for deep learning and an attempt to develop intuition for how/why it works. Advantages, state-of-the-art, and trends.
Presented at NYU Center for Genomics for NY Deep Learning Meetup
Talk given at PYCON Stockholm 2015
Intro to Deep Learning + taking a pretrained ImageNet network, extracting features, and an RBM on top = 97% accuracy after 1 hour (!) of training (in the top 10% of the Kaggle cats vs. dogs competition)
Julia: A modern language for software 2.0 - Viral Shah
This talk introduces the Julia language, the size of the community, the package ecosystem, differentiable programming, compiler design, and applications of scientific machine learning.
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research - AI Frontiers
In this talk at the AI Frontiers conference, Jeff Dean discusses recent trends and developments in deep learning research. Jeff touches on the significant progress that this research has produced in a number of areas, including computer vision, language understanding, translation, healthcare, and robotics. These advances are driven both by new algorithmic approaches to some of these problems and by the ability to scale computation for training ever larger models on larger datasets. Finally, one of the reasons for the rapid spread of the ideas and techniques of deep learning has been the availability of open source libraries such as TensorFlow. He gives an overview of why these software libraries have an important role in making the benefits of machine learning available throughout the world.
Recently I gave a talk at UC Berkeley regarding the transition from academia to industry in the context of Machine Learning and Data Science related roles. I based most of my slides on my own transition from being an Astrophysicist to a Machine Learning Expert. I hope this will be useful to many. Feedback is welcome!
An analysis of TensorFlow
- TensorFlow?
- Background
- DistBelief
- Tutorial - Logistic regression
- TensorFlow - internals
- Tutorial - CNN, RNN
- Benchmarks
- Other open-source frameworks
- If you are considering TensorFlow
- Installation
- References
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
I present a brief review of, and an outlook on, the rapid changes happening in the field of recommendation engine research on the heels of the deep learning revolution!
From Conventional Machine Learning to Deep Learning and Beyond - Chun-Hao Chang
In these slides, deep learning is compared with conventional machine learning, and the strengths of DNN models are explained.
The target audience is people who know machine learning or data mining but are not yet familiar with deep learning.
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) @ Christ College Bangalore, hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, and IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Covers the basics of artificial neural networks and the motivation for deep learning, explains certain deep learning networks (including deep belief networks and autoencoders), details the challenges of implementing a deep learning network at scale, and explains how we have implemented a distributed deep learning network over Spark.
How to use transfer learning to bootstrap image classification and question answering - Wee Hyong Tok
#theaiconf SFO 2018
Session by Danielle Dean, WeeHyong Tok
Transfer learning enables you to use pretrained deep neural networks trained on various large datasets (ImageNet, CIFAR, WikiQA, SQUAD, and more) and adapt them for various deep learning tasks (e.g., image classification, question answering, and more).
Wee Hyong Tok and Danielle Dean share the basics of transfer learning and demonstrate how to use the technique to bootstrap the building of custom image classifiers and custom question-answering (QA) models. You’ll learn how to use the pretrained CNNs available in various model libraries to custom-build a convolutional neural network for your use case. In addition, you’ll discover how to use transfer learning for question-answering tasks, with models trained on large QA datasets (WikiQA, SQUAD, and more), and adapt them for new question-answering tasks.
https://conferences.oreilly.com/artificial-intelligence/ai-ca/public/schedule/detail/68527
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas - Databricks
Does more data always improve ML models? Is it better to use distributed ML instead of single node ML?
In this talk I will show that while more data often improves DL models in high-variance problem spaces (with semi- or unstructured data) such as NLP, image, and video, more data does not significantly improve high-bias problem spaces where traditional ML is more appropriate. Additionally, even in the deep learning domain, single-node models can still outperform distributed models via transfer learning.
Data scientists have pain points: running many models in parallel, automating the experimental setup, and getting others (especially analysts) within an organization to use their models. Databricks addresses these problems using pandas UDFs, the ML runtime, and MLflow.
Distributed Deep Learning on Hadoop
Deep learning is useful in detecting anomalies like fraud, spam and money laundering; identifying similarities to augment search and text analytics; predicting customer lifetime value and churn; and recognizing faces and voices.
Deeplearning4j is an infinitely scalable deep-learning architecture suitable for Hadoop and other big-data structures. It includes a distributed deep-learning framework and a normal deep-learning framework; i.e. it runs on a single thread as well. Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure. The distributed deep-learning framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models.
The framework's neural nets include restricted Boltzmann machines, deep-belief networks, deep autoencoders, convolutional nets and recursive neural tensor networks.
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
This is my solution for the Cloudera Data Science Challenge 3. I use Spark MLLib for problem1, and Spark GraphX for problem3. Problem2 is "simple" streaming map-reduce.
In this deck, Huihuo Zheng from Argonne National Laboratory presents: Data Parallel Deep Learning.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two weeks of training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-lsl
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Screencasting and Presenting for EngineersKunal Johar
Engineers often think about the 'how' as the most exciting part of their work. These details often bore would-be attentive listeners.
Take a step back, think about what excites others, then ease in your grand challenges. Tie it all together in a story.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-warden
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Pete Warden, Google research engineer and the tech lead of the TensorFlow Mobile and Embedded team, presents the "Solving Vision Tasks Using Deep Learning: An Introduction" tutorial at the May 2018 Embedded Vision Summit.
This talk introduces deep learning for vision tasks. It provides an overview of deep learning, explores its weaknesses and strengths, and highlights best approaches to applying deep learning to solving vision problems. The audience will learn to think about vision problems from a different perspective, understand what questions to ask, and discover where to find the answers to these questions. The talk will conclude with insights on the challenges of deploying deep learning solutions on mobile devices.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT - Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads and is expected to be a non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation that is…
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
1. BRV CTO Summit
• UC Berkeley undergrad / Stanford grad, EE
• Statistical physics helped / tensors, Lagrangian
• Tech lead @Google/Waymo, started w/Book Search
• Self-driving cars / RTOS C++
2. Outline
• Basic Architectures: group discussion first
• What I need from you / similar to Andrei Pop discussions
• CNNs
• Motivation (better-than-human performance)
• CNN applications
• Backprop / deep-dive demo only (replicating papers)
• Tools for getting to 30 deep layers
• CNN / building a model -> tips for DL
• RNN applications
• RNN language models
• Word Embeddings (word2vec) -> Entity Embeddings
• Rossmann stores
• State-of-the-art >90% modeling performance
Group discussion / don't need the functional spec <-> need enough experience with your data to be useful
• deepRL (remove); replace w/request for reinforcement learning vs. mobile
3. Types of Machine Learning
• Supervised (data, train)
• Unsupervised (data/dist)
• Reinforcement Learning
Data vs. algorithms: data is used in place of an algorithm
4. Data vs. Algorithm
Computer vision approach
Which approach is cheaper by 50x+? $600 vs. $30k
No-collision assumption
11. CNN / Image recognition
Better than humans
Image captioning
CNN+RNN
RNN for language modeling
12.
13. Deep Learning
• Position above Udacity/Coursera classes/solutions
• Quality/relevance based on quality of input (group feedback based for next lecture)
• HealthAPI embeddings/database; have seen their data and use cases 2+ years ago
• Poll: DL in production? Replicating papers? PyTorch vs. TF (production)? Structured data? Tools for a 32-layer deep CNN?
• Start CNN/RNN architectures, following the class format
• Start CNN/RNN architectures/ following the class format
14. CNNs - Convolutional Neural Networks
• ~2015: beat humans (94% vs. 99%) / caused creation of cs231n
Source: cs231n
cifar10/cifar100
17. • 2D convolution: take a window and slide it over the image. The amount the window slides to the right, then down when the end is reached, is the stride.
• The size of the convolution window is the kernel/filter.
• Padding is an extra line of pixels added on the edges if padding is SAME (padded values assumed 0). VALID has no padding: the multiply is truncated at the edges.
tf.nn.conv2d
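For reference, a minimal sketch of the tf.nn.conv2d call just described (TF 1.x graph style, as in the cs20si material; the image size, window size, and filter count are my own illustrative assumptions):

import tensorflow as tf

image = tf.placeholder(tf.float32, [None, 28, 28, 1])     # batch of 28x28 grayscale images
kernel = tf.Variable(tf.truncated_normal([5, 5, 1, 32]))  # 5x5 window, 1 input channel, 32 filters
conv = tf.nn.conv2d(image, kernel,
                    strides=[1, 1, 1, 1],  # slide 1 pixel to the right, then 1 pixel down
                    padding='SAME')        # pad edges with zeros so the output stays 28x28
# padding='VALID' would skip the padding and shrink the output to 24x24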
22. Keras / use for kaggle/practice/replicate papers
Source: https://github.com/dougc333/UdacityCarStuff/tree/master/CarND-Traffic-Sign-Classifier-Project
23. OO Models from cs20SI
Models Iteration 1
Source: https://github.com/dougc333/DeepLearning
24. TF tips: after building hundreds of models, here are some problems I still have:
• The error message won't say "no init"; it will seem like you have an unrelated problem
• Debugging after adding a new variable
• Can't print a variable w/o session.run()
Source: Stanford cs20si slides
25. Unlike normal programming languages, printing a variable is hard.

W = tf.Variable(tf.truncated_normal([700, 10]))
print(W)
# Tensor("Variable/read:0", shape=(700, 10), dtype=float32)
# no values, no var name

There is a global init. With no init you won't see values; with no session, no eval. With both:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(W.eval())   # now prints the actual weight values
26. I want to change a variable's value while debugging someone's code.
No output? Create an assign op first, then run the op: assign the assign! Delayed execution (sketched below).
Not in the web pages / might be in Slack now.
Source: Stanford cs20si slides
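A minimal sketch of that assign-the-assign pattern, following the cs20si material the slide cites (the values 10 and 100 are illustrative):

import tensorflow as tf

W = tf.Variable(10)
assign_op = W.assign(100)   # only creates the op; nothing is assigned yet (delayed execution)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(sess.run(W))      # 10: the assign op hasn't run
    sess.run(assign_op)     # run the op; now the value actually changes
    print(sess.run(W))      # 100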
27. TensorFlow graphs when debugging: execution order is not guaranteed.
with g.control_dependencies([a,b,c]):
A+B+C = (A+B)+C or A+(B+C)
Easy to figure out when you get to it..
A reminder: this is a lazy-eval graph language, not like a normal programming language.
Things take longer with TF; the old versions are stable in production.
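A minimal sketch of the control_dependencies idiom named above (the tensors a, b, c are my own placeholder examples):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    a = tf.constant(1.0)
    b = tf.constant(2.0)
    c = tf.constant(3.0)
    with g.control_dependencies([a, b, c]):
        # d only executes after a, b, and c have run,
        # pinning down an order the lazy graph would not otherwise guarantee
        d = tf.add(a, b)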
28. Lazy graph problems
In a training loop you want to change a variable; make the op an add to keep it simple.
In TF, the second example in the sketch below creates 10 adds: a common error. It works, but the graph grows too large.
Source: Stanford cs20si slides
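The two cases the slide contrasts, sketched from the cs20si "lazy loading" lesson:

import tensorflow as tf

x = tf.Variable(10, name='x')
y = tf.Variable(20, name='y')

z = tf.add(x, y)                 # good: the add op is created once, outside the loop
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(z)              # reruns the same graph node

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(tf.add(x, y))   # common error: adds 10 separate add nodes to the graph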
33. Weight histograms
L2 regularization: gets rid of large weights, centers them around the origin (see the sketch below)
L1 regularization: centers weights around 0
Additive noise: stops learning early
Multiplicative noise: helps push weights to the origin/mean
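A hedged TF 1.x sketch of the L2 point; the layer shape, data placeholders, and the 0.01 scale are my own assumptions, not from the talk:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.float32, [None, 2])
W = tf.Variable(tf.truncated_normal([10, 2]))
logits = tf.matmul(x, W)
data_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
l2_penalty = 0.01 * tf.nn.l2_loss(W)   # sum(W**2)/2: penalizes large weights
total_loss = data_loss + l2_penalty    # pulls the weight histogram toward the origin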
35. Medical Imaging
Can replicate the turtle ~ tumor result. Need higher resolutions.
Debug a higher-resolution CNN: > 224x224 input image size.
Poll: who has worked / is working on a higher-resolution CNN?
37. Key to DL - Backprop
• Leave out derivatives….
• None of the online classes emphasize this: matching manual gradient calculations to code (see the toy check below).
• You need this skill to debug larger/deeper networks. It builds intuition on how networks fail/recover.
• Tradeoff between batch normalization / L1/L2 regularization / adding noise / reducing overfitting.
• Overfitting is necessary for failed nodes during SGD. The sampling process is not uniform.
• Practice derivatives/chain rule, as implemented by sw packages TensorFlow/PyTorch/Theano.
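A toy example of that skill, matching a hand-derived gradient to a numerical check (my own illustration, not from the slides):

import numpy as np

def f(w):
    return np.sum(w ** 2)        # toy loss

def manual_grad(w):
    return 2 * w                 # hand-derived d/dw of sum(w^2), via the chain rule

w = np.random.randn(5)
eps = 1e-6
numeric = np.array([(f(w + eps * np.eye(5)[i]) - f(w - eps * np.eye(5)[i])) / (2 * eps)
                    for i in range(5)])
print(np.allclose(manual_grad(w), numeric, atol=1e-4))   # True when the derivation is right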
38. Key to Deep Learning
Need this to replicate papers/debug networks
Derivation of backprop
52. Structured data with shared weights
Accuracy comparisons
• graphs w/# hidden neurons height, width
• accuracy w/hidden weights
• computation time savings w/hidden weights
Keras may not be the best tool for architecture experiments
Do the work and have your own opinion
Derivation of backprop
55. Gradient-Based Methods
Conjugate Gradient Descent
SGD (Vincent's lecture)
L-BFGS / Ng's UFLDL Stanford notes
Udacity covers GD; Vincent covers SGD
Only 1 option w/big data: SGD samples the input data (sketched below)
Poll: who is interested in hw-accelerated momentum / Python?
Andrew Ng <-> Andrew Pop (word2vec)
Marshall / not telling him what to do
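A toy sketch of the SGD point (my own illustration): each step samples a minibatch of the input instead of touching all the data, which is the only practical option at scale:

import numpy as np

np.random.seed(0)
X = np.random.randn(10000, 5)
w_true = np.array([1.0, -2.0, 3.0, 0.0, 0.5])
y = X @ w_true + 0.1 * np.random.randn(10000)

w = np.zeros(5)
lr, batch = 0.1, 32
for step in range(500):
    idx = np.random.randint(0, len(X), batch)   # SGD samples the input data
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch     # gradient of MSE on the minibatch only
    w -= lr * grad
print(np.round(w, 2))   # close to w_true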
65. NLP - add more?
Build a language model. Source: Peter Norvig (Google)/Udacity
n-grams (a toy bigram model is sketched below)
Next: word2vec; the need for embeddings
Poll: NLP tasks
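A minimal count-based bigram language model, in the spirit of the Norvig/Udacity material cited above (the toy corpus is my own):

from collections import Counter

text = "the cat sat on the mat the cat purrs".split()
bigrams = Counter(zip(text, text[1:]))
unigrams = Counter(text)

def p(word, prev):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))   # 2/3: "the" is followed by "cat" twice and "mat" once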
66. Semantic Ambiguity problem
We want to learn from a corpus that cat is similar to kitty, and we want to share weights.
Doing this task with supervised learning would require too much labeled data, which we don't have.
We are going to use unsupervised learning.
Source: Vincent Vanhoucke (Google)/Udacity
67. Unsupervised learning
The trick we are going to use: similar words appear in similar contexts.
As we process more and more text, we will notice we can use kitty and cat interchangeably.
Source: Vincent Vanhoucke (Google)/Udacity
68. We are going to use embeddings, which a neural network learns, mapping similar words close to each other.
Source: Vincent Vanhoucke (Google)/Udacity
69. Word2Vec finds these embeddings
1) Map each word to a random embedding
2) Use the embedding to predict the next word
3) Iterate over massive text: "kitty purrs" and "cat purrs" examples will cause cat and kitty to sit next to a common word, and this signal pulls the two words close together in the embedding cube
(a condensed sketch follows below)
Source: Vincent Vanhoucke (Google)/Udacity
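A condensed TF 1.x sketch of those three steps (cs20si builds word2vec along these lines; all the sizes here are my own assumptions):

import tensorflow as tf

VOCAB, EMBED, BATCH = 10000, 128, 64
center = tf.placeholder(tf.int32, [BATCH])       # center word ids
target = tf.placeholder(tf.int64, [BATCH, 1])    # context word ids to predict

# step 1: map each word to a random embedding
embed_matrix = tf.Variable(tf.random_uniform([VOCAB, EMBED], -1.0, 1.0))
# step 2: use the embedding to predict nearby words
embed = tf.nn.embedding_lookup(embed_matrix, center)
nce_w = tf.Variable(tf.truncated_normal([VOCAB, EMBED], stddev=0.1))
nce_b = tf.Variable(tf.zeros([VOCAB]))
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_w, biases=nce_b,
                                     labels=target, inputs=embed,
                                     num_sampled=64, num_classes=VOCAB))
# step 3: iterate over massive text; shared contexts pull cat/kitty together
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)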
71. Feature Engineering: don't do this
Create a separate feature to detect outliers.
Increase profit.
Do not hide outliers: that underfits, and never underfit.
Example viable models: cs20si YouTube lecture 3, 18:39
Outliers are good: we need them to do feature engineering on the outliers!
74. Entity Embeddings of Categorical Variables - Cheng Guo / Felix Berkhahn
• Convert categoricals to embeddings; related to word2vec (a minimal sketch follows below)
• DL backpropagation only holds for continuous functions; EE maps categoricals closer to continuous distributions
• Add additional data from weather and Google Trends
• Feature engineering
• Bonus: makes existing algorithms (decision trees / XGBoost) orders of magnitude faster
• State-of-the-art performance at 90-95%
https://arxiv.org/pdf/1604.06737.pdf
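A minimal Keras sketch of an entity embedding for one categorical column, in the spirit of Guo & Berkhahn's Rossmann setup (the embedding size and the 5 continuous features are my own assumptions):

from keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from keras.models import Model

n_stores, embed_dim = 1115, 10                 # Rossmann has 1115 stores; dim is a choice
store_in = Input(shape=(1,), dtype='int32')    # categorical store id
cont_in = Input(shape=(5,))                    # continuous features (weather, trends, ...)
store_emb = Flatten()(Embedding(n_stores, embed_dim)(store_in))  # categorical -> dense vector
x = Concatenate()([store_emb, cont_in])
x = Dense(64, activation='relu')(x)
out = Dense(1)(x)                              # predict sales
model = Model([store_in, cont_in], out)
model.compile(optimizer='adam', loss='mae')
# after training, the learned embedding rows can also be fed to trees/XGBoost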
78. Transfer Learning - data vs. new algorithms
https://medium.com/nf2-project
NF1 on chromosome 17
NF2 on chromosome 22
Discovered via hackathon
Sponsored by accel.ai/Google
Expanded to health events
79. Multitask learning
• Multimodal learning from Ng, using audio and video together
• Multi-task learning: https://arxiv.org/pdf/1705.07115.pdf
Poll: who has seen this / has an application for this?
RNN-type apps
80. Reinforcement Learning
Dec 2013: DeepMind Atari games/agents
Design an agent to maximize a reward: take screenshots of Atari games, build a modified CNN to output a reward vs. a classification probability.
Reward: get points for staying in lane, etc.; lose points for collisions, etc…
Actions: speed of car, steering angle
Maximize reward
82. MDP - Markov Decision Process
• In current state s, a member of S
• With action a, a member of Actions(s)
• P(s'|s,a): the probability of going to state s' when in state s and taking action a. We have a probability P because we don't get the same result every time on action a (the definition of stochastic).
• R(s,a,s'): the reward given current state s, next state s', and action a.
83. To solve an MDP
• Find a policy pi(s) which is going to maximize the discounted total reward.
math… 10/19 4am.. I changed my mind.. schedule a deep dive like backprop. (A minimal value-iteration sketch follows below.)
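Since the slide defers the math, here is a minimal value-iteration sketch for the MDP of slide 82; the tiny transition table is an invented toy example, not from the talk:

# P[s][a] = list of (prob, next_state, reward) triples, i.e. P(s'|s,a) and R(s,a,s')
P = {
    's0': {'stay': [(1.0, 's0', 0.0)], 'go': [(0.8, 's1', 1.0), (0.2, 's0', 0.0)]},
    's1': {'stay': [(1.0, 's1', 2.0)], 'go': [(1.0, 's0', 0.0)]},
}
gamma = 0.9   # discount factor for future reward
V = {s: 0.0 for s in P}
for _ in range(100):   # Bellman update: V(s) = max_a sum_s' P(s'|s,a) * (R + gamma*V(s'))
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                for outs in P[s].values())
         for s in P}
# pi(s) picks the action that maximizes the discounted total reward
pi = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
      for s in P}
print(V, pi)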