Word embeddings are common in NLP tasks, but embeddings can also be used to learn relations among categorical data. Deep learning can be useful for structured data too, and entity embeddings are one reason why. These are slides from a seminar held at Sbanken.
3. Collaborative and competitive data science
• Exceeded 1 million users in 2017
• Gradient boosted trees win most contests with tabular/structured data
• Deep learning wins when the data is unstructured: images, text, sound
4. Standard modeling activities
Whether statistics or machine learning, most modeling activities are common:
Select model → Select inputs → Train model → Test model on unseen data → Evaluate performance → Success?
SELECT GROUND TRUTH TO TARGET THE TRAINING AGAINST
Supervised learning needs labeled data.

FEATURE ENGINEERING – FIND RELEVANT INPUTS
Requires experts with deep understanding of the field.

COMMON PITFALL – OVERFITTING
The risk of overfitting is high when the model has many parameters.

DEEP LEARNING AVOIDS OVERFITTING USING DROPOUT
Nodes are randomly dropped so that the rest must readjust.

DATA DISCOVERY
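As a minimal sketch of the dropout idea described above (inverted dropout in NumPy; the 0.2 rate is illustrative, matching the fraction used later in the deck):

```python
import numpy as np

def dropout(activations, rate=0.2, rng=None):
    """Inverted dropout: zero a random fraction of nodes during training
    and scale the survivors so the expected activation is unchanged."""
    if rng is None:
        rng = np.random.default_rng(0)
    keep = rng.random(activations.shape) >= rate   # mask of surviving nodes
    return activations * keep / (1.0 - rate)

h = np.ones((4, 10))            # a batch of hidden-layer activations
h_train = dropout(h, rate=0.2)  # roughly 20% of entries zeroed, rest scaled to 1/0.8
```

At inference time no nodes are dropped; the inverted scaling during training is what keeps the two phases consistent.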
9. Data representations: decomposing a vector
[Figure: a vector V in the x–y plane decomposed into components u and v]
We can decompose the vector V into a vector of length u directed along the x axis, and a vector of length v directed along the y axis: V = (u, v).
10. Data representations: vector length and direction
[Figure: the same vector V expressed both as components (u, v) and as a length |V| with angle α]
Both these data representations define the same vector. How you want to feed this information to the learning algorithm depends on what you're aiming to predict.
If this vector represented wind in the horizontal plane, and we want to predict the power output from a wind turbine, which we happen to know is a function of the wind speed, feeding |V| = √(u² + v²) to the learning algorithm makes a lot of sense. This way the learning algorithm doesn't need to figure out Pythagoras on its own. However, with enough training data, a neural network could figure this out.
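As a small sketch of this feature engineering step, computing speed (and, for completeness, direction) from the (u, v) wind components; the input numbers are made up:

```python
import math

def wind_features(u, v):
    """Turn (u, v) wind components into speed and direction features."""
    speed = math.hypot(u, v)  # |V| = sqrt(u^2 + v^2): Pythagoras done for the model
    direction = math.degrees(math.atan2(v, u)) % 360.0
    return speed, direction

speed, direction = wind_features(3.0, 4.0)
# speed == 5.0
```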
11. Data representations: cyclic variables
[Figure: the same vector V, with angle α measured from the x axis]
Cyclic variables need special consideration. The angle between 0° and 359° is only 1°, but this is not obvious to a learning algorithm that sees the raw values 0 and 359.
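A common remedy, not spelled out on the slide so take it as an illustrative assumption, is to encode a cyclic variable as the (sin, cos) pair of its angle, so that 0° and 359° end up close together:

```python
import math

def encode_cyclic(degrees):
    """Map an angle to a point on the unit circle, so wrap-around neighbours stay close."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)

def distance(a, b):
    """Euclidean distance between two encoded angles."""
    return math.dist(encode_cyclic(a), encode_cyclic(b))

print(distance(0, 359) < distance(0, 90))  # True: 0° and 359° are neighbours
```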
22. Preprocessing of inputs to neural networks
• Normalize data
• If input is categorical, represent it as one-hot encodings
  • Red, blue, green -> red = [1, 0, 0], blue = [0, 1, 0], green = [0, 0, 1]
• If input is text, represent words as word embeddings
  • If the embedding length were 4, we could have «bank» = [0.23, 1.2, 0.34, 0.78]
• The embeddings can be learned as part of the learning task, or:
• Embeddings can be taken from a language model trained on a larger text corpus
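The one-hot example from the slide, as a tiny self-contained sketch:

```python
def one_hot(value, categories):
    """Return a one-hot vector with a 1 in the position of `value`."""
    return [1 if c == value else 0 for c in categories]

colours = ["red", "blue", "green"]
print(one_hot("red", colours))    # [1, 0, 0]
print(one_hot("blue", colours))   # [0, 1, 0]
print(one_hot("green", colours))  # [0, 0, 1]
```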
23. Some weaknesses of one-hot for categorical data
• A large number of categories leads to long one-hot vectors
• Different values of categorical variables are treated as completely independent of each other
26. Forbedringsforslag (customer improvement suggestions)
• More than 20,000 forbedringsforslag since 2010
• Each forbedringsforslag has one text of at most 98 words
• Each forbedringsforslag is classified into a product category by a person
• Can we take those data and teach a learning algorithm to predict the product category?
27. Forbedringsforslag – neural network architecture
Example input: «Hei, jeg opplever det som veldig forvirrende at jeg ser bokført saldo. Jeg trenger kun å se disponibel saldo. Ønsker å bare se disponibel eller velge det som den saldoen som er synlig.»
("Hi, I find it very confusing that I see the booked balance. I only need to see the available balance. I would like to see only the available balance, or to choose which balance is visible.")
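The slide does not show the layer details, so the following Keras sketch is only an assumption about what such an architecture could look like: the vocabulary size, embedding length, LSTM width, and number of product categories are all made-up numbers. It follows the deck's description of learned word embeddings feeding an LSTM with dropout on both input-to-state and state-to-state connections, ending in a softmax over product categories:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_WORDS = 98      # each forbedringsforslag has at most 98 words
N_CLASSES = 20      # assumed number of product categories

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),                     # learned word embeddings
    layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2),  # dropout 0.2 as on slide 28
    layers.Dense(N_CLASSES, activation="softmax"),        # product-category probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One (padded) sequence of word indices in, one probability per category out.
probs = model.predict(np.zeros((1, MAX_WORDS), dtype="int32"), verbose=0)
```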
28. Conclusion – Forbedringsforslag
• We finally arrive at an accuracy of 75% for both the validation set and the test set
• Without regularization we start overfitting after 10 to 15 epochs
• By applying a dropout fraction of 0.2 on both input-to-state and state-to-state connections in the LSTM, we avoid overfitting
• A thin graphical user interface can present the products sorted by descending predicted probability
• The labelling job can then be quicker, but it can't be done entirely by machine learning
29. Sales prediction Kaggle contest 2015
• 3000 drug stores
• 7 countries
• Predict daily sales
• Depends on:
  • Promotions
  • Competition
  • School holidays
  • State holidays
  • Seasonality
  • Locality
  • Etc.
30.
• In principle a neural network can approximate any continuous function and piecewise continuous function
• A neural network is not suitable for approximating arbitrary discontinuous functions, as it assumes a certain level of continuity
• Decision trees do not assume any continuity of the feature variables and can divide the states of a variable as finely as necessary
31.
«The rise of neural networks in natural language processing is based on the word embeddings, which put words with similar meaning closer to each other in a word space, thus increasing the continuity of the words compared to using one-hot encoding of words»
32. Keras implementation of entity embeddings by Guo
https://github.com/entron/entity-embedding-rossmann/
Categorical features embedded:
• Store
• Day of week
• Promo
• Year
• Month
• Day of month
• State
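As a framework-free sketch of the idea behind Guo's implementation (the category counts and embedding sizes below are invented for illustration, not the ones used in the Rossmann solution): each categorical value is mapped to a learned dense vector via a lookup table, and the concatenated vectors replace the one-hot encodings as network input:

```python
import numpy as np

rng = np.random.default_rng(42)

# One learnable lookup table per categorical feature: (n_categories, embedding_dim).
tables = {
    "store":       rng.normal(size=(1115, 10)),  # 1115 stores -> 10-dim vectors
    "day_of_week": rng.normal(size=(7, 3)),      # 7 days      -> 3-dim vectors
    "month":       rng.normal(size=(12, 4)),     # 12 months   -> 4-dim vectors
}

def embed(sample):
    """Replace each categorical index with its embedding and concatenate."""
    return np.concatenate([tables[name][idx] for name, idx in sample.items()])

x = embed({"store": 42, "day_of_week": 3, "month": 11})
# 10 + 3 + 4 = 17 inputs instead of 1115 + 7 + 12 = 1134 one-hot inputs.
```

In a real model the tables are trained by backpropagation together with the rest of the network, which is what makes similar categories drift close to each other in embedding space.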
35. Conclusions
• Entity embeddings reduce memory usage and speed up neural networks compared to one-hot encoding.
• Intrinsic properties of the categorical features can be revealed by mapping similar values close to each other in embedding space.
• The learned embeddings boost the performance of other machine learning methods when used as input features instead.
• Guo and Berkhahn came out third in the Rossmann Store Sales prediction.
• The students at MILA, Montreal who won the Taxi Destination prediction on Kaggle also used entity embeddings: http://blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/