This document summarizes a presentation on using an LSTM neural network to predict bitcoin price movements based on sentiment analysis of Twitter data. It describes collecting over 1 million bitcoin-related tweets, representing the words in the tweets as word vectors, training an LSTM model on the vectorized tweet data with sentiment labels, and evaluating whether the predicted sentiment correlates with bitcoin price changes. While the model found no relationship between sentiment and price, improvements are discussed, such as using a training set more similar to the actual tweet data.
Generating Natural-Language Text with Neural Networks, by Jonathan Mugan
Automatic text generation enables computers to summarize text, to have conversations in customer-service and other settings, and to customize content based on the characteristics and goals of the human interlocutor. Using neural networks to automatically generate text is appealing because they can be trained through examples with no need to manually specify what should be said when. In this talk, we will provide an overview of the existing algorithms used in neural text generation, such as sequence2sequence models, reinforcement learning, variational methods, and generative adversarial networks. We will also discuss existing work that specifies how the content of generated text can be determined by manipulating a latent code. The talk will conclude with a discussion of current challenges and shortcomings of neural text generation.
Artificial Intelligence, Machine Learning and Deep Learning, by Sujit Pal
Slides for a talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group in San Francisco. My part of the talk is covered in slides 19-34.
An introduction to Machine Learning (and a little bit of Deep Learning), by Thomas da Silva Paula
A 25-minute talk about Machine Learning and a little bit of Deep Learning. It starts with some basic definitions (Supervised and Unsupervised Learning), then explains the basic functionality of neural networks, ending up at Deep Learning and Convolutional Neural Networks.
From a Machine Learning Meetup held in Porto Alegre, Brazil.
Things we will discuss:
1. Introduction to machine learning and deep learning.
2. Applications of ML and DL.
3. Various learning algorithms of ML and DL.
4. A quick introduction to open-source solutions for all programming languages.
5. Finally, a broad picture of what you can do with deep learning in the tech world.
The following topics are covered in the event:
1. Introduction to neural networks (what, why and how)
2. Types of neural networks (for different types of problems)
3. Explanation of neural network algorithms (forward and back propagation)
4. Demo of neural networks (image classification: bird, aeroplane, person, etc.)
In this presentation we discuss the hypothesis of MaxEnt models, describe the role of feature functions and their applications to Natural Language Processing (NLP). The training of the classifier is discussed in a later presentation.
Machine Learning Essentials Demystified part 1 | Big Data Demystified, by Omid Vahdaty
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks, explain how they work (without too much math), and demonstrate a DL model with Python.
The target audience is developers, data engineers and DBAs that do not have prior experience with ML and want to know how it actually works.
Machine Learning Essentials Demystified part 2 | Big Data Demystified, by Omid Vahdaty
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks, explain how they work (without too much math), and demonstrate a DL model with Python.
The target audience is developers, data engineers and DBAs that do not have prior experience with ML and want to know how it actually works.
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017, by MLconf
Aaron Roth is an Associate Professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.”
Abstract Summary:
Differential Privacy and Machine Learning:
In this talk, we will give a friendly introduction to Differential Privacy, a rigorous methodology for analyzing data subject to provable privacy guarantees that has recently been widely deployed in several settings. The talk will specifically focus on the relationship between differential privacy and machine learning, which is surprisingly rich. This includes both the ability to do machine learning subject to differential privacy, and tools arising from differential privacy that can be used to make learning more reliable and robust (even when privacy is not a concern).
Optimization as a model for few shot learning, by Katy Lee
Paper presentation of "Optimization as a Model for Few-Shot Learning" (ICLR 2017) by Sachin Ravi and Hugo Larochelle.
Highly related to "Learning to Learn by Gradient Descent by Gradient Descent".
Hello, this is the deep-learning paper reading group. Today's paper, published by Google last month, is 'H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences'!
As the title suggests, the paper proposes a one-dimensional hierarchical attention for sequence processing
and claims the algorithm is faster.
As you can see from the title, it concerns self-attention, the core of the Transformer.
It is a well-known fact that self-attention has quadratic complexity, and many papers have tried to address this; this paper is a continuation of that line of research.
In this case, the paper approximates the attention matrix with a low-rank structure, applying techniques from numerical analysis.
For today's paper review, Myeonghun Jin from the NLP team kindly provided a detailed walkthrough!
As always, thank you in advance for your interest!
Tutorial on Deep Learning in Recommender Systems, LARS summer school 2019, by Anoop Deoras
I had a fun time giving a tutorial on deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
Neural Nets from Scratch
This contains the slides (without animations) for a talk I gave on the mathematical foundations of neural nets, and how one would code a neural net from scratch in a way that made the math work out, using Python.
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16, by MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called "H2O Ensemble", built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forests, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time: the Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk, by Saurabh Saxena
Studied the feasibility of applying state-of-the-art deep learning models, like end-to-end memory networks and neural attention-based models, to the problem of machine comprehension and subsequent question answering in corporate settings with huge amounts of unstructured textual data. Used pre-trained embeddings like word2vec and GloVe to avoid huge training costs.
Michael Manukyan and Hrayr Harutyunyan gave a talk on sentence representations in the context of deep learning during the Armenian NLP Meetup. They also reviewed a recent paper on machine comprehension (Wang, Jiang, 2016).
This was presented to software developers with the goal of introducing them to the basic machine learning workflow, code snippets, possibilities and the state of the art in NLP, and to give some clues on where to get started.
Develop a fundamental overview of Google TensorFlow, one of the most widely adopted technologies for advanced deep learning and neural network applications. Understand the core concepts of artificial intelligence, deep learning and machine learning and the applications of TensorFlow in these areas.
The deck also introduces the Spotle.ai masterclass in Advanced Deep Learning With Tensorflow and Keras.
Erik Bernhardsson is the CTO at Better, a small startup in NYC working with mortgages. Before Better, he spent five years at Spotify managing teams working with machine learning and data analytics, in particular music recommendations.
Abstract Summary:
Nearest Neighbor Methods And Vector Models: Vector models are being used in a lot of different fields: natural language processing, recommender systems, computer vision, and others. They are fast and convenient and are often state of the art in terms of accuracy. One of the challenges with vector models is that as the number of dimensions increases, finding similar items gets challenging. Erik developed a library called "Annoy" that uses a forest of random trees to do fast approximate nearest neighbor queries in high-dimensional spaces. We will cover some specific applications of vector models and how Annoy works.
Deep Learning For Practitioners, lecture 2: Selecting the right applications..., by ananth
In this presentation we articulate when deep learning techniques yield the best results from a practitioner's viewpoint. Should we apply deep learning techniques to every machine learning problem? What characteristics make an application suitable for deep learning? Does more data automatically imply better results regardless of the algorithm or model? Does "automated feature learning" obviate the need for data preprocessing and feature design?
Separating Hype from Reality in Deep Learning with Sameer Farooqui, by Databricks
Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning and help you decide whether you should practically use Deep Learning in your software stack.
I’ll begin with a technical overview of common neural network architectures like CNNs, RNNs, GANs and their common use cases like computer vision, language understanding or unsupervised machine learning. Then I’ll separate the hype from reality around questions like:
• When should you prefer traditional ML systems like scikit-learn or Spark.ML instead of Deep Learning?
• Do you no longer need to do careful feature extraction and standardization if using Deep Learning?
• Do you really need terabytes of data when training neural networks or can you ‘steal’ pre-trained lower layers from public models by using transfer learning?
• How do you decide which activation function (like ReLU, leaky ReLU, ELU, etc) or optimizer (like Momentum, AdaGrad, RMSProp, Adam, etc) to use in your neural network?
• Should you randomly initialize the weights in your network or use more advanced strategies like Xavier or He initialization?
• How easy is it to overfit/overtrain a neural network, and what are the common techniques to avoid overfitting (like L1/L2 regularization, dropout and early stopping)?
Demystifying NLP Transformers: Understanding the Power and Architecture behin..., by NILESH VERMA
In this SlideShare presentation, we delve into the intricate world of NLP Transformers, exploring their underlying architecture and uncovering their immense power in Natural Language Processing (NLP). Join us as we demystify the complexities and provide a comprehensive overview of how Transformers revolutionize tasks such as machine translation, sentiment analysis, question answering, and more. Gain valuable insights into the transformer model, attention mechanisms, self-attention, and the transformer encoder-decoder structure. Whether you're an NLP enthusiast or a beginner, this presentation will equip you with a solid foundation to comprehend and harness the potential of NLP Transformers.
As a data science intern at Leapcheck Services Private Limited, I developed a naive chatbot using a sequence-to-sequence model built from LSTM RNNs. I am sharing the tutorial, which I made explicitly for deep learning enthusiasts, to provide a basic insight into how a chatbot can be developed with the help of a recurrent neural network.
This is a single-day course that gives the learner hands-on experience with the basic details of deep learning: the first half is building a network using Python/NumPy only, and in the second half we build a more advanced network using TensorFlow/Keras.
At the end you will find a list of useful pointers to continue.
course git: https://gitlab.com/eshlomo/EazyDnn
This is a slide deck from a presentation that my colleague Shirin Glander (https://www.slideshare.net/ShirinGlander/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part, as the two slide decks look quite different ... :)
For the sake of simplicity and completeness, I just copied the two slide decks together. As I did the "surrounding" part, I added Shirin's part at the place where she took over and then added my concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)
The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follow before the "how" part starts.
The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand.
After that, the second part dives deeper into the question of how to actually implement DL networks. This part starts with coding it all on your own and then moves on to less coding, step by step, depending on where you want to start.
The presentation ends with some pitfalls and challenges that you should keep in mind if you want to dive deeper into DL - plus the invitation to become part of it.
As always, the voice track of the presentation is missing. I hope that the slides are of some use for you, though.
This is a slide deck from a presentation that my colleague Uwe Friedrichsen (https://www.slideshare.net/ufried/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part, as the two slide decks look quite different ... :)
For the sake of simplicity and completeness, Uwe copied the two slide decks together. As he did the "surrounding" part, he added my part at the place where I took over and then added concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)
This is my summer internship project presentation. I worked on three projects in total, and brief details about each are provided in the presentation.
Thanks to Eckovation.
Traditional machine learning used handwritten features and modality-specific machine learning to classify images and text or recognize voices. Deep learning / neural networks identify features and find different patterns automatically. The time to build these complex systems has been drastically reduced, and accuracy has increased exponentially, because of advancements in deep learning. Neural networks were partly inspired by how the 86 billion neurons in a human brain work, and have become more of a mathematical and computational problem. We will see by the end of the blog how neural networks can be intuitively understood and implemented as a set of matrix multiplications, a cost function, and optimization algorithms.
Automatic Attendance using Convolutional Neural Network Face Recognition, by vatsal199567
The Automatic Attendance System recognizes students' faces through the camera in the class and marks their attendance. It was built in Python with machine learning.
Presentation on Neural Networks in Tensorflow. Code available at https://github.com/nfmcclure/tensorflow_cookbook . Presentation for Open Source Bridge, Portland, 2016.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag..., by sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake, by Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
State of Artificial Intelligence Report 2023, by kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Unleashing the Power of Data: Choosing a Trusted Analytics Platform, by Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots and Milvus.
A lively discussion with the NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder
Analysis insight about a Flyball dog competition team's performance, by roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
6. Exploring the Twitter data
• All the Twitter data is stored in Elasticsearch…
• We don't know exactly yet what it looks like…
• We want to create a Recurrent Neural Network with LSTM in TensorFlow…
• So it's a good thing Python has an Elasticsearch and a TensorFlow module!
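As a rough illustration, pulling the tweets out of Elasticsearch might look like the sketch below. The index name ("tweets"), the field name ("text"), and the local host address are assumptions, since the talk does not show the collector's schema.

# A minimal sketch, assuming a local cluster and a hypothetical index
# called "tweets" with the tweet body stored in a "text" field.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch(["http://localhost:9200"])

# scan() pages through every matching document, so the ~1M tweets come
# back without hitting the default result-size limit.
tweets = [hit["_source"]["text"]
          for hit in scan(es, index="tweets",
                          query={"query": {"match": {"text": "bitcoin"}}})]
print(f"collected {len(tweets)} tweets")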
Short demo
7. Predicting the sentiment of a tweet: positive or negative?
1 million tweets… How do we analyze these tweets, and how do we put them into a deep learning algorithm?
Deep learning needs scalars or matrices of scalars as input.
For example, a convolutional neural network uses the pixels of images for object recognition.
Likewise, text/speech needs to be vectorized before analyzing it.
"Only words or word encodings provide no useful information regarding the relationships that may exist between the individual symbols" (tensorflow.org).
So: vectorization of our tweets…
9. The basic ideas behind a Word2Vec model
A Word2Vec model is a neural network with one hidden layer.
The input is a one-hot vector of a word and has dimension N x 1, where N is the number of words in your dictionary.
The hidden layer is a matrix with dimension N x D, where D is the length of a vector representing a word.
The output layer is a vector of probabilities that each word in the dictionary is a neighbour of the input word.
This hidden layer is exactly what we are looking for!
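To see why the hidden layer is the interesting part: multiplying a one-hot input by the N x D hidden-layer matrix simply reads out one row, i.e. the word's vector. A tiny sketch with made-up dimensions:

import numpy as np

N, D = 5, 3                          # toy dictionary of 5 words, 3-d vectors
rng = np.random.default_rng(0)
hidden = rng.normal(size=(N, D))     # the trained N x D hidden-layer matrix

one_hot = np.zeros(N)
one_hot[2] = 1.0                     # one-hot vector for word index 2

# Multiplying the one-hot vector by the hidden layer...
v1 = one_hot @ hidden
# ...is exactly the same as reading out row 2 directly:
v2 = hidden[2]
assert np.allclose(v1, v2)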
10. (figure: diagram of the Word2Vec network, omitted)
11. Pre-trained Word2Vec models
• Available on the Stanford website (https://nlp.stanford.edu/projects/glove/)
• Data is available with different numbers of words and several vector dimensions.
• In this project a set of 400k words is used, with vectors of dimension 50 x 1.
• The data consists of a word list and a matrix:
❖ The word list contains 400k words, each represented by a number
❖ The matrix has dimension 400k x 50: for each word, a vector representation of length 50
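Loading that data into Python might look roughly like this; glove.6B.50d.txt is the 400k-word, 50-dimensional file from the Stanford page, and the unknown-word fallback is an assumption:

import numpy as np

# Build the word list and the 400k x 50 matrix from the GloVe text file.
words, vectors = [], []
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])                         # the word itself
        vectors.append([float(x) for x in parts[1:]])  # its 50 numbers

word_index = {w: i for i, w in enumerate(words)}       # word -> row number
embedding = np.array(vectors, dtype=np.float32)        # shape (400000, 50)

# A tweet then becomes a list of row indices into the matrix:
tweet = "bitcoin is going up".split()
ids = [word_index.get(w, 0) for w in tweet]            # crude fallback for unknown words
tweet_vectors = embedding[ids]                         # shape (4, 50)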
12. Long Short-Term Memories: why should we use them?
source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Plain Recurrent Neural Networks are sufficient if you want to predict, for instance, the sentiment of:
"The movie was really bad"
The problem arises when the relevant information is much further away or spread out over multiple sentences:
"This is the best day ever. The weather is beautiful and I got a new job. However the movie I just saw was really bad"
A simpler recurrent network may predict this as negative. Long Short-Term Memories can deal with the information in the whole text.
13. First an intuitive interpretation
• The complete network consists of n such layers.
• At each layer you put in the next word of your text, Xt, and add it to the already stored information.
• A number of updates and calculations are done, and finally there is some output, ht, and we move on to the next layer.
And now step by step…
14. Step by Step: the main information line
• On this line all the information is stored, and this information loops through all the cells until all words are processed.
• Within each cell, information is added, removed and updated.
15. Step by Step: the forget gate
• The next word, Xt, is added to the cell, just like the information from the previous cell, ht-1.
• The sigmoid function determines which information from ht-1 is kept, e.g.: when Xt is a new subject, you may want to forget the old one, which is stored in the cell state on the main line.
• The outcome is multiplied with the information from the cell state Ct-1.
16. Step by Step: the forget gate - example
• Assume the word at Xt is "bitcoin". As stated earlier, we use word vectors.
• The vector is multiplied by a weight matrix Wx,f with dimension 50 x (number of LSTM units), and after that a bias is added. In formula notation: Xt · Wx,f + bx,f
• We work with 50-d vectors and 64 LSTM units, so the formula gives us a vector of length 64.
• Finally this is put into the sigmoid function, σ(Xt · Wx,f + bx,f), and the outcome goes to the cell state Ct.
• Together with the previous state ht-1, the complete equation becomes:
ft = σ(Xt · Wx,f + bx,f + ht-1 · Wh,f + bh,f)
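As a sketch of that arithmetic in numpy (the weights are random placeholders; in the model they start as random normal matrices and constant biases and are then learned):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, U = 50, 64                        # 50-d word vectors, 64 LSTM units
rng = np.random.default_rng(0)

W_xf = rng.normal(size=(D, U))       # input-to-forget-gate weights
W_hf = rng.normal(size=(U, U))       # hidden-to-forget-gate weights
b_f = np.zeros(U)                    # forget-gate bias

x_t = rng.normal(size=D)             # stand-in word vector for "bitcoin"
h_prev = np.zeros(U)                 # previous output h_{t-1}
c_prev = np.ones(U)                  # previous cell state C_{t-1}

f_t = sigmoid(x_t @ W_xf + h_prev @ W_hf + b_f)  # forget gate, values in (0, 1)
c_partial = f_t * c_prev             # element-wise: keep/discard parts of C_{t-1}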
17. Step by Step: the input gate
• The input gate consists of two functions:
1. A sigmoid function is used to determine what kind of information we would like to store, e.g. the new subject.
2. A tanh function is used to determine the content of the information, e.g. is the new subject male or female?
• The output of these functions together is added to the current cell state Ct.
18. Step by Step: the output gate
• The output gate filters some information from the current cell state.
• A sigmoid decides what we are going to output, and the tanh function makes sure the values are between -1 and 1:
If we saw a new subject, the output will be whether the subject is male or female, singular or plural.
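Putting the three gates together, one complete cell step looks like the numpy sketch below (again random placeholder weights, same shapes as in the forget-gate example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, U = 50, 64
rng = np.random.default_rng(1)
x_t, h_prev, c_prev = rng.normal(size=D), np.zeros(U), np.ones(U)

def gate_params():
    # one input-to-gate matrix, one hidden-to-gate matrix, one bias
    return rng.normal(size=(D, U)), rng.normal(size=(U, U)), np.zeros(U)

W_xf, W_hf, b_f = gate_params()   # forget gate
W_xi, W_hi, b_i = gate_params()   # input gate
W_xc, W_hc, b_c = gate_params()   # candidate content
W_xo, W_ho, b_o = gate_params()   # output gate

f_t = sigmoid(x_t @ W_xf + h_prev @ W_hf + b_f)    # what to forget (slides 15/16)
i_t = sigmoid(x_t @ W_xi + h_prev @ W_hi + b_i)    # what to store (slide 17)
c_hat = np.tanh(x_t @ W_xc + h_prev @ W_hc + b_c)  # content of the new information
c_t = f_t * c_prev + i_t * c_hat                   # added (not multiplied) to the cell state
o_t = sigmoid(x_t @ W_xo + h_prev @ W_ho + b_o)    # what to output (slide 18)
h_t = o_t * np.tanh(c_t)                           # output, squashed to (-1, 1)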
20. Hyperparameters:
There are a lot of choices you have to make before training the RNN with LSTM.
• Length of the sequence: the number of LSTM cells.
• Number of LSTM units: comparable with the number of units in a layer of a regular NN.
• Iterations: how often you run the model during training. Each iteration you run one batch.
• Batch size: each iteration you run one batch of tweets.
• Optimizer: the function that tries to optimize the loss. Commonly used functions are Gradient Descent and Adam.
• DropoutWrapper and its probability: the probability of keeping information; it helps to prevent overfitting.
• Learning rate: too big and your model may not converge, too small and it may take ages.
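Wired together in the TensorFlow 1.x API of the era, the model might look roughly like this; all hyperparameter values shown are illustrative, not the ones used in the talk (the actual notebooks are in the linked GitHub repo):

import tensorflow as tf  # 1.x API

seq_length, lstm_units, batch_size = 60, 64, 64   # illustrative values
num_classes, learning_rate = 2, 0.001

data = tf.placeholder(tf.float32, [batch_size, seq_length, 50])    # GloVe vectors
labels = tf.placeholder(tf.float32, [batch_size, num_classes])     # one-hot sentiment

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_units)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.75)  # helps prevent overfitting
outputs, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)

# Classify from the last output; the weights and bias are learned.
W = tf.Variable(tf.truncated_normal([lstm_units, num_classes]))
b = tf.Variable(tf.constant(0.1, shape=[num_classes]))
logits = tf.matmul(outputs[:, -1, :], W) + b

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)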
21. Loss function:
The loss function we use is softmax cross entropy:
• Softmax function: it squashes a vector of real numbers into a vector of real numbers between 0 and 1 that add up to 1:
S(v)i = e^(vi) / Σk=1..N e^(vk)
• Cross entropy is an often-used alternative to the well-known squared error and is defined by:
H(y, p) = −Σi yi · log(Si)
where Si is the output of the softmax function. Cross entropy is only useful when the input is a probability distribution, hence the softmax function.
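In code the two functions are only a few lines; the logits and label below are made-up numbers for a two-class (positive/negative) output:

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())          # subtract the max for numerical stability
    return e / e.sum()               # entries in (0, 1), summing to 1

def cross_entropy(y, s):
    return -np.sum(y * np.log(s))

logits = np.array([0.4, -0.3])       # raw network output for [positive, negative]
s = softmax(logits)                  # a probability distribution, e.g. [0.67, 0.33]
y = np.array([1.0, 0.0])             # one-hot label: the tweet is positive
print(cross_entropy(y, s))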
22. Optimization of the loss function:
The optimization functions used in this model are Gradient Descent and the Adam optimizer. The Adam optimizer is an extension of Stochastic Gradient Descent. The SGD update is defined as:
Wt+1 = Wt − α · ∇t
SGD maintains a single learning rate for all parameter updates. Adam has a learning rate for each network weight, and they are separately adapted.
• Adam: Adaptive Moment Estimation
• Adam stores the first and second moments (mean and variance) of the decaying average of the past gradients:
mt = β1 · mt-1 + (1 − β1) · ∇t
vt = β2 · vt-1 + (1 − β2) · ∇t²
These variables are used to update the parameters/weights of the model:
Wt+1 = Wt − α · mt / (√vt + ε)
http://ruder.io/optimizing-gradient-descent/index.html#adam
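A small numpy sketch of one Adam step; it also includes the bias-correction terms from the original Adam paper, which the slide omits:

import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update; m and v are the decaying first and second moments.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)     # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v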
26. How about the ‘derivative’ of the sentiment?
• If the sentiment is getting better, the derivative is positive,
• If the sentiment is getting worse, the derivative is negative,
• If the sentiment is stable, the derivative is zero.
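In code the 'derivative' is just the difference between consecutive sentiment values; the daily scores below are hypothetical:

import numpy as np

sentiment = np.array([0.40, 0.55, 0.55, 0.30])   # hypothetical daily average sentiment
derivative = np.diff(sentiment)                  # [ 0.15, 0.0, -0.25 ]
# positive -> getting better, zero -> stable, negative -> getting worse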
27. Discussion and conclusion
• Recurrent Neural Networks with LSTM are powerful tools to work with,
• The mathematics behind them is complicated, but the code is not that hard to understand,
• There are many parameters to tune,
• Bitcoin and sentiment are not related according to this model.
Some possible improvements:
• Use a training set with the same kind of tweets as the actual set,
• Use tweets with other keywords than only news- and finance-related topics,
• Put a higher weight on tweets that were retweeted more than others.
28. Thank you all for coming
★Questions: https://www.linkedin.com/in/olaf-de-leeuw-6a2b073b/
★Code/Notebooks: https://github.com/olafdeleeuw/ODSC-London-2018
Editor's Notes
Leicester Square
We wanted to learn about RNN’s with LSTM and sentiment analysis.
Needed a cool topic, so bitcoin
We built an application in Java that collects Twitter data and stores it in ES. We ran the collector for a couple of weeks
We collected tweets with finance- and news-related items
The bitcoin data is stored in MySQL
Opportunity to learn some new things: ES
I wanted to learn about LSTM and Tensorflow
So I needed ES, Tensorflow and Recurrent NN —> python
We collected 1 million tweets, but an RNN needs vectors, not strings
Example about image recognition
Strings provide no useful info to an RNN
How to convert the data? Vectorization
Words related in semantics, meaning and context are closer to each other
Word2Vec is a neural network: 1 hidden layer
Input is a 1-hot vector: see picture next slide. Length is the number of words in your dictionary. In my case 400k
Output of the NN is a vector with the probability, for each word in the dict, that it is the neighbour of your input word.
Hidden layer is the vector matrix we want. It has dim 400k X 50 and is the vector representation of all the words in our dictionary
The hidden layer is the word vector matrix. We don’t need the output layer here
You can train a word2vec model yourself, but you need a lot of text and it is not the purpose of this talk. So I used a pre-trained model.
There are sets available with vector dim from 50 to 300
So we split our tweets into words and each word in the tweet is converted to a word vector.
Use RNN with LSTM when regular RNN is not good enough, so when there is too much information and when it’s spread out.
Ref: Colah's blog
N cells, usually about the number of words. In our case the max length of tweets is about 60.
At each cell you put in a new word of your tweet
In the cell, the input of the new word and the output of the previous cell are used to update your information about the sentiment, about your prediction.
Main layer, stores all relevant information
This goes from beginning to end, the output
The information is updated in each cell based on new words via multiplication and addition
Next word added just like information about the previous state
Sigmoid determines what to throw away from this —> 0 all, 1 nothing
Example: a new subject may be interesting and you may want to throw away the old subject
The output of sigmoid is multiplied with the current cell state to throw away this irrelevant data
Bitcoin as a vector —> Xt via GloVe
Multiplication by weight matrix and add bias
In the model we start with a random normal distributed weight matrix and a constant bias
Via optimization algorithms such as SGD or Adam these weights and biases are updated
The outcomes are multiplied with Ct to throw away the information you don’t need anymore
At the input gate you do 2 things:
- determine which items you want to update, e.g. the new subject
- determine what information you want to update: e.g. plural or singular, male or female
This is added (not multiplied) to Ct because you want to add information
In the last gate, the information we would like to output is filtered
This information is also sent to the forget gate of the next cell
A sigmoid function determines which items are output, such as the new subject as in the previous example
A tanh function on the cell state determines what information the model outputs at this timestep
Start with all the tweets
Split them to lists
Create indices of words
Create vectors with the GloVe dictionary/dataset
Run the RNN model with LSTM —> check loss, optimize with for example Adam
Evaluate the output labels
Explain hyperparameters
Batch size and number of iterations may influence the overfitting of your model. My example subset used bs 64 and 100k iterations
for each item you want the chances to sum up to one —> softmax, e.g. 0.4 for pos and 0.6 for neg
So in fact it creates a probability distribution
Normal squared error causes non-convex functions for classification, therefore cross entropy. This makes sure we have a convex problem
Adam is more suitable because it has a learning rate for each parameter; SGD has one for all
Changing epsilon can help to prevent fluctuations; in my model it didn't
One period without predictions, because I had no data. Skiing :)