[MMLab Seminar 2016] Deep Learning for Human Pose Estimation (Wei Yang)
This document summarizes recent advances in deep learning approaches for human pose estimation. It describes early methods like DeepPose that used cascades of regressors. Later works introduced heatmap regression to capture spatial information. Convolutional Pose Machine and Stacked Hourglass networks further improved accuracy by incorporating stronger context modeling through deeper networks with larger receptive fields and intermediate supervision. These approaches demonstrate that both local appearance cues and modeling of global context and structure are important for accurate human pose estimation.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Deep-learning-for-pose-estimation-wyang-defense (Wei Yang)
This document summarizes a thesis proposal on using deep learning for articulated human pose estimation. The proposed method uses a deep convolutional neural network (DCNN) as a front-end to extract local appearance features of body parts, combined with message passing layers to model spatial relationships between parts through pairwise constraints. This global pose model is trained end-to-end using a max-sum algorithm to maximize consistency across the entire human pose. Experimental results on standard pose estimation datasets demonstrate state-of-the-art performance.
The center point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS.
CNNs are a type of deep learning algorithm used for computer vision tasks. They use convolutional and pooling layers to extract features from image data. CNNs apply multiple filters to input images to generate feature maps, which are then passed through activation functions to introduce nonlinearity. The features are then pooled and fed into fully connected layers for classification. Common CNN architectures include LeNet-5, AlexNet, VGG16, GoogLeNet, and ResNet, with more recent models featuring deeper networks and improved training techniques.
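As a minimal illustration of the convolve, activate, pool pipeline described above, here is a framework-free sketch in pure Python. The image and kernel values are illustrative assumptions, chosen so the edge-detecting filter fires; real networks learn such filters from data.

```python
# Framework-free sketch of CNN feature extraction: convolution, ReLU
# nonlinearity, then max pooling.

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[0, 0, 1, 1]] * 4            # 4x4 image with a vertical edge
kernel = [[-1, 1], [-1, 1]]           # responds where intensity steps up
fmap = max_pool2x2(relu(conv2d(image, kernel)))   # [[2]]: the edge is detected
```

Stacking several such stages, with learned kernels and a classifier on top, gives the LeNet/AlexNet family of architectures mentioned above.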
1. Recurrent neural networks can model sequential data like time series by incorporating hidden state that has internal dynamics. This allows the model to store information for long periods of time.
2. Two key types of recurrent networks are linear dynamical systems and hidden Markov models. Long short-term memory networks were developed to address the problem of exploding or vanishing gradients in training traditional recurrent networks.
3. Recurrent networks can learn tasks like binary addition by recognizing patterns in the inputs over time rather than relying on fixed architectures like feedforward networks. They have been successfully applied to handwriting recognition.
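The role of the hidden state in point 1 can be seen with a scalar toy recurrence; the weights here are arbitrary assumptions for illustration, not learned values.

```python
import math

def rnn_step(x, h, w_x=1.0, w_h=0.5, b=0.0):
    """One recurrent update: the new hidden state mixes the current input
    with the previous hidden state, which is what gives the network memory."""
    return math.tanh(w_x * x + w_h * h + b)

h, history = 0.0, []
for x in [1, 0, 0, 0]:        # a single input pulse followed by silence
    h = rnn_step(x, h)
    history.append(h)
# The pulse persists in the hidden state for several steps before fading,
# i.e., the network "remembers" the input it saw earlier.
```

With |w_h| < 1 the memory decays geometrically, which previews the vanishing-gradient problem that LSTMs were designed to fix.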
A decision tree is a type of supervised learning algorithm (with a pre-defined target variable) that is mostly used in classification problems. It is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.
PR-355: Masked Autoencoders Are Scalable Vision Learners (Jinwon Lee)
- Masked Autoencoders Are Scalable Vision Learners presents a new self-supervised learning method called Masked Autoencoder (MAE) for computer vision.
- MAE works by masking random patches of input images, encoding the visible patches, and decoding to reconstruct the full image. This forces the model to learn visual representations from incomplete views of images.
- Experiments on ImageNet show that MAE achieves superior results compared to supervised pre-training from scratch as well as other self-supervised methods, scaling effectively to larger models. MAE representations also transfer well to downstream tasks like object detection, instance segmentation and semantic segmentation.
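The masking step at the heart of MAE can be sketched in a few lines. This is a simplified pure-Python sketch, not the authors' code; the patch contents are dummy strings standing in for pixel patches.

```python
import random

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Hide a random subset of patches; only the visible ones would be fed
    to the encoder, and the decoder reconstructs the rest."""
    rng = random.Random(seed)
    idx = list(range(len(patches)))
    rng.shuffle(idx)
    n_visible = int(len(patches) * (1 - mask_ratio))
    visible_idx = sorted(idx[:n_visible])
    masked_idx = sorted(idx[n_visible:])
    return [patches[i] for i in visible_idx], visible_idx, masked_idx

patches = [f"p{i}" for i in range(16)]      # a 4x4 grid of image patches
visible, vis_idx, masked_idx = mask_patches(patches)
# With a 75% mask ratio the encoder sees only 4 of 16 patches; the decoder
# must reconstruct the other 12 from those four plus mask tokens.
```

Because the encoder processes only the visible quarter of the patches, pre-training is much cheaper than running a full ViT on every patch, which is what makes the method scale.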
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. Scientists, physicians, researchers, and analysts that use these technologies for their important work have the right to trust and understand their models and the answers they generate. This talk is an overview of several techniques for interpreting deep learning and machine learning models and telling stories from their results.
Speaker: Patrick Hall is a Data Scientist and Product Engineer at H2O.ai. He is also an Adjunct Professor at George Washington University in the Department of Decision Sciences. Prior to joining H2O, Patrick spent many years as a Senior Data Scientist at SAS and has worked with many Fortune 500 companies on their data science and machine learning problems. https://www.linkedin.com/in/jpatrickhall
Hello, this is the Deep Learning Paper Reading Group. Today's uploaded review video covers a paper titled "StyleSpace Analysis," accepted as an oral presentation at CVPR 2021.
The paper explores and analyzes the latent space of StyleGAN2, a state-of-the-art image generation architecture, using models pretrained on a variety of datasets.
It first shows that StyleSpace, the space of channel-wise style parameters, is significantly more disentangled than the other intermediate latent spaces explored in prior work.
It then describes a method for discovering a large collection of style channels, each of which is shown to control a distinct visual attribute in a highly localized and disentangled manner.
If you have already reviewed StyleGAN 1 and 2, you should find this review all the more interesting.
As always, thank you in advance for your interest!
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Introduction to Recurrent Neural Network (Knoldus Inc.)
The document provides an introduction to recurrent neural networks (RNNs). It discusses how RNNs differ from feedforward neural networks in that they have internal memory and can use their output from the previous time step as input. This allows RNNs to process sequential data like time series. The document outlines some common RNN types and explains the vanishing gradient problem that can occur in RNNs due to multiplication of small gradient values over many time steps. It discusses solutions to this problem like LSTMs and techniques like weight initialization and gradient clipping.
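The vanishing/exploding gradient behaviour and the gradient-clipping fix mentioned above can both be demonstrated numerically. This is a toy sketch with made-up numbers, not an actual backpropagation implementation.

```python
def backprop_through_time(grad, factor, steps):
    """Repeatedly scaling a gradient by the same factor, as happens across
    time steps in a plain RNN, makes it vanish (|factor| < 1) or explode
    (|factor| > 1)."""
    for _ in range(steps):
        grad *= factor
    return grad

def clip_by_norm(grads, max_norm):
    """Gradient clipping: rescale the whole gradient vector when its norm
    exceeds a threshold, a common fix for the exploding case."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        grads = [g * max_norm / norm for g in grads]
    return grads

vanished = backprop_through_time(1.0, 0.5, 50)    # shrinks toward zero
exploded = backprop_through_time(1.0, 1.5, 50)    # blows up
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # norm 5 rescaled to 1
```

Clipping handles the exploding case; the vanishing case is what motivates the gated LSTM architecture discussed in the deck.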
PR-297: Training data-efficient image transformers & distillation through att... (Jinwon Lee)
Hello, this is the 297th review from PR-12, the TensorFlow Korea paper reading group.
Only three papers remain before the end of PR-12 Season 3.
Once Season 3 ends, recruitment of new members for Season 4 will begin right away. Your interest and applications are very welcome!
(The recruitment announcement will be posted in the Facebook TensorFlow Korea group.)
The paper I reviewed today is Facebook's "Training data-efficient image transformers & distillation through attention."
Since Google's ViT paper, interest in computer vision algorithms that use only attention, with no convolution at all, has been higher than ever.
The DeiT model proposed in this paper uses the same architecture as ViT. Whereas ViT did not perform well when trained on ImageNet data alone, DeiT achieves better performance than EfficientNet using only ImageNet data, thanks to improved training methods and a new knowledge distillation approach.
Will CNNs really fade away? Will attention conquer computer vision as well?
Personally, I am convinced that attention-based CV papers will pour out for a while, and that surprising things can happen in this space.
CNNs have matured through a decade of research, whereas transformers have only just been applied to CV, so expectations are all the higher.
Because attention is the model family with the least inductive bias, I believe it can produce even more surprising results.
OpenAI's recently released DALL-E is a representative example. If you are curious about another transformation by the Transformer, please see the video below.
Video link: https://youtu.be/DjEvzeiWBTo
Paper link: https://arxiv.org/abs/2012.12877
An Introduction to Supervised Machine Learning and Pattern Classification: Th... (Sebastian Raschka)
The document provides an introduction to supervised machine learning and pattern classification. It begins with an overview of the speaker's background and research interests. Key concepts covered include definitions of machine learning, examples of machine learning applications, and the differences between supervised, unsupervised, and reinforcement learning. The rest of the document outlines the typical workflow for a supervised learning problem, including data collection and preprocessing, model training and evaluation, and model selection. Common classification algorithms like decision trees, naive Bayes, and support vector machines are briefly explained. The presentation concludes with discussions around choosing the right algorithm and avoiding overfitting.
An Autoencoder is a type of Artificial Neural Network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise.”
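A minimal sketch of the encode/decode shape just described. The weights here are fixed, illustrative assumptions; a real autoencoder learns them by minimizing the reconstruction error with gradient descent.

```python
# Encoder maps a 4-D input to a 2-D code; decoder maps it back.

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

W_enc = [[1, 0, 0, 0],      # illustrative weights: keep the first
         [0, 1, 0, 0]]      # two coordinates as the code
W_dec = [[1, 0],
         [0, 1],
         [0, 0],
         [0, 0]]

x = [3.0, 1.0, 0.2, -0.1]
code = matvec(W_enc, x)       # [3.0, 1.0], the compressed representation
x_hat = matvec(W_dec, code)   # [3.0, 1.0, 0.0, 0.0]
loss = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # reconstruction error
```

Training drives this reconstruction error down, forcing the low-dimensional code to keep the signal and discard the noise.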
The document describes a non-local neural network approach for image processing and computer vision tasks. It summarizes the non-local means approach which averages similar pixel values while not averaging dissimilar pixels. It then describes how the non-local neural network extends this idea by using a similarity function to compute a weighted sum of all pixels, allowing it to capture long-range dependencies. The document outlines various implementations of the non-local operation and how it can be integrated into common network architectures like ResNet.
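The weighted-sum idea can be sketched for a 1-D sequence of scalars. This follows the embedded-Gaussian flavour of the non-local operation; using the identity for the embeddings and g(x) = x is an illustrative simplification, not the paper's full formulation.

```python
import math

def non_local(xs):
    """Every output is a softmax-weighted sum over ALL positions, so the
    receptive field is global rather than local."""
    ys = []
    for xi in xs:
        weights = [math.exp(xi * xj) for xj in xs]   # similarity f(x_i, x_j)
        z = sum(weights)                              # normalizer C(x)
        ys.append(sum(w * xj for w, xj in zip(weights, xs)) / z)
    return ys

# A constant sequence is a fixed point: all pairwise similarities are
# equal, so the weighted average returns the same value everywhere.
out = non_local([2.0, 2.0, 2.0])
```

In the full network this operation runs on feature vectors and is wrapped in a residual connection, so it can be dropped into architectures like ResNet without disturbing the pretrained behaviour.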
The document discusses the K-nearest neighbors (KNN) algorithm, a simple machine learning algorithm used for classification problems. KNN works by finding the K training examples that are closest in distance to a new data point, and assigning the most common class among those K examples as the prediction for the new data point. The document covers how KNN calculates distances between data points, how to choose the K value, techniques for handling different data types, and the strengths and weaknesses of the KNN algorithm.
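The KNN procedure described above fits in a few lines; the training points and labels below are made-up toy data.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training
    points under Euclidean distance. `train` is (point, label) pairs."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "red"), ((1, 0), "red"), ((0, 1), "red"),
         ((5, 5), "blue"), ((6, 5), "blue"), ((5, 6), "blue")]
```

A query near the origin gets the "red" vote and one near (5, 5) gets "blue"; choosing k (odd values avoid ties in two-class problems) is the main tuning decision the deck discusses.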
The ArangoML Group had a detailed discussion on the topic "GraphSage vs PinSage," where they shared their thoughts on the differences between the working principles of the two popular graph ML algorithms. The following slide deck is an accumulation of their thoughts on the comparison between the two algorithms.
This document discusses unsupervised learning approaches including clustering, blind signal separation, and self-organizing maps (SOM). Clustering groups unlabeled data points together based on similarities. Blind signal separation separates mixed signals into their underlying source signals without information about the mixing process. SOM is an algorithm that maps higher-dimensional data onto lower-dimensional displays to visualize relationships in the data.
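Clustering, the first approach listed above, can be sketched in its simplest form as one-dimensional k-means; the data points and starting centers are made up for illustration.

```python
def kmeans_1d(points, centers, iters=10):
    """Bare-bones k-means on scalars: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) if ps else centers[c]
                   for c, ps in clusters.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]      # two obvious groups
centers = kmeans_1d(data, centers=[0.0, 5.0])
```

The centers settle on the two group means without any labels, which is the defining property of unsupervised learning.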
The document discusses various decision tree learning methods. It begins by defining decision trees and issues in decision tree learning, such as how to split training records and when to stop splitting. It then covers impurity measures like misclassification error, Gini impurity, information gain, and variance reduction. The document outlines algorithms like ID3, C4.5, C5.0, and CART. It also discusses ensemble methods like bagging, random forests, boosting, AdaBoost, and gradient boosting.
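Two of the impurity measures listed above, Gini impurity and information gain, are short enough to compute directly; the toy labels below are illustrative.

```python
import math

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy after the split."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

labels = ["yes", "yes", "no", "no"]
# A perfect split separates the classes completely, yielding maximal gain.
gain = information_gain(labels, ["yes", "yes"], ["no", "no"])
```

CART-style algorithms grow the tree greedily by choosing, at each node, the split that maximizes such a gain (or minimizes weighted Gini).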
The document provides an overview of topological data analysis methods and examples of applications. It describes topological data analysis as a method for partial clustering that allows overlaps between clusters. It also outlines techniques like persistent homology and the Mapper algorithm. Applications discussed include identifying subtypes of diabetes and breast cancer using high-dimensional gene expression and medical data.
Generative adversarial networks (GANs) use two neural networks, a generator and discriminator, that compete against each other. The generator aims to produce realistic samples to fool the discriminator, while the discriminator tries to distinguish real samples from generated ones. This adversarial training can produce high-quality, sharp samples but is challenging to train as the generator and discriminator must be carefully balanced.
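The competing objectives just described can be written down directly. This sketch evaluates the standard GAN losses at hand-picked discriminator outputs; it is an illustration of the objective, not a training loop.

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator objective: score real samples near 1, fakes near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator objective: push D's score on fakes to 1."""
    return -math.log(d_fake)

# Early in training D easily spots fakes: low loss for D, high for G.
early_d, early_g = d_loss(0.9, 0.1), g_loss(0.1)
# At the theoretical equilibrium D outputs 0.5 everywhere.
eq_d, eq_g = d_loss(0.5, 0.5), g_loss(0.5)
```

The balancing difficulty mentioned above shows up here: if D becomes too accurate, d_fake collapses toward 0 and the generator's learning signal degrades, which is why the two networks must be trained in careful alternation.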
Emerging Properties in Self-Supervised Vision Transformers (Sungchul Kim)
The document summarizes the DINO self-supervised learning approach for vision transformers. DINO uses a teacher-student framework where the teacher's predictions are used to supervise the student through knowledge distillation. Two global and several local views of an image are passed through the student, while only global views are passed through the teacher. The student is trained to match the teacher's predictions for local views. DINO achieves state-of-the-art results on ImageNet with linear evaluation and transfers well to downstream tasks. It also enables vision transformers to discover object boundaries and semantic layouts.
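The teacher-student matching at the core of DINO-style self-distillation can be sketched as a cross-entropy between a sharpened teacher distribution and the student distribution. The logits and temperature values below are illustrative assumptions, not the paper's exact settings, and the sketch omits DINO's centering and momentum updates.

```python
import math

def softmax(logits, temp):
    mx = max(l / temp for l in logits)
    exps = [math.exp(l / temp - mx) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, t_temp=0.04, s_temp=0.1):
    """Cross-entropy from the sharpened teacher output (low temperature)
    to the student output: the student is trained to match the teacher."""
    p = softmax(teacher_logits, t_temp)
    q = softmax(student_logits, s_temp)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# The loss is near zero when the student agrees with the teacher and
# large when it predicts a different class.
agree = distill_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0])
disagree = distill_loss([2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```

In the full method this loss is summed over view pairs: local student crops are pulled toward the teacher's global-view predictions, which is what encourages the local-to-global correspondences described above.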
Pattern Recognition and Machine Learning (Rohit Kumar)
Machine learning involves using examples to generate a program or model that can classify new examples. It is useful for tasks like recognizing patterns, generating patterns, and predicting outcomes. Some common applications of machine learning include optical character recognition, biometrics, medical diagnosis, and information retrieval. The goal of machine learning is to build models that can recognize patterns in data and make predictions.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Kharita: Robust Road Map Inference Through Network Alignment of Trajectories (vipyoung)
In this work we address the challenge of inferring the road network of a city from crowd-sourced GPS traces. While the problem has been addressed before, our solution has the following unique characteristics: (i) we formulate the road network inference problem as a network alignment optimization problem where both the nodes and edges of the network have to be inferred; (ii) we propose both an offline (Kharita) and an online (Kharita*) algorithm, which are intuitive and capture the key aspects of the optimization formulation but are scalable and accurate. Kharita* in particular is, to the best of our knowledge, the first known online algorithm for map inference; (iii) we test our approach on two real data sets, and both our code and data sets have been made available for research reproducibility.
Webinar: Using smart card and GPS data for policy and planning: the case of T... (BRTCoE)
2014/08/28 webinar by Marcela A. Munizaga
See more in:
http://www.brt.cl/webinar-using-smart-card-and-gps-data-for-policy-and-planning-the-case-of-transantiago/
Case Studies in Managing Traffic in a Developing Country with Privacy-Preserv... (Biplav Srivastava)
Simulation is known to be an effective technique to understand and manage traffic in cities of developed countries. However, in developing countries, traffic management is lacking due to a wide diversity of vehicles on the road, their chaotic movement, little instrumentation to sense traffic state, and limited funds to create IT and physical infrastructure to ameliorate the situation. Under these conditions, in this paper, we present our approach of using the Megaffic traffic simulator as a service to gain actionable insights for two use cases and cities in India, a first. Our approach is general enough to be readily used in other use cases and cities, and our results give new insights: (a) using demographics data, traffic demand can be reduced if timings of government offices are altered in Delhi; (b) using a mobile company's Call Data Record (CDR) data to mine trajectories anonymously, one can take effective traffic actions while organizing events in Mumbai at local scale.
'Geo-spatial analysis' covers a wide range of analytical techniques used to identify patterns in spatial data. Spatial data provides a better understanding of a problem and supports decision making. To explore a geospatial phenomenon, we first categorize the problem, which requires understanding how the data can be expressed for a feature. High population growth rates have posed new traffic management challenges in urban areas. We evaluate how geospatial techniques are utilized in road traffic analysis by visualizing traffic congestion areas.
Theme 3b Users perspective of integrated transit systemsBRTCoE
This document discusses factors that influence public transit route choice modeling. It analyzed data from route choice surveys conducted in London and Santiago to identify tangible variables like travel time and intangible factors like comfort, transfers and the network topology. The study found travelers consider various in-vehicle, waiting, walking and transfer times along with crowding levels, seating availability and angular costs. Londoners placed more value on avoiding transfers while Santiago residents prioritized in-vehicle time. The network map design also impacted perceived route attributes.
This document discusses optimizing data crunching tasks by moving processing to more efficient languages and data formats. It summarizes benchmarks of completing two data analysis tasks on taxi trip data using Python, C, Awk and other tools. The key findings are that shell tools and simple C code using memory mapping are over 20x faster than Python for these text-based tasks. It also demonstrates how reworking tasks to use integer-based schemas and avoid floating point calculations can improve reproducibility and efficiency.
Machine Learning Approach to Report Prioritization with an ...butest
The document describes a machine learning approach to prioritize reports in a vehicle communication network. Reports contain information like travel times and are stored in local databases with limited size. A model is developed to learn the relevance of reports using attributes like age and road type. The model is applied to prioritize travel time reports to disseminate updated speeds to vehicles for route planning. Evaluation shows logistic regression accurately predicts relevant reports to change routes.
Cab travel time prediction using ensemble modelsAyan Sengupta
This document discusses developing a model to predict taxi travel times along different routes in Porto, Portugal based on floating car data. It involves motion detection on GPS data to identify stops, grid matching to map GPS coordinates to a road network, building a directed graph of the network, and using supervised learning methods like decision trees on historical travel times to predict times for new trips. Key factors in the predictions are day of week, time of day, and features derived from taxi GPS data. The model performance is evaluated based on mean absolute percentage error, mean absolute error, and mean percentage error.
This document summarizes a presentation on a hybrid approach to journey planning that minimizes environmental impact. The approach uses Dijkstra's algorithm to find the closest public transport nodes to the start and end points, and then builds a mathematical model to compute the optimal journey between the nodes. The model is a mixed integer linear program that minimizes a weighted combination of travel time and environmental cost. The approach was developed for the GreenYourMove project, which aims to create a multi-modal transport planning app that provides the most environmentally friendly routes.
Spot speed studies involve measuring the instantaneous speeds of vehicles at a point on the road. There are two main methods - measuring the time taken to travel a short distance or using a radar speed meter. Spot speeds are useful for traffic planning, road design, setting speed limits, and accident analysis. The radar method is efficient as it can instantly and automatically measure and record speeds accurately. Time-mean speed is the average of all instantaneous speeds measured, while space-mean speed represents the average speed of all vehicles traveling along a road section. Spot speed studies provide important input for various traffic engineering problems.
This document discusses vehicle routing and scheduling models and algorithms. It introduces basic models like the Traveling Salesman Problem (TSP), Vehicle Routing Problem (VRP), and Pickup and Delivery Problem with Time Windows (PDPTW). Construction heuristics like savings, insertion, and set covering algorithms are presented to find initial feasible solutions that can then be improved using local search methods. The document outlines practical considerations and recent variants like dynamic and stochastic routing problems.
Spark Summit EU talk by Javier AguedesSpark Summit
This document discusses generating origin-destination matrices from mobile network data using Spark. It describes using telco event data to dynamically compute OD matrices over time and by mode of transport between geographic regions. The approach involves processing raw telco data to estimate locations, detect trips, determine mode of transport, and aggregate the results into OD matrices for given time periods and regions. Spark is used to efficiently process large datasets in parallel. The matrices are validated against ground truth data and checked for logical consistency.
Autonomous vehicle (AV) technology is positioned to have a significant impact on various industries. Hence, artificial intelligence powered AVs and modern vehicles with advanced driver-assistance systems have been operated in street networks for real-life testing. As these tests become more frequent, accidents have been inevitable and there have been reported crashes. The data from these accidents are invaluable for generating edge case test scenarios and understanding accident-time behavior. In this paper, we use the existing AV accident data and identify the atomic blocks within each accident, which are modular and measurable scenario units. Our approach formulates each accident scenario using these atomic blocks and defines them in the Measurable Scenario Description Language (M-SDL). This approach produces modular scenario units with coverage analysis, provides a method to assist in the measurable analysis of accident-time AV behavior, and identifies edge scenarios using AV assessment metrics.
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light...Yamato OKAMOTO
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control (KDD’18)
This paper addresses the traffic light control problem using a well-designed reinforcement learning approach.
The proposed method distinguishes the decision process for different traffic light phases.
They conducted experiments using both synthetic and real-world data.
The proposed method showed superior performance over state-of-the-art methods.
Replacing Manhattan Subway Service with On-demand transportationChristian Moscardi
1) The document proposes replacing subway service in Manhattan with on-demand ridesharing during overnight repair periods to reduce costs.
2) A simulation would be used to model routing on-demand vehicles to service subway trips between 12AM-5AM using demand data and routing algorithms.
3) Key metrics like vehicle needs, passenger wait times, and repair costs vs transportation costs would be compared to evaluate the alternative. The simulation aims to answer if on-demand ridesharing can adequately replace subway service during repairs.
Seeking to quantify the viability of operating microtransit shuttles from station to station during nighttime (and weekend) subway closures that the MTA will take to support their signal upgrades.
This document summarizes a study on implementing demand-responsive transit (DRT) services in Stockholm, Sweden. The researchers:
1. Developed a methodology to analyze existing trip data, demographics, and other factors to model demand and identify optimal areas for a DRT pilot project. This included developing a gravity model to estimate potential trips.
2. Applied the methodology to identify the top 3 candidate clusters for the pilot: Sollentuna, Hammarbyhöjden/Björkhagen, and Södertälje.
3. Discussed opportunities to improve the analysis, such as incorporating additional data sources and parameters or using more advanced clustering algorithms.
The document describes a participatory platform for measuring trip quality using smartphone sensors. Key points:
- The system collects accelerometer and GPS data from user smartphones to analyze trip quality, detect bumps and jams, and visualize traffic analytics.
- Apps were developed for iOS and Android to collect data. Algorithms were implemented for trip quality index calculation, bump/jam detection, and moving time percentage.
- A backend server stores and processes the data. A website allows users to view personal trip data and agencies to view global traffic information.
- Data collection and testing involved over 100 users and 700 trips. Algorithms achieved over 86% accuracy for trip quality calculation and 93% for event detection.
The world of transportation is radically changing. It is an industry with immense technological challenges, most of which are AI-related. At its current pace, and with major industry players actively involved, it will become unrecognisable in the coming years.
In this talk I aim to cover the different fields that it includes, data science related problems that it poses, and current state of the art solutions.
The focus of this talk will be smart cities, which multiple teams @Google work on, including mine and myself.
I will present my own work, including hotspot analysis, trajectory tracking (using a novel clustering method) using GPS and beacon data (patent pending), vehicle identification (classification and clustering), ETA and routing optimisation and personalisation (regression and ranking), drivers and riders matching (ranking and classification) and city planning.
I will also cover, though not focus on, other smart city research and solutions by my counterparts on other Google teams and at Uber, such as autonomous vehicles (not a focus here; the topic is already popular, crowded, and appears in too many talks), fleet coordination (in a multi-agent system), load distribution (reinforcement based), and vehicle syncing.
I will describe problems and solutions, including the algorithm / model most commonly used in the industry to solve each problem. For one specific example, which I have personally researched, I will go into more detail, including research phases, the algorithm's inner workings, and experiment results (usually A/B testing) on real user data.
This talk will give the audience an understanding of the tremendous challenges faced when trying to improve the state of transportation, and how we solve and plan on solving them to make the world a better place. It will also give participants a rare glimpse to some of Google's and Waze's ideas, algorithm, research methodologies and future plans for global transportation.
From personal experience of giving talks on transportation / Waze algorithms (never this one before) I have learnt that this is an "emotional" subject for many people, therefore very exciting to audience and full of questions.
Note that this talk is very different from the one presented last year, which covered multiple fields Waze operates in (e.g. Ads, usage, conversion, behavioural analytics, etc.). This talk focuses only on transportation, its current state and its future, emphasizing how data science is crucial and the leading field in solving many of these problems.
Concepts, architectures and uses of distributed databases. A gentle introduction to get you up to speed and understand the value and potential of distributed databases.
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
Data science isn't an easy task to pull off.
You start with exploring data and experimenting with models.
Finally, you find some amazing insight!
What now?
How do you transform a little experiment to a production ready workflow? Better yet, how do you scale it from a small sample in R/Python to TBs of production data?
Building a BIG ML Workflow - from zero to hero, is about the work process you need to take in order to have a production ready workflow up and running.
Covering :
* Small - Medium experimentation (R)
* Big data implementation (Spark Mllib /+ pipeline)
* Setting Metrics and checks in place
* Ad hoc querying and exploring your results (Zeppelin)
* Pain points & Lessons learned the hard way (is there any other way?)
Waze @Google is a Big Data company.
We use data and complex analytics to gain insights and make decisions on a daily basis.
This presentations includes teasers and ideas for you based on real use cases from Waze
Big data real time architectures -
How do you do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls does each contain?
This document summarizes a presentation on geo data analytics. It discusses why geo data matters, common data formats and libraries for working with spatial data, challenges of working with spatial data at scale, and solutions including dimension reduction techniques and spatial databases. It also provides tips for working with spatial data in tools like Spark, R, and Javascript libraries.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
2. A taxi goes from Chinatown to Times Square. How long will it take to arrive?
3. Taxi challenge by @Final
Data: In this challenge, you are given data on taxi rides in New York, containing information on each ride such as the start and end points, date, time of day, distance, etc.
Goal: Our purpose is to predict the travel time (in logarithmic scale) of a ride.
The data is split into train and test sets, and we can use both the general data of a ride and local data on similar rides from the train set.
4. Ride Information - Given Dataset
● From / To coordinates (lon, lat)
● Departure timestamp
● Trip distance (road distance)
● Vendor - Taxi company (Found to be not important)
● Passenger count (Found to be not important)
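The given fields above can be sketched as a simple record. This is illustrative only: the field names below are our assumptions, not the dataset's actual column names.

```python
from dataclasses import dataclass

@dataclass
class Ride:
    """One row of the given dataset (hypothetical field names)."""
    from_lon: float
    from_lat: float
    to_lon: float
    to_lat: float
    departure_ts: int        # departure timestamp (Unix seconds)
    trip_distance_km: float  # road distance, not straight-line
    vendor_id: int           # taxi company (found to be not important)
    passenger_count: int     # found to be not important
```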
5. The Team
A group of talented and creative world class professionals in ML and Traffic Analytics
Nir Malbin, Gad Benram, Seffi Cohen, Daniel Marcous
● Data Wizard, expert in big data processing and production-ready ML. Googler, Wazer and traffic analytics expert.
● Data Ninja, a pure professional in every data aspect, from gathering and exploring to modeling. Kaggle Master.
● An innovative ML expert and programmer, expert in feature engineering and selection techniques. Kaggle Master.
● CDS (Chief Data Scientist) for the Israeli Defense Forces, and a pioneer in ML ensemble techniques. Kaggle Master.
7. Reminders
The Metric
Mean Square Error (log values) / Variance (constant):
sum((y - y')^2) / sum((y - avg(y))^2)
Notes:
● Interesting part: 𝚺((real - pred)^2) ~ Least Squares
● Log values
○ Mistake weight goes down for longer rides
○ Mistake is determined by error percentage -
a 10-minute mistake on a 20-minute ride
matters the same as
a 20-minute mistake on a 40-minute ride.
{log(a) - log(b) = log(a/b)}
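The metric above can be sketched in a few lines of numpy; this assumes y and y' are raw (not yet logged) travel times, and that the variance is computed over the same log targets:

```python
import numpy as np

def normalized_log_mse(y_true, y_pred):
    """Sum of squared log errors, normalized by the (constant)
    variance of the log targets -- lower is better."""
    log_t = np.log(np.asarray(y_true, dtype=float))
    log_p = np.log(np.asarray(y_pred, dtype=float))
    sse = np.sum((log_t - log_p) ** 2)
    var = np.sum((log_t - log_t.mean()) ** 2)
    return sse / var

# Relative-error property from the slide: a 10-min miss on a 20-min
# ride and a 20-min miss on a 40-min ride contribute identically,
# since log(20) - log(10) == log(40) - log(20) == log(2).
```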
8. Predicting ETA Using Machine Learning:
1. Ride Information
2. Environment
3. Geography
4. Inferred States
12. Datetime based features
● Month start / end
● Day / Day of week / Hour / 15 Minute interval
● Is weekend / business day
● Is work hour (09:00-17:00)
● Is rush hour (morning / afternoon)
● Is holiday
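The datetime features above are straightforward to derive with pandas; a minimal sketch (the column name "pickup_datetime", the 7-9 / 16-18 rush-hour windows, and the omission of the holiday calendar are our assumptions):

```python
import pandas as pd

def datetime_features(df):
    """Derive the slide's datetime features from a pickup timestamp column."""
    dt = pd.to_datetime(df["pickup_datetime"])
    out = pd.DataFrame(index=df.index)
    out["month_start"] = dt.dt.is_month_start.astype(int)
    out["month_end"] = dt.dt.is_month_end.astype(int)
    out["day"] = dt.dt.day
    out["dow"] = dt.dt.dayofweek            # Monday=0 ... Sunday=6
    out["hour"] = dt.dt.hour
    out["minute15"] = dt.dt.minute // 15    # 15-minute interval within the hour
    out["is_weekend"] = (dt.dt.dayofweek >= 5).astype(int)
    out["is_work_hour"] = dt.dt.hour.between(9, 16).astype(int)  # 09:00-17:00
    out["is_rush_hour"] = dt.dt.hour.isin([7, 8, 9, 16, 17, 18]).astype(int)
    # "Is holiday" would need an external holiday calendar; omitted here.
    return out
```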
13. Location based features
● Coordinate Transformations (coding directionality)
○ PCA 2
○ RBF
● Spheric (geo) distance
● Distance percentile
● Spheric-Trip distance ratio
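The "spheric" (great-circle) distance is typically computed with the haversine formula, and dividing the given road distance by it gives the spheric-trip ratio mentioned above. A minimal sketch (the slides don't specify the exact implementation):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle ('spheric') distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def trip_ratio(road_km, lon1, lat1, lon2, lat2):
    """Road distance over straight-line distance: a proxy for route directness."""
    return road_km / max(haversine_km(lon1, lat1, lon2, lat2), 1e-9)
```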
14. City based features
● NYC Neighbourhood (pair crossing)
● Distance to points of interest (100X2)
○ Schools / Hospitals / Parks etc.
PCA 2
15. Weather based features
● Temperature
● Events - Rain / Snow etc.
● Humidity
● Wind
● Visibility
● Min / Max / Avg / std etc.
PCA 2
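The recurring "PCA 2" / "PCA 1" annotations on these feature slides presumably mean each feature block is compressed to that many principal components; that reading is our assumption. A minimal numpy sketch of projecting a weather feature block onto its top two components:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project feature block X (rows = rides, cols = raw features)
    onto its top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Placeholder weather matrix: temperature, humidity, wind, visibility,
# min/max/avg/std aggregates, etc. -- 8 hypothetical columns.
rng = np.random.default_rng(42)
weather = rng.normal(size=(1000, 8))
weather_2d = pca_project(weather, 2)  # shape (1000, 2)
```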
16. Inferred Traffic based features
PCA 1
● Assumption: our data is a representative sample of NYC's "driving population"
● Crowdedness
○ #rides in X radius
■ 100 / 500 / 1500 / 5000
■ Euclidean / Manhattan
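The crowdedness feature above reduces to counting nearby rides; a sketch, assuming coordinates have already been projected to a planar unit (e.g. meters) so both metrics make sense:

```python
import numpy as np

def crowdedness(points, center, radius, metric="euclidean"):
    """Count rides (rows of `points`, shape (n, 2)) within `radius`
    of `center`, under Euclidean or Manhattan distance."""
    d = np.abs(points - center)
    if metric == "manhattan":
        dist = d.sum(axis=1)
    else:
        dist = np.sqrt((d ** 2).sum(axis=1))
    return int((dist <= radius).sum())
```

In practice one would compute this for each of the radii on the slide (100 / 500 / 1500 / 5000) and both metrics, yielding 8 features per ride; a spatial index (e.g. a KD-tree) would replace the brute-force scan at scale.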
17. News based features
1. Crawling NYTimes
2. Topic Modeling
3. Finding topics correlated with ETA
4. Using top-10 correlated topics as features
a. Number of articles on a day for every topic
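Steps 3-4 above (selecting the topics most correlated with ETA) can be sketched as follows; the crawl and topic model themselves are outside this snippet, and the per-day aggregation shape is our assumption:

```python
import numpy as np

def top_correlated_topics(topic_counts, daily_eta, k=10):
    """topic_counts: (n_days, n_topics) article counts per topic per day.
    daily_eta: (n_days,) mean travel time per day.
    Returns indices of the k topics with highest |correlation| to ETA."""
    corrs = np.array([
        np.corrcoef(topic_counts[:, j], daily_eta)[0, 1]
        for j in range(topic_counts.shape[1])
    ])
    corrs = np.nan_to_num(corrs)  # constant topics correlate as 0
    return np.argsort(-np.abs(corrs))[:k]
```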
19. Caveats
● Time-series future mixing (leakage of future information into training)
● The metric may not be the one that matters most in practice
○ Same weight for positive / negative error
● Crowdedness - assumes the data is a representative sample of the total car population
○ E.g. 2 times the samples equals 2 times the traffic
● Variance - taken from the original validation dataset (constant)
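The time-series future-mixing caveat is usually addressed with a time-ordered split rather than a random one, so that no training ride departs after any validation ride. A minimal sketch (not necessarily what the team did):

```python
import numpy as np

def time_ordered_split(timestamps, X, y, train_frac=0.8):
    """Split so every training ride departs before every validation ride,
    preventing future information from leaking into training."""
    order = np.argsort(timestamps)
    cut = int(len(order) * train_frac)
    tr, va = order[:cut], order[cut:]
    return X[tr], y[tr], X[va], y[va]
```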
20. Public Leader Board
Team Score
Team DCountdown 0.150041
Squanchers 0.160468
Noa's stars 0.165869
Aperture Science 0.167958
R-North 0.175308
TAU Deep Learning Lab 0.182602
Mr Terminal 0.193593
MTG 0.282009
SuperFish 0.302637
22. Machine Learning Approach
Advantages:
● No manual tuning
● No complex heuristics
● Optimise different metrics
● Personalisation
● Taking different ideas and their interactions into account as one
Disadvantages:
● Black Box (~ish)
● Harder to deploy
● Retrain when system changes