[unofficial] Pyramid Scene Parsing Network (CVPR 2017) - Shunta Saito
Pyramid Scene Parsing Network introduces the Pyramid Pooling Module to improve semantic segmentation. The module captures context at different regions and scales by performing average pooling at different pyramid levels on the final convolutional feature map. Experiments on ADE20K and PASCAL VOC datasets show the Pyramid Pooling Module improves mean Intersection-over-Union by over 4% compared to global average pooling, achieving state-of-the-art performance.
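The pooling scheme this summary describes can be sketched in a few lines of NumPy: average-pool the feature map at several bin sizes, upsample each level back to the input resolution, and concatenate with the original map. The bin sizes (1, 2, 3, 6) follow the paper; the function name and nearest-neighbour upsampling are illustrative.

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 3, 6)):
    """Average-pool `feat` (C, H, W) at several pyramid levels and
    concatenate the upsampled results with the original map."""
    c, h, w = feat.shape
    pooled = [feat]
    for b in bins:
        level = np.zeros((c, b, b))
        hs = np.array_split(np.arange(h), b)
        ws = np.array_split(np.arange(w), b)
        for i, hi in enumerate(hs):
            for j, wj in enumerate(ws):
                # mean over one pooling region
                level[:, i, j] = feat[:, hi][:, :, wj].mean(axis=(1, 2))
        # nearest-neighbour upsample back to (H, W)
        up = level.repeat(h // b + (h % b > 0), axis=1)[:, :h, :]
        up = up.repeat(w // b + (w % b > 0), axis=2)[:, :, :w]
        pooled.append(up)
    return np.concatenate(pooled, axis=0)
```

The output has C * (1 + len(bins)) channels; in the real network each pooled level is first reduced with a 1x1 convolution, which this sketch omits.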
Hardware progress has made historically intractable computations feasible, particularly in video analysis, opening a new frontier of problems. Within this expanse we chose the classic problem of depth inference from images: given a sequence of images captured over time, we output depth maps corresponding one-to-one with the input sequence. Because the problem is spatiotemporal, we model it with convolutions (spatial) and LSTMs (temporal), combined in a U-Net encoder-decoder architecture. The results indicate some potential in this approach; the process by which we reached this conclusion is detailed below.
This document summarizes recent advances in single image super-resolution (SISR) using deep learning methods. It discusses early SISR networks like SRCNN, VDSR and ESPCN. SRResNet is presented as a baseline method, incorporating residual blocks and pixel shuffle upsampling. SRGAN and EDSR are also introduced, with EDSR achieving state-of-the-art PSNR results. The relationship between reconstruction loss, perceptual quality and distortion is examined. While PSNR improves yearly, a perception-distortion tradeoff remains. Developments are ongoing to produce outputs that are both accurately restored and naturally perceived.
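The pixel-shuffle upsampling mentioned above is a pure tensor rearrangement (depth-to-space); a minimal NumPy sketch, with the function name ours, matching the convention used by ESPCN-style networks:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r): each group of r^2
    channels becomes an r x r block of output pixels."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channel index into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

Learning the upsampling as convolution channels followed by this rearrangement is what lets ESPCN run most of the network at low resolution.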
Introduction to cosmology and numerical cosmology (with the Cactus code) (2/2) - SEENET-MTP
This document discusses using the Cactus code to model cosmological simulations numerically. It introduces the Cosmo and RealSF thorns developed to solve Einstein's equations for cosmological models within the Cactus framework. The Cosmo thorn provides initial data and boundary conditions for the Friedmann-Robertson-Walker metric. The RealSF thorn evolves a scalar field by solving the Klein-Gordon equation. Examples are presented of simulations using these thorns to model pure FRW cosmologies and those with a cosmological constant or scalar field.
The document presents deep learning concepts without assuming an advanced degree. It introduces StoreKey, a Python package for scientific computing on GPUs and deep learning research. It covers basics such as variables, tensors, and autograd in Python. Predictive models discussed include linear regression, logistic regression, and convolutional neural networks. Linear regression fits a line to data to predict unobserved values; logistic regression predicts binary outcomes by fitting data to a logit function. A convolutional neural network example is shown with input, output, and hidden layers for classification problems.
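As an illustration of the regression basics this summary lists, here is a minimal logistic-regression fit by plain gradient descent in NumPy (the function name, learning rate, and step count are illustrative, not from the document):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit weights w, b so that sigmoid(X @ w + b) approximates
    the binary labels y, by gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)        # gradient of mean log loss
        b -= lr * (p - y).mean()
    return w, b
```

The same loop with the sigmoid and log loss replaced by identity and squared error gives linear regression.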
Learning visual representation without human label - Kai-Wen Zhao
Self-supervised learning (SSL) is one of the fastest-growing research topics in recent years. SSL provides algorithms that learn visual representations directly from the data itself rather than from manual human labels. From a theoretical point of view, SSL explores information theory and the nature of large-scale datasets.
Pilot Contamination Mitigation for Wideband Massive MIMO: Number of Cells Vs ... - T. E. Bogale
The document presents a pilot contamination mitigation technique for wideband massive MIMO systems. It proposes a three-step approach: 1) Allowing pilot transmission in the time domain, 2) Expressing sub-carrier channel estimates as linear combinations of received signals, and 3) Optimizing the number of cells, pilots, and linear combination terms to ensure unbounded signal-to-interference-plus-noise ratio (SINR). The main results show that the number of cells can be increased to L, where L is the number of multipath taps, allowing cancellation of pilot contamination. Simulation results demonstrate that the proposed approach achieves rates close to perfect channel state information.
[DL Reading Group] Unpaired Image Super-Resolution Using Pseudo-Supervision - Deep Learning JP
The document summarizes an academic paper on unpaired image super-resolution using pseudo-supervision. It presents the following key points:
1. The paper proposes a method using GANs and two networks: a correction network that transforms real low-resolution images into clean low-resolution images, and a super-resolution network that generates high-resolution images from the clean low-resolution ones.
2. Experiments on multiple datasets demonstrate better results than previous methods, generating high-resolution images from diverse, unpaired low-resolution data.
3. The proposed method was incorporated into Sharp's newest smartphone just 1.5 years after the paper was published, showing the speed of applying academic research.
Convolutional neural networks for image classification — evidence from Kaggle... - Dmytro Mishkin
This document discusses convolutional neural networks for image classification and their application to the Kaggle National Data Science Bowl competition. It provides an overview of CNNs and their effectiveness for computer vision tasks. It then details various CNN architectures, preprocessing techniques, and ensembling methods that were tested on the competition dataset, achieving a top score of 0.609 log loss. The document concludes with highlights of the winning team's solution, including novel pooling methods and knowledge distillation.
#6 PyData Warsaw: Deep learning for image segmentation - Matthew Opala
Deep learning techniques have ignited great progress in many computer vision tasks such as image classification, object detection, and segmentation. Almost every month a new method is published that achieves state-of-the-art results on some common benchmark dataset. In addition, DL is being applied to new problems in CV.
In the talk we’re going to focus on DL application to image segmentation task. We want to show the practical importance of this task for the fashion industry by presenting our case study with results achieved with various attempts and methods.
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS... - Intel® Software
This document summarizes a presentation about software-defined visualization using ParaView with OSPRay. The presentation covers:
- An overview of rasterization and ray tracing for visualization rendering.
- Available software-defined visualization libraries including OpenSWR, OSPRay, ParaView, GLuRay, and GraviT.
- A demonstration of ParaView with OSPRay, showing its capabilities for volume rendering, soft shadows, ambient occlusion, and more realistic lighting compared to traditional OpenGL.
- Hands-on tutorials using ParaView with OSPRay to visualize wavelet data, isosurfaces, and volumetric data with shadows, along with the benefits and limitations of OSPRay integration in ParaView.
Multiuser MIMO Vector Perturbation Precoding - adeelrazi
This paper proposes methods for sum rate optimization in multi-user MIMO systems using vector perturbation precoding. It derives an expression for sum rate in terms of the average transmitted vector energy. It then uses this to obtain a high-SNR upper bound on sum rate and proposes an extension of vector perturbation that allocates different rates to different users. It also proposes a low-complexity user scheduling algorithm as a method for rate allocation.
This document provides an overview of VAE-type deep generative models, especially RNNs combined with VAEs. It begins with notations and abbreviations used. The agenda then covers the mathematical formulation of generative models, the Variational Autoencoder (VAE), variants of VAE that combine it with RNNs (VRAE, VRNN, DRAW), a Chainer implementation of Convolutional DRAW, other related models (Inverse DRAW, VAE+GAN), and concludes with challenges of VAE-like generative models.
ECCV2010: feature learning for image classification, part 4 - zukun
This document discusses techniques for unsupervised feature learning from unlabeled data using neural networks. It describes using sparse autoencoders to learn feature hierarchies in an unsupervised manner by training networks to reconstruct their inputs while enforcing sparsity constraints. Convolutional deep belief networks are also discussed as a method for hierarchical probabilistic modeling of audio, images and video. The document concludes that unsupervised feature learning has achieved state-of-the-art results on various tasks such as object classification, activity recognition and speech processing.
PyTorch constructs dynamic computational graphs that allow for maximum flexibility and speed for deep learning research. Dynamic graphs are useful when the computation cannot be fully determined ahead of time, as they allow the graph to change on each iteration based on variable data. This makes PyTorch well-suited for problems with dynamic or variable sized inputs. While static graphs can optimize computation, dynamic graphs are easier to debug and create extensions for. PyTorch aims to be a simple and intuitive platform for neural network programming and research.
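The per-iteration graph construction described here can be illustrated without any framework: a toy reverse-mode autodiff class (a sketch, not PyTorch's actual implementation) whose graph is rebuilt on every forward pass, so data-dependent control flow works naturally.

```python
class Value:
    """Toy reverse-mode autodiff node. The graph is rebuilt on every
    forward pass, so control flow may differ per input (a 'dynamic graph')."""
    def __init__(self, data, parents=()):
        self.data, self.parents = data, parents
        self.grad, self.grad_fn = 0.0, None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        out.grad_fn = lambda g: [(self, g * other.data), (other, g * self.data)]
        return out

    def backward(self):
        # topological order so each node's grad is complete before use
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            if node.grad_fn:
                for parent, g in node.grad_fn(node.grad):
                    parent.grad += g

def dynamic_forward(x, n):
    # the amount of computation depends on runtime data `n`
    out = x
    for _ in range(n):
        out = out * x
    return out
```

A static-graph framework would need the loop count fixed when the graph is compiled; here the tape simply records whatever operations actually ran.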
Objects as points (CenterNet) review [CDM] - Dongmin Choi
The document proposes representing objects as single center points rather than bounding boxes. This allows detecting objects through keypoint estimation using a single neural network without post-processing. The method, called CenterNet, predicts center points along with object properties like size in one forward pass. Experiments show CenterNet runs in real-time and is simpler, faster and more accurate than two-stage detectors that require additional pre and post-processing steps. It provides a new direction for real-time object recognition.
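The keypoint-style decoding this summary describes boils down to keeping local maxima of the center heatmap; a NumPy sketch of the 3x3 max-filter trick CenterNet uses in place of box NMS (names and the threshold value are illustrative):

```python
import numpy as np

def extract_peaks(heatmap, k=3, thresh=0.3):
    """Keep only cells that are 3x3 local maxima of a center-point
    heatmap, threshold them, and return the top-k (y, x) positions."""
    h, w = heatmap.shape
    pad = np.pad(heatmap, 1, constant_values=-np.inf)
    # 3x3 max filter via 9 shifted views
    windows = np.stack([pad[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    local_max = windows.max(axis=0)
    peaks = (heatmap == local_max) & (heatmap > thresh)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(-heatmap[ys, xs])[:k]
    return list(zip(ys[order].tolist(), xs[order].tolist()))
```

Because one peak corresponds to one object, this replaces the usual anchor matching and IoU-based NMS entirely.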
Fast R-CNN is a method that improves object detection speed and accuracy over previous methods like R-CNN and SPPnet. It uses a region of interest pooling layer and multi-task loss to jointly train a convolutional neural network for classification and bounding box regression in a single stage of training. This allows the entire network to be fine-tuned end-to-end for object detection, resulting in faster training and testing compared to previous methods while achieving state-of-the-art accuracy on standard datasets. Specifically, Fast R-CNN trains 9x faster than R-CNN and runs 200x faster at test time.
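The RoI pooling layer mentioned above can be sketched in NumPy: split a rectangular region of the feature map into a fixed grid and max-pool each cell, so every proposal yields a fixed-size feature regardless of its shape (a simplified illustration; the spatial-scale mapping from image to feature coordinates is omitted):

```python
import numpy as np

def roi_pool(feat, roi, out_size=2):
    """Max-pool the feature-map region roi = (x1, y1, x2, y2)
    into a fixed out_size x out_size grid per channel."""
    x1, y1, x2, y2 = roi
    region = feat[:, y1:y2, x1:x2]
    c, h, w = region.shape
    out = np.empty((c, out_size, out_size))
    hs = np.array_split(np.arange(h), out_size)
    ws = np.array_split(np.arange(w), out_size)
    for i, hi in enumerate(hs):
        for j, wj in enumerate(ws):
            out[:, i, j] = region[:, hi][:, :, wj].max(axis=(1, 2))
    return out
```

The fixed-size output is what lets one shared convolutional pass feed the classification and box-regression heads for every proposal.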
1. The document discusses and compares various motion estimation methods used in video compression standards, including translational and affine motion models.
2. It describes pixel-domain block matching and frequency-domain matching techniques.
3. It provides details on parameters for block-matching motion estimation such as search area size, sub-pixel precision, and hierarchical and early-termination techniques to improve efficiency.
This document discusses semantic image segmentation with deep learning. It begins by defining semantic segmentation as classifying each pixel in an image. Convolutional neural networks (CNNs) can be used for pixel-wise prediction but do not capture spatial context. Conditional random fields (CRFs) can model contextual information but are typically applied as a post-processing step. The document proposes a method called CRF-RNN that integrates CRFs into CNNs by treating mean-field inference as a recurrent neural network. This allows end-to-end training and improves results over applying CRFs as a post-processing step. Examples of semantic segmentation results on various images are shown along with challenges in segmenting certain images.
This document describes research on using region-oriented convolutional neural networks for object retrieval. It discusses using local CNNs like CaffeNet, Fast R-CNN, and SDS to extract visual features from object candidates in images. These features are used to match against query descriptors. Pooled regional features are ranked to retrieve relevant shots. Fine-tuning pre-trained networks on larger datasets like COCO can improve retrieval accuracy. Combining global and local approaches through re-ranking provides an additional boost in performance.
The document discusses using grid computing resources on demand from cloud infrastructure. It proposes offering a grid interface to allow computationally intensive science applications to leverage elastic cloud resources when demand spikes. Key challenges include enabling secure delegation of authority through proxy certificates when hosting grid services on dynamically allocated cloud virtual machines.
Overlapping community detection in Large-Scale Networks using BigCLAM model b... - Thang Nguyen
In this undergraduate thesis, I provide a general view of communities and their real-life applications. In recent years, with the rapid growth of network scale, detecting overlapping communities in large-scale networks has become a difficult task for state-of-the-art methods. The method is implemented in the Apache Spark framework for its power in distributed parallel computation.
The main contributions of this work include:
- Introducing the BigCLAM model proposed by Yang and Leskovec (2013).
- Proposing several convex optimization methods.
- Implementing BigCLAM in Apache Spark, a lightning-fast cluster-computing framework, to detect communities in large-scale networks.
https://thangdnsf.github.io/research.html
Landuse Classification from Satellite Imagery using Deep Learning - DataWorks Summit
With the abundance of remote sensing satellite imagery, the possibilities are endless as to the kind of insights that can be derived from them. One such use is to determine land use for agriculture and non-agricultural purposes.
In this talk, we’ll be looking at leveraging Sentinel-2 satellite imagery data along with OpenStreetMap labels to be able to classify land use as agricultural or non-agricultural.
Sentinel-2 data has a 10-meter resolution in the RGB bands and is well suited for land use classification. Using these two datasets, many different machine learning tasks can be performed, such as image segmentation into two classes (farmland and non-farmland) or the more challenging task of identifying the crop type being cultivated on fields.
For this talk, we’ll be looking at leveraging convolutional neural networks (CNNs) built with Apache MXNet to train deep learning models for land use classification. We’ll be covering the different deep learning architectures considered for this particular use case along with the appropriate metrics.
We’ll be leveraging streaming pipelines built on Apache Flink and Apache NiFi for model training and inference. Developers will come away with a better understanding of how to analyze satellite imagery and of the different deep learning architectures, along with their pros and cons, for land use classification. Suneel Marthi and Chris Olivier, Software Development Engineers, Amazon Web Services
Large scale landuse classification of satellite imagery - Suneel Marthi
This document summarizes a presentation on classifying land use from satellite imagery. It describes using a neural network to filter out cloudy images, segmenting images with a U-Net model to identify tulip fields, and implementing the workflow with Apache Beam for inference on new images. Examples are shown of detecting large and small tulip fields. Future work proposed includes classifying rock formations using infrared bands and measuring crop health.
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
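The Dice score and IoU mentioned above are simple set overlaps between the predicted and ground-truth masks; a NumPy sketch for binary masks (function name ours):

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice score and IoU for binary segmentation masks (0/1 arrays)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / union
    return dice, iou
```

The two metrics are monotonically related (Dice = 2*IoU / (1 + IoU)), but Dice weights the intersection more heavily, which matters for small defects.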
Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM - Lucidworks
This document discusses using machine vision techniques like Haar cascade filters and eigenfaces for real-time facial recognition and detection. It proposes using OpenCV to detect faces in video frames, clustering the detected faces to remove "ghost" faces, representing each face as a vector of eigenface coefficients, and searching Solr to identify faces or add new identities. It also discusses challenges like inconsistent face detection and proposes solutions like adaptive clustering parameters and windowing video frames to add context.
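The eigenface representation mentioned above is a PCA projection of flattened face images; a NumPy sketch (function name ours) that computes the coefficient vectors one would then index in Solr:

```python
import numpy as np

def eigenface_coeffs(faces, n_components=4):
    """Project flattened face images (rows of `faces`) onto their top
    principal components ('eigenfaces'); returns coefficients, basis, mean."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # rows of vt are the principal directions (eigenfaces)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, basis, mean
```

A new face is identified by projecting it with the same basis and comparing its coefficient vector to the indexed ones.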
Dense Retrieval with Apache Solr Neural Search.pdfSease
This document provides an overview of dense retrieval with Apache Solr neural search. It discusses semantic search problems that neural search aims to address through vector-based representations of queries and documents. It then describes Apache Solr's implementation of neural search using dense vector fields and HNSW graphs to perform k-nearest neighbor retrieval. Functions are shown for indexing and searching vector data. The document also discusses using vector queries for filtering, re-ranking, and hybrid searches combining dense and sparse criteria.
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
Deep Learning and Business Models
Tran Quoc Hoan discusses deep learning and its applications, as well as potential business models. Deep learning has led to significant improvements in areas like image and speech recognition compared to traditional machine learning. Some business models highlighted include developing deep learning frameworks, building hardware optimized for deep learning, using deep learning for IoT applications, and providing deep learning APIs and services. Deep learning shows promise across many sectors but also faces challenges in fully realizing its potential.
The document discusses sparse coding and its applications in visual recognition tasks. It introduces sparse coding as an unsupervised learning technique that learns bases to represent image patches. Sparse coding has been shown to outperform bag-of-words models with vector quantization on datasets like Caltech-101 and PASCAL VOC. The document also discusses extensions of sparse coding, including hierarchical sparse coding and supervised methods, that have achieved further improvements on image classification benchmarks.
Open CV is an open source computer vision library that provides programming functions for real-time computer vision. It is cross-platform and can be used to build applications across operating systems. The library contains hundreds of functions for applications like factory product inspection, medical imaging, security, and robotics. It has a large user community including researchers and major tech companies and has been used in applications like surveillance, mapping, manufacturing inspection, and more.
Measuring vegetation health to predict natural hazardsSuneel Marthi
This document discusses using satellite imagery and machine learning to measure vegetation health and predict natural hazards. Specifically, it presents a workflow for identifying vegetation indices from Landsat8 satellite images to monitor things like agriculture, drought, and fire risk. The workflow includes acquiring and preprocessing Landsat8 data, computing normalized difference vegetation indices (NDVI), training a deep learning model to classify pixels, and implementing the inference pipeline using Apache Beam for scalability. Case studies of Paradise, CA show how NDVI can track changes over time. Future work proposed includes classifying rock formations and unsupervised clustering of image regions.
Overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.
Surveillance scene classification using machine learningUtkarsh Contractor
The problem of scene classification in surveillance footage is of great importance for ensuring security in public areas. With challenges such as low quality feeds, occlusion, viewpoint variations, background clutter etc. The task is both challenging and error-prone. Therefore it is important to keep the false positives low to maintain a high accuracy of detection. In this paper, we adapt high performing CNN architectures to identify abandoned luggage in a surveillance feed. We explore several CNN based approaches, from Transfer Learning on the Imagenet dataset to object classification using Faster R-CNNs on the COCO dataset. Using network visualization techniques, we gain insight into what the neural network sees and the basis of classification decision. The experiments have been conducted on real world datasets, and highlights the complexity in such classifications. Obtained results indicate that a combination of proposed techniques outperforms the individual approaches.
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation岳華 杜
This document discusses several semantic segmentation methods using deep learning, including fully convolutional networks (FCNs), U-Net, and SegNet. FCNs were among the first to use convolutional networks for dense, pixel-wise prediction by converting classification networks to fully convolutional form and combining coarse and fine feature maps. U-Net and SegNet are encoder-decoder architectures that extract high-level semantic features from the input image and then generate pixel-wise predictions, with U-Net copying and cropping features and SegNet using pooling indices for upsampling. These methods demonstrate that convolutional networks can effectively perform semantic segmentation through dense prediction.
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...Sease
The first integrations of machine learning techniques with search allowed to improve the ranking of your search results (Learning To Rank) – but one limitation has always been that documents had to contain the keywords that the user typed in the search box in order to be retrieved. For example, the query “tiger” won’t retrieve documents containing only the terms “panthera tigris”. This is called the vocabulary mismatch problem and over the years it has been mitigated through query and document expansion approaches.
Neural search is an Artificial Intelligence technique that allows a search engine to reach those documents that are semantically similar to the user’s query without necessarily containing those terms; it avoids the need for long lists of synonyms by automatically learning the similarity of terms and sentences in your collection through the utilisation of deep neural networks and numerical vector representation.
The world is the computer and the programmer is youDavide Carboni
This document discusses the past, present, and future of connecting physical objects to the internet and computing networks. It outlines the evolution of related technologies over time from the 1950s to present. It also describes two approaches to programming these connected systems - a top-down approach using tools like PySense, and a bottom-up approach using a model called Hyperpipe that is based on pi-calculus.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This talk was presented in Startup Master Class 2017 - http://aaiitkblr.org/smc/ 2017 @ Christ College Bangalore. Hosted by IIT Kanpur Alumni Association and co-presented by IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh. And contributor was Navin Manaswi.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
This document discusses a lecture on computer vision given by Dr. Eng. Mahmoud Shams at Kafrelsheikh University. It defines computer vision as dealing with how computers understand digital images and videos, and seeks to automate tasks of the human visual system. The lecture covers classification of AI, evaluation of computer vision algorithms, common computer vision tasks like localization and segmentation, and why benchmarks are important. It also lists the top 10 computer vision tools for 2020 and discusses negative results in computer vision research.
This document discusses a lecture on computer vision given by Dr. Eng. Mahmoud Shams at Kafrelsheikh University. It defines computer vision as dealing with how computers understand digital images and videos, and seeks to automate tasks of the human visual system. The lecture covers classification of AI, evaluation of computer vision algorithms, common computer vision tasks like localization and segmentation, and why benchmarks are important. It also discusses sources of noise in images, performance metrics like mean square error and confusion matrices, and some top computer vision tools like OpenCV, TensorFlow, Keras and YOLO.
1. Large Scale Landuse Classification of Satellite Imagery
Suneel Marthi
February 27, 2019
Big Data Technology Summit, Warsaw, Poland
2. $WhoAmI
Suneel Marthi
@suneelmarthi
Member of Apache Software Foundation
Committer and PMC on Apache Mahout, Apache OpenNLP, Apache Streams
4. Introduction
Deep Learning has moved from Academia to Industry
Availability of Massive Cloud Computing Power
Combination of Compute Resources + Big Data with Deep Learning models often produces useful and interesting applications
5. Introduction
Computer Vision for Satellite Imagery
Availability of low cost satellite images for research
Train a Deep Learning model to identify Tulip beds from satellite data
6. Data: Sentinel-2
Earth observation mission from ESA
13 spectral bands, from RGB to SWIR (Short Wave Infrared)
Spatial resolution: 10m/px (RGB bands)
5 day revisit time
Free and open data policy
12. Filter Clouds
Need to remove cloudy images before segmenting
Approach: train a Neural Network to classify images as clear or cloudy
CNN Architectures: ResNet50 and ResNet101
14. Filter Clouds: training data
‘Planet: Understanding the Amazon from Space’ Kaggle competition
40K images labeled as clear, hazy, partly cloudy or cloudy
15. Filter Clouds: Training data (2)

    Origin                       No. of Images   Cloudy Images
    Kaggle Competition           40000           30%
    Sentinel-2 (hand labelled)    5000           50%
    Total                        45000           32%

Only two classes: clear and cloudy (cloudy = haze + partly cloudy + cloudy)
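The relabelling above collapses the four Kaggle classes into two. A minimal sketch of that mapping (the label strings follow the Kaggle competition's naming; the sample list is made up):

```python
# Collapse the four weather labels into the two classes used for training:
# anything not fully clear counts as "cloudy".
CLOUDY = {"haze", "partly_cloudy", "cloudy"}

def binarize(label):
    return "cloudy" if label in CLOUDY else "clear"

labels = ["clear", "haze", "partly_cloudy", "cloudy", "clear"]
binary = [binarize(l) for l in labels]
# → ["clear", "cloudy", "cloudy", "cloudy", "clear"]
```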
23. Approach: U-Net
State of the Art CNN for Image Segmentation
Commonly used with biomedical images
Best Architecture for tasks like this
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597, 2015
25. U-Net Building Blocks

from mxnet.gluon import nn

def conv_block(channels, kernel_size):
    # conv -> batch norm -> ReLU, the basic unit of the network
    out = nn.HybridSequential()
    out.add(
        nn.Conv2D(channels, kernel_size, padding=1, use_bias=False),
        nn.BatchNorm(),
        nn.Activation('relu')
    )
    return out

def down_block(channels):
    # two stacked 3x3 conv blocks form one encoder stage
    out = nn.HybridSequential()
    out.add(
        conv_block(channels, 3),
        conv_block(channels, 3)
    )
    return out
26. U-Net Building Blocks (2)

class up_block(nn.HybridBlock):
    def __init__(self, channels, shrink=True, **kwargs):
        super(up_block, self).__init__(**kwargs)
        # transposed convolution doubles the spatial resolution
        self.upsampler = nn.Conv2DTranspose(channels=channels, kernel_size=4,
                                            strides=2, padding=1, use_bias=False)
        self.conv1 = conv_block(channels, 1)
        self.conv3_0 = conv_block(channels, 3)
        if shrink:
            self.conv3_1 = conv_block(int(channels / 2), 3)
        else:
            self.conv3_1 = conv_block(channels, 3)

    def hybrid_forward(self, F, x, s):
        x = self.upsampler(x)
        x = self.conv1(x)
        x = F.relu(x)
        # center-crop to the skip connection's size, concatenate the
        # encoder features, then refine with the 3x3 conv blocks
        x = F.Crop(*[x, s], center_crop=True)
        x = F.concat(s, x, dim=1)
        x = self.conv3_0(x)
        x = self.conv3_1(x)
        return x
27. U-Net: Training data
Ground truth: tulip fields in the Netherlands
Provided by Geopedia, from Sinergise
28. Loss function: Soft Dice Coefficient loss
Prediction = probability of each pixel belonging to a Tulip Field (Softmax output)
ε serves to prevent division by zero
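The soft Dice loss can be sketched as follows. This is a minimal NumPy version for illustration (the deck's training code operates on MXNet tensors, and the ε default here is an assumed placeholder):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|).

    pred   -- per-pixel probabilities of the tulip-field class (softmax output)
    target -- binary ground-truth mask
    eps    -- prevents division by zero on empty masks
    """
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# A perfect prediction drives the loss to 0; a fully disjoint one toward 1.
mask = np.array([1.0, 1.0, 0.0])
loss = soft_dice_loss(mask, mask)
```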
29. Evaluation Metric: Intersection over Union (IoU)
Aka Jaccard Index
Similar to Dice coefficient, a standard metric for image segmentation
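For binary masks the metric reduces to a few lines. A minimal NumPy sketch (the empty-mask convention of returning 1.0 is an assumption, not from the deck):

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over Union (Jaccard index) for binary masks."""
    pred_mask = pred_mask.astype(bool)
    true_mask = true_mask.astype(bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred_mask, true_mask).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
# intersection = 2 pixels, union = 4 pixels → IoU = 0.5
```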
31. Results
IoU = 0.73 after 23 training epochs
Related results: DSTL Kaggle competition
IoU = 0.84 on crop vs building/road/water/etc. segmentation
https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/discussion/29790
46. How to Scale: Batch or Stream?
"Batch is an extension of Streaming, except when Streaming is an extension of Batch"
-- Shannon Quinn, Apache Mahout
47. Spark or Flink?
"Spark Streaming is for people who want to operate on their streams using Batch idioms.
Flink Batch is for people who want to operate on their batches using Streaming idioms."
-- Joey Frazee, Apache NiFi
48. What is Apache Beam?
Agnostic (unified Batch + Stream) programming model
Java, Python, Go SDKs
Runners:
  Apache Flink
  Apache Spark
  Google Cloud Dataflow
  Local DirectRunner
49. Why Apache Beam?
Portability: code abstraction that can be executed on different backend runners
Unified: a single batch and streaming API
Extensible model and SDKs: extensible API to define custom sinks and sources
50. The Apache Beam Vision
End Users: create pipelines in a familiar language
SDK Writers: make Beam concepts available in new languages
Runner Writers: support Beam pipelines in distributed processing environments
56. Classify Rock Formations
Using Shortwave Infrared (SWIR) images (2.107 - 2.294 µm)
Radiant energy reflected/transmitted per unit time (Radiant Flux)
E.g.: plants don't grow on rocks
https://en.wikipedia.org/wiki/Radiant_flux
57. Measure Crop Health
Using Near-Infrared (NIR) radiation
Reflected strongly by plant chlorophyll and mesophyll
Chlorophyll content differs between plants and plant stages
Good measure to identify different plants and their health
https://en.wikipedia.org/wiki/Near-infrared_spectroscopy#Agriculture
58. Use images from Red band
Identify borders and regions without much detail visible to the naked eye - wonder why?
Images are in the Red band
Unsupervised Learning: Clustering
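The clustering idea can be sketched with a minimal 1-D k-means over red-band pixel intensities. This is a toy illustration, not the deck's code, and the pixel values are invented; in practice one would use a library implementation such as scikit-learn's KMeans:

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal k-means on scalar pixel intensities."""
    rng = np.random.default_rng(seed)
    # initialise centroids from the data itself
    centroids = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return labels, centroids

# toy "red band" pixels: a dark region and a bright region
pixels = np.array([10, 12, 11, 200, 198, 205, 9, 201], dtype=float)
labels, centroids = kmeans_1d(pixels)
# dark and bright pixels end up in separate clusters
```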