An explanation of TensorFlow object detection, transfer learning, and active learning, with test results on real data.
An introduction to a tool that makes the Object Detection API easier to use is also included.
The full version with animations can be found in this link.
https://docs.google.com/presentation/d/11EVzICHMw3UbNRRaODxQVT0qhuHrzpMhBCnVZy4FrhI/edit?usp=sharing
Slides from the UPC reading group on computer vision about the following paper:
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015).
Image classification is a common problem in artificial intelligence. We used the CIFAR-10 dataset and tried many methods, such as neural networks and transfer learning techniques, to reach a high test accuracy.
You can view the source code and the papers we read on GitHub: https://github.com/Asma-Hawari/Machine-Learning-Project-
Object classification using CNN & VGG16 Model (Keras and TensorFlow), by Lalit Jain
Using a CNN with Keras and TensorFlow, we have deployed a solution that can train on any image on the fly. The code uses a Google API to fetch new images and a VGG16 model for training, and is deployed using the Python Django framework.
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES, by IRJET Journal
1) The document presents a study on detecting yoga poses using machine learning libraries. The researchers collected a dataset of 5 popular yoga poses and used OpenCV and MediaPipe for pose detection.
2) Several machine learning classifiers were tested on the dataset including logistic regression, random forest, gradient boosting, KNN and ridge regression. The gradient boosting classifier achieved the highest accuracy of 98.87%.
3) A web application with a graphical user interface was developed to allow users to capture their pose and receive feedback on accuracy in real-time without an instructor, helping people learn yoga poses properly. The system has potential for improvements like detecting additional poses and developing mobile applications.
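The pipeline summarized above extracts pose landmarks (via MediaPipe) and feeds derived features to classical classifiers. A minimal sketch of that second stage, using hypothetical joint-angle features and a nearest-centroid rule standing in for the gradient boosting classifier the paper used (which would normally come from scikit-learn):

```python
# Hypothetical sketch: classify a yoga pose from landmark-derived features.
# In the study above, features come from MediaPipe pose landmarks and the
# classifier is gradient boosting; here we use synthetic 4-value "joint
# angle" feature vectors and a nearest-centroid rule, so the idea is
# visible without external libraries.
import math

# Synthetic training data: pose name -> list of joint-angle vectors (degrees)
TRAIN = {
    "tree":    [[175, 45, 170, 90], [172, 50, 168, 85]],
    "warrior": [[120, 120, 175, 175], [118, 125, 170, 172]],
}

def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

CENTROIDS = {pose: centroid(vs) for pose, vs in TRAIN.items()}

def classify(features):
    """Return the pose whose centroid is closest in Euclidean distance."""
    return min(CENTROIDS, key=lambda p: math.dist(features, CENTROIDS[p]))

print(classify([174, 47, 169, 88]))   # a tree-like angle vector
```

A real system would replace the synthetic vectors with angles computed from detected landmarks and swap in the gradient boosting classifier for better accuracy.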
A comprehensive tutorial on Convolutional Neural Networks (CNN) which talks about the motivation behind CNNs and Deep Learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory involved with the different variants used in practice and also, gives a big picture of the whole network by putting everything together.
Next, there's a discussion of the various state-of-the-art frameworks being used to implement CNNs to tackle real-world classification and regression problems.
Finally, the implementation of CNNs is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Levi and Hassner (2015).
Image classification using convolutional neural network, by KIRAN R
This classifier can be used to separate images from a large collection or dataset. A deep neural network is used to train on and classify the images; the convolutional neural network is the most suitable algorithm for classifying images. The classifier is a machine learning model, so the more you train it, the higher its accuracy.
This presentation explains CNNs through the image classification problem, and was prepared from the perspective of understanding computer vision and its applications. I tried to explain CNNs as simply as possible, to my understanding. It gives beginners a brief idea of the CNN architecture and the different layers in it, with an example. Please refer to the references on the last slide for a better idea of how CNNs work. The presentation also discusses several (though not all) types of CNNs and applications of computer vision.
** AI & Deep Learning Using TensorFlow - https://www.edureka.co/ai-deep-learning-with-tensorflow **
This Edureka tutorial will provide you with detailed and comprehensive knowledge of TensorFlow object detection and how it works, along with details on how to use TensorFlow to detect objects with deep learning methods. Below are the topics covered in this tutorial:
1. What is Object Detection?
2. Industrial use of Object Detection
3. Object Detection Workflow
4. What is Tensorflow?
5. Object Detection using Tensorflow - Demo
6. Live Object Detection using Tensorflow- Demo
Convolutional Neural Network (CNN) is a type of neural network that can take in an input image, assign importance to areas in the image, and distinguish objects in the image. CNNs use convolutional layers and pooling layers, which help introduce translation invariance to allow the network to recognize patterns and objects regardless of their position in the visual field. CNNs have been very effective for tasks involving visual imagery like image classification but may be less effective for natural language processing tasks that rely more on word order and sequence. Recurrent neural networks (RNNs) that can model sequential data may perform better than CNNs for some natural language processing tasks like text classification.
Convolutional neural networks (CNNs) learn multi-level features and perform classification jointly, outperforming traditional approaches on image classification and segmentation problems. CNNs have four main components: convolution, nonlinearity, pooling, and fully connected layers. Convolution extracts features from the input image using filters. A nonlinear activation function (such as ReLU) introduces nonlinearity. Pooling reduces dimensionality while retaining important information. The fully connected layer uses the high-level features for classification. CNNs are trained end-to-end using backpropagation to minimize output errors by updating weights.
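The three feature-extraction stages named above can be sketched in a few lines of plain Python (a real network would stack many such layers and learn the kernel values by backpropagation; the 1x2 kernel here is chosen by hand for illustration):

```python
# Minimal sketch of convolution, nonlinearity (ReLU), and max pooling.
def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def relu(fmap):
    """Zero out negative responses (the nonlinearity)."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2x2(fmap):
    """Downsample by keeping the max of each 2x2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[0, 0, 1, 1, 0]] * 5      # a bright vertical stripe
edge = [[1, -1]]                   # a 1x2 kernel that responds to vertical edges
features = maxpool2x2(relu(conv2d(image, edge)))
print(features)                    # [[0, 1], [0, 1]]
```

The pooled map still marks where the stripe's edge is, but at half the resolution, which is exactly the "retain important information, reduce dimensionality" role described above.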
This document provides an overview of digital image processing and image compression techniques. It defines what a digital image is, discusses the advantages and disadvantages of digital images over analog images. It describes the fundamental steps in digital image processing as well as types of data redundancy that can be exploited for image compression, including coding, interpixel, and psychovisual redundancy. Common image compression models and lossless compression techniques like Lempel-Ziv-Welch coding are also summarized.
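Lempel-Ziv-Welch coding, the lossless technique named above, is short enough to sketch directly: the dictionary grows as repeated substrings are seen, so recurring patterns (coding/interpixel redundancy) collapse into single codes.

```python
# Sketch of LZW coding on a string; pixel streams work the same way.
def lzw_encode(data):
    """Encode a string into a list of integer codes."""
    table = {chr(i): i for i in range(256)}    # initial single-character codes
    current, out = "", []
    for ch in data:
        if current + ch in table:
            current += ch                      # extend the current match
        else:
            out.append(table[current])
            table[current + ch] = len(table)   # register the new substring
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decode(codes):
    """Inverse of lzw_encode, rebuilding the table on the fly."""
    table = {i: chr(i) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        entry = table[code] if code in table else prev + prev[0]
        out.append(entry)
        table[len(table)] = prev + entry[0]
        prev = entry
    return "".join(out)

codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
assert lzw_decode(codes) == "TOBEORNOTTOBEORTOBEORNOT"
print(len(codes), "codes for 24 characters")
```

The repeated "TOBEOR" substrings are what shrink the output below the input length; images with large uniform regions compress for the same reason.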
A convolutional neural network (CNN / ConvNet) is a machine learning algorithm used in computer vision, with applications such as image classification, image detection, digit recognition, and many more. https://technoelearn.com
Face recognition using artificial neural network, by Sumeet Kakani
This document provides an overview of a face recognition system that uses artificial neural networks. It describes the structure and processing of artificial neural networks, including convolutional networks. It discusses how the system works, including local image sampling, the self-organizing map, and the convolutional network. It then provides details about the implementation and applications of the system for face recognition, and concludes by discussing the benefits of the system.
This document discusses research on human action recognition using skeleton data. It introduces issues with skeleton-based action recognition, such as variable scales, view orientations, noise and rate/intra-action variations. It then reviews previous work on skeleton-based action recognition using hand-crafted features and deep learning models. The document proposes two ensemble deep learning models called Ensemble TS-LSTM v1 and v2 that use temporal sliding LSTMs to capture short, medium and long-term dependencies from skeleton sequences for action recognition. Experimental results on standard datasets demonstrate the models outperform previous methods.
Real-time object detection coz YOLO! by Shagufta Gurmukhdas
You Only Look Once is a state-of-the-art, high speed real-time object detection algorithm. It looks at the whole image at test time so its predictions are informed by global context in the image. This talk teaches you to develop your own application to detect and classify objects in images & videos.
1. Intro to the YOLO algorithm
2. Image detection on video with YOLO
3. Processing images in Python, adding bounding boxes and labels
4. Processing complete videos in Python in a similar way to the previous section
5. Processing real-time video from a webcam
This presentation briefly describes image enhancement in the spatial domain: basic gray-level transformations, histogram processing, enhancement using arithmetic/logical operations, basics of spatial filtering, and local enhancements.
Tijmen Blankenvoort, co-founder of Scyfer BV, presentation at the Artificial Intelligence Meetup, 15-1-2014. An introduction to neural networks and deep learning.
When deep learning techniques are applied to tasks such as anomaly diagnosis, the severe imbalance between normal and abnormal data sets is a problem. This paper uses a GAN to learn a manifold (a compressed model) of the normal data set only; a query sample is then mapped onto that manifold with the trained GAN model, and the difference from the trained normal data determines whether the query is anomalous. Anomalous regions within the image are presented via pixel-wise segmentation.
This document provides an introduction to computer graphics. It defines computer graphics as the creation, storage, and manipulation of pictures and drawings using digital computers. Computer graphics is used across diverse fields such as engineering, medicine, education, entertainment, and more. The document discusses basic terms related to display devices such as pixels, resolution, color depth, and frame buffers. It also describes different types of display devices including raster scan displays, random scan displays, direct view storage tubes, flat panel displays, and stereoscopic displays. Applications of computer graphics such as design, image processing, animation, simulation, and medical imaging are also summarized.
In this project, we propose methods for semantic segmentation with state-of-the-art deep learning models. Moreover, we want to filter the segmentation down to a specific object for a specific application: instead of spending effort on unnecessary objects, we can focus on particular ones and make the system more specialized and efficient for special purposes. Furthermore, we leverage models that are suitable for face segmentation. The models used in this project are Mask R-CNN and DeepLabv3. The experimental results clearly indicate that the illustrated approach is efficient and robust in the segmentation task relative to previous work in the field. The models reach 74.4 and 86.6 mean Intersection over Union (mIoU), respectively. Visual results of the models are shown in the Appendix.
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection, by Taegyun Jeon
The document summarizes the You Only Look Once (YOLO) object detection method. YOLO frames object detection as a single regression problem to directly predict bounding boxes and class probabilities from full images in one pass. This allows for extremely fast detection speeds of 45 frames per second. YOLO uses a feedforward convolutional neural network to apply a single neural network to the full image. This allows it to leverage contextual information and makes predictions about bounding boxes and class probabilities for all classes with one network.
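Because YOLO emits many candidate boxes in that single pass, overlapping duplicates are typically pruned with non-maximum suppression (NMS). A minimal sketch of IoU and greedy NMS over (x1, y1, x2, y2, confidence) boxes, independent of any particular framework:

```python
# Greedy non-maximum suppression, as used to post-process YOLO detections.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, threshold=0.5):
    """Keep highest-confidence boxes first, dropping any box whose IoU
    with an already-kept box exceeds the threshold."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box[:4], k[:4]) <= threshold for k in kept):
            kept.append(box)
    return kept

detections = [
    (10, 10, 50, 50, 0.9),    # one object, high confidence
    (12, 12, 52, 52, 0.6),    # near-duplicate of the first box
    (80, 80, 120, 120, 0.8),  # a second, separate object
]
print(nms(detections))        # the 0.6 duplicate is suppressed
```

The two boxes around the first object overlap with IoU ≈ 0.82, so only the 0.9-confidence one survives, while the separate object is untouched.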
This document describes a project to build a convolutional neural network (CNN) model to recognize six basic human emotions (angry, fear, happy, sad, surprise, neutral) from facial expressions. The CNN architecture includes convolutional, max pooling and fully connected layers. Models are trained on two datasets - FERC and RaFD. Experimental results show that Model C achieves the best testing accuracy of 71.15% on FERC and 63.34% on RaFD. Visualizations of activation maps and a prediction matrix are provided to analyze the model's performance and confusions between emotions. A live demo application is also developed using OpenCV to demonstrate real-time emotion recognition from video frames.
The document provides an overview of machine learning use cases. It begins with an agenda that will discuss the basic framework for ML projects, model deployment options, and various ML use cases like text classification, image classification, object detection, etc. It then covers the basic 5 step framework for ML projects - defining the problem, planning the solution, acquiring and preparing data, designing and training a model, and deploying the solution. Next, it discusses popular methods for various tasks like image classification, object detection, pose estimation. Finally, it shares several use cases for each task to demonstrate real-world applications.
Apple makes it really easy to get started with Machine Learning as a developer. See how you can easily use Create ML and Turi Create to train Machine Learning models and use them in your iOS apps.
This tutorial takes a trip into the kingdom of object recognition, armed with computer vision knowledge and an image classifier.
Object detection has been witnessing rapid, revolutionary change in several fields of computer vision. Because it combines object classification
with object recognition, it is one of the most challenging topics in the domain of machine learning and vision.
Python: Object oriented programming, RTS Tech. Indore
This is a presentation to take your skills to the next level. We hope you will like our work to make programming easier for you.
Feel free to contact for the online/offline batches.
How to implement artificial intelligence solutions, by Carlos Toxtli
The document provides an overview of how to implement artificial intelligence solutions. It discusses getting started in AI by either creating new techniques as a scientist or implementing existing techniques as an engineer. It then covers various machine learning algorithms like linear regression, decision trees, random forests, naive bayes, k-nearest neighbors, k-means, and support vector machines. Finally, it introduces deep learning concepts like artificial neural networks, neurons, layers, gradients, optimizers, overfitting, and regularization. The document serves as a guide for implementing both machine learning and deep learning techniques for AI applications.
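Of the classic algorithms listed above, k-nearest neighbors is the easiest to show end to end. A pure-Python sketch on made-up 2-D points (real use would go through a library such as scikit-learn and real features):

```python
# k-nearest neighbors: label a query point by majority vote of the k
# closest training points.
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs. Return the majority label
    among the k points closest to query."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
print(knn_predict(train, (2, 2)))   # closest neighbors are all red
```

The same shape of code, with distances replaced by learned weights and votes by gradients, is conceptually where the deep learning half of the overview picks up.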
OOP concepts are among the most important concepts in high-level languages. In this PPT we will learn more about the object-oriented approach in Python programming, including classes and objects, inheritance, data abstraction, polymorphism, and much more, with examples and code.
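The concepts listed above fit in one compact Python sketch: a class with encapsulated state, data abstraction via a property, inheritance, and polymorphism.

```python
# Encapsulation, abstraction, inheritance, and polymorphism in one example.
class Shape:
    def area(self):                 # interface shared by all shapes
        raise NotImplementedError

class Rectangle(Shape):             # inheritance
    def __init__(self, w, h):
        self._w, self._h = w, h     # encapsulated attributes

    @property
    def width(self):                # controlled read access (data abstraction)
        return self._w

    def area(self):
        return self._w * self._h

class Circle(Shape):
    def __init__(self, r):
        self._r = r

    def area(self):
        return 3.14159 * self._r ** 2

# Polymorphism: the same call works on any Shape subclass.
shapes = [Rectangle(3, 4), Circle(1)]
print([round(s.area(), 2) for s in shapes])   # [12, 3.14]
```

Each subclass supplies its own `area`, so code that loops over `shapes` never needs to know which concrete class it is holding.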
How to create a neural network that detects people wearing masks: the ultimate description, an A-to-Z workflow for creating a neural network that recognizes images.
A short intro to the paper: https://blog.fulcrum.rocks/neural-network-image-recognition
The Ring programming language version 1.4.1 book - Part 29 of 31, by Mahmoud Samir Fayed
This document provides documentation for Ring version 1.4.1. It discusses why list indexes start at 1 in Ring rather than 0 as in some other languages. It also covers topics like constructors, what happens when an object is created, using getter and setter methods, and including multiple source files in a project. Various code examples are provided to illustrate concepts around classes, objects, functions, and other Ring features.
IRJET - Object Detection in an Image using Convolutional Neural Network, by IRJET Journal
This document summarizes a research paper on object detection in images using convolutional neural networks. The paper proposes using the ImageAI Python library along with a region-based convolutional neural network model to perform object detection. The methodology loads a pre-trained model, imports the ImageAI detection class, detects objects in an input image, and prints the detected objects and probabilities. The system can accurately detect objects like faces and vehicles from images and has applications in security and traffic monitoring. However, training the neural network requires a large dataset and takes a long time.
Designing a neural network architecture for image recognition, by Shandukani Vhulondo
The document discusses the design of a basic neural network architecture for image recognition. It begins by outlining a simple design with dense layers but notes this does not work well for images. Convolutional layers are introduced to help detect patterns regardless of location. Max pooling and dropout layers are also discussed to make the network more efficient and robust. The document provides examples of how these various layer types work and combines them into a basic convolutional block that can be stacked for more complex images.
GDE: Lab 1 – Traffic Light Pg. 1
Lab 1: Traffic Light
Setup
Start a new project. Save the scene. Review the submission steps at the bottom to ensure you save the scene
name correctly.
Additional References
Official Unity Tutorials: Unity has some fantastic learning tools to help newcomers become better
familiarized with the tool suite. I cannot recommend this enough.
Unity Scripting Reference: Unity's script reference is fantastic. If you want to know anything about a
particular component or function or class, they have it documented and easily accessible. It won't give
you all the answers, but without this, developing for Unity would become much, much more difficult.
Preparation
Make sure you complete Experiment 1 and 2 before attempting this lab.
Goal
The goal for this lab is to implement a simple traffic light that has a state machine driving its behavior. By the
end, you should have a visual representation of a traffic light with three spheres that represent the three lights.
Using time to determine when to transition, switch the lights from green, to yellow, to red and then back to
green. Additionally, input can be used to change the traffic light's behavior.
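The traffic light described in this goal is a time-driven state machine. Before wiring it into Unity (which would be C#, with the logic in `MonoBehaviour.Update`), the transition logic can be sketched on its own; the durations below are illustrative, not part of the lab spec.

```python
# Time-driven traffic light state machine, mirroring per-frame Update logic.
TRANSITIONS = {"green": "yellow", "yellow": "red", "red": "green"}
DURATIONS = {"green": 5.0, "yellow": 2.0, "red": 4.0}   # seconds per state

class TrafficLight:
    def __init__(self):
        self.state = "green"
        self.elapsed = 0.0

    def tick(self, dt):
        """Advance the clock by dt seconds; transition when the current
        state's duration expires."""
        self.elapsed += dt
        if self.elapsed >= DURATIONS[self.state]:
            self.elapsed = 0.0
            self.state = TRANSITIONS[self.state]
        return self.state

light = TrafficLight()
for _ in range(6):                  # simulate 6 seconds at 1-second ticks
    print(light.tick(1.0))
```

In Unity, `tick(Time.deltaTime)` would run each frame, and the returned state would toggle the emissive color of the corresponding light sphere; input handling can simply force a different entry in `TRANSITIONS`.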
http://unity3d.com/learn
http://docs.unity3d.com/ScriptReference/
Instructions
Let's create the actual traffic light together. Here is the step-by-step construction process:
Create a Cube object
Rename the Cube object in the Hierarchy to TrafficLight
Create three Spheres. Rename them to Green Light, Yellow Light, and Red Light. Make the Traffic Light
object the parent of all of the Light objects by dragging all three spheres in the Hierarchy on top of the
TrafficLight object
Once all three spheres are children of the Traffic Light object, the only other thing left to add to the scene is
just a little bit of light. Add a directional light to the scene. The position of a directional light does not
matter. You can move it out of the way. If the light is too bright, you can lower the intensity in the inspector.
The scene should now roughly look like this. In scale mode, shape the cube until it looks like a traffic light.
Move the three light spheres such that they protrude through one side of the traffic light object. You'll
notice that the spheres have taken on the scale of the parent. It's important to remember that translation,
rotation and scale actions performed on the parent affect the children in the same way.
Use the scale tool to make the spheres look like a light again and position them accordingly
Once all three lights are in place, move the Main Camera object so that the Traffic Light object is in the
frame and you can see the light spheres
Scripting
Now you are ready to begin the programming portion of this lab! Select the Traffic Light object and add a
.
The Ring programming language version 1.8 book - Part 7 of 202, by Mahmoud Samir Fayed
Ring 1.8 includes several new features and improvements such as better performance, new applications like Find in Files and String2Constant, more 3D samples, compiling on Manjaro Linux, updated libraries, and notes for extension creators. Key updates include 10-100% faster performance, a Find in Files application, a String2Constant tool to convert code to use constants, a StopWatch application, improved Form Designer, RingQt, and code generator for extensions. The release provides better compiler, virtual machine, and overall performance.
C++ [ principles of object oriented programming ]Rome468
C++ is an enhanced version of C that adds support for object-oriented programming. It includes everything in C and allows for defining classes and objects. Classes allow grouping of related data and functions, and objects are instances of classes. Key concepts of OOP supported in C++ include encapsulation, inheritance, and polymorphism. Encapsulation binds data and functions together in a class and allows hiding implementation details. Inheritance allows defining new classes based on existing classes to reuse their functionality. Polymorphism enables different classes to have similarly named functions that demonstrate different behavior.
Start machine learning in 5 simple stepsRenjith M P
Simple steps to get started with machine learning.
The use case uses python programming. Target audience is expected to have a very basic python knowledge.
- Tensor Flow is a library for large-scale machine learning and deep learning using data flow graphs. Nodes in the graph represent operations and edges represent multidimensional data arrays called tensors.
- It supports CPU and GPU processing on desktops, servers, and mobile devices. Models can be visualized using TensorBoard.
- An example shows how to build an image classifier using transfer learning with the Inception model. Images are retrained on flower categories to classify new images.
- Distributed Tensor Flow allows a graph to run across multiple machines in a cluster for greater performance.
Python programming computer science and engineeringIRAH34
Python supports object-oriented programming (OOP) through classes and objects. A class defines the attributes and behaviors of an object, while an object is an instance of a class. Inheritance allows classes to inherit attributes and methods from parent classes. Polymorphism enables classes to define common methods that can behave differently depending on the type of object. Operator overloading allows classes to define how operators like + work on class objects.
This document provides an overview of object-oriented programming concepts including classes, objects, encapsulation and abstraction. It begins by describing the objectives of learning OOP which are to describe objects and classes, define classes, construct objects using constructors, access object members using dot notation, and apply abstraction and encapsulation. It then compares procedural and object-oriented programming, noting that OOP involves programming using objects defined by classes. Key concepts covered include an object's state consisting of data fields and behavior defined by methods. The document demonstrates defining classes, creating objects, accessing object members, and using private data fields for encapsulation.
These questions will be a bit advanced level 2sadhana312471
These questions will be a bit advanced(Intermediate) in terms of Python interview.
This is the continuity of Nail the Python Interview Questions.
The fields that these questions will help you in are:
• Python Developer
• Data Analyst
• Research Analyst
• Data Scientist
Similar to Transfer learning, active learning using tensorflow object detection api (20)
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
2. Tensorflow Object detection api
Google released the TensorFlow-based object detection API in June 2017.
By offering pre-trained models with a range of speed and accuracy trade-offs, it lets the general public detect objects easily.
3. This talk covers only object detection.
There are various kinds of image detection tasks.
Image classification simply determines what an image contains.
Localization goes beyond classification to find the exact position of the object.
Detecting multiple objects in a single image, each with a bounding box, is generally called object detection.
Although not covered in this talk, detecting an object along its shape rather than as a box is called object segmentation.
The further along this list you go, the harder the task becomes.
The object detection API supports both box and mask outputs.
4. Tensorflow Object detection api
Make tfrecord → Retrain → Export → Test (Evaluate: optional loop)
This is the most basic flow of the tensorflow object detection api.
It provides everything needed: processing the data into the API's format, training on that data, exporting the model to a usable form, and testing the model.
You can also evaluate a model that is still training or has finished.
6. Make tfrecord
python generate_tfRecord.py
--csv_input=data/train.csv
--output_path=data/train.record
The data format the object detection api consumes is tfrecord.
Source code for creating the tfrecord file is provided; the command above shows how to use it.
Set the csv file containing the image information and the output directory.
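The generate_tfRecord.py script reads a CSV with one row per object and groups rows by image before serializing. Here is a pure-Python sketch of that grouping step only (column names assumed from a typical train.csv; the tf.train.Example serialization that follows is omitted):

```python
import csv, io

# Hypothetical sketch of the grouping generate_tfRecord.py performs:
# each CSV row describes one object; rows sharing a filename are
# grouped into one per-image record.

SAMPLE = """filename,class,xmin,ymin,xmax,ymax
a.jpg,homer_simpson,10,20,50,80
a.jpg,bart_simpson,60,20,90,70
b.jpg,lisa_simpson,5,5,40,40
"""

def group_annotations(csv_text):
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        groups.setdefault(row["filename"], []).append((row["class"], box))
    return groups

groups = group_annotations(SAMPLE)
print(len(groups["a.jpg"]))  # a.jpg has two labeled objects
```

One image thus becomes one record, however many objects it contains.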
7. Make tfrecord
The data needed to create the tfrecord file is as follows.
First, you need the original image file.
Next, you need a label for each object in the image.
Finally, you need four numeric values that tell where each object is located in the image:
xmin, ymin, xmax, ymax. With these values you can define the box in which each object sits.
A single image can contain multiple objects.
9. Make tfrecord
Finding object locations by hand is difficult, and various programs automate it; I used the most widely known one, labelimg.
When you use labelimg, an xml file is created that contains the label of each object and its xmin, ymin, xmax, and ymax values.
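The XML labelimg writes follows the Pascal VOC layout; a minimal sketch of extracting the label and box from it (the sample annotation below is abbreviated to just the fields read):

```python
import xml.etree.ElementTree as ET

# Minimal Pascal VOC-style annotation, as written by labelimg
# (abbreviated to the fields parsed below).
XML = """<annotation>
  <filename>homer_001.jpg</filename>
  <object>
    <name>homer_simpson</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""

def read_objects(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) for every object."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text)
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, coords))
    return objects

print(read_objects(XML))  # [('homer_simpson', (10, 20, 50, 80))]
```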
10. Make tfrecord
If we have the images to train on, the xml files, and a labelmap that stores the id for each class, we can generate the tfrecord file.
11. Make tfrecord tfgenerator ( custom )
The default code requires generating the train and test files separately.
Since this is a tedious task, I created a program that does it automatically.
When executed, each class is distributed according to the train/validate ratio, and a log is generated.
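The split can be sketched as a per-class shuffle and ratio cut; this is a simplified stand-in for the actual tool, not its real code:

```python
import random

# Simplified sketch of a per-class train/validate split: shuffle each
# class's files, then cut at the given ratio so every class keeps the
# same proportion in both sets.

def split_per_class(files_by_class, train_ratio=0.8, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, validate = [], []
    for cls, files in files_by_class.items():
        files = files[:]          # don't mutate the caller's lists
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)
        train += files[:cut]
        validate += files[cut:]
    return train, validate

train, validate = split_per_class(
    {"homer": [f"homer_{i}.jpg" for i in range(10)],
     "bart":  [f"bart_{i}.jpg" for i in range(10)]})
print(len(train), len(validate))  # 16 4
```

Splitting per class, rather than over the pooled file list, keeps rare classes represented in both sets.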
13. Re train Transfer learning
python object_detection/train.py
--logtostderr
--pipeline_config_path=pipeline.config
--train_dir=train
This is the object detection api's base Python code for training.
Set the model configuration you want to train and the training output directory.
There are many other options.
14. Re train Object detection API model zoo
TensorFlow provides a variety of pre-trained models for object detection.
Each model has a different speed and accuracy.
The first part of a model name is the algorithm, and the second part is the dataset it was trained on:
ssd_mobilenet_v1_coco stands for an SSD MobileNet model trained on the COCO dataset.
Pre-training datasets include the COCO, Kitti, and Open Images datasets.
15. Re train What you want to detect
If the class you want to detect is an instance already in the COCO, Kitti, or Open Images datasets, you do not need to train.
Just select and use the corresponding built-in model.
16. Re train What you want to detect
However, if you want to classify dinosaurs such as Tyrannosaurus and Triceratops, as shown in the picture, you need to train on those classes.
Here we have two choices:
build a model with completely new layers, or leverage an existing model.
17. Re train Make own model
It is known that the deeper the neural network and the wider its layers, the higher the accuracy tends to be.
The picture shows Google's Inception v2 model.
Deep networks, however, require exponentially more computing resources.
18. Re train Make own model
google TPU 2.0 : 45 TFLOPS
Titan V : 15 TFLOPS
GTX 1080 : 9 TFLOPS
GTX 970 : 4 TFLOPS
(TPU 3.0 was announced at Google I/O 2018.)
To put the FLOPS of TPU 2.0 and consumer graphics cards in perspective: the most powerful GPU a non-enterprise user is likely to own is a Titan.
Google, meanwhile, uses TPU pods containing 64 TPUs, so work that takes Google an hour may take weeks or even months elsewhere.
19. Re train Transfer learning
So we will use a method called transfer learning.
Imagine two people who know nothing about dinosaurs: a very young baby and an adult.
If you show the baby a photo of a dinosaur, the baby probably has no idea what it is.
The baby does not know what eyes, tails, and feet are, and does not even have the concept of a point or a line.
But if you show dinosaur photos to the adult, the adult recognizes the eyes, legs, and tail,
and thinks, based on cues such as skin texture, "Isn't that some kind of reptile?"
Teaching dinosaurs to a child can be quite challenging, but teaching them to an adult is much easier.
Transfer learning is like teaching dinosaurs to an adult.
20. Re train Transfer learning
This image visualizes the learned weights of each layer.
Looking closely, the layers nearest the input hold low-dimensional features such as lines, while layers toward the output hold more detailed, high-dimensional features.
Since low-dimensional features are shared by almost any object, transfer learning keeps them as they are.
21. Re train Transfer learning
Replace the last layer with the custom classes:
for example, with the Inception v2 model we keep the weights of the earlier layers and swap only the last fully connected layer for one over our custom labels.
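The head-swap idea can be illustrated with a deliberately tiny, pure-Python stand-in (everything here is hypothetical and far simpler than the real API): a frozen "pretrained" feature extractor whose weights never change, plus a final linear layer that is the only thing trained on the new classes.

```python
# Toy transfer-learning sketch (illustration only, not the real API):
# the "pretrained" feature extractor is frozen; only the final
# classifier layer is trained, here with a simple perceptron update.

def pretrained_features(x):
    """Stand-in for the frozen early layers: raw input -> features."""
    return [x[0] + x[1], x[0] - x[1]]  # fixed, never updated

def train_last_layer(data, epochs=20, lr=0.1):
    """Train only the final linear layer on the new labels (+1/-1)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            f = pretrained_features(x)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else -1
            if pred != label:  # update head weights only
                w = [w[i] + lr * label * f[i] for i in range(2)]
                b += lr * label
    return w, b

# New "custom classes", separable in the frozen feature space.
data = [((1, 1), 1), ((2, 2), 1), ((1, -1), -1), ((2, -2), -1)]
w, b = train_last_layer(data)
```

Because only the tiny head is updated, training is cheap even though the (pretend) backbone did all the heavy lifting, which is exactly the economics that make transfer learning attractive.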
23. Export
Once you train your model and reach your goal, you need to export the output to an executable form.
Below is the object detection API's default Python export code;
set the trained checkpoint prefix and the output directory.
python object_detection/export_inference_graph.py
--input_type=image_tensor
--pipeline_config_path=pipeline.config
--trained_checkpoint_prefix=train/model.ckpt-xxxxx
--output_directory=output
Given the checkpoint file you trained as input, a frozen_inference_graph.pb file is created.
With this frozen_inference_graph.pb we can run prediction from anywhere.
25. Evaluate
python eval.py
--logtostderr
--pipeline_config_path=training/ssd_mobilenet_v1_coco.config
--checkpoint_dir=training/
--eval_dir=eval/
tensorboard --logdir=./
We want to quantify how good the model is, either during training or after it finishes; the evaluate API does this.
Above is the Python evaluate code provided by the object detection api.
The evaluation results can be inspected visually through tensorboard.
27. Evaluate Using tensorboard
First, you can see the loss value for classification or localization.
The lower the loss value, the better.
28. Evaluate Using tensorboard
Second, you can check the accuracy of each class.
For various reasons, the classification accuracy differs from character to character.
29. Evaluate Using tensorboard
Third, you can measure the accuracy over the entire validation data.
Accuracy is measured at an IoU threshold of 0.5.
30. Evaluate Using tensorboard
IoU is an abbreviation of intersection over union: the ratio of the overlap area between the actual ground-truth box and the predicted box to the area of their union.
As you can see in the photo, accuracy is not 100% even when the predicted box completely wraps around the actual one.
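IoU can be computed directly from the two boxes; a minimal sketch for (xmin, ymin, xmax, ymax) boxes:

```python
# Intersection over union for two boxes given as (xmin, ymin, xmax, ymax).

def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction that fully contains the ground truth still scores < 1.0:
print(iou((10, 10, 50, 50), (0, 0, 60, 60)))  # 1600/3600 ≈ 0.444
```

This is why a loose predicted box that wraps the object is penalized: the union grows while the intersection stays fixed.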
32. Tensorflow object detection helper tool
Make tfrecord → Retrain → Evaluate → Export (automated)
By default, the data creation, train, evaluate, and export steps described above must each be run manually.
Since this is inconvenient, I created a program that automates the whole sequence.
33. Tensorflow object detection helper tool
All settings default to values that run out of the box, so you only need to select the model and enter the number of training steps.
34. Tensorflow object detection helper tool
https://github.com/5taku/tensorflow_object_detection_helper_tool
You can check the log of the result value and the execution time of each process.
36. Test
This is the code that loads the file generated by export into memory.
Images are then detected using the loaded graph.
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
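Running the loaded graph yields arrays of boxes, scores, and class ids; a minimal pure-Python sketch of the usual post-processing, keeping only detections above a confidence threshold (the helper name and threshold here are hypothetical, not part of the API):

```python
# Hypothetical post-processing sketch: after running the detection
# tensors, keep only detections whose confidence score passes a
# threshold before drawing or counting them.

def filter_detections(boxes, scores, classes, min_score=0.5):
    kept = []
    for box, score, cls in zip(boxes, scores, classes):
        if score >= min_score:
            kept.append({"box": box, "score": score, "class": cls})
    return kept

detections = filter_detections(
    boxes=[(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9)],
    scores=[0.92, 0.31],
    classes=[1, 7],
)
print(len(detections))  # 1 — only the 0.92-score box survives
```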
37. Test cloud resource ( google cloud compute engine )
All tests for this talk were run on a Google Cloud virtual machine with the following specifications:
16 vCPU
60 GB RAM
1 x NVIDIA Tesla P100
Ubuntu 16.04
Python 2.7.12
TensorFlow 1.8.0
CUDA 9.0
cuDNN 7.1
39. Test dataset
name total bounding_box
Homer Simpson 2246 612
Ned Flanders 1454 595
Moe Szyslak 1452 215
Lisa Simpson 1354 562
Bart Simpson 1342 554
Marge Simpson 1291 557
Krusty The Clown 1206 226
Principal Skinner 1194 506
Charles Montgomery Burns 1193 650
Milhouse Van Houten 1079 210
Chief Wiggum 986 209
Abraham Grampa Simpson 913 595
Sideshow Bob 877 203
Apu Nahasapeemapetilon 623 206
Kent Brockman 498 213
Comic Book Guy 469 208
Edna Krabappel 457 212
Nelson Muntz 358 219
Every character's images are labeled (roughly 300 to 2,000 per character).
Some of the images also have box data (roughly 200 to 650 per character).
The amount of data per character is not uniform.
40. Train , Evaluate , Export
model = faster rcnn resnet 50
training step = 140,000
training : validate rate = 8 : 2
Using my tool, I trained a Faster R-CNN ResNet-50 pre-trained model for 140,000 steps.
Evaluation runs every 10,000 steps.
Total training time was approximately 6 hours and 30 minutes.
41. Test tensor board result
The overall accuracy at 0.5 IoU was 0.67, which is not good performance.
The per-character accuracy is similarly poor.
42. Test handmade check
Since the aggregate figures did not tell me much, I decided to check by running actual images through the model.
Twenty images per character were predicted, and a few errors are visible even to the naked eye :(
The criteria are True ( T ), False ( F ), and None ( N ); if multiple labels are detected on one object, it counts as T when the highest-confidence label is correct.
43. Improved accuracy
1. Change Model
2. Increase data
a. Data augmentation
b. Labelling
c. Active learning
3. etc….
The result of training about 6,000 images for 140,000 steps is not satisfactory.
How can the accuracy be improved?
There are several ways to increase accuracy.
44. Improved accuracy change model
Model name | Speed (ms) | COCO mAP | Size | Training time | Outputs
ssd_mobilenet_v1_coco | 30 | 21 | 86M | 9m 44s | Boxes
ssd_mobilenet_v2_coco | 31 | 22 | 201M | 11m 12s | Boxes
ssd_inception_v2_coco | 42 | 24 | 295M | 8m 43s | Boxes
faster_rcnn_inception_v2_coco | 58 | 28 | 167M | 4m 43s | Boxes
faster_rcnn_resnet50_coco | 89 | 30 | 405M | 4m 28s | Boxes
faster_rcnn_resnet50_lowproposals_coco | 64 | - | 405M | 4m 30s | Boxes
rfcn_resnet101_coco | 92 | 30 | 685M | 6m 19s | Boxes
faster_rcnn_resnet101_coco | 106 | 32 | 624M | 6m 13s | Boxes
faster_rcnn_resnet101_lowproposals_coco | 82 | - | 624M | 6m 13s | Boxes
faster_rcnn_inception_resnet_v2_atrous_coco | 620 | 37 | 712M | 18m 6s | Boxes
faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco | 241 | - | 712M | - | Boxes
faster_rcnn_nas | 1833 | 43 | 1.2G | 47m 49s | Boxes
faster_rcnn_nas_lowproposals_coco | 540 | - | 1.2G | - | Boxes
In choosing a model, tensorflow already offers several options.
We only need to select one considering accuracy, prediction speed, and training speed.
In the tests I have done, resnet50 and inception v2 delivered the best performance.
45. Improved accuracy Data augmentation
(original → size up, size down, brightness up, brightness down)
Rotating an image, zooming in and out, and changing brightness are widely used ways of increasing data from a single image.
However, if you change the size of the image you must adjust the box coordinates as well, which is quite cumbersome.
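The box adjustment itself is simple scaling; a minimal sketch (scale_box is a hypothetical helper, not part of the API) of why resizing an image forces a matching box update:

```python
# Scaling an image by (sx, sy) must scale every annotation box the
# same way, or the labels no longer line up with the objects.

def scale_box(box, sx, sy):
    """Scale an (xmin, ymin, xmax, ymax) box with the image."""
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

# Image resized from 200x100 to 400x300 -> sx=2.0, sy=3.0
print(scale_box((10, 20, 50, 80), 2.0, 3.0))  # (20, 60, 100, 240)
```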
46. Improved accuracy labelling
Too expensive!
Another way to increase data is to manually draw a box boundary for each object.
Manual labeling, however, incurs a large labeling cost.
47. Active learning
Active learning allows images to be labeled at low cost.
The basic concept: train a model on a small amount of labeled data, use it to predict labels for a pool of candidate data, and feed the results back in as training data.
The predicted results are, of course, verified by a person or machine called an oracle.
48. Active learning
labeled data → Training → Prediction over the unlabeled data pool → Query an oracle (here: does the detected label equal the image label?) → back into the labeled data
The Simpsons dataset is ideal for active learning because each un-boxed image still carries a character label.
The ResNet-50 model is first trained for 20,000 steps on the 6,000 box annotations provided initially.
Then 12,000 un-boxed images are predicted, and whenever the prediction matches the image label, the image is moved into the training dataset.
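The loop above can be sketched in a few lines of pure Python (all names are hypothetical; the real pipeline predicts with the exported detection graph):

```python
# Toy sketch of one active-learning round: a "model" predicts labels
# for an unlabeled pool, and predictions that agree with the known
# image-level label (the oracle check) are promoted into training data.

def active_learning_round(train_set, pool, predict):
    promoted, remaining = [], []
    for image, true_label in pool:
        if predict(image) == true_label:      # oracle: prediction
            promoted.append((image, true_label))  # matches image label
        else:
            remaining.append((image, true_label))
    return train_set + promoted, remaining

# Stand-in "model": only correct on even-numbered images.
predict = lambda image: "homer" if image % 2 == 0 else "bart"

train, pool = active_learning_round(
    train_set=[(0, "homer")],
    pool=[(2, "homer"), (3, "homer"), (4, "homer")],
    predict=predict,
)
print(len(train), len(pool))  # 3 1
```

Images the model still gets wrong stay in the pool and get another chance in a later round, once the retrained model has improved.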
49. Active learning
The number of images adopted into the training set is shown below.
The first prediction round roughly doubled the number of images; later rounds added fewer, but the training set kept growing steadily.
ID NAME step1 step2 step3 step4 step5 step6 step7
1 homer_simpson 626 1830 1980 2018 2050 2061 2066
2 ned_flanders 607 1214 1335 1355 1381 1387 1388
3 moe_szyslak 215 692 945 1121 1170 1238 1246
4 lisa_simpson 578 1116 1217 1234 1271 1271 1271
5 bart_simpson 571 1133 1243 1256 1256 1265 1272
6 marge_simpson 568 1229 1244 1248 1248 1250 1250
7 krusty_the_clown 239 558 1002 1139 1145 1147 1147
8 principal_skinner 519 1054 1110 1114 1116 1119 1122
9 charles_montgomery_burns 663 1058 1085 1099 1104 1105 1107
10 milhouse_van_houten 223 776 897 925 952 954 965
11 chief_wiggum 206 561 788 837 858 863 867
12 abraham_grampa_simpson 608 844 875 878 878 878 878
13 sideshow_bob 202 217 420 651 733 760 772
14 apu_nahasapeemapetilon 223 537 570 581 588 588 590
15 kent_brockman 217 250 270 324 371 415 415
16 comic_book_guy 224 351 387 395 398 403 403
17 edna_krabappel 226 227 327 370 384 386 390
18 nelson_muntz 217 284 287 288 297 297 299
Total 6932 13931 15982 16833 17200 17387 17448
Increase rate 6999 2051 851 367 187 61
50. train , evaluate , export
The charts show the data split ratios at the first and seventh rounds.
Because the number of images per character differs a lot, the training data is uneven.
However, the amount of training data in every class increased by 30% to 300%.
51. train , evaluate , export
Because of the added step of predicting 11,000 images, a task that took 6 hours and 30 minutes before took 21 hours with active learning.
52. Test tensor board result
The localization loss, which was 0.115 when training only on the original ~6,900 images, dropped to 0.06 after applying active learning.
53. Test tensor board result
The loss for each class also gradually decreased.
With active learning, the graphs show abrupt shifts at the points where training data is suddenly added, though I am not certain of the cause.
54. Test tensor board result
The IoU-based accuracy over all classes increased from 0.66 to 0.87.
55. Active learning
As the data increases, the accuracy of each test case increases:
rapidly at first, and then steadily over time.
57. Active learning
(1) Data 6,932 — trained from 0 to 140,000 steps
(2) Active learning every 20,000 steps — trained from 0 to 140,000 steps
(3) Data 17,448 — trained from 0 to 140,000 steps
The first picture is the result of training on 6,932 images for 140,000 steps.
The second picture is the result of training for 140,000 steps while adding training data through active learning every 20,000 steps.
The last picture is the result of training for 140,000 steps from the start on the 17,448 images produced by the final round of active learning.
58. Active learning
Looking at the results, some characters show almost no increase in accuracy.
This is because these classes make up such a small share of the dataset that, even with active learning, their share of the whole dataset actually fell:
before active learning — comic_book_guy : 3.23 % , edna_krabappel : 3.26 % , nelson_muntz : 3.13 %
after active learning — comic_book_guy : 2.30 % , edna_krabappel : 2.23 % , nelson_muntz : 1.71 %
59. Active learning
For kent_brockman, accuracy is fairly high even though his share of the data is only 2.37%.
The test data may simply have been detected luckily, but I think his characteristic white hair is a salient feature that sets him apart from the other classes.
kent_brockman : 2.37 %
60. Thank you!
1. Tensorflow Object detection API
2. Transfer learning
3. Object detection API helper tool
4. Active learning ( with test result )
Today we walked through the TensorFlow Object Detection API as a whole.
I also gave brief introductions to the concepts of transfer learning and active learning, and to the helper tool I built.
We confirmed that active learning raises accuracy on real data at a low labeling cost.
Google released the TensorFlow-based Object Detection API in June 2017. By shipping models with a range of speeds and accuracies up front,
it let even non-specialists detect objects easily.
There are several kinds of image detection. The simplest, image classification, just determines what a single image shows; classification can be extended to localize the exact position as well.
Detecting multiple objects in a single image as boxes is what is generally called object detection.
Although not covered in this talk, detecting an object along its shape rather than as a box is called object segmentation.
The difficulty rises as you go further down this list, and the Object Detection API supports both box and mask forms.
This is the most basic flow of the TensorFlow Object Detection API.
Everything is provided: shaping the data to fit the API, training on that data, exporting the model into a usable form, and testing the model.
You can also evaluate a model that is still in progress or already finished.
The data type the Object Detection API consumes is the TFRecord file.
Source code for creating TFRecord files is provided; usage is shown below.
You set a CSV file holding the image information and the output directory.
The data needed to generate a TFRecord file is as follows.
First, the original image files. Next, a label for each object contained in an image.
Finally, four numeric values saying where each object sits in the image: xmin, ymin, xmax, and ymax. With these values, the box enclosing the object can be constructed.
A single image may contain several objects.
Finding the objects' position points by hand is difficult.
Various programs do this automatically.
I used the best-known one, labelimg.
With labelimg, an XML file is generated containing each object's label and its xmin, ymin, xmax, and ymax values.
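The XML files labelimg writes follow the Pascal VOC layout, so pulling out the label and box values needs only the standard library. A minimal sketch (the `parse_voc_xml` helper name is mine, not part of any API):

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) tuples from a
    labelimg (Pascal VOC style) annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter('object'):
        label = obj.find('name').text
        box = obj.find('bndbox')
        coords = tuple(int(box.find(k).text)
                       for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((label,) + coords)
    return objects
```

Because the annotation may hold several `object` elements, the function returns a list, one tuple per object in the image.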
Given the training images, the XML files, and a labelmap holding each class's id, we can generate the TFRecord files.
With the stock code, the train and test files have to be generated separately.
Since that is a tedious job, I wrote a program that does it automatically.
When run, each class is distributed by the train/validate ratio automatically, and a log is produced.
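The split the tool performs can be sketched as a per-class shuffle: each class's files are divided by the same train/validate ratio, so no class ends up in only one split. A hedched sketch, with function and argument names of my own choosing:

```python
import random

def split_per_class(files_by_class, train_ratio=0.8, seed=0):
    """Split each class's file list into train/validate by the same ratio."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validate = [], []
    for label, files in files_by_class.items():
        files = list(files)
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)
        train += [(label, f) for f in files[:cut]]
        validate += [(label, f) for f in files[cut:]]
    return train, validate
```

Splitting per class rather than over the pooled file list keeps the train/validate ratio roughly equal for every character, even when class sizes differ widely.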
This is the basic Object Detection API Python code for train.
You configure the model to be trained, the training output directory, and so on.
Many other options exist besides these.
TensorFlow provides a number of pre-trained models for object detection in advance.
Each model has a different speed and accuracy.
The front part of a model name is the algorithm; the back part is the dataset.
ssd_mobilenet_v1_coco means an ssd_mobilenet trained on the COCO dataset.
The pre-training datasets are COCO, Kitti, and the Open Images dataset.
If the class you want to detect is already an instance in COCO, Kitti, or the Open Images dataset, there is no need to do any training.
Just pick an appropriate stock model and use it.
But if, as in the picture, you want to classify dinosaurs such as a tyrannosaurus or a triceratops, you have to run training for those classes.
Here we have two choices:
build a model stacked with entirely new layers, or reuse an existing model.
It is well known that the deeper and wider a network is, the higher its accuracy.
The picture shows Google's Inception v2 model.
It uses a fairly deep network, and the deeper the network gets, the more steeply the required computing resources grow.
Google announced TPU 3.0 at Google I/O 2018.
Roughly comparing the FLOPS of TPU 2.0 against various graphics cards gives the numbers shown.
The most powerful GPU an individual, rather than a company, can realistically own is probably a Titan.
Google uses TPU pods, each bundling 64 TPUs.
What takes Google an hour could take an individual weeks or months.
So we will use a method called transfer learning.
Here are two people who know nothing about dinosaurs:
one is a very young baby, the other an adult.
Show the baby a dinosaur photo and probably nothing registers:
the baby does not know what eyes, tails, or feet are, and has no concept even of points and lines.
Show the adult the same photo, and they will recognize an animal from the eyes, legs, and tail, and from the skin texture and other cues will wonder whether it might be some kind of reptile.
Teaching the baby about dinosaurs would be quite hard; teaching the adult would be much easier.
Transfer learning is like teaching the adult about dinosaurs.
This picture visualizes the weights held by each layer.
Looking closely, the layers near the input carry low-level features such as lines,
and toward the output the features become more detailed and high-level.
Since every object shares the low-level features, transfer learning keeps them as they are.
Taking the Inception v2 model as an example, the weights of the earlier layers are carried over, and only the final fully connected layer is swapped for the custom labels and retrained.
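In the Object Detection API, this reuse of pre-trained weights is switched on in the pipeline config by pointing at an existing checkpoint. A sketch of the relevant fragment, with a placeholder checkpoint path:

```
train_config {
  fine_tune_checkpoint: "faster_rcnn_resnet50_coco/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 140000
}
```

With `fine_tune_checkpoint` set, training starts from the pre-trained weights instead of random initialization, which is what makes training on a small custom dataset feasible.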
During training, or after it finishes, we will want a quantitative check of how good the model is. The API for this is evaluate.
Below is the Python evaluate code the Object Detection API provides.
The evaluation results can be inspected visually through TensorBoard.
This is the main screen of TensorBoard for an evaluation run.
Let's look at each part briefly.
First, you can check the loss values for classification and localization;
lower loss is better.
Second, you can check the accuracy of each class.
For various reasons, the classification accuracy is not the same for every character.
Third, you can measure accuracy over the whole validation set.
The metric used is 0.5 IoU.
IoU stands for intersection over union: the ratio of the overlapping area to the total area covered by the ground-truth box and the prediction box.
As the picture shows, even when the prediction encloses the entire ground truth, the accuracy cannot be called 100%.
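The IoU criterion can be computed directly from the two boxes; a small sketch in plain Python:

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Overlap rectangle: max of the mins, min of the maxes.
    iw = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

This makes the point above concrete: a prediction that fully encloses the ground truth still scores below 1.0, because the oversized box inflates the union.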
Finally, in TensorBoard we can watch how the predictions for an image change at each evaluation step.
With the stock API, each stage described above (data creation, train, evaluate, export) has to be run by hand.
Finding this inconvenient, I wrote a program that runs them automatically.
Everything is preset to runnable defaults, so you only need to enter the model choice and the number of training steps.
The result and running time of each process can be checked in the log.
This is the code that loads the exported file into memory.
The loaded graph is then used to run detection on images.
All tests for this talk were run on a Google Cloud virtual machine.
The specs are as follows.
The dataset used was the Simpsons dataset from Kaggle.
Every character is labeled (300 to 2,000 images each).
Some of those images also carry box data (200 to 600 each).
The number of images per character is not uniform.
Using the tool I built, I trained the faster_rcnn_resnet50 pre-trained model for 140,000 steps,
running an evaluation every 10,000 steps.
Total training time was about 6 hours 30 minutes.
The per-character IoU performance is also poor.
The numbers alone did not feel concrete, so I decided to check by feeding in real images.
I ran prediction on 20 images per character. I checked them with my own eyes, so a few errors may exist :(
The criteria for T, F, and N are as follows.
When it comes to choosing a model, TensorFlow already provides several.
We only need to pick one, weighing accuracy, prediction speed, training speed, and so on.
In the tests I ran, resnet50 and inception v2 gave the best performance.
Rotating, enlarging, or shrinking a single image is a data augmentation method that has long been in wide use.
But when the image size changes, the box positions have to be changed with it, which becomes quite a chore.
Another way to add data is to draw each object's box boundary by hand, image by image.
Manual labeling incurs a heavy labeling cost.
Let me introduce active learning, which lets you label images at low cost.
The basic concept is to take a model built from a small amount of data, run prediction on candidate data, and feed the resulting predictions back in as training data.
The predicted results are, of course, verified by an oracle, which may be a person or a machine.
In the Simpsons dataset each unboxed image is individually labeled, which makes it a very good fit for active learning.
I trained the resnet 50 algorithm for 20,000 steps on the roughly 6,000 boxed images provided initially.
Then I ran prediction on some 12,000 unboxed images, and whenever a prediction matched the image's label, the image was moved into the training dataset.
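The selection rule just described (keep an image only when its top prediction matches the image's known label) can be sketched independently of the detection model itself. In this sketch `predictions` maps each image to its top detected class and score, and the score threshold is my own assumption, not a value from the experiment:

```python
def select_for_training(predictions, true_labels, min_score=0.5):
    """Pick unboxed images whose top prediction matches the known label.

    predictions: {image_id: (predicted_class, score)}
    true_labels: {image_id: class}
    Returns the image ids to move into the training set.
    """
    selected = []
    for image_id, (pred_class, score) in predictions.items():
        # Keep only confident predictions that agree with the known label;
        # the label acts as the oracle, so no human review is needed here.
        if score >= min_score and pred_class == true_labels.get(image_id):
            selected.append(image_id)
    return selected
```

The image-level labels play the oracle's role, which is why this dataset needs no manual verification step.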
I ran seven such steps; the number of images adopted into the training set is shown.
The first prediction pass adopted about twice as many images, and although the count shrank after that, images kept being adopted into the train set steadily.
These are the data split ratios at the first and seventh steps.
Because the image counts per character differ widely, the training counts are uneven.
Even so, the training data for every class grew by 30% to 300%.