- How to tackle an object detection competition
- Schwert's 6th-place solution on Open Images Challenge 2019
- presented at the lunch workshop of the 26th Symposium on Sensing via Image Information (2020).
Tackling Open Images Challenge (2019)
1. Mobility Technologies Co., Ltd.
Tackling Open Images Challenge
- presented at the 26th Symposium on Sensing via Image Information
June 12, 2020
Hiroto Honda, Mobility Technologies Co., Ltd.
3. Mobility Technologies Co., Ltd.
About Me
Hiroto Honda
https://hirotomusiker.github.io/
Kaggle name: Schwert
‘Schwert’ = sword in German
R&D of imaging devices at a Japanese electronics company → DeNA computer vision team → Mobility Technologies
4. Mobility Technologies Co., Ltd.
Check out my Blog Series!
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
Digging into Detectron 2 (object detection)
6. Mobility Technologies Co., Ltd.
How to Try Kaggle
How can you maximize your model’s score on the HIDDEN test data?
Cross Validation and Test data
[Figure: cross-validation splits of Train Data and Val Data; Test data → public leaderboard / private leaderboard]
Evaluation metrics are described in the ‘Evaluation’ section: mean average precision, Dice coefficient, and so on. Sometimes non-standard metrics are employed and discussed in the ‘Discussion’ threads.
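A minimal sketch of the train / val splitting idea on this slide, in plain Python. The fold count, the seed, and the name `k_fold_splits` are illustrative assumptions and are not taken from the slides; the hidden test data never enters these splits.

```python
import random

def k_fold_splits(num_samples, num_folds=3, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every num_folds-th shuffled index, starting at offset i.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for i, val in enumerate(folds):
        # Every other fold is used for training in this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)

splits = list(k_fold_splits(num_samples=10, num_folds=3))
for train, val in splits:
    print(len(train), "train /", len(val), "val")
```

Each sample lands in the validation set exactly once, so the averaged val score is a local estimate of how the model might behave on the hidden test data.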
7. Mobility Technologies Co., Ltd.
Open Images Challenge
Open Images Dataset (v5):
9 million images collected from Flickr
・16M bounding box annotations of 600 classes on 1.9M images
・Segmentation polygons on instances of 350 classes
・329 inter-object relationships
https://storage.googleapis.com/openimages/web/challenge.html
https://www.kaggle.com/c/open-images-2019-object-detection/
8. Mobility Technologies Co., Ltd.
How Huge is the Open Images Dataset?
1GB of bounding box data!! (on 500GB of image data)
12. Mobility Technologies Co., Ltd.
What an Object Detector Looks Like
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
13. Mobility Technologies Co., Ltd.
What an Object Detector Looks Like
[Figure: Backbone Network, Region Proposal Network, ROI Head]
The accuracy reported in papers is achieved by managing more than 100 config parameters
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
14. Mobility Technologies Co., Ltd.
How Hard It Was to Reproduce YOLOv3 in PyTorch
It took months to perfectly reproduce the original repo’s accuracy.
Implementation details such as weight init, loss definition, and lr schedule are critical
https://github.com/DeNA/PyTorch_YOLOv3
blog: https://medium.com/@hirotoschwert/reproducing-training-performance-of-yolov3-in-pytorch-part-0-a792e15ac90d
15. Mobility Technologies Co., Ltd.
You Should Care About Tiny Accuracy Differences
Model Name                                                             AP    Venue
A: Faster R-CNN Res50                                                  34.8  NIPS’15
B: Faster R-CNN Res50 + Feature Pyramid Network                        36.7  CVPR’17
C: RetinaNet (single-shot) Res50 Feature Pyramid Network + Focal Loss  35.7  ICCV’17
Model B from a non-official repo, with AP=33.0, is less accurate than the official model A
16. Mobility Technologies Co., Ltd.
Popular and Reliable Detection Frameworks
MMDetection (CUHK)
https://github.com/open-mmlab/mmdetection
Detectron 2 (Facebook)
https://github.com/facebookresearch/detectron2
automl/efficientdet (Google)
https://github.com/google/automl/tree/master/efficientdet
tpu/models (Google)
https://github.com/tensorflow/tpu/tree/master/models/official
R. Wightman repos (tf->pytorch, non-official)
https://github.com/rwightman
Authors’ official repos are generally recommended
Schwert used maskrcnn-benchmark for the competition
17. Mobility Technologies Co., Ltd.
How to Choose Approaches for Large-scale Detection Competition
It takes 1 GPU month to train one model!
One attempt is so costly...
18. Mobility Technologies Co., Ltd.
How to Choose Approaches for Large-scale Detection Competition
1: Last year’s solutions
2: Detection papers (CVPR, ICCV…)
3: Benchmark websites such as Papers with Code
are good resources to find:
“An Exclusive Feature that Apparently Contributes to the score” (EFAC)
19. Mobility Technologies Co., Ltd.
Example of Bad Experiment
Looks like ResNet50 works...
OK, let’s try ResNeXt101
...and why not add Random Cropping?
[Figure: model 1 (baseline) + new feature A + new feature B → model 2]
Important to add / remove one exclusive feature at a time!
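One way to enforce the one-feature-at-a-time rule is to compare experiment configs before launching a run. The sketch below is only an illustration: the config keys and the helper names `changed_keys` and `is_clean_ablation` are hypothetical and do not come from any detection framework.

```python
_MISSING = object()  # sentinel so that an absent key differs from a None value

def changed_keys(baseline, candidate):
    """Return the sorted config keys whose values differ between two experiments."""
    keys = set(baseline) | set(candidate)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != candidate.get(k, _MISSING)
    )

def is_clean_ablation(baseline, candidate):
    """True when exactly one setting differs, so a score change can be attributed."""
    return len(changed_keys(baseline, candidate)) == 1

# Hypothetical configs mirroring the bad experiment on the slide.
model_1 = {"backbone": "ResNet50", "random_crop": False}
bad_model_2 = {"backbone": "ResNeXt101", "random_crop": True}
good_model_2 = {"backbone": "ResNeXt101", "random_crop": False}

print(changed_keys(model_1, bad_model_2))
print(is_clean_ablation(model_1, good_model_2))
```

With two settings changed at once, an improvement from model 1 to model 2 cannot be credited to either the backbone or the augmentation alone.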
21. Mobility Technologies Co., Ltd.
Results of Open Images Competition (2019)
Schwert’s ranks:
Detection Track: 6th / 558 (Gold) [1] [2]
Segmentation Track: 11th / 193 (Silver) [3]
Relationship Track: 30th / 201 (Silver)
#  Team Name       # of members  score
1  MMfruit         5             0.65887
2  imagesearch     7             0.65337
3  Prisms          6             0.64214
4  PFDet           6             0.62221
5  Omni-Detection  3             0.60406
6  Schwert         1 (solo)      0.60231
7  Team 5          5             0.60210
8  pudae           1 (solo)      0.59727
Got a solo gold medal at the first Kaggle competition!
22. Mobility Technologies Co., Ltd.
“An Exclusive Feature that Apparently Contributes to the score” (EFAC)
EFAC examples from the solution writeups of Open Images 2018 [4][5][6]
・class balancing (3rd, 5pts↑)
・Ensemble (1st / 3rd, 5pts↑)
・voting NMS (1st / 3rd)
・long cosine annealing (2nd)
・parent class expansion
・ResNeXt 152 + SE (1st, 2nd, 3rd)
Class balancing and model ensemble are essential
23. Mobility Technologies Co., Ltd.
Evaluation Metrics
mean Average Precision (mAP) at IoU > 0.5, averaged over 500 classes
1: EVERY class is equal, even if it’s extremely rare.
images including ‘person’ instances: 250,000
‘torch’ instances: 18
2: Strict localization is not required.
classification matters...
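The two points above can be seen in a simplified version of the metric. This sketch uses a single image, greedy matching, and non-interpolated AP; the official Open Images evaluation differs in its details (for example in how it handles the class hierarchy), so the functions and the toy boxes below are illustrative assumptions only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thresh=0.5):
    """AP for one class: detections is a list of (score, box), gt_boxes a list of boxes."""
    if not gt_boxes:
        return 0.0
    matched, tp_flags = set(), []
    for _, box in sorted(detections, key=lambda d: -d[0]):
        # Greedily match to the best ground-truth box that is still unmatched.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if idx not in matched and overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou > iou_thresh
        if is_tp:
            matched.add(best_idx)
        tp_flags.append(is_tp)
    # Sum the precision at the rank of every true positive.
    ap, tp = 0.0, 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank
    return ap / len(gt_boxes)

def mean_ap(per_class):
    """Unweighted mean over classes: a rare class counts as much as a common one."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps)

per_class = {
    # A loosely localized box (IoU 0.8) still counts as correct.
    "person": ([(0.9, (0, 0, 10, 8))], [(0, 0, 10, 10)]),
    # A missed rare class drags the mean down as much as a common one would.
    "torch": ([(0.8, (20, 20, 30, 30))], [(0, 0, 10, 10)]),
}
print(mean_ap(per_class))
```

Because the mean is unweighted, missing the single ‘torch’ instance here halves the score even though the ‘person’ detection is counted as correct.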
24. Mobility Technologies Co., Ltd.
Method 1: Class Balancing [1]
- Equal probability for a model to encounter a certain class.
- Rare classes: increase sampling rate.
- Non-rare classes: limit number of images.
- Total number of images: 4k x 500 (2M) → efficient training
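A minimal sketch of the balancing idea, assuming each class contributes a fixed number of images per training list. The exact sampling scheme of the solution is described in [1]; the function name `balanced_image_list`, the repeat-then-top-up rule for rare classes, and the toy data are assumptions made for illustration.

```python
import random
from collections import defaultdict

def balanced_image_list(image_classes, per_class, seed=0):
    """Build a training list with exactly `per_class` images for every class.

    image_classes: dict mapping image id -> set of class names in that image.
    """
    rng = random.Random(seed)
    images_by_class = defaultdict(list)
    for image_id in sorted(image_classes):
        for cls in image_classes[image_id]:
            images_by_class[cls].append(image_id)

    sampled = []
    for cls in sorted(images_by_class):
        pool = images_by_class[cls]
        if len(pool) >= per_class:
            # Non-rare class: limit the number of images.
            sampled += rng.sample(pool, per_class)
        else:
            # Rare class: repeat the whole pool, then top up with a random subset.
            repeats, remainder = divmod(per_class, len(pool))
            sampled += pool * repeats + rng.sample(pool, remainder)
    return sampled

# Toy data: six 'person' images and a single 'torch' image.
image_classes = {f"p{i}": {"person"} for i in range(6)}
image_classes["t0"] = {"torch"}

train_list = balanced_image_list(image_classes, per_class=4)
print(len(train_list), train_list.count("t0"))
```

With `per_class=4000` and 500 classes this rule gives the 4k x 500 (2M) total quoted on the slide.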
25. Mobility Technologies Co., Ltd.
Method 2: Ensembling Pipeline of Multiple Models [1]
・Baseline model: ResNeXt152 [7] + Deformable Convnets v2 [8] + Feature Pyramid Network [9]
・Train different types of models on training data with different seeds
・8 models are ensembled
26. Mobility Technologies Co., Ltd.
Ablation Study
Contribution of each exclusive feature on val and leaderboard accuracies
Backbone    Deformable Convolutions  Parent Expansion  Data Size      val AP        private LB
ResNeXt101  None                     Inference Time    4k per class   69.8          54.0
ResNeXt101  DCN v2                   Inference Time    4k per class   72.2 (+2.4)
ResNeXt152  None                     Inference Time    4k per class   72.2 (+2.4)
ResNeXt152  None                     Inference Time    16k per class  72.4 (+2.6)
ResNeXt152  DCN v2                   Inference Time    4k per class   73.2 (+3.4)   56.4 (best single model)
ResNeXt152  None                     Training Time     4k per class   72.4 (+2.6)*
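The ‘Parent Expansion’ column refers to the class hierarchy of Open Images, where an object of a child class also counts as an instance of its parent classes. The sketch below assumes that the inference-time variant simply duplicates each detected box for every ancestor class; that reading, the function name `expand_to_parents`, and the toy hierarchy are assumptions and not the author’s implementation.

```python
def expand_to_parents(detections, parent_of):
    """Add a copy of each detection for every ancestor class in the hierarchy.

    detections: list of (class_name, score, box) tuples.
    parent_of: dict mapping a child class to its parent class.
    """
    expanded = list(detections)
    for cls, score, box in detections:
        ancestor = parent_of.get(cls)
        while ancestor is not None:
            # The ancestor copy keeps the child's box and confidence.
            expanded.append((ancestor, score, box))
            ancestor = parent_of.get(ancestor)
    return expanded

# Toy hierarchy for illustration only, not the actual Open Images hierarchy.
parent_of = {"Dog": "Carnivore", "Carnivore": "Animal"}
detections = [("Dog", 0.9, (0, 0, 5, 5))]

for det in expand_to_parents(detections, parent_of):
    print(det)
```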
27. Mobility Technologies Co., Ltd.
Method 3: Enhanced (Voting) NMS [6]
Non-Maximum Suppression for Model Ensembling
When multiple boxes from different models overlap, the resulting box earns their added confidence scores
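A rough sketch of the voting idea: boxes pooled from all models go through a greedy NMS, and each kept box absorbs the scores of the boxes it suppresses. The IoU threshold, the plain summation of scores, and the function names are assumptions; the method in [6] and the tuned version used in the solution may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def voting_nms(detections, iou_thresh=0.5):
    """Greedy NMS in which a kept box adds up the scores of the boxes it suppresses.

    detections: list of (score, box) pooled from all models for one class.
    """
    remaining = sorted(detections, key=lambda d: -d[0])
    kept = []
    while remaining:
        top_score, top_box = remaining.pop(0)
        survivors = []
        for score, box in remaining:
            if iou(top_box, box) > iou_thresh:
                top_score += score  # the overlapping box votes for the kept box
            else:
                survivors.append((score, box))
        kept.append((top_score, top_box))
        remaining = survivors
    return kept

pooled = [
    (0.6, (0, 0, 10, 10)),    # model A
    (0.5, (0, 1, 10, 10)),    # model B, overlaps the box from model A
    (0.7, (50, 50, 60, 60)),  # model C, isolated box
]
for score, box in voting_nms(pooled):
    print(round(score, 2), box)
```

The summed scores can exceed 1, which is fine for ranking: the box that two models agree on ends up above the isolated box even though each individual score was lower.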
28. Mobility Technologies Co., Ltd.
Result of 8 Model Ensembling
Backbone                          Deformable Convolutions  Parent Expansion  Data Size     val AP       private LB
ResNeXt152                        DCN v2                   Inference Time    4k per class  73.2 (+3.4)  56.4 (best single model)
Ensemble of 8 models + NMS tuned                                                                        60.23
[Slide annotations: ~13th place, 6th place!]
33. Mobility Technologies Co., Ltd.
Take-Home Messages
・Kaggle is a wonderful platform where you can learn cutting-edge computer vision methods and implementations. Discussion with great Kagglers is always fun
・Like research, it’s a tough but fun job to develop (or surpass) the state-of-the-art methods
・Choosing a reliable framework is a must for Object Detection competitions
・Understand the past solutions and pick an Exclusive Feature that Apparently Contributes to the score (EFAC)
34. Mobility Technologies Co., Ltd.
References
[1] Hiroto Honda, “The 6th Place Solution for the Open Images 2019 Object Detection Track”, presented at ICCVW 2019, https://hirotomusiker.github.io/files/schwert_open_images_6th_solution_v1.pdf
[2] Hiroto Honda, “6th place solution”, discussion in Open Images 2019 Object Detection Track, https://www.kaggle.com/c/open-images-2019-object-detection/discussion/110953
[3] Hiroto Honda, “11th place solution”, discussion in Open Images 2019 Instance Segmentation Track, https://www.kaggle.com/c/open-images-2019-instance-segmentation/discussion/111351
[4] kivajok, 1st place writeup, https://storage.googleapis.com/openimages/web/challenge.html
[5] Takuya Akiba et al., “PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track”, arXiv:1809.00778
[6] Yuan Gao et al., “Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance”, arXiv:1810.06208
[7] Saining Xie et al., “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017
[8] Xizhou Zhu et al., “Deformable ConvNets v2: More Deformable, Better Results”, CVPR 2019
[9] Tsung-Yi Lin et al., “Feature Pyramid Networks for Object Detection”, CVPR 2017
* All the photos used in this presentation were taken by Hiroto Honda