Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob L Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikołaj Bińkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karén Simonyan, "Flamingo: a Visual Language Model for Few-Shot Learning" NeurIPS2022
https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html
Paper introduction: Flamingo: a Visual Language Model for Few-Shot Learning
1. 🦩 Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob L Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikołaj Bińkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karén Simonyan, NeurIPS2022
2023/05/11
2. Flamingo
• A Visual Language Model (VLM) from DeepMind
• Built on top of a Large Language Model (LLM)
• Released in 3B/9B/80B parameter variants
[Figure: few-shot prompt example. Input prompt: a chinchilla photo with "This is a chinchilla. They are mainly found in Chile.", a shiba photo with "This is a shiba. They are very popular in Japan.", then a flamingo photo with "This is" → Completion: "a flamingo. They are found in the Caribbean and South America."]
3. Visual Question Answering
[Figure: further few-shot examples from Flamingo-80B, in the same prompt format as the chinchilla/shiba/flamingo example. Painting QA: "What is the title of this painting? Answer: The Hallucinogenic Toreador." / "Where is this painting displayed? Answer: Louvres Museum, Paris." / "What is the name of the city where this was painted? Answer: Arles." Text reading: Output: "Underground", "Congress", "Soulomes". Arithmetic: 2+1=3, 5+6=11, 3x6=18. Captioning: "A propaganda poster depicting a cat dressed as French emperor Napoleon holding a piece of cheese.", "A pink room with a flamingo pool float.", "A portrait of Salvador Dali with a robot head."]
4. Visual Dialog
[Figure 1: Selected examples of inputs and outputs obtained from Flamingo-80B. Video QA: "What happens to the man after hitting the ball? Answer: he falls down." A dialogue about a picture of two teddy bears on the moon ("Why is this picture surprising to you? I think it is surprising because teddy bears are not usually found on the moon."). Three flamingo images compared ("The first one is a cartoon, the second one is a real flamingo, and the third one is a 3D model of a flamingo."). An apple with a handwritten white sticker that says "iPod", photographed in a backyard. Cityscapes identified as Chicago (because of the Shedd Aquarium) and Tokyo (because of the Tokyo Tower).]
5. Architecture
[Figure 3: Flamingo architecture overview. Flamingo is a family of visual language models (VLMs) that take as input visual data interleaved with text and produce free-form text as output. Each image passes through a pretrained and frozen (❄) Vision Encoder and then a Perceiver Resampler trained from scratch; GATED XATTN-DENSE layers (trained from scratch) are interleaved with the frozen LM blocks (1st … n-th). Example: processed text "<image> This is a very cute dog.<image> This is" → output text "a very serious cat."]
6. Perceiver Resampler [Jaegle+, ICML2021]
[Figure: per-frame features from the frozen Vision Encoder (t=0, 1, 2) receive learned time embeddings and are flattened into Xf; a fixed set of learned latent queries X then attends with Q=[X] and K=V=[Xf, X], followed by an FFW, repeated ✕ num_layers.]
def perceiver_resampler(
    x_f,              # The [T, S, d] visual features (T=time, S=space)
    time_embeddings,  # The [T, 1, d] time pos embeddings.
    x,                # R learned latents of shape [R, d]
    num_layers,       # Number of layers
):
    """The Perceiver Resampler model."""
    # Add the time position embeddings and flatten.
    x_f = x_f + time_embeddings
    x_f = flatten(x_f)  # [T, S, d] -> [T * S, d]
    # Apply the Perceiver Resampler layers.
    for i in range(num_layers):
        # Attention.
        x = x + attention_i(q=x, kv=concat([x_f, x]))
        # Feed forward.
        x = x + ffw_i(x)
    return x
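The pseudocode above can be exercised end to end. Below is a minimal NumPy sketch, an illustration rather than the paper's implementation: it assumes single-head dot-product attention without learned projections and a ReLU standing in for the FFW. It shows the key property that T*S visual features are compressed to a fixed number R of visual tokens:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, kv):
    # Single-head dot-product attention without projections (illustrative only).
    scores = q @ kv.T / np.sqrt(q.shape[-1])   # [R, T*S + R]
    return softmax(scores) @ kv                # [R, d]

def ffw(x):
    # Stand-in feed-forward block (plain ReLU, an assumption for brevity).
    return np.maximum(x, 0.0)

def perceiver_resampler(x_f, time_embeddings, x, num_layers):
    x_f = x_f + time_embeddings                # broadcast over the S axis
    x_f = x_f.reshape(-1, x_f.shape[-1])       # [T, S, d] -> [T*S, d]
    for _ in range(num_layers):
        # Latents attend to the flattened visual features plus themselves.
        x = x + attention(q=x, kv=np.concatenate([x_f, x], axis=0))
        x = x + ffw(x)
    return x

T, S, d, R = 8, 16, 32, 4                      # 8 frames, 16 patches, dim 32, 4 latents
rng = np.random.default_rng(0)
out = perceiver_resampler(
    rng.normal(size=(T, S, d)),                # visual features
    rng.normal(size=(T, 1, d)),                # time embeddings
    rng.normal(size=(R, d)),                   # learned latent queries
    num_layers=2,
)
print(out.shape)  # (4, 32): a fixed number of visual tokens regardless of T*S
```

However long the video, the LM downstream always receives R visual tokens, which is what keeps the cross-attention cost bounded.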
7. GATED XATTN-DENSE
• Inserted before each frozen LM layer: a gated cross attention (language queries Q=[Y] attend to visual features K=V=[X]) and a gated FFW, each followed by tanh gating; the frozen (❄) LM layer's own self attention (Q=K=V=[Y]) and FFW are kept unchanged.
def gated_xattn_dense(
    y,            # input language features
    x,            # input visual features
    alpha_xattn,  # xattn gating parameter - init at 0.
    alpha_dense,  # ffw gating parameter - init at 0.
):
    """Applies a GATED XATTN-DENSE layer."""
    # 1. Gated Cross Attention
    y = y + tanh(alpha_xattn) * attention(q=y, kv=x)
    # 2. Gated Feed Forward (dense) Layer
    y = y + tanh(alpha_dense) * ffw(y)
    # Regular self-attention + FFW on language
    y = y + frozen_attention(q=y, kv=y)
    y = y + frozen_ffw(y)
    return y  # output visually informed language features
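A point worth emphasizing is the zero initialization of the gates: since tanh(0) = 0, a GATED XATTN-DENSE layer is exactly the identity at initialization, so training starts from the frozen LM's original behaviour. The sketch below demonstrates this with toy stand-ins for the attention and FFW sublayers (assumptions for illustration; the frozen self-attention/FFW steps from the pseudocode are omitted):

```python
import numpy as np

def gated_xattn_dense(y, x, alpha_xattn, alpha_dense, attention, ffw):
    # Gated cross attention: language queries attend to visual features.
    y = y + np.tanh(alpha_xattn) * attention(q=y, kv=x)
    # Gated feed-forward (dense) layer.
    y = y + np.tanh(alpha_dense) * ffw(y)
    return y

# Toy stand-ins for the trained sublayers (illustrative assumptions).
attention = lambda q, kv: np.ones_like(q)
ffw = lambda y: 2.0 * y

y = np.arange(6, dtype=float).reshape(2, 3)    # language features
x = np.zeros((4, 3))                           # visual features

# With gates initialised at 0, the new layers are the identity:
out0 = gated_xattn_dense(y, x, 0.0, 0.0, attention, ffw)
assert np.allclose(out0, y)   # the frozen LM sees its original activations

# As training moves the alphas away from 0, visual information flows in:
out1 = gated_xattn_dense(y, x, 1.0, 0.0, attention, ffw)
assert not np.allclose(out1, y)
```

This is why the tanh-gating ablation in the final slide hurts performance: without the gates, the randomly initialised new layers perturb the frozen LM from the very first step.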
8. Frozen components
Vision Encoder
• Normalizer-Free ResNet [Brock+, arXiv2021]
• Pretrained with a contrastive image-text objective in the style of ALIGN [Jia+, ICML2021] and CLIP [Radford+, ICML2021]
LLM
• Chinchilla [Hoffmann+, arXiv2022], a Transformer LLM
• Kept frozen at 1.4/7/70B parameters
[The Figure 3 architecture overview is repeated here: pretrained and frozen (❄) Vision Encoder and LM blocks, with the Perceiver Resampler and GATED XATTN-DENSE layers trained from scratch on interleaved visual/text data.]
10. Evaluation
Benchmarks
• VQA: TextVQA [Singh+, CVPR2019], NextQA [Xiao+, CVPR2021]
• Visual Dialog: VisDial [Das+, CVPR2017]
• Vision-and-text classification: HatefulMemes [Kiela+, NeurIPS2020]
Few-shot setting
• Flamingo 3B/9B/80B evaluated zero-shot and few-shot
• Also compared against fine-tuned models
[The Figure 1 chinchilla/shiba/flamingo few-shot prompt example is repeated here.]
11. [Fragment of the model-card table caption: the hidden size of the feed-forward MLP is 4D. L: number of layers, D: transformer hidden size, H: number of heads, Act.: FFW activation, Sq. ReLU: Squared ReLU [104].]
16. Interleaved inputs and masked cross attention
[Figure 6: Evolution of the absolute value of (a) the attention tanh gating and (b) the FFW tanh gating at different layers of Flamingo-3B.]
[Figure 7 illustration: an input webpage ("Cute pics of my pets!", an image, "My puppy sitting in the grass.", an image, "My cat looking very dignified.") is tokenized as "<BOS>Cute pics of my pets!<EOC><image>My puppy sitting in the grass.<EOC><image>My cat looking very dignified.<EOC>"; Image 1 and Image 2 each pass through the Vision Encoder and Perceiver Resampler, and the text cross-attends with Q from the text tokens and K=V=[X] from the visual tokens under a mask. A row of values (0 0 … 1 1 … 2 2 …) marks which image each text token may attend to.]
Figure 7: Interleaved visual data and text support. Given text interleaved with images/videos, e.g. coming from a webpage, we first process the text by inserting <image> tags at the locations of the visual data in the text as well as special tokens (<BOS> for "beginning of sequence" or <EOC> for "end of chunk"). Images are processed independently by the Vision Encoder and Perceiver Resampler to extract visual tokens. At a given text token, the model only cross-attends to the visual tokens corresponding to the last preceding image/video. φ indicates which image/video a text token can attend to, or 0 when no image/video is preceding. In practice, this selective cross-attention is achieved through masking, illustrated here with the dark blue entries (unmasked/visible) and light blue entries (masked).
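The masking rule in the caption can be sketched directly: with φ giving, for each text token, the index of the last preceding image (0 if none), the cross-attention mask is a comparison of φ against each visual token's image id. The function name and shapes below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def build_xattn_mask(phi, num_images, tokens_per_image):
    # phi[t] = 1-based index of the last image preceding text token t, 0 if none.
    # Visual tokens are grouped per image: image i owns tokens_per_image tokens.
    image_id = np.repeat(np.arange(1, num_images + 1), tokens_per_image)
    # A text token may attend a visual token only if that token belongs
    # to the last preceding image.
    return phi[:, None] == image_id[None, :]   # [T_text, T_visual] boolean mask

# Two text tokens before any image, three after image 1, two after image 2:
phi = np.array([0, 0, 1, 1, 1, 2, 2])
mask = build_xattn_mask(phi, num_images=2, tokens_per_image=3)
print(mask.astype(int))
# Rows with phi=0 attend to nothing; phi=1 rows unmask only image-1 tokens, etc.
```

Applied inside the attention softmax, this mask reproduces the dark-blue/light-blue pattern of Figure 7.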
17. Fine-tuning
Table 2: Comparison to SotA when fine-tuning Flamingo. We fine-tune Flamingo on all nine tasks where Flamingo does not achieve SotA with few-shot learning. Flamingo sets a new SotA on five of them, outperforming methods (marked with †) that use tricks such as model ensembling or domain-specific metric optimisation (e.g., CIDEr optimisation).

Method     | VQAv2 test-dev/test-std | COCO test | VATEX test | VizWiz test-dev/test-std | MSRVTTQA test | VisDial valid/test-std | YouCook2 valid | TextVQA valid/test-std | HatefulMemes test seen
32 shots   | 67.6 / -                | 113.8     | 65.1       | 49.8 / -                 | 31.0          | 56.8 / -               | 86.8           | 36.0 / -               | 70.0
Fine-tuned | 82.0 / 82.1             | 138.1     | 84.2       | 65.7 / 65.4              | 47.4          | 61.8 / 59.7            | 118.6          | 57.1 / 54.1            | 86.6
SotA       | 81.3† / 81.3†           | 149.6†    | 81.4†      | 57.2† / 60.6†            | 46.8          | 75.2 / 75.4†           | 138.7          | 54.7 / 73.7            | 84.6†
(SotA refs)| [133] / [133]           | [119]     | [153]      | [65] / [65]              | [51]          | [79] / [123]           | [132]          | [137] / [84]           | [152]

Ablation study (Flamingo-3B). Columns: parameter count, step time, COCO CIDEr, OKVQA top-1, VQAv2 top-1, MSVDQA top-1, VATEX CIDEr, overall score.
Flamingo-3B model (baseline)                              | 3.2B | 1.74s | 86.5 | 42.1 | 55.8 | 36.3 | 53.4 | 70.7
(i) Training data (original: all data)
  w/o Video-Text pairs                                    | 3.2B | 1.42s | 84.2 | 43.0 | 53.9 | 34.5 | 46.0 | 67.3
  w/o Image-Text pairs                                    | 3.2B | 0.95s | 66.3 | 39.2 | 51.6 | 32.0 | 41.6 | 60.9
  Image-Text pairs → LAION                                | 3.2B | 1.74s | 79.5 | 41.4 | 53.5 | 33.9 | 47.6 | 66.4
  w/o M3W                                                 | 3.2B | 1.02s | 54.1 | 36.5 | 52.7 | 31.4 | 23.5 | 53.4
(ii) Optimisation (original: Accumulation) → Round Robin  | 3.2B | 1.68s | 76.1 | 39.8 | 52.1 | 33.2 | 40.8 | 62.9
(iii) Tanh gating (original: ✓) → ✗                       | 3.2B | 1.74s | 78.4 | 40.5 | 52.9 | 35.9 | 47.5 | 66.5