Diffusion Deformable Model for 4D Temporal Medical Image Generation - Boah Kim
Presentation file for "Diffusion Deformable Model for 4D Temporal Medical Image Generation" presented at the International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2022.
This research report explains several pre-processing approaches for the object recognition task of the CIFAR-10 benchmark data set. The pre-processing approaches include numerical analysis of the color, texture, edges, and shape of the data set’s images. The processed data is then supplied to several classification algorithms. Our highest accuracy on the benchmark dataset was 57.98%.
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie... - Vincent Sitzmann
Slides for the "generalization" session of our CVPR 2022 tutorial on Neural Fields in Computer Vision.
Neural Fields are an emerging technique to parameterize signals that live in spatial coordinates plus time. They parameterize a signal as a continuous function that maps a space-time coordinate to whatever is at that spacetime coordinate - for instance, the geometry of a 3D scene could be encoded in a function that maps a 3D coordinate to whether that coordinate is occupied or not. A neural field parameterizes that function as a neural network.
In this session, I gave a high-level overview over how we may use neural fields as the output of a variety of inference algorithms, for instance to reconstruct a complete 3D shape from partial observations in the form of a pointcloud, or to reconstruct a 3D scene from only a single image.
You are free to use the slides for any purpose, as long as you keep a note on the slides that acknowledges their source.
Neural Fields database: https://neuralfields.cs.brown.edu/
Tutorial website: https://neuralfields.cs.brown.edu/cvpr22
This talk describes the architecture behind Planet Mon, which processes 3 to 5 billion log and metric events per day from thousands of servers: Collection (Collectd, NXlog), Transport (Kafka, Logstash), Log Stream Analytics, Storage (Elasticsearch), and Visualization. It then goes into more detail on the implementation of the Log Stream Analytics servers I developed.
Speaker: Taesung Park (Ph.D. student, UC Berkeley)
Date: June 2017
Taesung Park is a Ph.D. student at UC Berkeley in AI and computer vision, advised by Prof. Alexei Efros.
His research interest lies between computer vision and computational photography, such as generating realistic images or enhancing photo qualities. He received B.S. in mathematics and M.S. in computer science from Stanford University.
Overview:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
However, for many tasks, paired training data will not be available.
We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
PR-409: Denoising Diffusion Probabilistic Models - Hyeongmin Lee
This paper is Denoising Diffusion Probabilistic Models (DDPM), the work that first popularized the diffusion models that are so hot right now. It elegantly resolved several practical issues of diffusion, which was originally proposed at ICML 2015, and kicked off the current trend. We will look at the different branches of generative models, at diffusion itself, and at what changed in DDPM.
Paper: https://arxiv.org/abs/2006.11239
Video: https://youtu.be/1j0W_lu55nc
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, Founder and CEO of Auviz Systems, presents the "Semantic Segmentation for Scene Understanding: Algorithms and Implementations" tutorial at the May 2016 Embedded Vision Summit.
Recent research in deep learning provides powerful tools that begin to address the daunting problem of automated scene understanding. Modifying deep learning methods, such as CNNs, to classify pixels in a scene with the help of the neighboring pixels has provided very good results in semantic segmentation. This technique provides a good starting point towards understanding a scene. A second challenge is how such algorithms can be deployed on embedded hardware at the performance required for real-world applications. A variety of approaches are being pursued for this, including GPUs, FPGAs, and dedicated hardware.
This talk provides insights into deep learning solutions for semantic segmentation, focusing on current state of the art algorithms and implementation choices. Gupta discusses the effect of porting these algorithms to fixed-point representation and the pros and cons of implementing them on FPGAs.
A presentation about the development of the ideas from the autoencoder to the Stable Diffusion text-to-image model.
Models covered: autoencoder, VAE, VQ-VAE, VQ-GAN, latent diffusion, and stable diffusion.
Introduction to Deep Learning, Keras, and TensorFlow - Sri Ambati
This meetup was recorded in San Francisco on Jan 9, 2019.
Video recording of the session can be viewed here: https://youtu.be/yG1UJEzpJ64
Description:
This fast-paced session starts with a simple yet complete neural network (no frameworks), followed by an overview of activation functions, cost functions, backpropagation, and then a quick dive into CNNs. Next, we'll create a neural network using Keras, followed by an introduction to TensorFlow and TensorBoard. For best results, familiarity with basic vectors and matrices, inner (aka "dot") products of vectors, and rudimentary Python is definitely helpful. If time permits, we'll look at the UAT, CLT, and the Fixed Point Theorem. (Bonus points if you know Zorn's Lemma, the Well-Ordering Theorem, and the Axiom of Choice.)
Oswald's Bio:
Oswald Campesato is an education junkie: a former Ph.D. Candidate in Mathematics (ABD), with multiple Master's and 2 Bachelor's degrees. In a previous career, he worked in South America, Italy, and the French Riviera, which enabled him to travel to 70 countries throughout the world.
He has worked in American and Japanese corporations and start-ups, as C/C++ and Java developer to CTO. He works in the web and mobile space, conducts training sessions in Android, Java, Angular 2, and ReactJS, and he writes graphics code for fun. He's comfortable in four languages and aspires to become proficient in Japanese, ideally sometime in the next two decades. He enjoys collaborating with people who share his passion for learning the latest cool stuff, and he's currently working on his 15th book, which is about Angular 2.
Slides by Víctor Garcia about the paper:
Reed, Scott, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. "Generative adversarial text to image synthesis." ICML 2016.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Arbitrary style transfer in real time with adaptive instance normalization
1. Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Neural Style Transfer
Xun Huang and Serge Belongie
Department of Computer Science & Cornell Tech, Cornell University
3. Previous Work: Slow & Arbitrary Style Transfer
[Diagram: a style image and a content image are fed to a loss network (VGG), which computes a content loss and a style loss on the output image.]
Gatys et al., CVPR 2016; Li and Wand, CVPR 2016
Optimization-based framework: flexible, but slow (up to minutes per image).
4. Previous Work: Fast & Restricted Style Transfer
[Diagram: the content image passes through a fast feed-forward network to produce the output, with one separate network per style (Model A, Model B, Model C).]
Ulyanov et al., ICML 2016; Johnson et al., ECCV 2016; Li and Wand, ECCV 2016
Feed-forward approaches: fast (>20 FPS), but each network is restricted to a fixed set of styles.
6. Inspired by: Batch Normalization vs. Instance Normalization
Batch Normalization (BN) and Instance Normalization (IN): in both, the affine parameters are learned from data.
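The difference between BN and IN comes down to which axes the normalization statistics are computed over. A minimal NumPy sketch of that difference (tensor shapes and variable names are my own, not from the slides):

```python
import numpy as np

# Toy activation tensor: (batch N, channels C, height H, width W)
x = np.random.randn(4, 3, 8, 8)
eps = 1e-5

# Batch Normalization: one mean/var per channel, shared across the
# whole batch, computed over the (N, H, W) axes.
bn_mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
bn_var = x.var(axis=(0, 2, 3), keepdims=True)
bn_out = (x - bn_mean) / np.sqrt(bn_var + eps)

# Instance Normalization: one mean/var per sample AND per channel,
# computed over the spatial (H, W) axes only.
in_mean = x.mean(axis=(2, 3), keepdims=True)      # shape (N, C, 1, 1)
in_var = x.var(axis=(2, 3), keepdims=True)
in_out = (x - in_mean) / np.sqrt(in_var + eps)

print(bn_mean.shape, in_mean.shape)  # (1, 3, 1, 1) (4, 3, 1, 1)
```

In both cases a learned affine scale and shift would follow. With IN, every image is normalized with its own statistics, which is what the style-normalization interpretation in the later slides builds on.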
7. Inspired by: Conditional Instance Normalization
Conditional Instance Normalization (CIN): the input activation is normalized over the spatial dimensions, then scaled and shifted by style-dependent parameter vectors (s: style label).
Advantage: qualitatively comparable to individual per-style models.
Disadvantage: adding a new style requires retraining, so arbitrary style transfer is not possible.
8. Inspired by: A New Interpretation of Instance Normalization - why does it help specifically for style transfer?
One explanation: IN performs content normalization, because it makes the contrast of the content image constant.
However, IN does not normalize only the content image, and it operates in feature space. So the benefit may come not from contrast normalization but from the affine parameters.
Hypothesis: the affine parameters can control the style of the image.
9. Inspired by: A New Interpretation of Instance Normalization - experiments and evidence
IN converges faster than BN.
Apply histogram equalization so that every training image has the same contrast: IN still converges faster than BN, so style transfer does not work well merely because of content normalization.
Normalize all training images to a single style (chosen to differ from the target style): the gap between BN and IN shrinks. This shows that IN provides a style-normalization effect; BN converges more slowly because it performs style normalization at the batch level.
[Figures: training curves for Improved Texture Networks [52] with BN vs. IN, on the raw training images, on histogram-equalized images, and on images normalized to a single style by a pretrained style transfer network [24]; the conditions are labeled content normalization vs. style normalization.]
10. Method: Adaptive Instance Normalization
Given a content input and a style input, the affine parameters are computed from the style input. The content image is normalized, then scaled and shifted with these style-derived affine parameters, so that each style is distinguished.
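The operation described above is AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), with statistics taken per sample and per channel over the spatial dimensions. A minimal NumPy sketch (the function name and shapes are my own):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization over (N, C, H, W) feature maps:
    normalize each content channel, then scale and shift it with the
    mean/std of the matching style-feature channel."""
    c_mean = content.mean(axis=(2, 3), keepdims=True)
    c_std = content.std(axis=(2, 3), keepdims=True) + eps
    s_mean = style.mean(axis=(2, 3), keepdims=True)
    s_std = style.std(axis=(2, 3), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean

content = np.random.randn(1, 4, 16, 16)
style = 3.0 * np.random.randn(1, 4, 16, 16) + 2.0
t = adain(content, style)
# t now carries the style's channel-wise mean and standard deviation
# while keeping the spatial structure of the content features.
```

Unlike CIN there are no learned per-style parameters: the affine scale and shift come directly from the style features, which is why a new style needs no retraining.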
12. Method: Training
Data: MS-COCO for content images (80,000) and WikiArt for style images (about 80,000).
Pre-processing: rescale each image so that its shorter side is 512, then take a random 256x256 crop.
Content loss: the Euclidean distance between the features of the target and the output image. Using the AdaIN output t as the content target makes convergence slightly faster.
Style loss: the difference between the means and standard deviations at relu1_1, relu2_1, relu3_1, and relu4_1. Since only the mean and standard deviation of the style features are used, the loss is also computed from means and standard deviations; the results are similar to a Gram-matrix loss.
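The two losses above can be sketched directly from their definitions. This is a simplified NumPy version (function names are my own, and the fixed VGG encoder that would produce the feature maps is assumed rather than implemented):

```python
import numpy as np

def content_loss(f_out, t):
    """Squared Euclidean distance between the output features and the
    AdaIN output t, which serves as the content target."""
    return ((f_out - t) ** 2).sum()

def style_loss(feats_out, feats_style):
    """Sum, over the chosen encoder layers (e.g. relu1_1 .. relu4_1),
    of squared differences between channel-wise means and standard
    deviations, instead of a Gram-matrix comparison."""
    total = 0.0
    for fo, fs in zip(feats_out, feats_style):
        mu_o, mu_s = fo.mean(axis=(2, 3)), fs.mean(axis=(2, 3))
        sd_o, sd_s = fo.std(axis=(2, 3)), fs.std(axis=(2, 3))
        total += ((mu_o - mu_s) ** 2).sum() + ((sd_o - sd_s) ** 2).sum()
    return total

# Identical feature maps give zero loss for both terms.
f = [np.random.randn(1, 8, 4, 4) for _ in range(4)]
```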
13. Results: Qualitative Examples
Methods compared: arbitrary style transfer (ours); a single-style, fast model; a flexible-style, slow (optimization-based) method; and a flexible-style, medium-speed method.
Notes from the slide: the single-style model is a case where the test style was already seen during training; one baseline shows generally poor results; the others are comparable to ours or slightly worse.
14. Results: Quantitative Evaluations & Speed Analysis
Methods compared:
- Flexible style, slow (optimization-based)
- Single style, fast
- 32 styles, fast
- Flexible style, medium
- Arbitrary style transfer (ours)
15. Experiments: [AdaIN vs. Concatenation] & [BN or IN in the Decoder]
AdaIN vs. concatenation: with concatenation, the object contours of the style image show through in the output; the style loss is low, but the content loss is high.
BN or IN in the decoder: when IN is used in the decoder, it normalizes the image to a single style.