(Accepted and presented at the Symposium on Edge Computing, Seattle, Oct 2018)
We show how edge-based early discard of data can greatly improve the productivity of a human expert in assembling a large training set for machine learning. This task may span multiple data sources that are live (e.g., video cameras) or archival (data sets dispersed over the Internet). The critical resource here is the attention of the expert. We describe Eureka, an interactive system that leverages edge computing to greatly improve the productivity of experts in this task. Our experimental results show that Eureka reduces the labeling effort needed to construct a training set by two orders of magnitude relative to a brute-force approach.
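The core idea, discarding clearly irrelevant items at the edge so that only promising candidates reach the expert, can be illustrated with a short sketch. The snippet below is a minimal illustration and not Eureka's actual interface; the scoring function, threshold, and item source are hypothetical stand-ins.

```python
# Minimal sketch of edge-based early discard for interactive labeling.
# Assumptions: `cheap_score` stands in for any inexpensive filter (color
# histogram, small CNN, etc.) run on the edge node; only items above the
# threshold are forwarded to the human expert. None of these names come
# from Eureka itself.

def cheap_score(item) -> float:
    """Hypothetical early-discard filter; replace with a real model."""
    return item.get("relevance", 0.0)

def edge_early_discard(stream, threshold=0.8):
    """Yield only items worth the expert's attention."""
    for item in stream:
        if cheap_score(item) >= threshold:
            yield item          # candidate: send to expert for labeling
        # else: discarded at the edge, saving bandwidth and attention

if __name__ == "__main__":
    camera_feed = [{"id": i, "relevance": i / 10} for i in range(10)]
    for candidate in edge_early_discard(camera_feed):
        print("expert reviews item", candidate["id"])
```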
Deep learning in medicine: An introduction and applications to next-generatio..., by Allen Day, PhD
Deep learning has enabled dramatic advances in image recognition performance. In this talk I will discuss using a deep convolutional neural network to detect genetic variation in aligned next-generation sequencing human read data. Our method, called DeepVariant, both outperforms existing genotyping tools and generalizes across genome builds and even to other species. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.
NIPS - Deep learning @ Edge using Intel's NCS, by geetachauhan
The document discusses using Intel's Neural Compute Stick for deep learning at the edge. It introduces the Neural Compute Stick, which enables computer vision and AI capabilities in small, low power devices. It then provides an overview of deep learning and discusses how to build IoT applications using the Neural Compute Stick SDK. Examples of use cases for edge intelligence in IoT are also presented.
For a seminar at DASH-Lab, SKKU, I presented the paper "Transferable GAN-generated Images Detection" (ICML 2020).
For more context, see this link: https://arxiv.org/abs/2008.04115
20170402 Crop Innovation and Business - Amsterdam, by Allen Day, PhD
This document discusses applying machine learning and artificial intelligence techniques like deep neural networks to problems in genomics and agriculture. It provides examples of using Google Cloud platforms and services for storing and analyzing large genomic datasets, as well as developing models for tasks like variant calling from sequencing data and marker-assisted breeding. The document advocates that Google is well-positioned to handle massive volumes of genomic and agricultural data and help advance the application of AI in these domains.
In this session we will explore how Google's Cloud services (CloudML, Vision, Genomics API) can be used to process genomic and phenotypic data and solve problems in healthcare and agriculture.
The document discusses deep learning techniques for financial technology (FinTech) applications. It begins with examples of current deep learning uses in FinTech like trading algorithms, fraud detection, and personal finance assistants. It then covers topics like specialized compute hardware for deep learning training and inference, optimization techniques for CPUs and GPUs, and distributed training approaches. Finally, it discusses emerging areas like FPGA and quantum computing and provides resources for practitioners to start with deep learning for FinTech.
At Intel Labs Day 2020, Intel spotlighted research initiatives across multiple domains where its researchers are striving for orders of magnitude advancements to shape the next decade of computing. Themed “In Pursuit of 1000X: Disruptive Research for the Next Decade in Computing,” the event featured several emerging areas including integrated photonics, neuromorphic computing, quantum computing, confidential computing, and machine programming. Together, these domains represent pioneering efforts to address critical challenges in the future of computing, and Intel’s leadership role in pursuing breakthroughs to address them. Rich Uhlig, Intel senior fellow, vice president, and director of Intel Labs was joined by several domain experts across the research organization to share perspectives on the industry and societal impact of these technologies.
Best Practices for On-Demand HPC in Enterprises, by geetachauhan
Traditionally, HPC has been popular in scientific domains but not in most other enterprises. With the advent of on-demand HPC in the cloud and the growing adoption of deep learning, HPC should now be a standard platform for any enterprise leading with AI and machine learning. This session covers best practices for building your own on-demand HPC cluster for enterprise workloads, along with key use cases where enterprises will benefit from an HPC solution.
Talk @ ACM SF Bayarea Chapter on Deep Learning for medical imaging space.
The talk covers use cases, special challenges and solutions for Deep Learning for Medical Image Analysis using Tensorflow+Keras. You will learn about:
- Use cases for Deep Learning in Medical Image Analysis
- Different DNN architectures used for Medical Image Analysis
- Special purpose compute / accelerators for Deep Learning (in the Cloud / On-prem)
- How to parallelize your models for faster training and serving for inference
- Optimization techniques to get the best performance from your cluster (like Kubernetes/ Apache Mesos / Spark)
- How to build an efficient Data Pipeline for Medical Image Analysis using Deep Learning
- Resources to jump start your journey - like public data sets, common models used in Medical Image Analysis
[CVPRW 2020] Real-World Super-Resolution via Kernel Estimation and Noise Injec..., by KIMMINHA3
This paper addresses the super-resolution (SR) task and was presented at CVPRW 2020 as the winner of both tracks of the real-world SR challenge.
The authors question why practical methods for real-world degradation have been lacking: previous work assumed ideal degradations such as bicubic downsampling.
To address this, they improve resolution via kernel estimation and noise injection, both applied before training rather than during it. That is what interested me in this paper: how do they obtain realistic kernels and noise from real-world images?
In short, they estimate degradation kernels from the source images (for which no ground truth exists) according to their formulation, and they likewise collect noise values; these estimated kernels and injected noise are what they emphasize as the novel part of the method.
For more detail, see the paper:
https://openaccess.thecvf.com/content_CVPRW_2020/papers/w31/Ji_Real-World_Super-Resolution_via_Kernel_Estimation_and_Noise_Injection_CVPRW_2020_paper.pdf
In this presentation, I'll introduce "Real-World Super-Resolution via Kernel Estimation and Noise Injection".
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance usability of weather forecasts. I will than talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud- resolving simulations with ECMWF's forecast model, and the use of machine learning, and in particular deep learning, to improve the workflow and predictions. Peter has graduated in Physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as Postdoc with Tim Palmer at the University of Oxford and has taken up a position as University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document provides an overview of Mahdi Hosseini Moghaddam's background and work applying machine learning and cognitive computing for intrusion detection. It discusses his education in computer science and engineering and awards. It then outlines the goals of the presentation to discuss real-world applications of machine learning rather than scientific details. The document proceeds to discuss problems with current intrusion detection systems, introduce concepts in machine learning and cognitive computing, and describe Mahdi's methodology and architecture for a hardware-based machine learning system using a cognitive processor to enable fast intrusion detection.
Making Sense of Information Through Planetary Scale ComputingLarry Smarr
Larry Smarr discusses how planetary-scale computing and high-speed networks enable data-intensive research through optical portals. This infrastructure allows remote visualization and analysis of large datasets across multiple sites in real-time. Examples include viewing microbial genomes, cosmological simulations, and remote instrument control. The infrastructure also aims to reduce carbon emissions through more efficient computing.
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research, by Dr. Haxel Consult
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict, predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals; artificial neural networks do not differentiate between these two types of signals. They are essentially a series of advanced, statistics-based exercises that review the past to indicate the likely future. Another buzzword used over the last few years across all industries is "big data". In biomedical and health sciences, both unstructured and structured information constitute "big data". Deep learning needs a lot of data, whereas "big data" has value only when it generates actionable insight; given this, the two areas are destined to be married, and the couple is made for each other. The time is now ripe for a synergistic association that will benefit pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss their usefulness in biomedical and health informatics.
This deck covers some of the open problems in the big data analytics space, starting with a discussion of state-of-art analytics using Spark/Hadoop YARN. It details out whether each of these are appropriate technologies and explores alternatives wherever possible. It ends with an important problem discussion - how to build a single system to handle big data pipelines without explicit data transfers.
How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing, by inside-BigData.com
In this video from the GPU Technology Conference, Lance Wilson from Monash University presents: How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing.
"Learn how high-resolution imaging is revolutionizing science and dramatically changing how we process, analyze, and visualize at this new scale. We will show the journey a researcher can take to produce images capable of winning a Nobel prize. We'll review the last two years of development in single-particle cryo-electron microscopy processing, with a focus on accelerated software, and discuss benchmarks and best practices for common software packages in this domain. Our talk will include videos and images of atomic resolution molecules and viruses that demonstrate our success in high-resolution imaging."
Watch the video: https://wp.me/p3RLHQ-kcW
Learn more: https://www.monash.edu/researchinfrastructure/cryo-em
and
https://www.nvidia.com/en-us/gtc/home/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
TalkingData is the largest independent big data service company in China. Its network covers 70% of mobile services nationwide, with 3 billion ad clicks per day, of which 90% are potentially fraudulent. Click fraud at this overwhelming volume leads to misuse of data and wasted money. Hence, Kaggle (a U.S. platform for predictive modeling and analytics competitions) has partnered with TalkingData to help resolve this issue.
This paper builds predictive models using both traditional and big data methods to determine whether a smartphone app will be downloaded after a user clicks an advertisement. We used the "TalkingData AdTracking Fraud Detection Challenge" data, a 7 GB dataset provided by a Kaggle competition. Four classification models were implemented on this massive data set to predict fraud with both traditional and big data methods; we define a click as fraudulent when the user clicks an advertisement without downloading the app. Because the traditional platform cannot handle data sets over a gigabyte, we trained the traditional models on a sample and used the full data set for the Spark ML models. We also present the accuracy and performance of the models implemented in both traditional and big data systems.
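As a rough illustration of the "traditional" side of such a pipeline, the sketch below trains a simple classifier on a sampled click log. The column names and the synthetic data are assumptions, not the actual TalkingData schema or the paper's models.

```python
# Sketch: predicting app download (non-fraud) from click features.
# Assumptions: synthetic data with made-up columns; the real TalkingData
# set has different fields and is far larger (7 GB).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 10_000
X = np.column_stack([
    rng.integers(0, 500, n),   # hypothetical app id
    rng.integers(0, 24, n),    # hour of the click
    rng.integers(0, 100, n),   # hypothetical channel id
])
y = rng.integers(0, 2, n)      # 1 = app downloaded, 0 = fraudulent click

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```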
Machine Learning in Healthcare Diagnostics, by Larry Smarr
Machine learning and artificial intelligence are rapidly transforming healthcare and medicine. Advances in genetic sequencing have enabled the mapping of human and microbial genomes at low costs. Researchers are using machine learning to analyze genomic and microbiome data to better understand health and disease. Non-von Neumann brain-inspired computing architectures are being developed for machine learning applications and could accelerate medical research and diagnostics. These technologies may help create personalized health coaching and move medicine from reactive sickcare to proactive healthcare.
ICIC 2017: The Next Era: Deep Learning for Biomedical Research, by Dr. Haxel Consult
Srinivasan Parthiban (VINGYANI, India)
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict, predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals; artificial neural networks do not differentiate between these two types of signals. They are essentially a series of advanced, statistics-based exercises that review the past to indicate the likely future. Another buzzword used over the last few years across all industries is "big data". In biomedical and health sciences, both unstructured and structured information constitute "big data". Deep learning needs a lot of data, whereas "big data" has value only when it generates actionable insight; given this, the two areas are destined to be married, and the couple is made for each other. The time is now ripe for a synergistic association that will benefit pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss their usefulness in biomedical and health informatics.
IRJET - A Novel High Capacity Reversible Data Hiding in Encrypted Domain u..., by IRJET Journal
The document proposes and describes two novel reversible data hiding algorithms that can embed data in encrypted images at high capacity.
The first algorithm, called Vacating Room After Encryption (VRAE), encrypts data, embeds additional information into the encrypted data, hides the encrypted data in a cover image which is then encrypted and transmitted. At the receiver, the cover image is decrypted, the encrypted data is extracted and its embedded information is retrieved, then the data is decrypted to recover the original.
The second algorithm, Reserving Room Before Encryption (RRBE), first embeds information into data, encrypts the data, hides it in a cover image which is then encrypted and sent. At the receiver, the
Virtualized High Performance Computing with Mellanox FDR and RoCE, by inside-BigData.com
In this video from VMworld 2014, Josh Simons from VMware presents: Virtualized High Performance Computing with Mellanox FDR and RoCE on VMware ESXi 5.5.
"The HPC community can realize significant benefits from adopting enterprise-capable IT solutions grounded in proven virtualization and cloud technology. And conversely, as business IT environments become increasingly compute-intensive, lessons learned by the scientists and engineers working with HPC can be transferred to their counterparts in the enterprise. It’s a win-win situation."
Watch the video presentation: http://insidehpc.com/2014/09/virtualized-high-performance-computing-mellanox-fdr-roce/
Learn more in the insideHPC Guide to Virtualization, the Cloud and HPC: http://bit.ly/1w8kMfu
The adaptive mechanisms include the following AI paradigms that exhibit an ability to learn or adapt to new environments:
Swarm Intelligence (SI),
Artificial Neural Networks (ANN),
Evolutionary Computation (EC),
Artificial Immune Systems (AIS), and
Fuzzy Systems (FS).
Wave Computing is a startup that has developed a new dataflow architecture called the Dataflow Processing Unit (DPU) to accelerate deep learning training by up to 1000x. Their initial market focus is on machine learning in the datacenter. They have invented a Coarse Grain Reconfigurable Array architecture that can statically schedule dataflow graphs onto a massive array of processors. Wave is now accepting qualified customers for its Early Access Program to provide select companies early access to benchmark Wave's machine learning computers before official sales begin.
This talk was presented in Startup Master Class 2017 - http://aaiitkblr.org/smc/ 2017 @ Christ College Bangalore. Hosted by IIT Kanpur Alumni Association and co-presented by IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Deep Learning And Business Models (VNITC 2015-09-13), by Ha Phuong
Deep Learning and Business Models
Tran Quoc Hoan discusses deep learning and its applications, as well as potential business models. Deep learning has led to significant improvements in areas like image and speech recognition compared to traditional machine learning. Some business models highlighted include developing deep learning frameworks, building hardware optimized for deep learning, using deep learning for IoT applications, and providing deep learning APIs and services. Deep learning shows promise across many sectors but also faces challenges in fully realizing its potential.
The document discusses using recurrent neural networks to detect Android malware. It proposes developing a deep learning model using LSTM or GRU networks to efficiently detect malware files. The existing approaches have limitations in detecting new malware. The proposed system would use recurrent networks to model sequential Android app data and detect malware, including new emerging types.
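A minimal sketch of the kind of recurrent classifier described above, assuming app behaviour is encoded as fixed-length sequences of integer event indices, might look like the following; the vocabulary size, sequence length, and data are placeholders rather than the proposed system.

```python
# Sketch: LSTM classifier over integer-encoded app event sequences.
# Assumptions: sequences of API-call indices padded to a fixed length;
# labels 1 = malware, 0 = benign. Data here is random, for shapes only.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 200
X = np.random.randint(1, vocab_size, size=(512, seq_len))
y = np.random.randint(0, 2, size=(512,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),     # learn event embeddings
    tf.keras.layers.LSTM(64),                      # or tf.keras.layers.GRU(64)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```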
1) Deep learning has achieved great success in many computer vision tasks such as image classification, object detection, and segmentation. Convolutional neural networks (CNNs) are often used.
2) The size and quality of training datasets is crucial, as deep learning models require large amounts of labeled data to learn meaningful patterns. Data augmentation and synthesis can help increase data quantity and quality.
3) Semi-supervised and transfer learning techniques can help address the challenge of limited labeled data by making use of unlabeled data as well. Generative adversarial networks (GANs) have also been used for data augmentation.
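To make the data-augmentation point concrete, here is a tiny, framework-free sketch of label-preserving augmentations (random flips and crops) in plain numpy; real pipelines would typically use library transforms or GAN-based synthesis, and the image here is synthetic.

```python
# Sketch: simple label-preserving augmentations to enlarge a training set.
# Assumptions: images are H x W x C numpy arrays; the input is synthetic.
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator, crop: int = 24) -> np.ndarray:
    """Random horizontal flip followed by a random crop padded back to size."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                      # horizontal flip
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop, :]
    out = np.zeros_like(img)                       # pad back to original size
    out[:crop, :crop, :] = patch
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = [augment(image, rng) for _ in range(8)]   # 8 new variants
print(len(augmented), augmented[0].shape)
```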
Deep Learning Based Real-Time DNS DDoS Detection System, by Seungjoo Kim
[Poster] Deep Learning Based Real-Time DNS DDoS Detection System @ ACSAC 2016 (The 32nd Annual Computer Security Applications Conference 2016), which is one of the most important cyber security conferences in the world and the oldest information security conference held annually
Actor Critic Approach Based Anomaly Detection for Edge Computing Environments, by IJCNCJournal
The pivotal role of data security in mobile edge-computing environments forms the foundation of the proposed work. Anomalies and outliers in sensory data caused by network attacks are a prominent real-time concern. Samples are drawn from a set of sensors at each time instant for as long as the confidence level of the decision remains on par with the desired value. A "true" result on the hypothesis test means that a sensor has shown signs of anomaly or abnormality, and retrieval of samples from that sensor is immediately stopped. The proposed deep-learning Actor-Critic reinforcement algorithm detects anomalies in the form of binary indicators and decides when to withdraw from receiving further samples from specific sensors. The posterior trust value influences the width of the confidence interval and hence the probability of anomaly detection. The paper uses a single-tailed normal function to determine the range of the posterior trust metric. The resulting prediction model detects anomalies with a good detection accuracy.
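The role of the single-tailed normal function can be illustrated with a short calculation: given a desired confidence level, a one-sided quantile of the normal distribution bounds the posterior trust metric, and a value falling beyond that bound is flagged as anomalous. The sketch below is only one interpretation of that step, not the paper's code; the trust values are synthetic.

```python
# Sketch: one-sided normal bound used as an anomaly decision rule.
# Assumptions: posterior trust values are roughly normal; values above the
# one-tailed quantile are treated as anomalous and the sensor is dropped.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
trust = rng.normal(loc=0.0, scale=1.0, size=1000)   # synthetic trust metric

confidence = 0.95
upper = norm.ppf(confidence, loc=trust.mean(), scale=trust.std())  # one-tailed bound

new_sample = 2.4
is_anomaly = new_sample > upper                      # beyond bound -> stop sampling sensor
print(f"bound = {upper:.3f}, anomaly = {is_anomaly}")
```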
Rice Insects Classification Using Transfer Learning and CNN, by IRJET Journal
This document summarizes a study that used transfer learning and convolutional neural networks (CNNs) to classify different rice insect pests from images. The researchers used pre-trained CNN models like AlexNet and VGG16 and fine-tuned them on a dataset of rice insect images. AlexNet achieved the highest classification accuracy of 98%. Transfer learning helped address the classification problem with minimal training requirements compared to training CNNs from scratch. The study aims to help with early detection of insect pests to prevent crop damage.
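A minimal sketch of the transfer-learning setup the summary describes, freezing a pretrained backbone and training a new classification head, might look like the following; the number of classes, input size, and data are placeholders, and the study's exact training details are not reproduced here.

```python
# Sketch: fine-tuning a pretrained CNN (VGG16) for N-way insect classification.
# Assumptions: 224x224 RGB inputs, 8 hypothetical pest classes, synthetic data.
import numpy as np
import tensorflow as tf

num_classes = 8
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                              # freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(16, 224, 224, 3).astype("float32")   # placeholder images
y = np.random.randint(0, num_classes, size=(16,))
model.fit(X, y, epochs=1, batch_size=4, verbose=0)
```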
20170426 - Deep Learning Applications in Genomics - Vancouver - Simon Fraser ..., by Allen Day, PhD
This document discusses Google's capabilities for handling large genomic and biomedical data sets. It describes how Google uses technologies like Google Cloud, BigQuery, Dataflow and TensorFlow to process, store and analyze massive volumes of genomic and medical data. Google's systems can handle hundreds of terabytes to petabytes of data and enable fast querying and machine learning on these data sets. The document also provides examples of how Google is applying these capabilities to challenges in genomics, healthcare and precision medicine.
The document discusses the potential applications of deep learning in healthcare. It begins by explaining that deep learning models can improve accuracy of diagnosis, prognosis, and risk prediction by analyzing large datasets. It then discusses how deep learning can optimize hospital processes like resource allocation and patient flow by early and accurate prediction of diseases. Finally, it mentions that deep learning can help identify patient subgroups for personalized and precision medicine approaches.
The document discusses Next Century's core competencies including mobile computing, GIS and mapping, data presentation and visualization, signal processing, data fusion and aggregation, image processing, and IT infrastructure and support. It then summarizes several projects including WISER, MASTIF, WRAP, an advanced visualization tool, a threat warning system, advanced image recognition R&D, CERTAS, TORA, and Performance DNA Desktop.
An intrusion detection system for packet and flow based networks using deep n..., by IJECEIAES
Research on deep neural networks and big data is now converging in several ways to enhance the capabilities of intrusion detection systems (IDS). Many IDS models have been introduced to provide security over big data. This study focuses on intrusion detection in computer networks using big datasets. The advent of big data has provided comprehensive assistance in cyber security by supplying a range of rich algorithms to classify and analyze patterns and make better predictions more efficiently. In this study, a detection model applying deep neural networks is proposed. We applied the suggested model to the latest dataset available online, formatted as packet-based and flow-based data with some additional metadata. The dataset is labeled and imbalanced, with 79 attributes and some classes having far fewer training samples than others. The proposed model is built using Keras and the Google TensorFlow deep learning environment. Experimental results show that intrusions are detected with accuracy over 99% for both binary and multiclass classification using the best selected features. The average receiver operating characteristic (ROC) and precision-recall curve scores are also 1. The outcome implies that deep neural networks offer a novel research model with great accuracy for intrusion detection, better than some models presented in the literature.
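As a rough sketch of the kind of model described, a fully connected network over 79 tabular features with a binary output, consider the following; the feature count matches the summary, but the data, layer sizes, and training settings are assumptions.

```python
# Sketch: dense network for binary intrusion detection over 79 features.
# Assumptions: synthetic, balanced data; real flows would be preprocessed,
# scaled, and heavily imbalanced, as the summary notes.
import numpy as np
import tensorflow as tf

n_features = 79
X = np.random.rand(2048, n_features).astype("float32")
y = np.random.randint(0, 2, size=(2048,))            # 1 = attack, 0 = benign

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
model.fit(X, y, validation_split=0.2, epochs=2, batch_size=64, verbose=0)
```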
Machine learning and deep learning techniques can be used to analyze diverse types of data such as images, text, signals and more. Deep learning uses neural networks to learn directly from raw data, enabling applications like object recognition, speech recognition, and analyzing time series signals. Deep learning has become popular due to labeled public datasets, increased GPU acceleration, and pre-trained models that provide a starting point for new problems.
About an Immune System Understanding for Cloud-native Applications - Biology ..., by Nane Kratzke
Presentation for 9th International Conference on Cloud Computing, GRIDS, and Virtualization (CLOUD COMPUTING 2018) in Barcelona, Spain, 2018.
There is no such thing as an impenetrable system, although the penetration of systems does get harder from year to year. The median days that intruders remained undetected on victim systems dropped from 416 days in 2010 down to 99 in 2016. Perhaps because of that, a new trend in security breaches is to compromise the forensic trail to allow the intruder to remain undetected for longer in victim systems and to retain valuable footholds for as long as possible. This paper proposes an immune system inspired solution which uses a more frequent regeneration of cloud application nodes to ensure that undetected compromised nodes can be purged. This makes it much harder for intruders to maintain a presence on victim systems. Basically the biological concept of cell-regeneration is combined with the information systems concept of append-only logs. Evaluation experiments performed on popular cloud service infrastructures (Amazon Web Services, Google Compute Engine, Azure and OpenStack) have shown that between 6 and 40 nodes of elastic container platforms can be regenerated per hour. Even a large cluster of 400 nodes could be regenerated in somewhere between 9 and 66 hours. So, regeneration shows the potential to reduce the foothold of undetected intruders from months to just hours.
This document discusses tools and services for data intensive research in the cloud. It describes several initiatives by the eXtreme Computing Group at Microsoft Research related to cloud computing, multicore computing, quantum computing, security and cryptography, and engaging with research partners. It notes that the nature of scientific computing is changing to be more data-driven and exploratory. Commercial clouds are important for research as they allow researchers to start work quickly without lengthy installation and setup times. The document discusses how economics has driven improvements in computing technologies and how this will continue to impact research computing infrastructure. It also summarizes several Microsoft technologies for data intensive computing including Dryad, LINQ, and Complex Event Processing.
Tom Soderstrom, Chief Technology and Innovation Officer at NASA’s Jet Propulsion Laboratory, has demonstrated how internet-of-things (IoT) technology and cloud computing can form the backbone for monumental innovation. This combination has enabled private and public space exploration enterprises to dare greatly and, together, discover more of the solar system than ever before. Cloud computing, with its unlimited storage and compute resources, blends IoT, machine learning, intelligent assistance, and new interfaces with computers. It has the potential to allow humans to explore and colonize other areas of the solar system by enabling collaboration across millions of miles, and social networking on a planetary scale.
Battista Biggio @ S+SSPR2014, Joensuu, Finland -- Poisoning Complete-Linkage ..., by Pluribus One
The document discusses poisoning attacks against complete-linkage hierarchical clustering. It introduces hierarchical clustering and describes how attackers can add poisoned samples to compromise the clustering output. The paper evaluates different attack strategies on real and artificial datasets, finding that even random attacks can be effective at poisoning the clusters, while extensions of greedy approaches generally perform best. Future work to develop defenses for clustering algorithms against adversarial inputs is discussed.
This document provides an overview of deep learning on GPUs. It discusses how GPUs are well-suited for deep learning and other computationally intensive tasks due to their massively parallel architecture. The document then describes what deep learning is, including different types of neural networks commonly used. It also discusses how deep learning can enhance analytics and big data by automating feature extraction. Examples of running deep learning on Spark clusters using frameworks like TensorFlow on Spark are presented.
The document discusses network forensics and the ability to capture and analyze all network traffic at high speeds. It notes that advances in storage technologies now enable total network traffic capture without loss. This allows analyzing past network events, even those from months prior, with full packet fidelity. The author proposes that network forensics technologies could evolve similarly to how firewalls became universal. By making total network traffic capture and analysis practical and easy to use, security defenses could become more effective against both known and unknown threats.
Surveillance scene classification using machine learning, by Utkarsh Contractor
The problem of scene classification in surveillance footage is of great importance for ensuring security in public areas. With challenges such as low-quality feeds, occlusion, viewpoint variations, and background clutter, the task is both difficult and error-prone, so it is important to keep false positives low to maintain high detection accuracy. In this paper, we adapt high-performing CNN architectures to identify abandoned luggage in a surveillance feed. We explore several CNN-based approaches, from transfer learning on the ImageNet dataset to object classification using Faster R-CNNs on the COCO dataset. Using network visualization techniques, we gain insight into what the neural network sees and the basis of its classification decisions. The experiments were conducted on real-world datasets and highlight the complexity of such classifications. The obtained results indicate that a combination of the proposed techniques outperforms the individual approaches.
Similar to Edge-based Discovery of Training Data for Machine Learning
Full-RAG: A modern architecture for hyper-personalization, by Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
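To ground the "retrieval" half of Retrieval-Augmented Generation, here is a toy sketch of the retrieval-and-prompt-assembly step using plain cosine similarity over precomputed vectors; the documents, embeddings, and query are invented, and a real system (including anything described in this talk) would use a learned embedding model and a vector database.

```python
# Toy sketch of the retrieval step in a RAG pipeline.
# Assumptions: documents are already embedded into vectors (here random);
# a real system would use a proper embedding model and vector store.
import numpy as np

rng = np.random.default_rng(0)
docs = ["user recently browsed hiking boots",
        "user returned a raincoat last month",
        "store policy: free shipping over $50"]
doc_vecs = rng.normal(size=(len(docs), 8))
query_vec = doc_vecs[0] + 0.1 * rng.normal(size=8)   # query close to doc 0

def top_k(query, matrix, k=2):
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

context = "\n".join(docs[i] for i in top_k(query_vec, doc_vecs))
prompt = f"Context:\n{context}\n\nRecommend a product for this user."
print(prompt)   # this prompt would then be passed to a generator model
```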
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI, by Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and an overview of the platform. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How it can help today’s business and the benefits it brings
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
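As a taste of the Python binding mentioned above, here is a minimal sketch of loading a bundled sample network and running an AC power flow with pypowsybl; the function names (create_ieee14, loadflow.run_ac, get_buses) are assumptions based on the public pypowsybl documentation and may vary between releases.
```python
# Minimal pypowsybl sketch (illustrative; API names assumed from the public
# pypowsybl documentation and may vary between releases).
import pypowsybl as pp

# Load a bundled IEEE 14-bus sample network.
network = pp.network.create_ieee14()

# Run an AC power flow and report convergence per synchronous component.
results = pp.loadflow.run_ac(network)
for component in results:
    print(component.status)

# Inspect the solved bus voltages as a pandas DataFrame.
print(network.get_buses()[["v_mag", "v_angle"]].head())
```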
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
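The abstract does not spell out DIAR's algorithm, but the general idea of discarding seed bytes whose removal does not change the coverage a seed achieves can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation; coverage_of() is a hypothetical helper standing in for however coverage is measured (e.g., replaying the seed against an instrumented target).
```python
# Illustrative sketch of trimming "uninteresting" bytes from a fuzzing seed.
# NOT the DIAR algorithm from the talk; coverage_of() is a hypothetical helper
# that replays a seed against an instrumented target and returns its coverage.
from typing import Callable, FrozenSet

def trim_seed(seed: bytes, coverage_of: Callable[[bytes], FrozenSet[int]],
              chunk: int = 16) -> bytes:
    """Greedily drop chunks of bytes that do not change the seed's coverage."""
    baseline = coverage_of(seed)
    data = bytearray(seed)
    offset = 0
    while offset < len(data):
        candidate = data[:offset] + data[offset + chunk:]
        if coverage_of(bytes(candidate)) == baseline:
            data = bytearray(candidate)   # chunk was uninteresting: drop it
        else:
            offset += chunk               # chunk matters: keep it, move on
    return bytes(data)
```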
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Edge-based Discovery of Training Data for Machine Learning
1. Edge-based Discovery of Training Data for Machine Learning
Ziqiang (Edmond) Feng, Shilpa George, Jan Harkes, Padmanabhan Pillai†, Roberta Klatzky, Mahadev Satyanarayanan
Carnegie Mellon University and †Intel Labs
The New Yorker magazine, April 20, 2018, p. 41
2. The Deep Learning Recipe
Collect a large amount of data and label it
Select a model and train a DNN
Deploy the DNN for inference
TPOD @ CMU
3. DNNs for Domain Experts
Valuable in ecology, military intelligence, medical diagnosis, etc.
• Low base rate (prevalence) in the data
• Requires expertise to identify
Examples: masked palm civet (Paguma larvata), transmitter of SARS during its 2003 outbreak in China; BUK-M1, believed to have shot down MH17 and killed 298 in 2014; nuclear atypia in cancer.
4. Building a Training Set Is Hard
Crowds are not experts: crowd-sourcing (e.g., Amazon Mechanical Turk) is not applicable.
Access restrictions on data: patient privacy, business policy, national security, etc.
In the worst case, a single domain expert has to generate an entire training set of 10^3 to 10^4 examples.
(Illustration: masked palm civet vs. red panda vs. raccoon)
5. Our Contribution: Eureka
A system for efficient discovery of training examples from data sources dispersed over the Internet (focus on images in this paper).
Goal: to effectively utilize an expert’s time and attention.
Key concepts:
• Early discard
• Iterative discovery workflow
• Edge computing
10. System Design and Implementation
Software generality: allow use of CV code written in different languages, libraries and frameworks (e.g., Python, Matlab, C++, TensorFlow, PyTorch, Scikit-learn)
• Empower experts with the newest CV innovations quickly
• Encapsulate filters in Docker containers
Runtime efficiency: be able to rapidly process and discard large volumes of data
• Exploit specialized hardware on cloudlets (e.g., GPUs)
• Cache filter results to exploit temporal locality
11. Matching System to User
The system should deliver images to the user at a rate the user can inspect them.
Too fast: wasting computation and precious Internet bandwidth.
Suggestions:
1. Restrict to fewer cloudlets
2. Bias filters towards precision rather than recall
12. Matching System to User (cont’d)
The system should deliver images to the user at a rate the user can inspect them.
Too slow: wasting expert time.
Obvious solution: scale out to more cloudlets (edge computing is your friend).
Risk: “junk” (false positives) causes user annoyance and dissatisfaction.
Rule of thumb: focus on reducing the false positive rate before scaling out.
13. Evaluation: Setup
Dataset: YFCC100M: 99.2 million Flickr photos. Real-life distribution of objects. Evenly partitioned over the cloudlets.
Edge: 8 cloudlets with Nvidia GPUs, accessing data from local SSDs.
Client: Connected to the cloudlets via the Internet.
14. Evaluation: Case Studies
                              Deer         Taj Mahal    Fire hydrant
Estimated base rate           0.07%        0.02%        0.005%
Collected positives           111          105          74
Images viewed by user         7,447        4,791        15,379
Images discarded by Eureka    2,104,076    2,542,889    2,734,070
15. Eureka vs. Brute-force
[Bar chart: number of images the user viewed to collect ~100 true positives, log scale from 1,000 to 1,000,000, for Deer, Taj Mahal and Fire hydrant, comparing Brute-force, Single-iteration Eureka and Eureka.]
Brute-force: user views every image.
Single-iteration Eureka: early discard without iterative improvement.
Please refer to our paper for detailed results of each case study.
16. Iteratively Improving Productivity
The case of deer. Productivity (new true positives / minute) over Eureka iterations 1 to 5: 0.4, 0.36, 1.49, 4.24, 4.77 (~10X improvement).
17. Compute Must Co-locate with Data
[Chart: machine processing throughput (#/sec, 0 to 1000) of an RGB histogram filter while throttling the bandwidth between the cloudlet and the data source across 10 Mbps, 25 Mbps, 100 Mbps and 1 Gbps. US average connectivity: 18.7 Mbps (2017).]
18. More in the Paper
• Detailed system design and implementation
• An analytic model relating user wait time to base rate, filter accuracy, cloudlet processing speed, etc.
• Detailed results of individual case studies
19. Conclusion
Eureka combines early discard, an iterative discovery workflow and edge computing to help domain experts efficiently discover training examples of rare phenomena from data sources on the edge.
Eureka reduces human labeling effort by two orders of magnitude compared to a brute-force approach.
20. Thank you!
I will also present at tomorrow’s PhD Forum to discuss related ideas.
Editor's Notes
Cartoon: New Yorker magazine April 20, 2018, p. 41
Deep learning has become the gold standard in many areas, especially computer vision, due to its superb accuracy. This slide shows the high-level recipe for applying deep learning to a problem. You collect a large amount of data and label it. Then you select a model and train a DNN. Finally you deploy the DNN for inference. Nowadays, there are many software libraries, frameworks, cloud services and web-based tools that let you do the last two steps with great convenience. Virtually all the painstaking effort is in the very first step, and it can sometimes be the showstopper for applying deep learning to your problem.
In this work, we focus on DNNs used by domain experts. Here are some examples. This animal was the transmitter of the SARS disease in China in 2003. You can imagine how valuable it would be if we had an accurate DNN detector and used it in public health efforts. Likewise, this is a weapon that shot down an airplane, and this is a pathological image of nuclear atypia in cancer. In all these cases, the target has a low base rate – they are pretty rare in the data you are examining. And they all require expertise to correctly identify.
https://en.wikipedia.org/wiki/Masked_palm_civet#Connection_with_SARS
Building a training set of this kind of target is hard. First, obviously, crowds are not experts, so crowd-sourcing methods like Amazon Mechanical Turk are not applicable to these domains. For example, only an expert can reliably and accurately distinguish between these animals. Second, there may exist access restrictions on data, such as patient privacy, business policy and national security. In the worst case, a single domain expert has to generate an entire training set of thousands to tens of thousands of examples.
In this paper, we describe a system called Eureka, for efficient discovery of training examples from data sources dispersed over the Internet. The goal of Eureka is to optimally utilize an expert’s time and attention. It combines three key concepts to achieve its goal: early discard, iterative discovery workflow and edge computing, which I will describe next.
This slide shows Eureka’s architecture. An expert user runs a GUI on her own computer. The GUI connects to a number of cloudlets across the Internet. These cloudlets are LAN-connected to some associated data sources. These data sources may be archival or live, depending on the specific use case. As the shape of these arrows indicates, connections between cloudlets and data sources are high-bandwidth and low-latency, while those over the Internet are the opposite. This high-bandwidth access is used to execute early-discard code on the cloudlets to drop clearly irrelevant data. Only a tiny fraction of the data, along with meta-data, is transmitted and shown to the user, consuming little Internet bandwidth.
Here shows an example of using the GUI to find images of deer from an unlabeled dataset. You can specify a list of early-discard filters and only images passing all of the filters are transmitted and displayed. You are seeing many false positives because the filters used in this case are very weak color and texture filters.
(more time: 1. extend to general logical expression; 2. 500 more – efficient use of user attention)
To improve the efficacy of early-discard, we introduce the iterative discovery workflow. Here you see a spectrum of computer vision algorithms and machine learning models, from simple on the left, such as RGB and SIFT, to sophisticated on the right, such as deep learning. X-axis is the number of example images you have, and the Y-axis is the accuracy. While these numbers are not meant to be precise, the idea is that different models require a different amount of data to work properly, and they give you different levels of accuracy. When using Eureka, instead of creating a set of filters and searching for your target in one go, you iteratively change and improve your filters as you collect more examples, and move up the stairs when you have sufficient data to do so.
When using Eureka, in the beginning, you have very few examples. So you should only use explicit features like RGB or SIFT. With these weak filters, you may be able to find a few more, which allows you to escalate to a slightly more advanced filter, like an SVM. The SVM is considerably more accurate, making it easier to find some more positives in a reasonable amount of time. So you iterate and climb up the stairs when you have sufficient data. In this process, you are both using more and more sophisticated filters, and growing the size of the training set you collect.
Here again is the case of finding deer, but after a few iterations of using Eureka, with an SVM now being used. You can see the filter has become much more accurate.
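For readers who want to picture the escalation step, here is a minimal sketch of training a cheap SVM filter on the handful of examples collected so far, using color-histogram features with scikit-learn; the feature choice, threshold, and helper names are illustrative assumptions, not the filters actually used in the paper.
```python
# Illustrative sketch of escalating from explicit features to an SVM filter
# once a few labeled examples have been collected. Feature choice and
# threshold are assumptions, not the paper's actual filters.
import numpy as np
from sklearn.svm import SVC

def rgb_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated per-channel color histograms as a crude feature vector."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256), density=True)[0]
             for c in range(3)]
    return np.concatenate(feats)

def train_svm_filter(positives, negatives, threshold=0.5):
    """Train an SVM on the examples collected so far; return a keep/discard test."""
    X = np.stack([rgb_histogram(img) for img in positives + negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    clf = SVC(probability=True).fit(X, y)

    def keep(image: np.ndarray) -> bool:
        score = clf.predict_proba(rgb_histogram(image)[None, :])[0, 1]
        return score >= threshold   # pass the image on to the user, or discard it
    return keep
```
Each iteration simply retrains this filter on the growing set of positives and negatives before the next pass over the data.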
When designing and implementing Eureka, we have two major concerns. First is software generality. We want to allow the use of computer vision code written in a diversity of languages, libraries and frameworks, so that we can empower experts with the newest computer vision innovations quickly. To do so, we encapsulate filters in Docker containers.
Second is runtime efficiency. Eureka needs to be able to rapidly discard large volumes of data. To do so, we exploit specialized hardware such as GPUs on cloudlets when available. We also cache filter results to exploit temporal locality in typical Eureka workloads.
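To illustrate the caching point, here is a minimal sketch of memoizing a filter's score by (filter identity, item content hash), so that re-running a tweaked filter chain over the same data skips work already done; the names and cache layout are assumptions, not Eureka's actual implementation.
```python
# Illustrative sketch of caching filter results by (filter id, item hash) to
# exploit temporal locality across Eureka iterations. Names and cache layout
# are assumptions, not the actual Eureka implementation.
import hashlib
from typing import Callable, Dict, Tuple

_cache: Dict[Tuple[str, str], float] = {}

def cached_score(filter_id: str, score_fn: Callable[[bytes], float],
                 item: bytes) -> float:
    """Return score_fn(item), reusing a prior result for identical content."""
    key = (filter_id, hashlib.sha256(item).hexdigest())
    if key not in _cache:
        _cache[key] = score_fn(item)
    return _cache[key]
```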
Another interesting problem is to match the Eureka system to the user. We propose that, ideally, the system should deliver images to the user at a rate the user can inspect them. If the system is delivering too fast, you are pumping lots of results into the network which the user may never see, so it's a waste of computation and precious Internet bandwidth. Our suggestion in this case is to restrict your search to fewer cloudlets, or to bias your filters towards precision rather than recall.
On the other hand, if the system is delivering too slowly, you are basically forcing the user to wait. And wasting an expert's time is a really bad thing to do. An obvious solution is to scale out to more cloudlets. But there is a risk here. Showing more "junk" to the user will cause annoyance and dissatisfaction. So you really need to strike a balance between avoiding user wait time and avoiding too many false positives. Our rule of thumb in this scenario is that one should focus on reducing the false positive rate before scaling out to more cloudlets.
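A back-of-envelope way to see this balance (my own illustration, not the analytic model referenced in the paper): suppose the cloudlets collectively evaluate R items per second, the base rate is b, and the current filters have recall r and false-positive rate f. Then roughly
\[
D \approx R\bigl(b\,r + (1-b)\,f\bigr), \qquad
\text{junk fraction} \approx \frac{(1-b)\,f}{b\,r + (1-b)\,f}
\]
where D is the rate at which images reach the user. Scaling out multiplies R, so it raises the delivery rate and the junk shown per minute together; reducing f first lets D approach the user's inspection rate while keeping the junk fraction tolerable.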
To evaluate Eureka, we used 99 million Flickr images from the YFCC100M dataset. On the edge we have 8 cloudlets with local access to the data. And the client GUI connects to the cloudlets over the Internet.
We conducted three case studies using these three chosen targets – deer, Taj Mahal and fire hydrant. As you can see from the base rate, these are fairly rare objects in Flickr photos. We used Eureka to collect about 100 positive examples of each. Here you can see the number of images viewed by the user, and images discarded by Eureka in the whole process. You can see how effective Eureka is in reducing the amount of data the user needs to look at and label.
We compare Eureka with a brute-force method, where the user goes through the images one by one and labels them. That's basically how many datasets are curated today. For reference, we also compare with what we call "single-iteration Eureka", which means using early discard, but without iterative improvement.
Y-axis shows how many images the user viewed in order to collect the same number of positives. As you can see, compared with brute-force, single-iteration Eureka gives you up to an order of magnitude of improvement, showing the efficacy of early-discard. On top of that, full Eureka gives another order of magnitude of improvement, showing the benefit of the iterative workflow.
We show how Eureka iteratively improves the user's productivity, in the case of deer. We measure productivity in terms of new true positives found per minute in each Eureka iteration. Over five iterations of using Eureka, the productivity increases from 0.4 to 4.77, a more than 10X improvement.
Finally, we show the importance of edge computing. Specifically, we show when the data is at the edge, the compute must also be at the edge for Eureka to be efficient. To do so, we throttled the bandwidth between the cloudlet and the data source, and measure the machine processing throughput of an RGB histogram filter. The result shows it really needs LAN-connectivity at 1Gbps to deliver sufficiently high throughput. If the data is shipped over the wide area network, it slows down by about 10X.
In conclusion, …
(….)
Our evaluation shows ….
Why is it hard? Most importantly, crowds are not experts. So crowd-sourcing approaches like Amazon Mechanical Turk are not applicable in these domains. Only an expert can reliably classify these animals. Besides, these interesting phenomena are usually rare, making it difficult to find positive examples in unlabeled data. Finally, there may exist access restrictions on data, such as patient privacy, business policy and national security. In the worst case, a single expert has to generate an entire training set of thousands to tens of thousands of examples.
Here is the execution model. On the cloudlet, a component called the itemizer reads data in its raw format and emits individual items. Items are the independent units of early discard. Items are then fed into the item processor, where a chain of filters evaluates the items and tries to drop them. We encapsulate filters in Docker containers to achieve the software generality I just mentioned, and we cache filter results to improve efficiency. Finally, the filters also attach key-value attributes to each item. These attributes facilitate both communication between filters at run time and post-analysis after the items are sent back to the user.
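As a rough picture of this execution model, here is a minimal sketch of an itemizer feeding a chain of early-discard filters that attach key-value attributes; the interfaces and names are assumptions for illustration, not Eureka's actual code.
```python
# Illustrative sketch of the itemizer -> filter-chain execution model described
# above. Interfaces and names are assumptions, not Eureka's actual code.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

@dataclass
class Item:
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)

# A filter inspects an item, may record attributes, and returns False to drop it.
Filter = Callable[[Item], bool]

def itemizer(raw_blobs: Iterable[bytes]) -> Iterator[Item]:
    """Turn raw data into independent units of early discard."""
    for blob in raw_blobs:
        yield Item(data=blob)

def item_processor(items: Iterable[Item], filters: List[Filter]) -> Iterator[Item]:
    """Run each item through the filter chain; only survivors go back to the user."""
    for item in items:
        if all(f(item) for f in filters):
            yield item

# Example filter: record the item size as an attribute and drop tiny items.
def size_filter(item: Item) -> bool:
    item.attributes["size_bytes"] = str(len(item.data))
    return len(item.data) > 1024
```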
Finally, we study the importance of edge computing for Eureka. Specifically, how necessary is high-bandwidth access to data. Here we throttle the bandwidth between the cloudlet and the data source, and measure the machine processing throughput of three filters, including cheap ones and expensive ones. As you can see, when we decrease the bandwidth, the throughput drops significantly. Under 25 Mbps, there is basically no difference between cheap filters and expensive filters, because data access time becomes the bottleneck. So we see high-bandwidth access is crucial to the efficacy of Eureka.
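One simple way to summarize this observation (my own back-of-envelope, not a formula from the paper): if a filter can score items at a compute-bound rate R_cpu, and the link to the data source has bandwidth B for items of average size S, then the achievable throughput is roughly
\[
T \approx \min\!\left( R_{\mathrm{cpu}},\ \frac{B}{S} \right).
\]
Once B/S drops below R_cpu, which happens quickly for photo-sized items over a tens-of-megabit WAN link, cheap and expensive filters alike are pinned at the transfer rate, which matches the flattening we observe below 25 Mbps.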