Slides from the keynote speech given at SAPIENCE 2018, an international conference at SNGCE Kolenchery. The talk covers the use of cloud computing for image processing and genomics.
Cloud computing for image processing and bioinformatics
1. Cloud Computing support for image processing and Genomics - An industrial perspective
Dr Ganesh Neelakanta Iyer
Associate Professor, Dept of Computer Science and Engg
Amrita Vishwa Vidyapeetham
Amrita School of Engineering, Coimbatore
ganesh.vigneswara@gmail.com, ni_ganesh@cb.amrita.edu
2. About Me
• Associate Professor, Amrita Vishwa Vidyapeetham
• Masters & PhD from National University of Singapore (NUS)
• Several years in Industry/Academia
• Sasken, NXP, Progress, IIIT-H, NUS
• Architect, Manager, Technology Evangelist, Professor
• Talks/workshops in USA, Europe, Australia, Asia
• Kathakali Artist, Composer, Speaker, Traveler, Photographer
GANESHNIYER http://ganeshniyer.com
3. Outline
• Introduction
• Cloud Computing for Image Processing
– Perspectives
– Industry examples
• Cloud Computing for Genomics
– Perspectives
– Industry examples
• Challenges and Conclusions
7-9. Kerala floods in pictures (image slides)
https://qz.com/india/1367639/kerala-floods-the-week-that-was-in-pictures/
10. DigitalGlobe has also released pre-and-post Kerala disaster imagery
https://www.geospatialworld.net/blogs/kerela-floods-geospatial-technologies-playing-a-crucial-role/
14. Technology for disaster management
• Remote Sensing
• Geospatial technologies
• Satellite Imagery
15. Remote sensing and Satellite Imagery
• A set of remote sensing satellites and radar satellites
have been capturing high-resolution images of the areas
worst affected by the flood
• The images have been captured from a distance of 400-800
kilometers from the earth’s surface
• Once the data is analyzed and processed, it becomes
easier to predict rainfall levels over the next few hours
and whether the situation will remain as alarming
16. Remote sensing and Satellite Imagery
• ISRO’s ResourceSat-2 satellite has proven useful for capturing
images of vegetation, water bodies and other terrain
• Another satellite, Insat 3D, conveys information about cloud
positioning, from which wind velocity can be inferred
• Insat satellites are geostationary and relay information to the ground station
every 30 minutes
• Remote sensing using microwave satellites is also useful in such
unforeseen situations
• Microwaves can penetrate cloud cover and provide information on
surface hydrology
• ScatSat-1 data is mostly used for detecting and tracking oceanic tides,
floods, and cyclones
18. Cloud Computing for Image Processing
• Image processing and vision applications may benefit from cloud
computing since many are both data and compute intensive
• The rate at which such images must be captured and analyzed
varies considerably from application to application
• While high-speed image capture may not be necessary in digital
pathology systems, for example, it is critical in machine vision
systems designed to inspect automotive parts at rates of thousands
of parts (or more) per minute
• In such systems, the speed of image capture and processing is
critical and, most importantly, so is the latency of the vision system
and of any pass/fail rejection mechanism that may be required
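The latency constraint can be made concrete with a little arithmetic. The sketch below is illustrative only: the inspection rate, network round-trip time, and processing time are assumed values chosen for the example, not measurements from any real system, and it assumes parts are handled strictly one at a time.

```python
def per_part_budget_ms(parts_per_minute: float) -> float:
    """Time available per part, in milliseconds, at a given inspection rate."""
    return 60_000.0 / parts_per_minute


def fits_budget(parts_per_minute: float, round_trip_ms: float, processing_ms: float) -> bool:
    """True if network round trip plus processing fits the per-part budget.

    Assumes no pipelining: each part must be judged before the next arrives.
    """
    return round_trip_ms + processing_ms <= per_part_budget_ms(parts_per_minute)


if __name__ == "__main__":
    rate = 3000  # hypothetical inspection rate, parts per minute
    print(f"Budget per part at {rate} parts/min: {per_part_budget_ms(rate):.1f} ms")
    # Hypothetical cloud path: 40 ms network round trip plus 10 ms processing.
    print("Cloud round trip fits:", fits_budget(rate, 40.0, 10.0))
    # Hypothetical local path: no network hop, 10 ms processing.
    print("Local processing fits:", fits_budget(rate, 0.0, 10.0))
```

Under these assumed numbers the per-part budget is too small for a cloud round trip, which is why latency, and not only throughput, decides whether such an inspection step can be moved off the factory floor.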
19. Cloud Computing for Image Processing
• With its promise to decentralize the computation required in
both image processing and machine vision systems,
cloud computing will impact applications that currently employ
local processing power and storage
• By locating processing and storage capabilities remotely,
image processing applications can be used from anywhere
and paid for by the user on as-needed or pay-per-use
business models
20. Industry leaders in Cloud – Image Processing domains
22. Google
https://cloud.google.com/vision/
• Cloud Vision offers both pretrained models via an API and the ability
to build custom models using AutoML Vision to provide flexibility
depending on your use case
• It quickly classifies images into thousands of categories, detects
individual objects and faces within images, and reads printed words
contained within images
• Build metadata on your image catalog, moderate offensive content,
or enable new marketing scenarios through image sentiment
analysis
23. Google
• AutoML Vision Beta helps developers with limited ML
knowledge train high-quality custom models
• After uploading and labeling images, AutoML Vision will
train a model that can scale as needed to adapt to
demands
• AutoML Vision offers higher model accuracy and faster
time to create a production-ready model
29. Characteristics
Insight from your images
• Easily detect broad sets of objects in your images, from
flowers, animals, or transportation to thousands of other
object categories commonly found within images
• Vision API improves over time as new concepts are
introduced and accuracy is improved
• With AutoML Vision, you can create custom models that
highlight specific concepts from your images
• This enables use cases ranging from categorizing product
images to diagnosing diseases
30. Characteristics
Extract text
• Optical Character Recognition (OCR)
enables you to detect text within your
images, along with automatic language
identification
• Vision API supports a broad set of languages
31. Characteristics
Power of the web
• Vision API uses the power of Google Image Search to
find topical entities like celebrities, logos, or news
events
• Millions of entities are supported, so you can be
confident that the latest relevant images are available
• Combine this with Visually Similar Search to find
similar images on the web
32. Characteristics
Content moderation
• Powered by Google SafeSearch, easily moderate
content and detect inappropriate content from
your crowd-sourced images
• Vision API enables you to detect different types of
inappropriate content, from adult to violent
content
34. Image search
Use Vision API and AutoML Vision to make images searchable across broad topics and
scenes, including custom categories.
https://cloud.google.com/solutions/image-search-app-with-cloud-vision/
36. Product Search
Find products of interest within images and visually search product catalogs using Cloud Vision API
37. Cloud Vision API features
• Label detection
• Web detection
• Optical character recognition
• Handwriting recognition (beta)
• Logo detection
• Object localizer (beta)
• Integrated REST API
• Landmark detection
• Face detection
• Content moderation
• ML Kit integration
• Product search (beta)
• Image attributes
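Several of these features can be requested together in a single REST call. The sketch below only builds the JSON body for the `images:annotate` endpoint and does not send it; authentication is omitted. The request shape and feature type names are written as I understand the public REST API and should be checked against the current Cloud Vision documentation. The helper name `build_annotate_request` and the placeholder image bytes are hypothetical.

```python
import base64
import json

# Endpoint for synchronous image annotation (request is not sent in this sketch).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes: bytes, feature_types, max_results: int = 10) -> dict:
    """Build the JSON body for one image and a list of feature types."""
    return {
        "requests": [
            {
                # Image content is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


if __name__ == "__main__":
    placeholder = b"not-a-real-image"  # hypothetical stand-in for image bytes
    body = build_annotate_request(
        placeholder,
        ["LABEL_DETECTION", "TEXT_DETECTION", "SAFE_SEARCH_DETECTION"],
    )
    print("POST", ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

Here label detection, OCR, and content moderation from the list above map to one feature entry each, so one upload of the image serves all three analyses.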
40. Video Intelligence
• Performs video analysis, classification, and labeling
• Enables searching through videos based on the extracted metadata
• Detects scene changes and filters explicit content
42. Microsoft Computer Vision
• Extract rich information from images to categorize and process
visual data
• Perform machine-assisted moderation of images
• Returns information about visual content found in an image
• Use tagging, domain-specific models, and descriptions in four
languages to identify content and label it with confidence
• Apply adult settings to help you detect potential adult content
• Identify image types and color schemes in pictures
44. Microsoft Computer Vision
• Analyze an image
• Read text in images
• Preview: Read handwritten text from images
• Recognize celebrities and landmarks
• Analyze video in near real-time
• Generate a thumbnail
47. Amazon Rekognition
https://aws.amazon.com/rekognition/
• You provide an image or video to the Rekognition API, and the service identifies
the objects, people, text, scenes, and activities, and detects any inappropriate
content
• Provides highly accurate facial analysis and facial recognition on images and video that
you provide
• You can detect, analyze, and compare faces for a wide variety of user verification, people
counting, and public safety use cases
• Simple, easy-to-use API that can quickly analyze any image or video file stored in
Amazon S3
• Amazon Rekognition is always learning from new data, and AWS is continually adding
new labels and facial recognition features to the service
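Analyzing an image that already sits in Amazon S3 can be sketched as follows. The function expects a client that behaves like `boto3.client("rekognition")` and calls its `detect_labels` operation. So that the sketch runs without AWS credentials, it is exercised here with `StubRekognitionClient`, a hypothetical stand-in whose canned labels are invented for illustration. The bucket and object names are also placeholders.

```python
def detect_labels_in_s3(client, bucket: str, key: str, min_confidence: float = 80.0):
    """Return (label name, confidence) pairs for an image stored in S3.

    `client` is expected to behave like boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


class StubRekognitionClient:
    """Hypothetical stand-in so the sketch runs offline; not part of any SDK."""

    def detect_labels(self, Image, MaxLabels, MinConfidence):
        # Invented labels, filtered the way the real service filters by confidence.
        canned = [
            {"Name": "Flood", "Confidence": 97.5},
            {"Name": "Boat", "Confidence": 62.0},
        ]
        kept = [item for item in canned if item["Confidence"] >= MinConfidence]
        return {"Labels": kept[:MaxLabels]}


if __name__ == "__main__":
    # Placeholder bucket and key; replace the stub with a real client to call AWS.
    print(detect_labels_in_s3(StubRekognitionClient(), "example-bucket", "photos/scene.jpg"))
```

Passing the client in as a parameter keeps the analysis logic testable without network access; swapping in a real client is the only change needed to reach the service.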
57. Clarifai
https://clarifai.com/
• Clarifai Predict, Search and Create make it easy to integrate
Computer Vision into your existing product or technology
• Whether you run an online marketplace, an e-commerce
store, a content management platform, or a real-estate
company, Clarifai’s computer vision AI platform powers your
business with the goal of maximizing your profits or
understanding user activity
61. Genomic data processing with Cloud
• Dealing with large genomic data on a limited computing
resource has been an inevitable challenge in life science
• Bioinformatics applications have required high performance
computation capabilities for next-generation sequencing
(NGS) data and the human genome sequencing data with
single nucleotide polymorphisms (SNPs)
• Cloud computing platforms have been widely adopted to deal
with the large data sets with parallel processing tools
62. Genomic data processing with Cloud
• Biomedical research has become a digital data–intensive
endeavor, relying on secure and scalable computing,
storage, and network infrastructure
• For certain types of biomedical applications, cloud
computing has emerged as an alternative to locally
maintained traditional computing approaches
63. Examples of cloud types, service models, workflows, and platforms for biomedical applications
Navale V, Bourne PE (2018) Cloud computing applications for biomedical science: A perspective.
PLOS Computational Biology 14(6): e1006144. https://doi.org/10.1371/journal.pcbi.1006144
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006144
65. BLAST
Tool for biomedical research
• A BLAST server image can be hosted on AWS, Azure, and GCP public clouds to allow
users to run stand-alone searches with BLAST
• Users can also submit searches using BLAST through the National Center for
Biotechnology Information (NCBI) API to run on AWS and Google Compute Engine
• Azure can be leveraged to execute large BLAST sequence matching tasks within
reasonable time limits
– Azure enables users to download sequence databases from NCBI, run different BLAST
programs on a specified input against the sequence databases, and generate visualizations
from the results for easy analysis
– Azure also provides a way to create a web UI for scheduling and tracking the BLAST match
tasks, visualizing results, managing users, and performing basic tasks
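Submitting a search through the NCBI interface is a two-step exchange: one request submits the query and returns a request ID, and a later request fetches the results for that ID. The sketch below only builds the form parameters and sends nothing. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `RID`, `FORMAT_TYPE`) are written as I recall them from the NCBI BLAST URL API and should be verified against its documentation; the query sequence and request ID are made-up placeholders.

```python
from urllib.parse import urlencode, parse_qs

# NCBI BLAST endpoint (no request is sent in this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(query: str, program: str = "blastn", database: str = "nt") -> str:
    """Form-encoded parameters for submitting a search."""
    return urlencode(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    )


def build_fetch_params(request_id: str, format_type: str = "XML") -> str:
    """Form-encoded parameters for fetching results of an earlier submission."""
    return urlencode({"CMD": "Get", "RID": request_id, "FORMAT_TYPE": format_type})


if __name__ == "__main__":
    # Placeholder nucleotide sequence and request ID, for illustration only.
    print(BLAST_URL + "?" + build_submit_params("ACGTACGTACGTACGT"))
    print(BLAST_URL + "?" + build_fetch_params("EXAMPLE_RID"))
    print(parse_qs(build_submit_params("ACGT")))
```

A real client would also poll for completion between the two steps and respect the usage limits that NCBI publishes for automated access.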
66. CloudAligner and more…
• CloudAligner is a fast and full-featured MapReduce-based tool for
sequence mapping, designed to be able to deal with long sequences
• CloudBurst can provide highly sensitive short read mapping with
MapReduce
• High-throughput sequencing analyses can be carried out by the
Eoulsan package integrated in a cloud IaaS environment
• For whole genome resequencing analysis, Crossbow is a scalable
software pipeline
– Crossbow combines Bowtie, an ultrafast and memory-efficient short read
aligner, and SOAPsnp, a genotyper, in an automatic parallel pipeline that
can run in the cloud
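The idea behind MapReduce-based read mapping can be shown with a toy, in-memory sketch. It is not the algorithm of CloudAligner, CloudBurst, or Crossbow: it handles exact matches only, uses a very short seed, and simulates the shuffle phase with a dictionary. The reference, reads, and all function names are invented for illustration.

```python
from collections import defaultdict

K = 4  # seed length; kept tiny for the example, real tools use longer seeds


def map_reference(reference):
    """Map step: emit (k-mer, ("ref", position)) for every k-mer in the reference."""
    for i in range(len(reference) - K + 1):
        yield reference[i:i + K], ("ref", i)


def map_read(read_id, read):
    """Map step: emit the first k-mer of a read as its seed."""
    yield read[:K], ("read", read_id)


def shuffle(pairs):
    """Shuffle step: group values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, reference, reads):
    """Reduce step: for one seed, check each read against each reference hit."""
    ref_positions = [v[1] for v in values if v[0] == "ref"]
    read_ids = [v[1] for v in values if v[0] == "read"]
    for read_id in read_ids:
        read = reads[read_id]
        for pos in ref_positions:
            # Extend the seed: accept only if the whole read matches exactly.
            if reference[pos:pos + len(read)] == read:
                yield read_id, pos


def map_reads(reference, reads):
    """Run map, shuffle, and reduce in sequence; return sorted (read, position) hits."""
    pairs = list(map_reference(reference))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read))
    hits = []
    for values in shuffle(pairs).values():
        hits.extend(reduce_seed(values, reference, reads))
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTTGCAACGTAGGC"  # made-up reference sequence
    reads = {"r1": "TTGCAA", "r2": "ACGTAG", "r3": "GGGGGG"}  # made-up reads
    print(map_reads(reference, reads))
```

Because each seed is reduced independently, the reduce calls can run on different machines, which is what lets this style of mapping scale out in a cloud environment.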
67. Workflows and platforms
• Integration of genotype, phenotype, and clinical data is
important for biomedical research
• Biomedical platforms can provide an environment for
establishing an end-to-end pipeline for data acquisition,
storage, and analysis
68. Galaxy
• Galaxy, an open source, web-based platform, is used for
data–intensive biomedical research
• For large scale data analysis, Galaxy can be hosted in cloud
IaaS
• Reliable and highly scalable cloud-based workflow systems
for next-generation sequencing analyses have been achieved
by integrating the Galaxy workflow system with Globus
Provision
69. Galaxy
• The Galaxy software framework is an open-source application
• Its goal is to develop and maintain a system that enables researchers
without informatics expertise to perform computational analyses through
the web
• A user interacts with Galaxy through the web by uploading and analyzing
the data
• Galaxy interacts with underlying computational infrastructure (servers that
run the analyses and disks that store the data) without exposing it to the
user
70. Galaxy
Galaxy is a web application that allows processing of large datasets using powerful
private/public/hybrid cloud infrastructure that the user never directly interacts with
71. BPDC
• The Bionimbus Protected Data Cloud (BPDC) is a private cloud-based
infrastructure for managing, analyzing, and sharing large amounts of
genomics and phenotypic data in a secure environment, which was used
for gene fusion studies
• BPDC is primarily based on OpenStack, open source software that
provides tools to build cloud platforms with a service portal for a single
point of entry and a single sign-on for various available BPDC resources
• Using BPDC, data analysis for the acute myeloid leukemia (AML)
resequencing project was rapidly performed to identify somatic variants
expressed in adverse-risk primary AML samples
72. AWS Genomics in the Cloud
• AWS allows you to simplify and securely scale genomic analysis
• AWS provides an ecosystem of partners for tools and datasets that are
prepared for your sensitive data and scalable workloads
• Efficiently and dynamically store and compute your data, collaborate
with peers, and integrate findings into clinical practice
• You can also address security and compliance concerns in many
ways, such as encrypting your data at rest and in transit or de-identifying
patient information
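One common way to de-identify records before they leave a clinical system is to drop direct identifiers and replace the patient ID with a keyed hash. The sketch below illustrates that general technique with Python's standard library. It is not an AWS feature or a compliance recipe: the field names, the sample record, and the key are hypothetical, and real de-identification has to follow the applicable regulations and keep the key in a managed secret store.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (hex string)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify_record(record: dict, secret_key: bytes,
                      direct_identifiers=("name", "address")) -> dict:
    """Drop direct identifiers and pseudonymize the patient ID."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"example-key-do-not-use"  # hypothetical key, for illustration only
    record = {  # invented sample record
        "patient_id": "P001",
        "name": "Example Patient",
        "address": "Example Street",
        "variant": "example-variant",
    }
    print(deidentify_record(record, key))
```

The same patient ID always maps to the same pseudonym under a given key, so records can still be joined across datasets without exposing the original identifier.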
74. Genomic ancestry inference with deep
learning – Google Cloud Platform
• 1000 Genomes dataset
• Simons Genome Diversity Project
– hosts complete human genome sequences from more than one
hundred diverse human populations
– The data is stored on Google Cloud Storage and Google BigQuery.
• Model building approach
– First, a machine learning model needs to be trained, here using
TensorFlow
– Principles of neural networks
https://cloud.google.com/blog/products/gcp/genomic-ancestry-inference-with-deep-learning
75. Genomic ancestry inference with deep learning (video)
https://cloud.google.com/blog/products/gcp/genomic-ancestry-inference-with-deep-learning
76. Conclusions
Cloud usage, from large-scale genomics analysis to
remote monitoring of patients to molecular diagnostics
work in clinical laboratories, has advantages but also
potential drawbacks
A first step is the determination of what type of cloud
environment best fits the application and then whether it
represents a cost-effective solution
77. Conclusions
The ubiquitous nature of clouds raises questions
regarding security and accessibility, particularly as it
relates to geopolitical boundaries
Cost benefits of using clouds over other compute
environments need to be carefully assessed as they
relate to the size, complexity, and nature of the task
78. Conclusions
For example, a simple, small prototype can be
tested in a cloud environment and immediately
scaled up to handle very large data
On the other hand, there is a cost associated
with such usage, particularly in extricating the
outcomes of the computation
79. What is clear, however, is that clouds are a
growing part of the biomedical
computational ecosystem and are here to
stay
80. Dr Ganesh Neelakanta Iyer
ni_amrita@cb.amrita.edu
ganesh.vigneswara@gmail.com
GANESHNIYER