This presentation examines the collision of big data in biomedical imaging: population image data from millions of hardware devices and thousands of software tools creates a perfect storm for computational neuroimaging and digital pathology. It shows how terabytes of raw imaging data and petabytes of derived analytical results are being generated by sources such as digital pathology and neuroimaging studies, and argues that managing and analyzing this large, multi-modal medical imaging data requires scalable big data techniques and architectures.
On March 23, 2016, Prof. Henning Müller (HES-SO Valais-Wallis and the Martinos Center) presented "Medical image analysis and big data evaluation infrastructures" at Stanford Medicine.
Machine Learning for Medical Image Analysis: What, Where and How?, by Debdoot Sheet
Career advice for EECS (electrical, electronics, and computer science) graduates interested in machine vision, along with some advice on a PhD career in medical image analysis.
Digital Pathology at Johns Hopkins
Practical Research and Clinical Considerations
Alexander Baras
Presented at the Digital Pathology Congress: USA. For more information visit: www.global-engage.com.
On April 11, 2016, Prof. Henning Müller (HES-SO Valais-Wallis and the Martinos Center) presented "Challenges in medical imaging and the VISCERAL model" at the National Cancer Institute in Washington.
Presentation by Prof. Dr. Henning Müller.
Overview:
- Medical image retrieval projects
- Image analysis and 3D texture modeling
- Data science evaluation infrastructures (ImageCLEF, VISCERAL, EaaS – Evaluation as a Service)
- What comes next?
Information fusion and algorithm training framework objectives in the ICT4Life H2020 project. Presented at IEEE Healthcom'16 within the Project Alfred workshop in Munich (14-17 September 2016).
These slides discuss digital pathology: how it works, and its role, benefits, and requirements.
Digital Pathology and Its Importance as an Omics Data Layer, by Yves Sucaet
Bioinformatics and pathology are obvious scientific partners. Bioinformatics often takes place at the most basic (almost chemical, or even physical) level of life, but many of its procedures for obtaining data are destructive. Pathology, on the other hand, works at a much coarser level of data acquisition (usually where the physical properties of visible light end), but has the advantage of being rooted in the tradition of medicine. The traditional paradigm of pathology is "tissue is the issue". Morphology (exactly the component that often gets overlooked in bioinformatics) plays a large role and helps millions of patients around the world each year. Pathology is proven technology, while bioinformatics remains limited to niche applications.
With the development of whole slide imaging technology some twenty years ago, digital pathology became possible. Observations that used to be for the eyes of the pathologist only, could now be captured and translated into high-resolution pixels, and studied by and communicated to many. Many began to dream of automated tissue evaluation systems and AI-pathology, some even going as far as to suggest the replacement of the pathologist by intelligent computer systems.
Meanwhile in several areas of bioinformatics, new limits are being hit. Yes, we can do high-throughput experiments, but noisy datasets are often the results, (inter- and even intra-observer) replicability is difficult, and statistics only offer limited relief.
The goal of this introductory lecture is to highlight the problems as well as the opportunities for both fields of study, and to show how an exchange of experiences and (at a later stage) an integration of techniques can close the scientific gap that still exists in a great many areas.
There is no lack of pathology-centric workshops that offer insights into the world of algorithms. With the CPW event however, we take another approach. We want to bring together the most advanced groups in digital pathology, with the bioinformatics community, to explore the opportunities that exist on both sides of the fence.
We start by explaining the basic data types that are introduced by digital pathology. We also explain where they come from, and why this presents unique challenges when it comes to data mining and image analysis. Finally, we introduce PMA.start, a free software environment that can be used to universally gain access to digital pathology (imaging) data.
Bioinformatics groups can help quantify, model, and reduce morphological whole tissue data. Pathologists can help interpret and explain heterogeneous high-throughput datasets. And the first seeds of such collaboration can be planted right here, in Athens.
Lecture on the role of information systems within healthcare, addressing the various types of information systems and their respective benefits. PACS maturity is also introduced as a concept.
Theoretical principles and practical implications of Picture Archiving and Communication Systems (PACS). I also introduce the concept of PACS maturity, strategic planning, and business/IT alignment in radiology.
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana (Medical Imaging Bank of the Valencia Region), by Maria de la Iglesia
According to Hal Varian (an expert in microeconomics and the economics of information and, since 2002, Chief Economist at Google): "In the coming years, the most attractive job will be that of the statistician. The ability to collect data, understand it, process it, extract its value, visualize it, and communicate it will all be important skills in the coming decades. We now have free and ubiquitous data; what is still missing is the ability to understand that data."
Europe needs a clear strategy for leveraging the Big Data economy. Our objectives are to work at the technical, business, and policy levels, shaping the future by positioning Big Data in Horizon 2020 and bringing the necessary stakeholders into a sustainable, industry-led initiative that will greatly enhance EU competitiveness by taking full advantage of Big Data technologies.
Baptist Health: Solving Healthcare Problems with Big Data, by MapR Technologies
Editor’s Note: Download the complimentary MapR Guide to Big Data in Healthcare for more information: https://mapr.com/mapr-guide-big-data-healthcare/
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this webinar to hear how Baptist Health is using big data and advanced analytics to address a myriad of healthcare challenges, from patient to payer, through their consumer-centric approach.
MapR Technologies will cover broader big data healthcare trends and production use cases that demonstrate how to converge data and compute power to deliver data-driven healthcare applications.
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
Real World Application of Big Data in Data Mining Tools, ijsrd.com
The main aim of this paper is to study the notion of Big Data and its application in data mining tools such as R, Weka, RapidMiner, KNIME, and Mahout. We are awash in a flood of data today: in a broad range of application areas, data is being collected at unmatched scale. Decisions that were previously based on guesswork, or on painstakingly constructed models of reality, can now be made based on the data itself. Such Big Data analysis now drives nearly every aspect of modern society, including mobile services, retail, manufacturing, financial services, life sciences, and the physical sciences. The paper focuses on different types of data mining tools and their use in big data knowledge discovery.
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of Ireland Galway, European Data Forum
BIG - NESSI Networking Session, Talk by Edward Curry, National University of Ireland Galway at the European Data Forum 2014, 20 March 2014 in Athens, Greece: The Big Data Value Chain.
These practice guidelines are for those who manage big-data and big-data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders responsible for developing agency capability in the area of big data and big data analytics.
For agencies not currently using big data or big data analytics, this document may assist strategic planners, business teams, and data analysts in considering the value of big data to current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or perform big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information on technical aspects of big data and big data analytics, including achieving best practice with modeling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence.
1. BIMCV: The Perfect "Big Data" Storm.
Collision of Petabytes of Population Image Data, Millions of Hardware Devices and Thousands of Software Tools.
National e-Infrastructures
Maria de la Iglesia, PhD. http://ceib.san.gva.es
2. OVERVIEW
• Big Data
• Strategic Vision of Big Data in EU
• Strategic Vision of Big Data in US
• Big Data in Neuroimaging
• Population Imaging
• EuroBioimaging – BIMCV – Valencia Node
• Neuroimaging
• Relevant facts
5. Big data techniques and technologies
• Techniques for analyzing big data:
– A/B testing
– Association rule learning
– Classification
– Cluster analysis
– Crowdsourcing
– Data fusion and data integration (signal processing, natural language processing)
– Data mining
6. Big data techniques and technologies
• Techniques for analyzing big data (continued):
– Ensemble learning
– Genetic algorithms
– Machine learning
– Natural language processing (NLP)
– Neural networks (pattern recognition)
– Network analysis
7. Big data techniques and technologies
• Techniques for analyzing big data (continued):
– Optimization
– Pattern recognition
– Predictive modeling
– Regression
– Signal processing (time series analysis, data fusion)
– Spatial analysis
– Statistics
8. Big data techniques and technologies
• Big Data technologies:
– Big Table: a proprietary distributed database system built on the Google File System; the inspiration for HBase.
– Business intelligence (BI): BI tools are often used to read data that have been previously stored in a data warehouse or data mart.
– Cassandra: an open source (free) database management system designed to handle huge amounts of data on a distributed system; originally developed at Facebook, it is now managed as a project of the Apache Software Foundation.
9. Big data techniques and technologies
• Big Data technologies (continued):
– Cloud computing
– Data mart
– Data warehouse (loaded using ETL: extract, transform, and load)
– Distributed system
– Dynamo
– ETL
– Google File System
– Hadoop
– HBase
– MapReduce
– Mashup
10. Big data techniques and technologies
• Big Data technologies (continued):
– Non-relational database
– R
– Relational database
– Semi-structured data
– SQL
– Stream processing
– Structured data
– Unstructured data
– Visualization
11. Big data techniques and technologies
• Big Data technologies (continued):
– Visualization:
• Tag cloud
• Clustergram
• History flow
• Spatial information flow
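Several entries in the lists above (MapReduce, Hadoop) share one programming model: map emits key-value pairs, the framework groups them by key, and reduce aggregates each group. Below is a minimal sketch of that model in plain Python; it is illustrative only and does not use any Hadoop API.

```python
from collections import defaultdict
from itertools import chain

# Illustrative MapReduce-style word count (pure Python, no Hadoop required).

def map_phase(document):
    # Emit (key, value) pairs: one (word, 1) per word.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Group values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate the values for each key.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data in medical imaging", "population imaging data"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(shuffle_phase(pairs)))  # {'big': 1, 'data': 2, ...}
```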
18. How Is the European Union Responding to Big Data?
19. Panel: Personalized Medicine in the Era of Big Data
EHTEL Symposium
Tapani Piha, Head of Unit for eHealth and Technology Assessment, European Commission, DG Health and Consumers, Health Systems and Products
20. How does Big Data link to Personalized Medicine?
• Big Data refers to a collection of data sets so large and complex that it is impossible to process them with the usual databases and tools
• The data is gathered (most of the time) by people just living their lives (e.g., using mobile phones, the internet, driving cars, paying with banking cards)
• Big data is used in the private sector (e.g., Google) and in the public sector (e.g., the NSA)
21. Big Data use in public health & health care?
• Research: "In the last five years, more scientific data has been generated than in the entire history of mankind" [1]
• Health care: more evidence about personalized treatment, better selection of the right provider, better-equipped health care providers (e.g., IBM's Watson)
• Public health: better personalized lifestyle information for citizens, earlier detection of epidemics, and more and quicker access to epidemiological information
[1] 2012, Winston Hide, The Promise of Big Data, Harvard Public Health
22. Commission action on Big Data
• BIG project: a multi-sectorial initiative started in 2011 to promote the adoption of earlier waves of big data technology and contribute to EU competitiveness
• Green paper on mHealth: to assess the market and further clarify what is needed in the legal framework concerning mHealth
• Study in the health programme: to assess the usage and adoption of big data programs for (public) health systems within the EU
25. How Is the U.S. Responding?
National Institute of Standards and Technology (NIST)
NIST is an agency of the U.S. Department of Commerce.
To search federal science and technology web sites, including online databases, see: science.org
NIST program questions: Public Inquiries Unit: (301) 975-NIST (6478), Federal Relay Service (800) 877-8339 (TTY).
NIST, 100 Bureau Drive, Stop 1070, Gaithersburg, MD 20899-1070
Technical website questions: DO-webmaster@nist.gov
26. NIST Big Data Public Working Group
Big Data PWG Overview Presentation
September 30, 2013
Wo Chang, NIST
Robert Marcus, ET-Strategies
Chaitanya Baru, UC San Diego
27. Agenda
• Why Big Data? Why NIST?
• NBD-PWG Charter
• Overall Workplan
• Subgroups Charter and Deliverables
– Use Case and Requirements SG
– Definitions and Taxonomies SG
– Reference Architecture SG
– Security and Privacy SG
– Technology Roadmap SG
• Next Steps
28. Why Big Data? Why NIST?
• Why Big Data? There is broad agreement among commercial, academic, and government leaders about the remarkable potential of "Big Data" to spark innovation, fuel commerce, and drive progress.
• Why NIST? (a) A recommendation from the January 15-17, 2013 Cloud/Big Data Forum, and (b) a lack of consensus on some important, fundamental questions that is confusing potential users and holding back progress. Questions such as:
– What are the attributes that define Big Data solutions?
– How is Big Data different from the traditional data environments and related applications that we have encountered thus far?
– What are the essential characteristics of Big Data environments?
– How do these environments integrate with currently deployed architectures?
– What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions?
The NBD-PWG is being launched to address these questions and is charged with developing consensus definitions, taxonomies, a secure reference architecture, and a technology roadmap for Big Data that can be embraced by all sectors.
29. NBD-PWG Deliverables
Working Drafts version 1.0 for
1. Big Data Definitions
2. Big Data Taxonomies
3. Big Data Requirements
4. Big Data Security and Privacy Requirements
5. Big Data Architectures White Paper Survey
6. Big Data Reference Architectures
7. Big Data Security and Privacy Reference Architectures
8. Big Data Technology Roadmap
31. Big Data Ecosystem in One Sentence
• Use Clouds running Data Analytics collaboratively processing Big Data to solve problems in X-Informatics (or e-X)
• X = Astronomy, Biology, Biomedicine, Business, Chemistry, Climate, Crisis, Earth Science, Energy, Environment, Finance, Health, Intelligence, Lifestyle, Marketing, Medicine, Pathology, Policy, Radar, Security, Sensor, Social, Sustainability, Wealth and Wellness, with more fields (physics) defined implicitly
• Spans industry and science (research)
• Education: Data Science; see recent New York Times articles
• http://datascience101.wordpress.com/2013/04/13/new-york-times-data-science-articles/
33. Big Data Definition
• There is more consensus on the definition of Data Science than on that of Big Data
• Big Data refers to digital data volume, velocity, and/or variety that:
– Enable novel approaches to frontier questions previously inaccessible or impractical using current or conventional methods; and/or
– Exceed the storage capacity or analysis capability of current or conventional methods and systems; and
– Differentiate by storing and analyzing population data rather than sample sizes
– Need management requiring scalability across coupled horizontal resources
38. Electronic Medical Record (EMR) Data I
• Application: Large national initiatives around health data are emerging. They include developing a digital learning health care system to support increasingly evidence-based clinical decisions with timely, accurate, and up-to-date patient-centered clinical information; using electronic observational clinical data to efficiently and rapidly translate scientific discoveries into effective clinical treatments; and electronically sharing integrated health data to improve healthcare process efficiency and outcomes. These key initiatives all rely on high-quality, large-scale, standardized, aggregate health data. One needs advanced methods for normalizing patient, provider, facility, and clinical concept identification within and among separate health care organizations, to enhance models for defining and extracting clinical phenotypes from non-standard discrete and free-text clinical data using feature selection, information retrieval, and machine-learning decision models. One must leverage clinical phenotype data to support cohort selection, clinical outcomes research, and clinical decision support.
[Use-case tags: PP, Fusion, S/Q, Index; parallelism: streaming over EMRs (a set per person), viewers]
39. Electronic Medical Record (EMR) Data II
• Current Approach: Clinical data from more than 1,100 discrete logical, operational healthcare sources in the Indiana Network for Patient Care (INPC), the nation's largest and longest-running health information exchange. This covers more than 12 million patients and more than 4 billion discrete clinical observations; > 20 TB of raw data; between 500,000 and 1.5 million new real-time clinical transactions added per day.
• Futures: Teradata, PostgreSQL, and MongoDB supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural language processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancer.
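To make the tf-idf feature scoring mentioned above concrete, here is a minimal sketch with scikit-learn; the clinical note strings are invented placeholders, not INPC data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical free-text clinical notes (placeholders, not real records).
notes = [
    "patient reports polyuria and elevated fasting glucose",
    "echocardiogram shows reduced ejection fraction, dyspnea on exertion",
    "elevated glucose, started metformin for type 2 diabetes",
]

# Score terms by tf-idf: frequent in one note, rare across the corpus.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(notes)

# Top-weighted terms per note could serve as candidate clinical features.
terms = vectorizer.get_feature_names_out()
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(f"note {i}: {[t for _, t in top]}")
```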
40. Pathology Imaging / Digital Pathology I
• Application: Digital pathology imaging is an emerging field in which the examination of high-resolution images of tissue specimens enables novel and more effective ways of diagnosing disease. Pathology image analysis segments massive numbers (millions per image) of spatial objects such as nuclei and blood vessels, represented by their boundaries, along with many image features extracted from these objects. The derived information is used for many complex queries and analytics to support biomedical research and clinical diagnosis.
[Use-case tags: MR, MRIter, PP, Classification; parallelism: streaming over images]
41. Pathology Imaging / Digital Pathology II
• Current Approach: 1 GB raw image data + 1.5 GB analytical results per 2D image. MPI for image analysis; MapReduce + Hive with spatial extension on supercomputers and clouds. GPUs are used effectively. Figure 3 of section 2.12 shows the architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging.
• Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep "map" of human tissues for next-generation diagnosis. 1 TB raw image data + 1 TB analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.
[Figure: architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging]
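To give a flavor of the spatial queries such a system parallelizes, here is a minimal single-machine sketch using the shapely geometry library on invented nucleus boundaries; it illustrates only the query shape and is not the Hadoop-GIS API.

```python
from shapely.geometry import Polygon, box

# Invented nucleus boundaries from segmentation (illustrative only).
nuclei = [
    Polygon([(2, 2), (4, 2), (4, 4), (2, 4)]),
    Polygon([(10, 10), (12, 10), (12, 12), (10, 12)]),
    Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
]

# Spatial window query: which nuclei fall inside a region of interest?
roi = box(0, 0, 8, 8)
hits = [i for i, n in enumerate(nuclei) if roi.contains(n)]
print(hits)                      # [0, 2]

# Per-object features of the kind stored alongside the boundaries.
print([n.area for n in nuclei])  # [4.0, 4.0, 4.0]
```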
42. 18: Computational Bioimaging
• Application: Data delivered from bioimaging is increasingly automated, higher resolution, and multi-modal. This has created a data analysis bottleneck that, if resolved, can advance biosciences discovery through Big Data techniques.
• Current Approach: The current piecemeal analysis approach does not scale to a situation where a single scan on emerging machines is 32 TB and medical diagnostic imaging is annually around 70 PB even excluding cardiology. One needs a web-based one-stop shop for high-performance, high-throughput image processing for producers and consumers of models built on bio-imaging data.
• Futures: The goal is to solve that bottleneck with extreme-scale computing, with community-focused science gateways to support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Use ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software.
[Use-case tags: MR, MRIter?, PP, Classification; parallelism: streaming over images]
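The segmentation step in that workflow can be sketched in a few lines; the example below uses scikit-image thresholding and connected-component labeling on a synthetic image, standing in for the far more advanced tools named above.

```python
import numpy as np
from skimage import filters, measure

# Synthetic grayscale "micrograph": two bright blobs on a dark background.
img = np.zeros((64, 64))
img[10:20, 10:20] = 1.0
img[40:55, 30:45] = 0.8
img += 0.05 * np.random.default_rng(0).random(img.shape)

# Segment regions of interest: global Otsu threshold, then label components.
mask = img > filters.threshold_otsu(img)
labels = measure.label(mask)

# Extract simple per-object features (area, centroid) for classification.
for region in measure.regionprops(labels):
    print(region.label, region.area, region.centroid)
```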
43. 22: Statistical Relational Artificial Intelligence for Health Care
• Application: The goal of the project is to analyze large, multi-modal medical data including different data types such as imaging, EHR, genetic, and natural-language data. The approach employs relational probabilistic models that can handle rich relational data and model uncertainty using probability theory. The software learns models from multiple data types and can possibly integrate the information and reason about complex queries. Users can provide a set of descriptions, for instance MRI images and demographic data about a particular subject, and then query for the onset of a particular disease (say Alzheimer's); the system will provide a probability distribution over the possible occurrence of this disease.
• Current Approach: A single server can handle a test cohort of a few hundred patients with associated data of hundreds of GB.
• Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (images, genetic sequences, etc.), which complicates analysis. A major challenge lies in aligning and merging the data from multiple sources into a form that is useful for combined analysis. Another issue is that sometimes a large amount of data is available about a single subject while the number of subjects is not very high (i.e., data imbalance), which can result in learning algorithms picking up random correlations between the multiple data types as important features.
[Use-case tags: MRIter, EGO; parallelism: streaming over people and their EMRs]
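A drastically simplified stand-in for the disease-onset query described above: a naive Bayes classifier over two invented features, returning a probability distribution for a new subject. Real relational probabilistic models are far richer; this sketch only shows the shape of the query.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented training data: [hippocampal volume (normalized), age].
X = np.array([[0.9, 62], [0.85, 70], [0.6, 75], [0.55, 80],
              [0.95, 58], [0.5, 78], [0.88, 66], [0.58, 82]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])  # 1 = disease onset

model = GaussianNB().fit(X, y)

# Query: probability distribution over onset for a new subject.
subject = np.array([[0.62, 74]])
print(model.predict_proba(subject))  # [[P(no onset), P(onset)]]
```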
44. The P4 Paradigm of Medicine
PREDICTIVE, PREVENTIVE, PERSONALIZED, PARTICIPATORY
45. The 4V Paradigm of Big Data in Medicine
VOLUME, VARIETY, VELOCITY, VALUE
48. Human neuroimaging is now, officially, a "big data" science
• Among the examples of "big data" featured at the meeting was, no surprise, human neuroimaging
• The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative
• Initiatives surrounding large-scale brain mapping are also underway in Europe: http://www.humanbrainproject.eu
• Organization for Human Brain Mapping (OHBM; http://www.humanbrainmapping.org)
49. How Big is "Big"?
• While size is a relative term when it comes to data, medical imaging applied to the brain comes in a variety of forms, each generating differing types and amounts of information about neural structure and/or function.
• Data from NeuroImage indicate that since 1995 the amount of data collected has doubled approximately every 26 months. At this rate, by 2015 the amount of acquired neuroimaging data alone, discounting header information and before more files are generated during data processing and statistical analysis, may exceed an average of 20 GB per published research study.
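That doubling claim is easy to sanity-check with quick arithmetic. The sketch below compounds a 26-month doubling time from 1995 to 2015; the 34 MB baseline for 1995 is a hypothetical value chosen only so the result lands near the slide's 20 GB figure.

```python
# Back-of-the-envelope: 26-month doubling of per-study data volume.
# The 1995 baseline (34 MB/study) is hypothetical, chosen to match
# the slide's projection of ~20 GB/study by 2015.
baseline_mb = 34.0
months = (2015 - 1995) * 12
doublings = months / 26          # ~9.2 doublings over 20 years
projected_gb = baseline_mb * 2 ** doublings / 1024
print(f"{doublings:.1f} doublings -> ~{projected_gb:.0f} GB per study")
# 9.2 doublings -> ~20 GB per study
```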
50. Growth of Neuroimaging Study Size
[Figure: observed, expected, and predicted per-study size in megabytes, plotted from 1990 to 2020 on a 0-20,000 MB scale]
Van Horn and Toga (in press), Brain Imaging and Behavior
52. Big Neuroimaging + Big Genetics = REALLY Big Data
• Obtaining genome-wide sets of single nucleotide polymorphism (SNP) information is becoming routine, and the costs of full genomic sequencing are rapidly becoming affordable.
• Next Generation Sequencing (NGS) methods are coming to major brain imaging studies such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Weiner, Veitch et al. 2012), with its initially available sample of 832 subjects.
• As the bond between neuroimaging and genomics grows tighter, with both areas growing at incredible rates, disk storage and unique data compression techniques become critical.
53. Multisite Consortia and Data Sharing
• Examples of multisite neuroimaging efforts can be found in the ubiquitous application of neuroimaging in health, but also in devastating illnesses such as:
• Parkinson's (Evangelou, Maraganore et al. 2009)
• psychiatric disorders (Schumann, Loth et al. 2010)
• the mapping of human brain connectivity (Toga, Clark et al. 2012)
• databases of aging and aging-related diseases, large-scale autism research (NDAR; Hall, Huerta et al. 2012), and the Federal Interagency Traumatic Brain Injury Research (FITBIR; Bushnik and Gordon 2012)
54. Multisite Consortia and Data Sharing
• The various "grass roots" collections of resting-state fMRI data maintained as part of the "1000 Functional Connectomes" project: http://fcon_1000.projects.nitrc.org/ (see Biswal, Mennes et al. 2010)
• Task-based OpenfMRI, http://www.openfmri.org (Poldrack, Barch et al. 2013), is another notable example
56. The Role of Cyberinfrastructure
• Individual desktop computers are no longer suitable for analyzing potentially petabytes' worth of brain and genomics data at a time.
• The National Science Foundation (NSF) has made major investments in the computer architecture needed for physics, weather, and geological data.
• E.g., XSEDE, https://www.xsede.org/, and the Open Science Grid, https://www.opensciencegrid.org
57. The Role of Cyberinfrastructure
• The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://www.nitrc.org)
• The International Neuroinformatics Coordinating Facility (INCF; http://incf.org)
These have begun to deploy local clusters with Amazon EC2 server technology toward this goal, but a larger effort will be required, involving dedicated processing centers or distributed grids of linked compute centers.
58. Many Thousands of Software Tools
• Acquisition, processing, storage/DB, service, migration, mining, analysis, visualization, annotation, ... "(data-driven) process understanding"
• Biomedical imaging
– There are hundreds of different types of image processing algorithms and filters
– For each type of process there may be dozens of concrete software products (instance implementations)
• (Example) Neuroimaging
– NITRC lists > 500 openly shared software tools
– For each openly shared tool there may be dozens of proprietary or less commonly used analogues
59. Millions of Dispersed Hardware Devices
• Cisco: "By the end of 2012, the number of mobile-connected devices will exceed the number of people on Earth"
• There will be over 10 billion mobile-connected devices in 2016; i.e., 1.3 mobile devices per capita
– These include phones, tablets, laptops, handheld gaming consoles, e-readers, in-car entertainment systems, digital cameras, and "machine-to-machine" modules
• DBs, clients, servers, compute nodes, web services, interfaces, ...
• Solution ...
Dinov et al., BMC 2011
60. [Figure: a typical fMRI analysis workflow (Van Horn et al., Nature Neuro, 2004). Raw fMRI time series undergo slice-timing adjustment, image spatial alignment, image smoothing (Gaussian spatial filtering), and functional-structural co-registration to a high-resolution anatomical image; spatial normalization maps the data to a standardized brain atlas template. Statistical modeling (e.g., a GLM with the experimental design matrix) produces statistical results maps, graphical overlays, and a table of statistically significant voxels in atlas-space coordinates. Study metadata: scanner protocols, subject demographics, stimulus timing, etc.]
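The statistical-modeling step of that pipeline can be made concrete with a minimal per-voxel GLM fit: ordinary least squares of one synthetic voxel time series against a boxcar design matrix, a sketch of what SPM-style packages do at every voxel.

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans = 100

# Design matrix: a boxcar task regressor plus an intercept column.
task = np.tile(np.repeat([0.0, 1.0], 10), 5)          # off/on blocks
X = np.column_stack([task, np.ones(n_scans)])

# Synthetic voxel time series: responds to the task, plus noise.
beta_true = np.array([2.5, 100.0])
y = X @ beta_true + rng.normal(0, 1.0, n_scans)

# Ordinary least squares GLM fit: beta_hat = (X'X)^-1 X'y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# t-statistic for the task effect (contrast c = [1, 0]).
resid = y - X @ beta_hat
dof = n_scans - X.shape[1]
sigma2 = resid @ resid / dof
c = np.array([1.0, 0.0])
t = (c @ beta_hat) / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
print(f"task beta ~ {beta_hat[0]:.2f}, t = {t:.1f}")
```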
62. Perfect Neuroimaging-Computation Storm?
• Single-subject studies (N = 1)
– Genetics:
• Depending on coverage (X)
• Whole-genome sequence data > 320 GB (> 80X)
• Requires 2+ TB RAM and 100+ hrs CPU
– Imaging:
• Depending on protocols
• 40-512 gradient-direction diffusion imaging data
• Raw (multimodal) neuroimaging data > 10 GB
• Derived data > 100 GB
• Requires 100 GB RAM and 70+ hrs CPU
• Large subject studies
– Cohort studies (N > 10, typically N ~ 100s)
– Multi-institutional population-wide studies (N > 1,000)
– Longitudinal (neuroimaging) studies ...
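Multiplying those single-subject figures across a cohort shows why this is a storm; the sketch below combines the slide's own per-subject numbers with a hypothetical 1,000-subject study.

```python
# Per-subject figures from the slide above.
raw_gb, derived_gb, genome_gb = 10, 100, 320

# Hypothetical population study of 1,000 subjects.
n = 1_000
total_tb = n * (raw_gb + derived_gb + genome_gb) / 1024
print(f"~{total_tb:.0f} TB for N={n}")  # ~420 TB before processing copies
```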
63. From Biomedical Challenges to Modeling, Computation, Tools and Curricular Training
• Quantitative volumetric and surface-based statistical analyses
– Interactome: Challenge ↔ Models ↔ Data Analysis ↔ Computation ↔ Education
– Statistics Online Computational Resource: Che et al., JSS (2009)
[Figure legend: no effect / marginal / significant]
64. Grid & Cloud Computing
• UCLA Grids (Cerebro, Medulla): 1,200 cores, 1.4 TB RAM, 12,000 jobs/day, 700 users
• Amazon Cloud: 4,300 cores, 9.6 TB RAM (new)
– EC2 (Elastic Compute Cloud)
– S3 (Simple Storage Service)
• UC Grid
• Globus GridFTP
• INI Cluster @ USC
– 3,328 cores, 128 GB RAM per 16 cores, 26 TB aggregate memory space. Connectivity is 5 Gbit per 16 cores, roughly 4 Tbit aggregate on compute and another 4.3 Tbit on the storage. 2.43 PB of online storage, currently accelerated by over 50 TB of SSD.
65. Neuroimaging Applications: 56-ROI Global Shape Analysis (NC vs. IBS/Pain), Group Effects
[Figure: data → workflow protocol → results. Structural T1 data for NC (n = 221) and IBS (n = 107) groups; mean-curvature between-group differences in L_cuneus and R_angular_gyrus, shown in left and right views]
66. Neuroimaging Applications: Statistical Mapping of Cortical GM Thickness (Group Effects)
[Figure: structural T1 data and cortical models feed a workflow protocol; the results map highlights the left anterior insula, with a P-value color scale from 0.0 to 1.0]
69. Big Data and the Health Sector in Population Imaging
• According to Bonnie Feldman, "the potential of Big Data in medicine lies in the possibility of combining traditional data with new forms of data, at both the individual and the population level"
• The potential of Big Data indicates that savings can be produced in the health sector through several avenues:
– Transforming data into information
– Supporting people's self-care
– Increasing knowledge
– Raising awareness of health status
• Big Data is an open-access methodology for integrating different data types in population imaging, image quantification, and feature extraction.
71. Population Studies
• Population studies
– If no groups are formed in the population, the mean of the parameter or parameters is computed.
– If groups are formed (control and pathological), a hypothesis test must be performed (see the sketch after this list).
• Population modeling
– Model the volumetric degeneration of gray matter and white matter
– Establish degeneration parameters
– Compare the state of an individual against that model.
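As an illustration of the between-group hypothesis test mentioned above, here is a minimal sketch with SciPy on synthetic gray-matter volumes; all numbers are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic gray-matter volumes (cm^3) for control vs. pathological groups.
control = rng.normal(620, 40, size=221)
patients = rng.normal(600, 45, size=107)

# Welch's two-sample t-test: do the group means differ?
t, p = stats.ttest_ind(control, patients, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```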
77. BIMCV Objectives
• Develop and implement strategies to prevent or effectively treat diseases through an imaging research infrastructure tied to large population imaging studies.
– The "Population Imaging" concept.
• Provide data, tools, and processing resources for advanced imaging studies.
81. GIBI230: Luis Martí-Bonmatí, Fernando Aparici, Alexandre Pérez, Roberto Sanz, Carlos Infantes, Jose María Salinas, Cayetano Hernández
Neuro-Bioimaging VLC: Mariam de la Iglesia
IBIME: Juan M García-Gómez, Elies Fuster, Javier Juan-Albarracín
83. Valencian Node, Euro-BioImaging
The European research infrastructure for biomedical and biological imaging technologies.
A project on the ESFRI roadmap for research infrastructures.
www.eurobioimaging.eu
84. EIBIR key facts and daily work
In the service of research, EIBIR offers its Network Members:
- Multidisciplinary networking
- Project management
- Research communication
- Research training
- Meeting organisation
EIBIR Office
• Established in 2006
• Staff: 4.5, incl. 3 project managers and 1 assistant
• Provision of services to Network Members + EIBIR bodies
• Monitoring European affairs + research funding opportunities
• Project management and coordination
• Information activities and media work
• Promotion of Network Membership
• Website and database updates
• Congress activities
• Scientific Advisory Board
85. Timeline & Funding
• 2010-2013, Preparatory Phase: framework; definition of the eligibility criteria for nodes; open call for nodes. Funded by the EC.
• 2013-2017, Construction Phase: evaluation and selection of nodes; construction of the nodes. Funded by the Member States (MINECO?).
• 2017 onward, Operational Phase: access and training; technology and evaluation to improve the service. Funded by the Member States and the EC.
86. [Diagram: the Euro-BioImaging hub-and-nodes model, an imaging infrastructure with open user access and European life scientists as users. Flagship nodes and multimodal technology nodes connect to a central hub providing a web-access portal, a data storage and analysis infrastructure, and user and staff training; users return with results for publication.]
88. 1st Open Call
Euro-BioImaging Nodes – Expression of Interest
The 1st Open Call: 1 February – 30 April 2013
• Multi-Modal Molecular Imaging
• Phase contrast Imaging
• High-field MRI
• MR-PET
• Population Imaging
• Data Infrastructure: Challenges Framework
• The biological imaging community will call for EoIs in 6 technologies
91. MEDICAL IMAGING DATA BANK (BIMCV)
Expression of Interest: Population Imaging
BIG DATA DISEASE SIGNATURES
SINGLE TECHNOLOGY FLAGSHIPS
CONSORTIUM
92. Evaluation summary and final ranking
• The node develops and provides access to a large database of imaging data and the associated clinical data records.
• Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km²; an average of 5.3 million clinical cases per year, from 210 different imaging modalities).
• Access to such data and tools will be an efficient way of advancing population imaging studies and research.
• The node has the ability to incorporate data from other facilities.
93. Services offered by the node
• The BIMCV facility provides a multi-level and multi-ology storage service (Vendor Neutral Archive).
• The CEIB-CS node integrates access to high-performance computational services from local and European infrastructures (Principe Felipe Research Centre & UPV-I3M infrastructure).
• Open-access methodology to integrate different data types for population imaging, quantitative resources, and feature extraction.
• Comprehensive user training.
94. Single Technology Flagship Node – Population Imaging: Valencia
Evaluation summary and final ranking:
• Requires minor improvements (training plan, now corrected).
• The node develops and provides access to a large database of imaging data and the associated clinical data records.
• Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km²; an average of 5.3 million clinical cases per year, from 210 different imaging modalities).
• Access to such data and tools will be an efficient way of advancing population imaging studies and research.
• The node has the ability to incorporate data from other facilities.
MEDICAL IMAGING DATA BANK (BIMCV): BIG DATA DISEASE SIGNATURES
Services offered by the node:
• The BIMCV facility provides a multi-level and multi-ology storage service (Vendor Neutral Archive).
• The CEIB-AVS node integrates access to high-performance computational services from local and European infrastructures (Principe Felipe Research Centre & UPV-I3M infrastructure).
• Open-access methodology to integrate different data types for population imaging, quantitative resources, and feature extraction.
• Comprehensive user training.
95. Valencian Node, BIMCV
Centre of Excellence in Biomedical Imaging (CEIB) of the Conselleria de Sanitat
CEIB clinical site and CEIB computing site
119. 10K Structural Modeling in Neuroimaging of the Valencia Region
• Two grants from the Subdirectorate General for Health Systems of the CS, for computer engineers or telecommunications engineers (DOGV 9-07-2014).
• The main structures of the brain will be measured.
• In collaboration with LABMAN
• In collaboration with Brain Dynamics
• The University of Southern California (Jack Van Horn)
• Possibly with IBIME (the volBrain system)