Presentation by Dominic Job at DCC-Arkivum event 'Data Storage & Preservation Strategies for Research Data Management', University of Edinburgh, 27 October 2014.
Data Management in Human Imaging
1. Data Management in Human Imaging
Dominic Job, David Rodriguez, Elizabeth McDowell, Joanna Wardlaw
Centre for Clinical Brain Sciences – Neuroimaging Sciences
The University of Edinburgh, October 2014
2. Requirements for Research Data Management & BRIC, CRIC and WTCRF
• Significant risks to data longevity
– Ever-changing nature of research
– Reliance on 'human infrastructure' for data linking, provenance, support and documentation
(Whyte A. et al., 2008 – Digital Curation Centre)
3. [image-only slide]
4. Data & risks
• Imaging:
• MR scanners, PET, Ultrasound, Optical (retinal)
– 5+ scanners in the University, up to 500k 2D images/hour (mostly DICOM format)
• Clinical/other data:
• Formal interviews by professionals:
Psychiatrist, Neonatologist, Psychologist, Researcher
• Anonymisation & security
• Processing pipelines & Analysis
• Publication & Archiving/Preservation?
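Because the imaging arrives mostly as DICOM and at high volume, cataloguing it at source reduces the reliance on 'human infrastructure' noted earlier. Below is a minimal sketch of such an index, assuming the pydicom library; the directory, output file and field selection are illustrative, not the actual Scan Manager implementation.

```python
from pathlib import Path
import csv

import pydicom  # assumed available; any DICOM header reader would do

# Walk an acquisition directory and index basic header fields into a CSV.
rows = []
for path in Path("incoming_scans").rglob("*.dcm"):   # example location
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # headers only: fast
    rows.append({
        "file": str(path),
        "modality": getattr(ds, "Modality", ""),
        "study_date": getattr(ds, "StudyDate", ""),
        "series": getattr(ds, "SeriesDescription", ""),
    })

with open("scan_index.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "modality", "study_date", "series"])
    writer.writeheader()
    writer.writerows(rows)
```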
5. Data Preservation and Reuse
– Data is of high value as it is expensive to collect
(BRIC, CRIC) Current imaging assets > £100 million, > 100 TB
Much of this data is very 'rare'
Collected and analysed over the past 20 years
Multiple P.I.s/groups/sites/universities/hospitals/IT teams
WTCRF: 30,000+ documents
– Cost of curation vs. reuse value
– Old data often reused
» Huge cost savings: £200-£5,000 per subject
– Enables future discoveries in basic and clinical neuroscience
– Reused as new methods and theories develop
6. Data sharing & security
• Medical data is highly sensitive
– DPA 1998, NHS R&D, Caldicott, MHRA, GCP
• Must be held securely, catalogued & anonymised
• DICOM Confidential (Rodríguez González D. et al., 2010)
• Scan Manager, CRF Manager® (UoE, WTCRF software)
• Secure, auditable access control and sharing
– Two-factor verification under deployment (CMVM)
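To illustrate the kind of de-identification that a tool like DICOM Confidential performs, here is a minimal sketch using pydicom. This is not DICOM Confidential's API, and the tag list is a small example; a real de-identification profile is far more extensive.

```python
import pydicom  # assumed available

# Keywords of standard tags that commonly carry identifying data (example list).
IDENTIFYING_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "PatientAddress", "ReferringPhysicianName", "InstitutionName",
]

def deidentify(in_path, out_path, pseudo_id):
    ds = pydicom.dcmread(in_path)
    for keyword in IDENTIFYING_TAGS:
        if keyword in ds:                  # Dataset supports keyword lookup
            ds.data_element(keyword).value = ""
    ds.PatientID = pseudo_id               # replace with a project pseudonym
    ds.remove_private_tags()               # private tags may also carry identifiers
    ds.save_as(out_path)

# Hypothetical usage:
# deidentify("raw/scan001.dcm", "anon/scan001.dcm", "SUBJ-0001")
```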
7. Solutions
– CRF Imaging Core application & CRF Manager®
– Data Management Planning - Online
– Scan Manager & DICOM Confidential
– Standard provenance templates
(to capture research -> electronic lab notebooks)
– BRAINS database
– Research Data Management Service (UoE)
• MANTRA, Data Vault, Data Store, Data Share, w3
New requirements for Research Data Management
The ever-changing nature of research, in combination with our reliance on 'human infrastructure' for provenance support and documentation (Whyte A. et al., 2008), creates a significant risk to the longevity of our data sets. To mitigate these risks, several applications have been developed that catalogue the imaging data at source, support documentation of lifecycle provenance data collection, and collate the data in a sustainable infrastructure and format: DICOM Confidential (Rodríguez González D. et al., 2010); Scan Manager (UoE in-house software); and standardised provenance templates (e.g. http://www.sbirc.ed.ac.uk/documents/templateREADME.pdf).
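The linked template is a document; as one way such provenance could be made machine-readable, here is a hypothetical minimal record written as JSON. Every field name below is an illustration, not the published template's schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical study-level provenance record (all values are examples).
record = {
    "study_id": "EXAMPLE-0001",
    "acquisition": {
        "modality": "MR",
        "site": "University of Edinburgh (example)",
        "period": "2014",
    },
    "consent_and_ethics": {"iras_ref": "EXAMPLE", "anonymised": True},
    "contacts": {"pi": "example", "data_manager": "example"},
    "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
}

with open("provenance.json", "w") as f:
    json.dump(record, f, indent=2)
```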
Initial Enquiry & CRF Imaging Core application:
Green - Investigator responsibility
Blue - R&D/Ethics responsibility
Orange - BRIC responsibility
- IRAS: Integrated Research Application System
- Ethics approval, subject consent, appropriate anonymisation
- Costs
- CRF Imaging Core application (online - includes Data Management Plan)
- covers assessment of 'reuse potential'
- is being updated, e.g.:
UoE Data Management Planning:
http://www.ed.ac.uk/schools-departments/information-services/research-support/data-management/data-management-home
(or DMPonline (DCC): https://dmponline.dcc.ac.uk/)
Imaging - progress towards data longevity is improving, but will always be evolving with technology and methods.
Clinical - progress towards data longevity is medium, often dependent on the discipline of the person(s) involved in acquisition.
- Working towards providing a database linking service and a 'complete study baseline data' archiving service.
Processing and Analysis - longevity is poor and variable.
- Difficult to document and record provenance: software and methods/parameters change daily, with varying levels of discipline.
- Working on infrastructure to record pipelines and parameter sets (a minimal sketch follows below).
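As a sketch of what such pipeline and parameter recording could look like (an assumed design, not the infrastructure under development): hash the inputs and store them alongside the exact parameter set and software version, so any result can later be traced back to a reproducible run.

```python
import hashlib
import json
import sys

def file_sha256(path):
    """Content hash, so a recorded run can be matched to its exact inputs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def record_run(pipeline, params, inputs, out_path="run_manifest.json"):
    manifest = {
        "pipeline": pipeline,               # e.g. tool name and version
        "python": sys.version.split()[0],   # interpreter used for the run
        "parameters": params,               # the exact parameter set
        "inputs": {p: file_sha256(p) for p in inputs},
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)

# Hypothetical usage:
# record_run("segmentation-v1.2", {"threshold": 0.5}, ["anon/scan001.dcm"])
```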
To date, we have not collected study data that we can confidently classify as 'no longer useful'.
Approximately 50% of processed data is reused.
Analysis and reuse generate 6-10 times more data than the 'baseline' data.
A critical function of the human brain imaging research lifecycle is data linking: without correctly linking imaging data to its corresponding clinical and biological data, the imaging data is of little use.
One solution is the BRAINS database (Brain Images of Normal Subjects), which aims to provide a long-term answer to this issue, including a standardised data schema and a standard routine for collecting non-imaging data in prospective studies. BRAINS is currently a fully anonymised database of 3D MR imaging and linked related data for sharing study data.
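As a toy illustration of the linking problem (file and column names are assumptions): join an imaging index to clinical records on a shared anonymised subject ID, and flag records that fail to link.

```python
import csv

def load_by_key(path, key):
    """Load a CSV into a dict keyed by one column."""
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

# Both files are assumed to carry the same anonymised subject identifier.
imaging = load_by_key("scan_index_by_subject.csv", "subject_id")
clinical = load_by_key("clinical_data.csv", "subject_id")

linked = {sid: {**imaging[sid], **clinical[sid]}
          for sid in imaging.keys() & clinical.keys()}
unlinked = imaging.keys() ^ clinical.keys()   # present on one side only
print(f"linked {len(linked)} subjects; {len(unlinked)} could not be linked")
```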
Software:
- Old software and some old data no longer meet industry standards or run on current OSs, and are difficult to maintain/update with age.
Data Vault, Data Store, Data Share
- Templates enable us to capture any information with great flexibility.
- MANTRA: data management courses for students and staff.
- Electronic lab notebooks are currently being developed (in house) for data acquisition in the initial phases of data collection, but not yet within the processing and analysis phases.
http://www.sinapse.ac.uk/research-resources/brains-project
W3: http://www.w3.org/
- Migrating & unifying 'labs' to Data Store, centralised resources, etc.
Example of brain image data sharing improving basic science (from the BRAINS project).
This image shows that the apparent difference in brain structure between a group of subjects diagnosed with Alzheimer's disease (AD) and a normal control group changes with the method used. The image on the left shows voxel-wise parametric effect size and the image on the right shows voxel-wise non-parametric effect size (red areas = effect sizes > 0.75). If the voxel-wise data were Normally distributed, the parametric effect size would equal the non-parametric effect size. These voxel distributions are not Normally distributed, however, so the parametric method appears to artificially inflate the effect size between the AD and normal control groups.
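The slides do not name the exact estimators, so as a hedged illustration assume Cohen's d for the parametric effect size and Cliff's delta as a common non-parametric analogue. On non-Normal data the parametric estimate is driven by means and variances, which heavy tails distort, while the rank-based measure is not; the two therefore diverge, which is the point the image makes.

```python
import numpy as np

def cohens_d(a, b):
    """Parametric effect size: mean difference over pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) +
                  (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

def cliffs_delta(a, b):
    """Non-parametric effect size: P(a > b) - P(a < b) over all pairs."""
    diffs = np.subtract.outer(np.asarray(a), np.asarray(b))
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / (len(a) * len(b))

rng = np.random.default_rng(42)
controls = rng.normal(0.0, 1.0, 200)      # roughly Normal group (synthetic)
patients = rng.lognormal(0.0, 1.0, 200)   # skewed, non-Normal group (synthetic)

print(cohens_d(patients, controls))       # moment-based, sensitive to the heavy tail
print(cliffs_delta(patients, controls))   # rank-based, robust to skew
```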
References:
BRAINS: http://www.sinapse.ac.uk/research-resources/brains-project
Caldicott Guardians: http://systems.hscic.gov.uk/data/ods/searchtools/caldicott/index_html
CMVM: College of Medicine and Veterinary Medicine
Data Protection Act 1998: http://www.legislation.gov.uk/ukpga/1998/29/contents
Good Clinical Practice: http://www.mhra.gov.uk/Howweregulate/Medicines/Inspectionandstandards/GoodClinicalPractice/
MHRA: http://www.mhra.gov.uk/
NHS R&D: http://www.rdforum.nhs.uk/content/
Rodríguez González D, Carpenter T, van Hemert JI, Wardlaw J. An open source toolkit for medical imaging de-identification. Eur Radiol 2010; 20:1896-1904. http://dx.doi.org/10.1007/s00330-010-1745-3. SourceForge: http://sourceforge.net/projects/privacyguard/
Wellcome Trust Clinical Research Facility, University of Edinburgh (CRF Manager software): https://www.wtcrf.ed.ac.uk/
Whyte A., Job D., Giles S., Lawrie S. Meeting Curation Challenges in a Neuroimaging Group. International Journal of Digital Curation, 2008, Vol. 3, No. 1, pp. 171-181.
XNAT: https://wiki.xnat.org/