Focusing on the health vertical, consistent with the Open Knowledge Network initiative. See also a relevant review of the Contextualized Knowledge Graph portal: https://www.slideshare.net/ntkimvinh7/ckg-portal-a-knowledge-publishing-proposal-for-open-knowledge-network
Women Who Code-HSV Event:
'An Introduction to Machine Learning and Genomics'. Dr. Lasseigne will introduce the R programming language and the foundational concepts of machine learning with real-world examples including applications in the field of genomics with an emphasis on complex human disease research.
Brittany Lasseigne, PhD, is a postdoctoral fellow in the lab of Dr. Richard Myers at the HudsonAlpha Institute for Biotechnology and a 2016-2017 Prevent Cancer Foundation Fellow. Dr. Lasseigne received a BS in biological engineering from the James Worth Bagley College of Engineering at Mississippi State University and a PhD in biotechnology science and engineering from The University of Alabama in Huntsville. As a graduate student, she studied the role of epigenetics and copy number variation in cancer, identifying novel diagnostic biomarkers and prognostic signatures associated with kidney cancer. In her current position, Dr. Lasseigne’s research focus is the application of genetics and genomics to complex human diseases. Her recent work includes the identification of gene variants linked to ALS, characterization of gene expression patterns in schizophrenia and bipolar disorder, and development of non-invasive biomarker assays. Dr. Lasseigne is currently focused on integrating genomic data across cancers with functional annotations and patient information to explore novel mechanisms in cancer etiology and progression, identify therapeutic targets, and understand genomic changes associated with patient survival. Based upon those analyses, she is creating tools to share with the scientific community.
Tools for Communicating in the Computational Sciences (Brian Bot)
Sage Bionetworks is a non-profit organization that aims to build an open scientific research "commons" by enabling open data sharing, accessible analysis platforms, and clear communication. They are developing ClearScience, a system to improve scientific publishing by allowing content to be consumed at varying levels of complexity and linking published results to underlying data, code, and computational environments. ClearScience aims to address issues like poor reproducibility and non-transparent methods by making the entire scientific process openly accessible and re-creatable.
AI has played a limited role in the COVID-19 pandemic so far, scoring a B- according to one expert. It has helped in some areas like early warning, image-based diagnosis, and optimizing clinical trials. However, it could not demonstrate great impact in regions with complex healthcare systems and high inertia. Going forward, AI may accelerate tasks like forecasting medical resource needs, optimizing logistics, and assisting vaccine and drug discovery for future pandemics if developed with proper objectives, less reliance on historical data, and alignment with human values.
Artificial Intelligence and Anaesthesia (Faiza Buhari)
Artificial intelligence has several applications in anaesthesia including decision support systems, automated assist devices, and virtual reality training. Closed loop anaesthesia systems can precisely maintain drug levels and patient vitals within target ranges using feedback control of drug infusion pumps. While AI has benefits like reduced costs and time, and more consistent care, limitations include potential errors during learning, lack of emotional intelligence, and safety issues. Future areas of research include large datasets to improve AI and automated difficult airway assessment using facial recognition.
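The closed-loop idea above can be sketched as a simple feedback controller: measure a depth-of-anaesthesia index, compare it against a target, and adjust the infusion rate accordingly. Below is a toy proportional-integral (PI) loop with an invented one-line "patient" model; the gains, units, and plant dynamics are illustrative only and not clinically meaningful.

```python
# Toy PI feedback loop for a drug infusion pump, in the spirit of
# closed-loop anaesthesia systems. All gains and the plant model are
# hypothetical illustrations, not a clinical algorithm.

def simulate_closed_loop(target=50.0, steps=60, kp=0.2, ki=0.02):
    """Drive a simulated depth-of-anaesthesia index toward `target`."""
    index = 90.0      # an awake patient starts with a high index
    integral = 0.0
    history = []
    for _ in range(steps):
        error = index - target                       # positive -> patient too "light"
        integral += error
        rate = max(0.0, kp * error + ki * integral)  # pump cannot infuse negatively
        index += -0.8 * rate + 0.2                   # toy model: drug lowers index,
        history.append(index)                        # clearance slowly raises it
    return history
```

The integral term is what lets the controller cancel the constant drug-clearance drift; a purely proportional controller would settle with a steady-state offset above the target.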
The document discusses developing a Personalized Health Knowledge Graph (PHKG) to support personalized preventative healthcare applications. PHKG integrates medical knowledge and personal health data to provide context-specific and personalized insights. It proposes an architecture with a knowledge graph, rule-based inference engine, and integration of knowledge from ontology catalogs. Challenges include modeling personalization/context, analyzing IoT data, and reusing knowledge from existing health resources. The solution is demonstrated for asthma management using the KHealth dataset and ontologies. Future work includes additional disease cases and dynamic knowledge graph evolution.
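As a sketch of how such a graph yields context-specific insight, the snippet below joins a personal fact (a patient's condition) with a contextual fact (local pollen level) via a hand-written rule to derive an alert. All predicates, values, and the rule itself are hypothetical; the actual PHKG draws on richer ontologies and the KHealth data.

```python
# Minimal sketch of the rule-based inference step over a tiny
# personal-health knowledge graph. The vocabulary is invented.

triples = {
    ("patient1", "hasCondition", "Asthma"),
    ("patient1", "livesIn", "cityA"),
    ("cityA", "hasPollenLevel", "high"),
}

def infer_alerts(kg):
    """Fire a context-specific alert when personal and environmental
    facts combine: asthma + high local pollen -> avoid outdoor activity."""
    alerts = set()
    for (person, pred, obj) in kg:
        if pred == "hasCondition" and obj == "Asthma":
            for (p2, pred2, city) in kg:
                if p2 == person and pred2 == "livesIn":
                    if (city, "hasPollenLevel", "high") in kg:
                        alerts.add((person, "shouldAvoid", "outdoorActivity"))
    return alerts

print(infer_alerts(triples))
```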
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs (Paul Groth)
A look at how the thinking about Web Data and the sources of semantics can help drive decisions on combining latent and explicit knowledge. Examples from Elsevier and lots of pointers to related work.
CFPB's public data platform and the academic research community (Desiree Zamora Garcia)
A talk given in 2013 at the CFPB. Getting data and sharing data is a pain. What does the workflow look like now, and what *could* it look like with the help of CFPB's public data platform?
The document describes the BigDataEurope project, which aims to lower barriers for using big data technologies across different societal domains. It provides a one-stop solution called the Big Data Integrator platform that allows flexible deployment of open source big and smart data management tools using Docker containers. The platform is demonstrated through 7 pilot use cases aligned with European Commission challenges. Workshops and webinars are held to engage stakeholders and show societal value. The project coordinates integration of tools for data acquisition, storage, processing, analytics and semantics.
BigDataEurope: Project Introduction @ Year #1 Workshops (BigData_Europe)
An overview of the BDE project's objective, as presented in the introduction (with some variations) in each of the 1st Year series of workshops (seven: one per societal challenge).
Workshop #1 Year Schedule available at: http://www.big-data-europe.eu/first-round-of-bigdataeurope-workshops-announced/
The document provides an overview of the SC1 Health Workshop technical platform. The platform goals are low cost of ownership, ease of use with big data, flexibility for different use cases, embracing emerging big data technologies, and simple integration. The platform architecture uses Docker containers and Compose files to define the pipeline topology. Components are developed as Docker images and the platform can be installed manually or using Docker Machine on various environments.
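A pipeline topology defined with Docker Compose might look like the fragment below. The service and image names follow the bde2020 naming convention on Docker Hub but are illustrative; consult the project's repositories for current images and required environment variables.

```yaml
# Illustrative docker-compose.yml for a minimal BDE-style pipeline.
# Images, ports, and environment variables are assumptions, not a
# verbatim copy of a BDE deployment.
version: "2"
services:
  namenode:
    image: bde2020/hadoop-namenode
    ports:
      - "50070:50070"
  spark-master:
    image: bde2020/spark-master
    ports:
      - "8080:8080"
  spark-worker:
    image: bde2020/spark-worker
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
```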
BigDataEurope - Big Data & Secure Societies (BigData_Europe)
This document discusses opportunities and challenges related to big data and secure societies. It outlines the goals of the EU's Horizon 2020 Secure Societies program, which aims to enhance resilience against disasters and threats through new crisis management tools, forensic tools to fight crime and terrorism, improved border security, and enhanced cybersecurity. It also describes how big data from sources like satellite imagery, aerial imagery, and open sources can support EU decision-making, but presents challenges involving integrating and analyzing heterogeneous data at large scales while protecting privacy. Requirements include dealing with complex datasets, speeding up processing and visualization, guaranteeing timely and value-added products through data fusion, and developing algorithms for trend analysis.
SC1 Workshop 2 General Introduction to BDE (BigData_Europe)
The Big Data Europe project is developing a flexible platform called the Big Data Integrator to support big data applications across multiple societal domains. The platform incorporates existing big data technologies in a modular Docker-based architecture. The project is testing the platform through 7 pilot use cases in domains like health, agriculture, energy, transport, climate, social sciences, and security. The pilots demonstrate how the platform can integrate and analyze diverse data sources to address challenges in each domain. The project is engaging stakeholders through workshops and other activities to gather requirements and showcase results.
BDE SC6-ws-05/12/2016 technology part - SWC (BigData_Europe)
The document discusses a pilot project within the Big Data Europe program to create an online dashboard of economic data from municipal budgets. The project aims to aggregate budget and spending data from multiple sources and formats, normalize it using RDF, and analyze and visualize the data to provide insights for citizens, researchers, and decision makers. Technical components used include Apache Flume, Kafka, Spark, HDFS, Virtuoso triplestore, and D3 for visualization. An initial version has been implemented and will be evaluated with municipalities and other stakeholders.
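The normalization step can be sketched as mapping each raw budget record to N-Triples before loading into the triplestore. The URIs and property names below are invented for illustration and are not the pilot's actual vocabulary.

```python
# Schematic of normalizing a municipal budget record to RDF N-Triples.
# The example.org URIs are placeholders, not the pilot's real ontology.

def record_to_ntriples(record):
    subj = f"<http://example.org/budget/{record['id']}>"
    lines = [
        f'{subj} <http://example.org/ns#municipality> "{record["municipality"]}" .',
        f'{subj} <http://example.org/ns#category> "{record["category"]}" .',
        f'{subj} <http://example.org/ns#amount> '
        f'"{record["amount"]}"^^<http://www.w3.org/2001/XMLSchema#decimal> .',
    ]
    return "\n".join(lines)

example = {"id": "2016-001", "municipality": "Bonn",
           "category": "education", "amount": 125000.0}
print(record_to_ntriples(example))
```

Once all sources emit a common vocabulary like this, a triplestore such as Virtuoso can answer aggregate queries across municipalities regardless of the original formats.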
SC7 Hangout 3: Architecture of the BDE Pilot for Secure Societies (BigData_Europe)
This document describes the architecture of a pilot for secure societies that uses big data techniques. It involves workflows for event detection, change detection, and a common workflow. The event detection workflow crawls news, uses Cassandra to store items, detects events using Spark, and performs location enrichment. The change detection workflow aggregates images, detects changes using Spark, and clusters changes. The common workflow converts data to RDF using GeoTriples, stores and queries data using Strabon and SemaGrow, and includes a user interface called Sextant.
Second SC5 Pilot: Identifying the Release Location of a Substance (BigData_Europe)
This document discusses leveraging big data techniques to identify the release location of a hazardous substance into the atmosphere. It summarizes previous contamination incidents like Chernobyl and Fukushima and the challenges of inverse modeling. The proposed approach uses a large database of pre-computed dispersion simulations and historical weather data managed by big data tools. Matching current conditions to similar pre-computed cases could solve the release location faster than traditional computationally intensive inverse modeling. Key open issues to explore are defining "similar enough" weather and ensuring manageable database volumes for accurate matching.
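The core matching idea, finding the pre-computed simulation whose weather most resembles current conditions, reduces to a nearest-neighbour search over weather feature vectors. The features, scaling, and database entries below are made up; choosing a meaningful distance (and what counts as "similar enough") is exactly the open issue the talk raises.

```python
# Sketch of matching observed weather to pre-computed dispersion cases.
# Feature vectors and stored cases are hypothetical illustrations.

import math

def nearest_case(query, database):
    """Return the id of the pre-computed simulation whose weather
    features are closest (Euclidean distance) to observed conditions."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda case_id: dist(query, database[case_id]))

# (wind speed m/s, wind direction deg, boundary-layer height m)
database = {
    "sim-042": (3.0, 180.0, 800.0),
    "sim-107": (7.5, 270.0, 1200.0),
    "sim-311": (1.2, 90.0, 400.0),
}
observed = (7.0, 265.0, 1150.0)
print(nearest_case(observed, database))
```

Note that an unscaled Euclidean distance lets the largest-magnitude feature dominate; a real system would normalize features or learn a weighting, which is part of defining "similar enough" weather.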
Big Data Europe (BDE) is a consortium that aims to lower barriers for using big data technologies and help establish data value chains. BDE developed the Big Data Integrator (BDI) prototype, which features a Docker-based architecture supporting a variety of big data components like HDFS, Spark, and Kafka. The BDI also includes a semantic layer with tools like SemaGrow to provide semantic perspectives over big data stores.
Big Data Europe: the transport pilot in Thessaloniki - Josep Maria Salanova (BigData_Europe)
The document discusses mobile sensor data collection in Thessaloniki, Greece for transportation analysis. It describes using stationary Bluetooth sensors to track device IDs for travel time estimation and origin-destination analysis. It also uses floating car data from taxis and buses for traffic status and mobility pattern analysis. The data is processed using map matching and time series forecasting algorithms to classify current traffic states and predict future conditions. Websites and data portals for accessing the collected transportation data are also listed.
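The two processing steps mentioned, classifying the current traffic state and forecasting future conditions, can be illustrated in miniature: a threshold classification on the travel-time ratio and a moving-average forecast. The thresholds and data are invented; the pilot's actual algorithms (map matching, time-series models) are far more involved.

```python
# Toy traffic-state classification and forecasting on link travel times.
# Thresholds, free-flow time, and the sample series are illustrative.

def classify_state(travel_time, free_flow=10.0):
    """Label the current state from the ratio of observed to
    free-flow travel time on a road link."""
    ratio = travel_time / free_flow
    if ratio < 1.3:
        return "free-flow"
    if ratio < 2.0:
        return "congested"
    return "heavily congested"

def moving_average_forecast(series, window=3):
    """Predict the next observation as the mean of the last `window`."""
    recent = series[-window:]
    return sum(recent) / len(recent)

travel_times = [10.5, 11.0, 14.0, 19.0, 21.0]  # minutes on a link
print(classify_state(travel_times[-1]),
      round(moving_average_forecast(travel_times), 1))
```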
Big Data Europe Transport Pilot case, Luigi Selmi (BigData_Europe)
The document discusses a pilot project called SC4 that aims to build a scalable, fault-tolerant platform for processing large datasets and graphs using open source frameworks. The platform uses a microservices architecture with Apache Kafka as the message broker, Apache Flink for stream and batch processing, PostgreSQL and Elasticsearch for storage and indexing. In the second cycle, the project aims to extend its capabilities to include short-term traffic forecasting, improve the map-matching algorithm and parallelization, and enhance visualization tools.
BDE-SC6 Hangout - “Insight into Virtual Currency Ecosystems” (BigData_Europe)
The third SC6 webinar was held on 16 February 2017. It was organised by the Consortium of Social Science Data Archives (CESSDA) from Norway and the Semantic Web Company (SWC) from Austria. The theme of the webinar was “Insight into Virtual Currency Ecosystems”, presented by Dr. Bernhard Haslhofer, Data Scientist at the Austrian Institute of Technology.
'Drinking from the fire hose? The pitfalls and potential of Big Data' (Josh Cowls)
1) The document discusses the challenges and opportunities of analyzing large datasets known as "Big Data" from a social science perspective.
2) It defines Big Data and explores how the approach could undermine traditional research methods but also presents new opportunities.
3) The key to effectively studying Big Data is developing a strong understanding of the data, collaborating across disciplines, and using mixed quantitative and qualitative methods to provide context and identify meaningful relationships for further study.
The MD Anderson / IBM Watson Announcement: What does it mean for machine lear... (Health Catalyst)
It’s been over six years since IBM’s Watson amazed all of us on Jeopardy, but it has yet to deliver similar breakthroughs in healthcare. The headline of last week’s Forbes article read, “MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine.” Is it really a setback for the entire industry or not? Health Catalyst’s EVP for Product Development, Dale Sanders, believes that the challenges are unique to IBM’s machine learning strategy in healthcare. If they adjust that strategy and better manage expectations about what’s possible for machine learning in medicine, the future will be brighter for Watson, their clients, and AI in healthcare in general. Watson’s success is good for all of us, but its failure is bad for all of us, too.
Join Dale as he discusses:
The good news: Machine learning technology is accelerating at a rate beyond Moore’s Law. Dale believes that machine learning algorithms and models are doubling in capability every six months.
The bad news: The healthcare data ecosystem is not nearly as rich as many would believe, and certainly not as rich as that used to train Watson for Jeopardy. Without high-volume, high-quality data, Watson’s potential and the constant advances in machine learning algorithms will hit a glass ceiling in healthcare.
The best news: By adjusting strategy and expectations, there are still plenty of opportunities to do great things with machine learning by using the current data content in healthcare, while we build out the volume and breadth of data we need to truly understand the patient at the center of the healthcare picture… and you don’t need an army of PhD data scientists to do it.
Philip Bourne discusses the opportunities for data science in addressing diabetes. Data science involves using diverse digital data to ask and answer relevant questions, arriving at statistically significant conclusions not otherwise possible. It also involves sharing findings in a way that can improve lives. Diabetes is well-suited for data science approaches due to increasing data from genomics, wearables, electronic health records, and predictive modeling successes. However, data science must be done carefully with input from experts to account for confounders and ensure accurate outcomes for complex health issues like diabetes.
How to Become a Data Science Company instead of a company with Data Scientist... (Ruth Kearney)
The journey of becoming a data science company is more about culture and thinking than about hiring and up-skilling individuals. At Novartis, while we are hiring data scientists and spending a lot of time on training and learning related to data science, the destination for us is one of cultural change, which is required to make us a data science company.
Head of the AI Hub Dublin, Ashwini Mathur will share practical insights into the Novartis journey and how each employee plays a part. He will talk about the value of using the language of data science throughout the organisation and how this takes them one step closer to becoming a data science company.
- Biology is generating vast and exponentially growing amounts of data from techniques like DNA sequencing. However, biology is unprepared to effectively analyze and utilize this "big data".
- There are few researchers trained in both data analysis and biology. Data is often not shared between researchers, and the current publishing system discourages open sharing of knowledge. Most computational research also lacks reproducibility.
- The presenter advocates for open science, data sharing, and improved training for the next generation of researchers in both biology and data analysis skills to help address these challenges and better leverage the growing stores of biological data.
The document describes the BigDataEurope project, which aims to lower barriers for using big data technologies across different societal domains. It provides a one-stop solution called the Big Data Integrator platform that allows flexible deployment of open source big and smart data management tools using Docker containers. The platform is demonstrated through 7 pilot use cases aligned with European Commission challenges. Workshops and webinars are held to engage stakeholders and show societal value. The project coordinates integration of tools for data acquisition, storage, processing, analytics and semantics.
BigDataEurope: Project Introduction @ Year #1 WorkshopsBigData_Europe
An overview of the BDE project's objective, as presented in the introduction (with some variations) in each of the 1st Year series of workshops (seven: one per societal challenge).
Workshop #1 Year Schedule available at: http://www.big-data-europe.eu/first-round-of-bigdataeurope-workshops-announced/
The document provides an overview of the SC1 Health Workshop technical platform. The platform goals are low cost of ownership, ease of use with big data, flexibility for different use cases, embracing emerging big data technologies, and simple integration. The platform architecture uses Docker containers and Compose files to define the pipeline topology. Components are developed as Docker images and the platform can be installed manually or using Docker Machine on various environments.
BigDataEurope - Big Data & Secure SocietiesBigData_Europe
This document discusses opportunities and challenges related to big data and secure societies. It outlines the goals of the EU's Horizon 2020 Secure Societies program, which aims to enhance resilience against disasters and threats through new crisis management tools, forensic tools to fight crime and terrorism, improved border security, and enhanced cybersecurity. It also describes how big data from sources like satellite imagery, aerial imagery, and open sources can support EU decision-making, but presents challenges involving integrating and analyzing heterogeneous data at large scales while protecting privacy. Requirements include dealing with complex datasets, speeding up processing and visualization, guaranteeing timely and value-added products through data fusion, and developing algorithms for trend analysis.
SC1 Workshop 2 General Introduction to BDEBigData_Europe
The Big Data Europe project is developing a flexible platform called the Big Data Integrator to support big data applications across multiple societal domains. The platform incorporates existing big data technologies in a modular Docker-based architecture. The project is testing the platform through 7 pilot use cases in domains like health, agriculture, energy, transport, climate, social sciences, and security. The pilots demonstrate how the platform can integrate and analyze diverse data sources to address challenges in each domain. The project is engaging stakeholders through workshops and other activities to gather requirements and showcase results.
BDE SC6-ws-05/12/2016 technology part - SWCBigData_Europe
The document discusses a pilot project within the Big Data Europe program to create an online dashboard of economic data from municipal budgets. The project aims to aggregate budget and spending data from multiple sources and formats, normalize it using RDF, and analyze and visualize the data to provide insights for citizens, researchers, and decision makers. Technical components used include Apache Flume, Kafka, Spark, HDFS, Virtuoso triplestore, and D3 for visualization. An initial version has been implemented and will be evaluated with municipalities and other stakeholders.
SC7 Hangout 3: Architecture of the BDE Pilot for Secure SocietiesBigData_Europe
This document describes the architecture of a pilot for secure societies that uses big data techniques. It involves workflows for event detection, change detection, and a common workflow. The event detection workflow crawls news, uses Cassandra to store items, detects events using Spark, and performs location enrichment. The change detection workflow aggregates images, detects changes using Spark, and clusters changes. The common workflow converts data to RDF using GeoTriples, stores and queries data using Strabon and SemaGrow, and includes a user interface called Sextant.
Second SC5 Pilot: Identifying the Release Location of a SubstanceBigData_Europe
This document discusses leveraging big data techniques to identify the release location of a hazardous substance into the atmosphere. It summarizes previous contamination incidents like Chernobyl and Fukushima and the challenges of inverse modeling. The proposed approach uses a large database of pre-computed dispersion simulations and historical weather data managed by big data tools. Matching current conditions to similar pre-computed cases could solve the release location faster than traditional computationally intensive inverse modeling. Key open issues to explore are defining "similar enough" weather and ensuring manageable database volumes for accurate matching.
Big Data Europe (BDE) is a consortium that aims to lower barriers for using big data technologies and help establish data value chains. BDE developed the Big Data Integrator (BDI) prototype, which features a Docker-based architecture supporting a variety of big data components like HDFS, Spark, and Kafka. The BDI also includes a semantic layer with tools like Semagrow to provide semantic perspectives over big data stores.
Big data Europe the transport pilot in Thessaloniki - Josep Maria SalanovaBigData_Europe
The document discusses mobile sensor data collection in Thessaloniki, Greece for transportation analysis. It describes using stationary Bluetooth sensors to track device IDs for travel time estimation and origin-destination analysis. It also uses floating car data from taxis and buses for traffic status and mobility pattern analysis. The data is processed using map matching and time series forecasting algorithms to classify current traffic states and predict future conditions. Websites and data portals for accessing the collected transportation data are also listed.
Big Data Europe Transport Pilot case, Luigi SelmiBigData_Europe
The document discusses a pilot project called SC4 that aims to build a scalable, fault-tolerant platform for processing large datasets and graphs using open source frameworks. The platform uses a microservices architecture with Apache Kafka as the message broker, Apache Flink for stream and batch processing, PostgreSQL and Elasticsearch for storage and indexing. In the second cycle, the project aims to extend its capabilities to include short-term traffic forecasting, improve the map-matching algorithm and parallelization, and enhance visualization tools.
BDE-SC6 Hangout - “Insight into Virtual Currency Ecosystems”BigData_Europe
Third SC6 webinar was held on 16 February 2017. It was organised by the Consortium of Social Science Data Archives (CESSDA) from Norway and the Semantic Web Company (SWC) from Austria. Theme of the webinar was “Insight into Virtual Currency Ecosystems” presented by Dr. Bernhard Haslhofer, Data Scientist at the Austrian Institute of Technology.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.Josh Cowls
1) The document discusses the challenges and opportunities of analyzing large datasets known as "Big Data" from a social science perspective.
2) It defines Big Data and explores how the approach could undermine traditional research methods but also presents new opportunities.
3) The key to effectively studying Big Data is developing a strong understanding of the data, collaborating across disciplines, and using mixed quantitative and qualitative methods to provide context and identify meaningful relationships for further study.
The MD Anderson / IBM Watson Announcement: What does it mean for machine lear...Health Catalyst
It’s been over six years since IBM’s Watson amazed all of us on Jeopardy, but it has yet to deliver similar breakthroughs in healthcare. The headlines in last week’s Forbes article read, “MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine.” Is it really a setback for the entire industry or not? Health Catalyst’s EVP for Product Development, Dale Sanders, believes that the challenges are unique to IBM’s machine learning strategy in healthcare. If they adjust that strategy and better manage expectations about what’s possible for machine learning in medicine, the future will be brighter for Watson, their clients, and AI in healthcare, in general. Watson’s success is good for all of us, but it’s failure is bad for all of us, too.
Join Dale as he discusses:
The good news: Machine learning technology is accelerating at a rate beyond Moore’s Law. Dale believes that machine learning algorithms and models are doubling in capability every six months.
The bad news: The healthcare data ecosystem is not nearly as rich as many would believe, and certainly not as rich as that used to train Watson for Jeopardy. Without high-volume, high-quality data, Watson’s potential and the constant advances in machine learning algorithms will hit a glass ceiling in healthcare.
The best news: By adjusting strategy and expectations, there are still plenty of opportunities to do great things with machine learning by using the current data content in healthcare, while we build out the volume and breadth of data we need to truly understand the patient at the center of the healthcare picture… and you don’t need an army of PhD data scientists to do it.
Philip Bourne discusses the opportunities for data science in addressing diabetes. Data science involves using diverse digital data to ask and answer relevant questions, arriving at statistically significant conclusions not otherwise possible. It also involves sharing findings in a way that can improve lives. Diabetes is well-suited for data science approaches due to increasing data from genomics, wearables, electronic health records, and predictive modeling successes. However, data science must be done carefully with input from experts to account for confounders and ensure accurate outcomes for complex health issues like diabetes.
How to Become a Data Science Company instead of a company with Data Scientists... (Ruth Kearney)
The journey of becoming a data science company is more about the culture and thinking, rather than hiring and up-skilling individuals. In Novartis, while we are hiring data scientists and spending a lot of time in training and learning related to data science, the destination for us is one of cultural change, which is required to make us a data science company.
Head of the AI Hub Dublin, Ashwini Mathur will share practical insights into the Novartis journey and how each employee plays a part. He will talk about the value of using the language of data science throughout the organisation and how this takes them one step closer to becoming a data science company.
- Biology is generating vast amounts of data from techniques like DNA sequencing, and that volume is growing exponentially. However, biology is unprepared to effectively analyze and utilize this "big data".
- There are few researchers trained in both data analysis and biology. Data is often not shared between researchers, and the current publishing system discourages open sharing of knowledge. Most computational research also lacks reproducibility.
- The presenter advocates for open science, data sharing, and improved training for the next generation of researchers in both biology and data analysis skills to help address these challenges and better leverage the growing stores of biological data.
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...) (Tom Plasterer)
As scientists in the life sciences we are trained to pursue singular goals around a publication or a validated target or a drug submission. Our failure rates are exceedingly high especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By combining technical and social solutions, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable, and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions needed to enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these practices foster more accurate, timely, and inclusive decision-making.
Research allows people to learn new information, improve problem-solving skills, and challenge themselves. However, with an overwhelming amount of information available, it is important to determine the quality and reliability of sources. The Big 6 approach provides a systematic process for defining an information problem, locating relevant sources, extracting useful information, and evaluating the overall process and product.
This document discusses researchers' perspectives on managing and sharing data. It identifies that while gathering and evaluating data is core to research, few incentives exist for researchers to share their data. It also notes that practices vary significantly by discipline. Reasons researchers may hesitate to share include concerns about losing credit, control over their work, and legal or ethical issues. Additionally, a lack of standards, training, career paths and funding for data management pose challenges to sharing. The document calls for greater coordination between funders and institutions to develop sustainable national solutions that engage researchers and address their needs.
This document discusses researchers' perspectives on managing and sharing data. It identifies that while gathering and evaluating data is core to research, few incentives exist for researchers to share their data. It also notes that practices vary significantly by discipline. Reasons researchers may hesitate to share include concerns about losing control over their data or it being misused. Additionally, a lack of standards, training, career paths and funding for data management pose challenges to sharing. The document calls for greater coordination between funders and institutions to develop sustainable national solutions that engage researchers and address disciplinary differences.
CODATA International Training Workshop in Big Data for Science for Researcher... (Johann van Wyk)
Presentation at NeDICC Meeting on 16 July 2014. Feedback from CODATA International Training Workshop in Big Data for Science for Researchers from Emerging and Developing Countries, Beijing, China, 5-20 June 2014
Using social media to develop your scientific career (Daniel Quintana)
These slides outline how you can harness social media to boost your professional profile, collaboration, information gathering, and public outreach. Practical information includes how to establish an online presence, effectively use Twitter and other useful platforms (e.g., blogs, Linkedin), and best manage the deluge of online information.
First presented at NORMENT, KG Jebsen Centre for Psychosis Research, University of Oslo on the 8th of October, 2014
How HudsonAlpha Innovates on IT for Research-Driven Education, Genomic Medici... (Dana Gardner)
Transcript of a discussion on how HudsonAlpha leverages modern IT infrastructure and big data analytics to power research projects as well as pioneering genomic medicine findings.
This document provides an introduction to machine learning and its applications in genomics and biology. It discusses how biology and genomics data have become "big data" due to technological advances in sequencing and data generation. Machine learning is well-suited for analyzing these large, multidimensional datasets and addressing complex biological questions. The document outlines different machine learning approaches like supervised and unsupervised learning, and provides examples of real-world applications. R and Python are introduced as popular programming languages for machine learning.
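The supervised-learning idea mentioned in this summary can be made concrete with a toy classifier. The following is a minimal k-nearest-neighbours sketch in Python (the deck introduces R and Python; this example is ours, not from the slides, and the two-gene "tumor"/"normal" profiles are invented for illustration):

```python
import math
import random

def euclidean(a, b):
    # Distance between two expression profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # train: list of (profile, label); majority vote among the k nearest profiles.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Toy data: 2-gene expression profiles labelled "tumor" / "normal".
random.seed(0)
train = [([random.gauss(5, 1), random.gauss(9, 1)], "tumor") for _ in range(20)]
train += [([random.gauss(2, 1), random.gauss(3, 1)], "normal") for _ in range(20)]

print(knn_predict(train, [5.2, 8.8]))  # a profile near the "tumor" cluster
print(knn_predict(train, [1.9, 3.1]))  # a profile near the "normal" cluster
```

Real genomic datasets have thousands of genes per sample, which is why the deck's emphasis on methods suited to large, multidimensional data matters; this sketch only shows the shape of the supervised setting.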
This document discusses the opportunities and challenges of big data and data science over the next decade. It outlines three key points:
1. Big data is opening doors to accelerating scientific discovery through generating hypotheses from data and using ensemble models to gain multiple perspectives. However, challenges around efficacy and efficiency remain.
2. Data science can be viewed as applying the scientific method to data through discovering correlations from data-driven models and seeking causation through empirical verification, similar to traditional scientific discovery.
3. For data science to fulfill its potential, its laws and best practices around ensuring meaningful correlations and determining causation through verification must be followed, although they are not always common in practice today. The limits of data science must also be recognized.
This document provides an introduction to understanding big data analytics. It defines big data as information that can't be processed or analyzed using traditional tools. Big data is growing rapidly, doubling every year, and by 2020 about 1.7 megabytes of new information will be created every second for every person on Earth.
The document outlines a plan to explain what big data is, why it is important, what data analytics is, and where it is used. It defines data analytics as examining, inspecting, cleansing, transforming, and modeling data to draw conclusions. The document discusses descriptive, predictive, and diagnostic analytics, as well as unsupervised and supervised methods. It concludes that big data analytics is an important research topic that enables both descriptive and predictive analysis.
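The distinction between descriptive and predictive analytics can be illustrated in a few lines of Python. This is a sketch with invented numbers, not data from the document; the "next-period forecast" uses an ordinary least-squares trend line:

```python
import statistics

# Toy monthly counts (hypothetical data, for illustration only).
data = [10.0, 12.0, 13.5, 15.0, 17.2, 18.9]

# Descriptive analytics: summarize what has already happened.
print("mean:", round(statistics.mean(data), 2))
print("stdev:", round(statistics.stdev(data), 2))

# Predictive analytics: fit a least-squares trend y = a + b*t
# and extrapolate one period ahead.
t = list(range(len(data)))
mt, my = statistics.mean(t), statistics.mean(data)
b = sum((ti - mt) * (yi - my) for ti, yi in zip(t, data)) \
    / sum((ti - mt) ** 2 for ti in t)
a = my - b * mt
print("next-period forecast:", round(a + b * len(data), 2))
```

Descriptive methods look backward at the data; the fitted slope turns the same data into a forward-looking estimate, which is the step the document labels "predictive".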
Lifelogging - A long term data analytics challenge (Cathal Gurrin)
This document discusses lifelogging, which refers to the process of digitally storing data about all life experiences for future use. It envisions a "digital self" archive that captures a person's total experiences through sensors and creates a record that grows over time. Several challenges around organizing, searching, and analyzing such large datasets are discussed. The progress made in lifelogging research from early concepts to current technologies is reviewed. Potential future opportunities and issues around privacy, data access, and long-term preservation are also examined.
Luigi Selmi - The Big Data Integrator Platform (BigData_Europe)
The document discusses the SC4 pilot project, which aims to build a scalable and fault-tolerant platform for processing large datasets using open source frameworks. The platform utilizes a microservices architecture and processes real-time floating car data for tasks like map matching and short-term traffic forecasting using algorithms like feedforward artificial neural networks. It also discusses how semantic technologies from projects like SANSA-Stack and LinkedGeoData could enable additional use cases for the SC4 platform.
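The summary mentions short-term traffic forecasting with feedforward artificial neural networks. As a rough illustration of that idea only (not the SC4 pilot's actual model: the speed series, window size, and network shape below are all invented), a one-hidden-layer network can be trained with plain stochastic gradient descent to predict the next speed reading from the last few:

```python
import math
import random

random.seed(1)

# Toy "floating car" speed series (km/h) with a periodic daily-like pattern.
series = [50 + 20 * math.sin(i / 4.0) for i in range(200)]
lo, hi = min(series), max(series)
norm = [(s - lo) / (hi - lo) for s in series]  # scale to [0, 1]

WINDOW, HIDDEN, LR = 3, 5, 0.1

# One hidden layer: weights input->hidden (w1, b1) and hidden->output (w2, b2).
w1 = [[random.uniform(-0.5, 0.5) for _ in range(WINDOW)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b2 = 0.0

def forward(x):
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = sum(w * hv for w, hv in zip(w2, h)) + b2
    return h, y

# Online gradient descent on squared error.
for epoch in range(200):
    for i in range(len(norm) - WINDOW):
        x, target = norm[i:i + WINDOW], norm[i + WINDOW]
        h, y = forward(x)
        err = y - target
        for j in range(HIDDEN):
            grad_h = err * w2[j] * (1 - h[j] ** 2)  # backprop through tanh
            for k in range(WINDOW):
                w1[j][k] -= LR * grad_h * x[k]
            b1[j] -= LR * grad_h
            w2[j] -= LR * err * h[j]
        b2 -= LR * err

# One-step forecast from the last observed window, mapped back to km/h.
_, y = forward(norm[-WINDOW:])
print("forecast:", round(lo + y * (hi - lo), 1), "km/h")
```

A production system like the one described would of course run on streaming data inside the microservices platform; this sketch only shows the forecasting core.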
Josep Maria Salanova - Introduction to BDE+SC4 (BigData_Europe)
The BigDataEurope project aims to empower mobility management with big data. It has developed a modular platform that allows end-users to easily deploy functionality in their own systems using Docker containers. The platform is being tested through 7 pilot projects aligned with European societal challenges related to data-driven solutions. One pilot focuses on transport in Thessaloniki, Greece, using GPS, Bluetooth, and other probe data to improve map matching and mobility pattern recognition and forecasting to better manage traffic. The project coordinators are available for any questions.
The LeMO project aims to leverage big data to manage transport operations. It will identify issues around effective data mining and exploitation in transportation. The project will analyze barriers and opportunities for using big data in transport. It will also design recommendations for research and policy regarding big data in transport. The project involves 5 partners from 5 countries and will run from 2017 to 2020. It seeks to produce a roadmap for data collection, sharing, and exploitation to support European transport stakeholders. The project will study big data in transport through case studies focusing on issues like infrastructure innovation, transport efficiency, and data protection. It expects its recommendations and roadmap to help policymakers and industry better utilize big data.
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart... (BigData_Europe)
This document describes a pilot project that aims to create an online dashboard of municipal budget and spending data from multiple European cities. The project will harvest, link, analyze and visualize open budget data to make it more useful for citizens, researchers and decision makers. Data will be integrated from Athens, Thessaloniki and Kalamaria in Greece initially, and potentially Vienna, Linz and Barcelona. The technical architecture uses Apache tools like Flume, Kafka and Spark to ingest, store and analyze the heterogeneous data sources in real-time. The goal is to evaluate how a big data approach can provide new insights into public finances from an integrated, multilingual and longitudinal perspective.
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ... (BigData_Europe)
This document discusses Big Data Europe (BDE), an open source big data platform. It provides an overview of BDE's goals, architecture, and applications. The key points are:
1) BDE's goals are to make it easy to install, develop for, deploy, and integrate big data applications. It aims to unlock the value of data through an open platform.
2) BDE supports a variety of frameworks and uses Docker to package components. Its architecture includes layers for resources, data, processing, and applications.
3) BDE is being applied to challenges in domains like health, transport, energy, and security. Examples analyze traffic patterns, perform predictive maintenance, and detect changes in infrastructure.
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie... (BigData_Europe)
Where we are and are going for Big Data in OpenScience
Keynote talk at the Big Data Europe SC6 Workshop on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017: The perspective of European official statistics by Fernando Reis, Task-Force Big Data, European Commission (Eurostat).
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A... (BigData_Europe)
Slides for keynote talk at the Big Data Europe workshop nr 3 on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017 conference by Ron Dekker, Director CESSDA: European Open Science Agenda: where we are and where we are going?
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport... (BigData_Europe)
Slides of the keynote at the 3rd Big Data Europe SC6 Workshop co-located at SEMANTiCS2018 in Amsterdam (NL) on: The European Research Data Landscape: Opportunities for CESSDA by Peter Doorn, Director DANS, Chair, Science Europe W.G. on Research Data. Chair, CESSDA ERIC General Assembly
The BigDataEurope 3rd Workshop took place on November 28, 2017 in Amsterdam. The workshop focused on learning about current and future data management challenges, Big Data tools, and the Open Big Data Platform. The agenda included talks on EU priorities on energy and digitalization, asset management and Big Data in wind energy, the BDE Open Platform, and system monitoring case studies. There was also a roundtable discussion on future data needs, the BDE platform, and research opportunities.
BDE SC3.3 Workshop - Data management in WT testing and monitoring (BigData_Europe)
Aresse Engineering is a company that provides engineering services including data acquisition, analysis, and monitoring for industries like wind energy. They have grown from 9 engineers in 2007 to over 20 staff members today. To address challenges of big data, Aresse developed tools like XDAS for acquisition, Hivex for data transfer, and a cloud platform called Hive for storage, analysis, and visualization of data. These tools allow customers to access massive amounts of synchronized measurement data. Machine learning and physical models are areas of ongoing development to enhance data quality.
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f... (BigData_Europe)
Options for Wind Farm performance assessment and Power forecasting (Mr. A. Kyritsis, ALTSOL/TERNA) at the BigDataEurope Workshop, Amsterdam, November 2017.
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ... (BigData_Europe)
Big Data Europe: Workshop 3 SC6 Social Science - 11.09.2017 in Amsterdam, co-located with SEMANTiCS2017 titled: THE IMPORTANCE OF METADATA & BIG DATA IN OPEN SCIENCE. Slides by Ivana Versic (Cessda) and Martin Kaltenböck (SWC)
This document summarizes the MIDAS project, which is funded by the European Union to develop a platform using big data to support public health policies. The key points are:
1) MIDAS has received top funding scores and will receive €4.5 million over 40 months. It involves universities, technical partners, and policy boards across Europe.
2) The project aims to improve public health policies across Europe by facilitating analysis of diverse health datasets using its platform. This could lead to more effective policies and citizen benefits.
3) MIDAS will develop a secure data integration and analysis platform, with visualization tools to help policymakers. It will gather data from various sources to inform policy creation and evaluation.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
Immersive Learning That Works: Research Grounding and Paths Forward (Leonel Morgado)
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences, spotlighting research frontiers along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf (Selcen Ozturkcan)
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Authoring a personal GPT for your research and practice: How we created the Q... (Leonel Morgado)
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... (Leonel Morgado)
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
The binding of cosmological structures by massless topological defects (Sérgio Sacani)
Assuming spherical symmetry and a weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or a modified gravity theory is mitigated, at least in part.
This MS Word-generated PowerPoint presentation covers the major details of the micronucleus test: its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism; micronuclei form during chromosome segregation.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... (Sérgio Sacani)
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
5. Data growth is exponential
Number of articles indexed by MEDLINE (PubMed) per year: http://dan.corlan.net/medline-trend.html
Southan and Cameron, Beyond the tsunami: developing the infrastructure to deal with life sciences data, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Corp., 2009.
6. Data is a source of knowledge
Gillam et al., The healthcare singularity and the age of semantic medicine, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Corp., 2009.
7. One just needs to make the connection
The Swanson case: Fish oil and Raynaud’s syndrome
• Public knowledge since 1975: Raynaud’s syndrome is associated with high blood viscosity, platelet aggregability, and vasoconstriction.
• Public knowledge since 1984: Fish oil leads to reductions in blood lipids, platelet aggregability, blood viscosity, and vascular reactivity.
• Swanson puts the two together in 1986: Can dietary fish oil ameliorate or prevent Raynaud’s syndrome? He supports his hypothesis with the relevant literature.
• DiGiacomo confirms the hypothesis in 1989.
Vision: Create big data machinery that helps produce and support more such cases.
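Swanson's reasoning pattern, often called the ABC model of literature-based discovery, can be sketched in a few lines: find intermediate B-terms that co-occur with topic A in one part of the literature and with topic C in another, while A and C themselves never co-occur. The toy "articles" below are invented concept sets mirroring the fish-oil example, not real citations:

```python
# Each article is reduced to the set of concepts it mentions (toy data).
articles = [
    {"Raynaud's syndrome", "blood viscosity"},
    {"Raynaud's syndrome", "platelet aggregability"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregability"},
    {"fish oil", "blood lipids"},
    {"migraine", "serotonin"},
]

def linking_terms(a, c, corpus):
    # B-terms: concepts co-occurring with A in some articles and with C in
    # others, while A and C never co-occur directly (the "undiscovered" link).
    if any(a in doc and c in doc for doc in corpus):
        return set()
    with_a = set().union(*(doc for doc in corpus if a in doc)) - {a}
    with_c = set().union(*(doc for doc in corpus if c in doc)) - {c}
    return with_a & with_c

print(linking_terms("Raynaud's syndrome", "fish oil", articles))
# Expect the shared intermediates: blood viscosity and platelet aggregability.
```

At the scale of millions of MEDLINE abstracts, the hard parts are concept extraction and ranking the candidate links, which is exactly where the "big data machinery" of the vision comes in.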
8. BioASQ vision
• 2 articles are published in biomedical journals every minute!
• Make sure this knowledge is used to the benefit of patients
• Need to make it accessible to biomedical experts
• Search is not effective enough
• Push research in the automated answering of questions
• A challenge for such systems can achieve a multiplying effect
13. Talking to BioASQ experts
“as I’m growing older . . . I spend more time in front of the computer but I learn less. . . . the complexity has increased, the variety has increased and my time has been reduced.”
“When I do research I use IT stuff all the time, I’m looking for papers and data... I’m also doing statistical analysis”
“PubMed and all this of course, we really depend on that. We cannot work if we don’t search in those.”
“The bulk of information, that’s the main problem. . . . if someone has some extra time and starts reading the results of a search then this might never end!”
“Sometimes you get irrelevant results. That’s the main problem.”
“There is abundance of structured information . . . Unfortunately not all structured databases are included into one.”
“I am looking at least into twenty different places for the same protein.”
“. . . since I use a number of different programs I forget them by the time I want to use them again and I have to remember them once more.”
16. Putting big data to work
(Vision) Information systems that act like peers to human experts:
• understand the information need of the expert
• represent the need in machine-readable format
• match it to the information and data available in various sources
• provide a comprehensive and comprehensible response, with supporting material
(Big data added value) Integration of information from many sources and large-scale semantic indexing.
(Outlook) Long way ahead, but the impact of even marginal progress on public health can be very significant!
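One small building block of such systems, matching a machine-readable information need against indexed sources, can be illustrated with a toy inverted index. The documents, identifiers, and descriptors below are invented for the sketch; real systems index millions of abstracts with controlled vocabularies such as MeSH:

```python
from collections import defaultdict

# Toy corpus: document id -> MeSH-like descriptors assigned to it.
docs = {
    "pmid1": ["fish oil", "blood viscosity", "diet"],
    "pmid2": ["raynaud syndrome", "blood viscosity"],
    "pmid3": ["fish oil", "raynaud syndrome"],
}

# Build an inverted index: descriptor -> set of documents mentioning it.
index = defaultdict(set)
for pmid, terms in docs.items():
    for term in terms:
        index[term].add(pmid)

def answer(need):
    # The "information need" is a set of descriptors; return every
    # document that covers all of them (boolean AND retrieval).
    return set.intersection(*(index[t] for t in need))

print(sorted(answer({"fish oil", "raynaud syndrome"})))
```

The gap between this and the slide's vision is exactly semantic indexing at scale: assigning the descriptors automatically and ranking the matches, rather than hand-tagging a handful of documents.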
17. Where do we stand?
• Big data is getting linked
• We have a range of tools for analysing and indexing such data
• BDE is set to bring the pieces together
• Challenges such as BioASQ push research further; the NLM improved its MeSH indexing engine by 5% in the first year of BioASQ!
• IBM Watson is to be put into use by 14 US cancer research institutes
• Robotic science assistants are making their appearance; “Adam” generates functional genomics hypotheses about the yeast Saccharomyces cerevisiae