Modern cloud-native applications are incredibly complex systems. Keeping the systems healthy and meeting SLAs for our customers is crucial for long-term success. In this session, we will dive into the three pillars of observability - metrics, logs, tracing - the foundation of successful troubleshooting in distributed systems. You'll learn the gotchas and pitfalls of rolling out the OpenTelemetry stack on Kubernetes to effectively collect all your signals without worrying about vendor lock-in. Additionally, we will replace parts of the Prometheus stack to scrape metrics with the OpenTelemetry Collector and Operator.
** Kubernetes Certification Training: https://www.edureka.co/kubernetes-cer... **
This Edureka tutorial on "Kubernetes Networking" will give you an introduction to the popular DevOps tool Kubernetes and will dive deep into Kubernetes Networking concepts. The following topics are covered in this training session:
1. What is Kubernetes?
2. Kubernetes Cluster
3. Pods, Services & Ingress Networks
4. Case Study of Wealth Wizards
5. Hands-On
DevOps Tutorial Blog Series: https://goo.gl/P0zAfF
Python is a popular programming language used in a variety of applications, including data analysis, web development, and artificial intelligence. Here's an introduction to the Basics of Python - A Beginners Guide! Whether you're new to programming or looking to brush up on your skills, this video covers the basics of the Python programming language. From data types and operators to loops, functions and libraries, you'll get a solid foundation to start coding in Python.
Visit us: https://www.elewayte.com/
A basic introductory slide set on Kubernetes: What does Kubernetes do, what does Kubernetes not do, which terms are used (Containers, Pods, Services, Replica Sets, Deployments, etc...) and how basic interaction with a Kubernetes cluster is done.
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill - HostedbyConfluent
Here's the challenge: we've got a Kafka topic, where services publish messages to be delivered to browser-based clients through web sockets.
Sounds simple? It might, but we're faced with an increasing number of messages, as well as a growing count of web socket clients. How do we scale our solution? As our system contains a larger number of servers, failures become more frequent. How do we ensure fault tolerance?
There are a couple of possible architectures. Each websocket node might consume all messages. Otherwise, we need an intermediary, which redistributes the messages to the proper web socket nodes.
Here, we might either use a Kafka topic, or a streaming forwarding service. However, we still need a feedback loop so that the intermediary knows where to distribute messages.
We’ll take a look at the strengths and weaknesses of each solution, as well as limitations created by the chosen technologies (Kafka and web sockets).
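The intermediary-plus-feedback-loop idea can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the `Router` class, its method names, and the node identifiers are hypothetical and do not come from the talk, and no real Kafka or WebSocket library is involved.

```python
from collections import defaultdict


class Router:
    """Hypothetical in-memory stand-in for the intermediary.

    Web socket nodes report client connects and disconnects (the feedback
    loop), and the router uses that state to forward each message to the
    node that currently holds the target client's connection.
    """

    def __init__(self):
        self.client_to_node = {}             # state maintained by the feedback loop
        self.node_inbox = defaultdict(list)  # stands in for a per-node channel
        self.undeliverable = []              # messages for clients with no known node

    def client_connected(self, client_id, node_id):
        # Feedback from a web socket node: this client is now attached here.
        self.client_to_node[client_id] = node_id

    def client_disconnected(self, client_id):
        # Feedback from a web socket node: the client's connection is gone.
        self.client_to_node.pop(client_id, None)

    def route(self, client_id, payload):
        # Forward the message to the proper node, or park it if none is known.
        node_id = self.client_to_node.get(client_id)
        if node_id is None:
            self.undeliverable.append((client_id, payload))
            return None
        self.node_inbox[node_id].append((client_id, payload))
        return node_id
```

The alternative where every node consumes all messages needs no routing table at all, at the cost of every node reading the full stream.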
Python, the Language of Science and Engineering for Engineers - Boey Pak Cheong
A talk given in November 2016 at IEM Malaysia to engineers who are new to Python, offering a broad perspective on what Python is, why it is important to learn it, and how it can help in solving and visualizing engineering and scientific tasks and problems.
OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack by... - NETWAYS
Open source is at the heart of what we do at Grafana Labs and there is so much happening! The intent of this talk is to update everyone on the latest developments when it comes to Grafana, Pyroscope, Faro, Loki, Mimir, Tempo and more. Everyone has at least heard about Grafana, but maybe some of the other projects mentioned above are new to you? Welcome to this talk 😉 Besides the update on what is new, we will also quickly introduce them during this talk.
Istio is a service mesh—a modernized service networking layer that provides a transparent and language-independent way to flexibly and easily automate application network functions. Istio is designed to run in a variety of environments: on-premise, cloud-hosted, in Kubernetes containers.
This is a talk on how you can monitor your microservices architecture using Prometheus and Grafana. It includes easy-to-execute steps to get a monitoring stack running on your local machine using Docker.
Being FAIR: FAIR data and model management SSBSS 2017 Summer School - Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report - http://tinyurl.com/FAIR-EG
The European Open Science Cloud: just what is it? - Carole Goble
Presented at Jisc and CNI leaders conference 2018, 2 July 2018, Oxford, UK (https://www.jisc.ac.uk/events/jisc-and-cni-leaders-conference-02-jul-2018). The European Open Science Cloud. What exactly is it? In principle it is conceived as a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines. How? By federating existing scientific data infrastructures, currently dispersed across disciplines and Member States. In practice, what it is depends on the stakeholder. To European Research Infrastructures it’s a coordinated mission to organise and exchange their data, metadata, software and services to be FAIR – Findable, Accessible, Interoperable, Reusable – and to use e-Infrastructures, either EU or commercial. To EU e-Infrastructures offering data storage and cloud services, it’s a funding mission to integrate their services, policies and organisational structures, and to be used by the Research Infrastructures. To agencies it’s a means to promote Open Science, standardisation, cross-disciplinary research and coordinated investment with a dream of a “one stop shop” for researchers. And for Libraries?
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... - Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Presentation by Mireia Alcalá, Information Resources specialist at CSUC, given at the online workshop "Research Data Management & Open Science" organized by IDIBELL on 2 November 2020.
Using the Research Graph and Data Switchboard for cross-platform discovery - amiraryani
RDA EU Webinar - DDRI WG / April2017
Overview:
Driven by the rapid development of data storage technology, the number of data repositories is growing fast. Researchers now have access to a range of data infrastructures such as discipline-specific repositories and national (regional) data infrastructures. The problem is that these infrastructures are often operating in silos; that is, they do not connect their datasets to related research information in other platforms.
One solution to this problem is the work undertaken by the Data Description Registry Interoperability (DDRI) WG of Research Data Alliance (RDA). The group has developed the Research Data Switchboard which connects datasets and related information across research data repositories using information on co-authorship and jointly funded projects.
In this webinar, Dr Amir Aryani presents an overview of the Switchboard project and discusses how it enables connecting datasets to the Research Graph -- a distributed graph of scholarly works derived by the Switchboard project. We will also show a live demo of traversing the graph of connections between publications, datasets, researchers and research projects across repositories and data infrastructures.
Target Audience:
Research data managers, government agency representatives, data infrastructure managers, and technologists who are interested in interoperability between research infrastructures
Enabling better science - Results and vision of the OpenAIRE infrastructure a... - Paolo Manghi
Enabling better science: presentation on the results and vision of the OpenAIRE infrastructure and RDA Publishing Data Services Working Group in this direction.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative. - Carole Goble
Presented at Digital Life 2018, Bergen, March 2018. In the Trust and Accountability session.
In recent years we have seen a change in expectations for the management and availability of all the outcomes of research (models, data, SOPs, software etc) and for greater transparency and reproducibility in the method of research. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for stewardship [1] have proved to be an effective rallying-cry for community groups and for policy makers.
The FAIRDOM Initiative (FAIR Data Models Operations, http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards and sensitivity to asset sharing and credit anxiety. Our aim is a FAIR Research Commons that blends together the doing of research with the communication of research. The Platform has been installed by over 30 labs/projects and our public, centrally hosted FAIRDOMHub [2] supports the outcomes of 90+ projects. We are proud to support projects in Norway’s Digital Life programme.
2018 is our 10th anniversary. Over the past decade we learned a lot about trust between researchers, between researchers and platform developers and curators and between both these groups and funders. We have experienced the Tragedy of the Commons but also seen shifts in attitudes.
In this talk we will use our experiences in FAIRDOM to explore the political, economic, social and technical practicalities of Trust.
[1] Wilkinson et al (2016) The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
[2] Wolstencroft, et al (2016) FAIRDOMHub: a repository and collaboration environment for sharing systems biology research Nucleic Acids Research, 45(D1): D404-D407. DOI: 10.1093/nar/gkw1032
Overview of metadata standards, and how FAIRsharing and the FAIR Cookbook help selecting and using them. Presentation to the What is metadata? Common standards and properties. EHP Workshop, November 9, 2022: https://ephconference.eu/pre-conference-programme-441
FAIR, community standards and data FAIRification: components and recipes - Susanna-Assunta Sansone
Overview of FAIR, FAIRsharing and the FAIR Cookbook at the ATI event on Knowledge Graphs: https://github.com/turing-knowledge-graphs/meet-ups/blob/main/symposium-2022.md
Presentation to the EOSC workshop on policies (https://www.google.com/url?q=https://eoscfuture.eu/eventsfuture/monitoring-eosc-readiness-fair-data-policies) on what FAIRsharing does for policies, including providing registration, discovery, flexible and clearer descriptions, relationships, machine readability and comparability.
The role of FAIRsharing in assessing FAIRness of digital objects: we assist, not assess. The workshop brought together a number of FAIR evaluation tools to discuss and design common FAIR tests to ensure tools deliver consistent results. Our presentation illustrates how FAIRsharing's content helps and how FAIRsharing's service contributes. The work will contribute to the work of the EOSC FAIR Metrics Task Force.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop III, November 8, 2021.
Presentation to the EC Workshop on Maximizing investments in health research: FAIR data for a coordinated COVID-19 response. Workshop I, October 11, 2021.
The FAIR Cookbook poster, as presented at the ELIXIR-UK Node and the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
The FAIR Cookbook poster, as presented at the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
Brief overview of the FAIR Cookbook for the UK Conference of Bioinformatics and Computational Biology 2021: https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-21
Brief introduction to FAIRsharing work with industry (publishers, pharmas) and the FAIR Cookbook (for the Life Science): https://www.opensciencefair.eu/2021/workshops/applying-fair-principles-to-open-science-and-industry-to-drive-innovation-challenges-and-opportunities
Overview of the role of FAIRsharing and a dedicated Collection of data resources (platforms and registries that collect, harmonize, and share participant-level clinical-epidemiological, OMICs, and/or imaging data) for the COVID-19 Clinical Research Coalition and The Tropical Disease Research initiatives: https://coronavirus.tghn.org/research-resources/data-sharing-covid-19
Presentation to the "FAIRification put into practice: Characterization of energy data and development of workflows" event by https://www.eeradata.eu => https://www.eeradata.eu/event/2857:online-discussion-fairification-put-into-practice-characterization-of-energy-data-and-development-of-workflows.html#
Presented at http://mcbios-maqc.org. The FAIR Principles have propelled the global debate, in all disciplines, about better RDM and transparent and reproducible data. FAIR has de facto become a global norm for good RDM, a prerequisite for data science, since the endorsement of the principles by global and intergovernmental leaders. Funding bodies are consolidating FAIR into their funding agreements; publishers have united behind FAIR as a way to remain at the forefront of open research; and in the private sector FAIR is adopted and enshrined in policy in major biopharmas, libraries, and unions. FAIR is changing the culture of data science, but work is needed to turn the principles into reality. I will use the work of the FAIRplus project as an exemplar to illustrate challenges and progress.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
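A minimal single-machine sketch of the levelwise idea, not the report's implementation: it assumes the graph is a dict mapping every vertex to a non-empty list of successors (so there are no dead ends), treats each strongly connected component as its own level, and uses Tarjan's algorithm for the decomposition. The function names, the damping factor of 0.85, and the tolerance are illustrative assumptions.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. Components come out in reverse topological order."""
    index, low, on_stack, stack, components = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def levelwise_pagerank(graph, damping=0.85, tol=1e-10, max_iter=500):
    """Rank vertices one component at a time, upstream components first."""
    n = len(graph)
    for u, successors in graph.items():
        if not successors:
            raise ValueError(f"vertex {u!r} is a dead end")
        if any(v not in graph for v in successors):
            raise ValueError(f"vertex {u!r} points outside the graph")

    in_edges = defaultdict(list)
    for u, successors in graph.items():
        for v in successors:
            in_edges[v].append(u)

    rank = {v: 1.0 / n for v in graph}
    # Reversing Tarjan's output yields a topological order of the block-graph,
    # so every in-neighbour outside the current component is already final.
    for component in reversed(strongly_connected_components(graph)):
        for _ in range(max_iter):
            updated = {
                v: (1 - damping) / n
                + damping * sum(rank[u] / len(graph[u]) for u in in_edges[v])
                for v in component
            }
            change = sum(abs(updated[v] - rank[v]) for v in component)
            rank.update(updated)
            if change < tol:
                break
    return rank
```

Because each component only reads ranks that are already converged upstream, the result matches what a monolithic iteration over all vertices converges to, which is the property the abstract relies on.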
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to keep evolving, facilitated by institutional investment rotating out of offices as work from home (“WFH”) spreads, alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
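As a rough illustration of the automated data validation point above, the sketch below runs a few rule-based checks on records before they move downstream. The field names (`id`, `value`, `unit`) and the rules themselves are hypothetical and chosen only for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rules: each field maps to a check that returns True for a valid value.
RULES: Dict[str, Callable[[Any], bool]] = {
    "id": lambda v: isinstance(v, str) and v != "",
    "value": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool) and v >= 0,
    "unit": lambda v: v in {"mg", "g", "kg"},
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field_name, check in RULES.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not check(record[field_name]):
            problems.append(f"invalid {field_name}: {record[field_name]!r}")
    return problems


def split_valid(records: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Separate clean records from rejected ones, keeping the reasons for rejection."""
    valid, rejected = [], []
    for rec in records:
        problems = validate(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            valid.append(rec)
    return valid, rejected
```

Keeping the reasons alongside each rejected record is one way to support the root cause analysis that the lineage-tracking point mentions.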
1. FAIR data: no longer optional, but it takes a village!
Susanna-Assunta Sansone, PhD
Academic Lead for Research Practice,
Professor of Data Readiness, Engineering Science
Associate Director, Oxford e-Research Centre
ELIXIR
Interoperability Platform Co-Lead
elixir-europe.org/platforms/interoperability
Founding
Academic Editor
nature.com/sdata
NFDI Physical Sciences Joint Colloquium, January 9, 2022
Slides: https://www.slideshare.net/SusannaSansone
datareadiness.eng.ox.ac.uk
0000-0001-5306-5690
@SusannaASansone
susanna-assunta.sansone@oerc.ox.ac.uk
2. Outline
Brief history of the FAIR Principles and FAIR awareness
Challenges and next steps
Highlights from the life sciences and ELIXIR
3. Acknowledgements
In particular slides from:
Carole Goble, Philippe Rocca-Serra, and Allyson Lister
My team* and our collaborators in many projects, working groups, advisory boards, incl.:
* https://datareadiness.eng.ox.ac.uk/#people
4. A set of principles to enhance the
value of all digital resources and their
reuse by humans and machines
Data that is discoverable and usable at scale
5. Discoveries are made using shared data and this requires data that are:
• Cited and stored to be discoverable
• Retrievable and structured in standard format(s)
• Richly described to be understandable
Rationale behind the FAIR Principles
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#276a35e6f637
Data preparation accounts for about 80% of the work of data scientists
7. A set of principles … not a standard
To enhance the value of all digital resources and their
reuse by humans and machines
A continuum of increasing reusability, via many different
implementations
Relaunch a dialogue with researchers and policy makers.
The FAIR Principles: just guiding principles
9. The scholarly publishing
ecosystem is changing
Data-related mandates by funders and
institutions are growing
Researchers need recognition
and credit for data, software
and all research outputs
Human-machine and AI
collaboration is the future
Reproducibility of published studies
should be business as usual
The data driven revolution
10. • Publishers a “leverage point”
• Data is an integral part of scholarly communications
• FAIR as a business opportunity, e.g. data support services, data publication tools
Data journals and data articles
• Incentive, credit for sharing
- Big and small data
- Negative results
- Long tail of data
- Curated aggregation
• Peer review of data
• Discoverability and reusability
- Complementing community databases
FAIR-enabling data journals and publishers’ services
12. FAIR data stakeholders: it takes a village
Personal, project, organizational, and public responsibilities
Researchers and
company scientists who
generate and use data
Service providers who
manage data
and infrastructure
from local to global
from public to
commercial
Authorities who set
community policy,
practice, resources,
compliance and global
sustainability
Funders, policy makers,
publishers, professional
societies, standards
organisations, institutions
Data Stewards and
Research Software
Engineers who support
data and data analytics
Programme and
institute directors who
set local policy,
methodology, practice,
resources and local
sustainability, drive
change management
14. Research Culture Programme
Research
Practice
Enabling researchers to
do reliable, reproducible,
and transparent
research
Valuing
Contributions
Recognising a diversity
of talents, skills, &
outputs, and evaluating
them fairly
A partnership of academics and professional services,
supported by the Pro-Vice Chancellor of Research
Careers
Supporting researcher
careers by focusing
on career destinations
Priorities for advancing R&I culture at Oxford
https://staff.admin.ox.ac.uk/article/research-culture-at-oxford-improving-research-practices-and-supporting-research-careers
15. What’s good
for research
What’s good for
research careers
• Collaboration
• Diverse skills
• Openness & transparency
• Rigour
• Speed
• Novelty
• Ground-breaking results
• Ownership
• Self-interest
Research Culture Programme:
support what we reward and value
16. Royal Society
(Oct 2018)
Nuffield
(Dec 2014)
Wellcome Trust
(Jan 2020)
BEIS
(July 2021)
• Research Integrity
• Open Research Data
• Career Development of Researchers
• Openness in Animal Research
• Engaging the Public with Research
• Advancement of Knowledge Exchange in
Higher Education
• Technician Commitment
• SF Declaration on Research Assessment (DORA)
• Leiden Manifesto on Research Metrics
• Guidance for Safeguarding in International Development
Research
• Race Equality Charter
• Athena Swan Charter
Sector concordats Agreements Community principles
UK Gov
(July 2022)
Research Culture Programme:
integrate, simplifying sector requirements
21. 23
Nodes
220+
Orgs
Towards a federated digital infrastructure for
Life Science data, coordinating national
capabilities
Data & software FAIR and open as possible
transnational access and analysis
Gateway Communities of Practice,
European and Global initiatives,
Standards Bodies
Hub
elixir-europe.org
European research infrastructure for Life Science
22. The ELIXIR Interoperability Platform (EIP)
Food & Nutrition
+Toxicology
elixir-europe.org/platforms/interoperability
Deals with the challenges of
delivering FAIR data,
working with FAIR data, and
enabling its actual reuse
23. Resources
Node-provided resources and
nascent ones, annotation tools,
registries, catalogs, and
services
Standards
Generic and community-specific,
technical protocols, PIDs
schemas, reporting guidelines,
terminologies, models, formats
Methods
Good research data management,
and FAIRification design and
execution - retrospectively and
prospectively
WHY
Have practical stories to showcase, demonstrating
impact, and benefits
WHY
Systematic approach to collate knowledge, and
disseminate it to ELIXIR users and external researchers
EIP Knowledge Hub
HOW
Via a dissemination portal where users find
interoperability know-how, and use case examples
Interoperability stories and data journeys
HOW
Putting services, standards and methods in action,
showing how they can be applied to cases and data types
The EIP: the FAIR service framework
24. Some examples:
Projects and Communities, incl.: Global
initiatives, e.g.
NEW: RDA Life Science Infrastructure IG with Australia BioCommons,
the US NIH Office of Data Science Strategy, and H3ABioNet in Africa.
IMI2 project guidelines for
open access to publications
and research data
Funders’
guidelines
The EIP: the FAIR service framework
25. Interoperability stories: e.g. metadata authoring
ISA-implementing systems, internal and external to ELIXIR
• EMBL-EBI MetaboLights (Claire O'Donovan)
• FAIRDom SEEK (Stuart Aitken, Rafael Buono, Flore d’Anna)
• Jackson Lab (Jake Emerson / Abigail Miller)
• NASA GeneLab (Dan Berrios)
• xOMics project (Anna Neuheus)
• EMBL-EBI Biosample (https://doi.org/10.1093/nar/gkab1046)
• Earlham Institute COPO (Rob Davey)
• Intermine (Gos Micklem)
github.com/ISA-tools/isa-api/discussions
github.com/ISA-tools/isa-api/issues
mailto: isatools@googlegroups.com
'Investigation' (the project context), 'Study' (a
unit of research) and 'Assay' (analytical
measurement) data model and serializations
(tabular, JSON and RDF)
● Experimental metadata authoring
● Compliance to metadata standards
● Formatting for submission to EBI
repositories
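To make the Investigation / Study / Assay hierarchy concrete, here is a minimal sketch using plain Python dataclasses and the standard `json` module. It does not use the ISA-API, and the class and field names (`identifier`, `title`, `measurement_type`, `technology_type`) are illustrative assumptions. The resulting JSON layout is not the official ISA-JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    """An analytical measurement (hypothetical, simplified fields)."""
    measurement_type: str
    technology_type: str


@dataclass
class Study:
    """A unit of research that groups one or more assays."""
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups one or more studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialize the nested structure to a JSON string."""
    return json.dumps(asdict(investigation), indent=2)


# Made-up example content, for illustration only.
example = Investigation(
    identifier="INV-1",
    title="Example project",
    studies=[
        Study(
            identifier="STU-1",
            title="Example study",
            assays=[Assay("metabolite profiling", "mass spectrometry")],
        )
    ],
)
```

Calling `to_json(example)` gives a nested document in which each study carries its own assays, mirroring the three levels named on the slide. The tabular and RDF serializations would need their own writers.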
26. rdmkit.elixir-europe.org/nels_assembly
Omics data management
Data collected from sequencer facility (Norseq) and
deposited into a shared datastore (NeLS)
Selected samples and secondary data organised into ISA
structured catalogue with metadata (FAIRDOM-SEEK)
Data processing pipelines (Galaxy) registered in
WorkflowHub
Selected data enter deposition pipelines into public archives
(ELIXIR Deposition Databases)
Secure access (Feide)
Data management planning (DSW)
Ethical, Social, and Legal Implications checklist (Trygge)
FAIR data journeys: e.g. from ELIXIR-Norway
28. Share
Reuse
Preserve
Analyse
Process
Plan
Collect
Detailed recipes for
making FAIR data
FAIR Data Stewardship
Guidance, writing Data
Management Plans
Guidance and context for
RDM services
Registry of standards and
registries/repositories
EIP Knowledge Hub: the FAIR RDM know-how
Training elixiruknode.org/activities/elixir-dash-fellowship
30. Authored by almost 100 data
professionals from industry and
academia, led by ELIXIR Nodes,
with participation of USA NIH
Internationally
sustained and
adopted!
Pre-print: doi.org/10.5281/zenodo.7156792
A collection of recipes that cover the operation steps of FAIR data management
31. ● Over 70 recipes released and
more content available
● Covering over 20 data types,
incl:
○ omics
○ pre-clinical
○ clinical areas
But not limited to it!
A live resource, open to contributions
Learn how to improve the FAIRness with exemplar datasets
Understand the levels and indicators of FAIRness
Discover open source technologies, tools and services
Find out the required skills
Acknowledge the challenges
Coordinated by an Editorial Board
32. Navigate recipes: define your FAIR data journey
Search wizard: faircookbook.elixir-europe.org/content/search-wizard.html
35. Credit and citability of the recipes:
because all contributions matter!
CRediT
attribution ontology
w3id.org/faircookbook/FCB006
36. Anatomy of a recipe: components
Ingredients
An idea of tools/skills needed
Step by step process
Guidelines, process, description
Practical
elements, code
snippets
#Python3
#zooma-annotator-script.py file
def get_annotations(propertyType, propertyValues, filters = ""): "
Examples
Conclusions
What should I read next?
38. FAIRsharing: standards, databases and policies
Guides consumers to discover, select and use these resources with confidence
Helps producers to make their resources more visible, more widely adopted and cited
39. COMMUNITY STANDARDS
POLICIES
by funders, journals
and other organizations
DATABASES
including repositories
and knowledgebases
Identifiers
Terminologies Guidelines
Formats
Informative and educational resource, and a service
FAIRsharing provides curated descriptions and relationship graphs of
standards, databases and policies in all disciplines
41. Users, adopters and collaborators include:
https://fairsharing.org/communities
An endorsed output of the
FAIRsharing WG (since 2015):
A WG (since 2015) in:
A recommended resource in EOSC reports
Users from all stakeholder groups
Researchers Developers and curators Journal publishers
Societies and Alliances
Librarians and Trainers Funders
FAIRsharing: working with and for all stakeholders
46. Collection URL: fairsharing.org/graph/3515;
each record has a DOI
Collection URL: fairsharing.org/graph/3513;
each record has a DOI
FAIR organizations profiles: across disciplines
The standards,
repositories and
policies each EOSC
Cluster uses or endorses
47. NEW: FAIRsharing Community Curator Programme
Curate – Influence – Gain Attribution – Engage – Learn
Funded by the:
Ambassadorship Programme
Domain experts, from EOSC clusters and worldwide, who
● Help curate content, standards, repositories and policies
relevant to their EOSC cluster, RDA group, research
domain, or area of focus
● Contribute to educational material for the users
Enquire and apply: fairsharing.org/community_curation
48. First cohort of 16 curators!
They gain attribution of their
work in their profile
Curate – Influence – Gain Attribution – Engage – Learn
NEW: FAIRsharing Community Curator Programme
49. Share
Reuse
Preserve
Analyse
Process
Plan
Collect
Detailed recipes for
making FAIR data
FAIR Data Stewardship
Guidance, writing Data
Management Plans
Guidance and context for
RDM services
Registry of standards and
registries/repositories
EIP Knowledge Hub: the FAIR RDM know-how
Training elixiruknode.org/activities/elixir-dash-fellowship
50. references gets data from new, in progress
EIP Knowledge Hub: building links across resources
53. European Research Landscape Study 2022
• Objectives:
• To collect data on data production and use by scientific disciplines and relevant sub-disciplines
• To collect and analyse information on data deposition practices, data typology and volume
• To collect data on the level of maturity with respect to FAIR data implementation
• To assess responsiveness and readiness of research data repositories in terms of implementation of
FAIR principles
• Scope:
• All fields of science
• Survey of researchers: 15066 responses
• Survey of research data repositories: 316 responses
• Desk research; case studies; FAIRness assessment
Publications Office of the European Union, 2022, https://data.europa.eu/doi/10.2777/3648 Also
https://indico.lip.pt/event/1249/contributions/4555/
54. History of the problem
From the 2016 FAIR Principles paper:
These high-level FAIR Guiding Principles precede implementation choices, and do not
suggest any specific technology, standard, or implementation-solution; moreover, the
Principles are not, themselves, a standard or a specification. They act as a guide to data
publishers and stewards to assist them in evaluating whether their particular
implementation choices are rendering their digital research artefacts Findable, Accessible,
Interoperable, and Reusable.
55. FAIR is not a standard
It is a set of guiding principles that provide for a continuum of
increasing reusability, via many different implementations
56. Turning FAIR into reality requires we:
• deliver a number of research infrastructures and tools
• harmonize the standards for data and metadata
• address policies, education and training
• overcome technical, social and cultural challenges
• identify motivators, credit and rewards mechanisms
The road to FAIR data
57. The “cottage industry” of FAIR evaluation
https://fairassist.org
● Suffers from abundance and diversity!
○ 19 independent FAIR evaluation platforms (Oct 2022)**
○ Most are questionnaire-based, a small few are automated
○ Some are guidance, others are more judgmental
○ Some have invented their own FAIR tests and indicators
○ Even when using the same method, the results are
different!
● Six NEW evaluators appeared since Feb 2022!
** Demonstrates that certain stakeholder communities are clamoring for a solution!
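A toy sketch of why results can differ between evaluators: two hypothetical tools score the same metadata record against different indicator sets. The indicators, field names, record content, and scoring rule are all invented for illustration and do not correspond to any real FAIR evaluation platform.

```python
# A made-up metadata record; the "format" and "keywords" fields are deliberately empty.
record = {
    "identifier": "doi:10.1234/example",
    "license": "CC-BY-4.0",
    "title": "Example dataset",
    "format": None,
    "keywords": [],
}


def has_value(rec, key):
    """True when the field exists and is non-empty."""
    return bool(rec.get(key))


# Hypothetical evaluator A checks only two indicators.
EVALUATOR_A = {
    "has persistent identifier": lambda r: has_value(r, "identifier"),
    "has license": lambda r: has_value(r, "license"),
}

# Hypothetical evaluator B checks a different, larger set.
EVALUATOR_B = {
    "has persistent identifier": lambda r: has_value(r, "identifier"),
    "has title": lambda r: has_value(r, "title"),
    "has standard format": lambda r: has_value(r, "format"),
    "has keywords": lambda r: has_value(r, "keywords"),
}


def score(rec, indicators):
    """Fraction of indicators the record passes."""
    passed = sum(1 for test in indicators.values() if test(rec))
    return passed / len(indicators)
```

With this record, `score(record, EVALUATOR_A)` is 1.0 while `score(record, EVALUATOR_B)` is 0.5. The data are the same in both cases. Only the choice of indicators changed, so the two numbers cannot be compared directly.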
58. From assess to assist: not to judge but to help
And not everything that can be measured matters!
Strive for the FAIR enough!
Follow your data journey
and your needs!
More importantly, in the current tools the tests
used and the results given are not comparable!
59. Developing guidance at European level
Collective views to shape guidance and influence policies:
outputs of the FAIR Metrics and Data Quality Task Force
doi.org/10.5281/zenodo.7390482
doi.org/10.5281/zenodo.7463421
61. Modified from the Strategy for Culture Change:
https://www.cos.io/blog/continuing-acceleration-new-strategic-plan
and https://zenodo.org/record/6881009#.Y2BIeuTP2F5
[Figure labels: Communities, Incentives, Infrastructure and Skills, Usability, Policy]
D4.4 Report and recommendations on FAIR incentives and
expected impacts in the Nordics, Baltics and EOSC
https://zenodo.org/record/6881009#.Y2BIeuTP2F5