Scott Edmunds talk on GigaScience Big-Data, Data Citation and future data handling at the International Conference of Genomics on the 15th November 2011.
Scott Edmunds talk in the "Policies and Standards for Reproducible Research" session on Revolutionizing Data Dissemination: GigaScience, at the Genomic Standards Consortium meeting at Shenzhen. 6th March 2012
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ... (GigaScience, BGI Hong Kong)
Alexandra Basford's talk in the curation session at the InCoB meeting in Kuala Lumpur, 30/11/11 on: GigaScience: A Journal’s Perspective on Data Standards and Biocuration
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th... (GigaScience, BGI Hong Kong)
Scott Edmunds talk at the HUPO congress in Geneva, September 6th 2011 on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
GigaScience Editor-in-Chief Laurie Goodman's talk at the International Conference on Genomics pre-conference press-session on the release of new unpublished datasets, and a new look beta version of their database: GigaDB.org
Keynote presented to KE workshop held in conjunction with the release of the report "A Surfboard for Riding the Wave: Towards a four country action programme on research data": http://www.knowledge-exchange.info/Default.aspx?ID=469
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
EZID: Easy dataset identification & management
Joan Starr, Manager, Strategic and Project Planning and EZID Service Manager, California Digital Library
Data and data curation are assuming a growing role in today’s research library. New approaches are needed both to address the resulting challenges and to take advantage of the emerging opportunities. Long-term identifiers represent one such tool. In this presentation, Joan Starr will introduce identifiers and an application designed to make them easy to create and manage: EZID. She will provide a closer look at two identifier types, DOIs and ARKs, and discuss what bringing an identifier service to your institution might mean.
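The two identifier types Starr compares follow simple published syntaxes (DOIs: `doi:PREFIX/SUFFIX`; ARKs: `ark:/NAAN/NAME`). As a hedged illustration, the hypothetical helper below splits either form into its parts; it is a sketch of the identifier anatomy only, not of the EZID service API:

```python
def parse_identifier(ident):
    """Split a DOI or ARK string into its scheme and parts.

    A DOI looks like "doi:10.PREFIX/SUFFIX"; an ARK looks like
    "ark:/NAAN/NAME" (NAAN = Name Assigning Authority Number).
    """
    ident = ident.strip()
    if ident.lower().startswith("doi:"):
        prefix, _, suffix = ident[4:].partition("/")
        return {"scheme": "doi", "prefix": prefix, "suffix": suffix}
    if ident.lower().startswith("ark:/"):
        naan, _, name = ident[5:].partition("/")
        return {"scheme": "ark", "naan": naan, "name": name}
    raise ValueError("unrecognised identifier: %r" % ident)

print(parse_identifier("doi:10.5524/100001"))
print(parse_identifier("ark:/13030/tf5p30086k"))
```

The example identifier values above are illustrative placeholders, not references to specific registered objects.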
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ... (GigaScience, BGI Hong Kong)
Scott Edmunds talk at the AIST Computational Biology Research Center in Tokyo: Overcoming the Reproducibility Crisis: and why I stopped worrying and learned to love open data (& methods), July 1st 2014
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
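Parsons' point that a reference must resolve to the precise data used "or at least their scientific equivalent" can be illustrated with a toy content fingerprint: hash a canonical form of the records rather than the raw file bytes, so two files that differ only in row order or formatting still count as the same data. This is a hypothetical sketch for illustration, not a technique proposed in the talk:

```python
import hashlib
import json

def content_fingerprint(records):
    """Fingerprint a dataset by its content, not its file bytes.

    Records are sorted and serialised canonically, so two files that
    differ only in row order or key order hash identically, while any
    change to an actual value produces a different fingerprint.
    """
    canonical = json.dumps(
        sorted(records, key=lambda r: json.dumps(r, sort_keys=True)),
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same measurements, different row order: scientifically equivalent.
a = [{"station": "S1", "temp": -12.5}, {"station": "S2", "temp": -9.0}]
b = [{"station": "S2", "temp": -9.0}, {"station": "S1", "temp": -12.5}]
print(content_fingerprint(a) == content_fingerprint(b))
```

A real scheme would also need to decide what counts as equivalent (units, precision, provenance), which is exactly the open question the talk raises.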
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
Research Objects: more than the sum of the parts
Carole Goble
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
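As a rough, hypothetical sketch of the packaging idea described above (real Research Objects use richer vocabularies such as OAI-ORE and PROV), a manifest might aggregate the interlinked artefacts of a study and record their relationships:

```python
import json

# A minimal, illustrative Research Object manifest: one study aggregating
# a dataset, a model built from it, and a workflow that uses both.
# The identifiers and property names here are invented for illustration.
manifest = {
    "@id": "ro/example-study",
    "aggregates": [
        {"@id": "data/input.csv", "type": "Dataset",
         "role": "construction"},
        {"@id": "model/model.sbml", "type": "Model",
         "derivedFrom": "data/input.csv"},
        {"@id": "workflow/pipeline.cwl", "type": "Workflow",
         "uses": ["data/input.csv", "model/model.sbml"]},
    ],
}
print(json.dumps(manifest, indent=2))
```

The point of the framework is exactly this: the relationships (derivedFrom, uses) travel with the artefacts, so context and provenance are packaged rather than left implicit.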
Findable Accessible Interoperable Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a range of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVeL), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
Published on Feb 07, 2016 by PMR
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus. Includes clips of the software in action
Published on Feb 29, 2016 by PMR
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals: discover, scrape, normalize, facet/index, analyze, publish are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
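The fundamentals named above (discover, scrape, normalize, facet/index, analyze) can be caricatured in a few lines. The toy sketch below facets abstract-like text against a small species vocabulary; the real ContentMine tools operate on full-text XML from EuropePMC, and the example documents and vocabulary here are invented:

```python
import re
from collections import Counter

def normalize(text):
    """Normalize step: lowercase and tokenise into plain words."""
    return re.findall(r"[a-z]+", text.lower())

def facet(tokens, vocabulary):
    """Facet/index step: count only tokens from a controlled vocabulary."""
    return Counter(t for t in tokens if t in vocabulary)

# Two made-up abstract snippets standing in for scraped full text.
docs = [
    "Zika virus detected in Aedes aegypti mosquitoes.",
    "Aedes albopictus may also transmit Zika virus.",
]
species_terms = {"aedes", "aegypti", "albopictus"}

# Analyze step: aggregate facet counts across the corpus.
index = Counter()
for doc in docs:
    index += facet(normalize(doc), species_terms)
print(index.most_common())
```

Swapping the vocabulary (chemical names, taxon names) is what lets the same pipeline mine very different content types, which is the point the overview makes.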
The Role of Libraries in Data Management and Curation
Nicole Vasilevsky
The Role of Libraries in Data Management and Curation, presented at the American Library Association conference in Las Vegas, NV, 07/29/14.
Abstract:
As increasing amounts of data are being generated, applying best practices in handling data is important, and librarians are well poised to assist users. During this session, we will discuss the role of libraries in assisting with data management, application of metadata, ontologies, data standards, and the publication of data in repositories and on the Semantic Web. This talk will describe best data practices and engage the attendees in interactive activities to demonstrate these principles.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough and, for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
On March 15, 33 experts from the Department’s National Nuclear Security Administration (NNSA) arrived in Japan along with more than 17,200 pounds of equipment. After initial deployments at U.S. consulates and military installations in Japan, these teams have utilized their unique skills, expertise and equipment to help assess, survey, monitor and sample areas for radiation. The 33 team members joined another six DOE personnel already in Japan.
Since arriving in Japan, NNSA teams have collected and analyzed data gathered from more than 40 hours of flights aboard Department of Defense aircraft and thousands of ground monitoring points.
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
jodischneider
Slides prepared for the Publishing of Scientific Data workshop at the Science Foundation Ireland Summit 2010. I was one of three panelists. We had a lively discussion!
Scott Edmunds: Channeling the Deluge: Reproducibility & Data Dissemination in... (GigaScience, BGI Hong Kong)
Scott Edmunds talk at the 7th International Conference on Genomics: "Channeling the Deluge: Reproducibility & Data Dissemination in the “Big-Data” Era". ICG7, Hong Kong, 1st December 2012
Data analysis & integration challenges in genomics
mikaelhuss
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", machine learning.
From Deadly E. coli to Endangered Polar Bear: GigaScience Provides First Cita... (GigaScience, BGI Hong Kong)
Slides from GigaScience press-conference at BGI's Bio-IT APAC meeting on the GigaScience website launch and release of first unpublished animal genomes released from database. Genomes include polar bear, penguin, pigeon and macaque. 6th July 2011
IDW2022: A decade's experiences in transparent and interactive publication of ... (GigaScience, BGI Hong Kong)
Scott Edmunds at International Data Week 2022: A decade's experiences in transparent and interactive publication of FAIR data and software via an end-to-end XML publishing platform. 21st June 2022
GigaByte Chief Editor Scott Edmunds presents on how to prepare a data paper for the TDR and WHO sponsored call for data papers describing datasets on vectors of human diseases launched in Nov 2021. Presented at the GBIF webinar on 25th January 2022 and aimed at authors interested in submitting a manuscript to the series.
STM Week: Demonstrating bringing publications to life via an End-to-end XML p... (GigaScience, BGI Hong Kong)
Scott Edmunds at the STM Week 2020 Digital Publishing seminar on Demonstrating bringing publications to life via an End-to-end XML publishing platform. 2nd December 2020
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u... (GigaScience, BGI Hong Kong)
Scott Edmunds on a new publishing workflow for rapid dissemination of genomes using GigaByte & GigaDB. Presented at Biodiversity 2020 in the Annotation & Databases track, 9th October 2020.
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ... (GigaScience, BGI Hong Kong)
Scott Edmunds talk at CODATA2019 on Quantifying how FAIR is Hong Kong: The Hong Kong Shareability of Hong Kong University Research Experiment. 19th September 2019 in Beijing
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR... (GigaScience, BGI Hong Kong)
Scott Edmunds talk at IARC, Lyon. How can we make science more trustworthy and FAIR? Principled publishing for more evidence based research. 8th July 2019
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production... (GigaScience, BGI Hong Kong)
A three-part talk presented at PAG Asia 2019 in Shenzhen: The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use. Presented by Huan Liu (CNGB), Scott Edmunds (GigaScience) & Stephen Tsui (CUHK). 8th June 2019
Democratising biodiversity and genomics research: open and citizen science to... (GigaScience, BGI Hong Kong)
Scott Edmunds at the China National GeneBank Youth Biodiversity MegaData Forum: Democratising biodiversity and genomics research: open and citizen science to build trust and fill the data gaps. 18th December 2018
Ricardo Wurmus at #ICG13: Reproducible genomics analysis pipelines with GNU Guix. Presented at the GigaScience Prize Track at the International Conference on Genomics, Shenzhen, 26th October 2018
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im... (GigaScience, BGI Hong Kong)
Paul Pavlidis talk at the #ICG13 GigaScience Prize Track: Monitoring changes in the Gene Ontology and their impact on genomic data analysis (GOtrack). Shenzhen, 26th October 2018
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ... (GigaScience, BGI Hong Kong)
Stefan Prost presentation for the #ICG13 GigaScience Prize Track: Genome analyses show strong selection on coloration, morphological and behavioral phenotypes in birds-of-paradise. Shenzhen, 26th October, 2018
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67... (GigaScience, BGI Hong Kong)
Lisa Johnson's talk at the #ICG13 GigaScience Prize Track: Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. Shenzhen, 26th October 2018
Reproducible method and benchmarking publishing for the data (and evidence) d... (GigaScience, BGI Hong Kong)
Scott Edmunds presentation on: Reproducible method and benchmarking publishing for the data (and evidence) driven era. The Silk Road Forensics Conference, Yantai, 18th September 2018
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec... (GigaScience, BGI Hong Kong)
Mary Ann Tuli's talk at the International Society of Biocuration meeting: What MODs can learn from Journals – a GigaDB curator’s perspective. Shanghai, 9th April 2018
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides by Rik Marselis and me at the DASA Connect conference, 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps means. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
Scott Edmunds: Data publication in the data deluge
1. Data publication in the data deluge
Scott Edmunds, GigaScience/BGI Hong Kong
COASP 2012, Budapest, 20th September 2012
www.gigasciencejournal.com
2. The Data Challenge:
• 1.2 zettabytes (10²¹ bytes) of electronic data generated globally each year
• Exponential growth of genomics data (with growth in imaging and MS data following)
Source: http://www.genome.gov/sequencingcosts/ (with apologies)
• Issues with reproducibility, hosting, curation, interoperability
• Need for better incentives to overcome these
Source: 1. Mervis J. U.S. science policy. Agencies rally to tackle big data. Science. 2012 Apr 6;336(6077):22.
3. Large-Scale Data Journal/Database
In conjunction with:
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Commissioning Editor: Nicole Nogoy, PhD
Lead Curator: Tam Sneddon, DPhil
Data Platform: Peter Li, PhD
www.gigasciencejournal.com
4. GigaDB is a new database integrated with the GigaScience journal to meet the needs of a new generation of biological and biomedical research as it enters the era of “big data”…
6. Anatomy of a Publication
[Diagram: Idea → Study → Metadata → Data → Analysis → Answer]
7. Anatomy of a Data Publication
[Diagram: Idea → Study → Metadata → Data → Analysis → Answer]
8. Issues for Data Publication
[Diagram: Idea → Study → Metadata → Data → Analysis → Answer, annotated with cultural and technical issues]
9. Issues for Data Publication
Cultural issues: adoption held back by journal policies, citation, tracking…
[Diagram: Idea → Study → Metadata → Data → Analysis → Answer]
* T-shirts available from Graham Steel / http://www.zazzle.co.uk/steelgraham
10. Issues for Data Publication
Technical issues: what do we do with the data?
[Diagram: Idea → Study → Metadata → Data → Analysis → Answer]
11. Issues for Data Publication
Technical issues: what do we do with the data?
Lightweight:
• Metadata-only journals
• Get someone else to host
Heavyweight:
• Become a repository
12. To host or not to host?
Against: the supplementary files argument
The Journal of Neuroscience, announcement regarding supplemental material: “Beginning November 1, 2010, The Journal of Neuroscience will no longer allow authors to include supplemental material when they submit new manuscripts and will no longer host supplemental material on its web site for those articles.”
“While the size of articles has grown gradually over the past decade, the supplemental material associated with a typical Journal article appears to be growing exponentially and is rapidly approaching the size of an article. The sheer volume of supplemental material is adversely affecting peer review.”
[Figure: average size of a Journal of Neuroscience article and its supplemental material, in megabytes]
Maunsell J J. Neurosci. 2010;30:10599-10600
13. $1000 genome = million $ peer-review?
To review: (>6TBp, >1500 datasets)
S3 (storage) = $15,000
EC2 (analysis w/ BLASTx) = $500,000
Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops
14. $1000 genome = million $ peer-review?
To review: (>6TBp, >1500 datasets)
S3 (storage) = $15,000
EC2 (analysis w/ BLASTx) = $500,000
Source: Folker Meyer/Wilkening et al. 2009, CLUSTER'09. IEEE International Conference on Cluster Computing and Workshops
ENCODE analysis Virtual Machine, containing input data, code bundles with scripts and processing steps, and outputs:
AWS = ~$5,000
Source: James Taylor / http://encodeproject.org/ENCODE/integrativeAnalysis/VM
15. To host or not to host?
For: reproducibility
The Guardian, 14th September 2012: Replication is the only solution to scientific fraud.
http://www.guardian.co.uk/commentisfree/2012/sep/14/solution-scientific-fraud-replication
For: “data is the new oil”
William Gibson: “Information is the currency of the future world”
Sir Tim Berners-Lee: “Data is a precious thing and will last longer than the systems themselves”
Move compute to the data: think EC2 rather than S3
DNAnexus raised $15 million and teamed up with Google to host 0.5 PB of SRA data
Source: DNAnexus/SRA http://techcrunch.com/2011/10/12/dnanexus-raises-15-million-teams-with-google-to-host-massive-dna-database/
18. For data citation to work, needs:
1. Proven utility/potential user base.
2. Acceptance/inclusion by journals.
3. Data+Citation: inclusion in the references.
4. Tracking by citation indexes.
5. Usage of the metrics by the community…
19. Data citation 1: utility/user base.
Establishment of data DOIs and use by databases:
Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental Science. http://doi.pangaea.de/10.1594/PANGAEA.58229
Cited in:
Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic Climate Variability. Science 2005, 307:1741-1746.
Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB ID: 2P06 Crystal structure of a predicted coding region AF_0060 from Archaeoglobus fulgidus DSM 4304. doi:10.2210/pdb2p06/pdb.
Cited in:
Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36:D419-425.
20. BGI Datasets Get DOI®s
(Legend: released pre-publication; ant genome paper published in GigaScience)
Invertebrates: ants (Florida carpenter ant, Jerdon’s jumping ant, leaf-cutter ant), roundworm, Schistosoma, silkworm
Microbes: E. coli O104:H4 TY-2482, T2D gut metagenome
Vertebrates: Darwin’s finch, giant panda, macaque (Chinese rhesus, crab-eating), mini-pig, naked mole rat, parrot (Puerto Rican), penguin (emperor, Adélie), pigeon (domestic), polar bear, sheep, Tibetan antelope
Cell lines: Chinese hamster ovary, mouse methylomes
Human: Asian individual (YH): DNA methylome, genome assembly, transcriptome; cancer (14 TB); single-cell bladder cancer; HBV-infected exomes; ancient DNA (Saqqaq Eskimo, Aboriginal Australian)
Plants: Chinese cabbage, cucumber, foxtail millet, pigeonpea, potato, sorghum
21. Our first DOI:
To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011): Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
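A dataset DOI like the one above can be rendered both in the `doi:` form used in the reference list and as a resolvable URL. As a rough sketch (the helper below is hypothetical, not a GigaDB or DataCite API; the citation layout is the one used on this slide):

```python
# Hypothetical helper: assemble a dataset citation string of the form
# "Authors (Year) Title. Publisher. doi:DOI", plus a resolvable URL.

def format_data_citation(authors, year, title, publisher, doi):
    """Return (citation, url) for a dataset DOI such as 10.5524/100001."""
    citation = f"{authors} ({year}) {title}. {publisher}. doi:{doi}"
    url = f"http://dx.doi.org/{doi}"  # resolves via the DOI handle system
    return citation, url

citation, url = format_data_citation(
    authors="Li, D; Xi, F; et al.",  # author list abbreviated for brevity
    year=2011,
    title="Genomic data from Escherichia coli O104:H4 isolate TY-2482",
    publisher="BGI Shenzhen",
    doi="10.5524/100001",
)
print(citation)
print(url)
```

Either form identifies the same object; the URL form simply delegates lookup to the DOI resolver.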
22.-24. [Image-only slides]
25. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin-producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
31. Is the DOI…
* Certain types of genomics data must also be deposited in INSDC databases (SRA & GenBank).
32. And in more journals…
Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma vilgalysii, a new basidiolichen from the New World. Dryad Digital Repository. doi:10.5061/dryad.j1g5dh23
Cited in:
Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen from the New World. Mycological Progress 2012. Advance Online Publication.
Roberts SB (2012) Herring Hepatic Transcriptome 34300 contigs.fa. Figshare. Available: hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575. Accessed 2012 Jan 20.
Cited in:
Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2): e30908. doi:10.1371/journal.pone.0030908
33. For data citation to work, needs:
1. Proven utility/potential user base. ✔
2. Acceptance/inclusion by journals. ✔
3. Data+Citation: inclusion in the references. ✔
4. Tracking by citation indexes.
5. Usage of the metrics by the community…
35. Data citation 4: tracking?
✗ FAIL
DataCite metadata is available in harvestable form (OAI-PMH), and Google Scholar lists some DataCite DOIs, but says the datasets listed are the “result of approximations in the indexing algorithms” and that “Google Scholar's intended coverage is for scholarly articles. At this point, we don't include datasets.”
36. Data citation 4: tracking?
✗FAIL
DataCite metadata in harvestable form (OAI-PMH)
✗ Working on it. Coming soon…
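The harvestable form mentioned above is the standard OAI-PMH protocol. As an illustrative sketch of how an indexer could pull DataCite DOI metadata in bulk, the snippet below builds a `ListRecords` request URL (the endpoint `oai.datacite.org/oai` and the `oai_dc` metadata format follow DataCite's OAI-PMH service; the helper function itself is hypothetical):

```python
# Sketch: constructing an OAI-PMH v2.0 ListRecords request against
# DataCite's harvesting endpoint, the bulk route by which citation
# indexes could pick up dataset DOI metadata.
from urllib.parse import urlencode

OAI_BASE = "https://oai.datacite.org/oai"

def list_records_url(metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Return a ListRecords request URL for the DataCite OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvest: records changed since this date
    if set_spec:
        params["set"] = set_spec    # e.g. restrict to one data centre's records
    return f"{OAI_BASE}?{urlencode(params)}"

print(list_records_url(from_date="2012-01-01"))
```

Fetching that URL returns an XML page of records plus a resumption token for paging through the full set, per the OAI-PMH specification.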
37.
38. Data citation 5: metrics?
“As a result of diverse practices and tool
limitations, data citations are currently very
difficult to track.”
39. Data citation 5: metrics?
✗ FAIL
Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-data-advocate-please-sign-the-petition-oamonday/
“I’m afraid we are making promises to data creators about attribution and reward that we can’t keep. ‘Make your data citeable!’ is the cry. OK. So citeable is step one. Cited is step two. But for the citation to be useful, it has to be indexed so that citation metrics can be tracked and admired and used. Who is indexing data citations right now? As far as I can tell: absolutely no one.”
40. Where data citation is in 2012:
1. Proven utility/potential user base. ✔
2. Acceptance/inclusion by journals. ✔
3. Data+Citation: inclusion in the references. ✔
4. Tracking by citation indexes. ✔/✗
5. Usage of the metrics by the community… ✗
42. Addressing the reproducibility gap:
Computable methods/workflow systems
[Diagram: bioinformatics development → biomedical and bioinformatics research → publishing]
43. Redefining what a paper is in the era of big data
Goal: executable research objects, with a citable DOI
46. Methods + Data + Publication
• Background
• Methods (DOI for workflows?)
• Results (Data): doi:10.5524/100035
• Conclusions/Discussion
Paper: doi:10.1186/2047-217X-1-3
47. Data + Methods = Analysis
doi:10.5524/100035 + DOI: X = doi:10.1186/2047-217X-1-3
DOI: A + DOI: X = DOI: 1
48. Data + Methods = Analysis
DOI: A + DOI: X = DOI: 1
DOI: B + DOI: X = DOI: 2
49. Data + Methods = Analysis
DOI: A + DOI: X = DOI: 1
DOI: B + DOI: X = DOI: 2
DOI: A + DOI: Y = DOI: 3
50. Data + Methods = Analysis
DOI: A + DOI: X = DOI: 1
DOI: B + DOI: X = DOI: 2
DOI: A + DOI: Y = DOI: 3
A, B, C… + X, Y, Z… = 4, 5, 6…
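The combinatorics on these slides, where the same data DOI reused with different method DOIs yields distinct citable analyses, can be sketched as a simple provenance lookup (illustrative only; apart from the two DOIs shown on the slides, the DOI values are placeholders):

```python
# Illustrative sketch (not a real registry): the slides' pattern of
# data DOI + methods DOI = analysis DOI as a provenance lookup.
analyses = {}  # (data_doi, methods_doi) -> analysis_doi

def register_analysis(data_doi, methods_doi, analysis_doi):
    """Record that an analysis object was produced from data + methods."""
    analyses[(data_doi, methods_doi)] = analysis_doi

# The concrete pairing shown on the slides:
register_analysis("doi:10.5524/100035", "DOI:X", "doi:10.1186/2047-217X-1-3")
# Placeholder DOIs, as on the slides: the same methods on new data, and
# new methods on the same data, each yield a new citable analysis.
register_analysis("DOI:A", "DOI:X", "DOI:1")
register_analysis("DOI:B", "DOI:X", "DOI:2")
register_analysis("DOI:A", "DOI:Y", "DOI:3")

def analyses_using_data(data_doi):
    """All analyses derived from one dataset, whatever the methods."""
    return [a for (d, _m), a in analyses.items() if d == data_doi]

print(analyses_using_data("DOI:A"))  # both reuses of dataset A
```

Because each component carries its own DOI, reuse of a dataset or a workflow stays trackable across every analysis built on it, which is exactly the incentive argument the slides are making.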
52. Different shaped publishable objects
Different levels of granularity:
• Experiment (e.g. ACRG project): e.g. doi:10.5524/100001 → Papers
• Datasets (e.g. cancer type): e.g. doi:10.5524/100001-2 → Data/Micropubs
• Sample (e.g. specimen xyz): e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz
• Smaller still? Facts/Assertions (~10¹³ in the literature) → Nanopubs
53. Adding “value” to data publishing
• Scope for different shaped publishable objects
• Scope for publishing methods/executable papers
• Peer review of data is problematic:
– Post-publication peer review
– Change criteria (assess on transparency/access only)
– Better use of workflows/cloud/VMs
DOIs are cheap*, data is precious: maximise its use
* ish
54. Adding “value” to data publishing
DOIs are cheap*, data is precious: maximise its use
* ish Source: Ross Mounce CC-BY http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
55. Thanks to:
The GigaScience team: Laurie Goodman, Tam Sneddon, Nicole Nogoy, Alexandra Basford, Peter Li, Jesse Si Zhe
Collaborators: Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Huayen Gao (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Cogini
editorial@gigasciencejournal.com
Contact us: database@gigasciencejournal.com
@gigascience
Follow us: facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
www.gigadb.org
www.gigasciencejournal.com
Editor's Notes
and an advanced search option…
Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.
Leading on from that, current and future plans include collaborating with Tin-Lap Lee at the Chinese University of Hong Kong to integrate an instance of the Galaxy bioinformatics platform with GigaDB, so users can make full use of the data in GigaDB by linking it to other resources and we can incorporate fully executable papers. One such submission is a new SOAPdenovo pipeline: the SOAP tools have been wrapped in Galaxy, the workflow defined in MyExperiment, and the data will be issued with a DOI and accessible via GigaDB. Utilizing the BGI cloud if necessary, users will then be able to reproduce all the steps described in the GigaScience paper to test, reanalyze, compare results, etc. Since we would like GigaDB to be a host for data types that have no other home, such as imaging data, we are investigating adding other tools, such as an image viewer, to support accessibility to and usability of the data. So, if you have a large-scale biological or biomedical dataset and/or a pipeline or software that you would like to submit to GigaScience, we would love to hear from you, so please come and talk to Scott or myself.
That just leaves me to thank the GigaScience team: Laurie, Scott, Alexandra, Peter and Jesse; BGI for their support, specifically Shaoguang for IT and bioinformatics support; our collaborators on the database, website and tools: Tin-Lap, Qiong, Senhong, Yan and the Cogini web design team; DataCite for providing the DOI service; and the isacommons team for their support and advocacy for best-practice use of metadata reporting and sharing. Thank you for listening.