This document discusses issues around data sharing in genomics research. It provides background on the history of genomics projects like the Human Genome Project. It then discusses BGI's role in large-scale sequencing efforts and their goal of making sequencing data highly accessible. It also discusses challenges around sharing large volumes of genomic data and ensuring proper attribution and credit for data sharing. Issues around data citation are examined, including the need for data citations to be tracked by citation indexes and for metrics around data citations to be utilized by the research community.
2. Overview
Introduction
Genomics #101
Data-Sharing Issues
Adventures in Data Citation
How it’s working…
Our Examples
Downstream consequences…
My two RMB/what is still needed…
3. A brief history of genomics…
Human Genome Project: 1990-2003.
1 Genome = $3 Billion
Source: http://www.genome.gov/Images/press_photos/highres/38-300.jpg
4. A brief history of genomics…
Source: http://www.genome.gov/sequencingcosts/ (with apologies)
5. A brief history of genomics…
1st Gen 2nd (next) Gen
3rd (next-next) Gen?
Source: http://www.genome.gov/sequencingcosts/ (with apologies)
6. A brief history of genomics…
3rd (next-next) Gen?
Source: http://www.genome.gov/sequencingcosts/ (with apologies)
7. BGI Introduction
• Formerly known as Beijing Genomics Institute
• Founded in 1999 (1% of HGP)
• Not-for-profit research institute funded by
commercial sequencing-as-a-service
• Now the largest genomic organization in the world
• Goal
– Use genomics technology to impact the society
– Make leading edge genomics highly
accessible to the global research community
10. Global Sequencing Capacity
Data Production
5.6 Tb / day
> 1500X of human genome / day
Multiple Supercomputing Centers
157 TFlops
20 TB Memory
14.7 PB Storage
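As a rough check on the "> 1500X of human genome / day" figure, the arithmetic can be sketched as below. This is only an illustration: it assumes "Tb" means terabases and takes a haploid human genome of roughly 3.1 gigabases, a value that is not given on the slide.

```python
# Assumption (not from the slide): a haploid human genome is ~3.1 gigabases.
HUMAN_GENOME_BASES = 3.1e9
# From the slide: 5.6 Tb of raw sequence per day, read here as terabases.
DAILY_OUTPUT_BASES = 5.6e12


def genome_equivalents_per_day(daily_bases: float,
                               genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Return how many genome-sized amounts of raw sequence are produced per day."""
    return daily_bases / genome_bases


fold = genome_equivalents_per_day(DAILY_OUTPUT_BASES)
print(f"about {fold:.0f}x human genome per day")
```

Under these assumptions the result comes out at roughly 1,800-fold, which is consistent with the "> 1500X" stated on the slide.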
11. BGI Sequencing Capacity
Sequencers
137 Illumina/HiSeq 2000
27 LifeTech/SOLiD 4
1 454 GS FLX+
2 Illumina iScan
1 Illumina MiSeq
1 Ion Torrent
Data Production
5.6 Tb / day
> 1500X of human genome / day
Multiple Supercomputing Centers
157 TFlops
20 TB Memory
14.7 PB Storage
13. Goal – “Just sequence it.”
M+M+M: Million Genome Projects
• Plant and Animal Genomes: G10K, i5K...
• Variation Genomes: 10K rice resequencing....
• Human Genomes: Ancient, Population, Medical
• Cell Genomes: cancer single cell
• Micro Ecosystems: Metahit, EMP, etc.
• Personal Genomes
18. Genomics Data Sharing Policies…
Bermuda Accords 1996/1997/1998:
1. Automatic release of sequence assemblies within 24 hours.
2. Immediate publication of finished annotated sequences.
3. Aim to make the entire sequence freely available in the public domain for
both research and development in order to maximise benefits to society.
Fort Lauderdale Agreement, 2003:
1. Sequence traces from whole genome shotgun projects are to be
deposited in a trace archive within one week of production.
2. Whole genome assemblies are to be deposited in a public nucleotide
sequence database as soon as possible after the assembled sequence
has met a set of quality evaluation criteria.
Toronto International data release workshop, 2009:
The goal was to reaffirm and refine, where needed, the policies related to
the early release of genomic data, and to extend, if possible, similar data
release policies to other types of large biological datasets – whether from
proteomics, biobanking or metabolite research.
19. Challenges for the future…
(A) Cumulative base pairs in INSDC over
time, excluding the Trace Archive.
(B) Base pairs in INSDC, broken down into
selected data components.
Published by Oxford University Press 2011.
Karsch-Mizrachi I et al. Nucl. Acids Res. 2012;40:D33-D37
20. Challenges for the future…
1. Data Volumes (transfer, backlogs, funding issues)
2. Compliance
3. Lack of interoperability/sufficient metadata
4. Long tail of curation (“Democratization” of “big-data”)
21. New incentives/credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to
public repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a
particular data set would enable appropriate attribution for those
who share.”
Nature Biotechnology 27, 579 (2009)
Prepublication data sharing
(Toronto International Data Release Workshop)
“Data producers benefit from creating a citable reference, as it can
later be used to reflect impact of the data sets.”
Nature 461, 168-170 (2009)
22. New incentives/credit
= Data Citation?
“increase acceptance of research data as
legitimate, citable contributions to the
scholarly record”.
“data generated in the course of research
are just as valuable to the ongoing
academic discourse as papers and
monographs”.
23. First issue next month…
Large-Scale Data
Journal/Database
In conjunction with:
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Assistant Editor: Alexandra Basford, PhD
Lead Curator: Tam Sneddon D.Phil
www.gigasciencejournal.com
27. For data citation to work, needs:
1. Proven utility/potential user base.
2. Acceptance/inclusion by journals.
3. Data+Citation: inclusion in the references.
4. Tracking by citation indexes.
5. Usage of the metrics by the community…
28. Datacitation 1: utility/user base.
Establishment of data DOIs and use by databases:
Shackleton NJ, Hall MA, Vincent E (2001): Mean stable carbon isotope ratios
of Cibicidoides wuellerstorfi from sediment core MD95-2042 on the Iberian
margin, North Atlantic. PANGAEA - Data Publisher for Earth & Environmental
Science. http://doi.pangaea.de/10.1594/PANGAEA.58229
Cited in:
Pahnke K, Zahn R: Southern Hemisphere Water Mass Conversion Linked with North Atlantic
Climate Variability. Science 2005, 307:1741 -1746.
Nocek B, Xu X, Savchenko A, Edwards A, Joachimiak A. 2007. PDB
ID: 2P06 Crystal structure of a predicted coding region AF_0060
from Archaeoglobus fulgidus DSM 4304. 10.2210/pdb2p06/pdb.
Cited in:
Andreeva A, Howorth D, Chandonia J-M, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: Data
growth and its impact on the SCOP database: new developments. Nucleic Acids Res.
2008, 36:D419-425.
29. BGI Datasets Get DOI®s
Many released pre-publication…
Invertebrate
Ant
- Florida carpenter ant
- Jerdon’s jumping ant
- Leaf-cutter ant
Roundworm
Schistosoma
Silkworm
Plants
Chinese cabbage
Cucumber
Foxtail millet
Pigeonpea
Potato
Sorghum
Vertebrates
Giant panda
Macaque
- Chinese rhesus
- Crab-eating
Mini-Pig
Naked mole rat
Penguin
- Emperor penguin
- Adelie penguin
Pigeon, domestic
Polar bear
Sheep
Tibetan antelope
Human
Asian individual (YH)
- DNA Methylome
- Genome Assembly
- Transcriptome
Cancer (14TB)
Ancient DNA
- Saqqaq Eskimo
- Aboriginal Australian
Microbe
E. Coli O104:H4 TY-2482
Cell-Line
Chinese Hamster Ovary
doi:10.5524/100004
30. Our first DOI:
To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen,Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song,Y ; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
34. Downstream consequences:
1. Therapeutics (primers, antimicrobials)
2. Platform Comparisons (Loman et al., Nature Biotech 2012)
3. Speed/legal-freedom
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
38. • Data submitted to NCBI databases:
- Raw data SRA:SRA046843
- Assemblies of 3 strains Genbank:AHAO00000000-AHAQ00000000
- SNPs dbSNP:1056306
- CNVs, InDels, SV dbVAR:nstd63
• Submission to public databases complemented by
its citable form in GigaDB (doi:10.5524/100012).
44. And in more journals…
Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma
vilgalysii, a new basidiolichen from the New World. Dryad Digital
Repository. doi:10.5061/dryad.j1g5dh23
Cited in:
Hodkinson BP, Uehling JK, Smith ME: Lepidostroma vilgalysii, a new basidiolichen
from the New World. Mycological Progress 2012. Advance Online Publication.
Roberts SB (2012) Herring Hepatic Transcriptome 34300 contigs.fa. Figshare.
Available: hdl.handle.net/10779/084d34370fbda29bbc67b3c5ecb02575.
Accessed 2012 Jan 20.
Cited in:
Roberts SB, Hauser L, Seeb LW, Seeb JE (2012) Development of Genomic Resources
for Pacific Herring through Targeted Transcriptome Pyrosequencing. PLoS ONE 7(2):
e30908. doi:10.1371/journal.pone.0030908
45. For data citation to work, needs:
1. Proven utility/potential user base. ✔
2. Acceptance/inclusion by journals. ✔
3. Data+Citation: inclusion in the references. ✔
4. Tracking by citation indexes.
5. Usage of the metrics by the community…
47. Datacitation 4: tracking?
✗FAIL
DataCite metadata in harvestable form (OAI-PMH)
- lists some DataCite DOIs, but says:
Datasets listed are the “result of approximations in the indexing
algorithms.”
“Google Scholar's intended coverage is for scholarly articles. At
this point, we don't include datasets.”
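For readers unfamiliar with OAI-PMH, a harvesting request is just an HTTP GET with a `verb` and a `metadataPrefix`. The sketch below only builds such a URL and does not contact any server. The base URL is an assumption about where DataCite exposes its OAI-PMH service and should be checked against the provider's documentation; `list_records_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: DataCite's OAI-PMH base URL. Verify before relying on it.
OAI_BASE_URL = "https://oai.datacite.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access is performed)."""
    # 'verb' and 'metadataPrefix' are required by the protocol for ListRecords;
    # 'oai_dc' (unqualified Dublin Core) is the format every repository must support.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(OAI_BASE_URL, from_date="2012-01-01")
print(url)
```

A citation index could, in principle, poll a URL like this to pick up newly registered dataset DOIs; whether any index does so is exactly the open question on this slide.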
48. Datacitation 4: tracking?
✗FAIL
DataCite metadata in harvestable form (OAI-PMH)
✗ Working on it. Coming soon?
…the final
challenge?
50. Datacitation 5: metrics?
“As a result of diverse practices and tool
limitations, data citations are currently very
difficult to track.”
51. Datacitation 5: metrics?
✗FAIL
Research Remix, 29th May 2012: http://researchremix.wordpress.com/2012/05/29/dear-research-data-advocate-please-sign-the-petition-oamonday/
I’m afraid we are making promises to data
creators about attribution and reward that we
can’t keep. ”Make your data citeable!” is the cry.
Ok. So citeable is step one. Cited is step two. But
for the citation to be useful, it has to be indexed
so that citation metrics can be tracked and
admired and used.
Who is indexing data citations right now? As far
as I can tell: absolutely no one.
52. Where data citation is in 2012:
1. Proven utility/potential user base. ✔
2. Acceptance/inclusion by journals. ✔
3. Data+Citation: inclusion in the references. ✔
4. Tracking by citation indexes. ✗
5. Usage of the metrics by the community… ✗
53. Minor quibbles: export to citation managers
DCC/DataCite recommended format:
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S;
Ramachandran, S; Liu, C-M; Jing, H-C; (2011): Genome data from sweet and grain
sorghum (Sorghum bicolor); GigaScience. http://dx.doi.org/10.5524/100012
formatting:
Zheng, L-Y (2011). Genome data from sweet and grain sorghum (Sorghum bicolor).
GigaScience. Retrieved from http://dx.doi.org/10.5524/100012
Mendeley formatting:
Zheng L-Y; Guo X-S; He B; Sun L-J; Peng Y; Dong S-S; Liu T-F; Jiang S;
Ramachandran S; Liu C-M; Jing H-C: Genome data from sweet and grain sorghum
(Sorghum bicolor). 2011.
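The recommended layout above (creators; year; title; publisher; identifier) is simple enough to generate from structured metadata, which is one way to avoid the truncated or mangled author lists shown for the citation managers. The following is a minimal sketch of that idea, not any tool's actual export code; `format_data_citation` and its parameters are hypothetical names.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Format a dataset citation as: Creators (Year): Title; Publisher. DOI-URL."""
    # Keep every creator, separated by semicolons, rather than only the first.
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}; {publisher}. http://dx.doi.org/{doi}"


# Shortened creator list for illustration; the full list is on the slide.
citation = format_data_citation(
    creators=["Zheng, L-Y", "Guo, X-S", "He, B"],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)
print(citation)
```

Generating the string from a list of creators keeps the full authorship intact, which matters if data citations are to carry credit for everyone who produced the dataset.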
54. Minor quibbles: clearer guidelines
Rules for versioning/where do you set granularity?
Experiment (e.g. ACRG project): e.g. doi:10.5524/100001 (Papers)
Datasets (e.g. cancer type): e.g. doi:10.5524/100001-2 (Data/Micropubs)
Sample (e.g. specimen xyz): e.g. doi:10.5524/100001-2000 or doi:10.5524/100001_xyz
Smaller still? Facts/Assertions (~10^13 in literature) (Nanopubs)
57. Papers in the era of big-data
goal: Executable Research Objects
July 2012 Wilson GA, Dhami P, Feber A, Cortázar D, Suzuki Y, Schulz R, Schär P, Beck S:
Resources for methylome analysis suitable for gene knockout studies of
potential epigenome modifiers. GigaScience 2012, 1:3. (in press)
GigaDB hosting all data + tools (84GB total): doi:10.5524/100035
+
Partial (~80%) integration of workflow into our data platform.
(all the data processing steps, but not the enrichment analysis)
Data in ISA-Tab compliant format
Next stage… Papers fully integrating all data + all workflows in our platform.
58. Do you have interesting large-scale
biological data sets?
Submit to:
• Rapid review/Open Access/High-visibility
• Article Processing Charge covered by BGI
• Hosting of any test datasets/workflows in GigaDB
Interested in Reproducible Research?
Take part in our session on: “Cloud and workflows for reproducible bioinformatics”
59. Thanks to:
Laurie Goodman Alexandra Basford
Tam Sneddon Shaoguang Liang
Tin-Lap Lee (CUHK) Qiong Luo (HKUST)
scott@gigasciencejournal.com
Contact us:
editorial@gigasciencejournal.com
@gigascience
Follow us: facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog/
www.gigasciencejournal.com
Editor's Notes
BGI (formerly known as Beijing Genomics Institute) was founded in 1999 and has since become the largest genomic organization in the world, with a focus on research and applications in healthcare, agriculture, conservation, and bio-energy fields. Our goal is to make leading-edge genomics highly accessible to the global research community by leveraging industry’s best technology, economies of scale and expert bioinformatics resources. BGI Americas was established as an interface with customers and collaborators in North and South America.
Our facilities feature Sanger and next-generation sequencing technologies, providing the highest throughput sequencing capacity in the world. Powered by 137 Illumina HiSeq 2000 instruments and 27 Applied Biosystems SOLiD™ 4 Systems, we provide high-quality sequencing results with industry-leading turnaround time. As of December 2010, our sequencing capacity is 5 Tb raw data per day, supported by several supercomputing centers with a total peak performance up to 102 Tflops, 20 TB of memory, and 10 PB storage. We provide stable and efficient resources to store and analyze massive amounts of data generated by next generation sequencing.
Our facilities feature Sanger and next-generation sequencing technologies, providing the highest throughput sequencing capacity in the world. Powered by 137 Illumina HiSeq 2000 instruments and 27 Applied Biosystems SOLiD™ 4 Systems, we provide high-quality sequencing results with industry-leading turnaround time. As of December 2010, our sequencing capacity is 5 Tb raw data per day, supported by several supercomputing centers with a total peak performance up to 102 Tflops, 20 TB of memory, and 15 PB storage. We provide stable and efficient resources to store and analyze massive amounts of data generated by next generation sequencing. The LHC of Biology?
Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). GigaDB complements the traditional public databases by having all these “extra” data types; it’s all in one place, and it’s citable.