The Royal Society of Chemistry is one of the world's foremost scientific societies, a primary publisher for the chemical sciences, and an innovator in the domain of eScience. To deliver on a number of our eScience projects we use several components of Advanced Chemistry Development (ACD/Labs) software, including nomenclature, physicochemical property prediction, spectroscopy tools and the ACD/I-Lab web-based system. This presentation will provide an overview of several RSC projects where ACD/Labs software has played an important role in delivery, including ChemSpider and the National Chemical Database Service for the United Kingdom. We will also outline our vision for a repository of various types of experimental chemistry data, how we foresee using prediction and validation software to characterize the data, and the potential to generate predictive models from the data. This couples directly with our intention to data-enable our publication archive of over 300,000 articles, extracting chemicals, reactions and analytical data from the historical records.
A presentation on the SageCite project given at the JISC MRD International Workshop in March 2011. Describes the application domain and citation challenges in SageCite.
Research Data (and Software) Management at Imperial: (Everything you need to ... — Sarah Anna Stewart
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi... — Big Data Value Association
At the heart of this DataBench webinar is the goal to share a benchmarking process helping European organisations developing Big Data Technologies to reach for excellence and constantly improve their performance, by measuring their technology development activity against parameters of high business relevance.
The webinar aims to provide the audience with a framework and tools to assess the performance and impact of Big Data and AI technologies, drawing on real insights from DataBench. In addition, representatives from other projects in the BDV PPP, such as DeepHealth and They-Buy-for-You, will share the challenges and opportunities they have identified in the use of Big Data, analytics and AI. The perspectives of other projects that have also looked into benchmarking, such as Track&Now and I-BiDaaS, will be introduced.
Database Security – Issues and Best Practices: Outline — OllieShoresna
Database Security – Issues and Best Practices
Outline
• Intro to Database Security
• Need for Database Security
• Database Security Fundamentals
• Database Security Issues
• OWASP Top 10 – A1:2017 – Injection
• OWASP Top 10 – A3:2017 – Sensitive Data Exposure
• Attacks against Database Security Mechanisms
• Database Security Best Practices
Intro to Database Security
Intro to Database Security
• How does a web application work?
(Diagram: a client sends requests to a server, which involves databases)
Intro to Database Security (contd.)
•Database
• A database is “an organized collection of structured information, or
data, typically stored electronically in a computer system”
• It includes: the data, the DBMS, & applications that use them
•Database Management Systems (DBMS):
• DBMS serve “as an interface between the database and its end
users or programs, allowing users to retrieve, update, and manage
how the information is organized and optimized”
Source: What is a Database – Oracle –
https://www.oracle.com/database/what-is-database.html
Intro to Database Security (contd.)
•Database Management Systems (DBMS) (continued):
• DBMS also facilitate “oversight and control of databases, enabling a
variety of administrative operations such as performance
monitoring, tuning, and backup and recovery”
• Types:
• Relational, Object-Oriented, Distributed, Data Warehouses, Open Source,
Cloud, Autonomous, etc.
• Examples:
• Oracle, SQL Server, MySQL, Microsoft Access, MariaDB, PostgreSQL, etc.
Source: What is a Database – Oracle –
https://www.oracle.com/database/what-is-database.html
https://www.youtube.com/watch?v=_p00AzHE5U4
Intro to Database Security (contd.)
•Database Tutorial for Beginners – Lucidchart
Source: Lucidchart – Database Tutorial for Beginners –
https://www.youtube.com/watch?v=wR0jg0eQsZA
Intro to Database Security (contd.)
•Database security refers to “the range of tools, controls, and
measures designed to establish and preserve database
confidentiality, integrity, and availability” (IBM, 2019)
•Database security involves protection of
• The data in the database
• The database management system (DBMS) itself
• Any associated applications (including web applications)
• The physical and/or virtual database server farms and their
underlying hardware
• The computing and/or network infrastructure used to access
the database (IBM, 2019)
https://www.ibm.com/cloud/learn/database-security
Intro to Database Security (contd.)
•Database security involves securing data
• At rest
• Using techniques such as encryption
• Example: Amazon RDS uses 256-bit Advanced Encryption Standard (AES) for
securing database instances, automated backups, and snapshots at rest
• In flight
• Using protocols such as Transport Layer ...
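The truncated "in flight" bullet refers to Transport Layer Security (TLS). As a minimal sketch of that half, using only Python's standard `ssl` module (the commented-out connection code and its host/port are illustrative placeholders, not from the slides):

```python
import ssl

# Client-side TLS context with certificate and host-name verification,
# so data exchanged with a database or API endpoint is encrypted in flight.
context = ssl.create_default_context()

# Refuse legacy protocol versions; TLS 1.2 is a common modern floor.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# The default context already enforces verification:
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True

# A real connection would wrap a TCP socket, e.g. (hypothetical host/port):
#   import socket
#   with socket.create_connection(("db.example.com", 5432)) as sock:
#       with context.wrap_socket(sock, server_hostname="db.example.com") as tls:
#           ...  # all bytes on `tls` are now encrypted in transit
```

In practice the database driver (for example, a PostgreSQL or MySQL client library) usually builds an equivalent context for you when TLS is enabled in its connection settings.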
Doing Analytics Right - Building the Analytics Environment — Tasktop
Implementing analytics for development processes is challenging. As discussed in the previous webinars, the right analytics are determined by the goals of the organization, not by the available data. Implementing your analytics solution will therefore require an efficient analytics and data architecture, including the ability to combine and stage data from heterogeneous sources. An architecture that cannot access the necessary data will create a barrier to deploying your newly designed analytics program, and will force you back into the “light is brighter here” anti-pattern.
This webinar will describe the technical considerations of implementing the data architecture for your analytics program, and explain how Tasktop can help.
Myth Busters: I’m Building a Data Lake, So I Don’t Need Data Virtualization (... — Denodo
Watch full webinar here: https://bit.ly/3kr0oq4
So you’re building a data lake to solve your big data challenges. A data lake will allow you to keep all of your raw, detailed data in a single, consolidated repository; therefore, your problem is solved. Or is it? Is it really that easy?
Data lakes have their use and purpose, and we’re not here to argue that. However, data lakes on their own are constrained by factors such as duplication of data and therefore higher costs, governance limitations, and the risk of becoming another data silo.
With the addition of data virtualization, a physical data lake can turn into a virtual or logical data lake through an abstraction layer. Data virtualization can facilitate and expedite access to and exploration of critical data in a cost-effective manner, and help derive a greater return on the data lake investment.
You might still not be convinced. Give us an opportunity and join us as we try to bust this myth!
Watch this webinar as we explore the promises of a data lake as well as its downfalls to draw a final conclusion.
The guidelines are targeted at academic institutions in developing countries worldwide who want to start an open access research repository and who want to know in detail what is required and how to do it step by step. This soup-to-nuts overview may be particularly useful for those involved in the early stages of planning for an institutional repository. The focus during development of the open system has been long-term repository preservation, security, stability and interoperability on the internet.
Slides used to present WireCloud, WStore and WMarket during ICT 2015, which took place in Lisbon.
WireCloud, WStore and WMarket are generic enablers provided by FIWARE and developed by Universidad Politécnica de Madrid.
BioCASE web services for germplasm data sets, at FAO, Rome (2006) — Dag Endresen
Sharing of biodiversity data with web services - demonstration of the BioCASE software. Food and Agriculture Organization of the United Nations (FAO) 2nd March 2006.
Cape Town - Bioschemas workshop before the Bioinformatics Education Summit.
Explains schema.org, Bioschemas, the TeSS case study, and the tools and implementation techniques adopters can use.
The data behind the Biogeographic Atlas of the Southern Ocean — Anton Van de Putte
Griffiths, H., Van de Putte, A., Danis, B., De Broyer, C., Koubbi, P., Raymond, B., D'Udekem D'Acoz, C., David, B., Grant, S., Gutt, J., Held, C., Hosie, G., Huettmann, F., Post, A. and Ropert-Coudert, Y. (2014): The data behind the Biogeographic Atlas of the Southern Ocean, 2014 XXXIII SCAR Open Science Conference, Auckland, New Zealand, 23 August 2014 – 3 September 2014
The Antarctic Master Directory, sharing Antarctic (meta)data from multiple di... — Anton Van de Putte
International Workshop on Sharing , Citation and Publication of Scientific Data across Disciplines.
Joint Support-Center for Data Science Research (DS) ,
Tachikawa, Tokyo, Japan
Adjusting primitives for graph: SHORT REPORT / NOTES — Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
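The float-vs-bfloat16 storage comparison above can be imitated in plain Python by truncating a value's float32 bit pattern to its top 16 bits, which is bfloat16's layout (sign, 8 exponent bits, 7 mantissa bits). This is only a sketch of the storage-type effect; the CUDA/OpenMP machinery from the experiments is omitted:

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 storage: keep only the top 16 bits of the
    float32 representation of x (truncation, not rounding)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

values = [0.1] * 10_000  # true sum is 1000.0

# Accumulate in double precision (Python float).
sum_f64 = 0.0
for v in values:
    sum_f64 += v

# Accumulate with bfloat16 storage for both elements and the accumulator.
sum_bf16 = 0.0
for v in values:
    sum_bf16 = to_bf16(sum_bf16 + to_bf16(v))

# With only ~7 mantissa bits the accumulator stalls once 0.1 drops below
# its rounding granularity, so sum_bf16 ends up far from 1000.
print(sum_f64, sum_bf16)
```

This is why reductions typically keep a wider accumulator, or use a pairwise/tree reduction, even when the storage type is bfloat16.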
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices, which share the same in-links, avoids duplicate computations and could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
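As a toy illustration of the first of these ideas, the sketch below skips recomputation for vertices none of whose in-neighbors changed in the previous sweep. The graph, damping factor and tolerance are made up for illustration, and plain power iteration stands in for the full STICD pipeline:

```python
def pagerank_basic(out_adj, d=0.85, iters=200):
    """Plain power iteration; assumes every vertex has at least one
    out-edge (no dangling nodes), matching the text's precondition."""
    n = len(out_adj)
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        for u, nbrs in enumerate(out_adj):
            share = d * r[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        r = nxt
    return r

def pagerank_skip(out_adj, d=0.85, tol=1e-12, iters=200):
    """Power iteration that recomputes only vertices with an in-neighbor
    whose rank changed by more than tol in the previous sweep."""
    n = len(out_adj)
    in_adj = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_adj):
        for v in nbrs:
            in_adj[v].append(u)
    r = [1.0 / n] * n
    affected = set(range(n))
    for _ in range(iters):
        if not affected:
            break  # everything converged: remaining sweeps are skipped
        changed = []
        for v in affected:
            rv = (1.0 - d) / n + d * sum(r[u] / len(out_adj[u]) for u in in_adj[v])
            if abs(rv - r[v]) > tol:
                changed.append((v, rv))
        affected = set()
        for v, rv in changed:
            r[v] = rv
            affected.update(out_adj[v])  # only their out-neighbors can move next
    return r

# Toy graph with no dangling nodes: two 2-cycles joined by the edge 1 -> 2.
out_adj = [[1], [0, 2], [3], [2]]
r_basic = pagerank_basic(out_adj)
r_skip = pagerank_skip(out_adj)
```

Both variants converge to the same ranks; the skipping variant simply does less work per sweep as more of the graph settles.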
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... — Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
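The decomposition the abstract describes can be sketched on a toy graph. Here the strongly connected components and their topological levels are written out by hand (a real implementation would compute them, e.g. with Tarjan's algorithm), the graph has no dead ends to match the stated precondition, and all numbers are illustrative:

```python
def monolithic_pagerank(out_adj, d=0.85, iters=200):
    """Standard power iteration over all vertices each sweep."""
    n = len(out_adj)
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        for u, nbrs in enumerate(out_adj):
            share = d * r[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        r = nxt
    return r

def levelwise_pagerank(out_adj, levels, d=0.85, iters=200):
    """Process SCCs one topological level at a time: ranks from earlier
    levels are frozen, so each component iterates only over its own
    vertices, with the external contribution computed once."""
    n = len(out_adj)
    in_adj = [[] for _ in range(n)]
    for u, nbrs in enumerate(out_adj):
        for v in nbrs:
            in_adj[v].append(u)
    r = [1.0 / n] * n
    done = set()
    for comp in levels:
        comp_set = set(comp)
        # Contribution from already-finalised levels is a constant per vertex.
        ext = {v: sum(d * r[u] / len(out_adj[u]) for u in in_adj[v] if u in done)
               for v in comp}
        for _ in range(iters):
            nxt = {v: (1.0 - d) / n + ext[v]
                      + sum(d * r[u] / len(out_adj[u])
                            for u in in_adj[v] if u in comp_set)
                   for v in comp}
            for v, x in nxt.items():
                r[v] = x
        done |= comp_set
    return r

# Two 2-cycles joined by 1 -> 2; SCC {0,1} precedes SCC {2,3} topologically.
out_adj = [[1], [0, 2], [3], [2]]
levels = [[0, 1], [2, 3]]
r_mono = monolithic_pagerank(out_adj)
r_level = levelwise_pagerank(out_adj, levels)
```

Both routes converge to the same fixed point; the levelwise route never mixes iterations across levels, which is what enables the distributed, communication-free processing the abstract mentions.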
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
1. CCAMBIO and the mARS
project
Anton Van de Putte
CCAMBIO Annual Meeting
12 May 2014
2. Microbial Antarctic
Resources System
An information system dedicated to facilitating the
discovery, access and analysis of geo-referenced,
molecular microbial diversity (meta)data generated
by Antarctic researchers, in an open fashion.
3. What’s happened so far
• mARS Workshop hosted at the Belgian Science Policy
Office (BELSPO, Brussels) in May 2012
• mARS Workshop held during the SCAR Open
Science Conference (Portland, OR) in July 2012
• Technical mARS Workshop hosted at the Université
Libre de Bruxelles in December 2013
• Initiated the development of the database and
web platform
4. Near future planning
• mARS Workshop held during the SCAR Open
Science Conference (Auckland, NZ) on 27 August
2014
• Present a proof of concept of the data infrastructure
to be used for mARS
11. Getting Data into mARS
• Requires that
• Data is accessible in a public repository
(GenBank, IMG-M or another web-accessible repository)
• 2 additional metadata files
• MiMARKS
• Microbial Sequence spreadsheet
12. 0. Before you start
• 1. Clearly identify your needs
• You have a project that you would like to register
with mARS
• No sequence data or environmental data at this
point: skip Steps 1, 2, 4 and 7
• Environmental data, but no publicly available
sequences yet: follow all Steps below, but do not
enter Sequence IDs in the forms
• Environmental data and publicly available
sequences: follow all Steps below
13. 0. Before you start
• Send an email to request a username and password
from the IPT administrator
14.
15. 0. Before you start
• Send an email to request a username and password
from the IPT administrator
• Make a copy of the MiMarks Googlesheet from the
RDP MiMarks Googlesheet (click on “Make copy” from
the “File” menu).
16.
17. 0. Before you start
• Send an email to request a username and password
from the IPT administrator
• Make a copy of the MiMarks Googlesheet from the
RDP MiMarks Googlesheet (click on “Make copy” from
the “File” menu).
• Make a copy of the Microbial Sequence Set from the
mARS Googlesheet (click on “Make copy” from the
“File” menu).
18.
19. 1. Prepare your MiMarks
spreadsheet
• In the MiMarks Googlesheet you’ve created in step 0,
fill in your environmental metadata details using the
“Google Documents” interface, following the
instructions available from the MiMarks Googlesheet
documentation at RDP. Example files are available
from the mARS website.
• In the header for each column that will hold your
sequence set data, list the unique identifier of your
sequence set.
• Once you are finished, download your spreadsheet as
a CSV (Comma-separated Values) file on your
computer.
20.
21. 2. Prepare your Microbial
Sequence Set spreadsheet
• In the Microbial Sequence Set Googlesheet you’ve
created in step 0, fill all the fields (replace the
examples available from the Googlesheet)
• Once you are finished, download your spreadsheet as
a CSV (Comma-separated Values) file on your
computer.
22.
23. 3. Describe your data in the
IPT
• Login the IPT using your credentials:
• Use the form at the bottom of the “Manage Resource”
page to create a new resource. Provide a unique
"shortname" for your dataset.
• Click the “Create” button. You will arrive on the
Resource Management page.
• Click on the “Edit” button in the Metadata section on
the left and fill in the details for the different metadata
sections. Detailed instructions are available from the IPT
quick reference guide. Hint: mention your grant
number in the “Project Data” section, to allow us to link
your resource to relevant projects in the GCMD/AMD.
24. 4. Upload your MiMarks and
Microbial Sequence Set
• 1. In your IPT session, from your Resource
Management page, click on the “Choose file” button in
the “Source data” section on the left of the page.
• 2. Point to your completed MiMarks CSV, and click on
“Choose”
• 3. Click on the “add” button in the “Source data”
section on the left of the page then click on the “Save”
button on the bottom. Your MiMarks CSV file is now
uploaded on the IPT.
4. Upload your MiMarks and
Microbial Sequence Set
• 5. From your Resource Management page, click on
the “Choose file” button in the “Source data” section on
the left of the page.
• 6. Point to your completed Microbial Sequence Set
CSV, and click on “Choose”
• 7. Click on the “add” button in the Source data section
on the left of the page, then click on the “Save” button.
Your Microbial Sequence Set CSV file is now uploaded
on the IPT.
5. Publish and register your
data
• From your Resource Management page, click on the
“Publish” button in the “Published release” section on
the left of the page. Do not worry if you see the
warning message “Source data or Darwin Core
mappings missing. No data archive generated.”
• By default, your resource’s visibility is set to “Private”.
To allow your resource to become visible on the IPT
for all users, click on the “Public” button in the
“Visibility” section.
• Request one of the administrators to “Register” your
dataset.
Editor's Notes
This step will capture information about molecular microbial diversity research efforts that are being or have been conducted by the Antarctic research community. The results of step 1 will facilitate communication and collaboration, augment comparative biodiversity studies, and provide a legacy-discoverable resource to advance science, conservation awareness and management. The scope of the information that can be entered in the IPT encompasses present, past, or future studies involving marker gene surveys (e.g. 16S or 18S rRNA, functional genes), or meta “omic” projects from natural samples in Antarctic habitats, enrichment or pure culture efforts.
Secondly, users will be invited to upload habitat and molecular methods-specific (meta)data pertaining to the samples and the related sequencing data (including accession numbers) using standardized templates accessible on the mARS website. These templates can readily be shared with your collaborators and they work with the GenBank submission tools (Sequin and WebIN).
Used together, and uploaded with the corresponding IPT metadata entry (as described in Step 1), these templates will describe geo-referenced physicochemical information that relates to Antarctic microbial diversity studies as well as the matching sequencing information.
In this step, sequence data files produced by different technologies (e.g. Sanger sequencing, 454, Illumina, Ion Torrent) will be linked back to the relevant entries as described in steps 1 and 2.
mARS will provide indexed searching capabilities and geo-server links to DNA sequence data from Antarctic studies that have been deposited in public repositories, providing rapid access to this information through the biodiversity.aq data portal.
There is currently no exhaustive resource that provides this level of information from a geo-referenced perspective. The Antarctic scientific community is actively engaged in molecular microbial diversity and genomic surveys in both terrestrial and marine realms. mARS provides a unique resource to harness this information.
As the primary mandate of biodiversity.aq is to provide the scientific community access to Antarctic diversity information, biodiversity.aq staff will process the microbial diversity information referenced in mARS for selected, highly used regions of marker genes (for each domain of life) generated through both Sanger sequencing studies and NGS efforts in order to provide the users with a window into the microbial diversity present in Antarctica.
This SOP details how you can upload (meta)data to mARS. To ensure this procedure only has to be carried out once, the mARS team has devoted special care to following widely-used standards for biodiversity data, as promoted by the Global Biodiversity Information Facility and the Genomic Standards Consortium. In this particular case, this SOP is built around two main types of standards, namely Darwin Core and MiMarks, ensuring maximal interoperability with internationally-recognized data and metadata repositories.
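To illustrate how the two standards meet, the sketch below maps a MiMarks-style sample record onto Darwin Core terms. The Darwin Core term names (decimalLatitude, decimalLongitude, eventDate, materialSampleID) are real terms from that vocabulary, but the input field names and the choice of mapping are assumptions for illustration, not the mapping used by mARS or the IPT.

```python
# Illustrative only: a minimal mapping from MiMarks-style sample
# fields to Darwin Core terms. The input keys (sample_name, lat_lon,
# collection_date) and the mapping itself are assumptions; consult the
# mARS templates for the authoritative field list.
def mimarks_to_dwc(record):
    """Map one MiMarks-style sample record to Darwin Core terms."""
    # MiMarks-style lat_lon holds latitude and longitude in one field;
    # Darwin Core keeps them in two separate decimal terms.
    lat, lon = record["lat_lon"].split()
    return {
        "decimalLatitude": float(lat),
        "decimalLongitude": float(lon),
        "eventDate": record["collection_date"],
        "materialSampleID": record["sample_name"],
    }
```

A record such as `{"sample_name": "s1", "lat_lon": "-77.85 166.67", "collection_date": "2012-01-15"}` would yield a small Darwin Core dictionary ready to serialize into the kind of interoperable archive both GBIF and mARS consume.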