Information systems on fish and marine genetic resources, by ENACA. Presented during the Regional Workshop on Underutilized Fish and Marine Genetic Resources and their Amelioration, 10-12 July 2019, Colombo, Sri Lanka.
2. Information systems?
• I will (mainly) look at major types of
system, rather than individual systems.
• Nobody really develops databases on
underutilised genetic resources.
• They do, of course, develop information
systems about genetic resources for many
other reasons.
• Some of these may be relevant to the
present discussion.
3. What do you mean,
“under utilised”?
• From an aquaculture perspective, this can
be viewed in different contexts:
– Biodiversity
• New species with high potential to be farmed.
• Alternative species, useful in some circumstances.
• Supplementary species that can fill vacant niches
in existing farming systems.
• Aquaculture is a young industry, and a lot of
experimentation is still going on.
4. What do you mean,
“under utilised”?
• Improved varieties:
– Selecting the best-performing genetic resources from
those already in our hands.
– Developing high-performance (genetically improved)
lines for improved growth, disease resistance etc.
– Relatively few improved varieties are available.
– Many standard hatchery practices actually cause a
loss of genetic diversity.
– In many cases, we are farming genetically degraded
resources.
5. What do you mean,
“under utilised”?
• Molecular genetics:
– Markers, sequences, genomes etc.
– Can inform breeding programmes for improved
productivity, conservation etc.
– A relatively new science, but growing fast.
– Potential contribution has not been realised.
6. Common information issues
• We actually know very little about most non-
commercial species.
– Generalised idea about their biology.
• Physical description, macro habitat, basic diet information.
– Lack of species-specific details.
• Reproduction, nutritional requirements, behaviour.
– For some species, detailed information is essential
for successful hatchery and farm production.
• Technical barriers exist (e.g. eels, lobster, anything with a
small, fragile or long larval stage).
7. Common information issues
• Most of the existing information is not held in public
databases. It is in publications.
– Metadata coverage of publications is very good, but
you may need to search multiple publication
databases.
– Most publications are not free; you need access to a
university-level library.
– Older non-journal publications may not be available in
electronic format (recent ones usually are).
8. Types of information system
• General taxonomy, biodiversity & biology
– FishBase.
– SeaLifeBase.
– Aquatic Genetic Resource Information
System of India.
– Wikipedia projects on various taxonomic
groups.
– Many, many others.
9. Types of information system
www.enaca.org
• Geographic distribution & occurrence
– Ocean Biogeographic Information System.
– Global database on freshwater fish species
occurrence in drainage basins.
– NZ Freshwater Fish Database.
– Many, many others.
10. Types of information system
• Environmental monitoring & management
– ReefBase.
– Coral Triangle Atlas.
• Molecular genetics
– GenBank.
– European Nucleotide Archive.
– DNA Database of Japan.
– Sequence Read Archive.
– Meta-databases: International Nucleotide Sequence
Database Collaboration.
• Some combination of all of these
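As an illustration of how the public sequence databases listed above can be searched programmatically, the sketch below builds a query URL for NCBI's E-utilities search endpoint, which covers GenBank nucleotide records. The species name (common carp) and the result limit are example choices made here, not part of the presentation, and the sketch only constructs the URL; it does not send the request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (ESearch).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def genbank_search_url(organism: str, retmax: int = 20) -> str:
    """Build an ESearch URL for nucleotide records of one organism.

    `organism` and `retmax` are illustrative parameters chosen for this sketch.
    """
    params = {
        "db": "nuccore",                  # nucleotide database (includes GenBank)
        "term": f"{organism}[Organism]",  # restrict the search to the organism field
        "retmax": retmax,                 # maximum number of record IDs to return
        "retmode": "json",                # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example species chosen for illustration only.
url = genbank_search_url("Cyprinus carpio")
print(url)
```

Fetching this URL returns matching record identifiers, which can then be passed to the other E-utilities calls to retrieve the sequences themselves.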
11. Data coverage and quality
• Most are patchy due to the sheer magnitude of
the task at hand, e.g. FishBase:
– Most significant attempt to document fish biodiversity
and biology.
– In production since 1996.
– Records on 33,000 species.
– Draws on 52,000 publications.
– > 2,300 expert collaborators.
– Maintained by a consortium of 12 institutions and nine
donors.
12. Data coverage and quality
• Each species has a profile with information on:
– Classification.
– Environment.
– Distribution.
– Maturation.
– Physical description.
– Biology.
– Life cycle.
– Mating behaviour.
– Conservation status & human uses (= “utilisation”).
– + > 50 additional categories.
– References and links to datasets.
13. Data coverage and quality
• For important species, profiles have good coverage.
– e.g. Common carp, rainbow trout.
• For less important species, coverage rapidly drops off.
– For unexploited species, information to generate a complete
profile probably doesn’t exist.
– Available information will be scattered, often in grey literature,
survey reports and similar.
• Other information systems may provide a subset of this
data, but in greater detail.
– Specific geographic, environmental or taxonomic scope.
– e.g. Aquatic Genetic Resource Information System of India provides
country-specific records that may not be available anywhere else.
14. Data coverage and quality
• Molecular genetics databases
– Most journals require authors to submit markers, sequences,
primers etc. to public databases.
– Data is being rigorously captured and archived in public
systems.
– Initiatives to cross-search data sets.
– Coverage is very good.
– But are results of potential commercial value published?
• Private sector, no.
• Public sector, sometimes.
• Role for funding agencies to insist on public access.
15. How useful are these systems?
• Systems on molecular genetics:
– Potentially extremely useful in breeding programmes for
improved productivity or conservation purposes.
– But is relevant data (which has commercial application) being
published?
• Systems on general biology, taxonomy, distribution, and
ecosystem-level data:
– Useful as general references and starting points for
investigation.
– But probably not very useful for identifying underutilised genetic
resources. They were designed for other purposes.
– Even for mainstream species, profiles are a small subset /
summary of available data. Manual literature review required.
16. Information systems will probably never replace experts
• It is often a subtle biological characteristic that makes one
species more suitable than another.
– The significance may not be obvious from a record in a
database. It may only be understood in light of actual farming
experience.
– It is likely that identification of new or alternative species will be
made by farmers’ direct observations and experience, or by
scientists who have sufficient personal familiarity with a species
to recognise an opportunity.
– In many cases there are technical barriers in hatchery
production, larval rearing and nutrition, or disease that must be
overcome through research before it is possible to farm them.
18. Examples of how context matters
• Sydney rock oyster + QX disease
– Devastating mortality when QX enters an estuary.
– Switch to Pacific oyster, or flat oyster (resistant).
• Vietnamese catfish
– Moved from river cages to land-based ponds.
– Switched from basa to tra catfish.
– Tra has easier hatchery technology = reliable seed supply.
– Tra can breathe air = tolerant of low O2 and extreme crowding
19. A gap: Breeders registry
• Regional Expert Consultation on Genetically Responsible
Aquaculture (ICAR-National Bureau of Fish Genetic Resources),
February 2019, India:
– Broodstock holdings of individual hatcheries tend to be too small
to maintain adequate genetic diversity.
– Many standard hatchery practices cause a loss of genetic
diversity.
– Inbreeding is a problem, and genetic management must be
implemented if improved varieties are to be maintained.
– An online, decentralised network of broodstock holding registries
would enable small holdings to be combined into a larger, virtual
population.
– This would facilitate exchange of genetic resources, mitigate
inbreeding, and help maintain a high level of adaptive capacity.
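The benefit of pooling small holdings can be illustrated with a standard population-genetics approximation, which is not taken from the consultation itself: for unequal sex ratios, effective population size is roughly Ne = 4·Nm·Nf / (Nm + Nf), and the expected rise in inbreeding per generation is about 1 / (2·Ne). The sketch below applies this to three hypothetical hatcheries; the hatchery names and broodstock counts are invented for illustration, and the pooled figure is an upper bound that assumes broodstock are exchanged and mated freely across all holdings.

```python
def effective_size(males: int, females: int) -> float:
    """Approximate effective population size for an unequal sex ratio:
    Ne = 4 * Nm * Nf / (Nm + Nf)."""
    if males <= 0 or females <= 0:
        return 0.0
    return 4 * males * females / (males + females)


def inbreeding_rate(ne: float) -> float:
    """Expected increase in inbreeding per generation: dF = 1 / (2 * Ne)."""
    return float("inf") if ne == 0 else 1 / (2 * ne)


# Hypothetical holdings: name -> (breeding males, breeding females).
holdings = {
    "hatchery_a": (5, 10),
    "hatchery_b": (4, 12),
    "hatchery_c": (6, 8),
}

# Each hatchery managed in isolation.
for name, (males, females) in holdings.items():
    ne = effective_size(males, females)
    print(f"{name}: Ne = {ne:.1f}, dF = {inbreeding_rate(ne):.4f}")

# All holdings treated as one virtual population (idealised upper bound).
pooled_males = sum(m for m, _ in holdings.values())
pooled_females = sum(f for _, f in holdings.values())
pooled_ne = effective_size(pooled_males, pooled_females)
print(f"pooled: Ne = {pooled_ne:.1f}, dF = {inbreeding_rate(pooled_ne):.4f}")
```

Under these assumptions each hatchery alone has an effective size of roughly 12 to 14, while the pooled virtual population reaches 40, which cuts the expected per-generation inbreeding rate to about a third.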
20. Building an information system
• Q: Designing one isn’t that hard anymore, so
why don’t more people do it?
– A: Getting hold of data is hard.
– A: Validating, entering and curating data is time-
consuming and expensive.
– A: Databases are lifetime commitments; they are a
TERRIBLE fit for project-driven budget cycles.
– A: Inadequate computer security and lack of disaster
recovery plans undo many efforts.
21. Data acquisition
• Different models for collating and curating databases:
– Central:
• Small team can collate a database that has specific or narrow area
of focus.
• Good quality control, but expensive.
– Crowd-sourced:
• Potential to access huge data set and “grey” or unpublished data.
• Higher initial investment to allow for public participation.
• Lower control over quality (mitigate with public moderation).
• Security issues much, much more difficult.
– Blended:
• Core team efforts supplemented by voluntary contributions.
22. Linking and federating information
systems
• Largely a technical issue.
– Feasible, and not particularly difficult.
• Easier if collaborating systems expose their records via
an agreed protocol and data format.
– Not absolutely necessary, adapters can be written.
• Example: Open Archives Initiative Protocol for Metadata
Harvesting.
– Disparate systems expose records to harvesters using common
protocol.
– Facilitates cross-site search (federation).
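A minimal sketch of what harvesting looks like in practice, assuming a repository that exposes Dublin Core records over OAI-PMH. The base URL, record identifier and title are hypothetical, and the sample response is hand-written for illustration rather than captured from a real system. The sketch builds a ListRecords request and extracts identifier and title from a response; a real harvester would also need to follow resumption tokens and handle errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core metadata.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return identifier and title for each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        title = rec.findtext(".//dc:title", default="", namespaces=OAI_NS)
        records.append({"identifier": identifier, "title": title})
    return records


# Hand-written sample response (illustrative only, not real data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example species record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

request_url = list_records_url("https://repository.example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```

Because every participating system answers the same request in the same format, one harvester can collect records from many sites and index them for cross-site search.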
23. IP issues
• Collaboration is massively enhanced by publishing data
under an open access license, eg. Creative Commons.
• Open access licenses are available with a range of
restrictions to suit different purposes.
• “Copyleft” licenses generally involve:
– The author asserting copyright over a work.
– Granting the public permission to use, redistribute or adapt the
work in various ways, so long as derivatives are distributed
under the same license.
– Further restrictions can include non-commercial use only, or (in
other open licenses) no modification.
24. Conclusions
• Information on non-commercial species is limited and
scattered.
• Most available information is in publications, not in public
databases.
• Information systems on general biology and distribution
are patchy and of limited use (for this purpose).
• Molecular genetics systems are potentially very useful,
but IP issues may limit publication of commercially-useful
results.
• Development of a decentralised breeders registry for
broodstock holdings is an opportunity to address multiple
productivity issues (ICAR-NBFGR 2019).