Databasing the world


Published in: Technology

Databasing the world

  1. 1. Databasing the World: Biodiversity and the 2000s<br />Written by Bowker, G. C.<br />Presented by Chen Zhang (Mike)<br />
  2. 2. Four Key Aspects<br />Database Infrastructure<br />Standards—flexible, stable<br />Technology—stable <br />Communication<br />Data Sharing<br />Ownership<br />Disarticulation<br />Data collection<br />
  3. 3. Four Key Aspects<br />Distributed Collective Practice<br />Collaborative work<br />New Knowledge Economy<br />Accounting for life<br />Development of Classification<br />Cladistics<br />The Future<br />
  4. 4. Database Infrastructure<br />
  5. 5. Standards<br />Why do we need standards?<br />Example from the air-conditioner industry<br />Diameter match between a screw and the hole in the panel<br />Reasons for databases<br />Need a ‘handshake’ among various media<br />MIME (Multipurpose Internet Mail Extensions) protocol<br />Each layer of infrastructure requires its own set of standards<br />Need standardized categories.<br />
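The 'handshake' among media that MIME provides can be illustrated with a minimal sketch using Python's standard `email` package. The subject, file name, and attachment bytes are invented placeholders, not material from the presentation; the point is only that every part of a message declares its media type in a standardized way.

```python
from email.message import EmailMessage

# Build one message that carries two kinds of media.
message = EmailMessage()
message["Subject"] = "Specimen record"  # hypothetical subject line
message.set_content("Field notes for the attached image.")

# The bytes below are a placeholder, not a real image.
message.add_attachment(
    b"placeholder image bytes",
    maintype="image",
    subtype="png",
    filename="specimen.png",
)

# Each part declares its own media type, so any receiver that
# understands MIME knows how to handle it.
content_types = [part.get_content_type() for part in message.walk()]
print(content_types)
```

The receiving program does not need to know anything about the sender beyond the shared standard, which is the sense in which a standard acts as a handshake.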
  6. 6. Standards<br />The best standard will not always win<br />Some of the best-known standards<br />QWERTY keyboard<br />
  7. 7. Standards<br />The best standard will not always win<br />Some of the best-known standards<br />VHS (Video Home System) standard<br />
  8. 8. Standards<br />The best standard will not always win<br />Some of the best-known standards<br />DOS computing system<br />
  9. 9. Standards<br />The best standard will not always win<br />Why?<br />The best standard may not have the best market position<br />Standards setting is a key site of political work<br />An inferior standard may be backed by political agencies (such as standards-setting bodies)<br />
  10. 10. Standards<br />Interoperability<br />Continuum of strategies for standards setting<br />One Standard Fits All<br />Let a Thousand Standards Bloom<br />
  11. 11. Standards<br />Interoperability<br />Some Related Standards<br />1. ANSI/NISO Z39.50<br />ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection.<br />It makes it easier to use large information databases by standardizing the procedures and features for searching and retrieving information.<br />
  12. 12. Standards<br />Interoperability<br />Some Related Standards<br />ANSI/NISO Z39.50<br />
  13. 13. Standards<br />Interoperability<br />Some Related Standards<br />1. ANSI/NISO Z39.50<br />A single enquiry over multiple databases.<br />Widely adopted in the library world.<br />
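The idea of a single enquiry over multiple databases can be sketched in a few lines. This is not the Z39.50 protocol itself: the catalogs are invented in-memory lists, and `Record` and `search_all` are hypothetical names. The sketch only shows why a shared record structure lets one query run everywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str
    title: str
    author: str


# Invented sample data; real Z39.50 targets are remote catalog servers.
LIBRARY_A = [
    Record("library_a", "Mosses of the North", "Smith"),
    Record("library_a", "Coastal Birds", "Jones"),
]
LIBRARY_B = [
    Record("library_b", "Alpine Flowers", "Smith"),
]


def search_all(catalogs, field, term):
    """Run one query against every catalog and merge the hits in order."""
    needle = term.lower()
    hits = []
    for catalog in catalogs:
        for record in catalog:
            if needle in getattr(record, field).lower():
                hits.append(record)
    return hits


hits = search_all([LIBRARY_A, LIBRARY_B], "author", "smith")
for hit in hits:
    print(hit.source, hit.title)
```

Because both catalogs expose the same fields, the caller never needs to know how each one stores its data.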
  14. 14. Standards<br />Interoperability<br />Some Related Standards<br />2. XML<br />Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form.<br />Two extremes:<br />a. Colonial model<br />b. Democratic model (won out)<br />People’s established computing environment<br />
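As a small illustration of encoding a document in machine-readable form, the sketch below writes a record as XML and reads it back with Python's standard library. The element names (`specimen`, `genus`, `species`) are assumptions made for the example; the slides do not define a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names chosen for illustration only.
specimen = ET.Element("specimen", attrib={"id": "s1"})
ET.SubElement(specimen, "genus").text = "Quercus"
ET.SubElement(specimen, "species").text = "robur"

# Serialize to text that any XML-aware program can parse.
encoded = ET.tostring(specimen, encoding="unicode")
print(encoded)

# Parse the text back into a tree and read a field.
decoded = ET.fromstring(encoded)
print(decoded.find("genus").text)
```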
  15. 15. Technology<br />Technology must be stable<br />Nothing to guarantee the stability of vast data sets<br />Failure of Paul Otlet’s well catalogued microfiches<br />Development of computer memory<br />Hard to retrieve information<br />
  16. 16. Technology<br />Technology must be stable<br />Data must stay accessible and usable<br />Infrastructure will require a continued maintenance effort<br />Reasons<br />a. Data is passed from one medium to another<br />b. Data is analyzed by successive generations of database technology.<br />
  17. 17. Issues of Communication<br />Problem of reliable metadata<br />Metadata: data about data<br />The blue lines are metadata<br />
  18. 18. Issues of Communication<br />Problem of reliable metadata<br />The standard name of certain kinds of data<br />Searchable: easy to search over multiple databases<br />Issue: how detailed should the name of the data be?<br />Too little detail: the information about the data is useless<br />Too much detail: more time, more work<br />
  19. 19. Issues of Communication<br />Dublin Core<br />The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and cataloged.<br />The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements:<br />Title<br />Creator<br />Subject<br />Description<br />Publisher<br />Contributor<br />Date<br />Type<br />Format<br />Identifier<br />Source<br />Language<br />Relation<br />Coverage<br />Rights<br />
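A minimal sketch of how the 15 elements might be used to describe a resource follows. The namespace URI is the one used for the Dublin Core element set; the wrapping `record` element and the `make_record` helper are hypothetical, chosen only for this example. The sample values come from the title slide of this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(fields):
    """Build an XML record from a dict of Dublin Core element values.

    The outer <record> wrapper is an assumption made for this sketch.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("record")
    for name in DC_ELEMENTS:  # keep a stable element order
        if name in fields:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = fields[name]
    return root


record = make_record({
    "title": "Databasing the World: Biodiversity and the 2000s",
    "creator": "Bowker, G. C.",
})
print(ET.tostring(record, encoding="unicode"))
```

Every element is optional and repeatable in Simple Dublin Core; this sketch handles only the optional part and accepts one value per element.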
  20. 20. Data Sharing<br />
  21. 21. Ownership<br />Control of knowledge<br />Mid-nineteenth century: only professionally trained scientists and doctors<br />New information economy: knowledge comes from many people<br />Example: patient groups<br />
  22. 22. Ownership<br />Privacy<br />Keeping data private is difficult:<br />Example: data is compiled by a third-party company to generate a new, marketable form of knowledge<br />New patterns of ownership<br />Science has frequently been analyzed as a “public good”<br />Increasing privatization of knowledge:<br />It is unclear to what extent the vaunted openness of the scientific community will last<br />
  23. 23. Disarticulation<br />Ideal database<br />According to most practitioners, it should be theory-neutral, yet serve as a common basis for a number of scientific disciplines to progress.<br />Example: the genome databank as a new kind of science; constructing arguments about genetic causation ≠ the process of mapping the genome<br /><ul><li>Data must be reusable by scientists</li><li>The data in a database should be easily manipulated by other scientists.</li></ul>
  24. 24. Data Collection<br />Biodiversity<br />Large-scale databases are being developed for a diverse array of animal and plant groups<br />Worldwide effort<br />IUBS<br />CODATA<br />IUMS<br />Deal with old data<br />Data was once rolled into a theory; a database should remember:<br />All its own data<br />Potentially, data that had not yet been collected<br />
  25. 25. Data Collection<br />Deal with old data<br />Difficulties<br />Scientific papers generally do not offer enough information to allow an experiment or procedure to be repeated.<br />The distributed database is becoming a new model form of scientific publication in its own right<br />Issues of update<br />No automatic update from one field to a cognate one<br />Scientists are not able to share information across disciplinary divides<br />
  26. 26. Data Collection<br />International Technoscience<br />Purpose: narrow the gaps between countries<br />Issues:<br />People do not have equal knowledge<br />Access is never really equal<br />Governments doubt the usefulness of opening databases to the internet.<br />
  27. 27. Distributed Collective Practice<br />
  28. 28. Collaborative Work<br />Management structures in universities and industry still tend to support the heroic myth of the individual researcher.<br />What kind of value do the large publishing houses add to journal production?<br />Great attention must be paid to the social and organizational setting of technoscientific work<br />
  29. 29. New Knowledge Economy<br />Three central issues<br />The development of flexible, stable data standards<br />The generation of protocols for data sharing<br />The restructuring of scientific careers<br />
  30. 30. Accounting For Life<br />
  31. 31. Development of Classification<br />Introduction: PANDORA taxonomic database<br />
  32. 32. Development of Classification<br />Importance of classification<br />18th-19th centuries: a botanist must know all genera and commit their names to memory, but cannot be expected to remember all specific names (A. J. Cain, 1958)<br />Later part of the 19th century: new information technologies developed that permitted the easy storage and coding of larger amounts of data than could previously be easily manipulated (Chandler, 1977; Yates, 1989)<br />
  33. 33. Development of Classification<br />Example of classification<br />Paper-based archival practice.<br />Issue: hard to reclassify<br />Type specimens had to be relocated physically<br />So did series of articles or books<br />
  34. 34. Development of Classification<br />Example of classification<br />Multifaceted classification system<br />Improvement: classifications can be ordered in multiple ways rather than in a single way<br />Example: a collection of books might be classified using an author facet, a subject facet, and a date facet<br />
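The book example can be sketched as follows. The titles, authors, subjects, and dates are invented sample data, and `group_by_facet` is a hypothetical helper; the sketch shows that one collection can be reordered by any facet without physically moving anything.

```python
from collections import defaultdict

# Invented sample records; each key other than "title" is a facet.
BOOKS = [
    {"title": "Book 1", "author": "Adams", "subject": "Botany", "date": 1958},
    {"title": "Book 2", "author": "Baker", "subject": "Zoology", "date": 1977},
    {"title": "Book 3", "author": "Adams", "subject": "Zoology", "date": 1989},
]


def group_by_facet(items, facet):
    """Group titles under each value of the chosen facet."""
    groups = defaultdict(list)
    for item in items:
        groups[item[facet]].append(item["title"])
    return dict(groups)


by_author = group_by_facet(BOOKS, "author")
by_subject = group_by_facet(BOOKS, "subject")
print(by_author)
print(by_subject)
```

In a paper-based archive each of these orderings would need its own physical arrangement or card index; here they are alternative views of the same records.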
  35. 35. Development of Classification<br />Example of classification<br />Hierarchical classification (for reading the past)<br />E. F. Codd, in the early 1970s<br />Separated the physical storage of data in the computer from the representation of that data.<br />Disadvantage: it becomes awkward to introduce other levels of taxonomic category as an afterthought.<br />Improved method: one record for every name, regardless of its taxonomic level<br />
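The "one record for every name" approach can be sketched with a flat table in which each name carries its rank and a pointer to its parent. The `NameRecord` structure and both helper functions are assumptions made for illustration, not the design of any particular taxonomic database; the oak names are sample data.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    name: str
    rank: str               # stored as data, not as a fixed column per level
    parent: Optional[str]   # name of the parent record, or None at the top


# Sample data for illustration.
NAMES = {
    r.name: r
    for r in [
        NameRecord("Fagaceae", "family", None),
        NameRecord("Quercus", "genus", "Fagaceae"),
        NameRecord("Quercus robur", "species", "Quercus"),
    ]
}


def lineage(names, name):
    """Follow parent pointers to the top; return (rank, name) pairs, top first."""
    chain = []
    current = name
    while current is not None:
        record = names[current]
        chain.append((record.rank, record.name))
        current = record.parent
    return chain[::-1]


def insert_level(names, new_record, child):
    """Return a copy of the table with new_record placed above `child`."""
    updated = dict(names)
    updated[new_record.name] = new_record
    updated[child] = replace(names[child], parent=new_record.name)
    return updated


with_subgenus = insert_level(
    NAMES, NameRecord("Quercus subg. Quercus", "subgenus", "Quercus"), "Quercus robur"
)
print(lineage(with_subgenus, "Quercus robur"))
```

Adding the subgenus level touches one new record and one parent pointer. A table with a fixed column per rank would instead need a schema change.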
  36. 36. Cladistics<br />Definition<br />It is a method of classifying species of organisms into groups called clades, which consist of 1) all the descendants of an ancestral organism and 2) the ancestor itself.<br />Features: gives a more regular algorithm for determining phylogeny<br />Focuses attention on shared, derived characteristics of a set of organisms<br />Uses ‘outgroup’ comparisons to develop the classification system<br />
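The definition of a clade (an ancestor plus all of its descendants) translates directly into a short recursive function. The tree below is a made-up example with placeholder labels, and `clade` and `is_clade` are hypothetical helper names.

```python
# Hypothetical tree: each key is an ancestor, each value lists its direct descendants.
TREE = {
    "root": ["A", "X"],
    "X": ["B", "C"],
}


def clade(tree, ancestor):
    """Return the ancestor followed by all of its descendants."""
    members = [ancestor]
    for child in tree.get(ancestor, []):
        members.extend(clade(tree, child))
    return members


def is_clade(tree, group):
    """True if `group` is exactly some node plus all of its descendants."""
    group = set(group)
    nodes = set(tree) | {child for children in tree.values() for child in children}
    return any(set(clade(tree, node)) == group for node in nodes)


print(clade(TREE, "X"))
print(is_clade(TREE, {"X", "B", "C"}))
print(is_clade(TREE, {"root", "A", "X", "B"}))
```

The last group is not a clade because it includes the ancestor `root` but omits one of its descendants, `C`.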
  37. 37. Cladistics<br />Tree of life<br />Cladists use cladograms, diagrams which show ancestral relations between taxa, to represent the evolutionary tree of life<br />Charles Darwin (1809–1882) was the first to produce an evolutionary tree of life<br />
  38. 38. Cladistics<br />Tree of life<br />
  39. 39. Cladistics<br />Computer programs in cladistics<br />Undertaken using Swofford’s (1985) package PAUP version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC<br />David Swofford’s PAUP is a software package for inference of evolutionary trees<br />Purpose: follow a given algorithm for generating and testing cladograms<br />
  40. 40. Cladistics<br />Computer programs in cladistics<br />
  41. 41. Cladistics<br />Computer programs in cladistics<br />Issues:<br />The packages produce variable results and cannot possibly look at all the possibilities, since the problem is NP-complete.<br />Algorithm issues<br />
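One way to see why a package cannot look at all the possibilities is to count the candidate trees. The sketch below uses a standard combinatorial result that is not stated in the slides: the number of distinct rooted, bifurcating trees on n labeled taxa is (2n - 3)!!, the product of the odd numbers up to 2n - 3. The function name is hypothetical.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, bifurcating, labeled trees: 1 * 3 * 5 * ... * (2n - 3)."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 10, 20):
    print(n, rooted_tree_count(n))
```

Ten taxa already give more than 34 million candidate trees, so programs rely on heuristic searches, which is one reason different runs or packages can return different results.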
  42. 42. The Future<br />Storing life<br />Life is itself described as a program, with DNA being the code.<br />If everything is information, then life can equally well be “stored”<br />
  43. 43. THANK YOU !<br />