Databasing the world

Transcript

  • 1. Databasing the World: Biodiversity and the 2000s
    Written by Bowker, G. C.
    Presented by Chen Zhang (Mike)
  • 2. Four Key Aspects
    Database Infrastructure
    Standards—flexible, stable
    Technology—stable
    Communication
    Data Sharing
    Ownership
    Disarticulation
    Data collection
  • 3. Four Key Aspects
    Distributed Collective Practice
    Collaborate work
    New Knowledge Economy
    Accounting for life
    Development of Classification
    Cladistics
    The Future
  • 4. Database Infrastructure
  • 5. Standards
    Why do we need standards
    Example of air-conditioner industry
    Diameter Match between screw and the hole on the panel
    Reasons for database
    Need ‘handshake’ among various media
    MIME (Multipurpose Internet Mail Extensions) protocol
    Each layer of infrastructure requires its own set of standards
    Need standardized categories.
  • 6. Standards
    Standards will not always win
    Some best-known standards
    QWERTY keyboard
  • 7. Standards
    Standards will not always win
    Some best-known standards
    VHS (Video Home System) standard
  • 8. Standards
    Standards will not always win
    Some best-known standards
    DOS computing system
  • 9. Standards
    Standards will not always win
    Why?
    The best standard may not win the largest market
    Standards setting is a key site of political work
    An inferior standard may be backed by political agencies (such as standards-setting bodies)
  • 10. Standards
    Interoperability
    Continuum of strategies for standards setting
    One Standard Fits All
    Let a Thousand Standards Bloom
  • 11. Standards
    Interoperability
    Some Related Standards
    1. ANSI/NISO Z39.50
    ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection.
    It makes large information databases easier to use by standardizing the procedures and features for searching and retrieving information.
  • 12. Standards
    Interoperability
    Some Related Standards
    ANSI/NISO Z39.50
  • 13. Standards
    Interoperability
    Some Related Standards
    1. ANSI/NISO Z39.50
    A single enquiry can run over multiple databases.
    Widely adopted in the library world.
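The idea of one enquiry running over several databases can be sketched as below. This is not the Z39.50 protocol itself: the catalogue names, records, and the `federated_search` helper are hypothetical, and each "database" is only an in-memory list.

```python
# Hypothetical catalogues; a real Z39.50 client would query remote servers.
CATALOGUES = {
    "herbarium_a": [
        {"title": "Oak specimens, sheet 12", "subject": "Quercus"},
    ],
    "museum_b": [
        {"title": "Beetle survey notes", "subject": "Coleoptera"},
        {"title": "Oak gall collection", "subject": "Quercus"},
    ],
}


def federated_search(field, term, catalogues=CATALOGUES):
    """Run one query against every catalogue and merge the hits."""
    hits = []
    for source, records in catalogues.items():
        for record in records:
            # Case-insensitive substring match on the requested field.
            if term.lower() in record.get(field, "").lower():
                hits.append((source, record["title"]))
    return hits


print(federated_search("subject", "quercus"))
```

The caller writes the query once; the loop over catalogues is what a shared search standard makes possible, since every source answers the same kind of question.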
  • 14. Standards
    Interoperability
    Some Related Standards
    2. XML
    Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form.
    Two extremes:
    a. Colonial model
    b. Democratic model (wins out)
    People’s established computing environment
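As a small illustration of XML as a machine-readable encoding, the sketch below writes a record to XML and reads it back using Python's standard library. The `specimen` record and its tag names are invented for this example and do not come from any real biodiversity schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical specimen record; the tag names are assumptions for this sketch.
record = ET.Element("specimen")
ET.SubElement(record, "genus").text = "Quercus"
ET.SubElement(record, "species").text = "robur"
ET.SubElement(record, "collected").text = "1998-05-14"

# Serialize to text, as if sending the record to another system.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# A receiving system can parse the text without knowing how it was produced.
parsed = ET.fromstring(xml_text)
print(parsed.find("genus").text, parsed.find("species").text)
```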
  • 15. Technology
    Technology must be stable
    Nothing to guarantee the stability of vast data sets
    Failure of Paul Otlet’s well catalogued microfiches
    Development of computer memory
    Hard to retrieve information
  • 16. Technology
    Technology must be stable
    Data accessible and usable
    Infrastructure will require a continued maintenance effort
    Reasons
    a. Data is passed from one medium to another
    b. Data is analyzed by one generation of database technology after another.
  • 17. Issues of Communication
    Problem of reliable metadata
    Metadata—data about data
    The blue lines are metadata
  • 18. Issues of Communication
    Problem of reliable metadata
    The standard name of certain kinds of data
    Searchable—easy to search over multiple database
    Issue: how detailed should the name of the data be?
    Too few details: the information about the data is useless
    Too many details: longer time, more work
  • 19. Issues of Communication
    Dublin Core
    The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and cataloged.
    The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements:
    Title
    Creator
    Subject
    Description
    Publisher
    Contributor
    Date
    Type
    Format
    Identifier
    Source
    Language
    Relation
    Coverage
    Rights
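A minimal sketch of how the 15 DCMES elements might be used to describe a resource. The `make_record` helper is hypothetical, and the `type` and `language` values are illustrative assumptions; only the title and creator come from this presentation.

```python
# The 15 elements of the Simple Dublin Core Metadata Element Set.
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a metadata record, rejecting names outside the element set."""
    unknown = set(fields) - set(DCMES_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # Keep the elements in their conventional order.
    return {name: fields[name] for name in DCMES_ELEMENTS if name in fields}


record = make_record(
    title="Databasing the World: Biodiversity and the 2000s",
    creator="Bowker, G. C.",
    type="Text",     # assumed value
    language="en",   # assumed value
)
print(record)
```

Every element is optional, which is what keeps the set small enough to apply to most resources while still being searchable by name.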
  • 20. Data Sharing
  • 21. Ownership
    Control of knowledge
    Mid-nineteenth century:
    only professionally trained scientists and doctors
    New information economy:
    from many people
    Example: patients group
  • 22. Ownership
    Privacy
    Keeping data private is difficult:
    Example: data is compiled by a third-party company to generate a new, marketable form of knowledge
    New Patterns of ownership
    Science has frequently been analyzed as a “public good”
    Increasing privatization of knowledge:
    It is unclear to what extent the vaunted openness of the scientific community will last
  • 23. Disarticulation
    Ideal database
    According to most practitioners, it should be theory-neutral yet serve as a common basis on which a number of scientific disciplines can progress.
    Example: genome databank → new kind of science; constructing arguments about genetic causation ≠ the process of mapping the genome
    Data must be reusable by scientists
    The data in a database should be easily manipulated by other scientists.
  • 24. Data Collection
    Biodiversity
    Large-scale databases are being developed for a diverse array of animal and plant groups
    Worldwide effort
    IUBS
    CODATA
    IUMS
    Dealing with old data
    A theory into which data was rolled should remember:
    All its own data
    Potentially, data that had not yet been collected
  • 25. Data Collection
    Dealing with old data
    Difficulties
    Scientific papers do not, in general, offer enough information to allow an experiment or procedure to be repeated.
    The distributed database is becoming a new model form of scientific publication in its own right
    Issues of Update
    No automatic update from one field to a cognate one
    Scientists are not able to share information across disciplinary divides
  • 26. Data Collection
    International Technoscience
    Purpose: Narrow the gaps between countries
    Issues:
    People do not have equal knowledge
    Access is never really equal
    Governments doubt the usefulness of opening databases to the internet.
  • 27. Distributed Collective Practice
  • 28. Collaborative Work
    Management structures in universities and industry still tend to support the heroic myth of the individual researcher.
    What kind of value do the large publishing houses add to journal production?
    Great attention must be paid to the social and organizational setting of technoscientific work
  • 29. New Knowledge Economy
    Three central issues
    The development of flexible, stable data standards
    The generation of protocols for data sharing
    The restructuring of scientific careers
  • 30. Accounting For Life
  • 31. Development of Classification
    Introduction: PANDORA taxonomic database
  • 32. Development of Classification
    Importance of classification
    18th–19th centuries: a botanist must know all genera and commit their names to memory, but cannot be expected to remember all specific names (A. J. Cain, 1958)
    Later part of the 19th century: new information technologies were developed that permitted the easy storage and coding of larger amounts of data than could previously be easily manipulated (Chandler, 1977; Yates, 1989)
  • 33. Development of Classification
    Example of classification
    Paper-based archival practice.
    Issue: hard to reclassify
    Type specimens had to be physically relocated
    So did series of articles or books
  • 34. Development of Classification
    Example of classification
    Multifaceted classification system
    Improvement: classifications can be ordered in multiple ways rather than in a single way
    Example: A collection of books might be classified using an author facet, a subject facet, a date facet
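The book example above can be sketched in a few lines. The catalogue entries and the `group_by_facet` helper are hypothetical; the point is only that the same items can be regrouped by any facet without being physically rearranged.

```python
from collections import defaultdict

# Hypothetical catalogue entries, each carrying three facets.
BOOKS = [
    {"author": "Adams", "subject": "botany", "year": 1958},
    {"author": "Baker", "subject": "zoology", "year": 1977},
    {"author": "Clark", "subject": "botany", "year": 1989},
]


def group_by_facet(items, facet):
    """Group item authors under each value of the chosen facet."""
    groups = defaultdict(list)
    for item in items:
        groups[item[facet]].append(item["author"])
    return dict(groups)


print(group_by_facet(BOOKS, "subject"))
print(group_by_facet(BOOKS, "year"))
```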
  • 35. Development of Classification
    Example of classification
    Hierarchical classification (for reading the past)
    E. F. Codd, in the early 1970s:
    Separated the physical storage of data in the computer from the representation of that data.
    Disadvantage: it becomes awkward to introduce other levels of taxonomic category as an afterthought.
    Improved method: one record for every name, regardless of its taxonomic level
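A minimal sketch of the "one record for every name" idea, assuming each record stores its own rank and a pointer to its parent name. The field names are assumptions made for illustration and are not taken from the PANDORA schema.

```python
# One record per name: each row stores its own rank and a pointer to its parent.
NAMES = {
    1: {"name": "Fagaceae", "rank": "family", "parent": None},
    2: {"name": "Quercus", "rank": "genus", "parent": 1},
    3: {"name": "Quercus robur", "rank": "species", "parent": 2},
}


def lineage(name_id, names=NAMES):
    """Walk the parent pointers from a name up to the top of the hierarchy."""
    chain = []
    while name_id is not None:
        row = names[name_id]
        chain.append((row["rank"], row["name"]))
        name_id = row["parent"]
    return list(reversed(chain))


# Introducing a new taxonomic level later needs one new row and one
# re-pointed parent, not a new column in a fixed family/genus/species table.
NAMES[4] = {"name": "Quercus subg. Quercus", "rank": "subgenus", "parent": 2}
NAMES[3]["parent"] = 4

print(lineage(3))
```

Because rank is data rather than table structure, a level added as an afterthought does not disturb the existing records.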
  • 36. Cladistics
    Definition
    It is a method of classifying species of organisms into groups called clades, which consist of 1) all the descendants of an ancestral organism and 2) the ancestor itself.
    Features: gives a more regular algorithm for determining phylogeny
    Focuses attention on shared, derived characteristics of a set of organisms
    Uses ‘outgroup’ comparisons to develop the classification system
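The definition of a clade (an ancestor together with all of its descendants) can be sketched as a tree traversal. The tree below, with taxa `A` to `D` and internal nodes `n1` and `n2`, is hypothetical, and this shows only how a clade is read off a given tree, not how cladistic software infers the tree.

```python
# Hypothetical tree: keys are ancestors, values are their direct descendants.
TREE = {
    "root": ["A", "n1"],
    "n1": ["B", "n2"],
    "n2": ["C", "D"],
}


def clade(ancestor, tree=TREE):
    """Return the ancestor itself followed by all of its descendants."""
    members = [ancestor]
    for child in tree.get(ancestor, []):
        members.extend(clade(child, tree))
    return members


print(clade("n1"))  # ['n1', 'B', 'n2', 'C', 'D']
```

A group such as `B` and `C` without `D` would not be a clade in this tree, because it leaves out a descendant of their common ancestor `n1`.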
  • 37. Cladistics
    Tree of life
    Cladists use cladograms, diagrams which show ancestral relations between taxa, to represent the evolutionary tree of life
    Charles Darwin (1809–1882) was the first to produce an evolutionary tree of life
  • 38. Cladistics
    Tree of life
  • 39. Cladistics
    Computer programs in cladistics
    Example: analysis undertaken using Swofford’s (1985) package PAUP version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC
    David Swofford’s PAUP is a software package for inference of evolutionary trees
    Purpose: follow a given algorithm for generating and testing cladograms
  • 40. Cladistics
    Computer programs in cladistics
  • 41. Cladistics
    Computer programs in cladistics
    Issues:
    The packages produce variable results and cannot possibly examine every candidate tree, since finding the best tree is an NP-complete problem.
    Algorithm issues
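To give a sense of why packages cannot look at all the possibilities, the sketch below counts the distinct rooted, bifurcating trees for n labelled taxa using the standard combinatorial result (2n − 3)!!. The formula is a well-known result rather than something stated in the slides, and the function name is made up for this example.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted, bifurcating trees on n labelled taxa: (2n - 3)!!."""
    if n_taxa < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 10, 20):
    print(n, rooted_tree_count(n))
```

Three taxa give 3 trees and four give 15, but ten already give 34,459,425, so exhaustive search stops being practical very quickly and packages fall back on heuristics.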
  • 42. The Future
    Storing life
    Life itself is described as a program, with DNA as its code.
    If everything is information, then life can equally well be “stored”
  • 43. THANK YOU !