Moving From Small Science To Big Science



  1. 2. We will look at two case studies: marine mammal science and psychiatric genetics.
  2. 3. <ul><li>Subjects: 41 interviewees, including principal investigators, junior researchers, and technicians.</li></ul><ul><li>Purpose of the projects: the researchers are tracking the individual mammals they study.</li></ul><ul><li>Place: 13 different laboratories in the U.S. and Europe.</li></ul><ul><li>Period of their projects: about 40 years, starting in the 1970s.</li></ul><ul><li>These researchers are scientists rather than social scientists, but their experience of organizing the data is closer to social science.</li></ul><ul><li>At first, sharing seemed helpful. For example, a certain school of dolphins (200-500 dolphins) stays in one area, so only researchers working near that area could study them. Once people gathered and shared their information, researchers living in other areas could study the dolphins as well.</li></ul>
  3. 4. <ul><li>Leah Tull: Well, honestly, I’m very protective about it… I guess it rather bugs me that I have to do the work and everyone always asks me for a CD… it’s our scientific study.</li></ul><ul><li>Others also say that it is hard to organize their own data while considering how others will systematize and standardize theirs. How deeply, where, and with whom researchers should distribute and share information remains an unsolved question.</li></ul><ul><li>In the past, a small scientific group ran each project. People could get information through informal gatherings based on common attendance at a university or through shared contacts. Now, however, some people are working to build much larger databases, such as a project named SPLASH.</li></ul><ul><li>SPLASH involves over 300 scientists from 50 research groups working in various areas of the Pacific Ocean (Calambokidis et al., 2007).</li></ul>
  4. 5. <ul><li>Background: scientists use photographs to distinguish individual mammals.</li></ul><ul><li>Problem 1: most pictures taken before 2003 are in the form of slides, black and white negatives, or black and white prints. Since 2003, many scientists have switched to digital photography and have used different idiosyncratic systems to cope with digital catalogs.</li></ul><ul><li>Problem 2: the amount and range of the data are too broad, because the purpose of collecting the data is tracking the mammals rather than organizing the data. It means</li></ul>
  5. 6. Psychiatric genetics <ul><li>Subjects: around 50 researchers; the number of participating institutions expanded from 4 to 11. Each laboratory has one to five researchers.</li></ul><ul><li>Period: about 20 years.</li></ul><ul><li>The researchers are involved in the BP (bipolar disorder) project.</li></ul>The BP project was selected for GAIN (Genetic Association Information Network). GAIN is not a funded group but one that encourages researchers to organize and share their data, which helps others and, in return, themselves.
  6. 7. <ul><li>Background: researchers have collected blood, genetic data, and phenotypic data on thousands of subjects.</li></ul><ul><li>Problem 1: the amount and range of the data are too broad. For example, the largest item is the interview data, which runs to over 100 pages. Each interview took 4-6 hours and includes approximately 2600 variables. Each also comes with a trained clinician’s analysis, family history, medical records, and other information. Each subject has multiple best estimates from at least two clinicians plus the interviewer and an editor.</li></ul><ul><li>Problem 2: the data were encoded in three different versions of the interview instrument. The first data set was collected in an Oracle database. The second was organized in a Paradox database. The third was managed in a proprietary database using laptops and PCs.</li></ul><ul><li>Problem 3: the three data systems are not compatible.</li></ul><ul><li>Problem 4: the diagnoses were made under different systems. The earliest diagnoses use a combined DSM-III-R/RDC system, while the latest subjects were diagnosed with DSM-IV.</li></ul><ul><li>Problem 5: variables in the three versions are confusing. All three versions were converted from their original storage into SAS files, but their variable names are not consistent. For example, one variable is “I1120” in the first set, “Number_of_manic_episodes” in the second set, and “V756” in the third set. Organizing these data requires both domain knowledge and information-organization skills.</li></ul>
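
The naming problem in Problem 5 can be illustrated with a small crosswalk table. This is only a sketch under stated assumptions, not the BP project's actual procedure: the set labels (`set1`, `set2`, `set3`), the canonical name `manic_episode_count`, and the `harmonize` helper are hypothetical. Only the three source variable names come from the slide.

```python
# Hypothetical crosswalk: maps each data set's own variable name to one
# shared (canonical) name. Only the three source names are from the slide.
CROSSWALK = {
    "set1": {"I1120": "manic_episode_count"},
    "set2": {"Number_of_manic_episodes": "manic_episode_count"},
    "set3": {"V756": "manic_episode_count"},
}


def harmonize(record, data_set):
    """Rename the variables of one record using the crosswalk.

    Returns the renamed record and a list of variables that had no
    mapping, so that gaps in the crosswalk are visible, not silent.
    """
    mapping = CROSSWALK[data_set]
    renamed = {}
    unmapped = []
    for name, value in record.items():
        if name in mapping:
            renamed[mapping[name]] = value
        else:
            unmapped.append(name)
    return renamed, unmapped


if __name__ == "__main__":
    print(harmonize({"I1120": 3}, "set1"))
    print(harmonize({"Number_of_manic_episodes": 3}, "set2"))
    print(harmonize({"V756": 3, "V999": 1}, "set3"))
```

Building such a table is where the domain knowledge comes in: someone has to confirm that the three names really measure the same thing before they are merged.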
  7. 8. As we can see from the two cases, moving from a small scientific project to big science is hard. The researchers in the SPLASH and BP collaborations are trained for their scientific tasks, but not for organizing information. Training in organizing information would help. In SPLASH, the new system, which contains three versions of systems, is not designed to expand further. If it does expand, it will run into incompatibility problems. In BP, even though there were fewer researchers than in SPLASH, there were still problems. The researchers had difficulty with computer programming. For example, they had a hard time implementing EAV (entity-attribute-value) storage for the many variables. Furthermore, SAS does not allow an ampersand in variable names, so “Total Manic & Depressive Episodes” in Paradox became “total_manic_depressive_episodes” in SAS.
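
The two programming difficulties mentioned for BP can be sketched in a few lines. This is a minimal illustration under assumptions, not the project's code: `to_sas_name` and `to_eav` are hypothetical helpers, the subject ID and values are made up, and the 32-character limit reflects SAS's default variable-naming rules (letters, digits, and underscores only).

```python
import re


def to_sas_name(label, max_len=32):
    """Turn a free-text column label into a SAS-style variable name.

    Assumption: only letters, digits, and underscores are allowed, the
    name may not start with a digit, and it is capped at max_len.
    """
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()
    if not name or name[0].isdigit():
        name = "_" + name
    return name[:max_len]


def to_eav(subject_id, record):
    """Flatten one wide record into entity-attribute-value rows.

    Each row is (entity, attribute, value); missing values are skipped,
    which is what makes EAV compact when there are many sparse variables.
    """
    return [
        (subject_id, attribute, value)
        for attribute, value in record.items()
        if value is not None
    ]


if __name__ == "__main__":
    label = "Total Manic & Depressive Episodes"
    print(to_sas_name(label))  # total_manic_depressive_episodes

    # Made-up subject and values, for illustration only.
    wide = {to_sas_name(label): 7, "family_history": None}
    print(to_eav("S001", wide))
```

The trade-off with EAV is that adding a variable needs no schema change, but every analysis then has to pivot the rows back into columns.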
  8. 9. Style of social interaction in the projects: SPLASH didn’t “try to force them to do it one way” (Jacob Tipton). The BP project was always very decentralized. Both the SPLASH and BP projects have non-dogmatic leaders. This flexible and decentralized form of leadership is common among scientific and creative teams (Mumford, Scott, Gaddis, & Strange, 2002) and is not inherently problematic. Science relies on the freedom of scientists to innovate (Bush, 1945; Gordon, Marquis, & Anderson, 1962), although some recent work suggests that these patterns are changing in the face of calls for increased accountability and relevance in scientific work (Demeritt, 2000; Harman, 2003). The question is to what extent data management requirements should be dictated, and to what extent individual scientists should be allowed to ignore or skip issues of compatibility and data availability.
  9. 10. Derek de Solla Price (1963) identified some of these issues four decades ago in work that helped to develop the field of scientometrics. More recently, scholars in computer science have addressed issues of scalability (Simmhan, Plale, & Gannon, 2005; Zheng, Venters, & Cornford, 2007). Many papers discussing the implementation of Grid-enabled projects have identified scalability as one of the key issues developers have had to deal with (Pakhira, Fowler, Sastry, & Perring, 2005; Shimojo, Kalia, Nakano, & Vashishta, 2001). Only recently have researchers begun to pay attention to how small scientific projects negotiate the changes required as they move towards becoming large, collaborative scientific projects (Calson & Anderson, 2006; Walsh & Maloney, 2007). Scientists attempt to sustain these collaborations over time (Bos et al., 2007).