Subjects: around 50 researchers from institutions whose number expanded from 4 to 11. Each laboratory has one to five researchers.
Period: about 20 years
The researchers are involved in the BP (bipolar disorder) project.
The BP project was selected for GAIN (the Genetic Association Information Network). Rather than being a funding group, GAIN encourages researchers to organize and share their data, which helps not only others but also, in return, the researchers themselves.
Background: researchers have collected blood samples, genetic data, and phenotypic data on thousands of subjects.
Problem 1: the data are too broad in both amount and range.
For example, the largest data set consists of interviews that run to over 100 pages each. Each interview took 4-6 hours and includes approximately 2,600 variables. In addition, each record includes a trained clinician's analysis, family history, medical records, and other information. Each subject has multiple best estimates from at least two clinicians plus the interviewer and an editor.
Problem 2: the data were encoded in three different versions of the interview instrument.
Data from the first version were collected in an Oracle database.
Data from the second version were organized in a Paradox database.
Data from the third version were managed in a proprietary database running on laptops and PCs.
Problem 3: the three data systems are not compatible with one another.
Problem 4: the diagnoses were made under different diagnostic systems.
The earliest diagnoses used a combined DSM-III-R/RDC system, while the latest subjects were diagnosed with DSM-IV.
Problem 5: the variables in the three versions are confusing. All three versions were converted from their original storage into SAS files, but their variable names are not consistent. For example, the same variable is “I1120” in the first set, “Number_of_manic_episodes” in the second set, and “V756” in the third set. Organizing these data requires both domain knowledge and skill in organizing information.
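One common way to handle the naming problem in Problem 5 is an explicit crosswalk that maps each version-specific name to a single harmonized name. The Python sketch below is illustrative only, not the BP project's method: the harmonized name `manic_episode_count`, the version keys `v1`/`v2`/`v3`, and the `harmonize` function are hypothetical; only the three source names come from the text.

```python
# Hypothetical crosswalk: for each harmonized variable, the name it carries
# in each interview version (source names taken from the example above).
CROSSWALK = {
    "manic_episode_count": {
        "v1": "I1120",
        "v2": "Number_of_manic_episodes",
        "v3": "V756",
    },
}


def harmonize(record: dict, version: str) -> dict:
    """Rename a record's version-specific variables to harmonized names.

    Variables with no crosswalk entry are dropped, so gaps in the mapping
    show up as missing keys rather than as silently mislabeled values.
    """
    # Invert the crosswalk for this version: source name -> harmonized name.
    lookup = {
        names[version]: common
        for common, names in CROSSWALK.items()
        if version in names
    }
    return {lookup[k]: v for k, v in record.items() if k in lookup}


rows = [
    harmonize({"I1120": 3}, "v1"),
    harmonize({"Number_of_manic_episodes": 5}, "v2"),
    harmonize({"V756": 2, "V999": 1}, "v3"),  # V999 is unmapped and dropped
]
print(rows)
# [{'manic_episode_count': 3}, {'manic_episode_count': 5}, {'manic_episode_count': 2}]
```

Keeping the mapping as data, rather than renaming variables inside each analysis script, records the required domain knowledge once, in a form that clinicians can review.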
As the two cases show, moving from a small scientific project to big science is difficult. The researchers in the SPLASH and BP collaborations are trained for their scientific tasks, but not for organizing information; training in information organization would have helped them. In SPLASH, the new system, which combines three versions of earlier systems, was not designed for further expansion; if the project expands, it will run into incompatibility problems. In BP, even though there were fewer researchers than in SPLASH, problems still arose. The researchers had difficulties with computer programming; for example, they struggled to implement an EAV (entity-attribute-value) model across the many variables. Furthermore, SAS does not allow an ampersand in variable names, so “Total Manic & Depressive Episodes” in Paradox became “total_manic_depressive_episodes” in SAS.
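Both programming difficulties mentioned for BP can be illustrated with a small Python sketch. It assumes the standard SAS naming rules under the VALIDVARNAME=V7 setting (letters, digits and underscores only, no leading digit, at most 32 characters) and stores records in an EAV layout, with one row per (subject, attribute, value) instead of one column per variable. The function names, the subject ID `S001`, and the "Age at onset" variable are hypothetical; this is not the BP project's actual code.

```python
import re
from collections import defaultdict


def sas_style_name(label: str, max_len: int = 32) -> str:
    """Turn a free-text label into a SAS-style variable name (sketch).

    Assumes V7-style rules: letters, digits and underscores only, no
    leading digit, at most 32 characters. SAS names are case-insensitive;
    lowercasing here just matches the example in the text.
    """
    # Replace each run of disallowed characters (spaces, '&', ...) with '_'.
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()
    if not name or name[0].isdigit():
        name = "_" + name
    return name[:max_len]


def to_eav(subject_id: str, record: dict) -> list:
    """Store one subject's record as EAV rows: adding a variable needs no schema change."""
    return [(subject_id, sas_style_name(attr), value) for attr, value in record.items()]


def from_eav(rows: list) -> dict:
    """Pivot EAV rows back into one dict of attributes per subject."""
    wide = defaultdict(dict)
    for subject, attr, value in rows:
        wide[subject][attr] = value
    return dict(wide)


rows = to_eav("S001", {"Total Manic & Depressive Episodes": 7, "Age at onset": 19})
print(rows)
# [('S001', 'total_manic_depressive_episodes', 7), ('S001', 'age_at_onset', 19)]
print(from_eav(rows))
# {'S001': {'total_manic_depressive_episodes': 7, 'age_at_onset': 19}}
```

Because the cleaning step discards characters, two different labels (for example “Total Manic & Depressive Episodes” and “Total Manic Depressive Episodes”) collapse to the same name, so a real conversion would also need to check for such collisions.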
Style of social interaction in the project: SPLASH did not “try to force them to do it one way” (Jacob Tipton), and the BP project was always very decentralized. Both the SPLASH and BP projects have non-dogmatic leaders. This flexible and decentralized form of leadership is common among scientific and creative teams (Mumford, Scott, Gaddis, & Strange, 2002) and is not inherently problematic. Science relies on the freedom of scientists to innovate (Bush, 1945; Gordon, Marquis, & Anderson, 1962), although some recent work suggests that these patterns are changing in the face of calls for increased accountability and relevance in scientific work (Demeritt, 2000; Harman, 2003). The question is to what extent data management practices should be dictated, and to what extent individual scientists should be allowed to ignore or skirt issues of compatibility and data availability.
Derek de Solla Price (1963) identified some of these issues four decades ago in work that helped to establish the field of scientometrics. More recently, scholars in computer science have addressed issues of scalability (Simmhan, Plale, & Gannon, 2005; Zheng, Venters, & Cornford, 2007). Many papers discussing the implementation of Grid-enabled projects have identified scalability as one of the key issues developers have had to deal with (Pakhira, Fowler, Sastry, & Perring, 2005; Shimojo, Kalia, Nakano, & Vashishta, 2001). Only recently have researchers begun to pay attention to how small scientific projects negotiate the changes required as they grow into large, collaborative scientific projects (Carlson & Anderson, 2006; Walsh & Maloney, 2007). Scientists also attempt to sustain these collaborations over time (Bos et al., 2007).