"What does 'Full Life-Cycle' Data Management Mean ?"
Presentation made to US Office of Personnel Management Community of Practice on Big Data


Published in: Technology
  • All data go through processes of development. This 1986 NASA publication is still an excellent guide to basics of scientific data management…
  • The text accompanying the DCC model is very helpful in differentiating “full life cycle” actions / “sequential actions” and “occasional actions” -- the graphic is much less effective…
  • The accompanying text is more helpful but still not comprehensive…
  • Michener’s chart from 2006 makes a better effort at suggesting constant elements and feedback loops…
  • This Oracle “model” focuses on “databases” – not on “data” per se…
  • Transcript

    • 1. What does “Full Life-Cycle” Data Management Mean? “BIG DATA” US Office of Personnel Management, March 14, 2013
    • 2. “As required by the National Archives and Records Administration (NARA) in 36 CFR Chapter XII, Subchapter B, Records Management, Federal agencies are responsible for creating and maintaining authentic, reliable, and usable records and ensure that they remain so for the length of their authorized retention period.” http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf
    • 3. First, a brief digression concerning graphics… Edward Tufte’s favorite…
    • 4. DISCRETION… Exercise care in the selection of graphic formats – not all graphics enhance understanding; some may confuse… Lacking effective compound graphics, simplicity and the use of multiple graphic images may be more effective. The New York Times often produces exemplary graphics that compress complex data and complex relationships…
    • 5. NYT: “LEADING CAUSES OF CANCER DEATHS” http://www.nytimes.com/imagepages/2007/07/29/health/29cancer.graph.web.html
    • 6. “Data”? [technical definition] “…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” -- Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)” Taken in this broadest possible sense, “data” are thus simply electronic coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.
    • 7. “Data” [epistemic definition – addressing the meaning of data] “Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)” -- AnthroDPA Working Group on Metadata (May, 2009) [funded by Wenner-Gren Foundation and US NSF]
    • 8. “Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND A.S. Read June 21, 1798 (From the Philosophical Transactions of the Royal Society of London for the year 1798, Part II, pp. 469-526) From: http://www.archive.org/details/lawsofgravitatio00mackrich
    • 9. USDA – NATURAL RESOURCES CONSERVATION SERVICE
    • 10. manzanita_sapflow_12-5-07_to_7-7-08.xls
      sbid,battery,datetime,heater_voltage,Manz1Sap1,Manz1Sap2,Manz1Sap3,Manz1Sap4,Manz2Sap5,Manz2Sap6,Manz2Sap7,Manz3Sap10,Manz3Sap8,Manz3Sap9,Manz4Sap11,timestamp,Datagap,Julian
      2,12.365,1196796112,2018.8,0.5585,0.51029,0.55517,0.54354,0.6067,0.52858,0.55351,0.59008,0.59506,0.60337,0.56514,12/4/07 11:21,,4.47351
      3,12.348,1196796232,2017.9,0.55682,0.51028,0.5535,0.54352,0.60669,0.52857,0.55017,0.59007,0.59505,0.60336,0.56513,12/4/07 11:23,0,4.47490
      4,12.357,1196796352,2018.6,0.55514,0.51027,0.55348,0.54351,0.60501,0.52855,0.55016,0.59005,0.59504,0.60501,0.56512,12/4/07 11:25,0,4.47628
      5,12.354,1196796472,2017.6,0.55514,0.51026,0.55181,0.5435,0.60334,0.52855,0.54849,0.59004,0.59503,0.60334,0.56511,12/4/07 11:27,0,4.47767
      6,12.334,1196796592,2018.3,0.55347,0.51026,0.55015,0.5435,0.60333,0.52854,0.54682,0.59004,0.59502,0.605,0.56511,12/4/07 11:29,0,4.47906
      7,12.34,1196796712,2018.5,0.55014,0.50859,0.55014,0.54349,0.60332,0.53019,0.54349,0.59003,0.59501,0.60498,0.56676,12/4/07 11:31,0,4.48045
      8,12.337,1196796832,2017.8,0.55013,0.50692,0.55013,0.54348,0.60332,0.53019,0.54182,0.59002,0.59501,0.60498,0.56675,12/4/07 11:33,0,4.48184
      9,12.328,1196796952,2017.5,0.5468,0.50691,0.5468,0.54347,0.60331,0.53018,0.53849,0.59001,0.595,0.60497,0.56674,12/4/07 11:35,0,4.48323
      10,12.323,1196797072,2017,0.54679,0.50524,0.54679,0.54347,0.59998,0.53017,0.53682,0.59,0.59499,0.60496,0.56674,12/4/07 11:37,0,4.48462
      11,12.328,1196797192,2018.9,0.54679,0.50191,0.54512,0.5418,0.59665,0.53017,0.53349,0.59,0.59498,0.60496,0.56673,12/4/07 11:39,0,4.48601
      12,12.319,1196797312,2017.7,0.54345,0.49857,0.54178,0.54178,0.59663,0.53015,0.53015,0.58998,0.5933,0.60327,0.56671,12/4/07 11:41,0,4.48740
      13,12.311,1196797432,2017.3,0.54343,0.4969,0.54011,0.54177,0.59661,0.53014,0.52848,0.58997,0.59329,0.6016,0.5667,12/4/07 11:43,0,4.48878
      14,12.316,1196797552,2018.6,0.5401,0.49357,0.53678,0.54176,0.59328,0.53013,0.5268,0.58995,0.59328,0.60325,0.56669,12/4/07 11:45,0,4.49017
      15,12.31,1196797672,2016.8,0.53844,0.4919,0.53511,0.54176,0.59494,0.53013,0.52514,0.58995,0.59328,0.60325,0.56503,12/4/07 11:47,0,4.49156
      16,12.31,1196797792,2017.1,0.53676,0.48856,0.53343,0.54174,0.59326,0.53011,0.5218,0.58993,0.59326,0.60323,0.56501,12/4/07 11:49,0,4.49295
      17,12.31,1196797912,2017.1,0.53342,0.48523,0.5301,0.54173,0.59324,0.5301,0.51846,0.58826,0.59324,0.60321,0.56499,12/4/07 11:51,0,4.49434
      18,12.301,1196798031,2017.5,0.53174,0.48521,0.52842,0.53839,0.59156,0.53008,0.51845,0.58824,0.59323,0.6032,0.56498,12/4/07 11:53,0,4.49573
      19,12.301,1196798151,2016.3,0.53007,0.48188,0.52509,0.53838,0.59155,0.53007,0.51512,0.58823,0.59321,0.60152,0.5633,12/4/07 11:55,0,4.49712
      20,12.303,1196798271,2016.6,0.5284,0.47855,0.52175,0.53837,0.59154,0.5284,0.5151,0.58821,0.59154,0.60151,0.56163,12/4/07 11:57,0,4.49851
      instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root grown and CO2 production.
      Datum: “0.59998”
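The sap-flow table on this slide carries two time columns, which makes a simple internal consistency check possible. The sketch below is illustrative only and is not part of the presentation. It assumes that `datetime` holds Unix epoch seconds and that the logger clock ran on Pacific Standard Time (UTC-8); neither is stated on the slide. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (sbid, datetime, timestamp) copied from the first three rows of the slide.
ROWS = [
    (2, 1196796112, "12/4/07 11:21"),
    (3, 1196796232, "12/4/07 11:23"),
    (4, 1196796352, "12/4/07 11:25"),
]

# Assumption: the datalogger clock was set to Pacific Standard Time (UTC-8).
LOGGER_TZ = timezone(timedelta(hours=-8))


def epoch_matches_timestamp(epoch_seconds, stamp, tz=LOGGER_TZ):
    """Return True if the epoch value falls within the minute named by `stamp`."""
    local = datetime.fromtimestamp(epoch_seconds, tz).replace(tzinfo=None)
    expected = datetime.strptime(stamp, "%m/%d/%y %H:%M")
    return local.replace(second=0, microsecond=0) == expected


def sampling_intervals(rows):
    """Return the seconds elapsed between consecutive rows."""
    epochs = [epoch for _, epoch, _ in rows]
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]


consistent = all(epoch_matches_timestamp(e, s) for _, e, s in ROWS)
intervals = sampling_intervals(ROWS)
print(consistent, intervals)
```

A check of this kind only shows that the two columns agree with each other under the stated assumptions; it says nothing about whether the clock itself was right.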
    • 11. DATASETS: some examples with “native metadata”
      2-d_soil_temps.csv
      surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer, subsurface temperatures with a thermocouple.
      ----------------------------
      5-minute_light_data_for_4_continuous_days_plus_reference.xls
      PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along the transect are receiving.
      ----------------------------
      CO2_of_air_at_different_heights_July_9.xls
      concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the evening.
      ----------------------------
      Fern_light_response.xls
      Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction data (below) for physiological characterization of the ferns.
      ----------------------------
      La_Selva_species_photosyntheis_table.xls
      incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a shade house in Costa Rica.
      ----------------------------
      manzanita_sapflow_12-5-07_to_7-7-08.xls
      instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root grown and CO2 production.
      ----------------------------
      moisture_release_curves.xls
      percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory for calibration of water content with water potential. soil is from the James Reserve in California.
      ----------------------------
      Photosynthetic_induction.xls
      a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as µmol CO2 m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
      ----------------------------
      run_2_24-h_data_for_mesh.xls
      measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the forest for about 30 meters. Pyronometers facing up and down, pyrgeometer facing up and down, PAR, air temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for examining edge effects in forests.
      ----------------------------
      Segment_of_wallflower_compare_colorspaces_blur.xls
      pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces. segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in images collected after this training data was collected (and used to determine the best color space for this task).
    • 12. Data Development: “Data Reduction - Processing Level Definitions” (an example) http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf Report of the EOS Data Panel Vol IIA, NASA, 1986 (Tech Memorandum 87777) Tom Moritz, OPM “Big Data” July, 2012
    • 13. Data in Public Service
      The Federal government manages data in satisfaction of three primary requirements:
        1) To account transparently for government operations
        2) To provide citizen access to the products of government activities
        3) To fulfill mandated tasks for which the government has no original data (this requires data acquisition)
    • 14. The basic goal is to make all data held by the US government fully reliable and “audit-worthy”. All data and all derived data products should be able to withstand exacting examination and testing. All descriptive information required for auditing should be fully disclosed, readily available and easily accessible in standard reporting formats.
    • 15. http://www2.cec.org/nampan/species/vaquita
    • 16. VAQUITA STAKEHOLDERS
        • AGS Alto Golfo Sustentable
        • ASM American Society of Mammalogists
        • CEC Commission for Environmental Cooperation
        • CEDO Intercultural Center for the Study of Deserts and Oceans
        • CI Conservation International
        • CIRVA International Committee for the Recovery of the Vaquita
        • CICESE Centro de Investigación Científica y Educación Superior de Ensenada
        • CILA International Boundary and Water Commission
        • CITES Convention on International Trade in Endangered Species of Wild Fauna and Flora
        • Conagua National Water Commission
        • Conanp National Commission for Protected Natural Areas, Semarnat (Comisión Nacional de Áreas Naturales Protegida, Semarnat)
        • Conapesca National Fisheries and Aquaculture Commission, Sagarpa (Comisión Nacional de Pesca y Acuacultura, Sagarpa)
        • Profepa Federal Attorney for Environmental Protection
        • Secretariat of Agriculture, Livestock, Rural Development, Fisheries, and Food (Mexico)
        • Salud Secretariat of Health (Mexico)
        • COSEWIC Committee on the Status of Endangered Wildlife in Canada
        • Department of Fisheries and Oceans (Canada)
        • United States Department of the Interior
        • European Cetacean Society
        • US Environmental Protection Agency
        • US Food and Drug Administration
        • GEF Global Environmental
        • IBWC International Boundary and Water Commission
        • National Institute of Ecology, Semarnat
        • Inapesca National Fisheries Institute, Sagarpa
        • IUCN World Conservation Union
        • International Whaling Commission
        • Local Economic and Employment Development program
        • United States Marine Mammal Commission
    • 17. 
        • Marine Stewardship Council
        • NAMPAN North American Marine Protected Areas Network (CEC)
        • US National Academy of Sciences
        • North American Wildlife Enforcement Group (CEC)
        • US National Marine Fisheries Service, NOAA, Department of Commerce
        • US National Oceanic and Atmospheric Administration, Department of Commerce
        • United States National Ocean Service (NOAA)
        • PACE Species Conservation Action Programs, Conanp
        • PGR Attorney General Office (Mexico)
        • POEMGC Marine Ecological Planning of the Gulf of California Program, Semarnat
        • Procer Conservation Program for Species at Risk
        • Secretariat of Economy (Mexico)
        • Sectur Secretariat of Tourism (Mexico)
        • Sedesol Secretariat for Social Development (Mexico)
        • Semar Secretariat of the Navy
        • Semarnat Secretariat of the Environment and Natural Resources
        • Society for Marine Mammalogy
        • Solamac Latin American Society for Aquatic Mammals
        • Somemma Mexican Society for Marine Mammalogy
        • SWFSC Southwest Fisheries Science Center (USNMFS, NOAA)
        • The Nature Conservancy
        • Universidad Autónoma de Baja California Sur
        • University of California
        • United Nations
        • United States Coast Guard
        • United States Fish and Wildlife Service
        • World Wildlife Fund
    • 18. Values: “Data Quality” ???
      In the most general colloquial terms, “Data Quality” is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public.
      “Quality” can be considered in terms of three primary values:
        • Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)
        • Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation
        • Integrity: the maintenance of original integrity of data as well as tracking and documenting of all transformations and sequences of transformation of data
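The “Integrity” value above, tracking and documenting every transformation applied to the original data, can be illustrated with a small provenance log. This is a minimal sketch under stated assumptions, not a method described in the presentation: the function names, the two transformation steps and the choice of SHA-256 over a JSON serialization are all hypothetical. The three sample readings are taken from the Manz2Sap5 column of the sap-flow slide.

```python
import hashlib
import json


def fingerprint(values):
    """SHA-256 digest of a list of numbers, serialized deterministically."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_step(values, log, name, func):
    """Apply one transformation and append an auditable record of it."""
    result = [func(v) for v in values]
    log.append({
        "step": name,
        "input_sha256": fingerprint(values),
        "output_sha256": fingerprint(result),
    })
    return result


def chain_is_unbroken(log):
    """Each step's input must be exactly the previous step's output."""
    return all(
        prev["output_sha256"] == cur["input_sha256"]
        for prev, cur in zip(log, log[1:])
    )


original = [0.59998, 0.59665, 0.59663]  # readings from the sap-flow slide
log = []
scaled = apply_step(original, log, "scale_x1000", lambda v: v * 1000)
rounded = apply_step(scaled, log, "round_to_int", lambda v: round(v))
print(chain_is_unbroken(log), len(log))
```

An auditor holding the original file and this log can recompute the first fingerprint and confirm that no undocumented step sits between the raw readings and the derived product.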
    • 19. Auditing – A Case History “InterAcademy Council Names IPCC Review Committee” “AMSTERDAM, Netherlands – The InterAcademy Council (IAC), an organization of the world’s science academies, announced today that Harold T. Shapiro, an economist and former president of Princeton University and the University of Michigan, will chair a 12-member committee to conduct an independent review of the procedures and processes of the Intergovernmental Panel on Climate Change (IPCC). The review was requested in March by U.N. Secretary-General Ban Ki-moon and IPCC Chair Rajendra K. Pachauri. “The committee will review IPCC procedures for preparing its assessment reports. Among the issues to be reviewed are data quality assurance and control; the type of literature that may be cited in IPCC reports; expert and government review of IPCC materials; handling of the full range of scientific views; and the correction of errors that are identified after a report has been completed. The committee also will review overall IPCC processes, including management functions and communication strategies (the full statement of task is available at www.interacademycouncil.net/ipccreview).” http://reviewipcc.interacademycouncil.net/IACNamesIPCCReviewCommittee.html
    • 20. Climate Change Assessments: Review of the Processes and Procedures of the IPCC (InterAcademy Council) U.N. Press Conference Aug. 30, 2010 “Opening Statement” by Harold T. Shapiro, President Emeritus and Professor of Economics and Public Affairs, Princeton University and Chair, InterAcademy Council Committee to Review the IPCC http://reviewipcc.interacademycouncil.net/OpeningStatement.html
    • 21. US BLM Manual 1283 “Data Administration and Management” “Every employee is responsible for the quality, integrity, relevancy, accuracy, and currency of the data that is created, collected, or maintained, whether the data are in manual (paper copy) or electronic format. Managers will employ good data management practices to manage the data collected and maintained by their program specialists. The program specialist who uses, manages, and distributes the data must ensure that data are collected according to established standards and maintained to ensure accuracy and integrity. This section identifies specific responsibilities in support of the data management program.” Rel. No. 1-1742 Supersedes Rel. No. 1-1678 Date: 7/10/2012 http://www.blm.gov/pgdata/etc/medialib/blm/wo/Information_Resources_Management/policy/blm_manual.Par.77674.File.dat/BLM_1283_manual_final.pdf
    • 22. A Gallery of Efforts to Depict Full Life Cycle Data Management
    • 23. Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008. http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf
    • 24. US NSF “DataNet” Program: “the full data preservation and access lifecycle”
        • “acquisition”
        • “documentation”
        • “protection”
        • “access”
        • “analysis and dissemination”
        • “migration”
        • “disposition”
      “Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601, US National Science Foundation, Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering
    • 25. www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
    • 26. “JISC DCC Curation Lifecycle Model” http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
    • 27. Interagency Working Group on Digital Data http://wiki.esipfed.org/images/c/c4/IWGDD.ppt
    • 28. IWGDD “DIGITAL DATA LIFE CYCLE”
      Exhibit B-2. Life Cycle Functions for Digital Data*
        • Plan
          -- Determine what data need to be created or collected to support a research agenda or a mission function
          -- Identify and evaluate existing sources of needed data
          -- Identify standards for data and metadata format and quality
          -- Specify actions and responsibilities for managing the data over their life cycle
        • Create
          -- Produce or acquire data for intended purposes
          -- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended purpose
          -- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports, publications
        • Keep
          -- Organize and store data to support intended purposes
          -- Integrate updates and additions into existing collections
          -- Ensure the data survive intact for as long as needed
        • Acquire and implement technology
          -- Refresh technology to overcome obsolescence and to improve performance
          -- Expand storage and processing capacity as needed
          -- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing data
        • Disposition
          -- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it
          -- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization suited to addressing other needs or opportunities
      http://www.nitrd.gov/about/harnessing_power_web.pdf
    • 29. http://www.dataone.org/best-practices
    • 30. DataONE: The Data Life Cycle: An Overview
      The data life cycle has eight components:
        Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
        Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form
        Assure: the quality of the data are assured through checks and inspections
        Describe: data are accurately and thoroughly described using the appropriate metadata standards
        Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
        Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
        Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
        Analyze: data are analyzed
      DataONE Best Practices Primer: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
    • 31. W. K. Michener, “Meta-information concepts for ecological data management,” Ecological Informatics 1 (2006) 3-7 http://tinyurl.com/d49f3vm
    • 32. Federal Geographic Data Committee “Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)” http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
    • 33. “The Geospatial Data Lifecycle is not intended to be rigidly sequential or linear. The quality assurance and (or) quality control (QA/QC) functions for the data should be included at every stage of the Geospatial Data Lifecycle.” [emphasis added] -- “Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)” http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
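The FGDC point that QA/QC belongs at every stage, rather than at one step in a sequence, can be sketched as the same set of checks being run wherever the data currently sit. The sketch below is hypothetical throughout. The transcript does not list the FGDC's own stage names, so the eight DataONE component names from the earlier slide are borrowed purely as placeholders, and the two checks and the 0 to 1 value range are invented for illustration.

```python
from enum import Enum


class Stage(Enum):
    # Placeholder names borrowed from the DataONE list, not from the FGDC document.
    PLAN = "plan"
    COLLECT = "collect"
    ASSURE = "assure"
    DESCRIBE = "describe"
    PRESERVE = "preserve"
    DISCOVER = "discover"
    INTEGRATE = "integrate"
    ANALYZE = "analyze"


def has_timestamp(record):
    """Hypothetical check: every record must carry a non-empty timestamp."""
    return bool(record.get("timestamp"))


def value_in_unit_range(record):
    """Hypothetical check: the reading must lie between 0 and 1."""
    value = record.get("value")
    return value is not None and 0.0 <= value <= 1.0


def run_qa(stage, records, checks):
    """Run every check on every record and list the failures for this stage."""
    return [
        (stage.value, check.__name__, index)
        for index, record in enumerate(records)
        for check in checks
        if not check(record)
    ]


records = [
    {"timestamp": "12/4/07 11:21", "value": 0.5585},
    {"timestamp": "", "value": 0.51029},  # deliberately missing its timestamp
]
checks = [has_timestamp, value_in_unit_range]

# The same checks are applied at each stage, in no fixed order.
audit = {stage: run_qa(stage, records, checks) for stage in Stage}
print(audit[Stage.PLAN])
```

Because the audit is keyed by stage rather than driven by a pipeline order, nothing in the sketch forces the stages to be visited sequentially, which matches the “not rigidly sequential or linear” wording of the quotation.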
    • 34. Interagency Science Working Group, National Archives and Records Administration http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf “Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open Archival Information System (OAIS) Standard Reference Model January 19, 2011”
    • 35. “Sustainable data curation”
      “There are several main elements necessary to sustain data curation:
      “Robust data storage facilities (hardware and software) that are capable of accurately handling data migration across generations of media.
      “Backup plans, that are tested, so irreplaceable data are not at risk. Unintended data loss can occur for many reasons: some major causes are: poor stewardship leading to the loss of metadata to understand where the data is located and documentation to understand the content, physical facility and equipment failure (fire, flood, irrecoverable hardware crashes), accidental data overwrite or deletion.
      “Science-educated staff with knowledge to match the data discipline is important for checking data integrity, choosing archive organization, creating adequate metadata, consulting with users, and designing access systems that meet user expectations. Staff responsible for stewardship and curation must understand the digital data content and potential scientific uses.”
      C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 10. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
    • 36. Sustainable data curation (cont.)
      “Non-proprietary data formats that will ensure data access capability for many decades and will help avoid data losses resulting from software incompatibilities…
      “Consistent staffing levels and people dedicated to best practices in archiving, access, and stewardship…
      “National and International partnerships and interactions greatly aids in shared achievements for broad scale user benefits, e.g. reanalyses, TIGGE…
      “Stable funding not focused on specific projects, but data management in general…”
      C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 10-11. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
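Jacobs and Worley's call for storage that handles “data migration across generations of media” accurately, and for backup plans that are actually tested, is often put into practice as a fixity check: compute a checksum before and after the copy and compare. The following is a minimal sketch of that idea, not something taken from their paper. The file names are hypothetical, and a temporary directory stands in for the old and new media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(source, destination):
    """Copy a file and report whether the copy is bit-identical."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    return before == sha256_of(destination)


with tempfile.TemporaryDirectory() as workdir:
    old_media = Path(workdir) / "old_media.csv"  # hypothetical file names
    new_media = Path(workdir) / "new_media.csv"
    old_media.write_text("sbid,battery\n2,12.365\n")

    migration_ok = migrate_with_fixity(old_media, new_media)

    # Simulate silent corruption on the new medium (two digits swapped).
    new_media.write_text("sbid,battery\n2,12.356\n")
    corruption_detected = sha256_of(old_media) != sha256_of(new_media)

print(migration_ok, corruption_detected)
```

Storing the checksum alongside the metadata lets the same comparison be repeated at each later migration, so a loss is noticed when it happens rather than when the data are next needed.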
    • 37. Database Lifecycle Management
      “The Database Lifecycle Management covers the entire lifecycle of the databases, including:
        • Discovery and Inventory tracking: the ability to discover your assets, and track them
        • Initial provisioning, the ability to rollout databases in minutes
        • Ongoing Change Management, End-to-end management of patches, upgrades, schema and data changes
        • Configuration Management, track inventory, configuration drift and detailed configuration search
        • Compliance Management, reporting and management of industry and regulatory compliance standards
        • Site level Disaster Protection Automation”
      http://www.oracle.com/technetwork/oem/pdf/511949.pdf
    • 38. Design, Define, Conceptualise, Plan, Produce, Create, Acquire, Receive, Collect, Preserve, Protect, Curate, Maintain, Archive, Appraise, Select, Analyze, Distribute, Access, Use, Reuse, Store, Discover, Dispose, Transform, Describe, Repurpose, Metadata standards, Add Metadata, Assure
    • 39. “Data Quality” ???
      “In the most general colloquial terms, ‘Data Quality’ is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public.
      ‘Data Quality’ can be considered in terms of three primary values:
        • Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)
        • Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation
        • Integrity: the maintenance of original integrity of data as well as tracking and documenting of all recording, migration, transformations and sequences of transformation of data”
    • 40. “…the “validation” of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.” -- Tom Moritz, “The Burden of Proof” http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686
    • 41. Tom Moritz, Los Angeles, tom.moritz@gmail.com, 310 963 0199, http://www.linkedin.com/in/tmoritz
