Some notes on “Big Data”:
What does “full life-cycle” data management mean?




                                 Tom Moritz, OPM “Big Data” July, 2012
Open Government and “Transparency”
Two dimensions:
-- Data about government operations (all three
   branches!)
-- Data that represent the products of
   government activity
Case Studies
“A representation of the cholera epidemic of the nineteenth century”


 http://history.nih.gov/exhibits/history/index.html
http://www.ph.ucla.edu/epi/snow/snowmap1_1854_lge.htm
http://johnsnow.matrix.msu.edu/images/online_companion/chapter_images/fig12-6.jpg
The “Flash Crash”: “On the afternoon of May 6, 2010, the U.S. equity markets experienced an extraordinary upheaval. Over approximately 10 minutes, the Dow Jones Industrial Average dropped more than 600 points, representing the disappearance of approximately $800 billion of market value. The share price of several blue-chip multinational companies fluctuated dramatically; shares that had been at tens of dollars plummeted to a penny in some cases and rocketed to values over $100,000 per share in others. As suddenly as this market downturn occurred, it reversed, so over the next few minutes most of the loss was recovered and share prices returned to levels close to what they had been before the crash.”
“Large-Scale Complex IT Systems.” By Ian Sommerville, et al.
Communications of the ACM, Vol. 55 No. 7, Pages 71-77.
http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext




Paul Strand: “Wall Street, New York City, 1915.” Aerial view of pedestrians walking along Wall Street in strong sunlight, with a building with large recesses in the background (likely 23 Wall Street, the headquarters of J.P. Morgan & Co.). Photograph by Paul Strand (a student of Lewis Hine), 1915; published in Camera Work, v. 48, p. 25, October 1916.

                                                 http://www.flickr.com/photos/trialsanderrors/3075553370/
The “Flash Crash” (2)
“…the trigger event was identified as a single block sale of $4.1 billion of futures contracts executed with uncommon urgency on behalf of a fund-management company. That sale began a complex pattern of interactions between the high-frequency algorithmic trading systems (algos) that buy and sell blocks of financial instruments on incredibly short timescales.
“A software bug did not cause the Flash Crash; rather, the interactions of independently managed software systems created conditions unforeseen (probably unforeseeable) by the owners and developers of the trading systems. Within seconds, the result was a failure in the broader socio-technical markets that increasingly rely on the algos…”
“Large-Scale Complex IT Systems.” By Ian Sommerville, et al.
Communications of the ACM, Vol. 55 No. 7, Pages 71-77.
http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext
The “Flash Crash”(3): Key Insights
“Coalitions of systems, in which the system elements are managed and owned independently, pose challenging new problems for systems engineering.”
“When the fundamental basis of engineering – reductionism – breaks down, incremental improvements to current engineering techniques are unable to address the challenges of developing, integrating, and deploying large-scale complex IT systems.”
“Developing complex systems requires a socio-technical perspective involving human, organizational, social and political factors, as well as technical factors.”

“Large-Scale Complex IT Systems.” By Ian Sommerville, et al.
Communications of the ACM, Vol. 55 No. 7, Pages 71-77.
http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext
The Digital Environment…
The “Ecology” of Digital Data
[Figure: “The institutional ecology of the digital environment” (Yochai Benkler), arranged from “Small Science” to “BIG Science” and by sector (public <-> private) and jurisdictional scale: Local / Personal Archiving, Individuals, Individual Libraries, Cooperative Projects, National Disciplinary Initiatives, Data Centers, GRIDS, International Collaborative Research Effort, The Public Domain.]
The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 5
The “small science,” independent investigator approach traditionally has
characterized a large area of experimental laboratory sciences, such as
chemistry or biomedical research, and field work and studies, such as
biodiversity, ecology, microbiology, soil science, and anthropology. The
data or samples are collected and analyzed independently, and the
resulting data sets from such studies generally are heterogeneous and
unstandardized, with few of the individual data holdings deposited in
public data repositories or openly shared.
The data exist in various twilight states of accessibility, depending
on the extent to which they are published, discussed in papers but not
revealed, or just known about because of reputation or ongoing work,
but kept under absolute or relative secrecy. The data are thus
disaggregated components of an incipient network that is only as
effective as the individual transactions that put it together.
Openness and sharing are not ignored, but they are not necessarily
dominant either. These values must compete with strategic
considerations of self-interest, secrecy, and the logic of mutually
beneficial exchange, particularly in areas of research in which
commercial applications are more readily identifiable.
The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 8
“Small” data collections may become
      “Big” (and more complex)
by successive aggregation of sources…
Linked Open Data

[Figure: Linked Open Data cloud diagrams, 2009 and 2011.]
Courtesy of Tim Lebo (@timrdf), RPI, 20 Jun 2012: http://bit.ly/lebo-ipaw-2012
“Data”? [technical definition]
“…‘data’ are defined as any information that can be stored in
  digital form and accessed electronically, including, but not
  limited to, numeric data, text, publications, sensor streams,
  video, audio, algorithms, software, models and simulations,
  images, etc.” -- NSF Program Solicitation 07-601,
  “Sustainable Digital Data Preservation and Access Network Partners (DataNet)”



Taken in this broadest possible sense, “data” are simply electronically
   coded forms of information: virtually anything can be represented
   as “data” so long as it is machine-readable.
“The digital universe in 2007 — at 2.25 × 10^21 bits (281 exabytes or 281 billion gigabytes) — was 10% bigger than we thought. The resizing comes as a result of faster growth in cameras, digital TV shipments, and better understanding of information replication.
“By 2011, the digital universe will be 10 times the size it was in 2006.
“As forecast, the amount of information created, captured, or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home.
“Fast-growing corners of the digital universe include those related to digital TV, surveillance cameras, Internet access in emerging countries, sensor-based applications, datacenters supporting “cloud computing,” and social networks.”
The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011 -- Executive Summary.
IDC Information and Data, March, 2008 http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
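The quoted size of the 2007 digital universe can be checked with simple unit arithmetic. The sketch below is a minimal Python check, assuming 8 bits per byte and decimal (SI) prefixes (1 exabyte = 10^18 bytes, 1 gigabyte = 10^9 bytes); the function names are illustrative and do not come from the IDC report.

```python
# Assumptions: 8 bits per byte and decimal (SI) prefixes,
# i.e. 1 exabyte = 10**18 bytes and 1 gigabyte = 10**9 bytes.
BITS_PER_BYTE = 8
EXABYTE = 10**18
GIGABYTE = 10**9


def bits_to_exabytes(bits: float) -> float:
    """Convert a count of bits to exabytes (decimal prefixes)."""
    return bits / BITS_PER_BYTE / EXABYTE


def exabytes_to_gigabytes(eb: float) -> float:
    """Convert exabytes to gigabytes (decimal prefixes)."""
    return eb * EXABYTE / GIGABYTE


digital_universe_2007_bits = 2.25e21
eb = bits_to_exabytes(digital_universe_2007_bits)
print(f"{eb:.2f} exabytes")                                         # 281.25 exabytes
print(f"{exabytes_to_gigabytes(eb) / 1e9:.2f} billion gigabytes")  # 281.25 billion gigabytes
```

Under these assumptions 2.25 × 10^21 bits is about 281 exabytes, i.e. about 281 billion gigabytes, which agrees with the quoted figures. Binary prefixes (1 exbibyte = 2^60 bytes) would give roughly 244 instead.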
“As you go down the Long Tail the signal-to-noise ratio gets worse. Thus
the only way you can maintain a consistently good enough signal to find
what you want is if your filters get increasingly powerful.”
          Chris Anderson “Is the Long Tail full of crap?” May 22, 2005


http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
“Data” [epistemic definition]
“Measurements, observations or descriptions of
 a referent -- such as an individual, an event, a
 specimen in a collection or an
 excavated/surveyed object -- created or
 collected through human interpretation
 (whether directly “by hand” or through the use
 of technologies)”
                 -- AnthroDPA Working Group on Metadata (May, 2009)
Data Entropy: the risks of inaction and the urgency of action

“…data longevity is increased. Comprehensive metadata counteract the natural tendency for data to degrade in information content through time (i.e. information entropy sensu Michener et al., 1997; Fig. 1).”
W. K. Michener “Meta-information concepts for ecological data management” Ecological Informatics 1 (2006) 3-7


Data Development: “Data Reduction - Processing Level Definitions” (an example)

Report of the EOS Data Panel Vol. IIA, NASA, 1986 (Tech Memorandum 87777)
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf
T.C. Chamberlin
Hypotheses and data as evidence: Inductive <--> Deductive feedback loops?

“What science does is put forward hypotheses, and use them to make predictions, and test those predictions against empirical evidence. Then the scientists make judgments about which hypotheses are more likely, given the data. These judgments are notoriously hard to formalize, as Thomas Kuhn argued in great detail, and philosophers of science don’t have anything like a rigorous understanding of how such judgments are made. But that’s only a worry at the most severe levels of rigor; in rough outline, the procedure is pretty clear. Scientists like hypotheses that fit the data, of course, but they also like them to be consistent with other established ideas, to be unambiguous and well-defined, to be wide in scope, and most of all to be simple. The more things an hypothesis can explain on the basis of the fewer pieces of input, the happier scientists are.”

-- Sean Carroll, “Science and Religion are not Compatible,” Discover Magazine, June 23rd, 2009 8:01 AM


http://blogs.discovermagazine.com/cosmicvariance/2009/06/23/science-and-religion-are-not-compatible/

Full Life Cycle Management?




US NSF “DataNet” Program
            “the full data preservation and access lifecycle”

      •   “acquisition”
      •   “documentation”
      •   “protection”
      •   “access”
      •   “analysis and dissemination”
      •   “migration”
      •   “disposition”
“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601, US National Science Foundation, Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering
http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm
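One way to read “full” in “full life-cycle” is that data are managed through every stage in the DataNet list, up to and including disposition, not just acquisition and access. The Python sketch below encodes the DataNet stages as an ordered enumeration. Treating them as a strictly linear sequence is an assumption made for illustration (the solicitation does not say stages never repeat), and `LifecycleStage`, `next_stage` and `full_path` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class LifecycleStage(Enum):
    """Stages named in the NSF DataNet solicitation, in the order listed.

    Assumption: stages form a simple linear sequence. Real repositories
    often revisit stages (e.g. migrate, then provide access again).
    """
    ACQUISITION = 1
    DOCUMENTATION = 2
    PROTECTION = 3
    ACCESS = 4
    ANALYSIS_AND_DISSEMINATION = 5
    MIGRATION = 6
    DISPOSITION = 7


def next_stage(stage: LifecycleStage) -> Optional[LifecycleStage]:
    """Return the following stage, or None once disposition is reached."""
    members = list(LifecycleStage)  # definition order
    i = members.index(stage)
    return members[i + 1] if i + 1 < len(members) else None


def full_path(start: LifecycleStage = LifecycleStage.ACQUISITION) -> list[str]:
    """List stage names from `start` through disposition."""
    path: list[str] = []
    stage: Optional[LifecycleStage] = start
    while stage is not None:
        path.append(stage.name.lower())
        stage = next_stage(stage)
    return path


print(full_path())
```

Under this reading, a plan that stops at `access` and never reaches `migration` or `disposition` is not “full life-cycle”.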
IWGDD = [US] “Interagency Working Group on Digital Data”




http://www.nitrd.gov/about/harnessing_power_web.pdf
IWGDD “DIGITAL DATA LIFE CYCLE”
Exhibit B-2. Life Cycle Functions for Digital Data*
   • Plan
      -- Determine what data need to be created or collected to support a research agenda or a mission function
      -- Identify and evaluate existing sources of needed data
      -- Identify standards for data and metadata format and quality
      -- Specify actions and responsibilities for managing the data over their life cycle
   • Create
      -- Produce or acquire data for intended purposes
      -- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended purpose
      -- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports, publications
   • Keep
      -- Organize and store data to support intended purposes
      -- Integrate updates and additions into existing collections
      -- Ensure the data survive intact for as long as needed
   • Acquire and implement technology
      -- Refresh technology to overcome obsolescence and to improve performance
      -- Expand storage and processing capacity as needed
      -- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing data
   • Disposition
      -- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it
      -- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization suited to addressing other needs or opportunities

http://www.nitrd.gov/about/harnessing_power_web.pdf
www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf




“JISC DCC Curation Lifecycle Model”
http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
Database Lifecycle Management
  “The Database Lifecycle Management covers the entire
    lifecycle of the databases, including:
      • Discovery and Inventory tracking: the ability to discover
         your assets, and track them
      • Initial provisioning, the ability to rollout databases in
         minutes
      • Ongoing Change Management, End-to-end management of
         patches, upgrades, schema and data changes
      • Configuration Management, track inventory, configuration
         drift and detailed configuration search
      • Compliance Management, reporting and management of
         industry and regulatory compliance standards
      • Site level Disaster Protection Automation”
http://www.oracle.com/technetwork/oem/pdf/511949.pdf
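The “configuration drift” item in the Oracle list refers to a system's settings diverging over time from a known baseline. The following is a generic Python sketch of that comparison, not Oracle's implementation: it assumes configurations can be captured as flat key/value snapshots, and the function name and the settings in the example are hypothetical.

```python
from typing import Any


def config_drift(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, list[str]]:
    """Report how a current configuration snapshot differs from a recorded baseline.

    Returns the setting names that were added, removed, or changed, each
    sorted so that reports are stable from one run to the next.
    """
    added = sorted(k for k in current if k not in baseline)
    removed = sorted(k for k in baseline if k not in current)
    changed = sorted(k for k in baseline if k in current and baseline[k] != current[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical settings, for illustration only (not real Oracle parameter names).
baseline = {"schema_version": "1.4", "backups_enabled": True, "max_connections": 200}
current = {"schema_version": "1.4", "backups_enabled": False, "max_connections": 200,
           "audit_logging": "on"}
print(config_drift(baseline, current))
# {'added': ['audit_logging'], 'removed': [], 'changed': ['backups_enabled']}
```

Recording the baseline snapshot itself (when and by whom it was approved) is what turns this comparison into configuration management rather than a one-off check.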
W. K. Michener “Meta-information concepts for ecological data management”
  Ecological Informatics 1 (2006) 3-7

http://tinyurl.com/d49f3vm
“Sustainable data curation”
              “There are several main elements necessary to sustain data curation:



     “Robust data storage facilities (hardware and software) that are capable of
      accurately handling data migration across generations of media.

     “Backup plans, that are tested, so irreplaceable data are not at risk.
      Unintended data loss can occur for many reasons: some major causes are:
      poor stewardship leading to the loss of metadata to understand where the
      data is located and documentation to understand the content, physical
      facility and equipment failure (fire, flood, irrecoverable hardware crashes),
      accidental data overwrite or deletion.

      “Science-educated staff with knowledge to match the data discipline is
       important for checking data integrity, choosing archive organization, creating
       adequate metadata, consulting with users, and designing access systems that
       meet user expectations. Staff responsible for stewardship and curation must
       understand the digital data content and potential scientific uses. “
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 10. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]



Sustainable data curation (cont.)
        “Non-proprietary data formats that will ensure data access capability for
         many decades and will help avoid data losses resulting from software
         incompatibilities…

        “Consistent staffing levels and people dedicated to best practices in
         archiving, access, and stewardship…

        “National and International partnerships and interactions greatly aids in
         shared achievements for broad scale user benefits, e.g. reanalyses,
         TIGGE…

        “Stable funding not focused on specific projects, but data management in
         general…”
C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, pages 10-11. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]



“Data Quality” ???
In general colloquial terms, “Data Quality” is the fundamental issue of
    concern to scientists, policy makers, managers/decision makers and the
    general public.

“Quality” can be considered in terms of three primary values:

• Validity: the data are logically appropriate to the hypothesis being
  tested (all potential types of data that could be chosen should be
  weighed for probative value…)

• Competence (Reliability): the proper choice of expert staff, methods,
  apparatus/gear, calibration, deployment and operation

• Integrity: the maintenance of the original integrity of the data, as well
  as the tracking and documenting of all recording, migration,
  transformations and sequences of transformation of the data
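The Integrity criterion (tracking and documenting every recording, migration and transformation) can be made concrete with checksums. The Python sketch below assumes SHA-256 digests as fixity values and a simple in-memory log; `ProvenanceLog` and its methods are hypothetical names for illustration, not an existing curation tool, and the sample data are made up.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a fixity checksum."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical log of every transformation applied to a dataset.

    Each entry stores checksums of a step's input and output, so the chain
    from the original data to any derived product can be re-verified later.
    """
    original_checksum: str
    entries: list[dict[str, str]] = field(default_factory=list)

    def latest_checksum(self) -> str:
        return self.entries[-1]["output_sha256"] if self.entries else self.original_checksum

    def record(self, step: str, before: bytes, after: bytes) -> None:
        # Refuse to log a step whose input is not the last recorded state:
        # that would leave an undocumented gap in the sequence of transformations.
        if sha256_hex(before) != self.latest_checksum():
            raise ValueError(f"input to step {step!r} does not match the last recorded state")
        self.entries.append({
            "step": step,
            "input_sha256": sha256_hex(before),
            "output_sha256": sha256_hex(after),
        })

    def verify(self, data: bytes) -> bool:
        """True if `data` is byte-for-byte the latest recorded state."""
        return sha256_hex(data) == self.latest_checksum()


# Usage with made-up example data: a Fahrenheit-to-Celsius conversion.
raw = b"station,temp_f\nA,68\n"
log = ProvenanceLog(original_checksum=sha256_hex(raw))
converted = b"station,temp_c\nA,20\n"
log.record("convert Fahrenheit to Celsius", raw, converted)
print(log.verify(converted))                  # True
print(log.verify(b"station,temp_c\nA,21\n"))  # False
```

Because each step must start from the last recorded checksum, a transformation applied outside the log shows up as a mismatch rather than passing silently.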

“…the ‘validation’ of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.”

– Tom Moritz, “The Burden of Proof”




http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686
A Primary Goal of Open Government
Public Access to Data that is:
• Of High Quality (see previous discussion)
• Free – no cost or minimal cost
• Open – easily discoverable and accessible
   – “A piece of content or data is open if anyone is free to use, reuse, and
     redistribute it — subject only, at most, to the requirement to attribute
     and/or share-alike.” [http://opendefinition.org/]

• Effective / Useful / Usable – both technically usable
  and descriptively identified in ways that support ready
  analysis, citation, use, reuse…

    T. Moritz “The Burden of Proof: Data as Evidence in Science and Public Policy”
       Microsoft Research, GRDI2020, Stellenbosch, South Africa, Sept., 2010
 http://www.grdi2020.eu/Pages/SelectedDocument.aspx?id_documento=87f1b6d5-5c30-42a7-94df-d9cd5f4b147c
Thanks for your attention…

             Tom Moritz
        Tom Moritz Consultancy
             Los Angeles
        tom.moritz@gmail.com
           +1 310 963 0199
          tommoritz (Skype)
 http://www.linkedin.com/in/tmoritz
http://www.slideshare.net/Tom_Moritz
Saturn images courtesy of R J Robbins and The Research Coordinating Network for the Genomics Standards Consortium…
http://www.nytimes.com/1992/10/31/world/after-350-years-vatican-says-galileo-was-right-it-moves.html
Rosalind Franklin’s Image
“Franklin's B-form data, in conjunction with cylindrical Patterson map calculations that she had applied to her A-form data, allowed her to determine DNA's density, unit-cell size, and water content. With those data, Franklin proposed a double-helix structure with precise measurements for the diameter, the separation between each of the coaxial fibers along the fiber axis direction, and the pitch of the helix.[3]”

“The diffraction photograph of the B form of DNA taken by Rosalind Franklin in May 1952 was by far the best photograph of its kind. Data derived from this photograph were instrumental in allowing James Watson and Francis Crick to construct their Nobel Prize-winning model for DNA.” (Courtesy of the Norman Collection on the History of Molecular Biology in Novato, Calif.)




http://philosophyofscienceportal.blogspot.com/2008/04/rosalind-franklin-double-helix.html
“Notebook entries show that Rosalind Franklin (a)
recognized that the B form of DNA was likely to have a
two-chained helix; (b) was aware of the Chargaff ratios;
(c) knew that most, if not all, of the nitrogenous bases in
DNA were in the keto configuration…; and (d)
determined that the backbone chains of A-form DNA are
antiparallel.” (Courtesy of Anne Sayre and Jenifer
Franklin Glynn.)

http://philosophyofscienceportal.blogspot.com/2008/04/rosalind-franklin-double-helix.html
“Transcript of letter from James Watson to Max Delbruck, March 12, 1953”

“The basic structure is helical – it consists of two intertwining helices…”

http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/corr/corr432.1-watson-delbruck-19530312-01-large.html

More Related Content

What's hot

Digital Networks
Digital NetworksDigital Networks
Digital NetworksKathy Gill
 
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013 Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013 Avanxo
 
Internet of Things - IoT Webinar 2013
Internet of Things - IoT Webinar 2013Internet of Things - IoT Webinar 2013
Internet of Things - IoT Webinar 2013Desiree Miloshevic
 
The Cyberspace: Redefining A New World
The Cyberspace: Redefining A New WorldThe Cyberspace: Redefining A New World
The Cyberspace: Redefining A New Worldiosrjce
 
Mobile network fundamentals and evolution
Mobile network fundamentals and evolutionMobile network fundamentals and evolution
Mobile network fundamentals and evolutionVedran Podobnik
 
Technology Education in an Urban Metropolitan University
Technology Education in an Urban Metropolitan UniversityTechnology Education in an Urban Metropolitan University
Technology Education in an Urban Metropolitan UniversityJoe McCarthy
 
Big Data and Social Sciences
Big Data and Social SciencesBig Data and Social Sciences
Big Data and Social SciencesDavid De Roure
 
Digital Futures - Data & Community Ecosystems
Digital Futures - Data & Community EcosystemsDigital Futures - Data & Community Ecosystems
Digital Futures - Data & Community EcosystemsOpen Knowledge Canada
 
 Blockchain Overview: Possibilities and Issues
 Blockchain Overview: Possibilities and Issues Blockchain Overview: Possibilities and Issues
 Blockchain Overview: Possibilities and IssuesBohyun Kim
 
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...The Internet of Things How the Next Evolution of the Internet Is Changing Eve...
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...Business of Software Conference
 
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjData-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjMirko Lorenz
 
Scholarship in the Digital World
Scholarship in the Digital WorldScholarship in the Digital World
Scholarship in the Digital WorldDavid De Roure
 
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012Lee Dirks
 
Moritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsMoritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsTom Moritz
 
The impact of information in society
The impact of information in society The impact of information in society
The impact of information in society Abrar Almjaly
 
MOOCs and ubiquitous computing
MOOCs and ubiquitous computingMOOCs and ubiquitous computing
MOOCs and ubiquitous computingBryan Alexander
 

What's hot (20)

Future Of Internet IV | AAAS
Future Of Internet IV | AAASFuture Of Internet IV | AAAS
Future Of Internet IV | AAAS
 
Ethics of Automation
Ethics of AutomationEthics of Automation
Ethics of Automation
 
Digital Networks
Digital NetworksDigital Networks
Digital Networks
 
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013 Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013
Connection for Innovation - Petter Coffee - Avanxo Cloud Forum 2013
 
Internet of Things - IoT Webinar 2013
Internet of Things - IoT Webinar 2013Internet of Things - IoT Webinar 2013
Internet of Things - IoT Webinar 2013
 
The Cyberspace: Redefining A New World
The Cyberspace: Redefining A New WorldThe Cyberspace: Redefining A New World
The Cyberspace: Redefining A New World
 
Statistics in Journalism Sheffield 2014
Statistics in Journalism Sheffield 2014Statistics in Journalism Sheffield 2014
Statistics in Journalism Sheffield 2014
 
Mobile network fundamentals and evolution
Mobile network fundamentals and evolutionMobile network fundamentals and evolution
Mobile network fundamentals and evolution
 
Technology Education in an Urban Metropolitan University
Technology Education in an Urban Metropolitan UniversityTechnology Education in an Urban Metropolitan University
Technology Education in an Urban Metropolitan University
 
Big Data and Social Sciences
Big Data and Social SciencesBig Data and Social Sciences
Big Data and Social Sciences
 
Digital Futures - Data & Community Ecosystems
Digital Futures - Data & Community EcosystemsDigital Futures - Data & Community Ecosystems
Digital Futures - Data & Community Ecosystems
 
 Blockchain Overview: Possibilities and Issues
 Blockchain Overview: Possibilities and Issues Blockchain Overview: Possibilities and Issues
 Blockchain Overview: Possibilities and Issues
 
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...The Internet of Things How the Next Evolution of the Internet Is Changing Eve...
The Internet of Things How the Next Evolution of the Internet Is Changing Eve...
 
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjData-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
 
Scholarship in the Digital World
Scholarship in the Digital WorldScholarship in the Digital World
Scholarship in the Digital World
 
top 10 Data Mining Algorithms
top 10 Data Mining Algorithmstop 10 Data Mining Algorithms
top 10 Data Mining Algorithms
 
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
ExLibris National Library Meeting @ IFLA-Helsinki - Aug 15th 2012
 
Moritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity CommonsMoritz D Lib Building The Biodiversity Commons
Moritz D Lib Building The Biodiversity Commons
 
The impact of information in society
The impact of information in society The impact of information in society
The impact of information in society
 
MOOCs and ubiquitous computing
MOOCs and ubiquitous computingMOOCs and ubiquitous computing
MOOCs and ubiquitous computing
 

Viewers also liked

Public personnel management (Rahat ul Aain)
Development of Personnel Management (Bendita Baylôn Ü)
Social Media for Personnel Management and Training (Greg Friese)
Personnel Management and Industrial Psycology (Nishant Munjal)
Development of personnel management (emmanuel ebro)
NEW BUSINESS DEVELOPMENT (Jonty Mohta)
New Business Development, By Richard Garrity- 2013 (Richard Garrity)
Old Business Development Vs New Business Development (Deeallan)
New Business Development Proposal - Adding Project Portfolio Management (PPM)... (Rolly Perreaux, PMP)
School Personnel Management (joems_angel2000)
Definitions of personnel management (Anything Group)
Using Social Media In HR & Recruiting 10 20 2009 SummitUp Conference (Jennifer McClure)
 


Similar to US Office of Personnel Management: Notes on "Big Data"

Economics of Digital Information (Kathy Gill)
We Do That Differently* Now (Peter Coffee)
IAMSLIC 2012, ANCHORAGE, AK (Tom Moritz)
AI WORLD: I-World: EIS Global Innovation Platform: BIG Knowledge World vs. BI... (Azamat Abdoullaev)
Data Science Innovations : Democratisation of Data and Data Science (suresh sood)
Soderstrom (NASAPMC)
Economics of Information/Technology (Kathy Gill)
Sensory transformation (Karlos Svoboda)
Gimme my data: government transformation (W. David Stephenson)
Keis0s2 Is Stages 2008 (Ian Miles)
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com... (AnthonyOtuonye)
Week 2 - Networks and Externalities (Kathy Gill)
Introducing the Internet of Things: lecture @IULM University (Leandro Agro')
Data driven innovation for education (EduSkills OECD)
The Impact of Information System (Internet of Things) on Management and Globa... (BRNSSPublicationHubI)
Codes, Clouds & Constellations: Open Science in the Data Decade (LizLyon)
Using Data for Science Journalism (Liliana Bounegru)
 


More from Tom Moritz

ESA Science Commons
Marine microbiology
Pelagic Environments and Ecology (3) copy
Pelagic environment and ecology (2)
Pelagic Environments and Ecosystems (1)
Chaparral and Coastal Scrub Ecology
The Intertidal and Kelp Forests - Pacific Coast
A Universe of Data
Climate Change
Climate change
The commons???
Ecological Society of America Science Commons
Epidemiology cholera, ebola, hiv aids
The Human Biome
Epistemology, ontology, knowledge x
Ids 330 "Environmental Leadership" Basic Introduction (University of the West)
Charles Darwin: The Galapagos Finches and the Emergence of Evolutionary Theory
Trauma and violence
Children and Trauma in the International World (UWest Psych 490 November 7, 2...
 


US Office of Personnel Management: Notes on "Big Data"

  • 1. Some notes on “Big Data”: What does “full life-cycle” data management mean? Tom Moritz, OPM “Big Data” July, 2012
  • 2. Open Government and “Transparency” Two dimensions: -- Data about government operations (all three branches!) -- Data that represent the products of government activity
  • 4. “A representation of the cholera epidemic of the nineteenth century” http://history.nih.gov/exhibits/history/index.html
  • 7.
  • 8. The “Flash Crash”: “On the afternoon of May 6, 2010, the U.S. equity markets experienced an extraordinary upheaval. Over approximately 10 minutes, the Dow Jones Industrial Average dropped more than 600 points, representing the disappearance of approximately $800 billion of market value. The share price of several blue-chip multinational companies fluctuated dramatically; shares that had been at tens of dollars plummeted to a penny in some cases and rocketed to values over $100,000 per share in others. As suddenly as this market downturn occurred, it reversed, so over the next few minutes most of the loss was recovered and share prices returned to levels close to what they had been before the crash.” “Large-Scale Complex IT Systems.” By Ian Sommerville, et al. Communications of the ACM, Vol. 55 No. 7, Pages 71-77. http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext Paul Strand: “Wall Street, New York City, 1915” Aerial view of pedestrians walking along Wall Street in strong sunlight and building in background with large recesses (likely 23 Wall Street, the headquarters of J.P. Morgan & Co.). Photograph by Paul Strand (a student of Lewis Hine), 1915; published in Camera Work, v. 48, p. 25, October 1916. http://www.flickr.com/photos/trialsanderrors/3075553370/
  • 9. The “Flash Crash” (2) “…the trigger event was identified as a single block sale of $4.1 billion of futures contracts executed with uncommon urgency on behalf of a fund-management company. That sale began a complex pattern of interactions between the high-frequency algorithmic trading systems (algos) that buy and sell blocks of financial instruments on incredibly short timescales. “A software bug did not cause the Flash Crash; rather, the interactions of independently managed software systems created conditions unforeseen (probably unforeseeable) by the owners and developers of the trading systems. Within seconds, the result was a failure in the broader socio-technical markets that increasingly rely on the algos…” “Large-Scale Complex IT Systems.” By Ian Sommerville, et al. Communications of the ACM, Vol. 55 No. 7, Pages 71-77. http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext
  • 10. The “Flash Crash” (3): Key Insights • “Coalitions of systems, in which the system elements are managed and owned independently, pose challenging new problems for systems engineering.” • “When the fundamental basis of engineering – reductionism – breaks down, incremental improvements to current engineering techniques are unable to address the challenges of developing, integrating, and deploying large-scale complex IT systems.” • “Developing complex systems requires a socio-technical perspective involving human, organizational, social and political factors, as well as technical factors.” “Large-Scale Complex IT Systems.” By Ian Sommerville, et al. Communications of the ACM, Vol. 55 No. 7, Pages 71-77. http://cacm.acm.org/magazines/2012/7/151233-large-scale-complex-it-systems/fulltext
  • 12. The “Ecology” of Digital Data [Diagram spanning “Small Science” to “BIG Science”; labels: GRIDS; International Data Centers; Collaborative Research Effort; Individual, National, Disciplinary Initiatives; Libraries; Cooperative Projects; Local / Individuals; Personal Archiving]
  • 13. The Public Domain: “The institutional ecology of the digital environment” (Yochai Benkler). Sectors (public <-> private) and Jurisdictional Scale. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 5
  • 14. The “small science,” independent investigator approach traditionally has characterized a large area of experimental laboratory sciences, such as chemistry or biomedical research, and field work and studies, such as biodiversity, ecology, microbiology, soil science, and anthropology. The data or samples are collected and analyzed independently, and the resulting data sets from such studies generally are heterogeneous and unstandardized, with few of the individual data holdings deposited in public data repositories or openly shared. The data exist in various twilight states of accessibility, depending on the extent to which they are published, discussed in papers but not revealed, or just known about because of reputation or ongoing work, but kept under absolute or relative secrecy. The data are thus disaggregated components of an incipient network that is only as effective as the individual transactions that put it together. Openness and sharing are not ignored, but they are not necessarily dominant either. These values must compete with strategic considerations of self-interest, secrecy, and the logic of mutually beneficial exchange, particularly in areas of research in which commercial applications are more readily identifiable. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies, p. 8
  • 15. “Small” data collections may become “Big” (and more complex) by successive aggregation of sources…
  • 16. Linked Open Data (2009 and 2011). Courtesy of Tim Lebo (@timrdf), RPI, 20 Jun 2012. http://bit.ly/lebo-ipaw-2012
  • 17. “Data”? [technical definition] “…‘data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” -- Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)” Taken in this broadest possible sense, “data” are thus simply electronic coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.
  • 18. “The digital universe in 2007 — at 2.25 x 10^21 bits (281 exabytes or 281 billion gigabytes) — was 10% bigger than we thought. The resizing comes as a result of faster growth in cameras, digital TV shipments, and better understanding of information replication. “By 2011, the digital universe will be 10 times the size it was in 2006. “As forecast, the amount of information created, captured, or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home. “Fast-growing corners of the digital universe include those related to digital TV, surveillance cameras, Internet access in emerging countries, sensor-based applications, datacenters supporting “cloud computing,” and social networks. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011 -- Executive Summary. IDC Information and Data, March, 2008 http://www.emc.com/collateral/analyst-reports/diverse-exploding-idc-exec-summary.pdf
  • 19. “As you go down the Long Tail the signal-to-noise ratio gets worse. Thus the only way you can maintain a consistently good enough signal to find what you want is if your filters get increasingly powerful.” Chris Anderson “Is the Long Tail full of crap?” May 22, 2005 http://longtail.typepad.com/the_long_tail/2005/05/isnt_the_long_t.html
  • 20. “Data” [epistemic definition] “Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)” -- AnthroDPA Working Group on Metadata (May, 2009)
  • 21. Data Entropy: the risks of inaction and the urgency of action “…data longevity is increased. Comprehensive metadata counteract the natural tendency for data to degrade in information content through time (i.e. information entropy sensu Michener et al., 1997; Fig. 1).” W. K. Michener “Meta-information concepts for ecological data management” Ecological Informatics 1 (2006) 3-7
  • 22. Data Development: “Data Reduction - Processing Level Definitions” (an example) Report of the EOS Data Panel Vol IIA, NASA, 1986 (Tech Memorandum 87777) http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf
  • 23. T.C. Chamberlin
  • 24. Hypotheses and data as evidence: Inductive <--> Deductive feedback loops? “What science does is put forward hypotheses, and use them to make predictions, and test those predictions against empirical evidence. Then the scientists make judgments about which hypotheses are more likely, given the data. These judgments are notoriously hard to formalize, as Thomas Kuhn argued in great detail, and philosophers of science don’t have anything like a rigorous understanding of how such judgments are made. But that’s only a worry at the most severe levels of rigor; in rough outline, the procedure is pretty clear. Scientists like hypotheses that fit the data, of course, but they also like them to be consistent with other established ideas, to be unambiguous and well-defined, to be wide in scope, and most of all to be simple. The more things an hypothesis can explain on the basis of the fewer pieces of input, the happier scientists are.” -- Sean Carroll “Science and Religion are not Compatible” Discover Magazine June 23rd, 2009 8:01 AM http://blogs.discovermagazine.com/cosmicvariance/2009/06/23/science-and-religion-are-not-compatible/
  • 25. Full Life Cycle Management?
  • 26. US NSF “DataNet” Program “the full data preservation and access lifecycle” • “acquisition” • “documentation” • “protection” • “access” • “analysis and dissemination” • “migration” • “disposition” “Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601 US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information Science & Engineering http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm
  • 27. IWGDD = [US] “Interagency Working Group on Digital Data” http://www.nitrd.gov/about/harnessing_power_web.pdf
  • 28. IWGDD “DIGITAL DATA LIFE CYCLE” Exhibit B-2. Life Cycle Functions for Digital Data* • Plan -- Determine what data need to be created or collected to support a research agenda or a mission function -- Identify and evaluate existing sources of needed data -- Identify standards for data and metadata format and quality -- Specify actions and responsibilities for managing the data over their life cycle • Create -- Produce or acquire data for intended purposes -- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended purpose -- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports, publications • Keep -- Organize and store data to support intended purposes -- Integrate updates and additions into existing collections -- Ensure the data survive intact for as long as needed • Acquire and implement technology -- Refresh technology to overcome obsolescence and to improve performance -- Expand storage and processing capacity as needed -- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing data • Disposition -- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it -- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization suited to addressing other needs or opportunities http://www.nitrd.gov/about/harnessing_power_web.pdf
  • 29. www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
  • 30. “JISC DCC Curation Lifecycle Model” http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf
  • 31. Database Lifecycle Management “The Database Lifecycle Management covers the entire lifecycle of the databases, including: • Discovery and Inventory tracking: the ability to discover your assets, and track them • Initial provisioning, the ability to rollout databases in minutes • Ongoing Change Management, End-to-end management of patches, upgrades, schema and data changes • Configuration Management, track inventory, configuration drift and detailed configuration search • Compliance Management, reporting and management of industry and regulatory compliance standards • Site level Disaster Protection Automation” http://www.oracle.com/technetwork/oem/pdf/511949.pdf
  • 32. W. K. Michener “Meta-information concepts for ecological data management” Ecological Informatics 1 (2006) 3-7 http://tinyurl.com/d49f3vm
  • 33. “Sustainable data curation”
    “There are several main elements necessary to sustain data curation:
    • “Robust data storage facilities (hardware and software) that are capable of accurately handling data migration across generations of media.
    • “Backup plans, that are tested, so irreplaceable data are not at risk. Unintended data loss can occur for many reasons: some major causes are: poor stewardship leading to the loss of metadata to understand where the data is located and documentation to understand the content, physical facility and equipment failure (fire, flood, irrecoverable hardware crashes), accidental data overwrite or deletion.
    • “Science-educated staff with knowledge to match the data discipline is important for checking data integrity, choosing archive organization, creating adequate metadata, consulting with users, and designing access systems that meet user expectations. Staff responsible for stewardship and curation must understand the digital data content and potential scientific uses.”
    C. A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 10. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
  • 34. Sustainable data curation (cont.)
    • “Non-proprietary data formats that will ensure data access capability for many decades and will help avoid data losses resulting from software incompatibilities…
    • “Consistent staffing levels and people dedicated to best practices in archiving, access, and stewardship…
    • “National and International partnerships and interactions greatly aids in shared achievements for broad scale user benefits, e.g. reanalyses, TIGGE…
    • “Stable funding not focused on specific projects, but data management in general…”
    C. A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, pages 10-11. www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
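The call for backup plans “that are tested” and for “accurately handling data migration across generations of media” is commonly met in digital preservation practice with fixity checks: record a checksum when data are stored, then recompute and compare it after every copy, migration or periodic audit. A minimal sketch, assuming SHA-256 as the checksum and local files as the storage medium; the function names are illustrative and do not come from the cited paper.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity_check(source: Path, destination: Path) -> str:
    """Copy source to destination, then confirm the copy is bit-identical.

    Returns the checksum so it can be stored alongside the data's metadata.
    """
    expected = sha256_of(source)
    shutil.copy2(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"fixity check failed for {destination}: {actual} != {expected}")
    return expected


def verify_fixity(path: Path, expected: str) -> bool:
    """Periodic audit: recompute the checksum and compare it with the recorded value."""
    return sha256_of(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "observations.csv"
        original.write_text("station,temp_c\nA,12.5\n")
        backup = Path(tmp) / "observations.backup.csv"

        recorded = copy_with_fixity_check(original, backup)
        print(verify_fixity(backup, recorded))  # True

        backup.write_text("station,temp_c\nA,99.9\n")  # simulate silent corruption
        print(verify_fixity(backup, recorded))  # False
```

The second check is the point of “tested” backups: the corrupted copy still exists and opens normally, and only the recorded checksum reveals that its content has changed.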
  • 35. “Data Quality” ???
    In general colloquial terms, “Data Quality” is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public. “Quality” can be considered in terms of three primary values:
    • Validity: logical in terms of the intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)
    • Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation
    • Integrity: the maintenance of the original integrity of data, as well as the tracking and documenting of all recording, migration, transformations and sequences of transformation of data
  • 36. “…the ‘validation’ of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.” Tom Moritz, “The Burden of Proof” http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686
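One way to make the “sum integrity” of original data and of every sequence of transformation checkable is an append-only provenance log in which each entry records digests of a step's input and output and is chained to the hash of the previous entry. The sketch below is a hypothetical illustration, not a method described in the cited talk; it assumes a linear pipeline (each step consumes the previous step's output) and hashes a JSON encoding of each entry with SHA-256. Class and field names are invented for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Append-only log of transformations, each entry chained to the one before it."""

    entries: List[Dict[str, object]] = field(default_factory=list)

    @staticmethod
    def _entry_hash(entry: Dict[str, object]) -> str:
        # Hash every field except the stored hash itself, in a canonical key order.
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def record(self, operation: str, input_data: bytes, output_data: bytes) -> Dict[str, object]:
        entry: Dict[str, object] = {
            "step": len(self.entries),
            "operation": operation,
            "input_sha256": sha256_hex(input_data),
            "output_sha256": sha256_hex(output_data),
            "previous_entry_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
        }
        entry["entry_hash"] = self._entry_hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """True only if no entry was altered and each step consumed the prior step's output."""
        previous_hash = GENESIS
        previous_output = None
        for entry in self.entries:
            if entry["previous_entry_hash"] != previous_hash:
                return False  # chain broken: a preceding entry is missing, reordered or replaced
            if entry["entry_hash"] != self._entry_hash(entry):
                return False  # entry contents were edited after recording
            if previous_output is not None and entry["input_sha256"] != previous_output:
                return False  # lineage gap: input is not the previous step's output
            previous_hash = entry["entry_hash"]
            previous_output = entry["output_sha256"]
        return True


if __name__ == "__main__":
    raw = b"temp_f\n54.5\n"
    converted = b"temp_c\n12.5\n"
    annotated = b"station,temp_c\nA,12.5\n"

    log = ProvenanceLog()
    log.record("convert Fahrenheit to Celsius", raw, converted)
    log.record("add station identifier", converted, annotated)
    print(log.verify())  # True

    log.entries[0]["operation"] = "edited after the fact"
    print(log.verify())  # False
```

Chaining alone cannot detect removal of the most recent entries; in practice the latest `entry_hash` would also need to be stored or published somewhere independent of the log.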
  • 37. A Primary Goal of Open Government
    Public Access to Data that is:
    • Of High Quality (see previous discussion)
    • Free: no cost or minimal cost
    • Open: easily discoverable and accessible. “A piece of content or data is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share-alike.” (http://opendefinition.org/)
    • Effective / Useful / Usable: both technically usable and descriptively identified in ways that support ready analysis, citation, use, reuse…
    T. Moritz, “The Burden of Proof: Data as Evidence in Science and Public Policy,” Microsoft Research, GRDI2020, Stellenbosch, South Africa, Sept. 2010 http://www.grdi2020.eu/Pages/SelectedDocument.aspx?id_documento=87f1b6d5-5c30-42a7-94df-d9cd5f4b147c
  • 38. Thanks for your attention… Tom Moritz Tom Moritz Consultancy Los Angeles tom.moritz@gmail.com +1 310 963 0199 tommoritz (Skype) http://www.linkedin.com/in/tmoritz http://www.slideshare.net/Tom_Moritz
  • 40. Saturn images courtesy of R J Robbins and The Research Coordinating Network for the Genomics Standards Consortium…
  • 42. Rosalind Franklin’s Image
    “Franklin's B-form data, in conjunction with cylindrical Patterson map calculations that she had applied to her A-form data, allowed her to determine DNA's density, unit-cell size, and water content. With those data, Franklin proposed a double-helix structure with precise measurements for the diameter, the separation between each of the coaxial fibers along the fiber axis direction, and the pitch of the helix.[3]”
    “The diffraction photograph of the B form of DNA taken by Rosalind Franklin in May 1952 was by far the best photograph of its kind. Data derived from this photograph were instrumental in allowing James Watson and Francis Crick to construct their Nobel Prize-winning model for DNA.” (Courtesy of the Norman Collection on the History of Molecular Biology in Novato, Calif.) http://philosophyofscienceportal.blogspot.com/2008/04/rosalind-franklin-double-helix.html
  • 43. “Notebook entries show that Rosalind Franklin (a) recognized that the B form of DNA was likely to have a two-chained helix; (b) was aware of the Chargaff ratios; (c) knew that most, if not all, of the nitrogenous bases in DNA were in the keto configuration…; and (d) determined that the backbone chains of A-form DNA are antiparallel.” (Courtesy of Anne Sayre and Jenifer Franklin Glynn.) http://philosophyofscienceportal.blogspot.com/2008/04/rosalind-franklin-double-helix.html
  • 44. “Transcript of letter from James Watson to Max Delbruck March 12, 1953” “The basic structure is helical – it consists of two intertwining helices…” http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/corr/corr432.1-watson-delbruck-19530312-01-large.html

Editor's Notes

  1. Without systematic management of data, knowledge is at risk…
  2. All data go through processes of development. This 1986 NASA publication is still an excellent guide to the basics of scientific data management…
  3. Writing over 100 years ago, T. C. Chamberlin argued that, in structuring hypotheses, the method of multiple working hypotheses was superior…
  4. The “definition” phase of the data management cycle is often neglected. In some instances of “big science”, definitions for certain types of data are standard and fully specified. In other cases, it is assumed that the reason for a data type is obvious… Nevertheless, in planning for full life-cycle management, the data type definition (the basis for selecting a given type of data) is essential, even if given only a cursory explanation. For purposes of policy formation, wise planners will always ask how the given data type(s) compare with all other possible, or conceivable, data types.
  5. “Life-cycle” as a metaphor implies a dependent developmental sequence. In fact, effective data management is more complicated than a simple sequence. There are certainly dependent developmental sequences, but there are also sustained values that must be attended to throughout any cycle: for example, protecting the integrity of data so that no extraneous effects inadvertently deform them; rigorous record keeping; documenting provenance and the chain of custody/lineage of data; careful documentation of scientific workflow (expert competence of creators, apparatus, calibration, methodology); and recording the sequences of transformation, migration/emulation decisions and retention/disposition decisions…
  6. Data managers are usually not privy to the data definition process, so there is a natural tendency to treat data definition as a given and to begin full life-cycle management with “acquisition”. Nevertheless, managers should include in their descriptive tasks some explanation of the data type definition (an explanation that accounts for the selection of a given data type over and against the primary alternatives).
  7. The US IWGDD model is less than comprehensive and rather vague; it only broadly indicates the necessary processes…
  8. The accompanying text is more helpful but still not comprehensive…
  9. Schemata are very tempting, particularly given the “devil’s toolbox” provided by MS PowerPoint, but unless very rigorously employed, they can misrepresent or obfuscate…
  10. The text accompanying the DCC model is very helpful in differentiating “full life cycle” actions, “sequential actions” and “occasional actions”; the graphic is much less effective…
  11. This Oracle “model” focuses on “databases” – not on “data” per se…
  12. Michener’s chart from 2006 makes a better effort at suggesting constant elements and feedback loops…