Private Data and the Public Cloud
Jamie Taylor, Ph.D.
Your Data

•   Transactional data
•   Systems of record
•   Document management systems
•   Asset management systems
•   Collaborative data
Your Data

•   Regulatory data
•   Customer data
•   Supply chain data
•   Human resources data
•   Research and development data
Lots of Data
Think Critically About
    ...Your Data
Master Data
Entities that drive systems of record
  • customer numbers
  • service codes
  • warranty information
  • distribution information
  • partner information
  • products
  • item codes
  • suppliers
  • cost centers
  • department codes
  • company hierarchies
  • ...
Master Data



Where does master data end?
Master Data?
Is "master data" enough?



If not, can we make any data, master data?
How much of your data
directly affects the competitive
 advantage of the company?
Knowledge Organization Systems


  •   Thesauri
  •   Controlled vocabularies
  •   Taxonomies
  •   Ontologies
Core vs Context


     •   Core
         •   directly affects the competitive
             advantage of the company

     •   Context
         •   everything else
The resources that matter

Scarce Resources            Plentiful Resources
•   Time                    •   Money
•   Talent                  •   Computing
•   Management Attention    •   Service Providers
How much of your data
    ...is context?
What Data is Context?


•   Supplier data
•   Asset classification taxonomy
•   Geographic information
How much of your data operations
       ...are context?
Partner Synchronization
[Figure: blocks of binary digits representing data exchanged between partners]
Are you expending scarce
    resources on context data?


Can these be offset with plentiful resources?
Tim Berners-Lee’s
Giant Global Graph
Clouds
Anatomy of a Web2.0 Application
[Figure labels: Public, Mashup, User Generated Content, The XYZ, XYZ, Private]
Identifiers Everywhere
The power of external identifiers




            http://kiwitobes.com/industry_mashup/
[Figure: datasets linked through shared identifiers]
Entities (identifier):
  • Industry, USCB (NAICS)
  • Industry, SEC (SIC)
  • Company (Ticker)
  • Company (CRP ID)
  • Company (CIK)
  • Donations (CRP ID)
  • People (CIK)
  • Person (Wikipedia)
  • Location (ZIP Code)
  • Article
Link sources: CRP, NAICS/SIC Map, SEC, Freebase, Wikipedia
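
The mashup works because separate datasets carry the same external identifier for a company. A minimal sketch of that kind of join follows. Every record, value, and function name in it is a made-up placeholder for illustration, not data or code from the talk or from any of the sources named in the figure.

```python
# Hypothetical placeholder records. None of these values are real.
companies_by_ticker = {
    "EXMP": {"name": "Example Corp", "cik": "0000000001"},
}
filings_by_cik = {
    "0000000001": ["filing-a", "filing-b"],
}
people_by_cik = {
    "0000000001": ["Person One", "Person Two"],
}


def join_on_cik(ticker):
    """Follow the shared CIK identifier from one dataset into the others."""
    company = companies_by_ticker.get(ticker)
    if company is None:
        return None  # nothing to join on
    cik = company["cik"]
    return {
        "name": company["name"],
        "filings": filings_by_cik.get(cik, []),
        "people": people_by_cik.get(cik, []),
    }


result = join_on_cik("EXMP")
```

No dataset had to know about the others in advance. Agreeing on the identifier is the only coordination required.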
Web Scale Identifiers
  http://somewhere.com/there/is/a/thing


• Dereferenceable by anyone
  • confirms the entity being addressed
• Distributed construction
  • any publishing platform can mint IDs
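
Distributed construction means a publisher only needs a namespace it controls in order to mint identifiers. The sketch below shows one way that could look. The namespace `http://ids.example.com/entity`, the function names, and the slug rule (lowercase, underscores) are assumptions made for illustration, not a scheme described in the talk.

```python
import re
from urllib.parse import quote, urlsplit


def mint_identifier(namespace, label):
    """Build a URL identifier from a namespace and a human-readable label."""
    # Assumed slug rule: lowercase, runs of other characters become "_".
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    if not slug:
        raise ValueError("label has no usable characters")
    return namespace.rstrip("/") + "/" + quote(slug)


def namespace_of(identifier):
    """Return the scheme and host that minted the identifier."""
    parts = urlsplit(identifier)
    return parts.scheme + "://" + parts.netloc


minted = mint_identifier("http://ids.example.com/entity", "Imperial Sugar")
```

Because the identifier is an ordinary URL, anyone holding it can request it to confirm which entity is meant, provided the publisher actually serves a page at that address.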
The humble URI



     URIs



       URLs
The humble URL


Make your IDs usable




          URLs
Not all URLs are good identifiers
Web Scale Identifiers
http://somewhere.com/there/is/a/thing

                • Stable
                • Simple
                • Accessible

"Enormous synergies have gone unrealized because web
publishers have chosen to mint new namespaces rather than
add value to existing ones."

                                               -Jon Udell
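
The three properties above can be turned into rough automated checks. The sketch below is a heuristic only. The specific rules (no query string, no fragment, a length limit of 80 characters) are assumptions chosen for illustration, not criteria given in the talk.

```python
from urllib.parse import urlsplit


def identifier_problems(url, max_length=80):
    """Return a list of reasons a URL may be a poor identifier."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not accessible: needs an http(s) URL with a host")
    if parts.query:
        problems.append("not stable: query string may vary between requests")
    if parts.fragment:
        problems.append("not simple: fragment is not sent to the server")
    if len(url) > max_length:
        problems.append("not simple: longer than %d characters" % max_length)
    return problems


good = identifier_problems("http://somewhere.com/there/is/a/thing")
bad = identifier_problems("http://somewhere.com/page?sessionid=42#top")
```

An empty list does not prove that an identifier is good. Stability in particular depends on the publisher keeping the URL alive, which no syntactic check can confirm.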
Master Data
  Who defines master data entities?
Are they used in Excel spreadsheets?

                                  Priests of schema?
                                  Priests of Identifiers?



If "master data" represents the
entities that roll up data across the
enterprise, the identifiers for those
entities must scale enterprise-wide.
How to Mint "Master" Entities
If the entity exists externally:
 Use an external identifier for it
 (if an external identifier doesn't exist, add it to one of the
 public repositories!)

If the entity exists only internally:
 Add it to an internal publishing system

If the entity exists internally, but should be
used externally:
 Add it to one of the public repositories
 (or publish it publicly yourself)
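
The three rules above can be written as a small decision function. This is a sketch of the slide's logic only. The function name, parameter names, and returned wording are assumptions made for illustration.

```python
def minting_action(exists_externally, has_external_id=False,
                   should_be_public=False):
    """Pick where a "master" entity's identifier should come from."""
    if exists_externally:
        if has_external_id:
            return "use the external identifier"
        return "add the entity to a public repository, then use that identifier"
    if should_be_public:
        return "add the entity to a public repository, or publish it yourself"
    return "add the entity to an internal publishing system"


supplier = minting_action(exists_externally=True, has_external_id=True)
cost_center = minting_action(exists_externally=False)
```

In this reading a supplier, which exists outside the company, reuses an identifier someone else maintains, while a cost center stays in the internal system.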
Arrow's Information Paradox
 The value of information
 is not known until the
 information is known
      Kenneth Arrow, RAND 1959, P-1856-RC, p. 10

Is this the corollary?
 If I publish information
 publicly, I'll be giving
 away its value

 Photo: Linda A. Cicero / Stanford News Service
Are you expending scarce
    resources on context data?


Can these be offset with plentiful resources?
Use Time, Talent and Management Attention
       for the things that are Core.

     Use the public cloud for Context.




                    http://www.flickr.com/photos/jamescridland/613445810/
Architectural Notes
User Selectable Identifiers
http://www.freebase.com/view/en/imperial_sugar




                          http://www.freebase.com/docs/suggest
Hybrid Public/Private Architecture
Hybrid System Demo


           Barak Michener
http://www.freebase.com/view/en/barak_michener



 http://github.com/barakmich/jgd
?!
Additional Photo Credits
Warehouse: http://www.flickr.com/photos/m500/3505922925/
Shopping cart: http://commons.wikimedia.org/wiki/File:Stencil_shopping_cart.jpg
Factory Clipart: http://commons.wikimedia.org/wiki/File:P_industry.png
Org chart: http://www.flickr.com/photos/30975003@N06/3837106588/
Equation plot: http://www.flickr.com/photos/ethanhein/3402215606/
Clouds: http://www.flickr.com/photos/31288116@N02/3909268647/in/set-72157622338570130/
