Data Archiving and Networked Services



          Downscaling
      information systems
          for education
          Christophe Guéret (@cgueret)




DANS is an institute of KNAW and NWO
What do you mean by "Downscaling"?
● An alternative to scaling platforms up or out when
  the cost of doing so becomes too high

● "Cost" in the broad sense
  ○   Loss of expressivity via harmonization
  ○   Loss of privacy
  ○   Loss of consistency / incompatible semantics
  ○   Hardware costs
  ○   Infrastructural costs
  ○   Cultural incompatibility
  ○   ...
Up / out scaling
● Get more and more data into one system
● Scale vertically (up) or horizontally (out)
Down scaling
● Instead of one big (cluster) system, use a
  swarm of smaller systems
● Aim at highest meaningful granularity
Two downscaled systems for education
● Information system for researchers wanting to
  study worldwide academic activity




● ~2M young learners willing to go social with
  digital media, but without Internet
Diversity-aware publication of the
activity of research institutions
Context
● Millions of researchers are active worldwide

● This represents a lot of information about
   ○   Positions
   ○   Teaching activities
   ○   Equipment
   ○   Discoveries
   ○   ...


● Potentially high value in sharing and mining
  all that data
Problems
● Lots of name-centric, thus highly ambiguous,
  data sets

● Different conceptual spaces

● Different positions that do not always
  correspond
   ○ "Maître de conférences" ~ "Universitair Docent" ~
     "Assistant professor" ?
Towards THE information system (?)
● Try to be the "Facebook for researchers"
● Eventual focus on sub-parts of the data
or THE ontology (?)
● Focus on the terminology, allow for different
  data stacks (including non-Web-based ones)
Users end up with a tough choice
● Do you prefer too specific or too generic?




● Workaround: round-tripping between formats
But data does not travel well...
● Publications by Frank van Harmelen
● The number found decreases from system to
  system: 148, then 38, then 13
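To make the loss concrete, the three counts on this slide (148, 38 and 13) can be turned into retention rates. This is only an arithmetic sketch: the slide does not say which systems the counts come from, so the code treats them as an ordered list and the function name is illustrative.

```python
# Publication counts as listed on the slide, in the order given.
counts = [148, 38, 13]


def retention(values):
    """Return each count as a fraction of the first count in the list."""
    first = values[0]
    return [value / first for value in values]


for count, kept in zip(counts, retention(counts)):
    # One line per system: the raw count and its share of the first count.
    print(f"{count:>4} publications, {kept:.1%} of the first count")
```

Under this reading, roughly a quarter of the publications are still visible in the second system and fewer than one in ten in the third.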
Downscaling RIS
● Some of the high costs of harmonization
  ○   Large ontologies are hard to design
  ○   Trade-off between coverage and expressivity
  ○   Large amount of data
  ○   Lack of incentives to update a single platform, plus
      branding and reporting issues working against it


● Alternative
  ○ Rely on a data ecosystem
  ○ Use several, layered ontologies
A research information ecosystem
Core ontology + national extensions
● Global-scale insights and low-level details




● Take advantage of reasoning
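A minimal sketch of how a core ontology plus national extensions could give both views at once. All class and person identifiers below are hypothetical (the "core:", "fr:", "nl:" and "uk:" prefixes do not refer to a real vocabulary), and the reasoning is reduced to following subclass links in plain Python.

```python
# Hypothetical hierarchy: national terms specialise a shared core term,
# without being declared equivalent to each other.
SUBCLASS_OF = {
    "fr:MaitreDeConferences": "core:TeachingResearcher",
    "nl:UniversitairDocent": "core:TeachingResearcher",
    "uk:AssistantProfessor": "core:TeachingResearcher",
    "core:TeachingResearcher": "core:Researcher",
}

# Locally published data keeps the precise national term.
POSITIONS = {
    "person:a": "fr:MaitreDeConferences",
    "person:b": "nl:UniversitairDocent",
    "person:c": "uk:AssistantProfessor",
}


def ancestors(cls):
    """Return cls followed by every class it specialises."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def instances_of(cls):
    """Return the people whose position is cls or a subclass of cls."""
    return sorted(p for p, c in POSITIONS.items() if cls in ancestors(c))


print(instances_of("core:Researcher"))        # global-scale view
print(instances_of("nl:UniversitairDocent"))  # low-level detail
```

Querying the core class returns everyone, while querying a national class returns only the people described with that exact term, so no publisher has to give up its own terminology.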
Cloud-less social interaction
Context
● XO laptop given to 2M kids aged 6-12
● Low-end hardware (~ old smartphone)
● Educational software based on constructivism
● Communication via mesh networking
The "Sugar" environment
Activities in Sugar
● A Sugar activity combines the concepts of
  “document” and “application” into a single
  object

● Activities can be easily shared between
  neighbouring computers

● Activity instances are associated with the
  document they let the user work on
Sharing activities
Journal of activity usage
Limitations of current data stack
● Data sharing limited to synchronous
  interaction

● Data isolated in silos

● No remote access to data created within a
  Sugar instance

● Social activity bounded by the classroom
Let's improve this, the Web 2.0 way!
● Create a central server on the Cloud and
  define an API
● Create activities interacting via the API
● Add a Web frontend for authentication and
  adjust ACLs for the API


Won't work because...
● Lack of a stable, cheap connection to the Internet

● Lack of relevant content on the Web to
  justify getting a connection

● Issues with having kids on social networks

● (Besides, potentially hard to find a business
  model for sharing kids' work)
Downscaled alternative
● Turn every XO into a self-contained data
  publisher/consumer

● Apply Linked Data principles to achieve
  decentralised data integration
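The idea can be sketched as follows, assuming each laptop keeps its own set of subject/predicate/object statements and reads those of whichever peers it can currently reach. The `Node` class, the URI and the "ex:" predicates are hypothetical and only illustrate the principle; this is not the actual Sugar or XO data stack.

```python
class Node:
    """A device that stores its own triples and can read those of its peers."""

    def __init__(self, name):
        self.name = name
        self.triples = set()  # (subject, predicate, object) tuples

    def publish(self, subject, predicate, obj):
        """Record a statement locally; no central server is involved."""
        self.triples.add((subject, predicate, obj))

    def describe(self, subject, peers=()):
        """Merge what this node and the given peers say about a subject."""
        merged = set()
        for node in (self, *peers):
            merged |= {t for t in node.triples if t[0] == subject}
        return merged


# Hypothetical identifiers, for illustration only.
alice = Node("xo-alice")
bob = Node("xo-bob")
drawing = "http://xo-alice.local/drawing/1"

alice.publish(drawing, "ex:title", "My house")
bob.publish(drawing, "ex:comment", "Nice colours")

# Because both laptops use the same identifier for the drawing, their
# statements combine without any prior agreement on a shared database.
for triple in sorted(alice.describe(drawing, peers=[bob])):
    print(triple)
```

Integration happens at read time and only over the peers that are reachable, which matches a setting where connectivity is local and intermittent.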
More information
● Collection of presentations about this and
  other topics
  ○ http://www.slideshare.net/cgueret


● Blog about making data sharing a reality for
  everyone
  ○ https://worldwidesemanticweb.wordpress.com/


● christophe.gueret@dans.knaw.nl
