1. The Grid Observatory Initiative develops a scientific view of the dynamics and usage of globalized IT systems by monitoring and analyzing the EGI grid.

The overall goal is to create a full-fledged digital curation process, with its four components: preservation, validation, indexation and knowledge building.

As the largest non-profit globalized system worldwide, with demanding scientific users, the EGI infrastructure is one of the most exciting artificial complex systems. Extensive monitoring facilities are already in place, so it offers an unprecedented opportunity to observe and understand the computing practices of the e-Science community.

Grid and cloud share a common paradigm: both are globalized at a large scale. The data collected and the knowledge built from analyzing EGI are therefore relevant to cloud modeling as well. Ongoing work integrates monitoring data from the StratusLab cloud.

The Grid Observatory is an open collaboration, keen to foster dialog and partnerships with others in the relevant areas of computer science and engineering. The Laboratoire de Recherche en Informatique and the Laboratoire de l'Accélérateur Linéaire, from CNRS and University Paris-Sud, along with Imperial College London, operate data production. The initiative is supported by France-Grilles, INRIA and CNRS.

A trove of experimental data: www.grid-observatory.org

The first role of the Grid Observatory is to preserve the monitoring data, which are normally discarded after operational usage, and to make them available to the wider scientific community.
Through its web portal, the Grid Observatory offers public access to a repository of grid traces for observing e-Science practice and infrastructure.
• EGI provides an accessible approximation of the current and future requirements of e-Science users.
• Grid status and middleware activity are recorded. These traces can be explored for a wide range of purposes, from operational usage (e.g. improving performance) to scientific research (e.g. testing classification methods for fault detection).

The Grid Observatory follows Tim Berners-Lee's recommendation of "Raw Data Now". It exemplifies the Big Data challenges: semantic organization, provenance, interoperability, and next-generation analytics. Emerging technologies such as Linked Open Data will be explored to further address those challenges.

The Green Computing Observatory

The Grid Observatory offers extensive traces of energy consumption. Green IT is becoming an increasingly urgent need, and no EGI tool existed for monitoring it. For these reasons this action has its own name: the Green Computing Observatory.
• The traces integrate motherboard-level monitoring with information on computing, networking, storage, and cooling.
• Acquisition exploits the de facto standards IPMI and Ganglia.
• Integration is based on an ontology of IT system measurements, including virtual machines, developed by the University of Picardie Jules Verne.

From applied to fundamental research

Research exploiting the monitoring data should demonstrate a verifiable and positive impact on production systems.
• Beyond-power-law and non-stationary behaviors are pervasive. With sequential testing, segmentation and adaptive on-line clustering, we advanced fault detection and parsimonious model building.
• Efficient autonomic policies must combine a priori knowledge with on-line adaptation, but reference interpretations are most often missing. Two approaches help to build intelligible representations: data-driven topic modeling in the spirit of text mining, and heterogeneous data integration with Statistical Relational Learning.
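The poster names sequential testing as one of the tools used for fault detection on non-stationary traces. It does not say which test. For illustration only, the sketch below uses Page's one-sided CUSUM, a classic sequential change detector, on a stream of per-interval counts. The function name `cusum_first_alarm`, its parameters and the failure counts are hypothetical and are not taken from the Grid Observatory work.

```python
from typing import Optional, Sequence


def cusum_first_alarm(samples: Sequence[float], target_mean: float,
                      slack: float, threshold: float) -> Optional[int]:
    """One-sided (upward) CUSUM test in the style of Page (1954).

    Illustrative sketch only; not the Grid Observatory's own method.
    Returns the index of the first sample at which the accumulated
    evidence of an upward mean shift exceeds `threshold`, or None.

    target_mean: in-control mean of the stream (assumed known here).
    slack:       allowance k; per-sample excesses below it are absorbed.
    threshold:   decision interval h.
    """
    statistic = 0.0
    for index, value in enumerate(samples):
        # Add this sample's excess over (mean + slack), clamping at zero so
        # that long in-control stretches do not mask a later shift.
        statistic = max(0.0, statistic + (value - target_mean - slack))
        if statistic > threshold:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical hourly job-failure counts for one grid site.
    failures_per_hour = [2, 3, 2, 1, 2, 3, 9, 10, 11, 9]
    print(cusum_first_alarm(failures_per_hour, target_mean=2.0,
                            slack=1.0, threshold=10.0))  # prints 7
```

Lowering `threshold` gives faster detection but more false alarms. For segmentation, one would typically restart the statistic after each alarm and treat the alarm points as candidate segment boundaries.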
2. Digital curation

The overall goal of the Grid Observatory is to create a full-fledged digital curation process, with its four components.

Establishing and developing a long-term repository of digital assets for current and future reference.
The Grid Observatory has been operating since October 2008. It continuously records and publishes various traces. An essential achievement is covering the complete scope of grid middleware and user activity, going beyond particular aspects such as job lifecycle or failure events. For instance, it also logs the Information System (BDII).

Providing digital asset search and retrieval facilities to scientific communities through a gateway.
The middleware traces are currently made available only in raw format, on a weekly basis. Much remains to be done toward a more semantic organization. The Green Computing Observatory data are organized along an XML schema associated with the measurement ontology. All are available through the Grid Observatory portal.

Tackling the issues of good data creation and management, and interoperability, through formal ontology building.
The Grid Observatory mostly builds on EGI and gLite monitoring. It therefore benefits from their collective effort in middleware development and EMI standardization. The Green Computing Observatory builds on IPMI and Ganglia. PDU (Power Distribution Unit) measurements make it possible to calibrate the IPMI measurements. The Green Computing Observatory participates in COST Action IC0804 (Energy efficiency in large scale distributed systems).
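The poster states that PDU measurements make IPMI calibration possible, but it does not describe the procedure. One simple approach is assumed here: fit a linear correction that maps IPMI power readings onto the PDU reference by ordinary least squares. The linear model, the function names and the paired readings are all hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_calibration(ipmi_watts: Sequence[float],
                           pdu_watts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of pdu ~= gain * ipmi + offset.

    Assumes a linear relation between the two sensors (an assumption,
    not a documented property of the Green Computing Observatory data).
    """
    n = len(ipmi_watts)
    if n != len(pdu_watts) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_x = sum(ipmi_watts) / n
    mean_y = sum(pdu_watts) / n
    sxx = sum((x - mean_x) ** 2 for x in ipmi_watts)
    if sxx == 0:
        raise ValueError("IPMI readings must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ipmi_watts, pdu_watts))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def calibrate(ipmi_reading: float, gain: float, offset: float) -> float:
    """Map a raw IPMI power reading onto the PDU reference scale."""
    return gain * ipmi_reading + offset


if __name__ == "__main__":
    # Hypothetical paired readings (watts) taken at the same instants.
    ipmi = [100.0, 150.0, 200.0, 250.0]
    pdu = [112.0, 167.0, 222.0, 277.0]
    gain, offset = fit_linear_calibration(ipmi, pdu)
    print(round(gain, 3), round(offset, 3))  # prints 1.1 2.0
    print(round(calibrate(300.0, gain, offset), 1))  # prints 332.0
```

A per-node fit like this keeps calibration cheap. If the sensors drift or behave non-linearly, a piecewise or periodically refitted model would be needed.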
Adding value to data by generating new sources of information and knowledge through semantic, statistical and Machine Learning based inference.
The general framework for the Grid Observatory is to turn it into a social intelligence system that pools scientific and engineering expertise. Its aims are to gradually build more integrated models of the European e-infrastructures, and to define and validate autonomic-oriented policies addressing their operational challenges.

More information:
• The Green Computing Observatory: a data curation approach for green IT. 9th IEEE Int. Conf. on Dependable, Autonomic and Secure Computing.
• The Grid Observatory. 11th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing.

Towards Linked Open Data ***
Data are accessible on the web through the portal. The only protection implemented is against malicious usage.
All formats are machine readable and open: ASCII, XML, SQL, LDIF.
RDF and Linked RDF are the next step.

Selected contributions from the Grid Observatory initiative and its users

Fault detection and diagnosis, smart probing
• Distributed Monitoring with Collaborative Prediction. 12th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing.
• Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming. 15th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining.
• Optimization of jobs submission on the EGEE production grid: modeling faults using workload. Journal of Grid Computing, 8(2).

Grid models
• Characterizing e-science file access behavior via latent Dirichlet allocation. 4th IEEE/ACM Int. Conf. on Utility and Cloud Computing.
• Towards non-stationary Grid models. Journal of Grid Computing, 9(4).

Autonomic Quality of Service and Green Computing
• Multiobjective reinforcement learning for responsive grids. Journal of Grid Computing, 8(3).
• Autonomic policy adaptation using decentralized online clustering. 7th IEEE/ACM Int. Conf. on Autonomic Computing.