SlideShare a Scribd company logo
1 of 38
Download to read offline
Oxford's DAMS Architecture


           Ben O'Steen
        Software Engineer,
 Oxford University Library Services

          Neil Jefferies
      R&D Project Manager
 Oxford University Library Services
'An institutional repository needs
  to be a service with continuity
behind it........Institutions need to
 recognize that they are making
commitments for the long term.'
          Clifford Lynch, 2004
Continuity and Services


We anticipate that the content
 our systems will hold or even
   hold now will outlive any
  software, hardware or even
people involved in its creation.
Continuity and Services




What content are we anticipating to
     store for the long-term?
Continuity and Services
    “Bookshelves for the 21st Century”
●


        Legal Deposit
    ●


        Voluntary Deposit
    ●


        Digitisation
    ●


        Extended Remit - Research Data
    ●


        Administrative data – researchers, projects, places,
    ●

        dates
        And anything else deemed worthy.
    ●
Continuity and Services




  What core principles do you
 design a system with when you
know the content will outlive you?
Continuity and Services
    Oxford DAMS – Digital Asset Management
●

    System
    – Not a single piece of software, more
      a set of guiding principles and aims.
Continuity and Services



      #1 Keep it simple


– simple is maintainable, easy to
           understand.
Continuity and Services
    #1 Keep it simple – simple is maintainable,
●

    easy to understand.
        Think about how much we have achieved with the
    –
        simple concept of using a single command on a
        simple protocol (HTTP GET) to get an HTML
        document. (HTML being a loose, ill-constrained set
        of elements.)

                       “GET / HTTP1.1”
Continuity and Services
    #1 Keep it simple – simple is maintainable,
●

    easy to understand.
        simple protocols such as HTTP, simple formats
    –
        such as JSON and API patterns such as REST are
        heavily favoured
        Remember, at some point, administration and
    –
        development of the system will need to be handed
        over.
Continuity and Services




#2 Everything but the content is
          replaceable
Continuity and Services
    #2 Everything but the content is replaceable
●


        The content that is being held by the system at any
    –
        one time is what is important, not the surrounding
        infrastructure
        The content should survive and be reusable in the
    –
        event that services like databases, search indexes,
        etc crash or corrupt.
Continuity and Services
              st
             1 logical outcome:


 Do not use complex packages to hold
             your content!

 Minimise the amount of software and work
that you need to maintain to understand how
           your content is stored.
Continuity and Services
    1st logical outcome:
●



    – Do   not use complex packages to
        hold your content!
        The basic, low-level digital object in our
    –
        system is a “bag”
         ● Each bag has a manifest, currently in

           either FOXML or RDF, which serves to
           identify the component parts and their
           inter-relationships.
Continuity and Services



 #3 The services in the system are
replaceable and can be rebuilt from
           the content.
Continuity and Services
    #3 The services in the system are replaceable
●

    and can be rebuilt from the content.
        Services such as:
    –

             Search indexes (eg via Lucene/Solr)
         ●


             Databases and RDF triplestores (MySQL, Mulgara, etc)
         ●


             XML databases (eXist db, etc)
         ●


             Object management middleware (Fedora)
         ●


             Object registries
         ●
Continuity and Services
    2nd logical outcome:
●



    –A     content store must be able to tell
        services when items have changed,
        been created, or deleted, and what
        items an individual store holds.
        We need slightly smarter storage to do this over a network.
    –
        Local filesystems have had this paradigm for a long time
        now, but we need to migrate it to the network level.
Continuity and Services
    2nd logical outcome:
●


        A content store must be able to tell services
    –
        when items have changed, been created, or
        deleted, and what items an individual store
        holds.
        Each store and set of services needs a way to pass
    –
        messages in a transactional way, a messaging system.
        [Tech note: we are moving from the JMS to use the new
    –
        AMQP via Rabbit MQ because on balance it is much more
        practical, flexible and useful.]
Continuity and Services
    3rd logical outcome:
●



    – The  storage must store the
      information and documentation for
      the standards and conventions its
      content follows.
    – The storage must be able to
      describe itself.
Continuity and Services



#4 We must lower our dependence on
 any one node in the system – be it
hardware, software or even a person.
Continuity and Services
    #4 We must lower our dependence on any one
●

    node in the system – be it hardware, software
    or even a person.
        This is another reason for #1 – keeping things
    –
        simple.
             Lowers the cost of hardware replacement and upgrade –
         ●

             simple hardware tends to be cheaper and more readily
             available; lowering the number of crucial features
             broadens what can be used.
             Software – Oxford has been around for 1000 years.
         ●

             Vendors will not be.
             People – a simpler system is more readily understood by
         ●

             new/replacement engineers.
Challenges to maintaining
                 Continuity
    Flexibility
●


    Scalability
●


    Longevity
●


    Availability
●


    Sustainability
●


    Interoperability
●
Challenges to maintaining
                   Continuity
    Flexibility
●


        Open standards and open APIs allow us to provide
    ●

        the tools people need to create the experience with
        the content they want.
        'Text' based metadata – XML, RDF, RDFa, JSON
    ●


        Componentised web services providing open APIs –
    ●

        flexibility through dynamically selecting what services
        run at anytime.
        Open Source
    ●
Challenges to maintaining
                   Continuity
    Scalability
●


        We need to anticipate exponential growth in demand,
    ●

        with the knowledge that storage will be longterm
        A rough estimate of all the data produced in the world
    ●

        every second is 7km of CDROMs – broadcast video,
        big science, CCTVs, cameras, research, etc. This
        figure is only ever going to go up.
        To scale like the web, we have to be like the web; not
    ●

        one single black box and workflow, but a distributed
        net of storage and services, simply interlinked.
        The “Billion file [inode] problem” must be avoided
    ●
Challenges to maintaining
                        Continuity
    Longevity
●


        Live Replicas (Backup Problems)
    ●


            MAID (Power)
        ●



        Self-healing Systems (Resilience)
    ●


        Simplicity of Interfaces
    ●


        Avoid 3rd Party dependence
    ●


        Support Heterogeneity
    ●


                Resolvers/Abstraction Layers
            ●
Challenges to maintaining
                      Continuity
    Availability
●


        Basic IT availability
    ●


        Enhanced long-term availability
    ●


        Archival recoverability
    ●


        Digital Preservation
    ●


            Conversion
        ●


            Emulation
        ●


            Archive Preservation
        ●
Challenges to maintaining
                   Continuity
    Sustainability
●


        Budget and cost as a conventional library
    ●


        Factor archival costs into projects
    ●


        Leverage content to generate income
    ●


        Migrate skills
    ●
Challenges to maintaining
                      Continuity
    Interoperability
●



        Interoperability is an ongoing process
    ●

            Support for emerging and established standards
        ●



        Persistent, stable, well-defined interfaces
    ●


        Ideally implement interfaces bidirectionally
    ●


        Avoid low-level interfaces – abstract as much as
    ●

        feasible
        Embrace the web – if you do it right, people will
    ●

        reuse your content in ways you never thought of and
        never thought possible.
Phase 1 – current hardware
                            Public

        VMWare ESX
           SATA
MDICS




            VM Image
            Store (2TB)




         HC 1        HC 2
Phase 1 – current hardware
                            Public

        VMWare ESX
           SATA
MDICS




            VM Image
            Store (2TB)




                              Content layer
         HC 1        HC 2
Phase 1 – current hardware
                             Public

        VMWare ESX
           SATA
MDICS




            VM Image
                            Service storage layer
            Store (2TB)




         HC 1        HC 2
Phase 1 – current hardware
                             Public

        VMWare ESX

                            Service execution layer
           SATA
MDICS




            VM Image
            Store (2TB)




         HC 1        HC 2
Phase 2 – (expecting hardware
             delivery this month)
                                Public

        VMWare ESX                            VMWare ESX


                                Private
           NFS




                                                            NFS
                                Storage
                      ZFS




                                                  ZFS
MDICS




            VM Image                              VM Image
                            ZFS Replication
            Store (2TB)                           Store (2TB)

                               Archive


         HC 1        HC 2
Aim – multisite (add sites as
             required to scale up)
                                Public
RSL                                           Osney?

        VMWare ESX                              VMWare ESX




                                                                        Sun Rays
                                    Private
           NFS




                                                                 NFS
                                Storage
                      ZFS




                                                       ZFS




                                                                        FutureArch
MDICS




            VM Image                                   VM Image
                            ZFS Replication
            Store (2TB)                                Store (2TB)

                                Archive

                             Split Writes &     Thumper       Thumper
         HC 1        HC 2
                             Crosschecks           1             2
Backend
services (blue)
   fit over the
content storage
 in 'stacks' and
      provide
  capability to
 other services
      (green)
Columns of services over common
     content/storage layer
ORA




                   Databank



         FMO
                              ...



           Storage layer
End of Presentation




 Any Questions?
Ben O'Steen
benjamin.osteen@ouls.ox.ac.uk

         Neil Jefferies
 neil.jefferies@sers.ox.ac.uk
      www.sers.ox.ac.uk

  Oxford Research Archive
   http://ora.ouls.ox.ac.uk

   Developer's Blog
http://oxfordrepo.blogspot.com

More Related Content

What's hot

The DDS Tutorial - Part I
The DDS Tutorial - Part IThe DDS Tutorial - Part I
The DDS Tutorial - Part IAngelo Corsaro
 
Classical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSClassical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSAngelo Corsaro
 
Authenticated Key Exchange Protocols for Parallel Network File Systems
Authenticated Key Exchange Protocols for Parallel Network File SystemsAuthenticated Key Exchange Protocols for Parallel Network File Systems
Authenticated Key Exchange Protocols for Parallel Network File Systems1crore projects
 
OMG DDS: The Data Distribution Service for Real-Time Systems
OMG DDS: The Data Distribution Service for Real-Time SystemsOMG DDS: The Data Distribution Service for Real-Time Systems
OMG DDS: The Data Distribution Service for Real-Time SystemsAngelo Corsaro
 
OpenSplice DDS Tutorial -- Part II
OpenSplice DDS Tutorial -- Part IIOpenSplice DDS Tutorial -- Part II
OpenSplice DDS Tutorial -- Part IIAngelo Corsaro
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeSumant Tambe
 
Standardizing the Data Distribution Service (DDS) API for Modern C++
Standardizing the Data Distribution Service (DDS) API for Modern C++Standardizing the Data Distribution Service (DDS) API for Modern C++
Standardizing the Data Distribution Service (DDS) API for Modern C++Sumant Tambe
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...Niraj Tolia
 
Advanced OpenSplice Programming - Part I
Advanced OpenSplice Programming - Part IAdvanced OpenSplice Programming - Part I
Advanced OpenSplice Programming - Part IAngelo Corsaro
 
A Gentle Introduction to OpenSplice DDS
A Gentle Introduction to OpenSplice DDSA Gentle Introduction to OpenSplice DDS
A Gentle Introduction to OpenSplice DDSAngelo Corsaro
 
High Performance Distributed Computing with DDS and Scala
High Performance Distributed Computing with DDS and ScalaHigh Performance Distributed Computing with DDS and Scala
High Performance Distributed Computing with DDS and ScalaAngelo Corsaro
 
From IaaS to PaaS to Docker Networking to … Cloud Networking Scalability
From IaaS to PaaS to Docker Networking to … Cloud Networking ScalabilityFrom IaaS to PaaS to Docker Networking to … Cloud Networking Scalability
From IaaS to PaaS to Docker Networking to … Cloud Networking ScalabilityDaoliCloud Ltd
 
DDS for JMS Programmers
DDS for JMS ProgrammersDDS for JMS Programmers
DDS for JMS ProgrammersAngelo Corsaro
 
a hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorizeda hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorizedlogicsystemsprojects
 
DDS Interoperability Demo 2013 (Washington DC)
DDS Interoperability Demo 2013 (Washington DC)DDS Interoperability Demo 2013 (Washington DC)
DDS Interoperability Demo 2013 (Washington DC)Gerardo Pardo-Castellote
 
Getting Started in DDS with C++ and Java
Getting Started in DDS with C++ and JavaGetting Started in DDS with C++ and Java
Getting Started in DDS with C++ and JavaAngelo Corsaro
 

What's hot (20)

The DDS Tutorial - Part I
The DDS Tutorial - Part IThe DDS Tutorial - Part I
The DDS Tutorial - Part I
 
Classical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDSClassical Distributed Algorithms with DDS
Classical Distributed Algorithms with DDS
 
Authenticated Key Exchange Protocols for Parallel Network File Systems
Authenticated Key Exchange Protocols for Parallel Network File SystemsAuthenticated Key Exchange Protocols for Parallel Network File Systems
Authenticated Key Exchange Protocols for Parallel Network File Systems
 
Lonza
Lonza Lonza
Lonza
 
OMG DDS: The Data Distribution Service for Real-Time Systems
OMG DDS: The Data Distribution Service for Real-Time SystemsOMG DDS: The Data Distribution Service for Real-Time Systems
OMG DDS: The Data Distribution Service for Real-Time Systems
 
Cont0519
Cont0519Cont0519
Cont0519
 
OpenSplice DDS Tutorial -- Part II
OpenSplice DDS Tutorial -- Part IIOpenSplice DDS Tutorial -- Part II
OpenSplice DDS Tutorial -- Part II
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/Subscribe
 
DDS QoS Unleashed
DDS QoS UnleashedDDS QoS Unleashed
DDS QoS Unleashed
 
Standardizing the Data Distribution Service (DDS) API for Modern C++
Standardizing the Data Distribution Service (DDS) API for Modern C++Standardizing the Data Distribution Service (DDS) API for Modern C++
Standardizing the Data Distribution Service (DDS) API for Modern C++
 
Restfs
RestfsRestfs
Restfs
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
 
Advanced OpenSplice Programming - Part I
Advanced OpenSplice Programming - Part IAdvanced OpenSplice Programming - Part I
Advanced OpenSplice Programming - Part I
 
A Gentle Introduction to OpenSplice DDS
A Gentle Introduction to OpenSplice DDSA Gentle Introduction to OpenSplice DDS
A Gentle Introduction to OpenSplice DDS
 
High Performance Distributed Computing with DDS and Scala
High Performance Distributed Computing with DDS and ScalaHigh Performance Distributed Computing with DDS and Scala
High Performance Distributed Computing with DDS and Scala
 
From IaaS to PaaS to Docker Networking to … Cloud Networking Scalability
From IaaS to PaaS to Docker Networking to … Cloud Networking ScalabilityFrom IaaS to PaaS to Docker Networking to … Cloud Networking Scalability
From IaaS to PaaS to Docker Networking to … Cloud Networking Scalability
 
DDS for JMS Programmers
DDS for JMS ProgrammersDDS for JMS Programmers
DDS for JMS Programmers
 
a hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorizeda hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorized
 
DDS Interoperability Demo 2013 (Washington DC)
DDS Interoperability Demo 2013 (Washington DC)DDS Interoperability Demo 2013 (Washington DC)
DDS Interoperability Demo 2013 (Washington DC)
 
Getting Started in DDS with C++ and Java
Getting Started in DDS with C++ and JavaGetting Started in DDS with C++ and Java
Getting Started in DDS with C++ and Java
 

Viewers also liked

Bj Cv Interactive Menu5
Bj Cv Interactive Menu5Bj Cv Interactive Menu5
Bj Cv Interactive Menu5Barry Jones
 
The beauty of virtualization
The beauty of virtualizationThe beauty of virtualization
The beauty of virtualizationVivek Juneja
 
Minicom introduction
Minicom introductionMinicom introduction
Minicom introductionelisasson
 
Minicom introduction
Minicom introductionMinicom introduction
Minicom introductionelisasson
 
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...elisasson
 
Bj Cv With Interactive Menu
Bj Cv With Interactive MenuBj Cv With Interactive Menu
Bj Cv With Interactive MenuBarry Jones
 
Don't Just Present, Enchant !
Don't Just Present, Enchant !Don't Just Present, Enchant !
Don't Just Present, Enchant !Vivek Juneja
 
Humanize the android
Humanize the androidHumanize the android
Humanize the androidVivek Juneja
 
רב תרבותיות באמנות הישראלית נעמה ששון מצגת
רב תרבותיות באמנות הישראלית   נעמה ששון    מצגתרב תרבותיות באמנות הישראלית   נעמה ששון    מצגת
רב תרבותיות באמנות הישראלית נעמה ששון מצגתelisasson
 
赢在执行
赢在执行赢在执行
赢在执行lei zhou
 
青團隊組織及招募
青團隊組織及招募青團隊組織及招募
青團隊組織及招募minwei
 
Progetto mappatura CSO per Banca Cassa di Risparmio di Firenze
Progetto mappatura CSO per Banca Cassa di Risparmio di FirenzeProgetto mappatura CSO per Banca Cassa di Risparmio di Firenze
Progetto mappatura CSO per Banca Cassa di Risparmio di FirenzeBusiness Resiliente
 
Sus Clubes Y Argentina
Sus Clubes Y ArgentinaSus Clubes Y Argentina
Sus Clubes Y Argentinaguest3c126ab
 
Sus Clubes Y Argentina
Sus Clubes Y ArgentinaSus Clubes Y Argentina
Sus Clubes Y Argentinaguest3c126ab
 

Viewers also liked (16)

Bj Cv Interactive Menu5
Bj Cv Interactive Menu5Bj Cv Interactive Menu5
Bj Cv Interactive Menu5
 
The beauty of virtualization
The beauty of virtualizationThe beauty of virtualization
The beauty of virtualization
 
Minicom introduction
Minicom introductionMinicom introduction
Minicom introduction
 
Minicom introduction
Minicom introductionMinicom introduction
Minicom introduction
 
Week 3 relative clause
Week 3 relative clauseWeek 3 relative clause
Week 3 relative clause
 
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...
Minicom White Paper Using Ram To Increase Security And Improve Efficiency In ...
 
Bj Cv With Interactive Menu
Bj Cv With Interactive MenuBj Cv With Interactive Menu
Bj Cv With Interactive Menu
 
Don't Just Present, Enchant !
Don't Just Present, Enchant !Don't Just Present, Enchant !
Don't Just Present, Enchant !
 
Humanize the android
Humanize the androidHumanize the android
Humanize the android
 
רב תרבותיות באמנות הישראלית נעמה ששון מצגת
רב תרבותיות באמנות הישראלית   נעמה ששון    מצגתרב תרבותיות באמנות הישראלית   נעמה ששון    מצגת
רב תרבותיות באמנות הישראלית נעמה ששון מצגת
 
赢在执行
赢在执行赢在执行
赢在执行
 
青團隊組織及招募
青團隊組織及招募青團隊組織及招募
青團隊組織及招募
 
Progetto mappatura CSO per Banca Cassa di Risparmio di Firenze
Progetto mappatura CSO per Banca Cassa di Risparmio di FirenzeProgetto mappatura CSO per Banca Cassa di Risparmio di Firenze
Progetto mappatura CSO per Banca Cassa di Risparmio di Firenze
 
Renovys - Sintesi Business Plan
Renovys - Sintesi Business PlanRenovys - Sintesi Business Plan
Renovys - Sintesi Business Plan
 
Sus Clubes Y Argentina
Sus Clubes Y ArgentinaSus Clubes Y Argentina
Sus Clubes Y Argentina
 
Sus Clubes Y Argentina
Sus Clubes Y ArgentinaSus Clubes Y Argentina
Sus Clubes Y Argentina
 

Similar to Digital Kniznica 09 Ben Osteen Oxford

Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
 
Sdwan webinar
Sdwan webinarSdwan webinar
Sdwan webinarpmohapat
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagedbpublications
 
Skillwise Consulting -Technical competency
Skillwise Consulting -Technical competencySkillwise Consulting -Technical competency
Skillwise Consulting -Technical competencySkillwise Consulting
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDr Ganesh Iyer
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayJohn Kunze
 
What is Your Edge From the Cloud to the Edge, Extending Your Reach
What is Your Edge From the Cloud to the Edge, Extending Your ReachWhat is Your Edge From the Cloud to the Edge, Extending Your Reach
What is Your Edge From the Cloud to the Edge, Extending Your ReachSUSE
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SAMeh Zaghloul
 

Similar to Digital Kniznica 09 Ben Osteen Oxford (20)

Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
Distributed Systems in Data Engineering
Distributed Systems in Data EngineeringDistributed Systems in Data Engineering
Distributed Systems in Data Engineering
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
As34269277
As34269277As34269277
As34269277
 
Sdwan webinar
Sdwan webinarSdwan webinar
Sdwan webinar
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
 
Skillwise Consulting -Technical competency
Skillwise Consulting -Technical competencySkillwise Consulting -Technical competency
Skillwise Consulting -Technical competency
 
kumarResume
kumarResumekumarResume
kumarResume
 
Techmeeting-17feb2016
Techmeeting-17feb2016Techmeeting-17feb2016
Techmeeting-17feb2016
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do Today
 
What is Your Edge From the Cloud to the Edge, Extending Your Reach
What is Your Edge From the Cloud to the Edge, Extending Your ReachWhat is Your Edge From the Cloud to the Edge, Extending Your Reach
What is Your Edge From the Cloud to the Edge, Extending Your Reach
 
Technical Skillwise
Technical SkillwiseTechnical Skillwise
Technical Skillwise
 
Convergence Best Poster Award
Convergence Best Poster AwardConvergence Best Poster Award
Convergence Best Poster Award
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014
 
Microservices Architecture
Microservices ArchitectureMicroservices Architecture
Microservices Architecture
 

Recently uploaded

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Digital Kniznica 09 Ben Osteen Oxford

  • 1. Oxford's DAMS Architecture Ben O'Steen Software Engineer, Oxford University Library Services Neil Jefferies R&D Project Manager Oxford University Library Services
  • 2. 'An institutional repository needs to be a service with continuity behind it........Institutions need to recognize that they are making commitments for the long term.' Clifford Lynch, 2004
  • 3. Continuity and Services We anticipate that the content our systems will hold or even hold now will outlive any software, hardware or even people involved in its creation.
  • 4. Continuity and Services What content are we anticipating to store for the long-term?
  • 5. Continuity and Services “Bookshelves for the 21st Century” ● Legal Deposit ● Voluntary Deposit ● Digitisation ● Extended Remit - Research Data ● Administrative data – researchers, projects, places, ● dates And anything else deemed worthy. ●
  • 6. Continuity and Services What core principles do you design a system with when you know the content will outlive you?
  • 7. Continuity and Services Oxford DAMS – Digital Asset Management ● System – Not a single piece of software, more a set of guiding principles and aims.
  • 8. Continuity and Services #1 Keep it simple – simple is maintainable, easy to understand.
  • 9. Continuity and Services #1 Keep it simple – simple is maintainable, ● easy to understand. Think about how much we have achieved with the – simple concept of using a single command on a simple protocol (HTTP GET) to get an HTML document. (HTML being a loose, ill-constrained set of elements.) “GET / HTTP1.1”
  • 10. Continuity and Services #1 Keep it simple – simple is maintainable, ● easy to understand. simple protocols such as HTTP, simple formats – such as JSON and API patterns such as REST are heavily favoured Remember, at some point, administration and – development of the system will need to be handed over.
  • 11. Continuity and Services #2 Everything but the content is replaceable
  • 12. Continuity and Services #2 Everything but the content is replaceable ● The content that is being held by the system at any – one time is what is important, not the surrounding infrastructure The content should survive and be reusable in the – event that services like databases, search indexes, etc crash or corrupt.
  • 13. Continuity and Services st 1 logical outcome: Do not use complex packages to hold your content! Minimise the amount of software and work that you need to maintain to understand how your content is stored.
  • 14. Continuity and Services 1st logical outcome: ● – Do not use complex packages to hold your content! The basic, low-level digital object in our – system is a “bag” ● Each bag has a manifest, currently in either FOXML or RDF, which serves to identify the component parts and their inter-relationships.
  • 15. Continuity and Services #3 The services in the system are replaceable and can be rebuilt from the content.
  • 16. Continuity and Services #3 The services in the system are replaceable ● and can be rebuilt from the content. Services such as: – Search indexes (eg via Lucene/Solr) ● Databases and RDF triplestores (MySQL, Mulgara, etc) ● XML databases (eXist db, etc) ● Object management middleware (Fedora) ● Object registries ●
  • 17. Continuity and Services 2nd logical outcome: ● –A content store must be able to tell services when items have changed, been created, or deleted, and what items an individual store holds. We need slightly smarter storage to do this over a network. – Local filesystems have had this paradigm for a long time now, but we need to migrate it to the network level.
  • 18. Continuity and Services 2nd logical outcome: ● A content store must be able to tell services – when items have changed, been created, or deleted, and what items an individual store holds. Each store and set of services needs a way to pass – messages in a transactional way, a messaging system. [Tech note: we are moving from the JMS to use the new – AMQP via Rabbit MQ because on balance it is much more practical, flexible and useful.]
  • 19. Continuity and Services 3rd logical outcome: ● – The storage must store the information and documentation for the standards and conventions its content follows. – The storage must be able to describe itself.
  • 20. Continuity and Services #4 We must lower our dependence on any one node in the system – be it hardware, software or even a person.
  • 21. Continuity and Services #4 We must lower our dependence on any one ● node in the system – be it hardware, software or even a person. This is another reason for #1 – keeping things – simple. Lowers the cost of hardware replacement and upgrade – ● simple hardware tends to be cheaper and more readily available; lowering the number of crucial features broadens what can be used. Software – Oxford has been around for 1000 years. ● Vendors will not be. People – a simpler system is more readily understood by ● new/replacement engineers.
  • 22. Challenges to maintaining Continuity Flexibility ● Scalability ● Longevity ● Availability ● Sustainability ● Interoperability ●
  • 23. Challenges to maintaining Continuity Flexibility ● Open standards and open APIs allow us to provide ● the tools people need to create the experience with the content they want. 'Text' based metadata – XML, RDF, RDFa, JSON ● Componentised web services providing open APIs – ● flexibility through dynamically selecting what services run at anytime. Open Source ●
  • 24. Challenges to maintaining Continuity Scalability ● We need to anticipate exponential growth in demand, ● with the knowledge that storage will be longterm A rough estimate of all the data produced in the world ● every second is 7km of CDROMs – broadcast video, big science, CCTVs, cameras, research, etc. This figure is only ever going to go up. To scale like the web, we have to be like the web; not ● one single black box and workflow, but a distributed net of storage and services, simply interlinked. The “Billion file [inode] problem” must be avoided ●
  • 25. Challenges to maintaining Continuity Longevity ● Live Replicas (Backup Problems) ● MAID (Power) ● Self-healing Systems (Resilience) ● Simplicity of Interfaces ● Avoid 3rd Party dependence ● Support Heterogeneity ● Resolvers/Abstraction Layers ●
  • 26. Challenges to maintaining Continuity Availability ● Basic IT availability ● Enhanced long-term availability ● Archival recoverability ● Digital Preservation ● Conversion ● Emulation ● Archive Preservation ●
  • 27. Challenges to maintaining Continuity Sustainability ● Budget and cost as a conventional library ● Factor archival costs into projects ● Leverage content to generate income ● Migrate skills ●
  • 28. Challenges to maintaining Continuity Interoperability ● Interoperability is an ongoing process ● Support for emerging and established standards ● Persistent, stable, well-defined interfaces ● Ideally implement interfaces bidirectionally ● Avoid low-level interfaces – abstract as much as ● feasible Embrace the web – if you do it right, people will ● reuse your content in ways you never thought of and never thought possible.
  • 29. Phase 1 – current hardware Public VMWare ESX SATA MDICS VM Image Store (2TB) HC 1 HC 2
  • 30. Phase 1 – current hardware Public VMWare ESX SATA MDICS VM Image Store (2TB) Content layer HC 1 HC 2
  • 31. Phase 1 – current hardware Public VMWare ESX SATA MDICS VM Image Service storage layer Store (2TB) HC 1 HC 2
  • 32. Phase 1 – current hardware Public VMWare ESX Service execution layer SATA MDICS VM Image Store (2TB) HC 1 HC 2
  • 33. Phase 2 – (expecting hardware delivery this month) Public VMWare ESX VMWare ESX Private NFS NFS Storage ZFS ZFS MDICS VM Image VM Image ZFS Replication Store (2TB) Store (2TB) Archive HC 1 HC 2
  • 34. Aim – multisite (add sites as required to scale up) Public RSL Osney? VMWare ESX VMWare ESX Sun Rays Private NFS NFS Storage ZFS ZFS FutureArch MDICS VM Image VM Image ZFS Replication Store (2TB) Store (2TB) Archive Split Writes & Thumper Thumper HC 1 HC 2 Crosschecks 1 2
  • 35. Backend services (blue) fit over the content storage in 'stacks' and provide capability to other services (green)
  • 36. Columns of services over common content/storage layer ORA Databank FMO ... Storage layer
  • 37. End of Presentation Any Questions?
  • 38. Ben O'Steen benjamin.osteen@ouls.ox.ac.uk Neil Jefferies neil.jefferies@sers.ox.ac.uk www.sers.ox.ac.uk Oxford Research Archive http://ora.ouls.ox.ac.uk Developer's Blog http://oxfordrepo.blogspot.com