Low-cost Open Data As-a-Service
Marin Dimitrov, Alex Simov, Yavor Petkov
May 31st, 2015
Low-cost Open Data as-a-Service / SemDev’2015 #1May 2015
• Use cases & requirements
• Cloud architecture for a RDF DBaaS
• Lessons learned
Contents
#2May 2015Low-cost Open Data as-a-Service / SemDev’2015
Use Cases & Requirements
#3May 2015Low-cost Open Data as-a-Service / SemDev’2015
Why an RDF DBaaS?
#4May 2015Low-cost Open Data as-a-Service / SemDev’2015
Grafter Grafterizer
RDF DBaaSOpen Data Portal
• Transform tabular data into RDF
• Publish (Linked) data services,
instead of static datasets
• Lower-cost & easier data
publishing process
Why an RDF DBaaS?
#5May 2015Low-cost Open Data as-a-Service / SemDev’2015
• Transform textual data into RDF
• Linked data services
• Low-cost & easy to use
• Elastic
– dynamically adapt to growing data & query volumes
• High availability & resilience
– no SPFs, “graceful degradation” upon failures
• Cost efficient
• Host a large number of data services (databases)
– But probably of low/moderate data & query volume
• Isolation of the multi-tenant databases
DBaaS requirements
#6May 2015Low-cost Open Data as-a-Service / SemDev’2015
Not easy to
achieve all three!
Cloud Architecture
#7May 2015Low-cost Open Data as-a-Service / SemDev’2015
• AWS based
– Network storage, compute & autoscaling, load
balancing, integration services, …
• Ontotext GraphDB as the RDF DB engine
– OpenRDF REST API
• Docker for containerisation
• An RDF DBaaS is…
– A GraphDB instance…
– Running within a Docker container…
– Storing its data on a private NAS volume
DBaaS architecture on AWS
#8May 2015Low-cost Open Data as-a-Service / SemDev’2015
DBaaS architecture on AWS
#9May 2015Low-cost Open Data as-a-Service / SemDev’2015
Elasticity vs
High Availability vs
Cost Efficiency
Dealing with failures
#10May 2015Low-cost Open Data as-a-Service / SemDev’2015
our responsibility
CSP responsibility
• Elastic
– Routing nodes, data nodes + NAS storage grow as usage
grows
• High availability & resilience
– Strategies for dealing with failures in data, routing,
Coordinator nodes
– Planned: multi-DC deployment with replication
• Cost efficient
– Cloud native architecture -> cost savings
– Multi-tenant model -> cost savings
– Elastic: return underutilised or unused resources back
to CSP
Evaluation
#11May 2015Low-cost Open Data as-a-Service / SemDev’2015
Lessons Learned
#12May 2015Low-cost Open Data as-a-Service / SemDev’2015
• Cloud-native architecture
– Improved scalability, reliability, cost savings
• A microservice architecture will continuously
evolve
• Assume that failures will happen on all levels
– Design for “graceful degradation”
• A good DevOps process is essential
Lessons Learned
#13May 2015Low-cost Open Data as-a-Service / SemDev’2015
Discussion
#14May 2015Low-cost Open Data as-a-Service / SemDev’2015
• Use it for free!
– http://s4.ontotext.com (available NOW)
– http://dapaas.eu (end of June)
• Send us questions, comments, criticism,
suggestions for improvements, …
Help us improve it!
#15May 2015Low-cost Open Data as-a-Service / SemDev’2015
• Are you measuring the TCO of your on-premise
RDF databases?
– Important for many Open Data scenarios
• What is your #1 concern for using an RDF DBaaS
• Do you have use cases where your productivity
will increase by using an RDF DBaaS
– Experiment & prototype faster; focus on building apps,
don’t worry about infrastructure; provision new DBs
instantly…
– Real world example: training courses by Ontotext
switching from local deployments to the RDF DBaaS
Discussion topics
#16May 2015Low-cost Open Data as-a-Service / SemDev’2015
Thank you!
#17May 2015Low-cost Open Data as-a-Service / SemDev’2015

Low-cost Open Data As-a-Service

  • 1.
    Low-cost Open DataAs-a-Service Marin Dimitrov, Alex Simov, Yavor Petkov May 31st, 2015 Low-cost Open Data as-a-Service / SemDev’2015 #1May 2015
  • 2.
    • Use cases& requirements • Cloud architecture for a RDF DBaaS • Lessons learned Contents #2May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 3.
    Use Cases &Requirements #3May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 4.
    Why an RDFDBaaS? #4May 2015Low-cost Open Data as-a-Service / SemDev’2015 Grafter Grafterizer RDF DBaaSOpen Data Portal • Transform tabular data into RDF • Publish (Linked) data services, instead of static datasets • Lower-cost & easier data publishing process
  • 5.
    Why an RDFDBaaS? #5May 2015Low-cost Open Data as-a-Service / SemDev’2015 • Transform textual data into RDF • Linked data services • Low-cost & easy to use
  • 6.
    • Elastic – dynamicallyadapt to growing data & query volumes • High availability & resilience – no SPFs, “graceful degradation” upon failures • Cost efficient • Host a large number of data services (databases) – But probably of low/moderate data & query volume • Isolation of the multi-tenant databases DBaaS requirements #6May 2015Low-cost Open Data as-a-Service / SemDev’2015 Not easy to achieve all three!
  • 7.
    Cloud Architecture #7May 2015Low-costOpen Data as-a-Service / SemDev’2015
  • 8.
    • AWS based –Network storage, compute & autoscaling, load balancing, integration services, … • Ontotext GraphDB as the RDF DB engine – OpenRDF REST API • Docker for containerisation • An RDF DBaaS is… – A GraphDB instance… – Running within a Docker container… – Storing its data on a private NAS volume DBaaS architecture on AWS #8May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 9.
    DBaaS architecture onAWS #9May 2015Low-cost Open Data as-a-Service / SemDev’2015 Elasticity vs High Availability vs Cost Efficiency
  • 10.
    Dealing with failures #10May2015Low-cost Open Data as-a-Service / SemDev’2015 our responsibility CSP responsibility
  • 11.
    • Elastic – Routingnodes, data nodes + NAS storage grow as usage grows • High availability & resilience – Strategies for dealing with failures in data, routing, Coordinator nodes – Planned: multi-DC deployment with replication • Cost efficient – Cloud native architecture -> cost savings – Multi-tenant model -> cost savings – Elastic: return underutilised or unused resources back to CSP Evaluation #11May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 12.
    Lessons Learned #12May 2015Low-costOpen Data as-a-Service / SemDev’2015
  • 13.
    • Cloud-native architecture –Improved scalability, reliability, cost savings • A microservice architecture will continuously evolve • Assume that failures will happen on all levels – Design for “graceful degradation” • A good DevOps process is essential Lessons Learned #13May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 14.
    Discussion #14May 2015Low-cost OpenData as-a-Service / SemDev’2015
  • 15.
    • Use itfor free! – http://s4.ontotext.com (available NOW) – http://dapaas.eu (end of June) • Send us questions, comments, criticism, suggestions for improvements, … Help us improve it! #15May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 16.
    • Are youmeasuring the TCO of your on-premise RDF databases? – Important for many Open Data scenarios • What is your #1 concern for using an RDF DBaaS • Do you have use cases where your productivity will increase by using an RDF DBaaS – Experiment & prototype faster; focus on building apps, don’t worry about infrastructure; provision new DBs instantly… – Real world example: training courses by Ontotext switching from local deployments to the RDF DBaaS Discussion topics #16May 2015Low-cost Open Data as-a-Service / SemDev’2015
  • 17.
    Thank you! #17May 2015Low-costOpen Data as-a-Service / SemDev’2015