slides from the S4 webinar "On-Demand RDF Graph Databases in the Cloud"
RDF database-as-a-service running on the Self-Service Semantic Suite (S4) platform: http://s4.ontotext.com
video recording of the talk is available at http://info.ontotext.com/on-demand-rdf-graph-database
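Databases hosted this way are typically reached over the standard SPARQL 1.1 Protocol. Below is a minimal sketch of how a client could form such a request; the endpoint URL is a placeholder for illustration, not the actual S4 endpoint.

```python
# Build an HTTP GET request URL for a SPARQL query, following the
# SPARQL 1.1 Protocol. Endpoint URL below is a hypothetical placeholder.
import urllib.parse

ENDPOINT = "https://rdf.example.com/sparql"  # placeholder, not the real S4 URL

def sparql_get_url(query: str) -> str:
    """Encode a SPARQL query into a GET request URL, asking for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{ENDPOINT}?{params}"

url = sparql_get_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(url)
```

Sending the resulting URL with any HTTP client (plus an API key header, for a hosted service) returns the query results as JSON.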
slides from our talk "Low-Cost Open Data as-a-service" from the Semantic Web Developers workshop of ESWC'2015 (full paper: http://ceur-ws.org/Vol-1361/paper7.pdf)
Text Analytics & Linked Data Management As-a-Service (Marin Dimitrov)
slides from the talk on "Text Analytics & Linked Data Management As-a-Service with S4" from the ESWC'2015 workshop on Semantic Web Enterprise Adoption & Best Practices
full paper available at http://2015.wasabi-ws.org/papers/wasabi15_1.pdf
Enabling Low-cost Open Data Publishing and Reuse (Marin Dimitrov)
In the space of just a few years we’ve seen the transformational power of open data: transparency and accountability where public data is concerned, and efficiency and innovation where businesses open up private data. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where third parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
see also http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
Sigma EE: Reaping low-hanging fruits in RDF-based data integration (Richard Cyganiak)
A presentation I gave at I-Semantics 2010 on Sigma EE, an RDF-based data integration front-end.
Sigma EE is now available for download here: http://sig.ma/?page=help
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku... (Databricks)
AI is fundamentally transforming how we live and work.
Zalando is a data-driven company. We deliver an optimal customer experience that drives engagement, and we continue to improve this experience by leveraging the latest technologies and machine learning techniques, such as building a cutting-edge, cloud-based infrastructure to support our operations at scale.
We provide our data scientists across Zalando with the means to implement artificial intelligence use cases, leveraging data from all parts of our company and the best machine learning techniques from across the industry. Apache Spark delivered through Databricks is at the core of this strategy.
In this keynote, I’ll share our AI journey thus far and how we are exploring ways to unify data through AI with Spark and Databricks.
When We Spark and When We Don’t: Developing Data and ML Pipelines (Stitch Fix Algorithms)
The data platform at Stitch Fix runs thousands of jobs a day to feed data products that provide algorithmic capabilities to power nearly all aspects of the business, from merchandising to operations to styling recommendations. Many of these jobs are distributed across Spark clusters, while many others are scheduled as isolated single-node tasks in containers running Python, R, or Scala. Pipelines often comprise a mix of task types and containers.
This talk will cover thoughts and guidelines on how we develop, schedule, and maintain these pipelines at Stitch Fix. We’ll discuss how we decide which portions of a pipeline run on which platform (e.g. what is important to run distributed across Spark clusters vs. in stand-alone containers) and how we get them to play well together. We’ll also provide an overview of tools and abstractions developed at Stitch Fix to facilitate the process from development, to deployment, to monitoring in production.
Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement. Narasimhan and Avinash highlight the architecture, lessons learned, and the challenges that were overcome on both the business and technology fronts.
The analytics platform is designed as a framework to enable self-service data intake, data processing, and report/model generation by the business users. The data-driven framework consists of a distributed hybrid-cloud data ingestor for data intake and a Cloudera CDH cluster with Spark as the distributed compute engine. The solution is built in such a way that storage and compute have been decoupled and encourages the concept of BYOC (bring your own compute). The platform uses EC2 instances to run CDH and leverages Amazon S3 as a data warehouse storage layer (data lake), Spark as an ETL engine, and Spark SQL as a distributed query engine. Results (computations/derived tables) are exposed to the end users via Spark SQL and are discovered via Tableau. The platform supports both batch and streaming use cases and is built on the following technology stack: AWS (S3, EC2, SQS, SNS), Cloudera CDH (YARN, Navigator, Sentry), Spark, Kafka, Spark SQL, and Spark Streaming.
Dmitry Lavrinenko, "Big & Fast Data for Identity & Telemetry services" (Fwdays)
- Business goal
- What is Fast Data for us
- What is Fast & Big Data solution
- Reference Architecture
- Data Science for Big Data
- Technology Stack
- Solution Architecture
- Identity & Telemetry Data Processing Facts
- Continuous Deployment
- Quality Control
Personalization allows Stitch Fix to style its clients and provide recommendations to help them find what they love. To do this, the company gathers information about a client’s preferences up front when they sign up for the service and learns more about them as they become longer-term customers. This information is important for making recommendations but also must be protected and managed with care.
The data science team at Stitch Fix is the primary owner of the recommendation systems. Backing them up is the data platform team, who maintain the data infrastructure, data warehouse, and supporting tools and services. This data warehouse has several different data sources that read and write into it. This includes a logging pipeline for events, every Spark-based ETL, and daily snapshots of structured data from Stitch Fix applications.
Neelesh Srinivas Salian explains Stitch Fix’s process to better understand the movement and evolution of data within its data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh also details how Stitch Fix built a service that helps the company understand the lineage information that is associated with each table in the data warehouse. This service helps the company understand the source, parentage, and journey of all data in the warehouse. Although Stitch Fix makes sure to anonymize and filter out sensitive information from this data, the company needs a more flexible long-term solution as the business expands.
What is Connected Data as a concept? Who is interested in Connected Data? What problems does Connected Data solve? What skills are used in Connected Data?
As of July 2017, Connected Data has been running for over a year, with a very successful conference and nine meetups held to date on a range of topics. These have included Knowledge Representation, Semantics, Linked Data, Graph Databases, Ontology development, and use cases in industry verticals including recommendations, telecoms and finance. Yet the group has never had a particularly formal terms of reference or a description defining what Connected Data actually means. Some would say this is something of an irony for a group so focused on semantics, schemas, definitions and structure!
This is an attempt (with some humour and something of a journey included in it) to achieve something resembling a definition and terms of reference for the group.
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB (ScyllaDB)
Explore how IOTA addressed supply chain digitization challenges, including the role of data serialization formats (EPCIS 2.0), Distributed Ledgers (IOTA), and scalable, resilient databases (ScyllaDB) across specific use cases.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Simplified minimalistic workflows for the publication of Linked Open Data (Salvatore Virtuoso)
Our colleague Yuri Glikman of Fraunhofer FOKUS (LinDA partner) presented the LinDA transformation tool at the recent Samos Summit (http://samos-summit.blogspot.de/).
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... (Shawn Jones)
In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
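The "social card" fields the paper measures are plain HTML `<meta>` tags in an article's head. As an illustrative sketch, the standard-library snippet below extracts the Twitter card fields from a made-up article head (the sample markup and field values are invented for the example):

```python
# Extract Twitter social-card metadata from an HTML head using only
# the standard library. SAMPLE is a made-up article head for illustration.
from html.parser import HTMLParser

SAMPLE = """
<head>
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="Example headline">
  <meta name="twitter:description" content="One-line summary">
</head>
"""

class CardParser(HTMLParser):
    """Collect every <meta name="twitter:*"> tag into a dict."""
    def __init__(self):
        super().__init__()
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").startswith("twitter:"):
            self.cards[a["name"]] = a.get("content", "")

p = CardParser()
p.feed(SAMPLE)
print(p.cards)
```

A crawler over archived article snapshots, as in the paper, could count which of these fields appear in each capture year to chart adoption over time.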
PGDay.Amsterdam 2018 - Jeroen de Graaff - Step-by-step implementation of Post... (PGDay.Amsterdam)
Rijkswaterstaat is the executive agency of the Ministry of Infrastructure and Water Management in the Netherlands. During this presentation, I will share our journey to develop and apply PostgreSQL at Rijkswaterstaat. Our work is ICT-driven, and access to our data, both historical and current, is key to executing our task now and in the future.
Manage traceability with Apache Atlas, a flexible metadata repository (Synaltic Group)
Do you know where your data is?
Do you know who is responsible for a specific dataset?
Do you know from which application or task an entity was last modified last Friday?
Apache Atlas helps you manage all the metadata of your data. With Apache Atlas you can see the lineage between your datasets and the processes that use them.
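Atlas exposes this lineage information through its v2 REST API; a client fetches the lineage graph of an entity by its GUID. The sketch below only builds the request URL (the host, port, and GUID are placeholders, and a real call would add authentication):

```python
# Build the Apache Atlas v2 REST URL for an entity's lineage graph.
# Host and GUID below are hypothetical placeholders.
import urllib.parse

def lineage_url(base: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Return the GET URL for /api/atlas/v2/lineage/{guid} with query params."""
    params = urllib.parse.urlencode({"direction": direction, "depth": depth})
    return f"{base}/api/atlas/v2/lineage/{guid}?{params}"

url = lineage_url("http://atlas.example.com:21000", "e4b2-1234")
print(url)
```

Issuing this request against a running Atlas server returns a JSON graph of upstream and downstream entities and the processes linking them.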
The Business Case for Semantic Web Ontology & Knowledge Graph (Cambridge Semantics)
In this webinar Mark Wallace, Ontologist & Developer, Semantic Arts, and Thomas Cook, Director of Sales, AnzoGraph DB, Cambridge Semantics, explore the benefits of building a Semantic Knowledge Graph with RDF*, wrapping up with an airline data demo that illustrates the value of schema, inference and reasoning.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... (StampedeCon)
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
Building next generation data warehouses (Alex Meadows)
All Things Open 2016 Talk - discussing technologies used to augment traditional data warehousing. Those technologies are:
* data vault
* anchor modeling
* linked data
* NoSQL
* data virtualization
* textual disambiguation
In this session we take an in-depth look into the Apache Atlas open metadata and governance function.
Open metadata and governance is a moon-shot type of project to create a set of open APIs, types, and interchange protocols to allow all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery, and access frameworks to automate the collection, management, and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed, and used in order to deliver maximum value to the enterprise.
Apache Atlas is the reference implementation of the Open Metadata and Governance standards and framework (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance). This function will enable an Apache Atlas server to synchronize and query metadata from any open metadata-compliant metadata repository.
In this session we will cover how Open Metadata and Governance works. This includes: (1) the key components in Atlas, (2) the different integration patterns and APIs that vendors can use to integrate their technology into the open metadata ecosystem, and (3) how common metadata use cases such as searching for data sets, managing security (through Atlas/Ranger integration), and automated metadata discovery work in the active ecosystem.
Speaker
Mandy Chessell, Distinguished Engineer, IBM
presentation from the 5th "EC Framework Programmes - funding opportunities" seminar organised by the Applied Research and Communications Fund
http://www.arcfund.net/arcartShow.php?id=16150
overview of the RDF graph database-as-a-service (GraphDB based) on the Self-Service Semantic Suite (S4)
http://s4.ontotext.com
presentation for the AKSW Group of the University of Leipzig
As software engineers we make trade-offs every day. We often need to pick between things like space vs. time, or budget vs. scope, or decide how much creative waste we can afford. And when we make the decision we need to fully understand both the upside and the downside of a particular choice. In this talk we will discuss why our organization decided to move from Python to Java. We will go over each trade-off we made and the motivation behind it.
Very often, when we want to become better backend programmers, we try to learn different programming languages and their libraries. The problem is that Rails, Express.js, Django and Zend Framework share roughly the same concepts. If we want to learn how to write code for large systems that scale well and handle failures and unexpected situations on their own, we need to master another branch of human knowledge called distributed systems. In this presentation we will see why we should dig into them and what the core principles are: consistency, availability and partition tolerance. We will also look at steps anyone can take to learn more on the topic and keep their knowledge up to date.
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Crossing the Chasm with Semantic Technology - Marin Dimitrov
After more than a decade of active efforts towards establishing the Semantic Web, Linked Data and related standards, the verdict on whether the technology has delivered on its promise and proven itself in the enterprise is still unclear, despite the numerous existing success stories.
Every emerging technology and disruptive innovation has to overcome the challenge of “crossing the chasm” between the early adopters, who are just eager to experiment with the technology potential, and the majority of the companies, who need a proven technology that can be reliably used in mission critical scenarios and deliver quantifiable cost savings.
Succeeding with a Semantic Technology product in the enterprise is a challenging task involving both top quality research and software development practices, but most often the technology adoption challenges are not about the quality of the R&D but about successful business model generation and understanding the complexities and challenges of the technology adoption lifecycle by the enterprise.
This talk will discuss topics related to the challenge of “crossing the chasm” for a Semantic Technology product and provide examples from Ontotext’s experience of successfully delivering Semantic Technology solutions to enterprises.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, Airbnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence... - Perficient, Inc.
Most organizations still rely on batch and offline processing of data streams to gain meaningful analysis and insight into their business. However, in our instant gratification world, real-time computation and analysis of streaming data is crucial in gaining insight into patterns and threats. A trend is emerging for real-time and instant analysis from live data streams, promoting the value of logs and a move toward functional programming.
This shift in technology is not about what and how to store the data, but what we can do with it to see emerging patterns and trends across multiple resources, applications, services and environments. Log data represents a wealth of information, yet is often sporadic, unstructured, scattered across the enterprise and difficult to track.
These slides provide insights into some of the most helpful Big Data tools used by the largest social media and data-centric organizations for competitive trends, instant analysis and feedback from large-volume data streams. We show how using the Big Data tools Storm and Elasticsearch with an elastic UI can turn application logs into real-time analytical views.
You will also learn how Big Data:
Contains data that is elastic, minimally structured, flexible and scalable
Helps process live streams into meaningful data
Promotes a move toward functional programming
Affects the enterprise data architecture
Works with real-time CEP tools like Storm for functional programming
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
Hadoop meets Agile! - An Agile Big Data Model - Uwe Printz
Big Data projects are a struggle, not only on the technical side but also on the organizational side. In this talk the author shares his experience and opinions from almost 5 years of Big Data projects and develops an Agile Big Data Model which reflects his ideas on how Big Data projects can be successful, even in large companies.
Talk held at the crossover meetup of the "Agile Stammtisch Rhein-Main" and the "Hadoop & Spark User Group Rhein-Main" at codecentric AG on 31.01.2017.
Simplified minimalistic workflows for the publication of Linked Open Data - LinDA_FP7
The LinDA project addresses one of the most significant challenges of the usage and publication of Linked Data: the renovation and conversion of existing data formats into structures that support the semantic enrichment and interlinking of data. The set of tools provided by LinDA will assist enterprises, especially SMEs which often cannot afford the development and maintenance of dedicated information analysis and management departments, in efficiently developing novel data analytical services that are linked to the available public data, thereby helping to improve their competitiveness and stimulating the emergence of innovative business models.
This is the project presentation from Samos 2015 Summit on ICT-enabled Governance, held on June 29 – July 3, 2015, Samos, Greece (http://samos-summit.blogspot.de/).
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora... - Mark Rittman
This talk focuses on what a data reservoir is, how it relates to the RDBMS DW, and how Big Data Discovery gives business and BI users access to it.
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Walters... - HostedbyConfluent
Are you looking for a cloud-based architecture that includes best-of-breed streaming and database technologies? In this session you will learn how to set up and configure Confluent Cloud with MongoDB Atlas. We'll start the journey learning about the basic connectivity between the two cloud services and end with a brief discovery of what you can do with data once it is in MongoDB Atlas. By the end of this session you will know how to securely set up and configure the MongoDB Atlas connectors in Confluent Cloud in both a source and sink configuration.
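As a rough sketch of the sink side of such a setup: the open-source MongoDB Kafka sink connector is driven by a small JSON configuration like the one below. The connection string, topic, database and collection names here are placeholders, and Confluent Cloud's fully managed connector exposes its own equivalent settings rather than this exact document:

```python
import json

# Hypothetical sink configuration: stream records from a Kafka topic
# into a MongoDB Atlas collection via the MongoDB Kafka connector.
sink_config = {
    "name": "atlas-sink",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": "orders",
        "connection.uri": "mongodb+srv://<user>:<password>@<cluster>/",
        "database": "demo",
        "collection": "orders",
    },
}

# In a self-managed deployment this document would be POSTed to the
# Kafka Connect REST API to create the connector.
print(json.dumps(sink_config, indent=2))
```

A source configuration looks symmetrical, using the same connector class in source mode to publish MongoDB change events back into Kafka topics.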
Measuring the Productivity of Your Engineering Organisation - the Good, the B... - Marin Dimitrov
High-performing engineering teams regularly dedicate time to measuring the performance & quality of the systems and applications they’re building, and to measuring & improving the various aspects of the development lifecycle. High-performing product companies are also data-driven when it comes to measuring the impact of new features & products in terms of business KPIs and North Star metrics.
Can a data-driven approach be applied to measuring the performance, maturity and continuous improvement of an engineering team or the whole engineering organisation? In this discussion we’ll cover various important topics related to quantifying the performance of an engineering organisation.
The career development of our teammates is among the key responsibilities of a leader, and our personal career development vision & plan plays a critical role in our long-term growth and success. Despite its importance, our career vision often does not get enough attention and detail, or is hampered by easily avoidable mistakes. In this discussion, we’ll address typical mistakes related to long-term career planning, some best practices, and practical steps for building our own long-term career development vision (or those of the teammates we are leading), so that career planning becomes a long-term journey with a clear why/how/what, rather than just a list of SMART goals.
Uber began its open source journey in 2015 when three passionate engineers decided to contribute Uber’s work back to the community. In only four years, Uber’s open source program has fostered 350+ outstanding open source projects with 2,000+ contributors worldwide delivering over 70,000 commits. Since 2017, four of Uber’s open source projects have won InfoWorld’s Best of Open Source Software Awards. In this talk, Brian Hsieh & Marin Dimitrov will share more details on Uber’s open source journey, program and best practices, and how Uber enables open innovation by fostering a healthy and collaborative open source culture.
Trust - the Key Success Factor for Teams & Organisations - Marin Dimitrov
Most leaders agree that trust is a key factor for the success of the team and the organisation, and say they are actively working to build trust. And yet, various studies imply that almost half of the teams and organisations worldwide experience low trust levels with their managers, teammates and the rest of the organisation, which leads to decreased engagement, productivity and success.
In this talk we will discuss why trust is a key success factor for every team and every organisation, some good practices for building, sustaining and rebuilding trust, as well as the most common mistakes related to trust building.
talk @ the Computer Science department of Sofia University - practical advice for career growth for students
DEV.BG event http://dev.bg/%D1%81%D1%8A%D0%B1%D0%B8%D1%82%D0%B8%D0%B5/fmi-club-%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D1%87%D0%BD%D0%B8-%D1%81%D1%8A%D0%B2%D0%B5%D1%82%D0%B8-%D0%B7%D0%B0-%D0%BA%D0%B0%D1%80%D0%B8%D0%B5%D1%80%D0%BD%D0%BE-%D1%80%D0%B0%D0%B7%D0%B2%D0%B8%D1%82/
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio’s cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But, if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
On-Demand RDF Graph Databases in the Cloud
1. On-Demand RDF Graph Databases in the Cloud
A webinar with Marin Dimitrov, CTO of Ontotext
Jun 11th, 2015
On-Demand RDF Graph Databases in the Cloud, Jun 2015
2. Today’s topics
• The Self-Service Semantic Suite (S4)
• RDF graph databases
• On-demand RDF databases in the Cloud
• Demo
• Roadmap
• Q&A session
3. About Ontotext
• Provides products & solutions for content enrichment, metadata management & information discovery
– 70 employees, headquarters in Sofia (Bulgaria)
– Sales presence in London & New York
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
4. Some of our clients
5. Our vision for Smart Data management
Graph Database
• Flexible RDF graph data model
• Ontology based metadata layer
Semantic Search
• Semantic, exploratory search
• Metadata driven content
Text Mining & Interlinking
• Interlink people, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
6. Ontotext and AstraZeneca
Profile
• Global, Bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
7. Ontotext and the Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for content enrichment and recommendation based on semantics
Challenges
• Critical part of the entire workflow
• Move fast from inception to production deployment
• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people and relations between them
• Automatic extraction of all these concepts and relationships
• Personalised recommendations of relevant content across the entire media
8. Ontotext and LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches
• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• More than 90% savings in analyst time
• Accurate results
10. What is S4?
• Capabilities for text analytics, content enrichment and smart data management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
– Simple RESTful services
• Simple pay-per-use pricing
– No upfront commitments
12. Benefits
• Enables quick prototyping
– Instantly available, no provisioning & operations required
– Focus on building applications, don’t worry about infrastructure
• Free tier!
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Based on enterprise semantic technology by Ontotext
13. Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
14. Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
– Twitter
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
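A hedged sketch of what calling such a RESTful annotation service from Python could look like. The endpoint URL, the payload fields, and the use of the API key pair as HTTP basic auth are illustrative assumptions, not the documented S4 API:

```python
import base64
import json
import urllib.request

def build_annotation_request(api_key, api_secret, text,
                             endpoint="https://text.s4.ontotext.com/v1/news"):
    """Build (but do not send) an HTTP request submitting plain text for
    annotation and asking for the simple JSON output the slides mention.
    The endpoint path and payload field names are hypothetical."""
    payload = json.dumps({"document": text, "documentType": "text/plain"}).encode("utf-8")
    credentials = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",  # API key pair as basic auth
        },
        method="POST",
    )

# Placeholders stand in for the key pair generated at s4.ontotext.com.
req = build_annotation_request("<api-key>", "<api-pass>", "Ontotext is based in Sofia.")
print(req.get_method(), req.get_full_url())
```

Sending the request with `urllib.request.urlopen(req)` would return the JSON annotations; the response schema would need to be checked against the docs at docs.s4.ontotext.com.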
16. Knowledge graphs with S4
• SPARQL query endpoint to the FactForge semantic data warehouse
– 500 million entities / 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase/WikiData, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
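Querying a SPARQL endpoint like this needs nothing more than an HTTP GET with a URL-encoded query, per the SPARQL Protocol. A minimal sketch (the endpoint URL is a placeholder; the query assumes the DBpedia data mentioned above):

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

# Example query over DBpedia data: a few cities located in Bulgaria.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country <http://dbpedia.org/resource/Bulgaria> .
} LIMIT 10
"""
req = build_sparql_request("https://endpoint.example.org/sparql", query)
print(req.get_full_url())
```

The same request shape works against any SPARQL 1.1 Protocol endpoint, which is one of the portability benefits of the W3C standards the deck highlights.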
17. Knowledge graph query example: a SPARQL query using DBpedia data
18. RDF Graph Data Management
19. RDF for smart data management
• Schema-less data integration, easy querying of diverse data
• Standards compliance
– Based on a mature set of W3C standards: RDF/S, OWL, SPARQL
– Portability & interoperability across vendors
• Complex & exploratory queries
• Infer implicit relations in the graph
• Reuse open knowledge graphs (Linked Open Data)
20. A visual view of RDF data: sub-properties, sub-classes, transitive relations, inference
21. GraphDB by Ontotext
• High performance RDF database, 10s of billions of triples
• Full SPARQL 1.1 support
• Various reasoning profiles, including custom rules
• Efficient data integration (“sameAs” optimisations) and deletion of statements & their inferences
• Geo-spatial indexing & querying with SPARQL
• RDF Rank, full-text search, 3rd party plugins
• Connectors to Solr, ElasticSearch, NoSQL DBs
• GraphDB Workbench
22. Graph databases report by Bloor
“Despite all of this attention the market is dominated by Neo4J and Ontotext (GraphDB), which are graph and RDF database providers respectively. These are the longest established vendors in this space (both founded in 2000) so they have a longevity and experience that other suppliers cannot yet match. How long this will remain the case remains to be seen.”
Bloor Group whitepaper, Graph Databases, April 2015
http://www.bloorresearch.com/technology/graph-databases/
24. RDF database in the Cloud with S4
• Ideal for customers who are…
– still evaluating and testing RDF technology
– in the early phase of adoption / PoC
• Enterprise grade RDF database in the Cloud
– No need for upfront payments for licenses & hardware
– Pay only for what you use, when you use it
– Instantly operational within minutes
– No need for complex planning - use as many DB instances for as long as needed
– Timely upgrades to the latest version
• Self-managed and fully managed options
25. Self-managed RDF DB in the Cloud
• Available from AWS Marketplace, “1-Click” purchasing
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
• Users take care of operations
– Backups, restores
26. Self-managed RDF DB in the Cloud
27. Fully managed RDF DB in the Cloud
• Low-cost graph DBaaS available 24/7
• Ideal for small & moderate data & query volumes
– database options: 1M, 10M, 50M, 250M & 1B triples
• Instantly deploy new databases when needed
• Zero administration
– automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
• Standard OpenRDF REST API
28. Fully managed RDF DB in the Cloud

Database type   Max triples
micro           1 million (FREE!)
XS              10 million
S               50 million
M               250 million
L               1 billion
29. Fully managed RDF DB in the Cloud
30. Use cases for an RDF DBaaS
• Evaluate the technology
• Instant deployment, faster experimentation
• Faster application development
• Data services / Open Data publishing
• Reducing TCO & risk
31. Fully managed RDF DB in the Cloud
• Cloud native architecture, running on AWS
• Designed for elasticity & high availability
– More resources added whenever needed
– Failed nodes replaced immediately
• GraphDB is the RDF DB engine
– OpenRDF REST API
• Isolation of the multi-tenant databases
– Docker containers
– Private NAS volumes (EBS) for data storage
32. OpenRDF REST API
resource                                        operations               comments
/repositories                                   GET                      Get info on DB repos
/repositories/<REPOSITORY>                      GET, POST, PUT, DELETE   Create*, delete, query a repository
/repositories/<REPOSITORY>/size                 GET                      Get the number of triples in a repository
/repositories/<REPOSITORY>/statements           GET, POST, PUT, DELETE   Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH>   GET, POST, PUT, DELETE   Same as above, for a named graph
/settings                                       GET, PUT                 Configure the DBaaS*
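The size endpoint from the table can be called with plain HTTP and no SDK at all. A minimal sketch using only the JDK, assuming the API key/secret pair is sent as HTTP Basic authentication (the endpoint URL, repository ID, and credential placeholders below are illustrative, not real values):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RepositorySize {

    // Build an HTTP Basic authentication header from an API key/secret pair
    static String basicAuth(String apiKey, String apiPass) {
        String credentials = apiKey + ":" + apiPass;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // URL of the size endpoint for a given repository
    static String sizeUrl(String dbaasURL, String repositoryId) {
        return dbaasURL + "/repositories/" + repositoryId + "/size";
    }

    // GET the triple count; the endpoint returns it as a plain-text number
    static long fetchSize(String dbaasURL, String repositoryId,
                          String apiKey, String apiPass) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(sizeUrl(dbaasURL, repositoryId)).openConnection();
        conn.setRequestProperty("Authorization", basicAuth(apiKey, apiPass));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            return Long.parseLong(in.readLine().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // With live credentials: fetchSize(dbaasURL, repositoryId, apiKey, apiPass)
        System.out.println(sizeUrl("<dbaas URL>", "<repository ID>"));
    }
}
```

The network call is kept in a separate method so the request construction can be inspected without live credentials.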
36. Uploading data (Java / OpenRDF SDK)
import java.io.File;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.manager.RemoteRepositoryManager;
import org.openrdf.rio.RDFFormat;

String dbaasURL = "<dbaas URL>";
String repositoryId = "<repository ID>";
String pathToTheFile = "<pathToTheFile>";
String apiKey = "<api-key>";
String apiPass = "<api-pass>";

// The base URI against which any relative URIs in the data are resolved
String baseURI = "http://www.example.org";

// Create a RemoteRepositoryManager pointing at the DBaaS endpoint
RemoteRepositoryManager manager =
    RemoteRepositoryManager.getInstance(dbaasURL, apiKey, apiPass);

// Open a connection to the repository
Repository repository = manager.getRepository(repositoryId);
RepositoryConnection repositoryConnection = repository.getConnection();

// Upload the RDF data
File fileToUpload = new File(pathToTheFile);
repositoryConnection.add(fileToUpload, baseURI, RDFFormat.RDFXML);

// Close the connection
repositoryConnection.close();
37. Querying data (OpenRDF Workbench)
38. Querying data (OpenRDF Workbench)
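Besides the Workbench UI shown on these slides, the repository can also be queried programmatically over the REST API. A sketch using only the JDK, assuming the standard OpenRDF/Sesame protocol convention of passing the URL-encoded SPARQL string in the "query" parameter, with Basic authentication for the API key pair (the endpoint and repository placeholders are illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SparqlQuery {

    // Build the query URL: the SPARQL string goes URL-encoded into the
    // "query" parameter of the repository endpoint
    static String queryUrl(String dbaasURL, String repositoryId, String sparql)
            throws Exception {
        return dbaasURL + "/repositories/" + repositoryId
                + "?query=" + URLEncoder.encode(sparql, "UTF-8");
    }

    // Execute the query, asking for SPARQL JSON results
    static String runQuery(String dbaasURL, String repositoryId,
                           String apiKey, String apiPass, String sparql) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(queryUrl(dbaasURL, repositoryId, sparql)).openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        String credentials = apiKey + ":" + apiPass;
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // With live credentials: runQuery(dbaasURL, repositoryId, apiKey, apiPass, sparql)
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(queryUrl("<dbaas URL>", "<repository ID>", sparql));
    }
}
```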
41. • (Create a database)
• Create a repository
• Upload sample data
• Query the data
• Explore data with a 3rd party tool
Demo scenario
42. Create a database
Micro, XS, S, M, or L
R/O access to Open Data services or open knowledge graphs
55. • Various improvements (backup & export)
• Gradually introduce XS, S, M and L databases
• Increased availability
– Cross-datacenter replication
• Integration with the GraphDB Workbench
Work in progress
58. • S4 provides an enterprise RDF DBaaS
• Free graph databases up to 1M triples
• Instantly available whenever needed
• Easy to use: OpenRDF REST services
• Zero administration: automated operations, maintenance & upgrades
• Resilient design, high availability
• Check out http://s4.ontotext.com
Key Takeaways
59. • Online documentation
– http://docs.s4.ontotext.com/
• Helpdesk
– http://support.s4.ontotext.com/
• Sample code & demos on GitHub
– https://github.com/Ontotext-AD/S4
• Twitter
– @Ontotext_S4
Additional S4 resources
60. Thank you!
On-Demand RDF Graph Databases in the Cloud
A link to the recording will be sent out shortly
Jun 11th, 2015