slides from the talk on "Text Analytics & Linked Data Management As-a-Service with S4" from the ESWC'2015 workshop on Semantic Web Enterprise Adoption & Best Practices
full paper available at http://2015.wasabi-ws.org/papers/wasabi15_1.pdf
slides from our talk "Low-Cost Open Data as-a-service" from the Semantic Web Developers workshop of ESWC'2015 (full paper: http://ceur-ws.org/Vol-1361/paper7.pdf)
On-Demand RDF Graph Databases in the Cloud (Marin Dimitrov)
slides from the S4 webinar "On-Demand RDF Graph Databases in the Cloud"
RDF database-as-a-service running on the Self-Service Semantic Suite (S4) platform: http://s4.ontotext.com
video recording of the talk is available at http://info.ontotext.com/on-demand-rdf-graph-database
overview of the RDF graph database-as-a-service (GraphDB based) on the Self-Service Semantic Suite (S4)
http://s4.ontotext.com
presentation for the AKSW Group of the University of Leipzig
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku... (Databricks)
AI is fundamentally transforming how we live and work.
Zalando is a data-driven company. We deliver an optimal customer experience that drives engagement. We continue to improve this experience by leveraging the latest technologies and machine learning techniques, such as building a cutting-edge, cloud-based infrastructure to support our operations at scale.
We provide our data scientists across Zalando with the means to implement artificial intelligence use cases, leveraging data from all parts of our company and the best machine learning techniques from across the industry. Apache Spark delivered through Databricks is at the core of this strategy.
In this keynote, I’ll share our AI journey thus far and how we are exploring ways to unify data through AI with Spark and Databricks.
Dmitry Lavrinenko, "Big & Fast Data for Identity & Telemetry services" (Fwdays)
- Business goal
- What is Fast Data for us
- What is Fast & Big Data solution
- Reference Architecture
- Data Science for Big Data
- Technology Stack
- Solution Architecture
- Identity & Telemetry Data Processing Facts
- Continuous Deployment
- Quality Control
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB (ScyllaDB)
Explore how IOTA addressed supply chain digitization challenges, including the role of data serialization formats (EPCIS 2.0), Distributed Ledgers (IOTA), and scalable, resilient databases (ScyllaDB) across specific use cases.
To watch all of the recordings hosted during Scylla Summit 2022, visit our website: https://www.scylladb.com/summit.
In this webinar, Thomas Cook, Sales Director at AnzoGraph DB, provides a history lesson on the origins of SPARQL, including its roots in the Semantic Web and how linked open data is used to create Knowledge Graphs. Then he dives into "What is RDF?", "What is a URI?" and "What is SPARQL?", wrapping up with a real-world demonstration via a Zeppelin notebook.
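As a rough illustration of the concepts the webinar covers, the RDF triple model and a SPARQL-style pattern match can be sketched in plain Python (a toy matcher with made-up example URIs, not a real triple store or SPARQL engine):

```python
# RDF models data as (subject, predicate, object) triples, with URIs
# as global identifiers. A SPARQL query is, at its core, a triple
# pattern in which some positions are variables to be bound.

TRIPLES = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/bob",   "http://xmlns.com/foaf/0.1/name", "Bob"),
]

def match(pattern, triples):
    """Return one binding dict per triple matching the pattern.
    Positions starting with '?' are variables; others must match."""
    results = []
    for t in triples:
        binding = {}
        for p, v in zip(pattern, t):
            if p.startswith("?"):
                binding[p] = v
            elif p != v:
                break
        else:
            results.append(binding)
    return results

# Analogous to: SELECT ?who WHERE { ex:alice foaf:knows ?who }
print(match(("http://example.org/alice",
             "http://xmlns.com/foaf/0.1/knows", "?who"), TRIPLES))
```

Real SPARQL engines add joins over multiple patterns, filters, and graph-aware indexing, but the variable-binding idea is the same.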
Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement. Narasimhan and Avinash highlight the architecture, lessons learned, and the challenges that were overcome on both the business and technology fronts.
The analytics platform is designed as a framework to enable self-service data intake, data processing, and report/model generation by the business users. The data-driven framework consists of a distributed hybrid-cloud data ingestor for data intake and a Cloudera CDH cluster with Spark as the distributed compute engine. The solution is built in such a way that storage and compute have been decoupled and encourages the concept of BYOC (bring your own compute). The platform uses EC2 instances to run CDH and leverages Amazon S3 as a data warehouse storage layer (data lake), Spark as an ETL engine, and Spark SQL as a distributed query engine. Results (computations/derived tables) are exposed to the end users via Spark SQL and are discovered via Tableau. The platform supports both batch and streaming use cases and is built on the following technology stack: AWS (S3, EC2, SQS, SNS), Cloudera CDH (YARN, Navigator, Sentry), Spark, Kafka, Spark SQL, and Spark Streaming.
Simplified minimalistic workflows for the publication of Linked Open Data (Salvatore Virtuoso)
Our colleague Yuri Glikman of Fraunhofer FOKUS (LinDA partner) presented the LinDA transformation tool at the recent Samos Summit (http://samos-summit.blogspot.de/).
PGDay.Amsterdam 2018 - Jeroen de Graaff - Step-by-step implementation of Post... (PGDay.Amsterdam)
Rijkswaterstaat is the executive agency of the Ministry of Infrastructure and Water Management in the Netherlands. During this presentation, I will share our journey to develop and apply PostgreSQL at Rijkswaterstaat. Our work is ICT-driven, and access to our data, both historical and current, is key to executing our task now and in the future.
Jan van Ansem - Help a friend: how the developer community can help bring Data Warehousing development up to date with modern development technology.
This XML Prague 2015 pre-conference presentation shows the practical usage of linked data sources. These sources can help to enrich content with entities, add links to external data sources, and use the enriched content in question answering, machine translation, and other scenarios. The aim is to show the practical application of linked data sources in XML tooling. The presentation updates and extends the related session held at XML Prague 2014.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... (StampedeCon)
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
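The backward-compatibility guarantee a schema registry provides can be sketched in a few lines of plain Python. This is a simplified stand-in, not the real Avro resolution rules, which also handle type promotion, aliases, and unions:

```python
# A registry rejects a new schema version that old data cannot satisfy.
# Simplified rule: a new schema can read records written with the old
# one as long as every newly added field carries a default value.

def backward_compatible(old_fields, new_fields):
    """Each field is a dict like {"name": ..., "type": ..., "default": ...}."""
    old_names = {f["name"] for f in old_fields}
    for f in new_fields:
        if f["name"] not in old_names and "default" not in f:
            return False  # new required field: old records lack it
    return True

v1 = [{"name": "id", "type": "long"}]
v2_ok = v1 + [{"name": "tag", "type": "string", "default": ""}]
v2_bad = v1 + [{"name": "tag", "type": "string"}]

print(backward_compatible(v1, v2_ok))   # True: added field has a default
print(backward_compatible(v1, v2_bad))  # False: would break old readers
```

Enforcing a check like this at publish time is what lets schemas evolve while downstream ETL keeps working on historical data.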
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and the creation and maintenance of Hive databases and tables become much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
Memory Database Technology is Driving a New Cycle of Business InnovationVoltDB
In-memory database technology enables a new wave of fast data use cases that are extremely challenging and in some cases not possible with older technologies. In this webinar, Noel Yuhanna, Principal Analyst of Forrester Research, and VoltDB CMO, Peter Vescuso will discuss the latest market and data access technology trends, the new use cases these trends enable, and the implications for business and IT leaders.
Containers help us move our applications forward in a consistent, repeatable, and predictable manner, reducing labor and making things simpler. Design patterns exist to help you solve common problems for which containers are a good solution, providing a common language for the architecture.
Mike Stonebraker on Designing An Architecture For Real-time Event ProcessingVoltDB
In this webinar, Mike Stonebraker explains the tradeoffs of performance, scale, and required programming “heroics” in capturing the value of fast data with different stream processing alternatives. Then hear from John Hugg, Founding Engineer of VoltDB, as he discusses the VoltDB approach to fast data. Learn what’s possible when systems integrate event processing with state management in a consistent, transactional way. You can also view the webinar recording here: http://learn.voltdb.com/WRStonebrakerRTeventprocessing.html
Evaluation of TPC-H on Spark and Spark SQL in ALOJA (DataWorks Summit)
The evaluation of TPC-H on Spark and Spark SQL in ALOJA was conducted at the Big Data Lab as part of a master's degree in Management Information Systems at the Johann Wolfgang Goethe University in Frankfurt, Germany. Furthermore, the analysis was partially accomplished in collaboration and close coordination with the Barcelona Supercomputing Center.
This research integrated a TPC-H benchmark on Spark Scala into ALOJA, an open-source public platform for automated and cost-efficient benchmarking, and evaluated the runtime of Spark Scala, with and without the Hive Metastore, against Spark SQL. The impact of alternative file formats and compression codecs applied to the underlying data is also evaluated. The performance evaluation exposed diverse and intriguing outcomes for both benchmarks. Further investigation attempts to detect possible bottlenecks and other irregularities, with the aim of deepening knowledge of Spark's engine by examining the physical plans. Our experiments show, inter alia, that: (1) Spark Scala performs better for heavy expression calculation; (2) Spark SQL is the better choice when strong data-access locality is combined with heavyweight parallel execution. In short, the diverse results imply that each API has its advantages and disadvantages.
Surprisingly, our findings are well spread between Spark SQL and Spark Scala: contrary to our expectations, Spark Scala did not outperform Spark SQL in all aspects, which supports the idea that the applied optimizations are implemented differently by Spark for its core and for its Spark SQL extension. The API on top of Spark provides extra information about the underlying structured data, which is probably used to perform additional optimizations.
In conclusion, our research demonstrates differences in the generation of query execution plans that go hand in hand with similar discoveries about inefficient joins, and it underlines the value of our benchmark for identifying disparities and bottlenecks.
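The benchmarking methodology above boils down to timing the same workload under competing implementations and comparing the results. A minimal sketch of such a harness, with stand-in workloads rather than real TPC-H queries:

```python
# Best-of-N wall-clock timing for two implementations of the same
# aggregate, standing in for the "core API vs. SQL layer" comparison.
import time

def bench(fn, data, repeats=5):
    """Run fn(data) `repeats` times and return the best wall-clock time."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(data)
        times.append(time.perf_counter() - t0)
    return min(times)  # best-of-N damps scheduling noise

def loop_sum(rows):          # explicit loop, like hand-written core code
    total = 0
    for r in rows:
        if r % 2 == 0:
            total += r
    return total

def builtin_sum(rows):       # declarative form, like a SQL aggregate
    return sum(r for r in rows if r % 2 == 0)

rows = list(range(100_000))
for name, fn in [("loop", loop_sum), ("builtin", builtin_sum)]:
    print(f"{name}: {bench(fn, rows):.4f}s")
```

Checking that both implementations return identical results before comparing their runtimes is the same hygiene the benchmark applies across Spark Scala and Spark SQL.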
Speaker
Raphael Radowitz, Quality Specialist, SAP Labs Korea
ML Production Pipelines: A Classification Model (Databricks)
In this talk, we will present how we tied Python together with Databricks and MLflow to productionize a machine learning pipeline.
Through the deployment of a fairly standard classification model, we will present what a machine learning pipeline in production could look like. The project consists of two pipelines: training and prediction. We use an S3 bucket as the source of data. The training pipeline trains various models on the data, registers them in MLflow, and stores all metrics and hyperparameters. Using grid search, the best model is chosen and moved to the Production stage in MLflow. The production model can then be deployed using Flask, or as a UDF if we want to process data in batch. The prediction pipeline then uses the deployed model to make predictions, whether on demand or in batch.
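The grid-search selection step described above can be sketched in plain Python. The model, metric, and parameter names here are illustrative stand-ins; the actual pipeline uses real training code plus MLflow's tracking and model registry:

```python
# Enumerate every hyperparameter combination, score each candidate,
# and promote the best one, mirroring the "train, log, pick, promote"
# flow of the training pipeline.
import itertools

def train_and_score(params, data):
    # Placeholder metric: pretend accuracy improves with depth up to 4
    # and is best at a learning rate near 0.1. A real pipeline would
    # fit a model here and log the run to MLflow.
    depth, lr = params["depth"], params["lr"]
    return 0.7 + 0.05 * min(depth, 4) - abs(lr - 0.1)

grid = {"depth": [2, 4, 8], "lr": [0.01, 0.1, 0.5]}
candidates = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]

scored = [(train_and_score(p, data=None), p) for p in candidates]
best_score, best_params = max(scored, key=lambda s: s[0])
print("promote to Production:", best_params, round(best_score, 3))
```

In the real pipeline the promotion step is an MLflow registry transition rather than a print statement, but the selection logic is the same.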
Enabling Low-cost Open Data Publishing and Reuse (Marin Dimitrov)
In the space of just a few years we’ve seen the transformational power of open data: transparency and accountability for public data, and efficiency and innovation for businesses working with private data. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where 3rd parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
see also http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
presentation from the 5th "EC Framework Programmes - funding opportunities" seminar organised by the Applied Research and Communications Fund
http://www.arcfund.net/arcartShow.php?id=16150
As software engineers we make trade-offs every day. We often need to pick between things like space vs. time or budget vs. scope, or decide how much creative waste we can afford. And when we make the decision, we need to fully understand both the upside and the downside of a particular choice. In this talk we will discuss why our organization decided to move from Python to Java, and we will go over each trade-off we made and the motivation behind it.
Very often, when we want to become better backend programmers, we try to learn different programming languages and their libraries. The problem is that Rails, Express.js, Django, and Zend Framework share roughly the same concepts. If we want to learn how to write code for large systems that scale well and cope on their own with failures and unexpected situations, we have to master another branch of human knowledge called distributed systems. In my presentation we will see why we should dig into them and what the core principles are: consistency, availability, and partition tolerance. We will also look at steps anyone can take to learn more about the topic and keep acquiring new, up-to-date knowledge.
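The consistency/availability trade-off at the heart of those principles can be made concrete with a toy quorum-replication sketch (replica layout, keys, and values are illustrative):

```python
# With N replicas, W write acks, and R read acks where W + R > N,
# every read quorum overlaps every write quorum, so a read always
# sees the latest write. When a partition leaves too few replicas
# reachable, the system refuses the request: it stays consistent
# at the cost of availability.

N, W, R = 3, 2, 2                  # W + R > N: quorums overlap
replicas = [{} for _ in range(N)]  # each replica maps key -> (version, value)

def write(key, value, version, reachable):
    if len(reachable) < W:
        raise RuntimeError("no write quorum: refusing rather than diverging")
    for i in reachable:
        replicas[i][key] = (version, value)

def read(key, reachable):
    if len(reachable) < R:
        raise RuntimeError("no read quorum: refusing rather than going stale")
    versions = [replicas[i][key] for i in reachable if key in replicas[i]]
    return max(versions)[1]        # highest version wins

write("x", "v1", version=1, reachable=[0, 1, 2])
write("x", "v2", version=2, reachable=[0, 1])  # replica 2 partitioned away
print(read("x", reachable=[1, 2]))             # overlap with {0, 1} still yields "v2"
```

Real systems (Dynamo-style stores, for instance) add vector clocks, hinted handoff, and anti-entropy repair on top of this basic quorum overlap idea.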
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Crossing the Chasm with Semantic Technology (Marin Dimitrov)
After more than a decade of active efforts towards establishing the Semantic Web, Linked Data and related standards, the verdict on whether the technology has delivered on its promise and proven itself in the enterprise is still out, despite the numerous existing success stories.
Every emerging technology and disruptive innovation has to overcome the challenge of “crossing the chasm” between the early adopters, who are just eager to experiment with the technology potential, and the majority of the companies, who need a proven technology that can be reliably used in mission critical scenarios and deliver quantifiable cost savings.
Succeeding with a Semantic Technology product in the enterprise is a challenging task involving both top-quality research and sound software development practices. Most often, however, the adoption challenges are not about the quality of the R&D but about successful business model generation and understanding the complexities of the enterprise technology adoption lifecycle.
This talk will discuss topics related to the challenge of “crossing the chasm” for a Semantic Technology product and provide examples from Ontotext’s experience of successfully delivering Semantic Technology solutions to enterprises.
This talk was given by Jun Rao (Staff Software Engineer at LinkedIn) and Sam Shah (Senior Engineering Manager at LinkedIn) at the Analytics@Webscale Technical Conference (June 2013).
Webinar: Metadata Enrichment in Publishing (Ontotext)
The slide deck from the October 29, 2015 webinar "Metadata Enrichment in Publishing: Boosting Productivity and Increasing User Engagement" presented by Ilian Uzunov and Georgi Georgiev.
Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.
Gaining Advantage in e-Learning with Semantic Adaptive Technology - Ontotext
In this presentation, we will introduce you to a solution that involves adaptive semantic technology for educational institutions and e-learning providers. You will learn how to integrate 3rd party resources, legacy assets, and other content sources to create the so-called knowledge graph of all structured and unstructured data.
A bright, talented and self-motivated reporting analyst who has excellent organisational skills, is highly efficient and has a good eye for detail. Has extensive experience of analysing data, understanding requirements and carrying out the entire reporting process. Able to play a key role in analysing problems and coming up with creative solutions for customers. A quick learner who can absorb new ideas and communicate clearly and effectively.
Boston Hadoop Meetup: Presto for the Enterprise - Matt Fuller
Presentation Title: Presto for the Enterprise
Presenter(s): Matt Fuller and Kamil Bajda-Pawlikowski
Company: Teradata Center for Hadoop
Short Description:
Teradata will provide a technical overview and demo of Presto, focusing on Presto's architecture and Teradata's contributions to the project and community. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was originally developed by Facebook; Teradata has now joined as the second largest contributor to the open source project. Come join us to learn more about Presto and how you can join the Presto community.
Accelerate Self-Service Analytics with Data Virtualization and Visualization - Denodo
Watch full webinar here: https://bit.ly/39AhUB7
Enterprise organizations are shifting to self-service analytics as business users need real-time access to holistic and consistent views of data regardless of its location, source or type for arriving at critical decisions.
Data Virtualization and Data Visualization work together through a universal semantic layer. Learn how they enable self-service data discovery and improve performance of your reports and dashboards.
In this session, you will learn:
- Challenges faced by business users
- How data virtualization enables self-service analytics
- Use case and lessons from customer success
- Overview of the highlight features in Tableau
Emerging technologies in academic libraries. A department by department overview. Data visualization, online reference, nextGen library platforms, open source software, digital asset and archive management systems, digital humanities, scientific and creative software, new physical spaces for libraries.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
RWDG Webinar: Big Data & BI Analytics Require Data Governance - DATAVERSITY
Business Intelligence (BI) used to be equated to Data Warehousing. In this day of Big Data and improved analytical technologies and capabilities, BI now means a lot more. Where governing data in the data warehouse was a challenge – governing the volume of Big Data in variable formats coming at us from all directions at a high velocity to maximize its analytical value has become paramount to differentiating an organization from its competition.
Join Bob Seiner for a Real-World Data Governance webinar focused on strengthening the relationship between Data Governance and corporate Big Data & Business Intelligence initiatives. This session will focus on expanding existing programs to address the expanding needs of the organization and building new programs to address the broadened definition of BI.
This webinar will cover:
Existing Governance Applications for BI
Future of Big Data & BI Data
Relationship between Big Data, BI and Governance
Articulating Governance Value in Terms of BI
True Intelligence Derived from Governed Data
Similar to Text Analytics & Linked Data Management As-a-Service (20)
Measuring the Productivity of Your Engineering Organisation - the Good, the B... - Marin Dimitrov
High-performing engineering teams regularly dedicate time to measuring the performance & quality of the systems and applications they’re building, and to measuring & improving the various aspects of the development lifecycle. High-performing product companies are also data-driven when it comes to measuring the impact of new features & products in terms of business KPIs and North Star metrics.
Can a data-driven approach be applied to measuring the performance, maturity and continuous improvement of an engineering team or the whole engineering organisation? In this discussion we’ll cover various important topics related to quantifying the performance of an engineering organisation.
The career development of our teammates is among the key responsibilities of a leader, and our personal career development vision & plan plays a critical role in our long-term growth and success. Despite its importance, our career vision often does not get enough attention and level of detail, or is hampered by easily avoidable mistakes. In this discussion, we’ll address typical mistakes related to long-term career planning, some best practices, and practical steps for building our own long-term career development vision (or those of the teammates we are leading), so that career planning becomes a long-term journey with a clear why/how/what, rather than just a list of SMART goals.
Uber began its open source journey in 2015 when three passionate engineers decided to contribute Uber’s work back to the community. In only four years, Uber’s open source program has fostered 350+ outstanding open source projects with 2,000+ contributors worldwide delivering over 70,000 commits. Since 2017, four of Uber’s open source projects have won InfoWorld’s Best of Open Source Software Awards. In this talk, Brian Hsieh & Marin Dimitrov will share more details on Uber’s open source journey, program and best practices, and how Uber enables open innovation by fostering a healthy and collaborative open source culture.
Trust - the Key Success Factor for Teams & Organisations - Marin Dimitrov
Most leaders agree that trust is a key factor for the success of the team and the organisation, and that they are actively working to build trust. And yet, various studies imply that almost half of the teams and organisations worldwide experience lower trust levels with their managers, teammates and the rest of the organisation, which leads to decreased engagement, productivity and success.
In this talk we will discuss why trust is a key success factor for every team and every organisation, some good practices for building, sustaining and rebuilding trust, as well as the most common mistakes related to trust building.
talk @ the Computer Science department of Sofia University - practical advice for career growth for students
DEV.BG event http://dev.bg/%D1%81%D1%8A%D0%B1%D0%B8%D1%82%D0%B8%D0%B5/fmi-club-%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D1%87%D0%BD%D0%B8-%D1%81%D1%8A%D0%B2%D0%B5%D1%82%D0%B8-%D0%B7%D0%B0-%D0%BA%D0%B0%D1%80%D0%B8%D0%B5%D1%80%D0%BD%D0%BE-%D1%80%D0%B0%D0%B7%D0%B2%D0%B8%D1%82/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security as an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, machine learning over just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only materialise when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, as well as newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Text Analytics & Linked Data Management As-a-Service
1. Text Analytics & Linked Data Management As-a-Service
Marin Dimitrov, Alex Simov, Yavor Petkov
May 31st, 2015
Text Analytics & Linked Data Management -aaS / Wasabi’2015, May 2015
2. About Ontotext
• Provides products & solutions for content
enrichment and metadata management
– 70 employees, headquarters in Sofia (Bulgaria)
– Sales presence in London, NYC & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
3. • Semantic Technology adoption challenges
• The Self-Service Semantic Suite (S4)
• Lessons learned
Contents
5. Time-to-value gap (Gartner)
From Wasabi @ ESWC’2014: Performance, Integration, Penetration, Payback & ROI
6. • Limiting factors
– Complexity & cost of existing solutions
– Limited resources to evaluate novel technologies (startups)
– Slow procurement processes, risk aversion (enterprises)
• How can we…
– Reduce time-to-market
– Reduce adoption risks
– Optimise costs
Semantic Technology adoption
7. The Self-Service Semantic Suite (S4)
8. • Capabilities for text analytics, content enrichment and smart data management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
– Simple RESTful services
• Simple pay-per-use pricing
– No upfront commitments
What is S4?
9. What is S4?
10. • Enables quick prototyping
– Instantly available, no provisioning & operations required
– Focus on building applications, don’t worry about infrastructure
• Free tier!
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Based on enterprise semantic technology
Benefits
11. • Text analytics services
– News annotation
– News categorisation
– Biomedical
– Twitter
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
Text analytics with S4
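As a rough illustration of how such RESTful text analytics services can be invoked, here is a minimal Python sketch. The endpoint URL, authentication scheme and payload field names below are assumptions for illustration only; the actual values are listed in the S4 documentation at http://s4.ontotext.com.

```python
import base64
import json
import urllib.request

# Hypothetical values -- the real endpoint, credentials and payload
# fields are described in the S4 documentation (http://s4.ontotext.com).
S4_ENDPOINT = "https://text.s4.ontotext.com/v1/news"

def build_annotation_request(text, api_key, key_secret):
    """Build a POST request that submits plain text for annotation
    and asks for the simple JSON output described above."""
    payload = json.dumps({
        "document": text,
        "documentType": "text/plain",
    }).encode("utf-8")
    token = base64.b64encode(f"{api_key}:{key_secret}".encode()).decode()
    return urllib.request.Request(
        S4_ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": "Basic " + token,
        },
    )

req = build_annotation_request(
    "Ontotext is headquartered in Sofia, Bulgaria.", "my-key", "my-secret")
# Sending the request (urllib.request.urlopen(req)) would return a JSON
# document with the recognised entities and their DBpedia/GeoNames links.
```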
13. • Low-cost graph DBaaS available 24/7
• Ideal for small & moderate data volumes
– database options: 1M, 10M, 50M, 250M and 1B triples
• Instantly deploy new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
– Number of triples stored + number of queries per month
• OpenRDF REST API
Fully managed RDF DB in the Cloud
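The OpenRDF (Sesame) REST protocol exposes each repository at {server}/repositories/{id}, with SELECT queries passed in the query parameter. A small Python sketch of building such a query URL; the base URL is a placeholder, as the actual endpoint for a database is provided by S4 when the database is created:

```python
import urllib.parse

# Placeholder base URL -- the real repository URL comes from the S4
# dashboard once a database has been deployed.
REPOSITORY = "https://rdf.s4.ontotext.com/repositories/my-db"

def query_url(sparql):
    """URL that evaluates a SPARQL query via the OpenRDF REST protocol."""
    return REPOSITORY + "?" + urllib.parse.urlencode({"query": sparql})

url = query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# An HTTP GET on this URL with the header
#   Accept: application/sparql-results+json
# returns the result bindings in the standard SPARQL JSON format.
```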
14. Fully managed RDF DB in the Cloud
15. • SPARQL query endpoint to the FactForge semantic data warehouse
– 500 million entities / 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase/WikiData, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graphs with S4
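Results from a SPARQL endpoint such as FactForge come back in the standard SPARQL 1.1 Query Results JSON format. A small sketch of extracting the bindings from such a response; the response body below is illustrative sample data, not an actual FactForge result:

```python
import json

# A minimal response in the W3C SPARQL 1.1 Query Results JSON format
# for a one-variable query (illustrative values only).
response = json.loads("""
{
  "head": {"vars": ["city"]},
  "results": {"bindings": [
    {"city": {"type": "uri",
              "value": "http://dbpedia.org/resource/Sofia"}}
  ]}
}
""")

def bound_values(result, var):
    """Extract the values bound to one variable in a SPARQL JSON result."""
    return [row[var]["value"] for row in result["results"]["bindings"]]

cities = bound_values(response, "city")
# cities == ["http://dbpedia.org/resource/Sofia"]
```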
16. Cloud native architecture of S4
Elasticity vs High Availability vs Cost Efficiency
18. • You must build a “cost aware” cloud platform
• Cloud-native architectures are more efficient, but more difficult to build
• A microservices architecture improves system resilience & agility, but is difficult to design right
• Extensive and continuous benchmarking & monitoring
– Some problems emerge only at large scale
• Assume failures will happen & design for resilience
Lessons learned