DBpedia past, present and future - Dimitris Kontokostas. Reviews recent developments in the Linked Data and knowledge graphs field and how DBpedia progresses with Wikipedia data.
Alexander Aldev - Co-founder and CTO of MammothDB, currently focused on the architecture of the distributed database engine. Notable achievements in the past include managing the launch of the first triple-play cable service in Bulgaria and designing the architecture and interfaces from legacy systems of DHL Global Forwarding's data warehouse. Has lectured on Hadoop at AUBG and MTel.
"The future of Big Data tooling" will briefly review the architectural concepts of current Big Data tools like Hadoop and Spark. It will make the argument, from the perspective of both technology and economics, that the future of Big Data tools is in optimizing local storage and compute efficiency.
The document provides an introduction to Prof. Dr. Sören Auer and his background in knowledge graphs. It discusses his current role as a professor and director focusing on organizing research data using knowledge graphs. It also briefly outlines some of his past roles and major scientific contributions in the areas of technology platforms, funding acquisition, and strategic projects related to knowledge graphs.
Das Semantische Daten Web für Unternehmen (The Semantic Data Web for Enterprises) - Sören Auer
This document summarizes the vision, technology, and applications of the Semantic Data Web for businesses. It discusses how the Semantic Web can help solve problems of searching for complex information across different data sources by complementing text on web pages with structured linked open data. It provides overviews of RDF standards, vocabularies, and technologies like SPARQL and OntoWiki that allow creating and managing structured knowledge bases. It also presents examples like DBpedia that extract structured data from Wikipedia and make it available on the web as linked open data.
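Since several of the entries here revolve around RDF and SPARQL, a minimal, self-contained sketch may help make them concrete. It builds a tiny in-memory RDF graph with the Python rdflib library and queries it with SPARQL; the ex: vocabulary is invented for illustration and is not from any of the decks.

```python
# A minimal sketch of building and querying a small RDF graph with rdflib.
# The ex: vocabulary is invented for illustration; real deployments would
# reuse shared vocabularies, as the talk describes.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import FOAF

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.bind("foaf", FOAF)

# Three triples: a person, her name, and the organisation she works for.
g.add((EX.alice, RDF.type, FOAF.Person))
g.add((EX.alice, FOAF.name, Literal("Alice")))
g.add((EX.alice, EX.worksFor, EX.ExampleCorp))

# SPARQL query over the in-memory graph.
results = g.query("""
    PREFIX ex: <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?org WHERE {
        ?p a foaf:Person ;
           foaf:name ?name ;
           ex:worksFor ?org .
    }
""")
for name, org in results:
    print(name, org)
```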
Big Data: Improving capacity utilization of transport companies - Data Science Society
Asparuh Koev is a successful serial entrepreneur. Currently, he is a CEO of Transmetrics, a solution for cargo transport companies that uses Big Data and predictive analytics. Asparuh holds a Bachelor’s Degree in Computer Science from the American University in Bulgaria and an MBA degree from the Vlerick Business School in Belgium.
“Big Data: Improving capacity utilization of transport companies” will explore the practical benefits, IT tools and challenges of implementing a big data solution in a traditional industry (cargo transport), using as a showcase a predictive analytics project done for Speedy.
From Open Linked Data towards an Ecosystem of Interlinked Knowledge - Sören Auer
This document discusses the development of linked open data and its potential to create an ecosystem of interlinked knowledge. It outlines achievements in extending the web with structured data and the growth of an open research community. However, it also identifies challenges regarding coherence, quality, performance and usability that must be addressed for linked data to reach its full potential as a global platform for knowledge integration. The document proposes that addressing these issues could ultimately lead to an ecosystem of interlinked knowledge on the semantic web.
Linked Data for Architecture, Engineering and Construction (AEC) - Stefan Dietze
The document discusses the relationship between building information modeling (BIM) and the semantic web. It provides an introduction to linked data and describes how semantic web technologies can be used to add contextual and background knowledge to BIM data, such as geographical, historical, and statistical information. It also addresses challenges around preserving and maintaining the evolution of linked BIM and architecture data on the semantic web.
Enterprise knowledge graphs use semantic technologies like RDF, RDF Schema, and OWL to represent knowledge as a graph consisting of concepts, classes, properties, relationships, and entity descriptions. They address the "variety" aspect of big data by facilitating integration of heterogeneous data sources using a common data model. Key benefits include providing background knowledge for various applications and enabling intra-organizational data sharing through semantic integration. Challenges include ensuring data quality, coherence, and managing updates across the knowledge graph.
Creating knowledge out of interlinked data - Sören Auer
This document discusses creating knowledge from interlinked data. It notes that while reasoning over large datasets does not currently scale well, linked data approaches are more feasible as they allow for incremental improvement. The document outlines the linked data lifecycle including extraction, storage and querying, authoring, linking, and enrichment of semantic data. It provides examples of projects that extract, store, author and link diverse datasets including DBpedia, LinkedGeoData, and statistical data. Challenges discussed include improving query performance, developing standardized interfaces, and increasing the amount of interlinking between datasets.
This document discusses graph databases and provides an overview of Neo4j. It describes how graph databases are useful for modeling connected data and performing complex queries over relationships. The document outlines the benefits of graph databases like expressing the domain as a graph and using graph traversals for queries. It then provides details on Neo4j, describing it as a widely used open source graph database that is scalable and supports ACID transactions. The document includes examples of creating nodes and relationships in Neo4j and traversing the graph.
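As a concrete illustration of the node/relationship creation and traversal the summary mentions, here is a minimal sketch using the official Neo4j Python driver; the bolt URL, credentials, and toy schema are assumptions.

```python
# A minimal sketch of creating nodes and relationships in Neo4j and
# traversing the graph, via the official Python driver. Connection
# details and the tiny Person/KNOWS schema are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # Create two nodes and a relationship between them.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)",
        a="Alice", b="Bob",
    )
    # Traverse: who does Alice know, directly or one hop away?
    result = session.run(
        "MATCH (a:Person {name: $a})-[:KNOWS*1..2]->(friend) "
        "RETURN DISTINCT friend.name AS name",
        a="Alice",
    )
    for record in result:
        print(record["name"])

driver.close()
```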
Graph databases use graph structures to represent and store data, with nodes connected by edges. They are well-suited for interconnected data. Unlike relational databases, graph databases allow for flexible schemas and querying of relationships. Common uses of graph databases include social networks, knowledge graphs, and recommender systems.
d:swarm - A Library Data Management Platform Based on a Linked Open Data Approach - Jens Mittelbach
D:SWARM is a graphical web-based ETL modelling tool that imports data from heterogeneous sources in different formats, maps input to output schemata, lets users design transformation workflows, and loads the transformed data into a property graph database. It is developed in a collaborative project by SLUB Dresden (www.slub-dresden.de) and Avantgarde Labs GmbH (www.avantgarde-labs.de) and offers additional functionality such as exporting data models as RDF and sharing mappings and transformation workflows.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data - Sören Auer
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Linked data for Enterprise Data Integration - Sören Auer
The Web evolves into a Web of Data. In parallel Intranets of large companies will evolve into Data Intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a light-weight, adaptive data integration approach.
This document provides an overview of graph databases and Neo4j. It discusses how graph databases are better suited than relational databases for interconnected data and have simpler data models. Neo4j is highlighted as a graph database that uses nodes, edges and properties to represent data and uses the Cypher query language. It is fully ACID compliant, open source, and has a large active community.
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014) - Jan Polowinski
d:swarm is a middleware for data integration and management currently developed by the Saxon State and University Library Dresden in cooperation with Avantgarde Labs.
Towards digitizing scholarly communication - Sören Auer
Slides of the VIVO 2016 Conference keynote: Despite the availability of ubiquitous connectivity and information technology, scholarly communication has not changed much in the last hundred years: research findings are still encoded in and decoded from linear, static articles and the possibilities of digitization are rarely used. In this talk, we will discuss strategies for digitizing scholarly communication. This comprises in particular: the use of machine-readable, dynamic content; the description and interlinking of research artifacts using Linked Data; the crowd-sourcing of multilingual educational and learning content. We discuss the relation of these developments to research information systems and how they could become part of an open ecosystem for scholarly communication.
Semantic Graph Databases: The Evolution of Relational Databases - Cambridge Semantics
In this webinar, Barry Zane, our Vice President of Engineering, discusses the evolution of databases from Relational to Semantic Graph and the Anzo Graph Query Engine, the key element of scale in the Anzo Smart Data Lake. Based on elastic clustered, in-memory computing, the Anzo Graph Query Engine offers interactive ad hoc query and analytics on datasets with billions of triples. With this powerful layer over their data, end users can effect powerful analytic workflows in a self-service manner.
Analytics and Access to the UK web archive - Lewis Crawford
The document summarizes the background, purpose, and methods of the UK Web Archive. It discusses how the archive collects, stores, and provides access to snapshots of UK websites over time to preserve digital cultural heritage. It also describes challenges of scale due to the immense size of web content and techniques like full-text search and data analytics that are used to facilitate discovery of information within the archive.
MongoDB is a cross-platform document-oriented database that uses JSON-like documents with dynamic schemas, making it easier and faster to integrate data compared to traditional relational databases. It is developed by MongoDB Inc. and is open-source. MongoDB supports features like ad hoc queries, indexing, replication for high availability, automatic load balancing, and horizontal scalability. It is a popular choice for storing large datasets and powering modern applications.
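To make the document-model claims above concrete, here is a minimal pymongo sketch; the connection string and the demo database/collection names are assumptions.

```python
# A minimal sketch of MongoDB's document model with pymongo: dynamic-schema
# documents, a secondary index, and an ad hoc query. Connection string and
# the "demo" database/collection names are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
talks = client.demo.talks

# Documents in one collection need not share a schema.
talks.insert_one({"title": "Linked Data", "speaker": "Auer", "year": 2011})
talks.insert_one({"title": "MongoDB intro", "tags": ["nosql", "documents"]})

# Secondary index to support the ad hoc query below.
talks.create_index("speaker")

for doc in talks.find({"speaker": "Auer"}, {"_id": 0, "title": 1}):
    print(doc)
```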
ROI in Linking Content to CRM by Applying the Linked Data Stack - Martin Voigt
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are internally but also externally available in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
Choosing the Right Graph Database to Succeed in Your Project - Ontotext
The document discusses choosing the right graph database for projects. It describes Ontotext, a provider of graph database and semantic technology products. It outlines use cases for graph databases in areas like knowledge graphs, content management, and recommendations. The document then examines Ontotext's GraphDB semantic graph database product and how it can address key use cases. It provides guidance on choosing a GraphDB option based on project stage from learning to production.
Big data is characterized by large and complex datasets that are difficult to process using traditional software. These massive volumes of data, described by characteristics like volume, velocity, variety, veracity, value, and volatility, can provide insights to address business problems. Google Cloud Platform offers tools like Cloud Storage (with its range of storage classes), BigQuery, Pub/Sub, and Dataflow that can handle big data along these dimensions and help extract value from large and diverse datasets.
Linked Data Experiences at Springer Nature - Michele Pasin
An overview of how we're using semantic technologies at Springer Nature, and an introduction to our latest product: www.scigraph.com
(Keynote given at http://2016.semantics.cc/, Leipzig, Sept 2016)
DANS is an institute of KNAW and NWO that provides open data repository and analysis services through its DataverseNL platform. DataverseNL allows researchers to share, archive, and analyze research data. It provides features such as persistent identifiers, metadata harvesting, and tools for visualization, statistics, and linking datasets. DataverseNL aims to make data Findable, Accessible, Interoperable and Re-usable in accordance with FAIR data principles. It can also integrate with other data archives and repositories to preserve and provide access to research data.
The Power of Semantic Technologies to Explore Linked Open Data - Ontotext
Presentation by Atanas Kiryakov, Ontotext’s CEO, at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
This document discusses big data solutions and introduces Hadoop. It defines common big data problems related to volume, velocity, and variety of data. Traditional storage does not work well for this type of unstructured data. Hadoop provides solutions through HDFS for storage, MapReduce for processing, and additional tools like HBase, Pig, Hive, Zookeeper, and Spark to handle different data and analytic needs. Each tool is described briefly in terms of its purpose and how it works with Hadoop.
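The MapReduce model mentioned above can be illustrated without a cluster; the sketch below runs the classic word count locally in pure Python, with the same map -> shuffle -> reduce dataflow that Hadoop distributes over HDFS.

```python
# A local, pure-Python illustration of the MapReduce dataflow the Hadoop
# entry refers to. A real job would run distributed over HDFS, but the
# three phases are the same.
from collections import defaultdict

documents = ["big data big tools", "data tools for big data"]

# Map: emit (word, 1) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum the counts per word.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'big': 3, 'data': 3, 'tools': 2, 'for': 1}
```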
Real-time information analysis: social networks and open data - Data Science Society
Plamen Penev - Co-founder of Yatrus Analytics, graduated from the University of Essex (Political Science and IR). At Yatrus Analytics he works in the fields of NLP (Natural Language Processing), text mining, and graph analysis.
"Real-time information analysis: social networks and open data" will focus on the problem of real-time multisource data analytics, as well as the variety and the combination of many various data sources, the blending of those in real-time. The information discovery from Twitter and other sources as seen through the state of NLP (Event extraction, Classification of events, Sentiment analysis) in combination with financial and economic data.
Tweeting beyond Facts – The Need for a Linguistic Perspective - Data Science Society
The document discusses applying linguistic principles to natural language processing tasks. It argues that a trigger-scope approach to analyzing negation, modality, and speculative language has proven effective. The approach uses general linguistic modules as preprocessing before applying domain-specific models. Underappreciated linguistic elements like numbers, amounts, locations, and modifiers provide useful information for tasks. A suite of language-oriented preprocessing modules could improve downstream specialized processing by adapting general linguistic treatments to specific domains.
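To make the trigger-scope idea tangible, here is a deliberately naive sketch; the trigger list and the punctuation-bounded scope rule are simplifying assumptions, not the document's actual models.

```python
# A toy illustration of the trigger-scope approach to negation: detect a
# negation trigger and mark the tokens in its scope (here, naively, up to
# the next punctuation). Real systems use trained scope resolvers.
NEGATION_TRIGGERS = {"not", "no", "never", "without"}

def mark_negation_scope(tokens):
    """Return (token, in_scope) pairs from a naive trigger-scope analysis."""
    in_scope = False
    marked = []
    for tok in tokens:
        if tok in {",", ".", ";", "!", "?"}:
            in_scope = False          # punctuation closes the scope
        marked.append((tok, in_scope))
        if tok.lower() in NEGATION_TRIGGERS:
            in_scope = True           # tokens after the trigger are in scope
    return marked

sentence = "The drug did not reduce symptoms , but it was safe .".split()
for tok, neg in mark_negation_scope(sentence):
    print(f"{tok:10} {'NEG' if neg else ''}")
```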
Ivo Mitov – Co-founder of Data Fusion Bulgaria, software consultant in the area of EAI and Big Data
"Real-time analytics with HBase" is focused on the usage of coprocessor framework in HBase for event complex processing and simple analytics. The presentation will describe monitoring use case in the context of complex SOA environment.
Have you ever searched for a flight online? Do you wonder when and where you can get the best price for your travel plans? And, why are there different flight prices? Now, do you want to know why it is hard to do a meta-search engine for travel, and especially for flights?
Presentation provided by SkyScanner, a leading travel search site offering unbiased, comprehensive and free flight, hotel and car hire search services, used by over 40 million unique visitors every month. Skyscanner opened its office in Sofia in October 2014 and is quickly growing its team here to help solve complex travel problems and continually improve their product.
Boyan Yankov presents the real-life business case of VisagiSmile - cutting-edge dental software for personalized smile design. The case includes both 2D imaging (facial landmarks, face classification) and 3D modelling (3D teeth model generation) based on data mining techniques and algorithms. After the main talk, Boyan turns to the audience for ideas on solving the main challenge with the new product, Rebel. Dental: automating the mapping of 3D teeth models.
The document discusses using wavelet analysis techniques to identify hidden patterns in financial time series data. It presents the wavelet transform approach for filtering time series data and decomposing it into different time scales. Several examples are given showing how wavelet analysis can be used to study structural patterns in datasets like NASDAQ and for momentum-based trading strategies.
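A small sketch of the wavelet-filtering idea described above, using the PyWavelets library on a synthetic series; the wavelet family, level, and threshold are assumptions rather than the talk's settings.

```python
# A minimal sketch of wavelet filtering: decompose a noisy series into
# scales with a discrete wavelet transform, damp the finest details, and
# reconstruct a smoothed series. 'db4' and level=4 are assumptions.
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512)
series = np.sin(2 * np.pi * 5 * t) + 0.4 * rng.standard_normal(512)

# Multilevel decomposition: [approximation, detail_L, ..., detail_1].
coeffs = pywt.wavedec(series, "db4", level=4)

# Soft-threshold the detail coefficients (fast scales carry the noise).
threshold = 0.3
denoised_coeffs = [coeffs[0]] + [
    pywt.threshold(d, threshold, mode="soft") for d in coeffs[1:]
]
smoothed = pywt.waverec(denoised_coeffs, "db4")
print(smoothed.shape)  # same length as the input (up to boundary padding)
```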
1) DBpedia started in 2007 when Sören Auer extracted infobox data from Wikipedia pages into RDF triples and collaborated with others to publish the first version.
2) DBpedia has grown significantly since then and the 2014 version contains over 4.5 million entities and 583 million triples extracted from over 100 languages.
3) For DBpedia to continue evolving, areas of focus include fusing data from different sources, validating information, using natural language processing to extract more from Wikipedia text, and developing enterprise solutions to integrate DBpedia knowledge graphs.
The web of interlinked data and knowledge stripped - Sören Auer
Linked Data approaches can help solve enterprise information integration (EII) challenges by complementing text on web pages with structured, linked open data from different sources. This allows for intelligently combining, integrating, and joining structured information across heterogeneous systems. A distributed, iterative, bottom-up integration approach using Linked Data may help solve the EII problem in large companies by taking a pay-as-you-go approach.
The Europeana Strategy and Linked Open Data - David Haskiya
The document discusses Europeana's strategy for 2015-2020 and how linked open data and its technologies will help realize this strategy. Key points:
- Europeana's strategy is to transition from metadata to graphs and from strings to things by making data and APIs more linked and open.
- Linked open data allows data from different sources to be combined and helps make content more findable on search engines and in knowledge panels.
- Europeana labs provides APIs, tools, documentation and data to help partners publish linked open data that can be reused in the Europeana portal and other applications.
I'm an expert on building commercial large-scale systems based on Linked Data sources such as Freebase and DBpedia. I'm the creator of :BaseKB, which was the first correct conversion of Freebase to RDF, and of Infovore, the open source framework used to produce it.
I do consulting on the following areas:
* Data processing with Hadoop and the design and construction of systems using Amazon Web Services
* Architecture and construction of systems that consume and produce Linked Data
* Construction and evaluation of intelligent systems that make subjective decisions (text search, text classification, machine learning, etc.)
I'm not at all interested in doing maintenance work on other people's code, but I am interested in helping you align your process, structure, and tools to speed up your development cycle, improve your products, and prevent developer burnout. I am not free to relocate at this time, but I collaborate all of the time with workers around the world and I can travel to your location, understand your needs and transfer skills to your workforce.
IFLA LIDASIG Open Session 2017: Introduction to Linked Data - Lars G. Svensson
At the IFLA Linked Data Special Interest Group open session in Wroclaw we briefly introduced the mission of the SIG and then gave a short introduction to what linked data is and why the topic is important to libraries.
The presentation was held jointly by Astrid Verheusen (general introduction to the SIG) and Lars G. Svensson (introduction to Linked Data)
The document provides an overview of the work done at DERI Galway, including developing technologies like SIOC, ActiveRDF, and BrowseRDF to interconnect online communities and enable semantic applications. It also describes JeromeDL, a digital library system that uses semantic metadata and services to allow users to collaboratively browse and share knowledge.
The MarcOnt Initiative aims to:
1) Develop tools for collaborative ontology development, including a portal for editing ontologies and mediation services for translating between formats.
2) Create a bibliographic ontology called MarcOnt that captures concepts from legacy formats to improve interoperability between digital libraries.
3) Enable domain experts to improve the ontology through collaboration and knowledge sharing using the provided tools.
Geo-annotations in Semantic Digital Libraries - mdabrowski
The document discusses using geo-annotations and ontologies in digital libraries. It describes JeromeDL, a social semantic digital library that allows users to collaboratively annotate resources with metadata like geotags. It also describes the MarcOnt initiative which aims to develop tools for a collaborative ontology about bibliographic resources to improve interoperability between digital libraries and enable semantic search.
1. The document discusses big data concepts, architectures, and applications. It provides an overview of common big data storage models like HDFS and NoSQL databases.
2. It also describes computation frameworks like Apache Hadoop MapReduce and shows an example of how MapReduce processing works.
3. Finally, the document discusses popular big data use cases for organizations like recommendation systems, predictive analytics, sentiment analysis, and more to provide insights from large and diverse datasets.
The document summarizes the Meadville Public Library's transition to using open source software over time. It describes how the library started using open source tools like Linux routers and OpenBSD in the late 1990s and 2000s. It then migrated its integrated library system to Koha in the mid-2000s. The library also developed its own open source kiosk management software called Libki. The document outlines many common open source tools used by libraries and the benefits of using open source software, such as cost savings and community support.
This webinar will cover specific open source tools (some of which you may not have heard of before!) that work well for libraries and the benefits and challenges associated with their use. Meadville Public Library uses open source software on 90% of their public access computers.
Cindy Murdock Ames, IT Services Director and Kyle Hall, the library's on-staff developer, will share recommendations for libraries considering open source software and how to get started successfully. Cindy has been using open source software for over 10 years, which has allowed the library to save licensing costs and have more control over its computing environment. The library uses open source tools for their websites, e-mail, Internet firewall, wireless router, proxying, filtering, and productivity software. They use thin clients for Internet access and Koha for the circulation and public catalogs.
This presentation is from a TechSoup webinar. You can view the archive page (https://cc.readytalk.com/cc/schedule/display.do?udc=peinch14k2ix) for a recording and links to all of the many open source tools that were discussed. We had a lively conversation on our community forum (http://bit.ly/oslib) as Cindy and Kyle answered questions we didn't get to during the webinar.
This document discusses a presentation on multi-model data management. It introduces the topic of multi-model databases that can support multiple data models like document, graph, relational, and key-value models through a single integrated backend. The presentation covers topics like multi-model data storage, query languages, optimization, benchmarking, and open problems. Examples of multi-model databases like ArangoDB and OrientDB are provided.
The research group Agile Knowledge Engineering & Semantic Web (AKSW) was founded in 2006 and is now part of the Institute for Applied Informatics at the University of Leipzig. The AKSW aims to advance semantic web, knowledge engineering, and software engineering science and also bridges the gap between research results and applications. The AKSW team actively works on several funded projects involving knowledge management, semantic collaboration platforms, and applying semantic web technologies to applications like tourism information and requirements engineering.
The Standards Mosaic Opening the Way to New Technologies - Dave Lewis
Presents the mosaic of XML and linked data standards that can support the integration of future natural language technology for the localisation industry
ElasticSearch - index server used as a document database - Robert Lujo
Presentation held on 5.10.2014 on http://2014.webcampzg.org/talks/.
Although ElasticSearch's (ES) primary purpose is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database; more precisely, a document database.
Why this could be interesting and how this could be used effectively?
Talk overview:
- ES - history, background, philosophy, featureset overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieving
- a database should provide the following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to additionally use ES as a document database (store and retrieve)
- a use case will be presented where ES is used as the single database in the system (benefits and drawbacks)
- what happens if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)
ES is a nice, genuinely ready-to-use example that can change the perspective on developing some types of software systems.
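A minimal sketch of the store/retrieve/search pattern the talk describes, using the official Python Elasticsearch client; the host, index name, and 8.x-style keyword arguments are assumptions.

```python
# A minimal sketch of "ES as a document database": store, retrieve by id,
# and search. Host, index name, and the 8.x-style client API
# (document=/query= keyword arguments) are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Store: index a JSON document.
es.index(index="talks", id="1", document={
    "title": "ElasticSearch as a document database",
    "speaker": "Robert Lujo",
})

# Retrieve: fetch by primary key, like a document store.
doc = es.get(index="talks", id="1")
print(doc["_source"]["title"])

# Search: the part a classic document database lacks out of the box.
hits = es.search(index="talks", query={"match": {"title": "document"}})
print(hits["hits"]["total"])
```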
[Data Meetup] Data Science in Finance - Factor Models in Finance - Data Science Society
In this talk Metodi Nikolov, a Quantitative Researcher, reviews, without being exhaustive, the usage of factor models in finance – from the simplest single-factor linear regression models, through latent variables and beyond. The focus is not put solely on stocks; other data types are explored as well. The hope is to give the listeners an appreciation for the different ways the models can be applied.
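For readers who want the simplest case spelled out, below is a worked single-factor linear model on synthetic data; the factor and returns are simulated, not from the talk.

```python
# A worked version of the simplest case mentioned above: a single-factor
# linear model r_t = alpha + beta * f_t + eps_t, estimated by ordinary
# least squares. The synthetic data stands in for real returns.
import numpy as np

rng = np.random.default_rng(42)
factor = rng.standard_normal(250)                # e.g. market excess returns
true_alpha, true_beta = 0.001, 1.3
returns = true_alpha + true_beta * factor + 0.02 * rng.standard_normal(250)

# OLS via least squares on the design matrix [1, f_t].
X = np.column_stack([np.ones_like(factor), factor])
alpha_hat, beta_hat = np.linalg.lstsq(X, returns, rcond=None)[0]
print(f"alpha ~ {alpha_hat:.4f}, beta ~ {beta_hat:.3f}")
```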
[Data Meetup] Data Science in Finance - Building a Quant ML pipeline - Data Science Society
Georgi Kirov shares a common market-neutral statistical arbitrage framework. It will help showcase the many different ways to structure a systematic research project. From data reconciliation and signal backtesting to optimization and execution, what are some principled ways to evaluate and compare ML ideas? This process inevitably depends on the characteristics of a specific strategy, for instance, if it is liquidity-taking or liquidity-making.
[Data Meetup] Data Science in Journalism - Tanbih, QCRI and MIT - Data Science Society
Check out our Data Science Meetup devoted to Data Science in Journalism
Dr. Preslav Nakov, Principal Scientist at QCRI, presented the #Tanbih news aggregator, which makes people aware of what they are reading.
The aggregator features media profiles that show the general factuality of reporting, the degree of propagandistic content, the hyper-partisanship, the leading political ideology, the general frame of reporting, the stance with respect to various claims and topics, as well as the audience reach and the audience bias in social media. This is part of the Tanbih project, which is developed in collaboration with MIT.
Special thanks to our partners from #Ontotext, #Telelink and #Leanplum!
#DSS #DataMeetup
Vassil Lunchev, CEO of Homeheed (https://www.homeheed.com/) presented at our July Meetup how to detect fake listings using #ComputerVision and #MachineLearning.
Imagine that you have 600,000 real estate listings with a total of 5,000,000 photos, and you know that many of these listings are fake. In his presentation, Vassil shared some of the challenges of detecting the fake ones, including the approaches that work and those that do not. Apart from that, he presented what kind of additional data is necessary to detect the fakes.
Boyan Bonev and Demir Tonchev from Gaida.AI covered the process from the very initial concept to working software. The focus of their talk at our July Meetup was on the challenges the domain of real estate presents for some of the standard approaches and models (#CollaborativeFiltering).
In the presentation, you can find information about everything from #DataExploration and #modeling to the nitty-gritty of getting it all up and running in a production environment.
Demir and Boyan shared lessons they learned, mistakes they made and things they are still looking to improve in Gaida.AI (https://www.gaida.ai/).
Lessons Learned: Linked Open Data implemented in 2 Use Cases - Data Science Society
In this presentation for the ESSnet Linked Open Statistics final event, Sergi Segiev presents the lessons learned from two implemented use cases that use open data to find valuable insights.
You can also refer to the presentation 'Data Reveals Corruption Practices' by Yasen Kiprov - http://bit.ly/2WsFxsP
The presentation on AI methods for localization in a noisy environment, held by Ana Antonova and Kameliya Kosekova, was given at Robotics Days '19.
In the next slides, you can find information on techniques for robot localization in more detail, along with several GitHub repos on the topic.
Team Nishki, consisting of 11th-graders, presents a hackathon ML solution to a Kaufland Airmap case, for which they won a Datathon special award.
Used methodologies and algorithms: OCR, DarkFlow, YOLO
The solution can be found at:
https://www.datasciencesociety.net/datathon/kaufland-case-datathon-2019/
Team: Evgeni Dimov, Kalin Doichev, Kostadin Kostadinov and Aneta Tsvetkova
Data Science for Open Innovation in SMEs and Large Corporations - Data Science Society
Latest trends in Data Science and why the open-source culture and open innovation are expanding so fast. Find out more about the Data Science Society, its latest activities and how it cooperates with different local communities around the world to stimulate new forms of education. At the end of the presentation are the results of two business cases from a telecom company (SNA) and a German retailer (object detection), which were solved during the Data Science Society’s hackathons (Global Datathons).
Air Pollution in Sofia - Solution through Data Science by Kiwi team - Data Science Society
Some of you already know how serious the air pollution problem is in Sofia, the capital of Bulgaria, but ...
▶️Do you know how it could be solved?
Our community, represented by 1,800 members all around the world, tried to tackle the issue at our previous #GlobalDatathon and our international #DataScience #MonthlyChallenge, part of a university program.
Team Kiwi is solving the problem by implementing algorithms and statistical methods for air pollution prediction in the next 24 hours.
This document discusses using machine learning for light curve analysis in astrophysics. Specifically, it discusses using supervised learning on light curve data to detect exoplanets via the transit method. A deep neural network can be trained on light curve data to distinguish true planetary transits from other events like eclipsing binaries or stellar variability. This trained model was then used to analyze new candidates from the Kepler space telescope, resulting in the confirmation of planets like Kepler 80 g and the eight planet system around Kepler 90.
#AcademiaDatathon Finalists' Solution of Crypto Datathon Case - Data Science Society
Team UNWE, one of the finalists from #AcademiaDatathon, presents their solution to the #cryptocurrency data case. Explore how to perform data modeling with ARIMA and a neural network; a minimal ARIMA sketch follows the links below.
To learn more visit: https://bit.ly/2uhfF37
Video from the presentation: https://bit.ly/2LlaeYd
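As referenced above, a minimal ARIMA sketch with statsmodels; the (1,1,1) order and the synthetic series are assumptions, not the team's configuration.

```python
# A minimal ARIMA sketch: fit an ARIMA(1,1,1) to a synthetic price-like
# series and forecast a few steps ahead. Order and data are assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
prices = np.cumsum(rng.standard_normal(300)) + 100  # random-walk-like series

model = ARIMA(prices, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=5))  # next 5 predicted values
```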
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018 - Data Science Society
The whole NLP Data Science solution @ https://goo.gl/iEFb1L
Syntactic parsing, or dependency parsing, is the task of recognizing a sentence and assigning a syntactic structure to it. The most widely used syntactic structure is the parse tree, which can be generated using parsing algorithms. These parse trees are useful in various applications like grammar checking and, more importantly, play a critical role in the semantic analysis stage. For example, to answer the question “Who is the point guard for the LA Lakers in the next game?” we need to identify its subject, objects, and attributes to work out that the user wants the point guard of the LA Lakers specifically for the next game. This was mostly the identification and extraction NLP task for team Coala at the First Global Online Datathon.
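Since the example hinges on reading subjects and objects off a parse, here is a short dependency-parsing sketch of that very question using spaCy (assumes the en_core_web_sm model is installed); this illustrates the technique, not team Coala's actual pipeline.

```python
# Dependency parsing of the abstract's example question with spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who is the point guard for the LA Lakers in the next game?")

# Each token points to its syntactic head with a labelled dependency arc,
# which is what lets downstream code find subjects, objects and modifiers.
for token in doc:
    print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")
```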
DNA Analytics - What Really Goes into Sausages - Datathon 2018 Solution - Data Science Society
Link to whole Data Science solution: https://goo.gl/nY3iuE
The task for the Telelink case of the First Global Datathon 2018 is to obtain the complete set of genome traces found in a single food sample and ALL organisms that should not be found in the food sample. The business needs a solution to this DNA sequence identification case for improved quality control, to be utilized in supply chain supervision and in health care and protection.
- by Polina Krustanova
Relationships between research tasks and data structure (basic methods and a...) - Data Science Society
This document discusses various statistical methods for analyzing different data structures, including descriptive statistics, segmentation, dimension reduction, and measures of association. For each statistical method, it provides the appropriate data scale and examples applying the method to real world datasets. Statistical methods are matched to the type of research task and data structure, whether involving only variables, dependent and independent variables, or time series analysis. Pros and cons are outlined for several predictive analytic techniques.
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro) - Rebecca Bilbro
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science in the 2010s, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
Open Source Contributions to Postgres: The Basics, POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
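A compact sketch contrasting the two classifier families listed above on the standard Iris dataset, using scikit-learn:

```python
# Generative vs. discriminative classifiers: Gaussian Naive Bayes applies
# Bayes' rule over per-class Gaussians; the CART decision tree greedily
# splits on Gini impurity, with max_depth as a regularization hyperparameter.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X_tr, y_tr)

print("Naive Bayes accuracy:", nb.score(X_te, y_te))
print("Decision tree accuracy:", tree.score(X_te, y_te))
print("Estimated class probabilities:", tree.predict_proba(X_te[:1]))
```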
3. Get me all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
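One plausible SPARQL rendering of this query against the public DBpedia endpoint is sketched below via SPARQLWrapper; the ontology terms used (dbo:team, dbo:ground, dbo:seatingCapacity, dbo:populationTotal) are educated guesses, and the live data may model these facts differently.

```python
# A sketch of the slide's natural-language query in SPARQL. The property
# choices are assumptions about the DBpedia ontology, not the deck's
# verbatim query.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?player WHERE {
  ?player a dbo:SoccerPlayer ;
          dbo:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
          dbo:team ?club ;
          dbo:birthPlace ?country .
  ?club dbo:ground ?stadium .
  ?stadium dbo:seatingCapacity ?seats .
  ?country a dbo:Country ;
           dbo:populationTotal ?population .
  FILTER (?seats > 40000 && ?population > 10000000)
}
LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["player"]["value"])
```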
6. How it all started
- 2006 - Sören Auer (busy with his PhD) asking people: “Wikipedia fact tables look like triples, don’t you want to write some extractor?”
- 6 months later: Sören wrote the extractor himself and asked Jens Lehmann to help with writing a paper
- Chris Bizer: “We are extracting people and place information from Wikipedia too – let’s join efforts and call it DBpedia.”
- Kingsley Idehen: “I need a showcase for my Virtuoso triple store.”
8. Taking a closer look at heterogeneity…
- DBpedia Mappings wiki
9. Milestones
- 2008: DBpedia Live
- 2009: Scala-Based framework
- 2009: Mappings wiki
- 2011: Internationalization
- 2011: DBpedia Spotlight
- 2014: DBpedia Association
10. Now
DBpedia 2014 (English): 4.58 million entities and 583 million triples
- 131.2 million fact assertions (derived from infoboxes)
- 168.5 million triples representing Wikipedia structure
- 57.1 million links to external datasets
Localized DBpedia versions for 125 languages, built from corresponding Wikipedia versions
12 DBpedia language chapters
16. NLP
- Exploit the text…
- Let different NLP tools & approaches compete for the best quality (in a certain language)
- Need to define the interface (help needed)
17. Every Enterprise needs its DBpedia
- Represent common sense knowledge (DBpedia and other LOD datasets) as well as the specific enterprise knowledge
- Crystallization points for Linked Data intranets – an addition to SOA facilitating enterprise-wide data linking & integration
- Slicing & Dicing