Averbis provides text mining and natural language processing solutions, including terminology management, text classification, and information extraction, for domains such as life sciences, automotive, and publishing. The European Patent Office awarded Averbis a contract to develop an automated patent classification system that uses text mining to classify more than 80 million published and unpublished patents into over 250,000 patent classification codes. The new system aimed to improve classification quality and speed compared to the previous manual process.
Metadata for Terminology / KOS Resources - Marcia Zeng
1. Why do we need metadata for terminology resources? 2. What do we need to know about a terminology resource? 3. Is there a standardized set of metadata elements for terminology resources? -- A presentation at "New Dimensions in Knowledge Organization Systems", a Joint NKOS/CENDI Workshop, World Bank, Washington, DC, September 11, 2008. http://nkos.slis.kent.edu/2008workshop/NKOS-CENDI2008.htm
The state of KOS in the Linked Data movement - Marcia Zeng
- The publishing, management, and interoperation of KOS for the Semantic Web.
Content: 1. Value vocabularies in the Linked Data Hub – CKAN The Data Hub
2. Semantic assets registries
2a. Asset Description Metadata Schema (ADMS)
2b. DCMI Application Profile for KOS Resource
3. Thesaurus data model (ISO 25964) and alignment with SKOS
(presented at ASIS&T 2012 Annual Conference)
Presentation: Big Data – From Strategy to Production - Mario Meir-Huber, Big Data Leader Eastern Europe, Teradata GmbH (AT), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna
Beyond a Big Data Pilot: Building a Production Data Infrastructure - StampedeCon
At StampedeCon 2014, Stephen O’Sullivan (Silicon Valley Data Science) presented "Beyond a Big Data Pilot: Building a Production Data Infrastructure."
Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive, and real-time analytical workloads.
By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including data acquisition, ingestion, storage, data services, analytics and data management. Most importantly, we’ll leave you with a framework for understanding these options and making choices.
Presentation: (Open) Data Activities in the City of Vienna, Georg Sedlbauer, Vienna Business Agency (AT), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna.
How to provide semantic capability to support the Australian population in their time of need?
Find out how Healthdirect Australia makes use of semantic technologies to improve findability on its information portal.
Main challenges:
- multiple websites that draw their content from multiple sources and are published across multiple devices.
- the "healthcare" domain is a confusing topic for many, filled with Latin names for diseases and industry jargon making it difficult for people to access and understand information
- accurate, automatic content classification
- support the publishing workflow for both edited and harvested/ingested/syndicated content
In the age of Big Data, filtering mechanisms have to be professionalized to increase the accessibility of data. This presentation, held at the Knowledge Management Academy in Vienna, shows how technologies derived from the Semantic Web can help to establish more efficient means to manage data and information.
Presentation: BigDataEurope, by Martin Kaltenböck, Semantic Web Company (Austria), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna
Presentation: Data Activities in Austria, Lisbeth Mosnik, BMVIT (AT), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna.
Presentation: Study: #Big Data in #Austria, Mario Meir-Huber, Big Data Leader Eastern Europe, Teradata GmbH & Martin Köhler, Austrian Institute of Technology, AIT (AT), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna.
This talk was given at the International Semantic Web Conference (ISWC 2014).
We discuss how SKOS is a starting point for developing an enterprise linked data strategy. We show how taxonomies can be extended by ontologies and linked open data.
Establishing a Linked Data Warehouse builds the basis for unified views on various information sources.
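To make this concrete, here is a minimal sketch (illustrative, not taken from the talk) using Python's rdflib: a taxonomy concept modelled in SKOS is extended with an ontology statement and mapped to a linked open data resource. The namespace and URIs are assumptions for the example.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, SKOS

EX = Namespace("http://example.org/taxonomy/")  # hypothetical enterprise namespace
g = Graph()

# A plain taxonomy concept expressed in SKOS
g.add((EX.SemanticWeb, RDF.type, SKOS.Concept))
g.add((EX.SemanticWeb, SKOS.prefLabel, Literal("Semantic Web", lang="en")))

# Extended by an ontology: the concept is also declared as an OWL class
g.add((EX.SemanticWeb, RDF.type, OWL.Class))

# Linked open data: mapping to the corresponding DBpedia resource
g.add((EX.SemanticWeb, SKOS.exactMatch,
       URIRef("http://dbpedia.org/resource/Semantic_Web")))

print(g.serialize(format="turtle"))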
Presentation: ODINE - Open Data Incubator Europe, by Elena Simperl, University of Southampton & The ODI (UK), at the European Data Economy Workshop held back to back with SEMANTiCS2015 on 15 September 2015 in Vienna
See how the Simple Knowledge Organisation System (SKOS) can help to improve information management in various industries. The application scenarios are manifold; learn from real-world use cases.
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar) - Hortonworks
No matter if you are new to Hadoop or have a mature cluster in production, scale will be a critical factor of your success with Hadoop. Are you ready to take the next big step as you scale out your data architecture?
Talend and Hortonworks discuss how to implement an effective big data and Hadoop strategy across your IT infrastructure. You will learn:
How to grow a pilot into production
How to scale-out architecture & systems affordably
How to leverage the flexibility of Hadoop to optimize your data integration processes
Recording: http://www.talend.com/resources/webinars/starting-small-and-scaling-big-with-hadoop
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling - Semantic Web Company
See how ontologies and taxonomies can play together to reach the ultimate goal, which is the cost-efficient creation and maintenance of an enterprise knowledge graph. The knowledge modelling methodology is supported by approaches taken from NLP, data science, and machine learning.
Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I'll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems, and hopefully set off light bulbs in your head about how big data can help your organization make better business decisions.
II-SDV 2017: Localizing International Content for Search, Data Mining and Analytics - Dr. Haxel Consult
Advances in text mining, analytics, and machine learning are transforming our platforms and enabling ever more powerful applications, yet most applications and platforms are designed to deal with a single (normalized) language. Hence, as our applications and platforms are increasingly required to ingest international content, the challenge becomes finding ways to normalize content to a single language without compromising quality. An extension of this question is how we define quality in this context and what, if any, by-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively. It will show what current emerging technology offers, what to expect and what not to expect, and provide an introductory practical guide to handling localization in the context of data mining and analytics.
Registry types, Synergies and Differences (Data registry, metadata registry, terminology registry, ...). Talk at the NKOS Special Session, International Conference on Dublin Core and Metadata Applications, Berlin, 22-26 September 2008.
Semantic Web in Action: Ontology-driven information search, integration and analysis - Amit Sheth
Amit Sheth's Keynote talk given at: “Semantic Web in Action: Ontology-driven information search, integration and analysis,” Net Object Days 2003 and MATES03, Erfurt, Germany, September 23, 2003. http://knoesis.org
Note: slides 51-55 have audio.
My presentation at http://neuroinformatics2017.org (Kuala Lumpur, Malaysia) on FAIR and FAIRsharing (previously BioSharing): metadata standards and their implementation by databases/repositories, and their adoption by journals' and funders' data policies.
Autonomous medical coding with discriminative transformers - Patrick Nicolas
Application of transformers and deep learning to the extraction of medical codes and insurance claims from electronic health records. This presentation lists modeling challenges and pitfalls, analyzes various configurations of the BERT encoder, and compares pre-training and fine-tuning techniques in the context of classification.
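For orientation, a minimal sketch of the kind of set-up analyzed, using the Hugging Face Transformers library; the model name, number of labels, and example note are illustrative assumptions rather than the presenter's configuration.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical set-up: a BERT encoder with a classification head for medical codes
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=50)  # e.g. 50 candidate medical codes

note = "Patient presents with type 2 diabetes mellitus without complications."
inputs = tokenizer(note, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_code_index = int(logits.argmax(dim=-1))  # index into the code vocabulary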
An all-day version of Access Innovations' Taxonomy Fundamentals workshop, presented by Marjorie M.K. Hlava and Bob Kasenchak at the 2014 Special Libraries Association (SLA) annual meeting in Vancouver, British Columbia on June 7, 2014.
Presentation to the ImmPort Science Meeting, February 27, 2014, on the proper treatment of value sets in the ImmPort Immunology Database and Analysis Portal
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project - Intland Software GmbH
This talk was presented by Jernej Plankar (Cosylab) at Intland Connect: Annual User Conference 2020 on 21 Oct 2020. To learn more, visit: https://intland.com/intland-connect-annual-user-conference-2020/
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business Efficiency - Semantic Web Company
Organising data, for most of us, means Excel spreadsheets and folders upon folders. Knowledge graph technology, however, organises data in ways similar to the brain – through context and relations. By connecting your data, you (and also machines) are able to gain context within your knowledge, helping you to make informed decisions based on all of the information you already have.
So, how can enterprises benefit from this and scale?
PwC Sr. Research Fellow for Emerging Tech, Alan Morrison, and Sebastian Gabler, Head of Sales at Semantic Web Company, tackle the importance of Enterprise Knowledge Graphs and how these technologies scale business efficiency.
Learn about:
• Moving from application-centric development to data-centric approaches
• How enterprise architects can benefit from knowledge graphs: use cases
• Which use cases fit which type of graph, and which technologies are involved
• How RDF helps with data integration
• What AI-assisted entity linking is
• Data virtualisation vs. materialisation
- Learn to understand what knowledge graphs are for
- Understand the structure of knowledge graphs (and how it relates to taxonomies and ontologies)
- Understand how knowledge graphs can be created using manual, semi-automatic, and fully automatic methods.
- Understand knowledge graphs as a basis for data integration in companies
- Understand knowledge graphs as tools for data governance and data quality management
- Implement and further develop knowledge graphs in companies
- Query and visualize knowledge graphs (including a SPARQL and SHACL crash course); see the sketch after this list
- Use knowledge graphs and machine learning to enable information retrieval, text mining and document classification with the highest precision
- Develop digital assistants and question and answer systems based on semantic knowledge graphs
- Understand how knowledge graphs can be combined with text mining and machine learning techniques
- Apply knowledge graphs in practice: Case studies and demo applications
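As a taste of the SPARQL part, a minimal sketch (assuming Python's rdflib and an illustrative Turtle export) that lists concepts and their preferred labels from a SKOS-based knowledge graph:

from rdflib import Graph

g = Graph()
g.parse("enterprise-knowledge-graph.ttl", format="turtle")  # hypothetical file

# List concepts together with their SKOS preferred labels
query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10
"""
for concept, label in g.query(query):
    print(concept, label)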
Deep Text Analytics - How to extract hidden information and aboutness from text - Semantic Web Company
- Deep Text Analytics (DTA) is an application of Semantic AI
- DTA fuses methods and algorithms from language modeling, corpus linguistics, machine learning, knowledge representation, and the Semantic Web into Deep Text Analytics methods
- The main use case areas for DTA are information retrieval, NLU, question answering, and recommender systems
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System - Semantic Web Company
Knowledge graphs and graph-based data in general are becoming increasingly important for addressing various data management challenges in industries such as financial services, life sciences, healthcare or energy.
At the core of this challenge is the comprehensive management of graph-based data, ranging from taxonomy to ontology management to the administration of comprehensive data graphs along with a defined governance framework. Various data sources are integrated and linked (semi) automatically using NLP and machine learning algorithms. Tools for securing high data quality and consistency are an integral part of such a platform.
PoolParty 7.0 can now handle a full range of enterprise data management tasks. Based on agile data integration, machine learning and text mining, or ontology-based data analysis, applications are developed that allow knowledge workers, marketers, analysts or researchers a comprehensive and in-depth view of previously unlinked data assets.
At the heart of the new release is the PoolParty GraphEditor, which complements the Taxonomy, Thesaurus, and Ontology Manager components that have been around for some time. All in all, data engineers and subject matter experts can now conveniently administer and analyze enterprise-wide, heterogeneous data stocks, or link them with the help of artificial intelligence.
Unified views of business-critical information across all customer-facing processes and HR-related tasks are most relevant for decision makers.
In this talk we present a SharePoint extension that supports the automatic linking of unstructured content like Word documents with structured information from other databases, such as statistical data. As a result, decision makers have knowledge portals based on linked data at their fingertips.
While the importance of managed metadata and Term Store is clear to most SharePoint architects, the significance of a semantic layer outside of the content silos has not yet been explored systematically.
We will present a four-layered content architecture and will take a close look on some of the aspects of the semantic layer and its integration with SharePoint:
- Keeping Term Store and the semantic layer in sync
- Automatic tagging of SharePoint content
- Use of graph databases to store tags
- Entity-centric search & analytics applications
Metadata is most often stored per data source, and therefore it is meaningless outside of the silo. In this presentation, we will give a live demo of a SharePoint extension that makes use of an explicit semantic layer based on standards. This approach builds the basis to start linking data across the silos in a most agile way.
The resulting knowledge graph can start on a small scale, develop continuously, and grow with the requirements. In this presentation we give an example to illustrate how initially disconnected HR-related data (CVs in SharePoint; statistical data from the labour market; skills and competencies taxonomies; salary spreadsheets) gets linked automatically and is then made available through an extensive search & analytics application.
Slides based on a workshop held at SEMANTiCS 2018 in Vienna. Introduces a methodology for knowledge graph management based on Semantic Web standards, ranging from taxonomies over ontologies, mappings, graph and entity linking. Further topics covered: Semantic AI and machine learning, text mining, and semantic search.
Semantic Artificial Intelligence is the fusion of various types of AI, including symbolic AI, reasoning, and machine learning techniques like deep learning. At the same time, Semantic AI has a strong focus on data management and data governance. With the 'wedding' of various AI techniques, new promises are made, and fundamental approaches such as Explainable AI (XAI), knowledge graphs, and Linked Data come more strongly into focus.
Bringing Machine Learning and Knowledge Graphs Together
Six Core Aspects of Semantic AI:
- Hybrid Approach
- Data Quality
- Data as a Service
- Structured Data Meets Text
- No Black-box
- Towards Self-optimizing Machines
The PoolParty Semantic Classifier is a component of the Semantic Suite, which makes use of machine learning in combination with Knowledge Graphs.
We discuss the potential of the fusion of machine learning, neural networks, and knowledge graphs based on use cases and this concrete technology offering.
We introduce the term 'Semantic AI' that refers to the combined usage of various AI methods.
Machines learn better with Semantics!
See how taxonomy management and the maintenance of knowledge graphs benefit from machine learning and corpus analysis, and how, in return, machine learning gets improved when using semantic knowledge models for further enrichment.
A quick introduction to taxonomies and how they relate to ontologies and knowledge graphs. See how they can serve as part of a semantic layer in your information architecture. Learn which use cases can be developed based on this.
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics - Semantic Web Company
See how Cognitive Search works when based on Semantic Knowledge Graphs.
We showcase the latest developments and new features of PoolParty GraphSearch:
- Navigate a semantic knowledge graph
- Ontology-based data access (OBDA)
- Search over various search spaces: Ontology-driven facets including hierarchies
- Sophisticated autocomplete including context information
- Custom views on entity-centric and document-centric search results
- Linked data: put various tagging services such as TRIT or PoolParty Extractor in series and benefit from comprehensive semantic enrichment
- Statistical charts to explain results from unified data repositories quickly
- Plug-in system for various recommendation and matchmaking algorithms
This talk discusses how companies can apply semantic technologies to build cognitive applications. It examines the role of semantic technologies within the larger Artificial Intelligence (AI) technology ecosystem, with the aim of raising awareness of different solution approaches.
To succeed in a digital and increasingly self-service-oriented business environment, companies can no longer rely solely on IT professionals. Solutions like the PoolParty Semantic Suite enable domain experts and business users to shape the cognitive intelligence of knowledge-driven applications.
Cognitive solutions essentially mimic how the human brain works. The search for cognitive solutions has challenged computer scientists for more than six decades. The research has matured to the extent that it has moved out of the laboratory and is now being applied in a range of knowledge-intensive industries.
There is no such thing as a single, all-encompassing "AI technology." Rather, the large global professional technology community and software vendors are continuously developing a broad set of methods and tools for natural language processing and advanced data analytics. They are creating a growing library of machine learning algorithms to enhance the automated learning capabilities of computer systems. These emerging technologies need to be customized or combined with complementary solutions such as semantic knowledge graphs, depending on the use case.
A hybrid approach to cognitive computing, employing both statistical and knowledge-based models, will have a critical influence on the development of applications. Highly automated data processing based on sophisticated machine-learning algorithms must give end users the option to independently modify the functioning of smart applications in order to overcome the disadvantages associated with 'black-box' approaches.
This talk will give an overview over state-of-the-art smart applications, which are becoming a fusion of search, recommendation, and question-answer machines. We will cover specific use cases in focused knowledge domains, and we will discuss how this approach allows for AI-enabled use cases and application scenarios that are currently highly prioritized by corporate and digital business players.
In this engaging, 1-hour webinar (hosted by http://www.poolparty.biz and http://www.mekon.com), you will learn how to tailor information chunks to readers’ unique needs. We will talk about:
- Benefits and principles of granular structured content, and how to start preparing your own content for this new architecture.
- Best practices for linking structured content to standards-based taxonomies, and some pitfalls to avoid
- The underlying semantic architecture that you can work toward for a truly mature and scalable approach to linking content and data
- Key use cases that you can apply to your own organization
See how you can configure your linked data eco-system based on PoolParty's semantic middleware configurator. Benefit from Shadow Concept Extraction by making implicit knowledge visible. Combine knowledge graphs with machine learning and integrate semantics into your enterprise information systems.
Technical Deep Dive: Learn more about the most complete Semantic Middleware on the market. See how to integrate semantic services into your Enterprise Information Systems.
This talk addresses two questions: “How can the quality of taxonomies be defined?” and “How can it be measured?” See how quality criteria vary depending on how a taxonomy is applied, such as automatic content classification in ecommerce or a knowledge graph for data integration in enterprises. Distinguish between formal quality, structural properties, content coverage, and network topology. Investigate the advantages of standards-based and machine-processable SKOS taxonomies to be able to measure the quality of taxonomies automatically, as well as several tools and techniques for quality assessment.
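To make the 'machine-processable SKOS' point concrete, here is a minimal sketch (illustrative; not the talk's actual metrics) that computes two simple formal-quality indicators, concepts without a preferred label and orphan concepts, using Python's rdflib:

from rdflib import Graph
from rdflib.namespace import RDF, SKOS

g = Graph()
g.parse("taxonomy.ttl", format="turtle")  # hypothetical SKOS export

concepts = set(g.subjects(RDF.type, SKOS.Concept))
missing_label = [c for c in concepts if (c, SKOS.prefLabel, None) not in g]
orphans = [c for c in concepts
           if (c, SKOS.broader, None) not in g
           and (None, SKOS.broader, c) not in g]

print(f"{len(concepts)} concepts, {len(missing_label)} without prefLabel, "
      f"{len(orphans)} orphan concepts")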
Consistency is crucial to a good user experience. Designers go to great lengths to create and test consistent visual designs. The structural design of an information environment, which is of equal importance to a good user experience, is too often ignored. Blumauer presents a “four-layered content architecture” for making sense of any information environment by clearly distinguishing between the content, metadata, and semantic layers and the navigation logic. He discusses several use cases for a taxonomy-driven user experience such as personalization or dynamically created topic pages.
PoolParty Semantic Suite 5.5 was released in August 2016. Further integrations, such as with Elasticsearch and Stardog, strengthen PoolParty's position as the leading semantic middleware in the cognitive computing market. Knowledge engineers and users benefit from an even more sophisticated combination of semantic computing and machine learning. The new features support context-aware knowledge modelling and include an extended data quality management module.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group ("MCG") expects demand to keep growing and supply to evolve, facilitated by institutional investment rotating out of offices and into work from home ("WFH"), while the need for data storage expands alongside global internet usage, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next four years.
While competitive headwinds remain, illustrated by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow to more than 3.6x their current value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus can also reduce iteration time. Road networks often contain chains that can be short-circuited before the PageRank computation to improve performance, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
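For illustration, a minimal PageRank sketch in Python (not the STICD implementation) that applies just one of the optimizations above, skipping vertices whose rank has already converged; the graph representation and tolerances are assumptions.

def pagerank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """graph: dict mapping each vertex to the list of its out-links."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in graph:
            if v in converged:                 # skip already-converged vertices
                new_rank[v] = rank[v]
                continue
            incoming = sum(rank[u] / len(graph[u])
                           for u in graph if v in graph[u])
            new_rank[v] = (1 - damping) / n + damping * incoming
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:                # everything converged: stop early
            break
    return rank

ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})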
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
5. Synonyms: dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina
Hierarchies: cancer, carcinoma, melanoma, lymphoma, glioblastoma…
Patterns: dates, citations, mail addresses…
Rule-based extraction of all kinds of complex information: persons, locations, genes, …
Co-occurrences and typed relations, e.g. gene / disease / modification type
TEXT MINING
- Term Detection
- Regular Expressions
- Rule Engine
- Named Entities
- Relations
- Syntax Detection: sentences, tokens, POS tags, chunks, paragraphs, sections, stemming, decompounding…
6. RULE ENGINE
Example input (section "1. NAME OF THE MEDICINAL PRODUCT"): Desloratadine ratiopharm 5 mg film-coated tablets
Text rule result (primary field: MedicalProductName):
- coveredText: Desloratadine ratiopharm 5 mg film-coated tablets
- inventedPartName: DESLORATADINE
- strengthPart: 5 mg
- pharmaceuticalDoseFormPart: FILM-COATED TABLET
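A minimal sketch of this kind of rule-based field extraction, written as a single regular-expression rule in Python; the field names mirror the slide, but the pattern itself is an illustrative assumption rather than Averbis' actual rule syntax.

import re

# Hypothetical rule decomposing a medicinal product name into its parts
PRODUCT_RULE = re.compile(
    r"(?P<inventedPartName>\w+)\s+(?P<companyPart>\w+)\s+"
    r"(?P<strengthPart>\d+\s*mg)\s+(?P<pharmaceuticalDoseFormPart>.+)",
    re.IGNORECASE)

text = "Desloratadine ratiopharm 5 mg film-coated tablets"
match = PRODUCT_RULE.match(text)
if match:
    print(match.groupdict())
    # {'inventedPartName': 'Desloratadine', 'companyPart': 'ratiopharm',
    #  'strengthPart': '5 mg', 'pharmaceuticalDoseFormPart': 'film-coated tablets'}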
7. SEARCH & NOSQL
- Free text + concept-based search
- Text mining integration
- Guided navigation / facets
- NoSQL functionalities
- Multi- & cross-lingual search
- Related documents
Based on Apache Solr:
• Extended query syntax
• JSON API
• Scalability
• …
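Since the stack is based on Apache Solr, a query could look roughly like the following sketch using Solr's JSON Request API; the host, collection, and field names are illustrative assumptions.

import json
import urllib.request

request_body = {
    "query": "content:desloratadine",            # free-text part of the query
    "filter": ["language:en"],
    "facet": {"diseases": {"type": "terms",      # guided navigation / facets
                           "field": "disease_concepts"}},
}
req = urllib.request.Request(
    "http://localhost:8983/solr/documents/select",  # hypothetical collection
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as response:
    results = json.load(response)
print(results["response"]["numFound"])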
10. INFORMATION DISCOVERY
Components: Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
- Delivery / Deployment / Runtime Environment
- Integration Tests / Continuous Integration
- Extensive Documentation
- Common Architecture / Application Design
- User & Role Management, Security
- Communication Bus
- Project Management
11. PATENT CLASSIFICATION AT EPO (Tender No. 1585)
1) Pre-classification of unpublished patents into departments
2) Re-classification of published patents if the category system changes
12. ABOUT EPO
• The European Patent Office (EPO) grants European patents for the Contracting States to the European Patent Convention
• Second largest intergovernmental institution in Europe
• Not an EU institution
• Self-financing, i.e. revenue from fees covers operating and capital expenditure
16. COOPERATIVE PATENT CLASSIFICATION
• Patent classification system based on ECLA / IPC
• Jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO)
• Used by both the EPO and the USPTO since 1 January 2013
• Currently contains about 250,000 classes
22. PATENT CLASSIFICATION AT EPO (Tender No. 1585)
1) Pre-classification of unpublished patents into departments
Our motivation:
• A great classification use case
– Big data (80 million patents available)
– Large-scale category system: >250,000 CPC codes
– Tough classification quality and response time constraints
• A text mining success story
27. SOME FACTS
• About 650k training documents from 2005-2013
• Supervised learning: light-weight and fast linear support vector machine
• Training time (16 cores, 128 GB RAM):
– Feature extraction: ~1 hour
– Training of classifiers: ~1 hour
– 90/10 tests with a look-ahead of 3 levels, reporting the 3 best candidates: ~1 hour
• Prediction: 5 docs in 5 sec
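For orientation, a minimal sketch of this kind of supervised set-up (a light-weight linear SVM over TF-IDF features) using scikit-learn; the documents, CPC codes, and parameters are illustrative assumptions, not the production pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative training set: patent abstracts with simplified CPC codes
train_texts = ["A rotor blade for a wind turbine with reduced noise emission.",
               "A pharmaceutical composition comprising desloratadine."]
train_codes = ["F03D", "A61K"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_codes)

print(classifier.predict(["A turbine blade with improved aerodynamics"]))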
29. STATUS & OUTLOOK
• Range-specific quality evaluation
• Going live with the best ranges
• Continuous optimization
30. PATENT CLASSIFICATION AT EPO (Tender No. 1585)
2) Re-classification of published patents if the category system changes
Challenges and facts:
– 250,000 CPC codes, regular changes/refinements
– Several re-classification projects at any one time, with great variation in size; a class is split into 5-20(?) subclasses
– No training material available
31. NEW RE-CLASSIFICATION PROCESS
Training data
• A human annotator starts by labeling about 20% of the documents with new subclasses
Statistical models
• Models are generated on the fly, and cross-validation tests are carried out
Threshold
• If cross-validation reaches a certain threshold (e.g. 90%), the remaining documents are classified fully automatically without further review
• Otherwise, more training data is generated
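A minimal sketch of this threshold-driven loop (the 20% starting batch and the 90% threshold follow the slide; model choice and data handling are illustrative assumptions):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def reclassify(documents, labeling_batches, threshold=0.90):
    # labeling_batches yields (texts, subclass_labels) pairs from the human
    # annotator, starting with roughly 20% of the documents.
    labeled_texts, labels = [], []
    for texts, codes in labeling_batches:
        labeled_texts.extend(texts)
        labels.extend(codes)
        model = make_pipeline(TfidfVectorizer(), LinearSVC())
        score = cross_val_score(model, labeled_texts, labels, cv=5).mean()
        if score >= threshold:                    # quality is good enough:
            model.fit(labeled_texts, labels)      # classify the rest automatically
            remaining = [d for d in documents if d not in labeled_texts]
            return dict(zip(remaining, model.predict(remaining)))
    return {}                                     # threshold not reached: keep annotating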
32. STATUS & OUTLOOK
• Currently in the evaluation phase
• Going live in the coming weeks
33. …NOT ONLY PATENTS
Solution domains: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive
Modules: Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
34. For further questions, please contact:
David Baehrens
+49 (0)761 203 97690
info@averbis.com