
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Generation of Digital Content



Georg Rehm. QURATOR: Developing a Flexible AI Platform for Digital Content Curation. QURATOR 2020 – Conference on Digital Curation Technologies, Fraunhofer FOKUS, Berlin, 20/21 January 2020. Invited keynote talk.


  1. A Flexible AI Platform for the Adaptive Analysis and Creative Generation of Digital Content
     Georg Rehm, DFKI GmbH, Scientific Coordinator QURATOR
     QURATOR 2020 – Conference on Digital Curation Technologies, Berlin, Germany, 20–21 January 2020
  2. Outline
     • History – Introduction – Motivation
     • QURATOR Consortium and Group of Projects
     • DFKI Project
     • Industry Solutions and Showcases
     • QURATOR and Beyond
     • Next Steps
  3. History – Introduction – Motivation
  4. Motivation – The Need for Intelligent Content QURATION
     • In essentially all sectors and verticals, the demand for intelligent systems that support the processing and generation of digital content is increasing rapidly.
     • The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession require faster, more efficient and smarter processing methods.
     • The QURATOR project, funded by the German Federal Ministry of Education and Research (BMBF), aims to develop a sustainable and innovative technology platform that provides services to help knowledge workers in various industries address the challenges they face when curating digital content.
  5. Platform for Curation Technologies (Prototype), 2015–2017
     • Example sectors • Prototypical curation services • Experimental approach • Positive results beyond expectations • Potential for growth and synergies • Strong interest from industry and research
     Platform for Curation Technologies, 2018–2021
     • Extended group of sectors and verticals • Several additional, extended and improved AI-based curation services • High quality, AI, semantics, precision • Multimedia, text, audio, video, VR/AR
     Curation Services for Corporate Content, 2008–2013 (2014–2016)
     • Corporate content • Positive results • Technologically compatible with the WK-P DKT • Complementary in terms of scientific approach
  6. QURATOR: Our Vision
     Goals
     • Establish an ecosystem for curation technologies. We have been trying to shape and position this emerging field since 2015.
     • Establish the metropolitan area Berlin-Brandenburg as a centre of excellence for the development and industrial application of curation technologies
     Technologies
     • Joint development of a sustainable technology platform
     • Define the state of the art
     • Convincing showcases and prototypes for four different sectors
     Solutions
     • General curation technologies
     • Sector-specific curation technologies
     • AI and machine learning …
     Infographic (translated from German), “Berlin is … a science hub”: 39 universities and higher education institutions; 175,651 students (18% international); 65,000 researchers; <70 publicly funded research institutions
  7. European Market for Language Technologies
     • Background: SMART 2016/0103 contract supporting the EC in the Connecting Europe Facility (CEF) Automated Translation
     • EU market approx. 1B€
     • Market is disrupted by dominant global players
     • SMEs: 70% of EU LT vendors have up to 50 employees
     • Revenue per company is growing
     • Market is highly fragmented: hundreds of SMEs, many of them addressing very specific niches, sectors and languages
     Source: Digital Single Market – Final study report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem. Final report, a study prepared for the European Commission, DG Communications Networks, Content & Technology.
     Estimated LT market size (M€):
       Country          2018   2019   2020
       Germany           197    217    240
       UK                189    209    232
       France             88     96    105
       Netherlands        55     60     66
       Rest of EU-28     249    277    305
       Total             778    859    948
     Recommendations
     • Europe is strong in research and innovation, but not successful in scaling innovations and capturing the market
     • Europe needs European alternatives to fill the gaps and to avoid reliance on monopolies
     • The multilingual Digital Single Market should be developed on its own infrastructure
     → A European platform is needed to connect demand and supply as well as industry and research
     Approach and Observations
     → The European LT market is estimated to reach 1B€ in 2020.
     → This appears to be only the beginning of something bigger …
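The market figures on this slide are internally consistent: the country rows add up to the EU totals, and the totals grow by roughly 10% per year. A small Python check that uses only the numbers from the slide (the function names are illustrative, not from the study):

```python
# Estimated EU language technology market in M€, as given on the slide (CEF study).
MARKET = {
    "Germany": (197, 217, 240),
    "UK": (189, 209, 232),
    "France": (88, 96, 105),
    "Netherlands": (55, 60, 66),
    "Rest of EU-28": (249, 277, 305),
}
YEARS = (2018, 2019, 2020)
REPORTED_TOTALS = (778, 859, 948)


def yearly_totals(market):
    """Sum the per-country estimates for each year."""
    return tuple(sum(values[i] for values in market.values()) for i in range(len(YEARS)))


def growth_rates(totals):
    """Year-over-year growth of the totals, as fractions (0.10 means 10%)."""
    return [later / earlier - 1 for earlier, later in zip(totals, totals[1:])]


def cagr(first, last, periods):
    """Compound annual growth rate between two values over `periods` years."""
    return (last / first) ** (1 / periods) - 1


if __name__ == "__main__":
    totals = yearly_totals(MARKET)
    assert totals == REPORTED_TOTALS  # the country rows add up to the reported totals
    for year, rate in zip(YEARS[1:], growth_rates(totals)):
        print(f"{year}: {rate:.1%}")
    print(f"CAGR 2018-2020: {cagr(totals[0], totals[-1], len(YEARS) - 1):.1%}")
```

Running it prints a growth rate of about 10.4% for 2019, for 2020 and for the two-year compound rate.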
  8. Global Market for Language Technologies and NLP
     → Truly incredible market forecast!
     → Note that this report does not mention “curation technologies” yet!
  9. QURATOR Consortium and Group of Projects
  10. QURATOR Overview
     • One joint initiative – ten individual projects – close collaboration
     • Six main topics at three levels (= the phases of the curation process)
     • Modular technology platform that will allow the development of flexible curation technologies for industrial use from 2021 onwards
     • Methods developed in the individual projects can also be used on their own; in the QURATOR platform, they can be combined flexibly as microservices.
  11. QURATOR: One Initiative – Ten Projects
     • DFKI GmbH: Curation Technologies – A Flexible AI Platform for the Adaptive Analysis and Creative Generation of Digital Content
     • 3pc GmbH Neue Kommunikation: Curation Technologies for Interactive Storytelling
     • Ada Health GmbH: Curation of Biomedical Knowledge
     • ART+COM AG: Curation Tools for Multimedia Content
     • Condat AG: Smart Newsboard
     • Fraunhofer Gesellschaft – FOKUS: Corporate Smart Insights (CSI)
     • Semtation GmbH: Intelligent Business Process Modelling – intelligent navigation through knowledge spaces
     • Stiftung Preußischer Kulturbesitz, Staatsbibliothek zu Berlin: Automated Curation Technologies for Digitized Cultural Heritage
     • uberMetrics Technologies GmbH: Curation Technologies for the Monitoring of Online Content and Risks
     • Wikimedia Deutschland e.V.: Data quality in Wikidata
     Photo: QURATOR kick-off meeting at DFKI Berlin, 19 November 2018
  12. QURATOR DFKI Project
  13. QURATOR DFKI Project: Architectural View
     • Input/Output: user, GUI, document collection, other content
     • Preprocessing: Provisioning of Datasets and Content, Language Identification, Duplicate Detection, Document Structure Analysis
     • Semantic Analysis: Named Entity Recognition and Linking, Temporal Expression Analysis, Relation Extraction, Event Detection
     • Content Generation: Summarization, Paraphrasing, Machine Translation, Semantic Storytelling
     • Platform components: Workflow Manager, Storage (Knowledge Graph, File Storage), API Manager, Security, Kubernetes
     • Flexible, robust, scalable platform, filled with a variety of services
     • Flexible orchestration of workflows
     • Methods for recognizing different types of content in order to feed them into type-specific processing workflows
     • Easier aggregation of modular content in new productions, applications and usage contexts
     • Interoperability through generic APIs (SaaS), with little integration effort
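One way to picture the "type-specific processing workflows" on this slide is a router that first detects the content type and then passes the document through the matching chain of services. The sketch below is purely illustrative: the content types, the detection rule and all service functions are hypothetical stand-ins, not QURATOR services or its API.

```python
from typing import Callable, Dict, List

# Hypothetical service signature: each service takes a document dict and returns an enriched copy.
Service = Callable[[dict], dict]


def detect_language(doc: dict) -> dict:
    # Toy heuristic standing in for a real language identification service.
    german_markers = {"der", "die", "das", "und"}
    words = set(doc["text"].lower().split())
    return {**doc, "lang": "de" if words & german_markers else "en"}


def split_paragraphs(doc: dict) -> dict:
    # Stand-in for document structure analysis: blank lines separate paragraphs.
    paragraphs = [p.strip() for p in doc["text"].split("\n\n") if p.strip()]
    return {**doc, "paragraphs": paragraphs}


def strip_feed_markup(doc: dict) -> dict:
    # Stand-in for feed-specific cleanup: drop a "Read more" trailer.
    return {**doc, "text": doc["text"].replace("Read more", "").strip()}


# Hypothetical mapping from content type to its type-specific workflow.
WORKFLOWS: Dict[str, List[Service]] = {
    "rss_item": [strip_feed_markup, detect_language],
    "article": [detect_language, split_paragraphs],
}


def detect_content_type(doc: dict) -> str:
    # Illustrative rule: items coming from an RSS source are RSS items, everything else is an article.
    return "rss_item" if doc.get("source") == "rss" else "article"


def route(doc: dict) -> dict:
    """Detect the content type, then run the corresponding workflow in order."""
    content_type = detect_content_type(doc)
    for service in WORKFLOWS[content_type]:
        doc = service(doc)
    return {**doc, "content_type": content_type}
```

For example, `route({"source": "rss", "text": "Die Konferenz beginnt heute. Read more"})` strips the trailer and tags the item as German. Keeping the mapping in a plain dictionary means a new content type only needs a new entry, not a change to `route`.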
  14. Semantic Storytelling
     Content sources: web content, RSS feeds, Wikipedia; self-contained document collection. All incoming content is processed with respect to a topic T:
     1. Determine the relevance of a segment for T: (a) document relevance, (b) segment relevance
     2. Determine the importance of a segment (relations such as isMoreImportantThan and isLessImportantThan)
     3. Determine the discourse relation between segment and topic (e.g., Comparison, Expansion)
     Possible instantiations of a segment: complete document, summary, claim or fact, event, named entity
     Output: ranked list of text segments; prototype GUIs for three use cases (UC1, UC2, UC3)
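Steps 1 and 2 of semantic storytelling end in a ranked list of text segments. The sketch below imitates that output with a deliberately simple scorer: relevance is the share of topic terms that a segment contains, and a hypothetical `importance` weight breaks ties. This is an assumption for illustration only; the slide does not say how QURATOR computes relevance or importance, and step 3 (discourse relations) is not modelled.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    importance: float = 0.0  # hypothetical weight, e.g. derived from isMoreImportantThan relations


def tokens(text: str) -> set:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def relevance(segment: Segment, topic: str) -> float:
    """Fraction of topic terms that occur in the segment (0.0 to 1.0)."""
    topic_terms = tokens(topic)
    if not topic_terms:
        return 0.0
    return len(topic_terms & tokens(segment.text)) / len(topic_terms)


def rank_segments(segments: List[Segment], topic: str, min_relevance: float = 0.0) -> List[Segment]:
    """Keep segments above the relevance threshold, ordered by relevance, then by importance."""
    relevant = [s for s in segments if relevance(s, topic) > min_relevance]
    return sorted(relevant, key=lambda s: (relevance(s, topic), s.importance), reverse=True)
```

With the topic "curation technologies conference", a segment mentioning all three terms ranks above one mentioning two, and a segment mentioning none is dropped.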
  15. QURATOR Innovation Labs and Digital Showroom: Current State. More details and demos are available at our posters.
     Poster: A Flexible AI Platform for the Adaptive Analysis and Creative Generation of Digital Content
     DFKI is Germany’s leading research center in the field of innovative software technologies based on AI methods. In QURATOR, DFKI focuses on the development of an innovative platform for digital curation technologies as well as on populating this platform with various processing services. This platform plays a crucial role in the project, as it is being designed in cooperation with all partners, who also contribute their services. Ultimately, the QURATOR platform will contain services, data sets and components that are able to handle and process different types and classes of content as well as content sources.
     Preprocessing encompasses the services that are responsible for obtaining and processing information from different content sources so that it can be used in the platform and integrated into other services:
     • Provisioning of Data Sets and Content (web pages, RSS feeds etc.)
     • Language and Duplicate Detection
     • Document Structure Recognition
     Semantic Analysis includes services that process a document (or part of it) and add information in the form of annotations:
     • Named Entity Recognition and Linking
     • Temporal Expression Analysis
     • Relation Extraction
     • Event Detection
     • Fake News Analysis
     • Discourse Analysis
     Content Generation contains services that make use of annotated information (Semantic Analysis) to help create a new piece of information:
     • Summarization
     • Paraphrasing
     • Automated Translation
     • Semantic Storytelling
     Contact: Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Speech and Language Technology Lab, Dr. Georg Rehm
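The three service groups on the poster form a natural pipeline: preprocessing prepares the text, semantic analysis adds annotations, and content generation builds something new from those annotations. A minimal sketch of that chaining, assuming a shared document object that every service reads and extends; the whitespace normalizer, the gazetteer tagger and the summary function are hypothetical toy services, not QURATOR components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    text: str
    annotations: List[dict] = field(default_factory=list)
    outputs: Dict[str, str] = field(default_factory=dict)


def normalize_whitespace(doc: Document) -> Document:
    # Preprocessing: collapse runs of whitespace into single spaces.
    doc.text = " ".join(doc.text.split())
    return doc


def toy_entity_tagger(doc: Document, gazetteer=("Berlin", "DFKI", "QURATOR")) -> Document:
    # Semantic analysis: gazetteer lookup that records each match with character offsets.
    for name in gazetteer:
        start = doc.text.find(name)
        while start != -1:
            doc.annotations.append({"type": "entity", "anchor": name, "begin": start, "end": start + len(name)})
            start = doc.text.find(name, start + 1)
    return doc


def entity_summary(doc: Document) -> Document:
    # Content generation: a one-line summary built only from the annotations.
    names = sorted({a["anchor"] for a in doc.annotations if a["type"] == "entity"})
    doc.outputs["summary"] = ("Mentions: " + ", ".join(names)) if names else "No entities found."
    return doc


PIPELINE: Dict[str, List[Callable[[Document], Document]]] = {
    "preprocessing": [normalize_whitespace],
    "semantic_analysis": [toy_entity_tagger],
    "content_generation": [entity_summary],
}


def run_pipeline(doc: Document) -> Document:
    """Run every stage in order; later stages see the annotations added by earlier ones."""
    for stage in ("preprocessing", "semantic_analysis", "content_generation"):
        for service in PIPELINE[stage]:
            doc = service(doc)
    return doc
```

Because the offsets are computed after normalization, they index into the cleaned text, which is why preprocessing has to run first.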
  16. Collaboration: Joint Research and Development
     • Bilateral and multilateral research and development work
     • Development, experiments, papers
     • Collaboration at the microservices level (abstract, combinable)
     • Partners provide microservices via the technology platform
     • Interoperability via common formats such as NIF (NLP Interchange Format)
     Example text: “In the United Kingdom this was associated with the punk rock movement, as exemplified by bands such as Crass and the Sex Pistols. Anarchist anthropologist David Graeber…”
     Example NIF annotations:
         nif:anchorOf "United Kingdom"^^xsd:string ;
         nif:entity <> ;
         nif:geoPoint "51.5_-0.11666666666666667"^^xsd:string ;
         itsrdf:taIdentRef <> .

         nif:anchorOf "Crass"^^xsd:string ;
         nif:entity <> ;
         nif:orgType "group_or_band"^^xsd:string ;
         itsrdf:taIdentRef <> .

         nif:anchorOf "David Graeber"^^xsd:string ;
         nif:birthdate "1961-02-12"^^xsd:date ;
         nif:entity <> ;
         itsrdf:taIdentRef <> .
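In NIF, an annotation is tied to its context string not only by nif:anchorOf but also by character offsets (nif:beginIndex, nif:endIndex) and a subject URI for the annotated span. The slide leaves out the subjects and offsets, so the sketch below shows one way to compute them for the first sentence of the example and emit Turtle. The base URI `http://example.org/doc1` and the RFC 5147-style `#char=begin,end` fragments are assumptions for illustration; the additional properties shown on the slide (nif:entity, nif:geoPoint, itsrdf:taIdentRef and so on) are omitted.

```python
BASE = "http://example.org/doc1"  # hypothetical document URI

# First sentence of the slide's example text.
CONTEXT = ("In the United Kingdom this was associated with the punk rock movement, "
           "as exemplified by bands such as Crass and the Sex Pistols.")


def char_uri(begin: int, end: int) -> str:
    """RFC 5147-style fragment identifier, as commonly used for NIF strings."""
    return f"<{BASE}#char={begin},{end}>"


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def anchor_offsets(context: str, anchor: str) -> tuple:
    """Return (begin, end) of the first occurrence of `anchor` in `context`."""
    begin = context.find(anchor)
    if begin == -1:
        raise ValueError(f"{anchor!r} not found in context")
    return begin, begin + len(anchor)


def to_turtle(context: str, anchors: list) -> str:
    """Serialize the context and one NIF string per anchor as Turtle."""
    ctx = char_uri(0, len(context))
    lines = [
        "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"{ctx} a nif:Context, nif:String, nif:RFC5147String ;",
        f'    nif:isString "{escape(context)}"^^xsd:string .',
    ]
    for anchor in anchors:
        begin, end = anchor_offsets(context, anchor)
        lines += [
            "",
            f"{char_uri(begin, end)} a nif:String, nif:RFC5147String ;",
            f"    nif:referenceContext {ctx} ;",
            f'    nif:anchorOf "{escape(anchor)}"^^xsd:string ;',
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger .',
        ]
    return "\n".join(lines)
```

Calling `print(to_turtle(CONTEXT, ["United Kingdom", "Crass"]))` yields, among others, a subject `<http://example.org/doc1#char=7,21>` for "United Kingdom". Partner-specific properties can then be attached to those subjects, which is what makes the offsets useful for exchanging annotations between services.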
  17. Industry Solutions and Showcases
  18. Medical Content Curator • Smart Exhibits • Media Curator • Intelligent Navigation • Risk Monitoring • Next Reality Storytelling • Content Curation Engine • Corporate Smart Insights • Semantic Enrichment
     Diagram labels: Focus on Basic Technologies and Curation Tools; Industry Solutions; Additional Showcases
  19. Programme, 20 January 2020 (conference location: Auditorium 1)
     08:30–09:00  Registration
     09:00–09:30  Welcome & Introduction
                  Armin Berger, CEO of 3pc Neue Kommunikation and spokesperson of QURATOR
                  Prof. Dr. Adrian Paschke, Fraunhofer FOKUS
                  A Flexible AI Platform for the Adaptive Analysis and Creative Generation of Digital Content (Dr. Georg Rehm, DFKI GmbH)
     09:30–10:50  Digital Curation Technologies for Industry
                  Curation technologies for online media- and risk monitoring (Daniel Siewert, Ubermetrics Technologies GmbH)
                  Towards the Bosch Materials Science Knowledge Base (Dr. Jannik Stroetgen, Robert Bosch GmbH)
                  Intelligent Business Process Modelling - navigating across knowledge spaces (Dr. Frauke Weichhardt, Semtation GmbH)
                  Semantic AI – How Knowledge Graphs can enable powerful AI Solutions! (Martin Kaltenböck, Semantic Web Company)
     10:50–11:15  Coffee break & demos
     11:15–13:00  Digital Curation Technologies for Media
                  Topic Detection and Trend Analysis Methods using Wikidata (Sacha Prelle and Radoslaw Oldakowski, Condat AG)
                  AI use cases for content production workflows at educational content providers (Ina Abraham and Dr. Felix Sasaki, Cornelsen)
                  Curation technologies for interactive Storytelling (Armin Berger, 3pc GmbH Neue Kommunikation)
                  Digital Curation Technologies for Media (Peggy van der Kreeft, Deutsche Welle)
                  A simple and efficient approach for the semi-automated curation for media reviews (Dr. Marc Rössler, Unicepta)
     13:00–14:00  Lunch break
     14:00–15:00  Digital Curation Technologies for Culture
                  Automated curating technologies for digitized cultural heritage (Clemens Neudecker, Stiftung Preußischer Kulturbesitz/Staatsbibliothek zu Berlin)
                  AI for digitized cultural heritage (Stephan Bartholmei, Deutsche Nationalbibliothek)
                  Curation tools for interactive multimedia content (Dr. Joachim Quantz, ART+COM AG)
     15:00–15:30  Coffee break & demos
     15:30–16:30  Digital Curation Technologies in Medicine and Knowledge Graphs
                  Tools and technology for curating biomedical knowledge (Dr. Sarah Schulz, Ada Health GmbH)
                  Towards an Insight-driven Organisation with Corporate Smart Insights (Prof. Dr. Adrian Paschke, Data Analytics Center, Fraunhofer FOKUS)
                  Quality improvement in Wikidata (Lydia Pintscher, Wikimedia Deutschland e.V.)
     16:30        Get-together & demos
  20. QURATOR and Beyond
  21. QURATOR and Beyond
     One of QURATOR’s goals is to establish the metropolitan area Berlin/Brandenburg as an international centre of excellence for the development and use of curation technologies.
     • To accomplish this goal, we need to look (way) beyond the individual project.
     • Brief overview of related projects and initiatives at the German and European level
  22. QURATOR and Beyond: German Level
     • Many AI-related activities in Germany
     • AI strategy of the German federal government
     • Several German ministries have set up AI-related funding programmes (BMBF, BMWi, BMJV, BMG etc.)
     • Federal Ministry for Economic Affairs and Energy (BMWi): “KI Innovationswettbewerb” (AI innovation competition)
       – SPEAKER (Sprachassistenzplattform “Made in Germany”, a voice assistant platform “Made in Germany”) is one of the funded projects
       – Starts on 1 April 2020, coordinated by Fraunhofer IAIS and IIS
       – DFKI is a partner and will strengthen the bridge to QURATOR and curation technologies
     • DIN AI Standardisation Roadmap
       – To be unveiled at the Digital Summit 2020 of the German government
       – More than 200 colleagues from the field are participating in the preparation of the roadmap
       – Our goal is to include topics related to curation technologies
  23. QURATOR and Beyond: European Level
     • Various AI-related activities in Europe
     • Emerging AI strategies of the European Commission and of almost all EU member states
     • Several EU projects develop technologies for content curation, e.g., LYNX (EU Horizon 2020)
       – Curation technologies for legal content, building the Multilingual Legal Knowledge Graph
     • AI4EU – European AI on Demand Platform (EU Horizon 2020)
       – EU-wide platform for the whole European AI community
     • ELG – European Language Grid (EU Horizon 2020)
       – EU-wide platform for the whole European LT community
     • European Parliament: the CULT committee is starting a new report on AI for the audio-visual sector
     • The European Commission is currently defining the role of LT in the Digital Europe Programme (DEP)
     • EP and EC activities: our goal is to include topics related to curation technologies
  24. European Language Grid: Main Objectives
     1. Establish the ELG as the primary platform for LT in Europe.
     2. ELG as a platform for commercial and non-commercial LTs, both functional and non-functional.
     3. Enable the European LT community to upload services and data sets into the ELG, to deploy them, and to connect with and make use of the resources made available by others.
     4. Establish the European Language Grid as the primary European marketplace for LT, connecting demand and supply.
     Photo: Kick-off meeting, 22/23 January 2019
     20-01-2020, Berlin – QURATOR 2020
  25. ELG timeline: months M1–M36 (2019–2021); work strands Grid Platform, Grid Content, Grid Community and Management
     Milestones: Corporate Identity • Project Website • NCCs set up • Communication Plan developed • Cloud Platform set up • Metadata Schema • Test Services set up • First Demo • First, Second and Third ELG Conference • three LT Board Meetings • Services, Tools, Components: first, second and final release • Platform: first, second and final release • Data Sets: first, second and final release • Pilots: first and second open call, followed by the pilots • Sustainability Plan (first and final version) • ELG Legal Entity
  26. Architecture
     • QURATOR’s basic infrastructure is based on ELG’s platform approach
     • The QURATOR DFKI project is prototypically developing, on top of ELG:
       – a Workflow Manager for complex services, pipelines and workflows (first example: MT pivoting)
       – a content store, including a content and genre ontology and text type-based processing
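MT pivoting, the first workflow example on this slide, chains two translation services through an intermediate (pivot) language when no direct engine exists for a language pair. A minimal sketch, assuming a hypothetical registry of MT services keyed by (source, target) and English as the pivot; the word-lookup "engines" are toy stand-ins, not ELG or QURATOR services.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def make_toy_engine(lexicon: Dict[str, str]) -> Translator:
    """Word-by-word lookup; a toy stand-in for a real MT service."""
    def engine(text: str) -> str:
        return " ".join(lexicon.get(word, word) for word in text.split())
    return engine


# Hypothetical registry: de->en and en->fr engines exist, but no direct de->fr engine.
REGISTRY: Dict[Tuple[str, str], Translator] = {
    ("de", "en"): make_toy_engine({"guten": "good", "morgen": "morning"}),
    ("en", "fr"): make_toy_engine({"good": "bon", "morning": "matin"}),
}


def plan(source: str, target: str, pivot: str = "en") -> List[Tuple[str, str]]:
    """Return the chain of language pairs to run: direct if available, otherwise via the pivot."""
    if (source, target) in REGISTRY:
        return [(source, target)]
    if (source, pivot) in REGISTRY and (pivot, target) in REGISTRY:
        return [(source, pivot), (pivot, target)]
    raise LookupError(f"no direct or pivot route for {source}->{target}")


def translate(text: str, source: str, target: str, pivot: str = "en") -> str:
    """Run each engine in the planned chain, feeding each output into the next step."""
    for pair in plan(source, target, pivot):
        text = REGISTRY[pair](text)
    return text
```

Here `translate("guten morgen", "de", "fr")` runs de->en and then en->fr. Since errors from the first hop carry into the second, `plan` prefers a direct engine whenever one is registered.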
  27. Pilot Projects
     • ELG will provide 1.9M€ for pilot projects through Financial Support for Third Parties, selected through two open calls.
     • Usage of LTs in specific applications, processes and operations.
     • Pilot projects should demonstrate the usefulness of the ELG.
     • Results of the projects will be made available through the ELG.
     • Lightweight, open and transparent selection process based on expert evaluation.
     • Up to 200,000€ per project, with a recommended minimum of 100,000€; expected duration of 9–12 months.
     • The first call will be published in March 2020.
  28. QURATOR will be ingested into the ELG platform. What does that mean?
     → Represent the QURATOR project itself in the ELG catalogue
     → Represent all QURATOR partner organisations in the catalogue
     → Upload and make available all processing services developed in QURATOR (if the developer wants to, of course)
     → Upload and make available all data sets, corpora and language resources developed in QURATOR (see above)
  29. QURATOR Project Pages in ELG (Draft)
     Project: QURATOR
     Long title: QURATOR – Curation Technologies
     Abstract (translated from German): The data jungle keeps growing, and ever new smart devices and reception situations are being added. This not only increases the pressure to communicate; the competition for attention will also intensify further. For digital strategy, this means that anyone who wants to be seen and heard must above all offer one thing: compelling, high-quality content. With QURATOR, we want to become part of this solution, and we are betting on the most promising technology of our time: artificial intelligence. The goal of QURATOR is to use artificial intelligence (AI) methods to make individual curation activities higher in quality, more efficient and more cost-effective, and to transfer them into practical industry solutions. Into the future together! In the research project QURATOR, funded by the Federal Ministry of Education and Research (BMBF), 10 partners are developing a novel technology platform that is intended to support knowledge workers and editors in curating digital content.
     QURATOR website:
     Start: 01 November 2018 – End: 31 October 2021
     Funding Agency: BMBF
  30. Next Steps
  31. QURATOR, ELG and Beyond: Platform Interoperability
     • Close collaboration between AI4EU and ELG as well as QURATOR and, soon, SPEAKER
     • Our goal is to demonstrate and exploit platform interoperability.
     • Platform proliferation risks fragmenting the European LT landscape even further.
     • We should make use of existing or emerging platforms and, additionally, make our LT/NLP/AI platforms interoperable with one another: platforms should be able to “talk” to each other.
     • Robust service and platform interoperability is crucial, e.g.:
       – Repositories: interoperability at the level of metadata records (including metadata harvesting)
       – Services: interoperability at the level of data and annotation formats (exchange of services)
     • Eventually, perhaps even EU-wide standardisation of NLP/LT APIs?
     • 1st International Workshop on Language Technology Platforms (IWLTP 2020), 16 May 2020 (at LREC 2020)
       – Submission deadline: 21 February 2020
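Metadata harvesting between repositories is commonly done with OAI-PMH; the slide does not name a protocol, so treat that choice as an assumption. The sketch parses an inline OAI-PMH ListRecords response carrying Dublin Core (oai_dc) records, using only the Python standard library, and extracts identifier, title and type for each record. The sample record is invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample, shaped like an OAI-PMH ListRecords reply with oai_dc metadata.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:service-42</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Named Entity Recognition (German)</dc:title>
          <dc:type>Service</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_records(xml_text: str) -> list:
    """Return (identifier, title, type) tuples for each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        title = dc.findtext("dc:title", default="", namespaces=NS) if dc is not None else ""
        kind = dc.findtext("dc:type", default="", namespaces=NS) if dc is not None else ""
        records.append((identifier, title, kind))
    return records
```

A live harvester would apply the same parsing to each page returned by a request such as `?verb=ListRecords&metadataPrefix=oai_dc`, repeating the request with the returned `resumptionToken` until it comes back empty.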
  32. QURATOR: Next Steps
     • Continue research and development work as well as evaluation and rapid prototyping
     • DFKI: emphasis on basic curation services, semantic storytelling, workflows and platform
     • Collaborative development of sector-specific prototypes and showcases
     • Continue the collaboration between QURATOR and related projects and initiatives, especially ELG, to move Europe into pole position in the global Language Technology market …
     • … and to establish Berlin/Brandenburg as an international centre of excellence for Curation Technologies!
  33. © 2020 QURATOR Consortium