The document summarizes Pablo Duboue's presentation on the Apache UIMA framework for unstructured information processing. It includes an outline describing an introduction and tutorial on UIMA, a discussion of W3C corpus processing, advanced topics in UIMA, and a summary. Key points about UIMA are that it is an open source framework for integrating text analytics tools and it uses standoff annotations to allow interoperability between tools.
Slides giving an overview of Apache UIMA and how it can be used for Metadata Generation, in the context of the "Information management on the Web" course at DIA (Computer Science Department) of Roma Tre University.
The document discusses Apache UIMA and its role in the IBM Watson system. It provides an overview of UIMA and how it was used in Watson's architecture to allow for parallelization and scaling of its natural language processing and question answering components. The speaker then provides a tutorial on key UIMA concepts like the Common Analysis Structure and how to develop UIMA analysis engines and type systems.
This document provides an introduction and overview of Apache UIMA (Unstructured Information Management Architecture).
Apache UIMA is an open source framework for analyzing unstructured information like text, audio, and video. It allows defining type systems and building analysis pipelines using components called annotators that can extract metadata from unstructured data.
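The standoff-annotation idea behind UIMA can be illustrated without the framework itself. The sketch below is plain Java, not the UIMA API: it shows the core convention that the document text stays untouched while typed annotations record character offsets beside it (the `Person` type and the naive string search are illustrative assumptions).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not the actual UIMA API): standoff annotations keep
// the original text unmodified and store typed character offsets beside it.
public class StandoffDemo {
    // A typed annotation over the half-open character range [begin, end).
    record Annotation(String type, int begin, int end) {}

    // A toy "annotator": mark every occurrence of a known name as a Person.
    public static List<Annotation> annotatePersons(String text, String name) {
        List<Annotation> result = new ArrayList<>();
        int idx = text.indexOf(name);
        while (idx >= 0) {
            result.add(new Annotation("Person", idx, idx + name.length()));
            idx = text.indexOf(name, idx + 1);
        }
        return result;
    }

    // Recover the covered text from the offsets, as UIMA's getCoveredText does.
    public static String coveredText(String text, Annotation a) {
        return text.substring(a.begin(), a.end());
    }

    public static void main(String[] args) {
        String doc = "Ridley Scott directed Alien. Ridley Scott also directed Blade Runner.";
        for (Annotation a : annotatePersons(doc, "Ridley Scott")) {
            System.out.println(a.type() + " [" + a.begin() + "," + a.end() + ") -> "
                    + coveredText(doc, a));
        }
    }
}
```

Because annotations never alter the text, a second tool can add its own layer of offsets over the same document, which is what makes tools interoperable.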
The document outlines some key aspects of Apache UIMA including its goals of supporting a community around analyzing unstructured content, how it can bridge different domains, and provides an example scenario of using it to extract metadata from articles about movies.
The document discusses Apache UIMA, an architectural framework for managing unstructured data. It is not inherently a semantic search tool. It allows for pluggable analysis engines and asynchronous scaleout. The document provides examples of how UIMA can be used for semantic search, including generating metadata for content management systems, data enrichment, and linking to external data sources. It also describes how AlchemyAPI services can be wrapped as UIMA components to perform named entity recognition and linking to knowledge bases.
The document provides an introduction to Java programming concepts including an overview of the Java platform, setting up a development environment, writing a first Java application, Java language elements, object-oriented programming concepts, and more. Key topics covered include the history and principles of Java, how to set up Eclipse, writing and running a "Hello World" program, Java syntax like variables and control structures, object-oriented concepts like classes and inheritance, and Java APIs and tools.
The document discusses two use cases (UC1 and UC2) for applying Apache UIMA to automatically extract structured information from unstructured text. UC1 involves using UIMA to analyze real estate listings to extract fields like price, zone, and phone number to track trends in the real estate market. UC2 aims to automatically extract common information like language, funding type, and expiration date from announcements of EU tenders and contracts. The document outlines the different components involved in each use case, including crawlers to extract text, annotators to identify relevant entities, and CAS consumers to store extracted fields in databases or indexes.
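A hypothetical sketch of what a UC1-style annotator might do: pull price and phone-number fields out of a free-text listing with regular expressions. The patterns, field names, and listing format here are assumptions for illustration, not taken from the deck.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy field extractor in the spirit of UC1: regexes over a real estate
// listing. Patterns and input format are illustrative assumptions.
public class ListingExtractor {
    // A price like "EUR 250,000" (digits with optional grouping separators).
    private static final Pattern PRICE = Pattern.compile("EUR\\s*(\\d[\\d.,]*\\d)");
    // A phone number like "555-123-4567".
    private static final Pattern PHONE = Pattern.compile("\\b(\\d{3}[- ]\\d{3}[- ]\\d{4})\\b");

    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static String extractPrice(String text) { return firstGroup(PRICE, text); }
    public static String extractPhone(String text) { return firstGroup(PHONE, text); }

    public static void main(String[] args) {
        String listing = "Sunny two-bedroom flat, EUR 250,000. Call 555-123-4567.";
        System.out.println("price = " + extractPrice(listing));
        System.out.println("phone = " + extractPhone(listing));
    }
}
```

In a real UIMA pipeline this logic would live inside an annotator that emits typed annotations, and a downstream CAS consumer would persist the extracted fields to a database or index.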
Presented by Wes Caldwell, Chief Architect, ISS, Inc.
The customers in the Intelligence Community and Department of Defense that ISS serves have a big data challenge. The sheer volume of data being produced and ultimately consumed by large enterprise systems has grown exponentially in a short amount of time. Providing analysts the ability to interpret meaning and act on time-critical information is a top priority for ISS. In this session, we will explore our journey into building a search and discovery system for our customers that combines Solr, OpenNLP, and other open source technologies to enable analysts to "Shrink the Haystack" into actionable information.
This document introduces PQuery, an open source tool from Percona used for stress testing databases. It provides an overview of Percona and its products, describes the origins and capabilities of PQuery, and outlines future plans to expand PQuery's support for additional databases and cluster configurations. The document encourages using PQuery to test and optimize database setups and shares additional resources for learning about and using PQuery.
This document provides information about the OWASP Web Testing Environment (WTE) project and its leader Matt Tesauro. It discusses the history and goals of the WTE project, which provides a collection of web application security testing tools in an easy-to-use environment. It also outlines ideas for the future of the project, such as providing automated cloud-based instances of the WTE and aligning its tools with the OWASP Testing Guide.
Building User-Centred Websites with Drupal (amanda etches)
This document summarizes two presentations about building user-centered websites with the content management system Drupal. The presentations describe the experiences of Wilfrid Laurier University Library and McMaster University Library in migrating their websites to Drupal. Key lessons from the presentations include choosing an initial version of Drupal, deciding what modules to use like CCK, planning content types before development, and getting support from the large Drupal community. The document provides an overview of the features and flexibility of Drupal for building customized, dynamic websites.
Between a SPA and a JAMstack: Building Web Sites with Nuxt/Vue, Strapi and wh... (Adam Khan)
With the explosion of open source JavaScript, it's time to migrate from building web-based systems on a monolithic CMS to a decoupled, componentized, reactive open source technology stack.
[Presented at the Brighton Web Development Meetup, October 2018, Brighton, UK]
This document provides an introduction to Java programming and the Java development environment. It discusses what computer programming is, defines the key phases of programming, and presents a simple "Hello World" Java program as an example. It also covers Java concepts like case sensitivity, formatting, and file naming conventions. The document explains how Java code is compiled and executed, and provides an overview of the Java programming language, virtual machine, class files, and class loaders. It introduces the Eclipse integrated development environment and demonstrates how to create a new Java project, write code, compile, run, and debug programs within Eclipse.
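The "Hello World" program such introductions present is the canonical one below; as the summary notes, the class name must match the file name (HelloWorld.java) and Java is case sensitive, so `System` and `system` are different identifiers.

```java
// The canonical first Java program. Save as HelloWorld.java, then
// compile and run with:  javac HelloWorld.java && java HelloWorld
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
```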
Title: Part 1: Pharo Status
Speaker: Marcus Denker
Wed, August 20, 11:00am – 11:45am
Video:
Part1: https://www.youtube.com/watch?v=_Mv7SX-8Vlk
Part2: https://www.youtube.com/watch?v=qdZq2IZBm4k
Abstract: In this talk we will present the advances and new features in Pharo 3.0. We will present the current work on Pharo 4.0 and beyond. We will also present our new work around the Pharo consortium.
Bio: Marcus Denker is a permanent researcher (CR1, with tenure) at INRIA Lille - Nord Europe. Before, he was a postdoc at the PLEIAD lab/DCC University of Chile and the Software Composition Group, University of Bern. His research focuses on reflection and meta-programming for dynamic languages. He is an active participant in the Squeak and Pharo open source communities for many years. Marcus Denker received a PhD in Computer Science from the University of Bern/Switzerland in 2008 and a Dipl.-Inform. (MSc) from the University of Karlsruhe/Germany in 2004. He is a member of ACM, GI, and IEEE and a board-member of ESUG.
ESUG 2014, Cambridge
This document discusses TAO APIs for integrating standalone items into computer-based assessment platforms. It provides an overview of the available APIs, including client-side and server-side APIs, and how they can be used for item I/O, backend setup, event logging, item state, and more. It also describes how to run a standalone item and which features are needed to integrate it with a CBA platform using the Item Runtime and Workflow Runtime APIs. The document encourages contributions to the TAO APIs by discussing support resources on the TAO Forge.
The document discusses importing packages in Java code. It includes examples of importing common classes like Scanner, FileInputStream, and ArrayList. The code sample shows initializing variables and objects such as routers, a routing table, and a requests ArrayList, and includes code for parsing input, setting up network routing tables, and performing other routing calculations and simulations.
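A minimal sketch of the kind of setup that summary describes: import the classes you need from `java.util`, then initialize a collection and parse simple input with a Scanner. The one-request-per-line input format is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Sketch: import common classes, initialize an ArrayList, parse input.
// The input format (one routing request per line) is assumed.
public class RoutingSetup {
    public static List<String> parseRequests(String input) {
        List<String> requests = new ArrayList<>();
        Scanner sc = new Scanner(input);
        while (sc.hasNextLine()) {
            String line = sc.nextLine().trim();
            if (!line.isEmpty()) requests.add(line);  // skip blank lines
        }
        sc.close();
        return requests;
    }

    public static void main(String[] args) {
        List<String> requests = parseRequests("routerA\nrouterB\n\nrouterC\n");
        System.out.println(requests);
    }
}
```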
The document discusses an agenda for a Python workshop that will cover topics such as an introduction to Python, its features, how it is used in enterprises, installing Python on Windows and Linux, setting up development environments, and learning about Python concepts like strings, numbers, functions, modules, data structures, object-oriented programming, and errors/exceptions handling. The workshop will also include hands-on exercises and a quiz.
T4 is a text templating engine that allows generating text output from templates with input. It is used across the Microsoft stack for code generation scenarios. T4 templates use directives like <#@ #> and control blocks like <# #> to define template behavior. Templates can be designed for use at design-time in Visual Studio or as precompiled runtime templates. Microsoft extensively uses T4 for scenarios like ASP.NET MVC and Entity Framework code generation. Care must be taken when extending or modifying generated code to avoid losses during regeneration.
Maven is a build tool that can manage a project's build process, dependencies, documentation and reporting. It uses a Project Object Model (POM) file to store build configuration and metadata. Maven has advantages over Ant like built-in functionality for common tasks, cross-project reuse, and support for conditional logic. It works by defining the project with a POM file then running goals bound to default phases like compile, test, package to build the project.
Arakoon: A distributed consistent key-value store (Nicolas Trangez)
Arakoon is a distributed, consistent key-value store created by Incubaid using OCaml and the Lwt library. It uses a Multi-Paxos consensus protocol to guarantee consistency across cluster nodes. Researchers at Incubaid chose OCaml due to its suitability for the problem domain, availability of cooperative threads, and prior experience. While OCaml posed some recruitment challenges, the language worked well overall and allowed Arakoon to reach stability quickly. Areas for improvement include build tools, Lwt release cycles and testing, and growing the contributor and user base.
This document outlines recipes for creating Drupal distributions. It discusses the anatomy of distributions, including profiles, .info files, .install files, .make files and features. It also covers testing distributions with Simpletest and continuous integration with Jenkins. Business models for distributions are proposed, including open source and customization/support. The goal is to make superb, well-tested distributions to build customizable sites on Drupal.
Empowering Magnolia for Enterprise Use Cases - Experience Report (bkraft)
This document discusses empowering Magnolia for enterprise use cases. It describes Aperto, an agency that has been using Magnolia since 2006 for clients. The document then covers two main topics: 1) customizing workflows for specific business requirements, and 2) approaches for handling large volumes of user-generated content (UGC), such as comments. For workflows, it provides examples of enhancing features and usability. For UGC, it describes separating this processing to avoid bottlenecks and ensure availability during peak loads. It also demonstrates a custom solution using Google Web Toolkit for comment moderation.
The document discusses the Apache SOA stack and debunks some myths about SOA. It provides an overview of the Enterprise Service Bus (ESB) and explains why the Apache ServiceMix stack is a good choice as an ESB due to its modularity, stability, and cluster capabilities. The document also discusses how to design software and build systems for an ESB using OSGi and Maven.
The document discusses moving assessment items to a web-based format. It suggests that items could be developed as simple web applications using standard web technologies that are easy to learn, widely adopted, and can integrate with existing platforms. Items would essentially be a collection of web applications that could leverage open web standards like XHTML, CSS, and JavaScript. This would allow items to have interactivity and be developed with familiar tools. It would also allow standalone items to interact with assessment platforms through programming interfaces. Overall, the document proposes a more open and flexible approach to online assessment items using web technologies.
Tampere Technical University - Seminar Presentation in testing day 2016 - Sca... (Sakari Hoisko)
Seminar presentation from Tampere Technical University, testing day 2016: http://www.cs.tut.fi/tapahtumat/testaus16/
Open Source project: https://github.com/symbionext/DockerizedRobotFramework
Lucandra is a project that combines Lucene/Solr indexing and search capabilities with Cassandra for storage. It addresses issues with scaling Lucene/Solr by leveraging Cassandra's ability to horizontally scale writes and queries. Lucandra stores Lucene indexes and term vectors in Cassandra to allow distributed indexing and searching across nodes. It has been used successfully in production systems like sparse.ly for Twitter search and Wikassandra for Wikipedia search.
Finite-State Queries in Lucene:
* Background, improvement/evolution of MultiTermQuery API in 2.9 and Flex
* Implementing existing Lucene queries with NFA/DFA for better performance: Wildcard, Regex, Fuzzy
* How you can use this Query programmatically to improve relevance (I'll use an English test collection/English examples)
Quick overview of other Lucene features in development, such as:
* Flexible Indexing
* "More-Flexible" Scoring: challenges/supporting BM25, more vector-space models, field-specific scoring, etc.
* Improvements to analysis
Bonus:
* Lucene / Solr merger explanation and future plans
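The finite-state idea in the outline above can be illustrated without Lucene's actual code: compile a wildcard pattern ("?" matches any one character, "*" any run of characters) into NFA states and simulate the state set over each candidate term. This plain-Java sketch is only an illustration of the technique, not Lucene's implementation.

```java
import java.util.BitSet;

// Plain-Java illustration (not Lucene's code) of finite-state matching:
// treat each position in the wildcard pattern as an NFA state and run
// the set of live states over the characters of a candidate term.
public class WildcardNfa {
    public static boolean matches(String pattern, String term) {
        int n = pattern.length();
        BitSet states = new BitSet(n + 1);
        states.set(0);
        expand(states, pattern);              // epsilon moves over leading '*'
        for (char c : term.toCharArray()) {
            BitSet next = new BitSet(n + 1);
            for (int s = states.nextSetBit(0); s >= 0 && s < n; s = states.nextSetBit(s + 1)) {
                char p = pattern.charAt(s);
                if (p == c || p == '?') next.set(s + 1);  // consume one char
                if (p == '*') next.set(s);                // '*' loops on itself
            }
            expand(next, pattern);
            states = next;
        }
        return states.get(n);                 // accept if the end state is live
    }

    // Epsilon closure: a '*' state may also be skipped without consuming input.
    private static void expand(BitSet states, String pattern) {
        for (int s = states.nextSetBit(0); s >= 0 && s < pattern.length(); s = states.nextSetBit(s + 1)) {
            if (pattern.charAt(s) == '*') states.set(s + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("lu*ne", "lucene"));   // true
        System.out.println(matches("l?cene", "lucene"));  // true
        System.out.println(matches("lu*ne", "solr"));     // false
    }
}
```

The payoff described in the talk is that an automaton like this can be intersected directly with the term dictionary, so the engine only visits terms that can possibly match instead of scanning every term.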
About the presenter:
Robert Muir is a super-active Lucene developer. He works as a software developer for Abraxas Corporation. Robert received his MS in Computer Science from Johns Hopkins and BS in CS from Radford University. For the last few years Robert has been working on foreign language NLP problems - "I really enjoy working with Lucene, as it's always receptive to better int'l/language support, even though everyone seems to be a performance freak... such a weird combination!"
This document discusses incorporating probabilistic retrieval knowledge into TFIDF-based search engines. It provides an overview of different retrieval models such as Boolean, vector space, probabilistic, and language models. It then describes using a probabilistic model that estimates the probability of a document being relevant or non-relevant given its terms. This model can be combined with the BM25 ranking algorithm. The document proposes applying probabilistic knowledge to different document fields during ranking to improve relevance.
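A minimal sketch of the BM25 ranking formula mentioned above, using the common parameter defaults k1 = 1.2 and b = 0.75. The per-term statistics are supplied by the caller here for illustration; a real engine reads them from its index.

```java
// Minimal BM25 sketch with conventional defaults k1=1.2, b=0.75.
// Statistics (docCount, docFreq, termFreq, lengths) are caller-supplied.
public class Bm25 {
    static final double K1 = 1.2, B = 0.75;

    // BM25's inverse document frequency component.
    public static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one term in one document.
    public static double score(long docCount, long docFreq,
                               double termFreq, double docLen, double avgDocLen) {
        double norm = termFreq * (K1 + 1)
                / (termFreq + K1 * (1 - B + B * docLen / avgDocLen));
        return idf(docCount, docFreq) * norm;
    }

    public static void main(String[] args) {
        // A rarer term (docFreq 5 vs 500 in a 1000-doc collection) scores higher.
        System.out.println(score(1000, 5, 3, 100, 120));
        System.out.println(score(1000, 500, 3, 100, 120));
    }
}
```

Applying probabilistic knowledge per field, as the document proposes, amounts to computing a score like this for each field with field-specific statistics and combining the results.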
Lucene is a free and open source information retrieval (IR) library written in Java. It is widely used to add search functionality to applications. Lucene features fast and scalable indexing and search, and supports various query types including phrase, wildcard, fuzzy and range queries. The Lucene project includes related sub-projects like Solr (search server), Nutch (web crawler), and Mahout (machine learning).
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
1. Intro and Tutorial
W3C Corpus Processing
Advanced Topics
Summary
Unstructured Information Processing with
Apache UIMA
NYC Search and Discovery Meetup
Pablo Ariel Duboue, PhD
IBM TJ Watson Research Center
19 Skyline Dr.
Hawthorne, NY 10603
February 24th, 2010
Pablo Duboue Apache UIMA
2. Outline
1 Intro and Tutorial
What is UIMA
Mini-Tutorial
2 W3C Corpus Processing
TREC Enterprise Track
3 Advanced Topics
Custom Flow Controllers
UIMA Asynchronous Scale-out
Things I’m interested in improving
4. What is UIMA
UIMA is a framework: a means to integrate text and other unstructured information analytics.
Reference implementations are available for Java, C++ and others.
An open source project under the umbrella of the Apache Software Foundation.
5. Analytics Frameworks
Find all telephone numbers in running text:
(\([0-9]{3}\)|[0-9]{3})-?[0-9]{3}-?[0-9]{4}
Nice, but...
How are you going to feed this into further processing?
What about finding non-standard proper names in text?
What about acquiring technology from external vendors, free software projects, etc.?
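The slide's pattern can be exercised directly with java.util.regex; a minimal sketch (the class and helper names are mine, and the backslashes lost in the slide's rendering are restored):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: run the US phone-number regex from the slide over running text.
public class PhoneFinder {
    static final Pattern PHONE =
        Pattern.compile("(\\([0-9]{3}\\)|[0-9]{3})-?[0-9]{3}-?[0-9]{4}");

    static List<String> findPhones(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) hits.add(m.group());
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findPhones("Call 914-945-3000 or (212)555-1234."));
    }
}
```

This finds the matches, but it illustrates the slide's point: the matches arrive as bare strings, with no standard way to hand them to downstream components.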
6. In-line Annotations
Modify text to include annotations
This/DET happy/ADJ puppy/N
It gets very messy very quickly
(S (NP This/DET happy/ADJ puppy/N) (VP eats/V (NP the/DET bone/N)))
Annotations can easily cross boundaries of other
annotations
He said <confidential>the project can’t go own. The
funding is lacking.</confidential>
7. Standoff Annotations
Do not modify the text
Keep the annotations as offsets within the original text
Most analytics frameworks support standoff annotations.
UIMA is built with standoff annotations at its core.
Example:
He said the project can’t go own. The funding is lacking.
01234567890123456789012345678901234567890123456789012345678
Sentence Annotation: 0-33, 36-58.
Confidential Annotation: 8-58.
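The idea can be sketched in a few lines of Java (names are mine; the offsets below are recomputed for this plain-ASCII rendering of the slide's example sentence):

```java
// Minimal sketch of standoff annotation: annotations live as (begin, end)
// offsets over the unmodified source text, never as markup inside it.
public class StandoffDemo {
    // An annotation's covered text is recovered by slicing the source text.
    public static String covered(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "He said the project can't go own. The funding is lacking.";
        System.out.println(covered(text, 0, 33));            // first Sentence span
        System.out.println(covered(text, 8, text.length())); // Confidential span
    }
}
```

Because the two spans freely overlap without any nesting constraint, the boundary-crossing problem of in-line annotations simply does not arise.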
8. Type Systems
Key to integrating analytic packages developed by
independent vendors.
Clear metadata about
Expected Inputs
Tokens, sentences, proper names, etc
Produced Outputs
Parse trees, opinions, etc
The framework creates a unified type system for a given
set of annotators being run.
9. Many frameworks
Besides UIMA
http://incubator.apache.org/uima
LingPipe
http://alias-i.com/lingpipe/
Gate
http://gate.ac.uk/
10. UIMA Advantages
Apache Licensed
Enterprise-ready code quality
Demonstrated scalability
Developed by experts in building frameworks
Interoperable (C++, Java, others)
11. About The Speaker
PhD in CS, Columbia University (NY)
Natural Language Generation
Joined IBM Research in 2005
Worked in
Question Answering
Expert Search
DeepQA (Jeopardy!)
Recently joined the UIMA group
mailto:pablo.duboue@gmail.com
13. UIMA Concepts
Common Analysis Structure or CAS
Subject of Analysis (SofA or View)
JCas
Feature Structures
Annotations
Indices and Iterators
Analysis Engines (AEs)
AE descriptors
14. Room annotator
From the UIMA tutorial, write an Analysis Engine that
identifies room numbers in text.
Yorktown patterns: 20-001, 31-206, 04-123 (Regular
Expression Pattern: [0-9][0-9]-[0-2][0-9][0-9])
Hawthorne patterns: GN-K35, 1S-L07, 4N-B21 (Regular
Expression Pattern: [G1-4][NS]-[A-Z][0-9][0-9])
Steps:
1 Define the CAS types that the annotator will use.
2 Generate the Java classes for these types.
3 Write the actual annotator Java code.
4 Create the Analysis Engine descriptor.
5 Test the annotator.
15. Editing a Type System
16. The XML descriptor
<?xml version="1.0" encoding="UTF-8"?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>TutorialTypeSystem</name>
  <description>Type System Definition for the tutorial examples -
    as of Exercise 1</description>
  <vendor>Apache Software Foundation</vendor>
  <version>1.0</version>
  <types>
    <typeDescription>
      <name>org.apache.uima.tutorial.RoomNumber</name>
      <description></description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>building</name>
          <description>Building containing this room</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
17. The AE code
package org.apache.uima.tutorial.ex1;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.tutorial.RoomNumber;

/**
 * Example annotator that detects room numbers using
 * Java 1.4 regular expressions.
 */
public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {
  private Pattern mYorktownPattern =
      Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
  private Pattern mHawthornePattern =
      Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");

  public void process(JCas aJCas) {
    // next slide
  }
}
18. The AE code (cont.)
public void process(JCas aJCas) {
  // get document text
  String docText = aJCas.getDocumentText();
  // search for Yorktown room numbers
  Matcher matcher = mYorktownPattern.matcher(docText);
  int pos = 0;
  while (matcher.find(pos)) {
    // found one - create annotation
    RoomNumber annotation = new RoomNumber(aJCas);
    annotation.setBegin(matcher.start());
    annotation.setEnd(matcher.end());
    annotation.setBuilding("Yorktown");
    annotation.addToIndexes();
    pos = matcher.end();
  }
  // search for Hawthorne room numbers
  // ...
}
19. UIMA Document Analyzer
20. UIMA Document Analyzer (cont.)
22. An Example
TREC 2006 Enterprise Track
Search for experts on the W3C website
Given a topic, find experts in that topic
The IBM Enterprise Track 2006 team:
Guillermo Averboch, Jennifer Chu-Carroll, Pablo A. Duboue,
David Gondek, J. William Murdock and John Prager.
23. Corpus Processing: Generalities
A simplified view of our TREC 2006 Enterprise Track indexing pipeline (the actual pipeline has 23 components):
Reader: Binary Collection Reader
Low Level Processing: LibMagic Annotator, HTML Detagger Annotator
Higher Level Analytics: Enterprise Member Annotator, Expertise Detector Aggregate
Analytics Consumers: Lucene Indexer, Pseudo Document Constructor, EKDB Indexer
24. Pipeline
Pipeline
Reader
Binary Collection Reader
Low Level Processing
LibMagic Annotator
HTML Detagger Annotator
Higher Level Analytics
Enterprise Member Annotator
Expertise Detector Aggregate
Analytics Consumers
Lucene Indexer
Pseudo Document Constructor
EKDB Indexer
25. Binary Collection Reader
Binary Collection Reader
Reads the TREC XML format
300,000+ documents (a full crawl of the w3.org site)
Binary format, to allow auto-detection of file type,
encoding, etc.
26. LibMagic Annotator
LibMagic Annotator
Uses “magic” numbers to heuristically guess the file type.
JNI wrapper to libmagic in Linux.
Unsupported file types are dropped.
UIMA can run this remotely from a Windows machine.
27. HTML Detagger Annotator
HTML Detagger Annotator
For documents identified as HTML, parse them and extract
the text.
Also performs encoding detection (UTF-8 vs. ISO-8859-1).
Other detaggers (not shown) are applied to other file
formats.
28. Enterprise Member Annotator
Enterprise Member Annotator
Detects inside running text the occurrence of any variant of
the 1,000+ experts for the Enterprise Track.
Dictionary extended with name variants.
Simple TRIE-based implementation.
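A TRIE-based dictionary matcher of the kind mentioned can be sketched as follows (a hypothetical minimal version with names of my choosing; the actual pipeline component is not public):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a TRIE-based dictionary matcher for name variants:
// walk the text character by character, remembering the longest dictionary
// entry that ends at the current position.
public class NameTrie {
    private final Map<Character, NameTrie> children = new HashMap<>();
    private boolean terminal; // true if a dictionary entry ends at this node

    void add(String name) {
        NameTrie node = this;
        for (char c : name.toCharArray())
            node = node.children.computeIfAbsent(c, k -> new NameTrie());
        node.terminal = true;
    }

    // End offset of the longest dictionary match starting at start, or -1.
    int matchAt(String text, int start) {
        NameTrie node = this;
        int lastEnd = -1;
        for (int i = start; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) break;
            if (node.terminal) lastEnd = i + 1;
        }
        return lastEnd;
    }

    public static void main(String[] args) {
        NameTrie dict = new NameTrie();
        dict.add("Tim");
        dict.add("Tim Berners-Lee");
        System.out.println(dict.matchAt("Tim Berners-Lee chaired the session", 0));
    }
}
```

In an annotator, the (start, end) pairs returned this way would become standoff annotations added to the CAS indexes.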
29. Expertise Detector Aggregate
Expertise Detector Aggregate
Hierarchical aggregate of 16 annotators leveraging existing
technology into a new “expertise detection” annotator.
Includes a named-entity detector and a relation detector for
semantic patterns.
30. Lucene Indexer
Lucene Indexer
Integration with Open Source technology.
Indexes the tokens from the text.
The UIMA framework also contains JuruXML, an indexer
for semantic information.
31. Pseudo Document Constructor
Pseudo Document Constructor
Uses the name occurrences to create a “pseudo”
document with all text surrounding each expert name.
The pseudo documents are indexed off-line.
32. EKDB Indexer
EKDB Indexer
Stores extracted entities and relations in a relational
database.
Standards-based (JDBC, RDF).
Employed in a variety of research applications for search
and inference.
34. Custom Flow Controllers
UIMA allows you to specify which AE will receive the CAS
next, based on all the annotations on the CAS.
examples/descriptors/flow_controller/WhiteboardFlowController.xml
FlowController implementing a simple version of the
“whiteboard” flow model. Each time a CAS is received, it
looks at the pool of available AEs that have not yet run on
that CAS, and picks one whose input requirements are
satisfied. Limitations: only looks at types, not features.
Does not handle multiple Sofas or CasMultipliers.
36. UIMA AS: ActiveMQ
(Diagram: the UIMA AS client and the AEs exchange CASes through queues managed by an ActiveMQ broker.)
37. UIMA AS: Wrapping Primitive AEs
38. UIMA AS: More information
http://incubator.apache.org/uima/doc-uimaas-what.html
http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-distr/src/main/readme/README?view=markup
http://incubator.apache.org/uima/downloads/releaseDocs/2.3.0-incubating/docs-uima-as/html/uima_async_scaleout/uima_async_scaleout.html
40. Descriptorless Primitive Analysis Engines
Java 5 introduced annotations for metadata associated with Java classes.
The UIMA primitive AE descriptor files fall squarely within their intended use cases.
Example:
import org.apache.uima.annotation.*;

@UimaPrimitive(
    description = "Identifies room numbers on text f...",
    typesystem = "org/apache/uima/examples/roomtypes..."
)
public class RoomNumberAnnotator extends JCasAnnotator_ImplBase {
    ...
41. CASless JCas Types
A common use case within UIMA is to do some processing
and then keep some results outside the CAS.
These results are used with application-level logic, outside
the UIMA framework.
Because the JCas types are only available when tied to a
CAS, they cannot be used within application logic.
Copying information to POJOs that re-create the JCas
types is a frequent and tedious task.
Proposal: have JCasGen produce both CAS-backed and
CAS-less implementations of the types defined in the type
system, with methods to bridge between the two.
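The copying the proposal would eliminate can be sketched as follows (hypothetical class and method names; plain values stand in for the CAS-backed annotation so the sketch stays self-contained):

```java
// Hypothetical sketch: a hand-written, CAS-less mirror of the generated
// RoomNumber JCas type, of the kind the proposal would have JCasGen emit.
public class RoomNumberPojo {
    private final int begin;
    private final int end;
    private final String building;

    public RoomNumberPojo(int begin, int end, String building) {
        this.begin = begin;
        this.end = end;
        this.building = building;
    }

    public int getBegin() { return begin; }
    public int getEnd() { return end; }
    public String getBuilding() { return building; }

    public static void main(String[] args) {
        // Today this copy is written by hand, field by field, from a
        // CAS-backed annotation; the proposal would generate both classes
        // plus the bridge methods between them.
        RoomNumberPojo room = new RoomNumberPojo(10, 16, "Hawthorne");
        System.out.println(room.getBuilding() + " " + room.getBegin() + "-" + room.getEnd());
    }
}
```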
42. Summary
UIMA is a production-ready framework for unstructured
information processing.
UIMA is a framework, and it contains few or no annotators.
It is an efficient framework that requires commitment on
behalf of its practitioners.
Outlook
As an open source project, new contributors are always
welcome.
There are a number of things I am personally interested in
working on with other people interested in UIMA.