The document describes contributions to building a corpora-flow system: tools for book cleaning, detection of duplicates and candidate pairs, book synchronization, and alignment evaluation, plus the corpora-flow system itself. It outlines challenges in corpora building, such as format issues, and gives examples of book processing problems with their solutions. Key steps in the tools involve generating a sections ontology, measuring similarity to find duplicates and pairs, matching section delimiters for synchronization, and comparing alignments.
Slides
1. Contributions for building a
Corpora-Flow system
André Santos
andrefs@cpan.org
Informatics Engineering MSc
University of Minho
December 2011
2. Concepts
Aligned parallel corpus: Set of parallel texts in
which correspondences have been marked
between blocks (paragraphs, sentences,
words, . . . ) from each text.
Corpora-flow: Adaptation of the concept of
workflow to the several tasks, decisions
and sequences of steps involved in the
process of building a corpus.
This presentation and the underlying master thesis
describe the implementation of several tools to be
used in typical corpus building activities.
4. Context
The work developed in the context of this master
thesis was motivated and supported by
Project Per-fide, an ongoing project at the
University of Minho which aims to build large
parallel corpora between Portuguese and six other
languages.
5. Corpora building challenges
file format and format conversion
finding duplicated files
text encoding format
structural residues
section delimiters
unpaired sections (parallel corpora)
...
6. Corpora building challenges
Severe problems which often lead to bad results
Many (most?) of them are hard/impossible to
solve completely
Find the problem and report it when it is not
solvable automatically
Provide intelligent ways of describing what was
found and done
7. 5 key issues
Book cleaning
Duplicates and candidate pairs detection
Book synchronization
Alignment evaluation
Corpora-flow system
8. Book processing problems – Motivation
(...) d <92>’ entrée, donnant accès dans la salle commune.
Une légère véranda, qui en proté-
M
<96>- 86 <96>-
^L geait la partie antérieure contre l <92>’ action
des rayons solaires, reposait sur de sveltes bambous. (...)
La Jangada, Jules Verne
<92>’ : right single quot. mark (CP1252)
<96>- : en dash (CP1252)
^L : page break (0xC)
proté-(...)geait : transpagination
After cleaning:
(...) d ’ entrée, donnant accès dans la salle commune.
Une légère véranda, qui en protégeait _pb1_
la partie antérieure contre l ’ action
des rayons solaires, reposait sur de sveltes bambous. (...)
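The cleaning illustrated above can be sketched as follows. This is a minimal Python illustration, not the Text::Perfide::BookCleaner implementation (which is a Perl module): the function name `clean_book`, the page-number pattern, and the rule for rejoining a word split across pages are assumptions made for the example. It assumes the input is CP1252-encoded, pages are separated by form feeds (0xC), and a page-number line looks like `– 86 –`.

```python
import re

# Assumed shape of a page-number line: a dash, digits, a dash (e.g. "– 86 –").
PAGE_NUMBER = re.compile(r"^\s*[\u2013-]\s*\d+\s*[\u2013-]\s*$")


def clean_book(raw: bytes) -> str:
    """Decode CP1252 text, drop page-number lines, mark page breaks as _pbN_,
    and rejoin words hyphenated across a page break (transpagination)."""
    text = raw.decode("cp1252")  # 0x92 -> right single quote, 0x96 -> en dash
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    pages = text.split("\f")  # form feed (0xC) separates pages

    out = pages[0]
    for n, page in enumerate(pages[1:], start=1):
        # Remove trailing blank lines and page-number lines before the break.
        lines = out.rstrip("\n").split("\n")
        while lines and (not lines[-1].strip() or PAGE_NUMBER.match(lines[-1])):
            lines.pop()
        out = "\n".join(lines)

        page = page.lstrip()
        mark = f"_pb{n}_"
        if out.endswith("-"):
            # Word split across pages: glue the two halves, then place the mark.
            # Caveat: a genuine hyphen at the end of a page is joined as well.
            head, _, rest = page.partition(" ")
            out = out[:-1] + head + " " + mark + "\n" + rest
        else:
            out = out + "\n" + mark + "\n" + page
    return out
```

Applied to the excerpt from La Jangada, the sketch joins `proté-` and `geait` into `protégeait` and leaves the `_pb1_` mark where the page break was, which mirrors the cleaned text shown on the slide.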
11. Book cleaning
Subdivided into several steps: [figure: diagram of the cleaning steps]
12. Sections ontology
contains common section types
used to automatically generate the code to recognize section delimiters
allows discussion/cooperation with people with no programming knowledge
code becomes simpler and cleaner
Ontology excerpt:
chap
  PT capítulo, cap, capitulo
  FR chapitre, chap
  EN chapter, chap
  NT sec
end
  PT fim
  FR fin
  EN the_end
  BT _alone
scene
  PT cena
  FR scène
  EN scene
  RU глава
  BT act
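As an illustration of how an ontology like this could drive delimiter recognition, here is a minimal Python sketch. It is not the code that Text::Perfide::BookCleaner generates: the dictionary layout, the names `compile_recognizers` and `classify_line`, and the reading of `_` in `the_end` as "whitespace" are all assumptions made for the example. Only the terms themselves come from the slide.

```python
import re

# Terms copied from the ontology excerpt; the dict layout is hypothetical.
ONTOLOGY = {
    "chap": {"PT": ["capítulo", "cap", "capitulo"],
             "FR": ["chapitre", "chap"],
             "EN": ["chapter", "chap"]},
    "end": {"PT": ["fim"], "FR": ["fin"], "EN": ["the_end"]},
    "scene": {"PT": ["cena"], "FR": ["scène"], "EN": ["scene"]},
}


def compile_recognizers(ontology):
    """Build one case-insensitive regex per section type from its terms."""
    recognizers = {}
    for section_type, by_lang in ontology.items():
        terms = {term for terms in by_lang.values() for term in terms}
        # Longest terms first so "chapitre" is tried before "chap".
        ordered = sorted(terms, key=lambda t: (-len(t), t))
        # Assumption: "_" in a term stands for whitespace ("the_end").
        alternation = "|".join(re.escape(t).replace("_", r"\s+") for t in ordered)
        # A delimiter line is a term, an optional dot, and an optional label.
        recognizers[section_type] = re.compile(
            rf"(?:{alternation})\.?(?:\s+(\w+))?", re.IGNORECASE
        )
    return recognizers


def classify_line(line, recognizers):
    """Return (section_type, label) if the whole line is a delimiter, else None."""
    stripped = line.strip()
    for section_type, pattern in recognizers.items():
        match = pattern.fullmatch(stripped)
        if match:
            return section_type, match.group(1)
    return None
```

Because the recognizers are derived from the data, adding a language or a synonym means editing the ontology rather than the code, which is the point the slide makes about cooperation with non-programmers.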
13. Duplicates and pairs detection
Motivation
Duplicates can result in a biased corpus
Finding candidate pairs for alignment
Language independent elements (LIEs)
terms which are usually kept untranslated
year references – “1973”
proper names – “Hamlet”
Measuring similarity
$\mathrm{similarity}(A, B) = \dfrac{|A_{\mathrm{LIEs}} \cap B_{\mathrm{LIEs}}|}{|A_{\mathrm{LIEs}} \cup B_{\mathrm{LIEs}}|}$
Thresholds
< 0.2: unrelated
> 0.4: pair
> 0.9: duplicates
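The similarity measure is the Jaccard index over the two sets of language independent elements. A minimal Python sketch follows; it is not the Text::Perfide::BookPairs code. The LIE extractor (four-digit numbers and capitalized ASCII words) is a deliberately crude assumption, and the label `undecided` for scores between 0.2 and 0.4 is also an assumption, since the slide does not name that range.

```python
import re


def extract_lies(text):
    """Crude stand-in for LIE extraction: 4-digit numbers and capitalized words."""
    return set(re.findall(r"\b(?:\d{4}|[A-Z][a-z]+)\b", text))


def similarity(lies_a, lies_b):
    """Jaccard index: size of the intersection over size of the union."""
    union = lies_a | lies_b
    if not union:
        return 0.0
    return len(lies_a & lies_b) / len(union)


def classify(score):
    """Apply the thresholds from the slide."""
    if score > 0.9:
        return "duplicates"
    if score > 0.4:
        return "pair"
    if score < 0.2:
        return "unrelated"
    return "undecided"  # assumption: the slide leaves 0.2 to 0.4 unnamed
```

For example, an English and a Portuguese sentence that share "Hamlet" and "1973" but differ in "London" and "Londres" have 2 shared elements out of 4, giving 0.5, which falls in the candidate pair range.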
14. Book synchronization
Definition
Structural alignment at section level, based on
previously added section delimiting marks.
Motivation
Some aligners cannot handle large documents
Section delimiters can act as anchor points
Unpaired sections can be discarded
Implementation
match similar section delimiters
synchronization points
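One way to picture the matching step is as a sequence alignment over the section delimiters of the two books. The Python sketch below uses the standard library's `difflib.SequenceMatcher`; this is a swapped-in technique chosen for illustration, not the algorithm used by Text::Perfide::BookSync. It also matches delimiters by exact equality, whereas the slide speaks of similar delimiters, and the `(type, label)` tuples are a hypothetical representation.

```python
from difflib import SequenceMatcher


def sync_points(delims_a, delims_b):
    """Return index pairs (i, j) where delimiter i of book A matches delimiter j
    of book B, in document order. Unmatched delimiters are left out."""
    matcher = SequenceMatcher(None, delims_a, delims_b, autojunk=False)
    points = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive matching delimiters.
        for offset in range(block.size):
            points.append((block.a + offset, block.b + offset))
    return points
```

With book A holding chapters 1, 2, 3 and an end mark, and book B holding chapters 1, 3 and an end mark, the result pairs chapter 1 with chapter 1, chapter 3 with chapter 3, and the two end marks; chapter 2 of book A stays unpaired and can be discarded or reported.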
15. Output
pair of files with synchronization marks
pair of files divided into smaller pairs of chunks
text report
synchronization matrix
17. Alignment evaluation
Motivation
compare alignments of the same documents
(performed by different tools, with different options, . . . )
determine if an alignment was successful
Comparing alignments
parse TMX files and output the total number of
correspondences of each type:
0:1/1:0, 1:1, 2:1/1:2 and 2:2
evaluate the other tools developed
compare the performance of the available
alignment tools
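Counting correspondence types could look roughly like the Python sketch below. It is not the Text::Perfide::TMX::Utils code, and it rests on a strong assumption: the number of sentences on each side of a translation unit is estimated by naively splitting the segment text after `.`, `!` or `?`. How the original tool determines the type of a correspondence is not stated in the slides. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def count_sentences(text):
    """Naive sentence count: split after ., ! or ? followed by whitespace."""
    text = (text or "").strip()
    if not text:
        return 0
    return len(re.split(r"(?<=[.!?])\s+", text))


def correspondence_counts(tmx_xml):
    """Count translation units by type, e.g. '1:1', '2:1', '1:0'."""
    root = ET.fromstring(tmx_xml)
    counts = Counter()
    for tu in root.iter("tu"):
        segments = [tuv.findtext("seg") for tuv in tu.findall("tuv")]
        if len(segments) != 2:
            continue  # the sketch only handles bilingual units
        left, right = (count_sentences(s) for s in segments)
        counts[f"{left}:{right}"] += 1
    return counts
```

Running the same counter over two TMX files produced from the same documents gives a quick profile of each alignment, for instance a much larger share of 1:0 units in one of them.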
19. Alignment evaluation
Determine if an alignment was successful
Summarize a TMX by sampling. Sampling can
be performed based on:
number of samples desired
explicit sampling points
translation units which match a given regular
expression
Output is a (much?) smaller TMX file
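The three sampling modes can be sketched in Python as a function that selects which translation units to keep. This is an illustration under assumptions, not the Text::Perfide::TMX::Utils interface: the function name, the parameter names, the representation of units as `(source, target)` string pairs, and the choice of evenly spaced samples when only a count is given are all hypothetical.

```python
import re


def sample_indices(units, n=None, points=None, pattern=None):
    """Pick indices of translation units to keep in the summary.

    units   -- list of (source_text, target_text) pairs
    points  -- explicit sampling points (indices); out-of-range ones are ignored
    pattern -- keep units whose source or target matches this regular expression
    n       -- number of samples desired (assumption: evenly spaced)
    """
    total = len(units)
    if points is not None:
        return [i for i in points if 0 <= i < total]
    if pattern is not None:
        regex = re.compile(pattern)
        return [i for i, (src, tgt) in enumerate(units)
                if regex.search(src) or regex.search(tgt)]
    if n is not None and n > 0:
        if n >= total:
            return list(range(total))
        step = total / n
        return [int(k * step) for k in range(n)]
    return list(range(total))
```

Writing only the selected units back out yields the smaller TMX file mentioned above, which a person can inspect to judge whether the alignment succeeded.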
21. Alignment evaluation
Adson (DE) = Адсо (RU)
The Name of the Rose, Umberto Eco
24. Distribution
All the tools implemented as Perl modules:
Text::Perfide::BookCleaner
Text::Perfide::BookPairs
Text::Perfide::BookSync
Text::Perfide::TMX::Utils
publicly available on CPAN
including tests and documentation
additional effort required to make code
installable and usable by other people
25. Corpora-flow
Motivation
building a corpus is a complex task
linear pipeline is not powerful enough
Workflow:
  states
  actions
  conditions
  context
Makefiles:
  file-oriented
  timestamps and dependencies
  fail-fast and resumable execution
  parallelization
26. Corpora-flow
workflow + Makefiles = corpora-flow
DSL (→ Slay::Makefile)
workflow: rule*
rule: pre-condition* action post-condition*
action: targets dependencies function
condition: filename function
target: pattern*
dependencies: pattern*
function: Perl code
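To make the grammar more concrete, here is a small Python sketch of how a rule with pre-conditions, an action and post-conditions might be executed with Makefile-style timestamp checks. It is not the corpora-flow DSL and does not use Slay::Makefile; the `Rule` class, `is_stale` and `run` are hypothetical names. It also simplifies the grammar: targets and dependencies are concrete file names rather than patterns, conditions are zero-argument functions, and rules run in list order with no dependency resolution or parallelization.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    targets: List[str]
    dependencies: List[str]
    function: Callable[[], None]          # the action's code
    pre: List[Callable[[], bool]] = field(default_factory=list)
    post: List[Callable[[], bool]] = field(default_factory=list)


def is_stale(rule):
    """Makefile logic: rebuild if a target is missing or older than a dependency."""
    if not all(os.path.exists(t) for t in rule.targets):
        return True
    if not rule.dependencies:
        return False
    newest_dep = max(os.path.getmtime(d) for d in rule.dependencies)
    oldest_target = min(os.path.getmtime(t) for t in rule.targets)
    return newest_dep > oldest_target


def run(rules):
    """Run stale rules in order; stop at the first failed condition (fail-fast).
    Returns the target lists of the rules that were executed."""
    executed = []
    for rule in rules:
        if not is_stale(rule):
            continue  # up to date, so a rerun resumes where it stopped
        if not all(check() for check in rule.pre):
            raise RuntimeError(f"pre-condition failed for {rule.targets}")
        rule.function()
        if not all(check() for check in rule.post):
            raise RuntimeError(f"post-condition failed for {rule.targets}")
        executed.append(rule.targets)
    return executed
```

A second call to `run` with the same rules executes nothing as long as the targets are at least as recent as their dependencies, which is the resumable behaviour the Makefile side contributes; the conditions around each action are what the workflow side adds.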
27. Conclusions
Evaluation of the tools has shown that they do
help to solve problems
Most of the methods devised can be applied in
other contexts
Working within a larger project:
provides requirements and resources
specific needs and priorities
making code available to other people:
requires additional effort
gives meaning to the work
external contributions
Higher level objects help to organize and
discuss
37. Future work
Document cleaners
other types of documents (e.g. scientific
articles)
algorithm for finding section delimiters with
notion of hierarchy
create ebooks/bilingual books
Duplicates and pair detection
list of correspondences (e.g. Adson → Адсо,
London → Londres)
calculate best threshold values in real time
38. Future work
Document synchronization
interactive mode
improvements on synchronization matrix and
metrics
hierarchical sections
other section alignment algorithms
Corpora-flow
finish specification and implementation
implement a corpora-flow for Project Per-fide