|0|
Architect
@simonech
Simone Chiaretta
Fast and furious(ly) multilingual:
Publishing of EU politics in 24
languages
Council of the European Union
General Secretariat
Directorate-General Administration
Directorate Communication and Information Systems
Unit Design & Development
Disclaimer: The views expressed are solely those of the speaker and may not be regarded as stating an official position of the Council of the EU
Clause de non-responsabilité: Les avis exprimés n'engagent que leur auteur et ne peuvent être considérés comme une position officielle du Conseil de l'UE
Umbraco Specialist
@netaddicts
Dirk De Grave
|1|
• One of the 3 main EU institutions together with
Commission and Parliament
• Made of two Councils
– Council of the European Union
• meetings of ministers of each EU country
– European Council
• Heads of state of each EU member state
• Rotating presidency (different Member State every 6
months)
• 28 Countries
• 24 Official Languages
Council of the European Union
|2|
Consilium.Europa.EU
|3|
• Report on the work of the European Council, the Council
of the EU, the Eurogroup and their presidents to citizens
of the EU
• Inform the public and media with press releases and news
• List all meetings and meeting conclusions
Goal of the site
|4|
• Why we moved to Umbraco
• How we work
• How we deploy
• Scalability and system
• Our beautiful editor experience
• Standards-based translation in 24 languages
• Integration with legacy and external systems
• Full-Text search
• Import from old CMS
Agenda
|5|
Why Umbraco
|6|
• pre-2011: Custom CMS
• 2011: Umbraco v4
• Jan 2015: Redesign + Commercial CMS
• 2017 Q3: Umbraco v7
Why we moved to Umbraco
|7|
• Independent study from CMS expert
• PoC done with multiple CMS
– Umbraco
– Drupal @ EC
– EPiServer
• Internal evaluation
Decision Process
|8|
• Faster editing and publishing process
• Simple editorial experience
• Better integration with translation tools
• Better search
• Able to handle a team of 30-ish editors
Expectations
|9|
• Umbraco is not multi-lingual by default
• Default translation flow is obsolete and weak
• Integration of legacy satellite applications
• CI and deployment
• Import from old CMS
Challenges
|10|
How we work
|11|
• Multi-discipline team: 2 analysts, 3 frontend
devs, 6 backend devs
• Scrum/sprint planning/daily standups
• Atlassian stack on premise as collaboration
tools for:
- Analysis (Confluence)
- Development (BitBucket)
- CI/CD (Bamboo)
- Sprint planning/issue tracking (Jira)
Development/team setup
|12|
Development/team setup
(Atlassian stack)
|13|
• Few options
– Umbraco as a Service #UaaS
– Shared database development
All developers use the same database for development
(doctypes/datatypes/…)
– Local database development
Each individual developer uses a local database,
either Sql server or Sql Server CE
Umbraco development setup
|14|
Decision ?
• Umbraco as a Service = Nay
• Shared development = NO
– Perfect for a single environment (eg. DEV)
– People don’t need to sync any metadata nor content
– Candidate to a cluttered database if someone forgets to
delete any metadata or content that is not part of the
solution
• Local database development = YES
– People will need to sync all metadata and “relevant” content
– Perfect fit for proof of concept’ing (switching between Sql
server/Sql server CE)
– Perfect fit for continuous integration with multiple
environments if we can find a way to synchronize metadata
and content
Umbraco development setup
|15|
Lightweight; can easily be fine-tuned to sync only the
minimal settings needed to get your environments into
a clean state, for both metadata and content; can be
automated
Challenges ?
- How to handle media efficiently?
- Dealing with exotic datatypes
- Long path names in continuous environment
uSync
|16|
.Core project (Business logic) / .Core.Tests
project
Startup configuration (DI = Unity, IoC, event handling)
PropertyValueConverters
ModelsBuilder
Model customizations
Controllers (Route hijacking all the things)
Services
Automapper
ViewModels
Project/solution setup
|17|
.Web project
Default Umbraco installation
Minimal changes allowed (.config) to smooth upgrades
App_Plugins for custom built and 3rd party packages (Nuget/Private
Nuget) even for packages from the online repository
.Frontend project
All things related to UI (views/js/css)
Frontend team uses their own workflow to generate assets which are
copied into the .Web project
.Resources project
Legacy dictionary
Project/solution setup
|18|
Workflow = GitFlow
Main develop branch
Each feature/bugfix = separate branch
PR with approval = Merge
Merge vs rebase!
Strict rules in implementing features
Features must be small
Changes unrelated to feature = rejected
Every feature is discussed upfront
Commits / commit messages / PR
messages must be very clear
|19|
Workflow = GitFlow
|20|
How we deploy
|21|
Build plan kicks in for every commit on feature/bugfix
branch pushed to remote repository
- Build must be successful
- All related tests must pass
- Continuous code quality assurance (SonarQube)
Build plan is only responsible for creating the
required artifacts
Build plan will never change anything
(files/configurations)
Build/Deployment pipeline (Bamboo)
|22|
Different build plans for DEV/TEST and STA/PROD
DEV/TEST = 1 single artifact
STA/PROD = 2 artifacts, 1x frontend and 1x backend
Build/Deployment pipeline (Bamboo)
|23|
Build/Deployment pipeline (Bamboo)
|24|
• A build becomes a candidate for “release” only
if it has completed without errors
• Release plan takes care of moving the artifacts
from the build to your “destination”
environment
• Release plan is also responsible for
configuring the environment (web.config
transformations, uSync)
• Release can be automated (DEV) or is a
manual process (TEST/STA/PROD)
Build/Deployment pipeline (Bamboo)
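The "configure the environment" step of a release plan can be illustrated with a small sketch. This is not the team's Bamboo setup and it is not the XDT transform syntax that web.config transformations normally use; it is a simplified stand-in that overrides `appSettings` values per environment. The keys `Environment`, `SearchApiUrl` and `NewKey` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical base configuration, as it would leave the build plan untouched.
BASE_CONFIG = """<configuration>
  <appSettings>
    <add key="Environment" value="DEV" />
    <add key="SearchApiUrl" value="http://localhost:9200" />
  </appSettings>
</configuration>"""


def apply_overrides(config_xml, overrides):
    """Return config_xml with appSettings values replaced or added."""
    root = ET.fromstring(config_xml)
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    existing = {node.get("key"): node for node in app_settings.findall("add")}
    for key, value in overrides.items():
        if key in existing:
            # Key already present: only the value changes.
            existing[key].set("value", value)
        else:
            # New key for this environment: append an <add> element.
            ET.SubElement(app_settings, "add", {"key": key, "value": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(apply_overrides(BASE_CONFIG, {"Environment": "TEST"}))
```

Keeping the overrides in the release plan, rather than in the build, matches the rule that the build plan never changes files or configurations.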
|25|
Build/Deployment pipeline (Bamboo)
|26|
System architecture
|27|
• Back-office shielded from internet
• Instant publishing of content
• Performance and availability
Security and publishing
|28|
Systems and caching
[Diagram: Umbraco CMS production environment. Components: Internet, Alteon load balancer, Varnish cache servers, Umbraco IIS web servers, Windows file share cluster, database cluster, authoring/back office. Connection labels: HTTP, HTTP/HTTPS, SQL, SMB]
|29|
• 3 level caching
1. ASP.NET and Umbraco caching
2. Varnish
3. CloudFlare (future)
Caching
|30|
• Reverse Proxy
• Caching based on HTTP Headers
• Behavior configurable with a DSL
• Possible to invalidate individual pages
Varnish
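To make "caching based on HTTP headers" and "invalidate individual pages" concrete, here is a toy in-memory model of what a reverse-proxy cache does. It is not Varnish and not VCL; the class name `PageCache` and its methods are made up for illustration, and only `Cache-Control` is considered.

```python
import re
import time


class PageCache:
    """Toy model of a header-driven reverse-proxy cache (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # url -> (expires_at, body)

    @staticmethod
    def ttl_from_headers(headers):
        """Return the cache lifetime in seconds, or 0 if not cacheable."""
        cache_control = headers.get("Cache-Control", "").lower()
        if "no-store" in cache_control or "private" in cache_control:
            return 0
        # s-maxage targets shared caches, so it is checked before max-age.
        for directive in ("s-maxage", "max-age"):
            match = re.search(directive + r"=(\d+)", cache_control)
            if match:
                return int(match.group(1))
        return 0

    def store(self, url, headers, body):
        """Cache the response body if the headers allow it."""
        ttl = self.ttl_from_headers(headers)
        if ttl > 0:
            self._entries[url] = (self._clock() + ttl, body)

    def lookup(self, url):
        """Return the cached body, or None on a miss or an expired entry."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[url]
            return None
        return body

    def invalidate(self, url):
        """Drop a single page, e.g. right after it is republished."""
        return self._entries.pop(url, None) is not None
```

Per-page invalidation is what makes "instant publishing" compatible with long cache lifetimes: the CMS can evict just the page that changed.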
|31|
CloudFlare
|32|
Making editors happy
|33|
• In-page editing experience
• Find content easily (even with 1000’s of
nodes)
Main requirements
|34|
Predecessor cms editing experience
|35|
Predecessor cms editing experience
|36|
Grid / NestedContent /
DocTypeGridEditor / Customized Vorto
|37|
Grid editing
|38|
Grid template output
|39|
Grid template customization
|40|
Grid settings
|41|
Custom content picker (with preview)
|42|
Listview (visualsearch.js)
|43|
24 languages in a box
|44|
|45|
• 1-1 translation of 24 languages
• Batch management of languages
• Localize just the minimum needed
• Export to industry standard XLIFF format
• Automatic import of translation
Requirements
|46|
• XML Localization Interchange File Format
• The only open standard bitext format
• OASIS standard since 2008
• Supported by all professional CAT tools on the
market
• Bitext is a file that contains both source and
target languages correctly « aligned »
What is XLIFF
|47|
Tyger Tyger, burning bright,
Tigre! Tigre! Divampante fulgore
In the forests of the night;
Nelle foreste della notte,
What immortal hand or eye,
Quale fu l'immortale mano o l'occhio
Could frame thy fearful symmetry?
Ch'ebbe la forza di formare la tua agghiacciante simmetria?
William Blake / Giuseppe Ungaretti
What is bitext
|48|
Tyger Tyger, burning bright, Tigre! Tigre! Divampante fulgore
In the forests of the night; Nelle foreste della notte,
What immortal hand or eye, Quale fu l'immortale mano o l'occhio
Could frame thy fearful symmetry? Ch'ebbe la forza di formare la tua
agghiacciante simmetria?
William Blake Giuseppe Ungaretti
What is bitext
|49|
msgid "Tyger Tyger, burning bright,"
msgstr "Tigre! Tigre! Divampante fulgore"
msgid "In the forests of the night;"
msgstr "Nelle foreste della notte,"
msgid "What immortal hand or eye,"
msgstr "Quale fu l'immortale mano o l'occhio"
msgid "Could frame thy fearful symmetry?"
msgstr "Ch'ebbe la forza di formare la tua agghiacciante simmetria?"
What is bitext
|50|
<source>Tyger Tyger, burning bright,</source>
<target>Tigre! Tigre! Divampante fulgore</target>
<source>In the forests of the night;</source>
<target>Nelle foreste della notte,</target>
<source>What immortal hand or eye,</source>
<target>Quale fu l'immortale mano o l'occhio</target>
<source>Could frame thy fearful symmetry?</source>
<target>Ch'ebbe la forza di formare la tua agghiacciante
simmetria?</target>
What is bitext
|51|
<source xml:lang="EN">Tyger Tyger, burning bright,</source>
<target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target>
<source xml:lang="EN">In the forests of the night;</source>
<target xml:lang="IT">Nelle foreste della notte,</target>
<source xml:lang="EN">What immortal hand or eye,</source>
<target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target>
<source xml:lang="EN">Could frame thy fearful symmetry?</source>
<target xml:lang="IT">Ch'ebbe la forza di formare la tua
agghiacciante simmetria?</target>
What is bitext
|52|
<unit id="1">
<segment>
<source xml:lang="EN">Tyger Tyger, burning bright,</source>
<target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target>
</segment>
<segment>
<source xml:lang="EN">In the forests of the night;</source>
<target xml:lang="IT">Nelle foreste della notte,</target>
</segment>
<segment>
<source xml:lang="EN">What immortal hand or eye,</source>
<target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target>
</segment>
<segment>
<source xml:lang="EN">Could frame thy fearful symmetry?</source>
<target xml:lang="IT">Ch'ebbe la forza di formare la tua
agghiacciante simmetria?</target>
</segment>
</unit>
What is bitext
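The unit/segment structure above can be produced and read back with a few lines of code. The sketch below is a simplification: it leaves out the XLIFF namespace, the enclosing file element and the `xml:lang` attributes, and the helper names `build_unit` and `read_pairs` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Aligned source/target lines taken from the slides.
ALIGNED = [
    ("Tyger Tyger, burning bright,", "Tigre! Tigre! Divampante fulgore"),
    ("In the forests of the night;", "Nelle foreste della notte,"),
]


def build_unit(unit_id, pairs):
    """Build a <unit> element with one <segment> per aligned pair."""
    unit = ET.Element("unit", {"id": str(unit_id)})
    for source_text, target_text in pairs:
        segment = ET.SubElement(unit, "segment")
        ET.SubElement(segment, "source").text = source_text
        ET.SubElement(segment, "target").text = target_text
    return unit


def read_pairs(unit):
    """Return the (source, target) pairs of a <unit>, in document order."""
    return [
        (seg.findtext("source"), seg.findtext("target"))
        for seg in unit.findall("segment")
    ]


if __name__ == "__main__":
    print(ET.tostring(build_unit(1, ALIGNED), encoding="unicode"))
```

Because source and target live in the same segment, the alignment survives the round trip through the translation tool.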
|53|
• Linked trees
– PRO: default Umbraco approach to localization
– CONS: everything else
• Nested nodes
– PRO: meaningful history, easier to manage programmatically
– CONS: not possible to sync grid structure between
languages, needs for custom batch publishing actions (and
much more)
• Vorto
– PRO: just localize what’s needed, just one node per content,
one grid structure for all
– CONS: loss of meaningful history, needs for custom
publishing “flag” per language, more difficult to manage
programmatically
3 options
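The third option (one node, per-language values, a publishing "flag" per language) can be pictured as a single JSON value per property. The shape used below (`values` plus a `published` list) and the fallback to a default language are assumptions made for illustration; they are not the actual storage format of Vorto or of the customised version described in the talk.

```python
import json


def localized_value(stored_json, language, fallback="en"):
    """Return the published value for language, else the fallback, else None."""
    data = json.loads(stored_json)
    values = data.get("values", {})
    published = set(data.get("published", []))
    for candidate in (language, fallback):
        # A language counts only if it is flagged as published and non-empty.
        if candidate in published and values.get(candidate):
            return values[candidate]
    return None


if __name__ == "__main__":
    stored = json.dumps({
        "values": {"en": "Press release", "it": "Comunicato stampa"},
        "published": ["en", "it"],
    })
    print(localized_value(stored, "it"))
```

Storing all languages in one value is what keeps a single grid structure for every language, at the cost of per-language history.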
|54|
• Vorto (customised)
• Custom “vorto-like” grid editor
• Custom translation component
Solution
|55|
Customised Vorto
|56|
Vorto in the grid
|57|
Custom translation flow (1)
|58|
Custom translation flow (1)
|59|
Custom translation flow (2)
|60|
Custom translation flow (3)
|61|
Extraction
|62|
Extraction
Umbraco
Generic document structure
Initial XLIFF (with HTML markup)
Split paragraphs and extract inline code
Segmentation
Apply Translation Memory
Off to Translation Workflow (SDL Studio)
Enrich with custom extensions
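Three of the steps above (split paragraphs, extract inline code, segmentation) can be sketched as follows. This is a rough illustration, not the XliffLib implementation: it uses regular expressions instead of a real HTML parser, only knows a handful of inline tags, and uses a naive sentence rule. All function names are made up.

```python
import re

PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)
INLINE_TAG = re.compile(r"</?(?:b|i|em|strong|a)\b[^>]*>")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_paragraphs(html):
    """Return the inner content of each <p> element."""
    return [inner.strip() for inner in PARAGRAPH.findall(html)]


def extract_inline_codes(text):
    """Replace inline tags with numbered placeholders; return text and tags."""
    codes = []

    def replace(match):
        codes.append(match.group(0))
        return "{%d}" % len(codes)

    return INLINE_TAG.sub(replace, text), codes


def segment(text):
    """Split a paragraph into sentences after ., ! or ? followed by whitespace."""
    return [part for part in SENTENCE_END.split(text) if part]


if __name__ == "__main__":
    html = "<p>The <b>Council</b> met today. Ministers agreed.</p>"
    for paragraph in split_paragraphs(html):
        text, codes = extract_inline_codes(paragraph)
        print(segment(text), codes)
```

Replacing markup with placeholders keeps translators away from the HTML while still letting the merge step put the tags back in the right place.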
|63|
• Complete the system
• Make the generic Extraction/Merging library
OpenSource
• Integrate the Umbraco specific
extraction/merging into Umbraco Core
https://github.com/simonech/XliffLib
Future steps
|64|
Integrations
|65|
List of internal/external systems to
interact with
• PoolParty to enrich your content with valuable
metadata (taxonomies)
• Oracle database (MPO Meetings/Meeting planner)
• (TV)Newsroom
– Video API
– Asset/image library
• Rss feeds/twitter feeds
|66|
PoolParty
|67|
PoolParty
|68|
PoolParty
Why?
- Exchange taxonomy between different units within the EU
Council, or even more… with the outside world and vice versa
Example:
A “Location” taxonomy may already exist “somewhere”, so we
should be able to transparently reference this taxonomy without the
need to create a new one
Europe > Belgium > Brussels capital region > Brussels > …
|69|
PoolParty
Automated tagging ?
• Automated content tagging is possible using a 3rd party solution “Powertagging
with Umbraco and PoolParty”
• Didn’t really fit our requirements (legacy data, “taxonomy” currently not
semantically normalized)
Solution ?
On demand synchronization from PoolParty to Umbraco
• Limited number of syncs (~1/month)
• One way sync from PoolParty -> Umbraco
• Don’t rely on server availability
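A one-way, on-demand sync can be sketched as "replace the local copy with the latest snapshot and report what changed". The function below assumes the taxonomy has already been exported from PoolParty into a plain dictionary of concept id to label; how that export is obtained is outside this sketch, and the names are hypothetical.

```python
def sync_taxonomy(local, snapshot):
    """One-way sync: make local match snapshot and report the differences."""
    local_ids, snapshot_ids = set(local), set(snapshot)
    report = {
        "added": sorted(snapshot_ids - local_ids),
        "removed": sorted(local_ids - snapshot_ids),
        "updated": sorted(
            key for key in local_ids & snapshot_ids if local[key] != snapshot[key]
        ),
    }
    # The snapshot always wins; nothing flows back to the source system.
    local.clear()
    local.update(snapshot)
    return report


if __name__ == "__main__":
    local_copy = {"loc:be": "Belgium"}
    print(sync_taxonomy(local_copy, {"loc:be": "Belgium", "loc:bru": "Brussels"}))
```

Since the website only ever reads the local copy, it keeps working when the taxonomy server is unavailable.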
|70|
PoolParty(Sync process)
|71|
PoolParty (Sync’ed data)
|72|
PoolParty
Challenges ?
• Enrich our sync’ed data with custom “attributes”
Examples:
• Set country flag for specific “location” taxonomies
• Change default descriptions of a taxonomy on the frontend website
– “Council of the European Union” -> “Council of the EU”
Solution ?
• Create a “developer centric” “taxonomy settings section” to create a link
between the sync’ed taxonomy and our custom metadata
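The link between sync'ed taxonomy and custom metadata can be thought of as an overlay keyed by concept id. The sketch below assumes simple dictionaries for both sides; the field names `label` and `flag` and the ids are hypothetical, not the actual data model of the taxonomy settings section.

```python
def display_concept(concept, settings):
    """Merge a sync'ed concept with locally maintained overrides."""
    override = settings.get(concept["id"], {})
    return {
        "id": concept["id"],
        # A local label replaces the default description on the frontend.
        "label": override.get("label", concept["label"]),
        # Extra attributes, such as a country flag, exist only locally.
        "flag": override.get("flag"),
    }


if __name__ == "__main__":
    settings = {"org:council": {"label": "Council of the EU"}}
    concept = {"id": "org:council", "label": "Council of the European Union"}
    print(display_concept(concept, settings))
```

Keeping the overrides in a separate store means a later sync can overwrite the taxonomy without losing the custom attributes.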
|73|
PoolParty (Taxonomy settings)
|74|
Meeting planner data (Oracle db)
External tools used by other departments creating “Meeting”s
Challenges:
• Data stored in external database
• Data is only exposed through readonly views on the Oracle db
• ~3000 meetings currently in system and available online, about ~100 meetings
are created monthly
• Approx. 4 meetings/month need additional content editing before publishing
• Link with the existing sync’ed taxonomy
• Advanced search (date/taxonomy/…)
Do we import this data in Umbraco ?
|75|
Meeting planner data (Oracle db)
|76|
Meeting planner data (Oracle db)
Decisions:
• Don’t import any meeting data in Umbraco (you’ve got everything you need
already)
• Remove connection to Oracle db
• Pushing data from Oracle db view to Sql custom table
• Enrich meeting data at import and store alongside the meeting data in Sql
custom table
• Optimize Sql custom table for max performance (index/…)
• Meetings created in Umbraco must reference a sql record
Result:
• Searching/querying the db is still very fast (Sql table and storage are optimized)
• Content editors can still use Umbraco to add more content
• Don’t bloat the Umbraco system with nodes that add no value
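The decisions above (push rows into a custom table, enrich at import, index for search) can be sketched with SQLite standing in for the Sql Server custom table. The table name, column names and the `configuration` to taxonomy mapping are assumptions for illustration only.

```python
import sqlite3


def create_store(conn):
    """Create the custom meeting table and the indexes used for searching."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS meeting (
               meeting_id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               start_date TEXT NOT NULL,
               taxonomy_id TEXT)"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS ix_meeting_date ON meeting (start_date)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_meeting_taxonomy ON meeting (taxonomy_id)")


def import_meetings(conn, rows, taxonomy_by_configuration):
    """Insert or refresh meetings, attaching the sync'ed taxonomy id at import."""
    for row in rows:
        conn.execute(
            "INSERT OR REPLACE INTO meeting (meeting_id, title, start_date, taxonomy_id) "
            "VALUES (?, ?, ?, ?)",
            (
                row["id"],
                row["title"],
                row["date"],
                taxonomy_by_configuration.get(row["configuration"]),
            ),
        )
    conn.commit()


def meetings_between(conn, first_day, last_day):
    """Return (id, title) for meetings in an inclusive ISO date range."""
    cursor = conn.execute(
        "SELECT meeting_id, title FROM meeting "
        "WHERE start_date BETWEEN ? AND ? ORDER BY start_date",
        (first_day, last_day),
    )
    return cursor.fetchall()
```

Because the import is an upsert keyed on the meeting id, it can be re-run whenever the source view changes without creating duplicates.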
|77|
External asset library (TV)Newsroom
• Most assets are referenced from an internal asset library shared
across multiple teams/units/...
• Some assets are stored externally (Rackcdn.com)
• Still use the media section for all other assets though
Challenges ?
- Images are huge: very high resolution images, >10 MB
- Videos are stored externally; only a public API is available to fetch the content
(and thumbnail previews) (Challenge?)
|78|
External asset library (TV)Newsroom
Solution implemented
• ImageProcessor takes care of retrieving/storing/caching images from multiple
sources, both over http and https
- Requires an .axd service with both http and https endpoints
- Proxy configuration is still a bit flaky (PR?)
• Offloading API request to fetch info from external source to internal server which
will return the results
- Finetune network/security
|79|
Full-text search
|80|
• Support of full-text search in 24 languages
• Boosting of particular elements of the pages
• Indexing of “composition” pages
• Indexing of external sources (PDFs, external
site)
• Fast availability of new/updated pages in the
index
Requirements
|81|
Elastic Search
[Diagram: Frontend, Backend Search API, Elastic Search, Crawler (Apache ManifoldCF); flows labelled Search, Crawling and Notification]
|82|
• Just like Google
• Structured information passed with:
– HTTP headers
• etag: "078de59b16c27119c670e63fa53e5b51"
– Microdata:
<time itemprop="startDate" datetime="2017-06-08T14:45">June 8, 2:45pm</time>
– RDFa
<div profile="http://data.consilium.europa.eu/data/public_voting/rdf/schema/Configuration"
typeof="Article">
<span property="http://data.consilium.europa.eu/data/public_voting/consilium/configuration/agri">Agriculture and
Fisheries</span>
</div>
Crawling
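How a crawler might pick up the microdata shown above can be sketched with the standard library HTML parser. This is only an illustration of reading `itemprop` values; it is not how Apache ManifoldCF is configured, and it ignores RDFa and HTTP headers.

```python
from html.parser import HTMLParser


class MicrodataExtractor(HTMLParser):
    """Collect itemprop values that carry a machine-readable attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if name is None:
            return
        # For <time>, the machine-readable value lives in the datetime attribute.
        if tag == "time" and "datetime" in attributes:
            self.properties[name] = attributes["datetime"]
        # Elements such as <meta> expose their value in the content attribute.
        elif "content" in attributes:
            self.properties[name] = attributes["content"]


def extract_microdata(html):
    """Return a dict of itemprop name to machine-readable value."""
    parser = MicrodataExtractor()
    parser.feed(html)
    return parser.properties


if __name__ == "__main__":
    page = '<time itemprop="startDate" datetime="2017-06-08T14:45">June 8, 2:45pm</time>'
    print(extract_microdata(page))
```

Embedding structured values in the page lets the search index stay decoupled from the CMS: the crawler needs nothing but the published HTML.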
|83|
Import from legacy cms (E-project)
|84|
Migrate “non-structured” content from
Ektron into Umbraco
|85|
• Non-structured = custom legacy xml format
• Storage
– Content: Sql server
– Assets (images/pdf’s): on disk
• Other requirement
• Process of importing content/assets has to be repeatable in a
CI/CD environment
• Iterative development, start small, grow fast
Migrate “non-structured” content from
Ektron into Umbraco
|86|
Looking at two “migration” tools
- CMSImport (@rsoeteman‘s well known package)
- Chauffeur (~Umbraco CLI tool started by @slace)
Migrate “non-structured” content from
Ektron into Umbraco
|87|
Introducing Chauffeur
”Chauffeur is a CLI for Umbraco, it will sit with your Umbraco websites bin folder and give you an
interface to which you can execute commands, known as Deliverables, against your installed
Umbraco instance.”
• Command line: perfect fit for our continuous integration/deployment scenario
• Lightweight: can be easily added or removed from your environments
– Drop assembly in /bin folder and you’re set, remove in production
– Ability to inject any Umbraco service API
– Code once, run anywhere (Build blocks of reusable deliverables)
– Create a chain of deliverables to run from (a .delivery file)
• Restrictions
- Publishing content won’t work!
Migrate “non-structured” content from
Ektron into Umbraco
|88|
Migrate “non-structured” content from
Ektron into Umbraco
|89|
Migrate “non-structured” content from
Ektron into Umbraco
For each content to be migrated
• Get record data out of the legacy Sql server database
• Create new content using Umbraco service API
• Property data transformation using custom object model and Json.net to
serialize to a “json string”
• Set property data on the new content
• Save new content in cms
Challenges
- Grid content (rte content)
- Customized Vorto implementation
- NestedContent / DocTypeGridEditor / Vorto and any possible
combinations
|90|
Migrate “non-structured” content from
Ektron into Umbraco
Deliverable transforms xml into json blob using our custom
data object model and Json.net (simplified example)
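The transformation step can be illustrated in a few lines. The real deliverable is .NET code using a custom object model and Json.net; the sketch below only shows the idea, with hypothetical legacy element names (`title`, `body/p`) and a made-up target shape.

```python
import json
import xml.etree.ElementTree as ET


def legacy_xml_to_json(legacy_xml):
    """Map a legacy XML record onto a small model and serialize it to JSON."""
    root = ET.fromstring(legacy_xml)
    model = {
        "title": (root.findtext("title") or "").strip(),
        "paragraphs": [(p.text or "").strip() for p in root.findall("body/p")],
    }
    # ensure_ascii=False keeps accented characters readable in the stored value.
    return json.dumps(model, ensure_ascii=False)


if __name__ == "__main__":
    record = "<page><title>Press release</title><body><p>First.</p><p>Second.</p></body></page>"
    print(legacy_xml_to_json(record))
```

Going through an intermediate object model, rather than building JSON strings by hand, keeps the output valid for nested editors where one wrong character breaks the property value.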
|91|
Chauffeur references
- https://our.umbraco.org/projects/collaboration/chauffeur/
- https://github.com/aaronpowell/chauffeur
- https://24days.in/umbraco-cms/2015/may-the-tools-be-with-you/
Migrate “non-structured” content from
Ektron into Umbraco
|92|
Conclusion
|93|
• First try to use what’s out of the box or on Our
• If that is not enough, Umbraco can be heavily extended
• Umbraco can be used in “security conscious” entities
Conclusion
|94|
SUPER TAK!
|95|
?
Questions

Fast and furious(ly) multilingual: Publishing of EU politics in 24 languages with Umbraco

  • 1.
    |0| Architect @simonech Simone Chiaretta Fast andfurious(ly) multilingual: Publishing of EU politics in 24 languages Council of the European Union General Secretariat Directorate-General Administration Directorate Communication and Information Systems Unit Design & Development Disclaimer: The views expressed are solely those of the speaker and may not be regarded as stating an official position of the Council of the EU Clause de non-responsabilité: Les avis exprimés n'engagent que leur auteur et ne peuvent être considérés comme une position officielle du Conseil de l'UE Umbraco Specialist @netaddicts Dirk De Grave
  • 2.
    |1| • One ofthe 3 main EU institutions together with Commission and Parliament • Made of two Councils – Council of European Union • meetings of ministries of each EU country – European Council • Head of states of each EU member state • Rotating presidency (different Member State every 6 months) • 28 Countries • 24 Official Languages Council of European Union
  • 3.
  • 4.
    |3| • Report onthe work of the European Council, the Council of the EU, the Eurogroup and their presidents to citizens of the EU • Inform the public and media with press releases and news • List all meetings and meetings’ conclusion Goal of the site
  • 5.
    |4| • Why wemoved to Umbraco • How we work • How we deploy • Scalability and system • Our beautiful editor experience • Standard based translation in 24 languages • Integration with legacy and external systems • Full-Text search • Import from old CMS Agenda
  • 6.
  • 7.
    |6| • pre-2011: CustomCMS • 2011: Umbraco v4 • Jan 2015: Redesign + Commercial CMS • 2017 Q3: Umbraco v7 Why we moved to Umbraco
  • 8.
    |7| • Independent studyfrom CMS expert • PoC done with multiple CMS – Umbraco – Drupal @ EC – EPi Server • Internal evaluation Decision Process
  • 9.
    |8| • Faster editingand publishing process • Simple editorial experience • Better integration with translation tools • Better search • Able to handle a team of 30-ish editors Expectations
  • 10.
    |9| • Umbraco isnot multi-lingual by default • Default translation flow is obsolete and weak • Integration of legacy satellite applications • CI and deployment • Import from old CMS Challenges
  • 11.
  • 12.
    |11| • Multi-discipline team:2 analysts, 3 frontend devs, 6 backend devs • Scrum/sprint planning/daily standups • Atlassian stack on premise as collaboration tools for: - Analysis (Confluence) - Development (BitBucket) - CI/CD (Bamboo) - Sprint planning/issue tracking (Jira) Development/team setup
  • 13.
  • 14.
    |13| • Few options –Umbraco as a Service #UaaS – Shared database development All developers use the same database for development (doctypes/datatypes/…) – Local database development Each individual developer uses a local database, either Sql server or Sql Server CE Umbraco development setup
  • 15.
    |14| Decision ? • Umbracoas a Service = Nay  • Shared development = NO – Perfect for a single environment (eg. DEV) – People don’t need to sync any metadata nor content – Candidate to a cluttered database if someone forgets to delete any metadata or content that is not part of the solution • Local database development = YES – People will need to sync all metadata and “relevant” content – Perfect fit for proof of concept’ing (switching between Sql server/Sql server CE) – Perfect fit for continuous integration with multiple environments if we can find a way to synchronize metadata and content Umbraco development setup
  • 16.
    |15| Lightweight, can beeasily fine-tuned to only sync minimal settings to get your environments in clean state for both metadata as well as content and can be automated Challenges ? - How to handle media efficiently? - Dealing with exotic datatypes - Long path names in continuous environment uSync
  • 17.
    |16| .Core project (Businesslogic) / .Core.Tests project Startup configuration (DI = Unity, IoC, event handling) PropertyValueConverters ModelsBuilder Model customizations Controllers (Route hijacking all the things) Services Automapper ViewModels Project/solution setup
  • 18.
    |17| .Web project Default Umbracoinstallation Minimal changes allowed (.config) to smooth upgrades App_Plugins for custom built and 3rd party packages (Nuget/Private Nuget) even for packages from the online repository .Frontend project All things related to UI (views/js/css) Frontend team uses their own workflow to generate assets which are copied into the .Web project .Resources project Legacy dictionary Project/solution setup
  • 19.
    |18| Workflow = GitFlow Maindevelop branch Each feature/bugfix = separate branch PR with approval = Merge Merge vs rebase! Strict rules in implementing features Features must be small Changes unrelated to feature = rejected Every feature is discussed upfront Commits / commit messages / PR messages must be very clear
  • 20.
  • 21.
  • 22.
    |21| Build plan kicksin for every commit on feature/bugfix branch pushed to remote repository - Build must be successful - All related tests must pass - Continuous code quality assurance (SonarQube) Build plan is only responsible for creating the required artifacts Build plan will never change anything (files/configurations) Build/Deployment pipeline (Bamboo)
  • 23.
    |22| Different build plansfor DEV/TEST and STA/PROD DEV/TEST = 1 single artifact STA/PROD = 2 artifacts, 1x frontend and 1x backend Build/Deployment pipeline (Bamboo)
  • 24.
  • 25.
    |24| • Only ifa build has been completed without errors, it becomes candidate for “release” • Release plan takes care moving the artifacts from the build to your “destination” environment • Release plan is also responsible for configuring the environment (web.config transformations, uSync) • Release can be automated (DEV) or is a manual process (TEST/STA/PROD) Build/Deployment pipeline (Bamboo)
  • 26.
  • 27.
  • 28.
    |27| • Back-office shieldedfrom internet • Instant publishing of content • Performance and availability Security and publishing
  • 29.
    |28| Systems and caching SQL SQL UMBRACOCMS Production Environment Varnish cache servers Umbraco IIS web servers Windows File share cluster HTTP HTTP HTTP HTTP HTTP SQL HTTP SMB SMB Database cluster Internet SQL SQL Authoring/back office HTTP/HTTPS Alteon Load Balancer
  • 30.
    |29| • 3 levelcaching 1. ASP.NET and Umbraco caching 2. Varnish 3. CloudFlare (future) Caching
  • 31.
    |30| • Reverse Proxy •Caching based on HTTP Headers • Behavior configurable with a DSL • Possible to invalidate individual pages Varnish
  • 32.
  • 33.
  • 34.
    |33| • In-page editingexperience • Find content easily (even with 1000’s of nodes) Main requirements
  • 35.
  • 36.
  • 37.
    |36| Grid / NestedContent/ DocTypeGridEditor / Customized Vorto
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
    |45| • 1-1 translationof 24 languages • Batch management of languages • Localize just the minimum need • Export to industry standard XLIFF format • Automatic import of translation Requirements
  • 47.
    |46| • XML LocalizationInterchange File Format • The only open standard bitext format • OASIS standard since 2008 • Supported by all professional CAT tools in the market • Bitext is a file that contains both source and target languages correctly « aligned » What is XLIFF
  • 48.
    |47| Tyger Tyger, burningbright, Tigre! Tigre! Divampante fulgore In the forests of the night; Nelle foreste della notte, What immortal hand or eye, Quale fu l'immortale mano o l'occhio Could frame thy fearful symmetry? Ch'ebbe la forza di formare la tua agghiacciante simmetria? William Blake / Giuseppe Ungaretti What is bitext
  • 49.
    |48| Tyger Tyger, burningbright, Tigre! Tigre! Divampante fulgore In the forests of the night; Nelle foreste della notte, What immortal hand or eye, Quale fu l'immortale mano o l'occhio Could frame thy fearful symmetry? Ch'ebbe la forza di formare la tua agghiacciante simmetria? William Blake Giuseppe Ungaretti What is bitext
  • 50.
    |49| msgid "Tyger Tyger,burning bright," msgstr "Tigre! Tigre! Divampante fulgore" msgid "In the forests of the night;" msgstr "Nelle foreste della notte," msgid "What immortal hand or eye," msgstr "Quale fu l'immortale mano o l'occhio" msgid "Could frame thy fearful symmetry?" msgstr "Ch'ebbe la forza di formare la tua agghiacciante simmetria?" What is bitext
  • 51.
    |50| <source>Tyger Tyger, burningbright,</source> <target>Tigre! Tigre! Divampante fulgore</target> <source>In the forests of the night;</source> <target>Nelle foreste della notte,</target> <source>What immortal hand or eye,</source> <target>Quale fu l'immortale mano o l'occhio</target> <source>Could frame thy fearful symmetry?</source> <target>Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target> What is bitext
  • 52.
    |51| <source xml:lang="EN">Tyger Tyger,burning bright,</source> <target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target> <source xml:lang="EN">In the forests of the night;</source> <target xml:lang="IT">Nelle foreste della notte,</target> <source xml:lang="EN">What immortal hand or eye,</source> <target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target> <source xml:lang="EN">Could frame thy fearful symmetry?</source> <target xml:lang="IT">Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target> What is bitext
  • 53.
    |52| <unit id=1> <segment> <source xml:lang="EN">TygerTyger, burning bright,</source> <target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target> </segment> <segment> <source xml:lang="EN">In the forests of the night;</source> <target xml:lang="IT">Nelle foreste della notte,</target> </segment> <segment> <source xml:lang="EN">What immortal hand or eye,</source> <target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target> </segment> <segment> <source xml:lang="EN">Could frame thy fearful symmetry?</source> <target xml:lang="IT">Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target> </segment> </unit> What is bitext
  • 54.
    |53| • Linked trees –PRO: default Umbraco approach to localization – CONS: everything else  • Nested nodes – PRO: meaningful history, easier to manage programmatically – CONS: not possible to sync grid structure between languages, needs for custom batch publishing actions (and much more) • Vorto – PRO: just localize what’s needed, just one node per content, one grid structure for all – CONS: loss of meaningful history, needs for custom publishing “flag” per language, more difficult to manage programmatically 3 options
  • 55.
    |54| • Vorto (customised) •Custom “vorto-like” grid editor • Custom translation component Solution
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
    |62| Extraction Umbraco Generic document structure InitialXLIFF (with HTML markup) Split paragraphs and extract inline code Segmentation Apply Translation Memory Off to Translation Workflow (SDL Studio) Enrich with custom extensions
  • 64.
    |63| • Complete thesystem  • Make the generic Extraction/Merging library OpenSource • Integrate the Umbraco specific extaction/merging into Umbraco Core https://github.com/simonech/XliffLib Future steps
  • 65.
  • 66.
    |65| List of internal/externalsystem to interact with • PoolParty to enrich your content with valuable metadata (taxonomies) • Oracle database (MPO Meetings/Meeting planner) • (TV)Newsroom – Video API – Asset/image library • Rss feeds/twitter feeds
  • 67.
  • 68.
  • 69.
    |68| PoolParty Why? - Exchange taxonomybetween different units within the EU Council, or even more… with the outside world and vice versa Example: A “Location” taxonomy may already exist “somewhere”, so we should be able to transparently reference this taxonomy without the need to create a new one Europe > Belgium > Brussels capital region > Brussels > …
  • 70.
    |69| PoolParty Automated tagging ? •Automated content tagging is possible using a 3rd party solution “Powertagging with Umbraco and PoolParty” • Didn’t really fit our requirements (legacy data, “taxonomy” currently not semantically normalize) Solution ? On demand synchronization from PoolParty to Umbraco • Limited number of syncs (~1/month) • One way sync from PoolParty -> Umbraco • Don’t rely on server availability
  • 71.
  • 72.
  • 73.
    |72| PoolParty Challenges ? • Enrichour sync’ed data with custom “attributes” Examples: • Set country flag for specific “location” taxonomies • Change default descriptions of a taxonomy on the frontend website – “Council of the European Union” -> “Council of the EU” Solution ? • Create a “developer centric” “taxonomy settings section” to create a link between the sync’ed taxonomy and our custom metadata
  • 74.
  • 75.
    |74| Meeting planner data(Oracle db) External tools used by other departments creating “Meeting”s Challenges: • Data stored in external database • Data is only exposed through readonly views on the Oracle db • ~3000 meetings currently in system and available online, about ~100 meetings are created monthly • Approx. 4 meetings/month need additional content editing before publishing • Link with the existing sync’ed taxonomy • Advanced search (date/taxonomy/…) Do we import this data in Umbraco ?
  • 76.
  • 77.
    |76| Meeting planner data(Oracle db) Decisions: • Don’t import any meeting data in Umbraco (you’ve got everything you need already) • Remove connection to Oracle db • Pushing data from Oracle db view to Sql custom table • Enrich meeting data at import and store alongside the meeting data in Sql custom table • Optimize Sql custom table for max performance (index/…) • Meetings created in Umbraco must reference a sql record Result: • Searching/quering a db still very fast (Optimize sql/storage for optimization) • Content editors can still use Umbraco to add more content • Don’t bloat the Umbraco system with nodes that don’t add any added value
  • 78.
    |77| External asset library (TV) Newsroom
    • Most assets are referenced from an internal asset library shared across multiple teams/units/...
    • Some assets are stored externally (Rackcdn.com)
    • Still use the media section for all other assets though
    Challenges?
    – Images are huge: very high-resolution images, >10 MB
    – Videos are stored externally; only a public API is available to fetch the content (and thumbnail previews) (Challenge?)
  • 79.
    |78| External asset library (TV) Newsroom
    Solution implemented
    • ImageProcessor takes care of retrieving/storing/caching images from multiple sources, over both http and https
    – Requires a .axd service with both http and https endpoints
    – Proxy configuration is still a bit flaky (PR?)
    • Offload the API requests that fetch info from the external source to an internal server, which returns the results
    – Fine-tune network/security
  • 81.
    |80| Requirements
    • Support of full-text search in 24 languages
    • Boosting of particular elements of the pages
    • Indexing of “composition” pages
    • Indexing of external sources (PDFs, external site)
    • Fast availability of new/updated pages in the index
  • 82.
    |81| Elastic Search [architecture diagram; labels: Elastic Search, Backend, Search API, Crawler, Apache Manifold, Frontend, Search, Crawling, Notification]
  • 83.
    |82| Crawling
    • Just like Google
    • Structured information passed with:
    – HTTP headers
    • etag: "078de59b16c27119c670e63fa53e5b51"
    – Microdata: <time itemprop="startDate" datetime="2017-06-08T14:45">June 8, 2:45pm</time>
    – RDFa: <div profile="http://data.consilium.europa.eu/data/public_voting/rdf/schema/Configuration" typeof="Article"> <span property="http://data.consilium.europa.eu/data/public_voting/consilium/configuration/agri">Agriculture and Fisheries</span> </div>
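As a rough illustration of how a crawler could consume these signals, the sketch below compares ETags to decide whether a page needs re-indexing and pulls `itemprop`/`datetime` pairs out of `<time>` elements. It uses only the Python standard library; the function names and the assumption that response headers arrive as a dict with lower-cased keys are hypothetical, not a description of the actual crawler.

```python
from html.parser import HTMLParser


def needs_reindex(stored_etag, response_headers):
    """True when the page's ETag differs from the one recorded at the last crawl.

    Assumes `response_headers` is a plain dict with lower-cased header names.
    """
    return response_headers.get("etag") != stored_etag


class _TimeMicrodata(HTMLParser):
    """Collects itemprop -> datetime pairs from <time> elements."""

    def __init__(self):
        super().__init__()
        self.items = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "time" and "itemprop" in attributes and "datetime" in attributes:
            self.items[attributes["itemprop"]] = attributes["datetime"]


def extract_times(html):
    """Return the structured dates found in a page, keyed by itemprop name."""
    parser = _TimeMicrodata()
    parser.feed(html)
    parser.close()
    return parser.items
```

The extracted values can then be indexed as typed fields (for example a date field) rather than as free text.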
  • 86.
    |85| Migrate “non-structured” content from Ektron into Umbraco
    • Non-structured = custom legacy XML format
    • Storage
    – Content: SQL Server
    – Assets (images/PDFs): on disk
    • Other requirements
    • The process of importing content/assets has to be repeatable in a CI/CD environment
    • Iterative development: start small, grow fast
  • 87.
    |86| Migrate “non-structured” content from Ektron into Umbraco
    Looking at two “migration” tools
    – CMSImport (@rsoeteman’s well-known package)
    – Chauffeur (~Umbraco CLI tool started by @slace)
  • 88.
    |87| Migrate “non-structured” content from Ektron into Umbraco
    Introducing Chauffeur
    “Chauffeur is a CLI for Umbraco, it will sit with your Umbraco websites bin folder and give you an interface to which you can execute commands, known as Deliverables, against your installed Umbraco instance.”
    • Command line: perfect fit for our continuous integration/deployment scenario
    • Lightweight: can be easily added to or removed from your environments
    – Drop the assembly in the /bin folder and you’re set; remove it in production
    – Ability to inject any Umbraco service API
    – Code once, run anywhere (build blocks of reusable deliverables)
    – Create a chain of deliverables to run from (a .delivery file)
    • Restrictions
    – Publishing content won’t work!
  • 90.
    |89| Migrate “non-structured” content from Ektron into Umbraco
    For each content item to be migrated
    • Get the record data out of the legacy SQL Server database
    • Create new content using the Umbraco service API
    • Transform property data using a custom object model and Json.NET to serialize to a “json string”
    • Set the property data on the new content
    • Save the new content in the CMS
    Challenges
    – Grid content (RTE content)
    – Customized Vorto implementation
    – NestedContent / DocTypeGridEditor / Vorto and any possible combinations
  • 91.
    |90| Migrate “non-structured” content from Ektron into Umbraco
    Deliverable transforms XML into a JSON blob using our custom data object model and Json.NET (simplified example)
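The slide's own example is not preserved in this transcript. As a stand-in, here is a minimal Python sketch of the same idea (legacy XML in, JSON string out via a small object model). The original used C# with Json.NET inside a Chauffeur deliverable; the XML shape, the `Block` class, and the `blocks` key below are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Block:
    """One property value in the target model (hypothetical shape)."""
    alias: str
    value: str


def legacy_xml_to_json(xml_text: str) -> str:
    """Map each child element of the legacy XML root to a Block, then serialize to JSON."""
    root = ET.fromstring(xml_text)
    blocks = [Block(alias=el.tag, value=(el.text or "").strip()) for el in root]
    return json.dumps({"blocks": [asdict(b) for b in blocks]}, ensure_ascii=False)
```

The resulting string is what would be set as the property value on the newly created content before saving it.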
  • 92.
    |91| Migrate “non-structured” content from Ektron into Umbraco
    Chauffeur references
    – https://our.umbraco.org/projects/collaboration/chauffeur/
    – https://github.com/aaronpowell/chauffeur
    – https://24days.in/umbraco-cms/2015/may-the-tools-be-with-you/
  • 94.
    |93| Conclusion
    • First try to use what’s out of the box or on Our
    • If that’s not enough, Umbraco can be heavily extended
    • Umbraco can be used in “security-conscious” entities

Editor's Notes

  • #63 CMS is structured in tabs and properties (and sub-properties). Tabs map to Groups; properties map to Units.
  • #64 From Umbraco we export a generic document structure, which is extracted into a crude XLIFF doc (with many paragraphs and with HTML markup). This is then processed again: paragraphs are split and HTML is converted into inline elements. Then each unit is segmented. Translation memory is applied to send the bitext format. Finally, custom extensions are applied before sending to the translation workflow and to SDL Studio.
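The tab-to-group and property-to-unit mapping in these notes can be illustrated with a small sketch that builds the "crude" first-pass document. It assumes XLIFF 2.0 element names (`group`, `unit`, `segment`, `source`); the notes do not state the XLIFF version, and the input shape (a dict of tabs, each a dict of property alias to text) and the id scheme are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def build_xliff(doc_id, tabs, src_lang="en"):
    """Build a minimal XLIFF 2.0 string: one <group> per tab, one <unit> per property."""
    ET.register_namespace("", XLIFF_NS)

    def q(name):
        # Qualify an element name with the XLIFF namespace.
        return f"{{{XLIFF_NS}}}{name}"

    xliff = ET.Element(q("xliff"), {"version": "2.0", "srcLang": src_lang})
    file_el = ET.SubElement(xliff, q("file"), {"id": doc_id})
    for tab_name, properties in tabs.items():
        group = ET.SubElement(file_el, q("group"), {"id": tab_name})
        for alias, text in properties.items():
            unit = ET.SubElement(group, q("unit"), {"id": f"{tab_name}.{alias}"})
            segment = ET.SubElement(unit, q("segment"))
            # Later pipeline steps would split paragraphs and convert HTML to inline elements.
            ET.SubElement(segment, q("source")).text = text
    return ET.tostring(xliff, encoding="unicode")
```

Each unit holds a single segment here; the segmentation and translation-memory steps described above would operate on this output.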
  • #65 Demo will be a pre-recorded video