
Fast and furious(ly) multilingual: Publishing of EU politics in 24 languages with Umbraco

Developing the website for the Council of the European Union presented many challenges, both technical and organizational.
Dirk and Simone will explain how a team of more than 10 people collaborated on the development of the website without too much friction, and how they deeply customised Umbraco to provide a great editorial user experience while still allowing the internal translation unit to translate the original content into 24 languages, using their everyday tools and within tight deadlines.
They will also cover how they made the site cope with traffic peaks by combining a cache layer with a load-balancing setup, and how they implemented multilingual full-text search using Elasticsearch.
Finally, they will show how they imported the pre-existing content from the previous CMS.

  1. Fast and furious(ly) multilingual: Publishing of EU politics in 24 languages
     • Simone Chiaretta (@simonech), Architect
     • Dirk De Grave (@netaddicts), Umbraco Specialist
     • Council of the European Union, General Secretariat, Directorate-General Administration, Directorate Communication and Information Systems, Unit Design & Development
     • Disclaimer (in English and French): The views expressed are solely those of the speaker and may not be regarded as stating an official position of the Council of the EU.
  2. Council of the European Union
     • One of the 3 main EU institutions, together with the Commission and the Parliament
     • Made of two councils:
       – Council of the European Union: meetings of the ministers of each EU country
       – European Council: heads of state or government of each EU member state
     • Rotating presidency (a different member state every 6 months)
     • 28 countries
     • 24 official languages
  3. Consilium.Europa.EU
  4. Goal of the site
     • Report to EU citizens on the work of the European Council, the Council of the EU, the Eurogroup and their presidents
     • Inform the public and the media with press releases and news
     • List all meetings and their conclusions
  5. Agenda
     • Why we moved to Umbraco
     • How we work
     • How we deploy
     • Scalability and system
     • Our beautiful editor experience
     • Standards-based translation in 24 languages
     • Integration with legacy and external systems
     • Full-text search
     • Import from the old CMS
  6. Why Umbraco
  7. Why we moved to Umbraco
     • Pre-2011: custom CMS
     • 2011: Umbraco v4
     • Jan 2015: redesign + commercial CMS
     • 2017 Q3: Umbraco v7
  8. Decision process
     • Independent study by a CMS expert
     • Proofs of concept with multiple CMSs:
       – Umbraco
       – Drupal @ EC
       – EPiServer
     • Internal evaluation
  9. Expectations
     • Faster editing and publishing process
     • Simple editorial experience
     • Better integration with translation tools
     • Better search
     • Able to handle a team of around 30 editors
  10. Challenges
     • Umbraco is not multilingual by default
     • The default translation flow is obsolete and weak
     • Integration of legacy satellite applications
     • CI and deployment
     • Import from the old CMS
  11. How we work
  12. Development/team setup
     • Multidisciplinary team: 2 analysts, 3 frontend devs, 6 backend devs
     • Scrum: sprint planning, daily standups
     • On-premises Atlassian stack as collaboration tools for:
       – Analysis (Confluence)
       – Development (Bitbucket)
       – CI/CD (Bamboo)
       – Sprint planning/issue tracking (Jira)
  13. Development/team setup (Atlassian stack)
  14. Umbraco development setup
     • A few options:
       – Umbraco as a Service (#UaaS)
       – Shared database development: all developers use the same database for development (doctypes/datatypes/…)
       – Local database development: each developer uses a local database, either SQL Server or SQL Server CE
  15. Umbraco development setup
     Decision?
     • Umbraco as a Service: no
     • Shared database development: no
       – Perfect for a single environment (e.g. DEV)
       – People don't need to sync any metadata or content
       – Prone to a cluttered database if someone forgets to delete metadata or content that is not part of the solution
     • Local database development: yes
       – People need to sync all metadata and "relevant" content
       – Perfect fit for proofs of concept (switching between SQL Server and SQL Server CE)
       – Perfect fit for continuous integration with multiple environments, provided metadata and content can be synchronized
  16. uSync
     • Lightweight; can easily be fine-tuned to sync only the minimal settings needed to bring environments to a clean state, for both metadata and content, and can be automated
     • Challenges?
       – How to handle media efficiently?
       – Dealing with exotic datatypes
       – Long path names in the continuous integration environment
  17. Project/solution setup
     • .Core project (business logic) / .Core.Tests project
       – Startup configuration (DI with Unity, IoC, event handling)
       – PropertyValueConverters
       – ModelsBuilder (model customizations)
       – Controllers (route hijacking all the things)
       – Services
       – AutoMapper
       – ViewModels
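"Route hijacking" refers to the Umbraco convention where a controller named after a document type alias takes over rendering for content of that type, with the action picked by template alias and an Index fallback. The sketch below models only that dispatch rule in Python, not Umbraco's C# API; the registry, the `hijack` decorator and the content dictionaries are hypothetical.

```python
from typing import Callable

# Conceptual model of route hijacking: handlers registered per document type
# alias (and per template alias, with an "index" fallback) replace the default
# template rendering. All names here are illustrative, not Umbraco APIs.
Handler = Callable[[dict], str]
controllers: dict[str, dict[str, Handler]] = {}


def hijack(doc_type: str, action: str = "index"):
    """Register a handler for a document type alias and action (template alias)."""
    def register(fn: Handler) -> Handler:
        controllers.setdefault(doc_type.lower(), {})[action.lower()] = fn
        return fn
    return register


def default_render(content: dict) -> str:
    # Stands in for Umbraco rendering the template directly.
    return f"template:{content['template']}"


def route(content: dict) -> str:
    actions = controllers.get(content["docType"].lower())
    if not actions:
        return default_render(content)
    handler = actions.get(content["template"].lower()) or actions.get("index")
    return handler(content) if handler else default_render(content)


@hijack("meeting")
def meeting_index(content: dict) -> str:
    # A real controller would build a view model here (e.g. from a service).
    return f"meeting view model for {content['name']}"
```

Content types without a registered handler keep the default rendering, which matches the idea of hijacking only where custom logic is needed.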
  18. Project/solution setup
     • .Web project
       – Default Umbraco installation
       – Minimal changes allowed (.config) to smooth upgrades
       – App_Plugins for custom-built and 3rd-party packages (NuGet/private NuGet), even for packages from the online repository
     • .Frontend project
       – All things related to the UI (views/JS/CSS)
       – The frontend team uses its own workflow to generate assets, which are copied into the .Web project
     • .Resources project
       – Legacy dictionary
  19. Workflow = GitFlow
     • Main develop branch
     • Each feature/bugfix = separate branch
     • PR with approval = merge
     • Merge vs rebase!
     • Strict rules for implementing features:
       – Features must be small
       – Changes unrelated to the feature = rejected
       – Every feature is discussed upfront
       – Commits, commit messages and PR messages must be very clear
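Some of the GitFlow rules above can be checked automatically before a pull request is merged. A minimal sketch, assuming work branches are named `feature/...` or `bugfix/...` and always target `develop`; the naming pattern, the one-approval minimum and the three-word title check are illustrative assumptions, not the team's actual Bitbucket settings.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: work branches are feature/<name> or bugfix/<name>.
WORK_BRANCH = re.compile(r"^(feature|bugfix)/[a-z0-9][a-z0-9._-]*$")


@dataclass
class PullRequest:
    source: str
    target: str
    approvals: int
    title: str


def check_pull_request(pr: PullRequest, min_approvals: int = 1) -> list[str]:
    """Return the rule violations; an empty list means the PR may be merged."""
    problems = []
    if not WORK_BRANCH.match(pr.source):
        problems.append(f"source branch {pr.source!r} is not feature/* or bugfix/*")
    if pr.target != "develop":
        problems.append(f"target branch {pr.target!r} is not develop")
    if pr.approvals < min_approvals:
        problems.append(f"needs {min_approvals} approval(s), has {pr.approvals}")
    if len(pr.title.split()) < 3:  # crude proxy for "PR messages must be very clear"
        problems.append("PR title is too short to describe the change")
    return problems
```

Rules such as "every feature is discussed upfront" stay a team practice; only the mechanical ones are worth automating.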
  20. Workflow = GitFlow
  21. How we deploy
  22. Build/deployment pipeline (Bamboo)
     • The build plan kicks in for every commit on a feature/bugfix branch pushed to the remote repository:
       – The build must be successful
       – All related tests must pass
       – Continuous code quality assurance (SonarQube)
     • The build plan is only responsible for creating the required artifacts
     • The build plan never changes anything (files/configurations)
  23. Build/deployment pipeline (Bamboo)
     • Different build plans for DEV/TEST and STA/PROD
     • DEV/TEST = 1 single artifact
     • STA/PROD = 2 artifacts: 1 frontend, 1 backend
  24. Build/deployment pipeline (Bamboo)
  25. Build/deployment pipeline (Bamboo)
     • Only a build completed without errors becomes a candidate for "release"
     • The release plan takes care of moving the artifacts from the build to the "destination" environment
     • The release plan is also responsible for configuring the environment (web.config transformations, uSync)
     • Release is automated for DEV and a manual process for TEST/STA/PROD
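The split between build and release can be summarized as a small gate: only a fully green build is releasable, only DEV releases automatically, and configuration happens at release time. This is a Python model of those rules, not Bamboo configuration; the environment names come from the slides, while the artifact names and classes are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model of the build/release rules on the slides.
ARTIFACTS = {
    "DEV": ["site"], "TEST": ["site"],     # DEV/TEST build plan: 1 single artifact
    "STA": ["frontend", "backend"],        # STA/PROD build plan: 2 artifacts
    "PROD": ["frontend", "backend"],
}
AUTOMATIC_RELEASE = {"DEV"}                # TEST/STA/PROD releases are manual


@dataclass
class Build:
    branch: str
    compiled: bool
    tests_passed: bool
    quality_gate_passed: bool              # e.g. the SonarQube analysis

    @property
    def succeeded(self) -> bool:
        return self.compiled and self.tests_passed and self.quality_gate_passed


def plan_release(build: Build, environment: str, manually_triggered: bool = False) -> list[str]:
    """Return the artifacts to deploy, or raise if this release is not allowed."""
    if not build.succeeded:
        raise ValueError("only a build completed without errors is a release candidate")
    if environment not in AUTOMATIC_RELEASE and not manually_triggered:
        raise PermissionError(f"release to {environment} must be triggered manually")
    # Environment configuration (web.config transformations, uSync import) is applied
    # by the release step on the target server; the build never modifies artifacts.
    return ARTIFACTS[environment]
```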
  26. Build/deployment pipeline (Bamboo)
  27. System architecture
  28. Security and publishing
     • Back office shielded from the internet
     • Instant publishing of content
     • Performance and availability
  29. Systems and caching (diagram): Internet, Alteon load balancer, Varnish cache servers, Umbraco IIS web servers (production environment), database cluster (SQL), Windows file share cluster (SMB), authoring/back office (HTTP/HTTPS)
  30. Caching
     • 3-level caching:
       – Level 1: ASP.NET and Umbraco caching
       – Level 2: Varnish
       – Level 3: CloudFlare (future)
  31. Varnish
     • Reverse proxy
     • Caching based on HTTP headers
     • Behavior configurable with a DSL (VCL)
     • Possible to invalidate individual pages
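To make the Varnish behaviour listed above concrete, here is a toy in-memory model in Python: it caches a page for as long as the backend's `Cache-Control: max-age` allows and can purge a single URL, for example right after publishing. It is not Varnish or VCL; the class, the backend callable and the tiny subset of directives it honours are assumptions for illustration.

```python
import re
import time
from typing import Callable

MAX_AGE = re.compile(r"\bmax-age=(\d+)")


class ReverseProxyCache:
    """Hypothetical in-memory stand-in for a Varnish-like reverse proxy."""

    def __init__(self, backend: Callable[[str], tuple[str, dict]],
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend                          # fetch(url) -> (body, headers)
        self.clock = clock
        self.store: dict[str, tuple[str, float]] = {}   # url -> (body, expires_at)

    def get(self, url: str) -> str:
        cached = self.store.get(url)
        if cached and cached[1] > self.clock():
            return cached[0]                  # hit: the web servers are not contacted
        body, headers = self.backend(url)     # miss or expired: ask the web servers
        ttl = self._ttl(headers.get("Cache-Control", ""))
        if ttl > 0:
            self.store[url] = (body, self.clock() + ttl)
        return body

    def purge(self, url: str) -> None:
        """Invalidate a single page, e.g. right after an editor publishes it."""
        self.store.pop(url, None)

    @staticmethod
    def _ttl(cache_control: str) -> int:
        if "no-store" in cache_control or "private" in cache_control:
            return 0
        match = MAX_AGE.search(cache_control)
        return int(match.group(1)) if match else 0
```

Purging per page is what makes "instant publishing" compatible with long cache lifetimes: the cache only needs to forget the pages that actually changed.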
  32. CloudFlare
  33. Making editors happy
  34. Main requirements
     • In-page editing experience
     • Find content easily (even with thousands of nodes)
  35. Predecessor CMS editing experience
  36. Predecessor CMS editing experience
  37. Grid / NestedContent / DocTypeGridEditor / Customized Vorto
  38. Grid editing
  39. Grid template output
  40. Grid template customization
  41. Grid settings
  42. Custom content picker (with preview)
  43. List view (visualsearch.js)
  44. 24 languages in a box
  46. Requirements
     • 1-1 translation in 24 languages
     • Batch management of languages
     • Localize only what is needed
     • Export to the industry-standard XLIFF format
     • Automatic import of translations
  47. What is XLIFF
     • XML Localization Interchange File Format
     • The only open-standard bitext format
     • OASIS standard since 2008
     • Supported by all professional CAT tools on the market
     • A bitext is a file that contains both source and target languages, correctly "aligned"
  48. What is bitext
     Tyger Tyger, burning bright, | Tigre! Tigre! Divampante fulgore
     In the forests of the night; | Nelle foreste della notte,
     What immortal hand or eye, | Quale fu l'immortale mano o l'occhio
     Could frame thy fearful symmetry? | Ch'ebbe la forza di formare la tua agghiacciante simmetria?
     (William Blake | Giuseppe Ungaretti)
  50. What is bitext
     msgid "Tyger Tyger, burning bright,"
     msgstr "Tigre! Tigre! Divampante fulgore"
     msgid "In the forests of the night;"
     msgstr "Nelle foreste della notte,"
     msgid "What immortal hand or eye,"
     msgstr "Quale fu l'immortale mano o l'occhio"
     msgid "Could frame thy fearful symmetry?"
     msgstr "Ch'ebbe la forza di formare la tua agghiacciante simmetria?"
  51. What is bitext
     <source>Tyger Tyger, burning bright,</source>
     <target>Tigre! Tigre! Divampante fulgore</target>
     <source>In the forests of the night;</source>
     <target>Nelle foreste della notte,</target>
     <source>What immortal hand or eye,</source>
     <target>Quale fu l'immortale mano o l'occhio</target>
     <source>Could frame thy fearful symmetry?</source>
     <target>Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target>
  52. What is bitext
     <source xml:lang="EN">Tyger Tyger, burning bright,</source>
     <target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target>
     <source xml:lang="EN">In the forests of the night;</source>
     <target xml:lang="IT">Nelle foreste della notte,</target>
     <source xml:lang="EN">What immortal hand or eye,</source>
     <target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target>
     <source xml:lang="EN">Could frame thy fearful symmetry?</source>
     <target xml:lang="IT">Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target>
  53. What is bitext
     <unit id="1">
       <segment>
         <source xml:lang="EN">Tyger Tyger, burning bright,</source>
         <target xml:lang="IT">Tigre! Tigre! Divampante fulgore</target>
       </segment>
       <segment>
         <source xml:lang="EN">In the forests of the night;</source>
         <target xml:lang="IT">Nelle foreste della notte,</target>
       </segment>
       <segment>
         <source xml:lang="EN">What immortal hand or eye,</source>
         <target xml:lang="IT">Quale fu l'immortale mano o l'occhio</target>
       </segment>
       <segment>
         <source xml:lang="EN">Could frame thy fearful symmetry?</source>
         <target xml:lang="IT">Ch'ebbe la forza di formare la tua agghiacciante simmetria?</target>
       </segment>
     </unit>
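A minimal sketch that serializes aligned pairs like the ones above as an XLIFF 2.0 document with Python's standard `xml.etree.ElementTree`. Unlike the didactic slides, XLIFF 2.0 declares the languages once on the root element (`srcLang`, `trgLang`) and wraps units in a `<file>`; the `f1`/`1` ids and default language codes are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:2.0"


def to_xliff(pairs: list[tuple[str, str]], src: str = "en", trg: str = "it") -> str:
    """Write one <unit> with one <segment> per aligned (source, target) pair."""
    ET.register_namespace("", NS)  # serialize XLIFF as the default namespace
    root = ET.Element(f"{{{NS}}}xliff", {"version": "2.0", "srcLang": src, "trgLang": trg})
    file_el = ET.SubElement(root, f"{{{NS}}}file", {"id": "f1"})
    unit = ET.SubElement(file_el, f"{{{NS}}}unit", {"id": "1"})
    for source_text, target_text in pairs:
        segment = ET.SubElement(unit, f"{{{NS}}}segment")
        ET.SubElement(segment, f"{{{NS}}}source").text = source_text
        ET.SubElement(segment, f"{{{NS}}}target").text = target_text
    return ET.tostring(root, encoding="unicode")
```

For an export to translators, the `<target>` elements would normally be left out or empty and filled in by the CAT tool.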
  54. 3 options
     • Linked trees
       – PRO: the default Umbraco approach to localization
       – CONS: everything else
     • Nested nodes
       – PRO: meaningful history, easier to manage programmatically
       – CONS: not possible to sync the grid structure between languages, needs custom batch publishing actions (and much more)
     • Vorto
       – PRO: localize only what's needed, just one node per content item, one grid structure for all languages
       – CONS: loss of meaningful history, needs a custom publishing "flag" per language, more difficult to manage programmatically
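The Vorto trade-offs above come from storing all languages of a property in a single value on a single node. A minimal sketch of that idea, plus the custom per-language publishing flag the slide says is needed; the JSON shape, the `published` list and the fallback rule are assumptions for illustration, not Vorto's actual storage format.

```python
import json
from typing import Optional


def make_value(values: dict[str, str], published: set[str]) -> str:
    """Serialize all translations of one property into a single stored value."""
    return json.dumps({"values": values, "published": sorted(published)})


def localized(raw: str, culture: str, fallback: str = "en") -> Optional[str]:
    """Return the translation for `culture` if it exists and is published,
    otherwise the published source-language text (or None)."""
    data = json.loads(raw)
    values, published = data["values"], set(data["published"])
    if culture in values and culture in published:
        return values[culture]
    return values.get(fallback) if fallback in published else None
```

Untranslated or unpublished languages simply fall back, which is how "localize only what's needed" works without creating extra nodes.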
  55. Solution
     • Vorto (customised)
     • Custom "Vorto-like" grid editor
     • Custom translation component
  56. Customised Vorto
  57. Vorto in the grid
  58. Custom translation flow (1)
  59. Custom translation flow (1)
  60. Custom translation flow (2)
  61. Custom translation flow (3)
  62. Extraction
  63. Extraction
     • Umbraco
     • Generic document structure
     • Initial XLIFF (with HTML markup)
     • Split paragraphs and extract inline code
     • Segmentation
     • Apply translation memory
     • Off to the translation workflow (SDL Studio)
     • Enrich with custom extensions
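Three of the extraction steps above (split paragraphs and extract inline code, segmentation, apply translation memory) can be sketched in a few lines. This is an illustration, not XliffLib: the regexes are deliberately naive (no nested paragraphs, sentence splitting only on `.`, `!`, `?`), the `{n}` placeholder syntax is hypothetical, and the translation memory is a plain dictionary; a real pipeline would use an HTML parser and segmentation rules such as SRX.

```python
import re

PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.S)
INLINE_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_inline_code(paragraph: str) -> tuple[str, dict[str, str]]:
    """Replace inline tags with numbered placeholders so translators only see text."""
    codes: dict[str, str] = {}

    def to_placeholder(match: re.Match) -> str:
        key = f"{{{len(codes) + 1}}}"      # {1}, {2}, ...
        codes[key] = match.group(0)
        return key

    return INLINE_TAG.sub(to_placeholder, paragraph), codes


def segment(text: str) -> list[str]:
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def extract(html: str, memory: dict[str, str]) -> list[dict]:
    """One unit per paragraph; each segment is pre-filled from the translation memory."""
    units = []
    for index, paragraph in enumerate(PARAGRAPH.findall(html), start=1):
        text, codes = extract_inline_code(paragraph)
        segments = [{"source": s, "target": memory.get(s)} for s in segment(text)]
        units.append({"id": f"p{index}", "codes": codes, "segments": segments})
    return units
```

Keeping the removed tags in `codes` is what allows the merge step to restore the original markup around the translated text.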
  64. Future steps
     • Complete the system
     • Make the generic extraction/merging library open source
     • Integrate the Umbraco-specific extraction/merging into Umbraco Core
     https://github.com/simonech/XliffLib
  65. Integrations
  66. List of internal/external systems to interact with
     • PoolParty, to enrich content with valuable metadata (taxonomies)
     • Oracle database (MPO meetings/meeting planner)
     • (TV)Newsroom
       – Video API
       – Asset/image library
     • RSS feeds/Twitter feeds
  67. PoolParty
  68. PoolParty
  69. PoolParty
     Why?
     • Exchange taxonomies between different units within the EU Council, or even beyond, with the outside world, and vice versa
     • Example: a "Location" taxonomy may already exist "somewhere", so we should be able to reference it transparently without creating a new one: Europe > Belgium > Brussels capital region > Brussels > …
  70. PoolParty
     Automated tagging?
     • Automated content tagging is possible using a 3rd-party solution, "Powertagging with Umbraco and PoolParty"
     • It didn't really fit our requirements (legacy data, "taxonomy" currently not semantically normalized)
     Solution?
     • On-demand synchronization from PoolParty to Umbraco
       – Limited number of syncs (~1/month)
       – One-way sync from PoolParty to Umbraco
       – No reliance on PoolParty server availability
  71. PoolParty (sync process)
  72. PoolParty (sync'ed data)
  73. PoolParty
     Challenges?
     • Enrich our sync'ed data with custom "attributes", for example:
       – Set a country flag for specific "location" taxonomies
       – Change the default description of a taxonomy on the frontend website: "Council of the European Union" -> "Council of the EU"
     Solution?
     • Create a "developer-centric" "taxonomy settings" section that links the sync'ed taxonomy to our custom metadata
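A sketch of the sync-plus-settings design described on the PoolParty slides: the synced taxonomy is replaced wholesale on each one-way sync, while local enrichments live in a separate settings table keyed by concept URI, so no sync can overwrite them. The `Concept` shape, the `ex:` URIs and the settings keys are hypothetical, not PoolParty's export format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    uri: str
    label: str
    broader: Optional[str] = None  # URI of the parent concept, if any


def sync(local: dict[str, Concept],
         remote: list[Concept]) -> tuple[dict[str, Concept], dict[str, list[str]]]:
    """One-way sync: PoolParty wins. Returns the new local store and a change report."""
    incoming = {c.uri: c for c in remote}
    report = {
        "added": sorted(incoming.keys() - local.keys()),
        "removed": sorted(local.keys() - incoming.keys()),
        "updated": sorted(u for u in incoming.keys() & local.keys() if incoming[u] != local[u]),
    }
    return incoming, report


def display(concept: Concept, settings: dict[str, dict]) -> dict:
    """Merge a synced concept with developer-maintained settings, which no sync touches."""
    extra = settings.get(concept.uri, {})
    return {
        "label": extra.get("label", concept.label),  # e.g. "Council of the EU"
        "flag": extra.get("flag"),                   # e.g. a country flag for locations
    }
```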
  74. PoolParty (taxonomy settings)
  75. Meeting planner data (Oracle DB)
     External tools, used by other departments, create "meetings".
     Challenges:
     • Data stored in an external database
     • Data only exposed through read-only views on the Oracle DB
     • ~3,000 meetings currently in the system and available online; ~100 meetings created monthly
     • Approx. 4 meetings/month need additional content editing before publishing
     • Link with the existing sync'ed taxonomy
     • Advanced search (date/taxonomy/…)
     Do we import this data into Umbraco?
  76. Meeting planner data (Oracle DB)
  77. Meeting planner data (Oracle DB)
     Decisions:
     • Don't import any meeting data into Umbraco (you've already got everything you need)
     • Remove the connection to the Oracle DB
     • Push data from the Oracle DB view into a custom SQL table
     • Enrich meeting data at import and store it alongside the meeting data in the custom SQL table
     • Optimize the custom SQL table for maximum performance (indexes/…)
     • Meetings created in Umbraco must reference a SQL record
     Result:
     • Searching/querying the database is still very fast (SQL/storage optimized for it)
     • Content editors can still use Umbraco to add more content
     • No bloating of the Umbraco system with nodes that add no value
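A minimal sketch of the "custom SQL table" decision, using SQLite from Python's standard library as a stand-in for SQL Server. Table, column and index names are hypothetical; the points it shows are an idempotent import from the read-only view (upsert by meeting id), an index matching the advanced search, and an optional Umbraco reference that the import never overwrites. The upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS meeting (
    id          TEXT PRIMARY KEY,   -- id from the meeting planner
    title       TEXT NOT NULL,
    start_date  TEXT NOT NULL,      -- ISO 8601, sortable as text
    taxonomy    TEXT NOT NULL,      -- URI of the sync'ed taxonomy concept
    umbraco_id  INTEGER             -- set only for meetings edited in Umbraco
);
CREATE INDEX IF NOT EXISTS ix_meeting_taxonomy_date ON meeting (taxonomy, start_date);
"""


def import_rows(db: sqlite3.Connection, rows: list[tuple[str, str, str, str]]) -> None:
    """Push rows exported from the read-only Oracle view; re-running is safe."""
    db.executemany(
        """INSERT INTO meeting (id, title, start_date, taxonomy) VALUES (?, ?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET title = excluded.title,
               start_date = excluded.start_date, taxonomy = excluded.taxonomy""",
        rows,
    )
    db.commit()


def search(db: sqlite3.Connection, taxonomy: str, date_from: str, date_to: str) -> list[str]:
    """Advanced search by taxonomy and date range, served by the composite index."""
    rows = db.execute(
        "SELECT id FROM meeting WHERE taxonomy = ? AND start_date BETWEEN ? AND ? "
        "ORDER BY start_date",
        (taxonomy, date_from, date_to),
    )
    return [row[0] for row in rows]
```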
  78. External asset library (TV)Newsroom
     • Most assets are referenced from an internal asset library shared across multiple teams/units/...
     • Some assets are stored externally (Rackcdn.com)
     • The media section is still used for all other assets
     Challenges?
     • Images are huge: very high-resolution images of more than 10 MB
     • Videos are stored externally; only a public API is available to fetch the content (and thumbnail previews) (challenge?)
  79. External asset library (TV)Newsroom
     Solution implemented:
     • ImageProcessor takes care of retrieving, storing and caching images from multiple sources, over both HTTP and HTTPS
       – Requires an .axd service with both HTTP and HTTPS endpoints
       – Proxy configuration is still a bit flaky (PR?)
     • API requests that fetch info from the external source are offloaded to an internal server, which returns the results
       – Fine-tune network/security
  80. Full-text search
  81. Requirements
     • Support for full-text search in 24 languages
     • Boosting of particular elements of the pages
     • Indexing of "composition" pages
     • Indexing of external sources (PDFs, external site)
     • Fast availability of new/updated pages in the index
  82. Elasticsearch (diagram): Elasticsearch, backend search API, crawler (Apache ManifoldCF), frontend search; flows labelled "crawling" and "notification"
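Two of the search requirements (24 languages, boosting of particular page elements) map onto Elasticsearch request bodies. The sketch builds them as plain Python dicts. It assumes one index per language, hypothetical field names and boost values, and the typeless mapping syntax of Elasticsearch 7 and later. `english`, `french`, `german`, `italian` and `greek` are built-in Elasticsearch language analyzers; some official EU languages (Maltese, for example) have none, hence the `standard` fallback.

```python
# Hypothetical per-language index mapping and boosted query for Elasticsearch.
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "it": "italian", "el": "greek"}


def index_mapping(lang: str) -> dict:
    """Mapping for a per-language index; unknown languages use the standard analyzer."""
    analyzer = ANALYZERS.get(lang, "standard")
    text = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": {
        "title": text,
        "summary": text,
        "body": text,
        "published": {"type": "date"},
    }}}


def search_body(query: str) -> dict:
    """Boost matches in the title over the summary over the body text."""
    return {"query": {"multi_match": {
        "query": query,
        "fields": ["title^3", "summary^2", "body"],
    }}}
```

Each dict can be sent as JSON to the index-creation and `_search` endpoints by whatever client the backend search API uses.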
  83. Crawling
     • Just like Google
     • Structured information passed with:
       – HTTP headers
         • etag: "078de59b16c27119c670e63fa53e5b51"
       – Microdata: <time itemprop="startDate" datetime="2017-06-08T14:45">June 8, 2:45pm</time>
       – RDFa: <div profile="http://data.consilium.europa.eu/data/public_voting/rdf/schema/Configuration" typeof="Article"> <span property="http://data.consilium.europa.eu/data/public_voting/consilium/configuration/agri">Agriculture and Fisheries</span> </div>
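On the crawler side, structured values like the ones above can be read straight from the HTML. A minimal sketch using Python's standard `html.parser`: it collects microdata `itemprop` and RDFa `property` values and prefers the machine-readable `datetime` (or `content`) attribute over visible text. The class name is hypothetical, and nested items, `itemscope`, and RDFa vocabulary/prefix resolution are deliberately ignored.

```python
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collect itemprop/property values from a page into a flat dict."""

    def __init__(self):
        super().__init__()
        self.fields: dict[str, str] = {}
        self._open: list[tuple[str, str]] = []  # (tag, field name) awaiting text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop") or a.get("property")
        if not name:
            return
        if tag == "time" and "datetime" in a:
            self.fields[name] = a["datetime"]   # machine-readable date wins
        elif "content" in a:
            self.fields[name] = a["content"]
        else:
            self._open.append((tag, name))      # value is the element's text
            self.fields[name] = ""

    def handle_data(self, data):
        if self._open:
            _, name = self._open[-1]
            self.fields[name] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            self._open.pop()
```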
  84. Import from legacy CMS (E-project)
  85. Migrate "non-structured" content from Ektron into Umbraco
  86. Migrate "non-structured" content from Ektron into Umbraco
     • Non-structured = custom legacy XML format
     • Storage:
       – Content: SQL Server
       – Assets (images/PDFs): on disk
     • Other requirements:
       – The process of importing content/assets has to be repeatable in a CI/CD environment
       – Iterative development: start small, grow fast
  87. Migrate "non-structured" content from Ektron into Umbraco
     Looking at two "migration" tools:
     • CMSImport (@rsoeteman's well-known package)
     • Chauffeur (an Umbraco CLI tool started by @slace)
  88. Migrate "non-structured" content from Ektron into Umbraco
     Introducing Chauffeur: "Chauffeur is a CLI for Umbraco, it will sit with your Umbraco websites bin folder and give you an interface to which you can execute commands, known as Deliverables, against your installed Umbraco instance."
     • Command line: perfect fit for our continuous integration/deployment scenario
     • Lightweight: can easily be added to or removed from your environments
       – Drop the assembly in the /bin folder and you're set; remove it in production
       – Ability to inject any Umbraco service API
       – Code once, run anywhere (build blocks of reusable deliverables)
       – Create a chain of deliverables to run (a .delivery file)
     • Restrictions:
       – Publishing content won't work!
  89. Migrate "non-structured" content from Ektron into Umbraco
  90. Migrate "non-structured" content from Ektron into Umbraco
     For each content item to be migrated:
     • Get the record data out of the legacy SQL Server database
     • Create new content using the Umbraco service API
     • Transform property data using a custom object model and Json.NET to serialize it to a "JSON string"
     • Set the property data on the new content
     • Save the new content in the CMS
     Challenges:
     • Grid content (RTE content)
     • Customized Vorto implementation
     • NestedContent / DocTypeGridEditor / Vorto and any possible combination
  91. Migrate "non-structured" content from Ektron into Umbraco
     A deliverable transforms XML into a JSON blob using our custom data object model and Json.NET (simplified example)
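The deliverable's own C# example is not part of this transcript. As a rough Python analogue of the transformation step only, the sketch below parses a legacy XML record into a small object model and serializes each property to a JSON string, with Python's `json` module standing in for Json.NET. The legacy XML shape (one child element per property, with a `lang` attribute) and the `{"values": {...}}` output are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class LocalizedValue:
    values: dict[str, str] = field(default_factory=dict)  # culture -> text/HTML


def transform(legacy_xml: str) -> dict[str, str]:
    """Map each legacy child element to a JSON string with one value per language."""
    root = ET.fromstring(legacy_xml)
    properties: dict[str, LocalizedValue] = {}
    for element in root:
        culture = element.get("lang", "EN").lower()
        prop = properties.setdefault(element.tag, LocalizedValue())
        prop.values[culture] = (element.text or "").strip()
    # One JSON string per property alias, ready to be set on the new content item.
    return {alias: json.dumps(asdict(value), ensure_ascii=False)
            for alias, value in properties.items()}
```

Keeping the transformation a pure function (XML in, JSON strings out) makes it easy to unit-test outside Umbraco and to rerun as often as the CI/CD requirement demands.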
  92. Migrate "non-structured" content from Ektron into Umbraco
     Chauffeur references:
     • https://our.umbraco.org/projects/collaboration/chauffeur/
     • https://github.com/aaronpowell/chauffeur
     • https://24days.in/umbraco-cms/2015/may-the-tools-be-with-you/
  93. Conclusion
  94. Conclusion
     • First try to use what's available out of the box or on Our (our.umbraco.org)
     • If that is not enough, Umbraco can be heavily extended
     • Umbraco can be used in "security-conscious" organizations
  95. SUPER TAK! (Thanks a lot!)
  96. Questions?
