A field guide to the Financial Times, Rhys Evans, Financial Times

Neo4j GraphTour Europe - Customer Presentation:
A field guide to the Financial Times, Rhys Evans, Principal Engineer, Financial Times

A field guide to the Financial Times, Rhys Evans, Financial Times

  1. 1. A field guide to the Financial Times Rhys Evans Principal Engineer, Financial Times @wheresrhys
  2. 2. Who I am ● Worked in tech 10+ years ● Gradually moved into tooling ● Co-lead the FT’s Reliability Engineering team ● Lifelong birdwatcher
  3. 3. What is a field guide? From Wikipedia: “A book designed to help the reader identify wildlife (plants or animals) or other objects of natural occurrence (e.g. minerals).”
  4. 4. ● Why the FT needs a field guide ● Organising our guide with neo4j and GraphQL ● Filling in the details
  5. 5. Why the FT needs a field guide
  6. 6. @wheresrhys Insert non dramatic screenshot
  7. 7. @wheresrhys
  8. 8. @wheresrhys
  9. 9. @wheresrhys
  10. 10. @wheresrhys
  11. 11. “A tool dating from before the trees that built the ark. Unowned, unknown, and worth £250k of business. One day it fell over. We found docs dated 1999... which helped” Greg Cope, Tech Director, FT
  12. 12. Starting about 5 years ago, the range of tech we have to support exploded
  13. 13. Previously ● Centralised decision making ● Monolithic architectures ● Data centres ● Infrequent releases
  14. 14. Move slow and achieve little
  15. 15. Microservices: the FT were early adopters of microservices architecture. Lots of independently deployed services make it easier to ● Pick the right tool for the job ● Release and iterate ● Replace and decommission
  16. 16. Liberalisation “[...] follow the mechanics of free-market economy. Teams are allowed and encouraged to pick the best value tools for the job at hand” Matt Chadburn http://matt.chadburn.co.uk/notes/teams-as-services.html
  17. 17. OUT → IN ● Data Centre → Your favourite cloud ● ‘The FT Platform’ → Pick your own SaaS ● Java, Java, Java → I hear Rust’s good... ● Ivory tower → What works
  18. 18. “The upside of this is teams, left to their own devices, and trusted to make responsible decisions will choose what is best for themselves and the business in the long-term.” Matt Chadburn http://matt.chadburn.co.uk/notes/teams-as-services.html
  19. 19. Build stuff and disappear
  20. 20. Legacy is sooner than you think ● All images appearing on our websites relied on 1 person... who left ● A vanity URL service built by a feature team that disbanded shortly after ● Part of our membership platform built in a niche language ● And many, many more
  21. 21. 5 years is a long time in tech. Long enough for ● Shiny new things to become legacy ● Budgets and business priorities to move on ● People to leave
  22. 22. In summary ● Have to keep lots of tech ticking over ● Generating more new stuff than ever before to keep track of ● Liberalising the tech department leads to ownership & maintenance problems. Need a field guide to help us navigate the space
  23. 23. Unowned & unknown
  24. 24. Owned & known
  25. 25. Organising our guide with neo4j and GraphQL
  26. 26. 3 priorities to improve reliability ● Reaffirm who owns the various bits of FT tech ● Improve information about what is actually running and why ● Determine what state it’s in at any given time
  27. 27. Who is our audience? Operations team ● Active 24/7 ● Broad knowledge of our tech platforms ● Need to know which approaches can be applied to incident X ● If nothing works, who to call
  28. 28. Not the first attempt. CMDB versions 1 - 3 were: ● Too inert - Enter once and forget about it ● Too brittle - Chains of responsibility easily lost ● Too discrete - Hard to make important connections
  29. 29. Who can help me with system X? ● The natural question to ask when addressing a problem ● Links between people and things dotted all over our previous CMDBs ● Intuitive but brittle
  30. 30. Relational databases constrain ● Hard to connect data, so we get overly simplified models of reality ● Several degrees of separation are modelled as a systemOwner field ● Simple, but inaccurate and hard to maintain
  31. 31. Graph databases liberate ● Designed to model complex relationships ● No need to simplify and abstract away details that actually matter ● If person X is a stakeholder via 4 degrees of separation, represent them as such
  32. 32. A graph restatement of the problem: ‘How can I ensure systems are assigned to the right people’ → ‘How can I ensure systems are connected somehow to the right people’
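
The graph restatement on slide 32 can be illustrated with a small in-memory sketch. It is not the FT's implementation: the edge list, the relationship names and the people are invented for illustration, and only the `biz-ops-api` system code comes from the speaker notes. The idea is that "who can help with system X?" becomes a traversal that reports every person reachable from the system, together with how many hops away they are.

```python
from collections import deque

# Hypothetical edges (source, relationship, target). Relationship names,
# team, group and people are assumptions made for this example.
EDGES = [
    ("system:biz-ops-api", "DELIVERED_BY", "team:reliability-engineering"),
    ("team:reliability-engineering", "HAS_MEMBER", "person:alice"),
    ("team:reliability-engineering", "PART_OF", "group:operations"),
    ("group:operations", "HAS_TECH_DIRECTOR", "person:bob"),
]


def connected_people(start, edges, max_hops=4):
    """Breadth-first search returning {person: hops} reachable from start."""
    neighbours = {}
    for source, _relationship, target in edges:
        neighbours.setdefault(source, []).append(target)

    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node.startswith("person:"):
            found[node] = hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for target in neighbours.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return found
```

Calling `connected_people("system:biz-ops-api", EDGES)` finds both people even though neither is attached to the system directly, which is the point of connecting systems "somehow" rather than assigning them.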
  33. 33. [Diagram: a System node with unknown (?) connections]
  34. 34. Model the stable stuff first
  35. 35. When systems are created we: ● Pick a unique, human readable code ● Kill infrastructure not tagged with it ● In our graph, the System record must be connected to a Team
  36. 36. Our tech connected to ● Stable, manageable subdivisions of the organisation ● Tech director who is ultimately responsible. On top of this stable foundation we can add the more ephemeral things
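
The creation rules on slide 35 can be sketched as a few checks. This is a minimal illustration under stated assumptions, not the Biz Ops code: the graph is a plain dictionary, the code format (lowercase words joined by hyphens), the `systemCode` tag name and the function names are all hypothetical.

```python
import re


class SystemRejected(Exception):
    """Raised when a new system breaks one of the creation rules."""


# Assumed format for a "unique, human readable code", e.g. biz-ops-api.
SYSTEM_CODE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def create_system(graph, code, team_code):
    """Add a system only if its code is valid, unique and tied to a known team."""
    if not SYSTEM_CODE.match(code):
        raise SystemRejected(f"{code!r} is not a readable system code")
    if code in graph["systems"]:
        raise SystemRejected(f"{code!r} is already taken")
    if team_code not in graph["teams"]:
        raise SystemRejected(f"{code!r} must be connected to an existing team")
    graph["systems"][code] = {"team": team_code}
    return graph["systems"][code]


def untagged_infrastructure(resources, graph):
    """List resource ids whose systemCode tag is missing or unknown."""
    return [
        resource["id"]
        for resource in resources
        if resource.get("systemCode") not in graph["systems"]
    ]
```

Resources returned by `untagged_infrastructure` are the candidates for the "kill infrastructure not tagged with it" rule; what actually happens to them is outside this sketch.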
  37. 37. @wheresrhys BIZ-OPS MAN
  38. 38. Data warehouse free ● Self-service ● No such thing as a power user ● Extensible ● API first, but UI a close second
  39. 39. Some poor API options. REST API ● OK when fetching a single record type ● Painful to traverse. ‘Canned query’ endpoints ● Less generic ● Limited by our imagination
  40. 40. GraphQL to the rescue “GraphQL is a query language for APIs [...] gives clients the power to ask for exactly what they need [...] not just the properties of one resource but also smoothly follows references between them”
  41. 41. neo4j-graphql-js ● GraphQL normally talks to multiple APIs and combines the results ● neo4j-graphql-js converts GraphQL queries to Cypher, and talks to Neo4j directly
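
The "ask for exactly what you need and follow references" idea from slides 40 and 41 can be shown with a toy resolver. It is only an in-memory stand-in: it does not reproduce how neo4j-graphql-js translates queries to Cypher, and the record shapes, field names (`deliveredBy`, `techDirector`) and people are assumptions. A selection is written as a nested dictionary where `None` means "return this scalar" and a nested dictionary means "follow this reference".

```python
# Hypothetical records keyed by (type, code). Field names are invented.
RECORDS = {
    ("System", "biz-ops-api"): {
        "code": "biz-ops-api",
        "deliveredBy": ("Team", "reliability-engineering"),
    },
    ("Team", "reliability-engineering"): {
        "code": "reliability-engineering",
        "name": "Reliability Engineering",
        "techDirector": ("Person", "bob"),
    },
    ("Person", "bob"): {"code": "bob", "name": "Bob Example"},
}


def resolve(key, selection):
    """Return only the selected fields, following references recursively."""
    record = RECORDS[key]
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value  # scalar field: copy as-is
        else:
            result[field] = resolve(value, sub_selection)  # reference: recurse
    return result


# One query that reaches from a system to its tech director.
QUERY = {
    "code": None,
    "deliveredBy": {"name": None, "techDirector": {"name": None}},
}
```

`resolve(("System", "biz-ops-api"), QUERY)` returns one nested result for what would otherwise be three separate REST lookups.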
  42. 42. @wheresrhys
  43. 43. GraphQL big wins ● User friendly: Single, grokkable query to get unlimited connected info ● Future proof: Mirrors the Neo4j graph as its complexity grows ● More efficient: Fewer API calls and fewer and faster DB calls
  44. 44. Pitfalls of GraphQL ● Hungry users: Allows unwitting construction of very expensive queries ● Caching: Not obvious what caching behaviour to implement ● To write or not to write: Not persuaded to move away from REST yet
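
One common guard against the "hungry users" pitfall is to cap how deeply a query may nest before it is run. The talk does not say whether Biz Ops does this, so the sketch below is an assumption throughout: the selection format (nested dictionaries, `None` for scalar fields), the limit of 3 and the names are all hypothetical.

```python
class QueryTooDeep(Exception):
    """Raised when a selection nests further than the allowed depth."""


def selection_depth(selection):
    """Depth of a nested selection; a scalar field (None) adds no depth."""
    if not selection:
        return 0
    return 1 + max(selection_depth(sub) for sub in selection.values())


def check_depth(selection, max_depth=3):
    """Return the depth, or raise QueryTooDeep if it exceeds max_depth."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise QueryTooDeep(f"depth {depth} exceeds limit {max_depth}")
    return depth
```

A depth cap is a blunt instrument: it says nothing about how many records each level fans out to, so it complements rather than replaces other cost controls.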
  45. 45. An extensible UI
  46. 46. @wheresrhys
  47. 47. #GRANDstack GraphQL + React + Apollo + Neo4j Database https://grandstack.io/
  48. 48. In summary ● Some confidence that Biz Ops won’t degrade into a data graveyard ● Unlimited access to data for any person or machine. But is the data actually any good?
  49. 49. Filling in the details
  50. 50. Not the first attempt. CMDB versions 1 - 3 were: ● Too inert - Enter once and forget about it ● Too brittle - Chains of responsibility easily lost ● Too discrete - Hard to make important connections
  51. 51. Don’t rely on good behaviour ● Automate ● More carrot, less stick ● Gamify ● UX
  52. 52. Automate ● Machines don’t forget to update information ● Restrict write access for certain records/types to privileged clients ○ people-api → Writes details of FT staff ○ github-importer → Writes details of repositories ○ …
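
Restricting writes to privileged clients, as on slide 52, can be sketched as a lookup from record type to the clients allowed to write it. The client names `people-api` and `github-importer` come from the slide; the record type names `Person` and `Repository`, the rule that unlisted types are open to any client, and the function name are assumptions.

```python
# Record types locked to specific writers. Type names are assumed.
WRITE_RESTRICTIONS = {
    "Person": {"people-api"},
    "Repository": {"github-importer"},
}


def can_write(client_id, record_type):
    """True if the client may write this record type.

    Types without an entry are treated as writable by any client,
    which is an assumption of this sketch.
    """
    allowed = WRITE_RESTRICTIONS.get(record_type)
    return allowed is None or client_id in allowed
```

With this in place, `can_write("github-importer", "Person")` is false, so staff details can only change when the people feed changes.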
  53. 53. More carrot, less stick
  54. 54. Gamify: teams respond well to seeing how they compare, and how they can improve
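
One way to let teams see how they compare is a completeness score per system, averaged per team. The talk does not describe the actual scoring, so this is a sketch under assumptions: the list of required fields is invented (only `technicalOwner` is mentioned in the speaker notes) and every field is weighted equally.

```python
# Fields a "complete" system record is assumed to have.
REQUIRED_FIELDS = ("description", "runbook", "technicalOwner")


def completeness(system):
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if system.get(field))
    return filled / len(REQUIRED_FIELDS)


def league_table(systems):
    """Average completeness per team, best team first."""
    scores = {}
    for system in systems:
        scores.setdefault(system["team"], []).append(completeness(system))
    averages = {team: sum(values) / len(values) for team, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Showing each team which fields are missing, not just the score, is what turns the table into a list of easy wins.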
  55. 55. UX
  56. 56. @wheresrhys
  57. 57. Not just visual design ● Understand your users ● Uncover sources of friction ● Learn about their existing/ideal workflow ● Don’t expect them to come to you ● “Good design is invisible”
  58. 58. Example: runbook authorship ● System source code changes in GitHub ● But runbook authorship in Biz Ops ● Bound to get out of step ● What if they happened concurrently?
  59. 59. Example: runbook authorship ● Runbooks written in RUNBOOK.md with front matter metadata ● Content pulled into Biz Ops when a production code release is detected ● GitHub PR integrations to follow
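
Pulling metadata out of a RUNBOOK.md with front matter can be sketched as below. This is not the FT's importer: it only understands flat `key: value` lines between `---` markers (real front matter is usually YAML and would need a YAML parser), and the field names in the sample are hypothetical.

```python
def split_front_matter(text):
    """Split a document into (metadata dict, body text).

    Returns ({}, text) when there is no complete front matter block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            metadata = {}
            for line in lines[1:index]:
                if ":" in line:
                    # Split on the first colon only, so URLs survive intact.
                    key, _, value = line.partition(":")
                    metadata[key.strip()] = value.strip()
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return metadata, body
    return {}, text


# Sample file; the field names are invented for illustration.
SAMPLE_RUNBOOK = """---
systemCode: biz-ops-api
primaryUrl: https://example.com/biz-ops
---
# Biz Ops API

What to try first when it falls over.
"""
```

Because the runbook lives next to the code, an importer triggered on release can read the metadata and body from the same commit that shipped.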
  60. 60. Beyond operational info ● Underpinning how we handle GDPR requests ● Quicker triaging of security incidents ● Integrating with leavers process. More benefits → more incentives to improve data
  61. 61. What have we learned today?
  62. 62. Legacy code comes to us all
  63. 63. Documented legacy is good legacy
  64. 64. Graphs enable more powerful modelling
  65. 65. Using #GRANDstack is like being the film version of Mark Zuckerberg
  66. 66. Your data won’t update itself
  67. 67. UX and other feedback loops can keep it fresh
  68. 68. Thank you. The team: Geoff Thorpe, Laura Carvajal, Charlie Briggs, Katie Koschland, Simon Legg, Maggie Allen, Courtney Osborn, Kat Downes, Sentayhu Mekoonnali, David Balfour. Images from: https://www.audubon.org/birds-of-america/ @wheresrhys www.ft.com/dev/null

Editor's Notes

  • Flagship website - ft.com
  • Diverse range of websites
  • A range of tech not seen much externally
  • Print & distribute 6 days a week
  • If this weren’t enough, we are at the whims of a fickle news cycle
  • It’s a lot to keep an eye on
  • Key phrase unowned and unknown
  • There was a conscious movement away from this centralised approach as it was failing to deliver
    2 responses emerged at around the same time
  • Side effect of this means instead of one big thing you have many little things to look after

    Rather than build one big thing, build lots of little things
  • Freeing up teams to choose what they need to deliver value for the business quickly
  • Draw particular attention to long-term
  • Left running the stuff that they built
  • Quickly find yourself in a position where you have some legacy system nobody is looking after

    When you liberalise this WILL happen
  • Even if not liberalised, these facts are still true
  • If you recognise any of the problems above in your own organisation, maybe some of our solutions can inspire you

  • New team set up
    I’ll talk mainly about the first 2, but we’ll touch on the third as well

  • It needed a rethink

  • Hard to track movement of people as they move a lot
  • Systems connected directly to a variety of things - idiomatic in a relational data store to have few degrees of separation because it creaks under more complex relationships
    Show diagram of the old model mapped to neo

    No longer have to have a model that ‘leaks’ the choice of DB out
  • Make sure each system is connected to the graph. This doesn’t solve the problem by itself
    Ultimately people move on - THAT is the problem. Neo4j allows us to connect to better behaved entities, such as teams, and from there connect to people. Now we can concentrate on the relationships that matter, not the relationships that are easy
    Explain the direct connections to tech director mean lots of records need maintaining, but with graph only one link
    We can stop the battle of attrition

  • When systems are created we
    Enforce assigning a unique, human readable code, to the infrastructure e.g. biz-ops-api
    In our graph, the System record must be connected to a Team
    Teams are relatively few, their hierarchy easily maintained, and ultimately lead to a Tech Director


    Fixed vs. less fixed
  • Inaccurate data that’s waiting to happen
    Start with things you know you can maintain. Poorly maintained ACCURATE data will become inaccurate data
  • Compare to previous problem… rather than... nt that we won’t lose track of the critical stuff
    With system -> Team as the core datum (ENFORCED ON CREATION, and cannot create infrastructure without a system code) we can build on top of it

    Special people relationships e.g. technicalOwner still exist, but the responsibility clearly lies with the team to find a new person
    Cost attribution

    System -> team -> group -> tech director is the critical path

    BUT clear responsibility doesn’t necessarily mean well maintained - we are all busy people


  • Lots of connections between people and systems

    Who wants to know about GDPR & this system - HAS_DATA_OWNER
  • List of lists. Can piggyback on that chain of responsibility
    Amazing what interesting connections you find
  • Any query goes
    Extensible without needing lots of dev work
    Talk about componentisation, origami etc
    But with this richer, more democratised and extensible data set, the hope is that we will store more connected data able to answer more and more of the questions the business wants to answer
    How can we open up access to the data and stop our team being a bottleneck?

    Examples


  • Simple REST endpoints and expect users to traverse themselves? Bad for users (complex) and bad for us (load). Canned queries avoid that, but begin to add opinions, and favour the interactions we can imagine now, not what people may want in the future
  • Perfect - ask for things and the things they’re connected to
  • E.g. if the query is simple, probably a little caching is fine. Otherwise it is far less obvious what keys to cache on, and for how long
  • DB & API can grow organically…
    ...but our users want a UI
    Which must similarly be able to grow without our team becoming a bottleneck

    With graphQL as the foundation, we’ve extended the schema to create an entire read/write ecosystem for this data:

    graphQL = name, description, type
    Biz ops = name,description, type, label, isSearchable, required….


    Use ES, but neo4j should be our search DB soon too


    Some people don’t like yaml, because some people are wrong


  • Don’t let any of the code in any layers be opinionated
    Take what’s given, apply generic rules
    Data & schema driven Mention mobile friendly





  • No downer on liberalisation
    Would never’ve happened under central planning
  • This is what the cool kids are calling it
  • Tackled brittle & discrete, but not inert yet
    Accurate data is still bad data if you have no confidence in how current it is e.g. misleading confidence ‘don’t know what you don’t know’

    But any people problem shouldn’t be attributed to human error https://www.outcome-eng.com/human-error-never-root-cause/ We arrive back at tech or process to fix what’s wrong



  • No such thing as human error
  • There is a source of truth we can rely on for current information, and biz ops to make the right connections
  • Provide tangible benefits

  • Data correction journey - link to restricted form
    Show good dashboards
    Getting good quality data is rarely purely a technology problem
    Systems don’t forget to update data, _people_ forget to update data



    Visibility, easy wins, Natural catalyst
  • On a public website we work with UX to drive up conversions. Why not on an internal site to drive up ‘behaviour conversions’?

    UX = tech x 10

    Refine the solution so that people can be successful in doing what you want them to do
  • On a public website we work with UX to drive up conversions. Why not on an internal site to drive up ‘behaviour conversions’? And this is for, what, a documentation site? Roll over Confluence and GitHub

    If the tools you provide are a pleasure to use, people warm to the task
  • Invisibility can apply to workflow
    We as engineers should think of more invisibility
  • Runbook = pages of the fieldguide
  • we are persisting in making biz-ops the default choice of data store. The more types of data it contains, the more useful connections can be made, and the more powerful it becomes.

    Within 3 months of building the platform which is naturally extensible it’s already starting to snowball and we are unable to keep up with demand

    Bringing forward features such as self-deploying schema updates to remove us as a bottleneck


  • Obviously, try to represent _some_ detail - don’t represent everything as a single amorphous blob -
    but as soon as you have doubts about how easy it will be to maintain the data, step back to a less granular level. A mistake previous incarnations had made was to model what we want to know, regardless of what we can realistically maintain. Misleading in the end. Poorly maintained ACCURATE data will become inaccurate data