Considerations for creating, storing and trusting a unified business approach to data in a distributed environment, in order to prevent disjointed and competing views of business facts.
2. Sharing Data Is Caring Data
Is your data playing well with others?
3. Introduction / Motivation
20+ years experience in the industry
Working at Holiday Extras
Sharing data is not easy
Microservices playing nice with a data lake
We are still learning...
4. The monolith
Common in smaller organisations
Often seen as legacy or older tech
Unintended victims of their own success
Serving businesses well for years
15. Generating Even More Events!
[Diagram: four microservices, each with its own DB, each emitting events]
16. Positive Data Culture
What do we need to report on?
What state is changing?
Which business entities are involved?
How do we measure success?
Can this data be useful to others?
What future products could the data enable?
Hi, I’m Mark Terry.
I thought I’d talk about a recent blog post on how we are sharing data at Holiday Extras between the various parts of the business and teams, allowing us to implement a growing number of microservices and still maintain a usable data warehouse.
For these slides I’ll be focusing on the implementation detail of how we are currently doing this.
But we are still learning here, and still making improvements in this space.
So this is where we started, years ago. Much like other companies I’ve worked in.
Companies either let them be or spend years moving away from them; generally they work well, so they are hard to just kill.
Monoliths do get a fair bit of bad press, but in data terms things are ok so far...
Probably the simplest diagram I’ve put on a slide.
Generally monolithic apps are paired with a large datastore too. This was the case in several places I’ve worked.
Things are great data-wise, as there is a single place where engineers can store data, and no one needs to think about differing standards or schemas as it can be tightly coupled with the app.
From an engineering point of view this could be seen as a negative but from a data view often the data is just appended to in whatever format is already there. Consistency wins here.
Over time this DB is also used for serving reporting to the business, and there might be some simple admin screens to give some insight into the data contained within it.
One source of truth of the business data.
One place to go to find the numbers.
These databases often creak under the strain of needing to be quick for the application while also containing enough data for good reporting (pick one).
Enter the world of microservices.
Often there are reasons to break a monolith down into smaller services. Those reasons are a whole other talk but mostly relate to developer experience or deployment cadence.
A common pattern here is to identify components inside the larger app and move these out into their own service.
The new service will still use the original datastore, to limit the amount of refactoring required at each step. This is a good example of not thinking about data first.
After several services are broken out of the larger app, you end up with this architecture anti-pattern, the monolithic datastore.
Multiple distinct services still sharing the same datastore.
We had this problem at Holiday Extras. You will not be affected by it immediately, but it will get you soon enough.
This couples the data of services together so database schema changes require complicated deploys.
Services can access and update data without going through the advertised interfaces of the services, making it harder to cache and to identify sources of truth.
Microservices should each have their own operational datastore that is only accessible by that service. The data stored there relates to the function that the service provides.
Data might be duplicated across the different stores, and the technology and formats may differ.
The sole access to data is via the service’s advertised interface.
Operationally things are great at this point, but our precious data is locked away in many databases; reporting and sharing are going to be much harder now.
We go through a process of identifying what business entity a service changes and we have that service emit an event when this happens.
An event is a payload describing something that has happened. For example a new customer account has been created or a booking has been made.
If there are multiple services that perform similar tasks, then similar events should be sent, for example when you have two booking systems.
These events are the key to sharing business data, they serve as an abstraction layer from the implementation detail in a service.
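A minimal sketch of the kind of event described above: a small envelope around a business-entity state change. The event type, entity name, and payload fields here are illustrative assumptions, not Holiday Extras' actual schema.

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type, entity, payload):
    """Wrap a business-entity state change in a self-describing envelope."""
    return {
        "id": str(uuid.uuid4()),          # unique id, useful for de-duplication
        "type": event_type,               # e.g. "booking.created" (assumed name)
        "entity": entity,                 # business entity the event concerns
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,               # detail, decoupled from implementation
    }

event = make_event("booking.created", "booking",
                   {"booking_ref": "ABC123", "total_pence": 4999})
print(json.dumps(event, indent=2))
```

The envelope fields stay stable even if the booking service's internal schema changes, which is what gives the abstraction layer mentioned above.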
Now the hardest part of this whole process.
Schema’ing!
Deciding what makes up an event.
When you go through this process, even the smallest of points will take time. You’ll be surprised by the differences of opinion here.
These discussions do pay off in the long run; it’s upfront pain which the engineers need to go through, but it gets easier the more schemas are created.
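One hedged sketch of what the agreed schema can look like in practice: a simple required-field and type check in plain Python. The field names are assumptions for illustration; a real team might reach for JSON Schema, Avro or protobuf instead.

```python
# Assumed shape of a "booking.created" event envelope (illustrative only).
BOOKING_CREATED_SCHEMA = {
    "id": str,
    "type": str,
    "occurred_at": str,
    "payload": dict,
}

def validate(event, schema):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

good = {"id": "1", "type": "booking.created",
        "occurred_at": "2020-01-01T00:00:00Z", "payload": {}}
bad = {"id": "1", "type": "booking.created"}
print(validate(good, BOOKING_CREATED_SCHEMA))  # []
print(validate(bad, BOOKING_CREATED_SCHEMA))
```

Encoding the agreement as an executable check is one way to make those hard-won schema discussions stick.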
So what do we do with these events?
Well we collect them all into a single “pipeline”.
The pipeline is made up of several smaller components (microservices) to provide the features we need to use this new data in the business.
In this example we are storing raw data as files and then also storing the events into a single datastore for warehousing.
Other tasks could be added to the pipeline as required, redaction, segregation etc..
From the data warehouse we can then add reports required by the business from a single source. Great for compliance, and it makes it much easier to join related data together.
We can run large queries here as it’s completely separated from the operational space.
No data is deleted, which is great for having large datasets for trends and predicting customer intent.
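The pipeline described above can be sketched as a chain of small, independent stages that each receive every event. The stage names and the in-memory "stores" here are stand-ins, assumed for illustration, for the raw file store and warehouse in the real pipeline.

```python
raw_store = []   # stands in for raw event files
warehouse = []   # stands in for the single warehouse datastore

def store_raw(event):
    """Keep the original event untouched; nothing is ever deleted."""
    raw_store.append(event)

def load_warehouse(event):
    """Flatten the event into a reporting-friendly row (append-only)."""
    warehouse.append({"type": event["type"], **event["payload"]})

# Further stages (redaction, segregation, etc.) could be appended here.
PIPELINE = [store_raw, load_warehouse]

def ingest(event):
    for stage in PIPELINE:
        stage(event)

ingest({"type": "booking.created", "payload": {"booking_ref": "ABC123"}})
```

Because each stage is independent, adding a new task to the pipeline means adding a function (or, in the real system, a microservice) rather than touching the existing ones.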
The other major feature of this approach is that, just as you have services generating events, you can also have services consuming those events.
Advantages here can include looser coupling of services and queue processing for free.
Services can be built around business entities state changes rather than from current implementation details.
For example, send an email when we have a booking event, rather than having our booking API send a booking confirmation when someone books online.
It makes the engineers think a bit more generically about how a new service might be useful to others. Services are built for an individual team but can be used by the entire business.
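The email example above can be sketched as a simple publish/subscribe pattern: the email service subscribes to booking events instead of the booking API calling it directly. The topic name and handler below are illustrative assumptions, not the actual Holiday Extras services.

```python
from collections import defaultdict

subscribers = defaultdict(list)   # event type -> list of handlers
sent_emails = []                  # stands in for an email service's outbox

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event):
    # The producer knows nothing about who consumes the event.
    for handler in subscribers[event["type"]]:
        handler(event)

# The email team builds against the event, not the booking API.
subscribe("booking.created",
          lambda e: sent_emails.append(
              f"confirmation for {e['payload']['booking_ref']}"))

publish({"type": "booking.created", "payload": {"booking_ref": "ABC123"}})
```

The booking service never learns that emails exist, which is the looser coupling mentioned above: new consumers can be added without changing the producer.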
Then the whole process starts again, services consuming events will change state and generate more events, more microservices.
More data to report and analyse!
This whole process gives us a separation of business data from the implementation detail, allowing services to be changed while data consistency remains.
It makes engineers and stakeholders think about the data they need to report on or how a new service would alter business data.
Data-driven development can be used if you take this to the extreme.
Some example questions shown that can help during the development process.
Twitter account if you want to get in touch or happy to chat later this evening.