BUILDING CLOUD DATA
PLATFORMS IN ENTERPRISES
Josip Saban, M. Sc., E-MBA, PMP
Data consultant
About me
■ M.Sc. in Computing from the University of Zagreb, MBA from Cotrugli (Croatia), Executive MBA in
Innovation and Entrepreneurship, in the area of toxic management, from TU Wien (Austria)
■ 20+ years of “everything” data, supporter of best engineering and management practices
■ Tools
– Snowflake, Databricks, MS Fabric, Tableau, Power BI, SQL Server, Oracle, AWS, Azure,
Python, dbt, …
■ Previous management experience
– Two-time startup co-owner; internal employee in various leading roles at Hypo Bank
and Erste Group; external consultant for SwissRe, Gartner, SteerCRM, Santander Bank
■ Writing blogs and holding lectures
– Wrote a book on Engineering Management
– More details on current activities at https://at.linkedin.com/in/josipsaban
Data platform summary
■ Don't overcomplicate the "data platform" term just because everybody else does
■ Central platform – ingests, integrates, stores, processes and transforms the data
■ Integrated management, governance & security – manages the data (DQ, lineage, …), secures the data
■ Analytical and consumption platform – serves curated data, provides insights
■ And…that’s it
Or…that is not it…
■ Before IT operations knows it has started…it started (usually) months or years ago
■ Drivers
– Legacy architecture - locally hosted datacenter maintained by small infra team offering no
dynamic scalability, fragile, little automation
– History of silos - homegrown data marts and purchased BI solutions – slow reaction to changes,
lots of black boxes, duplication of data, spaghetti-code
– Immature organization - no history of internal software development practices and little
experience with cloud
■ Human and organizational attitudes preventing innovation
– Development/operations is all that matters, everything else is overhead – “developers are heroes
of modern age”
– Assumptions that one role type is more important than the other – “management is useless, they
just consume air and produce nothing”
– Complete lack of understanding of how tickets and demand appear in the backlog
– We will do “agile” in a data engineering project – good luck with your next data project failure
– Not understanding that “you” own your data pipeline
■ Thinking that accountability and responsibility are the same thing – they are not!
– Not understanding that data engineering is NOT software development
Real assumptions of people with experience
■ You have built at least one end-to-end data solution, and you know…
– You are building a service, not evangelizing technology
– You know that things will not work out as planned; the plan is “more or less” a guideline
– You know that there will be very little/no documentation
– You know that there will be some last century legacy that “no one” maintains
– You know that scope/stakeholders/complexity…will increase
– You know your work is as political as it is technical
– You know that you will have silo thinking, personal interests and conflict
– You know…that you do not know what it will look like
Two levels of motivation
“Sales” drivers | Informal “unspoken” drivers
Cost reduction in running on-premise data centers | Legacy software that cannot be maintained
Faster response to business needs and reduced time to market | High license costs combined with legacy enterprise tools
Improved scalability…in both directions | Modernization of processes which are not supported by current tools
Usage of new tools and processes with little effort in architecture and capital investments | Dozens, and sometimes hundreds, of databases across enterprise silos
Consumption of new data sources which are impractical with on-premise solutions | Mandatory integrations due to mergers and acquisitions
Partial decrease of responsibility/accountability for maintaining environments | Easier implementation of security and GDPR compliance
Improvement of data quality and data observability with new tools | …
■ In summary – a data platform is more a business project than a technical one
– It is a consequence of, and a solution to, problems we cannot admit exist
– And…it gives us the ability to serve data to clients in new ways
– It is, primarily, an organizational topic
End goal
■ Sales pitch
– “To be a one-stop shop for providing, searching, and
consuming business-relevant data”
– “Single source of truth”
– …
■ Technical pre-sales pitch
– …self-service data (consumers) and data quality and
observability (producers)
– …data catalogue functionalities “out of the box”
– …transparent data quality pipelines and orchestrations
– …“data products” (whatever they are…again, a highly
political topic)
– …centralized compliance
– …
■ What it will really bring…
– …hope that things will be better and that we
can serve our customers better
– …involvement of business in data observability,
quality and consumption
– …modernization and creation of processes
– …democratization of data (this is not a
buzzword!)
– …political conflict at all organizational levels
But what are we really paying for?
■ A data platform is an architectural challenge of
setting up connected “black” boxes
■ It is more important to define features and
interoperability than to focus on specific tools
■ We need to design it in a way that we can change
tools without endangering stability
■ Data platform is not a one-size-fits-all solution – it is
built in iterations
– What new orchestrators, databases, analytic
or reporting tools will be on the market tomorrow?
■ It is fundamental to have the ability to…
– …choose the right technology
– …at the right time
– …without overhauling the entire architecture –
it is not an “IT geek playground”
■ Creating data platforms is a business-first,
technology-second project
■ A functioning data platform is your strategic asset; all
other systems are users…
– …data sharing
– …frontend applications
– …data catalogues
– …reporting
■ You are creating analytic systems to improve or
enable business performance
■ If you do not have at least one use case per sponsor,
your initiative will fail
■ You need to write a lot of documentation, be
transparent and demand support
■ Selected use cases need to be core company
processes
■ You need to set up, in advance, measurable KPIs that
define success criteria
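The point above about changing tools without endangering stability can be illustrated with a minimal sketch. It assumes the platform code depends on a capability contract rather than on a concrete product. All names here (`Orchestrator`, `CronOrchestrator`, `ManagedOrchestrator`, `nightly_load`) are hypothetical and do not refer to any real tool.

```python
from typing import Protocol


class Orchestrator(Protocol):
    """Hypothetical contract: what the platform needs, not which tool provides it."""

    def run(self, pipeline: str) -> str: ...


class CronOrchestrator:
    """Stand-in for the tool used today (illustrative only)."""

    def run(self, pipeline: str) -> str:
        return f"cron:{pipeline}:ok"


class ManagedOrchestrator:
    """Stand-in for a replacement tool adopted later (illustrative only)."""

    def run(self, pipeline: str) -> str:
        return f"managed:{pipeline}:ok"


def nightly_load(orchestrator: Orchestrator) -> list[str]:
    # Platform code talks only to the contract, so swapping the tool
    # does not require touching this function.
    return [orchestrator.run(name) for name in ("ingest", "transform", "publish")]


print(nightly_load(CronOrchestrator()))
print(nightly_load(ManagedOrchestrator()))
```

The design choice is that `nightly_load` never names a vendor. Replacing the orchestrator means writing one new adapter class, not overhauling the architecture.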
Importance of sponsors
■ A data platform is a very visible tool; it exposes many “hidden” silo problems
– Sponsors must have a name, hierarchical level and motivations
■ There are many non-obvious relations between your project sponsors
– Sponsors are not “IT”, “marketing”, “sales”, “risk”
■ You are working for a person, and you need to know who that is and what they want to achieve
– There will be political fights or technical problems you will not be able to solve
– Solving problems “under the table” is a very risky way of doing things
■ To build something as complex as an enterprise-grade data platform you need a shield; make sure you
have one when it rains
■ And…at this point we still have not written a single line of code or chosen a technology
– “Failure culture” in enterprises is a nice HR slogan…failure at this level means the end of a career
Putting together a modern data platform
■ Your project is approved, you have sponsors, use cases, budget
■ You have chosen a cloud platform and got a commitment to a “cloud-first” strategy
■ You now only need to create a whole platform, add data, handle users, … 
– As a lower-tier operational employee, you might learn about it in the “kickoff” meeting
■ Next, we need to define processes and the initial use case
– This includes data governance and all other standards
– We can start small and improve over time, but governance is as important as features
Data masking – dynamic and static
Row-level security
Data lineage
Data retention and recovery
Cost monitoring and reporting
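Three of the governance capabilities listed above (static masking, dynamic masking and row-level security) can be contrasted in a minimal, tool-independent sketch. The sample rows, column names and the `data_steward` role are assumptions made for illustration. A real platform would normally use the features built into its database or warehouse rather than application code like this.

```python
import hashlib

# Hypothetical sample rows; column names are illustrative only.
CUSTOMERS = [
    {"id": 1, "email": "ana@example.com", "region": "AT", "balance": 1200},
    {"id": 2, "email": "ivan@example.com", "region": "HR", "balance": 800},
]


def mask_static(rows):
    """Static masking: return a copy in which the email is replaced by a hash token.

    The hash is unsalted here for brevity; that is a simplification.
    """
    return [
        {**row, "email": hashlib.sha256(row["email"].encode()).hexdigest()[:12]}
        for row in rows
    ]


def mask_dynamic(rows, role):
    """Dynamic masking: stored data is untouched; the value is hidden at read time by role."""
    if role == "data_steward":  # hypothetical privileged role
        return [dict(row) for row in rows]
    return [{**row, "email": "***"} for row in rows]


def row_level_filter(rows, allowed_regions):
    """Row-level security: a user sees only rows for regions they are entitled to."""
    return [row for row in rows if row["region"] in allowed_regions]


print(mask_static(CUSTOMERS))
print(mask_dynamic(CUSTOMERS, role="analyst"))
print(row_level_filter(CUSTOMERS, allowed_regions={"HR"}))
```

Static masking changes what is stored in the copy, dynamic masking changes what a given reader sees, and row-level security changes which rows a reader can reach at all.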
Four dimensions of impact (example)
■ Dimensions: Platform Stability, Data Quality & Governance, Data Adoption, Revenue/Efficiency Impact
■ Example KPIs
– Infrastructure cost vs. plan: tracks the costs for data infrastructure (DWH, pipelines) vs. the plan
– Data Uptime: % of the data pipelines which were providing data without reported incidents
– Data Monthly Active Users (DMAU): users of our data platform and reporting technology, on a monthly basis
– Data Maturity Score: yearly measure of current data maturity against data governance requirements
– Data Team Value: value in € of analytics products on the data platform
– Efficiency score: value in € of internal process savings due to central reporting
– Validation and cleansing: % of data products using validation & cleansing
– Policy as code: monthly # of policy violations in deployed code
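Three of the example KPIs (Data Uptime, DMAU and infrastructure cost vs. plan) are simple ratios and counts, as the sketch below shows. The function names, input shapes and sample numbers are assumptions for illustration. The slide defines the KPIs only in words, so the exact formulas are one possible reading.

```python
from datetime import date


def data_uptime(pipelines_total: int, pipelines_with_incidents: int) -> float:
    """Percentage of pipelines that delivered data without a reported incident."""
    if pipelines_total == 0:
        raise ValueError("no pipelines to measure")
    return 100.0 * (pipelines_total - pipelines_with_incidents) / pipelines_total


def monthly_active_users(events, year: int, month: int) -> int:
    """DMAU: distinct users with at least one platform event in the given month."""
    return len({user for user, day in events if (day.year, day.month) == (year, month)})


def cost_vs_plan(actual_eur: float, planned_eur: float) -> float:
    """Deviation of infrastructure cost from plan in percent (positive = over plan)."""
    return 100.0 * (actual_eur - planned_eur) / planned_eur


# Hypothetical usage events: (user, date of activity).
EVENTS = [
    ("ana", date(2024, 11, 3)),
    ("ana", date(2024, 11, 20)),
    ("ivan", date(2024, 11, 5)),
    ("ivan", date(2024, 10, 30)),
]

print(data_uptime(pipelines_total=40, pipelines_with_incidents=3))
print(monthly_active_users(EVENTS, 2024, 11))
print(cost_vs_plan(actual_eur=11_000, planned_eur=10_000))
```

Agreeing on such definitions in advance is what makes the KPIs usable as success criteria. The arithmetic itself is trivial.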
Can we finally write some code and stop “talking”?
■ Tools and code…are what most IT operational people like to discuss, but they come last, and they are just that…tools
– Experimentation is limited; it is necessary to align with budgets and existing vendors
■ Vendor lock-in is something we can “almost” avoid with proper architecture – there is no option to “stick to the old ways”
■ You need to implement three things from “day one”…
– …user behavior monitoring
– …data platform usage
– …cost monitoring and reporting
■ Use the best tools available even if they cost more – money is less important than project success
– Avoid choosing open-source tools “just” because they are “free” – have a good reason to use them; they are a project risk from a governance perspective
– And…you will usually pay much more later for consultants, “support” and problems
■ At this point you still have not written a single line of production-ready code (or any code at all)
■ For enterprise environments it “usually” takes years to start a strategic data transformation project
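The three “day one” items above (user behavior monitoring, platform usage, cost monitoring) can all start from one query log, as this minimal sketch shows. The log format, the user and team names and the cost figures are hypothetical. Real platforms expose this information through their own usage or billing views.

```python
from collections import Counter, defaultdict

# Hypothetical query-log records: (user, team, dataset, cost in EUR).
QUERY_LOG = [
    ("ana", "risk", "loans", 0.40),
    ("ana", "risk", "customers", 0.10),
    ("ivan", "sales", "customers", 0.25),
    ("ana", "risk", "loans", 0.50),
]


def queries_per_user(log):
    """User behavior monitoring: how many queries each user ran."""
    return Counter(user for user, _team, _dataset, _cost in log)


def dataset_usage(log):
    """Platform usage: how often each dataset was queried."""
    return Counter(dataset for _user, _team, dataset, _cost in log)


def cost_per_team(log):
    """Cost monitoring: total query cost per team, rounded to cents."""
    totals = defaultdict(float)
    for _user, team, _dataset, cost in log:
        totals[team] += cost
    return {team: round(total, 2) for team, total in totals.items()}


print(queries_per_user(QUERY_LOG))
print(dataset_usage(QUERY_LOG))
print(cost_per_team(QUERY_LOG))
```

Capturing the raw log from the start matters more than the reports built on it, because usage and cost history cannot be reconstructed later.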
Closing words
■ Creating a data platform is a company-wide effort that brings high risks and high benefits
■ It is much more than an IT initiative; it is a company-wide transformation project
■ It affects how you do business and how you handle customers
■ It is a key driver of company strategy
■ You need business and technical knowledge, use cases and sponsors
■ You need people of different technological backgrounds and profiles
■ You need political wisdom as well as the desire to innovate
■ Good luck!
Further reading
Sol Rashidi – Your AI Survival Guide: Scraped Knees, Bruised Elbows, and Lessons Learned
from Real-World AI Deployments
Joe Reis, Matt Housley – Fundamentals of Data Engineering: Plan and Build Robust Data Systems
Jesse Anderson – Data Teams: A Unified Management Model for Successful Data-Focused
Teams
Jesse Anderson – Data Engineering Teams: Creating Successful Big Data Teams and Products
Project Apollo - How to run a product team
The agile data method
Let’s do coffee later 

[DSC Europe 24] Josip Saban - Building cloud data platforms in enterprises