Lukáš Vereš May 3, 2021
Webinar:
Building big data pipelines: Lessons learned
About me
› Lead big data delivery projects; act as
a delivery lead and solution architect
› Projects focused on the creation of
pipeline generation frameworks and
data delivery for customers
› Experience from big pharma and
finance-related companies
› 10+ years of experience in the industry
Lukáš Vereš
Delivery lead for big
data projects
PROFINIT
Our competencies
Company stats
SOFTWARE DEVELOPMENT
APPLICATION OUTSOURCING
ENTERPRISE INTEGRATION
BUSINESS INTELLIGENCE/DWH
BIG DATA AND DATA SCIENCE
22+ yrs.
On the
tech market
since 1998.
Prague
Headquarters
at the centre of
Europe.
500+
Experienced
and enthusiastic
professionals.
Top 3
CAD company
in the Czech Republic
(IDC study).
26M €
Company
revenue in
2019.
Multiple areas
Clients from the
finance, insurance
and telco industries.
50+
We serve many
prominent world
clients
Certifications, culture & quality
A long history of technical engineering excellence has
led Western companies to rely heavily on skills and
expertise from the Czech Republic. We are proud of the quality
of our services and of our ISO 9001, ISO 27001, ISO 20000
and PRINCE2 certifications, which underpin our commitment
to providing high-quality, sustainable services.
ISO 9001 ISO 27001 ISO 20000
What am I going to talk about?
Today’s topics
› Personal experience
› Lessons learned
› Big data
What is big data?
The Original Vs
› Velocity
› Volume
› Variety
The business perspective on big data
› Validity
– Is the system under development or is it stable?
– Are data secure and can you trust them?
– Are the data compliant with laws and regulatory policies like the GDPR and CCPA?
› Value
– Set up your value objectives and then measure them with the chosen metrics
› Visualisation
– All data flows and processes need to be monitored and illustrated descriptively
– Understand what is actually being carried out and how
Technical perspective on big data
› Data storage
– Data storage systems:
HDFS, GlusterFS, etc.
– File and table formats with internal
structure: Avro,
Parquet, Delta Lake, etc.
› Data processing
– Data transformation: Spark,
MapReduce, etc.
– SQL engines on Hadoop: Impala,
Presto, Hive, Phoenix (over HBase)
› Open technologies
– Free access with the option to
get support for special components
› Big data volume
– Datasets bigger than 1TB,
hundreds of datasets or more
from one source, different file
formats
Poll
First lesson:
“Measure twice but really cut just
once when it comes to selecting
technologies”
Real life stories
› Implemented two different
architectures and toolsets
for the same purpose
› Added one more reporting
tool to the pile
Benefits of multiple technologies
› Two different solutions for the same thing
create a competitive environment
› New ideas are created to try to differentiate
› People get the chance to decide which
technology they prefer to work with
Downsides of multiple technologies
› Choosing a toolset can add more work to onboarding projects
› Less transparency in decision-making
› Every tool has to be supported,
adding complexity to the infrastructure
Key Takeaways
› Do your work and write down any discrepancies and recommendations
› Be transparent in your decision-making
› IT teams need to support business teams by teaching them how to use
existing tools
Second lesson:
“Communication works for those
who work at it”
Real life stories
› The development team relocated a support
team member to work with them on product
development
› A member of the support team had
a workstation next to the developers
› The data warehouse team struggled to
understand the impact of a data lake
on their transformations
Benefits of Intensive Collaboration
› Helps the support team understand the technology and improves
constructive discussion
› Involvement of the other team can improve the quality of the delivery
› Decreases frustration in teams where there are misunderstandings
Downsides of Intensive Collaboration
› It consumes the capacity of the support team member
› Not everyone is both a good learner and a good teacher
Key Takeaways
› Invest in teams, not only in individuals
› It is a process, not a one-time experience;
it will take time to evolve
Poll
Third lesson:
“Think about data as a commodity”
Real life stories
› Too many data sources
to load
– The goal was to speed up
the pipeline delivery
process
– Difficulties with changing
architecture
› Developed a solution for
generating data ingest pipelines
– A semi-manual self-service
approach to speeding up delivery
– Built on the AWS platform as
serverless architecture
Benefits of Data Pipelines
› Speeding up pipeline development can increase data scientists’
and data analysts’ interest in getting data in a more standardized
way
› Standardization of data ingest increases the quality of the work
done by data scientists and data analysts
› Serverless architecture
Downsides of Data Pipelines
› Managing the lifecycle of data sources
› It takes time to build it
› Transparent costs
Key Takeaways
› Build a framework for pipeline generation; it will pay
off in the long run
– Save money on support
– A unified, standardized way to give analytics teams access
to data
› Think about how to load datasets to target systems
faster
– Open new possibilities for business customers
– Give even very small customers without big budgets access
to data
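The pipeline-generation idea above can be sketched roughly as follows, assuming each source dataset is described by a small declarative spec. The `DatasetSpec` fields, the step names, the example URIs and the `generate_pipeline` function are hypothetical illustrations, not the framework described in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical declarative description of one source dataset."""
    name: str
    source_uri: str
    file_format: str            # e.g. "csv" or "parquet"
    target_table: str
    required_columns: tuple = ()


def generate_pipeline(spec: DatasetSpec) -> list:
    """Turn a spec into an ordered list of pipeline steps (plain dicts).

    Every dataset goes through the same generator, so all ingests
    follow the same standardized step sequence.
    """
    steps = [
        {"step": "extract", "uri": spec.source_uri, "format": spec.file_format},
    ]
    # A validation step is only generated when the spec asks for it.
    if spec.required_columns:
        steps.append({"step": "validate", "columns": list(spec.required_columns)})
    steps.append({"step": "load", "table": spec.target_table})
    return steps


# Illustrative specs; bucket and table names are made up.
specs = [
    DatasetSpec("customers", "s3://example-bucket/customers/", "csv",
                "lake.customers", ("customer_id", "country")),
    DatasetSpec("payments", "s3://example-bucket/payments/", "parquet",
                "lake.payments"),
]

pipelines = {spec.name: generate_pipeline(spec) for spec in specs}
for name, steps in pipelines.items():
    print(name, [s["step"] for s in steps])
```

With this shape, adding a new source means writing one more spec rather than hand-building a pipeline, which is where the savings on support would be expected to come from.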
Fourth lesson:
“Involve business from the beginning”
Real life stories
› Implemented framework
from scratch without
involving the business
side from the beginning
› Loading all datasets in the
data lake to prepare it for the
data analyst or data scientist
was the wrong dogma
Benefits of Late Business Involvement
› Gives developers space to focus on technology and ideas
› Developers can try new things, even ones that lead to dead ends
› Loading all datasets ahead of time gets data closer to the data
scientists and data analysts so it is ready anytime they need it
Downsides of Late Business Involvement
› If developers have more space, they might build
a solution that doesn’t fit real use cases
› Source systems keep changing, and the business
has to pay to handle these changes and
their impact
Key Takeaways
› Think about when the right time is to involve people from
the business side
– The business can give developers space to work on a framework,
but at the same time, they should provide specific use cases
› The dogma of loading all datasets in advance is wrong
– Higher costs for pipeline support
– Frustration from fixing issues on pipelines that no one uses
– The focus is on fixing bugs instead of delivering higher quality
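One way to act on this takeaway is to check which loaded datasets are actually read before paying to support their pipelines. The sketch below assumes a hypothetical record of the last read date per dataset; the dataset names, the dates and the 90-day threshold are illustrative only.

```python
from datetime import date, timedelta


def unused_pipelines(last_read, today, max_idle_days=90):
    """Return dataset names never read, or not read within max_idle_days.

    last_read maps dataset name -> date of last read, or None if never read.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, read_on in last_read.items()
        if read_on is None or read_on < cutoff
    )


# Hypothetical access data for three loaded datasets.
last_read = {
    "customers": date(2021, 4, 28),
    "legacy_orders": date(2020, 11, 2),
    "audit_dump": None,
}

candidates = unused_pipelines(last_read, today=date(2021, 5, 3))
print(candidates)
```

Datasets flagged this way are candidates for loading on demand rather than in advance.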
Summary
Lessons to be learned from this presentation
› Be constructive and honest when choosing technologies
› Have people work with different teams
› Deliver datasets from source systems faster
› Create solutions around business use cases
Profinit EU, s.r.o.
Tychonova 2, 160 00 Prague 6 | Phone +420 224 316 016
Web
www.profinit.eu
LinkedIn
linkedin.com/company/profinit
Twitter
twitter.com/Profinit_EU
Facebook
facebook.com/Profinit.EU
YouTube
Profinit EU
Thank you
for your attention
We need your help to be better!
› Since you are here, please help us
improve our events and webinars
and take a look at our short survey.
We appreciate your interest in helping
us grow. www.bigdataforbanking.com
linkedin.com/company/profinit
www.profinit.eu
› Contacts
Lukáš Vereš
lukas.veres@profinit.eu
Delivery lead for big data projects

More Related Content

What's hot

Cloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop SampleCloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop Sample
Alan Quayle
 
Edge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare ApplicationsEdge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare Applications
Debmalya Biswas
 
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
Edge AI and Vision Alliance
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
Pooyan Jamshidi
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Yinlin Chen
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
SLA-Ready Network
 
Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011
QITCOM
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project Overview
RECAP Project
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
deszal
 
Coud discovery chap 10
Coud discovery chap 10Coud discovery chap 10
Coud discovery chap 10
Alain Charpentier
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...
Obeo
 
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
Cedar Consulting
 
NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .
hpcexperiment
 
Identifying Workloads to Move to the Cloud
Identifying Workloads to Move to the CloudIdentifying Workloads to Move to the Cloud
Identifying Workloads to Move to the Cloud
RightScale
 
Engineering Simulation Meets the Cloud
Engineering Simulation Meets the CloudEngineering Simulation Meets the Cloud
Engineering Simulation Meets the Cloud
hpcexperiment
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens Nimis
JensNimis
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
Anirban Kundu
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016
Joel Kline
 
Cloud technology for hospitality
Cloud technology for hospitalityCloud technology for hospitality
Cloud technology for hospitality
PT Datacomm Diangraha
 
Coud discovery chap 9
Coud discovery chap 9Coud discovery chap 9
Coud discovery chap 9
Alain Charpentier
 

What's hot (20)

Cloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop SampleCloud Computing 101 Workshop Sample
Cloud Computing 101 Workshop Sample
 
Edge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare ApplicationsEdge AI Framework for Healthcare Applications
Edge AI Framework for Healthcare Applications
 
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
“Productizing Complex Visual AI Systems for Autonomous Flight,” a Presentatio...
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
 
Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011Mr. Hesham Rasmy's presentation at QITCOM 2011
Mr. Hesham Rasmy's presentation at QITCOM 2011
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project Overview
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
 
Coud discovery chap 10
Coud discovery chap 10Coud discovery chap 10
Coud discovery chap 10
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...
 
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
PeopleSoft Cloud Architecture & PeopleSoft Selective Adoption...Not Just for ...
 
NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .NCMS UberCloud Experiment Webinar .
NCMS UberCloud Experiment Webinar .
 
Identifying Workloads to Move to the Cloud
Identifying Workloads to Move to the CloudIdentifying Workloads to Move to the Cloud
Identifying Workloads to Move to the Cloud
 
Engineering Simulation Meets the Cloud
Engineering Simulation Meets the CloudEngineering Simulation Meets the Cloud
Engineering Simulation Meets the Cloud
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens Nimis
 
Cloud migration
Cloud migration Cloud migration
Cloud migration
 
The Cloud Presentation 2016
The Cloud Presentation 2016The Cloud Presentation 2016
The Cloud Presentation 2016
 
Cloud technology for hospitality
Cloud technology for hospitalityCloud technology for hospitality
Cloud technology for hospitality
 
Coud discovery chap 9
Coud discovery chap 9Coud discovery chap 9
Coud discovery chap 9
 

Similar to Building big data pipelines—lessons learned

Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
Microsoft
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Databricks
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Denodo
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Soujanya V
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
Denodo
 
Understand your data dependencies – Key enabler to efficient modernisation
 Understand your data dependencies – Key enabler to efficient modernisation  Understand your data dependencies – Key enabler to efficient modernisation
Understand your data dependencies – Key enabler to efficient modernisation
Profinit
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
Hortonworks
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
Toronto-Oracle-Users-Group
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
Denodo
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
SnapLogic
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
Abdelkader OUARED
 
GHD iConnect - our intranet for the future
GHD iConnect - our intranet for the futureGHD iConnect - our intranet for the future
GHD iConnect - our intranet for the future
Maree Courts
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
Data Science Milan
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
Denodo
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
DATAVERSITY
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 
Webinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj KasturiWebinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj Kasturi
oGuild .
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Precisely
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great Data
DLT Solutions
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Denodo
 

Similar to Building big data pipelines—lessons learned (20)

Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)A Successful Data Strategy for Insurers in Volatile Times (EMEA)
A Successful Data Strategy for Insurers in Volatile Times (EMEA)
 
Understand your data dependencies – Key enabler to efficient modernisation
 Understand your data dependencies – Key enabler to efficient modernisation  Understand your data dependencies – Key enabler to efficient modernisation
Understand your data dependencies – Key enabler to efficient modernisation
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
A Successful Data Strategy for Insurers in Volatile Times (ASEAN)
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
 
GHD iConnect - our intranet for the future
GHD iConnect - our intranet for the futureGHD iConnect - our intranet for the future
GHD iConnect - our intranet for the future
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Webinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj KasturiWebinar on Big Data Challenges : Presented by Raj Kasturi
Webinar on Big Data Challenges : Presented by Raj Kasturi
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great Data
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 

More from Profinit

Reference Data Management
Reference Data ManagementReference Data Management
Reference Data Management
Profinit
 
Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for Banks
Profinit
 
Legacy systems modernisation
Legacy systems modernisationLegacy systems modernisation
Legacy systems modernisation
Profinit
 
Automating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data StoresAutomating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data Stores
Profinit
 
4 Steps Towards Data Transparency
4 Steps Towards Data Transparency4 Steps Towards Data Transparency
4 Steps Towards Data Transparency
Profinit
 
Software systems modernisation
Software systems modernisationSoftware systems modernisation
Software systems modernisation
Profinit
 
Odborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum MobileOdborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum Mobile
Profinit
 
Data Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí clouduData Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí cloudu
Profinit
 
Detekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přáteléDetekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přátelé
Profinit
 
Výsledky backtestu propensitního modelu
Výsledky backtestu propensitního modeluVýsledky backtestu propensitního modelu
Výsledky backtestu propensitního modelu
Profinit
 
Propensitní modelování
Propensitní modelováníPropensitní modelování
Propensitní modelování
Profinit
 
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit
 
Profinit webinar: Instalment Detector
Profinit webinar: Instalment DetectorProfinit webinar: Instalment Detector
Profinit webinar: Instalment Detector
Profinit
 
Profinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publishProfinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publish
Profinit
 
2019 09-23-snidane qa-public
2019 09-23-snidane qa-public2019 09-23-snidane qa-public
2019 09-23-snidane qa-public
Profinit
 
2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full
Profinit
 
2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne
Profinit
 
Matedatový sklad
Matedatový skladMatedatový sklad
Matedatový sklad
Profinit
 
Projekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza CoinmateProjekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza Coinmate
Profinit
 
Projekt Edenred Cafeteria
Projekt Edenred CafeteriaProjekt Edenred Cafeteria
Projekt Edenred Cafeteria
Profinit
 

More from Profinit (20)

Reference Data Management
Reference Data ManagementReference Data Management
Reference Data Management
 
Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for Banks
 
Legacy systems modernisation
Legacy systems modernisationLegacy systems modernisation
Legacy systems modernisation
 
Automating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data StoresAutomating Data Lakes, Data Warehouses and Data Stores
Automating Data Lakes, Data Warehouses and Data Stores
 
4 Steps Towards Data Transparency
4 Steps Towards Data Transparency4 Steps Towards Data Transparency
4 Steps Towards Data Transparency
 
Software systems modernisation
Software systems modernisationSoftware systems modernisation
Software systems modernisation
 
Odborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum MobileOdborná snídaně: Datový sklad jako Perpetuum Mobile
Odborná snídaně: Datový sklad jako Perpetuum Mobile
 
Data Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí clouduData Science a MLOps v prostředí cloudu
Data Science a MLOps v prostředí cloudu
 
Detekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přáteléDetekce sociálních vazeb: domácnosti a přátelé
Detekce sociálních vazeb: domácnosti a přátelé
 
Výsledky backtestu propensitního modelu
Výsledky backtestu propensitního modeluVýsledky backtestu propensitního modelu
Výsledky backtestu propensitního modelu
 
Propensitní modelování
Propensitní modelováníPropensitní modelování
Propensitní modelování
 
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
Profinit Webinar: Benefits of Software Systems Modernization over their Repla...
 
Profinit webinar: Instalment Detector
Profinit webinar: Instalment DetectorProfinit webinar: Instalment Detector
Profinit webinar: Instalment Detector
 
Profinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publishProfinit_snidane_DWH_22_10_2019_publish
Profinit_snidane_DWH_22_10_2019_publish
 
2019 09-23-snidane qa-public
2019 09-23-snidane qa-public2019 09-23-snidane qa-public
2019 09-23-snidane qa-public
 
2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full2019 03-20 snidane-serie-kuchyne-full
2019 03-20 snidane-serie-kuchyne-full
 
2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne2018 11-28 snidane-serie-kuchyne
2018 11-28 snidane-serie-kuchyne
 
Matedatový sklad
Matedatový skladMatedatový sklad
Matedatový sklad
 
Projekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza CoinmateProjekt Bitcoinová burza Coinmate
Projekt Bitcoinová burza Coinmate
 
Projekt Edenred Cafeteria
Projekt Edenred CafeteriaProjekt Edenred Cafeteria
Projekt Edenred Cafeteria
 

Building big data pipelines—lessons learned

  • 1. Lukáš Vereš May 3, 2021 Webinar: Building big data pipelines: Lessons learned
  • 2. About me › Lead big data delivery projects; act as a delivery lead and solution architect › Projects focused on the creation of pipeline generation frameworks and data delivery for customers › Experience from big pharma and finance-related companies › 10+ years of experience in the industry Lukáš Vereš Delivery lead for big data projects
  • 3. PROFINIT Our competencies Company stats SOFTWARE DEVELOPMENT APPLICATION OUTSOURCING ENTERPRISE INTEGRATION BUSINESS INTELLIGENCE/DWH BIG DATA AND DATA SCIENCE 22+ yrs. On the tech market since 1998. Prague Headquarters at the centre of Europe. 500+ Experienced and enthusiastic professionals. Top 3 CAD company in Czech Republic (IDC study). 26M € Company revenue in 2019. Multiple areas Clients from Finance, Insurance and Telco industry. 50+ We serve many prominent world clients Certifications, culture & quality A long history of technical engineering excellence has led western companies to rely heavily on skills and expertise from the Czech Republic. We are proud of the quality of our services and the certificates ISO 9001, ISO 27001, ISO 20000, PRINCE 2, underpinning our commitment to provide high-quality sustainable services.
  • 4. What am I going to talk about?
  • 5. Today’s topics › Personal experience › Lessons learned › Big data
  • 6. What is big data?
  • 7. The Original Vs › Velocity › Volume › Variety
  • 8. The business perspective on big data › Validity – Is the system under development or is it stable? – Are data secure and can you trust them? – Are the data compliant with laws and regulatory policies like the GDPR and CCPA? › Value – Set up your value objectives and then use the chosen metrics › Visualisation – All data flows and processes need to be monitored and illustrated descriptively – Understand what is actually being carried out and how
  • 9. Technical perspective on big data › Data storage – Data storage systems: HDFS, GlusterFS, etc. – File formats with internal structures: Avro, Parquet, Delta Lake, etc. › Data processing – Data transformation: Spark, MapReduce, etc. – SQL engines: Hadoop, Impala, Presto, Hive, HBase, Phoenix › Open technologies – Free access with the option to get support for special components › Big data volume – Datasets bigger than 1 TB, hundreds of datasets or more from one source, different file formats
  • 10. Poll
  • 11. First lesson: “Measure twice but really cut just once when it comes to selecting technologies”
  • 12. Real life stories › Implemented two different architectures and toolsets for the same purpose › Added one more reporting tool to the pile
  • 13. Benefits of multiple technologies › Two different solutions for the same thing create a competitive environment › New ideas are created to try to differentiate › People get the chance to decide which technology they prefer to work with
  • 14. Downsides of multiple technologies › Choosing a toolset can add more work to onboarding projects › Less transparency in decision-making › Every tool has to be supported, adding complexity to the infrastructure
  • 15. Key Takeaways › Do your work and write down any discrepancies and recommendations › Be transparent in your decision-making › IT teams need to support business teams by teaching them how to use existing tools
  • 16. Second lesson: “Communication works for those who work at it”
  • 17. Real life stories › The development team relocated a support team member to work with them on product development › A member of the support team had a workstation next to the developers › The data warehouse team struggled to understand the impact of a data lake on their transformations
  • 18. Benefits of Intensive Collaboration › Helps the support team understand the technology and improves constructive discussion › Involvement of the other team can improve the quality of the delivery › Decreases frustration in teams where there are misunderstandings
  • 19. Downsides of Intensive Collaboration › It consumes the capacity of the support team member › Not everyone is both a good learner and a good teacher
  • 20. Key Takeaways › Invest in teams, not only in individuals › It is a process, not a one-time experience; it will take time to evolve
  • 21. Poll
  • 22. Third lesson: “Think about data as a commodity”
  • 23. Real life stories › Too many data sources to load – The goal was to speed up the pipeline delivery process – Difficulties with changing architecture › Developed a solution for generating data ingest pipelines – A semi-manual self-service approach to speeding up delivery – Built on the AWS platform as serverless architecture
  • 24. Benefits of Data Pipelines › Speeding up pipeline development can increase data scientists’ and data analysts’ interest in getting data in a more standardized way › Standardization of data ingest increases the data quality of work done by data scientists and data analysts › Serverless architecture
  • 25. Downsides of Data Pipelines › Managing the lifecycle of data sources › It takes time to build it › Transparent costs
  • 26. Key Takeaways › Build a framework for pipeline generation; it will pay off in the long run – Save money on support – A unified, standardized way to give analytics teams access › Think about how to load datasets to target systems faster – Open new possibilities for business customers – Give even very small customers without big budgets access to data
  • 27. Fourth lesson: “Involve business from the beginning”
  • 28. Real life stories › Implemented framework from scratch without involving the business side from the beginning › Loading all datasets in the data lake to prepare it for the data analyst or data scientist was the wrong dogma
  • 29. Benefits of Late Business Involvement › Gives developers space to focus on technology and ideas › Can try new things with dead ends › Loading all datasets ahead of time gets data closer to the data scientists and data analysts so it is ready anytime they need it
  • 30. Downsides of Late Business Involvement › If developers have more space, they might build a solution that doesn’t fit real use cases › Source systems are changing, and businesses need to pay to support these changes and impacts
  • 31. Key Takeaways › Think about when the right time is to involve people from the business side – The business can give developers space to work on a framework, but at the same time, they should provide specific use cases › The dogma for loading all datasets in advance is wrong – Higher costs for pipeline support – Frustration from fixing issues on pipelines that no one uses – The focus is on fixing bugs instead of delivering higher quality
  • 33. Lessons to be learned from this presentation › Be constructive and honest when choosing technologies › Have people work with different teams › Deliver datasets from source systems faster › Create solutions around business use cases
  • 34. Profinit EU, s.r.o. Tychonova 2, 160 00 Prague 6 | Phone + 420 224 316 016 Web www.profinit.eu LinkedIn linkedin.com/company/profinit Twitter twitter.com/Profinit_EU Facebook facebook.com/Profinit.EU Youtube Profinit EU Thank you for your attention
  • 35. We need your help to be better! › Since you are here, please help us improve our events and webinars by taking a look at our short survey. We appreciate your interest in helping us grow. www.bigdataforbanking.com linkedin.com/company/profinit www.profinit.eu › Contacts Lukáš Vereš lukas.veres@profinit.eu Delivery lead for big data projects
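
The third lesson recommends building a framework for pipeline generation instead of hand-writing each ingest pipeline. The slides do not show how the speaker's framework works, so the following is only a minimal sketch of the general idea, assuming each source is described declaratively and every pipeline is generated from the same standard steps. All names here (`SourceSpec`, `standard_steps`, `generate_pipeline`, the table names) are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[dict]
Step = Callable[[Rows], Rows]


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical declarative description of one data source."""
    name: str
    target_table: str


def standard_steps(spec: SourceSpec) -> Dict[str, Step]:
    """Return the same ordered steps for every source (the standardization)."""

    def validate(rows: Rows) -> Rows:
        # Drop empty records so downstream consumers see only populated rows.
        return [row for row in rows if row]

    def tag_lineage(rows: Rows) -> Rows:
        # Record where each row came from and where it is headed.
        return [
            {**row, "_source": spec.name, "_target": spec.target_table}
            for row in rows
        ]

    return {"validate": validate, "tag_lineage": tag_lineage}


def generate_pipeline(spec: SourceSpec) -> Step:
    """Build a runnable pipeline for one source from its spec."""
    steps = standard_steps(spec)

    def run(rows: Rows) -> Rows:
        for step in steps.values():  # dicts keep insertion order
            rows = step(rows)
        return rows

    return run


# Onboarding a new source means adding one spec, not writing a new pipeline.
specs = [
    SourceSpec(name="crm", target_table="lake.crm_customers"),
    SourceSpec(name="billing", target_table="lake.billing_invoices"),
]
pipelines = {spec.name: generate_pipeline(spec) for spec in specs}
result = pipelines["crm"]([{"id": 1}, {}, {"id": 2}])
```

In this sketch the saving comes from the fact that support effort concentrates in `standard_steps`, while each additional source costs only one more `SourceSpec` entry. A real framework on a serverless platform would replace the in-memory steps with managed services, which is outside what the slides describe.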