SlideShare a Scribd company logo
Featured Project:
Marina Bay Sands Casino Resort, Singapore
Connecting teams project-wide
Big Data in a production environment:
Lessons Learnt
LAST Conference 2017
Mark Grebler - Aconex
CONFIDENTIAL | 2
Featured Project:
Marina Bay Sands Casino Resort, Singapore
Connecting teams project-wide
What does Big Data mean to you?
Summary
• What is the Insights project
• Big Data for Data Science
• Big Data in a production, user-facing environment
• Lessons Learnt
• Problems still to solve
What’s an Aconex?
Pronounced: Ay-conn-ex
Highly flexible and customisable data model with low level concepts
=
useful for many types of projects
Aconex has Flexible data
The Insights Project
Highly flexible and customisable data model with low level concepts
=
Difficult to produce meaningful customer reports
Flexible Data Needs transformation
The Insights Project
What does it look like
The Insights Project
Typical Big Data Architectures
Insights architecture
Looks similar to the other architectures
But the differences exist
We use the AWS console
to deploy new
infrastructure
I add new hardware by
buying a new box and
connecting it to the
network
Quotes from Data Engineers interviewed
We deploy by copying the
jar file to the cluster
We don’t have any CI, I
just build it on my box
We test by running it over
some data and ensuring it
doesn’t crash
We have some
rudimentary tests
What are the differences
Other Big Data Projects
● Internal client
● Simple authentication
● For Data Scientists
● Single environment
○ Sometimes 2 or 3
● Manual infrastructure management
● Sanity testing
● Manual integration
● Manual deployment
● Unrestricted data access
Insights Project
● External client
● Integrated authentication
● For end users
● Multiple environments
○ Due to data sovereignty (10)
● Infrastructure as code
● Unit → end-to-end testing
● Continuous integration
● Single-step deployment
● Data access restrictions
It’s not always so black and white, but the left side represents quite a lot of other projects I’ve seen.
Lessons Learnt
● VPN to control data access
● Autoscaling application server
● Network independence
● Zero downtime-deployments with
automatic rollback
○ ElasticBeanstalk provides this
Lessons learned: Infrastructure-as-code
● Must be easily reproducible because we need to do it 10+ times
● Automation of infrastructure management
○ Infrastructure is a core part of the Big Data project, so it must be treated as important as our
application code
○ Terraform is used to manage the infrastructure, including:
■ Networking and VPN management
■ Security
■ Provisioning VMs and other infrastructure
■ Replication and ingestion of data from Data Centres
■ Database Administration and Automation
Lessons learned: Access segregation
● Different accounts for testing and
production
● Separate VPCs for each environment
● Multiple user roles allows fine-grained
control of access
● VPN used as a further level to restrict
data access
Lessons learned: Integration and deployment
Continuous Integration
Once built, versioned artifacts are pushed to s3 buckets
Deployments
Ansible is used to roll out new versions of the
application and transformations
Infrastructure
Terraform controls the base infrastructure
● Deployments run in parallel across environments
● Docker image used for deployments to control
dependencies
Lessons Learnt: Automate Testing
● Big Data testing is hard
● Automated unit tests to ensure transformations are correct
○ We pair with our QA to generate the data, and validate the expected output for the unit tests
○ TDD-ish, but often testing done after development
● Automated Integration tests using a large data set
○ To ensure regressions haven’t occurred
● Manual end-to-end sanity tests
○ This should be automated in the future
● Manual exploratory testing
Problems to resolve
● Testing
○ Big Data testing is time consuming
■ Particularly around data generation
○ How to effectively automate testing of the infrastructure
○ How to automate end-to-end sanity testing.
● Infrastructure
○ CI/CD with Terraform
○ So many moving parts makes management difficult
● Ingestion and transformations
○ How to move from batch processing to incremental or streaming
○ Removing the database clones
● Effectively communicating to the business what/why we’re doing what we are
○ Why are things so slow?

More Related Content

What's hot

Reinventing enterprise defense with the Elastic Stack
Reinventing enterprise defense with the Elastic StackReinventing enterprise defense with the Elastic Stack
Reinventing enterprise defense with the Elastic Stack
Elasticsearch
 
Build A Better Way to Deliver IT
Build A Better Way to Deliver ITBuild A Better Way to Deliver IT
Build A Better Way to Deliver IT
Rackspace
 
IPv17 sync17
IPv17 sync17IPv17 sync17
IPv17 sync17
serg_valov
 
Monitoring
MonitoringMonitoring
Monitoring
strikr .
 
O monitoramento da infraestrutura facilitado, da ingestão ao insight
O monitoramento da infraestrutura facilitado, da ingestão ao insightO monitoramento da infraestrutura facilitado, da ingestão ao insight
O monitoramento da infraestrutura facilitado, da ingestão ao insight
Elasticsearch
 
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
InfluxDB + Telegraf Operator: Easy Kubernetes MonitoringInfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
InfluxData
 
Automatize a detecção de ameaças e evite falsos positivos
Automatize a detecção de ameaças e evite falsos positivosAutomatize a detecção de ameaças e evite falsos positivos
Automatize a detecção de ameaças e evite falsos positivos
Elasticsearch
 
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
Zabbix
 
Detection, Response and the Azazel Rootkit
Detection, Response and the Azazel RootkitDetection, Response and the Azazel Rootkit
Detection, Response and the Azazel Rootkit
Threat Stack
 
Maplelabs scalable-field-device-cloud-native
Maplelabs scalable-field-device-cloud-nativeMaplelabs scalable-field-device-cloud-native
Maplelabs scalable-field-device-cloud-native
Ganeshkumar Sundararajan
 
Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]
AppFirst
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elasticsearch
 
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave KempeOSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
NETWAYS
 
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with KubernetesHow To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
cnvrg.io AI OS - Hands-on ML Workshops
 
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA SystemsUsing OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
InfluxData
 
Webinar slides: DevOps Tutorial: how to automate your database infrastructure
Webinar slides: DevOps Tutorial: how to automate your database infrastructureWebinar slides: DevOps Tutorial: how to automate your database infrastructure
Webinar slides: DevOps Tutorial: how to automate your database infrastructure
Severalnines
 
Automate threat detections and avoid false positives
Automate threat detections and avoid false positivesAutomate threat detections and avoid false positives
Automate threat detections and avoid false positives
Elasticsearch
 
DBOps
DBOpsDBOps
DBOps
strikr .
 
[WSO2Con USA 2018] Microservices, Containers, and Beyond
[WSO2Con USA 2018] Microservices, Containers, and Beyond[WSO2Con USA 2018] Microservices, Containers, and Beyond
[WSO2Con USA 2018] Microservices, Containers, and Beyond
WSO2
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's Historian
Inductive Automation
 

What's hot (20)

Reinventing enterprise defense with the Elastic Stack
Reinventing enterprise defense with the Elastic StackReinventing enterprise defense with the Elastic Stack
Reinventing enterprise defense with the Elastic Stack
 
Build A Better Way to Deliver IT
Build A Better Way to Deliver ITBuild A Better Way to Deliver IT
Build A Better Way to Deliver IT
 
IPv17 sync17
IPv17 sync17IPv17 sync17
IPv17 sync17
 
Monitoring
MonitoringMonitoring
Monitoring
 
O monitoramento da infraestrutura facilitado, da ingestão ao insight
O monitoramento da infraestrutura facilitado, da ingestão ao insightO monitoramento da infraestrutura facilitado, da ingestão ao insight
O monitoramento da infraestrutura facilitado, da ingestão ao insight
 
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
InfluxDB + Telegraf Operator: Easy Kubernetes MonitoringInfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
 
Automatize a detecção de ameaças e evite falsos positivos
Automatize a detecção de ameaças e evite falsos positivosAutomatize a detecção de ameaças e evite falsos positivos
Automatize a detecção de ameaças e evite falsos positivos
 
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016
 
Detection, Response and the Azazel Rootkit
Detection, Response and the Azazel RootkitDetection, Response and the Azazel Rootkit
Detection, Response and the Azazel Rootkit
 
Maplelabs scalable-field-device-cloud-native
Maplelabs scalable-field-device-cloud-nativeMaplelabs scalable-field-device-cloud-native
Maplelabs scalable-field-device-cloud-native
 
Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
 
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave KempeOSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
 
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with KubernetesHow To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
 
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA SystemsUsing OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
Using OPC-UA to Extract IIoT Time Series Data from PLC and SCADA Systems
 
Webinar slides: DevOps Tutorial: how to automate your database infrastructure
Webinar slides: DevOps Tutorial: how to automate your database infrastructureWebinar slides: DevOps Tutorial: how to automate your database infrastructure
Webinar slides: DevOps Tutorial: how to automate your database infrastructure
 
Automate threat detections and avoid false positives
Automate threat detections and avoid false positivesAutomate threat detections and avoid false positives
Automate threat detections and avoid false positives
 
DBOps
DBOpsDBOps
DBOps
 
[WSO2Con USA 2018] Microservices, Containers, and Beyond
[WSO2Con USA 2018] Microservices, Containers, and Beyond[WSO2Con USA 2018] Microservices, Containers, and Beyond
[WSO2Con USA 2018] Microservices, Containers, and Beyond
 
Historic Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's HistorianHistoric Opportunities: Discover the Power of Ignition's Historian
Historic Opportunities: Discover the Power of Ignition's Historian
 

Similar to Last Conference 2017: Big Data in a Production Environment: Lessons Learnt

Data Science in Production: Technologies That Drive Adoption of Data Science ...
Data Science in Production: Technologies That Drive Adoption of Data Science ...Data Science in Production: Technologies That Drive Adoption of Data Science ...
Data Science in Production: Technologies That Drive Adoption of Data Science ...
Nir Yungster
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA
 
The journey to Native Cloud Architecture & Microservices, tracing the footste...
The journey to Native Cloud Architecture & Microservices, tracing the footste...The journey to Native Cloud Architecture & Microservices, tracing the footste...
The journey to Native Cloud Architecture & Microservices, tracing the footste...
Mek Srunyu Stittri
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
Gaurav Bahrani
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
All Things Open
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
Decision Science Community
 
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Codit
 
Deploy Eclipse hawBit in Production
Deploy Eclipse hawBit in ProductionDeploy Eclipse hawBit in Production
Deploy Eclipse hawBit in Production
Kynetics
 
DevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and ProjectsDevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and Projects
Fedir RYKHTIK
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
aspyker
 
GRANT DELP724
GRANT DELP724GRANT DELP724
GRANT DELP724
Grant Delp
 
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Vietnam Open Infrastructure User Group
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservices
TransferWiseSG
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)
Blake Irvine
 
Workshop: Delivering chnages for applications and databases
Workshop: Delivering chnages for applications and databasesWorkshop: Delivering chnages for applications and databases
Workshop: Delivering chnages for applications and databases
Eduardo Piairo
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PROIDEA
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
Haggai Philip Zagury
 
Cloud computing
Cloud computingCloud computing
Cloud computing
jhoejoe
 

Similar to Last Conference 2017: Big Data in a Production Environment: Lessons Learnt (20)

Data Science in Production: Technologies That Drive Adoption of Data Science ...
Data Science in Production: Technologies That Drive Adoption of Data Science ...Data Science in Production: Technologies That Drive Adoption of Data Science ...
Data Science in Production: Technologies That Drive Adoption of Data Science ...
 
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...
 
The journey to Native Cloud Architecture & Microservices, tracing the footste...
The journey to Native Cloud Architecture & Microservices, tracing the footste...The journey to Native Cloud Architecture & Microservices, tracing the footste...
The journey to Native Cloud Architecture & Microservices, tracing the footste...
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
Netflix Architecture and Open Source
Netflix Architecture and Open SourceNetflix Architecture and Open Source
Netflix Architecture and Open Source
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
 
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
 
Deploy Eclipse hawBit in Production
Deploy Eclipse hawBit in ProductionDeploy Eclipse hawBit in Production
Deploy Eclipse hawBit in Production
 
DevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and ProjectsDevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and Projects
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 
GRANT DELP724
GRANT DELP724GRANT DELP724
GRANT DELP724
 
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
Room 2 - 4 - Juncheng Anthony Lin - Redhat - A Practical Approach to Traditio...
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservices
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)DATA @ NFLX (Tableau Conference 2014 Presentation)
DATA @ NFLX (Tableau Conference 2014 Presentation)
 
Workshop: Delivering chnages for applications and databases
Workshop: Delivering chnages for applications and databasesWorkshop: Delivering chnages for applications and databases
Workshop: Delivering chnages for applications and databases
 
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 

Recently uploaded

8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 

Recently uploaded (20)

8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 

Last Conference 2017: Big Data in a Production Environment: Lessons Learnt

  • 1. Featured Project: Marina Bay Sands Casino Resort, Singapore Connecting teams project-wide Big Data in a production environment: Lessons Learnt LAST Conference 2017 Mark Grebler - Aconex
  • 2. CONFIDENTIAL | 2 Featured Project: Marina Bay Sands Casino Resort, Singapore Connecting teams project-wide
  • 3. What does Big Data mean to you?
  • 4. Summary • What is the Insights project • Big Data for Data Science • Big Data in a production, user-facing environment • Lessons Learnt • Problems still to solve
  • 6. Highly flexible and customisable data model with low level concepts = useful for many types of projects Aconex has Flexible data The Insights Project
  • 7. Highly flexible and customisable data model with low level concepts = Difficult to produce meaningful customer reports Flexible Data Needs transformation The Insights Project
  • 8. What does it look like The Insights Project
  • 9. Typical Big Data Architectures
  • 10. Insights architecture Looks similar to the other architectures
  • 11. But the differences exist We use the AWS console to deploy new infrastructure I add new hardware by buying a new box and connecting it to the network Quotes from Data Engineers interviewed We deploy by copying the jar file to the cluster We don’t have any CI, I just build it on my box We test by running it over some data and ensuring it doesn’t crash We have some rudimentary tests
  • 12. What are the differences Other Big Data Projects ● Internal client ● Simple authentication ● For Data Scientists ● Single environment ○ Sometimes 2 or 3 ● Manual infrastructure management ● Sanity testing ● Manual integration ● Manual deployment ● Unrestricted data access Insights Project ● External client ● Integrated authentication ● For end users ● Multiple environments ○ Due to data sovereignty (10) ● Infrastructure as code ● Unit → end-to-end testing ● Continuous integration ● Single-step deployment ● Data access restrictions It’s not always so black and white, but the left side represents quite a lot of other projects I’ve seen.
  • 13. Lessons Learnt ● VPN to control data access ● Autoscaling application server ● Network independence ● Zero downtime-deployments with automatic rollback ○ ElasticBeanstalk provides this
  • 14. Lessons learned: Infrastructure-as-code ● Must be easily reproducible because we need to do it 10+ times ● Automation of infrastructure management ○ Infrastructure is a core part of the Big Data project, so it must be treated as important as our application code ○ Terraform is used to manage the infrastructure, including: ■ Networking and VPN management ■ Security ■ Provisioning VMs and other infrastructure ■ Replication and ingestion of data from Data Centres ■ Database Administration and Automation
  • 15. Lessons learned: Access segregation ● Different accounts for testing and production ● Separate VPCs for each environment ● Multiple user roles allows fine-grained control of access ● VPN used as a further level to restrict data access
  • 16. Lessons learned: Integration and deployment Continuous Integration Once built, versioned artifacts are pushed to s3 buckets Deployments Ansible is used to roll out new versions of the application and transformations Infrastructure Terraform controls the base infrastructure ● Deployments run in parallel across environments ● Docker image used for deployments to control dependencies
  • 17. Lessons Learnt: Automate Testing ● Big Data testing is hard ● Automated unit tests to ensure transformations are correct ○ We pair with our QA to generate the data, and validate the expected output for the unit tests ○ TDD-ish, but often testing done after development ● Automated Integration tests using a large data set ○ To ensure regressions haven’t occurred ● Manual end-to-end sanity tests ○ This should be automated in the future ● Manual exploratory testing
  • 18. Problems to resolve ● Testing ○ Big Data testing is time consuming ■ Particularly around data generation ○ How to effectively automate testing of the infrastructure ○ How to automate end-to-end sanity testing. ● Infrastructure ○ CI/CD with Terraform ○ So many moving parts makes management difficult ● Ingestion and transformations ○ How to move from batch processing to incremental or streaming ○ Removing the database clones ● Effectively communicating to the business what/why we’re doing what we are ○ Why are things so slow?

Editor's Notes

  1. Who's had a house built for them, built their own house, or organised a significant renovation? How many documents were needed? How many conversations were had? Think of the number of documents to build a skyscraper, or a refinery, etc.
  2. Looks similar. From the outside, no real differences.