Video from presentation: https://youtu.be/SoSZdI2lMVQ
Processing vast amounts of data in the cloud has long been a nightmare, not just for data analysts but also for budget owners. We believe that migrating your data engineering workloads to the cloud can be beneficial if you keep some basic architectural principles in mind. Teams processing big data in the cloud should understand and leverage its key attribute: flexibility. The goal of our keynote is to share our experience and key learnings on how to fully utilize the power that the cloud offers without going broke. This could be useful for both startups and large corporations, as we will show examples of how to dramatically lower the cost of infrastructure.
Speaker: Johnson Darkwah, Big Data Solution Architect at Gauss Algorithmic, https://www.linkedin.com/in/johnson-darkwah-7ba76511/
4. Where are you running your Big Data?
The vast majority has been running on premise, on private infrastructure
5. 33% of enterprises will take their data lakes off life support.
Forrester Predictions 2018: The Honeymoon For AI Is Over
6. Initial Cost of a Production Big Data Management Platform running On Premise
200,000 EUR
5,092,000 CZK
7. Challenges with the Cloud
• Security remains a concern for early adopters
• Experienced companies focus on optimizing cost
• Data integration poses a challenge
• Lack of resources
8. Why use the cloud today for analytics
Cost: It’s never been cheaper to process and store data in the cloud
Flexibility: It’s never been easier to manage cloud services
11. Cloud Data Engineering Checklist
✓ Utilize Object Stores
✓ Pay only for what we really use
✓ Get discounts
✓ Optimize the Data Architecture
✓ Automate Everything
19. [Architecture diagram. Labels: On-premise; Cloud; Existing Enterprise (Logs, CRM, RDBMS); Clicks, IoT, Trans.; Gauss Data Tool; Realtime System 24x7; S3/ADLS; SQL Engine; Temp cluster (M W W W W W W); Client Job (ML, SQL, Enrichment, …); Spot; Terraform / Cloudera Altus; RIOSOD]
23. Cloudera Altus
• Enterprise ready & fully managed cluster
• Spark, MRv2 and Hive supported
• Support for spot instances
• Uses S3 or ADLS
• Clusters can run only when executing jobs
• Custom pre-built image (AMI) support
• Per hour pricing with support included
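The points about clusters running only while jobs execute, spot instance support and per-hour pricing can be made concrete with a little arithmetic. The Python sketch below is only an illustration. The function name, node count, hourly rate, daily job hours and spot discount are all hypothetical assumptions; none of them come from the talk or from any provider's price list.

```python
def monthly_cluster_cost(nodes, hourly_rate_eur, hours_per_day,
                         days_per_month=30, spot_discount=0.0):
    """Estimate the monthly compute cost of a cluster in EUR.

    All inputs are illustrative assumptions, not real cloud prices.
    spot_discount is the fraction saved versus the on-demand rate.
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be in [0, 24]")
    effective_rate = hourly_rate_eur * (1.0 - spot_discount)
    return nodes * effective_rate * hours_per_day * days_per_month


# Hypothetical inputs: 10 nodes at 0.50 EUR per node-hour.
NODES = 10
RATE_EUR = 0.50

# Cluster that stays up around the clock.
always_on = monthly_cluster_cost(NODES, RATE_EUR, hours_per_day=24)

# Temporary cluster that exists only while jobs run (assumed 4 hours a day).
transient = monthly_cluster_cost(NODES, RATE_EUR, hours_per_day=4)

# Same temporary cluster on spot instances (assumed 60% discount).
transient_spot = monthly_cluster_cost(NODES, RATE_EUR, hours_per_day=4,
                                      spot_discount=0.6)

if __name__ == "__main__":
    for label, cost in [("Always-on, on-demand", always_on),
                        ("Transient, on-demand", transient),
                        ("Transient, spot", transient_spot)]:
        print(f"{label:<22} {cost:>10.2f} EUR/month")
```

The sketch covers compute only. Object storage (S3 or ADLS) is billed separately and persists after the cluster is gone, which is what makes temporary clusters practical.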