TerraAlto technical talk on a DevOps approach to Data Warehousing with Redshift and Lambda. Talk given by Senior Data & Database Engineer Andras Gombosi at the AWS Enterprise Briefing, Dublin, Ireland, on Wednesday, 28 November 2018.
Hello, my name is Andras Gombosi. I am a senior Data and Database Engineer at TerraAlto.
We are a well-established, Dublin-based technical consultancy focusing solely on AWS. We are an AWS Advanced Consulting Partner, and also an AWS Managed Services Provider, one of an elite group of 126 companies worldwide holding this competency.
We serve clients of all sizes, from start-ups to truly global enterprises, and we have countless migrations under our belt in Europe, Asia and also in AWS China. We are also working on various projects in the space of Big Data, IoT, Data Lakes and blockchain-based track-and-trace solutions.
One of our core operating principles is automation. The topic I have brought today is automation in an area where it is not yet as widespread: Data Warehousing and BI development.
The ongoing rumour is that Redshift was named to mark a move away from something Red… I worked with those Red technologies for nearly a decade, but I have since made the shift as well.
Redshift in physics happens when light undergoes an increase in wavelength. This phenomenon is directly related to the expansion of space, the expansion of the universe.
Redshift is an exceptionally good service for corporate data warehousing, both as a standalone DWH and as a SQL-compatible extension of a corporate data lake. As usual, AWS does most of the heavy lifting, but data security and cluster performance require great care and attention from the customer side as well.
Redshift's capability to grow makes it possible for organisations to have a single, true enterprise Data Warehouse, typically queried, developed and modified by multiple, often geographically distributed teams and processes; in some cases hundreds of people and systems have some sort of access to it.
Developers and Data Engineers modify data and change structures in Data Marts
Data Analysts query data directly
DBAs change data security (grants and revokes) and do housekeeping (VACUUM, ANALYZE)
ETL processes (Glue, EMR, Matillion, Informatica) constantly insert and update
Front-end BI tools (QuickSight, Tableau, MicroStrategy, Spotfire) query data through data marts
Control? The challenges are not new, just a bit amplified: by the scale, and by the open-source origins of Redshift, as open-source solutions are typically surrounded by a tooling ecosystem which Redshift does not have out of the box right now.
SLAs on data availability and on the uptime of data marts or other data sources for the upstream consumers. That means ETL/ELT jobs run in a timely and performant manner, and BI teams and other upstream consumer tools can connect and query without any disruption.
Security. In this case security of the Data itself. Who has access to what?
Audit and Compliance. Who changed what exactly and when?
In a complicated environment it is vital to have formal, automated processes without human intervention; otherwise, due to the sheer scale, the proper management of these challenges becomes very time-consuming and sometimes near impossible.
One possible solution is a “DevOps” style governance framework.
Yes, bringing database changes under the DevOps umbrella is an increasingly popular topic. There are many tools and many ways to build a pipeline: some of them are pricey, some complicated, some only work with specific DB engines, and some are all three of these.
Nevertheless, the principles are the same for a Redshift CD pipeline too.
A code repository for code version control and audit is the entry point, triggering an event-driven, automatic, intelligent Continuous Deployment capability. Ideally this is accompanied by an in-cluster, database- and schema-based user and privilege management framework which controls access via user groups, dedicated service users and default privileges.
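To make that entry point concrete, here is a minimal sketch of wiring a CodeCommit repository to a task router Lambda with Boto, so that pushes and merges to the watched branches fire the pipeline. The repository name, branch names and function ARN are illustrative assumptions, not details from the talk.

```python
import boto3

codecommit = boto3.client("codecommit")

# Hypothetical names: replace the repository, branches and Lambda ARN with your own.
codecommit.put_repository_triggers(
    repositoryName="dwh-datamart-sales",
    triggers=[
        {
            "name": "invoke-task-router",
            "destinationArn": "arn:aws:lambda:eu-west-1:123456789012:function:redshift-task-router",
            "branches": ["dev", "master"],   # an empty list would mean every branch
            "events": ["updateReference"],   # fire on pushes / merges to these branches
        }
    ],
)
# The router Lambda also needs a resource policy allowing codecommit.amazonaws.com to invoke it.
```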
The solution I brought today is a BASIC, practically free, cloud- and AWS-native way to get going. It uses nothing but AWS and Python.
CodeCommit is the starting point. Multiple communities use separate repositories, and different branches are set up. Some branches are protected and cannot be pushed into directly, only via pull requests and merging.
Code being pushed or merged to the appropriate branch triggers a task router.
Task router: this can be CodePipeline with a Lambda as a custom action for the Build stage to execute anything on a database. For most organisations a plain Lambda might be more suitable; your mileage may vary. We are using Lambda for this step too.
The task router understands information about the commit and evaluates requests: the commit message, for example, can carry the order in which you want to run your SQL files, or routing information, i.e. you can flag your commit if it contains a big task so that it is routed towards a container instead of Lambda (the limitation here is the 15-minute Lambda execution time).
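A minimal sketch of such a task router, assuming the CodeCommit trigger wiring shown earlier and a commit-message convention of our own invention (a "[bigjob]" flag); the executor function name is a placeholder.

```python
import json

import boto3

codecommit = boto3.client("codecommit")
lambda_client = boto3.client("lambda")

def handler(event, context):
    record = event["Records"][0]
    repo = record["eventSourceARN"].split(":")[5]        # repository name is the last ARN element
    ref = record["codecommit"]["references"][0]
    branch = ref["ref"].replace("refs/heads/", "")
    commit_id = ref["commit"]

    # The commit message carries the routing hints, e.g. "[bigjob] monthly vacuum of sales_mart".
    message = codecommit.get_commit(repositoryName=repo, commitId=commit_id)["commit"]["message"]

    payload = {"repository": repo, "branch": branch, "commitId": commit_id}

    if "[bigjob]" in message.lower():
        # Long-running work goes to a container (see the Big Job sketch below), not to Lambda.
        return {"route": "fargate", **payload}

    lambda_client.invoke(
        FunctionName="redshift-sql-executor",   # hypothetical executor function name
        InvocationType="Event",                 # asynchronous hand-off
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return {"route": "lambda", **payload}
```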
Two major types of long-running executions:
ETL: COPYs, UNLOADs and CREATE TABLE AS statements, usually done by an ETL/ELT tool (Glue, Matillion, Informatica, …)
Housekeeping: VACUUM / ANALYZE. Some ETL tools are also capable of scheduling these operations.
The Big Job deployer is entirely optional in most cases, depending on what other tools are already available in-house.
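If no such tool is at hand, one option is to let the router hand the same payload to a Fargate task running the executor code in a container. A rough sketch with Boto, where the cluster, task definition, subnets and security groups are all placeholders:

```python
import boto3

ecs = boto3.client("ecs")

def run_big_job(payload):
    """Run a long job (VACUUM, a big COPY, CREATE TABLE AS) in Fargate to avoid the 15-minute Lambda cap."""
    ecs.run_task(
        cluster="dwh-deploy",                    # placeholder ECS cluster
        launchType="FARGATE",
        taskDefinition="redshift-big-job:1",     # container image bundles the same executor logic
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],     # private subnets with a route to Redshift
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "DISABLED",
            }
        },
        overrides={
            "containerOverrides": [{
                "name": "executor",
                "environment": [
                    {"name": "REPOSITORY", "value": payload["repository"]},
                    {"name": "BRANCH", "value": payload["branch"]},
                    {"name": "COMMIT_ID", "value": payload["commitId"]},
                ],
            }]
        },
    )
```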
A few examples of possible use cases apart from normal development work.
Anything which can commit a SQL file to a repository can utilize the framework.
Automatic, controlled, central deployment of generated scripts forward-engineered from a database modelling tool, be it a full new schema deployment or incremental deltas to the structure. No more "Oops" situations where someone has accidentally dropped a few views and broken another few on Production instead of Dev, just because they started working before the third coffee.
DBSchema, Aqua, Aginity, whatever your weapon of choice is. If the tool has Git integration, it will work seamlessly with the pipeline.
If a database team reaches a higher capability maturity level and the company can justify purchasing more complicated and potentially pricey Database Release Management software solutions, the SDLC might change again, but up until then…
What we frequently see is that more and more customers want almost real-time visibility of their AWS costs. AWS provides a neat extract mechanism which dumps the billing data into an S3 bucket, hourly if required. But good guy AWS not only dumps the raw data, it also dumps the SQL commands and manifest files for loading the raw CSVs into Redshift. A trigger on the appropriate S3 PUT can start a function which picks up the event, makes minor changes to the loader SQL file (adding the Redshift target schema, for example), and commits the modified SQL to a repository monitored by a pipeline.
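A hedged sketch of that glue function: it is triggered by the S3 PUT of the AWS-generated loader SQL, prefixes the table reference with a target schema, and commits the result to the watched repository. The repository, branch, schema name and the exact string being rewritten are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")
codecommit = boto3.client("codecommit")

REPO = "dwh-billing-load"      # hypothetical repository watched by the pipeline
BRANCH = "dev"
TARGET_SCHEMA = "billing"      # hypothetical Redshift schema for the billing tables

def handler(event, context):
    # Fired by the S3 PUT of the loader SQL that accompanies the hourly billing extract.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    sql = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # Minor change: qualify the generated table name with our target schema.
    # The "awsbilling" token is illustrative; adjust to whatever the generated SQL actually contains.
    sql = sql.replace(" awsbilling", f" {TARGET_SCHEMA}.awsbilling")

    parent = codecommit.get_branch(repositoryName=REPO, branchName=BRANCH)["branch"]["commitId"]
    codecommit.put_file(
        repositoryName=REPO,
        branchName=BRANCH,
        filePath=f"billing/{key.split('/')[-1]}",
        fileContent=sql.encode("utf-8"),
        parentCommitId=parent,
        commitMessage=f"Automated billing load from s3://{bucket}/{key}",
    )
```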
Similar approaches can work very well even in certain Data Lake scenarios, or if you make the loader SQL and manifest part of your interface contract between systems, a deliverable accompanying the data itself.
Trigger: for a newly pushed commit, the following information is automatically forwarded to the Lambda in the trigger event:
- Who
- When
- Which repo
- Which branch
- Commit ID
- Most of the work is done by a Lambda function, written in Python. Boto is an incredibly convenient and elegant tool to create integration between AWS services.
The executor:
- Retrieves commit details and code from CodeCommit based on the commit ID
- Retrieves additional config from DynamoDB, such as hooks for Slack or Teams, the Redshift host, and the target database and schema
- Retrieves the appropriate secrets from Secrets Manager. You will have to have a naming convention in place; a [repo-branch] combination works fine
- Executes the code against the database
- Initiates notifications: Slack hook, MS Teams hook, basically anything supporting cURL / HTTP hooks, or email. The exact setup depends on the networking setup (including Lambda networking), client preferences and existing messaging platform usage and integration capabilities
- Logs everything in CloudWatch
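Pulling those steps together, a minimal executor sketch in Python with Boto. The DynamoDB table layout, the secret naming, the fixed deploy.sql path and the use of psycopg2 (packaged in a Lambda layer) are assumptions for illustration, not the exact implementation from the talk.

```python
import json
import urllib.request

import boto3
import psycopg2   # assumed to be bundled with the deployment package or a Lambda layer

codecommit = boto3.client("codecommit")
dynamodb = boto3.resource("dynamodb")
secrets = boto3.client("secretsmanager")

CONFIG_TABLE = "redshift-cd-config"     # hypothetical DynamoDB config table

def handler(event, context):
    repo, branch, commit_id = event["repository"], event["branch"], event["commitId"]

    # 1. Per repo/branch configuration: Redshift host, target database/schema, chat hooks.
    config = dynamodb.Table(CONFIG_TABLE).get_item(Key={"repo_branch": f"{repo}-{branch}"})["Item"]

    # 2. Credentials, following the [repo-branch] naming convention for the secret.
    secret = json.loads(secrets.get_secret_value(SecretId=f"{repo}-{branch}")["SecretString"])

    # 3. The SQL to deploy. (A fuller version would diff the commit and fetch every changed file.)
    sql = codecommit.get_file(
        repositoryName=repo, commitSpecifier=commit_id, filePath="deploy.sql"
    )["fileContent"].decode("utf-8")

    # 4. Execute against Redshift as the dedicated service user.
    conn = psycopg2.connect(
        host=config["redshift_host"], port=5439,
        dbname=config["database"], user=secret["username"], password=secret["password"],
    )
    try:
        with conn, conn.cursor() as cur:
            cur.execute(f"SET search_path TO {config['schema']}")
            cur.execute(sql)
    finally:
        conn.close()

    # 5. Notify the team; anything with an HTTP hook (Slack, Teams) works.
    notification = {"text": f"Deployed {commit_id[:8]} from {repo}/{branch}"}
    urllib.request.urlopen(
        urllib.request.Request(
            config["slack_hook"],
            data=json.dumps(notification).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
    )
```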
It will be a change, especially for teams at the low end of the Capability Maturity Model, but a crucial change, and that is exactly the point!
Improved code quality. A lot of tools try to differentiate themselves with automatic code review capability, but in the real, complicated world it is not always so simple that it can be codified, otherwise DBA work would not have to be black magic! And there are other options, such as the new Redshift Recommendations, or clever monitoring of certain STL and STV views, sometimes in combination with alerting on a Kibana dashboard.
Skills: pull requests and protected branches enforce four-eye checks. The console provides easy access to relatively advanced Git features, which is important, as database development teams are traditionally a little bit behind in terms of DevOps experience.
Human-to-human knowledge transfer is built into the deployment process, which automatically encourages growth in team maturity, in individual developer skills, and in Redshift performance.
Quality: many SQL statements can be scripted in an IDEMPOTENT way, so many scripts will be re-runnable.
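For example, a deployment script written with IF NOT EXISTS / IF EXISTS guards can be executed repeatedly without harm; a hypothetical mini-deployment (object names invented) might look like this:

```python
# Each statement is safe to re-run, so a failed or repeated deployment does no damage.
IDEMPOTENT_DEPLOY = [
    "CREATE SCHEMA IF NOT EXISTS sales_mart",
    "DROP VIEW IF EXISTS sales_mart.v_daily_revenue",
    """
    CREATE VIEW sales_mart.v_daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM sales_mart.orders
    GROUP BY order_date
    """,
]

def deploy(cursor):
    # 'cursor' is any DB-API cursor connected to the target Redshift database.
    for statement in IDEMPOTENT_DEPLOY:
        cursor.execute(statement)
```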
Uptime: the main effect is a much improved, undisturbed availability of data for end-user-facing BI tools. Breaking a data mart via an incorrect VIEW definition is now much harder. This leads to increased customer satisfaction and trust in the IT team.
Overall, data security is enforced by automatic processes on every level, including auditing and traceability.
Multi-layer security is present.
VPC (this was yesterday -> re:Invent happened while I was sleeping): a closed VPC with service endpoints wherever possible (an ENI or NAT setup might be required, but a seasoned SA should breeze through these). The executor Lambda runs in the closed VPC, which has S3 and Secrets Manager endpoints, and Redshift enhanced VPC routing is turned on. CodeCommit has no VPC endpoints available yet, and in AWS China there is no CodeCommit at all. Companies with very strict security requirements, such as data (including code) not being allowed to travel on the open internet even when encrypted, still have choices, for example hosting Git on an EC2 instance within the closed VPC; triggering the executors might then require manual setup of the hooks.
IAM provides full lock-down capabilities at both the infrastructure/service and the resource level:
- Bespoke execution / resource roles for Lambda and any other service
- Bespoke CodeCommit users and groups for engineers and a senior / approver group
Directory Service works in federation with IAM for single sign-on console access, for example to facilitate pull request reviews and merges. The client controls access levels via AD groups.
Redshift: an in-database user management framework with the service users the Lambda executors are utilising, and pre-configured upstream user groups. Many of our clients DO NOT even have credentials for any Redshift user accounts with elevated privileges, such as schema owners or superusers.
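As a rough illustration of what such an in-cluster framework can be built from (object and user names invented, run once by an administrator): a dedicated service user for the executor, a read-only group for upstream consumers, and default privileges so future objects are covered automatically.

```python
# One-off bootstrap statements; afterwards day-to-day deployments run only as the service user.
BOOTSTRAP_STATEMENTS = [
    # Dedicated service user the executor Lambda connects as (real password comes from Secrets Manager).
    "CREATE USER svc_deploy_sales PASSWORD '<generated-and-stored-in-secrets-manager>'",
    # Read-only group for upstream BI tools and analysts.
    "CREATE GROUP bi_readers",
    "GRANT USAGE ON SCHEMA sales_mart TO GROUP bi_readers",
    "GRANT SELECT ON ALL TABLES IN SCHEMA sales_mart TO GROUP bi_readers",
    # Default privileges: anything the service user creates later is readable by the group automatically.
    "ALTER DEFAULT PRIVILEGES FOR USER svc_deploy_sales IN SCHEMA sales_mart "
    "GRANT SELECT ON TABLES TO GROUP bi_readers",
]
```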
Reliability: Lambda scales horizontally, with an automatic burst of 500 to 3,000 concurrent executions (the higher limit in bigger regions, such as Ireland). Scaling of CodeCommit and Secrets Manager is managed by AWS, just as on Fargate and ECS.
Operational Excellence:
- Deployment of the CD pipelines via parameterized CloudFormation templates, infrastructure as code
- Lambda: retry functionality and Dead Letter Queues, optionally AWS Step Functions for an extra layer of state management
- CloudWatch and X-Ray
- Notifications on functional DB code failures to dev teams via Slack / Teams
- Notifications and alerting on infra-level problems to SysOps teams via CloudWatch and DLQs
Performance Efficiency:
- Right-sized Lambdas and right-sized, well-configured containers for the infrastructure
- Power users use repositories which are connected to dedicated Redshift users with access to superuser / dedicated WLM queues
- DynamoDB: auto-scaling might be overkill, depending on the size of the dev teams and the branches to manage, but the main thing is that the load is measurable and the functionality is there to auto-scale if required
Cost Optimization: the beauty is that this is practically free to run once you build it; the cost is insignificant if there is any. Lambda, CodeCommit, triggering, CloudFormation, all the nice tools are made available free of charge or very cheaply. There is minimal cost associated with the "Big Job" deployer, and the SQL code itself runs on the Redshift clusters!
Also, not just for Redshift.
I believe the power of the AWS ecosystem is evident: multiple cloud-native services working perfectly in concert to create an automated, event-driven, efficient, secure and scalable solution to a challenge. AWS is the perfect place for thinking "outside the box".
Slide titles from the deck:
Next Generation Data Warehouse Development with Lambda and Redshift
Petabyte scale, Massively Parallel, Exceptionally fast, Massive storage capacity, Attractive and transparent pricing
Challenges – the cost of greatness: Audit & Compliance
Empower the Database Developer and DBA communities with DevOps
CD Pipeline for DB code
Data Modeller SDLC: Forward Engineering
Edge ETL Use Cases: AWS Billing Data Load