1. AWS RE:Cap Data
Jacob Verhoeks
Schuberg Philis
AWS Architect / Data Engineer
https://www.linkedin.com/in/jacobverhoeks/
https://github.com/jverhoeks
https://jacob.verhoeks.org/
2. re:Invent 2022 Data & Integration
Over 40% of the announcements were data-related
Check the full list of announcements, sorted by topic:
https://dev.to/aws-builders/reinvent-2022-releases-sorted-539c
4. DataZone, a central platform to organize data
Self Service
Request/Approval
Search
5. DataZone, a new foundation
Core Technology: Redshift - S3 data lake - Lake Formation
Redshift new features:
• Data sharing access control with Lake Formation (preview)
• Extended SQL capabilities (preview): MERGE, ROLLUP, CUBE, GROUPING SETS
• Support for large nested imports (JSON/Parquet)
• Dynamic Data Masking (preview)
• Zero-ETL Aurora -> Redshift
• Auto copy from S3
• Integration with Spark
• Real-time ingestion from Kinesis Data Streams and Managed Streaming for Apache Kafka (MSK)
• AWS Backup support
6. Informatica Data Loader -> Redshift (no cost)
Amazon Aurora Database, Amazon S3, Box, Coupa, Cvent, Eloqua, Google Analytics, Google Cloud Storage,
Google Cloud Spanner, Jira, MongoDB, Marketo, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure
Blob, Microsoft CDM Folders, MSD Dynamics 365 Ops, MSD Dynamics 365 Sales, Microsoft SQL Server,
MySQL Database, NetSuite, OData, Oracle Database, PostgreSQL Database, Salesforce (Sales, Service,
Financial Health), Salesforce Marketing, SAP SuccessFactors OData, ServiceNow, Shopify, Stripe, Xactly,
Zendesk, Zuora AQUA
9. Redshift Auto Copy Job
https://aws.amazon.com/blogs/big-data/simplify-data-ingestion-from-amazon-s3-to-amazon-redshift-using-auto-copy-preview/
Redshift watches the S3 folder and automatically imports new files
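In the preview, an auto-copy job is set up by adding a JOB CREATE ... AUTO ON clause to an ordinary COPY statement. The sketch below only builds that SQL text; it does not connect to Redshift. The helper name, table, bucket, role ARN, and job name are hypothetical placeholders, and the clause layout assumes the preview syntax described in the blog post linked above.

```python
def build_auto_copy_sql(table, s3_prefix, iam_role_arn, job_name, file_format="CSV"):
    """Return a COPY statement that registers an auto-copy job (assumed preview syntax)."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format}\n"
        f"JOB CREATE {job_name}\n"
        f"AUTO ON;"
    )


# Hypothetical values, for illustration only.
sql = build_auto_copy_sql(
    table="store_sales",
    s3_prefix="s3://example-bucket/store_sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    job_name="job_store_sales",
)
print(sql)
```

Once such a statement has been run, new files that land under the prefix should be loaded without a scheduler or a Lambda trigger.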
10. Glue
AWS Glue 4.0
Spark 3.3.0, Python 3.10
Built-in Pandas, Hudi, Iceberg, and Delta Lake support
AWS Glue for Ray
Parallel, high-performance Pandas
Reusable Visual transforms in Glue Studio
Glue Data Quality (preview)
https://aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-for-etl-pipelines/
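Glue Data Quality rules are written in DQDL (Data Quality Definition Language) as a `Rules = [ ... ]` list. A minimal sketch of assembling such a ruleset as text follows. The helper name and the `order_id` column are hypothetical, and only three common rule types (RowCount, IsComplete, IsUnique) are shown.

```python
def build_dqdl_ruleset(rules):
    """Join individual DQDL rule strings into a single ruleset definition."""
    if not rules:
        raise ValueError("at least one rule is required")
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Hypothetical rules for an orders table.
ruleset = build_dqdl_ruleset([
    "RowCount > 0",            # the dataset must not be empty
    'IsComplete "order_id"',   # no NULLs in order_id
    'IsUnique "order_id"',     # no duplicate order_id values
])
print(ruleset)
```

The resulting text is what would be pasted into a ruleset in the Glue console or attached to a data quality transform in an ETL job.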
13. AppFlow
22 new data connectors
Marketing: Facebook Ads, Google Ads, Instagram Ads, LinkedIn Ads
Customer service and engagement: MailChimp, SendGrid, Zendesk Sell, Freshdesk, Okta, Typeform
Business operations: Microsoft Teams, Zoom Meetings, Stripe, QuickBooks Online, Jira Cloud, GitHub