SlideShare a Scribd company logo
Configuration Driven
Reporting On a Large
Dataset Using Apache
Spark
Arvind Das, Senior Engineer, American Express
can connect with me @ http://linkedin.com/in/arvind-das-a8720b49
Zheng Gu, Engineer, American Express
can connect with me @ http://linkedin.com/in/zheng-gu-895bb4157
Agenda
• Introduction
• Need For Dynamic
Configuration based
Reporting
• Overall Design
• Components Involved
• Transformation at Scale
• Templating at Scale
Introduction- What is Reporting Framework?
Reporting framework entails dynamic scheduling of partner-specific
reports, transforming, aggregating and filtering the data into different
DataFrames using inbuilt as well as user-defined functions leveraging
Spark's in memory and parallel processing capabilities. This also
encompasses applying business rules and converting it into different
formats by embedding FreeMarker as template engine into the
framework
STATISTICS AND GENERAL NEED
PATTERN: Need for Configuration based Reporting
Different reports & feeds generation involve a common
pattern:
• How the input dataset is read
• Optional enhancement of the Dataset with a referential
data lookup
• Sequence of transformation rules
• Application of a template on final data
{ Control Various Parameters Of reporting, as dynamic configuration, external To the Actual Framework}
Common Reporting Framework
Different reports & feeds involve different:
• Partner/stakeholder configurations
• Frequency of generation
• Input Dataset and schema definition
• Aggregation rules
• Templates
Technical Components
• Configurations driving the reporting is maintained in config management system outside of
the framework
• Core reporting framework in a sequence of activities which runs as a Spark job
• K8s based scheduler app manages the job scheduling and frequencies based on
partner/downstream contracts
• FreeMarker template engine embedded into the framework, reads externally provided
template file
• Framework publishes the final report to s3 object store
A Sample Configuration File
{
"report-name": {
"title": "",
"type": "",
"id": "",
"schema": "sample-schema",
"look-up-dataset": [“”, “”],
"transformation-rule": {
"step1": , "step2": },
"report-template": ["report.ftlh"],
"sample-schema": [ ]
}
}
Report Meta-data
Schema to a
report
Step transform
rules
Schema elements
Report template
Lookup Dataset
Deep-Dive
Apply Schema Stage
• Create a Spark SQL Query from the Schema provided
• Filter out columns which are not needed for the report using Spark SQL
• Reduce the size of DataFrame
ID Name Sex Birth
1 name1 male 01/01/19
70
2 name2 female 01/01/19
70
Name Sex
name1 male
name2 female
DF1 DF2
select name, sex from DF1
{
"report-name": {
"title": "",
"type": "",
"id": "",
"schema": "sample-schema",
"look-up-dataset": [“”, “”],
"transformation-rule": {
"step1": , "step2": },
"report-template": [""],
"sample-schema": [ “name”, “sex”]
}
}
Data Lookup Stage
• Reports might have need for data that is not part of the input;
• For example, Static Data
• Join the data with the input Dataset based on a common key
ID Name Sex
1 name1 male
2 name2 female
DF1
ID Phone
1 123-456-7890
2 098-765-4321
sample-lookup
ID Name Sex Phone
1 name1 male 123-456-7890
2 name2 female 098-765-4321
DF after lookup
{
"report-name": {
"title": "",
"type": "",
"id": "",
"schema": "sample-schema",
"look-up-dataset": [“sample-lookup”],
"transformation-rule": {
"step1": , "step2": },
"report-template": [""],
"sample-schema": [ “name”, “sex”]
}
}
Apply Transformation Rules Stage
• Report needs aggregations at different levels. Several transformation rules are needed, and each
transformation returns a DataFrame
txn_ID txn_typ
e
cat_ID amoun
t
1 Credit 100 50
2 Debit 102 30
3 Credit 100 20
4 Credit 102 10
5 Credit 105 100
6 Debit 102 20
7 Credit 102 30
8 Credit 105 50
9 Credit 105 60
10 Debit 100 10
11 Debit 104 5
original DF
step1 DF
step2 DF
step3 DF
txn_ID txn_typ
e
cat_ID amoun
t
1 Credit 100 50
2 Debit 102 30
3 Credit 100 20
4 Credit 102 10
5 Credit 105 100
6 Debit 102 20
7 Credit 102 30
8 Credit 105 50
9 Credit 105 60
10 Debit 100 10
cat_id amount count
100 80 3
102 90 4
105 210 3
txn_type amount count
Credit 320 7
Debit 60 3
{
"report-name": {
"title": "",
…………….
"transformation-rule": {
"step1": select * from original_DF where txn_id <= 10,
"step2": select cat_id, sum(amount) as amount, count(amount)
as count from step1 group by cat_id,
"step3": select txn_type, sum(amount) as amount,
count(amount) as count from step 1 group by txn_type
},
"sample-schema": [ “name”, “sex”]
}
}
Apply Transformation Rules (continued…)
Along with Spark SQL functions, UDFs (User Defined Functions) provide customized querying abilities
• Some sample UDFs:
• Decimalize:
• Calculate the actual transaction amount
• Parameters: transaction amount, decimalization factor
select decimalize(amount, decimal_factor) from DF;
• Signage:
• Apply rules to change transaction amount to positive or negative value
• Be used for further aggregation
select signage(amount, xx, xx…) from DF;
Apply Template
• Choose a Template Engine
• Need to transform the DataFrame to the format that can be referred by the
template
• Each DataFrame will be transferred into List<T>
DataFrame Dataset<T> List<T>
<#list step2 as step2>
State: ${step2.state}
Count: ${step2.count}
</#list>
List<T> step2
Template
Engine
State: CA
Count: 4
State: MA
Count: 3
State: AZ
Count: 3
template:
data:
output:
{
"report-name": {
"title": "",
"type": "",
"id": "",
"schema": "sample-schema",
"look-up-dataset": [“”, “”],
"transformation-rule": {
"step1": , "step2": },
"report-template": [”sample-report.ftlh"],
"sample-schema": [ “name”, “sex”]
}
}
Template Options:
Velocity, Thymeleaf, StringTemplate, FreeMarker
We selected FreeMarker, because:
• Template engine for generic template
• Supported by the Apache Software Foundation (ASF)
• Is widely used across Apache projects
• New frequent release. Latest release was in Feb 2021
• Good documentations available
Success Metrics
• Time to build a new report reduced from a month to a week
• Customers of the reports doesn’t have to know the internal nuances of report build
• We do not need to have a highly skilled technical team for a report build
• Performance Of Processing up to 10 million records per report achieved
Thank you
Find your place in technology on #TeamAmex
https://jobs.americanexpress.com/tech

More Related Content

What's hot

Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
Databricks
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Databricks
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
Knoldus Inc.
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
Databricks
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
colorant
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Spark tuning
Spark tuningSpark tuning
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
Databricks
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Databricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
Databricks
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
Databricks
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
Databricks
 

What's hot (20)

Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Spark tuning
Spark tuningSpark tuning
Spark tuning
 
What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0What’s New in the Upcoming Apache Spark 3.0
What’s New in the Upcoming Apache Spark 3.0
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 

Similar to Configuration Driven Reporting On Large Dataset Using Apache Spark

Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
Ahmet Bulut
 
Lsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAPLsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAP
Aabid Khan
 
UNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data MiningUNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data Mining
Nandakumar P
 
Ps training mannual ( configuration )
Ps training mannual ( configuration )Ps training mannual ( configuration )
Ps training mannual ( configuration )
Soumya De
 
Cognos framework manager
Cognos framework managerCognos framework manager
Cognos framework manager
maxonlinetr
 
IPC Data Analysis and Extraction
IPC Data Analysis and ExtractionIPC Data Analysis and Extraction
IPC Data Analysis and Extractionpzybrick
 
Get started with data migration
Get started with data migrationGet started with data migration
Get started with data migration
Thinqloud
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
BsMath3rdsem
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
Sambit Banerjee
 
How to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer CertificationHow to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer Certification
elephantscale
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
Chester Chen
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
Pratima Pandey
 
Stored procedure in sql server
Stored procedure in sql serverStored procedure in sql server
Stored procedure in sql server
baabtra.com - No. 1 supplier of quality freshers
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
DataWorks Summit
 
DQ Product Usage Methodology Highlights_v6_ltd
DQ Product Usage Methodology Highlights_v6_ltdDQ Product Usage Methodology Highlights_v6_ltd
DQ Product Usage Methodology Highlights_v6_ltdDigendra Vir Singh (DV)
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data WarehousingDatastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
Vibrant Technologies & Computers
 
Ax 2012 R3 Legacy Data Migration
Ax 2012 R3 Legacy Data MigrationAx 2012 R3 Legacy Data Migration
Ax 2012 R3 Legacy Data Migration
Jayanta Sarkar
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2
Eric Bragas
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
Syed Hadoop
 

Similar to Configuration Driven Reporting On Large Dataset Using Apache Spark (20)

Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
Lsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAPLsmw ppt in SAP ABAP
Lsmw ppt in SAP ABAP
 
UNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data MiningUNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data Mining
 
Ps training mannual ( configuration )
Ps training mannual ( configuration )Ps training mannual ( configuration )
Ps training mannual ( configuration )
 
Cognos framework manager
Cognos framework managerCognos framework manager
Cognos framework manager
 
IPC Data Analysis and Extraction
IPC Data Analysis and ExtractionIPC Data Analysis and Extraction
IPC Data Analysis and Extraction
 
Get started with data migration
Get started with data migrationGet started with data migration
Get started with data migration
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
 
How to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer CertificationHow to obtain the Cloudera Data Engineer Certification
How to obtain the Cloudera Data Engineer Certification
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
 
Stored procedure in sql server
Stored procedure in sql serverStored procedure in sql server
Stored procedure in sql server
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
 
DQ Product Usage Methodology Highlights_v6_ltd
DQ Product Usage Methodology Highlights_v6_ltdDQ Product Usage Methodology Highlights_v6_ltd
DQ Product Usage Methodology Highlights_v6_ltd
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data WarehousingDatastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Ax 2012 R3 Legacy Data Migration
Ax 2012 R3 Legacy Data MigrationAx 2012 R3 Legacy Data Migration
Ax 2012 R3 Legacy Data Migration
 
Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2Deep Dive into Azure Data Factory v2
Deep Dive into Azure Data Factory v2
 
Spark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Configuration Driven Reporting On Large Dataset Using Apache Spark

  • 1. Configuration Driven Reporting On a Large Dataset Using Apache Spark Arvind Das, Senior Engineer, American Express can connect with me @ http://linkedin.com/in/arvind-das-a8720b49 Zheng Gu, Engineer, American Express can connect with me @ http://linkedin.com/in/zheng-gu-895bb4157
  • 2. Agenda • Introduction • Need For Dynamic Configuration based Reporting • Overall Design • Components Involved • Transformation at Scale • Templating at Scale
  • 3. Introduction- What is Reporting Framework? Reporting framework entails dynamic scheduling of partner-specific reports, transforming, aggregating and filtering the data into different DataFrames using inbuilt as well as user-defined functions leveraging Spark's in memory and parallel processing capabilities. This also encompasses applying business rules and converting it into different formats by embedding FreeMarker as template engine into the framework
  • 5. PATTERN: Need for Configuration based Reporting Different reports & feeds generation involve a common pattern: • How the input dataset is read • Optional enhancement of the Dataset with a referential data lookup • Sequence of transformation rules • Application of a template on final data { Control Various Parameters Of reporting, as dynamic configuration, external To the Actual Framework} Common Reporting Framework Different reports & feeds involve different: • Partner/stakeholder configurations • Frequency of generation • Input Dataset and schema definition • Aggregation rules • Templates
  • 6. Technical Components • Configurations driving the reporting is maintained in config management system outside of the framework • Core reporting framework in a sequence of activities which runs as a Spark job • K8s based scheduler app manages the job scheduling and frequencies based on partner/downstream contracts • FreeMarker template engine embedded into the framework, reads externally provided template file • Framework publishes the final report to s3 object store
  • 7. A Sample Configuration File { "report-name": { "title": "", "type": "", "id": "", "schema": "sample-schema", "look-up-dataset": [“”, “”], "transformation-rule": { "step1": , "step2": }, "report-template": ["report.ftlh"], "sample-schema": [ ] } } Report Meta-data Schema to a report Step transform rules Schema elements Report template Lookup Dataset
  • 9. Apply Schema Stage • Create a Spark SQL Query from the Schema provided • Filter out columns which are not needed for the report using Spark SQL • Reduce the size of DataFrame ID Name Sex Birth 1 name1 male 01/01/19 70 2 name2 female 01/01/19 70 Name Sex name1 male name2 female DF1 DF2 select name, sex from DF1 { "report-name": { "title": "", "type": "", "id": "", "schema": "sample-schema", "look-up-dataset": [“”, “”], "transformation-rule": { "step1": , "step2": }, "report-template": [""], "sample-schema": [ “name”, “sex”] } }
  • 10. Data Lookup Stage • Reports might have need for data that is not part of the input; • For example, Static Data • Join the data with the input Dataset based on a common key ID Name Sex 1 name1 male 2 name2 female DF1 ID Phone 1 123-456-7890 2 098-765-4321 sample-lookup ID Name Sex Phone 1 name1 male 123-456-7890 2 name2 female 098-765-4321 DF after lookup { "report-name": { "title": "", "type": "", "id": "", "schema": "sample-schema", "look-up-dataset": [“sample-lookup”], "transformation-rule": { "step1": , "step2": }, "report-template": [""], "sample-schema": [ “name”, “sex”] } }
  • 11. Apply Transformation Rules Stage • Report needs aggregations at different levels. Several transformation rules are needed, and each transformation returns a DataFrame txn_ID txn_typ e cat_ID amoun t 1 Credit 100 50 2 Debit 102 30 3 Credit 100 20 4 Credit 102 10 5 Credit 105 100 6 Debit 102 20 7 Credit 102 30 8 Credit 105 50 9 Credit 105 60 10 Debit 100 10 11 Debit 104 5 original DF step1 DF step2 DF step3 DF txn_ID txn_typ e cat_ID amoun t 1 Credit 100 50 2 Debit 102 30 3 Credit 100 20 4 Credit 102 10 5 Credit 105 100 6 Debit 102 20 7 Credit 102 30 8 Credit 105 50 9 Credit 105 60 10 Debit 100 10 cat_id amount count 100 80 3 102 90 4 105 210 3 txn_type amount count Credit 320 7 Debit 60 3 { "report-name": { "title": "", ……………. "transformation-rule": { "step1": select * from original_DF where txn_id <= 10, "step2": select cat_id, sum(amount) as amount, count(amount) as count from step1 group by cat_id, "step3": select txn_type, sum(amount) as amount, count(amount) as count from step 1 group by txn_type }, "sample-schema": [ “name”, “sex”] } }
  • 12. Apply Transformation Rules (continued…) Along with Spark SQL functions, UDFs (User Defined Functions) provide customized querying abilities • Some sample UDFs: • Decimalize: • Calculate the actual transaction amount • Parameters: transaction amount, decimalization factor select decimalize(amount, decimal_factor) from DF; • Signage: • Apply rules to change transaction amount to positive or negative value • Be used for further aggregation select signage(amount, xx, xx…) from DF;
  • 13. Apply Template • Choose a Template Engine • Need to transform the DataFrame to the format that can be referred by the template • Each DataFrame will be transferred into List<T> DataFrame Dataset<T> List<T> <#list step2 as step2> State: ${step2.state} Count: ${step2.count} </#list> List<T> step2 Template Engine State: CA Count: 4 State: MA Count: 3 State: AZ Count: 3 template: data: output: { "report-name": { "title": "", "type": "", "id": "", "schema": "sample-schema", "look-up-dataset": [“”, “”], "transformation-rule": { "step1": , "step2": }, "report-template": [”sample-report.ftlh"], "sample-schema": [ “name”, “sex”] } } Template Options: Velocity, Thymeleaf, StringTemplate, FreeMarker We selected FreeMarker, because: • Template engine for generic template • Supported by the Apache Software Foundation (ASF) • Is widely used across Apache projects • New frequent release. Latest release was in Feb 2021 • Good documentations available
  • 14. Success Metrics • Time to build a new report reduced from a month to a week • Customers of the reports doesn’t have to know the internal nuances of report build • We do not need to have a highly skilled technical team for a report build • Performance Of Processing up to 10 million records per report achieved
  • 16. Find your place in technology on #TeamAmex https://jobs.americanexpress.com/tech