The document provides wireframes and workflows for a CCS DDS UI. It includes screens and flows for makers to create views from data sources, add metadata, upload PySpark scripts, validate data, and send views to checkers. It also includes screens and flows for checkers to get view data, promote views between environments, and schedule view deployments. It discusses challenges with real-time/near real-time data and notes that manual tasks include uploading new source/attribute metadata and validating view data. Validation and maintenance tasks would require resources with SQL, PySpark, Git, and Bigtable skills.
2. Agenda
• Updated UI screens and flow according to the recommended automation option
• Business Flow
• Workflow Automation – Environments view
• Real-time/Near real-time challenges
5. Work Queue Table to display views created by makers
• "All View" accordion in closed state
6. Requirement Analysis
Life Cycle of View – By Maker:
View Definition → Execution in progress → Script Executed → File ready to download → Validation in progress → Validation Completed → Review in progress
Life Cycle of View – By Checker:
Pending Review → Validation in progress → Validation Completed – Ready to Promote → Promotion in progress → View Promoted → Ready to Schedule
View Creation Requirement
• While view creation is in progress, the status of the view will be ‘View Definition’.
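A minimal sketch, purely illustrative, of how the maker- and checker-side statuses listed above could be modeled as Python enumerations (the class names are hypothetical, not from the deck):

# Sketch only: the view statuses above as enumerations. Class names are hypothetical.
from enum import Enum

class MakerViewStatus(Enum):
    VIEW_DEFINITION = "View Definition"
    EXECUTION_IN_PROGRESS = "Execution in progress"
    SCRIPT_EXECUTED = "Script Executed"
    FILE_READY_TO_DOWNLOAD = "File ready to download"
    VALIDATION_IN_PROGRESS = "Validation in progress"
    VALIDATION_COMPLETED = "Validation Completed"
    REVIEW_IN_PROGRESS = "Review in progress"

class CheckerViewStatus(Enum):
    PENDING_REVIEW = "Pending Review"
    VALIDATION_IN_PROGRESS = "Validation in progress"
    READY_TO_PROMOTE = "Validation Completed - Ready to Promote"
    PROMOTION_IN_PROGRESS = "Promotion in progress"
    VIEW_PROMOTED = "View Promoted"
    READY_TO_SCHEDULE = "Ready to Schedule"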
31. Maker Login
1. Maker Landing Screen
2. Maker clicks on Create View
3. List of created views is displayed
4. Maker clicks on the Create New button
5. Empty form opens to fill in basic details
6. Maker clicks on the Select Data button
7. Source & Table accordion opens
8. Maker selects Source + Table (attributes pre-selected in tables)
9. Maker clicks Next to add derived attribute metadata
10. Maker adds derived attribute name and type
11. Maker adds derived attribute metadata
12. Maker selects the added attribute
13. Maker clicks Copy Data to Clipboard
14. Lightbox is displayed to confirm pre-selected values
15. Maker confirms pre-selected data and proceeds
16. Maker clicks on Upload PySpark Script
17. Maker uploads the PySpark script through the upload form
18. Maker clicks Next to Get View Data
19. Maker selects filters to get view data
20. Maker views status in the Work Queue for validation of the view
21. Maker clicks Send View to Checker
33. Life Cycle of View
New View Creation by Maker → Validation by Maker in Sandbox → Validation by Checker in Sandbox → Promote View from Sandbox to UAT by Checker → Validation by Checker in UAT → Schedule Deployment from UAT to Prod
54. Checker
1. Checker Landing Screen
2. Checker clicks on a view
3. Checker clicks on Get View Data
4. View details are displayed to proceed to Get View Data
5. Checker views status in the Work Queue
6. Checker downloads test sample data (CSV)
7. After validation, the view will be ready to promote
8. Checker clicks on Promote View
9. Checker fills in promote details to promote the view
10. Checker clicks Promote View from Sandbox to UAT
11. Checker clicks on Get View Data to validate data in UAT
12. After successful validation of the view in UAT, checker clicks to schedule the view
13. Checker clicks the view to schedule deployment
14. Checker clicks on Schedule Deployment to deploy the view from UAT to Prod with a CR number
71. Notes
Manual tasks
• Any new sources/attributes to existing CDL tables will require a manual upload to CDL metadata by the CDL Data Ingestion Team.
• The DDS control metadata table will be populated manually as a weekly feed (TBD), and any changes to the CDL control table will not be propagated in real time until CDL has the capability to publish control table metadata changes.
• CDL metadata is not available as an API, and hence it will be a weekly extract (TBD) pulled from SharePoint into the DDS catalog. Validation of CDL metadata is not in scope.
• The entire view business logic will be developed in PySpark by a business developer. The input tables for the PySpark code are staging tables, and the output of the PySpark code will be a message to a Pub/Sub topic (see the sketch after this list).
• Once the job runs, the sample data can be downloaded by the maker, who has to validate the data manually.
• Promote to higher non-prod environment:
1. Code merge
2. Check-in to Git
• Promote to Prod environment:
1. It needs a CR approval.
2. Check-in to Git
3. Code merge
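A minimal sketch of the staging-table-in / Pub/Sub-message-out contract described above, assuming hypothetical table, project, and topic names (staging_dataset.stg_orders, dds-views, view-output) and the standard google-cloud-pubsub publisher; the transform itself is only illustrative:

# Sketch only: illustrates the contract from the notes above.
# Table, project, and topic names are hypothetical.
import json
from pyspark.sql import SparkSession
from google.cloud import pubsub_v1

spark = SparkSession.builder.appName("view-business-logic").getOrCreate()

# Input: a staging table (hypothetical name).
staging = spark.read.table("staging_dataset.stg_orders")

# Business logic supplied by the business developer (illustrative transform).
view_df = (
    staging
    .filter(staging.status == "ACTIVE")
    .select("order_id", "customer_id", "amount")
)

# Output: each resulting row is published as a message to a Pub/Sub topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("dds-views", "view-output")  # hypothetical

for row in view_df.toLocalIterator():  # acceptable for sample-sized outputs
    publisher.publish(topic_path, json.dumps(row.asDict()).encode("utf-8"))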
72. Assumptions
• The CDL control table will be exposed to DDS and will provide details on which CDL tables are DDS specific. This is required to populate the DDS control metadata table.
• An adapter API will poll the CDL control table to check CDL job completion (a sketch follows this list).
• The CDL control table will give us frequency at a job level. If there are multiple jobs populating a CDL table at different frequencies, it will be pulled into DDS at the lowest frequency.
• The data from DDS will be extracted as a file and placed into NAS storage on-premise; however, integration into BO is out of scope.
• DDS's only source is CDL.
• Metadata such as CDL control table metadata, CDL metadata, and view audit information can be placed on GitHub.
• Exception reports can be generated in Data Studio from the exception tables.
• PySpark code needs to be scanned for vulnerabilities before it is uploaded from the UI.
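A minimal sketch of the adapter polling assumption, with a hypothetical endpoint, response fields, and statuses (nothing here is a confirmed CDL API):

# Sketch only: poll a (hypothetical) CDL control table adapter until a job completes.
import time
import requests

ADAPTER_URL = "https://adapter.example.internal/cdl/control-table"  # hypothetical

def wait_for_cdl_job(job_id: str, poll_seconds: int = 300, timeout_seconds: int = 14400) -> bool:
    """Return True once the CDL control table marks the job complete."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        resp = requests.get(f"{ADAPTER_URL}/jobs/{job_id}", timeout=30)
        resp.raise_for_status()
        status = resp.json().get("status")  # assumed field name
        if status == "COMPLETED":
            return True
        if status == "FAILED":
            raise RuntimeError(f"CDL job {job_id} failed")
        time.sleep(poll_seconds)
    return False  # timed out; caller decides whether to alert or retry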
73. Maintenance tasks
• As part of incremental loads, if a job fails, it will be retried thrice (configurable), and on failure xMatters alerts will be raised to the appropriate support team. The team needs to look into the issue and resolve it.
• If transaction data is available but there is no corresponding master data, the records need to be written to an exception table. These records won't go into the Persistence layer. This needs to be handled by the PySpark code provided by the business developer (see the sketch after this list).
• Debugging of any performance issues raised.
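A minimal sketch of the exception-table handling described above, using PySpark anti/semi joins; the table names and join key are hypothetical:

# Sketch only: route transactions with no matching master data to an exception
# table; only matched records continue toward the Persistence layer.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("master-data-check").getOrCreate()

transactions = spark.read.table("staging_dataset.stg_transactions")  # hypothetical
master = spark.read.table("staging_dataset.stg_customers")           # hypothetical

# Transactions whose customer_id has no master record go to the exception table.
orphans = transactions.join(master, on="customer_id", how="left_anti")
orphans.write.mode("append").saveAsTable("dds_dataset.exception_transactions")

# Matched records proceed toward the Persistence layer.
matched = transactions.join(master, on="customer_id", how="left_semi")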
74. Workflow Automation Flow (Environments view)
The design environment (Maker, Developer, Checker) sits alongside the execution environments (Sandbox, UAT, Prod). In each execution environment, automated data pipelines (batch & real-time) and an API built for views (CDL > Staging > Persistence) deliver views to the Persistence layer; views are promoted and scheduled between environments, and view data is sent back for validation.
1. Browse CDL Catalog (Maker)
2. Create View (Maker)
3. Generate PySpark code for view (Developer)
4. Upload PySpark code for view
5. Get View Data (Sandbox)
6. Validate View Data (Maker)
7. Get View Data (Checker)
8. Validate Data (Checker)
9. Approve view (Checker)
10. Promote view to UAT (Checker)
11. Get View Data in UAT
12. Validate Data in UAT
13. Promote view to Prod (with CR number)
75. Post CG Contract – Support for Automation
Following are the skills resources need in order to support the manual tasks as part of the automation process:
• Basic SQL skills
• PySpark skills
• Git
• Performance tuning using PySpark
• Understanding of Bigtable for performance tuning, as required
There are two options:
• The business developer will be able to handle all manual and maintenance tasks listed in the previous slides.
• Along with the business developer, there will be a lean IT support team which does the manual and maintenance tasks.
76. Real-time/Near Real-time Challenges
• No RT/NRT use cases, including data sources
• CDL doesn't have any RT GCP service or RT data feed as of now
• Need to validate Bigtable as a staging service
• NFRs for streaming are not available
• Data Catalog needs to be made available for RT data elements
• Unclear on the type of transformations that the NRT flows would need
• Unclear how users would like to see nested structures on screen and select those nested attributes (see the sketch below)
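For the last point, a minimal sketch of what nested-attribute selection could look like in PySpark once a schema is known; the payload schema and field names are hypothetical, purely to illustrate the UI question:

# Sketch only: selecting nested attributes from a struct column in PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("nested-select").getOrCreate()

# Hypothetical NRT payloads with a nested struct, e.g.:
# root
#  |-- event_id: string
#  |-- payment: struct
#  |      |-- amount: double
#  |      |-- currency: string
events = spark.read.json("events.json")

# Dot notation selects nested attributes and flattens them into columns.
flat = events.select(
    "event_id",
    F.col("payment.amount").alias("payment_amount"),
    F.col("payment.currency").alias("payment_currency"),
)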