REIN IN YOUR DATA
The Importance & Impact of Data Wrangling/Data Prep
© Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
Business Statement
Data-Core is in the business of providing customized IT & IT-enabled services to SMB customers in specific vertical segments
For more information: www.datacoresystems.com
About Data-Core
❖ Incorporated in 1988 as a Delaware corporation
❖ Headquartered in Philadelphia, PA with offices worldwide
❖ Provider of IT and IT-enabled services
❖ Started AI-Labs in India for training professionals
❖ Data-Core is part of the DC Group, a diversified multinational conglomerate founded in 1930
Data-Core: Quick Facts
❖ 1500 Employees worldwide
❖ Locations
US:
• Philadelphia, PA
• Bristol, PA
• Las Vegas, NV
Offshore:
• Kolkata, India
• Mumbai, India
❖ Certifications & Compliance
• HIPAA: Healthcare Privacy
• ISO 9001: Quality Standard
• ISO 27001: Security Standard
• PCI: Data Security
• CMMi: Process Improvement
Practice Areas
IT Projects, Healthcare BPS, Consulting, AI Labs
• BI/Analytics
• AI/Machine Learning
• Data Wrangling
• Cloud Services
• API Integration
• SAP
• TIBCO
• .Net/Java
• Salesforce
• Microsoft
• Data Science Training
• PG Diploma
• PG Certificate
• Revenue Cycle
• Lockbox Processing
• Medical Claims
• Nimbus Platform
• Patient Engagement
• Health Analytics
• Media Intelligence
• Ad Monitoring
• RUZIVO Platform
• e-Publishing
• Mobile/e-book
• PubFlow Platform
• Legacy Applications
IT Project Practice - Services
Cloud
• Transition to the Cloud with minimal investment
• Scalable infrastructure with close monitoring of capacity
• Azure Government Cloud implementation for security
Data
• API Integration: Connecting systems
• Data Wrangling: Prepping data
• Real-time Analytics: Visualizing data
• Machine Learning/AI: Predictive Analytics
Application
• Implementing off-the-shelf applications
• Developing customized applications
• Integrating with existing ERP solutions
• Automated testing for functional & regression
From the Experts
You can have data without information, but you cannot have information without data.
~ Daniel Keys Moran
Data is like garbage. You'd better know what you are going to do with it before you collect it.
~ Mark Twain
Dirty data is a business problem, not an IT problem.
~ Gartner
Bottlenecks with Data Preparation
❖ Data scientists and analysts spend too much time preparing data
❖ Data scientists are used to carry out the data preparation process
❖ Mapping of raw unstructured data
❖ Data analytics teams depend heavily on the data engineering team to prepare data for analysis
❖ Data is prepared without a complete understanding of the business goals or the context of the use case
❖ Manual data preparation can hinder collaboration and the efficiency of process automation
Bottlenecks with Data Preparation
Time Consumption (%)
Data Preparation   60%
Data Analysis      12%
Data Modeling      10%
Development        10%
Data Integration   5%
Other              3%
Data Science Pipeline
Data Sources
Types of Raw Data
Unstructured, Semi-Structured, Structured
Text files
CSV files
Excel files
PDF
NoSQL
Excel files
XML
JSON
Email
Relational Data
NoSQL
Media Files
NoSQL
Problems with Raw Data
Data with Null Values
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
1          9/1/14      Central  Smith     Desk     2      125.00     250.00
9          4/10/15     Central  Andrews   Pencil   Null   1.99       131.34
10         12/12/14    Central  Smith     Pencil   67     1.29       86.43
11         4/18/14     Central  Andrews   Pencil   20     1.99       Null
14         6/25/14     Central  Morgan    Pencil   90     4.99       449.10
18         11/8/14     East     Parent    Pen      15     5.00       Null
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
1          9/1/14      Central  Smith     Desk     2      125.00     250.00
9          4/10/15     Central  Andrews   Pencil   66     1.99       131.34
10         12/12/14    Central  Smith     Pencil   67     1.29       86.43
11         4/18/14     Central  Andrews   Pencil   20     1.99       39.80
14         6/25/14     Central  Morgan    Pencil   90     4.99       449.10
18         11/8/14     East     Parent    Pen      15     19.99      299.85
Data with Missing Values (-- marks an empty cell)
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
5          1/15/15     Central  Gill      Binder   --     8.99       --
7          5/14/15     --       Gill      Pencil   --     1.29       68.37
19         9/18/14     East     Jones     Pen Set  16     15.99      255.84
21         22/10/14    --       Jones     Pen      64     --         575.36
26         8/24/15     West     Sorvino   Desk     3      275.00     825.00
28         10/14/15    West     Thompson  Binder   --     19.99      1,139.43
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
5          1/15/15     Central  Gill      Binder   46     8.99       413.54
7          5/14/15     Central  Gill      Pencil   53     1.29       68.37
19         9/18/14     East     Jones     Pen Set  16     15.99      255.84
21         10/22/14    East     Jones     Pen      64     8.99       575.36
26         8/24/15     West     Sorvino   Desk     3      275.00     825.00
28         10/14/15    West     Thompson  Binder   57     19.99      1,139.43
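In the sample rows above, Total equals Units × Unit Cost, so a single null or missing value among those three columns can usually be recovered from the other two. The following is a minimal sketch in Python (one of the tools covered later in the deck), using only the standard library. The helper names `parse_number` and `fill_numeric_gaps` and the dict-based row layout are illustrative assumptions, not something shown in the original slides.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def parse_number(cell):
    """Return a Decimal, or None for an empty cell or the literal 'Null'."""
    text = (cell or "").strip().replace(",", "")
    if text == "" or text.lower() == "null":
        return None
    return Decimal(text)

def fill_numeric_gaps(row):
    """Fill one gap among Units, Unit Cost, Total (hypothetical helper).

    Assumes Total = Units x Unit Cost. Rows with two or more gaps
    keep their gaps (as None), because arithmetic cannot recover them.
    """
    units = parse_number(row.get("Units"))
    cost = parse_number(row.get("Unit Cost"))
    total = parse_number(row.get("Total"))
    if total is None and units is not None and cost is not None:
        total = (units * cost).quantize(CENT, rounding=ROUND_HALF_UP)
    elif units is None and total is not None and cost:
        units = (total / cost).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    elif cost is None and total is not None and units:
        cost = (total / units).quantize(CENT, rounding=ROUND_HALF_UP)
    return {**row, "Units": units, "Unit Cost": cost, "Total": total}

# Three of the raw sample rows, reduced to the relevant columns.
rows = [
    {"Order No.": "9", "Units": "Null", "Unit Cost": "1.99", "Total": "131.34"},
    {"Order No.": "11", "Units": "20", "Unit Cost": "1.99", "Total": "Null"},
    {"Order No.": "21", "Units": "64", "Unit Cost": "", "Total": "575.36"},
]
cleaned = [fill_numeric_gaps(r) for r in rows]
for r in cleaned:
    print(r["Order No."], r["Units"], r["Unit Cost"], r["Total"])
```

Order 5 in the raw sample is missing both Units and Total, so this approach leaves it unresolved; a row like that needs a lookup against the source system or a business rule.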
Backdated Data
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
3          2/9/1450    Central  Jardine   Pencil   36     4.99       179.64
4          8/7/15      Central  Kivell    Pen Set  42     23.95      1,005.90
7          5/14/00     Central  Gill      Pencil   53     1.29       68.37
27         3/15/20014  West     Sorvino   Pencil   56     2.99       167.44
28         10/14/15    West     Thompson  Binder   57     19.99      1,139.43
29         09/27/159   West     Sorvino   Pen      76     1.99       151.24
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
3          2/9/14      Central  Jardine   Pencil   36     4.99       179.64
4          8/7/15      Central  Kivell    Pen Set  42     23.95      1,005.90
7          5/14/15     Central  Gill      Pencil   53     1.29       68.37
27         3/15/14     West     Sorvino   Pencil   56     2.99       167.44
28         10/14/15    West     Thompson  Binder   57     19.99      1,139.43
29         9/27/15     West     Sorvino   Pen      76     1.99       151.24
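Dates such as 2/9/1450 or 3/15/20014 cannot be repaired by formatting alone, but they can be flagged automatically for review. The sketch below, in standard-library Python, flags any order date that is malformed or falls outside an expected window. The 2014-2015 window and the rule that two-digit years mean 20YY are assumptions inferred from the cleaned sample rows, and the function names are hypothetical.

```python
from datetime import date

def parse_us_date(text):
    """Parse 'M/D/YY' or 'M/D/YYYY'; return a date, or None if malformed."""
    parts = text.strip().split("/")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    month, day, year = (int(p) for p in parts)
    if len(parts[2]) == 2:
        year += 2000              # assumption: two-digit years mean 20YY
    try:
        return date(year, month, day)
    except ValueError:            # e.g. month 22, or a five-digit year
        return None

def is_suspect(text, earliest=date(2014, 1, 1), latest=date(2015, 12, 31)):
    """True when the date is malformed or outside the expected window."""
    parsed = parse_us_date(text)
    return parsed is None or not (earliest <= parsed <= latest)

# Order No. -> raw Order Date, taken from the raw sample rows.
raw_dates = {3: "2/9/1450", 4: "8/7/15", 7: "5/14/00",
             27: "3/15/20014", 28: "10/14/15", 29: "09/27/159"}
suspects = [order for order, value in raw_dates.items() if is_suspect(value)]
print(suspects)
```

Flagging rather than guessing is deliberate here: the correct year for a suspect row has to come from another source, such as the order system of record.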
Data with Mismatched Values
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
12         5/31/15     Central  Gill      Binder   80     8.99       719.20
13         2/1/15      Center   Smith     Binder   87     15.00      1,305.00
15         2015 Dec 5  Central  Jardine   Binder   94     19.99      1,879.06
21         22/10/14    East     Jones     Pen      64     8.99       575.36
24         Jan 5 2015  East     Jones     Pencil   95     1.99       189.05
26         8/24/15     W        Sorvino   Desk     3      275.00     825.00
29         Sept 27 14  West     Sorvino   Pen      76     1.99       151.24
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
12         5/31/15     Central  Gill      Binder   80     8.99       719.20
13         2/1/15      Central  Smith     Binder   87     15.00      1,305.00
15         12/4/15     Central  Jardine   Binder   94     19.99      1,879.06
21         10/22/14    East     Jones     Pen      64     8.99       575.36
24         1/6/14      East     Jones     Pencil   95     1.99       189.05
26         8/24/15     West     Sorvino   Desk     3      275.00     825.00
29         9/27/15     West     Sorvino   Pen      76     1.99       151.24
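Mismatched values like these are typically handled by mapping every variant onto one canonical form. The following standard-library Python sketch normalizes the region labels and converts the mixed date layouts to M/D/YY. The alias table, the supported layouts, and the function names are assumptions based only on the sample rows shown above.

```python
from datetime import date

# Alias table is an assumption built from the labels seen in the sample rows.
REGION_ALIASES = {"center": "Central", "e": "East", "w": "West"}
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def normalize_region(value):
    """Map known aliases to the canonical region name; pass others through."""
    text = value.strip()
    return REGION_ALIASES.get(text.lower(), text)

def normalize_date(value):
    """Return 'M/D/YY' for the date layouts seen in the sample, else None."""
    tokens = value.replace("/", " ").split()
    if len(tokens) != 3:
        return None
    try:
        named = [t for t in tokens if t.isalpha()]
        if named:
            # 'Jan 5 2015', 'Sept 27 14' (month name first) or '2015 Dec 5' (year first)
            month = MONTHS[named[0][:3].lower()]
            first, second = (int(t) for t in tokens if t.isdigit())
            year, day = (first, second) if tokens[0].isdigit() else (second, first)
        else:
            # Numeric: month-first unless the first number cannot be a month
            a, b, year = (int(t) for t in tokens)
            month, day = (a, b) if a <= 12 else (b, a)
        if year < 100:
            year += 2000          # assumption: two-digit years mean 20YY
        parsed = date(year, month, day)
    except (KeyError, ValueError):
        return None
    return f"{parsed.month}/{parsed.day}/{parsed.year % 100:02d}"

print(normalize_region("Center"), normalize_region("W"))
for raw in ["22/10/14", "2015 Dec 5", "Jan 5 2015", "Sept 27 14"]:
    print(raw, "->", normalize_date(raw))
```

Two caveats apply. A numeric date such as 2/1/15 is ambiguous between month-first and day-first, and this sketch assumes month-first. Also, for orders 15, 24, and 29 the cleaned dates in the sample differ from what the raw strings say, so a format conversion alone would not reproduce those three cells.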
Disparate Data
Unnormalized Data
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
6          3/24/15     Central  Jardine   Pen Set  50     4.99       FALSE
8          7/21/15     Central  Morgan    Pen Set  -55    12.49      686.95
14         6/25/14     Central  Morgan    Pencil   -90    4.99       449.10
15         2015 Dec 5  Central  Jardine   Binder   94     19.99      1,879.06
16         11/25/14    Central  Kivell    Pen Set  96     4.99       FALSE
22         12/29/14    E        Parent    Pen Set  40     15.99      FALSE
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
6          3/24/15     Central  Jardine   Pen Set  50     4.99       249.50
8          7/21/15     Central  Morgan    Pen Set  55     12.49      686.95
14         6/25/14     Central  Morgan    Pencil   90     4.99       449.10
15         12/4/15     Central  Jardine   Binder   94     19.99      1,879.06
16         11/25/14    Central  Kivell    Pen Set  96     4.99       479.04
22         12/29/14    East     Parent    Pen Set  74     15.99      1,183.26
Raw:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
10         12/12/14    Central  Smith     Pencil   67     1.29       86.43
11         4/18/14     Central  Andrews   Pencil   20     1.99       Null
12         5/31/15     Central  Gill      Binder   80     8.99       719.20
20         7/4/15      East     Jones     Pen Set  62     4.99       309.38
23         7/29/14     East     Parent    Binder   81     19.99      1,619.19
25         4/27/15     East     Howard    Pen      96     4.99       479.04
Cleaned:
Order No.  Order Date  Region   Rep       Item     Units  Unit Cost  Total
10         12/12/14    Central  Smith     Pencil   67     1.29       86.43
11         4/18/14     Central  Andrews   Pencil   20     1.99       39.80
12         5/31/15     Central  Gill      Binder   80     8.99       719.20
20         7/4/15      East     Jones     Pen Set  62     4.99       309.38
23         7/29/14     East     Parent    Binder   81     19.99      1,619.19
25         4/27/15     East     Howard    Pen      96     4.99       479.04
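Two of the problems in these rows, negative Units and a Total of FALSE or Null, can be repaired with simple rules. The standard-library Python sketch below strips the sign from Units and recomputes Total as Units × Unit Cost when the stored value is not a number. Treating a negative quantity as a keying error is an assumption (in some datasets it would mean a return), and `repair_order` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

INVALID_TOTALS = {"", "null", "false"}

def repair_order(units_text, unit_cost_text, total_text):
    """Return (units, total) with the sign fixed and a bad Total recomputed."""
    units = abs(int(units_text))      # '-55' -> 55, assuming a keying error
    unit_cost = Decimal(unit_cost_text)
    if total_text.strip().lower() in INVALID_TOTALS:
        total = (units * unit_cost).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    else:
        total = Decimal(total_text.replace(",", ""))
    return units, total

# Order No. -> (Units, Unit Cost, Total) from the raw sample rows.
raw = {6: ("50", "4.99", "FALSE"), 8: ("-55", "12.49", "686.95"),
       14: ("-90", "4.99", "449.10"), 16: ("96", "4.99", "FALSE")}
fixed = {order: repair_order(*cells) for order, cells in raw.items()}
for order, (units, total) in fixed.items():
    print(order, units, total)
```

Order 22 is left out on purpose: its Units value differs between the raw and cleaned tables (40 versus 74), which no arithmetic rule can recover.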
Data Wrangling/Preparation Steps
Data Wrangling/Preparation with Excel
Data Sources
• On-premises
• Cloud
• Hybrid
Data Types
• Structured
• Unstructured
• Semi-structured
Import Data
• Live Streaming/Real-time Data
• Historical Data
Pre-requisites
• Basic knowledge about the tool (Excel)
File Size
• Default: 10MB
• Maximum: 2GB
Features
• Easy application integrations
• Easy to use and readily available tool
• Records live data from different data sources
• Analytics features
• Easy to import data from multiple data sources
Limitations
• Data integrity issues
• Version control issues
• Difficult to append data from multiple data sources
• Limited scalability
• Limited file/spreadsheet size
Data Wrangling/Preparation with SSIS
❖ SQL Server Integration Services (SSIS)
✓ A high-performance integration and transformation tool used to extract, transform, and load (ETL) data to various destinations.
✓ A component of the Microsoft SQL Server database software.
❖ Extract-Transform-Load (ETL) Process
✓ Extract: data is collected from various sources such as text files, XML files, and Excel files.
✓ Transform: the collected data is transformed as per the requirements before it is loaded into the destination storage.
✓ Load: the transformed data is loaded into the destination storage.
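SSIS packages are built in their own visual designer, so the following is not SSIS code. It is a small Python sketch, using only the standard library, of the same three ETL stages: extract from a CSV source, transform the rows, and load them into a destination table. The in-memory CSV text, the SQLite destination, the `orders` table layout, and the function names are all assumptions chosen to keep the example self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical source data, modeled on the sample order rows.
RAW_CSV = """Order No.,Region,Item,Units,Unit Cost
12,Central,Binder,80,8.99
13,Center,Binder,87,15.00
26,W,Desk,3,275.00
"""

REGIONS = {"Center": "Central", "W": "West", "E": "East"}

def extract(text):
    """Extract: read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: fix region labels, cast types, and derive Total."""
    for row in rows:
        units = int(row["Units"])
        unit_cost = float(row["Unit Cost"])
        yield (int(row["Order No."]), REGIONS.get(row["Region"], row["Region"]),
               row["Item"], units, unit_cost, round(units * unit_cost, 2))

def load(records, connection):
    """Load: write the transformed records to the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_no INTEGER PRIMARY KEY, region TEXT, item TEXT, "
        "units INTEGER, unit_cost REAL, total REAL)")
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", records)
    connection.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT order_no, region, total FROM orders ORDER BY order_no").fetchall()
print(result)
```

Keeping the stages as separate functions mirrors how an ETL tool separates sources, transformations, and destinations, and it lets each stage be tested or swapped independently.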
❖ SSIS Features
✓ Data Cleaning
✓ Intelligent Data Handling
✓ Data Integration
✓ Flexible concurrent data processing to multiple varied destinations
✓ Visual Studio, SQL Server and custom code integration
❖ SSIS Limitations
✓ The limit on data volume/file size depends directly on the system memory (RAM) of the machine where the SSIS package runs, because SSIS loads the data into memory.
✓ SSIS cannot process real-time/live streaming data.
Data Wrangling/Preparation with Power BI
Data Wrangling/Preparation with Python
Demo
Data Wrangling/Preparation with Azure ML Studio
Demo
Data Wrangling/Preparation with Azure Data Factory
Data Wrangling/Preparation with Tableau Prep
Demo
Data Wrangling/Preparation with Trifacta
Demo
Summary
❖ Data wrangling/preparation tools let data scientists focus on data analysis and related activities rather than spending 60% of their time on data preparation
❖ Data preparation tools and platforms enable the data engineering team to prepare data for analysis more efficiently, and therefore to supply the data analytics teams with valuable data
❖ Data preparation tools and platforms offer advanced features such as mapping of raw unstructured data, live previews of the data, and AI features that make suitable suggestions based on the dataset and give an overview of the overall data flow
❖ Data wrangling provides automation features that help prepare the data without redundant data preparation steps
Discussion
Thank You!
The Importance & Impact of Data Wrangling/Data Prep

  • 1. REIN IN YOUR DATA The Importance & Impact of Data Wrangling/Data Prep © Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
  • 2. Business Statement Data-Core is in the Business of providing Customized IT & IT-enabled Services to SMB Customers in Specific Vertical Segment For more information: www.datacoresystems.com © Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
  • 3. About Data-Core ❖ Incorporated in 1988 as a Delaware corporation ❖ Headquartered in Philadelphia, PA with offices worldwide ❖ Provider of IT and IT-enabled services ❖ Started AI-Labs in India for training professionals ❖ Data-Core is part of the DC Group, a diversified multinational conglomerate founded in 1930 © Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
  • 4. Data-Core: Quick Facts ❖ 1500 Employees worldwide ❖ Locations US: • Philadelphia, PA • Bristol, PA • Las Vegas, NV ❖ Certifications & Compliance • HIPPA: Healthcare Privacy • ISO 9001: Quality Standard • ISO 27001: Security Standard • PCI: Data Security • CMMi: Process Improvement Offshore: • Kolkata, India • Mumbai, India © Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
  • 5. IT Projects Healthcare BPS Consulting AI Labs • BI/Analytics • AI/Machine Learning • Data Wrangling • Cloud Services • API Integration • SAP • TIBCO • .Net/ Java • Salesforce • Microsoft • Data Science Training • PG Diploma • PG Certificate • Revenue Cycle • Lockbox Processing • Medical Claims • Nimbus Platform • Patient Engagement • Health Analytics • Media Intelligence • Ad Monitoring • RUZIVO Platform • e-Publishing • Mobile/e-book • PubFlow Platform • Legacy Applications Practice Areas © Copyright 2019 Data-Core Systems, Inc. | datacoresystems.com
  • 6. IT Project Practice - Services
      ❖ Cloud
      ✓ Transition to the cloud with minimal investment
      ✓ Scalable infrastructure with close monitoring of capacity
      ✓ Azure Government Cloud implementation for security
      ❖ Data
      ✓ API Integration: Connecting systems
      ✓ Data Wrangling: Prepping data
      ✓ Real-time Analytics: Visualizing data
      ✓ Machine Learning/AI: Predictive analytics
      ❖ Application
      ✓ Implementing off-the-shelf applications
      ✓ Developing customized applications
      ✓ Integrating with existing ERP solutions
      ✓ Automated testing for functional and regression testing
  • 9. From the Experts
      ❖ "You can have data without information, but you cannot have information without data." ~ Daniel Keys Moran
      ❖ "Data is like garbage. You'd better know what you are going to do with it before you collect it." ~ Mark Twain
      ❖ "Dirty data is a business problem, not an IT problem." ~ Gartner
  • 10. Bottlenecks with Data Preparation
      ❖ Data scientists and analysts spend too much time preparing data
      ❖ Data scientists are used to carry out the data preparation process
      ❖ Mapping of raw unstructured data
      ❖ Data analytics teams depend heavily on the data engineering team to prepare data for analysis
      ❖ Data is prepared without a complete understanding of the business goals or the context of the use case
      ❖ Manual data preparation can hinder collaboration and the efficiency of process automation
  • 11. Bottlenecks with Data Preparation: Time Consumption (%)
      ❖ Data Preparation: 60%
      ❖ Data Analysis: 12%
      ❖ Data Modeling: 10%
      ❖ Development: 10%
      ❖ Data Integration: 5%
      ❖ Other: 3%
  • 12. Data Science Pipeline
  • 14. Data Sources
  • 15. Types of Raw Data
  • 16. Types of Raw Data
      ❖ Categories: Unstructured, Semi-Structured, Structured
      ❖ Examples: Text files, CSV files, Excel files, PDF, NoSQL, Excel files, XML, JSON, Email, Relational Data, NoSQL, Media Files, NoSQL
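The raw-data types listed above usually have to be brought into one common record shape before any preparation can start. The following is a minimal sketch using only the Python standard library. The sample payloads, the field names (`order_no`, `item`, `units`), and the helper names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads describing the same single order.
CSV_TEXT = "order_no,item,units\n1,Desk,2\n"
JSON_TEXT = '[{"order_no": 1, "item": "Desk", "units": 2}]'
XML_TEXT = (
    "<orders><order><order_no>1</order_no>"
    "<item>Desk</item><units>2</units></order></orders>"
)


def normalize(record):
    """Coerce one raw record into a common, typed shape."""
    return {
        "order_no": int(record["order_no"]),
        "item": str(record["item"]),
        "units": int(record["units"]),
    }


def from_csv(text):
    """Structured, tabular input: one dict per CSV row."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Semi-structured input: a JSON array of objects."""
    return [normalize(obj) for obj in json.loads(text)]


def from_xml(text):
    """Semi-structured input: one <order> element per record."""
    root = ET.fromstring(text)
    return [
        normalize({child.tag: child.text for child in order})
        for order in root.findall("order")
    ]
```

Once every source yields the same record shape, the later cleaning steps only need to be written once.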
  • 17. Problems with Raw Data
  • 18. Data with Null Values
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      1 | 9/1/14 | Central | Smith | Desk | 2 | 125.00 | 250.00
      9 | 4/10/15 | Central | Andrews | Pencil | Null | 1.99 | 131.34
      10 | 12/12/14 | Central | Smith | Pencil | 67 | 1.29 | 86.43
      11 | 4/18/14 | Central | Andrews | Pencil | 20 | 1.99 | Null
      14 | 6/25/14 | Central | Morgan | Pencil | 90 | 4.99 | 449.10
      18 | 11/8/14 | East | Parent | Pen | 15 | 5.00 | Null
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      1 | 9/1/14 | Central | Smith | Desk | 2 | 125.00 | 250.00
      9 | 4/10/15 | Central | Andrews | Pencil | 66 | 1.99 | 131.34
      10 | 12/12/14 | Central | Smith | Pencil | 67 | 1.29 | 86.43
      11 | 4/18/14 | Central | Andrews | Pencil | 20 | 1.99 | 39.80
      14 | 6/25/14 | Central | Morgan | Pencil | 90 | 4.99 | 449.10
      18 | 11/8/14 | East | Parent | Pen | 15 | 19.99 | 299.85
      Data with Missing Values
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      5 | 1/15/15 | Central | Gill | Binder | (blank) | 8.99 | (blank)
      7 | 5/14/15 | (blank) | Gill | Pencil | (blank) | 1.29 | 68.37
      19 | 9/18/14 | East | Jones | Pen Set | 16 | 15.99 | 255.84
      21 | 22/10/14 | (blank) | Jones | Pen | 64 | (blank) | 575.36
      26 | 8/24/15 | West | Sorvino | Desk | 3 | 275.00 | 825.00
      28 | 10/14/15 | West | Thompson | Binder | (blank) | 19.99 | 1,139.43
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      5 | 1/15/15 | Central | Gill | Binder | 46 | 8.99 | 413.54
      7 | 5/14/15 | Central | Gill | Pencil | 53 | 1.29 | 68.37
      19 | 9/18/14 | East | Jones | Pen Set | 16 | 15.99 | 255.84
      21 | 10/22/14 | East | Jones | Pen | 64 | 8.99 | 575.36
      26 | 8/24/15 | West | Sorvino | Desk | 3 | 275.00 | 825.00
      28 | 10/14/15 | West | Thompson | Binder | 57 | 19.99 | 1,139.43
  • 19. Backdated Data
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      3 | 2/9/1450 | Central | Jardine | Pencil | 36 | 4.99 | 179.64
      4 | 8/7/15 | Central | Kivell | Pen Set | 42 | 23.95 | 1,005.90
      7 | 5/14/00 | Central | Gill | Pencil | 53 | 1.29 | 68.37
      27 | 3/15/20014 | West | Sorvino | Pencil | 56 | 2.99 | 167.44
      28 | 10/14/15 | West | Thompson | Binder | 57 | 19.99 | 1,139.43
      29 | 09/27/159 | West | Sorvino | Pen | 76 | 1.99 | 151.24
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      3 | 2/9/14 | Central | Jardine | Pencil | 36 | 4.99 | 179.64
      4 | 8/7/15 | Central | Kivell | Pen Set | 42 | 23.95 | 1,005.90
      7 | 5/14/15 | Central | Gill | Pencil | 53 | 1.29 | 68.37
      27 | 3/15/14 | West | Sorvino | Pencil | 56 | 2.99 | 167.44
      28 | 10/14/15 | West | Thompson | Binder | 57 | 19.99 | 1,139.43
      29 | 9/27/15 | West | Sorvino | Pen | 76 | 1.99 | 151.24
      Data with Mismatched Values
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      12 | 5/31/15 | Central | Gill | Binder | 80 | 8.99 | 719.20
      13 | 2/1/15 | Center | Smith | Binder | 87 | 15.00 | 1,305.00
      15 | 2015 Dec 5 | Central | Jardine | Binder | 94 | 19.99 | 1,879.06
      21 | 22/10/14 | East | Jones | Pen | 64 | 8.99 | 575.36
      24 | Jan 5 2015 | East | Jones | Pencil | 95 | 1.99 | 189.05
      26 | 8/24/15 | W | Sorvino | Desk | 3 | 275.00 | 825.00
      29 | Sept 27 14 | West | Sorvino | Pen | 76 | 1.99 | 151.24
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      12 | 5/31/15 | Central | Gill | Binder | 80 | 8.99 | 719.20
      13 | 2/1/15 | Central | Smith | Binder | 87 | 15.00 | 1,305.00
      15 | 12/4/15 | Central | Jardine | Binder | 94 | 19.99 | 1,879.06
      21 | 10/22/14 | East | Jones | Pen | 64 | 8.99 | 575.36
      24 | 1/6/14 | East | Jones | Pencil | 95 | 1.99 | 189.05
      26 | 8/24/15 | West | Sorvino | Desk | 3 | 275.00 | 825.00
      29 | 9/27/15 | West | Sorvino | Pen | 76 | 1.99 | 151.24
  • 20. Disparate Data
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      6 | 3/24/15 | Central | Jardine | Pen Set | 50 | 4.99 | FALSE
      8 | 7/21/15 | Central | Morgan | Pen Set | -55 | 12.49 | 686.95
      14 | 6/25/14 | Central | Morgan | Pencil | -90 | 4.99 | 449.10
      15 | 2015 Dec 5 | Central | Jardine | Binder | 94 | 19.99 | 1,879.06
      16 | 11/25/14 | Central | Kivell | Pen Set | 96 | 4.99 | FALSE
      22 | 12/29/14 | E | Parent | Pen Set | 40 | 15.99 | FALSE
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      6 | 3/24/15 | Central | Jardine | Pen Set | 50 | 4.99 | 249.50
      8 | 7/21/15 | Central | Morgan | Pen Set | 55 | 12.49 | 686.95
      14 | 6/25/14 | Central | Morgan | Pencil | 90 | 4.99 | 449.10
      15 | 12/4/15 | Central | Jardine | Binder | 94 | 19.99 | 1,879.06
      16 | 11/25/14 | Central | Kivell | Pen Set | 96 | 4.99 | 479.04
      22 | 12/29/14 | East | Parent | Pen Set | 74 | 15.99 | 1,183.26
      Unnormalized Data
      Before:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      10 | 12/12/14 | Central | Smith | Pencil | 67 | 1.29 | 86.43
      11 | 4/18/14 | Central | Andrews | Pencil | 20 | 1.99 | Null
      12 | 5/31/15 | Central | Gill | Binder | 80 | 8.99 | 719.20
      20 | 7/4/15 | East | Jones | Pen Set | 62 | 4.99 | 309.38
      23 | 7/29/14 | East | Parent | Binder | 81 | 19.99 | 1,619.19
      25 | 4/27/15 | East | Howard | Pen | 96 | 4.99 | 479.04
      After:
      Order No. | Order Date | Region | Rep | Item | Units | Unit Cost | Total
      10 | 12/12/14 | Central | Smith | Pencil | 67 | 1.29 | 86.43
      11 | 4/18/14 | Central | Andrews | Pencil | 20 | 1.99 | 39.80
      12 | 5/31/15 | Central | Gill | Binder | 80 | 8.99 | 719.20
      20 | 7/4/15 | East | Jones | Pen Set | 62 | 4.99 | 309.38
      23 | 7/29/14 | East | Parent | Binder | 81 | 19.99 | 1,619.19
      25 | 4/27/15 | East | Howard | Pen | 96 | 4.99 | 479.04
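Several of the problems shown in the sample order tables (placeholder totals such as `Null` or `FALSE`, region variants such as `Center`, `W`, and `E`, negative unit counts, and mixed date formats) could be handled with a few small rules. The sketch below is one possible approach in plain Python, not the method used in the deck. The alias map, the list of date formats, and the rule that a missing total equals Units × Unit Cost are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical alias map: only the variants visible on the slides are covered.
REGION_ALIASES = {"Center": "Central", "W": "West", "E": "East"}

# Candidate formats, tried in order; month-first wins when a date is ambiguous.
# %b relies on English month abbreviations in the active locale.
DATE_FORMATS = ("%m/%d/%y", "%d/%m/%y", "%Y %b %d", "%b %d %Y")

# Placeholders treated as "no value" (an assumption based on the sample tables).
EMPTY = (None, "", "Null", "FALSE")


def parse_date(text):
    """Return an ISO date string, or None when no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except (ValueError, TypeError):
            continue
    return None


def to_number(value, cast):
    """Cast a raw cell to a number, mapping empty placeholders to None."""
    if value in EMPTY:
        return None
    return cast(str(value).replace(",", ""))


def clean_row(row):
    """Return a cleaned copy of one order row given as a dict of raw cells."""
    cleaned = dict(row)
    region = row.get("Region")
    cleaned["Region"] = REGION_ALIASES.get(region, region)
    cleaned["Order Date"] = parse_date(row.get("Order Date"))
    units = to_number(row.get("Units"), int)
    if units is not None:
        units = abs(units)  # assumes a negative count is a sign error
    cost = to_number(row.get("Unit Cost"), float)
    total = to_number(row.get("Total"), float)
    if total is None and units is not None and cost is not None:
        total = round(units * cost, 2)  # recompute a missing total
    cleaned.update({"Units": units, "Unit Cost": cost, "Total": total})
    return cleaned
```

Dates that match none of the formats, such as `2/9/1450` or `Sept 27 14`, come back as `None` so that they can be reviewed by hand rather than silently guessed.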
  • 21. Data Wrangling/Preparation Steps
  • 23. Data Wrangling/Preparation with Excel
  • 25. Data Wrangling/Preparation with Excel: Overview
      ❖ Data Sources: On-premises, Cloud, Hybrid
      ❖ Data Types: Structured, Unstructured, Semi-structured
      ❖ Import Data: Live Streaming/Real-time Data, Historical Data
      ❖ Pre-requisites: Basic knowledge of the tool (Excel)
      ❖ File Size: Default – 10MB, Maximum – 2GB
  • 26. ❖ Features
      ✓ Easy application integrations
      ✓ Easy to use and readily available tool
      ✓ Records live data from different data sources
      ✓ Analytics features
      ✓ Easy to import data from multiple data sources
      ❖ Limitations
      ✓ Data integrity issues
      ✓ Version control issues
      ✓ Difficult to append data from multiple data sources
      ✓ Limited scalability
      ✓ Limited file/spreadsheet size
  • 27. Data Wrangling/Preparation with SSIS
  • 29. ❖ SQL Server Integration Services (SSIS)
      ✓ A high-performance integration and transformation tool used to extract, transform, and load (ETL) data to various destinations.
      ✓ A component of the Microsoft SQL Server database software.
      ❖ Extract-Transform-Load (ETL) Process
      ✓ Extraction: data is collected from various sources such as text files, XML files, and Excel files.
      ✓ Transformation: the collected data is transformed as required before it is loaded into the destination storage.
      ✓ Load: the transformed data is loaded into the destination storage.
  • 30. ❖ SSIS Features
      ✓ Data cleaning
      ✓ Intelligent data handling
      ✓ Data integration
      ✓ Flexible concurrent data processing to multiple varied destinations
      ✓ Visual Studio, SQL Server, and custom code integration
      ❖ SSIS Limitations
      ✓ The limit on data volume/file size depends directly on the memory (RAM) of the system where the SSIS package runs, because SSIS loads the data into memory.
      ✓ SSIS cannot process real-time/live streaming data.
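The extract, transform, and load stages described for SSIS can be illustrated outside SSIS itself. The sketch below is a plain-Python stand-in, not an SSIS package: it reads CSV text, casts and derives columns, and loads the result into an in-memory SQLite table. The `orders` table, its column names, and the function names are hypothetical; the three sample rows reuse values from the order tables earlier in the deck.

```python
import csv
import io
import sqlite3

# Sample source data (values taken from the order tables in this deck).
RAW_CSV = """Order No.,Region,Item,Units,Unit Cost
1,Central,Desk,2,125.00
10,Central,Pencil,67,1.29
26,West,Desk,3,275.00
"""


def extract(source):
    """Extract: read rows from a CSV file-like object as dicts."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: cast types, trim text, and derive the total per order."""
    records = []
    for row in rows:
        units = int(row["Units"])
        cost = float(row["Unit Cost"])
        records.append((
            int(row["Order No."]),
            row["Region"].strip(),
            row["Item"].strip(),
            units,
            cost,
            round(units * cost, 2),
        ))
    return records


def load(records, conn):
    """Load: write the transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_no INTEGER PRIMARY KEY, region TEXT, item TEXT, "
        "units INTEGER, unit_cost REAL, total REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()


def run_etl(source, conn):
    """Run the three stages in order against one source and one destination."""
    load(transform(extract(source)), conn)
```

A call such as `run_etl(io.StringIO(RAW_CSV), sqlite3.connect(":memory:"))` runs the whole pipeline. Like the SSIS limitation noted above, this sketch holds all rows in memory at once, so it is only suitable for small inputs.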
  • 31. Data Wrangling/Preparation with Power BI
  • 33. Data Wrangling/Preparation with Python
  • 34. Demo
  • 35. Data Wrangling/Preparation with Azure ML Studio
  • 37. Demo
  • 38. Data Wrangling/Preparation with Azure Data Factory
  • 40. Data Wrangling/Preparation with Tableau Prep
  • 41. Demo
  • 42. Data Wrangling/Preparation with Trifacta
  • 43. Demo
  • 44. Summary
      ❖ Data wrangling/preparation tools let data scientists focus on data analysis and related activities rather than spending 60% of their time on data preparation
      ❖ Data preparation tools and platforms enable the data engineering team to prepare data for analysis more efficiently, which in turn supplies the data analytics teams with valuable data
      ❖ Data preparation tools and platforms offer advanced features such as mapping of raw unstructured data, live previews of the data, AI features that make suggestions based on the dataset, and an overview of the overall data flow
      ❖ Data wrangling provides automation features that help prepare data without redundant data preparation steps
  • 45. Discussion
  • 46. Thank You!