Tera stream ETL
- 2. © 2012 DataStreams Corp. All Rights Reserved.
Data Integration as Data Infrastructure
TeraStream™ for Data Integration
Case Studies
Appendix
Q & A
content
- 4. © 2012 DataStreams Corp. All Rights Reserved.
Data Integration Landscape: Business Challenges
Inaccurate data leads to bad or no decisions
More than 30% of IT budgets typically spent on Data integration
Inconsistent enterprise and application architecture for integration
Factors
Impact
Result
Disparate data
Inaccurate data
Incomplete data
Untimely data
Fragmented
Integration Approach
Multiple versions of the
“Truth”
Wasted time and
resources aggregating
information
Difficult to use Data
Delayed Decision making
Uninformed management
Bad decisions
Lost revenue
Lost productivity
Lost market opportunity
Bad Citizen relationships
Data integration consumes more than 30 percent of corporate
IT budgets, so data integrity deserves
particular emphasis.
- 5. © 2012 DataStreams Corp. All Rights Reserved.
Data Integration
Deliver
Real time
Changed Data
capture
DI Solutions
Near Real Time
Data Processing
Enterprise Data Warehouse
ETL
ETL
Source System Integrated ODS/DW
ODS Model
(1:1)
DW Model
(ER)
Report Mart
Multidimensional
Mart
Summary Table
Data Governance Architecture
Meta Data, Data Quality, Impact Analysis
Master Data
Management
Analyze
Application and
Data
Assure
High Quality
Manage
Metadata
DQ Solutions
Complete Enterprise Data Management Suite
DataStreams solution suite enables complex data integration projects with minimal
implementation effort while producing high-quality Business Intelligence output.
System Architecture
Operating
DB
- 6. © 2012 DataStreams Corp. All Rights Reserved.
Company ETL
Real Time
Data
Integration
Changed
data
extraction
High
Speed
Sorting
Enterprise
Meta Data
Mgt.
Data
Quality
Impact
Analysis
Master
Data Mgt.
Integrated
repository
Domestic
DataStreams
GTONE
WISE
EnCore
BTL
Global
Informatica
IBM
SAP
Oracle
SAS
Possession of Key Technology
* Possession * Processing * Not yet
- 8. © 2012 DataStreams Corp. All Rights Reserved.
TeraStream™ for Data Integration
TeraStream™ is a high-performance ETL solution with a user-friendly GUI, proven reliable in a
variety of enterprises for over a decade.
TeraStream™
Performance Experience User Friendly High Value
Powerful
Performance
(TeraSort™)
High-speed
extraction (FACT™)
Reuse of data (EBH)
Over 200 customers
Serving multiple
industries including
banking, government,
and retail
Over a decade of
experience
Intuitive GUI
Easy to operate
Easy to maintain
Fast implementation
Easy customization
Low resource use
- 9. © 2012 DataStreams Corp. All Rights Reserved.
TeraStream™ Approach
Transports a variety of data types and formats from source to target as needed.
Covers enterprise-wide data flow from operational systems to subject data marts.
Also applies to high-volume batch processing and near real-time data integration.
Loading
Files
New Systems
Files
Databases
Databases
Extraction
Transform / Cleansing
Conversion Reformat
Sort
Join
Aggregation
LOAD: Automatically generated scripts
can be used for loading into
various DBMSs
EXTRACT: High-speed data extraction from
various commercial DBMSs
TRANSFORM: High-performance SORT
engine resolves the time bottleneck
of transforming large data volumes
- 10. © 2012 DataStreams Corp. All Rights Reserved.
TeraStream™ ran three times faster than its competitor while using 30% of the CPU
resources, thanks to its SORT engine (data migration at Shinhan Bank, Korea).
Excellent performance using novel method
thread MAX for sort =3
File manipulation : 35% CPU usage
Load : 80% of peak CPU usage
Parallel = 4
File manipulation : 58% of CPU usage.
Load: 58% of peak CPU usage
Elapsed time: 20 minutes
Wasted system resource: 800
(40% avg. CPU usage × 20 min)
Conclusion
Elapsed time: 59 minutes
Wasted system resource: 3000
(50% avg. CPU usage × 60 min)
Conclusion
FILE → FILE FILE → DB
TeraStream™
FILE → DB DB → DB
IBM DataStage
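The "wasted system resource" figures on this slide follow from multiplying average CPU usage by elapsed minutes. The short sketch below reproduces that arithmetic; it assumes the 20-minute run is TeraStream™ and the 59-minute run is the competitor, as the slide headline implies, and the function name is illustrative only.

```python
def resource_cost(avg_cpu_percent: float, minutes: float) -> float:
    """Slide metric: average CPU usage (%) multiplied by elapsed minutes."""
    return avg_cpu_percent * minutes


# Assumption: the 20-minute run belongs to TeraStream, the 59-minute run to the competitor.
terastream_cost = resource_cost(40, 20)
# The slide rounds 59 minutes up to 60 for this calculation.
competitor_cost = resource_cost(50, 60)

cost_ratio = terastream_cost / competitor_cost   # share of the competitor's resource cost
time_ratio = 59 / 20                             # how many times longer the competitor ran

print(f"TeraStream cost: {terastream_cost:.0f}, competitor cost: {competitor_cost:.0f}")
print(f"cost ratio: {cost_ratio:.0%}, elapsed-time ratio: {time_ratio:.2f}x")
```

The resulting ratios are in line with the headline claim of roughly three times the speed at about 30% of the CPU resources.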
- 11. © 2012 DataStreams Corp. All Rights Reserved.
Superior performance in NRT Implementation
Transports up to 1 million records per minute by reading flat files through EAI, splitting them
per table, and eliminating duplicated business days before loading to Sybase IQ.
[Chart: processing time (minutes, 0 to 70) against volume (thousand records: 100; 1,000; 5,000; 10,000; 20,000); labeled series: IBM DataStage; annotation: 3 X]
[Shinhan Bank DW benchmark, August 2006]
See Appendix 2 for additional NRT performance information.
At 10 million records, expect more than a threefold
performance improvement.
- 12. © 2012 DataStreams Corp. All Rights Reserved.
TeraStream™’s excellent performance applies not only to ETL but also to daily batch jobs.
[Batch Job of POST Insurance Service Company, 2007]
No. of Records    Oracle (SQL)    TeraStream
400,000           1m 32s          28s
1,000,000         5m 01s          41s
2,500,000         12m 21s         59s
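To make the batch comparison easier to read, the sketch below converts the durations in the table to seconds and computes how many times faster the TeraStream run was at each volume. The helper name and data layout are illustrative; only the figures come from the slide.

```python
def to_seconds(text: str) -> int:
    """Parse a duration such as '1m 32s' or '28s' into seconds."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += int(part[:-1]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognized duration part: {part!r}")
    return total


# (number of records, Oracle SQL time, TeraStream time) as listed on the slide
rows = [
    (400_000, "1m 32s", "28s"),
    (1_000_000, "5m 01s", "41s"),
    (2_500_000, "12m 21s", "59s"),
]

speedups = {n: to_seconds(oracle) / to_seconds(tera) for n, oracle, tera in rows}

for records, factor in speedups.items():
    print(f"{records:>9,} records: {factor:.1f}x faster")
```

The gap widens with volume, which is consistent with the slide's point that the benefit grows for larger batch jobs.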
[Chart: Oracle time by number of records]
Exceptional Performance in Batch Jobs
250,000~500,000
Tth
High
Performance
Effective
use of
resources
Convenience
- 13. © 2012 DataStreams Corp. All Rights Reserved.
Over 56% improvement in ETL performance
Using EBH, TeraStream™ can shorten the
data path from Legacy to DATA MART,
saving ETL time and resource
usage.
Massive volumes of files extracted
from Legacy Systems are stored in
EBH for reuse in the next step.
ETL time is reduced by 56% on average
(at LG Telecom, from D-3 to D-1).
EDW Server
IBM p690
NCR 10Node
Teradata
D-1
Oracle 8i
ETL Server
ODS
Customer/Call/
Billing
Connection
PPS/BSS
Mining Input Variable
MOLAP Analysis
Mining Analysis
Campaign Analysis
Sybase IQ/ASE
OLAP
MART Server
CSM/AR
Billing
Oracle 8.0.6
CCS/MPS/ERP
CTI /PPS/NMS
SRDF
Legacy
ETL
EBH
Informatica
EBH (ETL and Batch Hub) stores temporary
and result files, which are shared for further
table generation in EDW and DATA MART.
- 14. © 2012 DataStreams Corp. All Rights Reserved.
Over 20 times faster extraction than SQL
High-speed extraction from commercial databases using SQL is supported.
Extraction queries are generated automatically, for example:
Select * from table
• High speed extraction engine(FACT™)
with optimized database API.
• DBMS Supported :
- Oracle
- Informix
- DB2 / UDB
- Sybase IQ /ASE
- Teradata
- Greenplum
- MSSQL /MySQL
- Altibase
• File splitting and filtering during extraction
• Time, timestamp, and user-defined data format
specification
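The idea of splitting and filtering while extracting can be illustrated with a generic sketch. This is not FACT™ or its API: it uses Python's standard `sqlite3` and `csv` modules as stand-ins, and `extract_split`, its parameters, and the `calls` table are hypothetical names chosen for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_split(conn, query, out_dir, rows_per_file, keep=lambda row: True, fetch_size=1000):
    """Stream query results into numbered CSV files, filtering rows on the way."""
    out_dir = Path(out_dir)
    cursor = conn.execute(query)
    files, writer, handle, written = [], None, None, 0
    while True:
        batch = cursor.fetchmany(fetch_size)   # read in batches, never the whole table
        if not batch:
            break
        for row in batch:
            if not keep(row):                  # filter during extraction
                continue
            if writer is None or written == rows_per_file:
                if handle:
                    handle.close()
                path = out_dir / f"part_{len(files):04d}.csv"
                handle = path.open("w", newline="")
                writer = csv.writer(handle)
                files.append(path)
                written = 0
            writer.writerow(row)
            written += 1
    if handle:
        handle.close()
    return files


# Hypothetical demo table standing in for a source DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [(i, "KR" if i % 2 else "DE") for i in range(10)],
)

with tempfile.TemporaryDirectory() as tmp:
    parts = extract_split(
        conn, "SELECT * FROM calls ORDER BY id", tmp,
        rows_per_file=2, keep=lambda row: row[1] == "KR",
    )
    part_sizes = [len(p.read_text().splitlines()) for p in parts]

print(part_sizes)
```

Splitting at extraction time means downstream sort or load steps can work on the parts in parallel without a separate split pass.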
- 15. © 2012 DataStreams Corp. All Rights Reserved.
Intuitive User Interface
Supports data integration activities (development, execution, monitoring, validation) in an integrated
GUI environment
Intuitive task flow
Project monitor
Editor window
GUI for developers
Intuitive task flow
checking standard
output/error/file information/
number of files processed
Execution log
real time job monitoring
Project Monitor
scheduling by time/ period/
business calendar
Scheduler
Mapping creation
Editor window
Scheduler
Task block execution log
Metadata property
Impact analysis
Change history manager
Metadata Repository
- 16. © 2012 DataStreams Corp. All Rights Reserved.
Work with best of breed DBMS providers
Powerful connection between different DBMS types.
Both DB-to-DB and File-to-DB data transportation are supported.
• N:N mapping
• Conversion during
transport
• Click to choose record
processing types :
(Insert/delete/update/insert-
update/delete-insert)
• DBMS types : Oracle, DB2,
Sybase, Informix, Teradata,
Greenplum, MSSQL, MySQL,
(Altibase, Tibero)
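The record processing types listed above (insert, delete, update, insert-update, delete-insert) can be sketched against any SQL database. The example below uses `sqlite3` purely for illustration; the `apply_rows` function, the `target` table, and the exact semantics chosen for insert-update (try an insert, fall back to an update on a key conflict) are assumptions, not TeraStream behavior.

```python
import sqlite3


def apply_rows(conn, rows, mode):
    """Apply (id, name) rows to table `target` using one record-processing type."""
    for row_id, name in rows:
        if mode == "insert":
            conn.execute("INSERT INTO target VALUES (?, ?)", (row_id, name))
        elif mode == "update":
            conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, row_id))
        elif mode == "delete":
            conn.execute("DELETE FROM target WHERE id = ?", (row_id,))
        elif mode == "insert-update":
            # Assumed semantics: insert first, update when the key already exists.
            try:
                conn.execute("INSERT INTO target VALUES (?, ?)", (row_id, name))
            except sqlite3.IntegrityError:
                conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, row_id))
        elif mode == "delete-insert":
            # Remove any existing row for the key, then insert the new version.
            conn.execute("DELETE FROM target WHERE id = ?", (row_id,))
            conn.execute("INSERT INTO target VALUES (?, ?)", (row_id, name))
        else:
            raise ValueError(f"unknown mode: {mode}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

apply_rows(conn, [(1, "a"), (2, "b")], "insert")
apply_rows(conn, [(2, "B"), (3, "c")], "insert-update")   # 2 is updated, 3 is inserted
apply_rows(conn, [(1, "A")], "delete-insert")             # 1 is replaced
apply_rows(conn, [(3, None)], "delete")                   # 3 is removed

result = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(result)
```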
Transformation Logic
Source Table
Target Table
- 17. © 2012 DataStreams Corp. All Rights Reserved.
Easy Data Conversion
Map source to target to convert formats, types, character sets, dates, bytes/bits, and
encryption.
• Easy data conversion using mapping
window of “converter task block”
• Data character set conversion including
EBCDIC to ASCII
• Data conversion from NDB(Unisys 9-bit) or
HDB(IBM) data type to RDB
• 300 built-in functions
• DATE, Time Stamp Conversion between
different date formats
• CLOB/BLOB supported
• Users can add more functions as needed
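As a small illustration of the EBCDIC-to-ASCII conversion mentioned above, the sketch below uses Python's built-in `cp037` codec (one common EBCDIC code page). The choice of code page is an assumption; real mainframe extracts, especially those containing Korean text, may need a different one, and the function name is hypothetical.

```python
def ebcdic_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Decode EBCDIC bytes with the given code page and re-encode them as ASCII."""
    return data.decode(codepage).encode("ascii")


# "ABC 123" in EBCDIC (cp037): letters start at 0xC1, digits at 0xF0, space is 0x40.
ebcdic_sample = bytes([0xC1, 0xC2, 0xC3, 0x40, 0xF1, 0xF2, 0xF3])
converted = ebcdic_to_ascii(ebcdic_sample)
print(converted)
```

Packed-decimal and binary fields cannot be converted this way; they need field-level handling driven by the record layout.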
List of provided functions
CALLED_NO function editor
=addday(cdate(“",”",” (N)")
addday(cdate("2005/05/12 12:08:24","YYYY/MM/DD HH:MI:SS"),2)
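The function-editor example combines a date parser with a day offset. A rough Python equivalent is sketched below; it is an illustration only, it assumes the intended format string is `YYYY/MM/DD HH:MI:SS`, and the token mapping is a guess at how such format tokens might be interpreted rather than TeraStream's actual rules.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from the slide's format tokens to strptime directives.
_TOKENS = [("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"),
           ("HH", "%H"), ("MI", "%M"), ("SS", "%S")]


def cdate(text: str, fmt: str) -> datetime:
    """Parse text using a token-style format such as 'YYYY/MM/DD HH:MI:SS'."""
    for token, directive in _TOKENS:
        fmt = fmt.replace(token, directive)
    return datetime.strptime(text, fmt)


def addday(value: datetime, days: int) -> datetime:
    """Return the date shifted by the given number of days."""
    return value + timedelta(days=days)


result = addday(cdate("2005/05/12 12:08:24", "YYYY/MM/DD HH:MI:SS"), 2)
print(result.isoformat(sep=" "))
```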
Converter task block
- 18. © 2012 DataStreams Corp. All Rights Reserved.
Easy Data Transport
TeraStream uses various transport methods depending on file structure, transport
distance, security requirements, record volume, and so on.
• File to DB data load for bulk data
• “Load task block” generates
load scripts automatically.
• Remote transportation using
FTP
• Encryption during transport
• Both near real-time and bulk
transport are possible
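Automatic load-script generation can be pictured as templating a bulk-loader control file from table metadata. The sketch below emits an Oracle SQL*Loader style control file; it is not the output of the "Load task block", and the function name, table, columns, and file path are all hypothetical.

```python
def make_sqlldr_control(table, columns, data_file, delimiter=","):
    """Build the text of a SQL*Loader control file for a delimited flat file."""
    column_list = ",\n  ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{data_file}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        f"FIELDS TERMINATED BY '{delimiter}' OPTIONALLY ENCLOSED BY '\"'\n"
        "TRAILING NULLCOLS\n"
        f"(\n  {column_list}\n)\n"
    )


# Hypothetical target table and extract file.
script = make_sqlldr_control(
    table="billing",
    columns=["customer_id", "bill_date", "amount"],
    data_file="/data/extract/billing.csv",
)
print(script)
```

Other DBMSs would need their own templates (for example a different bulk-load utility), which is presumably why generating them from metadata saves manual effort.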
Load Scripts
- 19. © 2012 DataStreams Corp. All Rights Reserved.
Up to 40% cost Savings
The higher the job complexity, the greater the savings in development cost.
(Courtesy of Hanhwa Insurance Co. and SKC&C
in 2007)
Job complexity          No. of recs   Input size (GB)   TeraStream™   In-house coding   Speed-up
1:1 mapping             90            22                30 min        2 hours           75%
1:N mapping             900           21                2 hours       6 hours           66%
N:1 mapping             1700          15                2 hours       10 hours          80%
N:N mapping, complex    1300          8                 2 hours       20 hours          90%
Avg. 70% of development speed-up
90% speed-up for more complex jobs
Overhead from modification, test and
preliminary data checking.
Phase (duration)           TeraStream™   In-house coding (estimated)
Development (4 months)     24 M/M        40 M/M
Test (4 months)            48 M/M        80 M/M
Stabilization (1 month)    54 M/M        90 M/M
40% reduction
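The speed-up percentages and the 40% effort reduction on this slide can be checked with simple arithmetic. The sketch below treats "speed-up" as the fraction of development time saved, which matches the slide's per-job figures; the variable names are illustrative.

```python
def reduction(with_tool: float, without_tool: float) -> float:
    """Fraction of time or effort saved relative to the in-house baseline."""
    return 1 - with_tool / without_tool


# Development time in hours per job type: (TeraStream, in-house coding), from the slide.
jobs = {
    "1:1 mapping": (0.5, 2),
    "1:N mapping": (2, 6),
    "N:1 mapping": (2, 10),
    "N:N mapping, complex": (2, 20),
}
job_savings = {name: reduction(tool, inhouse) for name, (tool, inhouse) in jobs.items()}

# Project effort in man-months at the end of stabilization: 54 versus 90 (estimated).
effort_saving = reduction(54, 90)

for name, saving in job_savings.items():
    print(f"{name}: {saving:.1%} less development time")
print(f"project effort: {effort_saving:.0%} fewer man-months")
```

The 1:N figure comes out at two thirds, which the slide appears to truncate to 66%.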
- 21. © 2012 DataStreams Corp. All Rights Reserved.
System Configuration
Issues
Plans
Kookmin Bank
IBM M/F
HDB, DB2
Server RDB
Sybase ASIQ 12.7
IMS HDB
- Seg. split
- conversion & Array split
- logic applied
- conversion
- logic applied
- Logic applied
EDW
ETL
ETL
ETL
Informover
TS(FACT)
Informover
Source system
File process flow DB QUERY
Expected
Result
Various DBMS(IMS HDB, HOST DB2, Oracle, DB2 UDB) integration by using
TeraStream™
Meeting batch target time of 2 hours and 30 minutes for 4TB of EBCDIC data.
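To give a feel for what these batch windows imply, the sketch below converts a data volume and a time window into a sustained throughput requirement. It assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify, and the function name is illustrative.

```python
def required_mb_per_second(total_bytes: float, window_seconds: float) -> float:
    """Sustained throughput (MB/s, decimal) needed to finish within the window."""
    return total_bytes / window_seconds / 1e6


TB = 1e12  # assumption: decimal terabytes
GB = 1e9   # assumption: decimal gigabytes

# 4 TB of EBCDIC data within 2 hours 30 minutes.
batch_rate = required_mb_per_second(4 * TB, 2.5 * 3600)
# About 200 GB of daily changed data within 1 hour 30 minutes.
daily_change_rate = required_mb_per_second(200 * GB, 1.5 * 3600)

print(f"batch window: {batch_rate:.0f} MB/s sustained")
print(f"daily changed data: {daily_change_rate:.0f} MB/s sustained")
```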
• M/F and IMS HDB conversion
• Processing changed data in absence of time-series
column
• Processing large size data within batch process
time(10TB/day based on source data)
• How to process high volume files in parallel
• Converting mainframe data into data for a Unix
environment (10TB → 25TB) within 18 hours.
• Various data conversion and processing including
Korean character conversion
• ETL task from the accounting system server to the new ODW
server (extracting approx. 200 GB of daily changed data
within 1 hour and 30 minutes using the FACT module of
TeraStream™)
• ETL and Batch process in unified way.
• Batch job in core banking system within 6 hours.
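One of the issues above is detecting changed data when the source has no time-series (timestamp) column. The slides do not say how this was solved; a common general approach, sketched here as an assumption, is to compare the current extract against the previous snapshot using per-row digests keyed by the primary key.

```python
import hashlib


def row_digest(row) -> str:
    """Stable digest of a row's values, used to detect content changes."""
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current, key_index=0):
    """Return (inserted, updated, deleted) keys between two full extracts."""
    prev = {row[key_index]: row_digest(row) for row in previous}
    curr = {row[key_index]: row_digest(row) for row in current}
    inserted = sorted(k for k in curr if k not in prev)
    deleted = sorted(k for k in prev if k not in curr)
    updated = sorted(k for k in curr if k in prev and curr[k] != prev[k])
    return inserted, updated, deleted


# Hypothetical snapshots of (account_id, status) rows on two consecutive days.
yesterday = [(1, "open"), (2, "open"), (3, "closed")]
today = [(1, "open"), (2, "frozen"), (4, "open")]

inserted, updated, deleted = diff_snapshots(yesterday, today)
print(inserted, updated, deleted)
```

For very large tables the same comparison is usually done by sorting both extracts on the key and merging them, rather than holding both digests in memory.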
EDW and integrated DM installation
A-SOR DM
- 22. © 2012 DataStreams Corp. All Rights Reserved.
E-Voucher Statistical DW
Operational
Health and Welfare Department’s e-Voucher
E-Voucher DW Performance Improvement
Statistics reporting time is dramatically reduced from 1~6 days to a few seconds or minutes.
The statistics reporting process is simpler, and reports are easier to obtain.
Consistent data delivery increases data reliability.
• daily transportation to ODS
• build ODS, DW and DM for better table model
• e-Voucher System (DB2 -> DW Server)
• Platform
- OS : AIX 5.3(ASIS,TOBE )
- CPU : Power5, 2.1GHz, 6 cores, IBM P-series
- MEM : 12 GB
- H/W : 1TB
• Simple logic made MA easy
• Low data integrity
• Lack of expeditious response
• Fraud detection was hard.
• Low reliability of statistical data caused disputes
between data users and data producers
Plans
Issues System Configuration
- ODS data conversion
- update/insert at ODS
- 1:1 mapping
- Daily batch
- Load to ODS
IBM P-series
Voucher Service
Mis-settlement
Pregnancy & Birth
History
Target DB
(oracle)
FACT
ODS DM
DW
ETL
- ODS/ DW data manipulation
- update/insert to data mart
ETL
ETL
Expected
Result
Source DB
(oracle)
- 23. © 2012 DataStreams Corp. All Rights Reserved.
Daishin Securities
Daishin Securities next-generation system build
• Process transactions via data extraction and
transformation.
• Build preambles using transformed data.
• Bulk file processing (e.g. ASCII)
• Enable execution of modules in different languages
via shell.
• TeraStream Use Case
1. Non-periodic ETL or file processing routine.
Cybos UI -> TeraStream
Cybos UI generates a preamble or a report file.
2. daily/weekly/monthly/quarterly/yearly data batch
and non-periodic data processing routine
- Linkage between Control-M and TeraStream
- TeraStream extracts data from core-banking
- Data are transformed and loaded back to the
system.
• Bulk file operations required for file types such
as ASCII
• Modules in different languages to be executed via
shell.
Channel
(Service)
Channel
(External)
Core-Banking
(Business Data)
Business System
Cybos
Terminal
IE
CB+
FEP
X-MINS
FIX
Oracle
CORE DB
AIX
Control-M
Scheduler
Business Support AP
Batch AP
Websphere
NEFSS
HIS
(Web
Server)
TR(Online)
Unix
Shell
TeraStream
OTIS
Oracle
CORE DB
AIX
Oracle
CORE DB
AIX
1. Cybos ->
TeraStream
3. Control-M ->
TeraStream->
OTIS
2. Control-M ->
TeraStream
Services to ensure speed and reliability
Standardized linkage with other systems
24 × 365 system operation, with faster issue resolution and
easier maintenance when building and operating the system
Expected
Result
Plans
Issues System Configuration
- 24. © 2012 DataStreams Corp. All Rights Reserved.
Samsung Electronics
• Real-time data transportation between Germany and
China.
• Bi-directional synchronization between the TeraStream instances in
Germany and China.
• A maximum loading time of 20 minutes for transported data is
implemented using TeraStream NRT.
• Web monitoring is developed.
• A registration in one country should receive the same
service in the other country.
• Duplicate records caused by cross
transportation should be avoided.
• 20-minute near real-time window
• A complete recovery scheme should be provided.
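The duplicate-record issue in bi-directional transport can be illustrated with a small sketch. The slides do not describe the actual mechanism; the version below assumes each record carries an origin site tag and a version number, ships only locally originated rows, and applies received rows as upserts. All names and fields are hypothetical.

```python
def extract_changes(local_site, rows):
    """NRT extract: ship only rows created at this site, so received rows never bounce back."""
    return [row for row in rows if row["origin"] == local_site]


def apply_changes(table, changes):
    """Upsert by key: insert unseen keys, overwrite only with a newer version."""
    for row in changes:
        current = table.get(row["key"])
        if current is None or row["version"] > current["version"]:
            table[row["key"]] = dict(row)


# Hypothetical registrations made independently in each country.
germany = {"u1": {"key": "u1", "origin": "DE", "version": 1, "plan": "basic"}}
china = {"u2": {"key": "u2", "origin": "CN", "version": 1, "plan": "premium"}}

# Run two synchronization cycles in both directions.
for _ in range(2):
    apply_changes(china, extract_changes("DE", list(germany.values())))
    apply_changes(germany, extract_changes("CN", list(china.values())))

print(sorted(germany), sorted(china))
```

Repeating the cycle leaves both sides unchanged, which is the property needed to avoid duplicates from cross transportation.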
Plans
Issues System Configuration
Smart Phone
System in
Germany
DBs in
Service
Efficiency is maintained despite cross transportation
Bi-directional NRT integration allows the same service regardless of system type
and country (Time from extraction to loading is 20 minutes.)
Bi-directional remote data transportation using TeraStream
NRT Extract
Execution results such as program success or failure
Web Monitoring
SAM To DB
UPSERT
NRT Extract
SAM To DB
UPSERT
Global Database Integration using NRT ETL
DBs in
Service
Smart Phone System in China
Expected
Result
- 25. © 2012 DataStreams Corp. All Rights Reserved.
LG Telecom
• Solution provided by ‘I’ company requires more than
twelve hours for processing every billing and call data.
• It delays entire processes and often requires re-
processing of data.
• Efficient unique key generation for entire business tasks
• Transition from old to new billing system
- Data size: 3TB → 3.5TB; objective: transition within
30 minutes
• Move unchanged data among large dataset three
days prior to the new system open date.
• Separate files that will be loaded to EDW and DM
and load them in different business tables.
• Unique key generation for entire business process
is done first.
Legacy ODS Server
SRDFAR
Billing
MPS
ERP
PPS
NMS
CCS CTI
DM Server
EDW Server
IBM P Series
Sybase ASIQ
ODS
TeraStream loads data transformed
in ODS to EDW and DM at the same time.
ETL
ETL
CSM NCR 10Node
Customer
Billing
Call
Data
Contacts PPS/
BSS
Teradata
OLAP
Mart
D+1
Oracle Oracle/
Informatica
Campaign
Analysis
Mining
Input
Variables
MOLAP
Analysis
Mining
Analysis
LG Telecom new billing system data transfer
Working time shortened from D+3 to D+1 while reducing the system load
Working hours reduced by 56% on average
Emergency response system minimizes the impact of rework caused by delays in securing and providing
data
Expected
Result
Plans
Issues System Configuration
- 27. © 2012 DataStreams Corp. All Rights Reserved.
Real Time Change Data Capture_DeltaStream
DeltaStream is a real-time CDC (Change Data Capture) solution that automatically detects
data changes from the transaction log and transfers them to a target system.
Features Expected Result
System Architecture
Minimizing the burden on
source system
Minimizing the business
impact
Real-time data Capture
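Log-based change data capture, as described for DeltaStream, boils down to replaying change entries from the source's transaction log against the target. The sketch below shows that replay step only, with an in-memory dictionary as the target; the entry format is hypothetical and does not reflect DeltaStream's actual log format or API.

```python
def apply_change_log(target, log):
    """Replay insert/update/delete entries, in log order, against a keyed target."""
    for entry in log:
        op, key = entry["op"], entry["key"]
        if op in ("insert", "update"):
            target[key] = entry["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Hypothetical change entries captured from a source transaction log.
change_log = [
    {"op": "insert", "key": 101, "row": {"name": "Kim", "balance": 500}},
    {"op": "insert", "key": 102, "row": {"name": "Lee", "balance": 300}},
    {"op": "update", "key": 101, "row": {"name": "Kim", "balance": 450}},
    {"op": "delete", "key": 102},
]

replica = apply_change_log({}, change_log)
print(replica)
```

Because the changes are read from the log rather than by querying tables, the source system carries little extra load, which is the benefit the slide highlights.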
- 28. © 2012 DataStreams Corp. All Rights Reserved.
Metadata Management_MetaStream
MetaStream manages metadata (data that describes data): it extracts and integrates metadata
spread over multiple systems and supports a standardization management
system.
Features Expected Result
System Architecture
Improving efficiency through consistent
metadata management that
prevents metadata
redundancy.
Preventing redundant R&R and
metadata requests through ownership
based on standardization and modeling.
Saving analysis time
- 29. © 2012 DataStreams Corp. All Rights Reserved.
Data Quality Management _QualityStream
QualityStream is a data quality control solution that accesses the target data,
diagnoses it, and analyzes the results. It assesses current data quality by running database
profiling, registers management issues, and analyzes results on a schedule.
Features Expected Result
System Architecture
Support of establishing quality
management system
Six sigma based approach to
generate more accurate statistical
indicators and precisely detect errors.
Efficient data quality control with
the register and management process.
Error rate reduction with error
data maintenance and control plan.
- 30. © 2012 DataStreams Corp. All Rights Reserved.
Application Impact Analysis_ ImpactStream
ImpactStream is an impact analysis tool for application changes. It builds an application
knowledge database to improve understanding and readability. ImpactStream receives the
changed source from a change management tool, automatically analyzes it with a parser engine,
stores it in the repository, and provides impact analysis information through a search screen.
Features Expected Result
System Architecture
• Improving development productivity
and reducing maintenance costs
• IT Application Development /
Maintaining management information
• Integrating efficient enterprise
applications
• Improving control over outsourcing
- 31. © 2012 DataStreams Corp. All Rights Reserved.
Master Data Management_MasterStream
MasterStream is a master data management solution that ensures consistency of master data
within an enterprise. It offers centralized and crossover types to collect, create, verify, and
simultaneously distribute data. Data from legacy systems is integrated and verified against business
rules before application systems reference it, then synchronized and monitored.
Main Components Expected Result
System Architecture
Improving efficiency in the workplace
by sharing the high quality key information
with enterprise users
Supporting quick decision making with
reliable statistical analysis
Reducing maintenance costs by improving
operating system with integration
- 33. © 2012 DataStreams Corp. All Rights Reserved.
App.1 : Product Configuration
TeraStream™ includes a sort engine and a high-volume data extraction engine (FACT™);
metadata is stored and managed in a DBMS.
• easy to use GUI for developers.
User Interface
• High performance (FACT/CoSORT)
• External command(shell/SortCL)
• Query processing
• Data conversion (Korean/Japanese)
• Function processing
Data Processing
Metadata Management
Operations & Administration
User Interface
Operations & Administration
Data Processing Engine
TeraStream Designer
Metadata Management Engine
TeraStream DB
(Repository)
Log
Manager
Project
Scheduler
FFD
Manager
Process
Manager
Data
Access
Manager
Message
Broker
FACT™
CoSORT™
Converter USQL
External
command User SCL
• Job and system log management
• Job scheduling
• File Format Description for metadata
• Real-time job monitoring
• Authentication Management
• Data format, job & system
information in TSDB (Repository)
Monitor
- 34. © 2012 DataStreams Corp. All Rights Reserved.
App. 2 : Time Table for NRT Implementation
Mapping / processing / loading times
Unit (thousand records)       TeraStream™ start   end        time     D product start   end        time
100                           18:02:39            18:02:55   0:16     15:08:16          15:10:33   00:53
1000                          18:05:25            18:06:23   0:58     15:11:13          15:20:34   03:32
5000                          18:07:20            18:12:02   4:42     15:25:14          15:43:44   15:28
10,000                        18:13:54            18:24:20   10:26    15:47:57          16:23:45   31:09
20,000                        18:29:10            18:49:55   20:45    16:31:40          17:36:10   58:41
10,000 (concurrent execution) 11:35:48            11:50:35   14:47    11:35:48          12:17:10   41:22
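The "time" columns in this table can be turned into speed ratios and throughput figures. The sketch below uses only the listed time values for the five sequential runs (not the start and end timestamps) and assumes they are in minutes:seconds; the helper names are illustrative.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' string such as '10:26' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# (thousand records, TeraStream time, D product time) from the "time" columns
runs = [
    (100, "0:16", "00:53"),
    (1000, "0:58", "03:32"),
    (5000, "4:42", "15:28"),
    (10_000, "10:26", "31:09"),
    (20_000, "20:45", "58:41"),
]

# How many times longer the D product took at each volume.
ratios = {k: to_seconds(d) / to_seconds(t) for k, t, d in runs}

# TeraStream throughput in records per minute at each volume.
throughput = {k: k * 1000 / to_seconds(t) * 60 for k, t, _ in runs}

for k in ratios:
    print(f"{k:>6} thousand records: {ratios[k]:.2f}x, {throughput[k]:,.0f} records/min")
```

At the largest volumes the throughput approaches one million records per minute, in line with the figure quoted on the NRT slide.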
- 35. © 2012 DataStreams Corp. All Rights Reserved.
App. 3 : Performance Improvement Details
Job / Task                       Cycle   System      Before   After   Improvement rate
Billing Sales                    Month   EDW         12:50    5:00    61%
Billing Sales                    Month   OLAP Mart   18:35    8:20    55%
Calls Charges                    Day     EDW         5:50     3:00    49%
Calls Charges                    Day     OLAP Mart   8:00     4:00    50%
ACCUM                            Week    EDW         4:20     1:55    56%
ACCUM                            Week    OLAP Mart   7:20     3:00    60%
Receiving CDR (NMS)              Day     EDW         1:00     0:30    50%
Receiving CDR (NMS)              Day     OLAP Mart   2:20     0:55    61%
Sending CDR (NMS)                Day     EDW         1:40     1:05    35%
ERP batch                        Month   EDW         11:20    3:15    71%
Receiving CDR (NMS)              Month   EDW         5:00     2:15    55%
Receiving CDR (NMS)              Month   OLAP Mart   11:40    2:20    80%
Sending CDR (NMS)                Month   EDW         8:20     4:50    42%
ERP provided BATCH               Month   EDW         16:20    5:15    68%
Customer Service After service   Month   EDW         5:30     5:05    9%
- 36. © 2012 DataStreams Corp. All Rights Reserved.
App. 4 : TeraStream ™ Features & Benefits(1/2)
TeraStream™ meets your enterprise data integration needs and also serves as an
excellent batch job hub.
Sort Engine
Using TeraSort™, TeraStream™ can accelerate sort-related
data manipulation (dedup, average, min, max, join, summary,
etc.)
FAst extraCT
FACT™ performs high speed bulk extraction from various
commercial DBMS.
Automatic Metadata
Generation
TeraStream™ provides direct reading of DBMS data dictionary
to create its own metadata information.
High Speed Lookup
It provides an in-memory lookup function for high-speed
mapping conversion using lookup tables.
Variety of conversion
function calls
It provides more than 100 user friendly mapping functions.
Developers can easily add their own functions.
Pre/Post Processing
TeraStream™ provides inter-record and inter-table conversion
through pre/post mapping.
Major Features Description
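The sort engine entry above lists dedup, average, min, max, and summary as sort-related operations. The sketch below shows why a sort helps: once records are ordered by key, duplicates are adjacent and each group can be summarized in a single pass. It uses Python's built-in sort and `itertools.groupby` as stand-ins and says nothing about how TeraSort™ itself is implemented.

```python
from itertools import groupby
from operator import itemgetter


def dedup_sorted(records):
    """Sort records, then drop exact duplicates, which are now adjacent."""
    unique = []
    for record in sorted(records):
        if not unique or record != unique[-1]:
            unique.append(record)
    return unique


def sorted_summary(records):
    """Sort (key, value) records by key and summarize each group in one pass."""
    ordered = sorted(records, key=itemgetter(0))
    summary = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        summary[key] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
    return summary


# Hypothetical (customer, amount) records with one exact duplicate.
records = [("b", 4), ("a", 1), ("a", 3), ("b", 4)]
unique_records = dedup_sorted(records)
summary = sorted_summary(unique_records)
print(unique_records)
print(summary)
```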
- 37. © 2012 DataStreams Corp. All Rights Reserved.
TeraStream™ has evolved to meet various parallel processing needs and to offer
convenience through highly efficient GUIs.
Inter-node Operation
Remote calls can start projects on other TeraStream™
nodes.
Distributed computing on idle nodes is possible through easy
data transfer.
Near Real-Time ETL
Data transportation every minute is possible, including complex
data mapping.
Efficient GUI
With the GUI, no programming language skills are necessary.
Unified monitoring and control on a single screen, or specialized
monitoring through a web browser, is possible.
Jobs are scheduled in a unified GUI, even for
distributed servers.
Multi Language Support UTF-8 is supported.
App. 4 : TeraStream ™ Features & Benefits(2/2)
Major Features Description