MetaScale is a subsidiary of
Sears Holdings Corporation
The 3 Ts of Hadoop
Wuheng Luo
Ankur Gupta
06.2013
Near real-time, big data analytics is a reality via a new data pattern that avoids the latency and overhead of legacy ETL: the 3 Ts of Hadoop (Transfer, Transform, and Translate).

Transfer: Once a Hadoop infrastructure is in place, a mandate is needed to immediately and continuously transfer all enterprise data, from external and internal sources and through different existing systems, into Hadoop. Previously, enterprise data was isolated, disconnected and monolithically segmented. Through this T, various source data are consolidated and centralized in Hadoop almost as they are generated, in near real-time.

Transform: Most of the enterprise data, when flowing into Hadoop, is transactional in nature. Analytics requires data be transformed from record-based OLTP form to column-based OLAP. This T is not the same T as in ETL, as we need to retain the granularity in the data feeds. The key is to transform in-place within Hadoop, without further data movement from Hadoop to other legacy systems.

Translate: We pre-compute or provide on-the-fly views of analytical data, exposed for consumption. We facilitate analysis and reporting, for both scheduled and ad hoc needs, to be interactive with the data for analysts and end users, integrated in and on top of Hadoop.
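The difference between a pre-computed view and an on-the-fly view, as described under Translate, can be illustrated with a minimal sketch. It is written in plain Python purely for illustration; in practice this logic would run with tools on top of Hadoop. The `SALES` records, their field names, and both function names are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical analytical records; the field names are illustrative only.
SALES = [
    {"store": "A", "day": "2013-06-01", "amount": 120.0},
    {"store": "A", "day": "2013-06-02", "amount": 80.0},
    {"store": "B", "day": "2013-06-01", "amount": 200.0},
]


def precompute_totals_by_store(rows):
    """Build the view once (for example on a schedule) so later reads are cheap."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


def query_on_the_fly(rows, predicate, field):
    """Scan the data at request time to answer an ad hoc question."""
    return sum(row[field] for row in rows if predicate(row))


# Scheduled need: the totals are computed ahead of time and simply looked up.
STORE_TOTALS = precompute_totals_by_store(SALES)

# Ad hoc need: the filter is only known when the analyst asks the question.
JUNE_FIRST_TOTAL = query_on_the_fly(
    SALES, lambda row: row["day"] == "2013-06-01", "amount"
)
```

The pre-computed view trades freshness for read speed, while the on-the-fly query pays the scan cost on every request in exchange for flexibility, which is why the abstract calls for both.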

Published in: Technology, Business

  1. MetaScale is a subsidiary of Sears Holdings Corporation. The 3 Ts of Hadoop. Wuheng Luo, Ankur Gupta. 06.2013
  2. The 3 Ts of Hadoop: 3-Stage Circular Process of Enterprise Big Data
  3. What are the 3Ts? 3Ts = Transfer, Transform, and Translate. A new enterprise big data pattern:
     • To bring disruptive change to conventional ETL
     • To leverage Hadoop for streamlining data processes
     • To move toward real-time analytics
  4. The 3Ts Goal: To simplify enterprise data processing and reduce the latency of turning enterprise data from raw form into products of discovery, so as to better support business decisions.
  5. The 3Ts One Liners
     • Transfer: Once the Hadoop system is in place, a mandate is needed to immediately and continuously capture and deliver all enterprise data, from all data sources, through all data systems, to Hadoop, and store the data under HDFS.
     • Transform: When source data is in, clean, standardize, and convert the data through dimensional modeling. Data transformation should be performed in-place within Hadoop, without moving the data out again for integration reasons.
     • Translate: Finish the data flow cycle by turning analytical data aggregated in Hadoop into data products of business wisdom. Use batch and streaming tools built on top of Hadoop to interact with data scientists and end users.
  6. Hadoop as Enterprise Data Hub. “Data Hub” is not a new concept, but:
     Conventional Data Hub | Hadoop Enterprise Data Hub
     RDBMS or EDW based | Hadoop ecosystem based
     No consistent architectural style: ODS, MDM, messaging or publish-subscribe, etc. | 3-phased architecture to cover full enterprise data flow cycle from data source to data products
     Heavily rely on ETL | 3Ts-driven
     Intermediate, partial, siloed | True center of enterprise data
     … | …
  7. TRANSFER: Sourcing Data into Hadoop. Intent: Continuously capture all enterprise data at the earliest possible touch points, deliver the data from all sources, through all source data systems, to Hadoop, and store the data under HDFS.
  8. TRANSFER: Motivation. To gain a distinctive competitive capability, enterprises need to build an integrated data infrastructure as the foundation for big data analytics. Use Hadoop as THE centralized enterprise data repository, and make it the grand destination for all enterprise source data.
  9. TRANSFER: (3 Ts’) Transfer vs. (ETL’s) Extract
     Traditional ETL - Extract | Hadoop - Transfer
     Bottom-up | Top-down
     Task/project specific | Enterprise-wide mandate
     Passive | Proactive
     Data is not available when needed | Data is ready when needed
     Same datasets are moved around again and again, with no value added | Move the data once, and use it many times, each time with value increased
  10. TRANSFER: Consequences
      Before | After
      Isolated, disconnected in various siloed data/file systems | Consolidated and centralized in Hadoop
      Monolithically segmented | Heterogeneous, diverse, huge
      Separated and partial | Federated and holistic
  11. TRANSFER: Implementation
      • Always do a data gap analysis first
      • Fork the ingestion in both batch and streaming if needed
      • Have a delivery plan for the data feed
      • Synchronize data changes between source system and Hadoop
  12. TRANSFORM: Integrating Data within Hadoop. Intent: Keep the data flowing beyond the ingest phase by transforming the data from dirty to clean, from raw to standardized, and from transactional to analytical, all within Hadoop.
  13. TRANSFORM: Motivation. As the latency or speed from raw data to business insight becomes the focal point of enterprise data analytics, use Hadoop as the data integration platform to perform in-place data transformation.
  14. TRANSFORM: Implementation
      • Partition enterprise-wide standardized data and job-specific analytical data in HDFS, and retain history.
      • Use dimensional modeling to transform and standardize; make dimensional data the atomic unit of enterprise data.
      • Identify all enterprise data entities, and add finest grain attributes to each entity as dimensional data.
      • Take a bottom-up approach, and think about data usage across the enterprise rather than for one specific task.
  15. TRANSFORM: (3 Ts’) Transform vs. (ETL’s) Transform
      Transform in ETL / ELT | Transform in 3 Ts
      in vitro, outside Hadoop | in vivo, within Hadoop
      Use Hadoop as rental space | Use Hadoop as integration platform
      Non-value adding data movement in between data storage and transformation | Data is transformed while flowing from one partition to another under HDFS
      High latency | Low latency
      Network bottleneck | Data locality
  16. TRANSLATE: Making Data Products out of Hadoop. Intent: Turn analytical data into data products of business wisdom using home-made or commercial analytics tools built on top of Hadoop. Business decisions supported by data products will help generate more new data, and thus start a new round of the enterprise data flow cycle…
  17. TRANSLATE: Motivation
      • Low-latency big data analytics requires the right platform and tools
      • Use Hadoop as the platform of choice for enterprise data analytics because of its openness and flexibility
      • Choose analytical tools that are flexible, agile, interactive and user friendly
  18. TRANSLATE: Implementation
      • Big data analytics takes a team effort
      • Include statisticians, data scientists and developers
      • Utilize both generic and Hadoop specific technologies
      • Consider both batch and streaming based approaches
      • Provide access to pre-computed views and on-the-fly queries
      • Use both home-made and Hadoop-based commercial tools
      • Use web-based, mobile friendly UI
      • Visualize
  19. The 3 Ts of Hadoop: Continuous Iteration of Enterprise Data Flow
  20. Thank You! For further information, email: contact@metascale.com, visit: www.metascale.com. MetaScale is a subsidiary of Sears Holdings Corporation
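The Transform guidance in the slides above (clean, standardize, and convert through dimensional modeling while retaining the finest grain) can be sketched roughly as follows. This is a plain Python illustration under stated assumptions, not the presenters' implementation: in a real deployment the logic would run as jobs inside Hadoop, and the `RAW_ORDERS` records, field names, and helper functions are all hypothetical.

```python
# Hypothetical raw transactional records as they might land in Hadoop.
RAW_ORDERS = [
    {"order_id": 1, "customer": " alice ", "product": "Drill", "qty": "2", "price": "49.50"},
    {"order_id": 2, "customer": "BOB", "product": "drill", "qty": "1", "price": "49.50"},
    {"order_id": 3, "customer": "Alice", "product": "Saw", "qty": "3", "price": "15.00"},
]


def standardize(record):
    """Clean one raw record: trim and normalize text, cast numeric fields."""
    return {
        "order_id": record["order_id"],
        "customer": record["customer"].strip().title(),
        "product": record["product"].strip().title(),
        "qty": int(record["qty"]),
        "price": float(record["price"]),
    }


def surrogate_key(dimension, value):
    """Return the key for value, adding it to the dimension if it is new."""
    if value not in dimension:
        dimension[value] = len(dimension) + 1
    return dimension[value]


def to_star_schema(raw_records):
    """Split records into two dimensions and one fact table at order grain."""
    customers, products, facts = {}, {}, []
    for record in map(standardize, raw_records):
        facts.append({
            "order_id": record["order_id"],
            "customer_key": surrogate_key(customers, record["customer"]),
            "product_key": surrogate_key(products, record["product"]),
            "qty": record["qty"],
            "amount": record["qty"] * record["price"],
        })
    return customers, products, facts


CUSTOMERS, PRODUCTS, FACTS = to_star_schema(RAW_ORDERS)
```

Each raw order still maps to exactly one fact row, so the granularity of the feed is retained, while the inconsistent spellings of customers and products collapse into shared dimension entries.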