© 2015 Teradata
Douglas Moore, Principal Data Architect, Teradata
Streaming Data Lakes
Do’s and Don’ts from the Field
3
Introduction – About Me
• Principal (big) Data Architect
• Think Big Analytics – 7 years
• Data Lakes, Streaming Analytics, ETL, Strategy
• Before Big Data
– Analytic Data Warehousing
– OLTP
– Electricity
– High End Graphics
– Supercomputers
– Numerical Analysis
@douglas_ma
4
𝑣𝑎𝑙𝑢𝑒 = 𝑑𝑎𝑡𝑎
5
𝑣𝑎𝑙𝑢𝑒 = 𝑑𝑎𝑡𝑎 / 𝑡𝑖𝑚𝑒
6
https://bits.blogs.nytimes.com/2011/09/07/the-lifespan-of-a-link/
Value of data is perishable
The Lifespan of a link
5.5 hrs
7
Relationships are perishable
Harming your customers
8
Failing Digital Strategy
Relationships are perishable
9
Annoying your customers
Relationships are perishable
10
Really???
Relationships are perishable
11
Director @ a major US airline:
“It’s not about analyzing 7 years of history to
make the future better,
it’s about looking at what happened this morning
and to make this afternoon better”
12
What is a Streaming Data Lake?
1. Data in Motion
2. Layers of Curation
[Diagram labels: Source Facing, Canonical Model, Consumer Facing]
13
Do’s & Don’ts
Do ingest, standardize, validate, enrich, integrate, conform & project in a stream
[Diagram labels: Source Facing, Canonical Model, Consumer Facing]
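One way to picture curating in a stream is a chain of Python generators, one per curation step, so each record passes through every layer without being landed in between. This is only a sketch under assumptions: the record fields, the `CUSTOMER_CACHE` lookup and all function names are hypothetical and do not come from the talk.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical "slow stream" kept close in an in-memory cache for enrichment.
CUSTOMER_CACHE: Dict[str, str] = {"c1": "gold", "c2": "silver"}


def standardize(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize the key but keep the original value alongside it."""
    for rec in records:
        out = dict(rec)
        out["customer_id_raw"] = rec.get("customer_id")
        out["customer_id"] = str(rec.get("customer_id", "")).strip().lower()
        yield out


def validate(records: Iterable[dict]) -> Iterator[dict]:
    """Append a quality flag instead of dropping the record."""
    for rec in records:
        out = dict(rec)
        out["valid"] = bool(out["customer_id"]) and out.get("amount", 0) >= 0
        yield out


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Join each event with the cached reference data."""
    for rec in records:
        out = dict(rec)
        out["tier"] = CUSTOMER_CACHE.get(out["customer_id"], "unknown")
        yield out


def curate(source: Iterable[dict]) -> Iterator[dict]:
    """Source-facing records in, consumer-facing records out, one at a time."""
    return enrich(validate(standardize(source)))


raw_events = [
    {"customer_id": " C1 ", "amount": 25.0},
    {"customer_id": "", "amount": 10.0},
]
curated = list(curate(raw_events))
```

Because every step is lazy, a record reaches the consumer-facing shape as soon as it arrives; nothing waits for a batch to fill.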
14
Do’s & Don’ts
Don’t slow the data down
Example: don’t turn CDC into batches
[Diagram labels: Streaming & CDC; Batch (×4); Raw, Processed, Conformed & Integrated]
15
Do’s & Don’ts
Do keep your data moving
Curate in a stream, sync as needed
[Diagram labels: Streaming & CDC; sync (×3); Raw, Processed, Conformed & Integrated; NoSQL]
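The speaker notes size the low-latency NoSQL store by the working set, described as volume rate * watermark. A rough back-of-the-envelope sketch of that arithmetic follows; the function name and the workload figures are made up for illustration.

```python
def working_set_bytes(events_per_second: float,
                      avg_event_bytes: float,
                      watermark_seconds: float) -> float:
    """Approximate state to hold: volume rate multiplied by the watermark."""
    return events_per_second * avg_event_bytes * watermark_seconds


# Hypothetical workload: 20,000 events/s, 1 KiB each, 15-minute watermark.
size = working_set_bytes(20_000, 1024, 15 * 60)
size_gib = size / 2**30  # roughly 17 GiB of state before any overhead
```

Doubling the watermark doubles the state to keep, which is why extending it costs memory.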
16
Do’s & Don’ts
Do know your data, know your requirements and how they relate to time
[Diagram labels: events a, b, c, d; Real World, IT System, Projection, Consumed; Event, Operational; System Latency, Response, Watermark; b + c + d]
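The time concepts named on this slide and defined in the speaker notes (event time, operational time, system latency, watermark) can be made concrete with a small sketch. The class and function names are hypothetical, lateness is measured here against the latest event time seen so far (an assumption), and the window is a simple tumbling five-minute window rather than the rolling one the notes mention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    event_time: datetime        # when the business recognized the event
    operational_time: datetime  # when it arrived in the data management system


def system_latency(event: Event, available_at: datetime) -> timedelta:
    """Time between the event time and when the event is available for analysis."""
    return available_at - event.event_time


def is_too_late(event: Event, latest_event_time_seen: datetime,
                watermark: timedelta) -> bool:
    """True when the event lags the newest event seen by more than the watermark."""
    return latest_event_time_seen - event.event_time > watermark


def window_start(event: Event, size: timedelta = timedelta(minutes=5)) -> datetime:
    """Start of the tumbling window (by event time) that the event falls into."""
    epoch = datetime(1970, 1, 1)
    offset = (event.event_time - epoch) % size
    return event.event_time - offset


order = Event(event_time=datetime(2018, 10, 14, 9, 3, 20),
              operational_time=datetime(2018, 10, 14, 9, 3, 50))
latency = system_latency(order, available_at=datetime(2018, 10, 14, 9, 4, 0))
late = is_too_late(order, datetime(2018, 10, 14, 9, 20, 0), timedelta(minutes=10))
start = window_start(order)
```

Windowing by event time rather than operational time is what makes a late-arriving event land in the window where it belongs, as long as it arrives inside the watermark.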
17
Do’s & Don’ts
Do think of batches as degenerate* streams
*degenerate as in mathematics
[Diagram labels: events a, b, c, d; Real World, IT System; Event, Operational]
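Reading "degenerate" as a limiting case, a batch is what a stream looks like once events are lumped into slices of operational time. The sketch below uses a hypothetical `batches` helper over `(operational_time_seconds, payload)` pairs to show the same events as per-minute batches and as a per-event stream.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def batches(stream: Iterable[Tuple[int, str]],
            slice_seconds: int) -> Iterator[List[str]]:
    """Lump a time-ordered stream into slices of operational time."""
    for _, group in groupby(stream, key=lambda ev: ev[0] // slice_seconds):
        yield [payload for _, payload in group]


stream = [(0, "a"), (1, "b"), (61, "c"), (62, "d")]
per_minute = list(batches(stream, slice_seconds=60))
per_event = list(batches(stream, slice_seconds=1))
```

With a one-second slice each event is its own batch; widening the slice only delays when the same events become visible.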
18
Do’s & Don’ts
Do checkpoint your streams
Important:
Audit Balance Controls
Recoverability
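The speaker notes suggest dropping a marker into each stream partition every few minutes and emitting an event count on a metrics stream. A minimal sketch of that idea for a single partition follows; the `__MARKER__` payload, the plain list standing in for a metrics stream, and the function name are all assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

MARKER_INTERVAL_S = 5 * 60  # "every 5 minutes", per the speaker notes


def with_markers(events: Iterable[Tuple[int, str]],
                 metrics: List[dict],
                 interval_s: int = MARKER_INTERVAL_S) -> Iterator[Tuple[int, str]]:
    """Pass one partition's events through; when an interval closes, drop a
    marker into the stream and emit that interval's event count as a metric."""
    count = 0
    current = None  # index of the interval currently being counted
    for ts, payload in events:
        interval = ts // interval_s
        if current is not None and interval != current:
            boundary = (current + 1) * interval_s
            metrics.append({"interval_end": boundary, "count": count})
            yield (boundary, "__MARKER__")
            count = 0
        current = interval
        count += 1
        yield (ts, payload)
    # The open interval is not flushed: an unbounded stream has no "end".


events = [(10, "a"), (200, "b"), (310, "c"), (650, "d")]
metrics: List[dict] = []
out = list(with_markers(events, metrics))
```

A consumer that counts events between markers can compare its totals with the emitted metrics (audit and balance) and can restart from the last marker it saw (recoverability).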
19
Do’s & Don’ts
Don’t spread related events across topics
a) Profile update event
b) Sales Transaction
[Diagram: events a and b split across Profile Topic and Sales Topic]
20
Do’s & Don’ts
Do put related topics together
a) Profile update event
b) Sales Transaction
[Diagram: instead of splitting events a and b across Profile Topic and Sales Topic, both go to a single Customer Topic]
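Keeping related events in one topic and partition is commonly done by partitioning on a shared key such as the customer id. The sketch below simulates that with plain Python lists; `partition_for` uses a CRC32 hash purely as a stand-in and is not any particular broker's partitioner, and the customer ids are invented.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key; illustrative only, not a real broker's partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(events: List[Tuple[str, str]],
            num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Append (customer_id, event) pairs to the partition chosen by customer_id."""
    topic: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for customer_id, event in events:
        topic[partition_for(customer_id, num_partitions)].append((customer_id, event))
    return topic


events = [
    ("cust-42", "a) profile update"),
    ("cust-7", "a) profile update"),
    ("cust-42", "b) sales transaction"),
]
customer_topic = publish(events, num_partitions=4)
p = partition_for("cust-42", 4)
cust42 = [event for cid, event in customer_topic[p] if cid == "cust-42"]
```

Because both of a customer's events hash to the same partition, the profile update is always read before the sales transaction that followed it.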
21
Summary
1. Tremendous value in ‘now’
2. Keep your data moving
3. Know how your data relates to time
Thank You!
Rate This Session #1254 with the Teradata Analytics Universe Mobile App
Follow Me on Twitter: @douglas_ma
Questions/Comments
Email: Douglas.Moore@Teradata.com

Streaming data lakes - Do's and Don'ts from the field. Teradata Analytics Universe 2018-10-14

Editor's Notes

  1. I’ve been a big data architect for the last 7 years. I’ve delivered a lot of data lakes, streaming systems, ETL, and strategy to customers here and in Europe.
  2. For the last 40 years, it’s been about integrated data. Then, 10 years ago, it became about more and bigger data: the more data, the more value you can extract. Machine learning came along, and simple algorithms given more data performed better than complicated rulesets and distilled expert opinions. Then deep learning came along, and those algorithms are even more data hungry. (The curse of dimensionality.)
  3. These days, it’s not just more data; it’s more data in less time that drives value. You still need curated data: when sensor data comes in, you’ll find lots of noise, dropouts, etc. You still need to integrate your data and link it. Your sensor, claim, or reservation data becomes so much more valuable when linked to your customers, devices, properties, and so on. Now you have to do all this not in 30 days, not in a week or a day, but within seconds.
  4. The value of data is perishable. Hilary Mason, then Bit.ly’s lead data scientist, found that links have different lifespans depending on whether they are posted on Facebook or Twitter or sent through e-mail or chat clients. After analyzing 1,000 popular links shared on bit.ly, she found that the average half-life of a link on Twitter is 2.8 hours; on Facebook it’s 3.2 hours, and for e-mail and messenger services it’s 3.4 hours. This means a link gets an extra 24 minutes of life on Facebook compared to Twitter. Relate this to an engagement story: AWS Storm-based streaming analytics… binning event counts, fitting to a curve, R-based models.
  5. A hip, established, customer-centered company is potentially harming Joe’s credit record because they can’t integrate their systems in a reasonable amount of time.
  6. This is an example of a utility company with a failing digital strategy: they can’t integrate their mobile/internet systems with the rest of their legacy systems within a reasonable amount of time.
  7. In this case, a high-tech digital company is just annoying Sheila.
  8. Ed here is perplexed as to why there is some seemingly random delay in updating his account.
  9. What she’s saying here is that big data is nice, but the real value comes from producing insights and re-routing planes and resources in a timely manner that has a meaningful impact on operations.
  10. Someone suggested to me, perhaps we should call this a “Data River”
  11. Discuss enriching vs. standardizing (appending quality factors and corrections, keeping original values). Discuss validation vs. routing. You will need to join with ‘slow streams’; keep them close, in dataframes or caches.
  12. The first don’t: don’t slow the data down. Anti-practice: “This one client… would source data via CDC… then land it in HDFS, and that was it. No standardization, common keys, or common summarizations… They would talk about real time… yet they terminated the data flows at HDFS.” They’re incurring a large cost by first doing it as a batch and then later as a stream.
  13. Best practice: build levels of curation within streams, and sync to durable storage as needed for other access patterns. For stateful streams, and for processing with a large watermark on the data projection, you’ll need low-latency NoSQL storage, sized according to your working set (volume rate * watermark).
  14. Let’s say you have a real-time analytics system and you want to see worldwide reservations, claims, orders, or equipment status summarized over a rolling five-minute window. Response time: the time between initiating a request and when the start of the response is first received. System latency: the time between the event time and when the event is available for analysis. Operational time: when the event arrived in your data management system. Watermark: the maximum lateness of a late-arriving event before it’s considered too late; you can extend your watermark, but you’ll need more memory to maintain state. Event time: the time at which the business recognized the event, e.g. when the order was signed, when the payment was processed, or when the item was shipped. There are even more aspects of time: processing windows, tumbling windows, sliding windows, recovery point objectives, return to operations.
  15. Think of batches as degenerate streams: events are lumped together into thin slices of operational time. If you need another justification for doing streams, just remember that it takes more resources, with lower system utilization, to process batches.
  16. Do checkpoint and perform audit balance controls on your streams. Anti-practice: “This major travel site, handling 100 billion XML events / day… They pay commissions based on their weblogs, so accuracy is important. They have a beautifully designed streaming data lake… To checkpoint, they quiesce the producers once a day at midnight, synchronize, then restart the producers.” Now this works for them; they can recover to the previous day’s values. Instead, look at dropping a marker into each stream partition every hour or every 5 minutes; this gives them an opportunity to reduce their recovery point objective. Best practice: “Drop a Coke can”. Metrics, metrics, metrics: every 5 minutes, generate a count of your events and emit that on your metrics stream.
  17. Let’s say a customer comes in and updates their credit card, and then they go to order a widget from your website. Let’s say your transactional system writes a and b in the correct order. Your CDC captures these two events. Your streaming system takes the two events and spreads them out over two subject-oriented topics. In this example, there is a chance that the sales transaction event arrives before the profile update reaches your system. Pain ensues. Order of delivery is guaranteed within a topic partition, so don’t put your related events into separate topics. You’ve just exacerbated the one problem you were trying to avoid with late-arriving data. What if the customer profile changed and then they performed a transaction? … Same kind of thing: the two are related, and you want to make sure they arrive in order as much as is possible. Do send fully annotated / enriched events unless you have a ridiculously large blob, like a movie or something.
  18. Instead, put related topics together. Order of delivery is guaranteed within a topic partition, so do put your related records into the same topic & partition to help ensure the correct order of delivery and analysis.
  19. There’s much more to know, but alas our time is short. There’s tremendous value in now: with now, you can better satisfy your customers and capture value your competitors are missing. Keep your data moving; it will require learning a couple of new things, but overall it will be more efficient and will better serve your business. Know how your data relates to time; make sure event, operational, latency, and response times are clearly tracked and understood by all involved.