Big Data :
Bits of History, Words of Advice
Venu Vasudevan
GLSEC Big Data Meetup
Big Data Past
Big
Fast
intelligent
media
IoT
satellites
Big Data : Behavioral
Big Data
- The ‘V’ view of Big Data challenges
- Number of V’s up for debate
Big Data : Architectural
untidy data firehose
clean analytics
fast & good
slower & much better
Lambda architecture
Lake architecture
Stream architecture
Technical
Technical
This Talk
Behavioral View
Technology Solution Stack
‘Middleware’ (benefit of hindsight)
some more some
governance culture (gap)
data economics
ownership
food fights
3 data points
Big
Fast
intelligent
media
IoT
satellites
Iridium
• mobile routers (10K mph), fixed people
• no repeated patterns
• satellites N-S movement
• earth E-W movement
• regular topology, irregular exceptions
• solar flares
• military satellite presence
Fast Data Problem
• cellular frequency allocation (graph coloring problem)
• frequent fast recalculations (fast routers + semi-fast earth)
• transmit-no transmit (solar flares, military satellite presence)
• moving ‘seam’
• + ‘France’
seam irregularities
broadcast = +$$$
broadcast = -$$$ (lawsuit)
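The graph coloring framing can be sketched in a few lines. This is a minimal illustration, not the method used on Iridium: it assumes a hypothetical interference graph (cells as nodes, an edge wherever two cells would interfere) and applies a plain greedy heuristic. The cell names and the function names are made up for the example.

```python
from typing import Dict, List, Set


def greedy_frequency_allocation(interference: Dict[str, Set[str]]) -> Dict[str, int]:
    """Give each cell the lowest channel index not used by an interfering neighbour.

    Greedy coloring is a heuristic: the plan is always conflict-free,
    but it is not guaranteed to use the minimum number of channels.
    """
    # Handle the most-constrained cells first (largest degree first, then by name).
    order: List[str] = sorted(interference, key=lambda c: (-len(interference[c]), c))
    channel: Dict[str, int] = {}
    for cell in order:
        taken = {channel[n] for n in interference[cell] if n in channel}
        ch = 0
        while ch in taken:
            ch += 1
        channel[cell] = ch
    return channel


def is_conflict_free(interference: Dict[str, Set[str]], channel: Dict[str, int]) -> bool:
    """True when no two interfering cells share a channel."""
    return all(channel[a] != channel[b] for a in interference for b in interference[a])


# Hypothetical snapshot: A, B and C mutually interfere; D only overlaps C.
cells = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
plan = greedy_frequency_allocation(cells)
```

Because the satellites and the earth both move, the interference graph keeps changing, so a sketch like this would have to be rerun on every new snapshot. That is where the "frequent fast recalculations" bullet bites.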
Fast Data Problem
• quest for (OO)DB technology to address ‘France’ as make-or-break use case
• query expressive power
• complex constraint satisfaction
• query handling throughput
• 3-4 month benchmarking effort
seam
broadcast = +$$$
broadcast = -$$$ (lawsuit)
Fast Data Problem
• quest for (OO)DB technology to address ‘France’
• query expressive power
• query handling throughput
• 3-4 month benchmarking effort
• France solved ‘out-of-band’ (legally)
seam
broadcast = +$$$
broadcast = -$$$ (lawsuit)
don’t overfit your architecture to an extreme requirement
unless it’s from an extreme (paying) user
Big Data Problem
• systems management
• manage 66 ‘nodes’
• nodes moving at 10K mph
• ‘seam’ moving at 20K mph
• sounds harder than trivial, but not too hard
‘Pre’ Lambda Solution
• Dumb edge | smart core approach
• 15K events/sec/satellite
• 1M events/sec
• Fast & Approximate - FMEA (failure mode and effects analysis): ‘compiled’ lookup table for failure modes
• Slow & Precise - Model-based reasoning on satellite models
untidy satellite firehose (1M events/sec)
actionable insights
‘Pre’ Lambda architecture
Model-Based Reasoning
FMEA
‘Pre’ Lambda Solution
• Dumb edge | smart core approach
• 15K events/sec/satellite
• Fast & Approximate - FMEA: ‘compiled’ lookup table for failure modes
• Slow & Precise - Model-based reasoning on satellite models
• Simple, straightforward & wrong.
untidy satellite firehose (1M events/sec)
actionable insights
‘Pre’ Lambda architecture
Model-Based Reasoning
real-time expert system
FMEA
Yet, an architecture that is ‘rinsed and repeated’ over the years
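One way to read the fast/slow split is sketched below. Everything in it is an assumption made for illustration: the table entries, the event fields, and `toy_reasoner`, which is only a placeholder for real model-based reasoning over a satellite model. The constants also show where the 1M events/sec figure comes from.

```python
from typing import Callable, Dict, List, Tuple

SATELLITES = 66
EVENTS_PER_SEC_PER_SATELLITE = 15_000
# 66 * 15,000 = 990,000 events/sec, which the slide rounds to 1M.
TOTAL_EVENTS_PER_SEC = SATELLITES * EVENTS_PER_SEC_PER_SATELLITE

Event = Dict[str, str]

# Fast & approximate: a 'compiled' table from symptom signature to failure mode.
# These entries are invented for the example.
FMEA_TABLE: Dict[Tuple[str, str], str] = {
    ("battery", "undervoltage"): "solar panel degradation",
    ("antenna", "signal_loss"): "pointing error",
}


def fast_view(events: List[Event]) -> List[str]:
    """Answer every event immediately with a table lookup."""
    return [FMEA_TABLE.get((e["subsystem"], e["symptom"]), "unknown") for e in events]


def slow_view(events: List[Event], indices: List[int],
              reasoner: Callable[[Event], str]) -> Dict[int, str]:
    """Run the expensive reasoner on selected events; results arrive later."""
    return {i: reasoner(events[i]) for i in indices}


def merged_view(fast: List[str], slow: Dict[int, str]) -> List[str]:
    """Prefer the precise answer when it exists, else keep the quick one."""
    return [slow.get(i, quick) for i, quick in enumerate(fast)]


def toy_reasoner(event: Event) -> str:
    # Placeholder standing in for model-based reasoning.
    return f"model diagnosis for {event['subsystem']}/{event['symptom']}"


events = [
    {"subsystem": "battery", "symptom": "undervoltage"},
    {"subsystem": "thruster", "symptom": "overheat"},
]
fast = fast_view(events)
slow = slow_view(events, [1], toy_reasoner)
merged = merged_view(fast, slow)
```

The merge step is the part that later became familiar from Lambda-style designs: the quick answer is served first and replaced once the slower, more precise one is available.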
why does dumb edge smart cloud endure?
• edges are expensive ($2B)
• when edges go wrong (break / blow up / collide), they make headlines
$
$$$$$
why dumb edge smart cloud
• edges are expensive ($2B)
• when edges go wrong (break / blow up / collide), they make headlines
• nobody messes with an ‘edge’ once it works
• clouds don’t make for good news headlines
$ T-0
$$$$$ T-30 yrs
why dumb edge smart cloud
• edges are expensive ($2B)
• when edges go wrong (break / blow up / collide), they make news headlines
• nobody messes with an ‘edge’ once it works
• thus, implementing an end-to-end architecture causes culture clashes
over my dead body
iterate & refine
an almost repeat (Industrial IoT)
• edges are messy & domain specific
• creating them means dealing with culture clashes
• but .. an ounce of edge is worth a pound of cloud
$$$$$ T-30 yrs
$ T-0
Things to consider
• Problem statement. What’s your ‘France’?
• colorful sub-problem. strategy overfit.
• Architecture. small fixes to IT/OT gap can go a long way toward a simpler problem
• Technology Choices. best practices & the risk of ‘rewardless risk’
• right - make average programmers productive with new tech
• frequent - turn great programmers into average ones
Big Data to Deep Metadata
streaming video (TV) ~ 1 petabyte/day
second
minute
hour
day/week
epochal
detect & replace ads
Create Playlists by Player, Play, Sentiment
Identify minor characters with rabid fan following
rejuvenate old content
derive new content
‘chapterize’ by Player, Play, Sentiment
Platform Triage Challenge
new Product, new market
• one core technology, many markets
• platform triaging challenge. what drives the platform?
• highest (but uncertain) $ potential?
• ‘extreme’ requirement?
• sparsest competition?
• use case outlier is your biggest customer
deep
metadata
technology
SaaS
data
platform
Advertising
Search
Video
concept
maps
ad replacement use case
• speed
• few days (on-demand content)
• few seconds (real-time rebroadcast with new ads)
• precision
• low - best effort, for low cost international content for niche audiences
• high - frame level for expensive content, e.g. sports, $10M/episode programming
• errors
• 90% accuracy - ok for long tail content
• ‘five nines’ for premium content
precision accuracy
speed
ad replacement opportunity space
largest customer
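A quick calculation shows how far apart the two error targets sit. It assumes ‘five nines’ means 99.999% accuracy counted per ad-replacement decision, and the batch of 100,000 decisions is a hypothetical round number.

```python
def expected_errors(accuracy: float, decisions: int) -> float:
    """Expected number of wrong decisions at a given accuracy."""
    return (1.0 - accuracy) * decisions


DECISIONS = 100_000          # hypothetical batch size
LONG_TAIL_ACCURACY = 0.90    # 'ok for long tail content'
PREMIUM_ACCURACY = 0.99999   # assumed reading of 'five nines'

long_tail_errors = expected_errors(LONG_TAIL_ACCURACY, DECISIONS)
premium_errors = expected_errors(PREMIUM_ACCURACY, DECISIONS)
```

That is roughly 10,000 wrong decisions against roughly 1 for the same workload, a gap of about four orders of magnitude. It is why a single premium customer can end up driving the whole platform.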
occam’s razor works (again)
• build to simplicity
• loose coupling between data engineering & equipment engineering
• modularize complexity
• ‘differentiate your product’ changes
• ‘necessary evil’ changes
data-only approach
+1st party integration (dynamically configure ad splicers)
3rd party knobs (dynamically refresh CDN)
Architecture
but, what if ..
• Data is untidy
• Interpretation is subjective/cultural
• Automation is aspirational but quixotic
human-powered analytics
• some analytics tasks are too ‘slippery’ for machines
• data hard to characterize
• uneven video quality of ‘old’ archives
• untidy
• insights are subjective
human-powered analytics
• some analytics tasks are too ‘slippery’ for machines
• need for human augmentation
• humans generate ‘training’ sets to bootstrap machine learning
• humans completely take over some tasks
machines vs humans
• crowdsourcing & human-powered computing
• has been the ‘next big thing’ for a while
• checkered history:
• uneven output
• fraud
• uneven throughput
Machines   Humans
fast       slow
brittle    malleable
objective  subjective
clear      nuanced
machines vs humans
• much of that has changed
• Amazon Mechanical Turk
• 500K active users
• the ‘human machine’ can return substantial jobs in under 30 mins
• quantifiable as a machine for many media tasks - latency, quality, error rate, throughput
Hybrid Architecture
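The slide text does not spell out the hybrid design, so the following is only one plausible reading: the machine keeps the cases it is confident about, everything else goes to a human queue, and the human answers are saved as training data. The class name, the 0.8 threshold and the labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HybridRouter:
    threshold: float = 0.8  # assumed confidence cut-off
    human_queue: List[str] = field(default_factory=list)
    training_set: List[Tuple[str, str]] = field(default_factory=list)

    def route(self, item_id: str, machine_label: str, confidence: float) -> str:
        """Accept confident machine labels; queue the rest for a person."""
        if confidence >= self.threshold:
            return machine_label
        self.human_queue.append(item_id)
        return "pending_human_review"

    def record_human_label(self, item_id: str, label: str) -> None:
        """Store the human answer so it can bootstrap later model training."""
        self.human_queue.remove(item_id)
        self.training_set.append((item_id, label))


router = HybridRouter()
confident = router.route("clip-1", "ad", confidence=0.95)
uncertain = router.route("clip-2", "ad", confidence=0.40)
router.record_human_label("clip-2", "program")
```

Treating the crowd as another engine with its own latency and error rate, as the previous slide suggests, means the threshold becomes a tuning knob between cost, speed and quality.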
Things to consider
• Beware ‘France’ in other forms:
• customer with loudest voice & ‘holy grail’ hairball
• Dealing with data quality & variability
• crowdsourcing has come a long way as credible ‘engine’
• If big data is the answer, what is the question? (have strong opinions, weakly held)
• decision rationalization
• process automation
• human ‘power tool’ (e.g. compelling visualization) vs imperfect automation
startup data jiu-jitsu
• How to create a data-driven strategy before the data shows up?
• rationalize future SaaS revenue models
• justify product decisions in a data-driven manner
need data for product
need product for data
startup data jiu-jitsu
• How to create a data-driven strategy before the data shows up?
• how ‘intelligent’ can lighting control be with 50-100K users?
• how do people use dimmers (continuous or quantized) - UX implications
data set dilemma
• standard sources (e.g. Kaggle & UCI) insufficient
• few ‘physical world’ datasets
• expensive to collect
• may be specialized (vendor-specific)
• dataset proxies for IoT actuation may not work
• energy utilization != switch usage
big data, small start
• physical world data likely to be smaller (1-10 homes, few months)
• setup costs limit size of public datasets
• e.g. UMass Smart* light switch dataset
big data, small start
• consider data ‘augmentation’
• standard practice in AI (deep learning) - horizontal flipping, random crops …
• under-used in data space
• may need some thought on perturbation models for your domain
real
synthesized
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
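As a sketch of what a domain perturbation model could look like for light-switch data: the example below jitters event times within a day while keeping the on/off sequence intact. The event format, the 15-minute shift and the function names are assumptions, not taken from any real dataset.

```python
import random
from typing import List, Tuple

SwitchEvent = Tuple[int, str]  # (seconds since midnight, "on" or "off")
DAY_SECONDS = 24 * 60 * 60


def jitter_day(events: List[SwitchEvent], max_shift_s: int,
               rng: random.Random) -> List[SwitchEvent]:
    """Shift each event time by a random offset, keeping the on/off order.

    Assumes the input events are already in chronological order.
    """
    times = []
    for t, _ in events:
        shifted = t + rng.randint(-max_shift_s, max_shift_s)
        times.append(min(max(shifted, 0), DAY_SECONDS - 1))  # stay inside the day
    times.sort()
    states = [state for _, state in events]
    return list(zip(times, states))


def augment(events: List[SwitchEvent], copies: int, max_shift_s: int,
            seed: int = 0) -> List[List[SwitchEvent]]:
    """Produce several synthetic days from one real day."""
    rng = random.Random(seed)
    return [jitter_day(events, max_shift_s, rng) for _ in range(copies)]


# Hypothetical real day: lights on 07:00-08:00 and 18:30-23:00.
real_day = [(25200, "on"), (28800, "off"), (66600, "on"), (82800, "off")]
synthetic_days = augment(real_day, copies=5, max_shift_s=900)
```

Time jitter is only one candidate perturbation. Whether it is a fair model of how households actually vary is exactly the kind of domain judgment the last bullet asks for.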
In short ..
• big data success - equal parts tech & non-tech
• solving the right problem, not just solving the problem right
• revisit the problem, and what success means
@venuv62
venu.vasudevan@nextio.co

Big Data : Bits of History, Words of Advice

  • 1. Big Data : Bits of History, Words of Advice Venu Vasudevan GLSEC Big Data Meetup
  • 2. Big Data : Bits of History, Words of Advice
  • 4. Big Data : Behavioral Big Data - The ‘V’ view of Big Data challenges - Number of V’s up for debate
  • 5. Big Data : Architectural untidy data firehose clean analytics fast & good slower & much better Lambda architecture Lake architecture Stream architecture
  • 8. This Talk Behavioral View Technology Solution Stack ‘Middleware’ (benefit of hindsight) some more some governance culture (gap) data economics ownership foodfights dataeconomics
  • 10. Iridium • mobile routers (10K mph), fixed people • no repeated patterns • satellites N-S movement • earth E-W movement • regular topology, irregular exceptions • solar flares • military satellite presence
  • 11. Fast Data Problem • cellular frequency allocation (graph coloring problem) • frequent fast recalculations (fast routers + semi-fast earth) • transmit-no transmit (solar flares, military satellite presence) • moving ‘seam’ seam irregularities
  • 12. Fast Data Problem • cellular frequency allocation (graph coloring problem) • frequent fast recalculations (fast routers + semi-fast earth) • transmit-no transmit (solar flares, military satellite presence) • moving ‘seam’ • + ‘France’ seam irregularities broadcast = +$$$ broadcast = -$$$ (lawsuit)
  • 13. Fast Data Problem • quest for (OO)DB technology to address ‘France’ as make-or- break use case • query expressive power • complex constraint satisfaction • query handling throughput • 3-4 month benchmarking effort seam broadcast = +$$$ broadcast = -$$$ (lawsuit)
  • 14. Fast Data Problem • quest for (OO)DB technology to address ‘France’ • query expressive power • query handling throughput • 3-4 month benchmarking effort • France solved ‘out-of- band’ (legally) seam broadcast = +$$$ broadcast = -$$$ (lawsuit)don’t overfit your architecture to an extreme requirement unless it’s from an extreme (paying) user
  • 15. Big Data Problem • systems management • manage 66 ‘nodes’ • nodes moving at 10K mph • ‘seam’ moving of 20K mph • sounds harder than trivial, but not too hard
  • 16. ‘Pre’ Lambda Solution • Dumb edge | smart core approach • 15K events/sec/satellite • 1M events/sec • Fast & Approximate - FMEA: ’compiled’ lookup table for failure modes • Slow & Precise - Model-based reasoning on satellite models untidy satellite firehose (1M events/sec) actionable insights ‘Pre’ Lambda architecture Model-Based Reasoning FMEA
  • 17. ‘Pre’ Lambda Solution • Dumb edge | smart core approach • 15K events/sec/satellite • Fast & Approximate - FMEA: ’compiled’ lookup table for failure modes • Slow & Precise - Model-based reasoning on satellite models • Simple, straightforward & wrong. untidy satellite firehose (1M events/sec) actionable insights ‘Pre’ Lambda architecture Model-Based Reasoning real-time expert system FMEA Yet, an architecture that is ‘rinsed and repeated’ over the years
  • 18. why does dumb edge smart cloud endure? • edges are expensive ($2B) • when edges go wrong (break/blow up /collide) , they make headlines $ $$$$$
  • 19. why dumb edge smart cloud • edges are expensive ($2B) • when edges go wrong (break/blow up /collide) and make headlines • nobody messes with an ‘edge’ once it works • clouds don’t make for good news headlines $ T-0 $$$$$ T-30 yrs
  • 20. why dumb edge smart cloud • edges are expensive ($2B) • when edges go wrong (break/blow up /collide) and make news headlines • nobody messes with an ‘edge’ once it works • thus, implementing an end- to-end architecture causes culture clashes over my dead body iterate & refine
  • 21. an almost repeat (Industrial IoT) • edges are messy & domain specific • creating them means dealing with culture clashes • but .. an ounce of edge is worth a pound of cloud $$$$$ T-30 yrs $ T-0
  • 22. Things to consider • Problem statement. What’s your ‘France’? • colorful sub-problem. strategy overfit. • Architecture. small fixes to IT/OT gap can go a long way to a simpler problem • Technology Choices. best practices & the risk of ‘rewardless risk’ • right - make average programmers productive with new tech • frequent - turn great programmers into average
  • 23. Big Data to Deep Metadata streaming video(TV) ~ 1 petabyte/day second minute hour day/week epochal detect & replace ads Create Playlists by Player, Play, Sentiment Identify minor characters with rabid fan following rejuvenate old content derivenewcontent ‘chapterize’ by Player, Play, Sentiment
  • 24. Platform Triage Challenge: new product, new market • one core technology, many markets • platform triaging challenge. what drives the platform? • highest (but uncertain) $ potential? • ‘extreme’ requirement? • sparsest competition? • use case outlier is your biggest customer • [Diagram labels: deep metadata technology; SaaS data platform; Advertising; Search; Video concept maps]
  • 25. ad replacement use case • speed • few days (on-demand content) • few seconds (real-time rebroadcast with new ads) • precision • low - best effort, for low cost international content for niche audiences • high - frame level for expensive content, e.g. Sports/$10M/episode programming • errors • 90% accuracy - ok for long tail content • ‘five nines’ for premium content • [Figure labels: ad replacement opportunity space; precision; accuracy; speed; largest customer]
  • 26. occam’s razor works (again) • build to simplicity • loose coupling between data engineering & equipment engineering • modularize complexity • ‘differentiate your product’ changes • ‘necessary evil’ changes • [Figure labels: data-only approach; + 1st party integration (dynamically configure ad splicers); 3rd party knobs (dynamically refresh CDN)]
  • 28. but, what if .. • Data is untidy • Interpretation is subjective/cultural • Automation is aspirational but quixotic
  • 29. human-powered analytics • some analytics tasks are too ‘slippery’ for machines • data hard to characterize • uneven video quality of ‘old’ archives • untidy • insights are subjective
  • 30. human-powered analytics • some analytics tasks are too ‘slippery’ for machines • need for human augmentation • humans generate ‘training’ sets to bootstrap machine learning • humans completely take over some tasks
  • 31. machines vs humans • crowdsourcing & human-powered computing • has been the ‘next big thing’ for a while • checkered history: • uneven output • fraud • uneven throughput • Machines vs. Humans: fast vs. slow; brittle vs. malleable; objective vs. subjective; clear vs. nuanced
  • 32. machines vs humans • much of that has changed • Amazon Mechanical Turk • 500K active users • the ‘human machine’ can return substantial jobs in under 30 mins • quantifiable as a machine for many media tasks - latency, quality, error rate, throughput
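One way to treat a crowd as a measurable ‘machine’ is to compute the same figures for it that you would for a service. The sketch below assumes each returned task carries submit/return timestamps and can be checked against a known ‘gold’ answer. The record fields and the function name are hypothetical and are not part of any Mechanical Turk API.

```python
from statistics import median


def crowd_metrics(results):
    """Summarize a non-empty batch of crowd task results.

    results: list of dicts with keys 'submitted_s', 'returned_s'
    (seconds since the batch started), 'answer' and 'gold'.
    """
    latencies = [r["returned_s"] - r["submitted_s"] for r in results]
    errors = sum(1 for r in results if r["answer"] != r["gold"])
    # Wall-clock span of the whole batch, used for throughput.
    span_s = max(r["returned_s"] for r in results) - min(
        r["submitted_s"] for r in results
    )
    return {
        "median_latency_s": median(latencies),
        "error_rate": errors / len(results),
        "throughput_per_hour": len(results) / (span_s / 3600),
    }
```

Scoring against gold answers is only one possible quality measure. For subjective media tasks, agreement between several workers may be a better fit.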
  • 34. Things to consider • Beware ‘France’ in other forms: • customer with loudest voice & ‘holy grail’ hairball • Dealing with data quality & variability • crowdsourcing has come a long way as a credible ‘engine’ • If big data is the answer, what is the question? (have strong opinions, weakly held) • decision rationalization • process automation • human ‘power tool’ (e.g. compelling visualization) vs imperfect automation
  • 35. startup data jiu-jitsu • How to create a data-driven strategy before the data shows up? • rationalize future SaaS revenue models • justify product decisions in a data-driven manner • [Figure labels: need data for product; need product for data]
  • 36. startup data jiu-jitsu • How to create a data-driven strategy before the data shows up? • how ‘intelligent’ can lighting control be with 50-100K users? • how do people use dimmers (continuous or quantized), and what are the UX implications?
  • 37. data set dilemma • standard sources (e.g. Kaggle & UCI) insufficient • few ‘physical world’ datasets • expensive to collect • may be specialized (vendor-specific) • dataset proxies for IoT actuation may not work • energy utilization != switch usage
  • 38. big data, small start • physical world data likely to be smaller (1-10 homes, few months) • setup costs limit size of public datasets • e.g. UMass Smart* light switch dataset
  • 39. big data, small start • consider data ‘augmentation’ • standard practice in AI (deep learning) - horizontal flips, random crops … • under-used in data space • may need some thought on perturbation models for your domain • [Figure: real vs. synthesized images; source: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html]
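As a rough illustration of a domain-specific perturbation model, the sketch below augments a small set of dimmer traces by shifting each trace in time and adding small noise to the dim level. The trace format, the function names, and the default shift and noise ranges are assumptions made for this example. Plausible ranges would have to come from knowledge of the domain.

```python
import random


def augment_trace(trace, rng, max_shift_s=300, level_noise=0.05):
    """Return one perturbed copy of a day of dimmer events.

    trace: list of (seconds_since_midnight, level), level in [0.0, 1.0].
    The whole trace is shifted by one random offset; each level gets
    independent uniform noise. Values are clamped to valid ranges.
    """
    shift = rng.randint(-max_shift_s, max_shift_s)
    out = []
    for t, level in trace:
        new_t = min(86399, max(0, t + shift))
        noisy = level + rng.uniform(-level_noise, level_noise)
        out.append((new_t, round(min(1.0, max(0.0, noisy)), 3)))
    return out


def augment_dataset(traces, copies=5, seed=0):
    """Return the real traces followed by `copies` synthesized variants of each."""
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    synthesized = [augment_trace(tr, rng) for tr in traces for _ in range(copies)]
    return list(traces) + synthesized
```

Calling `augment_dataset(traces, copies=5)` on a single real trace yields six traces in total, the original plus five synthetic ones.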
  • 40. In short .. • big data success - equal parts tech & non-tech • solving the right problem, not just solving the problem right • revisit problem, and what success means