A slide deck setting out my thoughts on the main economic drivers behind the new "data rush" and the commercialisation of grid computing, with examples of the many diverse applications for Hadoop in the contemporary, third-wave knowledge economy.
3. THE INFORMATION AGE
The so-called “economic third wave” has bankrupted or
seriously damaged many blue chip organisations
Traditional manufacturing and retail are in rapid and heavy
decline in Europe and the US
Technology, connectivity and access to information are
restructuring our societies
Levels of political and social engagement have surged
Peer-to-peer lending platforms have revolutionised banking in
many countries
5. NEW AGE NEW ORDER
Manufacturing is shifting from the mass-production
model of the 20th century back to build-to-order
production
High street stores are being used as showrooms while
the actual sale is made online
Web based services are run with tiny profit margins on
huge transaction volumes
Systems like Amazon Marketplace, Etsy and eBay are
empowering small business, delivering globalised trade
and driving socioeconomic change that has never been
seen before
6. INNOVATION
Mass-production rarely benefits from innovation
Innovation drives change – a huge cost with little benefit for
production-line driven economies
“Refinement of product” mentality
Knowledge services need to innovate to differentiate
Change in a virtual world can be cheap and yield huge
rewards
“Reinvention of product” mentality
8. A SHIFT IN DEMANDS
Shifting emphasis from mass-production to
knowledge services and build-to-order
production means shifting priorities
Innovation and change become more valued
attributes than stability and reliability
9. LONG TAIL
[Figure: long-tail curve]
Head, only the most popular / highest value: Walmart, Best Buy
Tail, everything else / lower value: Amazon, eBay, Netflix
Long-tail economics underpin the information age
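
As a rough illustration of the long-tail idea, the sketch below assumes that demand follows a Zipf-like curve, where the item at rank r sells in proportion to 1/r. The curve, the catalogue size and the head cut-off are all assumptions chosen for illustration, not figures from this deck.

```python
def tail_share(catalogue_size: int, head_size: int, exponent: float = 1.0) -> float:
    """Fraction of total demand that falls outside the `head_size` best sellers."""
    if not 0 <= head_size <= catalogue_size:
        raise ValueError("head_size must be between 0 and catalogue_size")
    # Assumed Zipf-like demand: the item at rank r sells in proportion to 1 / r**exponent.
    demand = [1.0 / rank ** exponent for rank in range(1, catalogue_size + 1)]
    total = sum(demand)
    return sum(demand[head_size:]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical retailer: a catalogue of 1,000,000 items, of which a
    # physical store can only stock the 10,000 most popular.
    share = tail_share(catalogue_size=1_000_000, head_size=10_000)
    print(f"Share of demand in the tail: {share:.1%}")
```

Under these assumptions a sizeable part of demand sits in items that a store with limited shelf space would never stock, which is the opportunity the online retailers above address.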
10. BIG DATA VIZ LONG TAIL
Knowledge and information-driven services are following
the “long-tail” paradigm in many ways, including
processing huge amounts of low value data to yield
profit
Google Now
Amazon recommendations
eBay search
Facebook Exchange
11. BIG DATA VIZ INNOVATION
In a competitive, free market like the World Wide Web,
innovation is valued because it can open up new
opportunities
Consumer-grade access to grid computing technology is
a recent innovation
Grid computing can open up new opportunities that
would otherwise not be addressable
It is an excellent solution to the needs of ventures
architected around the long-tail economic model
12. CURRENT TREND
Industrial economies and traditional production line
manufacturing require stability, reliability and minimal
change
Knowledge economies thrive on innovation, and process
huge amounts of information
The US and Europe are transitioning from industrial to
knowledge economies
Big Data concepts and technologies are a key enabler for
the new economy
13. THE FUTURE - THINGTERNET
The internet of things is with us
Billions of connected devices, even e-tattoos
14. INTERNET OF THINGS
AND BIG DATA
Billions of connected devices create a huge
amount of data to process
Until grid computing, IoT was technically
near impossible to implement
15. INTERNET OF THINGS IS A WILD
WEST
The IoT poses many new, unsolved challenges
An internet alarm clock, monitoring how often you sleep
late, could be accessed by HR for employee
performance evaluations
But new challenges = new opportunities
17. STORAGE
Hadoop can be used purely for online data
storage, with no direct processing
Low cost per GB for petascale online storage
The option of directly querying or analysing
the data is available if required
18. PRODUCT SEARCH
A huge, constantly changing catalogue of
products – like eBay and Amazon
Simple keyword search matching customer to
product
SolrCloud – a full text search engine indexing
and serving up terabytes of live content,
running on Hadoop clusters
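
The idea of keyword matching against a catalogue can be sketched with a toy inverted index. This is only an illustration of the general technique in plain Python. It does not use Solr or reflect how SolrCloud is implemented, and the product ids and titles are made up.

```python
from collections import defaultdict


def build_index(catalogue: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the ids of the products whose title contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for product_id, title in catalogue.items():
        for word in title.lower().split():
            index[word].add(product_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of products whose title contains every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


if __name__ == "__main__":
    # Hypothetical three-item catalogue.
    catalogue = {
        "p1": "Red running shoes",
        "p2": "Blue running shorts",
        "p3": "Red rain jacket",
    }
    index = build_index(catalogue)
    print(sorted(search(index, "red running")))  # ['p1']
```

A production search engine adds ranking, stemming, incremental updates and distribution of the index across nodes; the sketch only shows the lookup structure.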
19. BEHAVIOURAL TARGETING
Matching advertising content with users based on the
user's demographic and interests – like Google AdWords
Behavioural Targeting can yield twice as many
conversions (e.g. click-throughs) as untargeted
advertising
Generates a huge amount of log data which is used for
reporting and reprocessed for predictive analysis
Predictive analysis is compute intensive
TBs of data per day
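
The reporting step over the log data can be pictured as a small map/reduce job. The sketch below runs in plain Python rather than on Hadoop, and the log format (tab-separated segment and event type) and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_event(line: str) -> Iterator[tuple[str, tuple[int, int]]]:
    """Map step: emit (segment, (impressions, clicks)) for one log line."""
    segment, event = line.rstrip("\n").split("\t")
    if event == "impression":
        yield segment, (1, 0)
    elif event == "click":
        yield segment, (0, 1)


def reduce_counts(pairs: Iterable[tuple[str, tuple[int, int]]]) -> dict[str, tuple[int, int]]:
    """Reduce step: sum impressions and clicks per segment."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for segment, (impressions, clicks) in pairs:
        totals[segment][0] += impressions
        totals[segment][1] += clicks
    return {segment: (i, c) for segment, (i, c) in totals.items()}


def click_through_rates(lines: Iterable[str]) -> dict[str, float]:
    """Clicks divided by impressions for each segment found in the log."""
    pairs = (pair for line in lines for pair in map_event(line))
    return {
        segment: (clicks / impressions if impressions else 0.0)
        for segment, (impressions, clicks) in reduce_counts(pairs).items()
    }


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    log = [
        "targeted\timpression", "targeted\timpression", "targeted\tclick",
        "untargeted\timpression", "untargeted\timpression",
        "untargeted\timpression", "untargeted\timpression", "untargeted\tclick",
    ]
    print(click_through_rates(log))
```

On a cluster the map step would run in parallel over blocks of the log and the reduce step would run per segment key; the structure of the computation stays the same.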
20. PRODUCT RECOMMENDERS
Recommending products to the user based on their
demographic and interests, other similar users'
purchase history, and their current browsing pattern
Like Amazon and Zalando recommendations
A hybrid between Behavioural Ad Targeting and
Product Search
Combines product catalogue, clickstream data and
passive user profiling, possibly running live in-session
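
One simple way to use other users' purchase history is to count which products are bought together. The sketch below is a minimal co-purchase counter, assuming each basket is just a list of product names. It is one possible technique, not a description of how Amazon or Zalando build their recommenders, and the baskets are made up.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets: list[list[str]]) -> Counter:
    """Count how often each ordered pair of products appears in the same basket."""
    counts: Counter = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(counts: Counter, product: str, top_n: int = 3) -> list[str]:
    """Products most often bought together with `product`, most frequent first."""
    scored = [(other, n) for (item, other), n in counts.items() if item == product]
    # Sort by descending count, then by name so that ties are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical purchase history.
    baskets = [
        ["tent", "stove"],
        ["tent", "stove", "lamp"],
        ["stove", "kettle"],
    ]
    counts = co_purchase_counts(baskets)
    print(recommend(counts, "tent"))  # ['stove', 'lamp']
```

A real recommender would also weigh the user's profile and current browsing session, as the slide describes; this sketch covers only the purchase-history signal.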
22. SELF SERVICE BIG DATA BUSINESS
INTELLIGENCE
So-called “Enterprise Data Hub” paradigm
The fastest growing use case in 2014 on
Yahoo's YGrid, a set of 16 clusters composed
from 32,500 Hadoop nodes
Sales, accounting, executive and other
business users run the data analysis jobs
themselves on the available datasets using
discovery tools like MicroStrategy, Tableau and
Tibco Spotfire
23. DATA WAREHOUSING
Many migrations of classical Enterprise Data
Warehousing applications to Hadoop
2-3x+ performance gains over Teradata on 3TB – 30TB
workloads
Huge cost savings versus traditional enterprise technologies
like Oracle and Teradata
Fraud detection – e.g. credit card, medical insurance,
welfare
Credit risk appraisal – e.g. credit card applications
Banking and Retail batch processes
24. OLTP DBMS
Many large-scale OLTP DBMS implementations
use HBase, Accumulo or another NoSQL grid database
For low latency, high throughput, high
concurrency and high volume
e.g. share dealing, real-time ad auctions
Volumes of 200 billion transactions per day
reliably served in real time
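
To put the daily volume in perspective, 200 billion transactions per day is roughly 2.3 million per second on average. The sketch below does that arithmetic. The node count in the example is purely hypothetical, and real traffic would peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(transactions_per_day: int) -> float:
    """Average transactions per second implied by a daily volume."""
    return transactions_per_day / SECONDS_PER_DAY


def average_rate_per_node(transactions_per_day: int, nodes: int) -> float:
    """Average per-node load, assuming transactions spread evenly across nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return average_rate_per_second(transactions_per_day) / nodes


if __name__ == "__main__":
    daily_volume = 200_000_000_000  # figure quoted on the slide
    hypothetical_nodes = 1_000      # assumption, not from the slide
    print(f"{average_rate_per_second(daily_volume):,.0f} transactions/s overall")
    print(f"{average_rate_per_node(daily_volume, hypothetical_nodes):,.0f} transactions/s per node")
```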
25. RESEARCH
Low cost solution for mapping the human
genome
About 4TB of data per person
e.g. cancer research, personalised drugs
26. DEVICE MANAGEMENT
Automated, managed service for analysis and
response to threats detected by an SPI module on
a remote switch
Central heating system management – shut
down the boiler when nobody is home to reduce
the heating bill and emissions – e.g. Nest
Monitor drivers' propensity to break the speed
limit and apply lower insurance premiums to
good drivers
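
The driver-monitoring example can be sketched as a simple scoring rule. Everything specific here is an assumption made for illustration: the sample format (speed and speed limit pairs), the 5% threshold for a "good driver" and the 15% discount are hypothetical, not taken from any insurer.

```python
def speeding_fraction(samples: list[tuple[float, float]]) -> float:
    """Fraction of (speed, limit) samples in which the driver exceeded the limit."""
    if not samples:
        return 0.0
    over = sum(1 for speed, limit in samples if speed > limit)
    return over / len(samples)


def adjusted_premium(
    base_premium: float,
    samples: list[tuple[float, float]],
    good_driver_threshold: float = 0.05,  # hypothetical cut-off
    discount: float = 0.15,               # hypothetical discount
) -> float:
    """Apply the discount when the driver rarely breaks the speed limit."""
    if speeding_fraction(samples) <= good_driver_threshold:
        return round(base_premium * (1 - discount), 2)
    return base_premium


if __name__ == "__main__":
    # Made-up telemetry: (observed speed, speed limit) in the same units.
    careful = [(48, 50), (30, 30), (69, 70), (50, 50)]
    hasty = [(60, 50), (30, 30)]
    print(adjusted_premium(500.0, careful))
    print(adjusted_premium(500.0, hasty))
```

At fleet scale the samples would arrive as a continuous stream from many vehicles, which is where the storage and batch-processing use cases above come in.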