-
1.
BIG DATA
Arnon Rotem-Gal-Oz
Director of Technology Research, Amdocs
The blind men and the elephant. Poem by John Godfrey Saxe (Cartoon originally copyrighted by the authors; G. Renee Guzlas, artists) http://www.nature.com/ki/journal/v62/n5/fig_tab/4493262f1.html
-
2.
1880 US
Census
-
3.
Hollerith
Tabulating
Machine
Hollerith photos by Martin Wichary :
http://www.flickr.com/photos/mwichary/4358926764/in/photostream/
-
4.
Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/
Big data happens when the data you
have to process is bigger than what
you can process in the given time with
current technologies
-
5.
Myth: Big data = keep all data
Source: Big Data Public Private Forum: http://www.big-project.eu/sites/default/files/D2.2.1_First%20draft%20of%20Technical%20white%20papers_FINAL_v1.01_0.pdf
-
6.
Source: Big Data Public Private Forum: http://www.big-project.eu/sites/default/files/D2.2.1_First%20draft%20of%20Technical%20white%20papers_FINAL_v1.01_0.pdf
-
7.
Some Telco
Numbers
Source: Wikipedia
http://upload.wikimedia.org/wikipedia/commons/5/50/Telephone_operators,_1952.jpg
-
8.
So, what do we do
with all this data?
Source: Wikipedia http://upload.wikimedia.org/wikipedia/commons/0/06/UPS_Truck.jpg
-
9.
It’s the insights, stupid*
* With apologies to Bill Clinton
-
10.
Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/
Big data analytics is when sample = N
• Big data happens when the data you have to process
is bigger than what you can process in the given time
with current technologies
-
11.
“My daughter got this in the
mail! She’s still in high school,
and you’re sending her coupons
for baby clothes and cribs? Are
you trying to encourage her to
get pregnant?”
Source: Forbes http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
-
12.
We need to
watch out that
analytics won’t
get too creepy
-
13.
When people hear
big data they think
fast data
Source: Steve Jones, Capgemini
http://www.no.capgemini.com/node/778541
-
14.
(Simplified) network proactive care flow
Subscribers, Collect & Filter, Correlate, Account, Event Store, Identify & Predict Network Failures, Reimburse VIPs, Prioritize technicians, Identify impact on high-valued accounts
-
15.
Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/
Big data is when we can handle data
fast enough to make a difference
• Big data happens when the data you have to process
is bigger than what you can process in the given time
with current technologies
• Big data analytics is when sample = N
-
16.
Technology space
-
17.
The Elephant in the room
-
18.
Hadoop Stack
Map/Reduce, HDFS, HBase, Pig, Hive, ZooKeeper, Oozie, Mahout, Giraph
-
19.
Schema on read
-
20.
Move data to computation
-
21.
Maybe we should rethink
moving data to
computation…
Source : http://my-inner-voice.blogspot.co.il/2012/06/haddop-101-paper-by-miha-ahronovitz-and.html
-
22.
Map/reduce
Source: http://www.bodhtree.com/blog/2012/10/18/ever-wondered-what-happens-between-map-and-reduce/
-
23.
Customer Segmentation
First name | Last name | ARPU | Age | Device | Country | …
Mr. | Smith | 100 | 22 | iPhone 5s, White | USA | …
John | Doe | 87 | 42 | Samsung Galaxy S5, Gold | France | …
Lady | In Red | 105 | 21 | Samsung Note 3, White | UK | …
…
Uluru, Australia by Stuart Edwards (cc) http://en.wikipedia.org/wiki/Uluru#mediaviewer/File:Uluru_Panorama.jpg
-
24.
K-Means
(Figure: customers plotted as ARPU vs. Age)
Source : http://pypr.sourceforge.net/kmeans.html
-
25.
K=3
(Figure: two ARPU vs. Age plots)
Source : http://pypr.sourceforge.net/kmeans.html
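To make the K=3 segmentation concrete, here is a minimal sketch of Lloyd's k-means in plain Python over (ARPU, age) pairs. The customer values are hypothetical, and the deterministic farthest-point seeding is an assumption chosen so the example is reproducible (libraries commonly use random or k-means++ seeding). It is not the pypr implementation referenced on the slide.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (ARPU, age)


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _init_farthest(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update steps
    until the cluster assignments stop changing."""
    centroids = _init_farthest(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: _dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


# Hypothetical customers: (ARPU, age)
customers = [(100, 22), (105, 21), (98, 24),
             (87, 42), (90, 45), (85, 40),
             (30, 65), (35, 70), (28, 68)]
centroids, labels = kmeans(customers, k=3)
for i, (arpu, age) in enumerate(centroids):
    print(f"segment {i}: ARPU ~{arpu:.0f}, age ~{age:.0f}")
```

Because ARPU and age are on different scales, a real segmentation would usually normalize the features first; the sketch skips that to stay short.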
-
26.
New paradigms
Map/Reduce, HDFS, HBase, Pig, Hive, ZooKeeper, Oozie, Mahout, Giraph
-
27.
New Paradigms
Map/Reduce, HDFS, HBase, Pig, Hive, ZooKeeper, Oozie, Mahout, YARN, Giraph
-
28.
New Paradigms
Map/Reduce, HDFS, HBase, Pig, Hive, ZooKeeper, Oozie, Mahout, YARN, Giraph, Spark, Storm, Slider, Flink, Impala, Tez, Presto
-
29.
Amdocs Analytics & Data Management Heritage
(Timeline figure: 2000, 2008, 2013; rows: Acquisitions, Portfolio)
Proactive Care; TerraScale; Network optimization; Real-time analytics platform; Single product catalog; BSS–OSS integration; CRM-Billing integration; OSS; Analytics Platform, 16 analytics patents; aLDM logical data model; Policy control; Network Analytics; CRM
-
30.
Information Security Level 2 – Sensitive
© 2014 – Proprietary and Confidential Information of Amdocs
(Architecture diagram)
Touchpoints & Applications: CRM, Self Service, E-Mail, PCRF, SMS, Other, Wi-Fi Offload, Campaign Mng.
Operational Envelope & Platform Administration: Security Management, Configuration Management, Services Inventory, Performance Management, Fault Management, Logger
Collect & Ingest; Transform & Enrich; Aggregate & Correlate; Drive Insight; Close the Loop; Machine Learn & Score
Application-Ready Data and Analytics/ML Insights; Entities and Profiles; Detailed Data
Sources: OSS, Probes, Social, RAN, Inventory, Usage & Charging, CRM
Real-Time & Batch Connectors
Insight Platform
Analytical Application Framework: Dashboards & Visualisation, Decisioning Engine, Dynamic Micro Segmentation
Marketing; Network Care Operations
-
31.
Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/
• Big data happens when the data you have to process
is bigger than what you can process in the given time
with current technologies
• Big data analytics is when sample = N
• Big data is when we can handle data fast enough to
make a difference
-
32.
Additional takeaways
• CSPs have always been in the big data
business – they just didn’t know it
• Big data is not a panacea
• Hadoop is shaping up as the big data OS
– Though there are alternatives arriving from the
cloud arena (Mesos, Kubernetes)
-
33.
What we
covered here
is not even
the tip of the
iceberg
Source: wikimedia http://commons.wikimedia.org/wiki/File:Iceberg.jpg
-
34.
Arnon Rotem-Gal-Oz
Director of Technology Research, Amdocs
arnonrot@amdocs.com / arnon@rgoarchitects.com
Poem by John Godfrey Saxe
It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
50 Mil. People
7+ years of manual summations
I read a blog post by Gil Press stating that the first big data problem occurred in the 1880s (yes, you read that right). In the late 1800s, processing the US census was beginning to take so long that it was approaching 10 years. Crossing that mark matters because the census runs every 10 years, and with birth rates rising the outlook wasn’t good. In 1886 Herman Hollerith started a business (which years later merged with other companies to form IBM) to sell a tabulating machine that stored census data on punch cards. Indeed, the 1890 census took less than 2 years to complete and handled both a larger population (62 million people) and more data points than the 1880 census.
https://www.census.gov/history/www/census_then_now/notable_alumni/herman_hollerith.html
<< year instead of almost 10 years
62 Million people
1890 census
Large telco: 200M subscribers
Orders data: a few GB
Charge events: 100 TB per month
Network data: 800 TB per day
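To put these volumes in perspective, a back-of-the-envelope calculation converts them into sustained ingest rates. It assumes decimal units (1 TB = 10^12 bytes) and a 30-day month; both are assumptions, not figures from the slide.

```python
# Back-of-the-envelope ingest rates for the telco volumes above.
# Assumptions (not from the slide): decimal units and a 30-day month.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30

network_bytes_per_day = 800 * TB
charge_bytes_per_month = 100 * TB


def bytes_per_second(bytes_per_day: float) -> float:
    """Average sustained rate needed to ingest a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


charge_bytes_per_day = charge_bytes_per_month / DAYS_PER_MONTH
network_rate_gb_s = bytes_per_second(network_bytes_per_day) / 10**9
charge_rate_gb_s = bytes_per_second(charge_bytes_per_day) / 10**9

print(f"Network data:  {network_rate_gb_s:.2f} GB/s on average")
print(f"Charge events: {charge_rate_gb_s:.3f} GB/s on average")
print(f"Network/charge daily volume ratio: ~{network_bytes_per_day / charge_bytes_per_day:.0f}x")
```

Under these assumptions the network feed alone averages roughly 9.26 GB/s, about 240 times the daily volume of charge events, before accounting for peaks.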
So we pile up all this data – but what are we piling it for?
1992 Bill Clinton campaign – It’s the economy, stupid
http://upload.wikimedia.org/wikipedia/commons/0/06/UPS_Truck.jpg
Now get this: In 2007 alone, this helped us:
* shave nearly 30 million miles off already streamlined delivery routes.
* save 3 million gallons of gas, and
* reduce CO2 emissions by 32,000 metric tons, the equivalent of removing 5,300 passenger cars from the road for an entire year.
Retailers are the leaders in using analytics
Amazon is famous for that, but it is not alone
What Target discovered fairly quickly is that it creeped people out that the company knew about their pregnancies in advance.
“If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” Pole told me. “We are very conservative about compliance with all privacy laws. But even if you’re following the law, you can do things where people get queasy.”
Bold is mine. That’s a quote for our times.
So Target got sneakier about sending the coupons. The company can create personalized booklets; instead of sending people with high pregnancy scores books o’ coupons solely for diapers, rattles, strollers, and the “Go the F*** to Sleep” book, they more subtly spread them about:
“Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.
“And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”
Facebook and OkCupid experiments on users: http://www.geektime.co.il/okcupid-experiments-on-users/
Data from Actix (or other network sources): 20–30M subscribers would generate ~250K messages per second
Monitor for anomalies like dropped calls
Correlate with data from CRM (identify customer, account)
Analyze for impact on VIPs
Analyze for problems in the network
Automated action
Change SLAs
Notify customers (sorry note, small freebie, etc.) within 1–5 seconds of the problem; this can have a real-time impact on satisfaction
(We should avoid falling into the creepiness problem from the Target use case: “we know what you’re doing!!”)
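The proactive care flow in these notes (filter network events for anomalies, correlate with CRM, assess the impact on VIPs, then act) can be sketched as a small batch function. Everything here is hypothetical: the event and account types, the `failure_threshold` rule for flagging a cell, and the choice of actions. A production version would run continuously over the event stream rather than over an in-memory list.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class NetworkEvent:      # hypothetical event shape from a network probe
    subscriber_id: str
    cell_id: str
    kind: str            # e.g. "dropped_call" or "call_ok"


@dataclass(frozen=True)
class Account:           # hypothetical CRM view of a subscriber's account
    account_id: str
    is_vip: bool


def proactive_care(events: Iterable[NetworkEvent],
                   crm: Dict[str, Account],
                   failure_threshold: int = 3) -> Tuple[List[str], List[str]]:
    """Return (failing cells in technician priority order, VIP accounts to compensate)."""
    # Collect & filter: keep only the anomalies of interest.
    dropped = [e for e in events if e.kind == "dropped_call"]

    # Identify failures: a cell with many dropped calls is treated as failing.
    drops_per_cell = Counter(e.cell_id for e in dropped)
    failing = {cell for cell, n in drops_per_cell.items() if n >= failure_threshold}

    # Correlate with CRM: which VIP accounts were hit in each failing cell?
    vips_per_cell: Dict[str, Set[str]] = defaultdict(set)
    for e in dropped:
        account = crm.get(e.subscriber_id)
        if e.cell_id in failing and account is not None and account.is_vip:
            vips_per_cell[e.cell_id].add(account.account_id)

    # Prioritize technicians: cells hurting the most VIP accounts come first.
    by_priority = sorted(failing, key=lambda c: (-len(vips_per_cell[c]), c))

    # Close the loop: every affected VIP account gets a goodwill gesture.
    to_compensate = sorted({a for accounts in vips_per_cell.values() for a in accounts})
    return by_priority, to_compensate


crm = {"s1": Account("A1", True), "s2": Account("A2", False), "s3": Account("A3", True)}
events = [
    NetworkEvent("s1", "cellA", "dropped_call"),
    NetworkEvent("s2", "cellA", "dropped_call"),
    NetworkEvent("s2", "cellA", "dropped_call"),
    NetworkEvent("s3", "cellB", "dropped_call"),
    NetworkEvent("s3", "cellB", "dropped_call"),
    NetworkEvent("s1", "cellB", "dropped_call"),
    NetworkEvent("s2", "cellC", "dropped_call"),
    NetworkEvent("s2", "cellC", "call_ok"),
]
cells, vips = proactive_care(events, crm)
print("dispatch order:", cells)
print("compensate:", vips)
```

Keeping the compensation step limited to accounts the customer can already see were affected is one way to stay on the right side of the creepiness line mentioned above.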
Fraud analysis at a big telco, where insights arrive long after the fraud has ended:
Multiple connections with the same IP from different locations
Buying unlimited data and “reselling” it for Skype, etc.
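One way to express the first pattern as a rule is a sliding time window per IP address that flags an IP once it appears from more than one location within the window. This is a hedged sketch: the `(ip, location, timestamp)` record shape, the 10-minute window and the threshold are assumptions, and a real detector would need to account for legitimately shared IPs (NAT) before acting on a flag.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: (ip, location, unix_timestamp_seconds)
Connection = Tuple[str, str, int]


def suspicious_ips(conns: Iterable[Connection],
                   window_s: int = 600,
                   max_locations: int = 1) -> Set[str]:
    """Flag IPs seen from more than `max_locations` distinct locations
    within any window of `window_s` seconds."""
    by_ip: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for ip, location, ts in conns:
        by_ip[ip].append((ts, location))

    flagged: Set[str] = set()
    for ip, seen in by_ip.items():
        seen.sort()                     # process each IP's connections in time order
        in_window: Counter = Counter()  # location -> connections inside the window
        start = 0
        for ts, location in seen:
            in_window[location] += 1
            # Slide the window's left edge until it spans at most window_s seconds.
            while ts - seen[start][0] > window_s:
                old = seen[start][1]
                in_window[old] -= 1
                if in_window[old] == 0:
                    del in_window[old]
                start += 1
            if len(in_window) > max_locations:
                flagged.add(ip)
                break
    return flagged


connections = [
    ("10.0.0.1", "Tel Aviv", 0), ("10.0.0.1", "London", 120),  # two cities in 2 minutes
    ("10.0.0.2", "Paris", 0), ("10.0.0.2", "Paris", 300),      # same place
    ("10.0.0.3", "Rome", 0), ("10.0.0.3", "Madrid", 5000),     # far apart in time
]
print(suspicious_ips(connections))
```

Because the rule only needs each IP's recent connections, it could in principle be evaluated as events arrive, rather than in a batch job that reports the fraud after it has ended.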
Think of it as defining a view on a table, but the underlying data can be poly-structured or unstructured
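A minimal illustration of schema on read, assuming JSON-lines input: raw records are kept exactly as they arrived, and the "view" below applies its schema (which fields it needs and their types) only when the data is read. The record layout and the `read_calls` name are hypothetical; in Hadoop this role is typically played by something like a Hive table definition over files in HDFS, which the sketch does not use.

```python
import json
from typing import Dict, Iterator, List

# Raw records are stored as they arrived; nothing is validated at write time.
RAW_LINES: List[str] = [
    '{"subscriber": "s1", "event": "call", "duration_s": "42"}',
    '{"subscriber": "s2", "event": "sms"}',
    'not json at all',
    '{"subscriber": "s3", "event": "call", "duration_s": 7, "cell": "A"}',
]


def read_calls(lines: List[str]) -> Iterator[Dict[str, object]]:
    """A 'view' over raw lines: the schema (subscriber: str, duration_s: int)
    is imposed only when the data is read."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # this view ignores rows it cannot parse
        if not isinstance(rec, dict) or rec.get("event") != "call" or "duration_s" not in rec:
            continue  # rows outside the view's schema are skipped
        yield {"subscriber": str(rec["subscriber"]), "duration_s": int(rec["duration_s"])}


calls = list(read_calls(RAW_LINES))
print(calls)
```

A second view with a different schema (for example, one counting SMS events) could read the same `RAW_LINES` without reloading or migrating anything, which is the main appeal of the approach.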
CRM data
Map: identify subscriber and account
Group by account
Reduce: update account profile
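The map / group-by / reduce steps above can be simulated in-process to show how the data flows. This sketch is not the Hadoop API; the record fields, the hypothetical `SUBSCRIBER_TO_ACCOUNT` lookup standing in for CRM data, and the profile fields are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical CRM lookup: subscriber -> account.
SUBSCRIBER_TO_ACCOUNT = {"s1": "A1", "s2": "A1", "s3": "A2"}


def map_phase(record: Dict[str, object]) -> Iterator[Tuple[str, float]]:
    """Map: identify the subscriber's account and emit (account, charge)."""
    account = SUBSCRIBER_TO_ACCOUNT.get(str(record["subscriber"]))
    if account is not None:  # records with unknown subscribers are dropped
        yield account, float(record["charge"])


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group by account, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(account: str, charges: List[float]) -> Dict[str, object]:
    """Reduce: fold all of an account's values into an updated profile."""
    return {"account": account, "events": len(charges), "total_charge": sum(charges)}


def run_job(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    pairs = (pair for rec in records for pair in map_phase(rec))
    return [reduce_phase(acct, vals) for acct, vals in sorted(shuffle(pairs).items())]


records = [
    {"subscriber": "s1", "charge": 1.5},
    {"subscriber": "s2", "charge": 2.0},
    {"subscriber": "s3", "charge": 4.0},
    {"subscriber": "s9", "charge": 9.0},  # not in the CRM lookup
]
for profile in run_job(records):
    print(profile)
```

In a real cluster the map calls run in parallel next to the data blocks, and only the much smaller (account, charge) pairs cross the network during the shuffle.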
Average revenue per user - ARPU
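ARPU itself is a simple ratio: total revenue for a period divided by the number of users in that period. A minimal sketch, with the function name chosen for illustration:

```python
def arpu(total_revenue: float, subscribers: int) -> float:
    """Average revenue per user for one period: total revenue / subscriber count."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return total_revenue / subscribers


# Example: three customers with revenues of 100, 87 and 105 in the same period
# (values taken from the sample segmentation table; units unspecified).
example = arpu(100 + 87 + 105, 3)
print(f"ARPU: {example:.2f}")
```

Operators often divide by the average subscriber count over the period rather than a point-in-time count, since the subscriber base changes during the period.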
SQL on Hadoop
Streaming
“Enterprise Grade”
Volume
Velocity
(variety, ver