Big data Overview

An introduction to big data.
What big data is, why we'd want it, how it applies to CSPs, and a short intro to Hadoop

(some of the info is in the slide notes)

Published in: Software

  • Poem by John Godfrey Saxe

    It was six men of Indostan
    To learning much inclined,
    Who went to see the Elephant
    (Though all of them were blind),
    That each by observation
    Might satisfy his mind.
  • 50 Mil. People
    7+ years of manual summations

    I read a blog post by Gil Press stating that the first big data problem dates back to the 1880s (yes, you read that right). In the late 1800s, processing the US census was taking so long that it was getting close to 10 years. Crossing this mark is meaningful because the census runs every 10 years, and with birth rates rising the outlook wasn’t very good. In 1886 Herman Hollerith started a business (which years later merged with other companies to form IBM) to sell a tabulating machine that held census data on punch cards. Indeed, the 1890 census took less than 2 years to complete and handled both a larger population (62 million people) and more data points than the 1880 census.
  • https://www.census.gov/history/www/census_then_now/notable_alumni/herman_hollerith.html


    << year instead of almost 10 years
    62 Million people
    1890 census

  • Large Telco – 200M subscribers
    Orders data few GB
    Charge Events – 100TB per month
    Network 800TB - day
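To get a feel for the telco numbers above, the monthly and daily volumes can be turned into sustained throughput. This is a back-of-the-envelope sketch only; it assumes decimal terabytes and a 30-day month, neither of which is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30   # assumption: a 30-day month
TB = 10**12           # assumption: decimal terabytes


def bytes_per_second(total_bytes: float, seconds: float) -> float:
    """Average rate needed to ingest total_bytes within the given time."""
    return total_bytes / seconds


# Charge events: 100 TB per month
charge_rate = bytes_per_second(100 * TB, DAYS_PER_MONTH * SECONDS_PER_DAY)
# Network data: 800 TB per day
network_rate = bytes_per_second(800 * TB, SECONDS_PER_DAY)

print(f"Charge events: {charge_rate / 10**6:.1f} MB/s")
print(f"Network data:  {network_rate / 10**9:.2f} GB/s")
```

Under these assumptions the network feed needs a few hundred times the sustained bandwidth of the charging feed, which is why the two are usually handled by different pipelines.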
  • So we pile up all this data – but what are we piling it for?
    1992 Bill Clinton campaign – It’s the economy, stupid

    http://upload.wikimedia.org/wikipedia/commons/0/06/UPS_Truck.jpg


  • Now get this: In 2007 alone, this helped us:
    * shave nearly 30 million miles off already streamlined delivery routes.
    * save 3 million gallons of gas, and
    * reduce CO2 emissions by 32,000 metric tons, the equivalent of removing 5,300 passenger cars from the road for an entire year.
  • Retailers are the leaders in using analytics

    Amazon is famous for that but they are not alone
  • What Target discovered fairly quickly is that it creeped people out that the company knew about their pregnancies in advance.
    “If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” Pole told me. “We are very conservative about compliance with all privacy laws. But even if you’re following the law, you can do things where people get queasy.”
    Bold is mine. That’s a quote for our times.
    So Target got sneakier about sending the coupons. The company can create personalized booklets; instead of sending people with high pregnancy scores books o’ coupons solely for diapers, rattles, strollers, and the “Go the F*** to Sleep” book, they more subtly spread them about:
    “Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.
    “And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”






    http://www.geektime.co.il/okcupid-experiments-on-users/ <-Facebook, okCupid

  • Data from Actix (or other network sources): 20-30M subscribers would generate ~250K messages per second
    Monitor for anomalies like dropped calls
    Correlate with data from CRM (identify customer, account)
    Analyze for impact on VIPs
    Analyze for problems in the network
    Automated action
    Change SLAs
    Notify customers (sorry note, small freebie, etc.) less than 1-5 seconds after the problem; this can have a real-time impact on satisfaction
    (we should avoid falling into the creepiness problem mentioned in the Target use case: “we know what you’re doing!!”)
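The flow in this note (monitor for dropped calls, correlate with CRM, check the impact on VIPs, then act) can be sketched in a few lines. This is a minimal single-process illustration, not the Amdocs implementation; the event fields, the in-memory `CRM` lookup, and the `DROP_THRESHOLD` value are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkEvent:
    subscriber_id: str
    kind: str  # hypothetical event kinds: "call_dropped", "call_ok"


# Hypothetical CRM lookup: subscriber -> (account, is_vip)
CRM = {
    "s1": ("acct-A", True),
    "s2": ("acct-B", False),
    "s3": ("acct-A", True),
}

DROP_THRESHOLD = 2  # assumed: dropped calls per account before we act


def accounts_to_notify(events, crm=CRM, threshold=DROP_THRESHOLD):
    """Return VIP accounts whose dropped-call count reached the threshold."""
    drops = Counter()
    for event in events:
        # Collect & filter: keep only the anomaly we monitor for
        if event.kind != "call_dropped":
            continue
        # Correlate: identify the customer and account in CRM
        record = crm.get(event.subscriber_id)
        if record is None:
            continue
        account, is_vip = record
        # Impact analysis: only VIP accounts are counted here
        if is_vip:
            drops[account] += 1
    return sorted(account for account, n in drops.items() if n >= threshold)


events = [
    NetworkEvent("s1", "call_dropped"),
    NetworkEvent("s2", "call_dropped"),
    NetworkEvent("s3", "call_dropped"),
    NetworkEvent("s1", "call_ok"),
]
print(accounts_to_notify(events))
```

At ~250K messages per second the same logic would run inside a streaming engine with the counts kept per time window, but the collect, correlate, and act steps keep the same shape.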
  • Fraud analysis at a big telco – where insights arrive long after the fraud ended

    Multiple connections with same IP from different locations
    Buying unlimited data and “reselling” it for Skype etc.
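The first fraud signal in this note, the same IP connecting from different locations, comes down to a group-and-count. The sketch below is illustrative only; the `(ip, location)` record shape and the sample values are assumptions, not data from the deck.

```python
from collections import defaultdict


def suspicious_ips(connections, max_locations=1):
    """Return IPs seen from more than max_locations distinct locations."""
    locations_by_ip = defaultdict(set)
    for ip, location in connections:
        locations_by_ip[ip].add(location)
    return {
        ip: sorted(locations)
        for ip, locations in locations_by_ip.items()
        if len(locations) > max_locations
    }


# Hypothetical connection log: (ip, location) pairs
connections = [
    ("10.0.0.7", "Paris"),
    ("10.0.0.7", "Tokyo"),
    ("10.0.0.9", "Paris"),
    ("10.0.0.9", "Paris"),
]
print(suspicious_ips(connections))
```

Run as a nightly batch, this finds the fraud only after it has ended; evaluating the same check per incoming event is what shortens the delay.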
  • Think of it as defining a view on a table, but the underlying data can be
    poly-structured or unstructured
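A small sketch of the "view on a table" idea behind schema on read: the raw records are stored untouched, and a schema is applied only when they are read. The record contents and the `SCHEMA` mapping are made up for illustration, and a real Hadoop setup would define this in something like a Hive table rather than in Python.

```python
import json

# Raw data is stored as-is; nothing is validated at write time
RAW_RECORDS = [
    '{"subscriber": "s1", "arpu": "100", "device": "iPhone 5s"}',
    '{"subscriber": "s2", "arpu": 87}',
    "not json at all",
]

# The "view": field name -> converter, applied only when reading
SCHEMA = {"subscriber": str, "arpu": float}


def read_with_schema(raw_records, schema):
    """Yield only the records that can be projected onto the schema."""
    for line in raw_records:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured line: skipped by this particular view
        if not isinstance(record, dict):
            continue
        try:
            yield {name: convert(record[name]) for name, convert in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # record does not fit this view


rows = list(read_with_schema(RAW_RECORDS, SCHEMA))
print(rows)
```

A different schema can be laid over the same raw records later without rewriting them, which is the point of deferring the schema to read time.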
  • CRM data
    Map – identify subscriber, account
    Group by (account)
    Reduce – update account profile
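The three steps in this note (map, group by account, reduce) can be imitated in a single process to show the data flow. This is only a sketch of the pattern, not Hadoop code; the record fields and the shape of the account profile are assumptions.

```python
from collections import defaultdict


def map_phase(crm_records):
    """Map: identify subscriber and account, emit (account, record) pairs."""
    for record in crm_records:
        yield record["account"], record


def group_by_key(pairs):
    """Group by: collect every value emitted for the same account."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(account, records):
    """Reduce: fold one account's records into an updated profile."""
    return {
        "account": account,
        "subscribers": sorted(r["subscriber"] for r in records),
        "total_revenue": sum(r["revenue"] for r in records),
    }


# Hypothetical CRM records
crm_records = [
    {"subscriber": "s1", "account": "acct-A", "revenue": 100},
    {"subscriber": "s2", "account": "acct-B", "revenue": 87},
    {"subscriber": "s3", "account": "acct-A", "revenue": 105},
]

grouped = group_by_key(map_phase(crm_records))
profiles = {account: reduce_phase(account, recs) for account, recs in grouped.items()}
print(profiles)
```

In a real cluster the map and reduce functions run on many nodes and the grouping is the shuffle step the framework performs between them.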
  • Average revenue per user - ARPU
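ARPU is one of the two axes in the customer segmentation slides (K-Means over ARPU and Age, K=3). Below is a minimal pure-Python sketch of Lloyd's k-means on such points. The sample customers are invented, and the deterministic farthest-point initialisation is a choice made here for repeatability; it is not taken from the deck or from the linked pypr page.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then
    move each centroid to the mean of its members, until stable."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers as (ARPU, age) pairs
customers = [
    (100, 22), (105, 21), (98, 24),   # young, high ARPU
    (87, 42), (85, 45), (90, 40),     # middle-aged, medium ARPU
    (40, 60), (35, 65), (42, 58),     # older, low ARPU
]
labels, centroids = kmeans(customers, k=3)
print(labels)
```

ARPU and age happen to sit on similar scales in this toy data; with real features it is usual to normalise each axis first so that one of them does not dominate the distance.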
  • SQL on Hadoop
    Streaming
    “Enterprise Grade”
  • Volume
    Velocity

    (variety, ver
  • Big data Overview

    1. 1. BIG DATA Arnon Rotem-Gal-Oz Director of Technology Research, Amdocs The blind men and the elephant. Poem by John Godfrey Saxe (Cartoon originally copyrighted by the authors; G. Renee Guzlas, artists http://www.nature.com/ki/journal/v62/n5/fig_tab/4493262f1.html
    2. 2. 1880 US Census
    3. 3. Hollerith Tabulating Machine Hollerith photos by Martin Wichary : http://www.flickr.com/photos/mwichary/4358926764/in/photostream/
    4. 4. Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/ Big data happens when the data you have to process is bigger than what you can process in the given time with current technologies
    5. 5. Myth: Big data = keep all data Source: Big Data Public Private Forum : http://www.big-project.eu/sites/default/files/D2.2.1_First%20draft%20of%20Technical%20white%20papers_FINAL_v1.01_0.pdf
    6. 6. Source: Big Data Public Private Forum : http://www.big-project.eu/sites/default/files/D2.2.1_First%20draft%20of%20Technical%20white%20papers_FINAL_v1.01_0.pdf
    7. 7. Some Telco Numbers Source: Wikipedia http://upload.wikimedia.org/wikipedia/commons/5/50/Telephone_operators,_1952.jpg
    8. 8. So, what do we do with all this data? Wikipedia http://upload.wikimedia.org/wikipedia/commons/0/06/UPS_Truck.jpg
    9. 9. It’s the insights, stupid* * With apologies to Bill Clinton
    10. 10. Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/ Big data analytics is when sample = N • Big data happens when the data you have to process is bigger than what you can process in the given time with current technologies
    11. 11. “My daughter got this in the mail! She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” Source: Forbes http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
    12. 12. We need to watch out that Analytics won’t get too creepy
    13. 13. When people hear big data they think fast data Source: Steve Jones Cap Gemini http://www.no.capgemini.com/node/778541
    14. 14. Subscribers Collect & Filter Correlate (simplified) Network proactive care flow Account Event Store Identify & Predict Network Failures Reimburse VIPs Prioritize technicians Identify impact on high valued Accounts
    15. 15. Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/ Big data is when we can handle data fast enough to make a difference • Big data happens when the data you have to process is bigger than what you can process in the given time with current technologies • Big data analytics is when sample = N
    16. 16. Technology space
    17. 17. The Elephant in the room
    18. 18. Hadoop Stack Map/Reduce HDFS HBase Pig Hive ZooKeeper Oozie Mahout Giraph
    19. 19. Schema on read
    20. 20. Move data to computation
    21. 21. Maybe we should rethink moving data to computation… Source : http://my-inner-voice.blogspot.co.il/2012/06/haddop-101-paper-by-miha-ahronovitz-and.html
    22. 22. Map/reduce Source: http://www.bodhtree.com/blog/2012/10/18/ever-wondered-what-happens-between-map-and-reduce/
    23. 23. Customer Segmentation First name Last name ARPU Age Device Country … Mr. Smith 100 22 iPhone 5s,White USA John Doe 87 42 Samsung Galaxy S5,Gold France Lady In Red 105 21 Samsung Note 3, White UK … Uluru, Australia by Stuart Edwards (cc) http://en.wikipedia.org/wiki/Uluru#mediaviewer/File:Uluru_Panorama.jpg
    24. 24. K-Means ARPU Age Source : http://pypr.sourceforge.net/kmeans.html
    25. 25. K=3 ARPU Age Source : http://pypr.sourceforge.net/kmeans.html
    26. 26. New Paradigms Map/Reduce HDFS HBase Pig Hive ZooKeeper Oozie Mahout Giraph
    27. 27. New Paradigms Map/Reduce HDFS HBase Pig Hive ZooKeeper Oozie Mahout YARN Giraph
    28. 28. New Paradigms Map/Reduce HDFS HBase Pig Hive ZooKeeper Oozie Mahout YARN Giraph Spark Storm Slider Flink Impala Tez Presto
    29. 29. Amdocs Analytics & Data Management Heritage 2013 • Proactive Care • TerraScale • Network optimization • Real time analytics platform • Single product catalog • BSS–OSS Integration • CRM-Billing Integration OSS Analytics Platform, 16 Analytics Patents • aLDM logical data model • Policy control Network Analytics CRM 2000 2008 Acquisitions Portfolio
    30. 30. 34 Information Security Level 2 – Sensitive © 2014 – Proprietary and Confidential Information of Amdocs Touchpoints & Applications CRM Self Service E-Mail PCRF SMS Other Wi-Fi Offload Campaign Mng. Operational Envelope & Platform Administration • Security Management • Configuration Management • Services Inventory • Performance Management • Fault Management • Logger Collect & Ingest Transform & Enrich Aggregate & Correlate Drive Insight Close the Loop Machine Learn & Score Application-Ready Data and Analytics/ML Insights Entities and Profiles Detailed Data OSS Probes Social RAN Inventory Usage & Charging CRM Real-Time & Batch Connectors Insight Platform Marketing Analytical Application Framework: Dashboards & Visualisation Decisioning Engine Dynamic Micro Segmentation Network Care Operations
    31. 31. Source: Silicon Angle http://siliconangle.com/blog/2013/11/13/how-big-is-big-data-really/ • Big data happens when the data you have to process is bigger than what you can process in the given time with current technologies • Big data analytics is when sample = N • Big data is when we can handle data fast enough to make a difference
    32. 32. Additional takeaways • CSPs have always been in the big data business – they just didn’t know it • Big data is not a panacea • Hadoop is shaping up as the big data OS – though there are alternatives arriving from the cloud arena (Mesos, Kubernetes)
    33. 33. What we covered here is not even the tip of the iceberg Source: wikimedia http://commons.wikimedia.org/wiki/File:Iceberg.jpg
    34. 34. Arnon Rotem-Gal-Oz Director of Technology Research, Amdocs arnonrot@amdocs.com / arnon@rgoarchitects.com
