
Flutura's presentation @ Big Data Conclave

Flutura presented at the Big Data Conclave. Please find the presentation below.

Published in: Technology

  1. 1. Agenda • 3 Industries , 5 real life Flutura user stories • 7 Key “Gotchas” & Big Data Best Practices
  2. 2. Case Study-1 : Reducing Network threats by Detecting Patterns in perimeter device logs
  3. 3. What is the Biz problem being solved ?
  4. 4. What is the problem being solved? Network threats are growing ...
  5. 5. What is the problem being solved? • 2 types of threats – Internal (social unrest & watch list; internal activists) & External (external hackers)
  6. 6. Who is experiencing the pain ? Telecom Security Operations centre
  7. 7. Lots of Telecom Machine data left untapped ! This is typically flushed but has gold in it
  8. 8. Why is it important to solve this problem? • Reduces network disruption from hackers • Minimize social disruption and unrest
  9. 9. Traditional RDBMS architectures can't handle high velocity machine data!
  10. 10. SOCs can't see threat patterns … running BLIND • Being blind = risk • Cannot be blind to patterns anymore • The capability to “see” patterns not previously seen • Network activity and behaviour – firewalls, routers • Saves lives, provides social stability – WL chatter!
  11. 11. The capability to remove “data blindfolds” and “SEE” behavioural patterns is key to security. MACHINE DATA IS KEY TO UNCOVERING SECURITY PATTERNS!
  12. 12. What are some “behavioural signatures”? 1. Sudden increase in YouTube uploads at night 2. Viral rate of propagation of MMS videos
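One simple way to flag a "sudden increase" is a 2 sigma threshold, a term the deck itself uses for watched events. The sketch below is only an illustration under that assumption: the function name, the sample counts, and the choice of a sample standard deviation are hypothetical and not taken from the presentation.

```python
from statistics import mean, stdev


def is_two_sigma_spike(history, current, sigmas=2.0):
    """Return True if `current` exceeds mean(history) + sigmas * stdev(history).

    `history` is a list of past counts (e.g. nightly upload counts).
    With fewer than two past observations no spread can be estimated,
    so nothing is flagged.
    """
    if len(history) < 2:
        return False
    threshold = mean(history) + sigmas * stdev(history)
    return current > threshold


# Hypothetical nightly upload counts for one subscriber segment.
night_uploads = [10, 12, 9, 11, 10, 13, 11]

spike = is_two_sigma_spike(night_uploads, 40)    # far above the usual range
normal = is_two_sigma_spike(night_uploads, 12)   # within the usual range
```

A production system would likely compare like with like (same hour of day, same day of week) before applying a threshold of this kind.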
  13. 13. So what does the data look like ? National content filtering log – 1 billion events/day !
  14. 14. Decoding national content filtering logs: the 7 components of a Netsweeper log entry • Sample entry: 1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37 • 1. Timestamp (EPOCH time stamp) • 2. URL requested • 3. Source IP • 4. Client subnet • 5. Client group name • 6. Denied flag (0 allowed, 1 denied) • 7. URL category (description TBD) • 50 categories in the system: Education, Pornography, Phishing, Criminal Skills, etc. “23” is related to “Pornography”; “45” is related to “GENERAL”
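The seven components above can be sketched as a small parser. This is a minimal illustration, assuming the entry is whitespace-delimited and that no field contains spaces (the slide shows the sample that way, but the real delimiter is not stated); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FilterLogEntry:
    timestamp: datetime   # component 1: epoch time stamp
    url: str              # component 2: URL requested
    source_ip: str        # component 3: source IP
    client_subnet: str    # component 4: client subnet
    client_group: str     # component 5: client group name
    denied: bool          # component 6: 0 allowed, 1 denied
    category_id: int      # component 7: URL category


def parse_entry(line: str) -> FilterLogEntry:
    """Split one log line into the 7 components described on the slide."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    epoch, url, source_ip, subnet, group, flag, category = parts
    return FilterLogEntry(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        url=url,
        source_ip=source_ip,
        client_subnet=subnet,
        client_group=group,
        denied=(flag == "1"),
        category_id=int(category),
    )


# The sample entry shown on the slide.
SAMPLE = (
    "1329031890 http://photogallery.indiatimes.com/photo/4686985.cms "
    "94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37"
)
entry = parse_entry(SAMPLE)
```

At a billion events per day a parser like this would run inside a distributed ingestion job rather than a single process; the sketch only shows the field mapping.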
  15. 15. Expand to ingest a variety of watched events • OS logs: file delete events, user login failure events, root access failures, 2 sigma events • Database logs: table drop events, table delete events, column drop events, critical proc recompilation • Application logs: critical tsn value changes, master data changes, app login failures, login at unusual time windows • Web server logs: search for specific keywords, 2 sigma event for URLs, decomp tree of failed requests, login failure • CDR logs / IPR logs: dropped call frequency, watch list inbound/outbound, cut calls (poor connection), call failure event frequency, timeout event frequency, swarm event detected, dropped IP calls frequency, failed IP call frequency • SMS: SMS capacity events, unusual SMS traffic events • Router logs: user defined router events, compliance related router event • Firewall logs: odd hour unsuccessful logins, X happens Y times in Z time, user defined firewall events, compliance oriented firewall e • Example patterns: frequency of login failures high in certain pockets; recency of late night events noticed in certain pockets; certain corridors experiencing high dropped calls
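The "X happens Y times in Z time" rule listed above can be sketched with a sliding window over event timestamps. This is an illustrative sketch only: the class name is hypothetical, and it assumes timestamps arrive in non-decreasing order, which the presentation does not state.

```python
from collections import deque


class FrequencyRule:
    """Fires when an event has been seen `threshold` times within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def observe(self, ts: float) -> bool:
        """Record one occurrence at time `ts`; return True if the rule fires."""
        self._times.append(ts)
        # Drop occurrences that have fallen out of the window ending at `ts`.
        while self._times and ts - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.threshold


# Hypothetical rule: 3 login failures within 60 seconds.
rule = FrequencyRule(threshold=3, window_seconds=60)
results = [rule.observe(ts) for ts in (0, 10, 100, 120, 150)]
```

In this run only the last observation fires: the occurrences at 100, 120 and 150 fall inside one 60-second window, while the first two have already aged out.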
  16. 16. Converting raw data to actionable intelligence: holistic value chain • Sources: telecom switches, other devices (CDR log files, IP log files, misc log files) • Log file ingestion • Big data repository • Integrated event 360 repository • Machine learning algorithms on granular log event data • Infer intent from patterns and create event profiles • Load risk / behavior profile to rules engine DB • Sense & respond layer • Intercept or offline review of events • Case management workflow • Consolidate & review event intercepts to assess event rule effectiveness • Measure pattern rule effectiveness: true positives / false positives
  17. 17. Case Study-2 : Decoding travellers' intent
  18. 18. What's the problem we are trying to solve? • Travellers are “signalling” to us through the behaviour they exhibit • The OTA is unable to sense and respond to these varied behaviours
  19. 19. Why is it important to solve this problem ? • Impacts look to book • Increase revenue from cross sell
  20. 20. Srikanth intends to travel from San Fran to NYC
  21. 21. Srikanth searches !
  22. 22. Srikanth's First Moment of Truth !
  23. 23. Srikanth sees the options rendered !
  24. 24. Is Srikanth Price Sensitive or Time conscious traveller? 87 % 13%
  25. 25. Does Srikanth have a bias towards any airline ? Those small clicks reveal a lot !
  26. 26. So who is Srikanth? Do we 'know' him? What's his behavioral DNA? Key vectors? • Early bird (days = 21) • Price insensitive (click % = 89%) • Prefers American Airlines • Most valuable customer (Decile-1) • Intra visit interval = 17 days • Visit dispersion = 12% International • Churn propensity = 0 • Bargain hunter = No (3% coupon) • Roadie = Yes (28000 miles per qtr) • Sentiment index = 73%
  27. 27. How do we respond in real time to Srikanth's experience and the behavioural patterns we’ve seen? • If Srikanth is a high value customer • If he does not book within the 8 min window • In real time, route to a high performing agent • Short circuit the queue • Extra 10% discount since he is vulnerable • If search response time velocity is trending downward • Signal to beef up infrastructure • Optimise code base • Property recommendations
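The first rule on this slide (high value customer, no booking within the 8 minute window) can be sketched as a small decision function. The names, the `Session` fields and the action strings are hypothetical; only the conditions and the three responses come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Session:
    is_high_value: bool          # e.g. derived from the customer's decile
    minutes_since_search: float  # time elapsed since the search was made
    has_booked: bool


def next_actions(session: Session, window_minutes: float = 8.0) -> list:
    """Return the responses for a high value customer who has not booked in time."""
    if session.has_booked or not session.is_high_value:
        return []
    if session.minutes_since_search < window_minutes:
        return []  # still inside the booking window, do nothing yet
    return [
        "route_to_high_performing_agent",
        "short_circuit_queue",
        "offer_extra_10_percent_discount",
    ]


at_risk = next_actions(Session(True, 9.5, False))
still_browsing = next_actions(Session(True, 3.0, False))
low_value = next_actions(Session(False, 20.0, False))
```

Keeping the rule as plain data-in, actions-out makes it easy to load into a rules engine later, which is the direction the deck describes elsewhere.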
  28. 28. Case Study-3 : Watched List
  29. 29. What is the problem being solved? • Internal watch lists • Can we get e signals in their behavior? – Call patterns? – SMS patterns? – YouTube upload patterns? – Watched countries? – Intra watch list chatter? – Late night communication behavior? • Watch list activity intelligence takes 6 weeks • Bring it down to < day • Enhance it to make it real time
  30. 30. Why is it important to solve this problem ? • Threat signals are there in telecom and communication logs • Saves lives ! • Ensures national security !
  31. 31. Under the hood • Remote Authentication Dial-In User Service (RADIUS) provides authentication, authorization and accounting for network access. • When a user wants to access the Internet, they first give their credentials (in most cases a username and password) to a local RADIUS client.
  32. 32. Deconstructing RADIUS logs • The IP address of the NAS (Network Access Server) that is sending the request • The framed address to be configured for the user • 3 time stamps • User identity
  33. 33. Radius logs Netsweeper logs Subscriber database Rich Security intelligence ! Triangulate from 3 event data pools
  34. 34. Co-relating fragmented telecom log files: info model • RADIUS logs: NAS port ID, username, framed IP address • Netsweeper log entry: 1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37 (date/time, URL accessed, client IP address, status) • Subscriber database: customer type (enterprise, residential), customer ethnicity (Asian, European), post paid (yes, no) • Access/device (smart phone, desktop, iPad, others): business rule to derive access device to be elicited from SME • Customer browse location (Dubai, ?, ?): location mapping business logic to be elicited from SME • URL type: gaming sites, news sites, social networking, blogs, P2P sites, VPN/VOIP, others • Date/time: day, week
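Correlating the three event pools can be sketched as a lookup chain: content filter event, then RADIUS session, then subscriber record. The join keys here are assumptions, not something the slides state: the sketch assumes the filter log's source IP matches a RADIUS framed IP during the session's time span, and that the subscriber database is keyed by username. All record layouts and sample values other than the log entry's IP and timestamp are hypothetical.

```python
def enrich(event, radius_sessions, subscribers):
    """Attach username and subscriber attributes to one content filter event.

    Assumed join: event source IP == session framed IP, with the event
    timestamp falling inside the session's start/stop interval.
    """
    for session in radius_sessions:
        same_ip = session["framed_ip"] == event["source_ip"]
        in_session = session["start"] <= event["timestamp"] <= session["stop"]
        if same_ip and in_session:
            profile = subscribers.get(session["username"], {})
            return {**event, "username": session["username"], **profile}
    return {**event, "username": None}  # no matching session found


# Hypothetical records; only the IP and timestamp come from the sample log entry.
radius_sessions = [
    {"username": "user_a", "framed_ip": "94.200.107.14",
     "start": 1329030000, "stop": 1329035000},
]
subscribers = {"user_a": {"customer_type": "Residential", "post_paid": True}}

event = {"timestamp": 1329031890, "source_ip": "94.200.107.14", "category_id": 37}
matched = enrich(event, radius_sessions, subscribers)
unmatched = enrich({**event, "source_ip": "10.0.0.1"}, radius_sessions, subscribers)
```

The time-interval check matters because framed IP addresses are typically reassigned between sessions, so the IP alone does not identify a subscriber.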
  35. 35. Calls to watched countries Intra Watch list Chatter velocity is high Call patterns reveal malicious intent
  36. 36. Watch List Recommender data product: modeling unique behavioural signature • Entity on watch list • NOT on watched list but high level of interactions • Are people ‘n’ degrees away from the watched list performing 2 sigma activity across multiple call dimensions – SMS, voice, conference and other behavioral activity? • CDR: from BTN, to TN, date/time, duration, call type, approximate tower location which carried call
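Finding people 'n' degrees away from the watch list can be sketched as a breadth-first search over a call graph built from CDR (from BTN, to TN) pairs. This is an illustrative sketch, assuming calls are treated as undirected links; the function names and sample identifiers are hypothetical.

```python
from collections import defaultdict, deque


def build_call_graph(cdr_pairs):
    """Build an undirected adjacency map from (from_btn, to_tn) pairs."""
    graph = defaultdict(set)
    for caller, callee in cdr_pairs:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_n_degrees(graph, watch_list, n):
    """Return {entity: degree} for every entity at most n hops from the watch list."""
    distance = {entity: 0 for entity in watch_list}
    queue = deque(watch_list)
    while queue:
        current = queue.popleft()
        if distance[current] == n:
            continue  # do not expand beyond n hops
        for neighbour in graph.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


# Hypothetical CDR pairs; "A" is the only watch list entity.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
degrees = within_n_degrees(build_call_graph(calls), ["A"], n=2)
```

The entities returned with degree 1 or more are the candidates to score for 2 sigma activity; the search itself says nothing about intent.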
  37. 37. Discarded Telecom data--> Actionable Security patterns
  38. 38. Case Study-4 : Mobile forensics
  39. 39. Mobile funnel data Analyzing Mobile Sub Channel Behavioural shift to Drive revenues for a leading online travel company
  40. 40. What's the problem being solved ? • More applications becoming mobile • There is a dip in transaction completion rate • Friction points and hot spots exist • No way to “see” these hot spots and patterns
  41. 41. • Spot friction points • Mobile funnel drops • Payment gateway drops • Airline connector drops
  42. 42. Funnel Analysis
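A funnel analysis of this kind usually reduces to the drop-off rate between consecutive stages. The sketch below is a minimal illustration: the stage names and counts are made up for the example and do not come from the presentation.

```python
def funnel_dropoff(stage_counts):
    """Return (transition, drop-off rate) for each pair of consecutive stages.

    `stage_counts` is an ordered list of (stage_name, sessions_reaching_stage).
    """
    results = []
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        drop = 0.0 if prev_n == 0 else 1 - n / prev_n
        results.append((f"{prev_name} -> {name}", round(drop, 3)))
    return results


# Illustrative counts only.
stages = [
    ("search", 1000),
    ("select_flight", 400),
    ("payment", 200),
    ("confirmation", 150),
]
dropoffs = funnel_dropoff(stages)
worst_transition = max(dropoffs, key=lambda item: item[1])[0]
```

Splitting the same calculation by sub-channel (for example device type or payment gateway) is what surfaces the friction points the previous slide lists.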
  43. 43. Churn Scoring Model
  44. 44. Case Study-5 : Money transmission
  45. 45. Minimizing fund leakages to watched entities Money transmission event stream Threat matrix Graph Analysis
  46. 46. Money transmission behavioral modeling
  47. 47. Modeling money transmission behavior
  48. 48. Graph analysis to monitor money transmission patterns • Each account can be modelled as a node in a graph • Behaviour across nodes can be analyzed • Proxy behaviours can be easily discerned
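Modelling each account as a node makes simple per-node behaviour easy to compute. The sketch below flags "pass-through" accounts that forward almost everything they receive; treating that as a proxy signal is an assumption made for illustration, since the slides do not define proxy behaviour. The function name, the 0.9 ratio and the sample transfers are hypothetical.

```python
from collections import defaultdict


def pass_through_accounts(transfers, min_ratio=0.9):
    """Return accounts whose inflow and outflow totals nearly match.

    `transfers` is a list of (sender, receiver, amount) edges.
    An account is flagged when both totals are positive and the smaller
    one is at least `min_ratio` of the larger one.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for sender, receiver, amount in transfers:
        outflow[sender] += amount
        inflow[receiver] += amount

    flagged = []
    for account in sorted(set(inflow) | set(outflow)):
        received, sent = inflow[account], outflow[account]
        if received > 0 and sent > 0:
            if min(received, sent) / max(received, sent) >= min_ratio:
                flagged.append(account)
    return flagged


# Hypothetical transfers: "p" receives 1000 and forwards 950.
transfers = [("a", "p", 1000.0), ("p", "w", 950.0), ("b", "c", 200.0)]
proxies = pass_through_accounts(transfers)
```

A dedicated graph database would express the same idea as a query over nodes and relationships rather than in-memory dictionaries.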
  49. 49. 7 Key “gotchas” ( best practices)
  50. 50. Lesson-1 : Think “Polyglot persistence” • Logical business model: asset, sensor, parameters, asset tags, sensor tags, events • Column family (HBase/Cassandra): heavy duty write workloads • Document DB (Mongo): photos, videos, text • Graph DB (Neo4j): inter-relationships • RDBMS (Oracle): low velocity self service • “Different strokes for different folks”
  51. 51. Lesson-2 : Think “pattern extraction” 1. Collaborative filtering 2. Text Mining 3. Scoring Models ( Logistic etc ) Embedding one ML process can help SPOT patterns not previously seen
  52. 52. Lesson-3 : Think “Baby steps” • 60-90 day Hadoop Sandbox • Build quick wins to build momentum • Pick a few low hanging use cases to demonstrate impact No Big Bang !
  53. 53. Lesson-4 : Think “Data Products” • Data Product = “Action an end user takes” • EXAMPLE • Watch List recommender vs tons of “feel good” graphs • Next best action vs lots of dials, graphs • Focus on Outcomes more than Analysis
  54. 54. Lesson-5 : Think “MVP-Minimum Viable Product” • Minimalist ... Key is to start simple • Only core features ... No bells and whistles • Get feedback from early adopters and enrich features
  55. 55. How can Big Data co-exist with existing DW solutions? Big Data / Existing DW
  56. 56. Lesson-6 : Gracefully co-exist • Existing DW • OSS, BSS, CRM • ETL • Existing BI tools • Radius logs, IP traffic logs, comments • File copy / bulk load / agent based • Operational app integration
  57. 57. Lesson-7 : Think “Biz backward … NOT Tech forward” 1. What is the business problem you are solving? Is it tightly framed? 2. Why is it important to solve this problem? 3. What happens if we don't solve this problem? 4. Is status quo an option? 5. Is the business pain acknowledged? 6. How would the end user “feel” when the product is deployed? 7. Are budgets allocated? 8. What is the actual use case to solve the pain? Connect with business @ a deeper level!
  58. 58. 1. Think “Polyglot Persistence” 2. Think “Pattern Extraction” 3. Think “Crawl-Walk-Run” 4. Think “Data Products” 5. Think “MVP” 6. Think “Co-existence” 7. Think “Business Impact/Outcomes” To summarize !
  59. 59. Taming and channelising the data beast is going to be a crucial capability for survival
  60. 60. Please feel free to reach out … Derick.jose@fluturasolutions.com
