Big Data Analytics Summit - April, 2014


Presentation on Big Data Analytics at Vellore Institute of Technology, Chennai

Published in: Technology, Business

  • 1. Big Data Analytics: Industry Perspective. Shankar Radhakrishnan
  • 2. Topics • Market Research • Market Trends • Big Data Analytics in • Banking and Financial Services • Insurance • Travel and Hospitality • Retail • Life Sciences • Manufacturing • Telecommunications • Challenges vs. Opportunities • Q & A
  • 3. 3
  • 4. Market Research
  • 5. Key Trends Driving Big Data Analytics, by Industry
    Financial Services
      ▪ Customer Insights – Integrating transactional data (CRM/payments) with unstructured social feeds
      ▪ Regulatory Compliance – Risk exposures across asset classes, LOBs and firms
      ▪ Fraud Detection in credit cards and Financial Crimes (AML) in banks
    Travel, Hospitality & Retail
      ▪ Customer Centricity – Customer behavior analysis from omni-channel retailing and social feeds
      ▪ Markdown Optimization – Improve markdowns based on actual customer buying patterns
      ▪ Market Basket Analysis – Narrow down market basket analysis by demographics
    Life Sciences
      ▪ Improve Targeting & Predictions – Automatic detection of Adverse Drug Effects (ADEs)
      ▪ Patient Data Analysis – Longitudinal Patient Data (LPD) analysis
      ▪ Predictive Sciences – Analyze preclinical side effect profiles of marketed drugs
    Healthcare (Payers & Providers)
      ▪ Cost of Care – Drug effectiveness and cost of care analysis based on electronic health records (EHR/EMR)
      ▪ Self Service Healthcare – Increase in mHealth and eHealth to allow consumer access to health information
      ▪ Claims Analytics – Analyze insurance claims data for fraud detection and preferred treatment plans
    Communication, Media & Entertainment
      ▪ Discover churn patterns based on call data records (CDRs) and activity in subscribers’ networks
      ▪ Digital Asset Management (DAM) – Analyze and capitalize on digital data assets
    Manufacturing
      ▪ Proactive Maintenance & Recommendation – Sensor monitoring for automobiles, buildings and machinery
      ▪ Energy Efficiency – Leveraging smart meters for utility energy consumption
      ▪ Location or Proximity Tracking – Location-based analytics using GPS data
    Hi-Tech
      ▪ Extend and complement the conventional information supply chain with a big data path
      ▪ Predictive analysis and real-time decision support
  • 6. Well Informed Customer Service Executive
    John calls a customer care executive at the bank. He is irritated with the services offered to him and shows signs of wanting to switch.
    The executive validates the customer’s identity and pulls up an application powered by Big Data that presents all the information relevant to making a decision.
    The Big Data application converts his speech to text in real time and identifies his propensity to churn.
    Based on John’s tonal sentiment, the application immediately pulls up the top 5 offers or decisions to take, based on the Customer Persona information, which contains likes/dislikes, past experiences, the channels he prefers, CLV (Customer Lifetime Value), etc.
  • 7. Well Informed Customer Service Executive (diagram)
    Diagram elements: Unstructured Data (Social media, Depositions, Complaints, Voice Data); Speech to Text Conversion; Sentiment Analysis (customer’s state of mind); Other Channel information (ATM, Branch); Big Data Warehouse; Traditional Warehouse; Analytical System; Decision Engine; Customer Persona
    Customer Persona
      • Demographics
      • Top interactions
      • Channel Preferences
      • Dis-satisfiers
      • Customer Lifetime Value
      • Recent Contact History
      • Customer Sentiment
      • Trend during the call
    • The Customer Executive Dashboard presents all the intelligence required to make a decision
    • The decision engine also presents important decisions to be taken for the particular customer issue
  • 8. Fraud Pattern Analysis & Detection
    Envisaged Benefits
      ▪ New fraud patterns can be identified by building ‘analytical models’ to run against X yrs. of history data
      ▪ ‘Web crawling’, ‘contextual text analysis’ and ‘natural language processing’ allow fraud behavior to be identified from social media, which may increase the fraud detection success rate
      ▪ ‘Real time’ models capture behavioral patterns and run pattern analysis against history data to evaluate the validity of a fraud case. The model learns by itself and updates the ‘fraud pattern master sets’, which brings ‘artificially intelligent’ fraud pattern detection and analysis
      ▪ ‘Real time’ alerts (refresh rate on the order of 0.5–1 minute) to fraud analysts about ‘self learned’ fraud patterns based on new customer behavior patterns
    Process
      ▪ Formation of key-value groups on the order of XcY (where X is the number of attributes relevant to fraud and Y is the number of attributes combined to identify a pattern)
      ▪ High-speed loading of history data from source systems
      ▪ Efficient real-time fraud detection by identifying patterns in customer behavioral events and processing them against X yrs. of history data
    Scenario
      ▪ Formation of fraud patterns using
        • Real-time data coming from different departments such as IVR, Web, Customer profile, Transactions, etc.
        • Real-time mining and analysis of history data to form prior patterns
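The "XcY" group count on the fraud slide can be read as the binomial coefficient C(X, Y): the number of ways to pick Y attributes out of X. That reading is an assumption, and the attribute names below are hypothetical, since the slide lists none. A minimal sketch of forming the groups:

```python
from itertools import combinations
from math import comb


def key_value_groups(attributes, group_size):
    """Return every unordered group of `group_size` attributes."""
    return list(combinations(attributes, group_size))


# Hypothetical fraud-relevant attributes (X = 4); not taken from the slides.
attributes = ["merchant_category", "country", "channel", "hour_of_day"]

# Combine Y = 2 attributes at a time to form candidate pattern keys.
groups = key_value_groups(attributes, 2)

# The number of groups equals C(X, Y).
print(len(groups), comb(len(attributes), 2))
```

The count grows quickly with X, which is one reason a process like this would lean on a scale-out platform.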
  • 9. Self Learning Fraud Detection (diagram)
    Data: Legacy Fraud Data, Customer Profile Data, Social Media Data, Card Transaction Data
    History Data Processing to find Fraud Patterns over years
    Real-time Customer Behavior Analysis for Fraud Detection: real-time analysis of behavior patterns, real-time update to Decision Engine
    Decision Engine: Approval/Denial Decision
  • 10. Cross Channel Analytics
    John exhibits a specific pattern when he avails services. He always visits the bank when he wants to deposit a check. He prefers most other operations to be online. He has recently started paying his utility bills through mobile.
    • The analytical solution integrates customer transactions across channels and reveals insights into the customer’s channel preferences and activities.
    • It also integrates data from call centers, surveys and complaints, and measures customer experience.
    • It reveals customer activity across channels, which is normally not available at a customer touch-point, to deliver superior service.
    • It reveals opportunities to consolidate channels and optimize cost of operations by incentivizing customers to choose one medium over another.
    The analytical solution produces:
    • Dominant Path Analysis specifying which channel John uses for which events
    • Service Behavior Segmentation
    • Customer Journey analysis
    • Root Cause & Repeat Issue analysis
    • Longitudinal analysis of changes in customer preference
    This helps the bank deliver superior service and also optimize cost on specific channels.
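One simple way to approximate the Dominant Path Analysis from the cross-channel slide is to count, per event type, which channel a customer uses most often. The sketch below assumes interactions arrive as (event, channel) pairs. The event names and the sample history are invented to mirror John's behavior, not taken from any real system.

```python
from collections import Counter, defaultdict


def dominant_channels(interactions):
    """Map each event type to the channel used most often for it."""
    counts = defaultdict(Counter)
    for event, channel in interactions:
        counts[event][channel] += 1
    return {event: c.most_common(1)[0][0] for event, c in counts.items()}


# Hypothetical interaction history for one customer.
johns_history = [
    ("check_deposit", "branch"),
    ("check_deposit", "branch"),
    ("balance_inquiry", "online"),
    ("funds_transfer", "online"),
    ("utility_bill", "online"),
    ("utility_bill", "mobile"),
    ("utility_bill", "mobile"),
]

result = dominant_channels(johns_history)
print(result)
```

A production version would presumably weight recent interactions more heavily, so that a shift such as John's move to mobile bill payment shows up sooner.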
  • 11. Cross Channel Analytics (diagram)
    Data: Structured data; Web & Mobile; ATM / Branch; IVR, Call Records, Notes; CRM Data; ACH / Wire Transfer / Other channels; Unstructured Content / Logs; DW Transactions; Mailings, Offers, Lists; Other Channels; Survey; Complaints
    Big Data Warehouse
    Analytics: Query Drill-down; Ad hoc Reports; Predictive Modeling; Statistical Analysis & Text Mining; Optimization
    Cross Channel Analytics: Dominant Path Analysis (channel usage info); Service Behavior Segmentation; Repeat Issue analysis; Root cause analysis; Call Reasons analysis; Customer Journey analysis
  • 12. Churn and Retention Analytics
    Analytics Data Mart
    Member profiling and variable identification: Member profiling based on profile, demographic, social media and history data. Identification of key predictor variables for customer churn.
    Termination prediction modeling: Termination prediction modeling engine to determine the “probability of termination” at each member level.
    Member Prioritization Matrix: List of members with high likelihood of termination.
    Retention Target list generation
    Alternate product recommendation engine: List of suitable products for each customer. Analyze the profitability of each recommended product.
    Personalized Retention Plan: Create the most effective retention campaigns.
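Generating the retention target list in the churn slide amounts to ranking members by their modeled "probability of termination" and keeping those above a cut-off. The sketch below assumes the probabilities already exist. The member IDs, the scores and the 0.6 threshold are illustrative assumptions, not values from the presentation.

```python
def retention_target_list(probabilities, threshold=0.6):
    """Members at or above `threshold`, highest termination probability first."""
    at_risk = [(member, p) for member, p in probabilities.items() if p >= threshold]
    ranked = sorted(at_risk, key=lambda pair: pair[1], reverse=True)
    return [member for member, _ in ranked]


# Hypothetical output of a termination prediction model.
scores = {"M001": 0.82, "M002": 0.35, "M003": 0.67, "M004": 0.91}

targets = retention_target_list(scores)
print(targets)
```

The resulting list is what a recommendation engine could then pair with alternate products for each member.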
  • 13. Look to Book Ratio Analytics
    Analyze the customer’s search pattern through weblog analysis using big data, e.g. the rate or amenity the customer prefers.
    Drill down into specific search patterns and analyze the customer’s rate preference or amenity expectation, e.g.:
    Step 1: The customer starts the search on the website.
    Step 2: The customer selects a destination. The search displays all the hotels; the customer refines the search by selecting a price range or sorting by price, then leaves without booking. The conclusion is that the customer did not find hotels at his/her expected rate.
    Step 3: The customer selects a destination. The search displays all the hotels; the customer refines the search by selecting a preferred amenity (e.g. swimming pool, wifi), then leaves without booking. The conclusion is that the customer did not find hotels with the expected amenities.
    Step 4: This data can be forwarded to the revenue management team to set up the right competitive rates in the right geography.
    Step 5: This data can be forwarded to properties as well for amenity improvement.
    Pop up the right offers to customers when they search, which in turn increases customer attraction and sales.
    The revenue management team can use this data to come up with the ideal rate.
    The search pattern can be used for amenity improvement at individual properties.
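The look-to-book slide describes two calculations: the ratio of searches ("looks") to bookings, and a rule that labels an abandoned search by the last refinement applied. The sketch below assumes a hypothetical session record with `refined_by` and `booked` fields. Real weblog data would first have to be parsed into that shape.

```python
def look_to_book_ratio(sessions):
    """Searches ("looks") per completed booking; None when nothing was booked."""
    looks = len(sessions)
    books = sum(1 for s in sessions if s["booked"])
    return looks / books if books else None


def abandonment_reason(session):
    """Label an unbooked session by the refinement the customer applied."""
    if session["booked"]:
        return "booked"
    if session["refined_by"] == "price":
        return "rate_mismatch"      # Step 2: hotels not at the expected rate
    if session["refined_by"] == "amenity":
        return "amenity_mismatch"   # Step 3: expected amenities not found
    return "unknown"


# Hypothetical search sessions extracted from weblogs.
sessions = [
    {"refined_by": "price", "booked": False},
    {"refined_by": "amenity", "booked": False},
    {"refined_by": "price", "booked": True},
    {"refined_by": None, "booked": False},
]

ratio = look_to_book_ratio(sessions)
reasons = [abandonment_reason(s) for s in sessions]
print(ratio, reasons)
```

Aggregating the labels by destination would give a revenue management team the rate view and individual properties the amenity view.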
  • 14. Big Data Analytics: Planogram Compliance
    Diagram: Planogram (created by planners and buyers); actual view of the shelf (arranged by store associates); compliance dashboard and compliance score by Dept./Category/Subclass
    ▪ Planogram compliance is the process of verifying that the arrangement of products and the manner in which they are displayed on the shelf in each store match the planogram that is strategically created and collaboratively developed between planners and trading partners
    ▪ Usually this verification and compliance check is a time-consuming process and is done on a sample basis. When the execution of the planogram is compromised, or if there are assortment voids, it is a lost opportunity
    ▪ To accelerate this compliance check, take a picture of the actual shelf by product facing and position, and systematically compare it for compliance
    The Need
    ▪ Storing the planograms created at the corporate location for each store/dept./category combination
    ▪ Storing the actual photo of the shelf
    ▪ Comparing this unstructured data for matching
    ▪ Integrating this matching score with planogram planning data in the data warehouse to produce various dashboards and metrics that will influence sell-thru, profitability and customer satisfaction
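Once the shelf photo has been turned into a product per shelf position, the matching score from the planogram slide can be as simple as the fraction of planned positions holding the planned product. This sketch assumes that image-recognition step has already happened. The (shelf, position) keys and SKU names are hypothetical.

```python
def compliance_score(planogram, observed):
    """Fraction of planned shelf positions that hold the planned product."""
    if not planogram:
        return 0.0
    matches = sum(
        1 for position, sku in planogram.items() if observed.get(position) == sku
    )
    return matches / len(planogram)


# Hypothetical planogram: (shelf, position) -> planned SKU.
planogram = {(1, 1): "SKU-A", (1, 2): "SKU-B", (2, 1): "SKU-C", (2, 2): "SKU-D"}

# What was recognized in the photo; (2, 2) is empty, i.e. an assortment void.
observed = {(1, 1): "SKU-A", (1, 2): "SKU-C", (2, 1): "SKU-C"}

score = compliance_score(planogram, observed)
print(score)
```

Scores computed this way per store could then be rolled up by department, category or subclass for the dashboards.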
  • 15. Big Data Analytics: Intelligent Item Search
    ▪ An eCommerce retailer needs to analyze graphic images depicting items for sale over the Internet
    ▪ When a consumer wants to buy a red dress, their search may not match the tags used to identify each item’s search terms
    ▪ Manufacturers do not always label their goods clearly for the distributors or identify keywords with which users are likely to search
    The Need
    ▪ Analyze thousands of dress images, detecting the red prominence of the primary object in the graphic (JPGs, GIFs and PNGs)
    ▪ This requires enormously complex logic for the computer to “see” the dress and its primary colors as humans do
    ▪ Millions of images are tagged with additional information to assist consumers with their search
    ▪ This increases the chances that they find the item they were looking for and make a purchase
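A much simplified version of detecting "red prominence" is the share of pixels whose red channel clearly dominates green and blue. The sketch below assumes the image is already decoded into RGB tuples and skips the hard part the slide mentions, isolating the primary object. The margin and threshold values are illustrative assumptions.

```python
def red_prominence(pixels, margin=50):
    """Fraction of pixels whose red channel exceeds green and blue by `margin`."""
    if not pixels:
        return 0.0
    red = sum(1 for r, g, b in pixels if r - max(g, b) >= margin)
    return red / len(pixels)


def color_tags(pixels, threshold=0.5):
    """Return search tags to attach to the item; only 'red' is handled here."""
    return ["red"] if red_prominence(pixels) >= threshold else []


# Hypothetical 10-pixel image: mostly red dress, some white and dark background.
dress = [(200, 30, 40)] * 6 + [(240, 240, 240)] * 2 + [(20, 20, 20)] * 2

print(red_prominence(dress), color_tags(dress))
```

Run across a catalogue, tags like this would let a search for "red dress" reach items that were never labelled with a color.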
  • 16. Predictive Sciences: Predictive Biology & Predictive Pre-Clinical Safety
    Predictive Biology
      Business Goal: Rapid extraction of key information from literature sources to collect evidence on biological processes
      Input: External and internal literature sources
      Process: Text mining is used to link molecules with metabolic processes such as glucose uptake, fatty acid synthesis, metabolic stress, etc. Manual curation can be done to extract assertions and relationships with respect to effect, drug treatment, experiment type, etc.
      Benefits: Vital evidence collected on the effects and relationships on species and tissues, and linkages to canonical pathways and RNA expression data
    Predictive Pre-Clinical Safety
      Business Goal: Assessing the incidence of nausea in development compounds by analyzing the preclinical side effect profiles of marketed drugs
      Input: Gastrointestinal preclinical findings of marketed drugs
      Process: Statistical models are created to find the relation between various preclinical observations and the occurrence of nausea. The model shows clustering of compounds associated with nausea that have higher gastrointestinal preclinical observations
      Benefits: The model helps identify the risk of nausea early in development. Running this model during compound selection can minimize the risk of seeing nausea in follow-up compounds
  • 17. Improving Diagnosis by EHR/EMR Data Analysis
    Business Goal: Linkage of EMR/prescriber/promotion data in order to understand the relationship between prescriber promotions and treatment patterns
    Input: EMR data; prescription data; promotion data (email, sales calls, etc.)
    Data Processing Steps: Identify key themes in the EMR data for a particular disease type. Use the prescription data to validate the patterns/themes. Merge the findings with the promotion data to uncover any relationships between promotion and treatment
    Benefits: Refining targeting/promotional strategies; cost reductions; uncovering potential reasons for choosing a particular therapy
  • 18. Green Energy, Smart Energy Management
    Challenge: Wastage of energy and resources; temperature and lighting settings in under-utilized rooms; huge energy bills
    Analytics: Historic sensor data; blueprints of the building and room layouts; real-time sensor data; temperature settings in the room and building; usage patterns of the room
    Benefits: Optimize consumption of energy in business environments. Networked sensors and a new generation of analytics tools play a huge role in gaining insights and implementing the most efficient and sustainable energy strategies.
  • 19. Big Data Analytics in Telecom
    Customer call-center Text Mining
      • Contact center channel data is typically analyzed only from an SLA perspective: TAT, average wait time.
      • However, the actual transcript of the conversation can yield powerful insights regarding telecom infrastructure usage.
    Collocation Analysis from Cell Phone Towers
      • Collocation analysis by an investigation team finds out whether the same person was carrying multiple phones.
      • The examination involves terabytes of CDR/tower records from the switch, from which one can triangulate on a few collocation events.
    Multi-device Event Stream Analysis correlating Firewall, IDS and Switch activity
      • In most telecom infrastructures, IDS (intrusion detection systems) sit at the periphery, with network monitoring, firewall and application logs captured in silos.
      • Deploy a central log file repository, with events streaming from multiple devices ingested and collated centrally.
      • This channels into intelligence on the network infrastructure and security of the telecom assets.
      • It significantly improves detection of everything from malware to spear-phishing attempts to breach security.
    Optimizing cost of Telecom Tower Maintenance
      • A Big Data platform manages fuel consumption data in the telecom tower business.
      • Each telecom tower has a generator, and one of the biggest components of cost is diesel.
      • Sensors/energy meters constantly emit large streams of operational data.
      • Machine learning algorithms crawl through operational data stored over years to predict and optimize cost and revenue.
    User Behavior Analysis
      • Operational systems at each telecom service provider generate huge data volumes in the form of Call Data Records (CDRs) for each call/SMS handled.
      • Signaling data flows between various switches, nodes and terminals within the network.
      • Mining this data leads to insights for improving marketing operations and for network and service optimization.
    Planning Sales Approach
      • Large-scale data analysis boosts the ability to pinpoint exactly where the ongoing sales approach could make further gains.
      • Study the behavior of customers to see what factors motivated them to choose one brand or product over another.
      • This involves analyzing online search data and real-time information about the company’s products and services, shared by consumers across social networks and other Web-based channels.
      • Brand affinity and customer sentiment are measured using sentiment analysis algorithms.
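The collocation analysis on the telecom slide can be sketched as counting how often two phones appear at the same tower within the same time window: pairs that repeatedly travel together may belong to one person. The record layout (phone, tower, timestamp in seconds), the 5-minute window and the sample data are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def collocation_counts(records, window_seconds=300):
    """Count, per phone pair, the (tower, time bucket) cells they share."""
    buckets = defaultdict(set)
    for phone, tower, timestamp in records:
        buckets[(tower, timestamp // window_seconds)].add(phone)

    pairs = defaultdict(int)
    for phones in buckets.values():
        for a, b in combinations(sorted(phones), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


# Hypothetical tower records: P1 and P2 move together, P3 overlaps once.
records = [
    ("P1", "T1", 0), ("P2", "T1", 60),
    ("P1", "T2", 900), ("P2", "T2", 930), ("P3", "T2", 950),
    ("P1", "T3", 1800), ("P2", "T3", 1850),
    ("P3", "T9", 1800),
]

counts = collocation_counts(records)
print(counts)
```

Fixed time buckets miss events that straddle a bucket boundary, so a sliding window would be more accurate. At terabyte scale the same grouping would run as a distributed job rather than in memory.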
  • 20. Big Data Analytics: Challenges vs. Opportunities
    ✓ Hype, Buzz & Myth
    ✓ “How?” vs. “Why?”
    ✓ Big Data Analytics for Business, rather than just for IT
    ✓ Business Case Justification
    ✓ Right Partners for Your Big Data Analytics Journey
    ✓ Evangelization and Alignment
    ✓ Business Onboarding
    ✓ Execution Plan and Course Corrections
    ✓ Talent and Knowledge Management
    ✓ Right Math for ROI
  • 21. Design Challenge
    Source: The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics, by Ralph Kimball
    Columns: Vector, matrix, or complex structure; Free text; Image or binary data; Data “bags”; Iterative logic or complex branching; Advanced analytic routines; Rapidly repeated measurements; Extreme low latency; Access to all data required
    Search Ranking X X X X X X
    Ad Tracking X X X X X X X X
    Location or Proximity Tracking X   X X     X X
    Social CRM X X X X X X      X
    Document Similarity Testing X X X X X X   X X
    Genomic Analysis X X X X X
    Customer Cohort Groups X X   X X X     X
    Fraud Detection X X X X X X X X X
    Smart Utility Metering X X X X X X
    Churn Analysis X X X X X X   X
    Satellite Image Analysis X X X X
    Game Gesture Analysis X X X X X X X X
    Data Bag Exploration X X X X X X
    Use cases: Ad Tracking / Click stream analytics; Location or Proximity Tracking; Social Media Analytics / Social CRM; Document Similarity Testing / Match Making; Customer Cohort Groups; Sensor Monitoring (Flights / Building; Smart Utility Metering; Call Center Voice Analytics; Log Analytics; Satellite / CAT Image Comparisons; Fraud Detection; Game Online Gesture Analysis; Big Science (Astronomy, weather, atom smashers, Genome decoding); Search Ranking; Risk Management; Churn Analysis; Data “Bag” Exploration / Causal Factor Analysis
  • 22. Thanks Much!