NASSCOM Big Data and Analytics Summit 2013: Keynote 2: Gurinder Grewal

Keynote II: Big Data: Connecting the dots –
Delivering split-second decisions by integrating
offline and online systems.
Gurinder Grewal, Leader of Risk Big Data Platform,



© All Rights Reserved

    Presentation Transcript

    • BIG DATA – CONNECTING THE DOTS…
      Delivering split-second decisions by integrating offline and online systems
      NASSCOM Big Data & Analytics Summit 2013
      GURINDER S. GREWAL
    • ART OF DECISION MAKING – SPEED & ACCURACY
      Transactions at 11:01 AM, 11:05 AM and 11:06 AM:
        • The credit card is used from three distant locations within a short time.
          Result based on real-time analysis: block the card? Undecided.
        • Compared against past purchasing behavior:
            • The cardholder lives in the US; the cardholder's wife paid a bill online from the home PC.
            • The cardholder's child studies in Europe and used the card to buy books.
            • The cardholder is traveling in Japan and paid for lunch.
          Result based on historical analysis: legitimate usage.
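The two-stage decision on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the speaker's implementation: the `Txn` fields, the rule of three regions within ten minutes, and the `profile` set of known (region, merchant type) pairs are hypothetical stand-ins for a real-time check and a historical behavior profile.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    minute: int          # minutes since midnight (hypothetical field)
    region: str          # coarse location, e.g. "US", "Europe", "Japan"
    merchant_type: str   # e.g. "bill_payment", "books", "restaurant"


def realtime_flag(txns, window_minutes=10, min_regions=3):
    """Hypothetical real-time rule: flag when the card is used in at least
    `min_regions` distinct regions within `window_minutes`."""
    ordered = sorted(txns, key=lambda t: t.minute)
    for i, first in enumerate(ordered):
        regions = {t.region for t in ordered[i:]
                   if t.minute - first.minute <= window_minutes}
        if len(regions) >= min_regions:
            return True
    return False


def historical_ok(txns, profile):
    """Historical check: every (region, merchant type) pair already appears
    in the household's past purchasing behavior (hypothetical profile set)."""
    return all((t.region, t.merchant_type) in profile for t in txns)


def decide(txns, profile):
    # Fast path: the real-time rule sees nothing unusual.
    if not realtime_flag(txns):
        return "approve"
    # Undecided in real time: let the historical profile resolve it.
    return "approve" if historical_ok(txns, profile) else "block"


txns = [
    Txn(11 * 60 + 1, "US", "bill_payment"),
    Txn(11 * 60 + 5, "Europe", "books"),
    Txn(11 * 60 + 6, "Japan", "restaurant"),
]
profile = {("US", "bill_payment"), ("Europe", "books"), ("Japan", "restaurant")}
print(decide(txns, profile))   # approve
print(decide(txns, set()))     # block
```

The real-time rule alone would block this card; only the historical profile turns the same three transactions into an approval, which is the slide's point about combining speed with accuracy.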
    • TIERED BIG DATA STRATEGY
        • Real time (e.g. filters): data in motion, data age of seconds
        • Near real time (e.g. correlations): data in motion, data age of hours
        • Offline (e.g. behavioral analysis): data in use, data age of years
      Trade-off axes: cost and speed versus data volume and accuracy.
      effective decision = fn(accuracy, speed, cost)
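The slide states the goal only abstractly as `effective decision = fn(accuracy, speed, cost)`. One possible concrete reading, sketched here under stated assumptions, is a constrained choice: pick the most accurate tier that still meets the latency and cost budgets. The `Tier` type, `pick_tier`, and every number in `TIERS` are illustrative, not figures from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    latency_s: float   # time to produce an answer (illustrative numbers)
    accuracy: float    # 0..1 score (illustrative)
    cost: float        # relative cost per decision (illustrative)


TIERS = [
    Tier("real_time", latency_s=0.01, accuracy=0.70, cost=3.0),
    Tier("near_real_time", latency_s=5.0, accuracy=0.85, cost=2.0),
    Tier("offline", latency_s=3600.0, accuracy=0.95, cost=1.0),
]


def pick_tier(tiers, latency_budget_s, cost_budget) -> Optional[Tier]:
    """One possible reading of fn(accuracy, speed, cost): among tiers that
    meet the speed and cost budgets, take the most accurate, breaking ties
    by lower cost. Returns None when no tier fits."""
    feasible = [t for t in tiers
                if t.latency_s <= latency_budget_s and t.cost <= cost_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda t: (t.accuracy, -t.cost))


print(pick_tier(TIERS, latency_budget_s=0.1, cost_budget=5.0).name)      # real_time
print(pick_tier(TIERS, latency_budget_s=60.0, cost_budget=5.0).name)     # near_real_time
print(pick_tier(TIERS, latency_budget_s=86400.0, cost_budget=5.0).name)  # offline
```

A tighter deadline pushes the choice toward the faster, less accurate tier; a looser one lets the slower, more accurate tier win.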
    • BIG DATA – COMPUTATION STRATEGY
        • Real time (in-flow processing), producing online variables:
          fast, with very stringent availability and performance SLAs; computations are simple and eventually accurate; computations are transient and short-lived (user sessions)
        • Near real time (complex event processing), producing online variables:
          event-driven, incremental processing; high efficiency and scalability; data kept for short time windows (hours)
        • Offline (map-reduce, batch), producing offline variables:
          optimized for throughput; computations are slow and accurate; data captured as events for historical analysis
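The near-real-time column (event-driven, incremental processing over windows of hours) can be illustrated with a per-key sliding-window aggregate. This minimal Python sketch assumes in-order timestamps; `SlidingWindowSum` is a hypothetical name, and a CEP engine would normally provide windowing itself.

```python
from collections import deque


class SlidingWindowSum:
    """Hypothetical helper: event-driven, incremental per-key sum and count
    over the last `window_s` seconds. Assumes events arrive in timestamp order."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()   # (ts, key, amount) still inside the window
        self.totals = {}        # key -> running sum
        self.counts = {}        # key -> number of events in the window

    def add(self, ts, key, amount):
        self.events.append((ts, key, amount))
        self.totals[key] = self.totals.get(key, 0) + amount
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(ts)

    def _expire(self, now):
        # Subtract events that left the window instead of recomputing,
        # so each event is added once and removed once.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_key, old_amount = self.events.popleft()
            self.totals[old_key] -= old_amount
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.totals[old_key]
                del self.counts[old_key]

    def total(self, key):
        return self.totals.get(key, 0)

    def count(self, key):
        return self.counts.get(key, 0)


w = SlidingWindowSum(window_s=3600)   # one-hour window
w.add(0, "card1", 100)                # amounts in cents
w.add(1800, "card1", 50)
w.add(3600, "card1", 25)              # the event at ts=0 leaves the window here
print(w.total("card1"), w.count("card1"))   # 75 2
```

Tracking a per-key count (rather than deleting a key when its sum hits zero) keeps the bookkeeping correct even for zero or negative amounts.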
    • BIG DATA IN USE – OFFLINE ECOSYSTEM (Hadoop technology stack)
        • Data storage: HDFS, HBase
        • Data processing: MapReduce framework
        • Data integration (ETL): Flume, Sqoop
        • Programming languages: Pig, Hive QL
        • Scheduling and coordination: ZooKeeper, Oozie
        • UI framework/SDK: Hue, Hue SDK
        • Connected systems: structured data (MPP DW, RDBMS) and unstructured data
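To make the map, shuffle and reduce steps concrete, the following simulates a MapReduce job in a single Python process. It is not Hadoop API code: the CSV record layout and the function names are hypothetical, and on the stack above the same logic would run as a MapReduce job (or a Pig or Hive QL script) over data in HDFS. The output is the kind of per-card offline variable a behavioral profile could use.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: parse one raw log line (hypothetical CSV layout) and emit
    (card_id, (country, merchant_type))."""
    card_id, country, merchant_type, _amount = record.split(",")
    yield card_id, (country, merchant_type)


def shuffle(pairs):
    """Shuffle: group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(card_id, values):
    """Reduce: one offline variable per card, its distinct
    (country, merchant type) pairs."""
    return card_id, sorted(set(values))


def run_job(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


records = [
    "c1,US,bill_payment,120",
    "c1,Europe,books,40",
    "c1,US,bill_payment,80",
    "c2,Japan,restaurant,15",
]
print(run_job(records))
```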
    • BIG DATA IN MOTION – ONLINE ECOSYSTEM
      Components: events stream, message bus, complex event processing (correlations, filtering, aggregations, pattern matching), in-memory data store (fed from offline), decision service, monitoring.
      CEP enables continuous analytics on data in motion:
        • A solution for the velocity of big data
        • Well suited for detection, decisioning, alerting and taking actions
        • Relies on an in-memory data grid to provide low latency
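A toy version of the CEP loop described on this slide (filtering, windowed aggregation, correlation with offline variables held in memory, pattern matching, then an action) might look like the following. `MiniCep`, `OFFLINE_VARS`, the event fields, and the thresholds are hypothetical; a plain dict stands in for the in-memory data grid and a Python list for the message bus.

```python
from collections import defaultdict, deque

# Hypothetical offline variables, bulk-loaded into the in-memory store
# by the batch tier (a plain dict stands in for the data grid here).
OFFLINE_VARS = {"card1": {"home_country": "US", "avg_amount": 50.0}}


class MiniCep:
    """Toy CEP loop: filter -> aggregate over a window -> correlate with
    offline variables -> match a simple pattern -> emit an action."""

    def __init__(self, offline_vars, window_s=300, burst=3, amount_factor=10):
        self.offline_vars = offline_vars
        self.window_s = window_s
        self.burst = burst
        self.amount_factor = amount_factor
        self.recent = defaultdict(deque)   # card_id -> purchase timestamps
        self.actions = []                  # what would go to the decision service

    def on_event(self, event):
        # Filtering: only purchases are scored.
        if event["type"] != "purchase":
            return
        card, ts = event["card_id"], event["ts"]
        # Aggregation: purchases per card in the last window_s seconds.
        window = self.recent[card]
        window.append(ts)
        while window[0] <= ts - self.window_s:
            window.popleft()
        # Correlation: join the live event with offline variables.
        profile = self.offline_vars.get(card)
        if profile is None:
            return
        abroad = event["country"] != profile["home_country"]
        large = event["amount"] > self.amount_factor * profile["avg_amount"]
        # Pattern: a burst of large purchases made abroad.
        if len(window) >= self.burst and abroad and large:
            self.actions.append((card, ts, "review"))


engine = MiniCep(OFFLINE_VARS)
stream = [  # stands in for events read off the message bus
    {"type": "login", "card_id": "card1", "ts": 0},
    {"type": "purchase", "card_id": "card1", "ts": 10, "country": "JP", "amount": 900.0},
    {"type": "purchase", "card_id": "card1", "ts": 70, "country": "JP", "amount": 800.0},
    {"type": "purchase", "card_id": "card1", "ts": 130, "country": "JP", "amount": 950.0},
]
for event in stream:
    engine.on_event(event)
print(engine.actions)   # [('card1', 130, 'review')]
```

The key design point the slide makes is visible here: the live decision only needs a dictionary lookup for the offline variables, so the slow historical computation never sits on the event path.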
    • BIG DATA MOVEMENT EVOLUTION
        • Initial state (two-tier architecture: offline → in-memory data store in the data cloud): 500 GB in 16 hours
        • Optimization – Phase 1 (two-tier): 2 TB in 16 hours
            • Split data files prepared offline
            • Maximize data load parallelism
            • Maximize data compression
            • Optimize the data format
            • Validate before data movement
        • Scale – Phase 2 (multi-tier architecture: offline → NoSQL persistent backing store → in-memory data store in the data cloud): 10 TB in 6 hours
            • Add a persistent NoSQL store behind the in-memory store
            • Blast bulk loads into the NoSQL store
            • A batch process warms the cache
            • Lazy warm-up as needed, while serving reads and writes
            • Refresh cache contents via time-based evictions
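The Phase 2 caching behavior (batch warm-up, lazy warm-up while serving reads and writes, and refresh through time-based eviction) is essentially a read-through cache with a TTL. The sketch below assumes a dict as the persistent NoSQL store and write-through on `put`; both the class name and the write policy are assumptions, since the slide does not say how writes are handled.

```python
import time


class ReadThroughCache:
    """In-memory tier in front of a persistent store. A plain dict stands in
    for the NoSQL backing store; a real deployment would use its client API."""

    def __init__(self, backing_store, ttl_s, clock=time.monotonic):
        self.store = backing_store
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries = {}   # key -> (value, expires_at)

    def warm(self, keys):
        """Batch warm-up after a bulk load into the backing store."""
        expires_at = self.clock() + self.ttl_s
        for key in keys:
            if key in self.store:
                self.entries[key] = (self.store[key], expires_at)

    def get(self, key):
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]
        # Miss or expired entry: lazily load (or refresh) from the backing store.
        value = self.store.get(key)
        if value is None:
            self.entries.pop(key, None)
        else:
            self.entries[key] = (value, now + self.ttl_s)
        return value

    def put(self, key, value):
        """Write-through (an assumption): persist first, then update memory."""
        self.store[key] = value
        self.entries[key] = (value, self.clock() + self.ttl_s)


# Usage with a fake clock so the time-based eviction is easy to follow.
now = [0.0]
store = {"a": 1, "b": 2}
cache = ReadThroughCache(store, ttl_s=10, clock=lambda: now[0])
cache.warm(["a"])          # "a" preloaded by the batch warm-up
print(cache.get("b"))      # 2, loaded lazily on first read
store["a"] = 100           # a later bulk load changes the backing store
print(cache.get("a"))      # 1, still served from memory until the TTL passes
now[0] = 11.0
print(cache.get("a"))      # 100, expired entry refreshed from the store
```

Putting the persistent store behind the cache is what lets the bulk load go to a store built for it, while the in-memory tier fills gradually instead of absorbing the whole load at once.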
    • USE CASE: GRAPH-BASED DECISIONING
      Offline: Map/Reduce graph builder → daily incremental updates → online in-memory graph store (Graph Server).
      Online: the events stream drives continuous graph updates and rollup; the decision service reads from the graph store (avg. read time: 2 ms; 95th percentile: 6 ms).
        • Generate the graph and associated complex variables on Hadoop daily
        • Move the incremental changes to the online in-memory graph store
        • Based on the event stream, keep the graph and offline variables up to date
        • The in-memory store provides fast read-only access to decision services
      Confidential and Proprietary
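The graph use case combines a daily batch delta with continuous updates from the event stream, and serves read-only lookups to the decision service. A minimal in-memory sketch follows; the account/device schema, `InMemoryGraphStore`, and the two-hop `linked_within` variable are hypothetical examples of a graph-derived variable, and the rollup step is omitted.

```python
from collections import defaultdict


class InMemoryGraphStore:
    """Undirected adjacency sets linking accounts and devices
    (hypothetical schema)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def remove_edge(self, a, b):
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def apply_daily_delta(self, added, removed):
        """Apply the incremental changes produced by the offline graph builder."""
        for a, b in removed:
            self.remove_edge(a, b)
        for a, b in added:
            self.add_edge(a, b)

    def on_event(self, event):
        """Continuous update from the event stream, e.g. a login from a device."""
        self.add_edge(event["account"], event["device"])

    def linked_within(self, node, hops=2):
        """Read-only lookup for the decision service: nodes reachable
        from `node` in at most `hops` edges."""
        seen = {node}
        frontier = {node}
        for _ in range(hops):
            frontier = {n for f in frontier for n in self.adj.get(f, ())} - seen
            seen |= frontier
        return seen - {node}


g = InMemoryGraphStore()
# Daily incremental update from the offline graph builder.
g.apply_daily_delta(added=[("acct1", "dev1"), ("acct2", "dev1"), ("acct3", "dev2")], removed=[])
print(sorted(g.linked_within("acct1")))   # ['acct2', 'dev1']
# Continuous update from the event stream between daily runs.
g.on_event({"account": "acct1", "device": "dev2"})
print(sorted(g.linked_within("acct1")))   # ['acct2', 'acct3', 'dev1', 'dev2']
```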
    • CONCLUSION
        • Hadoop is best for offline processing of high-variety, high-volume data, not for real time
        • CEP is a solution for online big data in motion (velocity) and complements Hadoop
        • Harness the true power of big data by combining offline and online data
        • Data integration is key; careful planning and optimization are needed
        • Online data stores are not optimized for highly parallel writes and bulk loads
        • Big data can solve complex problems while delivering both speed and accuracy
    • THANK YOU!