Data Analytics in Real World
Geeta Chauhan @ MUM Dec 2015
Master's in Computer Application
Systems Engineer
General Manager & Technical Director
Senior Consultant
Development Director
Innovation & Research Director
Chief Technology Officer
Led 13 New Products, Features across 30+ Products
Data Driven, Multi-tier, Social Media, Mobile, Cloud, Analytics
Agile, User Centered Design, Lean Startup, Mindfulness
India, USA
Challenges for Data Analytics in Real World
Technological
• Rapidly evolving Technology Stack
• Shift towards Open Source to contain costs
• Shift from One standard way of doing things to Contextual use case driven
• New types of access & usage patterns
  • Real Time, On-Demand, Exploratory, Internet of Things
• Two different types of projects
  • Production Bread & Butter
  • Experimental - High unknowns, don’t know what you don’t know
Organizational & Cultural
• ROI - lead time for first set of outcomes
• Data cleansing & ingestion 80-90%
• Lack of Domain Expertise, Not asking or solving for right questions
• Learning curve - crucial for successful rollout of project
• Data Driven decision making still new
• Comfort level with high unknowns
• Test driven approach - A/B Testing
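A minimal sketch of what the "Test driven approach - A/B Testing" bullet can look like in practice, assuming each variant is summarized by a visitor count and a conversion count. The function name and the example numbers are hypothetical, and the two-proportion z-test is one common choice rather than a method the slides prescribe.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 visitors per variant
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone; the threshold to act on is a business decision.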
Architectural Patterns & Solutions
• Lambda Architecture
  • Real-time speed layer + Batch Processing layer + Serving Layer
• Edge Analytics: Internet of Things
  • Distributed analytics closer to source
• Data Center as a Computer
  • Cluster computing, dynamic workloads
Lambda (λ) Architecture
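The three layers can be sketched in a few lines, as a toy in-memory model only. The class and method names are hypothetical; a real deployment would use a distributed batch engine, a stream processor, and a serving store. The sketch counts events per key: the batch layer recomputes from the full master dataset, the speed layer updates incrementally, and the serving layer merges both views at query time.

```python
from collections import Counter

class LambdaPipeline:
    """Toy Lambda Architecture: batch view + real-time view, merged on query."""

    def __init__(self):
        self.master = []                 # append-only master dataset
        self.batch_view = Counter()      # produced by the batch layer
        self.realtime_view = Counter()   # maintained by the speed layer

    def ingest(self, key):
        # Every event lands in the master dataset and in the speed layer
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over all data,
        # then drop the real-time increments the new batch view now covers
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge the batch and real-time views
        return self.batch_view[key] + self.realtime_view[key]

pipeline = LambdaPipeline()
for event in ["click", "click", "view"]:
    pipeline.ingest(event)
pipeline.run_batch()
pipeline.ingest("click")
print(pipeline.query("click"))
```

In this simplified model the batch run is synchronous; in practice events keep arriving while the batch job runs, so the speed layer only discards what the finished batch has absorbed.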
Edge Analytics
Cloudlets with Edge Analytics: Video, IoT, Automotive
Source: CMU
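One way to picture analytics closer to the source is a small sketch in which an edge node reduces a window of raw sensor readings to a compact summary before anything is sent upstream. The function name, field names, and threshold are hypothetical and are not taken from the CMU cloudlet work.

```python
from statistics import mean

def summarize_at_edge(readings, alert_threshold):
    """Reduce a window of raw readings to a small summary on the edge node."""
    if not readings:
        return {"count": 0, "mean": None, "max": None, "alerts": []}
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        # Only readings above the threshold are forwarded individually
        "alerts": [r for r in readings if r > alert_threshold],
    }

# Hypothetical temperature window collected on a device
window = [20.0, 22.0, 21.0, 95.0]
print(summarize_at_edge(window, alert_threshold=90.0))
```

Only the summary and the flagged readings leave the device, which trades raw detail for lower bandwidth and latency.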
Data Center as a Computer
Client Server Era: Small Apps, Big Servers, Static Partitioned
Cloud Era: Big Apps, Small Servers, Micro-services, Elastic Partitioned
Source: Andreessen Horowitz
Dynamic Workloads, Resource Utilization
• Distributed Systems Kernel
• General Purpose dynamic shared cluster for multiple workloads
• When resources become idle, can be reused by other schedulers
Source: Apache Mesos
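The idea of idle resources being reused by other schedulers can be illustrated with a toy model of a shared cluster. This is not the Mesos API: the class, method names, and CPU counts are hypothetical and only mimic the offer-and-release pattern in a single process.

```python
class Cluster:
    """Toy shared cluster: idle CPUs are offered to whichever framework asks."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.allocated = {}  # framework name -> CPUs currently in use

    def free_cpus(self):
        return self.total_cpus - sum(self.allocated.values())

    def offer(self, framework, wanted_cpus):
        # Grant as much of the request as the idle pool allows
        granted = min(wanted_cpus, self.free_cpus())
        if granted > 0:
            self.allocated[framework] = self.allocated.get(framework, 0) + granted
        return granted

    def release(self, framework, cpus):
        # Returned CPUs go back to the idle pool for other frameworks
        in_use = self.allocated.get(framework, 0)
        returned = min(cpus, in_use)
        self.allocated[framework] = in_use - returned
        return returned

cluster = Cluster(total_cpus=8)
cluster.offer("batch", 6)        # batch framework takes 6 CPUs
cluster.offer("streaming", 4)    # only 2 CPUs are idle, so 2 are granted
cluster.release("batch", 4)      # batch job shrinks, freeing 4 CPUs
cluster.offer("streaming", 2)    # streaming picks up the freed capacity
print(cluster.allocated, cluster.free_cpus())
```

Compared with static partitioning, no framework holds capacity it is not using, which is the utilization gain the slide points to.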
Key Takeaways
• Continuous Learning
• Interpersonal Skills
• Data Driven experimental approach
• Contextual Use Case driven technology stack
• Automation for rapid iterations and reproducible results
• Meditation
Q & A
Contact: geeta.chauhan@gmail.com
Resources
• Lambda Architecture: http://lambda-architecture.net/
• Edge Analytics: https://www.cs.cmu.edu/~satya/docdir/satya-edge2015.pdf
• Apache Mesos Whitepaper: https://www.cs.berkeley.edu/~alig/papers/mesos.pdf