Phases of Big Data Challenges @ Nokia

Yekesa Kosuru
HERE.com
Nokia
Hadoop Innovation Summit, February 20 & 21, 2013, San Diego

Agenda
• Phases of Big Data Challenges @ Nokia
– Who we are
– Big data platform
– Use case data flows
– High level architecture
– Challenges
• Phases of challenges

Who we are – disrupting the future
[Figure: device capabilities]
• Accelerometer
• GPS
• Water Proof
• 12h Battery
• Bluetooth
• 2GB Storage
• Barometer
• NFC
• Gyroscope
• Magnetometer

Location Platform, Enabling Contextually Rich Mobile Experiences
[Figure: platform layers]
• Apps
• Smart Data
• Platform
• Content: Positions, Maps, Traffic, Places, Directions, Guidance

Big Data Flows and Differentiates
• We Collect User Data…
• …on All Supported Platforms…
• …to Be Made Available for Analysis
[Figure labels: Nokia Account, Big Data Analytics]
Enabling feedback loops for continuous improvement, Location Optimized Experience, CRM, etc.

Phase 0
• 2008 – ‘10: Build Technology Platform, Get Data

Business Challenges
• Data silos, no unique identifiers, missing semantics
• Multiple sources - overlapping, conflicting
• Timely processing of large volumes & velocity of data
• Partial, insufficient, inaccurate, inconsistent data
• Data/wire formats, security, privacy and other policies
unknown
Central Big Data Platform created

Using different big data sets…
…to verify map accuracy and create Motion Graph

Big Data Analytics Platform Data Flows
[Diagram: components shown, grouped by apparent role]
• Sources: Web Applications, Device Applications, Probes, 3rd Party
• Data: Activity Logs, Reference Data, Device, User Profile, POI, Map, Activity, Sensor
• Processing and storage: Data Intake; ETL, data crunching, attribution, ML Algorithms; Aggregation; HDFS; VShards (NoSQL); Analytical DBMS; Analytics Cluster; Data Asset Catalog
• Consumption: Reports, Dashboards, Data Discovery, Interactive Queries, Batch Queries

Technology Platform
[Diagram: components shown]
• Hadoop, R, VShards (KV)
• SDK, Scribe, FTP
• Hive, Pig
• Analytical DBMS
• Export/Import
• Workflow Engine
• Config./Deploy
• Monitor, Alerts
• Data Pipeline
• Scheduler
• Security/Kerberos & ACL
• On-Premise & Cloud Infrastructure

Data Platform
[Diagram: components shown]
• Self Serve Tools
• ETL, Agg
• Machine Learning
• Data Quality
• Data Asset Catalog
• Data, Metadata, Operational Data
• Collect, Ingest, Organize, Analyze, Deliver
• Technology Platform

Phase 1 – 2012
• 2008 – ‘10: Build Technology Platform, Get Data
• 2011: Enhance Platform, More Data, Simple Analytics, Data Crunching
• 2012: PBs of Data, Hundreds of Users, Thousands of Jobs, Complex Analytics, Multiple Clusters

2012 Production Statistics
• Tens of PB of data across Nokia
• Multi-tenant, multi-petabyte analytics cluster
• 10-20K+ jobs per day
• 600+ internal users
• 300M+ KV queries
• Terabytes flowing in every day
• Multiple data centers around the world

Challenges With Big Data
• Complex ecosystem of technologies - many moving
parts, slower deploy cycles, data integration is complex
• Capacity & Scale Issues – Provision for peaks or sustained load,
storage or compute?
• DBMS great for performance & data management, but
can't scale - price/performance & ACIDity
• Hadoop great for ETL, but poor on query performance &
data management, not interactive
• Data and Metadata fragmentation

Big Data Capacity Issues
• Spiky Workloads
• Capacity Provisioning
– Peaks
– Sustained loads
• How many clusters?
– SLA/Adhoc/Research
– Multiple data centers
– Data duplication
• Tenancy – single/multi
• TCO
– Hadoop can get expensive -
storage & compute tightly
coupled, idle machines

Cloud helps with some issues
• Operational & IT complexity reduced – API based spin up
& tear down – rapid deployments, faster cycles
• Pay for what is used
• Capacity issues mitigated - idle machines or peaks not
an issue – elastically scale up and down
• De-coupled Storage and Compute makes sense
• Stateless architecture, recycle slow/bad machines, no
need for rolling upgrades, instead do rolling replace
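
The capacity argument in the bullets above (idle machines when provisioning for peaks versus paying for what is used) can be illustrated with a little arithmetic. The sketch below is only an illustration: the hourly demand numbers and the function name fixed_vs_elastic are hypothetical and do not come from the talk, and it ignores real-world factors such as spin-up time and price differences between on-premise and cloud machines.

```python
def fixed_vs_elastic(hourly_demand):
    """Compare machine-hours for a peak-sized cluster vs. one resized every hour.

    hourly_demand: machines needed in each hour (hypothetical numbers).
    """
    peak = max(hourly_demand)
    hours = len(hourly_demand)
    fixed_hours = peak * hours          # peak-sized cluster runs all the time
    elastic_hours = sum(hourly_demand)  # pay only for machines actually needed
    idle_fraction = 1 - elastic_hours / fixed_hours
    return fixed_hours, elastic_hours, idle_fraction


# Hypothetical spiky day: 20 machines as a baseline, 100 during a 4-hour window.
demand = [20] * 20 + [100] * 4
fixed, elastic, idle = fixed_vs_elastic(demand)
print(f"fixed: {fixed} machine-hours, elastic: {elastic}, idle share: {idle:.0%}")
```

With these made-up numbers, the peak-sized cluster sits idle for about two thirds of its machine-hours, which is the kind of gap elastic scaling is meant to close.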

Phase 2
• 2008 – ‘10: Build Technology Platform, Get Data
• 2011: Enhance Platform, More Data, Simple Analytics
• 2012: PBs of Data, Hundreds of Users, Thousands of Jobs, Simple & Complex Analytics
• 2013: Still Pending Challenges

Still Pending
• Data and Metadata fragmentation, need deeper
integration into all tools/frameworks
• Advanced Analytics - Data science problems are hard &
inefficient to implement in MapReduce/RDBMS

Complex Analytics
• Mathematicians think in terms of arrays, not MapReduce
• Data science tools can’t efficiently handle big data
• Data partitioning is naïve, indexing won't scale
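
To make the arrays-versus-MapReduce point concrete, here is a minimal sketch of one computation, a matrix-vector product, written both ways. It is plain Python standing in for a real MapReduce job, and the names map_phase and reduce_phase and the sample data are assumptions made for illustration, not anything shown in the talk.

```python
from collections import defaultdict

# Matrix stored as (row, col, value) records, one record per entry.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
x = [10.0, 20.0]


def map_phase(records, vector):
    # Emit a (row, partial product) pair for every matrix entry.
    for row, col, value in records:
        yield row, value * vector[col]


def reduce_phase(pairs):
    # Sum the partial products that share a row key.
    sums = defaultdict(float)
    for key, partial in pairs:
        sums[key] += partial
    return [sums[k] for k in sorted(sums)]


y_mapreduce = reduce_phase(map_phase(A, x))

# Array view: the same product as one expression over rows.
dense = [[1.0, 2.0], [3.0, 4.0]]
y_array = [sum(a * b for a, b in zip(row, x)) for row in dense]

print(y_mapreduce, y_array)
```

Both forms give the same result, but the key-value version has to break the matrix into records and regroup them by key, which is the mismatch with array-oriented thinking that the slide points at.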

Big Data Technologies for Future

THANK YOU
Yekesa Kosuru yekesa.kosuru@nokia.com
