Embracing Hadoop with a musical touch!
1. Enabling our Customer Advanced Analytics Environment (AAE)
Hadoop Summit, San Jose CA // June 09-11, 2015
Speaker(s): Shashin Surkund and Arindam Paul
Company: Fidelity Investments
2. Why are we here today?
Evolution with planned yearly revolutionary changes
– Environment, architecture, and results
Lessons learned
And ……
To share our story about our Big data journey……
For enabling our Customer Data Analytics Platform
3. Advanced Analytics Environment (AAE) journey - Timeline
Take baby steps to achieve something great…
Tried our hand at Hadoop
Too early for us to jump in
Establish Hadoop User Group and host multiple tech. events
Deliver web data (clickstream) with multi-year history
Enrich Predictive model with web drivers
Streamline batch ingestion framework
Hadoop integral part of the advanced analytics platform
Hadoop Security and Governance
Lambda architecture
Omni-channel big data ingestion
Real-time processing
Hadoop becomes our advanced analytics platform
Fidelity embraces Hadoop
Sets up two clusters [prod and non-prod]
Our team kicks off our first adventure (web data)
Kick-off multiple proof of concepts
5. Nothing’s gonna change my love for you…
If I had to run my jobs
without you HADOOP
The days would all go waiting
The nights would seem so long
With you I see our data oh so clearly
With Hive, Impala and Mahout
But it never felt this strong
Our dreams are young and we both know
They'll take us where we want to go
Ingest me now, process me now
I don't want to live without you
Nothing's gonna change my love for you
You ought to know by now how much I
love you
One thing you can be sure of
I'll only ask for 1000 MAP SLOTS :)
The road ahead for us is not so easy
Arun will lead the way for us
Like a guiding star
Doug is there for us
if we should need him
You don't have to change a thing
We'll love you just the way you are
We’ll come to you, QUERY thru HUE
You’ll help us do AAE too
Ingest me now, process me now
I don't want to live without you
Nothing's gonna change my love for you
You ought to know by now how much I
love you
One thing you can be sure of
I'll only ask for 1000 MAP SLOTS :)
7. Why Hadoop? Web Data use case
Technical Challenges
Increasing data volumes
Closed ecosystem
Complex data processing
Operational challenges
Big Data Opportunities
Solution Capabilities
Advanced analytics [AAE]
Predictive modelling and real-time scoring
Scalable, cost-effective and open source
Industry tested and future of data warehousing
8. Three V’s of big data
Variety
• Web data ingestion
• Omni-channel ingestion
Velocity
• Batch ingestion in production
• Intra-day
• Near real-time processing
Volume
• Multi-year history
• Terabytes and growing
10. Web data Hadoop implementation…
From a Star to a Super Star……
RDBMS
Highly normalized using a star schema data model
Daily grain partitioned by date
Compress historic read-only partitions for space savings
Daily ETL cycle takes 16-18 hours to complete
Hadoop
Simplified de-normalized design resulting in one clickstream table
Leverage Hive complex data types to store detail attributes
Partition by date for easy and efficient access
Use RCFile format with block-level Snappy compression
Cluster visitors into 128 buckets to facilitate advanced map joins and sampling
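The bucketing idea on this slide can be illustrated outside Hive. What follows is a minimal Python sketch, not the team's implementation: the function names are hypothetical, and CRC32 is only a stand-in for whatever hash Hive applies to the clustering column. It shows why a fixed set of 128 buckets helps with both sampling (read a single bucket) and joins (matching keys always land in the same bucket, so each bucket can be joined on its own).

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 128  # bucket count quoted on the slide


def bucket_for(visitor_id, num_buckets=NUM_BUCKETS):
    """Map a visitor id to a stable bucket number in [0, num_buckets)."""
    # Assumption: CRC32 stands in for Hive's own hash of the clustering column.
    return zlib.crc32(visitor_id.encode("utf-8")) % num_buckets


def sample_bucket(visitor_ids, bucket, num_buckets=NUM_BUCKETS):
    """Keep only the visitors that fall into one bucket."""
    return [v for v in visitor_ids if bucket_for(v, num_buckets) == bucket]


def bucketed_join(clicks, profiles, num_buckets=NUM_BUCKETS):
    """Join (visitor_id, page) rows to (visitor_id, segment) rows bucket by bucket.

    Both sides use the same bucketing function, so a small lookup table per
    bucket is enough; no bucket ever needs rows from another bucket.
    """
    click_buckets = defaultdict(list)
    profile_buckets = defaultdict(dict)
    for visitor_id, page in clicks:
        click_buckets[bucket_for(visitor_id, num_buckets)].append((visitor_id, page))
    for visitor_id, segment in profiles:
        profile_buckets[bucket_for(visitor_id, num_buckets)][visitor_id] = segment

    joined = []
    for bucket in sorted(click_buckets):
        lookup = profile_buckets.get(bucket, {})
        for visitor_id, page in click_buckets[bucket]:
            if visitor_id in lookup:
                joined.append((visitor_id, page, lookup[visitor_id]))
    return joined
```

Calling `sample_bucket(ids, 0)` returns the visitors of one bucket, which is a repeatable sample because the same visitor always hashes to the same bucket.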
11. How did we do it?
Stages: Ingest, Transform, Load
Hadoop technology stack:
Hive, Perl, MapReduce
Hive, Java UDF
Hive, Java UDF, Pig
Batch cycle:
Data standardization
Data cleansing
Data enrichment
Page fixing
Sessionize
Session flagging
Publish clickstream
Common framework:
Data audit framework
Persistent staging area
Data retention policies
Role based security model
Enterprise Scheduler
Lessons learned:
Importance of data cleansing and audits
Hive supported column and row delimiters
Hive file formats and compression types
Edge server processing is needed
Hive UDF best practices
Map joins
Addition of professional services helped ramp up the team faster.
Pig DataFu libraries [don’t reinvent the wheel]
Clustering and bucketing of data
Hive windowing functions
Hive complex data types
Over-communicate and build a strong network
Take small, deliberate steps forward
You will hit speed bumps, but the team will persevere
It is a journey in a fast-changing technology space
Engage professional services for architecture guidance
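The Sessionize step of the batch cycle can be pictured with a short sketch. The slides place this work in Hive, Java UDFs and Pig; the Python below is only an illustration of the logic, under stated assumptions: the 30-minute inactivity timeout, the event tuple layout and the function name are all hypothetical and do not come from the slides. In Hive the same gap test is often written with a windowing function such as LAG over each visitor's events.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumption: 30 minutes of inactivity closes a session (not stated in the slides).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Number the sessions of each visitor.

    `events` is an iterable of (visitor_id, timestamp, page) tuples.
    Returns (visitor_id, session_no, timestamp, page) tuples, ordered by
    visitor and time. A new session starts on a visitor's first event and
    whenever the gap since that visitor's previous event exceeds `timeout`.
    """
    ordered = sorted(events, key=itemgetter(0, 1))
    result = []
    for visitor_id, visitor_events in groupby(ordered, key=itemgetter(0)):
        session_no = 0
        previous_ts = None
        for _, ts, page in visitor_events:
            if previous_ts is None or ts - previous_ts > timeout:
                session_no += 1
            result.append((visitor_id, session_no, ts, page))
            previous_ts = ts
    return result
```

Sorting by visitor and timestamp first mirrors what a partitioned, ordered window does: each visitor's events are examined in time order, independently of every other visitor.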
13. When the journey started…
Customer data with up to 7 years of history
Standard architecture: staging, persistent staging, integration area, and dimensional data
Enable BI reporting and small to medium predictive analytics
Data preparation, model development, and scoring
Customer EDW built up over the years
But time to value was too long for complex predictive analytics
14. …Then we enabled complex predictive analytics with existing data
Data: Replicated EDW dimensional data
Data preparation: MPP Analytic DB for development & scoring
Model development & Scoring: MPP-enabled In-DB Statistics SW
Added an MPP environment to process existing data
15. Enable complex predictive analytics with existing data (cont’d)
Next we looked at data too big to fit in this environment
16. …Then came the Hadoop extension to handle large data
Enable large data in predictive analytics
18. Building Big Data Analytics – Lessons Learned
Maximize value of your existing assets (Enterprise Data Warehouse). Do not start from scratch.
No need to solve “3 V’s” all at once.
Technology (Hadoop, etc.) is a means to the end.
Wrong question to Business: “What business value do you plan to get out of Hadoop?”
Focus on the right business – not technology – use cases.
Data first
Evolve with controlled revolutionary changes
19. Building Big Data Analytics – Lessons Learned (Cont’d)
Deliver fast and often.
Fail fast and adjust.
Involve Customer (business) in the solution from day one.
Big Data Competency
Agile principles help a lot
Ease of Use
Pay special attention to skill sets in IT and Business
Important to enable Business to do exploratory/discovery BI or exploratory data analysis
20. My latest dedication to the Hadoop community…
When Hadoop shines on the mountain
RDBMS is on the run
It’s a new day, it’s a new way
YARN is live, Arun thanks a Ton
Una Paloma Blanca
For Batch we’re using Hive
Una Paloma Blanca
with Spark, real-time is alive
Yes no one can take
Our Hadoop away
Yes no one can take
Your Hadoop away
21. Our journey does not end here….
Set up Fidelity Hadoop User Group (200+ members)
Quarterly technology events to share use cases, success stories and lessons learnt (100+ attendance)
Leverage music and videos to connect with users
Build a solid Big Data Team
Deliver actual Business Value by using Hadoop
Leverage the power of YARN, Spark, newer versions of Hive
Work towards building a Customer Analytics Platform
24. I would….
Data volumes are exploding
Backups are getting delayed...
Cycles are moving slowly
Our users are running away...
Hadoop Cluster is all set up
Eager for Webstats to come
Data scientist excited
When Webstats will hit a home run
Should we go to Hadoop
Well if ... it was me
I Would... I Would.....
Should we go to Hadoop
Well if ... it was me
I Would... I Would.....
25. Hadoop Bollywood Song – Sar jo tera Chakaraye
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
My Hadoop is open source, Hive and Pig are close to the heart
Play with YARN, Spark, Scala and Impala every day
Listen, listen, listen, hey mister, listen, this Hadoop has many great virtues
It is the one cure for a hundred thousand troubles, why not give it a try
Why worry, why worry
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
When deadlines turn into a fight, when SLAs grind you down
It lifts the burden of delivery, when the concept is solid
Listen, listen, listen, hey mister, listen, this Hadoop has many great virtues
It is the one cure for a hundred thousand troubles, why not give it a try
Why worry, why worry
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
If your code torments you
26. Credits
Song 1: Nothing’s Gonna Change My Love for You, by George Benson
Song 2: Una Paloma Blanca, by George Baker
Song 3: I Would, by One Direction
Song 4 (original soundtrack): Sar jo Tera Chakaraye, by Mohammed Rafi (movie: Pyaasa, 1957)
Hadoop lyrics for all songs by Shashin Surkund
Editor's Notes
Talking Points:
Our journey started with ingesting and processing web data in Hadoop and making it available on a daily basis for exploration and modeling