Enabling our Customer Advanced Analytics Environment (AAE)
Embracing Hadoop with a musical touch!
Hadoop Summit, San Jose CA // June 09-11, 2015
Speaker(s): Shashin Surkund and Arindam Paul
Company: Fidelity Investments
2
Why are we here today?
• Evolution with planned yearly revolutionary changes
– Environment, architecture, and results
• Lessons learned
• And ……
To share our story about our Big data journey……
For enabling our Customer Data Analytics Platform
3
Advanced Analytics Environment (AAE) journey - Timeline
Take baby steps to achieve something great…
• Tried our hand at Hadoop
• Too early for us to jump in
• Establish Hadoop User Group and host multiple tech events
• Deliver web data (clickstream) with multi-year history
• Enrich predictive model with web drivers
• Streamline batch ingestion framework
• Hadoop integral part of the advanced analytics platform
• Hadoop security and governance
• Lambda architecture
• Omni-channel big data ingestion
• Real-time processing
• Hadoop becomes our advanced analytics platform
• Fidelity embraces Hadoop
• Sets up two clusters [prod and non-prod]
• Our team kicks off our first adventure (web data)
• Kick off multiple proofs of concept
4
Hadoop has touched our hearts and souls…
5
Nothing’s gonna change my love for you…
If I had to run my jobs
without you HADOOP
The days would all go waiting
The nights would seem so long
With you I see our data oh so clearly
With Hive, Impala and Mahout
But it never felt this strong
Our dreams are young and we both know
They'll take us where we want to go
Ingest me now, process me now
I don't want to live without you
Nothing's gonna change my love for you
You ought to know by now how much I love you
One thing you can be sure of
I'll only ask for 1000 MAP SLOTS :)
The road ahead for us is not so easy
Arun will lead the way for us
Like a guiding star
Doug is there for us
if we should need him
You don't have to change a thing
We'll love you just the way you are
We’ll come to you, QUERY thru HUE
You’ll help us do AAE too
Ingest me now, process me now
I don't want to live without you
Nothing's gonna change my love for you
You ought to know by now how much I love you
One thing you can be sure of
I'll only ask for 1000 MAP SLOTS :)
6
Onboarding Web Data on Hadoop
7
Why Hadoop? Web Data use case
Technical Challenges
• Increasing data volumes
• Closed ecosystem
• Complex data processing
• Operational challenges
Big Data Opportunities
Solution Capabilities
• Advanced analytics [AAE]
• Predictive modelling and real-time scoring
• Scalable, cost-effective, and open source
• Industry tested and future of data warehousing
8
Three V’s of big data
Variety
• Web data ingestion
• Omni-channel ingestion
Velocity
• Batch ingestion in production
• Intra-day
• Near real-time processing
Volume
• Multi-year history
• Terabytes and growing
9
Hadoop welcomed us with an open canvas…
10
Web data Hadoop implementation…
From a Star to a Super Star……
RDBMS
• Highly normalized using a star schema data model
• Daily grain partitioned by date
• Compress historic read-only partitions for space savings
• Daily ETL cycle takes 16-18 hours to complete
Hadoop
• Simplified de-normalized design resulting in one clickstream table
• Leverage Hive complex data types to store detail attributes
• Partition by date for easy and efficient access
• Use RCFile format with block-level Snappy compression
• Cluster visitors into 128 buckets to facilitate advanced map joins and sampling
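A minimal Python sketch of why clustering visitors into 128 buckets helps with sampling and map joins. It is an illustration only, not the team's implementation: the row fields (`visitor_id`, `dt`, `page`) are hypothetical, and `zlib.crc32` is a stand-in for whatever hash the engine applies (Hive uses its own). The idea is that a deterministic hash sends every row for a given visitor to the same bucket, so reading one bucket yields a subset of visitors with their complete history, and two tables bucketed the same way on the same key can be joined bucket by bucket.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 128  # bucket count taken from the slide


def bucket_for(visitor_id, num_buckets=NUM_BUCKETS):
    """Return a stable bucket number in [0, num_buckets) for a visitor."""
    # crc32 is only a stand-in for a deterministic hash; Hive applies its own.
    return zlib.crc32(visitor_id.encode("utf-8")) % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group clickstream rows by the bucket of their visitor_id."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["visitor_id"], num_buckets)].append(row)
    return buckets


def sample(buckets, bucket_no):
    """Read a single bucket: a subset of visitors, each with all of their rows."""
    return list(buckets.get(bucket_no, []))


# Hypothetical rows; the field names are illustrative only.
clicks = [
    {"visitor_id": "v-001", "dt": "2015-06-09", "page": "/home"},
    {"visitor_id": "v-002", "dt": "2015-06-09", "page": "/login"},
    {"visitor_id": "v-001", "dt": "2015-06-10", "page": "/accounts"},
]
buckets = bucketize(clicks)
one_visitor_bucket = sample(buckets, bucket_for("v-001"))
```

Because the bucket is derived from the visitor rather than from the row, a sample taken this way never splits a visitor's clickstream across the sampled and unsampled sets.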
11
How we did it?
Stages: Ingest, Transform, Load
Hadoop Technology Stack
• Ingest: Hive, Perl, MapReduce
• Transform: Hive, Java UDF
• Load: Hive, Java UDF, Pig
Batch Cycle
• Data standardization
• Data cleansing
• Data enrichment
• Page fixing
• Sessionize
• Session flagging
• Publish clickstream
Common Framework
• Data audit framework
• Persistent staging area
• Data retention policies
• Role based security model
• Enterprise Scheduler
Lessons Learned
• Importance of data cleansing and audits
• Hive supported column and row delimiters
• Hive file formats and compression types
• Edge Server processing is needed
• Hive UDF best practices
• Map joins
• Addition of professional services helped ramp up the team faster.
• Pig DataFu libraries [don’t re-invent the wheel]
• Clustering and bucketing of data
• Hive windowing functions
• Hive complex data types
• Over communicate and build strong network
• Take small deliberate steps forward
• You will hit speed-bumps, but the team will persevere
• It is a journey in a fast changing technology space
• Engage Professional services for architecture guidance
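The "Sessionize" step in the batch cycle above can be sketched in Python as follows. This is a hedged illustration of the general technique, not the deck's Hive or Pig code: the 30-minute inactivity gap is an assumed convention (the slides do not state a threshold), and the `(visitor_id, timestamp)` event shape is hypothetical. Events are ordered per visitor, and a new session starts whenever the gap since that visitor's previous event exceeds the threshold, which is the same logic a windowing function over a visitor partition would express.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

SESSION_GAP = timedelta(minutes=30)  # assumed threshold, not from the slides


def sessionize(events, gap=SESSION_GAP):
    """Assign a per-visitor session number to each (visitor_id, timestamp) event.

    Returns a list of (visitor_id, timestamp, session_no) sorted by visitor
    and time. A new session starts when the time since the visitor's
    previous event is strictly greater than `gap`.
    """
    result = []
    ordered = sorted(events, key=itemgetter(0, 1))
    for visitor, visitor_events in groupby(ordered, key=itemgetter(0)):
        session_no = 0
        previous_ts = None
        for _, ts in visitor_events:
            if previous_ts is None or ts - previous_ts > gap:
                session_no += 1
            result.append((visitor, ts, session_no))
            previous_ts = ts
    return result


# Hypothetical events for two visitors.
events = [
    ("a", datetime(2015, 6, 9, 9, 0)),
    ("b", datetime(2015, 6, 9, 9, 5)),
    ("a", datetime(2015, 6, 9, 9, 10)),
    ("a", datetime(2015, 6, 9, 10, 0)),
]
sessions = sessionize(events)
```

Session flagging, the next step on the slide, would then attach attributes to each `(visitor_id, session_no)` pair once the boundaries are known.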
12
Our Advanced Analytics Platform Journey
13
When the journey started…
• Customer data with up to 7 years of history
• Standard architecture: staging, persistent staging, integration area, and dimensional data
• Enable BI reporting and small to medium predictive analytics
• Data preparation, model development, and scoring
Customer EDW built up over the years
But time to value too long for complex predictive analytics
14
…Then we enabled complex predictive analytics with existing data
• Data: Replicated EDW dimensional data
• Data preparation: MPP analytic DB for development & scoring
• Model development & scoring: MPP-enabled in-DB statistics software
Added an MPP environment to process existing data
15
Enable complex predictive analytics with existing data (cont’d)
Next we looked at data too big to fit in this environment
16
…Then came the Hadoop extension to handle large data
Enable large data in predictive analytics
17
Looking ahead…
18
Building Big Data Analytics – Lessons Learned
• Maximize value of your existing assets (Enterprise Data Warehouse). Do not start from scratch.
• No need to solve “3 V’s” all at once.
• Technology (Hadoop, etc.) is a means to the end.
• Wrong question to Business: “What business value do you plan to get out of Hadoop?”
Focus on the right business – not technology – use cases.
Data first
Evolve with controlled revolutionary changes
19
Building Big Data Analytics – Lessons Learned (Cont’d)
• Deliver fast and often.
• Fail fast and adjust.
• Involve Customer (business) in the solution from day one.
Big Data Competency
Agile principles help a lot
Ease of Use
• Pay special attention to skill sets in IT and Business
• Important to enable Business to do exploratory/discovery BI or exploratory data analysis
20
My latest dedication to the Hadoop community…
When Hadoop shines on the mountain
RDBMS is on the run
It’s a new day, it’s a new way
YARN is live, Arun thanks a Ton
Una Paloma Blanca
For Batch we’re using Hive
Una Paloma Blanca
with Spark, real-time is alive
Yes no one can take
Our Hadoop away
Yes no one can take
Your Hadoop away
21
Our journey does not end here….
• Setup Fidelity Hadoop User Group (200+ members)
• Quarterly technology events to share use cases, success stories and lessons learnt (100+ attendance)
• Leverage music and videos to connect with users
• Build a solid Big Data Team
• Deliver actual Business Value by using Hadoop
• Leverage the power of YARN, Spark, newer versions of Hive
• Work towards building a Customer Analytics Platform
22
Thank you
• Shashin.Surkund@fmr.com
• Arindam.Paul@fmr.com
23
Appendix
Additional Hadoop Sound Tracks
24
I would….
Data volumes are exploding
Backups are getting delayed...
Cycles are moving slowly
Our users are running away...
Hadoop Cluster is all setup
Eager for Webstats to come
Data scientist excited
When Webstats will hit a home run
Should we go to Hadoop
Well if ... it was me
I Would... I Would.....
Should we go to Hadoop
Well if ... it was me
I Would... I Would.....
25
Hadoop Bollywood Song – Sar jo tera Chakaraye
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
My Hadoop is open source, Hive and Pig are close to the heart
Play with YARN, Spark, Scala, and Impala every day
Listen, listen, listen, hey mister, listen: this Hadoop has great virtues
One cure for a hundred thousand troubles, why not give it a try
Why worry, why worry
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
Be it a fight over deadlines or the grind of SLAs
It lifts the burden of delivery when the concept is solid
Listen, listen, listen, hey mister, listen: this Hadoop has great virtues
One cure for a hundred thousand troubles, why not give it a try
Why worry, why worry
If your code torments you, if the logic gets complex
Come to us, dear friend, why worry, why worry
If your code torments you
26
Credits
Song 1: Nothing’s Gonna Change My Love for You, by George Benson
Song 2: Una Paloma Blanca, by George Baker
Song 3: I Would, by One Direction
Song 4 (original soundtrack): Sar jo Tera Chakaraye, by Mohammed Rafi (Movie: Pyaasa, 1957)
Hadoop lyrics for all songs by Shashin Surkund

Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

Embracing Hadoop with a musical touch!

  • 1. Enabling our Customer Advanced Analytics Environment (AAE): Embracing Hadoop with a musical touch!
      Hadoop Summit, San Jose CA // June 09-11, 2015
      Speaker(s): Shashin Surkund and Arindam Paul
      Company: Fidelity Investments
  • 2. Why are we here today?
      • Evolution with planned yearly revolutionary changes
          – Environment, architecture, and results
      • Lessons learned
      • And ……
      To share our story about our Big Data journey…… for enabling our Customer Data Analytics Platform
  • 3. Advanced Analytics Environment (AAE) journey - Timeline
      Take baby steps to achieve something great…
      • Tried our hand at Hadoop
      • Too early for us to jump in
      • Establish Hadoop User Group and host multiple tech events
      • Deliver web data (clickstream) with multi-year history
      • Enrich predictive model with web drivers
      • Streamline batch ingestion framework
      • Hadoop integral part of the advanced analytics platform
      • Hadoop security and governance
      • Lambda architecture
      • Omni-channel big data ingestion
      • Real-time processing
      • Hadoop becomes our advanced analytics platform
      • Fidelity embraces Hadoop
      • Sets up two clusters [prod and non-prod]
      • Our team kicks off our first adventure (web data)
      • Kick off multiple proofs of concept
  • 4. Hadoop has touched our hearts and souls…
  • 5. 5 Nothing’s gonna change my love for you… If I had to run my jobs without you HADOOP The days would all go waiting The nights would seem so long With you I see our data oh so clearly With Hive, Impala and Mahout But it never felt this strong Our dreams are young and we both know They'll take us where we want to go Ingest me now, process me now I don't want to live without you Nothing's gonna change my love for you You ought to know by now how much I love you One thing you can be sure of I'll only ask for 1000 MAP SLOTS :) The road ahead for us is not so easy Arun will lead the way for us Like a guiding star Doug is there for us if we should need him You don't have to change a thing We'll love you just the way you are We’ll come to you, QUERY thru HUE You’ll help us do AAE too Ingest me now, process me now I don't want to live without you Nothing's gonna change my love for you You ought to know by now how much I love you One thing you can be sure of I'll only ask for 1000 MAP SLOTS :)
  • 7. Why Hadoop? Web Data use case
      Technical Challenges
      • Increasing data volumes
      • Closed ecosystem
      • Complex data processing
      • Operational challenges
      Big Data Opportunities / Solution Capabilities
      • Advanced analytics [AAE]
      • Predictive modelling and real-time scoring
      • Scalable, cost effective and open source
      • Industry tested and future of data warehousing
  • 8. Three V’s of big data
      Variety
      • Web data ingestion
      • Omni-channel ingestion
      Velocity
      • Batch ingestion in production
      • Intra-day
      • Near real-time processing
      Volume
      • Multi-year history
      • Terabytes and growing
  • 9. Hadoop welcomed us with an open canvas…
  • 10. Web data Hadoop implementation… From a Star to a Super Star……
      RDBMS
      • Highly normalized using a star schema data model
      • Daily grain partitioned by date
      • Compress historic read-only partitions for space savings
      • Daily ETL cycle takes 16-18 hours to complete
      Hadoop
      • Simplified de-normalized design resulting in one clickstream table
      • Leverage Hive complex data types to store detail attributes
      • Partition by date for easy and efficient access
      • Use RCFile format with block-level Snappy compression
      • Cluster visitors into 128 buckets to facilitate advanced map joins and sampling
  • 11. How we did it?
      Stages: Ingest, Transform, Load
      Hadoop Technology Stack
      • Hive
      • Perl
      • Map-reduce
      • Hive
      • Java UDF
      • Hive
      • Java UDF
      • Pig
      Batch Cycle
      • Data standardization
      • Data cleansing
      • Data enrichment
      • Page fixing
      • Sessionize
      • Session flagging
      • Publish clickstream
      Common Framework
      • Data audit framework
      • Persistent staging area
      • Data retention policies
      • Role-based security model
      • Enterprise scheduler
      Lessons Learned
      • Importance of data cleansing and audits
      • Hive supported column and row delimiters
      • Hive file formats and compression types
      • Edge server processing is needed
      • Hive UDF best practices
      • Map joins
      • Addition of professional services helped ramp up the team faster
      • Pig DataFu libraries [don’t re-invent the wheel]
      • Clustering and bucketing of data
      • Hive windowing functions
      • Hive complex data types
      • Over-communicate and build a strong network
      • Take small deliberate steps forward
      • You will hit speed bumps, but the team will persevere
      • It is a journey in a fast-changing technology space
      • Engage professional services for architecture guidance
  • 12. Our Advanced Analytics Platform Journey
  • 13. When the journey started…
      Customer EDW built up over the years
      • Customer data with up to 7 years of history
      • Standard architecture: staging, persistent staging, integration area, and dimensional data
      • Enable BI reporting and small to medium predictive analytics
      • Data preparation, model development, and scoring
      But time to value was too long for complex predictive analytics
  • 14. …Then we enabled complex predictive analytics with existing data
      Added an MPP environment to process existing data
      • Data: replicated EDW dimensional data
      • Data preparation: MPP analytic DB for development & scoring
      • Model development & scoring: MPP-enabled in-DB statistics software
  • 15. Enable complex predictive analytics with existing data (cont’d)
      Next we looked at data too big to fit in this environment
  • 16. …Then came the Hadoop extension to handle large data
      Enable large data in predictive analytics
  • 18. Building Big Data Analytics – Lessons Learned
      Themes: Data first; Evolve with controlled revolutionary changes; Focus on the right business – not technology – use cases
      • Maximize the value of your existing assets (Enterprise Data Warehouse). Do not start from scratch.
      • No need to solve the “3 V’s” all at once.
      • Technology (Hadoop, etc.) is a means to the end.
      • Wrong question to ask the business: “What business value do you plan to get out of Hadoop?”
  • 19. Building Big Data Analytics – Lessons Learned (Cont’d)
      Themes: Big Data Competency; Agile principles help a lot; Ease of Use
      • Deliver fast and often.
      • Fail fast and adjust.
      • Involve the customer (business) in the solution from day one.
      • Pay special attention to skill sets in IT and Business.
      • It is important to enable Business to do exploratory/discovery BI or exploratory data analysis.
  • 20. 20 My latest dedication to the Hadoop community… When Hadoop shines on the mountain RDMS is on the run It’s a new day, it’s a new way YARN is live, Arun thanks a Ton Una Paloma Blanca For Batch we’re using Hive Una Paloma Blanca with Spark, real-time is alive Yes no one can take Our Hadoop away Yes no one can take Your Hadoop away
  • 21. Our journey does not end here….
      • Set up Fidelity Hadoop User Group (200+ members)
      • Quarterly technology events to share use cases, success stories and lessons learnt (100+ attendance)
      • Leverage music and videos to connect with users
      • Build a solid Big Data team
      • Deliver actual business value by using Hadoop
      • Leverage the power of YARN, Spark, and newer versions of Hive
      • Work towards building a Customer Analytics Platform
  • 22. Thank you
      • Shashin.Surkund@fmr.com
      • Arindam.Paul@fmr.com
  • 24. 24 I would…. Data volumes are exploding Backups are getting delayed... Cycles are moving slowly Our users are running away... Hadoop Cluster is all setup Eager for Webstats to come Data scientist excited When Webstats will hit a home run Should we go to Hadoop Well if ... it was me I Would... I Would..... Should we go to Hadoop Well if ... it was me I Would... I Would.....
  • 25. 25 Hadoop Bollywood Song – Sar jo tera Chakaraye Code jo tera Tadpaye, Logic complex ho jaye aajaa pyaare paas hamaare, kaahe ghabaraay, kaahe ghabaraay Hadoop mera open source, Hive aur Pig Dil ke close Yarn, Spark, Scala, Impala se khelo tum har roz Sun Sun Sun, aare babu sun, iss Hadoop mein bade bade gun laakh dukho ki ek davaa hai, kyun naa aazamaaye kahe ghabaraaye, kahe ghabaraaye Code jo tera Tadpaye, Logic complex ho jaye aajaa pyaare paas hamaare, kaahe ghabaraay, kaahe ghabaraay Deadline ka ho jhagdaa, SLA kaa ho ragadaa Delivery ka bhoj hatadee, Concept jab ho tagdaa Sun Sun Sun, aare babu sun, iss Hadoop mein bade bade gun laakh dukho ki ek davaa hai, kyun naa aazamaaye kahe ghabaraaye, kahe ghabaraaye Code jo tera Tadpaye, Logic complex ho jaye aajaa pyaare paas hamaare, kaahe ghabaraay, kaahe ghabaraay Code Tadpaye
  • 26. Credits
      • Song 1: Nothing’s Gonna Change My Love for You, by George Benson
      • Song 2: Una Paloma Blanca, by George Baker
      • Song 3: I Would, by One Direction
      • Song 4: Original soundtrack: Sar Jo Tera Chakraye, by Mohammed Rafi (Movie: Pyaasa, 1957)
      Hadoop lyrics for all songs by Shashin Surkund
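
Slide 10 mentions clustering visitors into 128 buckets to help with map joins and sampling. The idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the CRC32 hash, the `visitor_id` field name and the sample rows are hypothetical and are not taken from the talk, and Hive uses its own hash function when it buckets a table.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 128  # bucket count stated on slide 10


def bucket_for(visitor_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a visitor id to a bucket number using a stable hash.

    CRC32 is an assumption chosen for this sketch; Hive applies its own hash.
    """
    return zlib.crc32(visitor_id.encode("utf-8")) % num_buckets


def assign_buckets(rows):
    """Group clickstream rows by the bucket of their visitor id."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["visitor_id"])].append(row)
    return buckets


# Hypothetical clickstream rows: 1000 distinct visitors, 5000 page views.
rows = [{"visitor_id": f"v{i % 1000}", "page": f"/p{i}"} for i in range(5000)]
buckets = assign_buckets(rows)

# Sampling: reading one bucket returns every row for a subset of visitors,
# so per-visitor behaviour stays complete inside the sample.
sample = buckets.get(bucket_for("v42"), [])
```

Because every row for a given visitor lands in the same bucket, a join between two datasets bucketed the same way only needs to compare bucket i with bucket i, which is the property that bucketed map joins rely on.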

Editor's Notes

  1. Talking Points: Our journey started with ingesting and processing web data in Hadoop and making it available on a daily basis for exploration and modeling.
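
The web data processing described in this note includes a "Sessionize" step in the batch cycle on slide 11. A minimal sketch of what sessionizing clickstream data can look like is shown below. The 30-minute inactivity gap, the tuple layout and the sample clicks are assumptions made for illustration; the talk does not state the rule the team used, and their pipeline ran on Hive, Java UDFs and Pig rather than plain Python.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold


def sessionize(clicks, gap=SESSION_GAP):
    """Assign a per-visitor session number to each click.

    clicks: iterable of (visitor_id, timestamp, page) tuples.
    Returns a list of (visitor_id, session_no, timestamp, page) tuples.
    A new session starts when the gap since the visitor's previous click
    exceeds `gap`.
    """
    result = []
    ordered = sorted(clicks, key=itemgetter(0, 1))  # by visitor, then time
    for visitor, group in groupby(ordered, key=itemgetter(0)):
        session_no = 0
        prev_ts = None
        for _, ts, page in group:
            if prev_ts is None or ts - prev_ts > gap:
                session_no += 1
            result.append((visitor, session_no, ts, page))
            prev_ts = ts
    return result


# Hypothetical clicks for two visitors.
t0 = datetime(2015, 6, 9, 9, 0)
clicks = [
    ("v1", t0, "/home"),
    ("v1", t0 + timedelta(minutes=5), "/funds"),
    ("v1", t0 + timedelta(minutes=50), "/home"),
    ("v2", t0 + timedelta(minutes=1), "/login"),
]
sessions = sessionize(clicks)
```

In this example the third click by `v1` arrives 45 minutes after the second, so it opens a second session. The same comparison against the previous row per visitor is the kind of logic that the Hive windowing functions listed under lessons learned can express.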