IEG 201402 INTUIT Building Big Data Analytics Platform

Information Excellence Group 2014 Spring "Business Analytics Industry Summit", Building Big Data Analytics Platform, Neeta Pande, Data Architect, INTUIT

    Presentation Transcript

    • Building Big Data Analytics Platform at Intuit. Neeta Pande, Intuit, 8 Feb 2014
    • Roadmap
        – Setting context and introduction to the Analytics Platform at Intuit
        – Key highlights that differentiate the platform
        – Sharing experiences building the platform
        – Wish-list of capabilities for the future of Big Data technologies
    • Setting Context and Intro to the Analytics Platform
    • Quick look into Intuit Offerings
    • Introduction to the Analytical Platform
        – Central repository of analytical data from Intuit products, Intuit business systems, Intuit master systems, and external data sources
        – Caters to product managers, product developers, data analysts, data scientists, and experience designers
        – Enterprise-wide platform for cross-Intuit data analytics
    • Technologies used to build the platform [slide graphic; the only label recoverable from the transcript is HCatalog]
    • Key highlights that differentiate the platform
    • Capability View of the Platform
        – Consumers: Management, PM, PD, Data Analyst, Data Scientist
        – Policy-based access control
        – Central Analytics Platform
        – Processing modes: near real time, batch, real time
        – Data integration
        – Data sources: product user-entered data, product usage data, business data, master data, external data
    • Key differentiators of the Platform
        – DWH semantic layers on Hadoop
        – Sensitive information co-hosted on the same infrastructure
        – Batch, near real time, and real time on the same infrastructure
        – Mobile, web, and desktop offerings
        – Enterprise-wide data across all offerings and cross-offerings
    • Data Pipeline and Challenges
        – Data Acquisition: loading data from transactional sources is a challenge
        – Data Securitization: encryption of sensitive information; tokenization for join optimization on sensitive fields; analytical information is extracted before encryption
        – Data Cleansing and Data Standardization: need third-party libraries; are part of the same flow and need Hadoop integration
        – Entity Mastering: MDM solutions from major vendors do not provide mastering in Hadoop
        – Incremental load and DWH load: DWH patterns such as SCD, surrogate keys, and fact updates are challenging
        – Data Consumption: interactive exploration happens in an MPP RDBMS because of advanced SQL and query performance; sampling and extraction are used for building models in R
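
The slide mentions tokenization for join optimization on sensitive fields but does not say how it was done. One common way to get that effect is a keyed, deterministic hash: equal inputs always produce equal tokens, so two datasets can be joined on the token without decrypting the underlying value. The sketch below is a minimal illustration under that assumption, not the platform's actual implementation. The key, the function names (`tokenize`, `with_token`, `join_on_token`), and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only. In a real deployment the key would
# come from a key manager (the slides mention an HSM), never from source code.
TOKEN_KEY = b"demo-key-not-for-production"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic, non-reversible token for a sensitive value."""
    normalized = value.strip().lower()  # so trivially different spellings match
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def with_token(rows):
    """Drop the raw email from each row and add its token instead."""
    return [
        {**{k: v for k, v in row.items() if k != "email"},
         "email_token": tokenize(row["email"])}
        for row in rows
    ]


def join_on_token(left, right):
    """Inner join of two row lists on the token column."""
    index = {row["email_token"]: row for row in right}
    return [
        {**row, **index[row["email_token"]]}
        for row in left
        if row["email_token"] in index
    ]


# Two hypothetical datasets that share a sensitive field (email).
customers = [
    {"email": "Ann@example.com", "segment": "small business"},
    {"email": "bob@example.com", "segment": "consumer"},
]
usage = [
    {"email": " ann@example.com ", "logins": 12},
    {"email": "carol@example.com", "logins": 3},
]

joined = join_on_token(with_token(customers), with_token(usage))
print(joined)
```

The joined rows carry only the token, never the raw email, so downstream users can correlate records without access to the decryption path.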
    • Sharing Experiences building the platform
    • Entity mastering: custom implementation of a mastering solution in Hadoop
        – Hadoop does not provide an out-of-the-box solution
        – Leading MDM solutions do not have Hadoop integration
        – Evaluated third-party solutions; not mature enough
        – Some open source tools have MDM capabilities, but they are not mature or widely adopted
    • Data security: custom implementation of symmetric-key encryption/decryption
        – Key management using an HSM (SafeNet)
        – Decryption UDFs in MapReduce, Pig, and Hive shield developers and users from the security implementation
    • Data quality: evaluated Informatica Data Quality and found it a good fit for data cleansing and standardization, integrated in the same flow as batch data integration
    • Batch data integration: evaluated the Big Data Integration capabilities of Informatica and found them relevant for the platform
    • Real time: using Flume for real-time use cases; found Kafka and Storm to be a good fit from the point of view of several requirements
    • DWH: traditional DWH patterns and incremental loads are challenging on Hadoop
        – Upserts and SCD are handled best in HBase and exposed via HCatalog for querying
    • Ad hoc querying: capabilities are not yet mature or widely adopted, so an MPP RDBMS is still preferred
    • Machine learning: large-scale machine learning infrastructure is still being adopted, so widely used technology options are not yet in place
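
To make the SCD and upsert point concrete, here is a minimal in-memory sketch of a Type 2 slowly changing dimension: when a tracked attribute changes, the current row is closed and a new row with a fresh surrogate key is appended. It only illustrates why the pattern needs row-level updates, which is awkward on append-only files. It is not the HBase-based approach the slides describe, and `DimRow`, `apply_scd2`, and the sample data are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    surrogate_key: int
    natural_key: str
    attributes: Dict[str, str]
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(dim: List[DimRow], natural_key: str,
               attributes: Dict[str, str], load_date: date) -> List[DimRow]:
    """Upsert one incoming record into the dimension using SCD Type 2 rules."""
    current = next(
        (r for r in dim if r.natural_key == natural_key and r.valid_to is None),
        None,
    )
    if current is not None and current.attributes == attributes:
        return dim  # nothing changed, keep the current version open
    if current is not None:
        current.valid_to = load_date  # close the old version (an in-place update)
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(DimRow(next_key, natural_key, dict(attributes), load_date))
    return dim


dim: List[DimRow] = []
apply_scd2(dim, "cust-1", {"plan": "basic"}, date(2014, 1, 1))
apply_scd2(dim, "cust-1", {"plan": "basic"}, date(2014, 1, 15))   # unchanged
apply_scd2(dim, "cust-1", {"plan": "premium"}, date(2014, 2, 1))  # new version

for row in dim:
    print(row)
```

The step that closes the old version is the in-place update that a store with row-level writes supports directly.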
    • Wish-list for future of Hadoop
    • Wish-list items
        – Data security support built into the platform
        – MDM solutions integrated and optimized for the platform
        – Interactive querying capabilities on big data platforms (Impala, Tez)
        – Better support for traditional DWH capabilities
        – Integrated real-time, near-real-time, and batch processing pipelines
        – Distributed machine learning technologies with comprehensive and advanced capabilities
        – Open source end-to-end data quality solutions integrated with the platform
    • Q&A Thank you
    • About Information Excellence Group Community Focused Volunteer Driven Knowledge Share Accelerated Learning Collective Excellence Distilled Knowledge Shared, Non Conflicting Goals Validation / Brainstorm platform Progress Information Excellence Towards an Enriched Profession, Business and Society Mentor, Guide, Coach Satisfied, Empowered Professional Richer Industry and Academia
    • About Information Excellence Group: reach us at
        – blog: http://informationexcellence.wordpress.com/
        – presentations: http://www.slideshare.net/informationexcellence
        – LinkedIn: http://www.linkedin.com/groups/Information-Excellence-3893869
        – Facebook: http://www.facebook.com/pages/Information-excellence-group/171892096247159
        – Google+: https://plus.google.com/u/0/communities/102316155996060621595
        – twitter: #infoexcel
        – email: informationexcellence@compegence.com, informationexcellencegroup@gmail.com
        – Have you enriched yourself by contributing to the community knowledge share?