Platfora - An Analytics Sandbox In A World Of Big Data

As Big Data becomes the norm in dealing with data volume, variety, and velocity, it becomes increasingly hard for the Data Analyst to understand and work with data sets. To overcome this, we introduce Platfora, a Hadoop-backed data analysis framework that complements more traditional data warehousing and BI solutions. This presentation covers ingestion of new data and the building of data sets and visualizations in a system that requires no more work than interacting with a graphical interface. You'll see examples from peer-to-peer lending and how insights on loan applicants and their risk profiles can be quickly revealed with no ETL development or demanding data transformation.

Published in: Technology


  1. ©2014 DesignMind. All Rights Reserved. An Analytics Sandbox in a World of Big Data. Roberto Arnetoli, roberto@designmind.com, Vice President, Big Data Solutions. Andrew Eichenbaum, andrew@designmind.com, Principal Data Science Consultant. Platfora
  2. DesignMind’s Expertise and Offering: Power BI Applications Databases Data Warehousing Big Data BI & Data Visualization Information Sharing & Collaboration Cloud Computing Data Science
  3. Our Clients
  4. Agenda: Big Data and Self-Service Analytics; Platfora; Case Study: Peer-2-Peer Lending; Demo; Conclusion and Questions
  5. Big Data and Self-Service Analytics
  6. What is Big Data? Large data sets; excessive retrieval and processing time; structured and unstructured collections
  7. SQL vs. Big Data: volume, velocity, variety
  8. We tend to structure data: we tend to prepare, transform and structure data; several advantages; several non-trivial disadvantages. Traditional Data Warehouse, Big Data Platform
  9. For today’s Data Scientists it is simply not enough! mail feeds additional databases multimedia logs social geo e-commerce unstructured text web. Traditional Data Warehouse, Big Data Platform
  10. For today’s Data Scientists it is simply not enough! self-service analytics platform; ‘analytics sandbox’; significantly reduce time and costs. Traditional Data Warehouse, Big Data Platform
  11. DesignMind chooses Platfora: Microsoft Gold Data Platform Partner and Silver BI Partner, Cloudera Partner, Platfora Partner; data analytics winning solution: maximize the value of their data, make fact-based decisions. Big Data Platform, Traditional Data Warehouse, Self-Service Analytics
  12. Platfora
  13. Platfora is an All in One Data Sandbox: Ingest, Select, Explore
  14. Platfora Easily Ingests Data: Delimited Text, XML, JSON, Raw Text, Avro
  15. Platfora Means Hands Off ETL: lenses
  16. Platfora Means Hands Off ETL: Platfora ETL process backed by Hadoop (automatic cluster creation on multiple platforms: Amazon, Cloudera, Hortonworks; cluster sizes from one node to many); automatically handles the handoff of multiple files of any size to the cluster; scheduling available for data reprocessing or updates
  17. Platfora Allows for Easy Data Exploration
  18. Typical Big Data Warehousing Stack: complex linear process; data warehouse access tools have no easy way to access the data from earlier stages; the only way to get new data in is to reprocess the data at the Ingestion and Transformation levels. Ingest, Select, Explore, Transformation, Ingestion
  19. Big Data Warehousing Tools: Pig; Transformation; each step can be complex and needs knowledgeable support staff; Ingestion; BI Tools; data warehousing
  20. Platfora Sits Parallel to the Traditional Stack: Ingest, Select, Explore; Data Catalog, Vizboards, Lenses; Transformation, Ingestion
  21. Case Study: Peer-2-Peer Lending
  22. What is P2P Lending
  24. Completed Loans: Months to Last Payment. Loans can complete in two ways: Charge Off (Default) and Fully Paid. Normal loan durations are 36 and 60 months. Early payoff and Charge Offs follow the same curve after two months of payments. Loan Charge Off rate is approximately 16% for loans completed in the first 18 months.
  25. Loan Stats: Average Revolving to Maximum Credit. When loans are in funding, can we find predictors of default? We look at loan applicants’ total revolving credit (e.g. credit cards) vs. the average revolving credit balance.
  26. Loan Stats: Average Revolving to Maximum Credit
  27. Demo
  28. Demo Notes
  29. Conclusion
  31. Concluding Remarks: quick introduction to Platfora and its abilities (it is a data analytics sandbox that is complementary to current ETL/Warehouse implementations; it allows data practitioners free range to access and use new data easily); Platfora can do a lot more than shown; Platfora is extensible (UDFs allow access to almost any Java routine; data ingestion can be scheduled)
  32. Questions
  33. www.designmind.com
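
The case-study slides on completed loans and on revolving credit describe two simple measures: the Charge Off rate among loans that completed within the first 18 months, and the applicant's average revolving balance compared with total revolving credit. The Python sketch below shows one way such measures could be computed outside Platfora. It is an illustration only: the `Loan` record, its field names, and the sample values are assumptions made up for this example and do not come from the presentation or from any Platfora API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Loan:
    # Hypothetical record layout; field names are assumptions for this sketch.
    status: str                    # "Charged Off" or "Fully Paid"
    months_to_last_payment: int    # months between issue and last payment
    avg_revolving_balance: float   # average revolving credit balance
    total_revolving_credit: float  # total (maximum) revolving credit


def charge_off_rate(loans: List[Loan], max_months: int = 18) -> Optional[float]:
    """Share of loans completed within max_months that ended in Charge Off."""
    completed = [l for l in loans if l.months_to_last_payment <= max_months]
    if not completed:
        return None  # no completed loans in the window, rate is undefined
    charged_off = sum(1 for l in completed if l.status == "Charged Off")
    return charged_off / len(completed)


def revolving_ratio(loan: Loan) -> Optional[float]:
    """Average revolving balance divided by total revolving credit."""
    if loan.total_revolving_credit <= 0:
        return None  # avoid division by zero for applicants with no credit line
    return loan.avg_revolving_balance / loan.total_revolving_credit


# Made-up sample records, not data from the case study.
loans = [
    Loan("Fully Paid", 12, 2000.0, 10000.0),
    Loan("Charged Off", 9, 9000.0, 10000.0),
    Loan("Fully Paid", 36, 500.0, 5000.0),
    Loan("Fully Paid", 17, 1000.0, 4000.0),
]

rate = charge_off_rate(loans)                 # one of the three early completions
ratios = [revolving_ratio(l) for l in loans]  # one ratio per applicant
```

In this sketch the ratio is kept separate from the loan outcome so that it can be grouped or plotted against Charge Off status afterwards, which mirrors the question the slides pose about finding predictors of default while loans are still in funding.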
