Data Blending Best Practices & Developments: Alteryx, Inspire 2016

As greater volumes and types of data become necessary for analysts to make critical business decisions, building streamlined applications for data blending and visualization is now more important than ever. This session presents best practice solutions for three main components of data blending: handling data sources, building efficient workflows, and conducting data exploration. We'll show you how users can optimize each of these components within Alteryx and avoid data analysis errors, thereby saving valuable time in the decision-making process.

To watch a recording of this session from Inspire 2016, visit alteryx.com/inspire-2016-tracks.

Published in: Data & Analytics

  1. Data Blending Best Practices & Developments • Presented by: Ben Gomez, Director – Product Management; Katie Haralson, Product Owner – Designer and Reporting
  2. #inspire16 Session Speaker
  3. Agenda • Handling Input Data: Caching, Sampling, Input Macro • Building Workflows Efficiently: Field Summary – Evaluation, Documentation, Simplify the Process, Formula Tool • Important Details: Testing your work, Limiting Data Movement, Rename Fields, Bulk Loading
  4. To watch a recording of this session from Inspire 2016, visit alteryx.com/inspire-2016-tracks
  5. Things we care about: • Workflow Efficiency • Performance • Memory • Hard Drive Space • Load on servers during production • Development Efficiency • Get to your insights faster • Iterate quickly • Limit load on server as you iterate • Understand your process • Share your process
  6. Handling Input Data
  7. Handling Input Data • Why? • First thing you do in every new workflow • Will affect workflow development • Will affect production performance • Will affect understanding of the process
  8. Handling Input Data • Best Practice – Use caching when you don’t need live data • Currently available with relational databases only
  9. Handling Input Data • Best Practice – Use caching when you don’t need live data • Caching – 53 seconds • Caching – 1.9 seconds!
  10. Handling Input Data • Best Practice – Sample your data • Speeds up processing during workflow development
  11. Handling Data Sources • Best Practice – Utilize Input Macros for frequently used sources
  12. Start Demo • Input Macros • Understanding your data • Documentation • Use Sorts Sparingly
  13. Building Workflows Efficiently
  14. Conducting Data Exploration • Best Practice – Understand your data before workflow development • Field Summary Tool can help to identify hidden data problems which can produce invalid results and slow you down • Duplicate records • Missing values • Unexpected characters • Invalid values or ranges
  15. Documentation
  16. Handling Data Sources • Best Practice – Use sorts sparingly • Sorting data is expensive • When data is joined by fields, a sort is done on the full data set, both sides, unless the data was previously sorted and no operations have been done that invalidate the sort • When sorting, the more data in each record, the longer the sort will take • The Unique tool performs a sort: be aware of extra Unique tools
  17. Building Efficient Workflows • Best Practice – Only keep the data you are using • Don’t keep fields that you don’t need • Don’t create spatial objects until you’re ready to use them; discard them once you are done • Don’t keep duplicate fields • Set data aside and rejoin it later • Best Practice: Add a record ID field early that can be used to rejoin records later
  18. Building Efficient Workflows • Best Practice – Take the time to create clear workflows • Simplify the Process • How would you parse an email? name@domain.com ([^@]*)(@)([^.]*)(.*)
  19. Building Efficient Workflows • Best Practice – Separate out distinctly different formula functionality
  20. Building Efficient Workflows • Best Practice – Separate formulas with distinctly different tasks • One function per formula, unless they are very closely related • Easier to understand the process • Easier to debug • Easier to split out parts of the data • Easier to copy and paste specific functionality
  21. Important Details
  22. Important Details • Best Practice – Build in tests to make sure your work is correct • Create a test for assumptions • Number of records • Results of calculations • Duplicates or not • Eliminates the need for a visual verification • Prevents unnoticed errors down the road
  23. Important Details • Best Practice – Limit data movement • Private Server • Public Server • Amazon (AWS) • S3 • Redshift • Aurora • Azure • SQL Data Warehouse • SQL Server • In-DB
  24. Important Details • Best Practice – Take the time to create clear workflows • Simplify the Process • Rename Fields • Document your Workflow
  25. Important Details • Best Practice – Take the time to create clear workflows • Simplify the Process • Rename Fields • Document your Workflow
  26. Handling Data Sources • Best Practice – Use Bulk Loading when possible • Out In Oracle Teradata SQL Server – requires additional setup (needs native client) Teradata Amazon Redshift
  27. Best Practices – Summary • Limit Data Movement • Utilize Input Macros • Sample your Data • Use Caching! • Evaluate your data • Test your work • Simplify the Process • Rename Fields • Document your workflows • Formula Tool • Bulk Load
  28. Thank you
  29. alteryx.com/trial • You can also achieve the incredible benefits described in this slide deck. Download a FREE trial of Alteryx and experience self-service data analytics on your next data project.
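
The data problems listed on slide 14 (duplicate records, missing values, unexpected characters, invalid values or ranges) can be illustrated outside Alteryx. What follows is a minimal plain-Python sketch of that kind of check, not the Field Summary tool itself; the function name `summarize_field`, its parameters, and the sample values are hypothetical.

```python
from collections import Counter


def summarize_field(values, allowed_chars=None, valid_range=None):
    """Return simple data-quality counts for one field (a list of values)."""
    # Treat None and the empty string as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    summary = {
        "records": len(values),
        "missing": len(values) - len(present),
        # Every repeat beyond the first occurrence counts as a duplicate.
        "duplicates": sum(n - 1 for n in counts.values() if n > 1),
    }
    if allowed_chars is not None:
        # Count values containing at least one character outside the allowed set.
        summary["unexpected_chars"] = sum(
            1 for v in present if set(str(v)) - set(allowed_chars)
        )
    if valid_range is not None:
        low, high = valid_range
        # Count values outside the inclusive [low, high] range.
        summary["out_of_range"] = sum(1 for v in present if not low <= v <= high)
    return summary


# Hypothetical sample fields.
ages = [34, 51, None, 34, 212]
zip_codes = ["80301", "8O301", "", "80301"]  # second value has a letter O

print(summarize_field(ages, valid_range=(0, 120)))
print(summarize_field(zip_codes, allowed_chars="0123456789"))
```

Running a check like this before building the rest of a workflow surfaces the bad age and the mistyped ZIP code early, which is the point of the slide's advice to understand the data first.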
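
Slides 17, 18, and 22 fit together naturally, and the combination can be sketched in plain Python. This is an illustration under stated assumptions, not the presenters' Alteryx workflow: it adds a record ID early, sets unused fields aside, parses the email with the pattern shown on slide 18, rejoins the data, and tests the record count and ID uniqueness. The field name `RecordID`, the helper functions, and the sample rows are hypothetical.

```python
import re

# Pattern copied from slide 18: user, "@", domain name, remainder (e.g. ".com").
EMAIL_PATTERN = re.compile(r"([^@]*)(@)([^.]*)(.*)")


def add_record_ids(rows):
    """Attach a 1-based RecordID to each row (hypothetical field name)."""
    return [{"RecordID": i, **row} for i, row in enumerate(rows, start=1)]


def split_fields(rows, keep):
    """Split rows into a slim set (RecordID + keep) and a set-aside lookup."""
    slim, aside = [], {}
    for row in rows:
        slim.append({k: row[k] for k in ("RecordID", *keep)})
        aside[row["RecordID"]] = {k: v for k, v in row.items() if k not in keep}
    return slim, aside


def parse_email(row):
    """Add user, domain, and suffix fields parsed from the row's email."""
    match = EMAIL_PATTERN.fullmatch(row["email"])
    if match is None:
        raise ValueError(f"not an email address: {row['email']!r}")
    user, _at, domain, suffix = match.groups()
    return {**row, "user": user, "domain": domain, "suffix": suffix}


def rejoin(slim, aside):
    """Rejoin the set-aside fields to the slim rows by RecordID."""
    return [{**row, **aside[row["RecordID"]]} for row in slim]


def check_assumptions(rows, expected_count):
    """Test assumptions instead of verifying the output by eye."""
    ids = [row["RecordID"] for row in rows]
    assert len(rows) == expected_count, "record count changed"
    assert len(set(ids)) == len(ids), "duplicate RecordID values"


# Hypothetical sample data.
source = [
    {"email": "name@domain.com", "notes": "large free-text field"},
    {"email": "ana@example.org", "notes": "another large field"},
]
rows = add_record_ids(source)
slim, aside = split_fields(rows, keep=["email"])
parsed = [parse_email(row) for row in slim]
result = rejoin(parsed, aside)
check_assumptions(result, expected_count=len(source))
print(result[0]["user"], result[0]["domain"], result[0]["suffix"])
```

Only the record ID and the email field travel through the parsing step; the bulky `notes` field is carried in the set-aside lookup and reattached at the end, and the assertions fail loudly if a step drops or duplicates records.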