Taming the shrew, Optimizing Power BI Options


Published on

SQL Saturday Atlanta BI Edition, 2018

Published in: Technology

  1. 1. Taming of the Shrew: Tricks to Optimizing Power BI. Kellyn Pot’Vin-Gorman, TSP, Power BI and AI in Education
  2. 2. Kellyn Pot’Vin-Gorman, Technical Solution Professional at Microsoft, Data Platform in Power BI and AI • Former Technical Intelligence Manager, Delphix • Multi-platform DBA, (Oracle, MSSQL, MySQL, Sybase, PostgreSQL, Informix…) • Oracle ACE Director, (Alumni) • OakTable Network Member • Idera ACE Alumni 2018 • STEM education with Raspberry Pi and Python, including DevOxx4Kids, Oracle Education Foundation and TechGirls • Former President, Rocky Mtn Oracle User Group • Current President, Denver SQL Server User Group • Linux and DevOps author, instructor and presenter. • Blogger, @DBAKevlar
  3. 3. Gaining just 10% more access to data can result in over $65 million in revenue
  4. 4. Our User Story: (1) User Needs Updated Information from Power BI Report. (2) User Chooses to Refresh Report. (3) User Gets in Car to Get Cup of Coffee in Next Town While Waiting for Refresh.
  5. 5. POWER BI LANDSCAPE: Finding All the Fish in the Ocean. Relational Data: Oracle, SQL Server, Teradata, Salesforce. Cloud Data: Azure, AWS, Google. Other Data: Excel, Access, SharePoint, etc. Big Data: Data Lake, Hadoop, Hortonworks. MODEL & SERVE: Azure Analysis Services, Azure SQL Data Warehouse, Power BI. Also shown: SQL Server Integration Services, Data Factory.
  6. 6. Power BI is Guilty Until Proven Innocent
  7. 7. POWER BI LANDSCAPE: Finding All the External Latency. Relational Data: Oracle, SQL Server, Teradata, Salesforce. Cloud Data: Azure, AWS, Google. Other Data: Excel, Access, SharePoint, etc. Big Data: HD Insights, Data Lake, Hortonworks. MODEL & SERVE: Azure Analysis Services, Azure SQL Data Warehouse, Power BI. Also shown: SQL Server Integration Services, Data Factory.
  8. 8. OPTIMIZATION EXERCISE PROCESS • Data Sources: Identify by type and bring in expertise for each. Bring wait times to the data specialist. • Once verified non-issue, Network Layer: Bring data to the network specialist. • Once verified non-issue, Power BI Layer: Inspect the data model and data sets; data modeler to address. • Power BI review steps: resources, concurrency, visuals and dashboards. • Repeat and verify resolved.
  10. 10. Why Time Should Be Your Main Focus for Optimization • It is a scientific approach to optimization. • Optimizing on cost or assumptions does not guarantee results. • It removes finger pointing and the “Blame Game”. • It simplifies the process of identifying real latency. • When time is addressed, long-term resolution is often experienced.
  11. 11. DATA SOURCES
  12. 12. Steps for Optimizing Data Sources • Data sources can be relational databases, big data, CSV/Excel, or structured/unstructured data files. • If there are onsite or remote specialists available, partner with them to gather distinct data to identify waits and patterns. • Know that, along with execution plans, tracing can help identify deeper and multi-tier issues that aren’t divulged by traditional performance tools. • Infrastructure tools, cloud monitoring tools and tracing can also provide more information than traditional tools.
  13. 13. RELATIONAL DATA SOURCES • Filter early, filter often, before data is pulled to Power BI. • Understand the optimizer, query plans and performance “gotchas” for the different database platforms. • Push calculated columns and measures to the source where possible, dispersing resource usage for the object to the source. • Add indices, partitioning, etc. to support commonly queried tables.
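      A minimal sketch of "filter early" in Power Query M. The inline `#table` and its column names are hypothetical stand-ins for a relational source table. The point is the ordering: the row filter and the column selection are the first steps after the source. Against a relational source that supports query folding, steps like these can usually be translated into the source query instead of being applied after the data reaches Power BI.

      ```powerquery
      let
          // Hypothetical sample rows standing in for a relational source table.
          Source = #table(
              type table [OrderID = Int64.Type, OrderDate = date, Region = text, Amount = number],
              {
                  {1, #date(2017, 11, 3), "East", 120.0},
                  {2, #date(2018, 2, 14), "West", 75.5},
                  {3, #date(2018, 5, 20), "East", 210.0}
              }
          ),
          // Filter rows first, before any other transformation.
          #"Filtered Rows" = Table.SelectRows(Source, each [OrderDate] >= #date(2018, 1, 1)),
          // Keep only the columns the report actually needs.
          #"Selected Columns" = Table.SelectColumns(#"Filtered Rows", {"OrderID", "OrderDate", "Amount"})
      in
          #"Selected Columns"
      ```

      Whether a given step folds depends on the connector and the transformation, so it is worth confirming against the real source rather than assuming.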
  14. 14. BIG DATA • Use HDInsight and/or Azure Data Factory to help manage the sheer quantity of data. • Manage partitions and prune unnecessary data regularly. • Make it a goal to migrate from unstructured data to a “pristine” data model. • Make yourself part of the development process to stay aware of changes to what data is being consumed. • Have a clear and concise list of what data is important to the business vs. what is collected.
  15. 15. ACCESS AND EXCEL/CSV • Keep Excel sheets and Access tables that are brought into Power BI narrow. Wider tables perform worse. • Purge or archive unused data from Access, as it can slow down refreshes. • Convert derived values from formulas to static values whenever possible. This removes one conversion step when importing or refreshing into Power BI. • Avoid multiple volatile functions and array formulas in Excel. This is not the place for these. • Avoid linked tables in Access with a split database architecture. • Consider the size of the data with regard to refreshes and how it will impact Power BI performance.
  16. 16. NETWORK
  17. 17. The Network – The Final Bottleneck. Diagram elements: On-Premise data sources, Cloud data sources, SQL DB, Managed Instance, SQL Server, VNET, Microsoft SQL Server Integration Services, Power BI, Data, User. The firewall is our best friend and worst enemy.
  18. 18. NETWORK • Networks are still limited by much of “Shannon’s Law”. • Filter data to avoid creating bottlenecks on the network. • Become friends with the network admin to isolate issues with firewalls and network bottlenecks. • Consider how often refreshes are performed and where the data is being sent from and to.
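      To make the "Shannon's Law" bullet concrete, here is a rough sketch in Power Query M of the Shannon-Hartley capacity formula, C = B · log2(1 + S/N), and the lower bound it implies for moving a refresh payload. Every number below (bandwidth, signal-to-noise ratio, payload size) is an assumption chosen for illustration, not a measurement from the presentation.

      ```powerquery
      let
          // Assumed link characteristics, for illustration only.
          BandwidthHz = 20000000,      // 20 MHz channel
          SignalToNoise = 1000,        // linear ratio (about 30 dB)
          // Shannon-Hartley: C = B * log2(1 + S/N), in bits per second.
          CapacityBitsPerSec = BandwidthHz * Number.Log(1 + SignalToNoise, 2),
          // Hypothetical refresh payload of 2 GB, converted to bits.
          PayloadBits = 2 * 1024 * 1024 * 1024 * 8,
          // Theoretical lower bound on transfer time; real throughput will be lower.
          MinTransferSeconds = PayloadBits / CapacityBitsPerSec
      in
          [
              CapacityMbps = CapacityBitsPerSec / 1000000,
              MinTransferSeconds = MinTransferSeconds
          ]
      ```

      The bound only moves if the payload shrinks or the link improves, which is why filtering before the data crosses the network matters.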
  19. 19. POWER BI LAYER
  20. 20. The columnar data store makes Power BI forgiving of large data sets. But… Power BI is dependent upon the data that it sources from, along with multiple other features. Performance can be hindered by numerous items Power BI is dependent upon: • Data Model • Data Size • Resources Allocated for Processing • Data Types
  22. 22. POWER BI QUERY EDITOR • Avoid complex queries in Query Editor; combinations of filter with context transition are some of the worst. • Don’t use relative date filtering in the Query Editor. • Keep measures simple initially, adding complexity incrementally. • Avoid relationships on calculated columns and unique identifier columns. • Try setting “Assume Referential Integrity” on relationships – this may improve query performance. • Ensure relationships are set up properly, and use the new many-to-many relationships sparingly.
  23. 23. As You Design Your Reports • Simplify Data Demands Whenever Possible • Remove Unused Columns • Avoid Distinct Counts on Fields with High Cardinality • Limit Complexity on High Cardinality • Consider How Often Data Refresh is Required
  25. 25. VISUALS • Filter early and filter carefully. • You may want to switch off interaction between visuals – it reduces the query load as users cross-highlight. • Always test the performance impact of the row-level security roles that your users will use. • To ensure long-running queries won’t monopolize the system, there is a 225 second timeout on visuals. Design visuals with as much simplicity as possible to avoid this threshold.
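      One way to act on "Remove Unused Columns" and the high-cardinality advice, sketched in Power Query M. The table, column names and values are hypothetical. The sketch drops a column no visual uses and truncates a timestamp to a date, so the column holds far fewer distinct values. Whether losing the time of day is acceptable depends on the report.

      ```powerquery
      let
          // Hypothetical event rows; names are invented for illustration.
          Source = #table(
              type table [EventID = Int64.Type, EventTime = datetime, UserName = text, Notes = text],
              {
                  {1, #datetime(2018, 12, 8, 9, 15, 42), "ana", "free text"},
                  {2, #datetime(2018, 12, 8, 9, 15, 57), "raj", "free text"},
                  {3, #datetime(2018, 12, 9, 14, 2, 5), "ana", "free text"}
              }
          ),
          // Drop a column that no visual uses.
          #"Removed Columns" = Table.RemoveColumns(Source, {"Notes"}),
          // Truncate timestamps to dates: far fewer distinct values to store.
          #"Reduced Cardinality" = Table.TransformColumns(
              #"Removed Columns",
              {{"EventTime", DateTime.Date, type date}}
          ),
          #"Renamed Columns" = Table.RenameColumns(#"Reduced Cardinality", {{"EventTime", "EventDate"}})
      in
          #"Renamed Columns"
      ```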
  26. 26. Tips for Dashboards • Eight visuals MAX per dashboard or report. • Set filters in the filter pane of reports. • Understand where performance hits are coming from. • Test and track refreshes over time for reports and dashboards – don’t assume. • Don’t build complicated measures or aggregates at the data model layer.
  27. 27. Power BI Tips • Narrow tables are faster. • Prefer integers over strings (text). • Slicers use multiple steps (queries) to process. • Use powerful DAX functions that can eliminate complex or poorly performing expressions. • Certain filters can hinder performance if they examine each row. Identify when this occurs. • Simplify queries whenever possible. • Follow best practices for relationships in your data model. • Add indexes and foreign keys whenever possible.
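      A small sketch of "integers over strings" in Power Query M, assuming a hypothetical table whose key column arrived as text (as often happens with CSV imports). The table and column names are invented for illustration.

      ```powerquery
      let
          // Hypothetical table whose key arrived as text, e.g. from a CSV import.
          Source = #table(
              type table [CustomerKey = text, City = text],
              {{"1001", "Atlanta"}, {"1002", "Denver"}, {"1003", "Atlanta"}}
          ),
          // Store the key as a whole number instead of text.
          #"Changed Type" = Table.TransformColumnTypes(Source, {{"CustomerKey", Int64.Type}})
      in
          #"Changed Type"
      ```

      The conversion raises an error for any value that is not a valid whole number, so check the column's contents before relying on it for relationships.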
  28. 28. Resource Constrictions Can Hinder Performance, too! • Consider increasing the memory allocated for data loads. • Increase the data cache for large processing. • Monitor and alert on thresholds for demands for enterprise reporting.
  29. 29. Gotchas With Published Reports. Power BI uses premium memory when: • Loading datasets* • Refreshing a dataset (scheduled and on-demand)* • Running report queries. Poor performance can result when least-recently-used (LRU) eviction conflicts with datasets that are still needed. *Remember that datasets in memory may be larger than when stored on disk, and do not confuse premium memory with Power BI Premium.
  32. 32.

          let
              // Read the raw log; <logfile> is the path placeholder from the slide.
              Source = Csv.Document(File.Contents("<logfile>"),5,"",null,1252),
              #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", Int64.Type}, {"Column4", type text}, {"Column5", type text}}),
              #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Column2", "Column4"}),
              #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column3", "PID"}, {"Column1", "Process Type"}}),
              // Column5 holds "{Start:...,Action:...,Duration:...}"; peel it apart.
              #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","{Start:","",Replacer.ReplaceText,{"Column5"}),
              #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column5", Splitter.SplitTextByEachDelimiter({",Action:"}, QuoteStyle.Csv, false), {"Column5.1", "Column5.2"}),
              #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column5.1", type datetime}, {"Column5.2", type text}}),
              #"Renamed Columns1" = Table.RenameColumns(#"Changed Type1",{{"Column5.1", "Start"}}),
              #"Replaced Value1" = Table.ReplaceValue(#"Renamed Columns1","}","",Replacer.ReplaceText,{"Column5.2"}),
              // Split on the last ",Duration:" so the message text stays intact.
              #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value1", "Column5.2", Splitter.SplitTextByEachDelimiter({",Duration:"}, QuoteStyle.Csv, true), {"Column5.2.1", "Column5.2.2"}),
              // Strip the "00:00:" prefix so Duration can be typed as a number.
              #"Replaced Value2" = Table.ReplaceValue(#"Split Column by Delimiter1","00:00:","",Replacer.ReplaceText,{"Column5.2.2"}),
              #"Renamed Columns2" = Table.RenameColumns(#"Replaced Value2",{{"Column5.2.2", "Duration"}}),
              #"Changed Type2" = Table.TransformColumnTypes(#"Renamed Columns2",{{"Duration", type number}}),
              #"Renamed Columns3" = Table.RenameColumns(#"Changed Type2",{{"Column5.2.1", "Message"}}),
              #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns3",{"Process Type"})
          in
              #"Removed Columns1"
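      Once the log is parsed into the PID, Start, Message and Duration columns produced by the slide 32 query, a natural next step is to rank where the time goes. The following is a sketch only: the `ParsedLog` rows are hypothetical sample data in that shape, and real message text will be longer than the short labels used here.

      ```powerquery
      let
          // Hypothetical rows in the shape the log-parsing query returns.
          ParsedLog = #table(
              type table [PID = Int64.Type, Start = datetime, Message = text, Duration = number],
              {
                  {4120, #datetime(2018, 12, 8, 10, 0, 1), "PBIData.get", 4.2},
                  {4120, #datetime(2018, 12, 8, 10, 0, 6), "PBIDashboard", 0.8},
                  {4388, #datetime(2018, 12, 8, 10, 1, 30), "PBIData.get", 6.1}
              }
          ),
          // Total the recorded duration and count the events for each message.
          #"Grouped Rows" = Table.Group(
              ParsedLog,
              {"Message"},
              {
                  {"Total Seconds", each List.Sum([Duration]), type number},
                  {"Events", each Table.RowCount(_), Int64.Type}
              }
          ),
          // Largest total wait first.
          #"Sorted Rows" = Table.Sort(#"Grouped Rows", {{"Total Seconds", Order.Descending}})
      in
          #"Sorted Rows"
      ```

      Sorting by total time rather than by event count keeps the focus on time as the reason to optimize.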
  33. 33.

      | Term | Function | Log Source |
      |---|---|---|
      | SimpleDocument | Local object | Multiple logs |
      | RemoteDocument | Remote Excel or CSV file | Multiple logs |
      | PackageStorage | Disk waits (database, often Access) | Power BI logs |
      | PBIDashboard | Dashboard waits | PBI logs, inspect message |
      | PBIVisualConsent | Row level permissions | PBI logs, inspect message |
      | PBIData.get | Get Data waits | PBI logs, inspect message |
      | PBITrustedVisual | Open visual view | PBI logs |
      | PBIModuleLoad | Load of dashboard | PBI logs |
      | FirewallDocument | Cloud or remote document | MSMdsrv logs |
  38. 38. SUMMARY • Remember to stay with the process. • Use time as the reason to optimize. • Use data, not assumptions. • Use Power BI to analyze logs and traces, just as you would other data. • Collaborate with the user to identify what’s important to them, too.
  39. 39. Thanks to • Chris Webb for sharing test data and ideas. • Brent Ozar for creating the sp_blitz data model that offered the opportunity to optimize. • The EDU group at Microsoft for offering a full environment for me to build for testing, including the cloud to work with on this presentation.
  40. 40. Questions? Twitter: @dbakevlar