2. Agenda
- Tableau
- What is Python?
- TabPy Server
- Installation of TabPy
- Benefits of using Tableau
- Benefits of using TabPy
- Python Function
- Script_Real Function
- Script_Str Function
- Script_Int Function
- Example of Seattle Police
3. Tableau
Tableau is a powerful business intelligence and data visualization tool with a very
intuitive user interface. It is very useful for drilling down into data, creating insightful
reports, and garnering actionable business insights.
In 2017, IT research firm Gartner placed Tableau in the Leaders quadrant of its Magic
Quadrant for Business Intelligence and Analytics Platforms for the fifth year in a row.
Gartner has positioned Tableau as highest in ability to execute again this year. Tableau
is especially strong in intuitive analytics and interactive visual analytics.
4. What is Python?
- Python is a widely used general-purpose programming language, popular among
academia and industry alike.
- It provides a wide variety of statistical and machine learning techniques, and is
highly extensible.
- Together, Python and Tableau are the data science dream team to cover any
organization’s data analysis needs.
- In 2013, Tableau 8.1 introduced R integration: the ability to call R scripts in
calculated fields through an R server (Rserve).
- Since the release of Tableau 10.1, we can use Python scripts as part of our
calculated fields in Tableau, just as we have been able to do with R.
- The Python integration happens through the Tableau Python Server (TabPy).
5. Tableau Python Server
- Tableau Python Server (TabPy) is part of Tableau's expanding range of extensibility
options. The TabPy framework allows Tableau to remotely execute Python code. It has
two components:
- A server process built on Tornado, which allows for the remote execution of
Python code through a set of REST APIs. Code can either be immediately executed
or persisted in the server process and exposed as a REST endpoint, to be called
later.
- A client library that enables the deployment of such endpoints, based on Python
functions.
- Tableau can connect to the TabPy server to execute Python code on the fly and
display results in Tableau visualizations. Users can control data and parameters
being sent to TabPy by interacting with their Tableau worksheets, dashboard or
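The remote execution described above can be sketched with the Python standard library alone. This is a minimal sketch, assuming a TabPy server at http://localhost:9004 (the usual default) whose /evaluate endpoint accepts a JSON body with a "script" string and a "data" object holding _arg1, _arg2, and so on. The helper names build_evaluate_request and evaluate are hypothetical, not part of TabPy.

```python
import json
import urllib.request


def build_evaluate_request(script, *columns, base_url="http://localhost:9004"):
    """Build a POST request for immediate execution of a script.

    Each column is sent as _arg1, _arg2, ... (one list per argument),
    which is how the script body refers to its inputs.
    The base_url default is an assumption about the local setup.
    """
    payload = {
        "data": {f"_arg{i}": list(col) for i, col in enumerate(columns, start=1)},
        "script": script,
    }
    return urllib.request.Request(
        url=f"{base_url}/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(request, timeout=10):
    """Send the request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; evaluate(request) needs one running.
request = build_evaluate_request("return [x * 2 for x in _arg1]", [1, 2, 3])
```

Tableau builds an equivalent request itself whenever a SCRIPT_* calculation is evaluated, so a sketch like this is mainly useful for testing a script outside Tableau.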
6. Installation of TabPy
- Download TabPy from its GitHub repository (tableau/TabPy: Execute Python
code on the fly and display results in Tableau visualizations).
- Click the Clone or download button in the upper right corner of the
TabPy repository page, download the zip file, and extract it.
- Updated by the community team: install instructions are in
"Tableau Integration with Python - Step by Step".
- More information on how to configure and write calculations is in the
official documentation, which also explains how to use the table
calculation addressing/partitioning settings correctly.
7. Continued...
- Once the TabPy download is complete, configure it in Tableau.
- To configure a TabPy connection, on the Help menu in Tableau Desktop
choose Settings and Performance > Manage External Service Connection
to open the TabPy connection dialog box.
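Before filling in the connection dialog, it can help to confirm that the server is actually listening. The following is a minimal sketch, assuming TabPy runs on localhost port 9004 and exposes its /info endpoint there. The function names info_url and tabpy_is_reachable are hypothetical helpers, not part of TabPy.

```python
import json
import urllib.request


def info_url(host="localhost", port=9004):
    """URL of the server's /info endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/info"


def tabpy_is_reachable(host="localhost", port=9004, timeout=3):
    """Return True if the server answers /info with valid JSON."""
    try:
        with urllib.request.urlopen(info_url(host, port), timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a reply that is not JSON.
        return False
```

If tabpy_is_reachable() returns True, the same host and port are the values to enter in the connection dialog box.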
8. Benefits of using Tableau
- Data visualization: Tableau is a data
visualization tool first and foremost. Its
technology therefore supports complex
computations, data blending, and dashboarding
for the purpose of creating beautiful
visualizations that deliver insights that cannot
easily be derived from staring at a spreadsheet.
It has climbed to the top of the data
visualization heap because of its dedication to
this purpose.
- Quickly create interactive visualizations: Using Tableau's drag-and-drop functionality,
users can create highly interactive visuals within minutes. The interface can handle
endless variations while also preventing you from creating charts that go against data
visualization best practices.
9. Continued...
- Enhanced user experience using Tableau: Tableau offers many different types of
visualization, which enhance the user experience. Tableau is also very easy to
learn; anyone can learn it without any knowledge of coding.
- Tableau can handle large amounts of data: Tableau can handle millions of rows of
data with ease. Different types of visualization can be created from large amounts
of data without impacting the performance of the dashboards. Tableau also lets
users make live connections to different data sources, such as SQL databases.
- Use of other scripting languages in Tableau: To avoid performance issues and to
do complex table calculations in Tableau, users can incorporate Python or R.
Python scripts can take load off the software by performing tasks in packages.
10. Benefits of using TabPy
Python gives access to a wide variety of machine learning libraries and allows you
to call out to web services and more. To do that, TabPy sends the data in your Tableau
dashboard to Python.
This enables new capabilities such as predicting churn, converting text data to
sentiment scores, or finding the optimal route for delivery trucks.
11. Python Functions
- To let Tableau know that a calculation needs to go to Python, the calculation must
be passed through one of four functions.
- Python functions are computed as table calculations in Tableau.
- Their arguments must therefore be aggregated. Aggregate measures include MIN(),
MAX(), ATTR(), SUM(), MEDIAN(), and any table calculations or Python measures.
- Python functions: SCRIPT_BOOL, SCRIPT_INT, SCRIPT_REAL, SCRIPT_STR
12. SCRIPT_REAL Function
- The SCRIPT_REAL function returns a
real result from the specified expression.
- The expression is passed directly to a
running external service instance.
- SUM(Sales) and SUM(Profit) are
used as arguments and placed on their
respective shelves.
13. Continued...
- In this example, each point in the scatter plot is a Customer, and TabPy
receives SUM(Sales) and SUM(Profit) for each Customer.
- Tableau aggregates the data before sending it to TabPy, using the level of detail of
the view.
- The function corrcoef returns a matrix, from which the correlation coefficient is
extracted so that a single column is returned.
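The matrix extraction described above can be illustrated without NumPy. This is a sketch, not the calculation from the slide: the corrcoef function below is a pure-Python stand-in that mimics the 2x2 layout returned by numpy.corrcoef for two inputs, and the name correlation is hypothetical. It assumes both inputs vary, since constant input would divide by zero.

```python
import math


def corrcoef(x, y):
    """Pure-Python stand-in for numpy.corrcoef(x, y): a 2x2 Pearson matrix."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / math.sqrt(var_x * var_y)
    # The diagonal holds each series' correlation with itself.
    return [[1.0, r], [r, 1.0]]


def correlation(_arg1, _arg2):
    """Body of the script: _arg1 is SUM(Sales), _arg2 is SUM(Profit), per Customer."""
    matrix = corrcoef(_arg1, _arg2)
    # The off-diagonal entry is the Sales/Profit correlation coefficient.
    return matrix[0][1]
```

With NumPy available on the TabPy server, the script body passed to SCRIPT_REAL would typically be the one-line equivalent, np.corrcoef(_arg1, _arg2)[0, 1].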
14. SCRIPT_STR Function
- The SCRIPT_STR function returns a
string result from the specified expression.
- This example shows how to
concatenate two strings using
Python.
15. Continued...
- Category and Sub-Category are dragged and dropped onto the Rows shelf.
- The table calculation uses list.append(), which adds a value to the
existing list.
- _arg1 and _arg2 are the two inputs whose values are combined and passed to
append().
- ATTR() is used in the code to return the value of the given expression if it has a
single value for all rows in a group; otherwise it displays an asterisk.
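The concatenation described above can be sketched as a plain Python function that can be tested outside Tableau. The function name concatenate and the " - " separator are assumptions for illustration; the slide's actual script is not reproduced in the text.

```python
def concatenate(_arg1, _arg2):
    """Join each Category with its Sub-Category, one string per mark.

    _arg1 and _arg2 are equal-length lists, as TabPy would receive them
    from ATTR([Category]) and ATTR([Sub-Category]).
    """
    lst = []
    for category, sub_category in zip(_arg1, _arg2):
        # The separator is an illustrative choice, not taken from the slide.
        lst.append(category + " - " + sub_category)
    return lst
```

Inside Tableau, the function body (without the def line) would go between the quotation marks of a SCRIPT_STR call, followed by the two ATTR() arguments.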
16. SCRIPT_INT function
- The SCRIPT_INT function returns an integer result from the specified
expression.
- The expression is passed directly to a running external service instance.
- In this example, a Python script in a Tableau calculated field displays the input
value multiplied by 2.
17. Code to generate multiplied value
- In this example, I used the Sample
Superstore data, placed the sales value by
Category and Sub-Category in the view, and added
a table calculation that uses a Python script to
multiply the sales value by 2.
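A minimal sketch of the doubling logic, written as a standalone function so it can be checked before being pasted into a calculated field. The name double_sales is hypothetical, and truncating to int is an assumption made so that the result matches the integer type SCRIPT_INT expects.

```python
def double_sales(_arg1):
    """Multiply each aggregated sales value by 2 and return integers.

    _arg1 is the list of SUM([Sales]) values, one per mark in the view.
    """
    # int() truncates any fractional part of the doubled value.
    return [int(value * 2) for value in _arg1]
```

In Tableau, the body would be passed as the script string of SCRIPT_INT with SUM([Sales]) as its argument.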
18. SCRIPT_BOOL Function
- The SCRIPT_BOOL function returns a Boolean result from the specified expression.
- It is useful, for example, for identifying whether a value is positive or negative.
- In this example, _arg1 is equal to SUM(Profit).
- The script checks whether each value of the Profit measure is greater than
zero.
19. Code to flag positive numbers
SCRIPT_BOOL("
# _arg1 holds the SUM([Profit]) values, one per mark
lst = []
for i in _arg1:
    lst.append(i > 0)
return lst
",
SUM([Profit])
)
20. Building Advanced Analytics Applications with TabPy
Python is a widely used general-purpose
programming language, popular among
academia and industry alike. It provides a
wide variety of statistical and machine
learning techniques, and is highly extensible.
Together, Python and Tableau are the data
science dream team to cover any
organization’s data analysis needs. When you
pair Python’s machine-learning capabilities
with the power of Tableau, you can rapidly
develop advanced-analytics applications that
can aid in various business tasks.
21. Seattle 911 Incident Response Dashboard Example
This example uses a real data set of
Seattle 911 police incident response
calls for the past 6 years. Because there
is a lot of activity, the dashboard
highlights the hotspots. For example,
you may want to focus your patrol cars
on certain places. You don’t want to
look at thousands of events; you just
want to know which areas most need
attention.
22. Dashboard Composition Criteria
For this we break the data into different types of crime, and you can see that there are
some criteria, such as Average Incidents per Quarter.
- Does a location need at least two incidents per quarter to qualify as a hotspot,
and what does proximity tell us about the distance between incidents?
- If there are five incidents within 100 feet of each other, does that make them a
cluster of incidents?
- Is that a hotspot, or is 200 feet or 300 feet better?
The answers could change depending on where the analysis happens.
23. Dashboard Composition Criteria (continued)...
For example, in a rural area you may say 500 feet is good enough, but in a denser
city area, where city blocks are smaller and there is a lot more activity, 200 feet may
be a better choice. If we change the distance from 300 feet to 200 feet, the criterion
becomes tighter and the clusters get smaller. We can relax the criteria by saying one
incident is enough; that will give us more points.
24. Justification of the Dashboard
The original data set has a point for each criminal incident that happened in the Seattle area. If you plan on
visiting Seattle, you can look at the spots that you should avoid. In this case, we have a filter; we
can swap between different activity types. What we are seeing here are criminal hotspots out of many thousands of
points. We are finding points with a high density of nearby activity, for activity types such as:
- Assault
- Threats/harassment
- Residential burglary
- Narcotics
- Liquor violation
- Parking violation
We can see that there are some interesting hotspots. We parameterized this because, for example, a police
officer may have different criteria for what qualifies as a hotspot; they may think that there has to be at least
one incident per quarter within 200 feet. Under the covers, we are running Python code to answer the question.
25. Hotspot Calculation
This is our hotspot calculation.
We are using the SCRIPT_STR
function in this case. In the
calculated field we can see that
the calculation contains plenty of
Python code; we pass the
Python code between quotation
marks. It uses an algorithm called
DBSCAN, which is great for finding
dense and sparse areas. It
handles noise really well and also
works really well on maps.
26. Hotspot Calculation (continued)...
The calculation uses NumPy and scikit-learn, a very popular machine learning library. It
passes our latitude, longitude, distance between incidents, and incident count as
inputs to the model. The DBSCAN algorithm is well suited to this kind of mapping
application and to clustering noisy data. It is a clustering algorithm that returns either
a cluster number for each point or an outlier marker; it looks for clumps of data.
We filter out the outliers and return just the clumps.
27. DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a well-suited
algorithm for this job. It is also installed conveniently by default as part of TabPy. It
takes two parameters:
- one to specify the maximum allowed distance between points for them to be
considered part of the same cluster
- one to specify the minimum number of nearby points to constitute a cluster.
This allows for experimenting with different distance and event-frequency criteria.
Different options can be more appropriate for downtown Seattle versus the
suburbs, or for a police officer looking for hotspots versus a tourist looking for places
to avoid or someone looking for a house to rent or buy.
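The effect of the two parameters can be seen in a small pure-Python sketch of the algorithm. This is an illustration only, not the dashboard's code: the deck relies on scikit-learn, where the equivalent call is DBSCAN(eps=..., min_samples=...). The function names, the haversine distance in feet, and the sample coordinates below are all assumptions.

```python
import math

NOISE = -1  # label for outliers, matching scikit-learn's convention


def haversine_feet(p, q):
    """Great-circle distance in feet between two (lat, lon) points in degrees."""
    earth_radius_feet = 20_902_231  # about 6371 km
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_feet * math.asin(math.sqrt(a))


def dbscan(points, eps, min_samples, distance=haversine_feet):
    """Label each point with a cluster number, or NOISE for outliers.

    eps: maximum distance for two points to count as neighbors.
    min_samples: neighbors (including the point itself) needed for a core point.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical incidents: three close together, one several miles away.
incidents = [(47.6000, -122.33), (47.6001, -122.33),
             (47.6002, -122.33), (47.7000, -122.33)]
labels = dbscan(incidents, eps=200, min_samples=3)  # [0, 0, 0, -1]
hotspots = [p for p, label in zip(incidents, labels) if label != NOISE]
```

Raising min_samples to 4 in this example leaves no cluster at all, which is the kind of tightening and relaxing of criteria that the dashboard parameters expose.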
28. How Does It Work?
We are using Excel as our data
source, but many different kinds of
data sources can be used. Since our
calculations, including the Python
calculations, are local calculations,
they are independent of the data
source we are connected to. This
works on Tableau Desktop and
Tableau Server. Tableau loads the
data from the data source, and
then the data is inside the sheet. That
is what is passed to Python
for processing.
29. How Does It Work? (continued)...
If the visualization is an aggregate visualization, we pass the aggregated data. If
the Aggregate Measures option is unchecked, we pass the disaggregated data.
Whatever we see in the visualization (the marks and the values for those marks) is what
we pass to Python. External services cover Rserve and TabPy, the components that
Tableau uses to talk to R and Python. We send the scripts along with the data, get
some result data back, and display it in Tableau. We return one column as a
result. In this case, we return the column that we are using on the color shelf and the
filter shelf.
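The round trip described above (script plus data out, one column back) can be imitated locally. The following is a rough simulation for testing script bodies, not TabPy's actual implementation; the name run_script and the length check are assumptions made for illustration.

```python
import textwrap


def run_script(script, *columns):
    """Run a SCRIPT_*-style body against one list of values per argument.

    The body sees its inputs as _arg1, _arg2, ... and must end with
    `return`, exactly as it would inside a Tableau calculated field.
    """
    names = [f"_arg{i}" for i in range(1, len(columns) + 1)]
    body = textwrap.indent(textwrap.dedent(script).strip("\n"), "    ")
    source = "def _script({}):\n{}".format(", ".join(names), body)
    namespace = {}
    exec(source, namespace)  # defines _script in the scratch namespace
    result = namespace["_script"](*columns)
    # One column comes back: one value per mark that was sent.
    marks = len(columns[0]) if columns else 0
    if isinstance(result, list) and len(result) != marks:
        raise ValueError("script returned a column of the wrong length")
    return result


profit_script = """
lst = []
for i in _arg1:
    lst.append(i > 0)
return lst
"""
flags = run_script(profit_script, [120.0, -35.5, 0.0])  # [True, False, False]
```

Because the simulation only ever receives what is in the lists, it mirrors the point made above: the script sees the marks in the view, not the underlying data source.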
30. YouTube link for the presentation:
https://www.youtube.com/watch?v=NzzUAdxQLQQ&t=1094s