This document discusses various visualizations and analyses created in TIBCO Spotfire using different public datasets. Some key points:
- Maps showing park bench locations in Rostock, Germany and combining layers from different WMS resources.
- Unemployment trends in Germany by gender with Holt-Winters forecasting.
- How Germans voted in the 2014 European elections.
- K-means clustering distinguishing east and west German states using census housing data.
- Hierarchical clustering of BI product usage patterns from a survey.
- Scraping privacy data from a website directly into Spotfire using import.io.
- Double WMS layer map showing German population density overlaid on rivers.
2. Have you ever wondered why there is no park bench where you think there should be
one in a historic city center? Here is a map of the park benches in the city of Rostock,
Germany. All data comes from 2 data files:
• Benches (csv) : https://www.govdata.de/suchen/-/details/baenke-hro-hro
• Historic Structures (shape): https://www.govdata.de/suchen/-/details/baudenkmale-hro-hro
The picture shows the out-of-the-box map visualization in TIBCO #Spotfire.
3. Spotfire 6.5 Desktop #ChartOfTheDay: Unemployment in Germany by
gender with 3 month Holt-Winters forecast by state and gender.
Source: Federal Statistical Office Germany (Genesis Database, Table:
13211-0010) https://www-genesis.destatis.de/genesis/online/data
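Spotfire computes the forecast for you, but it can help to see what a Holt-Winters forecast does under the hood. Here is a minimal sketch in Python of the additive variant (level + trend + seasonality). The function name, the smoothing parameters and the simple initialization are my own assumptions; this is not Spotfire's implementation, and it returns point forecasts only, without confidence bands.

```python
def holt_winters_additive(series, season_length, horizon,
                          alpha=0.5, beta=0.1, gamma=0.1):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization (an assumption of this sketch):
    # level = mean of the first season,
    # trend = average per-step change between the first two seasons,
    # seasonal = deviation of each first-season value from that mean.
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [series[i] - first_mean for i in range(m)]

    # One smoothing pass over the whole series.
    for t, y in enumerate(series):
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    # Extrapolate level and trend, and reuse the matching seasonal component.
    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % m]
            for h in range(horizon)]
```

For monthly unemployment figures you would call it with `season_length=12` and `horizon=3` to get a 3 month forecast per state and gender.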
4. How the Germans voted in the European elections 2014
#Spotfire #ChartOfTheDay
https://www-genesis.destatis.de/genesis/online
Table: 14221-0002
5. #ChartOfTheDay K-means clustering clearly distinguishes between east and west
German states. I used data from the German census from 2011. The data shows age
profiles of houses (the year the house was built by decade) for all German states.
The eastern states show a clear peak in the 1990s, after reunification.
Berlin (also counted as a small state) clearly clusters together with the eastern states,
but this may be because all of Berlin, the eastern and the western part, is counted as
one.
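To show what K-means does with such age profiles, here is a minimal sketch in plain Python. The four profiles are made-up numbers for illustration (shares of houses built per decade), not the census data, and the function names are my own. Spotfire's built-in K-means may initialize and iterate differently.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical profiles: share of houses built in the 1970s, 1980s, 1990s, 2000s.
profiles = [
    [0.30, 0.30, 0.20, 0.20],  # "west-like" example 1
    [0.28, 0.32, 0.22, 0.18],  # "west-like" example 2
    [0.15, 0.15, 0.50, 0.20],  # "east-like" example 1 (1990s peak)
    [0.12, 0.18, 0.48, 0.22],  # "east-like" example 2 (1990s peak)
]
labels = kmeans(profiles, initial_centroids=[profiles[0], profiles[-1]])
print(labels)
```

The two profiles with the 1990s peak end up in one cluster and the other two in the second cluster, which is the same effect the chart shows for the real states.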
6. #ChartOfTheDay Hierarchical Clustering of the BI product usage
patterns taken from the BARC BI Survey 2014 (Use Cases by Product)
It is interesting to see which BI products are mainly used for which
purpose (red = low usage, yellow = average usage, green = high usage)
7. #ChartOfTheDay I have used the new http://magic.import.io tool to
scrape the data from http://privacygrade.org/apps and pasted the data
directly into Spotfire. I used label rendering for the
axis as well as for the apps and the mouse-over (both labels combined
plus App Name)
8. My team has created a double WMS layer map chart using layers from 2 different WMS resources: rivers
from http://maps.dwd.de/geoserver/wms?
and population density (for Germany) from https://www-genesis.destatis.de/gis/cgi-bin/mapserv?MAP=/home/fgs/gis/gisdocs/tmp/GUEST_7879745365893_001.map&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetCapabilities
You can see at a glance that the German population lives along the big rivers or along small rivers flowing into the
big rivers.
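A WMS GetCapabilities request is just the service's base URL plus three query parameters (SERVICE, VERSION, REQUEST), as you can see in the Destatis URL. Here is a small sketch in Python that builds such a URL; the helper name `capabilities_url` is my own, and it only builds the string without calling the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def capabilities_url(base_url, version="1.1.1"):
    """Return a WMS GetCapabilities URL, keeping existing query parameters."""
    parts = urlsplit(base_url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetCapabilities",
    })
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


print(capabilities_url("http://maps.dwd.de/geoserver/wms?"))
```

The resulting URL is what you would paste into the WMS layer settings of the map chart to list the layers a server offers.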
9. Had a lot of fun textmining and analyzing pharma
patents today using @attivio and @TIBCO #Spotfire
@PKI_Informatics
10. Showing data in context is always more useful. Here I clustered data
for 3 cities in Germany on a map and combined it with a "profile" view to
show how geolocalized data can help a user.
11. I love to combine 5 dimensions (and more) in
Scatterplots... #LifeScience
Can you share similar examples?
12. Another example of a ScatterPlot with multiple dimensions. Here is
an ECG dataset of patients with Apnea.
Source: T Penzel, GB Moody, RG Mark, AL Goldberger, JH Peter. The
Apnea-ECG Database (http://ecg.mit.edu/george/publications/apnea-ecg-cinc-2000.pdf). Computers in Cardiology 2000;27:255-258. ---
https://www.physionet.org/physiobank/database/apnea-ecg/
13. I finally got my hands on a perfect dataset to be used in the new Spotfire KPI-
Chart: water temperatures of all German swimming lakes divided by
Bundesland. This is Spotfire 7.6 and I am looking forward to using this on my
phone, since many of the lakes are close by and the kids are always looking for
the warmest water... Now this is my decision support system for the week-end!
14. Density Plot of measurements over time using OOTB Spotfire - I am binning X and Y, and setting Marker By
to the same bins of X/Y:
BinByEvenIntervals([date],30) --> X
<BinByEvenIntervals([value],30)> --> Y
<BinByEvenIntervals([value],30) NEST BinByEvenIntervals([date],30)> --> Marker By
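To make the idea behind the expressions above concrete, here is a rough Python sketch of even-interval binning on both axes, followed by a count per (X bin, Y bin) cell. The function names are my own, and the edge handling (the maximum value goes into the last bin) is an assumption; Spotfire's BinByEvenIntervals may treat bin edges differently. Dates would first have to be converted to numbers, for example timestamps.

```python
from collections import Counter


def bin_by_even_intervals(values, bins):
    """Map each value to a bin index 0..bins-1 over equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:  # all values identical: everything lands in bin 0
        return [0 for _ in values]
    # The maximum value would fall into index `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def density(xs, ys, bins):
    """Count how many points fall into each (x bin, y bin) cell."""
    return Counter(zip(bin_by_even_intervals(xs, bins),
                       bin_by_even_intervals(ys, bins)))
```

Each key of the returned counter corresponds to one marker in the density plot, and the count is what you would put on color or size.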
15. Regional Clusters of #Food Variety across US Farmers
Markets @PKI_Informatics @TIBCO #Spotfire
#100daysofSpotfire
16. Hey it's #FridayFunDay again, and today I've experimented with a large dataset:
21 million rows and 5 columns - easily fits into my 16GB of RAM! This
ScatterPlot shows all 21M datapoints with line connections in the color of the
cell line. I have marked one cell line - this is the blue center. You can see a large
variety amongst the 300+ cell lines loaded. It looks like a work of art, but it
actually describes the dataset very well and now I can work on finding the
outliers in all that genetic profile noise :-) Have a nice week-end! The data
comes from CellLineNavigator and it is the full dataset:
http://medicalgenomics.org/celllinenavigator
17. Find gaps in your dataset with this little tweak to your Spotfire ScatterPlot:
...just project the dataset below the X and Y minimum and keep the coloring,
while adjusting the marker shape to lines. Remember you can always MARK an
entire "section" of the data by marking ON the axis itself. In this plot you have
3 dimensions. Why not add a 4th dimension by changing the SIZE of the
markers, or even experiment with the length of the projected lines?
#FridayFunDay #100daysofSpotfire
18. 3 Year Forecast using out-of-the-box Spotfire:
By 2018 Germany will have a similar number of births per year as 20 years ago. This is achieved with the help of
foreigners, mainly coming from the rest of Europe, but also APAC, America, and Africa.
Data from:
https://www-genesis.destatis.de/genesis/online?sequenz=tabelleDownload&selectionname=12612-0003&regionalschluessel=&format=csv
The forecast here is using the Spotfire standard Holt-Winters with a 0.95 confidence level. I used the "Group from
marked categories" option to combine the countries into the trellis variable. This is also using a custom visual theme and
annotations.
19. Today I had a closer look at the Top500 YouTubers list of Socialblade
https://socialblade.com/youtube/top/50030d and I found some outliers just by
applying OOTB methods of Spotfire. You can see that there is a user name
appearing twice in the list, and I added a new line to the data table where
the combined counts would end up (Double Really?). The other two outliers
are in the scoring... What do you think? Do you have examples of data quality
controls like this in your data ecosystem?
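The duplicate check itself is simple to express in code. Here is a small Python sketch that finds names appearing more than once and sums their counts, which is what the extra line in the data table represents. The function name and the example rows are hypothetical, not values from the Socialblade list.

```python
from collections import defaultdict


def combine_duplicates(rows):
    """Return {user: summed count} for every user name listed more than once."""
    totals = defaultdict(int)
    occurrences = defaultdict(int)
    for user, count in rows:
        totals[user] += count
        occurrences[user] += 1
    return {user: totals[user] for user in totals if occurrences[user] > 1}


# Hypothetical rows of (user name, subscriber count).
rows = [("alpha", 10), ("beta", 5), ("alpha", 7)]
print(combine_duplicates(rows))
```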
20. #WhiteHouse salary progression > winners and losers.
Data from http://open.whitehouse.gov/OData.svc/
#FridayFunDay @PKI_Informatics @TIBCO #Spotfire
(...the Spotfire OData connector works well :-)
21. http://xenon.colorado.edu/portal gives you 10+ years of data on PBO H2O GPS Snow
Depth #FridayFunDay with Spotfire!
Can you spot the outlier? Every month is a slice of the pie. The size is relative to the
amount of snow over the last 10 years at this location.
...this is a "very cool" dataset!
22. I am starting to explore the FAERS data more and more. Answering questions outside of the standard drill-down and across
is doable on the fly with Spotfire, but one has to use all kinds of tools inside the software. Detecting anomalies in the data
through visuals is key. Every adverse reaction to a drug is monitored by the FDA with a case number for an individual, which
is a unique ID through all follow-up doctor visits. Everything is timestamped. Most of the submissions are done by the
pharma company. So I could create this graph. For each caseID I calculated the relative days from the first date, which gave me the
start. Then I ranked the cases by maximum days and inside this I did this by company. This all ended up in one ScatterPlot.
For almost all pharma companies I have looked at, the "shape" looks like the four at the bottom of the screen, but AbbVie
shows a different pattern. Even though I did a lot of pivots and calculated columns in Spotfire, the link to the original data is
still there, so marking the orange cases which show the odd shape brings me right back to my original data and I can have
an immediate look into the root cause of this...
I have done a similar analysis before with products and order to shipment dates with all the intermediate steps (you can see
them here as individual "dots" inside the shape). You can split your ERP product data into different categories to find odd shapes
yourself.
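The two calculation steps (relative days from the first date per case, then ranking cases by their maximum days) can be sketched in Python as follows. The function names and the example events are hypothetical; in Spotfire the same thing is done with calculated columns, and the split by company is left out here to keep the sketch short.

```python
from collections import defaultdict
from datetime import date


def relative_days(events):
    """events: list of (case_id, date). Returns (case_id, days since first date)."""
    first = {}
    for case_id, day in events:
        if case_id not in first or day < first[case_id]:
            first[case_id] = day
    return [(case_id, (day - first[case_id]).days) for case_id, day in events]


def rank_cases(rows):
    """Order case IDs by their maximum relative day, longest-running case first."""
    longest = defaultdict(int)
    for case_id, days in rows:
        longest[case_id] = max(longest[case_id], days)
    return sorted(longest, key=lambda case_id: longest[case_id], reverse=True)


# Hypothetical events, not FAERS records.
events = [
    ("A", date(2017, 1, 1)), ("A", date(2017, 1, 11)),
    ("B", date(2017, 2, 1)), ("B", date(2017, 2, 4)),
]
rows = relative_days(events)
print(rows)
print(rank_cases(rows))
```

Plotting relative days on X and the rank on Y gives the kind of "shape" described above, one row of dots per case.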
23. Beginner's Guide: FAERS data in Spotfire (Real World Evidence)
Real World Data is usually hard to analyze outside the standard use-cases it was originally collected for. One of the most often used data sources in real world
evidence projects is FAERS.
FAERS stands for "FDA Adverse Event Reporting System"
(http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm) .
Why not "have a look" at this FAERS data to get a better understanding of how it might help you to answer your life-science and healthcare questions before
jumping into hardcore statistics?
Don't shy away from just loading it into Spotfire. Of course there are lots of datapoints and the data is not very clean, but Spotfire can easily handle 2 years of
that data (250M+ datapoints in one session). You can explore the relationships and get a good feeling of where it makes sense to dive deeper before using
statistics to really find out what is going on with a drug. In my example below, you might see a "signal", but of course you have to prove it first before you
really go out and publish it.
The FAERS data is released quarterly and has a simple relational structure, but the data is really multi-dimensional.
Adverse events are reported by many sources on many drugs and of course many patients. Some patients actually are
taking many drugs at the same time or over a longer period with many possible outcomes. Some of the data is usable right
away, some must be cleaned first, but luckily Spotfire is really great for data wrangling...
In my example I tried to "state the obvious" and look for drug combinations which actually are listed as a black box warning
(https://en.wikipedia.org/wiki/Boxed_warning). So don't take Xarelto and Aspirin together...
But there are also hidden relationships in this data you might find yourself.
24. Failing Clinical Trial Sites:
Some clinical trial sites are not performing well. If you take
Phase III clinical trials and look at the ratio of performing vs.
non-performing trial sites you can gain a better understanding
of your trials. As always this might have many reasons and
Spotfire helps you to find the root-cause in all of your data.
26. The white asparagus season is starting soon across Germany with increasing yields year over year. In some areas across Germany the yield per acre is
increasing.
The graphs show the trends of asparagus yields per acre across German states (where data was available). We combined yields and crop areas from 2015 and
2016. The states increasing their yield per acre are displayed in green, whereas a decline is shown in red. In the ScatterPlot on the left the trend shows an
overall decline of yield/acre year over year with some states showing an increase. The bubbles on the map on the right have the same coloring and show the
states with increasing yield/acre are in the middle and north/east of Germany. The sizes of the bubbles show the overall yield in 2016. The individual states
are colored by overall area of the crop.
Source Data:
Combination of two tables 41215-0005 and 41215-0006 from https://www-genesis.destatis.de -- (C)opyright Statistisches Bundesamt (Destatis), 2017
27. Today I'd like to share an actual HeatMap! This one is showing
temperature inside a greenhouse in correlation to the sunshine
intensity per week of a year. I am binning the sunshine
intensity inside the graph to let the user define the best
"resolution" of the image. The underlying data are hourly
values, but I don't have to use another time axis since the
sunshine intensity and the hour of the day are dependent
variables.
29. If you are a life science researcher, you know that publishing your results can
be a lengthy process. I was interested in the time it actually takes from first
submitting the article to publishing it. In this analysis I am comparing
Novartis and Pfizer by these publishing times and it seems that there is a trend
towards rapid publications. As a next step an idea might be to include impact
factors or MeSH terms to see whether this trend is specific to a certain field of
research.
30. Some eye candy for long Friday afternoons... I have taken all Phase2
clinical trials of the last 10 years, and textmined their description in
SciBite to get the mentioned gene names out. In the graph you can
see how the mentions are developing over time.
31. #FridayFun with #OpenData using @TIBCO #Spotfire to visualize per capita spending in
Paris and Île-de-France https://data.iledefrance.fr//explore/dataset/les-services-aux-particuliers-par-commune-ou-arrondissement-base-permanente-des-/download?format=shp&timezone=Europe/Berlin&use_labels_for_header=true
32. ....sometimes I just have to take a screenshot of beautiful
DataViz in TIBCO Spotfire here at PerkinElmer Informatics
- This is a TERR predicted DesignOfExperiment
33. Today I have analyzed the structure of #SemanticWeb
#GDPR text extensions (GDPRtEXT -
http://openscience.adaptcentre.ie/projects/GDPRtEXT/ )
using Spotfire
34. Have you ever looked at the "Data Carpet" = "HeatMap of the full
table" underlying a Dashboard? Using Spotfire you can discover data
inconsistencies and clean them right away. You can learn more tips and tricks
here: https://www.tibco.com/resources/demand-webinar/data-preparation-tibco-spotfire
35. Using TIBCO Spotfire you can effectively spot outliers in your IoT timeline data.
44 measurements seem to be the norm in this data set. There are single data points
towards the end (right). Every row is a device/patient and every column is a time point/day.
There are four outliers in the upper right as well as two in the lower left.
Although 44 seems to be the normal amount of measurements, on day 7 measurements were
recorded twice for two devices/patients. Now it is time to drill into the data
and do some more data analysis.
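The same check can be sketched outside Spotfire: take the number of measurements per device and day, treat the most common count as the norm, and flag every cell that deviates. This is a minimal Python sketch; the function name and the example cells are hypothetical, only the norm of 44 is taken from the data set described above.

```python
from collections import Counter


def count_outliers(counts):
    """counts: {(device, day): measurements}. Flag cells that differ from the norm."""
    norm = Counter(counts.values()).most_common(1)[0][0]
    outliers = {cell: n for cell, n in counts.items() if n != norm}
    return norm, outliers


# Hypothetical cells: (device, day) -> number of measurements.
counts = {
    ("d1", 1): 44, ("d1", 2): 44, ("d2", 1): 44,
    ("d2", 7): 88,  # measurements recorded twice
    ("d3", 3): 1,   # a single data point
}
norm, outliers = count_outliers(counts)
print(norm, outliers)
```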
36. Summertime is bathing time. My new Spotfire interactive
version of this analysis with current (July 2018) data of the
water temperatures of German lakes across the entire country.
(data: https://www.wetteronline.de/wassertemperatur-badeseen)
37. Using Spotfire I looked at the global CLIMAT weather stations and their altitude. The histogram shows that most
stations are close to sea level. Together with PerkinElmer Informatics your farming initiative could analyze data from
weather stations around the globe and combine public and private data to make the best predictions of local and
global trends.
#opendata from ftp://ftp-cdc.dwd.de/pub/CDC/observations_global/CLIMAT
(https://www.dwd.de/EN/ourservices/climat/climat.html)
For the nerds: the entire analysis was created using TIBCO Spotfire Business Author in Chrome on an Ubuntu Linux
system connecting to an #AWS based Spotfire server.
38. Love the new look and feel of Spotfire X.
I prepared several slides (interactive where needed) for a full-screen presentation in a browser!
Some detail:
- Navigation has been moved to the lower left hand corner
- Viewing mode is on (upper right)
- The new Natural Language Search, Bookmarks and Filter-panel are accessible with one click (icons upper right)
https://www.tibco.com/products/tibco-spotfire/whats-new
39. I used Spotfire X's new AI-powered Recommendations, which
automatically suggested these density plots for my data.
#BlackFriday2018
https://www.tibco.com/products/tibco-spotfire/whats-new