Business Analytics Tool Kit
PORTFOLIO
Hannah Forsythe
TABLE OF CONTENTS
Executive Summary
Chapter 1: Microsoft Excel Pivot Tables
  About Microsoft Excel Pivot Tables
  Data Set and Research Questions
  Application of Microsoft Excel Pivot Tables
  Analysis of Microsoft Excel Pivot Tables
  Conclusion
Chapter 2: Tableau
  About Tableau
  Data Set and Research Questions
  Applications of Tableau
  Analysis of Tableau
  Conclusion
Chapter 3: IBM Cognos Insight
  About IBM Cognos Insight
  Data Set and Research Questions
  Applications of IBM Cognos Insight
  Analysis of IBM Cognos Insight
  Conclusion
Chapter 4: SAP Lumira
  About SAP Lumira
  Data Set and Research Questions
  Applications of SAP Lumira
  Analysis of SAP Lumira
  Conclusion
Chapter 5: SAP Lumira for Geospatial Analysis
  About SAP Lumira for Geospatial Analysis
  Data Set and Research Questions
  Applications of SAP Lumira for Geospatial Analysis
  Analysis of SAP Lumira for Geospatial Analysis
  Conclusion
Chapter 6: SAP Business Objects Analysis for Excel
  About SAP Business Objects Analysis for Microsoft Office
  Data Set and Research Questions
  Applications of SAP Business Objects Analysis for Excel
  Analysis of SAP Business Objects Analysis for Excel
  Conclusion
Chapter 7: SAP Analytics Cloud
  About SAP Analytics Cloud
  Data Set and Research Questions
  Application of SAP Analytics Cloud
  Analysis of SAP Analytics Cloud
  Conclusion
Chapter 8: SAP HANA Data Modeling
  About SAP HANA Data Modeling
  Data Set
  Processes
  Analysis of SAP HANA Data Modeling
  Conclusion
Chapter 9: SAP Predictive Analytics: Association Analysis
  About SAP Predictive Analytics: Association Analysis
  Data Set
  Setting up the Model
  Running the Association Analysis and Visualizing
  Analysis of SAP Predictive Analytics: Association Analysis
  Conclusion
References
EXECUTIVE SUMMARY
As a fourth-year Bachelor of Commerce student at Dalhousie University, I've had the opportunity to
take the course Business Analytics and Data Visualization (COMM 4512), taught by Prof. Kyung
Young Lee. A key component of this course is building skills with a variety of the latest business
analytics tools through challenging exercises that not only allow us to learn the tools themselves,
but also to think critically about the tools at our disposal and how we can use them to create
meaningful insights and answer key business questions, both within our current course work and in
our future careers.
Over the next 9 chapters, I will be analyzing the following tools:
- Microsoft Excel: Pivot Tables
- Tableau
- IBM Cognos Insight
- SAP Lumira
- SAP Lumira for Geospatial Analysis
- SAP Business Objects Analysis for Excel
- SAP Analytics Cloud
- SAP HANA Data Modeling
- SAP Predictive Analytics: Association Analysis
Each tool has its advantages and disadvantages, and I’ve highlighted a variety of research questions
for each tool in an attempt to capture these.
Some of the data sets, with a brief introduction, include:
- Global Bikes Inc.: a fictitious company used in multiple examples due to the depth of the
data and the wide range of analysis that can be done on it.
- World Bank data: CO2 emissions. This is an excellent set to visualize, as the raw data itself
is confusing and difficult to follow; putting it into a tool that can highlight key points
without needing to thoroughly explain the data itself is key.
- Sales and distribution channels: easy data to navigate, suited to a tool that may not be as
simple as others because it has not kept up with technological changes.
- Alcohol preferences within Canada: multi-sourced data that had to be used in a program
that can merge the data seamlessly and then visualize it without leaving the same program.
- Crime data from Calgary, AB: there are many different sources and impacts that city
dynamics can have on crime, so combining 4 data sets gives a deeper understanding of the
geospatial crime environment.
- ERP Simulation Game (ERPSIM) data: the game simulates the planning, procurement,
production, and selling environment of a commodity and tracks the data at all levels, giving
a robust data set to work with.
Through this variety of tools and data sets, I have developed a preference for the SAP suite of
products. SAP has been the most user friendly in all aspects; from crosstab development to
geospatial analysis and data modelling, SAP offers something for everyone at every level of ability.
CHAPTER 1: MICROSOFT EXCEL PIVOT TABLES
About Microsoft Excel Pivot Tables
A lot of data starts life in an Excel file and is then dumped into another tool for analysis, but it
doesn't have to be! Excel has a host of analysis tools right in the app that can be incredibly useful
for pulling business insights. One tool in particular, the pivot table, allows you to slice and dice your
data, creating a cross-tabulated structure that you can then manipulate (by sorting, filtering with
filters and slicers, and ranking) to summarize your data. From the summary, you can create pivot
charts to further explain your findings.
Microsoft Excel Pivot Tables are extremely user friendly and allow you to retain all of your data within
one application.
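The cross-tabulation logic behind a pivot table can be sketched in a few lines of plain Python — a hypothetical illustration with made-up rows, not the tool itself or the GBI data:

```python
from collections import defaultdict

# Hypothetical transaction rows: (year, country, revenue)
rows = [
    (2007, "US", 120.0), (2007, "DE", 80.0),
    (2008, "US", 90.0), (2008, "DE", 110.0),
    (2007, "US", 30.0),
]

# Pivot: years as rows, countries as columns, revenue summed into each cell
pivot = defaultdict(lambda: defaultdict(float))
for year, country, revenue in rows:
    pivot[year][country] += revenue

print(dict(pivot[2007]))  # {'US': 150.0, 'DE': 80.0}
```

Dragging a field to Rows or Columns in the Pivot Table Fields menu is essentially choosing which parts of each record become the grouping keys, with Values choosing what gets aggregated.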
Data Set and Research Questions
Using Excel Pivot Tables, I will be analyzing transactional data from the fictitious company Global
Bikes Inc. (GBI), obtained through a spreadsheet distributed in class. This includes details about
products and accessories sold between 2007-2011 throughout the US and Germany. It was given
to us in Excel format (.xls) and has 18 different variables to be analyzed (table below). The data was
cleaned before being distributed.
GBI Variables:
Within this data, there are many levels that can be sliced and diced, which I hope to demonstrate
through the use of pivot tables. The flexibility of this data makes it interesting to analyze, and it may
yield some unexpected outcomes given that it covers the operations and revenues of a "global"
company.
The four questions I have answered through Pivot Table analysis include:
i. In the year with the overall lowest revenue, which material generated the lowest
revenue?
ii. In the year with the highest Net Sales, which division had the highest Net Sales? In that
division, what customer had the highest Net Sales?
iii. In the year with the lowest Revenue, what division had the highest sales revenue?
iv. What is the trend in Annual Net Sales by country?
Material | Material Desc | Division
Customer Desc | Country | Country Desc
Calendar Year/Month | Calendar Year | Calendar Month
Net Sales | Revenue | Sales Quantity
Product Category | Customer | Sales Organisation
Sales Org Desc | Cost of Goods M USD | Discount
Application of Microsoft Excel Pivot Tables
In the year with the lowest revenue, which material generated the lowest
revenue?
After creating a Pivot Table with the initial GBI data, I added Calendar Year to the rows and Revenue
to Values in the Pivot Table Fields menu. This resulted in a cross-tabulation that displays total
revenue by year (Figure 2). From this it can be determined that 2009 had the lowest revenue, at a
total of $52,610,815.06. From here, I drilled down by adding Material Description (Material Desc.)
to rows. I then expanded the 2009 list to see the complete list of materials and sorted in ascending
order. From Figure 1 we can see that the Fixed Gear Bike Plus has the lowest revenue, with a total
of $13,475.37.
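The same drill-down can be expressed as two grouped aggregations, sketched here in Python over a made-up mini data set (the figures below are not the real GBI numbers):

```python
from collections import defaultdict

# Hypothetical GBI-style rows: (year, material, revenue)
rows = [
    (2008, "Road Bike", 500.0), (2009, "Road Bike", 200.0),
    (2009, "Fixed Gear Bike Plus", 50.0), (2009, "Helmet", 120.0),
    (2008, "Helmet", 300.0),
]

# Step 1: total revenue per year, then pick the year with the lowest total
by_year = defaultdict(float)
for year, _, revenue in rows:
    by_year[year] += revenue
lowest_year = min(by_year, key=by_year.get)

# Step 2: within that year, total revenue per material, then pick the lowest
by_material = defaultdict(float)
for year, material, revenue in rows:
    if year == lowest_year:
        by_material[material] += revenue
lowest_material = min(by_material, key=by_material.get)

print(lowest_year, lowest_material)  # 2009 Fixed Gear Bike Plus
```

Expanding a year in the cross-tab and sorting ascending is exactly this second grouping restricted to the chosen year.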
In the year with the Highest Net Sales, which division had the highest Net Sales? In
that division, what customer had the highest Net Sales?
From the Pivot Table Field menu, I added Calendar Year and Division to rows, and Net Sales to
values. By first sorting the Net Sales in descending order, I determined that 2007 had the highest
net sales (Figure 4), with a total of $58,786,492.15. By drilling down to Division in 2007, I could
easily determine that the BI division had the highest net sales, at $58,293,362.01. Having
determined that BI had the highest net sales in 2007, I added Customer to rows (below Year and
Division). At this point, I could either filter Year to 2007 and Division to BI, or just drill down within
the cross-tab. I chose to drill down within the existing cross-tab, as seen in Figure 3. To determine
the customer with the highest net sales, I sorted the sum of net sales in descending order. The
customer at the top of the list is Bavaria Bikes, with total net sales of $5,709,514.86.
FIGURE 2
FIGURE 1
FIGURE 4
FIGURE 3
In the year with the Lowest Revenue, what division had the highest Sales Revenue?
From the Pivot Table Field menu, Calendar Year and Division have been added to the Rows, and
Revenue and Net Sales have been added to Values. I started by sorting the Revenue in ascending
order to determine that 2009 was the year with the lowest revenue, with a value of
$52,610,815.06. Drilling down from there and looking at the Net Sales column, I could easily
determine that BI had the highest sales revenue in 2009, with a value of $50,555,395.70.
What is the trend in Annual Net Sales by Country?
By adding Calendar Year to the rows, Country to the columns, and Net Sales to the values in the
Pivot Table Field menu, I get a very organized cross-tab of the data by year and country in terms of
total net sales (Figure 7). This is great if I need to pull quick sales figures, but it's not visually
organized to show trends. What I can do with this data, though, is go into the PivotTable Analyze
menu and create a Pivot Chart. I chose to display the trends through a line chart (Figure 6), as it is
the most efficient way to see a snapshot of each country's net sales year over year, side by side.
From Figure 6, it's easy to see that in 2007 both the US and DE markets were generating similar
sales values; in the following years DE increased its net sales to nearly $35,000,000.00, while the
US dropped and has almost plateaued around $20,000,000.00.
[Line chart: Net Sales per Country, 2007-2011, series DE and US]
FIGURE 5
FIGURE 7
FIGURE 6
Analysis of Microsoft Excel Pivot Tables
Excel Pivot Tables are arguably the most accessible data analysis tool: most computers have the
Microsoft Office suite, and included in that is Excel.
First off, it's very user friendly. Regardless of the amount of data, if you have clear and concise
column titles, it's very easy to navigate. You can make changes quickly and easily, and everything in
the Pivot Table is labeled exactly as you see it in the menu. I would say it is only one step above
navigating Excel for regular use, and those who use Excel regularly should have no problem picking
up this useful tool. It can also be used offline, which is very convenient: as long as your data is
loaded in, it can be manipulated from anywhere.
The cross tabulations generated from the Pivot Table Field menu are both instant and very well
organized. You have the option of dragging and dropping into the proper section, and from there
organizing it in the order that best suits your needs. It’s very customizable in the sense that you can
filter your options at every level as you’re drilling down so that you only see the exact data you are
looking for.
One drawback to its simplicity is that it can't be taken much further than what you see. There are
no geolocation options that load in from a third party, and it doesn't offer interactive features; it just
offers cross-tabulation and very simple, generic charts.
Conclusion
Pivot Tables are functional but basic. If you’re looking for a specific line of data and know your drill
down path, I think Pivot Tables will lead you exactly where you want to be. If you’re looking for a more
visual interpretation, I think there are substantially better options that can take you much further
than a bar or line chart.
CHAPTER 2: TABLEAU
About Tableau
Tableau is an incredibly powerful, full-spectrum data analysis tool. From preparing data to
visualizing the final results, you can do all of it within one program. With both a desktop application
and servers for easy sharing, you can get the whole data experience from start to finish within
Tableau. It is designed simply enough that individuals can use it from the front end, but it has a
powerful backend for customizable results if you go through the Python server and know how to
program in Python.
Tableau can use data pulled from a variety of sources, notably Excel, SAP, Amazon Web Services,
and Salesforce (Lee, Week 6), and data from multiple sources can easily be combined within the
program through common headers.
Tableau has a sleek dashboard that allows an easy-to-use, drag-and-drop experience.
Data Set and Research Questions
The data set I'll be using is global CO2 emissions retrieved from the World Bank. This is a raw Excel
spreadsheet with the CO2 rates per country by year (1960-2011). Cleaning the data is very simple
within Tableau because it has a feature labelled "Data Interpreter" that will actually clean the data
for you, and then lets you review what it cleaned up and make adjustments afterwards. This worked
very well for my data specifically, and it was not necessary to make further adjustments. From here,
to make one element easier, Tableau has an option to "Pivot" data. I used this feature to switch the
dates from being listed horizontally across columns to vertically underneath a single column. This
changes nothing about the data itself, just how it's presented.
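For illustration, the same wide-to-long "Pivot" reshape can be sketched in plain Python, with hypothetical column names and values:

```python
# Hypothetical wide-format record: one country, one column per year
wide = {"Country Name": "Canada", "1960": 95.0, "1961": 97.5, "1962": 101.2}

# "Pivot" to long format: one (country, year, value) record per year column
long_rows = [
    (wide["Country Name"], int(year), value)
    for year, value in wide.items()
    if year != "Country Name"
]

print(long_rows[0])  # ('Canada', 1960, 95.0)
```

The long format is what makes it easy to put Year on one shelf and the emission value on another, since each year is now a row rather than a separate column.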
Some questions and observations I’ll be addressing through Tableau about the data include:
i. How do the global CO2 emissions year over year trends compare between Canada and
China?
ii. Where are the highest CO2 emissions per capita concentrations globally?
iii. What are the 10 highest CO2 emitting countries captured by this data set (By Country)?
Top 10 per Capita?
Applications of Tableau
How do the Global CO2 emissions year over year trends compare between
Canada and China?
Using the cleaned data set and a line chart with Year in the Columns and SUM(CO2 Emissions) in
the Rows produces a huge, complicated-to-read visualization. By inserting Country Name into the
filter, I first selected Canada (Figure 9) and then selected China to see both on the same chart
(Figure 8). Note the drastic difference in scale between the two graphs. We can see the extreme
incline beginning in 2001 for China, while Canada has been increasing steadily at a substantially
slower rate. There is a brief period between 1963-1968 where the two countries are within ~500K
of each other in emissions, but China's emissions then climb coming into the '70s. There really is
no comparison between these two countries' totals, as they have very different production
environments and China has a much larger population.
Where are the highest CO2 emissions per capita concentrations globally?
Creating a map, with Longitude in the Columns and Latitude in the Rows, Tableau has the capacity
to create maps using geolocation data. By placing Country Name in the Detail field and Avg. CO2
Emission per Capita on Colour, I created a gradient view of the world map by CO2 emissions per
capita (Figure 10). With the colour gradient based on average CO2 emissions per capita, we can
see in this map that North America has some of the highest CO2 emissions per capita globally.
Compared to the previous question, where China appears substantially worse than Canada, once
population is brought into consideration, emissions are significantly higher per individual Canadian.
FIGURE 10
FIGURE 9
FIGURE 8
What are the 10 highest CO2 emitting countries captured by this data set (by
Country)? Top 10 per Capita?
Creating a bar chart with Country Name in the Columns and Average CO2 in the Rows gives me a
full view of every country captured within the data and its average emissions output. From here I
went into the sort menu to show only the top 10 countries by average emissions, sorted from
highest to lowest (Figure 12). Here you can see the US averages almost 2,000K more than the
second-highest emitting country, China; Canada falls in 8th place overall. Looking now per capita,
the chart is easily changed by altering the Rows from Average CO2 to AVG CO2 Per Capita (Figure
11). Interestingly enough, there is very little overlap between the two charts, with the exception of
Canada and the US.
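The top-N sort behind Figure 12 boils down to ordering countries by their averages and truncating the list; a Python sketch with made-up figures (not the World Bank values):

```python
# Hypothetical average-emissions figures per country
avg_co2 = {"US": 4800.0, "China": 2900.0, "Russia": 2100.0,
           "Japan": 1100.0, "Canada": 450.0, "France": 380.0}

# Top-N countries by average emissions, highest first
top_3 = sorted(avg_co2.items(), key=lambda kv: kv[1], reverse=True)[:3]

print([name for name, _ in top_3])  # ['US', 'China', 'Russia']
```

Switching the measure from totals to per-capita values, as in Figure 11, only changes the numbers being sorted, not the sort itself, which is why the two charts can rank countries so differently.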
Analysis of Tableau
I feel like I only scratched the surface of what Tableau is capable of, and it can create really intricate
and detailed visualizations that still present as easy to read and aesthetically appealing as well. I
really like the look and colour schemes within Tableau, they would be very eye catching to someone
looking at the results for the first time.
With that being said, it is almost information overload. Every menu has so many options that it is
almost too much if you don't know exactly what you're looking to create going into the analysis.
After working with the tool a bit, I've found the surface-level features easy enough to use, although
I know I could be using it for much more complex designs that would demand a lot more time and
a much steeper learning curve.
From my own capabilities and uses, I feel as though Tableau offers what I need, but there is far
more that is out of my capacity (e.g., using Python) that I wouldn't even know where to begin with.
Conclusion
Tableau is a very interesting tool, although I don’t feel it is the most user friendly. As someone with a
bit of experience now working with different visualization tools, I found Tableau to be one of the more
difficult ones to wrap my head around. After working with both SAP and Tableau, I think I prefer SAP.
FIGURE 11
FIGURE 12
CHAPTER 3: IBM COGNOS INSIGHT
About IBM Cognos Insight
IBM Cognos Insight is a desktop-based analytics tool built with the business user in mind. By that I
mean the front end is very simple, but it is a powerful tool that can create actionable results
without the need for an IT professional. It was created for uncovering inefficiencies and
opportunities using business intelligence, giving quick overviews from the minute you upload your
data. A variety of third-party data sources are supported.
Simplicity should be kept in mind as I go through the different uses of this application.
Data Set and Research Questions
The data set being used within the IBM Cognos examples is a set of fictitious regional sales data
that was provided in class. It includes 7 different product types sold within North America over 3
different sales channels.
The questions I will be answering through analysis and visualization include:
i. Which Sales Channel had the highest total revenue in Q3/2012?
ii. Which customer type has the weakest margins?
iii. Which Product Type within Entertainment Venues has the highest Margin %?
Applications of IBM Cognos Insight
Which Sales Channel had the highest total revenue in Q3/2012?
To determine which of the 3 sales channels (Direct, Internet, or Retail) had the highest total
revenue in Q3/2012, I set the Sales Channel as the Rows, the Revenue as the Dimension, and the
Quarters as the Columns. This gives an overview of all 4 quarters in 2012, but to be more specific,
within the Columns menu I can drill down to show only Q3/2012. In Figure 14, a simple bar chart is
automatically generated when I make these changes to the dimensions, and we can see that Direct
Sales has the highest revenue, with a total of $2,180,912.00. IBM Cognos Insight also generates a
table with the given data dimensions (Figure 13).
FIGURE 14
FIGURE 13
Which customer type has the weakest margins?
Margins are not provided directly within my data set, but they are easy to derive with IBM Cognos
Insight. First off, the Rows are defined by Customer Type, and the Columns by Sales - Region
Measures. Within Sales - Region Measures there are three data points: Cost, Revenue, and Count.
Count is irrelevant here, so I deleted that column. Left with Cost and Revenue, the two factors of
margin, I went into the column header "Calculate" menu and compared Revenue vs. Cost. Selecting
this gives a visual of which Customer Type has Good (revenue is at least 10% more than cost),
Moderate (anything between Good and Weak), or Weak (revenue is no more than 90% of cost)
margins. As indicated in Figure 15, Retail has a weak margin score (which can easily be verified by
comparing Cost vs. Revenue in the table).
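The Good/Moderate/Weak scoring described above can be sketched as a small function; the exact boundary handling inside Cognos is an assumption on my part:

```python
def margin_score(revenue, cost):
    """Classify a margin the way the Revenue-vs-Cost comparison is
    described above: Good if revenue is at least 10% above cost,
    Weak if revenue is at or below 90% of cost, Moderate otherwise."""
    if revenue >= 1.10 * cost:
        return "Good"
    if revenue <= 0.90 * cost:
        return "Weak"
    return "Moderate"

print(margin_score(120, 100))  # Good
print(margin_score(85, 100))   # Weak
print(margin_score(100, 100))  # Moderate
```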
Which Product Type within Entertainment Venues has the highest Margin %?
When determining the exact margin %, I had to create a new column within the table for the
calculation. This is very simple, as IBM Cognos Insight has a calculations menu that offers a list of
possible calculations when you have multiple columns selected. For this, I simply selected the Cost
and Revenue columns (from the previous question) and right-clicked. Under the Calculation menu, I
selected Cost/Revenue and then renamed it to Margin %. To get the actual percentage, I had to
adjust the internal equation by adding "1 -" to the front of it, and then format the column to display
all values as percentages.
Drilling down to Customer Type: Entertainment Venues, (Figure 16) every value has a good margin
score, but the highest is Repairs with 66.29%.
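The adjusted Margin % formula, 1 - (Cost / Revenue), can be checked with a quick calculation; the cost and revenue numbers below are illustrative, chosen to reproduce the quoted 66.29%:

```python
# Margin % as derived above: 1 - (Cost / Revenue)
cost, revenue = 33.71, 100.0
margin_pct = 1 - cost / revenue

print(f"{margin_pct:.2%}")  # 66.29%
```

Note that Cost/Revenue alone gives the cost ratio; subtracting it from 1 flips it into the share of revenue kept as margin, which is why the "1 -" adjustment was needed.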
Analysis of IBM Cognos Insight
After using IBM Cognos Insight, it is very evident that it is dated software. With that being said, it
still serves its purpose well. It is easy to navigate and produces the exact results a businessperson
would need fairly simply.
It is not the easiest analysis tool that I've used, but it is laid out in a way that you can find what
you're looking for by poking around with minimal consequences.
The graphics are dated, and the tables are not as easy to manipulate as plain Excel sheets are. As
a visualization tool, I feel as though Cognos falls short in the ability to take plain data and transform
it into something less than obvious, which came through in my questions and answers: the charts
are bland, the tables are no step above any average spreadsheet, and as a result the answers feel
uninspired.
FIGURE 15
FIGURE 16
It is definitely a sufficient application, and works very well for business purposes, but there are more
powerful, simpler, and prettier applications that serve the exact same purpose.
Conclusion
IBM Cognos Insight is a dated application which is evident in appearances alone, but an effective
tool nonetheless for basic business analysis.
CHAPTER 4: SAP LUMIRA
About SAP Lumira
SAP Lumira is a business intelligence (BI) and data visualization tool used for business purposes.
Data can be pulled from an Excel or CSV file and then visualized within the program.
There are 3 key capabilities of SAP Lumira:
i. Tell the Story with Self-Service Data Visualization: Explore and analyze data online with a
simple-to-use solution. Create stories with BI visualizations from all types of data that
others can leverage, build on, and share.
ii. Create analytics applications and dashboards: Develop interactive, mobile-ready
dashboards and analytics applications to collaborate with users and their data stories
and provide fingertip access to actionable insight.
iii. Secure trusted access and scalability: Connect to data anytime, anywhere for deeper
insights and informed decision making on the go. Explore data with filters, drill-down
capabilities, and hierarchical navigation. (SAP, 2019)
Data Set and Research Questions
The data used in these examples comes from two files pulled from Statistics Canada. One file is
population data including age/sex. The second file is "Sales of Alcoholic Beverages" in Canada.
Luckily, the data is downloadable in CSV format, and very little cleanup was needed.
The trickiest portion was combining the two data sets to have them work together to uncover the
answers to a few questions. The data sets could be cross-referenced by the common factor of
Province, so they paired nicely. Another factor to consider, since I was working with alcoholic
beverage statistics, was the legal drinking age of each province. I created an additional column,
labelled Adult, flagging the people at or above legal drinking age (18 or 19, depending on the
province).
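The Adult flag logic can be sketched as follows (legal drinking age is 18 in Alberta, Manitoba, and Quebec, and 19 elsewhere in Canada; the function and field names are my own, not Lumira's):

```python
# Provinces where the legal drinking age is 18; everywhere else it is 19
DRINKING_AGE = {"AB": 18, "MB": 18, "QC": 18}

def is_adult(province, age):
    """Flag a population record as being of legal drinking age
    for its province."""
    return age >= DRINKING_AGE.get(province, 19)

print(is_adult("QC", 18))  # True
print(is_adult("NS", 18))  # False
```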
I used SAP Lumira to visualize and compare the data, the visualizations include:
i. Number of Adults of Legal Drinking age per Province
ii. Adults and Population by Year in Canada
iii. Per Capita Sales by Year, Type of Beverage, Originating in Canada compared to Per
Capita Sales by Year, Type of Beverage, Originating in Nova Scotia
iv. Sales per Capita in Nova Scotia vs. Canada (Domestic & Imported)
v. Provincial Per Capita Sales in 2007 vs. 2017
Applications of SAP Lumira
What is the number of Adults of Legal Drinking Age per Province?
Before getting in depth with the data, I wanted to break down the information to uncover how many
people within each province were of legal drinking age.
FIGURE 17
Using the "Adults" measure (the one created to display only the members of the population of legal
drinking age), with REF_DATE (Year) in the columns and GEO (Province) in the rows, creates a
cross-tab (Figure 17). For the rest of the questions, the "Adults" measure will be used as the
population.
Is the number of adults increasing or decreasing within Canada?
To visualize the cross-tab created in the previous question, I created a graph to analyze the trend of not only population growth, but growth in the number of people of legal drinking age within Canada, year over year. From this graph, it is evident that there is constant upward growth.
FIGURE 18
What is the Per Capita Sales Rates per Year by Beverage? What is the
comparison between products originating in Canada, vs. originating in Nova
Scotia?
For these charts, the Type of Beverage is in the Rows field and Origin of Product is in the Columns field. Both graphs have been geographically filtered: Figure 19 is filtered to Canada, Figure 20 is filtered to only Nova Scotia, and the Type of Beverage has been filtered to solely Beer and Wine.
Both nationally and in Nova Scotia, a few similarities can be found:
1. Canadian Beer sells substantially better than imported beer
2. Imported wine sells better than Canadian wine
3. Beer sales are higher than Wine sales
The differences I found in the two include:
1. Nova Scotia beer sales are higher than the national average
2. There is an influx of Canadian wine being purchased in recent years in Nova Scotia (not nationally)
3. There is an influx of Canadian beer being purchased in recent years in Nova Scotia (not nationally)
FIGURE 19
FIGURE 20
What do the per Capita sales look like in Nova Scotia, vs the National per
Capita Averages?
Type of Beverage (Beer vs. Wine), Origin of Product (All Products), and GEO (Canada vs. Nova Scotia) were all factors within this chart, which lays out where Nova Scotia stands in terms of Sales per Capita compared to the national average.
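The per capita comparison itself reduces to one division: total sales over the adult population. A minimal sketch with invented totals:

```python
# Invented totals; the real values come from the merged Stats Canada files.
sales = {"Canada": 22_100_000_000, "Nova Scotia": 600_000_000}
adults = {"Canada": 29_000_000, "Nova Scotia": 780_000}

# Sales per capita for each geography.
per_capita = {geo: sales[geo] / adults[geo] for geo in sales}
```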
How have Provincial per Capita Sales Changed from 2007 to 2017?
I used a map instead of a regular chart here to add a more visual perspective. Blue represents Beer, Green represents Spirits, and Purple represents Wines. The size of each pie chart is also directly proportional to the Per Capita Sales figures in each province.
FIGURE 21
FIGURE 22 FIGURE 23
Analysis of SAP Lumira
SAP Lumira is very intuitive and user-friendly. It is clean, and the functions are very easy to follow. Despite that simplicity, I did not feel overly limited in its capabilities.
First off, I found it very easy to go in and select the type of visualization I wanted. The program then guides you through building out your visuals using not only text but icons; this guidance appears as soon as you start to add dimensions, and the colour schemes are visually appealing as well.
Second, the sheer range of abilities within the program is excellent. I can go from something as simple as creating a cross-tab to creating geo-tagged maps, all within the same program. I think the diverse range that SAP Lumira offers is a great selling point.
Third, the ability to import files directly from Excel is great. If I have a workbook that I've been working out of and need something more visually appealing to add to a paper or a presentation, being able to load in the document and manipulate it without any extra steps is very useful.
I think the most difficult part of the program is setting it up to manipulate multiple data sources within the same project, although I don't see this as a true limitation of SAP Lumira, as it is difficult to cross-reference files on a variety of platforms.
Conclusion
In conclusion, I think SAP Lumira is easy to use and visually appealing, while also being capable of handling complex data from multiple sources. I personally love using Lumira, as it allows me to be creative while still processing data and producing meaningful results.
CHAPTER 5: SAP LUMIRA FOR GEOSPATIAL ANALYSIS
About SAP Lumira for Geospatial Analysis
In the previous chapter, I introduced the concept of using SAP Lumira for basic geographic analysis. There are a few reasons why I felt it was important to break geospatial analysis into its own category, but primarily because over 80% of data has a connection to location (Geospatial and Graph Analysis, Lee, 2019), and as we gain more access to mobile-based data, that location correlation will only rise.
By incorporating a geospatial component into data, you are adding an element that is very straightforward and easier to read for the average person. Looking at a map with the data plotted on it is substantially easier to connect with than seeing the data in cross-tab form.
SAP Lumira can simply plot and quantify data based on inputted latitude and longitude coordinates.
You are then able to leverage the visualization capabilities of SAP Lumira to layer your data to view
location targeted results that are easy to read and understand.
Data Set and Research Questions
The data set being used in this chapter is about the crime rates in Calgary, AB. This data was
procured from four sources:
1) “Calgary Police Statistical Reports”
https://www.calgary.ca/cps/Pages/Statistics/Calgary-Police-statistical-reports.aspx
2) “Calgary Communities Locations” – INCLUDES LONGITUDE AND LATITUDE
(GEOSPATIAL ANALYSIS OF CALGARY CRIME DATA IN RESIDENTIAL COMMUNITIES, Lee,
2019)
3) “Historical Calgary Community Populations”
https://data.calgary.ca/Demographics/Historical-Calgary-Community-Populations/4mgk-hrwr
4) “Calgary Police Service Office Locations Map”
https://data.calgary.ca/Health-and-Safety/Calgary-Police-Service-Office-Locations-Map/ehvy-
b4t6
These data sources were all linked together using their own version of the “Community Name”
column within the data files. The “Calgary Communities Locations” includes the longitude and
latitude data, therefore merging the data files based on the community name will assign a
geographic location to all the data.
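The merge described above can be sketched as plain dictionary lookups keyed on the shared community name. All field names and values below are invented for illustration, not the actual Calgary file layouts:

```python
# One crime record, plus lookup tables keyed by the shared community name.
crime = [{"community": "Beltline", "category": "Theft", "incidents": 40}]
locations = {"Beltline": {"lat": 51.04, "lon": -114.07}}
populations = {"Beltline": 25_000}

# Merge: each crime row picks up coordinates and a population figure.
merged = []
for row in crime:
    name = row["community"]
    merged.append({**row, **locations[name], "population": populations[name]})
```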
One notable fact about the statistical reports is that there is a significant outlier in the data: "Social Disorder". These incidents are so common that they barely fall within the category of "crime"; they are things police respond to that aren't actually threatening to anyone or anything, more a social annoyance for the surrounding communities. I felt this was irrelevant for some of the data analysis, as it skewed the data in a substantial way, so it was filtered out in certain cases.
The research questions I will be addressing include:
1) Were social disorder calls more common in the city centre or in the rural communities in
2015?
2) Has the quantity of commercial crimes increased or decreased between 2012-2017?
a. What does the amount of Breaking and Entering crimes look like in comparison to
Robberies?
3) Do the types of crimes committed change as the population size increases or decreases?
(Social Disorder filtered out)
4) Is there a correlation in the number of crimes committed and the location of police service
offices?
Applications of SAP Lumira for Geospatial Analysis
Were social disorder calls more common in the city centre or in the rural
communities in 2015?
I began with this question to determine whether it would be smart to filter out social disorder from further questions. If there is a substantially higher concentration in some regions than others, it may be interesting or valuable to keep moving forward. If the data is consistently concentrated throughout all of Calgary, it would not be relevant to determining specific crime rates in regions moving forward.
This is a simple geospatial visualization that uses the data from the “Calgary Police Statistical
Reports” merged with the “Calgary Communities Locations” data set.
The Geo Dimensions are set to the community locations, with the Crime Category filtered to only "Social Disorder" and the Year filtered to 2015. Using the Choropleth Data Point Type, the colour represents the number of incidents (as seen at the bottom of Figure 24).
As Figure 24 visually represents, it appears that social disorders are prevalent in all communities around Calgary, with more occurring in the city centre. Being the centre of the city, we can assume the population is higher there, which makes the higher density of incidents unsurprising.
FIGURE 24
Has the quantity of commercial crimes increased or decreased between 2012-
2017? What does the amount of Breaking and Entering crimes look like in
comparison to robberies?
To begin, I had to filter the data down to what could be considered a "commercial crime"; I focused on Breaking and Entering, and Robberies. To achieve this focus, I filtered out all other Crime Categories and then filtered each chart according to the year. To get a more visual representation of the data, I used the Bubble Data Point Type, with the number of incidents corresponding to size.
At first glance, it appears as though the data is almost the same across the board, but if you take a closer look at the bottom left-hand corner, you can see that the scales change drastically between 2012 and 2017. In 2012, the large bubble represents 177 incidents, whereas in 2017, the large bubble represents 364 incidents! Based solely on the visuals (without looking at the precise numerical data), I would say that crimes of this nature have actually increased from 2012 to 2017.
To compare the two directly, I took the Bubble Data Point Type and applied the same filters. The difference between Figure 28 and the ones above is an additional colour setting, colour-coding by Crime Category. Based on Figure 28, we can see that there are many more Breaking and Entering crimes on record than Robberies.
FIGURE 26
FIGURE 27
FIGURE 25
FIGURE 28
Do the types of crimes committed change as the population size increases or
decreases? (Social Disorders filtered out)
This question seems simple, but it's actually a little more complex, as there are layers to this visualization. I joined in the third data set to add community population sizes, again linking through the community name.
For the first layer, I applied the Bubble Data Point Type to get a visual based on population size per community. This is a simple layer with only one colour.
For the second layer, I applied the pie chart Data Point Type and added a filter to eliminate social disorders (being in such high quantity, they would make the rest of the visualization harder to read if included). From here, I can see the weights of the crimes within the population bubbles.
From a visual perspective, there does not appear to be an obvious correlation between any one type of crime and larger or smaller population communities.
Is there a correlation in the number of crimes committed and the location of
police services offices?
The final question for this tool begins by adding in the fourth data set, the locations of the police service offices within Calgary, merged through the community names.
This was another layered question: layer 1 is the total number of crimes (social disorder was not filtered out, as this question is not specific to the crimes committed), using the Bubble Data Point Type, and layer 2 is the pin-pointed geo-location of the police service offices. Based on this visual analysis, there does not appear to be any correlation between the two. Yes, there appears to be slightly less crime in the areas surrounding the offices, but there also isn't an office in the city centre, where, based on previous questions, crime seems much more prominent in general.
FIGURE 29
FIGURE 30
Analysis of SAP Lumira for Geospatial Analysis
SAP Lumira is very user friendly, and the Geospatial Analysis tools are no exception to this. It is a very
clean and straightforward tool that allows the combination and layering of multiple data points from
a variety of sources.
There are different options for visualizing on the map, including a pin point, pie charts, and the one I
find most useful for the best picture, the Bubble Data Point Type.
The one drawback I find in Lumira's geospatial abilities is that you need to incorporate a data set that includes longitude and latitude. This is not as convenient as working from a list of city names, provinces, or countries, which some other programs do offer.
After using SAP Lumira in other contexts, I found it very easy to jump in and combine the different data sets. They all came from Excel or .csv files, making them easy for Lumira to understand.
SAP Lumira offers the complexity of detailed geospatial analytics, but in a way that is easy for the average individual to navigate and read.
Conclusion
SAP Lumira has proven once again to be an incredibly user-friendly tool, with not only the ability to take simple and complex data from a file to a visualization in very little time, but also to introduce geospatial analysis in the simplest way. I would recommend SAP Lumira as the best tool for moderate to complex data for anyone with a base level of data analysis knowledge.
CHAPTER 6: SAP BUSINESS OBJECTS ANALYSIS FOR EXCEL
About SAP Business Objects Analysis for Microsoft Office
SAP BusinessObjects Analysis for Excel is a Microsoft Office add-in that allows multidimensional analysis of OLAP sources, MS Excel workbook application design, and creation of BI presentations in PowerPoint. I'll be using examples from the MS Excel workbook component.
Additional Information Includes:
• BEx Queries, BEx Query Views, and BW InfoProviders can all be used as the data source
• Data is displayed in the workbook as a complex crosstab
• The design panel is very similar to that of an Excel Pivot Table ("drag and drop" functionality)
• Can be used with the Visual Basic Editor
• Can only be used when downloaded on a local machine
(Week 3, Lee, 2019)
Data Set and Research Questions
Using SAP BusinessObjects Analysis for Microsoft Office, I will be analyzing transactional data from the fictitious company Global Bike Inc. (GBI), obtained through the GBI InfoCube InfoProvider. This includes details about products and accessories sold between 2007 and 2011 throughout the US and Germany. It has 18 different variables to be analyzed (listed below). Using the BEx Query Designer, I was able to access trend data from within the InfoCube. Once that was saved, I opened the saved file through SAP BusinessObjects Analysis for Microsoft Excel and began manipulating the data.
The questions and observations I have answered through this program include:
i. Basic Navigation
ii. In the year with the lowest revenue, what division had the lowest revenue, and what
customer had the highest revenue within that division?
iii. What are the historical revenue trends for the US and DE?
GBI Variables: Material, Material Desc, Division, Customer Desc, Country, Country Desc, Calendar Year/Month, Calendar Year, Calendar Month, Net Sales, Revenue, Sales Quantity, Product Category, Customer, Sales Organisation, Sales Org Desc, Cost of Goods M USD, Discount
Applications of SAP Business Objects Analysis for Excel
Basic Navigation
Since a basic example has already been shown using Excel Pivot Tables, I wanted to highlight what SAP BusinessObjects Analysis for Excel looks like at first glance, as it is a little different.
This is a page that I created by adding Sales Quantity, Revenue, Discount, Net Sales, and Cost of Goods M USD as my Measures, and Calendar Year and Material as my Rows. It looks very similar to an Excel Pivot Table cross-tab, but that is where the similarity ends. SAP BusinessObjects Analysis for Excel goes substantially more in depth: this is about as complex as Pivot Tables get, whereas it is just the starting page for SAP BusinessObjects Analysis for Excel.
FIGURE 31
In the year with the lowest revenue, what division had the lowest revenue, and
what customer had the highest revenue within that division?
To start, revenue is the only relevant measure within this question, so everything else can be removed from the measures. There are layers to this question, so to begin I drilled down by adding both Calendar Year and Division to the rows area, in that order. Going through the "more sort options" menu, I sorted the data in ascending order based on Revenue by Calendar Year. This puts 2009 at the top with the lowest revenue, at $52,610,815.24. From there I can look at the two divisions (if there were more, I would filter to 2009 and sort again) and determine that AS has the lowest revenue of the two, at $427,889.07. To then determine the customer with the highest revenue within the AS division, I had to drill down one more layer and add Customer to the rows. Because this adds many more rows to the data, I decided to filter everything first to 2009 and then to the division AS. I sorted the revenue in descending order: Bavaria Bikes had the highest revenue in the AS division, at $38,216.79.
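This sort-filter-sort drill-down can be sketched in plain Python. The transactions below are invented; only the logic mirrors the steps described:

```python
# Toy GBI-style transactions (year, division, customer, revenue).
rows = [
    {"year": 2009, "division": "AS", "customer": "Bavaria Bikes", "revenue": 38_000},
    {"year": 2009, "division": "AS", "customer": "Alster Cycling", "revenue": 12_000},
    {"year": 2009, "division": "BI", "customer": "Big Apple Bikes", "revenue": 90_000},
    {"year": 2010, "division": "AS", "customer": "Bavaria Bikes", "revenue": 200_000},
    {"year": 2010, "division": "BI", "customer": "Big Apple Bikes", "revenue": 300_000},
]

def total(data, **keys):
    """Sum revenue over rows matching all the given key=value filters."""
    return sum(r["revenue"] for r in data
               if all(r[k] == v for k, v in keys.items()))

# Layer 1: year with the lowest total revenue.
worst_year = min({r["year"] for r in rows}, key=lambda y: total(rows, year=y))
# Layer 2: division with the lowest revenue within that year.
worst_div = min({r["division"] for r in rows if r["year"] == worst_year},
                key=lambda d: total(rows, year=worst_year, division=d))
# Layer 3: customer with the highest revenue within that division.
top_customer = max({r["customer"] for r in rows
                    if r["year"] == worst_year and r["division"] == worst_div},
                   key=lambda c: total(rows, year=worst_year,
                                       division=worst_div, customer=c))
```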
What are the Historical Revenue Trends for US and DE?
I started with Revenue as my Measure, and Country and Calendar Year as my Rows (Figure 35). This worked fine for analyzing a small amount of data in a cross-tab. To get a better understanding of the trends, though, I next needed to move Calendar Year from Rows to Measures.
This allowed me to create a line chart (Figure 34). From here, it is very clearly laid out that although revenues in both countries were relatively similar in 2007, the US began a quick decline and nearly plateaued by 2009, whereas 2007-2009 was fairly stagnant for Germany, which then saw an increase from 2009 to 2011.
FIGURE 32 FIGURE 33
FIGURE 35
FIGURE 34
Analysis of SAP Business Objects Analysis for Excel
I think SAP BusinessObjects Analysis for Excel is significantly more complex than the tools in previous chapters and has taken time and effort to understand, but because of this, it can offer deeper insights and creates very clean, easy-to-read results.
The best aspect of this program has been the ability to extract data from the InfoCube and manipulate the raw data to create the exact output I was looking for. The initial extraction was complex and took a deeper understanding, but once I was able to open the data in the Excel analysis tool, it became quite a bit easier to navigate, because it has a similar layout to what I am used to in Pivot Tables.
The ability to drill down while having it still laid out simply makes it appealing to use.
I would not use this as a go-to for everyday projects, as it requires more work and specific software downloads; it's not as accessible as other programs. The design options are also lacking, much like Pivot Tables.
Conclusion
I think that, although it is not incredibly difficult to use, there are still barriers for the average user. It is useful for large amounts of complex data, but I would not use it as my primary visualization tool.
CHAPTER 7: SAP ANALYTICS CLOUD
About SAP Analytics Cloud
SAP Analytics Cloud is a cloud-based program that leverages business intelligence, enterprise planning, and business analytics to prepare and model data (including creating plans, visuals, and predictions) from both within the SAP ecosystem and non-SAP sources. Being cloud-based software, it has the benefits of low implementation costs, low maintenance from a user's perspective, and security standards upheld by SAP. Another benefit of being cloud-based is that updates happen in real time: if you want to add features made available by SAP, you can access them online and add them as needed.
Data Set and Research Questions
Using SAP Analytics Cloud, I will be analyzing data retrieved through the ERP Simulation Game (ERPsim). This game simulates the planning, procurement, production, and selling environment of a commodity (in this example, different kinds of muesli). The game runs over several rounds, with 12 products in the given market, and while it is being played, the data is stored through a direct connection to the SAP HANA ERPsim system. This section uses data from a previously run simulation.
For more information on ERPsim, visit www.erpsim.hec.ca (Lee, Week 5, 2019)
Before jumping into a thorough analysis, I had to prepare the data to get the most from my results.
First, I had to ensure each column of data was properly labelled as a Measure (key information, figure, or fact) or a Dimension (reference information directly related to the measures). The data will not model properly if it is not labelled correctly. For this data, the measures are Price, Quantity, and Revenue; the dimensions are Team, Day, Area, Distribution Channel, Sales, Order, and Product.
Second, I enhanced the data. Some of the data columns are vague, but when edited they give a better picture of what is going on. By selecting the Area column and choosing "Create a Transformation", I transformed NO, SO, and WE into North, South, and West (respectively), and then, in the Distribution Channel column, transformed 10, 12, and 14 into Hypermarkets, Grocery Chains, and Convenience Stores. This allows the results to be apparent at first glance, instead of having to go back later and determine what each acronym or number means.
As the next step in enhancing the data, I created another column combining round and day. This allows the data to be compared in greater depth, down to the day.
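The two enhancement steps above, recoding the area and channel codes and combining round with day, amount to simple lookups and string concatenation. A sketch with one toy order line (field names are illustrative, and the work itself happens inside SAP Analytics Cloud, not in code):

```python
# Code-to-label mappings taken from the transformations described above.
AREA = {"NO": "North", "SO": "South", "WE": "West"}
CHANNEL = {10: "Hypermarkets", 12: "Grocery Chains", 14: "Convenience Stores"}

# One toy ERPsim sales line.
sales = [{"area": "NO", "channel": 12, "round": 5, "day": 3, "revenue": 1_000}]

for row in sales:
    row["area"] = AREA[row["area"]]             # NO -> North
    row["channel"] = CHANNEL[row["channel"]]    # 12 -> Grocery Chains
    row["round_day"] = f"{row['round']}.{row['day']}"  # combined drill key
```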
The questions I will be answering through analysis and visualization are as follows:
i. Which team brought in the highest revenue? What product had the highest revenue?
ii. What is the market share (in terms of revenue) of each team, by product?
iii. Were there products that did NOT sell in specific distribution channels?
iv. Which team sold the highest quantity of product, and what quantities were sold of each
product?
v. On which days did individual teams not have any revenue throughout round 5?
Application of SAP Analytics Cloud
Which team brought in the highest revenue? What product had the highest
revenue?
This is a simple analysis once the data is cleaned and enhanced. Inserting a bar/column chart (comparison visualization type), with Revenue as the Measure and Team as the Dimension, automatically generates the chart in Figure 37.
By sorting the chart from highest to lowest and updating the chart colours, it is easy to identify that team RR brought in the highest revenue, with $32,297,798.00.
Using the same process as the charts above, we can easily determine the product with the highest revenue by setting the Measure as Revenue and the Dimension as Product. Figure 38 shows that the 500g Nut Muesli is the product with the highest revenue, with a total of $24,445,956.53.
FIGURE 36 FIGURE 37
FIGURE 38
What is the market share (in terms of revenue) of each team, by product?
Using a Stacked Bar chart from the comparison menu, I set the Measure to Revenue, the Dimension to Product, and the Colour to Team. The colour is an additional specifier within the stacked bar chart, as there are multiple pieces of data within each bar.
In Figure 39, the chart has been sorted by total product revenue from lowest to highest for organizational purposes. Each team is assigned a colour, so it is easy to see at a glance which team did the best within each product category.
Were there products that did NOT sell in specific distribution channels?
Using a Heat Map from the distribution menu, it is very easy to find blank spots (if any) within the distribution channels, by setting the Dimensions to Distribution Channel and Product, with Colour as Revenue. Based on the map in Figure 40, there are products that did not sell in the Convenience Stores and Hypermarkets distribution channels.
FIGURE 39
FIGURE 40
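The blank spots such a heat map exposes are simply product/channel pairs with no order lines at all. A set-difference sketch (product names and orders invented):

```python
# Toy order lines: (product, distribution channel) pairs that did sell.
orders = [
    ("500g Nut Muesli", "Grocery Chains"),
    ("500g Nut Muesli", "Hypermarkets"),
    ("1kg Raisin Muesli", "Grocery Chains"),
]
channels = {"Hypermarkets", "Grocery Chains", "Convenience Stores"}
products = {product for product, _ in orders}

# Every possible pair, minus the pairs that actually sold = the blank spots.
unsold = {(p, c) for p in products for c in channels} - set(orders)
```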
Which team sold the highest quantity of product, and what quantities were sold of
each product?
Using a Marimekko chart (a 2-dimensional stacked chart) under the "more" menu, I set Team as the Dimension, Quantity as the Height, and Product as the Colour. From this chart we can clearly see, by the height of each column, which team sold the highest quantity of product, while the colour blocks indicate the quantities sold of each product.
On which days did individual teams not have any revenue throughout round 5?
I used a heat map with the Dimensions set to Round/Day (the combined column created when enhancing the data, as mentioned previously) and Team, and the Colour set to Revenue by Round/Day. This question needs an extra step: since it asks about a specific round, the data has to be drilled down to the second level (days within a specific round) and filtered to "5".
FIGURE 42
As can be seen in Figure 42, any blank area within the chart represents a day without revenue for that team.
FIGURE 41
Analysis of SAP Analytics Cloud
SAP Analytics Cloud is one of the most visually appealing analytics tools that I have used to date. Being cloud-based, it's accessible with a membership virtually anywhere with an internet connection.
It is a very user-friendly tool. As long as your baseline data is clear, everything is labelled with both words and icons, so the average user should not have much difficulty achieving basic results. It also lends itself to manipulating the data further if you take the extra time to enhance it. The initial screen is laid out similarly to an Excel spreadsheet, making it familiar to most users.
Changes can be made easily to the visualizations by dragging and dropping within the menu, as can colours and sizes. You can then filter to achieve the results you want with minimal effort. Another feature enabled by being cloud-based is that you can make use of geolocation as well. It is very customizable for a program that does not require any coding.
Conclusion
From my personal usage standpoint, I find SAP Analytics Cloud very easy to use, while also capable of making modern, professional-looking visualizations and producing accurate results.
CHAPTER 8: SAP HANA DATA MODELING
About SAP HANA Data Modeling
Data Set
To demonstrate SAP HANA Data Modeling, I used a case study modeling a business scenario for the
fictitious company Global Bike Inc. (GBI). In this situation, GBI has acquired a new data warehouse
solution based on SAP HANA and the SAP HANA Platform. It is being integrated into the Sales
department so a prototype of the model had to be created.
Three .csv files were pre-loaded into the database, including sales data on customers, products, and sales transactions over the last five years.
Processes
Create Database Tables
The database tables are the foundation of the data modeling. I started by creating three database tables within my database schema.
The files given were in .csv form, meaning I had to write code so that SAP HANA could understand what the information it was reading actually was, where it was coming from, and how to position it properly within a table.
An example of this for the SALES data code is as follows:
table.schemaName = "GBI_242";
table.tableType = COLUMNSTORE;
table.columns = [
{name = "YEAR"; sqlType = INTEGER; },
{name = "MONTH"; sqlType = INTEGER; },
{name = "DAY"; sqlType = INTEGER; },
{name = "CUSTOMER_NUMBER"; sqlType = NVARCHAR; length = 10; },
{name = "ORDER_NUMBER"; sqlType = NVARCHAR; length = 10; },
{name = "ORDER_ITEM"; sqlType = NVARCHAR; length = 3; },
{name = "PRODUCT"; sqlType = NVARCHAR; length = 8; },
{name = "SALES_QUANTITY"; sqlType = INTEGER; },
{name = "UNIT_OF_MEASURE"; sqlType = NVARCHAR; length = 3; },
{name = "REVENUE"; sqlType = DECIMAL; precision = 17; scale = 2; },
{name = "CURRENCY"; sqlType = NVARCHAR; length = 3; },
{name = "DISCOUNT"; sqlType = DECIMAL; precision = 17; scale = 2; }
];
table.primaryKey.pkcolumns = ["ORDER_NUMBER", "ORDER_ITEM"];
To break this down further:
- name: the column title
- sqlType: the data type
- length: the maximum number of characters the column can hold
- precision: the total number of significant digits
- scale: the number of decimal places
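To make precision and scale concrete, here is a small Python check (using the standard decimal module) of whether a value fits a DECIMAL(17, 2) column like REVENUE above. This is an illustration of the constraint, not SAP HANA's actual validation logic:

```python
from decimal import Decimal, ROUND_HALF_UP

def fits_decimal_17_2(value: str) -> bool:
    """True if the value, rounded to 2 decimal places (scale),
    needs no more than 17 significant digits (precision)."""
    d = Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return len(d.as_tuple().digits) <= 17

small = fits_decimal_17_2("12345.678")            # rounds to 12345.68: 7 digits
huge = fits_decimal_17_2("1234567890123456.789")  # 18 digits after rounding
```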
Once executed correctly, it creates a Data Table formatted exactly how you had it specified within the
code.
Data Provisioning
Once the tables were created, I had to create a new remote source to host the virtual tables needed to move forward. This allows the tables to be used on their own.
Once the remote source was established, the virtual tables could be created through a right click. I had to ensure that the virtual tables were included within my own schema, "GBI_242".
From here, the background information is established, and I can begin the data association process through flowgraph models. Within the editor workbench in SAP HANA Fiori, I can create a flowgraph to associate the virtual tables (remote source data) with the data tables (blank tables with defined data formats).
In Figure 43, you can see the initial FlowGraph page. I have my Customer VT as the data source, and
the Customer Data Table as my Data Sink. This is an excellent visual that helped me to make the
connection between the different types of data, tables, and sourcing.
To establish a definite connection between the two, I selected the Data Sink to open the
Input/Output details screen (Figure 44). On this page, I drew connections between the relevant data
sets and column headers, and then executed the flowgraph model to load in the customer data. This
was then done for the product data and the sales data as well.
FIGURE 43
FIGURE 44
Calculation View -> Dimensions
There are two calculation views needed for this data set: Customer (Dimension) and Product (Dimension).
Using nodes, I extracted data from the data table (Projection & Join):
- Join: joins two source objects and passes on the result. In the example in Figure 45, I used a text join.
- Projection: used to select columns, filter the data, and create additional columns (also used in Figure 45).
Primary keys were also defined within the Semantics of the calculation view.
Calculation View -> Star Join Data Cube
The process for creating the data cube is very similar to that of the Dimension calculation views, but slightly simpler because they act as building blocks. The calculation views established as Dimensions are joined together in a "Star Join". The relevant fields are then selected based on the scope of the project the data is needed for.
Figure 46 shows the Semantics screen, where primary keys can be defined, data types can be set, and columns can be re-labelled properly.
FIGURE 45
FIGURE 46
Analysis of SAP HANA Data Modeling
SAP HANA Data Modeling is a very complex system, and even after fully completing the case analysis, I feel as though I've barely scratched the surface of this platform's offerings.
I feel the first two steps, creating the database tables and the data provisioning, were the most straightforward and easiest to wrap my head around conceptually. They are very process-based, and errors are flagged clearly for quick correction before moving on to the next steps. The calculation views are difficult if you don't fully understand the scope of the project you will be working on, as the options you choose and the definitions you use can change the view you have of the data and the results you can achieve moving forward.
SAP HANA is a very complex program that would require a lot of training and a thorough
understanding before using it to manipulate extensive data sets, but I feel after doing this case study
that I have a basic understanding of what SAP HANA is capable of, and where to focus my future
learning on to really understand how HANA works on a deeper level.
Conclusion
The SAP HANA Data Modeling process is very complex in its abilities, and it is not designed with a beginner user in mind. With practice, an understanding of SAP HANA itself, and proper research behind the data, it is a very powerful tool that any analyst can use to their advantage.
CHAPTER 9: SAP PREDICTIVE ANALYTICS: ASSOCIATION
ANALYSIS
About SAP Predictive Analytics: Association Analysis
Association Analysis allows us to find correlations in data that are not always apparent at
surface level, by examining a group of transactions and deriving rules from them.
In business, these findings can be used, for example, to promote and recommend items that often
occur together.
Association rules are written as antecedent -> consequent, an "if/then" format.
The strength of a rule is determined by three factors:
- Support (S) (0 < S < 1): the percentage of transactions that contain both the antecedent and the consequent
o = P(A ^ B)
- Confidence (C) (0 < C < 1): the probability that the consequent occurs given the antecedent
o = P(B|A)
- Lift (L): is the association real, or just a coincidence? A lift above 1 suggests a real association
o = P(B|A) / P(B)
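These three metrics can be computed directly from a transaction list. A minimal Python sketch (the items and transactions below are invented for illustration):

```python
# Toy transaction list; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n              # P(A ^ B)
    confidence = both / ante        # P(B | A)
    lift = confidence / (cons / n)  # P(B | A) / P(B)
    return support, confidence, lift

s, c, l = rule_metrics({"bread"}, {"milk"}, transactions)
# For this toy data: support 0.4, confidence 2/3, lift 5/6 (below 1,
# so "bread -> milk" looks like coincidence rather than a real association).
```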
The association rules can then be mined further using the Apriori algorithm, which:
- Trims out infrequent itemsets
- Identifies weaknesses within the model, with threshold settings for support,
confidence, and lift (Week 9, Lee, 2019)
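The trimming described above can be sketched in Python (same toy transactions as before; this is a simplified sketch and omits the full Apriori subset-pruning check on candidates):

```python
# Toy transaction list reused from the metrics sketch above.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def apriori_frequent(transactions, min_support=0.4):
    """Return every itemset whose support meets min_support, built level by level."""
    n = len(transactions)
    support = lambda items: sum(1 for t in transactions if items <= t) / n
    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items if support({i}) >= min_support}]
    k = 2
    while frequent[-1]:
        # Candidate k-itemsets are unions of frequent (k-1)-itemsets;
        # infrequent candidates are trimmed out -- the Apriori pruning step.
        candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
        frequent.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return set().union(*frequent)

freq = apriori_frequent(transactions, min_support=0.4)
# {bread, milk, butter} appears in only 1 of 5 transactions (support 0.2),
# so it is trimmed; all single items and pairs survive.
```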
Data Set
The data set I will be using is the passenger data from the Titanic. Using Expert Analytics, I will
generate rules through an Association Analysis around the survivability of the passengers
given different variables (Jones, Kale, 2019).
Setting up the Model
Using the R-Apriori Association Algorithm
(Figure 47), I set up the Item columns to
reflect "Age", "Class", "Sex", and
"Survived", the factors that I am
trying to find associations among.
Setting support at 0.01 means a rule must
hold in at least 1% of the transactions;
setting confidence at 0.8 means that, among
passengers matching a rule's antecedent, at
least 80% must also match its consequent.
FIGURE 47
39
Since I'm looking for survival rates, isolating Survived to the right-hand side (the consequent) of
the rules gives a clearer picture of which factors are actually associated with survival
(Figure 48).
Running the Association Analysis and Visualizing
The rules produced by the previous algorithm are as follows (Figure 50):
To set up the visualization, I converted Lift,
Confidence, and Support to measures (Figure 49)
to get a different perspective on the rules.
Creating a bubble chart from these three
measures, with Confidence on the Y-axis, Support
on the X-axis, and Lift as the bubble size,
visualizes the dimensions within the rules. Colours
are then added to differentiate the specific rules
(Figure 51).
With the rules laid out visually within the program,
the exact coordinates can be seen by mousing over the individual bubbles.
FIGURE 50
FIGURE 48
FIGURE 49
FIGURE 51
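The same bubble-chart layout can be prepared outside the program. A minimal Python sketch (the rule names and metric values below are invented for illustration) that maps each rule's Support and Confidence to chart coordinates and scales Lift into a bubble size, in the form a plotting call such as matplotlib's `scatter(x, y, s=sizes)` would expect:

```python
# Hypothetical rules with (support, confidence, lift) values, invented for illustration.
rules = {
    "Sex=female -> Survived=yes": (0.26, 0.74, 1.92),
    "Class=1st & Sex=female -> Survived=yes": (0.06, 0.97, 2.52),
    "Class=3rd & Sex=male -> Survived=no": (0.19, 0.83, 1.35),
}

def bubble_points(rules, max_size=600.0):
    """Map each rule to (x=support, y=confidence, size proportional to lift)."""
    top_lift = max(lift for _, _, lift in rules.values())
    return {
        name: (s, c, max_size * lift / top_lift)
        for name, (s, c, lift) in rules.items()
    }

points = bubble_points(rules)
# The rule with the highest lift gets the largest bubble (size == max_size).
```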
40
Within the rules, there are two that I feel are the strongest:
From a visualization standpoint, these two stand out as large and above the rest on the chart,
and upon further inspection they have two of the highest confidence rates of all the rules.
Analysis of SAP Predictive Analytics: Association Analysis
SAP Predictive Analytics: Association Analysis is simple enough to use for even the most basic
analytics user, although some understanding of statistics is needed beforehand, as not everything
is explained within the program. Knowledge is assumed, but not at a complex level.
One big flaw that I found is that after adding constraints, it is difficult to switch back to the
original rules that the algorithm provided.
SAP Predictive Analytics is an excellent tool all around for generating statistical insight, as you
are able to predict, analyze, visualize, and then save your findings back into a .csv file, all
within the one program.
Because it is so straightforward, there is not much to analyze about the actual processes that
wasn't evident in the previous section.
Conclusion
SAP Predictive Analytics: Association Analysis is a user-friendly program that adds a visual
component to statistics that I had never considered before.
FIGURE 53 FIGURE 52
41
REFERENCES
Jones, N., Kale, N., (2019), Chapter 5: Exercise 1, Using Excel Pivot Tables for
Analytics, [Case Study], Retrieved from Brightspace
Jones, N., Kale, N., (2019), Chapter 5: Exercise 2, Data Manipulation for Analytics,
[Case Study], Retrieved from Brightspace *Modified by Kyung Lee
Jones, N., Kale, N., (2019), Chapter 11: Exercise 1, Association Analysis, [Case
Study], Retrieved from Brightspace
Lee, Kyung Y., (2019) SAP HANA DATA MODELING CASE STUDY BUSINESS
SCENARIO [Case Study], Retrieved from Brightspace
Lee, Kyung Y., (2019) Week 3: Data Modelling and Extraction Transformation
Loading [PowerPoint Slides], Retrieved from Brightspace
Lee, Kyung Y., (2019) Week 5: Business Reporting, and Performance Management
[PowerPoint Slides], Retrieved from Brightspace
Lee, Kyung Y., (2019) Week 6: Data Visualization Basics (Chapter 7) [PowerPoint
Slides], Retrieved from Brightspace
Lee, Kyung Y., (2019) Week 9: Data Mining & Predictive Analytics [PowerPoint
Slides], Retrieved from Brightspace
Lee, Kyung Y., (2019) Week 10: Big Data & In-Memory Analytics [PowerPoint Slides],
Retrieved from Brightspace
SAP Lumira Discovery (Benefits and Capabilities), (2019)
https://www.sap.com/canada/products/lumira.htm

More Related Content

What's hot

Intrusion Detection on Public IaaS - Kevin L. Jackson
Intrusion Detection on Public IaaS  - Kevin L. JacksonIntrusion Detection on Public IaaS  - Kevin L. Jackson
Intrusion Detection on Public IaaS - Kevin L. JacksonGovCloud Network
 
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015Lora Cecere
 
Why And Ontology Engine Drives The Point Cross Orchestra Engine
Why And Ontology Engine Drives The Point Cross Orchestra EngineWhy And Ontology Engine Drives The Point Cross Orchestra Engine
Why And Ontology Engine Drives The Point Cross Orchestra EngineKuzinski
 
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROI
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROIDIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROI
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROIMohit Khare
 
SaskMedical Diagnostics (P.C.) BJORN HUNTER
SaskMedical Diagnostics (P.C.) BJORN HUNTERSaskMedical Diagnostics (P.C.) BJORN HUNTER
SaskMedical Diagnostics (P.C.) BJORN HUNTERBjorn Hunter
 
Google analytics training book - Now free
Google analytics training book - Now freeGoogle analytics training book - Now free
Google analytics training book - Now freeMesurex
 

What's hot (7)

Babok v2 draft
Babok v2 draftBabok v2 draft
Babok v2 draft
 
Intrusion Detection on Public IaaS - Kevin L. Jackson
Intrusion Detection on Public IaaS  - Kevin L. JacksonIntrusion Detection on Public IaaS  - Kevin L. Jackson
Intrusion Detection on Public IaaS - Kevin L. Jackson
 
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015
Putting Together the Pieces - The S&OP Technology Landscape - 20 AUG 2015
 
Why And Ontology Engine Drives The Point Cross Orchestra Engine
Why And Ontology Engine Drives The Point Cross Orchestra EngineWhy And Ontology Engine Drives The Point Cross Orchestra Engine
Why And Ontology Engine Drives The Point Cross Orchestra Engine
 
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROI
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROIDIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROI
DIGITAL MARKETING STRATEGIES AND CHANNELS TO DRIVE DEMAND GENERATION AND ROI
 
SaskMedical Diagnostics (P.C.) BJORN HUNTER
SaskMedical Diagnostics (P.C.) BJORN HUNTERSaskMedical Diagnostics (P.C.) BJORN HUNTER
SaskMedical Diagnostics (P.C.) BJORN HUNTER
 
Google analytics training book - Now free
Google analytics training book - Now freeGoogle analytics training book - Now free
Google analytics training book - Now free
 

Similar to Business Analytics Tools Comparison

GSPANN Guide ( Sitecore vs. Google Analytics )
GSPANN Guide ( Sitecore vs. Google Analytics )GSPANN Guide ( Sitecore vs. Google Analytics )
GSPANN Guide ( Sitecore vs. Google Analytics )Rolf Kraus
 
Rapid mart development guide
Rapid mart development guideRapid mart development guide
Rapid mart development guideBhaskar Reddy
 
CRM EHP3 landscape guide
CRM EHP3 landscape guide CRM EHP3 landscape guide
CRM EHP3 landscape guide SK Kutty
 
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206Dennis Reurings
 
Ibm spss bootstrapping
Ibm spss bootstrappingIbm spss bootstrapping
Ibm spss bootstrappingDũ Lê Anh
 
Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Heath Turner
 
The Analytics Revolution 2011: Optimizing Reporting and Analytics to Make A...
The Analytics Revolution 2011:  Optimizing Reporting and Analytics to  Make A...The Analytics Revolution 2011:  Optimizing Reporting and Analytics to  Make A...
The Analytics Revolution 2011: Optimizing Reporting and Analytics to Make A...IBM India Smarter Computing
 
BI Project report
BI Project reportBI Project report
BI Project reporthlel
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Kun Le
 
Business Intelligence in SAP Environments: Understanding the value of complem...
Business Intelligence in SAP Environments: Understanding the value of complem...Business Intelligence in SAP Environments: Understanding the value of complem...
Business Intelligence in SAP Environments: Understanding the value of complem...dcd2z
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine LearningLynn Langit
 
Data Science & BI Salary & Skills Report
Data Science & BI Salary & Skills ReportData Science & BI Salary & Skills Report
Data Science & BI Salary & Skills ReportPaul Buzby
 
Scorecard & Dashboards
Scorecard & DashboardsScorecard & Dashboards
Scorecard & DashboardsSunam Pal
 
SPi Global Services Overview
SPi Global Services OverviewSPi Global Services Overview
SPi Global Services Overviewbloevens
 
Master guide-ehp6for erp6.0-ehp3fornw7.0
Master guide-ehp6for erp6.0-ehp3fornw7.0Master guide-ehp6for erp6.0-ehp3fornw7.0
Master guide-ehp6for erp6.0-ehp3fornw7.0Adnan Khalid
 
Modifying infor erp_syte_line_5140
Modifying infor erp_syte_line_5140Modifying infor erp_syte_line_5140
Modifying infor erp_syte_line_5140rajesh_rolta
 
Adobe Audience Manager Readiness Playbook
Adobe Audience Manager Readiness PlaybookAdobe Audience Manager Readiness Playbook
Adobe Audience Manager Readiness PlaybookChristophe Lauer
 

Similar to Business Analytics Tools Comparison (20)

GSPANN Guide ( Sitecore vs. Google Analytics )
GSPANN Guide ( Sitecore vs. Google Analytics )GSPANN Guide ( Sitecore vs. Google Analytics )
GSPANN Guide ( Sitecore vs. Google Analytics )
 
Rapid mart development guide
Rapid mart development guideRapid mart development guide
Rapid mart development guide
 
CRM EHP3 landscape guide
CRM EHP3 landscape guide CRM EHP3 landscape guide
CRM EHP3 landscape guide
 
Bslsg131en 1
Bslsg131en 1Bslsg131en 1
Bslsg131en 1
 
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206
Youwe sap-ecc-r3-hana-e commerce-with-magento-mb2b-100717-1601-206
 
Ibm spss bootstrapping
Ibm spss bootstrappingIbm spss bootstrapping
Ibm spss bootstrapping
 
Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Tx2014 Feature and Highlights
Tx2014 Feature and Highlights
 
The Analytics Revolution 2011: Optimizing Reporting and Analytics to Make A...
The Analytics Revolution 2011:  Optimizing Reporting and Analytics to  Make A...The Analytics Revolution 2011:  Optimizing Reporting and Analytics to  Make A...
The Analytics Revolution 2011: Optimizing Reporting and Analytics to Make A...
 
BI Project report
BI Project reportBI Project report
BI Project report
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
 
Whats new
Whats newWhats new
Whats new
 
Business Intelligence in SAP Environments: Understanding the value of complem...
Business Intelligence in SAP Environments: Understanding the value of complem...Business Intelligence in SAP Environments: Understanding the value of complem...
Business Intelligence in SAP Environments: Understanding the value of complem...
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine Learning
 
Data Science & BI Salary & Skills Report
Data Science & BI Salary & Skills ReportData Science & BI Salary & Skills Report
Data Science & BI Salary & Skills Report
 
Scorecard & Dashboards
Scorecard & DashboardsScorecard & Dashboards
Scorecard & Dashboards
 
Oracle sap
Oracle sapOracle sap
Oracle sap
 
SPi Global Services Overview
SPi Global Services OverviewSPi Global Services Overview
SPi Global Services Overview
 
Master guide-ehp6for erp6.0-ehp3fornw7.0
Master guide-ehp6for erp6.0-ehp3fornw7.0Master guide-ehp6for erp6.0-ehp3fornw7.0
Master guide-ehp6for erp6.0-ehp3fornw7.0
 
Modifying infor erp_syte_line_5140
Modifying infor erp_syte_line_5140Modifying infor erp_syte_line_5140
Modifying infor erp_syte_line_5140
 
Adobe Audience Manager Readiness Playbook
Adobe Audience Manager Readiness PlaybookAdobe Audience Manager Readiness Playbook
Adobe Audience Manager Readiness Playbook
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Business Analytics Tools Comparison

  • 1. 1 Business Analytics Tool Kit PORTFOLIO Hannah Forsythe
  • 2. 2 TABLE OF CONTENTS Executive Summary .....................................................................................................................................4 Chapter 1: Microsoft Excel Pivot Tables.....................................................................................................5 About Microsoft Excel Pivot Tables.........................................................................................................5 Data Set and Research Questions..........................................................................................................5 Application of Microsoft Excel Pivot Tables............................................................................................6 Analysis of Microsoft Excel Pivot Tables.................................................................................................8 Conclusion................................................................................................................................................8 Chapter 2: Tableau ......................................................................................................................................9 About Tableau ..........................................................................................................................................9 Data Set and Research Questions..........................................................................................................9 Applications of Tableau ........................................................................................................................ 10 Analysis of Tableau............................................................................................................................... 11 Conclusion............................................................................................................................................. 
11 Chapter 3: IBM Cognos Insight................................................................................................................. 12 About IBM Cognos Insight .................................................................................................................... 12 Data Set and Research Questions....................................................................................................... 12 Applications of IBM Cognos Insight ..................................................................................................... 12 Analysis of IBM Cognos Insight ............................................................................................................ 13 Conclusion............................................................................................................................................. 14 Chapter 4: SAP Lumira ............................................................................................................................. 15 About SAP Lumira ................................................................................................................................. 15 Data Set and Research Questions....................................................................................................... 15 Applications of SAP Lumira .................................................................................................................. 16 Analysis of SAP Lumira ......................................................................................................................... 19 Conclusion............................................................................................................................................. 19 Chapter 5: SAP Lumira for Geospatial Analysis ...................................................................................... 
20 About SAP Lumira for Geospatial Analysis .......................................................................................... 20 Data Set and Research Questions....................................................................................................... 20 Applications of SAP Lumira for Geospatial Analysis ........................................................................... 21 Analysis of SAP Lumira for Geospatial Analysis .................................................................................. 24 Conclusion............................................................................................................................................. 24 Chapter 6: SAP Business Objects Analysis for Excel .............................................................................. 25 About SAP Business Objects Analysis for Microsoft Office................................................................. 25
  • 3. 3 Data Set and Research Questions....................................................................................................... 25 Applications of SAP Business Objects Analysis for Excel ................................................................... 26 Analysis of SAP Business Objects Analysis for Excel .......................................................................... 28 Conclusion............................................................................................................................................. 28 Chapter 7: SAP Analytics Cloud................................................................................................................ 29 About SAP Analytics Cloud.................................................................................................................... 29 Data Set and Research Questions....................................................................................................... 29 Application of SAP Analytics Cloud....................................................................................................... 30 Analysis of SAP Analytics Cloud............................................................................................................ 33 Conclusion............................................................................................................................................. 33 Chapter 8: SAP HANA DATA MODELING .................................................................................................. 34 About SAP HANA Data Modeling.......................................................................................................... 34 Data Set................................................................................................................................................. 
34 Processes .............................................................................................................................................. 34 Analysis of SAP HANA Data Modeling.................................................................................................. 37 Conclusion............................................................................................................................................. 37 Chapter 9: SAP Predictive Analytics: Association Analysis..................................................................... 38 About SAP Predictive Analytics: Association Analysis......................................................................... 38 Data Set................................................................................................................................................. 38 Setting up the Model ............................................................................................................................ 38 Running the Association Analysis and Visualizing .............................................................................. 39 Analysis of SAP Predictive Analytics: Association Analysis................................................................. 40 Conclusion............................................................................................................................................. 40 References ................................................................................................................................................ 41
  • 4. 4 EXECUTIVE SUMMARY As a fourth-year Bachelor of Commerce student at Dalhousie University, I’ve had the opportunity to take the course Business Analytics and Data Visualization (COMM 4512), taught by Prof. Kyung Young Lee. A key component of this course is building skills with a variety of the latest business analytics tools through challenging exercises that not only allow us to learn about the analytics tools, but also to think critically about the tools at our disposal and how we can use them to create meaningful insights and answer key business questions, not only within our current course work, but also in our future careers. Over the next 9 chapters, I will be analyzing the following tools: - Microsoft Excel: Pivot Tables - Tableau - IBM Cognos Insight - SAP Lumira - SAP Lumira for Geospatial Analysis - SAP Business Objects Analysis for Excel - SAP Analytics Cloud - SAP HANA Data Modeling - SAP Predictive Analytics: Association Analysis Each tool has its advantages and disadvantages, and I’ve highlighted a variety of research questions for each tool in an attempt to capture these. Some of the data sets, with a brief introduction, include: - Global Bikes Inc.: A fictitious company used in multiple examples due to the depth of the data and the wide range of analysis that can be done on it. - World Bank Data: CO2 Emissions. This is an excellent one to visualize, as the raw data itself is confusing and difficult to follow; putting it in a tool that can highlight key points without needing to thoroughly explain the data itself is key. - Sales and Distribution Channels: Easy data to navigate for a tool that may not be as simple as others due to its lack of advancement to keep up with technological changes. - Alcohol preferences within Canada: Multi-sourced data that had to be used in a program that can merge the data seamlessly and then remain within the same program to be visualized. - Crime Data based out of Calgary, AB. 
There are many different sources and impacts that city dynamics can have on crime, so by combining four data sets there can be a deeper understanding of the geospatial crime environment. - ERP Simulation Game (ERPSIM) data: the game simulates the planning, procurement, production, and selling environment of a commodity and tracks the data at all levels, giving a robust data set to work with. Through the variety of tools and data sets, I have developed a preference for the SAP suite of products. SAP has been the most user-friendly in all aspects; from cross-tab development, to geospatial analysis, to data modelling, SAP offers something for everyone at every level of ability.
  • 5. 5 CHAPTER 1: MICROSOFT EXCEL PIVOT TABLES About Microsoft Excel Pivot Tables A lot of data starts from an Excel file and is then dumped into another tool for analysis, but it doesn’t have to be! Excel has a host of analysis tools right in the app that can be incredibly useful for pulling business insights. One tool in particular, the Pivot Table, allows you to slice and dice your data, creating a cross-tabulated structure that you can then manipulate by sorting, using filters and slicers, and ranking, to summarize your data. From the summary, you can create Pivot Charts to further explain your findings. Microsoft Excel Pivot Tables are extremely user friendly and allow you to retain all of your data within one application. Data Set and Research Questions Using Excel Pivot Tables, I will be analyzing transactional data from the fictitious company Global Bikes Inc. (GBI) obtained through a spreadsheet distributed in class. This includes details about products and accessories sold between 2007-2011 throughout the US and Germany. This was given to us in Excel format (.xls) and has 18 different variables to be analyzed (table below). The data was cleaned before being distributed. GBI Variables: Within this data, there are many levels that can be sliced and diced, which hopefully can be realized through the use of Pivot Tables. The flexibility of this data makes it interesting to analyze and may draw some unexpected outcomes due to the fact that it covers operations and revenues of a “global” company. The four questions I have answered through Pivot Table analysis include: i. In the year with the overall lowest revenue, which material generated the lowest revenue? ii. In the year with the highest Net Sales, which division had the highest Net Sales? In that division, what customer had the highest Net Sales? iii. In the year with the lowest Revenue, what division had the highest sales revenue? iv. What is the trend in Annual Net Sales by country? 
Material, Material Desc, Division, Customer Desc, Country, Country Desc, Calendar Year/Month, Calendar Year, Calendar Month, Net Sales, Revenue, Sales Quantity, Product Category, Customer, Sales Organisation, Sales Org Desc, Cost of Goods (M USD), Discount
  • 6. 6 Application of Microsoft Excel Pivot Tables In the year with the lowest revenue, which material generated the lowest revenue? After creating a Pivot Table with the initial GBI data, I added Calendar Year to Rows, and Revenue to Values in the Pivot Table Fields menu. This resulted in a cross-tabulation that displays total revenue by year (Figure 2). From this it can be determined that 2009 had the lowest revenue at a total of $52,610,815.06. From here, I drilled down by adding Material Description (Material Desc.) to Rows. From there, I expanded the 2009 list to see the complete list of materials, and sorted in ascending order. From Figure 1 we can derive that the Fixed Gear Bike Plus has the lowest revenue, with a total of $13,475.37. In the year with the highest Net Sales, which division had the highest Net Sales? In that division, which customer had the highest Net Sales? From the Pivot Table Fields menu, I added Calendar Year and Division to Rows, and Net Sales to Values. By first sorting the Net Sales in descending order, I determined that 2007 had the highest net sales (Figure 4) with a total of $58,786,492.15. By drilling down to Division in 2007, I could easily determine that the division of BI had the highest net sales at $58,293,362.01. Now that it’s determined that BI had the highest net sales in 2007, I had to add Customer further down in Rows (below Year and Division). At this point, I could choose to filter Year to 2007 and Division to BI, or just drill down within the cross-tab. I chose to drill down within the existing cross-tab, as seen in Figure 3. To determine the customer with the highest net sales, I sorted the sum of net sales in descending order. The customer at the top of the list is Bavaria Bikes with total net sales of $5,709,514.86. FIGURE 2 FIGURE 1 FIGURE 4 FIGURE 3
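The two-step drill-down above (total revenue per year, then revenue per material within the lowest year) can be sketched as a minimal stdlib Python equivalent. The rows and dollar figures below are illustrative placeholders, not values from the actual GBI file:

```python
from collections import defaultdict

# Hypothetical GBI transaction rows: (year, material_desc, revenue)
rows = [
    (2007, "Deluxe Touring Bike", 1200.0),
    (2009, "Fixed Gear Bike Plus", 150.0),
    (2009, "Off Road Helmet", 400.0),
    (2011, "Air Pump", 700.0),
]

# Step 1: total revenue per year -- the outer level of the Pivot Table
revenue_by_year = defaultdict(float)
for year, material, revenue in rows:
    revenue_by_year[year] += revenue
lowest_year = min(revenue_by_year, key=revenue_by_year.get)

# Step 2: drill down -- revenue per material within the lowest year
revenue_by_material = defaultdict(float)
for year, material, revenue in rows:
    if year == lowest_year:
        revenue_by_material[material] += revenue
lowest_material = min(revenue_by_material, key=revenue_by_material.get)

print(lowest_year, lowest_material)
```

A Pivot Table does exactly this grouping and aggregation interactively, with the drag-and-drop Rows/Values fields standing in for the two loops.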
  • 7. 7 In the year with the lowest Revenue, which division had the highest Sales Revenue? From the Pivot Table Fields menu, Calendar Year and Division have been added to Rows, and Revenue and Net Sales have been added to Values. I started by sorting the Revenue in ascending order to determine that 2009 was the year with the lowest revenue, with a value of $52,610,815.06. Drilling down from that menu and looking in the Net Sales column, I could easily determine that BI had the highest sales revenue in 2009, with a value of $50,555,395.70. What is the trend in Annual Net Sales by Country? Adding Calendar Year to Rows, Country to Columns, and Net Sales to Values in the Pivot Table Fields menu gives a very organized cross-tab of the data by Year and Country in terms of Total Net Sales (Figure 7). This is great if I needed to go in and pull quick data in terms of sales figures, but it’s not visually organized to show trends. What I can do with this data, though, is go into the Pivot Table Analyze menu and create a Pivot Chart. I chose to display the trends through a line chart (Figure 6), as it is the most efficient way to see a snapshot of Country Net Sales year over year, side by side. From Figure 6, it’s easy to see that in 2007 both the US and DE markets were generating similar sales values; in the following years DE increased their net sales to nearly $35,000,000.00, while the US dropped and has almost plateaued around $20,000,000.00. FIGURE 5 FIGURE 7 FIGURE 6 (line chart: Net Sales per Country, 2007-2011, DE vs. US)
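The Year-by-Country cross-tab behind the Pivot Chart can be sketched in plain Python. The sales values here are illustrative (in millions), not the actual GBI figures:

```python
from collections import defaultdict

# Hypothetical rows from the GBI data: (year, country, net_sales in $M)
sales = [
    (2007, "DE", 29.0), (2007, "US", 29.5),
    (2008, "DE", 31.0), (2008, "US", 24.0),
    (2009, "DE", 33.0), (2009, "US", 20.0),
]

# Cross-tab: years down the rows, countries across the columns
crosstab = defaultdict(lambda: defaultdict(float))
for year, country, value in sales:
    crosstab[year][country] += value

# One trend series per country, ordered by year -- what the line chart plots
years = sorted(crosstab)
de_trend = [crosstab[y]["DE"] for y in years]
us_trend = [crosstab[y]["US"] for y in years]
print(de_trend, us_trend)
```

With these placeholder numbers the DE series rises while the US series falls, mirroring the divergence visible in Figure 6.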
  • 8. 8 Analysis of Microsoft Excel Pivot Tables Excel Pivot Tables are notably the most accessible data analysis tool. Most computers have the Microsoft Office Suite, and included in that is Excel. First off, it’s very user friendly. Regardless of the amount of data, if you have clear and concise column titles, it’s very easy to navigate. You can make changes quickly and easily, and everything in the Pivot Table is labeled exactly like you see it in the menu. I would say it is only one step above navigating Excel for regular use, and those who use Excel regularly should have no problem picking up this useful tool. It can also be used offline, which is very convenient: as long as your data is loaded in, it can be manipulated from wherever. The cross-tabulations generated from the Pivot Table Fields menu are both instant and very well organized. You have the option of dragging and dropping fields into the proper section, and from there organizing them in the order that best suits your needs. It’s very customizable in the sense that you can filter your options at every level as you’re drilling down, so that you only see the exact data you are looking for. One drawback to the simplicity is that it can’t be taken much further than what you see. There are no geo-location options that load in from a third party, and it doesn’t offer interactive features; it just offers cross-tabulation and very simple, generic charts. Conclusion Pivot Tables are functional but basic. If you’re looking for a specific line of data and know your drill-down path, I think Pivot Tables will lead you exactly where you want to be. If you’re looking for a more visual interpretation, I think there are substantially better options that can take you much further than a bar or line chart.
  • 9. 9 CHAPTER 2: TABLEAU About Tableau Tableau is an incredibly powerful, full-spectrum data analysis tool. From preparing data to visualizing the final results, you can do all of it within the one program. With both a desktop application and servers for easy sharing, you can get the whole data experience from start to finish within Tableau. It is simply designed so that individuals can use it from the front end, but it has a powerful backend for customizable results if you go through the Python server and know how to program in Python. Tableau can use data pulled from a variety of sources, notably Excel, SAP, Amazon Web Services, and Salesforce (Lee, Week 6), and data can easily be combined from multiple sources within the program simply through common headers. Tableau has a sleek dashboard that allows an easy-to-use, drag-and-drop experience. Data Set and Research Questions The data set I’ll be using is global CO2 rates retrieved through the World Bank. This is a raw-data Excel spreadsheet with the CO2 rates per country by year (1960-2011). Cleaning the data is very simple within Tableau because it has a feature labelled “Data Interpreter” that will actually clean the data for you, and then allows you to review what it cleaned up and make adjustments after. This worked very well for my data specifically, and it was not necessary to make further adjustments. From here, though, to make one element easier, Tableau has an option to “Pivot” data. What I did with this feature was switch the dates from being listed horizontally to vertically, underneath their own columns. This changes nothing about the data itself, just how it’s presented. Some questions and observations I’ll be addressing through Tableau about the data include: i. How do the global CO2 emissions year-over-year trends compare between Canada and China? ii. Where are the highest CO2 emissions per capita concentrations globally? iii. 
What are the 10 highest CO2 emitting countries captured by this data set (By Country)? Top 10 per Capita?
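Tableau's "Pivot" feature, described above, reshapes the wide World Bank layout (one column per year) into long rows of (Country, Year, Value). A minimal stdlib sketch of that reshape, with illustrative country rows and values:

```python
# Hypothetical wide-format World Bank rows: one column per year
wide = [
    {"Country Name": "Canada", "1960": 52.0, "1961": 54.0},
    {"Country Name": "China",  "1960": 780.0, "1961": 552.0},
]

# "Pivot" in Tableau: each year column becomes its own (Year, Value) row
long_rows = []
for record in wide:
    country = record["Country Name"]
    for key, value in record.items():
        if key != "Country Name":
            long_rows.append({"Country Name": country,
                              "Year": int(key),
                              "CO2 Emissions": value})
print(len(long_rows))
```

Nothing about the data changes, only its shape: two wide rows with two year columns each become four long rows, which is what lets Year be dragged onto an axis as a single field.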
  • 10. 10 Applications of Tableau How do the global CO2 emissions year-over-year trends compare between Canada and China? Using the cleaned data set and a Line Chart with Year in the Columns and SUM(CO2 Emissions) in the Rows produces a huge and complicated-to-read visualization. I inserted Country Name into the filter, first selecting Canada (Figure 9) and then also selecting China to see both on the same chart (Figure 8). Note the drastic difference in scale between the two graphs. We can see the extreme incline in 2001 for China, while Canada has been steadily increasing at a substantially slower rate. There is a brief period between 1963-1968 where both countries are within ~500K of similar emissions, but then China's emissions increase coming into the 70s. There really is no comparison when it comes to these two countries' totals, as they have very different production environments and China has a much larger population. Where are the highest CO2 emissions per capita concentrations globally? Creating a Map, with Longitude in the Columns and Latitude in the Rows, Tableau has the capacity to create maps using geo-location data. By using Country Name in the Detail field and Avg. CO2 Emission per Capita as the Colour, I've created a gradient view of the world map by CO2 Emissions per Capita (Figure 10). By creating a colour gradient based on the Average CO2 Emissions per Capita, we can actually see in this map that North America has some of the highest CO2 emissions per capita globally. Compared to the previous question, where it appears that China is substantially worse than Canada, when bringing population into consideration, emissions are significantly higher per individual Canadian. FIGURE 10 FIGURE 9 FIGURE 8
  • 11. 11 What are the 10 highest CO2 emitting countries captured by this data set (by Country)? Top 10 per Capita? Creating a Bar Chart with Country Name as the Columns and Average CO2 as the Rows, I have a full view of every country captured within the data and its average emissions output. From here I went into the sort menu to show only the Top 10 countries by Average Emissions, sorting from highest to lowest (Figure 12). Here you can see the US has almost 2000K more on average than the second-highest emitting country, China. Canada falls behind in 8th place overall. Looking now per capita, the table is easily changed by altering the Rows from Average CO2 to AVG CO2 Per Capita (Figure 11). Interestingly enough, there is very little overlap in the two charts, with the exception of Canada and the US. Analysis of Tableau I feel like I only scratched the surface of what Tableau is capable of; it can create really intricate and detailed visualizations that still present as easy to read and aesthetically appealing. I really like the look and colour schemes within Tableau; they would be very eye catching to someone looking at the results for the first time. With that being said, it is almost like information overload. Every menu has so many options that it is almost too much if you don't know exactly what you're looking to create going into the analysis. After working with the tool a bit, I've found the surface-level details easy enough to use, although I know it could be used for much more complex designs that would take a lot more time and a much steeper learning curve. For my own capabilities and uses, I feel as though Tableau offers what I need, but there is much more that is out of my capacity (e.g., using Python) that I wouldn't even know where to begin with. Conclusion Tableau is a very interesting tool, although I don't feel it is the most user friendly. 
As someone with a bit of experience now working with different visualization tools, I found Tableau to be one of the more difficult ones to wrap my head around. After working with both SAP and Tableau, I think I prefer SAP. FIGURE 11 FIGURE 12
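The Top 10 ranking from this chapter is, underneath the sort menu, just a sort-and-truncate over per-country averages. A stdlib sketch with illustrative (not actual World Bank) averages in kt CO2:

```python
# Hypothetical per-country average emissions (kt CO2); values illustrative
avg_emissions = {
    "United States": 4_800_000, "China": 2_900_000, "Russia": 2_100_000,
    "Japan": 1_000_000, "Germany": 850_000, "India": 700_000,
    "United Kingdom": 560_000, "Canada": 420_000, "Italy": 390_000,
    "France": 380_000, "Australia": 280_000,
}

# Sort highest-to-lowest and keep the top 10 -- what Tableau's sort menu does
top10 = sorted(avg_emissions.items(), key=lambda kv: kv[1], reverse=True)[:10]
print([country for country, _ in top10])
```

Switching the ranking measure to per-capita averages, as in Figure 11, only changes the values fed into `sorted`, which is why the two charts can produce such different membership.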
  • 12. 12 CHAPTER 3: IBM COGNOS INSIGHT About IBM Cognos Insight IBM Cognos Insight is a desktop-based analytics tool built with the business user in mind. By that I mean the front end is very simple, but it is a powerful tool that can create actionable results without the need for an IT professional. It was created for uncovering inefficiencies and opportunities using Business Intelligence, giving quick overviews from the minute you upload your data. A variety of third-party data sources are supported. Simplicity should be kept in mind as I go through the different uses of this application. Data Set and Research Questions The data set being used within the IBM Cognos examples is a set of regional sales data (fictitious) that was provided in class. It includes 7 different product types sold within North America over 3 different sales channels. The questions I will be answering through analysis and visualization include: i. Which Sales Channel had the highest total revenue in Q3/2012? ii. Which customer type has the weakest margins? iii. Which Product Type within Entertainment Venues has the highest Margin %? Applications of IBM Cognos Insight Which Sales Channel had the highest total revenue in Q3/2012? To determine which of the 3 sales channels (Direct, Internet, or Retail) had the highest total revenue in Q3/2012, I set the Sales Channel as the Rows, Revenue as the Dimension, and the Quarters as the Columns. This gives an overview of all 4 quarters in 2012, but to be even more specific, within the Columns menu I can drill down to show specifically Q3/2012. In Figure 14, a simple bar chart is automatically generated when I make these changes to the dimensions, and we can see that Direct Sales has the highest revenue with a total of $2,180,912.00. IBM Cognos Insight also generates a table with the given data dimensions (Figure 13). FIGURE 14 FIGURE 13
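The filter-then-aggregate behind that first question can be sketched in stdlib Python. Only the Direct Sales total comes from the chapter; the other figures are illustrative placeholders:

```python
from collections import defaultdict

# Hypothetical regional sales rows: (quarter, channel, revenue).
# Only the Direct figure matches the chapter; the rest are made up.
orders = [
    ("Q3/2012", "Direct",   2_180_912.00),
    ("Q3/2012", "Internet", 1_450_000.00),
    ("Q3/2012", "Retail",   1_900_000.00),
    ("Q4/2012", "Direct",     900_000.00),
]

# Drill down to Q3/2012, then total revenue per sales channel
revenue_by_channel = defaultdict(float)
for quarter, channel, revenue in orders:
    if quarter == "Q3/2012":
        revenue_by_channel[channel] += revenue

top_channel = max(revenue_by_channel, key=revenue_by_channel.get)
print(top_channel)
```

Cognos performs the same restriction and sum when the Columns menu is drilled down to a single quarter; the bar chart is just this dictionary drawn as bars.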
  • 13. 13 Which customer type has the weakest margins? Margins are not provided directly within my data set, but they are easy to derive with IBM Cognos Insight. First off, the Rows will be defined by Customer Type, and the Columns by Sales - Region Measures. Within Sales - Region Measures there are three data points: Cost, Revenue, and Count. Count is irrelevant here, so I deleted that column. Now left with Cost and Revenue, the two factors of margins, I went into the column header "Calculate" menu and compared Revenue vs. Cost. Selecting this gives you a visual of which Customer Type has Good (Revenue is 10% more than Cost), Moderate (anything between Good and Weak), or Weak (Revenue is 90% of Cost) margins. As indicated in Figure 15, Retail has been flagged as having a weak margin score (which can easily be confirmed by looking at the Cost vs. Revenue in the table). Which Product Type within Entertainment Venues has the highest Margin %? To determine the exact margin %, I had to create a new column within the table for the calculation. This is very simple, as IBM Cognos Insight has a calculations menu that will give you a list of possible calculations when you have multiple columns selected. For this, I simply selected the Cost and Revenue columns (from the previous question) and right-clicked. Under the Calculation menu, I selected Cost/Revenue and then renamed it to Margin %. To get the actual percentage, I had to adjust the internal equation by adding "1 -" to the front of it, and then format the column to display all values as percentages. Drilling down to Customer Type: Entertainment Venues (Figure 16), every value has a good margin score, but the highest is Repairs with 66.29%. Analysis of IBM Cognos Insight After using IBM Cognos Insight, it is very evident that it is a dated software. With that being said, it still serves its purpose well. 
It is easy to navigate and produces the exact results that a businessperson would need fairly simply. It is not the easiest analysis tool that I've used, but it is laid out in a way that you can find what you're looking for by picking around in it with minimal consequences. The graphics are dated, and the tables are not as easy to manipulate as plain Excel sheets are. In terms of a visualization tool, I feel as though Cognos falls short in the ability to take plain data and transform it into insights beyond the obvious, which came through in my questions and answers. The charts are bland, the tables are no step above any average spreadsheet, and the result is uninspired answers. FIGURE 15 FIGURE 16
  • 14. 14 It is definitely a sufficient application, and works very well for business purposes, but there are more powerful, simpler, and prettier applications that serve the exact same purpose. Conclusion IBM Cognos Insight is a dated application, which is evident in appearances alone, but an effective tool nonetheless for basic business analysis.
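For reference, the two margin measures derived in this chapter reduce to one formula, Margin % = 1 - (Cost / Revenue), plus the Good/Moderate/Weak bands as described. A sketch with illustrative cost and revenue figures (not the actual regional sales data):

```python
def margin_band(cost: float, revenue: float) -> str:
    # Bands as described in the chapter: Good if Revenue is at least 10%
    # more than Cost, Weak if Revenue is at most 90% of Cost
    if revenue >= 1.10 * cost:
        return "Good"
    if revenue <= 0.90 * cost:
        return "Weak"
    return "Moderate"

# Hypothetical figures for one product type
cost, revenue = 1_250_000.0, 1_500_000.0
margin_pct = 1 - (cost / revenue)  # the "1 -" adjustment to Cost/Revenue
print(round(margin_pct * 100, 2), margin_band(cost, revenue))
```

Without the "1 -" prefix, Cost/Revenue would report the cost ratio rather than the margin, which is exactly the adjustment made inside Cognos's calculation editor.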
  • 15. 15 CHAPTER 4: SAP LUMIRA About SAP Lumira SAP Lumira is a self-service Business Intelligence (BI) tool used for business purposes. Data can be pulled from an Excel or CSV file and then visualized within the program. There are 3 key capabilities of SAP Lumira: i. Tell the story with self-service data visualization: Explore and analyze data online with a simple-to-use solution. Create stories with BI visualizations from all types of data that others can leverage, build on, and share. ii. Create analytics applications and dashboards: Develop interactive, mobile-ready dashboards and analytics applications to collaborate with users and their data stories and provide fingertip access to actionable insight. iii. Secure trusted access and scalability: Connect to data anytime, anywhere for deeper insights and informed decision making on the go. Explore data with filters, drill-down capabilities, and hierarchical navigation. (SAP, 2019) Data Set and Research Questions The data used in these examples is two files pulled from Stats Canada. One file is population data including age/sex. The second file is "Sales of Alcoholic Beverages" in Canada. Luckily, the data is downloadable in CSV format, and there was very little cleaning to do within the data. The trickiest portion was combining the two datasets to have them work together to uncover the answers to a few questions. The data sets could be cross-referenced by the common factor of Province, so they paired nicely. Another factor to consider, since I was working with alcoholic beverage statistics, was the legal drinking age of each province. I had to create an additional column, labelled Adult, which was applied appropriately to the people over 18/19 years old. I used SAP Lumira to visualize and compare the data; the visualizations include: i. Number of Adults of Legal Drinking Age per Province ii. Adults and Population by Year in Canada iii. 
Per Capita Sales by Year, Type of Beverage, Originating in Canada compared to Per Capita Sales by Year, Type of Beverage, Originating in Nova Scotia iv. Sales per Capita in Nova Scotia vs. Canada (Domestic & Imported) v. Provincial Per Capita Sales in 2007 vs. 2017
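The join described above (cross-referencing the two Stats Canada files on their common Province field, then dividing sales by the Adults measure) can be sketched in stdlib Python. Province names are real; the counts and dollar figures are illustrative placeholders:

```python
# Hypothetical extracts from the two Stats Canada files, keyed by the
# common Province field (values illustrative, not actual figures)
adults_by_province = {"Nova Scotia": 790_000, "Ontario": 11_600_000}
sales_by_province = {"Nova Scotia": 610_000_000.0, "Ontario": 8_900_000_000.0}

# Merge on the shared key, then compute per-adult sales -- the same
# pairing Lumira performs when the two data sets are linked on Province
per_capita = {
    province: sales_by_province[province] / adults_by_province[province]
    for province in adults_by_province
    if province in sales_by_province  # keep only provinces present in both
}
print(sorted(per_capita))
```

Using the Adults column as the denominator, rather than total population, is what makes the per-capita figures reflect the legal-drinking-age population discussed above.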
  • 16. 16 Applications of SAP Lumira What is the number of Adults of Legal Drinking Age per Province? Before getting in depth with the data, I wanted to break down the information to uncover how many people within each province were of legal drinking age. FIGURE 17 Using the "Adults" measure (the one created to count only members of the population of legal drinking age), with REF_DATE (Year) in the columns and GEO (Province) in the rows, creates a cross-tab. For the rest of the questions, the "Adults" measure will be used as the population. Is the number of adults increasing or decreasing within Canada? To visualize the cross-tab created in the previous question, I created a graph to analyze the trend of not only population growth, but growth in the number of people of legal drinking age within Canada year over year. From this graph, it is evident that there is constant upward growth. FIGURE 18
  • 17. 17 What are the Per Capita Sales Rates per Year by Beverage? What is the comparison between products originating in Canada vs. originating in Nova Scotia? For these charts, the Type of Beverage is in the Rows field and the Origin of Product is in the Columns field. Both graphs have been geographically filtered: Figure 19 is filtered to Canada, and Figure 20 is filtered to only Nova Scotia; the Type of Beverage has been filtered to solely Beer and Wine. Both nationally and in Nova Scotia, a few similarities can be found: 1. Canadian beer sells substantially better than imported beer 2. Imported wine sells better than Canadian wine 3. Beer sales are higher than wine sales The differences I found between the two include: 1. Nova Scotia beer sales are higher than the national average 2. There is an influx of Canadian wine being purchased in recent years in Nova Scotia (not nationally) 3. There is an influx of Canadian beer being purchased in recent years in Nova Scotia (not nationally) FIGURE 19 FIGURE 20
  • 18. 18 What do the per capita sales look like in Nova Scotia vs. the national per capita averages? Type of Beverage (Beer vs. Wine), Origin of Product (All Products), and GEO (Canada vs. Nova Scotia) were all factors within this chart. The chart lays out where Nova Scotia stands in terms of Sales per Capita in comparison to the national average Sales per Capita. How have provincial per capita sales changed from 2007 to 2017? I used a map instead of a regular chart here to add a more visual perspective. Blue represents Beer, Green represents Spirits, and Purple represents Wines. The size of each pie chart is directly correlated to the Per Capita Sales figures in each province. FIGURE 21 FIGURE 22 FIGURE 23
  • 19. 19 Analysis of SAP Lumira SAP Lumira is very intuitive and user-friendly. It is clean and the functions are very easy to follow. With that being said, I did not feel overly limited in its capabilities. First off, I found it very easy to go in and select the type of visualization I wanted. It then guides you through building out your visuals using not only text, but icons. The visualization appears as soon as you start to add in dimensions, and it is very visually appealing in its colour schemes as well. Second, the sheer range of abilities within the program is excellent. I can go from something as simple as creating a cross-tab, to creating geo-tagged maps, all within the same program. I think the diverse range that SAP Lumira offers is a great selling point. Third, the ability to import files directly from Excel is great. If I have a workbook that I've been working out of and need something more visually appealing to add into a paper or a presentation, being able to load in the document and manipulate it without having to go to any extra lengths is very useful. I think the most difficult part of the program is setting it up to manipulate multiple data sources within the same project, although I don't see this as a true limitation of SAP Lumira, as it is difficult to cross-reference files across a variety of platforms. Conclusion In conclusion, I think SAP Lumira is easy to use and visually appealing, while also capable of handling complex data from multiple sources. I personally love using Lumira as it allows me to be creative while still processing data and producing meaningful results.
  • 20. 20 CHAPTER 5: SAP LUMIRA FOR GEOSPATIAL ANALYSIS About SAP Lumira for Geospatial Analysis In the previous chapter, I introduced the concept of using SAP Lumira for basic geography-based analysis. There are a few reasons why I felt it was important to break geospatial analysis into its own category, primarily because over 80% of data has a connection to location (Geospatial and Graph Analysis, Lee, 2019), and as we gain more access to mobile-based data, location correlation will rise as well. By incorporating a geospatial component into data, you are adding an element that is very straightforward and easier to read for the average person. Looking at a map with the data plotted is substantially easier to make a connection with than seeing the data in cross-tab form. SAP Lumira can simply plot and quantify data based on inputted latitude and longitude coordinates. You are then able to leverage the visualization capabilities of SAP Lumira to layer your data and view location-targeted results that are easy to read and understand. Data Set and Research Questions The data set being used in this chapter is about crime rates in Calgary, AB. This data was procured from four sources: 1) "Calgary Police Statistical Reports" https://www.calgary.ca/cps/Pages/Statistics/Calgary-Police-statistical-reports.aspx 2) "Calgary Communities Locations" - INCLUDES LONGITUDE AND LATITUDE (GEOSPATIAL ANALYSIS OF CALGARY CRIME DATA IN RESIDENTIAL COMMUNITIES, Lee, 2019) 3) "Historical Calgary Community Populations" https://data.calgary.ca/Demographics/Historical-Calgary-Community-Populations/4mgk-hrwr 4) "Calgary Police Service Office Locations Map" https://data.calgary.ca/Health-and-Safety/Calgary-Police-Service-Office-Locations-Map/ehvy-b4t6 These data sources were all linked together using their own version of the "Community Name" column within the data files. 
The "Calgary Communities Locations" file includes the longitude and latitude data; therefore, merging the data files based on the community name will assign a geographic location to all the data. One notable fact about the statistical reports is that there is a significant outlier in the data, and that is "Social Disorder". This is something so common that it barely falls within the category of "crime"; rather, it is something that police respond to that isn't actually threatening to anyone or anything, more of a social annoyance for the surrounding communities. I felt this was irrelevant for some of the data analysis, as it skewed the data in a substantial way, so it was filtered out in certain cases.
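The merge described above, attaching coordinates from "Calgary Communities Locations" to crime records via the shared "Community Name" column, can be sketched in stdlib Python. Community names are real Calgary communities, but the counts and coordinates are illustrative:

```python
# Hypothetical crime records and a community-to-coordinates lookup;
# both share a "Community Name" key, as in the four source files
crime_counts = [
    {"Community Name": "Beltline", "Category": "Social Disorder", "Incidents": 412},
    {"Community Name": "Tuscany",  "Category": "Break & Enter",   "Incidents": 37},
]
locations = {
    "Beltline": (51.04, -114.07),
    "Tuscany":  (51.13, -114.23),
}

# The merge: each crime record gains a latitude/longitude, which is what
# lets a geospatial tool plot it on a map
geotagged = []
for record in crime_counts:
    lat, lon = locations[record["Community Name"]]
    geotagged.append({**record, "Latitude": lat, "Longitude": lon})
print(len(geotagged))
```

The same community-name key joins the population and police-office files, so every layer in the later visualizations can be anchored to the same map coordinates.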
  • 21. 21 The research questions I will be addressing include: 1) Were social disorder calls more common in the city centre or in the rural communities in 2015? 2) Has the quantity of commercial crimes increased or decreased between 2012-2017? a. What does the amount of Breaking and Entering crimes look like in comparison to Robberies? 3) Do the types of crimes committed change as the population size increases or decreases? (Social Disorder filtered out) 4) Is there a correlation between the number of crimes committed and the location of police service offices? Applications of SAP Lumira for Geospatial Analysis Were social disorder calls more common in the city centre or in the rural communities in 2015? I began with this question to determine whether it would be smart to filter out social disorder from further questions. If there is a substantially higher concentration in some regions vs. others, it may be interesting or valuable to keep it in moving forward. If the data is consistently concentrated throughout all of Calgary, it would not be relevant to determining specific crime rates in regions moving forward. This is a simple geospatial visualization that uses the data from the "Calgary Police Statistical Reports" merged with the "Calgary Communities Locations" data set. The Geo Dimensions are set to the community locations, with the Crime Category filtered to only "Social Disorder", and the Year filtered to 2015. Using the Choropleth Data Point Type, the colour is representative of the # of Incidents (as seen at the bottom of Figure 24). As Figure 24 visually represents, it appears as though social disorders are prevalent in all communities around Calgary, with more occurring in the city centre. Being the centre of the city, we can assume that the population is higher, making it not surprising to see a higher density of incidents there. FIGURE 24
Has the quantity of commercial crimes increased or decreased between 2012-2017? What does the amount of Breaking and Entering crimes look like in comparison to Robberies?

To begin, I had to filter the data down to what could be considered a “commercial crime”; I focused on Breaking and Entering and Robberies. To achieve this focus, I filtered out all other Crime Categories and then filtered each view according to the year. To get a more visual representation of the data, I used the Bubble Data Point Type with the number of incidents corresponding to bubble size. At first glance, the data appears almost the same across the board, but a closer look at the bottom left-hand corner shows that the scale changes drastically between 2012 and 2017. In 2012, the large bubble represents 177 incidents, whereas in 2017, the large bubble represents 364 incidents! Based solely on the visuals (without looking at the precise numerical data), I would say that crimes of this nature actually increased from 2012 to 2017. To compare the two crime types, I took the Bubble Data Point Type and applied the same filters. The difference between Figure 28 and the figures above is an additional colour filter, colour-coordinated based on Crime Category. Based on Figure 28, we can see that there are many more Breaking and Entering crimes on record than Robberies. FIGURE 26 FIGURE 27 FIGURE 25 FIGURE 28
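The year-over-year comparison of the two crime categories amounts to a cross-tabulation. A minimal pandas sketch, using illustrative rather than actual incident counts (only the 177 figure comes from the report above), might look like:

```python
import pandas as pd

# Illustrative rows; the real report holds per-community incident counts.
df = pd.DataFrame({
    "Year": [2012, 2012, 2017, 2017],
    "Crime Category": ["Break & Enter - Commercial", "Street Robbery",
                       "Break & Enter - Commercial", "Street Robbery"],
    "Incidents": [177, 40, 364, 55],
})

# Cross-tab years against the two "commercial crime" categories.
table = df.pivot_table(index="Year", columns="Crime Category",
                       values="Incidents", aggfunc="sum", fill_value=0)
print(table.loc[2017, "Break & Enter - Commercial"])  # 364
```

The resulting table is the numeric equivalent of the bubble grid: one cell per year/category pair, with the cell value driving bubble size.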
Do the types of crimes committed change as the population size increases or decreases? (Social Disorder filtered out)

This question seems simple, but it is actually a little more complex, as there are layers to this visualization. I joined in the third data set to add community population sizes, again linking through the community name. For the first layer, I applied the Bubble Data Point Type to get a visual based on population size per community. This is a simple layer with only one colour. For the second layer, I applied the Pie Chart Data Point Type and added a filter to eliminate social disorders (being in such high quantity, they would make the rest of the visualization harder to read if included). From here, I can see the weights of the crimes within the population bubbles. From a visual perspective, there does not appear to be an obvious correlation between any one type of crime and larger or smaller population communities.

Is there a correlation between the number of crimes committed and the location of police service offices?

The final question for this tool begins by adding in the fourth data set, the locations of the police service offices within Calgary, merged through the community names. This was another layered question: layer 1 is the total number of crimes (Social Disorder was not filtered out, as this question is not specific to the type of crime committed), using the Bubble Data Point Type, and layer 2 is the pinpointed geo-location of the police service offices. Based on this visual analysis, there does not appear to be any correlation between the two. Yes, there appears to be slightly less crime in the areas surrounding the offices, but there is also no office in the city centre, where, based on previous questions, crime seems much more prominent in general. FIGURE 29 FIGURE 30
Analysis of SAP Lumira for Geospatial Analysis

SAP Lumira is very user-friendly, and the Geospatial Analysis tools are no exception. It is a very clean and straightforward tool that allows the combination and layering of multiple data points from a variety of sources. There are different options for visualizing on the map, including pin points, pie charts, and the one I find most useful for the best overall picture, the Bubble Data Point Type. The one drawback I find in Lumira's geospatial abilities is that you need to incorporate a data set that includes longitude and latitude. This is not as convenient as working from a list of city names, provinces, and countries, which some other programs offer. Having used SAP Lumira in other contexts, I found it very easy to jump in and combine the different data sets. They all came from Excel or .csv files, making them easy for Lumira to read. SAP Lumira offers the complexity of detailed geospatial analytics in a way that is easy for the average individual to navigate and read.

Conclusion

SAP Lumira has proved once again to be an incredibly user-friendly tool, with not only the ability to take simple and complex data from a file to a visualization in very little time, but also the ability to introduce geospatial analysis in the simplest way. I would recommend SAP Lumira as the best tool for moderate to complex levels of data for anyone with a base-level knowledge of data analysis.
CHAPTER 6: SAP BUSINESSOBJECTS ANALYSIS FOR EXCEL

About SAP BusinessObjects Analysis for Microsoft Office

SAP BusinessObjects Analysis for Excel is a Microsoft Office add-in that allows multidimensional analysis of OLAP sources, MS Excel workbook application design, and creation of BI presentations in PowerPoint. I will be using examples from the MS Excel workbook application. Additional information includes:
• BEx Queries, BEx Query Views, and BW InfoProviders can all be used as the data source
• Data is displayed in the workbook as a complex crosstab
• The design panel is very similar to that of an Excel Pivot Table (“drag and drop” functionality)
• Can be used with the Visual Basic Editor
• Can only be used when downloaded on a local machine (Week 3, Lee, 2019)

Data Set and Research Questions

Using SAP BusinessObjects Analysis for Microsoft Office, I will be analyzing transactional data from the fictitious company Global Bike Inc. (GBI), obtained through the InfoProvider GBI InfoCube. This includes details about products and accessories sold between 2007-2011 throughout the US and Germany. It has 18 different variables to be analyzed (listed below). Using the BEx Query Designer, I was able to access trend data from within the InfoCube. Once that was saved, I opened the saved file through SAP BusinessObjects Analysis for Microsoft Excel and began manipulating the data. The questions and observations I have answered through this program include:
i. Basic navigation
ii. In the year with the lowest revenue, what division had the lowest revenue, and what customer had the highest revenue within that division?
iii. What are the historical revenue trends for the US and DE?

GBI Variables: Material, Material Desc, Division, Customer Desc, Country, Country Desc, Calendar Year/Month, Calendar Year, Calendar Month, Net Sales, Revenue, Sales Quantity, Product Category, Customer, Sales Organisation, Sales Org Desc, Cost of Goods M USD, Discount
Applications of SAP BusinessObjects Analysis for Excel

Basic Navigation

As a similar example has previously been done using Excel Pivot Tables, I wanted to highlight a basic overview of what SAP BusinessObjects Analysis for Excel looks like at first glance, as it is a little different. This is a page I created by adding Sales Quantity, Revenue, Discount, Net Sales, and Cost of Goods M USD as my Measures, and Calendar Year and Material as my Rows. It looks very similar to an Excel Pivot Table cross-tab, but that is where the similarity ends. SAP BusinessObjects Analysis for Excel goes substantially more in depth: this cross-tab is about as complex as Pivot Tables get, whereas it is just the starting page for SAP BusinessObjects Analysis for Excel. FIGURE 31
In the year with the lowest revenue, what division had the lowest revenue, and what customer had the highest revenue within that division?

To start, Revenue is the only relevant measure for this question, so everything else can be removed from the Measures. There are layers to this question, so to begin I drilled down by adding both Calendar Year and Division to the Rows area, in that order. Through the “more sort options” menu, I sorted the data in ascending order based on Revenue by Calendar Year. This puts 2009 at the top with the lowest revenue, at $52,610,815.24. From there I can look at the two divisions (if there were more, I would filter to 2009 and sort again) and determine that AS has the lowest revenue of the two, at $427,889.07. To then determine the customer with the highest revenue within the AS division, I had to drill down one more layer and add Customer to Rows. Because this adds many more layers to the data, I decided to filter everything first to 2009, and then to the division AS. I sorted the Revenue in descending order; Bavaria Bikes had the highest revenue in the AS division, at $38,216.79.

What are the historical revenue trends for the US and DE?

I started with Revenue as my Measure, and Country and Calendar Year as my Rows (Figure 35). This worked fine for analyzing a small amount of data in a cross-tab. To get a better understanding of the trends, however, I next needed to move Calendar Year from Rows to Measures. This allowed me to create a line chart (Figure 34). From there, it is very clearly laid out that although revenues in both countries were relatively similar in 2007, the US began a quick decline and nearly plateaued in 2009, whereas 2007-2009 was fairly stagnant for Germany, which then saw an increase from 2009-2011. FIGURE 32 FIGURE 33 FIGURE 35 FIGURE 34
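The same drill-down (lowest-revenue year, then lowest-revenue division within it, then top customer within that division) can be expressed outside the tool with pandas. The figures below are made up for illustration and are not the GBI values.

```python
import pandas as pd

# Hypothetical GBI-style transactions; field names mirror the query above.
df = pd.DataFrame({
    "Year": [2008, 2009, 2009, 2009, 2010],
    "Division": ["BI", "AS", "AS", "BI", "BI"],
    "Customer": ["A", "Bavaria Bikes", "C", "D", "E"],
    "Revenue": [90.0, 20.0, 2.0, 30.0, 80.0],
})

# Step 1: year with the lowest total revenue.
worst_year = df.groupby("Year")["Revenue"].sum().idxmin()

# Step 2: within that year, the division with the lowest revenue.
in_year = df[df["Year"] == worst_year]
worst_div = in_year.groupby("Division")["Revenue"].sum().idxmin()

# Step 3: within that division, the customer with the highest revenue.
in_div = in_year[in_year["Division"] == worst_div]
top_customer = in_div.groupby("Customer")["Revenue"].sum().idxmax()
print(worst_year, worst_div, top_customer)  # 2009 AS Bavaria Bikes
```

Each `groupby` plus `idxmin`/`idxmax` corresponds to one sort-and-read step performed in the Analysis for Excel crosstab.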
Analysis of SAP BusinessObjects Analysis for Excel

I think SAP BusinessObjects Analysis for Excel is significantly more complex than the tools in previous chapters and has taken time and effort to understand, but because of this, it can offer deeper insights and creates very clean, easy-to-read results. The best aspect of this program has been the ability to extract data from the InfoCube and manipulate the raw data to create the exact output I was looking for. The initial extraction was complex and took a deeper understanding, but once I was able to open the data in the Excel analysis tool, it became quite a bit easier to navigate because the layout is similar to what I am used to in Pivot Tables. The ability to drill down while keeping the layout simple makes it appealing to use. I would not use this as a go-to for everyday projects, as it requires more work and specific software downloads, so it is not as accessible as other programs. The design options are also lacking, much like Pivot Tables.

Conclusion

Although not incredibly difficult to use, there are still barriers to the average user. It is useful for large amounts of complex data, but I would not use it as my primary visualization tool.
CHAPTER 7: SAP ANALYTICS CLOUD

About SAP Analytics Cloud

SAP Analytics Cloud is a cloud-based analytics program that combines business intelligence, enterprise planning, and business analytics to prepare and model data (including creating plans, visuals, and predictions) from both within the SAP ecosystem and non-SAP sources. Being cloud-based software, it has the benefit of low implementation costs, low maintenance from a user's perspective, and security standards upheld by SAP. Another benefit of being cloud-based is that updates happen in real time; if you want to add features made available by SAP, you can access them online and add them as you need.

Data Set and Research Questions

Using SAP Analytics Cloud, I will be analyzing data retrieved through the ERP Simulation Game (ERPSIM). This game simulates the planning, procurement, production, and selling environment of a commodity (in this example, different kinds of muesli). The game runs over several rounds, with 12 products in the given market, and while the game is being played, the data is stored through a direct connection to the SAP HANA ERPSim system. This section uses data from a previously run simulation. For more information on ERPSim, visit www.erpsim.hec.ca (Lee, Week 5, 2019).

Before jumping into a thorough analysis, I had to prepare the data to get the most from my results. First, I had to ensure each column of data was properly labelled as a Measure (key information/figure/fact) or Dimension (reference information directly related to measures). The data will not model properly if it is not labelled correctly. For this data, the measures are Price, Quantity, and Revenue. The dimensions are Team, Day, Area, Distribution Channel, Sales, Order, and Product. Second, I enhanced the data. Some of the data columns are vague but, when edited, give a better picture of what is going on.
By selecting the Area column and choosing “Create a Transformation”, I transformed NO, SO, and WE to North, South, and West (respectively), and then, in the Distribution Channel column, transformed 10, 12, and 14 to Hypermarkets, Grocery Chains, and Convenience Stores. This allows the results to be apparent at a first glance of the visualizations, instead of having to go in later and determine what each acronym or number means. As a further enhancement, I created another column combining round and day, which allows the data to be compared in more depth at the day level. The questions I will be answering through analysis and visualization are as follows:
i. Which team brought in the highest revenue? What product had the highest revenue?
ii. What is the market share (in terms of revenue) of each team, by product?
iii. Were there products that did NOT sell in specific distribution channels?
iv. Which team sold the highest quantity of product, and what quantities were sold of each product?
v. On which days did individual teams not have any revenue throughout round 5?
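The recoding and the combined Round/Day column described above can be sketched in pandas. This is only an approximation of what SAC's “Create a Transformation” step does, with made-up rows for illustration.

```python
import pandas as pd

# Toy ERPSim-style rows; the real extract has many more columns.
df = pd.DataFrame({
    "Area": ["NO", "SO", "WE"],
    "Distribution Channel": [10, 12, 14],
    "Round": [5, 5, 5],
    "Day": [1, 2, 3],
})

# Recode the terse area codes and numeric channel IDs into readable labels.
df["Area"] = df["Area"].map({"NO": "North", "SO": "South", "WE": "West"})
df["Distribution Channel"] = df["Distribution Channel"].map(
    {10: "Hypermarkets", 12: "Grocery Chains", 14: "Convenience Stores"})

# The combined Round/Day column used for day-level comparisons.
df["Round/Day"] = df["Round"].astype(str) + "/" + df["Day"].astype(str)
print(df.loc[0, "Area"], df.loc[0, "Distribution Channel"])  # North Hypermarkets
```

A dictionary `map` is the simplest equivalent of a value-replacement transformation: unlisted codes would become missing values, which is itself a useful data-quality check.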
Application of SAP Analytics Cloud

Which team brought in the highest revenue? What product had the highest revenue?

This is a simple analysis once the data is cleaned and enhanced. Inserting a bar/column chart (a comparison visualization type), with Revenue as the Measure and Team as the Dimension, automatically generates the chart in Figure 37. By sorting the data in the chart from highest to lowest and updating the chart colours, it is easy to identify that team RR brought in the highest revenue, with $32,297,798.00. Using the same process to insert a bar/column chart, we can easily determine the product with the highest revenue by setting the Measure to Revenue and the Dimension to Product. Figure 38 shows that the 500g Nut Muesli is the product with the highest revenue, with a total of $24,445,956.53. FIGURE 36 FIGURE 37 FIGURE 38
What is the market share (in terms of revenue) of each team, by product?

Using a Stacked Bar chart from the comparison menu, I set the Measure to Revenue, the Dimension to Product, and the Colour to Team. The colour is an additional specifier within the stacked bar chart, as there are multiple pieces of data within each bar. In Figure 39, the chart has been sorted by total product revenue from lowest to highest for organizational purposes. Each team is assigned a colour, so it is easy to see at a glance which team did the best within each product category.

Were there products that did NOT sell in specific distribution channels?

Using a Heat Map from the distribution menu, it is very easy to find blank spots (if any) within the distribution channels by setting the Dimensions to Distribution Channel and Product, with Colour as Revenue. Based on the map in Figure 40, there are products that did not sell in both the Convenience Stores and the Hypermarkets distribution channels. FIGURE 39 FIGURE 40
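The heat map's “blank spots” correspond to channel/product pairs with zero revenue, which can be found programmatically by building a full grid and scanning for zeros. A minimal sketch with invented sales rows:

```python
import pandas as pd

# Stand-in sales rows; a missing (channel, product) pair means nothing sold there.
sales = pd.DataFrame({
    "Distribution Channel": ["Hypermarkets", "Grocery Chains", "Grocery Chains"],
    "Product": ["500g Nut Muesli", "500g Nut Muesli", "1kg Raisin Muesli"],
    "Revenue": [500.0, 300.0, 200.0],
})

# The heat-map equivalent: a full grid with zero wherever nothing sold.
grid = sales.pivot_table(index="Distribution Channel", columns="Product",
                         values="Revenue", aggfunc="sum", fill_value=0)

# Collapse the grid and keep only the empty cells.
unsold = grid.stack()
unsold = unsold[unsold == 0]
print(list(unsold.index))  # [('Hypermarkets', '1kg Raisin Muesli')]
```

`fill_value=0` is the key step: without it, the unsold combinations would simply be absent rather than visibly zero.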
Which team sold the highest quantity of product, and what quantities were sold of each product?

I used a Marimekko chart (a two-dimensional stacked chart) from the “more” menu, with Team as the Dimension, Quantity as Height, and Product as Colour. From this chart we can clearly see, by the height of each column, which team sold the highest quantity of product, while the colour blocks indicate the quantity sold of each product.

On which days did individual teams not have any revenue throughout round 5?

I used a heat map with the Dimensions set to Round/Day (the combined column created when enhancing the data, as mentioned previously) and Team, and Colour set to Revenue by Round/Day. This question needs an extra step, as it asks about a specific round: the data has to be drilled down to the second level (days within a specific round) and filtered to “5”. As can be seen in Figure 42, any blank area within the chart can be considered a day without revenue for that team. FIGURE 41 FIGURE 42
Analysis of SAP Analytics Cloud

SAP Analytics Cloud is one of the most visually appealing analytics tools I have used to date. Being cloud-based, it is accessible with a membership virtually anywhere with internet access. It is a very user-friendly tool: as long as your baseline data is clear, everything is labelled both with words and icons, so the average user should have little difficulty achieving basic results. It also lends itself to manipulating the data even further if you take the extra time to enhance it. The initial screen is laid out similarly to an Excel spreadsheet, making it familiar to most users. Changes can be made easily to the visualizations by dragging and dropping within the menu, as can colours and sizes. You can then filter to achieve the results you want with minimal effort. Another feature enabled by being cloud-based is that you can make use of geo-location as well. It is very customizable for a program that does not require any coding.

Conclusion

From my personal usage standpoint, I find SAP Analytics Cloud very easy to use, while also making modern and professional-looking visualizations and producing accurate results.
CHAPTER 8: SAP HANA DATA MODELING

About SAP HANA Data Modeling

Data Set

To demonstrate SAP HANA Data Modeling, I used a case study modeling a business scenario for the fictitious company Global Bike Inc. (GBI). In this scenario, GBI has acquired a new data warehouse solution based on SAP HANA and the SAP HANA Platform. It is being integrated into the Sales department, so a prototype of the model had to be created. Three .csv files were pre-loaded into the database, including sales data on customers, products, and sales transactions over the last five years.

Processes

Create Database Tables

The database tables are the foundation of the data modeling. I started by creating three database tables within my database schema. The files given were in .csv form, meaning I had to write code so that SAP HANA could understand what the information it was reading actually was, where it was coming from, and how to position it properly within a table. An example of this for the SALES data is as follows:

table.schemaName = "GBI_242";
table.tableType = COLUMNSTORE;
table.columns = [
  {name = "YEAR"; sqlType = INTEGER; },
  {name = "MONTH"; sqlType = INTEGER; },
  {name = "DAY"; sqlType = INTEGER; },
  {name = "CUSTOMER_NUMBER"; sqlType = NVARCHAR; length = 10; },
  {name = "ORDER_NUMBER"; sqlType = NVARCHAR; length = 10; },
  {name = "ORDER_ITEM"; sqlType = NVARCHAR; length = 3; },
  {name = "PRODUCT"; sqlType = NVARCHAR; length = 8; },
  {name = "SALES_QUANTITY"; sqlType = INTEGER; },
  {name = "UNIT_OF_MEASURE"; sqlType = NVARCHAR; length = 3; },
  {name = "REVENUE"; sqlType = DECIMAL; precision = 17; scale = 2; },
  {name = "CURRENCY"; sqlType = NVARCHAR; length = 3; },
  {name = "DISCOUNT"; sqlType = DECIMAL; precision = 17; scale = 2; }
];
table.primaryKey.pkcolumns = ["ORDER_NUMBER", "ORDER_ITEM"];

To break this down further:
- Name: The column title
- sqlType: The data type
- Length: The number of characters allowed in the column
- Precision: The number of significant digits
- Scale: The number of relevant decimal places
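For readers more familiar with plain SQL, the same table can be approximated with a standard CREATE TABLE statement. The sketch below runs against SQLite, so HANA-specific details (COLUMNSTORE storage, NVARCHAR lengths) are only approximated, not reproduced.

```python
import sqlite3

# Rough plain-SQL equivalent of the .hdbtable definition above (SQLite dialect;
# NVARCHAR lengths and the COLUMNSTORE table type are HANA-specific and omitted).
ddl = """
CREATE TABLE SALES (
    YEAR            INTEGER,
    MONTH           INTEGER,
    DAY             INTEGER,
    CUSTOMER_NUMBER TEXT,
    ORDER_NUMBER    TEXT,
    ORDER_ITEM      TEXT,
    PRODUCT         TEXT,
    SALES_QUANTITY  INTEGER,
    UNIT_OF_MEASURE TEXT,
    REVENUE         DECIMAL(17, 2),
    CURRENCY        TEXT,
    DISCOUNT        DECIMAL(17, 2),
    PRIMARY KEY (ORDER_NUMBER, ORDER_ITEM)
)
"""
con = sqlite3.connect(":memory:")
con.execute(ddl)

# Inspect the created table: 12 columns, starting with YEAR.
cols = [row[1] for row in con.execute("PRAGMA table_info(SALES)")]
print(len(cols), cols[0])  # 12 YEAR
```

The composite primary key on ORDER_NUMBER and ORDER_ITEM carries over directly, since that concept is the same in both dialects.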
Once executed correctly, this creates a database table formatted exactly as specified within the code.

Data Provisioning

Once the tables were created, I had to create a new remote source to host the virtual tables needed to move forward. This allows the tables to be used on their own. Once the remote source was established, the virtual tables could be created through a right click. I had to ensure that the VTs were included within my own schema, “GBI_242”. From here, the background information is established and I can begin the data association process through flowgraph models. Within the editor workbench in SAP HANA FIORI, I can create a flowgraph to associate the virtual tables (remote source data) with the data tables (blank tables with defined data formats). In Figure 43, you can see the initial flowgraph page. I have my Customer VT as the data source and the Customer data table as my data sink. This is an excellent visual that helped me make the connection between the different types of data, tables, and sourcing. To establish a definite connection between the two, I selected the data sink to open the Input/Output details screen (Figure 44). On this page, I drew connections between the relevant data sets and column headers, and then executed the flowgraph model to load in the customer data. This was then done for the product data and the sales data as well. FIGURE 43 FIGURE 44
Calculation View -> Dimensions

Two Dimension calculation views are needed for this data set: Customer (Dimension) and Product (Dimension). Using nodes, I extracted data from the data tables (Projection & Join):
- Join: Joins two source objects and passes on the result. In the example in Figure 45, I used a text join.
- Projection: Used to select columns, filter the data, and create additional columns (also used in Figure 45).
Primary keys were also defined within the Semantics of the calculation view.

Calculation View -> Star Join Data Cube

The process for creating the data cube is very similar to that of the Dimension calculations, but slightly simpler because it uses them as building blocks. The calculation views established as Dimensions are joined together in a “Star Join”. Relevant and necessary fields are then selected based on the scope of the project the data is needed for. Figure 46 shows the Semantics screen, where primary keys can be defined, data types can be set, and columns can be relabelled properly. FIGURE 45 FIGURE 46
Analysis of SAP HANA Data Modeling

SAP HANA Data Modeling is a very complex system, and even after fully completing the case analysis, I feel as though I have barely scratched the surface of this platform's offerings. The first two steps, creating the database tables and the data provisioning, were the most straightforward and the easiest to conceptually wrap my head around. They are very process-based, and errors are flagged clearly for quick correction before you move on to the next steps. The calculation views are difficult if you do not fully know or understand the scope of the project you will be working on, as the options you choose and the definitions you use change the view you have of the data and the results you can achieve moving forward. SAP HANA is a very complex program that would require a lot of training and a thorough understanding before using it to manipulate extensive data sets, but after doing this case study I feel I have a basic understanding of what SAP HANA is capable of, and where to focus my future learning to really understand how HANA works at a deeper level.

Conclusion

The SAP HANA Data Modeling process is very complex in its abilities, and it is not designed with a beginner user in mind. With practice, an understanding of SAP HANA itself, and proper research behind the data, it is a very powerful tool that can be used to the advantage of any analyst.
CHAPTER 9: SAP PREDICTIVE ANALYTICS: ASSOCIATION ANALYSIS

About SAP Predictive Analytics: Association Analysis

Association analysis allows us to find correlations in data that are not always apparent at surface level, by examining a group of transactions and deriving rules from them. The findings from these analyses can be used in business, for example, to promote and recommend items that often occur together. Association rules are written as antecedent -> consequent, an “if/then” format. The strength of a rule is determined by three factors:
- Support (S) (0 < S < 1): The percentage of transactions that follow the rule
  o S = P(A ∧ B)
- Confidence (C) (0 < C < 1): The probability that the consequent occurs given the antecedent
  o C = P(B|A)
- Lift (L): Is the association rule-based, or just a coincidence?
  o L = P(B|A) / P(B)
The association analysis can then be mined further using the A-priori algorithm, which:
- Trims out infrequent rules
- Identifies weaknesses within the model and can apply threshold settings for support, confidence, and lift (Week 9, Lee, 2019)

Data Set

The data set I will be using is the passenger data from the Titanic. Using Expert Analytics, I will be generating rules through an association analysis around the survivability of the passengers given different variables. (Jones, Kale, 2019)

Setting up the Model

Using the R-Apriori association algorithm (Figure 47), I set the Item columns to “Age”, “Class”, “Sex”, and “Survived”. These are the factors I am trying to find associations among. Setting support to 0.01 means a rule must be matched by at least 1% of the transactions; setting confidence to 0.8 means the consequent must follow the antecedent in at least 80% of the transactions containing the antecedent. FIGURE 47
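The three rule-strength measures can be computed directly from raw transactions. This toy sketch uses a handful of made-up Titanic-style records, not the actual data set, to show how support, confidence, and lift relate for one rule.

```python
# Compute support, confidence, and lift for a rule A -> B over toy transactions.
transactions = [
    {"adult", "female", "survived"},
    {"adult", "female", "survived"},
    {"adult", "male"},
    {"child", "male", "survived"},
]

A, B = {"female"}, {"survived"}  # rule: female -> survived
n = len(transactions)
p_a  = sum(A <= t for t in transactions) / n        # P(A)
p_b  = sum(B <= t for t in transactions) / n        # P(B)
p_ab = sum((A | B) <= t for t in transactions) / n  # P(A and B)

support = p_ab            # fraction of transactions matching the whole rule
confidence = p_ab / p_a   # P(B|A): how often B follows A
lift = confidence / p_b   # > 1 suggests a real association, not coincidence
print(support, confidence, round(lift, 3))  # 0.5 1.0 1.333
```

A lift above 1 (as here) means the antecedent genuinely raises the probability of the consequent; a lift near 1 would indicate the co-occurrence is what independence alone would predict.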
I am looking for survival rates, so isolating Survived to the right-hand side (the consequent) of the rules gives a better picture of which attributes are actually associated with survival (Figure 48).

Running the Association Analysis and Visualizing

The rules produced by the algorithm are shown in Figure 50. To set up the visualization, I converted Lift, Confidence, and Support to measures (Figure 49) to get a different perspective on the rules. I then created a bubble chart based on these three measures, with Confidence as the Y-axis, Support as the X-axis, and Lift as the bubble size, visualizing the dimensions within the rules. Colours are then added to differentiate the specific rules (Figure 51). With the rules laid out visually within the program, the exact coordinates can be seen by mousing over the individual bubbles. FIGURE 50 FIGURE 48 FIGURE 49 FIGURE 51
Within the rules, there are two that I feel are the strongest. From a visualization standpoint, these two stand out as large and above the rest on the chart. Upon further inspection, they have two of the highest confidence rates of all the rules.

Analysis of SAP Predictive Analytics: Association Analysis

SAP Predictive Analytics: Association Analysis is simple enough to use for even the most basic analytics user. There is an understanding of statistics the user needs beforehand, as not everything is explained within the program; there is an assumption of knowledge, but not at a complex level. One big flaw I found is that after adding constraints, it is difficult to switch back to the original rules the algorithm provided. SAP Predictive Analytics is an excellent tool all around for generating statistical findings, as you are able to predict, analyze, visualize, and then save your findings back into a .csv file, all within the one program. Because it is so straightforward, there is not much to analyze about the actual processes that was not evident in the previous section.

Conclusion

SAP Predictive Analytics: Association Analysis is a user-friendly program that adds a visual component to statistics that I had never considered before. FIGURE 53 FIGURE 52
REFERENCES

Jones, N., Kale, N. (2019). Chapter 5: Exercise 1, Using Excel Pivot Tables for Analytics [Case Study]. Retrieved from Brightspace.
Jones, N., Kale, N. (2019). Chapter 5: Exercise 2, Data Manipulation for Analytics [Case Study]. Retrieved from Brightspace. *Modified by Kyung Lee
Jones, N., Kale, N. (2019). Chapter 11: Exercise 1, Association Analysis [Case Study]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). SAP HANA Data Modeling Case Study Business Scenario [Case Study]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). Week 3: Data Modelling and Extraction Transformation Loading [PowerPoint slides]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). Week 5: Business Reporting and Performance Management [PowerPoint slides]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). Week 6: Data Visualization Basics (Chapter 7) [PowerPoint slides]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). Week 9: Data Mining & Predictive Analytics [PowerPoint slides]. Retrieved from Brightspace.
Lee, Kyung Y. (2019). Week 10: Big Data & In-Memory Analytics [PowerPoint slides]. Retrieved from Brightspace.
SAP Lumira Discovery (Benefits and Capabilities). (2019). https://www.sap.com/canada/products/lumira.htm