The document describes how TIBCO Spotfire's recommendation engine helps users quickly build dashboards and analyses. It provides two case studies: in the first, recommendations reduced dashboard setup time to 30 seconds; in the second, they helped identify factors influencing paper towel absorbency. Overall, recommendations dramatically speed data exploration and insight generation for both business users and analysts.
This document analyzes potential new markets for SPC Direct in South Carolina. It identifies counties that meet criteria for household income over $50,000 and insured populations over 40,000 by analyzing census data. These counties are overlaid with SPC Direct's current market areas. The analysis reveals potential in York County due to its high income, population of over 40,000 in Rock Hill, and insured population of over 200,000. Horry County also shows potential. The document recommends SPC Direct investigate expanding to York County.
This document provides instructions for completing iLab 8 activities in BIS 155. The activities include descriptive statistics, formatting, graphs, and regression analysis using temperature, marketing, income, and other sample data. Students are instructed to calculate descriptive statistics, create different graph types like bar charts and line graphs, perform regression analysis to examine relationships between variables, and sort data in various ways. The document emphasizes that while statistics are useful, they must be interpreted carefully and can be skewed depending on the questions asked and data collected.
This document provides an overview and instructions for using Tableau software for data visualization and analysis. It describes Tableau as a tool for simplifying data into understandable formats via dashboards and worksheets. Steps are outlined for connecting a CSV file on demographic data to Tableau, creating a map visualization showing populations by state in India, and differences between live and extract connections. Basic concepts like dimensions, measures, and different methods for creating visualizations through drag and drop or double clicking are also summarized.
The document provides information on various data types, connecting to data sources in Tableau, an assignment objective to analyze sales and shipping data, and how to change data types in a data source or view. It also covers visual design basics like elements, principles, and use of color in design.
Data visualization 101: how to design charts and graphs (Atner Yegorov)
This document provides guidance on designing effective data visualizations. It discusses different types of charts and graphs such as bar charts, pie charts, line charts, area charts, scatter plots, bubble charts and heat maps. It explains how to identify the key story or relationship in the data to determine the best visualization method. The document also provides best practices for designing each type of visualization to ensure the data is clearly presented and easy to understand.
The document summarizes analysis of labor data from the Bureau of Labor Statistics for Iowa from 2006-2015. Key points:
- Data was downloaded, cleaned, filtered to only include Iowa data, and concatenated into a single dataset for analysis.
- Occupations were clustered and categorized into 3 groups: Professional, Manual Labor, and Personal Services. Geographical areas were also grouped.
- Analysis found higher salaries in college towns and metro areas for professionals, with salaries exceeding $60k recently. Median incomes were highest for professionals, followed by manual labor, and lowest for personal services statewide.
- Salary distributions were also explored; the gap between lower and higher salaries was found to be growing for most areas and occupations.
Statistics can be used to describe patterns but need context to avoid being misleading. While averages, measures of spread, and probabilities help summarize data, graphs are better to show trends over time. Pie charts, bar graphs, and maps can effectively visualize data geographically or by category when formatted properly. Advanced statistical software and websites provide cutting-edge tools for analysis and interactive graphics but improper use can result in poor statistical reasoning.
Data visualization: data sources, data types, visual design (ManokamnaKochar1)
The document discusses various data types including string values, number values, date values, boolean values, and geographic values. It then provides information on connecting to data in Tableau and selecting the "Orders" sheet from the "Sample - Superstore.xls" excel file. The next section provides an objective and instructions for an assignment involving sales data analysis and deriving meaningful insights. Basic visual design principles such as hierarchy, balance, contrast, scale, and dominance/emphasis are then defined in 2-3 sentences each.
The document provides instructions for creating a census map by downloading census data from the US Census website, organizing it into a database file that can be joined to a census tract boundary shapefile, and defining the projection in order to map median household income by census tract. Key steps include selecting census variables of interest, converting the downloaded Excel file to a DBF format, downloading and defining the projection of a census tract boundary shapefile, and using a common identifier to join the census data to the tract boundaries.
This document provides an overview of data types in Tableau including string, number, date, date and time, boolean, and geographic values. It describes how to connect to data in Tableau by selecting a file or server data source. It also discusses changing the data type of fields from the data pane or in a view and explains that dimensions contain qualitative values that affect detail level while measures contain quantitative values that can be aggregated. Finally, it lists some portals for finding open data sets.
Creating visual representations of time period data (QHRClinicalOps)
This presentation walks through charting in Microsoft Excel. It builds on the previous "Summary Statistics" presentation to show how to take that information to the next level with charts.
The document discusses various features and functions of the Insert tab in Microsoft Excel, including how to insert tables, illustrations, charts, reports, and other objects. It provides details on different types of charts like column, bar, and line charts that can be created in Excel and how to modify chart properties. The document also summarizes steps for creating, editing, moving, deleting, and formatting charts in Excel worksheets.
Pivot tables allow users to summarize and analyze data in Excel by aggregating and reorganizing the data into a new format determined by the user. The document provides a step-by-step tutorial on how to create a pivot table using sample voter data. Key steps include selecting the data range, inserting a pivot table on a new worksheet, and dragging fields from the pivot table field list to rows, columns, and values areas to choose how the data should be organized and summarized. Advanced techniques like filtering, moving fields, and customizing pivot table options are also demonstrated.
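For readers working outside Excel, the same row/column/values field placement described above can be sketched in a few lines of Python using only the standard library. The precinct and party fields below are invented sample voter data, not taken from the tutorial:

```python
from collections import defaultdict

# Hypothetical voter records; field names are illustrative only
rows = [
    {"precinct": "A", "party": "Dem"},
    {"precinct": "A", "party": "Rep"},
    {"precinct": "B", "party": "Dem"},
    {"precinct": "B", "party": "Rep"},
    {"precinct": "B", "party": "Dem"},
]

# Rows area -> precinct, columns area -> party, values area -> count,
# mirroring the drag-and-drop field placement described above
pivot = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivot[r["precinct"]][r["party"]] += 1

for precinct in sorted(pivot):
    print(precinct, dict(pivot[precinct]))
```

The nested dictionary plays the role of the pivot cache: the outer key is the row label, the inner key the column label, and the accumulated integer the aggregated value.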
Exploring COVID-19 data for Toronto, Canada: sta3031002 data (oreo10)
This document provides instructions for an assignment exploring COVID-19 data for Toronto, Canada. It outlines 3 tasks: 1) creating a bar chart of daily active, recovered, and deceased COVID cases, 2) creating a stacked bar chart of cases by outbreak type and week, and 3) analyzing COVID data and 2016 census data by Toronto neighborhood. The assignment is due February 12, 2021 and details are provided on data sources, submission instructions, and grading policies.
Grading sheet: Major Assignment 2 (oreo10)
This document provides a grading sheet for a savings and loan analysis assignment. It lists the competencies and requirements students must meet to receive full credit. This includes correctly formatting interest rates, costs, tables, and formulas to calculate savings and loan amounts over various time periods. Students are asked to analyze potential savings from energy improvements by calculating projected costs savings over 5, 10, and 15 years and comparing this to loan payments to fund the improvements.
Data analysis and Data Visualization using Microsoft Excel (Frehiwot Mulugeta)
The document provides an overview of data analysis and visualization using Microsoft Excel. It discusses summarizing data using functions like COUNTIF, COUNTIFS, and SUMIF. It also covers creating pivot tables, adding filters and slicers, formatting pivot tables, and creating pivot charts. The objective is to teach participants how to summarize, analyze, and visualize data in Excel to extract patterns and trends.
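The worksheet functions named above have direct counterparts in most programming languages. As a rough Python sketch of what COUNTIF and SUMIF compute over a list of records (the sales figures below are invented for illustration):

```python
# Invented sample: sales records with a region criterion column
sales = [
    {"region": "East", "amount": 120},
    {"region": "West", "amount": 80},
    {"region": "East", "amount": 200},
]

# COUNTIF(region_range, "East") — count the rows matching a criterion
count_east = sum(1 for s in sales if s["region"] == "East")

# SUMIF(region_range, "East", amount_range) — sum a column over matching rows
sum_east = sum(s["amount"] for s in sales if s["region"] == "East")

print(count_east, sum_east)  # 2 320
```

COUNTIFS and SUMIFS simply extend the filter with additional `and` conditions in the generator expression.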
1. The document provides step-by-step instructions for editing an Excel database and dashboard used to track performance metrics over time.
2. It describes how to save the file with the new year, clear old data, enter new dates and objectives for the upcoming year, and save the changes.
3. Subsequent sections explain how to enter new monthly data, update the dashboard, graphs, and metrics as new information becomes available each reporting cycle.
This document provides an overview of data analysis and visualization using Microsoft Excel. It covers summarizing data using functions like COUNTIF, sorting and filtering data, creating pivot tables, adding filters and slicers to pivot tables, formatting pivot tables, and creating pivot charts. The objective is to help users understand how to extract insights from data through summarization, aggregation, and visualization techniques in Excel.
How to use SPSS (Statistical Package for the Social Sciences). This software program is used extensively for social-science data analysis, though managers, scholars, and engineers rely on it as well. This document explains, step by step, how to use SPSS for data analysis.
This document discusses six common types of charts used in business: column chart, stacked bar chart, line chart, XY scatter plot, pie chart, and exploded pie chart. It defines each chart and provides examples to illustrate the type of data each chart is best suited to display. The column chart compares groups of data. The stacked bar chart shows the contribution of parts to a whole. The line chart indicates trends over time. The XY scatter plot shows correlations between two variables. The pie chart displays the percentage of parts in a whole. The exploded pie chart emphasizes portions of a pie chart.
This document provides an overview of data visualization techniques. It discusses the uses of data visualization such as identifying errors and highlighting relationships in data. It also covers different types of charts (e.g. line charts, bar charts, pie charts) and tables as well as principles for effective data visualization like maximizing the data-ink ratio. Advanced techniques like parallel coordinate plots, treemaps and geographic information systems (GIS) charts are also introduced. Finally, the document discusses data dashboards and principles for effective dashboard design.
Effective Data Viewing in Sale Trend Analysis by using Data Cube (ijtsrd)
Most retailers wish they knew more about their sales and their customers' buying habits. They want to know the right levers to push and pull to increase sales and customer satisfaction, but many obstacles keep them from these insights: their data may be split between disconnected systems, or they may be dealing with legacy systems that can't keep up, and converging sales from different online and offline channels for analysis is complicated. Centralizing the data and using a data cube benefits the business: sales trends can be viewed by product type, region, or time period, and inventory levels for each product are easy to see. The proposed system discusses what capabilities are needed to perform sales trend analysis and how a data cube supports viewing the data effectively. Myint Myint Yee | San San New | Myat Mon Kyaw, "Effective Data Viewing in Sale Trend Analysis by using Data Cube," International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 3, Issue 5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd27836.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/27836/effective-data-viewing-in-sale-trend-analysis-by-using-data-cube/myint-myint-yee
This document provides guidance on creating pivot tables in Excel to analyze multi-dimensional data. It discusses preparing the data source by ensuring it is in a single dimension without blank cells. It also recommends designing the desired report output before creating the pivot table. The document provides an example of source data structured for analysis by date, color, and quantity dimensions. It demonstrates how this data can be analyzed in a pivot table with hierarchies for rows and columns, including subtotals by month and year. Keyboard shortcuts for quickly building pivot tables are also outlined.
Analyzing data using applications in MS Excel (Shohag Das)
This document provides an overview of different data analysis tools in Microsoft Excel including pivot tables, data tables, scenario manager, goal seek, and VLOOKUP. Pivot tables allow users to summarize and analyze data by sorting and filtering, data tables enable analyzing data with multiple changing variables or conditions simultaneously, and scenario manager facilitates comparing outcomes under different scenarios. Goal seek identifies the input needed to achieve a desired result, and VLOOKUP looks up values and references from a table horizontally or vertically. All of these tools help save time and make interactive data analysis and decision making more efficient.
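Goal Seek, mentioned above, is essentially a numeric root search: Excel varies an input cell until a formula cell hits a target value. A minimal bisection sketch of that idea (the compound-growth example is generic, not from the document, and assumes the function is monotonic on the bracket):

```python
def goal_seek(f, target, lo, hi, tol=1e-6):
    """Find x in [lo, hi] with f(x) ~= target, assuming f is monotonic."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # If the target is bracketed by [lo, mid], shrink from the right
        if (f(mid) - target) * (f(lo) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Example: what per-period growth rate doubles a value in 10 periods?
rate = goal_seek(lambda r: (1 + r) ** 10, target=2.0, lo=0.0, hi=1.0)
print(round(rate, 4))  # ~0.0718
```

Excel's own Goal Seek uses a similar iterative search under the hood, which is why it only finds one solution and needs a reasonable starting point.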
This document contains summaries of various Excel data analysis techniques including: text to columns to separate cell content into different columns based on delimiters; pivot tables to analyze data from different perspectives; scatter plots and linear regression to determine relationships between variables; installing the Analysis ToolPak add-in; descriptive statistics like measures of central tendency, variance, skewness, and kurtosis; correlation to examine relationships between variables; and regression to make predictions by modeling relationships between independent and dependent variables with lines and equations.
This document provides information about finding and using local statistical data. It discusses why local statistics are important, the types of statistical information available including census data, benefits claimant rates, and indices of deprivation. It then provides step-by-step instructions on how to access and present this data using Neighbourhood Statistics, Nomisweb, and Deprivation Mapper. Key details covered include different geographic scales, downloading data to Excel to create graphs and maps, and using the tools to highlight issues in an area.
What is my neighbourhood like: Data collecting (Amarni Wood)
When developing your First Steps plan (and when applying to other funders) it is important to have good evidence of what your area is really like. Statistical information collected by various public bodies can be an excellent way of doing this.
This guidance provides information on: Why statistical data about your local area is important, what statistical information is available for public use, and how to find & present data about your local area.
This is a presentation I gave on Data Visualization at a General Assembly event in Singapore on January 22, 2016. It provides a brief history of dataviz as well as examples of common chart and visualization formatting mistakes that you should never make.
Graphs represent data in an engaging manner and make c.docxshericehewat
Graphs represent data in an engaging manner and make comparisons and analyses easier. For example, a graph depicting the number of crimes committed each year over a decade is easier to comprehend visually than reading the numerical values for each year. Before creating a graph, however, it is important to choose one that appropriately represents the data. A histogram, rather than a pie chart, is appropriate for depicting the age groups (e.g., 15–24, 25–34) of murder victims in a city. Histograms are designed to be used with variables that are categorized, but pie charts plot each value. Therefore, it would be easier to read a histogram showing bars for age groups of murder victims than a pie chart in which every single age would have to be plotted. In the past, creating graphs was cumbersome and time consuming, but present-day software programs such as Microsoft Word and Excel provide tutorials that walk you through the process. With knowledge of these software programs, you can create customized charts and figures to represent your research data in visually interesting ways. In this Assignment, you create at least two different graphs in Excel or Word that can be used to illustrate hypothetical data related to six incidents of crime.
· Create at least two different graphs in Excel or Word using the data provided in the table below:
Type of Crime
Offender’s Age
(Years)
Offender’s Gender
Time of the Incident
Theft
22
Male
Early morning
Possession of drugs
21
Female
Late evening
Theft
19
Male
Late evening
Theft
33
Female
Afternoon
Possession of drugs
47
Female
Morning
Possession of drugs
17
Male
Early morning
· Briefly describe the data represented in the graphs and/or charts you created.
· Explain why the graphs and/or charts you created best represent the data compared to other options. Be specific.
Submit the graphs you created in a document that is separate from your written Assignment.
Bachman, R. D., & Schutt, R. K. (2019). The practice of research in criminology and criminal justice (7th ed.). Thousand Oaks, CA: SAGE Publications.
· Chapter 4, “Conceptualization and Measurement” (pp. 86–116)
The Practice of Research in Criminology and Criminal Justice, 7th Edition by Bachman, R. D. & Schutt, R. K. Copyright 2019 by SAGE Publications, Inc. Reprinted by permission of SAGE Publications, Inc via the Copyright Clearance Center.
Bachman, R. D., & Schutt, R. K. (2019). The practice of research in criminology and criminal justice (7th ed.). Thousand Oaks, CA: SAGE Publications.
· Chapter 14, “Analyzing Quantitative Data” (pp. 404–415 and 426–444)
The Practice of Research in Criminology and Criminal Justice, 7th Edition by Bachman, R. D. & Schutt, R. K. Copyright 2019 by SAGE Publications, Inc. Reprinted by permission of SAGE Publications, Inc via the Copyright Clearance Center.
Trochim, W. M. K. (2006). Levels of measurement. In Research methods knowledge base. Retrieved from http://www.socialresearchmethods.net/kb/measlevl.php
Walden Univer ...
Examples of Spotfire Recommendations in Action
Easy dashboard setup for business users, dramatically faster creation of full-featured data analysis applications for analysts
The agile business intelligence market is growing rapidly, and as Gartner points out, the transition is toward platforms that can be rapidly implemented and used by analysts and business users to find insights quickly, as well as by IT staff to quickly build analytics content to meet business requirements and deliver more timely business benefits.[1] This drive for speed is about business value: accuracy and speed of interpretation for decision-making, authoring, and development of data discovery applications, and task completion to enable developers to implement their ideas quickly and obtain accurate insights.[2]
This paper describes a recommendation engine for the TIBCO Spotfire®
interactive graphical analysis system. Spotfire Recommendations makes data
discovery fast and easy for both analysts and business users. The system uses
metadata typing and built-in graphics taxonomy to produce a collection of
inherently sensible graphics choices applied to the data at hand. The user chooses
from the suggestions, and the software builds a dashboard of linked, brushable,
configurable graphics with supporting data filters and graphics controls that can
be rapidly applied to the canvas.
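The mapping from column metadata types to sensible chart suggestions can be illustrated with a small rule-based sketch. The type names and the chart taxonomy below are illustrative assumptions for this paper's examples, not Spotfire's actual implementation or API:

```python
# Illustrative rule-based visualization recommender. The metadata type
# names ("numeric", "categorical", "time", "geographic") and the chart
# taxonomy are assumptions, not Spotfire internals.

def recommend(columns):
    """Suggest chart types from the metadata types of selected columns.

    `columns` maps column name -> one of "numeric", "categorical",
    "time", "geographic".
    """
    types = sorted(columns.values())
    suggestions = []
    if types == ["numeric"]:
        # A single numeric column: show its distribution.
        suggestions += ["histogram", "density plot", "table"]
    if "geographic" in types and "numeric" in types:
        suggestions += ["map", "bar chart", "treemap"]
    if "time" in types and "numeric" in types:
        suggestions += ["line plot", "bar chart trellised by time"]
    if types.count("numeric") >= 2:
        suggestions += ["scatter plot", "parallel coordinates"]
    return suggestions or ["table"]

print(recommend({"Absorbency": "numeric"}))
print(recommend({"State": "geographic", "Homeless": "numeric"}))
```

Selecting a lone numeric column yields distribution-oriented charts, while adding a geographic column shifts the suggestions toward maps and treemaps, mirroring the behavior shown in the case studies below.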
1 Rita L. Sallam, Bill Hostmann, Kurt Schlegel, Joao Tapadinhas, Josh Parenteau, Thomas W. Oestreich. Magic Quadrant for Business Intelligence and Analytics Platforms, Gartner. February 23, 2015.
2 Stephen Few. Information Dashboard Design, Analytics Press, CA. 2015.
TIME IS OF THE ESSENCE
“With a dashboard, every unnecessary piece of information results in time wasted trying to filter out what’s important, which is intolerable when time is of the essence.”²
WHITEPAPER | 2
For business users, Recommendations reduces the burden for initial setup of the
dashboard, and for analysts, it dramatically speeds the creation of full-featured
data analysis applications.
Following are two case studies showing Recommendations applied to datasets
for consumer packaged goods manufacturing and for homeless populations in
the United States.
CASE STUDY: GEOLOCATION ANALYSIS OF US HOMELESS
The Department of Housing and Urban Development (HUD) collects data
on homelessness in the US and releases two annual reports to Congress:
the Annual Homelessness Assessment Report (AHAR), Parts 1³ and 2⁴. Part 1
contains information from the annual point-in-time counts (PIT) conducted by
communities nationwide on a single night in January. Part 2 includes information
obtained from homeless shelters throughout the course of a calendar year, the
Homeless Inventory Count (HIC). In March 2015, HUD released the 2013 AHAR
Part 2; Part 1 was released in October 2014.
Raw data is available online at data.hud.gov. We obtained PIT and HIC data
for 2007–2013. Estimates of homeless veterans are included beginning in 2011.
HUD partners with the Veterans Administration on the Veterans Homelessness
Prevention Demonstration Program.
The Housing Inventory Count and point-in-time data are yearly measures across
~400 spatial regions in the US using HUD’s Continuums of Care (CoC) regions. A
shape file describing these regions is available at https://www.hudexchange.info/coc/gis-tools/.
Key variables in the beds data (HIC [Housing Inventory Count]) are: Shelter
Type (ES [Emergency Shelter], TH [Transitional Housing], RRH [Rapid
Re-Housing], SH [Safe Haven], PSH [Permanent Supportive Housing]) and
Household Type (with children, without children, with only children). Key
variables in the homeless data (PIT) are: Shelter Access (Sheltered,
Unsheltered) and Family Situation (Individuals, Persons in Families).
We also included US Census data in the analysis for the period 2010–2013,
including counts by state, county, and age group of total population, total male
population, total female population, and male and female populations broken
down by race.
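One common reason to combine the HUD counts with census population data is to normalize homeless counts into rates. A minimal sketch of that join, using placeholder figures rather than the actual HUD or census values:

```python
# Sketch: combining point-in-time (PIT) homeless counts with census
# population by state to get a per-10,000 rate. The state keys and all
# numbers are placeholders for illustration, not the real data from
# data.hud.gov or the US Census.

pit = {"MA": 19029, "CA": 136826}                  # PIT homeless counts
population = {"MA": 6_692_824, "CA": 38_332_521}   # census totals

rate_per_10k = {
    state: 10_000 * pit[state] / population[state]
    for state in pit
}
for state, rate in sorted(rate_per_10k.items()):
    print(f"{state}: {rate:.1f} homeless per 10,000 residents")
```

Normalized rates such as these make states of very different sizes directly comparable on a map or treemap.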
3 https://www.hudexchange.info/resources/documents/2014-AHAR-Part1.pdf
4 https://www.hudexchange.info/onecpd/assets/File/2013-AHAR-Part-2.pdf
DATA PANEL
The analysis begins by loading the data, which results in data column names
being organized in a data panel (Figure 1). To start the analysis, the user clicks the
Recommendations icon and selects one or more columns of interest.
Figure 1. Spotfire open with data panel on the left showing homeless data. The
user clicks the Recommended visualizations icon in the center of the canvas to
start the analysis.
The numerical columns in this data panel are the homeless counts (PIT) and beds
(HIC) by year. Other data includes a time variable (YearDate), location (by state
and county) and categorical data relating to the Continuum of Care.
We select homeless and state data initially. Spotfire Recommendations
suggests some maps, a bar chart, and a treemap of homeless by state (Figure 2).
We add a map and treemap by state to the canvas.
Figure 2. Recommendations panel for US homeless data, state selected.
HOMELESS, BEDS, AND YEAR
We next add beds and year. Recommendations responds by suggesting cross
tables, a bar chart trellised by year, and a parallel coordinates plot (Figure 3). We
choose the cross table (lower right) to add to the canvas.
Figure 3. Recommendations panel for US homeless data, state, beds, and
year selected.
HOMELESS BY STATE
We now have a map of homeless by state, a treemap by state, and a cross table
of homeless and beds by state. Recommendations has linked and arranged these
three graphs on the canvas. With a few more mouse clicks to configure the
graphs, we have an accurate, interactive summary of homeless in the US (Figure
4). Creating this initial dashboard took approximately 30 seconds.
Figure 4. Dashboard showing map of homeless and utilization of shelters by state.
HOMELESS SHELTERS
Using this dashboard as a starting point, we are now able to build a
comprehensive analysis of homeless across the US. This enables us to assess
whether there are enough shelters for the homeless on an ongoing, regional basis.
Figure 5 shows such a dashboard including a map of homeless utilization by
state, trends of homeless and available beds, beds by shelter type, top states for
bed utilization, and tables of homeless and bed utilization by CoC. The dashboard
addresses the question: Do we have enough shelters for the homeless? Relevant KPIs
are shown across the top and visualizations are arranged for easy interpretation.
Figure 5. Completed dashboard providing a detailed analysis of homeless in the
US during 2007–2013.
The dashboard in Figure 5 is rapidly assembled from that shown in Figure 4. We
calculated bed utilization, configured colors on the map and bar chart, and added
the trend charts and a slider for years at the top.
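The bed-utilization measure added to the dashboard is a simple ratio. A minimal sketch, assuming utilization is defined as sheltered homeless divided by available beds (both the definition and the sample figures are assumptions for illustration):

```python
# Sketch of the bed-utilization calculation behind the dashboard.
# Utilization here is assumed to be sheltered homeless / available
# beds; the state figures are fabricated for illustration.

def utilization(sheltered, beds):
    """Fraction of available beds in use; NaN when no beds exist."""
    return sheltered / beds if beds else float("nan")

by_state = {"MA": (16_000, 17_500), "CA": (46_000, 41_000)}
for state, (sheltered, beds) in sorted(by_state.items()):
    print(f"{state}: {utilization(sheltered, beds):.0%} of beds in use")
```

Values above 100% flag regions where demand exceeds shelter capacity, which is exactly the condition the dashboard is designed to surface.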
This dashboard is set up for drill-down into regions and times of interest.
All the data is now in shape for continued analysis, and for combining with
additional data. We focus on Massachusetts and incorporate some weather
data into the analysis. We fit contours to the temperature data (zip code) and
display precipitation by size of circle. Color of contour lines and circles indicates
temperature (red is warmer and blue is colder).
Figure 6 shows this updated analysis for Massachusetts. Note that the pockets
of high homeless utilization to the southeast of Boston coincide with milder
temperatures and lower precipitation.
Figure 6. Completed dashboard with drill-down to homeless in Massachusetts.
Weather data has been incorporated: contours are fit to the temperature data,
and precipitation is shown in circles (larger circles indicate more precipitation).
Color of contour lines and circles indicates temperature (red is warmer and blue
is colder). Note that the pockets of high homeless utilization to the southeast of
Boston coincide with milder temperatures and lower precipitation.
CASE STUDY: PAPER TOWEL MANUFACTURING
Paper towel manufacturing involves equipment including dryers, dyes, feeders,
cutting and pattern machines, and a series of process steps. Product quality is
assessed via measurements of quality characteristics like absorbency, strength,
and softness. Large quantities of data are collected on machines and on process
times for each batch at every process step.
The data under consideration in this simple example includes measurements
of product quality and equipment operation and performance. One goal of the
analysis is to assess effects of equipment on product quality.
DATA PANEL AND COLUMN NAMES
The analysis begins by loading the data, which results in data column names
being organized in a data panel. To start the analysis, the user clicks the
Recommendations icon and selects one or more columns of interest.
The numerical columns in this dataset are the measured product quality
characteristics. Selecting these columns produces histograms, density plots, and
tables in the Recommendations panel. Basic versions of the actual visualizations
are displayed (not canned representations of generic chart types), so the user can
see the shape of the distribution directly in the panel (Figure 7).
Figure 7. Recommendations panel with numeric columns selected. The
absorbency histogram is shown in the lower right.
HISTOGRAMS
Paper towel softness appears to be normally distributed (upper right). Selecting
each of the other product quality characteristics indicates that most of them
are normally distributed as well. However, when absorbency is selected, the
histogram shows a bimodal distribution (lower right). This is an interesting
finding that needs an explanation. The absorbency histogram is selected and
added to the analysis by clicking on it in the Recommendations display (Figure 7).
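The bimodal shape that Recommendations surfaces visually can also be flagged programmatically. The following rough sketch bins values and counts local maxima in the histogram; a real analysis would use a formal test such as Hartigan's dip test, and the sample values below are fabricated:

```python
# Rough bimodality check: bin the values, then count strict local
# maxima in the bin counts. The absorbency-like sample is fabricated.

def histogram(values, bins=10):
    """Equal-width bin counts over the range of `values`."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return counts

def count_modes(counts):
    """Count bins that are strict local maxima of the histogram."""
    modes = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if c > left and c > right:
            modes += 1
    return modes

# Two clearly separated clusters of absorbency-like values:
sample = [4.0, 4.1, 4.2, 4.1, 4.0, 7.0, 7.1, 6.9, 7.2, 7.0]
print(count_modes(histogram(sample)))  # → 2
```

With well-separated clusters this reports two modes, the same pattern a user spots at a glance in the Recommendations panel.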
PROCESS DATES
To investigate whether the changes in absorbency correlate to any temporal
effects, a column with process dates is now selected. In this case, a line plot, bar
chart, treemap, table, and other graphics (not shown) are then displayed in the
Recommendations panel (Figure 8). The line plot shows a dramatic change in
absorbency over time (Figure 8, top left), so it is selected and added to the analysis.
Figure 8. Recommendations panel with absorbency numeric column and a time
column selected in the Data Panel. Why does absorbency increase in the later
part of July? The line plot is selected (clicked) to add it to the analysis.
Additional line plots are available from Recommendations when the numeric
variables are chosen along with a time variable (Figure 9). Absorbency (lower
left) stands out as being more affected by time (day of the month).
Figure 9. Recommendations panel with numeric columns and time (day of the
month) selected. Absorbency is the red trace in the lower left.
MACHINE USE IN TWO PROCESS STEPS
To investigate whether the machines may be affecting absorbency at one or more
process steps, additional categorical columns, each containing the machine used
at a process step, are selected. The most relevant recommended graph is an
absorbency line plot, colored by machines at one step and trellised by machines
at the second step (Figure 10, top right). This is chosen and added to the analysis.
Figure 10. Recommendations panel with absorbency numeric column, a time
column, and two categorical process machine columns selected.
DAY OF MONTH VS. LOW ABSORBENCY
The Recommendations panel is then closed to view the useful visualizations that
have been added. Marking is set to correlate colors for corresponding points
in the line plot and histogram (Figure 11). It is clear that low absorbency in the
histogram correlates to batches processed before July 17th, and high absorbency
correlates to batches processed on or after July 17th.
Figure 11. Analysis with line plot and histogram added from the
Recommendations panel.
ANALYSIS OF VARIANCE
To understand the paper towel absorbency variability, we use the Spotfire analysis of
variance (ANOVA) function. This enables an assessment of the effects of the machine
process steps on the measured product quality characteristics. In the ANOVA setup
dialog (Figure 12), absorbency is selected as a response variable (Y), and the process
step equipment (tool) columns are selected as explanatory variables (X).
Figure 12. ANOVA analysis setup dialog.
ANOVA results (Figure 13) are presented as a sorted summary table with one row per
response-predictor pair (product quality characteristic and process step equipment).
Each row contains a p-value indicating statistical significance. The most significant
relationship is the effect of Dryer Tool on absorbency. Marking this row produces a
drill-down box plot showing that Dryer Tool DY1 produces paper towel batches with
significantly higher absorbency than Dryer Tools DY2 and DY3.
Figure 13. ANOVA results.
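The computation behind each row of that table is a one-way ANOVA: the F statistic compares between-group and within-group variance. A self-contained sketch, using fabricated absorbency values that mimic DY1 producing higher absorbency (Spotfire computes this, plus the p-values, internally):

```python
# Sketch of the one-way ANOVA behind the "Dryer Tool vs. absorbency"
# result. The dryer measurements below are fabricated to mimic DY1
# producing higher absorbency than DY2 and DY3.

def one_way_anova_f(groups):
    """F statistic for a dict of group name -> list of measurements."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: spread inside each group.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

absorbency = {
    "DY1": [7.0, 7.2, 6.9, 7.1],   # the higher-absorbency dryer
    "DY2": [4.1, 4.0, 4.3, 4.2],
    "DY3": [4.2, 4.4, 4.0, 4.1],
}
print(f"F = {one_way_anova_f(absorbency):.1f}")
```

A large F, referred to the F distribution with k-1 and n-k degrees of freedom, corresponds to the small p-value reported in the summary table.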
DRYER TOOLS
Since the ANOVA results indicate that Dryer Tool is responsible for the variation in
absorbency, the linked line plot and histogram produced with Recommendations are
then configured to show this clearly (Figure 14). The earlier dashboard graphics are
colored to distinguish between the Dryer Tools, and the line plot X-axis is changed
to the Dryer process date. This shows that only Dryer Tools DY2 and DY3 were in use
prior to July 13th and only DY1 was used on or after July 13th. It is now clear that the
increase in paper towel absorbency is related to the switch to Dryer DY1. Additional
investigation is warranted to determine how DY2 and DY3 could be modified to
match the superior absorbency results obtained with DY1.
Figure 14. Reconfigured discovery analysis for presentation; line plots and
histograms colored according to Dryer Tool.