Homework #1 SOCY 3115 Spring 20 Read the Syllabus and FAQ on ho.docx - pooleavelina
Homework #1
SOCY 3115
Spring 20
Read the Syllabus and FAQ on how to do your homework before beginning the assignment!
To get consideration for full credit, you must:
· Follow directions;
· Show all work required to arrive at answer (statistical calculations often require multiple steps, so you need to write these down, not just skip to the final answer)
· Use appropriate statistical notation at all times (e.g. if you are calculating a population mean, begin with the equation for population mean)
· Use units in your answer, where appropriate (e.g. a mean time would be “6.5 hours” rather than just “6.5”)
Understanding the Structure of Data
1. For the following rectangular dataset:
Id | Highest degree | Works full-time | Annual income cat
1 | Did not grad HS | Yes | Low
2 | HS dip | Yes | Low
3 | HS dip | No | Med
4 | BA | No | Low
5 | BA | Yes | Med
6 | MA | Yes | High
7 | HS dip | Yes | Med
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement?
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
2. For the following rectangular dataset:
Id | num_bdrms | num_bthrms | sqft | Ranch
1 | 4 | 3 | 3200 | Yes
2 | 2 | 1.5 | 2800 | Yes
3 | 2 | 1 | 1200 | Yes
4 | 3 | 2 | 1500 | No
5 | 2 | 2 | 1100 | No
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement? Before answering, be sure to consult the slide called “Level of measurement – language to use”. Use the formal language!
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
3. For each of the following questions (1) construct a dataset with one variable and three observations (2) add data that could have theoretically been collected (just make up the actual responses to the question); and (3) indicate the level-of-measurement of the variable. I’ve done two examples for you.
Example#1:
What is your current age? (individual is the unit-of-analysis)
id age
1 25
2 32
3 61
The age variable is continuous/interval ratio.
Example#2:
What is the size of this hospital based on number of beds? (hospital is the unit-of-analysis) Answers can be small (1-100 beds), medium (101-500 beds), large (501-1000 beds), or extra large (1001+ beds).
id hosp_size
1 med
2 med
3 ext ...
STAT225 Introduction to Statistics in the Behavioral Sciences.docx - dessiechisomjj4
STAT225: Introduction to Statistics in the Behavioral Sciences
1. In a school election, five people run for student body president. The actual number of votes for each candidate would be a(n) ____ variable. If the total number of votes were removed and the candidates were listed in order of least to most popular, this would be a(n) ____ variable.
a. ratio; ordinal
b. ordinal; ratio
c. ratio; nominal
d. nominal; ordinal
2. A researcher was interested in the effects of gender on attitudes toward women in leadership positions. The researcher surveyed a group of individuals, 12 of whom were men and 12 of whom were women. In this example, what is the explanatory/independent variable?
a. type of leadership position
b. the 12 women in the study
c. the gender of the participants
d. the participants' attitudes toward women in leadership positions
3. A researcher was interested in the effects of gender on attitudes toward women in leadership positions. The researcher surveyed a group of individuals, 12 of whom were men and 12 of whom were women. In this example, what is the response/dependent variable?
a. type of leadership position
b. the 12 women in the study
c. the gender of the participants
d. the participants' attitudes toward women in leadership positions
Please use the following information to answer questions 4 through 9
An industrial psychologist at a company has heard that desk bikes could help employees to lose weight, increase their stamina, and improve productivity. Sixteen employees were provided with desk bikes and the total number of pounds they lost, after one month, was recorded. Here are the data, in pounds lost, per employee:
4, 8, 12, 0, 2, 20, 18, 0, 12, 6, 12, 16, 10, 8, 12, 4
4. What is the range of this distribution?
a. 0 to 20
b. 20
c. 18
d. 4
5. What is the mean number of pounds that were lost by the employees in one month?
a. 9.88
b. 10.4
c. 12
d. 9
6. What is the median number of pounds that were lost by the employees in one month?
a. 8
b. 9
c. 10
d. 11
7. What is the variance of the number of pounds that were lost by the employees in one month?
a. 37.33
b. 9.72
c. 9.85
d. 6.11
8. What is the Interquartile range for this distribution?
a. 4
b. 8
c. 9
d. 12
9. How many outliers are in this distribution?
a. 0
b. 1
c. 2
d. Unable to determine from this information
The following graph depicts the typical relationship found between physiological arousal (anxiety) levels (e.g., range from 0 = no anxiety to 10=extreme anxiety) and test performance (e.g., percentage of correct answers on test).
Please use the following information to answer questions 10 and 11.
[Figure: Relationship Between Physiological Arousal Level and Test Performance. Horizontal axis: Physiological Arousal Level (0 to 10). Vertical axis: Test Performance (in Percentage), 0% to 100%.]
10. Based on this graph, what type of relation exists between physiological arousal level and test performance?
The two major areas of statistics are descriptive statistics and inferential statistics. This presentation shows the differences between the two, with examples.
Introduction Introduction to Populations and Samples It wo.docx - vrickens
Introduction
Introduction to Populations and Samples
It would take too long and cost too much money to test the quality of every piece of cereal made at a factory. Instead, a small sample of each batch is tested.
Wouldn't it be great if we could ask everyone in the world their opinion on a topic? What if we could have every person take a psychological test of interest so we can assemble the most accurate data? How can we make sure that we include every man, woman, child, race, ethnicity, socioeconomic status, class, religion, occupation, or other demographic of interest in any study we conduct? We want to make sure that the data we collect is as good as we can get under the given circumstances. Because we cannot include everyone of interest in a study, we must make sure our sample, or the group of those who participate in our study, is as close to "looking" like the population, or the entire collection of people of interest, as possible.
Consider this example. You are doing a study on the differences between men and women regarding their ability to follow directions. If you collected data from all males and all females in the world (which would be the entire population, because sex is our main variable of interest), you would get an extremely accurate result. However, it would be unrealistic, time-consuming, and costly to collect this data. You could, however, take a sample of males and females and study them. If you choose a good sample, the results of your study can yield an accurate representation of the population.
Collecting a sample that closely resembles the population we are interested in is an important component of conducting research. Much consideration must be given to the individuals you want to choose for your sample and how to ensure that your sample represents the population. By choosing a good sample, we can make certain assumptions about the population, just as if we had selected everyone in that population. This is the focus of sampling: to select an appropriate cross-section of the population that will accurately represent the entire population.
In the following lesson you will learn how to sample a population using a range of sampling methods. Be sure to pay specific attention to the advantages and disadvantages of each method and when each is most useful.
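As a rough illustration of drawing a sample from a population, the sketch below uses simple random sampling, which is only one of the possible sampling methods. The population of ID numbers, the sample size, and the seed are all made up for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 people of interest.
population = list(range(1, 1001))

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(42)

# Simple random sampling without replacement: every person has the same
# chance of being selected, and nobody is selected twice.
sample = rng.sample(population, k=50)

print(len(sample), "people sampled out of", len(population))
```

Other methods, such as stratified or cluster sampling, would need extra information about each person (for example, group membership) that this sketch does not model.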
Applying Knowledge of Populations and Samples
Populations and Samples in Ashford Courses
You will need to understand sample and population in a range of graduate courses, including those with a focus on psychological or organizational assessment and testing, measurement, research methods, and statistics. In these courses you will need to be able to identify and describe the population of interest, how a sample was obtained, and the sampling methods used. These topics are important in understanding how assessment or test results can be used or interpreted based on population norms, and how to conduct a study that does not suffer from sampling biases or errors. In addition, having knowledge and s ...
This presentation covers basic statistics related to types of data (qualitative and quantitative), with examples from everyday life. By: Dr. Farhana Shaheen
Understanding statistics 2#4 Using a Z Chart Table - Florin Neagu
The z-table is short for the “Standard Normal z-table”. The Standard Normal model is used in hypothesis testing, including tests on proportions and on the difference between two means. The area under the whole of a normal distribution curve is 1, or 100 percent.
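A z-table lookup can be reproduced in code. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `area_to_left` and the z values are chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_to_left(z):
    """Area under the standard normal curve to the left of z (a z-table lookup)."""
    return standard_normal.cdf(z)

for z in (-1.96, 0.0, 1.0, 1.96):
    print(f"z = {z:5.2f}  area to the left = {area_to_left(z):.4f}")
```

Because the total area under the curve is 1, the area to the right of any z is simply 1 minus the area to its left.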
Understanding Statistics 2#3 The Standard Normal Distribution - Florin Neagu
The standard normal distribution is a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard score or a z score. Every normal random variable X can be transformed into a z score by subtracting its mean and dividing by its standard deviation: z = (X - μ) / σ.
Understanding Statistics 2#2 The Normal Probability Distribution - Florin Neagu
The normal distribution is the most important and most widely used distribution in statistics.
It is sometimes called the "bell curve." Strictly speaking, it is not correct to talk about "the normal distribution," since there are many normal distributions. Normal distributions can differ in their means and in their standard deviations. Normal distributions are symmetric, with relatively more values at the center of the distribution and relatively few in the tails.
Understanding statistics 2#1 Random Variation and Discrete Probability Distri... - Florin Neagu
Random Variable
In algebra, you learned about different variables like x, y and maybe even z. Some examples of variables include x = number of heads or y = number of cell phones or z = running time of movies. Thus, in basic math, a variable is an alphabetical character that represents an unknown number.
In probability, we also have variables and we refer to them as random variables. A random variable is a variable that is subject to randomness, which means it can take on different values. Examples include the result of tossing a coin or rolling a die.
The random variable takes on different values depending on the situation. Each value of the random variable has a probability or percentage associated with it.
Discrete Probability Distribution
A discrete distribution describes the probability of occurrence of each value of a discrete random variable. With a discrete probability distribution, each possible value of the discrete random variable can be associated with a non-zero probability.
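A small sketch may make the idea concrete. It assumes a fair six-sided die, so every face gets probability 1/6; the variable names are illustrative only.

```python
from fractions import Fraction

# Random variable X: the face showing after rolling one fair six-sided die.
# Each of the six values gets probability 1/6 (the fairness is an assumption).
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of a discrete distribution must sum to exactly 1.
total_probability = sum(distribution.values())

# Expected value: each value weighted by its probability.
expected_value = sum(face * p for face, p in distribution.items())

print("total probability:", total_probability)
print("expected value:", float(expected_value))
```

Exact fractions are used instead of floats so that the sum-to-one check is not blurred by rounding.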
Z-score is a numerical measurement of a value's relationship to the mean in a group of values. If a Z-score is 0, then the score is identical to the mean score.
Z-scores can be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.
Z-score, or the standard score, is a very useful statistic because it:
- allows us to calculate the probability of a score occurring within our normal distribution;
- enables us to compare two scores that are from different normal distributions.
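Both uses can be sketched in a few lines. The test scores, means, and standard deviations below are hypothetical, and the scores are assumed to come from normal distributions.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical scores from two different normal distributions.
z_math = z_score(82, mean=70, sd=8)         # math test
z_reading = z_score(620, mean=500, sd=100)  # reading test

# Proportion of scores at or below each value, from the standard normal curve.
p_math = NormalDist().cdf(z_math)
p_reading = NormalDist().cdf(z_reading)

print(f"math:    z = {z_math:.2f}, proportion at or below = {p_math:.3f}")
print(f"reading: z = {z_reading:.2f}, proportion at or below = {p_reading:.3f}")
```

The raw scores (82 and 620) cannot be compared directly, but their z-scores can: the larger z-score marks the score that stands further above its own mean.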
Understanding Statistics 1#13 Box and Whisker plots - Florin Neagu
A box and whisker plot (boxplot) is a graph that presents information from a five-number summary. It is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared.
A box and whisker plot is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis. This type of graph is used to show the shape of the distribution, its central value, and its variability.
Understanding Statistics 1#12 Quartiles of Data - Florin Neagu
To understand a quartile, let us revisit the median. For the median, we cut the ordered data into two groups with an equal number of points; the middle value that separates these groups is the median. In the same way, if we divide the data into four equal groups, the first dividing point is the first quartile, the second dividing point is the second quartile (which is the same as the median), and the third dividing point is the third quartile.
To further see what quartiles do, note that the first quartile is at the 25th percentile. This means that 25% of the data is smaller than the first quartile and 75% of the data is larger. Similarly, for the third quartile, 25% of the data is larger than it while 75% is smaller. For the second quartile, which is the median, half of the data is smaller and half is larger than this value.
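The description above can be sketched as follows. This uses the median-of-halves convention, which is only one of several conventions for quartiles, and the data values are made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Other conventions exist and can give slightly different answers.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above it (middle value excluded if n is odd)
    return median(lower), median(ordered), median(upper)

# Made-up data: 8 observations.
data = [3, 7, 8, 5, 12, 14, 21, 13]
q1, q2, q3 = quartiles(data)
print("Q1 =", q1, " Q2 (median) =", q2, " Q3 =", q3, " IQR =", q3 - q1)
```

The difference Q3 - Q1 printed at the end is the interquartile range, the width of the middle half of the data.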
- The Empirical Rule is an approximation that applies only to data sets with a bell-shaped relative frequency histogram. It estimates the proportion of the measurements that lie within one, two, and three standard deviations of the mean.
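- Chebyshev’s Theorem is a fact that applies to all possible data sets. It describes the minimum proportion of the measurements that must lie within one, two, or more standard deviations of the mean: at least 1 - 1/k² of the measurements lie within k standard deviations, which is informative only for k > 1.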
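The two rules can be compared numerically. In this sketch the normal-curve proportions come from the standard library's `NormalDist`; the function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_minimum(k):
    """Chebyshev's lower bound on the proportion within k standard deviations.

    For k = 1 the bound is 0, so it only says something useful when k > 1.
    """
    return 1 - 1 / k**2

def normal_proportion(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"k = {k}: normal = {normal_proportion(k):.4f}, "
          f"Chebyshev minimum = {chebyshev_minimum(k):.4f}")
```

The Chebyshev figures are always lower because they must hold for every data set, not only bell-shaped ones.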
Master statistics 1#10 Empirical Rule of Standard Deviation - Florin Neagu
The empirical rule is a statistical rule which states that for a normal distribution, almost all data (about 99.7%) will fall within three standard deviations of the mean.
The empirical rule is most often used in statistics to forecast final outcomes. After a standard deviation is calculated and before exact data can be collected, this rule can be used as a rough estimate of the outcome of the data. This estimate can serve in the meantime, since gathering appropriate data may be time-consuming or even impossible. The empirical rule is also used as a rough way to test a distribution's "normality". If too many data points fall outside the three standard deviation boundaries, this could suggest that the distribution is not normal.
Master statistics 1#09_ Standard Deviation of Data in a Frequency Table - Florin Neagu
When using standard deviation keep in mind the following properties.
• Standard deviation is only used to measure spread or dispersion around the mean of a data set.
• Standard deviation is never negative.
• Standard deviation is sensitive to outliers. A single outlier can raise the standard deviation and in turn, distort the picture of spread.
• For data with approximately the same mean, the greater the spread, the greater the standard deviation.
Master statistics 1#8 Population and Sample Standard Deviation - Florin Neagu
- The standard deviation is the square root of the variance. The standard deviation is expressed in the same units as the mean is, whereas the variance is expressed in squared units.
- Both standard deviation and variance are derived from the mean of a given data set.
- The coefficient of variation is a standardized measure of dispersion of a probability distribution or frequency distribution. If you know nothing about the data other than the mean, one way to interpret the relative magnitude of the standard deviation is to divide it by the mean; that ratio is the coefficient of variation.
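The relationships in the three points above can be checked with a short sketch. The commute times are made up and are treated as a whole population, so the population functions `pvariance` and `pstdev` are used.

```python
from statistics import mean, pstdev, pvariance
import math

# Made-up commute times in minutes, treated here as a whole population.
times = [20, 25, 30, 35, 40]

variance = pvariance(times)   # in squared minutes
std_dev = pstdev(times)       # in minutes, the same units as the mean
cv = std_dev / mean(times)    # coefficient of variation, unitless

print(f"mean = {mean(times)} min, variance = {variance} min^2")
print(f"standard deviation = {std_dev:.2f} min, CV = {cv:.3f}")
```

Because the coefficient of variation has no units, it can be compared across data sets measured on different scales.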
Master statistics 1#7 Population and Sample Variance - Florin Neagu
• Variance: a very important concept in statistics that represents how spread out the data are
Variance is used to see how individual numbers relate to each other within a data set.
- A drawback of variance is that it gives added weight to numbers far from the mean (outliers), since squaring these numbers can skew interpretations of the data.
- The advantage of variance is that it treats all deviations from the mean the same regardless of direction; as a result, the squared deviations cannot cancel each other out and give a false appearance of no variability.
- Another drawback of variance is that it is not easily interpreted, because it is in squared units; the square root of its value is usually taken to get the standard deviation of the data set in question.
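A brief sketch of these points, computing the variance by hand for both the population (divide by N) and sample (divide by n - 1) cases. The data are made up, with one value placed far from the rest to show the extra weight that squaring gives it.

```python
from statistics import pvariance, variance

def squared_deviations(values):
    """Squared distance of each value from the mean."""
    m = sum(values) / len(values)
    return [(x - m) ** 2 for x in values]

# Made-up data; the 30 sits far from the other values.
data = [4, 6, 5, 5, 30]

sq = squared_deviations(data)
population_variance = sum(sq) / len(data)    # divide by N
sample_variance = sum(sq) / (len(data) - 1)  # divide by n - 1

# Share of the total squared deviation contributed by the single far-away value.
outlier_share = max(sq) / sum(sq)

print("population variance:", population_variance)
print("sample variance:    ", sample_variance)
print(f"share from the largest deviation: {outlier_share:.0%}")

# Cross-check against the standard library.
print(pvariance(data), variance(data))
```

Every squared deviation is non-negative, so deviations above and below the mean add up rather than cancel.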
Master statistics 1#6 Median vs. Mean and Range of Data - Florin Neagu
• Median: the middle value in an ordered list of data
• If the number of values is odd, the median is the middle value
• If the number of values is even, the median is the average of the two middle values
• Mode: the value in the dataset that occurs most frequently
• Mean: the average value of the data
• Measure of dispersion: how much the data are spread or dispersed
• Range of the data: the difference between the largest and smallest values
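The definitions above can be tried on two small made-up samples, one with an odd and one with an even number of values. The helper `data_range` is a name introduced here for illustration.

```python
from statistics import mean, median, mode

odd_sample = [7, 3, 9, 3, 5]        # 5 values: the median is the middle one
even_sample = [7, 3, 9, 3, 5, 11]   # 6 values: the median averages the two middle ones

def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)

for name, sample in (("odd", odd_sample), ("even", even_sample)):
    print(f"{name}: sorted = {sorted(sample)}, median = {median(sample)}, "
          f"mode = {mode(sample)}, mean = {mean(sample):.2f}, "
          f"range = {data_range(sample)}")
```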
Understanding statistics 1#5 Mean and Sample Mean - Florin Neagu
Mean: Average value of the data
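Sample mean: the mean of the sample values collected
Weighted mean: the mean used when the values in the dataset are not equally important; each value is multiplied by its weight, and the sum is divided by the total weight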
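A weighted mean can be sketched as below. The course-grade scenario, scores, and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical course grade: homework, midterm, final exam,
# weighted 2 : 3 : 5 (that is, 20%, 30%, 50%).
scores = [90, 80, 70]
weights = [2, 3, 5]

print("plain mean:   ", sum(scores) / len(scores))
print("weighted mean:", weighted_mean(scores, weights))
```

With equal weights the weighted mean reduces to the ordinary mean.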
Understanding statistics 1#4 Pie chart, Bar chart, Pareto,Histogram, Steam L... - Florin Neagu
• Pie Chart: Data represented as a fraction of a circle
• Bar chart: show comparisons between categories of data
• Pareto chart: used when there are many problems or causes and you want to focus on the most significant
• Histogram: bar chart of a frequency distribution
• Stem and leaf: a method for showing the frequency with which certain classes of values occur
Frequency Distribution
In statistics, a frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample.
Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample.
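A frequency table can be built with the standard library's `Counter`. The sample below (number of pets reported by ten households) is made up for illustration.

```python
from collections import Counter

# Made-up sample: number of pets reported by 10 households.
responses = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]

frequency = Counter(responses)  # maps each value to how often it occurs
n = len(responses)

print("value  count  relative")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5}  {count:>5}  {count / n:>8.2f}")
```

The counts add up to the sample size, and the relative frequencies add up to 1.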
A brief description of how to "handle" any type of data you come across.
This covers:
- Types of data
- Data representation, or how to visualize data to understand relationships
- Which control chart to use to check process capability
- Testing common assumptions: which hypothesis test to use and how to check it
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand and supply to keep evolving, driven by institutional investment rotating out of offices amid work from home (“WFH”) and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
2. Lesson 2 – Populations and Samples - definitions
Statistics: Gathering, describing & analysing data. OR numeric description of sample data
Data: Information gathered
Population: Particular group of interest
Ex: everybody in a city; all males or all females; all children between 6-9 years old
Note the word ALL: a population includes every member of a category
Parameter: Numerical description of a population characteristic
Ex: the mean height of all males in the world; IQ of all children in the world; mean IQ of all females in the USA; 75% of all kids aged 6-9 play games
Note the word NUMERICAL from the definition; it is a number
Sample: Subset of the POPULATION from which DATA is collected
Ex: we asked 100 males IN THE CITY what their favorite movies are
Sample Statistic: Numerical description of a particular sample CHARACTERISTIC
Ex: 100 females from the city were asked; 47% disliked chocolate
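The difference between a parameter and a sample statistic can be sketched in a few lines of Python. This is only an illustration: the population of heights below is simulated, and the city size, mean and spread are assumptions made up for the example, not values from the lesson.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of ALL 10,000 males in a made-up city.
population = [random.gauss(175, 7) for _ in range(10_000)]

# Parameter: numerical description of the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population from which data is actually collected.
sample = random.sample(population, 100)

# Sample statistic: numerical description of the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.1f} cm")
print(f"Statistic (sample mean):     {sample_mean:.1f} cm")
```

In practice the parameter is usually unknown because nobody can measure ALL members of the population; the statistic is what we can actually compute.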
3. Lesson 3 – Descriptive vs Inferential Statistics
Branches of statistics
Descriptive statistics – gather, sort, summarise data from a sample
Inferential statistics – use descriptive statistics (DATA) to ESTIMATE POPULATION PARAMETERS
Problem: Based on a phone survey, 22% of all men dislike football
Inferential as we estimate the percentage of a population from a sample (survey)
Problem 2: 65% of seniors at a local HS who apply to college plan to major in business
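Descriptive because it summarises the data for that group of seniors without estimating anything about a larger population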
Population
Sample
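To make the inferential step concrete, here is a minimal sketch of estimating a population proportion from the phone-survey example. The notes do not give the number of men surveyed, so the sample size of 500 is an assumption, and the normal-approximation interval used here is one common textbook technique, not necessarily the one the course uses.

```python
import math


def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a population proportion.

    p_hat: sample proportion (a descriptive statistic)
    n:     sample size
    z:     critical value (1.96 corresponds to roughly 95% confidence)
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


p_hat = 0.22  # 22% of surveyed men dislike football (from the example)
n = 500       # ASSUMED sample size; not stated in the notes

low, high = proportion_interval(p_hat, n)
print(f"Sample statistic: {p_hat:.0%}")
print(f"Estimated population parameter: between {low:.1%} and {high:.1%}")
```

The sample statistic (22%) is descriptive; using it to say something about all men is the inferential part.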
4. Lesson 4 – Apply Definitions in Statistics
Height of every 4th bottle on an assembly line: Sample
Ages of all USA presidents: Population
A researcher stops 100 people in a store to survey them about household income
Population: people in a store
Sample: 100 people chosen
Parameters describe populations, statistics describe samples
The average number of hours per week a sample of 10-year-olds spends watching TV is 20h
Statistic: 20h/week for the sample
87% of all patients in a hospital report having an alcohol problem.
Parameter: describes all patients
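The "every 4th bottle" example from this lesson can be sketched in code as follows. The bottle heights are simulated and the line length of 40 bottles is an assumption for illustration only.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (mm) of all 40 bottles on the line.
bottle_heights = [round(random.gauss(250, 2), 1) for _ in range(40)]

# Sample: every 4th bottle, i.e. bottles 4, 8, 12, ... (indices 3, 7, 11, ...).
sampled_heights = bottle_heights[3::4]

# Statistic: describes the sample only.
sample_mean_height = statistics.mean(sampled_heights)

# Parameter: describes the whole population (known here only because we simulated it).
population_mean_height = statistics.mean(bottle_heights)

print(f"Bottles sampled: {len(sampled_heights)} of {len(bottle_heights)}")
print(f"Statistic (sample mean):     {sample_mean_height:.1f} mm")
print(f"Parameter (population mean): {population_mean_height:.1f} mm")
```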