The use of data and its modelling in science provides meaningful interpretation of real-world problems. This presentation provides an easy-to-understand overview of data visualization and analytics, and snippets of data science applications using R programming.
Data analysis using spss for two sample t-test tutorialDaniel Sarpong
This beginner's manual for students, researchers, and data analysts provides a visual, step-by-step approach to conducting data analysis using the Statistical Package for the Social Sciences (SPSS). It uses screen captures of the software to simplify the steps needed to carry out the commands to perform the statistical methods commonly employed in data analysis.
In this webinar Dr. Lani discusses key points in successfully completing your quantitative analysis. You will learn how to conduct common statistical analyses, how to examine assumptions, how to easily generate APA 6th edition tables and figures, how to use Intellectus Statistics(TM) Software, how to identify and interpret the appropriate statistics, and how to present and summarize your findings.
SSP is now Intellectus Statistics Software. Intellectus Statistics™ software primarily serves the academic and research communities as a powerful statistical package that can be purchased via four distinct cloud-based subscriptions. Learn more here: http://www.statisticssolutions.com/buy-intellectus/
Part of a course I run introducing quantitative methods. One of the slideshows on my site www.kevinmorrell.org.uk please reference the site if you use any of it - hope it is useful.
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
A Presentation on Data Analysis using descriptive statistics & normality. From this presentation you can know-
1) What is Data
2) Types of Data
3) What is Data analysis
4) Descriptive Statistics
5) Tools for assessing normality
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects demand and the evolution of supply to be shaped by institutional investment rotating out of offices and into work-from-home (“WFH”) assets, while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, supporting strong expected annual growth of 13% over the next four years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2. “Data is the new oil”
• Data is a collection of facts, such as numbers, words, measurements, observations, or even just descriptions of things.
• Data is all around us. But what exactly is it? Data is a value assigned to a thing: colour, shape, number, condition, size.
• QUALITATIVE DATA refers to the quality of something: descriptions of the colours, texture, and feel of an object, descriptions of experiences, and interviews are all qualitative data.
• QUANTITATIVE DATA refers to numbers: e.g. the number of golf balls, their size, their price, a score on a test, etc.
3. • Categorical data is qualitative in nature.
• Numerical (quantitative) data, whether discrete or continuous, can be further classified as interval or ratio data.
• Interval data have ordered values with equal differences but lack a true zero value, e.g. temperature in °C, pH.
• Ratio data also have ordered values with equal differences but possess a true zero value, e.g. height, weight.
4. Categorical Data puts the item you are describing into a category: for example, the condition “used” would be categorical, as would categories such as “new”, “used”, “broken”, etc.
Discrete Data is numerical data with gaps in it: e.g. the count of golf balls. There can only be whole numbers of golf balls (there is no such thing as 0.3 golf balls).
Continuous Data is numerical data with a continuous range: e.g. the size of a golf ball can be any value (e.g. 10.55 mm or 10.61 mm, but also 10.536 mm). In continuous data, all values are possible with no gaps in between.
Data sources: Primary Data and Secondary Data.
7. Determining the ideal sample size
• From the researcher's experience: can result in a wide confidence interval or measurement error.
• Using a formula: for instance, Cochran's formula for sample size calculation:

  n0 = (Z² × p × q) / e²

Where:
  e is the desired level of precision (i.e. the margin of error),
  p is the (estimated) proportion of the population which has the attribute in question,
  q is 1 − p,
  Z is the Z-value for the chosen confidence level.
8. Example
Suppose we are doing a study on the inhabitants of a large town or village, and want to find out how many households serve breakfast in the mornings. We don't have much information on the subject to begin with, so we assume that half of the families serve breakfast: this gives us maximum variability, so p = 0.5. Now let's say we want a 95% confidence level and at least 5 percent (plus or minus) precision. A 95% confidence level gives a Z-value of 1.96 from the table, so we get
((1.96)² × 0.5 × 0.5) / (0.05)² = 384.16, rounded up to 385.
So a random sample of 385 households in our target population should be enough to give us the confidence level we need.
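The calculation above can be sketched directly in R; the values are taken from the example, and `ceiling()` rounds up to the next whole household:

```r
# Cochran's formula: n0 = Z^2 * p * q / e^2
z <- 1.96   # Z-value for a 95% confidence level
p <- 0.5    # assumed proportion (maximum variability)
q <- 1 - p
e <- 0.05   # desired margin of error

n0 <- (z^2 * p * q) / e^2
n0           # 384.16
ceiling(n0)  # 385, rounding up to whole households
```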
9. Data accuracy vs. precision
[Figure: four target diagrams illustrating “both accurate and precise”, “accurate but not precise”, “precise but not accurate”, and “neither accurate nor precise”.]
• Accuracy refers to how close measurements are to the "true" value.
• Precision refers to how close measurements are to each other.
10. Independent Variable: the variable manipulated or chosen by the investigator; the presumed cause of the outcome in the study.
Dependent Variable: the variable affected by the independent variable; the effect in the study.
y = f(x)
Which is which here?
11. Principles of Data Collection
• Understand what types of data are required
• Collect only relevant data
• Determine the methods of data collection:
  - Survey/questionnaire
  - Observation (participatory)
  - Focus groups
  - Standard instruments
  - Content analysis
  - Experiments/observations
  - Personal interviews
  - Literature search / meta-analysis
12. Principles…..
• Where, who, how, and when to collect:
  - Research design
  - Sampling procedure
  - Prepare field work schedule / data plan
  - Conduct preliminary investigation (surveys)
• Assess the situation and prepare further strategies
13. Enter the data in MS-Excel, with the top row holding a variable label in each cell. Save the entered data as a .csv file from MS-Excel.
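Once the data are saved as a .csv file, they can be read into R; the file name here is only an illustration:

```r
# Read the CSV exported from Excel; header = TRUE (the default)
# tells R that the top row holds the variable labels
mydata <- read.csv("mydata.csv", header = TRUE)

str(mydata)   # inspect the variables and their types
head(mydata)  # preview the first six rows
```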
14. Data analysis has been around for a while…
R.A. Fisher, W.E. Deming, H. Peter Luhn, Howard Dresner, Robert Gentleman, Ross Ihaka
17. Median
• Consider the set 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19.
  There are 13 values, so the median is the middle value, at position (n + 1) / 2 = (13 + 1) / 2 = 7; the 7th value is 7.
• Consider the set 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16.
  With an even number of values, the median is the mean of the two middle values. Position (n + 1) / 2 = (12 + 1) / 2 = 6.5, so the median is the mean of the 6th and 7th values: (6 + 7) / 2 = 6.5.
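Both cases can be checked in R with the built-in `median()` function:

```r
x_odd <- c(1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19)
median(x_odd)   # 7: the 7th of 13 ordered values

x_even <- c(1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16)
median(x_even)  # 6.5: the mean of the 6th and 7th values
```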
18. Mode
The most frequent value in a data set.
• Consider the set 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19.
  The mode is 1 because it is the most common value. This is a case of a unimodal distribution.
• There may be cases with more than one mode, as here:
• Consider the set 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19.
  There are two modes (bimodal): 1 and 11, because both occur 4 times in the data set.
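Base R has no built-in mode function for data, so a small helper using `table()` is a common sketch; it returns every value tied for the highest count, which covers the bimodal case above:

```r
# Return all values that occur most often (handles multimodal sets)
modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

modes(c(1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19))
# 1 (unimodal)
modes(c(1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19))
# 1 and 11 (bimodal)
```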
26. Statistical significance and p-value

Symbol | Meaning    | Level of significance
ns     | P > 0.05   | Not significant
*      | P ≤ 0.05   | At the 5% level
**     | P ≤ 0.01   | At the 1% level
***    | P ≤ 0.001  | At the 0.1% level
****   | P ≤ 0.0001 | At the 0.01% level

"The p-value offers a first line of defense against being fooled by randomness, separating signal from noise."
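As an illustration of how a p-value is obtained in practice, here is a two-sample t-test in R; the data are simulated for the sketch, not taken from the presentation:

```r
set.seed(42)                        # reproducible simulated data
a <- rnorm(30, mean = 5.0, sd = 1)  # group A
b <- rnorm(30, mean = 5.8, sd = 1)  # group B, shifted upward

result <- t.test(a, b)  # Welch two-sample t-test
result$p.value          # compare against the 0.05 threshold
```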
27. What is bias?
Errors affecting validity: a systematic error (caused by the investigator or the subjects) that causes an incorrect (over- or under-) estimate of an association.
Sources of error:
• Chance (random error; sampling error)
• Bias (systematic errors, inaccuracies):
  - Selection bias
  - Loss-to-follow-up bias
  - Information bias: nondifferential (e.g. simple misclassification) or differential (e.g. recall bias, interviewer bias)
• Confounding (imbalance in other factors): a situation in which the effects of two processes are not separated.
28. A word of caution: “Interpretation can, however, be subjective.”
29. I don't have any strong opinion about SPSS, since I am not an avid user of it…
30. R or others – the fight is on
Many more documents found in Google Scholar still use SPSS rather than R, while the reverse holds in Scopus.
31. What Is R?
• a programming “environment”
• object-oriented
• similar to S-Plus
• free and open-source
• provides calculations on matrices
• excellent graphics capabilities
• supported by a large user network
32. What is R Not?
• a statistics software package
• menu-driven
• quick to learn
• a program with a complex graphical interface
33. Installing R
• www.r-project.org/
• download from CRAN
• select a download site
• download the base package at a minimum
• download contributed packages as needed
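The download steps above translate into a single command for contributed packages; `ggplot2` is used purely as an example package:

```r
# One-time download and install from a CRAN mirror
install.packages("ggplot2")

# Attach the package for the current session
library(ggplot2)
```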
38. Tutorials cont.
• Textbooks:
  - The Art of R Programming by Norman Matloff
  - Hands-On Programming with R by Garrett Grolemund
44. Disclaimer: Many of the image files used in this presentation have been downloaded from the internet. Any copyright holders who are not
duly acknowledged here may contact me for proper citation.
Contact : sreejiagriman@gmail.com