# R part II

Published on

Praxis Weekend Analytics

### R part II

1. Vector with regularly spaced numbers

       > 1:10
       [1] 1 2 3 4 5 6 7 8 9 10
       > seq(1,10)
       [1] 1 2 3 4 5 6 7 8 9 10
       > seq(1,10,2)
       [1] 1 3 5 7 9

   • We have used both the “:” operator and the seq command.
   • Note the last command, where “2” is the step, which is the “by” argument of the seq command.
2. Try some sequence or seq commands …

       > seq(0,1, length=11)
       [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
       > seq(4,10,by=0.5)
       [1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
       > seq(4,10,0.5)
       [1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
       > seq(2,8,0.3)
       [1] 2.0 2.3 2.6 2.9 3.2 3.5 3.8 4.1 4.4 4.7 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7
       [21] 8.0
       > seq.int(2,8,0.3)
       [1] 2.0 2.3 2.6 2.9 3.2 3.5 3.8 4.1 4.4 4.7 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7
       [21] 8.0
       > seq(2,8,length.out=10)
       [1] 2.000000 2.666667 3.333333 4.000000 4.666667 5.333333 6.000000 6.666667 7.333333
       [10] 8.000000
3. Try more seq commands …

       > seq(1,5,0.3)
       [1] 1.0 1.3 1.6 1.9 2.2 2.5 2.8 3.1 3.4 3.7 4.0 4.3 4.6 4.9
       > pi:6
       [1] 3.141593 4.141593 5.141593
       > 6:pi
       [1] 6 5 4
       > 10:-2
       [1] 10 9 8 7 6 5 4 3 2 1 0 -1 -2
       > -7:8
       [1] -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8

   • You can generate a decreasing sequence.
   • Try generating a sequence of negative numbers.
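The behaviour of the step, the length and the “:” operator in the sessions above can be checked directly. The following is a minimal sketch; the variable names (`s_by`, `s_len`, `up`, `down`) are only illustrative and do not come from the slides.

```r
# seq() with a step: values never go past the upper limit
s_by <- seq(1, 5, by = 0.3)
print(length(s_by))   # number of values generated
print(max(s_by))      # last value; 5 itself is not reached with this step

# seq() with length.out: both end points are always included
s_len <- seq(1, 5, length.out = 5)
print(s_len)

# The colon operator starts at the left value and moves in steps of 1
up <- pi:6      # starts at pi and counts up while the value is <= 6
down <- 6:pi    # starts at 6 and counts down while the value is >= pi
print(up)
print(down)
```

With `by`, the sequence ends at the last value that does not pass the upper limit; with `length.out`, the step is worked out for you so that both ends are hit.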
4. Think and try …

   Generate a sequence of the following numbers:

       0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 100.0

   Hints
   • You have to use more than one sequence.
   • But how will you include “100”?
5. Think and try … Possible solution

   Generate a sequence of the following numbers:

       0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 100.0

       > seq(0, 1, length=6)
       [1] 0.0 0.2 0.4 0.6 0.8 1.0
       > seq.1<-seq(0, 1, length=6)
       > c(seq.1,1:10,100)
       [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0
       [14] 8.0 9.0 10.0 100.0
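The solution above is only one possibility. As a sketch of another route, the first part can be built with a step (`by`) instead of a length, and `c()` then joins the pieces. The names `part1`, `part2`, `answer` and `slide_answer` are made up for this illustration.

```r
# First piece: 0.0, 0.2, ..., 1.0 built with a step instead of a length
part1 <- seq(0, 1, by = 0.2)

# Second piece: the integers 1 to 10
part2 <- 1:10

# c() joins the pieces; the single value 100 is simply appended
answer <- c(part1, part2, 100)
print(answer)

# Compare with the solution shown on the slide
slide_answer <- c(seq(0, 1, length = 6), 1:10, 100)
print(isTRUE(all.equal(answer, slide_answer)))
```

`all.equal()` is used for the comparison rather than `==` because the decimal values are stored as floating-point numbers.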
6. Try the rep (replicate) command

       > rep(1:5,2)
       [1] 1 2 3 4 5 1 2 3 4 5
       > rep(1:5, length=12)
       [1] 1 2 3 4 5 1 2 3 4 5 1 2
       > rep(c('one', 'two'), c(6, 3))
       [1] "one" "one" "one" "one" "one" "one" "two" "two" "two"

   Now enter the help(rep) command and try the examples.
7. Try the rep (replicate) command

       > rep(1:4, each = 2)
       [1] 1 1 2 2 3 3 4 4
       > rep(1:4, c(2,2,2,2))
       [1] 1 1 2 2 3 3 4 4
       > rep(5:8, c(2,1,2,1))
       [1] 5 5 6 7 7 8
       > rep(1:4, each = 2, len = 4)
       [1] 1 1 2 2

   Hope you are enjoying this as we go. Have you noted the arguments “each” and “len” (short for “length.out”)? Now note the “times” argument:

       > rep(1:4, each = 2, times = 3)
       [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
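How “each”, “times” and the length argument interact can be confirmed with a few comparisons. This is a small sketch; the variable names are illustrative only.

```r
# each = 2 gives the same result as repeating every element twice via a vector of times
a <- rep(1:4, each = 2)
b <- rep(1:4, times = c(2, 2, 2, 2))
print(identical(a, b))

# each is applied first, then the whole result is repeated `times` times
full <- rep(1:4, each = 2, times = 3)
print(length(full))

# length.out (abbreviated to len on the slide) cuts the result to the requested length
short <- rep(1:4, each = 2, length.out = 4)
print(short)
print(identical(short, full[1:4]))
```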
8. Try a histogram …

   Suppose the top 25 ranked movies made the following gross receipts for a week:

       29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1

   Scan the data and then draw some histograms.

       > x
       [1] 29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3
       [17] 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
       > receipts<-x
       > hist(receipts)
9. Try a histogram … (same receipts data as on slide 8)
10. Now try better histograms …

    Add colour, change the colour, add a title for the histogram, then add titles for the x-axis and the y-axis.

        > hist(receipts, col="red2")
        > hist(receipts, col="red4")
        > hist(receipts, col="red2",main="Gross Receipts for first 25 ranked movies")
        > hist(receipts, col="red2",main="Gross Receipts for first 25 ranked movies",xlab="receipts in a week")
        > hist(receipts, col="red2",main="Gross Receipts for first 25 ranked movies",xlab="receipts in a week",ylab="count of movies")
11. Now try better histograms … Your new histogram should look like this.
12. Now try better histograms …

    Now set the range for the x-axis and the y-axis with “xlim” and “ylim”.

        > hist(receipts, col="red2",main="Gross Receipts for first 25 ranked movies",xlab="receipts in a week",ylab="count of movies",xlim=c(0.1,35),ylim=c(0,25))
13. Now more about histograms …

    Now try breaks=… What is “breaks”?

        > hist(receipts,breaks=3,col="red2",main="Gross Receipts for first 25 ranked movies",xlab="receipts in a week",ylab="count of movies")

    Remember: a single number given as “breaks” is just a suggestion to R.
14. Now more about breaks …

    “breaks” can also specify the actual break points in a histogram.

        > hist(receipts,breaks=c(0,1,2,3,4,5,10,20,max(x)),col="violetred")

    Note the break points.
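To see what R actually does with “breaks”, the histogram can be computed without drawing it by passing `plot = FALSE` and inspecting the returned object. The sketch below re-enters the receipts data from the slides so that it runs on its own; the names `h_suggest` and `h_exact` are illustrative.

```r
# Gross receipts data as listed on the slides
receipts <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7,
              0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2,
              0.1, 0.1, 0.1, 0.1, 0.1)

# A single number is only a suggestion: R chooses "pretty" break points itself
h_suggest <- hist(receipts, breaks = 3, plot = FALSE)
print(h_suggest$breaks)   # break points R actually used
print(h_suggest$counts)   # number of movies in each bin

# A vector of break points is used exactly as given
h_exact <- hist(receipts, breaks = c(0, 1, 2, 3, 4, 5, 10, 20, max(receipts)),
                plot = FALSE)
print(h_exact$breaks)
print(h_exact$counts)
print(h_exact$equidist)   # FALSE here because the bins have unequal widths
```

When the bins have unequal widths, `hist()` draws densities rather than counts by default, which is why the y-axis of the last histogram on the slide changes.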
15. Summary and Fivenum

    Suppose CEO yearly compensations are sampled and the following are found (in millions):

        12 0.4 5 2 50 8 3 1 4 0.25

        > sals
        [1] 12.00 0.40 5.00 2.00 50.00 8.00 3.00 1.00 4.00 0.25
        > mean(sals) # the average
        [1] 8.565
        > var(sals) # the variance
        [1] 225.5145
        > sd(sals) # the standard deviation
        [1] 15.01714
        > median(sals) # the median
        [1] 3.5
        > summary(sals)
           Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          0.250   1.250   3.500   8.565   7.250  50.000
        > fivenum(sals) # min, lower hinge, median, upper hinge, max
        [1] 0.25 1.00 3.50 8.00 50.00
        > quantile(sals)
           0%   25%   50%   75%  100%
         0.25  1.25  3.50  7.25 50.00
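The functions in this session are related to one another, and the relations can be checked in a few lines. This is a minimal sketch that re-enters the compensation data so it runs on its own; `s` and `q` are illustrative names.

```r
# CEO compensation data from the slide (in millions)
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)

# The standard deviation is the square root of the variance
print(isTRUE(all.equal(sd(sals), sqrt(var(sals)))))

# summary() reports the same quartiles as quantile(), plus the mean
s <- summary(sals)
q <- quantile(sals)
print(isTRUE(all.equal(
  as.numeric(s[c("Min.", "1st Qu.", "Median", "3rd Qu.", "Max.")]),
  as.numeric(q))))
print(isTRUE(all.equal(as.numeric(s["Mean"]), mean(sals))))
```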
16. Important: Difference between Fivenum and Quantiles
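17. Difference between Fivenum and Quantile: Lower and Upper Hinge

    The sorted data, with the median marked between the two middle values:

        0.25 0.4 1 2 3 | 3.5 | 4 5 8 12 50

    Median = 3.5. Here the median is the average of 3 and 4 and is not itself a data point.

    • The lower hinge is the median of all the data to the left of the median (3.5), not counting the median itself if it is a data point. Here that is the median of 0.25 0.4 1 2 3, which is 1.
    • The upper hinge is similarly defined. Here it is the median of 4 5 8 12 50, which is 8.

    These hinges (1 and 8) are what fivenum reports, whereas quantile reports 1.25 and 7.25 for the 25% and 75% points.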
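The hinge definition above can be worked through by hand in R and compared with `fivenum()` and `quantile()`. The sketch below assumes an even number of observations, as in this data set, so the median is not a data point and the data splits cleanly into two halves; the names `lower_half`, `upper_half` and `hinges` are illustrative.

```r
# CEO compensation data from the slides (in millions)
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)

s <- sort(sals)
n <- length(s)          # 10 values, so the median falls between two data points

# Split the sorted data at the median (valid as written only for even n)
lower_half <- s[1:(n %/% 2)]
upper_half <- s[(n %/% 2 + 1):n]

# Hinges are the medians of the two halves
hinges <- c(median(lower_half), median(upper_half))
print(hinges)

# fivenum() reports the hinges; quantile() interpolates between data points instead
print(fivenum(sals)[c(2, 4)])
print(quantile(sals, c(0.25, 0.75)))
```

The two results differ because `quantile()` by default interpolates linearly between neighbouring sorted values, while the hinges are always medians of the halves.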