# R part II

Praxis Weekend Analytics

### Transcript

• 1. Vector with regularly spaced numbers

    > 1:10
     [1]  1  2  3  4  5  6  7  8  9 10
    > seq(1,10)
     [1]  1  2  3  4  5  6  7  8  9 10
    > seq(1,10,2)
    [1] 1 3 5 7 9

  • We have used both the ":" operator and the seq command.
  • Note the last command, where we have used "2" as the step, which is the "by" argument of the seq command.
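As a small sketch of the two notes above, the following checks that `:` and `seq()` give the same values and that the third positional argument of `seq()` is `by`. The variable names are illustrative and not from the slides.

```r
# Two ways to build the integers 1 through 10
a <- 1:10
b <- seq(1, 10)

# The third positional argument of seq() is the step, i.e. `by`
odd_positional <- seq(1, 10, 2)
odd_named      <- seq(1, 10, by = 2)

print(all(a == b))                           # both forms give the same values
print(identical(odd_positional, odd_named))  # naming `by` changes nothing
print(odd_named)
```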
• 2. Try some sequence or seq commands...

    > seq(0,1, length=11)
     [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
    > seq(4,10,by=0.5)
     [1]  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0
    > seq(4,10,0.5)
     [1]  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0
    > seq(2,8,0.3)
     [1] 2.0 2.3 2.6 2.9 3.2 3.5 3.8 4.1 4.4 4.7 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7
    [21] 8.0
    > seq.int(2,8,0.3)
     [1] 2.0 2.3 2.6 2.9 3.2 3.5 3.8 4.1 4.4 4.7 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7
    [21] 8.0
    > seq(2,8,length.out=10)
     [1] 2.000000 2.666667 3.333333 4.000000 4.666667 5.333333 6.000000 6.666667 7.333333
    [10] 8.000000
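The commands above mix two ways of spacing a sequence: a fixed step (`by`) and a fixed number of points (`length.out`, which `length` partially matches). A minimal sketch of how the two relate, using illustrative variable names:

```r
from <- 2
to   <- 8
n    <- 10

# Asking for n points is the same as stepping by (to - from) / (n - 1)
by_n    <- seq(from, to, length.out = n)
step    <- (to - from) / (n - 1)
by_step <- seq(from, to, by = step)

print(step)
print(isTRUE(all.equal(by_n, by_step)))

# With `by`, the sequence stops at the last value that does not pass `to`
stops_early <- seq(4, 10, by = 4)
print(stops_early)
```

Comparing with `all.equal()` rather than `==` allows for floating-point rounding in the fractional step.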
• 3. Try more seq commands...

    > seq(1,5,0.3)
     [1] 1.0 1.3 1.6 1.9 2.2 2.5 2.8 3.1 3.4 3.7 4.0 4.3 4.6 4.9
    > pi:6
    [1] 3.141593 4.141593 5.141593
    > 6:pi
    [1] 6 5 4
    > 10:-2
     [1] 10  9  8  7  6  5  4  3  2  1  0 -1 -2
    > -7:8
     [1] -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8

  • You can generate a decreasing sequence.
  • Try generating a sequence of negative numbers.
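The two suggestions above can be tried as follows. This is only a sketch with illustrative variable names. It also shows that `seq()` needs a negative `by` to count down, whereas `:` picks the direction itself.

```r
# `:` always steps by 1 (or -1), starting exactly at the left operand
up   <- pi:6   # starts at pi, stops before passing 6
down <- 6:pi   # counts down from 6, stops before dropping below pi

# seq() needs a negative `by` to produce a decreasing sequence
down_half <- seq(10, 8, by = -0.5)

# A positive `by` with from > to is an error, caught here for illustration
wrong_sign <- tryCatch(seq(10, 8, by = 0.5), error = function(e) "error")

# A sequence made only of negative numbers
neg <- -5:-1

print(up)
print(down)
print(down_half)
print(wrong_sign)
print(neg)
```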
• 4. Think and try...

  Generate a sequence of the following numbers:

    0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 100.0

  Hints
  • You have to use more than one sequence.
  • But how will you include "100"?
• 5. Think and try... Possible Solution

  Generate a sequence of the following numbers:

    0.0 0.2 0.4 0.6 0.8 1.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 100.0

    > seq(0, 1, length=6)
    [1] 0.0 0.2 0.4 0.6 0.8 1.0
    > seq.1 <- seq(0, 1, length=6)
    > c(seq.1,1:10,100)
     [1]   0.0   0.2   0.4   0.6   0.8   1.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0
    [14]   8.0   9.0  10.0 100.0
• 6. Try replicate or rep command

    > rep(1:5,2)
     [1] 1 2 3 4 5 1 2 3 4 5
    > rep(1:5, length=12)
     [1] 1 2 3 4 5 1 2 3 4 5 1 2
    > rep(c('one', 'two'), c(6, 3))
    [1] "one" "one" "one" "one" "one" "one" "two" "two" "two"

  Now enter the help(rep) command and try the examples.
• 7. Try replicate or rep command

    > rep(1:4, each = 2)
    [1] 1 1 2 2 3 3 4 4
    > rep(1:4, c(2,2,2,2))
    [1] 1 1 2 2 3 3 4 4
    > rep(5:8, c(2,1,2,1))
    [1] 5 5 6 7 7 8
    > rep(1:4, each = 2, len = 4)
    [1] 1 1 2 2

  Hope you are enjoying this as we go... Have you noted the arguments "each" and "len" (short for "length.out")? Now note the "times" argument:

    > rep(1:4, each = 2, times = 3)
     [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
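To make the interplay of `each`, `times` and `length.out` concrete, here is a minimal sketch. The variable names are illustrative, and `manual` is just a hand-built comparison, not something from the slides.

```r
x <- 1:4

# `each` repeats every element in place; `times` repeats the whole vector
by_each  <- rep(x, each = 2)    # 1 1 2 2 3 3 4 4
by_times <- rep(x, times = 2)   # 1 2 3 4 1 2 3 4

# When both are given, `each` is applied first and the result is then
# repeated `times` times
both   <- rep(x, each = 2, times = 3)
manual <- rep(rep(x, each = 2), times = 3)
print(identical(both, manual))

# `length.out` (written `len` on the slide) cuts the result to that length
short <- rep(x, each = 2, length.out = 4)   # 1 1 2 2
print(short)
```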
• 8. Try Histogram...

  Suppose the top 25 ranked movies made the following gross receipts for a week:

    29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1

  Scan the data and then draw some histograms.

    > x
     [1] 29.6 28.2 19.6 13.7 13.0  7.8  3.4  2.0  1.9  1.0  0.7  0.4  0.4  0.3  0.3  0.3
    [17]  0.3  0.3  0.2  0.2  0.2  0.1  0.1  0.1  0.1  0.1
    > receipts <- x
    > hist(receipts)
• 9. Try Histogram...

  Suppose the top 25 ranked movies made the following gross receipts for a week:

    29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
• 10. Now try better histograms...

  Add colour, change colour, add a title for the histogram, then add titles for the x-axis and the y-axis.

    > hist(receipts, col="red2")
    > hist(receipts, col="red4")
    > hist(receipts, col="red2", main="Gross Receipts for first 25 ranked movies")
    > hist(receipts, col="red2", main="Gross Receipts for first 25 ranked movies", xlab="receipts in a week")
    > hist(receipts, col="red2", main="Gross Receipts for first 25 ranked movies", xlab="receipts in a week", ylab="count of movies")
• 11. Now try better histograms... Your new histogram should look like this: [histogram figure]
• 12. Now try better histograms...

  Now set the range for the x-axis and the y-axis.

    > hist(receipts, col="red2", main="Gross Receipts for first 25 ranked movies", xlab="receipts in a week", ylab="count of movies", xlim=c(0.1,35), ylim=c(0,25))
• 13. Now more about histograms...

  Now try breaks=... What is "breaks"?

    > hist(receipts, breaks=3, col="red2", main="Gross Receipts for first 25 ranked movies", xlab="receipts in a week", ylab="count of movies")

  Remember: when "breaks" is a single number, it is just a suggestion to R for the number of bins.
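One way to see what R did with the suggestion is to ask `hist()` for the histogram object instead of a plot, using its `plot = FALSE` argument. This is a sketch that re-enters the receipts from the earlier slide (26 values as listed). The name `n_bins` is illustrative.

```r
receipts <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7,
              0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2,
              0.1, 0.1, 0.1, 0.1, 0.1)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(receipts, breaks = 3, plot = FALSE)

print(h$breaks)   # the break points R actually chose
print(h$counts)   # number of movies falling in each bin

# The number of bins R used, which need not equal the suggested 3
n_bins <- length(h$counts)
print(n_bins)
```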
• 14. Now more about breaks...

  "breaks" can also specify the actual break points in a histogram.

    > hist(receipts, breaks=c(0,1,2,3,4,5,10,20,max(x)), col="violetred")

  Note the break points.
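With explicit break points the bins have unequal widths, and `hist()` then draws densities rather than counts by default. A sketch that inspects the bins without plotting (the receipts are re-entered from the earlier slide, and `brks` is an illustrative name):

```r
receipts <- c(29.6, 28.2, 19.6, 13.7, 13.0, 7.8, 3.4, 2.0, 1.9, 1.0, 0.7,
              0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2,
              0.1, 0.1, 0.1, 0.1, 0.1)

# Same break points as on the slide; the last one is the largest receipt
brks <- c(0, 1, 2, 3, 4, 5, 10, 20, max(receipts))
h <- hist(receipts, breaks = brks, plot = FALSE)

print(h$counts)     # movies per bin: (0,1], (1,2], ..., (20,max]
print(h$equidist)   # FALSE, because the bins have different widths
```

Passing `freq = TRUE` to `hist()` would force counts on the y-axis, although R warns that bar areas are then misleading for unequal bins.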
• 15. Summary and Fivenum

  Suppose CEO yearly compensations are sampled and the following values are found (in millions):

    12 0.4 5 2 50 8 3 1 4 0.25

    > sals
     [1] 12.00  0.40  5.00  2.00 50.00  8.00  3.00  1.00  4.00  0.25
    > mean(sals)     # the average
    [1] 8.565
    > var(sals)      # the variance
    [1] 225.5145
    > sd(sals)       # the standard deviation
    [1] 15.01714
    > median(sals)   # the median
    [1] 3.5
    > summary(sals)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0.250   1.250   3.500   8.565   7.250  50.000
    > fivenum(sals)  # min, lower hinge, median, upper hinge, max
    [1]  0.25  1.00  3.50  8.00 50.00
    > quantile(sals)
       0%   25%   50%   75%  100%
     0.25  1.25  3.50  7.25 50.00
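As a check on what `var()` and `sd()` report above, the same numbers can be rebuilt by hand. This is a minimal sketch with illustrative names (`var_manual`, `sd_manual`), using the sample variance with divisor n - 1.

```r
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)
n <- length(sals)

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual <- sum((sals - mean(sals))^2) / (n - 1)

# The standard deviation is the square root of the variance
sd_manual <- sqrt(var_manual)

print(isTRUE(all.equal(var_manual, var(sals))))
print(isTRUE(all.equal(sd_manual, sd(sals))))

# The single large value (50) pulls the mean well above the median
print(c(mean = mean(sals), median = median(sals)))
```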
• 16. Important: Difference between Fivenum and Quantiles
• 17. Difference between Fivenum and Quantile: Lower and Upper Hinge

  The sorted data: 0.25 0.4 1 2 3 4 5 8 12 50

  Median = 3.5, which falls between 3 and 4 and is not itself a data point.

  • The lower hinge is the median of all the data to the left of the median (3.5), not counting this particular data point (if it is one). Here that is the median of 0.25 0.4 1 2 3, which is 1.
  • The upper hinge is similarly defined. Here it is the median of 4 5 8 12 50, which is 8.
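The hinge definition above can be sketched directly and compared with `fivenum()` and `quantile()`. The split below assumes an even number of observations, as in this data set. For an odd count, R's `fivenum()` counts the median in both halves, so this simple split would not reproduce it. The names `lower_half`, `upper_half`, `lower_hinge` and `upper_hinge` are illustrative.

```r
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)
s <- sort(sals)
n <- length(s)   # 10 values, so the median lies between two data points

# Split the sorted data into the halves left and right of the median
lower_half <- s[1:(n / 2)]
upper_half <- s[(n / 2 + 1):n]

# Hinges: the medians of the two halves
lower_hinge <- median(lower_half)
upper_hinge <- median(upper_half)

print(c(lower_hinge, upper_hinge))               # hinges computed by hand
print(fivenum(sals)[c(2, 4)])                    # hinges reported by fivenum()
print(unname(quantile(sals, c(0.25, 0.75))))     # quartiles, which interpolate
```

The quartiles differ from the hinges because `quantile()` by default interpolates between neighbouring data points rather than taking the median of each half.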