Using R to Determine a Getting Started on Hadoop


Using R to Determine a Threshold…

# read the tab-separated input; the file has no header row
data <- read.csv("thresh.tsv", sep='\t', header=FALSE)
t_data <- data[,3]

# pass through values for 80+ percentile
qntile <- .8
t_thresh <- quantile(t_data, qntile)

# CDF plot
title <- "CDF threshold max(tfidf)"
xtitle <- paste("thresh:", t_thresh)
par(mfrow=c(2, 1))
plot(ecdf(t_data), xlab=xtitle, main=title)
abline(v=t_thresh, col="red")
abline(h=qntile, col="yellow")

# box-and-whisker plot
boxplot(t_data, horizontal=TRUE)
rug(t_data, side=1)
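
Once a threshold has been chosen, it would typically be used to filter the data. The following is a minimal sketch of that step, not part of the original slide. It assumes the score sits in column 3, as in the script above, and uses a synthetic data frame as a hypothetical stand-in for thresh.tsv; the names `passed` and `frac_passed` are illustrative only.

```r
# Hypothetical stand-in for thresh.tsv: three unnamed columns, score in column 3
set.seed(42)
data <- data.frame(V1 = paste0("doc", 1:100),
                   V2 = paste0("term", 1:100),
                   V3 = rexp(100, rate = 5))

# same threshold computation as in the script above
qntile <- 0.8
t_thresh <- quantile(data[, 3], qntile)

# keep only the rows whose score is at or above the threshold
passed <- data[data[, 3] >= t_thresh, ]

# fraction of rows passed through; roughly 1 - qntile when there are few ties
frac_passed <- nrow(passed) / nrow(data)
print(frac_passed)
```

Raising `qntile` passes fewer rows through, so the CDF plot is a convenient way to see how many rows a given cutoff would discard before committing to it.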

Published in: Technology