EX.1 Hadoop Installation and Configuration
Aim:-
To install and Configure Hadoop Environment.
Procedure:-
1. Install Java 8:
a. Download and install Java 8.
b. Set environmental variables:
i. User variable:
• Variable: JAVA_HOME
• Value: C:\java
ii. System variable:
• Variable: PATH
• Value: C:\java\bin
c. Verify on cmd by running java -version.
2. Install Eclipse Mars. Download it from the link: https://eclipse.org/downloads/ and
extract it into the C drive.
a. Set environmental variables:
i. User variable:
• Variable: ECLIPSE_HOME
• Value: C:\eclipse
ii. System variable:
• Variable: PATH
• Value: C:\eclipse\bin
b. Download “hadoop2x-eclipse-plugin-master.” Three JAR files are on the path
“hadoop2x-eclipse-plugin-master\release.” Copy these three JAR files and paste
them into “C:\eclipse\dropins.”
c. Download “slf4j-1.7.21.” Copy the JAR files from this folder and paste them into
“C:\eclipse\plugins”. This step may create errors; when you run Eclipse, you may
see errors about the org.apa….jar file appearing in multiple places. Delete these
files from all the places except one.
Errors
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/C:/eclipse/plugins/org.slf4j.impl.log4j12_1.7.2.v20131105-2200.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/C:/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
3. Download Apache-ant-1.9.6: (optional step) extract it into a folder in C drive.
4. Download Hadoop-2.6.x:
a. Put extracted Hadoop-2.6.x files into D drive.
b. Download “hadoop-common-2.6.0-bin-master.” Paste all these files into the
“bin” folder of Hadoop-2.6.x.
c. Create a “data” folder inside Hadoop-2.6.x, and also create two more folders in
the “data” folder as “data” and “name.”
d. Create a folder to store temporary data during execution of a project, such as
“D:\hadoop\temp.”
e. Create a log folder, such as “D:\hadoop\userlog”
f. Go to Hadoop-2.6.x\etc\hadoop and edit four files:
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.namenode.name.dir</name><value>/hadoop-2.6.0/data/name</value><final>true</final></property>
<property><name>dfs.datanode.data.dir</name><value>/hadoop-2.6.0/data/data</value><final>true</final></property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/hadoop-2.6.0/share/hadoop/mapreduce/*,
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/common/*,
/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>D:\hadoop\userlog</value><final>true</final>
</property>
<property><name>yarn.nodemanager.local-dirs</name><value>D:\hadoop\temp\nm-local-dir</value></property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value> 600</value>
</property>
<property><name>yarn.application.classpath</name>
<value>/hadoop-2.6.0/,/hadoop-2.6.0/share/hadoop/common/*,/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
/hadoop-2.6.0/share/hadoop/mapreduce/*,/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,/hadoop-2.6.0/share/hadoop/yarn/lib/*</value>
</property>
</configuration>
g. Go to the location: “Hadoop-2.6.0->etc->hadoop,” and edit “hadoop-env.cmd” by
writing
set JAVA_HOME=C:\java\jdk1.8.0_91
h. Set environmental variables: Do: My computer -> Properties -> Advance system
settings -> Advanced -> Environmental variables
i. User variables:
• Variable: HADOOP_HOME
• Value: D:\hadoop-2.6.0
ii. System variable
• Variable: Path
• Value: D:\hadoop-2.6.0\bin
D:\hadoop-2.6.0\sbin
D:\hadoop-2.6.0\share\hadoop\common\*
D:\hadoop-2.6.0\share\hadoop\hdfs
D:\hadoop-2.6.0\share\hadoop\hdfs\lib\*
D:\hadoop-2.6.0\share\hadoop\hdfs\*
D:\hadoop-2.6.0\share\hadoop\yarn\lib\*
D:\hadoop-2.6.0\share\hadoop\yarn\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\lib\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\*
D:\hadoop-2.6.0\share\hadoop\common\lib\*
i. Check on cmd (for example, run hadoop version).
j. Format the name-node: On cmd, go to the location “Hadoop-2.6.0\bin” by writing
“cd hadoop-2.6.0\bin” and then “hdfs namenode -format”
k. Start Hadoop. Go to the location: “D:\hadoop-2.6.0\sbin.” Run the following files as
administrator: “start-dfs.cmd” and “start-yarn.cmd”. The command sequence below summarizes these steps.
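For reference, the whole sequence can be run from an administrator command prompt as follows (a minimal sketch; the drive letter and folder names assume the layout used above):
cd D:\hadoop-2.6.0\bin
hdfs namenode -format
cd ..\sbin
start-dfs.cmd
start-yarn.cmd
jps
jps is the JDK tool that lists running Java processes; once both scripts have started, it should show daemons such as NameNode, DataNode, ResourceManager and NodeManager. By default, the NameNode web UI is then reachable at http://localhost:50070 and the YARN ResourceManager UI at http://localhost:8088.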
How to create a new MapReduce project in Eclipse
1. Open Eclipse
2. Click File -> New Project -> Java project
3. Click next and add external JARs for MapReduce.
Copy all the JAR files from the following locations under “D:\hadoop-2.6.0”:
a. share\hadoop\common\lib
b. share\hadoop\mapreduce
c. share\hadoop\mapreduce\lib and share\hadoop\yarn
d. share\hadoop\yarn\lib
4. Connect DFS in Eclipse
Eclipse -> Window -> Perspective -> Open Perspective -> Other -> MapReduce -> Click
OK.
See a bar at the bottom. Click on Map/Reduce locations.
Right click on blank space, then click on “Edit setting,” and see the following screen.
a. Set the following:
i. MapReduce (V2) Master
• Host: localhost
• Port: 9001
ii. DFS Master
• Host: localhost
• Port: 50071
b. Click finish
Result:-
Thus, the Hadoop environment was installed and configured.
EX.2 Implementation of word count/frequency using MapReduce
Aim:-
To implement word count program using MapReduce.
Program:-
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
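A typical way to compile and run the job from the command line is shown below (a sketch; the jar name and the HDFS input/output paths are placeholders to adapt to your setup, and the first command assumes HADOOP_CLASSPATH points at the JDK's tools.jar so the bundled compiler can be used):
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
hadoop jar wc.jar WordCount /user/input /user/output
hdfs dfs -cat /user/output/part-r-00000
The last command prints each word together with its count, one pair per line.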
Result:-
Thus, the word count program was executed using Hadoop environment.
EX.3 Implementation of MR program using Weather dataset
Aim :-
To write code to find the maximum temperature per year from a sensor temperature
dataset, using the Hadoop MapReduce framework.
Procedure:-
Implement the Mapper and Reducer programs for finding the maximum temperature in Java.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Mapper class
class MaxTemperatureMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19);
int airTemperature;
if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
airTemperature = Integer.parseInt(line.substring(88, 92));
} else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
String quality = line.substring(92, 93);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
//Reducer class
class MaxTemperatureReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
//Driver Class
public class MaxTemperature {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.submit();
}
}
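Note that job.submit() returns immediately without waiting for the job; job.waitForCompletion(true) can be used instead to block until the job finishes and report progress. A typical run looks like the following (a sketch; the jar name and HDFS paths are placeholders, and the input is assumed to be in the fixed-width NCDC record format that the mapper parses, with the year starting at offset 15 and the temperature at offset 87):
hadoop jar maxtemp.jar MaxTemperature /user/weather/input /user/weather/output
hdfs dfs -cat /user/weather/output/part-r-00000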
Result:-
Thus, the maximum temperature per year of the weather dataset was obtained using MapReduce.
EX.4 INSTALL, CONFIGURE AND RUN SPARK
Aim:-
To install and configure Spark on a standalone machine.
Procedure:-
Step 1: Install Java 8
Apache Spark requires Java 8. You can check to see if Java is installed using the
command prompt.
Open the command line by clicking Start > type cmd > click Command Prompt.
Type the following command in the command prompt:
java -version
If Java is installed, it will respond with the installed version details.
Step 2: Install Python
1. To install the Python package manager, navigate to https://www.python.org/ in your
web browser.
2. Mouse over the Download menu option and click Python 3.8.3. 3.8.3 is the latest
version at the time of writing the article.
3. Once the download finishes, run the file.
4. Near the bottom of the first setup dialog box, check off Add Python 3.8 to PATH. Leave
the other box checked.
5. Next, click Customize installation.
6. You can leave all boxes checked at this step, or you can uncheck the options you do
not want.
7. Click Next.
8. Select the box Install for all users and leave other boxes as they are.
9. Under Customize install location, click Browse and navigate to the C drive. Add a new
folder and name it Python.
10. Select that folder and click OK.
11. Click Install, and let the installation complete.
12. When the installation completes, click the Disable path length limit option at the bottom
and then click Close.
13. If you have a command prompt open, restart it. Verify the installation by checking the
version of Python:
python --version
The output should print Python 3.8.3.
Step 3: Download Apache Spark
1. Open a browser and navigate to https://spark.apache.org/downloads.html.
2. Under the Download Apache Spark heading, there are two drop-down menus. Use the
current non-preview version.
In our case, in Choose a Spark release drop-down menu select 2.4.5
In the second drop-down Choose a package type, leave the selection Pre-built for Apache
Hadoop 2.7.
3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.
4. A page with a list of mirrors loads where you can see different servers to download
from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This ensures
you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably in
a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512
4. Change the username to your username. The system displays a long alphanumeric
code, along with the message Certutil: -hashfile completed successfully.
5. Compare the code to the one you opened in a new browser tab. If they match, your
download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command line,
enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system
(e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the
necessary files inside.
Step 6: Add winutils.exe File
Download the winutils.exe file for the underlying Hadoop version for the Spark installation
you downloaded.
1. Navigate to this URL https://github.com/cdarlint/winutils and inside the bin folder,
locate winutils.exe, and click it.
2. Find the Download button on the right side to download the file.
3. Now, create new folders Hadoop and bin on C: using Windows Explorer or the
Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to
your system PATH. It allows you to run the Spark shell directly from a command prompt
window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment
Variables and then click New in the next window.
4. For Variable Name type SPARK_HOME.
5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If you
changed the folder path, use that one instead.
6. In the top box, click the Path entry, then click Edit. Be careful with editing the system
path. Avoid deleting any entries already on the list.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark folder
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to avoid possible
issues with the path.
9. Repeat this process for Hadoop and Java.
• For Hadoop, the variable name is HADOOP_HOME and for the value use the path
of the folder you created earlier: C:\hadoop. Add C:\hadoop\bin to the Path variable
field, but we recommend using %HADOOP_HOME%\bin.
• For Java, the variable name is JAVA_HOME and for the value use the path to your
Java JDK directory (in our case it’s C:\Program Files\Java\jdk1.8.0_251).
10. Click OK to close all open windows.
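Equivalently, the same variables can be set from an elevated command prompt with the built-in setx command (a sketch; adjust the paths to your actual install locations, and note that setx only affects command prompt windows opened afterwards):
setx SPARK_HOME "C:\Spark\spark-2.4.5-bin-hadoop2.7"
setx HADOOP_HOME "C:\hadoop"
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_251"
The bin subfolders still need to be appended to the Path variable as described above.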
Step 8: Launch Spark
1. Open a new command-prompt window using the right-click and Run as administrator:
2. To start Spark, enter:
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin\spark-shell
If you set the environment path correctly, you can type spark-shell to launch Spark.
3. The system should display several lines indicating the status of the application. You
may get a Java pop-up. Select Allow access to continue.
Finally, the Spark logo appears, and the prompt displays the Scala shell.
4. Open a web browser and navigate to http://localhost:4040/.
5. You can replace localhost with the name of your system.
6. You should see an Apache Spark shell Web UI. The example below shows the
Executors page.
7. To exit Spark and close the Scala shell, press ctrl-d in the command-prompt window.
Result:-
Thus, Spark was installed and configured successfully.
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement word count / frequency programs using Spark.
Program:-
package org.apache.spark.examples;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public final class WordCount {
private static final Pattern SPACE = Pattern.compile(" ");
public static void main(String[] args) throws Exception {
if (args.length < 1) {
System.err.println("Usage: WordCount <file>");
System.exit(1);
}
final SparkConf sparkConf = new SparkConf().setAppName("WordCount");
final JavaSparkContext ctx = new JavaSparkContext(sparkConf);
final JavaRDD<String> lines = ctx.textFile(args[0], 1);
final JavaRDD<String> words = lines.flatMap(s ->
Arrays.asList(SPACE.split(s)).iterator());
final JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new
Tuple2<>(s, 1));
final JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) ->
i1 + i2);
final List<Tuple2<String, Integer>> output = counts.collect();
for (Tuple2 tuple : output) {
System.out.println(tuple._1() + ": " + tuple._2());
}
ctx.stop();
}}
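The class can be packaged into a jar and launched with spark-submit (a sketch; the jar name and the input file path are placeholders, and local[2] matches the standalone setup from the previous exercise):
spark-submit --class org.apache.spark.examples.WordCount --master local[2] wordcount.jar C:\data\input.txt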
Result:-
Thus, the word count program was executed successfully using Spark.
EX.6 IMPLEMENT MACHINE LEARNING USING SPARK
Aim:-
To implement machine learning using Spark.
Procedure:-
Spark MLlib is a module on top of Spark Core that provides machine learning primitives
as APIs.
1. Setting the Dependencies
First, we have to define the following dependency in Maven to pull the relevant libraries:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.4.3</version>
<scope>provided</scope>
</dependency>
And we need to initialize the SparkContext to work with Spark APIs:
SparkConf conf = new SparkConf()
.setAppName("Main")
.setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
2. Loading the Data
First things first, we should download the data, which is available as a text file in CSV
format. Then we have to load this data in Spark:
String dataFile = "data\\iris.data";
JavaRDD<String> data = sc.textFile(dataFile);
Spark MLlib offers several data types, both local and distributed, to represent the input
data and corresponding labels. The simplest of these data types is Vector:
JavaRDD<Vector> inputData = data
.map(line -> {
String[] parts = line.split(",");
double[] v = new double[parts.length - 1];
for (int i = 0; i < parts.length - 1; i++) {
v[i] = Double.parseDouble(parts[i]);
}
return Vectors.dense(v);
});
A training example typically consists of multiple input features and a label, represented
by the class LabeledPoint:
Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);
JavaRDD<LabeledPoint> labeledData = data
.map(line -> {
String[] parts = line.split(",");
double[] v = new double[parts.length - 1];
for (int i = 0; i < parts.length - 1; i++) {
v[i] = Double.parseDouble(parts[i]);
}
return new LabeledPoint(map.get(parts[parts.length - 1]), Vectors.dense(v));
});
3. Exploratory Data Analysis
MultivariateStatisticalSummary summary = Statistics.colStats(inputData.rdd());
System.out.println("Summary Mean:");
System.out.println(summary.mean());
System.out.println("Summary Variance:");
System.out.println(summary.variance());
System.out.println("Summary Non-zero:");
System.out.println(summary.numNonzeros());
Another important metric to analyze is the correlation between features in the input data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());
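To complete the aim, the labelled data can also be used to train and evaluate a classifier. The following is a minimal sketch going one step beyond the procedure above: it assumes an 80/20 train/test split, MLlib's LogisticRegressionWithLBFGS, and the additional imports org.apache.spark.mllib.classification.*, org.apache.spark.mllib.evaluation.MulticlassMetrics and scala.Tuple2; the labeledData RDD comes from the loading step above.
// Split the labelled data into training and test sets.
JavaRDD<LabeledPoint>[] splits = labeledData.randomSplit(new double[] {0.8, 0.2}, 11L);
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];
// Train a three-class logistic regression model on the training split.
LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
.setNumClasses(3)
.run(trainingData.rdd());
// Pair each test point's prediction with its true label and compute the accuracy.
JavaPairRDD<Object, Object> predictionAndLabels = testData.mapToPair(p ->
new Tuple2<>(model.predict(p.features()), p.label()));
MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
System.out.println("Model Accuracy on Test Data: " + metrics.accuracy());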
Result:-
Thus, machine learning using Spark MLlib was implemented successfully.
EX.7 IMPLEMENTATION OF LINEAR AND LOGISTIC REGRESSION USING R
Aim:-
To implement Linear and Logistic Regression using R.
Procedure:-
Step 1: Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
Step 2: Create a relationship model using the lm() functions in R.
Step 3: Find the coefficients from the model created and create the mathematical equation
using these coefficients.
Step 4: Get a summary of the relationship model to know the average error in prediction,
also called residuals.
Step 5: To predict the weight of new persons, use the predict() function in R.
Program:-
Linear Regression:
Linear.r
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
print(summary(relation))
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",
abline(lm(x~y)),cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
# Save the file.
dev.off()
Logistic Regression:
logistic.r
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))
Result:-
Thus, Linear and Logistic Regression was implemented using R Programming
successfully.
EX.8 IMPLEMENTATION OF DECISION TREE CLASSIFIER USING R
Aim:-
To implement Decision Tree Classifier using R
Procedure:-
Step 1: Install the party package.
Step 2: Input the data.
Step 3: Create the decision tree using the ctree() function.
Step 4: Display and print the result.
Program:-
# Load the party package. It will automatically load other dependent packages.
library(party)
# Create the input data frame.
input.dat <- readingSkills[c(1:105),]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
output.tree <- ctree(
nativeSpeaker ~ age + shoeSize + score,
data = input.dat)
# Plot the tree.
plot(output.tree)
# Save the file.
dev.off()
Result:-
Thus, the Decision tree classifier is implemented using R.
EX.9 IMPLEMENTATION OF CLUSTERING TECHNIQUE
Aim:-
To implement Clustering Technique (Pam) using R
Procedure:-
Step 1: Include the cluster package.
Step 2: Apply the PAM method to the cars dataset.
Step 3: Compare the hclust and pam results using the table() function.
Step 4: Plot the result.
Program:-
Pam.r
library(cluster)
# cars.dist and groups.3 are assumed to come from an earlier hierarchical
# clustering step on the cars data, for example:
#   cars.dist <- dist(scale(cars[,-1]))        # distance matrix on the numeric columns
#   groups.3  <- cutree(hclust(cars.dist), 3)  # three groups from hclust
cars.pam = pam(cars.dist,3)
names(cars.pam)
# Compare the hclust groups with the pam clustering.
table(groups.3,cars.pam$clustering)
# Cars placed in different clusters by the two methods.
cars$Car[groups.3 != cars.pam$clustering]
# Medoid car of each cluster.
cars$Car[cars.pam$id.med]
plot(cars.pam)
Result:-
Thus, the clustering using Pam was implemented using R.
EX.10 IMPLEMENTATION OF DATA VISUALIZATION
Aim:-
To implement the Data Visualization using R Program.
Procedure:-
Step 1: Read the input
Step 2: Visualize the data using
i) Pie chart
ii) 3D Pie chart
iii) Box plot
iv) Histogram
v) Line chart
vi) Scatter plot
Program :-
1.Piechart.r
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Give the chart file a name.
png(file = "city_title_colours.jpg")
# Plot the chart with title and rainbow color pallet.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
# Save the file.
dev.off()
2.ThreeDPiechart.r
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10,53)
lbl <- c("London","New York","Singapore","Mumbai")
# Give the chart file a name.
png(file = "3d_pie_chart.jpg")
# Plot the chart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Countries ")
# Save the file.
dev.off()
3.Boxplot.r
# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()
4.Histogram.r
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Give the chart file a name.
png(file = "histogram.png")
# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")
# Save the file.
dev.off()
5.Linechart.r
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "line_chart_label_colored.jpg")
# Plot the line chart.
plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
main = "Rain fall chart")
# Save the file.
dev.off()
6.Scatterplot.r
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
# Save the file.
dev.off()
Result:-
Thus, the different data visualization techniques were implemented using R.
EX.11 Implementation of an Application
Aim:-
To implement survival analysis using R
Procedure:-
Step 1: Install survival package
Step 2: Display input to check details
Step 3: Create survival object
Step 4: Display the output
Program:-
Survival.r
# Load the library.
library("survival")
# Print first few rows.
print(head(pbc))
# Create the survival object.
survfit(Surv(pbc$time,pbc$status == 2)~1)
# Give the chart file a name.
png(file = "survival.png")
# Plot the graph.
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))
# Save the file.
dev.off()
Result:-
Thus, the survival analysis was implemented using R.

More Related Content

Similar to DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis

Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝recast203
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Titus Damaiyanti
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Single node setup
Single node setupSingle node setup
Single node setupKBCHOW123
 
Dev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSDev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSRonny Trommer
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Config/BuildConfig
Config/BuildConfigConfig/BuildConfig
Config/BuildConfigVijay Shukla
 
Zero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleZero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleStein Inge Morisbak
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOpsОмские ИТ-субботники
 
R Data Access from hdfs,spark,hive
R Data Access  from hdfs,spark,hiveR Data Access  from hdfs,spark,hive
R Data Access from hdfs,spark,hivearunkumar sadhasivam
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 

Similar to DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis (20)

Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝
 
Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Single node setup
Single node setupSingle node setup
Single node setup
 
Dev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSDev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMS
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Config/BuildConfig
Config/BuildConfigConfig/BuildConfig
Config/BuildConfig
 
Config BuildConfig
Config BuildConfigConfig BuildConfig
Config BuildConfig
 
Zero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleZero Downtime Deployment with Ansible
Zero Downtime Deployment with Ansible
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
 
R Data Access from hdfs,spark,hive
R Data Access  from hdfs,spark,hiveR Data Access  from hdfs,spark,hive
R Data Access from hdfs,spark,hive
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Ex-8-hive.pptx
Ex-8-hive.pptxEx-8-hive.pptx
Ex-8-hive.pptx
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 

Recently uploaded

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis

  • 1. 1 EX.1 Hadoop Installation and Configuration Aim:- To install and Configure Hadoop Environment. Procedure:- 1. Install Java 8: a. Download Java 8 a. Set environmental variables: i. User variable:  Variable: JAVA_HOME  Value: C:java ii. System variable:  Variable: PATH  Value: C:javabin b. Check on cmd, see below: 2. Install Eclipse Mars. Download it from the link: https://eclipse.org/downloads/ and extract it into C drive. a. Set environmental variables: i. User variable:  Variable: ECLIPSE_HOME  Value: C:eclipse ii. System variable:  Variable: PATH  Value: C:eclipse bin b. Download “hadoop2x-eclipse-plugin-master.”Three Jar files on the path “hadoop2x- eclipse-plugin-masterrelease.” Copy these three jar files and pate them into “C:eclipsedropins.” c. Download “slf4j-1.7.21.” Copy Jar files from this folder and paste them to “C:eclipseplugins”. This step may create errors; when you will execute Eclipse, you will see errors like org.apa…..jar file in multiple places. So, now delete these files from all the places except one. Errors
  • 2. 2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/eclipse/plugins/org.slf4j.impl.log4j12_1.7.2.v20131105- 2200.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/hadoop- 2.6.0/share/hadoop/common/lib/slf4j-log4j12- 1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 3. Download Apache-ant-1.9.6: (optional step) extract it into a folder in C drive. 4. Download Hadoop-2.6.x: a. Put extracted Hadoop-2.6.x files into D drive. b. Download “hadoop-common-2.6.0-bin-master. Paste all these files into the “bin” folder of Hadoop-2.6.x. c. Create a “data” folder inside Hadoop-2.6.x, and also create two more folders in the “data” folder as “data” and “name.” d. Create a folder to store temporary data during execution of a project, such as “D:hadooptemp.” e. Create a log folder, such as “D:hadoopuserlog” f. Go to Hadoop-2.6.x etc Hadoop and edit four files: i. core-site.xml ii. hdfs-site.xml iii. mapred.xml iv. yarn.xml core-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
  • 3. 3 limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>hadoop.tmp.dir</name> <value>D:hadooptemp</value> </property> <property> <name>fs.default.name</name> <value>hdfs://localhost:50071</value> </property> </configuration> hdfs-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property><name>dfs.replication</name><value>1</value></property> <property> <name>dfs.namenode.name.dir</name><value>/hadoop- 2.6.0/data/name</value><final>true</final></property> <property><name>dfs.datanode.data.dir</name><value>/hadoop- 2.6.0/data/data</value><final>true</final> </property> </configuration> mapred.xml <?xml version="1.0"?>
  • 4. 4 <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> </property> <property> <name>mapreduce.application.classpath</name> <value>/hadoop-2.6.0/share/hadoop/mapreduce/*, /hadoop-2.6.0/share/hadoop/mapreduce/lib/*, /hadoop-2.6.0/share/hadoop/common/*, /hadoop-2.6.0/share/hadoop/common/lib/*, /hadoop-2.6.0/share/hadoop/yarn/*, /hadoop-2.6.0/share/hadoop/yarn/lib/*, /hadoop-2.6.0/share/hadoop/hdfs/*, /hadoop-2.6.0/share/hadoop/hdfs/lib/*, </value> </property> </configuration> yarn-site.xml <?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value>
  • 5. 5 </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.nodemanager.log-dirs</name> <value>D:hadoopuserlog</value><final>true</final> </property> <property><name>yarn.nodemanager.local-dirs</name><value>D:hadooptempnm- local-dir</value></property> <property> <name>yarn.nodemanager.delete.debug-delay-sec</name> <value> 600</value> </property> <property><name>yarn.application.classpath</name> <value>/hadoop-2.6.0/,/hadoop-2.6.0/share/hadoop/common/*,/hadoop- 2.6.0/share/hadoop/common/lib/*,/hadoop-2.6.0/share/hadoop/hdfs/*,/hadoop- 2.6.0/share/hadoop/hdfs/lib/*,/hadoop-2.6.0/share/hadoop/mapreduce/*,/hadoop- 2.6.0/share/hadoop/mapreduce/lib/*,/hadoop-2.6.0/share/hadoop/yarn/*,/hadoop- 2.6.0/share/hadoop/yarn/lib/*</value> </property> </configuration> g. Go to the location: “Hadoop-2.6.0->etc->hadoop,” and edit “hadoop-env.cmd” by writing set JAVA_HOME=C:javajdk1.8.0_91 h. Set environmental variables: Do: My computer -> Properties -> Advance system settings -> Advanced -> Environmental variables i. User variables:  Variable: HADOOP_HOME  Value: D:hadoop-2.6.0 ii. System variable  Variable: Path  Value: D:hadoop-2.6.0bin D:hadoop-2.6.0sbin D:hadoop-2.6.0sharehadoopcommon* D:hadoop-2.6.0sharehadoophdfs D:hadoop-2.6.0sharehadoophdfslib*
  • 6. 6 D:hadoop-2.6.0sharehadoophdfs* D:hadoop-2.6.0sharehadoopyarnlib* D:hadoop-2.6.0sharehadoopyarn* D:hadoop-2.6.0sharehadoopmapreducelib* D:hadoop-2.6.0sharehadoopmapreduce* D:hadoop-2.6.0sharehadoopcommonlib* i. Check on cmd; j. Format name-node: On cmd go to the location “Hadoop-2.6.0 bin” by writing on cmd “cd hadoop-2.6.0.bin” and then “hdfs namenode –format” k. Start Hadoop. Go to the location: “D:hadoop-2.6.0sbin.” Run the following files as administrator “start-dfs.cmd” and “start-yarn.cmd” How to create a new MapReduce project in Eclipse 1. Open Ellipse 2. Click File -> New Project -> Java project
  • 7. 7 3. Click next and add external Jars for MapReduce. Copy all the Jar files from the locations “D:hadoop-2.6.0” a. sharehadoopcommonlib b. sharehadoopmapreduce c. sharehadoopmapreducelib sharehadoopyarn d. sharehadoopyarnlib 4. Connect DFS in Eclipse
  • 8. 8 Eclipse -> Window -> Perspective -> Open Perspective -> Other -> MapReduce -> Click OK. See a bar at the bottom. Click on Map/Reduce locations. Right click on blank space, then click on “Edit setting,” and see the following screen. a. Set the following: i. MapReduce (V2) Master  Host: localhost  Port: 9001 ii. DFS Master  Host: localhost  Port: 50071 b. Click finish Result:- Thus, the Hadoop Environment was installed and Configured.
  • 9. 9 EX.2 Implementation of word count/frequency using MapReduce Aim:- To implement word count program using MapReduce. Program:- import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable();
  • 10. 10 public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = Job.getInstance(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } } Result:- Thus, the word count program was executed using Hadoop environment.
  • 11. 11 EX.3 Implementation of MR program using Weather dataset Aim :- To write a code to find maximum temperature per year from sensor temperature data sheet, using hadoop mapreduce framework. Procedure:- Implement Mapper and Reducer program for finding Maximum temperature in java import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; //Mapper class class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> { private static final int MISSING = 9999; @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); String year = line.substring(15, 19); int airTemperature; if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs airTemperature = Integer.parseInt(line.substring(88, 92)); } else { airTemperature = Integer.parseInt(line.substring(87, 92)); } String quality = line.substring(92, 93);
  • 12. 12 if (airTemperature != MISSING && quality.matches("[01459]")) { context.write(new Text(year), new IntWritable(airTemperature)); } } } //Reducer class class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> { @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int maxValue = Integer.MIN_VALUE; for (IntWritable value : values) { maxValue = Math.max(maxValue, value.get()); } context.write(key, new IntWritable(maxValue)); } } //Driver Class public class MaxTemperature { public static void main(String[] args) throws Exception { if (args.length != 2) { System.err.println("Usage: MaxTemperature <input path=""> <output path>"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setJarByClass(MaxTemperature.class); job.setJobName("Max temperature"); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(MaxTemperatureMapper.class); job.setReducerClass(MaxTemperatureReducer.class); job.setOutputKeyClass(Text.class);
  • 14. 14 EX.4 INSTALL, CONFIGURE AND RUN SPARK Aim:- To install and configure spark in standalone machine. Procedure:- Step 1: Install Java 8 Apache Spark requires Java 8. You can check to see if Java is installed using the command prompt. Open the command line by clicking Start > type cmd > click Command Prompt. Type the following command in the command prompt: java -version If Java is installed, it will respond with the following output: Step 2: Install Python 1. To install the Python package manager, navigate to https://www.python.org/ in your web browser. 2. Mouse over the Download menu option and click Python 3.8.3. 3.8.3 is the latest version at the time of writing the article. 3. Once the download finishes, run the file.
  • 15. 15 4. Near the bottom of the first setup dialog box, check off Add Python 3.8 to PATH. Leave the other box checked. 5. Next, click Customize installation. 6. You can leave all boxes checked at this step, or you can uncheck the options you do not want.
  • 16. 16 7. Click Next. 8. Select the box Install for all users and leave other boxes as they are. 9. Under Customize install location, click Browse and navigate to the C drive. Add a new folder and name it Python. 10. Select that folder and click OK. 11. Click Install, and let the installation complete. 12. When the installation completes, click the Disable path length limit option at the bottom and then click Close. 13. If you have a command prompt open, restart it. Verify the installation by checking the version of Python: python --version The output should print Python 3.8.3. Step 3: Download Apache Spark 1. Open a browser and navigate to https://spark.apache.org/downloads.html. 2. Under the Download Apache Spark heading, there are two drop-down menus. Use the current non-preview version. In our case, in Choose a Spark release drop-down menu select 2.4.5
17
In the second drop-down, Choose a package type, leave the selection Pre-built for Apache Hadoop 2.7.
3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.
4. A page with a list of mirrors loads where you can see different servers to download from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This ensures you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably in a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512
4. Change the username to your username. The system displays a long alphanumeric code, along with the message Certutil: -hashfile completed successfully.
18
5. Compare the code to the one you opened in a new browser tab. If they match, your download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command line, enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system (e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the necessary files inside.
Step 6: Add the winutils.exe File
Download the winutils.exe file for the underlying Hadoop version of the Spark installation you downloaded.
1. Navigate to https://github.com/cdarlint/winutils and, inside the bin folder of the matching Hadoop version, locate winutils.exe and click it.
2. Find the Download button on the right side to download the file.
3. Now, create the folders C:\hadoop and C:\hadoop\bin using Windows Explorer or the Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
19
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to your system PATH. It allows you to run the Spark shell directly from a command prompt window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment Variables and then click New in the next window.
4. For Variable Name type SPARK_HOME.
5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If you changed the folder path, use that one instead.
20
6. In the top box, click the Path entry, then click Edit. Be careful with editing the system path. Avoid deleting any entries already on the list.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark bin folder, C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to avoid possible issues with the path.
21
9. Repeat this process for Hadoop and Java.
• For Hadoop, the variable name is HADOOP_HOME and for the value use the path of the folder you created earlier: C:\hadoop. Add C:\hadoop\bin to the Path variable field; we recommend using %HADOOP_HOME%\bin.
• For Java, the variable name is JAVA_HOME and for the value use the path to your Java JDK directory (in our case it is C:\Program Files\Java\jdk1.8.0_251).
10. Click OK to close all open windows.
Step 8: Launch Spark
1. Open a new command-prompt window using right-click and Run as administrator.
2. To start Spark, enter:
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin\spark-shell
If you set the environment path correctly, you can type spark-shell to launch Spark.
3. The system should display several lines indicating the status of the application. You may get a Java pop-up. Select Allow access to continue.
22
Finally, the Spark logo appears, and the prompt displays the Scala shell.
4. Open a web browser and navigate to http://localhost:4040/.
5. You can replace localhost with the name of your system.
6. You should see an Apache Spark shell Web UI. The example below shows the Executors page.
7. To exit Spark and close the Scala shell, press Ctrl-D in the command-prompt window.
Result:-
Thus, Spark was installed and configured successfully.
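Optional cross-check for Step 7: the short Java program below simply prints the three variables this experiment sets; run it from a newly opened command prompt so the updated environment is picked up. It is a minimal verification sketch only, and the class name EnvCheck is an illustrative choice.

// Prints the environment variables configured in Step 7.
// A null value means the variable is not yet visible to newly started processes.
public class EnvCheck {
  public static void main(String[] args) {
    for (String name : new String[] {"JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"}) {
      System.out.println(name + " = " + System.getenv(name));
    }
  }
}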
23
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement word count / frequency programs using Spark.
Program:-
package org.apache.spark.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class WordCount {

  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: WordCount <file>");
      System.exit(1);
    }

    final SparkConf sparkConf = new SparkConf().setAppName("WordCount");
    final JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    final JavaRDD<String> lines = ctx.textFile(args[0], 1);
    // In Spark 2.x, flatMap expects an Iterator, hence the .iterator() call.
    final JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
    final JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
    final JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) -> i1 + i2);

    final List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<String, Integer> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    ctx.stop();
  }
}
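The program above expects to be packaged into a jar and submitted with the input file as its argument. For a quick check on the standalone setup from EX.4, a local-mode variant can be used instead; the sketch below is such a variant, where "local[*]" uses all cores of the machine and the input path C:\Spark\input.txt is an illustrative placeholder, not a file created earlier in this record.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Minimal local-mode word count for checking the Spark setup without a cluster.
public final class WordCountLocal {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("WordCountLocal").setMaster("local[*]");
    try (JavaSparkContext ctx = new JavaSparkContext(conf)) {
      ctx.textFile("C:\\Spark\\input.txt")
         .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
         .mapToPair(word -> new scala.Tuple2<>(word, 1))
         .reduceByKey(Integer::sum)
         .collect()
         .forEach(pair -> System.out.println(pair._1() + ": " + pair._2()));
    }
  }
}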
24
Result:-
Thus, the word count program was executed successfully.
25
EX.6 IMPLEMENT MACHINE LEARNING USING SPARK
Aim:-
To implement machine learning using Spark.
Procedure:-
Spark MLlib is a module on top of Spark Core that provides machine learning primitives as APIs.
1. Setting the Dependencies
First, we have to define the following dependency in Maven to pull the relevant libraries:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
</dependency>
And we need to initialize the SparkContext to work with the Spark APIs:
SparkConf conf = new SparkConf()
    .setAppName("Main")
    .setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
2. Loading the Data
First things first, we should download the data, which is available as a text file in CSV format. Then we have to load this data in Spark:
String dataFile = "data\\iris.data";
JavaRDD<String> data = sc.textFile(dataFile);
Spark MLlib offers several data types, both local and distributed, to represent the input data and the corresponding labels. The simplest of these data types is Vector:
JavaRDD<Vector> inputData = data
    .map(line -> {
        String[] parts = line.split(",");
        double[] v = new double[parts.length - 1];
        for (int i = 0; i < parts.length - 1; i++) {
            v[i] = Double.parseDouble(parts[i]);
        }
        return Vectors.dense(v);
    });
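A quick way to confirm the CSV parsing before moving on is to print a handful of the parsed feature vectors; the fragment below assumes the inputData RDD defined above and is only a sanity-check sketch.

// Print the first few parsed feature vectors; for the iris data each should show
// four numeric values (sepal length, sepal width, petal length, petal width).
for (Vector v : inputData.take(5)) {
    System.out.println(v);
}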
26
A training example typically consists of multiple input features and a label, represented by the class LabeledPoint:
Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);

JavaRDD<LabeledPoint> labeledData = data
    .map(line -> {
        String[] parts = line.split(",");
        double[] v = new double[parts.length - 1];
        for (int i = 0; i < parts.length - 1; i++) {
            v[i] = Double.parseDouble(parts[i]);
        }
        return new LabeledPoint(map.get(parts[parts.length - 1]), Vectors.dense(v));
    });
3. Exploratory Data Analysis
Summary statistics (mean, variance and non-zero counts) for each input feature can be obtained from colStats:
MultivariateStatisticalSummary summary = Statistics.colStats(inputData.rdd());
System.out.println("Summary Mean:");
System.out.println(summary.mean());
System.out.println("Summary Variance:");
System.out.println(summary.variance());
System.out.println("Summary Non-zero:");
System.out.println(summary.numNonzeros());
Another important metric to analyze is the correlation between features in the input data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());
Result:-
Thus, machine learning using Spark MLlib was implemented successfully.
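The labeled data above can also be fed to one of MLlib's classifiers. The sketch below is an optional extension, not part of the recorded procedure: it splits labeledData into training and test sets and fits a multi-class logistic regression model (classes from org.apache.spark.mllib.classification). The 80/20 split ratio and variable names are illustrative choices.

// Optional extension: train and evaluate a simple classifier on labeledData.
JavaRDD<LabeledPoint>[] splits = labeledData.randomSplit(new double[] {0.8, 0.2});
JavaRDD<LabeledPoint> training = splits[0].cache();
JavaRDD<LabeledPoint> test = splits[1];

// Three classes: Iris-setosa, Iris-versicolor, Iris-virginica.
LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
    .setNumClasses(3)
    .run(training.rdd());

// Fraction of test examples whose predicted label matches the true label.
double accuracy = test.filter(p -> model.predict(p.features()) == p.label()).count()
    / (double) test.count();
System.out.println("Model accuracy on test data: " + accuracy);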
27
EX.7 IMPLEMENTATION OF LINEAR AND LOGISTIC REGRESSION USING R
Aim:-
To implement Linear and Logistic Regression using R.
Procedure:-
Step 1: Carry out the experiment of gathering a sample of observed values of height and the corresponding weight.
Step 2: Create a relationship model using the lm() function in R.
Step 3: Find the coefficients from the model created and create the mathematical equation using these.
Step 4: Get a summary of the relationship model to know the average error in prediction (also called residuals).
Step 5: To predict the weight of new persons, use the predict() function in R.
Program:-
Linear Regression: Linear.r
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
print(summary(relation))
# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
28
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",
  abline(lm(x~y)),cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
# Save the file.
dev.off()

Logistic Regression: logistic.r
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))

Result:-
Thus, Linear and Logistic Regression were implemented using R programming successfully.
29
EX.8 IMPLEMENTATION OF DECISION TREE CLASSIFIER USING R
Aim:-
To implement a Decision Tree Classifier using R.
Procedure:-
Step 1: Install the party package.
Step 2: Input the data.
Step 3: Create the decision tree using the ctree() function.
Step 4: Display and print the result.
Program:-
# Load the party package. It will automatically load other dependent packages.
library(party)
# Create the input data frame (readingSkills ships with the party package).
input.dat <- readingSkills[c(1:105),]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)
# Plot the tree.
plot(output.tree)
# Save the file.
dev.off()
Result:-
Thus, the Decision Tree Classifier was implemented using R.
30
EX.9 IMPLEMENTATION OF CLUSTERING TECHNIQUE
Aim:-
To implement a Clustering Technique (PAM) using R.
Procedure:-
Step 1: Include the cluster package.
Step 2: Apply the PAM method to the car dataset.
Step 3: Compare the hclust and pam results using the table method.
Step 4: Plot the result.
Program:- Pam.r
library(cluster)
# Note: this script assumes cars.dist (a dissimilarity matrix for the car dataset,
# e.g. from dist()) and groups.3 (a 3-group hierarchical solution, e.g. from
# cutree(hclust(cars.dist), 3)) have already been created in the session.
cars.pam = pam(cars.dist,3)
names(cars.pam)
# Compare the hierarchical groups with the PAM clustering.
table(groups.3,cars.pam$clustering)
# Cars assigned to different groups by the two methods.
cars$Car[groups.3 != cars.pam$clustering]
# Medoid of each cluster.
cars$Car[cars.pam$id.med]
plot(cars.pam)
Result:-
Thus, clustering using PAM was implemented using R.
31
EX.10 IMPLEMENTATION OF DATA VISUALIZATION
Aim:-
To implement Data Visualization using R programs.
Procedure:-
Step 1: Read the input.
Step 2: Visualize the data using
i) Pie chart ii) 3D Pie chart iii) Boxplot iv) Histogram v) Line chart vi) Scatter plot
Program:-
1. Piechart.r
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Give the chart file a name.
png(file = "city_title_colours.jpg")
# Plot the chart with title and rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
# Save the file.
dev.off()
32
2. ThreeDPiechart.r
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10, 53)
lbl <- c("London","New York","Singapore","Mumbai")
# Give the chart file a name.
png(file = "3d_pie_chart.jpg")
# Plot the chart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Countries")
# Save the file.
dev.off()

3. Boxplot.r
# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
  ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()

4. Histogram.r
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Give the chart file a name.
png(file = "histogram.png")
# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")
# Save the file.
dev.off()
33
5. Linechart.r
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "line_chart_label_colored.jpg")
# Plot the line chart.
plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
  main = "Rain fall chart")
# Save the file.
dev.off()

6. Scatterplot.r
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
  xlab = "Weight",
  ylab = "Mileage",
  xlim = c(2.5,5),
  ylim = c(15,30),
  main = "Weight vs Mileage"
)
# Save the file.
dev.off()

Result:-
Thus, the different data visualization techniques were implemented using R.
34
EX.11 Implementation of an Application
Aim:-
To implement survival analysis using R.
Procedure:-
Step 1: Install the survival package.
Step 2: Display the input to check details.
Step 3: Create the survival object.
Step 4: Display the output.
Program:- Survival.r
# Load the library.
library("survival")
# Print the first few rows of the pbc dataset.
print(head(pbc))
# Create the survival object (status == 2 marks death as the event).
survfit(Surv(pbc$time,pbc$status == 2)~1)
# Give the chart file a name.
png(file = "survival.png")
# Plot the survival curve.
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))
# Save the file.
dev.off()
Result:-
Thus, the survival analysis was implemented using R.