EX.1 Hadoop Installation and Configuration
Aim:-
To install and Configure Hadoop Environment.
Procedure:-
1. Install Java 8:
a. Download and install Java 8.
b. Set environmental variables:
i. User variable:
• Variable: JAVA_HOME
• Value: C:\java
ii. System variable:
• Variable: PATH
• Value: C:\java\bin
c. Verify on cmd by running java -version.
2. Install Eclipse Mars. Download it from the link: https://eclipse.org/downloads/ and
extract it into the C drive.
a. Set environmental variables:
i. User variable:
• Variable: ECLIPSE_HOME
• Value: C:\eclipse
ii. System variable:
• Variable: PATH
• Value: C:\eclipse\bin
b. Download “hadoop2x-eclipse-plugin-master.” Three JAR files are on the path
“hadoop2x-eclipse-plugin-master\release.” Copy these three JAR files and paste
them into “C:\eclipse\dropins.”
c. Download “slf4j-1.7.21.” Copy the JAR files from this folder and paste them into
“C:\eclipse\plugins”. This step may create errors; when you run Eclipse, you may
see errors about the org.apa….jar file appearing in multiple places. Delete these
files from all the places except one.
Errors
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/C:/eclipse/plugins/org.slf4j.impl.log4j12_1.7.2.v20131105-2200.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/C:/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
3. Download Apache-ant-1.9.6: (optional step) extract it into a folder in C drive.
4. Download Hadoop-2.6.x:
a. Put extracted Hadoop-2.6.x files into D drive.
b. Download “hadoop-common-2.6.0-bin-master.” Paste all these files into the
“bin” folder of Hadoop-2.6.x.
c. Create a “data” folder inside Hadoop-2.6.x, and also create two more folders in
the “data” folder as “data” and “name.”
d. Create a folder to store temporary data during execution of a project, such as
“D:\hadoop\temp.”
e. Create a log folder, such as “D:\hadoop\userlog”
f. Go to Hadoop-2.6.x\etc\hadoop and edit four files:
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.namenode.name.dir</name><value>/hadoop-2.6.0/data/name</value><final>true</final></property>
<property><name>dfs.datanode.data.dir</name><value>/hadoop-2.6.0/data/data</value><final>true</final></property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/hadoop-2.6.0/share/hadoop/mapreduce/*,
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/common/*,
/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>D:\hadoop\userlog</value><final>true</final>
</property>
<property><name>yarn.nodemanager.local-dirs</name><value>D:\hadoop\temp\nm-local-dir</value></property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value> 600</value>
</property>
<property><name>yarn.application.classpath</name>
<value>/hadoop-2.6.0/,/hadoop-2.6.0/share/hadoop/common/*,/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
/hadoop-2.6.0/share/hadoop/mapreduce/*,/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,/hadoop-2.6.0/share/hadoop/yarn/lib/*</value>
</property>
</configuration>
g. Go to the location: “Hadoop-2.6.0->etc->hadoop,” and edit “hadoop-env.cmd” by
writing
set JAVA_HOME=C:\java\jdk1.8.0_91
h. Set environmental variables: Do: My computer -> Properties -> Advance system
settings -> Advanced -> Environmental variables
i. User variables:
• Variable: HADOOP_HOME
• Value: D:\hadoop-2.6.0
ii. System variable
• Variable: Path
• Value: D:\hadoop-2.6.0\bin
D:\hadoop-2.6.0\sbin
D:\hadoop-2.6.0\share\hadoop\common\*
D:\hadoop-2.6.0\share\hadoop\hdfs
D:\hadoop-2.6.0\share\hadoop\hdfs\lib\*
D:\hadoop-2.6.0\share\hadoop\hdfs\*
D:\hadoop-2.6.0\share\hadoop\yarn\lib\*
D:\hadoop-2.6.0\share\hadoop\yarn\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\lib\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\*
D:\hadoop-2.6.0\share\hadoop\common\lib\*
i. Check on cmd (for example, run hadoop version).
j. Format the name-node: On cmd, go to the location “Hadoop-2.6.0\bin” by writing
“cd hadoop-2.6.0\bin” and then “hdfs namenode -format”
k. Start Hadoop. Go to the location: “D:\hadoop-2.6.0\sbin.” Run the following files as
administrator: “start-dfs.cmd” and “start-yarn.cmd”. The command sequence below summarizes these steps.
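For reference, the whole sequence can be run from an administrator command prompt as follows (a minimal sketch; the drive letter and folder names assume the layout used above):
cd D:\hadoop-2.6.0\bin
hdfs namenode -format
cd ..\sbin
start-dfs.cmd
start-yarn.cmd
jps
jps is the JDK tool that lists running Java processes; once both scripts have started, it should show daemons such as NameNode, DataNode, ResourceManager and NodeManager. By default, the NameNode web UI is then reachable at http://localhost:50070 and the YARN ResourceManager UI at http://localhost:8088.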
How to create a new MapReduce project in Eclipse
1. Open Eclipse
2. Click File -> New Project -> Java project
3. Click next and add external JARs for MapReduce.
Copy all the JAR files from the following locations under “D:\hadoop-2.6.0”:
a. share\hadoop\common\lib
b. share\hadoop\mapreduce
c. share\hadoop\mapreduce\lib and share\hadoop\yarn
d. share\hadoop\yarn\lib
4. Connect DFS in Eclipse
Eclipse -> Window -> Perspective -> Open Perspective -> Other -> MapReduce -> Click
OK.
See a bar at the bottom. Click on Map/Reduce locations.
Right click on blank space, then click on “Edit setting,” and see the following screen.
a. Set the following:
i. MapReduce (V2) Master
• Host: localhost
• Port: 9001
ii. DFS Master
• Host: localhost
• Port: 50071
b. Click finish
Result:-
Thus, the Hadoop environment was installed and configured.
EX.2 Implementation of word count/frequency using MapReduce
Aim:-
To implement word count program using MapReduce.
Program:-
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
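A typical way to compile and run the job from the command line is shown below (a sketch; the jar name and the HDFS input/output paths are placeholders to adapt to your setup, and the first command assumes HADOOP_CLASSPATH points at the JDK's tools.jar so the bundled compiler can be used):
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class
hadoop jar wc.jar WordCount /user/input /user/output
hdfs dfs -cat /user/output/part-r-00000
The last command prints each word together with its count, one pair per line.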
Result:-
Thus, the word count program was executed using Hadoop environment.
EX.3 Implementation of MR program using Weather dataset
Aim :-
To write code to find the maximum temperature per year from a sensor temperature
dataset, using the Hadoop MapReduce framework.
Procedure:-
Implement the Mapper and Reducer programs for finding the maximum temperature in Java.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Mapper class
class MaxTemperatureMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19);
int airTemperature;
if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
airTemperature = Integer.parseInt(line.substring(88, 92));
} else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
String quality = line.substring(92, 93);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
//Reducer class
class MaxTemperatureReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
//Driver Class
public class MaxTemperature {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setJarByClass(MaxTemperature.class);
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.submit();
}
}
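Note that job.submit() returns immediately without waiting for the job; job.waitForCompletion(true) can be used instead to block until the job finishes and report progress. A typical run looks like the following (a sketch; the jar name and HDFS paths are placeholders, and the input is assumed to be in the fixed-width NCDC record format that the mapper parses, with the year starting at offset 15 and the temperature at offset 87):
hadoop jar maxtemp.jar MaxTemperature /user/weather/input /user/weather/output
hdfs dfs -cat /user/weather/output/part-r-00000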
Result:-
Thus, the maximum temperature per year of the weather dataset was obtained using MapReduce.
EX.4 INSTALL, CONFIGURE AND RUN SPARK
Aim:-
To install and configure Spark on a standalone machine.
Procedure:-
Step 1: Install Java 8
Apache Spark requires Java 8. You can check to see if Java is installed using the
command prompt.
Open the command line by clicking Start > type cmd > click Command Prompt.
Type the following command in the command prompt:
java -version
If Java is installed, it will respond with the installed version details.
Step 2: Install Python
1. To install the Python package manager, navigate to https://www.python.org/ in your
web browser.
2. Mouse over the Download menu option and click Python 3.8.3. 3.8.3 is the latest
version at the time of writing the article.
3. Once the download finishes, run the file.
4. Near the bottom of the first setup dialog box, check off Add Python 3.8 to PATH. Leave
the other box checked.
5. Next, click Customize installation.
6. You can leave all boxes checked at this step, or you can uncheck the options you do
not want.
7. Click Next.
8. Select the box Install for all users and leave other boxes as they are.
9. Under Customize install location, click Browse and navigate to the C drive. Add a new
folder and name it Python.
10. Select that folder and click OK.
11. Click Install, and let the installation complete.
12. When the installation completes, click the Disable path length limit option at the bottom
and then click Close.
13. If you have a command prompt open, restart it. Verify the installation by checking the
version of Python:
python --version
The output should print Python 3.8.3.
Step 3: Download Apache Spark
1. Open a browser and navigate to https://spark.apache.org/downloads.html.
2. Under the Download Apache Spark heading, there are two drop-down menus. Use the
current non-preview version.
In our case, in Choose a Spark release drop-down menu select 2.4.5
In the second drop-down Choose a package type, leave the selection Pre-built for Apache
Hadoop 2.7.
3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.
4. A page with a list of mirrors loads where you can see different servers to download
from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This ensures
you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably in
a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512
4. Change the username to your username. The system displays a long alphanumeric
code, along with the message Certutil: -hashfile completed successfully.
5. Compare the code to the one you opened in a new browser tab. If they match, your
download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command line,
enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system
(e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the
necessary files inside.
Step 6: Add winutils.exe File
Download the winutils.exe file for the underlying Hadoop version for the Spark installation
you downloaded.
1. Navigate to this URL https://github.com/cdarlint/winutils and inside the bin folder,
locate winutils.exe, and click it.
2. Find the Download button on the right side to download the file.
3. Now, create new folders Hadoop and bin on C: using Windows Explorer or the
Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to
your system PATH. It allows you to run the Spark shell directly from a command prompt
window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment
Variables and then click New in the next window.
4. For Variable Name type SPARK_HOME.
5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If you
changed the folder path, use that one instead.
6. In the top box, click the Path entry, then click Edit. Be careful with editing the system
path. Avoid deleting any entries already on the list.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark folder
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to avoid possible
issues with the path.
9. Repeat this process for Hadoop and Java.
• For Hadoop, the variable name is HADOOP_HOME and for the value use the path
of the folder you created earlier: C:\hadoop. Add C:\hadoop\bin to the Path variable
field, but we recommend using %HADOOP_HOME%\bin.
• For Java, the variable name is JAVA_HOME and for the value use the path to your
Java JDK directory (in our case it’s C:\Program Files\Java\jdk1.8.0_251).
10. Click OK to close all open windows.
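Equivalently, the same variables can be set from an elevated command prompt with the built-in setx command (a sketch; adjust the paths to your actual install locations, and note that setx only affects command prompt windows opened afterwards):
setx SPARK_HOME "C:\Spark\spark-2.4.5-bin-hadoop2.7"
setx HADOOP_HOME "C:\hadoop"
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_251"
The bin subfolders still need to be appended to the Path variable as described above.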
Step 8: Launch Spark
1. Open a new command-prompt window using the right-click and Run as administrator:
2. To start Spark, enter:
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin\spark-shell
If you set the environment path correctly, you can type spark-shell to launch Spark.
3. The system should display several lines indicating the status of the application. You
may get a Java pop-up. Select Allow access to continue.
Finally, the Spark logo appears, and the prompt displays the Scala shell.
4. Open a web browser and navigate to http://localhost:4040/.
5. You can replace localhost with the name of your system.
6. You should see an Apache Spark shell Web UI. The example below shows the
Executors page.
7. To exit Spark and close the Scala shell, press ctrl-d in the command-prompt window.
Result:-
Thus, Spark was installed and configured successfully.
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement word count / frequency programs using Spark.
Program:-
package org.apache.spark.examples;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public final class WordCount {
private static final Pattern SPACE = Pattern.compile(" ");
public static void main(String[] args) throws Exception {
if (args.length < 1) {
System.err.println("Usage: WordCount <file>");
System.exit(1);
}
final SparkConf sparkConf = new SparkConf().setAppName("WordCount");
final JavaSparkContext ctx = new JavaSparkContext(sparkConf);
final JavaRDD<String> lines = ctx.textFile(args[0], 1);
final JavaRDD<String> words = lines.flatMap(s ->
Arrays.asList(SPACE.split(s)).iterator());
final JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new
Tuple2<>(s, 1));
final JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) ->
i1 + i2);
final List<Tuple2<String, Integer>> output = counts.collect();
for (Tuple2 tuple : output) {
System.out.println(tuple._1() + ": " + tuple._2());
}
ctx.stop();
}}
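The class can be packaged into a jar and launched with spark-submit (a sketch; the jar name and the input file path are placeholders, and local[2] matches the standalone setup from the previous exercise):
spark-submit --class org.apache.spark.examples.WordCount --master local[2] wordcount.jar C:\data\input.txt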
Result:-
Thus, the word count program was executed successfully using Spark.
EX.6 IMPLEMENT MACHINE LEARNING USING SPARK
Aim:-
To implement machine learning using Spark.
Procedure:-
Spark MLlib is a module on top of Spark Core that provides machine learning primitives
as APIs.
1. Setting the Dependencies
First, we have to define the following dependency in Maven to pull the relevant libraries:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.4.3</version>
<scope>provided</scope>
</dependency>
And we need to initialize the SparkContext to work with Spark APIs:
SparkConf conf = new SparkConf()
.setAppName("Main")
.setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
2. Loading the Data
First things first, we should download the data, which is available as a text file in CSV
format. Then we have to load this data in Spark:
String dataFile = "data\\iris.data";
JavaRDD<String> data = sc.textFile(dataFile);
Spark MLlib offers several data types, both local and distributed, to represent the input
data and corresponding labels. The simplest of these data types is Vector:
JavaRDD<Vector> inputData = data
.map(line -> {
String[] parts = line.split(",");
double[] v = new double[parts.length - 1];
for (int i = 0; i < parts.length - 1; i++) {
v[i] = Double.parseDouble(parts[i]);
}
return Vectors.dense(v);
});
A training example typically consists of multiple input features and a label, represented
by the class LabeledPoint:
Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);
JavaRDD<LabeledPoint> labeledData = data
.map(line -> {
String[] parts = line.split(",");
double[] v = new double[parts.length - 1];
for (int i = 0; i < parts.length - 1; i++) {
v[i] = Double.parseDouble(parts[i]);
}
return new LabeledPoint(map.get(parts[parts.length - 1]), Vectors.dense(v));
});
3. Exploratory Data Analysis
MultivariateStatisticalSummary summary = Statistics.colStats(inputData.rdd());
System.out.println("Summary Mean:");
System.out.println(summary.mean());
System.out.println("Summary Variance:");
System.out.println(summary.variance());
System.out.println("Summary Non-zero:");
System.out.println(summary.numNonzeros());
Another important metric to analyze is the correlation between features in the input data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());
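To complete the aim, the labelled data can also be used to train and evaluate a classifier. The following is a minimal sketch going one step beyond the procedure above: it assumes an 80/20 train/test split, MLlib's LogisticRegressionWithLBFGS, and the additional imports org.apache.spark.mllib.classification.*, org.apache.spark.mllib.evaluation.MulticlassMetrics and scala.Tuple2; the labeledData RDD comes from the loading step above.
// Split the labelled data into training and test sets.
JavaRDD<LabeledPoint>[] splits = labeledData.randomSplit(new double[] {0.8, 0.2}, 11L);
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];
// Train a three-class logistic regression model on the training split.
LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
.setNumClasses(3)
.run(trainingData.rdd());
// Pair each test point's prediction with its true label and compute the accuracy.
JavaPairRDD<Object, Object> predictionAndLabels = testData.mapToPair(p ->
new Tuple2<>(model.predict(p.features()), p.label()));
MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
System.out.println("Model Accuracy on Test Data: " + metrics.accuracy());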
Result:-
Thus, machine learning using Spark MLlib was implemented successfully.
EX.7 IMPLEMENTATION OF LINEAR AND LOGISTIC REGRESSION USING R
Aim:-
To implement Linear and Logistic Regression using R.
Procedure:-
Step 1: Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
Step 2: Create a relationship model using the lm() functions in R.
Step 3: Find the coefficients from the model created and create the mathematical equation
using these coefficients.
Step 4: Get a summary of the relationship model to know the average error in prediction,
also called residuals.
Step 5: To predict the weight of new persons, use the predict() function in R.
Program:-
Linear Regression:
Linear.r
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
print(summary(relation))
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",
abline(lm(x~y)),cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
# Save the file.
dev.off()
Logistic Regression:
logistic.r
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))
Result:-
Thus, Linear and Logistic Regression was implemented using R Programming
successfully.
EX.8 IMPLEMENTATION OF DECISION TREE CLASSIFIER USING R
Aim:-
To implement Decision Tree Classifier using R
Procedure:-
Step 1: Install the party package.
Step 2: Input the data.
Step 3: Create the decision tree using the ctree() function.
Step 4: Display and print the result.
Program:-
# Load the party package. It will automatically load other dependent packages.
library(party)
# Create the input data frame.
input.dat <- readingSkills[c(1:105),]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
output.tree <- ctree(
nativeSpeaker ~ age + shoeSize + score,
data = input.dat)
# Plot the tree.
plot(output.tree)
# Save the file.
dev.off()
Result:-
Thus, the Decision tree classifier is implemented using R.
EX.9 IMPLEMENTATION OF CLUSTERING TECHNIQUE
Aim:-
To implement Clustering Technique (Pam) using R
Procedure:-
Step 1: Include the cluster package.
Step 2: Apply the PAM method to the cars dataset.
Step 3: Compare the hclust and pam results using the table() function.
Step 4: Plot the result.
Program:-
Pam.r
library(cluster)
# cars.dist and groups.3 are assumed to come from an earlier hierarchical
# clustering step on the cars data, for example:
#   cars.dist <- dist(scale(cars[,-1]))        # distance matrix on the numeric columns
#   groups.3  <- cutree(hclust(cars.dist), 3)  # three groups from hclust
cars.pam = pam(cars.dist,3)
names(cars.pam)
# Compare the hclust groups with the pam clustering.
table(groups.3,cars.pam$clustering)
# Cars placed in different clusters by the two methods.
cars$Car[groups.3 != cars.pam$clustering]
# Medoid car of each cluster.
cars$Car[cars.pam$id.med]
plot(cars.pam)
Result:-
Thus, the clustering using Pam was implemented using R.
EX.10 IMPLEMENTATION OF DATA VISUALIZATION
Aim:-
To implement the Data Visualization using R Program.
Procedure:-
Step 1: Read the input
Step 2: Visualize the data using
i) Pie chart
ii) 3D Pie chart
iii) Box plot
iv) Histogram
v) Line chart
vi) Scatter plot
Program :-
1.Piechart.r
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Give the chart file a name.
png(file = "city_title_colours.jpg")
# Plot the chart with title and rainbow color pallet.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
# Save the file.
dev.off()
2.ThreeDPiechart.r
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10,53)
lbl <- c("London","New York","Singapore","Mumbai")
# Give the chart file a name.
png(file = "3d_pie_chart.jpg")
# Plot the chart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Countries ")
# Save the file.
dev.off()
3.Boxplot.r
# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()
4.Histogram.r
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Give the chart file a name.
png(file = "histogram.png")
# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")
# Save the file.
dev.off()
5.Linechart.r
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "line_chart_label_colored.jpg")
# Plot the line chart.
plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
main = "Rain fall chart")
# Save the file.
dev.off()
6.Scatterplot.r
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
# Save the file.
dev.off()
Result:-
Thus, the different data visualization techniques were implemented using R.
EX.11 Implementation of an Application
Aim:-
To implement survival analysis using R
Procedure:-
Step 1: Install survival package
Step 2: Display input to check details
Step 3: Create survival object
Step 4: Display the output
Program:-
Survival.r
# Load the library.
library("survival")
# Print first few rows.
print(head(pbc))
# Create the survival object.
survfit(Surv(pbc$time,pbc$status == 2)~1)
# Give the chart file a name.
png(file = "survival.png")
# Plot the graph.
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))
# Save the file.
dev.off()
Result:-
Thus, the survival analysis was implemented using R.

More Related Content

Similar to DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis

Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝recast203
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Titus Damaiyanti
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Single node setup
Single node setupSingle node setup
Single node setupKBCHOW123
 
Dev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSDev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSRonny Trommer
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Config/BuildConfig
Config/BuildConfigConfig/BuildConfig
Config/BuildConfigVijay Shukla
 
Zero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleZero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleStein Inge Morisbak
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOpsОмские ИТ-субботники
 
R Data Access from hdfs,spark,hive
R Data Access  from hdfs,spark,hiveR Data Access  from hdfs,spark,hive
R Data Access from hdfs,spark,hivearunkumar sadhasivam
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 

Similar to DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis (20)

Hadoop cluster 安裝
Hadoop cluster 安裝Hadoop cluster 安裝
Hadoop cluster 安裝
 
Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Single node setup
Single node setupSingle node setup
Single node setup
 
Dev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMSDev-Jam 2019 - Container & OpenNMS
Dev-Jam 2019 - Container & OpenNMS
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Config/BuildConfig
Config/BuildConfigConfig/BuildConfig
Config/BuildConfig
 
Config BuildConfig
Config BuildConfigConfig BuildConfig
Config BuildConfig
 
Zero Downtime Deployment with Ansible
Zero Downtime Deployment with AnsibleZero Downtime Deployment with Ansible
Zero Downtime Deployment with Ansible
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
 
R Data Access from hdfs,spark,hive
R Data Access  from hdfs,spark,hiveR Data Access  from hdfs,spark,hive
R Data Access from hdfs,spark,hive
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Ex-8-hive.pptx
Ex-8-hive.pptxEx-8-hive.pptx
Ex-8-hive.pptx
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 

Recently uploaded

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

DA Lab Manual Data Analysis Data AnalysisData AnalysisData AnalysisData Analysis

  • 1. 1 EX.1 Hadoop Installation and Configuration Aim:- To install and Configure Hadoop Environment. Procedure:- 1. Install Java 8: a. Download Java 8 a. Set environmental variables: i. User variable:  Variable: JAVA_HOME  Value: C:java ii. System variable:  Variable: PATH  Value: C:javabin b. Check on cmd, see below: 2. Install Eclipse Mars. Download it from the link: https://eclipse.org/downloads/ and extract it into C drive. a. Set environmental variables: i. User variable:  Variable: ECLIPSE_HOME  Value: C:eclipse ii. System variable:  Variable: PATH  Value: C:eclipse bin b. Download “hadoop2x-eclipse-plugin-master.”Three Jar files on the path “hadoop2x- eclipse-plugin-masterrelease.” Copy these three jar files and pate them into “C:eclipsedropins.” c. Download “slf4j-1.7.21.” Copy Jar files from this folder and paste them to “C:eclipseplugins”. This step may create errors; when you will execute Eclipse, you will see errors like org.apa…..jar file in multiple places. So, now delete these files from all the places except one. Errors
  • 2. 2 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/eclipse/plugins/org.slf4j.impl.log4j12_1.7.2.v20131105- 2200.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/hadoop- 2.6.0/share/hadoop/common/lib/slf4j-log4j12- 1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 3. Download Apache-ant-1.9.6: (optional step) extract it into a folder in C drive. 4. Download Hadoop-2.6.x: a. Put extracted Hadoop-2.6.x files into D drive. b. Download “hadoop-common-2.6.0-bin-master. Paste all these files into the “bin” folder of Hadoop-2.6.x. c. Create a “data” folder inside Hadoop-2.6.x, and also create two more folders in the “data” folder as “data” and “name.” d. Create a folder to store temporary data during execution of a project, such as “D:hadooptemp.” e. Create a log folder, such as “D:hadoopuserlog” f. Go to Hadoop-2.6.x etc Hadoop and edit four files: i. core-site.xml ii. hdfs-site.xml iii. mapred.xml iv. yarn.xml core-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
  • 3. 3 limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>hadoop.tmp.dir</name> <value>D:hadooptemp</value> </property> <property> <name>fs.default.name</name> <value>hdfs://localhost:50071</value> </property> </configuration> hdfs-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property><name>dfs.replication</name><value>1</value></property> <property> <name>dfs.namenode.name.dir</name><value>/hadoop- 2.6.0/data/name</value><final>true</final></property> <property><name>dfs.datanode.data.dir</name><value>/hadoop- 2.6.0/data/data</value><final>true</final> </property> </configuration> mapred.xml <?xml version="1.0"?>
  • 4. 4 <configuration> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> </property> <property> <name>mapreduce.application.classpath</name> <value>/hadoop-2.6.0/share/hadoop/mapreduce/*, /hadoop-2.6.0/share/hadoop/mapreduce/lib/*, /hadoop-2.6.0/share/hadoop/common/*, /hadoop-2.6.0/share/hadoop/common/lib/*, /hadoop-2.6.0/share/hadoop/yarn/*, /hadoop-2.6.0/share/hadoop/yarn/lib/*, /hadoop-2.6.0/share/hadoop/hdfs/*, /hadoop-2.6.0/share/hadoop/hdfs/lib/*, </value> </property> </configuration> yarn-site.xml <?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value>
  • 5. 5 </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.nodemanager.log-dirs</name> <value>D:hadoopuserlog</value><final>true</final> </property> <property><name>yarn.nodemanager.local-dirs</name><value>D:hadooptempnm- local-dir</value></property> <property> <name>yarn.nodemanager.delete.debug-delay-sec</name> <value> 600</value> </property> <property><name>yarn.application.classpath</name> <value>/hadoop-2.6.0/,/hadoop-2.6.0/share/hadoop/common/*,/hadoop- 2.6.0/share/hadoop/common/lib/*,/hadoop-2.6.0/share/hadoop/hdfs/*,/hadoop- 2.6.0/share/hadoop/hdfs/lib/*,/hadoop-2.6.0/share/hadoop/mapreduce/*,/hadoop- 2.6.0/share/hadoop/mapreduce/lib/*,/hadoop-2.6.0/share/hadoop/yarn/*,/hadoop- 2.6.0/share/hadoop/yarn/lib/*</value> </property> </configuration> g. Go to the location: “Hadoop-2.6.0->etc->hadoop,” and edit “hadoop-env.cmd” by writing set JAVA_HOME=C:javajdk1.8.0_91 h. Set environmental variables: Do: My computer -> Properties -> Advance system settings -> Advanced -> Environmental variables i. User variables:  Variable: HADOOP_HOME  Value: D:hadoop-2.6.0 ii. System variable  Variable: Path  Value: D:hadoop-2.6.0bin D:hadoop-2.6.0sbin D:hadoop-2.6.0sharehadoopcommon* D:hadoop-2.6.0sharehadoophdfs D:hadoop-2.6.0sharehadoophdfslib*
  • 6. 6 D:hadoop-2.6.0sharehadoophdfs* D:hadoop-2.6.0sharehadoopyarnlib* D:hadoop-2.6.0sharehadoopyarn* D:hadoop-2.6.0sharehadoopmapreducelib* D:hadoop-2.6.0sharehadoopmapreduce* D:hadoop-2.6.0sharehadoopcommonlib* i. Check on cmd; j. Format name-node: On cmd go to the location “Hadoop-2.6.0 bin” by writing on cmd “cd hadoop-2.6.0.bin” and then “hdfs namenode –format” k. Start Hadoop. Go to the location: “D:hadoop-2.6.0sbin.” Run the following files as administrator “start-dfs.cmd” and “start-yarn.cmd” How to create a new MapReduce project in Eclipse 1. Open Ellipse 2. Click File -> New Project -> Java project
  • 7. 7 3. Click next and add external Jars for MapReduce. Copy all the Jar files from the locations “D:hadoop-2.6.0” a. sharehadoopcommonlib b. sharehadoopmapreduce c. sharehadoopmapreducelib sharehadoopyarn d. sharehadoopyarnlib 4. Connect DFS in Eclipse
  • 8. 8 Eclipse -> Window -> Perspective -> Open Perspective -> Other -> MapReduce -> Click OK. See a bar at the bottom. Click on Map/Reduce locations. Right click on blank space, then click on “Edit setting,” and see the following screen. a. Set the following: i. MapReduce (V2) Master  Host: localhost  Port: 9001 ii. DFS Master  Host: localhost  Port: 50071 b. Click finish Result:- Thus, the Hadoop Environment was installed and Configured.
  • 9. 9 EX.2 Implementation of word count/frequency using MapReduce Aim:- To implement word count program using MapReduce. Program:- import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable();
  • 10. 10 public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = Job.getInstance(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } } Result:- Thus, the word count program was executed using Hadoop environment.
  • 11. 11 EX.3 Implementation of MR program using Weather dataset Aim :- To write a code to find maximum temperature per year from sensor temperature data sheet, using hadoop mapreduce framework. Procedure:- Implement Mapper and Reducer program for finding Maximum temperature in java import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; //Mapper class class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> { private static final int MISSING = 9999; @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); String year = line.substring(15, 19); int airTemperature; if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs airTemperature = Integer.parseInt(line.substring(88, 92)); } else { airTemperature = Integer.parseInt(line.substring(87, 92)); } String quality = line.substring(92, 93);
  • 12. 12 if (airTemperature != MISSING && quality.matches("[01459]")) { context.write(new Text(year), new IntWritable(airTemperature)); } } } //Reducer class class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> { @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int maxValue = Integer.MIN_VALUE; for (IntWritable value : values) { maxValue = Math.max(maxValue, value.get()); } context.write(key, new IntWritable(maxValue)); } } //Driver Class public class MaxTemperature { public static void main(String[] args) throws Exception { if (args.length != 2) { System.err.println("Usage: MaxTemperature <input path=""> <output path>"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setJarByClass(MaxTemperature.class); job.setJobName("Max temperature"); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(MaxTemperatureMapper.class); job.setReducerClass(MaxTemperatureReducer.class); job.setOutputKeyClass(Text.class);
  • 14. 14 EX.4 INSTALL, CONFIGURE AND RUN SPARK Aim:- To install and configure spark in standalone machine. Procedure:- Step 1: Install Java 8 Apache Spark requires Java 8. You can check to see if Java is installed using the command prompt. Open the command line by clicking Start > type cmd > click Command Prompt. Type the following command in the command prompt: java -version If Java is installed, it will respond with the following output: Step 2: Install Python 1. To install the Python package manager, navigate to https://www.python.org/ in your web browser. 2. Mouse over the Download menu option and click Python 3.8.3. 3.8.3 is the latest version at the time of writing the article. 3. Once the download finishes, run the file.
  • 15. 15 4. Near the bottom of the first setup dialog box, check off Add Python 3.8 to PATH. Leave the other box checked. 5. Next, click Customize installation. 6. You can leave all boxes checked at this step, or you can uncheck the options you do not want.
  • 16. 16 7. Click Next. 8. Select the box Install for all users and leave other boxes as they are. 9. Under Customize install location, click Browse and navigate to the C drive. Add a new folder and name it Python. 10. Select that folder and click OK. 11. Click Install, and let the installation complete. 12. When the installation completes, click the Disable path length limit option at the bottom and then click Close. 13. If you have a command prompt open, restart it. Verify the installation by checking the version of Python: python --version The output should print Python 3.8.3. Step 3: Download Apache Spark 1. Open a browser and navigate to https://spark.apache.org/downloads.html. 2. Under the Download Apache Spark heading, there are two drop-down menus. Use the current non-preview version. In our case, in Choose a Spark release drop-down menu select 2.4.5
17
In the second drop-down, Choose a package type, leave the selection Pre-built for Apache Hadoop 2.7.
3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.
4. A page with a list of mirrors loads where you can see different servers to download from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This ensures you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably in a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512
4. Change the username to your username. The system displays a long alphanumeric code, along with the message Certutil: -hashfile completed successfully.
18
5. Compare the code to the one you opened in a new browser tab. If they match, your download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command line, enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system (e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the necessary files inside.
Step 6: Add the winutils.exe File
Download the winutils.exe file for the underlying Hadoop version of the Spark installation you downloaded.
1. Navigate to https://github.com/cdarlint/winutils and, inside the bin folder of the matching Hadoop version, locate winutils.exe and click it.
2. Find the Download button on the right side to download the file.
3. Now, create the folders C:\hadoop and C:\hadoop\bin using Windows Explorer or the Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
19
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to your system PATH. It allows you to run the Spark shell directly from a command prompt window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment Variables and then click New in the next window.
4. For Variable Name type SPARK_HOME.
5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If you changed the folder path, use that one instead.
20
6. In the top box, click the Path entry, then click Edit. Be careful with editing the system path. Avoid deleting any entries already on the list.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark bin folder, C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to avoid possible issues with the path.
21
9. Repeat this process for Hadoop and Java.
• For Hadoop, the variable name is HADOOP_HOME and for the value use the path of the folder you created earlier: C:\hadoop. Add C:\hadoop\bin to the Path variable field; we recommend using %HADOOP_HOME%\bin.
• For Java, the variable name is JAVA_HOME and for the value use the path to your Java JDK directory (in our case it is C:\Program Files\Java\jdk1.8.0_251).
10. Click OK to close all open windows.
Step 8: Launch Spark
1. Open a new command-prompt window using right-click and Run as administrator.
2. To start Spark, enter:
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin\spark-shell
If you set the environment path correctly, you can type spark-shell to launch Spark.
3. The system should display several lines indicating the status of the application. You may get a Java pop-up. Select Allow access to continue.
22
Finally, the Spark logo appears, and the prompt displays the Scala shell.
4. Open a web browser and navigate to http://localhost:4040/.
5. You can replace localhost with the name of your system.
6. You should see an Apache Spark shell Web UI. The example below shows the Executors page.
7. To exit Spark and close the Scala shell, press Ctrl-D in the command-prompt window.
Result:-
Thus, Spark was installed and configured successfully.
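Optional cross-check for Step 7: the short Java program below simply prints the three variables this experiment sets; run it from a newly opened command prompt so the updated environment is picked up. It is a minimal verification sketch only, and the class name EnvCheck is an illustrative choice.

// Prints the environment variables configured in Step 7.
// A null value means the variable is not yet visible to newly started processes.
public class EnvCheck {
  public static void main(String[] args) {
    for (String name : new String[] {"JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"}) {
      System.out.println(name + " = " + System.getenv(name));
    }
  }
}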
23
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement word count / frequency programs using Spark.
Program:-
package org.apache.spark.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public final class WordCount {

  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: WordCount <file>");
      System.exit(1);
    }

    final SparkConf sparkConf = new SparkConf().setAppName("WordCount");
    final JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    final JavaRDD<String> lines = ctx.textFile(args[0], 1);
    // In Spark 2.x, flatMap expects an Iterator, hence the .iterator() call.
    final JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
    final JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
    final JavaPairRDD<String, Integer> counts = ones.reduceByKey((i1, i2) -> i1 + i2);

    final List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<String, Integer> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    ctx.stop();
  }
}
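The program above expects to be packaged into a jar and submitted with the input file as its argument. For a quick check on the standalone setup from EX.4, a local-mode variant can be used instead; the sketch below is such a variant, where "local[*]" uses all cores of the machine and the input path C:\Spark\input.txt is an illustrative placeholder, not a file created earlier in this record.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Minimal local-mode word count for checking the Spark setup without a cluster.
public final class WordCountLocal {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("WordCountLocal").setMaster("local[*]");
    try (JavaSparkContext ctx = new JavaSparkContext(conf)) {
      ctx.textFile("C:\\Spark\\input.txt")
         .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
         .mapToPair(word -> new scala.Tuple2<>(word, 1))
         .reduceByKey(Integer::sum)
         .collect()
         .forEach(pair -> System.out.println(pair._1() + ": " + pair._2()));
    }
  }
}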
24
Result:-
Thus, the word count program was executed successfully.
25
EX.6 IMPLEMENT MACHINE LEARNING USING SPARK
Aim:-
To implement machine learning using Spark.
Procedure:-
Spark MLlib is a module on top of Spark Core that provides machine learning primitives as APIs.
1. Setting the Dependencies
First, we have to define the following dependency in Maven to pull the relevant libraries:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
</dependency>
And we need to initialize the SparkContext to work with the Spark APIs:
SparkConf conf = new SparkConf()
    .setAppName("Main")
    .setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
2. Loading the Data
First things first, we should download the data, which is available as a text file in CSV format. Then we have to load this data in Spark:
String dataFile = "data\\iris.data";
JavaRDD<String> data = sc.textFile(dataFile);
Spark MLlib offers several data types, both local and distributed, to represent the input data and the corresponding labels. The simplest of these data types is Vector:
JavaRDD<Vector> inputData = data
    .map(line -> {
        String[] parts = line.split(",");
        double[] v = new double[parts.length - 1];
        for (int i = 0; i < parts.length - 1; i++) {
            v[i] = Double.parseDouble(parts[i]);
        }
        return Vectors.dense(v);
    });
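A quick way to confirm the CSV parsing before moving on is to print a handful of the parsed feature vectors; the fragment below assumes the inputData RDD defined above and is only a sanity-check sketch.

// Print the first few parsed feature vectors; for the iris data each should show
// four numeric values (sepal length, sepal width, petal length, petal width).
for (Vector v : inputData.take(5)) {
    System.out.println(v);
}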
26
A training example typically consists of multiple input features and a label, represented by the class LabeledPoint:
Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);

JavaRDD<LabeledPoint> labeledData = data
    .map(line -> {
        String[] parts = line.split(",");
        double[] v = new double[parts.length - 1];
        for (int i = 0; i < parts.length - 1; i++) {
            v[i] = Double.parseDouble(parts[i]);
        }
        return new LabeledPoint(map.get(parts[parts.length - 1]), Vectors.dense(v));
    });
3. Exploratory Data Analysis
Summary statistics (mean, variance and non-zero counts) for each input feature can be obtained from colStats:
MultivariateStatisticalSummary summary = Statistics.colStats(inputData.rdd());
System.out.println("Summary Mean:");
System.out.println(summary.mean());
System.out.println("Summary Variance:");
System.out.println(summary.variance());
System.out.println("Summary Non-zero:");
System.out.println(summary.numNonzeros());
Another important metric to analyze is the correlation between features in the input data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());
Result:-
Thus, machine learning using Spark MLlib was implemented successfully.
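The labeled data above can also be fed to one of MLlib's classifiers. The sketch below is an optional extension, not part of the recorded procedure: it splits labeledData into training and test sets and fits a multi-class logistic regression model (classes from org.apache.spark.mllib.classification). The 80/20 split ratio and variable names are illustrative choices.

// Optional extension: train and evaluate a simple classifier on labeledData.
JavaRDD<LabeledPoint>[] splits = labeledData.randomSplit(new double[] {0.8, 0.2});
JavaRDD<LabeledPoint> training = splits[0].cache();
JavaRDD<LabeledPoint> test = splits[1];

// Three classes: Iris-setosa, Iris-versicolor, Iris-virginica.
LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
    .setNumClasses(3)
    .run(training.rdd());

// Fraction of test examples whose predicted label matches the true label.
double accuracy = test.filter(p -> model.predict(p.features()) == p.label()).count()
    / (double) test.count();
System.out.println("Model accuracy on test data: " + accuracy);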
27
EX.7 IMPLEMENTATION OF LINEAR AND LOGISTIC REGRESSION USING R
Aim:-
To implement Linear and Logistic Regression using R.
Procedure:-
Step 1: Carry out the experiment of gathering a sample of observed values of height and the corresponding weight.
Step 2: Create a relationship model using the lm() function in R.
Step 3: Find the coefficients from the model created and create the mathematical equation using these.
Step 4: Get a summary of the relationship model to know the average error in prediction (also called residuals).
Step 5: To predict the weight of new persons, use the predict() function in R.
Program:-
Linear Regression: Linear.r
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
print(summary(relation))
# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
28
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",
  abline(lm(x~y)),cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
# Save the file.
dev.off()

Logistic Regression: logistic.r
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))

Result:-
Thus, Linear and Logistic Regression were implemented using R programming successfully.
29
EX.8 IMPLEMENTATION OF DECISION TREE CLASSIFIER USING R
Aim:-
To implement a Decision Tree Classifier using R.
Procedure:-
Step 1: Install the party package.
Step 2: Input the data.
Step 3: Create the decision tree using the ctree() function.
Step 4: Display and print the result.
Program:-
# Load the party package. It will automatically load other dependent packages.
library(party)
# Create the input data frame (readingSkills ships with the party package).
input.dat <- readingSkills[c(1:105),]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)
# Plot the tree.
plot(output.tree)
# Save the file.
dev.off()
Result:-
Thus, the Decision Tree Classifier was implemented using R.
30
EX.9 IMPLEMENTATION OF CLUSTERING TECHNIQUE
Aim:-
To implement a Clustering Technique (PAM) using R.
Procedure:-
Step 1: Include the cluster package.
Step 2: Apply the PAM method to the car dataset.
Step 3: Compare the hclust and pam results using the table method.
Step 4: Plot the result.
Program:- Pam.r
library(cluster)
# Note: this script assumes cars.dist (a dissimilarity matrix for the car dataset,
# e.g. from dist()) and groups.3 (a 3-group hierarchical solution, e.g. from
# cutree(hclust(cars.dist), 3)) have already been created in the session.
cars.pam = pam(cars.dist,3)
names(cars.pam)
# Compare the hierarchical groups with the PAM clustering.
table(groups.3,cars.pam$clustering)
# Cars assigned to different groups by the two methods.
cars$Car[groups.3 != cars.pam$clustering]
# Medoid of each cluster.
cars$Car[cars.pam$id.med]
plot(cars.pam)
Result:-
Thus, clustering using PAM was implemented using R.
31
EX.10 IMPLEMENTATION OF DATA VISUALIZATION
Aim:-
To implement Data Visualization using R programs.
Procedure:-
Step 1: Read the input.
Step 2: Visualize the data using
i) Pie chart ii) 3D Pie chart iii) Boxplot iv) Histogram v) Line chart vi) Scatter plot
Program:-
1. Piechart.r
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Give the chart file a name.
png(file = "city_title_colours.jpg")
# Plot the chart with title and rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
# Save the file.
dev.off()
32
2. ThreeDPiechart.r
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10, 53)
lbl <- c("London","New York","Singapore","Mumbai")
# Give the chart file a name.
png(file = "3d_pie_chart.jpg")
# Plot the chart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Countries")
# Save the file.
dev.off()

3. Boxplot.r
# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
  ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()

4. Histogram.r
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Give the chart file a name.
png(file = "histogram.png")
# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")
# Save the file.
dev.off()
33
5. Linechart.r
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "line_chart_label_colored.jpg")
# Plot the line chart.
plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
  main = "Rain fall chart")
# Save the file.
dev.off()

6. Scatterplot.r
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
  xlab = "Weight",
  ylab = "Mileage",
  xlim = c(2.5,5),
  ylim = c(15,30),
  main = "Weight vs Mileage"
)
# Save the file.
dev.off()

Result:-
Thus, the different data visualization techniques were implemented using R.
34
EX.11 Implementation of an Application
Aim:-
To implement survival analysis using R.
Procedure:-
Step 1: Install the survival package.
Step 2: Display the input to check details.
Step 3: Create the survival object.
Step 4: Display the output.
Program:- Survival.r
# Load the library.
library("survival")
# Print the first few rows of the pbc dataset.
print(head(pbc))
# Create the survival object (status == 2 marks death as the event).
survfit(Surv(pbc$time,pbc$status == 2)~1)
# Give the chart file a name.
png(file = "survival.png")
# Plot the survival curve.
plot(survfit(Surv(pbc$time,pbc$status == 2)~1))
# Save the file.
dev.off()
Result:-
Thus, the survival analysis was implemented using R.