This document provides an overview of data processing and analytics using MATLAB. It discusses importing data into MATLAB from sources such as CSV and Excel files. It describes how to organize data into tables and matrices and how to index and subset tables. The document also covers working with date and time data, including converting between datetime and duration values. Finally, it discusses preprocessing data, such as handling missing values, normalizing data, and identifying missing or non-zero values with functions like isnan, ismissing, and nnz.
1. UNIT II : Data Processing and Analytics
By
Mr. S. Selvaraj, AP(SRG) / CSD
Ms. K. Jothimani, AP / CSD
Kongu Engineering College
Perundurai, Erode, Tamilnadu, India
20VA028 – IMAGE PROCESSING WITH MATLAB
Thanks to and resources from: Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Naraig Manjikian, "Computer Organization and Embedded Systems", McGraw Hill Education, 6th edition, 2017
2. Unit Wise Syllabus – CO
11. Set Text Type as String
EPL = readtable("EPLresults.csv","TextType","string")
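• As a quick check (a sketch; it assumes EPLresults.csv is on the MATLAB path and contains the Team column used in later slides), the "TextType","string" option makes readtable import text columns as string arrays instead of cell arrays of character vectors:
– EPL = readtable("EPLresults.csv","TextType","string");
– class(EPL.Team) % 'string' rather than 'cell'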
12. table() function
• You can organize your workspace variables into a table with the table function.
• The following code creates a table, data, with variables a, b, and c.
– data = table(a,b,c)
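• A minimal sketch (the vectors a, b, and c here are hypothetical; table requires all inputs to have the same number of rows):
– a = [1;2;3];
– b = ["one";"two";"three"];
– c = [0.5;1.5;2.5];
– data = table(a,b,c) % 3-row table with variables a, b, and c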
13. array2table() function
• You can use the array2table function to convert a matrix to a table.
• The following code creates a table named data from a matrix, A.
– data = array2table(A)
15. Create custom variable names
• To create custom variable names in the table, follow the array input with the VariableNames property and a string array of names.
• The following code creates a table named data with custom variable names, X and Y.
– data = array2table(A,"VariableNames",["X" "Y"])
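• A minimal sketch with a hypothetical 3-by-2 matrix A:
– A = [1 2; 3 4; 5 6];
– data = array2table(A,"VariableNames",["X" "Y"]) % variables X and Y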
16. • You can sort a table on a specific variable using the sortrows function.
– tSort = sortrows(tableName,"SortingVariable")
• To put the top teams at the top of the table, you need to sort in descending order.
• You can use the "descend" option to sort in descending order.
– tSort = sortrows(tableName,"SortingVariable","descend")
17. • To sort by more than one variable, supply the variables in order to the sortrows function as a string array.
– tSort = sortrows(tableName,["var1" "var2"],"descend")
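• For example, sorting the EPL table from earlier by the HomeWins variable used in later slides:
– tSort = sortrows(EPL,"HomeWins","descend"); % teams with the most home wins first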
18. Getting Data into MATLAB
• You can use the Import Tool to import many types of data interactively.
• In MATLAB, you can interactively import data files having several formats such as TXT, CSV, XLS, XLSX, JPG, PNG, etc.
• In this lesson, you will load, modify, save, and clear data in MATLAB.
19. Getting Data into MATLAB
• In the Import Tool, you need to do three things:
1. Select the data to load. The cells that will be loaded are highlighted. Yellow shading means there is a missing value, which will be imported as NaN, or not-a-number.
2. Specify how you want to load the dataset. Should it be a table, a set of column vectors, a matrix, or text data?
3. Click Import Selection when you are ready.
20. Importing Data with the Import Tool
• You can import gasprices.csv as a matrix using the Import Tool in three steps.
1. Select the cells with gas prices. Here they are shaded.
2. Change the Output Type to Numeric Matrix.
3. Click Import Selection.
21. Extracting Part of an Array
• The data is currently all stored in a single array.
• The first column represents the years; the remaining columns are the prices.
• You can interactively extract parts of an array by clicking and dragging to select elements, right-clicking to bring up the context menu, then selecting New Variable from Selection.
• This creates a new variable with a default name. You can rename variables in the Workspace by right-clicking and selecting Rename from the context menu.
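• The same split can also be done programmatically (a sketch; gasData is a hypothetical name for the imported matrix):
– years = gasData(:,1); % first column: years
– prices = gasData(:,2:end); % remaining columns: prices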
22. Save variables to a MAT-file
• You can use the save command to save variables to a MAT-file.
• >> save fileName
• >> save fileName var1 var2
• These commands both save variables in the workspace to a MAT-file named fileName.mat.
• The first command saves all variables currently in the workspace. The second saves only var1 and var2.
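• A minimal round trip (the file and variable names here are hypothetical):
– save results x y % writes results.mat containing only x and y
– clear
– load results % restores x and y into the workspace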
26. Example
• You can create a subset of the original table using regular array indexing with parentheses.
• winningTeams = EPL(1:4,1)
• winningTeams =
Team
___________________
"Leicester City"
"Arsenal"
"Manchester City"
"Manchester United"
27. What will be the Result?
• A = EPL(1:6,:)
• B = EPL(:,[1 2 7])
• C = EPL(2:4,[1 2 3 7 8])
• D = EPL([1:4 18],[1 2 3 7 8])
• E = EPL([18 4:-1:1],:)
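• For reference, here is what each expression selects (a sketch; indexing a table with parentheses always returns another table):
– A = EPL(1:6,:) % rows 1-6, all variables
– B = EPL(:,[1 2 7]) % all rows; variables 1, 2, and 7
– C = EPL(2:4,[1 2 3 7 8]) % rows 2-4; variables 1, 2, 3, 7, and 8
– D = EPL([1:4 18],[1 2 3 7 8]) % rows 1-4 plus row 18; the same five variables
– E = EPL([18 4:-1:1],:) % row 18, then rows 4 down to 1, all variables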
28. Index using Variable Name
• When indexing into a table, it's often easier to remember a variable name as opposed to figuring out the specific column number.
• So, as an alternative to numeric indexing, you can index using the variable name in double quotes.
• hmWins = EPL(:,"HomeWins");
29. Select Multiple Variables
• It would be easier to compare the home and away wins if the team names were included.
• You can select multiple variables by name using a string vector of variable names as input.
• wins = EPL(:,["HomeWins" "AwayWins"]);
30. Indexing by Number and Name
• You can also index into a table using a combination of indexing by number and name.
• fhw = EPL(2:2:8,["Team" "HomeWins"]);
31. Specialized data
• When you use readtable to bring your data into MATLAB, dates are often automatically detected and brought in as datetime arrays.
• A datetime array makes date and time data easier to work with, because many functions are designed to handle them, such as sortrows and plot.
• For instance, if you tried to sort dates stored in a string array, the sorting would be alphabetical.
• December would come before January, and you probably meant to sort chronologically.
40. Create datetime
• seasonStart = datetime(2015,8,8)
• seasonEnd = datetime(2016,5,17)
• seasonLength = seasonEnd - seasonStart
41. Convert HH:MM:SS into days
• The returned length-of-time value is called a duration and is given in hours.
• You can convert this to a more readable number, like days, using the days function.
• seasonLength = days(seasonLength)
42. Output in days
• The returned value is now a number rather than a duration. The days function will convert the input value from a duration to a number or vice versa, depending on the input.
• seasonLength = days(seasonLength)
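• A minimal sketch of both directions, using the season dates from the earlier slide:
– seasonLength = datetime(2016,5,17) - datetime(2015,8,8); % duration, displayed in hours
– nDays = days(seasonLength) % numeric: 283
– d = days(283) % back to a duration of 283 days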
55. normalize()
• One of the most common ways to normalize data is to shift it so that its mean is centered on zero (i.e., the data has zero mean) and scale it so that its standard deviation is one.
• This is called the z-score of the data.
• To normalize data using z-scores, you can use the normalize function.
– xNorm = normalize(X)
• By default, normalize acts on the columns of array X.
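• A minimal sketch (X is hypothetical) showing that the default method is the z-score, applied column by column:
– X = [1 10; 2 20; 3 30];
– xNorm = normalize(X); % same result as (X - mean(X)) ./ std(X)
– % each column of xNorm now has zero mean and standard deviation one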
56. isnan()
• Instead of ==, you can use the isnan function to identify NaN values; NaN is never equal to anything, including itself, so == cannot find it. The isnan function takes an array as input and returns a logical array of the same size.
57. ismissing()
• The isnan function is used to identify missing values in numeric data types, where missing values are denoted as NaN values.
• The ismissing function is more general and identifies missing values in other data types as well.
58. nnz()
• Remember that the nnz function counts the number of non-zero elements in a logical array.
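• A minimal sketch (v is hypothetical) that combines these functions to count missing values:
– v = [1 NaN 3 NaN 5];
– idx = isnan(v); % logical [0 1 0 1 0]
– nMissing = nnz(idx) % 2
– nnz(ismissing(v)) % also 2; ismissing additionally handles non-numeric types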
59. omitnan
• Some functions allow you to skip, or ignore, missing data.
• For instance, the mean and prod functions accept the "omitnan" flag.
– mean(v,"omitnan")
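• Continuing the sketch above:
– mean(v) % NaN, because a single NaN propagates through the calculation
– mean(v,"omitnan") % 3, the mean of [1 3 5]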
60. • Sometimes a missing value has a specific meaning, like a measurement of 0.
• You can use the logical vector that identifies missing data to access and change those elements.
– data(idxMissing) = 42
• You can also give ismissing a second input listing the values that should be treated as missing indicators.
– idx = ismissing(x,[NaN -999])
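• A minimal end-to-end sketch (x is hypothetical; -999 is a common sentinel for a missing reading):
– x = [1 -999 3 NaN 5];
– idx = ismissing(x,[NaN -999]); % treat both NaN and -999 as missing
– x(idx) = 0 % replace the missing values with 0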