Data Mining with R Summer School (R ile Veri Madenciliği Yaz Okulu), 07–13 September 2015, Muğla,
TOVAK INTERNATIONAL MARMARIS ACADEMY (TOVAK ULUSLARARASI MARMARİS AKADEMİSİ)
DATA VISUALIZATION
WITH R PACKAGES
FATMA ÇINAR, MBA, CAPITAL MARKETS BOARD OF TURKEY
E-mail: fatma.cinar@spk.gov.tr @fatma_cinar_ftm @TRUserGroup
Kutlu MERİH, PhD, e-mail: kutmerih@gmail.com @cortexien
https://www.riskonomi.com
Visualization of multidimensional, multifactorial big data:
big data is not just large data; big data is complex data.
What is big data?
How big is big data? (Big data humour)
• We are training to decipher this complexity through data visualization.
• Data visualization packages of the R software: lattice and ggplot2.
Agenda
• What is data analysis?
• Why use a programming language?
• Why use R?
• Why the lattice package?
• What is the graphics model (Trellis graphics) of the lattice package?
• Why ggplot2?
• What is the ggplot2 grammar of graphics?
• Case study: BRSA NUTS and Sectoral Loans Default Chart of Turkey
Sectoral Loans Dataset: Graphics Data-Mining Analysis
Action
• Real-time interactive data management for effect and response analysis
Technique:
• Lattice and ggplot2 graphical packages using R software
library(lattice)
library(ggplot2)
# This example uses the ENGTOVAKLOANS dataset, loaded into R as `dataset`.
# ENGTOVAKLOANS does not ship with ggplot2; it has to be loaded separately.
names(dataset)
Wednesday, September 02, 2015
> names(dataset)
 [1] "NYEAR" "SYEAR" "QUARTERS"
 [4] "CITY" "CITYCODE" "NREGION"
 [7] "REGION" "NUTS3CODE" "NUTS2CODE"
[10] "NUTS1CODE" "TRNUTS1REGION" "NUTS1REGION"
[13] "TRGROUP" "SECTORAL" "CASHLOANS"
[16] "NONCASHLOANS" "TOTALCASHLOANS" "AUTO"
[19] "MORTGAGE" "OVERDRAFTACCOUNT" "CREDITCARDS"
[22] "FOOD" "BUILDING" "MINERALS "
[25] "FINANCIAL" "TEXTILE" "WHOSESALE "
[28] "TOURISM" "AGRICULTURE" "ENERGY"
[31] "MARITIME" "OTHERCONSUMER" "DEFRECEIVABLE"
[34] "DEFCREDITCARDS" "DEFAUTO" "DEFMORTGAGE"
[37] "DEFOTHERCONSUMER" "DEFFOOD" "DEFBUILDING"
[40] "DEFMINERALS" "DEFFINANCIAL" "DEFTEXTILE"
[43] "DEFWHOLESALE " "DEFTOURISM" "DEFAGRICULTURE"
[46] "DEFENERGY" "DEFMARITIME" "NONCASHFOOD"
[49] "NONCAHBUILDING" "NONCASHMINERALS" "NONFINANCIAL"
[52] "NONCASHTEXTILE" "NONCASHWHOLESALE " "NONCASHTOURISM"
[55] "NONCASHAGRICULTURE" "NONCASHENERGY" "NONCASHMARITIME"
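Four of the column names listed above end in a space ("MINERALS ", "WHOSESALE ", "DEFWHOLESALE ", "NONCASHWHOLESALE "), so in formulas and aes() calls they can only be written with backticks. A minimal sketch of cleaning such names with base R's trimws(), assuming a hypothetical two-column toy data frame in place of ENGTOVAKLOANS (the values are made up):

```r
# Hypothetical toy data frame: two column names copied from the listing
# above, including the trailing space in "MINERALS ". Values are made up.
toy <- data.frame("ENERGY" = c(10, 20), "MINERALS " = c(1, 2),
                  check.names = FALSE)

# With the trailing space, the column is only reachable with backticks.
print(toy$`MINERALS `)

# Remove leading/trailing whitespace from every column name.
names(toy) <- trimws(names(toy))
print(names(toy))
```

Renaming changes the identifiers used later (for example `MINERALS` instead of `MINERALS `), so it is best done once, right after loading the data and before any plotting code.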
NUTS-1: 12 Regions of Turkey
• MEDITERRANEAN
• SOUTHEAST ANATOLIA
• AEGEAN REGION
• NORTHEAST ANATOLIA
• MIDDLE ANATOLIA
• WEST BLACK SEA
• WEST ANATOLIA
• EAST BLACK SEA
• WEST MARMARA
• MIDDLE EAST ANATOLIA
• ISTANBUL
• EAST MARMARA
• NUTS-1: 12 Regions
• NUTS-2: 26 Subregions
• NUTS-3: 81 Provinces
(Nomenclature of Territorial Units for Statistics, NUTS)
[Figure: NUTS hierarchy of Turkey, showing each NUTS-1 region with its NUTS-2 subregions and NUTS-3 provinces. Provinces per region: İstanbul 1, West Marmara 5, Aegean 8, East Marmara 8, West Anatolia 3, Mediterranean 8, Middle Anatolia 8, West Black Sea 10, East Black Sea 6, Northeast Anatolia 7, Middle East Anatolia 8, Southeast Anatolia 9]
1. Lattice Graphics Package
• How to create basic plots (xyplot scatterplots, histograms, box-and-whisker plots, dot plots and bar charts) with lattice functions
• Setting vs. mapping
• How to add grouping and conditioning factors to a numerical relationship
Wednesday, 7 October 2015
1.1. XYPlot Graphic Module
library(lattice)
# Template: numeric vs numeric, one panel per FACTOR1 level, colours by FACTOR2
p <- xyplot(NUMERIC ~ NUMERIC | FACTOR1, groups = FACTOR2, data = dataset)
p   # typing p at the console draws the plot; use print(p) inside scripts
# Template: panels by two factors, colours by a third
p <- xyplot(NUM ~ NUM | FAC1 + FAC2, groups = FAC3, data = dataset)
p
p <- xyplot(DEFENERGY ~ ENERGY | CITY + factor(NYEAR), groups = SECTORAL,
            data = dataset)
p
Description of XYPlot Graphs
p <- xyplot(DEFENERGY ~ ENERGY | CITY + factor(NYEAR), groups = SECTORAL, data = dataset)
XYPlot graph of the lattice package for 2 numerical variables and 3 factors
p <- xyplot(DEFENERGY ~ ENERGY | SECTORAL + factor(NYEAR), groups = NUTS1REGION,
            data = dataset)
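The xyplot() calls above need the ENGTOVAKLOANS data. A self-contained sketch with a hypothetical toy data frame (two cities, three years, two sectors, randomly generated amounts) shows the same structure: conditioning factors after `|` create the panels, and `groups =` colours the points within each panel.

```r
library(lattice)

# Hypothetical toy data standing in for ENGTOVAKLOANS: two cities,
# three years, two sectors; all amounts are randomly generated.
set.seed(42)
toy <- data.frame(
  CITY     = factor(rep(c("CityA", "CityB"), each = 30)),
  NYEAR    = rep(2012:2014, times = 20),
  SECTORAL = factor(rep(c("Sector1", "Sector2"), times = 30)),
  ENERGY   = runif(60, min = 1, max = 100)
)
toy$DEFENERGY <- toy$ENERGY * runif(60, min = 0.01, max = 0.10)

# One panel per CITY x year combination; points coloured by SECTORAL.
p <- xyplot(DEFENERGY ~ ENERGY | CITY + factor(NYEAR),
            groups = SECTORAL, data = toy, auto.key = TRUE)

# Panel layout: number of levels of each conditioning variable.
print(dim(p))
```

Typing `p` at the console, or `print(p)` in a script, draws the resulting 2 × 3 grid of panels.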
1.1.1. XYPlot Graphic Module and Legend
library(lattice)
# Template
p <- xyplot(NUMERIC ~ NUMERIC | FACTOR1, groups = FACTOR2, data = dataset)
# Both axes on a log10 scale (note: log10(0) is -Inf)
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL + factor(NYEAR),
            groups = NUTS1REGION, data = dataset)
p
# Same plot with a bordered legend (key) generated from the groups
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL + factor(NYEAR),
            groups = NUTS1REGION, auto.key = list(border = TRUE), data = dataset)
# One panel per SECTORAL level, colours by year, with a legend
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY) | SECTORAL,
            groups = factor(NYEAR), auto.key = list(border = TRUE), data = dataset)
[Figures: output of the three xyplot() calls above]
1.2. Histogram Graphic Module
Description of Histogram Graphs
• We begin the factor-based analysis with the simplest graphical form, the histogram.
• Histograms of a single numeric variable split by one factor are the starting point of factor-based graphical analysis.
p <- histogram(~ log10(DEFENERGY) | SECTORAL, data = dataset)
# Box-and-whisker plots of the same variable, by sector and by region
p <- bwplot(SECTORAL ~ log10(DEFENERGY), data = dataset)
p <- bwplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, data = dataset)
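Because log10() turns zero amounts into -Inf, which has no place on a histogram axis, it is safer to keep only positive values before calling histogram() and bwplot(). A minimal sketch on hypothetical random data (log-normal amounts for two sectors, with two zeros inserted on purpose):

```r
library(lattice)

# Hypothetical toy data: log-normal amounts for two sectors, plus two zeros.
set.seed(7)
toy <- data.frame(
  SECTORAL  = factor(rep(c("Sector1", "Sector2"), each = 50)),
  DEFENERGY = c(rlnorm(50, meanlog = 2), rlnorm(50, meanlog = 4))
)
toy$DEFENERGY[c(1, 60)] <- 0

# log10(0) is -Inf, so drop non-positive amounts first.
pos <- subset(toy, DEFENERGY > 0)

# One histogram panel per sector, then box-and-whisker plots by sector.
h <- histogram(~ log10(DEFENERGY) | SECTORAL, data = pos)
b <- bwplot(SECTORAL ~ log10(DEFENERGY), data = pos)
```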
1.3. DotPlot Graphic Module
# Template: p <- dotplot(FACTOR1 ~ NUMERIC | FACTOR2, groups = FACTOR3, data = dataset)
p   # press Enter to draw the plot
# ********** CITY **********
p <- dotplot(CITY ~ DEFENERGY | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(CITY ~ log10(DEFENERGY) | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(NUTS1REGION ~ DEFENERGY | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | CITY, groups = factor(NYEAR), data = dataset)
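With 81 provinces and several years, dotplot() draws one symbol per data row, so many symbols can pile up on the same line. One option, sketched here as an assumption rather than the authors' workflow, is to aggregate first (below: the sum over years for each CITY and SECTORAL) and plot the totals. The data are hypothetical:

```r
library(lattice)

# Hypothetical toy data: 3 cities x 2 sectors x 3 years, random amounts.
set.seed(3)
toy <- expand.grid(CITY = c("CityA", "CityB", "CityC"),
                   SECTORAL = c("Sector1", "Sector2"),
                   NYEAR = 2012:2014)
toy$DEFENERGY <- runif(nrow(toy), min = 1, max = 50)

# Sum DEFENERGY over the years for every CITY x SECTORAL combination.
tot <- aggregate(DEFENERGY ~ CITY + SECTORAL, data = toy, FUN = sum)

# One dot per city in each sector panel.
p <- dotplot(CITY ~ DEFENERGY | SECTORAL, data = tot)
```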
Description of DotPlot Graphs
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, groups = factor(NYEAR), data = dataset)
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | NUTS1REGION, groups = factor(NYEAR),
             auto.key = list(border = TRUE), data = dataset)
p <- dotplot(SECTORAL ~ DEFENERGY | CITY, groups = factor(NYEAR), data = dataset)
p <- dotplot(SECTORAL ~ log10(DEFENERGY) | CITY, groups = factor(NYEAR), data = dataset)
p <- dotplot(CITY ~ DEFENERGY | SECTORAL, groups = factor(NYEAR), data = dataset)
p <- dotplot(CITY ~ log10(DEFENERGY) | SECTORAL, groups = factor(NYEAR), data = dataset)
2. ggplot2 Graphics Package
• How to create basic plots (scatterplots, histograms, balloon, facet, density and violin graphs) with ggplot()
• Setting vs. mapping
• How to add extra variables with aesthetics (like color, shape, and size) or faceting
• https://plot.ly/ggplot2/geom_bar/
What is ggplot2?
• The grammar of graphics represents an abstraction of graphics ideas/objects
• Think ‘verb’, ‘noun’, ‘adjective’ for graphics
• Allows for a ‘theory’ of graphics on which to build new graphics and graphics objects
• ‘Shorten the distance from mind to page’
Grammar of Graphics?
‘In brief, the grammar tells us that a statistical graphic
is a mapping from data to aesthetic attributes (color,
shape, size) of geometric objects (points, lines, bars).
The plot may also contain statistical transformations
of the data and is drawn on a specific coordinate system.’
Hadley Wickham
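2.1. Logarithm Module
library(ggplot2)
ds <- ggplot(dataset)
# Template: as <- aes(NUMERIC, NUMERIC, color = FACTOR)
# Map the raw values and let scale_x_log10()/scale_y_log10() transform the axes;
# also wrapping them in log10() would take the logarithm twice.
as <- aes(ENERGY, DEFENERGY, color = SECTORAL)
gp <- geom_point()
lx <- scale_x_log10()
ly <- scale_y_log10()
p <- ds + as + gp + lx + ly
p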
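Wrapping the mapped variables in log10() and also adding scale_x_log10() / scale_y_log10() applies the logarithm twice. The effect can be checked without ggplot2. A small base-R sketch on made-up amounts:

```r
# Made-up amounts spanning several orders of magnitude, including a zero.
x <- c(0, 0.5, 1, 5, 500)

once  <- log10(x)                          # a single log transform
twice <- suppressWarnings(log10(once))     # log10 applied a second time

print(rbind(x, once, twice))
```

After the double transform, amounts at or below 1 become NaN or -Inf and amounts between 1 and 10 become negative; only amounts above 10 stay positive, so much of the data can silently fall off a log axis.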
How to add extra variables with aesthetics (like color, shape, and size)
# Template: as <- aes(NUMERIC, NUMERIC, color = FACTOR, shape = factor(NUMERIC), size = NUMERIC)
as <- aes(ENERGY, DEFENERGY, color = NUTS1REGION, shape = factor(NYEAR),
          size = DEFRECEIVABLE)
gp <- geom_point()
ds <- ggplot(dataset)
p <- ds + as + gp
p
2.1.1. Balloon Graphic Module
Description of Balloon Graphs
Balloon graphs in the ggplot2 package show three-dimensional relations, split by one to three factors, in scatterplot form.
In this type of graph, a two-dimensional numerical relation is drawn under the effect of a third numerical variable, which sets the size of each point (balloon).
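Readers tend to judge a balloon by its area, so a common convention is to make the area, not the radius, proportional to the third variable; ggplot2's default scale_size() is area-based in this sense. A minimal lattice sketch of that convention, assuming hypothetical toy data and a hand-picked maximum symbol size (cex) of 4:

```r
library(lattice)

# Hypothetical toy data: ENERGY, DEFENERGY and a third amount DEFRECEIVABLE.
set.seed(11)
toy <- data.frame(ENERGY = runif(20, min = 1, max = 100))
toy$DEFENERGY     <- toy$ENERGY * runif(20, min = 0.01, max = 0.1)
toy$DEFRECEIVABLE <- runif(20, min = 5, max = 500)

# Symbol size (cex) grows with the square root of the value,
# so the area of each balloon is proportional to DEFRECEIVABLE.
max_cex <- 4
toy$cex <- max_cex * sqrt(toy$DEFRECEIVABLE / max(toy$DEFRECEIVABLE))

# Open circles (pch = 1) so that overlapping balloons stay visible.
p <- xyplot(log10(DEFENERGY) ~ log10(ENERGY), data = toy,
            cex = toy$cex, pch = 1)
```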
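as <- aes(log10(ENERGY), log10(DEFENERGY), color = factor(NYEAR),
          shape = SECTORAL, size = DEFRECEIVABLE)
# Drop zero values first: log10(0) is -Inf
dataset <- subset(dataset, ENERGY != 0 & DEFENERGY != 0)
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = SECTORAL)
gp <- geom_point()
# Create the plot object after subsetting, so that it uses the filtered data
ds <- ggplot(dataset)
# Linear fit on the log10 values (ggplot2 < 3.4.0: use size = 2)
ss <- stat_smooth(method = "lm", formula = y ~ x, linewidth = 2)
p <- ds + ae + gp + ss
p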
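2.1.2. PowerLaw Graphic Module
Description of Power-Law Graphs
A straight line on log10-log10 axes corresponds to a power law, DEFENERGY ≈ a · ENERGY^b, because log10(DEFENERGY) = log10(a) + b · log10(ENERGY); the slope of the fitted line estimates the exponent b.
# ggplot2 < 3.4.0: use size = 2 instead of linewidth = 2
ss <- stat_smooth(method = "lm", formula = y ~ x, linewidth = 2)
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = SECTORAL)
p <- ds + ae + gp + ss
ae <- aes(log10(ENERGY), log10(DEFENERGY), color = NUTS1REGION)
p <- ds + ae + gp + ss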
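A power law y = a · x^b becomes the straight line log10(y) = log10(a) + b · log10(x), so an ordinary linear fit on the log10 values, the same fit stat_smooth(method = "lm") draws, estimates b as its slope. A base-R sketch, assuming made-up data generated from DEFENERGY = 0.05 · ENERGY^1.2 with small multiplicative noise:

```r
# Hypothetical data following a power law y = a * x^b with noise.
set.seed(2015)
a <- 0.05
b <- 1.2
ENERGY    <- runif(200, min = 10, max = 10000)
DEFENERGY <- a * ENERGY^b * 10^rnorm(200, sd = 0.05)

# Straight line in log10-log10 space: log10(y) = log10(a) + b * log10(x).
fit <- lm(log10(DEFENERGY) ~ log10(ENERGY))

b_hat <- unname(coef(fit)[2])       # slope: estimated exponent b
a_hat <- unname(10^coef(fit)[1])    # back-transformed intercept: estimated a
print(c(a_hat = a_hat, b_hat = b_hat))
```

On real data, the estimated exponent can be compared across the groups that the slides colour by (SECTORAL or NUTS1REGION) by fitting one model per group.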
3. Density Graphic Module
# Template: ad <- aes(NUMERIC, color = FACTOR)
ad <- aes(ENERGY, color = SECTORAL)
# Template: ad <- aes(log10(NUMERIC), fill = FACTOR)
ad <- aes(log10(ENERGY), fill = SECTORAL)
gd <- geom_density()
# Semi-transparent fill, so that overlapping densities stay visible
gd <- geom_density(alpha = 0.5)
ds <- ggplot(dataset)
p <- ds + ad + gd
p
Note: a density graph takes a single numeric variable.
Description of Density Graphs
• Density graphs are a smoothed, continuous version of histograms.
• They plot a single numerical variable against its estimated density (relative frequency).
• They reveal single or multiple peaks, which help pinpoint the influential factors.
• Superposing the density graphs of each factor level in a different color shows the effect of that factor.
• On a logarithmic scale, financial data, which are often strongly right-skewed, give more stable density shapes.
ad <- aes(log10(ENERGY), fill = SECTORAL)
p <- ds + ad + gd
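geom_density() computes its curves with R's kernel density estimator, stats::density(). A base-R sketch of what the superposed curves compare, assuming two hypothetical factor levels whose log10 amounts are centred at 2 and 3: the x-position of each curve's maximum locates that level's peak.

```r
# Hypothetical log10 amounts for two sector levels with different centres.
set.seed(5)
logA <- rnorm(500, mean = 2, sd = 0.3)   # level "Sector1"
logB <- rnorm(500, mean = 3, sd = 0.3)   # level "Sector2"

# Kernel density estimate for each level (Gaussian kernel by default).
dA <- density(logA)
dB <- density(logB)

# Location of each density peak (mode) on the log10 scale.
peakA <- dA$x[which.max(dA$y)]
peakB <- dB$x[which.max(dB$y)]

# The estimated density integrates to approximately one (Riemann sum).
areaA <- sum(diff(dA$x) * head(dA$y, -1))
print(c(peakA = peakA, peakB = peakB, areaA = areaA))
```

A one-unit shift between the peaks on the log10 scale corresponds to a tenfold difference in the typical amount.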
[Figure: NUTS Aegean Region; log10 ENERGY vs log10 default energy (DEFENERGY), balloon size DEFRECEIVABLE, explained by sectoral and year factors; density/violin graphics]
3.1. Density Bar Graphic Module
# Template: ad <- aes(NUMERIC, color = FACTOR)
ad <- aes(ENERGY, color = SECTORAL)
# Template: ab <- aes(log10(NUMERIC), fill = FACTOR)
ab <- aes(log10(ENERGY), fill = SECTORAL)
# Since ggplot2 2.0.0, geom_bar() counts each distinct value;
# geom_histogram() bins a continuous variable into bars.
gbd <- geom_histogram(position = "dodge")   # bars side by side
gbs <- geom_histogram(position = "stack")   # bars stacked
ds <- ggplot(dataset)
p <- ds + ab + gbs
p
p <- ds + ab + gbd
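A binned bar graph first cuts the variable into bins and counts observations per bin and fill level; position = "stack" piles those counts on top of each other, while position = "dodge" places them side by side. A base-R sketch of the counting step, assuming hypothetical data and arbitrarily chosen bins 0.5 wide:

```r
# Hypothetical log10 amounts for two sector levels.
set.seed(9)
toy <- data.frame(
  SECTORAL = factor(rep(c("Sector1", "Sector2"), each = 40)),
  logE     = c(rnorm(40, mean = 2, sd = 0.4), rnorm(40, mean = 2.8, sd = 0.4))
)

# Bins 0.5 wide that cover the whole range of the data.
breaks <- seq(floor(min(toy$logE)), ceiling(max(toy$logE)), by = 0.5)
toy$bin <- cut(toy$logE, breaks = breaks, include.lowest = TRUE)

# Counts per bin (rows) and sector (columns): the numbers behind the bars.
counts <- table(toy$bin, toy$SECTORAL)
print(counts)

# Stacked bar height in each bin = row total; dodged bars = individual cells.
stacked_height <- rowSums(counts)
```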
4. Facet Graphic Module
f <- facet_grid(FACTOR ~ NUMERIC)
f <- facet_grid(NUTS1REGION ~ NYEAR)
f <- facet_grid(SECTORAL ~ NYEAR)
f <- facet_grid(NYEAR ~ SECTORAL)   # ***
ds <- ggplot(dataset)
gv <- geom_violin()
gp <- geom_point()
gd <- geom_density()
p <- ds + as + gp + f
p <- ds + as + gv + f
p <- ds + as + gd + f
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL, color = NUTS1REGION)
f <- facet_grid(NYEAR ~ SECTORAL)
# Add lx and ly one at a time: two scale objects cannot be added to each other
p <- ds + av + gp + lx + ly + f
p <- ds + av + gp + f
Description of Facet Graphs
• Facet graphs in the ggplot2 package show three-dimensional graphs, split by three factors, in matrix form.
• They reveal in which year, which region and which period anomalies occur.
• Here we investigate default energy loans against energy loans, with balloons sized by total loans, according to region, year and period factors.
• Colors show the period; balloons show total cash loans.
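Before reading anomalies off a facet matrix, it helps to know how many observations each panel holds, since a sparse or empty panel can look like an anomaly. A minimal base-R sketch, assuming hypothetical toy data: table() on the two facet_grid() factors gives the panel-by-panel counts for NYEAR ~ SECTORAL.

```r
# Hypothetical toy data: every year x sector combination five times ...
toy <- expand.grid(NYEAR = 2012:2014,
                   SECTORAL = c("Sector1", "Sector2", "Sector3"))
toy <- toy[rep(seq_len(nrow(toy)), times = 5), ]
# ... except 2014 / Sector3, removed to create an empty panel on purpose.
toy <- subset(toy, !(NYEAR == 2014 & SECTORAL == "Sector3"))

# Rows of the table = facet rows (NYEAR), columns = facet columns (SECTORAL).
cells <- table(NYEAR = toy$NYEAR, SECTORAL = toy$SECTORAL)
print(cells)

# Row/column positions of panels that have no data at all.
empty <- which(cells == 0, arr.ind = TRUE)
```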
4.1. Facet Violin Graphic Module
f <- facet_grid(FACTOR ~ NUMERIC)
f <- facet_grid(NUTS1REGION ~ NYEAR)
f <- facet_grid(SECTORAL ~ NYEAR)
f <- facet_grid(NYEAR ~ SECTORAL)   # ***
ds <- ggplot(dataset)
gv <- geom_violin()
gp <- geom_point()
gd <- geom_density()
p <- ds + av + gv + lx + ly + f
p <- ds + as + gv + f
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL)
f <- facet_grid(NYEAR ~ SECTORAL)
p <- ds + av + gv + f
5. Violin Graphic Module
• Subset: remove zero values, then build the plot object from the filtered data
dataset <- subset(dataset, ENERGY != 0)
dataset <- subset(dataset, DEFENERGY != 0)
ds <- ggplot(dataset)
• Check the size of the subset
m <- nrow(dataset)
m
[1] 3046 …
• Violin plot with jittered points
av <- aes(ENERGY, DEFENERGY, fill = SECTORAL)
gv <- geom_violin()
gj <- geom_jitter()
p <- ds + av + gv + gj + lx + ly
p
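Counting rows before and after the filter shows how much of the data the violin plot actually uses. A base-R sketch on a hypothetical eight-row data frame (the 3046 figure above comes from the real dataset and is not reproduced here):

```r
# Hypothetical toy data with some zero amounts.
toy <- data.frame(
  ENERGY    = c(0, 12, 35, 0, 80, 150, 9, 40),
  DEFENERGY = c(1,  0,  2, 0,  5,  11, 1,  3)
)

n_before <- nrow(toy)

# Keep rows where both amounts are non-zero (log scales need positive values).
toy <- subset(toy, ENERGY != 0 & DEFENERGY != 0)

n_after <- nrow(toy)
cat("rows kept:", n_after, "of", n_before, "\n")
```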
Description of Violin Graphs
• Violin graphs can be seen as two-dimensional density graphs.
• Violin graphs usually come in mushroom, potter and bottle formations.
• Violin graphs are very important for risk analysis of financial data.
• Around the centre line at each X position, the density of Y is drawn together with its mirror copy.
• A mushroom formation represents a risk concentration on the high-order values of financial data.
• A potter formation means risk on the medium orders, and a bottle formation means risk on the lower orders.
av <- aes(ENERGY, DEFENERGY, fill = NUTS1REGION)
p <- ds + av + gv + gj + ly
av <- aes(log10(ENERGY), log10(DEFENERGY), fill = NUTS1REGION)
# The values are already log10-transformed, so no log scales are added here
p <- ds + av + gv
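A violin outline is a kernel density of the Y values drawn sideways and mirrored. The sketch below, on hypothetical data, builds that outline with density() and adds a hypothetical helper, peak_position(), that reports where the widest part of the violin lies as a fraction of the Y range (near 1 = top, near 0 = bottom). Reading that fraction as a mushroom, potter or bottle formation is an assumption for illustration, not a rule stated in the slides.

```r
# Hypothetical helper (not part of any package): relative height of the
# density peak within the range of y, 0 = lowest value, 1 = highest value.
peak_position <- function(y) {
  d <- density(y)
  peak <- d$x[which.max(d$y)]
  (peak - min(y)) / (max(y) - min(y))
}

# Mirrored outline of a violin: half-widths -density and +density at each y.
violin_outline <- function(y) {
  d <- density(y)
  data.frame(y = d$x, left = -d$y, right = d$y)
}

# Hypothetical log10 amounts concentrated near the top of their range.
set.seed(8)
y_top <- c(rnorm(300, mean = 4, sd = 0.2), runif(30, min = 1, max = 4))

outline <- violin_outline(y_top)
pos <- peak_position(y_top)
print(pos)
```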
I would like to express my deep gratitude to
Dr. Kutlu MERİH and
Dr. C. Coşkun KÜÇÜKÖZMEN
for their valuable contributions.
Fatma ÇINAR
Contact
kutmerih@gmail.com
kutlu@merih.net
coskun.kucukozmen@ieu.edu.tr
http://www.ieu.edu.tr/tr
coskunkucukozmen@gmail.com
http://www.coskunkucukozmen.com
fatma.cinar@spk.gov.tr
http://www.spk.gov.tr/
http://www.riskonomi.com
@TRUserGroup
@CORTEXIEN
@Riskonometri
@Riskonomi
@datanalitik
@Riskanalitigi
@RiskLabTurkey
@fatma_cinar_ftm
tr.linkedin.com/in/fatmacinar
tr.linkedin.com/pub/kutlu-merih
tr.linkedin.com/in/coskunkucukozmen