Hands on Text Analytics with Orange
Ajda Pretnar
ajda.pretnar@fri.uni-lj.si
University of Ljubljana, Slovenia

Niko Colnerič
niko.colneric@fri.uni-lj.si
University of Ljubljana, Slovenia

Lan Žagar
lan.zagar@fri.uni-lj.si
University of Ljubljana, Slovenia

Orange for Text Analytics
In recent years, the digital humanities community has been introduced to many powerful tools for text analysis, but few of them combine powerful data mining and machine learning algorithms with a simple yet capable user interface. For flexible and creative analysis, researchers need a tool that focuses on intuition, visualization and interactivity.

This workshop will introduce participants to Orange, a visual programming environment for data mining, suitable for both beginners and experts. Particular emphasis will be placed on its Text add-on, which offers components for text mining, visualization and deep-learning-based embedding.

This is a hands-on workshop, where the participants will actively construct analytical workflows and go through case studies with the help of the instructors. They will learn how to manage textual data, preprocess it, and use machine learning, data projection and visualization techniques to expose hidden patterns and evaluate the resulting models. At the end of the workshop, the participants will know how to use visual programming to construct powerful data analysis workflows, which can be applied to a wide range of challenges in digital humanities.
Structure of the Workshop
Part 1: Visual programming, workflows, data input and preprocessing
First, we will show the basics of Orange: how to load, inspect and visualize data. Participants will be introduced to several options for data import, from the standard Corpus to Twitter, Guardian and Text Import. Once the corpus is loaded, we will preprocess it and display the result in a word cloud. Particular emphasis will be placed on custom preprocessing techniques and how to apply them to the corpus. We will observe the results of each technique in an interactive word cloud and in concordances.
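The preprocessing steps described above can be sketched outside Orange in plain Python. This is only an illustration, not the workshop's workflow: the stop-word list, the two-sentence corpus and the function names are all hypothetical. The sketch lowercases and tokenizes the text, drops stop words, counts the remaining words (the counts a word cloud would use to size words) and lists keyword-in-context hits in the manner of a concordance.

```python
import re
from collections import Counter

# Hypothetical stop-word list and corpus, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stop_words]


def word_frequencies(documents):
    """Count tokens over all documents; a word cloud sizes words by such counts."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


def concordance(documents, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for doc in documents:
        # Stop words are kept here so the context reads naturally.
        tokens = re.findall(r"[a-z']+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits


corpus = [
    "The fox is in the forest.",
    "A fox and a hare ran to the river.",
]
frequencies = word_frequencies(corpus)
print(frequencies.most_common(3))
print(concordance(corpus, "fox"))
```

Changing the stop-word list or the tokenization pattern and re-running the count mirrors, in a small way, how each preprocessing choice changes what the word cloud shows.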
	
Figure 1: Preprocessing results displayed in a word cloud
Part 2: Machine learning and deep-learning-based embedding for predictive analysis
Next, we will use Twitter data to construct an author prediction pipeline and test some classifiers. We will fetch author timelines from Twitter and observe the retrieved corpus. This time we will introduce a pre-trained tweet tokenizer and pass the preprocessed corpus through a bag of words. We will discuss bag of words parameters and how best to prepare the data for further analysis. We will observe the results of using different parameters in a data table to understand the underlying data structures. For comparison, we will use deep-learning-based embedding to derive vector representations of tweets and in this way enable machine learning.
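To make the bag-of-words step concrete, here is a minimal plain-Python sketch. It is an assumption-laden illustration rather than Orange's implementation: the regular-expression tokenizer is a rough stand-in for the pre-trained tweet tokenizer, the `binary` and `use_idf` options are hypothetical examples of the kind of parameters one might vary, and the tweets are invented.

```python
import math
import re
from collections import Counter


def tokenize(tweet):
    """Very rough tweet tokenizer: keeps words, @mentions and #hashtags."""
    return re.findall(r"[@#]?\w+", tweet.lower())


def bag_of_words(documents, binary=False, use_idf=False):
    """Return (vocabulary, matrix) with one row of term weights per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            value = counts[term]
            if binary:
                value = 1 if value else 0
            if use_idf:
                # Terms that occur in every document get weight 0.
                value *= math.log(n_docs / doc_freq[term])
            row.append(value)
        rows.append(row)
    return vocabulary, rows


tweets = [
    "Great science, more science #dh",
    "@alice great talk today",
]
vocab, counts = bag_of_words(tweets)
_, binary = bag_of_words(tweets, binary=True)
_, tfidf = bag_of_words(tweets, use_idf=True)

print(vocab)
for row in counts:
    print(row)
```

Each row is the kind of record a data table would show: one document per row, one column per term. Comparing `counts`, `binary` and `tfidf` shows how the parameter choice changes the same underlying structure.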
We will explain how machine learning can be used in text mining and introduce a number of techniques for predictive analysis. We will use cross-validation to test the constructed bag of words models and compare classification scores for each algorithm. We will discuss the quality of the constructed models and which scores are usually best for assessing it. Additionally, we will inspect misclassified tweets in a confusion matrix and even further in Corpus Viewer, to leverage the possibilities of a close(r) reading.
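The evaluation loop described above can be sketched as follows. Everything here is a simplifying assumption: the word-profile classifier is a deliberately naive stand-in for the learning algorithms compared in the workshop, the fold assignment is a simple round-robin, and the texts and author names are invented.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists; item i goes to test fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def train_profiles(texts, labels):
    """Sum word counts per author: a deliberately simple stand-in classifier."""
    profiles = defaultdict(Counter)
    for text, label in zip(texts, labels):
        profiles[label].update(text.lower().split())
    return profiles


def predict(profiles, text):
    """Pick the author whose profile shares the most word occurrences."""
    tokens = text.lower().split()

    def score(author):
        return sum(profiles[author][tok] for tok in tokens)

    # Sorting makes ties resolve deterministically (alphabetically first).
    return max(sorted(profiles), key=score)


def cross_validate(texts, labels, k=2):
    """Return accuracy, confusion counts keyed by (true, predicted), and errors."""
    confusion = Counter()
    misclassified = []
    for train, test in k_fold_indices(len(texts), k):
        profiles = train_profiles([texts[i] for i in train],
                                  [labels[i] for i in train])
        for i in test:
            predicted = predict(profiles, texts[i])
            confusion[(labels[i], predicted)] += 1
            if predicted != labels[i]:
                misclassified.append((texts[i], labels[i], predicted))
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(texts), confusion, misclassified


# Hypothetical mini-corpus of "tweets" by two authors.
texts = [
    "budget vote tax", "tax reform senate",
    "goal team match", "team league win",
    "senate tax budget", "vote reform tax",
    "match win goal", "league vote budget",
]
labels = ["author_a", "author_a", "author_b", "author_b"] * 2

accuracy, confusion, misclassified = cross_validate(texts, labels, k=2)
print(accuracy)
print(dict(confusion))
print(misclassified)
```

The `misclassified` list plays the role of selecting a cell in the confusion matrix and reading the underlying texts: it returns the documents themselves, not just the error count.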
Part 3: Data clustering, sentiment analysis, image and geo analytics
In the third part, we will work on geomapping and image analytics. We will transform textual and visual data into feature vectors and plot these data onto a world map to discover interesting relations.

We will discuss how to acquire geolocated data from Twitter and why this is useful. Next, we will apply a pre-trained sentiment analysis model to geotagged Twitter data to obtain the sentiment orientation of each tweet. We will map the sentiment-tagged tweets and explore how to use sentiment together with geomapping.
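As a rough illustration of combining sentiment with geolocation, the sketch below scores each tweet and averages the scores per map grid cell. It rests on several assumptions: a tiny hand-made polarity lexicon stands in for the pre-trained sentiment model, the grid aggregation stands in for an interactive map, and the tweets, coordinates and field names are invented.

```python
# Hypothetical polarity lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def sentiment_score(text, lexicon=LEXICON):
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def tag_sentiment(tweets):
    """Attach a sentiment score to each geotagged tweet record."""
    return [{**t, "sentiment": sentiment_score(t["text"])} for t in tweets]


def mean_sentiment_by_cell(tagged, cell_size=10.0):
    """Average sentiment per lat/lon grid cell, a crude stand-in for a map layer."""
    sums = {}
    for t in tagged:
        # Floor each coordinate to the lower-left corner of its grid cell.
        cell = (t["lat"] // cell_size * cell_size,
                t["lon"] // cell_size * cell_size)
        total, count = sums.get(cell, (0.0, 0))
        sums[cell] = (total + t["sentiment"], count + 1)
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical geotagged tweets.
tweets = [
    {"text": "love this great city", "lat": 46.05, "lon": 14.51},
    {"text": "sad rainy day", "lat": 46.55, "lon": 15.65},
    {"text": "awful traffic again", "lat": 51.51, "lon": -0.13},
]

tagged = tag_sentiment(tweets)
by_cell = mean_sentiment_by_cell(tagged)
for cell, score in sorted(by_cell.items()):
    print(cell, score)
```

Averaging per cell is one simple design choice; plotting each tweet as a colored point would keep more detail at the cost of a busier map.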
Finally, the participants will be introduced to image analytics for humanities research. We will explain why and how to transform raw images into multidimensional vectors and how to work with the resulting data. We will cluster Instagram images into groups and explore how to plot image-containing tweets on a world map. Do images correspond to geolocation? We will see.
	
	
Figure 2: Images from social media are embedded with ImageNet embedding, clustered with Hierarchical Clustering and displayed on a map by their geolocation.
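The clustering step named in the caption can be sketched in plain Python once images have been turned into vectors. The sketch assumes the embedding has already been computed: the three-dimensional vectors below are invented placeholders for the much longer vectors a pre-trained network would produce, and the file names and coordinates are hypothetical. Average linkage with Euclidean distance is one common choice, not necessarily the setting used in the workshop.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[i], vectors[j])
             for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)


def hierarchical_clustering(vectors, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], vectors),
        )
        # a < b, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical images: (name, placeholder embedding, (lat, lon)).
images = [
    ("beach_1.jpg", [0.9, 0.1, 0.0], (43.7, 7.3)),
    ("beach_2.jpg", [0.8, 0.2, 0.1], (36.7, -4.4)),
    ("city_1.jpg", [0.1, 0.9, 0.2], (48.9, 2.4)),
    ("city_2.jpg", [0.0, 0.8, 0.3], (52.5, 13.4)),
]

clusters = hierarchical_clustering([vec for _, vec, _ in images], n_clusters=2)
for label, members in enumerate(clusters):
    for i in members:
        name, _, (lat, lon) = images[i]
        print(label, name, lat, lon)
```

Printing the cluster label next to each image's coordinates gives the raw material for the question the text poses: whether images that look alike also lie close together on the map.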
