Data science tools - A.Marchev and K.Haralampiev

Popular Data Science Tools
Angel Marchev 2.0
Kaloyan Haralampiev

What did we do?
• Over 3000 members
• Cooperation with communities and universities from Europe and Asia
• More than 50 countries
• 25 real cases in 3 Datathons
• 63 superb solutions
• Students and experts with up to 20 years of experience
• 4 years
• Areas: Machine Learning, NLP, Data enrichment, Computer Vision and AI
• Working with SMEs and big companies

What do we dare to do?
Milestones on the timeline (year markers: 2014, Nov 2015, 2016, 2017, 2018; month markers: Feb, Mar, Apr, May, Jun, Sep):
• Our first event
• Two projects with 30 volunteers
• Big data conference
• First Datathon in CEE
• Hack the News #Datathon
• Academia #Datathon
• The First online #Datathon2018
• Hack the Fake News Datathon
• #Datathon2018 v2
Also on the slide:
• Over 50 meetups
• Participation in 8 conferences
• 2 workshops

Past #Datathon2018
• 144 participants
• 39 teams, 9 cases
• 493 chat rooms
• 16563 messages exchanged
• 32 mentors and industry experts
• 24 countries
• The #Datathon2018 participants managed to solve all cases
• There was great fun, with more than 4 fun sessions
• A lot of beer and pizza was consumed
• 38 quality data solutions at the end
• Great results, challenging even for the companies

Impressions
• Milena Yankova, Head of Research & Innovation
• Shashank Shekhar, Manager - Data Sciences
• Agamemnon Baltagiannis, Principal Data Scientist
• Tomislav Križan, CPO, Member of the Board

“The results of our case are impressive and have further motivated our R&D department to explore more opportunities and apply some of the team results that worked on it.”

“The best thing about this Datathon was its global footprint. I was amazed by the sheer enthusiasm that the participants demonstrated. The resilience and adaptability shown by a lot of them in providing a working solution to real life problems made this Datathon a huge success.”

“Thank you all for this great weekend. It was a fantastic challenge and I am happy that I saw deep technical work from all the participants! I will be always here to support the DSS community”

“From all finalists we did see good and novel approach... also those who didn't arrive to finals, were also really close ... so good job to all teams!”

“The teams solutions were well documented in CRISP-DM Methodology at Datathon 2018 organized by DSS, in which Kaufland was proud to participate”

Introduction
• It is impossible to cover all tools, so we cover only the ones we use
• Still, the task is hard, due to:
– Various types of tools (noise in the input data)
– Many criteria (so a multi-dimensional problem)
– Tools for many purposes (overlapping categories)
• Hmmmm..!? Sounds like an ideal case for multi-dimensional scaling (MDS)
• SO LET’S GO FULL NERDY ON IT

MDS Map
Features:
• Application: Statistics; Econometrics; Data mining
• Workflow: Console; Menus; Nodes; Online
• License: Free; Non-free
• Relatedness
• Popularity
• Interactivity
[Figure: MDS map of the tools; the legend covers Free / Non-free and Popularity]
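As a rough illustration of how such a map could be computed, here is a minimal sketch in plain Python (3.8+). The slides do not say which software, distance measure, or MDS variant was used, so everything here is an assumption: the feature sets are a hand-made encoding of a few tools using the Application, Workflow, and License labels above (Relatedness, Popularity, and Interactivity are left out), the dissimilarity is the Jaccard distance, and the layout is found with the SMACOF iteration (Guttman transform), a common way to fit metric MDS.

```python
import math
import random

# Hypothetical encoding of a few tools with the slide's feature labels.
# Which tools to include and how to encode them is an assumption of this sketch.
TOOLS = {
    "Excel Data Analysis": {"Statistics", "Menus", "Non-free"},
    "PSPP": {"Statistics", "Menus", "Free"},
    "Gretl": {"Econometrics", "Menus", "Console", "Free"},
    "Python": {"Statistics", "Econometrics", "Data mining", "Console", "Free"},
    "KNIME": {"Statistics", "Data mining", "Nodes", "Free"},
    "IBM SPSS Modeler": {"Econometrics", "Data mining", "Nodes", "Non-free"},
    "Google Colab": {"Data mining", "Online", "Free"},
}


def jaccard_distance(a, b):
    """0 for identical feature sets, 1 for sets with nothing in common."""
    return 1.0 - len(a & b) / len(a | b)


def stress(points, target):
    """Sum of squared gaps between map distances and target dissimilarities."""
    n = len(points)
    return sum(
        (math.dist(points[i], points[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def smacof_step(points, target):
    """One Guttman-transform update of all points (does not increase the stress)."""
    n = len(points)
    updated = []
    for i in range(n):
        x = y = 0.0
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            ratio = target[i][j] / d if d > 0 else 0.0
            x += ratio * (points[i][0] - points[j][0])
            y += ratio * (points[i][1] - points[j][1])
        updated.append((x / n, y / n))
    return updated


def mds_map(tools, iterations=200, seed=0):
    """Return 2-D coordinates per tool plus the stress before and after fitting."""
    names = list(tools)
    target = [[jaccard_distance(tools[a], tools[b]) for b in names] for a in names]
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in names]
    start = stress(points, target)
    for _ in range(iterations):
        points = smacof_step(points, target)
    return dict(zip(names, points)), start, stress(points, target)


coords, stress_before, stress_after = mds_map(TOOLS)
for name, (x, y) in coords.items():
    print(f"{name:20s} {x:+.2f} {y:+.2f}")
```

Tools that share many features end up close together on the resulting plane. The axes themselves carry no meaning; only the relative distances do.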

MDS Map
“The All-Stars”
“The Classics”
“The User-Friendlies”
“The On-liners”

Excel Data Analysis
• Application: Statistical analysis
• Interface: Menus and windows
• Price: Licensed
• Pros: Availability (almost everybody has Excel)
• Cons: Works with selected cells, not with variable names

IBM SPSS Statistics
• Application: Statistical analysis; Econometric analysis
• Interface: Menus and windows; Command console
• Price: Licensed
• Pros: Very large set of analyses
• Cons: Non-interactive

PSPP
• Price: Free
• Pros: “Free” SPSS Statistics
• Cons: Relatively small set of
analyses; Non-interactive

eViews
• Application: Econometric
analysis
• Interface: Menus and
windows; Command console
• Price: Licensed
• Pros: Efficient calculations
• Cons: Data import issues

Gretl
• Application: Econometric analysis
• Interface: Menus and windows;
Command console
• Price: Free
• Pros: Hansl scripting language; localized user manual
• Cons: Limits on the volume of data

Python
• Application: Statistical analysis;
Econometric analysis; Data
mining
• Interface: Command console
• Price: Free
• Pros: Global community
developing libs
• Cons:

R (+ RStudio)
• Application: Statistical analysis;
Econometric analysis; Data mining
• Price: Free
• Pros: Global community
developing libs
• Cons: The language is a little weird

Jupyter Notebook
• Application: Data mining
• Interface: Online platform
• Price: Free
• Pros: Industry standard for Data Science
• Cons:

MatLab
• Application: Statistical analysis; Econometric analysis
• Price: Licensed
• Pros: Great documentation,
parallel computing
• Cons: Expensive

JASP
• Price: Free
• Pros: Interactive
• Cons: Relatively small set of
analyses

Weka
• Application: Statistical analysis; Data
mining
• Interface: Graphical stream/workflow
• Price: Free
• Pros: One of the original revolutionaries
• Cons: outdated and clumsy

Rapid Miner
• Application: Statistical analysis; Data
mining
• Price: Licensed
• Pros: Probably the most intuitive interface
• Cons:

KNIME
• Application: Statistical analysis; Data mining
• Price: Free
• Cons: Relatively small set of analyses

Orange
• Interface: Graphical
stream/workflow
• Price: Free
• Cons: Relatively small set
of analyses

IBM SPSS Modeler
• Application: Econometric analysis; Data mining
• Price: Licensed
• Pros: Uses resources efficiently
• Cons: Not user-friendly when dealing with many features

MatLab Classification Learner
• Interface: Graphical
stream/workflow
• Price: Licensed
• Pros: part of Matlab
environment
• Cons: Still under development; more models are yet to be included

Microsoft Azure
• Interface: Online
platform
• Price: Licensed
• Pros: Many tools already
available
• Cons: Can be a little hard to set up

IBM Watson Studio
• Price: Licensed
• Pros: brand new
• Cons: Still some compatibility issues

Amazon ML
• Price: Licensed
• Pros: Integrated with AWS S3 and can work in real time
• Cons: Still under development; more models are yet to be included

Google Colab
• Price: Free
• Pros: GPU computation via TensorFlow
• Cons: Sessions are limited to 12 hours at a time

Selection tree
• What type of problem do you solve? (Application)
• What type of interface would be suitable? (Workflow)
• Licensed or free? (Price)
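Application | Workflow | Price | Software
Statistical analysis | Menus and windows | Licensed | Excel Data Analysis, IBM SPSS Statistics
Statistical analysis | Menus and windows | Free | PSPP, JASP
Statistical analysis | Command console | Licensed | MatLab, IBM SPSS Statistics
Statistical analysis | Command console | Free | R (+ RStudio), Python
Statistical analysis | Graphical stream/workflow | Licensed | Rapid Miner
Statistical analysis | Graphical stream/workflow | Free | KNIME, Weka
Econometric analysis | Menus and windows | Licensed | eViews, IBM SPSS Statistics
Econometric analysis | Menus and windows | Free | Gretl
Econometric analysis | Command console | Licensed | eViews, IBM SPSS Statistics, MatLab
Econometric analysis | Command console | Free | Gretl, R (+ RStudio), Python
Econometric analysis | Graphical stream/workflow | Licensed | IBM SPSS Modeler
Data mining | Command console | Licensed | MatLab
Data mining | Command console | Free | R (+ RStudio), Python
Data mining | Graphical stream/workflow | Licensed | IBM SPSS Modeler, Rapid Miner, MatLab Classification Learner
Data mining | Graphical stream/workflow | Free | Orange, KNIME, Weka
Data mining | Online platform | Licensed | IBM Watson Studio, Microsoft Azure, Amazon ML
Data mining | Online platform | Free | Google Colab, Jupyter Notebook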
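The three questions of the selection tree amount to a lookup keyed by (application, workflow, price). A minimal sketch of that idea in Python follows; the dictionary is transcribed from the table above, while the names `SELECTION_TREE`, `select_tools`, and `options_for` are hypothetical helpers introduced only for this illustration.

```python
# (application, workflow, price) -> software, transcribed from the slide's table.
SELECTION_TREE = {
    ("Statistical analysis", "Menus and windows", "Licensed"): ["Excel Data Analysis", "IBM SPSS Statistics"],
    ("Statistical analysis", "Menus and windows", "Free"): ["PSPP", "JASP"],
    ("Statistical analysis", "Command console", "Licensed"): ["MatLab", "IBM SPSS Statistics"],
    ("Statistical analysis", "Command console", "Free"): ["R (+ RStudio)", "Python"],
    ("Statistical analysis", "Graphical stream/workflow", "Licensed"): ["Rapid Miner"],
    ("Statistical analysis", "Graphical stream/workflow", "Free"): ["KNIME", "Weka"],
    ("Econometric analysis", "Menus and windows", "Licensed"): ["eViews", "IBM SPSS Statistics"],
    ("Econometric analysis", "Menus and windows", "Free"): ["Gretl"],
    ("Econometric analysis", "Command console", "Licensed"): ["eViews", "IBM SPSS Statistics", "MatLab"],
    ("Econometric analysis", "Command console", "Free"): ["Gretl", "R (+ RStudio)", "Python"],
    ("Econometric analysis", "Graphical stream/workflow", "Licensed"): ["IBM SPSS Modeler"],
    ("Data mining", "Command console", "Licensed"): ["MatLab"],
    ("Data mining", "Command console", "Free"): ["R (+ RStudio)", "Python"],
    ("Data mining", "Graphical stream/workflow", "Licensed"): ["IBM SPSS Modeler", "Rapid Miner", "MatLab Classification Learner"],
    ("Data mining", "Graphical stream/workflow", "Free"): ["Orange", "KNIME", "Weka"],
    ("Data mining", "Online platform", "Licensed"): ["IBM Watson Studio", "Microsoft Azure", "Amazon ML"],
    ("Data mining", "Online platform", "Free"): ["Google Colab", "Jupyter Notebook"],
}


def select_tools(application, workflow, price):
    """Answer the three questions; returns [] when the table has no such row."""
    return list(SELECTION_TREE.get((application, workflow, price), []))


def options_for(application):
    """List the (workflow, price) combinations the table offers for one application."""
    return sorted({(w, p) for (a, w, p) in SELECTION_TREE if a == application})


print(select_tools("Econometric analysis", "Menus and windows", "Free"))
print(select_tools("Statistical analysis", "Online platform", "Free"))
```

The first call prints `['Gretl']`; the second prints `[]`, because the table lists online platforms only under data mining.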

Q & A
• angel.marchev@datasciencesociety.net
• k_haralampiev@phls.uni-sofia.bg
