Data Scientist 101:
How to become a Super Cruncher
“All truths are easy to understand once they are
discovered; the point is to discover them.”
The 4 “soft” C's of a Data Scientist
...and the 5 R's of 21st Century Literacy
⇨Reading
⇨wRiting
⇨aRithmetic
⇨pRobability
⇨R
Source: Joe Blitzstein, Harvard
"data scientists should take a page
from social scientists, who have a
long history of asking where the
data they're working with comes
from, what methods were used to
gather and analyze it, and what
cognitive biases they might bring to
its interpretation."
Kate Crawford, Microsoft Research/MIT
Wrong prediction due to extensive media attention & coverage
Data Science: whetting your appetite
The Data Science Venn Diagram
Source: Drew Conway, NYU
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Another way to look at things...
The nerdy approach...
Source: Hilary Mason, bit.ly
Data Scientists have more fun
Source: How to Engage and Retain Analytical Talent
By Elizabeth Craig, Jeanne G. Harris and Henry Egan
January 2010
How Do I Become A Data Scientist?
⇨ Learn about matrix factorizations
⇨ Learn about distributed computing
⇨ Learn about statistical analysis
⇨ Learn about optimization
⇨ Learn about machine learning
⇨ Learn about information retrieval
⇨ Learn about signal detection and estimation
⇨ Master algorithms and data structures
⇨ Practice
⇨ Study Engineering
Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
6 levels of expertise needed for Data Science*
⇨ Data wrangling
⇨ Statistics
⇨ Data mining
⇨ Visualization
⇨ Communication
⇨ Domain & Business Expertise
* a bit of programming skill doesn't hurt either
Programming Skills?
Me vs. “Them”
C, C++, PAL, Smalltalk, VB.Net, C#, SQL, LotusScript, VBScript, JavaScript, HTML, Delphi, (Java), Python, R, Perl, Prolog, Octave, Ruby, SQL, Pascal
SQL Still Matters!
⇨ Big Data SQL
⇨ HBase & Hive
⇨ Amazon Redshift
⇨ Cloudera Impala
⇨ Hortonworks Stinger
⇨ ...
Source: KDNuggets.com
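Every engine in the list above speaks some dialect of SQL, so the core skill carries over. A minimal sketch, using Python's built-in `sqlite3` module as a local stand-in for those engines; the `sales` table and its rows are hypothetical, chosen only to show a typical aggregation query.

```python
import sqlite3

def revenue_by_region(rows):
    """Aggregate hypothetical (region, amount) rows with plain SQL in an in-memory SQLite DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The same GROUP BY pattern works, with minor dialect changes,
        # on Hive, Redshift, Impala and friends.
        cur = con.execute(
            "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return cur.fetchall()
    finally:
        con.close()

sample = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("West", 95.5)]
for region, total, n in revenue_by_region(sample):
    print(region, total, n)
```

Swapping SQLite for a distributed engine changes the connection and the dialect details, not the way of thinking.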
How about Technology?
New analytics->new infrastructure
The Analytics Landscape
Why you need (some) Statistics
Correlation != Causation
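A small simulation makes the point concrete. It assumes a hidden common cause `z` (think "summer temperature") that drives both `x` and `y` (think "ice-cream sales" and "sunburn cases"); the names and noise levels are made up for illustration. The observed correlation is strong, yet setting `x` ourselves leaves `y` untouched, because `y` never depended on `x`.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=10_000, seed=42):
    rng = random.Random(seed)
    # Hidden confounder z drives both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.3) for zi in z]
    y = [zi + rng.gauss(0, 0.3) for zi in z]
    observed = pearson(x, y)
    # "Intervene" on x: assign it at random, independent of z.
    # y is generated from z only, so it does not respond.
    x_forced = [rng.gauss(0, 1) for _ in range(n)]
    intervened = pearson(x_forced, y)
    return observed, intervened

observed, intervened = simulate()
print(f"observed correlation:   {observed:.2f}")
print(f"after intervening on x: {intervened:.2f}")
```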
Learning Statistics
⇨ Coursera.org
⇨ Statistics One
⇨ Passion Driven Statistics
⇨ Statistics: Making sense of Data
Essentially,
all models are wrong...
...but some are useful
George E.P. Box
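Box's remark can be illustrated with a deliberately wrong model. A minimal sketch, assuming a hypothetical curved process `y = x**2` that we only observe between 1.0 and 1.2: an ordinary least-squares straight line is wrong by construction, yet useful inside the observed range, and badly off once we extrapolate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def truth(x):
    # Hypothetical "true" process: curved, so any straight line is wrong.
    return x ** 2

# Observe the process only on a narrow range, 1.0 to 1.2.
xs = [1.0 + i * 0.002 for i in range(101)]
ys = [truth(x) for x in xs]
a, b = fit_line(xs, ys)

# Inside the observed range the line is off by less than 0.01 ...
inside_err = max(abs(a + b * x - truth(x)) for x in xs)
# ... but extrapolated to x = 3 it misses by more than 3.
outside_err = abs(a + b * 3.0 - truth(3.0))
print(f"y ~ {a:.3f} + {b:.3f}x, max error inside: {inside_err:.4f}, error at x=3: {outside_err:.2f}")
```

"Useful" here means useful within the conditions the model was fitted on.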
Learning Data Mining
⇨ Coursera.org
⇨ Machine Learning
⇨ Neural Networks for
Machine Learning
⇨ Kaggle.com
⇨ Kaggle In Class
Visualization
Visualization is...
The conversion of any abstract data into a graphical format so the characteristics and relationships of the data can be explored and analyzed.
⇨ Humans have the ability to analyze large amounts of information that is
presented visually
⇨ This is good for certain types of pattern and trend analysis
⇨ It’s often easy to detect outliers and unusual patterns
Useful for exploration, explanation and discovery, but not for automated system actions.
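Even the crudest graphical format shows how outliers jump out once numbers become shapes. A minimal sketch, assuming non-negative values and a hypothetical week of daily order counts with one anomalous day; it renders each value as a text bar scaled to a fixed width.

```python
def bar_chart(labels, values, width=40, char="#"):
    """Render (label, value) pairs as horizontal text bars scaled to `width`.

    Assumes values are non-negative; the largest value gets a full-width bar.
    """
    peak = max(values)
    label_w = max(len(label) for label in labels)
    lines = []
    for label, v in zip(labels, values):
        n = round(v / peak * width) if peak > 0 else 0
        lines.append(f"{label:<{label_w}} | {char * n} {v}")
    return lines

# Hypothetical daily order counts; one day is far out of line.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
orders = [102, 98, 110, 105, 412, 99, 101]
for line in bar_chart(days, orders):
    print(line)
```

Reading the raw list takes a moment; in the chart the Friday spike is visible at a glance.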
How many 5's?
3435261241134352612203498723566
9623466620398652034095823450238
4560289567109238401645089630489
5769782364196873484
Again: how many 5's?
3435261241134352612203498723566
9623466620398652034095823450238
4560289567109238401645089630489
5769782364196873484
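The exercise above can be mimicked in code. This sketch uses the digit block from the slide and masks every character except the target digit, a crude text analogue of visual highlighting; `count_digit` and `highlight` are illustrative helper names, not from any library.

```python
# The digit block from the slide, rows joined into one string.
DIGITS = (
    "3435261241134352612203498723566"
    "9623466620398652034095823450238"
    "4560289567109238401645089630489"
    "5769782364196873484"
)

def count_digit(text: str, digit: str) -> int:
    """Count occurrences of a single character."""
    return text.count(digit)

def highlight(text: str, digit: str, mask: str = ".") -> str:
    """Replace every character except `digit` with `mask`, so the target stands out."""
    return "".join(ch if ch == digit else mask for ch in text)

ROW = 31  # row width used on the slide
masked = highlight(DIGITS, "5")
for start in range(0, len(masked), ROW):
    print(masked[start:start + ROW])
print("number of 5's:", count_digit(DIGITS, "5"))
```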
Learning Visualization
⇨ Stephen Few classes ($$)
⇨ Alberto Cairo
⇨ Introduction to Data Journalism
Want to get your feet wet?
Tableau Public
http://www.tableausoftware.com/public/
SAS Visual Analytics
http://www.sas.com/software/visual-analytics
Where to go from here?
⇨ Read 'Competing on Analytics'
⇨ Move on to 'Data Analysis Using SQL and Excel'
⇨ Then buy 'Handbook of Statistical Analysis & Data Mining Applications'
⇨ Statistics for business: http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm
⇨ Data Mining: www.rapid-i.com (RapidMiner), http://www.thearling.com, http://www.autonlab.org/tutorials/
⇨ For free text books, search www.scribd.com
⇨ Enter http://www.coursera.org
More Resources to Get You Started
Books:
⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. A. Berry and Gordon Linoff
⇨ Data Preparation for Data Mining, Dorian Pyle
⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray
⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
⇨ Information Retrieval, C. J. van Rijsbergen
⇨ The Visual Display of Quantitative Information, Edward R. Tufte
Journals, Newsletters, Web Sites:
⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining
⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence
⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics
⇨ KDNuggets data mining resources: www.kdnuggets.com
⇨ FlowingData, visualization resources: http://flowingdata.com/
⇨ Infoaesthetics, visual design resources: http://infosthetics.com/
⇨ Visual Complexity, visualization resources: www.visualcomplexity.com/vc/index.cfm
⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx
⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
Free Stuff So You Can Work Cheaply
⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/
⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/
⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
⇨ Parallel Sets http://eagereyes.org/parallel-sets#download
⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/
⇨ KNIME http://www.knime.org/
⇨ Orange http://www.ailab.si/Orange/
⇨ R statistics software http://www.r-project.org/
⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html
⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/
⇨ Processing http://www.processing.org/
⇨ Circos http://mkweb.bcgsc.ca/circos/
⇨ Treemap http://www.cs.umd.edu/hcil/treemap/
⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/
⇨ Dutch students: SAS & SPSS academic licenses (e.g. SurfSpot.nl)
Web: www.sas.com
Email: jos.vandongen<at>sas.com
Phone: +31-(0)6-10172008
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
Delicious: jvdongen
Jos van Dongen
In BI since 1991
Principal Consultant @ SAS
Author/Speaker/Analyst

High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 

Data Scientist 101 BI Dutch

  • 1. Data Scientist 101: How to become a Super Cruncher
  • 2. “All truths are easy to understand once they are discovered; the point is to discover them.”
  • 3. The 4 “soft” C's of a Data Scientist
  • 4. ...and the 5 R's of 21st Century Literacy ⇨Reading ⇨wRiting ⇨aRithmetic ⇨pRobability ⇨R Source: Joe Blitzstein, Harvard
  • 5. "data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation." Kate Crawford, Microsoft Research/MIT
  • 6. Wrong prediction due to extensive media attention & coverage
  • 7. Data Science: whetting your appetite
  • 8. The Data Science Venn Diagram Source: Drew Conway, NYU http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 9. Another way to look at things...
  • 10. The nerdy approach... Source: Hilary Mason, bit.ly
  • 11. Data Scientists have more fun Source: How to Engage and Retain Analytical Talent, by Elizabeth Craig, Jeanne G. Harris and Henry Egan, January 2010
  • 12. How Do I Become A Data Scientist? ⇨ Learn about matrix factorizations ⇨ Learn about distributed computing ⇨ Learn about statistical analysis ⇨ Learn about optimization ⇨ Learn about machine learning ⇨ Learn about information retrieval ⇨ Learn about signal detection and estimation ⇨ Master algorithms and data structures ⇨ Practice ⇨ Study Engineering Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
  • 13. 6 levels of expertise needed: Data wrangling, Statistics, Data mining, Visualization, Communication, Domain & Business Expertise. Data Science* * a bit of programming skill doesn't hurt either
  • 15. SQL Still Matters! ⇨ Big Data SQL ⇨ HBase & Hive ⇨ Amazon Redshift ⇨ Cloudera Impala ⇨ Hortonworks Stinger ⇨ ... Source: KDNuggets.com
  • 19. Why you need (some) Statistics
  • 21. Learning Statistics ⇨ Coursera.org ⇨ Statistics One ⇨ Passion Driven Statistics ⇨ Statistics: Making sense of Data
  • 23. Essentially, all models are wrong... but some are useful. George E. P. Box
  • 24. Learning Data Mining ⇨ Coursera.org ⇨ Machine Learning ⇨ Neural Networks for Machine Learning ⇨ Kaggle.com ⇨ Kaggle In Class
  • 26. Visualization is... The conversion of any abstract data into a graphical format so the characteristics and relationships of the data can be explored and analyzed. ⇨ Humans have the ability to analyze large amounts of information that is presented visually ⇨ This is good for certain types of pattern and trend analysis ⇨ It’s often easy to detect outliers and unusual patterns. Useful for exploration, explanation and discovery, but not for automated system actions.
  • 28. Again: how many 5's? 3435261241134352612203498723566 9623466620398652034095823450238 4560289567109238401645089630489 5769782364196873484
  • 29. Learning Visualization ⇨ Stephen Few classes ($$) ⇨ Alberto Cairo ⇨ Introduction to Data Journalism
  • 30. Want to get your feet wet? Tableau Public http://www.tableausoftware.com/public/ SAS Visual Analytics http://www.sas.com/software/visual-analytics
  • 31. Where to go from here? ⇨ Read 'Competing on Analytics' ⇨ Move on to 'Data Analysis Using SQL and Excel' ⇨ Then buy 'Handbook of Statistical Analysis & Data Mining Applications' ⇨ Statistics for business: ⇨ http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm ⇨ Data Mining: ⇨ www.rapid-i.com (RapidMiner) ⇨ http://www.thearling.com ⇨ http://www.autonlab.org/tutorials/ ⇨ For free text books, search www.scribd.com ⇨ Enter http://www.coursera.org
  • 32. More Resources to Get You Started. Books: ⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. Berry and Gordon Linoff ⇨ Data Preparation for Data Mining, Dorian Pyle ⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray ⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze ⇨ Information Retrieval, C. J. van Rijsbergen ⇨ The Visual Display of Quantitative Information, Edward R. Tufte. Journals, Newsletters, Web Sites: ⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining ⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence ⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics ⇨ KDNuggets data mining resources: www.kdnuggets.com ⇨ FlowingData, visualization resources: http://flowingdata.com/ ⇨ Infoaesthetics, visual design resources: http://infosthetics.com/ ⇨ VisualComplexity, visualization resources: www.visualcomplexity.com/vc/index.cfm ⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx ⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
  • 33. Free Stuff So You Can Work Cheaply ⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/ ⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/ ⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ ⇨ Parallel Sets http://eagereyes.org/parallel-sets#download ⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/ ⇨ Knime http://www.knime.org/ ⇨ Orange http://www.ailab.si/Orange/ ⇨ R statistics software http://www.r-project.org/ ⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html ⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/ ⇨ Processing http://www.processing.org/ ⇨ Circos http://mkweb.bcgsc.ca/circos/ ⇨ Treemap http://www.cs.umd.edu/hcil/treemap/ ⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/ ⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
  • 35. Web: www.sas.com Email: jos.vandongen<at>sas.com Phone: +31-(0)6-10172008 Skype: tholis.jos LinkedIn: jvdongen Twitter: josvandongen Delicious: jvdongen Jos van Dongen In BI since 1991 Principal Consultant @ SAS Author/Speaker/Analyst
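Slide 28 ("Again: how many 5's?") makes its point visually: scanning a wall of digits is slow, while a highlighted version makes the 5's jump out. As a small illustration, the sketch below counts the 5's mechanically and prints a masked copy in which only the 5's remain. It assumes the digits exactly as they appear in this transcript (the original slide may have contained more); `DIGITS`, `count_digit` and `highlight` are hypothetical names for illustration, not something from the deck.

```python
# Digits as transcribed from slide 28, with the slide's line breaks joined.
# Assumption: the transcript preserves the full digit block.
DIGITS = (
    "3435261241134352612203498723566"
    "9623466620398652034095823450238"
    "4560289567109238401645089630489"
    "5769782364196873484"
)


def count_digit(digits: str, target: str) -> int:
    """Count how often a single digit character occurs in the string."""
    return digits.count(target)


def highlight(digits: str, target: str, width: int = 31) -> str:
    """Replace every non-target digit with '.', so only the target stands out."""
    masked = "".join(ch if ch == target else "." for ch in digits)
    # Re-wrap into rows of `width` characters, mimicking the slide layout.
    return "\n".join(masked[i:i + width] for i in range(0, len(masked), width))


if __name__ == "__main__":
    print(count_digit(DIGITS, "5"))
    print(highlight(DIGITS, "5"))
```

The masked output is a crude text version of what a chart does for you: by removing everything that is not the pattern of interest, the answer becomes visible at a glance rather than by careful reading.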