Data Scientist 101:
How to become a Super Cruncher
“All truths are easy to understand once they are
discovered; the point is to discover them.”
The 4 “soft” C's of a Data Scientist
...and the 5 R's of 21st Century Literacy
⇨Reading
⇨wRiting
⇨aRithmetic
⇨pRobability
⇨R
Source: Joe Blitzstein, Harvard
"data scientists should take a page
from social scientists, who have a
long history of asking where the
data they're working with comes
from, what methods were used to
gather and analyze it, and what
cognitive biases they might bring to
its interpretation."
Kate Crawford, Microsoft Research/MIT
Wrong prediction
due to extensive
media attention &
coverage
Data Science: whetting your appetite
The Data Science Venn Diagram
Source: Drew Conway, NYU
http://drewconway.com/zia/2013/3/
26/the-data-science-venn-diagram
Another way to look at things...
The nerdy approach...
Source: Hilary Mason, bit.ly
Data Scientists have more fun
Source: How to Engage and Retain Analytical Talent
By Elizabeth Craig, Jeanne G. Harris and Henry Egan
January 2010
How Do I Become A Data Scientist?
⇨ Learn about matrix factorizations
⇨ Learn about distributed computing
⇨ Learn about statistical analysis
⇨ Learn about optimization
⇨ Learn about machine learning
⇨ Learn about information retrieval
⇨ Learn about signal detection and estimation
⇨ Master algorithms and data structures
⇨ Practice
⇨ Study Engineering
Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
6 levels of expertise needed
Data Science*
⇨ Data wrangling
⇨ Statistics
⇨ Data mining
⇨ Visualization
⇨ Communication
⇨ Domain & Business Expertise
* A bit of programming skills doesn't hurt either
Programming Skills?
C
C++
PAL
Smalltalk
VB.Net
C#
SQL
LotusScript
VBScript
JavaScript
HTML
Delphi
(Java)
Python
R
Perl
Me “Them”
Prolog Octave
Ruby
SQL
Pascal
SQL Still Matters!
⇨ Big Data SQL
⇨ HBase & Hive
⇨ Amazon Redshift
⇨ Cloudera Impala
⇨ Hortonworks Stinger
⇨ ...
Source: KDNuggets.com
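As a small illustration of why SQL stays useful, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the demo rows are hypothetical and exist only for this example. The engines listed above each have their own SQL dialect, so treat this as the general shape of an aggregate query, not as code for any of them.

```python
import sqlite3


def revenue_per_region(rows):
    """Sum hypothetical (region, amount) rows per region with plain SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    demo = [("north", 10.0), ("south", 5.0), ("north", 2.5)]
    print(revenue_per_region(demo))
```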
How about Technology?
New analytics->new infrastructure
The Analytics Landscape
Why you need (some) Statistics
Correlation != Causation
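A simulated sketch of the point, under made-up assumptions: two hypothetical series (ice cream sales and sunburn cases) are both driven by temperature and never influence each other, yet they correlate strongly. The variable names, coefficients, and noise levels are invented for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(n=1000, seed=42):
    """Generate a confounder (temperature) and two series that depend on it."""
    rng = random.Random(seed)
    temperature = [rng.uniform(10, 35) for _ in range(n)]
    # Both series depend on temperature plus independent noise.
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    sunburn = [0.5 * t + rng.gauss(0, 2) for t in temperature]
    return temperature, ice_cream, sunburn


if __name__ == "__main__":
    temp, ice, burn = simulate()
    print("raw correlation:", round(pearson(ice, burn), 2))
    # Subtract the (known, simulated) temperature effect from each series.
    ice_resid = [i - 2.0 * t for i, t in zip(ice, temp)]
    burn_resid = [b - 0.5 * t for b, t in zip(burn, temp)]
    print("after removing temperature:", round(pearson(ice_resid, burn_resid), 2))
```

Once the shared driver is removed, what remains is independent noise, so the second correlation should sit close to zero. In real data the confounder and its effect size are usually unknown, which is where statistics comes in.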
Learning Statistics
⇨ Coursera.org
⇨ Statistics One
⇨ Passion Driven Statistics
⇨ Statistics: Making sense of Data
Essentially,
all models are wrong...
...but some are useful
George E.P. Box
Learning Data Mining
⇨ Coursera.org
⇨ Machine Learning
⇨ Neural Networks for
Machine Learning
⇨ Kaggle.com
⇨ Kaggle In Class
Visualization
Visualization is...
The conversion of any abstract data into a graphical format so the characteristics and
relationships of the data can be explored and analyzed.
⇨ Humans have the ability to analyze large amounts of information that is
presented visually
⇨ This is good for certain types of pattern and trend analysis
⇨ It’s often easy to detect outliers and unusual patterns
Useful for exploration, explanation, discovery, but not for automated system actions.
How many 5's?
3435261241134352612203498723566
9623466620398652034095823450238
4560289567109238401645089630489
5769782364196873484
Again: how many 5's?
3435261241134352612203498723566
9623466620398652034095823450238
4560289567109238401645089630489
5769782364196873484
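For comparison, a minimal sketch of the same exercise in code, using the digits from the slide. The bracket marking is an assumption: it stands in for whatever visual emphasis the second slide presumably gave the 5's, which the text alone does not preserve.

```python
# Digits copied from the slide, joined into one string.
DIGITS = (
    "3435261241134352612203498723566"
    "9623466620398652034095823450238"
    "4560289567109238401645089630489"
    "5769782364196873484"
)


def count_digit(text, digit="5"):
    """Count how often `digit` occurs in `text`."""
    return sum(1 for ch in text if ch == digit)


def highlight(text, digit="5"):
    """Mark each `digit` with brackets and mute every other character."""
    return "".join(f"[{ch}]" if ch == digit else "." for ch in text)


if __name__ == "__main__":
    print(count_digit(DIGITS))
    print(highlight(DIGITS))
```

In the highlighted output the 5's stand out at a glance, which is the effect the slide demonstrates.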
Learning Visualization
⇨ Stephen Few classes ($$)
⇨ Alberto Cairo
⇨ Introduction to Data Journalism
Want to get your feet wet?
Tableau Public
http://www.tableausoftware.com/public/
SAS Visual Analytics
http://www.sas.com/software/visual-analytics
Where to go from here?
⇨ Read 'Competing on Analytics'
⇨ Move on to 'Data Analysis Using SQL and Excel'
⇨ Then buy 'Handbook of Statistical Analysis & Data Mining
Applications'
⇨ Statistics for business:
⇨ http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm
⇨ Data Mining:
⇨ www.rapid-i.com (RapidMiner)
⇨ http://www.thearling.com
⇨ http://www.autonlab.org/tutorials/
⇨ For free text books, search www.scribd.com
⇨ Enter http://www.coursera.org
More Resources to Get You Started
Books:
⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. A. Berry and Gordon Linoff
⇨ Data Preparation for Data Mining, Dorian Pyle
⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray
⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
⇨ Information Retrieval, C. J. van Rijsbergen
⇨ The Visual Display of Quantitative Information, Edward R. Tufte
Journals, Newsletters, Web Sites:
⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining
⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence
⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics
⇨ KDNuggets data mining resources: www.kdnuggets.com
⇨ FlowingData, visualization resources: http://flowingdata.com/
⇨ Infoaesthetics, visual design resources: http://infosthetics.com/
⇨ VisualComplexity, visualization resources: www.visualcomplexity.com/vc/index.cfm
⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx
⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
Free Stuff So You Can Work Cheaply
⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/
⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/
⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
⇨ Parallel Sets http://eagereyes.org/parallel-sets#download
⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/
⇨ Knime http://www.knime.org/
⇨ Orange http://www.ailab.si/Orange/
⇨ R statistics software http://www.r-project.org/
⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html
⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/
⇨ Processing http://www.processing.org/
⇨ Circos http://mkweb.bcgsc.ca/circos/
⇨ Treemap http://www.cs.umd.edu/hcil/treemap/
⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/
⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
Web: www.sas.com
Email: jos.vandongen<at>sas.com
Phone: +31-(0)6-10172008
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
Delicious: jvdongen
Jos van Dongen
In BI since 1991
Principal Consultant @ SAS
Author/Speaker/Analyst

More Related Content

What's hot

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Francis Michael Bautista
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
Gregory Piatetsky-Shapiro
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Dr.Sotarat Thammaboosadee CIMP-Data Governance
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
odsc
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
Mark West
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USC
Sri Ambati
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Data ScienceTech Institute
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
ANOOP V S
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
Mohammed Barakat
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
Mahesh Kumar CV
 
Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data Scientist
Narong Intiruk
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
Gregory Piatetsky-Shapiro
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
Jason Geng
 
Data Science presentation for elementary school students
Data Science presentation for elementary school studentsData Science presentation for elementary school students
Data Science presentation for elementary school students
Melanie Manning, CFA
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Ghulam Imaduddin
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
Caserta
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
Mark West
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
Srinath Perera
 

What's hot (20)

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USC
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 
Begin with Data Scientist
Begin with Data ScientistBegin with Data Scientist
Begin with Data Scientist
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Data Science presentation for elementary school students
Data Science presentation for elementary school studentsData Science presentation for elementary school students
Data Science presentation for elementary school students
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 

Viewers also liked

Visualization 101 BA4All
Visualization 101 BA4AllVisualization 101 BA4All
Visualization 101 BA4All
Jos van Dongen
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
Jos van Dongen
 
PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012
Jos van Dongen
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed Datawarehousing
Jos van Dongen
 
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
Domino Data Lab
 
Open Source Business Intelligence
Open Source Business IntelligenceOpen Source Business Intelligence
Open Source Business Intelligence
Jos van Dongen
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?
Jos van Dongen
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big Data
Edward Hsu
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
Jos van Dongen
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Cambridge Semantics
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
Cambridge Semantics
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
SnappyData
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
Cambridge Semantics
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
Robbie Strickland
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
Helena Edelson
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
DataWorks Summit
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
Robbie Strickland
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using Semantics
Cambridge Semantics
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
Cambridge Semantics
 

Viewers also liked (20)

Visualization 101 BA4All
Visualization 101 BA4AllVisualization 101 BA4All
Visualization 101 BA4All
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
 
PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012PDI data vault framework #pcmams 2012
PDI data vault framework #pcmams 2012
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed Datawarehousing
 
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
Keys to understanding when you are looking for a Data Scientist vs. Engineer,...
 
Open Source Business Intelligence
Open Source Business IntelligenceOpen Source Business Intelligence
Open Source Business Intelligence
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big Data
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using Semantics
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
 

Similar to Data Scientist 101 BI Dutch

Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
Krishna Sankar
 
New Developments in Machine Learning - Prof. Dr. Max Welling
New Developments in Machine Learning - Prof. Dr. Max WellingNew Developments in Machine Learning - Prof. Dr. Max Welling
New Developments in Machine Learning - Prof. Dr. Max Welling
Textkernel
 
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
Data Driven Innovation
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Micah Altman
 
The Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongThe Human Side of Data By Colin Strong
The Human Side of Data By Colin Strong
MarTech Conference
 
SSP2013: Altmetrics for Research Assessment
SSP2013: Altmetrics for Research AssessmentSSP2013: Altmetrics for Research Assessment
SSP2013: Altmetrics for Research AssessmentWilliam Gunn
 
Unlocking the Potential: Data as a Medium for Design & Justice
Unlocking the Potential: Data as a Medium for Design & JusticeUnlocking the Potential: Data as a Medium for Design & Justice
Unlocking the Potential: Data as a Medium for Design & Justice
Jess Freaner
 
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
Josh Cowls
 
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
Carl Bergstrom
 
The Need for Deep Learning Transparency
The Need for Deep Learning TransparencyThe Need for Deep Learning Transparency
The Need for Deep Learning Transparency
inside-BigData.com
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
Hugo Bowne-Anderson
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Tharushi Ruwandika
 
Look Around: Question Answering, Serendipity, and the Research Process of Sch...
Look Around: Question Answering, Serendipity, and the Research Process of Sch...Look Around: Question Answering, Serendipity, and the Research Process of Sch...
Look Around: Question Answering, Serendipity, and the Research Process of Sch...
KimberleyMartin
 
Fairness in Machine Learning @Codemotion
Fairness in Machine Learning @CodemotionFairness in Machine Learning @Codemotion
Fairness in Machine Learning @Codemotion
Azzurra Ragone
 
The Ethics of AI
The Ethics of AIThe Ethics of AI
The Ethics of AI
Mark S. Steed
 
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
BigML, Inc
 
01datamining.pdf
01datamining.pdf01datamining.pdf
01datamining.pdf
Priyankapawar886284
 
Bi(G) data: opportunities for BI Professionals
Bi(G) data: opportunities for BI ProfessionalsBi(G) data: opportunities for BI Professionals
Bi(G) data: opportunities for BI Professionals
Albert Besselse
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Jonathan Stray
 
Laws and limits of data science 11 10-14
Laws and limits of data science 11 10-14Laws and limits of data science 11 10-14
Laws and limits of data science 11 10-14
Michael Brodie
 

Similar to Data Scientist 101 BI Dutch (20)

Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
 
New Developments in Machine Learning - Prof. Dr. Max Welling
New Developments in Machine Learning - Prof. Dr. Max WellingNew Developments in Machine Learning - Prof. Dr. Max Welling
New Developments in Machine Learning - Prof. Dr. Max Welling
 
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
Don't blindly trust your ML System, it may change your life (Azzurra Ragone, ...
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
The Human Side of Data By Colin Strong
The Human Side of Data By Colin StrongThe Human Side of Data By Colin Strong
The Human Side of Data By Colin Strong
 
SSP2013: Altmetrics for Research Assessment
SSP2013: Altmetrics for Research AssessmentSSP2013: Altmetrics for Research Assessment
SSP2013: Altmetrics for Research Assessment
 
Unlocking the Potential: Data as a Medium for Design & Justice
Unlocking the Potential: Data as a Medium for Design & JusticeUnlocking the Potential: Data as a Medium for Design & Justice
Unlocking the Potential: Data as a Medium for Design & Justice
 
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.
 
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...
 
The Need for Deep Learning Transparency
The Need for Deep Learning TransparencyThe Need for Deep Learning Transparency
The Need for Deep Learning Transparency
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Look Around: Question Answering, Serendipity, and the Research Process of Sch...
Look Around: Question Answering, Serendipity, and the Research Process of Sch...Look Around: Question Answering, Serendipity, and the Research Process of Sch...
Look Around: Question Answering, Serendipity, and the Research Process of Sch...
 
Fairness in Machine Learning @Codemotion
Fairness in Machine Learning @CodemotionFairness in Machine Learning @Codemotion
Fairness in Machine Learning @Codemotion
 
The Ethics of AI
The Ethics of AIThe Ethics of AI
The Ethics of AI
 
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
 
01datamining.pdf
01datamining.pdf01datamining.pdf
01datamining.pdf
 
Bi(G) data: opportunities for BI Professionals
Bi(G) data: opportunities for BI ProfessionalsBi(G) data: opportunities for BI Professionals
Bi(G) data: opportunities for BI Professionals
 
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
Frontiers of Computational Journalism week 8 - Visualization and Network Anal...
 
Laws and limits of data science 11 10-14
Laws and limits of data science 11 10-14Laws and limits of data science 11 10-14
Laws and limits of data science 11 10-14
 



Data Scientist 101 BI Dutch

  • 1. Data Scientist 101: How to become a Super Cruncher
  • 2. “All truths are easy to understand once they are discovered; the point is to discover them.”
  • 3. The 4 “soft” C's of a Data Scientist
  • 4. ...and the 5 R's of 21st Century Literacy ⇨ Reading ⇨ wRiting ⇨ aRithmetic ⇨ pRobability ⇨ R Source: Joe Blitzstein, Harvard
  • 5. "data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation." Kate Crawford, Microsoft Research/MIT
  • 6. Wrong prediction due to extensive media attention & coverage
  • 7. Data Science: whetting your appetite
  • 8. The Data Science Venn Diagram Source: Drew Conway, NYU http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 9. Another way to look at things...
  • 10. The nerdy approach... Source: Hilary Mason, bit.ly
  • 11. Data Scientists have more fun Source: How to Engage and Retain Analytical Talent By Elizabeth Craig, Jeanne G. Harris and Henry Egan January 2010
  • 12. How Do I Become A Data Scientist? ⇨ Learn about matrix factorizations ⇨ Learn about distributed computing ⇨ Learn about statistical analysis ⇨ Learn about optimization ⇨ Learn about machine learning ⇨ Learn about information retrieval ⇨ Learn about signal detection and estimation ⇨ Master algorithms and data structures ⇨ Practice ⇨ Study Engineering Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
  • 13. 6 levels of expertise needed: Data wrangling, Statistics, Data mining, Visualization, Communication, Domain & Business Expertise = Data Science* (* a bit of programming skill doesn't hurt either)
  • 15. SQL Still Matters! ⇨ Big Data SQL ⇨ HBase & Hive ⇨ Amazon Redshift ⇨ Cloudera Impala ⇨ Hortonworks Stinger ⇨ ... Source: KDNuggets.com
  • 19. Why you need (some) Statistics
  • 21. Learning Statistics ⇨ Coursera.org ⇨ Statistics One ⇨ Passion Driven Statistics ⇨ Statistics: Making sense of Data
  • 23. Essentially, all models are wrong... ...but some are useful George E.P. Box
  • 24. Learning Data Mining ⇨ Coursera.org ⇨ Machine Learning ⇨ Neural Networks for Machine Learning ⇨ Kaggle.com ⇨ Kaggle In Class
  • 26. Visualization is... The conversion of any abstract data into a graphical format so the characteristics and relationships of the data can be explored and analyzed. ⇨ Humans have the ability to analyze large amounts of information that is presented visually ⇨ This is good for certain types of pattern and trend analysis ⇨ It’s often easy to detect outliers and unusual patterns. Useful for exploration, explanation, discovery, but not for automated system actions.
  • 28. Again: how many 5's? 3435261241134352612203498723566 9623466620398652034095823450238 4560289567109238401645089630489 5769782364196873484
  • 29. Learning Visualization ⇨ Stephen Few classes ($$) ⇨ Alberto Cairo ⇨ Introduction to Data Journalism
  • 30. Want to get your feet wet? Tableau Public http://www.tableausoftware.com/public/ SAS Visual Analytics http://www.sas.com/software/visual-analytics
  • 31. Where to go from here? ⇨ Read 'Competing on Analytics' ⇨ Move on to 'Data Analysis Using SQL and Excel' ⇨ Then buy 'Handbook of Statistical Analysis & Data Mining Applications' ⇨ Statistics for business: ⇨ http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm ⇨ Data Mining: ⇨ www.rapid-i.com (RapidMiner) ⇨ http://www.thearling.com ⇨ http://www.autonlab.org/tutorials/ ⇨ For free text books, search www.scribd.com ⇨ Enter http://www.coursera.org
  • 32. More Resources to Get You Started Books: ⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. A. Berry and Gordon Linoff ⇨ Data Preparation for Data Mining, Dorian Pyle ⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray ⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze ⇨ Information Retrieval, C. J. van Rijsbergen ⇨ The Visual Display of Quantitative Information, Edward R. Tufte Journals, Newsletters, Web Sites: ⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining ⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence ⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics ⇨ KDNuggets data mining resources: www.kdnuggets.com ⇨ FlowingData, visualization resources: http://flowingdata.com/ ⇨ Infoaesthetics, visual design resources: http://infosthetics.com/ ⇨ Visual Complexity, visualization resources: www.visualcomplexity.com/vc/index.cfm ⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx ⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
  • 33. Free Stuff So You Can Work Cheaply ⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/ ⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/ ⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ ⇨ Parallel Sets http://eagereyes.org/parallel-sets#download ⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/ ⇨ Knime http://www.knime.org/ ⇨ Orange http://www.ailab.si/Orange/ ⇨ R statistics software http://www.r-project.org/ ⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html ⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/ ⇨ Processing http://www.processing.org/ ⇨ Circos http://mkweb.bcgsc.ca/circos/ ⇨ Treemap http://www.cs.umd.edu/hcil/treemap/ ⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/ ⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
  • 35. Web: www.sas.com Email: jos.vandongen<at>sas.com Phone: +31-(0)6-10172008 Skype: tholis.jos LinkedIn: jvdongen Twitter: josvandongen Delicious: jvdongen Jos van Dongen In BI since 1991 Principal Consultant @ SAS Author/Speaker/Analyst