Data Science is More Than Just Statistics
Please visit us at stand N310 opposite!
Data science = Statistics?
2 MoreThanJustStats.nb
Data science = Computation with data
Computation ⊃ {Statistics, modeling, visualization, machine learning,
signal processing, geometry, image processing, maths, semantics, networks,
queues, geodesy, random processes, audio, survival analysis, ...}
Use computation
– to find things to count
Example: Text
Out[ ]= [Figure: Word frequency in Lord of the Flies]
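The input that produced the word-frequency figure is not preserved in this transcript. A minimal sketch of one way to count words is given here; it is an assumption, not the presenter's code. The variable lotf normally holds the full text of the novel, so the short stand-in string below is placeholder text for illustration only.

```wolfram
(* Placeholder text: the real lotf would hold the whole novel as a string *)
lotf = "We are going to have fun on this island! The island was quiet. The fire burned on the island.";

(* Lower-case the text, drop common stopwords, then split into words *)
words = TextWords[DeleteStopwords[ToLowerCase[lotf]]];

(* Tally the words and sort the tallies from most to least frequent *)
counts = Sort[Counts[words], Greater];

(* Draw the tallies as a word cloud, sized by frequency *)
WordCloud[counts]
```

Counting is the easy part; the computation lies in deciding what to count, which is what the sentiment query that follows illustrates.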
In[ ]:= MaximalBy[TextSentences[lotf], Classify["Sentiment", #, {"Probability", "Positive"}] &]
Out[ ]= {We are going to have fun on this island!}
Example: Images
In[1]:= image = [input image not preserved]
In[2]:= TabView[{
"Blobs" → (i = Image[Rasterize[Graphics[{Disk[], Disk[{0.7, 0}, 1]}]]]),
"Distances" → (i2 = ImageAdjust@DistanceTransform[ColorNegate@i]),
"Maxima" → Dilation[MaxDetect[i2], 2]}]
Out[2]= [Tab view with panels: Blobs, Distances, Maxima]
In[4]:= Showimage, GraphicsText[Style["×", FontColor → White], #] & /@
idata = DeleteDuplicates[Last /@ ComponentMeasurements[
MaxDetect[DistanceTransform[DistanceTransform[DeleteSmallComponents[
ColorNegate[DeleteSmallComponents[Binarize[image]]], 100]]]],
"Centroid"], EuclideanDistance[##] < 6 &]
Out[4]=
×
××
× ××
×
×
×
× ×× ×× × ×× × ×× × ×× × ××
× ×
×
MoreThanJustStats.nb 7
In[5]:= SmoothHistogram3D[idata, BoxRatios → {3, 2, 1}]
Out[5]= [Figure: smoothed 3D histogram of idata]
Use computation
– to inject context
Example: London bikes
In[15]:= currentBikeData = Import["http://api.citybik.es/barclays-cycle-hire.json"];
In[16]:= currentBikeData[[1]]
Out[16]= {bikes → 10, name → 000989 - Murray Grove , Hoxton, idx → 0, lat → 51 530 890,
timestamp → 2018-04-25T12:11:42.632000Z, lng → -89 782, id → 0, free → 17, number → 63}
In[13]:= Dataset[Association /@ currentBikeData]
Out[13]=
bikes  name                                                  idx
10     000989 - Murray Grove , Hoxton                        0
8      200069 - Knaresborough Place, Earl's Court            1
0      300057 - Westbourne Park Road, Portobello             2
29     000981 - British Museum, Bloomsbury                   3
8      001083 - Commercial Street, Shoreditch                4
8      001027 - Warwick Avenue Station, Maida Vale           5
25     000971 - Godliman Street, St. Paul's                  6
27     000974 - Guilford Street , Bloomsbury                 7
6      001060 - Torrens Street, Angel                        8
9      001038 - Harrington Square 1, Camden Town             9
1      001070 - Bricklayers Arms, Borough                    10
12     001047 - Falkirk Street, Hoxton                       11
0      001041 - Westbourne Grove, Bayswater                  12
18     001042 - Woodstock Street, Mayfair                    13
14     001049 - Finsbury Leisure Centre, St. Luke's          14
4      001037 - Park Lane , Hyde Park                        15
15     001050 - Park Road (Baker Street), The Regent's Park  16
7      000973 - Bethnal Green Road, Shoreditch               17
17     001053 - Clerkenwell Green, Clerkenwell               18
9      001078 - Lambeth Road, Vauxhall                       19
showing 1–20 of 784
In[14]:= Legended[
GeoGraphics[{AbsolutePointSize[10],
{ColorData["DarkRainbow"][#1[[2]]/(0.001 + #1[[2]] + #1[[3]])], Point[#[[1]]]} & /@
Quiet[{GeoPosition[{"lat", "lng"}/1000000.], "bikes", "free"} /. currentBikeData]},
PlotLabel → "Availability of bicycles in London", ImageSize → 600],
BarLegend[{"DarkRainbow", {0, 35}}]]
Out[14]= [Map: Availability of bicycles in London]
Example: Accidents
[Tab view with panels: Data, Local density, Local points, All density]
[Plot: Relative Danger Near Fitch Learning]
[Plot: Accidents By Time Of Day]
Use computation
– to change the viewpoint
Example: Supersonic car
[Plot: supersonic car data against time]
Apply some calculus
[Plot omitted]
Apply some signal processing
[Plot: Load on front left wheel]
Use computation
– to inject a new viewpoint
Example: Finance
Correlation Threshold 0.1 0.3 0.45 0.5 0.55
Portfolio Correlation
Symbol ATHX PFBX NKTR PRGX THLD REGN IDXX WBMD TSRA CSII
ATHX 1. -0.155746 0.192554 0.0346706 0.199476 0.215293 0.055311 0.0598668 0.0667703 0.256818
PFBX -0.155746 1. 0.0551318 -0.0238231 0.0864465 -0.0371775 0.0372411 0.0297157 0.0501491 -0.0263306
NKTR 0.192554 0.0551318 1. 0.173956 0.408224 0.430739 0.252799 0.244477 0.240393 0.340634
PRGX 0.0346706 -0.0238231 0.173956 1. 0.107611 0.152259 0.187019 0.147743 0.224447 0.233454
THLD 0.199476 0.0864465 0.408224 0.107611 1. 0.340108 0.194287 0.231985 0.209756 0.280469
REGN 0.215293 -0.0371775 0.430739 0.152259 0.340108 1. 0.328254 0.21232 0.195491 0.282689
IDXX 0.055311 0.0372411 0.252799 0.187019 0.194287 0.328254 1. 0.21433 0.186771 0.287849
WBMD 0.0598668 0.0297157 0.244477 0.147743 0.231985 0.21232 0.21433 1. 0.203065 0.202096
TSRA 0.0667703 0.0501491 0.240393 0.224447 0.209756 0.195491 0.186771 0.203065 1. 0.252917
CSII 0.256818 -0.0263306 0.340634 0.233454 0.280469 0.282689 0.287849 0.202096 0.252917 1.
CFBK -0.0202622 0.047074 0.045182 -0.0116863 -0.00255542 -0.000900992 -0.0169211 -0.00205736 0.0692481 -0.0932367
SHOO 0.0666881 0.0877096 0.137083 0.246094 0.0914212 0.17177 0.172684 0.146134 0.156682 0.224934
NCMI -0.00881864 0.0229376 0.0943068 0.26504 0.117862 -0.0183203 0.120724 0.118175 0.205415 0.199415
BMRN 0.210411 0.11726 0.427514 0.222912 0.377895 0.555568 0.42669 0.2383 0.222794 0.395739
MGLN 0.0861668 0.0244271 0.269405 0.228394 0.217646 0.253427 0.435732 0.241649 0.322608 0.290727
CRDC -0.0246006 0.0999991 0.0603159 0.09501 0.00876257 -0.0257783 0.00368514 0.0795676 0.00799325 -0.0202939
SURG 0.0793505 -0.0290028 0.0249819 -0.0184576 0.0949218 0.120998 0.0330482 0.165451 0.0105265 0.128889
HFFC 0.0159068 0.0453208 0.056407 -0.133485 0.0857123 0.0728086 0.0179226 -0.0582327 0.0575192 0.0567152
LTRX 0.160061 0.0175897 0.11774 0.0961482 0.22694 0.0688794 -0.0311349 0.197049 0.211728 0.134419
MFNC -0.102578 0.00566642 0.070634 0.0481224 0.0240564 0.0164561 -0.0230522 -0.0500249 0.0625739 0.0602153
SHLO 0.137258 0.00834566 0.259109 0.255858 0.22768 0.170004 0.184693 0.262615 0.283098 0.19992
LABC 0.0582621 -0.0770103 0.0336639 -0.0219564 -0.0555295 0.0168567 -0.035154 -0.000979898 -0.0312983 -0.0374193
SNFCA 0.106298 -0.0796818 0.0353886 -0.0379301 0.102008 0.0133965 -0.0160458 0.064443 0.00624401 0.0744798
ASTI 0.0296936 -0.10231 0.0579198 0.130839 0.170478 0.159669 0.0242666 0.060238 0.103079 0.0802336
STRM 0.103553 -0.027044 0.015168 0.0813282 0.0710839 0.0916784 0.11691 0.0622404 0.10743 0.03938
RLOC 0.0673226 0.12551 0.174601 0.0897312 0.204838 0.117319 0.115966 0.214765 0.200785 0.178144
HSKA 0.0696611 0.0310147 0.0391807 0.0775681 0.140841 0.185131 -0.0176538 0.0500578 0.0610371 0.10533
NWFL -0.145537 0.12346 -0.0149798 -0.0737182 -0.0392499 -0.0887732 0.0110006 0.00857444 -0.0703103 -0.0690263
IFON 0.173712 0.0506925 0.112228 0.118179 0.168008 0.183346 0.146735 0.103064 0.0700051 0.1546
GENC -0.00454872 0.111557 0.0929517 0.118871 0.0893299 0.0951799 0.0683046 0.162016 -0.0174889 0.0333547
VSCI 0.0934438 -0.122724 -0.0337841 -0.11932 -0.0100928 -0.0106619 0.0262221 0.0221221 -0.0462289 0.00243774
HNSN 0.0728787 -0.0202281 0.0217612 0.0479466 0.039112 0.0492618 0.0121075 -0.00590073 0.0782787 0.177993
AMGN 0.144263 0.0524144 0.434697 0.164497 0.363856 0.567489 0.345919 0.270955 0.339446 0.383243
TOPS 0.0165096 0.0629182 0.0856118 -0.0616743 0.0295842 -0.006062 -0.0335838 0.143903 0.0738433 -0.0262043
UTSI 0.0574989 0.0159024 0.0864641 0.0859226 0.117787 0.0371759 -0.0538848 0.137577 0.0161543 0.0243469
INFN 0.136798 -0.172368 0.304939 0.257152 0.275845 0.267789 0.20102 0.283399 0.254023 0.283986
SAAS 0.100494 0.0334147 0.284865 0.292893 0.343925 0.250811 0.263846 0.298446 0.235286 0.290863
KRNY 0.00778643 0.0275327 0.261778 0.327411 0.227934 0.16828 0.237698 0.192625 0.336821 0.250728
ACUR 0.0599088 0.0654299 -0.0287304 0.0594885 0.0324952 0.0809222 0.0524753 0.065247 -0.0453231 0.107133
BERK -0.0263423 0.0857371 -0.0267503 -0.0363498 0.0123773 0.0174475 0.0648471 -0.0242965 0.079019 -0.045243
QCCO -0.036335 -0.0247951 0.125918 0.0854495 0.0777864 0.0406812 0.113156 0.0767831 0.0439639 0.0665833
LINC 0.109821 0.0471629 0.0797274 0.0278627 0.0350469 0.0664748 0.0578671 0.00775897 0.11751 0.0977395
KOPN 0.185468 0.00644997 0.398103 0.277598 0.350097 0.290742 0.260754 0.312968 0.341296 0.394899
ICCC 0.152975 -0.14512 0.00441828 0.0278111 0.0303676 -0.0192398 -0.0681873 0.0840796 0.0132102 0.0659621
[Tab view with panels: Data, Visualized, Graph, Communities]
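The code behind the Graph and Communities panels is not shown on the slide. One plausible reading, sketched here as an assumption, is that pairs of assets whose correlation exceeds the chosen threshold are joined by an edge and communities are then found in the resulting graph. The four symbols and their correlations are copied from the table above; the names corr, threshold, adj and g are hypothetical.

```wolfram
(* A 4 x 4 excerpt of the correlation table above *)
symbols = {"NKTR", "THLD", "REGN", "PFBX"};
corr = {
  {1.,        0.408224,  0.430739,   0.0551318},
  {0.408224,  1.,        0.340108,   0.0864465},
  {0.430739,  0.340108,  1.,        -0.0371775},
  {0.0551318, 0.0864465, -0.0371775, 1.}};

(* Assumed threshold, taken from the selector values on the slide *)
threshold = 0.3;

(* 1 where the correlation reaches the threshold, 0 elsewhere; the diagonal is zeroed *)
adj = UnitStep[corr - threshold] (1 - IdentityMatrix[Length[corr]]);

(* Build the graph and group strongly correlated assets *)
g = AdjacencyGraph[symbols, adj, VertexLabels -> "Name"];
FindGraphCommunities[g]
```

Raising the threshold removes edges, so the communities split into smaller groups; that is the new viewpoint a plain correlation table does not offer.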
Use computation
– to separate signal from noise
[Audio clip: 4.91 s]
[Tab view with panels: Sound, Frequency shifts, Model, Result]
So why is most data science just counting?
Computation can be difficult
You have to know what’s possible
The role of automation
– making computation easier
Example: Titanic
In[ ]:= Short[tData = ExampleData[{"MachineLearning", "Titanic"}, "Data"]]
Out[ ]//Short= {{1st, 29., female} → survived,
{1st, 0.9167, male} → survived, ≪1306≫, {3rd, 29., male} → died}
In[ ]:= titanicSurvival = Classify[tData]
Out[ ]= ClassifierFunction[Input type: {Nominal, Numerical, Nominal}; Classes: died, survived]
In[ ]:= titanicSurvival[{"1st", 46, "male"}]
Out[ ]= died
In[ ]:= Plot[titanicSurvival[{"1st", age, "male"}, {"Probability", "died"}], {age, 0, 60}]
Out[ ]= [Plot: predicted probability of "died" for a 1st-class male against age, 0 to 60]
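Automated training still deserves a check. The sketch below is not part of the original slides; it assumes the same example dataset also provides separate "TrainingData" and "TestData" splits, and it measures accuracy on the held-out part. The names train, test, c and acc are hypothetical.

```wolfram
(* Load the training and test splits of the Titanic example data *)
train = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
test = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];

(* Let Classify choose the method automatically, as on the slide *)
c = Classify[train];

(* Fraction of held-out passengers whose outcome is predicted correctly *)
acc = ClassifierMeasurements[c, test, "Accuracy"]
```

The human's job here is to judge whether that accuracy is good enough for the question being asked.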
Example: Day & night
In[ ]:= daynight = Classify[{
[image] → "Night", [image] → "Day", [image] → "Night", [image] → "Night", [image] → "Day",
[image] → "Night", [image] → "Day", [image] → "Day", [image] → "Night", [image] → "Night",
[image] → "Day", [image] → "Night", [image] → "Night", [image] → "Day", [image] → "Night",
[image] → "Night", [image] → "Day", [image] → "Day", [image] → "Day", [image] → "Day",
[image] → "Night", [image] → "Night", [image] → "Day", [image] → "Night", [image] → "Night",
[image] → "Day", [image] → "Day", [image] → "Day", [image] → "Night", [image] → "Day"}]
Out[ ]= ClassifierFunction[Input type: Image; Classes: Day, Night]
In[ ]:= daynight[{[image], [image], [image], [image], [image], [image]}]
Out[ ]= {Day, Night, Day, Night, Night, Night}
The role of automation
– automating insights
Example: Image identification
Example: Supervising the computer
[Interactive panel with controls: Data = 0, Reset, Capture (Rock, Paper, Scissors), Watch, Stop, Train]
Example: No supervision - “Hands off the wheel”
Dogs
In[ ]:= Dataset[dogs]
Out[ ]= [Dataset of dog images]
In[ ]:= FeatureSpacePlot[Take[dogs, 60], LabelingSize → 70]
Out[ ]= [Figure: feature space plot of 60 dog images]
In[ ]:= nearestDog = FeatureNearest[dogs]
Out[ ]= NearestFunction[Input type: Image; Output property: Element]
In[ ]:= Grid[{testDogs, First /@ nearestDog[testDogs]}]
Out[ ]= [Grid: each test image above its nearest match from dogs]
Financial Assets
AMEX:MSN
MSADX
MSAIX
MSBNX
MSBYX
MSCCUX
MSCDX
MSCFX
MSDIX
MSDVX
MSEDX
MSEEX
MSEGX
MSELX
MSENX
MSEPX
MSFAX
MSFIX
MSFYX
MSGFX
MSGIX
MSGOX
MSGVX
MSHEX
MSHYX
MSIAX
MSIDX
MSIIX
MSIJX
MSIRX
MSIZX
MSJLX
MSMBX
MSMUX
MSNEOX
MSOCX
MSPDX
MSPMX
MSPRX
MSRBX
MSSAX
MSSFX
MSSIX
MSTCX
MSTIX
MSTLX
MSTZX
MSUAX
MSUGPX
MSULX
MSVCX
MSVNX
NASDAQ:MSEX
NASDAQ:MSON
NYSE:MSB
NYSE:MSPE
NYSE:MSPF
NYSE:MSPG
NYSE:MSPI
NYSE:MS-PK
The role of automation
– after the computation
Deployment
Firewall
App deployment
bikeApp = DynamicModule[
{url = "http://api.citybik.es/barclays-cycle-hire.json", city = "London"},
Column[{
ActionMenu["Choose city",
SortBy[("city" ⧴ (city = "city"; url = "url")) /.
Import["http://api.citybik.es/networks.json", "JSON"], First]],
Dynamic[
Dataset[Association /@ Import[url, "JSON"]][
Legended[GeoGraphics[#, ImageSize → 600,
PlotLabel → "Availability of bicycles in " <> city],
BarLegend[{"DarkRainbow", {0, 100}}]] &,
{AbsolutePointSize[10], ColorData["DarkRainbow"][#bikes/(#bikes + #free)],
Point[GeoPosition[{#lat, #lng}/1000000.]]} &],
SynchronousUpdating → False, TrackedSymbols ⧴ {url, city}]}]];
CloudDeploy[bikeApp, Permissions → "Public"]
API deployment
In[ ]:= CloudDeploy[
APIFunction[{"class" → "String", "age" → "Number", "sex" → "String"},
Function[titanicSurvival[{#class, #age, #sex}]],
AllowedCloudExtraParameters → All],
"TitanicPredictor",
Permissions → "Public"
]
Out[ ]= CloudObject[https://www.wolframcloud.com/objects/jonm/TitanicPredictor]
In[ ]:= EmbedCode[%, "Java"]
Out[ ]=
Embeddable Code
Use the code below to call the Wolfram Cloud function from Java:
[beginning of the generated code not shown on the slide]
            if (_conn.getResponseCode() != 200) {
                throw new IOException(_conn.getResponseMessage());
            }
            BufferedReader _rdr = new BufferedReader(
                new InputStreamReader(_conn.getInputStream()));
            StringBuilder _sb = new StringBuilder();
            String _line;
            while ((_line = _rdr.readLine()) != null) {
                _sb.append(_line);
            }
            _rdr.close();
            _conn.disconnect();
            return _sb.toString();
        }
    }
Breaking the Boundaries of Traditional Data Science
The toolset is HUGE - use more of it
Automation makes the toolset accessible
The human’s role is to ask deeper questions of the data