slides= tiny.cc/se15
ai4se.net
October 2015
Slides: tiny.cc/se15
(A) Future of SE Research:
Research for SE, SE for Research
tim.menzies@gmail.com
https://menzies.us
ai4se.net
Data mining tools should, and can, do much more
• Operating systems do more than just schedule processes:
– Editors
– Compilers
– File systems
– Network connections
– Memory management
– Etc.
• What services should be standard in data mining tools?
[Figure: map of related publications: IEEE Trans. SE ’12, ’13a, ’13b, ’15; ESE ’09, ’14; ICSE ’15; ICSE ’16 (?); WVU ’13]
Not in this talk: what everyone else is talking about
• Principles for designing case studies
• Visualizations
• Data mining
• Big Data
• Qualitative methods
(see parts 1 + 2)
The talk: adding in some missing bits
1. Software tools for “citizen scientists”
2. Beyond mere data repositories
3. What happens when decision software goes wrong?
4. Proposed services for nextgen repositories
5. The Future?
Software tools for “citizen scientists”
• Science has escaped the lab
– and is roaming free in the world.
• When every citizen can be a scientist (making generalizations from data),
– then it should be possible to audit those conclusions.
• We want to be able to mistrust the conclusions of citizen scientists,
– just as we mistrust, evaluate, review, explore, and evolve the conclusions of any other scientist.
Software mediates what we see and how we act in the world
1. Silicon Valley developers view every new feature as an experiment, to be tested within some mash-up.
2. Chemists won a Nobel Prize for software simulations http://goo.gl/Lwensc
3. Engineers use software for optical tweezers, radiation therapy, remote sensing, and chip design http://goo.gl/qBMyIZ
4. Web analysts use software to analyze clickstreams to improve sales and marketing strategies http://goo.gl/b26CfY
5. Stock traders write software to simulate trading strategies http://www.quantopian.com
6. Analysts write software to mine labor statistics data to review proposed government policies http://goo.gl/X4kgnc
7. Journalists use software to analyze economic data and make visualizations for their news stories http://fivethirtyeight.com
8. In London or New York, ambulances wait for your call at a location determined by a software model http://goo.gl/8SMd1p
9. Etc.
Important to understand how software can divide us
See also “Facebook emotion study breached ethical guidelines, researchers say”, The Guardian, June 30, 2014 http://goo.gl/gTRkmp
Yes.
Better SE = better data science = better science
• A data scientist isa engineer
– Delivering, under constraints, to acceptable quality standards
• A data scientist isa software developer
– Complex scripts, test-driven development, version control
• A data scientist isa requirements engineer
– Understanding, navigating, and trading off between user goals
• A data scientist isa agile programmer
– Uses feedback from writing and running code, and from query results, to constantly revise goals and code
A data scientist isa software engineer
2. Beyond mere data repositories
#storeYourData
• URL: openscience.us/repo
• Data from 100s of projects
• E.g. EUSE: 250,000K+ spreadsheets
• E.g. Softgoals: 150+ softgoal models
• Oldest continuous repository of SE data (since 2004)
So many data repositories
• What’s next?
• What tools would we need for a “debate”-oriented repository?
To design those tools, ask:
1. What problems are seen when people try to share data and conclusions?
2. What minimal data structures address those problems?
Let’s talk tools
3. What happens when decision software goes wrong?
Models have “certification envelopes”
• Columbia ice strike
– Size: 1200 m2
– Speed: 477 mph (relative to vehicle)
• Certified as “safe” by the CRATER micrometeorite model.
– An experiment in CRATER’s DB:
• Size: 3 cm3
• Speed: under 100 mph
• Columbia and its crew were lost on re-entry
• Lesson: conclusions should come with a “certification envelope”
– If new test data falls outside the envelope of the training set,
– raise an alert
Bad things happen when you stretch the envelope
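A certification envelope can be sketched under the simplest possible assumption: it is the per-feature (min, max) box of the training data, and any new row outside that box raises an alert. The slide does not say how the envelope is computed. The bounding box and the names `envelope`, `outside_envelope` and `check` are illustrative assumptions. The training values below are made up and are not CRATER's database.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]
Envelope = List[Tuple[float, float]]


def envelope(train: List[Row]) -> Envelope:
    """Per-feature (min, max) ranges seen in the training data."""
    return [(min(col), max(col)) for col in zip(*train)]


def outside_envelope(row: Row, env: Envelope) -> List[int]:
    """Indexes of the features where `row` falls outside the trained range."""
    return [i for i, (v, (lo, hi)) in enumerate(zip(row, env)) if not lo <= v <= hi]


def check(row: Row, env: Envelope) -> str:
    """'ok' inside the envelope, otherwise an alert naming the offending features."""
    bad = outside_envelope(row, env)
    return f"ALERT: features {bad} outside envelope" if bad else "ok"


# Illustrative (size, speed) training rows; NOT the real CRATER data.
train = [[1.0, 50.0], [3.0, 90.0], [2.0, 70.0]]
env = envelope(train)
print(check([2.0, 60.0], env))      # ok
print(check([1200.0, 477.0], env))  # ALERT: features [0, 1] outside envelope
```

A box is a loose envelope. A row can sit inside every per-feature range and still be far from all training data. The cluster-based anomaly checks later in this talk are one way to tighten it.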
Goals matter
• Learners work this way
– Users want it that way
• Waste of time learning models users do not want
– Better to tune learning methods to the goals of users
• Enter search-based software engineering
– Multi-goal optimization
Learners learn for X, users want Y
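Multi-goal optimization needs a way to compare candidates that trade one goal against another. The slide does not name a comparison, so this sketch assumes a common choice: Pareto domination. One candidate dominates another if it is no worse on every goal and strictly better on at least one. The function names and the effort/defects goals below are illustrative. All goals are minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every goal and better on at least one (minimizing)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Candidates not dominated by any other candidate."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (effort, defects) for four hypothetical project plans; lower is better for both.
plans = [(10, 5), (8, 7), (12, 6), (9, 4)]
print(pareto_front(plans))  # [(8, 7), (9, 4)]
```

Neither surviving plan beats the other on both goals. Choosing between them is exactly where the users' own goals come in.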
Locality matters
(what is true there may not be true here)
• Devanbu et al. ASE’11: Ecological Inference
• Bettenburg et al. MSR’12: Think local, act global
• Menzies et al. TSE’13: Local versus Global learning
• Yang et al. IST’13: Handling local bias
• Minku et al. ICSE’14: Best Use of Cross-Company Data
[Figure: error (less is better) using ensemble data vs. using local data]
Not general models, but general methods for local models
Sharing matters
• How was the error found so fast?
– Open science
Given enough eyes, all bugs are shallow
Timeline (2013):
Mar 15: “Better cross-company learning” accepted to MSR’13
Mar 29: Camera-ready submitted
Apr 10 (?): Pre-prints go on-line
Apr 29: Hyeongmin Jeon, graduate student at Pusan Natl. Univ., emailed us: can’t reproduce result
May 4: Fayola Peters, checking code, found the error. A manic week of experiments follows
May 11: We conclude the results are definitely wrong
May 12: Email MSR organizers. Our penalty? Present the paper and its error.
Compression and privacy matter
• Facebook, Google, Netflix, etc.
• A small X% of all users are subjects in continual experiments testing new features
• Data from these studies is warehoused and retained indefinitely
– Problems with volume (needs compression)
– Problems with confidentiality (needs privacy)
• If I want to challenge the conclusions made by Facebook, Google, Netflix, etc.,
– I need to be able to access that data, privately
– (needs trusted sharing)
Squeezing and secrets
Lessons learned
• Certification envelopes (when not to trust conclusions)
• Goals matter (not everything is “classification”)
• Locality matters (when their conclusions do not hold for you)
• Need “streaming tools” (continually stream over a never-ending sequence of new data)
• Need repair tools (to fix broken ideas)
• Verification matters (sooner or later, we all screw up)
• Need to transfer data (get by with a little help from your friends)
• Need compression tools (to save space)
• Need privacy tools (so you can share)
What matters?
4. Proposed services for nextgen repositories
Digression: WHERE:
O(N) top-down divisive clustering
• Fast: works on an approximation to eigenvectors (the FASTMAP heuristic, Faloutsos [1995]): an O(N) way to generate an axis of large variability
• Pick any point X;
• Find E = East = the point furthest from X;
• Find W = West = the point furthest from East.
• East, West = “the poles”
• All points have distances a, b to (E, W)
• c = dist(W, E)
• $x = (a^2 + c^2 - b^2)/(2c)$ (by the cosine rule, the position of each point along the line from East to West)
• Find median(x), recurse on each half
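The WHERE steps above can be sketched in Python. This is a minimal illustration, not the author's implementation. It assumes numeric rows, Euclidean distance via `math.dist` (Python 3.8+) and a target leaf size of about sqrt(N). The names `furthest`, `project` and `where` are hypothetical. For brevity it sorts each node's projections to find the median; a linear-time selection would match the O(N)-per-level claim more closely.

```python
import math
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


def furthest(row: Row, rows: List[Row]) -> Row:
    """The member of `rows` at the greatest Euclidean distance from `row`."""
    return max(rows, key=lambda r: math.dist(row, r))


def project(row: Row, east: Row, west: Row, c: float) -> float:
    """Position of `row` along the East-West axis (cosine rule): 0 at East, c at West."""
    a = math.dist(row, east)
    b = math.dist(row, west)
    return (a * a + c * c - b * b) / (2 * c)


def where(rows: List[Row], rng: Optional[random.Random] = None) -> List[List[Row]]:
    """Split `rows` top-down at the median projection until leaves hold about sqrt(N) rows."""
    rng = rng or random.Random(1)
    min_size = max(2, int(math.sqrt(len(rows))))  # target leaf size, fixed from the full data
    leaves: List[List[Row]] = []

    def split(part: List[Row]) -> None:
        if len(part) < 2 * min_size:      # halves would be smaller than the target: stop
            leaves.append(part)
            return
        x = rng.choice(part)              # pick any point X
        east = furthest(x, part)          # East: furthest from X
        west = furthest(east, part)       # West: furthest from East
        c = math.dist(east, west)
        if c == 0:                        # all rows identical: nothing to split on
            leaves.append(part)
            return
        ordered = sorted(part, key=lambda r: project(r, east, west, c))
        mid = len(ordered) // 2           # median of the projections
        split(ordered[:mid])
        split(ordered[mid:])

    split(list(rows))
    return leaves


data = random.Random(0)
rows = [[data.random() for _ in range(3)] for _ in range(100)]
leaves = where(rows)
print(len(leaves), sorted(len(leaf) for leaf in leaves))  # 8 [12, 12, 12, 12, 13, 13, 13, 13]
```

Only two full passes over a node's rows are needed to find its poles. No covariance matrix is ever built, which is what makes this a cheap stand-in for an eigenvector split.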
WHERE approximates data as multiple linear models (drawn in eigenspace)
Platt 2005: FASTMAP = a Nyström algorithm = an approximation to PCA
Combines similar influences, ignores irrelevancies and outliers
Hold that thought:
the underlying data structure for much of my current thinking
• If we cluster to leaves of size sqrt(n),
• we only need 2*sqrt(n) - 1 nodes, each with 2 poles,
• so 4*sqrt(n) - 2 examples,
• which we can reduce later (see optimization)
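Here is a quick check of the storage arithmetic above. The sketch assumes n is a perfect square and that the tree is full, meaning every internal node has two children. Under that assumption, sqrt(n) leaves imply 2*sqrt(n) - 1 nodes. `where_storage` is a hypothetical name.

```python
import math


def where_storage(n: int) -> dict:
    """Node and pole counts for a WHERE tree with leaves of size sqrt(n).

    Assumes n is a perfect square and every internal node has exactly two children.
    """
    k = math.isqrt(n)                    # sqrt(n) leaves, each holding sqrt(n) rows
    if k * k != n:
        raise ValueError("this sketch assumes n is a perfect square")
    nodes = 2 * k - 1                    # a full binary tree with k leaves has 2k - 1 nodes
    poles = 2 * nodes                    # two poles per node: 4*sqrt(n) - 2
    return {"leaves": k, "nodes": nodes, "poles": poles}


print(where_storage(10_000))  # {'leaves': 100, 'nodes': 199, 'poles': 398}
```

For 10,000 rows, 398 pole examples (roughly 4% of the data) summarize the whole tree.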
Is WHERE a multi-objective optimization algorithm?
Mutate towards a useful “end”?
Can we now reason about combinations of user goals?
Krall (WVU), Menzies et al. TSE 2015, GALE:
orders of magnitude faster than standard optimizers, and just as effective
• Evolutionary optimizers = select, crossover, mutate, repeat
• Select:
• Evaluate each pole as you descend the tree
• Cull the half leading to the worst pole
• Crossover, mutate:
• In the surviving leaves, mutate examples towards the best pole
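A single GALE-style select-and-mutate step at one tree node might look like the code below. This is a simplified sketch, not the published GALE algorithm, and it makes four assumptions:
- There is one objective to minimize (`evaluate`).
- Only the two poles are evaluated.
- The half of the node nearer the worse pole is culled.
- Each survivor moves a random fraction of the way toward the better pole, at most the hypothetical `step`.

```python
import math
import random
from typing import Callable, List, Optional


def gale_step(rows: List[List[float]], evaluate: Callable[[List[float]], float],
              step: float = 0.5, rng: Optional[random.Random] = None) -> List[List[float]]:
    """One select-then-mutate step at a single tree node (a simplified, GALE-like sketch).

    Only the two poles are evaluated; `evaluate` returns one number to minimize.
    """
    if len(rows) < 2:
        return [list(r) for r in rows]
    rng = rng or random.Random(1)
    x0 = rng.choice(rows)
    east = max(rows, key=lambda r: math.dist(x0, r))    # pole furthest from a random row
    west = max(rows, key=lambda r: math.dist(east, r))  # pole furthest from East
    c = math.dist(east, west)
    if c == 0:
        return [list(r) for r in rows]
    # Select: two evaluations decide which end of the axis is better.
    east_is_best = evaluate(east) <= evaluate(west)
    best = east if east_is_best else west

    def x(r: List[float]) -> float:                      # cosine-rule projection, 0 at East
        a, b = math.dist(r, east), math.dist(r, west)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(rows, key=x)
    half = len(ordered) // 2
    survivors = ordered[:half] if east_is_best else ordered[half:]  # cull the worse half
    # Mutate: move each survivor a random fraction (at most `step`) of the way to the best pole.
    out = []
    for r in survivors:
        f = rng.uniform(0, step)
        out.append([v + f * (p - v) for v, p in zip(r, best)])
    return out


line = [[float(i), 0.0] for i in range(10)]   # toy data; goal: minimize the first coordinate
survivors = gale_step(line, evaluate=lambda r: r[0])
print(len(survivors))  # 5
```

Spending only two evaluations per node is the intuition behind needing far fewer evaluations. A multi-objective version would compare the poles with a domination test instead of a single number.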
Works well, using far fewer evals
Is WHERE a compression algorithm?
Use it for the certification envelope?
Ship models with a summary of their training data?
• Call each leaf one “class”
• Run a decision tree learner to find a model for the “classes”
• Anything lost for (e.g.) prediction?
Vasil Papakroni, WVU master's thesis, 2012:
prediction using WHERE’s clusters works just as well as other standard methods (for software effort and defect estimation)
Can WHERE support locality?
Deliver specialized lessons for different problems?
• Build one model per cluster using your learner du jour
• O(log(N)) indexing of new data to old models
• Push test data down the tree
Butcher, Menzies et al. Local vs Global. TSE’13:
local models have better medians and less variance
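The "push test data down the tree" step can be sketched as a walk from the root to one leaf, with one projection per level. Lookup therefore costs O(depth), which is O(log N) for a balanced tree. The sketch makes two assumptions:
- Each internal node stores its two poles and the median projection (`cut`) used when it was split.
- Each leaf stores a local model, which can be any callable (your learner du jour).

`Node`, `route` and `predict` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class Node:
    east: Sequence[float] = ()            # internal nodes: the two poles...
    west: Sequence[float] = ()
    cut: float = 0.0                      # ...and the median projection used to split them
    left: Optional["Node"] = None         # rows projecting below `cut`
    right: Optional["Node"] = None        # rows projecting at or above `cut`
    model: Optional[Callable[[Sequence[float]], Any]] = None  # leaves only: a local model

    def project(self, row: Sequence[float]) -> float:
        """Cosine-rule position of `row` on this node's East-West axis (0 at East)."""
        c = math.dist(self.east, self.west)
        a, b = math.dist(row, self.east), math.dist(row, self.west)
        return (a * a + c * c - b * b) / (2 * c)


def route(node: Node, row: Sequence[float]) -> Node:
    """Descend one branch per level until reaching a leaf (a node with a model)."""
    while node.model is None:
        node = node.left if node.project(row) < node.cut else node.right
    return node


def predict(root: Node, row: Sequence[float]) -> Any:
    """Answer with the local model of the leaf that `row` falls into."""
    return route(root, row).model(row)


# A hand-built two-leaf tree with a toy "model" in each leaf.
low, high = Node(model=lambda r: "low"), Node(model=lambda r: "high")
root = Node(east=(0.0, 0.0), west=(10.0, 0.0), cut=5.0, left=low, right=high)
print(predict(root, (2.0, 1.0)))  # low
print(predict(root, (8.0, 0.0)))  # high
```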
Is WHERE a tool for privacy?
• Hide the individuals, preserve the shape of the data
• Don’t share all the data, just the poles
• 100% privacy on data not in the poles
• Don’t share the poles exactly:
• mutate them slightly, by no more than half the axis length
• Predictions in the reduced space work as well as in the raw data space
Peters, Menzies, TSE’13, Balancing privacy and utility
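The sketch below takes one reading of "mutate the poles by no more than half the axis length". It moves each pole in a random direction by a random distance of at most half the East-West distance of its node. This reading is an assumption for illustration, not the exact perturbation used in the cited paper. `privatize_pole`, `shareable` and `frac` are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = List[float]


def privatize_pole(pole: Sequence[float], other: Sequence[float],
                   rng: random.Random, frac: float = 0.5) -> Point:
    """Move `pole` in a random direction by at most frac * |pole - other|."""
    limit = frac * math.dist(pole, other)             # "half the axis length" when frac = 0.5
    direction = [rng.gauss(0.0, 1.0) for _ in pole]   # random direction...
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    radius = rng.uniform(0.0, limit)                  # ...and a random distance up to the limit
    return [p + radius * d / norm for p, d in zip(pole, direction)]


def shareable(poles: List[Tuple[Sequence[float], Sequence[float]]],
              rng: Optional[random.Random] = None) -> List[Point]:
    """Only the (perturbed) East and West poles of each node leave the owner's site."""
    rng = rng or random.Random(1)
    out: List[Point] = []
    for east, west in poles:
        out.append(privatize_pole(east, west, rng))
        out.append(privatize_pole(west, east, rng))
    return out


poles = [([0.0, 0.0], [4.0, 0.0]), ([1.0, 1.0], [1.0, 3.0])]
shared = shareable(poles, random.Random(7))
print(len(shared))  # 4
```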
Is WHERE an anomaly detector?
• WHERE’s trees are an O(log(N))-time index to the leaves
• Test data is “alien” if, after falling to its nearest leaf, it is outside of the poles
Peters, Menzies, ICSE’15, LACE2
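The "alien" test can be sketched with the same cosine-rule projection WHERE uses. Given the two poles of the leaf a test row fell into, the row is alien if its projection lands before East or beyond West. The `slack` parameter and the function name are assumptions. This one-dimensional check cannot see rows lying far off to the side of the axis. A fuller version might also compare the row's distance to the leaf's members.

```python
import math
from typing import Sequence


def is_alien(row: Sequence[float], east: Sequence[float], west: Sequence[float],
             slack: float = 0.0) -> bool:
    """True if `row` projects outside the East-West segment of its leaf.

    `slack` (hypothetical) widens the segment by a fraction of its length on each side.
    """
    c = math.dist(east, west)
    if c == 0:                                   # degenerate leaf: only an exact match is "inside"
        return math.dist(row, east) > 0
    a, b = math.dist(row, east), math.dist(row, west)
    x = (a * a + c * c - b * b) / (2 * c)        # 0 at East, c at West
    return x < -slack * c or x > (1 + slack) * c


print(is_alien((5.0, 3.0), (0.0, 0.0), (10.0, 0.0)))   # False
print(is_alien((12.0, 0.0), (0.0, 0.0), (10.0, 0.0)))  # True
```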
WHERE and “the sharing trick”
• Community of N data owners
• Pass around a cache in random order
• Owner i adds only its anomalous data
• which is then privatized as per above
• Cache size: < 5%
• Models learned from the cache are as good as, or better than, those learned from all the raw data
Peters, Menzies, ICSE’15, LACE2
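The sharing trick can be sketched as a loop over owners in random order. Each owner appends only the rows that the cache does not already cover. Here "anomalous" is approximated by a hypothetical rule: no cached row lies within `radius`. The privatization step is left out. `is_novel`, `share` and `radius` are illustrative names.

```python
import math
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


def is_novel(row: Row, cache: List[Row], radius: float) -> bool:
    """Hypothetical anomaly test: no cached row lies within `radius` of `row`."""
    return all(math.dist(row, c) > radius for c in cache)


def share(owners: List[List[Row]], radius: float,
          rng: Optional[random.Random] = None) -> List[Row]:
    """Pass one cache among the owners in random order; each adds only novel rows."""
    rng = rng or random.Random(1)
    order = list(range(len(owners)))
    rng.shuffle(order)
    cache: List[Row] = []
    for i in order:
        for row in owners[i]:
            if is_novel(row, cache, radius):
                cache.append(row)
    return cache


owners = [[[0.0, 0.0], [0.1, 0.0]],
          [[0.05, 0.0], [5.0, 5.0]],
          [[5.1, 5.0], [10.0, 10.0]]]
cache = share(owners, radius=1.0)
print(len(cache))  # 3
```

Here six raw rows shrink to a three-row cache, one row per group of similar rows, whichever order the owners are visited in.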
Is WHERE a pollution marking tool?
(here thar be dragons, best not go thar)
• Mark as polluted all sub-trees with more than X% anomalies
• When making conclusions, stay away from the polluted sub-trees
Kocaguneli, Menzies et al., Analogy Estimation, TSE’12
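Pollution marking can be sketched as one post-order pass. The pass totals rows and anomalies per sub-tree and flags any sub-tree whose anomaly share exceeds X. The tree shape, the per-leaf counts and the names `Node`, `mark` and `safe_leaves` are assumptions for illustration. X is given as a fraction (0.4 for 40%).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    rows: int = 0          # leaves only: rows that fell into this leaf
    anomalies: int = 0     # leaves only: how many of those rows were flagged as anomalies
    kids: List["Node"] = field(default_factory=list)
    polluted: bool = False


def mark(node: Node, x: float) -> Tuple[int, int]:
    """Set `polluted` on every sub-tree whose anomaly share exceeds x; return (rows, anomalies)."""
    if node.kids:
        totals = [mark(kid, x) for kid in node.kids]
        n = sum(t[0] for t in totals)
        bad = sum(t[1] for t in totals)
    else:
        n, bad = node.rows, node.anomalies
    node.polluted = n > 0 and bad / n > x
    return n, bad


def safe_leaves(node: Node) -> List[Node]:
    """Leaves reachable without entering a polluted sub-tree."""
    if node.polluted:
        return []
    if not node.kids:
        return [node]
    return [leaf for kid in node.kids for leaf in safe_leaves(kid)]


clean, dirty = Node(rows=10, anomalies=1), Node(rows=10, anomalies=5)
root = Node(kids=[clean, dirty])
mark(root, 0.4)
print(root.polluted, clean.polluted, dirty.polluted)  # False False True
print(safe_leaves(root) == [clean])                    # True
```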
Is WHERE an incremental learner?
(i.e. data mining for streams)
• Build models per subtree, using your learner du jour
• In all sub-trees, keep a sample of data plus any anomalies
• When there are too many pollution markers, recluster just that sub-tree
• Diana Gordon-Spears (2002): such hierarchical incremental repair is 10,000 times faster than global reorganization
[Figure: status of related publications (published / executing / to do): IEEE Trans. SE ’12, ’13a, ’13b, ’15; ESE ’09, ’14; ICSE ’15; ICSE ’16 (?); WVU ’13]
Lessons learned
• Certification envelopes (when not to trust conclusions)
• Goals matter (not everything is “classification”)
• Locality matters (when their conclusions do not hold for you)
• Need “streaming tools” (continually stream over a never-ending sequence of new data)
• Need repair tools (to fix broken ideas)
• Verification matters (sooner or later, we all screw up)
• Need compression tools (to save space)
• Need privacy tools (so you can share)
What matters?
5. The Future?
Confucius: “Study the past if you would define the future.”
• History of SE
– X is not part of SE
– People are having trouble with X
– Experiments: extend SE to include X
– Conclusion: “you know what? SE tool support makes X easier”
• Future of SE
– Software mediates what we see and how we act in the world
– Everyone with software is now a scientist
– Software supports communities as they judge conclusions
Confucius: “Study the past if you would define the future.”
To find the future, extrapolate the past
• Future of SE
– Software mediates how everyone sees and acts on the world
– Everyone with software is now a scientist
– Software supports communities as they judge conclusions
This talk
• Services for data repositories supporting citizen scientists
– Enabling them to reflect, act, discover
– The next generation of continuous science
Software engineering researchers who only study software are like astronomers who only study telescopes.
• After we grind the lenses, we should look through the scope.
• After we build the software, we should see how people are using it.
End of my tale tail
• Questions? Comments?
About me
• Full Prof in CS, NC State. Teaches SE and automated SE.
• Researches human+AI synergies, with a focus on data mining for SE.
• Assoc. editor of IEEE Transactions on SE, Empirical SE, the Automated SE Journal, and the Software Quality Journal.
• Was co-PC-chair for ASE’12 and the ICSE’15 NIER track.
• Will be co-general chair of ICSME’16.
• Author of 230+ refereed pubs.
• One of the 100 most cited authors in SE (out of 80,000; http://goo.gl/BnFJs).
• PI for NSF, NIJ, DoD, NASA, USDA, and for research work with private companies.
• Co-founder of the PROMISE conference series on reproducible experiments in SE.
• Current curator of the PROMISE web site of SE research data, http://openscience.us/repo.
• Vita: http://goo.gl/8eNhYM
• Pubs: https://goo.gl/qNQAIq
• Home page: http://menzies.us
Backup slides
http://mshang.ca/syntree/
[clustering [contexts [locality [transfer]]]
[compression
[prediction [planning multi-goal optimization]
]
[privacy [sharing [verification]]]
[anomalyDetection certificationEnvelope
[pollutionMarking [incrementalRepair [streaming]]
Code used in my last paper
(1100 LOC of Python calling scikit-learn)
• ECL: a higher-level set-based language (more succinct)
• But if you can write it quick,
– you can write it wrong, quick.
• Implications for
– markets, ambulances, government policies, homeland security, toasters, air safety, Nobel prizes, web-company advertising policies, whether we take the family to Cairo for a holiday, etc.
Note: not necessarily solved by higher-level languages
Recall the words of Dr. Amy Farrah Fowler, Ph.D.:
Sheldon: A grand unified theory, insofar as it explains everything, will ipso facto explain neurobiology.
Amy: Yes, but if I’m successful… I will be able to map and reproduce your thought processes in deriving a grand unified theory, and therefore, subsume your conclusions under my paradigm.
Apologies to fans of the BBT: this conversation occurred in the JPL cafeteria, not Amy’s flat
WHERE = fast analog for PCA
(so WHERE is a heuristic spectral learner)
Spectral learners: work on eigenvectors
• combine related influences
• ignore outliers and irrelevancies
GALE: one of the best, far fewer evals
Gray: stats tests: as good as the best
Transfer matters (and is possible)
B. Turhan, T. Menzies, A. Bener, J. Di Stefano. On the relative value of cross-company and within-company data for defect prediction. Empirical Softw. Eng. 14(5), 2009.
When there is not enough local data, ask your friends
Is WHERE a verification tool?
• With enough eyeballs,
• are all bugs shallow?
If it works, try to make it better
• “The following is my valiant attempt to capture the difference (between PROMISE and MSR)”
• “To misquote George Box, I hope my model is more useful than it is wrong:
– For the most part, the MSR community was mostly concerned with the initial collection of data sets from software projects.
– Meanwhile, the PROMISE community emphasized the analysis of the data after it was collected.”
• “The PROMISE people routinely posted all their data on a public repository
– their new papers would re-analyze old data, in an attempt to improve that analysis.
– In fact, I used to joke ‘PROMISE. Australian for repeatability’ (apologies to the Foster’s Brewing company).”
Dr. Prem Devanbu
UC Davis
General chair, MSR’14
The PROMISE Project
Perspectives on Data Science for Software Engineering
Tim Menzies, Laurie Williams, Thomas Zimmermann
[Figure: timeline 2014, 2015, 2016: the PROMISE Project; our summary, and other related books; the MSR community and others]

 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
Ikhlaq Sidhu
 
KSU IT4983 Capstone Projects Report 2017 Update
KSU IT4983 Capstone Projects Report 2017 UpdateKSU IT4983 Capstone Projects Report 2017 Update
KSU IT4983 Capstone Projects Report 2017 Update
Jack Zheng
 

Similar to Future se oct15 (20)

Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
A Space X Industry Day Briefing 7 Jul08 Jgm R4
A Space X Industry Day Briefing 7 Jul08 Jgm R4A Space X Industry Day Briefing 7 Jul08 Jgm R4
A Space X Industry Day Briefing 7 Jul08 Jgm R4
 
Module 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdfModule 5 - Data Science Methodology.pdf
Module 5 - Data Science Methodology.pdf
 
Data science workflow v1.1
Data science workflow v1.1Data science workflow v1.1
Data science workflow v1.1
 
Lecture_1_Intro.pdf
Lecture_1_Intro.pdfLecture_1_Intro.pdf
Lecture_1_Intro.pdf
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Doing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics EnvironmentDoing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics Environment
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence Development
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
KSU IT4983 Capstone Projects Report 2017 Update
KSU IT4983 Capstone Projects Report 2017 UpdateKSU IT4983 Capstone Projects Report 2017 Update
KSU IT4983 Capstone Projects Report 2017 Update
 

Future se oct15

  • 1. (A) Future of SE Research: Research for SE, SE for Research. October 2015. Slides: tiny.cc/se15. tim.menzies@gmail.com, https://menzies.us, ai4se.net
  • 2. Data mining tools should, and can, do much more • Operating systems do more than just schedule processes: – Editors – Compilers – File systems – Network connections – Memory management – Etc. • What services should be standard in data mining tools?
  • 3. IEEE Trans. SE ’13a • ESE ’09 • ESE ’14 • IEEE Trans. SE ’15 • ICSE ’16? • WVU ’13 • ICSE ’15 • IEEE Trans. SE ’13b • IEEE Trans. SE ’12
  • 4. Not in this talk (not what everyone else is talking about): • Principles for designing case studies • Visualizations • Data mining • Big Data • Qualitative methods (see parts 1+2)
  • 6. Outline: 1. Software tools for “citizen scientists” 2. Beyond mere data repositories 3. What happens when decision software goes wrong? 4. Proposed services for nextgen repositories 5. The future?
  • 7. Outline: 1. Software tools for “citizen scientists” 2. Beyond mere data repositories 3. What happens when decision software goes wrong? 4. Proposed services for nextgen repositories 5. The future?
  • 8. Software tools for “citizen scientists” • Science has escaped the lab – it is roaming free in the world. • When every citizen can be a scientist (making generalizations from data) – then it should be possible to audit those conclusions. • We should be free to mistrust the conclusions of citizen scientists – just as we mistrust, evaluate, review, explore, and evolve the conclusions of any other scientist.
  • 9. Software mediates what we see and how we act in the world 1. Silicon Valley developers view every new feature as an experiment, to be tested within some mashup. 2. Chemists won a Nobel Prize for software simulations http://goo.gl/Lwensc 3. Engineers use software for optical tweezers, radiation therapy, remote sensing, and chip design http://goo.gl/qBMyIZ 4. Web analysts use software to analyze clickstreams to improve sales and marketing strategies http://goo.gl/b26CfY 5. Stock traders write software to simulate trading strategies http://www.quantopian.com 6. Analysts write software to mine labor statistics data to review proposed government policies http://goo.gl/X4kgnc 7. Journalists use software to analyze economic data and make visualizations for their news stories http://fivethirtyeight.com 8. In London or New York, ambulances wait for your call at a location determined by a software model http://goo.gl/8SMd1p 9. Etc.
  • 10. It is important to understand how software can divide us. See also “Facebook emotion study breached ethical guidelines, researchers say”, The Guardian, June 30, 2014 http://goo.gl/gTRkmp
  • 12. Better SE = better data science = better science • A data scientist isa engineer – delivering, under constraints, to acceptable quality standards • A data scientist isa software developer – complex scripts, test-driven development, version control • A data scientist isa requirements engineer – understanding, navigating, and trading off between user goals • A data scientist isa agile programmer – uses feedback from writing and running code, and from query results, to constantly revise goals and code. Data scientist isa software engineer
  • 13. Outline: 1. Software tools for “citizen scientists” 2. Beyond mere data repositories 3. What happens when decision software goes wrong? 4. Proposed services for nextgen repositories 5. The future?
  • 14. #storeYourData • URL: http://openscience.us/repo • Data from 100s of projects • E.g. EUSE: 250,000K+ spreadsheets • E.g. Softgoals: 150+ softgoal models • Oldest continuous repository of SE data (2004)
  • 15. So many data repositories • What’s next? • What tools would we need for a “debate”-oriented repository?
  • 16. Let’s talk tools. To design those tools, ask: 1. What problems are seen when people try to share data and conclusions? 2. What minimal data structures address those problems?
  • 17. Outline: 1. Software tools for “citizen scientists” 2. Beyond mere data repositories 3. What happens when decision software goes wrong? 4. Proposed services for nextgen repositories 5. The future?
  • 18. Models have “certification envelopes”: bad things happen when you stretch the envelope • Columbia ice strike – Size: 1200 in³ – Speed: 477 mph (relative to vehicle) • Certified as “safe” by the CRATER micro-meteorite model – An experiment in CRATER’s DB: size 3 in³, speed under 100 mph • Columbia and its crew were lost on re-entry • Lesson: conclusions should come with a “certification envelope” – if new test data falls outside the envelope of the training set, raise an alert
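As a minimal sketch of the certification-envelope idea, assume the envelope is just the per-feature minimum and maximum of the training data. The slide does not say how such an envelope should be computed; the box shape, the function names `envelope` and `outside_envelope`, and the training values are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]

def envelope(train: Sequence[Sequence[float]]) -> Bounds:
    """Per-feature (min, max) of the training data: an assumed, box-shaped envelope."""
    return [(min(col), max(col)) for col in zip(*train)]

def outside_envelope(row: Sequence[float], bounds: Bounds) -> List[int]:
    """Indices of the features where `row` falls outside the training bounds."""
    return [i for i, (x, (lo, hi)) in enumerate(zip(row, bounds)) if not lo <= x <= hi]

# Hypothetical training data: (size, speed) pairs invented for illustration.
train = [(1.0, 60.0), (2.0, 80.0), (3.0, 95.0)]
bounds = envelope(train)
new_case = (1200.0, 477.0)
alerts = outside_envelope(new_case, bounds)
if alerts:
    # prints: ALERT: outside certification envelope on features [0, 1]
    print("ALERT: outside certification envelope on features", alerts)
```

A box like this is crude: a case can be in range on every feature yet still be unlike any training example.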
  • 19. Goals matter: learners learn for X, users want Y • Learners work this way – users want it that way • It is a waste of time learning models users do not want – better to tune learning methods to the goals of users • Enter search-based software engineering – multi-goal optimization
  • 20. Locality matters (what is true there may not be true here) • Devanbu et al., ASE’11, Ecological Inference • Bettenburg et al., MSR’12, Think local, act global • Menzies et al., TSE’13, Local versus Global learning • Yang et al., IST’13, Handling local bias • Minku et al., ICSE’14, Best Use of Cross-Company Data [Chart: error (less is better), using ensemble data vs. using local data] • Not general models, but general methods for local models
  • 21. Sharing matters: given enough eyes, all bugs are shallow • How was the error found so fast? – Open science • Timeline (2013): – Mar 15: “Better cross-company learning” accepted to MSR’13 – Mar 29: camera-ready submitted – ?Apr 10: pre-prints go online – Apr 29: Hyeongmin Jeon, a graduate student at Pusan Natl. Univ., emailed us: he could not reproduce the result – May 4: Fayola Peters, checking the code, found an error; a manic week of experiments followed – May 11: we concluded the results were definitely wrong – May 12: we emailed the MSR organizers. Our penalty? Present the paper and its error.
  • 22. Compression and privacy matter: squeezing and secrets • Facebook, Google, Netflix, etc. • A small X% of all users are subjects in continual experiments testing new features • Data from these studies is retained indefinitely and warehoused – problems with volume (needs compression) – problems with confidentiality (needs privacy) • If I want to challenge the conclusions made by Facebook, Google, Netflix, etc. – I need to be able to access that data, privately – (needs trusted sharing)
  • 23. Lessons learned: what matters? • Certification envelopes (when not to trust conclusions) • Goals matter (not everything is “classification”) • Locality matters (when their conclusions do not hold for you) • Need “streaming tools” (continually stream over a never-ending sequence of new data) • Need repair tools (to fix broken ideas) • Verification matters (sooner or later, we all screw up) • Need to transfer data (get by with a little help from your friends) • Need compression tools (to save space) • Need privacy tools (so you can share)
  • 24. Outline: 1. Software tools for “citizen scientists” 2. Beyond mere data repositories 3. What happens when decision software goes wrong? 4. Proposed services for nextgen repositories 5. The future?
  • 25. Digression: WHERE: O(N) top-down divisive clustering • Fast: works on an approximation to eigenvectors (the FASTMAP heuristic, Faloutsos [1995]), an O(N) generation of an axis of large variability • Pick any point X • Find E = East = furthest from X • Find W = West = furthest from East • East, West = “the poles” • All points have distances a, b to (E, W) • c = dist(W, E) • $x = \frac{a^2 + c^2 - b^2}{2c}$ • Find median(x), recurse on each half
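The FASTMAP split described above can be sketched in Python. This is a minimal illustration, not the WHERE code itself: Euclidean distance, the helper names `furthest`, `project`, and `where`, and the `min_size` stopping rule are assumptions. Each call to `project` costs O(N) distance computations, matching the slide's O(N) claim for one split; the full recursion repeats that work once per tree level.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]

def furthest(x: Row, rows: List[Row]) -> Row:
    """The row furthest (Euclidean distance) from x."""
    return max(rows, key=lambda r: math.dist(x, r))

def project(rows: List[Row], rng: random.Random) -> Tuple[Row, Row, List[Tuple[float, Row]]]:
    """One FASTMAP pass: find the poles, then place every row on the East-West axis."""
    anyone = rng.choice(rows)          # pick any point X
    east = furthest(anyone, rows)      # E = furthest from X
    west = furthest(east, rows)        # W = furthest from E
    c = math.dist(east, west)
    placed = []
    for r in rows:
        a, b = math.dist(r, east), math.dist(r, west)
        # cosine rule: x is the distance from East along the axis
        x = (a * a + c * c - b * b) / (2 * c) if c > 0 else 0.0
        placed.append((x, r))
    return east, west, placed

def where(rows: List[Row], min_size: int,
          rng: Optional[random.Random] = None) -> List[List[Row]]:
    """Split at the median x, recursing until leaves hold at most min_size rows."""
    rng = rng if rng is not None else random.Random()
    if len(rows) <= max(1, min_size):  # max(1, ...) guards against endless recursion
        return [rows]
    _, _, placed = project(rows, rng)
    placed.sort(key=lambda pair: pair[0])
    half = len(placed) // 2
    near_east = [r for _, r in placed[:half]]
    near_west = [r for _, r in placed[half:]]
    return where(near_east, min_size, rng) + where(near_west, min_size, rng)

rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(100)]  # hypothetical 2-D data
leaves = where(data, min_size=int(math.sqrt(len(data))), rng=rng)
print(len(leaves))  # 16 leaves of 6 or 7 rows each
```

Setting `min_size` to sqrt(n) gives the leaf size used on the next slides.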
  • 26. WHERE approximates data as multiple linear models (drawn in eigenspace) • Platt 2005: FASTMAP = Nyström algorithm = approximation to PCA • Combines similar influences; ignores irrelevancies and outliers
  • 27. Hold that thought: the underlying data structure behind much of my current thinking • If we cluster to leaves of size sqrt(n), • we only need 2*sqrt(n) - 1 nodes, each with 2 poles • so 4*sqrt(n) - 2 examples • which we can reduce later (see optimization)
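The counts on this slide follow from the fact that a full binary tree with L leaves has 2L - 1 nodes. Here is a quick check, assuming n is a perfect square and the tree is balanced with about sqrt(n) leaves (`where_storage` is a hypothetical helper name):

```python
import math
from typing import Tuple

def where_storage(n: int) -> Tuple[int, int, int]:
    """Leaves, tree nodes, and stored pole examples when clustering n rows
    into leaves of about sqrt(n) rows each."""
    leaves = round(math.sqrt(n))   # about sqrt(n) leaves of about sqrt(n) rows
    nodes = 2 * leaves - 1         # a full binary tree with L leaves has 2L - 1 nodes
    poles = 2 * nodes              # two poles per node: 4*sqrt(n) - 2 examples
    return leaves, nodes, poles

print(where_storage(10_000))  # (100, 199, 398)
```

So 10,000 rows are summarized by 398 examples, under 4% of the data.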
  • 28. Is WHERE a multi-objective optimization algorithm? Mutate towards a useful “end”? Can we now reason about combinations of user goals? Krall (WVU), Menzies et al., TSE 2015, GALE: orders of magnitude faster than standard optimizers, and just as effective • Evolutionary optimizers = select, crossover, mutate, repeat • Select: – evaluate each pole as you descend the tree – cull the half leading to the worst pole • Crossover, mutate: – in the surviving leaves, mutate examples towards the best pole
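A simplified sketch of one select-and-mutate step, assuming a single split into two halves, examples stored as Python lists, and plain Pareto domination (all objectives minimized) to compare the poles. The published GALE has its own domination test and mutation rules; `select_and_mutate`, `dominates`, and the `step` fraction are illustrative assumptions. The key point is that only the two poles are evaluated, not every example.

```python
from typing import Callable, List, Sequence

Row = List[float]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto domination, assuming every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def select_and_mutate(east: Row, west: Row,
                      east_half: List[Row], west_half: List[Row],
                      objectives: Callable[[Row], Sequence[float]],
                      step: float = 0.5) -> List[Row]:
    """Evaluate only the two poles, cull the half nearer the worse pole, and move
    each survivor a fraction `step` of the way toward the better pole."""
    if dominates(objectives(west), objectives(east)):
        best, survivors = west, west_half
    else:  # East wins, or neither dominates (this sketch defaults to East)
        best, survivors = east, east_half
    return [[x + step * (b - x) for x, b in zip(row, best)] for row in survivors]

def sum_sq(row: Row) -> Sequence[float]:
    """Hypothetical single objective: minimize the sum of squares."""
    return (sum(v * v for v in row),)

print(select_and_mutate([3.0, 3.0], [0.5, 0.5],
                        [[2.0, 2.0], [3.0, 2.0]], [[1.0, 1.0], [0.0, 0.0]],
                        sum_sq))  # [[0.75, 0.75], [0.25, 0.25]]
```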
  • 30. Is WHERE a compression algorithm? Use it for the certification envelope? Ship models with a summary of their training data? • Call each leaf one “class” • Run a decision tree learner to find a model for the “classes” • Anything lost for (e.g.) prediction? Prediction using WHERE’s clusters works just as well as other standard methods (for software effort and defect estimation). Vasil Papakroni, WVU master’s thesis, 2012
  • 31. Can WHERE support locality? Deliver specialized lessons for different problems? • Build one model per cluster using your learner du jour • O(log(N)) indexing of new data to old models: push test data down the tree • Local models have better medians and less variance. Butcher, Menzies et al., Local vs Global, TSE’13
  • 32. Is WHERE a tool for privacy? • Hide the individuals, preserve the shape of the data • Don’t share all the data, just the poles – 100% privacy on data not in the poles • Don’t share the poles exactly – mutate them slightly, by no more than half the axis length • Predictions in the reduced space work as well as in the raw data space. Peters, Menzies, TSE’13, Balancing privacy and utility
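"Mutate the poles by no more than half the axis length" can be read several ways. Here is a minimal sketch under one reading, which is an assumption and not necessarily the published method: move each pole along the East-West axis by a random fraction r in [-0.5, 0.5] of that axis. The displacement is then |r|·c ≤ c/2, where c = dist(East, West). `privatize_poles` is a hypothetical name.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

def privatize_poles(east: Sequence[float], west: Sequence[float],
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[float], List[float]]:
    """Return perturbed copies of a node's two poles; only these would be shared."""
    rng = rng if rng is not None else random.Random()

    def nudge(p: Sequence[float], q: Sequence[float]) -> List[float]:
        r = rng.uniform(-0.5, 0.5)  # fraction of the axis length to move
        # move p along the p-q axis; displacement length is |r| * dist(p, q)
        return [pi + r * (pi - qi) for pi, qi in zip(p, q)]

    return nudge(east, west), nudge(west, east)

east, west = [0.0, 0.0], [3.0, 4.0]  # hypothetical poles, axis length c = 5
new_east, new_west = privatize_poles(east, west, random.Random(1))
print(math.dist(east, new_east) <= 2.5, math.dist(west, new_west) <= 2.5)  # True True
```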
  • 33. Is WHERE an anomaly detector? • WHERE’s trees are an O(log(N)) time index to the leaves • Test data is “alien” if, after falling to its nearest leaf, it lies outside the poles. Peters, Menzies, ICSE’15, LACE2
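Below is a sketch of the O(log N) descent and the "alien" test, under stated assumptions. Each internal node stores its poles and the median projection `mid` used to split it. Rows with x < mid go to the East-side child. "Outside of the poles" is read as a projection that falls outside [0, c] on the leaf's East-West axis. `Node`, `leaf_for`, and `is_alien` are hypothetical names, and LACE2's actual test may differ. This reading also ignores how far a point sits off the axis.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

Row = Sequence[float]

@dataclass
class Node:
    east: Row
    west: Row
    mid: float = 0.0                # median projection used to split this node
    left: Optional["Node"] = None   # child for rows with x < mid (the East side)
    right: Optional["Node"] = None  # child for rows with x >= mid (the West side)

def project(row: Row, east: Row, west: Row) -> float:
    """Position of row along the East-West axis, measured from East (cosine rule)."""
    a, b, c = math.dist(row, east), math.dist(row, west), math.dist(east, west)
    return (a * a + c * c - b * b) / (2 * c) if c > 0 else 0.0

def leaf_for(node: Node, row: Row) -> Node:
    """Descend one branch per level: O(depth), i.e. O(log N) for a balanced tree."""
    while node.left is not None and node.right is not None:
        node = node.left if project(row, node.east, node.west) < node.mid else node.right
    return node

def is_alien(root: Node, row: Row) -> bool:
    """Alien if the row's position on its leaf's axis falls outside [0, c]."""
    leaf = leaf_for(root, row)
    c = math.dist(leaf.east, leaf.west)
    return not 0.0 <= project(row, leaf.east, leaf.west) <= c

# Hypothetical tree: root axis (0,0)-(10,0), split at x = 5 into two leaves.
root = Node((0.0, 0.0), (10.0, 0.0), mid=5.0,
            left=Node((0.0, 0.0), (4.0, 0.0)),
            right=Node((6.0, 0.0), (10.0, 0.0)))
print(is_alien(root, (2.0, 0.0)), is_alien(root, (5.5, 0.0)))  # False True
```

The second point lands in the gap between the two leaves, so it falls outside the poles of the leaf it reaches.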
  • 34. WHERE and “the sharing trick”
    – A community of N data owners passes a cache around in random order.
    – Owner “I” adds just its anomalous data, which is then privatized as per above.
    – Cache size: < 5% of the data.
    – Models learned from the cache are as good as, or better than, models learned from all the raw data.
    – Peters, Menzies, ICSE’15: LACE2.
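The cache-passing protocol can be sketched in a few lines. Several simplifications here are assumptions rather than the LACE2 procedure: the anomaly test is a hypothetical fixed-distance `threshold`, privatization is omitted, and the five owners of random 2-D rows are made up.

```python
import math
import random


def nearest(row, cache):
    """Distance from row to its closest cached row (infinity if the cache is empty)."""
    return min((math.dist(row, c) for c in cache), default=math.inf)


def share(owners, threshold, rng):
    """Pass one cache around the owners in random order. Each owner adds only the
    rows that look anomalous to the cache so far (hypothetical test: further than
    `threshold` from every cached row)."""
    cache = []
    order = list(range(len(owners)))
    rng.shuffle(order)
    for i in order:
        for row in owners[i]:
            if nearest(row, cache) > threshold:
                cache.append(row)             # a real system would privatize the row first
    return cache


rng = random.Random(3)
owners = [[[rng.random(), rng.random()] for _ in range(100)] for _ in range(5)]
cache = share(owners, threshold=0.15, rng=rng)
print(len(cache), "of", sum(len(o) for o in owners), "rows cached")
```

By construction, every row held by every owner ends up within `threshold` of some cached row, so the cache still covers the shape of the pooled data.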
  • 35. Is WHERE a pollution-marking tool? (here thar be dragons, best not go thar)
    – Mark as polluted all sub-trees with more than X% anomalies.
    – When drawing conclusions, stay away from the polluted sub-trees.
    – Kocaguneli, Menzies et al., Analogy Estimation, TSE’12.
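A small sketch of pollution marking follows. A post-order pass counts rows and anomalies under each node and flags any sub-tree above the threshold. A second pass collects only the leaves that can be reached without entering a polluted sub-tree. The nested-dict tree, its per-row anomaly flags, and the threshold `x=0.4` are hypothetical.

```python
def mark(node, x):
    """Post-order pass: count rows and anomalies under each node, then flag the
    node as polluted if more than fraction x of its rows are anomalies."""
    if "kids" in node:
        counts = [mark(kid, x) for kid in node["kids"]]
        n = sum(c[0] for c in counts)
        bad = sum(c[1] for c in counts)
    else:
        n, bad = len(node["flags"]), sum(node["flags"])
    node["polluted"] = n > 0 and bad / n > x
    return n, bad


def safe_leaves(node):
    """Leaves reachable without entering a polluted sub-tree."""
    if node["polluted"]:
        return []
    if "kids" not in node:
        return [node]
    return [leaf for kid in node["kids"] for leaf in safe_leaves(kid)]


# Hypothetical tree: each leaf lists one anomaly flag per row it holds.
tree = {"kids": [
    {"kids": [{"name": "a", "flags": [False] * 9 + [True]},
              {"name": "b", "flags": [True] * 6 + [False] * 4}]},
    {"name": "c", "flags": [False] * 10}]}
mark(tree, x=0.4)
print([leaf["name"] for leaf in safe_leaves(tree)])   # ['a', 'c']
```

Leaf `b` (60% anomalies) is avoided, while its parent (7 of 20 rows, 35%) stays usable.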
  • 36. Is WHERE an incremental learner (i.e. data mining for streams)?
    – Build models per sub-tree, using your learner du jour.
    – In all sub-trees, keep a sample of the data plus any anomalies.
    – When there are too many pollution markers, recluster just that sub-tree.
    – Diana Gordon-Spears (2002): such hierarchical incremental repair is 10,000 times faster than global reorganization.
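The sketch below illustrates local repair on a stream. Each leaf keeps a bounded reservoir sample plus its anomalies, and a leaf that collects too many anomalies re-splits only itself. The following are all hypothetical: Euclidean distance, a fixed-radius anomaly test (`radius=0.2`), a 32-row reservoir, and a limit of 8 anomalies per leaf. Internal nodes route to the nearer pole, which cuts the east-west axis at its midpoint rather than its median. Per-leaf models are omitted. This is not Gordon-Spears’s algorithm, and it makes no claim about the speed-up figure.

```python
import math
import random


class Node:
    """WHERE-style node for streams. A leaf keeps a bounded reservoir sample plus
    any anomalies; once it holds too many anomalies it re-splits itself, and no
    other part of the tree is touched."""

    def __init__(self, rows, rng, size=32, limit=8, radius=0.2):
        self.rng, self.size, self.limit, self.radius = rng, size, limit, radius
        self.sample, self.anomalies, self.seen, self.kids = [], [], 0, None
        for row in rows:
            self._keep(row)

    def _keep(self, row):
        """Reservoir sampling (Algorithm R) over all normal rows seen by this leaf."""
        self.seen += 1
        if len(self.sample) < self.size:
            self.sample.append(row)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.sample[j] = row

    def add(self, row):
        if self.kids is not None:             # internal node: go to the nearer pole
            east, west, (near_e, near_w) = self.kids
            child = near_e if math.dist(row, east) <= math.dist(row, west) else near_w
            child.add(row)
        elif self.sample and min(math.dist(row, s) for s in self.sample) > self.radius:
            self.anomalies.append(row)        # hypothetical anomaly test
            if len(self.anomalies) > self.limit:
                self._resplit()
        else:
            self._keep(row)

    def _resplit(self):
        """Local repair: split only this leaf, using its sample plus its anomalies."""
        rows = self.sample + self.anomalies
        anyone = self.rng.choice(rows)
        east = max(rows, key=lambda r: math.dist(r, anyone))
        west = max(rows, key=lambda r: math.dist(r, east))
        near_e = [r for r in rows if math.dist(r, east) <= math.dist(r, west)]
        near_w = [r for r in rows if math.dist(r, east) > math.dist(r, west)]
        args = (self.rng, self.size, self.limit, self.radius)
        self.kids = (east, west, (Node(near_e, *args), Node(near_w, *args)))
        self.sample, self.anomalies = [], []


def leaves(node):
    if node.kids is None:
        yield node
    else:
        for kid in node.kids[2]:
            yield from leaves(kid)


rng = random.Random(4)
root = Node([[rng.random(), rng.random()] for _ in range(50)], rng)
for _ in range(500):                          # more data from the same region
    root.add([rng.random(), rng.random()])
for _ in range(2000):                         # the stream drifts somewhere new
    root.add([5 + rng.random(), 5 + rng.random()])
```

After the drift, the new region ends up in leaves of its own, while leaves untouched by the new data keep their old samples.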
  • 37. [Figure: the map of publications from slide 3 (IEEE Trans. SE ’12, ’13a, ’13b, ’15; ESE ’09, ’14; ICSE ’15; ICSE ’16?; WVU ’13), now with a legend: Published, To do, Executing.]
  • 38. Lessons learned: what matters?
    – Certification envelopes (when not to trust conclusions).
    – Goals matter (not everything is “classification”).
    – Locality matters (when their conclusions do not hold for you).
    – Need streaming tools (to continually process a never-ending sequence of new data).
    – Need repair tools (to fix broken ideas).
    – Verification matters (sooner or later, we all screw up).
    – Need compression tools (to save space).
    – Need privacy tools (so you can share).
  • 39.
    1. Software tools for “citizen scientists”.
    2. Beyond mere data repositories
    3. What happens when decision software goes wrong?
    4. Proposed services for nextgen repositories
    5. The Future?
  • 40. Confucius: “Study the past if you would define the future.”
    • History of SE:
      – X is not part of SE.
      – People are having trouble with X.
      – Experiments: extend SE to include X.
      – Conclusion: “You know what? SE tool support makes X easier.”
  • 41. Confucius: “Study the past if you would define the future.”
    • Future of SE:
      – Software mediates what we see and how we act in the world.
      – Everyone with software is now a scientist.
      – Software supports communities as they judge conclusions.
  • 42. To find the future, extrapolate the past
    • Future of SE:
      – Software mediates how everyone sees and acts on the world.
      – Everyone with software is now a scientist.
      – Software supports communities as they judge conclusions.
  • 43. This talk
    • Services for data repositories that support citizen scientists:
      – enabling them to reflect, act, and discover;
      – the next generation of continuous science.
  • 44. Software engineering researchers just studying software is like astronomers just studying telescopes.
    • After we grind the lenses, we should look through the scope.
    • After we build the software, we should see how people are using it.
  • 45. End of my tale tail
    • Questions? Comments?
  • 46. About me
    • Full Prof. in CS, NC State. Teaches SE and automated SE.
    • Researches human+AI synergies, with a focus on data mining for SE.
    • Assoc. editor: IEEE Transactions on SE, Empirical SE, the Automated SE Journal, Software Quality Journal.
    • Was co-PC-chair for ASE’12 and the ICSE’15 NIER track.
    • Will be co-general chair of ICSME’16.
    • Author of 230+ refereed pubs.
    • One of the 100 most-cited authors in SE (out of 80,000: http://goo.gl/BnFJs).
    • PI for NSF, NIJ, DoD, NASA, USDA, and research work with private companies.
    • Co-founder of the PROMISE conference series on reproducible experiments in SE.
    • Current curator of the PROMISE web site, SE research data: http://openscience.us/repo
    • Vita: http://goo.gl/8eNhYM
    • Pubs: https://goo.gl/qNQAIq
    • Home page: http://menzies.us
  • 48. http://mshang.ca/syntree/ [clustering [contexts [locality [transfer]]] [compression [prediction [planning multi-goal optimization] ] [privacy [sharing [verification]]] [anomalyDetection certificationEnvelope [pollutionMarking [incrementalRepair [streaming]]
  • 49. Code used in my last paper (1100 LOC of Python calling scikit-learn)
  • 50. ECL: a higher-level, set-based language (more succinct)
    • But if you can write it quick,
      – you can write it wrong, quick.
    • Implications for
      – markets, ambulances, government policies, homeland security, toasters, air safety, Nobel prizes, web-company advertising policies, whether we take the family to Cairo for a holiday, etc.
    Note: not necessarily solved by higher-level languages.
  • 51. Recall the words of Dr. Amy Farrah Fowler, Ph.D.
    Sheldon: A grand unified theory, insofar as it explains everything, will ipso facto explain neurobiology.
    Amy: Yes, but if I’m successful… I will be able to map and reproduce your thought processes in deriving a grand unified theory, and therefore, subsume your conclusions under my paradigm.
    Apologies to fans of the BBT: this conversation occurred in the JPL cafeteria, not Amy’s flat.
  • 53. WHERE = a fast analog of PCA (so WHERE is a heuristic spectral learner)
    • Spectral learners work on eigenvectors:
      – combine related influences;
      – ignore outliers and irrelevancies.
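To make “fast analog of PCA” concrete, the sketch compares WHERE’s east-west axis with the first principal component on synthetic, elongated 2-D data. The principal component is the top eigenvector of the covariance matrix, found here by power iteration. The WHERE axis needs only two passes over the data. The data, the iteration count, and the cosine-alignment check are illustrative assumptions. On data with no dominant direction the two axes need not agree.

```python
import math
import random


def first_pc(rows, iters=200):
    """First principal component via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[i] for r in rows) / n for i in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def where_axis(rows, rng):
    """WHERE/FastMap axis: unit vector from 'east' to 'west' (two passes over the data)."""
    anyone = rng.choice(rows)
    east = max(rows, key=lambda r: math.dist(r, anyone))
    west = max(rows, key=lambda r: math.dist(r, east))
    c = math.dist(east, west)
    return [(w - e) / c for e, w in zip(east, west)]


rng = random.Random(0)
ts = [rng.uniform(-5, 5) for _ in range(300)]
rows = [[t, 0.5 * t + rng.uniform(-0.2, 0.2)] for t in ts]
alignment = abs(sum(p * q for p, q in zip(first_pc(rows), where_axis(rows, rng))))
print(f"|cos(angle)| between WHERE axis and PC1: {alignment:.3f}")
```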
  • 54. GALE: one of the best, with far fewer evaluations. Gray: statistical tests rank it as good as the best.
  • 55. Transfer matters (and is possible)
    • When there is not enough local data, ask your friends.
    • B. Turhan, T. Menzies, A. Bener, J. Di Stefano. On the relative value of cross-company and within-company data for defect prediction. Empirical Softw. Eng. 14(5), 2009.
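One simple way to “ask your friends” when local data is scarce is a nearest-neighbour relevancy filter. It keeps only those cross-company rows that are among the k nearest neighbours of some local row, then trains on them. The cited paper is known for a filter of this kind. The function name `nn_filter`, the k values, and the toy rows below are illustrative assumptions. Targets would travel with their rows in practice.

```python
import math


def nn_filter(local, cross, k=10):
    """Keep the cross-company rows that are among the k nearest neighbours of at
    least one local row (each kept once, in their original order)."""
    keep = set()
    for row in local:
        ranked = sorted(range(len(cross)), key=lambda i: math.dist(row, cross[i]))
        keep.update(ranked[:k])
    return [cross[i] for i in sorted(keep)]


local = [[0.0, 0.0], [1.0, 1.0]]
cross = [[0.0, 0.1], [5.0, 5.0], [1.0, 1.2], [9.0, 9.0], [0.5, 0.5]]
print(nn_filter(local, cross, k=1))   # [[0.0, 0.1], [1.0, 1.2]]
print(nn_filter(local, cross, k=2))   # [[0.0, 0.1], [1.0, 1.2], [0.5, 0.5]]
```

The distant cross-company rows, `[5.0, 5.0]` and `[9.0, 9.0]`, are never borrowed, so the transferred data stays close to the local context.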
  • 56. Is WHERE a verification tool?
    • With enough eyeballs, are all bugs shallow?
  • 57. If it works, try to make it better
    • “The following is my valiant attempt to capture the difference (between PROMISE and MSR).”
    • “To misquote George Box, I hope my model is more useful than it is wrong:
      – For the most part, the MSR community was mostly concerned with the initial collection of data sets from software projects.
      – Meanwhile, the PROMISE community emphasized the analysis of the data after it was collected.”
    • “The PROMISE people routinely posted all their data on a public repository:
      – their new papers would re-analyze old data, in an attempt to improve that analysis.
      – In fact, I used to joke ‘PROMISE. Australian for repeatability’ (apologies to the Fosters Brewing company).”
    Dr. Prem Devanbu, UC Davis; General chair, MSR’14
    The PROMISE Project
  • 58. [Figure: related books from 2014, 2015 and 2016.] Perspectives on Data Science for Software Engineering (Tim Menzies, Laurie Williams, Thomas Zimmermann). The PROMISE Project: our summary. And other related books: the MSR community and others.