Mind the Gap - Data Science Meets Software Engineering
1. Data Science meets Software Engineering
Vienna Semantic Web Meetup
2016-03-01
Bernhard Haslhofer
2. Who am I?
• Scientist at AIT’s Digital Insight Lab
• Specialization
  • Network Analytics
  • Machine Learning
  • Text Mining
• PhD in Computer Science
4. Plan for tonight
• Build an example service
• Approach the problem from
  • a software engineering perspective
  • a data science perspective
• Look at the gap & propose a solution
7. Steps (software engineering perspective)
• Identify use cases / features
• Choose framework
• Implement functionality
• Ensure quality: test functionality, scalability, etc.
• Deploy service
8. Ensure quality
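// Return type assumed to be String here; the slide leaves it open.
public String classify(Document document) {
    …
}
@Test(timeout = 100)  // JUnit 4: fails the test if it runs longer than 100 ms
public void testClassify() {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    assertTrue(List.of("sports", "politics", …).contains(c));
}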
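The slide's test is pseudocode. A self-contained sketch of the same three checks (non-null result, known category, time budget) in plain Java, without JUnit, could look like this. `Document`, `KeywordClassifier` with its keyword lists, and `ClassifierCheck` are hypothetical stand-ins; the talk does not specify the classifier.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical document type: just wraps the raw text.
final class Document {
    private final String text;
    Document(String text) { this.text = text; }
    String text() { return text; }
}

// Hypothetical toy classifier standing in for the real model: it returns the
// category with the most matching keywords (the first category wins ties).
final class KeywordClassifier {
    static final List<String> CATEGORIES = List.of("sports", "politics");
    static final Map<String, List<String>> KEYWORDS = Map.of(
            "sports", List.of("match", "goal", "team", "season"),
            "politics", List.of("election", "parliament", "minister", "vote"));

    String classify(Document document) {
        String text = document.text().toLowerCase(Locale.ROOT);
        String best = null;
        int bestHits = -1;
        for (String category : CATEGORIES) {
            int hits = 0;
            for (String keyword : KEYWORDS.get(category)) {
                if (text.contains(keyword)) hits++;
            }
            if (hits > bestHits) {
                best = category;
                bestHits = hits;
            }
        }
        return best;
    }
}

// Mirrors the three checks of the slide's test method. JUnit 4 fails a test that
// exceeds its timeout; this sketch simply measures the call and compares afterwards.
final class ClassifierCheck {
    static final Set<String> ALLOWED = Set.of("sports", "politics");

    static boolean testClassify(KeywordClassifier classifier) {
        Document d = new Document("The team scored a late goal to win the match.");
        long start = System.nanoTime();
        String c = classifier.classify(d);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return c != null && ALLOWED.contains(c) && elapsedMs <= 100;
    }
}
```

The outcome is a single pass/fail verdict, which is the software engineering notion of quality this slide illustrates.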
9. Result / Quality Expectation
• A service
  • implementing the defined use case(s)
  • passing all tests (unit, integration, functional)
  • meeting scalability requirements
11. Steps (data science perspective)
• Define problem / hypothesis
• Collect data
• Design approach / model
• Ensure quality: evaluate the model, compare alternatives
• Prototype algorithm (in R, MATLAB, Octave, etc.)
12. Ensure quality
• Split the dataset into training / test / cross-validation datasets
• Train model using training dataset
• Evaluate using test (and cross-validation) dataset
• Report and investigate metrics
  • precision, recall, F1, …
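The evaluation steps above can be sketched in Java, the language of the talk's other snippets. The sketch shuffles labeled examples with a fixed seed, splits them into training, cross-validation, and test sets, and computes precision, recall, and F1 for one class from true and predicted labels. The 60/20/20 split ratio, the seed parameter, reporting undefined ratios as 0.0, and the names `Evaluation`, `split`, and `precisionRecallF1` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class Evaluation {
    // Shuffles labeled examples with a fixed seed (reproducible) and splits them into
    // training / cross-validation / test sets. The 60/20/20 ratio is an assumption.
    static <T> List<List<T>> split(List<T> data, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int nTrain = shuffled.size() * 60 / 100;
        int nValidation = shuffled.size() * 20 / 100;
        List<T> train = new ArrayList<>(shuffled.subList(0, nTrain));
        List<T> validation = new ArrayList<>(shuffled.subList(nTrain, nTrain + nValidation));
        List<T> test = new ArrayList<>(shuffled.subList(nTrain + nValidation, shuffled.size()));
        return List.of(train, validation, test);
    }

    // Precision, recall and F1 for one target class, from parallel lists of true and
    // predicted labels. Undefined ratios (division by zero) are reported as 0.0.
    static double[] precisionRecallF1(List<String> truth, List<String> predicted, String positive) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < truth.size(); i++) {
            boolean actual = positive.equals(truth.get(i));
            boolean guessed = positive.equals(predicted.get(i));
            if (guessed && actual) tp++;   // correctly predicted as positive
            else if (guessed) fp++;        // predicted positive, actually not
            else if (actual) fn++;         // missed a true positive
        }
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall == 0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

A model would be trained on the first list, tuned on the second, and reported on the third, with `precisionRecallF1` applied to the test labels and the model's predictions for them.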
13. What ???
                    Software Engineering                 Data Science
Overall goal        Build the service                    Build the service
Technical goal      Implement software features,         Find the right model features,
                    deploy a working service             get the model right
Quality assurance   Unit, functional, integration tests  Evaluate model, report metrics,
                                                         redesign model
14. What ???
• The overall (business) goal can be the same
• Different technical approach
  • language issues (what is a “feature”?! a software function vs. a model input variable)
  • lack of understanding of each other’s differences and needs
• Different quality assurance
  • the notion of “testing” is different
  • different “success factors” (passing tests vs. metrics)
16. Tool support
@Test(precision >= 0.8)
@Test(timeout=100)
public void testClassify() {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    assertTrue(List.of("sports", "politics", …).contains(c));
}
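JUnit's `@Test` has no `precision` attribute, so the annotation on this slide describes tool support that does not exist yet. As a hedged approximation, an ordinary test can evaluate the classifier on a labeled hold-out set and fail when precision for a class drops below the slide's threshold of 0.8. The `LabeledDocument` type, the `PrecisionGate` methods, and modelling the classifier as a `Function<String, String>` are hypothetical choices for this sketch.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical labeled example: document text plus its true category.
final class LabeledDocument {
    final String text;
    final String label;

    LabeledDocument(String text, String label) {
        this.text = text;
        this.label = label;
    }
}

final class PrecisionGate {
    // Precision for class `positive` on a labeled hold-out set:
    // correct `positive` predictions divided by all `positive` predictions.
    static double precision(Function<String, String> classifier,
                            List<LabeledDocument> holdOut, String positive) {
        int predictedPositive = 0;
        int truePositive = 0;
        for (LabeledDocument doc : holdOut) {
            if (positive.equals(classifier.apply(doc.text))) {
                predictedPositive++;
                if (positive.equals(doc.label)) truePositive++;
            }
        }
        // No positive predictions at all: treat precision as 0.0 (an assumption).
        return predictedPositive == 0 ? 0.0 : (double) truePositive / predictedPositive;
    }

    // Stand-in for the proposed @Test(precision >= 0.8): throws AssertionError,
    // i.e. fails the enclosing test, when precision falls below the threshold.
    static void assertPrecisionAtLeast(Function<String, String> classifier,
                                       List<LabeledDocument> holdOut,
                                       String positive, double threshold) {
        double p = precision(classifier, holdOut, positive);
        if (p < threshold) {
            throw new AssertionError("precision " + p + " for '" + positive
                    + "' is below " + threshold);
        }
    }
}
```

Called from inside a regular test method, `assertPrecisionAtLeast` fails the build the same way `assertNotNull` does, which puts model metrics and functional checks under one pass/fail regime.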