Mind the Gap - Data Science Meets Software Engineering

Talk given at the Vienna Semantic Web Meetup in 2016

Published in: Technology

  1. Data Science meets Software Engineering
       Vienna Semantic Web Meetup, 2016-03-01
       Bernhard Haslhofer
  2. Who am I?
       • Scientist at AIT’s Digital Insight Lab
       • Specialization
           • Network Analytics
           • Machine Learning
           • Text Mining
       • PhD in Computer Science
  3. Plan for tonight
       • Build an example service
       • Approach problem from
           • software engineering perspective
           • data science perspective
       • Look at gap & propose solution
  4. Example Service: Text Classification API
       • Categories: sports, politics, art, business
  5. Approaching the Problem: Software Engineering
  6. Steps
       • Identify use cases / features
       • Choose framework
       • Implement functionality
       • Ensure quality: test functionality, scalability, etc.
       • Deploy service
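  7. Ensure quality
       // return type assumed to be String; the slide omits it
       public String classify(Document document) { … }

       // JUnit 4: the test fails if it runs longer than 100 ms
       @Test(timeout = 100)
       public void testClassify(…) {
           Document d = new Document(…);
           String c = classifier.classify(d);
           assertNotNull(c);
           // the label must be one of the known categories
           assertTrue(Arrays.asList("sports", "politics", …).contains(c));
       }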
  8. Result / Quality Expectation
       • A service
           • implementing defined use case(s)
           • passing all tests (unit, integration, functional)
           • fulfilling scalability needs
  9. Approaching the Problem: Data Science
  10. Steps
       • Define problem / hypothesis
       • Collect data
       • Design approach / model
       • Ensure quality: evaluate model, compare
       • Prototype algorithm (in R, Matlab, Octave, etc.)
  11. Ensure quality
       • Split dataset into training / test / cross-validation dataset
       • Train model using training dataset
       • Evaluate using test (and cross-validation) dataset
       • Report and investigate metrics
           • precision, recall, F1, …
  12. What ???
       • Overall goal
           • Software Engineering: build the service
           • Data Science: build the service
       • Technical goal
           • Software Engineering: implement software features, deploy working service
           • Data Science: find the right model features, get the model right
       • Quality assurance
           • Software Engineering: unit, functional, integration tests
           • Data Science: evaluate model, report metrics, re-design model
  13. What ???
       • The overall (business) goal can be the same
       • Different technical approach
           • language issues (what is a “feature”!?)
           • lack of understanding of the differences and necessities
       • Different quality assurance
           • notion of “testing” is different
           • different “success factors” (passing tests vs. metrics)
  14. Possible solution: Metrics Driven Software Engineering
       • Define Goal
       • Collect Ground Truth
       • Implement Model and Functions
       • Test & Evaluate
       • Analyze Errors
       • Deploy Service
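  15. Tool support
       // proposed: fail if precision drops below 0.8 (not an existing JUnit attribute)
       @Test(precision >= 0.8)
       @Test(timeout=100)
       public void testClassify(…) {
           Document d = new Document(…);
           String c = classifier.classify(d);
           assertNotNull(c);
           // the label must be one of the known categories
           assertTrue(Arrays.asList("sports", "politics", …).contains(c));
       }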
  16. Thank You!
       bernhard.haslhofer@ait.ac.at
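
The data science quality check on slide 11 (split the data, evaluate on the held-out part, report precision, recall and F1) could be sketched in plain Java roughly as follows. This is only an illustration: the class EvaluationSketch, its method names, the fixed seed and the sample labels are all assumptions and not part of the talk. It also covers only a training / test split and leaves out both the cross-validation set and the training step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; none of these names come from the talk.
class EvaluationSketch {

    // Shuffle with a fixed seed, then cut the data into [training, test] parts.
    static <T> List<List<T>> split(List<T> items, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Returns {true positives, false positives, false negatives} for one label.
    static int[] counts(List<String> gold, List<String> predicted, String label) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean isGold = gold.get(i).equals(label);
            boolean isPredicted = predicted.get(i).equals(label);
            if (isGold && isPredicted) tp++;
            else if (isPredicted) fp++;
            else if (isGold) fn++;
        }
        return new int[] {tp, fp, fn};
    }

    // Share of predictions for this label that were correct.
    static double precision(List<String> gold, List<String> predicted, String label) {
        int[] c = counts(gold, predicted, label);
        return c[0] + c[1] == 0 ? 0.0 : (double) c[0] / (c[0] + c[1]);
    }

    // Share of documents with this gold label that were found.
    static double recall(List<String> gold, List<String> predicted, String label) {
        int[] c = counts(gold, predicted, label);
        return c[0] + c[2] == 0 ? 0.0 : (double) c[0] / (c[0] + c[2]);
    }

    // Harmonic mean of precision and recall.
    static double f1(List<String> gold, List<String> predicted, String label) {
        double p = precision(gold, predicted, label);
        double r = recall(gold, predicted, label);
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Ten placeholder document ids, split 80/20.
        List<List<Integer>> parts =
            split(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 0.8, 42L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());

        // Invented gold labels and predictions for a held-out test set.
        List<String> gold = Arrays.asList("sports", "sports", "politics", "art", "sports");
        List<String> predicted = Arrays.asList("sports", "politics", "politics", "art", "sports");
        for (String label : Arrays.asList("sports", "politics", "art")) {
            System.out.printf("%s: precision=%.2f recall=%.2f f1=%.2f%n", label,
                precision(gold, predicted, label),
                recall(gold, predicted, label),
                f1(gold, predicted, label));
        }
    }
}
```

Reporting the metrics per label, rather than as one overall number, makes it easier to see which category the model confuses, which is the "investigate" part of the slide.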

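The tool support proposed on slide 15 does not exist in JUnit, whose @Test annotation has no precision attribute. One way the idea might be approximated is a custom annotation plus a small reflective runner, sketched below in plain Java without JUnit. Everything here is an assumption made for illustration: the @MinPrecision annotation, the class MetricsDrivenTestSketch, the toy keyword classifier and the five-document ground truth.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetricsDrivenTestSketch {

    // Hypothetical stand-in for the slide's @Test(precision >= 0.8).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MinPrecision {
        String label();
        double value();
    }

    // Toy keyword classifier, a placeholder for a real trained model.
    static String classify(String text) {
        String t = text.toLowerCase();
        if (t.contains("match") || t.contains("goal")) return "sports";
        if (t.contains("election") || t.contains("vote")) return "politics";
        return "business";
    }

    // Tiny ground truth (text -> expected label), invented for this sketch.
    static Map<String, String> groundTruth() {
        Map<String, String> truth = new LinkedHashMap<>();
        truth.put("Late goal decides the match", "sports");
        truth.put("Parliament schedules a vote", "politics");
        truth.put("Election results are in", "politics");
        truth.put("Shares match analyst forecasts", "business"); // "match" misleads the toy model
        truth.put("Striker scores winning goal", "sports");
        return truth;
    }

    // Precision of the classifier for one label over the ground truth.
    static double precision(String label) {
        int tp = 0, fp = 0;
        for (Map.Entry<String, String> e : groundTruth().entrySet()) {
            if (classify(e.getKey()).equals(label)) {
                if (e.getValue().equals(label)) tp++; else fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Functional check in the method body, metric threshold in the annotation.
    @MinPrecision(label = "politics", value = 0.8)
    static void politicsCheck() {
        if (classify("Election results are in") == null) throw new AssertionError("no label");
    }

    @MinPrecision(label = "sports", value = 0.8)
    static void sportsCheck() {
        if (classify("Late goal decides the match") == null) throw new AssertionError("no label");
    }

    // Runs every annotated method, then compares the measured precision to its threshold.
    static List<String> run() {
        List<String> failures = new ArrayList<>();
        for (Method m : MetricsDrivenTestSketch.class.getDeclaredMethods()) {
            MinPrecision required = m.getAnnotation(MinPrecision.class);
            if (required == null) continue;
            try {
                m.invoke(null); // static method, so no receiver object
            } catch (ReflectiveOperationException e) {
                failures.add(m.getName() + ": " + e);
                continue;
            }
            double p = precision(required.label());
            if (p < required.value()) {
                failures.add(m.getName() + ": precision " + p + " < " + required.value());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = run();
        failures.forEach(System.out::println);
        System.out.println(failures.isEmpty()
            ? "all checks passed"
            : failures.size() + " check(s) failed");
    }
}
```

In this sketch a check can fail in two ways: the method body throws (the software engineering notion of a failing test), or the measured precision falls below the declared threshold (the data science notion). That pairing is the point slide 15 makes.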