Mind the Gap - Data Science Meets Software Engineering

Bernhard Haslhofer, Researcher at University of Vienna
Data Science meets Software Engineering
Vienna Semantic Web Meetup
2016-03-01
Bernhard Haslhofer
Who am I?
• Scientist at AIT’s Digital Insight Lab
• Specialization
• Network Analytics
• Machine Learning
• Text Mining
• PhD in Computer Science
Plan for tonight
• Build an example service
• Approach problem from
• software engineering perspective
• data science perspective
• Look at gap & propose solution
Example Service
Text Classification API
• sports
• politics
• art
• business
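As a rough illustration of what such a service might expose, here is a minimal sketch. The names `Category`, `Document`, and `KeywordClassifier`, as well as the keyword lists, are assumptions made for this example and do not come from the slides; the keyword lookup merely stands in for a real model.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout: none of these types appear in the slides.
enum Category { SPORTS, POLITICS, ART, BUSINESS }

final class Document {
    final String text;
    Document(String text) { this.text = text; }
}

class KeywordClassifier {
    // Toy keyword lists standing in for a trained model (assumed values).
    private final Map<Category, List<String>> keywords = new LinkedHashMap<>();

    KeywordClassifier() {
        keywords.put(Category.SPORTS, Arrays.asList("match", "goal", "team"));
        keywords.put(Category.POLITICS, Arrays.asList("election", "parliament", "minister"));
        keywords.put(Category.ART, Arrays.asList("gallery", "painting", "exhibition"));
        keywords.put(Category.BUSINESS, Arrays.asList("market", "shares", "profit"));
    }

    // Returns the category whose keyword list has the most matches in the text.
    // Ties and documents without any match fall back to the first category checked.
    Category classify(Document document) {
        String text = document.text.toLowerCase(Locale.ROOT);
        Category best = Category.SPORTS;
        int bestHits = -1;
        for (Map.Entry<Category, List<String>> entry : keywords.entrySet()) {
            int hits = 0;
            for (String word : entry.getValue()) {
                if (text.contains(word)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

A caller would use it as `new KeywordClassifier().classify(new Document("..."))`; the single `classify` method is the whole API surface in this sketch.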
Approaching the Problem: Software Engineering
Steps
• Identify use cases / features
• Choose framework
• Implement functionality
• Ensure quality: test functionality, scalability etc…
• Deploy service
Ensure quality
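import static org.junit.Assert.*;
import java.util.Arrays;
import org.junit.Test;

// Returns a category label for the document (labels modeled as strings here, an assumption).
public String classify(Document document) {
    …
}

// JUnit 4 test: fails if classification takes longer than 100 ms.
@Test(timeout = 100)
public void testClassify() {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    assertTrue(Arrays.asList("sports", "politics", …).contains(c));
}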
Result / Quality Expectation
• A service
• implementing defined use case(s)
• passing all tests (unit, integration, functional)
• fulfilling scalability needs
Approaching the Problem: Data Science
Steps
• Define problem / hypothesis
• Collect data
• Design approach / model
• Ensure quality: evaluate model, compare
• Prototype algorithm (in R, Matlab, Octave, etc.)
Ensure quality
• Split dataset into training / test / cross-validation datasets
• Train model using training dataset
• Evaluate using test (and cross-validation) dataset
• Report and investigate metrics
• precision, recall, F1, …
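The split-and-evaluate steps above could be sketched as follows. This is a minimal illustration, not the speaker's tooling: the class name `Evaluation`, the `trainFraction` and `seed` parameters, and the use of string labels are all assumptions. Precision is computed as TP / (TP + FP), recall as TP / (TP + FN), and F1 as their harmonic mean, each for a single label.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class; names and parameters are illustrative.
class Evaluation {
    // Shuffles with a fixed seed and returns [training, test].
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Precision for one label: TP / (TP + FP); 0.0 if the label was never predicted.
    static double precision(List<String> expected, List<String> predicted, String label) {
        int tp = 0, fp = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (predicted.get(i).equals(label)) {
                if (expected.get(i).equals(label)) tp++; else fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Recall for one label: TP / (TP + FN); 0.0 if the label never occurs in the ground truth.
    static double recall(List<String> expected, List<String> predicted, String label) {
        int tp = 0, fn = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).equals(label)) {
                if (predicted.get(i).equals(label)) tp++; else fn++;
            }
        }
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // F1: harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}
```

The model would be trained on the first part returned by `split` and the metrics reported on the second, so that the numbers reflect documents the model has not seen.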
What ???
                  | Software Engineering                                | Data Science
Overall Goal      | Build the service                                   | Build the service
Technical Goal    | Implement software features, deploy working service | Find the right model features, get the model right
Quality assurance | Unit, functional, integration tests                 | Evaluate model, report metrics, re-design model
What ???
• The overall (business) goal can be the same
• Different technical approach
• language issues (what is a “feature” !?)
• lack of understanding of each other's differences and necessities
• Different quality assurance
• notion of “testing” is different
• different “success factors” (passing test vs. metrics)
Possible solution
• Define Goal
• Collect Ground Truth
• Implement Model and Functions
• Test & Evaluate
• Analyze Errors
• Deploy Service
Metrics Driven Software Engineering
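One way the "Analyze Errors" step might look in code is a confusion count over the ground truth. This is only a sketch: the class name `ErrorAnalysis` and the string labels are assumptions, and the slides do not prescribe any particular error-analysis technique.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not taken from the slides.
class ErrorAnalysis {
    // Counts (expected label -> predicted label) pairs.
    // Entries where the two labels differ are the misclassifications to inspect.
    static Map<String, Map<String, Integer>> confusion(List<String> expected, List<String> predicted) {
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        for (int i = 0; i < expected.size(); i++) {
            matrix.computeIfAbsent(expected.get(i), k -> new TreeMap<>())
                  .merge(predicted.get(i), 1, Integer::sum);
        }
        return matrix;
    }
}
```

Reading the result row by row shows which categories the model confuses with each other, which can feed back into collecting more ground truth or redesigning the model.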
Tool support
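// Proposed: a metric threshold declared like a test attribute (not supported by JUnit's @Test).
@Test(precision >= 0.8)
@Test(timeout = 100)
public void testClassify() {
    Document d = new Document(…);
    String c = classifier.classify(d);
    assertNotNull(c);
    assertTrue(Arrays.asList("sports", "politics", …).contains(c));
}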
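Until a test framework offers such an attribute, the same idea can be approximated with an ordinary assertion on a computed metric. The sketch below assumes a classifier exposed as a `Function<String, String>` and ground truth given as `{text, label}` pairs; `MetricGate` and both method names are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper illustrating a metric threshold used as a pass/fail check.
class MetricGate {
    // Fraction of documents predicted as `label` whose ground-truth label is also `label`.
    static double precision(Function<String, String> classifier,
                            List<String[]> groundTruth, String label) {
        int tp = 0, fp = 0;
        for (String[] example : groundTruth) { // example[0] = text, example[1] = true label
            if (classifier.apply(example[0]).equals(label)) {
                if (example[1].equals(label)) tp++; else fp++;
            }
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Fails like a unit test would when the metric falls below the threshold.
    static void assertPrecisionAtLeast(double threshold, double actual) {
        if (actual < threshold) {
            throw new AssertionError("precision " + actual + " is below " + threshold);
        }
    }
}
```

Calling `assertPrecisionAtLeast(0.8, precision(...))` inside a regular test method makes the build fail when model quality drops, in the same way a broken feature would.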
Thank You!
bernhard.haslhofer@ait.ac.at