Me

5 years as a software developer

3 years working with data

Worked in 3 companies (1 startup)

~ 15 projects
The context

Your company has valuable data
− Maybe lots of it

You want to create predictive models/algorithms

You want to integrate them into your product
What would you want?

“Let's start doing something!”

Create a process:
− Your raw data → … → … → … → Happy end-user

Robust

Monitorable
It all starts with writing code

Creating a model/algorithm means writing code

Amazing data scientists come from academia (physics, chemistry, math, CS)

Code in academia is a means to an end

Developers hate academics' code
How to test it?

Testability must be considered in advance!

Testing different parts of code with unit tests?

Testing functionality?

Code output: a number. Or several.
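One way to make "code output: a number" testable is to unit-test the model code on data whose correct answer is known in advance. The sketch below is illustrative only: `fit_line` and `predict` are hypothetical stand-ins for real model code, not anything from the talk, and the tolerances are assumptions.

```python
import math
import unittest


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a single input."""
    slope, intercept = model
    return slope * x + intercept


class FitLineTest(unittest.TestCase):
    def test_recovers_known_line(self):
        # Data generated from y = 2x + 1, so the expected numbers are known.
        xs = [0.0, 1.0, 2.0, 3.0]
        ys = [1.0, 3.0, 5.0, 7.0]
        slope, intercept = fit_line(xs, ys)
        # Compare floats with a tolerance, never with ==.
        self.assertTrue(math.isclose(slope, 2.0, rel_tol=1e-9))
        self.assertTrue(math.isclose(intercept, 1.0, abs_tol=1e-9))

    def test_prediction_is_a_finite_number(self):
        # Sanity check on noisy input: the output must at least be a usable number.
        model = fit_line([0.0, 1.0, 2.0], [0.1, 0.9, 2.1])
        self.assertTrue(math.isfinite(predict(model, 10.0)))
```

Saved as a test module, this can be run with `python -m unittest`. Small pure functions like these are what makes the code testable in the first place.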
How to test it?

Small test dataset != real dataset != representative dataset

data1.csv, data2.csv, data21.csv, data3.csv, data3-old.csv
− Please keep better track of the datasets used for development :(
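A lightweight way to keep track of datasets is to record a content hash for every file used in development, so that `data3.csv` and `data3-old.csv` can be told apart by what they contain. This is a minimal sketch, not a replacement for a proper data-versioning tool: the manifest layout and the function names `file_sha256` and `record_dataset` are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_dataset(manifest_path, data_path, note):
    """Append one entry describing data_path to a JSON manifest (hypothetical layout)."""
    manifest_path = Path(manifest_path)
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entry = {
        "file": Path(data_path).name,
        "sha256": file_sha256(data_path),
        "bytes": Path(data_path).stat().st_size,
        "note": note,  # free-text reason this dataset exists
    }
    entries.append(entry)
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entry
```

Committing the manifest next to the model code ties each experiment to the exact data it used, even when file names are ambiguous.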
Let's deploy!

Danger: the too-full-stack data scientist

CI pipeline should contain:
− Back-testing
− Newest-data-testing

Humans are all too human (lazy)

Model quality will degrade over time
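The two CI checks above can be sketched as a single quality gate: score the model on historical data (back-testing) and on the newest data, then fail the build if either error is too high or if the newest data scores much worse than the history. The metric, the thresholds and the names `mean_absolute_error` and `ci_quality_gate` are assumptions for illustration; a real project would choose its own.

```python
def mean_absolute_error(model, rows):
    """rows: list of (features, actual) pairs; model: callable features -> prediction."""
    errors = [abs(model(features) - actual) for features, actual in rows]
    return sum(errors) / len(errors)


def ci_quality_gate(model, historical_rows, newest_rows, max_error, max_degradation):
    """Return (passed, report). Thresholds are project-specific assumptions."""
    backtest_error = mean_absolute_error(model, historical_rows)
    newest_error = mean_absolute_error(model, newest_rows)
    report = {
        "backtest_error": backtest_error,
        "newest_error": newest_error,
        # Positive degradation means the model does worse on recent data.
        "degradation": newest_error - backtest_error,
    }
    passed = (
        backtest_error <= max_error
        and newest_error <= max_error
        and report["degradation"] <= max_degradation
    )
    return passed, report
```

Running such a gate on a schedule, not only on commits, lets the pipeline notice degradation over time without relying on a human to remember to check.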
Tools

Some offerings automate parts of the work
− Data preparation, model creation, deployment

They put you inside a specific box

They are not free
Tools

Microsoft Azure, Amazon ML – visual model creation
− Trade-off: development speed vs. best fit for your data
− Also check their privacy policies
Operations

Computations, data transformations
− Violent spikes in RAM demand

Distributed storage (esp. HBase)
− Violent spikes in bandwidth demand
Speaking of Hadoop

Do you really need it?

Hadoop = fancy hammer, all data = nails?

It's “unstable”

It's not very visual
− What visual tooling exists is often poor
Speaking of Hadoop

Data lineage is tricky

Security (Kerberos) is tricky

Data consistency when modifying ETL tasks
− Apache Oozie
− Apache NiFi
If you choose Hadoop

Consider Hortonworks, Cloudera, MapR
− They offer editions that are free of charge

Doing it yourself – only if you are an expert or paid to tinker
If you choose Hadoop

Don't rely on the more classic monitoring tools

Hortonworks, Cloudera, MapR provide excellent monitoring
Hardware

Big corporate customers – fanciest hardware

I've operated an old PC cluster – totally OK

Amazon EMR + S3
DevOpsDaysRiga 2017 ignite: Mikhail Iljin - DevOps meets Data Science - how to prepare?
