This document discusses how Puppet can be used to set up and manage a minimum viable BI (business intelligence) infrastructure at Stylight. It provides tips for running Puppet in standalone mode on Windows machines, using scheduled tasks to regularly sync configurations and run scripts, and defining reusable classes and definitions to avoid duplicating configurations. It also covers how Puppet can help implement a lean approach to ranking models through a multi-stage evaluation process using Solr and A/B testing.
Puppet Camp London 2015 - Helping Data Teams with Puppet
1. STYLIGHT.COM
Helping Data Teams with Puppet
Sergii Khomenko, Data Scientist,
sergii.khomenko@stylight.com, @lc0d3r
2. Who? What? Why?
Setting up your BI with Puppet.
Small tips and tricks
Puppet your ranking
Agenda
3. Data scientist at one of the biggest fashion communities,
STYLIGHT. Data analysis and visualization hobbyist.
Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014
Founder and speaker at Munich Golang UG, Munich Tableau UG.
Speaker at Munich UseR Group, Munich Search UG, Munich
Quantified Self UG.
Sergii Khomenko
Milos Radovanovic
Passionate about DevOps stuff:
1. microservices
2. docker
3. 12 factor apps
4. continuous integration/deployment
6. Live in 12 countries
STYLIGHT – international community
7. STYLIGHT.COM
Setting up your BI with
puppet.
8. Tableau - reporting and ad-hocs
Python / Talend ETL tools
Minimum Viable BI
9. Running Puppet in a standalone mode
Minimum Viable BI
We already use Puppet for *nix servers and can't merge
that setup with the Windows machines
Standalone mode for Puppet
– easier to start and develop
– windows machines are separated from *nix ones
10. Running Puppet in a standalone mode
Minimum Viable BI
cd c:\folder\with\our-bi
git pull origin master
IF %ERRORLEVEL% NEQ 0 set context=GIT_FAILURE && goto error_handler
puppet apply --modulepath=puppet\modules puppet\win-node-name.net.pp
IF %ERRORLEVEL% NEQ 0 set context=PUPPET_FAILURE && goto error_handler
goto end
11. Running Puppet in a standalone mode
Minimum Viable BI
:error_handler
echo entering error_handler
EVENTCREATE /T ERROR /L APPLICATION /SO Puppet_Scheduler /ID 100 /D "EXECUTION FAILED REASON %context%"
goto end
:end
echo DONE
12. Minimum Viable BI
Standalone mode for Puppet
– configuration is totally separated
– custom modules --modulepath=puppet\modules
– GitHub hosted configuration
– Error handling via Windows event log
Running Puppet in a standalone mode
13. Minimum Viable BI
node 'win-node-name.net' {
  scheduled_task { 'refresh-1':
    ensure    => present,
    enabled   => true,
    command   => 'C:\path\to\your\script.bat',
    arguments => 'some args ',
Scheduling is important
14. Minimum Viable BI
    user      => 'your-user',
    password  => 'your-password',
    trigger   => {
      schedule   => daily,
      start_time => '06:00',
    }
  }
}
Scheduling is important
15. Minimum Viable BI
# Can't use Puppet's scheduled_task, as it does not
# support running the scheduled task every 5 minutes.
https://github.com/sdliangzhihua/windows-puppet-example/blob/master/manifest.pp#L68
Sync my configuration every 15 min
16. Minimum Viable BI
$cmd = 'C:\Windows\system32\cmd.exe'
$job_name = 'sync_code'
# Create the sync task via schtasks only when it does not exist yet
exec { 'CreateCodeSyncScheduledTask':
  command => "${cmd} /C schtasks /create /sc MINUTE /mo 15 /tn ${job_name} /tr C:\\your\\puppet.bat /ru administrator /f",
  onlyif  => ["${cmd} /C schtasks /query /tn ${job_name} & if errorlevel 1 (exit /b 0) else exit /b 1"],
}
Sync my configuration every 15 min
17. STYLIGHT.COM
Small tips and tricks
do not repeat yourself and other tricks
18. Minimum Viable BI
node 'win-node-name.net' {
  scheduled_task { 'refresh-1':
    ensure    => present,
    enabled   => true,
    command   => 'C:\path\to\your\script.bat',
    arguments => 'some args ',
Scheduling is important
19. Small tips and tricks
class job_scheduler (
  $ensure      = $job_scheduler::params::ensure,
  $enabled     = $job_scheduler::params::enabled,
  $user        = $job_scheduler::params::user,
  $password    = $job_scheduler::params::password,
  $working_dir = $job_scheduler::params::working_dir,
) inherits job_scheduler::params {
}
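The class above takes its defaults from job_scheduler::params, which the slides do not show. A minimal sketch of what such a params class could look like follows; every value here is a hypothetical placeholder, not the configuration used at STYLIGHT.

```puppet
# Hypothetical defaults for the job_scheduler class.
# All values below are illustrative assumptions.
class job_scheduler::params {
  $ensure      = 'present'            # scheduled tasks exist by default
  $enabled     = true                 # and are enabled
  $user        = 'your-user'          # account that runs the tasks
  $password    = 'your-password'      # placeholder, keep real secrets out of the repo
  $working_dir = 'C:\path\to\workdir' # assumed working directory for the jobs
}
```

Keeping the shared values in one place means a change to the user or working directory touches a single file instead of every scheduled task.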
20. Small tips and tricks
define job_scheduler::job
(
  $arguments     = 'tableau_adobe.py',
  $command       = 'c:\Py27-32\python.exe',
  $schedule_type = 'daily',
  $start_time    = '08:15',
  $day_of_week   = 'every',
)
{
21. Small tips and tricks
define job_scheduler::tableau_job
(
  $arguments     = 'default-tableau',
  $command       = 'c:\folder\tableau.bat',
  $schedule_type = 'daily',
  $start_time    = '21:00',
  $day_of_week   = 'every',
)
{
22. Small tips and tricks
# Params with default values for the tableau job
# that might be changed in a job definition
#
# 1. $arguments ='default-argument',
# 2. $command ='c:\folder\script.bat',
# 3. $schedule_type ='daily',
# 4. $start_time ='21:00',
# 5. $day_of_week ='every',
####################
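The slides show only the parameter lists of the job definitions, not their bodies. As a rough sketch of how such a define might wrap Puppet's built-in scheduled_task type using the shared settings from the job_scheduler class: the name job_scheduler::example_job and the whole body are assumptions for illustration, and $day_of_week is left out because its handling is not shown in the source.

```puppet
# Hypothetical define, not the original implementation.
# It would live in job_scheduler/manifests/example_job.pp.
define job_scheduler::example_job (
  $arguments     = 'default-argument',
  $command       = 'c:\folder\script.bat',
  $schedule_type = 'daily',
  $start_time    = '21:00',
) {
  # Make the shared settings (user, password, ...) available.
  include job_scheduler

  # One Windows scheduled task per declared job; the resource
  # title becomes the task name.
  scheduled_task { $title:
    ensure      => $job_scheduler::ensure,
    enabled     => $job_scheduler::enabled,
    command     => $command,
    arguments   => $arguments,
    user        => $job_scheduler::user,
    password    => $job_scheduler::password,
    working_dir => $job_scheduler::working_dir,
    trigger     => {
      schedule   => $schedule_type,
      start_time => $start_time,
    },
  }
}

# Usage: only the values that differ from the defaults are stated.
job_scheduler::example_job { 'nightly export':
  start_time => '23:30',
}
```

With this shape, each new job is a short declaration, and credentials or the working directory are changed once in the class rather than in every task.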
24. Small tips and tricks
job_scheduler::redshift_job {
  'RS tagged products': start_time => '00:40', params => '..\datasources\something.tds';
  'RS another job': start_time => '00:50', params => '..\datasources\else.tds'
}
25. STYLIGHT.COM
Puppet your ranking
Lean, flexible, powerful
26. A ranking is a relationship between a set of items such that,
for any two items, the first is either 'ranked higher than',
'ranked lower than' or 'ranked equal to' the second.
27. Ranking specifics:
• Seasonal influence
• Trends
• Cold start of new countries, shops
• Multiple dimensions of ranking model
28. Requirements:
• Decreasing time to implement new ranking
model
• Keeping working infrastructure alive
• A/B testing without changing entire
infrastructure
• Performance level - “still fast” and
“transparent”
Lean approach to Ranking
Multiple points of evaluation
33. Lean approach to Ranking
<% urls.each do |url| -%>
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
  set $orig $args;
  set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
  rewrite ^(.*)$ "$1?$orig" break;
}
<% end -%>
nginx/templates/conf/solr-rewrites.conf.erb
34. Stages to evaluate a model:
• R ranking model
• Independent Solr-node
1. For internal use-cases
2. Testing on some of the pages
3. A/B rollout for a percentage of users
• Production roll out
Lean approach to Ranking
Multiple points of evaluation
36. STYLIGHT.COM
Sergii Khomenko
Data Scientist
STYLIGHT GmbH
sergii.khomenko@stylight.com
@lc0d3r
Nymphenburger Straße 86
80636 Munich, Germany