With this presentation you should be able to create a Kerberos-secured architecture for an interactive data analysis and machine learning framework, using Jupyter/JupyterHub powered by IPython clusters. This enables machine learning processing on local and/or remote nodes, all running as a service under a non-root user.
How to go the extra mile on monitoring (Tiago Simões)
This document provides instructions for monitoring additional metrics from clusters and applications using Grafana, Prometheus, JMX, and PushGateway. It includes steps to export JMX metrics from Kafka and NiFi, setup and use PushGateway to collect and expose custom metrics, and create Grafana dashboards to visualize the metrics.
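The PushGateway step described above boils down to rendering a metric in the Prometheus text exposition format and POSTing it to the gateway's job endpoint. A minimal sketch, assuming a gateway at localhost:9091 and a hypothetical metric name; this is an illustration of the flow, not code from the slides:

```python
from urllib import request

def exposition_payload(metric, value, help_text):
    """Render one gauge metric in the Prometheus text exposition format."""
    return (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )

def push(job, payload, gateway="http://localhost:9091"):
    """POST the payload to the PushGateway under the given job name."""
    req = request.Request(
        f"{gateway}/metrics/job/{job}",
        data=payload.encode(),
        method="POST",
    )
    return request.urlopen(req)

payload = exposition_payload("batch_rows_processed", 1234,
                             "Rows handled by the nightly batch job")
print(payload)
# push("nightly_batch", payload)  # uncomment once a PushGateway is reachable
```

Grafana can then chart the metric by querying Prometheus, which scrapes the PushGateway like any other target.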
How to create a multi tenancy for an interactive data analysis with jupyter h... (Tiago Simões)
This document provides instructions for setting up an interactive data analysis framework using a Cloudera Spark cluster with Kerberos authentication, a JupyterHub machine, and LDAP authentication. The key steps are:
1. Install Anaconda, Jupyter, and dependencies on the JupyterHub machine.
2. Configure JupyterHub to use LDAP for authentication via plugins like ldapcreateusers and sudospawner.
3. Set up a PySpark kernel that uses Kerberos authentication to allow users to run Spark jobs on the cluster via proxy impersonation.
4. Optional: Configure JupyterLab as the default interface and enable R, Hive, and Impala kernels.
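The PySpark kernel in step 3 is typically wired up through a kernel.json spec. The sketch below builds one; the Anaconda path, parcel path, and the idea of a per-user credential cache are assumptions for illustration, not values taken from the slides:

```python
import json

# Hypothetical paths and settings; adjust for your environment.
kernel_spec = {
    "display_name": "PySpark (Kerberos)",
    "language": "python",
    "argv": [
        "/opt/anaconda3/bin/python", "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
    ],
    "env": {
        "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark",
        "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell",
        # With proxy impersonation, the hub authenticates once and
        # submits jobs on behalf of the logged-in LDAP user.
        "KRB5CCNAME": "/tmp/krb5cc_{username}",
    },
}

print(json.dumps(kernel_spec, indent=2))
```

Dropping this JSON into a kernels directory (e.g. under Jupyter's kernel search path) makes the kernel selectable from the notebook UI.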
This document provides instructions for installing a single-node Hadoop cluster on Ubuntu. It outlines downloading and configuring Java, installing Hadoop, configuring SSH access to localhost, editing Hadoop configuration files, and formatting the HDFS filesystem via the namenode. Key steps include adding a dedicated Hadoop user, generating SSH keys, setting properties in core-site.xml, hdfs-site.xml and mapred-site.xml, and running 'hadoop namenode -format' to initialize the filesystem.
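The "setting properties in core-site.xml" step can be sketched as follows; the fs.default.name value shown is the common single-node default, not a value quoted from the document:

```python
import xml.etree.ElementTree as ET

def hadoop_conf(props):
    """Render a Hadoop <configuration> XML file from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Typical single-node value: the NameNode listens on localhost.
core_site = hadoop_conf({"fs.default.name": "hdfs://localhost:9000"})
print(core_site)
```

The same helper works for hdfs-site.xml (e.g. dfs.replication) and mapred-site.xml; after writing the files, 'hadoop namenode -format' initializes the filesystem as the document states.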
How to create a secured cloudera cluster (Tiago Simões)
This presentation is for everyone who is curious about Big Data and wants the know-how to start learning...
With it, you will be able to quickly create a Kerberos-secured Cloudera cluster.
To know more, Register for Online Hadoop Training at WizIQ.
Click here : http://www.wiziq.com/course/21308-hadoop-big-data-training
A complete guide to Hadoop installation that will help you whenever you face problems while installing Hadoop!
How to configure a hive high availability connection with zeppelin (Tiago Simões)
With this presentation, you should be able to configure not just a Hive interpreter on Zeppelin, but one with a high-availability, load-balancing and concurrency architecture.
A JDBC connection with Kerberos authentication will be created that communicates with the ZooKeeper quorum on the cluster.
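A ZooKeeper-backed HiveServer2 setup like this typically relies on a JDBC URL using service discovery; the sketch below assembles one. Hostnames and the Kerberos principal are placeholders:

```python
def hive_ha_jdbc_url(zk_hosts, namespace="hiveserver2", principal=None):
    """Build a HiveServer2 JDBC URL that discovers servers via ZooKeeper,
    giving high availability and load balancing across HiveServer2 instances."""
    quorum = ",".join(zk_hosts)
    url = (f"jdbc:hive2://{quorum}/;serviceDiscoveryMode=zooKeeper;"
           f"zooKeeperNamespace={namespace}")
    if principal:  # Kerberos-secured clusters also pass the service principal
        url += f";principal={principal}"
    return url

url = hive_ha_jdbc_url(
    ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
    principal="hive/_HOST@EXAMPLE.COM",
)
print(url)
```

This URL is what the Zeppelin JDBC interpreter would be pointed at: instead of a fixed HiveServer2 host, the driver asks ZooKeeper which instances are alive.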
Raphaël Pinson's talk on "Configuration surgery with Augeas" at PuppetCamp Geneva '12. Video at http://youtu.be/H0MJaIv4bgk
Learn more: www.puppetlabs.com
The document discusses Hadoop and HDFS. It provides an overview of HDFS architecture and how it is designed to be highly fault tolerant and provide high throughput access to large datasets. It also discusses setting up single node and multi-node Hadoop clusters on Ubuntu Linux, including configuration, formatting, starting and stopping the clusters, and running MapReduce jobs.
1. The document describes how to set up a Hadoop cluster on Amazon EC2, including creating a VPC, launching EC2 instances for a master node and slave nodes, and configuring the instances to install and run Hadoop services.
2. Key steps include creating a VPC, security group and EC2 instances for the master and slaves, installing Java and Hadoop on the master, cloning the master image for the slaves, and configuring files to set the master and slave nodes and start Hadoop services.
3. The setup is tested by verifying Hadoop processes are running on all nodes and accessing the HDFS WebUI.
The document discusses configuring and running a Hadoop cluster on Amazon EC2 instances using the Cloudera distribution. It provides steps for launching EC2 instances, editing configuration files, starting Hadoop services, and verifying the HDFS and MapReduce functionality. It also demonstrates how to start and stop an HBase cluster on the same EC2 nodes.
This document provides instructions for setting up Hadoop in single node mode on Ubuntu. It describes adding a Hadoop user, installing Java and SSH, downloading and extracting Hadoop, configuring environment variables and Hadoop configuration files, and formatting the NameNode.
Install and Configure Ubuntu for Hadoop Installation for beginners (Shilpa Hemaraj)
Covers each and every step to configure Ubuntu, using VMware Workstation 10.
Note: I am a beginner, so I may have used some technical terms incorrectly, but the setup works perfectly fine.
This document summarizes an OSCON 2010 presentation by Joshua Timberman and Aaron Peterson of Opscode about Chef, an open-source automation platform for configuring and managing servers. The presentation covers Chef 101, getting started with Chef, and cooking with Chef. It discusses key concepts like Chef clients, the Chef server, nodes, roles, recipes, resources, attributes, and data bags. The goal is to provide an introduction to Chef and how it can be used to automate infrastructure.
This document provides an introduction to using Ansible in a top-down approach. It discusses using Ansible to provision infrastructure including load balancers, application servers, and databases. It covers using ad-hoc commands and playbooks to configure systems. Playbooks can target groups of hosts, apply roles to automate common tasks, and allow variables to customize configurations. Selective execution allows running only certain parts of a playbook. Overall the document demonstrates how Ansible can be used to deploy and manage infrastructure and applications in a centralized, automated way.
This document proposes using RPM packages to deploy Java applications to Red Hat Linux systems in a more automated and standardized way. Currently, deployment is a manual multi-step process that is slow, error-prone, and requires detailed application knowledge. The proposal suggests using Maven and Jenkins to build Java applications into RPM packages. These packages can then be installed, upgraded, and rolled back easily using common Linux tools like YUM. This approach simplifies deployment, improves speed, enables easy auditing of versions, and allows for faster rollbacks compared to the current process.
Ansible is an IT automation tool that can provision and configure servers. It works by defining playbooks that contain tasks to be run on target servers. Playbooks use YAML format and modules to automate configuration changes. Vagrant and Ansible can be integrated so that Ansible playbooks are run as part of the Vagrant provisioning process to automate server setup. The document provides an introduction and examples of using Ansible playbooks with Vagrant virtual machines to install and configure the Apache HTTP server.
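The Apache playbook described above can be sketched as data; since JSON is a subset of YAML, the stdlib dump below is itself a loadable playbook. The apt and service modules are real Ansible modules, while the host group name is an assumption:

```python
import json

# Minimal playbook installing and starting Apache on an Ubuntu-style host.
# "webservers" is a hypothetical inventory group.
playbook = [{
    "hosts": "webservers",
    "become": True,  # tasks need root, as Vagrant provisioning usually does
    "tasks": [
        {"name": "Install Apache",
         "apt": {"name": "apache2", "state": "present", "update_cache": True}},
        {"name": "Start and enable Apache",
         "service": {"name": "apache2", "state": "started", "enabled": True}},
    ],
}]

# JSON output is valid YAML, so this can be fed straight to ansible-playbook.
print(json.dumps(playbook, indent=2))
```

In a Vagrantfile, pointing the ansible provisioner at a file with this content is what makes the playbook run automatically on `vagrant up`.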
Vagrant, Ansible, and OpenStack on your laptop (Lorin Hochstein)
The document discusses using Ansible and Vagrant together to easily test and deploy OpenStack. Ansible allows writing idempotent infrastructure scripts, while Vagrant allows testing them by booting reproducible virtual machines. The document provides an example of using Ansible plays to install NTP and using Vagrant to define VMs for an OpenStack controller and compute node.
The document provides configuration details for setting up a Capistrano deployment with multistage environments and recipes for common tasks like installing gems, configuring databases, and integrating with Thinking Sphinx. It includes base configuration definitions, recipes for setting up Thinking Sphinx indexes and configuration files, and instructions for packaging the Capistrano configurations as a gem.
More info at http://blog.carlossanchez.eu/tag/devops
Video en español: http://youtu.be/E_OE4l3t5BA
The DevOps movement aims to improve communication between developers and operations teams to solve critical issues such as fear of change and risky deployments. But just as Agile development would likely fail without continuous integration tools, the DevOps principles need tools to make them real and to provide the automation required to actually implement them. Most of the so-called DevOps tools focus on the operations side, but there should be more than that: the automation must cover the full process, Dev to QA to Ops, and be as automated and agile as possible. Tools in each part of the workflow have evolved in their own silos, with the support of their own target teams. But a true DevOps mentality requires a seamless process from the start of development to production deployment and maintenance, and for a process to be successful there must be tools that take the burden off humans.
Apache Maven has arguably been the most successful tool for development, project standardization and automation introduced in the last years. On the operations side we have open source tools like Puppet or Chef that are becoming increasingly popular to automate infrastructure maintenance and server provisioning.
In this presentation we will introduce an end-to-end development-to-production process that will take advantage of Maven and Puppet, each of them at their strong points, and open source tools to automate the handover between them, automating continuous build and deployment, continuous delivery, from source code to any number of application servers managed with Puppet, running either in physical hardware or the cloud, handling new continuous integration builds and releases automatically through several stages and environments such as development, QA, and production.
How to Develop Puppet Modules: From Source to the Forge With Zero Clicks (Carlos Sanchez)
Puppet modules are a great way to reuse code, share your development with other people and take advantage of the hundreds of modules already available in the community. But how can you create, test and publish them as easily as possible? Now that infrastructure is defined as code, we need development best practices to build, test, deploy and use Puppet modules themselves. Three steps for a fully automated process:
* Continuous Integration of Puppet Modules
* Automatic release and upload to the Puppet Forge
* Deploy to Puppet master
Hadoop 2.2.0 Multi-node cluster Installation on Ubuntu (康志強 大人)
This document provides instructions for installing Hadoop 2.2.0 on a 3 node cluster of Ubuntu virtual machines. It describes setting up hostnames and SSH access between nodes, installing Java and Hadoop, and configuring Hadoop for a multi-node setup with one node as the name node and secondary name node, and the other two nodes as data nodes and node managers. Finally it explains starting up the HDFS and YARN services and verifying the cluster setup.
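Part of that multi-node wiring is telling the master which hosts run the DataNodes and NodeManagers; in Hadoop 2.x that is the etc/hadoop/slaves file, one worker hostname per line. A minimal sketch with assumed hostnames:

```python
def render_slaves(datanodes):
    """Hadoop 2.x etc/hadoop/slaves file: one worker hostname per line.
    start-dfs.sh / start-yarn.sh read it to SSH into each worker."""
    return "\n".join(datanodes) + "\n"

# Hypothetical hostnames matching a 1-master / 2-worker layout.
slaves = render_slaves(["hadoop-data1", "hadoop-data2"])
print(slaves, end="")
```

Each hostname must also resolve from every node (via /etc/hosts or DNS), which is why the document sets up hostnames and SSH access before anything else.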
Our company previously used Zabbix for monitoring.
As we moved into container monitoring, a change was needed, which naturally led us to consider a monitoring approach based on Prometheus.
이영주 gave a tech session on this topic, and these are the slides from it.
It is organized into five parts, including the setup procedure.
01. Prometheus?
02. Usage
03. Alertmanager
04. Cluster
05. Performance
Preparation study for Docker Event
Mulodo Open Study Group (MOSG) @Ho chi minh, Vietnam
http://www.meetup.com/Open-Study-Group-Saigon/events/229781420/
This document discusses Docker and provides an introduction and overview. It introduces Docker concepts like Dockerfiles, commands, linking containers, volumes, port mapping and registries. It also discusses tools that can be used with Docker like Fig, Baseimage, Boot2Docker and Flynn. The document provides examples of Dockerfiles, commands and how to build, run, link and manage containers.
The document discusses installing and configuring various Linux applications including Apache, PHP, MySQL, and Postgres. It covers basic Ubuntu installation, system configuration, installing packages, configuring Apache, PHP, and MySQL. Specific instructions are provided for installing Apache, configuring virtual hosts and SSL, installing PHP, and installing and configuring MySQL and phpMyAdmin.
Puppet is a configuration management tool that allows systems to be provisioned in a consistent, automated way. It uses manifests and resources to describe a system's configuration. Resources include packages, services, files and users. Modules contain reusable sets of resources. Templates allow variables to be used when generating configuration files. Puppet can be used with Vagrant for development and provisioning, and in production via a Puppet master to distribute configuration to clients.
This document provides an overview of Puppet concepts including modules, classes, resources, nodes, catalogs, and roles. It explains that Puppet is configuration management software that uses declarative language and resources to define and enforce the desired state of systems. Puppet Masters compile catalogs that Puppet Agents use to configure and maintain nodes according to assigned classes and dependencies between resources. Modules help organize and reuse configuration code.
Virtualization and automation of library software/machines + PuppetOmar Reygaert
The document discusses virtualization, automation, and Puppet. It begins with an introduction to virtualization and hands-on labs. It then covers automation through kickstart files and preseeding to automate operating system installation. Hands-on labs are also provided for automation. Finally, it discusses Puppet for configuration management, including node definitions, modules, and resources to manipulate files, packages, users and more. Hands-on labs are presented for implementing SFX configuration with Puppet.
The document provides instructions for setting up Kubernetes on two VMs (master and worker nodes) using VirtualBox. It describes the minimum requirements for the VMs and outlines the steps to configure networking and install Kubernetes, container runtime (containerd), and CNI (Flannel). The steps covered include setting up NAT and host-only networking in VirtualBox, configuring the hosts file, installing Kubernetes packages (kubeadm, kubelet, kubectl), initializing the master node with kubeadm, joining the worker node to the cluster, and deploying a sample pod.
Puppet is a configuration automation platform that simplifies system administration tasks. It uses a client/server model where agent nodes pull configuration profiles from the Puppet master. Modules on the master describe the desired system configuration. Puppet translates modules into code and configures agent servers as needed. Puppet can manage infrastructure across multiple servers.
The document discusses how immutable infrastructure can be achieved through Puppet by treating systems configuration as code. Puppet allows defining systems in code and enforcing that state through automatic idempotent runs, compensating for inherent system mutability. This brings predictability to infrastructure and allows higher level operations by establishing a foundation of reliable, known states.
Build Your Own CaaS (Container as a Service) (HungWei Chiu)
In these slides, I introduce Kubernetes and show an example of what CaaS is and what it provides.
I also cover how to set up continuous integration and continuous deployment for the CaaS platform.
The document provides instructions for installing and configuring OpenERP on Ubuntu, including downloading and installing Ubuntu, installing required packages like PostgreSQL, configuring the PostgreSQL database, downloading and installing the OpenERP server and client, configuring the OpenERP files, and starting the OpenERP server and client.
Krux operates a large infrastructure serving thousands of user requests per second. They use Puppet and tools like Cloudkick, Foreman, Boto, and Vagrant to manage their infrastructure in an automated and scalable way. Their Puppet configuration is split into modules, environments, and datacenters. They launch AWS nodes programmatically and configure them with Puppet. Cloudkick is used for monitoring and parallel SSH. Boto allows full Python API access to AWS. Vagrant allows consistently provisioning development machines locally. Automation and external configuration enable their small operations team to manage a large, dynamic infrastructure.
Quick-Start Guide: Deploying Your Cloudian HyperStore Hybrid Storage Service (Cloudian)
This document will help a new user deploy a 3-node Cloudian storage cluster in their data center for use with the Cloudian HyperStore Hybrid Cloud Service from AWS Marketplace.
Sally and Leo use infrastructure as code practices like Cucumber, ServerSpec, Vagrant, and Ansible to automate the provisioning and configuration of a web server. They write behavior tests in Cucumber and infrastructure tests in ServerSpec. Vagrant is used to provision a virtual machine, and Ansible configures the server. By tying the tests to the provisioning code, they can now repeatedly build servers that are known to meet requirements.
Beyond Golden Containers: Complementing Docker with Puppet (lutter)
Often, Docker or more generally containers and immutable infrastructure are viewed as a replacement for configuration management. This talk explains why that is not the case, and that they are in fact complementary.
Containers move the challenges that configuration management solves to different places in the application lifecycle. The talk explains where Puppet fits into this changed lifecycle, and what tools Puppet provides there.
Slides for a talk I gave at the Linux Foundation Collaboration Summit 2015
The document provides instructions for setting up a Kubernetes cluster with one master node and one worker node on VirtualBox. It outlines the system requirements for the nodes, describes how to configure the networking and hostnames, install Docker and Kubernetes, initialize the master node with kubeadm init, join the worker node with kubeadm join, and deploy a test pod. It also includes commands to check the cluster status and remove existing Docker installations.
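The kubeadm init/join sequence described above can be sketched as assembled commands; the pod CIDR below is Flannel's default, while the master address, token and CA hash are placeholders you obtain from your own environment and the init output:

```python
# Hypothetical lab values: adjust the CIDR and master address as needed.
POD_CIDR = "10.244.0.0/16"   # Flannel's default pod network
MASTER = "192.168.56.10"     # host-only network address of the master VM

def kubeadm_init_cmd(pod_cidr, master_ip):
    """kubeadm init as run on the master node."""
    return (f"kubeadm init --pod-network-cidr={pod_cidr} "
            f"--apiserver-advertise-address={master_ip}")

def kubeadm_join_cmd(master_ip, token, ca_hash):
    """kubeadm join as run on each worker; token and hash are printed
    by kubeadm init (or by 'kubeadm token create --print-join-command')."""
    return (f"kubeadm join {master_ip}:6443 --token {token} "
            f"--discovery-token-ca-cert-hash {ca_hash}")

print(kubeadm_init_cmd(POD_CIDR, MASTER))
```

Matching the --pod-network-cidr to the CNI plugin's expected range is the step that most often trips up a first cluster; Flannel's default manifest assumes 10.244.0.0/16.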
Puppi is a Puppet module that exposes Puppet's knowledge of your systems through a command-line tool, which you can use to check service availability, gather information on the system and deploy applications with a single command.
1. The document provides instructions for installing ODOO v8.0 on an Ubuntu 14.04 LTS system, including creating a system user, installing PostgreSQL and dependencies, cloning the ODOO code from GitHub, configuring the database and ODOO settings, and setting up a boot script to start ODOO on startup.
2. Steps include creating a PostgreSQL user, editing the PostgreSQL configuration files to allow remote connections, installing dependencies like Python modules, cloning the ODOO code, editing the ODOO configuration file, and creating an init script to start ODOO as a service.
3. The instructions conclude by noting that automatic startup and shutdown can be enabled, and that an installation
AMS Node Meetup December presentation: Phusion Passenger (icemobile)
Phusion Passenger is an app server for Node.js, Ruby and Python. It simplifies deployment and administration, increases your server's efficiency and helps identify and solve problems.
In this talk Hongli Lai demonstrates how Passenger simplifies things by integrating with Nginx and by replacing Forever, PM2, Cluster and all sorts of other tools. Hongli also shares what other benefits Passenger has to offer, and what you can expect from future developments.
Cloud init and cloud provisioning [openstack summit vancouver] (Joshua Harlow)
Evil Superuser's HOWTO: Launching instances to do your bidding.
You click 'run' on the OpenStack dashboard, or launch a new instance via the api. Some provisioning magic happens and soon you've got a server created especially for you. Did you ever wonder what magic happens to a standard image on boot? Have you wanted to launch instances and have them into your infrastructure with no manual interaction? Cloud-init is software that runs in most linux instances. It can take your input and do your bidding. Learn what things cloud-init magically does for you and how you can make it do more. Also, take advantage of the after-talk to pester cloud-init developers on what is missing or throw rotten fruits in their direction.
Similar to How to create a secured multi tenancy for clustered ML with JupyterHub (20)
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Azure API Management to expose backend services securely
How to create a secured multi tenancy for clustered ML with JupyterHub
1. How-to create a secured multi
tenancy for Clustered ML with
JupyterHub
Non-root + JupyterHub + Kerberos +
IPython Cluster as a service
2. Introduction
With this presentation you should be able to create a Kerberos-secured architecture for an
interactive data analysis and machine learning framework, using Jupyter/JupyterHub
powered by IPython Clusters, which enables machine-learning processing on clustered
local and/or remote nodes.
3. Architecture
This architecture enables the following:
● Transparent data-science development
● User authentication
● Authentication via Kerberos + SSH
● Upgrades to the cluster won’t affect existing development work
● Controlled access to data and resources via Kerberos tickets
● Several coding APIs (Scala, R, Python, PySpark, etc.)
● Parallel processing
● JupyterHub as a service, run by a non-root user
5. Pre-Assumptions
1. Jupyter machine hostname: cm1.localdomain
2. Controller node hostname: cm1.localdomain | Engine node hostname: cm2.localdomain
3. Conda Python version: 3.8.5
4. Jupyter machine authentication pre-installed: Kerberos
a. Kerberos realm: DOMAIN.COM
5. JupyterHub machine authentication not installed: Kerberos
6. A user with root or sudo permissions
7. MIT Kerberos installed on your Windows machine
6. Miniconda
Add Anaconda User/Dir
adduser anaconda;
passwd anaconda;
mkdir /opt/anaconda;
Download and installation
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp;
chmod +x /tmp/Miniconda3-latest-Linux-x86_64.sh;
/tmp/Miniconda3-latest-Linux-x86_64.sh -b -u -p /opt/anaconda;
Note 1: Replace the highlighted values with your own.
Note 2: JupyterHub requires Python 3.x, so Anaconda 3 (Miniconda3) will be installed.
Add Permissions MiniConda
chown -R anaconda:anaconda /opt/anaconda;
chmod -R go-w /opt/anaconda && chmod -R go+rX /opt/anaconda;
mkdir -p /apps/anaconda/pkgs;
chown -R anaconda:anaconda /apps/anaconda/pkgs && chmod -R oug+rwx /apps;
7. Anaconda
Set Conda Bash Configurations
nano .bashrc;
export CONDA_PKGS_DIRS="/apps/anaconda/pkgs","/opt/anaconda/pkgs","/home/$USER/.conda/pkgs"
export CONDA_ENVS_DIRS="/apps/anaconda/$USER/envs"
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
. "/opt/anaconda/etc/profile.d/conda.sh"
else
export PATH="/opt/anaconda/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
conda config --set auto_update_conda False && conda config --add channels conda-forge;
conda config --set pip_interop_enabled True;
Note: Replace the highlighted values with your own.
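Because `CONDA_ENVS_DIRS` contains `$USER`, every user who sources this `.bashrc` gets a private environments directory under the shared `/apps/anaconda` tree. A quick sketch of how the variable expands per user (plain Python; the username `tpsimoes` is just an example):

```python
import os

# Each login shell expands $USER, so conda environments are isolated
# per user under the shared /apps/anaconda tree.
os.environ["USER"] = "tpsimoes"  # hypothetical example user
envs_dir = os.path.expandvars("/apps/anaconda/$USER/envs")
print(envs_dir)  # /apps/anaconda/tpsimoes/envs
```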
8. Jupyter or JupyterHub?
JupyterHub is a multi-user notebook server that:
● Manages authentication.
● Spawns single-user notebook servers on-demand.
● Gives each user a complete notebook server.
How to choose?
9. JupyterHub
JupyterHub needs to run with root privileges, or at least some of them (for example, to access the
PAM passwords). Therefore we will configure a special user (with no password) to be used by the
sudospawner!
For this example we will use user: jupyter | group: jupyterhub to run the JupyterHub server as a service. Any new
user that should access Jupyter and spawn notebooks must be added to the jupyterhub group.
Create User/Group to operate as Service
sudo useradd jupyter && sudo groupadd jupyterhub && sudo usermod jupyter -G jupyterhub;
Add jupyter to root group & Give Read Permissions (PAM)
sudo usermod -a -G root jupyter; sudo chmod g+r /etc/shadow;
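`chmod g+r /etc/shadow` grants only the group-read bit, which is what lets PAM checks run by the jupyter user (now in the root group) verify passwords. A sketch of what that single bit change means, using a temp file as a stand-in for `/etc/shadow` (the real command needs root):

```python
import os
import stat
import tempfile

# Temp file stands in for /etc/shadow (normally not group-readable).
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)  # owner read/write only
before = bool(os.stat(path).st_mode & stat.S_IRGRP)

# Equivalent of `chmod g+r`: add only the group-read bit, nothing else.
os.chmod(path, os.stat(path).st_mode | stat.S_IRGRP)
after = bool(os.stat(path).st_mode & stat.S_IRGRP)
os.remove(path)
print(before, after)  # False True
```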
Log as Jupyter user
su - jupyter;
Note 1: only the highlighted values need to be changed.
10. JupyterHub
Set Conda Bash Configurations
Use the configuration from page 7.
Create Environment for JupyterHub
conda create -n jupyterhub_env;
Activate Environment for JupyterHub
conda activate jupyterhub_env;
Install JupyterHub Packages
conda install jupyterhub jupyterlab notebook configurable-http-proxy;
Install sudospawner Package
conda install -c conda-forge sudospawner;
Check sudospawner location
which sudospawner;
Note 1: only the highlighted values need to be changed.
Create JupyterHub Directories
sudo mkdir /etc/jupyterhub;
sudo chown jupyter:jupyterhub /etc/jupyterhub;
Generate JupyterHub Config file
cd /etc/jupyterhub && jupyterhub --generate-config;
11. JupyterHub
Create/Edit sudoers config
sudo nano /etc/sudoers.d/jupytersudoers;
Runas_Alias JUPYTER_USERS = jupyter
Cmnd_Alias JUPYTER_CMD = /apps/anaconda/jupyter/envs/jupyterhub_env/bin/sudospawner
%jupyterhub ALL=(jupyter) /usr/bin/sudo
jupyter ALL=(%jupyterhub) NOPASSWD:JUPYTER_CMD
Start JupyterHub Server With Config File
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py;
Note: only the highlighted values need to be changed, e.g. your IP.
Create/Edit JupyterHub config
nano /etc/jupyterhub/jupyterhub_config.py;
import os
import pwd
import subprocess
def create_dir_hook(spawner):
if not os.path.exists(os.path.join('/home/', spawner.user.name)):
subprocess.call(["sudo", "/sbin/mkhomedir_helper",
spawner.user.name])
c.Spawner.pre_spawn_hook = create_dir_hook
c.JupyterHub.bind_url = 'http://10.111.22.333:8000'
c.JupyterHub.hub_bind_url = 'http://10.111.22.333:8081'
c.JupyterHub.hub_ip = '10.111.22.333'
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
c.SudoSpawner.sudospawner_path = '/apps/anaconda/jupyter/envs/jupyterhub_env/bin/sudospawner'
c.Authenticator.admin_users = {'jupyter'}
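The `create_dir_hook` above only shells out to `mkhomedir_helper` when the user's home directory is missing. A self-contained sketch of that conditional logic, with a stub spawner object and `os.makedirs` standing in for the sudo call so it runs anywhere:

```python
import os
import tempfile
from types import SimpleNamespace

HOME_ROOT = tempfile.mkdtemp()  # stand-in for /home

def create_dir_hook(spawner):
    """Create the user's home dir on first spawn (sketch of the hook above)."""
    home = os.path.join(HOME_ROOT, spawner.user.name)
    if not os.path.exists(home):
        # The real config shells out to: sudo /sbin/mkhomedir_helper <user>
        os.makedirs(home)
    return home

# Stub mimicking the spawner JupyterHub passes to the hook.
spawner = SimpleNamespace(user=SimpleNamespace(name="tpsimoes"))
home = create_dir_hook(spawner)
print(os.path.isdir(home))  # True
```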
12. JupyterHub
Create systemd JupyterHub Directory
sudo mkdir -p /home/jupyter/.config/systemd;
Create systemd JupyterHub service Configuration
sudo nano /home/jupyter/.config/systemd/jupyterhub.service;
[Unit]
Description=Jupyterhub Server
After=syslog.target network-online.target
[Service]
Type=simple
User=jupyter
ExecStart=/etc/jupyterhub/runJupyterhub.sh
WorkingDirectory=/etc/jupyterhub
Restart=on-failure
RestartSec=1min
TimeoutSec=5min
[Install]
WantedBy=multi-user.target
Note: only the highlighted values need to be changed.
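systemd units use an INI-like format, so the key points of the unit above are easy to check: the service runs as the unprivileged jupyter user and restarts on failure. A quick sketch using `configparser` as a stand-in for systemd's parser:

```python
import configparser

# The relevant parts of the unit above, as a string.
unit = """\
[Unit]
Description=Jupyterhub Server
After=syslog.target network-online.target

[Service]
Type=simple
User=jupyter
ExecStart=/etc/jupyterhub/runJupyterhub.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""
cp = configparser.ConfigParser()
cp.read_string(unit)
user = cp["Service"]["User"]        # the non-root service account
restart = cp["Service"]["Restart"]  # auto-restart policy
print(user, restart)  # jupyter on-failure
```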
Create JupyterHub Script for Systemd
nano /etc/jupyterhub/runJupyterhub.sh;
#!/bin/bash
export CONDA_PKGS_DIRS="/apps/anaconda/pkgs","/opt/anaconda/pkgs","/home/$USER/.conda/pkgs"
export CONDA_ENVS_DIRS="/apps/anaconda/$USER/envs"
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
. "/opt/anaconda/etc/profile.d/conda.sh"
else
export PATH="/opt/anaconda/bin:$PATH"
fi
fi
unset __conda_setup
conda activate /apps/anaconda/jupyter/envs/jupyterhub_env
/apps/anaconda/jupyter/envs/jupyterhub_env/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py 2>&1 | tee /var/log/jupyter/jupyterhub.log
Note: make the script executable (chmod +x /etc/jupyterhub/runJupyterhub.sh;) and create the log directory first (sudo mkdir -p /var/log/jupyter && sudo chown jupyter:jupyterhub /var/log/jupyter;).
13. JupyterHub
Create systemd JupyterHub service symbolic link
sudo ln -s /home/jupyter/.config/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service;
Enable/Start systemd JupyterHub service
sudo systemctl enable jupyterhub.service;
sudo systemctl start jupyterhub && systemctl status jupyterhub;
Note: only the highlighted values need to be changed.
14. IPython Clusters
This functionality enables the current architecture to distribute your Python processing across
local and/or remote CPUs, harnessing the power of parallel processing.
Install ipyparallel
conda install ipyparallel;
Note: This package must be installed on the controller machine and on all remote engine nodes!
Apply to All Users
jupyter nbextension install --sys-prefix --py ipyparallel;
jupyter nbextension enable --sys-prefix --py ipyparallel;
jupyter serverextension enable --sys-prefix --py ipyparallel;
15. IPython Clusters
Create ssh profile on user
ipython profile create --parallel --profile=ssh;
Note: this is done in the scope of the user that will run/spawn the notebook, e.g. tpsimoes.
Configure ssh profile on user
nano /home/tpsimoes/.ipython/profile_ssh/ipcluster_config.py;
c.IPClusterStart.controller_launcher_class = 'Local'
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = { 'cm1.localdomain' : 2, 'cm2.localdomain' : 5 }
nano /home/tpsimoes/.ipython/profile_ssh/ipcontroller_config.py;
c.IPControllerApp.location = 'cm1.localdomain'
c.HubFactory.client_ip = '10.111.22.333'
c.HubFactory.engine_ip = '10.111.22.333'
c.HubFactory.ip = '*'
Note: only the highlighted values need to be changed.
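The `engines` dict maps hostname to engine count, so the profile above launches seven engines in total: two on cm1 (local) and five on cm2 (remote). A tiny sketch of that accounting in plain Python:

```python
# Mirror of c.SSHEngineSetLauncher.engines from the profile above.
engines = {"cm1.localdomain": 2, "cm2.localdomain": 5}

total = sum(engines.values())  # engines the cluster will start
remote = sum(n for host, n in engines.items() if host != "cm1.localdomain")
print(total, remote)  # 7 5
```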
16. IPython Clusters
So that the IPython Cluster controller (SSH profile) can communicate with all the engines (local and remote), we need to
configure SSH on the local machine and on the remote nodes.
KeyLess Configuration
ssh-keygen;
Copy the SSH public key (id_rsa.pub) to the user account on your target hosts.
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 tpsimoes@cm2.localdomain;
Add the SSH public key to the local authorized_keys file as well (the controller also connects to the local engines over SSH).
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys;
Accept the host keys by connecting once to each host
ssh tpsimoes@localhost;
ssh tpsimoes@cm1.localdomain;
ssh tpsimoes@cm2.localdomain;
Test the passwordless connection via SSH
ssh -p '22' 'tpsimoes@cm2.localdomain';
Note: only the highlighted values need to be changed.
17. IPython Clusters
When starting a cluster via the JupyterHub UI, you should see in the logs the communication between the machines…
JupyterHub Logs
[I 2021-02-22 14:28:43.979 SingleUserNotebookApp launcher:591] ensuring remote cm1.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm1.localdomain closed.
[I 2021-02-22 14:28:44.776 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-client.json to
cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-client.json
[I 2021-02-22 14:28:45.573 SingleUserNotebookApp launcher:591] ensuring remote cm1.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm1.localdomain closed.
[I 2021-02-22 14:28:46.405 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-engine.json to
cm1.localdomain:.ipython/profile_ssh/security/ipcontroller-engine.json
[I 2021-02-22 14:28:47.308 SingleUserNotebookApp launcher:591] ensuring remote cm2.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm2.localdomain closed.
[I 2021-02-22 14:28:48.087 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-client.json to
cm2.localdomain:.ipython/profile_ssh/security/ipcontroller-client.json
[I 2021-02-22 14:28:48.875 SingleUserNotebookApp launcher:591] ensuring remote cm2.localdomain:.ipython/profile_ssh/security/ exists
Connection to cm2.localdomain closed.
[I 2021-02-22 14:28:49.652 SingleUserNotebookApp launcher:595] sending /home/tpsimoes/.ipython/profile_ssh/security/ipcontroller-engine.json to
cm2.localdomain:.ipython/profile_ssh/security/ipcontroller-engine.json
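Once the engines are up, a notebook cell typically connects a client to the `ssh` profile and maps work across the engines. The real ipyparallel calls are `ipp.Client(profile="ssh")` and `view.map_sync(...)`; the sketch below mimics that pattern with the standard library's thread pool so it runs without a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

# With ipyparallel, a notebook cell would look roughly like:
#   import ipyparallel as ipp
#   rc = ipp.Client(profile="ssh")   # connects via ipcontroller-client.json
#   view = rc[:]                     # a view over all engines
#   squares = view.map_sync(lambda x: x ** 2, range(8))
# Stand-in: a local pool of 7 workers plays the role of the 7 engines.
with ThreadPoolExecutor(max_workers=7) as pool:
    squares = list(pool.map(lambda x: x ** 2, range(8)))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```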