5. EDA
• Analyzing data sets to summarize and visualize their properties is our
main focus
• Understanding the characteristics of the data
• Finding meaningful patterns in the data
• Possible modeling strategies
• Debugging strategies
• Visualization of results
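As a small illustration of the "summarize" step, the sketch below computes basic descriptive statistics using only the Python standard library. The `ages` list and the `summarize` helper are hypothetical and do not come from the training material; a real analysis would more likely work on the tables loaded from MySQL.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return basic descriptive statistics for a non-empty list of numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


# Hypothetical sample data, used only for illustration.
ages = [23, 35, 35, 41, 52, 67]

summary = summarize(ages)
most_common = Counter(ages).most_common(1)  # most frequent value and its count

print(summary)
print(most_common)
```

Numbers like these are usually the first thing to look at before deciding on a plot or a modeling strategy.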
7. Data Ingestion and Processing
• Start the MySQL database
• RUN => docker run -d --name ntuhs --ip 172.17.0.2 -v /root/db:/var/lib/mysql -e "TZ=Asia/Taipei" -p 3306:3306 -e MYSQL_ROOT_PASSWORD=1234 -d mysql:5.7.31
• The MySQL data is the 2020-12-02 training material
• -e "TZ=Asia/Taipei" sets the MySQL time zone
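After the container starts, it can be useful to confirm that the published port 3306 accepts connections before moving on. The sketch below is a minimal check using only the standard library. The `port_is_open` helper is hypothetical, and a successful TCP connection only shows that something is listening, not that MySQL has finished initializing.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

On the Docker host, `port_is_open("127.0.0.1", 3306)` should return `True` once the container is up, assuming the `-p 3306:3306` mapping from the command above.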
8. Data Ingestion and Processing
• Start JupyterLab
• RUN => docker run --rm --add-host=mysql:172.17.0.2 -p 20000:8888 -e JUPYTER_ENABLE_LAB=yes -v /root/docker_20201209/work:/home/jovyan/work jupyter/datascience-notebook
• RUN => chmod -R 777 /root/docker_20201209/work
• Please copy docker_20201209 to the work folder
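The `--add-host=mysql:172.17.0.2` option makes the hostname `mysql` resolve to the database container from inside the notebook container. The sketch below shows how a notebook might assemble a connection URL from the settings used in these slides. The `mysql_url` helper, the `example_db` database name, and the `pymysql` driver are assumptions for illustration; the SQLAlchemy-style URL format is also an assumption about how the notebooks connect.

```python
from urllib.parse import quote_plus


def mysql_url(database, host="mysql", port=3306,
              user="root", password="1234", driver="pymysql"):
    """Build an SQLAlchemy-style MySQL URL from the slide settings."""
    # quote_plus escapes characters such as "@" that would break the URL.
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# "example_db" is a hypothetical database name.
print(mysql_url("example_db"))
```

The defaults mirror the root password and port from the MySQL `docker run` command, so only the database name has to be supplied.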
9. Data Ingestion and Processing
• Run Python scripts
• RUN => etl_1.ipynb
• RUN => etl_2.ipynb
• RUN => etl_3.ipynb
• RUN => basic_plot.ipynb
• RUN => matplotlib.ipynb
• RUN => seaborn.ipynb
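The notebooks are normally opened and run by hand in JupyterLab. As an alternative, the sketch below prints one `jupyter nbconvert` command per notebook that would execute it without the browser. It assumes the notebooks are in the current directory and should run in the listed order; the `nbconvert_command` helper is hypothetical.

```python
import shlex

# Notebook names taken from the slide, in the order listed.
NOTEBOOKS = [
    "etl_1.ipynb",
    "etl_2.ipynb",
    "etl_3.ipynb",
    "basic_plot.ipynb",
    "matplotlib.ipynb",
    "seaborn.ipynb",
]


def nbconvert_command(notebook):
    """Return the argument list that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


commands = [nbconvert_command(name) for name in NOTEBOOKS]

# Print shell-ready commands; nothing is executed here.
for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be pasted into a terminal inside the Jupyter container, or passed to `subprocess.run` as the argument list.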
10. Data Ingestion and Processing
• Check that the six tables are ready
• The tables were created in the 2020-12-02 training class
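One way to do this check from a notebook is to compare the rows returned by the MySQL statement `SHOW TABLES` with the list of expected table names. The sketch below shows only the comparison step. The `missing_tables` helper and the table names are hypothetical, because the slides do not list the six tables.

```python
def missing_tables(show_tables_rows, expected):
    """Return the expected table names that are absent, sorted by name.

    show_tables_rows is the result of fetching all rows from SHOW TABLES,
    where each row is a one-element tuple holding a table name.
    """
    found = {row[0] for row in show_tables_rows}
    return sorted(set(expected) - found)


# Hypothetical rows and names, used only for illustration.
rows = [("table_a",), ("table_b",)]
expected = ["table_a", "table_b", "table_c"]

print(missing_tables(rows, expected))  # ['table_c']
```

An empty list means every expected table is present.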
11. Visualization on Web
• Start the web page
• RUN => docker run --rm -it -u 0 -p 8080:8080 -v /root/docker_20201209/app/:/app orozcohsu/ntunhs_20201209:v1 /bin/bash
• RUN => cd /app; python app.py
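Once `app.py` is running, the page should be reachable through the published port 8080. The sketch below requests a URL and reports the HTTP status using only the standard library. The `page_status` helper is hypothetical, and it assumes the application listens on port 8080 inside the container, which the `-p 8080:8080` mapping suggests but the slides do not state.

```python
import urllib.error
import urllib.request


def page_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return None
```

On the Docker host, `page_status("http://127.0.0.1:8080/")` returning `200` would indicate that the web page is being served.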
14. Visualization on Tableau
• Tableau trial version
• https://downloads.tableau.com/tssoftware/TableauDesktop-64bit-2020-3-3.exe
15. Visualization on Tableau
• Download and install the MySQL driver
• https://www.dropbox.com/s/exops2ksus288kl/vcredist_x64.exe?dl=0
• https://www.dropbox.com/s/46milxgm04m5w7q/mysql-connector-odbc-5.3.13-winx64.msi?dl=0