Practical automation for beginners
How to start workplace automation projects with basic Python
Seoweon Yoo (seoweon.yoo@reddal.com)
Slides: bit.ly/practical_automation
The programs that run the world are not all made by coding gurus – they are made by doers and go-getters (even if those people are beginners!)
About me
Who I am:
• Liberal arts major (English Literature)
• Employee of a professional services firm
• Beginner in programming and Python
• Developer of 3+ automation tools in practical use
Who I am not:
• Computer science major
• Developer by trade
• Expert in Python
• Anywhere near being able to say “I know enough Python to speak at PyCon!”
Agenda
Why we all need to give automation a try
How to start automation projects
How to succeed in doing it
There is often a gap between what we want our jobs to be and what they actually are
What we really do
Manual data work that does not add value to personal development
Image source: http://perceptionvsfact.com/
Automation opportunities can easily be found in our day-to-day work, regardless of the
specific occupation
Potential automation opportunities
Input Process Output
Frequency: repetitive / routine tasks
Ideal
• From structured database
• Software data
• Website
• Dropdown input forms
• Fairly stable Excel files
• Rules-based and objective
• Has formal documentation
• Perceived as a “chore” that is non-critical but has to be done
• Low penalty for making mistakes
Difficult (but not impossible)
• From unstructured or unstable database
• Long-form free text input fields
• Meeting notes
• Requires subjective judgement
• Human interaction is an important part of the process
• Complex logic that can be interpreted in many ways
• Mission-critical output that should be sensitive in conveying nuances
• Output that needs a high level of flexibility
ILLUSTRATIVE
• It’s cheaper (in money, resource utilization, and/or opportunity cost)
• … that is all you need!
• Programs in Python are quick and inexpensive to develop
Done right, automating tasks in your workplace can save operational costs (which is
always good)
The business case for workplace automation
Source: Transaction Cost Management, Chihiro Suematsu (https://g.co/kgs/hYgBWz)
[Chart: accumulated cost (y-axis) against number of transactions (x-axis) for “Manual process” and “With software”; the software line starts at the initial cost of deploying automation]
ILLUSTRATIVE
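The break-even point suggested by the chart can be checked with a few lines of arithmetic. The sketch below assumes a fixed cost per manual transaction, a one-time cost to build the script, and a smaller cost per automated transaction. The function names and all figures are hypothetical and are not taken from the talk.

```python
import math


def cumulative_cost(n, cost_per_transaction, initial_cost=0.0):
    """Accumulated cost after n transactions."""
    return initial_cost + n * cost_per_transaction


def break_even_transactions(initial_cost, manual_cost, automated_cost):
    """Smallest number of transactions at which automation costs no more
    than the manual process, or None if it never pays off."""
    saving = manual_cost - automated_cost  # saved on every transaction
    if saving <= 0:
        return None
    return math.ceil(initial_cost / saving)


# Hypothetical figures: 5 cost units per manual run, 12 units to build
# the script, 1 unit per automated run.
n = break_even_transactions(initial_cost=12, manual_cost=5, automated_cost=1)
manual_total = cumulative_cost(n, 5)
automated_total = cumulative_cost(n, 1, initial_cost=12)
```

With these assumed numbers the two lines meet at the third transaction; every run after that is a net saving.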
Agenda
Why we all need to give automation a try
How to start automation projects
How to succeed in doing it
Task: Library
• Data manipulation: pandas, numpy, pandasql (as a backup when SQL comes more easily)
• Web scraping: pandas, requests (no need for BeautifulSoup if the web data is already in table form)
• File management: os (mainly for making and navigating file directories), xlsxwriter (writing Excel files, connecting them to pandas DataFrames)
• Basic data analytics: pandas, numpy, networkx (simple optimization problems)
• Other basic libraries: datetime, time (for dealing with time), tqdm (for a nice progress bar in case the code takes a bit of time)
The libraries needed to accomplish automation can be at a basic level
Ultra-basic libraries used
• A smaller chunk of work is always better
– Break down the task you have in mind and choose the easiest parts first
– A weekend project is better than a week-long project
• Always start with low-hanging fruit
– Choose tasks that are simple and straightforward
– Even entry-level programming can be applied to solve practical problems
Once opportunities are identified, narrow down the automation project to a reasonable
scope
Scoping automation opportunities
A group of tasks
An entire task
Part of a task
ILLUSTRATIVE
With minimal effort, it is possible to develop simple code that can be useful in daily operations
Using Python for automating the workplace – three scopes
Part of a task: Data collection (sales operations, lead generation)
• Web-scraping program for screening requests for proposals (RFP): github.com/seoweon/narajangteo
• Simple script run on command prompt
• Currently used weekly for regular operations
An entire task: Weekly dashboard generation
• Status report on project vendor flight cost control, reaching practical cost savings through simple analytics
• Step-by-step Jupyter Notebook documentation and script to produce Excel document
A group of tasks: Entire project management process
• Custom project management web application for shipbuilding company
• Web application based on Django framework
Previously, it required an employee to manually look up predefined keywords and
reorganize data from the government procurement site every week
Request for proposal (RFP) collection process (manual)
Manual process:
• Look up relevant keywords in the search bar
• Browse through a large list of entries and make
judgements on whether or not the RFP is
applicable based on the title
• Click on each individual link and download files
Requires some human judgement,
but the process itself follows a
predefined set of rules and the
required data input is stable
• One-time input of all the keywords you want to search
• One-time input of all the keywords you want to exclude
• Run the program and get the shortened list
With a simple Python script, we were able to reduce the time spent and capture better opportunities
Automating data collection (1/2)
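category.txt
구매/구입/유지보수/용역 (purchase / buying / maintenance / services)
exclude.txt
건설/신축공사/소모품 구매/보강공사/개보수공사/연구 (construction / new construction work / consumables purchase / reinforcement work / renovation work / research)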
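The include/exclude step can be sketched as a simple title filter. This is a minimal illustration, assuming the keyword files are "/"-separated as shown on the slide and that a title is kept when it contains at least one search keyword and no excluded keyword. The function names and the sample titles are hypothetical and are not taken from the narajangteo repository.

```python
def parse_keywords(text):
    """Split the contents of a '/'-separated keyword file into a list."""
    parts = text.replace("\n", "/").split("/")
    return [p.strip() for p in parts if p.strip()]


def shortlist(titles, include, exclude):
    """Keep titles with at least one include keyword and no exclude keyword."""
    kept = []
    for title in titles:
        if not any(word in title for word in include):
            continue  # not relevant to any search keyword
        if any(word in title for word in exclude):
            continue  # explicitly screened out
        kept.append(title)
    return kept


include = parse_keywords("구매/구입/유지보수/용역")
exclude = parse_keywords("건설/신축공사")

# Hypothetical RFP titles for illustration only.
titles = [
    "서버 유지보수 용역",   # server maintenance services
    "청사 신축공사 용역",   # office building new construction services
    "직원 워크숍 안내",     # staff workshop notice
]
result = shortlist(titles, include, exclude)
```

Only the first title survives: the second contains an excluded keyword and the third matches no search keyword.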
Automating data collection (2/2)
download.txt
20170803758-00
20170802451-00
20170800229-01
• Write down the code numbers of the RFPs (found in the generated Excel file) that you want to learn more about
• Run the code for downloading the relevant files
• RFPs and relevant documents are saved in a separate folder for each code
• Libraries used:
• pandas
• requests
• os
• datetime
• time
• string
• tqdm (optional)
• Time spent:
• 2 days for first commit
• Incremental commits
(25 so far)
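The folder-per-RFP step can be sketched with the os library alone. The sketch assumes download.txt holds one code number per line, as on the slide. The function names are hypothetical, and the download itself is left out because the procurement site's endpoints are not described here.

```python
import os


def read_codes(path):
    """Return the non-empty lines of download.txt as RFP code numbers."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def prepare_folders(codes, base_dir):
    """Create one folder per RFP code and return the folder paths."""
    folders = []
    for code in codes:
        folder = os.path.join(base_dir, code)
        os.makedirs(folder, exist_ok=True)  # safe to re-run every week
        folders.append(folder)
    return folders
```

Calling `prepare_folders(read_codes("download.txt"), "rfp_files")` would leave one folder per code under a hypothetical `rfp_files` directory, ready for the downloaded documents.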
As the scope of automation grows, so do complexity and the possibility of failure
Using Python for automating the workplace – three scopes
Part of a task: Data collection
• Web-scraping program for screening requests for proposals (RFP): github.com/seoweon/narajangteo
• Simple script run on command prompt
• Used weekly for regular operations
An entire task: Weekly dashboard generation (flight cost management)
• Flight cost control status report
• Step-by-step Jupyter Notebook documentation and script
• Was used for 6 months
A group of tasks: Entire project management process
• Custom project management web application for shipbuilding company
• Web application based on Django framework
Data was combined, cleaned up, and enriched through Python
Data cleanup with Python
Internal flight data
Finance data
Project data
Passenger data
Generic airport data
Cleaned-up, combined, enriched data
• Deletes duplicates
• Detects discrepancies
• Adds more context to flight reason
○ Detects missed flights, round-trip,
nonstop flights
○ Detects connecting flights booked
separately
• Enriches data with additional information
○ Calculates flight distance (miles)
○ Calculates cost-per-mile
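The distance and cost-per-mile enrichment can be sketched with the standard haversine formula. The slide lists the haversine library; the version below writes the formula out with the math module so it has no dependencies. The function names and the Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def flight_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two airports given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def cost_per_mile(ticket_cost, distance_miles):
    """Ticket cost divided by distance, or None when distance is not positive."""
    if distance_miles <= 0:
        return None
    return ticket_cost / distance_miles
```

Airport coordinates would come from the publicly available airport dataset, joined to each flight's origin and destination.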
Client department internal data
Corporate travel department
Publicly available dataset
• Libraries used:
○ pandas
○ numpy
○ pandasql
○ haversine
○ networkx
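Duplicate removal and discrepancy detection can be sketched in pandas as follows. The column names (`booking_id`, `cost`) and the rule that a booking is a discrepancy when the travel and finance amounts differ, or when finance has no record of it, are assumptions for illustration and not the schema used in the project.

```python
import pandas as pd


def clean_flights(flights, finance):
    """Drop duplicate bookings and flag cost discrepancies against finance data."""
    deduped = flights.drop_duplicates(subset=["booking_id"], keep="first")
    merged = deduped.merge(
        finance,
        on="booking_id",
        how="left",  # keep every flight, even if finance has no record
        suffixes=("_travel", "_finance"),
    )
    # A missing finance amount is NaN, which also compares as "not equal".
    merged["discrepancy"] = merged["cost_travel"] != merged["cost_finance"]
    return merged


# Hypothetical sample data.
flights = pd.DataFrame(
    {"booking_id": ["A1", "A1", "B2", "C3"], "cost": [300.0, 300.0, 450.0, 120.0]}
)
finance = pd.DataFrame({"booking_id": ["A1", "B2"], "cost": [300.0, 480.0]})
result = clean_flights(flights, finance)
```

In this sample, A1 is deduplicated and matches, B2 is flagged because the amounts differ, and C3 is flagged because finance has no entry for it.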
As the project scope grows even further, the tool becomes more involved to develop and to implement
Using Python for automating the workplace – three scopes
Part of a task: Data collection
• Web-scraping program for screening requests for proposals (RFP): github.com/seoweon/narajangteo
• Simple script run on command prompt
• Used weekly for regular operations
An entire task: Weekly dashboard generation
• Status report on project vendor flight cost control, reaching practical cost savings through simple analytics
• Step-by-step Jupyter Notebook documentation and script to produce Excel document
A group of tasks: Entire project management process
• Custom project management application for shipbuilding company
• Web application based on Django framework
• Four-month development project with three full-time team members
• Two-month training effort for user onboarding
A shipbuilding company digitized their processes for their large-scale construction
project within four months
Project management tool development process
Previously: 20+ types of individual static documents
Components of the new tool (the library logos on the slide are not reproduced):
• Server
• Database
• Web framework
• Front-end framework
• Web interface (HTML/CSS/JavaScript)
Timeline:
• June 2016: Initiation of tool development
• June – September 2016: Weekly iteration of database design, functionalities, and UI
• September 2016: Launch of complete tool
• November 2016: Integration with client internal software suite
ILLUSTRATIVE
Agenda
Why we all need to give automation a try
How to start automation projects
How to succeed in doing it
• Keep it simple!
– Not only does it make writing the script easier, but it prevents bugs along the way
– Simple tools are also able to adapt to change more easily
– More complex tools can be built later on by combining multiple modular tools together (but it involves a team effort!)
• Know your audience
– Connect your script to more familiar tools (Excel input → Python script → Excel output)
– Avoid introducing technical elements to the end user (make it click-and-go and minimize the initial setup, like
downloading inessential packages)
• Document your code meticulously
– Especially important when a handover is expected
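The "Excel input → Python script → Excel output" pattern can be sketched as below. The file names, the column names (`project`, `cost`), and the summary step are hypothetical placeholders for whatever the script automates. Reading and writing .xlsx files with pandas also assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def summarize(df):
    """Total cost per project; stands in for the actual automated logic."""
    return df.groupby("project", as_index=False)["cost"].sum()


def run(input_path="input.xlsx", output_path="output.xlsx"):
    """Read the user's workbook, process it, and write a new workbook."""
    df = pd.read_excel(input_path)
    summarize(df).to_excel(output_path, index=False)
```

Keeping the logic in `summarize` and the file handling in `run` means the end user only ever touches two Excel files, while the processing step can be tested on its own.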
There are a few basic rules to keep in mind when getting started in automation
Lessons learned from automating
We’re hiring!
www.reddal.com/join-reddal/
Slides: bit.ly/practical_automation | e-mail: seoweon.yoo@reddal.com
Appendix
A shipbuilding company successfully digitized their processes for their large-scale
construction project within four months
Case summary
• Objective: Enable timely and transparent communication between multinational design and construction teams, and provide project visibility to executive level with minimal friction
• Approach: Used rapid prototyping and agile product development to build a cloud database and web application that replace Excel spreadsheets and accurately capture the business logic and processes
• Result: Implemented the web app for a 20-person team within four months, replacing 80% of static spreadsheets and documents
Key challenges and solution
• Many business processes were partially subjective and done differently per project, which made them difficult to translate into programming procedures. All processes had to be clearly defined and agreed upon throughout the organization, with a clear set of rules and guidelines, before being implemented in the database.
• Familiarity with pre-existing Excel-based tools and reports made it difficult to change team members’ habits and get them to use the new tool. Because the user interface was made as similar as possible to the existing documents, team members were able to use the tool intuitively and overcome the learning gap.
Project management tool development process
Previously: 20+ types of individual static documents
Components of the new tool (the library logos on the slide are not reproduced):
• Server
• Database
• Web framework
• Front-end framework
• Web interface (HTML/CSS/JavaScript)
Timeline:
• June 2016: Initiation of tool development
• June – September 2016: Weekly iteration of database design, functionalities, and UI
• September 2016: Launch of complete tool
• November 2016: Integration with client internal software suite
BACK-UP
