How To Delete Unused Python Code Safely
Aug, 2018 by PCMan (洪任諭)
<pcman.tw@gmail.com>
COSCUP 2018
Who Am I?
● Free Software
○ Creator of LXDE/LXQt desktop
○ PCMan BBS client
○ Linux user since 2004
● Career
○ Senior System Engineer, Appier Inc.
○ Physician, Taipei Veterans General Hospital (Rheumatology)
● Education
○ Master of Computer Science, National Taiwan University
○ Medicine, National Yang-Ming University
Problems of Legacy Code
● Large and complicated code base
● It works! (but I don't know why)
● Unused variables
● Unused functions
● Unused imports
● Unreachable code path
● No documentation
● Broken unit tests
Static Code Analysis
Some Helpful Tools
● Pyflakes + autoflake
○ Delete unused variables
○ Delete unused imports
○ Expand import *
○ Cannot delete unused functions
● Vulture
○ (Claims to) find unused functions
○ Provides confidence levels
○ Unfortunately, does not work in some cases
Example use of autoflake
> autoflake \
    --in-place \
    --remove-unused-variables \
    --remove-all-unused-imports \
    --expand-star-imports \
    --recursive \
    <source dir>
# Warning: --in-place directly edits the *.py files.
Example use of vulture - Find Unused Functions
> vulture myscript.py --min-confidence 100  # only report 100% dead code
● Did not work for some code bases I tested: it always reported 60% confidence for every function :-(
Example Output of Vulture 0.25
$ vulture --sort-by-size .
app.py:1: unused import 'sys' (90% confidence, 1 line)
app.py:2: unused import 'json' (90% confidence, 1 line)
app.py:5: unused import 'demo2' (90% confidence, 1 line)
app.py:10: unused variable 'i' (60% confidence, 1 line)
app.py:14: unused function 'unused_func0' (60% confidence, 2 lines)
demo.py:3: unused function 'unused_func1' (60% confidence, 2 lines)
demo2.py:1: unused function 'unused_func2' (60% confidence, 2 lines)
demo2.py:4: unused function 'unused_func3' (60% confidence, 2 lines)
demo2.py:7: unused function 'unused_func4' (60% confidence, 2 lines)
Test example code: https://github.com/PCMan/python-find-unused-func/tree/master/example
Code Coverage
Coverage.py
● Measure code coverage of Python programs
● Beautiful reports
● Often used to calculate unit test coverage
● Example:
> coverage run your_program.py
> coverage report # text-based summary report
> coverage html # generate colorful detailed reports
https://coverage.readthedocs.io/en/coverage-4.5.1a/
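Coverage.py works at statement level. As a rough, standard-library-only sketch of the same runtime idea at function granularity, one can record which functions actually execute while the program runs. This swaps in `sys.setprofile` and is not coverage.py or part of the original talk; `called_functions`, `used`, `unused` and `main` are hypothetical names.

```python
import sys


def called_functions(entry_point):
    """Run entry_point() and return the names of Python functions that executed.

    Hypothetical helper: a function-level stand-in for a coverage run.
    """
    seen = set()

    def profiler(frame, event, arg):
        # "call" fires whenever a Python-level function is entered.
        if event == "call":
            seen.add(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always uninstall the profiler
    return seen


def used():
    return 1


def unused():
    return 2


def main():
    return used()


if __name__ == "__main__":
    names = called_functions(main)
    print("used" in names, "unused" in names)  # True False
```

The same caveats as the slide below apply: only code paths that actually run are seen, and `sys.setprofile` affects the current thread only, so concurrent code may be missed.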
Example Output of Coverage.py
Test example code: https://github.com/PCMan/python-find-unused-func/tree/master/example
Example Output of Coverage.py
Test example code: https://github.com/PCMan/python-find-unused-func/tree/master/example
Code Coverage Tests
Pros:
● Very detailed reports (statement level)
● Can observe actual behavior at runtime
Cons:
● Requires code that actually runs (not usable on broken legacy code)
● Requires high-quality test cases with good coverage
● May not work reliably with concurrency (such as gevent)
Alternatives?
DIY with Python AST
Example Python AST
<_ast.Module object at 0x7f3aeaecd630>
  <_ast.ImportFrom object at 0x7f3aeaecd550>
    alias(name='used_func4', asname=None)
  <_ast.FunctionDef object at 0x7f3aeaecd668>
    arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])
    <_ast.Expr object at 0x7f3aeaecd748>
      <_ast.Call object at 0x7f3aeaecd898>
        <_ast.Name object at 0x7f3aeaecd860>
          Load()
        Str(s='unused 1')

# module demo.py
from demo2 import used_func4

def unused_func1():
    print("unused 1")
Test example code: https://github.com/PCMan/python-find-unused-func/tree/master/example
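A tree like the one above can be printed with a few lines of standard-library code. The sketch below is an assumption about how such a dump could be produced, not the script used for this slide; `tree_lines` and `SOURCE` are hypothetical names. On Python 3.8+ the string literal shows up as `Constant` rather than `Str`, and `ast.dump(tree, indent=2)` (Python 3.9+) prints the full dump including field values.

```python
import ast

# The demo.py module shown above, as a string.
SOURCE = '''\
from demo2 import used_func4

def unused_func1():
    print("unused 1")
'''


def tree_lines(node, depth=0):
    """Return one line per AST node (its class name), indented by depth, depth-first."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(tree_lines(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(tree_lines(ast.parse(SOURCE, "demo.py"))))
```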
Some Useful Python AST Node Types
● ast.Module: the root node
● ast.FunctionDef: function definition
● ast.Attribute: access attribute of an object
● ast.Name: symbol name
● ast.Call: function or method call
● ast.ImportFrom: from xxx import yyy
● ast.NodeVisitor: base class for tree traversal (a helper, not a node type)
Reference: http://greentreesnakes.readthedocs.io/en/latest/tofrom.html
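As a small illustration of `ast.NodeVisitor` with the node types listed above, the hypothetical `CallCollector` below records defined function names and the names used at call sites. It is a sketch, not part of the talk's tool; note that it handles only plain `def` (not `async def`).

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Hypothetical visitor: records defined function names and called names."""

    def __init__(self):
        self.defined = []
        self.called = []

    def visit_FunctionDef(self, node):
        self.defined.append(node.name)
        self.generic_visit(node)  # keep walking into the function body

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):  # plain call: f(...)
            self.called.append(node.func.id)
        elif isinstance(node.func, ast.Attribute):  # method call: obj.f(...)
            self.called.append(node.func.attr)
        self.generic_visit(node)  # arguments may contain further calls


if __name__ == "__main__":
    source = "def a():\n    b()\n\ndef b():\n    obj.c()\n"
    collector = CallCollector()
    collector.visit(ast.parse(source))
    print(collector.defined, collector.called)  # ['a', 'b'] ['b', 'c']
```

Counting only call sites would miss functions passed around as values (e.g. `map(f, items)`), which is presumably why the next slide counts every `ast.Name` and `ast.Attribute` instead.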
Find Functions and Their (Potential) Callers
import ast
from collections import defaultdict

def find_symbols(py_file):
    used_symbols = defaultdict(int)  # symbol name -> number of references
    defined_funcs = []
    with open(py_file) as f:
        src = f.read()
    tree = ast.parse(src, py_file)  # parse the python code
    for node in ast.walk(tree):  # iterate through all nodes (unordered)
        if isinstance(node, ast.FunctionDef):  # function definition
            defined_funcs.append(node.name)
        elif isinstance(node, ast.Attribute):  # reference obj.attribute
            used_symbols[node.attr] += 1
        elif isinstance(node, ast.Name):  # name of an identifier
            used_symbols[node.id] += 1
    return defined_funcs, used_symbols
Find Unused Functions
1. Find all defined functions in the whole source tree
2. Find all references to object attributes or symbol names in
the whole source tree
3. If a function is defined but its name is not referenced in any
file ⇒ unused
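The three steps can be sketched as a hypothetical `find_unused()` that takes source text keyed by filename (so it runs without touching the disk) and applies the same name-matching rule as `find_symbols()`. This is an illustration under those assumptions, not the author's `find_unused.py`; `collect_symbols` and `find_unused` are made-up names.

```python
import ast
from collections import defaultdict


def collect_symbols(src, filename):
    """Return (defined function names, referenced-name counts) for one file."""
    defined, used = [], defaultdict(int)
    for node in ast.walk(ast.parse(src, filename)):
        if isinstance(node, ast.FunctionDef):
            defined.append(node.name)
        elif isinstance(node, ast.Attribute):
            used[node.attr] += 1
        elif isinstance(node, ast.Name):
            used[node.id] += 1
    return defined, used


def find_unused(sources):
    """sources: {filename: source text}. Returns {filename: [unused function names]}."""
    defined_per_file = {}
    used_anywhere = defaultdict(int)
    # Steps 1 and 2: collect definitions and references across the whole tree.
    for filename, src in sources.items():
        defined, used = collect_symbols(src, filename)
        defined_per_file[filename] = defined
        for name, count in used.items():
            used_anywhere[name] += count
    # Step 3: defined, but the name is never referenced in any file.
    result = {}
    for filename, defined in defined_per_file.items():
        unused = [name for name in defined if used_anywhere.get(name, 0) == 0]
        if unused:
            result[filename] = unused
    return result


if __name__ == "__main__":
    sources = {
        "a.py": "import b\n\ndef unused_a():\n    pass\n\nb.helper()\n",
        "b.py": "def helper():\n    return 1\n\ndef dead():\n    return 2\n",
    }
    print(find_unused(sources))  # {'a.py': ['unused_a'], 'b.py': ['dead']}
```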
Example Output - List Unused Functions
> python3 ../find_unused.py ./*.py
./app.py
unused_func0
./demo.py
unused_func1
./demo2.py
unused_func2
unused_func3
unused_func4
Test example code: https://github.com/PCMan/python-find-unused-func/tree/master/example
Pitfalls & Limitations
● False negatives (unused functions that are not reported)
○ Only symbol names are compared
○ Different functions can have the same name, so a reference to one counts for all of them
○ Unused recursive functions cannot be found
(each references itself → always counted as used)
● Cannot handle dynamic cases (functions used only this way may be wrongly reported as unused):
○ getattr(obj, 'func_name')()
○ globals()['func_name']()
○ from module import * → use autoflake to expand this
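Two of these pitfalls can be reproduced with the name-matching rule alone. In the sketch below, `referenced_names`, `RECURSIVE` and `DYNAMIC` are hypothetical names; the function collects exactly what a name-based tool counts as "used" (`ast.Name` ids and `ast.Attribute` attributes).

```python
import ast


def referenced_names(src):
    """Names a name-based tool counts as 'used': ast.Name ids and ast.Attribute attrs."""
    names = set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


# Recursive but never called from outside: its own call keeps the name "used".
RECURSIVE = "def countdown(n):\n    return n if n == 0 else countdown(n - 1)\n"

# Called only through getattr(): the name appears only inside a string literal.
DYNAMIC = "class Obj:\n    def handler(self):\n        return 1\n\ngetattr(Obj(), 'handler')()\n"

if __name__ == "__main__":
    print("countdown" in referenced_names(RECURSIVE))  # True  -> never reported
    print("handler" in referenced_names(DYNAMIC))      # False -> wrongly reported as unused
```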
The Whole Process
1. Remove unused imports and expand import * with autoflake
2. Examine every getattr(), globals(), and __import__() call in the
code (to make sure none of them references a function)
3. List unused functions
4. Delete unused functions (with an IDE such as PyCharm)
5. Repeat steps 1-4 until no unused functions remain
a. After some functions are deleted, their dependencies may
become unused as well
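Step 2 can be partly automated by listing the dynamic-lookup calls for manual review. The sketch below is a hypothetical helper (`find_dynamic_lookups` and `DYNAMIC_LOOKUPS` are made-up names); it only catches direct calls by these exact names, so aliases such as `ga = getattr` and other mechanisms like `importlib.import_module()` still need a human eye.

```python
import ast

# Built-ins whose calls may reference functions by string (per step 2).
DYNAMIC_LOOKUPS = {"getattr", "globals", "__import__"}


def find_dynamic_lookups(src, filename="<string>"):
    """Return sorted (line, name) pairs for direct calls to the built-ins above."""
    hits = []
    for node in ast.walk(ast.parse(src, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DYNAMIC_LOOKUPS):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)  # ast.walk order is not source order


if __name__ == "__main__":
    sample = ("import os\n"
              "f = getattr(os, 'getcwd')\n"
              "m = __import__('json')\n"
              "g = globals()['f']\n")
    print(find_dynamic_lookups(sample))  # [(2, 'getattr'), (3, '__import__'), (4, 'globals')]
```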
Get the Tool
https://github.com/PCMan/python-find-unused-func