C# to python

As a long-time C# developer, I started with Python as a second language for ML purposes. Starting in Python is easy; writing engineering-grade Python turned out to be a lot harder. These are 10 things I learned along the way to writing production code in Python.

  1. 1. FROM C# TO PYTHON 10 THINGS I LEARNED ALONG THE WAY Tess Ferrandez
  2. 2. TESS SOFTWARE ENGINEER & DATA SCIENTIST at MICROSOFT
  3. 3. NOTEBOOKS ARE FOR EXPLORATION 1
  4. 4. REUSE CODE SOURCE CONTROL DEBUG TEST CI/CD PIPELINE
  5. 5. IF IT’S GOING INTO PROD, IT’S GOING IN A .PY FILE
  6. 6. PYTHON IS VERY FLEXIBLE 2
  7. 7. # read the data df = pd.read_csv('../../data/houses.csv') # print the first five records print(df.head()) # plot the price df.price.plot(kind='hist', bins=100) plt.show() IMPERATIVE
  8. 8. def clean_region(region: str) -> str:… def clean_broker(broker_name: str) -> str:… def clean_data(input_file: str, output_file: str):… if __name__ == '__main__': clean_data('data/interim/houses.csv', 'data/processed/houses.csv') PROCEDURAL
  9. 9. FUNCTIONAL from functools import reduce def square(x: int) -> int: return x * x numbers = [1, 2, 3, 4, 5] num_sum = reduce(lambda x, y: x + y, numbers, 0) squares = map(square, numbers)
  10. 10. OBJECT ORIENTED class StringOps: def __init__(self, characters): self.characters = characters def stringify(self): self.string = ''.join(self.characters) sample_str = StringOps(['p', 'y', 't', 'h', 'o', 'n']) sample_str.stringify() print(sample_str.string)
  11. 11. YOU CAN MIX AND MATCH PARADIGMS AS YOU PLEASE, BUT KEEP YOUR CODE AND SOCKS DRY
  12. 12. USE A COOKIE CUTTER PROJECT STRUCTURE 3
  13. 13. A BIG PILE O’ FILES clean_dataset.py clean_dataset2.py clean-2019-02-01.py clean-tf-1.py super-final-version-of-this-cleaning-script.py
  14. 14. MAKEFILE SETUP DOCS NOTEBOOKS / REPORTS REQUIREMENTS.TXT TESTS SEPARATELY
  15. 15. USE A COOKIE CUTTER PROJECT STRUCTURE OTHER PEOPLE WILL THANK YOU
  16. 16. USE A COOKIE CUTTER PROJECT STRUCTURE OTHER PEOPLE I WILL THANK YOU (PERSONALLY!)
  17. 17. WRITING READABLE & MAINTAINABLE CODE 4
  18. 18. import random, sys import os def myfunc(): rando = random.random() return random.randint(0,100) def multiply (a, b): return a * b print(multiply(myfunc(), myfunc()))
  19. 19. PEP8.ORG PYTHON ENHANCEMENT PROPOSAL
  20. 20. import random, sys import os def myfunc(): rando = random.random() return random.randint(0,100) def multiply (a, b): return a * b print(multiply(myfunc(), myfunc()))
  21. 21. import random, sys import os def myfunc(): rando = random.random() return random.randint(0,100) def multiply (a, b): return a * b print(multiply(myfunc(), myfunc()))
  22. 22. import random def myfunc(): rando = random.random() return random.randint(0,100) def multiply (a, b): return a * b print(multiply(myfunc(), myfunc())) UNUSED IMPORTS
  23. 23. import random def myfunc(): rando = random.random() return random.randint(0,100) def multiply (a, b): return a * b print(multiply(myfunc(), myfunc())) SEPARATING LINES
  24. 24. import random def myfunc(): rando = random.random() return random.randint(0, 100) def multiply(a, b): return a * b print(multiply(myfunc(), myfunc())) WHITE SPACES
  25. 25. import random def myfunc(): return random.randint(0, 100) def multiply(a, b): return a * b print(multiply(myfunc(), myfunc())) UNUSED VARIABLES
  26. 26. import random def random_number(): return random.randint(0, 100) def multiply(a, b): return a * b print(multiply(random_number(), random_number())) WEIRD FUNCTION NAMES
  27. 27. import random def random_number(): return random.randint(0, 100) def multiply(a, b): return a * b print(multiply(random_number(), random_number()))
  28. 28. # See https://pre-commit.com for more information # See https://pre-commit.com/hooks.html for more hooks repos: - repo: https://github.com/ambv/black rev: stable hooks: - id: black language_version: python3.7 - repo: https://github.com/pre-commit/pre-commit-hooks rev: v2.0.0 hooks: - id: flake8
  29. 29. def add(a, b): return a + b result = add('hello', 'world') result = add(2, 3) def add(a: int, b: int) -> int: return a + b result = add('hello', 'world') result = add(2, 3)
  30. 30. PEP8 ALL THE CODES
  31. 31. A SWEET DEV ENVIRONMENT SETUP 5
  32. 32. PIP, CONDA AND VIRTUAL ENVIRONMENTS 6
  33. 33. pip install pandas conda install pandas
  34. 34. conda create --name myenv python=3.6 conda activate myenv # Install all the things # Work on the application conda deactivate
  35. 35. REQUIREMENTS.TXT
  36. 36. KEEP YOUR MACHINE CLEAN AND YOUR PANDAS SEPARATED
  37. 37. EMBRACING PYTHONIC PYTHON 7
  38. 38. SQUARE SOME NUMBERS
  39. 39. nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  40. 40. squares = [] i = 0 while i < len(nums): squares.append(nums[i] * nums[i]) i += 1
  41. 41. squares = [] for i in range(len(nums)): squares.append(nums[i] * nums[i])
  42. 42. squares = [] for num in nums: squares.append(num * num)
  43. 43. squares = [num * num for num in nums]
  44. 44. squares = [num * num for num in nums if num % 2 == 0]
  45. 45. fruits = ['apple', 'mango', 'banana', 'cherry'] fruit_lens = {fruit: len(fruit) for fruit in fruits} {'apple': 5, 'mango': 5, 'banana': 6, 'cherry': 6}
  46. 46. SUM ALL NUMBERS BETWEEN 10 AND 1000
  47. 47. a = 10 b = 1000 total_sum = 0 while b >= a: total_sum += a a += 1
  48. 48. total_sum = sum(range(10, 1001))
  49. 49. IS THIS ITEM IN THE LIST?
  50. 50. fruits = ['apples', 'oranges', 'bananas', 'grapes'] found = False size = len(fruits) for i in range(size): if fruits[i] == 'cherries': found = True
  51. 51. found = 'cherries' in fruits
  52. 52. LIVE ZEN
  53. 53. >>> import this The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!
  54. 54. ARGUMENT PARSING WITH CLICK 8
  55. 55. import argparse def main(): parser = argparse.ArgumentParser() parser.add_argument('input_file', nargs='?', default='in.txt', type=str, help='…') parser.add_argument('output_file', nargs='?', default='out.txt', type=str, help='…') parser.add_argument('--debug', required=True, type=bool, help='…') args = parser.parse_args() # do some work print(args.debug) if __name__ == '__main__': main()
  56. 56. python myscript.py --help Usage: myscript.py [OPTIONS] [INPUT_FILE] [OUTPUT_FILE] Options: --debug BOOLEAN [required] --help Show this message and exit.
  57. 57. import argparse def main(): parser = argparse.ArgumentParser() parser.add_argument('input_file', nargs='?', default='in.txt', type=str, help='…') parser.add_argument('output_file', nargs='?', default='out.txt', type=str, help='…') parser.add_argument('--debug', required=True, type=bool, help='…') args = parser.parse_args() # do some work print(args.debug) if __name__ == '__main__': main()
  58. 58. import click @click.command() @click.argument('input_file', default='in.txt', type=click.Path()) @click.argument('output_file', default='out.txt', type=click.Path()) @click.option('--debug', required=True, type=click.BOOL, help='…') def main(input_file, output_file, debug): print(input_file) print(output_file) print(debug) if __name__ == '__main__': main()
  59. 59. CLICK MAKES ARGUMENT PARSING READABLE AND TESTABLE
  60. 60. TESTING WITH PYTEST 9
  61. 61. def test_add_positive(): assert add(1, 2) == 3
  62. 62. @pytest.mark.parametrize('val1, val2, expected_result', [ # small values (1, 2, 3), # negative values (-2, -1, -3) ]) def test_add(val1, val2, expected_result): actual_result = add(val1, val2) assert actual_result == expected_result
  63. 63. @pytest.mark.longrunning def test_integration_between_two_systems(): # this might take a while
  64. 64. def remove_file(filename): if os.path.isfile(filename): os.remove(filename)
  65. 65. @mock.patch('src.utils.file_utils.os.path') @mock.patch('src.utils.file_utils.os') def test_remove_file_not_removed_if…(mock_os, mock_path): mock_path.isfile.return_value = False remove_file('anyfile.txt') assert mock_os.remove.called == False
  66. 66. A TEST FOLDER IN THE ROOT IS PRETTY NICE
  67. 67. THERE IS A PACKAGE FOR THAT 10
  68. 68. PLOTTING
  69. 69. NEURAL NETWORKS
  70. 70. POSE DETECTION
  71. 71. FOCUSED OPTICAL FLOW
  72. 72. import numpy as np from keras_retinanet import models from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image model_path = '../resnet50_coco_best_v2.1.0.h5' model = models.load_model(model_path, backbone_name='resnet50') image_path = '../data/images/basket_image.jpg' image = read_image_bgr(image_path) image = preprocess_image(image) image, scale = resize_image(image) # process image boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0)) OBJECT DETECTION
  73. 73. BACKGROUND REMOVAL
  74. 74. THIS IS THE REASON WHY WE DO ML IN PYTHON
  75. 75. 1 2 3 4 5 6 7 8 9 10
  76. 76. FROM C# TO PYTHON 10 THINGS I LEARNED ALONG THE WAY Tess Ferrandez
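
Slide 29 contrasts an untyped `add` with a type-annotated one. The sketch below shows what the annotated version does and does not buy you: Python keeps the hints as metadata but does not enforce them at run time, so the string call still succeeds. The variable names are illustrative, and the suggestion of a static checker such as mypy is an assumption, since the slide names no tool.

```python
def add(a: int, b: int) -> int:
    """Add two values; the hints document intent but are not enforced."""
    return a + b


int_result = add(2, 3)

# This call contradicts the hints, yet it runs: annotations are not checked
# at run time. Catching it takes a static checker (assumed here: mypy).
str_result = add('hello', 'world')

print(int_result, str_result)
print(add.__annotations__)
```

Pairing hints with a checker in the pre-commit setup from slide 28 is one way to get the feedback a C# compiler would give, though that combination is a suggestion rather than something the slides prescribe.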

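The mocking test on slides 64 and 65 can be tried without the `src.utils.file_utils` package layout. The following is a minimal, self-contained sketch that assumes `remove_file` lives in the same module as the tests, so it patches `os.path.isfile` and `os.remove` directly; the test names are hypothetical, and this is a variation on the slide rather than the author's exact code.

```python
import os
from unittest import mock


def remove_file(filename):
    """Delete the file only if it exists (same logic as slide 64)."""
    if os.path.isfile(filename):
        os.remove(filename)


def test_remove_file_skips_missing_file():
    # Pretend the file is absent; os.remove must then never be called.
    with mock.patch('os.path.isfile', return_value=False), \
            mock.patch('os.remove') as mock_remove:
        remove_file('anyfile.txt')
    mock_remove.assert_not_called()


def test_remove_file_deletes_existing_file():
    # Pretend the file exists; os.remove must be called once with its name.
    with mock.patch('os.path.isfile', return_value=True), \
            mock.patch('os.remove') as mock_remove:
        remove_file('anyfile.txt')
    mock_remove.assert_called_once_with('anyfile.txt')
```

Neither test touches the real file system, which is what keeps this kind of test fast and safe to run in a CI pipeline.
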
Editor's Notes

  • Software Engineer and Data Scientist – In that order.
    My main goal is to create software that solves a business need. Sometimes that includes machine learning, but I’m equally happy if it doesn’t.
    Too often we walk into the room algorithm-first, as if adding AI/ML had its own value, and virtually every time we do that, we fail…
  • Intro
    First entry point for many
    Many training courses are done exclusively in Notebooks, same with Kaggle

    The good
    Great for exploration – unprecedented
    Great for telling an analysis story – Documentation with Markdown

    The bad
    Executing items out of order – what did you even execute?
    Testing
    Debugging
    CI/CD Pipeline
    Reproducing
    Adding to Source Control

    Suggestions for good practices
    Naming
    Export reports as HTML
    Export code as scripts

    Some alternatives
    Terminal
    Jupyter in VS Code and PyCharm
    Interactive cells in VS Code and PyCharm
  • More like a recipe – great for step by step tasks like exploring or cleaning data
  • Modularizing, putting more common tasks in procedures.
    As we move to prod, we need this… the imperative style leans a lot on globals and non-DRY code, with many code smells that you don’t want in prod
  • Every statement can be seen as a mathematical function – state and mutability are avoided
    Python is not a pure functional language – but this paradigm lends itself extremely well to data manipulation of large datasets, as we keep iterating through, only using what we need
  • While Python does support object-oriented programming, it doesn’t enforce encapsulation, so nothing is truly private.
    You can prefix a name with an underscore (for example _myprivate), but that is only a convention, not real hiding
  • KEEP YOUR CODE AND SOCKS DRY
  • pip installs from PyPI (Python packages only)
    Conda installs from Anaconda Cloud (packages in any language)

    Difference in dependency management
  • Multiple overlapping
    Small players
    Have to train gestures
    Weird angles
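
The note above on object-oriented Python says that nothing is private and that a leading underscore is only a convention. A minimal sketch of what that means in practice follows; the `Account` class and its attributes are hypothetical, chosen only for illustration. It also shows double-underscore name mangling, which the note does not mention: it renames the attribute but still does not hide it.

```python
class Account:
    def __init__(self, balance: int) -> None:
        self._balance = balance  # single underscore: private by convention only
        self.__pin = '1234'      # double underscore: stored as _Account__pin


account = Account(100)

# Nothing stops outside code from reading the "private" attribute.
convention_only = account._balance

# The mangled name is still reachable if you spell it out.
mangled = account._Account__pin

# Outside the class, the unmangled name does not exist on the instance.
plain_name_exists = hasattr(account, '__pin')

print(convention_only, mangled, plain_name_exists)
```

For a C# developer, the closest mental model is that every member is effectively public and the underscore is a note to other developers, not an access modifier.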