
Beating Python's GIL to Max Out Your CPUs


One of the most common complaints about Python in a data analysis context is the presence of the Global Interpreter Lock, or GIL. At its core, it means that a given Python program cannot easily utilize more than one core of a multi-core machine to do computation in parallel. However, fear not! To beat the GIL, you just need to be willing to adopt a little magic, and this talk will tell you how.

  1. Beating Python's GIL to Max Out Your CPUs. Andrew Montalenti, CTO, Parse.ly (@amontalenti)
  2. OR: Scaling Python to 3,000 Cores. Andrew Montalenti, CTO, Parse.ly (@amontalenti)
  3. What happens when you have 153 TB of compressed customer data that may need to be reprocessed at any time, and it's now growing at 10-20 TB per month?
  4. @dabeaz = "the GIL guy"
  5. Is the GIL a feature, not a bug?! In one Python process, only one Python bytecode instruction can execute at a time.
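
To see what this means in practice, here is a minimal sketch (not from the slides, but in the spirit of @dabeaz's classic demo): two threads doing pure-Python CPU-bound work take about as long as one thread doing that work twice, because only one thread can hold the GIL at a time.

    import time
    from threading import Thread

    def countdown(n):
        # pure-Python CPU-bound work; the GIL is held throughout
        while n > 0:
            n -= 1

    N = 10_000_000

    start = time.perf_counter()
    countdown(N)
    countdown(N)
    print("sequential:", time.perf_counter() - start)

    start = time.perf_counter()
    t1 = Thread(target=countdown, args=(N,))
    t2 = Thread(target=countdown, args=(N,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    # roughly the same wall time as sequential (often worse, due to lock contention)
    print("threaded:  ", time.perf_counter() - start)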
  6. should we just rewrite it in Go?
  7. fast functions running in parallel
  8. [diagram: 3 servers, 2 cores each; a single box of Python state + code] from urllib.parse import urlparse; urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678", ...]
  9. [same diagram] map(urlparse, urls)
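
A runnable version of the snippet on slides 8-9. One detail worth noting: in Python 3, map() is lazy, so nothing is actually parsed until the iterator is consumed.

    from urllib.parse import urlparse

    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678"]

    # one process, one core: each URL is parsed sequentially
    results = list(map(urlparse, urls))
    print(results[0].netloc)  # arstechnica.com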
  10. Cython: speeding up functions on a single core
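
A hypothetical Cython sketch of that idea (a file such as fastloop.pyx, compiled with cythonize): static C types let the hot loop run at C speed on one core, and a nogil block releases the GIL around the pure-C section, so native threads could in principle run it in parallel.

    # fastloop.pyx (hypothetical example); build with: cythonize -i fastloop.pyx
    # cython: language_level=3

    def csum(long n):
        cdef long i, total = 0
        with nogil:              # safe here: no Python objects touched inside
            for i in range(n):
                total += i
        return total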
  11. concurrent.futures: a good map API, but odd implementation details
  12. [diagram: 3 servers, 2 cores each] executor = ThreadPoolExecutor(); executor.map(urlparse, urls)
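
A runnable version of slide 12's snippet. The odd detail: all the threads share one interpreter and one GIL, so a thread pool helps I/O-bound work but not pure-Python CPU-bound work.

    from concurrent.futures import ThreadPoolExecutor
    from urllib.parse import urlparse

    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678"]

    with ThreadPoolExecutor(max_workers=4) as executor:
        # same map() API, but CPU-bound work still serializes on the GIL
        results = list(executor.map(urlparse, urls))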
  13. [diagram: the parent Python process os.fork()s into Python subprocesses, shipping state via pickle.dumps()] executor = ProcessPoolExecutor(); executor.map(urlparse, urls)
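
And slide 13's variant, fleshed out into a runnable sketch. Each worker is a separate process with its own GIL; arguments and results cross the process boundary via pickle, so everything you map must be picklable.

    from concurrent.futures import ProcessPoolExecutor
    from urllib.parse import urlparse

    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678"]

    if __name__ == "__main__":  # required on platforms that spawn rather than fork
        with ProcessPoolExecutor(max_workers=4) as executor:
            results = list(executor.map(urlparse, urls))
            print(len(results))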
  14. joblib: map functions over local machine cores by cleaning up stdlib facilities
  15. [diagram: parent process forks Python subprocesses, state crosses via pickle.dumps()] par = Parallel(n_jobs=2); do_urlparse = delayed(urlparse); par(do_urlparse(url) for url in urls)
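
Slide 15's snippet as a runnable sketch, assuming joblib is installed (pip install joblib); n_jobs=-1 would use every local core.

    from urllib.parse import urlparse
    from joblib import Parallel, delayed

    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678"]

    par = Parallel(n_jobs=2)           # two worker processes
    do_urlparse = delayed(urlparse)    # wraps the function and its args for shipping
    results = par(do_urlparse(url) for url in urls)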
  16. ipyparallel: map functions over a pet compute cluster
  17. [diagram: an ipcontroller pickles state and code out to ipengine processes spread across the servers] rc = Client(); rc[:].map_sync(urlparse, urls)
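
A sketch of slide 17's snippet. It assumes a cluster is already running (for example, started locally with "ipcluster start -n 4"): Client() connects to the ipcontroller, and rc[:] is a view over every ipengine.

    from urllib.parse import urlparse
    from ipyparallel import Client

    urls = ["http://arstechnica.com/", "http://ars.to/1234", "http://ars.to/5678"]

    rc = Client()          # connects to the running ipcontroller
    view = rc[:]           # a DirectView over all engines
    results = view.map_sync(urlparse, urls)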
  18. pykafka: map functions over a multi-consumer log
  19. [diagram: a pykafka.producer feeds balanced consumer processes spread across the servers] consumer = ... (balanced); while True: msg = consumer.consume(); msg = json.loads(msg); urlparse(msg["url"])
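
A sketch of slide 19's consumer loop; the broker address, ZooKeeper address, and "urls" topic are assumptions here. Balanced consumers in one consumer group divide the topic's partitions among themselves, so running one consumer process per core spreads the work with no shared memory at all.

    import json
    from urllib.parse import urlparse
    from pykafka import KafkaClient

    client = KafkaClient(hosts="localhost:9092")        # hypothetical broker
    topic = client.topics[b"urls"]                      # hypothetical topic
    consumer = topic.get_balanced_consumer(
        consumer_group=b"url-parsers",
        zookeeper_connect="localhost:2181",             # hypothetical ZooKeeper
    )

    while True:
        msg = consumer.consume()
        if msg is not None:
            data = json.loads(msg.value)
            print(urlparse(data["url"]).netloc)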
  20. pystorm: map functions over a stream of inputs to generate a stream of outputs
  21. [diagram: Storm workers run Python components as subprocesses speaking a multi-lang JSON protocol] class UrlParser(Topology): url_spout = UrlSpout.spec(p=1); url_bolt = UrlBolt.spec(p=4, input=url_spout)
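
Slide 21 wires a URL spout into a parsing bolt; Storm runs p copies of each Python component as separate subprocesses speaking the multi-lang JSON protocol over stdin/stdout, so p=4 really means four Python processes. A minimal sketch of what the hypothetical UrlBolt might look like in streamparse:

    from urllib.parse import urlparse
    from streamparse import Bolt

    class UrlBolt(Bolt):
        def process(self, tup):
            # each bolt instance is its own Python process with its own GIL
            url = tup.values[0]
            self.emit([urlparse(url).netloc])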
  22. pyspark: map functions over a dataset representation to perform transformations and actions
  23. [diagram: a pyspark.SparkContext driver reaches executors via py4j and binary pipes, shipping closures with cloudpickle] sc = SparkContext(); file_rdd = sc.textFile(files); file_rdd.map(urlparse).take(1)
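
A runnable sketch of slide 23's snippet (the input path is hypothetical). The driver ships the urlparse closure to executor-side Python workers via cloudpickle, while py4j and binary pipes carry data between the Python driver and the JVM, as the slide's diagram indicates.

    from urllib.parse import urlparse
    from pyspark import SparkContext

    sc = SparkContext()
    files = "s3://bucket/urls/*.txt"         # hypothetical input path
    file_rdd = sc.textFile(files)            # one URL per line
    print(file_rdd.map(urlparse).take(1))    # pull one parsed result back to the driver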
  24. 24. "lambda architecture"
  25. 25. Parse.ly "Batch Layer" Topologies with Spark & S3 Parse.ly "Speed Layer" Topologies with Storm & Kafka Parse.ly Dashboards and APIs with Elasticsearch & Cassandra Parse.ly Raw Data Warehouse with Streaming & SQL Access Technology Component Summary
  26. 26. parting thoughts
  27. 27. the free lunch is over, but not how we thought
  28. 28. multi-process, not multi-thread multi-node, not multi-core message passing, not shared memory ! heaps of data and streams of data
  29. 29. GIL: it's a feature, not a bug. help us!
 pystorm pykafka streamparse
  30. 30. Questions? tweet at @amontalenti
