This document is the introduction to a tutorial about generator tricks for systems programmers presented by David Beazley. It provides biographical information about the presenter, outlines the goals and structure of the tutorial, and introduces some key concepts around iterators and generators in Python. The tutorial will cover practical uses of generators with a focus on files, file systems, parsing, networking, threads, and other systems programming tasks. It aims to provide compelling examples of using generators and get attendees thinking about how to apply them.
Make container without_docker_6-overlay-network_1 - Sam Kim
How does communication between containers work in a distributed environment? In parts 3 and 4 we built a virtual network inside a single host. In part 6, building on that, we make hosts in a distributed environment communicate with each other over a virtual network. This covers the vxlan-based overlay network setup actually used by CNIs such as Kubernetes' flannel.
This document provides an overview and introduction to Muduo, a C++ network programming library for Linux. Some key points:
- Muduo is a non-blocking, event-driven, multi-core ready C++ network library that aims to provide high performance and modern features.
- The document discusses challenges with network programming using sockets APIs directly and how a library like Muduo can help abstract away complexity.
- It covers core concepts in non-blocking and event-driven network programming used by Muduo like the event loop, callbacks, and lifetime management of connection objects.
- Examples are provided of how Muduo implements patterns like chat servers, and comparisons are made to other libraries.
Knee deep in the undef - Tales from refactoring old Puppet codebases - Peter Souter
As Puppet pushes into its second decade of reign, there are several organisations out there that have been using Puppet for a long time. Sometimes, even since the beginning!
With the EOL announcement of the Puppet 3.x release, we've had a number of customers approach us to help with their upgrade. Normally the upgrade itself is fairly straightforward; it's the code base that gives the biggest challenge, especially those with over 3 years of organic growth.
So let's spread the word about common anti-patterns and issues that can come back to bite you.
We'll be talking about how Hiera is both the best and worst thing to happen to Puppet, marvel at how people were happily running 0.2 Puppet in production, and look at hacky solutions that seemed good at the time but will come back to bite you!
By the end of this, you'll hopefully have learnt how to code your Puppet defensively, so that your code base stays healthy for the next decade!
This document discusses setting up a network bridge without Docker. It provides a Vagrantfile to configure a virtual machine environment with Ubuntu 18.04, along with tools like Go and Docker installed. Instructions are given to create a bridge between two network namespaces called RED and BLUE using IP addresses in the 11.11.11.0/24 range. Tests show that hosts can ping each other within this network but not across the real interface and IP range of the host machine. Additional routing and IP configuration is needed to allow outside communication.
GStreamer is a media framework for Linux that allows construction of graphs to process and play multimedia streams. It includes elements for sources, filters, decoders, encoders and sinks that can be connected together. Examples show how to play audio and video files, stream from webcams and the internet, and record or stream multimedia compositions. GStreamer has plugins for common formats and can be used for applications like media players, video editing and multimedia centers.
This document provides an introduction and overview of GStreamer, including its concepts and examples of its use. GStreamer is a media framework that supports building media-handling applications, accessing hardware, writing plugins, and using scriptable command-line tools. It discusses key GStreamer concepts and provides examples of using it to analyze media files, transcode video and audio to different formats, and stream video. The document encourages questions and provides credits for resources used.
This document discusses Zurg, a distributed process management system with a master-slave architecture. The Zurg slave runs on each host and can run commands, start and monitor applications, and collect performance data. It communicates with the Zurg master. Some challenges discussed include reliably detecting when processes exit, limiting output, and ensuring processes are properly restarted if the slave crashes. The master will store status information accessible via web interfaces.
This document discusses using Docker containers without Docker. It provides a Vagrantfile configuration to set up a virtual machine environment for experiments. The Vagrantfile configures a Ubuntu 18.04 virtual machine with Docker, Go, and other tools installed. The document then covers mounting namespaces and how to isolate the root filesystem of a process to emulate containers without Docker.
The document describes using generators to process large log files. A naive solution reads every line of the log into a list, splits each line to extract the byte value, and sums the results; this keeps the entire file contents in memory at once. A generator-based solution instead yields values from the file one line at a time, so only a single line needs to be in memory at any moment.
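As a rough sketch (assuming a simplified log format in which the byte count is the last whitespace-separated column, with "-" for missing values), the generator-based approach looks like this:

```python
import io

def bytes_column(logfile):
    """Yield the byte count from each log line, one line at a time."""
    for line in logfile:
        field = line.rsplit(None, 1)[-1]   # last whitespace-separated column
        if field != "-":                   # '-' marks a missing byte count
            yield int(field)

# A small in-memory stand-in for a (potentially huge) web server log.
sample = io.StringIO(
    "GET /index.html 200 1024\n"
    "GET /missing 404 -\n"
    "GET /logo.png 200 2048\n"
)
total = sum(bytes_column(sample))
print(total)   # 3072
```

Because `bytes_column` is a generator, `sum` pulls lines on demand; the whole file is never materialized as a list.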
PuppetConf 2016: Nice and Secure: Good OpSec Hygiene With Puppet! – Peter Sou... (Puppet)
Here are the slides from Peter Souter's PuppetConf 2016 presentation called Nice and Secure: Good OpSec Hygiene With Puppet!. Watch the videos at https://www.youtube.com/playlist?list=PLV86BgbREluVjwwt-9UL8u2Uy8xnzpIqa
TDC2016POA | Trilha Ruby - Stack Level too Deep e Tail Call Optimization: É u... - tdc-globalcode
The document discusses stack overflow errors in Ruby. It explains that Ruby uses a stack to store method calls and variable scopes. If a method recursively calls itself too many times, it can exceed the stack size, resulting in a "stack level too deep" error. This error acts as a protection against infinite recursion that could crash the program by filling the entire stack.
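The talk is about Ruby, but the same failure mode is easy to demonstrate in Python (used here for consistency with the other examples), where it surfaces as a RecursionError:

```python
import sys

def recurse(n):
    """Each call pushes a new stack frame; too many frames exhausts the stack."""
    return n if n == 0 else recurse(n - 1)

try:
    recurse(sys.getrecursionlimit() * 2)   # far deeper than the interpreter allows
except RecursionError as err:
    print("stack level too deep:", err)
```

As in Ruby, the interpreter raises a controlled error rather than letting unbounded recursion corrupt or exhaust the real process stack.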
The OSI Superboard II was the computer on which I first learned to program back in 1979. Python is why programming remains fun today. In this tale of old meets new, I describe how I have used Python 3 to create a cloud computing service for my still-working Superboard, a problem complicated by it having only 8 KB of RAM and 300-baud cassette-tape audio ports for I/O.
The document discusses the SWIG tool, which allows C/C++ code to be integrated with scripting languages like Perl, Python, and Tcl. SWIG works by taking C/C++ header files, wrapping the code in a wrapper module, and compiling it into an extension that can then be used from the scripting language. While SWIG handles many common C/C++ features automatically, interface building can sometimes be challenging due to issues like pointer ambiguity, preprocessor macros, or advanced C++ features that SWIG may not fully support.
1. The document discusses configuring internet authentication and WiFi for FreeBSD 6.2. It involves compiling the FreeBSD kernel to include authentication modules, setting up various services like Apache, MySQL, FreeRADIUS, and Chillispot.
2. Configuration steps include enabling firewall, NAT, and proxy options in the kernel, installing LAMP and SSL modules, and configuring FreeRADIUS for authentication using MySQL with users added to the database.
3. Additional services like Squid and MRTG are also configured for monitoring network traffic and authentication logs. The document provides commands and configuration files needed to set up this authentication infrastructure on FreeBSD 6.2.
The document discusses various Linux commands for working with files and directories, downloading data from FTP servers, and running BLAST to analyze sequence data. It provides examples of using commands like ls, mkdir, cp, mv, rm, cd, tar, gzip, lftp, formatdb, and blastall. The document walks through cloning BLAST from an FTP site, preparing yeast sequence files for analysis, and running BLASTN and TBLASTX searches to identify matches between a sample sequence and the yeast database.
The document provides instructions for updating the FreeBSD 7.2 ports tree on a WebServer. The following steps are outlined:
1. Log in as the user "sermpan" and su to root.
2. Extract a backup of the ports tree files from /backups/distfiles72.tar.
3. Install and clean the cvsup port to update the ports tree files.
This document provides an introduction to the VeriFast program verifier. It describes how to set up VeriFast, including downloading required files. It explains that VeriFast can verify single-threaded and multi-threaded C/Java programs annotated with preconditions and postconditions written in separation logic, and that it avoids illegal memory accesses like buffer overflows. The document demonstrates running VeriFast on sample code, showing how it finds errors, and provides references for more information.
This document discusses a presentation on advanced uses of generators in Python. It covers context managers, which allow entry and exit actions for code blocks using the 'with' statement. Generators can be used to implement context managers via the yield keyword and decorator contextmanager. This transforms generator functions into objects that support the required __enter__ and __exit__ methods to monitor code block execution. The presentation aims to expand understanding of generators beyond iteration by exploring this technique for simplifying resource management tasks like file handling through context managers.
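A minimal sketch of the technique, using `contextlib.contextmanager` to turn a generator function into a context manager (the file-opening example is illustrative):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def opened(path, mode="r"):
    # Code before the yield plays the role of __enter__.
    f = open(path, mode)
    try:
        yield f              # the value bound by 'with ... as f'
    finally:
        f.close()            # code after the yield plays the role of __exit__

path = os.path.join(tempfile.gettempdir(), "ctx_demo.txt")
with opened(path, "w") as f:
    f.write("hello")
print(f.closed)   # True: the file was closed on exiting the block
```

The decorator supplies the `__enter__`/`__exit__` machinery, so the resource-management logic reads as straight-line code around a single `yield`.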
The document discusses various techniques for optimizing the performance of embedded Ruby (ERuby) templates. It describes 7 iterations of improvements to "MyEruby" that reduced the processing time from over 69 seconds to under 1 second. The optimizations included avoiding line splitting, replacing parsing with patterns, tuning regular expressions, inline expansion and array buffers.
Communications is a versatile field that develops skills like critical thinking, problem solving, and writing which are valuable for a wide range of careers from journalism to public relations. The growth of social media and new telecommunication technologies have rapidly changed the communications industry and created more opportunities in areas like crisis management. A communications degree cultivates skills applicable to many career options including internal/external communications, sales, education, research, management, and consulting.
This document proposes a framework for modeling tagging systems and user tagging behavior to combat tag spam. It introduces methods for ranking documents matching a tag based on taggers' reliability. The authors study how existing approaches perform under malicious attacks and the impact of moderation. They define models of good and bad tagging behavior to simulate tag spam and evaluate different query answering schemes and moderator strategies.
This document provides guidance on using the OpenSolaris operating system on Amazon Elastic Compute Cloud (EC2). It discusses the AMI tools and APIs needed to access EC2, prerequisites for using the Solaris AMI such as Java and SSH setup, how to launch and connect to OpenSolaris instances on EC2, and how to rebundle and register customized OpenSolaris AMIs. It also covers limitations and references for further information.
This document brings together opposites like rappers and forest animals. It combines creative concepts like Loctite glue sticking things together that would never be joined otherwise. Objects that are totally opposed or antithetical to each other are united.
The document describes a tutorial on generator tricks for systems programmers given by David Beazley. It introduces generators and how they can be used to process large data files like web server logs more efficiently by generating values one at a time instead of building large lists. Specifically, it presents a problem of summing the last column of a web log to find the total bytes transferred, and suggests using a generator to process the log line by line due to potentially large file sizes.
The document discusses the challenges of working with binary encoded data and introduces Preon as a declarative data binding framework for binary encoded data. Preon aims to allow developers to declaratively map data structures to encoding formats and generate decoders, encoders and documentation automatically through annotations on data types. It emphasizes convention over configuration and supports features like expressions, references, inheritance and variable introductions to handle complex encoding scenarios.
The document summarizes a conference called the JVM Language Summit that was held in 2008. Over 80 key VM and language designers met for 3 days to discuss the future of their projects related to the Java Virtual Machine (JVM). Presentations were given on various languages and VMs like Java, Clojure, Scala, and the HotSpot VM. Key topics of discussion included invokedynamic, metaobject protocols, language interoperability, and platform design. Attendees found the rapid exchange of ideas and new partnerships formed to be very valuable for advancing innovation on the JVM.
Guillaume Laforge presents on creating domain-specific languages with Groovy. He discusses how DSLs can help bridge communication between developers and subject matter experts by using a more expressive shared language. He provides examples of Groovy's capabilities for building DSLs, including its flexible syntax, optional typing, native constructs, closures, and dynamic metaprogramming features. He also covers integrating DSLs into applications and considerations for designing custom DSLs.
This document discusses Groovy's capabilities for building domain-specific languages (DSLs). It provides examples of how Groovy allows flexible syntax through features like optional typing, closures, builders, and the meta-object protocol (MOP). The MOP allows intercepting method calls at runtime to change behavior. Groovy is well-suited for DSLs as it can seamlessly integrate DSLs into applications and its compiler supports transformations.
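A rough Python analog of the call-interception idea (Groovy's `methodMissing`); the `CallRecorder` name is invented for illustration. Python's `__getattr__` provides a comparable runtime hook:

```python
class CallRecorder:
    """Intercepts any undefined method call, in the spirit of Groovy's methodMissing."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(*args, **kwargs):
            self.calls.append((name, args))
            return self          # returning self allows chained, DSL-style calls
        return method

q = CallRecorder()
q.select("name").where("age > 3")
print(q.calls)   # [('select', ('name',)), ('where', ('age > 3',))]
```

Intercepting unknown calls like this is one way a host language can accept DSL-ish phrases it never explicitly defined.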
This document outlines tips and techniques used by penetration testers. It begins with an introduction explaining that penetration testing involves both standardized methodologies as well as improvisation. The document then provides several tips related to reconnaissance, scanning, networking, passwords, and reporting from penetration tests. Each tip is meant to help save time, enable hacks that otherwise wouldn't be possible, or better help clients understand security risks. Overall, the tips suggest using common tools and techniques creatively to find and exploit security vulnerabilities.
Performance, Games, and Distributed Testing in JavaScript - jeresig
This document discusses various techniques for measuring and optimizing JavaScript performance, including profiling tools in browsers like Firebug and Safari. It also addresses challenges in building multiplayer JavaScript games, such as latency issues, and proposes solutions like combining elements of strategy, intelligence and accuracy. The document concludes by covering distributed and automated testing techniques like TestSwarm that can help address challenges of testing JavaScript across many browsers and platforms.
2 Roads to Redemption - Thoughts on XSS and SQLIA - guestfdcb8a
This document discusses approaches to preventing SQL injection attacks (SQLIA) and cross-site scripting (XSS) vulnerabilities. It proposes using rich data types that define how data should be validated, sanitized, and serialized depending on its context and use. This could help frameworks automatically apply the proper validation and output encoding. However, such an approach needs good infrastructure support from frameworks and a comprehensive catalogue of data types to be practical. It may not be worth the effort compared to simpler approaches like those used in Django.
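A toy sketch of the rich-data-type idea in Python (the `SafeHTML` and `render_html` names are invented for illustration): the value's declared type, not ad-hoc escaping at each call site, determines how it is encoded for the HTML context:

```python
import html

class SafeHTML(str):
    """Marker type: the wrapped string is already trusted/escaped HTML."""

def render_html(value):
    # Encode according to the value's type: trusted markup passes
    # through, everything else is escaped for the HTML context.
    if isinstance(value, SafeHTML):
        return str(value)
    return html.escape(str(value))

print(render_html("<script>alert(1)</script>"))  # &lt;script&gt;alert(1)&lt;/script&gt;
print(render_html(SafeHTML("<em>ok</em>")))      # <em>ok</em>
```

A framework that routed all output through such type-aware encoders could apply the right escaping automatically, which is the document's point about needing infrastructure support for the approach to pay off.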
The document discusses Python generator functions and expressions, providing examples of how generator functions can be used to iteratively yield values like in a countdown, and how generator expressions are similar to list comprehensions but produce values iteratively instead of building a list. The document also discusses using generators to process data files like summing the bytes transferred from entries in an Apache web server log.
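Both constructs in a few lines (the countdown example mirrors the one the document mentions):

```python
def countdown(n):
    """Generator function: each yield hands back one value lazily."""
    while n > 0:
        yield n
        n -= 1

print(list(countdown(3)))   # [3, 2, 1]

# Generator expression: same syntax as a list comprehension, but it
# produces values one at a time instead of building a list first.
log_lines = ["200 1024", "404 512", "200 2048"]
total_bytes = sum(int(line.split()[-1]) for line in log_lines)
print(total_bytes)          # 3584
```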
The document discusses Jeff Hammerbacher's presentation on socializing big data and the Hadoop community. It provides an overview of Hadoop, including what it is, how it works, and some of its subprojects. It also discusses Hadoop's use at Yahoo and the growth of the Hadoop community. The overall message is that Hadoop is producing innovative open source software for large-scale data management and analysis, and that its community is open to all and will play a central role in the evolution of data processing technologies.
This document provides an overview of the presentation "The Buzz About Groovy and Grails" given by Eric Weimer to the Chicago Groovy User Group on March 10, 2009. The presentation introduces Groovy and Grails, explains their benefits for Java developers and IT managers, demonstrates key Groovy features like closures and syntactic sugar, and argues that Groovy and Grails are production ready and improve developer productivity. The document concludes by recommending books for further reading on Groovy and Grails.
This document provides an overview of concurrency in Python using multiprocessing and threading. It begins by introducing the speaker and defining key terms like concurrency, threads, and processes. It then discusses the benefits and use cases of threads versus processes. The document also covers the Global Interpreter Lock (GIL) in Python and how multiprocessing can help avoid it. It provides an example benchmark showing multiprocessing can significantly outperform threading for CPU-bound tasks. Finally, it discusses key aspects of Python's multiprocessing module like Process, Queue, Pool, and Manager classes.
Ditching Fibre Channel & SCSI: Saying hasta la vista to your vendors and "ooh ... - jasonjwwilliams
The document discusses ditching fibre channel and SCSI storage in favor of more flexible and cost-effective open storage solutions using commodity hardware and Ethernet. It highlights how DigiTar utilizes open source technologies like ZFS, Solaris, and commodity hardware to build robust storage infrastructures at lower costs than traditional enterprise solutions. The presentation addresses challenges in moving to this approach and potential future directions for open storage.
A guest lecture I gave for the "Internet Technology" course at my old University (Bath). I tried to pull together all of the things I wish I'd been told before I started building things on the Web.
The document discusses API design in PHP, focusing on Ning's PHP API. Some key points:
- Ning's PHP API provides a REST interface to its social networking platform and has been in use since 2005.
- The API is used for content storage, user profile management, tagging, search, and other functions.
- Good API design principles include making things predictable, modular, stable, and prioritizing human performance over computer performance.
- API design should be use case driven and additions should be easy while removals are hard. Names, versioning, and documentation are important.
Spring ME is a lightweight version of the Spring framework that aims to provide dependency injection and inversion of control capabilities on resource constrained platforms like Java ME. It uses a meta model and code generation to configure objects without runtime reflection, keeping dependencies and size small. Current features include basic dependency injection through a generated BeanFactory, with plans to add request/session scoping, AOP, and integrations with other Java ME frameworks. The goal is a micro version of Spring useful beyond just Java ME.
Choosing an Application framework for Mobile Linux Devicesshreyas
Presentation i did at Ottawa Linux symposium about various application toolkits for Linux based embedded devices and how they stack up against each other.
Best Practices In Implementing Container Image Promotion PipelinesAll Things Open
Presented by: Baruch Sadogursky, JFrog
Presented at All Things Open 2020
Abstract: Surprisingly, implementing a secure, robust and fast promotion pipelines for container images is not as easy as it might sound. Automating dependency resolution (base images), implementing multiple registries for different maturity stages and making sure that we actually run in production containers from the images we intended can be tricky. In this talk, we will compare different approaches, compile a wish-list of features and create a pipeline that checks all the boxes using free and open-source tools.
Simon Willison gave a presentation on Comet, a technique for enabling live data updates in web applications. Comet allows a web server to push events to connected browsers in real-time. It has faced many technical challenges due to browser limitations. Key techniques discussed include streaming, long polling, and the Bayeaux protocol which provides a common way for Comet clients and servers to communicate. The presentation showed how to easily build a basic Comet application using Jetty and Dojo in just a few lines of code.
Similar to Generator Tricks for Systems Programmers (20)
This document discusses event driven architecture (EDA) and domain driven design. It begins with an introduction to the speaker and an overview of EDA basics. It then describes problems with traditional SOA implementations, where domain logic gets split across many systems. The document proposes that exposing domain events on a shared event bus allows isolating cross-cutting functions to separate systems while keeping domain logic together. It provides examples of how this approach improves scalability and decouples systems. Finally, it outlines potential business benefits of using EDA like enabling complex event processing, business process management, and business activity monitoring on top of the domain events.
The document discusses trends and challenges facing information technology, including building a civic semantic web and waiving rights over linked data. It also discusses whether semantic technologies could permit meaningful brand relationships. The document contains a chart showing government department spending in the UK, with the Department of Health spending £105.7 billion, followed by local and regional government spending £34.3 billion, and the NHS spending £90.7 billion.
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
This document summarizes an Erlang meeting held on July 3, 2009 in Tokyo. It discusses the gen_paxos Erlang module, which implements the Paxos consensus algorithm. Paxos is needed to solve problems like split-brains where data could become inconsistent without coordination between nodes. The document explains the key aspects of Paxos like its phases, data model in gen_paxos, and how nodes communicate through message passing in Erlang. It also provides references to related works and papers about Paxos.
pragmaticrealworldscalajfokus2009-1233251076441384-2.pdfHiroshi Ono
The document discusses Scala and functional programming concepts. It provides examples of building a chat application in 30 lines of code using Lift, defining case classes and actors for messages. It summarizes that Scala is a pragmatically oriented, statically typed language that runs on the JVM and has a unique blend of object-oriented and functional programming. Functional programming concepts like immutable data structures, functions as first-class values, and for-comprehensions are demonstrated with examples in Scala.
This document is the introduction to "The Little Book of Semaphores" by Allen B. Downey. It provides an overview of the book, which uses examples and puzzles to teach synchronization concepts and patterns. The book aims to give students more practice with these challenging concepts than a typical operating systems course allows. It also discusses the book's licensing as free and open source documentation.
This document provides style guidelines for Scala developers at Twitter. It outlines recommendations for imports, implicit usage, reflection, comments, whitespace, logging, project layout, variable naming conventions, and ends by thanking people for attending.
This document introduces developing a Scala DSL for Apache Camel. It discusses using Scala features like implicit conversions, passing functions as parameters, and by-name parameters to build a DSL. It provides examples of simple routes in the Scala DSL and compares them to Java. It also covers tooling for Scala in Maven and Eclipse and caveats like interacting with Java generics. The goal is to learn basic Scala concepts and syntax for building a Scala DSL, using Camel as an example.
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
The document discusses alternative concurrency paradigms to shared-state concurrency for the JVM, including software transactional memory which allows transactions over shared memory, message passing concurrency using the actor model where actors communicate asynchronously via message passing, and dataflow concurrency where variables can only be assigned once. It provides examples of how these paradigms can be used to implement solutions like transferring funds between bank accounts more elegantly than with shared-state concurrency and locks.
This document discusses using TCP/IP for high performance computing (HPC) applications. It finds that while TCP/IP can achieve bandwidth of 1 Gbps over short distances with low latency, the bandwidth degrades significantly over wide area networks with higher latency. It investigates tuning TCP parameters like socket buffer sizes to improve performance over high latency networks.
Martin Odersky outlines the growth and adoption of Scala over the past 6 years and discusses Scala's future direction over the next 5 years. Key points include:
- Scala has grown from its first classroom use in 2003 to filling a full day of talks at JavaOne in 2009 and developing a large user community.
- Scala 2.8 will include new collections, package objects, named/default parameters, and improved tool support.
- Over the next 5 years, Scala will focus on concurrency and parallelism features at all levels from primitives to tools.
- Other areas of focus include extended libraries, performance improvements, and standardized compiler plugin architecture.
stateyouredoingitwrongjavaone2009-090617031310-phpapp02.pdfHiroshi Ono
This document discusses alternative concurrency paradigms for the Java Virtual Machine (JVM). It begins with an agenda and discusses how Moore's Law no longer solves concurrency problems as processors are becoming multi-core. It then discusses the problems with shared-state concurrency and how separating identity and value can help. The document introduces software transactional memory, message passing concurrency using actors, and dataflow concurrency as alternative paradigms. It uses examples of bank account transfers to demonstrate how these paradigms can be implemented and discusses their advantages over shared-state concurrency.
This document contains the schedule for a conference with sessions on various topics in natural language processing and computational linguistics. The conference will take place from September 14-16. Each day consists of morning and afternoon sessions split into parallel tracks (1a and 1b). Sessions cover areas like semantics, parsing, sentiment analysis, and more. Keynote speakers include Ricardo Baeza-Yates, Kevin Bretonnel Cohen, Mirella Lapata, Shalom Lappin, and Massimo Poesio. Presentations are 20 minutes each with coffee breaks in the mornings and poster sessions in the afternoons.
The article discusses the Guardian's Datastore project, which makes data of public interest freely available online for reuse. Some key points:
- The Datastore contains datasets on topics like MPs' expenses, carbon emissions, and public opinion polls. This data was previously hard to access but the web now allows easy access to billions of statistics.
- Making this data open and machine-readable supports the Guardian's tradition of fact-checking and transparency. It also encourages others to analyze and build upon the data in new ways.
- An early example involved crowdsourcing the review of 500,000 pages of MPs' expenses, revealing new insights. Other Guardian datasets like music recommendations and university rankings are now available for others
genpaxospublic-090703114743-phpapp01.pdfHiroshi Ono
This document summarizes a presentation on Paxos and gen_paxos. It introduces Paxos as a distributed consensus algorithm that is robust to network failures and allows data replication across multiple nodes. It then describes the gen_paxos Erlang implementation of Paxos, including its data model, state machine approach, and messaging between nodes. Key aspects of Paxos like the prepare and propose phases are explained through examples. The document also provides context on applications of Paxos and references for further reading.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Generator Tricks For Systems Programmers
David Beazley
http://www.dabeaz.com
Presented at PyCon'2008
Copyright (C) 2008, http://www.dabeaz.com 1- 1
An Introduction
• Generators are cool!
• But what are they?
• And what are they good for?
• That's what this tutorial is about
About Me
• I'm a long-time Pythonista
• First started using Python with version 1.3
• Author : Python Essential Reference
• Responsible for a number of open source
Python-related packages (Swig, PLY, etc.)
My Story
My addiction to generators started innocently
enough. I was just a happy Python
programmer working away in my secret lair
when I got "the call." A call to sort through
1.5 Terabytes of C++ source code (~800
weekly snapshots of a million line application).
That's when I discovered the os.walk()
function. I knew this wasn't going to end well...
Back Story
• I think generators are wicked cool
• An extremely useful language feature
• Yet, they still seem rather exotic
• I still don't think I've fully wrapped my brain
around the best approach to using them
A Complaint
• The coverage of generators in most Python
books is lame (mine included)
• Look at all of these cool examples!
• Fibonacci Numbers
• Squaring a list of numbers
• Randomized sequences
• Wow! Blow me over!
This Tutorial
• Some more practical uses of generators
• Focus is "systems programming"
• Which loosely includes files, file systems,
parsing, networking, threads, etc.
• My goal : To provide some more compelling
examples of using generators
• Planting some seeds
Support Files
• Files used in this tutorial are available here:
http://www.dabeaz.com/generators/
• Go there to follow along with the examples
Disclaimer
• This isn't meant to be an exhaustive tutorial
on generators and related theory
• Will be looking at a series of examples
• I don't know if the code I've written is the
"best" way to solve any of these problems.
• So, let's have a discussion
Performance Details
• There are some later performance numbers
• Python 2.5.1 on OS X 10.4.11
• All tests were conducted on the following:
• Mac Pro 2x2.66 Ghz Dual-Core Xeon
• 3 Gbytes RAM
• WDC WD2500JS-41SGB0 Disk (250G)
• Timings are 3-run average of 'time' command
Part I
Introduction to Iterators and Generators
Iteration
• As you know, Python has a "for" statement
• You use it to loop over a collection of items
>>> for x in [1,4,5,10]:
... print x,
...
1 4 5 10
>>>
• And, as you have probably noticed, you can
iterate over many different kinds of objects
(not just lists)
Iterating over a Dict
• If you loop over a dictionary you get keys
>>> prices = { 'GOOG' : 490.10,
... 'AAPL' : 145.23,
... 'YHOO' : 21.71 }
...
>>> for key in prices:
... print key
...
YHOO
GOOG
AAPL
>>>
Iterating over a String
• If you loop over a string, you get characters
>>> s = "Yow!"
>>> for c in s:
... print c
...
Y
o
w
!
>>>
Iterating over a File
• If you loop over a file you get lines
>>> for line in open("real.txt"):
... print line,
...
Real Programmers write in FORTRAN
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.
Consuming Iterables
• Many functions consume an "iterable" object
• Reductions:
sum(s), min(s), max(s)
• Constructors
list(s), tuple(s), set(s), dict(s)
• in operator
item in s
• Many others in the library
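The bullets above can be seen in action with a tiny sketch (Python 3 syntax; the deck itself predates it):

```python
# Several functions consuming the same iterable: reductions,
# a constructor, and the "in" operator.
s = [3, 1, 4, 1, 5]
total, smallest, largest = sum(s), min(s), max(s)
print(total, smallest, largest)   # → 14 1 5
items = tuple(s)                  # constructors consume iterables too
print(items)                      # → (3, 1, 4, 1, 5)
print(4 in s)                     # → True
```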
Iteration Protocol
• The reason why you can iterate over different
objects is that there is a specific protocol
>>> items = [1, 4, 5]
>>> it = iter(items)
>>> it.next()
1
>>> it.next()
4
>>> it.next()
5
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
Iteration Protocol
• An inside look at the for statement
for x in obj:
# statements
• Underneath the covers
_iter = iter(obj) # Get iterator object
while 1:
try:
x = _iter.next() # Get next item
except StopIteration: # No more items
break
# statements
...
• Any object that supports iter() and next() is
said to be "iterable."
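The desugared loop above can be run directly. A minimal sketch in Python 3 syntax (where `it.next()` became the builtin `next(it)`); the helper name `manual_for` is our own:

```python
# What the for statement does under the covers, written out by hand.
def manual_for(obj, action):
    _iter = iter(obj)           # Get iterator object
    while True:
        try:
            x = next(_iter)     # Get next item
        except StopIteration:   # No more items
            break
        action(x)               # The loop body

result = []
manual_for([1, 4, 5, 10], result.append)
print(result)   # → [1, 4, 5, 10]
```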
Supporting Iteration
• User-defined objects can support iteration
• Example: Counting down...
>>> for x in countdown(10):
... print x,
...
10 9 8 7 6 5 4 3 2 1
>>>
• To do this, you just have to make the object
implement __iter__() and next()
Supporting Iteration
• Sample implementation
class countdown(object):
def __init__(self,start):
self.count = start
def __iter__(self):
return self
def next(self):
if self.count <= 0:
raise StopIteration
r = self.count
self.count -= 1
return r
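For reference, a sketch of the same class in Python 3 spelling, where the protocol method `next()` is renamed `__next__()`:

```python
# The countdown iterator, Python 3 style.
class countdown:
    def __init__(self, start):
        self.count = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r

print(list(countdown(5)))   # → [5, 4, 3, 2, 1]
```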
Iteration Example
• Example use:
>>> c = countdown(5)
>>> for i in c:
... print i,
...
5 4 3 2 1
>>>
Iteration Commentary
• There are many subtle details involving the
design of iterators for various objects
• However, we're not going to cover that
• This isn't a tutorial on "iterators"
• We're talking about generators...
Generators
• A generator is a function that produces a
sequence of results instead of a single value
def countdown(n):
while n > 0:
yield n
n -= 1
>>> for i in countdown(5):
... print i,
...
5 4 3 2 1
>>>
• Instead of returning a value, you generate a
series of values (using the yield statement)
Generators
• Behavior is quite different than normal func
• Calling a generator function creates a
generator object. However, it does not start
running the function.
def countdown(n):
    print "Counting down from", n
    while n > 0:
        yield n
        n -= 1
>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>>
(Notice that no output was produced.)
Generator Functions
• The function only executes on next()
>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>> x.next()
Counting down from 10     <-- function starts executing here
10
>>>
• yield produces a value, but suspends the function
• Function resumes on next call to next()
>>> x.next()
9
>>> x.next()
8
>>>
Generator Functions
• When the generator returns, iteration stops
>>> x.next()
1
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
Generator Functions
• A generator function is mainly a more
convenient way of writing an iterator
• You don't have to worry about the iterator
protocol (.next, .__iter__, etc.)
• It just works
Generators vs. Iterators
• A generator function is slightly different
than an object that supports iteration
• A generator is a one-time operation. You
can iterate over the generated data once,
but if you want to do it again, you have to
call the generator function again.
• This is different than a list (which you can
iterate over as many times as you want)
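The one-time behavior described above is easy to see in a short sketch (Python 3 syntax):

```python
# A generator can be consumed only once; a second pass yields nothing.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
first = list(c)
second = list(c)
print(first)    # → [3, 2, 1]
print(second)   # → []  (the generator is already exhausted)
```

To iterate again, call `countdown(3)` a second time to get a fresh generator object.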
Generator Expressions
• A generated version of a list comprehension
>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x58760>
>>> for i in b: print i,
...
2 4 6 8
>>>
• This loops over a sequence of items and applies
an operation to each item
• However, results are produced one at a time
using a generator
Generator Expressions
• Important differences from a list comp.
• Does not construct a list.
• Only useful purpose is iteration
• Once consumed, can't be reused
• Example:
>>> a = [1,2,3,4]
>>> b = [2*x for x in a]
>>> b
[2, 4, 6, 8]
>>> c = (2*x for x in a)
>>> c
<generator object at 0x58760>
>>>
Generator Expressions
• General syntax
(expression for i in s if cond1
            for j in t if cond2
            ...
            if condfinal)
• What it means:
for i in s:
    if cond1:
        for j in t:
            if cond2:
                ...
                if condfinal:
                    yield expression
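A small sketch of the general syntax above with two loops and two conditions (Python 3 syntax; the data is made up for illustration):

```python
# All pairs (i, j) where i is even and j is larger than i.
s = [1, 2, 3, 4]
t = [3, 4, 5]
pairs = ((i, j) for i in s if i % 2 == 0
                for j in t if j > i)
result = list(pairs)
print(result)   # → [(2, 3), (2, 4), (2, 5), (4, 5)]
```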
A Note on Syntax
• The parens on a generator expression can be
dropped if used as a single function argument
• Example:
sum(x*x for x in s)     # <-- generator expression
Interlude
• We now have two basic building blocks
• Generator functions:
def countdown(n):
while n > 0:
yield n
n -= 1
• Generator expressions
squares = (x*x for x in s)
• In both cases, we get an object that
generates values (which are typically
consumed in a for loop)
Part 2
Processing Data Files
(Show me your Web Server Logs)
Programming Problem
Find out how many bytes of data were
transferred by summing up the last column
of data in this Apache web server log
81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587
81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133
81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903
81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359
66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447
Oh yeah, and the log file might be huge (Gbytes)
The Log File
• Each line of the log looks like this:
81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
• The number of bytes is the last column
bytestr = line.rsplit(None,1)[1]
• It's either a number or a missing value (-)
81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -
• Converting the value
if bytestr != '-':
    bytes = int(bytestr)
A Non-Generator Solution
• Just do a simple for-loop
wwwlog = open("access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)
print "Total", total
• We read line-by-line and just update a sum
• However, that's so 90s...
A Generator Solution
• Let's use some generator expressions
wwwlog = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
• Whoa! That's different!
• Less code
• A completely different programming style
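The same two-stage pipeline can be tried on a few in-memory lines instead of a real access-log. A Python 3 sketch (the sample lines are made up; `nbytes` is used to avoid shadowing the `bytes` builtin):

```python
loglines = [
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587',
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -',
    '66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447',
]

# Stage 1: extract the last column of each line
bytecolumn = (line.rsplit(None, 1)[1] for line in loglines)
# Stage 2: convert to int, skipping the '-' placeholder
nbytes = (int(x) for x in bytecolumn if x != '-')

total = sum(nbytes)
print("Total", total)   # Total 12034
```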
Generators as a Pipeline
• To understand the solution, think of it as a data
processing pipeline
access-log → wwwlog → bytecolumn → bytes → sum() → total
• Each step is defined by iteration/generation
wwwlog = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
Being Declarative
• At each step of the pipeline, we declare an
operation that will be applied to the entire
input stream
access-log → wwwlog → bytecolumn → bytes → sum() → total
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
This operation gets applied to
every line of the log file
Being Declarative
• Instead of focusing on the problem at a
line-by-line level, you just break it down
into big operations that operate on the
whole file
• This is very much a "declarative" style
• The key : Think big...
Iteration is the Glue
• The glue that holds the pipeline together is the
iteration that occurs in each step
wwwlog = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
• The calculation is being driven by the last step
• The sum() function is consuming values being
pushed through the pipeline (via .next() calls)
Performance
• Surely, this generator approach has all
sorts of fancy-dancy magic that is slow.
• Let's check it out on a 1.3Gb log file...
% ls -l big-access-log
-rw-r--r-- beazley 1303238000 Feb 29 08:06 big-access-log
Performance Contest
wwwlog = open("big-access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)
print "Total", total
Time: 27.20

wwwlog = open("big-access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
Time: 25.96
Commentary
• Not only was it not slow, it was 5% faster
• And it was less code
• And it was relatively easy to read
• And frankly, I like it a whole lot better...
"Back in the old days, we used AWK for this and
we liked it. Oh, yeah, and get off my lawn!"
Performance Contest
wwwlog = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
Time: 25.96

% awk '{ total += $NF } END { print total }' big-access-log
Time: 37.33
Note: extracting the last column may not be awk's strong point
Food for Thought
• At no point in our generator solution did
we ever create large temporary lists
• Thus, not only is that solution faster, it can
be applied to enormous data files
• It's competitive with traditional tools
More Thoughts
• The generator solution was based on the
concept of pipelining data between
different components
• What if you had more advanced kinds of
components to work with?
• Perhaps you could perform different kinds
of processing by just plugging various
pipeline components together
This Sounds Familiar
• The Unix philosophy
• Have a collection of useful system utils
• Can hook these up to files or each other
• Perform complex tasks by piping data
Part 3
Fun with Files and Directories
Programming Problem
You have hundreds of web server logs scattered
across various directories. In addition, some of
the logs are compressed. Modify the last program
so that you can easily read all of these logs
foo/
    access-log-012007.gz
    access-log-022007.gz
    access-log-032007.gz
    ...
    access-log-012008
bar/
    access-log-092007.bz2
    ...
    access-log-022008
os.walk()
• A very useful function for searching the
file system
import os
for path, dirlist, filelist in os.walk(topdir):
    # path : Current directory
    # dirlist : List of subdirectories
    # filelist : List of files
    ...
• This utilizes generators to recursively walk
through the file system
find
• Generate all filenames in a directory tree
that match a given filename pattern
import os
import fnmatch
def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)
• Examples
pyfiles = gen_find("*.py","/")
logs = gen_find("access-log*","/usr/www/")
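gen_find can be exercised on a throwaway directory tree rather than a real web root. A Python 3 sketch (the file names here are invented just to have something to match):

```python
import os
import fnmatch
import tempfile

def gen_find(filepat, top):
    # Walk the tree, yielding full paths that match the pattern
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

# Build a small scratch tree and search it
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "foo"))
    for relname in ("access-log-012007", "notes.txt",
                    os.path.join("foo", "access-log-022007")):
        open(os.path.join(top, relname), "w").close()

    found = sorted(os.path.basename(p)
                   for p in gen_find("access-log*", top))

print(found)   # ['access-log-012007', 'access-log-022007']
```

Note that nothing happens until you iterate: gen_find yields matches lazily as os.walk visits each directory.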
Performance Contest
pyfiles = gen_find("*.py","/")
for name in pyfiles:
    print name
Wall Clock Time: 559s

% find / -name '*.py'
Wall Clock Time: 468s

Performed on a 750GB file system
containing about 140000 .py files
A File Opener
• Open a sequence of filenames
import gzip, bz2
def gen_open(filenames):
    for name in filenames:
        if name.endswith(".gz"):
            yield gzip.open(name)
        elif name.endswith(".bz2"):
            yield bz2.BZ2File(name)
        else:
            yield open(name)
• This is interesting.... it takes a sequence of
filenames as input and yields a sequence of open
file objects
cat
• Concatenate items from one or more
sources into a single sequence of items
def gen_cat(sources):
    for s in sources:
        for item in s:
            yield item
• Example:
lognames = gen_find("access-log*", "/usr/www")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
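gen_cat is the same operation that the standard library provides as itertools.chain.from_iterable; a quick Python 3 check on small in-memory sequences:

```python
from itertools import chain

def gen_cat(sources):
    # Yield every item from every source, in order
    for s in sources:
        for item in s:
            yield item

sources = [[1, 2], (3, 4), "ab"]
cat_items = list(gen_cat(sources))

print(cat_items)   # [1, 2, 3, 4, 'a', 'b']
```

Both versions work with any mix of iterables, including other generators, and neither builds the combined sequence in memory.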
grep
• Generate a sequence of lines that contain
a given regular expression
import re
def gen_grep(pat, lines):
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line): yield line
• Example:
lognames = gen_find("access-log*", "/usr/www")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
patlines = gen_grep(pat, loglines)
Example
• Find out how many bytes transferred for a
specific pattern in a whole directory of logs
pat = r"somepattern"
logdir = "/some/dir/"
filenames = gen_find("access-log*",logdir)
logfiles = gen_open(filenames)
loglines = gen_cat(logfiles)
patlines = gen_grep(pat,loglines)
bytecolumn = (line.rsplit(None,1)[1] for line in patlines)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
Important Concept
• Generators decouple iteration from the
code that uses the results of the iteration
• In the last example, we're performing a
calculation on a sequence of lines
• It doesn't matter where or how those
lines are generated
• Thus, we can plug any number of
components together up front as long as
they eventually produce a line sequence
Part 4
Parsing and Processing Data
Programming Problem
Web server logs consist of different columns of
data. Parse each line into a useful data structure
that allows us to easily inspect the different fields.
81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] "GET ..." 200 7587
host referrer user [datetime] "request" status bytes
Parsing with Regex
• Let's route the lines through a regex parser
logpats = r'(\S+) (\S+) (\S+) \[(.*?)\] ' \
          r'"(\S+) (\S+) (\S+)" (\S+) (\S+)'
logpat = re.compile(logpats)
groups = (logpat.match(line) for line in loglines)
tuples = (g.groups() for g in groups if g)
• This generates a sequence of tuples
('71.201.176.194', '-', '-', '26/Feb/2008:10:30:08 -0600',
 'GET', '/ply/ply.html', 'HTTP/1.1', '200', '97238')
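The pattern can be checked against a single made-up line before wiring it into the pipeline. A Python 3 sketch:

```python
import re

logpats = r'(\S+) (\S+) (\S+) \[(.*?)\] ' \
          r'"(\S+) (\S+) (\S+)" (\S+) (\S+)'
logpat = re.compile(logpats)

line = ('81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] '
        '"GET /ply/ply.html HTTP/1.1" 200 97238')

m = logpat.match(line)
print(m.groups())
# ('81.107.39.38', '-', '-', '24/Feb/2008:00:08:59 -0600',
#  'GET', '/ply/ply.html', 'HTTP/1.1', '200', '97238')
```

The non-greedy `(.*?)` is what keeps the datetime capture from running past the closing bracket.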
Tuples to Dictionaries
• Let's turn tuples into dictionaries
colnames = ('host','referrer','user','datetime',
            'method','request','proto','status','bytes')
log = (dict(zip(colnames,t)) for t in tuples)
• This generates a sequence of named fields
{ 'status' : '200',
  'proto' : 'HTTP/1.1',
  'referrer': '-',
  'request' : '/ply/ply.html',
  'bytes' : '97238',
  'datetime': '24/Feb/2008:00:08:59 -0600',
  'host' : '140.180.132.213',
  'user' : '-',
  'method' : 'GET'}
Field Conversion
• Map specific dictionary fields through a function
def field_map(dictseq,name,func):
    for d in dictseq:
        d[name] = func(d[name])
        yield d
• Example: Convert a few field values
log = field_map(log,"status", int)
log = field_map(log,"bytes",
                lambda s: int(s) if s != '-' else 0)
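Stacked field_map stages compose like any other pipeline step. A Python 3 sketch on two hand-made records:

```python
def field_map(dictseq, name, func):
    # Convert one named field in each dictionary as it flows by
    for d in dictseq:
        d[name] = func(d[name])
        yield d

records = [{'status': '200', 'bytes': '7587'},
           {'status': '304', 'bytes': '-'}]

log = field_map(iter(records), 'status', int)
log = field_map(log, 'bytes', lambda s: int(s) if s != '-' else 0)

for r in log:
    print(r)
# {'status': 200, 'bytes': 7587}
# {'status': 304, 'bytes': 0}
```

No conversion happens until the final for loop pulls records through both stages.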
Field Conversion
• Creates dictionaries of converted values
{ 'status': 200,
  'proto': 'HTTP/1.1',
  'referrer': '-',
  'request': '/ply/ply.html',
  'datetime': '24/Feb/2008:00:08:59 -0600',
  'bytes': 97238,
  'host': '140.180.132.213',
  'user': '-',
  'method': 'GET'}
(Note the converted 'status' and 'bytes' values)
• Again, this is just one big processing pipeline
The Code So Far
lognames = gen_find("access-log*","www")
logfiles = gen_open(lognames)
loglines = gen_cat(logfiles)
groups = (logpat.match(line) for line in loglines)
tuples = (g.groups() for g in groups if g)
colnames = ('host','referrer','user','datetime','method',
            'request','proto','status','bytes')
log = (dict(zip(colnames,t)) for t in tuples)
log = field_map(log,"bytes",
                lambda s: int(s) if s != '-' else 0)
log = field_map(log,"status",int)
Packaging
• To make it more sane, you may want to package
parts of the code into functions
def lines_from_dir(filepat, dirname):
    names = gen_find(filepat,dirname)
    files = gen_open(names)
    lines = gen_cat(files)
    return lines
• This is a general-purpose function that reads all
lines from a series of files in a directory
Packaging
• Parse an Apache log
def apache_log(lines):
    groups = (logpat.match(line) for line in lines)
    tuples = (g.groups() for g in groups if g)
    colnames = ('host','referrer','user','datetime','method',
                'request','proto','status','bytes')
    log = (dict(zip(colnames,t)) for t in tuples)
    log = field_map(log,"bytes",
                    lambda s: int(s) if s != '-' else 0)
    log = field_map(log,"status",int)
    return log
Example Use
• It's easy
lines = lines_from_dir("access-log*","www")
log = apache_log(lines)
for r in log:
    print r
• Different components have been subdivided
according to the data that they process
A Query Language
• Now that we have our log, let's do some queries
• Find the set of all documents that 404
stat404 = set(r['request'] for r in log
              if r['status'] == 404)
• Print all requests that transfer over a megabyte
large = (r for r in log
         if r['bytes'] > 1000000)
for r in large:
    print r['request'], r['bytes']
A Query Language
• Find the largest data transfer
print "%d %s" % max((r['bytes'],r['request'])
                    for r in log)
• Collect all unique host IP addresses
hosts = set(r['host'] for r in log)
• Find the number of downloads of a file
sum(1 for r in log
    if r['request'] == '/ply/ply-2.3.tar.gz')
A Query Language
• Find out who has been hitting robots.txt
addrs = set(r['host'] for r in log
            if 'robots.txt' in r['request'])
import socket
for addr in addrs:
    try:
        print socket.gethostbyaddr(addr)[0]
    except socket.herror:
        print addr
Performance Study
• Sadly, the last example doesn't run so fast on a
huge input file (53 minutes on the 1.3GB log)
• But, the beauty of generators is that you can plug
filters in at almost any stage
lines = lines_from_dir("big-access-log",".")
lines = (line for line in lines if 'robots.txt' in line)
log = apache_log(lines)
addrs = set(r['host'] for r in log)
...
• That version takes 93 seconds
Some Thoughts
• I like the idea of using generator expressions as a
pipeline query language
• You can write simple filters, extract data, etc.
• If you pass dictionaries/objects through the
pipeline, it becomes quite powerful
• Feels similar to writing SQL queries
Part 5
Processing Infinite Data
Question
• Have you ever used 'tail -f' in Unix?
% tail -f logfile
...
... lines of output ...
...
• This prints the lines written to the end of a file
• The "standard" way to watch a log file
• I used this all of the time when working on
scientific simulations ten years ago...
Infinite Sequences
• Tailing a log file results in an "infinite" stream
• It constantly watches the file and yields lines as
soon as new data is written
• But you don't know how much data will actually
be written (in advance)
• And log files can often be enormous
Tailing a File
• A Python version of 'tail -f'
import time
def follow(thefile):
    thefile.seek(0,2) # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1) # Sleep briefly
            continue
        yield line
• Idea : Seek to the end of the file and repeatedly
try to read new lines. If new data is written to
the file, we'll pick it up.
Example
• Using our follow function
logfile = open("access-log")
loglines = follow(logfile)
for line in loglines:
    print line,
• This produces the same output as 'tail -f'
Example
• Turn the real-time log file into records
logfile = open("access-log")
loglines = follow(logfile)
log = apache_log(loglines)
• Print out all 404 requests as they happen
r404 = (r for r in log if r['status'] == 404)
for r in r404:
    print r['host'],r['datetime'],r['request']
Commentary
• We just plugged this new input scheme onto
the front of our processing pipeline
• Everything else still works, with one caveat:
functions that consume an entire iterable won't
terminate (min, max, sum, set, etc.)
• Nevertheless, we can easily write processing
steps that operate on an infinite data stream
Thoughts
• This data pipeline idea is really quite powerful
• Captures a lot of common systems problems
• Especially consumer-producer problems
Part 6
Feeding the Pipeline
Feeding Generators
• In order to feed a generator processing
pipeline, you need to have an input source
• So far, we have looked at two file-based inputs
• Reading a file
lines = open(filename)
• Tailing a file
lines = follow(open(filename))
Generating Connections
• Generate a sequence of TCP connections
import socket
def receive_connections(addr):
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(addr)
    s.listen(5)
    while True:
        client = s.accept()
        yield client
• Example:
for c,a in receive_connections(("",9000)):
    c.send("Hello World\n")
    c.close()
Generating Messages
• Receive a sequence of UDP messages
import socket
def receive_messages(addr,maxsize):
    s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
    s.bind(addr)
    while True:
        msg = s.recvfrom(maxsize)
        yield msg
• Example:
for msg, addr in receive_messages(("",10000),1024):
    print msg, "from", addr
I/O Multiplexing
• Generating I/O events on a set of sockets
import select
def gen_events(socks):
    while True:
        rdr,wrt,err = select.select(socks,socks,socks,0.1)
        for r in rdr:
            yield "read",r
        for w in wrt:
            yield "write",w
        for e in err:
            yield "error",e
• Note: Using this one is a little tricky
• Example : Reading from multiple client sockets
I/O Multiplexing
clientset = []
def acceptor(sockset,addr):
    for c,a in receive_connections(addr):
        sockset.append(c)
acc_thr = threading.Thread(target=acceptor,
                           args=(clientset,("",12000)))
acc_thr.setDaemon(True)
acc_thr.start()
for evt,s in gen_events(clientset):
    if evt == 'read':
        data = s.recv(1024)
        if not data:
            print "Closing", s
            s.close()
            clientset.remove(s)
        else:
            print s,data
Consuming a Queue
• Generate a sequence of items from a queue
def consume_queue(thequeue):
    while True:
        item = thequeue.get()
        if item is StopIteration: break
        yield item
• Note: Using StopIteration as a sentinel
• Might be used to feed a generator pipeline as a
consumer thread
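The same idea in Python 3, where the module is named queue rather than Queue; with a finite feed it can be checked without threads:

```python
import queue

def consume_queue(thequeue):
    while True:
        item = thequeue.get()
        if item is StopIteration:   # sentinel marking end of data
            break
        yield item

q = queue.Queue()
for i in range(3):
    q.put(i)
q.put(StopIteration)

items = list(consume_queue(q))
print(items)   # [0, 1, 2]
```

Using the StopIteration class itself as the sentinel is handy because no real data item can be identical to it.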
Consuming a Queue
• Example:
import Queue, threading
def consumer(q):
    for item in consume_queue(q):
        print "Consumed", item
    print "Done"
in_q = Queue.Queue()
con_thr = threading.Thread(target=consumer,args=(in_q,))
con_thr.start()
for i in xrange(100):
    in_q.put(i)
in_q.put(StopIteration)
Part 7
Extending the Pipeline
Multiple Processes
• Can you extend a processing pipeline across
processes and machines?
process 1 → (pipe/socket) → process 2
Pickler/Unpickler
• Turn a generated sequence into pickled objects
import pickle
def gen_pickle(source):
    for item in source:
        yield pickle.dumps(item)
def gen_unpickle(infile):
    while True:
        try:
            item = pickle.load(infile)
            yield item
        except EOFError:
            return
• Now, attach these to a pipe or socket
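The pickle/unpickle pair round-trips cleanly through any file-like object, so it can be checked with an in-memory buffer before involving a socket. A Python 3 sketch:

```python
import io
import pickle

def gen_pickle(source):
    for item in source:
        yield pickle.dumps(item)

def gen_unpickle(infile):
    # pickle.load raises EOFError when the stream is exhausted
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return

# Round-trip through an in-memory "file" instead of a socket
buf = io.BytesIO()
for chunk in gen_pickle([{'status': 200}, {'status': 404}]):
    buf.write(chunk)
buf.seek(0)

items = list(gen_unpickle(buf))
print(items)   # [{'status': 200}, {'status': 404}]
```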
Sender/Receiver
• Example: Sender
def sendto(source,addr):
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.connect(addr)
    for pitem in gen_pickle(source):
        s.sendall(pitem)
    s.close()
• Example: Receiver
def receivefrom(addr):
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
    s.bind(addr)
    s.listen(5)
    c,a = s.accept()
    for item in gen_unpickle(c.makefile()):
        yield item
    c.close()
Example Use
• Example: Read log lines and parse into records
# netprod.py
lines = follow(open("access-log"))
log = apache_log(lines)
sendto(log,("",15000))
• Example: Pick up the log on another machine
# netcons.py
for r in receivefrom(("",15000)):
    print r
Fanning Out
• In all of our examples, the processing pipeline is
driven by a single consumer
for item in gen:
    # Consume item
• Can you expand the pipeline to multiple
consumers?
generator → consumer1, consumer2, consumer3
Broadcasting
• Consume a generator and send items to a set
of consumers
def broadcast(source, consumers):
    for item in source:
        for c in consumers:
            c.send(item)
• This changes the control-flow
• The broadcaster is what consumes items
• Those items have to be sent to consumers for
processing
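The control-flow inversion is visible with a trivial consumer that just records what it receives. A Python 3 sketch (the Collector class is invented for illustration):

```python
def broadcast(source, consumers):
    # The broadcaster drives the iteration and pushes to consumers
    for item in source:
        for c in consumers:
            c.send(item)

class Collector:
    # Illustrative consumer: anything with a send() method works
    def __init__(self):
        self.items = []
    def send(self, item):
        self.items.append(item)

c1, c2 = Collector(), Collector()
broadcast(iter([1, 2, 3]), [c1, c2])
print(c1.items, c2.items)   # [1, 2, 3] [1, 2, 3]
```

Each consumer sees every item, in order; the source is consumed exactly once.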
Consumers
• To create a consumer, define an object with a
send method on it
class Consumer(object):
    def send(self,item):
        print self, "got", item
• Example:
c1 = Consumer()
c2 = Consumer()
c3 = Consumer()
lines = follow(open("access-log"))
broadcast(lines,[c1,c2,c3])
Consumers
• Sadly, inside consumers, it is not possible to
continue the same processing pipeline idea
• In order for it to work, there has to be a single
iteration that is driving the pipeline
• With multiple consumers, you would have to be
iterating in more than one location at once
• You can do this with threads or distributed
processes however
Network Consumer
• Example:
import socket,pickle
class NetConsumer(object):
    def __init__(self,addr):
        self.s = socket.socket(socket.AF_INET,
                               socket.SOCK_STREAM)
        self.s.connect(addr)
    def send(self,item):
        pitem = pickle.dumps(item)
        self.s.sendall(pitem)
    def close(self):
        self.s.close()
• This will route items to a network receiver
Network Consumer
• Example Usage:
class Stat404(NetConsumer):
    def send(self,item):
        if item['status'] == 404:
            NetConsumer.send(self,item)
lines = follow(open("access-log"))
log = apache_log(lines)
stat404 = Stat404(("somehost",15000))
broadcast(log, [stat404])
• The 404 entries will go elsewhere...
Consumer Thread
• Example:
import Queue, threading
class ConsumerThread(threading.Thread):
    def __init__(self,target):
        threading.Thread.__init__(self)
        self.setDaemon(True)
        self.in_queue = Queue.Queue()
        self.target = target
    def send(self,item):
        self.in_queue.put(item)
    def generate(self):
        while True:
            item = self.in_queue.get()
            yield item
    def run(self):
        self.target(self.generate())
Consumer Thread
• Sample usage (building on earlier code)
def find_404(log):
    for r in (r for r in log if r['status'] == 404):
        print r['status'],r['datetime'],r['request']
def bytes_transferred(log):
    total = 0
    for r in log:
        total += r['bytes']
    print "Total bytes", total
c1 = ConsumerThread(find_404)
c1.start()
c2 = ConsumerThread(bytes_transferred)
c2.start()
lines = follow(open("access-log"))  # Follow a log
log = apache_log(lines)             # Turn into records
broadcast(log,[c1,c2])              # Broadcast to consumers
Multiple Sources
• In all of our examples, the processing pipeline is
being fed by a single source
• But, what if you had multiple sources?
source1 source2 source3
Concatenation
• Concatenate one source after another
def concatenate(sources):
    for s in sources:
        for item in s:
            yield item
• This generates one big sequence
• Consumes each generator one at a time
• Only works with generators that terminate
Parallel Iteration
• Zipping multiple generators together
import itertools
z = itertools.izip(s1,s2,s3)
• This one is only marginally useful
• Requires generators to go lock-step
• Terminates when the first exits
Multiplexing
• Consume from multiple generators in real
time, producing values as they are generated
• Example use
log1 = follow(open("foo/access-log"))
log2 = follow(open("bar/access-log"))
lines = gen_multiplex([log1,log2])
• There is no way to poll a generator. So, how do
you do this?
Multiplexing Generators
def gen_multiplex(genlist):
    item_q = Queue.Queue()
    def run_one(source):
        for item in source: item_q.put(item)
    def run_all():
        thrlist = []
        for source in genlist:
            t = threading.Thread(target=run_one,args=(source,))
            t.start()
            thrlist.append(t)
        for t in thrlist: t.join()
        item_q.put(StopIteration)
    threading.Thread(target=run_all).start()
    while True:
        item = item_q.get()
        if item is StopIteration: return
        yield item
Multiplexing Generators
• Each generator runs in a thread (run_one) and
drops its items onto a shared queue
Multiplexing Generators
• The final while loop pulls items off the queue
and yields them
Multiplexing Generators
• run_all() launches all of the generators, waits for
them to terminate, then puts a sentinel
(StopIteration) on the queue
Part 8
Various Programming Tricks (And Debugging)
Putting it all Together
• This data processing pipeline idea is powerful
• But, it's also potentially mind-boggling
• Especially when you have dozens of pipeline
stages, broadcasting, multiplexing, etc.
• Let's look at a few useful tricks
Creating Generators
• Any single-argument function is easy to turn
into a generator function
def generate(func):
    def gen_func(s):
        for item in s:
            yield func(item)
    return gen_func
• Example:
gen_sqrt = generate(math.sqrt)
for x in gen_sqrt(xrange(100)):
    print x
Debug Tracing
• A debugging function that will print items going
through a generator
def trace(source):
    for item in source:
        print item
        yield item
• This can easily be placed around any generator
lines = follow(open("access-log"))
log = trace(apache_log(lines))
r404 = trace(r for r in log if r['status'] == 404)
• Note: Might consider logging module for this
Recording the Last Item
• Store the last item generated in the generator
class storelast(object):
    def __init__(self,source):
        self.source = source
    def next(self):
        item = self.source.next()
        self.last = item
        return item
    def __iter__(self):
        return self
• This can be easily wrapped around a generator
lines = storelast(follow(open("access-log")))
log = apache_log(lines)
for r in log:
    print r
    print lines.last
Shutting Down
• Generators can be shut down using .close()
import time
def follow(thefile):
    thefile.seek(0,2) # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1) # Sleep briefly
            continue
        yield line
• Example:
lines = follow(open("access-log"))
for i,line in enumerate(lines):
    print line,
    if i == 10: lines.close()
Shutting Down
• In the generator, GeneratorExit is raised
import time
def follow(thefile):
    thefile.seek(0,2) # Go to the end of the file
    try:
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1) # Sleep briefly
                continue
            yield line
    except GeneratorExit:
        print "Follow: Shutting down"
• This allows for resource cleanup (if needed)
Ignoring Shutdown
• Question: Can you ignore GeneratorExit?
import time
def follow(thefile):
    thefile.seek(0,2) # Go to the end of the file
    while True:
        try:
            line = thefile.readline()
            if not line:
                time.sleep(0.1) # Sleep briefly
                continue
            yield line
        except GeneratorExit:
            print "Forget about it"
• Answer: No. You'll get a RuntimeError
Shutdown and Threads
• Question : Can a thread shutdown a generator
running in a different thread?
lines = follow(open("foo/test.log"))
def sleep_and_close(s):
    time.sleep(s)
    lines.close()
threading.Thread(target=sleep_and_close,args=(30,)).start()
for line in lines:
    print line,
Shutdown and Threads
• Separate threads can not call .close()
• Output:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/
lib/python2.5/threading.py", line 460, in __bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.5/
lib/python2.5/threading.py", line 440, in run
    self.__target(*self.__args, **self.__kwargs)
  File "genfollow.py", line 31, in sleep_and_close
    lines.close()
ValueError: generator already executing
Shutdown and Signals
• Can you shutdown a generator with a signal?
import signal
def sigusr1(signo,frame):
    print "Closing it down"
    lines.close()
signal.signal(signal.SIGUSR1,sigusr1)
lines = follow(open("access-log"))
for line in lines:
    print line,
• From the command line
% kill -USR1 pid
Shutdown and Signals
• This also fails:
Traceback (most recent call last):
  File "genfollow.py", line 35, in <module>
    for line in lines:
  File "genfollow.py", line 8, in follow
    time.sleep(0.1)
  File "genfollow.py", line 30, in sigusr1
    lines.close()
ValueError: generator already executing
• Sigh.
Shutdown
• The only way to externally shut down a
generator would be to instrument it with a flag or
some kind of check
def follow(thefile,shutdown=None):
    thefile.seek(0,2)
    while True:
        if shutdown and shutdown.isSet(): break
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line
Shutdown
• Example:
import threading,signal
shutdown = threading.Event()
def sigusr1(signo,frame):
    print "Closing it down"
    shutdown.set()
signal.signal(signal.SIGUSR1,sigusr1)
lines = follow(open("access-log"),shutdown)
for line in lines:
    print line,
Part 9
Co-routines
64. The Final Frontier
• In Python 2.5, generators picked up the ability
to receive values using .send()
def recv_count():
    try:
        while True:
            n = (yield)     # Yield expression
            print "T-minus", n
    except GeneratorExit:
        print "Kaboom!"
• Think of this function as receiving values rather
than generating them
Example Use
• Using a receiver
>>> r = recv_count()
>>> r.next()              # Note: must call .next() here
>>> for i in range(5,0,-1):
... r.send(i)
...
T-minus 5
T-minus 4
T-minus 3
T-minus 2
T-minus 1
>>> r.close()
Kaboom!
>>>
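In Python 3 the same receiver is written with next(r) instead of r.next() and print as a function; a runnable sketch:

```python
def recv_count():
    # A receiver: consumes values pushed in with .send()
    try:
        while True:
            n = yield              # yield expression receives the sent value
            print("T-minus", n)
    except GeneratorExit:          # raised inside the generator by .close()
        print("Kaboom!")

r = recv_count()
next(r)                            # prime: advance to the first yield
for i in range(5, 0, -1):
    r.send(i)
r.close()
```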
Co-routines
• This form of a generator is a "co-routine"
• Also sometimes called a "reverse-generator"
• Python books (mine included) do a pretty poor
job of explaining how co-routines are supposed
to be used
• I like to think of them as "receivers" or
"consumers". They receive values sent to them.
Setting up a Coroutine
• To get a co-routine to run properly, you have to
ping it with a .next() operation first
def recv_count():
    try:
        while True:
            n = (yield)     # Yield expression
            print "T-minus", n
    except GeneratorExit:
        print "Kaboom!"
• Example:
r = recv_count()
r.next()
• This advances it to the first yield--where it will
receive its first value
@consumer decorator
• The .next() bit can be handled via decoration
def consumer(func):
    def start(*args,**kwargs):
        c = func(*args,**kwargs)
        c.next()
        return c
    return start
• Example:
@consumer
def recv_count():
    try:
        while True:
            n = (yield)     # Yield expression
            print "T-minus", n
    except GeneratorExit:
        print "Kaboom!"
@consumer decorator
• Using the decorated version
>>> r = recv_count()
>>> for i in range(5,0,-1):
... r.send(i)
...
T-minus 5
T-minus 4
T-minus 3
T-minus 2
T-minus 1
>>> r.close()
Kaboom!
>>>
• Don't need the extra .next() step here
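A Python 3 sketch of the decorator (next(c) instead of c.next(); the functools.wraps call and the running-total example are my additions to show that send() can also hand a value back):

```python
from functools import wraps

def consumer(func):
    # Wrap a generator function so it comes back pre-primed
    @wraps(func)
    def start(*args, **kwargs):
        c = func(*args, **kwargs)
        next(c)                    # advance to the first yield
        return c
    return start

@consumer
def totaler():
    # Hypothetical example: keeps a running total of values sent in
    total = 0
    while True:
        n = yield total            # hand the current total back to send()
        total += n

t = totaler()
print(t.send(10))                  # 10
print(t.send(5))                   # 15
```

Because the decorator already advanced the coroutine to its first yield, the very first send() works; without priming it would raise TypeError.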
Coroutine Pipelines
• Co-routines also set up a processing pipeline
• Instead of being defined by iteration, it's
defined by pushing values into the pipeline
using .send()
source --.send()--> coroutine --.send()--> coroutine --.send()--> ...
• We already saw some of this with broadcasting
Broadcasting (Reprise)
• Consume a generator and send items to a set
of consumers
def broadcast(source, consumers):
    for item in source:
        for c in consumers:
            c.send(item)
• Notice the .send() operation there
• The consumers could be co-routines
Example
@consumer
def find_404():
    while True:
        r = (yield)
        if r['status'] == 404:
            print r['status'],r['datetime'],r['request']

@consumer
def bytes_transferred():
    total = 0
    while True:
        r = (yield)
        total += r['bytes']
        print "Total bytes", total

lines = follow(open("access-log"))
log   = apache_log(lines)
broadcast(log,[find_404(),bytes_transferred()])
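Since follow() and apache_log() need a live log file, here is a self-contained Python 3 sketch of the same fan-out feeding a hand-built list of record dicts (the field names match the deck's log records; the result-collecting lists are my addition so the effect is observable):

```python
def consumer(func):
    def start(*args, **kwargs):
        c = func(*args, **kwargs)
        next(c)                    # prime the coroutine
        return c
    return start

def broadcast(source, consumers):
    # Push every item from the source into each consumer coroutine
    for item in source:
        for c in consumers:
            c.send(item)

@consumer
def find_404(hits):
    while True:
        r = yield
        if r['status'] == 404:
            hits.append(r['request'])

@consumer
def bytes_transferred(totals):
    total = 0
    while True:
        r = yield
        total += r['bytes']
        totals.append(total)

# Hand-built records standing in for apache_log(follow(...))
log = [
    {'status': 200, 'request': '/index.html', 'bytes': 1024},
    {'status': 404, 'request': '/missing',    'bytes': 0},
    {'status': 200, 'request': '/logo.png',   'bytes': 512},
]
hits, totals = [], []
broadcast(log, [find_404(hits), bytes_transferred(totals)])
print(hits, totals[-1])
```

Each record flows through both coroutines on the same thread; neither consumer knows, or cares, that the other exists.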
Discussion
• In the last example, there were multiple consumers
• However, there were no threads
• Further exploration along these lines can take
you into co-operative multitasking, concurrent
programming without using threads
• That's an entirely different tutorial!
Wrap Up
The Big Idea
• Generators are an incredibly useful tool for a
variety of "systems"-related problems
• Power comes from the ability to set up
processing pipelines
• Can create components that plug into the
pipeline as reusable pieces
• Can extend the pipeline idea in many directions
(networking, threads, co-routines)
Code Reuse
• I like the way that code gets reused with
generators
• Small components that just process a data
stream
• Personally, I think this is much easier than what
you commonly see with OO patterns
Example
• SocketServer Module (Strategy Pattern)
import SocketServer
class HelloHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall("Hello World\n")

serv = SocketServer.TCPServer(("",8000),HelloHandler)
serv.serve_forever()
• My generator version
for c,a in receive_connections(("",8000)):
    c.send("Hello World\n")
    c.close()
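receive_connections() isn't defined in this section; one plausible Python 3 sketch, inferred from the usage above (the generator simply wraps the usual bind/listen/accept loop; any port number is arbitrary):

```python
import socket

def receive_connections(addr):
    # Generator version of a server accept loop:
    # yields (client_socket, client_address) pairs forever
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(addr)
    s.listen(5)
    while True:
        yield s.accept()

# Usage mirrors the slide (bytes instead of str on Python 3):
# for c, a in receive_connections(("", 8000)):
#     c.sendall(b"Hello World\n")
#     c.close()
```

Nothing happens until the first next() or iteration step, at which point the socket is bound; the accept loop is then driven entirely by the consumer.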
Pitfalls
• I don't think many programmers really
understand generators yet
• Springing this on the uninitiated might cause
their head to explode
• Error handling is really tricky because you have
lots of components chained together
• Need to pay careful attention to debugging,
reliability, and other issues.
Thanks!
• I hope you got some new ideas from this class
• Please feel free to contact me
http://www.dabeaz.com