C / C++ / Java : a lot of projects … definitely the reference languages. C# : going up, but still only a few projects. Ruby : not well established … yet? Only for web? Python : already has a lot of projects and has the lowest LOC per project.
Python For Large Company? - Presentation Transcript
Python In Large Companies? Sébastien Tandel sebastien.tandel@corp.terra.com.br sebastien.tandel@gmail.com
Plan
About Terra
The 7 steps
  Prototype
  Define the Goals
  Integration
  Some Libs
  Prove It Works
  Evangelize
  Next Steps
Conclusions
About Terra : Web Portal
Largest Latin American web portal
Located in 18 countries
1000s of servers
Brazil : ~7M unique visitors / day, ~70M pageviews / day
About Terra Source: Nielsen NetView (June 2009)
About Terra : Email Platform
I’m part of the email team. Some stats :
  +10M mailboxes
  +30M inbound emails per day
  +30M outbound emails per day
  avg : 300 mail/s, peak : 600 mail/s
Systems
  Main systems : SMTP, LMTP, POP, IMAP, Webmail
  Total of +30 systems to design/develop/maintain
Main languages used : C / C++
About Terra
Several “official” languages at Terra :
PHP, C, C++, Java, C#, Erlang
On average, one language per team!
No official “scripting” language (Python, Perl or other)
Why? From what I hear :
Performance
Integration with other systems (& legacy)
Costs / benefits?
Buzzword fear
Labor market
Flash Python Overview
Python is …
Interpreted
Dynamically typed
Really concise
Multi-paradigm : procedural, OO, functional
Exceptions : helpful for robustness and debugging (no strace ;))
Garbage collector : don’t worry about allocation/free
Step 1 : Prototype
Step 1 : Prototype
Buggy system rewritten as a prototype in Python
Surprise! It worked a lot better than its C cousin
The prototype is now in production!
Spread the word about this rewrite around me
  Some technical people liked the idea
  One was not so enthusiastic … my manager
Cons : no integration with homemade systems
Just one example
Step 1 : Prototype
Introducing new ideas is a long and tough road
Step 2 : Define the Goals
Performance critical systems : postfix, lmtp, imap / pop
Web-based systems : Webmail, ECP
Backend systems : spamreporter, cleaner, clx trainer, base trainer, mfsbuilder, migrador, nnentrega, smigol, …
Almost nonexistent systems (though interesting ones) : mailbox stats, log analysis (stats and user behavior characterization)
System / integration test scripts
The Grail :
Python can be used for ALL except performance critical systems
Step 3 : Integration
Step 3 : Integration
Python could be used with every system, but how can I interface with the homemade (legacy) systems?
Various ways to create Python bindings :
Python C API : the “hard” way
swig : the lazy way, but it won’t create a Pythonic API for you
ctypes : the stupidly easy way
    from ctypes import cdll
    l = cdll.LoadLibrary("libc.so.6")
    # mkdir(2) takes a byte-string path and a permission mode
    l.mkdir(b"python-mkdir-test", 0o755)
Cython : write Python, compile with gcc
Step 3 : Integration
Wrote bindings to interface with all major internal systems (thanks to ctypes)
With a Pythonic API!
Step 3 : Integration
    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")
    auth.attributes = ["short_name", "id_perm", "antispam"]
    # the binding is iterable and yields (attribute, value) pairs
    for attr, value in auth:
        print("%s : %s" % (attr, value))
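To make the "ctypes plus a Pythonic API" idea concrete without access to Terra's internal libraries, here is a minimal Python 3 sketch that wraps libc's `mkdir` instead. The wrapper name `make_dir` is hypothetical, and the code assumes a Unix system (Linux/glibc in particular) where `mode_t` can be passed as an unsigned int. It only illustrates the pattern: declare the C signature, accept Python strings, and raise an exception instead of returning -1.

```python
import ctypes
import ctypes.util
import os

# Assumption: a Unix libc is available; use_errno=True lets us read errno after a call.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the C signature: int mkdir(const char *path, mode_t mode)
_libc.mkdir.argtypes = [ctypes.c_char_p, ctypes.c_uint]
_libc.mkdir.restype = ctypes.c_int


def make_dir(path, mode=0o755):
    """Hypothetical Pythonic wrapper: takes a str path, raises OSError on failure."""
    if _libc.mkdir(os.fsencode(path), mode) != 0:
        err = ctypes.get_errno()
        # OSError picks the matching subclass (e.g. FileExistsError) from errno.
        raise OSError(err, os.strerror(err), path)
```

The caller never sees C return codes or byte strings, which is the kind of difference the slides point at between a raw binding and a Pythonic one.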
Step 4 : Some Libs
Step 4 : Some Libs - Master / Slave
Master responsible for :
  Forking the slaves
  Reading a “list” of tasks
  Distribution of the tasks to the slaves
Slave responsible for :
  Execution of the task
  Returning the execution status to the master
Key characteristics :
  Slave death detection
  Handles unhandled exceptions (+ hook)
  Master <-> slave protocol allows temporary error codes
  Timeout of the tasks
Step 4 : Some Libs - Master / Slave
One neat characteristic :
The system may hit a bug in production with minimal impact
If an unhandled exception occurs
  Only one slave dies
  It is detected and the master will fork a new one (if needed)
The lib handles the exception :
  Default behavior : prints to console
  User defined (callback) : e.g. write the stack trace to a file!
Cherry on the cake : getting specific production data about the faulty task
    from robustpools.process_pool import master_task_list
    from robustpools.process_pool import slave_task_list

    m_config = {'INFINITE_LOOP': 0}

    # One task: here it only carries its number, exposed as the read-only "id"
    class list_task(object):
        def __init__(self, list, num, timeout_validity=600):
            self.__num = num

        def _id(self):
            return self.__num
        id = property(_id)

    # One slave: run() executes a task and returns (status code, message)
    class list_slave(slave_task_list):
        def __init__(self):
            super(list_slave, self).__init__(list_task)

        def run(self, task):
            print(task.id)
            return 0, "ok"

    tasks = xrange(10)
    m = master_task_list(tasks, num_slave=5, slave_class=list_slave, config=m_config)
    m.start()
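The robustness properties listed for the master / slave lib (death detection, exception hook, task timeout) can be sketched with the standard library alone. This is not the robustpools implementation or its API. It is a deliberately simplified Python 3 illustration in which the master forks one slave per task and runs them one at a time. The names `run_tasks`, `_slave`, `log_error` and the status codes are hypothetical, and the Unix-only "fork" start method is assumed.

```python
import multiprocessing as mp
import sys
import traceback

# Assumption: Unix-only "fork" start method, echoing "forking the slaves".
_ctx = mp.get_context("fork")

OK, FAILED, DIED, TIMED_OUT = 0, 1, 2, 3  # hypothetical status codes


def _slave(task_fn, task, conn):
    """Slave side: execute the task and report (status, payload) to the master."""
    try:
        conn.send((OK, task_fn(task)))
    except Exception:
        # Unhandled exception in the task: ship the stack trace to the master.
        conn.send((FAILED, traceback.format_exc()))
    finally:
        conn.close()


def log_error(task, status, payload):
    """Default hook: print to the console; callers may pass their own callback."""
    print("task %r failed (status %d): %s" % (task, status, payload), file=sys.stderr)


def run_tasks(task_fn, tasks, timeout=5.0, on_error=log_error):
    """Master side: fork a slave per task, detect death and timeouts."""
    results = []
    for task in tasks:
        reader, writer = _ctx.Pipe(duplex=False)
        slave = _ctx.Process(target=_slave, args=(task_fn, task, writer))
        slave.start()
        writer.close()  # the master keeps only the read end
        if reader.poll(timeout):
            try:
                status, payload = reader.recv()
            except EOFError:
                # The pipe closed without a report: the slave died.
                status, payload = DIED, "slave died without reporting"
        else:
            slave.terminate()
            status, payload = TIMED_OUT, "no answer within %.1fs" % timeout
        slave.join()
        reader.close()
        if status != OK:
            on_error(task, status, payload)
        results.append((task, status, payload))
    return results
```

A faulty task costs at most one slave process, and the hook receives the task itself, which is what makes it possible to collect production data about the failing input.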
Step 4 : Some Libs - TCP Sockets Pool
Manages connections to a pool of servers
Sends in a round-robin / priority way to each server
Detects connection errors
Retries to connect
  Number of retries limited => afterwards the server is marked as dead
  Retries again later with exponential backoff
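The retry policy described for the sockets pool can be sketched as follows. This is an illustration in Python 3, not the library from the talk: the class name `ServerPool`, its parameters, and the injectable `connect` and `clock` callables are all assumptions made so the logic can be exercised without a network. Priorities are left out and only the round-robin case is shown.

```python
import itertools
import time


class ServerPool:
    """Hypothetical round-robin pool with limited retries and exponential backoff."""

    def __init__(self, servers, connect, max_retries=3, base_delay=1.0,
                 clock=time.monotonic):
        self._servers = list(servers)
        self._connect = connect          # callable(server) -> connection, raises OSError
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._clock = clock
        self._failures = {s: 0 for s in self._servers}
        self._retry_at = {s: 0.0 for s in self._servers}
        self._cycle = itertools.cycle(self._servers)

    def is_dead(self, server):
        """A server is dead once it used up its retries, until its backoff expires."""
        return (self._failures[server] >= self._max_retries
                and self._clock() < self._retry_at[server])

    def get(self):
        """Return (server, connection) for the next usable server."""
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if self.is_dead(server):
                continue
            try:
                conn = self._connect(server)
            except OSError:
                self._failures[server] += 1
                extra = self._failures[server] - self._max_retries
                if extra >= 0:
                    # Retries exhausted: back off 1x, 2x, 4x, ... the base delay.
                    self._retry_at[server] = self._clock() + self._base_delay * 2 ** extra
                continue
            self._failures[server] = 0   # a success resets the failure count
            return server, conn
        raise ConnectionError("no server available")
```

With real sockets, `connect` could be something like `lambda addr: socket.create_connection(addr, timeout=2)`, which raises `OSError` on failure.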
Step 5 : Prove It Works
Step 5 : Prove It Works
Prove = collect data … How?
Write integrated systems using the bindings and libs of the previous steps.
Show it works :
  Performance
  Productivity
Step 5 : Prove It Works
Performance, one obvious thought : C/C++
PINCS : Performance Is Not C, Stupid!
Step 5 : Prove It Works - Performance
Some of the rewrites work faster than their C/C++ cousins
Why?
  OS / system limits
  Libs (legacy)
  Algorithms
  Software architecture
  Infrastructure
Step 5 : Prove It Works - Productivity
BTW, is pure performance so important?
Time to market is much more important
Adopt Lean Thinking and eliminate every possible waste
Writing too much code is a big waste in several ways :
  Lose time when writing
  Increase # of bugs
  More time to maintain
  More time to learn the code base (think of new employees)
Impact on overall productivity
Step 5 : Prove It Works - Productivity
[Figures from http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf]
[Figures from http://www.ohloh.net]
Step 5 : Prove It Works - Productivity
Some existing C/C++ systems rewritten in Python
Original C/C++ versions : total of ~20,000 LOC
In Python : 4-6x less code!
The previous numbers do not seem to lie
Step 5 : Prove It Works - Productivity
Oh, parsing an email? Content types of its parts? Getting headers? Any idea in C/C++?
parsing an email
content types of parts
getting headers
    from email import message_from_file

    def get_mail(filename):
        # parse the file into a message object
        fh = open(filename, "r")
        mail = message_from_file(fh)
        fh.close()
        return mail

    mail = get_mail(filename)
    # content type of every part (walk() also visits nested parts)
    for part in mail.walk():
        print(part.get_content_type())
    # headers are read like dictionary keys
    print(mail["From"])
    print(mail["Subject"])
Python libs are just that simple!
… and there are a lot!
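For readers who want to try the email example without a mail file at hand, here is a self-contained Python 3 sketch using the same standard `email` package. The helper names `build_sample` and `summarize`, and the sample addresses, are made up for the illustration. The message is built in memory and then parsed back from its text form.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample():
    """Build a small multipart message and return it as raw text (sample data)."""
    msg = MIMEMultipart("alternative")
    msg["From"] = "alice@example.com"
    msg["Subject"] = "Hello"
    msg.attach(MIMEText("plain body", "plain"))
    msg.attach(MIMEText("<p>html body</p>", "html"))
    return msg.as_string()


def summarize(raw):
    """Parse raw message text; return selected headers and the type of each part."""
    mail = message_from_string(raw)
    return {
        "from": mail["From"],
        "subject": mail["Subject"],
        # walk() yields the container first, then each nested part
        "content_types": [part.get_content_type() for part in mail.walk()],
    }


if __name__ == "__main__":
    print(summarize(build_sample()))
```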
Step 5 : Prove It Works - Performance (Again?)
For an equivalent architecture (libs, algorithm, infrastructure), C performs better than Python!
Python Is Not C, Stupid!
Step 5 : Prove It Works - Performance (Again?)
Bottleneck discovered!
PINCS! : think about architecture first!
Ctypes / Swig : Python bindings
  Write your bottleneck in C / C++, use it in your Python app
Ctypes : absurdly easy Python bindings
Cython : write Python, obtain a gcc-compiled lib
Psyco : JIT for Python
  Just an additional module import in your code
  2-100x faster than normal Python
  Requires a bit more memory
Unladen Swallow : Google project
  Aims to produce a version of Python at least 5x faster
  Every patch goes to Python (no fork!)
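Before rewriting anything in C, the bottleneck has to be located. The slides do not say which tool was used at Terra. One common option is the standard `cProfile` module, sketched here in Python 3 with made-up functions (`slow_join` and `handle_request` stand in for real application code, and `profile` is a hypothetical helper).

```python
import cProfile
import io
import pstats


def slow_join(n):
    """Deliberately naive string building, standing in for a real hot spot."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def handle_request(n=20000):
    """Stand-in for application code that calls the hot spot."""
    return len(slow_join(n))


def profile(func, *args):
    """Run func under cProfile; return its result and a short text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile(handle_request)
    print(report)
```

Only the functions that dominate such a report are candidates for ctypes, Cython or a C rewrite. Often the fix is a better algorithm or architecture in Python itself.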
Step 6 : Evangelize
Step 6 : Evangelize
Once you have stopped and looked at what has been accomplished … show it, evangelize!
Step 6 : Evangelize
Introducing a “new technology” is not just about teaching something to users. You’ve got to play the role of evangelist!
Innovators (3.5%) : New stuff? They’re in!
Early adopters (12.5%) : Open to new ideas, but they check first
Early majority (35%) : First, they must see the idea working
Late majority (35%) : Accept after a lot of pressure, or when it is imposed
Laggards (14%) : Never accept (why would I want to change?)
Step 6 : Evangelize
During work, I constantly spoke (a lot) to others
Presentation on Python, open to all
  Presented to a large audience what has been done
  Open discussion
Poster summarizing what has been done
Wiki page documenting Python material
Specific mailing list related to Python
Step 6 : Evangelize
A lot of work and a slow process, but I won some allies
Some technical people are convinced that Python is useful
Some managers are convinced that Python could be a good thing for Terra
Starting evaluation in some specific cases
Step 7 : Next Steps
Step 7 : Next Steps
Proven that Python could be useful in some cases.
Don’t forget my Grail! The road has not ended …
I’m lobbying to start using Python for web development.
And again, I made a prototype
Step 7 : Next Steps
Django = THE Python MVC web framework :
Model : by describing data, no code written (Django's own ORM, in the spirit of SQLAlchemy)
  Automatic creation of tables (if needed)
  Data accessed through objects
  No SQL needed!
View : accesses models to get the data, renders the output through templates
  Loose coupling interface <-> code!
Controller : REST through URL parsing
Step 7 : Next Steps
Login : the auth module already exists. Easy to tell Django that authentication is required :
    @login_required
    def list_abook(request, username):
        …
login_required is a Python decorator
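For readers new to decorators, the mechanism behind a `login_required`-style check can be sketched in a few lines of plain Python 3. This is not Django's implementation: `Request`, `login_required_sketch` and the redirect string are hypothetical stand-ins, chosen so the example runs without the framework.

```python
from functools import wraps


class Request:
    """Hypothetical stand-in for a web request; user is None when not logged in."""

    def __init__(self, user=None):
        self.user = user


def login_required_sketch(view):
    """Wrap a view so that it only runs for authenticated requests."""
    @wraps(view)  # keep the wrapped view's name and docstring
    def wrapper(request, *args, **kwargs):
        if request.user is None:
            return "302 -> /login"  # placeholder for a real redirect response
        return view(request, *args, **kwargs)
    return wrapper


@login_required_sketch
def list_abook(request, username):
    return "address book of " + username
```

The view body contains no authentication code at all. The decorator adds the check around it, which is why one line is enough to protect a view.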
Step 7 : Next Steps
Caching information (memcache, database, file, …)
4 levels :
Per site : one config line
Per view : one Python decorator
    @cache_page(60 * 15)
    def list_abook(request, username):
        …
In templates : maybe better to leave this one out!
Low-level cache access :
    cache.get(id)
    cache.set(id, value, timeout)
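The per-view caching idea (keep a result for a fixed time, then recompute) can also be illustrated without Django. The sketch below is plain Python 3 with an in-memory dictionary. The name `cache_for`, the `clock` parameter and the positional-arguments-only cache key are assumptions of this example and say nothing about how Django's `cache_page` is implemented.

```python
import time
from functools import wraps


def cache_for(seconds, clock=time.monotonic):
    """Hypothetical decorator: reuse a function's result for `seconds` seconds."""
    def decorator(func):
        store = {}  # maps positional args -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now < hit[0]:
                return hit[1]            # still fresh: skip the real call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator


@cache_for(60 * 15)
def render_abook(username):
    # Stand-in for an expensive view: database access, template rendering, ...
    return "address book of " + username
```

A real deployment would use a shared backend such as memcache rather than a per-process dictionary, so that all web processes see the same cached pages.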
Step 7 : Next Steps
Address book web service :
  Retrieve the address book of one user
  Add an account
  Add an entry to the address book of a user
  View all the address book entries
  Output in HTML, JSON and CSV
< 100 LOC
2 hours (w/o knowing the framework)
Not one line of SQL, just useful code
Conclusions
One year and a half … and evangelization is not done yet!
Email team : several systems have been written in Python and work really well … even with the high Terra load!
Web project should start right now
People are starting to use/learn it inside the company
Some teams are starting to evaluate Python
Some Terra employees are here at this conference!
Companies in the process of adopting a language evaluate several aspects, such as :
* performance
* integration with the existing ecosystem
* productivity
* use cases for this language
In this presentation, we'll focus on and think about these points, sharing the experience of integrating Python at Terra.