Python in Large Companies?
Sébastien Tandel
sebastien.tandel@corp.terra.com.br
sebastien.tandel@gmail.com
Plan
  About Terra
  The 7 steps
    Prototype
    Define the Goals
    Integration
    Some Libs
    Prove It Works
    Evangelize
    Next Steps
  Conclusions
About Terra : Web Portal
  Largest Latin American web portal
  Located in 18 countries
  1000s of servers
  Brazil :
    ~7M unique visitors / day
    ~70M pageviews / day
About Terra
  Source: Nielsen NetView (June 2009)
About Terra : Email Platform
  I’m part of the email team.
  Some stats :
    +10M mailboxes
    +30M inbound emails per day
    +30M outbound emails per day
    avg : 300 mail/s, peak : 600 mail/s
  Systems
    Main systems : SMTP, LMTP, POP, IMAP, Webmail
    Total of +30 systems to design/develop/maintain
    Main languages used : C / C++
About Terra
  Several “official” languages at Terra :
    PHP, C, C++, Java, C#, Erlang
    Average # is one per team!
  No official “scripting” language (Python, Perl or other)
  Why? From what I hear :
    Performance
    Integration with other systems (& legacy)
    Costs / benefits?
    Buzzword fear
    Labor market
Flash Python Overview
  Python is …
    Interpreted
    Dynamically typed
    Really concise
    Multi-paradigm : procedural, OO, functional
    Exceptions : helpful for robustness and debugging (no strace ;))
    Garbage collector : don’t worry about allocation/free
Step 1 : Prototype
  Buggy system rewritten as a prototype in Python
  Surprise! It worked a lot better than its C cousin
  The prototype is now in production!
  Spread the word about this rewrite around me
    Some technical people liked the idea
    One was not so enthusiastic … my manager
  Cons :
    No integration with homemade systems
    Just one example
  Introducing new ideas is a long and tough road
Step 2 : Define the Goals
  Performance critical systems : postfix, lmtp, imap / pop
  Web-based systems : Webmail, ECP
  Backend systems : spamreporter, cleaner, clx trainer, base trainer, mfsbuilder, migrador, nnentrega, smigol, …
  Almost nonexistent systems (though interesting ones) : mailbox stats, log analysis (stats and user behavior characterization)
  System / integration test scripts
  The Grail : Python can be used for ALL except performance critical systems
Step 3 : Integration
  Python could be used with every system
  … but how can I interface with the homemade (legacy) systems?
  Various ways to create Python bindings :
    Python C API : the “hard” way
    swig : the lazy way, but it won’t create a Pythonic API for you
    ctypes : the stupidly easy way
        from ctypes import cdll
        libc = cdll.LoadLibrary("libc.so.6")
        libc.mkdir(b"python-mkdir-test", 0o755)  # mkdir(2) takes a path and a mode
    Cython : write Python, compile with gcc
  Wrote bindings to interface with all major internal systems (thanks to ctypes)
  With a Pythonic API!
Step 3 : Integration
    from trrauth import TrrAuth

    auth = TrrAuth("IMAP")
    auth.open_userpass("standel", "1q2w3e", "terra")
    auth.attributes = ["short_name", "id_perm", "antispam"]

    # Requested attributes are exposed as plain Python attributes ...
    print("%s : %s" % (auth.short_name, auth.id_perm))

    # ... and the object can be iterated as (attribute, value) pairs
    for attr, value in auth:
        print("%s : %s" % (attr, value))
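The slides do not show how the Terra bindings are built, so the following is only a minimal sketch of the general idea: declare a C function's signature with ctypes, then hide it behind attribute access. The class name `CEnvironment` and the variable `TERRA_DEMO` are made up for this illustration, and the code assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library.

```python
import ctypes
import os

# CDLL(None) gives access to the symbols already loaded in the process
# (POSIX only), which avoids hard-coding a file name such as "libc.so.6".
_libc = ctypes.CDLL(None)

# Declare the C signature: char *getenv(const char *name);
_libc.getenv.argtypes = [ctypes.c_char_p]
_libc.getenv.restype = ctypes.c_char_p


class CEnvironment(object):
    """Hypothetical wrapper: read C environment variables as attributes."""

    def __getattr__(self, name):
        raw = _libc.getenv(name.encode("utf-8"))
        if raw is None:
            # Translate the C "NULL means missing" convention into an exception.
            raise AttributeError(name)
        return raw.decode("utf-8")


os.environ["TERRA_DEMO"] = "imap"   # os.environ also updates the C environment
env = CEnvironment()
print(env.TERRA_DEMO)
```

Setting `argtypes` and `restype` is what keeps the call safe: without them ctypes assumes an `int` return value, which would truncate the pointer on 64-bit systems.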
Step 4 : Some Libs
  Master / Slave
    Master responsible for :
      Forking the slaves
      Reading a “list” of tasks
      Distributing the tasks to the slaves
    Slave responsible for :
      Executing the task
      Returning the execution status to the master
    Key characteristics :
      Slave death detection
      Handling of unhandled exceptions (+ hook)
      Master <-> slave protocol allows temporary error codes
      Timeout of the tasks
Step 4 : Some Libs
  Master / Slave
    One neat characteristic :
      The system may hit a bug in production with minimal impact
      If an unhandled exception occurs :
        Only one slave dies
        It is detected and the master forks a new one (if needed)
      The lib handles the exception :
        Default behavior : prints to console
        User defined (callback) : e.g. write the stack trace to a file!
      Cherry on the cake : getting specific production data about the faulty task
from robustpools.process_pool import master_task_list
from robustpools.process_pool import slave_task_list

m_config = {'INFINITE_LOOP': 0}

class list_task(object):
    # Wraps one element of the task list handed to the master
    def __init__(self, list, num, timeout_validity=600):
        self.__num = num

    def _id(self):
        return self.__num
    id = property(_id)

class list_slave(slave_task_list):
    def __init__(self):
        super(list_slave, self).__init__(list_task)

    def run(self, task):
        # Runs one task and returns a (status, message) pair
        print(task.id)
        return 0, "ok"

tasks = range(10)
m = master_task_list(tasks, num_slave=5, slave_class=list_slave, config=m_config)
m.start()
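The internals of robustpools are not shown in the slides. As a rough, single-process illustration of only the "unhandled exception + hook" and "temporary error code" ideas (no forking, no death detection), something like the following could be used. The status codes, `TemporaryError`, `run_task` and `on_unhandled` are assumptions made for this sketch, not the library's API.

```python
import traceback

# Status codes are assumptions for this sketch, not the library's protocol.
OK, TEMPORARY_ERROR, FATAL_ERROR = 0, 1, 2


class TemporaryError(Exception):
    """Raised by a task that should simply be retried later."""


def run_task(task, handler, on_unhandled=None):
    """Run handler(task) and always hand a (status, message) pair back."""
    try:
        return OK, handler(task)
    except TemporaryError as exc:
        return TEMPORARY_ERROR, str(exc)
    except Exception:
        trace = traceback.format_exc()
        if on_unhandled is None:
            print(trace)                 # default behavior: print to the console
        else:
            on_unhandled(task, trace)    # user hook, e.g. write the trace to a file
        return FATAL_ERROR, "unhandled exception"


def handler(task):
    if task == 3:
        raise ValueError("malformed task %r" % task)
    if task == 5:
        raise TemporaryError("backend busy")
    return "done"


faulty = []   # the hook records which task failed, together with its stack trace


def remember(task, trace):
    faulty.append((task, trace))


statuses = [run_task(task, handler, on_unhandled=remember)[0] for task in range(6)]
print(statuses)
```

Keeping the faulty task next to its stack trace is what gives the "specific production data" mentioned above: the failing input can be replayed later.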
Step 4 : Some Libs
  TCP Sockets Pool
    Manages connections to a pool of servers
    Sends in a round-robin / priority way to each server
    Detects connection errors
    Retries to connect
      Number of retries is limited => afterwards the server is marked as dead
      Retries again later with exponential backoff
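The pool library itself is not shown in the slides. The behavior listed above (round-robin, limited retries, dead marking, exponential backoff) could be sketched roughly as follows. `ServerPool`, its parameters, and the injected `connect` and `clock` callables are all assumptions for illustration; priorities are left out, and a real pool would open sockets instead of calling a fake function.

```python
import itertools
import time


class ServerPool(object):
    """Round-robin pool that sidelines failing servers with exponential backoff."""

    def __init__(self, servers, connect, max_retries=3, base_delay=1.0,
                 clock=time.monotonic):
        self._servers = list(servers)
        self._connect = connect            # callable(server), raises OSError on failure
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._clock = clock
        self._failures = dict.fromkeys(self._servers, 0)
        self._retry_at = dict.fromkeys(self._servers, 0.0)
        self._cycle = itertools.cycle(self._servers)

    def get_connection(self):
        # Try each server at most once per call, in round-robin order.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if self._clock() < self._retry_at[server]:
                continue                   # marked dead, backoff not elapsed yet
            try:
                conn = self._connect(server)
            except OSError:
                self._record_failure(server)
                continue
            self._failures[server] = 0     # healthy again
            return server, conn
        raise RuntimeError("no server available")

    def _record_failure(self, server):
        self._failures[server] += 1
        extra = self._failures[server] - self._max_retries
        if extra >= 0:
            # Dead: wait base_delay, then 2x, 4x, ... before the next attempt.
            self._retry_at[server] = self._clock() + self._base_delay * (2 ** extra)


# Demo with a fake clock and a fake connect function (no real sockets).
now = [0.0]
down = {"b"}


def fake_connect(server):
    if server in down:
        raise OSError("connection refused")
    return "conn-" + server


pool = ServerPool(["a", "b"], fake_connect, max_retries=2, base_delay=10.0,
                  clock=lambda: now[0])
picked = [pool.get_connection()[0] for _ in range(4)]
print(picked)
```

Injecting the clock and the connect function keeps the retry logic testable without a network; in production they would default to the monotonic clock and a socket-opening function.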
Step 5 : Prove It Works
  Prove = collect data … How?
    Write integrated systems using the bindings and libs of the previous steps
    Show it works :
      Performance
      Productivity
Step 5 : Prove It Works
  Performance, one obvious thought : C/C++
  PINCS : Performance Is Not C, Stupid!
Step 5 : Prove It Works
  Performance
    Some of the rewrites work faster than their C/C++ cousins
    Why?
      OS / system limits
      Libs (legacy)
      Algorithms
      Software architecture
      Infrastructure
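To make the "Algorithms" point concrete, here is a small illustration that is not taken from any Terra system: the same lookup is run against a list (linear scan) and a set (hash lookup). The addresses and sizes are invented, and the timings will vary by machine.

```python
import timeit

# Invented data: 5000 blocked senders, 200 incoming senders (100 of them blocked).
blocked_list = ["user%d@example.com" % i for i in range(5000)]
blocked_set = set(blocked_list)
senders = ["user%d@example.com" % i for i in range(4900, 5100)]


def count_blocked(senders, blocked):
    # Same code for both containers; only the data structure changes.
    return sum(1 for sender in senders if sender in blocked)


slow = timeit.timeit(lambda: count_blocked(senders, blocked_list), number=3)
fast = timeit.timeit(lambda: count_blocked(senders, blocked_set), number=3)
print("list: %.4fs   set: %.4fs" % (slow, fast))
```

The choice of data structure usually changes the running time far more here than a change of language would, which is the argument behind PINCS.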
Step 5 : Prove It Works
  Productivity
    BTW, is pure performance so important?
    Time to market is much more important
    Adopt Lean Thinking and eliminate every possible waste
    Writing too much code is a big waste in several ways :
      Lose time when writing
      Increase the number of bugs
      More time to maintain
      More time to learn the code base (think of new employees)
    It impacts overall productivity
Step 5 : Prove It Works
  Productivity
    http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprt2_advances2003.pdf
Step 5 : Prove It Works
  Productivity
    http://www.ohloh.net
Step 5 : Prove It Works
  Productivity
    Some existing C/C++ systems rewritten in Python
    Original C/C++ versions : total of ~20,000 LOC
    In Python : 4-6x less code!
    The previous numbers do not seem to lie
Step 5 : Prove It Works
  Productivity
    Oh, parsing an email? Getting the content types of the parts? Getting the headers?
    Any idea in C/C++?

    from email import message_from_file

    def get_mail(filename):
        # Parse the message stored in the given file
        with open(filename, "r") as fh:
            return message_from_file(fh)

    mail = get_mail(filename)

    # Content type of every MIME part
    for part in mail.walk():
        print(part.get_content_type())

    # Headers are available through dict-style access
    print(mail["From"])
    print(mail["Subject"])

    Python libs are just that simple! … and there are a lot!
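The same calls can be tried without a mail file by parsing a message held in a string. The message below is invented for the example; `message_from_string`, `walk()` and `get_content_type()` are part of the standard `email` package.

```python
from email import message_from_string

# Invented two-part message, used only for this illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--XYZ--\n"
)

mail = message_from_string(raw)

# walk() yields the container first, then each sub-part in order.
content_types = [part.get_content_type() for part in mail.walk()]
print(content_types)
print(mail["From"])
print(mail["Subject"])
```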
Step 5 : Prove It Works
  Performance (Again?)
    For an equivalent architecture (libs, algorithm, infrastructure), C performs better than Python!
    Python Is Not C, Stupid!
Step 5 : Prove It Works
  Performance (Again?)
    Bottleneck discovered!
      PINCS! : think about the architecture first!
      ctypes / swig : Python bindings
        Write your bottleneck in C / C++, use it in your Python app
      ctypes : absurdly easy Python bindings
      Cython : write Python, obtain a gcc-compiled lib

  • 71.
    Step 5 : Prove It Works
      Performance (Again?)
        Bottleneck discovered!
          PINCS! : think about the architecture first!
          ctypes : absurdly easy Python bindings
          Cython : write Python, obtain a gcc-compiled lib
          Psyco : JIT for Python
            Just an additional module import in your code
            2-100x faster than normal Python
            Requires a bit more memory
          Unladen Swallow : Google project
            Goal : produce a version of Python at least 5x faster
            Every patch goes to Python (no fork!)
  • 81.
    Step 6 : Evangelize
      Once you have stopped and looked at what has been accomplished …
      Show it, evangelize!
  • 82.
    Step 6 : Evangelize
      Because introducing a “new technology” is not just about teaching something to users.
      You’ve got to play the role of evangelist!
        Innovators (3.5%) : new stuff? They’re in!
        Early adopters (12.5%) : open to new ideas, but check first
        Early majority (35%) : first, they must see the idea working
        Late majority (35%) : accept after a lot of pressure, or when imposed
        Laggards (14%) : never accept (why would I want to change?)
  • 93.
    Step 6 : Evangelize
      During work, I constantly spoke (a lot) to others
      Presentation on Python made for all
        Present to a large audience what has been done
        Open discussion
      Poster summarizing what has been done
      Wiki page documenting Python stuff
      Specific mailing list related to Python
  • 94.
    Step 6 : Evangelize
      A lot of work and a slow process, but I won some allies
        Some technical people are convinced that Python is useful
        Some managers are convinced that Python could be a good thing for Terra
        Starting evaluation in some specific cases
  • 95.
    Step 7 : Next Steps
      Proven that Python could be useful in some cases.
      Don’t forget my Grail!
      The journey has not ended …
      I’m lobbying to start using Python for web development.
      And again, I made a prototype
  • 97.
    Step 7 : Next Steps
      Django = THE Python MVC web framework :
        Model :
          By describing data, no code written (Django’s own ORM, comparable to SQLAlchemy)
          Automatic creation of tables (if needed)
          Data accessed through objects
          No SQL needed!
        View :
          Accesses the models to get the data
          Renders the output through templates
          Interface loosely coupled to the code!
        Controller :
          REST through URL parsing
  • 98.
    Step 7 : Next Steps
      Login : module auth already exists.
      Easy to tell Django that authentication is required :
        @login_required
        def list_abook(request, username):
            …
      login_required is a Python decorator
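For readers new to decorators, the mechanism can be sketched in plain Python. This is not Django's implementation: the `Request` class and the returned strings are stand-ins invented for the illustration, and the real decorator builds a proper redirect response to the configured login URL.

```python
import functools


class Request(object):
    """Hypothetical stand-in for a web request; not Django's HttpRequest."""

    def __init__(self, user=None):
        self.user = user


def login_required(view):
    """Simplified illustration of the idea behind an authentication decorator."""

    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if request.user is None:
            # Anonymous visitor: short-circuit before the view runs.
            return "302 redirect to /login"
        return view(request, *args, **kwargs)

    return wrapper


@login_required
def list_abook(request, username):
    return "address book of %s" % username


print(list_abook(Request(), "standel"))
print(list_abook(Request(user="standel"), "standel"))
```

The view itself contains no authentication code; the check lives in one reusable place.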
  • 99.
    Step 7 : Next Steps
      Caching information (memcache, db, file, …)
      4 levels :
        Per site : one config line
        Per view : one Python decorator
          @cache_page(60 * 15)
          def list_abook(request, username):
              …
        In templates : maybe better to leave this one out!
        Low-level cache access :
          cache.get(id)
          cache.set(id, value, timeout)
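What a per-view cache with a timeout does can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not Django's `cache_page`: the decorator name `cache_for`, the in-memory dict and the injectable clock are all invented here, whereas Django stores whole responses in the configured cache backend.

```python
import functools
import time


def cache_for(seconds, clock=time.monotonic):
    """Hypothetical per-function cache with expiry; not Django's cache_page."""

    def decorator(func):
        store = {}   # maps call arguments to (expiry time, cached value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            if entry is not None and now < entry[0]:
                return entry[1]              # still fresh: skip the real call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value

        return wrapper

    return decorator


fake_now = [0.0]   # fake clock so the example does not have to sleep
calls = []


@cache_for(60 * 15, clock=lambda: fake_now[0])
def list_abook(username):
    calls.append(username)
    return ["%s-contact-%d" % (username, len(calls))]


first = list_abook("standel")
second = list_abook("standel")     # served from the cache
fake_now[0] = 60 * 15 + 1          # jump past the timeout
third = list_abook("standel")      # recomputed
print(first, second, third)
```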
  • 100.
    Step 7 : Next Steps
      Address book web service
        Retrieve the address book of one user,
        Add an account,
        Add an entry to the address book of a user,
        View all the address book entries,
        Output in HTML, JSON and CSV
      < 100 LOC
      2 hours (w/o knowing the framework)
      Not one line of SQL
      Just useful code
  • 101.
    Conclusions
      One year and a half … and evangelization is not done yet!
      Email team :
        Several systems have been written in Python and work really well … even under Terra’s high load!
        Web project should start right now
      People are starting to use / learn it inside the company
      Some teams are starting to evaluate Python
      Some Terra employees are here at this conference!

Editor's Notes

  • #40
    C / C++ / Java : a lot of projects … definitely the reference languages
    C# : going up, but still only a few projects
    Ruby : not well established … yet? Only for web???
    Python : already has a lot of projects and has the lowest LOC per project