Benchy, a Python framework for performance benchmarking of Python scripts

Talk given at Python Nordeste, Fortaleza, 25/05/2013, by Marcel Caraciolo

Transcript

  • 1. Benchy
    Lightweight performance benchmark framework for Python scripts
    Marcel Caraciolo (@marcelcaraciolo)
    Developer, scientist, contributor to the Crab recsys project; has worked with Python for 6 years; interested in mobile, education, machine learning and dataaaaa!
    Recife, Brazil - http://aimotion.blogspot.com
  • 2. About me
    Co-founder of Crab, a Python recsys library
    Chief Scientist at Atepassar, an e-learning social network
    Co-founder of and instructor at PyCursos, teaching Python online
    Co-founder of Pingmind, online infrastructure for MOOCs
    Interested in Python, mobile, e-learning and machine learning!
  • 3. Why do we test?
  • 4. Freedom from fear
  • 5. Testing for performance
  • 6. What made my code slower?
  • 7. me
  • 8. Solutions?
    In [1]: def f(x):
       ...:     return x*x
       ...:
    In [2]: %timeit for x in range(100): f(x)
    100000 loops, best of 3: 20.3 us per loop
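    (Aside: outside IPython, the same best-of-3 measurement can be reproduced with the standard library alone. A minimal sketch; the loop count is illustrative and the reported numbers will vary per machine.)

    import timeit

    # best-of-3 timing of the same statement %timeit ran above
    best = min(timeit.repeat("for x in range(100): f(x)",
                             setup="def f(x): return x*x",
                             repeat=3, number=10000))
    print "best of 3: %.1f us per loop" % (best / 10000 * 1e6)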
  • 9. Stop. Help is near
    Performance benchmarks to compare several Python code alternatives
    Generates graphs using matplotlib
    Memory consumption and performance timing available
    https://github.com/python-recsys/benchy
  • 10. Performance benchmarks
  • 11. Writing benchmarks
    $ easy_install -U benchy  # or: pip install -U benchy
  • 12. Writing benchmarks
    from benchy.api import Benchmark

    common_setup = "i = 0"  # the statements below reference i, so define it in the setup

    statement = "lst = [i for x in range(100000)]"
    benchmark1 = Benchmark(statement, common_setup, name="list with range")

    statement = "lst = [i for x in xrange(100000)]"
    benchmark2 = Benchmark(statement, common_setup, name="list with xrange")

    statement = "lst = [i] * 100000"
    benchmark3 = Benchmark(statement, common_setup, name="list with *")
  • 13. Use them in your workflow
    In [1]: print benchmark1.run()
    {'memory': {'repeat': 3,
                'success': True,
                'units': 'MB',
                'usage': 2.97265625},
     'runtime': {'loops': 100,
                 'repeat': 3,
                 'success': True,
                 'timing': 7.5653696060180664,
                 'units': 'ms'}}
    Same code as %timeit and %memit
  • 14. Beautiful reports
    results = benchmark1.run()  # the run output shown on the previous slide
    rst_text = benchmark1.to_rst(results)
  • 15. Benchmark suite
    from benchy.api import BenchmarkSuite

    suite = BenchmarkSuite()
    suite.append(benchmark1)
    suite.append(benchmark2)
    suite.append(benchmark3)
  • 16. Run the benchmarks
    from benchy.api import BenchmarkRunner

    runner = BenchmarkRunner(benchmarks=suite, tmp_dir='.',
                             name='List Allocation Benchmark')
    n_benchs, results = runner.run()
  • 17. Who is the fastest?
    {Benchmark(list with "*"):
         {'runtime': {'timing': 0.47582697868347168, 'repeat': 3, 'success': True,
                      'loops': 1000, 'timeBaselines': 1.0, 'units': 'ms'},
          'memory': {'usage': 0.3828125, 'units': 'MB', 'repeat': 3, 'success': True}},
     Benchmark(list with xrange):
         {'runtime': {'timing': 5.623779296875, 'repeat': 3, 'success': True,
                      'loops': 100, 'timeBaselines': 11.818958463504936, 'units': 'ms'},
          'memory': {'usage': 0.71484375, 'units': 'MB', 'repeat': 3, 'success': True}},
     Benchmark(list with range):
         {'runtime': {'timing': 6.5933513641357422, 'repeat': 3, 'success': True,
                      'loops': 100, 'timeBaselines': 13.856615239384636, 'units': 'ms'},
          'memory': {'usage': 2.2109375, 'units': 'MB', 'repeat': 3, 'success': True}}}
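    (Aside: the timeBaselines field already encodes each benchmark's slowdown relative to the fastest entry. A hypothetical post-processing sketch over the results dict above, assuming each key is a Benchmark object with a .name attribute, as the printed repr suggests:)

    # rank the suite results from fastest to slowest
    ranked = sorted(results.items(),
                    key=lambda item: item[1]['runtime']['timing'])
    for bm, res in ranked:
        print '%-18s %8.3f ms  (%.1fx baseline)' % (
            bm.name, res['runtime']['timing'], res['runtime']['timeBaselines'])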
  • 18. Plot relative
    fig = runner.plot_relative(results, horizontal=True)
    plt.savefig('%s_r.png' % runner.name, bbox_inches='tight')
  • 19. Plot absolute
    runner.plot_absolute(results, horizontal=False)
    plt.savefig('%s.png' % runner.name)  # bbox_inches='tight'
  • 20. Full report
    rst_text = runner.to_rst(results, runner.name + '.png',
                             runner.name + '_r.png')
    with open('teste.rst', 'w') as f:
        f.write(rst_text)
  • 21. Full report
  • 22. Full report
  • 23. Why?
    Benchmarking the pairwise functions of the Crab recsys library:
    http://aimotion.blogspot.com.br/2013/03/performing-runtime-benchmarks-with.html
  • 24. Get involved
    Create the benchmarks as TestCases
    Check automatically for benchmark files and run them like nose.test()
    More setup and teardown control
    Group benchmarks in the same graph
  • 25. Improvements
    Added database handler
    Added Git support
    Added new runner
    Run benchmarks
  • 26. db.py
    import sqlite3

    class BenchmarkDb(object):
        """
        Persistence handler for benchmark results
        """
        def _create_tables(self):
            self._cursor.execute("drop table if exists benchmarksuites")
            self._cursor.execute("drop table if exists benchmarks")
            self._cursor.execute("drop table if exists results")
            ...
            self._cursor.execute('CREATE TABLE '
                'benchmarks(checksum text PRIMARY KEY, '
                'name text, description text, suite_id integer, '
                'FOREIGN KEY(suite_id) REFERENCES benchmarksuites(id))')
            self._cursor.execute('CREATE TABLE results(id integer '
                'PRIMARY KEY AUTOINCREMENT, checksum text, '
                'timestamp timestamp, ncalls text, timing float, traceback text, '
                'FOREIGN KEY(checksum) REFERENCES benchmarks(checksum))')
            self._con.commit()

        def write_benchmark(self, bm, suite=None):
            if suite is not None:
                self._cursor.execute('SELECT id FROM benchmarksuites '
                                     'WHERE name = "%s"' % suite.name)
                row = self._cursor.fetchone()
            else:
                row = None

            if row is None:
                self._cursor.execute('INSERT INTO benchmarks VALUES (?, ?, ?, ?)',
                                     (bm.checksum, bm.name, bm.description, None))
            else:
                self._cursor.execute('INSERT INTO benchmarks VALUES (?, ?, ?, ?)',
                                     (bm.checksum, bm.name, bm.description, row[0]))
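    (Aside: given that schema, stored timings can be read back with plain sqlite3. A minimal sketch; the benchmarks.db file name is illustrative and assumes the tables created by _create_tables above.)

    import sqlite3

    # join each stored result to its benchmark name, oldest first
    con = sqlite3.connect('benchmarks.db')
    cur = con.cursor()
    cur.execute('SELECT b.name, r.timestamp, r.timing '
                'FROM results r JOIN benchmarks b ON r.checksum = b.checksum '
                'ORDER BY r.timestamp')
    for name, timestamp, timing in cur.fetchall():
        print '%s  %s  %.3f ms' % (name, timestamp, timing)
    con.close()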
  • 27. Improvements
    Added database handler
    Added Git support
    Added new runner
    Run benchmarks
  • 28. Git repo
    class GitRepository(Repository):
        """
        Read some basic statistics about a git repository
        """

        def __init__(self, repo_path):
            self.repo_path = repo_path
            self.git = _git_command(self.repo_path)
            (self.shas, self.messages,
             self.timestamps, self.authors) = self._parse_commit_log()

    [('d87fdf2', datetime.datetime(2013, 3, 22, 16, 55, 38)),
     ('a90a449', datetime.datetime(2013, 3, 22, 16, 54, 36)),
     ('fe66a86', datetime.datetime(2013, 3, 22, 16, 51, 2)),
     ('bea6b21', datetime.datetime(2013, 3, 22, 13, 14, 22)),
     ('bde5e63', datetime.datetime(2013, 3, 22, 5, 2, 56)),
     ('89634f6', datetime.datetime(2013, 3, 20, 4, 16, 19))]
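    (Aside: _parse_commit_log itself is not shown on the slide. A rough stdlib-only sketch of how a (sha, datetime) list like the one above can be built; this is illustrative, not benchy's actual code.)

    import subprocess
    from datetime import datetime

    def parse_commit_log(repo_path):
        # ask git for abbreviated sha + author timestamp, one commit per line
        out = subprocess.Popen(['git', 'log', '--pretty=format:%h %at'],
                               cwd=repo_path,
                               stdout=subprocess.PIPE).communicate()[0]
        pairs = []
        for line in out.splitlines():
            sha, ts = line.split()
            pairs.append((sha, datetime.fromtimestamp(int(ts))))
        return pairs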
  • 29. Git repo
    class BenchmarkRepository(object):
        """
        Manage an isolated copy of a repository for benchmarking
        """
        ...

        def _copy_repo(self):
            if os.path.exists(self.target_dir):
                print 'Deleting %s first' % self.target_dir
                # response = raw_input('%s exists, delete? y/n' % self.target_dir)
                # if response == 'n':
                #     raise Exception('foo')
                cmd = 'rm -rf %s' % self.target_dir
                print cmd
                os.system(cmd)

            self._clone(self.target_dir_tmp, self.target_dir)
            self._prep()
            self._copy_benchmark_scripts_and_deps()

        def _clone(self, source, target):
            cmd = 'git clone %s %s' % (source, target)
            print cmd
            os.system(cmd)

        def _copy_benchmark_scripts_and_deps(self):
            pth, _ = os.path.split(os.path.abspath(__file__))
            deps = [os.path.join(pth, 'run_benchmarks.py')]
            if self.dependencies is not None:
                deps.extend(self.dependencies)

            for dep in deps:
                cmd = 'cp %s %s' % (dep, self.target_dir)
                print cmd
                proc = subprocess.Popen(cmd, shell=True)
                proc.wait()
  • 30. Improvements
    Added database handler
    Added Git support
    Added new runner
    Run benchmarks
  • 31. New runner
    class BenchmarkGitRunner(BenchmarkRunner):
        ...

        def _register_benchmarks(self):
            ex_benchmarks = self.db.get_benchmarks()
            db_checksums = set(ex_benchmarks.index)
            for bm in self.benchmarks:
                if bm.checksum in db_checksums:
                    self.db.update_name(bm)
                else:
                    print 'Writing new benchmark %s, %s' % (bm.name,
                                                            bm.checksum)
                    self.db.write_benchmark(bm)
  • 32. New runner
    class BenchmarkGitRunner(BenchmarkRunner):
        ...

        def _run_revision(self, rev):
            need_to_run = self._get_benchmarks_for_rev(rev)

            if not need_to_run:
                print 'No benchmarks need running at %s' % rev
                return 0, {}

            print 'Running %d benchmarks for revision %s' % (len(need_to_run), rev)
            for bm in need_to_run:
                print bm.name

            self.bench_repo.switch_to_revision(rev)

            pickle_path = os.path.join(self.tmp_dir, 'benchmarks.pickle')
            results_path = os.path.join(self.tmp_dir, 'results.pickle')
            if os.path.exists(results_path):
                os.remove(results_path)
            pickle.dump(need_to_run, open(pickle_path, 'w'))

            # run the process
            cmd = 'python %s/run_benchmarks.py %s %s' % (self.tmp_dir,
                                                         pickle_path, results_path)
            print cmd
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE,
                                    shell=True,
                                    cwd=self.tmp_dir)
            stdout, stderr = proc.communicate()
  • 33. New runner
    class BenchmarkGitRunner(BenchmarkRunner):
        ...

        def _run_revision(self, rev):
            need_to_run = self._get_benchmarks_for_rev(rev)

            if not need_to_run:
                print 'No benchmarks need running at %s' % rev
                return 0, {}

            print 'Running %d benchmarks for revision %s' % (len(need_to_run), rev)
            for bm in need_to_run:
                print bm.name

            self.bench_repo.switch_to_revision(rev)

            # run the process
            cmd = 'python %s/run_benchmarks.py %s %s' % (self.tmp_dir,
                                                         pickle_path, results_path)
            print cmd
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE,
                                    shell=True,
                                    cwd=self.tmp_dir)
            stdout, stderr = proc.communicate()

            if stderr:
                if ("object has no attribute" in stderr or
                        'ImportError' in stderr):
                    print stderr
                    print 'HARD CLEANING!'
                    self.bench_repo.hard_clean()
                print stderr

            if not os.path.exists(results_path):
                print 'Failed for revision %s' % rev
                return len(need_to_run), {}

            results = pickle.load(open(results_path, 'r'))
  • 34. Improvements
    Added database handler
    Added Git support
    Added new runner
    Run benchmarks
  • 35. Running
    from benchmark import Benchmark, BenchmarkRepository, BenchmarkGitRunner

    try:
        REPO_PATH = config.get('setup', 'repo_path')
        REPO_URL = config.get('setup', 'repo_url')
        DB_PATH = config.get('setup', 'db_path')
        TMP_DIR = config.get('setup', 'tmp_dir')
    except:
        REPO_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "../"))
        REPO_URL = 'git@github.com:python-recsys/crab.git'
        DB_PATH = os.path.join(REPO_PATH, 'suite/benchmarks.db')
        TMP_DIR = os.path.join(HOME, 'tmp/base_benchy/')

    PREPARE = """python setup.py clean"""
    BUILD = """python setup.py build_ext --inplace"""

    repo = BenchmarkRepository(REPO_PATH, REPO_URL, DB_PATH, TMP_DIR)
  • 36. Running
    common_setup = """
    import numpy
    from crab.metrics import cosine_distances
    X = numpy.random.uniform(1, 5, (1000,))
    """
    statement = "cosine_distances(X, X)"

    bench = Benchmark(statement, common_setup, name="Crab Cosine")

    suite = BenchmarkSuite()
    suite.append(bench)

    runner = BenchmarkGitRunner(suite, '.', 'Absolute timing in ms')
    n_benchs, results = runner.run()

    runner.plot_history(results)
    plt.show()
  • 37. Improvements
    Historical commits from version control are now benchmarked
  • 38. Working now: module detection
    by_module = {}
    benchmarks = []
    modules = ['metrics',
               'recommenders',
               'similarities']

    for modname in modules:
        ref = __import__(modname)
        by_module[modname] = [v for v in ref.__dict__.values()
                              if isinstance(v, Benchmark)]
        benchmarks.extend(by_module[modname])

    for bm in benchmarks:
        assert(bm.name is not None)
  • 39. https://github.com/python-recsys/benchy
    Forks and pull requests are welcome!
  • 40. Benchy
    Lightweight performance benchmark framework for Python scripts
    Marcel Caraciolo (@marcelcaraciolo)
    Developer, scientist, contributor to the Crab recsys project; has worked with Python for 6 years; interested in mobile, education, machine learning and dataaaaa!
    Recife, Brazil - http://aimotion.blogspot.com
