TRAP STATUS UPDATE
TRAnsients Pipeline
Gijs Molenaar
gijs@pythonic.nl
@gijzelaerr
Thursday, July 11, 13
ABOUT TRAP
• TRAnsients Pipeline
• Detect and classify transients in multi-frequency radio sky image time series
• Emit VOEvents
• 99% Python
STEPS
A LOT HAPPENED
• Version 1.0 imminent
• Focused on code quality and performance
• No big new science features
PERFORMANCE
• A lot faster
• Really a lot faster
• 0.85 images per second per core
• Scales well
[Figure: scaling plot, axis in minutes]
RSM CYCLE0 RUN0
• 3402 images
• processing record - 5:21 min
• 2 machines, 36 cores
• 5645 unique sources
• 667 detected transients
• Previous version: 400 min on 40 cores
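The numbers on this slide allow a rough speed-up estimate. The sketch below assumes that "5:21 min" means 5 minutes 21 seconds of wall-clock time; the variable names are illustrative only.

```python
# Speed-up estimate from the figures on this slide.
# Assumption: "5:21 min" is 5 minutes 21 seconds of wall-clock time.
new_minutes = 5 + 21 / 60   # current run: 5.35 min on 36 cores
new_cores = 36
old_minutes = 400           # previous version: 400 min on 40 cores
old_cores = 40

# Ratio of elapsed times, ignoring the difference in core count.
wall_clock_speedup = old_minutes / new_minutes

# Ratio of total compute spent (cores x minutes).
core_minute_speedup = (old_minutes * old_cores) / (new_minutes * new_cores)

print(f"wall clock: {wall_clock_speedup:.0f}x, core-minutes: {core_minute_speedup:.0f}x")
```

Under that assumption the run is roughly 75 times faster in wall-clock time and uses roughly 83 times fewer core-minutes.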
TRAP & AARTFAAC
• AARTFAAC
• 48 images/s
• 57 (real) cores required
• 1 or 2 big fat systems will do!
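The core count appears to follow from the AARTFAAC image rate and the per-core throughput quoted on the PERFORMANCE slide. A minimal check, assuming the estimate is a simple division rounded up to whole cores:

```python
import math

# Both rates come from the slides; treating the core count as their
# ratio rounded up is an assumption about how the estimate was made.
aartfaac_images_per_second = 48
trap_images_per_second_per_core = 0.85

cores_required = math.ceil(
    aartfaac_images_per_second / trap_images_per_second_per_core
)
print(cores_required)
```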
INSTALLABILITY
• Merged TKP into TRAP
• Almost open source
• Easy database setup
• Removed many dependencies, such as the LOFAR System Software (closed source)
QUALITY CONTROL
• Automated rejection of bad images
• Known bright source in FOV
• RMS x times higher than theoretical noise
• Oversampled / undersampled / highly elliptical
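The first two rejection criteria above could be sketched as follows. This is not the TRAP implementation: the class, field names, and threshold values are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class ImageStats:
    """Hypothetical per-image summary used by the checks below."""
    rms: float                        # measured RMS noise
    theoretical_noise: float          # expected noise, same units as rms
    nearest_bright_source_deg: float  # distance to nearest known bright source


def rejection_reasons(image, max_rms_factor=3.0, fov_radius_deg=5.0):
    """Return a list of reasons to reject the image (empty if it passes).

    max_rms_factor plays the role of the "x" on the slide; both default
    values are placeholders, not TRAP settings.
    """
    reasons = []
    if image.nearest_bright_source_deg < fov_radius_deg:
        reasons.append("known bright source in FOV")
    if image.rms > max_rms_factor * image.theoretical_noise:
        reasons.append("RMS too high")
    return reasons


good = ImageStats(rms=1.2, theoretical_noise=1.0, nearest_bright_source_deg=20.0)
bad = ImageStats(rms=9.0, theoretical_noise=1.0, nearest_bright_source_deg=2.0)
print(rejection_reasons(good))
print(rejection_reasons(bad))
```

Returning the reasons rather than a bare boolean makes it possible to record why an image was rejected.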
STORAGE
• Added support for PostgreSQL
• fast with small datasets
• Many off-the-shelf tools available
UNDER THE HOOD
• Switched to Celery
• Asynchronous job queue
• Based on distributed message passing
• No more cuisine
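The job-queue pattern named above can be illustrated with the Python standard library alone. The sketch below is a stand-in for the concept, not Celery and not TRAP code: it uses threads and in-process queues instead of a message broker, and `measure_image` is a hypothetical placeholder for the per-image work.

```python
import queue
import threading


def measure_image(image_id):
    # Hypothetical placeholder for the real per-image processing.
    return image_id, f"processed image {image_id}"


def worker(jobs, results):
    # Long-lived worker: keeps taking jobs until it receives the sentinel.
    while True:
        image_id = jobs.get()
        if image_id is None:
            break
        results.put(measure_image(image_id))


def run_jobs(image_ids, n_workers=4):
    image_ids = list(image_ids)
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for image_id in image_ids:
        jobs.put(image_id)      # enqueue work as messages
    for _ in threads:
        jobs.put(None)          # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    # Results arrive in completion order, so key them by image id.
    return dict(results.get() for _ in image_ids)


print(run_jobs(range(5)))
```

In a distributed setup the two queues would live in a message broker, so that workers on other machines can consume the same jobs.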
WHY CELERY
• Easier to use / install / debug
• Faster - hot processes
• Many off-the-shelf tools
• CEP1 compatible
• Easy to add compute nodes
DISCO?
• Maybe add support for Disco in the future
• Similar
• Map - Reduce
• Hadoop for Python
• Distributed file system
USABILITY
• tkp-manage.py
• Pipeline management tool
• Inspired by the Django manage.py command
• Easy to set up a pipeline, add and run jobs, and run Celery workers
• Add new commands
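A Django-style management tool is typically one entry point with subcommands. The sketch below shows that pattern with `argparse`; the subcommand names (`initproject`, `initjob`, `run`, `celery`) and their arguments are assumptions for illustration and may not match the actual tkp-manage.py interface.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="tkp-manage.py")
    commands = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommands mirroring the tasks listed on the slide.
    initproject = commands.add_parser("initproject", help="set up a pipeline")
    initproject.add_argument("name")

    initjob = commands.add_parser("initjob", help="add a job")
    initjob.add_argument("name")

    run = commands.add_parser("run", help="run a job")
    run.add_argument("job")

    commands.add_parser("celery", help="run celery workers")
    return parser


# Parse an example command line instead of sys.argv for demonstration.
args = build_parser().parse_args(["run", "my_job"])
print(args.command, args.job)
```

With this layout, adding a command is one more `add_parser` call plus its handler.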
DEMO?
SUPPORTED TELESCOPES
• Support for FITS and CASA tables
• Field parsers for LOFAR
• Possible to add telescope-specific field parsing and quality checks
• ThunderKAT next week
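One common way to make field parsing pluggable per telescope is a registry keyed by telescope name. The sketch below is a guess at the general shape, not TRAP's design: the registry, the decorator, and every header keyword except the standard FITS `TELESCOP` are hypothetical, and a plain dict stands in for a real image header.

```python
FIELD_PARSERS = {}


def register_parser(telescope):
    """Register a parsing function for one telescope (hypothetical API)."""
    def decorator(func):
        FIELD_PARSERS[telescope.upper()] = func
        return func
    return decorator


def parse_generic(header):
    # Fallback for telescopes without a dedicated parser.
    return {"telescope": header.get("TELESCOP", "unknown")}


@register_parser("LOFAR")
def parse_lofar(header):
    # "ANTENNA" is an illustrative keyword, not a confirmed LOFAR field.
    fields = parse_generic(header)
    fields["antenna_set"] = header.get("ANTENNA")
    return fields


def parse_fields(header):
    telescope = str(header.get("TELESCOP", "")).upper()
    parser = FIELD_PARSERS.get(telescope, parse_generic)
    return parser(header)


print(parse_fields({"TELESCOP": "LOFAR", "ANTENNA": "HBA"}))
print(parse_fields({"TELESCOP": "OtherScope"}))
```

Supporting another telescope then means registering one more function, with no changes to `parse_fields`.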
PROJECT CLEANUP
• Removed 40% of code
• 80% unit tested
• Added Jenkins build server
• Performance regression tests
• Pull request/review workflow
• HipChat for central communication
WEB INTERFACE BANANA
• New web interface
• Rewrite of TKP-web
• Future ready
• Scientist friendly
DEMO?
FUTURE WORK
• More stable releases
• Add support for non-LOFAR data
• More quality checks
• Source storage and association performance
• Distributed file system
• Automated classification
• Web based data exploration
QUESTIONS
gijs@pythonic.nl
@gijzelaerr