ProgressTracker
A handy pattern for tracking processing progress
2018-01-29
A common problem
Often I am processing a lot of messages in a simple script
for message in messages:
    process(message)
The processing might take several minutes and I want to know how close I am
to completion.
I want some indication of progress
First attempt
Print out every 100 records.
for index, message in enumerate(messages):
    if index % 100 == 0:
        print(f"Processed {index} messages")
    process(message)
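One quirk of this first attempt: enumerate() counts from 0, so the first line reads "Processed 0 messages", and the final total is never reported unless it happens to land on a multiple of 100. A small sketch that collects the lines instead of printing them makes this visible; process() here is a hypothetical stand-in for the real work.

```python
def process(message):
    """Hypothetical stand-in for the real per-message work."""
    return message


def first_attempt(messages):
    """Run the first-attempt loop, returning the progress lines instead of printing them."""
    lines = []
    for index, message in enumerate(messages):
        if index % 100 == 0:
            lines.append(f"Processed {index} messages")
        process(message)
    return lines


lines = first_attempt(list(range(250)))
print(lines)  # ['Processed 0 messages', 'Processed 100 messages', 'Processed 200 messages']
```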
Second attempt: Add time taken
from datetime import datetime

start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        print(f"Processed {index} messages")
    process(message)
end_time = datetime.utcnow()
print(f"Processing took {end_time - start_time}")
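datetime.utcnow() reads the system wall clock, which can jump if the clock is adjusted, and it is deprecated as of Python 3.12. For measuring elapsed time, time.monotonic() is a common alternative. A sketch of the same loop with that substitution (the clock swap is my change, not the slides'; process() and the sample messages are hypothetical):

```python
import time
from datetime import timedelta


def process(message):
    """Hypothetical stand-in for the real per-message work."""
    time.sleep(0.001)


messages = list(range(50))

start = time.monotonic()  # monotonic: never goes backwards when the system clock changes
for index, message in enumerate(messages):
    if index % 100 == 0:
        print(f"Processed {index} messages")
    process(message)
elapsed = timedelta(seconds=time.monotonic() - start)
print(f"Processing took {elapsed}")
```

Wrapping the float in a timedelta keeps the same "0:00:00.05..." style of output as the datetime version.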
Third attempt: Add messages/second
start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        seconds_so_far = (datetime.utcnow() - start_time).total_seconds()
        messages_per_second = (index / seconds_so_far) if seconds_so_far != 0 else None
        print(f"Processed {index} messages ({messages_per_second}/s)")
    process(message)
end_time = datetime.utcnow()
print(f"Processing took {end_time - start_time}")
Third attempt: Add messages/second
start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        seconds_so_far = (datetime.utcnow() - start_time).total_seconds()
        messages_per_second = (index / seconds_so_far) if seconds_so_far != 0 else None
        print(f"Processed {index} messages ({messages_per_second}/s)")
    process(message)
end_time = datetime.utcnow()
total_duration = end_time - start_time
total_seconds = total_duration.total_seconds()
# Divide by the message count: after the loop, index is one less than len(messages)
messages_per_second = (len(messages) / total_seconds) if total_seconds != 0 else None
print(f"Processing took {total_duration} ({messages_per_second}/s)")
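The guarded division now appears twice. Pulled out into a small pure function, it can be checked without a real clock. A sketch; messages_per_second as a standalone function taking a timedelta is my naming, not the slides':

```python
from datetime import timedelta


def messages_per_second(messages_processed, elapsed):
    """Throughput so far, or None when no time has elapsed yet (avoids ZeroDivisionError)."""
    seconds = elapsed.total_seconds()
    return messages_processed / seconds if seconds != 0 else None


print(messages_per_second(500, timedelta(seconds=4)))  # 125.0
print(messages_per_second(0, timedelta(0)))            # None
```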
Third attempt: Add messages/second
def print_progress(messages_processed, start_time):
    duration = datetime.utcnow() - start_time
    seconds_so_far = duration.total_seconds()
    messages_per_second = (messages_processed / seconds_so_far) if seconds_so_far != 0 else None
    print(f"Processed {messages_processed} messages in {duration} ({messages_per_second}/s)")

start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        print_progress(index, start_time)
    process(message)
print_progress(len(messages), start_time)
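Printing the raw float gives lines like "(1234.5678912/s)", and "None/s" before any time has passed. A variation that builds the line as a string, rounds the rate, and shows "n/a" instead of None; format_progress is a hypothetical name, and returning a string rather than printing is a design choice that makes the output easy to test or redirect:

```python
from datetime import timedelta


def format_progress(messages_processed, duration):
    """Build the progress line; round the rate and show n/a before any time has elapsed."""
    seconds_so_far = duration.total_seconds()
    if seconds_so_far != 0:
        rate = f"{messages_processed / seconds_so_far:.1f}/s"
    else:
        rate = "n/a"
    return f"Processed {messages_processed} messages in {duration} ({rate})"


line = format_progress(300, timedelta(seconds=7))
print(line)  # Processed 300 messages in 0:00:07 (42.9/s)
```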
Fourth attempt: Add percent complete
def print_progress(messages_processed, total_message_count, start_time):
    duration = datetime.utcnow() - start_time
    seconds_so_far = duration.total_seconds()
    messages_per_second = (messages_processed / seconds_so_far) if seconds_so_far != 0 else None
    percent_complete = (messages_processed / total_message_count) * 100
    print(f"Processed {messages_processed} messages ({percent_complete}%) in {duration} ({messages_per_second}/s)")

start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        print_progress(index, len(messages), start_time)
    process(message)
print_progress(len(messages), len(messages), start_time)
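The percentage divides by total_message_count, so an empty messages list raises ZeroDivisionError. A guarded sketch (percent_complete as a standalone function is a hypothetical helper), with a format spec to keep the printed value short:

```python
def percent_complete(messages_processed, total_message_count):
    """Percentage done, or None for an empty input (avoids ZeroDivisionError)."""
    if total_message_count == 0:
        return None
    return messages_processed / total_message_count * 100


print(percent_complete(25, 200))         # 12.5
print(percent_complete(0, 0))            # None
print(f"{percent_complete(1, 3):.1f}%")  # 33.3%
```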
Fifth attempt: Add time remaining
from datetime import datetime, timedelta

def print_progress(messages_processed, total_message_count, start_time):
    duration = datetime.utcnow() - start_time
    seconds_so_far = duration.total_seconds()
    messages_per_second = (messages_processed / seconds_so_far) if seconds_so_far != 0 else None
    percent_complete = (messages_processed / total_message_count) * 100
    estimated_time_remaining = timedelta(seconds=((100 - percent_complete) / percent_complete) * seconds_so_far) if percent_complete != 0 else None
    print(f"Processed {messages_processed} messages ({percent_complete}%) in {duration} ({messages_per_second}/s, ETA: {estimated_time_remaining})")

start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        print_progress(index, len(messages), start_time)
    process(message)
print_progress(len(messages), len(messages), start_time)
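The ETA is a linear extrapolation: if p percent took t seconds, the remaining (100 - p) percent should take t * (100 - p) / p at the same average rate. Substituting p = 100 * done / total gives t * (total - done) / done, which skips the round trip through percentages. A sketch of that equivalent form (estimated_time_remaining as a standalone function is a hypothetical helper; it assumes throughput stays roughly constant):

```python
from datetime import timedelta


def estimated_time_remaining(messages_processed, total_message_count, seconds_so_far):
    """Linear extrapolation: assumes the remaining messages go at the average rate so far."""
    if messages_processed == 0:
        return None  # no rate observed yet
    remaining = total_message_count - messages_processed
    return timedelta(seconds=seconds_so_far * remaining / messages_processed)


# 250 of 1000 done in 30 s: 750 left at ~8.33 msg/s, so about 90 s to go
print(estimated_time_remaining(250, 1000, 30.0))  # 0:01:30
```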
Repeated parameters
We are passing the same parameter values (total_message_count, start_time) every time we call print_progress:
start_time = datetime.utcnow()
for index, message in enumerate(messages):
    if index % 100 == 0:
        print_progress(index, len(messages), start_time)
    process(message)
print_progress(len(messages), len(messages), start_time)
It would be nice if print_progress would remember these values.
Rework it as a class?
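A class is one option. For just "remembering" fixed arguments, the standard library's functools.partial also works. A sketch with a simplified, hypothetical progress_line that returns its text instead of printing; the class remains the better fit once the tracker needs mutable state, as the next slides show:

```python
from datetime import datetime
from functools import partial


def progress_line(messages_processed, total_message_count, start_time):
    """Simplified, hypothetical stand-in for print_progress that returns its text."""
    percent = messages_processed / total_message_count * 100
    return f"Processed {messages_processed} of {total_message_count} messages ({percent:.0f}%), started {start_time}"


# Bind the two arguments that never change; only the count varies per call.
report = partial(progress_line, total_message_count=400, start_time=datetime(2018, 1, 29, 12, 0))
print(report(100))  # Processed 100 of 400 messages (25%), started 2018-01-29 12:00:00
```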
Sixth attempt: Refactor as class
class ProgressTracker(object):
    def __init__(self, total_message_count, start_time):
        self.total_message_count = total_message_count
        self.start_time = start_time

    def print_progress(self, messages_processed):
        duration = datetime.utcnow() - self.start_time
        seconds_so_far = duration.total_seconds()
        messages_per_second = (messages_processed / seconds_so_far) if seconds_so_far != 0 else None
        percent_complete = (messages_processed / self.total_message_count) * 100
        estimated_time_remaining = timedelta(seconds=((100 - percent_complete) / percent_complete) * seconds_so_far) if percent_complete != 0 else None
        print(f"Processed {messages_processed} messages ({percent_complete}%) in {duration} ({messages_per_second}/s, ETA: {estimated_time_remaining})")

start_time = datetime.utcnow()
tracker = ProgressTracker(len(messages), start_time)
for index, message in enumerate(messages):
    if index % 100 == 0:
        tracker.print_progress(index)
    process(message)
tracker.print_progress(len(messages))
Sixth attempt: Refactor as class
class ProgressTracker(object):
    def __init__(self, total_message_count):
        self.total_message_count = total_message_count
        self.start_time = datetime.utcnow()

    def print_progress(self, messages_processed):
        duration = datetime.utcnow() - self.start_time
        seconds_so_far = duration.total_seconds()
        messages_per_second = (messages_processed / seconds_so_far) if seconds_so_far != 0 else None
        percent_complete = (messages_processed / self.total_message_count) * 100
        estimated_time_remaining = timedelta(seconds=((100 - percent_complete) / percent_complete) * seconds_so_far) if percent_complete != 0 else None
        print(f"Processed {messages_processed} messages ({percent_complete}%) in {duration} ({messages_per_second}/s, ETA: {estimated_time_remaining})")

tracker = ProgressTracker(len(messages))
for index, message in enumerate(messages):
    if index % 100 == 0:
        tracker.print_progress(index)
    process(message)
tracker.print_progress(len(messages))
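Reading the clock inside __init__ and print_progress makes the output hard to test, since every run prints different durations. One common remedy is to inject the clock. A sketch, assuming a hypothetical clock parameter and a progress_line method that returns the text (neither is part of the slides' class):

```python
from datetime import datetime, timedelta, timezone


class ProgressTracker:
    """The slides' class plus a hypothetical `clock` argument so tests can control time."""

    def __init__(self, total_message_count, clock=lambda: datetime.now(timezone.utc)):
        self.total_message_count = total_message_count
        self.clock = clock
        self.start_time = clock()

    def progress_line(self, messages_processed):
        duration = self.clock() - self.start_time
        seconds_so_far = duration.total_seconds()
        rate = messages_processed / seconds_so_far if seconds_so_far != 0 else None
        return f"Processed {messages_processed} of {self.total_message_count} messages in {duration} ({rate}/s)"

    def print_progress(self, messages_processed):
        print(self.progress_line(messages_processed))


# A fake clock that advances 10 seconds on every call.
ticks = (datetime(2018, 1, 29) + timedelta(seconds=10 * i) for i in range(100))
tracker = ProgressTracker(1000, clock=lambda: next(ticks))
line = tracker.progress_line(500)
print(line)  # Processed 500 of 1000 messages in 0:00:10 (50.0/s)
```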
Results
So now we’ve gone from 2 lines to 28 lines
Not quite fair. If we move ProgressTracker out into a different file, it’s only 7 lines:
from progress_tracker import ProgressTracker
tracker = ProgressTracker(len(messages))
for index, message in enumerate(messages):
    if index % 100 == 0:
        tracker.print_progress(index)
    process(message)
tracker.print_progress(len(messages))
Generators
Remember enumerate()?
for index, message in enumerate(messages):
It wraps iteration over an iterable and does additional computation along the way
We could do the same thing with ProgressTracker
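The same wrapping idea works with a plain generator function: it yields each item unchanged and reports as a side effect. A minimal sketch; with_progress, every and report are hypothetical names, and report defaults to print:

```python
def with_progress(iterable, every=100, report=print):
    """Hypothetical generator wrapper: yields items unchanged, reporting every `every` items."""
    count = 0
    for item in iterable:
        if count % every == 0:
            report(f"Processed {count} messages")  # reported before item `count` is handed out
        yield item
        count += 1
    report(f"Processed {count} messages")  # final total, once the loop finishes


lines = []
for message in with_progress(range(250), report=lines.append):
    pass  # process(message) would go here
print(lines)
```

Because the generator resumes only when the caller asks for the next item, the report at count k happens after the caller has finished with the first k items.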
Seventh attempt: Refactor as generator
class ProgressTracker(object):
    def __init__(self, iterable):
        self.iterable = iterable
        self.total_message_count = len(iterable)
        self.start_time = None

    def print_progress(self, messages_processed):
        …

    def __iter__(self):
        if self.start_time is None:
            self.start_time = datetime.utcnow()
        for index, message in enumerate(self.iterable):
            if index % 100 == 0:
                self.print_progress(index)
            yield message
        self.print_progress(self.total_message_count)
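For reference, a self-contained version of this iterable class, with print_progress re-assembled from the earlier slides. Assumptions beyond the slides: a callback parameter (borrowed from the final API) so output can be captured, a timezone-aware clock in place of utcnow(), a guard for an empty input, and rounded percentages. It is a sketch, not the author's published code:

```python
from datetime import datetime, timedelta, timezone


def _now():
    return datetime.now(timezone.utc)


class ProgressTracker:
    """Iterable wrapper assembled from the earlier slides; `callback` mirrors the final API."""

    def __init__(self, iterable, callback=print):
        self.iterable = iterable
        self.total_message_count = len(iterable)  # requires a sized iterable, e.g. a list
        self.callback = callback
        self.start_time = None

    def print_progress(self, messages_processed):
        duration = _now() - self.start_time
        seconds_so_far = duration.total_seconds()
        rate = messages_processed / seconds_so_far if seconds_so_far != 0 else None
        total = self.total_message_count
        percent = messages_processed / total * 100 if total else 100.0  # empty input counts as done
        eta = timedelta(seconds=(100 - percent) / percent * seconds_so_far) if percent != 0 else None
        self.callback(f"Processed {messages_processed} messages ({percent:.1f}%) in {duration} "
                      f"({rate}/s, ETA: {eta})")

    def __iter__(self):
        if self.start_time is None:
            self.start_time = _now()
        for index, message in enumerate(self.iterable):
            if index % 100 == 0:
                self.print_progress(index)
            yield message
        self.print_progress(self.total_message_count)


lines = []
for message in ProgressTracker(list(range(250)), callback=lines.append):
    pass  # process(message) would go here
print(len(lines))  # 4
```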
Results
Back down to 3 lines:
from progress_tracker import ProgressTracker
for message in ProgressTracker(messages):
    process(message)
Limitations
Currently I have a hard-coded “output every 100 entries”
• This might be far too much output, especially if you are processing millions of messages.
You might want to output only every 10%
But every 10% might be too long between reports
So you might also want to output every 30 seconds as well.
Or perhaps more complicated conditions.
i.e. you want to be able to customize the conditions that trigger output.
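One way to make the triggers customizable is to model each one as a small callable and report when any of them fires. A sketch under that assumption; every_n_records, every_x_percent and every_n_seconds echo the final API's parameter names, but these closure-based implementations are hypothetical:

```python
import time


def every_n_records(n):
    """Hypothetical condition: fire when the record count is a multiple of n."""
    return lambda count: count % n == 0


def every_x_percent(x, total):
    """Hypothetical condition: fire each time another x percent of `total` is crossed.

    Integer arithmetic avoids float rounding; assumes integer x and total > 0.
    """
    last_band = -1

    def check(count):
        nonlocal last_band
        band = count * 100 // (x * total)
        if band > last_band:
            last_band = band
            return True
        return False

    return check


def every_n_seconds(n, clock=time.monotonic):
    """Hypothetical condition: fire if n seconds have passed since it last fired (or was created)."""
    last = clock()

    def check(count):
        nonlocal last
        now = clock()
        if now - last >= n:
            last = now
            return True
        return False

    return check


def should_report(count, conditions):
    # Build the full list first so every stateful condition sees every count
    # (any() over a generator would stop at the first True).
    return any([condition(count) for condition in conditions])


conditions = [every_n_records(250), every_x_percent(10, 1000)]
fired = [count for count in range(1000) if should_report(count, conditions)]
print(fired)  # [0, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900]
```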
Unbounded message stream
What about infinite streams of messages?
You obviously can’t compute percent complete or an ETA
But it would be nice to use the same code for both bounded and unbounded streams.
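A sketch of how one code path could serve both cases, assuming the tracker treats "no len()" as unbounded: generators and itertools.count() raise TypeError from len(), so the total becomes None and the percentage is simply left out. total_or_none and progress_line are hypothetical helpers:

```python
import itertools


def total_or_none(iterable, total=None):
    """Use an explicit total if given, else len() when available; None means unbounded."""
    if total is not None:
        return total
    try:
        return len(iterable)
    except TypeError:  # generators, itertools.count(), etc. have no len()
        return None


def progress_line(count, total):
    if total is None:
        return f"Processed {count} messages"  # no percent complete or ETA without a total
    percent = count / total * 100 if total else 100.0
    return f"Processed {count} of {total} messages ({percent:.1f}%)"


print(progress_line(5, total_or_none([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])))  # Processed 5 of 10 messages (50.0%)
print(progress_line(5, total_or_none(itertools.count())))              # Processed 5 messages
```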
Final API
ProgressTracker(
    iterable,                    # The iterable to iterate over
    total=None,                  # Override for the total message count, defaults to len(iterable)
    callback=print,              # A function (f(string): None) that gets called each time a condition matches
    format_string=None,          # Custom format string, sensible defaults for both bounded and unbounded iterables
    every_n_records=None,        # Reports every n records
    every_x_percent=None,        # Reports after every x percent
    every_n_seconds=None,        # Reports every n seconds
    every_n_seconds_idle=None,   # Reports every n seconds, but only if there hasn’t been any progress. Useful for infinite streams
    ignore_first_iteration=True, # Don’t report on the first iteration
    last_iteration=False         # Report after the last iteration
)
for message in ProgressTracker(messages, every_n_records=10000, every_x_percent=5):
    process(message)
Final API
Make it more Pythonic:
def track_progress(iterable, **kwargs):
    return ProgressTracker(iterable, **kwargs)
Example:
for message in track_progress(messages, every_n_records=10000, every_x_percent=5):
    process(message)
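To make the parameter list concrete, here is a hypothetical sketch implementing only part of it (total, callback, every_n_records, every_x_percent, ignore_first_iteration, last_iteration) together with the track_progress wrapper. Time-based triggers and format_string are left out, and the exact semantics are my reading of the comments above, not the author's implementation:

```python
def _safe_len(iterable):
    """len() when the iterable supports it, else None (treated as unbounded)."""
    try:
        return len(iterable)
    except TypeError:
        return None


class ProgressTracker:
    """Hypothetical sketch of part of the final API; not the author's implementation."""

    def __init__(self, iterable, total=None, callback=print, every_n_records=None,
                 every_x_percent=None, ignore_first_iteration=True, last_iteration=False):
        self.iterable = iterable
        self.total = total if total is not None else _safe_len(iterable)
        self.callback = callback
        self.every_n_records = every_n_records
        self.every_x_percent = every_x_percent
        self.ignore_first_iteration = ignore_first_iteration
        self.last_iteration = last_iteration

    def _report(self, count):
        if self.total:  # bounded: include percent complete
            percent = count / self.total * 100
            self.callback(f"Processed {count} of {self.total} messages ({percent:.1f}%)")
        else:  # unbounded (or empty): count only
            self.callback(f"Processed {count} messages")

    def __iter__(self):
        use_percent = self.every_x_percent is not None and bool(self.total)
        last_band = 0
        count = 0
        for item in self.iterable:
            triggered = bool(self.every_n_records) and count % self.every_n_records == 0
            if use_percent:
                band = count * 100 // (self.every_x_percent * self.total)  # integer maths
                if band > last_band:
                    last_band = band
                    triggered = True
            if triggered and not (count == 0 and self.ignore_first_iteration):
                self._report(count)
            yield item
            count += 1
        if self.last_iteration:
            self._report(count)


def track_progress(iterable, **kwargs):
    return ProgressTracker(iterable, **kwargs)


lines = []
for message in track_progress(list(range(1000)), callback=lines.append,
                              every_n_records=300, every_x_percent=25, last_iteration=True):
    pass  # process(message) would go here
```

Passing lines.append as the callback is an easy way to capture output in tests; in a script the default print applies.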
Limitations
• Single threaded
Thanks
Questions?
I’m Michael Overmeyer:
@movermeyer on every platform
