This document provides best practices for fixing data issues that occur in production databases. It recommends treating data fixes as code: checking them into source control, testing them, and putting them through code review. It also advises logging every execution, every change, and every exception. Developers should make fixes idempotent and reversible when possible, handle exceptions defensively, and plan for bottlenecks such as CPU, memory, and database load. Database snapshots should be used both for testing and for reverting changes.
21. Track execution.
● Log when a script is executed.
● Log everything that changed.
● Log what did not change.
● Centralize your logging.
● Track the script’s progress.
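26.
```python
import logging

from myapp.models import FeatureToggle  # hypothetical import path; use your app's model

logger = logging.getLogger(__name__)

# list_of_user_ids: recovered ids of users who should have the feature (defined elsewhere).
for user_id in list_of_user_ids:
    try:
        toggle = FeatureToggle.objects.get(user_id=user_id)
    except FeatureToggle.DoesNotExist:
        # Log users we skip instead of failing silently or crashing mid-run.
        logger.info('FeatureToggle does not exist for user %s', user_id)
        continue
    toggle.orientation_videos = True
    toggle.save()
```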
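34.
```python
# Build unsaved instances in memory, then insert them with one bulk query.
# Note: bulk_create does not call each model's save() or send pre_save/post_save signals.
feature_toggles = [
    FeatureToggle(user_id=user_id, orientation_videos=True)
    for user_id in user_ids_to_backfill
]
FeatureToggle.objects.bulk_create(feature_toggles)
```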
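35.
```python
import psycopg2
import psycopg2.extras  # RealDictCursor lives in the extras submodule
from tqdm import tqdm

BATCH_SIZE = 10000  # illustrative page size; tune for your table


def backfill_activity_progresses(batch_range):
    conn = psycopg2.connect("some_credentials")
    cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    data_to_replicate = []  # one list of rows per page
    for index in tqdm(batch_range):
        # Fetch one page per pass; without LIMIT/OFFSET every pass re-read the whole table.
        cursor.execute("SELECT user_id, type, correct_answers, total_answers, is_complete "
                       "FROM legacy_activity ORDER BY id LIMIT %s OFFSET %s;",
                       (BATCH_SIZE, index * BATCH_SIZE))
        data_to_replicate.append(cursor.fetchall())
    conn.close()
    add_to_new_activity_table(data_to_replicate)  # the talk's own helper, not shown
```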
37. Execute at the right level of abstraction.
● Use existing functions and the ORM when you can afford to.
● Use SQL when execution time becomes significant.
3. This is a talk about how to deal with problems that arise with your data in production.
Imagine that you’re working on a web application. It has an admin interface where admins can toggle different features on and off for different users.
Somehow, those feature toggles got messed up. Now the orientation_videos flag is set to False for a large number of users.
Fortunately, you have a way to recover which users are supposed to have that feature enabled so you can go into your database and fix the problem.
If that is your response to updating production data in your web application, I’m going to suggest that you stop and think about the variety of ways in which such an operation can go wrong.
5. The right approach is to never, ever change production data.
But of course that’s not how things work in the real world.
6. So instead, let’s talk about some realistic ways to fix your data safely. The first bit of advice should be obvious: don’t just start executing SQL queries or shell commands off the cuff. Treat these changes as code. That means checking them into source control...
7. Testing them. This might seem like a waste of time since you’re probably going to throw this code away after you run it. But better to waste a little time writing a test than a lot of time trying to reverse a catastrophic mistake to your data.
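Here is a minimal sketch of what such a test can look like. It assumes the fix is factored into a function that takes a database connection. An in-memory sqlite3 database stands in for production. The feature_toggle table and the enable_orientation_videos function are hypothetical names modeled on the orientation_videos example, not code from the talk.

```python
import sqlite3


def enable_orientation_videos(conn, user_ids):
    """Hypothetical data fix: turn orientation_videos on for the given users.

    Returns the number of rows that actually changed.
    """
    user_ids = list(user_ids)
    if not user_ids:
        return 0
    placeholders = ",".join("?" for _ in user_ids)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE feature_toggle SET orientation_videos = 1 "
            f"WHERE orientation_videos = 0 AND user_id IN ({placeholders})",
            user_ids,
        )
    return cur.rowcount


def test_enable_orientation_videos():
    # A throwaway in-memory database seeded with a known "broken" state.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature_toggle (user_id INTEGER PRIMARY KEY, orientation_videos INTEGER)"
    )
    conn.executemany(
        "INSERT INTO feature_toggle VALUES (?, ?)", [(1, 0), (2, 0), (3, 1)]
    )

    changed = enable_orientation_videos(conn, [1, 3])

    rows = dict(conn.execute("SELECT user_id, orientation_videos FROM feature_toggle"))
    assert changed == 1                # user 3 was already enabled
    assert rows == {1: 1, 2: 0, 3: 1}  # user 2 was not in the list and is untouched
```

The test seeds a deliberately broken state and asserts on both the changed rows and the untouched ones. This catches a common mistake: a WHERE clause that is too broad. pytest collects test_enable_orientation_videos automatically, or you can call it directly.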
And reviewing them: whatever code review process your team already has, put data fixes through it too.
10. Secondly, when you execute one of these scripts, the last thing that you want is code that runs silently for an indeterminate period of time, and may or may not have had the desired effect. So be very generous with your logging.
Here’s an example script. Note that we’re logging at the beginning of the function, at the end, and for every code path in between.
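The example script from the slide is not reproduced here, so the following is only a sketch of the same idea. It assumes a plain dict mapping user id to toggle fields stands in for the FeatureToggle model. The function name and log messages are hypothetical. What matters is that the start, the end, and each branch (changed, already correct, missing) all leave a log line. This follows the "log everything that changed" and "log what did not change" advice on the Track execution slide.

```python
import logging

logger = logging.getLogger("data_fix")


def backfill_orientation_videos(toggles_by_user, user_ids):
    """Enable orientation_videos for each user, logging every code path.

    `toggles_by_user` is a hypothetical stand-in for FeatureToggle rows:
    a dict mapping user_id -> {"orientation_videos": bool}.
    """
    logger.info("Starting orientation_videos backfill for %d users", len(user_ids))
    changed, unchanged, missing = [], [], []
    for user_id in user_ids:
        toggle = toggles_by_user.get(user_id)
        if toggle is None:
            logger.warning("No FeatureToggle for user %s; skipping", user_id)
            missing.append(user_id)
        elif toggle["orientation_videos"]:
            logger.info("User %s already enabled; nothing to change", user_id)
            unchanged.append(user_id)
        else:
            toggle["orientation_videos"] = True
            logger.info("User %s: orientation_videos False -> True", user_id)
            changed.append(user_id)
    logger.info(
        "Finished backfill: %d changed, %d unchanged, %d missing",
        len(changed), len(unchanged), len(missing),
    )
    return changed, unchanged, missing
```

The INFO lines only appear if logging is configured once at the top of the script, for example with logging.basicConfig(level=logging.INFO).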
Another consideration is that you should centralize your logging. These logs need to be accessible to anyone on your team who may need to look back and see what happened.
Any tool that works for you is fine, but what we use is Amazon Kinesis Firehose. We use it to write logs from our scripts to Amazon S3. What’s great about this service is the simplicity of using it.
You set up a firehose in AWS, and then writing a log to an S3 bucket is just as simple as this. Note that I have a cute little function that gets the filename of the script that’s calling this function.
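Here is a sketch of such a helper. It assumes boto3 and an existing Firehose delivery stream that writes to S3; the stream name data-fix-logs is hypothetical. It uses boto3's put_record call and the inspect module to find the calling script's filename. The record layout is an illustrative choice, not the speaker's.

```python
import inspect
import json
import os
from datetime import datetime, timezone


def _calling_script_name():
    # stack()[0] is this helper, [1] is log_to_firehose, [2] is whoever called it.
    return os.path.basename(inspect.stack()[2].filename)


def log_to_firehose(message, client=None, stream_name="data-fix-logs"):
    """Send one JSON log record to a Firehose delivery stream (e.g. one that writes to S3).

    `stream_name` is hypothetical; `client` may be injected (for example in tests).
    """
    record = {
        "script": _calling_script_name(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
    }
    if client is None:
        import boto3  # imported lazily so the module loads without boto3 installed

        client = boto3.client("firehose")
    client.put_record(
        DeliveryStreamName=stream_name,
        # Trailing newline: Firehose concatenates records in the S3 object.
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
    return record
```

Firehose concatenates records when it writes them to S3. Ending each JSON record with a newline keeps the resulting files easy to parse line by line. The signature also accepts a stub client, which makes the helper testable without AWS credentials.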
Since I imagine that you’ll still be running these scripts manually, it’s pretty important to know how far along you are.
For that, we use tqdm.
You wrap an iterable in the “tqdm” function (passing it a total count if the iterable’s length is expensive or impossible to compute), and it displays a progress bar with elapsed time and an estimate of the time remaining.
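Here is a small illustration of the total argument. It assumes a generator of user ids, and generators have no len(). The per-user work is a placeholder, and user_ids_from_source is hypothetical. The fallback shim exists only so the sketch runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:  # fallback shim so this sketch still runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


def user_ids_from_source():
    """Hypothetical generator of user ids; a generator has no len()."""
    yield from range(1, 501)


def run_fix(user_ids, total=None):
    processed = 0
    # total= lets tqdm show a percentage and ETA even though len() is unavailable.
    for user_id in tqdm(user_ids, total=total, desc="Backfilling", unit="user"):
        processed += 1  # placeholder for the real per-user fix
    return processed
```

Usage: run_fix(user_ids_from_source(), total=500).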
15. Speaking of time, your script might take a long time to run. Wouldn’t it be annoying if it ran for 3 hours and then failed partway through?
It’s always easy to forget to handle error conditions.
So it’s a good idea to practice defensive programming in this case. Think about possible exceptions and catch them, making sure to log.
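One way to apply this, sketched under the assumption that the fix can be expressed as a per-item function: catch failures per item, log them with the traceback, and keep going. That way one bad row does not end a long run. Here fix_one is a hypothetical callable supplied by the caller.

```python
import logging

logger = logging.getLogger("data_fix")


def apply_fix(items, fix_one):
    """Apply `fix_one` to each item; log and record failures instead of aborting."""
    succeeded, failed = [], []
    for item in items:
        try:
            fix_one(item)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Fix failed for item %r; continuing", item)
            failed.append(item)
        else:
            succeeded.append(item)
    logger.info("Done: %d succeeded, %d failed", len(succeeded), len(failed))
    return succeeded, failed
```

Catching Exception broadly is a deliberate choice for a one-off batch job. If you know which exceptions to expect, narrowing the except clause is safer. The returned failed list doubles as the input for a targeted re-run.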
Another thing to think about: when your script does break 2 hours in, can you safely run it again and get the desired results? In other words, is it idempotent?
This might not always be possible or necessary, but in some extreme cases we have resorted to adding a new field to a model to track which items have been backfilled.
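Here is a minimal sketch of that marker-field approach, using sqlite3 as a stand-in database. The activity table, its progress column, and the backfilled_at marker are all hypothetical. The key idea is that each row's fix and its marker are committed in the same transaction. A re-run after a crash then only touches rows that were never finished.

```python
import sqlite3
from datetime import datetime, timezone


def backfill(conn):
    """Idempotent backfill: a hypothetical backfilled_at column marks finished rows.

    Returns how many rows this run fixed; a second run returns 0.
    """
    pending = conn.execute(
        "SELECT id FROM activity WHERE backfilled_at IS NULL ORDER BY id"
    ).fetchall()
    for (row_id,) in pending:
        # The (hypothetical) fix and its marker commit together, so a crash never
        # leaves a row fixed but unmarked, or marked but unfixed.
        with conn:
            conn.execute(
                "UPDATE activity SET progress = 100, backfilled_at = ? WHERE id = ?",
                (datetime.now(timezone.utc).isoformat(), row_id),
            )
    return len(pending)
```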
Another nice feature is reversibility. If you screwed something up, is it possible to figure out what the previous state of the data was?
This is where really detailed logging comes into play. If your logs contain all of the necessary information, you could conceivably parse them to get the original state of your data and reverse the damage.
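Here is a sketch of what "logs detailed enough to reverse" might mean. It assumes records are held in a dict mapping id to row, a hypothetical stand-in for database rows. Each change is logged as a JSON line carrying both the old and the new value. A revert function then replays those lines in reverse.

```python
import json


def change_with_audit(records, key, new_value, audit_lines):
    """Set row[key] = new_value for every row, appending one JSON audit line per change."""
    for record_id, row in records.items():
        old_value = row.get(key)
        if old_value == new_value:
            continue  # nothing changed, so nothing to reverse
        audit_lines.append(json.dumps(
            {"id": record_id, "field": key, "old": old_value, "new": new_value}
        ))
        row[key] = new_value


def revert_from_audit(records, audit_lines):
    """Replay the audit log backwards, restoring each field's old value."""
    for line in reversed(audit_lines):
        entry = json.loads(line)
        records[entry["id"]][entry["field"]] = entry["old"]
```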
20. If you’re dealing with a lot of data that needs to be fixed, you’re going to start needing to do some actual engineering. You’re going to need to think about what bottlenecks you might encounter. Maybe you’re doing something computationally intensive, in which case you might need to think about how to parallelize the job.
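If the per-record work is CPU-bound, Python's concurrent.futures module is one standard way to spread it across processes. This is a sketch only; expensive_transform is a hypothetical placeholder for whatever computation your fix performs.

```python
from concurrent.futures import ProcessPoolExecutor


def expensive_transform(value):
    """Hypothetical CPU-bound computation for one record."""
    return sum(i * i for i in range(value))


def transform_all(values, max_workers=4, chunksize=100):
    """Run expensive_transform over `values` in worker processes, preserving order."""
    # chunksize batches items per task to cut inter-process overhead.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(expensive_transform, values, chunksize=chunksize))
```

Call transform_all from under an if __name__ == "__main__": guard. On platforms that start workers with the spawn method, the main module is re-imported in every worker.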
Or maybe the naive version of your script is going to load several GB of data into memory. In that case, you might need to rewrite it to stream the data through a generator instead.
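Here is a sketch of the generator approach, using sqlite3 for illustration. Rows are fetched in fixed-size batches with fetchmany and yielded one at a time, so only one batch is held in memory. With psycopg2 the equivalent requires a named (server-side) cursor, created by passing a name to conn.cursor(). A regular psycopg2 cursor downloads the entire result set when the query executes.

```python
import sqlite3


def iter_rows(conn, query, params=(), batch_size=1000):
    """Yield rows one at a time while holding at most one batch in memory."""
    cursor = conn.execute(query, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch
```

Usage: for row in iter_rows(conn, "SELECT ...", batch_size=5000), apply the fix to each row as it arrives.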
More likely, the database is going to be your big issue.
In that case, it’s time to explore some of the features of your ORM. For example, here’s a construct in the Django ORM that I’ve used to bulk create objects instead of creating them one by one. That can be a huge time saver.
If you’re still running too slowly, you might want to drop down directly to the SQL level and skip object instantiation and so on. You can get huge performance gains this way.
But please, please don’t do these things if you don’t need to. I’ve gotten into PR debates with coworkers who wanted to over-optimize a script that takes 15 minutes to run. Don’t get fancy.
So my general rule of thumb is to use the ORM when you can, and use SQL or the equivalent when you need to.
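As one illustration of dropping to SQL, the per-row get()/save() loop from the orientation_videos example could become a single set-based UPDATE. This sketch uses sqlite3 and hypothetical table names. It loads the target ids into a temporary table, so the statement does not depend on a very large IN (...) parameter list.

```python
import sqlite3


def enable_for_users_sql(conn, user_ids):
    """Set-based fix: one UPDATE instead of one ORM save() per row.

    Table and column names are hypothetical, modeled on the talk's example.
    """
    with conn:  # a single transaction for the whole fix
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS ids_to_fix (user_id INTEGER PRIMARY KEY)")
        conn.execute("DELETE FROM ids_to_fix")
        conn.executemany("INSERT OR IGNORE INTO ids_to_fix VALUES (?)", ((u,) for u in user_ids))
        cur = conn.execute(
            "UPDATE feature_toggle SET orientation_videos = 1 "
            "WHERE orientation_videos = 0 "
            "AND user_id IN (SELECT user_id FROM ids_to_fix)"
        )
    return cur.rowcount
```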
24. Let’s talk about another issue that should be obvious but needs to be said: backups are your friend. First of all, if it’s at all feasible, set up a snapshot of your data and run your script against that before running it against the real thing.
If you’re going to be doing something risky, please back up your data before you execute your script. Right before.
If you make significant changes like this fairly often, consider automating your database backups so that a snapshot is taken right before each script runs.
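Here is a sketch of a snapshot-then-run wrapper. SQLite's online backup API (sqlite3.Connection.backup, available since Python 3.7) stands in for your database's own mechanism, such as pg_dump or a managed-service snapshot. The fix argument is a hypothetical callable that receives a connection to the live database.

```python
import os
import sqlite3
from datetime import datetime, timezone


def snapshot_then_run(db_path, fix, snapshot_dir="."):
    """Snapshot the database to a timestamped file, then run `fix` on the live copy.

    Returns the snapshot path so it can be restored from if the fix goes wrong.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snapshot_path = os.path.join(snapshot_dir, f"snapshot-{stamp}.db")
    live = sqlite3.connect(db_path)
    try:
        snapshot = sqlite3.connect(snapshot_path)
        try:
            live.backup(snapshot)  # taken immediately before the fix runs
        finally:
            snapshot.close()
        fix(live)
    finally:
        live.close()
    return snapshot_path
```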
There are lots of other considerations, but those are some of the big ones.