Hi, my name is Jason and I work for the American Craft Council. I was honored but honestly a little surprised when Nick invited me to come speak.
Our organization just passed the one-year mark since kicking off our Salesforce implementation, and we’re now mere weeks away from completing our data conversion. So I don’t feel like I have much to share about actually using Salesforce, but I can share a good story about our journey getting here.
This is our data map, detailing where all of the customer data systems and silos existed at the outset of our journey.
The primary challenge was these four core customer data systems in the middle. Note how there are no lines directly connecting them. We set out to build integrations with the fulfillment vendor and our show artist application, while migrating data from and replacing the ACT! database and Raiser’s Edge.
Now everyone said that the Raiser’s Edge migration would be difficult, and it certainly was. But that’s not the story I’m here to talk about today.
Because behind that was an even more imposing and massive challenge. The fulfillment vendor integration.
Our organization runs a magazine-subscription-based membership program, so this vendor controls all of our membership data. All editing has to go through them, and all list exports have to go through them.
This is a key piece, but it was destined to be a long road.
We started soon after kicking off our implementation last August. The vendor offered two integration methods: an API or a nightly feed of CSV files.
We initially pursued the API route, but after getting it provisioned, our Redpath developer soon discovered that the vendor doesn’t support querying by last-updated timestamp. And since pulling all 220,000-plus customer records through the API on every call isn’t feasible, we had to scrap that approach and fall back to the nightly file feed.
The file feed was provisioned in January and we re-engaged the work using a two-track approach: Redpath built the technical integration while I led the in-house effort to get the data in shape.
The two primary challenges with the dataset were its massive size and poor quality. Hundreds of hours went into cleaning this up.
In that process we identified over 25,000 duplicate mappings, narrowed our import universe down to 35,000 recently active customers, and edited a large portion of the records for consistency and accuracy.
After initially winnowing down this dataset, the challenge became keeping it clean. In six months of observing and manually merging in files from the feed, I learned a few things about what the ongoing challenges would be:
- Whenever a magazine issue is fulfilled to a customer, they pop up in the feed.
- Whenever any mailing is sent to a customer, including solicitations sent to past customers, they pop up in the feed.
- Whenever a solicitation sent to a rented list results in a do-not-contact request, a record is created for that customer and they pop up in the feed.
- A significant number of new duplicate records are created on an ongoing basis.
- The vendor sometimes merges duplicate records into a surviving record that we’ve already identified as a duplicate.
To manage this situation we had to build, as Andy Bergman said, “gates on gates on gates.”
Here’s a peek at our basic architecture. The vendor’s mainframe generates the file feed and places the result on an FTP server each night. A virtualized Windows Server (running in Azure on donated credits, thank you Microsoft) executes a Python script that reaches out and copies the files over. The free edition of Jitterbit then runs on the server to parse the CSV files and push the data into a custom Salesforce object called Temp Objects.
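The fetch step can be sketched roughly like this. This is not our actual script; the hostname, credentials, and folder paths are all placeholders, and it assumes the vendor drop is a plain FTP server:

```python
import ftplib
import pathlib

FTP_HOST = "ftp.vendor.example.com"          # hypothetical hostname
LOCAL_DIR = pathlib.Path("feeds/incoming")   # hypothetical local folder

def is_feed_file(name: str) -> bool:
    """The feed consists of CSV files; ignore anything else in the drop folder."""
    return name.lower().endswith(".csv")

def fetch_feed_files(user: str, password: str) -> list[str]:
    """Copy last night's CSV files from the vendor's FTP drop to local disk."""
    LOCAL_DIR.mkdir(parents=True, exist_ok=True)
    fetched = []
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login(user, password)
        ftp.cwd("/outbound")                 # hypothetical drop folder
        for name in ftp.nlst():
            if is_feed_file(name):
                with open(LOCAL_DIR / name, "wb") as fh:
                    ftp.retrbinary(f"RETR {name}", fh.write)
                fetched.append(name)
    return fetched
```

A scheduled task on the Windows server runs something like this each night, after which Jitterbit picks up whatever landed in the local folder.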
Apex code in Salesforce then takes over, scanning the Temp Objects for matching ID numbers and running them through a duplicate detection algorithm. If a record sent by the vendor does not match any existing contact, a new record is created.
However, if a match is found and the name or contact info has changed, the update goes into a queue for review and approval.
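The actual logic lives in Apex, but the decision it makes for each feed row can be sketched in Python. The field names and the matching key here are illustrative assumptions, not our real schema:

```python
def classify_row(row: dict, existing: dict) -> str:
    """Decide what to do with one Temp Object row from the feed.

    row      -- one parsed feed record, keyed by vendor customer number
    existing -- contacts already in Salesforce, keyed the same way
    Returns 'create', 'queue_for_review', or 'no_change'.
    """
    contact = existing.get(row["customer_number"])
    if contact is None:
        return "create"                       # no match: make a new contact
    watched = ("name", "street", "city", "email")  # hypothetical field list
    if any(row.get(f) != contact.get(f) for f in watched):
        return "queue_for_review"             # name/contact info changed: a human approves
    return "no_change"
```

The key point is the middle branch: nothing from the vendor overwrites an existing contact directly; it all lands in the queue first.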
The image on top shows the list view for our approval queue. To help make this more manageable, I created a custom field on Temp Objects that evaluates what type of change is pending via an unwieldy nested if-then-else statement. Change types we’re never, or almost never, going to want to keep are scored lower so they sink to the bottom of the list, and we can concentrate on the more likely ones at the top.
The list view for the queue lets us either check a box to approve changes or delete the change from the queue. This works pretty well when there are a few dozen or fewer to review. But when the firehose opens and the queue grows to hundreds or even a couple thousand, we need a more powerful method.
That’s where Apsona reporting and Data Loader come in. I built an Apsona report to export the queue to a CSV file. This could also be done with standard reporting, but that would require defining a custom report type first; Apsona let me start by just choosing the table to report from, which was faster and easier.
After exporting the table, I paste in a few formulas that evaluate the changes in a bit more detail, and, because the same customer may have popped up in more than one day’s feed, sort the list by customer number and date and remove duplicates.
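The sort-and-dedupe step amounts to keeping only the most recent pending change per customer, which can be sketched in Python. The column names are assumptions about the export, not our actual headers:

```python
def latest_change_per_customer(rows: list[dict]) -> list[dict]:
    """Sort by customer number and feed date, keep only the newest row for each."""
    rows = sorted(rows, key=lambda r: (r["customer_number"], r["feed_date"]))
    latest = {}
    for r in rows:
        latest[r["customer_number"]] = r   # later feed dates overwrite earlier ones
    return list(latest.values())
```

In Excel this is the same idea done by hand: sort by customer number then date, and use Remove Duplicates so the last entry per customer survives.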
Any unwanted changes are moved into a workbook labeled Delete, to get pushed back through a Data Loader delete operation that zaps them from the queue.
Meanwhile, any wanted changes are pushed back through a Data Loader update operation to flag the Boolean field for Approve Changes. Once that’s done, Salesforce takes over and updates the information on the corresponding contact record.
--IF THERE’S ENOUGH TIME, CONTINUE--
We also deal with unwanted contact records via Apsona reporting and Data Loader. We only want data from the vendor on anyone who has purchased a subscription for themselves or as a gift; we don’t want contact records for people whose only reason for existing is a “do not contact” flag. So I built an Apsona report on most recent Opportunity Contact Role. I haven’t been able to get it to cleanly return just the contacts with no OCR, but it does give me a list of all contacts created by the integration service account and whether or not they have an associated OCR. That lets me export to Excel, sort and cut the list down to just the contacts without one, and run that back through a Data Loader delete operation to zap them from the system.
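The Excel cut boils down to one filter, sketched here in Python. It assumes the export has one row per contact with an Id column and a column indicating whether an OCR exists, which are placeholders for whatever the real report emits:

```python
def contacts_to_delete(rows: list[dict]) -> list[str]:
    """Return Ids of integration-created contacts with no Opportunity Contact
    Role; these Ids go into the Data Loader delete file."""
    return [r["Id"] for r in rows if not r.get("has_ocr")]
```

The resulting Id list is exactly what a Data Loader delete operation wants as input.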
Some sage IT-pro advice I heard once is that we’re not in the business of running elegant systems; we do what it takes to meet needs and succeed. I hope this presentation gave you a couple of ideas for how to tackle challenges. Thanks for giving me the opportunity to speak this morning.