So much data so many uses with notes


Shows the use of Excel and Esri ArcGIS Desktop 10.1 to make statistics reports from Innovative Interfaces circulation data. Originally presented at IUG 2014 as part of panel: Slinging statistics and dicing data in the public library. Includes speaker notes.


  1. 1. I am Susan Lytinen, a Data Projects Specialist for the Gail Borden Public Library District in Elgin, IL, about 50 miles west of Chicago. My position was created to gather data to help us make decisions. I’ll be motoring through this presentation fairly quickly, but it’s online, with practically every word I’m saying in the presenter notes, and you can always contact me with any questions. 1
  2. 2. I started saving daily checkout data from our Innovative ILS to make maps, but I soon realized that it could be used to gain many different insights about our 3 library service points: the Main Library, the Rakow Branch, and the MediaBank disc dispenser, which is built into the outside wall of the Rakow Branch. Please note that the door that opens into the Rakow Branch is mere feet from the MediaBank. For instance, our Innovative reports of circulation by location code told us how many items were being checked out of the MediaBank, but analyzing daily checkout data told us how many individual people were using it, how often, and whether they also checked out materials from inside the Rakow Branch during the same visit. We can also produce lists of materials sent from the Main Library to the Rakow Branch, and vice versa, for collection development purposes. And these are just the first projects that occurred to me. Since I enjoy mapping, I am also including some maps that I made to show where the patrons who use each of our library buildings live. 2
  3. 3. It is always pleasant to hear an acknowledged expert recommend something you are already doing. On March 4, I watched a webinar given by the well-known library consultant Joan Frye Williams: “Measurements that matter: analyzing patron behavior” (an Infopeople webinar). Ms. Williams stressed the importance of this kind of information gathering and analysis. 3
  4. 4. I have been collecting daily checkout data since July 1, 2012. The following dates are included in the reports shown in this presentation: checkout data from 7/1/12 to 12/31/13, and patron records as of 1/27/14. These screen shots show which fields I have been saving. 4
  5. 5. Some libraries may not have OUT LOC. This field is useful, because it tells you the Innovative terminal number for the computer where the item was checked out. Are your patrons using your self-checks or still taking everything to the Circulation Desk? Are you sending a lot of material from one library branch to another? 5
  6. 6. When you are preserving information for checkouts, depending on how often you search and how short your loan periods are, you may need to search for checkouts by LOUTDATE as well. This search finds items that were checked out on your specified date, but were returned before you executed your search. Some libraries may not have LPATRON, LOUTDATE. There is no LAST OUT LOC field. You are going to miss some checkouts no matter what, but it is no use agonizing about it. I have seen recent Sierra discussion list posts about using SQL to search circ_trans and item_circ_history, but I am not there yet. 6
  7. 7. These are the Innovative Create Lists searches and exports I use. I eliminated the location “zfly” because we use that for temporary items that are created at the Circulation Desk when someone wants to check out material that is not in our database. I have started exporting csv files instead of txt because csv files will open right up in Excel – you don’t have to import them. 7
  8. 8. 8
  9. 9. I cumulate the 2 daily csv files (outdate and last outdate) into a single monthly spreadsheet. These screen shots show the daily files for December 15, 2013, and the December 2013 spreadsheet. In order to do that: • I use macros to add columns and change the column headers on the daily files. • In the LOUTDATE files, I fake an OUT LOC by assuming that the item was checked out at the library where it is normally shelved. After all, why would someone request an item to be sent to a different building, only to return it on the same day? • I add the last 5 columns. • I copy the LocationID (OUT LOC) into the Checkout library field, then use “find and replace” to change it to a one-letter library code. • The Owning library is the 1st letter of the CollID (LOCATION). • Check/Own is the Checkout library concatenated to the Owning library. • Date is the date from the DateTime field. • Month is the 1st day of the month. I am rather inept, and cannot figure out how to get Excel to just say the month and year. Then I copy and paste the multiple files into a single spreadsheet. There must be more streamlined ways to do this; I just have to find out what they are! Annual spreadsheets get to be a little unwieldy for Excel to process. 9
  10. 10. And in most cases, I want monthly statistics throughout the year. 9
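For readers who would rather script the column derivations than use macros and find-and-replace, the steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the workflow used in the presentation: the terminal numbers in `TERMINAL_TO_LIBRARY`, the `DateTime` format, and the sample row are all hypothetical, and the Month column is written as a year-month label rather than the 1st day of the month.

```python
from datetime import datetime

# Hypothetical mapping from OUT LOC terminal numbers to one-letter library codes
# (g = Main Library, r = Rakow Branch, m = MediaBank). Real terminal numbers differ.
TERMINAL_TO_LIBRARY = {"101": "g", "205": "r", "300": "m"}


def add_derived_columns(row):
    """Return a copy of one checkout row with the five derived columns added."""
    # Assumed export format for DateTime; adjust the pattern to match the real csv.
    stamp = datetime.strptime(row["DateTime"], "%m/%d/%Y %H:%M")
    checkout = TERMINAL_TO_LIBRARY[row["LocationID"]]
    owning = row["CollID"][0]  # 1st letter of the location code
    out = dict(row)
    out["Checkout library"] = checkout
    out["Owning library"] = owning
    out["Check/Own"] = checkout + owning
    out["Date"] = stamp.date().isoformat()
    out["Month"] = stamp.strftime("%Y-%m")  # year and month only
    return out


sample = {
    "PatronID": "1000792",
    "LocationID": "300",
    "CollID": "rdvd",
    "DateTime": "12/15/2013 14:05",
}
result = add_derived_columns(sample)
print(result["Check/Own"], result["Date"], result["Month"])
```

Applying the same function to every row of each daily file and appending the results would produce the monthly table without copying and pasting.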
  11. 11. Excel PivotTables are an easy way to analyze data. If you have not used them, they are not too difficult to make. The screen shots show the fields I used in the PivotTables. These tables are for the month of February 2013. The first PivotTable shows the number of items checked out per patron at each of 3 service points: m = MediaBank at the Rakow Branch, g = Main Library, r = Rakow Branch. The first patron in the table, however, is the enigmatic Patron 0. I do not know why, but a number of incomplete entries appear in each checkout file. Usually the item record information is complete, but the patronID is 0, and the DateTime is blank. I exclude these entries from the final count. A more conventional patron, Patron 1000792, had 29 MediaBank checkouts and 1 Rakow Branch checkout during the month. The second PivotTable adds the date to show whether people who checked items out from the MediaBank also went into the Rakow Branch and checked things out on the same day. 10
  12. 12. You can see that Patron 1000792 checked out from the MediaBank on 8 different days, and checked out from Rakow on 1 of the days he used the MediaBank. BETTER QUESTION: Do MediaBank users ever visit the Rakow Branch without using the MediaBank? But I didn’t think of that in time for this presentation. 10
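The first PivotTable is essentially a count of checkouts grouped by patron and service point. As an illustration only, the same tally could be sketched in Python like this; the sample rows are invented, and only the skipping of Patron 0 comes from the notes above.

```python
from collections import Counter, defaultdict


def pivot_by_patron(rows):
    """Count checkouts per patron per service point, skipping Patron 0."""
    counts = defaultdict(Counter)
    for patron_id, library in rows:
        if patron_id == "0":  # incomplete entries are excluded from the count
            continue
        counts[patron_id][library] += 1
    return counts


# Invented sample rows: (PatronID, Checkout library)
rows = [
    ("1000792", "m"),
    ("1000792", "m"),
    ("1000792", "r"),
    ("0", "g"),
    ("1000801", "g"),
]
table = pivot_by_patron(rows)
for patron_id in sorted(table):
    # One line per patron, with columns in the order g, m, r
    print(patron_id, [table[patron_id][code] for code in ("g", "m", "r")])
```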
  13. 13. I decided to consider that all items checked out by the same patron at the same site on the same day would count as one session or visit. To find out how many MediaBank visits there were during the month, I copied the second PivotTable, then sorted by MediaBank checkouts (column C). There were 1,862 MediaBank visits in February 2013. To find out how many times people checked items out from the MediaBank, and also checked items out from the Rakow Branch on the same day, I sorted lines 2 through 1,863 of the spreadsheet by number of Rakow Branch checkouts (Column E). There were 290 occasions on which patrons checked items out from both the MediaBank and the Rakow Branch on the same day. Why is it that most people who visit the MediaBank, which is right by the door of the Rakow Branch, do not also go inside the building and check something out? Are they only interested in Blu-rays, DVDs, and videogames? Are they visiting when the Rakow Branch is closed? …? 11
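The visit definition above (same patron, same site, same day) maps naturally onto sets. The following is a rough sketch with invented rows, not the sorting procedure described in the notes; it counts MediaBank visits and the subset where the same patron also checked out at the Rakow Branch that day.

```python
def count_visits(rows, site="m", other_site="r"):
    """Return (visits at site, visits at site with a same-day visit to other_site)."""
    # Collapse repeated checkouts into one visit per patron, site, and day.
    visits = {(p, lib, d) for p, lib, d in rows if p != "0"}
    site_visits = {(p, d) for p, lib, d in visits if lib == site}
    other_visits = {(p, d) for p, lib, d in visits if lib == other_site}
    return len(site_visits), len(site_visits & other_visits)


# Invented sample rows: (PatronID, Checkout library, Date)
rows = [
    ("1000792", "m", "2013-02-01"),
    ("1000792", "m", "2013-02-01"),
    ("1000792", "r", "2013-02-01"),
    ("1000792", "m", "2013-02-09"),
    ("1000801", "m", "2013-02-09"),
    ("1000801", "r", "2013-02-10"),
]
media_visits, both_same_day = count_visits(rows)
print(media_visits, both_same_day)
```

Swapping the set intersection for a set difference (`other_visits - site_visits`) would count Rakow Branch visits made without using the MediaBank.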
  14. 14. Our fiscal year runs from July to June. This report includes Joan Frye Williams’ favorite statistic, “mode”, the value which occurs most often. The statistic “days between visits” is messy to figure out, but I will continue it for a while to see if it is used. I get all the dates of the visits into a spreadsheet, insert columns to hold the number of days between the visits, and then get averages, etc. for those numbers. 12
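The “days between visits” statistic is easier to compute in code than with inserted spreadsheet columns. Here is a minimal sketch for a single patron using Python's standard `statistics` module; the visit dates are invented for illustration.

```python
from datetime import date
from statistics import mean, median, mode


def days_between_visits(visit_dates):
    """Return the gaps, in days, between consecutive distinct visit dates."""
    ordered = sorted(set(visit_dates))
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Invented visit dates for one patron
visits = [
    date(2013, 2, 1),
    date(2013, 2, 8),
    date(2013, 2, 15),
    date(2013, 2, 18),
    date(2013, 2, 25),
]
gaps = days_between_visits(visits)
print(gaps, mean(gaps), median(gaps), mode(gaps))
```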
  15. 15. It is easy to get a report of items sent from one building to the other from the monthly spreadsheet. Items owned by the Rakow Branch have Owning library = r. Items checked out at the Main Library have Checkout library = g. I found it easier to look for these 2 codes in 1 field, so Check/Own = gr. I have a list of staff administrative cards, and I use VLOOKUP to eliminate checkouts on those cards. Collection Development staff like a list of the actual items. I sort it by location code, call number, title, and datetime so that repeated loans of the same title will appear together. I delete the PatronID column. 13
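The report of items sent between buildings amounts to a filter, an exclusion list, and a sort. The sketch below shows one way that could look in Python. The staff card number, the `CallNo` and `Title` field names, and the sample rows are hypothetical, and it assumes `DateTime` is stored in a format that sorts correctly as text.

```python
# Hypothetical staff administrative card numbers to exclude
STAFF_CARDS = {"1999001"}


def items_sent(rows, check_own="gr", staff_cards=STAFF_CARDS):
    """List items with the given Check/Own code, minus staff cards and PatronID."""
    kept = [
        r for r in rows
        if r["Check/Own"] == check_own and r["PatronID"] not in staff_cards
    ]
    # Sort so that repeated loans of the same title appear together.
    kept.sort(key=lambda r: (r["CollID"], r["CallNo"], r["Title"], r["DateTime"]))
    # Drop the PatronID column from the finished list.
    return [{k: v for k, v in r.items() if k != "PatronID"} for r in kept]


rows = [
    {"PatronID": "1000792", "CollID": "rdvd", "CallNo": "DVD B", "Title": "Brave",
     "DateTime": "2013-12-15 14:05", "Check/Own": "gr"},
    {"PatronID": "1999001", "CollID": "rdvd", "CallNo": "DVD A", "Title": "Argo",
     "DateTime": "2013-12-16 10:00", "Check/Own": "gr"},
    {"PatronID": "1000801", "CollID": "rbks", "CallNo": "FIC S", "Title": "Novel",
     "DateTime": "2013-12-02 09:30", "Check/Own": "gr"},
    {"PatronID": "1000801", "CollID": "gbks", "CallNo": "FIC J", "Title": "Other",
     "DateTime": "2013-12-03 09:30", "Check/Own": "gg"},
]
report = items_sent(rows)
print([r["Title"] for r in report])
```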
  16. 16. I use a PivotTable to get the number of items sent each month by location code, and add the caption for the location code to the finished report. 14
  17. 17. A map is a format that helps you visualize data. It’s also a lot of fun to make. 15
  18. 18. The most widely-used mapping software is Esri’s ArcGIS. When I started thinking about it, I heard that it is too expensive and too difficult. The standard exclamation is “They give advanced degrees in that!” Not too difficult: we teach ourselves Excel, Access, etc. Tools: Esri tutorial, books, classes. Not too expensive: ArcGIS Desktop Basic = $1,500/year, but only $250/year for an educational or nonprofit institution using it only for administrative purposes. I used GIS Tutorial 1: Basic Workbook, by Wilpen L. Gorr and Kristen S. Kurland (Redlands, CA: Esri Press). The 5th edition came out May 3, 2013. ISBN 978-158948-335-4. List price $79.95; Amazon price $43.86. It includes access to a 180-day trial of ArcGIS® 10.1 for Desktop Advanced software and a DVD with data for working through the exercises. There are also open source GIS programs, such as Quantum GIS. This screen shows ArcMap, the ArcGIS element that you use to make maps. Maps are made up of various files. The right pane shows the files you can choose to make the map. 16
  19. 19. The center pane holds the map itself. The left pane shows the files that make up the map you are working on. You can also draw on the map, but whenever possible you want to use files that already exist. The most interesting thing about ArcMap for this project is its ability to take a spreadsheet of addresses and geocode them -- locate them on a map. 16
  20. 20. The first spreadsheet shows active patrons and their addresses (which I have whited out in the screen shot). As you know, Innovative patron records have one field that holds the entire address, but with patience it is possible to parse out the various elements. Perhaps this is easier when you get patron information via SQL? To make the map I want, I need to add a code to each patron showing which libraries that patron used. In the second spreadsheet, Columns A – D show the results of a PivotTable of checkout records. You can see how many items each patron checked out from each service point: g = Main Library, m = MediaBank, r = Rakow Branch. In Columns E – G, I used formulas to change a checkout number greater than 0 into the letter for each building. You can see the formula I used for Column E in the screen shot. In Column H, I concatenated all 3 building codes, so you could see where the patron checked out items. But I was worried that the map would be too hard to read if I used all 3 locations (too many code combinations). So I simplified the codes to only 2 locations in Column I. 17
  21. 21. Since the MediaBank is located at the Rakow Branch, I used “r” to mean either the MediaBank or the Rakow Branch. Another thing to notice is that the PatronID in the patron record is 2 characters longer than the PatronID in the checkout record. The PatronID in the patron record has a “p” at the beginning and a check digit at the end. 17
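The building-code columns described above can be expressed as two small functions. This is only an illustration of the logic, not the Excel formulas shown in the screen shot; the function names are made up for the example.

```python
def three_site_code(g, m, r):
    """Concatenate a letter for each service point with at least one checkout."""
    return ("g" if g > 0 else "") + ("m" if m > 0 else "") + ("r" if r > 0 else "")


def two_site_code(g, m, r):
    """Simplified code: 'r' stands for either the MediaBank or the Rakow Branch."""
    return ("g" if g > 0 else "") + ("r" if (m > 0 or r > 0) else "")


# A patron with 29 MediaBank checkouts and 1 Rakow Branch checkout
print(three_site_code(0, 29, 1), two_site_code(0, 29, 1))
# A patron who used the Main Library and the MediaBank
print(three_site_code(5, 2, 0), two_site_code(5, 2, 0))
```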
  22. 22. I used the Excel VLOOKUP function to add the checkout library code to the patron records. First I copied the PatronID from Column A of the patron spreadsheet into Column J and chopped off the beginning “p” and the ending check digit to make the ShortID. Then I added Column K to hold the checkout library code. Then I copied the PatronID and the checkout library code from the checkout record spreadsheet into Columns O – P of the patron spreadsheet. I used VLOOKUP to copy the checkout library code from Column P to Column K. You can see the formula in the screen shot. If the ShortID in Column J does not match any PatronID in Column O, the formula returns “#N/A”. That means that the patron did not check out any items in the time period covered by my report, and for this project I do not want to map patrons without checkouts. I copied Column K and pasted the values back into the spreadsheet, deleted Columns O – P, and deleted the lines with “#N/A”. That made the spreadsheet ready to pull into ArcMap. 18
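In scripting terms, the VLOOKUP step is a dictionary lookup keyed on the shortened ID, with unmatched patrons dropped instead of showing “#N/A”. The following sketch assumes that the ShortID is simply the patron record ID without its first and last characters; the record IDs and addresses shown are invented.

```python
def short_id(patron_record_id):
    """Strip the leading 'p' and the trailing check digit."""
    return patron_record_id[1:-1]


def attach_checkout_codes(patrons, codes_by_short_id):
    """Keep only patrons with checkouts, adding ShortID and the library code."""
    matched = []
    for patron in patrons:
        sid = short_id(patron["PatronID"])
        code = codes_by_short_id.get(sid)
        if code is not None:  # no match means no checkouts in the period
            matched.append({**patron, "ShortID": sid, "Two": code})
    return matched


# Invented patron records and checkout codes
patrons = [
    {"PatronID": "p10007925", "Address": "(redacted)"},
    {"PatronID": "p10008013", "Address": "(redacted)"},
]
codes = {"1000792": "r"}
mappable = attach_checkout_codes(patrons, codes)
print(mappable)
```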
  23. 23. I pulled the spreadsheet of patrons into ArcMap and geocoded the addresses. If you have your own file of mapped reference addresses, the geocoding operation is free. However, if you want to use Esri’s online World Geocode Service, as I did, there is an additional charge. You need to tell ArcMap which fields in your table contain the address parts. 19
  24. 24. Of 44,557 patrons in the spreadsheet, 36,374 (~82%) were geocoded. You have a chance to go over the records that did not geocode and match them manually to possible addresses in the reference file, but I did not do that in this case. The file that was formed by geocoding is called a shapefile. The shapefile is accompanied by an attribute table that has the fields from the Excel spreadsheet, and additional fields with address information from Esri. Again, I have whited out the addresses. You can use data from the attribute table to change the way the shapefile looks. You can also add data to the table. 20
  25. 25. I threw in a background map, added shapefiles of the library district boundaries and the 2 library buildings, and colored the dots by checkout library: Main Library only (27,077), Rakow Branch only (3,215), or both (6,082). Unfortunately, the dots cover each other, although I tried to layer them so that the Main Library only patrons were on the bottom with the largest dot and the faintest color, and the Rakow Branch only patrons on the top with the smallest dot and the most vivid color. 21
  26. 26. When you look at maps which show only one type of patron at a time, you can see how misleading the 3-color map is. 22
  27. 27. 23
  28. 28. 24
  29. 29. I made the dots smaller, so they would not overlay each other. You can see how difficult it can be to make a map communicate information effectively. However, it is obvious that patrons do not always go to the library that is closest to them. I wanted to find out how many patrons go to the library that is farther from them. 25
  30. 30. ArcMap has a “measure” tool that tells you the distance between 2 points, but not between 1 point and 36,374 other points. There is a tool that will measure the distances between large numbers of points, but it is not included in the basic version of ArcGIS. However, there is another way to record the approximate distance of each patron from each library building. I used the “measure” tool to find out that the distance from the Rakow Branch to the farthest patron in the northeast corner of the district is 8.2 miles. 26
  31. 31. “Select by location” let me select all patrons within a certain distance from the Rakow Branch. I used ¼ mile increments to measure how far patrons live from the branch. Since we have already established that the distance from the Rakow Branch to the farthest patron is 8.2 miles, it is not surprising that when I searched by 8.25 miles, I got all 36,374 patrons. However, when I searched by 8.0 miles, a few patrons in the northeast corner of the district were not selected. That means that those patrons live between 8.25 and 8.0 miles from the Rakow Branch. They can be seen on the map because their dots are dark, instead of the fluorescent blue that you see when a feature is selected. I wanted to label those patrons “8.25” by adding the information to the shapefile’s attribute table. 27
  32. 32. When you select features on the map, the attribute table’s lines for those features are also selected (highlighted in fluorescent blue). You can choose to see either all the lines in the table, or just the lines that have been selected. I chose to see the selected lines, and the attribute table told me there were 36,242 out of 36,374. That means that 132 patrons live between 8.25 and 8.0 miles from the Rakow Branch, and I wanted to label them “8.25”. I added a field to the attribute table called “RakowDist” to hold this information. Fortunately, the attribute table has a handy icon that lets you reverse the selection on the map (and on the attribute table). When you click on that icon, the dots that were highlighted turn dark, and the dots that were dark become highlighted. As you will see on the next page, the highlighted lines in the attribute table change, too. 28
  33. 33. When you choose to look at only the selected lines in the attribute table, there are now only 132. After you tell ArcMap that you want to edit the table, you can copy 132 lines from an Excel spreadsheet and paste them into the attribute table. 29
  34. 34. The next step is to search by location for patrons who live within 7.75 miles of the Rakow Branch. Then I reversed the selection using the attribute table. You can see that the little triangle of patrons in the northeast corner is bigger. These people live between 8.25 and 7.75 miles from the branch. This includes the 132 people who live between 8.25 and 8.0 miles from the branch, the ones we found in the previous search. 30
  35. 35. The attribute table tells us that 351 people live between 8.25 and 7.75 miles from the branch. However, 132 of those people already have “8.25” in the “RakowDist” column. I sorted the column largest to smallest. Then I pasted “8.0” into the lines where RakowDist = 0, lines 133 – 351. I repeated this, decreasing the distance by 0.25 miles each time, until I had a distance from the Rakow Branch for each patron. I did the whole thing over again to get the distance from the Main Library for each patron. 31
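If the geocoded points are exported with latitude and longitude, the ¼ mile bands could also be assigned by calculation. The sketch below swaps in a different technique from the repeated “Select by location” passes described above: it uses the haversine great-circle formula and rounds each distance up to the next 0.25 miles, so a patron 8.1 miles away is labeled 8.25, as in the notes. The coordinates are hypothetical and are not the real location of any library or patron.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band(miles, step=0.25):
    """Round a distance up to the outer edge of its band (e.g. 8.1 -> 8.25)."""
    return math.ceil(miles / step) * step


# Hypothetical coordinates for a library building and one patron
library = (42.00, -88.30)
patron = (42.05, -88.20)
miles = haversine_miles(*library, *patron)
print(round(miles, 2), distance_band(miles))
```

Straight-line distance ignores the road network, so like the banding in the notes it is only an approximation of how far a patron actually travels.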
  36. 36. The population around the Main Library is denser, so a map color-coded by distance for the Main Library is more striking than the map for the Rakow Branch. I thought a “heat map”, with the colors gradually going more blue the closer they were to the building, would be effective, but it is hard to read. 32
  37. 37. Concentric bands of contrasting colors are easier to see. 33
  38. 38. To find out how many patrons live closer to the Main Library, but go only to the Rakow Branch, I used “Select by attributes.” I chose patrons where the column “Two” (the code for the checkout library) = “r” and the distance to the Main Library is less than the distance to the Rakow Branch. There are 3,215 patrons who go only to the Rakow Branch. 235, or 7.3%, live closer to the Main Library than to the Rakow Branch. Are these people drawn to the Rakow Branch by the MediaBank? Not all of them, as you can see from the attribute table. Several patrons have an “r” in the column that shows Rakow Branch use, but no “m” in the column that shows MediaBank use. 34
  39. 39. What other factors would there be? The yellow dots showing these patrons are not grouped in a limited geographic area. Perhaps they work or shop by the Rakow Branch? 35
  40. 40. To find out how many patrons live closer to the Rakow Branch, but go only to the Main Library, I chose patrons where the column “Two” (the code for the checkout library) = “g” and the distance to the Main Library is more than the distance to the Rakow Branch. There are 27,077 patrons who go only to the Main Library. 3,079, or 11.4%, live closer to the Rakow Branch than to the Main Library. 36
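The two “Select by attributes” queries and their percentages can be checked with a short filter. In this sketch, “Two” and “RakowDist” are the field names mentioned in the notes, while “MainDist” is an assumed name for the Main Library distance field; the three sample patrons are invented, and the percentages use the counts reported above.

```python
def goes_to_farther_library(patrons, code):
    """Select patrons who use only one library but live closer to the other."""
    if code == "r":  # Rakow-only patrons who live closer to the Main Library
        return [p for p in patrons if p["Two"] == "r" and p["MainDist"] < p["RakowDist"]]
    # Main-only patrons who live closer to the Rakow Branch
    return [p for p in patrons if p["Two"] == "g" and p["MainDist"] > p["RakowDist"]]


def share(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Invented sample patrons
patrons = [
    {"Two": "r", "MainDist": 1.25, "RakowDist": 4.0},
    {"Two": "r", "MainDist": 5.0, "RakowDist": 0.75},
    {"Two": "g", "MainDist": 6.0, "RakowDist": 0.5},
]
print(len(goes_to_farther_library(patrons, "r")), len(goes_to_farther_library(patrons, "g")))

# Percentages from the counts reported in the notes
print(share(235, 3215), share(3079, 27077))
```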
  41. 41. As you can see, some of the Main Library only patrons live very close to the Rakow Branch. There are 3,215 patrons who exclusively go to the Rakow Branch, and 3,079 patrons who live closer to Rakow yet shun it. 37
  42. 42. To avoid taking too much time and inducing boredom, I skipped steps that I used in making these reports. Please feel free to email or call me with any questions. My ArcGIS skills are not extensive, but, as you can see, the software is fun to experiment with. 38
  43. 43. 39