I am Susan Lytinen, a Data Projects Specialist for the Gail Borden Public Library District in
Elgin, IL, about 50 miles west of Chicago.
My position was created to gather data to help us make decisions.
I’ll be motoring through this presentation fairly quickly, but it’s online, with practically every
word I’m saying in the presenter notes, and you can always contact me with any questions.
I started saving daily checkout data from our Innovative ILS to make maps, but I
soon realized that it could be used to gain many different insights about our 3
library service points: the Main Library, the Rakow Branch, and the MediaBank disc
dispenser, which is built into the outside wall of the Rakow Branch. Please note that
the door that opens into the Rakow Branch is mere feet from the MediaBank.
For instance, our Innovative reports of circulation by location code told us how
many items were being checked out of the MediaBank, but analyzing daily checkout
data told us how many individual people were using it, how often, and whether
they also checked out materials from inside the Rakow Branch during the same visit.
We can also produce lists of materials sent from the Main Library to the Rakow
Branch, and vice versa, for collection development purposes.
And these are just the first projects that occurred to me.
Since I enjoy mapping, I am also including some maps that I made to show where
the patrons who use each of our library buildings live.
It is always pleasant to hear an acknowledged expert recommend something you
are already doing.
On March 4, I watched a webinar given by the well-known library consultant Joan
Frye Williams --
Measurements that matter: analyzing patron behavior : an Infopeople webinar /
Joan Frye Williams
https://infopeople.org/civicrm/event/info?id=377&reset=1
Ms. Williams stressed the importance of this kind of information gathering and
analysis.
I have been collecting daily checkout data since July 1, 2012. The following dates are
included in the reports shown in this presentation.
Checkout data from 7/1/12-12/31/13
Patron records as of 1/27/14
These screen shots show which fields I have been saving.
Some libraries may not have OUT LOC. This field is useful because it tells you the
Innovative terminal number of the computer where the item was checked out. Are your
patrons using your self-checks, or are they still taking everything to the Circulation Desk?
Are you sending a lot of material from one library branch to another?
When you are preserving information for checkouts, depending on how often you
search and how short your loan periods are, you may need to search for checkouts
by LOUTDATE as well. This search finds items that were checked out on your
specified date, but were returned before you executed your search.
Some libraries may not have LPATRON or LOUTDATE.
There is no LAST OUT LOC field.
You are going to miss some checkouts no matter what, but it is no use agonizing
about it.
I have seen recent Sierra discussion list posts about using SQL to search circ_trans
and item_circ_history, but I am not there yet.
These are the Innovative Create Lists searches and exports I use.
I eliminated the location “zfly” because we use that for temporary items that are created at
the Circulation Desk when someone wants to check out material that is not in our
database.
I have started exporting csv files instead of txt because csv files will open right up in Excel –
you don’t have to import them.
I cumulate the 2 daily csv files (outdate and last outdate) into a single monthly
spreadsheet. These screen shots show the daily files for December 15, 2013, and the
December 2013 spreadsheet.
In order to do that:
• I use macros to add columns and change the column headers on the daily files.
• In the LOUTDATE files, I fake an OUT LOC by assuming that the item was checked out at
the library where it is normally shelved. After all, why would someone request an item
to be sent to a different building, only to return it on the same day?
• I add the last 5 columns.
• I copy the LocationID (OUT LOC) into the Checkout library field, then use “find
and replace” to change it to a one-letter library code.
• The Owning library is the 1st letter of the CollID (LOCATION).
• Check/Own is the Checkout library concatenated to the Owning library.
• Date is the date from the DateTime field.
• Month is the 1st day of the month. I am rather inept, and cannot figure out how
to get Excel to just say the month and year.
Then I copy and paste the multiple files into a single spreadsheet.
There must be more streamlined ways to do this – I just have to find out what they are!
Annual spreadsheets get to be a little unwieldy for Excel to process, and in most cases, I
want monthly statistics throughout the year.
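For anyone who would rather script this step than run Excel macros, here is a minimal Python sketch of the same derived columns. It is not the workflow described above, only an illustration under assumptions: the daily CSV exports already carry the renamed headers PatronID, LocationID, CollID and DateTime; the DateTime text looks like `12-15-2013 14:05` (the `DATETIME_FORMAT` value is a guess to adjust to your own files); and `TERMINAL_TO_LIBRARY` is a hypothetical table you would fill in with your own terminal numbers.

```python
import csv
from datetime import datetime

# Hypothetical table: Innovative terminal number (OUT LOC) -> one-letter library code.
TERMINAL_TO_LIBRARY = {"101": "g", "201": "r", "301": "m"}

# Assumed DateTime layout in the export; change it to match your files.
DATETIME_FORMAT = "%m-%d-%Y %H:%M"


def add_derived_columns(row, from_loutdate):
    """Return a copy of one checkout row with the five derived columns added."""
    owning = row["CollID"].strip()[:1]  # Owning library = 1st letter of CollID
    if from_loutdate:
        # LOUTDATE rows have no OUT LOC: assume checkout where the item is shelved.
        checkout = owning
    else:
        # Unknown terminal numbers are marked "?" so they are easy to spot.
        checkout = TERMINAL_TO_LIBRARY.get(row["LocationID"].strip(), "?")
    stamp = datetime.strptime(row["DateTime"].strip(), DATETIME_FORMAT)
    out = dict(row)
    out["Checkout library"] = checkout
    out["Owning library"] = owning
    out["Check/Own"] = checkout + owning
    out["Date"] = stamp.date().isoformat()
    out["Month"] = stamp.strftime("%Y-%m")  # month and year only
    return out


def build_monthly(daily_files):
    """Cumulate (lines, from_loutdate) pairs into one list of enriched rows.

    Incomplete entries (PatronID 0 or a blank DateTime) are skipped.
    """
    rows = []
    for lines, from_loutdate in daily_files:
        for row in csv.DictReader(lines):
            if row["PatronID"].strip() == "0" or not row["DateTime"].strip():
                continue
            rows.append(add_derived_columns(row, from_loutdate))
    return rows
```

Each element of `daily_files` can be an open file or any iterable of CSV lines, paired with a flag saying whether it came from the LOUTDATE search, so one month of OUTDATE and LOUTDATE files can be cumulated in a single call.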
Excel PivotTables are an easy way to analyze data. If you have not used them, they are not
too difficult to make. The screen shots show the fields I used in the PivotTables.
These tables are for the month of February 2013.
The first PivotTable shows the number of items checked out per patron at each of 3 service
points:
m = MediaBank at the Rakow Branch
g = Main Library
r = Rakow Branch
The first patron in the table, however, is the enigmatic Patron 0.
I do not know why, but a number of incomplete entries appear in each checkout file.
Usually the item record information is complete, but the patronID is 0, and the DateTime is
blank. I exclude these entries from the final count.
A more conventional patron, Patron 1000792, had 29 MediaBank checkouts and 1 Rakow
Branch checkout during the month.
The second PivotTable adds the date to show whether people who checked items out from
the MediaBank also went into the Rakow Branch and checked things out on the same day.
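The two PivotTables are essentially grouped counts. As a rough sketch (not a replacement for the PivotTables), the Python below produces the same tallies from rows that are assumed to be dictionaries with PatronID, Checkout library and Date keys, leaving out the incomplete Patron 0 entries.

```python
from collections import Counter, defaultdict


def checkouts_per_patron(rows):
    """Like the first PivotTable: counts per PatronID and service point."""
    table = defaultdict(Counter)
    for row in rows:
        if row["PatronID"] == "0":  # leave out the enigmatic Patron 0
            continue
        table[row["PatronID"]][row["Checkout library"]] += 1
    return table


def checkouts_per_patron_day(rows):
    """Like the second PivotTable: the same counts, split by Date."""
    table = defaultdict(Counter)
    for row in rows:
        if row["PatronID"] == "0":
            continue
        table[(row["PatronID"], row["Date"])][row["Checkout library"]] += 1
    return table
```

Looking up `checkouts_per_patron(rows)["1000792"]["m"]` would give that patron's MediaBank total for the month; a missing service point simply counts as 0.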
You can see that Patron 1000792 checked out from the MediaBank on 8 different
days, and checked out from Rakow on 1 of the days he used the MediaBank.
BETTER QUESTION: Do MediaBank users ever visit the Rakow Branch without using
the MediaBank? But I didn’t think of that in time for this presentation.
I decided to consider that all items checked out by the same patron at the same site on the
same day would count as one session or visit.
To find out how many MediaBank visits there were during the month, I copied the second
PivotTable, then sorted by MediaBank checkouts (column C). There were 1,862 MediaBank
visits in February 2013.
To find out how many times people checked items out from the MediaBank, and also
checked items out from the Rakow Branch on the same day, I sorted lines 2 through 1,863
of the spreadsheet by number of Rakow Branch checkouts (Column E). There were 290
occasions on which patrons checked items out from both the MediaBank and the Rakow
Branch on the same day.
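The visit rule above (same patron, same site, same day = one visit) can also be expressed as a set of sites per patron per day, which avoids the copy-and-sort steps. This Python sketch assumes rows shaped like the PivotTable input (PatronID, Checkout library, Date). Its third count is an illustrative, day-level take on the "better question": days on which a patron checked out inside the Rakow Branch without using the MediaBank.

```python
from collections import defaultdict


def visit_counts(rows):
    """Count visits, where one patron at one site on one day is one visit.

    Returns (MediaBank visits, days with both MediaBank and Rakow Branch
    checkouts, days with Rakow Branch checkouts but no MediaBank checkouts).
    """
    sites_by_day = defaultdict(set)
    for row in rows:
        if row["PatronID"] == "0":  # skip incomplete entries
            continue
        sites_by_day[(row["PatronID"], row["Date"])].add(row["Checkout library"])
    mediabank = sum(1 for sites in sites_by_day.values() if "m" in sites)
    both = sum(1 for sites in sites_by_day.values() if {"m", "r"} <= sites)
    rakow_only = sum(
        1 for sites in sites_by_day.values() if "r" in sites and "m" not in sites
    )
    return mediabank, both, rakow_only
```

Run over the February 2013 rows, the first two numbers should correspond to the 1,862 MediaBank visits and 290 combined visits found by sorting, provided the same rows are included.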
Why is it that most people who visit the MediaBank, which is right by the door of the
Rakow Branch, do not also go inside the building and check something out? Are they only
interested in Blu-rays, DVDs, and videogames? Are they visiting when the Rakow Branch is
closed? …?
Our fiscal year runs from July to June.
This report includes Joan Frye Williams’ favorite statistic, “mode”, the value which occurs
most often.
The statistic “days between visits” is messy to figure out, but I will continue it for a while to
see if it is used. I get all the dates of the visits into a spreadsheet, insert columns to hold
the number of days between the visits, and then get averages, etc. for those numbers.
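The "days between visits" statistic is less messy in a script. A minimal Python sketch, assuming you already have each patron's visit dates as `datetime.date` values: it sorts the distinct dates, takes the gaps between consecutive visits, pools the gaps for all patrons, and reports mean, median and mode (`statistics.multimode` returns every value tied for most frequent).

```python
from datetime import date
from statistics import mean, median, multimode
from typing import Dict, Iterable, List


def days_between_visits(visit_dates: Iterable[date]) -> List[int]:
    """Gaps, in days, between one patron's consecutive distinct visit dates."""
    ordered = sorted(set(visit_dates))
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def summarize_gaps(dates_by_patron: Dict[str, List[date]]) -> dict:
    """Pool every patron's gaps and report mean, median and mode(s)."""
    gaps: List[int] = []
    for visit_dates in dates_by_patron.values():
        gaps.extend(days_between_visits(visit_dates))
    if not gaps:  # nobody visited more than once
        return {}
    return {"mean": mean(gaps), "median": median(gaps), "mode": multimode(gaps)}
```

Patrons with a single visit contribute no gaps, which matches leaving their "days between" cells empty in a spreadsheet.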
It is easy to get a report of items sent from one building to the other from the monthly
spreadsheet.
Items owned by the Rakow Branch have Owning library = r.
Items checked out at the Main Library have Checkout library = g.
I found it easier to look for these 2 codes in 1 field, so Check/Own = gr.
I have a list of staff administrative cards, and I use VLOOKUP to eliminate checkouts on
those cards.
Collection Development staff like a list of the actual items. I sort it by location code, call
number, title, and datetime so that repeated loans of the same title will appear together. I
delete the PatronID column.
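The filter, staff-card exclusion, sort and column deletion can be sketched in Python as well. This assumes rows carry Check/Own, PatronID, CollID and DateTime as described, plus two hypothetical column names, CallNo and Title, for the call number and title; a set of staff card IDs stands in for the VLOOKUP. Sorting DateTime as text is only chronological within one month and year of mm-dd-yyyy values, which is fine for a monthly report.

```python
def transfer_report(rows, check_own, staff_cards):
    """List loans with the given Check/Own code, minus staff cards.

    check_own="gr" picks Rakow-owned items checked out at the Main Library.
    Rows are sorted by location code, call number, title and datetime, and the
    PatronID column is dropped.
    """
    kept = [
        {key: value for key, value in row.items() if key != "PatronID"}
        for row in rows
        if row["Check/Own"] == check_own and row["PatronID"] not in staff_cards
    ]
    kept.sort(key=lambda r: (r["CollID"], r["CallNo"], r["Title"], r["DateTime"]))
    return kept
```

Because repeated loans of one title share CollID, CallNo and Title, they end up next to each other, ordered by date.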
I use a PivotTable to get the number of items sent each month by location code, and add
the caption for the location code to the finished report.
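As a sketch of that PivotTable plus the caption lookup, the Python below counts transferred items per Month and location code. The `captions` dictionary is a hypothetical stand-in for whatever list of location-code captions you keep; codes without a caption get an empty string.

```python
from collections import Counter


def sent_by_location(report_rows, captions):
    """Count transferred items per Month and location code, with each code's caption."""
    counts = Counter((row["Month"], row["CollID"]) for row in report_rows)
    return [
        (month, code, captions.get(code, ""), n)
        for (month, code), n in sorted(counts.items())
    ]
```

The output is sorted by month and then by location code, one tuple per row of the finished report.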
A map is a format that helps you visualize data. It’s also a lot of fun to make.
The most widely-used mapping software is Esri’s ArcGIS. When I started thinking about it, I
heard that it is too expensive and too difficult. The standard exclamation is “They give
advanced degrees in that!”
Not too difficult: we teach ourselves Excel, Access, etc. Tools: Esri tutorial, books, classes.
Not too expensive: ArcGIS Desktop Basic costs $1,500/year, but only $250/year for an
educational or nonprofit institution that uses it only for administrative purposes.
I used GIS Tutorial 1: Basic Workbook, by Wilpen L. Gorr and Kristen S. Kurland (Redlands,
CA: Esri Press). The 5th edition came out May 3, 2013. ISBN 978-1-58948-335-4. List price
$79.95; Amazon price $43.86. It includes access to a 180-day trial of ArcGIS® 10.1
for Desktop Advanced software and a DVD with data for working through the
exercises.
There are also open source GIS programs, such as Quantum GIS.
This screen shows ArcMap, the ArcGIS element that you use to make maps.
Maps are made up of various files.
The right pane shows the files you can choose to make the map.
The center pane holds the map itself.
The left pane shows the files that make up the map you are working on.
You can also draw on the map, but whenever possible you want to use files that already
exist.
The most interesting thing about ArcMap for this project is its ability to take a spreadsheet of
addresses and geocode them -- locate them on a map.
The first spreadsheet shows active patrons and their addresses (which I have whited out in
the screen shot). As you know, Innovative patron records have one field that holds the
entire address, but with patience it is possible to parse out the various elements. Perhaps
this is easier when you get patron information via SQL?
To make the map I want, I need to add a code to each patron showing which libraries that
patron used.
In the second spreadsheet, Columns A – D show the results of a PivotTable of checkout
records. You can see how many items each patron checked out from each service point:
g = Main Library
m = MediaBank
r = Rakow Branch
In Columns E – G, I used formulas to change a checkout number greater than 0 into the
letter for each building. You can see the formula I used for Column E in the screen shot.
In Column H, I concatenated all 3 building codes, so you could see where the patron
checked out items.
But I was worried that the map would be too hard to read if I used all 3 locations (too many
code combinations). So I simplified the codes to only 2 locations in Column I. Since the
MediaBank is located at the Rakow Branch, I used “r” to mean either the MediaBank or the
Rakow Branch.
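The Column E – I formulas boil down to "letter if the count is greater than 0". A minimal Python sketch of both codes, assuming the counts come in the g, m, r order of the PivotTable columns: the first value mirrors Column H, and the second mirrors Column I, where any MediaBank use counts as Rakow Branch use.

```python
def site_codes(g_count, m_count, r_count):
    """Turn one patron's checkout counts into the Column H and Column I codes.

    Column H concatenates a letter for every service point used (g, m, r).
    Column I folds the MediaBank into the Rakow Branch, leaving only g and r.
    """
    all_sites = (
        ("g" if g_count > 0 else "")
        + ("m" if m_count > 0 else "")
        + ("r" if r_count > 0 else "")
    )
    two_sites = ("g" if g_count > 0 else "") + ("r" if m_count > 0 or r_count > 0 else "")
    return all_sites, two_sites
```

With only two possible letters, the simplified code has just three map categories: "g", "r" and "gr".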
Another thing to notice is that the PatronID in the patron record is 2 characters longer than
the PatronID in the checkout record. The PatronID in the patron record has a “p” at the
beginning and a check digit at the end.
I used the Excel VLOOKUP function to add the checkout library code to the patron records.
First I copied the PatronID from Column A of the patron spreadsheet into Column J and
chopped off the beginning “p” and the ending check digit to make the ShortID. Then I
added Column K to hold the checkout library code.
Then I copied the PatronID and the checkout library code from the checkout record
spreadsheet into Columns O – P of the patron spreadsheet.
I used VLOOKUP to copy the checkout library code from Column P to Column K. You can see
the formula in the screen shot.
If the ShortID in Column J does not match any PatronID in Column O, the formula returns
“#N/A”. That means that the patron did not check out any items in the time period covered
by my report, and for this project I do not want to map patrons without checkouts.
I copied Column K and pasted the values back into the spreadsheet, deleted Columns O – P,
and deleted the lines with “#N/A”. That made the spreadsheet ready to pull into ArcMap.
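The ShortID trick and the VLOOKUP can be sketched as a dictionary lookup in Python. This assumes patron rows are dictionaries with a PatronID key holding the full record number ("p", digits, check digit) and that `code_by_short_id` maps checkout-record PatronIDs to the two-location code; the column name "Two" follows the later attribute-table discussion. A missing key plays the role of "#N/A", so those patrons are simply left out.

```python
def short_id(record_number):
    """Drop the leading "p" and the trailing check digit."""
    return record_number[1:-1]


def attach_checkout_codes(patrons, code_by_short_id):
    """Add the checkout library code to each patron; drop patrons with no match.

    A ShortID with no match corresponds to VLOOKUP's #N/A: no checkouts in
    the period covered by the report.
    """
    matched = []
    for patron in patrons:
        code = code_by_short_id.get(short_id(patron["PatronID"]))
        if code is not None:
            matched.append({**patron, "Two": code})
    return matched
```

Slicing off one character at each end works whether the check digit is a number or an "x".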
I pulled the spreadsheet of patrons into ArcMap and geocoded the addresses.
If you have your own file of mapped reference addresses, the geocoding operation is free.
However, if you want to use Esri’s online World Geocode Service, as I did, there is an
additional charge.
You need to tell ArcMap which fields in your table contain the address parts.
Of 44,557 patrons in the spreadsheet, 36,374 (~82%) were geocoded. You have a chance to
go over the records that did not geocode and match them manually to possible addresses
in the reference file, but I did not do that in this case.
The file that was formed by geocoding is called a shapefile.
The shapefile is accompanied by an attribute table that has the fields from the Excel
spreadsheet, and additional fields with address information from Esri. Again, I have whited
out the addresses.
You can use data from the attribute table to change the way the shapefile looks. You can
also add data to the table.
I threw in a background map, added shapefiles of the library district boundaries and
the 2 library buildings, and colored the dots by checkout library: Main Library only
(27,077), Rakow Branch only (3,215), or both (6,082).
Unfortunately, the dots cover each other, although I tried to layer them so that the
Main Library only patrons were on the bottom with the largest dot and the faintest
color, and the Rakow Branch only patrons on the top with the smallest dot and the
most vivid color.
When you look at maps which show only one type of patron at a time, you can see how
misleading the 3-color map is.
I made the dots smaller, so they would not overlay each other.
You can see how difficult it can be to make a map communicate information effectively.
However, it is obvious that patrons do not always go to the library that is closest to them. I
wanted to find out how many patrons go to the library that is farther from them.
ArcMap has a “measure” tool that tells you the distance between 2 points, but not
between 1 point and 36,374 other points. There is a tool that will measure the
distances between large numbers of points, but it is not included in the basic
version of ArcGIS.
However, there is another way to record the approximate distance of each patron
from each library building.
I used the “measure” tool to find out that the distance from the Rakow Branch to
the farthest patron in the northeast corner of the district is 8.2 miles.
“Select by location” let me select all patrons within a certain distance of the Rakow
Branch. I used ¼-mile increments to measure how far patrons live from the branch.
Since we have already established that the distance from the Rakow Branch to the
farthest patron is 8.2 miles, it is not surprising that when I searched by 8.25 miles, I
got all 36,374 patrons. However, when I searched by 8.0 miles, a few patrons in the
northeast corner of the district were not selected. That means that those patrons
live between 8.0 and 8.25 miles from the Rakow Branch. They can be seen on the
map because their dots are dark, instead of the fluorescent blue that you see when a
feature is selected.
I wanted to label those patrons “8.25” by adding the information to the shapefile’s
attribute table.
When you select features on the map, the attribute table’s lines for those features are also
selected (highlighted in fluorescent blue). You can choose to see either all the lines in the
table, or just the lines that have been selected. I chose to see the selected lines, and the
attribute table told me there were 36,242 out of 36,374.
That means that 132 patrons live between 8.0 and 8.25 miles from the Rakow Branch, and I
wanted to label them “8.25”. I added a field to the attribute table called “RakowDist” to
hold this information.
Fortunately, the attribute table has a handy icon that lets you reverse the selection on the
map (and on the attribute table). When you click on that icon, the dots that were
highlighted turn dark, and the dots that were dark become highlighted.
As you will see on the next page, the highlighted lines in the attribute table change, too.
When you choose to look at only the selected lines in the attribute table, there are now
only 132. After you tell ArcMap that you want to edit the table, you can copy 132 lines from
an Excel spreadsheet and paste them into the attribute table.
The next step is to search by location for patrons who live within 7.75 miles of the Rakow
Branch.
Then I reversed the selection using the attribute table.
You can see that the little triangle of patrons in the northeast corner is bigger. These people
live between 7.75 and 8.25 miles from the branch.
This includes the 132 people who live between 8.0 and 8.25 miles from the branch, the ones
we found in the previous search.
The attribute table tells us that 351 people live between 7.75 and 8.25 miles from the
branch. However, 132 of those people already have “8.25” in the “RakowDist” column.
I sorted the column largest to smallest.
Then I pasted “8.0” into the lines where RakowDist = 0, lines 133 – 351.
I repeated this, decreasing the distance by 0.25 miles each time, until I had a distance from
the Rakow Branch for each patron.
I did the whole thing over again to get the distance from the Main Library for each patron.
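The repeated select-and-reverse passes assign each patron the smallest ¼-mile band that contains them. If you can export patron and library coordinates, the same label can be computed directly. This Python sketch swaps in a straight-line calculation for the ArcMap procedure and assumes projected coordinates measured in feet; results may differ slightly from ArcMap's buffers near band edges.

```python
import math

FEET_PER_MILE = 5280


def distance_band(patron_xy, library_xy, step_miles=0.25):
    """Label a patron with the smallest multiple of step_miles that contains them.

    Coordinates are assumed to be projected and in feet, so plain Euclidean
    distance applies. A patron 8.1 miles away gets 8.25, the same label the
    repeated "select by location" passes produce.
    """
    feet = math.hypot(patron_xy[0] - library_xy[0], patron_xy[1] - library_xy[1])
    miles = feet / FEET_PER_MILE
    return math.ceil(miles / step_miles) * step_miles
```

Calling it once with the Rakow Branch point and once with the Main Library point would fill both distance fields in one pass.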
The population around the Main Library is denser, so a map color-coded by distance for the
Main Library is more striking than the map for the Rakow Branch.
I thought a “heat map”, with the colors growing gradually bluer closer to the building,
would be effective, but it is hard to read.
Concentric bands of contrasting colors are easier to see.
To find out how many patrons live closer to the Main Library, but go only to the Rakow
Branch, I used “Select by attributes.”
I chose patrons where the column “Two” (the code for the checkout library) = “r”
and
The distance to the Main Library is less than the distance to the Rakow Branch.
There are 3,215 patrons who go only to the Rakow Branch. 235, or 7.3%, live closer to the
Main Library than to the Rakow Branch.
Are these people drawn to the Rakow Branch by the MediaBank? Not all of them, as you
can see from the attribute table. Several patrons have an “r” in the column that shows
Rakow Branch use, but no “m” in the column that shows MediaBank use.
What other factors would there be? The yellow dots showing these patrons are not
grouped in a limited geographic area. Perhaps they work or shop by the Rakow Branch?
To find out how many patrons live closer to the Rakow Branch, but go only to the Main
Library,
I chose patrons where the column “Two” (the code for the checkout library) = “g”
and
The distance to the Main Library is more than the distance to the Rakow Branch.
There are 27,077 patrons who go only to the Main Library. 3,079, or 11.4%, live closer to
the Rakow Branch than to the Main Library.
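Both "Select by attributes" queries follow one pattern: patrons who use only one building, filtered to those whose distance to the other building is smaller. A minimal Python sketch, assuming patron rows with the "Two" code and two distance fields; "RakowDist" comes from the attribute table above, while "MainDist" is a hypothetical name for the Main Library field. Ties count as not closer, matching the strict comparison in the queries.

```python
def closer_to_other_building(patrons, only_code, own_dist, other_dist):
    """Count one-building patrons and how many live closer to the other building.

    only_code is the "Two" value ("r" = Rakow only, "g" = Main only); own_dist and
    other_dist name the distance fields for the building used and the one skipped.
    Returns (group size, number closer to the other building, percent to 1 decimal).
    """
    group = [p for p in patrons if p["Two"] == only_code]
    closer = [p for p in group if p[other_dist] < p[own_dist]]
    share = 100 * len(closer) / len(group) if group else 0.0
    return len(group), len(closer), round(share, 1)
```

As a check on the percentages above, 235 of 3,215 is 7.3% and 3,079 of 27,077 is 11.4% when rounded to one decimal place.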
As you can see, some of the Main Library only patrons live very close to the Rakow Branch.
There are 3,215 patrons who exclusively go to the Rakow Branch, and 3,079 patrons
who live closer to Rakow yet shun it.
To avoid taking too much time and inducing boredom, I skipped steps that I used in making
these reports.
Please feel free to email or call me with any questions.
My ArcGIS skills are not extensive, but, as you can see, the software is fun to experiment
with.
More Related Content

Similar to So much data so many uses with notes

How to become Data Driven for startups - keboola (Pavel Dolezal)
Internship Final Project (Ryan Wall)
Data warehousing and business intelligence project report (sonalighai)
2016 0921 IMA MO-Stand-Out (Handout) (Invenio Advisors, LLC)
Data Visualisation Design Workshop #UXbne (Cam Taylor)
GROUP 5-1 FINAL DOC SOFTWARE PROJECT MANAGEMENT.docx (nyashatumba)
Detailed Scraping Instructions (Nate Kurth)
How to become data analysis (Akhgar24)
O neal columbia (ENUG)
IBM Cognos tutorial - ABC LEARN (abclearnn)
Basics+of+Datawarehousing (theextraaedge)
Datawarehousing and Business Intelligence (Nisheet Mahajan)
Unit 3 assessment 3 lesson (MrJRogers)
Data-Driven Rules in HFM (aa026593)
Agile Data: Building Hadoop Analytics Applications (DataWorks Summit)
A case study of encouraging compliance with Spectrum data standards and proce... (Axiell ALM)
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 

So much data so many uses with notes

  • 1. I am Susan Lytinen, a Data Projects Specialist for the Gail Borden Public Library District in Elgin, IL, about 50 miles west of Chicago. My position was created to gather data to help us make decisions. I’ll be motoring through this presentation fairly quickly, but it’s online, with practically every word I’m saying in the presenter notes, and you can always contact me with any questions.
  • 2. I started saving daily checkout data from our Innovative ILS to make maps, but I soon realized that it could be used to gain many different insights about our 3 library service points: the Main Library, the Rakow Branch, and the MediaBank disc dispenser, which is built into the outside wall of the Rakow Branch. Please note that the door that opens into the Rakow Branch is mere feet from the MediaBank. For instance, our Innovative reports of circulation by location code told us how many items were being checked out of the MediaBank, but analyzing daily checkout data told us how many individual people were using it, how often, and whether they also checked out materials from inside the Rakow Branch during the same visit. We can also produce lists of materials sent from the Main Library to the Rakow Branch, and vice versa, for collection development purposes. And these are just the first projects that occurred to me. Since I enjoy mapping, I am also including some maps that I made to show where the patrons who use each of our library buildings live.
  • 3. It is always pleasant to hear an acknowledged expert recommend something you are already doing. On March 4, I watched a webinar given by the well-known library consultant Joan Frye Williams, “Measurements that matter: analyzing patron behavior” (an Infopeople webinar, https://infopeople.org/civicrm/event/info?id=377&reset=1). Ms. Williams stressed the importance of this kind of information gathering and analysis.
  • 4. I have been collecting daily checkout data since July 1, 2012. The reports shown in this presentation use checkout data from 7/1/12 through 12/31/13 and patron records as of 1/27/14. These screen shots show which fields I have been saving.
  • 5. Some libraries may not have OUT LOC. This field is useful because it tells you the Innovative terminal number of the computer where the item was checked out. Are your patrons using your self-checks or still taking everything to the Circulation Desk? Are you sending a lot of material from one library branch to another?
  • 6. When you are preserving checkout information, depending on how often you search and how short your loan periods are, you may need to search for checkouts by LOUTDATE as well. This search finds items that were checked out on your specified date but were returned before you executed your search. Some libraries may not have LPATRON or LOUTDATE. There is no LAST OUT LOC field. You are going to miss some checkouts no matter what, but it is no use agonizing about it. I have seen recent Sierra discussion list posts about using SQL to search circ_trans and item_circ_history, but I am not there yet.
  • 7. These are the Innovative Create Lists searches and exports I use. I eliminated the location “zfly” because we use it for temporary items that are created at the Circulation Desk when someone wants to check out material that is not in our database. I have started exporting CSV files instead of TXT, because CSV files open directly in Excel; you don’t have to import them.
  • 9. I cumulate the 2 daily CSV files (OUTDATE and LOUTDATE) into a single monthly spreadsheet. These screen shots show the daily files for December 15, 2013, and the December 2013 spreadsheet. To do that:
    • I use macros to add columns and change the column headers on the daily files.
    • In the LOUTDATE files, I fake an OUT LOC by assuming that the item was checked out at the library where it is normally shelved. After all, why would someone request an item to be sent to a different building, only to return it on the same day?
    • I add the last 5 columns.
    • I copy the LocationID (OUT LOC) into the Checkout library field, then use “find and replace” to change it to a one-letter library code.
    • The Owning library is the 1st letter of the CollID (LOCATION).
    • Check/Own is the Checkout library concatenated with the Owning library.
    • Date is the date from the DateTime field.
    • Month is the 1st day of the month. I am rather inept and have not figured out how to get Excel to display just the month and year.
  Then I copy and paste the multiple files into a single spreadsheet. There must be more streamlined ways to do this; I just have to find out what they are! Annual spreadsheets get to be a little unwieldy for Excel to process, and in most cases I want monthly statistics throughout the year.
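The monthly cumulation can also be scripted. The following Python sketch is one way to do it, not the macro workflow used for this presentation. It assumes the renamed column headers described above (PatronID, LocationID, CollID, DateTime), a hypothetical terminal-number-to-library mapping, an assumed DateTime format, and UTF-8 files; adjust all of these to match your own exports. It stores Month as year-month text (e.g. "2013-12"), which shows only the month and year and still sorts correctly. (Within Excel itself, a custom number format such as mmm yyyy on the Month column displays just the month and year.)

```python
import csv
from datetime import datetime

# Hypothetical mapping of Innovative terminal numbers (OUT LOC) to one-letter
# library codes; replace with your own terminal numbers.
TERMINAL_TO_LIBRARY = {"12": "g", "40": "r", "77": "m"}

# Assumed DateTime format in the exports, e.g. "12-15-2013 14:05".
DATETIME_FORMAT = "%m-%d-%Y %H:%M"


def derive_fields(row, from_loutdate=False):
    """Return a copy of one checkout row with the 5 derived columns added."""
    out = dict(row)
    owning = row["CollID"][:1].lower()  # 1st letter of the location code
    if from_loutdate:
        # LOUTDATE exports have no OUT LOC, so assume the item was checked
        # out at the library where it is normally shelved.
        checkout = owning
    else:
        # Unknown terminals are marked "?" so they stand out for review.
        checkout = TERMINAL_TO_LIBRARY.get(row.get("LocationID", ""), "?")
    out["Checkout library"] = checkout
    out["Owning library"] = owning
    out["Check/Own"] = checkout + owning
    if row.get("DateTime"):
        stamp = datetime.strptime(row["DateTime"], DATETIME_FORMAT)
        out["Date"] = stamp.date().isoformat()  # e.g. "2013-12-15"
        out["Month"] = stamp.strftime("%Y-%m")  # e.g. "2013-12"
    else:
        # Incomplete entries (the "Patron 0" rows) have a blank DateTime.
        out["Date"] = out["Month"] = ""
    return out


def combine_daily_files(outdate_paths, loutdate_paths):
    """Cumulate daily OUTDATE and LOUTDATE CSV exports into one list of rows."""
    rows = []
    for paths, is_lout in ((outdate_paths, False), (loutdate_paths, True)):
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                rows.extend(derive_fields(r, is_lout) for r in csv.DictReader(f))
    return rows


def write_monthly(rows, path):
    """Write the cumulated rows to a single monthly CSV file."""
    # Union of all column names in first-seen order, since the two export
    # types may not have identical columns.
    fieldnames = list(dict.fromkeys(k for r in rows for k in r))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

A month could then be built with `write_monthly(combine_daily_files(outdate_files, loutdate_files), "2013-12.csv")`, where the two file lists and the output name are hypothetical.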
  • 11. Excel PivotTables are an easy way to analyze data. If you have not used them, they are not too difficult to make. The screen shots show the fields I used in the PivotTables. These tables are for the month of February 2013. The first PivotTable shows the number of items checked out per patron at each of 3 service points: m = MediaBank at the Rakow Branch; g = Main Library; r = Rakow Branch. The first patron in the table, however, is the enigmatic Patron 0. I do not know why, but a number of incomplete entries appear in each checkout file. Usually the item record information is complete, but the PatronID is 0 and the DateTime is blank. I exclude these entries from the final count. A more conventional patron, Patron 1000792, had 29 MediaBank checkouts and 1 Rakow Branch checkout during the month. The second PivotTable adds the date to show whether people who checked items out from the MediaBank also went into the Rakow Branch and checked things out on the same day.
  • 12. You can see that Patron 1000792 checked out from the MediaBank on 8 different days, and checked out from Rakow on 1 of the days he used the MediaBank. BETTER QUESTION: Do MediaBank users ever visit the Rakow Branch without using the MediaBank? But I didn’t think of that in time for this presentation.
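Outside Excel, the two PivotTables amount to simple counts. This Python sketch assumes rows shaped like the monthly spreadsheet (PatronID, Date, and a one-letter Checkout library code) and skips the incomplete Patron 0 entries, as described above. It illustrates the logic and is not a replacement for the PivotTables.

```python
from collections import Counter


def is_complete(row):
    """Skip the incomplete 'Patron 0' entries (PatronID 0 and no date)."""
    return row["PatronID"] not in ("", "0") and bool(row["Date"])


def checkouts_by_patron_and_site(rows):
    """First PivotTable: items checked out per patron at each service point."""
    return Counter(
        (r["PatronID"], r["Checkout library"]) for r in rows if is_complete(r)
    )


def checkouts_by_patron_day_and_site(rows):
    """Second PivotTable: the same counts, broken out by date."""
    return Counter(
        (r["PatronID"], r["Date"], r["Checkout library"])
        for r in rows
        if is_complete(r)
    )
```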
  • 13. I decided to consider that all items checked out by the same patron at the same site on the same day would count as one session or visit. To find out how many MediaBank visits there were during the month, I copied the second PivotTable, then sorted by MediaBank checkouts (column C). There were 1,862 MediaBank visits in February 2013. To find out how many times people checked items out from the MediaBank, and also checked items out from the Rakow Branch on the same day, I sorted lines 2 through 1,863 of the spreadsheet by number of Rakow Branch checkouts (Column E). There were 290 occasions on which patrons checked items out from both the MediaBank and the Rakow Branch on the same day. Why is it that most people who visit the MediaBank, which is right by the door of the Rakow Branch, do not also go inside the building and check something out? Are they only interested in Blu-rays, DVDs, and videogames? Are they visiting when the Rakow Branch is closed? …?
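The visit rule (same patron, same site, same day counts as one visit) and the two counts above can be sketched in Python with sets instead of sorting, using the same assumed row shape (PatronID, Date, Checkout library with m/g/r codes). The third count, Rakow Branch visits by MediaBank users on days they did not use the MediaBank, is one possible way to answer the better question raised above; it is an addition, not a figure from this presentation.

```python
def visits(rows):
    """One visit per (patron, date, service point); Patron 0 rows are skipped."""
    return {
        (r["PatronID"], r["Date"], r["Checkout library"])
        for r in rows
        if r["PatronID"] not in ("", "0") and r["Date"]
    }


def mediabank_visit_stats(rows):
    v = visits(rows)
    mediabank = {(p, d) for p, d, site in v if site == "m"}
    rakow = {(p, d) for p, d, site in v if site == "r"}
    mediabank_users = {p for p, _ in mediabank}
    return {
        # Number of MediaBank visits in the period.
        "mediabank_visits": len(mediabank),
        # Same patron used the MediaBank and checked out inside Rakow that day.
        "mediabank_and_rakow_same_day": len(mediabank & rakow),
        # Rakow visits by MediaBank users on days they did not use the MediaBank.
        "rakow_without_mediabank_by_mb_users": sum(
            1 for p, _ in rakow - mediabank if p in mediabank_users
        ),
    }
```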
  • 14. Our fiscal year runs from July to June. This report includes Joan Frye Williams’ favorite statistic, “mode”, the value which occurs most often. The statistic “days between visits” is messy to figure out, but I will continue it for a while to see if it is used. I get all the dates of the visits into a spreadsheet, insert columns to hold the number of days between the visits, and then get averages, etc. for those numbers.
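The “days between visits” calculation is less messy in code. This Python sketch takes (patron, date) visit pairs, computes the gap in days between each visit and the same patron’s next visit, and summarizes the gaps. It uses `statistics.multimode`, which returns every value tied for most common. The fiscal-year helper labels a July–June year by the calendar year in which it ends; that naming convention is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, multimode


def fiscal_year(day):
    """July-June fiscal year, labeled by the year it ends in (assumed convention)."""
    return day.year + 1 if day.month >= 7 else day.year


def days_between_visits(visits):
    """visits: (patron_id, date) pairs, one per visit. Returns the gaps in days."""
    dates_by_patron = defaultdict(set)
    for patron, day in visits:
        dates_by_patron[patron].add(day)  # a set ignores duplicate visit dates
    gaps = []
    for days in dates_by_patron.values():
        ordered = sorted(days)
        # Difference between each visit and the patron's next visit.
        gaps.extend((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))
    return gaps


def summarize(gaps):
    """Mean, median and mode(s) of the gaps."""
    return {"mean": mean(gaps), "median": median(gaps), "mode": multimode(gaps)}
```

To report one fiscal year, filter the visit pairs with `fiscal_year(day) == 2013` (for example) before calling `days_between_visits`.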
  • 15. It is easy to get a report of items sent from one building to the other from the monthly spreadsheet. Items owned by the Rakow Branch have Owning library = r. Items checked out at the Main Library have Checkout library = g. I found it easier to look for these 2 codes in 1 field, so Check/Own = gr. I have a list of staff administrative cards, and I use VLOOKUP to eliminate checkouts on those cards. Collection Development staff like a list of the actual items. I sort it by location code, call number, title, and datetime so that repeated loans of the same title will appear together. I delete the PatronID column.
  • 16. I use a PivotTable to get the number of items sent each month by location code, and add the caption for the location code to the finished report.
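The same filter, sort, and count can be sketched in Python. Check/Own, CollID, DateTime, and PatronID follow the monthly spreadsheet described earlier. CallNo, Title, the staff-card set, the captions dictionary, and the DateTime format are hypothetical stand-ins, and a set-membership test takes the place of VLOOKUP.

```python
from collections import Counter
from datetime import datetime

DATETIME_FORMAT = "%m-%d-%Y %H:%M"  # assumed export format, e.g. "02-15-2013 14:05"


def items_sent(rows, staff_cards, check_own="gr"):
    """Checkouts of items owned by one building but checked out at the other.

    check_own="gr": checked out at the Main Library (g), owned by Rakow (r).
    Checkouts on staff administrative cards are dropped, as is the PatronID column.
    """
    picked = [
        {k: v for k, v in r.items() if k != "PatronID"}
        for r in rows
        if r["Check/Own"] == check_own and r["PatronID"] not in staff_cards
    ]
    # Sort by location code, call number, title, then checkout time, so that
    # repeated loans of the same title appear together.
    picked.sort(
        key=lambda r: (r["CollID"], r["CallNo"], r["Title"],
                       datetime.strptime(r["DateTime"], DATETIME_FORMAT))
    )
    return picked


def counts_by_location(items, captions):
    """Number of items sent per location code, with the code's caption."""
    counts = Counter(r["CollID"] for r in items)
    return [(code, captions.get(code, ""), n) for code, n in sorted(counts.items())]
```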
  • 17. A map is a format that helps you visualize data. It’s also a lot of fun to make.
  • 18. The most widely used mapping software is Esri’s ArcGIS. When I started thinking about it, I heard that it is too expensive and too difficult. The standard exclamation is “They give advanced degrees in that!” Not too difficult: we teach ourselves Excel, Access, etc. Tools: Esri tutorial, books, classes. Not too expensive: ArcGIS Desktop Basic = $1,500/year, but only $250/year for an educational or nonprofit institution using it only for administrative purposes. I used GIS Tutorial 1: Basic Workbook, by Wilpen L. Gorr and Kristen S. Kurland (Redlands, CA: Esri Press). The 5th edition came out May 3, 2013. ISBN 978-158948-335-4. List price $79.95; Amazon price $43.86. It includes access to a 180-day trial of ArcGIS® 10.1 for Desktop Advanced software and a DVD with data for working through the exercises. There are also open source GIS programs, such as Quantum GIS. This screen shows ArcMap, the ArcGIS element that you use to make maps. Maps are made up of various files. The right pane shows the files you can choose to make the map.
  • 19. The center pane holds the map itself. The left pane shows the files that make up the map you are working on. You can also draw on the map, but whenever possible you want to use files that already exist. The most interesting thing about ArcMap for this project is its ability to take a spreadsheet of addresses and geocode them -- locate them on a map.
  • 20. The first spreadsheet shows active patrons and their addresses (which I have whited out in the screen shot). As you know, Innovative patron records have one field that holds the entire address, but with patience it is possible to parse out the various elements. Perhaps this is easier when you get patron information via SQL? To make the map I want, I need to add a code to each patron showing which libraries that patron used. In the second spreadsheet, Columns A – D show the results of a PivotTable of checkout records. You can see how many items each patron checked out from each service point: g = Main Library; m = MediaBank; r = Rakow Branch. In Columns E – G, I used formulas to change a checkout number greater than 0 into the letter for each building. You can see the formula I used for Column E in the screen shot. In Column H, I concatenated all 3 building codes, so you could see where the patron checked out items. But I was worried that the map would be too hard to read if I used all 3 locations (too many code combinations), so I simplified the codes to only 2 locations in Column I. Since the MediaBank is located at the Rakow Branch, I used “r” to mean either the MediaBank or the Rakow Branch. Another thing to notice is that the PatronID in the patron record is 2 characters longer than the PatronID in the checkout record. The PatronID in the patron record has a “p” at the beginning and a check digit at the end.
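The column formulas amount to a small function. This Python sketch mirrors the logic described above, taking one patron’s checkout counts at the Main Library, MediaBank, and Rakow Branch. The spelling of the combined code for patrons who used both buildings ("gr" here) is an assumption, since the actual formula appears only in the screen shot.

```python
def building_codes(g, m, r):
    """Turn one patron's checkout counts into building codes.

    g, m, r: items checked out at the Main Library, MediaBank, and Rakow Branch.
    Returns (three-location code, two-location code). In the two-location code,
    MediaBank use counts as Rakow ("r"), since the MediaBank is at the branch.
    """
    three = "".join(code for code, n in (("g", g), ("m", m), ("r", r)) if n > 0)
    two = ("g" if g > 0 else "") + ("r" if m > 0 or r > 0 else "")
    return three, two
```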
  • 22. I used the Excel VLOOKUP function to add the checkout library code to the patron records. First I copied the PatronID from Column A of the patron spreadsheet into Column J and chopped off the beginning “p” and the ending check digit to make the ShortID. Then I added Column K to hold the checkout library code. Then I copied the PatronID and the checkout library code from the checkout record spreadsheet into Columns O – P of the patron spreadsheet. I used VLOOKUP to copy the checkout library code from Column P to Column K. You can see the formula in the screen shot. If the ShortID in Column J does not match any PatronID in Column O, the formula returns “#N/A”. That means that the patron did not check out any items in the time period covered by my report, and for this project I do not want to map patrons without checkouts. I copied Column K and pasted the values back into the spreadsheet, deleted Columns O – P, and deleted the lines with “#N/A”. That made the spreadsheet ready to pull into ArcMap.
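The ShortID and VLOOKUP steps can be sketched in Python, with a dictionary lookup standing in for VLOOKUP and patrons without a match (the “#N/A” rows) simply left out. The record numbers in the test are made up, and storing the code in a field named "Two" follows the column name used later in this presentation.

```python
def short_id(record_number):
    """Drop the leading "p" and the trailing check digit from a patron record number."""
    if not record_number.startswith("p") or len(record_number) < 3:
        raise ValueError(f"unexpected patron record number: {record_number!r}")
    return record_number[1:-1]


def attach_checkout_codes(patrons, codes_by_short_id):
    """Add the checkout library code to each patron record.

    Patrons with no checkouts in the period (where VLOOKUP would return #N/A)
    are left out, since they are not being mapped.
    """
    matched = []
    for p in patrons:
        sid = short_id(p["PatronID"])
        code = codes_by_short_id.get(sid)
        if code is not None:
            matched.append({**p, "ShortID": sid, "Two": code})
    return matched
```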
  • 23. I pulled the spreadsheet of patrons into ArcMap and geocoded the addresses. If you have your own file of mapped reference addresses, the geocoding operation is free. However, if you want to use Esri’s online World Geocode Service, as I did, there is an additional charge. You need to tell ArcMap which fields in your table contain the address parts.
  • 24. Of 44,557 patrons in the spreadsheet, 36,374 (~82%) were geocoded. You have a chance to go over the records that did not geocode and match them manually to possible addresses in the reference file, but I did not do that in this case. The file that was formed by geocoding is called a shapefile. The shapefile is accompanied by an attribute table that has the fields from the Excel spreadsheet, and additional fields with address information from Esri. Again, I have whited out the addresses. You can use data from the attribute table to change the way the shapefile looks. You can also add data to the table.
  • 25. I threw in a background map, added shapefiles of the library district boundaries and the 2 library buildings, and colored the dots by checkout library: Main Library only (27,077), Rakow Branch only (3,215), or both (6,082). Unfortunately, the dots cover each other, although I tried to layer them so that the Main Library only patrons were on the bottom with the largest dot and the faintest color, and the Rakow Branch only patrons on the top with the smallest dot and the most vivid color.
  • 26. When you look at maps which show only one type of patron at a time, you can see how misleading the 3-color map is.
  • 29. I made the dots smaller, so they would not overlay each other. You can see how difficult it can be to make a map communicate information effectively. However, it is obvious that patrons do not always go to the library that is closest to them. I wanted to find out how many patrons go to the library that is farther from them.
  • 30. ArcMap has a “measure” tool that tells you the distance between 2 points, but not between 1 point and 36,374 other points. There is a tool that will measure the distances between large numbers of points, but it is not included in the basic version of ArcGIS. However, there is another way to record the approximate distance of each patron from each library building. I used the “measure” tool to find out that the distance from the Rakow Branch to the farthest patron in the northeast corner of the district is 8.2 miles.
  • 31. “Select by location” let me select all patrons within a certain distance of the Rakow Branch. I used ¼-mile increments to measure how far patrons live from the branch. Since we have already established that the distance from the Rakow Branch to the farthest patron is 8.2 miles, it is not surprising that when I searched by 8.25 miles, I got all 36,374 patrons. However, when I searched by 8.0 miles, a few patrons in the northeast corner of the district were not selected. That means that those patrons live between 8.0 and 8.25 miles from the Rakow Branch. They can be seen on the map because their dots are dark, instead of the fluorescent blue that you see when a feature is selected. I wanted to label those patrons “8.25” by adding the information to the shapefile’s attribute table.
  • 32. When you select features on the map, the attribute table’s lines for those features are also selected (highlighted in fluorescent blue). You can choose to see either all the lines in the table, or just the lines that have been selected. I chose to see the selected lines, and the attribute table told me there were 36,242 selected out of 36,374. That means that 132 patrons live between 8.0 and 8.25 miles from the Rakow Branch, and I wanted to label them “8.25”. I added a field to the attribute table called “RakowDist” to hold this information. Fortunately, the attribute table has a handy icon that lets you reverse the selection on the map (and in the attribute table). When you click on that icon, the dots that were highlighted turn dark, and the dots that were dark become highlighted. As you will see on the next page, the highlighted lines in the attribute table change, too.
  • 33. When you choose to look at only the selected lines in the attribute table, there are now only 132. After you tell ArcMap that you want to edit the table, you can copy 132 lines from an Excel spreadsheet and paste them into the attribute table.
  • 34. The next step is to search by location for patrons who live within 7.75 miles of the Rakow Branch. Then I reversed the selection using the attribute table. You can see that the little triangle of patrons in the northeast corner is bigger. These people live between 7.75 and 8.25 miles from the branch. This includes the 132 people who live between 8.0 and 8.25 miles from the branch, the ones we found in the previous search.
  • 35. The attribute table tells us that 351 people live between 7.75 and 8.25 miles from the branch. However, 132 of those people already have “8.25” in the “RakowDist” column. I sorted the column largest to smallest. Then I pasted “8.0” into the lines where RakowDist = 0, lines 133 – 351. I repeated this, decreasing the distance by 0.25 miles each time, until I had a distance from the Rakow Branch for each patron. I did the whole thing over again to get the distance from the Main Library for each patron.
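Where patron and library coordinates are available, the same quarter-mile labels can be computed directly instead of through repeated Select by Location steps. This Python sketch swaps in a different technique, straight-line great-circle (haversine) distance, so its labels may differ slightly from ArcMap’s selections near band edges. Each distance gets the upper edge of its band, matching the convention above (a patron between 8.0 and 8.25 miles is labeled 8.25). The field names other than RakowDist, and all coordinates, are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band(miles, step=0.25):
    """Upper edge of the step-mile band holding `miles`: 8.1 -> 8.25, 7.9 -> 8.0."""
    return math.ceil(miles / step) * step


def add_distance_fields(patrons, libraries):
    """Add one banded-distance field per library to each patron dict.

    patrons: dicts with "lat" and "lon"; libraries: {field_name: (lat, lon)}.
    """
    for p in patrons:
        for field, (lat, lon) in libraries.items():
            p[field] = distance_band(haversine_miles(p["lat"], p["lon"], lat, lon))
    return patrons
```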
  • 36. The population around the Main Library is denser, so a map color-coded by distance for the Main Library is more striking than the map for the Rakow Branch. I thought a “heat map”, with the colors gradually turning bluer closer to the building, would be effective, but it is hard to read.
  • 37. Concentric bands of contrasting colors are easier to see.
  • 38. To find out how many patrons live closer to the Main Library but go only to the Rakow Branch, I used “Select by attributes.” I chose patrons where the column “Two” (the code for the checkout library) = “r” and the distance to the Main Library is less than the distance to the Rakow Branch. There are 3,215 patrons who go only to the Rakow Branch. Of these, 235 (7.3%) live closer to the Main Library than to the Rakow Branch. Are these people drawn to the Rakow Branch by the MediaBank? Not all of them, as you can see from the attribute table. Several patrons have an “r” in the column that shows Rakow Branch use, but no “m” in the column that shows MediaBank use.
  • 39. What other factors would there be? The yellow dots showing these patrons are not grouped in a limited geographic area. Perhaps they work or shop by the Rakow Branch?
  • 40. To find out how many patrons live closer to the Rakow Branch but go only to the Main Library, I chose patrons where the column “Two” (the code for the checkout library) = “g” and the distance to the Main Library is greater than the distance to the Rakow Branch. There are 27,077 patrons who go only to the Main Library. Of these, 3,079 (11.4%) live closer to the Rakow Branch than to the Main Library.
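Both Select by Attributes queries follow one pattern: keep patrons with a given checkout code whose distance to the other building is smaller. This Python sketch assumes patron dicts with the “Two” code, a RakowDist field, and a Main Library distance field named MainDist (a hypothetical name). The percentage step is the same arithmetic behind the reported shares: 235 / 3,215 ≈ 7.3% and 3,079 / 27,077 ≈ 11.4%.

```python
def closer_to_other_building(patrons, code, own_field, other_field):
    """Patrons whose checkout code is `code` but who live closer to the other building.

    Mirrors Select by Attributes: Two = code AND other distance < own distance.
    Returns the selected patrons and their share (%) of all patrons with that code.
    """
    group = [p for p in patrons if p["Two"] == code]
    picked = [p for p in group if p[other_field] < p[own_field]]
    share = 100 * len(picked) / len(group) if group else 0.0
    return picked, round(share, 1)
```

Rakow-only patrons closer to the Main Library would be `closer_to_other_building(patrons, "r", "RakowDist", "MainDist")`; Main-only patrons closer to Rakow would be `closer_to_other_building(patrons, "g", "MainDist", "RakowDist")`. Patrons in the same distance band for both buildings fall into neither group, as with the strict comparison in the original queries.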
  • 41. As you can see, some of the Main Library only patrons live very close to the Rakow Branch. There are 3,215 patrons who exclusively go to the Rakow Branch, and 3,079 patrons who live closer to Rakow yet shun it.
  • 42. To avoid taking too much time and inducing boredom, I skipped steps that I used in making these reports. Please feel free to email or call me with any questions. My ArcGIS skills are not extensive, but, as you can see, the software is fun to experiment with.