Batch uploading to the Internet Archive using Python...
...and a Raspberry Pi 4
Alison Harvey
Special Collections and Archives
Cardiff University
This tutorial will explain:
- How to upload multiple images or multi-page documents to the
Internet Archive using Python (without a Linux PC)
- How to name files for successful ingest
- How to prepare metadata (information about the files)
- How to set up a Raspberry Pi 4 out of the box
- Installing the operating system
- Configuring the operating system
- Installing the Internet Archive Python library via command line
- Connecting to your Internet Archive account
- Uploading batches of files to your account
You will need:
- Raspberry Pi 4
- Monitor
- HDMI to micro HDMI cable, to connect the monitor to the Raspberry Pi
- USB keyboard and mouse
- Empty 64GB SD card and adapter - to use as your Raspberry Pi’s hard drive
- Image files or .zip files on a USB stick/external hard drive
- A .csv file of metadata saved in the same location
File-naming individual images for ingest
- You may upload jpg, jpeg, jp2, tif, tiff, png, gif or bmp files
- Filenames will form part of the final URL, and must be unique to the Internet
Archive. Use a file-naming convention to create a code that is meaningful to
you, but which is unlikely to have already been used.
- This might be three letters (e.g. ABC) to describe the collection, two digits to
describe the year the images were created (e.g. 19 for 2019), and three digits for a
running number per file (001-003 etc):
- ABC19001.tif
- ABC19002.tif
- ABC19003.tif
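The naming convention above can be sketched in Python. This is only an illustration: the function name make_filename and its parameters are hypothetical, and the prefix, year and file type are the example values from this slide.

```python
def make_filename(prefix: str, year: int, number: int, ext: str = "tif") -> str:
    """Build a name like ABC19001.tif: prefix + 2-digit year + 3-digit running number."""
    if not 1 <= number <= 999:
        raise ValueError("running number must fit in three digits (1-999)")
    # year % 100 keeps the last two digits, so 2019 becomes 19.
    return f"{prefix}{year % 100:02d}{number:03d}.{ext}"


# Hypothetical example: three images from collection ABC, created in 2019.
names = [make_filename("ABC", 2019, n) for n in range(1, 4)]
print(names)
```

Generating names this way keeps the zero padding consistent, which matters because the names sort alphabetically.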
File-naming multi-page texts for ingest
- It is possible to upload multiple images belonging to a single text, and compile
them into a zip file for ingest as a single digital object.
- Follow the file-naming advice as for images, but use the running number
portion of the filename to indicate the order in which the pages should appear
in the final presentation, e.g.
- Page 1 = XYZ19001.tif
- Page 2 = XYZ19002.tif
- Page 3 = XYZ19003.tif
(Here XYZ19 is the [prefix] and 001, 002, 003 are the [running number].)
- Then, zip all images belonging to the same document/book into a single file
- Filename it with the prefix, followed by _images.zip, e.g. XYZ19_images.zip
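If you would rather build the zip file with a script than by hand, the step above might look like the sketch below. It uses Python's standard zipfile module; the function name zip_pages and the folder argument are hypothetical, and it assumes all page images for one text sit in a single folder and share the same prefix.

```python
import zipfile
from pathlib import Path


def zip_pages(folder: str, prefix: str) -> Path:
    """Zip every image whose name starts with `prefix` into <prefix>_images.zip."""
    src = Path(folder)
    archive = src / f"{prefix}_images.zip"
    # Sort by name so the running numbers keep the pages in order.
    pages = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix.lower() != ".zip"
    )
    with zipfile.ZipFile(archive, "w") as zf:
        for page in pages:
            # arcname stores the bare filename, without the folder path.
            zf.write(page, arcname=page.name)
    return archive
```

For example, zip_pages("/home/pi", "XYZ19") would write /home/pi/XYZ19_images.zip.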
Preparing image metadata
Use Excel or Google Sheets to create a spreadsheet with the following column headings.
Mandatory headings:
- identifier
- file
- title
- description
- collection
- subject
Optional headings:
- contributor
- date
- publisher
- creator
- language
- licenseurl
- mediatype
Headings must be spelt and capitalised exactly as above.
Mandatory fields: identifier
This is the image or zip filename, without the file extension (e.g. .tif or .zip).
It will be used as part of the final URL, and it must be unique to the Internet
Archive, e.g.
Images: ABC19001
Texts: XYZ19
Mandatory fields: file
This is the path to your file, including the file extension.
This must begin with a forward slash. You will be storing files in your Raspberry
Pi’s ‘home’ folder: /home/pi
Your filepath will be:
Images: /home/pi/ABC19001.tif
Texts: /home/pi/XYZ19_images.zip
Use a new row in the spreadsheet for every file.
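As a rough sketch, the identifier and file values for a row can be derived from the filename alone. The function name identifier_and_file is hypothetical, and the code assumes the files live directly in /home/pi and that zipped texts follow the _images.zip naming described earlier.

```python
from pathlib import PurePosixPath

HOME = PurePosixPath("/home/pi")  # folder used in this tutorial


def identifier_and_file(filename: str) -> tuple[str, str]:
    """Return (identifier, file) for one spreadsheet row."""
    name = PurePosixPath(filename)
    identifier = name.stem  # drop the extension, e.g. ".tif" or ".zip"
    if identifier.endswith("_images"):
        # Zipped texts: XYZ19_images becomes XYZ19.
        identifier = identifier[: -len("_images")]
    return identifier, str(HOME / name.name)


print(identifier_and_file("ABC19001.tif"))
print(identifier_and_file("XYZ19_images.zip"))
```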
Mandatory fields: title
This will appear on your account home page. Try to make it short but descriptive.
Think about what information would be of most use to someone browsing through the
visual content of your home page.
Mandatory fields: description
Here, you can add any additional information that will not fit in the title field.
Images, e.g. Cathays Park: aerial photograph looking south west, 1962:
Black and white photo. Showing Cathays Park and College Buildings.
Cardiff Castle and Cardiff Arms Park visible in background. Dimensions:
160mm (h) x 210mm (w). Image area: 149mm (h) x 210mm (w).
Texts, e.g. On holiday in wartime, France 1914:
Handwritten and hand-illustrated journal of travels across France in the
autumn of 1914. Provenance: Deposited by Frith-Beard in 1932. Archival
ref: 410.
Mandatory fields: collection
Unless you already have collections set up on your account, use the default
collections:
Images: opensource_image
Texts: opensource
The collections must be spelt and capitalised exactly as above.
Mandatory fields: subject
This field can be searched, but also allows users to filter
items on the same topic.
Think about what information would be most helpful for
your users to be able to filter. Use terminology, spelling,
and capitalisation consistently, so that all matches group
successfully under a single heading.
If you have multiple subjects, use the column headings:
- subject[0]
- subject[1]
- subject[2]... and so on.
Always begin counting at 0, and do not add spaces.
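The numbered headings can be produced mechanically. The sketch below is illustrative only: subject_columns is a hypothetical helper, and the subject terms in the example are made up.

```python
def subject_columns(subjects: list[str]) -> dict[str, str]:
    """Map a list of subjects to the headings subject[0], subject[1], ..."""
    if len(subjects) == 1:
        # A single subject uses the plain heading.
        return {"subject": subjects[0]}
    # Counting starts at 0 and the heading contains no spaces.
    return {f"subject[{i}]": s for i, s in enumerate(subjects)}


print(subject_columns(["Cardiff", "Aerial photographs"]))
```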
Optional fields: contributor
If you add this field to every item you upload, it can be used as a quick means of
identifying and extracting information about all your items.
Use advanced search to run a query on your contributor name that will return all
items you have ever uploaded:
and Archives, Cardiff University
‘Contributor’ could be your own name, or the name of your organisation. Make sure
it is detailed enough to be unique, to ensure that you only retrieve your own results.
Optional fields: date
This field allows collections to be searched and filtered by date.
This field must be machine-readable, expressing the date as either
YYYY (e.g. 1982) or YYYY-MM-DD (e.g. 1982-11-26)
If you do not have this information, or the date is estimated, leave
this field blank, and use the description field to indicate that
the item is either undated or of uncertain date.
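Before saving the spreadsheet, the date column can be checked with a few lines of Python. This is a minimal sketch of the rule stated above (blank, YYYY, or YYYY-MM-DD); the function name is_valid_date is hypothetical, and the Internet Archive may accept other formats not covered here.

```python
import re
from datetime import date


def is_valid_date(value: str) -> bool:
    """Accept a blank, YYYY, or a real calendar date written YYYY-MM-DD."""
    if value == "":
        return True  # blank is allowed for undated or estimated items
    if re.fullmatch(r"\d{4}", value):
        return True
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as month 13
            return True
        except ValueError:
            return False
    return False
```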
Optional fields: publisher, creator
Add this information if you have it.
Creator names can be used to filter content. Present them in a consistent
format to ensure that all matches group successfully under a single heading,
such as:
Surname, forename, yyyy birth date-yyyy death date
Owen, Morfydd, 1891-1918
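One way to keep creator names consistent is to build them from their parts rather than typing them freehand. The helper below is a hypothetical sketch of the format shown above, not a rule imposed by the Internet Archive.

```python
from typing import Optional


def format_creator(surname: str, forename: str,
                   birth: Optional[int] = None,
                   death: Optional[int] = None) -> str:
    """Format a name as 'Surname, forename, yyyy-yyyy'; dates are optional."""
    name = f"{surname}, {forename}"
    if birth is None and death is None:
        return name
    # A missing year is left empty, e.g. '1891-' when only the birth year is known.
    return f"{name}, {birth or ''}-{death or ''}"


print(format_creator("Owen", "Morfydd", 1891, 1918))
```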
Optional fields: language
As well as allowing users to search and filter by the language of the text,
completing this field helps the Internet Archive to apply OCR to your items.
OCR, or Optical Character Recognition, analyses the shape of letters found in
images of printed text, and converts it into machine-encoded text. Users are
then able to search for words and phrases found inside the digital objects.
For multilingual texts, use the column headings: language[0], language[1] etc.
Always use the relevant ISO 639-2 code for your language, e.g.
- English (eng)
- Welsh (wel)
- Arabic (ara)
Optional fields: licenseurl
This field applies a license to your content, which tells users what they are allowed to do with it.
Visit Creative Commons to generate an appropriate license, and copy the url into the spreadsheet,
e.g. http://creativecommons.org/licenses/by/4.0/
Optional fields: mediatype
Images: image
Texts: texts
This field classifies the object as image or text for the purpose of filtering.
The types must be spelt and capitalised exactly as above.
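Because collection and mediatype both follow from the kind of file, they can be filled in together. The sketch below assumes you are using the default collections named in this tutorial (opensource_image for images, opensource for texts) and that texts are always uploaded as .zip files; the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Image types listed in this tutorial as accepted for upload.
IMAGE_TYPES = {".jpg", ".jpeg", ".jp2", ".tif", ".tiff", ".png", ".gif", ".bmp"}


def default_collection_and_mediatype(filename: str) -> dict[str, str]:
    """Pick the default collection and mediatype from the file extension."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext == ".zip":
        return {"collection": "opensource", "mediatype": "texts"}
    if ext in IMAGE_TYPES:
        return {"collection": "opensource_image", "mediatype": "image"}
    raise ValueError(f"unexpected file type: {filename}")
```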
Saving as csv
When your table of metadata is complete, with an item on each row, you are ready
to save as csv.
Saving to csv directly from Excel can cause errors - if you have been working in
Excel, paste cells into a Google Sheets document when your metadata is complete.
From Google Sheets, select File > Download > Comma-separated values. Save the
csv file in the same location as your image files or zip files.
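If you are comfortable with Python, the csv file could also be written directly with the standard csv module instead of a spreadsheet. The following is a sketch under assumptions: write_metadata_csv and EXAMPLE_ROW are hypothetical names, the subject and date values in the example row are made up, and only the basic headings are covered (pass your own list of headings if you use subject[0], subject[1] and so on).

```python
import csv

HEADINGS = [
    "identifier", "file", "title", "description", "collection", "subject",
    "contributor", "date", "publisher", "creator", "language", "licenseurl",
    "mediatype",
]

# Example row based on the image used in this tutorial (subject and date are assumed).
EXAMPLE_ROW = {
    "identifier": "ABC19001",
    "file": "/home/pi/ABC19001.tif",
    "title": "Cathays Park: aerial photograph looking south west, 1962",
    "description": "Black and white photo. Showing Cathays Park and College Buildings.",
    "collection": "opensource_image",
    "subject": "Cardiff",
    "date": "1962",
    "mediatype": "image",
}


def write_metadata_csv(path, rows, headings=HEADINGS):
    """Write one row per item; headings missing from a row are left blank."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # A misspelt heading in a row raises ValueError instead of being written.
        writer = csv.DictWriter(f, fieldnames=headings, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

The csv module quotes any value that contains a comma, such as the title above, so the columns stay aligned.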
Setting up a Raspberry Pi: install and run OS imager
Download Raspberry Pi
imager for Windows or Mac
on your usual PC.
Insert the SD card in the
adapter, plug into PC, and
run Raspberry Pi imager.
Setting up a Raspberry Pi: erase and format SD card
First, prepare the SD card by
erasing all previous data and
formatting it, ready to flash the
new OS.
Under Choose OS, scroll
down and select Erase.
Under Choose Storage,
select the SD card.
Select Write.
Setting up a Raspberry Pi: flash the OS to the SD card
Under Choose OS, select
Raspberry Pi OS (32 bit).
Under Choose Storage,
select the SD card.
Select Write. This will flash
the OS to the card - it may
take several minutes.
Setting up a Raspberry Pi: getting connected
Eject the SD card from the
PC, and remove from its
adapter.
Insert the card into the back
of the Raspberry Pi as
shown.
Setting up a Raspberry Pi: getting connected
Connect the monitor with the
HDMI-Micro HDMI cable.
Connect the keyboard and
mouse.
Finally, connect the power
cable, and switch on power.
Setting up a Raspberry Pi: installing the OS
The Raspberry Pi will boot (this may take several minutes, as it installs the OS).
When it’s complete, it will look like this. Work through the following set up stages.
Setting up a Raspberry Pi: location and language
Setting up a Raspberry Pi: change default password
Setting up a Raspberry Pi: set up screen
Setting up a Raspberry Pi: connect to wifi
Setting up a Raspberry Pi: run updates
Setting up a Raspberry Pi: restart
Copy files to the Raspberry Pi
Connect your USB stick or external hard drive to your Raspberry Pi
Copy across all image files or zip files due to be transferred. Make note of the
name of your .csv file.
Save all files to /home/pi
If you want to create folders to organise files, do so under /home/pi, but
remember to update the file paths in your csv file to reflect the new folders.
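A quick check before uploading is to confirm that every path in the file column actually exists on the Raspberry Pi. The sketch below assumes the csv has a file column as described earlier; missing_files is a hypothetical helper name.

```python
import csv
from pathlib import Path


def missing_files(csv_path: str) -> list[str]:
    """Return every `file` path in the spreadsheet that does not exist on disk."""
    # utf-8-sig also copes with a byte-order mark left by some spreadsheet exports.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return [
            row["file"]
            for row in csv.DictReader(f)
            if not Path(row["file"]).is_file()
        ]
```

An empty list means all paths were found; anything else lists the rows whose paths need correcting.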
Installing and configuring the Internet Archive python library
Open the command line (top menu bar)
and enter these commands:
$ sudo pip install internetarchive
$ ia configure
Enter your Internet Archive credentials
If you have stored images and csv in a
folder below /home/pi/, use cd to navigate
to the correct location of your files.
Installing and configuring the Internet Archive python library
Enter the following command, replacing [filename] with the name of your csv file.
This tells the Pi where to look for your metadata. Then the metadata tells it where
to find the files to upload, and how to describe them.
$ ia upload --spreadsheet=[filename].csv
Depending on how many files you are uploading, the programme may run for several
hours. Do not close the command line or disconnect the Raspberry Pi.
Each file will be added to your Internet Archive account as it completes. It can take
up to 24 hours for the final documents to render on the live site.
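The same library can also be called from a Python script. The sketch below is not what the ia command does internally; it is one possible equivalent, assuming the csv layout from this tutorial and that ia configure has already stored your credentials. The upload function comes from the internetarchive library; row_to_upload_args and upload_spreadsheet are hypothetical helper names.

```python
import csv
import re


def row_to_upload_args(row: dict) -> tuple:
    """Split one spreadsheet row into (identifier, file path, metadata dict)."""
    metadata = {}
    for heading, value in row.items():
        if not heading or not value or heading in ("identifier", "file"):
            continue  # identifier and file drive the upload; blank cells are skipped
        # Collapse subject[0], subject[1], ... into one list under "subject".
        indexed = re.fullmatch(r"(\w+)\[\d+\]", heading)
        if indexed:
            metadata.setdefault(indexed.group(1), []).append(value)
        else:
            metadata[heading] = value
    return row["identifier"], row["file"], metadata


def upload_spreadsheet(csv_path: str) -> None:
    """Upload every row of the csv, one item at a time."""
    # Imported here so the helper above can be tried without the library installed.
    from internetarchive import upload

    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            identifier, path, metadata = row_to_upload_args(row)
            upload(identifier, files=[path], metadata=metadata)
            print(f"uploaded {identifier}")
```

Calling upload_spreadsheet with the path to your csv file would then upload each row in turn. The sketch assumes a field uses either the plain heading or the numbered headings, not both.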
Congratulations - you have batch uploaded to the Internet Archive!
