SlideShare a Scribd company logo
1 of 13
Download to read offline
Social Feed Manager
Laura Wrubel
@liblaura @SocialFeedMgr
http://go.gwu.edu/sfm
Web Archives and Digital Libraries workshop, JCDL 2016
Social Feed Manager is supported by the National Historical Publications & Records Commission
Allows users to create collections of data
from social media platforms
Open source software, not a black box
Research documentation (for researchers)
≈ provenance metadata (for archivists)
(and it’s really important for both)
Creation
Authoring of the social media
● Creation metadata is provided by Twitter as JSON via API.
● Social media user metadata:
○ Screen name
○ Date account created
○ Location
● Tweet metadata:
○ Date
○ Tweet text
○ Mentions
○ Hashtags
○ URLs
○ Source (how posted)
● SFM records it in WARC files.
Selection
Decisions by the SFM user which leads SFM to harvest the tweet
Recorded in the SFM database
● Collection information
○ Harvest type
○ Harvest options (e.g., incremental, harvest web resources)
○ Credentials (API keys)
○ Description of collection
● Seeds for the collection (which vary by platform)
○ Screen name
○ UID
○ Keywords to filter on
● Change log
○ Change note
○ Fields changed
○ User who made change
○ Date of change
Collection
How SFM retrieved the tweet from Twitter’s API
● Collection metadata is received by SFM’s Twitter harvester & recorded
within WARCs.
● WARCs include the exact HTTP request/response
○ URL with params such as user account id or keywords
○ HTTP headers
● WARC record headers also include:
○ Date WARC record created
○ Server information
○ Fixities
Collection (cont)
● WARC file metadata, recorded in the SFM database:
○ File location
○ File size
○ Fixity
○ Creation date
● Harvest metadata:
○ Date
○ Collection
○ Date harvest started
○ Date harvest ended
○ Messages (informational, warning, or error)
○ Token/seed updates
○ Basic stats on number of items collected
Working paper: http://bit.ly/tweet-prov
Comments welcome!
How is this useful? http://bit.ly/tweet-prov
● Which of this provenance metadata do you (researcher,
archivist, librarian, etc.) want access to?
● How do you want access to this metadata? In SFM’s UI? In
reports when exports are created? Exposed via SFM’s
software libraries? A REST API? Machine-readable?
Human-readable?
● What metadata have we missed?
● Do the answers to the previous questions vary by discipline
(e.g., humanities, social science, etc.)?
● Are there other relevant specifications or standards that we
should consider? Is there value in a mapping to or providing
output in accordance with metadata standards such as
PREMIS or PROV?

More Related Content

Similar to Social Feed Manager, WADL/JCDL 2016

Streamsets and spark in Retail
Streamsets and spark in RetailStreamsets and spark in Retail
Streamsets and spark in RetailHari Shreedharan
 
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanAnalytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanDatabricks
 
LiveFolders as feeds
LiveFolders as feedsLiveFolders as feeds
LiveFolders as feedsGabor Paller
 
Why use big data tools to do web analytics? And how to do it using Snowplow a...
Why use big data tools to do web analytics? And how to do it using Snowplow a...Why use big data tools to do web analytics? And how to do it using Snowplow a...
Why use big data tools to do web analytics? And how to do it using Snowplow a...yalisassoon
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxsconalbg
 
Resource discovery and information sharing: reaching the 2.0 turn
Resource discovery and information sharing: reaching the 2.0 turnResource discovery and information sharing: reaching the 2.0 turn
Resource discovery and information sharing: reaching the 2.0 turnBonaria Biancu
 
4th Content Providers Community Call
4th Content Providers Community Call4th Content Providers Community Call
4th Content Providers Community CallOpenAIRE
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDataWorks Summit
 
Digital game preservation conference 12 25-2018
Digital game preservation conference   12 25-2018Digital game preservation conference   12 25-2018
Digital game preservation conference 12 25-2018peterchanws
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationCTruncer
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScaleSeunghyun Lee
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]RootedCON
 
OpenSearch.pdf
OpenSearch.pdfOpenSearch.pdf
OpenSearch.pdfAbhi Jain
 

Similar to Social Feed Manager, WADL/JCDL 2016 (20)

Streamsets and spark in Retail
Streamsets and spark in RetailStreamsets and spark in Retail
Streamsets and spark in Retail
 
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanAnalytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
 
Subj3ct
Subj3ctSubj3ct
Subj3ct
 
LiveFolders as feeds
LiveFolders as feedsLiveFolders as feeds
LiveFolders as feeds
 
Livefoldersasfeeds
LivefoldersasfeedsLivefoldersasfeeds
Livefoldersasfeeds
 
Why use big data tools to do web analytics? And how to do it using Snowplow a...
Why use big data tools to do web analytics? And how to do it using Snowplow a...Why use big data tools to do web analytics? And how to do it using Snowplow a...
Why use big data tools to do web analytics? And how to do it using Snowplow a...
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptx
 
Week10
Week10Week10
Week10
 
Resource discovery and information sharing: reaching the 2.0 turn
Resource discovery and information sharing: reaching the 2.0 turnResource discovery and information sharing: reaching the 2.0 turn
Resource discovery and information sharing: reaching the 2.0 turn
 
4th Content Providers Community Call
4th Content Providers Community Call4th Content Providers Community Call
4th Content Providers Community Call
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druid
 
Digital game preservation conference 12 25-2018
Digital game preservation conference   12 25-2018Digital game preservation conference   12 25-2018
Digital game preservation conference 12 25-2018
 
Publishing Linked Data using Schema.org
Publishing Linked Data using Schema.orgPublishing Linked Data using Schema.org
Publishing Linked Data using Schema.org
 
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data ExfiltrationWhat Goes In Must Come Out: Egress-Assess and Data Exfiltration
What Goes In Must Come Out: Egress-Assess and Data Exfiltration
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
Draux "Working with Scholarly APIs: A NISO Training Series, Session Four: Dig...
Draux "Working with Scholarly APIs: A NISO Training Series, Session Four: Dig...Draux "Working with Scholarly APIs: A NISO Training Series, Session Four: Dig...
Draux "Working with Scholarly APIs: A NISO Training Series, Session Four: Dig...
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]
Ismael Benito & Arnau Gàmez - Hacking Tokens: A Massive PoC [rooted2018]
 
OpenSearch.pdf
OpenSearch.pdfOpenSearch.pdf
OpenSearch.pdf
 

Recently uploaded

Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024APNIC
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsstephieert
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girladitipandeya
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts servicevipmodelshub1
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirtrahman018755
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 

Recently uploaded (20)

Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
10.pdfMature Call girls in Dubai +971563133746 Dubai Call girls
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 

Social Feed Manager, WADL/JCDL 2016

  • 1. Social Feed Manager Laura Wrubel @liblaura @SocialFeedMgr http://go.gwu.edu/sfm Web Archives and Digital Libraries workshop, JCDL 2016 Social Feed Manager is supported by the National Historical Publications & Records Commission
  • 2. Allows users to create collections of data from social media platforms
  • 3. Open source software, not a black box
  • 4.
  • 5.
  • 6.
  • 7. Research documentation (for researchers) ≈ provenance metadata (for archivists) (and it’s really important for both)
  • 8. Creation Authoring of the social media ● Creation metadata is provided by Twitter as JSON via API. ● Social media user metadata: ○ Screen name ○ Date account created ○ Location ● Tweet metadata: ○ Date ○ Tweet text ○ Mentions ○ Hashtags ○ URLs ○ Source (how posted) ● SFM records it in WARC files.
  • 9. Selection Decisions by the SFM user which leads SFM to harvest the tweet Recorded in the SFM database ● Collection information ○ Harvest type ○ Harvest options (e.g., incremental, harvest web resources) ○ Credentials (API keys) ○ Description of collection ● Seeds for the collection (which vary by platform) ○ Screen name ○ UID ○ Keywords to filter on ● Change log ○ Change note ○ Fields changed ○ User who made change ○ Date of change
  • 10. Collection How SFM retrieved the tweet from Twitter’s API ● Collection metadata is received by SFM’s Twitter harvester & recorded within WARCs. ● WARCs include the exact HTTP request/response ○ URL with params such as user account id or keywords ○ HTTP headers ● WARC record headers also include: ○ Date WARC record created ○ Server information ○ Fixities
  • 11. Collection (cont) ● WARC file metadata, recorded in the SFM database: ○ File location ○ File size ○ Fixity ○ Creation date ● Harvest metadata: ○ Date ○ Collection ○ Date harvest started ○ Date harvest ended ○ Messages (informational, warning, or error) ○ Token/seed updates ○ Basic stats on number of items collected
  • 13. How is this useful? http://bit.ly/tweet-prov ● Which of this provenance metadata do you (researcher, archivist, librarian, etc.) want access to? ● How do you want access to this metadata? In SFM’s UI? In reports when exports are created? Exposed via SFM’s software libraries? A REST API? Machine-readable? Human-readable? ● What metadata have we missed? ● Do the answers to the previous questions vary by discipline (e.g., humanities, social science, etc.)? ● Are there other relevant specifications or standards that we should consider? Is there value in a mapping to or providing output in accordance with metadata standards such as PREMIS or PROV?