BACK TO BASICS:
BIG DATA AND EDUCATION IN THE SOCIAL SCIENCES
Matthew S. Weber
Rutgers University
AEJMC 2014
Montreal, Canada
Breaking down the walls of big data?
http://archivehub.rutgers.edu
EXAMPLE: Undergraduates
Learning About Your Network
• By being aware of your connections, you can take an active role in managing them
– Be aware of the connections that you have, and what they contribute to your “network”
– Seek out networking opportunities
– Forge connections with people you admire and respect
LinkedIn Network Maps
Assignment Prompt
Prompt: Use www.touchgraph.com/facebook to generate a map of
your Facebook network. Spend some time exploring your different
connections, and then respond to the following:
• What different types of clusters do you see? Be specific in
identifying at least 2 – 3 different clusters.
• Is there someone in your network you forgot about? Who? Why?
• Identify 2 people who you feel are the most useful connections in
your network based on where they are positioned. Who are they
and why are they useful?
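One way to approach the prompt's questions about clusters and well-positioned people is sketched below in Python. It is an illustration only, not part of the assignment or of TouchGraph: the names and friendships are invented, "position" is approximated here by betweenness centrality, and "clusters" by the groups that remain once the top broker is removed.

```python
from collections import deque

# Hypothetical friendship edges (all names are invented for illustration).
EDGES = [
    ("Ana", "Ben"), ("Ana", "Cy"), ("Ben", "Cy"),    # e.g. school friends
    ("Dee", "Eli"), ("Dee", "Fay"), ("Eli", "Fay"),  # e.g. coworkers
    ("Cy", "Gus"), ("Gus", "Dee"),                   # Gus links the two groups
]


def build_graph(edges):
    """Return an undirected adjacency map: person -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted graph)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once per direction, so halve.
    return {v: c / 2 for v, c in score.items()}


def clusters_without(graph, removed):
    """Connected components of the graph once `removed` is taken out."""
    seen, groups = {removed}, []
    for start in sorted(graph):
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            group.append(v)
            for w in sorted(graph[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(sorted(group))
    return groups


graph = build_graph(EDGES)
scores = betweenness(graph)
broker = max(scores, key=scores.get)      # person sitting on the most shortest paths
groups = clusters_without(graph, broker)  # what the network splits into without them
```

In this toy network the broker is the one person who connects two otherwise separate groups, which is the kind of "useful because of where they are positioned" answer the prompt is looking for.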
EXAMPLE: PhD
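-- Request 30 reducers by default (the Pig documentation writes this key in lowercase).
SET default_parallel 30;

-- Load link metadata, target URI, capture date, content type and length from a WAT file.
titles = LOAD '/home/hai/Projects/HistoryCrawl/Data/IA/2_26_2014/nsf1.wat.gz'
    USING org.archive.hadoop.ArchiveJSONViewLoader(
        'Envelope.Payload-Metadata.HTTP-Response-Metadata.HTML-Metadata',
        'Envelope.WARC-Header-Metadata.WARC-Target-URI',
        'Envelope.WARC-Header-Metadata.WARC-Date',
        'Envelope.WARC-Header-Metadata.Content-Type',
        'Envelope.WARC-Header-Metadata.Content-Length')
    AS (links:chararray, target:chararray, date:chararray, contenttype:chararray, contentlength:chararray);

-- Keep only captures that contain link metadata.
nonnulls = FILTER titles BY links IS NOT NULL;
-- Parse each capture's links with a project UDF; carry date, content type and length along.
paths = FOREACH nonnulls GENERATE org.sci.historycrawl.parser($0,$1,$2), $2, $3, $4;
i6 = FOREACH paths GENERATE bagwati.url, $1, $2, $3;
-- One row per link; keep the first 10 characters of the date before formatting it.
i7 = FOREACH i6 GENERATE FLATTEN($0) AS words, org.sci.historycrawl.formatdate(SUBSTRING($1,0,10)), $2, $3;
-- Split each link into source URL, destination URL and link text; cast length to long.
i8 = FOREACH i7 GENERATE org.sci.historycrawl.getsourceURL($0), org.sci.historycrawl.getdstURL($0), org.sci.historycrawl.getText($0), $1, $2, (long)$3;
-- Group by (source, destination, date); keep one text value, the link count, one content type and total bytes.
i9 = GROUP i8 BY ($0,$1,$3);
i10 = FOREACH i9 GENERATE FLATTEN(group), FLATTEN(TOP(1,0,i8.$2)), COUNT(i8), FLATTEN(TOP(1,0,i8.$4)), SUM(i8.$5);
-- Drop rows with a missing source or destination.
i11 = FILTER i10 BY $0 IS NOT NULL;
i12 = FILTER i11 BY $1 IS NOT NULL;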
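The grouping step at the end of the script (relations i9 through i12) can be illustrated in plain Python. This is a rough sketch rather than the project's code: the sample records and field names are invented, and Pig's TOP(1, ...) is approximated with max().

```python
from collections import defaultdict

# Invented link records in the shape of relation i8:
# (source URL, destination URL, link text, date, content type, bytes)
RECORDS = [
    ("http://a.example", "http://b.example/story", "A story", "2012-10-22", "text/html", 1200),
    ("http://a.example", "http://b.example/story", "Read more", "2012-10-22", "text/html", 800),
    ("http://a.example", "http://c.example/", "Home", "2012-10-23", "text/html", 500),
    (None, "http://c.example/", "Orphan", "2012-10-23", "text/html", 100),
]


def aggregate_links(records):
    """Group by (source, destination, date) and summarise each group."""
    groups = defaultdict(list)
    for src, dst, text, date, ctype, size in records:
        # Same effect as the i11/i12 filters: skip rows missing an endpoint.
        # (The Pig script filters after grouping; the surviving rows match.)
        if src is None or dst is None:
            continue
        groups[(src, dst, date)].append((text, ctype, size))

    rows = []
    for (src, dst, date), items in sorted(groups.items()):
        rows.append({
            "source": src,
            "destination": dst,
            "date": date,
            "text": max(t for t, _, _ in items),          # stand-in for TOP(1, ...)
            "frequency": len(items),                      # COUNT
            "content_type": max(c for _, c, _ in items),  # stand-in for TOP(1, ...)
            "bytes": sum(s for _, _, s in items),         # SUM
        })
    return rows


rows = aggregate_links(RECORDS)
```

Each resulting row describes one source-to-destination link on one date, with how often it appeared and how many bytes the linking pages contained.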
Link Data:
Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text
http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football
Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag
http://gawker.com
2012-10-22
Dataset | Research Potential | Dates | Captures | Unique URLs
Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
Superstorm Sandy | (as above) | 2003 – 2012 | 41,703,112 | 20,013,455
US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
US House | (as above) | (as above) | 51,840,777 | 12,410,014
Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655
US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
• Email me! matthew.weber@rutgers.edu
• ArchiveHub: http://archivehub.rutgers.edu
• The Team
– Kris Carpenter, Vinay Goel, Internet Archive
– David Lazer, Katherine Ognyanova, Northeastern University
– Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University
– Peter Monge, Ayushman Datta, Kristen Guth, USC
Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers


Editor's Notes

  1. I believe that when engaging with large-scale data it’s important to break down the barrier between the technical and the social. That doesn’t mean that every journalism student should know how to code, or that every programmer should understand Hofstede, but even a basic level of cross-fertilization can help to advance a better understanding of work with large-scale data. This is an area that often sits at the intersection of fields, so cross-fertilization is critical.
  2. Lead PI on an NSF-funded project to develop new tools for researchers to access large-scale data from the Internet Archive; currently working with 40 TB of raw data.
  3. Code often looks scary, but one of the challenges is to break it down and make it accessible.
  4. Code often looks scary, but one of the challenges is to break it down and make it accessible. A similar instructional approach works with tools like R.
  5. 20th Century Collection = 9 TB of metadata; Media Seed List = 4,891