3 Affordable Solutions
Open Data and Open Source Data Science for You
March, 2016
© TechSoup Global | All rights reserved
Introduction
Paula Alves @LadyData
Steph Nagoski @InformationChef
Session ID 104 #16NTCopendata
Materials & Collaboration Notes
http://po.st/opendata-16NTC
Evaluation Link: http://po.st/fUt2gY
** WARNING: This presentation exposes information that you may find
disturbing.
Outline
Data Wrangling, Merging
Small Data Problems
Open Data Examples
Online Abuse management - Social and Technical, together
Abusive Community Analysis - Example using Reddit data
Bot Detection & Usage
Data Wrangling / Data Merging Tools
Cleaning and merging multiple data sources:
databases, CSV, TXT files, JSON, XML, web services & Open Data files
Trifacta www.trifacta.com/trifacta-wrangler/
OpenRefine - previously Google Refine http://openrefine.org/
Microsoft offerings you might already have: SSIS & Azure Data Factory
Other options include Crowdflower for data cleansing & tagging
http://crowdflower.com
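As a rough illustration of what these tools automate, here is a minimal Python sketch that cleans a join key and merges a CSV source with a JSON source. The column names, sample records, and the helper names `normalize_key` and `merge_sources` are made up for this example and do not come from any of the tools listed above.

```python
import csv
import io
import json


def normalize_key(value):
    """Trim whitespace and lowercase so 'Food Bank ' and 'food bank' match."""
    return value.strip().lower()


def merge_sources(csv_text, json_text, key):
    """Left-join JSON records onto CSV rows using a cleaned key column."""
    extras = {normalize_key(rec[key]): rec for rec in json.loads(json_text)}
    merged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = extras.get(normalize_key(row[key]), {})
        # CSV values win on conflicts; JSON only contributes new fields.
        combined = {**match, **row}
        combined[key] = row[key].strip()
        merged.append(combined)
    return merged


# Hypothetical sample data: note the trailing space and different casing.
CSV_TEXT = "org,city\nFood Bank ,Oakland\nShelter One,Fresno\n"
JSON_TEXT = '[{"org": "food bank", "budget": 120000}]'

rows = merge_sources(CSV_TEXT, JSON_TEXT, "org")
for row in rows:
    print(row)
```

Rows without a match in the JSON source are kept unchanged, which mirrors a left join; dropping them instead would be an equally reasonable choice depending on the analysis.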
Data Wrangling - Trifacta
Data Wrangling - OpenRefine
Clean, merge, and transform data; expressions use GREL, whose syntax resembles JavaScript
Crowdflower
Tool to enrich your data through technical and crowdsourced tagging,
flagging, and manual review.
Big Data? We all hope we grow that big. For now…
Small Data Problem Examples
San Francisco Health Improvement Partnership - Alcohol Policy
Partnership Working Group w/Trifacta
https://jrnew.shinyapps.io/sfhip-app/
Is neighborhood crime correlated with alcohol sales?
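One simple way to approach a question like this is a Pearson correlation coefficient over per-neighborhood counts. The sketch below is an illustration only: the numbers are invented, and it is not the method used in the SFHIP app linked above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, one entry per neighborhood.
alcohol_outlets = [2, 5, 9, 12, 20]
crime_reports = [30, 42, 61, 80, 118]

r = pearson(alcohol_outlets, crime_reports)
print(f"r = {r:.3f}")
```

A coefficient near 1 or -1 only says the two series move together; it does not show that one causes the other, and factors such as population density would need to be controlled for.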
Small Data Problem Examples
Bosnian/Herzegovinian Electoral data w/Google Refine
https://www.youtube.com/watch?v=BcxgAOCFppY
Southern Poverty Law Center Hate group list
https://www.splcenter.org/hate-map
Conversion Therapy source list
http://www.truthwinsout.org/ex-gay-consumer-fraud-division/
Govt Data sources - 18F - College Information
https://collegescorecard.ed.gov/search/?major=computer&sort=advantage:desc
Reusable Open Data Analysis
DataKind - http://www.datakind.org/blog/open-data-in-action-our-top-25
Reusable Open Data Analysis
CivicTech – Trends in Civic Tech Investment tool
http://knightfoundation.org/features/civictech/
Reusable Open Data Analysis
Data For Good: http://datalook.io/non-techies/
Library of reusable projects, with a focus on Non-Tech Users!
Open Data Formats -> Open Data Services
18F - GSA branch committed to open development & open data
https://18f.gsa.gov/
Open Data Maker: convert CSV files to an extensible open API
w/analytics https://github.com/18F/open-data-maker
First large-scale use of the Open Data Maker API:
https://collegescorecard.ed.gov/
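To make the "CSV file to API" idea concrete, here is a small Python sketch of the general pattern: load CSV rows, filter them by query parameters, and return JSON. It is a conceptual illustration only and does not reproduce Open Data Maker's actual code, endpoints, or response format; the function `query_csv`, the column names, and the sample schools are all hypothetical.

```python
import csv
import io
import json


def query_csv(csv_text, filters, fields=None):
    """Return rows matching every filter as a JSON string, API-style."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if all(row.get(col) == val for col, val in filters.items()):
            # Optionally project down to the requested fields only.
            results.append({f: row[f] for f in fields} if fields else dict(row))
    return json.dumps({"total": len(results), "results": results})


# Hypothetical sample data standing in for a published open data file.
SCHOOLS_CSV = (
    "name,state,size\n"
    "Alpha College,CA,1200\n"
    "Beta Institute,NY,800\n"
    "Gamma University,CA,15000\n"
)

response = json.loads(query_csv(SCHOOLS_CSV, {"state": "CA"}, fields=["name"]))
print(response)
```

The benefit for end users is that the filtering happens on the publisher's side, so applications always query the current file instead of keeping their own stale copies.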
Free Speech, and Groups that may disagree w/you
#BlackLivesMatter
Feminist Frequency - Media Criticism from Feminist perspective
Jewish and Islamic communities
Disability Organizations
Reproductive Health and Women’s rights
Any nonprofit that advocates for oppressed minorities
Handling Online Abuse 1
Crowdsourcing support/handling:
Online Abuse Prevention Initiative (OAPI)
http://onlineabuseprevention.org/
Projects: https://github.com/oapi
Hollaback’s new HeartMob https://iheartmob.org/
Shared Block Lists - https://blocktogether.org/
Hiding blocked users from Twitter Search
http://blog.randi.io/2016/01/13/hiding-blocked-users-from-twitter-search/
GoodGame AutoBlocker https://github.com/freebsdgirl/ggautoblocker
Reddit: Common Terms in an Offensive Subreddit
http://reddit.com/r/WhiteRights
Top 25 Most Frequent Words
Sample from Top 50 Bigrams in Reddit dataset
Word1     Word2     Rank
bin       laden     5
ann       coulter   9
jim       crow      14
hip       hop       18
pearl     harbor    22
nelson    mandela   27
martin    luther    39
charlie   hebdo     40
bernie    sanders   48
anglo     saxon     50
Code Examples of Reddit Analysis
Placeholder
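The slide leaves the code as a placeholder, so the following is only a minimal sketch of how word and bigram rankings like the ones above could be produced in Python. It assumes the comments have already been downloaded into a list of strings; the sample comments, the stopword list, and the helper names are invented for illustration and are not the presenters' actual analysis.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(comments, n=25):
    """Most frequent non-stopword tokens across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in STOPWORDS)
    return counts.most_common(n)


def top_bigrams(comments, n=50):
    """Most frequent adjacent word pairs, counted within each comment."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        # Pair each token with its successor inside the same comment.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Hypothetical stand-in for downloaded comment text.
comments = [
    "Open data helps nonprofits.",
    "Open data and open source tools.",
    "Nonprofits need open data.",
]

print(top_words(comments, 2))
print(top_bigrams(comments, 1))
```

Bigrams here are counted without removing stopwords; filtering them first would surface name-like pairs such as those in the table more quickly.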
Handling Online Abuse: Bots
Bot Detection: http://www.erinshellman.com/bot-or-not/
Handling Online Abuse: Bots
Productized simple analysis of Twitter bots: https://www.twitteraudit.com/
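For intuition about what a "simple analysis" of bot-like accounts might look at, here is a toy Python scoring sketch. The features, thresholds, and weights are entirely made up for illustration; they are not taken from the bot-or-not project or from TwitterAudit, which may use very different signals and models.

```python
from dataclasses import dataclass


@dataclass
class Account:
    followers: int
    following: int
    tweets: int
    has_default_avatar: bool


def bot_score(account):
    """Toy score in [0, 1]; higher means more bot-like under these made-up rules."""
    points = 0
    if account.has_default_avatar:
        points += 3
    if account.tweets == 0:
        points += 3
    # Following far more accounts than follow back is treated as suspicious here.
    if account.following > 10 * max(account.followers, 1):
        points += 4
    return points / 10


# Hypothetical accounts for demonstration.
suspicious = Account(followers=5, following=2000, tweets=0, has_default_avatar=True)
ordinary = Account(followers=500, following=300, tweets=1200, has_default_avatar=False)

print(bot_score(suspicious), bot_score(ordinary))
```

Hand-written rules like these are easy to explain but also easy to evade, which is one reason detection projects tend to move toward trained classifiers.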
Takeaways
Many tools for merging, cleaning & preparing your data for analysis are
now accessible to end-users, many of them open source or free for
nonprofits.
Accessing Open Data through API-based applications is more efficient:
the data is centrally updated and fresher, performance is better, and
the experience is end-user focused.
Lots of tools are available to help monitor and manage social media.
Advanced data science tools for detecting problems are starting to be
offered in more end-user-friendly ways.
What do YOU think?
Collaborative Q&A
Session

Editor's Notes

  1. V0.71 SJN BCI Team March 11, 2016
  2. Steph talks about Feminist Frequency. Paula talks about X. Advocacy and human rights organizations: feminist, transgender, Black Lives Matter. Public posts will get attacked by the opposition. These attacks range from friendly banter and scholastic questioning to abuse, rape and death threats, doxxing, etc. All large political organizations have some fringe abusers; some communities are more prone to accumulating a large collection of abusers over time.
  3. Additional related Use Cases & Personas