BIG DATA and VERACITY:
A novel approach to data
veracity using crowd-sourcing
techniques
Samarth Bhargav, Bhoomika Agarwal,
Abhiram Ravikumar and Vrishabh DN
April 18, 2014
Presented at BMS Institute of Technology, Bangalore
Introduction
Big Data
● What is Big Data?
● The 3 traditional V’s
o Volume
o Velocity
o Variety
● The fourth V: Veracity
● Crowdsourcing
[Figure: The 4 Vs of Big Data (Volume, Velocity, Variety, Veracity)]
Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
Crowdsourcing - Models in place
GOOGLE MAPS
WIKIPEDIA
DUOLINGO
RECAPTCHA
AMAZON MECHANICAL TURK
reCAPTCHA
● Digitizing books one word at a time
● Puts the 10 seconds a human spends on a CAPTCHA to productive use
● Digitizing old books is a herculean task for computers
● An efficient alternative to OCR
● Workflow: entry, multiple checks, verify, upload
● 20 years of The New York Times were digitized in just a couple of months
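The "multiple checks" and "verify" steps of the reCAPTCHA workflow can be sketched as a simple agreement rule. This is an illustration only, not reCAPTCHA's actual algorithm: the function name `consensus` and the thresholds `min_votes` and `min_agreement` are assumptions made for this example.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription of a word, or None if the crowd has not converged.

    Hypothetical rule: accept a word once at least `min_votes` non-empty
    answers exist and the most common answer reaches the `min_agreement` share.
    """
    # Normalize case and whitespace so trivially different answers match.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent checks yet
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word  # verified: ready for upload
    return None  # disagreement: keep showing the word to more people
```

A word is uploaded only when `consensus` returns a value; otherwise it stays in circulation for more checks.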
Google Maps
● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like Photo Sphere and Street View use huge contributions from the masses
● Workflow
○ add/edit places
○ verified by a moderator
○ cross-referenced and updated
WIKIPEDIA
● Termed the “mother of all encyclopedias”
● Hosts an immense pool of data, multilingual in nature and entirely community driven
● Run by donations from all over the world (crowdfunding)
● Dynamic and constantly updated, thus scores big over traditional encyclopedias
● Unbiased and high-quality information
● Data verification and validation done instantly by both experts and the general public
DUOLINGO
● Learn a language and translate the Web
● Entirely free and crowd-driven
● Luis von Ahn - the ESP Game and reCAPTCHA
● Workflow
o website to be translated is uploaded
o broken into parts and given to students
o students translate the document as part of their learning
o translated document is returned to the owner
● Win-win situation for both students and corporates
● Popular on both web and mobile platforms
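The Duolingo workflow (upload, break into parts, translate, return) can be sketched in a few functions. This is a minimal sketch under stated assumptions, not Duolingo's implementation: the function names, the naive split on full stops, and the round-robin assignment are all hypothetical.

```python
def split_into_parts(document):
    """Break a document into sentence-like parts (naive split on full stops)."""
    return [s.strip() + "." for s in document.split(".") if s.strip()]


def assign_round_robin(parts, students):
    """Map each part index to one student, cycling through the student list."""
    return {i: students[i % len(students)] for i in range(len(parts))}


def reassemble(translations, n_parts):
    """Join translated parts in their original order; fail if any part is missing."""
    missing = [i for i in range(n_parts) if i not in translations]
    if missing:
        raise ValueError(f"untranslated parts: {missing}")
    return " ".join(translations[i] for i in range(n_parts))


# Usage: two parts, two students, then the translated document goes back to the owner.
parts = split_into_parts("Hello world. How are you.")
assignment = assign_round_robin(parts, ["ana", "raj"])
result = reassemble({0: "Hola mundo.", 1: "Como estas."}, len(parts))
```

A real system would collect several translations per part and pick the best one by voting, which is where veracity comes in.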
Amazon Mechanical Turk
● Human intelligence on demand for businesses (“artificial artificial intelligence”)
● HITs (Human Intelligence Tasks) enable machine learning concepts
● Workflow
o Requester places a task on the site or through the API
o Provider (worker) picks a suitable task
o Payments made through Amazon gift certificates
● Advantages include
o Quality assurance
o Scalability options
o Lower cost
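The requester/provider workflow above can be sketched as a small in-memory task board. This does not use or imitate the real Mechanical Turk API: `Task`, `TaskBoard`, and their methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_id: int
    description: str
    reward: float
    worker: Optional[str] = None  # None until a provider accepts the task


@dataclass
class TaskBoard:
    tasks: List[Task] = field(default_factory=list)

    def post(self, description, reward):
        """Requester places a task on the board."""
        task = Task(len(self.tasks) + 1, description, reward)
        self.tasks.append(task)
        return task

    def open_tasks(self):
        """Tasks that no provider has picked yet."""
        return [t for t in self.tasks if t.worker is None]

    def accept(self, task_id, worker):
        """Provider picks an open task; fails if it is taken or unknown."""
        for t in self.open_tasks():
            if t.task_id == task_id:
                t.worker = worker
                return t
        raise LookupError(f"task {task_id} is not open")
```

Payment and quality checks are left out; they would hook in once a task's result is submitted.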
Analysis
● Handling data IS important
● Google Flu Trends
● Kickstarter and CosmoQuest
● A lot of scope and wide opportunities
Repercussions
● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds are often unaware that their data is being acquired
● Confidential data and security leaks must be addressed with care
Conclusion
Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high
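The ratings in the comparison above can be put on an ordinal scale to rank the models along any of the rated dimensions. The numeric values in `SCALE` and the helper `rank_by` are assumptions made for this sketch; the ratings themselves are copied from the table.

```python
# Assumed ordinal encoding of the qualitative ratings.
SCALE = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Ratings as given in the comparison table.
MODELS = {
    "Google Maps": {"velocity": "high", "variety": "low", "veracity": "medium"},
    "Duolingo": {"velocity": "medium", "variety": "high", "veracity": "high"},
    "reCAPTCHA": {"velocity": "very high", "variety": "very high", "veracity": "very high"},
    "Amazon Turk": {"velocity": "medium", "variety": "very high", "veracity": "high"},
    "Wikipedia": {"velocity": "medium", "variety": "high", "veracity": "very high"},
}


def rank_by(dimension):
    """Return model names from highest to lowest rating; ties keep table order."""
    return sorted(MODELS, key=lambda name: SCALE[MODELS[name][dimension]], reverse=True)


ranking = rank_by("veracity")
```

Volume is left out because the table gives it in storage units rather than on the low-to-very-high scale.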
Questions & Answers time! :-)
Thank you, UTSAHA 2k’14.

More Related Content

What's hot

Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
Fujitsu UK
 
Big data analysis using map/reduce
Big data analysis using map/reduceBig data analysis using map/reduce
Big data analysis using map/reduce
RenuSuren
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
Jayant Mukherjee
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
Shree M.L.Kakadiya MCA mahila college, Amreli
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
SiamAhmed16
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular network
shubham patil
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
Sadhana Singh
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
TUSHAR GARG
 
Sina Sohangir Presentation on IWMC 2015
Sina Sohangir Presentation on IWMC 2015Sina Sohangir Presentation on IWMC 2015
Sina Sohangir Presentation on IWMC 2015
Iran Entrepreneurship Association
 
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Homeland Security Research Corp.
 
The importance of data
The importance of dataThe importance of data
The importance of data
APNIC
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsKaniska Mandal
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaStudent
 
Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data TechnologiesDATAVERSITY
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
Sivashankar Ganapathy
 
Big data ppt
Big data pptBig data ppt
Big data Ppt
Big data PptBig data Ppt
Big data Ppt
Prashant Navatre
 
Big Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique BruxellesBig Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique Bruxelles
Eric Rodriguez (Hiring in Lex)
 

What's hot (20)

Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
 
Big data analysis using map/reduce
Big data analysis using map/reduceBig data analysis using map/reduce
Big data analysis using map/reduce
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular network
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Sina Sohangir Presentation on IWMC 2015
Sina Sohangir Presentation on IWMC 2015Sina Sohangir Presentation on IWMC 2015
Sina Sohangir Presentation on IWMC 2015
 
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
 
The importance of data
The importance of dataThe importance of data
The importance of data
 
Core concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data AnalyticsCore concepts and Key technologies - Big Data Analytics
Core concepts and Key technologies - Big Data Analytics
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Introducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by JaseelaIntroducing Technologies for Handling Big Data by Jaseela
Introducing Technologies for Handling Big Data by Jaseela
 
Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data Technologies
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data Ppt
Big data PptBig data Ppt
Big data Ppt
 
Big Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique BruxellesBig Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique Bruxelles
 

Viewers also liked

Basuras en lugares especificos
Basuras en lugares especificosBasuras en lugares especificos
Basuras en lugares especificosdormelion
 
Take back control - introduction
Take back control - introductionTake back control - introduction
Take back control - introduction
Abhiram Ravikumar
 
Rockin' Search Engine Optimization in Drupal
Rockin' Search Engine Optimization in DrupalRockin' Search Engine Optimization in Drupal
Rockin' Search Engine Optimization in Drupal
Matt Glaman
 
Tracnghiemnlkt
TracnghiemnlktTracnghiemnlkt
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
Museum Grant Advocate: CAP, NEA, NEH in 7 states
 
Creating Shareable content
Creating Shareable contentCreating Shareable content
Creating Shareable content
Sonja Fuchs
 
Advert codes and conventions
Advert codes and conventionsAdvert codes and conventions
Advert codes and conventions
rosedalyx
 
Clichés de Jean Ledocq sur le thème Art Image de 2016
Clichés de Jean Ledocq sur le thème Art Image de 2016Clichés de Jean Ledocq sur le thème Art Image de 2016
Clichés de Jean Ledocq sur le thème Art Image de 2016
Jean LEDOCQ
 
Hooks Historic Drugstore Preservation
Hooks Historic Drugstore Preservation Hooks Historic Drugstore Preservation
Hooks Historic Drugstore Preservation
Museum Grant Advocate: CAP, NEA, NEH in 7 states
 
Textile Military History, 27th Indiana. Vol. Regiment, Dubois county Civil W...
Textile Military History, 27th Indiana. Vol. Regiment,  Dubois county Civil W...Textile Military History, 27th Indiana. Vol. Regiment,  Dubois county Civil W...
Textile Military History, 27th Indiana. Vol. Regiment, Dubois county Civil W...
Museum Grant Advocate: CAP, NEA, NEH in 7 states
 
Museum collection storage- Cincinnati History Museum Ctr, Geiger - Fleishman...
Museum collection storage-  Cincinnati History Museum Ctr, Geiger - Fleishman...Museum collection storage-  Cincinnati History Museum Ctr, Geiger - Fleishman...
Museum collection storage- Cincinnati History Museum Ctr, Geiger - Fleishman...
Museum Grant Advocate: CAP, NEA, NEH in 7 states
 
Welcome to Drupal 262
Welcome to Drupal 262Welcome to Drupal 262
Welcome to Drupal 262
Matt Glaman
 
20150423 跨科際短講籌備會議
20150423 跨科際短講籌備會議20150423 跨科際短講籌備會議
20150423 跨科際短講籌備會議Wendy Yuchen Sun
 
cancer-de-cuello-uterino
 cancer-de-cuello-uterino  cancer-de-cuello-uterino
cancer-de-cuello-uterino Teryon
 
References expose 2016
References expose 2016References expose 2016
References expose 2016
Jean LEDOCQ
 
Website codes and conventions
Website codes and conventionsWebsite codes and conventions
Website codes and conventions
rosedalyx
 

Viewers also liked (17)

Basuras en lugares especificos
Basuras en lugares especificosBasuras en lugares especificos
Basuras en lugares especificos
 
Take back control - introduction
Take back control - introductionTake back control - introduction
Take back control - introduction
 
Rockin' Search Engine Optimization in Drupal
Rockin' Search Engine Optimization in DrupalRockin' Search Engine Optimization in Drupal
Rockin' Search Engine Optimization in Drupal
 
Tracnghiemnlkt
TracnghiemnlktTracnghiemnlkt
Tracnghiemnlkt
 
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
Museum Textile Review- Collections Care: Costume & books at New Harmony/ Harm...
 
Creating Shareable content
Creating Shareable contentCreating Shareable content
Creating Shareable content
 
Air France
Air FranceAir France
Air France
 
Advert codes and conventions
Advert codes and conventionsAdvert codes and conventions
Advert codes and conventions
 
Clichés de Jean Ledocq sur le thème Art Image de 2016
Clichés de Jean Ledocq sur le thème Art Image de 2016Clichés de Jean Ledocq sur le thème Art Image de 2016
Clichés de Jean Ledocq sur le thème Art Image de 2016
 
Hooks Historic Drugstore Preservation
Hooks Historic Drugstore Preservation Hooks Historic Drugstore Preservation
Hooks Historic Drugstore Preservation
 
Textile Military History, 27th Indiana. Vol. Regiment, Dubois county Civil W...
Textile Military History, 27th Indiana. Vol. Regiment,  Dubois county Civil W...Textile Military History, 27th Indiana. Vol. Regiment,  Dubois county Civil W...
Textile Military History, 27th Indiana. Vol. Regiment, Dubois county Civil W...
 
Museum collection storage- Cincinnati History Museum Ctr, Geiger - Fleishman...
Museum collection storage-  Cincinnati History Museum Ctr, Geiger - Fleishman...Museum collection storage-  Cincinnati History Museum Ctr, Geiger - Fleishman...
Museum collection storage- Cincinnati History Museum Ctr, Geiger - Fleishman...
 
Welcome to Drupal 262
Welcome to Drupal 262Welcome to Drupal 262
Welcome to Drupal 262
 
20150423 跨科際短講籌備會議
20150423 跨科際短講籌備會議20150423 跨科際短講籌備會議
20150423 跨科際短講籌備會議
 
cancer-de-cuello-uterino
 cancer-de-cuello-uterino  cancer-de-cuello-uterino
cancer-de-cuello-uterino
 
References expose 2016
References expose 2016References expose 2016
References expose 2016
 
Website codes and conventions
Website codes and conventionsWebsite codes and conventions
Website codes and conventions
 

Similar to A novel approach to big data veracity using crowd-sourcing techniques

IOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOTIOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOT
MongoDB
 
Google Case Study .pptx
Google Case Study .pptxGoogle Case Study .pptx
Google Case Study .pptx
NitiMehta8
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Adam Gibson
 
Knowledge-based economy presentation
Knowledge-based economy presentation Knowledge-based economy presentation
Knowledge-based economy presentation
Numan Dilder
 
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
Jose Papo, MSc
 
ICTA Meetup 11 - Big Data
ICTA Meetup 11 - Big DataICTA Meetup 11 - Big Data
ICTA Meetup 11 - Big Data
Crishantha Nanayakkara
 
State of Technology in Libraries 2019
State of Technology in Libraries 2019State of Technology in Libraries 2019
State of Technology in Libraries 2019
Nick Tanzi
 
Big & Open Data: Challenges for Smartcity
Big & Open Data:  Challenges for SmartcityBig & Open Data:  Challenges for Smartcity
Big & Open Data: Challenges for Smartcity
Victoria López
 
Mobile semantic technology
Mobile semantic technologyMobile semantic technology
Mobile semantic technology
Thomas Kelly, PMP
 
Human Computation for Big Data
Human Computation for Big DataHuman Computation for Big Data
Human Computation for Big Data
eXascale Infolab
 
RightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your GamesRightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your Games
RightScale
 
Technology building blocks for innovation
Technology building blocks for innovationTechnology building blocks for innovation
Technology building blocks for innovation
Mahmoud Jalajel
 
Enabling the physical world to the Internet and potential benefits for agricu...
Enabling the physical world to the Internet and potential benefits for agricu...Enabling the physical world to the Internet and potential benefits for agricu...
Enabling the physical world to the Internet and potential benefits for agricu...
Andreas Kamilaris
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015Kanwal Prakash Singh
 
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
GlobalLogic Ukraine
 
Digitalization: A Challenge and An Opportunity for Banks
Digitalization: A Challenge and An Opportunity for BanksDigitalization: A Challenge and An Opportunity for Banks
Digitalization: A Challenge and An Opportunity for Banks
Jérôme Kehrli
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
In Marketing We Trust
 
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor..."The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
Quantopian
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Steven Ramage
 

Similar to A novel approach to big data veracity using crowd-sourcing techniques (20)

IOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOTIOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOT
 
Google Case Study .pptx
Google Case Study .pptxGoogle Case Study .pptx
Google Case Study .pptx
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
 
Knowledge-based economy presentation
Knowledge-based economy presentation Knowledge-based economy presentation
Knowledge-based economy presentation
 
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
The Hyper Connected Era: Mobile First, Cloud First and Multi Screen
 
ICTA Meetup 11 - Big Data
ICTA Meetup 11 - Big DataICTA Meetup 11 - Big Data
ICTA Meetup 11 - Big Data
 
State of Technology in Libraries 2019
State of Technology in Libraries 2019State of Technology in Libraries 2019
State of Technology in Libraries 2019
 
Big & Open Data: Challenges for Smartcity
Big & Open Data:  Challenges for SmartcityBig & Open Data:  Challenges for Smartcity
Big & Open Data: Challenges for Smartcity
 
Mobile semantic technology
Mobile semantic technologyMobile semantic technology
Mobile semantic technology
 
Human Computation for Big Data
Human Computation for Big DataHuman Computation for Big Data
Human Computation for Big Data
 
RightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your GamesRightScale Webinar: Get Top Performance for Your Games
RightScale Webinar: Get Top Performance for Your Games
 
Technology building blocks for innovation
Technology building blocks for innovationTechnology building blocks for innovation
Technology building blocks for innovation
 
Enabling the physical world to the Internet and potential benefits for agricu...
Enabling the physical world to the Internet and potential benefits for agricu...Enabling the physical world to the Internet and potential benefits for agricu...
Enabling the physical world to the Internet and potential benefits for agricu...
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
GlobalLogic Java Community Webinar #16 “Zaloni’s Architecture for Data-Driven...
 
Digitalization: A Challenge and An Opportunity for Banks
Digitalization: A Challenge and An Opportunity for BanksDigitalization: A Challenge and An Opportunity for Banks
Digitalization: A Challenge and An Opportunity for Banks
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
 
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor..."The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
 

More from Abhiram Ravikumar

Innovate the foss-way
Innovate the foss-wayInnovate the foss-way
Innovate the foss-way
Abhiram Ravikumar
 
Rust meetup delhi nov 18
Rust meetup delhi nov 18Rust meetup delhi nov 18
Rust meetup delhi nov 18
Abhiram Ravikumar
 
Ethereum and blockchain
Ethereum and blockchainEthereum and blockchain
Ethereum and blockchain
Abhiram Ravikumar
 
BCI Media Playet | Intuit Accessibility Summit
BCI Media Playet | Intuit Accessibility SummitBCI Media Playet | Intuit Accessibility Summit
BCI Media Playet | Intuit Accessibility Summit
Abhiram Ravikumar
 
Privacy & Security on the Web - Tools on Mozilla Firefox
Privacy & Security on the Web - Tools on Mozilla FirefoxPrivacy & Security on the Web - Tools on Mozilla Firefox
Privacy & Security on the Web - Tools on Mozilla Firefox
Abhiram Ravikumar
 
A seminar on User Topic Interest profiles research by Google
A seminar on  User Topic Interest profiles research by GoogleA seminar on  User Topic Interest profiles research by Google
A seminar on User Topic Interest profiles research by Google
Abhiram Ravikumar
 
A kick-start into Open Source
A kick-start into Open SourceA kick-start into Open Source
A kick-start into Open Source
Abhiram Ravikumar
 

More from Abhiram Ravikumar (7)

Innovate the foss-way
Innovate the foss-wayInnovate the foss-way
Innovate the foss-way
 
Rust meetup delhi nov 18
Rust meetup delhi nov 18Rust meetup delhi nov 18
Rust meetup delhi nov 18
 
Ethereum and blockchain
Ethereum and blockchainEthereum and blockchain
Ethereum and blockchain
 
BCI Media Playet | Intuit Accessibility Summit
BCI Media Playet | Intuit Accessibility SummitBCI Media Playet | Intuit Accessibility Summit
BCI Media Playet | Intuit Accessibility Summit
 
Privacy & Security on the Web - Tools on Mozilla Firefox
Privacy & Security on the Web - Tools on Mozilla FirefoxPrivacy & Security on the Web - Tools on Mozilla Firefox
Privacy & Security on the Web - Tools on Mozilla Firefox
 
A seminar on User Topic Interest profiles research by Google
A seminar on  User Topic Interest profiles research by GoogleA seminar on  User Topic Interest profiles research by Google
A seminar on User Topic Interest profiles research by Google
 
A kick-start into Open Source
A kick-start into Open SourceA kick-start into Open Source
A kick-start into Open Source
 

Recently uploaded

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 

A novel approach to big data veracity using crowd-sourcing techniques

  • 1. BIG DATA and VERACITY: A novel approach to data veracity using crowd-sourcing techniques Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN April 18, 2014 Presented at BMS Institute of Technology, Bangalore
  • 2. Introduction: Big Data
      ● What is Big Data?
      ● The 3 traditional V’s
          o Volume
          o Velocity
          o Variety
      ● Fourth V
      ● Crowdsourcing
      [Diagram: Volume, Variety, Velocity, Veracity]
  • 3. The 4 Vs of Big Data Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
  • 4. Crowdsourcing - Models in place: Google Maps, Wikipedia, Duolingo, reCAPTCHA, Amazon Mechanical Turk
  • 5. reCAPTCHA
      ● Digitizing one word at a time
      ● Puts the roughly 10 seconds a human spends on each CAPTCHA to productive use
      ● Digitizing old books is a herculean task for computers
      ● An efficient alternative to OCR
      ● Workflow: entry, multiple checks, verify, upload
      ● 20 years of The New York Times Daily was digitized in just a couple of months
  • 6. Google Maps
      ● “Enrich Google Maps with your local knowledge”
      ● The Google Map Maker project
      ● Data used by Google Maps and Google Earth
      ● Projects like PhotoSphere and StreetView use huge contributions from the masses
      ● Workflow
          ○ add/edit places
          ○ verified by a moderator
          ○ cross-referenced and updated
  • 7. Wikipedia
      ● Termed the “mother of all encyclopedias”
      ● Hosts an immense pool of data, multilingual and entirely community driven
      ● Run on donations from all over the world (crowdfunding)
      ● Dynamic and constantly updated, which gives it an edge over traditional encyclopedias
      ● Unbiased and high-quality information
      ● Data verification and validation done instantly by both experts and the general public
  • 8. Duolingo
      ● Learn a language and translate the Web
      ● Entirely free and crowd-driven
      ● Luis von Ahn - ESP games and reCAPTCHA
      ● Workflow
          o website to be translated is uploaded
          o broken into parts and given to students
          o students translate the document as part of the learning process
          o translated document is returned to the owner
      ● Win-win situation for both students and corporates
      ● Popular on both web and mobile platforms
  • 9. Amazon Mechanical Turk
      ● Use of artificial intelligence to run businesses
      ● HITs enable machine learning concepts
      ● Workflow
          o Requester places task on the site or through API
          o Provider picks a suitable task
          o Payments made through Amazon gift certificates
      ● Advantages include
          o Quality assurance
          o Scalability options
          o Lower cost
  • 10. Analysis
      ● Handling data IS important
      ● Google FLU tracker
      ● KickStarter and CosmoQuest
      ● Lot of scope and wide opportunities
  • 11. Repercussions
      ● Senator Kennedy’s story
      ● FCRA (Fair Credit Reporting Act)
      ● Crowds unaware of data acquisition
      ● Confidential data and security leaks to be addressed with care
  • 12. Conclusion

      Crowdsourcing model   Volume      Velocity    Variety     Veracity
      Google Maps           terabytes   high        low         medium
      Duolingo              terabytes   medium      high        high
      reCAPTCHA             petabytes   very high   very high   very high
      Amazon Turk           petabytes   medium      very high   high
      Wikipedia             petabytes   medium      high        very high
  • 13. References
      1. http://crowdsourcingweek.com/you-have-helped-digitize-millions-of-books-through-online-collaboration/
      2. http://www.loopinsight.com/2014/03/14/duolingo-recaptcha-and-a-magnificent-piece-of-crowdsourcing/
      3. http://www.cracked.com/article_19431_5-mind-blowing-things-crowds-do-better-than-experts.html
      4. http://royal.pingdom.com/2012/02/08/google-maps-turns-7-years-old-amazing-facts-and-figures/
      5. http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
      6. http://www.pomona.edu/academics/departments/psychology/files/Buhrmester%20-Crowdsourcing-Amazon-MTurk.pdf
      7. http://hcil2.cs.umd.edu/trs/2010-09/2010-09.pdf
      8. http://www.slideshare.net/davidgracia/crowdsourcing-at-wikipedia-8586584
      9. http://info.articleonepartners.com/crowdsourcing-series-wikipedia-the-godfather-of-crowdsourcing/
      10. http://ezinearticles.com/?Wikipedia---A-Successful-Crowdsourcing-Project&id=3736803
  • 14. Questions & Answers time! :-) Source: http://2.bp.blogspot.com/ Thank you, UTSAHA 2k’14.
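
The "entry, multiple checks, verify, upload" workflow listed for reCAPTCHA on slide 5 can be illustrated with a simple agreement check over independent crowd answers. This is only a rough sketch under stated assumptions, not the actual reCAPTCHA implementation: the function name `verify_by_consensus` and the thresholds `min_votes` and `min_agreement` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import List, Optional


def verify_by_consensus(entries: List[str], min_votes: int = 3,
                        min_agreement: float = 0.6) -> Optional[str]:
    """Return the agreed answer, or None if the crowd has not converged.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the slides or from any real service.
    """
    # Normalize so trivial differences (case, surrounding spaces) do not split votes.
    normalized = [e.strip().lower() for e in entries if e.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent checks yet
    answer, count = Counter(normalized).most_common(1)[0]
    # Accept the most common answer only if enough contributors agree on it.
    if count / len(normalized) >= min_agreement:
        return answer
    return None


if __name__ == "__main__":
    print(verify_by_consensus(["Veracity", "veracity ", "verocity"]))
```

An answer that passes the check would move on to the "upload" step; one that returns None would be shown to more contributors first.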