Factual 2011 Web 2.0 Presentation



Gil Elbaz, CEO and founder of Factual, gave a talk at the 2011 Web 2.0 Conference in San Francisco titled "Big Data Challenges: Getting Some."




Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Big Data Challenges: Getting Some. March 31, 2011. Gil Elbaz, @factual, @gilelbaz
    • Road to Information Singularity (Confidential)
    • Networks Underlying Information Flow (image source: www.bloggingforbusinessbook.com)
      • Density: number of connecting paths
      • Plasticity: ease of forming new paths
      • Speed & Flow: rate of information transfer
    • The Internet (image source: http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918)
    • Search Engines
    • Social Networks: Facebook (image source: http://2.bp.blogspot.com/)
      • 600 million Facebook users
      • 130 average friends
      • 8 friend requests / month
      • 15 messages / day / user
    • Trending of Unfriending
    • Unfriending
    • Another Network: The Brain [image source illegible]
      • 100 billion neurons
      • 1000 ‘hardwired’ synapses
    • Web 3.0: Data Web
    • Web Scale Data = More Pain
      • Findability
      • Access
      • Rights
      • Economics
      • Standards
      • Integration & Aggregation
      • Trust
    • Web 2.0 Model: Scale-Free Networks (image source: www.futureexploration.net)
    • Book Data: Progress Being Made
      • Google Book Search API
      • Open Library Books API
      • ISBNdb
      • Amazon API
      • LibraryThing
      • GoodReads
      • WorldCat
    • Book data sources compared: Google Book Search API, Amazon API, Open Library Books API, LibraryThing, ISBNdb, WorldCat, GoodReads
      • Findability
      • Access
      • Rights
      • Economics
      • Standards
      • Trust
    • Another Case Study: Local Data (image source: http://stevecheney.posterous.com/)
    • Another Case Study: Local Data
      • Examine Twitter sentiment (avoid dirty coffee shops)
      • Identify areas of highest bike thefts
      • Correlate check-ins with property values
      • Sources shown: Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, HomeJunction
    • HomeJunction
    • Factual is Example of New Information Network [input source labels illegible]
      • Processing: Aggregate, Mash, Curate, Dedupe, Canonicalize
      • Consumers: Developers, Publishers, Search Engines
      • Outputs: New Services, Capabilities, Software Apps
    • Factual’s Open Data Model
      • Free; access via APIs, SDKs, and downloads
      • BUT… we ask you to contribute back into the ecosystem
      • Benefits:
        • Drive down costs
        • Rapid iteration
        • Differentiate on user experience
        • Only need small % participation from world (e.g. Wikipedia)
    • Equivalence Measurements: are these two listings the same place? (Cincinnati, OH 45202)
      • Listing 1: Subway Sandwiches, 52 E Court St, (513)-241-6699
      • Listing 2: Subway, 52 West Court St, (800)-653-2323
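
The comparison on this slide can be sketched as a field-by-field similarity check. The following is a minimal illustration in Python, not Factual's method: the record layout, the function names (`containment`, `jaccard`, `compare`), and the choice of token-overlap measures are all assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def containment(a, b):
    """Share of the smaller token set that also appears in the larger one."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))

def jaccard(a, b):
    """Shared tokens divided by all distinct tokens across both strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def digits(phone):
    """Strip everything except digits so formatting differences are ignored."""
    return re.sub(r"\D", "", phone)

def compare(rec_a, rec_b):
    """Return a per-field similarity score in [0, 1] (hypothetical scheme)."""
    return {
        "name": containment(rec_a["name"], rec_b["name"]),
        "street": jaccard(rec_a["street"], rec_b["street"]),
        "phone": 1.0 if digits(rec_a["phone"]) == digits(rec_b["phone"]) else 0.0,
    }

# The two listings from the slide, split into assumed fields.
listing_1 = {"name": "Subway Sandwiches", "street": "52 E Court St", "phone": "(513)-241-6699"}
listing_2 = {"name": "Subway", "street": "52 West Court St", "phone": "(800)-653-2323"}

scores = compare(listing_1, listing_2)
print(scores)
```

Here the names agree fully under containment, the streets overlap only partly because "E" and "West" differ, and the phone numbers do not match at all. That mixed evidence is why the slide poses equivalence as a question rather than an answer; how to weight the fields into one decision is left open in this sketch.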
    • Large-Scale Aggregation Technologies
    • Large-Scale Aggregation Technologies
      • Apartments = Apts
      • Center = Ctr
      • Corp = Corporation
      • Service = Svc
      • Attorney = Atty
      • Assoc = Associates
      • Inc = Incorporated
      • Assn = Association
      • Co = Company
      • Mount = Mt
      • Bros = Brothers
      • BBQ = Barbeque
      • [illegible] = Ted
    • Large-Scale Aggregation Technologies
      • Restaurant = Rstrnt
      • Restaurant = Restuarant
      • Hosp = Hospital
      • Billards = Billiards
      • Salon = Sln
      • Buffet = Buffett
      • Center = Ctr
      • Apartments = Apts
      • Boutique = Btq
      • Jewelers = Jewlers
      • Cleaners = Clnrs
      • Market = Mktz
      • Kragen = O'Reilly
    • Kragen = O'Reilly?
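
One simple way to picture the token equivalences on the preceding slides is a lookup table applied before names are compared. The sketch below assumes a small, hand-picked `ABBREVIATIONS` table and a hypothetical `canonicalize` helper; it is an illustration, not the aggregation technology the talk describes.

```python
import re

# Illustrative subset of abbreviation pairs; a production table would be far larger.
ABBREVIATIONS = {
    "apts": "apartments",
    "ctr": "center",
    "corp": "corporation",
    "svc": "service",
    "inc": "incorporated",
    "co": "company",
    "bros": "brothers",
}

def canonicalize(name):
    """Lowercase a business name, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9']+", name.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

short_form = canonicalize("Smith Bros. Svc Ctr")
long_form = canonicalize("Smith Brothers Service Center")
print(short_form)
print(short_form == long_form)
```

A table like this handles spelling-level variants only. A pairing such as Kragen and O'Reilly shares no characters to work from, so it would need a separate alias list or other evidence.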
    • Large-Scale Deduping
      • Specialized data compression & folding techniques
      • Eliminate redundant entities: endpoints and authority pages
      • Improves precision & recall
      • Enables real-time dedupe and crosswalks
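
The slide does not say how the compression and folding techniques work. As a much simpler stand-in, the sketch below groups records by an exact normalized key and builds a crosswalk from each source ID to one canonical ID. The record fields, the source IDs (`srcA:1` and so on), and the helper names are hypothetical.

```python
import re
from collections import defaultdict

def dedupe_key(record):
    """Build a grouping key from the alphanumeric name and the phone digits."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])
    return (name, phone)

def build_crosswalk(records):
    """Map every record ID to the canonical ID of its duplicate group."""
    groups = defaultdict(list)
    for record in records:
        groups[dedupe_key(record)].append(record["id"])
    crosswalk = {}
    for ids in groups.values():
        canonical = min(ids)  # arbitrary but deterministic choice of representative
        for record_id in ids:
            crosswalk[record_id] = canonical
    return crosswalk

# Made-up records from two hypothetical sources.
records = [
    {"id": "srcA:1", "name": "Joe's Cafe", "phone": "(555) 010-0000"},
    {"id": "srcB:9", "name": "JOES CAFE", "phone": "555-010-0000"},
    {"id": "srcA:2", "name": "Main St Books", "phone": "555-010-0001"},
]

crosswalk = build_crosswalk(records)
print(crosswalk)
```

Exact-key grouping only catches duplicates that normalize to the same string. Near matches, such as a listing with a different phone number, would need similarity scoring on top of this.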
    • Shared Foundational Data
      • Commoditization of data
      • Head attributes for people, places, things decreasing in value
      • hCard data value driven to zero (visual of local data being identical on thousands of apps)
      • Entertainment: IMDB exposed all their data for non-commercial use (link to site map)
      • Yet there are still lots of errors in foundational data, thus the need for a “living” model
    • LA Neighborhoods: Another Crowdsourcing Example
      • LA Times started with 87 neighborhoods based on census tracts
      • Incorporated 650+ user maps
      • Ended with 114 neighborhoods for LA City
      • Added additional 158 neighborhoods for LA County
    • Ownership & Rights: LA Neighborhoods
      • Terms of Service: Creative Commons Attribution, Noncommercial, Share-Alike license
      • Can share and remix as long as it’s for noncommercial uses, attributed to the LA Times, and shared under the same terms
    • Evolving “Buy” Model
      • Data Marketplaces (“iTunes of data?”)
      • Data Search Engines
      • Microformats / Semantic Web Markups / Other Standards
      • Electronic Forms of T&Cs
    • Summary: Road to the Information Singularity
      • Rise in community storage and access
      • New common schemas and standards
      • Definitive, accountable sources of “open” data
      • Trends towards sharing of foundational data
      • Buy models based on unique data, novel access methods, SLAs, value-added services
    • Thank you! Questions... Gil Elbaz, @factual, @gilelbaz