Factual 2011 Web 2.0 Presentation

Gil Elbaz, CEO and founder of Factual, gave a talk at the 2011 Web 2.0 Conference in San Francisco titled "Big Data Challenges: Getting Some."

  1. Big Data Challenges: Getting Some. March 31, 2011. Gil Elbaz, @factual, @gilelbaz
  2. Road to Information Singularity (Confidential)
  3. Networks Underlying Information Flow: Density: number of connecting paths; Plasticity: ease of forming new paths; Speed & Flow: rate of information transfer. Image source: http://www.bloggingforbusinessbook.com/blogging-for-business/
  4. The Internet. Image source: http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918
  5. Search Engines
  6. Social Networks: Facebook. 600 million Facebook users; 130 average friends; 8 friend requests per month; 15 messages per day per user. Image source: http://2.bp.blogspot.com/
  7. Trending of Unfriending
  8.
  9. Unfriending
  10. Another Network: The Brain. 100 billion neurons; 1,000 "hardwired" synapses per neuron
  11. Web 3.0: Data Web
  12. Web-Scale Data = More Pain: Findability; Access; Rights; Economics; Standards; Integration & Aggregation; Trust
  13. Web 2.0 Model: Scale-Free Networks. Image source: www.futureexploration.net
  14. Book Data: Progress Being Made: Google Book Search API; Open Library Books API; ISBNdb; Amazon API; LibraryThing; GoodReads; WorldCat
  15. Grid: Google Book Search API, Amazon API, Open Library Books API, LibraryThing, ISBNdb, WorldCat, and GoodReads vs. Findability, Access, Rights, Economics, Standards, and Trust
  16. Another Case Study: Local Data. Image source: http://stevecheney.posterous.com/
  17. Another Case Study: Local Data. Data sources shown: Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, HomeJunction. Uses: examine Twitter sentiment (avoid dirty coffee shops); identify areas of highest bike thefts; correlate check-ins with property values
  18. HomeJunction
  19. Factual Is an Example of a New Information Network: Aggregate, Mash, Curate, Dedupe, Canonicalize; Developers, Publishers, Search Engines
  20. Factual's Open Data Model: Free, with access via APIs, SDKs, and downloads, BUT… we ask you to contribute back into the ecosystem. Benefits: drive down costs; rapid iteration; differentiate on user experience; only need a small % participation from the world (e.g., Wikipedia)
  21. Equivalence Measurements (=?): Subway Sandwiches vs. Subway; 52 E Court St vs. 52 West Court St; Cincinnati, OH 45202; (513)-241-6699 vs. (800)-653-2323
  22. Large-Scale Aggregation Technologies
  23. Large-Scale Aggregation Technologies: Apartments = Apts; Center = Ctr; Corp = Corporation; Service = Svc; Attorney = Atty; Assoc = Associates; Inc = Incorporated; Assn = Association; Co = Company; Mount = Mt; Bros = Brothers; BBQ = Barbeque
  24. Large-Scale Aggregation Technologies: Restaurant = Rstrnt; Restaurant = Restuarant; Hosp = Hospital; Billards = Billiards; Salon = Sln; Buffet = Buffett; Center = Ctr; Apartments = Apts; Boutique = Btq; Jewelers = Jewlers; Cleaners = Clnrs; Market = Mktz; Kragen = O'Reilly?
  25. Kragen = O'Reilly?
  26. Large-Scale Deduping: specialized data compression & folding techniques; eliminate redundant entities: endpoints and authority pages; improves precision & recall; enables real-time dedupe and crosswalks
  27. Shared Foundational Data: commoditization of data; head attributes for people, places, and things decreasing in value; hCard data value driven to zero (visual of local data being identical on thousands of apps); Entertainment: IMDB exposed all their data for noncommercial use (link to site map); yet there are still lots of errors in foundational data, hence the need for a "living" model
  28. LA Neighborhoods: Another Crowdsourcing Example: the LA Times started with 87 neighborhoods based on census tracts; incorporated 650+ user maps; ended with 114 neighborhoods for LA City; added 158 neighborhoods for LA County
  29. Ownership & Rights: LA Neighborhoods: Terms of Service: Creative Commons Attribution-NonCommercial-ShareAlike license; you can share and remix as long as it's for noncommercial uses, attributed to the LA Times, and shared under the same terms
  30. Evolving "Buy" Model: data marketplaces ("iTunes of data?"); data search engines; microformats / Semantic Web markups / other standards; electronic forms of T&Cs
  31. Summary: Road to the Information Singularity: rise in community storage and access; new common schemas and standards; definitive, accountable sources of "open" data; trends toward sharing of foundational data; buy models based on unique data, novel access methods, SLAs, and value-added services
  32. Thank you! Questions... Gil Elbaz, @factual, @gilelbaz
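
Slides 21 through 25 frame aggregation as an equivalence question: after canonicalizing abbreviations and misspellings, do two listings describe the same place? The Python sketch below is an illustrative assumption, not Factual's implementation. Its variant table reuses pairs from slides 23 and 24, while the "St", "E", and "W" entries are hypothetical additions for the slide 21 addresses; token-set Jaccard similarity and per-field scoring are likewise choices made only for illustration. The two records come from slide 21, with name, address, and phone paired by column order (also an assumption).

```python
import re

# Variant -> canonical form. Most pairs come from slides 23-24;
# "st", "e", and "w" are hypothetical additions for the slide 21 addresses.
VARIANTS = {
    "apts": "apartments",
    "ctr": "center",
    "corp": "corporation",
    "svc": "service",
    "atty": "attorney",
    "assoc": "associates",
    "inc": "incorporated",
    "bros": "brothers",
    "bbq": "barbeque",
    "rstrnt": "restaurant",
    "restuarant": "restaurant",
    "hosp": "hospital",
    "clnrs": "cleaners",
    "st": "street",
    "e": "east",
    "w": "west",
}


def canonicalize(text: str) -> str:
    """Lowercase, split on non-alphanumerics, and map known variants."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(VARIANTS.get(t, t) for t in tokens)


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity of the two canonicalized strings."""
    sa, sb = set(canonicalize(a).split()), set(canonicalize(b).split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def digits(phone: str) -> str:
    """Keep only digits so punctuation differences do not matter."""
    return re.sub(r"\D", "", phone)


def compare(a: dict, b: dict) -> dict:
    """Per-field similarity in [0, 1]; deciding '=?' is left to the caller."""
    return {
        "name": jaccard(a["name"], b["name"]),
        "address": jaccard(a["address"], b["address"]),
        "phone": 1.0 if digits(a["phone"]) == digits(b["phone"]) else 0.0,
    }


# The two slide 21 listings (field pairing by column order is assumed).
A = {"name": "Subway Sandwiches", "address": "52 E Court St", "phone": "(513)-241-6699"}
B = {"name": "Subway", "address": "52 West Court St", "phone": "(800)-653-2323"}

if __name__ == "__main__":
    # "Smith Bros Rstrnt" is a hypothetical name used only to show canonicalization.
    print(canonicalize("Smith Bros Rstrnt") == canonicalize("Smith Brothers Restaurant"))  # True
    print(compare(A, B))  # {'name': 0.5, 'address': 0.6, 'phone': 0.0}
```

On the slide 21 pair the sketch reports a partial name overlap (0.5), street addresses that differ only in direction (0.6, E versus West), and different phone numbers, so no single field settles the question. A production matcher would combine such signals with tuned or learned weights, which this sketch deliberately leaves out.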

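Slide 26 lists the goals of large-scale deduping but does not describe the "specialized data compression & folding techniques" themselves. The Python sketch below swaps in a common, generic approach instead: blocking records on a shared key, applying a pairwise match rule only inside each block, and merging matches into clusters with union-find. This is an assumption for illustration, not Factual's method; the records, the ZIP-code blocking key, the small variant table, and the match rule (same canonical name or same phone digits) are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical variant table, echoing a few pairs from slides 23-24.
VARIANTS = {"bros": "brothers", "bbq": "barbeque", "rstrnt": "restaurant"}


def canonical_name(name: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(VARIANTS.get(t, t) for t in tokens)


def phone_digits(phone: str) -> str:
    return re.sub(r"\D", "", phone)


class UnionFind:
    """Disjoint sets over record indices."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def dedupe(records: list[dict]) -> list[list[int]]:
    """Cluster record indices that appear to describe the same entity."""
    # Blocking: only compare records that share a ZIP code (hypothetical key).
    blocks = defaultdict(list)
    for i, r in enumerate(records):
        blocks[r["zip"]].append(i)

    uf = UnionFind(len(records))
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            a, b = records[i], records[j]
            same_name = canonical_name(a["name"]) == canonical_name(b["name"])
            same_phone = phone_digits(a["phone"]) == phone_digits(b["phone"])
            if same_name or same_phone:  # hypothetical match rule
                uf.union(i, j)

    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[uf.find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical records with fictional 555 phone numbers.
RECORDS = [
    {"name": "Smith Bros BBQ", "phone": "(555) 010-0001", "zip": "00001"},
    {"name": "Smith Brothers Barbeque", "phone": "555-010-0002", "zip": "00001"},
    {"name": "Smith Bros. Restaurant", "phone": "555.010.0002", "zip": "00001"},
    {"name": "Smith Brothers Barbeque", "phone": "555-010-0003", "zip": "00002"},
]

if __name__ == "__main__":
    print(dedupe(RECORDS))  # [[0, 1, 2], [3]]
```

Blocking keeps comparisons inside small groups instead of across all pairs, which is what makes this style of deduping feasible at scale. The trade-offs are that duplicates with mismatched blocking keys are never compared (record 3 stays separate despite its matching name), and transitive merging can over-merge: records 0 and 2 join only through record 1.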