AWS Customer Presentation - Zanran and AWS

Jon Goldhill from Zanran talks about running a search engine on AWS

Published in: Technology, Business

  1. Zanran - a tale
  2. people
     Jon Goldhill
     Yves Dassas
     MBA from London Business School
     Started telecom information business in 1987
     PhD in electrochemistry
     Started voice processing business in 1995
     Built international telephony business together
  3. (no text)
  4. (no text)
  5. (no text)
  6. "..While in early beta, this is a pretty exciting place for a data junkie."
     www.clearhonestdata.com 12 May 2011
     "Is Zanran Any Good? Short answer: For some queries, yes, Zanran is quite good. Almost scarily so, actually."
     http://SearchEngineLand.com 12 May 2011
     "I don't usually post on non-patent or other IP matters, but I'm making an exception for a valuable search engine that should be used when it's data rather than words that you are looking for...."
     Steve van Dulken's blog on Patents and IP, 13 Aug 2011
  7. How did we get here?
  8. Image classification (filtering)
  9. Difficult!
  10. Started fundraising, January 2008
      Gave up on fundraising, May 2008
      But... introduction to AWS user
  11. in office / in datacentre / Amazon cloud
      • Easy to maintain
      • Very limited in scale
      • Familiar
      • Expensive – machines and space
      • Committing
      • Cheap to experiment
      • Scaleable
      • Avoid purchase errors
      • but: unfamiliar, Linux
  12. Zanran front end – what the users interact with
      Webserver
      Solr
      EC2: High-Memory Extra Large Instance
      users
      Storage on S3
  13. Zanran back end – batch processing
      Crawl the internet
      Stage 1, Stage 2, Stage 3, Stage 4
      new PDF, Excel, etc
      Amazon RDS
      Image processing: Is this a graph?
      Text extraction: Find a title + other useful text
      index
      Solr
      2
  14. scale
      Crawling: 10 small instances
      Image processing: 300 small instances
      Text extraction: 20 small instances
      Re-indexing: 1 extra large instance
  15. reliability
      Solr: 6 months
      RDS: 19 months
      S3: 100m+ files stored
  16. Benefits from using Amazon
      scaleability – from 3 to 303 servers
      scaleability – from 7 to 17GB RAM
      flexibility – Solr development servers
      'ecosystem' – RightScale, forums
      lower capital and operations costs
  17. office dog
      not present today
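
The back-end slide names four batch steps: crawl, image processing ("Is this a graph?"), text extraction ("Find a title + other useful text") and indexing. As a rough illustration only, the flow might be sketched as below. Everything here is an assumption: the `Image` structure, the function names, the mapping of steps to stages 1 to 4, and the in-memory dict standing in for the Solr index are all hypothetical, and the graph test is a placeholder flag rather than a real classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Image:
    """An image pulled out of a crawled document (hypothetical structure)."""
    doc_url: str
    looks_like_graph: bool  # placeholder for a real classifier's decision
    nearby_text: List[str] = field(default_factory=list)


def crawl(docs: Iterable[dict]) -> List[Image]:
    """Stage 1 (assumed): collect images from newly found PDF/Excel documents."""
    images = []
    for doc in docs:
        for img in doc["images"]:
            images.append(Image(doc["url"], img["graph"], img["text"]))
    return images


def filter_graphs(images: List[Image]) -> List[Image]:
    """Stage 2 (assumed): keep only the images judged to be graphs."""
    return [img for img in images if img.looks_like_graph]


def extract_title(img: Image) -> str:
    """Stage 3 (assumed): use the first non-empty nearby text line as the title."""
    for line in img.nearby_text:
        if line.strip():
            return line.strip()
    return "(untitled)"


def build_index(images: List[Image]) -> Dict[str, List[str]]:
    """Stage 4 (assumed): map each title to the documents it appears in.

    A plain dict stands in for the search index here.
    """
    index: Dict[str, List[str]] = {}
    for img in images:
        index.setdefault(extract_title(img), []).append(img.doc_url)
    return index


def run_pipeline(docs: Iterable[dict]) -> Dict[str, List[str]]:
    """Run the four stages in order on a batch of documents."""
    return build_index(filter_graphs(crawl(docs)))
```

Because each stage only consumes the previous stage's output, each one could be run as its own batch job on its own pool of machines, which fits the separate instance counts per task on the scale slide.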
