Ase architecture


  1. Dongfei@ginio, February 9, 2009
  2. • Cool artist page
  3. • Highlight the query so results are easier to spot
  4. • Aggregate identical download links and show them on the new download page
  5. • Spell checking
  6. • Add a CMS to help correct the information
  7. • Integrated query: you can search by song name + artist, e.g. "because of you ne-yo"
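A minimal sketch of how such an integrated query could be sent to Solr, assuming the dismax ("Disjunction Max") query parser is used to spread one free-text query over several fields. The endpoint URL, the field names, and the boosts are all hypothetical; the slides do not give the real schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field boosts (assumptions, not from the slides).
SOLR_SELECT_URL = "http://localhost:8983/solr/select"
QUERY_FIELDS = {"song_name": 3.0, "artist": 2.0, "album": 1.0, "lyric": 0.5}


def build_query_params(user_query, rows=10):
    """Return Solr request parameters for one free-text query over all fields."""
    # qf lists the fields to search, each with an optional ^boost.
    qf = " ".join(f"{field}^{boost}" for field, boost in QUERY_FIELDS.items())
    return {
        "q": user_query,
        "defType": "dismax",  # let Solr match the terms across several fields
        "qf": qf,
        "rows": rows,
        "wt": "json",
    }


def build_query_url(user_query, rows=10):
    """Return the full select URL for the query."""
    return SOLR_SELECT_URL + "?" + urlencode(build_query_params(user_query, rows))


if __name__ == "__main__":
    print(build_query_url("because of you ne-yo"))
```

With this arrangement the user never has to say which word is the artist and which is the song title; the parser scores each document by its best-matching fields.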
  8. • Search lyrics by song name
  9. • Performance
       ◦ Time cost per 1,000 random queries
       [Chart: y-axis 0 to 1200; series "first time" and "second time"; categories "former", "first improvement", "second improvement"]
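A measurement like "time cost per 1000 random queries" could be taken with a small harness such as the sketch below. The `search` callable and the query pool are placeholders, and reading the "first time" and "second time" series as a cold run followed by a repeated run is an assumption; the slides do not say how the numbers were collected.

```python
import random
import time


def time_queries(search, queries, sample_size=1000, seed=0):
    """Run `sample_size` randomly chosen queries and return elapsed seconds."""
    rng = random.Random(seed)  # fixed seed so both runs use the same sample
    sample = [rng.choice(queries) for _ in range(sample_size)]
    start = time.perf_counter()
    for query in sample:
        search(query)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in search function; replace with a real call to the search backend.
    def fake_search(query):
        return query.lower().split()

    pool = ["because of you ne-yo", "yesterday", "hey jude"]
    first = time_queries(fake_search, pool)
    second = time_queries(fake_search, pool)  # same sample, caches now warm
    print(f"first: {first:.4f}s, second: {second:.4f}s")
```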
  10. • Entirely drop the unmaintainable, messy code
      • Closer relevance, integrating song name, artist, album and lyrics
      • Delete some bad links
      • Fix a bug in song-format filtering
      • Truncate the phrases to normalize the records
  11. • Total artists: 10,145
      • Total albums: 935,245
      • Total songs: 1,174,579
      • Total lyrics: 1,729,962
      • Lyrics without mp3id: 4,226
      • Songs with lyrics: 914,136
      • Total links: 1,610,896
  12. • Total artists: 230,182
      • Total albums: 638,929
      • Total songs: 6.5M
      • Time cost: 10 days
  13. • All of the above was finished by one person in 15 days.
      • How were these goals achieved in such a short time?
  14. • Experience
        ◦ Start up quickly, avoid repeating earlier mistakes, stand on the shoulders of giants
      • Passion
        ◦ No holidays. Working 100 hours per week. (One percent inspiration and ninety-nine percent perspiration.)
      • Curiosity
        ◦ Interested in new things; love to read, observe, think and try.
  15. • Reuse role
      • Reuse open source
  16. • Product design
      • Software development
      • User interface
      • Data analysis
      • …
      • Task driven!
  17. • Web choice
        ◦ Nginx (apache, squid) + PHP + FastCGI + Smarty + memcached
      • Index choice
        ◦ Solr + Lucene (ferret, sphinx)
      • Storage choice
        ◦ MySQL, regular file system
      • Language choice
        ◦ Python, PHP, shell
      • CMS
        ◦ Django
  18. • Lucene is a full-text search library
      • Add documents to an index via IndexWriter
        ◦ A document is a collection of fields
        ◦ No config files, dynamic field typing
        ◦ Flexible text analysis: tokenizers, filters
      • Search for documents via IndexSearcher
        Hits = search(Query, Filter, Sort, topN)
      • Scoring: tf * idf * lengthNorm
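The scoring line above can be illustrated with a simplified Python sketch. It assumes the formulas of Lucene's classic default similarity (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1)), lengthNorm as 1 / sqrt(number of terms in the field)) and multiplies them exactly as the slide writes them. Lucene's full formula has further factors (query normalization, coordination, boosts) that are left out here.

```python
import math


def tf(freq):
    """Term-frequency factor: grows with the square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Shorter fields score higher for the same match."""
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_tokens, doc_freqs, num_docs):
    """Sum tf * idf * lengthNorm over the query terms (simplified)."""
    if not doc_tokens:
        return 0.0
    norm = length_norm(len(doc_tokens))
    total = 0.0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq:
            total += tf(freq) * idf(doc_freqs.get(term, 0), num_docs) * norm
    return total


if __name__ == "__main__":
    doc = ["because", "of", "you", "you"]
    print(score(["you"], doc, {"you": 1}, num_docs=2))
```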
  19. • A full-text search server based on Lucene
      • XML/HTTP interfaces
      • Loose schema to define types and fields
      • Web administration interface
      • Extensive caching
      • Index replication
      • Extensible open architecture
      • Written in Java 5, deployable as a WAR
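As an illustration of the XML/HTTP interface, the sketch below builds the `<add><doc><field name="…">` message that Solr's XML update handler accepts. The field names are hypothetical, since the slides do not show the schema, and the HTTP POST itself is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical fields for one song record.
    print(build_add_message([{"id": "1", "song_name": "Because of You"}]))
```

The resulting string would typically be POSTed to the update URL with an XML content type, followed by a `<commit/>` message so the new documents become searchable.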
  20. [Solr architecture diagram]
      • Servlets: HTTP Request Servlet, Update Servlet
      • Interfaces and handlers: Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, XML Response Writer, XML Update Interface
      • Solr Core: Config, Schema, Caching, Update Handler, Analysis, Concurrency
      • Replication
      • Lucene
  21. • The ACM article "Why Writing Your Own Search Engine Is Hard"
      • built as a general-purpose search engine system. It's like a giant Swiss Army knife with a million little blades. I don't understand how Nutch works, and in the time it would have taken me to figure it out I was able to write my own crawler from scratch.
  22. • The best choice for starting an application quickly
      • Django (Google offers a similar framework in GAE)
      • Abundant libraries
  23. • Smarty: http://www.wujianrong.com/archives/2006/07/smartyr-1.html
      • XML parser: yum -y install php-xml
  24. ToDo List, ex: Feb 2009
      1. Learn NLP technology
      2. Master time-management skills
      3. Finish a milestone on time
      …
      • What did I learn yesterday?
      • Have I made a little progress?
      Group VSE:XXX
  25. • Crawl more sites for lyrics and audio files; double the data set
      • Download the artist and album images
      • Improve search relevance
      • Extract from heterogeneous data sources
      • Define a player as a killer application
  26. • Embrace change
      • Find the appropriate position and make some achievements
      • Manage time and work efficiently
      • Progress in product and technology comes from the real progress people have made
      • Every problem has a solution
      • Be self-driven and inspired
  27. END
