HOMEPAGE & SEARCH ENGINE 2008.12.08
Nutch Homepage Search Engine

Source: http://flyingbono.tistory.com/entry/Nutch-%EC%82%AC%EC%9A%A9%ED%95%B4%EB%B3%B4%EA%B8%B0

  • Transcript of "Nutch Homepage Search Engine"

    1. 1. HOMEPAGE & SEARCH ENGINE 2008.12.08
    2. 2. <ul><li>2. About Cloud computing </li></ul><ul><li>3. Application Introduction </li></ul><ul><li>- Nutch </li></ul><ul><li>- Google App Engine </li></ul><ul><li>4. Presentation </li></ul>Contents
    3. 3. 2. ABOUT CLOUD COMPUTING
    4. 4. <ul><li>Cloud computing is Internet-based (&quot;cloud&quot;) development and use of computer technology (&quot;computing&quot;). </li></ul><ul><li>Cloud computing is a general concept that incorporates software as a service (SaaS), Web 2.0 and other recent, well-known technology trends, in which the common theme is reliance on the Internet for satisfying the computing needs of the users. </li></ul>2. What is Cloud computing?
    5. 5. 3. APPLICATION INTRODUCTION
    6. 6. <ul><li>Open-source web-search software based on Lucene </li></ul><ul><li>Originally a sub-project of the Apache Lucene project </li></ul><ul><li>Its goal is to make Lucene easier to use </li></ul><ul><li>Lucene Java: </li></ul><ul><ul><li>Apache's very well-known open-source search engine library </li></ul></ul>3-1. What is ‘Nutch’?
    7. 7. <ul><li>Transparency. </li></ul><ul><ul><li>Nutch is open source, so anyone can see how the ranking algorithms work. </li></ul></ul><ul><li>Understanding. </li></ul><ul><ul><li>Nutch has been built using ideas from academia and industry. </li></ul></ul><ul><ul><ul><li>For instance, core parts of Nutch are currently being re-implemented to use the MapReduce distributed processing model. </li></ul></ul></ul><ul><ul><li>Nutch is attractive for researchers who want to try out new search algorithms, since it is so easy to extend. </li></ul></ul>3-1. What is Nutch?
    8. 8. <ul><li>Extensibility. </li></ul><ul><ul><li>Nutch is very flexible. </li></ul></ul><ul><ul><li>It can be customized and incorporated into your application. </li></ul></ul><ul><ul><li>For developers, Nutch is a great platform for adding search to heterogeneous collections of information, customizing the search interface, or extending the out-of-the-box functionality through the plugin mechanism. </li></ul></ul>3-1. What is Nutch?
    9. 9. <ul><li>Nutch divides naturally into two pieces: </li></ul><ul><ul><li>the crawler </li></ul></ul><ul><ul><li>the searcher </li></ul></ul><ul><li>Crawl </li></ul><ul><ul><li>Collects pages </li></ul></ul><ul><ul><li>Builds an index of those pages </li></ul></ul><ul><ul><ul><li>The index serves as the bridge between Crawl and Search </li></ul></ul></ul><ul><li>Search </li></ul><ul><ul><li>Finds and displays the information needed to answer a user's request </li></ul></ul>3-1. What is Nutch?
    10. 10. <ul><li>More detail about crawler </li></ul><ul><ul><li>the Nutch crawler system produces three key data structures: </li></ul></ul><ul><ul><ul><li>The WebDB containing the web graph of pages and links. </li></ul></ul></ul><ul><ul><ul><li>A set of segments containing the raw data retrieved from the Web by the fetchers. </li></ul></ul></ul><ul><ul><ul><li>The merged index created by indexing and de-duplicating parsed data from the segments. </li></ul></ul></ul>3-1. What is Nutch?
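The third data structure above, a merged index built by indexing and de-duplicating parsed segment data, can be illustrated with a small sketch. This is not Nutch's implementation: the segment layout (a dict of URL to parsed text), the `build_index` name, and the use of a content hash for de-duplication are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def build_index(segments):
    """Merge parsed pages from every segment into one inverted index.

    `segments` is a hypothetical structure: a list of dicts mapping
    URL -> parsed text. Pages whose text duplicates an already indexed
    page are skipped (a stand-in for de-duplication).
    """
    seen_digests = set()
    index = defaultdict(set)  # term -> set of URLs containing it
    for segment in segments:
        for url, text in segment.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen_digests:
                continue  # identical content was already indexed
            seen_digests.add(digest)
            for term in text.lower().split():
                index[term].add(url)
    return index


# Two toy segments; the second contains a copy of a page from the first.
segments = [
    {"http://a.example/": "nutch crawler"},
    {"http://b.example/": "nutch searcher",
     "http://a.example/copy": "nutch crawler"},
]
index = build_index(segments)
print(sorted(index["nutch"]))
```

The duplicate page never reaches the index, so a search for a term returns each distinct document once.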
    11. 11. <ul><li>More detail about searcher </li></ul><ul><ul><li>The Nutch searcher looks for its index and segment data in the index and segments subdirectories of the directory defined in the searcher.dir property. </li></ul></ul><ul><ul><li>The default value for searcher.dir is the current directory (.), which is the directory from which you started Tomcat. </li></ul></ul>3-1. What is Nutch?
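To make the lookup rule concrete, the sketch below resolves the two subdirectories relative to a `searcher.dir` value. The helper name `searcher_paths` is hypothetical, and the code only mirrors the rule described on the slide; it does not read Nutch's configuration files.

```python
import os


def searcher_paths(searcher_dir="."):
    """Return the index and segments locations under searcher.dir.

    The default "." means the current working directory, which is why
    the directory Tomcat is started from matters.
    """
    base = os.path.abspath(searcher_dir)
    return {
        "index": os.path.join(base, "index"),
        "segments": os.path.join(base, "segments"),
    }


# With the default value, the paths depend on where the process was started.
for name, path in searcher_paths().items():
    print(name, "->", path)
```

If Tomcat is started from a directory that has no crawl output, the default resolves to paths that do not exist and searches return nothing.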
    12. 12. <ul><li>Generate a list of URLs from the crawl db. </li></ul><ul><li>Fetch the URLs on that list into a segment. </li></ul><ul><li>Parse the contents fetched into the segment. </li></ul><ul><li>Update the crawl db with the parsed data from the segment. </li></ul><ul><li>Invert the links found in the segments. </li></ul><ul><li>Build an index over the segment text and the anchor text. </li></ul><ul><ul><li>Repeat these steps </li></ul></ul>3-1. What is Nutch?
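The cycle above can be sketched as a loop over a toy in-memory web. Everything here is an assumption made for illustration: `TOY_WEB`, the `crawl` function, and the status strings are hypothetical and are not Nutch's data formats. The final indexing step is left out to keep the sketch short.

```python
# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://site.example/": ("home page",
                             ["http://site.example/a", "http://site.example/b"]),
    "http://site.example/a": ("page a", ["http://site.example/b"]),
    "http://site.example/b": ("page b", ["http://site.example/"]),
}


def crawl(seed_urls, depth, top_n):
    crawl_db = {url: "unfetched" for url in seed_urls}
    segments = []
    for _ in range(depth):
        # 1. Generate: pick up to top_n unfetched URLs from the crawl db.
        fetch_list = [u for u, s in crawl_db.items() if s == "unfetched"][:top_n]
        if not fetch_list:
            break
        # 2-3. Fetch and parse each URL into a new segment.
        segment = {}
        for url in fetch_list:
            text, links = TOY_WEB.get(url, ("", []))
            segment[url] = {"text": text, "outlinks": links}
        segments.append(segment)
        # 4. Update the crawl db with fetch status and newly found links.
        for url, parsed in segment.items():
            crawl_db[url] = "fetched"
            for link in parsed["outlinks"]:
                crawl_db.setdefault(link, "unfetched")
    # 5. Invert links: for every page, record which pages link to it.
    inlinks = {}
    for segment in segments:
        for url, parsed in segment.items():
            for link in parsed["outlinks"]:
                inlinks.setdefault(link, set()).add(url)
    return crawl_db, segments, inlinks


crawl_db, segments, inlinks = crawl(["http://site.example/"], depth=3, top_n=10)
print(len(segments), "segments,", len(crawl_db), "known URLs")
```

Each pass of the loop produces one segment, which is why the depth setting bounds how far from the seed URLs the crawl can reach.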
    13. 13. <ul><li>How to run Nutch </li></ul><ul><ul><li>Start crawling from the directory where Nutch is installed </li></ul></ul><ul><ul><li>>> bin/nutch crawl urls -dir crawl -depth 3 -topN 10 </li></ul></ul><ul><ul><li>Start Tomcat 5.5 </li></ul></ul><ul><ul><ul><li>Note: Tomcat must be started from the Nutch directory </li></ul></ul></ul><ul><ul><li>>> /opt/apache-tomcat-5.5.27/bin/catalina.sh start </li></ul></ul><ul><ul><li>http://localhost:8080/en/ </li></ul></ul>3-1. What is Nutch?
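For scripted runs, the crawl invocation can be assembled as an argument list. The sketch below follows the argument order of the Nutch 0.9 crawl tool (seed URL directory first, then `-dir` for the output directory) and uses the slide's values for depth and topN. The helper name `nutch_crawl_command` is hypothetical, and the code only builds the command; it does not run Nutch.

```python
def nutch_crawl_command(url_dir="urls", crawl_dir="crawl", depth=3, top_n=10):
    """Build the argument list for a one-shot Nutch crawl.

    The path bin/nutch is relative, so the command is meant to be run
    from the Nutch installation directory.
    """
    return [
        "bin/nutch", "crawl", url_dir,
        "-dir", crawl_dir,
        "-depth", str(depth),
        "-topN", str(top_n),
    ]


command = nutch_crawl_command()
print(" ".join(command))
```

The list form can be passed to `subprocess.run(command, cwd=...)`, with `cwd` set to the Nutch installation directory.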
    14. 14. <ul><li>Nutch 0.9 from the Apache Nutch homepage </li></ul><ul><li>Java JDK 6 </li></ul><ul><li>Tomcat 5.5 or later </li></ul><ul><li>OS : Linux server edition </li></ul><ul><li> Cygwin for Windows developers </li></ul>3-2. Development environment of Nutch
    15. 15. <ul><li>Google's cloud computing project </li></ul><ul><li>Google's web application platform </li></ul><ul><ul><li>Easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow </li></ul></ul><ul><ul><li>With App Engine there are no servers to maintain: just upload your application, and it is ready to serve your users. </li></ul></ul>3-3. What is ‘Google App Engine’?
    16. 16. <ul><li>Features provided by Google App Engine </li></ul><ul><ul><li>The standard functionality of Python </li></ul></ul><ul><ul><ul><li>Because it is built on Python </li></ul></ul></ul><ul><ul><li>A robust Datastore backed by BigTable/GFS technology </li></ul></ul><ul><ul><ul><li>Google's own database, filling the role of existing databases such as Oracle and MySQL </li></ul></ul></ul><ul><ul><li>Scalable hosting </li></ul></ul><ul><ul><li>Free ‘Google’ account </li></ul></ul><ul><ul><li>Local development and testing with the SDK </li></ul></ul>3-3. What is ‘Google App Engine’?
    17. 17. <ul><li>Google’s motto: </li></ul><ul><ul><li>“Web development that doesn’t hurt” </li></ul></ul><ul><ul><li>With Google App Engine, web service developers gain the option of developing without that extra pain. </li></ul></ul><ul><ul><ul><li>Google App Engine provides load balancing, automatic scaling, and dynamic web serving, so developers can concentrate on building the application. </li></ul></ul></ul><ul><ul><li>However, this choice comes with three constraints. </li></ul></ul><ul><ul><ul><li>1. All code must be written in Python. </li></ul></ul></ul><ul><ul><ul><ul><li>Perl support is currently under development </li></ul></ul></ul></ul><ul><ul><ul><li>2. Usage limits mean that you may end up paying. </li></ul></ul></ul><ul><ul><ul><ul><li>Free usage quota </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>500MB of persistent storage and enough CPU and bandwidth for about 5 million page views a month </li></ul></ul></ul></ul></ul><ul><ul><ul><li>3. All data lives on Google's platform and is held by Google. An application tied to the Google platform will therefore not be easy to move off it. </li></ul></ul></ul><ul><ul><ul><ul><li>The third constraint is the most serious drawback of Google App Engine </li></ul></ul></ul></ul>3-3. What is ‘Google App Engine’?
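To put the free quota in perspective, the short calculation below converts "about 5 million page views a month" into an average request rate. The 30-day month and the assumption of evenly spread traffic are simplifications made here, not figures from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_request_rate(page_views_per_month, days_per_month=30):
    """Average page views per second, assuming evenly spread traffic."""
    return page_views_per_month / (days_per_month * SECONDS_PER_DAY)


rate = average_request_rate(5_000_000)
print(f"{rate:.2f} page views per second on average")
```

Real traffic is bursty, so peak load would be higher than this average.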
    18. 18. <ul><li>How to run Google App Engine </li></ul><ul><ul><li>Change to the directory where Google App Engine is installed </li></ul></ul><ul><ul><li>Google App Engine commands </li></ul></ul><ul><ul><ul><li>dev_appserver.py bono/ : for local testing </li></ul></ul></ul><ul><ul><ul><li>appcfg.py update bono/ : uploads the application to the web </li></ul></ul></ul><ul><ul><ul><ul><li>You enter your ID and password on every upload </li></ul></ul></ul></ul><ul><ul><li>Check the result </li></ul></ul><ul><ul><ul><li>http://localhost:8080/ </li></ul></ul></ul><ul><ul><ul><li>http://flyingbono.appspot.com </li></ul></ul></ul>3-3. What is Google App Engine?
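The slide does not show what is inside the `bono/` application directory. As a rough illustration of the kind of Python code such a directory might hold, the sketch below defines a standard WSGI callable and exercises it locally with the standard library. Treating the application as a plain WSGI callable is an assumption, and the SDK configuration that maps requests to it is not shown.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI application that returns a plain-text greeting."""
    body = b"Hello from a minimal WSGI app"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Invoke a WSGI app once with a synthetic request; no server needed."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a default GET request for "/"
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body.decode("utf-8"))
```

Calling the application directly like this checks the handler logic before any local server or upload step is involved.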
    19. 19. <ul><li>Google App Engine, using the App Engine software development kit (SDK) </li></ul><ul><li>Python 2.5 </li></ul><ul><ul><li>On Windows, you need to install Python yourself (for example, ActivePython) </li></ul></ul><ul><li>OS : Windows </li></ul><ul><li> Mac OS X </li></ul><ul><li> Linux </li></ul>3-4. The Development Environment
    20. 20. 4. PRESENTATION
    21. 21. <ul><li>Nutch </li></ul><ul><li>Google App Engine + Nutch </li></ul><ul><li>Another example of using Google App Engine </li></ul>4. Presentation