Jubatus Presentation on R&D forum 2011

Published in: Technology, Business

  1. Big Data: Web, SNS, system logs, voice data, images/video, sensor data… Growth rate is 45%/year.
     ◦ Increase of “unstructured data” such as sensor data.
     [Chart: total data volume grows 45%/year, from 0.8 ZB in 2009 to 35 ZB in 2020. Structured data: customer data, business data. Unstructured data: sensor data, images/video, SNS. Annotations: 5 billion phones; processed data: 100 TB/day; uploaded videos: 60,000/week; 8,000 tweets/sec.]
  2. Hadoop: a de facto distributed computing framework for Big Data, but not suitable for realtime processing and in-depth analysis.
     [Diagram labels: Batch Processing vs. Realtime Processing; Simple Statistics vs. In-depth Analysis; Big data.]
  3. [Diagram labels: Batch application vs. Realtime application; Simple Analysis (statistics) vs. In-depth Analysis (classification, estimation, prediction); Big Data: Realtime (online) vs. Batch (stored); Jubatus.]
  4. Requirements: “Scalability,” “Realtime processing,” and “In-depth analysis.” Joint development with Preferred Infrastructure.
     [Diagram labels: In-depth Analysis; SVMlight; Scalability.]
     References:
     • Hadoop -> http://hadoop.apache.org/
     • Mahout -> http://mahout.apache.org/
     • WEKA -> http://weka-jp.info/
     • SVMlight -> http://svmlight.joachims.org/
     • Yahoo! S4 -> http://s4.io/
     • Twitter Storm -> http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
     • CEP -> Complex Event Processing
  5. 【Big Data】Big stream ⇒ worldwide: 8,000 tweets/sec; Japanese: 500~2,000 tweets/sec.
     【Realtime processing】Recognition of “good”/“bad” news by learning ⇒ following up bursty tweets.
     【In-depth analysis】Automatic classification of “tweets related to topics of interest (keyword).”
     [Diagram: Twitter (【Big Data】tweets; worldwide: 8,000 tweets/sec; Japanese: 2,000 tweets/sec) -> realtime analysis by Jubatus -> client application, keyword: NTT.]
     ◦ Monitoring for NTT-related tweets.
     ◦ Results do not necessarily contain the string “NTT.”
     【Realtime】【In-depth analysis】Automatic realtime classification of tweets that are highly related to the issue of concern (keyword).
  6. Realtime recommendation for e-commerce sites / on-demand TV
     ・Conventional batch processing: recommended items stay fixed for a certain period.
     ・Jubatus: instant recognition of sudden changes in buying trends.
     [Chart: recommendation accuracy over time, comparing real behavior, Jubatus, and batch processing, based on customers' buying history. Events marked: sudden order increase after the death of a celebrity; sudden order increase after a TV exposé.]
     Realtime recommendation by Jubatus: recommended items are updated in realtime by relating them to trends in other customers' buying histories.
  7. 2-3 machines for the current Twitter stream.
     【Big Data】Tweets worldwide: 8,000 tweets/sec.
     [Diagram: Twitter -> company category: Company A, Company B, Company C, Company D, ...]
     【Realtime】&【In-depth analysis】Realtime automatic company classification for “tweets.”
  8. 【Big Data】&【Realtime processing】100,000/sec update throughput per server.
     [Diagram: buying/search queries fill a user-item matrix (users A, B, ..., Y by items 1, 2, 3, ..., X; ○ marks a buy or search), from which the recommended item is derived.]
     【Big Data】&【In-depth analysis】Response time: 0.1 sec for 30 million users (10x faster than Mahout).
  9. Jubatus OSS website
     ◦ http://jubat.us
     ◦ 2nd edition will be released on 17th Feb.
     Features
     ◦ 1st ed.: Linear classification
     ◦ 2nd ed.: Regression, Statistics, Recommendation
     OSS community
     ◦ Web: http://jubat.us
     ◦ GitHub: https://github.com/jubatus/jubatus
     ◦ Twitter: @JubatusOfficial
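
The tweet classification use case on slides 5 and 7 relies on a classifier that learns from a stream one example at a time, rather than retraining in batch. The slides do not show Jubatus's algorithms or API, so the following is only a minimal sketch of the general idea, using a plain multiclass perceptron over bag-of-words features. The class name `OnlinePerceptron`, the sample texts, and the labels are all hypothetical.

```python
class OnlinePerceptron:
    """Multiclass perceptron trained one example at a time (illustrative only)."""

    def __init__(self):
        self.weights = {}    # label -> {token -> weight}
        self.labels = set()  # labels seen so far

    @staticmethod
    def features(text):
        # Bag-of-words features: lowercase whitespace tokens.
        return text.lower().split()

    def score(self, label, tokens):
        w = self.weights.get(label, {})
        return sum(w.get(t, 0.0) for t in tokens)

    def predict(self, text):
        if not self.labels:
            return None
        tokens = self.features(text)
        # sorted() makes tie-breaking deterministic.
        return max(sorted(self.labels), key=lambda lb: self.score(lb, tokens))

    def train(self, text, label):
        """Update the model in place from a single labelled example."""
        tokens = self.features(text)
        predicted = self.predict(text)
        self.labels.add(label)
        if predicted == label:
            return  # perceptron rule: only update on mistakes
        good = self.weights.setdefault(label, {})
        for t in tokens:
            good[t] = good.get(t, 0.0) + 1.0
        if predicted is not None:
            bad = self.weights.setdefault(predicted, {})
            for t in tokens:
                bad[t] = bad.get(t, 0.0) - 1.0


# Hypothetical labelled stream; in the slides' scenario these would be tweets.
stream = [
    ("fiber network upgrade announced", "CompanyA"),
    ("phone sales rise sharply", "CompanyB"),
    ("fiber outage reported today", "CompanyA"),
    ("new phone sales record", "CompanyB"),
]

model = OnlinePerceptron()
for text, label in stream:
    model.train(text, label)  # the model is usable after every single update

print(model.predict("fiber network down"))
print(model.predict("phone sales dropped"))
```

Because each update touches only the tokens of one message, the model can be queried between any two messages, which is the property the slides contrast with batch processing.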
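
The realtime recommendation described on slides 6 and 8 can be pictured as a user-item matrix that is updated per buying or search event and queried immediately afterwards. The sketch below is an assumption-laden illustration, not Jubatus's method: it scores candidate items by the Jaccard similarity between users' item sets, scans all users linearly (so it says nothing about the throughput figures on slide 8), and uses the hypothetical names `StreamingRecommender`, `update`, and `recommend`.

```python
from collections import defaultdict


class StreamingRecommender:
    """User-based recommender updated per event (illustrative only)."""

    def __init__(self):
        self.items_by_user = defaultdict(set)  # user -> set of items

    def update(self, user, item):
        # One buying/search event marks one cell of the user-item matrix.
        self.items_by_user[user].add(item)

    def recommend(self, user, top_n=3):
        own = self.items_by_user.get(user, set())
        scores = defaultdict(float)
        for other, items in self.items_by_user.items():
            if other == user:
                continue
            union = own | items
            if not union:
                continue
            similarity = len(own & items) / len(union)  # Jaccard similarity
            if similarity == 0:
                continue
            # Items the similar user has that this user lacks are candidates.
            for item in items - own:
                scores[item] += similarity
        # Highest score first; item name breaks ties deterministically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:top_n]]


rec = StreamingRecommender()
events = [
    ("UserA", "Item1"), ("UserA", "Item2"),
    ("UserB", "Item2"), ("UserB", "Item3"),
    ("UserY", "Item1"),
]
for user, item in events:
    rec.update(user, item)

before = rec.recommend("UserY")  # based on UserY having only Item1
rec.update("UserY", "Item2")     # a new event arrives
after = rec.recommend("UserY")   # the recommendation reflects it immediately

print(before, after)
```

The point of the example is the last three statements: a single new event changes the recommendation on the next query, with no batch recomputation in between.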
