Jubatus Presentation on R&D forum 2011

  1. Big Data: Web, SNS, system logs, voice data, images/video, sensor data… ◦ Growth rate is 45%/year ◦ Increase of “unstructured data” such as sensor data. (Chart: total data grows from 0.8 ZB in 2009 to 35 ZB in 2020, 45% growth/year. Structured data: customer data, business data. Unstructured data: sensor data (5 billion phones), images/video (uploaded videos: 60,000/week), SNS (8,000 tweets/sec). Also shown: processed data, 100 TB/day.)
  2. Hadoop: the de facto distributed computing framework for Big Data ◦ But it is not suitable for realtime processing or in-depth analysis. (Diagram: Hadoop covers batch processing and simple statistics on Big Data, not realtime processing or in-depth analysis.)
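  3. (Diagram: Big Data applications classified along two axes. Processing: batch (stored data) for batch applications vs. realtime (online) for realtime applications. Analysis: simple analysis (statistics) vs. in-depth analysis (classification, estimation, prediction). Jubatus sits in the realtime, in-depth analysis quadrant.)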
  4. Requirements: “scalability,” “realtime processing,” and “in-depth analysis” ◦ Jointly developed with Preferred Infrastructure. (Diagram: existing tools positioned against the three requirements; surviving labels: In-depth Analysis, Scalability, SVMlight.) References: • Hadoop: http://hadoop.apache.org/ • Mahout: http://mahout.apache.org/ • WEKA: http://weka-jp.info/ • SVMlight: http://svmlight.joachims.org/ • Yahoo! S4: http://s4.io/ • Twitter Storm: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html • CEP: Complex Event Processing
  5. 【Big Data】Big stream: 8,000 tweets/sec worldwide, 500-2,000 tweets/sec in Japanese ◦ 【Realtime processing】Learning to recognize “good”/“bad” news, following bursts of tweets as they happen ◦ 【In-depth analysis】Automatic classification of tweets related to topics of interest (keywords). Example (diagram): the Twitter stream (worldwide: 8,000 tweets/sec; Japanese: 2,000 tweets/sec) is analyzed in realtime by Jubatus, and the results go to a client application. With the keyword “NTT”, Jubatus monitors NTT-related tweets; a tweet does not need to contain “NTT” to appear in the results. 【Realtime】【In-depth analysis】Automatic realtime classification of tweets highly related to the issue (keyword) of concern.
  6. Realtime recommendation for e-commerce sites / on-demand TV ・Conventional batch processing: the recommended item stays the same for a certain period ・Jubatus: instant recognition of sudden changes in buying trends. (Chart: recommendation accuracy over time, comparing customers' real buying behavior, batch processing, and realtime recommendation by Jubatus; marked events include a sudden order increase after the death of a celebrity and a sudden order increase after an item is featured on TV.) Recommended items are updated in realtime by relating them to trends in other customers' buying histories.
  7. 【Big Data】Twitter stream (worldwide: 8,000 tweets/sec), currently handled by 2-3 machines ◦ 【Realtime】&【In-depth analysis】Realtime automatic classification of tweets by company (diagram: each tweet is assigned a category such as Company A, Company B, Company C, Company D, …).
  8. 【Big Data】&【Realtime processing】Update throughput of 100,000 updates/sec per server (buying/search queries) ◦ 【Big Data】&【In-depth analysis】Response time of 0.1 sec for 30 million users (10x faster than Mahout). (Diagram: a user × item matrix built from buying/search queries, with ○ marking the items 1, 2, 3, …, X that users A, B, …, Y have bought or searched for; a recommended item is derived from it.)
  9. Jubatus OSS website: http://jubat.us ◦ The 2nd edition will be released on 17 February. Features: 1st edition, linear classification; 2nd edition, regression, statistics, and recommendation. OSS community ◦ Web: http://jubat.us ◦ GitHub: https://github.com/jubatus/jubatus ◦ Twitter: @JubatusOfficial
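The Big Data slide (item 1) gives two endpoints, 0.8 ZB in 2009 and 35 ZB in 2020, alongside a 45%/year growth rate. As a quick arithmetic check, and assuming steady compound growth (our assumption, not stated on the slide), the implied annual rate and the 2020 volume that exactly 45%/year would produce can be computed like this:

```python
# Endpoints as printed on slide item 1 (zettabytes); steady compounding is an assumption.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

# Compound annual growth rate implied by the two endpoints.
implied_rate = (end_zb / start_zb) ** (1 / years) - 1

# Volume reached by 2020 if 0.8 ZB grew at exactly 45% per year.
projected_zb = start_zb * 1.45 ** years

print(f"implied growth: {implied_rate:.1%}/year")
print(f"0.8 ZB at 45%/year for {years} years: {projected_zb:.1f} ZB")
```

The endpoints imply roughly 41% per year, close to the quoted 45%; compounding at exactly 45% for 11 years would reach roughly 48 ZB.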
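The tweet-monitoring slide (item 5) and the company-classification slide (item 7) both describe classifying a stream of tweets in realtime, learning as labeled examples arrive. The slides do not say which learning algorithm Jubatus uses, so the sketch below is only an illustration of the general idea: a multi-class passive-aggressive (PA-I) linear classifier, a standard online method, updated one tweet at a time. The bag-of-words `features` function, the class name `OnlinePAClassifier`, and the example tweets are all hypothetical.

```python
from collections import defaultdict


def features(text):
    """Bag-of-words features: lowercased whitespace tokens -> counts (a simplifying assumption)."""
    vec = defaultdict(float)
    for token in text.lower().split():
        vec[token] += 1.0
    return dict(vec)


class OnlinePAClassifier:
    """Multi-class passive-aggressive (PA-I) linear classifier, trained one example at a time."""

    def __init__(self, c=1.0):
        self.c = c  # cap on the step size of each update (PA-I)
        self.weights = {}  # label -> {feature: weight}

    def _score(self, label, x):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, text):
        """Return the highest-scoring label, or None before any training."""
        if not self.weights:
            return None
        x = features(text)
        return max(self.weights, key=lambda label: self._score(label, x))

    def train(self, text, label):
        """Update the model with a single labeled example from the stream."""
        x = features(text)
        self.weights.setdefault(label, {})
        rivals = [other for other in self.weights if other != label]
        if not x or not rivals:
            return  # nothing to separate yet
        rival = max(rivals, key=lambda other: self._score(other, x))
        margin = self._score(label, x) - self._score(rival, x)
        loss = max(0.0, 1.0 - margin)
        if loss == 0.0:
            return  # already correct with enough margin: stay passive
        # The joint feature difference (x for label, -x for rival) has squared norm 2 * ||x||^2.
        tau = min(self.c, loss / (2.0 * sum(v * v for v in x.values())))
        w_true, w_rival = self.weights[label], self.weights[rival]
        for f, v in x.items():
            w_true[f] = w_true.get(f, 0.0) + tau * v
            w_rival[f] = w_rival.get(f, 0.0) - tau * v


# Usage: labels can be related/unrelated (item 5) or company names (item 7).
clf = OnlinePAClassifier()
for text, label in [
    ("NTT announces new data center", "related"),
    ("weather is nice today", "unrelated"),
    ("NTT shares rise after earnings", "related"),
    ("lunch was nice", "unrelated"),
]:
    clf.train(text, label)
print(clf.predict("NTT earnings"))
```

Because each update touches only the features of one tweet, the model can keep learning while the stream flows, which is the property the slides contrast with Hadoop's batch processing.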
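The recommendation slide (item 8) shows a user × item matrix, with ○ marking the items each user bought or searched for, from which a recommended item is derived. A minimal sketch of one common way to do this, assuming user-based nearest neighbors with Jaccard similarity (our choice; the slides do not name the method), treats each user's row as a set of items. The function names and data are hypothetical. This brute-force scan is linear in the number of users, so it does not reflect how the slide's 0.1 sec response for 30 million users is achieved.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap of two users' item sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, history, k=2, n=3):
    """Suggest up to n unseen items for `target` from its k most similar users."""
    mine = history[target]
    others = [user for user in history if user != target]
    neighbors = sorted(others, key=lambda user: jaccard(mine, history[user]), reverse=True)[:k]
    scores = defaultdict(float)
    for user in neighbors:
        sim = jaccard(mine, history[user])
        for item in history[user] - mine:
            scores[item] += sim  # items liked by closer neighbors score higher
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Each set is one row of the ○ matrix: the items a user bought or searched for.
history = {
    "A": {"item1", "item2", "item3"},
    "B": {"item1", "item3", "itemX"},
    "Y": {"item2", "item3"},
}
print(recommend("Y", history))
history["Y"].add("item1")  # a new query arrives: the row is updated in place
print(recommend("Y", history))
```

Since the matrix is just a dictionary of sets, each incoming buying or search query is a constant-time update, and the next recommendation immediately reflects it.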
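The OSS slide (item 9) lists statistics among the 2nd-edition features but does not say which statistics or how they are computed. For a realtime setting, a standard technique is Welford's online algorithm, which maintains a running mean and variance in constant memory without storing the stream. The sketch below illustrates that technique only; it is not taken from Jubatus, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Welford's online mean/variance: O(1) memory, one pass over the stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self):
        """Population variance of the values seen so far (0.0 if empty)."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.push(value)
print(stats.mean, stats.variance())
```

Compared with the textbook formula E[x²] - E[x]², Welford's update avoids subtracting two large, nearly equal numbers, which keeps it numerically stable over long streams.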