Jubatus Presentation on R&D forum 2011

  1. Big Data: Web, SNS, system logs, voice data, images/video, sensor data…
     • Growth rate is 45%/year
       ◦ Increase of “unstructured data” such as sensor data
     [Figure: data volume grows from 0.8 ZB in 2009 to 35 ZB in 2020 (45% growth/year). Structured data: customer data, business data. Unstructured data: sensor data (5 billion phones), images/video, SNS. Annotations: processed data 100 TB/day; uploaded videos 60,000/week; 8,000 tweets/sec.]
  2. Hadoop: a de facto standard distributed computing framework for Big Data
     • But not suitable for realtime processing or in-depth analysis
     [Figure: Hadoop covers batch processing and simple statistics on Big Data, leaving realtime processing and in-depth analysis uncovered.]
  3. [Figure: Big Data applications on two axes, batch (stored) vs. realtime (online) processing and simple analysis (statistics) vs. in-depth analysis (classification, estimation, prediction). Jubatus targets realtime applications with in-depth analysis.]
  4. Requirements: “scalability,” “realtime processing,” and “in-depth analysis”
     • Joint development with Preferred Infrastructure
     [Figure: existing tools, such as SVMlight, positioned by in-depth analysis and scalability.]
     References:
     • Hadoop: http://hadoop.apache.org/
     • Mahout: http://mahout.apache.org/
     • WEKA: http://weka-jp.info/
     • SVMlight: http://svmlight.joachims.org/
     • Yahoo! S4: http://s4.io/
     • Twitter Storm: http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
     • CEP: Complex Event Processing
  5. 【Big Data】 Big stream ⇒ worldwide: 8,000 tweets/sec; Japanese: 500–2,000 tweets/sec
     【Realtime processing】 Recognizing “good”/“bad” news through learning ⇒ keeping up with bursts of tweets
     【In-depth analysis】 Automatic classification of tweets related to topics of interest (keywords)
     [Figure: the Twitter stream (worldwide: 8,000 tweets/sec; Japanese: 2,000 tweets/sec) is analyzed in realtime by Jubatus and fed to a client application that monitors NTT-related tweets for the keyword “NTT”; matching tweets need not contain the string “NTT”.]
     【Realtime】【In-depth analysis】 Automatic realtime classification of tweets closely related to the issue of concern (keyword)
  6. Realtime recommendation for e-commerce sites / on-demand TV
     • Conventional batch processing: recommended items stay fixed for a certain period
     • Jubatus: instant recognition of sudden changes in buying trends
     [Figure: recommendation accuracy over time. Customers' buying history shows sudden order increases, e.g. after the death of a celebrity or after TV exposure; batch processing lags behind real behavior, while realtime recommendation by Jubatus follows it.]
     Recommended items are updated in realtime by relating them to trends in other customers' buying histories.
  7. 【Big Data】 Twitter stream, worldwide: 8,000 tweets/sec (2–3 machines in the current setup)
     [Figure: tweets from Twitter are assigned to categories CompanyA, CompanyB, CompanyC, CompanyD, ...]
     【Realtime】&【In-depth analysis】 Realtime automatic classification of tweets by company
  8. 【Big Data】&【Realtime processing】 Update throughput of 100,000 buying/search queries per second per server
     [Figure: user × item matrix (users A, B, ..., Y; items 1, 2, 3, ..., X) with a mark wherever a user bought or searched for an item; a recommended item for user Y is derived from the matrix.]
     【Big Data】&【In-depth analysis】 Response time of 0.1 sec for 30 million users (10× faster than Mahout)
  9. Jubatus OSS website
     ◦ http://jubat.us
     ◦ The 2nd edition will be released on 17 February
     Features:
       1st ed.: linear classification
       2nd ed.: regression, statistics, recommendation
     OSS community:
       Web: http://jubat.us
       GitHub: https://github.com/jubatus/jubatus
       Twitter: @JubatusOfficial
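
Slide 1 pairs a 45%/year growth rate with two endpoints, 0.8 ZB in 2009 and 35 ZB in 2020. The short Python check below (the helper names are our own) computes the compound annual growth rate those two endpoints imply. It comes out at roughly 41%/year, so the quoted 45%/year is best read as an approximate figure.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def project(start: float, rate: float, years: int) -> float:
    """Volume reached by compounding `start` at `rate` per year for `years` years."""
    return start * (1.0 + rate) ** years


# Endpoints quoted on slide 1: 0.8 ZB in 2009 and 35 ZB in 2020 (11 years apart).
rate = implied_cagr(0.8, 35.0, 2020 - 2009)
print(f"implied growth: {rate:.1%} per year")                                # about 41% per year
print(f"0.8 ZB at 45%/year for 11 years: {project(0.8, 0.45, 11):.1f} ZB")  # about 48 ZB
```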
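
Slide 3 separates batch (stored) from realtime (online) processing. As a minimal illustration on the “simple analysis (statistics)” side, the Python sketch below maintains a running mean and variance with Welford's algorithm, updating after every event in constant memory. A batch version, which needs the whole stored dataset before it can answer, is included for comparison. Welford's algorithm is our choice for illustration; the slides do not say how Jubatus computes statistics.

```python
import math
from typing import Iterable


class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory per update."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one value has arrived.
        return self._m2 / self.n if self.n else 0.0


def batch_stats(values: Iterable[float]) -> tuple[float, float]:
    """Batch counterpart: reads the whole stored dataset before answering."""
    data = list(values)
    mean = sum(data) / len(data)
    var = sum((v - mean) ** 2 for v in data) / len(data)
    return mean, var


stream = [12.0, 15.0, 11.0, 30.0, 14.0]  # hypothetical per-second event counts
stats = RunningStats()
for value in stream:
    stats.update(value)                   # an up-to-date answer after every event
    print(f"n={stats.n} mean={stats.mean:.2f} std={math.sqrt(stats.variance):.2f}")

print(batch_stats(stream))                # same final mean and variance, computed at once
```

The online version never stores the stream, which is what makes it usable on an unbounded feed; the batch version must wait until the data has been collected.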
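
Slide 5 describes learning which tweets relate to a keyword, so that results need not literally contain “NTT”. The Python sketch below shows one way such a classifier can learn from a stream one tweet at a time: a binary linear classifier trained with the PA-I passive-aggressive update (Crammer et al., 2006) over hashed bag-of-words features. The slides only say “linear classification” (slide 9); the choice of PA-I, CRC32 feature hashing, the `DIM` size, and the sample tweets are all assumptions for illustration, not Jubatus's API.

```python
import re
import zlib

DIM = 2 ** 20  # size of the hashed feature space (an assumption for this sketch)


def features(text: str) -> dict[int, float]:
    """Hashed bag of words: lowercase word tokens mapped to indices via CRC32."""
    vec: dict[int, float] = {}
    for token in re.findall(r"\w+", text.lower()):
        idx = zlib.crc32(token.encode("utf-8")) % DIM
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


class PassiveAggressive:
    """Online binary linear classifier using the PA-I update; labels are +1 / -1."""

    def __init__(self, c: float = 1.0) -> None:
        self.c = c                      # caps the step size of a single update
        self.w: dict[int, float] = {}   # sparse weight vector

    def score(self, x: dict[int, float]) -> float:
        return sum(self.w.get(i, 0.0) * v for i, v in x.items())

    def train(self, x: dict[int, float], y: int) -> None:
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return                                # margin already met, or no features
        tau = min(self.c, loss / sq_norm)         # smallest step that fixes the margin, capped at c
        for i, v in x.items():
            self.w[i] = self.w.get(i, 0.0) + tau * y * v

    def classify(self, x: dict[int, float]) -> int:
        return 1 if self.score(x) > 0.0 else -1


# Hypothetical labelled stream: +1 = related to the keyword "NTT", -1 = unrelated.
stream = [
    ("NTT announces new data center", +1),
    ("great ramen in shibuya tonight", -1),
    ("NTT docomo network outage reported", +1),
    ("watching football with friends", -1),
]
clf = PassiveAggressive()
for text, label in stream:          # each tweet is seen once, then discarded
    clf.train(features(text), label)

# The query does not contain "NTT", yet its words were learned as related.
print(clf.classify(features("docomo outage again")))
```

Because the model is a single sparse weight vector updated in place, each tweet costs time proportional to its own number of tokens, which is the property that makes this family of learners attractive for a high-rate stream.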
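
Slide 6 contrasts batch recommendations that stay fixed for a period with instant recognition of sudden changes in buying trends. One simple way to make item popularity react to a burst is exponential time decay, sketched below in Python. This is our own choice of technique, not a description of Jubatus internals; the item names, timings, and `tau` value are hypothetical, and the sketch covers only the trend-reaction part, not relating items to other customers' histories.

```python
import math


class DecayedPopularity:
    """Item scores with exponential time decay, so recent orders dominate.

    An order placed at time t contributes exp(-(now - t) / tau) to its item's
    score, so a sudden burst can overtake a long-standing favourite at once.
    """

    def __init__(self, tau: float) -> None:
        self.tau = tau                      # decay time constant, here in hours
        self.scores: dict[str, float] = {}  # score as of self.last[item]
        self.last: dict[str, float] = {}    # time each score was last refreshed

    def _decayed(self, item: str, now: float) -> float:
        if item not in self.scores:
            return 0.0
        return self.scores[item] * math.exp(-(now - self.last[item]) / self.tau)

    def record(self, item: str, now: float) -> None:
        """Register one order of `item` at time `now` (times must not go backwards)."""
        self.scores[item] = self._decayed(item, now) + 1.0
        self.last[item] = now

    def top(self, k: int, now: float) -> list[str]:
        return sorted(self.scores, key=lambda it: self._decayed(it, now), reverse=True)[:k]


pop = DecayedPopularity(tau=24.0)
for hour in range(0, 240, 2):        # steady seller: one order every 2 hours for 10 days
    pop.record("bestseller", float(hour))
for minute in range(30):             # sudden burst: 30 orders within half an hour
    pop.record("celebrity_album", 239.0 + minute / 60.0)

# A plain cumulative count would still rank "bestseller" first (120 orders vs 30).
print(pop.top(2, now=240.0))         # ['celebrity_album', 'bestseller']
```

Storing only one score and one timestamp per item keeps each update constant-time; the trade-off is that `tau` controls how quickly old behaviour is forgotten.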
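
Slide 7 extends keyword monitoring to assigning each tweet to one of several companies (CompanyA, CompanyB, ...). The Python sketch below does this with an online multiclass perceptron that keeps one sparse weight vector per category. The algorithm is our assumption (the slides do not name one), plain lowercase tokens stand in for a real feature extractor, and the brand words in the training tweets are made up.

```python
import re
from typing import Optional


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class MulticlassPerceptron:
    """Online multiclass perceptron with one sparse weight vector per category."""

    def __init__(self) -> None:
        self.w: dict[str, dict[str, float]] = {}

    def _score(self, label: str, toks: list[str]) -> float:
        weights = self.w[label]
        return sum(weights.get(t, 0.0) for t in toks)

    def predict(self, text: str) -> Optional[str]:
        toks = tokens(text)
        return max(self.w, key=lambda label: self._score(label, toks), default=None)

    def train(self, text: str, label: str) -> None:
        toks = tokens(text)
        self.w.setdefault(label, {})  # new categories can appear at any time
        rivals = [other for other in self.w if other != label]
        rival = max(rivals, key=lambda other: self._score(other, toks), default=None)
        rival_score = self._score(rival, toks) if rival is not None else 0.0
        if self._score(label, toks) <= rival_score:
            # Mistake (or tie): pull the true category towards the tweet and
            # push the strongest competing category away from it.
            for t in toks:
                self.w[label][t] = self.w[label].get(t, 0.0) + 1.0
                if rival is not None:
                    self.w[rival][t] = self.w[rival].get(t, 0.0) - 1.0


clf = MulticlassPerceptron()
# Hypothetical training tweets; "alphaphone", "betabank", "gammacar" are made-up brands.
for text, company in [
    ("alphaphone update released today", "CompanyA"),
    ("betabank raises interest rates", "CompanyB"),
    ("gammacar recall announced", "CompanyC"),
    ("loving my new alphaphone camera", "CompanyA"),
]:
    clf.train(text, company)

print(clf.predict("alphaphone battery issue"))  # CompanyA
```

For scale, if the 2–3 machines on slide 7 share the 8,000 tweets/sec stream evenly, each handles roughly 2,700 to 4,000 tweets/sec.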
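
Slide 8 shows a user × item matrix of buying/search events and a recommended item for user Y. The Python sketch below assumes user-based collaborative filtering with cosine similarity over each user's set of items; the slides do not state which method Jubatus uses, and the class and method names are hypothetical. For scale, 100,000 updates/sec per server leaves a budget of about 10 µs per update (1 s / 100,000). The sketch's `update` is a constant-time set insertion, but `recommend` scans every user, so it would not approach the slide's 0.1 sec response time for 30 million users without an index (for example an inverted index or locality-sensitive hashing).

```python
import math
from collections import defaultdict


class UserItemRecommender:
    """User-based collaborative filtering over a binary user x item matrix.

    update() is O(1) per event; recommend() scans all users (brute force).
    """

    def __init__(self) -> None:
        self.items_of: dict[str, set[str]] = defaultdict(set)

    def update(self, user: str, item: str) -> None:
        self.items_of[user].add(item)  # one buying/search event marks a matrix cell

    @staticmethod
    def _cosine(a: set[str], b: set[str]) -> float:
        if not a or not b:
            return 0.0
        return len(a & b) / math.sqrt(len(a) * len(b))

    def recommend(self, user: str, k: int = 2, n: int = 3) -> list[str]:
        mine = self.items_of.get(user, set())
        # The k users whose item sets are most similar to this user's.
        neighbours = sorted(
            (u for u in self.items_of if u != user),
            key=lambda u: self._cosine(mine, self.items_of[u]),
            reverse=True,
        )[:k]
        # Each neighbour votes for items the user has not touched yet.
        votes: dict[str, float] = defaultdict(float)
        for u in neighbours:
            sim = self._cosine(mine, self.items_of[u])
            for item in self.items_of[u] - mine:
                votes[item] += sim
        ranked = sorted(votes, key=lambda it: (-votes[it], it))
        return [it for it in ranked if votes[it] > 0.0][:n]


rec = UserItemRecommender()
for user, item in [("A", "item1"), ("A", "item2"), ("A", "item3"),
                   ("B", "item1"), ("B", "item3"), ("B", "itemX"),
                   ("Y", "item1"), ("Y", "item3")]:
    rec.update(user, item)

print(rec.recommend("Y"))  # ['item2', 'itemX']
```

Because `update` only touches one set, new events are reflected in the very next call to `recommend`, which is the realtime property the slide emphasizes.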
