20140523 JaSST

Mitsuo Hangai

1. Testing Headaches When Handling Large-Scale Data. JaSST Tohoku 2014 Vol.01, May/23/2014. Mitsuo Hangai, New Service Development Department, Rakuten Inc. http://www.rakuten.co.jp/

2. Mitsuo Hangai (半谷充生), Rakuten Inc., Sendai Branch 汉语 @bangucs

13. https://progres02.jposting.net/pgrakuten/job.phtml?job_code=1386&lang=ja

14. Would anyone like to share these headaches with me? lol

15. http://event.rakuten.co.jp/campaign/supersale/?l-id=top_normal_flashbnr_10_502&l-id=ppf_pc_s1_pc_web_t1_410

Editor's Notes

Hello, I'm Hangai from Rakuten. Today I've come to ask you to listen to some of my troubles.
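Here's a quick self-introduction. I'm interested in Scrum and run a study group in Sendai called Sukusuku Scrum (すくすくスクラム). Lately I'm mostly interested in the upstream side of development, such as design thinking and how to come up with products.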
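What I do:
・I develop and operate the data warehouse for Rakuten Ichiba (楽天市場) and the various systems around it.
・We have a 2 PB Hadoop cluster shared across the company. Every day we load tens of GB of data into it (orders, product updates, shop information, and so on) and use it for all sorts of things: reporting, analysis, and monitoring problematic shops and users.
・I can't say much about it, but we have, for example, received a request from the police to find people who bought "something like a crowbar," and we were able to find them.
・Recently we have also started blocking sales of products listed at abnormal prices.
・Incidentally, we are frantically preparing for a sale right now (lol).
Most of what we deliver is SQL and data.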
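・Hadoop processes data in a distributed way using the MapReduce programming model.
・Put simply, it's a human-wave tactic: the work is split across a large number of machines.
・You can write jobs in Java, but things like joins are hard, so Facebook built Hive, which lets you write them in SQL.
・We used it to replace our old database in 2011.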
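
To illustrate why a join is awkward to write by hand in MapReduce, here is a minimal in-memory sketch of a reduce-side join in Python. It is not Hadoop code: the record layouts (shops and orders) and all function names are hypothetical, and `shuffle()` only imitates the group-by-key step the framework performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain


def map_shops(shops):
    """Map phase for the (hypothetical) shop table: key by shop_id, tag the origin."""
    for shop_id, name in shops:
        yield shop_id, ("shop", name)


def map_orders(orders):
    """Map phase for the (hypothetical) order table: same key, different tag."""
    for order_id, shop_id, amount in orders:
        yield shop_id, ("order", (order_id, amount))


def shuffle(pairs):
    """Imitate the framework's shuffle: group every mapped value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(shop_id, values):
    """Reduce phase: pair each shop with each order sharing this shop_id (inner join)."""
    names = [v for tag, v in values if tag == "shop"]
    orders = [v for tag, v in values if tag == "order"]
    for name in names:
        for order_id, amount in orders:
            yield order_id, shop_id, name, amount


def join(shops, orders):
    """Run map, shuffle and reduce in sequence; return joined rows sorted by order_id."""
    groups = shuffle(chain(map_shops(shops), map_orders(orders)))
    return sorted(row for key, vals in groups.items() for row in reduce_join(key, vals))


if __name__ == "__main__":
    shops = [(1, "shop-a"), (2, "shop-b")]
    orders = [(100, 1, 500), (101, 2, 300), (102, 3, 999)]
    print(join(shops, orders))  # → [(100, 1, 'shop-a', 500), (101, 2, 'shop-b', 300)]
```

Order 102 refers to a shop that does not exist, so the inner join drops it. In Hive the same result is a single `JOIN ... ON` clause, which is the convenience the talk refers to.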
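・Hive lets you analyze data stored in Hadoop's file system, HDFS, using SQL.
・HDFS is not suited to random inserts or updates. If you need those, the only option is something like HBase, but that comes with its own steep learning curve.
・As a result, Hive has no row-level INSERT INTO or UPDATE either, which makes creating test data hard.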
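
Without row-level inserts, one common workaround is to generate a small fixture file locally and bulk-load it into a test table. The sketch below assumes a plain TEXTFILE table using Hive's default text format (fields separated by Ctrl-A, `\x01`, and NULL written as `\N`); the column layout and the assumption that values contain no delimiter or newline characters are hypothetical.

```python
import os
import tempfile

FIELD_DELIM = "\x01"  # Hive's default field delimiter for TEXTFILE tables
NULL_TOKEN = "\\N"    # Hive's default text representation of NULL


def to_hive_line(row):
    """Render one row; None becomes \\N, every other value its str() form.

    Assumes no value contains \\x01 or a newline.
    """
    return FIELD_DELIM.join(NULL_TOKEN if v is None else str(v) for v in row)


def write_fixture(rows, path):
    """Write rows as a Hive-loadable text file and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write(to_hive_line(row) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical fixture rows: (item_id, item_name, price, sale_price)
    rows = [(1, "item-a", 1000, 800), (2, "item-b", 500, None)]
    path = os.path.join(tempfile.mkdtemp(), "items.txt")
    print(write_fixture(rows, path), "rows written to", path)
```

The file could then be loaded with a statement such as `LOAD DATA LOCAL INPATH '/path/to/items.txt' OVERWRITE INTO TABLE items_fixture;`, where `items_fixture` is a hypothetical table whose columns match the rows.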
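・The development tooling is not very mature.
・The scale is completely different: the test environment only has a few TB of data.
・Production is full of unpredictable data, because information entered by shops and users goes in as-is.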
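
Because production contains values that the small test environment never shows, one option is to validate raw rows and set aside the odd ones with a reason, rather than letting them silently flow into the output. This is a minimal sketch; the field names (`price`, `item_name`) and both rules are hypothetical examples, not the actual constraints of these tables.

```python
def validate_row(row):
    """Return a list of problems found in one raw row (an empty list means it looks sane).

    Hypothetical rules for illustration only.
    """
    problems = []
    price = row.get("price")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(price, int) or isinstance(price, bool) or price <= 0:
        problems.append("price must be a positive integer")
    name = row.get("item_name")
    if not isinstance(name, str) or not name.strip():
        problems.append("item_name must be a non-empty string")
    return problems


def partition_rows(rows):
    """Split rows into (good, bad) instead of failing the whole batch on the first odd value."""
    good, bad = [], []
    for row in rows:
        issues = validate_row(row)
        if issues:
            bad.append((row, issues))
        else:
            good.append(row)
    return good, bad
```

The `bad` list, with its reasons, gives a concrete set of real-world edge cases that can be copied into test fixtures.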
For example, suppose we have an SQL query like this one, which checks that items really are discounted properly.
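
The actual query appears only as an image on the slide, so the following is a hypothetical stand-in: a small pure function expressing one possible "properly discounted" rule, which could serve as a test oracle to compare against the SQL's output. The columns and the rule itself (a sale price exists, is positive, and is below the regular price) are assumptions.

```python
from typing import Iterable, NamedTuple, Optional


class Item(NamedTuple):
    # Hypothetical columns; the real table layout is not shown in the talk.
    item_id: int
    price: int                 # regular price
    sale_price: Optional[int]  # price during the sale, None if not set


def is_properly_discounted(item: Item) -> bool:
    """Assumed rule: a sale price exists, is positive, and is strictly below the regular price."""
    return item.sale_price is not None and 0 < item.sale_price < item.price


def not_discounted(items: Iterable[Item]) -> list[int]:
    """Return the ids of items that violate the rule, like a WHERE clause selecting bad rows."""
    return [it.item_id for it in items if not is_properly_discounted(it)]
```

Keeping the rule in one small function makes it easy to unit-test on edge cases (equal prices, missing sale price, zero) without touching the cluster.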
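If a query like this returns millions of rows, it still takes about five minutes to run even with a LIMIT that outputs only 100 of them. That said, getting an answer back in five minutes at all is what makes Hadoop impressive.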
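Right now, verification means sampling the resulting list and then getting by with some pretty laborious work, for example:
・running a SELECT on the source table to see whether the values match what we expect, or
・actually opening the product pages to check that everything is consistent.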
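
The sample-and-compare step above can be partly automated. In this minimal sketch, `source_by_id` stands in for the SELECT on the source table, and `recompute` is a hypothetical function that derives the expected output row from a source row; the row shapes (an `item_id` key, a `rate` field) are assumptions. A fixed random seed keeps a failing sample reproducible.

```python
import random


def spot_check(output_rows, source_by_id, recompute, k=100, seed=0):
    """Sample up to k output rows and compare each with a value recomputed from source data.

    output_rows  -- list of dicts produced by the query (hypothetical shape: has "item_id")
    source_by_id -- dict mapping item_id to its source-table row
    recompute    -- function(source_row) -> the output row we expect
    Returns a list of (item_id, expected, actual) tuples, one per mismatch.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample = rng.sample(output_rows, min(k, len(output_rows)))
    mismatches = []
    for row in sample:
        src = source_by_id.get(row["item_id"])
        expected = recompute(src) if src is not None else None  # missing source row counts as a mismatch
        if expected != row:
            mismatches.append((row["item_id"], expected, row))
    return mismatches
```

For example, with a hypothetical `recompute = lambda s: {"item_id": s["item_id"], "sale_price": s["price"] * (100 - s["rate"]) // 100}`, any output row whose sale price disagrees with its source row would be reported. Checking the product pages themselves is not covered by this sketch.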
Ideally, I'd like to shop like crazy during the sale: you earn lots of points and there are great bargains. Please do use it, everyone!
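In reality, though, a lot of this still isn't automated, so bug reports, unexpected cases in the data we produced, requests to double-check the logic and so on keep pouring in, and I hardly get to shop at all...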
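So I'm struggling quite a bit with:
・how to automate testing of SQL,
・how to automate the data checks we currently do by hand and eye, and
・how to guarantee that the data is correct.
If you have ideas like "why not try this?" or examples like "here's how we do it," please let me know!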
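
One possible approach to the first question (not something the talk describes) is to run a query's logic against a tiny hand-made fixture in a local SQL engine and assert on the exact result, so the logic can be checked in seconds instead of on the cluster. This sketch uses Python's built-in `sqlite3`; the table, columns, and query are hypothetical. HiveQL and SQLite differ in functions and some semantics, so this only catches logic bugs in a portable subset of SQL.

```python
import sqlite3

# Hypothetical query under test, kept to a small SQL subset that SQLite and HiveQL both accept.
QUERY = """
SELECT item_id
FROM items
WHERE sale_price IS NULL OR sale_price >= price
ORDER BY item_id
"""


def run_on_fixture(query, rows):
    """Load fixture rows into an in-memory SQLite table named items and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (item_id INTEGER, price INTEGER, sale_price INTEGER)")
        conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def test_query_flags_undiscounted_items():
    # (item_id, price, sale_price): 1 is fine; 2 is not cheaper; 3 has no sale price; 4 costs more.
    fixture = [(1, 1000, 800), (2, 500, 500), (3, 300, None), (4, 200, 250)]
    assert run_on_fixture(QUERY, fixture) == [(2,), (3,), (4,)]


if __name__ == "__main__":
    test_query_flags_undiscounted_items()
    print("ok")
```

The `test_` function follows the naming convention that test runners such as pytest pick up automatically, so each query can get a handful of fixture-based cases.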
And on that note, my company is hiring engineers.