
Localisation sentiment analysis - best practices and challenges


- How can you measure players’ perspective on localisation quality?
- How do top PC games and publishers compare in terms of localisation sentiment?
- Which genres are most sensitive to localisation quality?
- What is the gold standard of localisation quality in terms of player sentiment?
- What are the methodological challenges of localisation sentiment analysis?

To answer these questions, we parsed more than 1 million Steam reviews by Russian players, extracted and manually categorized the ones that mention localisation, calculated localisation sentiment scores, and tested a number of hypotheses. We also completed a pilot machine extraction of localisation sentiments, with moderately accurate results.

Before we could get to the bottom of these questions, we had to overcome many technical and methodological challenges, and we’d like to share our findings. The research complements our surveys of Russian and Chinese players completed in 2015-2016 and presents a different perspective: this time we observed real-life behavior instead of surveying respondents.

  1. 1. Localisation Sentiment Analysis Best practices and challenges Demid Tishin dtishin@gmail.com
  2. 2. Contents •Rationale for localisation sentiment analysis •Research scope, assumptions & limitations •Parsing •Marking •Validating marks •Calculating scores & benchmarking •Some findings from the ranking •Correlation with earlier research •Automation of loc sentiment analysis •Key takeaways •Discussion
  3. 3. Rationale for localisation sentiment analysis •Gather additional data for workflow and vendor management improvement. •Identify localisation quality advocates from other game developers & team up. •Select benchmark content for localisation quality evaluation systems. Better game localisation quality!* *Not guaranteed. Results may vary.
  4. 4. Scope, assumptions & limitations •PC (Steam) titles; no physical distribution. •Games with 1M+ global owners as of May 2018 that have a Russian version – total 267 titles (non-random sampling). •Reviews in Russian. •“Most recent” and “Most helpful (all time)” reviews, cap ≈ 200K entries. •All non-specific sentiments are assumed to be about main game (not the DLCs). •All DLC-specific sentiment excluded. •No localisation sentiment is treated as neutral localisation sentiment.
  5. 5. Localisation sentiment analysis workflow Update rules Parse Validate Mark Calculate Analyse
  6. 6. Initial rules •Used wildcards (locali*) •Used both Cyrillic and Latin script. •перевод|perevod|переве|pereve|локализ|lokaliz| русск|russk|язык|yazik|озвуч|ozvuch|дубляж|dubly| субтитр|subtitr|опечат|opechat|граммат|grammat| орфогр|orfogr|пунктуац|punkt|текст|tekst
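As a sketch, the keyword stems above can be compiled into a single case-insensitive regular expression (Python, matching the deck's own tooling); the function name is illustrative:

```python
import re

# Illustrative sketch: compile the slide's keyword bases (Cyrillic and
# transliterated) into one case-insensitive pattern. Wildcards like
# "locali*" become open-ended stems, so any suffix matches.
KEYWORD_BASES = [
    "перевод", "perevod", "переве", "pereve", "локализ", "lokaliz",
    "русск", "russk", "язык", "yazik", "озвуч", "ozvuch",
    "дубляж", "dubly", "субтитр", "subtitr", "опечат", "opechat",
    "граммат", "grammat", "орфогр", "orfogr", "пунктуац", "punkt",
    "текст", "tekst",
]

KEYWORD_RE = re.compile("|".join(map(re.escape, KEYWORD_BASES)), re.IGNORECASE)

def mentions_localisation(review_text: str) -> bool:
    """True if the review contains any localisation keyword stem."""
    return KEYWORD_RE.search(review_text) is not None
```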
  7. 7. Parsing •Steam: steam-scraper data scraper (Python) – number of available reviews limited by Steam; speed ≈1,000 reviews per minute (depends on review size). • For mobile platforms we use google-play-scraper (JavaScript) (4,400 reviews per language per game; standard quotas 50,000 server requests per day, 10 requests per second, cooldown 1 hour) and app-store-scraper (JavaScript) – 500 reviews per territory, ≈5,000 reviews per minute. •Deleted duplicates (need to ignore “page”, “page order”, “date” and “username” fields). •Extracted reviews with keywords (Notepad++ regular expressions).
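The de-duplication step could look like the following sketch; the snake_case field names are assumptions based on the slide's list of fields to ignore:

```python
import json

# Hypothetical sketch of de-duplication: two scraped entries count as the
# same review if they match on everything except the volatile fields
# ("page", "page_order", "date", "username" are assumed field names).
VOLATILE = {"page", "page_order", "date", "username"}

def dedupe_reviews(reviews):
    seen, unique = set(), []
    for review in reviews:
        # Serialize the non-volatile fields deterministically as the key.
        key = json.dumps(
            {k: v for k, v in review.items() if k not in VOLATILE},
            sort_keys=True, ensure_ascii=False,
        )
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique
```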
  8. 8. Validating data •Check for false positives on a small batch of data → prepare a blacklist of keywords (exceptions). e.g. “text” but not “texture” e.g. “Russian” but not “Russian servers” •“Grammar”, “punctuation”, “typo” – highly noisy keywords (players refer to their own writing).
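A minimal sketch of the blacklist idea, using the slide's two examples ("текст" but not "текстура", "русск" but not Russian servers); the exact patterns are illustrative, not the deck's actual rules:

```python
import re

# Hypothetical sketch of blacklist filtering: a review counts as
# localisation-related only if a keyword hit is NOT explained away by a
# blacklisted longer term (e.g. "текст" inside "текстура"/"texture",
# or "русск" inside "русские сервера" / "Russian servers").
KEYWORD_RE = re.compile(r"текст|русск", re.IGNORECASE)
BLACKLIST_RE = re.compile(r"текстур|русск\w*\s+сервер", re.IGNORECASE)

def is_relevant(review_text: str) -> bool:
    # Remove false-positive contexts first, then look for keywords.
    masked = BLACKLIST_RE.sub("", review_text)
    return KEYWORD_RE.search(masked) is not None
```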
  9. 9. Review funnel: 1M reviews → 30K with keywords → 17K relevant
  10. 10. Marking localisation sentiment • One review – One mark. • Separate markers for presence (Y), absence (N) and quality (- / +), as well as a neutral marker (0). • Separate markers for VO (V, EV, RV) and Loc (L). • Localisation and VO mentioned – mark localisation. • Negative and positive sentiment – mark negative. • Sentiment about marketing assets only – ignore. • Sarcasm obvious – mark negative. • Sentiment about DLC only – ignore. • Sentiment about non-Steam version – ignore. • Sentiment about technical problems – ignore. • Manual marking output: 4-5 reviews per minute.
  11. 11. Marking localisation sentiment (example)
  12. 12. Calculating localisation sentiment scores •User noted positive quality of loc = +1 •User noted negative quality of loc = -1 •User noted presence of loc = +1 •User noted positive quality of Rus. VO = +1 •User noted negative quality of Rus. VO = -1 •User noted presence of Rus. VO = +1 (only if Steam shows availability of Russian VO) •User noted absence of Rus. VO = -1 (only if Steam shows unavailability of Russian VO) •All other marks (voiceover sentiment with unspecified language, unclear sentiment etc) = 0 •Reviews with no localisation sentiment = 0
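The scoring rules above could be encoded roughly as follows; the mark codes ("L+", "RVY", etc.) are hypothetical shorthand for illustration, not the deck's actual notation:

```python
# Hypothetical sketch of slide 12's scoring rules. Assumed mark codes:
# "L+"/"L-" = quality of localisation, "LY" = presence of localisation,
# "RV+"/"RV-" = quality of Russian VO, "RVY"/"RVN" = noted presence/absence
# of Russian VO.
def mark_to_score(mark: str, steam_has_russian_vo: bool) -> int:
    if mark in ("L+", "LY", "RV+"):
        return +1
    if mark in ("L-", "RV-"):
        return -1
    if mark == "RVY":   # noted presence: counts only if Steam confirms VO
        return +1 if steam_has_russian_vo else 0
    if mark == "RVN":   # noted absence: counts only if Steam confirms no VO
        return -1 if not steam_has_russian_vo else 0
    return 0            # unclear sentiment, unspecified-language VO, etc.
```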
  13. 13. Calculating localisation sentiment scores •Sentiment score = (Σ(+1) + Σ(−1)) / (Total reviews parsed) •Not an absolute score: 0 does not signify a truly neutral localisation sentiment, since users might be X times more likely to express a negative sentiment than a positive sentiment. •Useful for comparing games against each other, or different stages of the same product. •Validation of the score: the confidence interval of the “total reviews parsed” sample (in the total estimated number of Russian players) for the title should be at least 3x less than the share of all localisation sentiments in the sample! → 104 titles (of 267 marked).
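A minimal implementation of the weighted-score formula, assuming marks have already been reduced to +1 / −1 / 0:

```python
# Minimal sketch of slide 13's formula: the weighted sentiment score is the
# sum of all +1 and -1 marks divided by the TOTAL number of reviews parsed
# (not just the reviews that mention localisation).
def sentiment_score(marks, total_reviews_parsed):
    positives = sum(1 for m in marks if m == +1)
    negatives = sum(1 for m in marks if m == -1)
    return (positives - negatives) / total_reviews_parsed
```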
  14. 14. Preparing a ranking of titles •Parameters in order of importance: 1. Localisation sentiment score. 2. Share of positive sentiments in total. 3. “Net promoter score”: Σ(+1) − Σ(−1) •4 states for parameters 1 and 2, depending on where the value lies on the range across all titles in the ranking:
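The three-parameter ordering can be sketched as a tuple sort (the dict field names are assumptions):

```python
# Hypothetical sketch of the ranking order on slide 14: sort titles by
# weighted score, then by share of positive sentiments, then by the
# "net promoter" difference (positives minus negatives), all descending.
def rank_titles(titles):
    # Each title is assumed to be a dict with "score", "pos_share", "nps".
    return sorted(
        titles,
        key=lambda t: (t["score"], t["pos_share"], t["nps"]),
        reverse=True,
    )
```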
  15. 15. Loc sentiment ranking of titles (snapshot)
  16. 16. Top 22 titles by % positive sentiment (highest first)* • Kerbal Space Program • DOOM • Stardew Valley • Tomb Raider • The Witcher 2: Assassins of Kings Enhanced Edition • Titan Quest Anniversary Edition • Neverwinter • Trine 2: Complete Story • Torchlight II • Left 4 Dead 2 • Hitman: Absolution • Alien: Isolation • Game Dev Tycoon • POSTAL 2 • Far Cry 3 • Everlasting Summer • Team Fortress 2 • Portal 2 • Mafia II • Mirror's Edge • The Elder Scrolls V: Skyrim • Tom Clancy’s The Division *benchmark – 80%
  17. 17. Bottom 22 titles by % positive sentiment (lowest first) • Line of Sight • XCOM 2 • Total War: WARHAMMER II • Sid Meier’s Civilization VI • Fallout 4 • Warhammer: Vermintide 2 • HITMAN • Sleeping Dogs: Definitive Edition • L.A. Noire • Max Payne 3 • Grand Theft Auto V • Chivalry: Medieval Warfare • Total War: ATTILA • Loadout • SMITE • Dead Space 2 • Mad Max • Dying Light • Batman: Arkham Origins • Wolfenstein: The New Order • Alan Wake • Warhammer: End Times - Vermintide
  18. 18. What is the sentiment benchmark? Share of positive loc sentiments = 80% Weighted loc sentiment score = +0.01 • These are the average sentiment scores for titles that ranked high in our 2016 survey of players (avg 90%) and were present in both data sets (2016 and 2018). • The 90% cut-off point for 2016 survey is to include all titles by Blizzard, which was selected for its unbeatably consistent scores (lowest score = Overwatch, 92%)
  19. 19. • More sensitive to loc: Strategy, Adventure, RPG • Less sensitive to loc: MMO, Action, Simulation, Casual
  20. 20. How does self-publishing affect localisation sentiment? •Self-published titles (incl. by internal studios): 44% positive sentiments (mean average), loc sentiment score −0.0028. •Titles with dedicated external publisher: 56% positive sentiments (mean average), loc sentiment score 0.0020.
  21. 21. Other findings (treat with caution) • Some correlation was observed between loc sentiment and share of Russian players in the game’s audience: Positive loc sentiment > 66% → 12% players were Russian Positive loc sentiment < 33% → 9% players were Russian. • No correlation was observed between Russian user score and availability of Russian VO. • Players’ localisation sentiment (share of positive sentiments) is generally independent of whether they recommend the game or not (Russian user score); same for userscore in the sample of localisation-related reviews.
  22. 22. •Median variation from 2018 data ≈ 16% ☺
  23. 23. Automation: initial approach •Divide all manually marked reviews into 2 sets – positive and negative → Extract specific collocations of 2-6 words → update rules. •Didn’t work: 1. Attribute word(s) often separated from the keyword. 2. Multiple grammatical forms / affixes. 3. Chains of attributes and keywords, endless variations: Очень разочаровал русский перевод в игре: ошибки в текстовых словах (даже в интерфейсе), ошибки в переводе, в озвучке повторяются слова и звучат банально, и дословно, в общем получаем мы нелепую озвучку и перевод игры в целом. (“The Russian translation in the game was very disappointing: mistakes in the text (even in the interface), mistakes in the translation, the voice-over repeats words and sounds clichéd and literal – in short, we get a ridiculous voice-over and translation of the game as a whole.”)
  24. 24. Automation: search rules and keywords •Working approach: Keyword base + Attribute base (before / after the keyword) separated by 25 characters (max.) •≈ Pareto distribution of keyword frequency. The vast majority of sentiments have any of the 6 keywords: локализ, перев, русск, озвуч, дубляж, субтитр •Non-linear correlation between number of dictionary entries and resulting accuracy: перевод, перевед, перевел, перевест (4 bases) can be reduced to перев (1 base) with accuracy loss ≈5%
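The keyword-plus-attribute rule format could be approximated with a proximity regex; the positive attribute bases below are illustrative assumptions, only the six keyword bases come from the slide:

```python
import re

# Hypothetical sketch of the working rule format: an attribute base within
# 25 characters before or after a keyword base forms a sentiment template.
KEYWORDS = ["локализ", "перев", "русск", "озвуч", "дубляж", "субтитр"]
POS_ATTRS = ["хорош", "отличн", "качествен"]   # assumed positive bases

def find_positive_template(text: str) -> bool:
    kw = "|".join(KEYWORDS)
    attr = "|".join(POS_ATTRS)
    # Attribute then keyword, or keyword then attribute, gap <= 25 chars.
    pattern = rf"(?:{attr}).{{0,25}}(?:{kw})|(?:{kw}).{{0,25}}(?:{attr})"
    return re.search(pattern, text, re.IGNORECASE | re.DOTALL) is not None
```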
  25. 25. Automation: attribute words •Manually compiled 2 dictionaries of most frequent attribute bases (negative and positive). •Validated each attribute to ensure accuracy: • If the proportion of frequencies in negative : positive data sets is less than 2:1 (or 1:2) → remove. • If the frequency in the false positives data set is considerably higher than in two other data sets combined → remove. •If a review contains both positive and negative templates → mark as both positive and negative sentiment.
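The 2:1 frequency filter can be sketched as follows (the counts are assumed to come from the manually marked negative and positive corpora):

```python
# Hypothetical sketch of the attribute validation filter on slide 25: keep
# an attribute base only if it is at least twice as frequent in one polarity
# set as in the other; otherwise it is too ambiguous to be a reliable signal.
def keep_attribute(neg_count: int, pos_count: int) -> bool:
    if neg_count == 0 or pos_count == 0:
        return True                      # occurs in only one set: keep
    ratio = neg_count / pos_count
    return ratio >= 2 or ratio <= 0.5    # 2:1 (or 1:2) threshold
```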
  26. 26. Automation: tips & tricks • Complex sentences with contrasted parts and punctuation signs can lead to a false positive: "русская локализация радует, но сюжет плохой" (“the Russian localisation is pleasing, but the plot is bad”). Blacklisting all templates with punctuation signs marginally improves accuracy (by 1-2%).
  27. 27. Automation: tips & tricks • When multiple collocations have been detected in a review → check whether any of the collocations include the others → remove the mark for the inner one: Хорошая русская локализация (“Good Russian localisation”)
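The inner-collocation rule can be sketched over character spans:

```python
# Hypothetical sketch of slide 27's rule: when one detected collocation's
# character span is contained inside another's, keep only the outer match,
# so the same phrase is not counted twice.
def drop_inner_matches(spans):
    # spans: list of (start, end) collocation positions in the review text
    keep = []
    for s in spans:
        contained = any(
            o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans
        )
        if not contained:
            keep.append(s)
    return keep
```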
  28. 28. Automation: tips & tricks • Delete the space before the attribute and check for a negation prefix (“не-”) or particle (“не”) → invert the mark. Игра переведена не полностью (“The game is not fully translated”) • This also helps to remove redundant terms from the dictionaries.
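A rough sketch of the negation check, simplified to a single regex (the `\s*` covers both the particle "не " and the fused prefix "не-"):

```python
import re

# Hypothetical sketch of the negation rule on slide 28: look for the
# Russian negation "не" immediately preceding the matched attribute
# (with or without a space); if found, invert the sentiment mark.
def apply_negation(text: str, attr: str, mark: int) -> int:
    if re.search(rf"не\s*{re.escape(attr)}", text, re.IGNORECASE):
        return -mark
    return mark
```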
  29. 29. Automation: training approaches (pick 1) Find all sentiments (and a lot of noise) Find most sentiments (with minimum noise)
  30. 30. Automation: KPIs •Machine competence = % correct : % inverted (≈ 8:1) •Comparing machine marks against human marks: 60% identified correctly by machine (target = 80%), 7% inverted, 33% not identified by machine; 33% noise (false positives).
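The KPIs can be computed from paired human/machine marks; this sketch assumes marks are keyed by a review id:

```python
# Hypothetical sketch of slide 30's KPIs. "Competence" is the ratio of
# correctly-signed machine marks to inverted (wrong-sign) marks; the other
# shares describe coverage relative to the human-marked set.
def machine_kpis(human, machine):
    correct = inverted = missed = 0
    for review_id, h_mark in human.items():
        m_mark = machine.get(review_id)
        if m_mark is None:
            missed += 1          # machine did not identify this sentiment
        elif m_mark == h_mark:
            correct += 1
        else:
            inverted += 1        # machine got the polarity wrong
    total = len(human)
    return {
        "correct_share": correct / total,
        "inverted_share": inverted / total,
        "missed_share": missed / total,
        "competence": correct / inverted if inverted else float("inf"),
    }
```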
  31. 31. Other challenges • Keyword ambiguity: the Russian term “озвучка” usually means “voice over” or “voice acting”, but can also mean “sound design”. • Typically the player mentions “voice over” without specifying whether they meant Russian VO, the original English acting, or something else. How you interpret these depends on the purpose of your analysis. • Negative sentiment extraction is harder to optimise (people tend to use negative words more and combine them freely). • Detecting sarcasm is hard. • Do complaints about the absence of VO count as negative loc sentiment?
  32. 32. Key takeaways • Two meaningful sentiment scores – % positive and weighted. • Benchmarks for localisation sentiment are 80% (% positive) and 0.01 (weighted score). • High % of loc-related reviews (> 1%) and high overall no. of any reviews (> 2,000) are important for validity. • Strategy, Adventure and RPG are more sensitive to localisation than MMO, Action, Simulation and Casual. • Some AAA developers and publishers are consistently better than others. Self-published titles generally have worse loc sentiment. • Machine identifies at least 67% loc sentiments compared to a human, with at least 8:1 accuracy and 33% noise. Accuracy can be further improved.
  33. 33. Our research team Demid Tishin founding partner www.allcorrectgames.com More research here! www.slideshare.net/dtishin Need customised analysis? dtishin@gmail.com Dmitry Arthur Denis Demid
  34. 34. • Do you measure players’ localisation sentiment? • What challenges do you face on the way? • What actions do you take based on the findings? • E.g. revise localisation workflow, vendor pool, etc. • How do you automate it? • What are your benchmarks? • How do you factor player sentiment in your localisation quality evaluation systems?
