This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore
2. MosesCore
Year 1 (2012)
• Easier installation
– Binary releases
– Pre-built models
• Testing and Releases
– Linux, Mac OSX, Windows
– 32 and 64-bit
• Faster training
– Parallelism at all stages
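"Parallelism at all stages" works because many training steps handle each sentence pair independently: split the corpus into shards, process the shards concurrently, then merge the partial results. The sketch below shows that split/process/merge pattern on a toy co-occurrence count. It is not Moses's training pipeline, and `count_cooccurrences`, `shard` and `parallel_count` are hypothetical names chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_cooccurrences(pairs):
    """Count (source word, target word) co-occurrences in one shard of sentence pairs."""
    counts = Counter()
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[(s, t)] += 1
    return counts


def shard(items, n_shards):
    """Split items into at most n_shards contiguous, roughly equal shards."""
    size = max(1, -(-len(items) // n_shards))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_count(pairs, n_shards=4, executor_cls=ThreadPoolExecutor):
    """Split, process the shards concurrently, merge: partial counts simply add up."""
    total = Counter()
    with executor_cls(max_workers=n_shards) as pool:
        for partial in pool.map(count_cooccurrences, shard(pairs, n_shards)):
            total.update(partial)
    return total


corpus = [("das haus", "the house"), ("das buch", "the book"), ("ein haus", "a house")]
counts = parallel_count(corpus, n_shards=2)
print(counts[("das", "the")], counts[("haus", "house")])  # 2 2
```

A thread pool keeps the sketch portable, but pure-Python counting gains little from threads because of the GIL; for real CPU-bound work, pass `concurrent.futures.ProcessPoolExecutor` as `executor_cls` and call it from under an `if __name__ == "__main__":` guard. The merge is a plain addition of counts, which is what makes this kind of step easy to parallelise.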
5. Why did you Refactor?
• Feature Function Framework
– easier to implement new features
– use sparse features
• Simplify class structure
– easier to develop with Moses
• Delete functionality
– easier to refactor code
– very little deletion
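With sparse features, a translation no longer carries only a short, fixed list of dense scores: it can also carry any number of named features, of which only a few fire for a given translation. A minimal sketch, assuming features are stored as name-to-value maps and combined in a linear model; the feature names (`LM0`, `TM0`, `pair_das~the`) and function names are hypothetical.

```python
from collections import defaultdict


def combine(*feature_maps):
    """Merge the feature values reported by several feature functions."""
    total = defaultdict(float)
    for fmap in feature_maps:
        for name, value in fmap.items():
            total[name] += value
    return dict(total)


def linear_score(features, weights):
    """Weighted sum over the features that fire; a feature with no weight contributes 0."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


# Hypothetical feature names: two dense scores plus one sparse indicator
# for the word pair das->the, which only fires when that pair is used.
dense = {"LM0": -12.5, "TM0": -3.0}
sparse = {"pair_das~the": 1.0}
weights = {"LM0": 0.5, "TM0": 0.2, "pair_das~the": 0.1}
print(linear_score(combine(dense, sparse), weights))  # approximately -6.75
```

Only features that actually fire are stored, so the cost per translation depends on how many features fire, not on how many exist.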
8. Specify a Feature Function
Then….
[lmodel-file]
8 0 3 europarl.en.srilm.gz
[weight-l]
0.142
ini file:
• New Feature Function
– New sections
● [feature-function-file]
● [weight-?]
• Custom code
– Parse ini file
– Initialize feature function
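To make the contrast concrete, here is a minimal sketch of the kind of bespoke parsing the old format required. It assumes the four fields of an `[lmodel-file]` line are implementation code, factor, order and path, and that `[weight-l]` holds one weight per language model in the same order. The class and function names are hypothetical; this is not Moses's actual parser.

```python
from dataclasses import dataclass


@dataclass
class OldStyleLM:
    impl: int    # implementation code (8 is assumed here to mean KenLM)
    factor: int
    order: int
    path: str
    weight: float


def read_ini(text):
    """Group non-empty, non-comment lines under their [section] header."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], [])
        elif current is not None:
            current.append(line)
    return sections


def parse_old_lms(sections):
    """Bespoke parser: positional fields in one section, weights in another."""
    lm_lines = sections.get("lmodel-file", [])
    weights = [float(w) for w in sections.get("weight-l", [])]
    if len(lm_lines) != len(weights):
        raise ValueError("need exactly one [weight-l] line per [lmodel-file] line")
    lms = []
    for line, weight in zip(lm_lines, weights):
        impl, factor, order, path = line.split()
        lms.append(OldStyleLM(int(impl), int(factor), int(order), path, weight))
    return lms


ini = """
[lmodel-file]
8 0 3 europarl.en.srilm.gz
[weight-l]
0.142
"""
print(parse_old_lms(read_ini(ini)))
```

Each feature type needed its own section, its own positional format, and its own parsing and initialization code like this.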
9. Adding new Feature Function
Now….
[feature]
KENLM file=path order=0
[weight]
KENLM0= 0.142
ini file:
• New Feature Function
– No new section
● Line in [feature] section
● Line in [weight] section
– Framework
● parse ini file
● initialize feature
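With the new format, one generic parser covers every feature function: each `[feature]` line is a class name followed by `key=value` pairs, and each `[weight]` line maps a feature name to one or more weights. A minimal sketch, assuming unnamed features are numbered per class (as the slide's `KENLM0` suggests) and that a `name=` key overrides that default; the function names are hypothetical and the parameter keys are copied from the slide as-is.

```python
from collections import Counter


def parse_features(lines):
    """Generic parser for [feature] lines: 'ClassName key=value key=value ...'."""
    per_class = Counter()
    specs = {}
    for line in lines:
        class_name, *tokens = line.split()
        params = {}
        for token in tokens:
            key, sep, value = token.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {token!r}")
            params[key] = value
        # Default name is the class name plus a per-class counter: KENLM0, KENLM1, ...
        name = params.pop("name", f"{class_name}{per_class[class_name]}")
        per_class[class_name] += 1
        specs[name] = (class_name, params)
    return specs


def parse_weights(lines):
    """Generic parser for [weight] lines: 'Name= w1 w2 ...'."""
    weights = {}
    for line in lines:
        name, sep, values = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'Name= weights', got {line!r}")
        weights[name.strip()] = [float(v) for v in values.split()]
    return weights


features = parse_features(["KENLM file=path order=0"])
weights = parse_weights(["KENLM0= 0.142"])
print(features)  # {'KENLM0': ('KENLM', {'file': 'path', 'order': '0'})}
print(weights)   # {'KENLM0': [0.142]}
```

Because the parsing is generic, a new feature function adds lines to existing sections rather than new sections with new parsers.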
10. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
– Bilingual language models
– Transliteration
• Translation rule properties
• Syntax decoding
11. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
● Dynamic suffix array
● Stores training data
– Extract translation rule on-the-fly
– Neural network language model
– Bilingual language models
– Transliteration
• Translation rule properties
• Syntax decoding
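An updatable phrase table keeps the training data itself and extracts rules for a source phrase when they are needed. The first step is finding every occurrence of the phrase, which a suffix array reduces to two binary searches. A minimal sketch, assuming a naive rebuild on every update (a real dynamic suffix array avoids this) and covering only the lookup, not reading the aligned target spans; `SuffixArrayCorpus` is a hypothetical name.

```python
class SuffixArrayCorpus:
    """Source side of a training corpus, indexed by a suffix array."""

    SEP = "</s>"  # sentence separator, so a match can never span two sentences

    def __init__(self):
        self.tokens = []
        self.suffixes = []  # start positions, sorted by the suffix that starts there

    def add_sentence(self, sentence):
        self.tokens.extend(sentence.split() + [self.SEP])
        # Naive rebuild; a real dynamic suffix array updates incrementally instead.
        self.suffixes = sorted(range(len(self.tokens)), key=lambda i: self.tokens[i:])

    def _prefix(self, rank, k):
        start = self.suffixes[rank]
        return self.tokens[start:start + k]

    def find(self, phrase):
        """Sorted corpus positions where the source phrase occurs."""
        words = phrase.split()
        k = len(words)
        # Suffixes starting with `words` form one contiguous block of the sorted array;
        # two binary searches find its first rank and its one-past-last rank.
        lo, hi = 0, len(self.suffixes)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, k) < words:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, len(self.suffixes)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, k) <= words:
                lo = mid + 1
            else:
                hi = mid
        return sorted(self.suffixes[first:lo])


corpus = SuffixArrayCorpus()
corpus.add_sentence("das haus ist klein")
corpus.add_sentence("das haus ist alt")
print(corpus.find("das haus"))   # [0, 5]
print(corpus.find("klein das"))  # [] (would span a sentence boundary)
```

Adding a sentence makes its phrases available immediately, which is the property an updatable phrase table needs.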
12. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
● Continuous space LM
– Bilingual language models
– Transliteration
• Translation rule properties
• Syntax decoding
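A continuous space LM represents each word as a vector and computes the probability of the next word with a neural network, instead of looking up n-gram counts. A minimal sketch of the forward pass of a feed-forward network LM, with random, untrained weights, no biases, and hypothetical names; training, and how Moses integrates such a model, are not shown.

```python
import math
import random


class TinyFeedForwardLM:
    """Untrained feed-forward n-gram LM: P(word | previous order-1 words)."""

    def __init__(self, vocab, order=3, emb_dim=4, hidden_dim=8, seed=0):
        if order < 2:
            raise ValueError("order must be at least 2")
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.context_size = order - 1
        # One continuous vector (embedding) per vocabulary word.
        self.emb = [[rand() for _ in range(emb_dim)] for _ in self.vocab]
        self.w_hidden = [[rand() for _ in range(self.context_size * emb_dim)]
                         for _ in range(hidden_dim)]
        self.w_out = [[rand() for _ in range(hidden_dim)] for _ in self.vocab]

    def distribution(self, context):
        """Probability of every vocabulary word given the last order-1 context words."""
        if len(context) < self.context_size:
            raise ValueError(f"need {self.context_size} context words")
        # 1. Look up the context words' vectors and concatenate them.
        x = [v for w in context[-self.context_size:] for v in self.emb[self.index[w]]]
        # 2. Hidden layer with a tanh non-linearity.
        h = [math.tanh(sum(a * b for a, b in zip(row, x))) for row in self.w_hidden]
        # 3. One score per vocabulary word, normalised with a softmax.
        scores = [sum(a * b for a, b in zip(row, h)) for row in self.w_out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return {w: e / total for w, e in zip(self.vocab, exps)}

    def log_prob(self, word, context):
        return math.log(self.distribution(context)[word])


lm = TinyFeedForwardLM(["<s>", "the", "house", "is", "small", "</s>"])
print(lm.log_prob("house", ["<s>", "the"]))
```

Similar words can end up with similar vectors after training, which is how such a model can generalise to word sequences it has never seen.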
13. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
– Bilingual language models
● Replicate Devlin et al., 2014
● Large quality gains
– Transliteration
• Translation rule properties
• Syntax decoding
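The bilingual LM of Devlin et al. (2014) predicts each target word from the preceding target words plus a window of source words centred on the source word that the target word is affiliated with. A minimal sketch of building that context, assuming defaults of 3 target words and an 11-word source window (our reading of the paper) and a simplified affiliation rule. The neural network that consumes the context is not shown, and the function names are hypothetical.

```python
def affiliation(j, alignment, n_target):
    """Source position that target position j is affiliated with (simplified heuristic)."""
    aligned = {}
    for s, t in alignment:
        aligned.setdefault(t, []).append(s)

    def middle(t):
        # Several aligned source words: take the middle one (right-middle for even counts).
        srcs = sorted(aligned[t])
        return srcs[len(srcs) // 2]

    if j in aligned:
        return middle(j)
    # Unaligned target word: inherit from the nearest aligned target word, trying right first.
    for dist in range(1, n_target):
        for t in (j + dist, j - dist):
            if 0 <= t < n_target and t in aligned:
                return middle(t)
    return 0  # assumption: no alignment at all, fall back to the first source word


def bilm_context(source, target, alignment, j, target_history=3, source_window=11):
    """Context used to predict target word j: a source window plus the target history."""
    src, tgt = source.split(), target.split()
    half = source_window // 2  # source_window is assumed to be odd
    a = affiliation(j, alignment, len(tgt))
    src_ctx = [src[i] if 0 <= i < len(src) else "<pad>" for i in range(a - half, a + half + 1)]
    tgt_ctx = [tgt[i] if i >= 0 else "<s>" for i in range(j - target_history, j)]
    return src_ctx, tgt_ctx


alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # (source index, target index)
print(bilm_context("das haus ist klein", "the house is small", alignment, 2,
                   target_history=2, source_window=3))
# (['haus', 'ist', 'klein'], ['the', 'house'])
```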
14. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
– Bilingual language models
– Transliteration
● Character level translation
● Learns from parallel data
● Integrate into decoder
• Translation rule properties
• Syntax decoding
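A common way to do character-level translation with a word-based toolkit is to make each character a token, so the usual training pipeline learns character mappings from name pairs. A minimal sketch of that data preparation, assuming `_` marks a space inside a multi-word name (text that already contains `_` would not round-trip); mining name pairs from parallel data and integrating the model into the decoder are not shown, and the function names are hypothetical.

```python
def to_char_tokens(text, space_marker="_"):
    """'Moses' -> 'M o s e s': each character becomes a token; real spaces become a marker."""
    return " ".join(space_marker if ch == " " else ch for ch in text)


def from_char_tokens(tokens, space_marker="_"):
    """Inverse of to_char_tokens."""
    return "".join(" " if tok == space_marker else tok for tok in tokens.split())


def char_level_corpus(word_pairs):
    """Turn (source, target) name pairs into a character-level parallel corpus."""
    return [(to_char_tokens(s), to_char_tokens(t)) for s, t in word_pairs]


print(char_level_corpus([("Москва", "Moskva")]))  # [('М о с к в а', 'M o s k v a')]
```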
15. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
– Bilingual language models
– Transliteration
• Translation rule properties
– Extra information for each rule
● Context, syntax, domain, etc.
• Syntax decoding
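Rule properties let each translation rule carry extra information that feature functions can then use. A minimal sketch, assuming properties are held as a per-rule dictionary; the `domain` key, its count format and `domain_indicator_features` are hypothetical, and how properties are stored in a phrase-table file is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationRule:
    source: str
    target: str
    scores: list
    # Extra per-rule information, e.g. context, syntax or domain (keys are hypothetical).
    properties: dict = field(default_factory=dict)


def domain_indicator_features(rule):
    """Hypothetical sparse feature: fires once for each domain the rule was seen in."""
    counts = rule.properties.get("domain", {})
    return {f"domain_{name}": 1.0 for name, count in counts.items() if count > 0}


rule = TranslationRule("das haus", "the house", [0.5, 0.3],
                       properties={"domain": {"news": 3, "web": 0}})
print(domain_indicator_features(rule))  # {'domain_news': 1.0}
```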
16. MosesCore
Year 3 (2014)
• Exploit new framework
– Updatable phrase-table
– Neural network language model
– Bilingual language models
– Transliteration
• Translation rule properties
• Syntax decoding
– Faster, memory efficient decoding
– More syntactic models
17. Technical Breakout
• Organization and Releases
– Academic and commercial needs
– Prevent forks
– Development/Stable versions
– Forward/backward compatibility
– Upgradability
• Features
• Deployment
• Future development
18. Technical Breakout
• Organization and Releases
• Features
• Deployment
• Future development
19. Technical Breakout
• Organization and Releases
• Features
• Deployment
– Platform/Clouds
– Docker containers
– Priorities
– Interaction and data formats
• Future development
20. Technical Breakout
• Organization and Releases
• Features
• Deployment
• Future development
– User-friendliness
– End-to-end solution
– Users
22. Changes in Moses
Hieu Hoang
TAUS
October 2014
Thanks for inviting me
I'm here to tell you a little about the things I’ve been doing to Moses
- over the past 2 years
- concentrating mainly on the past year
- but I will quickly tell you about the things I did before that
23. MosesCore
Year 1 (2012)
• Easier installation
– Binary releases
– Pre-built models
• Testing and Releases
– Linux, Mac OSX, Windows
– 32 and 64-bit
• Faster training
– Parallelism at all stages
In the 1st year
- picked off the low-hanging fruit
- fixed many of the easy issues that just required time & effort
Made installation easier
We run a lot of experiments anyway
- gave some of them away, with all the scripts + configuration used to run them
- so students can see how to replicate our results
Lots of testing
- on all major platforms
Made obvious speed improvements
24. MosesCore
Year 2 (2013)
• Even Easier installation
– Binary releases
– Pre-built models
– Virtual Machines
– Amazon EC2
• Refactored Decoder
In year 2
- made it even easier to install
- if you can’t be bothered to compile, or even to download the binaries
- download a virtual machine with Moses + friends installed
OR
- rent an Amazon server with Moses + friends installed
25. MosesCore
Year 2 (2013)
• Even Easier installation
– Binary releases
– Pre-built models
– Virtual Machines
– Amazon EC2
• Refactored Decoder
However, the main reason I came here today
- is to talk about the major changes I made
- in the decoder and elsewhere
These make it easier for us coders
- to add and change things in Moses
26. Why did you Refactor?
• Feature Function Framework
– easier to implement new features
– use sparse features
• Simplify class structure
– easier to develop with Moses
• Delete functionality
– easier to refactor code
– very little deletion
What is a feature function?
- something that gives a translation a score
Over the last few years
- we've gotten bored with existing features like language models and reordering models
The trend in MT
- create novel features which give a score to a translation
- like any feature, each one tries to give bigger scores to better translations
New feature function framework
- designed to make it easy to add new features
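To make "something that gives a translation a score" concrete, here is a minimal sketch in Python (Moses itself is C++): each feature function returns one or more scores, and a linear model adds them up with one weight per score, so a useful feature pushes better translations towards a higher total. The two features and all names are hypothetical, and Moses's actual interface is more involved (it scores phrases and partial hypotheses during search).

```python
from abc import ABC, abstractmethod


class FeatureFunction(ABC):
    """Something that gives a translation a score (or several scores)."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def evaluate(self, source, target):
        """Return a list of scores for translating `source` as `target`."""


class TargetWordCount(FeatureFunction):  # hypothetical feature
    def evaluate(self, source, target):
        return [float(len(target.split()))]


class LengthMismatch(FeatureFunction):  # hypothetical feature
    def evaluate(self, source, target):
        # Penalise translations whose length differs from the source length.
        return [-float(abs(len(target.split()) - len(source.split())))]


def model_score(features, weights, source, target):
    """Linear model: sum of weight * score over every feature function."""
    total = 0.0
    for ff in features:
        scores = ff.evaluate(source, target)
        ws = weights[ff.name]
        if len(scores) != len(ws):
            raise ValueError(f"{ff.name}: {len(scores)} scores but {len(ws)} weights")
        total += sum(w * h for w, h in zip(ws, scores))
    return total


features = [TargetWordCount("TargetWordCount0"), LengthMismatch("LengthMismatch0")]
weights = {"TargetWordCount0": [0.5], "LengthMismatch0": [1.0]}
print(model_score(features, weights, "das haus ist klein", "the house is small"))  # 2.0
print(model_score(features, weights, "das haus ist klein", "the house small"))     # 0.5
```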
27. Why did you Refactor?
• Feature Function Framework
– easier to implement new features
– use sparse features
• Simplify class structure
– easier to develop with Moses
• Delete functionality
– easier to refactor code
– very little deletion
Simplify class structure
- to make it easier for us to develop with Moses
- Moses has been around for 8 years now
- everyone has the freedom to add what they want
- no-one is in overall control
- this way of organising an open-source project is great
- we've gotten lots of contributions, lots of features
- the downside
- it has grown organically
- things are not as well structured as they could be
28. Why did you Refactor?
• Feature Function Framework
– easier to implement new features
– use sparse features
• Simplify class structure
– easier to develop with Moses
• Delete functionality
– easier to refactor code
– very little deletion
Why did I delete things?
- I deleted very little
- I’m not the gatekeeper of Moses; I don’t control it
- if some functionality was deleted, it’s not a comment on its usefulness
- purely because it got in the way of the refactoring
Quickly go through the last 2
- before telling you about feature functions
29. Specify a Feature Function
Then….
[lmodel-file]
8 0 3 europarl.en.srilm.gz
[weight-l]
0.142
ini file:
• New Feature Function
– New sections
● [feature-function-file]
● [weight-?]
• Custom code
– Parse ini file
– Initialize feature function
Completely bespoke
- no framework to help you
- if you don’t do it right, it won’t work
30. Adding new Feature Function
Now….
[feature]
KENLM file=path order=0
[weight]
KENLM0= 0.142
ini file:
• New Feature Function
– No new section
● Line in [feature] section
● Line in [weight] section
– Framework
● parse ini file
● initialize feature
Write a class that implements the feature function
The framework does the rest
- no need to create a custom section in the ini file
- or change the StaticData class
- or change the Parameter class
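One way a framework can make "write a class, the framework does the rest" work is a registry: each feature class registers itself under the name used in the ini file, and generic code builds the objects from `[feature]` lines, so nothing like StaticData or Parameter needs to change. A minimal Python sketch of that pattern; the registry, decorator and class names are hypothetical, and this is not Moses's C++ code.

```python
from abc import ABC, abstractmethod

FEATURE_CLASSES = {}  # hypothetical registry: class name used in the ini file -> Python class


def register(class_name):
    """Class decorator: make a feature function available under `class_name`."""
    def decorator(cls):
        FEATURE_CLASSES[class_name] = cls
        return cls
    return decorator


class FeatureFunction(ABC):
    def __init__(self, name, **params):
        self.name = name
        self.params = params  # the key=value pairs from the [feature] line

    @abstractmethod
    def evaluate(self, source, target):
        """Return a list of scores for translating `source` as `target`."""


# Adding a feature means writing this class; the generic loading code below is unchanged.
@register("TargetWordCount")
class TargetWordCount(FeatureFunction):
    def evaluate(self, source, target):
        return [float(len(target.split()))]


def load_features(feature_lines):
    """Generic framework code: build feature objects from [feature] lines."""
    per_class = {}
    features = []
    for line in feature_lines:
        class_name, *tokens = line.split()
        params = dict(token.split("=", 1) for token in tokens)
        index = per_class.get(class_name, 0)
        per_class[class_name] = index + 1
        name = params.pop("name", f"{class_name}{index}")
        if class_name not in FEATURE_CLASSES:
            raise ValueError(f"unknown feature function {class_name!r}")
        features.append(FEATURE_CLASSES[class_name](name, **params))
    return features


features = load_features(["TargetWordCount", "TargetWordCount name=TWC"])
print([ff.name for ff in features])  # ['TargetWordCount0', 'TWC']
```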