2022.01.07
MusicBERT:
Symbolic Music Understanding
with Large-Scale Pre-Training
Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu
ACL 2021
Hyeshin Chu
Contents
• Overview
• Introduction
• Related Work
• Methodology
• Experiments & Results
• Conclusion
Overview
Suggest novel methods to apply NLP approaches to music domain
Introduce MusicBERT, a large-scale pretrained model for symbolic music understanding
Evaluate the performance on four tasks
Contributions
Construct a large-scale symbolic music corpus
– Million-MIDI Dataset (MMD)
Design some mechanisms to enhance
pre-training with symbolic music data
(OctupleMIDI Encoding & Masking Strategies)
Achieve the state-of-the-art results on
four music understanding tasks
: Melody Completion, Accompaniment Suggestion,
Genre Classification, and Style Classification
Related Work
Symbolic Music Understanding
• Word2vec models for music: Huang et al., 2016; Madjiheurem et al., 2016
• Divide music pieces → fixed-duration music slices: Herremans et al., 2017; Chuan et al., 2020
• Limitation: small NN models & only a few music tokens as inputs
Symbolic Music Encoding
• MIDI-based: MIDI; REMI (Huang and Yang, 2020); CP (Hsiao et al., 2021)
• Pianoroll-based: Brunner et al., 2018; Ji et al., 2020
• Limitation: still need long input tokens
Masking Strategies in Pre-training
• Application of masking strategies for music domain: MASS (Song et al., 2019); SpanBERT (Joshi et al., 2020)
• Limitation: not considering the difference between NLP & music
Model Overview
MusicBERT, a large scale Transformer model for symbolic music understanding
Based on Transformer encoder (Vaswani et al., 2017)
A novel encoding method, OctupleMIDI, to encode the music sequence more
efficiently
Predict music tokens as output
OctupleMIDI Encoding
Figure 2. Different encoding methods for symbolic music
Previous MIDI-based representations: Still long for Transformer structure
(computation complexity & learning inefficiency)
OctupleMIDI,
a compact symbolic music encoding method
• Encode 6 notes into 6 tokens
• Much shorter than REMI & CP
• Apply to various kinds of music
Each Octuple token:
• Corresponds to a note
• Contains 8 elements
Time Signature
A fraction (e.g., 2/4):
• Denominator: length of a beat as a note duration (e.g., a quarter note in 2/4)
• Numerator: number of beats in a bar (e.g., 2 beats in 2/4)
Tempo
Beats per minute (BPM)
• Pace of music
• From 16 to 256 for OctupleMIDI
Bar and Position
On-set time of a note
• Bar: 256 bars in a music piece (0 to 255)
• Position: a 1/64 note as the unit to represent the on-set time of a note within a bar (from 0)
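The bar and position elements can be illustrated with a small Python sketch. It assumes a single, constant time signature and an onset already expressed in 1/64 notes; the function name and the error handling are hypothetical and not taken from the MusicBERT code.

```python
from fractions import Fraction
from typing import Tuple

POS_RESOLUTION = 64  # positions are counted in 1/64 notes (from the slide)
MAX_BARS = 256       # bar index runs from 0 to 255 (from the slide)


def onset_to_bar_position(onset_in_64ths: int, numerator: int, denominator: int) -> Tuple[int, int]:
    """Split an absolute onset (in 1/64 notes) into (bar, position within the bar).

    Assumption: the time signature numerator/denominator is constant for the piece.
    """
    # A bar in numerator/denominator time lasts numerator * (64 / denominator) 1/64 notes.
    bar_length = Fraction(numerator * POS_RESOLUTION, denominator)
    if bar_length.denominator != 1:
        raise ValueError("bar length is not a whole number of 1/64 notes")
    bar, position = divmod(onset_in_64ths, int(bar_length))
    if not 0 <= bar < MAX_BARS:
        raise ValueError("bar index outside the 0..255 range")
    return bar, position


# In 2/4 a bar lasts 32 positions, so onset 40 falls in bar 1 at position 8.
print(onset_to_bar_position(40, 2, 4))
```

Keeping the bar and the in-bar position as separate elements keeps both vocabularies small, which is one reason a note fits into a single compact token.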
Instrument
Follow MIDI format
• 129 tokens to represent instruments
• 0 to 127: different general instruments (e.g., piano and bass)
• 128: special percussion instrument (e.g., drum)
Pitch
Note pitches for general instruments
• 128 tokens to represent pitch values (follow MIDI format)
Note pitches for percussion instruments
• 128 pitch tokens to represent percussion type
Duration
Note duration
• 128 tokens (percussion: all set to 0)
Velocity
Quantize the velocity of a note into 32 different values
• Interval of 4 (e.g., 2, 6, 10, … , 122, 126)
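As a rough illustration of the eight elements, the sketch below models one octuple token in Python. The field order follows the slides, the value ranges are the ones listed above, and the names `Octuple`, `quantize_velocity`, and `make_octuple` are hypothetical. The bucket-centre rounding rule for velocity is an assumption chosen to reproduce the 2, 6, 10, … , 126 series.

```python
from typing import NamedTuple, Tuple

PERCUSSION = 128  # instrument index reserved for percussion (from the slide)


class Octuple(NamedTuple):
    """One note as eight elements (hypothetical container, slide order)."""
    time_signature: Tuple[int, int]
    tempo: int
    bar: int
    position: int
    instrument: int
    pitch: int
    duration: int
    velocity: int


def quantize_velocity(velocity: int) -> int:
    """Map a MIDI velocity (0..127) to one of 32 values: 2, 6, 10, ..., 126.

    Assumption: each bucket of width 4 is represented by its centre value.
    """
    if not 0 <= velocity <= 127:
        raise ValueError("MIDI velocity must be in 0..127")
    return (velocity // 4) * 4 + 2


def make_octuple(time_signature, tempo, bar, position, instrument, pitch, duration, velocity):
    """Validate the ranges given on the slides and build an Octuple."""
    if not 16 <= tempo <= 256:
        raise ValueError("tempo outside 16..256 BPM")
    if not 0 <= bar <= 255:
        raise ValueError("bar outside 0..255")
    if position < 0:
        raise ValueError("position must start from 0")
    if not 0 <= instrument <= 128:
        raise ValueError("instrument outside 0..128")
    if not 0 <= pitch <= 127:
        raise ValueError("pitch outside 0..127")
    if not 0 <= duration <= 127:
        raise ValueError("duration index outside 0..127")
    if instrument == PERCUSSION:
        duration = 0  # the slide states percussion durations are all set to 0
    return Octuple(time_signature, tempo, bar, position,
                   instrument, pitch, duration, quantize_velocity(velocity))


note = make_octuple((2, 4), 120, 0, 8, PERCUSSION, 36, 16, 100)
print(note)
```

Because every note carries all eight elements, a sequence of n notes is exactly n tokens long.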
Masking Strategy
Bar-level masking strategy:
Elements of the same type in the same bar are masked simultaneously
→ Avoids information leakage & learns the contextual representation well
Pre-training Corpus
Table 2. Size of different music datasets
OctupleMIDI encoding is universal
→ Most MIDI files can be converted without noticeable loss of musical information
→ Cleaning and deduplication
→ Obtain Million-MIDI Dataset (MMD): 1.5 million songs with 2 billion octuple tokens (musical notes)
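The slides do not say how duplicates were detected. One generic option, shown here only as an assumption, is to fingerprint each song by its note content and keep the first song per fingerprint. The helper names and the choice of (instrument, pitch) pairs as the fingerprint input are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Note = Tuple[int, int]  # (instrument, pitch), an illustrative choice of content key


def content_fingerprint(notes: List[Note]) -> str:
    """Hash the note content so that re-uploads of the same music collide."""
    digest = hashlib.md5()
    for instrument, pitch in notes:
        digest.update(f"{instrument}:{pitch};".encode("ascii"))
    return digest.hexdigest()


def deduplicate(songs: Dict[str, List[Note]]) -> Dict[str, List[Note]]:
    """Keep the first song seen for every distinct fingerprint."""
    seen = set()
    kept = {}
    for name, notes in songs.items():
        fingerprint = content_fingerprint(notes)
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept[name] = notes
    return kept


songs = {
    "a.mid": [(0, 60), (0, 64), (0, 67)],
    "a_copy.mid": [(0, 60), (0, 64), (0, 67)],  # same content, different file name
    "b.mid": [(128, 36), (128, 38)],
}
print(sorted(deduplicate(songs)))
```

Hashing content rather than file bytes means copies that differ only in metadata or file name still count as duplicates.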
Experiments & Results
Pre-training Setup
Table 4. Model configurations of MusicBERT
• Small MusicBERT: to compare with baselines (similar data size)
• Base MusicBERT: to achieve the SOTA results
Fine-tuning MusicBERT
Four downstream tasks: Melody Completion, Accompaniment Suggestion, Genre & Style Classification
Table 3. Results of different models on the four downstream tasks
Melody Completion
• Task: find the most matched consecutive phrase in a given set of candidates for a given melodic phrase
• Evaluation: the rate of correctly chosen phrases in the top k candidates
• Best performance: MusicBERT_small, MusicBERT_base
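The evaluation rule above (the share of queries whose correct phrase appears among the top k candidates, often written HITS@k) can be sketched in a few lines of Python. The function name and the toy rankings are illustrative and are not results from the paper.

```python
from typing import List, Sequence


def hits_at_k(rankings: Sequence[Sequence[int]], correct: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct candidate is ranked within the top k.

    rankings[i] lists candidate ids for query i, best first;
    correct[i] is the id of the true continuation for query i.
    """
    if len(rankings) != len(correct) or not rankings:
        raise ValueError("need one correct id per non-empty ranking list")
    hits = sum(1 for ranking, answer in zip(rankings, correct) if answer in ranking[:k])
    return hits / len(rankings)


# Toy example: three queries, candidate id 1 is always the correct phrase.
rankings: List[List[int]] = [[3, 1, 2], [0, 2, 1], [1, 0, 2]]
correct = [1, 1, 1]
for k in (1, 2, 3):
    print(k, hits_at_k(rankings, correct, k))
```

The same function applies to accompaniment suggestion, since that task is scored with the same top-k rule.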
Accompaniment Suggestion
• Task: find the most related accompaniment phrase in a given set of harmonic phrase candidates for a given melodic phrase
• Evaluation: the rate of correctly chosen phrases in the top k candidates
• Best performance: MusicBERT_small, MusicBERT_base
Genre & Style Classification
• Task: classify the genre and style
• Dataset: TOP-MAGD for genre, MASD for style
• Evaluation: F1-micro score
• Best performance: MusicBERT_small, MusicBERT_base
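For readers unfamiliar with the F1-micro score, the sketch below computes it by pooling true positives, false positives, and false negatives over all songs and labels before taking precision and recall. The label sets are invented toy data, and the function is a plain-Python illustration rather than the evaluation script used in the paper.

```python
from typing import Iterable, Set


def f1_micro(true_labels: Iterable[Set[str]], predicted_labels: Iterable[Set[str]]) -> float:
    """Micro-averaged F1: counts are summed over every sample and label first."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        tp += len(truth & predicted)   # labels predicted and correct
        fp += len(predicted - truth)   # labels predicted but wrong
        fn += len(truth - predicted)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 correct labels, 1 wrong prediction, 2 missed labels.
truth = [{"rock"}, {"pop", "jazz"}, {"folk"}]
predicted = [{"rock"}, {"pop"}, {"jazz"}]
print(f1_micro(truth, predicted))
```

Micro-averaging weights every label decision equally, so frequent genres influence the score more than rare ones.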
Method Analysis
Experiments on MusicBERT_small:
• Effectiveness of OctupleMIDI
• Effectiveness of Bar-Level Masking
• Effectiveness of Pre-training
Effectiveness of OctupleMIDI
OctupleMIDI significantly outperforms REMI and CP: the model learns from a larger proportion of a music song with the compact OctupleMIDI encoding
Table 5. Results of different encoding methods
Effectiveness of Bar-Level Masking
Experiment on MusicBERT_small
• Random: randomly mask the elements in the octuple tokens
• Octuple: randomly mask some octuple tokens (mask all the elements in an octuple token)
• Bar: the elements of the same type in the same bar are masked simultaneously
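The three strategies differ only in the unit that receives a single mask decision. The Python sketch below builds a boolean mask with one row per octuple token and one column per element. The masking probability and the function names are hypothetical, and the token replacement step of BERT-style training is left out.

```python
import random
from typing import List

N_ELEMENTS = 8  # elements per octuple token


def random_mask(n_tokens: int, prob: float, rng: random.Random) -> List[List[bool]]:
    """Random: every element of every token gets its own decision."""
    return [[rng.random() < prob for _ in range(N_ELEMENTS)] for _ in range(n_tokens)]


def octuple_mask(n_tokens: int, prob: float, rng: random.Random) -> List[List[bool]]:
    """Octuple: one decision per token, applied to all eight elements."""
    return [[rng.random() < prob] * N_ELEMENTS for _ in range(n_tokens)]


def bar_mask(bars: List[int], prob: float, rng: random.Random) -> List[List[bool]]:
    """Bar: one decision per (bar, element type), shared by all tokens in that bar."""
    decisions = {}
    mask = []
    for bar in bars:
        row = []
        for element in range(N_ELEMENTS):
            key = (bar, element)
            if key not in decisions:
                decisions[key] = rng.random() < prob
            row.append(decisions[key])
        mask.append(row)
    return mask


bars = [0, 0, 0, 1, 1]  # bar index of each token in a toy sequence
mask = bar_mask(bars, 0.5, random.Random(0))
for row in mask:
    print("".join("M" if m else "." for m in row))
```

With bar-level masking, a masked element such as tempo cannot simply be copied from a neighbouring note in the same bar, which is the leakage the strategy is meant to prevent.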
Effectiveness of Pre-training
Experiment on MusicBERT_small
Pre-training is critical for symbolic music understanding
Conclusion
Propose OctupleMIDI encoding & bar-level masking strategy for music
domain
Develop MusicBERT, a large-scale pre-trained model
for symbolic music understanding
Achieve state-of-the-art performance on
all four evaluated symbolic music understanding tasks
For my research
Acquire some baseline models & datasets to review
Understand new symbolic music representation method
Learn how to design experiments to measure each feature of a model
Thank you

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

  • 1. 2022.01.07 MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu ACL 2021 Hyeshin Chu
  • 2. Contents • Overview • Introduction • Related Work • Methodology • Experiments & Results • Conclusion
  • 3. Overview: Suggest novel methods to apply NLP approaches to the music domain. Introduce MusicBERT, a large-scale pre-trained model for symbolic music understanding. Evaluate its performance on four tasks.
  • 5. Contributions: (1) Construct a large-scale symbolic music corpus, the Million MIDI Dataset (MMD). (2) Design mechanisms to enhance pre-training with symbolic music data (OctupleMIDI encoding and masking strategies). (3) Achieve state-of-the-art results on four music understanding tasks: melody completion, accompaniment suggestion, genre classification, and style classification.
  • 6. Related Work: Symbolic music understanding: Word2vec models for music (Huang et al., 2016; Madjiheurem et al., 2016); dividing music pieces → fixed-duration music slices (Herremans et al., 2017; Chuan et al., 2020); limitation: small NN models and only a few music tokens as inputs. Symbolic music encoding: MIDI-based (MIDI; REMI, Huang and Yang, 2020; CP, Hsiao et al., 2021) and pianoroll-based (Brunner et al., 2018; Ji et al., 2020); limitation: still need long input token sequences. Masking strategies in pre-training: masking strategies applied to the music domain (MASS, Song et al., 2019; SpanBERT, Joshi et al., 2020); limitation: do not consider the differences between NLP and music.
  • 7. Model Overview: MusicBERT, a large-scale Transformer model for symbolic music understanding.
  • 8. Model Overview: MusicBERT, a large-scale Transformer model for symbolic music understanding. Based on the Transformer encoder (Vaswani et al., 2017).
  • 9. Model Overview: MusicBERT, a large-scale Transformer model for symbolic music understanding. Uses a novel encoding method, OctupleMIDI, to encode the music sequence more efficiently.
  • 10. Model Overview: MusicBERT, a large-scale Transformer model for symbolic music understanding. Predicts music tokens as output.
  • 11. OctupleMIDI Encoding: Figure 2. Different encoding methods for symbolic music.
  • 12. OctupleMIDI Encoding: Previous MIDI-based representations are still too long for the Transformer structure (computational complexity and learning inefficiency).
  • 13. OctupleMIDI Encoding: OctupleMIDI, a compact symbolic music encoding method. • Encodes 6 notes into 6 tokens • Much shorter than REMI and CP • Applies to various kinds of music
  • 15. OctupleMIDI Encoding: OctupleMIDI, a compact symbolic music encoding method. • Encodes 6 notes into 6 tokens • Much shorter than REMI and CP • Applies to various kinds of music. Each octuple token: • Corresponds to a note • Contains 8 elements
  • 16. OctupleMIDI Encoding: Time Signature: a fraction (e.g., 2/4) giving the length of a beat (a note duration, e.g., a quarter note in 2/4) and the number of beats in a bar (e.g., 2 beats in 2/4). Tempo: beats per minute (BPM), the pace of the music; from 16 to 256 for OctupleMIDI. Bar and Position: the onset time of a note; 256 bars in a music piece (0 to 255), with a 1/64 note as the unit for the onset time of a note (counted from 0).
  • 17. OctupleMIDI Encoding: Instrument: follows the MIDI format; 129 tokens, where 0 to 127 are general instruments (e.g., piano and bass) and 128 is the special percussion instrument (e.g., drum). Pitch: for general instruments, 128 tokens represent pitch values (following the MIDI format); for percussion instruments, 128 pitch tokens represent the percussion type. Duration: note duration, 128 tokens (percussion: all set to 0). Velocity: the velocity of a note is quantized into 32 values with an interval of 4 (e.g., 2, 6, 10, ..., 122, 126).
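The eight elements listed on slides 16 and 17 can be pictured as one small record per note. The following is a minimal sketch, not the MusicBERT implementation: the element order, storing tempo as raw BPM, and binning velocity by integer division are assumptions, and the names `OctupleToken`, `quantize_velocity`, `velocity_bin_center`, and `positions_per_bar` are hypothetical.

```python
from dataclasses import dataclass

# Ranges as stated on the slides.
NUM_BARS = 256          # bar index 0..255
NUM_INSTRUMENTS = 129   # 0..127 general instruments, 128 percussion
NUM_PITCHES = 128
NUM_DURATIONS = 128
NUM_VELOCITIES = 32     # velocity quantized with an interval of 4


@dataclass(frozen=True)
class OctupleToken:
    """One note as 8 elements (hypothetical container, order assumed)."""
    time_signature: tuple  # (beats per bar, beat unit), e.g. (2, 4)
    tempo: int             # assumed to be stored as BPM, 16..256
    bar: int               # 0..255
    position: int          # onset in 1/64-note units, counted from 0
    instrument: int        # 0..128
    pitch: int             # 0..127
    duration: int          # duration token index, 0..127
    velocity: int          # velocity bin index, 0..31

    def __post_init__(self):
        checks = [
            16 <= self.tempo <= 256,
            0 <= self.bar < NUM_BARS,
            self.position >= 0,
            0 <= self.instrument < NUM_INSTRUMENTS,
            0 <= self.pitch < NUM_PITCHES,
            0 <= self.duration < NUM_DURATIONS,
            0 <= self.velocity < NUM_VELOCITIES,
        ]
        if not all(checks):
            raise ValueError("element out of range")


def quantize_velocity(midi_velocity: int) -> int:
    """Map MIDI velocity 0..127 to a bin 0..31 (assumed: floor division by 4)."""
    if not 0 <= midi_velocity <= 127:
        raise ValueError("MIDI velocity must be in 0..127")
    return midi_velocity // 4


def velocity_bin_center(bin_index: int) -> int:
    """Representative value of a bin: 2, 6, 10, ..., 126."""
    return 4 * bin_index + 2


def positions_per_bar(numerator: int, denominator: int) -> int:
    """Number of 1/64-note positions in one bar of the given time signature."""
    return numerator * 64 // denominator


# Usage: a piano note (instrument 0) at the start of bar 0 in 2/4 time.
note = OctupleToken((2, 4), 120, 0, 0, 0, 60, 8, quantize_velocity(100))
```

Under these assumptions a bar in 2/4 holds 32 onset positions, and a sequence of N notes yields exactly N tokens, which is where the compactness claimed on the slides comes from.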
  • 18. Masking Strategy: Bar-level masking strategy: elements of the same type in the same bar are masked simultaneously → avoids information leakage and helps the model learn contextual representations well.
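The bar-level masking idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: tokens are 8-tuples with the bar number at index 2, each (bar, element type) pair is masked independently with probability `mask_prob`, and the mask id `-1` and the function name `bar_level_mask` are hypothetical.

```python
import random

MASK = -1          # hypothetical mask id; real element values are >= 0
NUM_ELEMENTS = 8   # elements per octuple token
BAR_INDEX = 2      # assumed position of the bar element in each tuple


def bar_level_mask(tokens, mask_prob=0.15, seed=None):
    """Mask all elements of one type within one bar at the same time.

    tokens: list of 8-tuples of non-negative ints, one tuple per note.
    Returns a new list of tuples; the input is left unchanged.
    """
    rng = random.Random(seed)
    masked = [list(t) for t in tokens]
    bars = sorted({t[BAR_INDEX] for t in tokens})
    for bar in bars:
        for elem in range(NUM_ELEMENTS):
            # One random draw per (bar, element type) pair.
            if rng.random() < mask_prob:
                for original, target in zip(tokens, masked):
                    # Group by the original bar so masking the bar
                    # element itself does not break the grouping.
                    if original[BAR_INDEX] == bar:
                        target[elem] = MASK
    return [tuple(t) for t in masked]


# Usage: two notes in bar 0 and two in bar 1.
example = [
    (0, 120, 0, 0, 0, 60, 8, 20),
    (0, 120, 0, 16, 0, 64, 8, 20),
    (0, 120, 1, 0, 0, 67, 8, 22),
    (0, 120, 1, 16, 128, 36, 0, 25),
]
masked_example = bar_level_mask(example, mask_prob=0.5, seed=0)
```

Because a masked element type is hidden for every note in the bar, the model cannot copy a value such as tempo or time signature from a neighboring note in the same bar, which is the information leakage the slide refers to.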
  • 19. Pre-training Corpus: Table 2. Size of different music datasets. OctupleMIDI encoding is universal → most MIDI files can be converted without noticeable loss of musical information → cleaning and deduplication → obtain the Million MIDI Dataset (MMD): 1.5 million songs with 2 billion octuple tokens (musical notes).
  • 20. Experiments & Results (sections: Pre-training Setup, Fine-tuning MusicBERT, Method Analysis): Table 4. Model configurations of MusicBERT. Small MusicBERT: to compare with baselines (similar data size). Base MusicBERT: to achieve the SOTA results.
  • 21. Experiments & Results: Four downstream tasks: Melody Completion, Accompaniment Suggestion, Genre & Style Classification. Table 3. Results of different models on the four downstream tasks.
  • 22. Experiments & Results: Melody Completion (Table 3. Results of different models on the four downstream tasks). Task: find the best-matching consecutive phrase for a given melodic phrase in a given set of candidates. Evaluation: the rate at which the correct phrase appears among the top k candidates. Best performance: MusicBERT_small, MusicBERT_base.
  • 23. Experiments & Results: Accompaniment Suggestion (Table 3. Results of different models on the four downstream tasks). Task: find the most related accompaniment phrase for a given melodic phrase in a given set of harmonic phrase candidates. Evaluation: the rate at which the correct phrase appears among the top k candidates. Best performance: MusicBERT_small, MusicBERT_base.
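The top-k evaluation used for the two phrase-matching tasks can be sketched as follows. This assumes one correct candidate per query and one model score per candidate, which is a simplification; the function names `hits_at_k` and `mean_hits_at_k` and the scores in the usage lines are hypothetical.

```python
def hits_at_k(scores, correct_index, k):
    """Return 1 if the correct candidate is among the k highest-scored ones.

    scores: one score per candidate (higher means a better match).
    Ties keep the original candidate order (stable sort).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return int(correct_index in ranked[:k])


def mean_hits_at_k(examples, k):
    """Average hits@k over a list of (scores, correct_index) pairs."""
    if not examples:
        raise ValueError("need at least one example")
    return sum(hits_at_k(s, c, k) for s, c in examples) / len(examples)


# Usage: three queries with three candidates each (made-up scores).
queries = [
    ([0.1, 0.9, 0.3], 1),
    ([0.8, 0.2, 0.5], 2),
    ([0.4, 0.6, 0.5], 0),
]
rate_top1 = mean_hits_at_k(queries, k=1)
rate_top2 = mean_hits_at_k(queries, k=2)
```

With these made-up scores, only the first query ranks its correct candidate first, while the first two queries rank it within the top two.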
  • 24. Experiments & Results: Genre & Style Classification (Table 3. Results of different models on the four downstream tasks). Task: classify the genre and style. Dataset: TOP-MAGD for genre, MASD for style. Evaluation: F1-micro score. Best performance: MusicBERT_small, MusicBERT_base.
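The F1-micro score pools true positives, false positives, and false negatives over all examples and labels before computing precision and recall. A minimal sketch, assuming each song may carry several labels (represented as sets); the function name `f1_micro` and the label names are illustrative and not taken from TOP-MAGD or MASD:

```python
def f1_micro(true_labels, predicted_labels):
    """Micro-averaged F1 over paired collections of label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage with made-up labels: 2 true positives, 1 false positive, 1 false negative.
truth = [{"rock", "pop"}, {"jazz"}]
preds = [{"rock"}, {"jazz", "blues"}]
score = f1_micro(truth, preds)
```

Here precision and recall are both 2/3, so the micro F1 is also 2/3.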
  • 25. Experiments & Results: Method Analysis (experiments on MusicBERT_small; parts: Effectiveness of OctupleMIDI, Effectiveness of Bar-Level Masking, Effectiveness of Pre-training). Effectiveness of OctupleMIDI: OctupleMIDI significantly outperforms REMI and CP, because the compact OctupleMIDI encoding lets the model learn from a larger proportion of a song. Table 5. Results of different encoding methods.
  • 26. Experiments & Results: Method Analysis (experiments on MusicBERT_small). Effectiveness of Bar-Level Masking: Random: randomly masks elements in the octuple tokens. Octuple: randomly masks some octuple tokens (all the elements in an octuple token are masked). Bar: the elements of the same type in the same bar are masked simultaneously.
  • 27. Experiments & Results: Method Analysis (experiments on MusicBERT_small). Effectiveness of Pre-training: pre-training is critical for symbolic music understanding.
  • 28. Conclusion: Propose the OctupleMIDI encoding and a bar-level masking strategy for the music domain. Develop MusicBERT, a large-scale pre-trained model for symbolic music understanding. Achieve state-of-the-art performance on all four evaluated symbolic music understanding tasks.
  • 29. For my research: Acquire some baseline models and datasets to review. Understand a new symbolic music representation method. Learn how to design experiments that measure each feature of a model.