Research on Automatic Music Composition at the Taiwan AI Labs, April 2020
Yi-Hsuan Yang, Ph.D.
Taiwan AI Labs
yhyang@ailabs.tw
http://mac.citi.sinica.edu.tw/~yang/
About the Music AI Team @ Taiwan AI Labs
• About Taiwan AI Labs
Ø Privately-funded research organization, founded by Ethan Tu (PTT) in 2017
Ø Three main research areas: 1) HCI, 2) medicine, 3) smart city
• About the Music AI team
Ø Members
- scientist [me; since March 2019]
- ML engineers (for models)
- musicians
- program manager
- software engineers (for frontend/backend)
(image of our musicians, processed by neural image style transfer)
Well-Established Music Technology:
Making Sounds
• Music synthesizers that make realistic sounds (e.g., electric piano) and new sounds (e.g., electric guitar)
• Based on digital signal processing
Sources:
https://www.wardbrodt.com/blog/history-of-the-electronic-keyboard-infographic-madison-wisconsin
https://www.musicnexo.com/blog/en/history-of-the-electric-guitar-eternal-youth/
https://freesound.org/people/karolist/sounds/370934/
Emerging Music Technology:
Making Music
• Computers that can understand (existing) music and
create new music performances
• Based on machine learning
• Example: “Bach Doodle” by Google
(https://www.google.com/doodles/celebrating-johann-sebastian-bach)
https://youtu.be/gsUV0mGEGaY
(image is from Google Magenta’s website)
Use Cases
• Make musicians' lives easier
Ø inspire ideas
Ø suggest continuations
Ø suggest accompaniments
• Empower everyone to make music
Ø "democratization" of music creation
• Create copyright-free music for videos or games
• Music education
Growing Interest in Music Composing AI
(images are from the internet)
Google’s Magenta Studio: A DAW Plugin
https://magenta.tensorflow.org/studio/
• “Continue,” “Generate 4 bars,” “Drumify,” “Interpolate,” “Groove”
(image is from Google Magenta’s website)
Our View
• There are moments in life that we treasure. How do we musicalize those moments?
• Streaming music cannot really
interact with us
• Interaction is the key
• And we focus on pop music
(images are from the internet)
Envisioned Paradigm Shift
(images are from the internet)
Deep Learning-based Approaches
• Question: what are the input and output of an automatic music composition model?
(image is from the internet)
Two Scenarios
• Unconditional: generate from scratch
• Conditional: generate given, e.g., a prime melody, tags, lyrics, or video clips
(Slide made by Hao-Min Liu)
Beyond Melody
(diagram: beyond the melody, a model may also generate chords (e.g., C, Am, Dm, G7), drums, and other instruments, either unconditionally or conditionally)
(Slide made by Hao-Min Liu)
Two Main Approaches to
Automatic Music Composition
(1) Consider music as an image (e.g., MuseGAN [dong18aaai])
Ø pros: a bird's-eye view of what's going on
Ø cons: has no notion of "notes"
(image is from the internet)
Example Image-based Model: MuseGAN
• Based on GAN (generative adversarial network)
Ø https://salu133445.github.io/musegan/results
Ø https://salu133445.github.io/ismir2019tutorial/
"MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment", AAAI 2018 (from my group at Academia Sinica)
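As a toy illustration of the "music as an image" view, a piece can be rendered as a binary piano-roll matrix (time steps x pitches), the kind of multi-track tensor a MuseGAN-style GAN learns to generate. The `to_pianoroll` helper below is hypothetical and greatly simplified; the real MuseGAN pipeline (e.g., the pypianoroll library) differs in detail.

```python
# Sketch: encoding notes as a binary piano-roll "image" (time steps x pitches).
# Illustrative only; the actual MuseGAN data pipeline uses pypianoroll tensors.

def to_pianoroll(notes, n_steps, n_pitches=128):
    """notes: list of (start_step, end_step, midi_pitch) tuples."""
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for start, end, pitch in notes:
        for t in range(start, min(end, n_steps)):
            roll[t][pitch] = 1  # mark this pitch as sounding at step t
    return roll

# A C-major triad (C4, E4, G4) held for 4 time steps:
roll = to_pianoroll([(0, 4, 60), (0, 4, 64), (0, 4, 67)], n_steps=8)
print(sum(map(sum, roll)))  # 12 active cells
```

A GAN in this view simply learns to output such matrices, which is why it sees the whole "picture" at once but has no explicit concept of a note event.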
Two Main Approaches to
Automatic Music Composition
(2) Consider music as a language (e.g., [huang19iclr])
Ø pros: does have a notion of "notes"
Ø cons: lacks a bird's-eye view of what's going on (because a music piece now becomes a sequence of "tokens")
Ø becomes a "natural language generation" (NLG) problem
(image is from the ICLR'19 paper)
Differences between Music and Text
• Music is “polyphonic”: there can be multiple notes
sounding at the same time
Ø Therefore, when converting a music piece into a sequence of tokens, we need a specialized token called "TIME-SHIFT"
“Music Transformer: Generating music with long-term structure”, ICLR 2019
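The TIME-SHIFT idea can be sketched in a few lines: interleave NOTE-ON/NOTE-OFF events with TIME-SHIFT tokens that advance the clock, so simultaneous notes simply share the same time position. This is an illustrative simplification (integer time steps, no velocity), not Music Transformer's exact vocabulary.

```python
# Sketch: serializing polyphonic notes into a MIDI-like token sequence with
# NOTE_ON / NOTE_OFF / TIME_SHIFT events (simplified for illustration).

def to_event_tokens(notes):
    """notes: list of (start, end, pitch) in integer time steps."""
    events = []  # (time, priority, token); offs sort before ons at the same instant
    for start, end, pitch in notes:
        events.append((start, 1, f"NOTE_ON_{pitch}"))
        events.append((end, 0, f"NOTE_OFF_{pitch}"))
    events.sort()
    tokens, now = [], 0
    for time, _, token in events:
        if time > now:                              # advance the clock only when
            tokens.append(f"TIME_SHIFT_{time - now}")  # the next event is later
            now = time
        tokens.append(token)
    return tokens

# Two simultaneous notes (an interval) followed by a single note:
print(to_event_tokens([(0, 2, 60), (0, 2, 64), (2, 4, 67)]))
```

Note that the two simultaneous notes produce consecutive NOTE_ON tokens with no TIME_SHIFT between them, which is exactly how a token sequence expresses polyphony.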
Differences between Music and Text
• Music is "polyphonic": there can be multiple notes sounding at the same time
Ø And, for multi-instrument music, we specify the instrument that plays each note
"LakhNES: Improving multi-instrumental music generation with cross-domain pre-training", ISMIR 2019
“Transformer”: State-of-the-art Approach
• Attention is all you need
Ø Adopted by Google Magenta [huang19iclr], OpenAI, etc.
Ø Uses self-attention to learn the dependencies between tokens
(image is from the Transformer paper)
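A minimal sketch of that self-attention mechanism, assuming a single head with Q = K = V and no learned projections: each token's output is a similarity-weighted mixture of all tokens, which is how the model captures dependencies between distant tokens.

```python
import math

# Sketch: scaled dot-product self-attention in plain Python
# (no batching, no multi-head, no learned Q/K/V projections).

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """tokens: list of d-dimensional vectors; here Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]          # similarity of q to every token
        weights = softmax(scores)           # attention distribution (sums to 1)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])     # weighted sum of value vectors
    return out

out = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Because every token attends to every other token regardless of distance, self-attention is well suited to the long-range repetition structure of music.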
Google’s Music Transformer
https://magenta.github.io/listen-to-transformer/
Google’s Music Transformer Is NOT
Designed for Pop Music
• No built-in sense of “bars”
(figure: "downbeat probability" curve plotted over the generated piano roll)
“Pop Music Transformer”
• Our model (arXiv:2002.00212)
Ø generates minute-long pop piano music with expressive, coherent, and clear rhythmic and harmonic structure, without needing any post-processing to refine the result
Ø https://soundcloud.com/yating_ai/sets/ai-piano-generation-demo-202004
(figure: "downbeat probability" curve plotted over the generated piano roll)
Work done by Yu-Siang Huang
Versus Google's Music Transformer
https://magenta.github.io/listen-to-transformer/
Google's original model is trained on classical music using Transformer; we retrain it on our pop dataset using Transformer-XL.
Our model ("REMI") beats Google's model ("Baselines 1 & 3") in a user study involving 76 raters.
We Can Also Add Drums
• It is easy to add drums to the piano music generated by our model, because the generated piano track has clearer rhythmic patterns
Ø https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
(Image from the paper "A Review of Automatic Drum Transcription")
Work done by Wen-Yi Hsiao
Guitar Transformer
• The same model, after some proper modifications,
can also generate fingerstyle guitar music
Ø https://soundcloud.com/yating_ai/ai-guitar-tab-generation-202003/s-KHozfW0PTv5
Ø Adds string & fret information
Ø Right-hand techniques: slap, press, hit top, upstroke, downstroke
Work done by Yu-Hua Chen
(images are from the internet)
https://www.youtube.com/watch?v=9ZIJrr6lmHg
Play Mode: Jamming with Our AI
• Yeh et al., “Learning to generate Jazz and Pop piano music from audio via MIR
techniques,” ISMIR-LBD 2019
• Hsiao et al., “Jamming with Yating: Interactive demonstration of a music
composition AI,” ISMIR-LBD 2019
Work done by Yin-Cheng Yeh & Wen-Yi Hsiao
Play Mode 2: Generate by Drawing
• What does “love” sound like?
https://ailabs.tw/yating-music-piano-demo/
Work done by Vibert Thio
How Did We Make It?
The “MIR4generation” Pipeline
• Use "music information retrieval" (MIR) techniques to give the AI some knowledge of music
Ø Chord: for harmonic structure
Ø Beat/downbeat: for rhythmic structure
Ø (The AI needs to understand music better!)
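As a toy example of the kind of MIR annotation involved, chord labels can be obtained by matching the sounding pitch classes against chord templates. This simplified stand-in is illustrative only, not the team's actual chord-recognition model.

```python
# Sketch: toy chord labeling by pitch-class template matching, to illustrate
# the MIR analysis (chord, beat) used to annotate training data.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TEMPLATES = {"maj": {0, 4, 7}, "min": {0, 3, 7}}  # intervals above the root

def label_chord(midi_pitches):
    """Return the best-matching major/minor triad for a set of MIDI pitches."""
    pcs = {p % 12 for p in midi_pitches}  # fold pitches into pitch classes
    best, best_score = None, -1
    for root in range(12):
        for quality, template in TEMPLATES.items():
            chord_pcs = {(root + i) % 12 for i in template}
            # reward matched chord tones, penalize missing ones
            score = len(pcs & chord_pcs) - len(chord_pcs - pcs)
            if score > best_score:
                best, best_score = f"{NAMES[root]}:{quality}", score
    return best

print(label_chord([60, 64, 67]))  # C:maj
print(label_chord([57, 60, 64]))  # A:min
```

Running a detector like this (plus beat/downbeat tracking) over a corpus yields the chord and metric annotations that the generation model is then trained on.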
The “REMI” Representation of Music
• REMI: REvamped MIDI-like representation of music
Ø Adds chord & tempo related tokens
Ø Uses Note-Duration instead of Note-Off
Ø Uses Position & Bar instead of Time-Shift
The “REMI” Representation of Music
• Use Note-Duration instead of Note-Off
Ø because it is hard for a model to correctly pair Note-On and Note-Off events
Ø in contrast, Note-On & Note-Duration tokens always occur together
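The idea can be sketched as follows: pair each Note-Off with its Note-On during preprocessing, so the model only ever sees a Note-On followed immediately by its Note-Duration. Token spellings here are illustrative, not REMI's exact vocabulary.

```python
# Sketch: rewriting a Note-On/Note-Off event stream into REMI-style
# Note-On + Note-Duration pairs, so the model never has to match up
# distant on/off events at generation time.

def to_note_duration(events):
    """events: list of (time, kind, pitch), kind in {'on', 'off'};
    assumes no overlapping notes of the same pitch."""
    open_notes, notes = {}, []
    for time, kind, pitch in sorted(events):
        if kind == "on":
            open_notes[pitch] = time          # remember when the note started
        else:
            start = open_notes.pop(pitch)     # close it and compute its length
            notes.append((start, f"Note-On_{pitch}", f"Note-Duration_{time - start}"))
    notes.sort()                              # emit in onset order
    return [t for _, on, dur in notes for t in (on, dur)]

print(to_note_duration([(0, "on", 60), (4, "off", 60), (2, "on", 64), (3, "off", 64)]))
```

The pairing work is done once in preprocessing, where it is trivial, instead of being something the generative model must learn to get right.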
The “REMI” Representation of Music
• Use Position & Bar instead of Time-Shift
Ø Position(1/16), Position(2/16), … Position(*/16), …
Ø provides an explicit metric grid for the model
The "REMI" Representation of Music
• Use Position & Bar instead of Time-Shift
Ø makes it easy to add "bar-level" conditions
Ø makes it easy to sync different tracks (e.g., piano & drums)
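The metric grid can be sketched as follows, assuming a 16th-note grid in 4/4: emit a Bar token at each barline and a Position(k/16) token before each note. The token spelling is illustrative; see the REMI repo for the exact vocabulary.

```python
# Sketch: placing note onsets on REMI's explicit Bar/Position metric grid
# instead of relative Time-Shift tokens.

STEPS_PER_BAR = 16  # 16th-note grid in 4/4 time

def to_position_bar(onsets):
    """onsets: sorted list of (step, pitch); step is a 16th-note index from 0."""
    tokens, current_bar = [], -1
    for step, pitch in onsets:
        bar, pos = divmod(step, STEPS_PER_BAR)
        while current_bar < bar:              # emit a Bar token per barline crossed
            tokens.append("Bar")
            current_bar += 1
        tokens.append(f"Position({pos + 1}/16)")
        tokens.append(f"Note-On_{pitch}")
    return tokens

print(to_position_bar([(0, 60), (8, 64), (16, 67)]))
```

Because every note is anchored to an absolute position within a bar, timing errors cannot accumulate the way they can with a chain of relative Time-Shift tokens, and tracks sharing the same grid stay in sync.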
Piano Generation + Drum
https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
(figure: piano input track; generated piano & drum tracks)
• Input
• Result of structure analysis
• Result of grooving analysis & added drum loops
(Slide made by Wen-Yi Hsiao)
The “REMI” Representation of Music
• It is easier to learn Position & Bar than to learn Time-Shift
Transformer-XL instead of Transformer
Transformer (NeurIPS, 2017)
Transformer-XL (ACL, 2019)
(images are from the ACL’19 paper)
Piano Generation: Pop Music Transformer
https://arxiv.org/abs/2002.00212
Open Research
Open-source code:
Ø https://github.com/YatingMusic/remi (the music composing AI)
Ø https://github.com/YatingMusic/miditoolkit (prepare MIDI data)
Ø https://github.com/YatingMusic/ReaRender (convert MIDI to audio)
Work done by Yu-Siang Huang & Wen-Yi Hsiao
Ongoing Work: AI Strings
• Generate multi-instrument music
Sample output of the generated result (audible only in the live presentation)
WIP by Yin-Cheng Yeh
Ongoing Work: AI Vocal
• Use GAN to generate realistic vocal timbre & melody
that goes nicely with a given piano track
(Sample output audible only in the live presentation)
WIP by Jen-Yu Liu
Ongoing Work: AI Painter
• AI art + music
Ø music-to-image translation
Ø image-to-music translation
Ø AI canvas as an interface for human-AI interaction
(figure: result of the current model)
WIP by Jen-Yu Liu
Conclusion
• Why automatic music composition?
Ø a friend for musicians
Ø democratization of music creation
• How?
Ø image-based approach
Ø text-based approach
• What we are doing
Ø marry music analysis and music composition (e.g., beat-based music modeling in Pop Music Transformer)
Ø combine image generation and music generation
Ø explore novel ways of human-AI musical interaction