Slides introducing our ongoing projects on automatic music composition at the Yating Music AI Team of the Taiwan AI Labs (https://ailabs.tw/). The following URLs link to some demo audio files we have put on SoundCloud: all of them were fully automatically generated without any manual post-processing or editing.
@ai_piano demo: https://soundcloud.com/yating_ai/sets/ai-piano-generation-demo-202004
@ai_piano+drum demo: https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
@ai_guitar demo: https://soundcloud.com/yating_ai/ai-guitar-tab-generation-202003/s-KHozfW0PTv5
2. About the Music AI Team @ Taiwan AI Labs
• About Taiwan AI Labs
Ø Privately-funded research organization, founded by Ethan Tu (PTT) in 2017
Ø Three main research areas: 1) HCI, 2) medicine, 3) smart city
• About the Music AI team
Ø Members
u scientist [me; since March 2019]
u ML engineers (for models)
u musicians
u program manager
u software engineers (for frontend/backend)
(image of our musicians, processed by neural image style transfer)
3. Well-Established Music Technology:
Making Sounds
• Music synthesizers that make realistic sounds (e.g., electric piano) and new sounds (e.g., electric guitar)
• Based on digital signal processing
Sources:
https://www.wardbrodt.com/blog/history-of-the-electronic-keyboard-infographic-madison-wisconsin
https://www.musicnexo.com/blog/en/history-of-the-electric-guitar-eternal-youth/
https://freesound.org/people/karolist/sounds/370934/
4. Emerging Music Technology:
Making Music
• Computers that can understand (existing) music and
create new music performances
• Based on machine learning
• Example: “Bach Doodle” by Google
(https://www.google.com/doodles/celebrating-johann-sebastian-bach)
https://youtu.be/gsUV0mGEGaY
(image is from Google Magenta’s website)
5. Use Cases
• Make musicians’ lives easier
Ø inspire ideas
Ø suggest continuations
Ø suggest accompaniments
• Empower everyone to make music
Ø “democratization” of music creation
• Create copyright-free music for videos or games
• Music education
7. Google’s Magenta Studio: A DAW Plugin
https://magenta.tensorflow.org/studio/
• “Continue,” “Generate 4 bars,” “Drumify,” “Interpolate,” “Groove”
(image is from Google Magenta’s website)
8. Our View
• There are moments in life that we treasure. How can we musicalize those moments?
• Streaming music cannot really
interact with us
• Interaction is the key
• And, we focus on pop music
(images are from the internet)
13. Two Main Approaches to
Automatic Music Composition
(1) Consider music as an image (e.g., MuseGAN [dong18aaai])
p pros: a bird’s-eye view of what’s going on
p cons: no explicit notion of “notes”
(image is from the internet)
14. Example Image-based Model: MuseGAN
• Based on GAN (generative adversarial network)
p https://salu133445.github.io/musegan/results
p https://salu133445.github.io/ismir2019tutorial/
“MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment”, AAAI 2018 (from my group at Academia Sinica)
15. Two Main Approaches to
Automatic Music Composition
(2) Consider music as a language (e.g., [huang19iclr])
p pros: has an explicit notion of “notes”
p cons: no bird’s-eye view of what’s going on
(because a music piece now becomes a sequence of “tokens”)
p the task becomes a “natural language generation” (NLG) problem
(image is from the ICLR’19 paper)
16. Differences between Music and Text
• Music is “polyphonic”: there can be multiple notes
sounding at the same time
ØTherefore, while converting a music piece into a sequence
of tokens, we need a specialized token called “TIME-SHIFT”
“Music Transformer: Generating music with long-term structure”, ICLR 2019
17. Differences between Music and Text
• Music is “polyphonic”: there can be multiple notes
sounding at the same time
ØTherefore, while converting a music piece into a sequence
of tokens, we need a specialized token called “TIME-SHIFT”
Ø And, for multi-instrument music, we specify the instrument that plays each note
“LakhNES: Improving multi-instrumental music generation with cross-domain pre-training”, ISMIR 2019
18. “Transformer”: State-of-the-art Approach
• Attention is all you need
Ø Adopted by Google Magenta [huang19iclr], OpenAI, etc.
Ø Use self-attention to learn the dependency between tokens
(image is from the Transformer paper)
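For concreteness, a single self-attention head fits in a few lines of NumPy. This is a generic textbook sketch, not the Magenta or OpenAI implementation; every token attends to every other token, and the softmax weights say how strongly:

```python
# Minimal single-head self-attention (illustrative sketch, unmasked).
import numpy as np

def self_attention(x, wq, wk, wv):
    # Project each token embedding to a query, key, and value.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ v                               # mix values by attention

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                # 6 tokens, 8-dim embeddings
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)                           # one mixed vector per token
```

In a real Transformer this runs with multiple heads, causal masking for generation, and learned projection matrices; the dependency-learning mechanism is the same.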
20. Google’s Music Transformer Is NOT
Designed for Pop Music
• No built-in sense of “bars”
(figure: “downbeat probability” curve over the generated piano roll)
21. “Pop Music Transformer”
• Our model (arXiv:2002.00212)
p generates minute-long pop piano music with expressive, coherent and clear structure of rhythm and harmony, without needing any post-processing to refine the result
p https://soundcloud.com/yating_ai/sets/ai-piano-generation-demo-202004
(figure: “downbeat probability” curve over the generated piano roll)
Work done by Yu-Siang Huang
22. Versus Google’s Music Transformer
https://magenta.github.io/listen-to-transformer/
Google’s original model is trained on classical music using Transformer; we retrain it on our Pop dataset using Transformer-XL.
Our model (“REMI”) beats Google’s model (“Baselines 1 & 3”) in a user study that involves 76 raters.
23. We Can Also Add Drums
• It’s easy to add drums to the piano generated by our
model, because the generated piano track has clearer
rhythmic patterns
p https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
(Image from the paper “A Review of Automatic Drum Transcription”)
Work done by Wen-Yi Hsiao
24. Guitar Transformer
• The same model, after some proper modifications,
can also generate fingerstyle guitar music
p https://soundcloud.com/yating_ai/ai-guitar-tab-generation-202003/s-KHozfW0PTv5
p Add string & fret
p Right-hand technique: slap, press, hit top, upstroke,
downstroke
Work done by Yu-Hua Chen
(images are from the internet)
25. https://www.youtube.com/watch?v=9ZIJrr6lmHg
Play Mode: Jamming with Our AI
25
• Yeh et al., “Learning to generate Jazz and Pop piano music from audio via MIR
techniques,” ISMIR-LBD 2019
• Hsiao et al., “Jamming with Yating: Interactive demonstration of a music
composition AI,” ISMIR-LBD 2019
Work done by Yin-Cheng Yeh & Wen-Yi Hsiao
26. Play Mode 2: Generate by Drawing
• What does “love” sound like?
https://ailabs.tw/yating-music-piano-demo/
Work done by Vibert Thio
27. How Did We Make It?
The “MIR4generation” Pipeline
• Use “music information retrieval” (MIR) techniques to give the AI some knowledge of music
p Chord: for harmonic structure
p Beat/downbeat: for rhythmic structure
p (The AI needs to understand music better!)
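As a toy illustration of the chord part of this MIR step, a template-matching chord labeller can be written as follows. This is our own simplification (hypothetical `label_chord` helper, major/minor triads only); the actual pipeline is more sophisticated:

```python
# Toy chord labelling by triad template matching (illustrative sketch).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def label_chord(pitches):
    """Map a bag of MIDI pitches to a major/minor triad label, or 'N'."""
    pcs = {p % 12 for p in pitches}               # fold pitches to pitch classes
    for root in range(12):
        if {root, (root + 4) % 12, (root + 7) % 12} <= pcs:
            return NOTE_NAMES[root] + ":maj"      # root + major third + fifth
        if {root, (root + 3) % 12, (root + 7) % 12} <= pcs:
            return NOTE_NAMES[root] + ":min"      # root + minor third + fifth
    return "N"                                    # no triad template matched

print(label_chord([60, 64, 67]))   # C4, E4, G4
```

Running such a labeller over each beat of a piece yields the harmonic structure that the generator can then condition on.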
28. The “REMI” Representation of Music
• REMI: REvamped MIDI-like representation of music
p Add chord- & tempo-related tokens
p Use `Note-Duration` instead of Note-Off
p Use `Position & Bar` instead of Time-Shift
29. The “REMI” Representation of Music
• Use Note-Duration instead of Note-Off
p because it’s hard for a model to pair note-on and note-off
p in contrast, note-on & note-duration always occur together
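The pairing burden can be made concrete: with on/off tokens, recovering a duration requires matching each “off” back to an unmatched “on” of the same pitch, bookkeeping the model would have to learn implicitly. A sketch (our own hypothetical helper, not REMI code):

```python
# Toy illustration of the bookkeeping forced by Note-Off tokens:
# durations only become known once offs are matched back to ons.

def durations_from_on_off(events):
    """events: list of ('on'|'off', time, pitch) -> [(pitch, start, duration)]."""
    open_notes, result = {}, []
    for kind, time, pitch in events:
        if kind == "on":
            open_notes.setdefault(pitch, []).append(time)
        else:                                   # pair with most recent 'on'
            start = open_notes[pitch].pop()
            result.append((pitch, start, time - start))
    return result

# Two overlapping C4 notes: matching is ambiguous without this bookkeeping.
print(durations_from_on_off([("on", 0, 60), ("on", 2, 60),
                             ("off", 4, 60), ("off", 6, 60)]))
```

With Note-Duration tokens, pitch and length are emitted adjacently, so this entire matching machinery disappears from what the model must learn.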
30. The “REMI” Representation of Music
• Use Position & Bar instead of Time-Shift
p Position(1/16), Position(2/16), … Position(*/16), …
p provide an explicit metric grid for the model
31. The “REMI” Representation of Music
• Use Position & Bar instead of Time-Shift
p Position(1/16), Position(2/16), … Position(*/16), …
p provide an explicit metric grid for the model
n make it easy to add “bar-level” conditions
n make it easy to sync different tracks (e.g. piano & drum)
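The Position & Bar grid can be sketched as follows; this is our own illustration (hypothetical `to_remi_tokens` helper), not the released REMI code, assuming note onsets quantized to 16 positions per bar:

```python
# Hedged sketch of the REMI-style Bar/Position grid: each note is emitted
# as Position, Note-On (pitch), and Note-Duration tokens, with explicit
# Bar tokens marking the metric structure.

def to_remi_tokens(notes, ticks_per_bar=16):
    """notes: list of (start_tick, duration, pitch); 16 ticks per bar."""
    tokens, current_bar = [], -1
    for start, dur, pitch in sorted(notes):
        bar, pos = divmod(start, ticks_per_bar)
        while current_bar < bar:               # emit a Bar token per new bar
            tokens.append("Bar")
            current_bar += 1
        tokens += [f"Position_{pos + 1}/16",
                   f"Note-On_{pitch}",
                   f"Note-Duration_{dur}"]
    return tokens

# One note on beat 1 of bar 1, one on beat 1 of bar 2:
print(to_remi_tokens([(0, 4, 60), (16, 4, 64)]))
```

Because every bar boundary is an explicit token, bar-level conditions can be attached at those tokens, and a second track (e.g., drums) tokenized on the same grid stays in sync by construction.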
32. Piano Generation + Drum
https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
• Input
• Result of structure analysis
• Result of grooving analysis & added drum loops
(figure: three aligned tracks labelled Piano, Piano, Drums)
(Slide made by Wen-Yi Hsiao)
36. Open Research & Open-Source Code
https://github.com/YatingMusic/remi (the music composing AI)
https://github.com/YatingMusic/miditoolkit (prepare MIDI data)
https://github.com/YatingMusic/ReaRender (convert MIDI to audio)
Work done by Yu-Siang Huang & Wen-Yi Hsiao
37. Ongoing Work: AI Strings
• Generate multi-instrument music
Sample output of the generated result (audio available only in the live presentation)
Work in progress by Yin-Cheng Yeh
38. Ongoing Work: AI Vocal
• Use GAN to generate realistic vocal timbre & melody
that goes nicely with a given piano track
(audio available only in the live presentation)
Work in progress by Jen-Yu Liu
39. Ongoing Work: AI Painter
• AI art + music
p music-to-image translation
p image-to-music translation
p AI canvas as an interface for human-AI interaction
(result of the current model shown at right)
Work in progress by Jen-Yu Liu
40. Conclusion
• Why automatic music composition
p musicians’ friends
p democratization of music creation
• How
p image-based approach
p text-based approach
• What we are doing
p marry music analysis and music composition (e.g., beat-based music modeling in Pop Music Transformer)
p combine image generation and music generation
p explore novel ways of human-AI musical interaction