Research on Automatic Music Composition at the Taiwan AI Labs, April 2020
Yi-Hsuan Yang, Ph.D.
Taiwan AI Labs
yhyang@ailabs.tw
http://mac.citi.sinica.edu.tw/~yang/
About the Music AI Team @ Taiwan AI Labs
• About Taiwan AI Labs
Ø Privately-funded research organization, founded by Ethan Tu (PTT) in 2017
Ø Three main research areas: 1) HCI, 2) medicine, 3) smart city
• About the Music AI team
Ø Members
- scientist [me; since March 2019]
- ML engineers (for models)
- musicians
- program manager
- software engineers (for frontend/backend)
(image of our musicians, processed by neural image style transfer)
Well-Established Music Technology:
Making Sounds
• Music synthesizers that make realistic sounds (e.g., electric piano) and new sounds (e.g., electric guitar)
• Based on digital signal processing
Sources:
https://www.wardbrodt.com/blog/history-of-the-electronic-keyboard-infographic-madison-wisconsin
https://www.musicnexo.com/blog/en/history-of-the-electric-guitar-eternal-youth/
https://freesound.org/people/karolist/sounds/370934/
Emerging Music Technology:
Making Music
• Computers that can understand (existing) music and
create new music performances
• Based on machine learning
• Example: “Bach Doodle” by Google
(https://www.google.com/doodles/celebrating-johann-sebastian-bach)
https://youtu.be/gsUV0mGEGaY
(image is from Google Magenta’s website)
Use Cases
• Make musicians' lives easier
Ø inspire ideas
Ø suggest continuations
Ø suggest accompaniments
• Empower everyone to make music
Ø "democratization" of music creation
• Create copyright-free music for videos or games
• Music education
Growing Interest in Music Composing AI
(images are from the internet)
Google’s Magenta Studio: A DAW Plugin
https://magenta.tensorflow.org/studio/
• “Continue,” “Generate 4 bars,” “Drumify,” “Interpolate,” “Groove”
(image is from Google Magenta’s website)
Our View
• There are moments in life that we treasure. How do we musicalize those moments?
• Streaming music cannot really
interact with us
• Interaction is the key
• And we focus on pop music
(images are from the internet)
Envisioned Paradigm Shift
(images are from the internet)
Deep Learning-based Approaches
• Question: what are the input and output of an automatic music composition model?
(image is from the internet)
Two Scenarios
• Unconditional: generate from scratch
• Conditional: generate given, e.g., a prime melody, tags, lyrics, or video clips
(Slide made by Hao-Min Liu)
Beyond Melody
(diagram: beyond the melody, a model may also generate chords (e.g., C, Am, Dm, G7), drums, and other instruments, either unconditionally or conditionally)
(Slide made by Hao-Min Liu)
Two Main Approaches to
Automatic Music Composition
(1) Consider music as an image (e.g., MuseGAN [dong18aaai])
Ø pros: a bird's-eye view of what's going on
Ø cons: has no notion of "notes"
(image is from the internet)
Example Image-based Model: MuseGAN
• Based on GAN (generative adversarial network)
Ø https://salu133445.github.io/musegan/results
Ø https://salu133445.github.io/ismir2019tutorial/
"MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment", AAAI 2018 (from my group at Academia Sinica)
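As a toy illustration of the "music as an image" view, a piece can be rendered as a binary piano-roll matrix (time steps x pitches), the kind of multi-track tensor a MuseGAN-style GAN learns to generate. The `to_pianoroll` helper below is hypothetical and greatly simplified; the real MuseGAN pipeline (e.g., the pypianoroll library) differs in detail.

```python
# Sketch: encoding notes as a binary piano-roll "image" (time steps x pitches).
# Illustrative only; the actual MuseGAN data pipeline uses pypianoroll tensors.

def to_pianoroll(notes, n_steps, n_pitches=128):
    """notes: list of (start_step, end_step, midi_pitch) tuples."""
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for start, end, pitch in notes:
        for t in range(start, min(end, n_steps)):
            roll[t][pitch] = 1  # mark this pitch as sounding at step t
    return roll

# A C-major triad (C4, E4, G4) held for 4 time steps:
roll = to_pianoroll([(0, 4, 60), (0, 4, 64), (0, 4, 67)], n_steps=8)
print(sum(map(sum, roll)))  # 12 active cells
```

A GAN in this view simply learns to output such matrices, which is why it sees the whole "picture" at once but has no explicit concept of a note event.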
Two Main Approaches to
Automatic Music Composition
(2) Consider music as a language (e.g., [huang19iclr])
Ø pros: does have a notion of "notes"
Ø cons: lacks a bird's-eye view of what's going on (because a music piece now becomes a sequence of "tokens")
Ø becomes a "natural language generation" (NLG) problem
(image is from the ICLR'19 paper)
Differences between Music and Text
• Music is “polyphonic”: there can be multiple notes
sounding at the same time
Ø Therefore, when converting a music piece into a sequence of tokens, we need a specialized token called "TIME-SHIFT"
“Music Transformer: Generating music with long-term structure”, ICLR 2019
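The TIME-SHIFT idea can be sketched in a few lines: interleave NOTE-ON/NOTE-OFF events with TIME-SHIFT tokens that advance the clock, so simultaneous notes simply share the same time position. This is an illustrative simplification (integer time steps, no velocity), not Music Transformer's exact vocabulary.

```python
# Sketch: serializing polyphonic notes into a MIDI-like token sequence with
# NOTE_ON / NOTE_OFF / TIME_SHIFT events (simplified for illustration).

def to_event_tokens(notes):
    """notes: list of (start, end, pitch) in integer time steps."""
    events = []  # (time, priority, token); offs sort before ons at the same instant
    for start, end, pitch in notes:
        events.append((start, 1, f"NOTE_ON_{pitch}"))
        events.append((end, 0, f"NOTE_OFF_{pitch}"))
    events.sort()
    tokens, now = [], 0
    for time, _, token in events:
        if time > now:                              # advance the clock only when
            tokens.append(f"TIME_SHIFT_{time - now}")  # the next event is later
            now = time
        tokens.append(token)
    return tokens

# Two simultaneous notes (an interval) followed by a single note:
print(to_event_tokens([(0, 2, 60), (0, 2, 64), (2, 4, 67)]))
```

Note that the two simultaneous notes produce consecutive NOTE_ON tokens with no TIME_SHIFT between them, which is exactly how a token sequence expresses polyphony.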
Differences between Music and Text
• Music is "polyphonic": there can be multiple notes sounding at the same time
Ø And, for multi-instrument music, we specify the instrument that plays each note
"LakhNES: Improving multi-instrumental music generation with cross-domain pre-training", ISMIR 2019
“Transformer”: State-of-the-art Approach
• Attention is all you need
Ø Adopted by Google Magenta [huang19iclr], OpenAI, etc.
Ø Uses self-attention to learn the dependencies between tokens
(image is from the Transformer paper)
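A minimal sketch of that self-attention mechanism, assuming a single head with Q = K = V and no learned projections: each token's output is a similarity-weighted mixture of all tokens, which is how the model captures dependencies between distant tokens.

```python
import math

# Sketch: scaled dot-product self-attention in plain Python
# (no batching, no multi-head, no learned Q/K/V projections).

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """tokens: list of d-dimensional vectors; here Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]          # similarity of q to every token
        weights = softmax(scores)           # attention distribution (sums to 1)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])     # weighted sum of value vectors
    return out

out = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Because every token attends to every other token regardless of distance, self-attention is well suited to the long-range repetition structure of music.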
Google’s Music Transformer
https://magenta.github.io/listen-to-transformer/
Google’s Music Transformer Is NOT
Designed for Pop Music
• No built-in sense of “bars”
(figure: "downbeat probability" curve plotted over the generated piano roll)
“Pop Music Transformer”
• Our model (arXiv:2002.00212)
Ø generates minute-long pop piano music with expressive, coherent, and clear rhythmic and harmonic structure, without needing any post-processing to refine the result
Ø https://soundcloud.com/yating_ai/sets/ai-piano-generation-demo-202004
(figure: "downbeat probability" curve plotted over the generated piano roll)
Work done by Yu-Siang Huang
Versus Google's Music Transformer
https://magenta.github.io/listen-to-transformer/
Google's original model is trained on classical music using Transformer; we retrain it on our pop dataset using Transformer-XL.
Our model ("REMI") beats Google's model ("Baselines 1 & 3") in a user study involving 76 raters.
We Can Also Add Drums
• It is easy to add drums to the piano music generated by our model, because the generated piano track has clearer rhythmic patterns
Ø https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
(Image from the paper "A Review of Automatic Drum Transcription")
Work done by Wen-Yi Hsiao
Guitar Transformer
• The same model, after some proper modifications,
can also generate fingerstyle guitar music
Ø https://soundcloud.com/yating_ai/ai-guitar-tab-generation-202003/s-KHozfW0PTv5
Ø Adds string & fret information
Ø Right-hand techniques: slap, press, hit top, upstroke, downstroke
Work done by Yu-Hua Chen
(images are from the internet)
https://www.youtube.com/watch?v=9ZIJrr6lmHg
Play Mode: Jamming with Our AI
• Yeh et al., “Learning to generate Jazz and Pop piano music from audio via MIR
techniques,” ISMIR-LBD 2019
• Hsiao et al., “Jamming with Yating: Interactive demonstration of a music
composition AI,” ISMIR-LBD 2019
Work done by Yin-Cheng Yeh & Wen-Yi Hsiao
Play Mode 2: Generate by Drawing
• What does “love” sound like?
https://ailabs.tw/yating-music-piano-demo/
Work done by Vibert Thio
How Did We Make It?
The “MIR4generation” Pipeline
• Use "music information retrieval" (MIR) techniques to give the AI some knowledge of music
Ø Chord: for harmonic structure
Ø Beat/downbeat: for rhythmic structure
Ø (The AI needs to understand music better!)
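As a toy example of the kind of MIR annotation involved, chord labels can be obtained by matching the sounding pitch classes against chord templates. This simplified stand-in is illustrative only, not the team's actual chord-recognition model.

```python
# Sketch: toy chord labeling by pitch-class template matching, to illustrate
# the MIR analysis (chord, beat) used to annotate training data.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
TEMPLATES = {"maj": {0, 4, 7}, "min": {0, 3, 7}}  # intervals above the root

def label_chord(midi_pitches):
    """Return the best-matching major/minor triad for a set of MIDI pitches."""
    pcs = {p % 12 for p in midi_pitches}  # fold pitches into pitch classes
    best, best_score = None, -1
    for root in range(12):
        for quality, template in TEMPLATES.items():
            chord_pcs = {(root + i) % 12 for i in template}
            # reward matched chord tones, penalize missing ones
            score = len(pcs & chord_pcs) - len(chord_pcs - pcs)
            if score > best_score:
                best, best_score = f"{NAMES[root]}:{quality}", score
    return best

print(label_chord([60, 64, 67]))  # C:maj
print(label_chord([57, 60, 64]))  # A:min
```

Running a detector like this (plus beat/downbeat tracking) over a corpus yields the chord and metric annotations that the generation model is then trained on.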
The “REMI” Representation of Music
• REMI: REvamped MIDI-like representation of music
Ø Adds chord & tempo related tokens
Ø Uses Note-Duration instead of Note-Off
Ø Uses Position & Bar instead of Time-Shift
The “REMI” Representation of Music
• Use Note-Duration instead of Note-Off
Ø because it is hard for a model to correctly pair Note-On and Note-Off events
Ø in contrast, Note-On & Note-Duration tokens always occur together
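The idea can be sketched as follows: pair each Note-Off with its Note-On during preprocessing, so the model only ever sees a Note-On followed immediately by its Note-Duration. Token spellings here are illustrative, not REMI's exact vocabulary.

```python
# Sketch: rewriting a Note-On/Note-Off event stream into REMI-style
# Note-On + Note-Duration pairs, so the model never has to match up
# distant on/off events at generation time.

def to_note_duration(events):
    """events: list of (time, kind, pitch), kind in {'on', 'off'};
    assumes no overlapping notes of the same pitch."""
    open_notes, notes = {}, []
    for time, kind, pitch in sorted(events):
        if kind == "on":
            open_notes[pitch] = time          # remember when the note started
        else:
            start = open_notes.pop(pitch)     # close it and compute its length
            notes.append((start, f"Note-On_{pitch}", f"Note-Duration_{time - start}"))
    notes.sort()                              # emit in onset order
    return [t for _, on, dur in notes for t in (on, dur)]

print(to_note_duration([(0, "on", 60), (4, "off", 60), (2, "on", 64), (3, "off", 64)]))
```

The pairing work is done once in preprocessing, where it is trivial, instead of being something the generative model must learn to get right.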
The “REMI” Representation of Music
• Use Position & Bar instead of Time-Shift
Ø Position(1/16), Position(2/16), … Position(*/16), …
Ø provides an explicit metric grid for the model
The "REMI" Representation of Music
• Use Position & Bar instead of Time-Shift
Ø makes it easy to add "bar-level" conditions
Ø makes it easy to sync different tracks (e.g., piano & drums)
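The metric grid can be sketched as follows, assuming a 16th-note grid in 4/4: emit a Bar token at each barline and a Position(k/16) token before each note. The token spelling is illustrative; see the REMI repo for the exact vocabulary.

```python
# Sketch: placing note onsets on REMI's explicit Bar/Position metric grid
# instead of relative Time-Shift tokens.

STEPS_PER_BAR = 16  # 16th-note grid in 4/4 time

def to_position_bar(onsets):
    """onsets: sorted list of (step, pitch); step is a 16th-note index from 0."""
    tokens, current_bar = [], -1
    for step, pitch in onsets:
        bar, pos = divmod(step, STEPS_PER_BAR)
        while current_bar < bar:              # emit a Bar token per barline crossed
            tokens.append("Bar")
            current_bar += 1
        tokens.append(f"Position({pos + 1}/16)")
        tokens.append(f"Note-On_{pitch}")
    return tokens

print(to_position_bar([(0, 60), (8, 64), (16, 67)]))
```

Because every note is anchored to an absolute position within a bar, timing errors cannot accumulate the way they can with a chain of relative Time-Shift tokens, and tracks sharing the same grid stay in sync.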
Piano Generation + Drum
https://soundcloud.com/yating_ai/sets/ai-pianodrum-generation-demo-202004
(figure: piano input track; generated piano & drum tracks)
• Input
• Result of structure analysis
• Result of grooving analysis & added drum loops
(Slide made by Wen-Yi Hsiao)
The “REMI” Representation of Music
• It is easier to learn Position & Bar than to learn Time-Shift
Transformer-XL instead of Transformer
Transformer (NeurIPS, 2017)
Transformer-XL (ACL, 2019)
(images are from the ACL’19 paper)
Piano Generation: Pop Music Transformer
https://arxiv.org/abs/2002.00212
Open Research
Open-source code:
Ø https://github.com/YatingMusic/remi (the music composing AI)
Ø https://github.com/YatingMusic/miditoolkit (prepare MIDI data)
Ø https://github.com/YatingMusic/ReaRender (convert MIDI to audio)
Work done by Yu-Siang Huang & Wen-Yi Hsiao
Ongoing Work: AI Strings
• Generate multi-instrument music
Sample output of the generated result (audible only in the live presentation)
WIP by Yin-Cheng Yeh
Ongoing Work: AI Vocal
• Use GAN to generate realistic vocal timbre & melody
that goes nicely with a given piano track
(Sample output audible only in the live presentation)
WIP by Jen-Yu Liu
Ongoing Work: AI Painter
• AI art + music
Ø music-to-image translation
Ø image-to-music translation
Ø AI canvas as an interface for human-AI interaction
(figure: result of the current model)
WIP by Jen-Yu Liu
Conclusion
• Why automatic music composition?
Ø a friend for musicians
Ø democratization of music creation
• How?
Ø image-based approach
Ø text-based approach
• What we are doing
Ø marry music analysis and music composition (e.g., beat-based music modeling in Pop Music Transformer)
Ø combine image generation and music generation
Ø explore novel ways of human-AI musical interaction