Three approaches to automatic composition are discussed: rule-based models, genetic algorithms, and statistical models. Recent advances include FlowComposer, which uses constrained Markov models to generate lead sheets based on user input, and WaveNet, a neural network that generates raw audio. While earlier rule-based systems focused on genres like classical music, statistical models with harmonic constraints now show the best performance. Deep learning models also show promise but require further study on incorporating harmonic constraints. Objective evaluation metrics are still needed to assess composition quality.
Learning to Groove with Inverse Sequence Transformations - ivaderivader
This document describes a model called GrooveVAE that aims to generate expressive drum performances from musical scores by learning the microtiming and velocity characteristics of human drumming. It presents several proposed models, including MLP, Seq2Seq, encoder-decoder, and Groove Transfer variants. The models are evaluated on a dataset of human drum performances on tasks such as humanizing a score by adding expressive timing and dynamics, or completing an incomplete score. Results are measured using listening tests and quantitative metrics such as timing error and velocity divergence. Future work could involve visualizing model outputs to help musicians understand the generated expressive performances.
Timbral modeling for music artist recognition using i-vectors - Hamid Eghbal-zadeh
This document summarizes a research paper on using i-vectors for artist recognition in music. The proposed method extracts i-vectors from songs to obtain a compact representation that captures artist variability. It uses a Gaussian mixture model (GMM) to calculate statistics from frame-level features, then performs factor analysis to extract i-vectors from the GMM supervectors. Experiments on a dataset of 20 artists show the method achieves better artist recognition performance than baselines, and works well across different backends like discriminant analysis and probabilistic linear discriminant analysis.
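As a rough illustration of the statistics-extraction step, the sketch below computes the zeroth- and first-order Baum-Welch statistics for a toy one-dimensional, diagonal-covariance GMM. The component parameters and frame values are invented for the example; a real i-vector front end operates on multidimensional frame-level features such as MFCCs.

```python
import math

def gmm_posteriors(x, means, variances, weights):
    """Responsibility gamma_c(x) of each 1-D Gaussian component for frame x."""
    likes = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
             for m, v, w in zip(means, variances, weights)]
    total = sum(likes)
    return [l / total for l in likes]

def baum_welch_stats(frames, means, variances, weights):
    """Zeroth-order (N_c) and first-order (F_c) statistics per component.

    Stacking the component-wise statistics yields the GMM supervector
    from which factor analysis extracts the compact i-vector.
    """
    ncomp = len(means)
    N = [0.0] * ncomp  # soft frame counts
    F = [0.0] * ncomp  # soft first-order sums
    for x in frames:
        for c, g in enumerate(gmm_posteriors(x, means, variances, weights)):
            N[c] += g
            F[c] += g * x
    return N, F

# Toy data: most frames cluster near the first component's mean.
frames = [0.0, 0.1, -0.2, 4.1, 3.9]
N, F = baum_welch_stats(frames, [0.0, 4.0], [1.0, 1.0], [0.5, 0.5])
```

Because the responsibilities of each frame sum to one, the N_c values always sum to the number of frames, which is a convenient sanity check on an implementation.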
My talk at CMMR 2013 on the Mood Conductor system.
Mood Conductor is an interactive system that allows audience members to communicate emotional directions to performers in order to conduct improvised music performances. It consists of three main technical components: a smartphone-friendly web application used by the audience, a server-side application that aggregates and clusters the emotion coordinates the audience indicates in the arousal-valence space, and a visualisation client that creates a video projection used by the musicians as guidance. This projection also provides visual feedback for the audience.
This document discusses the history of communication and key developments in the field. It covers early analog systems, Harry Nyquist's work on converting signals to discrete samples, and Claude Shannon's seminal 1948 paper, which established information theory and the digital basis of modern communication systems. Shannon modeled information sources and channels as random processes, showing that reliable communication is possible whenever the information rate is below the channel capacity. Though not widely understood at first, Shannon's principles now underlie nearly all communication systems designed in the more than 50 years since. The document also discusses challenges in mobile ad hoc networks and position-based routing approaches that can efficiently route packets using only local neighbor information and nodes' movement histories.
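As a concrete instance of the capacity limit, the Shannon-Hartley theorem gives the capacity of a band-limited AWGN channel as C = B log2(1 + S/N). The sketch below evaluates it for an illustrative 3 kHz channel at 30 dB SNR; the numbers are chosen for the example, not taken from the document.

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity of an AWGN channel, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# A 3 kHz channel with 30 dB SNR (linear SNR = 1000):
capacity = shannon_capacity(3000, 1000)
# Reliable communication is possible at any information rate below this value.
```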
Music Engendering Algorithm With Diverse Characteristic By Adjusting The Para... - IRJET Journal
This document proposes using a Deep Convolutional Generative Adversarial Network (DCGAN) to generate music with diverse characteristics by adjusting the model parameters. It discusses related work using recurrent neural networks and GANs for music generation. It then describes three proposed DCGAN models that vary the convolutional layers to either focus on the temporal or frequency components of the music. The models are implemented on a dataset of MIDI music tracks. The generator and discriminator networks are defined and the GAN is trained to generate new music samples.
The document discusses various traffic models for 5G networks including stochastic geometry models, temporal-spatial traffic models, and traffic-aware models. Stochastic geometry models use Poisson point processes to model the random distribution of nodes and analyze network performance factors like arrival rate and file size. Temporal-spatial models aim to describe changing user behavior and traffic patterns over time and space better than stochastic models. Traffic-aware models apply techniques like game theory and information theory to learn traffic patterns, estimate future traffic, and optimize resource usage through strategies like proactive base station switching.
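To make the stochastic-geometry idea concrete, a homogeneous Poisson point process over a rectangular region can be sampled with the standard two-step construction: draw a Poisson-distributed node count, then place the nodes uniformly and independently. A minimal sketch, with illustrative parameter values:

```python
import math
import random

def sample_ppp(rate_per_unit_area, width, height, rng):
    """Sample a homogeneous Poisson point process on a width x height region."""
    mean = rate_per_unit_area * width * height
    # Draw the Poisson-distributed node count via inversion (Knuth's method;
    # fine for small means like this example).
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        k += 1
    # Given the count, points are i.i.d. uniform over the region.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(k)]

rng = random.Random(0)
nodes = sample_ppp(rate_per_unit_area=5.0, width=2.0, height=2.0, rng=rng)
```

Such samples are the starting point for analyzing quantities like interference or coverage as a function of node density.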
Research at MAC Lab, Academia Sinica, in 2017 - Yi-Hsuan Yang
Some research projects we did in 2017 at the Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan. It includes three parts: 1) vocal separation, 2) music generation, 3) AI DJ.
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition - Jinwon Lee
The document discusses exploring randomly wired neural networks for image recognition. The authors define network generators based on random graph models like Erdos-Renyi, Barabasi-Albert, and Watts-Strogatz. These generators produce neural networks with random connectivity patterns. Experiments show that networks generated this way achieve competitive accuracy compared to hand-designed architectures, demonstrating that the generator design is important. The random wiring patterns provide performance comparable to networks from neural architecture search with fewer parameters and computations.
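The G(n, p) generator behind the Erdos-Renyi model is simple to sketch: each of the n(n-1)/2 possible undirected edges is included independently with probability p. The parameters below are illustrative; in the paper's pipeline, each node of such a graph would then be mapped to a small convolutional unit to form a network.

```python
import random

def erdos_renyi(n, p, rng):
    """G(n, p): include each possible undirected edge independently with prob p."""
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if rng.random() < p]

rng = random.Random(42)
edges = erdos_renyi(n=10, p=0.3, rng=rng)  # one random wiring pattern
```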
Generating Musical Notes and Transcription using Deep Learning - Varad Meru
Music has always been among the most widely followed art forms, and a great deal of research has gone into understanding it. In recent years, deep learning approaches for building unsupervised hierarchical representations from unlabeled data have gained significant interest. Progress in fields such as image processing and natural language processing has been substantial, but to my knowledge, representation learning on auditory data has not been studied as extensively. In this project I try two methods for generating music from a range of musical inputs, from MIDI to complex WAV formats. I use RNN-RBMs and CDBNs to explore music.
A natural extension of the Random Access Machine (RAM) serial architecture is the Parallel Random Access Machine, or PRAM.
PRAMs consist of p processors and a global memory of unbounded size that is uniformly accessible to all processors.
Processors share a common clock but may execute different instructions in each cycle.
Parallel platforms can be organized in various ways, from an ideal parallel random access machine (PRAM) to more conventional architectures. PRAMs allow concurrent access to shared memory and can be divided into subclasses based on how simultaneous memory accesses are handled. Physical parallel computers use interconnection networks to provide communication between processing elements and memory. These networks include bus-based, crossbar, multistage, and various topologies like meshes and hypercubes. Maintaining cache coherence across multiple processors is important and can be achieved using invalidate protocols, directories, and snooping.
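The synchronous, lock-step execution model above can be illustrated with a tree reduction on an EREW PRAM, where n/2 processors sum n values in ceil(log2 n) steps. The sketch below is a single-threaded simulation of that schedule, not actual parallel code; each pass of the inner loop represents one shared clock cycle.

```python
def pram_tree_sum(values):
    """Simulate an EREW PRAM tree reduction; returns (sum, step count).

    In each synchronous step, every active processor adds one cell into
    another; all cells touched in a step are disjoint, so there are no
    concurrent reads or writes (the EREW access discipline).
    """
    a = list(values)
    n = len(a)
    stride, steps = 1, 0
    while stride < n:
        for i in range(0, n - stride, 2 * stride):  # one per active processor
            a[i] += a[i + stride]
        stride *= 2
        steps += 1  # one shared clock cycle elapsed
    return a[0], steps

total, steps = pram_tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

For eight values the reduction finishes in three steps, versus seven additions in sequence, which is the source of the O(n / log n) speedup the PRAM model exposes.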
Automatic Music Composition with Transformers, Jan 2021 - Yi-Hsuan Yang
An up-to-date version of slides introducing our ongoing projects on automatic music composition at the Yating Music AI Team of the Taiwan AI Labs (https://ailabs.tw/), focusing on introducing the following two publications from our group.
[1] "Pop Music Transformer: Beat-based modeling and generation of expressive Pop piano compositions," in Proc. ACM Multimedia, 2020.
[2] "Compound Word Transformer: Learning to compose full-song music over dynamic directed hypergraphs," in Proc. AAAI 2021.
For the last version of the slides, please visit: https://www2.slideshare.net/affige/research-on-automatic-music-composition-at-the-taiwan-ai-labs-april-2020/edit?src=slideview
A new parallel bat algorithm for musical note recognition - IJECEIAES
Music is a universal language that needs no interpreter: feelings and sensitivities are shared regardless of peoples and languages. The proposed system consists of two main stages. The first extracts salient features using linear discriminant analysis (LDA), after an initial preprocessing step that applies various procedures to remove the staff lines. The second stage performs recognition using the bat algorithm, a metaheuristic, modified here to obtain better discrimination results. The system was further supported by a parallel implementation of the developed bat algorithm (DBA), which increased execution speed significantly. The method was applied to 1250 different images of musical notes. The system was implemented in MATLAB R2016a on a Windows 10 computer (Intel Core i5-7200U CPU @ 2.50 GHz, up to 2.70 GHz).
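For reference, the velocity and position update of the basic bat algorithm (Yang's original formulation) can be sketched as follows. This is a generic illustration of the unmodified algorithm, not the paper's developed DBA, and the parameter values are arbitrary.

```python
import random

def bat_step(positions, velocities, best, f_min=0.0, f_max=2.0, rng=random):
    """One update of the basic bat algorithm over 1-D positions:
    f_i = f_min + (f_max - f_min) * beta,  v_i += (x_i - x*) * f_i,  x_i += v_i.
    `best` is the current best solution x*.
    """
    for i in range(len(positions)):
        beta = rng.random()                      # uniform in [0, 1)
        f = f_min + (f_max - f_min) * beta       # each bat's pulse frequency
        velocities[i] += (positions[i] - best) * f
        positions[i] += velocities[i]
    return positions, velocities

rng = random.Random(1)
pos, vel = bat_step([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], best=1.0, rng=rng)
```

A full optimizer would iterate this step together with the loudness and pulse-rate schedules that control local search, which the sketch omits.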
This document summarizes research presented at recent Argument Mining workshops. It describes tasks from 2021 and 2022 on key point analysis and validity-novelty prediction. It also outlines papers on extracting argument components, predicting reasoning markers, and detecting argument boundaries. The document indicates that argument mining can analyze debates on social media and that legal corpora can help develop argument mining systems.
This document summarizes several papers being presented at the CHI 2023 conference related to human-AI collaboration, mental healthcare, comment visualization, and virtual reality. For human-AI collaboration, one paper studies the effects of AI code generators on novice programmers and another aims to bridge the abstraction gap between end users and large language models. A third paper investigates whether groups or individuals are better at AI-assisted decision making. For mental healthcare, two papers study user perceptions of recommender systems and interest in self-tracking technologies. For comment visualization, a paper examines selective avoidance of uncongenial facts. Three papers on virtual reality discuss opportunities for metaverse workspaces, embodying physics-aware avatars, and mapping finger motions
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels - ivaderivader
This document summarizes a research paper on learning graph representations for deep divergence graph kernels (DDGK). DDGK learns graph representations without supervision or domain knowledge by using a node-to-edges encoder and isomorphism attention. The isomorphism attention provides a bidirectional mapping between nodes in two graphs. DDGK then calculates a divergence score between the source and target graphs as a measure of their (dis)similarity. Experimental results showed DDGK produces representations competitive with other graph kernel baselines. The paper proposes several extensions, including different graph encoders and attention mechanisms, as well as improved regularization and scalability.
So Predictable! Continuous 3D Hand Trajectory Prediction in Virtual Reality - ivaderivader
This document summarizes research on developing a hybrid kinematic regression model for continuous 3D hand trajectory prediction in virtual reality. The model combines classical kinematics with regression to predict hand movements up to 300ms into the future with low error. A user study was conducted to collect training data from structured pointing tasks and unstructured VR gameplay. The model was evaluated through cross-validation and testing on new users and activities, demonstrating good performance without retraining. Limitations and opportunities for future work are discussed, such as expanding the model to support two-handed interactions and other types of tasks.
Reinforcement Learning-based Placement of Charging Stations in Urban Road Net... - ivaderivader
This document summarizes a research paper that proposes using reinforcement learning to optimize the placement of electric vehicle charging stations in urban road networks. It formulates the charging station placement task as a reinforcement learning problem to maximize a defined utility function. The paper details the implementation of a deep Q-learning agent that takes actions to create, increase, or relocate charging stations. It evaluates the reinforcement learning approach on real-world road networks and charging demand datasets from two German cities, finding it outperforms baseline greedy algorithms in maximizing utility while meeting constraints.
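The Q-learning update at the heart of such an agent can be illustrated in tabular form. The paper itself uses a deep Q-network rather than a table, and the state names below are invented for the example; the action set mirrors the create/increase/relocate actions mentioned above.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    `q` maps state -> {action: value}.
    """
    best_next = max(q[next_state].values())  # greedy value of the next state
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# Hypothetical two-state table over the paper's action set.
q = {"s0": {"create": 0.0, "increase": 0.0, "relocate": 0.0},
     "s1": {"create": 0.0, "increase": 0.0, "relocate": 0.0}}
q_update(q, "s0", "create", reward=1.0, next_state="s1")
```

In the paper's setting, the reward would come from the defined utility function, and the neural network replaces the table so the agent can generalize across the large state space of a real road network.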
Prediction for Retrospection: Integrating Algorithmic Stress Prediction into ... - ivaderivader
This document describes MindScope, an algorithm-assisted stress management system for college students. It collects behavioral data from students' smartphones to predict their stress levels and provides explanations of the predictions. A study was conducted with 36 students who used MindScope for 25 days. It found that providing algorithmic explanations for stress predictions helped reduce students' perceived stress levels over time. It also supported their stress management practices and self-reflection. However, the study was limited by its short duration and lack of a control group.
A Style-Based Generator Architecture for Generative Adversarial Networks - ivaderivader
StyleGAN is a generative adversarial network that achieves disentangled and scalable image generation. It uses adaptive instance normalization (AdaIN) to modify feature statistics at different scales, allowing scale-specific image stylization. The generator is designed as a learned mapping from latent space to image space. Latent codes are fed into each layer and transformed through AdaIN to modify feature statistics. This disentangles high-level attributes like pose, hair, etc. and allows controllable image synthesis through interpolation in latent space.
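The AdaIN operation itself is compact: each feature channel is normalized to zero mean and unit variance, then rescaled and shifted by style-derived statistics, AdaIN(x, y) = y_s * (x - mu(x)) / sigma(x) + y_b. A minimal one-channel sketch (the feature values and style parameters are made up for the example):

```python
import math

def adain(features, style_scale, style_bias, eps=1e-5):
    """Adaptive instance normalization for one feature channel (as a flat list)."""
    n = len(features)
    mu = sum(features) / n
    var = sum((v - mu) ** 2 for v in features) / n
    sigma = math.sqrt(var + eps)
    # Replace the channel's statistics with the style-controlled ones.
    return [style_scale * (v - mu) / sigma + style_bias for v in features]

styled = adain([1.0, 2.0, 3.0, 4.0], style_scale=2.0, style_bias=5.0)
```

After the transform, the channel's mean equals the style bias and its standard deviation (up to the eps term) equals the style scale, which is exactly the per-scale stylization lever the generator exploits.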
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling - ivaderivader
This document summarizes a research paper on using neural approximate dynamic programming (NeurADP) to solve the ride-pooling matching problem (RMP) of optimally assigning passenger requests to vehicles. The key contributions are developing NeurADP, which uses a neural network to approximate the value function within an ADP framework, and connecting it to reinforcement learning. This allows NeurADP to handle the complex RMP involving partially filled vehicles better than prior methods. The researchers test NeurADP on New York City taxi data and find that changing constraints like maximum wait time and vehicle capacity affects NeurADP's ability to serve more requests while avoiding myopic assignments.
Bad Breakdowns, Useful Seams, and Face Slapping: Analysis of VR Fails on YouTube - ivaderivader
This document analyzes 233 VR fail videos from YouTube to understand types and causes of VR fails outside the lab. The analysis found that the most common types were excessive reaction (53%) and falling over (18%), while the top causes were fear (40%) and sensorimotor mismatch (26%). Spectators were often laughing or expressing concern. The analysis suggests design implications like changing play spaces, mechanics, and interactions to prevent fails while promoting seamful design and spectator engagement. Limitations include the corpus not being broadly representative and difficulty analyzing home videos.
Invertible Denoising Network: A Light Solution for Real Noise Removal - ivaderivader
This document summarizes an academic paper on invertible denoising networks (InvDN), a new type of neural network designed for real image denoising. The key points are:
1) InvDN is the first work to design invertible networks for real image denoising, utilizing two different distributions - one for the noisy image and one for the clean image.
2) InvDN shows state-of-the-art results on benchmark datasets by restoring images and generating new noisy images from the restored clean images.
3) InvDN works by splitting images into low and high frequency components in the frequency domain and handling each separately, allowing the disentanglement of clean image and noise information needed for
Traffic Demand Prediction Based Dynamic Transition Convolutional Neural Networkivaderivader
The document proposes a traffic demand prediction model based on a dynamic transition convolutional neural network. It defines a transition network using density-peak based clustering to identify virtual stations. A dynamic graph convolutional gated recurrent unit is developed to model the station-to-station transitions over time. Meteorological data is also incorporated as external features to improve the prediction performance. The model is evaluated on bike and taxi demand data from New York City and shows improved results over various baseline models.
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training ivaderivader
The document introduces MusicBERT, a large-scale pre-trained Transformer model for symbolic music understanding. MusicBERT uses a novel encoding method called OctupleMIDI to efficiently encode music sequences into compact tokens. It also employs a bar-level masking strategy during pre-training. MusicBERT is evaluated on four music understanding tasks and achieves state-of-the-art results, demonstrating the effectiveness of its encoding method and pre-training approach.
Screen2Vec: Semantic Embedding of GUI Screens and GUI Componentsivaderivader
This document presents Screen2Vec, a technique for generating semantic embeddings of GUI screens and components using text, visual design, layout patterns, and app metadata. It has a two-level architecture that learns embeddings at the component and screen levels. The technique is self-supervised and does not require labeled data. It was trained on the RICO dataset and can be used for tasks like nearest neighbor retrieval and representing mobile tasks. Sample results showed Screen2Vec generated more comprehensive representations compared to baselines using only text or layout.
Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Impro...ivaderivader
The document summarizes research on using reinforcement learning to help taxi drivers maximize revenue by recommending optimal locations to drive to at different times and days. The methodology involves modeling driver decisions as activity graphs from taxi trajectory data and then using Monte Carlo reinforcement learning with states defined by zone, time, and day to learn an optimal policy. Experiments show the learned policy is able to achieve higher average revenues than top-earning drivers and common heuristics, demonstrating the potential of reinforcement learning to help taxi drivers.
Natural Language to Visualization by Neural Machine Translationivaderivader
This document summarizes a method called ncNet that uses neural machine translation to translate natural language queries to Vega-Zero visualization specifications. Key points include:
- ncNet is a Transformer-based seq2seq model that can translate natural language queries about data into specifications for visualizations.
- It uses techniques like attention forcing and visualization-aware translation to generate valid Vega-Zero outputs from the queries.
- An evaluation compares ncNet to existing methods on a dataset of natural language queries paired with Vega-Zero specifications, finding it outperforms baselines.
Overview of the Paper
• Musical Terminology
Phrase - a musical passage; roughly one line of a song
Note - a single musical sound event
Pitch - how high or low a note sounds
Tracks - instrument parts
Bars - units of musical time; in a piano-roll they play a role similar to pixels
Fragments
Monophonic music - music composed of one instrument
Polyphonic music - music composed of multiple instruments
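These terms map directly onto the piano-roll representation used later in the deck. A minimal sketch in Python (the 96-time-step, 128-pitch bar size matches the implementation section; the note placement and variable names are only illustrative):

```python
import numpy as np

# One bar: 96 time steps x 128 MIDI pitches; True = note on.
bar = np.zeros((96, 128), dtype=bool)

# A monophonic track plays at most one pitch at a time; here, a single
# note at middle C (MIDI pitch 60) lasting 24 time steps (a quarter note).
bar[0:24, 60] = True

# Multi-track (polyphonic) music stacks one such matrix per track,
# and a phrase groups several bars:
num_tracks = 5  # e.g. bass, drums, guitar, piano, strings
phrase = np.zeros((4, 96, 128, num_tracks), dtype=bool)  # 4 bars = 1 phrase
```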
Overview of the Paper
• Basic Ideas and Motivation
Music is an art of time.
• Music has a hierarchical structure, with higher-level building blocks made up of smaller recurrent patterns.
• While listening to music, people attend to structural patterns related to coherence, rhythm, tension and the flow of emotion. A mechanism that accounts for this temporal structure is therefore critical.
Music is usually composed of multiple instruments/tracks.
• A modern orchestra usually contains four sections: brass, strings, woodwinds and percussion; a rock band includes a bass, a drum set, guitars and possibly a vocal.
• These tracks interact with one another closely and unfold over time interdependently.
Musical notes are often grouped into chords, arpeggios or melodies.
• Imposing a chronological ordering of notes is not natural for polyphonic music.
• Therefore, successes in natural language generation and monophonic music generation may not readily generalize to polyphonic music generation.
Overview of the Paper
• Basic Ideas and Motivation (continued)
Most prior work chose to simplify symbolic music generation in certain ways to render the problem manageable.
• Goal
To generate multi-track polyphonic music with the following:
• Harmonic and rhythmic structure
• Multi-track interdependency
• Temporal structure
• Contribution
Unlike prior work, which simplified the problem by:
• generating only single-track monophonic music (existing RNN-based models),
• introducing a chronological ordering of notes for polyphonic music, or
• generating polyphonic music as a combination of several monophonic melodies,
the proposed model tackles multi-track polyphonic generation directly.
Overview of the Paper
• Approach
To incorporate a temporal model, two approaches are proposed for different scenarios:
1) one generates music from scratch (without human input), while the other learns to follow the underlying temporal structure of a track given a priori by a human;
2) bars, rather than notes, are viewed as the basic compositional unit, and music is generated one bar after another using transposed convolutional neural networks, which are known to be good at finding local, translation-invariant patterns.
To handle the interactions among tracks, three methods are suggested based on an understanding of how pop music is composed:
• One generates the tracks independently with private generators (one per track); another generates all tracks jointly with a single generator; the third generates each track with its private generator plus additional shared inputs among the tracks, which are expected to guide the tracks to be collectively harmonious and coordinated.
• To cope with the grouping of notes, bars (not notes) are considered the basic compositional unit, and music is generated one bar after another using transposed convolutional neural networks, which are known to be good at finding local, translation-invariant patterns.
• Intra-track and inter-track measures: a few intra-track and inter-track objective measures are proposed and used to monitor the learning process and to evaluate the generated results of the different proposed models quantitatively.
Proposed Model
• Generative Adversarial Networks (GANs)
The core concept of GANs is to achieve adversarial learning by constructing two networks: the generator and the discriminator.
The generator maps a random noise z, sampled from a prior distribution, to the data space.
The discriminator is trained to distinguish real data from data produced by the generator, whereas the generator is trained to fool the discriminator.
The training procedure can be formally modeled as a two-player minimax game between the generator G and the discriminator D:

  min_G max_D V(D, G) = E_{x~Pd}[log D(x)] + E_{z~Pz}[log(1 - D(G(z)))]

*Pd and Pz represent the distribution of real data and the prior distribution of z
*E : expectation
*log(D(x)) : discriminator output for real data x (close to 1 -> real)
*log(1 - D(G(z))) : discriminator term for generated fake data G(z) (D(G(z)) close to 0 -> fake)
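The minimax value V(D, G) described above can be estimated by Monte Carlo sampling. A toy sketch, with a hand-written discriminator and generator standing in for the trained networks (all functions and distributions here are illustrative, not the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    # Toy discriminator: a logistic score, higher for values near the
    # "real" mode at 1.0. Output is strictly in (0, 1).
    return 1.0 / (1.0 + np.exp(-(x - 0.5)))

def G(z):
    # Toy generator: an affine map from noise space to data space.
    return 0.8 * z + 0.1

x_real = rng.normal(1.0, 0.1, size=10_000)  # samples from Pd
z = rng.normal(0.0, 1.0, size=10_000)       # samples from Pz

# Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
```

Since D outputs values strictly inside (0, 1), both log terms are negative, so the estimate V is always below zero; training moves D to push it up and G to push it down.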
Proposed Model
• Jamming Model
Multiple generators work independently, each generating music for its own track from a private random vector Zi, i = 1, 2, ..., M, where M denotes the number of generators (or tracks).
These generators receive critics (i.e., back-propagated supervisory signals) from separate discriminators. To generate music of M tracks, M generators and M discriminators are needed.
Proposed Model
• Composer Model
One single generator creates a multi-channel piano-roll, with each channel representing a specific track.
The composer model requires only one shared random vector z and one discriminator, which examines the M tracks collectively to tell whether the input music is real or fake.
Regardless of the value of M, only one generator and one discriminator are needed.
Proposed Model
• Hybrid Model
M generators but only one discriminator.
Combining the ideas of jamming and composing, each generator Gi takes a shared inter-track random vector in addition to its private one; the inter-track vector is expected to coordinate the generation of the different "musicians" Gi, just as a composer does. A single discriminator evaluates the M tracks collectively.
A major difference between the composer model and the hybrid model lies in flexibility: in the hybrid model, different network architectures and different inputs can be used for the M generators. One can therefore, for example, vary the generation of one specific track without losing the inter-track interdependency.
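The three multi-track schemes differ mainly in how noise vectors are wired to generators and how many discriminators score the result. A shape-level sketch with toy linear "generators" (dimensions and names are illustrative, not the paper's convolutional architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
M, Z_DIM, BAR = 5, 8, 96 * 128  # tracks, noise size, flattened bar size

def make_G():
    W = rng.normal(size=(Z_DIM, BAR))  # toy "generator": one linear map
    return lambda z: np.tanh(z @ W)

# Jamming: M private generators, M private noise vectors
# (and, in training, M separate discriminators).
jam_G = [make_G() for _ in range(M)]
jam_out = np.stack([jam_G[i](rng.normal(size=Z_DIM)) for i in range(M)])

# Composer: one shared z, one generator emitting all M tracks jointly
# (one discriminator scores all tracks together).
W_comp = rng.normal(size=(Z_DIM, M * BAR))
comp_out = np.tanh(rng.normal(size=Z_DIM) @ W_comp).reshape(M, BAR)

# Hybrid: private generators, but each also sees a shared inter-track z
# (still a single collective discriminator).
z_shared = rng.normal(size=Z_DIM)
hyb_G = [make_G() for _ in range(M)]
hyb_out = np.stack([hyb_G[i](z_shared + rng.normal(size=Z_DIM))
                    for i in range(M)])
```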
Proposed Model
• Modeling the Temporal Structure
The models presented so far can only generate multi-track music bar by bar, with possibly no coherence among the bars. A temporal model is needed to generate music a few bars long, such as a musical phrase. Two methods are proposed.
Proposed Model
• Modeling the Temporal Structure
• Generation from Scratch
• Generation from scratch aims to produce fixed-length musical phrases by viewing the bar progression as another dimension along which to grow the generator.
*Here a phrase is not a mathematical unit but, roughly, a line of a song
• The generator consists of two sub-networks: the temporal structure generator Gtemp and the bar generator Gbar. Gtemp maps a noise vector z to a sequence of latent vectors. These latent vectors, carrying temporal information, are then used by Gbar to generate piano-rolls sequentially (bar by bar).
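The Gtemp -> Gbar pipeline can be sketched at the shape level with toy linear maps (illustrative dimensions; not the paper's convolutional architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T, Z_DIM, BAR = 4, 8, 96 * 128  # bars per phrase, latent size, flat bar size

# Toy G_temp: maps one noise vector to a sequence of T latent vectors.
W_temp = rng.normal(size=(Z_DIM, T * Z_DIM))
def G_temp(z):
    return np.tanh(z @ W_temp).reshape(T, Z_DIM)

# Toy G_bar: maps one latent vector to one bar.
W_bar = rng.normal(size=(Z_DIM, BAR))
def G_bar(z_t):
    return np.tanh(z_t @ W_bar)

z = rng.normal(size=Z_DIM)
latents = G_temp(z)                               # temporal structure first...
phrase = np.stack([G_bar(zt) for zt in latents])  # ...then bars, one by one
```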
Proposed Model
• Modeling the Temporal Structure
• Track-conditional Generation
• Track-conditional generation assumes that the bar sequence y of one specific track is given by a human, and tries to learn the temporal structure underlying that track and to generate the remaining tracks. The track-conditional generator G generates bars one after another with the conditional bar generator Gbar: the multi-track piano-rolls of the remaining tracks of bar t are generated by Gbar.
• An encoder E is trained to map y to the space of z. E is expected to extract inter-track rather than intra-track features from the given track, since intra-track features should not be useful for generating the other tracks (e.g., AIVA).
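The track-conditional wiring, at the shape level, with toy linear maps standing in for E and the conditional Gbar (all names and dimensions are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T, Z_DIM, BAR, M = 4, 8, 96 * 128, 5  # bars, latent size, flat bar, tracks

W_E = rng.normal(size=(BAR, Z_DIM)) * 0.01         # toy encoder E: bar -> z
W_G = rng.normal(size=(2 * Z_DIM, (M - 1) * BAR))  # toy conditional G_bar

# The human-given track: a binary bar sequence y (one flat bar per step).
y = rng.random((T, BAR)) < 0.05

out = []
for t in range(T):
    cond = np.tanh(y[t] @ W_E)   # inter-track features extracted by E(y_t)
    z = rng.normal(size=Z_DIM)   # fresh noise per bar
    # Conditional bar generator: noise + condition -> remaining M-1 tracks.
    out.append(np.tanh(np.concatenate([cond, z]) @ W_G).reshape(M - 1, BAR))
others = np.stack(out)           # (T, M-1, BAR): generated remaining tracks
```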
Proposed Model
• Modeling the Temporal Structure
• MuseGAN
• An integration and extension of the proposed multi-track and temporal models.
*M : number of tracks
T : number of bars
i : track index
t : bar index
• Modeling the Temporal Structure
• Inputs
Zt : an inter-track, time-dependent random vector
Z : an inter-track, time-independent random vector
Zit : intra-track, time-dependent random vectors
Zi : intra-track, time-independent random vectors
*M : number of tracks; T : number of bars; i : track index; t : bar index
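The four noise components combine per track and per bar into the bar generator's input. A sketch of the assembly (in the paper the time-dependent vectors are produced by temporal sub-generators, but here they are sampled directly for brevity; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M, T, Z_DIM = 5, 4, 8  # tracks, bars, size of each noise component

z    = rng.normal(size=Z_DIM)           # inter-track, time-independent
z_t  = rng.normal(size=(T, Z_DIM))      # inter-track, time-dependent
z_i  = rng.normal(size=(M, Z_DIM))      # intra-track, time-independent
z_it = rng.normal(size=(M, T, Z_DIM))   # intra-track, time-dependent

def bar_input(i, t):
    # Input for the bar generator of track i at bar t: all four concatenated.
    return np.concatenate([z, z_t[t], z_i[i], z_it[i, t]])
```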
Implementation
Dataset
• Lakh MIDI dataset (LMD): a large collection of 176,581 unique MIDI files.
• The MIDI files are converted into multi-track piano-rolls.
• For each bar, the height (pitch) is set to 128 and the width (time resolution) to 96, to model common temporal patterns such as triplets and 16th notes.
• The Python library pretty_midi is used to parse and process the MIDI files.
• The resulting dataset is named the Lakh Piano-roll Dataset (LPD).
Data Preprocessing
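Segmenting a long piano-roll into 96-time-step bars can be sketched as follows (a synthetic array stands in for the pretty_midi output; a 128-pitch x time-step layout, as pretty_midi's get_piano_roll conventionally returns):

```python
import numpy as np

# Toy stand-in for a parsed piano-roll: 128 pitches x T time steps.
rng = np.random.default_rng(0)
roll = (rng.random((128, 96 * 10)) < 0.02).astype(np.uint8)

STEPS_PER_BAR = 96
num_bars = roll.shape[1] // STEPS_PER_BAR

# Reshape to (num_bars, 96 time steps, 128 pitches), dropping any
# trailing partial bar.
bars = (roll[:, :num_bars * STEPS_PER_BAR]
        .T.reshape(num_bars, STEPS_PER_BAR, 128))
```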
Implementation
Data Preprocessing
• As these MIDI files are scraped from the web and are mostly user-generated (Raffel and Ellis 2016), the dataset is quite noisy. Hence, LPD-matched is used in this work, with three further cleansing steps:
1) Some tracks play only a few notes over an entire song, which increases data sparsity and impedes learning. This data-imbalance issue is handled by merging tracks of similar instruments (by summing their piano-rolls). Each multi-track piano-roll is compressed into five tracks: bass, drums, guitar, piano and strings. After this step we get LPD-5-matched, which has 30,887 multi-track piano-rolls.
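The track-merging step can be sketched as summing piano-rolls per instrument family (the instrument names and the family mapping below are illustrative, not the paper's actual mapping):

```python
import numpy as np

# Illustrative instrument-name -> family mapping.
FAMILY = {"Electric Bass": "bass", "Acoustic Bass": "bass",
          "Drums": "drums", "Clean Guitar": "guitar",
          "Acoustic Piano": "piano", "String Ensemble": "strings"}

rng = np.random.default_rng(0)
# One toy 96 x 128 binary piano-roll per instrument.
instruments = {name: rng.random((96, 128)) < 0.02 for name in FAMILY}

merged = {fam: np.zeros((96, 128), dtype=int)
          for fam in ("bass", "drums", "guitar", "piano", "strings")}
for name, roll in instruments.items():
    merged[FAMILY[name]] += roll  # summing merges similar instruments
```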
Implementation
Data Preprocessing (continued)
2) Using the metadata provided in the LMD and the MSD, only the piano-rolls with a higher matching confidence score that are rock songs in 4/4 time are kept. After this step we get LPD-5-cleansed.
3) To acquire musically meaningful phrases for training the temporal model, the piano-rolls are segmented with a state-of-the-art algorithm, structural features (Serrà et al. 2012), and phrases are obtained accordingly. Four bars are considered one phrase, and longer segments are pruned to this size, giving 50,266 phrases in total for the training data. Notably, although the models generate fixed-length segments only, the track-conditional model can generate music of any length according to the input. The size of the target output tensor is hence 4 (bars) × 96 (time steps) × 84 (notes) × 5 (tracks).
28. 27
Experiment and Results
• Objective Metrics for Evaluation
Five metrics are computed over both the real and the generated data: four intra-track metrics and one inter-track metric (the last one):
• EB: ratio of empty bars (in %)
• UPC: number of used pitch classes per bar (from 0 to 12)
• QN: ratio of qualified notes (in %). We consider a note no shorter than three time steps a qualified note.
• DP, or drum pattern: ratio of drum notes in 8- or 16-beat patterns, which are common in Rock songs in 4/4 time (in %)
• TD, or tonal distance: measures the harmonicity between a pair of tracks. A larger TD implies weaker inter-track harmonic relations.
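The three simplest intra-track metrics can be sketched as follows (a minimal illustration, not the paper's exact implementation; single-track bars are assumed to be boolean arrays of shape 96 × 128):

```python
import numpy as np

def empty_bar_ratio(bars):
    """EB: fraction of bars containing no note at all."""
    return float(np.mean([not bar.any() for bar in bars]))

def used_pitch_classes(bar):
    """UPC: number of distinct pitch classes (0-12) used in one bar."""
    pitches = np.nonzero(bar.any(axis=0))[0]            # active MIDI pitches
    return len({int(p) % 12 for p in pitches})

def qualified_note_ratio(bar, min_len=3):
    """QN: fraction of notes lasting at least `min_len` time steps."""
    total, qualified = 0, 0
    for pitch in range(bar.shape[1]):
        col = bar[:, pitch].astype(int)
        # note onsets/offsets = edges in the padded on/off sequence
        edges = np.diff(np.concatenate([[0], col, [0]]))
        starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
        for s, e in zip(starts, ends):
            total += 1
            qualified += (e - s) >= min_len
    return qualified / total if total else 1.0
```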
29. 28
Experiment and Results
• Analysis of Training Data
The real and the generated data are compared over four intra-track metrics and one inter-track metric (the last one):
*ablated: an ablated version of the composer model, one without batch normalization layers
• EB: shows that categorizing the tracks into five families is appropriate.
• UPC: we find that the bass tends to play the melody, which results in a UPC below 2, while the guitar, piano and strings tend to play chords, which results in a UPC above 3.
30. 29
Experiment and Results
• Analysis of Training Data
The real and the generated data are compared over four intra-track metrics and one inter-track metric (the last one):
• QN: high values of QN indicate that the converted piano-rolls are not overly fragmented.
• DP: we see that over 88 percent of the drum notes are in either 8- or 16-beat patterns.
• TD: values are around 1.50 when measuring the distance between a melody-like track (mostly the bass) and a chord-like track (mostly one of the piano, guitar or strings), and around 1.00 for two chord-like tracks.
31. 30
Experiment and Results
• Example Results
The tracks usually play in the same musical scale.
Chord-like intervals can be observed in some samples.
The bass often plays the lowest pitches and is monophonic most of the time.
The drums usually have 8 or 16 beat rhythmic patterns.
The guitar, piano and strings tend to play the chords, and their pitches sometimes overlap,
which indicates nice harmonic relations.
32. 31
Experiment and Results
• Objective Evaluation
20,000 bars are generated with each model and evaluated in terms of the proposed
objective metrics.
The piano tracks are used as conditions to generate the other four tracks.
For comparison, we also include the result of an ablated version of the composer model,
one without batch normalization layers. This ablated model barely learns anything, so its
values can be taken as a reference.
For the intra-track metrics, we see that the jamming model tends to perform the best.
We also see that all models tend to use more pitch classes and produce fairly fewer
qualified notes than the training data does.
This indicates that some noise might have been produced and that the generated music
contains a great amount of overly fragmented notes, which may result from the way we
binarize the continuous-valued output of G (to create binary-valued piano-rolls).
For the inter-track metric TD, the values of the composer model and the hybrid model are
relatively lower than those of the jamming model. This suggests that music generated by
the jamming model has weaker harmonic relations among tracks, and that the composer model
and the hybrid model may be more appropriate for multi-track generation in terms of
cross-track harmonic relations.
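The binarization step blamed for the fragmented notes can be illustrated with a minimal sketch. `hard_binarize` mirrors simple thresholding of the generator's continuous output; `smoothed_binarize` is a hypothetical mitigation (not from the paper) that averages along time first, so isolated one-step spikes are suppressed:

```python
import numpy as np

def hard_binarize(x, threshold=0.5):
    """Hard-threshold continuous generator output of shape (time, pitch)
    into a binary piano-roll; isolated spikes above the threshold become
    one-step notes, one source of overly fragmented output."""
    return x > threshold

def smoothed_binarize(x, threshold=0.5, kernel=3):
    """Hypothetical variant: take a moving average of width `kernel`
    along the time axis before thresholding, suppressing short spikes."""
    pad = kernel // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    windows = [padded[i:i + x.shape[0]] for i in range(kernel)]
    return np.stack(windows).mean(axis=0) > threshold

x = np.zeros((96, 128))
x[10, 60] = 0.9                       # a single one-step spike
print(hard_binarize(x)[10, 60])       # True: spike survives as a 1-step note
print(smoothed_binarize(x)[10, 60])   # False: spike is smoothed away
```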
33. 32
Experiment and Results
• Example Results
Figure: (a) training loss of the discriminator; (b) UPC, the number of used pitch classes per bar; (c) QN (qualified notes) of the strings track, for the composer model generating from scratch.
34. 33
Experiment and Results
• User Study
Each subject rates the clips on a 5-point Likert scale in terms of whether they
• 1) have pleasant harmony,
• 2) have unified rhythm,
• 3) have clear musical structure,
• 4) are coherent,
and 5) gives an overall rating.
*144 subjects were recruited from the Internet via our social circles;
44 of them are deemed ‘pro users,’
according to a simple questionnaire
probing their musical background.
36. 35
Conclusion and Future Work
• Conclusion
MuseGAN is a novel generative model for multi-track sequence generation under the
framework of GANs.
A model such as MuseGAN, which uses deep CNNs to generate multi-track music, could
inspire new ideas beyond the work implemented in the paper.
The objective metrics and the subjective user study show that the proposed models
can start to learn something about music.