This document discusses a system that automatically generates rhythms for songs in rhythm games. It presents a methodology that extracts common rhythm patterns from MIDI files using a longest common subsequence algorithm. Patterns are assigned difficulty scores based on metrical complexity, number of notes, and longest repetition. Patterns are sorted into easy, medium, and hard levels for static mode games. For dynamic mode, patterns are replaced to match the specific song while maintaining the difficulty level. An experiment found that people stayed engaged longer with music in both preferred and non-preferred genres than when just listening.
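As an illustration of the longest common subsequence step, here is a minimal sketch using the textbook dynamic-programming formulation. The function name and the encoding of a rhythm as a list of onset flags are assumptions made for this example, not details taken from the summarized document.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one subsequence.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


# Hypothetical rhythms: 1 marks a note onset, 0 a rest, one entry per step.
rhythm_a = [1, 0, 1, 1, 0, 1, 0, 0]
rhythm_b = [1, 1, 0, 1, 0, 1, 0, 0]
shared_pattern = longest_common_subsequence(rhythm_a, rhythm_b)
```

The table costs O(len(a) * len(b)) time and memory, which is manageable for bar-length rhythm patterns.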
Playing the Snake Game with Deep Reinforcement Learning (by Chuyang Liu)
This document summarizes work on using deep reinforcement learning to play the Snake game. It discusses:
1) Background on deep reinforcement learning algorithms like DQN that were applied to games like Atari 2600.
2) The author's implementation of a DQN using a global/local state vector to represent the Snake environment.
3) Experiments comparing the performance of standard DQN to optimizations like Double DQN, Prioritized Experience Replay, and Dueling Network structures.
This document discusses automatic speech recognition, including defining the task, the main challenges, and common approaches. The key difficulties identified are digitizing speech, separating speech from noise, dealing with variability between individuals, identifying phonemes, disambiguating homophones, handling continuous speech, and interpreting prosodic features. Common approaches are template matching, rule-based systems, and statistical/machine learning methods like hidden Markov models. Remaining challenges include robustness, adaptability, language modeling, and handling spontaneous speech.
This PowerPoint presentation contains 45 slides. The first part gives a brief introduction to speech recognition (SR) systems, their applications, the biological architecture of human speech recognition versus the machine architecture, the recognition process with a flow summary, and the approaches to SR systems. The middle part describes the evolution of SR systems through the decades. The last part describes the machine learning approach to SR systems, including how neural networks improve their efficiency.
The document discusses various techniques for smoothing N-gram language models, including Laplace smoothing, Good-Turing discounting, and backoff models. Laplace smoothing adds one to all counts to address unseen events. Good-Turing smoothing estimates probabilities for low counts based on the frequency of higher counts. Backoff models first use the highest-order N-gram model available and fall back to a lower order only if the current count is zero.
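To make the add-one idea concrete, here is a minimal sketch of a Laplace-smoothed bigram estimate, P(word | prev) = (count(prev, word) + 1) / (count(prev) + V), where V is the vocabulary size. The function name and the toy corpus are assumptions for illustration only.

```python
from collections import Counter


def laplace_bigram_prob(tokens, prev_word, word):
    """Add-one smoothed estimate of P(word | prev_word) from a token list."""
    vocabulary = set(tokens)
    # How often each word appears as the first element of a bigram.
    history_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return (bigram_counts[(prev_word, word)] + 1) / (
        history_counts[prev_word] + len(vocabulary)
    )


# Hypothetical toy corpus.
corpus = "the cat sat on the mat".split()
seen = laplace_bigram_prob(corpus, "the", "cat")    # bigram occurs once
unseen = laplace_bigram_prob(corpus, "the", "sat")  # bigram never occurs
```

The unseen bigram still receives a non-zero probability, and the estimates for a fixed history sum to one over the vocabulary.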
PopMAG: Pop Music Accompaniment Generation (ivaderivader)
This paper introduces PopMAG, a model that generates pop music accompaniment across multiple tracks simultaneously. It uses a novel multi-track MIDI representation called MuMIDI that encodes notes from different tracks into a single sequence, allowing the model to explicitly capture dependencies between tracks. The model achieves state-of-the-art results on three datasets based on both subjective listener evaluations and objective metrics. Ablation studies demonstrate the effectiveness of modeling notes as single tokens, using context memory in the encoder/decoder, and including bar/position embeddings.
slides presented at a three-hour local AI music course in Taiwan in Oct 2021; part 1: a brief introduction to music information retrieval (+analysis, +generation)
The document discusses parts-of-speech (POS) tagging. It defines POS tagging as labeling each word in a sentence with its appropriate part of speech. It provides an example tagged sentence and discusses the challenges of POS tagging, including ambiguity and open/closed word classes. It also discusses common tag sets and stochastic POS tagging using hidden Markov models.
The document discusses various algorithms including dynamic programming, Warshall's and Floyd's algorithms, backtracking, branch and bound, graph coloring, the n-queen problem, Hamiltonian cycles, and the sum of subsets problem. It provides examples and explanations of these algorithms, such as using dynamic programming to solve the 0-1 knapsack problem and backtracking to solve the n-queen problem by trying different placements of queens on a chessboard.
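The dynamic-programming treatment of the 0-1 knapsack problem can be sketched as follows. This is the standard one-dimensional table formulation, assuming non-negative integer weights; the function name and sample items are illustrative rather than taken from the summarized slides.

```python
def knapsack_01(weights, values, capacity):
    """Return the best total value achievable without exceeding capacity."""
    # best[c] = best value using the items seen so far with total weight <= c.
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical items: weights 10, 20, 30 with values 60, 100, 120.
best_value = knapsack_01([10, 20, 30], [60, 100, 120], 50)
```

Scanning capacities from high to low is what distinguishes the 0-1 variant from the unbounded knapsack, where an item may be reused.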
Hidden Markov Models with applications to speech recognition (butest)
This document provides an introduction to hidden Markov models (HMMs). It discusses how HMMs can be used to model sequential data where the underlying states are not directly observable. The key aspects of HMMs are: (1) the model has a set of hidden states that evolve over time according to transition probabilities, (2) observations are emitted based on the current hidden state, and (3) the four basic problems of HMMs are evaluation, decoding, training, and model selection. Examples discussed include modeling coin tosses, balls in urns, and speech recognition. Algorithms for HMMs, such as Baum-Welch for training and Viterbi for decoding, are also summarized.
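The decoding problem can be illustrated with a minimal Viterbi sketch. The dictionary-based model layout, the function name, and the two-state example values are assumptions chosen for illustration; they are not taken from the summarized document, and the sketch assumes a non-empty observation sequence.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best path that ends in state s.
    best = {
        s: (start_prob[s] * emit_prob[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            prob, path = max(
                ((best[p][0] * trans_prob[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_prob[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical two-state model.
states = ["Healthy", "Fever"]
start_prob = {"Healthy": 0.6, "Fever": 0.4}
trans_prob = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_prob = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
probability, path = viterbi(
    ["normal", "cold", "dizzy"], states, start_prob, trans_prob, emit_prob
)
```

For long sequences a practical implementation would work with log probabilities to avoid numerical underflow.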
This document discusses key management for IPsec. It describes that IPsec uses two protocols: Oakley key determination protocol and Internet Security Association and Key Management Protocol (ISAKMP). Oakley uses Diffie-Hellman key exchange with cookies and nonces to establish secret keys securely. ISAKMP defines payloads for negotiating security attributes like encryption algorithms and authentication mechanisms. It also describes the ISAKMP header format which includes fields like initiator/responder cookies, next payload, version numbers, exchange type, flags, message ID and length.
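The Diffie-Hellman exchange at the core of Oakley can be sketched with toy numbers. This shows only the key agreement arithmetic; the cookies, nonces, and authentication that Oakley adds are omitted, and the small prime and private values are illustrative assumptions that would be insecure in practice.

```python
def dh_public_value(generator, private_key, prime):
    """Compute the public value g^x mod p that a party sends to its peer."""
    return pow(generator, private_key, prime)


def dh_shared_secret(peer_public, private_key, prime):
    """Compute the shared secret from the peer's public value."""
    return pow(peer_public, private_key, prime)


# Toy parameters (assumed for illustration; real groups use very large primes).
prime, generator = 23, 5
initiator_private, responder_private = 6, 15

initiator_public = dh_public_value(generator, initiator_private, prime)
responder_public = dh_public_value(generator, responder_private, prime)

# Each side combines its own private key with the other side's public value.
initiator_secret = dh_shared_secret(responder_public, initiator_private, prime)
responder_secret = dh_shared_secret(initiator_public, responder_private, prime)
```

Both sides arrive at the same secret without ever transmitting it, which is why the exchange needs the extra protections Oakley layers on top to resist clogging and man-in-the-middle attacks.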
The document describes a simulation of a telephone system to track processed, completed, blocked and busy calls. It shows the system state at various time steps as calls arrive and are connected or finished. When lines are all in use, arriving calls are delayed rather than lost. The simulation runs by scanning for the next event, selecting the activity that causes it, updating records to reflect the event's effects, and gathering statistics.
This document defines multimedia and describes its key components. Multimedia is defined as a combination of text, graphics, sound, animation and video delivered interactively via electronic means. The document discusses various multimedia elements including video, audio, images, maps and documents. It also covers multimedia applications, systems architecture, data interface standards and storage media. Common input devices for multimedia like pen, light pen, image scanners and MIDI are explained. Display and printing technologies such as CRT, LCD, inkjet and laser printers are also outlined.
The document discusses research issues in speech processing. It covers topics like speech production, speech processing tasks, speech measurements, speech signal components, automatic speech recognition, speaker recognition, text-to-speech systems, speech coding, and a proposed speech-assisted translation corrector system. The key challenges in speech processing research are modeling the human auditory system, developing large multilingual speech databases, and generating natural sounding synthetic speech.
Speech recognition systems convert spoken words to text in real time. They are used in dictation software and intelligent assistants. Design challenges include background noise, accent variations, and speed of speech. Speaker-dependent systems recognize one voice, while speaker-independent systems recognize any voice without training. Speech is broken into phonemes; a hidden Markov model identifies the phonemes, and language models recognize words. Components include signal analysis and acoustic and language models. Applications include healthcare, military, phones, and personal computers. Siri and Google Now are examples of intelligent assistants using these techniques.
This document discusses regular languages and grammars. It begins by defining formal languages and describing two approaches to describing languages: the generative approach using grammars and the recognition approach using automata. It then discusses Noam Chomsky's hierarchy of formal grammars and how this classifies the expressive power of grammars. Regular languages are those described by regular grammars and recognized by finite automata. Regular expressions provide another way to describe regular languages. The document proves the equivalence between regular expressions, regular grammars, and finite automata by showing how to systematically construct automata from regular expressions and vice versa.
Abstract: The use of regular expressions to search text is a well-known and useful technique. Regular expressions are generic representations for a string or a collection of strings. Regular expressions (regexps) are one of the most useful tools in computer science. NLP, as an area of computer science, has greatly benefitted from regexps: they are used in phonology, morphology, text analysis, information extraction, and speech recognition. This paper gives the reader a general review of the usage of regular expressions, illustrated with examples from natural language processing. In addition, it discusses different approaches to regular expressions in NLP. Keywords: Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment, POS tagging
----------------------------
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
This document discusses various sorting algorithms and their time complexities, including linear-time sorting algorithms. It introduces counting sort, which can sort in O(n) time when the range of input values is small. Radix sort is then presented as a generalization of counting sort that can sort integers in linear time by sorting based on individual digit positions. Bucket sort is also discussed as another linear-time sorting algorithm when inputs are uniformly distributed.
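A minimal counting sort sketch shows why the running time is linear when the value range is small. It assumes non-negative integer keys no larger than `max_value`; the function name and parameters are chosen for this example.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers in the range 0..max_value."""
    # counts[v] = number of times v occurs in the input.
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    # Emit each value as many times as it was counted, in increasing order.
    result = []
    for v, n in enumerate(counts):
        result.extend([v] * n)
    return result


sorted_values = counting_sort([3, 1, 2, 3, 0], max_value=3)
```

The work is O(n + k) for n inputs and k possible values. This simplified version sorts bare integers; the prefix-sum variant used inside radix sort is needed when stability over attached records matters.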
Natural language processing involves parsing text using a lexicon, categorization of parts of speech, and grammar rules. The parsing process involves determining the syntactic tree and label bracketing that represents the grammatical structure of sentences. Evaluation measures for parsing include precision, recall, and F1-score. Ambiguities from multiple word senses, anaphora, indexicality, metonymy, and metaphor make parsing challenging.
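The parsing evaluation measures can be sketched over labeled brackets. Representing each constituent as a `(label, start, end)` tuple and comparing sets is an assumption of this example; evaluation tools used in practice apply additional conventions.

```python
def bracket_scores(predicted, gold):
    """Return (precision, recall, F1) over sets of labeled brackets."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical brackets for a five-word sentence.
gold_brackets = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
predicted_brackets = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
precision, recall, f1 = bracket_scores(predicted_brackets, gold_brackets)
```

Here three of the four predicted brackets match the gold tree, so precision, recall, and F1 all come out to 0.75.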
Deterministic context free grammars & non-deterministic (Leyo Stephen)
Deterministic context-free grammars are always unambiguous, while there are non-deterministic unambiguous grammars. The problem of determining if a grammar is ambiguous is undecidable in general. Many languages can have both ambiguous and unambiguous grammars, but some languages only admit ambiguous grammars and are considered inherently ambiguous.
The document discusses the Boyer-Moore string searching algorithm. It works by preprocessing the pattern string and comparing characters from right to left. If a mismatch occurs, it uses two heuristics - bad character and good suffix - to determine the shift amount. The bad character heuristic shifts past mismatching characters, while the good suffix heuristic looks for matching suffixes to allow larger shifts. The algorithm generally gets faster as the pattern length increases, running in sub-linear time on average. It has applications in tasks like virus scanning and database searching that require high-speed string searching.
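The right-to-left comparison and the bad character heuristic can be sketched as below. This is a simplified variant that deliberately omits the good suffix heuristic, so it is not the full Boyer-Moore algorithm; the function name is an assumption for this example.

```python
def boyer_moore_bad_char(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Preprocessing: rightmost position of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    shift = 0
    while shift <= n - m:
        # Compare the pattern against the text from right to left.
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1
        if j < 0:
            return shift
        # Align the mismatched text character with its rightmost occurrence
        # in the pattern, always moving forward by at least one position.
        shift += max(1, j - last.get(text[shift + j], -1))
    return -1


position = boyer_moore_bad_char("here is a simple example", "example")
```

When the mismatched text character does not occur in the pattern at all, the window jumps past it entirely, which is where the longer-pattern speedup comes from.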
The document discusses evolutionary algorithms and genetic algorithms. It defines evolutionary algorithms as computational models of natural selection and genetics that simulate evolution through processes of selection, mutation and reproduction to find optimal solutions to problems. Genetic algorithms are described as a class of stochastic search algorithms inspired by biological evolution that use concepts of natural selection and genetic inheritance to search for solutions. The key steps of a genetic algorithm are outlined, including initializing a population, evaluating fitness, selecting parents, performing crossover and mutation to produce offspring, and iterating over generations until a termination condition is met.
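The key steps listed above can be sketched for bit-string genomes. Tournament selection, one-point crossover, single-individual elitism, and all parameter defaults are choices made for this example, not details from the summarized document; the sketch assumes a genome length of at least 2.

```python
import random


def genetic_algorithm(fitness, genome_length, population_size=30,
                      generations=50, mutation_rate=0.02, rng=None):
    """Evolve bit-string genomes and return the fittest one found."""
    rng = rng or random.Random()
    # Initialize a random population.
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]

    def select_parent():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: carry the current best genome over unchanged.
        offspring = [max(population, key=fitness)[:]]
        while len(offspring) < population_size:
            mother, father = select_parent(), select_parent()
            # One-point crossover followed by per-bit mutation.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)


# Hypothetical fitness: number of ones in the genome ("OneMax").
best_genome = genetic_algorithm(sum, genome_length=20, rng=random.Random(0))
```

Termination here is a fixed generation count; a fitness threshold or a stagnation check are common alternatives.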
This document provides an overview of Markov models and hidden Markov models (HMMs). It begins by introducing Markov chains, which are probabilistic state machines where the probability of transitioning to the next state depends only on the current state. Hidden Markov models extend Markov chains by adding hidden states that are not directly observable. The key aspects of HMMs are defined, including the hidden states, observed outputs, transition probabilities, and output probabilities. The document then discusses how to compute the likelihood of an observed sequence given an HMM, including using the forward algorithm to efficiently sum over all possible state sequences. Overall, the document provides a conceptual introduction to Markov models and HMMs, focusing on their structure, assumptions, and the forward algorithm.
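The likelihood computation can be illustrated with a minimal forward algorithm sketch. The dictionary-based model layout, the function name, and the two-state example are assumptions for illustration, and the sketch assumes a non-empty observation sequence.

```python
def forward_likelihood(observations, states, start_prob, trans_prob, emit_prob):
    """Return P(observations | model), summed over all hidden state paths."""
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_prob[s][obs] * sum(alpha[p] * trans_prob[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model emitting symbols "x" and "y".
hmm_states = ["A", "B"]
hmm_start = {"A": 0.6, "B": 0.4}
hmm_trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
hmm_emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
likelihood = forward_likelihood(
    ["x", "y", "y"], hmm_states, hmm_start, hmm_trans, hmm_emit
)
```

Each step reuses the previous `alpha` values, so the cost is O(T * N^2) for T observations and N states rather than the N^T terms of a brute-force sum over paths.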
This document discusses using MATLAB and a DSP processor for image processing and computer vision applications. It describes how MATLAB can be used to acquire images, analyze image content, and control actuators. However, image processing requires significant computational resources, so the code is run on a computer connected to a webcam rather than a microcontroller. The Texas Instruments TMS320C6713 DSK platform allows MATLAB codes to be implemented on a DSP processor for these types of applications. Example applications mentioned include medical imaging, object recognition, and robotic vision.
Cinema began transitioning to digital formats in 1999, and by 2012, 80% of screens worldwide had converted. Digital offers several advantages over analog film formats, including easier distribution and the ability to preserve content for longer. Digital cinema uses DCP files rather than film reels and digital projectors like DLP rather than traditional film projectors. It provides security, including frame-by-frame encryption, and requires KDM keys to play content, securing the distribution process.
Skype Translator created a buzz everywhere. Now you can embed the same speech-to-speech translation service in your applications. How does it work, and what opportunities does it create for turning our visions of the future into today's reality? A month ago Microsoft released a service that allows anyone to extend their solutions and apps with this capability. In this session you will learn how speech-to-speech translation works and about companies and solutions that already use this capability.
Speech recognition technology allows users to communicate through spoken commands. It works by converting acoustic speech signals captured by a microphone into text. There are two main types of speech models - speaker independent models that can recognize many people, and speaker dependent models customized for a single person. The speech recognition process involves an audio input being digitized, then broken down into phonemes which are statistically modeled and matched to words in a grammar according to a dictionary to output recognized text.
Learning to Groove with Inverse Sequence Transformations (ivaderivader)
This document describes a model called GrooveVAE that aims to generate expressive drum performances from musical scores by learning the microtiming and velocity characteristics of human drumming performances. It presents several proposed models, including MLP, Seq2Seq, encoder-decoder, and Groove Transfer models. The models are evaluated on a dataset of human drum performances and target tasks like humanizing a score by adding expressive timing and dynamics or completing an incomplete score. Results are measured using listening tests and quantitative metrics like timing error and velocity divergence. Future work could involve visualizing model outputs to help musicians understand generated expressive performances.
Three approaches to automatic composition are discussed: rule-based models, genetic algorithms, and statistical models. Recent advances include FlowComposer, which uses constrained Markov models to generate lead sheets based on user input, and WaveNet, a neural network that generates raw audio. While earlier rule-based systems focused on genres like classical music, statistical models with harmonic constraints now show the best performance. Deep learning models also show promise but require further study on incorporating harmonic constraints. Objective evaluation metrics are still needed to assess composition quality.
A Unified Music Recommender System Using Listening Habits and Semantics of Tagsdatasciencekorea
The document describes a unified music recommendation system that combines users' listening habits and semantics of tags. It proposes generating three types of user profiles: listening habits-based, tag-based, and a hybrid approach. A tag and emotion ontology are used to preprocess tags and assign weights. A music recommendation algorithm finds similar users and calculates item scores. An evaluation of the approaches found the hybrid method achieved the best precision and recall based on F-measure, outperforming listening habits only or tag-based recommendations. Statistical analysis confirmed the hybrid approach performed significantly better.
The document describes a unified music recommendation system that combines users' listening habits and semantics of tags. It proposes generating three types of user profiles: listening habits-based, tag-based, and a hybrid approach. A tag and emotion ontology are used to preprocess tags and assign weights. A music recommendation algorithm finds similar users and calculates item scores. An evaluation of the approaches found the hybrid method achieved the best precision and recall based on F-measure, outperforming listening habits only or tag-based recommendations. Statistical analysis confirmed the hybrid approach performed significantly better.
This document summarizes deep learning research at Niland Music for music recommendation. It describes how Niland has moved from traditional music information retrieval techniques to deep learning approaches using convolutional neural networks. Key points include:
- CNNs trained on mel-spectrograms of songs can achieve similar or better results than complex hand-engineered features and pooling techniques.
- Simple pooling methods like mean, max and variance work well with CNNs, outperforming more complex approaches.
- Training on larger datasets of 150k+ tracks improves results over smaller datasets.
- Residual networks can further improve performance over plain convolutional networks.
- More data, data augmentation, and semi-supervised techniques may provide additional gains
The document summarizes a paper on developing a music creation system using AI. It describes:
1) Conducting studies with video creators to understand challenges in adding music and preferences for a collaborative AI system. Participants preferred selecting from multiple example songs.
2) Developing a prototype that allows users to generate AI music for videos by mixing elements of songs and previewing automated combinations.
3) Evaluating the prototype with video creators and experts, who found it intuitive to use and that it provided a satisfactory level of creative control. Future work will focus on improving musical quality and developing new evaluation methods.
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
Fumiya Hasuike, Daichi Kitamura, and Rui Watanabe,"DNN-based frequency-domain permutation solver for multichannel audio source separation," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 872–877, Chiang Mai, Thailand, November 2022.
earPod: Eyes-free Menu Selection using Touch Input and Reactive Audio FeedbackKimberly Aguada
This document outlines research on the earPod technique for eyes-free menu selection using touch input and auditory feedback. The research addressed how to design usable eyes-free menus and work around problems with serial audio. An earPod prototype was developed and evaluated against visual menus in 4 empirical studies. The studies found that earPod had comparable performance to visual menus and its use improved rapidly with practice. Audio techniques also worked better than visual when performing a secondary menu task while driving. The research contributed a novel eyes-free interaction technique and design recommendations for incorporating such techniques on mobile devices.
Crowsourcing for Social Multimedia Task: Crowsorting Timed Comments about Musicmultimediaeval
This paper provides an overview of the Crowdsorting Timed Comments about Music Task, a new task in the area of crowdsourcing for social media offered by the MediaEval 2014 Multimedia Benchmark. Data for this task is a set of Electronic Dance Music (EDM) tracks, collected from online music sharing platform Soundcloud. Given a set of noisy labels for segments of Electronic Dance Music (EDM) that were collected on Amazon Mechanical Turk, the task is to predict a single `correct' label. The labels indicate whether or not a `drop' occurs in the particular music segment. The larger aim of this task is to contribute to the development of hybrid human/conventional computation techniques to generate accurate labels for social multimedia content. For this reason, participants are also encouraged to predict labels by combining input from the crowd (i.e., human computation) with automatic computation (i.e., processing techniques applied to textual metadata and/or audio signal analysis).
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_78.pdf
Automatic Set List Identification and Song Segmentation of Full-Length Concer...Ju-Chiang Wang
Recently, plenty of full-length concert videos have become available on video-sharing websites such as YouTube. As each video generally contains multiple songs, natural questions that arise include “what is the set list?” and “when does each song begin and end?” Indeed, many full concert videos on YouTube contain song lists and timecodes contributed by uploaders and viewers. However, newly uploaded content and videos of lesser-known artists typically lack this metadata. Manually labeling such metadata would be labor-intensive, and thus an automated solution is desirable. In this paper, we define a novel research problem, automatic set list segmentation of full concert videos, which calls for techniques in music information retrieval (MIR) such as audio fingerprinting, cover song identification, musical event detection, music alignment, and structural segmentation. Moreover, we propose a greedy approach that sequentially identifies a song from a database of studio versions and simultaneously estimates its probable boundaries in the concert. We conduct preliminary evaluations on a collection of 20 full concerts and 1,152 studio tracks. Our result demonstrates the effectiveness of the proposed greedy algorithm.
This document presents a user-centered approach for recommending music songs for daily activities. It explores using a user's preferences to generate personalized recommendations through collaborative filtering and characterizing songs based on acoustic features. An evaluation with 10 users found the personalized recommendations selected more songs and increased user satisfaction over the generic recommendations by 10%. However, more evaluation is still needed with more users and activities to further validate the approach.
This document discusses statistical computing for big data using distributed computing frameworks like MapReduce and Hadoop. It introduces MapReduce concepts like mappers, reducers, and Hadoop components including HDFS and YARN. Statistical challenges with big data are described, like scalability, dimensionality, and heterogeneity. The document discusses approaches for computing statistics on large datasets in parallel, including the Bag of Little Bootstraps method which breaks data into partitions to allow bootstrapping computations to run independently on clusters. Examples of computing means and counts in parallel using MapReduce are also provided.
The document summarizes a student robotics team's project to build an autonomous robot named Hermione to complete challenges in a competition using a VEX robotics platform. The team faced challenges with design, building, programming, testing, and completing the project within budget and time constraints. They implemented project management processes and received guidance from mentors to tackle challenges at each stage and produce a functioning robot to compete.
The document summarizes Kenneth Emeka Odoh's presentation on recommender systems and his solution to the WSDM Challenge competition. It includes discussions of the top solutions which used techniques like light gradient boosted machines, neural networks, and ensemble modeling. It also describes Kenneth's solution using bidirectional LSTMs with techniques like batch normalization and dropout to avoid overfitting on the time series song listening data. Overall, the presentation covered many state-of-the-art recommender system techniques for sequential and time series prediction tasks.
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training ivaderivader
The document introduces MusicBERT, a large-scale pre-trained Transformer model for symbolic music understanding. MusicBERT uses a novel encoding method called OctupleMIDI to efficiently encode music sequences into compact tokens. It also employs a bar-level masking strategy during pre-training. MusicBERT is evaluated on four music understanding tasks and achieves state-of-the-art results, demonstrating the effectiveness of its encoding method and pre-training approach.
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.
This is the official presentation of the team Creamy Fireflies for the RecSys Challenge Workshop 2018, we achieved the 2nd place in the creative track and the 4th place in the Main track.
In the slides it is explained our methodologies, studies and model.
We are a team of six MSc students from Politecnico di Milano:
• Sebastiano Antenucci
• Simone Boglio
• Emanuele Chioso
• Ervin Dervishaj
• Shuwen Kang
• Tommaso Scarlatti.
and one PhD candidate:
• Maurizio Ferrari Dacrema
The document discusses heuristic algorithms and their applications. It describes various heuristic algorithms including greedy algorithms, hill climbing, simulated annealing, tabu search, and ant colony algorithms. It provides examples of how these algorithms can be applied to job scheduling problems and the traveling salesman problem. The advantages of heuristic algorithms are that they can find approximate solutions quickly without needing to derive formulas, but the disadvantage is solutions are not guaranteed to be optimal.
Talk for first-year PhD students at the CRG. The goal of the talk was to present scenarios that students will likely face and that can compromise reproducibility and efficiency in the analysis of data in the life sciences. Importantly, making the questions is probably more important than the given answers.
Similar to A system to generate rhythms automatically for songs in rhythm game (20)
Project Management: The Role of Project Dashboards.pdfKarya Keeper
Project management is a crucial aspect of any organization, ensuring that projects are completed efficiently and effectively. One of the key tools used in project management is the project dashboard, which provides a comprehensive view of project progress and performance. In this article, we will explore the role of project dashboards in project management, highlighting their key features and benefits.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfVALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Odoo releases a new update every year. The latest version, Odoo 17, came out in October 2023. It brought many improvements to the user interface and user experience, along with new features in modules like accounting, marketing, manufacturing, websites, and more.
The Odoo 17 update has been a hot topic among startups, mid-sized businesses, large enterprises, and Odoo developers aiming to grow their businesses. Since it is now already the first quarter of 2024, you must have a clear idea of what Odoo 17 entails and what it can offer your business if you are still not aware of it.
This blog covers the features and functionalities. Explore the entire blog and get in touch with expert Odoo ERP consultants to leverage Odoo 17 and its features for your business too.
An Overview of Odoo ERP
Odoo ERP was first released as OpenERP software in February 2005. It is a suite of business applications used for ERP, CRM, eCommerce, websites, and project management. Ten years ago, the Odoo Enterprise edition was launched to help fund the Odoo Community version.
When you compare Odoo Community and Enterprise, the Enterprise edition offers exclusive features like mobile app access, Odoo Studio customisation, Odoo hosting, and unlimited functional support.
Today, Odoo is a well-known name used by companies of all sizes across various industries, including manufacturing, retail, accounting, marketing, healthcare, IT consulting, and R&D.
The latest version, Odoo 17, has been available since October 2023. Key highlights of this update include:
Enhanced user experience with improvements to the command bar, faster backend page loading, and multiple dashboard views.
Instant report generation, credit limit alerts for sales and invoices, separate OCR settings for invoice creation, and an auto-complete feature for forms in the accounting module.
Improved image handling and global attribute changes for mailing lists in email marketing.
A default auto-signature option and a refuse-to-sign option in HR modules.
Options to divide and merge manufacturing orders, track the status of manufacturing orders, and more in the MRP module.
Dark mode in Odoo 17.
Now that the Odoo 17 announcement is official, let’s look at what’s new in Odoo 17!
What is Odoo ERP 17?
Odoo 17 is the latest version of one of the world’s leading open-source enterprise ERPs. This version has come up with significant improvements explained here in this blog. Also, this new version aims to introduce features that enhance time-saving, efficiency, and productivity for users across various organisations.
Odoo 17, released at the Odoo Experience 2023, brought notable improvements to the user interface and added new functionalities with enhancements in performance, accessibility, data analysis, and management, further expanding its reach in the market.
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid
IBM watsonx Code Assistant for Z, our latest Generative AI-assisted mainframe application modernization solution. Mainframe (IBM Z) application modernization is a topic that every mainframe client is addressing to various degrees today, driven largely from digital transformation. With generative AI comes the opportunity to reimagine the mainframe application modernization experience. Infusing generative AI will enable speed and trust, help de-risk, and lower total costs associated with heavy-lifting application modernization initiatives. This document provides an overview of the IBM watsonx Code Assistant for Z which uses the power of generative AI to make it easier for developers to selectively modernize COBOL business services while maintaining mainframe qualities of service.
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
14 th Edition of International conference on computer visionShulagnaSarkar2
About the event
14th Edition of International conference on computer vision
Computer conferences organized by ScienceFather group. ScienceFather takes the privilege to invite speakers participants students delegates and exhibitors from across the globe to its International Conference on computer conferences to be held in the Various Beautiful cites of the world. computer conferences are a discussion of common Inventions-related issues and additionally trade information share proof thoughts and insight into advanced developments in the science inventions service system. New technology may create many materials and devices with a vast range of applications such as in Science medicine electronics biomaterials energy production and consumer products.
Nomination are Open!! Don't Miss it
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
Malibou Pitch Deck For Its €3M Seed Roundsjcobrien
French start-up Malibou raised a €3 million Seed Round to develop its payroll and human resources
management platform for VSEs and SMEs. The financing round was led by investors Breega, Y Combinator, and FCVC.
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
Engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
A system to generate rhythms automatically for songs in rhythm game
1. A System to Generate Rhythms Automatically for Songs in Rhythm Game
Kuan-Ting Chen 陳冠廷
Advisor: Yi-Shin Chen
Institute of Information Systems and Applications
National Tsing Hua University
3. Introduction
• There are many different music genres
• People have music genres they prefer and genres they do not prefer
• Our purpose is to engage people in the music genres they do not prefer
5. Related Work
• Conducting systems
– iConductor [Liu et al. 2009]
– An interactive conducting system using Kinect [Toh et al. 2012]
– Music interaction on mobile phones [Chao et al. 2014]
6. Related Work
• Rhythm games
– Related to melody
• Guitar Hero
• Magic Piano
• Sing! Karaoke
– Only related to tempo
• 太鼓の達人 (Taiko no Tatsujin)
• ドラムマニア (DrumMania)
7. Related Work
• But in existing rhythm games, the tempo hints are the same every time a player plays
• Replayability [B. Shelley. Guidelines for Developing Successful Games. 2001] motivates us to generate dynamic tempo hints for the rhythm game
8. Objective
• Static Mode
– For people who want to achieve a good score in the rhythm game
• Dynamic Mode
– For people who want freshness in the rhythm game
– Makes people stick with the rhythm game longer
– Makes the game more flexible (players just input a MIDI file and can play directly, with no need to compose a rhythm sheet themselves)
11. Methodology
Data Collection
• Static Mode
– Downloaded a specific MIDI file from “https://freemidi.org/”
• Dynamic Mode
– Goal: extract rhythm patterns that are common across many songs
– Downloaded 24,836 MIDI files from “https://freemidi.org/”
13. Methodology
Data Preprocessing
• Static Mode
– Convert the specific MIDI file to an XML file
• Dynamic Mode
– Process only the 15,493 MIDI files in 4/4 time
– Convert them to XML files
14. Methodology
Data Preprocessing
• Instrument number
• Measure number
• Number or character format (duration counted in 32nd notes)
– 32nd note → 1
– Quarter note → 8
– Half note → G
– Whole note → W
– 32nd rest → (1)
– Quarter rest → (8)
– Half rest → (G)
– Whole rest → (W)
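The duration symbols listed on this slide can be read as counts of 32nd notes written as single digits (1 to 9, then A = 10 up to W = 32). The sketch below encodes a duration under that reading. The slide only states the symbols for 1, 8, G and W, so the intermediate symbols, the `DIGITS` table and the function name are assumptions.

```python
# Assumed digit table: index = duration in 32nd notes (1..9, A=10, ..., G=16, ..., W=32).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"


def encode_duration(units_32nd: int, is_rest: bool = False) -> str:
    """Return the symbol for a note or rest lasting `units_32nd` 32nd notes."""
    if not 1 <= units_32nd <= 32:
        raise ValueError("duration must be between 1 and 32 thirty-second notes")
    symbol = DIGITS[units_32nd]
    # Rests are written in parentheses, e.g. a quarter rest is "(8)".
    return f"({symbol})" if is_rest else symbol
```

For example, `encode_duration(8)` gives the quarter-note symbol and `encode_duration(16, is_rest=True)` gives the half-rest symbol.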
17. Methodology
Pattern Extraction
• Longest Common Subsequence (LCS) Algorithm
– Example input rhythm strings: (G)8(8)(4)4(8)8(8) (4)4(8)8(8) (4)4(8)8(4)4 (4)4(8)8(8)(4)4(8)8(8)(4)4(8)8(4)4 (4)4(8)44(4)4
– LCS result: (4)4(8)8(8)(4)4(8)8(8)(4)4(8)8(4)4
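To make the LCS step concrete, here is a minimal sketch of the textbook dynamic-programming LCS applied to rhythm strings. It assumes each note or rest symbol is one token. The tokenizer, the function names and the choice of which two sequences to compare are illustrative and not taken from the slides.

```python
import re

# One token is either a rest such as "(8)" or a note such as "8".
TOKEN_RE = re.compile(r"\([1-9A-W]\)|[1-9A-W]")


def tokenize(rhythm: str) -> list:
    """Split '(4)4(8)8' into ['(4)', '4', '(8)', '8'], ignoring spaces."""
    return TOKEN_RE.findall(rhythm.replace(" ", ""))


def longest_common_subsequence(a: list, b: list) -> list:
    """Return one longest common subsequence of the token lists a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    result, i, j = [], 0, 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result
```

The table costs time and memory proportional to the product of the two sequence lengths, which is worth keeping in mind when comparing many songs.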
18. Methodology
Pattern Extraction
• Use the LCS algorithm, then divide the count depending on the length of the song
• Aggregate the LCS results from all songs
• Combine results that are effectively duplicates in the rhythm game
• Keep patterns whose count is greater than a threshold, and split them into patterns whose length is one measure
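The aggregation and thresholding on this slide could look roughly like the sketch below. It assumes each song contributes a dictionary of pattern counts and that "length of the song" means its number of measures. Both assumptions, along with the function and parameter names, are hypothetical.

```python
from collections import Counter


def aggregate_patterns(per_song_counts, song_lengths, threshold):
    """Sum length-normalized pattern counts over all songs and keep frequent ones.

    per_song_counts: one {pattern: count} dict per song (assumed input shape)
    song_lengths: number of measures of each song (assumed normalizer)
    threshold: patterns must score strictly above this value to be kept
    """
    totals = Counter()
    for counts, length in zip(per_song_counts, song_lengths):
        for pattern, count in counts.items():
            # Normalize so that long songs do not dominate the totals.
            totals[pattern] += count / length
    return {pattern: score for pattern, score in totals.items() if score > threshold}
```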
23. Methodology
Difficulty Score Calculation
• Longest Repetition method
– “SuccessiveSubPattern” function
• Ex. Input: “44442(2)2(2)44” Output: “4444”, “2(2)2(2)” and “44”
– “FindUnitPattern” function
• Ex. Input: “4444” Output: “4”
– “LongestRepeatedSubPattern” function
• Ex. Input: “84(8)84” Output: “84”, because “84” occurs twice in this pattern
– “Difficulty Score” function
• rt: rhythm notation value
• Note: $\frac{1}{rt}$ Ex. quarter note → 1/8, 32nd note → 1/1
• Rest: $-\frac{1}{rt \times 2}$ Ex. quarter rest → -1/16
• One quarter note plus one quarter rest equals one half note in the rhythm game, so the rest score is set to make the two sum to the half-note score (1/8 + (-1/16) = 1/16)
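As a small illustration of the per-symbol scores and of “FindUnitPattern”, the sketch below uses exact fractions. It assumes symbols are digits counting 32nd notes (1 to 9, then A = 10 up to W = 32). The function bodies are one plausible reading of the slide, not the author's code.

```python
import re
from fractions import Fraction

# Assumed digit table: index = rt, the duration in 32nd notes.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"
TOKEN_RE = re.compile(r"\([1-9A-W]\)|[1-9A-W]")


def tokenize(pattern: str) -> list:
    """Split a pattern such as '2(2)' into ['2', '(2)']."""
    return TOKEN_RE.findall(pattern)


def token_score(token: str) -> Fraction:
    """Note: 1/rt. Rest (written in parentheses): -1/(rt*2)."""
    if token.startswith("("):
        return Fraction(-1, 2 * DIGITS.index(token[1]))
    return Fraction(1, DIGITS.index(token))


def find_unit_pattern(tokens: list) -> list:
    """Return the shortest prefix whose repetition rebuilds the whole list."""
    n = len(tokens)
    for size in range(1, n + 1):
        if n % size == 0 and tokens[:size] * (n // size) == tokens:
            return tokens[:size]
    return tokens
```

With these definitions a quarter note plus a quarter rest scores the same as a half note, which matches the reasoning on the slide.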
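24. Methodology
Difficulty Score Calculation
• Output of the “SuccessiveSubPattern” function is not null:
– Ex. Pattern: “44442(2)2(2)44”, split into “4444”, “2(2)2(2)” and “44”
– Score = $\frac{1}{4} \times \frac{1}{4}$ (“4” repeated four times) $+ \left(\frac{1}{2} - \frac{1}{4}\right) \times \frac{1}{2}$ (“2(2)” repeated two times) $+ \frac{1}{4} \times \frac{1}{2}$ (“4” repeated two times)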
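The arithmetic of this example can be checked with a short sketch. It takes the segments as already split (the splitting rule of “SuccessiveSubPattern” is not spelled out on the slide) and divides each unit-pattern score by its number of repetitions. The helper names and the digit table are assumptions.

```python
import re
from fractions import Fraction

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"  # assumed: index = duration in 32nd notes
TOKEN_RE = re.compile(r"\([1-9A-W]\)|[1-9A-W]")


def token_score(token: str) -> Fraction:
    """Note: 1/rt. Rest: -1/(rt*2)."""
    if token.startswith("("):
        return Fraction(-1, 2 * DIGITS.index(token[1]))
    return Fraction(1, DIGITS.index(token))


def unit_and_repeats(tokens: list):
    """Shortest repeating unit of a non-empty token list and its repetition count."""
    n = len(tokens)
    for size in range(1, n + 1):
        if n % size == 0 and tokens[:size] * (n // size) == tokens:
            return tokens[:size], n // size


def successive_score(segments: list) -> Fraction:
    """Sum of (unit score / repetitions) over pre-split, non-empty segments."""
    total = Fraction(0)
    for segment in segments:
        unit, repeats = unit_and_repeats(TOKEN_RE.findall(segment))
        total += sum(map(token_score, unit)) / repeats
    return total
```

Calling `successive_score(["4444", "2(2)2(2)", "44"])` adds 1/16, 1/8 and 1/8, so more repetition lowers the contribution of a segment.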
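25. Methodology
Difficulty Score Calculation
• Output of the “SuccessiveSubPattern” function is null, but output of the “LongestRepeatedSubPattern” function is not null:
– Ex. Pattern: “84(8)84”, grouped as “84”, “84” and “(8)”
– Score = $\left(\frac{1}{8} + \frac{1}{4}\right) \times \frac{1}{2} \times 1.2 + \left(-\frac{1}{16}\right)$, where $\times \frac{1}{2}$ reflects “84” repeating two times and $\times 1.2$ reflects that the repetitions are not adjacent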
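A possible reading of this case is sketched below: find the longest token run that occurs at least twice without overlapping, divide its score by the number of occurrences, apply the 1.2 factor, and add the scores of the leftover symbols. The search strategy and all names are assumptions. Only the arithmetic of the example comes from the slide.

```python
import re
from fractions import Fraction

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"  # assumed: index = duration in 32nd notes
TOKEN_RE = re.compile(r"\([1-9A-W]\)|[1-9A-W]")
NON_ADJACENT_FACTOR = Fraction(6, 5)  # the 1.2 multiplier from the slide


def token_score(token: str) -> Fraction:
    """Note: 1/rt. Rest: -1/(rt*2)."""
    if token.startswith("("):
        return Fraction(-1, 2 * DIGITS.index(token[1]))
    return Fraction(1, DIGITS.index(token))


def longest_repeated_subpattern(tokens: list) -> list:
    """Longest run of tokens occurring at least twice without overlap, or []."""
    n = len(tokens)
    for size in range(n // 2, 0, -1):
        for start in range(n - 2 * size + 1):
            candidate = tokens[start:start + size]
            for other in range(start + size, n - size + 1):
                if tokens[other:other + size] == candidate:
                    return candidate
    return []


def repeated_score(pattern: str) -> Fraction:
    """Score a pattern whose repetitions are not adjacent."""
    tokens = TOKEN_RE.findall(pattern)
    unit = longest_repeated_subpattern(tokens)
    if not unit:
        raise ValueError("pattern has no repeated sub-pattern")
    count, leftover, i = 0, [], 0
    while i < len(tokens):  # remove occurrences of the unit, keep the rest
        if tokens[i:i + len(unit)] == unit:
            count += 1
            i += len(unit)
        else:
            leftover.append(tokens[i])
            i += 1
    unit_score = sum(map(token_score, unit))
    return unit_score / count * NON_ADJACENT_FACTOR + sum(map(token_score, leftover))
```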
26. Methodology
Difficulty Score Calculation
• Outputs of the “SuccessiveSubPattern” function and the “LongestRepeatedSubPattern” function are both null:
– Ex. Pattern: “(2)BC7”
– Score = $\left(-\frac{1}{4}\right) + \frac{1}{11} + \frac{1}{12} + \frac{1}{7}$
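When no repetition is found, the score in this example is the plain sum of the per-symbol scores. A minimal sketch, assuming the symbols B and C stand for 11 and 12 thirty-second notes (consistent with the 1/11 and 1/12 terms in the example) and using a hypothetical function name:

```python
import re
from fractions import Fraction

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"  # assumed: index = duration in 32nd notes
TOKEN_RE = re.compile(r"\([1-9A-W]\)|[1-9A-W]")


def plain_score(pattern: str) -> Fraction:
    """Sum 1/rt for each note and -1/(rt*2) for each rest in the pattern."""
    total = Fraction(0)
    for token in TOKEN_RE.findall(pattern):
        if token.startswith("("):
            total += Fraction(-1, 2 * DIGITS.index(token[1]))
        else:
            total += Fraction(1, DIGITS.index(token))
    return total
```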
30. Methodology
Pattern Replacement
• Static Mode
– Ex. Easy difficulty level: → Medium difficulty level:
• Dynamic Mode
– The generated rhythm should match the specific song rather than be totally unrelated to it
– AAAB → CCCD (A, B, C and D are each one measure long)
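The AAAB → CCCD idea can be sketched as a consistent substitution: every distinct measure in the song is mapped to one replacement pattern, so the repetition structure of the song survives. The function below is a hypothetical illustration. It assumes the candidate patterns were already filtered to the target difficulty level.

```python
def replace_measures(measures: list, candidates: list) -> list:
    """Replace each distinct measure with its own candidate, reused consistently."""
    mapping = {}
    pool = iter(candidates)
    replaced = []
    for measure in measures:
        if measure not in mapping:
            replacement = next(pool, None)
            if replacement is None:
                raise ValueError("not enough candidate patterns for this song")
            mapping[measure] = replacement
        # Identical source measures always receive the same replacement.
        replaced.append(mapping[measure])
    return replaced
```

For instance, `replace_measures(["A", "A", "A", "B"], ["C", "D"])` keeps the three-plus-one structure of the original measures.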
31. Experiment
Experimental Setup
• Nexus 5 smartphone and an open-source rhythm game (Beats)
• Two sets of experiments:
– In a quiet environment
• 22 people (11 started from their preferred music genre, the other 11 started from their not preferred music genre)
– In a noisy environment
• 6 people (all started from their not preferred music genre)
32. Experiment
Experimental Setup
• Classical music and popular music
• Five songs offered in each music genre for selection
• Listening → Playing
• Open triangle sound for classical music, bass drum sound for popular music
• Participants rated how interesting the experience was on a 5-point Likert scale
35. Experiment
Experiment Result
Quiet environment
[Figures: listening duration vs. playing duration in seconds per user, users 1 to 22; panels: Preferred music genre, Not preferred music genre]
• The hypothesis passed the t-test with p-value $2.41941 \times 10^{-9}$, which indicates that the result is significant at the p < 0.001 level (99.9% confidence level).
• The hypothesis passed the t-test with p-value $6.5886 \times 10^{-8}$, which indicates that the result is significant at the p < 0.001 level (99.9% confidence level).
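The slides do not say which t-test variant was used. As a hedged illustration, the sketch below computes the statistic of a paired t-test on per-user durations, which is one natural choice when each user both listens and plays. The sample numbers are made up for the example and are not the study's data.

```python
import math
import statistics


def paired_t_statistic(listening: list, playing: list) -> float:
    """t statistic for the mean per-user difference (playing - listening)."""
    if len(listening) != len(playing) or len(listening) < 2:
        raise ValueError("need two equally long samples with at least two users")
    differences = [p - l for l, p in zip(listening, playing)]
    mean_difference = statistics.mean(differences)
    # Sample standard deviation (n - 1 in the denominator).
    std_difference = statistics.stdev(differences)
    return mean_difference / (std_difference / math.sqrt(len(differences)))


# Made-up durations in seconds, for illustration only (not the study's data).
example_listening = [100, 100, 100]
example_playing = [110, 120, 130]
t_value = paired_t_statistic(example_listening, example_playing)
```

Under this assumption the p-value would then be read from Student's t distribution with n - 1 degrees of freedom.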
36. Experiment
Experiment Result
Quiet environment
[Figure: 5-point Likert scale rating per user, users 1 to 22, for preferred and not preferred music genres]
• Preferred music genre average points: 3.77
• Not preferred music genre average points: 3.63
• Participant feedback:
– The open triangle sound does not seem to match the song
– In dynamic mode, the rhythm sometimes does not match the song, which makes the bass drum sound strange
– The music quality is not good (because MIDI files are used)
37. Experiment
Experiment Result
Noisy environment
[Figures: listening duration vs. playing duration in seconds per user, users 1 to 6; panels: Preferred music genre, Not preferred music genre]
• The hypothesis passed the t-test with p-value 0.03606, which indicates that the result is significant at the p < 0.05 level (95% confidence level).
• The hypothesis passed the t-test with p-value 0.00156, which indicates that the result is significant at the p < 0.05 level (95% confidence level).
38. Experiment
Experiment Result
Noisy environment
[Figure: 5-point Likert scale rating per user, users 1 to 6, for preferred and not preferred music genres]
• Preferred music genre average points: 4.83
• Not preferred music genre average points: 4
• Participant feedback:
– The tempo hints in the popular song are not exciting
39. Conclusion and Future Work
• We propose a system that automatically generates rhythms for songs in a rhythm game
• The experimental results show that, for both preferred and not preferred music genres, users' playing duration is significantly longer than their listening duration
• We expect that our rhythm game can engage people in music genres that are uncommon in their daily lives
40. Conclusion and Future Work
• Our rhythm game uses MIDI files for play. If MP3 files could be used instead, the game experience would be better
• If the patterns could be further adjusted to suit each specific song, the game experience would be better
43. References
• Beats portable. https://github.com/Keripo/Beats.
• DrumMania (ドラムマニア). https://en.wikipedia.org/wiki/Drum_Mania.
• Freemidi.org. https://freemidi.org/
• Guitar Hero. https://en.wikipedia.org/wiki/Guitar_Hero.
• Longest common subsequence. https://en.wikipedia.org/wiki/Longest_common_subsequence_problem.
• Magic Piano. https://play.google.com/store/apps/details?id=com.smule.magicpiano&hl=zh_TW.
• Musescore. https://musescore.org/.
• Sing! Karaoke. https://en.wikipedia.org/wiki/Sing!_Karaoke.
• Taiko no Tatsujin (太鼓の達人). https://en.wikipedia.org/wiki/Taiko_no_Tatsujin.
• [Chao et al. 2014] W. Chao, K.-T. Chen, and Y.-S. Chen. Music interaction on mobile phones. In Advances in Multimedia Information Processing - PCM 2014, pages 153–162. Springer, 2014.
44. References
• [Lerdahl et al. 1983] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press, Cambridge, 1983.
• [Liu et al. 2009] Y.-L. Liu and Y.-S. Chen. iconduct: music control in the interactive conducting system. Technical report, 2009.
• B. Shelley. Guidelines for Developing Successful Games. http://www.gamasutra.com/view/feature/3041/guidelines_for_developing_.php, 2001. [Online; accessed 15-August-2001].
• [Toh et al. 2012] L.-W. Toh, W. Chao, and Y.-S. Chen. An interactive conducting system using kinect. In Proceedings of The 2013 IEEE International Conference on Multimedia and Expo, pages 1–6. IEEE, 2012.
• [Toussaint 2002] G. T. Toussaint. A mathematical analysis of African, Brazilian and Cuban clave rhythms. In Proceedings of BRIDGES: Mathematical Connections in Art, Music and Science, pages 157–168, 2002.
Editor's Notes
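Add an Objective slide: what goals lead to building the two modes
Make the text larger
Why use LCS?
Give an example; add animation
It's My Life: acoustic snare drum
The count should not be divided by the number of measures; it should be divided by the number of times LCS was run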