This seminar report summarizes query by humming technology. The basic architecture involves extracting melodic information from a hummed input, transcribing it, and comparing it to melodic contours in a database. Challenges include imperfect user queries and accurately capturing pitches from hums. Popular query by humming applications include Shazam, SoundHound, and Midomi. The report also discusses file formats like WAV and MIDI, and the Parsons code algorithm for representing melodies.
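The Parsons code mentioned in this summary reduces a melody to its contour: each note is marked up (u), down (d), or repeat (r) relative to the previous note, with * standing for the first note. A minimal sketch, with illustrative MIDI pitch values:

```python
def parsons_code(pitches):
    """Encode a pitch sequence as a Parsons contour string."""
    if not pitches:
        return ""
    code = ["*"]  # the first note carries no contour information
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("u")
        elif cur < prev:
            code.append("d")
        else:
            code.append("r")
    return "".join(code)

# Opening of "Twinkle Twinkle Little Star" as MIDI note numbers
print(parsons_code([60, 60, 67, 67, 69, 69, 67]))  # -> *rururd
```

Because only relative motion is kept, a hummed query that is off-key but has the right shape still produces the same code as the stored melody.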
This certificate confirms that Ressania Chiwara was awarded the ACCA Advanced Diploma in Accounting and Business in February 2013. It was issued by the Association of Chartered Certified Accountants and contains Ressania Chiwara's registration number, certificate number, and a signature from the director of learning. The certificate remains the property of ACCA and cannot be altered or defaced.
Tendai Mugwagwa has completed the professional level of ACCA examinations, passing five subjects - P1 Governance, Risk and Ethics, P2 Corporate Reporting, P3 Business Analysis, P4 Advanced Financial Management, and P7 Advanced Audit and Assurance. This April 2016 certificate from the Association of Chartered Certified Accountants confirms Tendai Mugwagwa's registration number and qualifications at the professional level.
The Sistem Multimedia (Multimedia Systems) course gives Informatics Engineering students an understanding of basic multimedia concepts and the components that make up a multimedia system. Across seven main topics, the course covers multimedia content production, data representation, storage, networking, distribution, and security.
Multimedia Information Retrieval: Bytes and pixels meet the challenges of hum... (maranlar)
Within computer science, "Multimedia" is a field of research that investigates how computers can support people in communication, information finding, and knowledge/opinion building. Multimedia content is defined broadly. It includes not only video, but also images accompanied by text and other information (for example, a geo-location). It can be professionally produced, or generated by users for online sharing. Computer scientists historically have a “love-hate” relationship with multimedia. They “love” it because of the richness of the data sources and the wealth of available data, which leads to interesting problems to tackle with machine learning. They “hate” it because multimedia is a diffuse and moving target: the interpretation of multimedia differs from person to person, and changes over time in the course of its use as a communication medium. This talk gives a view onto ongoing research in the area of multimedia information retrieval algorithms, which help people find multimedia. We look at a series of topics that reveal how pattern recognition, text processing, and crowdsourcing tools are used in multimedia research, and discuss both their limitations and their potential.
The document discusses developing a model to compose monophonic world music using deep learning techniques. It proposes using a bi-axial recurrent neural network with one axis representing time and the other representing musical notes. The network will be trained on a dataset of MIDI files describing pitch, timing, and velocity of notes. It will also incorporate information from music theory on scales, chords, and other elements extracted from sheet music files. The goal is to generate unique musical sequences while adhering to music theory rules. The model aims to address the problem of composing long durations of background music for public spaces in an automated way.
This document presents a device called the Tonalyzer, which provides a visual representation of tone to help musicians understand and achieve their desired tone. The device uses audio processing and Fourier analysis to analyze the frequency components of an input sound and display them graphically in real-time. It also allows users to save tone profiles for later comparison. An extensive user survey found that most target users are experienced musicians who struggle to describe tone and would benefit from a device to analyze and match tones. The key user specifications for the Tonalyzer are audio input, an interactive visual display, tone storage capabilities, durability for portable use, and long battery life to support musician needs.
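The kind of Fourier analysis the Tonalyzer summary describes can be illustrated with a naive discrete Fourier transform: synthesize a tone, transform it, and read off the dominant frequency bin. The 440 Hz test tone, sample rate, and window length below are illustrative:

```python
import cmath, math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; magnitude of each frequency bin up to Nyquist."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

fs = 8000                       # sample rate (Hz)
n = 800                         # analysis window length (bin width = 10 Hz)
tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(n)]

mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin * fs / n)        # dominant frequency -> 440.0
```

A real device would use an FFT and display the full magnitude spectrum rather than just the peak, but the principle of reading tone colour off the frequency components is the same.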
The document discusses speech recognition and voice recognition. It covers what voice is, the components of sound, why voices are different, classification of speech sounds, the speech production process, what voice recognition is, automatic speech recognition (ASR), types of ASR systems including speaker-dependent and speaker-independent, approaches to speech recognition including template matching and statistical approaches, and the process of speech recognition.
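The template-matching approach mentioned in this speech-recognition summary is classically implemented with dynamic time warping (DTW), which aligns two feature sequences that differ in length and speaking rate. A minimal sketch over 1-D feature values (the sequences are illustrative):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

template = [1, 2, 3, 4, 3, 2]
spoken   = [1, 1, 2, 3, 3, 4, 3, 2]    # same shape, stretched in time
print(dtw_distance(template, spoken))  # -> 0.0: the warp absorbs the stretch
```

A plain sample-by-sample comparison could not even be computed here because the sequences have different lengths; DTW's flexible alignment is what makes template matching workable for speech.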
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit... (TELKOMNIKA JOURNAL)
Traditional packet-level Forward Error Correction approaches can contain the errors caused by small, sporadic network losses, but when large portions of the stream drop out, listening quality suffers. Services such as audio-on-demand drastically increase network loads, so new, robust, and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, takes account of the semantics and natural repetition of music through metadata tagging. Similarity detection within polyphonic audio has posed difficult challenges in the field of Music Information Retrieval. We present a system that works at the content level, making it applicable to existing streaming services. The MPEG–7 Audio Spectrum Envelope (ASE) provides features for extraction which, combined with k-means clustering, enable self-similarity detection within polyphonic audio.
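The ASE-plus-k-means idea in the abstract above can be illustrated with a toy clustering of per-frame features: frames that land in the same cluster are treated as self-similar, so a repeated chorus can be sent once and referenced thereafter. The scalar "energy" features and k=2 below stand in for real multi-dimensional ASE vectors:

```python
def kmeans_1d(values, k, iters=20):
    """Toy 1-D k-means; returns a cluster label per value."""
    centroids = sorted(set(values))[:k]          # deterministic init
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:                          # keep centroid if cluster empties
                centroids[c] = sum(members) / len(members)
    return labels

# Per-frame feature values for a song shaped quiet-loud-quiet-loud
frames = [0.2, 0.25, 0.22, 0.9, 0.95, 0.21, 0.24, 0.92, 0.88]
labels = kmeans_1d(frames, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 0, 0, 1, 1]: repeated sections share a label
```

Frames 3-4 and 7-8 receive the same label, which is exactly the self-similarity signal a streaming client could exploit when packets covering one of the repeats are lost.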
This document discusses the use of artificial intelligence in organized sound as surveyed in the journal Organised Sound. It provides an overview of key AI technologies like Auto-Tune audio processing that can correct pitch and organize sound. Applications discussed include general sound classification, open sound control for music networking, and time-frequency representations for sound analysis and resynthesis. The document also outlines recent research on intelligent composer assistants, responsive instruments, and recognition of musical sounds. Finally, it discusses the future of AI in organizing sound through planning and machine learning.
Application of Recurrent Neural Networks paired with LSTM - Music Generation (IRJET Journal)
This document discusses using recurrent neural networks and long short-term memory networks to generate music. It notes that producing music can be expensive, but an AI system could provide a cheaper alternative for businesses. The system would be trained on music theory concepts like notes, chords, scales, and keys to understand harmonious combinations. A web-based platform could then generate custom music by feeding user selections as input to the trained machine learning model. The goal is an affordable way for companies to automatically produce unique music for branding and promotions.
This document describes a student project to create an algorithm that generates a short music playlist based on one or more seed songs. The algorithm is based on a previous published method called AutoDJ that uses Gaussian process regression with a kernel function to predict a user's preference for additional songs based on attributes of the seed songs. The key aspects of the student's algorithm include using a kernel that is trained on song attribute data to learn song similarities, and generating a playlist sorted by predicted user preference for each song. The student's model and data differ from the original AutoDJ method primarily due to having a mix of continuous and categorical song attributes rather than purely categorical data.
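The AutoDJ-style approach described above can be sketched with a simple kernel that mixes continuous and categorical song attributes, ranking candidates by their average kernel similarity to the seed songs. The attribute names, songs, and kernel weights are all illustrative, and plain similarity averaging stands in here for full Gaussian process regression:

```python
import math

def song_kernel(a, b):
    """Similarity of two songs: RBF on tempo plus a genre-match term."""
    tempo_sim = math.exp(-((a["tempo"] - b["tempo"]) / 30.0) ** 2)
    genre_sim = 1.0 if a["genre"] == b["genre"] else 0.0
    return 0.5 * tempo_sim + 0.5 * genre_sim

def playlist(seeds, candidates, n=3):
    """Rank candidates by mean kernel similarity to the seed songs."""
    score = lambda s: sum(song_kernel(s, seed) for seed in seeds) / len(seeds)
    return [s["title"] for s in sorted(candidates, key=score, reverse=True)][:n]

seeds = [{"title": "Seed", "tempo": 120, "genre": "rock"}]
candidates = [
    {"title": "A", "tempo": 122, "genre": "rock"},
    {"title": "B", "tempo": 121, "genre": "jazz"},
    {"title": "C", "tempo": 80,  "genre": "rock"},
]
print(playlist(seeds, candidates))  # -> ['A', 'C', 'B']
```

Note how the kernel handles the mixed-attribute problem the student highlights: the RBF term compares the continuous tempo smoothly, while the genre term is a hard categorical match, so a same-genre song at a distant tempo ("C") still outranks a close-tempo song in another genre ("B").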
This document provides an overview of a dissertation on Emofy, a classical music recommender system. The summary includes:
- Emofy is a music recommender system that classifies the user's mood and recommends classical Indian music by associating different ragas and genres with each mood.
- The dissertation discusses collecting and labeling a dataset of classical music, extracting features to classify mood, and using machine learning algorithms like random forests to achieve over 90% accuracy in mood classification.
- The recommender system uses the mood classification to map users to appropriate ragas and Spotify playlists of classical music tracks, aimed at therapeutic applications.
The document discusses how new multimedia technologies have changed musical culture and practices. It outlines how the music industry has shifted from CDs to online delivery and DAW production. It also discusses new trends in music consumption like music discovery sites, more passionate fans, the influence of celebrity culture, and openness to brand sponsorships. New content models are emerging like remixes, mashups, and live DJ sets. Research topics discussed include new methods of music representation, interaction rules for group experiences, automatic structure discovery, and characterizing aesthetics and emotions.
How Can The Essen Associative Code Be Used (lahtrumpet)
The Essen Associative Code (EsAC) database and associated software tools can be used for musical analysis, sight-singing, analyzing recorded and printed music, and researching melodies. The Humdrum Toolkit is free software that allows users to encode music data in the EsAC format and analyze things like pitch contours, intervals, and phrase repetition. David Huron used the Humdrum Toolkit and EsAC database to analyze folksong melodies and found they tend to rise and fall on average. The Themefinder system allows searching the EsAC database to find musical examples for further analysis and comparison.
Jordan Smith has produced a glossary of terms related to sound design and production for computer games. The glossary contains definitions for terms like Foley Artistry, Sound Libraries, audio file formats like .wav and .mp3, limitations like RAM and mono audio, recording systems such as CDs and MIDI, sampling constraints like bit depth and sample rate, and tools like plug-ins and MIDI keyboards. Jordan provides context for each term and how it relates to his own production work where possible.
The document is a glossary created by a student, Steph Hawkins, for a unit on sound design and production. It contains definitions for over 15 key terms related to sound design methodology, file formats, audio limitations, and audio recording systems. For each term, Steph provides a short internet-researched definition and URL source, and also describes how the term relates to their own production practice.
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o... (Kosetsu Tsukuda)
This poster was presented at the 22nd International Society for Music Information Retrieval Conference (ISMIR 2021), held November 7-12, 2021.
The PDF of the presented paper is available at the following URL:
http://ktsukuda.me/wp-content/uploads/ISMIR2021_Lyrics_tsukuda.pdf
The document is a glossary of terms related to sound design and production for computer games. It contains definitions for terms like Foley artistry, sound libraries, audio file formats like .wav and .mp3, audio limitations involving hardware, recording systems, sampling, and more. For each term, it provides a short definition from an online source as well as any relevance to the author's own production practice.
The document proposes developing a complete music player application that integrates multiple music-related features into a single application. It discusses developing features like emotion recognition using neural networks to play music matching a user's mood, song mixing, YouTube linking to related music videos, karaoke, and lyrics display. The application would use technologies like Android Studio, MongoDB database, and APIs from other applications to consolidate functions currently found across multiple separate apps. This would provide a more unified music experience for users.
MLConf2013: Teaching Computer to Listen to Music (Eric Battenberg)
The document discusses machine listening and music information retrieval. It introduces common techniques in music auto-tagging like extracting features from audio spectrograms and training classifiers. Deep learning approaches that learn features directly from data are showing promise. Recurrent neural networks are discussed for modeling temporal dependencies in music, with an example of applying them to onset detection. The talk concludes with an example of live drum transcription using drum modeling, onset detection, spectrogram slicing and non-negative source separation.
The document provides an overview of Music Information Retrieval (MIR) techniques for analyzing music with computers. It discusses common MIR tasks like genre/mood classification, beat tracking, and music similarity. Recent approaches to music auto-tagging using deep learning are highlighted, such as using neural networks to learn features directly from audio rather than relying on hand-designed features. Recurrent neural networks are presented as a way to model temporal dependencies in music for applications like onset detection. As an example, the document describes a system for live drum transcription that uses onset detection, spectrogram slicing, and non-negative matrix factorization for source separation to detect drum activations in real-time performance audio.
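Onset detection, which both MIR summaries mention as a building block for drum transcription, can be illustrated with a simple energy-based novelty function: frame the signal, take the positive change in frame energy, and report frames where it spikes. Real systems use spectral flux over a spectrogram; the frame size, threshold, and toy signal below are illustrative:

```python
def detect_onsets(samples, frame=4, threshold=0.5):
    """Return frame indices where energy rises sharply (candidate onsets)."""
    energies = [sum(x * x for x in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    # positive first difference of the energy envelope ("novelty")
    novelty = [max(0.0, e2 - e1) for e1, e2 in zip(energies, energies[1:])]
    return [i + 1 for i, n in enumerate(novelty) if n > threshold]

# Silence, a loud burst, decay, then a second burst
signal = [0, 0, 0, 0,  1, 1, 1, 1,  0.2, 0.2, 0.2, 0.2,  1, 1, 1, 1]
print(detect_onsets(signal))  # -> [1, 3]: the two bursts
```

Keeping only positive energy changes is the key trick: decays and sustains produce no novelty, so only attacks survive, which is what a drum transcription front-end needs before the source-separation stage.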
The document describes an Android application called AllDup Music that identifies and removes duplicate music files from a user's phone. It does this by comparing the frequency of music files using a minhashing algorithm to detect duplicates, then prompts the user to delete any redundant files. The application aims to save storage space by eliminating duplicate music files that may have different names but identical content.
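The minhashing idea in the AllDup summary estimates set similarity by keeping, for each of several salted hash functions, the minimum hash value over a file's feature set; identical content yields identical signatures regardless of file name. A sketch using stdlib hashing over byte shingles (the shingle width and number of hash functions are illustrative):

```python
import hashlib

def shingles(data, width=4):
    """Overlapping byte n-grams of the content."""
    return {data[i:i + width] for i in range(len(data) - width + 1)}

def minhash_signature(items, num_hashes=32):
    """One minimum value per salted hash function."""
    def h(salt, item):
        return int.from_bytes(
            hashlib.sha256(bytes([salt]) + item).digest()[:8], "big")
    return [min(h(salt, item) for item in items) for salt in range(num_hashes)]

def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature slots ~ Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

content = b"identical audio bytes stored under two different file names"
sig_a = minhash_signature(shingles(content))
sig_b = minhash_signature(shingles(content))
other = minhash_signature(shingles(b"some wholly unlike bytes qqq zzz"))
print(estimated_similarity(sig_a, sig_b))  # -> 1.0: flagged as a duplicate
print(estimated_similarity(sig_a, other))  # near 0: not a duplicate
```

The short fixed-size signature is the point: the app can compare thousands of files by their 32-slot signatures instead of re-reading the full audio content for every pair.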
The document discusses sentiment analysis and opinion mining. It describes opinion mining as the process of analyzing text written in a natural language to classify it as positive, negative, or neutral based on the expressed sentiments. It outlines different levels of opinion mining including document, sentence, and aspect levels. It provides details on the typical architecture of an opinion mining system, including modules for preprocessing, part-of-speech tagging, aspect extraction, opinion identification, and orientation.
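The orientation module at the end of the opinion-mining pipeline described above can be sketched with a toy lexicon-based classifier: count positive and negative cue words and compare. The tiny lexicon is illustrative; real systems use much larger resources and learned models:

```python
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}

def orientation(text):
    """Classify text as positive, negative, or neutral by lexicon counts."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(orientation("The battery life is excellent and the screen is great!"))
# -> positive
print(orientation("Terrible support, and the camera is bad."))
# -> negative
```

The example also shows why the earlier pipeline stages matter: without part-of-speech tagging and aspect extraction, this classifier can only give a document-level verdict, not tell you that the battery was praised while the camera was panned.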
The document discusses big data and Hadoop as a framework for processing large datasets. It describes how Hadoop uses HDFS for storage and MapReduce for parallel processing. HDFS uses a master/slave architecture with a NameNode and DataNodes. MapReduce jobs are managed by a JobTracker and executed on TaskTrackers. The document provides an example of using MapReduce to find common friends between users. It concludes that Hadoop is capable of solving big data challenges through scalable and fault-tolerant distributed processing.
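The common-friends example in the Hadoop summary works by having the map phase emit, for each friendship edge (a, b), the sorted pair as key and a's friend list as value; the reduce phase then intersects the two lists received under each key. A minimal in-memory simulation of the two phases (the friend graph is illustrative):

```python
from collections import defaultdict

friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

# Map: for every friendship edge, emit (sorted pair) -> owner's friend set
emitted = defaultdict(list)
for person, flist in friends.items():
    for friend in flist:
        key = tuple(sorted((person, friend)))
        emitted[key].append(flist)

# Reduce: the common friends of a pair is the intersection of its two sets
common = {pair: vals[0] & vals[1] for pair, vals in emitted.items()}
print(sorted(common[("A", "B")]))  # -> ['C', 'D']
```

Sorting the pair into a canonical key is what lets Hadoop's shuffle stage route both halves of an edge to the same reducer; the in-memory defaultdict plays that role here.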
Similar to Query By Humming - Music Retrieval Technique
Big data processing using Hadoop Technology (Shital Kat)
This document summarizes a report on Hadoop technology as a solution to big data processing. It discusses the big data problem, including defining big data, its characteristics and challenges. It then introduces Hadoop as a solution, describing its components HDFS for storage and MapReduce for parallel processing. Examples of common friend lists and word counting are provided. Finally, it briefly mentions some Hadoop projects and companies that use Hadoop.
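The word-counting example mentioned in this summary is the canonical MapReduce job: the map phase emits (word, 1) for every token, and the reduce phase sums the counts per word. An in-memory sketch of the two phases (the input lines are illustrative):

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for each token in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data needs big tools", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts["big"])   # -> 3
print(counts["data"])  # -> 2
```

In a real cluster each mapper would process a different HDFS block and the shuffle stage would group the emitted pairs by word before they reach the reducers; the list comprehension and dictionary stand in for those distributed steps.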
School admission process management system (Documentation) (Shital Kat)
This document outlines the project plan for developing a School Admission Process Management System. It includes sections on project initiation and scheduling, diagrams of the system, a project cost estimation, designing the user interface, and plans for testing. The system will automate the currently manual paper-based admission process to make it faster and easier to use. It will store and process student personal, academic, and fee information using a web interface and backend database. Testing will include white box, black box, unit, integration, and system testing to ensure quality.
The document summarizes Shital Katkar's seminar presentation on WiFi technology. It discusses various topics related to WiFi including radio waves, flavors of WiFi standards, applications, advantages, limitations and security. The presentation covered key elements of a WiFi network, how WiFi works using radio signals and WiFi cards, different WiFi network topologies and security threats to WiFi like eavesdropping and denial of service attacks. It emphasized the need for WiFi security and discussed various security techniques.
This document discusses WiFi security and provides information on various topics related to securing wireless networks. It begins with an introduction to wireless networking and then covers security threats like eavesdropping and man-in-the-middle attacks. The document analyzes early security protocols like WEP that were flawed and discusses improved protocols like WPA and WPA2. It provides tips for securing a wireless network and examines potential health effects of WiFi radiation. The conclusion emphasizes that wireless security has improved greatly with new standards but work remains to be done.
This document discusses 802.11 WiFi technology. It describes the different WiFi standards including 802.11b, 802.11a, 802.11g, and 802.11n. The key components of a WiFi network are access points, WiFi cards, and security measures like firewalls. It also explains how WiFi networks use radio signals to transmit data wirelessly over short ranges, allowing devices to connect to the Internet without wires. Common network topologies for WiFi include infrastructure modes with an access point and peer-to-peer ad-hoc modes without an access point.
WiFi, also known as 802.11, allows devices to connect to a wireless network without needing wires. An access point is connected to the internet and creates a WiFi hotspot with a range of 100-150 feet indoors. Devices within this range can then connect wirelessly to browse the internet. WiFi standards like 802.11b and g operate at 2.4GHz while 802.11a and n can also use 5GHz. Later standards offer faster speeds and greater ranges. WiFi is popular for homes, small businesses, and public places as it offers mobility and easy installation without wired connections. Potential limitations include interference and limited range compared to wired networks.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
SEMINARS
OF
SEMESTER – II
[ YEAR 2013-2014 ]
NAME: SHITAL KATKAR
TOPIC : Query By Humming
SIGNATURE:________________
INDEX
1 Introduction
1.1 Query By Humming
2 Basic Architecture
2.1 Extraction
2.2 Transcription
2.3 Comparison
3 Applications
3.1 Shazam
3.2 Sound-Hound
3.3 Midomi
3.4 Musipedia
4 The art of Singing
4.1 Challenges
5 File Formats
5.1 WAV File Format
5.2 MIDI File Format
6 System Architecture
6.1 WAV to MIDI Conversion
7 Parsons Code Algorithm
7.1 Rules
7.2 Advantages
8 Benchmarking MIR Systems
8.1 Online MIR Systems
8.1.1 CatFind
8.1.2 MelDex
8.1.3 MelodyHound
8.1.4 ThemeFinder
8.1.5 Music Retrieval Demo
8.2 Comparison of MIR Systems
8.3 Evaluation Issues
8.4 Subjective and objective testing
9 Conclusion
1. INTRODUCTION
Many people remember a short tidbit of a song but fail to recall the song's name. If you can remember lyrics that correspond to the song you are trying to recall, finding the song is as easy as performing a text query on a web search engine. A query by humming system allows a user to find a song even if he merely knows the tune from part of the melody.
“I don’t know the name. I don’t know who does it. But I can’t get this song out of my head.” Well, why not just hum it?
Query by humming System
It is a music retrieval technology in which users can hum or sing a melody to retrieve the
song.
The user simply sings or hums the tune into a computer microphone, and the system searches through a database of songs for melodies containing the tune and returns a ranked list of search results. The user can then find the desired song by listening to the results.
A Query by Humming (QBH) system enables a user to hum a melody into a microphone
connected to a computer in order to retrieve a list of possible song titles that match the
query melody. The system analyzes the melodic and rhythmic information of the input
signal. The extracted data set is used as a database query. The result is presented as a list of
e.g. ten best matching results.
Generally, a QBH system is a kind of Music Information Retrieval (MIR) system. An MIR system provides several means of music retrieval: the query can be a hummed audio signal, but also a music genre classification or text information about the artist or title.
2. BASIC ARCHITECTURE
Fig- Basic System Architecture
The basic architecture of the system is depicted in the figure above. A microphone takes the hummed input and sends it as a PCM signal to the extraction block. The extracted information is then given to the transcription block, which forms a melody contour to be compared with all contours residing in the database. A result list is finally presented to the user.
Extraction
The extraction block is also referred to as the acoustic front end. After recording the signal
with a computer sound card the signal is band pass filtered to reduce environmental noise
and distortion. In this system a sampling rate of 8000 Hz is used. The signal is band limited
to 80 to 800 Hz, which is sufficient for sung input. This frequency range corresponds to a
musical note range of D2–G5.
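As a rough sketch of what the acoustic front end must do, the fundamental pitch of one frame of band-limited samples can be estimated by autocorrelation, searching only the lags that correspond to the 80–800 Hz sung range. The code below is an illustrative sketch introduced here, not the report's actual implementation:

```python
import math

def estimate_pitch(samples, fs=8000, fmin=80.0, fmax=800.0):
    """Estimate the fundamental frequency (Hz) of one audio frame by
    finding the autocorrelation peak within the sung-voice range."""
    lag_min = int(fs / fmax)                       # smallest lag = highest pitch
    lag_max = min(int(fs / fmin), len(samples) - 1)  # largest lag = lowest pitch
    best_lag, best_corr = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return fs / best_lag if best_lag else None

# A synthetic 440 Hz sine sampled at 8000 Hz, 0.1 s long.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
```

On the clean synthetic tone the estimate lands within a few hertz of 440 Hz; real hummed input is far noisier, which is why the front end band-pass filters the signal first.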
Transcription
The transcription block transcribes the extracted information into the representation that is
needed for comparison. The main task is to segment the input stream into single notes. This can be done using the Parsons code algorithm.
Comparison
The transcription result is used as the database query. Several distance measures can be used to find a similar piece of music. The database contains a collection of already transcribed melodies formatted according to the MelodyContourType. The result is finally presented to the user.
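One common distance measure for comparing a query contour against stored contours is the edit (Levenshtein) distance over Parsons-code strings. The sketch below, with a made-up two-song toy database, illustrates the idea; it is not the report's actual matcher:

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def rank_matches(query, database):
    """Return (title, distance) pairs, best match first."""
    return sorted(((title, edit_distance(query, contour))
                   for title, contour in database.items()),
                  key=lambda pair: pair[1])

# Toy database of Parsons-code contours (illustrative only).
songs = {"Twinkle Twinkle": "*RURURD",
         "Happy Birthday": "*RUDUD"}
```

Because the distance tolerates insertions, deletions and substitutions, a query with a dropped or off-contour note still ranks near the intended song.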
3. APPLICATIONS
The following are some examples of QBH systems.
Shazam
Shazam is a commercial mobile phone-based music identification service. The company was
founded in 1999 by Chris Barton, Philip Inghelbrecht, Avery Wang and Dhiraj Mukherjee.
Shazam uses a mobile phone's built-in microphone to gather a brief sample of music being
played. An acoustic fingerprint is created based on the sample, and is compared against a
central database for a match. If a match is found, information such as the artist, song title,
and album are relayed back to the user.
Shazam can identify prerecorded music being broadcast from any source, such as a radio, television, cinema or club, provided that the background noise level is not high enough to prevent an acoustic fingerprint from being taken, and that the song is present in the software's database.
SoundHound
SoundHound (known as Midomi until December 2009) is a mobile device service that allows
users to identify music by humming, singing or playing a recorded track. The service was
launched by Melodis Corporation (now SoundHound Inc), under Chief Executive Keyvan
Mohajer in 2007 and has received funding from Global Catalyst Partners, TransLink Capital
and Walden Venture Capital.
SoundHound is a music search engine available on the Apple App Store, Google Play, and the Windows Phone Store; on June 5, 2013, it also became available on the BlackBerry 10 platform. It enables users to identify music by playing, singing or humming a piece. It is also possible to
speak or type the name of the artist, composer, song and piece. Unlike competitor Shazam,
SoundHound can recognise tracks from singing, humming, speaking, or typing, as well as
from a recording. Sound matching is achieved through the company's 'Sound2Sound'
technology, which can match even poorly-hummed performances to professional
recordings.
Midomi
Midomi is the ultimate music search tool. Sing, hum, or whistle to instantly find your
favorite music and connect with a community that shares your musical interests.
At midomi you can create your own profile, sing your favorite songs, share them with your friends, and get discovered by other midomi users. You can listen to and rate other users' musical performances, see their pictures, send them messages, buy original music, and more.
midomi features an extensive digital music store with a growing collection of more than two
million legal music tracks. You can listen to samples of original recordings, buy the full studio
versions directly from midomi, and play them on your Windows computer or compatible
music players.
Musipedia
Musipedia is a search engine for identifying pieces of music. This can be done by whistling a
theme, playing it on a virtual piano keyboard, tapping the rhythm on the computer
keyboard, or entering the Parsons code. Anybody can modify the collection of melodies and
enter MIDI files, bitmaps with sheet music, lyrics or some text about the piece, or the
melodic contours as Parsons Code.
Musipedia's search engine works differently from that of search engines such as Shazam.
The latter can identify short snippets of audio (a few seconds taken from a recording), even
if it is transmitted over a phone connection. Shazam uses Audio Fingerprinting for that, a
technique that makes it possible to identify recordings. Musipedia, on the other hand, can
identify pieces of music that contain a given melody. Shazam finds exactly the recording that
contains a given snippet, but no other recordings of the same piece.
4. THE ART OF SINGING
People have imperfect memories for melodies and may lack any formal singing practice. Studies of hummed and sung queries reveal the following patterns.
1. People sing any part of the melody. A repetitive melodic passage may represent the 'hook line' of a song that gets stuck in people's heads.
2. People sing in the wrong key. People choose a random pitch to start their singing; only for their most favorite songs do people appear to have a latent ability of absolute pitch.
3. People sing at a reasonably correct global tempo. People knew, or had a feeling from previous hearings, what the correct tempo should be and could approach that tempo reasonably accurately, though singing at exactly the correct tempo remains unlikely.
4. People sing too many or too few notes. Human memory is too imperfect to recall all pitches in the right order, so people sing just the line they remember. They also add all kinds of ornaments (e.g., grace notes or filler notes) to beautify their singing or to ease the muscular motor processes involved in singing.
5. People sing the wrong intervals or confuse some with others. People sang about 59% of
the intervals correctly, though there were differences due to singing experience, song
familiarity and recent song exposure. Interval confusion seems to be symmetric;
interchanging an interval with another was found to be equally likely as the other way
around. A large interval (thirds and larger) tends to be more easily interchanged for another.
6. People sing the contour reasonably accurately. People largely knew when to go up and when to go down in pitch when singing; they did so correctly about 80% of the time.
7. People with singing experience sing better in some respects than people without singing experience. The non-experienced and experienced singers did not differ in singing the contour of a melody accurately; however, experienced singers reproduced proportionally more correct intervals and sang with better timing.
8. People sing familiar melodies better than less familiar ones. Less familiar melodies were
reproduced with fewer notes and had proportionally fewer correct intervals than familiar
melodies. Also, both experienced and non-experienced singers improved their singing of
intervals when they had heard the melody very recently.
4.1 CHALLENGES
Building such a system, however, presents significantly greater challenges than creating a conventional text-based search engine. Unlike lyrical content, there exists no
intuitively obvious way to represent and store melodic content in a database. The chosen
representation must be indexable for efficient searching. Furthermore, several issues
unique to query by humming systems pose significant challenges to creating an efficient and
accurate music search system.
1. Users may not make perfect queries. Even if a user has a perfect memory of a particular tune, he may start in the wrong key, or he may hum a few notes off-pitch throughout the
course of the tune. Sometimes he may even drop some notes entirely or add notes that did
not exist in the original melody. Additionally, no user is expected to be able to perfectly
hum at the same tempo as the songs stored in the database. Finally, since none of these
errors are mutually exclusive, a humming query may contain any combination of these
errors.
2. Accurately capturing pitches and notes from user hums is difficult, even if the user
manages to submit a perfect query. Currently existing software for converting raw audio
data into discrete pitch information is mediocre at best and oftentimes will introduce a
great deal of noise when extracting the pitches from a user’s hum.
3. Similarly, accurately capturing melodic information from a pre-recorded music file is difficult. Properly extracting the melody from a given song is a field of study on its own, but it is absolutely critical: a query by humming system would be of little use if the database contains inaccurate representations of the target songs.
5. FILE FORMATS
Wav File Format
WAVE or WAV is the short form of the Waveform Audio File Format (rarely referred to as Audio for Windows). The WAV format is compatible with Windows, Macintosh and Linux. Although a WAV file can hold compressed audio, the most common use is to store uncompressed audio in linear PCM (LPCM). The standard Audio CD format, for example, is LPCM audio, 2-channel, with a sampling frequency of 44,100 Hz and 16 bits per sample.
As a format derived from the Resource Interchange File Format (RIFF), WAV files can carry metadata (tags) in the INFO chunk. In addition, WAV files can contain metadata following the Extensible Metadata Platform (XMP) standard.
Uncompressed WAV files are quite large in size, so, as file sharing over the Internet has
become popular, the WAV format has declined in popularity. However, it is still a widely
used, relatively "pure", i.e. lossless, file type, suitable for retaining "first generation"
archived files of high quality, or use on a system where high fidelity sound is required and
disk space is not restricted.
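The LPCM layout described above can be seen with Python's standard `wave` module. The snippet below, an illustrative sketch only, writes a tenth of a second of an 8,000 Hz mono 16-bit tone to an in-memory file and reads its header parameters back:

```python
import io
import math
import struct
import wave

fs = 8000                                  # sampling rate used in this report
buf = io.BytesIO()                         # in-memory stand-in for a .wav file

with wave.open(buf, "wb") as w:
    w.setnchannels(1)                      # mono
    w.setsampwidth(2)                      # 16 bits (2 bytes) per sample
    w.setframerate(fs)
    w.writeframes(b"".join(
        struct.pack("<h", int(0.5 * 32767 *
                              math.sin(2 * math.pi * 440 * n / fs)))
        for n in range(fs // 10)))         # 0.1 s of a 440 Hz tone

buf.seek(0)
with wave.open(buf, "rb") as w:
    params = (w.getnchannels(), 8 * w.getsampwidth(), w.getframerate())
# params describes the header: channel count, bit depth, sample rate
```

Note that the samples themselves are raw little-endian integers with no compression, which is exactly why WAV files are so much larger than MP3 or MIDI files.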
MIDI File Format
The term MIDI stands for Musical Instrument Digital Interface and is essentially a
communications protocol for computers and electronic musical instruments.
Although the produced MIDI files are not exactly the same as the typical digital audio
formats we use (like MP3, AAC, WMA, etc.) to listen to music, MIDI files can still be thought
of as digital music.
Rather than an actual audio recording stored as binary data, a MIDI file in its simplest form is made up of information that describes which musical notes are to be played, along with the types of instruments that are to be used.
MIDI files therefore do not contain any 'real world' recordings such as voice (e.g. audio books) or live performances.
However, MIDI files are very small and can be played on a wide range of devices that support the MIDI protocol, including cell phones, smartphones, and computers with the right software. Monophonic and polyphonic ringtones are examples of the MIDI file format.
In a QBH system, the database of songs is typically created from songs in the MIDI file format, because the MIDI representation already discretizes the notes, making it easier to extract the pitch and timing information necessary for song matching. Alternate music file formats such as WAV, MP3, AIFF, etc. would require complicated waveform and signal processing that could lead to many inaccuracies. Each song is also mapped to a set of metadata attributes, such as song name and artist, for eventual display in the GUI result list.
6. SYSTEM ARCHITECTURE
The architecture is illustrated in the figure above, and operation of the system is straightforward. Queries are hummed into a microphone, digitized, and fed into a pitch-tracking module. The result, a contour representation of the hummed melody, is fed into the query engine, which produces a ranked list of matching melodies. The database of melodies is acquired by processing public-domain MIDI songs and is stored as a flat-file database. Pitch tracking is performed on the hummed query, which may be recorded in a variety of formats. The query engine uses an approximate pattern-matching algorithm in order to tolerate humming errors. The melody database is essentially an indexed set of soundtracks. The acoustic query, which is typically a few notes hummed by the user, is processed to detect its melody line, and the database is searched to find those songs that best match the query.
While the overall task is one that is easily performed by humans, many challenging
problems arise in the implementation of an automatic system. These include the signal
processing needed for extracting the melody from the stored audio and from the acoustic
query, and the pattern matching algorithms to achieve proper ranked retrieval. Further, a
robust system must be able to account for inaccuracies in the user's singing.
6.1 WAV TO MIDI CONVERSION
To create a MIDI file for a song recorded in WAV format, a musician must determine the pitch, velocity and duration of each note being played and record these parameters into a sequence of MIDI events. The MIDI file created represents the basic melody and chords of the recognized music. The difference between the WAV and MIDI formats lies in the representation of sound and music: WAV is a digital recording of any sound (including speech), while MIDI is principally a sequence of notes (or MIDI events). Here we produce an output file (.mid) from an input file (.wav) that contains musical data, and a tone file (.wav) that consists of monotone data. An advantage of this structure is that the query is prepared on the client side of the system, so the query is very short. Besides, it is possible to evaluate the query's quality before sending it: the system provides playback of the recognized melody notes in MIDI format, which allows the user to listen to the query and decide either to send it to the server or to sing it once again.
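At the core of such a conversion is quantizing each detected pitch to the nearest MIDI note number, using the standard equal-temperament mapping in which A4 = 440 Hz = MIDI note 69. The following is a minimal sketch of that step, not the report's converter:

```python
import math

def freq_to_midi(freq_hz):
    """Quantize a frequency in Hz to the nearest MIDI note number
    (equal temperament, A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_name(note):
    """Spell a MIDI note number as a pitch name, e.g. 72 -> 'C5'."""
    names = ["C", "C#", "D", "D#", "E", "F",
             "F#", "G", "G#", "A", "A#", "B"]
    return names[note % 12] + str(note // 12 - 1)

# A slightly sharp hum near middle C still snaps to the right note:
# freq_to_midi(265.0) -> 60, and midi_to_name(60) -> 'C4'
```

The rounding step is what makes the MIDI representation forgiving of small intonation errors: any pitch within half a semitone of a note is mapped to that note.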
7. PARSONS CODE ALGORITHM
The Parsons code, formally named the Parsons Code for Melodic Contours, is a simple
notation used to identify a piece of music through melodic motion—the motion of
the pitch up and down. Denys Parsons developed this system for his 1975 book, The
Directory of Tunes and Musical Themes. Representing a melody in this manner makes it easy
to index or search for particular pieces.
User input to the system (humming) is converted into a sequence of relative pitch
transitions.
Each note in the input is classified in one of three ways relative to the previous note, with the first tone serving as a reference:
1. U = "up", if the note is higher than the previous note
2. D = "down", if the note is lower than the previous note
3. R = "repeat", if the note is the same pitch as the previous note
4. * = first tone, used as the reference
The first note is C (MIDI note 72); we take it as the reference note and write *. The second note is also C; since it repeats, we write R. The next note, G, is higher than C, so we write U (for "up"). For the second G we write R, and so on. This textual pattern is stored in the database for comparison.
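The classification above can be expressed directly in code. This short sketch (introduced here, not taken from the report) converts a sequence of MIDI note numbers into its Parsons code:

```python
def parsons_code(notes):
    """Build the Parsons contour string for a sequence of MIDI note
    numbers: '*' for the reference tone, then U (up), D (down),
    R (repeat) for each note relative to the previous one."""
    if not notes:
        return ""
    code = ["*"]                          # first tone is the reference
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

# The example from the text, C C G G A A G (MIDI 72 72 79 79 81 81 79):
# parsons_code([72, 72, 79, 79, 81, 81, 79]) -> '*RURURD'
```

Because only the direction of each pitch transition is kept, the same string results no matter which key or octave the user hums in.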
Advantages
1. The pattern remains the same even if the user hums the tune in a different scale, or hums some notes off-key.
2. It requires less space, since it is stored as a text file.
8. BENCHMARKING MUSIC INFORMATION RETRIEVAL SYSTEMS
Research Paper
Benchmarking Music Information Retrieval Systems
Josh Reiss and Mark Sandler
Department of Electronic Engineering, Queen Mary, University of London
Mile End Road, London E1 4NS, UK
josh.reiss@elec.qmul.ac.uk (+44-207-882-5528), mark.sandler@elec.qmul.ac.uk (+44-207-882-7680)
--
The goal of this research paper is to create an accurate and effective benchmarking system
for music information retrieval (MIR) systems. This serves multiple purposes: inspiring the
MIR community to add features and speed to existing projects, and enabling researchers to
measure the performance of their work and incorporate the ideas of other works. To date,
there has been no systematic, rigorous review of the field, and thus there is little
knowledge of when an MIR implementation might fail in a real-world setting.
ONLINE MIR SYSTEMS
For the purposes of this work, we considered five online MIR systems. The systems
considered all have certain properties in common. They may all be used online via the World
Wide Web. They all are used by entering a query concerning a piece of music, and all may
return information about music that matches that query. However, these systems differ
greatly in their features, goals and implementation. These differences are discussed in detail
below.
CatFind
CatFind allows one to search MIDI files using either a musical transcription or a melodic
profile based on the Parsons code. It has minimal features, and was intended primarily for
demonstration. Although it seems unlikely that this system will be extended, it is still useful
here as a system for comparison.
MelDex
This allows searching of the New Zealand Digital Library. The MELody inDEX system is
designed to retrieve melodies from a database on the basis of a few notes sung into a
microphone. It accepts acoustic input from the user, transcribes it into common music
notation, then searches a database for tunes that contain the sung pattern, or patterns
similar to it. Thus the query is audio although the retrieved files are in symbolic
representation. Retrieval is ranked according to the closeness of the match. A variety of
different mechanisms are provided to control the search, depending on the precision of the
input.
MelodyHound
This melody recognition system was developed by Rainer Typke in 1997. It was originally
known as "Tuneserver" and hosted by the University of Karlsruhe. It searches directly on the
Parsons Code and was designed initially for Query By Whistling. That is, it will return the
song in the database that most closely matches a whistled query.
ThemeFinder
Themefinder, created by David Huron et al., allows one to identify common themes in
Western classical music, folksongs, and Latin motets of the sixteenth century. Themefinder
provides a web-based interface to the Humdrum thema command, which in turn allows
searching of databases containing musical themes or incipits (opening note sequences).
Themes and incipits available through Themefinder are first encoded in the **kern music
data format. Groups of incipits are assembled into databases. Currently there are three
databases: Classical Instrumental Music, European Folksongs, and Latin Motets from the
sixteenth century. Matched themes are displayed on-screen in graphical notation.
Music Retrieval Demo
The Music Retrieval Demo is notably different from the other MIR systems considered
herein. The Music Retrieval Demo performs similarity searches on raw audio data (WAV
files). No transcription of any kind is applied. It works by calculating the distance between
the selected file and all other files in the database. The other files can then be displayed in a
list ranked by their similarity, such that the more similar files are nearer the top. Distances
are computed between templates, which are representations of the audio files, not the
audio itself. The waveform is Hamming-windowed into overlapping segments; each segment
is processed into a spectral representation of Mel-frequency cepstral coefficients (MFCCs).
This is a data-reducing transformation that replaces each 20 ms window with 12 cepstral
coefficients plus an energy term, yielding a 13-valued vector. The next step is to quantize
each vector using a specially designed quantization tree. This recursively divides the vector
space into
bins, each of which corresponds to a leaf of the tree. Any MFCC vector will fall into one and
only one bin. Given a segment of audio, the distribution of the vectors in the various bins
characterize that audio. Counting how many vectors fall into each bin yields a histogram
template that is used in the distance measure. For this demonstration, the distance
between audio files is the simple Euclidean distance between their corresponding templates
(or rather 1 minus the distance, so closer files have larger scores). Once scores have been
computed for each audio clip, they are sorted by magnitude to produce a ranked list like
other search engines.
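The template-and-distance pipeline described above can be sketched with NumPy. This is
only an illustrative reconstruction: a fixed grid quantizer on the first coefficient stands in
for the paper's learned quantization tree, and the 13-valued vectors are synthetic rather
than real MFCCs:

```python
import numpy as np

rng = np.random.default_rng(0)

def template(vectors, edges):
    # Assign each 13-d vector to a bin by its first coefficient (a simple grid
    # quantizer standing in for the learned quantization tree), then normalize
    # the bin counts into a histogram template.
    bins = np.digitize(vectors[:, 0], edges)
    hist = np.bincount(bins, minlength=len(edges) + 1).astype(float)
    return hist / hist.sum()

def similarity(t1, t2):
    # Score = 1 - Euclidean distance between templates, so that more similar
    # clips receive larger scores, as in the demo.
    return 1.0 - np.linalg.norm(t1 - t2)

# Three synthetic "clips" of 13-valued MFCC-like vectors, one per 20 ms frame.
clip_a = rng.normal(0.0, 1.0, size=(500, 13))
clip_b = rng.normal(0.0, 1.0, size=(500, 13))  # same distribution as clip_a
clip_c = rng.normal(3.0, 1.0, size=(500, 13))  # shifted distribution
edges = np.linspace(-3, 3, 16)
ta, tb, tc = (template(c, edges) for c in (clip_a, clip_b, clip_c))
print(similarity(ta, tb), similarity(ta, tc))
```

Clips drawn from the same distribution produce nearly identical histograms and hence
higher similarity scores, which is the property the ranked list relies on.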
COMPARISON OF MIR SYSTEMS
In Table 1, we present a comparison of the features of the various MIR systems under
investigation. Note first that each of these systems was designed for a different purpose,
and none of them can be considered a finished product. This table gives an overview of the
state of the available MIR systems, the features that one may wish to include in an MIR
system, and the areas where improvement is most necessary. It also highlights the need for
a standardized testbed. Each of the MIR systems uses a different database of files for audio
retrieval. Both CatFind and the Music Retrieval Demo have databases with fewer than 500
files. Thus, any benchmarking estimates, such as retrieval times and efficiency, are
rendered useless. MelDex, MelodyHound and ThemeFinder have databases containing over
10,000 files. This should be sufficient for estimating search efficiency and scalability.
EVALUATION ISSUES
Table 1 listed and compared the features available in existing online MIR systems. However,
this is not sufficient for effective benchmarking and evaluation of the music information
retrieval systems that may appear in the near future and be used with large file collections.
The question of what features to evaluate is determined by what we can measure that will
reflect the ability of the system to satisfy the user. In a landmark paper, Cleverdon [21]
listed six main measurable quantities. This has become known as the Cranfield model of
information retrieval evaluation. Here, those properties are listed and modified as
applicable for MIR.
1. The coverage of the collection, that is, the extent to which the system includes relevant
matter.
2. The time lag, that is, the average interval between the time the search request is made
and the time an answer is given. Consideration should also be made of worst-case or
near-worst-case scenarios. Certain genres or formats of music, as well as certain types
of queries (e.g., query and retrieval of polyphonic transcription-based audio), may
require far more time than other queries. Furthermore, if the testbed is particularly
large, dispersed or unindexed, as with peer-to-peer networks, then bandwidth
limitations and scalability may greatly reduce efficiency while maximizing the
collection size.
3. The form of presentation of the output. For MIR systems this not only means having the
option of retrieving various formats, symbolic and audio, but it also implies identifying
multiple performances of the same composition.
4. The effort involved on the part of the user in obtaining answers to search requests. So
far, MIR research has been dominated by audio engineers, computer scientists,
musicologists and librarians. As the field expands to include developers and
user-interface experts, this issue will acquire more significance.
5. The recall of the system, that is, the proportion of relevant material actually retrieved in
answer to a search request;
6. The precision of the system, that is, the proportion of retrieved material that is actually
relevant.
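The last two Cranfield measures reduce to simple set operations. A minimal sketch, using
hypothetical document IDs:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)

# 10 tunes retrieved; 8 relevant tunes exist; 6 of the retrieved are relevant.
p, r = precision_recall(range(10), [0, 1, 2, 3, 4, 5, 20, 21])
print(p, r)  # -> 0.6 0.75
```

The two measures trade off against each other: returning more results can only raise
recall, but typically lowers precision.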
9. CONCLUSION
Music retrieval is becoming more natural, simple and user-friendly with the advancement of
QBH. Thus this technology offers broad application prospects for music retrieval.
Using the Parsons code algorithm, it becomes easy to implement a query-matching system.
In this work, we have laid down a framework for benchmarking of future MIR systems. At
the moment, this field is in its infancy. There are only a handful of MIR systems available
online, each of which is quite limited in scope. Still, these benchmarking techniques were
applied to five online systems. Proposals were made concerning future benchmarking of full
online audio retrieval systems. It is hoped that these recommendations will be considered
and expanded upon as such systems become available.
10. REFERENCES
1. J. Reiss and M. Sandler, "Benchmarking Music Information Retrieval Systems,"
Department of Electronic Engineering, Queen Mary, University of London, UK.
2. J.-M. Batke, G. Eisenberg, P. Weishaupt, and T. Sikora, "A Query by Humming System
Using MPEG-7 Descriptors," Communication Systems Group, Technical University of Berlin.
3. E. Lau, A. Ding, and C. On, "MusicDB: A Query by Humming System," 6.830: Database
Systems Final Project Report, Massachusetts Institute of Technology.