The University of Texas at Austin Spring 2015 ECE Senior Design Project
ScoReader is a computer vision system built on Android and Google Glass that translates a photograph of sheet music into a playable .midi file. Musicians, particularly novices, learn a piece much faster when they know what it sounds like, but a recording is not always easy to find. To address this, we created a system that converts a significant subset of sheet music images into sound.
Our design consists of a front-end built on Android and a back-end written in Python and hosted on a Google Cloud Platform virtual machine. The front-end takes a photo that the user supplies through the Android device's on-board camera and sends it to the back-end, where a script runs the image through the Optical Music Recognition pipeline. The pipeline produces a .midi file, which is sent back to the Android device and played for the user.
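As an illustration of the last stage, where recognized notes become a .midi file, here is a minimal sketch that writes a single-track Standard MIDI File using only the Python standard library. It is not the project's actual back-end code: the function names `encode_vlq` and `notes_to_midi`, the 480 ticks-per-quarter resolution, and the fixed velocity of 64 are all assumptions made for this example.

```python
import struct


def encode_vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if value < 0:
        raise ValueError("delta times must be non-negative")
    out = [value & 0x7F]            # last byte has its high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes set the high bit
        value >>= 7
    return bytes(reversed(out))


def notes_to_midi(notes, ticks_per_quarter=480):
    """Build the bytes of a format-0 MIDI file.

    notes: list of (midi_pitch, length_in_quarter_notes) played back to back.
    A 64th note is 1/16 of a quarter; a double whole note is 8 quarters.
    """
    track = bytearray()
    for pitch, quarters in notes:
        ticks = int(round(quarters * ticks_per_quarter))
        track += encode_vlq(0) + bytes([0x90, pitch, 64])     # note on, channel 0
        track += encode_vlq(ticks) + bytes([0x80, pitch, 0])  # note off after `ticks`
    track += b"\x00\xff\x2f\x00"                              # end-of-track meta event
    # Header chunk: length 6, format 0, one track, time division.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_quarter)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


if __name__ == "__main__":
    # First four notes of "Twinkle Twinkle Little Star" in C major.
    data = notes_to_midi([(60, 1), (60, 1), (67, 1), (67, 1)])
    with open("twinkle.mid", "wb") as handle:
        handle.write(data)
```

With 480 ticks per quarter note, a 64th note is exactly 30 ticks, so every note length in the stated requirement range maps to a whole number of ticks.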
ScoReader: A Mobile Computer Vision System for Optical Music Recognition
ScoReader
DAVID LE | OLIVER LO | DEREK LY | DEREK NGO | CAMERON MOUSIGHI | CHASE RIGGINS
Faculty Mentor: Dr. Christine Julien
Senior Design Open House - April 29th, 2015 - The University of Texas at Austin
Musicians are often expected to learn new music quickly.
Hearing the actual pitches of an unfamiliar piece drastically
reduces the time required for a player to perform said
piece. With our application, musicians will be able to hear
a synthesized version of a song that will clear up issues of
pitch and rhythm, even for songs without easily accessible
professional recordings.
NGO | CAMERON MOUSIGHI | CHASE RIGGINS
Background
Figure 2: OMR Pipeline in Detail
(Panels: 1. Original Image, 2. Thresholding, 3. Line Removal, 4. Parsed Line, 5. Parsed Symbols)
OMR PIPELINE (Figure 2 Below)
1. Capture image and correct for skew
2. Convert image from RGB to grayscale
3. Remove all staff lines
4. Decompose sheet music into individual staves (lines of music)
5. Further decompose each staff into musical symbols
6. Compare segmented images to database of musical
symbols using template matching
7. Musically interpret these symbols in the order they appear
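Steps 2 and 3 of the pipeline, together with the thresholding shown in Figure 2, can be sketched in plain Python as follows. This is a simplified illustration, not the project's implementation: it assumes the image is a list of rows of (R, G, B) tuples, detects staff lines as rows that are almost entirely ink (a horizontal projection, which only works after skew correction), and keeps a staff-line pixel only when ink touches it from above or below. The function names, the threshold of 128, and the 80% fill ratio are hypothetical choices.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance using the BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def find_staff_rows(binary, min_fill=0.8):
    """Return indices of rows whose share of ink pixels is at least min_fill."""
    width = len(binary[0])
    return [y for y, row in enumerate(binary) if sum(row) >= min_fill * width]


def remove_staff_lines(binary, staff_rows):
    """Erase staff-line pixels unless a symbol crosses or touches them."""
    staff = set(staff_rows)
    height = len(binary)
    result = [row[:] for row in binary]
    for y in staff_rows:
        # Find the nearest rows above and below that are not staff rows,
        # so that staff lines several pixels thick are handled too.
        up = y - 1
        while up in staff:
            up -= 1
        down = y + 1
        while down in staff:
            down += 1
        for x, ink in enumerate(binary[y]):
            if not ink:
                continue
            above = up >= 0 and binary[up][x]
            below = down < height and binary[down][x]
            if not (above or below):
                result[y][x] = 0  # isolated line pixel: part of the staff only
    return result
```

For example, on a 5 x 6 image where row 2 is a solid staff line and column 3 holds a vertical stem, `find_staff_rows` reports row 2, and `remove_staff_lines` clears that row everywhere except at column 3, so the stem stays intact for the later segmentation steps.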
Optical Music Recognition is computationally expensive,
which is why we chose to offload processing to a server. We
chose a Google Cloud Platform Server with specifications:
Component | Quantity | Details
CPU | 2 vCPUs | 2.6 GHz Intel Xeon E5
Memory | 13 GB RAM | 6.50 GB per virtual core
Figure 1: System Block Diagram
Our application captures an image of sheet music, sends
that image to a remote server for processing, and then plays
the returned audio file back to the user. This process is
illustrated in Figure 1.
Summary of Design & Block Diagram
PRIMARY PROBLEMS
• Image capture
• Image normalization
• Image verification
• Image stabilization
• Optical Music Recognition
• Real-time processing speed
• Accessibility
• Object recognition & musical interpretation
REQUIREMENTS
• Processing time per page: 30 seconds
• Read note lengths: 64th to double whole
• Read note pitches: ±3 staff ledger lines
• Read common key signatures & time signatures
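To make the pitch requirement concrete, the sketch below maps a vertical position on the staff to a MIDI note number. It is an illustration under stated assumptions rather than the project's code: it assumes a treble clef, ignores key signatures and accidentals, and counts positions in diatonic steps from the bottom staff line (step 0 = E4, each line or space adds one step). Under that convention the third ledger line below the staff is step -6 and the third ledger line above is step 14. The name `staff_step_to_midi` is hypothetical.

```python
# Semitone offsets of the natural notes C, D, E, F, G, A, B within an octave.
SEMITONES = [0, 2, 4, 5, 7, 9, 11]

# Diatonic index of E4 (bottom line of the treble staff): octave 4, letter E.
BOTTOM_LINE_E4 = 4 * 7 + 2

LOWEST_STEP = -6    # third ledger line below the staff (F3)
HIGHEST_STEP = 14   # third ledger line above the staff (E6)


def staff_step_to_midi(step):
    """Return the MIDI note number for a treble-clef staff position.

    step counts lines and spaces upward from the bottom staff line,
    so 0 is E4, 1 is F4, 2 is G4, and negative values lie below the staff.
    """
    if not LOWEST_STEP <= step <= HIGHEST_STEP:
        raise ValueError("position is more than 3 ledger lines from the staff")
    diatonic = BOTTOM_LINE_E4 + step
    octave, letter = divmod(diatonic, 7)
    # MIDI numbers octaves so that C4 (middle C) is note 60.
    return 12 * (octave + 1) + SEMITONES[letter]


if __name__ == "__main__":
    for step in (-6, -2, 0, 8, 14):
        print(step, staff_step_to_midi(step))
```

A key signature would be applied afterwards by shifting the affected note letters up or down one semitone.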
CONSTRAINTS
• Environment - Visibility: indoors, well-lit area, low glare
• Environment - Noise: ≤ 55 dB
• User cannot tilt camera more than 35° from horizontal
• User must have access to Wi-Fi
• Music limited to 2 instruments
• Music must be printed clearly in a standard font
Problem Definition
ScoReader is an Optical Music Recognition (OMR)
application built for Android and Google Glass. We wanted
to advance and refine current OMR implementations while
developing on an accessible platform. Our project is unique
in utilizing Glass to provide musicians with a hands-free
experience.
Abstract
Figure 4: Conversion Accuracy by Pitch & Rhythm
Figure 3: Original Twinkle Twinkle Little Star Sheet Music
Ninety percent of pitches and rhythms in the samples
we supplied were correctly identified and met our stated
requirements. We organized our testing as follows:
• Glass UI (Subsystem)
  – image normalization/verification
  – music playback
• Server (Subsystem)
  – line removal & staff/object segmentation
  – object recognition/musical interpretation
In retrospect, we would have liked to get an earlier start on the
image processing modules, since their complexity made them hard
to debug. Additionally, creating our app solely for Android
would have allowed us to focus more on the functionality of the
app. If we were to continue working on the project, we would
extend our app to read stylistic markings, like dynamics,
tempo, accents, and instrument types.
To evaluate ScoReader, we compared the pitches and rhythms
in the output file to the pitches and rhythms in the
sheet music. For each note, we counted the conversion as a
100% success if both the pitch and rhythm were correct. If
only one of these was correct, we counted it as a 50% success,
and if both were missed, it was a failure. The final score is the
average of the per-note scores on the page.
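The scoring rule above can be written out as a short sketch. It assumes each note is represented as a (pitch, rhythm) pair and that the expected and recognized notes have already been aligned one to one; how the team handled missing or extra notes is not stated, so this example simply rejects lists of different lengths. The function names are hypothetical.

```python
def note_score(expected, actual):
    """Score one note: 1.0 if pitch and rhythm match, 0.5 if one does, else 0.0."""
    pitch_ok = expected[0] == actual[0]
    rhythm_ok = expected[1] == actual[1]
    return (pitch_ok + rhythm_ok) / 2


def page_score(expected_notes, actual_notes):
    """Average the per-note scores for one page of aligned notes."""
    if len(expected_notes) != len(actual_notes):
        raise ValueError("this sketch assumes the notes are aligned one to one")
    if not expected_notes:
        raise ValueError("cannot score an empty page")
    scores = [note_score(e, a) for e, a in zip(expected_notes, actual_notes)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # (MIDI pitch, length in quarter notes)
    expected = [(60, 1.0), (62, 1.0), (64, 2.0), (65, 0.5)]
    actual = [(60, 1.0), (62, 0.5), (65, 1.0), (65, 0.5)]
    # Per-note scores are 1.0, 0.5, 0.0, 1.0, so the page score is 0.625.
    print(page_score(expected, actual))
```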
Testing & Evaluation Results