3. I want to find a particular runner
4. Our Goal
Automatically recognize different runners in each
frame of a video segment and gather them.
5. Related work
• GPS tracking
  • No identification
  • Error of several meters
• RFID
  • Expensive
  • Large reader dimensions
• Computer vision
  • Recognize faces or bib numbers (OCR)
  • Time- and power-consuming
  • Easily affected by capture conditions
13. Frequency Shift Keying
[Figure: light signal over time. A low-frequency square wave (period 1/𝑓𝑙) produces wide strips in the captured image; a high-frequency square wave (period 1/𝑓ℎ) produces narrow strips.]
15. Transmitter
• Use frequency to represent data
• Select a frequency 𝑓𝑖 from a set ℱ = {𝑓1, 𝑓2, …, 𝑓𝑁} to represent a log2(𝑁)-bit pattern
Receiver
• Use strip width to demodulate data
• Estimate 𝑓𝑖 with 𝑓𝑖 = 1/(2𝑊𝑇𝑟), where 𝑇𝑟 is a camera-dependent read-out time
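The relation on this slide works in both directions, so modulation and demodulation are just inverses of each other. A minimal sketch, assuming a read-out time of 1/68000 s per row (chosen only so the widths match the 34/17/11-pixel examples given later in the notes; the real 𝑇𝑟 is camera-dependent):

```python
# Sketch of the slide's relation f_i = 1/(2*W*T_r) in both directions.
# T_R below is an ASSUMED read-out time (seconds per pixel row), picked
# so the widths match the 34/17/11-pixel examples in the notes; the
# real value is camera-dependent, not from the paper.
T_R = 1.0 / 68000

def strip_width(freq_hz):
    """Expected width W (in pixel rows) of one strip for a given frequency."""
    return 1.0 / (2 * freq_hz * T_R)

def estimate_frequency(width_px):
    """Receiver side: demodulate by inverting the same relation."""
    return 1.0 / (2 * width_px * T_R)

strip_width(1000)         # ≈ 34 rows
strip_width(2000)         # ≈ 17 rows
estimate_frequency(17.0)  # ≈ 2000 Hz
```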
20. Transmitter - signal format
• Utilize RS-FSK
• Light frequency ↔ Strip width ↔ Runner ID
• Pros:
  • Simplest scheme
  • Needs only one frame
• Cons:
  • Limited usable frequencies → limited number of IDs
  • No error detection or correction
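The one-frequency-per-ID mapping can be sketched as a nearest-neighbor lookup against the frequency set. The 100 Hz-spaced set below is borrowed from the backup slides; the function names are illustrative, not from the paper:

```python
# Hypothetical sketch of the one-frequency-per-ID scheme of RS-FSK.
# A set of 10 frequencies (2000..2900 Hz, 100 Hz step, assumed from the
# backup slides) yields floor(log2(10)) = 3 bits of ID space.
FREQS = [2000 + 100 * i for i in range(10)]  # Hz, one per runner ID

def encode(runner_id):
    """Transmitter: pick the frequency that represents this ID."""
    return FREQS[runner_id]

def decode(estimated_freq):
    """Receiver: map an estimated frequency back to the closest ID."""
    return min(range(len(FREQS)), key=lambda i: abs(FREQS[i] - estimated_freq))

decode(2180)  # -> ID 2 (2200 Hz is the closest frequency)
```

A small estimation error still decodes correctly as long as it stays inside half the error margin between adjacent frequencies, which is exactly why that margin matters in the backup slides.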
21. Receiver - camera
• Two cameras per set
  • One for normal photos
  • One for RS-FSK, with a low exposure duration
• Up to 25 meters away
• 30 fps, resolution 2048 x 1080
30. Strip Width Estimation
YIN pitch detection algorithm
• Autocorrelation-based
• Originally for audio signals
• Finds the square-wave signal period in the image

A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 2002.
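The core of the YIN idea, stripped of its normalization and interpolation steps, can be sketched on a synthetic stripe column. This is only an illustration of the difference-function search, not the authors' implementation:

```python
# Minimal sketch of the YIN-style idea on a synthetic stripe signal:
# treat one image column's brightness as a 1-D signal and find its
# period (one dark + one bright strip) via the difference function
# d(tau) = sum_t (x[t] - x[t+tau])^2. The full YIN algorithm also adds
# cumulative-mean normalization and parabolic interpolation.

def square_wave(period, length):
    """Synthetic stripe column: bright half-period, then dark."""
    return [1 if (t % period) < period // 2 else 0 for t in range(length)]

def find_period(signal, max_tau):
    """Return the lag with the smallest difference-function value."""
    def d(tau):
        return sum((signal[t] - signal[t + tau]) ** 2
                   for t in range(len(signal) - tau))
    return min(range(2, max_tau + 1), key=d)

col = square_wave(34, 300)   # one full period = 2W = 34 rows
find_period(col, 60)         # -> 34
```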
35. Tracking
• Objective: avoid the influence of background noise
• Bib center: the center coordinate of all strip pattern areas in a frame
• Calculate the mean shift of the bib centers to estimate the shift in the next frame
• The background usually has no movement
37. Tracking
• Objective: avoid the influence of background noise
• Bib center: the center coordinate of all strip pattern areas in a frame
• Calculate the mean shift of the bib centers to estimate the shift in the next frame
• The background usually has no movement
• Bonus: this also reduces the noise of the strip width estimation
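The tracking idea above can be sketched as a plausibility filter: the mean shift of the bib centers between consecutive frames predicts where each bib should appear next, and detections that do not move with the crowd (e.g. static background noise) are rejected. The index-based pairing, threshold, and coordinates below are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch of the bib-center tracking filter. Matching by list
# index, the 10-px tolerance, and the coordinates are all made up
# for illustration.

def mean_shift(prev_centers, curr_centers):
    """Average (dx, dy) displacement of matched bib centers."""
    n = len(prev_centers)
    dx = sum(c[0] - p[0] for p, c in zip(prev_centers, curr_centers))
    dy = sum(c[1] - p[1] for p, c in zip(prev_centers, curr_centers))
    return (dx / n, dy / n)

def is_plausible(prev_center, candidate, shift, tol=10.0):
    """Keep a detection only if it moved roughly with the crowd."""
    ex, ey = prev_center[0] + shift[0], prev_center[1] + shift[1]
    return abs(candidate[0] - ex) <= tol and abs(candidate[1] - ey) <= tol

shift = mean_shift([(100, 50), (200, 52)], [(112, 50), (212, 53)])  # (12.0, 0.5)
is_plausible((300, 60), (312, 60), shift)   # True: moves with the runners
is_plausible((400, 80), (400, 80), shift)   # False: static background blob
```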
41. Recognition Accuracy
• Average recall of 5 videos for each setting

Weather      Camera         Number of runners in an image
condition    perspective        1        2        5
Sunny        Front           91.0%    91.9%    88.5%
             Side            87.8%    87.4%    84.7%
Cloudy       Front           97.5%    97.1%    89.1%
(Indoor)     Side            90.9%    90.3%    87.6%
42. Recognition Accuracy
• Less ambient light is better
43. Recognition Accuracy
• Capturing runners from the front is better
44. Recognition Accuracy
• The number of runners has a negative impact on recall
46. Conclusion
• LightBib: recognize marathon runners automatically using Visible Light Communication
• Average recall: 90%
Future work
• System scalability
• Smaller device dimensions
52. Usable Frequency
• In ℱ = {𝑓1, 𝑓2, …, 𝑓𝑁}, find the
  frequency bandwidth [𝑓1, 𝑓𝑁] and the
  error margin 𝑓𝑖 − 𝑓𝑖−1
• Camera captured from 25 m away
• 10 frames for each frequency
53. Usable Frequency
• Tested in (2000 Hz, 3000 Hz) with a 100 Hz step
• Frequency bandwidth up to 10 kHz
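A quick sketch of why the usable set is limited: with an assumed read-out time, adjacent 100 Hz-spaced frequencies map to strip widths whose gap shrinks as frequency grows, so the error margin eventually falls below the receiver's width-estimation resolution. The read-out time below is an assumption, not a measured value:

```python
# Sketch: how far apart (in pixels) are the strip widths of adjacent
# frequencies in the 2000..3000 Hz, 100 Hz-step set?
# T_R = 1/68000 s per row is an ASSUMED camera read-out time.
T_R = 1.0 / 68000

def width(f):
    """Strip width in pixel rows for light frequency f (Hz)."""
    return 1.0 / (2 * f * T_R)

freqs = [2000 + 100 * i for i in range(11)]          # 2000..3000 Hz
gaps = [width(a) - width(b) for a, b in zip(freqs, freqs[1:])]
min(gaps)   # ≈ 0.39 px between 2900 and 3000 Hz: the margin shrinks
            # with frequency, limiting how many IDs fit in the band
```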
Editor's Notes
Good afternoon everybody. I’m Chiao from National Taiwan University.
In this talk, I will introduce LightBib, a marathon runner recognition system using visible light communication.
This is a joint work with Chia-Wen Cheng, Wen-Hsuan Shen, Yu-Lin Wei and my adviser Hsin-Mu Tsai at National Taiwan University.
Marathons are popular events nowadays that many people join.
To record the event, there will be hundreds of thousands of pictures or videos of runners.
For example, these pictures are from the Brighton Marathon in England.
And now, we come up with a question:
What happens if I want to collect the pictures and videos of a particular runner?
You can imagine it must be hard work.
So, our goal is to automatically recognize different runners in each frame of a video segment or in the pictures and gather them together.
Actually, there are already some related works trying to solve this problem, but each of them has its own drawbacks.
The first one is GPS tracking. You can get the position of a runner, but you cannot use this technology to recognize runners in pictures or videos.
Another one is RFID. However, this method is quite expensive and the RFID reader itself is quite large.
The last one is using computer vision to recognize human faces or bib numbers in the images.
The image processing is not efficient enough, and it is also power-consuming.
As a result, we came to the idea of utilizing visible light to transmit the runner's bib number.
When the camera receives the information in the light, we can achieve our goal of recognizing different runners automatically.
So, why do we use visible light communication?
One of the biggest advantages is that VLC allows us to associate the received ID with a particular image area occupied by a runner.
That is to say, we only need to recognize those runners appearing in the images.
Also, there are other benefits such as low deployment cost and less recognition cost.
The receiver of this system, which is a camera, is cheaper than other devices like an RFID reader.
And the decoding process is more efficient than existing computer vision method.
Here is the outline of my following presentation.
Before going to the system design, let’s talk about some background knowledge.
Cameras can be categorized into two different types by the type of shutter they use:
either a global shutter camera, or a rolling shutter camera.
A global shutter camera exposes all pixels in an image simultaneously, like this.
And a rolling shutter camera exposes rows of pixels row by row, like this.
In other words, when you use a global shutter camera to capture a signal, the sampling rate will be equal to the frame rate of the receiving camera, which is usually 30 Hz.
And that’s too low for most of the communication applications.
Instead, if you use a rolling shutter camera, different rows of pixels will be captured in slightly different time.
The number of samples you can obtain in one image can be as large as the number of pixel rows in the image. That's usually about 1000.
That means you have the opportunity to increase the bandwidth by a thousand times in the best case.
Now we add a light source which is switched on and off at a fixed frequency; the light signal is essentially a square wave.
When you use a global shutter system, all pixels will be captured at the same time. So the image is either dark when the light is off, or bright when the light is on.
Instead, if you use a rolling shutter system, the image will exhibit a stripe pattern. This is because each row of pixels is exposed at a different time. When the light source changes between the on and off states, the rows of pixels also change between dark and bright, and that results in the stripe pattern.
If you transmit the signal as a low-frequency square wave, which means you turn the light on and off more slowly,
what you are going to observe in the received image is a stripe pattern with wider strips.
It is easy to figure out that when I transmit a high-frequency square wave, there will be narrower strips.
This concept is called frequency shift keying modulation.
Here are some examples to show captured images of different transmitted frequency and their corresponding strip widths.
This equation describes the relation between the frequency and the strip width. The T of r here is a camera-dependent parameter. When the light is switched on and off at 1000, 2000 and 3000 Hz, the respective strip widths are 34 pixels, 17 pixels, and 11 pixels.
To construct a visible light communication system,
at the transmitter, we use different frequencies to represent the data we transmit.
And at the receiver, we use the strip width in the received image to demodulate the data. We can estimate the transmitted frequency with this equation, and then we can find out what information the light carries.
OK, Let’s come to the main part of this research, LightBib.
This is the overview of this system.
Imagine that in the marathon, runners wear a pair of light strips, which is what we call LightBib. They are used as transmitters to transmit the ID of the runner throughout the competition.
The cameras are set along the route to capture the images of runners.
Let’s see the transmitter first.
Here is the implementation of LightBib including light strips, control boards, and a power bank.
When a runner wears LightBib, the control board and the power bank are packed in a waist pack so they are easier to carry.
Our transmission is based on the frequency shift keying that I mentioned before. We use the simplest approach to represent the ID data.
We choose one frequency to represent one runner's ID. That means the LightBib will generate an on-off pattern at one specific frequency throughout the marathon.
In other words, there is a one-to-one relationship between the light frequency, the strip width in the received image, and the runner's ID.
The advantage is that you only need one frame to obtain the runner ID and to recognize the runner.
However, the trade-off is the limited number of unique IDs, because we have a finite range of usable frequencies.
Also, if the recognition is wrong in the first place, there would be no chance to correct it.
At the receiver side, cameras can be placed at any location along the route.
In our prototype, we need two cameras per set.
One is for capturing regular video with normal exposure duration, while the other one is configured to use low exposure duration to receive the transmission of LightBib.
Here are two sets of examples of the captured images of the two cameras.
The left two images are captured with low exposure duration, and you can observe the strip pattern in the images.
The right two are captured in normal settings.
Now, I’ll explain how to decode the runner ID with this example.
There are five steps in our decoding process.
Step one is quite simple: we only increase the contrast ratio of the image to make the strip pattern clearer.
Let’s go to step 2, barcode detection.
In this step, we want to locate the strip pattern area in the entire image.
The strip pattern and barcode are both characterized by several parallel lines.
The main concept of this algorithm is that a horizontal stripe pattern has a low horizontal gradient and a high vertical gradient.
Then we look for the areas with the highest difference between these two gradients.
After thresholding, we can obtain the contours of the strip patterns.
Finally, we get the bounding box of the strip patterns.
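The gradient cue described in these steps can be sketched on a toy image; the scoring function below is an illustrative simplification, not the actual barcode-detection algorithm from the paper:

```python
# Sketch of the stripe-detection cue: inside a horizontal stripe
# pattern the vertical gradient is large and the horizontal gradient
# is small, so their difference highlights candidate bib areas.
# The tiny "images" below are illustrative only.

def gradient_score(img):
    """Sum of |vertical gradient| minus sum of |horizontal gradient|."""
    h, w = len(img), len(img[0])
    gy = sum(abs(img[y + 1][x] - img[y][x])
             for y in range(h - 1) for x in range(w))
    gx = sum(abs(img[y][x + 1] - img[y][x])
             for y in range(h) for x in range(w - 1))
    return gy - gx

stripes = [[255] * 6, [0] * 6, [255] * 6, [0] * 6]   # horizontal stripes
flat    = [[128] * 6 for _ in range(4)]              # featureless background
gradient_score(stripes) > gradient_score(flat)       # True
```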
We now have those crucial areas in the images, and we want to estimate the strip width in this step.
We adopt a well-known algorithm YIN, which is originally designed to estimate the fundamental frequency of a piece of audio signal.
We use it to find the period of the periodic square wave in the received images, and that is the strip width we want.
To increase the accuracy, we apply a simple tracking strategy.
The reason we want to do tracking is to avoid the influence of background noise.
Here, we define the bib center as the center coordinate of all strip pattern areas we found in step 2.
I'll show you an example of two consecutive frames.
This one is the former frame, and the strip pattern area is marked with a green rectangle. The red dots are the bib centers we just defined.
And this one is the latter frame.
This is former. This is latter.
We calculate the mean shift of the bib centers over a short time duration to estimate a reasonable shift in the next frame.
Since the background usually has no movement, we can filter out those erroneous recognitions.
I overlap the two consecutive frames into this one. The orange rectangle is for the former frame, and the green one is for the latter frame.
We track the LightBib by calculating the shift of those red dots.
Doing tracking also provides another benefit.
After tracking a series of image frames of the same runner, we can average the estimated strip width over time to reduce the noise in the estimation.
The last step is quite simple, too.
We find the closest frequency to the estimated strip width, and get the ID with the mapping table we already constructed.
After discussing how LightBib works, let's see its performance.
We performed the experiments with 3 different parameters: weather condition, camera perspective, and number of runners in an image.
In each scenario, we record 5 videos and calculate their average recall.
In the following slides, I’ll focus on different parameters and compare the result.
First, we can observe that less ambient light disturbance will make the recognition result more accurate.
Second, we can see that when cameras are directly facing the runner, the results are better.
That's because when capturing from the front, there is a larger LightBib area in the images, and it's less likely that runners block the LightBib with their arms or hands.
The final parameter is the number of runners in an image.
You can easily observe that it has negative impact on the result.
But the recall is still close to 90% in the best case.
We set the maximum number to 5 because we believe that, except at the start line, it's rare that there are more than 5 runners in one frame along the route.
Even if there are more than 5 runners, as long as the LightBib area is clear enough, we can still recognize the ID.
Finally, let’s come to the conclusion of this work.
In conclusion, we introduce LightBib, a new method to automatically recognize runners and gather their pictures.
And we achieve an average recall of 90% in a wide range of scenarios.
Our future work will focus on increasing the number of light patterns to represent more runners.
Another future work is making the LightBib smaller and easier to carry.
That's all for my presentation. Thank you for listening.
But how exactly can we relate the width of the strip in the received image to the transmitted frequency?
Let’s assume in your image, you observe a pair of dark and bright strips that has a width of 2W, which means it occupies 2W rows of pixels.
Since consecutive rows of pixels are exposed at a time difference of T of r, which is called the read-out time, the time we need to capture these two strips is 2W multiplied by T of r. And that actually equals one period of the square wave you transmitted.
So now, we get this equation that 2W multiply T of r equals the inverse of the frequency. That gives us the relation between the strip width and the transmitted frequency.
(math)
(Next, we want to find out how many usable frequencies in the frequency set.)