Computer Vision based Dance
Sanjay Goel, Chirag Gupta, T. Gnana Swaroop,
Gaurav Jain, Tarang Gupta and Shoma Chatterjee
Jaypee Institute of Information Technology,
Abstract: In this paper we discuss a computer vision based tool for dance scholars. The tool uses
computer vision to let the analyst concentrate on body movements. The processed video highlights
the main body motion by extracting the body contour. The tool can also add and display
additional textual information with each frame, and it facilitates juxtaposition of the original
video with the extracted video.
1. Introduction
The fascination for Indian dance all over the world is indicative of the deep-felt need to use the
human body to express and celebrate the great universal truths. It illuminates India's culture in a
direct manner, playing on the sensibilities of the onlooker. Dance in India has seeped into several
other realms like poetry, sculpture, architecture, literature, music and theatre. All Indian dance
forms are structured around the nine rasas or emotions: hasya (happiness), krodha (anger),
bibhatsa (disgust), bhaya (fear), shoka (sorrow), viram (courage), karuna (compassion), adbhuta
(wonder) and shanta (serenity). Beyond video storage and dissemination, fast growing computer
technology has contributed very little to the field of dance, and nothing to Indian classical
dance. This paper discusses a tool that is under development to help
dance scholars analyze solo dance performances.
1.1 Literature Survey
Computers find various uses in dance. Important non-graphics uses include
administration, lighting control, and competition scrutineering. Graphical applications
include notation, choreography, teaching, and performance. One of the earlier works
explores the nature of creative composition, particularly as it applies to dance, and describes
the development of interactive computer based tools to assist the composer. The hierarchical
nature of the composition process calls for an interface that gives the composer the
flexibility to move back and forth between alternate views and conceptual levels of
abstraction. COMPOSE, an interactive system for the composition of dance, has been
implemented on Silicon Graphics and Apple workstations. The user composes visually in
space and in time using menus of postures and sequences. Animating the dance
composition allows the final result to be evaluated.
In one of the first Dance Technology compositions, choreographed in 1994, researchers used
Motion Interactive (MINT), a special motion-capture program they developed, to translate
dance into computer animation. Two video cameras captured the movement of reflective
markers at 27 points on a dancer's body. The researchers digitized the video and used it to
create a computer model of the dancer. For another performance, the researchers
employed infrared cameras to track emitters hidden on a dancer's costume. The data was fed
into a high-speed graphics workstation in real time, and the resulting animated trails of the
dancer's movements were projected as real-time graphics onto a
translucent screen.
A collaboration between the Atlanta Ballet and Georgia Tech's Interactive Media
Technology Center (IMTC), the Dance Technology Project combined ballet and
computer animation techniques. The project dealt with video costuming: a camera
and computer system track the motions of the dancers on stage while a second graphics
computer creates their 'virtual costumes', which are projected onto them in exact
registration to their body orientations, even as they dance. Other activities included
computer-generated dancers intermingling with real dancers, and computer-generated art
'created' by the dancers as the performance progressed.
The work of Dyaberi et al. deals with phrase structure detection in contemporary western
dance. A phrase is a sequence of movements that exists at a higher semantic abstraction than
gestures. The problem is important since phrasal structure in dance plays a key role in
communicating meaning. They detect fundamental dance structures that form the basis
for more complex movement sequences.
Computed dancing figures have also been proposed as an aid in teaching dance. For
example, the computer could show idealised slow-motion versions of fast steps
that are impossible to demonstrate slowly because of problems with balance or momentum.
Computers could also serve as a teaching aid that lets students classify for themselves
steps with complex alternatives.
Morioka et al. propose an algorithm for synthesizing music that
appropriately expresses the emotions in a dance. The algorithm can help compile music
suitable for dance movies or animation films, and is also applicable to any entertainment
system that uses music or dance. It is composed of three modules: the first
computes emotions from an input dance, the second
computes emotions from the music in the database, and the last selects music suitable
for the input dance via an interface of emotion.
An experimental dance performance featuring live-motion capture, real-time computer
graphics, and multi-image projection was produced by a cross-departmental team of faculty
and students at Purdue University. Dancers occupied and traversed performance mediums
or ‘frames’ including a virtual performance frame occupied by a 3D character, driven by a
dancer in motion-capture equipment. Developing and facilitating the relationships between
the dancers in various performance frames was a primary focus of the project.
A multimodal information presentation method for a basic dance training system is discussed
by Nakamura et al. The system targets beginners and enables them to learn the basics of dance
easily. One of the most effective ways of learning dance is to watch a video showing the
performance of dance masters. However, some information cannot be conveyed well
through video. One is translational motion, especially in the depth direction: one
cannot tell exactly how far the dancers move forward or backward. Another is
timing: although one can tell from video how to move the arms or legs, it is
difficult to know when to start moving them. The first issue is solved by mounting an
image display on a mobile robot, so that one can learn the amount of translation just by
following the robot. For the second issue they introduce active
devices, composed of vibro-motors, that deliver action-starting cues
through vibration.
1.2 Scope of project
The main objective of our project is to exploit the potential of digital image processing and
computer vision techniques to serve some of the common and regular needs of scholars and students
of Indian classical dance. At present, dance students and scholars learn or analyze dance
movements by observing performances of professional dancers. Video recordings of
performances are popular for later reference and analysis. Often the dance scholar needs to
concentrate on a specific aspect, such as hand movement. In the absence of any tool to filter out
distracting details, such scholarly analysis becomes a tedious task. User-friendly software tools can
help dance scholars analyze and annotate recorded performances, add subtitles and annotations to
specific frame sequences, and store the annotated video in a regular format viewable on any
regular media player. The availability of such software will encourage scholars to add more
information to recorded videos, which can then be accessed, understood and appreciated by a
general audience. Such software will also allow scholars to create a well documented archive of
dance videos with searchable annotations.
2. Outline of the Algorithm
This section discusses design of our Computer Aided Dance Analysis and Visualization tool.
The tool allows users to view the original video and the processed video simultaneously. It
also allows the users to add information to every frame of the dance video. All forms of
Indian Classical dance depict a story. Frame specific information can be added with each
frame as frame annotation. It allows the users to filter out distracting details by extracting
body contour and image skeleton.
We have designed a sample interface using Visual C++. Matlab has been used as an
intermediate test environment, where we tested the various image processing algorithms
that were later migrated to Visual C++. The phases involved in the design of the
dance tool and the results are outlined below. The main processes are segmentation, edge
detection and skeletonization.
2.1 Segmentation and Edge Detection
We extract frames from the video, as in Fig. 1, and segment each frame to separate the dancer
from the background. We use the region growing algorithm for this because we need to separate
the dancer on the basis of color as well as region. To apply the multi-pass region growing
algorithm, we first convert the image to 256-level grayscale. We initially select all pixels
of a frame as seed points. We then compare alternate seeds (s1) across the height
and width of the frame with their four neighboring seeds (north, south, east, west). The initial
threshold is a difference of eight gray levels. If the two compared
seeds are similar, i.e. within the threshold, we mark the neighbor
with the value of seed s1. We repeat the same process in subsequent
passes, doubling the threshold each time until it reaches 128, since
we need a binary output. The output of this process is the image of the dancer separated
from the rest of the image, as shown in Fig. 2. We convolve the segmented image with the
Laplacian mask for boundary detection, as in Fig. 3.
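The multi-pass procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Visual C++ implementation. The function name `multipass_region_growing` is hypothetical, and as a simplifying assumption every pixel is visited as a seed in raster order; the "alternate seeds" schedule is not reproduced.

```python
def multipass_region_growing(gray, start_threshold=8, max_threshold=128):
    """Merge 4-connected neighbours whose gray levels differ by at most the
    current threshold, doubling the threshold after every pass."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # work on a copy of the grayscale frame
    threshold = start_threshold
    while threshold <= max_threshold:
        for y in range(height):
            for x in range(width):
                seed = out[y][x]
                # north, south, east and west neighbours of the seed
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if abs(out[ny][nx] - seed) <= threshold:
                            out[ny][nx] = seed  # mark neighbour with the seed value
        threshold *= 2  # 8, 16, 32, 64, 128
    return out


# Toy frame: dark background on the left, bright "dancer" on the right.
frame = [
    [10, 12, 200, 204],
    [12, 10, 204, 200],
    [10, 12, 200, 204],
]
segmented = multipass_region_growing(frame)
```

On this toy frame the two halves differ by more than 128 gray levels, so they stay separate while the small variations inside each half are merged.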
Fig 1. Input Video Frame
Fig 2. Clustered Frame
Fig 3. Edge extracted/Boundary detected
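The boundary detection step can be illustrated with a small sketch. The paper does not state which Laplacian mask was used; the 4-neighbour mask below is an assumption, and the names `laplacian_response` and `boundary_pixels` are hypothetical. Image border pixels are left at zero for simplicity.

```python
# Assumed 4-neighbour Laplacian mask (symmetric, so convolution and
# correlation give the same result).
LAPLACIAN_MASK = ((0, 1, 0),
                  (1, -4, 1),
                  (0, 1, 0))


def laplacian_response(image):
    """Apply the 3x3 mask to every interior pixel of a 2D list of numbers."""
    height, width = len(image), len(image[0])
    response = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += LAPLACIAN_MASK[dy + 1][dx + 1] * image[y + dy][x + dx]
            response[y][x] = total
    return response


def boundary_pixels(image):
    """Return the (x, y) positions where the Laplacian response is non-zero."""
    response = laplacian_response(image)
    return {(x, y)
            for y, row in enumerate(response)
            for x, value in enumerate(row)
            if value != 0}
```

On a binary segmented image the response is zero inside uniform areas and non-zero where the dancer meets the background, which is why the non-zero positions can serve as the contour.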
2.2 Skeletonization
The next and most important algorithm is skeletonization. We tried different
thinning algorithms, such as the Medial Axis Transform, but none of them gave satisfactory
results. In the Medial Axis Transformation of a region R with border B, we find, for each
point p in R, its closest neighbor in B. If p has more than one such neighbor, it is said to
belong to the medial axis of R. The Medial Axis Transformation does not serve our purpose, as
its output for an L-shaped figure would be as shown in Fig. 4.
Fig 4. Skeleton from Medial Axis Transformation and Expected Skeleton
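The definition of the Medial Axis Transformation given above can be written as a brute-force sketch, which also makes its cost visible: every region pixel is compared with every border pixel. The function name `medial_axis_brute_force` is hypothetical, and two details are assumptions: the border B is taken to be the region pixels with at least one 4-neighbour outside the region, and closeness is measured with squared Euclidean distance.

```python
def medial_axis_brute_force(mask):
    """Return the (x, y) region pixels that have more than one closest
    border pixel. `mask` is a 2D list with 1 for region and 0 elsewhere."""
    height, width = len(mask), len(mask[0])

    def inside(x, y):
        return 0 <= x < width and 0 <= y < height and mask[y][x] == 1

    region = [(x, y) for y in range(height) for x in range(width)
              if mask[y][x] == 1]
    # Border B: region pixels touching the outside (assumed definition).
    border = [(x, y) for (x, y) in region
              if not all(inside(x + dx, y + dy)
                         for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))]

    axis = set()
    for (px, py) in region:
        # One distance per border pixel: |R| x |B| comparisons in total.
        distances = [(px - bx) ** 2 + (py - by) ** 2 for (bx, by) in border]
        nearest = min(distances)
        if distances.count(nearest) > 1:
            axis.add((px, py))
    return axis
```

For a solid 3x5 rectangle, only the three interior pixels of the middle row are equidistant from more than one border pixel, so they form the medial axis under these assumptions.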
The complexity of the Medial Axis Transformation is also very high, as it compares every
pixel on the boundary of the region with each of the pixels in the image. We therefore
modified the algorithm to suit our purpose. Instead of calculating the distance of each
pixel from the boundary pixels, the modified algorithm calculates the
distance of each boundary point from its horizontally opposite boundary point. Unlike the Medial
Axis Transform, which compares region points with boundary points, the modified algorithm
uses only the boundary points. We select a point on the boundary and find
its opposite (horizontal) boundary point. One horizontal line may cross more
than one body part. On every horizontal line, the odd-numbered boundary points mark the
beginning of a body part and the subsequent even-numbered boundary points mark its
end. For every such pair of boundary points we mark the center of the two points as a
skeleton point. S represents the set of all skeleton points. The output is
not a perfect skeleton in all positions, but together with the
boundary it is enough for the user to visualize the movements. This brings us closer to
the expected skeleton, as in Fig. 5, at a much lower complexity, so that the same
video can be processed in less time.
Fig 5. Skeletonized Frame
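The row-wise midpoint rule of the modified algorithm can be sketched as follows. As an assumption made for brevity, the left and right boundary points of each body part are taken from the runs of foreground pixels in the binary segmented mask rather than from the Laplacian edge image. The name `row_midpoint_skeleton` is hypothetical.

```python
def row_midpoint_skeleton(mask):
    """Return the set S of skeleton points (x, y): one midpoint per
    horizontal run of foreground pixels in a binary mask."""
    skeleton = set()
    for y, row in enumerate(mask):
        width = len(row)
        x = 0
        while x < width:
            if row[x]:
                start = x                      # odd-numbered boundary point
                while x + 1 < width and row[x + 1]:
                    x += 1
                end = x                        # matching even-numbered point
                skeleton.add(((start + end) // 2, y))
            x += 1
    return skeleton
```

Each row is scanned once, so the cost grows with the number of pixels in the frame rather than with the product of region and boundary sizes.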
2.3 Object Tracking
The final image processing algorithm in the project is synchronized multiple inter-connected
object tracking. The main objective of this step is to mark and track the anchor points of
the body. These anchor points comprise the head, neck, shoulders, elbows, palms, waist and
one or two points for the legs. The number of points for the legs was kept low keeping in mind
the traditional classical dances where the female dancers wear saris. In our future work, we plan
to use the anchor points to create a vector stick diagram of the dancer.
Traditional object tracking [15,16] failed in our case because such algorithms are made
for tracking a simple object in a video. In our case we had to track multiple inter-connected
complex objects (body parts) in a video, and in a synchronized way. We have therefore
designed our own algorithm. It is based on the principle of pattern
matching and tracks objects in the input video using the output of the skeletonizing algorithm.
First we mark a point p1 on the object we want to track. When an object moves in a video,
some blurring occurs, which results in slight changes in the color of the object. To reduce
the effect of these changes we quantize the image into 32 gray levels, which are adequate
to track major object motions. We then
take an 11x11 pixel window (w1) on frame (j) centered on p1, and 121 search windows of
11x11 pixels on frame (j+1), one centered on each pixel of the
corresponding 11x11 window (w1) on frame (j+1). We compare the histograms of all
these search windows on frame (j+1) with that of (w1) and identify the window of
minimum difference as the region (wr) in which motion has taken place. The histograms
are compared according to the following formula:
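Diff = \sum_{x=1}^{32} | f_1(x) - f_2(x) |, where x runs over the 32 gray-level bins
f_1(x): number of pixels in gray-level bin x in the primary window (w1).
f_2(x): number of pixels in gray-level bin x in the search window.
The absolute value is needed because both windows contain 121 pixels, so the signed
differences would always sum to zero.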
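The window search can be sketched as below. The sketch assumes that the histogram difference is the sum of absolute bin differences, since a signed sum over two windows of equal size is always zero. All function names are hypothetical, pixels outside the frame are simply skipped, and ties are resolved in favour of the first window found in scan order.

```python
def quantize(gray, levels=32):
    """Reduce a 256-level grayscale frame (2D list) to `levels` gray levels."""
    step = 256 // levels
    return [[value // step for value in row] for row in gray]


def window_histogram(q, cx, cy, half=5, levels=32):
    """Histogram of the (2*half+1) x (2*half+1) window centred on (cx, cy)."""
    hist = [0] * levels
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            if 0 <= y < len(q) and 0 <= x < len(q[0]):
                hist[q[y][x]] += 1
    return hist


def histogram_diff(h1, h2):
    """Sum of absolute differences between corresponding bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def best_window(q_prev, q_next, p1, half=5):
    """Centre of the search window on frame j+1 whose histogram is closest
    to that of window w1 (centred on p1) on frame j."""
    cx, cy = p1
    reference = window_histogram(q_prev, cx, cy, half)
    best = None
    for dy in range(-half, half + 1):          # 11 x 11 = 121 candidates
        for dx in range(-half, half + 1):
            candidate = (cx + dx, cy + dy)
            diff = histogram_diff(
                reference, window_histogram(q_next, candidate[0], candidate[1], half))
            if best is None or diff < best[0]:
                best = (diff, candidate)
    return best[1]
```

With `half=5` the windows are 11x11 pixels and the search covers the 121 candidate centres described above.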
The remaining problem is to select one point out of the 121 points in the window (wr).
First, we compute the difference along the x and y axes between p1 and the center of wr
(center of wr - p1) to identify the quarter, as given in Table 1, with (0,0) taken
as the top left corner of the image.
X Difference    Y Difference    Quarter
Positive        Positive        Bottom right
Negative        Positive        Bottom left
Positive        Negative        Top right
Negative        Negative        Top left
Table 1: Quarter identification in the search window
We then search the identified quarter for skeleton points in top-down, left-right order
and mark the first skeleton point found. Marking the point on the skeleton ensures that
the point does not move off the body. If no point is found in this quarter, we
scan the full window for skeleton points and mark the first one found. If the window
contains no skeleton point, we simply mark the center of the window as the
corresponding point in frame (j+1). This process is applied to every consecutive frame with
respect to its immediate predecessor, thereby tracking the object as shown in Fig. 6.
Fig. 6. Locus of tracked finger
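The point selection inside the window (wr) can be sketched as follows, using Table 1 for the quarter and the fallback order described above. Several details are assumptions: a difference of zero is treated as positive, each quarter includes the central row and column of the window, and the skeleton is given as a set of (x, y) points. The function names are hypothetical.

```python
def identify_quarter(p1, center):
    """Quarter of the search window according to Table 1, with (0, 0) at the
    top left of the image. A zero difference is treated as positive."""
    dx = center[0] - p1[0]
    dy = center[1] - p1[1]
    horizontal = "right" if dx >= 0 else "left"
    vertical = "bottom" if dy >= 0 else "top"
    return vertical + " " + horizontal


def next_tracked_point(p1, center, skeleton, half=5):
    """Pick the tracked point on frame j+1 inside the window centred on `center`."""
    cx, cy = center
    quarter = identify_quarter(p1, center)
    quarter_xs = (range(cx, cx + half + 1) if "right" in quarter
                  else range(cx - half, cx + 1))
    quarter_ys = (range(cy, cy + half + 1) if "bottom" in quarter
                  else range(cy - half, cy + 1))
    full_xs = range(cx - half, cx + half + 1)
    full_ys = range(cy - half, cy + half + 1)
    # First the identified quarter, then the whole window.
    for xs, ys in ((quarter_xs, quarter_ys), (full_xs, full_ys)):
        for y in ys:            # top-down
            for x in xs:        # left-right
                if (x, y) in skeleton:
                    return (x, y)
    return center               # no skeleton point in the window
```

Applying `next_tracked_point` frame after frame, each time with the previous result as `p1`, yields a locus of the kind shown in Fig. 6.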
The design and interface of the tool were created in Visual C++ following the Document/View
architecture. The tool makes extensive use of multithreading in Visual C++. The processed video
highlights the main body motion by extracting the body contour, and the tool lets the user
add and display additional textual information about the dance video with each
frame. It facilitates juxtaposition of the original video with the extracted video, as shown in
the screen shot of the main application.
2.5 Future Scope
The work on the vectorization of the dancer's stick diagram has been initiated. We have
also realised an interface for easy (and precise) access to dance videos from the Digital
Video Archive. The major obstacle is to do this without consuming huge bandwidth. Our
interface for Digital Video Archives is based on the skeletonization algorithm reported in
this paper. Details of this interface will be discussed in a future paper.
Screen shot of the main application, with panels for the input video, subtitle addition, and
the processed output video.
Acknowledgement
We are extremely thankful to Maria, a dance teacher who runs her own dance school. We
had a very fruitful discussion with her and gained many interesting new perspectives on our
problem. Some of the relevant outcomes of this discussion were: extracting the dancer from the
dance video; hiding irrelevant information such as the color of the dress; applying enhancements
to the dancer to study the dance movements better; and the ability to compare two dance
performances by the same dancer or similar performances by different dancers.
References
[1] Visual dictionary of Hastas for Indian dance: hand gestures of Indian Dance.
[2] T. Schiphorst, et al. Tools for Interaction with the Creative Process of Composition.
Centre for Systems Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada. CHI 90
Proceedings, pp. 167-174.
[3] Research for the Games. Georgia Tech forged strong ties to Atlanta's 1996 Olympic Games.
Compiled by Lea McLees.
[4] V. M. Dyaberi, et al. Phrase Structure Detection in Dance. Proceedings of the 12th annual ACM
international conference on Multimedia, Oct 2004, pp. 332-335.
[5] Dance and the Computer: A Potential for Graphic Synergy. Technical Report 422,
Basser Department of Computer Science, University of Sydney, Oct 2003.
[6] Hirofumi Morioka, et al. Proposal of an Algorithm to Synthesize Music Suitable for Dance.
Proceedings of the 2004 ACM SIGCHI International Conference on Advances in
Computer Entertainment Technology, Sept 2004, pp. 296-301.
[7] W. Scott, et al. Mixing Dance Realities: Collaborative Development of Live-Motion
Capture in a Performing Arts Environment. ACM Computers in Entertainment (CIE),
vol. 2, issue 2, April 2004.
[8] Akio Nakamura, et al. Multimodal Presentation Method for a Dance Training System.
Saitama University, Japan. CHI 2005, pp. 1685-1688.
[9] R.C. Gonzalez and R.E. Woods. Digital Image Processing, second edition, Pearson Education.
[10] The Open Video Project.
[11] Digital Video Archives: Managing Through Metadata.
[12] Informedia Digital Video Library System, Carnegie Mellon University.
[13] Fabio Chestani. Video Retrieval Interfaces.
[14] Building GUIs with Matlab version 5, MathWorks.
[15] Darrell D. Demirdjian T. Ko T. Constraining Human Body Tracking.
Artificial Intelligence Laboratory, MIT.
[16] Minden Gary, Niehaus Doug and Roberts James.
The Digital Video Library System: Vision and Design.