Presentation Transcript

    • Sound Texture: Wavelet Tree Learning and Tiling and Stitching. Antonio De Sena and Pietro Polotti, desena@sci.univr.it, polotti@sci.univr.it, Università degli Studi di Verona.
    • Goals. Illustrate the different definitions of sound texture proposed by different authors. Present the basic ideas of two approaches for analyzing and synthesizing sound textures. Stimulate the audience to propose a definition/classification of audio/sound texture.
    • Definitions (1). Definition by Dubnov et al., Hebrew University, Jerusalem, Israel (2002) [1]: “We can describe sound textures as a set of repeating structural elements (sound grains) subject to some randomness in their time appearance and relative ordering but preserving certain essential temporal coherence and across-scale localization.” Examples: “. . . natural and artificial sounds such as rain, a waterfall, traffic noises, people babble, machine noises, and so on.” Fundamental assumption: “. . . the sound signals are approximately stationary at some scale.” Comment: this definition is shaped by a specific analytical tool.
    • Definitions (2). Definition by Dubnov and Tishby, Hebrew University, Jerusalem, Israel (1997) [2]: “Sound texture can be considered as stationary acoustical phenomena that obtain their acoustical effects from internal variations in the sound structure.” Variations such as: “. . . micro-fluctuations in the harmonics of a pitched sound or statistical properties of random excitation source in an acoustic system.”
    • Definitions (3). Definition by Parker and Behm, University of Calgary, Canada (2004) [3]: “A sound texture can be described as having a somewhat random character (?), but a recognizable quality. Any small (?) sample of a sound texture should sound very much like, but not identical to, any other small sample.” Comment: this is very qualitative. Definition by Norris and Denham, University of Plymouth, UK (2003) [4]: “A sound texture may be loosely defined as a sound which may have some local structure, but has no perceptually obvious long-term structure.” Comment: this is rather vague.
    • Definitions (4). Definition by Athineos and Ellis, Columbia University, U.S.A. (2003) [5]: “. . . we look at a third class of sounds we call sound textures that are distinct from speech and music.” “. . . textures should have an undetermined extent (duration) with consistent properties (at some level), and be readily identifiable from a small (?) sample.” Comment: “. . . consistent properties” is a bit vague. Comment: “. . . identifiable from a small sample” seems to be a perceptual criterion (?). They consider the existence of a global structure in time.
    • Two different approaches. Creating Sound Textures by Example: Tiling and Stitching. Starting from image processing methods (tiling and stitching), Parker and Behm [3] developed a new method for creating sound textures. Creating Sound Textures through Wavelet Tree Learning. Starting from an image processing method developed in [6], Dubnov et al. extend this method to audio signals for the creation of sound textures.
    • Creating Audio Texture by Example: Tiling and Stitching (1). Definition by Parker and Behm, University of Calgary, Canada (2004) [3]: “A sound texture can be described as having a somewhat random character, but a recognizable quality. Any small sample of a sound texture should sound very much like, but not identical to, any other small sample.” Comment: this is very qualitative. Examples: waterfall, rain, traffic noises . . . For every chunk, the frequency distribution should not change, nor should any rhythmical pattern or timbre characterization.
    • Creating Audio Texture by Example: Tiling and Stitching (2). Tiling and Stitching based methods. Image Quilting (image processing): square sample blocks with fixed size; overlap between adjacent blocks; select blocks that have some significant measure of agreement between them*; smooth the edges to reduce the “mosaic” effect*. (*No further information provided.)
    • Creating Audio Texture by Example: Tiling and Stitching (3). Tiling and Stitching based methods. Chaos Mosaic (image processing). Start with only one block. The image is created by copying (tiling) the block to fill the requested size. A chaos transformation then needs to be applied, e.g. Arnold's Cat Map: x_{l+1} = (x_l + y_l) mod m, y_{l+1} = (x_l + 2·y_l) mod m, where the image size is m × m and l is the iteration number. This transformation maps the output image onto itself. It is applied to blocks of pixels (not to single pixels, in order to preserve local features). Edges are smoothed (or faded) to reduce the “mosaic” effect.
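    As a rough illustration (not taken from the paper), the block-wise Arnold's Cat Map scrambling could be sketched in Python as follows; the block size, iteration count and input data are illustrative assumptions:

    ```python
    import numpy as np

    def arnold_cat_map_blocks(img, block=8, iterations=1):
        """Scramble a square array block-wise with Arnold's Cat Map.

        The map x' = (x + y) mod n, y' = (x + 2y) mod n is applied to block
        coordinates rather than single pixels, so local features inside each
        block are preserved (as the chaos-mosaic idea suggests).
        """
        m, m2 = img.shape
        assert m == m2 and m % block == 0, "expects a square array divisible by block"
        n = m // block                       # number of blocks per side
        out = img.copy()
        for _ in range(iterations):
            new = np.empty_like(out)
            for bx in range(n):
                for by in range(n):
                    tx = (bx + by) % n       # Arnold's Cat Map on block indices
                    ty = (bx + 2 * by) % n
                    new[tx*block:(tx+1)*block, ty*block:(ty+1)*block] = \
                        out[bx*block:(bx+1)*block, by*block:(by+1)*block]
            out = new
        return out

    # Illustrative usage on a synthetic 64x64 gradient "image"
    img = np.outer(np.arange(64), np.ones(64))
    scrambled = arnold_cat_map_blocks(img, block=8, iterations=3)
    ```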
    • Creating Audio Texture by Example: Tiling and Stitching (4). Stitching based methods: generation from a sound texture sample. The sound texture needs to be separated into blocks of equal duration. Using these blocks, a bigger sample can be created. A least-squares measure is used to find blocks whose head (first 15%) is similar to the tail (last 15%) of the previous one. Blocks are chosen using an LRU (Least Recently Used) algorithm (in combination with the least-squares measure) to force the procedure to pick up all the chunks. Chunks are cross-faded (15%).
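    A minimal Python sketch of this stitching idea, assuming equal-length chunks already cut from the source sample; the least-squares head/tail match, LRU tie-breaking and 15% cross-fade follow the description above, while the candidate-pool size and the test data are illustrative assumptions:

    ```python
    import numpy as np

    def stitch(chunks, n_out, overlap_frac=0.15):
        """Generate a longer texture by stitching equal-length chunks.

        At each step, candidate chunks are ranked by the least-squares distance
        between their head and the tail of the previous chunk; among the closest
        candidates, the least recently used one is taken, and consecutive chunks
        are cross-faded over the overlap region.
        """
        L = len(chunks[0])
        ov = int(L * overlap_frac)
        last_used = {i: -1 for i in range(len(chunks))}   # LRU bookkeeping
        out = list(chunks[0])
        last_used[0] = 0
        prev = 0
        for step in range(1, n_out):
            tail = chunks[prev][-ov:]
            # Least-squares head/tail distance for every candidate chunk
            dists = np.array([np.sum((c[:ov] - tail) ** 2) for c in chunks])
            order = np.argsort(dists)[:4]                  # keep a few best matches
            nxt = min(order, key=lambda i: last_used[i])   # least recently used wins
            fade_in = np.linspace(0.0, 1.0, ov)
            out[-ov:] = np.asarray(out[-ov:]) * (1 - fade_in) + chunks[nxt][:ov] * fade_in
            out.extend(chunks[nxt][ov:])
            last_used[nxt] = step
            prev = nxt
        return np.array(out)

    # Illustrative usage with random chunks standing in for a real texture sample
    rng = np.random.default_rng(0)
    chunks = [rng.standard_normal(1024) for _ in range(8)]
    texture = stitch(chunks, n_out=32)
    ```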
    • Creating Audio Texture by Example: Tiling and Stitching (5). The chunk size can be determined using amplitude peaks. The entire source sample is analyzed for RMS amplitude, and peaks in amplitude more than 1.5 standard deviations above the baseline are recorded. The mean and standard deviation of the observed distances between these peaks are used to generate the size of each chunk. Hopefully, each chunk will then contain one “feature” that a listener can recognize.
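    The chunk-size computation could be sketched as follows; the 1.5-standard-deviation threshold and the use of the mean and standard deviation of peak distances follow the slide, while the frame length, the fallback for too few peaks, and the clipping are illustrative assumptions:

    ```python
    import numpy as np

    def chunk_lengths_from_peaks(signal, win=512, n_chunks=16, rng=None):
        """Derive chunk lengths from the spacing of RMS amplitude peaks.

        Frames whose RMS exceeds the baseline by more than 1.5 standard
        deviations are treated as peaks; chunk lengths are then drawn from a
        normal distribution fitted to the peak-to-peak distances.
        """
        rng = rng or np.random.default_rng()
        frames = signal[:len(signal) // win * win].reshape(-1, win)
        rms = np.sqrt(np.mean(frames ** 2, axis=1))
        thresh = rms.mean() + 1.5 * rms.std()
        peaks = np.flatnonzero(rms > thresh) * win      # peak positions in samples
        gaps = np.diff(peaks)
        if len(gaps) < 2:                               # not enough peaks: fall back
            return np.full(n_chunks, win * 8)
        lengths = rng.normal(gaps.mean(), gaps.std(), n_chunks)
        return np.clip(lengths, win, None).astype(int)
    ```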
    • Creating Audio Texture by Example: Tiling and Stitching (6). Tiling based methods (chaos mosaic): generation from a sound texture sample. Make a matrix whose rows are exactly large enough to hold one period at the “dominant” (?) frequency, or an integer number of periods. Fill the matrix with the sample (row by row). Partition it into rectangular regions; the width of these regions is computed using the dominant frequency (e.g. width = n · F_d, with n ≈ 150). The corners of the regions are randomly moved using a normal distribution (with deviation d = 15% of the box size).
    • Creating Audio Texture by Example: Tiling and Stitching (7). Arnold's Cat Map is applied with half-size blocks to create a background. The background is necessary because the next step can leave “holes” in the wave. Arnold's Cat Map is then applied with normal-size blocks (overlapped, without adding, onto the background).
    • Creating Audio Texture by Example: Tiling and Stitching (8). Comments. The idea: handicraft work; two ideas readapted from image processing. The resulting textures are not bad, but there are some problems such as rhythmical patterns, time-envelope problems, and repetitions.
    • Creating Audio Texture by Example: Tiling and Stitching (9). Appendix: Synthesis with a Gaussian Pyramid. Again, an idea taken from image processing: a wavelet-like pyramid (MRA tree) built with a Gaussian (lowpass) filter and the difference between the original and the filtered signal (details, bandpass). No full description is available; only a brief one-page explanation is available at http://pages.cpsc.ucalgary.ca/~parker/gamesresearch/tsketch-texture.pdf .
    • Sound examples and comments. Creating Audio Texture by Example: Tiling and Stitching. Crowd (audience): macro-evident repetitiveness (the examples sound like juxtaposed reiterated patterns); time-envelope problems (“volume discontinuity”). Fire: less macro-evident repetitiveness (the examples still sound like juxtaposed repeated patterns); no time-envelope problems. Water: the “chaos” example is the best; the other examples have time-envelope problems and a feeling of “volume discontinuities”; in general it is the least repetition-like. Surf and gulls: block copy, obtained with small windows, thus less repetition of temporal (almost rhythmical) patterns.
    • Synthesizing Sound Textures through Wavelet Tree Learning (1). Definition by Dubnov et al., Hebrew University, Jerusalem, Israel [1]: “We can describe sound textures as a set of repeating structural elements (sound grains) subject to some randomness in their time appearance and relative ordering but preserving certain essential temporal coherence and across-scale localization.” Examples: “. . . natural and artificial sounds such as rain, a waterfall, traffic noises, people babble, machine noises, and so on.” Fundamental assumption: “. . . the sound signals are approximately stationary at some scale” (?). Comment: this definition is shaped by a specific analytical tool.
    • Synthesizing Sound Textures through Wavelet Tree Learning (2). Gabor theory: sound is perceived as a series of short, discrete bursts of energy. A further assumption: in a sound texture, a statistical characterization of the joint time-frequency and/or time-scale relations is possible.
    • Synthesizing Sound Textures through Wavelet Tree Learning (3). The original idea was developed for image (2D) and video (3D) textures [6]. The examples on the next slides are taken from: http://www.cs.huji.ac.il/labs/cglab/papers/texsyn/2dtexsyn/ The audio (1D) case is an adaptation of the original studies. More work is needed to avoid silence gaps, overly similar portions, . . .
    • Synthesizing Sound Textures through Wavelet Tree Learning (4). [Figure] Original texture; same-size synthesized texture; 4-times-larger synthesized texture.
    • Synthesizing Sound Textures through Wavelet Tree Learning (5). [Figure] Original texture; same-size synthesized texture; 4-times-larger synthesized texture.
    • Synthesizing Sound Textures through Wavelet Tree Learning (6). Statistical Learning. The stochastic source is estimated from a training example (a “sample” of the source). El-Yaniv algorithm: generate new random sequences that could have been generated by the source of the sample. The new sequences are generated from synthetic wavelet coefficients, which are obtained by following statistically constrained paths in the analysis wavelet tree.
    • Synthesizing Sound Textures through Wavelet Tree Learning (7). Wavelet MRA Tree. An analysis tree is built using a Daubechies wavelet. The Daubechies wavelet was chosen because “this wavelet has several superior properties compared to other orthonormal wavelets, especially with respect to translation and rotation invariance, aliasing, and robustness due to its nonorthogonality and redundancy” (?). Each MRA tree node stores the coefficients of the Daubechies wavelet at a specific scale.
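    A minimal sketch of such a multiresolution decomposition using PyWavelets; the Daubechies order ('db4'), the number of levels and the input signal are assumptions, and the per-level coefficient arrays simply stand in for the nodes of the MRA tree:

    ```python
    import numpy as np
    import pywt

    # Illustrative input: a short noise burst standing in for a texture sample
    x = np.random.default_rng(0).standard_normal(4096)

    # Multi-level Daubechies decomposition: coeffs[0] is the coarsest
    # approximation, coeffs[1:] are detail coefficients from coarse to fine.
    coeffs = pywt.wavedec(x, 'db4', level=5)

    # Each level of the MRA tree holds the coefficients at one scale;
    # a node at level j has (roughly) two "children" at level j+1.
    for j, c in enumerate(coeffs):
        print(f"level {j}: {len(c)} coefficients")

    # The signal can be reconstructed from (possibly modified) coefficients.
    x_rec = pywt.waverec(coeffs, 'db4')
    ```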
    • Synthesizing Sound Textures through Wavelet Tree Learning (8). Learning. Each coefficient depends on its scale ancestors (the upper levels) and its temporal predecessors (the nodes to its left). Using an algorithm by El-Yaniv [1], the conditional probability along the tree path (scale) can be learnt. A second learning pass uses the neighboring nodes (time) to preserve the time structure.
    • Synthesizing Sound Textures through Wavelet Tree Learning (9). Synthesizing. The signal can thus be viewed as a collection of paths from the root of the tree toward the leaves. The goal is to generate a new tree whose paths are typical sequences generated by the same source, by creating new (candidate) child nodes for a node v_i. First the algorithm copies the root and the nodes of level 1 into the new tree. Now assume that the first i levels of the new tree have already been generated. To generate the next level, two child nodes must be added to each node v_i in level i. The algorithm searches among all nodes at the i-th level of the original tree for nodes w_i with maximal-length ε-similar (El-Yaniv; ε is a user threshold) path suffixes w_{i−1}, w_{i−2}, . . . , w_j.
    • Synthesizing Sound Textures through Wavelet Tree Learning (10). Synthesizing. Among these candidates, the algorithm looks for those nodes whose k (k is a user parameter) predecessors (the nodes to the left in the same level) resemble those of the children of v_i. The algorithm then randomly chooses a candidate and copies its values to the children of v_i.
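    A highly simplified sketch of the candidate-selection step, assuming the tree is stored as one coefficient array per level (root first) with node idx at a level having parent idx // 2 one level up; it only illustrates the maximal-length ε-similar ancestor-path search from the previous slide, omitting the k-predecessor check and the breadth-first generation loop, and is not the authors' implementation:

    ```python
    import numpy as np

    def ancestor_path(tree, level, idx):
        """Values along the path from a node up toward the root.

        `tree` is a list of 1-D arrays, one per level (root first); the parent
        of node idx at a given level is assumed to be node idx // 2 one level up.
        """
        path = []
        while level >= 0:
            path.append(tree[level][idx])
            idx //= 2
            level -= 1
        return np.array(path)

    def choose_candidate(orig_tree, new_tree, level, idx, eps=0.1, rng=None):
        """Pick a node of the original tree whose ancestor path is eps-similar
        to the new node's ancestor path for the longest suffix, and return its
        value (levels 0 and 1 of the new tree are assumed copied directly)."""
        rng = rng or np.random.default_rng()
        target = ancestor_path(new_tree, level - 1, idx // 2)   # new node's parent path
        best_len, best = -1, []
        for j in range(len(orig_tree[level])):
            cand = ancestor_path(orig_tree, level, j)[1:]       # candidate's parent path
            n = 0                                               # eps-similar suffix length
            while n < min(len(cand), len(target)) and abs(cand[n] - target[n]) <= eps:
                n += 1
            if n > best_len:
                best_len, best = n, [j]
            elif n == best_len:
                best.append(j)
        return orig_tree[level][rng.choice(best)]
    ```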
    • Synthesizing Sound Textures through Wavelet Tree Learning (11). Comments. Theoretical approach: very interesting mathematical background. Experimental results: silence gaps and pattern repetitions. The results with images are better, but probably because image perception is different from audio perception.
    • Sound examples. Synthesizing Sound Textures through Wavelet Tree Learning. Baby crying, shores, traffic jam: these textures have a strong rhythmical or temporal articulation. All the examples show the same problems: the macro-tiles seem to be generated by the same set of “randomly” chosen coefficients, resulting in unnatural-sounding repetitions.
    • Sound Texture Modelling with CTFLP (1). Definition by Athineos and Ellis, Columbia University, U.S.A. [5]: “. . . we look at a third class of sounds we call sound textures that are distinct from speech and music.” “. . . textures should have an undetermined extent (duration) with consistent properties (at some level), and be readily identifiable from a small sample.”
    • Sound Texture Modelling with CTFLP (2). Idea: model the texture as rapidly-modulated noise by using two linear predictors in cascade*. The first, operating in the time domain, is a normal LPC analysis and captures the spectral envelope. The second, operating in the frequency domain (on the residual of the previous LPC analysis), captures the time envelope, i.e. the time structure. Textures can be synthesized using filtered Gaussian noise, which feeds the cascade of filters whose coefficients were obtained from the analysis of the original texture sample. (*A nearly identical idea can be found in [7].)
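    A rough numerical sketch of this cascaded analysis/synthesis idea using librosa's LPC routine; the predictor orders, single-frame processing, and the use of a DCT as the frequency-domain representation are illustrative assumptions rather than the authors' exact pipeline:

    ```python
    import numpy as np
    import scipy.signal
    import scipy.fft
    import librosa

    def ctflp_analyze(frame, t_order=20, f_order=20):
        """Cascaded time/frequency LPC analysis of one frame.

        Time-domain LPC captures the spectral envelope; LPC on the DCT of the
        residual (a frequency-domain sequence) captures the temporal envelope.
        """
        a_time = librosa.lpc(frame, order=t_order)             # spectral envelope (all-pole)
        residual = scipy.signal.lfilter(a_time, [1.0], frame)  # whitened residual
        spec = scipy.fft.dct(residual, norm='ortho')           # frequency-domain sequence
        a_freq = librosa.lpc(spec, order=f_order)              # temporal envelope (all-pole)
        return a_time, a_freq

    def ctflp_synthesize(a_time, a_freq, n, rng=None):
        """Drive the cascade of the two all-pole filters with Gaussian noise."""
        rng = rng or np.random.default_rng()
        noise = rng.standard_normal(n)
        # Impose the temporal envelope in the frequency domain, then go back to time
        env = scipy.signal.lfilter([1.0], a_freq, scipy.fft.dct(noise, norm='ortho'))
        shaped = scipy.fft.idct(env, norm='ortho')
        # Impose the spectral envelope with the time-domain all-pole filter
        return scipy.signal.lfilter([1.0], a_time, shaped)

    # Illustrative usage on a noise burst standing in for a texture frame
    frame = np.random.default_rng(0).standard_normal(2048)
    a_t, a_f = ctflp_analyze(frame)
    resynth = ctflp_synthesize(a_t, a_f, len(frame))
    ```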
    • Sound Texture Modelling with CTFLP (3). [Figure] CTFLP analysis (top) and synthesis (bottom) block diagrams.
    • References (1). [1] Dubnov, S.; Bar-Joseph, Z.; El-Yaniv, R.; Lischinski, D.; Werman, M.: Synthesizing sound textures through wavelet tree learning. IEEE Computer Graphics and Applications, vol. 22, no. 4, pp. 38-48, July-Aug. 2002. [2] Dubnov, S.; Tishby, N.: Analysis of sound textures in musical and machine sounds by means of higher order statistical features. Proc. IEEE ICASSP-97, vol. 5, pp. 3845-3848, 21-24 April 1997. [3] Parker, J.R.; Behm, B.: Creating audio textures by example: tiling and stitching. Proc. IEEE ICASSP '04, vol. 4, pp. iv-317 - iv-320, 17-21 May 2004.
    • References (2). [4] Norris, M.; Denham, S.: Sound texture detection using Self Organizing Maps. Centre for Theoretical and Computational Neuroscience, University of Plymouth, UK, Nov. 2003. [5] Athineos, M.; Ellis, D.P.W.: Sound texture modelling with linear prediction in both time and frequency domains. Proc. IEEE ICASSP '03, vol. 5, pp. 648-651, 6-10 April 2003. [6] Bar-Joseph, Z. et al.: Texture Mixing and Texture Movie Synthesis Using Statistical Learning. IEEE Trans. Visualization and Computer Graphics, vol. 7, no. 2, pp. 120-135, Apr.-Jun. 2001. [7] Zhu, X.L.; Wyse, L.: Sound texture modeling and time-frequency LPC. Proc. of the Conf. on Digital Audio Effects (DAFX-04), Naples, Italy, 5-8 October 2004.