Color naming 65,274,705,768 pixels

Presented at Color Imaging XVIII: Displaying, Processing, Hardcopy, and Applications in 2013. Application of machine color naming to 200,000+ Wikipedia images.

  1. Color naming 65,274,705,768 pixels
     Nathan Moroney and Giordano Beretta, HP Labs
     Electronic Imaging 2013: Color Imaging XVIII
  2. Outline
     - Motivation
       - More (pixel) data
     - Finding and processing 65 billion pixels
       - Hint: Wikipedia and a dual-core OpenMP color namer
     - What did you learn?
       - The most frequent non-achromatic color term is…
     - What's next?
       - Other than a trillion pixels
  3. 3. Motivation Previous work in crowd-sourcing color training data and experimental efforts Related work in the area of big (image) data  A. Torralba, R. Fergus, W. T. Freeman, "80 million tiny images: a large dataset for non-parametric object and scene recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), pp. 1958-1970, 2008.  Ben Shneiderman, "Extreme Visualization: Squeezing a Billion Records into a Million Pixels", SIGMOD Conference, pp. 3-12, (2008).  Steven Seitz, “A Trillion Photos”, EI’13 Keynote (2013).Electronic Imaging 2013: Color Imaging XVIII
  4. Motivation
     [Figure: Log Number of Images]
  5. 5. Source Data ImageClef 2010 snapshot  Adrian Popescu, Theodora Tsikrika and Jana Kludas, "Overview of the wikipedia retrieval task at ImageCLEF 2010", In the Working Notes for the CLEF 2010 Workshop, 20-23 September, Padova, Italy, 2010.  250,000 images plus associated wikipedia data  20 gigabytes  65,000,000,000 pixels uncompressedElectronic Imaging 2013: Color Imaging XVIII
  6. Source Data: At 200 PPI
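To put "at 200 PPI" in physical terms, the sketch below converts the title's pixel count into a print area, assuming every pixel is laid edge to edge at 200 pixels per inch with no margins.

```python
TOTAL_PIXELS = 65_274_705_768  # pixel count from the title
PPI = 200                      # print resolution from the slide title
SQ_M_PER_SQ_IN = 0.0254 ** 2   # 1 inch = 0.0254 m exactly

area_sq_in = TOTAL_PIXELS / PPI**2    # each square inch holds PPI * PPI pixels
area_sq_m = area_sq_in * SQ_M_PER_SQ_IN
side_m = area_sq_m ** 0.5             # side of an equivalent square print

print(f"{area_sq_in:,.0f} square inches = {area_sq_m:,.0f} m^2 "
      f"(a square about {side_m:.1f} m on a side)")
```

Under these assumptions the collection would cover roughly 1,053 square meters, a square a little over 32 m on a side.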
  7. 7. Processing Basic single dual-core (but Open MP threaded) script to process over all image files Simple stuff like getting image dimensions can be done over lunch Uncompressing all the JPEG files to memory can take hours Goal was a color naming algorithm that could be run in less than a dayElectronic Imaging 2013: Color Imaging XVIII
  8. 8. Processing Some testing done using HP Cloud Services and compute clusters But majority of focus on single computing device  Antony Rowstron, Dushyanth Narayanan, Austin Donnelly, Greg OShea, and Andrew Douglas. "Nobody ever got fired for using hadoop on a cluster", In HotCDP 2012 - 1st International Workshop on Hot Topics in Cloud Data Processing, (2012).Electronic Imaging 2013: Color Imaging XVIII
  9. 9. Processing Won’t describe the specifics of the color naming algorithm (throw produce if you have it) but generally  Input single RGB pixel and output is a single color term  Size of vocabulary or number of color terms is a parameter  Relative range of chroma values corresponding to an achromatic values is also a parameter Also currently testing a completely revised model Finally, in the Future directions section note that the best option for formal publication is to make use of currently available open source machine learning toolboxes.Electronic Imaging 2013: Color Imaging XVIII
  10. Results: Aspect Ratios
     - Wide range of image types
     - Most basic test of the processing scripts
  11. 11. Results: Median Additional test and visualization of basic color properties of images Large enough data set was worthwhile to write custom HTML5 2d canvas rendererElectronic Imaging 2013: Color Imaging XVIII
  12. 12. Results: Median So much data, that as noted by Shneiderman the density plot "uses a spatial substrate organizing principle, but shows concentrations of markers” is maybe a better idea Data, alpha=0.05Electronic Imaging 2013: Color Imaging XVIII
  13. 13. Results: Max Max of R+G+B for the images Final test of basic scripting codeElectronic Imaging 2013: Color Imaging XVIII
  14. 14. Results Color terms across all images Majority pixels achromatic Top chromatic colors are arguably natural tones Higher chroma terms relatively infrequentElectronic Imaging 2013: Color Imaging XVIII
  15. Results
     [Figure: Color Terms for 200,000+ images; Number of Images vs. Number of Color Terms, maximum vocabulary of 30]
     - Color terms per image: peak at 5 are all achromatic terms or images
     - Gradual, then rapid, usage of chromatic terms
  16. Results
     [Figure: Color Terms for 200,000+ images; Number of Images vs. Number of Color Terms, maximum vocabulary of 30]
     - Sudden drop-off at 30 is a model failure
     - Term added to vocabulary based on previous limited optimization
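The histogram in these two slides plots, for each number of color terms, how many images use that many terms. Assuming "number of color terms" means the count of distinct terms assigned to an image's pixels, it can be built as below. The three sample images are invented.

```python
from collections import Counter

def terms_per_image_histogram(images):
    """Map 'number of distinct color terms in an image' -> number of images."""
    return Counter(len(set(terms)) for terms in images)

images = [
    ["black", "dark gray", "gray", "light gray", "white"],                 # achromatic only
    ["black", "gray", "white", "gray"],
    ["white", "gray", "black", "dark gray", "light gray", "blue", "brown"],
]
hist = terms_per_image_histogram(images)
print(sorted(hist.items()))
```

With a vocabulary capped at 30 terms the x-axis cannot exceed 30, so a pile-up or abrupt cutoff there reflects the model rather than the images.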
  17. 17. Current Work Repeated entire process adjusting the model parameters Processing to fill SQL databases Query the database to validate all of the steps and explore specificElectronic Imaging 2013: Color Imaging XVIII
  18. 18. Current Work SELECT * from cntable order by skyblue desc limit 40Electronic Imaging 2013: Color Imaging XVIII
  19. Future Directions
     - Image collections as "pixel corpora" for algorithm design, testing and optimization
       - Similar to the role that written and spoken corpora fill for NLP and corpus linguistics
       - Useful to formalize for citation and repeatability
     - Additional analysis features
     - Testing with more public domain machine learning algorithms for repeatability
  20. 20. Summary Algorithm optimization, like machine color naming, with 200,000 images is different than with 200. Based on Wikipedia, majority of visual content or pixels are achromatic Based on Wikipedia, higher chroma named pixels are less frequent Based on Wikipedia, there is a gradual then sudden transition in color term usageElectronic Imaging 2013: Color Imaging XVIII