MELTEM BALLAN, PH.D.
MELTEMBALLAN@GMAIL.COM
HTTPS://WWW.LINKEDIN.COM/IN/MELTEMBALLAN
Learning From DATA:
Image Processing as a Part of Big Data Initiatives
Data is like gunpowder!
You can make a marvelous firework
OR
a dangerous weapon from it
Erosion of Boundaries in Information Age
•Between industrial sectors
•Between products and services
•Between producers and users
•Between IT and non-IT industries
•Between science and industry
•Between science disciplines
•Between people
New Generation of Products in Information Age
• More digital than analogue
• Advanced mechanical components enabled by CAD techniques
• Increasing powers of embedded IT components
• Increased complexity
• Greater flexibility
• More functions
• Higher performance
•Major Advances in Sensor Technology
•Major Advances in Sensor DP Technology
•Use of Machine Learning and Soft Computing
Recognition Technology
Intelligent Recognition Technology
• Eyeprint identification in ATM cash machines. In this system
developed by NCR, a camera captures a digital record of a user's iris
and can verify identity within seconds from a central database.
• Supermarket checkout scanner (US Patent 5,673,089) which uses
scent sensors to identify fruits and vegetables.
• Molecular breath analyzer that can detect diseases such as lung
cancer, stomach ulcer and hepatitis at much earlier stages than
currently used in radiological and laboratory tests.
What is INTELLIGENCE?
"Intelligence is a mental quality that consists
of the abilities to learn from experience,
adapt to new situations, understand and
handle abstract concepts, and use
knowledge to manipulate one's environment."
Britannica
This tells us WHAT but not HOW.
This leaves room for introducing
instrumental definitions. Here we may
introduce the definitions of
ARTIFICIAL INTELLIGENCE
or
COMPUTATIONAL INTELLIGENCE.
What is ARTIFICIAL INTELLIGENCE?
“The branch of computer science that studies
how smart a machine can be, which involves the
capability of a device to perform functions
normally associated with human intelligence, such
as reasoning, learning and self-improvement.
Expert Systems, Heuristics, Knowledge Based
Systems and Machine Learning”
Webster’s New World Dictionary of Computer Terms
[Diagram: AI and CI; Hard Computing and Soft Computing; FL, NN, ES]
AI versus CI?
An AI program that cannot solve new problems in new ways is
emphasizing the artificial and not the intelligence. The vast majority of AI
programs have nothing to do with learning. They may play excellent chess, but they
cannot learn how to play checkers, or anything else for that matter. In essence they are
calculators.
Any system, whether it is carbon-based or silicon-based, whether it is an
individual, a society, or a species, that generates adaptive behavior to meet goals
in a range of environments can be said to be intelligent. In contrast, any system that
cannot generate adaptive behavior and can only perform in a single limited
environment demonstrates no intelligence.
(Fogel, 1995)
What makes the algorithms intelligent?
Chess + Checkers = DATA
DATA
Information output by a sensing device or organ that
includes both useful and irrelevant or redundant
information and must be processed to be meaningful.
(http://www.merriam-webster.com)
GPR Data
Ground-penetrating radar (GPR) systems can penetrate the ground and detect
metallic and non-metallic objects from their dielectric properties.
GPR Data: Business Case
Land Mine or Coke Can?
FINGERPRINT DATA
• Fingerprints are patterns of convex and
concave parallel lines (ridges and valleys)
that occur on the tips of fingers.
• Those lines are unique and do not
change with age.
SMART HOME AND PHONE TECHNOLOGIES:
BUSINESS CASE
FOOD MOLDS
• Theoretically, 1 billion fungal species
• U.S. Department of Agriculture,
Agricultural Research Service
• Technical University of Denmark
BUSINESS CASE: EDIBLE OR NOT
FAST FOOD PRICE AUDIT
• The data is usually a camera picture with
images
• Lighting can differ from angle to angle
• Pictures can be too busy to
separate the image
BUSINESS CASE: SECRET PRICE
AUDIT
DATA PROCESSING APPROACH
Raw Data
Data Preparation
Feature Extraction
Classification
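The four stages above can be sketched as a chain of small functions. This is only an illustrative skeleton: the function names, the two features, and the hand-written rule in `classify` are hypothetical stand-ins, not the method used in the talk.

```python
def prepare(raw):
    """Data preparation: scale 8-bit pixel values to the range 0.0-1.0."""
    return [[v / 255 for v in row] for row in raw]


def extract_features(image):
    """Feature extraction: mean intensity and fraction of bright pixels."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    bright = sum(1 for v in pixels if v > 0.5) / len(pixels)
    return mean, bright


def classify(features, bright_cutoff=0.5):
    """Classification: a hypothetical fixed rule, not a trained model."""
    _, bright = features
    return "mostly bright" if bright > bright_cutoff else "mostly dark"


def process(raw):
    """Raw data -> data preparation -> feature extraction -> classification."""
    return classify(extract_features(prepare(raw)))


if __name__ == "__main__":
    print(process([[255, 255], [255, 0]]))
```

In a real system each stage would be replaced by the pre-processing and learning methods discussed in the following slides.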
Which Feature of the DATA is
relevant?
What Method to use for DATA
processing?
PRE-PROCESSING METHODS
Grayscale – reduces image to one color channel, ranging from white to black
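A minimal sketch of grayscale conversion, assuming the image is a nested list of (R, G, B) tuples with 8-bit values. The weights are the commonly used luma coefficients; the slide does not say which weighting was applied, so treat them as an assumption.

```python
def to_grayscale(rgb_image):
    """Collapse each (R, G, B) pixel into one intensity value (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


if __name__ == "__main__":
    # One row: a white, a black and a pure red pixel.
    print(to_grayscale([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
```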
PRE-PROCESSING METHODS
Thresholding – binarizes an image so that pixels with values above a threshold are set
to 255 (the maximum value of an 8-bit pixel), i.e. white, and pixels with smaller
intensities are set to 0 (black). It is an important operation that is often used to
prepare images for vectorization or further segmentation.
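The rule described above fits in one function. This sketch assumes a grayscale image stored as a nested list of 8-bit values; the name `threshold` and its parameters are illustrative.

```python
def threshold(gray, t, max_value=255):
    """Set pixels brighter than t to max_value (white) and the rest to 0 (black)."""
    return [[max_value if v > t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    print(threshold([[10, 128, 200], [127, 129, 0]], 128))
```

Note that a pixel exactly equal to the threshold is treated as background here; the opposite convention is equally valid.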
PRE-PROCESSING METHODS
Blurring – useful for generating background effects and shadows. It can also be very useful
for smoothing jagged edges: to anti-alias the edges of images, and/or to round
out features to produce highlighting effects.
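One of the simplest blurs is a 3x3 box (mean) filter, sketched below for a grayscale image stored as a nested list. The slide does not name a specific blur, so the box filter and the border handling (averaging only the neighbours that exist) are assumptions made for illustration.

```python
def box_blur(gray):
    """Replace each pixel with the integer mean of its 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            # Clip the window at the image border.
            for ny in range(max(0, y - 1), min(h, y + 2)):
                for nx in range(max(0, x - 1), min(w, x + 2)):
                    total += gray[ny][nx]
                    count += 1
            out[y][x] = total // count
    return out


if __name__ == "__main__":
    spike = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
    for row in box_blur(spike):
        print(row)
```

A single bright pixel is spread over its neighbours, which is the smoothing effect described above.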
PRE-PROCESSING METHODS
Contours – curves joining all the continuous points that have the same color or intensity
along a boundary. They’re useful for object or feature detection as well as shape analysis.
Bounding Rectangles – the smallest rectangle that can contain a contour. You can use them
to segment out individual letters and numbers in an image.
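As a simplified stand-in for contour tracing, the sketch below finds connected foreground regions in a binary image with a flood fill and reports the smallest axis-aligned rectangle around each one. It assumes a nested list where non-zero means foreground and uses 4-connectivity; both choices, and the function name, are illustrative.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (x, y, width, height) for each connected foreground region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Flood-fill one region, tracking its extreme coordinates.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


if __name__ == "__main__":
    image = [
        [255, 255, 0, 0, 0],
        [255, 0, 0, 0, 255],
        [0, 0, 0, 0, 255],
    ]
    print(bounding_boxes(image))
```

Each returned rectangle can then be cropped out, for example to isolate a single character.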
PRE-PROCESSING METHODS
Edge Detection – finds points in an image where brightness or intensity changes sharply,
which usually marks a boundary between different objects. It measures changes in the
brightness of areas of an image, which we call the gradient. We can measure both
the magnitude (how drastic the change is) and the direction of a gradient. If the magnitude
of change at a set of points exceeds a given threshold, those points can be considered an edge.
The Canny algorithm is a popular edge detector that produces
accurate, clean edges.
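The gradient magnitude, direction and threshold test described above can be sketched with Sobel kernels. This is plain gradient thresholding, not the Canny algorithm (which adds smoothing, non-maximum suppression and hysteresis); the function names and the nested-list image format are assumptions.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient(gray, x, y):
    """Return (magnitude, direction in radians) of the gradient at an interior pixel."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            v = gray[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * v
            gy += SOBEL_Y[dy + 1][dx + 1] * v
    return math.hypot(gx, gy), math.atan2(gy, gx)


def edge_map(gray, threshold):
    """Mark interior pixels whose gradient magnitude exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            magnitude, _ = gradient(gray, x, y)
            if magnitude > threshold:
                edges[y][x] = 255
    return edges


if __name__ == "__main__":
    step = [[0, 0, 255, 255] for _ in range(3)]  # dark left half, bright right half
    print(gradient(step, 1, 1))
    print(edge_map(step, 500))
```

Border pixels are left unmarked because the 3x3 kernel does not fit there.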
PRE-PROCESSING METHODS
Line and Shape Detection – If the objects of interest have regular shapes like lines and
circles, you can use Hough Transforms to detect them.
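A minimal sketch of the Hough transform for straight lines, assuming the edge pixels are already available as (x, y) points. Each point votes for every line rho = x*cos(theta) + y*sin(theta) that passes through it, and cells with many votes correspond to lines in the image. The function name, the one-degree angle step and the rounding of rho to whole pixels are illustrative choices.

```python
import math
from collections import Counter


def hough_lines(points, theta_step_deg=1):
    """Accumulate votes for (rho, theta_deg) line parameters."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, theta_deg)] += 1
    return votes


if __name__ == "__main__":
    # Five points on the horizontal line y = 2.
    points = [(x, 2) for x in range(5)]
    votes = hough_lines(points)
    print(votes[(2, 90)])  # votes for the line rho = 2, theta = 90 degrees
```

With such a short line, neighbouring angle cells can collect the same number of votes, so practical detectors apply a vote threshold and suppress near-duplicate peaks.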
OPTICAL CHARACTER RECOGNITION
Reading text in an image and translating it into computer-readable
characters.
TYPICAL PRE-PROCESSING
• Load image
• Convert to TIFF
• Convert the resolution to 300 DPI
• Split image by color channel
• Edge detection
• Find contours
• Identify relevant rectangles
• Threshold image
• Find background and foreground intensities
• Identify the text regions
• Sharpen the letters
• Slightly blur image
• Save the processed image
• Feed the image to classifier
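The step of finding background and foreground intensities can be automated. One common choice is Otsu's method, sketched below; the talk does not say which technique was used, so this is a swapped-in example. It assumes an 8-bit grayscale image as a nested list and returns the threshold together with the mean intensity on each side of it.

```python
def otsu_threshold(gray):
    """Return (threshold, mean_background, mean_foreground) by Otsu's method."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best = (0, 0.0, 0.0)
    best_var = -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var = var_between
            best = (t, mean_bg, mean_fg)
    return best


if __name__ == "__main__":
    page = [[10, 10, 200], [200, 10, 200]]
    print(otsu_threshold(page))
```

Pixels above the returned threshold belong to the brighter class; which class is text and which is paper depends on the document.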
TYPICAL CLASSIFICATION
• Supervised Learning (mapping known input to a known
output)
Classification (mold detection)
Regression (revenue forecasting)
• Unsupervised Learning (figuring out the output with
known input)
Clustering (grouping by buying behavior)
Association (associating similar behaviors)
• Mixed Learning
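As an illustration of supervised classification, here is a small k-nearest-neighbours sketch. The feature vectors (mean intensity, fraction of dark pixels), the labels and the sample values are invented toy data, and k-nearest-neighbours is just one possible classifier, not necessarily the one used in the talk.

```python
import math
from collections import Counter


def knn_predict(samples, labels, query, k=3):
    """Label a query by majority vote among its k closest training samples."""
    ranked = sorted(zip(samples, labels), key=lambda pair: math.dist(pair[0], query))
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training set: known inputs mapped to known outputs.
    samples = [(0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
               (0.30, 0.70), (0.20, 0.80), (0.25, 0.75)]
    labels = ["clean", "clean", "clean", "mold", "mold", "mold"]
    print(knn_predict(samples, labels, (0.28, 0.72)))
```

Unsupervised methods such as clustering work on the same kind of feature vectors but without the `labels` list.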
PAST: REAL-TIME DATA PROCESSING
• Limited Data Sample
• Time Demanding
NOW: REAL-TIME DATA PROCESSING
BOTTOM LINE
• Intelligent Recognition Technology is data driven.
For this reason, developing an intelligent system
requires:
• Understanding the nature of the data
• Bringing experts from different disciplines
together
[Diagram: AI and CI; Hard Computing and Soft Computing; FL, NN, ES]
THANK YOU VERY MUCH.
FURTHER QUESTIONS AND SUGGESTIONS:
MELTEMBALLAN@GMAIL.COM
