pic2code: Generating HTML Code
from Handwritten Picture
Minhazul Arefin, Kazi Mojammel Hossen, Robin Khan
Overview
Introduction
Problem Description
Objective
Proposed Methodology
Result & Analysis
Conclusion
Introduction
● Technological advances have made web pages increasingly important.
● New websites reflect a nation's culture and identity.
● Websites cover learning, social work, gaming, training, marketing, and
advertising.
Introduction
● Programming with automated code creation is faster, cheaper, and more
efficient.
● Progressive design accelerates website completion.
Problem Description
● Designing a multi-system program from scratch can be laborious.
● Balancing program logic and GUI code can be challenging.
Objective
● Development of novel HTML builder approaches for converting images into
HTML format.
● Implementation of deep learning techniques to enhance model performance.
● Use of platform-specific runtime languages for GUI programming.
Proposed System Architecture
● Utilization of You Only Look Once
(YOLOv3) model for object detection
● Image preprocessing and feature
extraction using convolutional neural
networks
● Conversion of hand-drawn sketches to
HTML markup language
Data Acquisition
● Use of Microsoft AI Lab's Sketch2Code
program for dataset creation
● Selection of images with UI elements for
training and validation
● Training the model on carefully chosen
photos for accurate detection and encoding
YOLOv3 Model
● Bounding boxes with dimension priors and position prediction
Bounding Box Prediction
● It is crucial for accurately detecting and
localizing objects in an image.
● It is a key component of the YOLOv3
model used in the research study.
● It involves using dimension clusters as
anchor boxes to forecast the bounding
boxes in an image (see the decoding sketch below).
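The decoding behind this prediction follows the standard YOLOv3 formulation: the network outputs raw values (tx, ty, tw, th), and the box is recovered from the grid-cell offset (cx, cy) and the anchor prior (pw, ph). Below is a minimal NumPy sketch of that decoding; it illustrates the published formulas, not the authors' exact implementation.

```python
import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw YOLOv3 outputs into a box: sigmoid-offset center inside
    the grid cell, exponentially scaled anchor (dimension prior) for size."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    bx = sigmoid(tx) + cx          # box center x, in grid-cell units
    by = sigmoid(ty) + cy          # box center y, in grid-cell units
    bw = pw * np.exp(tw)           # box width from anchor prior pw
    bh = ph * np.exp(th)           # box height from anchor prior ph
    return bx, by, bw, bh

# Example: a cell at grid position (3, 5) with anchor prior (1.5, 2.0)
print(decode_box(0.2, -0.1, 0.3, 0.1, cx=3, cy=5, pw=1.5, ph=2.0))
```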
Feature Extraction
● The feature extraction network
incorporates a novel design with shortcut
connections, 3 × 3 and 1 × 1 convolutional
layers, resulting in the architecture
referred to as Darknet-53.
● Comparative analysis indicates that
Darknet-53 performs better than ResNet-101
while being 1.5× faster, and matches the
performance of ResNet-152 while being twice as fast.
● The network also achieves the highest
measured rate of floating-point operations
per second (a residual-block sketch follows).
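As an illustration of the building block behind Darknet-53, the sketch below shows one residual unit: a 1×1 convolution that halves the channels, a 3×3 convolution that restores them, and a shortcut connection. PyTorch and the exact layer settings (batch norm, leaky ReLU slope) are illustrative choices, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """Darknet-53-style residual block: 1x1 conv halves the channels,
    3x3 conv restores them, and a shortcut adds the block input back."""
    def __init__(self, channels: int):
        super().__init__()
        hidden = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)   # shortcut connection

# Example: one block over a 64-channel feature map
features = torch.randn(1, 64, 104, 104)
print(DarknetResidual(64)(features).shape)   # torch.Size([1, 64, 104, 104])
```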
Feature Extraction
● The network employs multi-scale feature
extraction with three scales (52×52, 26×26,
and 13×13) for a 416×416 image.
● A dedicated detection head, built with
specific layer strides, predicts classes and
box locations at each of these scales (see
the grid-size sketch below).
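The three grid sizes follow directly from the backbone strides of 8, 16, and 32, as the small sketch below confirms for a 416×416 input.

```python
def detection_grids(image_size: int = 416, strides=(8, 16, 32)):
    """Grid sizes at which YOLOv3 predicts boxes: one per backbone stride."""
    return [image_size // s for s in strides]

print(detection_grids())  # [52, 26, 13]
```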
Training
● The system will utilize Joseph Redmon's
YOLOv3 model architecture
● The single-shot detection design of YOLOv3
suggests strong efficiency, particularly for
tasks like processing 60-frames-per-second
(fps) video, where rapid processing is crucial.
● Pre-trained models are reused as a starting
point for training, even with a single dataset
(see the fine-tuning sketch below).
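A minimal sketch of what reusing pre-trained weights could look like, assuming a standard PyTorch state_dict checkpoint; the stand-in backbone, head, file name, and learning rates are all invented for illustration, since the slides only state that existing models are reused.

```python
import torch
import torch.nn as nn

# Sketch only: stand-in backbone and detection head with invented sizes.
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.1))
head = nn.Conv2d(32, 3 * (5 + 3), kernel_size=1)  # 3 anchors x (4 box + obj + 3 classes)

# Create a stand-in checkpoint, then start training from those weights
# instead of random initialization (the actual checkpoint format is assumed).
torch.save(backbone.state_dict(), "pretrained_backbone.pt")
backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

# Fine-tune: smaller learning rate for the pre-trained backbone, larger for the new head.
optimizer = torch.optim.SGD(
    [{"params": backbone.parameters(), "lr": 1e-4},
     {"params": head.parameters(), "lr": 1e-3}],
    momentum=0.9,
)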
Class Prediction
● The system employs multilabel
classification, where each bounding box
predicts the possible classes it may contain.
● Instead of a softmax classifier, separate
logistic classifiers are used, since softmax
is unnecessary for good performance.
● This lets the system handle complex
datasets with overlapping labels (see the
sketch below).
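A minimal NumPy sketch of this multilabel scheme, with an arbitrary confidence threshold chosen for illustration: each class gets an independent logistic (sigmoid) score, so a single box can carry several labels.

```python
import numpy as np

def predict_classes(class_logits, threshold=0.5):
    """Per-box multilabel class prediction: an independent logistic (sigmoid)
    score per class instead of a single softmax, so one box can carry
    several overlapping labels."""
    scores = 1.0 / (1.0 + np.exp(-np.asarray(class_logits)))
    return scores, scores > threshold

# Example: one bounding box scored against three UI classes
scores, labels = predict_classes([2.1, -1.3, 0.4])
print(scores.round(2), labels)   # [0.89 0.21 0.6 ] [ True False  True]
```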
Object Cropping
● Each identified area is assigned a
confidence score, and a class is assigned
to the UI component contained within the
identified box.
● The second sub-problem addressed by
the system is the occurrence of
overlapping classification boxes.
● This overlap arises in two scenarios: first,
two components lying very close together, and
second, two distinct components being treated
as a single entity (a resolution sketch follows).
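The slides do not spell out how overlapping boxes are resolved; a common approach, sketched below in NumPy, is to keep the higher-confidence box whenever two detections overlap heavily (by IoU) and then crop each surviving region from the sketch image. The threshold and detection data layout are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def resolve_and_crop(image, detections, iou_threshold=0.5):
    """Keep the higher-confidence box whenever two detections overlap heavily,
    then crop each surviving region from the sketch image."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in detections:
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    crops = []
    for det in kept:
        x1, y1, x2, y2 = det["box"]
        crops.append((det["label"], image[y1:y2, x1:x2]))
    return crops

# Example: two overlapping "button" detections collapse into one crop
image = np.zeros((100, 200), dtype=np.uint8)
dets = [{"box": (10, 10, 60, 40), "score": 0.9, "label": "button"},
        {"box": (12, 12, 62, 42), "score": 0.7, "label": "button"}]
print([label for label, _ in resolve_and_crop(image, dets)])   # ['button']
```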
Object Recognition
1. Detecting Objects: The system utilizes
the YOLOv3 model to identify classes in
hand-drawn sketches of UI elements such
as buttons, text fields, and images.
2. Detecting Characters: To recognize text
in UI elements, a deep convolutional
neural network (CNN) model is
incorporated into the system (an
illustrative sketch follows).
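The slides do not give the character network's architecture; below is a minimal PyTorch sketch of what such a classifier over cropped 32×32 character images could look like, with all layer sizes and the 36-class alphabet chosen purely for illustration.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Minimal character classifier for text inside cropped UI elements.
    Layer sizes and the 36-class alphabet (0-9, a-z) are illustrative;
    the slides only state that a deep CNN is used."""
    def __init__(self, num_classes: int = 36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Example: classify one 32x32 grayscale character crop
logits = CharCNN()(torch.randn(1, 1, 32, 32))
print(logits.shape)   # torch.Size([1, 36])
```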
HTML Page Builder Algorithm
● This technique generates header, body, and
footer HTML templates using contour-finding
coordinates.
● Employ raw coordinates to manage different
UI elements and their positions in the HTML
templates.
● Codes are fitted into the templates, resulting
in the capture of the final HTML code for the
web interface.
HTML Page Builder
● Uses phases and conditionals to structure the
HTML code.
● The method involves stack management to
handle individual elements, checking the top of
the stack for the next building level, and ensuring
proper order in the HTML generation.
● Dynamically builds HTML pages by
considering input objects' y-coordinates and
employing stack management to create
templates for the header, body, and footer
sections (see the sketch below).
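A hedged sketch of this idea in Python: detections are sorted by y-coordinate, split into header, body, and footer bands, and each class is fitted into a small HTML snippet. The class-to-tag mapping, band thresholds, and use of a plain sorted list instead of an explicit stack are illustrative simplifications of the described algorithm.

```python
# Class-to-tag mapping is invented for illustration; the slides do not list it.
TAGS = {
    "button": '<button>Button</button>',
    "text_field": '<input type="text">',
    "image": '<img src="placeholder.png" alt="">',
    "label": '<p>Label</p>',
}

def build_page(elements, page_height):
    """elements: list of (label, x, y) detections taken from the sketch."""
    header, body, footer = [], [], []
    for label, x, y in sorted(elements, key=lambda e: e[2]):   # top-to-bottom order
        snippet = TAGS.get(label, "<div></div>")
        if y < 0.2 * page_height:        # top band -> header template
            header.append(snippet)
        elif y > 0.8 * page_height:      # bottom band -> footer template
            footer.append(snippet)
        else:                            # everything else -> body template
            body.append(snippet)
    return ("<html>\n<body>\n"
            "<header>\n" + "\n".join(header) + "\n</header>\n"
            "<main>\n" + "\n".join(body) + "\n</main>\n"
            "<footer>\n" + "\n".join(footer) + "\n</footer>\n"
            "</body>\n</html>")

# Example: a label near the top, a text field and button in the middle
print(build_page([("label", 20, 30), ("text_field", 40, 300), ("button", 40, 360)],
                 page_height=600))
```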
Results and Evaluation
● Accuracy rate of 87.71% achieved by the proposed system
● Accuracy is slightly lower than that of the state-of-the-art model
Limitations and Future Work
● Inability to support hyperlinks and generate CSS or other types of code
● Potential enhancements and improvements for the proposed system
● Future possibilities for expanding the functionality of the system such as
generating text from real-time videos
Conclusion
● The method presented in the study can transform hand-drawn mock-ups of
web pages into well-organized HTML code.
● This process streamlines the transition from design to implementation,
allowing for efficient web page development.
