pic2code: Generating HTML Code
from Handwritten Picture
Minhazul Arefin, Kazi Mojammel Hossen, Robin Khan
Overview
Introduction
Problem Description
Objective
Proposed Methodology
Result & Analysis
Conclusion
Introduction
● Technological advances have made web pages increasingly important.
● New websites reflect a nation's culture and identity.
● Websites cover learning, social work, gaming, training, marketing, and
advertising.
Introduction
● Programming with automated code creation is faster, cheaper, and more
efficient.
● Progressive design accelerates website completion.
Problem Description
● Designing a multi-system program from scratch can be laborious.
● Balancing program logic and GUI code can be challenging.
Objective
● Development of novel HTML builder approaches for converting images into
HTML format.
● Implementation of deep learning techniques to enhance model performance.
● Use of platform-specific runtime languages for GUI programming.
Proposed System Architecture
● Utilization of You Only Look Once
(YOLOv3) model for object detection
● Image preprocessing and feature
extraction using convolutional neural
networks
● Conversion of hand-drawn sketches to
HTML markup language
Data Acquisition
● Use of Microsoft AI Lab's Sketch2Code
program for dataset creation
● Selection of images with UI elements for
training and validation
● Training the model on carefully chosen
photos for accurate detection and encoding
YOLOv3 Model
● Bounding boxes with dimension priors and position prediction
Bounding Box Prediction
● It is crucial for accurately detecting and
localizing objects in an image.
● It is a key component of the YOLOv3
model used in the research study.
● It involves using dimension clusters as
anchor boxes to forecast the bounding
boxes in an image (see the decoding sketch below).
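The decoding behind this prediction follows the standard YOLOv3 formulation: the network outputs raw values (tx, ty, tw, th), and the box is recovered from the grid-cell offset (cx, cy) and the anchor prior (pw, ph). Below is a minimal NumPy sketch of that decoding; it illustrates the published formulas, not the authors' exact implementation.

```python
import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw YOLOv3 outputs into a box: sigmoid-offset center inside
    the grid cell, exponentially scaled anchor (dimension prior) for size."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    bx = sigmoid(tx) + cx          # box center x, in grid-cell units
    by = sigmoid(ty) + cy          # box center y, in grid-cell units
    bw = pw * np.exp(tw)           # box width from anchor prior pw
    bh = ph * np.exp(th)           # box height from anchor prior ph
    return bx, by, bw, bh

# Example: a cell at grid position (3, 5) with anchor prior (1.5, 2.0)
print(decode_box(0.2, -0.1, 0.3, 0.1, cx=3, cy=5, pw=1.5, ph=2.0))
```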
Feature Extraction
● The feature extraction network
incorporates a novel design with shortcut
connections, 3 × 3 and 1 × 1 convolutional
layers, resulting in the architecture
referred to as Darknet-53.
● Comparative analysis indicates that
Darknet-53 performs better than ResNet-101
while being 1.5× faster, and matches the
performance of ResNet-152 while being twice as fast.
● The network also achieves the highest
measured rate of floating-point operations
per second (a residual-block sketch follows).
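As an illustration of the building block behind Darknet-53, the sketch below shows one residual unit: a 1×1 convolution that halves the channels, a 3×3 convolution that restores them, and a shortcut connection. PyTorch and the exact layer settings (batch norm, leaky ReLU slope) are illustrative choices, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """Darknet-53-style residual block: 1x1 conv halves the channels,
    3x3 conv restores them, and a shortcut adds the block input back."""
    def __init__(self, channels: int):
        super().__init__()
        hidden = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)   # shortcut connection

# Example: one block over a 64-channel feature map
features = torch.randn(1, 64, 104, 104)
print(DarknetResidual(64)(features).shape)   # torch.Size([1, 64, 104, 104])
```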
Feature Extraction
● The network employs multi-scale feature
extraction with three scales (52×52, 26×26,
and 13×13) for a 416×416 image.
● A dedicated detection head, built with
specific layer strides, predicts classes and
box locations at each of these scales (see
the grid-size sketch below).
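The three grid sizes follow directly from the backbone strides of 8, 16, and 32, as the small sketch below confirms for a 416×416 input.

```python
def detection_grids(image_size: int = 416, strides=(8, 16, 32)):
    """Grid sizes at which YOLOv3 predicts boxes: one per backbone stride."""
    return [image_size // s for s in strides]

print(detection_grids())  # [52, 26, 13]
```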
Training
● The system will utilize Joseph Redmon's
YOLOv3 model architecture
● The single-shot detection design of YOLOv3
suggests strong efficiency, particularly for
tasks like processing 60-frames-per-second
(fps) video, where rapid processing is crucial.
● Pre-trained models are reused as a starting
point for training, even with a single dataset
(see the fine-tuning sketch below).
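A minimal sketch of what reusing pre-trained weights could look like, assuming a standard PyTorch state_dict checkpoint; the stand-in backbone, head, file name, and learning rates are all invented for illustration, since the slides only state that existing models are reused.

```python
import torch
import torch.nn as nn

# Sketch only: stand-in backbone and detection head with invented sizes.
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.1))
head = nn.Conv2d(32, 3 * (5 + 3), kernel_size=1)  # 3 anchors x (4 box + obj + 3 classes)

# Create a stand-in checkpoint, then start training from those weights
# instead of random initialization (the actual checkpoint format is assumed).
torch.save(backbone.state_dict(), "pretrained_backbone.pt")
backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

# Fine-tune: smaller learning rate for the pre-trained backbone, larger for the new head.
optimizer = torch.optim.SGD(
    [{"params": backbone.parameters(), "lr": 1e-4},
     {"params": head.parameters(), "lr": 1e-3}],
    momentum=0.9,
)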
Class Prediction
● The system employs multilabel
classification, where each bounding box
predicts the possible classes it may contain.
● Instead of a softmax classifier, separate
logistic classifiers are used, since softmax
is unnecessary for good performance.
● This lets the system handle complex
datasets with overlapping labels (see the
sketch below).
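A minimal NumPy sketch of this multilabel scheme, with an arbitrary confidence threshold chosen for illustration: each class gets an independent logistic (sigmoid) score, so a single box can carry several labels.

```python
import numpy as np

def predict_classes(class_logits, threshold=0.5):
    """Per-box multilabel class prediction: an independent logistic (sigmoid)
    score per class instead of a single softmax, so one box can carry
    several overlapping labels."""
    scores = 1.0 / (1.0 + np.exp(-np.asarray(class_logits)))
    return scores, scores > threshold

# Example: one bounding box scored against three UI classes
scores, labels = predict_classes([2.1, -1.3, 0.4])
print(scores.round(2), labels)   # [0.89 0.21 0.6 ] [ True False  True]
```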
Object Cropping
● Each identified area is assigned a
confidence score, and a class is assigned
to the UI component contained within the
identified box.
● The second sub-problem addressed by
the system is the occurrence of
overlapping classification boxes.
● This overlap arises in two scenarios: first,
two components lying very close together, and
second, two distinct components being treated
as a single entity (a resolution sketch follows).
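The slides do not spell out how overlapping boxes are resolved; a common approach, sketched below in NumPy, is to keep the higher-confidence box whenever two detections overlap heavily (by IoU) and then crop each surviving region from the sketch image. The threshold and detection data layout are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def resolve_and_crop(image, detections, iou_threshold=0.5):
    """Keep the higher-confidence box whenever two detections overlap heavily,
    then crop each surviving region from the sketch image."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in detections:
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(det)
    crops = []
    for det in kept:
        x1, y1, x2, y2 = det["box"]
        crops.append((det["label"], image[y1:y2, x1:x2]))
    return crops

# Example: two overlapping "button" detections collapse into one crop
image = np.zeros((100, 200), dtype=np.uint8)
dets = [{"box": (10, 10, 60, 40), "score": 0.9, "label": "button"},
        {"box": (12, 12, 62, 42), "score": 0.7, "label": "button"}]
print([label for label, _ in resolve_and_crop(image, dets)])   # ['button']
```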
Object Recognition
1. Detecting Objects: The system utilizes
the YOLOv3 model to identify classes in
hand-drawn sketches of UI elements such
as buttons, text fields, and images.
2. Detecting Characters: To recognize text
in UI elements, a deep convolutional
neural network (CNN) model is
incorporated into the system (an
illustrative sketch follows).
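The slides do not give the character network's architecture; below is a minimal PyTorch sketch of what such a classifier over cropped 32×32 character images could look like, with all layer sizes and the 36-class alphabet chosen purely for illustration.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Minimal character classifier for text inside cropped UI elements.
    Layer sizes and the 36-class alphabet (0-9, a-z) are illustrative;
    the slides only state that a deep CNN is used."""
    def __init__(self, num_classes: int = 36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Example: classify one 32x32 grayscale character crop
logits = CharCNN()(torch.randn(1, 1, 32, 32))
print(logits.shape)   # torch.Size([1, 36])
```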
HTML Page Builder Algorithm
● This technique generates header, body, and
footer HTML templates using contour-finding
coordinates.
● Employ raw coordinates to manage different
UI elements and their positions in the HTML
templates.
● Codes are fitted into the templates, resulting
in the capture of the final HTML code for the
web interface.
HTML Page Builder
● Uses phases and conditionals to structure the
HTML code.
● The method involves stack management to
handle individual elements, checking the top of
the stack for the next building level, and ensuring
proper order in the HTML generation.
● Dynamically builds HTML pages by
considering input objects' y-coordinates and
employing stack management to create
templates for the header, body, and footer
sections (see the sketch below).
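A hedged sketch of this idea in Python: detections are sorted by y-coordinate, split into header, body, and footer bands, and each class is fitted into a small HTML snippet. The class-to-tag mapping, band thresholds, and use of a plain sorted list instead of an explicit stack are illustrative simplifications of the described algorithm.

```python
# Class-to-tag mapping is invented for illustration; the slides do not list it.
TAGS = {
    "button": '<button>Button</button>',
    "text_field": '<input type="text">',
    "image": '<img src="placeholder.png" alt="">',
    "label": '<p>Label</p>',
}

def build_page(elements, page_height):
    """elements: list of (label, x, y) detections taken from the sketch."""
    header, body, footer = [], [], []
    for label, x, y in sorted(elements, key=lambda e: e[2]):   # top-to-bottom order
        snippet = TAGS.get(label, "<div></div>")
        if y < 0.2 * page_height:        # top band -> header template
            header.append(snippet)
        elif y > 0.8 * page_height:      # bottom band -> footer template
            footer.append(snippet)
        else:                            # everything else -> body template
            body.append(snippet)
    return ("<html>\n<body>\n"
            "<header>\n" + "\n".join(header) + "\n</header>\n"
            "<main>\n" + "\n".join(body) + "\n</main>\n"
            "<footer>\n" + "\n".join(footer) + "\n</footer>\n"
            "</body>\n</html>")

# Example: a label near the top, a text field and button in the middle
print(build_page([("label", 20, 30), ("text_field", 40, 300), ("button", 40, 360)],
                 page_height=600))
```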
Results and Evaluation
● Accuracy rate of 87.71% achieved by the proposed system
● Accuracy is slightly lower than that of the state-of-the-art model
Limitations and Future Work
● Inability to support hyperlinks and generate CSS or other types of code
● Potential enhancements and improvements for the proposed system
● Future possibilities for expanding the functionality of the system such as
generating text from real-time videos
Conclusion
● The method presented in the study can transform hand-drawn mock-ups of
web pages into well-organized HTML code.
● This process streamlines the transition from design to implementation,
allowing for efficient web page development.
