Harnessing Big Data to Simplify Debugging
Asi Lifshitz
Vtool Ltd.
Email: asi@thevtool.com
Abstract
Debugging failing tests is a complex and time-consuming task. The common workflow is to iterate between the simulation
log file and the simulation waveforms. The simulation log file can often be considered Big Data, sometimes reaching tens of
gigabytes. It is a text file written top-down, listing many messages coming from different sources. It is extremely hard to
navigate through the file in search of the necessary information without being overwhelmed or missing important details.
In this paper we show how Big Data tools can simplify debugging of failing tests and shorten the verification schedule.
Index Terms
API, EDA, RTL, SystemVerilog, UVM.
I. INTRODUCTION
It is well known today that verification is one of the major bottlenecks on the way to tape-out. It is also known
that within verification, debug is the most time-consuming task. The designs to be verified have grown significantly
in the past 10 years, which makes debugging even more complex. The common way of debugging a failing test is by
iterating between the waveforms and the simulation log file. As complexity increases, more messages are printed to
the log file, and it is not rare to face log files that reach several gigabytes. This paper proposes
using Big Data tools to quickly and efficiently extract data from huge log files. Furthermore, as extracting and manipulating
the data gets simpler, the user can present the data graphically rather than textually, which is much easier to analyse.
In this paper we demonstrate how the log file of a UVM-based project can be easily entered into the Lucene search engine.
Once the file is indexed, Lucene can provide the user with the information needed for debugging the failing test.
The contribution of this paper is in providing better ways of analysing long log files, which are the outcome of simulating
large or complex designs. Shortening the debug time shortens the project schedule and increases engineer productivity.
For the sake of brevity we analyse only UVM messages, but the same techniques can be applied to log files that contain
data from several sources, such as C code, behavioural models and the RTL design.
The rest of the paper is organized as follows. We provide a definition of Big Data in Section II. In Section III we define
what a software database is, and why it is not suitable for debugging simulation log files. A database search engine is
described in Section IV. Section V describes the structure of a UVM message and how it is entered into Lucene. In Section VI
we show how the records received from Lucene can be processed and presented graphically. Section VII concludes the paper.
II. BIG DATA
Big data [1] is a term for data sets that are so large or complex that traditional data processing applications are inadequate.
Challenges include analysis, capture, search, sharing, storage, transfer, visualization, querying and information privacy. The
term often refers simply to the use of advanced methods to extract value from data, and seldom to a particular size of data
set. What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities
make big data a moving target. For some organizations, facing a few gigabytes of data for the first time may trigger a need
to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a
significant consideration.
III. DATABASE
A database [2] is an organized collection of data. The data is typically organized to model aspects of reality in a way that
supports processes requiring information. A database management system (DBMS) is a computer software application that
interacts with the user, other applications, and the database itself to capture and analyse data. A general-purpose DBMS is
designed to allow the definition, creation, querying, update, and administration of databases.
As far as verification is concerned, a database can be used when the user wishes to query a specific record, i.e. a specific
message. However, if some computation is required, or operations such as regular-expression matching, a database search engine
should be used. A concrete example that goes beyond the capabilities of a database is when the verification engineer would
like to see all messages from time point tp1 to time point tp2, where these time points are strings within the messages.
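The time-range request above amounts to two steps: extract a numeric time point from each message string, then select by range. The following is a minimal sketch in plain Python, not Lucene code. It assumes the time point appears as "@ <integer>" in each line, and the sample log lines are made up for illustration.

```python
import re
from bisect import bisect_left, bisect_right

# Assumed convention: the time point appears as "@ <integer>" inside each line.
TIME_RE = re.compile(r"@\s*(\d+)")

def index_by_time(lines):
    """Return (times, kept_lines) for lines that carry a time point, sorted by time."""
    pairs = []
    for line in lines:
        m = TIME_RE.search(line)
        if m:
            pairs.append((int(m.group(1)), line))
    pairs.sort(key=lambda p: p[0])  # stable sort keeps log order for equal times
    return [t for t, _ in pairs], [text for _, text in pairs]

def between(times, lines, tp1, tp2):
    """Return all lines whose time point t satisfies tp1 <= t <= tp2."""
    lo = bisect_left(times, tp1)
    hi = bisect_right(times, tp2)
    return lines[lo:hi]

# Hypothetical log lines, for illustration only.
log = [
    "UVM_INFO a.sv(10) @ 100: uvm_test_top.env.drv [DRV] start",
    "UVM_INFO a.sv(12) @ 250: uvm_test_top.env.drv [DRV] write",
    "UVM_ERROR b.sv(40) @ 400: uvm_test_top.env.sb [SB] mismatch",
    "UVM_INFO a.sv(15) @ 900: uvm_test_top.env.drv [DRV] done",
]
times, lines = index_by_time(log)
window = between(times, lines, 200, 400)
```

Once the time points are stored as numbers, a range lookup is two binary searches, which is the kind of preprocessing a search engine index does ahead of query time.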
IV. DATABASE SEARCH ENGINE
A database search engine [3] is a search engine that operates on material stored in a digital database. A search engine allows
the user to search for information using simple keywords. In this paper we use Apache Lucene [4] as our search engine. It
is a free and open-source information retrieval software library, originally written in Java by Doug Cutting. Lucene has been
ported to other programming languages including Delphi, Perl, C#, C++, Python, Ruby, and PHP. Lucene is suitable for any
application that requires full-text indexing and searching capability. At the core of Lucene's logical architecture is the idea of a
document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDFs,
HTML, Microsoft Word, and many other formats can be indexed as long as the textual information can be extracted. A
simulation log file is a structured text file, and as such it can be indexed. Once it is indexed, the Lucene API can be used to search
for all the "interesting" events that are needed for debugging a failing test.
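The idea of a document containing fields of text can be illustrated with a toy inverted index. The sketch below is plain Python and only mimics the concept. It is not Lucene's API, and the class name `TinyIndex`, the field names and the sample messages are all hypothetical.

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: maps (field, term) to the ids of documents containing it."""

    def __init__(self):
        self.docs = []                    # doc id -> dict of field name -> text
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, doc):
        """Store a document and index every whitespace-separated term per field."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, text in doc.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term):
        """Return the documents whose given field contains the term."""
        ids = sorted(self.postings.get((field, term.lower()), ()))
        return [self.docs[i] for i in ids]

# Hypothetical documents, one per log message.
index = TinyIndex()
index.add({"severity": "UVM_INFO", "message": "Write transfer started"})
index.add({"severity": "UVM_ERROR", "message": "Sent data packet mismatch"})
index.add({"severity": "UVM_ERROR", "message": "Timeout waiting for response"})

errors = index.search("severity", "UVM_ERROR")
```

Because lookups go through the postings table rather than scanning every document, the cost of a keyword query depends on the number of matches, not on the size of the log.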
V. UVM MESSAGE
The Universal Verification Methodology (UVM) is a standardized methodology for verifying integrated circuit designs. The
UVM class library brings much automation to the SystemVerilog language, such as sequences and data automation features
(packing, copy, compare), and, unlike the previous methodologies developed independently by the simulator vendors, is an
Accellera standard with support from multiple vendors. According to the 2014 Wilson Research Group Functional Verification
Study, presented in Figure 1, more than 70% of the industry had adopted UVM, and the forecast was that adoption
would continue to grow.
Fig. 1. Testbench Methodology Adoption Trends
A UVM-based simulation contains UVM messages that usually have the following format:
Verbosity — filename(line) — Timepoint — Emitter — message
The following is an example of a UVM message:
UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) @ 4498000:
uvm_test_top.env.sb [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3
where
1) UVM_ERROR is the verbosity (or severity)
2) /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) is the filename(line)
3) @ 4498000 is the time point
4) uvm_test_top.env.sb is the emitter of the message
5) [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3 is the message.
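The five-element structure listed above can be captured with a single regular expression. The following is a minimal sketch, assuming the layout "SEVERITY file(line) @ time: emitter message" and assuming the identifiers in the example contain underscores (as in UVM_ERROR and uvm_test_top). The names `UvmMessage`, `UVM_RE` and `parse_line` are illustrative and not part of any library.

```python
import re
from typing import NamedTuple, Optional

class UvmMessage(NamedTuple):
    severity: str
    filename: str
    line: int
    time: int
    emitter: str
    message: str

# Assumed layout: SEVERITY file(line) @ time: emitter message
UVM_RE = re.compile(
    r"^(?P<severity>UVM_\w+)\s+"
    r"(?P<filename>\S+)\((?P<line>\d+)\)\s+"
    r"@\s*(?P<time>\d+):\s+"
    r"(?P<emitter>\S+)\s+"
    r"(?P<message>.*)$"
)

def parse_line(text: str, pattern: "re.Pattern" = UVM_RE) -> Optional[UvmMessage]:
    """Split one log line into its fields, or return None if it does not match."""
    m = pattern.match(text.strip())
    if m is None:
        return None
    return UvmMessage(
        severity=m["severity"], filename=m["filename"], line=int(m["line"]),
        time=int(m["time"]), emitter=m["emitter"], message=m["message"],
    )

# The example message from the text, joined onto one line.
sample = (
    "UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/"
    "sflash_controller_env_sb.sv(1863) @ 4498000: uvm_test_top.env.sb "
    "[WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, "
    "but expected 0x532e4cb3"
)
record = parse_line(sample)
```

Passing the pattern as a parameter is one simple way to make the parser configurable: a project with a different message format supplies its own expression with the same group names.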
The format can be modified by the user, but the structure of the messages is usually kept. The first step towards using
Lucene is to parse the log file, so that every message that has this structure (or any user-defined structure) is broken
into the aforementioned five elements and stored as a record in the database. Creating a configurable parser in which the format
of the messages can be defined enables parsing any simulation log file. Once the entire log file is parsed and kept inside the
database, the user can use the efficient API of Lucene to extract information. A few examples: quickly retrieving all messages
of a specific verbosity, or of a specific verbosity within some time range; messages containing a specific string (e.g. env.sb); or even
data manipulation, such as all messages emitted from the APB UVC writing the value 0x1 to register sflash_reg.enable.
Being designed to handle huge numbers of records, Lucene returns these records in negligible time. The data can then be further
processed into a graphical representation, as we present in the following section, or be kept in its original form.
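The example queries above can be sketched as filters over parsed records. The code below is an in-memory stand-in written in plain Python and does not use Lucene. The `Record` type, the `query` function, the emitter paths and the message texts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class Record:
    severity: str
    time: int
    emitter: str
    message: str

def query(records: Iterable[Record], severity: Optional[str] = None,
          t_min: Optional[int] = None, t_max: Optional[int] = None,
          contains: Optional[str] = None) -> List[Record]:
    """Return the records that match every filter that is not None."""
    out = []
    for r in records:
        if severity is not None and r.severity != severity:
            continue
        if t_min is not None and r.time < t_min:
            continue
        if t_max is not None and r.time > t_max:
            continue
        if contains is not None and contains not in f"{r.emitter} {r.message}":
            continue
        out.append(r)
    return out

# Hypothetical records; emitter paths and message wording are invented.
records = [
    Record("UVM_INFO", 1000, "uvm_test_top.env.apb.master", "write 0x1 to sflash_reg.enable"),
    Record("UVM_INFO", 2000, "uvm_test_top.env.apb.master", "write 0x0 to sflash_reg.mode"),
    Record("UVM_ERROR", 3000, "uvm_test_top.env.sb", "data mismatch"),
    Record("UVM_ERROR", 9000, "uvm_test_top.env.sb", "late mismatch"),
]

errors_early = query(records, severity="UVM_ERROR", t_max=5000)
scoreboard = query(records, contains="env.sb")
enable_writes = [r for r in query(records, contains="apb.master")
                 if "0x1 to sflash_reg.enable" in r.message]
```

This linear scan is only meant to show what each query means. An indexed search engine answers the same questions without visiting every record.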
Another improvement is to perform the parsing and entering of records into Lucene during simulation. A simulation that
produces a long log file usually lasts from a few minutes to a few days, depending on the design size, the simulator, the compute
machine, etc. The process of parsing and entering the records into Lucene may take a non-negligible time if done at the end of
simulation. Performing the process while the simulation is ongoing makes the API accessible both during and
at the end of simulation.
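Ingesting records while the simulation is still running requires reading only what has been appended to the log since the last pass. The following is a minimal sketch of that step, assuming the simulator appends to a single text file. The function name `read_new_lines` and the sample contents are hypothetical, and the hand-off to the parser and the index is left out.

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for data appended since offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline; a trailing partial line waits for the next call.
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset
    complete = chunk[: end + 1]
    lines = complete.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(complete)

# Demonstration with a temporary file standing in for a growing simulation log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "sim.log")
    with open(log_path, "w") as f:
        f.write("UVM_INFO @ 10: first\nUVM_INFO @ 20: sec")  # second line is incomplete
    batch1, pos = read_new_lines(log_path, 0)
    with open(log_path, "a") as f:
        f.write("ond\nUVM_ERROR @ 30: third\n")
    batch2, pos = read_new_lines(log_path, pos)
```

Calling this function periodically and feeding each batch to the parser keeps the stored records close to the current simulation time, so little work remains when the simulation ends.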
VI. GRAPHICAL REPRESENTATIONS OF A LOG FILE
In this section we provide an example that illustrates how we graphically present the records returned from
Lucene in a way that is easy to trace and comprehend.
Fig. 2. Graphical Representation of a Log File
Figure 2 presents a real log file that was processed by the methods presented in Section V. The topmost graph in this
figure is the High Level representation of the entire simulation. It is a histogram of all the messages that exist in the log
file. We then focus on a reduced range (the white rectangle) in which the red line represents the existence of errors. Within
this reduced region three players were added. These players are the three graphs below the High Level representation. A player is
a query we asked Lucene to perform. The topmost player presents all messages within the range. The Scoreboard player presents
only messages containing the word sb, and the Addr player presents the value of the address in messages coming from the
apb_system_env.master in this region.
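The data behind such graphs can be sketched as a binned message count plus one filtered series per player. The code below is plain Python and only prepares the numbers; plotting is omitted. The record layout, the emitter paths and the "addr=" message format are assumptions made for illustration.

```python
from collections import Counter

def histogram(times, bin_width):
    """Count messages per time bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    return Counter(t // bin_width for t in times)

def player(records, predicate):
    """A 'player' here is a filtered series of (time, record) points."""
    return [(r["time"], r) for r in records if predicate(r)]

# Hypothetical records standing in for the results returned by the search engine.
records = [
    {"time": 120, "severity": "UVM_INFO", "emitter": "uvm_test_top.env.apb.master", "message": "addr=0x10"},
    {"time": 180, "severity": "UVM_INFO", "emitter": "uvm_test_top.env.sb", "message": "compare ok"},
    {"time": 340, "severity": "UVM_ERROR", "emitter": "uvm_test_top.env.sb", "message": "mismatch"},
    {"time": 360, "severity": "UVM_INFO", "emitter": "uvm_test_top.env.apb.master", "message": "addr=0x14"},
]

# High-level view: all messages, and errors only, per 100 time units.
all_bins = histogram([r["time"] for r in records], bin_width=100)
error_bins = histogram([r["time"] for r in records if r["severity"] == "UVM_ERROR"], 100)

# Two players: scoreboard messages, and the address value over time.
scoreboard = player(records, lambda r: ".sb" in r["emitter"])
addr_series = [(t, int(r["message"].split("=")[1], 16))
               for t, r in player(records, lambda r: r["message"].startswith("addr="))]
```

Each resulting series can be handed to any plotting tool. The error bins correspond to the red line, and each player becomes its own graph under the histogram.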
The transition from debugging a textual file to a graphical representation requires some adaptation, but once the verification
engineer gets familiar with the graphical images, problems are traced much faster. The engineer can quickly see what is wrong,
when the pattern changes, or when some unexpected event has occurred.
VII. CONCLUSION
The complexity and size of designs these days require new techniques, as the traditional ones impose very long debugging
times. In this work we show how harnessing tools that are used for processing big data can simplify and shorten the debugging of
failed tests. We hope that this work will open a new line of research on importing the very strong capabilities that exist in
software into the existing EDA tools.
REFERENCES
[1] Big data. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Big_data
[2] Database. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Database
[3] Database search engine. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Database_search_engine
[4] Apache Lucene. Available from https://lucene.apache.org

More Related Content

What's hot

Lecture 6 -_presentation_layer
Lecture 6 -_presentation_layerLecture 6 -_presentation_layer
Lecture 6 -_presentation_layerSerious_SamSoul
 
DSNs & X.400 assist in ensuring email reliability
DSNs & X.400 assist in ensuring email reliabilityDSNs & X.400 assist in ensuring email reliability
DSNs & X.400 assist in ensuring email reliabilityIOSR Journals
 
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & others
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & othersA MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & others
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & otherseraser Juan José Calderón
 
DISTIBUTED OPERATING SYSTEM
DISTIBUTED  OPERATING SYSTEM DISTIBUTED  OPERATING SYSTEM
DISTIBUTED OPERATING SYSTEM AjithaG9
 
Point-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPIPoint-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPIHanif Durad
 
Ccn unit-2- data link layer by prof.suresha v
Ccn unit-2- data link layer by prof.suresha vCcn unit-2- data link layer by prof.suresha v
Ccn unit-2- data link layer by prof.suresha vSURESHA V
 
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2csandit
 
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 201370-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013Roedwig Decesare
 
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJA
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJASOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJA
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJAvtunotesbysree
 
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...Editor IJCATR
 
Data communications and networking(DCN)
Data communications and networking(DCN)Data communications and networking(DCN)
Data communications and networking(DCN)hiteshchowdary5
 
OSI model (7 layer )
OSI model (7 layer ) OSI model (7 layer )
OSI model (7 layer ) dimuthu22
 
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...Jaipal Dhobale
 
New strategy to optimize the performance of spray and wait routing protocol
New strategy to optimize the performance of spray and wait routing protocolNew strategy to optimize the performance of spray and wait routing protocol
New strategy to optimize the performance of spray and wait routing protocolijwmn
 

What's hot (19)

Lecture 6 -_presentation_layer
Lecture 6 -_presentation_layerLecture 6 -_presentation_layer
Lecture 6 -_presentation_layer
 
I0935053
I0935053I0935053
I0935053
 
Cn u5
Cn u5Cn u5
Cn u5
 
DSNs & X.400 assist in ensuring email reliability
DSNs & X.400 assist in ensuring email reliabilityDSNs & X.400 assist in ensuring email reliability
DSNs & X.400 assist in ensuring email reliability
 
Data communication q and a
Data communication q and aData communication q and a
Data communication q and a
 
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & others
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & othersA MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & others
A MOBILE AGENT-BASED P2P E-LEARNING SYSTEM. Takao KAWAMURA & others
 
DISTIBUTED OPERATING SYSTEM
DISTIBUTED  OPERATING SYSTEM DISTIBUTED  OPERATING SYSTEM
DISTIBUTED OPERATING SYSTEM
 
Point-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPIPoint-to-Point Communicationsin MPI
Point-to-Point Communicationsin MPI
 
Ccn unit-2- data link layer by prof.suresha v
Ccn unit-2- data link layer by prof.suresha vCcn unit-2- data link layer by prof.suresha v
Ccn unit-2- data link layer by prof.suresha v
 
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2
Performance Evaluation of a Layered WSN Using AODV and MCF Protocols in NS-2
 
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 201370-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013
70-342 Exam-Advanced Solutions of Microsoft Exchange Server 2013
 
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJA
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJASOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJA
SOLUTION MANUAL OF COMMUNICATION NETWORKS BY ALBERTO LEON GARCIA & INDRA WIDJAJA
 
Chapter 6 pc
Chapter 6 pcChapter 6 pc
Chapter 6 pc
 
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...
Modeling and Performance Evaluation TAODV Routing Protocol Using Stochastic P...
 
Data communications and networking(DCN)
Data communications and networking(DCN)Data communications and networking(DCN)
Data communications and networking(DCN)
 
OSI model (7 layer )
OSI model (7 layer ) OSI model (7 layer )
OSI model (7 layer )
 
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...
Wired and Wireless Computer Network Performance Evaluation Using OMNeT++ Simu...
 
Ipc
IpcIpc
Ipc
 
New strategy to optimize the performance of spray and wait routing protocol
New strategy to optimize the performance of spray and wait routing protocolNew strategy to optimize the performance of spray and wait routing protocol
New strategy to optimize the performance of spray and wait routing protocol
 

Viewers also liked

Ocde corruption-declaration-ministerielle-2016
Ocde corruption-declaration-ministerielle-2016Ocde corruption-declaration-ministerielle-2016
Ocde corruption-declaration-ministerielle-2016Lettredesjuristesdaffaires
 
Android & windows
Android & windowsAndroid & windows
Android & windowsNijitha NM
 
Android & windows
Android & windowsAndroid & windows
Android & windowsNijitha NM
 
IAS. Istoria schimbarilor pozitive"
IAS. Istoria schimbarilor pozitive"IAS. Istoria schimbarilor pozitive"
IAS. Istoria schimbarilor pozitive"Tatiana Castraşan
 
IAS. The story of positive changes
IAS. The story of positive changesIAS. The story of positive changes
IAS. The story of positive changesTatiana Castraşan
 
Proiectele Uniunii Europene în Moldova
Proiectele Uniunii Europene în MoldovaProiectele Uniunii Europene în Moldova
Proiectele Uniunii Europene în MoldovaTatiana Castraşan
 
Albumul "Pentru o copilărie frumoasă"
Albumul "Pentru o copilărie frumoasă"Albumul "Pentru o copilărie frumoasă"
Albumul "Pentru o copilărie frumoasă"Tatiana Castraşan
 
Загальна будова комп'ютера
Загальна будова комп'ютераЗагальна будова комп'ютера
Загальна будова комп'ютераNatasha Scherbina
 
Pentru o copilărie frumoasă, ediţia II
Pentru o copilărie frumoasă, ediţia IIPentru o copilărie frumoasă, ediţia II
Pentru o copilărie frumoasă, ediţia IITatiana Castraşan
 

Viewers also liked (12)

Ocde corruption-declaration-ministerielle-2016
Ocde corruption-declaration-ministerielle-2016Ocde corruption-declaration-ministerielle-2016
Ocde corruption-declaration-ministerielle-2016
 
Android & windows
Android & windowsAndroid & windows
Android & windows
 
Android & windows
Android & windowsAndroid & windows
Android & windows
 
Observatoire des delais de paiement
Observatoire des delais de paiementObservatoire des delais de paiement
Observatoire des delais de paiement
 
Rapport Badinter
Rapport BadinterRapport Badinter
Rapport Badinter
 
IAS. Istoria schimbarilor pozitive"
IAS. Istoria schimbarilor pozitive"IAS. Istoria schimbarilor pozitive"
IAS. Istoria schimbarilor pozitive"
 
IAS. The story of positive changes
IAS. The story of positive changesIAS. The story of positive changes
IAS. The story of positive changes
 
Proiectele Uniunii Europene în Moldova
Proiectele Uniunii Europene în MoldovaProiectele Uniunii Europene în Moldova
Proiectele Uniunii Europene în Moldova
 
Multimedia
MultimediaMultimedia
Multimedia
 
Albumul "Pentru o copilărie frumoasă"
Albumul "Pentru o copilărie frumoasă"Albumul "Pentru o copilărie frumoasă"
Albumul "Pentru o copilărie frumoasă"
 
Загальна будова комп'ютера
Загальна будова комп'ютераЗагальна будова комп'ютера
Загальна будова комп'ютера
 
Pentru o copilărie frumoasă, ediţia II
Pentru o copilărie frumoasă, ediţia IIPentru o copilărie frumoasă, ediţia II
Pentru o copilărie frumoasă, ediţia II
 

Similar to BigDataDebugging

Database project edi
Database project ediDatabase project edi
Database project ediRey Jefferson
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software SystemsVincenzo De Florio
 
IRJET - Health Medicare Data using Tweets in Twitter
IRJET - Health Medicare Data using Tweets in TwitterIRJET - Health Medicare Data using Tweets in Twitter
IRJET - Health Medicare Data using Tweets in TwitterIRJET Journal
 
Differences Between Architectures
Differences Between ArchitecturesDifferences Between Architectures
Differences Between Architecturesprasadsmn
 
Linux Assignment 3
Linux Assignment 3Linux Assignment 3
Linux Assignment 3Diane Allen
 
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET Journal
 
Simple Obfuscation Tool for Software Protection
Simple Obfuscation Tool for Software ProtectionSimple Obfuscation Tool for Software Protection
Simple Obfuscation Tool for Software ProtectionQUESTJOURNAL
 
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...Nikhil Jain
 
TOLL MANAGEMENT SYSTEM
TOLL MANAGEMENT SYSTEMTOLL MANAGEMENT SYSTEM
TOLL MANAGEMENT SYSTEMvishnuRajan20
 
Toll management system (1) (1)
Toll management system (1) (1)Toll management system (1) (1)
Toll management system (1) (1)vishnuRajan20
 
Association Rule Mining Scheme for Software Failure Analysis
Association Rule Mining Scheme for Software Failure AnalysisAssociation Rule Mining Scheme for Software Failure Analysis
Association Rule Mining Scheme for Software Failure AnalysisEditor IJMTER
 

Similar to BigDataDebugging (20)

IT6701-Information management question bank
IT6701-Information management question bankIT6701-Information management question bank
IT6701-Information management question bank
 
Database project
Database projectDatabase project
Database project
 
Ems
EmsEms
Ems
 
Database project edi
Database project ediDatabase project edi
Database project edi
 
System Structure for Dependable Software Systems
System Structure for Dependable Software SystemsSystem Structure for Dependable Software Systems
System Structure for Dependable Software Systems
 
Cloud Spanner
Cloud SpannerCloud Spanner
Cloud Spanner
 
Distributed Systems in Data Engineering
Distributed Systems in Data EngineeringDistributed Systems in Data Engineering
Distributed Systems in Data Engineering
 
IRJET - Health Medicare Data using Tweets in Twitter
IRJET - Health Medicare Data using Tweets in TwitterIRJET - Health Medicare Data using Tweets in Twitter
IRJET - Health Medicare Data using Tweets in Twitter
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
Differences Between Architectures
Differences Between ArchitecturesDifferences Between Architectures
Differences Between Architectures
 
API Integration
API IntegrationAPI Integration
API Integration
 
Linux Assignment 3
Linux Assignment 3Linux Assignment 3
Linux Assignment 3
 
publishable paper
publishable paperpublishable paper
publishable paper
 
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
 
Simple Obfuscation Tool for Software Protection
Simple Obfuscation Tool for Software ProtectionSimple Obfuscation Tool for Software Protection
Simple Obfuscation Tool for Software Protection
 
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...
Second phase report on "ANALYZING THE EFFECTIVENESS OF THE ADVANCED ENCRYPTIO...
 
TOLL MANAGEMENT SYSTEM
TOLL MANAGEMENT SYSTEMTOLL MANAGEMENT SYSTEM
TOLL MANAGEMENT SYSTEM
 
Toll management system (1) (1)
Toll management system (1) (1)Toll management system (1) (1)
Toll management system (1) (1)
 
Association Rule Mining Scheme for Software Failure Analysis
Association Rule Mining Scheme for Software Failure AnalysisAssociation Rule Mining Scheme for Software Failure Analysis
Association Rule Mining Scheme for Software Failure Analysis
 
Observability
ObservabilityObservability
Observability
 

BigDataDebugging

  • 1. 1 Harnessing Big Data to Simplify Debugging Asi Lifshitz Vtool Ltd. Email: asi@thevtool.com Abstract Debugging failing tests is a complex and time-consuming task. The common work flow is iterating between the simulation log file and the simulation waveforms. The simulation log file can often be considered as Big Data, sometimes reaching tens of Gigabytes. It is a textual file written top-down, listing lots of messages coming from different sources. It is extremely hard to navigate through the file, while seeking for the necessary information, without being overwhelmed or miss important information. In this paper we show how Big Data tools can simplify debugging of failing tests and shorten the verification schedule. Index Terms API, EDA, RTL, SystemVerilog, UVM. I. INTRODUCTION IT is well known today that the process of verification is one of the major bottlenecks towards tape-out. It is also known that within the process of verification, debug is the most time-consuming task. The designs to be verified have significantly grown in the past 10 years, which makes debugging even more complex. The common way of debugging a failing test is by iterating between the waveforms and the simulation log file. As the complexity increases, more messages are printed out in the log file, and it is therefore not very rare to face log files that reach several Gigabytes. This paper paves a new path for using Big Data tools to quickly and efficiently extract data from huge log files. Furthermore, as extracting and manipulating the data gets simpler, the user can then present the data in a graphical way (versus textual), which is much easier for analysis. In this paper we demonstrate how a log file of a UVM-based project can be easily entered to Lucene database search engine. Once the file is stored, Lucene can provide the user all the information needed for debugging the failing test. 
The contribution of this paper is in providing better ways of analysing long log files, which are the outcome of simulating large or complex designs. Shortening the debug time will shorten the project schedule and increase the engineer productivity. For the sake of brevity we will analyse only UVM messages, but the same techniques can be applied to log files that contain data from several sources, such as C-code, behavioural models and the RTL design. The rest of the paper is organized as follows: We provide a definition for Big Data in Section II. In Section III we define what a software database is, and why it is not suitable for debugging simulation log files. A Database Search Engine is described in Section IV. Section V describes the structure of a UVM message and how it is entered to Lucene. In Section VI we show how the records received from Lucene can be processed and graphically presented. Section VII concludes the paper. II. BIG DATA Big data [1] is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of advanced methods to extract value from data, and seldom to a particular size of data set. What is considered ”big data” varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. For some organizations, facing few gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration. III. DATABASE A database [2] is an organized collection of data. The data is typically organized to model aspects of reality in a way that supports processes requiring information. 
A database management system (DBMS) is a computer software application that interacts with the user, other applications, and the database itself to capture and analyse data. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases. As far as verification is concerned, a database can be used in case the user wishes to query a specific record, i.e. a specific message. However, if some computation is required, or actions like regular expressions and alike, a Database search engine is to be used. A concrete example which goes beyond the capabilities of a database, is when the verification engineer would like to see all messages from time point tp1 to time point tp2, where these time points are strings within messages.
IV. DATABASE SEARCH ENGINE
A database search engine [3] is a search engine that operates on material stored in a digital database. A search engine allows the user to search for information using simple keywords. In this paper we use Apache Lucene [4] as our search engine. It is a free and open-source information retrieval software library, originally written in Java by Doug Cutting. Lucene has been ported to other programming languages including Delphi, Perl, C#, C++, Python, Ruby, and PHP. Lucene is suitable for any application that requires full-text indexing and searching capability.
At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDF, HTML, Microsoft Word, and many other formats can be indexed as long as the textual information can be extracted. A simulation log file is a structured textual file, and as such it can be indexed. Once it is indexed, the Lucene API can be used to search for all the "interesting" events that are needed for debugging a failing test.
V. UVM MESSAGE
The Universal Verification Methodology (UVM) is a standardized methodology for verifying integrated circuit designs. The UVM class library brings much automation to the SystemVerilog language, such as sequences and data automation features (packing, copy, compare), and, unlike the previous methodologies developed independently by the simulator vendors, it is an Accellera standard with support from multiple vendors. According to the 2014 Wilson Research Group Functional Verification Study, presented in Figure 1, more than 70% of the industry has adopted UVM, and the forecast was that the numbers would only grow with time.
Fig. 1. Testbench Methodology Adoption Trends
A UVM-based simulation contains UVM messages that usually have the following format:
Verbosity - filename(line) - Timepoint - Emitter - message
The following is an example of a UVM message:
UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) @ 4498000: uvm_test_top.env.sb [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3
where
1) UVM_ERROR is the verbosity (or severity)
2) /project/sflash/verification/SFLASH_controller_ENV/src/sflash_controller_env_sb.sv(1863) is the filename(line)
3) @ 4498000 is the time point
4) uvm_test_top.env.sb is the emitter of the message
5) [WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3 is the message.
The format can be modified by the user, but the structure of the messages is usually kept. The first step towards using Lucene is to parse the log file, so that every message that has this structure (or any user-defined structure) is broken into the aforementioned 5 elements and stored as a record in the database. Creating a configurable parser in which the format of the messages can be defined enables parsing any simulation log file.
Once the entire log file is parsed and stored in the database, the user can use the efficient API of Lucene to extract information. A few examples are quickly retrieving all messages of a specific verbosity, messages of a specific verbosity within some time range, messages containing a specific string (e.g. env.sb), or even data manipulation such as all messages emitted from the APB UVC writing the value 0x1 to register sflash_reg.enable. Being designed to handle very large numbers of records, Lucene returns these records in negligible time. The data can then be further processed into a graphical representation, as we present in the following section, or be kept in its original form.
Another improvement is to perform the parsing and the entering of records into Lucene during simulation. A simulation that produces a long log file usually lasts from a few minutes to a few days, depending on the design size, the simulator, the computation machine, etc. The process of parsing and entering the records into Lucene may take a non-negligible time if done at the end of simulation. Performing the process while the simulation is ongoing guarantees that the API is fully accessible both during and at the end of simulation.
VI. GRAPHICAL REPRESENTATIONS OF A LOG FILE
In this section we provide an example that illustrates how we graphically present the records that are returned from Lucene in a way that is easy to trace and easy to comprehend.
Fig. 2. Graphical Representation of a Log File
Figure 2 presents a real log file that was processed by the methods presented in Section V. The topmost graph in this figure is the High Level representation of the entire simulation. It is a histogram of all the messages that exist in the log file. We then focus on a reduced range (the white rectangle) in which the red line represents the existence of errors. Within this reduced region 3 players were added. These players are the 3 graphs below the High Level representation. A player is a query we asked Lucene to perform. The topmost player presents all messages within the range. The Scoreboard player presents only messages containing the word sb, and the Addr player presents the value of the address in messages coming from apb_system_env.master in this region.
The transition from debugging a textual file to a graphical representation requires some adaptation, but once the verification engineer becomes familiar with the graphical images, problems are traced much faster. The engineer can quickly see what is wrong, when the pattern changes, or when some unexpected event has occurred.
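The parsing step of Section V and the queries behind Figure 2 can be sketched as follows. This is a plain-Python stand-in under stated assumptions, not the Lucene-based implementation used in this work: the regular expression assumes the default message layout shown in Section V, with underscores in identifiers such as UVM_ERROR and uvm_test_top. The names `UvmMessage`, `parse_line`, `histogram`, and `player_query`, the bin width, and the APB sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Optional


class UvmMessage(NamedTuple):
    """The 5 elements of a UVM message (field names are illustrative)."""
    severity: str
    location: str  # filename(line)
    time: int
    emitter: str
    text: str


# Assumed layout: SEVERITY filename(line) @ time: emitter message
UVM_LINE = re.compile(
    r"^(?P<severity>UVM_(?:INFO|WARNING|ERROR|FATAL))\s+"
    r"(?P<location>\S+\(\d+\))\s+"
    r"@\s*(?P<time>\d+):\s+"
    r"(?P<emitter>\S+)\s+"
    r"(?P<text>.*)$"
)


def parse_line(line: str) -> Optional[UvmMessage]:
    """Break one log line into its 5 elements, or return None if it does not match."""
    m = UVM_LINE.match(line.strip())
    if m is None:
        return None
    return UvmMessage(m.group("severity"), m.group("location"),
                      int(m.group("time")), m.group("emitter"), m.group("text"))


def histogram(messages: Iterable[UvmMessage], bin_width: int) -> Dict[int, int]:
    """Count messages per time bin; keys are bin start times in ascending order."""
    counts = Counter((msg.time // bin_width) * bin_width for msg in messages)
    return dict(sorted(counts.items()))


def player_query(messages: Iterable[UvmMessage], t_start: int, t_end: int,
                 emitter_contains: str = "") -> List[UvmMessage]:
    """Messages within [t_start, t_end] whose emitter contains the given string."""
    return [msg for msg in messages
            if t_start <= msg.time <= t_end and emitter_contains in msg.emitter]


# The second line is the example from Section V; the others are hypothetical.
SAMPLE_LINES = [
    "UVM_INFO /project/tb/apb_master.sv(120) @ 4100000: "
    "uvm_test_top.env.apb_system_env.master [APB_WR] addr=0x10 data=0x1",
    "UVM_ERROR /project/sflash/verification/SFLASH_controller_ENV/src/"
    "sflash_controller_env_sb.sv(1863) @ 4498000: uvm_test_top.env.sb "
    "[WRITE_MODE_SPI_DATA_ERR] Sent data packet contains 0x532e4000, but expected 0x532e4cb3",
    "UVM_INFO /project/tb/apb_master.sv(120) @ 5200000: "
    "uvm_test_top.env.apb_system_env.master [APB_WR] addr=0x14 data=0x0",
    "Simulation banner line that is not a UVM message",
]

if __name__ == "__main__":
    parsed = [msg for msg in map(parse_line, SAMPLE_LINES) if msg is not None]
    print(histogram(parsed, bin_width=1_000_000))
    for msg in player_query(parsed, 4_000_000, 5_000_000, emitter_contains="sb"):
        print(msg.severity, msg.time, msg.text)
```

In this sketch the histogram corresponds to the High Level representation, and each call to `player_query` plays the role of one player. With a search engine the same filters would be expressed as queries against indexed fields rather than as scans over a list in memory.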
VII. CONCLUSION
The complexity and size of designs these days require new techniques, as the traditional ones impose very long debugging times. In this work we show how harnessing tools that are used for processing big data can simplify and shorten the debugging of failed tests. We hope that this work will pave a new way for research on importing the very strong capabilities that exist in software into the existing EDA tools.
REFERENCES
[1] Big data. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Big_data
[2] Database. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Database
[3] Database search engine. 09-March-2016. In Wikipedia: The Free Encyclopedia. Available from https://en.wikipedia.org/wiki/Database_search_engine
[4] Apache Lucene. Available from https://lucene.apache.org