UML Generator (NCC18)
Imran Sarwar Bajwa, M. Abbas Choudhary [2006], "Natural Language Processing based Automated System for UML Diagrams Generation", in Saudi 18th National Conference on Computer Application, 2006, (18th NCCA) Riyadh, Kingdom of Saudi Arabia pp:171-176

Transcript

  • 1. Natural language processing based automated system for UML diagrams generation
Imran Sarwar Bajwa, M. Abbas Choudhary
Computer and Emerging Sciences, Balochistan University of Information Technology and Management Sciences, Quetta, Pakistan

Keywords
Natural language processing, Knowledge Engineering, Automatic diagrams generation, Text Understanding, UML Diagrams, Information extraction, UML design.

Abstract
This paper presents a natural language processing based automated system that generates UML diagrams after analyzing business details given as text. A new model is presented for analyzing natural language and extracting the relevant and required information from the storyline supplied by the user. The user writes the requirements in simple English in a few paragraphs, and the designed system analyzes the given script. After compound analysis and extraction of the associated information, the system draws various UML diagrams such as activity diagrams, sequence diagrams and class diagrams. Conventional CASE tools require a lot of extra time and effort from the system analyst during the process of creating, arranging, labeling and finishing the UML diagrams. The designed system provides a quick and reliable way to generate UML diagrams and saves the time and budget of both the user and the system analyst.

Introduction
The look and style of software engineering have changed completely in recent times. These days every step of software engineering follows the rules of object-oriented design patterns. All phases of software engineering are moving away from earlier conventions, and new paradigms are becoming more popular. The same is true of the software analysis process, which uses the Unified Modeling Language to map and model user requirements. Analysis is the key process in building modern information system applications and the basis for the design and development of robust software applications.

There are various object-oriented modeling languages and tools. The Unified Modeling Language (UML) is one of the best-known languages for the object-oriented analysis and design of software applications. UML is a standard language used to identify, visualize, develop and document the components of software systems. Additionally, it is used for modeling and mapping business logic and other non-software systems. Large and complex systems can be modeled with UML, which is an important part of developing object-oriented software and of the software development process. Like other conventional methodologies, UML uses graphical notations to represent and depict the design and flow of software projects.

At present, few software tools other than Rational Rose, Smart Draw, etc. provide services for drawing UML diagrams efficiently. These are reasonably good tools, but they have many disadvantages. Conventionally, the system analyst has to do a lot of work deducing the business logic and understanding the user requirements before drawing the UML diagrams with orthodox CASE tools. Hence a great deal of time is wasted, because the available CASE tools are passive and offer no help with this analysis. Today everybody needs quick and reliable service, so there was a need for some sort of intelligent software that generates UML based documentation and saves the time and budget of both the user and the system analyst.

Description of Problem
A few years ago, data flow diagrams (DFDs) were used to symbolize the flow of data and represent the user's requirements. Today the Unified Modeling Language is used to model and map user requirements; it is a more comprehensive and authentic way of representation and is beneficial for the later stages of software development. The problem specifically addressed in this research relates primarily to the analysis and design phase of the software development process. The software currently on the market that provides this facility consists of paint-like tools such as Visual UML, GD Pro, Smart Draw, Rational Rose, etc. All of them are passive drawing tools, and using their heavily overloaded interfaces is a vexing problem.

Generating UML diagrams with these software engineering tools is a difficult, time-consuming and lengthy process. Therefore, there was a need for a way in which any person involved in software development can get the required output with maximum accuracy in minimum time.

18th National Computer Conference 2006 © Saudi Computer Society
  • 2. Proposed Solution
Object-oriented modeling with less time and effort is a significant requirement. To resolve such issues and provide a robust solution, a framework is required that can facilitate and assist both users and software engineers. The functionality of the conducted research is domain specific, but it can easily be enhanced in the future according to requirements. The current system can map user requirements after reading them in plain text and draw a set of UML diagrams such as the Class Diagram, Activity Diagram, Sequence Diagram, Use Case Diagram and Component Diagram. An Integrated Development Environment is also provided for user interaction and efficient input and output.

Object-Oriented Analysis and Design
Analysis and design of an information system are concerned with understanding and planning the framework that accomplishes the actual job. Typically, design is concerned with managing and controlling the complexity of a domain. A robust design method also helps to split big tasks into manageable parts (Condamines, 2001). In software engineering, design methods provide various notations, usually graphical ones. These notations make it possible to store and communicate design decisions. Object-oriented design has superseded typical analysis and design techniques such as structured design and data-driven design (Androutsopoulos, 1995). Compared to older design paradigms, object-oriented design models every active entity in the problem domain using the concept of objects.

Objects have:
  • State (shape and condition)
  • Behaviour (what they perform)

Object-oriented languages use variables to manifest the state of an object and methods or procedures to implement the behaviour of an object. For example, a ball could be an object. It has different state parameters such as colour, size, diameter, shape, type, etc. This object can also have behaviour such as throw, roll, catch, hit, etc. The major task in the analysis and design phase is to identify the valid objects and specify their states and behaviours. In conventional methods, the system analyst performs this tough job and then maps this information into UML using some graphical tool such as Visio or Rational Rose.

In the context of this research, objects are automatically identified from a problem domain. The user provides the input text in English related to the business domain. After the lexical analysis of the text, syntax analysis is performed at word level to recognize the word category (Androutsopoulos, 1995). First of all, the available lexicons are categorized into nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis must be able to isolate subjects, verbs, objects, adverbs, adjectives and various other complements. It is a somewhat complex, multipart procedure. Consider the sentence:

"Zia is playing with the red ball."

For this example, the output is as follows.

Lexicons    Phase-I         Phase-II
Zia         Noun            Object
is          Helping-Verb    -------
playing     Verb            Method
with        Preposition     -------
the         Article         -------
red         Adjective       Attribute
ball        Noun            Object

This is the final output of the lexical assessment phase: all nouns are marked as objects, verbs are marked as methods, and all adjectives are marked as states of the particular object. In the above example, 'Zia' and 'ball' are objects, 'playing' is the method of the object Zia, and 'red' is a state of the object ball.

Natural Language Processing
The understanding and multi-aspect processing of natural languages, which are also termed "speech languages", is one of the topics of greatest interest in the field of artificial intelligence (Strzalowski, 1995). Natural languages are irregular and asymmetrical. Traditionally, natural languages are based on informal grammars. Geographical, psychological and sociological factors influence the behaviour of natural languages (Losee, 1996). There
  • 3. is an undefined set of words, and these words change and vary from area to area and from time to time. Due to these variations and inconsistencies, natural languages have different flavours; English, for example, has more than half a dozen renowned flavours all over the world. These flavours have different accents, vocabularies and phonological aspects. These discrepancies and inconsistencies make natural languages difficult to process compared to formal languages (Krovetz, 1992).

In the process of analyzing and understanding natural languages, researchers usually face various problems. The problems connected to the greater complexity of natural language include verb conjugation, inflexion, lexical amplitude, ambiguity, etc. Of these, the problem that causes the most difficulty is ambiguity. Ambiguity can be resolved at the syntactic and semantic levels by using a sound and robust rule-based system.

Used Methodology
Conventional natural language processing systems are rule based. Agents are another way to develop speech language based systems (Krovetz, 1992). In this research, a rule-based algorithm has been designed and used that can read, understand and extract the desired information. First, the basic elements of the language grammar, such as verbs, nouns, adjectives, etc., are extracted (Drouin, 2004); further processing is then performed on the basis of this extracted information. In linguistic terms, verbs often specify actions, and noun phrases specify the objects that participate in the action (Zelle, 1993). Each noun phrase's role then specifies how the object participates in the action.
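
The first step described above, extracting grammatical categories and using them for further processing, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the tiny hand-written table `LEXICON`, the function name `categorize` and the mapping table `ROLE_OF_CATEGORY` are assumptions made for this example. Only the mapping itself (nouns to objects, verbs to methods, adjectives to attributes) follows the paper's description.

```python
import re

# Hypothetical hand-written lexicon; a real system would rely on grammar
# rules or a full dictionary rather than this toy table.
LEXICON = {
    "zia": "Noun",
    "ball": "Noun",
    "is": "Helping-Verb",
    "playing": "Verb",
    "with": "Preposition",
    "the": "Article",
    "red": "Adjective",
}

# Second-phase mapping as described in the paper: nouns become objects,
# verbs become methods, adjectives become attributes (states).
ROLE_OF_CATEGORY = {"Noun": "Object", "Verb": "Method", "Adjective": "Attribute"}


def categorize(sentence):
    """Return (token, category, role) triples for each word of the sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    rows = []
    for token in tokens:
        category = LEXICON.get(token.lower(), "Unknown")
        role = ROLE_OF_CATEGORY.get(category)  # None when no UML role applies
        rows.append((token, category, role))
    return rows


if __name__ == "__main__":
    for token, category, role in categorize("Zia is playing with the red ball."):
        print(f"{token:<8} {category:<13} {role or '-------'}")
```

Words missing from the toy lexicon are labelled "Unknown" rather than guessed, which keeps the sketch honest about how little it knows; a rule-based system of the kind the paper describes would need far broader coverage.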

The roles of noun phrases can be seen in the following sentence, in which Ali is the agent:

"Ali is writing a letter with a pen."

A procedure that understands such a sentence must discover that Ali is the agent because he performs the action of writing, that the letter is the thematic object because it is the object that is written, and that the pen is an instrument because it is the tool with which the writing is done (Gómez-Pérez, 2005). Thus, complete sentence analysis finds information about the agent, co-agent, thematic object, beneficiary, etc. Identifying this information helps in understanding the meaning of the input sentence, as described below.

Agent: The agent causes the action to occur, as in "Ahmed hit the ball," where Ahmed is the agent who performs the task. The agent may also appear in a passive sentence, as in "The ball was hit by Ahmed."
Co-agent: If the agent works with a partner, that partner is called the co-agent. Both of them carry out the action together, as in "Ahmed played tennis with Ali."
Beneficiary: The beneficiary is the person for whom an action has been performed: "Ahmed brought the balls for Ali." In this sentence Ali is the beneficiary.
Thematic object: The thematic object is the object the sentence is really all about, typically the object undergoing a change. Often the thematic object is the same as the syntactic direct object, as in "Ahmed hit the ball." Here the ball is the thematic object.
Conveyance: The conveyance is something in which or on which the agent travels: "Ahmed goes by train."
Trajectory: Motion from source to destination takes place over a trajectory. In contrast to the other role possibilities, several prepositions can serve to introduce trajectory noun phrases: "Ahmed and Ali went to London from Islamabad."
Location: The location is where an action occurs. Several prepositions can introduce the location, usually a noun phrase, as in "Ali studied in the library, at a desk, by the wall, under a picture, near the door."
Time: Time specifies when an action occurs. Prepositions such as at, before and after introduce noun phrases that depict time, as in "Ahmed and Ali left before evening."
Duration: Duration specifies how long an action takes. Prepositions such as since and for indicate duration: "Ahmed and Ali walked for an hour."
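
The role descriptions above suggest a simple rule table keyed on prepositions. The following Python sketch is an illustration under stated assumptions rather than the paper's algorithm: the table `PREPOSITION_ROLES`, the function `assign_roles`, and the simplifications that the sentence is active and that the caller supplies the main verb are all hypothetical. Several prepositions are ambiguous ("with" may mark a co-agent or an instrument, "for" a beneficiary or a duration), so the sketch returns every candidate role instead of choosing one.

```python
import re

# Hypothetical rule table: candidate thematic roles per preposition,
# loosely following the role descriptions in the text.
PREPOSITION_ROLES = {
    "with": ["co-agent", "instrument"],
    "for": ["beneficiary", "duration"],
    "by": ["conveyance", "location"],
    "to": ["trajectory"],
    "from": ["trajectory"],
    "in": ["location"],
    "at": ["location", "time"],
    "near": ["location"],
    "before": ["time"],
    "after": ["time"],
    "since": ["duration"],
}
HELPING_VERBS = {"is", "are", "was", "were"}
ARTICLES = {"a", "an", "the"}


def assign_roles(sentence, verb):
    """Split an active sentence around `verb` and label its noun phrases.

    Returns a list of (phrase, candidate_roles) pairs. Assumes the verb
    occurs exactly as given and that the subject precedes it.
    """
    tokens = [t for t in re.findall(r"[A-Za-z]+", sentence)
              if t.lower() not in ARTICLES]
    idx = [t.lower() for t in tokens].index(verb.lower())
    subject = [t for t in tokens[:idx] if t.lower() not in HELPING_VERBS]
    roles = [(" ".join(subject), ["agent"])]

    # The phrase directly after the verb is treated as the thematic object;
    # each later phrase takes the candidate roles of its preposition.
    label, phrase = ["thematic object"], []
    for token in tokens[idx + 1:]:
        if token.lower() in PREPOSITION_ROLES:
            if phrase:
                roles.append((" ".join(phrase), label))
            label, phrase = list(PREPOSITION_ROLES[token.lower()]), []
        else:
            phrase.append(token)
    if phrase:
        roles.append((" ".join(phrase), label))
    return roles


if __name__ == "__main__":
    for phrase, candidates in assign_roles(
            "Ali is writing a letter with a pen.", "writing"):
        print(phrase, "->", ", ".join(candidates))
```

Choosing between the candidates (for instance, deciding that "pen" is an instrument and not a co-agent) needs semantic knowledge about the nouns involved, which is the part a rule-based text understanding stage would have to supply.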
  • 4. Architecture of Designed System
The designed UMLG system can draw UML diagrams after reading the text scenario provided by the user. The system draws diagrams in five modules: Text input acquisition, Syntactic Analysis, Text understanding, Knowledge extraction, and finally Generation of UML diagrams, as shown in Figure 1.

[Figure 1. Architecture of the Natural Language Processing based Automated System for UML Diagrams Generation. Stages, from input to output: Text Input Acquisition from user; Lexical Analysis (token extraction from the given text); Syntax Analysis (extracting nouns, verbs, adjectives, etc.); Semantic Analysis (understanding meanings); Knowledge Extraction (identification of objects, methods, attributes); Diagram Generation (class, activity, etc. diagrams).]

i. Text input acquisition
This module acquires the input text scenario. The user provides the business scenario in the form of paragraphs of text. The module reads the input text as characters and generates the words or lexicons (Tang, 2001) by concatenating the input characters. This module is the implementation of the lexical phase. Language-specific lexicons, tokens or symbols are generated in this module.

ii. Syntactic Analysis
This is the second module of the designed framework, and it reads the input from module one in the form of words. These words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc. (Fagan, 1989) on the basis of the defined rules for categorization. The rules are defined on the basis of standard English grammatical rules, also called parts-of-speech conventions.

iii. Text Understanding
This module reads the input from module 1 in the form of words. The meanings of the given text are inferred in this module using semantic rules (Malaisé, 2005). The words are categorized into various classes such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc.

iv. Knowledge extraction
The required data attributes are extracted in this module (Rijsbergen, 1977) according to the given guidelines. This module extracts the different objects and classes and their respective attributes on the basis of the input provided by the preceding module. Nouns are symbolized as classes and objects, and their associated adjectives are termed attributes.

v. UML diagram generation
This is the last module, which uses UML symbols and draws various UML diagrams by combining the available symbols according to the information extracted by the previous module. As separate
  • 5. scenarios will be provided for the various diagrams such as class, sequence and activity diagrams, separate functions are implemented for the respective diagrams.

Accuracy Evaluation
To test the accuracy of the diagrams generated by the designed system, four parameters were chosen: objects, attributes, sequence and labeling. One generated diagram from each category was checked. The maximum score for each parameter was set at 25. Points were deducted for wrong nominations and extractions. A matrix of the results for the generated diagrams is shown below.

Table 1. Testing results of different UML diagrams

Diagram type    Objects    Attributes    Sequence    Labeling    Total
Class           22         24            20          19          85%
Activity        23         21            16          20          80%
Sequence        21         24            21          22          88%

A matrix representing the UML diagram accuracy test (%) for class, activity and sequence diagrams has been constructed. The overall accuracy for all types of UML diagrams is determined by adding the total accuracy of all categories and calculating the average, which is 83% in this case.

[Figure 2. Graphical representation of the accuracy of generated diagrams: bar chart of the Class, Activity and Sequence diagram scores (vertical axis 0 to 30) for the Objects, Attributes, Sequence and Labeling parameters.]

The graph above shows the accuracy of the various diagram types in terms of the objects, attributes, sequence and labeling parameters.

Conclusion
This research concerns the dynamic generation of UML diagrams by reading and analyzing a scenario in English provided by the user.
The designed system can find the classes and objects and their attributes and operations using an artificial intelligence technique, natural language processing. UML diagrams such as activity, sequence, component and use case diagrams would then be drawn. The accuracy of the software is expected to be up to about 80% with the involvement of the software engineer, provided that he has followed the prerequisites of the software when preparing the input scenario. The given scenario should be complete and written in simple and correct English. Within the scope of our project, the software performs a complete analysis of the scenario to find the classes, their attributes and operations, and it also draws the corresponding diagrams. An elegant graphical user interface is also provided for entering the input scenario in a proper way and generating UML diagrams.

Future Work
The designed system for generating UML diagrams was started with the aim of building software that can read user requirements given as English text and draw selected types of UML diagrams: class, activity, sequence, use case, component and deployment diagrams. The last three of these (use case, component and deployment diagrams) are still untouched. There is also some room for improvement in the algorithms for generating the first three types: class, activity and sequence diagrams. The current accuracy of generating diagrams is about
  • 6. 80% to 85%. It can be enhanced up to 95% by improving the algorithms and adding the ability to learn.

References
Androutsopoulos, I., Ritchie, G. D., and Thanisch, P. (1995). "Natural Language Interfaces to Databases – An Introduction." Natural Language Engineering, vol. 1, part 1, pp. 29–81.
Grosz, B. J., Appelt, D., Martin, P., and Pereira, F. (1987). "TEAM: An Experiment in the Design of Transportable Natural Language Interfaces." Artificial Intelligence, 32, pp. 173–243.
Condamines, A., and Rebeyrolle, J. (2001). "Searching for and identifying conceptual relationships via a corpus based approach to a Terminological Knowledge Base (CTKB): Method and Results." Recent Advances in Computational Terminology, pp. 127–148.
Drouin, P. (2004). "Detection of Domain Specific Terminology Using Corpora Comparison." Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal.
Fagan, J. L. (1989). "The effectiveness of a non-syntactic approach to automatic phrase indexing for document retrieval." Journal of the American Society for Information Science, 40(2), pp. 115–132.
Gómez-Pérez, A., Fernández-López, M., and Corcho, O. (2004). "Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web." Springer.
Zelle, J. M., and Mooney, R. J. (1993). "Learning semantic grammars with constructive inductive logic programming." Proceedings of the 11th National Conference on Artificial Intelligence (AAAI Press/MIT Press, Washington, D.C.), pp. 817–822.
Khoo, C., Chan, S., and Niu, Y. (2002). "The Many Facets of the Cause-Effect Relation." The Semantics of Relationships, Kluwer Academic Press, pp. 51–70.
Krovetz, R., and Croft, W. B. (1992). "Lexical ambiguity and information retrieval." ACM Transactions on Information Systems, 10, pp. 115–141.
Losee, R. M. (1996). "Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering: An empirical basis for grammatical rules." Information Processing and Management, 32(2), pp. 185–197.
Tang, L. R., and Mooney, R. J. (2001). "Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing." Proceedings of the 12th European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pp. 466–477.
Malaisé, V., Zweigenbaum, P., and Bachimont, B. (2005). "Mining Defining Contexts to Help Structuring Differential Ontologies." Terminology, 11:1.
Rijsbergen, C. J. van (1977). "A theoretical basis for the use of co-occurrence data in information retrieval." Journal of Documentation, 33(2), pp. 106–119.
Weiss, S., Apte, C., Johnson, D., Oles, F., Goetz, T., and Hampp, T. (1999). "Maximizing text-mining performance." IEEE Intelligent Systems, 14, pp. 63–69.
Strzalowski, T. (1995). "Natural language information retrieval." Information Processing and Management, 31(3), pp. 397–417.