IT Architecture Automatic Verification
A Network Evidence-based Approach

António Alegria, André Vasconcelos
Instituto Superior Técnico, Universidade Técnica de Lisboa; Center for Organizational Design and Engineering (CODE); PT Comunicações; INESC Inovação and Instituto Superior Técnico; Lisbon, Portugal; Instituto Superior Técnico
email@example.com, firstname.lastname@example.org

Abstract - Ensuring constant synchronicity between the IT Architecture (ITA) and the actual Information Systems (IS) without the help of automatic tools is an intractable task, especially when taking into account modern ISs' rapid evolution, growing complexity and distributed nature. We propose an automatic AS-IS ITA verification methodology and framework based on deep passive network traffic analysis and logical inference rules with the goal of inferring relevant facts about the actual ITA. The resulting knowledge is described according to a conceptual model designed for this purpose. We also propose an organization-independent mapping relationship between that ITA network evidence model and an ISA modeling framework (CEO Framework), at the technology (ITA) level, realized through a set of logical deduction rules. These rules formally define the conditions that must hold between the inferred evidence and a high-level ITA model (both represented in this inference system) in order to declare that model factual and in line with reality. It is the automatic execution of these rules that realizes the verification process that reports, as a result, all the significant detected discrepancies. The proposed concepts and methodology are implemented in a prototype applied to a case study in the leading Portuguese Telecom operator. The proposed solution was shown to be capable of successfully verifying the case study's ITA model as well as discovering new, undocumented information through logical inference.

Keywords - Network Traffic; Passive Monitoring; Deep Packet Analysis; Logical Inference; Information Systems; IT Architecture

I. INTRODUCTION

A. Motivation

Modern phenomena such as globalization, the merging of business and IT, the emergence of new technologies and the introduction of new business models and regulations occur at an ever increasing pace, demanding a swift adaptation of modern organizations and their Information Systems (ISs). Enterprise Architecture (EA), of which the Information System Architecture (ISA) is an integral part, is considered a critical instrument to address this need.

Enterprise Architecture – and the ISA and its IT Architecture (ITA) layer – is an ongoing process that should be kept in sync with developments both in the external business environment and inside the organization, including both its strategy and operational processes.

According to Vasconcelos, the construction and maintenance of an ISA is fundamental to the proper development of technology's full potential in supporting business requirements. Without an ISA it is impossible to plan, analyze, discuss, decide, build (successfully) – and also measure and control – what cannot be specified or represented.

The ISA architectural level should be the map that drives a methodical, orderly and business-oriented technological growth in organizations. Considering the ISA as the set of design artifacts relevant for the description of Information Systems, so that it is possible to produce it in accordance with the requirements and maintain it during its lifetime, it becomes clear that its importance depends on its consistency with the actual reality of the ISs it describes. As such, it is essential to the usefulness of the ISA that its construction, maintenance and evolution are made in sync with the evolution of the IS.

B. Problem Statement

Ensuring constant synchronicity between the ISA and the actual IS without the help of automatic tools is an intractable task, especially when taking into account modern Information Systems' rapid evolution and growing complexity and distribution. Common approaches to ISA planning specify an AS-IS ISA and ITA elicitation step but do not propose any automatic methods or approaches to aid in assuring the actual correctness of the resulting documentation. It is therefore necessary to establish a process encompassing continuous IS monitoring and automatic ISA verification in order to detect discrepancies and improve the ISA's maintenance and evolution. Furthermore, this process needs to be integrated into a holistic IS and ISA planning and construction process.

This research argues that it is possible to take advantage of the fact that most currently relevant Information Systems "live" and "cooperate" in networked environments and communicate through several, predominantly IP-based, application protocols which we can capture and analyze, in real time, through technology common to network security experts.

Therefore, the main question framing this research is: "How to automatically verify if an IT Architecture model specifies the actual reality of production Information Systems, through the analysis of passively captured network traffic generated and consumed by these systems?".
C. Research Goals and Scope

The central theme of this paper is the possibility of matching an ISA model and the network traffic actually produced by the involved ISs. We explore and establish a relationship between conventional ISA concepts (at the Technology or IT Architecture level) and the information that can be automatically inferred by correlating all the facts obtained through network traffic capture, deep inspection and analysis. We apply this relationship with the goal of investigating the practicality of automatically verifying the reality of an ITA model by confronting it with information inferred from actual network traffic generated by production ISs.

The scope of this work is restricted to the technology part of the ISA – the ITA. Nevertheless, this subject was always approached as an integral part of the ISA and the EA. Furthermore, we restricted ourselves to passive TCP/IP network traffic capture and analysis as the data source. On top of that, the monitored network traffic is assumed to be unencrypted; otherwise, the encryption keys would need to be made available.

II. STATE OF THE ART RESEARCH

As part of this research we studied the area of Information Systems and Information Systems Architecture modeling, from which we analyzed and compared four different modeling frameworks that include an IT architecture level. Moreover, we investigated several techniques used in the automated and online discovery of information about networked Information Systems and their actual behavior and low-level technology (IT) architecture.

A. Information Systems Architecture Modeling Frameworks

Based on the state of the art review on the subject of ISA modeling we analyzed and compared four different frameworks: CEO Framework (CEOF2007), Archimate, RM-ODP and TOGAF. Given our research goal of establishing a relationship between an ITA model and the network traffic generated by the targeted ISs, this analysis focused on the technology level and involved criteria related to: support for proper alignment between the ITA and other ISA architecture layers; level of support for several important ITA concepts related to IT services and their network interfaces; and the level of support for modeling notation and formal specification.

Our assessment of all frameworks' features, virtues and limitations, as well as our collaboration with CODE¹, led us to decide on the CEOF2007 as the modeling language and framework to set the domain of discourse when referring to ISA and its modeling.

¹ CODE – Center for Organizational Design and Engineering, INESC INOV

Firstly, the CEOF2007 ensures proper alignment between the ITA level and all other ISA architecture levels and, because it formally defines its primitives and concepts and their actual notation as a UML profile, it is easy to extend and to offer different views on its models, according to the stakeholder. In addition, this rigorous specification of its metamodel based on UML makes it easy to extend and manipulate.

The CEOF2007 is comprehensive and flexible with regard to the modeling of infrastructure, software components, execution environments and IT services. However, this framework does not specify a way to properly model network connections, IT services' network interfaces and how the ISs interact and expose their services through these interfaces. Moreover, the CEOF2007 does not define attributes relating to the actual low-level naming of architectural components (e.g. infrastructure nodes' host names, software components' and IT Services' low-level names). Since the ISA model is a high-level document, the names given to its components are commonly descriptive and divergent from the actual concrete names used at a technological level.

In accordance with the established evaluation principles and in comparison with Archimate, TOGAF and RM-ODP, we consider CEOF2007 to be the most appropriate for the purposes of this research, provided that some of the previously described limitations are addressed.

B. Traffic Analysis in Enterprise Networks

In order to collect evidence about the actual ITA from an Enterprise Network, we analyzed and compared different automatic and online network and application discovery techniques: Agent-based (including log analysis), Active Analysis with remote access Credentials, Active Network Probing and Passive Network Monitoring.

In terms of cost and impact of deployment and operation on current production infrastructure, passive monitoring methods have been shown to require less effort, cost and time-to-deployment. Passive techniques are, by definition, completely transparent with regard to the network traffic and other systems' resources and, therefore, are the least intrusive approach to enterprise IS monitoring.

Despite not offering the highest level of detail, passive approaches can become nearly ubiquitous and are able to observe the actual interactions between Information Systems. Passive approaches offer a broad, real-time and up-to-date view of the overall usage profile of IT services and the actual relationships between Information Systems while also being able to capture and analyze actual application-layer information.

In conclusion, due to its ease of deployment, broadness of coverage, real-time capabilities and the possibility of tapping into actual application-level interactions and data, we consider passive monitoring to have the greatest potential and usefulness in large organizations. Therefore, one of our main aims is to try to push passive monitoring as far as possible in the ITA verification domain.

III. A SOLUTION FOR AUTOMATIC ITA VERIFICATION

We approach the problem stated in section I.B by identifying and addressing the importance of integrating and framing the solution into an existing ISA planning process. We chose to extend the process proposed by Vasconcelos, Sousa & Tribolet by introducing an explicit monitoring and verification step which, continuously and over the ISA's lifecycle, checks if the current (AS-IS) ISA is actually consistent with what is observed in the current Information Systems' interactions over an organization's network. Figure 1 presents the resulting process, with our extensions drawn in a lighter gray tone.

Figure 1. Proposed ISA Planning, Building and Maintenance Process (modeled using CEOF2007 UML profile)

Considering the continuous and cyclic nature of ISs and their ISA's development, and the necessity to cover two distinct verification situations – verifying the AS-IS ISA and the TO-BE ISA, following its implementation – we also introduced another higher-level loop. This cycle takes advantage of the resulting ISs – after the TO-BE ISA's implementation – as inputs to the AS-IS ISA updating and maintenance process. This way, we can verify in a single, unified verification step both the reality of the AS-IS ISA model and whether the implicit and explicit expectations in the TO-BE ISA model were accomplished in its implementation.

Figure 2. Simplified model of the Automatic ITA Monitoring and Verification Process (modeled using CEOF2007 UML profile)

Nevertheless, the main contribution and focus of this research is the actual ISA verification step («Verify ISA's Reality»), shown in further detail in Figure 2. This step applies a set of comprehensive verification rules that map between an ISA modeling language (such as the CEOF2007) and the ISs' and their ITA's manifestations evidenced in their generated network traffic. These manifestations can be detected and inferred through deep passive traffic analysis as well as logical inference techniques.

These rules effectively define the conditions that must hold in order to consider an ISA consistent with the actual reality inferred from the network traffic generated by the organization's ISs.

This verification process' inputs are composed of the actual ISs, their expected AS-IS ISA model and the verification rules, which actually map between the domains of the ISA modeling language and some abstraction over the network traffic generated by the targeted ISs. That abstraction is defined by a conceptual model named Netfacts, described in section III.B.

We tested this process using an extended CEOF2007 as the ISA modeling language (described in section III.C). Section III.A describes the «Monitor Network Traffic» step, which leverages state of the art deep packet inspection techniques and logical inference to passively and continuously capture and analyze an organization's network traffic in order to detect and infer evidence of the actual ISs' technology architecture. Section III.D describes the «Mapping Rules» that realize the «Verify If Rules Hold» step.

The verification process' results address the outcome of each test, allowing the architect to assess discrepancies between the AS-IS ISA model and the deployed ISs. This output can be used as a resource in the AS-IS ISA updating and maintenance process, if any discrepancies are detected.
Nonetheless, this update process is out of the scope of this research.

A. Network Traffic Monitoring

At present, ISs predominantly cooperate and interoperate through TCP/IP networks. The authors argue that if we are able to observe the network traffic that involves these interactions and successfully interpret it, it is possible to establish a sort of digital ethnography method whereby we are able to reconstruct the ITA automatically and in real time.

By taking into account the structure of network traffic that realizes the ISs' interactions and their supplied IT services' usage, we systematize passive network analysis techniques in increasingly detailed layers. Raw network traffic constantly captured through passive monitoring can be subject to inspection and analysis in three levels: Sub-Application-layer Inspection; Superficial Inspection of Application-layer Content; and Deep Interpretation of Application-layer Content.

These levels constitute layers of traffic analysis in which information inferred in one layer can be used as the basis for the analysis accomplished in the upper layers.

1) Sub-Application-layer Inspection: All the packets that constitute a network flow contain, in their TCP/IP headers, information that characterizes the communication at a basic but crucial level. By inspecting and analyzing the configuration flags and data carried in the IP and TCP headers it is possible to infer significant information about the participating systems.

These kinds of techniques enable inferring information about addresses (network and transport) and operating systems at both ends of network flows. Furthermore, they make it possible to reassemble and correlate network flows, allowing proper examination of their content and temporal and spatial analysis of communications between systems.

In sum, this layer offers the possibility of inferring the relationship graph between infrastructure elements in a computationally efficient way, without the need to inspect all traffic but only its network and transport-layer headers.

2) Superficial Inspection of Application-layer Content: Having the application-layer content stripped and reconstructed in the previous level, this level analyzes application-layer content by confronting it with a set of patterns (e.g. regular expressions) that, if matched, identify the application-layer protocol in use. It is possible to discover other kinds of information explicitly announced by the protocol, such as what software components support each end of communication flows. These patterns are commonly referred to as signatures or fingerprints because they serve as unique identifiers for these network traffic elements.

In sum, these techniques add to the previous level's infrastructure relationship graph information about application-layer protocols (levels 5 to 7 of the OSI model) and software components at each end of communication flows.

3) Deep Application-layer Interpretation: Although it is possible to develop a set of signatures that can classify and extract diverse information carried in network traffic, these kinds of techniques are practically restricted to identifying protocols and are not as flexible or useful in the extraction and inference of other information carried in network traffic's application-layer payload, due to the inherent limitations of regular expressions. According to Noam Chomsky it is impossible to describe a higher-level language with a lower-level one. Regular expressions define regular languages – the lowest level of Chomsky's language hierarchy – while most languages and protocols used at the application level of communication are context-free languages (e.g. XML, SOAP and SQL are context-free languages).

Therefore, in order to gain insight into what the ISs are actually doing on the network, there needs to be a higher-level analysis whereby, after the application-layer traffic payload is extracted and reassembled in the first level and classified in the second level, it is possible to forward it to specialized interpreters, one per application-layer protocol. These interpreters are capable of decoding and understanding conversations between ISs while they use each other's IT services, in the same way applications decode the traffic sent between them.

Exactly what information can be inferred at this level is extremely dependent on the specific application-layer protocols and the information explicitly declared in the interactions between ISs. Nevertheless, it is possible in many cases to extract and infer concrete names of infrastructure nodes, software components and IT services and operations, including used parameters, as well as user login names and low-level information entities (e.g. database schema, remote file system structure).

B. Netfacts Model

In order to manipulate and map all the evidence inferred through the previously described techniques to a higher-level ISA model we need to establish a conceptual model that defines and relates all inferred facts. We propose a generic model that frames and relates all the different kinds of ITA evidence we can infer from the previously described passive network traffic analysis methods. This model, named Netfacts, is designed with the main goal of being generic and independent from any ISA modeling framework and any traffic analysis technique or tool.

The main purpose of the Netfacts model (presented in Figure 3) is to be a reference framework to describe, store and manipulate all the facts pertaining to the manifestations of ISs on their generated network traffic and enable mapping this low-level evidence to any ISA modeling framework, at a technology (ITA) level.

Figure 3. Conceptual (Entity-Relationship) Model of actual ITA evidence detected and inferred from Network Traffic

Netfacts is a conceptual model that specifies simple entities describing facts about network communications between information systems. These communications are embodied in «Network Flows», the central and mediating entity that represents a single coherent communication session between two TCP/IP endpoints. Communications between infrastructure nodes («Network Host») are made through «Network Flows» over a set of «Application Layer Protocols». On each end of a «Network Flow» it is possible to detect the existence of participating «Software Components» as well as the utilization of IT «Services» and «Operations», including the specific «Operation Parameters» that were used.

This model supports the indication of what service types are supported or supplied by a «Service» or «Software Component», in line with TOGAF's service taxonomy defined in its Technical Reference Model (TRM).

Netfacts is described and specified in detail in , including its entities, attributes and associations.

C. Extending the CEO Framework

The assessment made of the CEOF2007 (section II.A) identified some key limitations which prevented it from being effectively mapped to the actual ITA evidence inferred from passive network analysis, described through the Netfacts model (section III.B). We addressed these limitations by extending the CEOF2007 metamodel (at the M2 level) through the addition of some new primitives that reify and formalize concepts such as the «Operating System», the «Network Connection» and «Network Service Ports», which formally specify IT Services' network interfaces. Furthermore, we added new attributes to existing primitives, namely the «concrete name» and «version». The full specification and detailed rationale of these extensions to the CEOF2007 are available in .

From this point on we refer to the extended framework as CEOF2007+.

D. Mapping Rules Between Netfacts and CEOF2007+

In order to define a mapping association between the Netfacts model and the CEOF2007+ meta-model, we propose a comprehensive set of mapping rules specified using a subset of first-order logic (Horn clauses, as used in Prolog). These rules prescribe the set of criteria used to check if an ITA model is factually aligned with the reality described by the facts structured according to the Netfacts model. Verifying if these rules hold establishes the actual ITA verification process.

The following two formulas show a small example of such rules, assuming the predicates defining the domain of discourse have been previously defined (e.g. Name(x, n) means that n is a concrete name of x).

$$
\mathrm{MapIPBToSwComponent}(x, y) \equiv \forall x\, \exists i\, \exists n\, \exists v\, \exists t\, \big( \mathrm{IIP}(x) \land \mathrm{BaseNetConnIp}(x, i) \land \mathrm{Name}(x, n) \land \mathrm{Version}(x, v) \land \mathrm{ServiceType}(x, t) \Rightarrow \exists s\, ( \mathrm{SwComponent}(y) \land \mathrm{IpSwComponent}(i, y) \land \mathrm{Name}(y, n) \land \mathrm{Version}(y, v) \land \mathrm{ServiceType}(y, t) ) \big) \tag{1}
$$

$$
\mathrm{MapIIBItSvcUsageToNetFlow}(x, y, z) \equiv \forall x\, \forall y\, \exists i\, \exists w\, \big( \mathrm{IAPB}(x) \land \mathrm{ItSvc}(y) \land \mathrm{Uses}(x, y) \land \mathrm{BaseNetConnIp}(x, i) \land \mathrm{NSP}(y, w) \Rightarrow \exists z\, ( \mathrm{NetFlow}(z) \land \mathrm{MapNSPToNetFlow}(w, z) \land \mathrm{SrcIp}(z, i) ) \big) \tag{2}
$$

Rule (1) defines a mapping between an «IT Platform Block» (CEOF2007+) and a «Software Component» (Netfacts) through direct matching between analogous attributes. It reads as: "If there is an «IT Platform Block» whose supporting «IT Infrastructure Block» maps to a «Network Host» through one of its «Network Connections» then there must be a «Software Component» with matching attributes detected in that «Network Host» for at least one «Network Flow»".

Rule (2) defines the mapping between an «IT Service» usage relationship and its corresponding «Network Flow». It reads as: "If there is an «IT Service» (ITS) used by any «IT Block» (ITB) then there must be at least one detected «Network Flow» whose source «Network Host» maps to any of the ITB's «Network Connections» and whose destination «Network Host», «Transport Port» and «Application Layer Protocol» all map to at least one of ITS's «Network Service Ports»".

All the implemented rules and their domain of discourse specification are available in .
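As an illustration of how a rule such as (2) can be checked mechanically, the following is a minimal Python sketch, not the Prolog/Logtalk rules used in the prototype. The classes `NetFlow`, `ServicePort` and `ServiceUsage`, the function names and all sample addresses are hypothetical simplifications of the Netfacts and CEOF2007+ entities, assumed only for this example.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for Netfacts and CEOF2007+ entities.
@dataclass(frozen=True)
class NetFlow:              # Netfacts «Network Flow»
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class ServicePort:          # CEOF2007+ «Network Service Port»
    host_ip: str
    port: int
    protocol: str


@dataclass(frozen=True)
class ServiceUsage:         # modeled "IT Block uses IT Service" relationship
    block: str
    block_ips: tuple        # IPs of the block's «Network Connections»
    service: str
    service_ports: tuple    # the service's «Network Service Ports»


def flow_matches_port(flow, nsp):
    """Analogue of MapNSPToNetFlow(w, z): destination host, port and protocol agree."""
    return (flow.dst_ip, flow.dst_port, flow.protocol) == (nsp.host_ip, nsp.port, nsp.protocol)


def usage_is_evidenced(usage, flows):
    """Analogue of rule (2): at least one flow leaves one of the block's IPs
    and reaches one of the service's network service ports."""
    return any(
        flow.src_ip in usage.block_ips and flow_matches_port(flow, nsp)
        for flow in flows
        for nsp in usage.service_ports
    )


# Made-up observed traffic and two modeled usage relationships.
flows = [NetFlow("10.0.0.5", "10.0.1.9", 1521, "tns")]
confirmed = ServiceUsage("SalesPortal", ("10.0.0.5",), "CustomerData",
                         (ServicePort("10.0.1.9", 1521, "tns"),))
unconfirmed = ServiceUsage("SalesPortal", ("10.0.0.5",), "Billing",
                           (ServicePort("10.0.2.7", 80, "http"),))

print(usage_is_evidenced(confirmed, flows))    # True
print(usage_is_evidenced(unconfirmed, flows))  # False
```

In a logic programming setting the same check falls out of unification and backtracking over the fact base; the explicit loops here only make the existential quantifiers of the rule visible.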
IV. TECHNICAL IMPLEMENTATION OF THE PROPOSED SOLUTION

In order to validate the previously described proposal we developed a proof-of-concept tool. This tool implements the ITA monitoring and verification process detailed in section III and presented in Figure 2. This prototype is generic enough to be applied in any organization as long as all the restrictions listed in section I.C are satisfied.

The prototype's architecture is made of two main components with distinct concerns and responsibilities and shows how to structure an actual ITA model verification tool:

• Network Traffic Monitoring and Analysis engine (NTMA) – implements the «Monitor Network Traffic» process and is responsible for passively analyzing (previously captured) network traffic and producing facts relating to evidence about the actual ITA, described in accordance with the Netfacts model («Netfacts Instantiation»).

• ITA Inference and Verification Engine (IIVE) – responsible for manipulating all the facts inferred by the NTMA, enabling their exploration and the discovery of new information as well as the execution of the ITA verification tests («Verify If Rules Hold»). These tests simply apply the mapping and verification rules explained in section III.D in order to reach a conclusion about the reality of the given ITA model.

These two components are described in fuller detail in the following sections.

A. Network Traffic Monitoring and Analysis engine (NTMA)

The NTMA is composed of four main independent traffic analyzers that, together, infer evidence and produce the facts described by the Netfacts model. These analyzers operate at different levels of the traffic analysis hierarchy defined in section III.A.

1) Sub-Application-layer Inspection: At the Sub-Application-layer Inspection level, the NTMA has two subcomponents, built on top of Open Source tools, that manage and coordinate their execution and parse and interpret the generated output in order to produce information conforming to the Netfacts model.

One of these subcomponents uses IPAudit 1.0 to infer and identify all «Network Flows», «Network Hosts» and used «Transport Ports», including statistics about temporal (beginning and end timestamps) and size (number of bytes and packets sent and received) dimensions of «Network Flows».

The other subcomponent uses p0f 2.0.8 updated with signatures from the PRADS project² to infer and identify «Operating Systems» used by each end of the «Network Flows». We also developed a simple mechanism to classify the operating system's family (e.g. Windows, UNIX, and Mac OS).

² PRADS – http://gamelinux.github.com/prads

2) Superficial Application-layer Content Inspection: At the Superficial Application-layer Content Inspection level, the NTMA has one subcomponent based on two variations of the PADS 1.2 software to infer and identify «Application Layer Protocols» used in each «Network Flow» and «Software Components» participating on each end of these flows.

The two variations of PADS are each responsible for analyzing different segments of network flows: traffic coming from the flow's source and from the flow's destination. Both use different sets of signatures whose format was extended in this research to support the explicit specification of «Software Components» service types, according to TOGAF's TRM service taxonomy.

In the case study's research context (see section V) several important application-layer protocol signatures were developed, such as signatures for Tuxedo, Tibco Rendezvous, SOAP, HTTP, Oracle Database (TNS) and Microsoft SQL Server (TDS). These are all common protocols making up a considerable part of traffic in corporate networks.

3) Deep Interpretation of Application-layer Content: The deep interpretation component is responsible for analyzing traffic at the highest level of the traffic analysis hierarchy. This component is made of:

• one capture and Superficial Application-layer Content Inspection-level traffic analyzer – preprocesses and classifies network traffic and forwards it to specialized interpreters, depending on the detected application-layer protocol;

• three deep interpretation components, each specialized in different application-layer protocol stacks: HTTP/SOAP, SQL and Oracle TNS.

The HTTP/SOAP interpreter uses an HTTP parsing library and, if it detects that the HTTP message's body is SOAP, it forwards it to another specialized SOAP envelope interpreter. This interpreter infers what IT services and operations are being called, with which parameters (name and type) and by whom. On the other hand, the HTTP part of this component is able to infer information such as concrete names associated with server «Network Hosts» as well as client and server «Software Components» involved in these interactions.

The SQL interpreter parses SQL queries, inferring information about the used databases' schema, such as used tables and columns, as well as database and data service's concrete names, effectively discovering data pertaining to the information architecture.

The TNS interpreter specializes in parsing TNS service request messages used by clients of a data service provided by a database hosted on the DBMS Oracle Database. By analyzing these messages, this component is able to infer information about the data services' concrete names, user names, concrete names associated with the client and server «Network Hosts» as well as «Software Components» at both ends of the communication flow.
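The signature-based classification and per-protocol interpretation described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the regular expressions are hypothetical and are neither the PADS signature format nor the signatures developed for the prototype, and `classify`, `interpret_http` and `analyze` are names invented for this example.

```python
import re

# Illustrative signatures only; not the PADS format nor the authors' signatures.
PROTOCOL_SIGNATURES = [
    ("http", re.compile(rb"^(?:GET|POST|PUT|DELETE|HEAD) \S+ HTTP/1\.[01]\r\n")),
    ("http", re.compile(rb"^HTTP/1\.[01] \d{3} ")),
]
HOST_HEADER = re.compile(rb"^Host: *([^\r\n]+)", re.MULTILINE)
SERVER_HEADER = re.compile(rb"^Server: *([^\r\n]+)", re.MULTILINE)
SOAP_ENVELOPE = re.compile(rb"<(?:[\w.-]+:)?Envelope[\s>]")


def classify(payload):
    """Superficial inspection: first protocol whose signature matches, else None."""
    for protocol, pattern in PROTOCOL_SIGNATURES:
        if pattern.search(payload):
            return protocol
    return None


def interpret_http(payload):
    """Deep interpretation (toy): extract concrete names from an HTTP message."""
    head, _, body = payload.partition(b"\r\n\r\n")
    facts = {}
    host = HOST_HEADER.search(head)
    if host:
        facts["host_name"] = host.group(1).decode()
    server = SERVER_HEADER.search(head)
    if server:
        facts["software_component"] = server.group(1).decode()
    # A real interpreter would hand a SOAP body to a dedicated envelope parser.
    facts["soap_body"] = bool(SOAP_ENVELOPE.search(body))
    return facts


# One specialized interpreter per classified protocol.
INTERPRETERS = {"http": interpret_http}


def analyze(payload):
    """Classify a reassembled payload, then forward it to the matching interpreter."""
    protocol = classify(payload)
    interpreter = INTERPRETERS.get(protocol)
    facts = interpreter(payload) if interpreter else {}
    return {"protocol": protocol, **facts}


request = (b"POST /svc HTTP/1.1\r\nHost: crm.example.org\r\n\r\n"
           b"<soap:Envelope xmlns:soap='urn:x'></soap:Envelope>")
print(analyze(request))
# {'protocol': 'http', 'host_name': 'crm.example.org', 'soap_body': True}
```

The split mirrors the layering argument of section III.A: regular expressions are enough to name the protocol, while anything structured inside the payload is left to a parser that understands that protocol.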
4) Integrating all Traffic Analyzers: After all captured network traffic is properly analyzed and distilled, all inferred facts about the actual ITA (conforming to the Netfacts model) are integrated into the same «Netfacts Instantiation» knowledge base (see ) by correlating them by network flow (IP addresses and transport ports) and temporal proximity. Afterwards, this data is converted to Prolog facts and written into an output file to be read by our ITA Inference and Verification Engine.

B. ITA Inference and Verification Engine (IIVE)

The ITA Inference and Verification Engine (IIVE) is one of our prototype's main components. It is responsible for manipulating all Netfacts-conforming facts inferred and produced by the NTMA and for automatically verifying an ITA model by checking whether the mapping rules hold between that model and those facts.

IIVE was developed with the Logtalk object-oriented logic programming language and runtime environment supported by the SWI-Prolog implementation . The choice of language is justified by Prolog's automatic inference features and semantic proximity to first-order logic (used in the specification of the mapping and verification rules) and by Logtalk's object orientation, which allows easier handling of the architecture primitives that usually compose an ISA modeling framework such as CEOF2007+, mapping its meta-model to a class hierarchy that can be instantiated to describe the actual ITA model.

This component's architecture is loosely inspired by classic Expert Systems (, , ), being composed of four main subcomponents: Inference Engine, Working Storage, Knowledge Base and User Interface.

1) Inference Engine: The Inference Engine is the IIVE's "brain". It supplies the mechanisms to execute all the rules that compose the knowledge base and enables the exploration of all the facts that compose the ITA description (CEOF2007+) as well as all the facts that serve as evidence of the real ITA's manifestations in the captured network traffic (Netfacts), inferred through passive capture and analysis (NTMA). The inference engine is based on the Logtalk/SWI-Prolog runtime environment which, by itself, offers a simple but sufficiently capable inference engine for this prototype's purposes. Two properties make it suitable: the semantic proximity between Prolog/Logtalk code and the first-order-logic specification of the knowledge incorporated in the mapping rules, and the Prolog execution model which, through backtracking and logical unification, is able to efficiently and automatically check whether the mapping rules hold for a given working storage and also discover new knowledge. In addition, this inference engine is capable of logically inferring new, undocumented parts of the ITA from observed facts related to other parts of that model.

2) Working Storage: The working storage is composed of all the facts describing the problem-domain state and is populated by all the Netfacts-conforming facts generated by the NTMA as well as the ITA model description in CEOF2007+ Logtalk classes. These facts serve as the domain for all mapping rules in the Knowledge Base.

3) Knowledge Base: The Knowledge Base is made up of all the mapping and verification rules mentioned in section III.D. These rules encode all the domain knowledge that establishes a mapping relationship between an ITA model in a particular ISA modeling language (CEOF2007+ in this case) and the evidence of the actual ITA inferred through the passive capture and analysis of network traffic generated by the organization's ISs (Netfacts model). By checking whether these rules hold for a given working storage, we are in fact verifying whether the ITA model is consistent with the evidence inferred from actual network traffic generated by current production systems. This verification is done automatically by the Inference Engine.

4) User Interface: The IIVE's user interface component is based on the command line interface supplied by the Logtalk/Prolog environment. This command line supplies a way to query the Working Storage and apply the knowledge in the Knowledge Base to the whole ITA model, executing the verification test suite. Furthermore, we implemented the generation of verification reports that describe the verification process, including each test's description (e.g. what attributes and/or relationships are being checked), results (e.g. pass, fail or unknown) and examples of facts used to reach a conclusion about a specific verification step. These reports allow the architect to easily view all the detected discrepancies as well as all the confirmed architecture elements and those that could not be confirmed or refuted. Additionally, the architect is informed of the facts that served as evidence to reach a conclusion about a particular architecture element.

Figure 4. IT Service Architecture of the studied IS ecosystem (correct model in black; introduced errors in gray)
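To make the interplay between Working Storage, Knowledge Base and the three-valued test results (pass, fail, unknown) more concrete, the following is a minimal sketch of a single "service usage" check. It is written in Python rather than Logtalk/Prolog and is not the prototype's actual rule set: the `Flow` and `ServicePort` records, the `verify_service_usage` function, the addresses and the port numbers are all hypothetical stand-ins for Netfacts evidence and CEOF2007+ model facts.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKN"


@dataclass(frozen=True)
class Flow:
    """Hypothetical Netfacts-style evidence: one observed network flow."""
    flow_id: str
    src_host: str
    dst_host: str
    dst_port: int


@dataclass(frozen=True)
class ServicePort:
    """Hypothetical model fact: an endpoint where an IT Service is offered."""
    service: str
    host: str
    port: int


def verify_service_usage(block_hosts, service, service_ports, flows):
    """Check the modeled claim 'this block uses `service`' against the flows.

    Returns (Result, evidence), where evidence lists the flow ids that
    support the verdict.
    """
    endpoints = {(p.host, p.port) for p in service_ports if p.service == service}
    if not endpoints:
        # Nothing is known about where the service lives: cannot decide.
        return Result.UNKNOWN, []
    outbound = [f for f in flows if f.src_host in block_hosts]
    matching = [f for f in outbound if (f.dst_host, f.dst_port) in endpoints]
    if matching:
        return Result.PASS, [f.flow_id for f in matching]
    if outbound:
        # The block was observed on the network, but never toward this service.
        return Result.FAIL, [f.flow_id for f in outbound]
    # No traffic from the block was captured: neither confirmed nor refuted.
    return Result.UNKNOWN, []


# Illustrative working storage (documentation-range addresses, made-up ports).
PORTS = [
    ServicePort("Distributed Transaction Processing", "192.0.2.20", 7001),
    ServicePort("Work Order Notification", "192.0.2.30", 8080),
]
FLOWS = [Flow("flow_a", "192.0.2.10", "192.0.2.20", 7001)]
SIREL_HOSTS = {"192.0.2.10"}

if __name__ == "__main__":
    for name in ("Distributed Transaction Processing", "Work Order Notification"):
        verdict, evidence = verify_service_usage(SIREL_HOSTS, name, PORTS, FLOWS)
        print(f"[{verdict.value}] {name}: evidence={evidence}")
```

Returning the supporting flow ids together with the verdict mirrors the idea that the architect is told which facts served as evidence. Treating "the block was seen talking, but never to this service" as a failure and "the block was never seen" as unknown is one possible design choice for this sketch, not a statement about how the prototype's rules decide.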
V. PORTUGAL TELECOM IT ARCHITECTURE VERIFICATION CASE STUDY

The concepts and processes described in section III and materialized in the prototype described in section IV were applied and evaluated in a case study of the IT Architecture of a significant subset of the information systems supporting the sales function of the leading Portuguese telecommunication organization – Portugal Telecom (PT) Comunicações³.

³ PT Comunicações – http://www.telecom.pt

We took advantage of the existing network monitoring infrastructure in the Pulso monitoring platform  – developed in-house at PT Comunicações – in order to capture raw network traffic from different points in the corporate network. Despite this, all raw captured traffic is processed and analyzed by our prototype, which was developed and used in total separation from this platform.

Next, we briefly describe the studied ISs, whose IT Service architecture is presented in Figure 4:
• Sales Force Automation Portal (SFA) – web-based sales portal following a classic 2-tier architecture, including load balanced web frontends and a data backend failover cluster supporting the portal through a supplied data service. All non-hardware components are based on Microsoft technologies such as the .Net Framework 2.0 and IIS 6.0 on the web frontends and SQL Server 2005 on the data backend.
• Order Entry System (SIREL) – manages order entry for just-sold products. Its architecture is based on a failover cluster of HP-UX servers supporting an «Order Entry Management» data service realized by a database over an Oracle Database DBMS.
• Service Framework (FWS) – middleware system that supplies several integration Web Services used to access IT Services offered by other systems. Its architecture is based on a simple 2-tier pattern including load balanced application frontends and a failover cluster data backend supporting the frontend applications through a supplied data service. As with SFA, all non-hardware components are based on Microsoft technologies such as .Net Framework 1.1 and IIS 6.0 on the frontends and SQL Server 2000 on the data backend.
• Tuxedo – distributed transaction processing middleware system. Its architecture is based on a failover cluster of HP-UX servers supporting a «Distributed Transaction Processing» IT Service realized by the Tuxedo software component.

The information systems ecosystem composed of the above systems is characterized by their externally supplied services and their interrelations, realized by these services' usage. Figure 4 documents these relationships (lighter gray elements represent errors purposefully introduced in the model).

The proof-of-concept prototype was applied to the case study in two distinct situations, corresponding to slightly different ITA models:
1. Correct model of the described ecosystem, actually describing the reality of the production ISs (black model of Figure 4, without the shaded elements);
2. Incorrect model of the described ecosystem, in which several known mistakes were introduced into the correct model. These errors are not limited to those shown in Figure 4 (in gray) and encompass the detailed architectures of all the shown «IT Blocks», which are not presented in finer detail in this paper due to space limitations.

These different situations allowed us to assess the capability of our proposal to positively verify a correct model and to detect errors in an incorrect model, therefore accomplishing its task of automatically verifying the reality and actuality of an ITA model.

VI. RESULTS

This section reports the results of applying the developed proof-of-concept prototype to the previously described case study (section V) and our assessment of the outcome. This analysis is broken into three parts: verifying the correct model, verifying the incorrect model, and new ITA information discovery.

A. Verifying the Correct Model

Figure 5 displays a brief extract of the verification of the SIREL Order Entry «IT Logic Block». The first test positively confirms a usage relationship between SIREL and the "Distributed Transaction Processing" «IT Service». The second test is unable to confirm or deny that it is this specific «IT Logic Block» that realizes this usage.

    it_logic_block: sirel_logic(Order Entry Logic)
    ------------------------------------------------------------
    Testing any outbound network activity toward any «NetworkServicePort» supporting the «IT Service» "Distributed Transaction Processing":
    [PASS] Found matching network activity from «NetworkHost» "220.127.116.11" in «NetworkFlow»: flow_ead8bcbf6f44c2ad208d89e41fdcee7c1
    Testing «IT Service» "Distributed Transaction Processing" usage through any of its «NetworkServicePorts» by a source «SoftwareComponent» matching this «IT Block» or any supporting «IT Application Block» or «IT Platform Block»:
    [UNKN] No matching «SoftwareComponent» detected in a valid outbound connection to the «IT Service» «NetworkServicePort»! Check full test results for details.

Figure 5. Brief extract of the verification of SIREL Order Entry «IT Logic Block» (correct model)

From the analysis of the produced verification report we reached the following conclusions, organized by type of architecture element:

1) IT Infrastructure Block: All «Servers» were positively identified by at least one of their «Network Connections» and/or concrete names. In the case of failover clusters, only the active (master) servers were detected, as expected in a typical production environment.

2) IT Platform Blocks and IT Application Blocks: All «Operating Systems» were positively identified. All other «IT Platform Blocks» were positively identified with the exception of the .Net Framework 2.0 used in the SFA
frontends and the SQL Server 2005 in the SFA backend. After exploring all the facts generated by the NTMA (through the IIVE's user interface) we came to the conclusion that ASP.NET 2.0 and other .Net Framework 2.0 components were detected but, since these components' concrete names were not preconfigured in the ITA model, the Inference Engine was not able to match them to the .Net Framework 2.0. In the case of SQL Server 2005, the missing detection was caused by the fact that the developed TDS (SQL Server's protocol) signature could only detect earlier versions of the DBMS.

All verifiable «IT Application Blocks» (those with defined concrete names) were positively identified, including all databases («IT Data Block»).

3) IT Service: All «IT Services» were positively identified through their attributes (e.g. concrete name and service type), resulting in the discovery or confirmation of at least one «Network Service Port» for those services.

All «IT Services»' realization relationships were positively verified with the exception of one data service realized by a database supported by SQL Server 2005 because, as previously explained, there was no signature identifying this particular version of the SQL Server software component or the application-layer protocol it uses (TDS).

All «IT Services»' utilization relationships were positively identified in terms of detecting network traffic corresponding to the service's usage. However, the client-side «Software Components» were not detected for the matching network flows and thus could not be verified.

4) Main Problems: In the cases where we could not positively verify an architecture component or relationship, the cause was either the lack of a particular application-layer protocol signature (e.g. SQL Server 2005) or a concrete name disparity between what is specified in the ITA model and what was inferred from the network traffic (e.g. .Net Framework 2.0 vs. ASP.NET 2.0). Nevertheless, it would only take a small effort (tweaking existing signatures) to improve our developed signatures and interpreters to take these cases into account. The larger the signature base, the higher the usefulness of the developed tool. We consider, however, that we managed to achieve a significant coverage of application-layer protocols' and software components' signatures. Improving string handling and matching would also help in matching concrete names specified in the ITA model with those inferred from the network.

B. Verifying the Incorrect Model

Assessing the results produced by the prototype when verifying the incorrect ITA model, we reached the conclusion that all introduced errors were reported, allowing the architect to fix the model. The purposefully introduced errors were diverse, including changing operating systems and software components and adding «IT Services» realized by the wrong «IT Block», as well as changing existing service usage relationships or introducing new, previously non-existing ones (Figure 4).

In all cases, none of these architecture elements were positively verified. In the majority of reported cases (two thirds), they were explicitly reported as errors. However, there were a few cases where the prototype could not find evidence to support or refute those parts of the model, declaring them as unprovable and undetermined. In either situation, the prototype raised a "red flag" for every artificially introduced mistake in the ITA model, therefore prompting the architect to further investigate the matter, possibly through the prototype's user interface and inference engine.

Figure 6 displays a brief extract of the verification of the SIREL Order Entry «IT Logic Block» where it detects the introduced error – the logic block's usage of the "Work Order Notification" «IT Service».

    it_logic_block: sirel_logic(Order Entry Logic)
    ------------------------------------------------------------
    Testing any outbound network activity toward any «NetworkServicePort» supporting the «IT Service» "Work Order Notification":
    [UNKN] No matching activity found. Checking details:
      * Testing any outbound network activity:
        [PASS] Found outbound activity from «NetworkHost» "18.104.22.168" in «NetworkFlow»: flow_eb5d703c6917532de7f7b4140e1ea5d159
      * Testing any outbound network activity towards any «NetworkConnection» supporting the used «IT Service»:
        [FAIL] No outbound activity to any of the «NetworkConnections» supporting the «IT Service» was detected!
    Testing «IT Service» "Work Order Notification" usage through any of its «NetworkServicePort»:
    [FAIL] «IT Service» is not used by this «IT Block».

Figure 6. Brief extract of the verification of SIREL Order Entry «IT Logic Block» (incorrect model)

C. New Information Discovery

In addition to verifying the actuality and reality of ITA models according to the mapping rules mentioned in section III.D, the developed tool is able to explore and discover undocumented ITA-related information through exploration of and inference over the facts generated by the NTMA, taking advantage of the inference engine in the IIVE and a few simple inference rules.

Through these mechanisms we were able to discover 50 undocumented Web Services, Databases and Data Services used or realized by components in the studied IS ecosystem, as well as parts of several Databases' schemas (e.g. column and table names) and the login names of users accessing these services.

VII. CONCLUSION AND FUTURE WORK

A. Main Contributions

Many of the techniques employed here have existed for some time in the security and system management area. However, this research's main contribution is the proposal and demonstration of the application of passive network traffic monitoring and analysis as a way to infer relevant information about the actual status of the ISs and the use of this information in automatically verifying an ITA model by trying to match it with facts inferred as evidence in the network traffic generated by those ISs. This contribution is held together by a set of other smaller contributions:
• Systematization of passive network traffic analysis techniques as an inexpensive source of real-time information about the actual state of the ISs and their ITA;
• Conceptual model of information that can be automatically discovered through passive network traffic analysis techniques. We named this model Netfacts;
• CEOF2007 extension enabling the mapping of its high-level meta-model to the network traffic generated by ISs (described through the Netfacts model). We named this extended framework CEOF2007+. The resulting framework was applied in a real-world case study in a major Portuguese telecom company – PT Comunicações;
• Mapping between network traffic (generated by ISs) and an ISA modeling language through the specification of first-order-logic rules that specify the restrictions and associations between the Netfacts model and CEOF2007+;
• Automatic ITA monitoring and verification process, according to the actual state of the IS. This process was integrated and harmonized into a holistic ISA planning, building and maintenance process, resulting in an extension to the ISA planning process proposed by Vasconcelos . This extension establishes a continuous ISA planning, verification and construction cycle;
• Development and actual deployment of a proof-of-concept prototype that encompasses all the research work hereby described, allowing the validation of this work by testing it in a real-world case study in a large enterprise.

B. Limitations

In spite of these research contributions, we identified some unaddressed limitations:
• IS and ISA Planning, Building and Maintenance Process – the proposed extension to the full ISA planning process introduced in Figure 1 needs to be tested and validated (in case studies) in order to assess its theoretical merits.
• Detection of some important software components – in spite of developing a considerable number of new application protocol and software component signatures, some of these architecture elements (e.g. SQL Server 2005) could not be classified by our prototype. Nevertheless, the proposed passive network traffic analysis framework (section III.A) can be easily improved over time by continuously adding new signatures and improving the handling of subtle differences in concrete names and versions, without much added effort. Supporting SQL Server 2005, for example, would only require a small tweak to the existing TDS signature.
• Expert System Features – a major part of our proposal's architecture was inspired by rule-based Expert Systems. Despite this, some useful features (, ) were not implemented because they were not considered essential for this research's purpose. The most obvious are the ability to interactively integrate the user's knowledge at runtime (when needed) and to automatically explain the conclusions reached by the inference engine (answering the questions "Why?" and "How?").
• Model-oriented User Interface – integration of all the concepts, techniques and tools hereby proposed in a graphical ISA modeling environment.

C. Future Work

Considering all contributions, this research is not an end in itself and serves as another stepping stone for further research. We consider the following themes to be important for future work.

ITA Automatic Discovery – a subject that serves as an important incentive for persisting on this research path, and an ambitious goal worth chasing, is the Automatic Discovery of the ITA based on capturing and analyzing the manifestations of the ISs and their ITA in the network traffic they generate. Although the present research is still far from reaching this goal, we consider that it serves as a firm first step that leads the way toward it. The proposed passive network traffic analysis methodology and the usage of logical inference techniques are contributions we believe can be leveraged in this new stage.

Complex Relationships between Information Systems – nowadays, and especially with the growing importance of SOA, most ISs communicate and relate with each other through middleware systems and asynchronous messaging over ESBs. In such cases, these relationships are not directly mirrored in the observed network traffic, so there needs to be a better way to infer them, such as what is proposed in , .

Extend Automatic Verification Process to other ISA architecture levels such as the Information Architecture or the Application Architecture levels.

Apply to Other ISA Modeling Frameworks such as Archimate or RM-ODP.

Use other data sources (e.g. active network probing, agents and log analysis) in order to complement the automatic, runtime IS and ITA information discovery capabilities.

REFERENCES

Op 't Land, Martin, et al. Chapter 2: Overview. Enterprise Architecture (The Enterprise Engineering Series): Creating Value by Informed Governance. Springer, 2008.
Op 't Land, Martin, et al. Chapter 5: Processes Involved in EA. Enterprise Architecture (The Enterprise Engineering Series): Creating Value by Informed Governance. Springer, 2008.
Vasconcelos, A., Sousa, P. and Tribolet, J. Information System Architecture Metrics: an Enterprise Engineering Evaluation Approach. Electronic Journal of Information Systems Evaluation, Vol. 10, No. 1, pp. 91-122. Academic Conferences Limited, June 2007. Available online at www.ejise.com.
Zachman, J. Enterprise Architecture: The Issue of the Century. Database Programming and Design, March 1997, pp. 1-13.
Brett, Charles. Automated Application Discovery: The Enterprise Architects Auto-Aide. Forrester Research Inc., 2007. White Paper. http://www.forrester.com/Research/Document/Excerpt/0,7211,44251,00.html.
Spewak, S. and Hill, S. Enterprise Architecture Planning: Developing a Blueprint for Data, Applications and Technology. Wiley, 1992. ISBN-13: 978-0471599852.
Open Group. The Open Group Architectural Framework (TOGAF) - Version 9 Enterprise Edition. 9th Edition. Van Haren Publishing, 2009. ISBN 9789087532307.
Lankhorst, M. & the ArchiMate team. ArchiMate Language Primer. Enschede: Telematica Instituut, 2004. Available at https://doc.telin.nl/dsweb/Get/Document-43839. ArchiMate/D1.1.6a.
ISO/IEC. ISO/IEC 19793:2008: Information technology - Open distributed processing - Use of UML for ODP system specification. 2008. Standard. ISO/IEC 19793:2008.
Drogseth, Dennis. Planning for CMDB Design and Adoption: An Industry Colloquium. [Online] September 1, 2005. [Cited: January 15, 2008.] http://www.enterprisemanagement.com/research/asset.php?id=225.
Garbani, Jean-Pierre and Mendel, Thomas. The Forrester Wave: Application Mapping For The CMDB, Q1 2006. Forrester Research, Inc., 2006. White Paper. http://www.forrester.com/rb/Research/wave%26trade%3B_application_mapping_for_cmdb%2C_q1_2006/q/id/36891/t/2.
Vasconcelos, André, Sousa, Pedro and Tribolet, José. Enterprise Architecture Analysis: An Information System Evaluation Approach. Enterprise Modelling and Information Systems Architectures, Vol. 3, No. 2, pp. 31-53. Ulm: Germany Informatics Society, December 2008.
Vasconcelos, André. CEO Framework UML Profile v1.2. Lisbon: CEP, INESC-INOV, 2006. Technical Report.
Laudon, Kenneth C. and Laudon, Jane P. Management Information Systems: Managing the Digital Firm. 10th Edition. Prentice Hall, 2006. ISBN-13: 978-8120334687.
Caracas, A., et al. Mining Semantic Relations using NetFlow. Third IEEE/IFIP International Workshop on Business-driven IT Management (BDIM 2008), pp. 110-111. Salvador, Bahia, Brasil: IEEE, April 7, 2008. ISBN: 978-1-4244-2191-6.
Kind, A., Gantenbein, D. and Etoh, H. Relationship Discovery with NetFlow to Enable Business-Driven IT Management. IEEE/IFIP International Workshop on Business-Driven IT Management (BDIM 2006), pp. 63-70. IEEE, 2006. ISBN: 1-4244-0176-3.
Rifkin, J. IPAudit Web Site. [Online] July 12, 2005. [Cited: July 20, 2009.] http://ipaudit.sourceforge.net.
Zalewski, M. p0f 2 README. [Online] 2006. [Cited: July 20, 2009.] http://lcamtuf.coredump.cx/p0f/README.
Shelton, M. About Passive Asset Detection System (PADS). [Online] June 18, 2005. [Cited: July 20, 2009.] http://passive.sourceforge.net/about.php.
International Telecommunication Union (ITU). Open Systems Interconnection - Basic Reference Model. International Telecommunication Union (ITU), 1994. Standard. Recommendation X.200 (07/94).
Chomsky, Noam. Three models for the description of language. IRE Transactions on Information Theory, Vol. 2, No. 3, pp. 113-124, September 1956. DOI: 10.1109/TIT.1956.1056813.
Oracle. Oracle Real User Experience Insight - An Oracle White Paper. [Online] March 2008. [Cited: July 20, 2009.] http://www.oracle.com/technology/products/oem/pdf/twp_user_insight.pdf.
Secerno. The SynoptiQ Engine: The Power Behind Secerno DataWall. [Online] 2009. [Cited: July 20, 2009.] http://www.secerno.com/?pg=our-approach&sub=powerful-analysis.
Netwitness Corporation. Netwitness Investigator. [Online] Netwitness Corporation, 2009. [Cited: July 20, 2009.] http://www.netwitness.com/products/investigator.aspx.
Alegria, António. Netfacts Model Specification. Lisbon: PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/netfacts-model-specification. INESC Tec. Rep. Reference Number 5718.
Alegria, António. CEO Framework: Technology Architecture Extensions. Lisbon: PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/ceo-framework-technology-architecture-extensions. INESC Tec. Rep. Reference Number 5719.
Bratko, Ivan. PROLOG Programming for Artificial Intelligence. 3rd Edition. Addison Wesley, 2001. ISBN-13: 978-0201403756.
Alegria, António. CEO Framework Netfacts Mapping and Verification Rules. Lisbon: PT Comunicações and INESC INOV, 2009. Technical Report. Available at https://fenix.ist.utl.pt/homepage/ist153841/technical-reports/ceo-framework---netfacts-mapping-and-verification-rules. INESC Tec. Rep. Reference Number 5720.
Internet Programming with Ruby writers. WEBrick - an HTTP server toolkit. [Online] August 14, 2003. [Cited: August 4, 2009.] http://www.webrick.org.
Litchfield, David. The Oracle Hackers Handbook: Hacking and Defending Oracle. Wiley, 2007. ISBN-13: 978-0470080221.
Oracle. Oracle Database Net Services Reference - 10g Release 2 (10.2). [Online] 2005. [Cited: July 15, 2009.] http://download.oracle.com/docs/cd/B19306_01/network.102/b14213.pdf. Part Number B14213-01.
Moura, Paulo. Logtalk - Design of an Object-Oriented Logic Programming Language. Departamento de Informática, Universidade da Beira Interior. Covilhã, 2003. PhD Thesis.
Wielemaker, Jan. SWI-Prolog. [Online] [Cited: August 4, 2009.] http://www.swi-prolog.org.
Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 2nd Edition. Prentice Hall, 2002. ISBN-13: 978-0137903955.
Merritt, Dennis. Building Expert Systems in Prolog. Springer, 1989. ISBN-13: 978-0387970165.
Alegria, J., Ramalho, R. and Carvalho, T. Uma experiência open-source para "tomar o pulso" e "ter o pulso" sobre a função de sistemas e tecnologias de informação. Lisbon: CAPSI, 2004.