sep-26-03.ppt

  • Interest measures: make sure that sensitive facts, if they exist, will be deemed uninteresting by the mining algorithms. Extra data: for example, a "phone book" that contains extra entries. It is still useful if the goal is to find a phone number given a name, but access to the complete phone book does not allow determining facts about (for example) department sizes. Performance: perhaps not an issue for small amounts of data, but on large data sets (terabytes) algorithms with exponential running time become an issue (disk-limited). Note that we do not face the same problem as (for example) the military/civilian accuracy encoding of GPS. There, the goal is to make information (position) known to all, but more precisely to some. Here, the information to be made known and the information to be kept hidden are completely different. A better analogy would be determining position from communications satellites (e.g., by measuring signal delay). Introducing a small random delay will wreak havoc with determining position by this method, but will not alter the information communicated.
  • Merkle signatures allow one to apply a single digital signature to an XML document while ensuring the authenticity and integrity of both the whole document and any portion of it (i.e., one or more of its elements/attributes). The distinctive feature of the Merkle signature is the algorithm used to compute the digest value of the signed XML document. This algorithm exploits the Merkle tree authentication mechanism proposed in [Mer89]. The basic idea is to associate a hash value with each node in the graph (i.e., DOM) representation of an XML document. More precisely, the hash value associated with an attribute is obtained by applying a hash function to the concatenation of the attribute value and the attribute name. By contrast, the hash value associated with an element is the result of the same hash function computed over the concatenation of the element content, the element tag name, and the hash values associated with its child nodes, both attributes and elements.
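A minimal Python sketch of the Merkle hash computation described in the Merkle signature note above, using only the standard library. It assumes SHA-256 as the hash function, takes an element's "content" to be its leading text, and hashes attributes in sorted name order so the digest does not depend on attribute order; none of these choices is specified in the note, and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def _hash(data: bytes) -> bytes:
    # SHA-256 is an assumption; the note does not name a hash function.
    return hashlib.sha256(data).digest()


def attribute_hash(name: str, value: str) -> bytes:
    # Attribute node: h(value || name), as described in the note.
    return _hash(value.encode("utf-8") + name.encode("utf-8"))


def merkle_hash(elem: ET.Element) -> bytes:
    # Element node: h(content || tag name || hashes of child nodes).
    # "Content" is taken to be the element's leading text (assumption).
    content = (elem.text or "").encode("utf-8")
    # XML attributes are unordered, so hash them in sorted name order.
    children = [attribute_hash(n, v) for n, v in sorted(elem.attrib.items())]
    # Child elements are hashed recursively, in document order.
    children += [merkle_hash(child) for child in elem]
    # Plain concatenation follows the note; a hardened scheme would add
    # length prefixes or delimiters to rule out ambiguous encodings.
    return _hash(content + elem.tag.encode("utf-8") + b"".join(children))


if __name__ == "__main__":
    doc = ET.fromstring(
        '<Patent Dept="CS"><title>Secure XML</title><authors>A. Brown</authors></Patent>'
    )
    # The root digest is the value the owner would sign.
    print(merkle_hash(doc).hex())
```

Because every node's hash feeds into its parent's, changing any element or attribute changes the root digest.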

    1. New England Database Society (NEDS)
       Friday, September 26, 2003
       Volen 101, Brandeis University
       Sponsored by Sun Microsystems
    2. Data and Applications Security Developments and Directions, and XML Security
       Bhavani Thuraisingham
       The National Science Foundation
       September 2003
    3. Outline
       • Data and Applications Security (DAS)
         ◦ Developments and Directions; DAS at NSF
       • Secure Semantic Web
         ◦ XML Security; Other directions
       • Some Emerging Secure DAS Technologies
         ◦ Secure Information Integration; Secure Sensor Information Management; Secure Dependable Information Management
       • Some Directions for Privacy Research
         ◦ Data Mining for handling security problems; Privacy vs. National Security; Privacy Constraint Processing; Foundations of the Privacy Problem
       • What are the Challenges?
       • Details of XML Security Research
    4. Developments in Data and Applications Security: 1975 - Present
       • Access Control for System R and Ingres (mid 1970s)
       • Multilevel secure database systems (1980 - present)
         ◦ Relational database systems: research prototypes and products; Distributed database systems: research prototypes and some operational systems; Object data systems; Inference problem and deductive database systems; Transactions
       • Recent developments in Secure Data Management (1996 - Present)
         ◦ Secure data warehousing; Role-based access control (RBAC); E-commerce; XML security and Secure Semantic Web; Data mining for intrusion detection and national security; Privacy; Dependable data management; Secure knowledge management and collaboration
    5. Developments in Data and Applications Security: Multilevel Secure Databases - I
       • Air Force Summer Study in 1982
       • Early systems based on the Integrity Lock approach
       • Systems in the mid to late 1980s and early 1990s
         ◦ E.g., SeaView by SRI, Lock Data Views by Honeywell, ASD and ASD Views by TRW
         ◦ Prototypes and commercial products
         ◦ Trusted Database Interpretation and evaluation of commercial products
       • Secure Distributed Databases (late 80s to mid 90s)
         ◦ Architectures; Algorithms and prototype for distributed query processing; Simulation of distributed transaction management and concurrency control algorithms; Secure federated data management
    6. Developments in Data and Applications Security: Multilevel Secure Databases - II
       • Inference Problem (mid 80s to mid 90s)
         ◦ Unsolvability of the inference problem; Security constraint processing during query, update and database design operations; Semantic models and conceptual structures
       • Secure Object Databases and Systems (late 80s to mid 90s)
         ◦ Secure object models; Distributed object systems security; Object modeling for designing secure applications; Secure multimedia data management
       • Secure Transactions (1990s)
         ◦ Single-level/multilevel transactions; Secure recovery and commit protocols
    7. Some Directions and Challenges for Data and Applications Security - I
       • Secure semantic web
         ◦ Single/multiple security models?
         ◦ Different application domains
       • Secure Information Integration
         ◦ How do you securely integrate numerous and heterogeneous data sources, on the web and otherwise?
       • Secure Sensor Information Management
         ◦ Fusing and managing data/information from distributed and autonomous sensors
       • Secure Dependable Information Management
         ◦ Integrating security, real-time processing and fault tolerance
       • Data Sharing vs. Privacy
         ◦ Federated database architectures?
    8. Some Directions and Challenges for Data and Applications Security - II
       • Data mining and knowledge discovery for intrusion detection
         ◦ Need realistic models; real-time data mining
       • Secure knowledge management
         ◦ Protect the assets and intellectual property rights of an organization
       • Information assurance, Infrastructure protection, Access Control
         ◦ Insider cyber-threat analysis; Protecting national databases; Role-based access control for emerging applications
       • Security for emerging applications
         ◦ Geospatial, Biomedical, E-Commerce, etc.
       • Other Directions
         ◦ Trust and Economics; Trust Management/Negotiation; Secure Peer-to-peer computing
    9. NSF Efforts in Data and Applications Security (DAS)
       • Security for IIS (Information and Intelligent Systems) Technologies
         ◦ DAS focuses on security needs for IIS Division technologies (e.g., information and data management, digital libraries, collaboration and e-business)
         ◦ DAS-related proposals have also been managed under ITR (Information Technology Research) and other initiatives (e.g., the Sensor initiative) during FY2003
       • DAS is part of the CISE-wide (Computer and Information Science and Engineering) Directorate theme on Cyber Trust for FY04 and beyond
         ◦ Focus areas for Cyber Trust include: Trusted Computing, Network Security, Data and Applications Security, Embedded Systems Security
         ◦ Inaugural Cyber Trust PI Meeting in Baltimore, August 13-15, 2003
         ◦ Plans for FY2004 will be announced soon
       • Opportunities possibly also under ITR
    10. Directions and Challenges for Securing the Semantic Web
       • The Semantic Web by Tim Berners-Lee
         ◦ Definition and Layers
       • Steps for Securing the Semantic Web
       • XML Security for Securing the Semantic Web
       • Related research and directions for secure semantic web
         ◦ Secure Information Integration
    11. Secure Semantic Web
       • According to Tim Berners-Lee, the Semantic Web supports
         ◦ Machine-readable and machine-understandable web pages
       • Layers for the semantic web (security cuts across all layers):
         ◦ Layer 1: TCP/IP, Sockets, HTML, Agents
         ◦ Layer 2: XML, XML Schemas
         ◦ Layer 3: RDF
         ◦ Layer 4: Ontologies, Semantic Interoperability
         ◦ Layer 5: Logic, Proof, Trust
       • Challenge: not only integrating the layers of the semantic web, but also ensuring secure interoperability
    12. Steps to Securing the Semantic Web
       • Flexible Security Policy
         ◦ One that can adapt to changing situations and requirements
       • Security Model
         ◦ Access Control, Role-based security, Nonrepudiation, Authentication
       • Security Architecture and Design
         ◦ Examine architectures for the semantic web and identify security-critical components
       • Securing the Layers of the Semantic Web
         ◦ Secure agents, XML security, RDF security, secure semantic interoperability, security properties for ontologies, security issues for digital rights
       • Challenge: How do you integrate across the layers of the Semantic Web and preserve security?
       • Much of the research is focusing on XML security; the next step is securing RDF documents
    13. XML Security
       • Some ideas have evolved from research in secure multimedia/object data management
       • Access control and authorization models
         ◦ Protecting entire documents, parts of documents, propagation of access control privileges; Protecting DTDs vs. document instances; Secure XML Schemas
       • Update Policies and Dissemination Policies
       • Secure publishing of XML documents
         ◦ How do you minimize trust for third-party publication?
       • Use of Encryption
       • Inference problem for XML documents
         ◦ Portions of documents taken together could be sensitive, though individually they are not
       • More details at the end
    14. What are the Next Steps and Challenges for Secure Semantic Web? - I
       • We need to continue with XML security research as well as work with standards
         ◦ W3C standards are advancing rapidly; security research, prototypes and products must keep up with the developments
         ◦ Researchers, vendors and standards organizations must work together
       • Secure XML Database Systems (query, transactions, storage, ...)
       • RDF Security
         ◦ When you bring in semantics, there are many challenges for security
         ◦ Need to develop security models for RDF documents
       • Secure Ontologies
         ◦ Two aspects: one is to develop protection models for ontology databases; the other is to use ontologies for ensuring security and privacy
    15. What are the Next Steps and Challenges for Secure Semantic Web? - II
       • Secure semantic interoperability
         ◦ What can we learn from secure database interoperability and federated databases?
       • Trust and digital rights management
         ◦ How do you trust the contents of a document? How do you pass digital rights when documents are disseminated?
       • Security for domain-specific semantic webs
         ◦ Do we need multiple security policies and models?
       • Secure interoperability across the layers of the semantic web
         ◦ This will be a major challenge even when security is not being considered
         ◦ Security has to be considered from the beginning
       • Secure Information Integration is a key component of securing the semantic web
    16. Secure Information Integration
       • Integrate disparate, heterogeneous and autonomous information sources, on the web or otherwise
         ◦ E.g., structured/unstructured data, data streams, geospatial data
       • Security must be considered together with the information integration technologies
       • IJCAI workshop on Information Integration: http://www.isi.edu/info-agents/workshops/ijcai03/iiweb.html
         ◦ Technologies include: Information extraction and gathering; Wrapper learning and automatic wrapper generation; Source descriptions, source meta-data learning and source statistics learning; Web service composition; Record linkage/object consolidation and ontology matching; Novel integration and inter-schema mediation architectures; Answering queries using views; Web-based query planning, optimization and execution; Data mining for integration
    17. Secure Information Integration: Directions for Research
       • Start research on security technologies for information integration
         ◦ E.g., secure web services decomposition; security architectures for integration; security issues for ontology matching; secure information extraction, etc.
       • Secure sensor information management is one aspect of secure information integration
         ◦ Data streams from disparate, autonomous and heterogeneous sensors have to be fused and managed securely
    18. Secure Sensor Information Management
       • A sensor network consists of a collection of autonomous and interconnected sensors that continuously sense and store information about some local phenomena
         ◦ May be employed on battlefields, in seismic zones, in pavements
       • Data streams emanate from sensors; for geospatial applications these data streams could contain continuous data of maps, images, etc. The data has to be fused and aggregated
       • Continuous queries are posed and responses analyzed, possibly in real time; some streams are discarded while the rest may be stored
       • Recent developments in sensor information management include sensor database systems, sensor data mining, distributed data management, layered architectures for sensor nets, storage methods, data fusion and aggregation
       • Secure sensor data/information management has received very little attention; a research agenda is needed
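Slide 18 mentions continuous queries over sensor streams, with some data discarded rather than stored. A minimal sketch, assuming a sliding-window average as the continuous query; the class name and window size are hypothetical, and no security mechanism is modeled.

```python
from collections import deque


class WindowedAverage:
    """Continuous query: average over the last `size` readings of one sensor.

    Older readings fall out of the bounded deque, so raw data is discarded
    and only the current answer is kept.
    """

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("window size must be positive")
        self.window = deque(maxlen=size)

    def update(self, reading: float) -> float:
        """Add one sensor reading and return the current query answer."""
        self.window.append(reading)
        return sum(self.window) / len(self.window)


query = WindowedAverage(size=3)
answers = [query.update(r) for r in [10, 20, 30, 40]]
print(answers)  # [10.0, 15.0, 20.0, 30.0]
```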
    19. Secure Sensor Information Management: Directions for Research
       • Individual sensors may be compromised and attacked; need techniques for detecting, managing and recovering from such attacks
       • Aggregated sensor data may be sensitive; need secure storage sites for aggregated data (a variation of the inference and aggregation problem?)
       • Security has to be incorporated into sensor database management
         ◦ Policies, models, architectures, queries, etc.
       • Evaluate the costs of incorporating security, especially when the sensor data has to be fused, aggregated and perhaps mined in real time
       • Research on secure dependable information management for sensor data
    20. Secure Dependable Information Management
       • Dependable information management includes
         ◦ Secure information management
         ◦ Fault-tolerant information management
         ◦ High-integrity and high-assurance computing
         ◦ Real-time computing
       • Conflicts between different features
         ◦ Security, Integrity, Fault Tolerance, Real-time Processing
         ◦ E.g., a process may miss real-time deadlines when access control checks are made
         ◦ Trade-offs between real-time processing and security
         ◦ Need flexible security policies; real-time processing may be critical during a mission, while security may be critical during non-operational times
    21. Secure Dependable Information Management Example: Next Generation AWACS
       • Technology provided by the project
       [Figure: Next Generation AWACS architecture. Components shown: Hardware; Real-time Operating System; Infrastructure Services; Data Mgmt.; Data Xchg.; MSI App; Future App (three instances); Sensor Detections; Multi-Sensor Tracks; Display Processor & Refresh Channels; Consoles (14); Navigation; Sensors; Data Links; Data Analysis Programming Group (DAPG)]
       • Security being considered after the system has been designed and prototypes implemented
       • Challenge: integrating real-time processing, security and fault tolerance
    22. Secure Dependable Information Management: Directions for Research
       • Challenge: How does a system ensure integrity, security and fault-tolerant processing, and still meet timing constraints?
         ◦ Develop flexible security policies: when is it more important to ensure real-time processing, and when to ensure security?
         ◦ Security models and architectures for the policies; examine real-time algorithms (e.g., query and transaction processing)
         ◦ Research for databases as well as for applications; what assumptions do we need to make about operating systems, networks and middleware?
       • Data may be emanating from sensors and other devices at multiple locations
         ◦ Data may pertain to individuals (e.g., video information, images, surveillance information)
         ◦ Data may be mined to extract useful information
         ◦ Need to maintain privacy
    23. Research Directions for Privacy
       • Why this interest in privacy now?
         ◦ Data mining for national security
         ◦ Data mining is a threat to privacy
         ◦ Balance between data sharing/mining and privacy
           ▪ Is federated data management a solution?
       • Privacy Preserving Data Mining
       • Inference Problem as a Privacy Problem
         ◦ Handling privacy constraints; Foundations
       • Web/Semantic Web will have to address privacy
       • Federated Architectures for Data Sharing?
    24. Data Mining to Handle Security Problems
       • Data mining tools could be used to examine audit data and flag abnormal behavior
       • Much recent work in intrusion detection
         ◦ E.g., neural networks to detect abnormal patterns
       • Tools are being examined to determine abnormal patterns for national security
         ◦ Classification techniques, link analysis
       • Fraud detection
         ◦ Credit cards, calling cards, identity theft, etc.
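The idea of flagging abnormal behavior in audit data can be illustrated with a deliberately simple stand-in: instead of the neural networks mentioned on the slide, this sketch flags users whose count deviates from the mean by more than a chosen number of standard deviations. The audit summary, user names and threshold are hypothetical.

```python
from statistics import mean, pstdev


def flag_abnormal(counts, threshold=2.0):
    """Return users whose audit count lies more than `threshold`
    population standard deviations away from the mean."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every user behaves identically; nothing stands out
    return sorted(u for u, v in counts.items() if abs(v - mu) / sigma > threshold)


# Hypothetical audit summary: failed logins per user over one day.
failed_logins = {f"user{i}": 2 for i in range(1, 10)}
failed_logins["eve"] = 40
print(flag_abnormal(failed_logins))  # ['eve']
```

A fixed z-score cutoff is only a baseline; the learned models the slide refers to would replace it.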
    25. Data Mining as a Threat to Privacy
       • Data mining gives us "facts" that are not obvious to human analysts of the data
       • Enables inspection and analysis of huge amounts of data
       • Possible threats:
         ◦ Predict information about classified work from correlation with unclassified work
         ◦ Mining "open source" data to determine predictive events (e.g., pizza deliveries to the Pentagon)
       • It isn't the data we want to protect, but correlations among data items
       • Initial ideas presented at the IFIP 11.3 Database Security Conference, July 1996, in Como, Italy
       • Data Sharing/Mining vs. Privacy: a Federated Data Management Architecture for the Department of Homeland Security?
    26. What Can We Do? Privacy Preserving Data Mining
       • Prevent useful results from mining
         ◦ Limit data access to ensure low confidence and support
         ◦ Extra data ("cover stories") to give "false" results
         ◦ Providing only samples of data can lower confidence in mining results
       • Idea: if an adversary is unable to learn a good classifier from the data, then the adversary will be unable to learn good rules or predictive functions
       • Approach: only make a sample of the data available
         ◦ Limits the ability to learn a good classifier
       • Several recent research efforts have been reported
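The effect of "cover stories" on mining results can be made concrete with the two measures the slide names, support and confidence. In this sketch the items A and B and all record counts are hypothetical; decoy records that contain A but not B halve both measures of the rule A -> B, which could push it below a miner's reporting threshold.

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent."""
    lhs = set(antecedent)
    both = lhs | set(consequent)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical released data: item A almost always co-occurs with item B.
real = [{"A", "B"}] * 8 + [{"A"}] * 2
# "Cover stories": decoy records that contain A but never B.
cover = [{"A"}] * 10

print(support_confidence(real, {"A"}, {"B"}))          # (0.8, 0.8)
print(support_confidence(real + cover, {"A"}, {"B"}))  # (0.4, 0.4)
```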
    27. Privacy Problem as a Form of the Inference Problem
       • Privacy constraints
         ◦ Content-based constraints; association-based constraints
       • Privacy controller
         ◦ Augment a database system with a privacy controller for constraint processing and examine the releasability of data/information (e.g., release constraints)
       • Use of conceptual structures to design applications with privacy in mind (e.g., privacy-preserving database and application design)
       • The web makes the problem much more challenging than the inference problem we examined in the 1990s!
       • Is the General Privacy Problem Unsolvable?
    28. Privacy Constraints
       • Simple constraints: an attribute of a document is private
       • Content-based constraints: if a document contains information about XXX, then it is private
       • Association-based constraints: two or more documents taken together are private; individually they are public
       • Dynamic constraints: after some event, the document becomes private or becomes public
       • Several challenges: specification and consistency of constraints; taking external knowledge into consideration; managing history information
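A privacy controller that enforces the first three kinds of constraints at release time might look like the following sketch. Documents are modeled as dictionaries with an `id` and a `text` field, topics are matched by plain substring search, and all names are hypothetical; dynamic constraints and external knowledge are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrivacyConstraints:
    # Simple constraints: attributes that are private wherever they appear.
    private_attributes: set = field(default_factory=set)
    # Content-based constraints: topics that make a whole document private.
    private_topics: set = field(default_factory=set)
    # Association-based constraints: groups of document IDs that are
    # private together but public individually.
    private_associations: list = field(default_factory=list)


def release(doc: dict, released_ids: set, c: PrivacyConstraints) -> Optional[dict]:
    """Return a releasable copy of `doc`, or None if it must be withheld."""
    # Content-based: withhold the document if it mentions a private topic.
    text = doc.get("text", "").lower()
    if any(topic.lower() in text for topic in c.private_topics):
        return None
    # Association-based: withhold it if, together with documents already
    # released, it would complete a group that is private as a whole.
    after = released_ids | {doc["id"]}
    if any(group <= after for group in c.private_associations):
        return None
    # Simple: release the document with private attributes removed.
    return {k: v for k, v in doc.items() if k not in c.private_attributes}


constraints = PrivacyConstraints(
    private_attributes={"ssn"},
    private_topics={"diagnosis"},
    private_associations=[frozenset({"d1", "d2"})],
)
released = set()
d1 = release({"id": "d1", "text": "Visit log", "ssn": "123"}, released, constraints)
print(d1)  # {'id': 'd1', 'text': 'Visit log'}
released.add("d1")
print(release({"id": "d2", "text": "Billing"}, released, constraints))  # None
```

Note that the association check depends on release history, which is one reason the slide lists managing history information as a challenge.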
    29. Architecture for Privacy Constraint Processing
       [Figure: components of the privacy constraint processing architecture]
       • User Interface Manager
       • Constraint Manager, managing the Privacy Constraints
       • Query Processor: constraints during query and release operations
       • Update Processor: constraints during update operations
       • Database Design Tool: constraints during database design operations
       • DBMS and Database
    30. Foundations of the Privacy Problem
       • Privacy Problem: given a database and a set of privacy constraints, can you decide ahead of time whether privacy will be violated; that is, can one extract private information through querying?
       • Is the general privacy problem unsolvable?
         ◦ Yes.
         ◦ To what extent?
           ▪ Research result: for every recursively enumerable degree, one can find a privacy problem that is one-one equivalent to that degree (paper in preparation)
       • What is the computational complexity of the privacy problem?
         ◦ Can one develop varying degrees of privacy classes? What is the space-time complexity?
    31. Privacy for the Web/Semantic Web
       • Privacy for the web is getting a lot of attention, especially after the publicity surrounding the DARPA program (Total Information Awareness, TIA)
       • We need to start looking at privacy for the semantic web as well; that is, what are the additional privacy concerns due to the semantic web?
       • Is privacy a technical problem? What roles do lawyers, policy makers and sociologists have to play?
         ◦ How can scientists and technologists, lawyers, policy makers and sociologists work together?
       • Should we limit privacy research to the context of national security, or extend it beyond (e.g., medical community, banking, IRS)?
       • We must follow up on the recent IBM workshop on privacy; discussions at NSF involve multiple programs
    32. Secure Federated Database Management for Data Sharing: Schema Integration
       (Adapted from Sheth and Larson, ACM Computing Surveys, September 1990)
       [Figure: schema integration architecture with the following schemas]
       • Component schemas for Components A, B and C
       • Local Schema 1, Local Schema 2
       • Generic schemas for Components A, B and C
       • Export schemas: Export Schema for Component A; Export Schemas I and II for Component B; Export Schema for Component C
       • Federated schemas: Federated Schema for FDS-1; Federated Schema for FDS-2
       • External schemas: External Schemas 1.1, 1.2, 2.1 and 2.2
    33. Secure Federated Database Management for Data Sharing: Policy Integration
       (Adapted from Thuraisingham, Computers and Security, December 1994)
       [Figure: five layers of policies]
       • Policies at the component level: e.g., component policies for Components A, B and C
       • Generic policies for the components: e.g., generic policies for Components A, B and C
       • Export policies for the components: e.g., export policies for Components A, B and C (note: a component may export different policies to different federations)
       • Federated policies: integrate the export policies of the components of the federation
       • External policies: policies for the various classes of users
    34. What are our challenges?
       • If the semantic web is to become viable, we need to understand how the different layers may interoperate; we cannot ignore security and privacy
       • Data mining, national security and privacy will dominate research because of the times we are living in
       • We don't have a good handle on secure dependable data/information management
         ◦ How do we handle conflicting requirements? E.g., integrating security, real-time processing and fault-tolerant computing
         ◦ Building dependable semantic webs?
       • Secure sensor nets, secure e-commerce systems and secure knowledge management will continue to pose many challenging research problems
       • We need to build systems based on solid theoretical foundations; composable systems (ensure interfaces are secure)
       • Interdisciplinary research is the way of the future, within CS as well as between CS and other areas (e.g., secure sensors)
    35. Some Key Directions
       • Transfer security technology to operational systems; need to develop systems that are flexible, usable and secure
         ◦ Bring human-computer interaction and people aspects into system design
       • Security for emerging applications
         ◦ E.g., medical informatics, bioinformatics, scientific and engineering informatics, and other areas
       • Data mining for security (e.g., intrusion detection, insider cyber threat); we cannot forget about privacy
       • Interdisciplinary research in information security
       • Emerging areas include secure semantic web, secure information integration, secure sensors, trust management/negotiation, economics, ...
    36. Other Ideas and Directions?
       • Please contact
         ◦ Dr. Bhavani Thuraisingham, The National Science Foundation, Suite 1115, 4201 Wilson Blvd, Arlington, VA 22230; Phone: 703-292-8930; Fax: 703-292-9037; email: bthurais@nsf.gov
    37. XML Security
       • Collaborating with the University of Milan; paper to appear in TKDE
       • Access Control
         ◦ Pull model: the user queries XML documents; results are computed by applying the access control rules in the policy base and the user credentials
         ◦ Push model: portions of XML documents are periodically pushed to the user depending on the credentials and access control rules
       • Secure publishing of XML documents
         ◦ With a set of digital signatures generated by the owner, and no trust required of the publisher, a subject can verify the authenticity of the query response
    38. Example XML Document
       [Figure: tree view of an example XML document. Node labels: Patents; Funds; Year: 2002; Name: U. Of X; Expenses; Name: CS; title; Author ID; Asset report; Assets; Dept; Equipment; news; Patent; Other assets; Grants; Contracts]
    39. Subject Credentials and Protection Objects
       • Subjects are given access to XML documents or portions of documents depending on user ID and/or credentials
       • Credential specification is based on credential types; a credential type is a pair <credential name, credential properties>
         ◦ Examples of credential types for the XML document are Professor and Secretary (depending on the roles)
       • Protection objects are objects to which access is controlled
         ◦ Entire XML documents or portions of XML documents
         ◦ A protection object is a pair <target, path>
         ◦ Target is the file name of the XML document
         ◦ Path is an XPath expression on the target
    40. Credential Base
       <Professor credID="9" subID="16" CIssuer="2">
         <name>Alice Brown</name>
         <university>University of X</university>
         <department>CS</department>
         <research-group>Security</research-group>
       </Professor>
       <Secretary credID="12" subID="4" CIssuer="2">
         <name>John James</name>
         <university>University of X</university>
         <department>CS</department>
         <level>Senior</level>
       </Secretary>
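Credential expressions in the policy base, such as `//Professor[department = 'CS']`, select subjects from this credential base. This sketch evaluates such expressions with Python's `xml.etree.ElementTree`, which supports only a subset of XPath; the wrapping `CredentialBase` root element and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# The credential base from the slide, wrapped in a hypothetical root element
# so that it parses as a single XML document.
CREDENTIAL_BASE = """
<CredentialBase>
  <Professor credID="9" subID="16" CIssuer="2">
    <name>Alice Brown</name>
    <university>University of X</university>
    <department>CS</department>
    <research-group>Security</research-group>
  </Professor>
  <Secretary credID="12" subID="4" CIssuer="2">
    <name>John James</name>
    <university>University of X</university>
    <department>CS</department>
    <level>Senior</level>
  </Secretary>
</CredentialBase>
"""


def subjects_matching(cred_base_xml, cred_expr):
    """Return the subID of every credential selected by cred_expr.

    ElementTree needs a relative path (".//" instead of "//"), and
    [child='text'] compares the child's full text exactly, so any
    surrounding whitespace in the text would prevent a match.
    """
    root = ET.fromstring(cred_base_xml.strip())
    return [cred.get("subID") for cred in root.findall(cred_expr)]


print(subjects_matching(CREDENTIAL_BASE, ".//Professor[department='CS']"))  # ['16']
```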
    41. Policy Base
       • The policy base stores security policies for protecting the XML source contents
       • The policy base is an XML document with a policy-spec subelement for each security policy defined for the XML source
       • A policy-spec has the following:
         ◦ Subject, consisting of user ID and/or credentials
         ◦ Object (with target and path)
         ◦ Access modes: Read, Navigate, Append, Write
         ◦ Propagation option: No propagation, One level, Cascade
       • The security officer manages the policy base
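One plausible reading of the three propagation options, assumed here rather than taken from the slide: "No propagation" covers only the nodes selected by the path, "One level" adds their direct children, and "Cascade" adds all their descendants. The enum and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class Propagation(Enum):
    NO_PROPAGATION = "no propagation"
    ONE_LEVEL = "one level"
    CASCADE = "cascade"


def authorized_nodes(root, path, option):
    """Nodes covered by a policy-spec with the given path and propagation option."""
    covered, seen = [], set()
    for node in root.findall(path):
        if option is Propagation.NO_PROPAGATION:
            candidates = [node]
        elif option is Propagation.ONE_LEVEL:
            candidates = [node, *node]  # the node and its direct children
        else:
            candidates = list(node.iter())  # the node and all its descendants
        for n in candidates:
            if id(n) not in seen:  # report a node only once
                seen.add(id(n))
                covered.append(n)
    return covered


doc = ET.fromstring(
    '<Patents><Patent Dept="CS"><title>T</title>'
    "<authors><author>A. Brown</author></authors></Patent></Patents>"
)
for option in Propagation:
    tags = [n.tag for n in authorized_nodes(doc, ".//Patent[@Dept='CS']", option)]
    print(option.value, tags)
```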
    42. Policy Base Example
       <?xml version="1.0" encoding="utf-8"?>
       <Policy-base>
         <policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='CS']//node()" priv="VIEW"/>
         <policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='EE']/Short-descr/node() | //Patent[@Dept='EE']/authors" priv="VIEW"/>
         <policy-spec cred-expr = - - - -
         <policy-spec cred-expr = - - - -
       </Policy-base>
       Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department.
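The effect of the two example policy-specs on a CS professor's view of the document can be sketched with `xml.etree.ElementTree`. The report content below is hypothetical, ElementTree's limited XPath stands in for full XPath, and the sketch approximates `node()` by copying whole elements; it also keeps an empty `Patent` container for EE patents so the view stays well-formed, which is an assumption rather than something the policy states.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical content for annual_report.xml; only the names used in the
# policy paths (Patent, @Dept, Short-descr, authors) come from the slide.
ANNUAL_REPORT = """
<annual_report>
  <Patent Dept="CS">
    <title>Secure XML publishing</title>
    <Short-descr>Signatures for third-party publishing</Short-descr>
    <authors>A. Brown</authors>
    <Long-descr>Full technical description</Long-descr>
  </Patent>
  <Patent Dept="EE">
    <title>Low-power sensor</title>
    <Short-descr>Sensor chip</Short-descr>
    <authors>B. Smith</authors>
    <Long-descr>Circuit details</Long-descr>
  </Patent>
</annual_report>
"""


def cs_professor_view(report_xml):
    """Apply the two example policy-specs and return the resulting view."""
    root = ET.fromstring(report_xml.strip())
    view = ET.Element(root.tag)
    # Policy 1: //Patent[@Dept='CS']//node(), i.e. CS patents in full.
    for patent in root.findall(".//Patent[@Dept='CS']"):
        view.append(copy.deepcopy(patent))
    # Policy 2: only Short-descr and authors of EE patents.
    for patent in root.findall(".//Patent[@Dept='EE']"):
        stub = ET.SubElement(view, "Patent", dict(patent.attrib))
        for tag in ("Short-descr", "authors"):
            for child in patent.findall(tag):
                stub.append(copy.deepcopy(child))
    return view


print(ET.tostring(cs_professor_view(ANNUAL_REPORT), encoding="unicode"))
```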
    43. Access Control Strategy
       • Subjects request access to XML documents under two modes: browsing and authoring
         ◦ With browsing access, a subject can read/navigate documents
         ◦ Authoring access is needed to modify, delete or append to documents
       • The access control module checks the policy base and applies the policy specs
       • Views of the document are created based on credentials and policy specs
       • In case of conflict, the least access privilege rule is enforced
    44. System Architecture for Access Control [Figure: User (pull/query, push/result), X-Access, X-Admin, Admin Tools, XML Documents, Policy base, Credential base]
    45. Secure Publishing of XML Documents <ul><li>Distinguish between owner, publisher, and user (subject) </li></ul><ul><li>The owner specifies access control policies based on user credentials; the policies are stored in the policy base </li></ul><ul><li>The publisher computes the view of the document and sends a reply document to the subject; because signatures are used, no trust needs to be placed in the publisher </li></ul>[Figure: the Subject subscribes with the Owner and receives its Policy; the Owner sends the Security Enhanced Document and Secure Structure to the Publisher; the Subject sends Query, Policy to the Publisher; the Publisher returns Reply document, Secure structure]
    46. Subject Owner Interaction <ul><li>Subjects register with the owner during the subscription phase; during this phase the owner assigns credentials to the subject, and these credentials are stored at the owner site </li></ul><ul><li>The owner returns to the subject the Subject Policy Configuration (the identifiers of the policies that apply to the subject), signed with the owner's private key </li></ul><ul><li>Example: if policies P1 and P2 apply to John and policy P6 applies to Jane, owner Joe sends John P1 and P2 and sends Jane P6, each signed with Joe's private key </li></ul>
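To make the signing step concrete, the Python sketch below signs a subject policy configuration with hash-then-sign textbook RSA and checks it with the owner's public key. The key is a toy assumption, built from two well-known Mersenne primes with no padding, so that the example runs with only the standard library. It is not secure. A real deployment would use a vetted signature scheme and library.

```python
import hashlib

# Toy RSA key pair for illustration only. The primes are well-known Mersenne
# primes and no padding scheme is used, so this is NOT a secure signature.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                           # owner's public modulus
E = 65537                           # owner's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # owner's private exponent (Python 3.8+)


def _digest(subject: str, policies: frozenset[str]) -> int:
    """Hash a canonical encoding of (subject, policy identifiers) into Z_N."""
    message = subject + ":" + ",".join(sorted(policies))
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign_configuration(subject: str, policies: frozenset[str]) -> int:
    """Owner side: sign the subject policy configuration with the private key."""
    return pow(_digest(subject, policies), D, N)


def verify_configuration(subject: str, policies: frozenset[str], signature: int) -> bool:
    """Anyone holding the owner's public key (N, E) can check the signature."""
    return pow(signature, E, N) == _digest(subject, policies)


# Owner Joe signs the configurations from the example.
john_sig = sign_configuration("John", frozenset({"P1", "P2"}))
jane_sig = sign_configuration("Jane", frozenset({"P6"}))
print(verify_configuration("John", frozenset({"P1", "P2"}), john_sig))        # True
print(verify_configuration("John", frozenset({"P1", "P2", "P6"}), john_sig))  # False
```

Binding the subject name into the signed message prevents John from presenting Jane's signed configuration as his own.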
    47. Owner Publisher Interaction <ul><li>For each document, the owner sends the publisher information on which subjects can access which portions of the document according to the policy base (i.e., the access control policies) </li></ul><ul><ul><li>For each element e, the owner also inserts a policy configuration derived from the policies that apply to e: a binary string converted to hexadecimal representation, called the policy configuration attribute (PC attribute) </li></ul></ul><ul><ul><li>A policy element describing the policies for the document is also inserted </li></ul></ul><ul><li>The owner also sends the publisher the Merkle signature of each document </li></ul><ul><ul><li>This is the Merkle hash of the document signed with the owner's private key </li></ul></ul><ul><li>The document together with this security information is called the "Security Enhanced Document" (SE-XML) </li></ul><ul><li>The information in the security enhanced document enables the subject to verify the authenticity of the document returned by the publisher </li></ul><ul><li>Additional information encoded in the document, called the Secure Structure, is used by the subject to verify the completeness of the result (for certain queries) </li></ul>
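The PC attribute encoding can be sketched in Python as follows. The sketch assumes that the leftmost bit corresponds to the first policy in a fixed, ordered list of the document's policies; the slide does not fix the bit order. The access rule in may_access is likewise an assumption: it grants access when any policy in the subject's configuration applies to the element.

```python
def pc_attribute(policy_ids: list[str], applicable: set[str]) -> str:
    """Encode the policies that apply to an element as a hex string.

    Bit i (leftmost first) is 1 iff policy_ids[i] applies to the element.
    """
    bits = "".join("1" if pid in applicable else "0" for pid in policy_ids)
    width = (len(bits) + 3) // 4  # number of hex digits needed
    return format(int(bits, 2), f"0{width}x")


def applicable_policies(policy_ids: list[str], pc: str) -> set[str]:
    """Decode a PC attribute back into the set of policy identifiers."""
    bits = format(int(pc, 16), f"0{len(policy_ids)}b")
    return {pid for pid, bit in zip(policy_ids, bits) if bit == "1"}


def may_access(policy_ids: list[str], pc: str, subject_config: set[str]) -> bool:
    """Assumed rule: access if any policy of the subject applies to the element."""
    return bool(applicable_policies(policy_ids, pc) & subject_config)


POLICIES = [f"P{i}" for i in range(1, 9)]      # P1 ... P8
pc = pc_attribute(POLICIES, {"P1", "P3"})
print(pc)                                      # a0
print(may_access(POLICIES, pc, {"P1", "P2"}))  # True  (John)
print(may_access(POLICIES, pc, {"P6"}))        # False (Jane)
```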
    48. Subject Publisher Interaction <ul><li>The subject submits queries to the publisher, together with its subject policy configuration </li></ul><ul><li>The publisher computes a view of the requested documents based on the access control policies that the owner set for the subject </li></ul><ul><li>To verify the authenticity of the answer, the subject recomputes the bottom-up hash value that the owner signed (i.e., the Merkle hash) and compares it with the Merkle signature generated by the owner and inserted by the publisher </li></ul><ul><li>The subject may not receive the entire document; therefore the publisher also sends the subject the hash values of the missing portions of the document </li></ul><ul><ul><li>The hash value of a parent is computed from the hash values of its children and the hash values of its tag name and value; the publisher sends enough information for the subject to compute the hash value of the whole document </li></ul></ul><ul><li>The subject thereby verifies the authenticity of the document </li></ul>
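The verification described above can be sketched in Python as follows. Three choices are illustrative assumptions: the Node class, the use of SHA-256, and replacing a withheld subtree by a hash-only node with an empty tag. The step where the subject checks the owner's signature on the root hash is omitted; the sketch compares against the owner's hash directly.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Optional


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    tag: str
    value: str = ""
    children: list["Node"] = field(default_factory=list)
    # Set only in a reply: the Merkle hash of a subtree withheld from the subject.
    pruned_hash: Optional[bytes] = None


def merkle(node: Node) -> bytes:
    """Bottom-up hash h(h(tag) || h(value) || Mh(child_1) || ... || Mh(child_k))."""
    if node.pruned_hash is not None:
        return node.pruned_hash
    data = h(node.tag.encode()) + h(node.value.encode())
    for child in node.children:
        data += merkle(child)
    return h(data)


def publisher_reply(node: Node, visible: Callable[[Node], bool]) -> Node:
    """Copy node, replacing each subtree the subject may not see by its hash."""
    if not visible(node):
        return Node(tag="", pruned_hash=merkle(node))
    return Node(node.tag, node.value, [publisher_reply(c, visible) for c in node.children])


def subject_verifies(reply: Node, owner_hash: bytes) -> bool:
    """Recompute the root hash from the reply and compare with the owner's hash."""
    return merkle(reply) == owner_hash


article = Node("article", children=[
    Node("title", "Budget"),
    Node("author", "Rossi"),
    Node("paragraph", "Confidential draft"),
])
owner_hash = merkle(article)  # the value the owner would sign
reply = publisher_reply(article, lambda n: n.tag != "paragraph")
print(subject_verifies(reply, owner_hash))  # True
reply.children[0].value = "Altered"         # publisher tampers with the title
print(subject_verifies(reply, owner_hash))  # False
```

The subject never sees "Confidential draft", yet it can still recompute the root hash because the publisher supplies that subtree's hash.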
    49. Merkle Signature: Example MhX(Author) = h(h(Author) || h(Author.value)); MhX(title) = h(h(title) || h(title.value)); MhX(paragraph) = h(h(paragraph) || h(paragraph.content) || MhX(Author) || MhX(title)) [Figure: example Newspaper document tree with nodes Frontpage, Leading, Politic_page, Literary_page, Sport_page, Article, news, Politic, Paragraphs, paragraph, title, Author, topic, date]
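Generalizing the three formulas above gives the rule below. It is offered as a reading of the example rather than a definition from the source. Take an element e with child nodes c_1, ..., c_k in document order (k = 0 for a leaf such as Author). Its Merkle hash combines the hashes of its tag name and content with the Merkle hashes of its children. The Merkle signature is the root's Merkle hash signed with the owner's private key.

$$
\begin{aligned}
Mh_X(e) &= h\big(h(\mathrm{tag}(e)) \,\|\, h(\mathrm{content}(e)) \,\|\, Mh_X(c_1) \,\|\, \cdots \,\|\, Mh_X(c_k)\big) \\
\text{Merkle signature of } X &= \mathrm{Sign}_{SK_{\mathrm{owner}}}\big(Mh_X(\mathrm{root}(X))\big)
\end{aligned}
$$

With k = 0 this reduces to MhX(Author) = h(h(Author) || h(Author.value)). With c_1 = Author and c_2 = title it gives MhX(paragraph) as on the slide.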
    50. Some Results <ul><li>Theorem 1: Let g = (Vg, vg, Eg, FEg) be the SE-XML version of an XML document d, and let r = (Vr, vr, Er, FEr) be the reply document corresponding to a query submitted on d by subject s. Then each node in Vr,e is authenticable by s. Here a document d = (Vd, vd, Ed, FEd) is defined as follows: Vd is the set of all element nodes and attribute nodes in d, vd is the node representing the document element (called the document root), Ed is the set of edges in d, and FEd is the edge labeling function; Vr,e is the set of element nodes in the reply document r </li></ul><ul><li>Subject Verification Algorithm: Input: reply document r = (Vr, vr, Er, FEr). Output: True if all nodes in r are authentic, False otherwise </li></ul><ul><li>Theorem 2: Let s be a subject, q be a query submitted by s, and r be the reply document received by s as an answer to q. The subject verification algorithm returns True iff each v in Vr,e is authentic, where Vr,e is the set of element nodes in the reply document r </li></ul>
    51. Note on Completeness <ul><li>The owner sends the publisher a structure of the XML document, called the secure structure, which contains the names of tags and attributes but not the data content </li></ul><ul><li>The publisher sends the secure structure together with the reply document to the subject </li></ul><ul><li>The subject locally executes on the secure structure every query whose conditions refer to the structure of the original document, and compares the results with the reply document </li></ul><ul><li>Key points </li></ul><ul><ul><li>The secure structure is generated by hashing each tag and attribute name; it also contains the hashed attribute values of the XML document </li></ul></ul><ul><ul><li>The secure structure also contains the policy element and the policy configuration attributes of elements (not hashed) </li></ul></ul><ul><ul><li>Completeness is guaranteed for queries on the structure and for equality conditions on attribute values </li></ul></ul><ul><li>Challenge: extending completeness to more general queries </li></ul>
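The following minimal Python sketch checks completeness for one query shape, //tag[@attr = 'value']. It assumes SHA-256 for hashing and a hypothetical PC attribute carrying the policy configuration in hexadecimal. It also assumes a subject that holds only the policy encoded by bit 0x80. The subject counts the matching, accessible elements in the secure structure and compares that count with the reply.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable


def hx(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()


@dataclass
class SecureNode:
    tag: str                  # hashed tag name
    attrs: dict[str, str]     # hashed attribute name -> hashed attribute value
    pc: str                   # policy configuration attribute, kept in clear
    children: list["SecureNode"] = field(default_factory=list)


def secure_structure(elem: ET.Element) -> SecureNode:
    """Owner side: hash names and attribute values, drop text content."""
    attrs = {hx(k): hx(v) for k, v in elem.attrib.items() if k != "PC"}
    return SecureNode(hx(elem.tag), attrs, elem.get("PC", "0"),
                      [secure_structure(c) for c in elem])


def expected_count(node: SecureNode, tag: str, attr: str, value: str,
                   allowed: Callable[[str], bool]) -> int:
    """Subject side: evaluate //tag[@attr='value'] on the secure structure."""
    hit = (node.tag == hx(tag) and node.attrs.get(hx(attr)) == hx(value)
           and allowed(node.pc))
    return int(hit) + sum(expected_count(c, tag, attr, value, allowed)
                          for c in node.children)


def is_complete(reply: ET.Element, secure: SecureNode, tag: str, attr: str,
                value: str, allowed: Callable[[str], bool]) -> bool:
    got = sum(1 for e in reply.iter(tag) if e.get(attr) == value)
    return got == expected_count(secure, tag, attr, value, allowed)


def subject_allowed(pc: str) -> bool:
    return (int(pc, 16) & 0x80) != 0  # subject holds policy P1 only (assumed)


original = ET.fromstring(
    '<report><Patent Dept="EE" PC="80"/><Patent Dept="EE" PC="80"/>'
    '<Patent Dept="CS" PC="80"/><Patent Dept="EE" PC="40"/></report>'
)
secure = secure_structure(original)
full = ET.fromstring('<report><Patent Dept="EE"/><Patent Dept="EE"/></report>')
partial = ET.fromstring('<report><Patent Dept="EE"/></report>')
print(is_complete(full, secure, "Patent", "Dept", "EE", subject_allowed))     # True
print(is_complete(partial, secure, "Patent", "Dept", "EE", subject_allowed))  # False
```

The check catches a publisher that silently drops an authorized element. Detecting forged or altered elements is the job of the authenticity check.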
