UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, PRM, structure, parameter and incremental learning.
This presentation covers UnBBayes-PRM, a plugin for UnBBayes that provides a simple implementation of Probabilistic Relational Models (PRMs).
This presentation was given by Shou Matsumoto of the University of Brasilia, Brazil, via web conference to PhD students at George Mason University in the US, at the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/), on October 29, 2010.
Knowledge Representation in Artificial Intelligence, by Yasir Khan
This document discusses different methods of knowledge representation in artificial intelligence, including logical representations, semantic networks, production rules, and frames. Logical representations use formal logics like propositional logic and first-order predicate logic to represent facts and relationships. Semantic networks represent knowledge graphically as nodes and edges to model concepts and their relationships. Production rules represent knowledge as condition-action pairs to model problem-solving. Frames represent stereotyped situations as templates with slots to model attributes and behaviors. Choosing the right knowledge representation method is important for building successful AI systems.
The document discusses different knowledge representation schemes used in artificial intelligence systems. It describes semantic networks, frames, propositional logic, first-order predicate logic, and rule-based systems. For each technique, it provides facts about how knowledge is represented and examples to illustrate their use. The goal of knowledge representation is to encode knowledge in a way that allows inferencing and learning of new knowledge from the facts stored in the knowledge base.
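The condition-action pairs mentioned above can be illustrated with a minimal forward-chaining rule engine (an illustrative sketch only; the rule format and fact names are invented for this example):

```python
# Minimal forward-chaining production-rule engine.
# Rules are (conditions, conclusion) pairs; facts are plain strings.

def forward_chain(facts, rules):
    """Fire condition-action rules until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)   # the rule "fires", asserting a new fact
                changed = True
    return facts

rules = [
    (("has_feathers", "lays_eggs"), "is_bird"),
    (("is_bird", "cannot_fly"), "is_penguin"),
]
derived = forward_chain({"has_feathers", "lays_eggs", "cannot_fly"}, rules)
# "is_bird" is derived first, which then enables deriving "is_penguin"
```

The fixed-point loop is what lets chains of rules build on each other's conclusions, which is the inferencing behavior these summaries describe.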
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr..., by Andre Freitas
Tasks such as question answering and semantic search are dependent on the ability of querying & reasoning over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as the increase in schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying & reasoning over heterogeneous knowledge bases (KBs). The approach can be used for approximative reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario for the approach. The approach focuses on addressing the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning & querying context and (ii) allowing coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
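The selection mechanism can be sketched in miniature: keep only KB edges whose target term is distributionally close to the query context. The vectors, terms, and threshold below are invented for illustration; in the paper they come from a corpus-derived distributional semantic model over ConceptNet:

```python
import math

# Toy distributional vectors (invented for this sketch).
vectors = {
    "car":    [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_facts(query_term, facts, threshold=0.5):
    """Keep KB triples whose target is distributionally close to the query."""
    q = vectors[query_term]
    return [f for f in facts if cosine(q, vectors[f[2]]) >= threshold]

facts = [("car", "HasA", "engine"), ("car", "RelatedTo", "banana")]
selected = select_facts("car", facts)
# Only the ("car", "HasA", "engine") edge survives the semantic filter
```

Pruning navigation this way is what gives the approach its selectivity: irrelevant branches of the graph are never expanded.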
Introduction to Distributional Semantics, by Andre Freitas
This document provides an introduction to distributional semantics. It discusses how distributional semantic models (DSMs) represent word meanings as vectors based on their linguistic contexts in large corpora. The distributional hypothesis states that words that appear in similar contexts tend to have similar meanings. The document outlines how DSMs are built, important parameters like context type and weighting, and examples like latent semantic analysis. It also discusses how DSMs can support applications like semantic search. Finally, it introduces how compositional semantics explores representing the meanings of phrases and sentences compositionally based on the meanings of their parts.
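The core construction can be shown on a toy corpus: count each word's context words within a small window. This is the distributional hypothesis in miniature; real DSMs use large corpora plus weighting and dimensionality reduction (e.g. PPMI, SVD, as in latent semantic analysis):

```python
from collections import Counter

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of its context words (a sparse vector)."""
    vecs = {}
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            ctx = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts such as "the" and "drinks",
# so their vectors overlap more than those of unrelated word pairs.
```

Similarity between two words is then computed between their context vectors (typically with cosine similarity), which is what makes the model usable for semantic search.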
This document summarizes different theories of how knowledge is organized in memory. It discusses declarative versus procedural knowledge, with declarative being "knowing that" facts and procedural being "knowing how" to perform skills. Concepts, categories, networks and schemas are reviewed as ways to organize declarative knowledge. Prototype and exemplar theories are described as alternatives to defining categories solely based on necessary features. The ACT-R model integrates propositional networks to represent declarative knowledge and production systems for procedural knowledge.
Knowledge representation is a field of artificial intelligence that represents information about the world in a way that a computer system can understand to perform complex tasks. It simplifies complex systems through modeling human psychology and problem-solving. Examples of knowledge representation include semantic nets, frames, rules, and ontologies. Knowledge representation allows for automated reasoning about represented knowledge and asserting new knowledge. While first-order logic provides powerful and compact representation, it lacks ease of use and practical implementation for real-world problems. Effective knowledge representation requires balancing expressive power with practical considerations like execution efficiency.
The document provides an overview of artificial intelligence and knowledge-based systems. It discusses definitions of intelligence and AI, as well as knowledge representation schemes like logical, procedural, semantic network, and frame-based representations. The key components of a knowledge-based system are described as the knowledge base, which represents problem domain knowledge, and the inference engine, which uses reasoning techniques to solve problems. Ideal features of knowledge-based systems include efficient problem-solving using knowledge, heuristics, and eliminating unproductive solutions.
This document provides an overview and introduction to the course "Knowledge Representation & Reasoning" taught by Ms. Jawairya Bukhari. It discusses the aims of developing skills in knowledge representation and reasoning using different representation methods. It outlines prerequisites like artificial intelligence, logic, and programming. Key topics covered include symbolic and non-symbolic knowledge representation methods, types of knowledge, languages for knowledge representation like propositional logic, and what knowledge representation encompasses.
A Fuzzy Valid-Time Model for Relational Databases Within the Hibernate Framework, by José Enrique Pons
Time in databases has been studied for a long time. Valid-time databases capture when objects are true in reality. The proposed model allows both representing and querying time in a fuzzy way. The representation and the underlying domain are defined, as well as some fuzzy temporal operators. The model is implemented within the Hibernate framework, which acts as an abstraction over the running database. Therefore, any relational database supported by the framework can now represent fuzzy valid time in its schema.
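A fuzzy valid-time interval is commonly modeled as a trapezoidal possibility distribution over time. The sketch below illustrates that idea only; the names and numbers are invented, and this is not the Hibernate-based implementation described in the paper:

```python
# Trapezoidal membership function for a fuzzy time interval:
# membership is 0 before a, rises linearly to 1 over [a, b],
# stays 1 on the core [b, c], and falls back to 0 over [c, d].

def trapezoid(a, b, c, d):
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# "Valid roughly from 1995 to 2000, with one year of uncertainty at each end"
valid = trapezoid(1994, 1995, 2000, 2001)
# valid(1997) -> 1.0 (fully inside), valid(1994.5) -> 0.5 (boundary region)
```

Fuzzy temporal operators (overlap, before, during) can then be defined by combining such membership degrees rather than crisp boolean comparisons.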
This document provides an introduction to knowledge representation in artificial intelligence. It discusses how knowledge representation and reasoning forms the basis of intelligent behavior through computational means. The key types of knowledge that need to be represented are defined, including objects, events, facts, and meta-knowledge. Different types of knowledge such as declarative, procedural, structural and heuristic knowledge are explained. The importance of knowledge representation for modeling intelligent behavior in agents is highlighted. The requirements for effective knowledge representation including representational adequacy, inferential adequacy, inferential efficiency, and acquisitional efficiency are outlined. Propositional logic is introduced as the simplest form of logic using propositions.
The document discusses various topics in artificial intelligence including the Turing test, knowledge representation using semantic networks and search trees, expert systems, neural networks, natural language processing, robotics, and ethical issues. It provides examples and explanations of each topic to demonstrate key concepts in AI such as how knowledge is represented, how expert systems make inferences, how neural networks are trained, and challenges with natural language comprehension. The chapter aims to distinguish problems humans solve best from those computers solve best and define important AI terms and techniques.
The document discusses knowledge representation in cognitive psychology. It defines knowledge and describes two main types: declarative and procedural knowledge. Declarative knowledge refers to static facts and information stored in memory, while procedural knowledge involves skills and how to perform tasks or activities. The document also explains several methods for representing declarative knowledge, including concepts and schemas, frames, and semantic networks. Frames organize knowledge into attribute-value pairs, while semantic networks use a graph structure to represent relationships between concepts. Overall, the document provides an overview of knowledge representation and different models for encoding declarative and procedural information.
UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, structure, parameter and incremental learning.
This presentation covers UnBBayes version 4.0.0, the first version to support plugins. It presents the major concepts behind the Plugin Framework, its features and benefits, applications, some sample plugins, the specification, extension points, and availability.
This presentation was given by Shou Matsumoto from the University of Brasilia in Brazil via web conference to PhD students at George Mason University in the US on the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/).
UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, structure, parameter and incremental learning.
The overview is presented through a potpourri of slides from different presentations that the Artificial Intelligence Group (GIA) of the University of Brasilia (UnB) has given since 1999. It covers BN, ID, MSBN, UnBBayes Server, and MEBN.
This presentation was given by Rommel Carvalho when he started his PhD at George Mason University on the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/).
The SPE-PRMS system provides a framework for classifying oil and gas resources with two levels of uncertainty. The first is the uncertainty around discovery volumes, and the second is around the commercial viability of discoveries. Resources are classified as reserves, contingent resources, or prospective resources. Reserves have been discovered and proven commercial through drilling. Contingent resources are discovered but not proven commercial. Prospective resources are undiscovered but thought to have potential for commercial production. The PRMS system defines reserves as quantities that can be commercially produced, meeting criteria around discovery, producibility, commercial viability, and existing development plans. Reserves estimates are provided as 1P, 2P, and 3P values, each with a different probability that actual production will meet or exceed the estimate.
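Under PRMS, the probabilistic reading of 1P, 2P, and 3P corresponds to P90, P50, and P10: volumes with a 90%, 50%, and 10% probability of being met or exceeded. A Monte Carlo sketch (the lognormal parameters are invented, purely for illustration):

```python
import random

random.seed(42)
# Simulate recoverable-volume uncertainty with an invented lognormal model
samples = sorted(random.lognormvariate(mu=3.0, sigma=0.4) for _ in range(10_000))

def p_exceed(sorted_samples, prob):
    """Volume that a fraction `prob` of the simulated outcomes meet or exceed."""
    idx = int(len(sorted_samples) * (1 - prob))
    return sorted_samples[idx]

p1 = p_exceed(samples, 0.90)   # 1P: conservative estimate (P90)
p2 = p_exceed(samples, 0.50)   # 2P: best estimate (P50)
p3 = p_exceed(samples, 0.10)   # 3P: optimistic estimate (P10)
# By construction p1 <= p2 <= p3: the more confident the claim,
# the smaller the volume that can be claimed.
```

This ordering is why 1P is the figure lenders and auditors focus on, while 3P captures upside potential.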
Resources and Reserves are the foundational assets of every E&P company. The valuation of E&P companies is based on these numbers in their books; they are the most important asset any E&P company has. In the Fortune Global 500 (http://fortune.com/global500/), positions 2, 3, 4, 5, 6, 11, and 12 are E&P companies.
This presentation gives an overview of what goes into this most important number. You may need to review some basic terms and background to get a fuller understanding.
What are my 3P Reserves? Haas Petroleum Engineering Services, by haasengineering
What is the best way to estimate your 3P reserves? President of Haas Petroleum Engineering Services Thad Toups gave this presentation on Haas' internal analytics and auditing methodology.
Geomodelling, resource & reserve estimation using mining software, by Chandra Bose
The document provides an overview of geomodelling, resource and reserve estimation, and pit optimization for mining projects. It discusses how borehole data, lithology, mineralization, and quality data are used in geomodelling software to create 3D geological models and cross sections. Resource and reserve estimation involves categorizing resources, estimating densities, recovery factors, and cut-off depths to determine geological, mineable, and extractable reserves. Pit optimization software is used to design optimal open pit mine plans that consider pit boundaries, slopes, benches, and production schedules to maximize profitability over the life of the mine.
Ouvidoria de Balcão vs Ouvidoria Digital: Desafios na Era Big Data, by Rommel Carvalho
Presentation given on 14/03/2017 by Rommel N. Carvalho at the 2017 Semana de Ouvidoria e Acesso à Informação, organized by CGU.
YouTube: https://youtu.be/vNMtULu5X1c?t=3h20m24s
Como transformar servidores em cientistas de dados e diminuir a distância ent..., by Rommel Carvalho
Talk given by Dr. Rommel Novaes Carvalho, General Coordinator of the Observatório da Despesa Pública and professor in the Professional Master's in Applied Computing at UnB.
Event: Brasil 100% Digital: Integração e transparência a serviço da sociedade
Website: http://www.brasildigital.gov.br/
Date: 10/11/2016
Video: https://www.youtube.com/watch?v=3WYQlPR-RLw&feature=youtu.be&t=2h4m44s
Proposta de Modelo de Classificação de Riscos de Contratos Públicos, by Rommel Carvalho
The document proposes three models for assessing the risk of public contracts: 1) a supervised-learning model to classify supplier risk based on variables such as political donations and history of sanctions; 2) a second model to classify contract risk based on aspects such as competitiveness and complexity; and 3) a multi-criteria model to select audit cases based on contract risk, company risk, and logistical considerations.
Categorização de achados em auditorias de TI com modelos supervisionados e nã..., by Rommel Carvalho
Talk given by Patrícia Maia at the 2nd Seminar on Data Analysis in Public Administration @ http://www.brasildigital.gov.br/
Abstract: The work consisted of applying text mining techniques to identify the main subjects addressed in audits from the last five years. Two approaches were used: a supervised approach, applying text classification with the Random Forest algorithm, and an unsupervised approach, using the Latent Dirichlet Allocation (LDA) topic modeling technique. The pilot project was validated on IT findings and is now being extended to findings on other subjects. The goal is to catalog the history of issued findings and to automatically categorize new records. With this, civil servants will be able to retrieve similar past situations for use in new engagements, or to address recurring problems in a structural way. The same logic can also be used to generate knowledge from other kinds of text: requests under the Access to Information Law, e-OUV complaints, cases analyzed by the CRG, news of interest to the agency, and so on.
Speaker: Patrícia Maia, Ministério da Transparência, Fiscalização e Controle
Bio: She holds a master's degree in Applied Computing from the University of Brasília (UnB), a specialization in Process Modeling and Requirements Engineering from the Federal University of Rio Grande do Sul (UFRGS), and an undergraduate degree in Information Technology. She has professional experience in text mining, ETL, databases, and government oversight. She currently works at the Ministério da Transparência, Fiscalização e Controle (MTFC), in the Diretoria de Pesquisas e Informações Estratégicas.
Mapeamento de risco de corrupção na administração pública federal, by Rommel Carvalho
The document describes a Brazilian government project to map corruption risk in the federal public administration through the analysis and mining of data on public servants and government units. The project uses advanced machine learning techniques and statistical analysis of large datasets to produce reliable corruption-risk indicators. The ultimate goal is to provide a strategic tool for preventing and fighting corruption proactively.
1) The Observatório da Despesa Pública uses data science techniques to identify risks of fraud and irregularities in public spending and to support decision-making by public managers.
2) Projects such as the Mapa de Risco de Fornecedores, Análise Preventiva de Contratações, and Triagem Automática de Denúncias use predictive analytics to prevent risky situations.
3) The Banco de Preços da APF enables market research and the identification of overpricing in contracts.
Aplicação de técnicas de mineração de textos para classificação automática de..., by Rommel Carvalho
The use of automatic text classification has become increasingly common in recent years. However, when working with classification at large scale, complexity increases considerably. A case study was carried out on the triage of complaints at the Controladoria-Geral da União, involving a large number of categories to be classified. The proposed solution employed machine learning and multilabel classification. These techniques aimed at building a model capable of overcoming the difficulties inherent to this context, yielding significant gains.
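Multilabel classification, as used in that case study, lets one document receive several categories at once. A minimal scikit-learn sketch with toy complaints and invented labels (not the CGU categories):

```python
# One binary classifier per label (one-vs-rest) over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "overpriced purchase of medical supplies",
    "nepotism in hiring at the local office",
    "overpriced contract awarded to a relative of the manager",
    "missing receipts for travel expenses",
]
labels = [{"overpricing"}, {"nepotism"}, {"overpricing", "nepotism"}, {"accounting"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # one 0/1 column per category
X = TfidfVectorizer().fit_transform(texts)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict(X)                  # a row can have several 1s at once
```

With hundreds of categories, the same structure scales by adding label columns, which is exactly where the complexity the summary mentions comes from: rare labels have few positive examples to learn from.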
Patrícia Helena Maia Alves de Andrade, Controladoria-Geral da União
Finance and Control Analyst at CGU, working on text mining and data analysis at the Diretoria de Pesquisa e Informações Estratégicas. She is currently completing the Professional Master's in Applied Computing at the University of Brasília.
Filiação partidária e risco de corrupção de servidores públicos federais, by Rommel Carvalho
The document discusses the use of machine learning to analyze the relationship between political party affiliation and corruption risk among Brazilian federal public servants. The data showed a positive correlation between party affiliation and corruption cases. A random forest model achieved the best results, identifying key variables such as length of affiliation and reason for membership cancellation.
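The "identifying key variables" step typically relies on the random forest's feature importances. A sketch on synthetic data (the features merely mimic the variables mentioned; this is not the real dataset):

```python
# Rank risk factors via random-forest feature importances (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
years_affiliated = rng.integers(0, 30, n)
cancel_reason = rng.integers(0, 3, n)            # encoded categorical
noise = rng.normal(size=n)                       # irrelevant feature
# Synthetic target: risk driven mostly by affiliation time
y = (years_affiliated + rng.normal(scale=5.0, size=n) > 15).astype(int)

X = np.column_stack([years_affiliated, cancel_reason, noise])
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = dict(zip(["years_affiliated", "cancel_reason", "noise"],
                   rf.feature_importances_))
# The feature that actually drives the target dominates the ranking
```

Importances sum to 1, so they give a relative (not causal) ordering of how much each variable contributes to the model's splits.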
Uso de mineração de dados e textos para cálculo de preços de referência em co..., by Rommel Carvalho
One of CGU's major responsibilities is to identify government purchases whose prices differ from those practiced in the market. This makes it possible to measure the efficiency of purchases made by government agencies. That information is useful both to the auditor, who is responsible for overseeing the use of public resources, and to the manager, who can improve processes by observing the best practices of other government units. Given the enormous quantity and diversity of government purchases, this analysis is practically unfeasible without some automated mechanism. For such automated analysis to be possible, however, one first needs a database of average, or reference, prices for each product to be analyzed. Although all Federal Government purchases are entered into a single, centralized system, the stored information is not detailed and structured enough to compute these reference prices.
This talk presents the methodology developed at CGU, based on data mining techniques, to extract the necessary information from that centralized system and make it possible to compute reference prices for products purchased by the Federal Government. It also presents some analyses based on the price database built with this methodology, emphasizing its importance for improving the management of public resources.
Rommel Novaes Carvalho, Controladoria-Geral da União
General Coordinator of CGU's Observatório da Despesa Pública (http://www.cgu.gov.br/assuntos/informacoes-estrategicas/observatorio-da-despesa-publica), he completed his PhD and postdoc at George Mason University, USA, in Artificial Intelligence, Semantic Web, and Data Mining, and is also a professor in the Professional Master's in Applied Computing at UnB.
Knowledge representation is a field of artificial intelligence that represents information about the world in a way that a computer system can understand to perform complex tasks. It simplifies complex systems through modeling human psychology and problem-solving. Examples of knowledge representation include semantic nets, frames, rules, and ontologies. Knowledge representation allows for automated reasoning about represented knowledge and asserting new knowledge. While first-order logic provides powerful and compact representation, it lacks ease of use and practical implementation for real-world problems. Effective knowledge representation requires balancing expressive power with practical considerations like execution efficiency.
The document provides an overview of artificial intelligence and knowledge-based systems. It discusses definitions of intelligence and AI, as well as knowledge representation schemes like logical, procedural, semantic network, and frame-based representations. The key components of a knowledge-based system are described as the knowledge base, which represents problem domain knowledge, and the inference engine, which uses reasoning techniques to solve problems. Ideal features of knowledge-based systems include efficient problem-solving using knowledge, heuristics, and eliminating unproductive solutions.
This document provides an overview and introduction to the course "Knowledge Representation & Reasoning" taught by Ms. Jawairya Bukhari. It discusses the aims of developing skills in knowledge representation and reasoning using different representation methods. It outlines prerequisites like artificial intelligence, logic, and programming. Key topics covered include symbolic and non-symbolic knowledge representation methods, types of knowledge, languages for knowledge representation like propositional logic, and what knowledge representation encompasses.
A Fuzzy Valid-Time Model for Relational Databases Within the Hibernate FrameworkJosé Enrique Pons
Time in databases has been studied for a long time. Valid time databases capture when the objects are true in the reality. The proposed model allows both representing and querying time in a fuzzy way. The representation and the underlying domain are defined as well as some fuzzy temporal operators. The implementation of the model is developed within the Hibernate framework. The Hibernate framework acts as an abstraction for the running database. Therefore, any relational database supported by the framework can now represent fuzzy valid time in its schema.
This document provides an introduction to knowledge representation in artificial intelligence. It discusses how knowledge representation and reasoning forms the basis of intelligent behavior through computational means. The key types of knowledge that need to be represented are defined, including objects, events, facts, and meta-knowledge. Different types of knowledge such as declarative, procedural, structural and heuristic knowledge are explained. The importance of knowledge representation for modeling intelligent behavior in agents is highlighted. The requirements for effective knowledge representation including representational adequacy, inferential adequacy, inferential efficiency, and acquisitional efficiency are outlined. Propositional logic is introduced as the simplest form of logic using propositions.
The document discusses various topics in artificial intelligence including the Turing test, knowledge representation using semantic networks and search trees, expert systems, neural networks, natural language processing, robotics, and ethical issues. It provides examples and explanations of each topic to demonstrate key concepts in AI such as how knowledge is represented, how expert systems make inferences, how neural networks are trained, and challenges with natural language comprehension. The chapter aims to distinguish problems humans solve best from those computers solve best and define important AI terms and techniques.
The document discusses knowledge representation in cognitive psychology. It defines knowledge and describes two main types: declarative and procedural knowledge. Declarative knowledge refers to static facts and information stored in memory, while procedural knowledge involves skills and how to perform tasks or activities. The document also explains several methods for representing declarative knowledge, including concepts and schemas, frames, and semantic networks. Frames organize knowledge into attribute-value pairs, while semantic networks use a graph structure to represent relationships between concepts. Overall, the document provides an overview of knowledge representation and different models for encoding declarative and procedural information.
UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, structure, parameter and incremental learning.
This presentation talks about UnBBayes version 4.0.0, which is the first version that supports plugins. In it we present the major concepts behind this Plugin Framework, features and benefits, applications, some sample plugins, specification, extension points, and availability.
This presentation was given by Shou Matsumoto from the University of Brasilia in Brazil via web conference to PhD students at George Mason University in the US on the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/).
UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports BN, ID, MSBN, OOBN, HBN, MEBN/PR-OWL, structure, parameter and incremental learning.
The overview is presented through a slides potpourri from different presentations the Artificial Intelligence Group (GIA) from University of Brasilia (UnB) has given since 1999. It covers BN, ID, MSBN, UnBBayes Server, and MEBN.
This presentation was given by Rommel Carvalho when he started his PhD at George Mason University on the Friday seminar called Krypton (http://krypton.c4i.gmu.edu/).
The SPE-PRMS system provides a framework for classifying oil and gas resources with two levels of uncertainty. The first is the uncertainty around discovery volumes, and the second is around the commercial viability of discoveries. Resources are classified as reserves, contingent resources, or prospective resources. Reserves have been discovered and proven commercial through drilling. Contingent resources are discovered but not proven commercial. Prospective resources are undiscovered but thought to have potential for commercial production. The PRMS system defines reserves as quantities that can be commercially produced meeting criteria around discovery, producibility, commercial viability, and existing development plans. Reserves estimates are provided as 1P, 2P, and 3P values with varying probabilities of actual production amounts meeting or
Resources and Reserves are the foundational assets of all E&P companies. Valuation of E&P companies is based on these numbers in their books! They are the most important asset of any E&P company. In the Fortune Global 500 (http://fortune.com/global500/), positions 2, 3, 4, 5, 6, 11 and 12 are E&P companies.
This presentation gives an overview of what goes into this MOST IMPORTANT number. You may need to check some basic terms and backup material to get a fuller understanding.
What are my 3P Reserves? (Haas Petroleum Engineering Services)
What is the best way to estimate your 3P reserves? President of Haas Petroleum Engineering Services Thad Toups gave this presentation on Haas' internal analytics and auditing methodology.
Geomodelling, resource & reserve estimation using mining software (Chandra Bose)
The document provides an overview of geomodelling, resource and reserve estimation, and pit optimization for mining projects. It discusses how borehole data, lithology, mineralization, and quality data are used in geomodelling software to create 3D geological models and cross sections. Resource and reserve estimation involves categorizing resources, estimating densities, recovery factors, and cut-off depths to determine geological, mineable, and extractable reserves. Pit optimization software is used to design optimal open pit mine plans that consider pit boundaries, slopes, benches, and production schedules to maximize profitability over the life of the mine.
Walk-In Ombudsman vs. Digital Ombudsman: Challenges in the Big Data Era (Rommel Carvalho)
Presentation given on 03/14/2017 by Rommel N. Carvalho at the 2017 Ombudsman and Access to Information Week, organized by CGU.
YouTube: https://youtu.be/vNMtULu5X1c?t=3h20m24s
How to turn public servants into data scientists and reduce the distance bet… (Rommel Carvalho)
Lecture given by Dr. Rommel Novaes Carvalho, General Coordinator of the Public Spending Observatory and professor in the Professional Master's program in Applied Computing at UnB.
Event: Brazil 100% Digital: Integration and transparency at the service of society
Website: http://www.brasildigital.gov.br/
Date: 10/11/2016 (November 10, 2016)
Video: https://www.youtube.com/watch?v=3WYQlPR-RLw&feature=youtu.be&t=2h4m44s
Proposed Risk Classification Model for Public Contracts (Rommel Carvalho)
The document proposes three models for assessing the risk of public contracts: 1) a supervised learning model to classify supplier risk based on variables such as political donations and history of penalties; 2) a second model to classify contract risk based on aspects such as competitiveness and complexity; 3) a multi-criteria model to select audit cases based on contract risk, company risk, and logistical issues.
Categorizing findings in IT audits with supervised and non-… (Rommel Carvalho)
Lecture given by Patrícia Maia at the 2nd Seminar on Data Analysis in Public Administration @ http://www.brasildigital.gov.br/
Abstract: The work consisted of applying text mining techniques to identify the main subjects addressed in the audits of the last five years. Two approaches were used: a supervised approach applying text classification with the Random Forest algorithm, and an unsupervised approach using the Latent Dirichlet Allocation (LDA) topic modeling technique. The pilot project was validated with IT findings and is now being extended to findings related to other subjects. The goal is to catalog the history of issued findings and to automatically categorize new records. With this, civil servants will be able to retrieve similar situations for use in new work or to address recurring problems in a structural way. Moreover, the same logic can be used to generate knowledge from other types of text: requests based on the Access to Information Law, e-OUV complaints, cases analyzed by CRG, news of interest to the agency, etc.
Speaker: Patrícia Maia - Ministry of Transparency, Oversight and Control
Bio: Holds a master's degree in Applied Computing from the University of Brasília (UnB), a specialization in Process Modeling and Requirements Engineering from the Federal University of Rio Grande do Sul (UFRGS), and a degree in Information Technology. She has professional experience in text mining, ETL, databases, and government oversight. She currently works at the Ministry of Transparency, Oversight and Control (MTFC), in the Directorate of Research and Strategic Information.
Mapping corruption risk in the federal public administration (Rommel Carvalho)
The document describes a Brazilian government project to map corruption risk in the federal public administration through the analysis and mining of data on public servants and government units. The project uses advanced machine learning techniques and statistical analysis of large data sets to generate reliable corruption risk indicators. The ultimate goal is to provide a strategic tool to prevent and fight corruption proactively.
1) The Public Spending Observatory uses data science techniques to identify risks of fraud and irregularities in public spending and to support decision-making by public managers.
2) Projects such as the Supplier Risk Map, Preventive Procurement Analysis, and Automatic Complaint Triage use predictive analytics to prevent risk situations.
3) The APF Price Database enables market research and the identification of overpricing in contracts.
Applying text mining techniques for automatic classification of… (Rommel Carvalho)
The use of automatic text classification has become increasingly common in recent years. However, when working with large-scale classification, complexity increases considerably. A case study was carried out, applied to complaint triage at the Office of the Comptroller General, involving a large number of categories to be classified. The proposed solution employed machine learning and multilabel classification. These techniques aimed at building a model capable of overcoming the difficulties inherent to this context, showing significant gains.
Patrícia Helena Maia Alves de Andrade - Office of the Comptroller General (CGU)
Finance and Control Analyst at CGU, working on text mining and data analysis in the Directorate of Research and Strategic Information. She is currently finishing the Professional Master's in Applied Computing at the University of Brasília.
Party membership and corruption risk among federal public servants (Rommel Carvalho)
The document discusses the use of machine learning to analyze the relationship between political party membership and corruption risk among Brazilian federal public servants. The data showed a positive correlation between party membership and corruption cases. A random forest model obtained the best results, identifying key variables such as length of membership and reason for cancellation.
Using data and text mining to compute reference prices in… (Rommel Carvalho)
One of CGU's major responsibilities is to identify government purchases whose prices differ from those practiced by the market. This makes it possible to measure the efficiency of purchases made by government agencies. This information is useful both for the auditor, who is responsible for overseeing the use of public resources, and for the manager, who can improve processes by observing the best practices of other government units. Given the enormous quantity and diversity of government purchases, this analysis becomes practically unfeasible without some automated mechanism. However, for this automated analysis to be possible, one first needs a database with the average, or reference, prices for each product to be analyzed. Although all Federal Government purchases are entered into a single, centralized system, the stored information is not detailed and structured enough to compute these reference prices.
This talk presents the methodology developed at CGU, based on data mining techniques, to extract the necessary information from this centralized system so as to enable the computation of reference prices for products purchased by the Federal Government. In addition, some analyses based on the price database created with this methodology are presented, emphasizing its importance for improving the management of public resources.
Rommel Novaes Carvalho - Office of the Comptroller General (CGU)
General Coordinator of CGU's Public Spending Observatory (http://www.cgu.gov.br/assuntos/informacoes-estrategicas/observatorio-da-despesa-publica), he completed his PhD and postdoc at George Mason University, USA, in Artificial Intelligence, Semantic Web and Data Mining, and is also a professor in the Professional Master's program in Applied Computing at UnB.
Preventive detection of split purchases (Rommel Carvalho)
The document describes a study on the preventive detection of split purchases in Brazil using Bayesian networks. The study used government procurement data to create a model capable of identifying possible split purchases. After data preparation, different modeling algorithms were tested and evaluated, resulting in a model with high accuracy and classification capability. The model was deployed to flag possible split purchases in new government procurements.
Automatic identification of the most frequent types of LAI requests (Rommel Carvalho)
The document describes a method for automatically identifying the most frequent types of requests under the Brazilian Access to Information Law (LAI) through topic analysis of more than 300 thousand requests using the Latent Dirichlet Allocation (LDA) model. The method identified several common topics, including requests about the Central Bank of Brazil (BACEN) and about public service entrance exams. The process took about 10 hours to analyze the 300 thousand requests.
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i… (Rommel Carvalho)
Presentation given by Rommel N. Carvalho at the 11th Bayesian Modeling Applications Workshop (BMAW 2014) at the 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014) on July 27, 2014, Quebec City, Quebec, Canada. This was a joint work between the Research and Strategic Information Directorate from Brazil's Office of the Comptroller General and the Department of Computer Science from the University of Brasília.
Talk: https://www.youtube.com/watch?v=UVOsztdSQ3A
Paper: http://seor.gmu.edu/~klaskey/BMAW2014/BMAW2014_papers/bmaw2014_paper_6.pdf
Title: Using Bayesian Networks to Identify and Prevent Split Purchases in Brazil.
Abstract: To cope with society's demand for transparency and corruption prevention, the Brazilian Office of the Comptroller General (CGU) has carried out a number of actions, including: awareness campaigns aimed at the private sector; campaigns to educate the public; research initiatives; and regular inspections and audits of municipalities and states. Although CGU has collected information from various different sources - Revenue Agency, Federal Police, and others -, going through all the data in order to find suspicious transactions has proven to be really challenging. In this paper, we present a Data Mining study applied on real data - government purchases - for finding transactions that might become irregular before they are considered as such in order to act proactively. Moreover, we compare the performance of various Bayesian Network (BN) learning algorithms with different parameters in order to fine tune the learned models and improve their performance. The best result was obtained using the Tree Augmented Network (TAN) algorithm and oversampling the minority class in order to balance the data set. Using a 10-fold cross-validation, the model correctly classified all split purchases, it obtained a ROC area of .999, and its accuracy was 99.197%.
Presentation given by Rommel N. Carvalho at the 9th International Workshop on Uncertainty Reasoning for the Semantic Web at the 12th International Semantic Web Conference on October 21, 2013, Sydney, Australia. This was a joint work between the Research and Strategic Information Directorate from Brazil's Office of the Comptroller General and the Department of Computer Science from the University of Brasília.
Title: A GUI for MLN.
Abstract: This paper focuses on the incorporation of the Markov Logic Network (MLN) formalism as a plug-in for UnBBayes, a Java framework for probabilistic reasoning based on graphical models. MLN is a formalism for probabilistic reasoning which combines the capacity of a Markov Network (MN) to deal with uncertainty, tolerating imperfect and contradictory knowledge, with the expressiveness of First Order Logic. An MLN provides a compact language for specifying very large MNs and the ability to incorporate, in modular form, large domains of knowledge (expressed in First Order Logic sentences). A Graphical User Interface for the software Tuffy was implemented in UnBBayes to facilitate the creation of, and inference with, MLN models. Tuffy is a Java open-source MLN engine.
Presentation given by Rommel N. Carvalho at the 9th International Workshop on Uncertainty Reasoning for the Semantic Web at the 12th International Semantic Web Conference on October 21, 2013, Sydney, Australia. This was a joint work between the Research and Strategic Information Directorate from Brazil's Office of the Comptroller General and the Department of Computer Science from the University of Brasília.
Title: UMP-ST plug-in: a tool for documenting, maintaining, and evolving probabilistic ontologies.
Abstract: Although several languages have been proposed for dealing with uncertainty in the Semantic Web (SW), almost no support has been given to ontological engineers on how to create such probabilistic ontologies (PO). This task of modeling POs has proven to be extremely difficult and hard to replicate. This paper presents the first tool in the world to implement a process which guides users in modeling POs, the Uncertainty Modeling Process for Semantic Technologies (UMP-ST). The tool solves three main problems: the complexity in creating POs; the difficulty in maintaining and evolving existing POs; and the lack of a centralized tool for documenting POs. Besides presenting the tool, which is implemented as a plug-in for UnBBayes, this paper also presents how the UMP-ST plug-in could have been used to build the Probabilistic Ontology for Procurement Fraud Detection and Prevention in Brazil, a proof-of-concept use case created as part of a research project at the Brazilian Office of the Comptroller General (CGU).
Integration of the World Cup Portal @ CMA Committee of the Brazilian Federal Senate (Rommel Carvalho)
Presentation prepared by Rommel N. Carvalho and delivered by Tatiana Z. Panisset, Director of Systems and Information at the Office of the Comptroller General (CGU), at a meeting of the Environment, Consumer Protection, Oversight and Control Committee (CMA) of the Federal Senate (SF). The meeting focused on debating the unification of data entry for the 2014 World Cup Transparency Portals of the SF (www.copatransparente.gov.br) and of CGU (http://transparencia.gov.br/copa2014). More information about the meeting at http://goo.gl/KCBD6.
The alternatives presented were discussed and deliberated by the CMA, which approved an official collaboration between the Legislative and Executive branches to integrate the data entry of the respective World Cup portals. News about this collaboration can be found at goo.gl/N8cbr, goo.gl/RVMGd, goo.gl/Ze3uJ, goo.gl/6o7BZ and goo.gl/C1CFv.
Title:
What open government data is and how to use it
Abstract:
The Semantic Web aims to associate the data made available on the Web with its meanings so that this data can be understood both by humans and by machines. This will allow tasks previously performed only by humans to be delegated to machines. Semantic Web techniques have spread with the significant increase in the number of applications that use ontologies and semantics through technologies such as RDF and OWL, among others, and the various initiatives around the world for publishing open data, in particular open government data. Open government data is defined by the W3C, the Web Consortium, as "the publication and dissemination on the Web of data generated by the Public Sector, shared in raw and open format, logically understandable, so as to allow its reuse in digital applications developed by society". The goal of this talk is to present the main concepts that guide the various open government data initiatives, the current status of this initiative in Brazil, and the benefits it brings to society, such as the use of this open data to contribute to the improvement and transparency of public administration.
Speaker:
Dr. Rommel Novaes Carvalho, Ph.D
Postdoctoral Research Associate – C4I Center @ GMU
Finance and Control Analyst – CGU
http://mason.gmu.edu/~rcarvalh
Short CV:
Rommel Novaes Carvalho holds a bachelor's degree in Computer Science and a master's degree in Informatics from the University of Brasília, and a PhD in Systems Engineering and Operations Research from George Mason University, United States. He is a researcher in Artificial Intelligence (AI) and a member of the Artificial Intelligence Research Group of the University of Brasília (GIA). His areas of interest include representation and reasoning with uncertainty in the Semantic Web using Bayesian inference, data mining, and software engineering. A certified Java developer with experience implementing probabilistic network systems, he is the lead architect of the UnBBayes project, a framework for probabilistic reasoning under development by GIA since 2000. In his PhD he proposed and implemented version 2 of PR-OWL (Probabilistic OWL), to allow the reuse of existing deterministic ontologies, their interoperability with probabilistic ontologies represented in PR-OWL, and mixed ontological and probabilistic reasoning. Since 2005 he has worked at the Office of the Comptroller General as an Information Technology specialist. In 2011, he became a postdoctoral research associate at George Mason University.
Modeling a Probabilistic Ontology for Maritime Domain AwarenessRommel Carvalho
The document describes developing a probabilistic ontology for maritime domain awareness. It aims to develop an ontology capable of reasoning with evidence from different domains to provide situational awareness. It discusses ontologies, probabilistic ontologies, and using the Probabilistic Web Ontology Language and other techniques. It also presents an uncertainty modeling process and incremental methodology for modeling the probabilistic ontology, including modeling cycles with goals, queries, evidence and assumptions.
Probabilistic Ontology: Representation and Modeling MethodologyRommel Carvalho
Oral Defense of Doctoral Dissertation
Volgenau School of Engineering, George Mason University
Rommel Novaes Carvalho
Bachelor of Science, University of Brasília, Brazil, 2003
Master of Science, University of Brasília, Brazil, 2008
Probabilistic Ontology: Representation and Modeling Methodology
Tuesday, June 28, 2011, 2:00pm -- 4:00pm
Nguyen Engineering Building, Room 4705
Committee
Kathryn Laskey, Chair
Paulo Costa
Kuo-Chu Chang
David Schum
Larry Kerschberg
Fabio Cozman
Abstract
The past few years have witnessed an increasingly mature body of research on the Semantic Web (SW), with new standards being developed and more complex problems being addressed. As complexity increases in SW applications, so does the need for principled means to cope with uncertainty in SW applications. Several approaches addressing uncertainty representation and reasoning in the SW have emerged. Among these is Probabilistic Web Ontology Language (PR-OWL), which provides Web Ontology Language (OWL) constructs for representing Multi-Entity Bayesian Network (MEBN) theories. However, there are several important ways in which the initial version PR-OWL 1.0 fails to achieve full compatibility with OWL. Furthermore, although there is an emerging literature on ontology engineering, little guidance is available on the construction of probabilistic ontologies.
This research proposes a new syntax and semantics, defined as PR-OWL 2.0, which improves compatibility between PR-OWL and OWL in two important respects. First, PR-OWL 2.0 follows the approach suggested by Poole et al. to formalizing the association between random variables from probabilistic theories with the individuals, classes and properties from ontological languages such as OWL. Second, PR-OWL 2.0 allows values of random variables to range over OWL datatypes.
To address the lack of support for probabilistic ontology engineering, this research describes a new methodology for modeling probabilistic ontologies called Uncertainty Modeling Process for Semantic Technologies (UMP-ST). To better explain the methodology and to verify that it can be applied to different scenarios, this dissertation presents step-by-step constructions of two different probabilistic ontologies. One is used for identifying frauds in public procurements in Brazil and the other is used for identifying terrorist threats in the maritime domain. Both use cases demonstrate the advantages of PR-OWL 2.0 over its predecessor.
SWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule LanguageRommel Carvalho
Presentation given by Tomasz Wlodarczyk at the 6th Uncertainty Reasoning for the Semantic Web Workshop at the 9th International Semantic Web Conference in 2010.
Paper: SWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule Language
Abstract: Enhancing Semantic Web technologies with the ability to express uncertainty and imprecision is a widely discussed topic. While SWRL can provide additional expressivity to OWL-based ontologies, it does not provide any way to handle uncertainty or imprecision. We introduce an extension of SWRL called SWRL-F that is based on the SWRL rule language and uses SWRL's strong semantic foundation as its formal underpinning. We extend it with a SWRL-F ontology to enable fuzzy reasoning in the rule base. The resulting language provides a small but powerful set of fuzzy operations that do not introduce inconsistencies in the host ontology.
Default Logics for Plausible Reasoning with Controversial AxiomsRommel Carvalho
Presentation given by Thomas Scharrenbach at the 6th Uncertainty Reasoning for the Semantic Web Workshop at the 9th International Semantic Web Conference in 2010.
Paper: Default Logics for Plausible Reasoning with Controversial Axioms
Abstract: Using a variant of Lehmann's Default Logics and Probabilistic Description Logics, we recently presented a framework that invalidates those unwanted inferences that cause concept unsatisfiability without the need to remove explicitly stated axioms. The solutions of this method were shown to outperform classical ontology repair w.r.t. the number of inferences invalidated. However, conflicts may still exist in the knowledge base and can make reasoning ambiguous. Furthermore, solutions with a minimal number of inferences invalidated do not necessarily minimize the number of conflicts. In this paper we provide an overview of finding solutions that have a minimal number of conflicts while invalidating as few inferences as possible. Specifically, we propose to evaluate solutions w.r.t. the quantity of information they convey by recurring to the notion of entropy and discuss a possible approach towards computing the entropy w.r.t. an ABox.
3. Objectives
Purpose
What is this presentation for?
– Overview of PRM and its underlying concepts
– Overview of extensions of PRM
  • Link uncertainty
– To present a simple implementation of PRM
  • UnBBayes-PRM
4. Motivations
E/R models are heavily used
– Most commercial databases are based on E/R models
PRM allows E/R with uncertainty
– PRM is compatible with optimizations of BN and E/R
Implementations of PRM are rare
5. Target
For whom is this presentation intended?
– People interested in PRM
  • E.g. database architects willing to incorporate probabilistic reasoning
  • People looking for a BN extension with the expressiveness of relational calculus
– People looking for a PRM tool
  • E.g. developers looking for a sample implementation
  • Learners willing to exercise PRM
We assume you have basic knowledge about Bayesian Networks.
7. What is E/R?
Contextualization
E/R = Entity-Relationship
Abstract conceptual representation of data
– Often used in relational database models
  • E.g. Oracle, MySQL, PostgreSQL...
Entities = "nouns"
– A set of elements in a domain
Relationships = "verbs"
– Capture how 2 or more entities are related
Attributes = "characteristics"
– Attributes hold the actual data content.
8. What is E/R?
Constraints
– Cardinality
  • 1-1, 1-many, many-1, many-many
– Primary Key (PK)
  • Minimal set of uniquely identifying attributes
– Foreign Key (FK)
  • Attributes that refer to other attributes (PK)
  • Used to realize relationships
– Allowed values
– Etc.
9. What is E/R?
E/R can be represented as a set of tables
– Entities → tables
– Attributes → columns
– Values of attributes → content of a cell
– 1-1 and 1-many (many-1) relationships → FK
– Many-many relationships → table + FK
Problem
– Classic E/R models do not handle uncertainty
UnBBayes-PRM sees E/R as a set of tables.
10. So, what is PRM?
Probabilistic Relational Models
– Template for a probability distribution over a database (E/R model)
  • Compact graphical probabilistic model
    – Well-defined semantics
  • Natural domain modeling
    – Objects, properties, relations...
  • Attributes can depend on attributes of related entities
  • Generalization over a variety of situations
11. So, what is PRM?
PRM's learning algorithms
– Capture relationships in Bayesian learning algorithms
  • There is no need to "flatten" the database
PRMs are composed of:
– Relational Schema,
– Relational Skeleton,
– Probability distribution.
Machine learning is a major concern in PRM.
12. Schema
Static part
– Entities + Relationships + Attributes
– PK, FK, possible (allowed) values...
Example: a Person entity with two recursive relationships, hasFather and hasMother:
  Person
    ID: PK
    Father: FK to Person
    Mother: FK to Person
    BloodType: any of {A, B, AB, O}
13. Skeleton
Dynamic part
– Instantiation of a Schema
– Actual objects
  • Attributes are filled with some values
Example: three Person objects:
  ID: Augustine (Father: NULL, Mother: NULL, BloodType: O)
  ID: Mary (Father: NULL, Mother: NULL, BloodType: A)
  ID: George (Father: Augustine, Mother: Mary, BloodType: NULL)
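The skeleton on this slide (actual objects with some attributes still NULL) can be sketched in plain Java. This is an illustrative sketch only, not the UnBBayes-PRM API; the `Person` record and field names are assumptions mirroring the slide's schema.

```java
import java.util.List;

/** Illustrative sketch (not the UnBBayes-PRM API): a skeleton is an
 *  instantiation of the schema, i.e. actual objects with attribute values.
 *  NULL foreign keys and attributes are modeled here as Java nulls. */
public class Skeleton {
    // One row of the Person table: PK, two FKs to Person, and BloodType.
    record Person(String id, String fatherId, String motherId, String bloodType) {}

    public static void main(String[] args) {
        // The three objects from the slide. George's BloodType is still
        // unknown (NULL); at inference time it becomes a query node.
        List<Person> skeleton = List.of(
            new Person("Augustine", null, null, "O"),
            new Person("Mary", null, null, "A"),
            new Person("George", "Augustine", "Mary", null)
        );
        skeleton.forEach(p -> System.out.println(p.id() + " -> " + p.bloodType()));
    }
}
```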
14. PRM's structure
Schema + probabilistic dependencies
Attributes have path expressions describing the parents of that attribute.
– Path expression = slot chain
  • List of FKs
– If a slot chain contains a 1-many relationship, the number of parents is unknown
Conditional Probability Distribution (CPD)
– Conditional Probability Table (CPT)
– Functions + parameters
(Slot chain = empty) := no parents, or parents reside in the same table
15. PRM's structure
Example: an instantiation with three Person objects (John Doe, Jane Doe, Me).
The Person table has a PK, two FKs (FK1 = Father, FK2 = Mother) and BloodType.
The BloodType of each object receives one edge from the BloodType of the object
referenced by FK1 and one edge from the BloodType of the object referenced by FK2.

CPD of BloodType (columns = parent configurations):

  Father:  A    A    A   ...
  Mother:  A    B    AB  ...
  A       75%  25%  50%  ...
  B        0%  25%  25%  ...
  AB       0%  25%  25%  ...
  O       25%  25%   0%  ...
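The CPT on this slide can be read column by column: each (Father, Mother) configuration selects one distribution over {A, B, AB, O} for the child. A minimal Java sketch of that lookup, using only the three configurations shown on the slide (class and method names are illustrative, not UnBBayes code):

```java
import java.util.Map;

/** Illustrative sketch (not the UnBBayes API): the CPD of BloodType given
 *  the blood types of the objects referenced by the Father and Mother FKs. */
public class BloodTypeCpd {
    // One CPT column per (father, mother) parent configuration.
    // Each column is the distribution over {A, B, AB, O}, in that order.
    private static final Map<String, double[]> CPT = Map.of(
        "A,A",  new double[] {0.75, 0.00, 0.00, 0.25},
        "A,B",  new double[] {0.25, 0.25, 0.25, 0.25},
        "A,AB", new double[] {0.50, 0.25, 0.25, 0.00}
    );

    /** Returns the child's distribution over {A, B, AB, O}. */
    static double[] childDistribution(String fatherType, String motherType) {
        double[] column = CPT.get(fatherType + "," + motherType);
        if (column == null) {
            throw new IllegalArgumentException("configuration not in this sketch");
        }
        return column;
    }

    public static void main(String[] args) {
        double[] d = childDistribution("A", "B");
        System.out.println("P(child=A | father=A, mother=B) = " + d[0]);
    }
}
```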
16. CPD with aggregation
How do we declare the CPD if the number of parents is unknown?
Approach 1: special-purpose scripts
– E.g. UnBBayes-MEBN's CPD scripts
  • A set of IF-THEN-ELSE statements
Approach 2: aggregation
– E.g. Mode, Max, Min, Average...
  • Equivalent to an intermediate "deterministic" node
UnBBayes-PRM uses approach 2.
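Aggregation collapses an unknown number of parent values into a single value, which then acts as one intermediate "deterministic" parent of the CPT. A minimal sketch of a MODE aggregate in Java (illustrative only, not the UnBBayes-PRM implementation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch (not the UnBBayes-PRM API): collapsing an unknown
 *  number of parent values into one value via an aggregate function. */
public class Aggregator {
    /** MODE: the most frequent value; ties go to the value reaching the count first. */
    static String mode(List<String> parentValues) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : parentValues) {
            int c = counts.merge(v, 1, Integer::sum);  // increment this value's count
            if (c > bestCount) { bestCount = c; best = v; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three related objects feed the same child attribute; MODE collapses
        // their values into a single parent value for the CPT.
        System.out.println(mode(List.of("A", "O", "A")));  // prints A
    }
}
```

MIN and MAX work the same way (a single fold over the parent values), which is why the slide calls the result an intermediate deterministic node.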
17. Inference
Instantiation of a BN from the skeleton
Descriptive attributes become random variables
Once generated, further inference is done as in a normal BN (evidence propagation)
18. Does the instantiated BN have cycles?
Case 1: check at the PRM schema level
– Schema has no cycle → instances have no cycle
Case 2: schema contains cycles, but the instantiated BN does not
Example: the Person schema is cyclic (Person references Person), but the
instantiated network over the objects Augustine (Father), Mary (Mother), and
George Washington, each with its own BloodType node, is acyclic.
19. Extension: link uncertainty
So far we have only discussed distributions over attributes of the objects in a model
– Only the values of the attributes were uncertain
Uncertainty over the relational structure of the domain was not addressed yet
– Structure uncertainty
  • Values of FKs are uncertain
  • Slot chains are uncertain
Reference uncertainty & existence uncertainty
Note: link uncertainty is not implemented in UnBBayes-PRM.
20. Reference uncertainty
Slots' (FK) values become random variables
– Problem
  • Unknown number of possible values
    – It is difficult to declare the CPD at the schema level
– Solution
  • Create partitions based on "other attributes"
    – Assuming that ordinal attributes have a known number of possible values
21. Reference uncertainty
Entity2
Entity2 Entity1
Entity1 Possible values:
Contextualization
PK PK PKs of Entity2
FKToEntity2 (unknown)
BooleanAttrib
Link to a single instance of Entity2
based on the current value of PK
Link to a set (partition) of instances of Entity2,
based on the current value of BooleanAttrib
Entity1
Entity1
Entity2
Entity2
PK Possible values:
PK FKToEntity2 2 (true/false)
BooleanAttrib Selector
21 We can now specify parents of FKs and CPD
22. Reference uncertainty: instantiating the BN
Edge types:
– I: within a single object
– II: between objects
– III: from FKs of a slot chain
– IV: from partition attributes to selectors
– V: from selectors to FKs
Extracted from Probabilistic Relational Models (Getoor et al., SRL07)
23. Existence uncertainty
Creation of a Boolean attribute "Exists" in tables
– Technically, entities also contain "Exists"
  • But we assume instances (objects) of entities "do exist" if they were instantiated
  • So, this mechanism is mainly for relationships
– Because "Exists" is not a FK, we can use it as a normal random variable
  • No major changes to BN instantiation
Objects are related to every possible object, with probability ranging from 0% to 100%.
24. UnBBayes-PRM
A Java Implementation
Open-source Java software
– GUI & inference machine
Features
– Edit Schema and Skeleton as tables
– Edit probabilistic dependencies as CPTs
– Edit constraints (PK, FK and allowed values)
– Generate a BN from the Skeleton
– Save/load projects from file
Developed as a plug-in for UnBBayes
– Alpha version (for internal use)
Project page: http://sourceforge.net/projects/unbbayes/
28. UnBBayes-PRM - I/O

/* Table and PK declaration */
CREATE TABLE "Person" (
  "id" VARCHAR2(300) NOT NULL,
  "Father" VARCHAR2(300),
  "Mother" VARCHAR2(300),
  "BloodType" VARCHAR2(300)
);
ALTER TABLE "Person" ADD CONSTRAINT PK_Person
  PRIMARY KEY ("id");

/* Possible values */
ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
  CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));

/* Foreign keys (relationships) */
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
  FOREIGN KEY ("Father") REFERENCES "Person" ("id");
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
  FOREIGN KEY ("Mother") REFERENCES "Person" ("id");

The PRM is currently stored as a SQL script. This is a temporary solution.
29. UnBBayes-PRM - I/O
Dependencies are stored as in-table comments
COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType()
[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0
0.0 0.25 0.25 0.25 0.25 0.25 (...) }';
Basic format:
– <listOfParents> ; { <listOfProbabilities> }
<listOfParents> := comma-separated list of
– <parentClass>.<parentColumn>(<aggregateFunction>)[ <listOfForeignKeys> ]
• <listOfForeignKeys> represents a slot chain
29 This is also a temporary solution.
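A small hand-rolled parser can make the comment convention above concrete. This is an illustration only, not the parser UnBBayes-PRM uses, and it naively splits the parent list on commas, so it assumes single-FK slot chains:

```java
import java.util.*;

// Sketch of a parser for the in-comment dependency format:
//   <listOfParents> ; { <listOfProbabilities> }
// where each parent looks like Class.Column(aggFn)[ FK1 ].
// Hand-rolled illustration; not the UnBBayes-PRM implementation.
class DependencyComment {
    record Parsed(List<String> parents, double[] probabilities) {}

    static Parsed parse(String comment) {
        int sep = comment.indexOf(';');
        String parentPart = comment.substring(0, sep).trim();
        String probPart = comment.substring(sep + 1).trim();
        // strip the surrounding { } from the probability list
        probPart = probPart.substring(probPart.indexOf('{') + 1,
                                      probPart.lastIndexOf('}')).trim();
        List<String> parents = new ArrayList<>();
        // naive split on ',' — assumes no commas inside the FK lists
        for (String p : parentPart.split(","))
            parents.add(p.trim());
        String[] tokens = probPart.split("\\s+");
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++)
            probs[i] = Double.parseDouble(tokens[i]);
        return new Parsed(parents, probs);
    }

    public static void main(String[] args) {
        Parsed d = parse("Person.BloodType()[ FK_Person_Father ] ,"
            + " Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.25 }");
        System.out.println(d.parents().size() + " parents, "
            + d.probabilities().length + " probabilities");
        // 2 parents, 2 probabilities
    }
}
```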
30. UnBBayes-PRM:
limitations
No support for link uncertainty
– But existence uncertainty can be “simulated”
Only one attribute allowed as the PK
Only String types allowed
– Thus, no sequences are allowed
No marginalization
– Cannot delete dependencies
• We must re-create the attribute or edit the SQL script
30
31. UnBBayes-PRM:
limitations
Two edges (dependencies) to the same attribute are not allowed
– Even when using different slot chains
3 aggregation functions:
– mode, min, max.
No machine learning
No direct access to an actual database (yet)
– Only by means of a SQL script.
31
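The three supported aggregation functions can be sketched over the multiset of values reached through a slot chain (plain Java illustration, not the UnBBayes-PRM implementation; recall that all attribute values are Strings in the plug-in):

```java
import java.util.*;

// Sketch of the three aggregation functions the plug-in supports (mode, min,
// max), applied to the multiset of String values reached through a slot chain.
// Illustration only; not the UnBBayes-PRM implementation.
class Aggregation {
    static String min(List<String> values) { return Collections.min(values); }
    static String max(List<String> values) { return Collections.max(values); }

    // mode: the most frequent value; ties break toward the
    // lexicographically smallest value (TreeMap iterates in ascending order,
    // and Collections.max keeps the first maximal entry it sees)
    static String mode(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) counts.merge(v, 1, Integer::sum);
        return Collections.max(counts.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<String> bloodTypes = List.of("A", "O", "A", "B");
        System.out.println(mode(bloodTypes)); // A
        System.out.println(min(bloodTypes));  // A
        System.out.println(max(bloodTypes));  // O
    }
}
```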
32. UnBBayes-PRM:
(possible) future works
Conclusion
Add extension points for plug-ins
Integration with DBMS
– Constraints/rules can be delegated to the DBMS
• Some of the limitations may be fixed automatically
Implement machine learning and link uncertainty
Edit E/R models as diagrams
PRM → MSBN compilation
32 DBMS = DataBase Management System
33. UnBBayes-PRM:
(possible) future works
Implement Dynamic PRM
– Dynamic BN + E/R
Integration with PROXIMITY¹
– RDN - Relational Dependency Network
• Generalization of BN + E/R + Relational Markov Network
33 ¹A Java open-source tool from University of Massachusetts Amherst
34. Finally
PRM looks practical
– Uncertainty on relational data
• Immediate applicability in databases
– An advanced DBMS can add further features
Machine learning seems to be PRM's major concern
– It was not addressed by this presentation
– It was not addressed by this presentation
34
35. Finally
PRM cannot specify advanced rules and constraints on conditional probabilities
– Some conditions must be fulfilled “manually”
– Some may be fulfilled by DBMS features
UnBBayes-PRM provides an editor and inference engine for basic PRMs
35