introdução a inteligência artificialpauloac/pee/introia.pdfo termo inteligência artificial foi...

Introdução a Inteligência Artificial

Prof. Paulo André Castro [email protected]/~pauloac Sala 110, IEC-ITA

mailto:[email protected]

http://www.comp.ita.br/%7Epauloac

2 / 161

Paulo André Lima de Castro

• Bolsista de Produtividade Desen. Tec. e Extensão Inovadora do CNPq Nível 2.

• Engenheiro de Computação pelo Instituto Tecnológico de Aeronáutica (ITA, 1997), Mestre e Doutor pela Escola Politécnica da Universidade de São Paulo (Poli/USP 2009). Pós-doutorado na City University of New York (CUNY, 2013).

• Atualmente é professor do Instituto Tecnológico de Aeronáutica (ITA) e Chefe do Departamento de Metodologias de Computação da Divisão de Ciência da Computação do ITA.

• Participei de diversos projetos de Pesquisa e Desenvolvimento incluindo desenvolvimento de simuladores, avaliação de segurança da informação em sistemas computacionais e aplicação de técnicas inteligentes em sistemas distribuídos.

• Realizo pesquisas na área de Inteligência Artificial com ênfase em Sistemas multiagentes, atuando principalmente nos seguintes temas: agent-based finance, agentes autônomos e aplicações de técnicas inteligentes especialmente em economia e finanças

3 / 161

Introdução a Inteligência Artificial Aula 1 - Introdução: conceitos e aplicações, agentes inteligentes e resolução de problemas por

busca

Aula 2 - Busca heurística: formalização e conceito de heurística, com exemplos

Aula 3 - Raciocínio probabilístico Probabilidades e crenças Modelos gráficos probabilísticos Inferência bayesiana

Aula 4 - Redes Bayesianas. Aprendizagem de Modelos probabilísticos

Decisão automatizada sobre incerteza Introdução a Engenharia de Conhecimento em Ambientes Incertos

Aula 5 - Aprendizado de Máquina: introdução, conceitos e aplicações

Aula 6 - Aprendizado supervisionado: conceitos básicos

Aula 7 - Aprendizado não supervisionado: conceitos básicos

4 / 161

Bibliografia

Korb, K. Nicholson,A. Bayesian Artificial Intelligence. CRC Press. 2011.

Witten, I., Frank, E. Data Mining: Practical Machine learning Tools andTechniques. Elsevier. 2005.

RUSSEL, S.; NORVIG, P. Inteligência Artificial: Uma abordagem moderna. 3a.ed. Rio de Janeiro: Elsevier Editora, 2009.

Pearl, Judea. Probabilistic Reasoning in Intelligent Systems: Network ofPlausible Inference. Morgan Kaufmann, San Mateo, California. 1988

5 / 161

Definições de IAPensando como seres humanos Pensando Racionalmente

“O novo e interessante esforço para fazer oscomputadores pensarem (…) máquinas com mentes, no sentido total e literal” (Haugeland, 1985)

“[Automação de] atividades que associamos aopensamento humano, atvidades como a tomadade decisões, a resolução de problemas, o aprendizado..” (Bellman, 1978)

“O estudo das faculdades mentais pelo uso de modelos computacionais” (Charniak e McDermoot, 1985)

“O estudo das computações que tornampossível perceber, raciocinar e agir” (Winston, 1992)

Agindo como Seres Humanos Agindo Racionalmente

“A arte de criar máquinas que executamfunções que exigem inteligência quandoexecutadas por pessoas” (Kurzweill, 1990)

“O estudo de como os computadores podemfazer tarefas que hoje são melhordesempenhadas por pessoas” (Rich and Knight, 1991)

“Inteligência Computacional é o estudo do projeto de agentes inteligentes” (Poole et al. 1998)

“IA…está relacionada a um desempenhointeligente de artefatos” (Nilsson, 1998)

6 / 161

Inteligência Artificial – Novo ? O termo Inteligência Artificial foi usado oficialmente pela

primeira vez no verão de 1956, em um convite para um workshop de 2 meses organizado por John McCarthy, Marvin Minsky, Claude Shannon,e outros…

“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.”

Perhaps “computational rationality” would have been more precise and lessthreatening, but “AI” has stuck....McCarthy stated that he resisted the terms“computer” or “computational” in deference to Norbert Weiner, who waspromoting analog cybernetic devices rather than digital computers

7 / 161

What about …. ? Deep learning

“Deep learning is a subset of a more general field of artificial intelligence called machine learning” Buduma, N. The fundamentals of deep learning.

Machine Learning Construção de software “...que pode melhorar seu próprio

comportamento através do estudo diligente de suas próprias experiências” (Russel, Norvig, 2013)

Data mining Finding patterns in data that provide insight or enable fast

and accurate decision making (Witten,2005) Data Mining: Practical Machine learning)

Big data Capturing and managing lots of information (computer systems) Analyzing these masses of new data (data mining)

8 / 161

Machine learning Definitions of “learning” from dictionary:

To get knowledge of by study,experience, or being taughtTo become aware by information orfrom observationTo commit to memoryTo be informed of, ascertain; to receive instruction

Difficult to measure

Trivial for computers

Things learn when they change their behavior in a way that makes them perform better in the future.

• Operational definition:

ML vs Statistics: “In truth, you should not look for a dividing line between machine learning and statistics because there is a continuum— and a multidimensional one at that—of data analysis techniques” Witten, Data Mining: Practical Machine learning.

9 / 161

Data mining Finding patterns in data that provide insight or enable fast

and accurate decision making Strong, accurate patterns are needed to make decisions

Problem 1: most patterns are not interesting Problem 2: patterns may be inexact (or spurious) Problem 3: data may be garbled or missing

Machine learning techniques identify patterns in data and provide many tools for data mining

Machine learning techniques that provide structural descriptions are primary interest for data mining

In data mining, Machine learning is used as a way of ascertaining what factors are taken into account, not to automate the decision

10 / 161

A “Reasonable” Graph Representation of Intersections of Related Areas to AI

11 / 161

E Modelos Probabilísticos em Grafos? Desafios enfrentados por IA… Resolução de Problemas

Conhecimento: Raciocínio e planejamento

Incerteza: Conhecimento e Raciocínio

Aprendizado

Comunicação, percepção e Ação

12 / 161

A IA como um Campo Multidisciplinar

13 / 161

Agentes Um agente é tudo que pode ser considerado capaz de perceber seu

ambiente por meio de sensores e de agir sobre esse ambiente por intermédio de atuadores.

Exemplos: agente animal, agente robótico, agente de software, termostatos….

Figura 01 – Diagrama esquemático de um agente reativo simples.

14 / 161

Agentes Reativos Simples

15 / 161

Agentes Reativos Baseados em Modelo

16 / 161

Agentes Baseados em Objetivos

17 / 161

Agentes Baseados na Utilidade

18 / 161

Agentes Baseados em Aprendizado

19 / 161

Ambiente: Onde os agentes vivem e atuam Propriedades dos Ambientes Observável x Parcialmente Observável Determinístico x Estocástico Episódico x Seqüencial Estático x Dinâmico Discreto x Contínuo Agente Único x Multiagente

20 / 161

Uncertainty (Partially observed or stochastic) environments?

21 / 161

Example 1: Breast Cancer

22 / 161

Example 2: People vs Collins

23 / 161

Example 2: People vs Collins – cont.

Is the probability estimate correct? No. The product rule does not apply in this case!! What is the probability of the couple being guilty?

24 / 161

Example 2: People vs Collins – cont. The pieces of evidence are NOT independent!!!

Furthermore, P(h|e) is not equal to 1-P(e| not h), but:

And by Sum-out..

P(e|h) ?

25 / 161

Example 2: People vs Collins – cont.

Let’s say there are 1,625,000 eligible males and as many female in Los Angeles area...so:

26 / 161

Revisão de Conceitos Básicos de ProbabilidadeP(A | K) – probabilidade condicional ou posterior.Crença em A, dado o corpo de informação K.

P(A) – probabilidade a priori: Crença em A, na falta de informação adicional proveniente de K.

Uma Variável aleatória tem um domínio (conjunto de valores)e associada a cada um a probabilidade de ocorrência daquele valor. Essa função é chamada de distribuição de Probabilidade.Exemplo: Variável Tempo = {Sol, Chuva, Nublado}P(Tempo) – é uma distribuição de probabilidadeP(Tempo) = <0,7;0,2;0,1>

P(Tempo=sol) = 0.7P(Tempo=chuva) = 0.2P(Tempo=nublado) = 0.1

No caso contínuo, usa-se o termo função de densidade de probabilidade. Vamos focar no caso discreto.

27 / 161

Axiomas da Probabilidade Para qualqueis proposições A e B

28 / 161

Syntax

29 / 161

Syntax - 2

30 / 161

Probabilidade condicionalProbabilidade condicional ou posterior, e.g., P(cárie|dordedente) = 0.8

i.e., dado que dordedente é tudo que conheço, a chance de cárie (vista por mim) é de 80%.

P(Cárie | Dordedente) = Vetor de 2 elementos cada um com dois elementos. Por Exemplo: P(Cárie | Dordedente) = <<0,8;0,2>;<0,01;0,99>>

Se sabemos mais, e.g., cárie é também observada, então

P(cárie|dordedente, cárie) = 1OBS:

1) A crença menos específica permanece válida, mas pode ficar inútil.

2) A nova evidência pode ser inútil:

P(cárie|dordedente, Corinthians derrotado) = P(cárie|dordedente) = 0.8

NOTE A IMPORTÂNCIA DO CONHECIMENTO DO DOMÍNIO PARA QUALQUER PROCESSO DE INFERÊNCIA.

31 / 161

O Axioma Básico da Prob. condicional

Ou:

32 / 161

Regra da Cadeia

Prova:

33 / 161

Inversão Bayesiana (Regra de Bayes)

P(H|e): Probabilidade posterior

P(H): Probabilidade a priori

Por quê a fórmula é importante?

Muitas vezes P(e|H) é fácil de calcular, ao contrário de P(H|e)?

34 / 161

Another example: Meningitis Let’s assume 0.8 of people with Meningitis have stiff neck,

probability of Meningitis is 1 in 10000 and 0.1 of having stiff neck..

35 / 161

Normalization

Let’s say we want to calculate P(H|e), the posterior probability distribution over H given a piece of evidence e, and H has n possible values (h1 , …hn)

We can apply Baye’s rule for each value of H

Suming up and Noting that , . We have:

36 / 161

Normalization - 2 As,

P(e) is just a normalization factor in :

And we can calculate P(H|e), without explicitly estimating P(e), using a normalization factor α

37 / 161

Full joint distributions

38 / 161

Full joint distributions - 2

39 / 161

Inference from Full joint distributions

d - number of possible elements of variable, n – number of variables

40 / 161

lnteligência Artificial

Introdução a Redes Bayesianas




41 / 161

Sumário Interpretação de Probabilidades

Redes Bayesianas ou Redes de crença

Inferência probabilística

Aprendizado em método probabilísticos

Métodos simplificados: Bayes ingênuo e Noisy-OR

42 / 161

Interpretations of Probabilities

Do we need to toss a coin infinity (or many times) to make statements about the probability of it landing head in one specific toss?

The alternative view of probability is to think of probabilities as reporting our subjective degrees of belief. This view was expressed by Thomas Bayes (1958) and Pierre Simon de Laplace (1951)

43 / 161

Principal Principle and Conditionalization

44 / 161

Rede Bayesiana ou Rede de Crença (Belief Network)

45 / 161

Example: Is it an Earthquake or burglar?

46 / 161

Example - 2

47 / 161

Markov Blanket (Cobertor de Markov)

48 / 161

Método para construção de uma rede

49 / 161

Exemplo

50 / 161

51 / 161

52 / 161

53 / 161

54 / 161

55 / 161

Outro Exemplo: Conserto de Carro

56 / 161

Exemplo: Seguro de Carro Problema: Estimar custos (Medical, Liability, Property)

dados as informações do segurado e outras disponíveis por outras fontes (em Cinza)

57 / 161

I-map and D-map and Perfect Map I-map: If there are no direct dependencies in the system

being modeled which are not already explicitly shown via arcs, the BN is called an Independence Map (or, I-map for short).

D-map: If every arc in a BN happens to correspond to a direct dependence in the system, then the BN is said to be a Dependence-map (or, D-map for short).

A BN which is both an I-map and a D-map is said to be a perfect map.

58 / 161

Sumário Redes Bayesianas ou Redes de crença




59 / 161

Inferência em Redes Bayesianas Dada uma rede, devemos ser capaz de inferir a partir dela

isto é :

Busca responder questões simples, P(X| E=e) Ex.:

Ou questões conjuntivas: P( Xi , Xj | E=e) Usando o fato:

A inferência pode ser feita a partir da distribuição conjunta total ou por enumeração

60 / 161

Inferência com Distribuição Conjunta Total: Exemplo Por exemplo para saber P(A|b) temos P(A|b)= P(A,b)/P(b)=

<P(a, b)/P(b);P(⌐a , b)/P(b) > =

= α < P(a, b);P(⌐a , b)> = α [ <P(a,b,c)+P(a,b,⌐c); P(⌐a,b,c)+P(⌐a,b, ⌐c)>]

Observe que α pode ser visto como um fator de normalização para o vetor resultante da distribuição de probabilidade, pedida P(A|b). Assim pode-se evitar seu cálculo, Simplesmente normalizando <P(a,b); P(⌐a , b) >

62 / 161

Inferência por Enumeração

63 / 161

Inferência por Enumeração - 2

Pode ser melhorada através do armazenamento dos valores já calculados (Programação Dinâmica)

64 / 161

Calculando P(b) não normalizado"P(b) nao normalizado"

0,0005922

0,001

+ 0,5922426

0,001197 0,591046

* 0,002 * 0,998

+ 0,598525 + 0,59223

0,5985 0,000025 0,5922 0,00003 Produtorio

0,95 0,05 0,94 0,06

0,9 0,01 0,9 0,01

0,7 0,05 0,7 0,05

65 / 161

Calculando P(não b) não normalizado"P(nao b) nao normalizado"

0,001492

0,999

+ 0,001493

0,000366 0,001127

* 0,002 * 0,998

+ 0,183055 + 0,00113

0,1827 0,000355 0,00063 0,0005

0,29 0,71 0,001 0,999

0,9 0,01 0,9 0,01

0,7 0,05 0,7 0,05

66 / 161

Algoritmo de Enumeração

67 / 161

Inferência por Enumeração Algoritmo de Enumeração permite determinar uma

distribuição de probabilidade condicional P(variável de saída| evidências conhecidas)

Também é possível responder perguntas conjuntivas usando o fato:

Demonstração?….

68 / 161

Demonstração

como:

69 / 161

Inferência por Enumeração Como observado, a enumeração tende a recalcular várias

vezes alguns valores

Pode-se eliminar parte do retrabalho através da técnica de programação dinâmica (eliminação de variável)… Basicamente, os valores já calculados são armazenados em uma tabela e selecionados quando novamente necessários…(mais informações Russel, cap. 14)

72 / 161

Sumário Redes Bayesianas ou Redes de crença




73 / 161

Aprendizado em modelos probabilísticos Aprender em redes bayesianas é o processo de

determinar a topologia da rede (isto é, seu grafo direcionado) e as tabelas de probabilidade condicional

Problemas? Como determinar a topologia? Como estimar as probabilidades ? Quão complexas são essas tarefas?

Isto é quantas topologias e quantas probabilidades precisariam ser determinadas….

74 / 161

Tamanho das Tabelas de Probabilidade Condicional e Distribuição Conjunta Total Vamos supor que cada variável é influenciada por no máximo k outras variáveis

(Naturalmente, k<n=total de variáveis).

Supondo variáveis booleanas, cada tabela de probabilidade condicional (CPT) terá no máximo 2k entradas (ou probabilidades). Logo ao total haverá no máximo n* 2k entradas

Enquanto, na distribuição conjunta Total haverá 2n entradas. Por exemplo, para n=30 com no máximo cinco pais (k=5) isto significa 960 ao invés de mais um bilhão (230)

75 / 161

Número de “entradas” da Distribuição Conjunta e na Rede Bayesiana - 2 Em domínios onde cada variável pode ser diretemante influenciada por todas

as outras, tem-se a rede totalmente conectada e assim exige-se a quantidade de entradas da mesma ordem da distribuição conjunta total

Porém se essa dependência for tênue, pode não valer a pena a complexidade adicional na rede em relação ao pequeno ganho em exatidão

Via de regra, se nos fixarmos em um modelo causal acabaremos tendo de especificar uma quantidade menor de números, e os números frequentemente serão mais fáceis de calcular. (Russel,Norvig, 2013, pg. 453)

Modelos causais são aqueles onde se especifica no sentido causa efeito, isto é P(efeito|causa) ao invés de P(causa|efeito), oque geralmente é necessário para diagnóstico

76 / 161

Simplificando a representação tabelas de probabilidade condicional (CPT) Vimos que que o número de entradas de uma CPT

cresce exponencialmente Para o caso binário e K pais, a CPT de um nó terá 2k

probabilidades a serem calculadas

Vejamos duas abordagens para simplificar a rede através da adoção de hipóteses simplificadoras Bayes Ingênuo e OU-ruidoso

77 / 161

Naïve Bayes (Bayes Ingênuo) Uma classe particular e simples de redes bayesianas é

chamada de Bayes Ingênuo (Naïve Bayes) Ela é simples por supor independência condicional entre

todas as variáveis X dada a variável Class As vezes, chamado também de classificador Bayes, por ser

frequentemente usado como abordagem inicial para classificação

78 / 161

Naïve Bayes (Bayes Ingênuo) - 2 A topologia simples traz a vantagem da representação

concisa da Distribuição Conjunta Total. Como todo os nós tem no máximo um pai, cada CPT de

no X tem apenas duas entradas e uma entrada no nó classe. Logo, (2n-1) entradas para toda a rede. Naïve Bayes é linear em relação ao número de nós (n) !!!!

“Na prática, sistemas de Bayes ingênuos podem funcionar surpreendentemente bem….”. pg. 438

79 / 161

Exemplo de Naïve Bayes Vamos retomar o exemplo do jogo de tênis

NÃOForteAltaBoaChuvosoX14SIMFracoNormalQuenteNubladoX13SIMForteAltaBoaNubladoX12SIMForteNormalBoaEnsolaradoX11SIMFracoNormalBoaChuvosoX10SIMFracoNormalFriaEnsolaradoX9NÃOFracoAltaBoaEnsolaradoX8SIMForteNormalFriaNubladoX7NÃOForteNormalFriaChuvosoX6SIMFracoNormalFriaChuvosoX5SIMFracoAltaBoaChuvosoX4SIMFracoAltaQuenteNubladoX3NÃOForteAltaQuenteEnsolaradoX2NÃOFracoAltaQuenteEnsolaradoX1JogarTênisVentoUmidadeTemperaturaCéuEx

80 / 161

Usando a abordagem Bayes ingênuo

Problema a resolver:

81 / 161

Solução: P(Play|Outlook,Temp,Hum,Wind)= P(Outlook,Temp,Hum,Wind|Play)P(Play)/P(Outlook,Temp,

Hum,Wind)= Regra da cadeia e indepêndencia: P(Outlook|Play)P(Temp|Play)P(Hum|Play)P(Wind|Play)P(Pl

ay)/ P(Outlook,Temp,Hum,Wind)

O método de inferência por enumeração já visto é aplicável!!!

Estima-se as probabilidades pelo conjunto de treinamento

83 / 161

Solução 3 - continuação Da mesma forma, P(sunny|play)P(cool|play)P(high|play)P(true|play)P(Play)/P(

e) = 3/5*1/5*4/5*3/5*5/14/P(e) =0.0206/P( e) Mas P(H,e) e P(not H,e) tem que somar 1, assim:

84 / 161

Estimativas de Probabilides Qual a estimativa da probabilidade

P(Outlook=overcast|Play=no)?

Zero! Isto é razoável? Como resolver? Uma Solução: estimador de Laplace (Laplace smoothing). Seja V o número

de valores possíveis para A, estima-se P(A|B) :

P(A=a|B=b) = [N(A=a,B=b)+1]/[N(B=b)+V]

[Rational ]Decisions with Bayesian Networks




86 / 161

[Rational] Decisions

Rational Preferences and Lotteries

Money x Utility

Decision Networks

87 / 161

Lotteries

88 / 161

Preferências Racionais

89 / 161

Violação das Restrições leva a “Irracionalidade”

90 / 161

Maximizing Expected Utility (MEU)

91 / 161

Estimando Utilidades

92 / 161

Definindo Funções de Utilidades através de loterias Dado o intervalo [0,1] entre a “pior catastrofe possível” e

“o melhor prêmio possível”, ao encontrar uma loteria [p,1;1-p,0] que seja indiferente a uma situação S o número p é a utilidade de S

Em ambientes, com prêmios determinísticos pode-se apenas estabelecer a ordem de preferências, nesse caso usa-se o termo utilidades ordinais

Funções de utilidades ordinais podem ser chamadas de funções de valor e são invariantes para qualquer transformação monotônica

93 / 161

Utilidade do dinheiro Preferências de um grupo sobre dinheiro certo (x) e

loteria [p,M;1-p,0]

94 / 161

Dinheiro vs Utilidade Dinheiro não tem uma relação linear (ou simples) com utilidade!

Ao estimar a utilidade em vários experimentos, observa-se que dada uma loteria L com valor esperado EMV(L) tem-se U(L) < U (EMV(L)), isto é as pessoas são aversas a risco

Um gráfico típico de dinheiro ($) vs Utilidade (U):

95 / 161

The Saint Petersburg Paradox The paradox is named from Daniel Bernoulli's presentation of the problem

and his solution, published in 1738 in St. Petersburg

A casino offers a game of chance for a single player in which a fair coin is tossed at each stage. The pot starts at 1 dollar and is doubled every time a head appears. The first time a tail appears, the game ends and the player wins whatever is in the pot.

Thus the player wins 1 dollar if a tail appears on the first toss, 2 dollars if a head appears on the first toss and a tail on the second.

Two questions: How much would you accept to pay for playing this game?

What is the expected monetary value of the game?

96 / 161

The Saint Petersburg Paradox As Bernoulli stated:

The determination of the value of an item must not be based on the price, but rather on the utility it yields…. There is no doubt that a gain of one thousand ducats is more significant to the pauper than to a rich man though both gain the same amount

Bernoulli proposed that utility of money should be logarithmic. U(M)= a*log2(M)+b

This makes EMV to be a finite value.

But it’s always possible to recreate the paradox by changing the function!!! Alternative theories may provide a better description model

(Prospect Theory)

http://en.wikipedia.org/wiki/Ducat

97 / 161

Problemas na Teoria da maximização da utilidade esperada A teoria da maximização da utilidade esperada é uma

teoria normativa. Ela descreve como um agente deve reagir. Entretanto, não é uma teoria descritiva da tomada de decisões reais

Há evidências experimentais que as pessoas violam os axiomas da teoria da utilidade

98 / 161

Escolha A ou B

A: 80% de chance de ganhar $4000

B: 100% de chance de ganhar $3.000

99 / 161

Escolha C ou D

C: 20% de chance de ganhar $4000

D: 25% de chance de ganhar $3.000

100 / 161

Supondo U(0)=0 Se maioria escolhe B em detrimento de A e C em

detrimento de D,

De A e B, temos que 0,8*U(4000)<U(3000)

De C e D temos que 0,8U(4000)>U(3000)

Contraditório!!!!

101 / 161

Teorias alternativas Em linhas gerais as pessoas divergem da teoria da

maximização da utilidade esperada em situações de probabilidade muito alta e/ou muito baixa

Há algumas teorias alternativas que se propõem a descrever o comportamento humano real. Uma das mais relevantes foi proposta por Kahneman e Tversky. Esta teoria propõe um modelo alternativo que descreve esse efeito “certeza” e outros

102 / 161

Decisões [Racionais] com Redes Bayesianas

Preferências Racionais

Utilidades x Dinheiro

Redes de Decisão

Classificação e Avaliação de classificadores

103 / 161

Decision Networks By now we know how to use Bayesian networks to

represent uncertainty and do probabilistic inference.

Now, we extend them to support decision making adding an explicit representation of both the actions under consideration and the value or utility of the resultant outcomes gives us decision networks (also called influence diagrams by Howard and Matheson, 1981).

Bayesian decision networks combine probabilistic reasoning with utilities, helping us to make decisions that maximize the expected utility,…

104 / 161

Principle of Expected Utility The principle of maximum expected utility asserts that an

essential part of the nature of rational agents is to choose that action which maximizes expected utility.

In some cases U(Oi | A) = U(Oi )

105 / 161

Decision Network Node types

106 / 161

Rede de Decisão Nós de acaso: (elipses) representam variáveis aleatórias.

Cada nó de acaso tem um distribuição de probabilidade condicional (dados os nós pais)

Nós de Decisão : (retângulos) representam as possíveis ações

Nós de utilidade: (losangos) representam as preferências do agente e podem ser usadas para definir as ações através da seleção da ação que maximiza a utilidade esperada

107 / 161

Assume that there is 30% of raining and in that case Melbourne has 60% of wining, but only 25% if it is not wet…

Make your decision network! Variables, conditional dependence, action and utility node

108 / 161

Possible solution…

109 / 161

Evaluating Decision Networks

110 / 161

Estimating Expected Utility Without evidence added, the probability of Melbourne

winning is

And:

111 / 161

Outro Exemplo: Escolha da localização de um aeroporto Dependendo da posição pode-se alterar:

O risco de acidentes (logo, o número esperado de mortes.. Deaths

O incômodo causado pelo barulho dos aviões, quanto mais próximo de uma cidade pior….Noise

É fácil perceber que Deaths e Noise serão diretamente afetados pelo volume de tráfego aéreo no aeroporto.

Naturalmente, o custo também é alterado pela localização do aeroporto (Cost) a desapropriação de um determinado terreno pode ser mais ou menos

litigioso….e os custos de ligação de transportes do aeroporto a cidade podem ser maiores ou menores afetando a construção

112 / 161

Decision Network

113 / 161

Podemos determinar distribuições de variáveis, mas como decidir? Nós de ação e nós de utilidade na rede;

114 / 161

Processo de Decisão… pi = P(deaths=i| ASite=s, Noise=n) Ou

P(outcome| action,evidence)

Utilidade Esperada (action=a) = ∑i U(outcomei)*P(outcome i|action=a,evidence)

Escolher ação que maximiza a utilidade esperada

115 / 161

Decisões com Redes Bayesianas Preferências Racionais

Utilidades x Dinheiro

Redes de Decisão Redes de Decisão e Decisão Sequenciada

Modelo Decisório de Markov

116 / 161

Information Links: When a variable need to be observed There may be arcs from chance nodes to decision nodes

— these are called information links (Jensen and Nielsen, 2007, p. 305).

These links indicate when a chance node needs to be observed before the decision D is made — but after any decisions prior to D.

With an information link in place, network evaluation can be extended to calculate explicitly what decision should be made, given the different values for that chance node.

117 / 161

Information Link: Example To illustrate the use of information links, let’s use again

the Melbourne football team example, but now Clare was only going to decide whether to accept the bet or not after she heard the weather forecast

Previously :

118 / 161

Previously Clare decided don’t accept the bet What if there is a reasonable Forecast about the weather? How

to change the network?

119 / 161

Introducing the New Node, Information Link and Decision Table

120 / 161

Decision Table Algorithm (single decision node, with information Link)

122 / 161

Decision Table with recorded actions (highest expected utility)

123 / 161

Another Example: Fever

Decision network?

Variables: Fever, Flu, Therm, Reaction, FeverLater?

Decision: Take aspirin

Utility function: What is relevant to the utility function?

124 / 161

A Decision Network for the Fever example

The action influences chance nodes, instead of the utility node !!!

125 / 161

Decision Table obtained by following the algorithm

126 / 161

Types of Actions There are two main types of actions in decision problems,

intervening and nonintervening. Non-intervening actions do not have a direct effect on

the chance variables being modeled, as in the Football team example.

Intervening actions do have direct effects on the world, in the fever example, deciding to take aspirin will affect the later fever situation.

127 / 161

Types of actions

Decision Making with Bayesian Networks




129 / 161

Sequential Decision Making Thus far, we have considered only single decision problems. Often however

a decision maker has to select a sequence of actions, or a plan. Sometimes, a sequence of action is not enough and it is required a function given observations

In the football decision problem used before, Clare might have a choice as to whether to obtain the weather forecast (perhaps by calling the weather bureau).

In the diagnostics example, the physician must decide whether to order another exam, before deciding on a treatment option.

This type of decision problem has two stages: 1. The decision whether to run a test or make an observation 2. The selection of a final action

130 / 161

A decision network showing the general structure for these test-act decision sequences….

131 / 161

Test-action Decision Sequence If the decision is made to run the test, evidence will be

obtained for the observation node Obs, before the Action decision is made; hence there is an information link from Obs to Action

And if we decide do not to run the test. ? This situation is handled by adding an additional state, unknown, to the Obs node and setting the CPT for Obs

132 / 161

Test-action Decision Sequence - 2 In this generic network, there are arcs from the

Action node to both the chance node X and the utility node U, indicating intervening actions with a direct associated cost. However, either of these arcs may be omitted, representing a non-intervening action or one with no direct cost, respectively

There is an implicit assumption of no-forgetting in the semantics of a decision network. The decision maker remembers the past observations and decisions, indicated explicitly by the information and precedence links

133 / 161

Algorithm for Test-action Decision Sequence

134 / 161

Real estate investment example Your friend is thinking about buying a house as an investment. While it looks fine

externally, he knows that there may be structural and other problems..

He estimates that there is a 70% chance that the house is really in good condition, with a 30% chance that it could be a real dud

He plans to resell the house after doing some renovation and estimates that if the house really is in good condition (i.e., structurally sound), he should make a $5,000 profit, but if it isn’t, he will lose about $3,000 on the investment

He knows that he can get a building contractor to do a full inspection for $600. He also knows that the inspection report may not be completely accurate. He estimates 5% of Bad report for a good house, and 10% of Good report for a Bad house.

The questions are:

Is it worth it to have the building inspection done?

.. and then he should buy the house or not?.

Exercise: Build The Decision Network !!!

135 / 161

A Decision Network

136 / 161

Evaluating by Decision tree When Your friend decides about Inspection, he doesn’t have any information about

the chance nodes, so there are no information links entering the Inspect decision node.

When he decides whether or not to buy, he will know the outcome of that decision hence the information link from Report to BuyHouse.

The temporal ordering of his decisions, first about the inspection, and then whether to buy, is represented by the precedence link from Inspect to BuyHouse.

Note that there is a directed path from Inspect to BuyHouse (via Report) so even if there was no explicit precedence link added by the knowledge engineer for this problem, the precedence could be inferred from the rest of the network structure

137 / 161

Evaluation using a decision tree model In order to show the evaluation of the decision network,

we will use a decision tree representation for the problem

To understand a decision tree, we start with the root node, which in this case is the first decision node, whether or not to inspect the house. Taking the path from the root to leaves each path means: From a decision node, it indicates which decision is made From a chance node, it indicates which value has been

observed

138 / 161

I: Inspect the house BH: Buy the house C: Condition R: Report

Note: We could include

R=unknown, when I=no

But it wouldn’t change

anything

139 / 161

How to decide? Decision Tree Evaluation Algorithm

140 / 161

Evaluated Decision Tree ExpectedUtilities areshown underlined

141 / 161

Decisions and their Expected Utilities

The report may change the decision of buying the house!

142 / 161

Value of information The decision of whether to gather new information is

based on the value of the information or Expected Benefit

In the real estate investment problem,

Note that it already computes the cost of the inspection. So the price is worth paying

143 / 161

Sequential Decision Making Sequential Decision Making may be approached using

another type of Graph Probabilistic Model (Modelos Probabilísticos em Grafos) Markov Chain (Redes ou Cadeias de Markov)

Sequential Decision making is complex in Bayesian Networks, one way of dealing with it is using Dynamic Bayesian Network (DBN) which is going to be our next subject

Dynamic Bayesian networks

145 / 161

Dynamic Bayesian networks DBN are also called dynamic belief networks (Russell and

Norvig, 1995, Nicholson, 1992), probabilistic temporal networks (Dean and Kanazawa, 1989, Dean and Wellman, 1991) and dynamic causal probabilistic networks (Kjærulff, 1992).

Dynamic Bayesian networks (DBNs) explicitly model change over time.

After, we will extend these DBNs with decision and utility nodes, to give dynamic decision networks, which are a general model for sequential decision making or planning under uncertainty.

146 / 161

Bayesian Networks and time Although a causal relationship represented by an arc implies a temporal

relationship, BNs do not explicitly model temporal relationships between variables.

And the only way to model the relationship between the current value of a variable, and its past or future value, is by adding another variable with a different name. We saw an example of this with the fever example earlier with the use of the FeverLater node.

147 / 161

BN and time A Bayesian network is defined by a set of random variable and

arcs connecting them

When constructing a DBN for modeling changes over time, we include one node for each Xi for each time step. If the current time step is represented by t, the previous time step by t-1, and the next time step by t +1, then the corresponding DBN nodes will be:

148 / 161

Dynamic Bayesian Network The relationships between variables in a time-slice are

represented by intra-slice arcs, Although it is not a requirement, the structure of a time-slice does not usually change over time

The relationships between variables at successive time steps are represented by inter-slice arcs, also called temporal arcs, including relationships between the same variable over time, i.e: Xi

t Xit+1

149 / 161

General structure of a Dynamic Bayesian Network

Note that there are no arcs that span more than a single time step. This is another example of the Markov assumption

150 / 161

Variables and their relations intra-slice and inter-slice

Given the usual restriction that the networks for each time slice are exactly the same and that the changes over time also remain the same (i.e., both the structure and the CPTs are unchanging), a DBN can be specified very compactly. The specification must include: Node names

Intra-slice arcs

Temporal (inter-slice) arcs

CPTs for the first time slice t0 (when there are no parents from a previous time)

CPTs for t +1 slice (when parents may be from t or t +1 time-slices).

151 / 161

The Fever Aspirin Example

152 / 161

DBN Fever Aspirin Example

153 / 161

Reasoning in DBN Given evidence about a set of nodes, from the first time slice up to and

including the current time-slice t, we can perform belief updating on the full DBN, using standard BN inference algorithms.

This means obtaining new posterior distributions for all the non-evidence nodes, including nodes in the t+1 and later time-slices. This updating into the future is called probabilistic projection

However, this type of DBN gets very large, very quickly, especially if the interval between time slices is short. To cope, in most cases the DBN is not extended far into the future. Instead, a fixed size, sliding “window” of time slices is maintained.

154 / 161

DBN and sliding “window” of two time-slices (Shading indicates evidence node.)

155 / 161

Sliding window… As the reasoning process moves forward with time, one

older time slice is dropped off the DBN, while another is added.

This use of a window means that every time we move the window along, the previous evidence received is no longer directly available. Instead, it is summarized taking the current belief for (root) nodes, and making these distributions the new priors

The DBN updating process is given in Algorithm 4.5. Note that the steps of this DBN updating algorithm are exactly those of a technique used in classical control theory, called a Kalman Filter

156 / 161

DBN updating process

157 / 161

Inference in DBN Exact clustering algorithms can be applied to DBNs,

particularly if the inference is restricted to two time-slices

Unfortunately, it is common that there is a cluster containing all the nodes in a time slice with inter-slice connections, so the clusters become hard computationally

158 / 161

Dynamic decision networks Just as Bayesian networks can be extended with a

temporal dimension to give DBNs, so can decision networks be extended to give dynamic

decision networks (DDNs).

159 / 161

A DDN for the Fever problem

160 / 161

Mobile Robot Example

The problem of a mobile robot that does localization and tracking can be modeled with a DDN as follows: The nodes ST and SR represent the locations of the target and the robot, respectively

The decision node is M, representing the robot’s movement actions options

The nodes OR and OT represent the robot’s observations of its own and the target’s location, respectively

The overall utility is the weighted sum over time of the utility at each step Ut , which is a measure of the distance between the robot and its target.

161 / 161

Mobile Robot Example

introdução a inteligência artificialpauloac/pee/introia.pdfo termo inteligência artificial foi...

Documents