

06 July 2021

Recent History of Bayes' Theorem


Pedro Demo (photo) tells the history of Bayes' theorem in a didactic way. The theorem's conquest of the sciences includes some very interesting "episodes":

(...) Four years into their work on The Federalist, Mosteller and Wallace had a breakthrough idea. A historian, Adair, claimed that a 1916 study showed Hamilton used "while" whereas Madison used "whilst". The problem was that the terms were rarely used, not to mention typesetting and editing errors. Mosteller gradually brought computational power into the research. They then arrived at Bayesian priors, which could be useful for getting started and could later be refined with objective data. In summary: Bayes had turned his back on his own creation; a quarter-century later, Laplace glorified it; in the 1800s it was used and undermined; ridiculed in the early 1900s, it was employed in desperate secrecy during the Second World War and afterwards used with vigor and condescension. But by the 1970s the rule was in decline (Jones, 1986:686; Box & Tiao, 1973:1). Loss of leadership, a series of career changes, and geographic moves contributed to the retreat. Savage, the chief American spokesman for Bayes as a comprehensive logical system, died of a heart attack in 1971; after Fermi's death, Jeffreys and the American physicist Jaynes campaigned in vain for Bayes in the physical sciences; Jaynes, too fervent, alienated colleagues. Lindley gradually built Bayesian statistics departments in the United Kingdom, but left administration in 1977 to do solo research. Good moved from England's top-secret code and decryption agencies to academia at Virginia Tech. Madansky, who liked any technique that worked, moved from RAND to private business and then to the University of Chicago Business School, where he expected to find more applications than in statistics departments. Box became interested in quality control in manufacturing and, like Deming and others, advised the Japanese automobile industry. Raiffa also moved into public policy, while Schlaifer, the non-mathematical Bayesian, tried his hand at programming computers. (...)

When Box, J.S. Hunter, and W.G. Hunter wrote Statistics for Experimenters in 1978, they intentionally omitted Bayes' rule: too controversial to sell. The book in fact became a best-seller. Ironically, an Oxford philosopher, Swinburne, did not see it that way: he inserted personal opinions into both the prior guess and the supposedly objective data of Bayes' theorem to conclude that God had a better than 50% probability of existing; later, Swinburne would put the probability of Jesus's resurrection at close to 97%. These were calculations that neither Bayes, reverend though he was, nor Price, also a reverend, had ever dreamed of making, and they served only to wear down Bayes' reputation. In that period, Neyman's frequentist bastion at Berkeley became the leading center of statistics in the United States. The large statistics department at Stanford, led by Stein and others, was frequentist, or anti-Bayesian.

Eisenhower had launched the nuclear power industry with his Atoms for Peace program in 1953. Twenty years later, there were no comprehensive studies of its risks, even though private corporations owned and operated 50 American plants. When Congress began debating whether to absolve plant owners and operators of liability for all accidents, the Atomic Energy Commission finally ordered a thorough study. It appointed not a statistician but a physicist and engineer: Rasmussen, who had been a sailor in the Second World War, graduated from Gettysburg College in 1950, and earned a Ph.D. in low-energy nuclear physics from MIT in 1956. He taught physics there until MIT formed one of the first departments of nuclear engineering in 1958. No nuclear accident had yet occurred. But since one could be catastrophic, it was necessary to know. Lacking data on meltdowns, Rasmussen decided to do what Madansky had done at RAND when studying H-bomb accidents: turn to expert opinion and Bayesian analysis. Engineers had always valued professional judgment, but the frequentists saw too much subjectivity in it. The Vietnam War had done away with expert oracles, and confidence in leaders had also crumbled. Faith in technology had fallen too; in 1971 Congress canceled its funding of the supersonic passenger airplane, the SST, the first time the United States rejected a new technology. With no data on the plants, Rasmussen and his team would have to rely on the prior information available. Avoiding Bayes' equation, they turned to Raiffa's decision trees. Raiffa had been a Bayesian missionary and his trees had roots in Bayes; the term "Bayes' rule" was avoided, and the method was called a subjective approach. The final report of 1974 was loaded with Bayesian uncertainties and probability distributions over equipment failure rates and human errors. Frequentists did not assign probability distributions to unknowns.
The only reference to Bayes was hidden in a corner of Appendix III: "Treating data as random variables is sometimes associated with the Bayesian approach..., the Bayesian interpretation may also be used" (M:180). (...)

23 June 2021

On Bayes' Theorem


Pedro Demo offers a good retrospective on Bayes' theorem:

(...) His work is revered today; it is worth recognizing what Bayes did not do. He did not produce the modern version of Bayes' rule; he did not even use any algebraic equation; he used Newton's old-fashioned geometric notation to calculate and add areas; nor did he develop his theorem into a powerful mathematical method; above all, unlike Price, he did not mention Hume, religion, or God (Holder, 1998; Hume, 1748; Miller, 1994; Owen, 1987).

Instead, he cautiously confined himself to the probability of events and did not mention the business of hypothesizing, predicting, deciding, or taking action. He suggested no possible uses, whether in theology, science, or social science; future generations would extend the discovery to everything and to the solution of a thousand practical problems. He did not even name his discovery. It would be called the probability of causes, or inverse probability, for 200 years; it would not be "Bayesian" until the 1950s. In truth, he took the first steps. Laplace would give it its proper mathematical form shortly afterwards (Bellhouse, 2002; 2007a; 2008b).

The text includes an extensive literature on the subject.

Photo: here

15 June 2020

Unconditional love, according to Bayes

This Aeon essay explains what love would be in the Bayesian approach. What is love? Using Bayesian probability theory, the text draws a distinction between conditional and unconditional love. Here is my favorite passage:

Alternatively, unconditional love is love that will not change according to any information, as it was not built on the basis of information in the first place. This is love without reason, where no evidence or information can alter it. Why do you love someone? For no reason!


At the end, there is a simple mathematical demonstration of what unconditional love would be.
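The demonstration the essay alludes to can be sketched in a few lines (a minimal illustration with assumed numbers, not the essay's actual derivation): conditional love shifts with evidence via Bayes' rule, while unconditional love corresponds to a likelihood ratio of 1, so no evidence can move the prior.

```python
def posterior(prior, p_evidence_if_love, p_evidence_if_not):
    """Bayes' rule: P(love | evidence)."""
    num = prior * p_evidence_if_love
    return num / (num + (1 - prior) * p_evidence_if_not)

# Conditional love: the evidence (say, a kind act) is more likely if the
# love is real, so the belief moves with the information.
print(posterior(0.5, 0.9, 0.3))  # rises from 0.5 to 0.75

# Unconditional love: the evidence is equally likely either way
# (likelihood ratio = 1), so no information changes the prior.
print(posterior(0.5, 0.7, 0.7))  # stays at 0.5
```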

(What does this have to do with accounting? Well, our love for accounting techniques is typically conditional. Or it should be. New information should change our love. So should our love for accounting beliefs. The postulates would be the closest thing to unconditional love.)

19 October 2017

The p-value question in scientific research

Scientific research generally works with the p-value, usually at the 5% level. Recently, however, much criticism has been directed at this statistic. Some journals are simply banning the term. And many studies, when replicated, yield a p-value above 5%, which "fails to confirm" the conclusions of the published work.

Many suggestions try to solve this problem through more rigorous tests and the replication of studies. Another solution would be to change the traditional p-value threshold of 5%:

This inconsistency is typical of many scientific studies. It is particularly common for p-values around 0.05. This explains why such a high proportion of statistically significant results fail to replicate.

In September, my colleagues and I proposed a new idea: only p-values below 0.005 should be considered statistically significant. P-values between 0.005 and 0.05 should be called suggestive.

Under our proposal, statistically significant results are more likely to replicate, even after accounting for the small prior probabilities that typically apply to studies in the social, biological, and medical sciences.

In addition, we think that statistical significance should not serve as a bright-line threshold for publication. Statistically suggestive results, or even results that are largely inconclusive, could also be published, based on whether or not they report important preliminary evidence that a new theory might be true.


The basis of the authors' idea is Bayes' theorem.
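The Bayesian logic behind the proposal can be sketched with a simple false-discovery calculation (the prior and power figures below are assumed for illustration, not the authors' numbers): when only a small fraction of tested hypotheses are true, a looser threshold lets far more false positives through among the "significant" results.

```python
def false_discovery_rate(prior_true, power, alpha):
    """Among results declared significant, the share that are false positives."""
    true_pos = prior_true * power          # true effects correctly detected
    false_pos = (1 - prior_true) * alpha   # null effects crossing the threshold
    return false_pos / (true_pos + false_pos)

# Assume 1 in 10 tested hypotheses is true and studies have 80% power.
for alpha in (0.05, 0.005):
    fdr = false_discovery_rate(prior_true=0.1, power=0.8, alpha=alpha)
    print(f"alpha = {alpha}: about {fdr:.0%} of 'significant' results are false")
```

With these assumptions, the 0.05 threshold lets roughly a third of significant findings be false positives, while 0.005 brings that down to a few percent, which is the intuition behind calling the 0.005-to-0.05 band merely "suggestive".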

11 October 2014

The Power of Bayesian Statistics


Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

The man owes his life to a once obscure field known as Bayesian statistics — a set of mathematical rules for using new data to continuously update beliefs or existing knowledge.

The method was invented in the 18th century by an English Presbyterian minister named Thomas Bayes — by some accounts to calculate the probability of God’s existence. In this century, Bayesian statistics has grown vastly more useful because of the kind of advanced computing power that did not exist even 20 years ago.

It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge (though not, so far, in the hunt for Malaysia Airlines Flight 370).

Now Bayesian statistics are rippling through everything from physics to cancer research, ecology to psychology. Enthusiasts say they are allowing scientists to solve problems that would have been considered impossible just 20 years ago. And lately, they have been thrust into an intense debate over the reliability of research results.

When people think of statistics, they may imagine lists of numbers — batting averages or life-insurance tables. But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced it had analyzed 53 cancer studies and found it could not replicate 47 of them.

Similar follow-up analyses have cast doubt on so many findings in fields such as neuroscience and social science that researchers talk about a "replication crisis."

Some statisticians and scientists are optimistic that Bayesian methods can improve the reliability of research by allowing scientists to crosscheck work done with the more traditional or “classical” approach, known as frequentist statistics. The two methods approach the same problems from different angles.

The essence of the frequentist technique is to apply probability to data. If you suspect your friend has a weighted coin, for example, and you observe that it came up heads nine times out of 10, a frequentist would calculate the probability of getting such a result with an unweighted coin. The answer (about 1 percent) is not a direct measure of the probability that the coin is weighted; it’s a measure of how improbable the nine-in-10 result is — a piece of information that can be useful in investigating your suspicion.

By contrast, Bayesian calculations go straight for the probability of the hypothesis, factoring in not just the data from the coin-toss experiment but any other relevant information — including whether you have previously seen your friend use a weighted coin.
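The coin example above can be worked out directly. The frequentist side is the binomial tail probability the article cites ("about 1 percent"); the Bayesian side needs a prior and a model of the weighted coin, so the numbers below (a 10% prior that the friend cheats, a coin that lands heads 90% of the time) are assumptions for illustration.

```python
from math import comb

# Frequentist: probability of 9 or more heads in 10 tosses of a fair coin.
p_value = sum(comb(10, k) for k in (9, 10)) / 2**10
print(p_value)  # 11/1024 ≈ 0.011, the "about 1 percent" in the text

# Bayesian: probability the coin is weighted, given the same data.
def lik(p_heads):
    """P(exactly 9 heads in 10 tosses | coin with this heads probability)."""
    return comb(10, 9) * p_heads**9 * (1 - p_heads)

prior = 0.1  # assumed prior belief that the friend uses a 90%-heads coin
post = prior * lik(0.9) / (prior * lik(0.9) + (1 - prior) * lik(0.5))
print(post)  # the direct answer to "is the coin weighted?"
```

Note the contrast the article draws: the p-value measures how surprising the data would be under a fair coin, while the posterior answers the question actually asked, at the cost of committing to a prior.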

Scientists who have learned Bayesian statistics often marvel that it propels them through a different kind of scientific reasoning than they had experienced using classical methods.

“Statistics sounds like this dry, technical subject, but it draws on deep philosophical debates about the nature of reality,” said the Princeton University astrophysicist Edwin Turner, who has witnessed a widespread conversion to Bayesian thinking in his field over the last 15 years.

Countering Pure Objectivity

Frequentist statistics became the standard of the 20th century by promising just-the-facts objectivity, unsullied by beliefs or biases. In the 2003 statistics primer “Dicing With Death,” Stephen Senn traces the technique’s roots to 18th-century England, when a physician named John Arbuthnot set out to calculate the ratio of male to female births.

Arbuthnot gathered christening records from 1629 to 1710 and found that in London, a few more boys were recorded every year. He then calculated the odds that such an 82-year run could occur simply by chance, and found that it was one in trillions. This frequentist calculation can't tell us what is causing the sex ratio to be skewed. Arbuthnot proposed that God skewed the birthrates to balance the higher mortality that had been observed among boys, but scientists today favor a biological explanation over a theological one.
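As a quick order-of-magnitude check (a sketch, not Arbuthnot's actual computation): under the null hypothesis that each year is equally likely to show a male or female majority, an 82-year male-majority run has probability one-half raised to the 82nd power.

```python
# Null hypothesis: each year, boys or girls lead the christening counts
# with equal probability, independently. Probability of 82 male-majority
# years in a row:
p = 0.5 ** 82
print(p)      # ≈ 2.1e-25
print(1 / p)  # odds of roughly 1 in 5e24 — "one in trillions" and then some
```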

Later in the 1700s, the mathematician and astronomer Daniel Bernoulli used a similar technique to investigate the curious geometry of the solar system, in which planets orbit the sun in a flat, pancake-shaped plane. If the orbital angles were purely random — with Earth, say, at zero degrees, Venus at 45 and Mars at 90 — the solar system would look more like a sphere than a pancake. But Bernoulli calculated that all the planets known at the time orbited within seven degrees of the plane, known as the ecliptic.

What were the odds of that? Bernoulli's calculations put them at about one in 13 million. Today, this kind of number is called a p-value, the probability that an observed phenomenon or one more extreme could have occurred by chance. Results are usually considered "statistically significant" if the p-value is less than 5 percent.

[Photo: The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge. Credit: Daniel Shea]

But there is a danger in this tradition, said Andrew Gelman, a statistics professor at Columbia. Even if scientists always did the calculations correctly — and they don’t, he argues — accepting everything with a p-value of 5 percent means that one in 20 “statistically significant” results are nothing but random noise.

The proportion of wrong results published in prominent journals is probably even higher, because such findings are often surprising and appealingly counterintuitive, said Dr. Gelman, an occasional contributor to Science Times.

Looking at Other Factors

Take, for instance, a study concluding that single women who were ovulating were 20 percent more likely to vote for President Obama in 2012 than those who were not. (In married women, the effect was reversed.)

Dr. Gelman re-evaluated the study using Bayesian statistics. That allowed him to look at probability not simply as a matter of results and sample sizes, but in the light of other information that could affect those results.

He factored in data showing that people rarely change their voting preference over an election cycle, let alone a menstrual cycle. When he did, the study’s statistical significance evaporated. (The paper’s lead author, Kristina M. Durante of the University of Texas, San Antonio, said she stood by the finding.)

Dr. Gelman said the results would not have been considered statistically significant had the researchers used the frequentist method properly. He suggests using Bayesian calculations not necessarily to replace classical statistics but to flag spurious results.

A famously counterintuitive puzzle that lends itself to a Bayesian approach is the Monty Hall problem, in which Mr. Hall, longtime host of the game show “Let’s Make a Deal,” hides a car behind one of three doors and a goat behind each of the other two. The contestant picks Door No. 1, but before opening it, Mr. Hall opens Door No. 2 to reveal a goat. Should the contestant stick with No. 1 or switch to No. 3, or does it matter?

A Bayesian calculation would start with one-third odds that any given door hides the car, then update that knowledge with the new data: Door No. 2 had a goat. The odds that the contestant guessed right — that the car is behind No. 1 — remain one in three. Thus, the odds that she guessed wrong are two in three. And if she guessed wrong, the car must be behind Door No. 3. So she should indeed switch.
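The two-in-three answer is easy to verify empirically. This short simulation (a sketch of the problem as described above, with the host always opening a goat door the contestant did not pick) shows the switching strategy winning about two-thirds of the time.

```python
import random

def monty_hall_trial(switch):
    """One game: car placed at random, contestant picks Door 0 (No. 1)."""
    car = random.randrange(3)
    pick = 0
    # Host opens a door that hides a goat and is not the contestant's pick.
    opened = next(d for d in (1, 2) if d != car)
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in range(3) if d not in (pick, opened))
    return pick == car

n = 100_000
wins = sum(monty_hall_trial(switch=True) for _ in range(n))
print(wins / n)  # close to 2/3, as the Bayesian calculation predicts
```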

In other fields, researchers are using Bayesian statistics to tackle problems of formidable complexity. The New York University astrophysicist David Hogg credits Bayesian statistics with narrowing down the age of the universe. As recently as the late 1990s, astronomers could say only that it was eight billion to 15 billion years; now, factoring in supernova explosions, the distribution of galaxies and patterns seen in radiation left over from the Big Bang, they have concluded with some confidence that the number is 13.8 billion years.

Bayesian reasoning combined with advanced computing power has also revolutionized the search for planets orbiting distant stars, said Dr. Turner, the Princeton astrophysicist.

In most cases, astronomers can't see these planets; their light is drowned out by the much brighter stars they orbit. What the scientists can see are slight variations in starlight; from these glimmers, they can judge whether planets are passing in front of a star or causing it to wobble from their gravitational tug.

[Photo: Andrew Gelman, a statistics professor at Columbia, says the Bayesian method is good for flagging erroneous conclusions. Credit: Jingchen Liu]

Making matters more complicated, the size of the apparent wobbles depends on whether astronomers are observing a planet’s orbit edge-on or from some other angle. But by factoring in data from a growing list of known planets, the scientists can deduce the most probable properties of new planets.

One downside of Bayesian statistics is that it requires prior information — and often scientists need to start with a guess or estimate. Assigning numbers to subjective judgments is “like fingernails on a chalkboard,” said physicist Kyle Cranmer, who helped develop a frequentist technique to identify the latest new subatomic particle — the Higgs boson.


Others say that in confronting the so-called replication crisis, the best cure for misleading findings is not Bayesian statistics, but good frequentist ones. It was frequentist statistics that allowed people to uncover all the problems with irreproducible research in the first place, said Deborah Mayo, a philosopher of science at Virginia Tech. The technique was developed to distinguish real effects from chance, and to prevent scientists from fooling themselves.

Uri Simonsohn, a psychologist at the University of Pennsylvania, agrees. Several years ago, he published a paper that exposed common statistical shenanigans in his field — logical leaps, unjustified conclusions, and various forms of unconscious and conscious cheating.

He said he had looked into Bayesian statistics and concluded that if people misused or misunderstood one system, they would do just as badly with the other. Bayesian statistics, in short, can’t save us from bad science.

At Times a Lifesaver

Despite its 18th-century origins, the technique is only now beginning to reveal its power with the advent of 21st-century computing speed.

Some historians say Bayes developed his technique to counter the philosopher David Hume’s contention that most so-called miracles were likely to be fakes or illusions. Bayes didn’t make much headway in that debate — at least not directly.

But even Hume might have been impressed last year, when the Coast Guard used Bayesian statistics to search for Mr. Aldridge, its computers continually updating and narrowing down his most probable locations.

The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard.

At first, all the Coast Guard knew about the fisherman was that he fell off his boat sometime from 9 p.m. on July 24 to 6 the next morning. The sparse information went into a program called Sarops, for Search and Rescue Optimal Planning System. Over the next few hours, searchers added new information — on prevailing currents, places the search helicopters had already flown and some additional clues found by the boat’s captain.

The system could not deduce exactly where Mr. Aldridge was drifting, but with more information, it continued to narrow down the most promising places to search.
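The core update behind a Bayesian search like this can be shown in a toy version (the cell probabilities and detection rate below are assumed numbers; the real Sarops models currents, drift, and sensor performance): each fruitless search of a cell shrinks the probability mass there and shifts belief toward the unsearched cells.

```python
# Prior probability that the missing man is in each of four ocean cells.
prior = [0.40, 0.30, 0.20, 0.10]
P_DETECT = 0.8  # chance a search of a cell finds him if he is there

def search_and_miss(belief, cell):
    """Bayes update after searching `cell` and finding nothing."""
    posterior = belief[:]
    posterior[cell] *= (1 - P_DETECT)  # mass surviving in the searched cell
    total = sum(posterior)
    return [p / total for p in posterior]

belief = search_and_miss(prior, 0)  # search the most likely cell, miss
print(belief)  # the second cell is now the most promising place to look
```

Repeating this after every sortie is what "continued to narrow down the most promising places to search" amounts to mathematically.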

Just before turning back to refuel, a searcher in a helicopter spotted a man clinging to two buoys he had tied together. He had been in the water for 12 hours; he was hypothermic and sunburned but alive.

Even in the jaded 21st century, it was considered something of a miracle.


Source: A version of this article appears in print on September 30, 2014, on page D1 of the New York edition with the headline: The Odds, Continually Updated.

10 January 2007

Brief news

1. Sex before a lecture can reduce stress - Click here

2. Travel agencies boycott Gol after commission cuts - To understand why, click here

3. Lease accounting - IASB and FASB project - click here for the link

4. Using Bayes to fight spam

5. The idea that Eskimos have countless words for "snow" is a myth! - click here

6. The end of the copper price bubble? - click here and here to read about the price of this commodity in an excellent master's thesis I supervised
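The spam-fighting use of Bayes mentioned in item 4 is the classic naive Bayes filter. A minimal sketch (with a made-up two-message toy corpus, purely for illustration): each word's frequency in known spam and known ham is turned into a likelihood, and the message goes to whichever class scores higher.

```python
from math import log

# Toy training corpora (assumed examples, not real data).
spam = ["win money now", "free money offer"]
ham = ["meeting agenda today", "lunch money tomorrow"]

def word_counts(docs):
    counts = {}
    for doc in docs:
        for w in doc.split():
            counts[w] = counts.get(w, 0) + 1
    return counts

def score(message, docs, prior):
    """Log of P(class) * product of P(word | class) with Laplace smoothing."""
    counts = word_counts(docs)
    total = sum(counts.values())
    vocab = len({w for d in spam + ham for w in d.split()})
    s = log(prior)
    for w in message.split():
        # +1 smoothing so unseen words don't zero out the product
        s += log((counts.get(w, 0) + 1) / (total + vocab))
    return s

msg = "free money"
is_spam = score(msg, spam, 0.5) > score(msg, ham, 0.5)
print(is_spam)  # True: "free" and "money" are spam-flavored words here
```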