

11 August 2024

He set out to criticize the press and made a basic statistical error

In an article published in Estadão on August 4, Claudio de Moura Castro presents several examples of how the press makes elementary mistakes. The article is titled "Será a Estatística a Arte de Iludir?" ("Is Statistics the Art of Deception?").




In the 1950s, Darrell Huff wrote a book that was a hit and is cited to this day: "How to Lie with Statistics". My copy is in English and dated 1993, but I know the book has been translated, including into Portuguese, and costs about 40 reais.

Apparently inspired by Huff, the PhD and education researcher sets out to criticize people who make mistakes with statistics. Yet he commits an error that a well-taught student of Introduction to Statistics would not. Read the excerpt:

In one of the country's best newspapers the alarm sounded: "52.98% of the students scored below the average." What nonsense! The average sits in the "middle" of the distribution. Therefore, close to half of the students will necessarily always be below the average, no matter how much they know. It is mathematically impossible for it to be otherwise.

Right in this first example there is a basic confusion between mean and median. Some readers also spotted the problem, and I will explain it with a small hypothetical example. Suppose a class has the following grades in a course:

2, 2, 2, 2, 4, 5, 5, 5, 5, 5

To calculate the mean, just add up all the grades and divide by 10, since there are ten observations. The mean is therefore 3.7. The median marks the central point of the observations; since we have an even number of observations, the median is the average of 4 and 5, that is, 4.5. How many students scored below the mean? Four out of ten, or 40%: all the students with a grade of 2. How many students scored above the mean? Six out of ten, or 60%: the student with a 4 and the students with a 5, all of whom scored above the mean of 3.7. So the author of the article, while criticizing journalists, confuses the mean with the median. This is a topic from an introductory statistics course, and it seems he would not be qualified to teach such a course, since he does not know the distinction.
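To make the distinction concrete, here is a minimal sketch in Python using the hypothetical grades above; it shows that the share of students below the mean is not forced to be anywhere near 50%:

```python
# Minimal sketch with the hypothetical grades above: the share of students
# below the MEAN is not forced to be close to 50%.
from statistics import mean, median

grades = [2, 2, 2, 2, 4, 5, 5, 5, 5, 5]

m = mean(grades)      # 3.7
md = median(grades)   # 4.5 (average of the 5th and 6th ordered values)

below_mean = sum(g < m for g in grades) / len(grades)
below_median = sum(g < md for g in grades) / len(grades)

print(f"mean = {m}, median = {md}")
print(f"below the mean:   {below_mean:.0%}")   # 40%
print(f"below the median: {below_median:.0%}") # 50% here, by construction
```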

There is something about this case that was written thousands of years ago: judge not, that you be not judged.

27 July 2023

The limits of personal experience

I do not like it when people cite their personal experience as a form of argument. It is tempting, and we really do need to police ourselves about it. Below is an excerpt from a text that expresses much of what I think on the subject; I am quoting the version translated by Vivaldi:

It is tempting to believe that we can simply rely on personal experience to develop our understanding of the world. But that is a mistake. The world is large, and we can experience very little of it personally. To see what the world is like, we need to rely on other means: carefully collected global statistics.

Obviously, our personal interactions are part of what informs our worldview. We piece together a picture of the lives of those around us from our interactions with them. Every time we meet people and hear about their lives, we add one more perspective to our view of the world. That is a great way to see the world and broaden our understanding; I do not want to suggest otherwise. But I do want to remind us how little we can learn about our society through personal interactions alone, and how valuable statistics are in helping us build up the rest of the picture.

The horizon of our personal experience

How many people do you know personally?

Let us take a broad definition of what it means to know someone and say it includes everyone you know by name. A study in the US asked how many people Americans know by name and found that the average person knows 611.

Let us assume you are more social than the average American and know 800 people. In a world of 8 billion, that means you know 0.00001% of the population, one hundred-thousandth of one percent. (...)

That is why I am very skeptical when people say things about "the world these days" based on what they hear from the people they know.

We cannot see much of the world through our direct experience. The horizon of our personal experience is very narrow. For every person you know, there are ten million people you do not know.

And the people you know are likely to be quite similar to you, far from representative of the world, or of your country, as a whole.


The horizon of our personal experience — We cannot see much of the world through our own direct experience

How wide can the horizon of our personal experience be?

Perhaps you think that restricting the people you learn from to those you know by name is too narrow. After all, you also learn from strangers you meet, even if you never learn their names.

Let us assume you are exceptionally good at this and have a conversation with three new people every day of your life.

If you can keep that up for 73 years, you will have met 80,000 people. That is more than a hundred times the number of people you would know by name.

That is still a tiny fraction of the world. After a lifetime of talking to people, you will have spoken with 0.001% of the world's population. For every person you have talked to, there are still 100,000 people you have never spoken with. (...)
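A quick aside to check the arithmetic quoted above (the 800 acquaintances and the three conversations a day are the text's own assumptions):

```python
# Back-of-the-envelope check of the numbers quoted in the excerpt.
world_population = 8_000_000_000

known_by_name = 800                       # "more social than the average American"
share_known = known_by_name / world_population
print(f"{share_known:.7%} of the world")  # ~0.0000100%
print(f"people you don't know per person you know: {world_population // known_by_name:,}")

strangers_per_day = 3
years = 73
talked_to = strangers_per_day * 365 * years        # ~80,000
print(f"talked to ~{talked_to:,} people")
print(f"that is {talked_to / world_population:.4%} of the world")  # ~0.0010%
```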

I am focusing on personal interaction as the most direct and deep way of learning about others, but it is not the only experience through which we learn about them. We also learn by seeing other people's clothes, seeing their homes, or hearing others talk about their personal experiences. But although these experiences help too, they still do not take us very far. The world is large, and even if you are exceptionally attentive and exceptionally good at making connections and talking to people, it is simply impossible to see much of the world directly.

(...) The fragmented perspective of the media: a few spotlights on particular people, while much of the world is left in the dark

The limits of our personal experience do not reach far beyond ourselves. How can we learn about the world if we want to see beyond this narrow horizon?

One way or another, we have to rely on media, whether television or radio, newspapers or photography, books, podcasts, documentaries, research papers, statistical tables or social media.

This fact is so obvious that it is easy to miss how important it is: everything you hear about anyone more than a few dozen meters away, you know through some form of media.

That is why the media we choose to rely on matter so much for our understanding of the world.

The news is the medium that shapes our picture of the world more than any other. Today it is often intertwined with social media. It is valuable because it lets us see beyond our own narrow horizon, but the view the news offers is blotchy and fragmented.

The news reports on the unusual things that happen on a particular day, while the things that happen every day are never mentioned. This gives us a biased and incomplete picture of the world; we are flooded with detailed news about terrorism but almost never hear about everyday tragedies such as the fact that 16,000 children die every day.

The problem is not so much what the media cover, but what they do not cover. Those left in the dark are usually poor, powerless and geographically distant from us. What we see in the news is not enough to understand the world we live in.

(...) Statistical methods make it possible to draw reliable conclusions about a population as a whole. Statistics is an extraordinary cultural achievement that lets us widen our view from the individual stories of those in the spotlight to a perspective that includes everyone. (...)

Global statistics not only let us see what "the world these days" is like, but also how it has changed. Statistics documenting how the world has changed are often very surprising to those who rely mainly on the news to understand the world. While the news focuses predominantly on everything that is going wrong, historical statistics let us also see what has gone right: the immense progress the world has achieved.

Statistics can illuminate the world in a way that our personal experiences and the media cannot. That is why, at Our World in Data, we rely on global statistics to understand how the world is changing.

The visual illustrates what carefully collected global statistics make possible: they light up the whole world around us and allow us to see what is happening to everyone. (...)

No data is perfect

Collecting and producing good statistics is a major challenge. Data may be unrepresentative in some way, may be poorly measured, and some may be missing entirely. Everyone who relies on statistics to form their worldview needs to be aware of these shortcomings. (...)

10 June 2023

A statistics checker for theses

Statcheck is an R package designed to detect statistical errors in peer-reviewed psychology articles. It searches for statistical results in the papers, redoes the calculations described in each paper, and compares the two values to see whether they match. It relies on results being reported according to the guidelines of the American Psychological Association (APA). For this reason, one of Statcheck's limitations is that it can only detect results reported in APA style. It also cannot check statistics that appear in tables, and it cannot handle statistical corrections applied to the tests.
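Statcheck itself is an R package, but the core idea can be sketched in a few lines of Python: parse an APA-style result, recompute the p-value from the test statistic and the degrees of freedom, and compare it with the reported value. This is a simplified illustration, not the package's actual code:

```python
# Simplified illustration of the Statcheck idea: recompute a p-value from an
# APA-style reported result such as "t(28) = 2.20, p = .04" and flag mismatches.
import re
from scipy import stats

def check_t_report(report: str, tolerance: float = 0.005) -> bool:
    """Return True if the reported p matches the p recomputed from t and df."""
    match = re.search(r"t\((\d+)\)\s*=\s*(-?[\d.]+),\s*p\s*=\s*([\d.]+)", report)
    if not match:
        raise ValueError("not an APA-style t-test report")
    df, t_value, reported_p = int(match[1]), float(match[2]), float(match[3])
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)   # two-sided p-value
    return abs(recomputed_p - reported_p) <= tolerance

print(check_t_report("t(28) = 2.20, p = .04"))   # True  (recomputed p is about 0.036)
print(check_t_report("t(28) = 2.20, p = .01"))   # False (inconsistent report)
```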


Statcheck was created in 2015, and studies using it have shown that many psychology articles contained errors in their statistical reporting. Some journals use the software as part of their peer-review process.

A 2017 study showed that Statcheck accurately identified more than 95% of statistical errors. A later analysis, however, showed that the program was picking up only 60.4% of the tests. It was also found that Statcheck had a conservative bias, failing to flag many inconsistent tests.

Nuijten, one of the tool's creators, analyzed seven thousand articles published since 2003 and found that papers had 4.5% fewer errors after the journals began using the software. Two other journals that do not use the tool saw a reduction of only 1%.

The tool is available as a plug-in for Word.

Photo: Nathan Dumlao

17 November 2022

There is only one test

Wikipedia lists a total of 104 statistical tests. How are we to cope with all of that? Well, back in 2011 Allen Downey wrote on his blog that there is only one test:



In summary, don't worry about finding the "right" test. There's no such thing. Put your effort into choosing a test statistic (one that reflects the effect you are testing) and modeling the null hypothesis. Then simulate it, and count!
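Downey's recipe is easy to put into code: choose a test statistic, simulate the null hypothesis, and count. Here is a minimal sketch with made-up data, using a permutation test for the difference between two group means:

```python
# "There is only one test": choose a statistic, simulate the null, count.
# A permutation test for the difference in group means (the data are made up).
import random

group_a = [12.1, 9.8, 11.4, 10.9, 12.7, 11.0]
group_b = [10.2, 9.5, 10.0, 9.1, 10.8, 9.9]

def statistic(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

observed = statistic(group_a, group_b)

pooled = group_a + group_b
n_sim, count = 10_000, 0
for _ in range(n_sim):
    random.shuffle(pooled)           # model of the null: group labels don't matter
    sim = statistic(pooled[:len(group_a)], pooled[len(group_a):])
    if sim >= observed:
        count += 1

print(f"observed difference = {observed:.2f}, simulated p-value = {count / n_sim:.3f}")
```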

In 2016, Downey posted again on the same subject.

02 April 2019

Statistical significance

This is a discussion of interest to researchers: there is a movement to rethink statistical significance, in other words the famous 5% threshold of so many studies. A manifesto with more than 800 signatures calls attention to how the concept is misused:

Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating "no difference" or "no effect" in about half of cases (see "Wrong interpretations" and Supplementary Information). (...) We agree, and we call for the entire concept of statistical significance to be abandoned. (...)

Instead of this [a ban on P values], and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way - to decide whether a result refutes or supports a scientific hypothesis.


Bogard is somewhat critical of the article's approach:

While I agree with the sentiments of the rest of the Nature article, I fear that the ideals expressed by the authors could be abused by others who want to dodge the safeguards of scientific rigor or who do not fully understand the principles of statistical inference.

According to him, Gelman has voiced similar concerns. Bogard admits the subject is not trivial, even for him:

This is hard for PhDs who have spent their whole lives doing these things. It is hard for practitioners who have built their careers on it. It is hard for me.

A word of comfort at the end:

The economist Noah Smith discussed the backlash against p-values a few years ago. He rightly stated that "if people are doing science correctly, these problems won't matter in the long run".

19 October 2017

The p-value question in scientific research

Scientific research generally relies on the p-value, and the usual threshold is 5%. Recently, however, this statistic has drawn a great deal of criticism. Some journals have simply banned the term. And many studies, when replicated, yield a p-value above 5%, which fails to "confirm" the conclusions of the published work.

Many suggestions try to solve this problem through more rigorous tests and replication of studies. Another solution would be to change the traditional 5% p-value threshold:

This inconsistency is typical of many scientific studies. It is particularly common for p-values around 0.05. That explains why such a high proportion of statistically significant results do not replicate.

In September, my colleagues and I proposed a new idea: only P values below 0.005 should be considered statistically significant. P values between 0.005 and 0.05 should be called suggestive.

Under our proposal, statistically significant results are more likely to replicate, even after accounting for the small prior probabilities that typically apply to studies in the social, biological and medical sciences.

Moreover, we think statistical significance should not serve as a bright-line threshold for publication. Statistically suggestive results, or even results that are largely inconclusive, could also be published, based on whether or not they report important preliminary evidence that a new theory may be true.


The authors' idea rests on Bayes' theorem.
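To see the flavor of the argument, here is a small calculation using the Sellke-Bayarri-Berger bound, which gives the largest Bayes factor in favor of the alternative hypothesis that a given p-value can represent. The 1:10 prior odds are an illustrative assumption, not a claim about any particular field:

```python
# Hedged sketch of the Bayesian reasoning behind "p < 0.005": the bound
# 1 / (-e * p * ln p) is the LARGEST Bayes factor in favour of H1 that a
# p-value p < 1/e can possibly represent.  Combining it with prior odds
# (the 1:10 below is an illustrative assumption) gives a floor on the
# probability that a "significant" result is a false positive.
import math

def max_bayes_factor(p: float) -> float:
    """Upper bound on the Bayes factor H1:H0 implied by a p-value (p < 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

prior_odds_h1 = 1 / 10   # assumption: one in ten tested hypotheses is true

for p in (0.05, 0.005):
    bf = max_bayes_factor(p)
    posterior_odds_h1 = bf * prior_odds_h1
    false_positive_risk = 1 / (1 + posterior_odds_h1)   # P(H0 | this result)
    print(f"p = {p}: max BF = {bf:.1f}, false-positive risk >= {false_positive_risk:.0%}")
# p = 0.05  -> max BF ~ 2.5,  false-positive risk at least ~80%
# p = 0.005 -> max BF ~ 13.9, false-positive risk at least ~42%
```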

14 April 2017

Alan Smith: Why you should love statistics


Do you think you are good at guessing statistics? Whether or not we consider ourselves good at math, our ability to understand and work with numbers is really quite limited, says data visualization expert Alan Smith. In this enjoyable talk, Smith explores the relationship between what we know and what we think we know.

13 May 2016

False positive

Apparently the CIA and the NSA (pictured) are using "metadata" to estimate the probability that a person is a terrorist. Using the cell-phone network and machine-learning algorithms, the US intelligence agencies are working in Pakistan to curb the spread of terrorism. Gelman, citing Grothoff and Porup, shows that a false positive rate of 0.18% in a population of 55 million people (the Pakistanis who use cell phones) means that 99,000 innocent people will be labeled terrorists.
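The arithmetic behind the 99,000 figure is a plain base-rate calculation. In the sketch below, the number of actual terrorists is an invented illustrative figure; it barely changes the count of innocents who get flagged:

```python
# Base-rate arithmetic behind the 99,000 figure.
population = 55_000_000          # cell-phone users in Pakistan, as quoted
false_positive_rate = 0.0018     # 0.18%
true_terrorists = 5_000          # invented, illustrative assumption only

innocents = population - true_terrorists
falsely_flagged = innocents * false_positive_rate
print(f"innocent people flagged: {falsely_flagged:,.0f}")   # ~99,000

# Even with a perfect detection rate, most flagged people would be innocent:
flagged_terrorists = true_terrorists * 1.0
share_innocent = falsely_flagged / (falsely_flagged + flagged_terrorists)
print(f"share of flagged people who are innocent: {share_innocent:.0%}")   # ~95%
```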

18 October 2015

Excel is unsuitable for statistical analysis


Here are three papers showing why Excel is unsuitable for statistical analysis. In the authors' words:

No statistical procedure in Excel should be used until Microsoft documents that the procedure is correct; it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.

Abstract:

The reliability of statistical procedures in Excel are assessed in three areas: estimation (both linear and nonlinear); random number generation; and statistical distributions (e.g., for calculating p-values). Excel's performance in all three areas is found to be inadequate. Persons desiring to conduct statistical analyses of data are advised not to use Excel.

On the accuracy of statistical procedures in Microsoft Excel 97 - B.D. McCullough, Berry Wilson - Computational Statistics & Data Analysis, Elsevier, 28 July 1999


Abstract:

Some of the problems that rendered Excel 97, Excel 2000 and Excel 2002 unfit for use as a statistical package have been fixed in Excel 2003, though some have not. Additionally, in fixing some errors, Microsoft introduced other errors. Excel's new and improved random number generator, at default, is supposed to produce uniform numbers on the interval (0,1); but it also produces negative numbers. Excel 2003 is an improvement over previous versions, but not enough has been done that its use for statistical purposes can be recommended.

On the accuracy of statistical procedures in Microsoft Excel 2003 - B.D. McCullough, Berry Wilson - Computational Statistics & Data Analysis, Elsevier, 15 June 2005


Abstract:

Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests in three areas: statistical distributions, random number generation, and estimation. Additional errors in specific Excel procedures are discussed. Microsoft’s continuing inability to correctly fix errors is discussed. No statistical procedure in Excel should be used until Microsoft documents that the procedure is correct; it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.







22 December 2014

Global warming: a statistical meltdown

The Global Warming Statistical Meltdown by Judith Curry

At the recent United Nations Climate Summit, Secretary-General Ban Ki-moon warned that “Without significant cuts in emissions by all countries, and in key sectors, the window of opportunity to stay within less than 2 degrees [of warming] will soon close forever.” Actually, this window of opportunity may remain open for quite some time. A growing body of evidence suggests that the climate is less sensitive to increases in carbon-dioxide emissions than policy makers generally assume—and that the need for reductions in such emissions is less urgent.

According to the U.N. Framework Convention on Climate Change, preventing “dangerous human interference” with the climate is defined, rather arbitrarily, as limiting warming to no more than 2 degrees Celsius (3.6 degrees Fahrenheit) above preindustrial temperatures. The Earth’s surface temperatures have already warmed about 0.8 degrees Celsius since 1850-1900. This leaves 1.2 degrees Celsius (about 2.2 degrees Fahrenheit) to go.

In its most optimistic projections, which assume a substantial decline in emissions, the Intergovernmental Panel on Climate Change (IPCC) projects that the “dangerous” level might never be reached. In its most extreme, pessimistic projections, which assume heavy use of coal and rapid population growth, the threshold could be exceeded as early as 2040. But these projections reflect the effects of rising emissions on temperatures simulated by climate models, which are being challenged by recent observations.

Human-caused warming depends not only on increases in greenhouse gases but also on how “sensitive” the climate is to these increases. Climate sensitivity is defined as the global surface warming that occurs when the concentration of carbon dioxide in the atmosphere doubles. If climate sensitivity is high, then we can expect substantial warming in the coming century as emissions continue to increase. If climate sensitivity is low, then future warming will be substantially lower, and it may be several generations before we reach what the U.N. considers a dangerous level, even with high emissions.

The IPCC’s latest report (published in 2013) concluded that the actual change in 70 years if carbon-dioxide concentrations double, called the transient climate response, is likely in the range of 1 to 2.5 degrees Celsius. Most climate models have transient climate response values exceeding 1.8 degrees Celsius. But the IPCC report notes the substantial discrepancy between recent observation-based estimates of climate sensitivity and estimates from climate models.

Nicholas Lewis and I have just published a study in Climate Dynamics that shows the best estimate for transient climate response is 1.33 degrees Celsius with a likely range of 1.05-1.80 degrees Celsius. Using an observation-based energy-balance approach, our calculations used the same data for the effects on the Earth’s energy balance of changes in greenhouse gases, aerosols and other drivers of climate change given by the IPCC’s latest report.

We also estimated what the long-term warming from a doubling of carbon-dioxide concentrations would be, once the deep ocean had warmed up. Our estimates of sensitivity, both over a 70-year time-frame and long term, are far lower than the average values of sensitivity determined from global climate models that are used for warming projections. Also our ranges are narrower, with far lower upper limits than reported by the IPCC’s latest report. Even our upper limits lie below the average values of climate models.

Our paper is not an outlier. More than a dozen other observation-based studies have found climate sensitivity values lower than those determined using global climate models, including recent papers published in Environmetrics (2012), Nature Geoscience (2013) and Earth Systems Dynamics (2014). These new climate sensitivity estimates add to the growing evidence that climate models are running “too hot.” Moreover, the estimates in these empirical studies are being borne out by the much-discussed “pause” or “hiatus” in global warming—the period since 1998 during which global average surface temperatures have not significantly increased.

This pause in warming is at odds with the 2007 IPCC report, which expected warming to increase at a rate of 0.2 degrees Celsius per decade in the early 21st century. The warming hiatus, combined with assessments that the climate-model sensitivities are too high, raises serious questions as to whether the climate-model projections of 21st century temperatures are fit for making public policy decisions.

The sensitivity of the climate to increasing concentrations of carbon dioxide is a central question in the debate on the appropriate policy response to increasing carbon dioxide in the atmosphere. Climate sensitivity and estimates of its uncertainty are key inputs into the economic models that drive cost-benefit analyses and estimates of the social cost of carbon.

Continuing to rely on climate-model warming projections based on high, model-derived values of climate sensitivity skews the cost-benefit analyses and estimates of the social cost of carbon. This can bias policy decisions. The implication of the lower values of climate sensitivity in our paper, as well as of other similar recent studies, is that human-caused warming near the end of the 21st century should be less than the 2-degrees-Celsius “danger” level for all but the IPCC’s most extreme emission scenario.

This slower rate of warming—relative to climate model projections—means there is less urgency to phase out greenhouse gas emissions now, and more time to find ways to decarbonize the economy affordably. It also allows us the flexibility to revise our policies as further information becomes available.

First draft

I learned a lot about writing an op-ed through this process. Below is my first draft. This morphed into the final version based on input from Nic, another journalist and another person who is experienced in writing op-eds, plus input from the WSJ editors. All of the words in the final version have been approved by me, although the WSJ editors chose the title.

The challenge is to simplify the language, but not the argument, and keep it interesting and relevant while at the same time not distorting the information. Below is my first draft:

Some insensitivity about climate change

At the recent UN Climate Summit, Secretary-General Ban Ki-moon stated: “Without significant cuts in emissions by all countries, and in key sectors, the window of opportunity to stay within less than 2 degrees will soon close forever.”

In the context of the UN Framework Convention on Climate Change, preventing ‘dangerous human interference’ with the climate has been defined – rather arbitrarily – as limiting warming to no more than 2°C above preindustrial temperatures. The Earth’s surface temperatures have already warmed about 0.8°C, leaving only 1.2°C before reaching allegedly ‘dangerous’ levels. Based upon global climate model simulations, the Intergovernmental Panel on Climate Change (IPCC) 5th Assessment Report (AR5; 2013) projects a further increase in global mean surface temperatures with continued emissions to exceed 1.2°C sometime within the 21st century, with the timing and magnitude of the exceedance depending on future emissions.

If and when we reach this dangerous level of human caused warming depends not only on how quickly emissions rise, but also on the sensitivity of the climate to greenhouse gas induced warming. If climate sensitivity is high, then we can expect substantial warming in the coming century if greenhouse gas emissions continue to increase. If climate sensitivity is low, then future warming will be substantially lower.

Climate sensitivity is the global surface warming that occurs when the concentration of carbon dioxide in the atmosphere doubles. Equilibrium climate sensitivity refers to the rise in temperature once the climate system has fully warmed up, a process taking centuries due to the enormous heat capacity of the ocean. Transient climate response is a shorter-term measure of sensitivity, over a 70 year timeframe during which carbon dioxide concentrations double.

The IPCC AR5 concluded that equilibrium climate sensitivity is likely in the range 1.5°C to 4.5°C and the transient climate response is likely in the range of 1.0°C to 2.5°C. Climate model simulations produce values in the upper region of these ranges, with most climate models having equilibrium climate sensitivity values exceeding 3.5°C and transient climate response values exceeding 1.8°C.

At the lower end of the sensitivity ranges reported by the IPCC AR5 are values of the climate sensitivity determined using an energy budget model approach that matches global surface temperatures with greenhouse gas concentrations and other forcings (such as solar variations and aerosol forcings) over the last century or so. I coauthored a paper recently published in Climate Dynamics that used this approach to determine climate sensitivity. Our calculations used the same forcing data given by the IPCC AR5, and we included a detailed accounting of the impact of uncertainties in the forcing data on our climate sensitivity estimates.

Our results show the best (median) estimate for equilibrium climate sensitivity is 1.64°C, with a likely (17–83% probability) range of 1.25–2.45°C. The median estimate for transient climate response is 1.33°C, with a likely range of 1.05–1.80°C. Most significantly, our new results support narrower likely ranges for climate sensitivity with far lower upper limits than reported by the IPCC AR5. Our upper limits lie below – for equilibrium climate sensitivity, substantially below – the average values of climate models used for warming projections. The true climate sensitivity may even be lower, since the energy budget model assumes that all climate change is forced, and does not account for the effects of decadal and century scale internal variability associated with long-term ocean oscillations.

These new climate sensitivity estimates add to the growing evidence that climate models are running ‘too hot.’ At the heart of the recent scientific debate on climate change is the ‘pause’ or ‘hiatus’ in global warming – the period since 1998 during which global average surface temperatures have not increased. This observed warming hiatus contrasts with the expectation from the 2007 IPCC Fourth Assessment Report that warming would proceed at a rate of 0.2°C per decade in the early decades of the 21st century. The warming hiatus, combined with assessments that the climate model sensitivities are too high, raises serious questions as to whether the climate model projections of 21st-century temperatures have much utility for decision making.

The sensitivity of our climate to increasing concentrations of carbon dioxide is at the heart of the public debate on the appropriate policy response to increasing carbon dioxide in the atmosphere. Climate sensitivity and estimates of its uncertainty are key inputs into the economic models that drive cost-benefit analyses and estimates of the social cost of carbon.

Continuing to use the higher global climate model-derived values of climate sensitivity skews the cost-benefit analyses and estimates of the social cost of carbon. The implication of the lower values of climate sensitivity in our paper is that human-caused warming near the end of the 21st century should be less than the 2°C ‘danger’ level for all but the most extreme emission scenario considered by the IPCC AR5. This delay in the warming – relative to climate model projections – relaxes the phase-out period for greenhouse gas emissions, allowing more time to find ways to decarbonize the economy affordably and the flexibility to revise our policies as further information becomes available.
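As an aside on the method: the "observation-based energy-budget approach" described in the op-ed boils down to two standard formulas, TCR = F2x*dT/dF and ECS = F2x*dT/(dF - dQ). The sketch below plugs in round illustrative numbers only; they are not the values used by Lewis and Curry.

```python
# Minimal sketch of an energy-budget estimate of climate sensitivity.
# The input numbers are round illustrative values, NOT the paper's data.
F2x = 3.7    # radiative forcing from a doubling of CO2, W/m^2
dT  = 0.75   # observed change in global mean surface temperature, K
dF  = 2.0    # change in total radiative forcing, W/m^2
dQ  = 0.5    # change in the Earth's heat uptake (mostly ocean), W/m^2

tcr = F2x * dT / dF          # transient climate response
ecs = F2x * dT / (dF - dQ)   # equilibrium climate sensitivity
print(f"TCR = {tcr:.2f} C, ECS = {ecs:.2f} C")
```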

10 December 2014

Methodological flaws in empirical accounting research

Some Methodological Deficiencies in Empirical Research Articles in Accounting. Accounting Horizons: September 2014
Resumo:

This paper uses a sample of the regression and behavioral papers published in The Accounting Review and the Journal of Accounting Research from September 2012 through May 2013. We argue first that the current research results reported in empirical regression papers fail adequately to justify the time period adopted for the study. Second, we maintain that the statistical analyses used in these papers as well as in the behavioral papers have produced flawed results. We further maintain that their tests of statistical significance are not appropriate and, more importantly, that these studies do not—and cannot—properly address the economic significance of the work. In other words, significance tests are not tests of the economic meaningfulness of the results. We suggest ways to avoid some but not all of these problems. We also argue that replication studies, which have been essentially abandoned by accounting researchers, can contribute to our search for truth, but few will be forthcoming unless the academic reward system is modified.

Keywords:  research methodology, statistical analysis

Received: September 2013; Accepted: May 2014; Published online: May 2014

Thomas R. Dyckman and Stephen A. Zeff (2014) Some Methodological Deficiencies in Empirical Research Articles in Accounting. Accounting Horizons: September 2014, Vol. 28, No. 3, pp. 695-712.

 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2324266


Thomas R. Dyckman is a Professor Emeritus at Cornell University and an Adjunct Professor at Florida Gulf Coast University, and Stephen A. Zeff is a Professor at Rice University.

The authors' recommendations:

In summary we have endeavored to make the following points:

First, authors must adequately defend their selection of the sample period by convincing the reader that the period is stable itself and in relation to periods in close proximity.

Second, the accounting academy should actively seek and reward replications as an essential element in its aspirations to be a scientific community.

Third, authors should attend to the economic significance as well as the statistical significance of their investigations.

Fourth, authors should respect the limitation of conventional hypothesis tests applied to their data, which implies enhanced caution when declaring results to be statistically significant.

Fifth, authors could consider reporting the use of statistical intervals as a way to mitigate the problems of determining the most likely alternative hypothesis and thereby the appropriate Type II error.

Sixth, authors need to be sure that, in their “Conclusions” section, they discuss the limitations of their research and how these limitations might be overcome, as well as suggest extensions for future research.

Seventh, authors should consider the use of descriptive statistics and other approaches as a means of, or support for, establishing the validity of their research objective.

Eighth, editors should consider requiring authors of accepted papers to provide a complete description of their methodology, including data collection, accuracy, and verification.

29 November 2014

Pianogram

The image below shows how many times each piano key is pressed, relative to the total, for a given piano piece. It is a pianogram of Chopin's Opus 10, No. 5.




There are other pianograms at this link.

13 November 2014

How to lie with international performance indices


“CROOKS already know these tricks. Honest men must learn them in self-defence,” wrote Darrell Huff in 1954 in “How to Lie With Statistics”, a guide to getting figures to say whatever you want them to. Sadly, Huff needs updating.

The latest way to bamboozle with numbers is the “performance index”, which weaves data on lots of measures into a single easy-to-understand international ranking. From human suffering to perceptions of corruption, from freedom to children’s happiness, nowadays no social problem or public policy lacks one (see article). Governments, think-tanks and campaigners love an index’s simplicity and clarity; when well done, it can illuminate failures, suggest solutions and goad the complacent into action. But there are problems. Competing indices jostle in the intellectual marketplace: the World Economic Forum’s Global Gender Gap ranking, published last week, goes head to head with the UN’s Gender Inequality Index, the Index of Women’s Power from Big Think, an internet forum—and even The Economist’s own Glass Ceiling Index. Worse, some indices are pointless or downright misleading.


As easy as 1, 2, 3

Which to trust, and which to ignore? In the spirit of Huff, here is our guide to concocting a spurious index. Use it to guard against guile—or follow it to shape public perceptions and government policies armed only with a catchy title, patchy data and an agenda.

First, banish pedantry and make life easier for yourself by using whatever figures are to hand, whether they are old, drawn from small or biased samples, or mixed and matched from wildly differing sources. Are figures for a country lacking? Use a “comparator”, no matter how dubious; one index of slavery, short of numbers for Ireland and Iceland, uses British figures for both (aren’t all island nations alike?). If the numbers point in the wrong direction, find tame academics and businessfolk to produce more convenient ones, and call their guesses “expert opinion”. If all that still fails to produce what you want, tweak the weighting of the elements to suit.

Get the presentation right. Leaving your methodology unpublished looks dodgy. Instead, bury a brief but baffling description in an obscure corner of your website, and reserve the home page for celebrity endorsements. Get headlines by hamming up small differences; minor year-on-year moves in the rankings may be statistical noise, but they make great copy.

Above all, remember that you can choose what to put in your index—so you define the problem and dictate the solution. Rankings of business-friendliness may favour countries with strict laws; don’t worry if they are never enforced. Measures of democracy that rely on turnout ignore the ability of autocrats to get out the vote. Indices of women’s status built on education levels forget that, in Saudi Arabia, women outnumber men in universities because they are allowed to do little else but study. If you want prostitution banned, count sex workers who cross borders illegally, but willingly, as “trafficking victims”. Criticism can always be dismissed as sour grapes and special pleading. The numbers, after all, are on your side. You’ve made sure of that.


Source: here
From the print edition: Leaders

03 November 2014

A large share of published findings in finance are false positives

Interview with Campbell Harvey: PhD from the University of Chicago, professor at Duke University, former editor of the Journal of Finance and vice-president of the American Finance Association.




Q: Investors often rely on financial research when developing strategies. Your recent findings suggest they should be wary. What did you find?

Campbell Harvey: My paper is about how we conduct research as both academics and practitioners. I was inspired by a paper published in the biomedical field that argued that most scientific tests that are published in medicine are false. I then gathered information on 315 tests that were conducted in finance. After I corrected the test statistics, I found that about half the tests were false. That is, someone was claiming a discovery when there was no real discovery.

Q: What do you mean “correcting the tests”?

Campbell Harvey: The intuition is really simple. Suppose you are trying to predict something like the returns on a portfolio of stocks. Suppose you try 200 different variables. Just by pure chance, about 10 of these variables will be declared “significant” – yet they aren’t. In my paper, I show this by randomly generating 200 variables. The simulated data is just noise, yet a number of the variables predict the portfolio of stock returns. Again, this is what you expect by chance. The contribution of my paper is to show how to correct the tests. The picture above looks like an attractive and profitable investment. The picture below shows 200 random strategies (i.e. the data are made up). The profitable investment is just the best random strategy (denoted in dark red). Hence, it is not an attractive investment — its profitability is purely by chance!
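As an aside, Harvey's intuition is easy to reproduce with synthetic data (this is not his code or his dataset): generate 200 pure-noise predictors, test each against a random return series, and count how many clear the conventional |t| > 2 bar by luck alone.

```python
# 200 pure-noise "predictors" tested against a random return series.
# By chance alone, roughly 5% of them clear |t| > 2.
import numpy as np

rng = np.random.default_rng(0)
n_months, n_strategies = 240, 200

returns = rng.normal(size=n_months)                     # the series to "predict"
predictors = rng.normal(size=(n_strategies, n_months))  # pure noise

significant = 0
for x in predictors:
    r = np.corrcoef(x, returns)[0, 1]
    t = r * np.sqrt((n_months - 2) / (1 - r**2))        # t-stat of the correlation
    if abs(t) > 2:
        significant += 1

print(f"{significant} of {n_strategies} noise strategies look 'significant'")
# Typically around 10, roughly what multiple testing predicts.
```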



Q: So you provide a new set of research guidelines?

Campbell Harvey: Exactly. Indeed, we go back in time and detail the false research findings. We then extrapolate our model out to 2032 to give researchers guidelines for the next 18 years.

Q: What are the practical implications of your research?

Campbell Harvey: The implications are provocative. Our data mainly focuses on academic research. However, our paper applies to any financial product that is sold to investors. A financial product is, for example, an investment fund that purports to beat some benchmark such as the S&P 500. Often a new product is proposed and there are claims that it outperformed when it is run on historical data (this is commonly called “backtesting” in the industry). The claim of outperformance is challenged in our paper. You can imagine researchers on Wall Street trying hundreds if not thousands of variables. When you try so many variables, you are bound to find something that looks good. But is it really good – or just luck?

Q: What do you hope people take away from your research?

Campbell Harvey: Investors need to realize that about half of the products they are sold are false – that is, there is expected to be no outperformance in the future; they were just lucky in their analysis of historical data.

Q: What reactions have Wall Street businesses had so far to your findings?

Campbell Harvey: A number of these firms have struggled with this problem. They knew it existed (some of their products “work” just by chance). It is in their own best interest to deliver on promises to their clients. Hence, my work has been embraced by the financial community rather than spurned.

Professor Harvey’s research papers, “Evaluating Trading Strategies“, “…and the Cross-Section of Expected Returns” and “Backtesting” are available at SSRN for free download.

11 October 2014

The power of Bayesian statistics


Statistics may not sound like the most heroic of pursuits. But if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer.

The man owes his life to a once obscure field known as Bayesian statistics — a set of mathematical rules for using new data to continuously update beliefs or existing knowledge.

The method was invented in the 18th century by an English Presbyterian minister named Thomas Bayes — by some accounts to calculate the probability of God’s existence. In this century, Bayesian statistics has grown vastly more useful because of the kind of advanced computing power that did not exist even 20 years ago.

It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge (though not, so far, in the hunt for Malaysia Airlines Flight 370).

Now Bayesian statistics are rippling through everything from physics to cancer research, ecology to psychology. Enthusiasts say they are allowing scientists to solve problems that would have been considered impossible just 20 years ago. And lately, they have been thrust into an intense debate over the reliability of research results.

When people think of statistics, they may imagine lists of numbers — batting averages or life-insurance tables. But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced it had analyzed 53 cancer studies and found it could not replicate 47 of them.

Similar follow-up analyses have cast doubt on so many findings in fields such as neuroscience and social science that researchers talk about a “replication crisis.”

Some statisticians and scientists are optimistic that Bayesian methods can improve the reliability of research by allowing scientists to crosscheck work done with the more traditional or “classical” approach, known as frequentist statistics. The two methods approach the same problems from different angles.

The essence of the frequentist technique is to apply probability to data. If you suspect your friend has a weighted coin, for example, and you observe that it came up heads nine times out of 10, a frequentist would calculate the probability of getting such a result with an unweighted coin. The answer (about 1 percent) is not a direct measure of the probability that the coin is weighted; it’s a measure of how improbable the nine-in-10 result is — a piece of information that can be useful in investigating your suspicion.

By contrast, Bayesian calculations go straight for the probability of the hypothesis, factoring in not just the data from the coin-toss experiment but any other relevant information — including whether you have previously seen your friend use a weighted coin.
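As an aside, the coin example translates directly into code. In the sketch below, the 50% prior suspicion and the 80% heads rate assumed for a weighted coin are illustrative assumptions, not part of the article:

```python
# The coin example in code: frequentist tail probability vs. Bayesian posterior.
# The 50% prior and the 80% heads rate of the "weighted" coin are assumptions.
from math import comb

n, heads = 10, 9

# Frequentist: how improbable is a result at least this extreme with a fair coin?
p_value = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
print(f"P(>= 9 heads | fair coin) = {p_value:.3f}")        # about 0.011

# Bayesian: probability the coin is weighted, given the data and a prior.
prior_weighted = 0.5
p_heads_weighted = 0.8
like_fair = comb(n, heads) * 0.5**n
like_weighted = comb(n, heads) * p_heads_weighted**heads * (1 - p_heads_weighted)**(n - heads)
posterior = (prior_weighted * like_weighted) / (
    prior_weighted * like_weighted + (1 - prior_weighted) * like_fair)
print(f"P(weighted | 9 of 10 heads) = {posterior:.2f}")    # about 0.96
```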

Scientists who have learned Bayesian statistics often marvel that it propels them through a different kind of scientific reasoning than they had experienced using classical methods.

“Statistics sounds like this dry, technical subject, but it draws on deep philosophical debates about the nature of reality,” said the Princeton University astrophysicist Edwin Turner, who has witnessed a widespread conversion to Bayesian thinking in his field over the last 15 years.

Countering Pure Objectivity

Frequentist statistics became the standard of the 20th century by promising just-the-facts objectivity, unsullied by beliefs or biases. In the 2003 statistics primer “Dicing With Death,” Stephen Senn traces the technique’s roots to 18th-century England, when a physician named John Arbuthnot set out to calculate the ratio of male to female births.

Arbuthnot gathered christening records from 1629 to 1710 and found that in London, a few more boys were recorded every year. He then calculated the odds that such an 82-year run could occur simply by chance, and found that it was one in trillions. A frequentist calculation like this cannot say what is causing the sex ratio to be skewed. Arbuthnot proposed that God skewed the birthrates to balance the higher mortality that had been observed among boys, but scientists today favor a biological explanation over a theological one.
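As an aside, Arbuthnot's argument is essentially a sign test, and the number is easy to reproduce: if boy-heavy and girl-heavy years were equally likely, the probability of 82 boy-heavy years in a row would be 0.5 to the power of 82 (a one-line check in Python):

```python
# Arbuthnot's argument as a sign test: 82 consecutive boy-heavy years,
# assuming each year is a fair coin flip between "more boys" and "more girls".
p = 0.5 ** 82
print(f"{p:.2e}")   # about 2.1e-25
```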

Later in the 1700s, the mathematician and astronomer Daniel Bernoulli used a similar technique to investigate the curious geometry of the solar system, in which planets orbit the sun in a flat, pancake-shaped plane. If the orbital angles were purely random — with Earth, say, at zero degrees, Venus at 45 and Mars at 90 — the solar system would look more like a sphere than a pancake. But Bernoulli calculated that all the planets known at the time orbited within seven degrees of the plane, known as the ecliptic.

What were the odds of that? Bernoulli’s calculations put them at about one in 13 million. Today, this kind of number is called a p-value, the probability that an observed phenomenon or one more extreme could have occurred by chance. Results are usually considered “statistically significant” if the p-value is less than 5 percent.

Photo: The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge. (Credit: Daniel Shea)

But there is a danger in this tradition, said Andrew Gelman, a statistics professor at Columbia. Even if scientists always did the calculations correctly — and they don’t, he argues — accepting everything with a p-value of 5 percent means that one in 20 “statistically significant” results are nothing but random noise.

The proportion of wrong results published in prominent journals is probably even higher, because such findings are often surprising and appealingly counterintuitive, said Dr. Gelman, an occasional contributor to Science Times.

Looking at Other Factors

Take, for instance, a study concluding that single women who were ovulating were 20 percent more likely to vote for President Obama in 2012 than those who were not. (In married women, the effect was reversed.)

Dr. Gelman re-evaluated the study using Bayesian statistics. That allowed him to look at probability not simply as a matter of results and sample sizes, but in the light of other information that could affect those results.

He factored in data showing that people rarely change their voting preference over an election cycle, let alone a menstrual cycle. When he did, the study’s statistical significance evaporated. (The paper’s lead author, Kristina M. Durante of the University of Texas, San Antonio, said she stood by the finding.)

Dr. Gelman said the results would not have been considered statistically significant had the researchers used the frequentist method properly. He suggests using Bayesian calculations not necessarily to replace classical statistics but to flag spurious results.

A famously counterintuitive puzzle that lends itself to a Bayesian approach is the Monty Hall problem, in which Mr. Hall, longtime host of the game show “Let’s Make a Deal,” hides a car behind one of three doors and a goat behind each of the other two. The contestant picks Door No. 1, but before opening it, Mr. Hall opens Door No. 2 to reveal a goat. Should the contestant stick with No. 1 or switch to No. 3, or does it matter?

A Bayesian calculation would start with one-third odds that any given door hides the car, then update that knowledge with the new data: Door No. 2 had a goat. The odds that the contestant guessed right — that the car is behind No. 1 — remain one in three. Thus, the odds that she guessed wrong are two in three. And if she guessed wrong, the car must be behind Door No. 3. So she should indeed switch.
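A quick simulation confirms the Bayesian answer: switching wins about two times out of three (a minimal sketch in Python):

```python
# Monte Carlo check of the Monty Hall argument: switching wins ~2/3 of the time.
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
wins_stay = sum(play(switch=False) for _ in range(trials)) / trials
wins_switch = sum(play(switch=True) for _ in range(trials)) / trials
print(f"stay: {wins_stay:.3f}, switch: {wins_switch:.3f}")   # ~0.333 vs ~0.667
```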

In other fields, researchers are using Bayesian statistics to tackle problems of formidable complexity. The New York University astrophysicist David Hogg credits Bayesian statistics with narrowing down the age of the universe. As recently as the late 1990s, astronomers could say only that it was eight billion to 15 billion years; now, factoring in supernova explosions, the distribution of galaxies and patterns seen in radiation left over from the Big Bang, they have concluded with some confidence that the number is 13.8 billion years.

Bayesian reasoning combined with advanced computing power has also revolutionized the search for planets orbiting distant stars, said Dr. Turner, the Princeton astrophysicist.

In most cases, astronomers can’t see these planets; their light is drowned out by the much brighter stars they orbit. What the scientists can see are slight variations in starlight; from these glimmers, they can judge whether planets are passing in front of a star or causing it to wobble from their gravitational tug.

Photo: Andrew Gelman, a statistics professor at Columbia, says the Bayesian method is good for flagging erroneous conclusions. (Credit: Jingchen Liu)

Making matters more complicated, the size of the apparent wobbles depends on whether astronomers are observing a planet’s orbit edge-on or from some other angle. But by factoring in data from a growing list of known planets, the scientists can deduce the most probable properties of new planets.

One downside of Bayesian statistics is that it requires prior information — and often scientists need to start with a guess or estimate. Assigning numbers to subjective judgments is “like fingernails on a chalkboard,” said physicist Kyle Cranmer, who helped develop a frequentist technique to identify the latest new subatomic particle — the Higgs boson.


Others say that in confronting the so-called replication crisis, the best cure for misleading findings is not Bayesian statistics, but good frequentist ones. It was frequentist statistics that allowed people to uncover all the problems with irreproducible research in the first place, said Deborah Mayo, a philosopher of science at Virginia Tech. The technique was developed to distinguish real effects from chance, and to prevent scientists from fooling themselves.

Uri Simonsohn, a psychologist at the University of Pennsylvania, agrees. Several years ago, he published a paper that exposed common statistical shenanigans in his field — logical leaps, unjustified conclusions, and various forms of unconscious and conscious cheating.

He said he had looked into Bayesian statistics and concluded that if people misused or misunderstood one system, they would do just as badly with the other. Bayesian statistics, in short, can’t save us from bad science.

At Times a Lifesaver

Despite its 18th-century origins, the technique is only now beginning to reveal its power with the advent of 21st-century computing speed.

Some historians say Bayes developed his technique to counter the philosopher David Hume’s contention that most so-called miracles were likely to be fakes or illusions. Bayes didn’t make much headway in that debate — at least not directly.

But even Hume might have been impressed last year, when the Coast Guard used Bayesian statistics to search for Mr. Aldridge, its computers continually updating and narrowing down his most probable locations.

The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard.

At first, all the Coast Guard knew about the fisherman was that he fell off his boat sometime from 9 p.m. on July 24 to 6 the next morning. The sparse information went into a program called Sarops, for Search and Rescue Optimal Planning System. Over the next few hours, searchers added new information — on prevailing currents, places the search helicopters had already flown and some additional clues found by the boat’s captain.

The system could not deduce exactly where Mr. Aldridge was drifting, but with more information, it continued to narrow down the most promising places to search.
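As an aside, the core of that updating step is plain Bayes' rule: an unsuccessful sweep of an area lowers the probability assigned to it, and the whole map is renormalized. The toy sketch below is only an illustration; the grid, prior and detection probability are invented, not Sarops values.

```python
# Toy version of the Bayesian search update used in systems like Sarops:
# an unsuccessful sweep of a cell lowers its probability, then renormalize.
prior = {"A": 0.2, "B": 0.5, "C": 0.3}   # prior probability the survivor is in each cell
p_detect = 0.8                           # chance a sweep finds him if he is there

def update_after_failed_search(belief, searched_cell, p_detect):
    """Posterior over cells after searching one cell and finding nothing."""
    posterior = {}
    for cell, p in belief.items():
        miss = (1 - p_detect) if cell == searched_cell else 1.0
        posterior[cell] = p * miss       # P(cell) * P(no detection | cell)
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}

belief = update_after_failed_search(prior, "B", p_detect)
print(belief)   # cell B drops sharply, A and C rise: {'A': 0.33, 'B': 0.17, 'C': 0.5}
```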

Just before turning back to refuel, a searcher in a helicopter spotted a man clinging to two buoys he had tied together. He had been in the water for 12 hours; he was hypothermic and sunburned but alive.

Even in the jaded 21st century, it was considered something of a miracle.


Source: A version of this article appears in print on September 30, 2014, on page D1 of the New York edition with the headline "The Odds, Continually Updated".