Program

Monday, 26th June 2017

Location: Room DZ 008

09:00 – 10:00 Registration
10:00 – 11:15 Eric-Jan Wagenmakers
History and Statistical Foundation of Preregistration
11:15 – 11:30 Coffee break (15min)
11:30 – 12:10 Fabien Grégis
The meanings of error and accuracy in the adjustments of the physical constants
12:10 – 12:50 Nevin Climenhaga, Lane Desautels & Grant Ramsey
Causal Inference from Noise
12:50 – 14:00 Lunch (70min)
14:00 – 14:40 Fayette Klaassen & Herbert Hoijtink
Staying in the loop: prior probabilities, Bayes factor, posterior probabilities
14:40 – 15:20 Robbie van Aert and Marcel van Assen
Estimating replicability of science by taking statistical significance into account
15:20 – 15:45 Coffee break (25min)
15:45 – 17:00 Barbara Osimani
Varieties of Evidence and Varieties of Error. A Formal Approach to Evidence Synthesis
17:00 – 17:15 Coffee break (15min)
17:15 – 18:30 Poster Session (with drinks and bites)

          Qiu Lin
Testing, Cross-Checking, and Falsifying: the Indispensability of Counterfactual Reasoning in Scientific Research
          Charles Beasley
The Neyman-Pearson Method in Animal Mindreading Research
          Hannah Rubin
Does Inclusive Fitness Save the Connection Between Rational Choice and Evolution?
          Sebastian Schuol
Misleading debate – incidental findings in genomic research
          Tomasz Żuradzki
The conceptualization of vaccination refusals: between science denial and violation of rational choice
          Elena Popa
Overlaps between epistemic and moral values: uses regarding scientific error
          Katinka Dijkstra, Peter Verkoeijen and Rolf Zwaan
Reading Literary Fiction Improves Theory of Mind: What is the Evidence?
          Adam Kubiak
Different Types of Prior Knowledge Employed to Minimize the Error in Neymanian Statistics

20:00 – Workshop Dinner (Café Anvers, Oude Markt 8, 5038 TJ Tilburg)

Tuesday, 27th June 2017

Location: Room DZ 008

10:00 – 11:15 Daniël Lakens
Scientific errors should be controlled, not prevented [slides]
11:15 – 11:30 Coffee break (15min)
11:30 – 12:10 Herbert Hoijtink
Bayesian Evaluation of Informative Hypotheses
12:10 – 12:50 Noah van Dongen & Eric-Jan Wagenmakers
Statistician Testing
12:50 – 14:00 Lunch (70min)
14:00 – 14:40 Daria Jadreškić
Speed, error, and the epistemic cost of suspended judgment
14:40 – 15:20 David Hopf
Bias without Error? Independence of Research and the “State of the Art”
15:20 – 16:00 Felipe Romero and Jan Sprenger
Scientific Self Correction—The Bayesian Way
16:00 – 16:30 Coffee break (30min)
16:30 – 17:45 Édouard Machery
What is a replication?
17:45 – Closing Reception

Abstracts in Alphabetical Order:

Adam Kubiak Different Types of Prior Knowledge Employed to Minimize the Error in Neymanian Statistics

Neyman’s methodology of sampling and estimation is a flexible system that allows different types of prior knowledge to be incorporated into inferences in order to minimize the error in the conclusions. It employs prior estimates of the research variable, characteristics of the population studied, correlations between auxiliary characteristics and the research variable, and, finally, knowledge of the socio-economic context of the research. Additionally, when the error of hypothesis tests is considered, it is possible to make use of a pre-test assessment of the prior probabilities of the hypotheses to decrease the overall error rate without violating the core methodological idea of the frequentist statistical test.
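
One standard way to formalize the last point (a generic sketch, not necessarily Kubiak’s own construction): if $\pi_0$ and $\pi_1 = 1 - \pi_0$ denote the pre-test probabilities of the null and the alternative hypothesis, a test with size $\alpha$ and type II error rate $\beta$ has overall error rate

$$
E \;=\; \pi_0\,\alpha + \pi_1\,\beta ,
$$

and $\alpha$ and $\beta$ can be balanced so as to minimize $E$, while each individual test still controls its error frequencies in the usual frequentist sense.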

Barbara Osimani Varieties of Evidence and Varieties of Error. A Formal Approach to Evidence Synthesis

The root of the dissent around standards of evidence in medicine can be found in different perspectives on the reliability of evidence. Whereas the EBM approach is focused on reliability as opposed to random and systematic error (confounding, bias), the contending view is concerned with the defeasibility of inference and various problems with extrapolation (Cartwright 2011, Clarke et al. 2013, Anjum and Mumford 2012). In particular, much crosstalk is generated by failing to take into account the multilayer structure of scientific inference and the interactions of various first- and higher-order dimensions of evidence in (dis)confirming hypotheses: strength, consistency/coherence of items of evidence, (in)dependence structure, reliability of the source, relevance of the evidence for the target population, plausibility with respect to the background theories, and contribution to scientific progress. Formal (social) epistemology has a tradition of analysing how these various dimensions of evidence interact in contributing to hypothesis (dis)confirmation (see for instance Bovens and Hartmann 2003; Collins et al. 2015; Romero 2016; Landes, Osimani and Poellinger 2017). I present here a formal approach to evidence synthesis which follows this tradition and takes both of the above concerns into account by spelling out how various dimensions of evidence and of error jointly interact in hypothesis confirmation.

Charles Beasley The Neyman-Pearson Method in Animal Mindreading Research

Because the non-human animal mindreading research program has been persistently haunted by theoretical under-determination, scientists and philosophers have re-examined its statistical methodology, and proposed reforms to it, in the hope of obtaining probative evidence. I review three of these reforms to the NPM (Neyman-Pearson method) in comparative psychology (namely, Mikhalevich’s (2015) contextual null, Huss and Andrews’ (2014) anti-skeptical null, and Bausman and Halina’s (forthcoming) statistical null), and argue that each betrays a crucial feature of the NPM. I then propose what I call the orthodox null as a return to the intended understanding of the NPM.

Daniël Lakens Scientific errors should be controlled, not prevented

The scientific enterprise produces an increasing number of findings every year. As in any enterprise, optimization of the production cycle requires a careful analysis of the costs of errors, as well as the costs of preventing errors. I will argue for placing error control at the very center of how we design experiments. Depending on the available resources and the goals of researchers, statistical decision theory should be applied to make informed choices about how much error is desirable. Since the goal of tax-funded scientists should be to generate knowledge as efficiently as possible, our main challenge is logistical, and hence the question of how we should control errors to optimal levels is ultimately an applied question that can only be answered in close collaboration with researchers and that will lead to domain-specific answers.

Daria Jadreškić Speed, error, and the epistemic cost of suspended judgment

I discuss the relationship between speed and error, drawing on a debate between Elliott and McKaughan (2014) and Steel (2016). The difference between their stances is that the former claimed the preference for speed to be of a non-epistemic nature, while Steel emphasized an epistemic purpose that speed serves: avoiding the epistemic cost of suspended judgment. I use Steel’s notion of time-sensitivity to give a more accurate account of the role of speed, i.e. of time, in science: namely, that it is an underlying feature of different values, such as simplicity, ease of use, and predictive success.

David Hopf Bias without Error? Independence of Research and the “State of the Art”

When we talk about the trustworthiness of scientific findings, we might think of biases detrimental either to the reliability or to the validity of research. In this talk, I point out a novel concept of bias that implies no such thing: bias in the state of the art. The sheer quantity of research on certain topics – ranging from overemphasis to no research at all – can profoundly influence the confirmation of hypotheses, the comprehensiveness of decision-relevant information and, thus, the perceived weight of alternative courses of action. I defend this concept against anticipated attempts at reduction, and discuss possible countermeasures.

Édouard Machery What is a replication?

There has been very little effort to provide an abstract characterization of the concept of replication, and the concept remains unclear. This lack of clarity underlies some of the debates in psychology about what kind of replication (e.g., conceptual vs. exact replication) should be preferred to validate a theory. The goal of this presentation is to propose an analysis of the concept of replication that will clarify the debates going on in psychology and philosophy of science about replication.

Elena Popa Overlaps between epistemic and moral values: uses regarding scientific error

Starting from the concept of ‘moralized rationality’ (Stahl et al.), interpreted as a blurring of the boundaries between moral and epistemic values in everyday reasoning, I investigate how the overlaps between the two are reflected in scientific practice. Following Thagard’s discussion of rationality and emotions in science, I argue that moral judgments are like non-cognitive values in their involvement of emotions, while sharing the normative character of cognitive values. I further claim that joint moral and epistemic considerations may help address ‘hot cognition’ issues, particularly the avoidance of data distortion and fabrication.

Eric-Jan Wagenmakers History and Statistical Foundation of Preregistration

The practice of preregistration has received increasing attention of late, particularly in the field of psychology. After outlining the goal of preregistration and the recent initiatives to promote its use, I will present a historical overview that lists the philosophers and methodologists who have advocated preregistration in the past. Even though some advocates of preregistration believed it to be a requirement sine qua non, detractors have argued that preregistration is useless. I outline the arguments pro and con, and then attempt to shed light on the issue by adopting a statistical perspective.

Fabien Grégis The meanings of error and accuracy in the adjustments of the physical constants

The adjustments of the physical constants are a crucial collective endeavor in the field of precision physics, initiated by Raymond Birge in 1929. I propose to explore two examples in the history of the adjustments which illustrate how error, uncertainty and accuracy are understood in this field. I will explain how physicists have exhibited a tension between precision and accuracy in the assessment of experimental results. I will then show how some of them have proposed, since the 1970s, to resolve this tension by appealing to an epistemology of long-term progress based on the possibility of correcting for errors.

Fayette Klaassen and Herbert Hoijtink Staying in the loop: prior probabilities, Bayes factor, posterior probabilities

The relative evidence in the data for two hypotheses can be quantified by a Bayes factor. Bayesian statistics allows evidence from data to be continuously updated into posterior probabilities, that is, conditional probabilities of the hypotheses under consideration. At the start of this loop, subjective prior probabilities need to be specified for the hypotheses considered. Often, the prior probabilities are taken to be equal, although this might not be an accurate representation of researchers’ beliefs. This research provides a definition of prior probabilities and a tool that helps applied researchers understand and formulate prior probabilities in a structured way, based on their subjective ideas about the hypotheses.
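
For reference, the updating loop described here is standard Bayesian updating (a generic formulation, not specific to the authors’ tool): for two hypotheses $H_1$ and $H_2$ with prior probabilities $P(H_1)$ and $P(H_2)$ and data $D$,

$$
\frac{P(H_1 \mid D)}{P(H_2 \mid D)} \;=\; \underbrace{\frac{P(D \mid H_1)}{P(D \mid H_2)}}_{\text{Bayes factor } BF_{12}} \times \frac{P(H_1)}{P(H_2)},
\qquad
P(H_1 \mid D) \;=\; \frac{BF_{12}\, P(H_1)}{BF_{12}\, P(H_1) + P(H_2)} .
$$

The posterior probabilities obtained from one study can serve as the prior probabilities for the next study, which is the loop referred to in the title.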

Felipe Romero and Jan Sprenger Scientific Self Correction—The Bayesian Way

Replication is central to scientific self-correction, but many findings in the behavioral sciences don’t replicate (Open Science Collaboration, 2015). We evaluate two competing hypotheses about how to make science more self-corrective. Social reformists hypothesize that changes in inference methods alone do not make science more self-corrective unless we change the social structure of science. Methodological reformists, on the other hand, hypothesize that scientific self-correction would be greatly improved by moving from null hypothesis significance testing (NHST) to Bayesian statistics. Using a computer simulation study, we evaluate whether self-correction depends on the chosen statistical framework. Based on this study, we articulate a middle ground between the social and methodological reforms. Scientific self-correction fails in several scenarios regardless of the statistical framework, but Bayesian analysis leads to less misleading effect size estimates and credible/confidence intervals than NHST.

Hannah Rubin Does Inclusive Fitness Save the Connection Between Rational Choice and Evolution?

At the heart of evolutionary theory is the concept of ‘fitness’, which is, standardly, an organism’s reproductive success. Many evolutionary theorists argue, however, that to explain the evolution of social traits, such as altruism, we must use a different notion of fitness. This ‘inclusive fitness’, which includes the reproductive success of relatives, is claimed to be indispensable for studying social evolution. I show that this pervasive claim in one of the most influential research paradigms in evolutionary biology rests on a subtle confusion between correlation and causation.

Herbert Hoijtink Bayesian Evaluation of Informative Hypotheses

Even after decades of critique, null-hypothesis significance testing is still the dominant research paradigm in the behavioural and social sciences. This is perhaps explained by the fact that the focus has been too much on critique and too little on viable, broadly applicable alternatives that are implemented in user-friendly software packages. In this presentation, first of all, a summary of the main criticisms of null-hypothesis significance testing will be given. Subsequently, using a simple example, an alternative, Bayesian evaluation of informative hypotheses, will be introduced. The presentation will be concluded with an example in which both approaches are compared.
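
By way of illustration, an informative hypothesis orders parameters rather than setting them to zero, e.g. $H_i\colon \mu_1 > \mu_2 > \mu_3$ for three group means. In the encompassing-prior formulation associated with this line of work (sketched here as an assumption; the talk’s details may differ), the Bayes factor of $H_i$ against the unconstrained hypothesis $H_u$ is

$$
BF_{iu} \;=\; \frac{f_i}{c_i},
$$

where $f_i$ is the posterior and $c_i$ the prior probability, under $H_u$, that the constraint holds, so that fit is rewarded and unnecessary complexity is penalized.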

Katinka Dijkstra, Peter Verkoeijen and Rolf Zwaan Reading Literary Fiction Improves Theory of Mind: What is the Evidence?

Kidd and Castano (2013) claimed that reading literary fiction improves theory of mind relative to reading popular fiction. In the present study, we first examined the empirical evidence reported by Kidd and Castano with a p-curve analysis. This analysis suggests that the evidential value of their findings is inadequate. Second, we conducted a well-powered, close-to-exact replication of one of Kidd and Castano’s studies. The results of this replication failed to substantiate Kidd and Castano’s claim, as we failed to find differential effects of reading literary fiction and popular fiction on various theory-of-mind measures.

Nevin Climenhaga, Lane Desautels and Grant Ramsey Causal Inference from Noise

“Correlation is not causation” is a mantra of the sciences. The standard view in the epistemology of causation is that to tell whether one correlated variable is causing the other, one needs to intervene on the system – the best intervention being a randomized controlled trial. In this paper, we challenge this orthodoxy. We argue that not only are randomized controlled trials not the only source of causal knowledge, they are not always the best source. Instead, we show that statistical noise – a source heretofore not recognized in the philosophical literature – can itself be a source of causal knowledge.

Noah van Dongen and Eric-Jan Wagenmakers Statistician Testing

If we want to resolve the replication crisis and improve research practice in the social sciences, we need to know how and why researchers, statisticians and methodologists do or do not change their minds when confronted with flaws in their methods. We propose an extensive exploratory study of how statisticians and methodologists in the social sciences differ in analyzing the same data, and of how they react to direct exposure to these differences. We expect that these findings will assist in formulating and amending reform policies in the social sciences. In addition, our results will be relevant to the philosophical debates on science’s capacity to self-correct and on the competing schools of statistical inference. This presentation provides an outline of the study.

Qiu Lin Testing, Cross-Checking, and Falsifying: the Indispensability of Counterfactual Reasoning in Scientific Research

This paper examines counterfactual reasoning employed in science to eliminate false hypotheses. In 1895, Simon Newcomb proposed a change to the exponent of the inverse-square law in order to account for the anomaly of Mercury’s perihelion. This modification posed a fundamental challenge to the Newtonian theory of gravity. Later, Ernest Brown provided strong evidence against Newcomb’s proposal by showing, via counterfactual reasoning, that it would disturb the values of the Moon’s motion. I argue that counterfactual reasoning is a powerful cross-checking instrument whereby hypotheses can be evaluated on two aspects: whether they are compatible with established results, and whether they are compatible with the accepted theory.

Robbie van Aert and Marcel van Assen Estimating replicability of science by taking statistical significance into account

Consider the following situation, common in science nowadays: a researcher reads about a significant effect and replicates it. As a result, the researcher has two effect size estimates, and his or her key objective is to evaluate the effect size based on these two study outcomes. This objective is particularly challenging if the replication’s effect size is small and not significant. We developed the snapshot hybrid method, which does take the statistical significance of the original study into account. This method quantifies the amount of evidence in favor of a zero, small, medium, or large underlying true effect size by computing posterior model probabilities.
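
Schematically, such posterior model probabilities can be obtained as follows (a hedged sketch of the general calculation; the exact densities and defaults of the snapshot hybrid method are given in the authors’ work): with candidate true effects $\theta_i \in \{\text{zero}, \text{small}, \text{medium}, \text{large}\}$, prior model probabilities $\pi_i$, original estimate $y_o$ and replication estimate $y_r$,

$$
P(\theta_i \mid y_o, y_r) \;=\; \frac{\pi_i \, f(y_o \mid \theta_i, \text{sig}) \, f(y_r \mid \theta_i)}{\sum_j \pi_j \, f(y_o \mid \theta_j, \text{sig}) \, f(y_r \mid \theta_j)},
\qquad
f(y_o \mid \theta_i, \text{sig}) \;=\; \frac{f(y_o \mid \theta_i)}{P(\text{significant} \mid \theta_i)},
$$

where the original study’s likelihood is truncated to statistically significant outcomes; this is how the significance of the original study is taken into account.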

Sebastian Schuol Misleading debate – incidental findings in genomic research

Prospectively oriented bioethical debates often lack empirical knowledge, as reported in Schuol et al. (2015): since every genome contains potentially disease-causing alterations that may be detected during genomic analyses aimed at a specific condition, these incidental findings (IFs) are the subject of intense ethical debate. Our empirical analysis shows that the frequency of IFs is almost zero. In contrast to imaging methods, one cannot “see” genomic IFs: handling genomic data requires filters, and these filters prevent IFs. Two years after the publication and presentation of these findings, the debate is still intense. What are the mechanisms behind this, and how can misleading debates be prevented?

Tomasz Żuradzki The conceptualization of vaccination refusals: between science denial and violation of rational choice

Vaccination refusals seem to be clear examples of science denial. Nevertheless, some countries offer non-medical exemptions from mandatory vaccination. This is surprising, because such exemptions are usually limited to value disagreements (e.g. about the moral status of human fetuses) and are not accepted in cases of science denial (e.g. objections to teaching evolution or climate change in schools). In my presentation I want to show that there are no good reasons to assume that anyone should be allowed to refuse “to vaccinate their dependants on conscientious grounds”, as some authors claim.