Scientifically Based Research — U.S. Department of Education– Pg 3

What Is Scientifically Based Evidence? What Is Its Logic?—Valerie Reyna

     MS. NEUMAN: What I’d like to now do is introduce Valerie Reyna. Valerie is the Deputy OERI, Office of Educational Research and Improvement. Her topic is: What is scientifically based evidence? What is its logic?

     VALERIE REYNA: Thank you very much. If you could go ahead and put my first slide up that would be great.

     Welcome, it is a pleasure to have the opportunity to talk to you, and I gather that our well-organized organizer is going to keep the question and answer period to the end, after all the speakers.

     My usual style as a teacher is to have questions during the talk, so that’s kind of constraining for me but I will try to contain myself.

     MS. NEUMAN: You will be good!

     MS. REYNA: Absolutely! But if there is something that is burning that’s informational, if there’s something that doesn’t make sense at all, it wouldn’t be a good idea not to communicate. So, please do raise your hand for that. At the end, of course, I will be delighted to entertain questions. In fact, a kind of give and take session is what I am really looking forward to, so that I can learn from you too.

     Yes, that’s who I am. We can go to the next slide.

     I am going to talk briefly about: why scientific research, although I don’t think in the very short time that I have available that I could really give you a coherent argument that supports and defends the notion of scientific research, but I can touch on a few ideas very, very lightly.

     One of them is: why scientific research? I think to think about that it’s useful to think about what is the alternative to scientific research? If you didn’t base practice on scientific research, what do you base it on?

     Those alternatives include (this is not an exhaustive list, of course), it includes such things as tradition—this is the way we’ve always done it, for example, superstition, there are—you know, you throw the salt over your left shoulder and the reading scores go up! No, actually, there are things that are not based in fact that in fact become lore that if we really knew the scientific basis of it we would discover that those things in fact are just superstition. They are unfounded beliefs.

     Then, there’s anecdote. A fairly well-known obstetrician physician asked me once, “What’s wrong with anecdotal evidence?” I think it is really a good question. Anecdote is a series of stories that you tell about things that have happened to you in your life. They can be very entertaining anecdotes.

     The reason why we can’t base practice on mere anecdote, however, and this is, of course, well known in medicine, is that individual cases may be exceptions. That may be the only case of that type.

     In fact, anecdotes are often more entertaining when they are unique. But that is a weak basis to generalize to many, many people.

     We know on the basis of experience that anecdotes have turned out to be false and misleading. Sometimes they are very representative, sometimes they’re not. The problem is we don’t know when.

     Next slide. There’s an analogy to medicine that I have obviously drawn on already.

     The first example, of course, is the classic one of when they used to bleed people. People would get sick. You know, I think the bleeding of George Washington contributed to his death.

     Why was it that good, well-intentioned physicians, because I think they probably were well-intentioned, I don’t think they were trying to hurt the president, why is it that they didn’t notice that it wasn’t working? It wasn’t just with this one patient, it was with many patients. Yet, somehow, personal experience was not sufficient to dissuade them from this practice.

     Well, in fact, clinical trials are very recent in medicine. It was only in the 1940s that the randomized experiment where you know you had 2 groups, and you randomly assigned and all of that became routine and a standard, the gold standard in medicine. That is very recent in historical terms. Prior to that, we relied on those things I talked about in the first slide, like tradition and bleeding people.

     One of the reasons why clinical trials are necessary has to do with the psychology of human thinking. I won’t go into it in any depth, but I’m actually a cognitive psychologist and there’s been research done about when you ask people to report about things they have directly observed and directly witnessed and the biases that can creep into that type of reporting. These are normal human biases that are generally adaptive, but they have predictable pitfalls. So, if you rely on your memory for past events, we know that that memory will be biased, and so on. Drawing simply on your personal experience alone is not a solid foundation for generalization.

     Clinical trials in fact are the only way to really be sure about what works in medicine. The logic of it—and the other speakers are going to go into far more depth than I really have the time to do, the logic of it is basically the following: You have a group of people that you want to make a conclusion about. You want to say this intervention—whatever it is, if it’s a new reading technique, or whatever—works for this group or not.

     So, what you do is you take members of that population and you flip a coin essentially as to whether they are going to be in the group that actually gets the intervention or gets some kind of comparison, like what you would have done had you not done this new thing. Standard treatment, that’s a common control.

     The idea is that if you do this enough times and you get big enough groups, you’ve got two groups, the fact that you’re flipping a coin ensures that these two groups, if you have enough people in them, are going to be comparable in every way except the intervention you’re interested in.

     Why is that? Because there was nothing that put one person in one group as opposed to the other. It was all by chance alone that you ended up in the reading intervention group as opposed to the control group. And, so, all the ways in which people do in fact differ, and people do differ, should be represented in both groups. They should be comparable in every way, except the one thing that you made different in their lives. Therefore, we can isolate the effect on the outcome and trace it to that intervention uniquely.

     This is the only design that allows you to do that, to make a causal inference. Everything else is subject to a whole bunch of other possible interpretations.
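
     The coin-flip logic described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the student population, the `prior_skill` attribute, the 5-point `true_effect`, and the function name `simulate_trial` are all hypothetical and do not come from the talk.

```python
import random
import statistics


def simulate_trial(n_students, true_effect=5.0, seed=0):
    """Assign hypothetical students to two groups by coin flip.

    Returns (skill_gap, estimated_effect):
      skill_gap        - difference in average prior skill between groups
      estimated_effect - difference in average outcome score between groups
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_students):
        # One of the many ways students differ before the study starts.
        prior_skill = rng.gauss(100, 15)
        # The coin flip: nothing about the student influences the group.
        gets_intervention = rng.random() < 0.5
        score = prior_skill + rng.gauss(0, 5)
        if gets_intervention:
            treated.append((prior_skill, score + true_effect))
        else:
            control.append((prior_skill, score))

    skill_gap = (statistics.mean([p for p, _ in treated])
                 - statistics.mean([p for p, _ in control]))
    estimated_effect = (statistics.mean([s for _, s in treated])
                        - statistics.mean([s for _, s in control]))
    return skill_gap, estimated_effect


if __name__ == "__main__":
    gap, effect = simulate_trial(20000)
    print(f"prior-skill gap between groups: {gap:.2f}")
    print(f"estimated intervention effect:  {effect:.2f}")
```

     With a large hypothetical sample, the prior-skill gap should come out close to zero and the estimated effect close to the assumed 5 points, which is the sense in which the two groups are comparable in every way except the intervention.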

     Now if you have too small a sample, obviously the logic doesn’t follow. Because you can have all the smart people in one group, the not so smart people in the other if you only have a few. If you do this enough times, you get a big enough group, they will be representative. That has been proven mathematically by things like, well, we won’t get into that!
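
     The point about small samples can be illustrated the same way. The sketch below is an assumption-laden example, not something presented in the talk: the skill distribution, the group sizes, and the function name `average_imbalance` are hypothetical. It repeatedly splits students into two groups by chance and measures how far apart the groups are on prior skill, on average.

```python
import random
import statistics


def average_imbalance(group_size, repeats=200, seed=0):
    """Average absolute gap in mean prior skill between two chance-formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repeats):
        # Hypothetical students, each with a prior skill level.
        students = [rng.gauss(100, 15) for _ in range(2 * group_size)]
        rng.shuffle(students)  # chance alone decides who lands in which group
        group_a = students[:group_size]
        group_b = students[group_size:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


if __name__ == "__main__":
    for size in (5, 50, 500):
        print(f"group size {size:>3}: average skill gap {average_imbalance(size):.2f}")
```

     Under these assumptions the average gap should shrink as the groups get larger, so a few students per group can leave one group noticeably stronger than the other purely by chance.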

     The bottom line here is these same rules about what works and how to make inferences about what works, they are exactly the same for educational practice as they would be for medical practice. Same rules, exactly the same logic, whether you are talking about a treatment for cancer or whether you’re talking about an intervention to help children learn. The same logic applies. In fact that’s something I’ve said in talks for a period of time and the National Academy of Sciences report, which I know Mike and Lisa are going to talk about, in fact makes a similar claim. The rules of the game are the same.

     I have the word “brain surgery” up there. The reason I have the word “brain surgery” up there is that I think, you know, when we talk about medicine and things like brain surgery and cancer, it is very, very important to get it right. We all recognize that and most of us buy into that. You know, that you’ve got to have randomized clinical trials because we want to be able to benefit from these treatments for cancer.

     But when we teach students we really are engaging in a kind of brain surgery. We are affecting them one way or the other. Sometimes what we do helps, sometimes what we do, in fact, inadvertently, harms. We really don’t know until we do a randomized clinical trial whether what we are doing is benefiting that student or not. We really don’t know. It may be well intentioned, but that’s not sufficient as we can see from the example of bleeding. So, it is brain surgery essentially and it deserves the same kind of respect for the nature of the consequences, in my opinion.

     Next slide. So, I just told you that the randomized clinical trial, this randomized experiment where you can assign people to two groups and chance alone determines which one they end up in so that they are comparable in every way except for that key thing you want to look at in terms of cause and effect, I said that is the best form of evidence, and it is. It is the best form of evidence.

     However, do we have a lot of that type of evidence in this field that you can draw on? Now, we’ve exhorted you through legislation and a number of other things, you must use this, but is there a lot of gold standard level evidence out there about all the things we do on a daily basis in the classroom?

     No, there isn’t. There is some. There’s some evidence out there. A lot of the evidence, however, is lower on the hierarchy of the strength of evidence. I am going to just touch on this briefly. Again, the other speakers are going to talk about it in more detail. When did I start?

     MS. NEUMAN: Like ten of.

     MS. REYNA: Okay. So, there is a lower level of evidence that we can describe as quasi-experimental, or large databases that essentially have lots of characteristics of students in them that you can correlate with one another and you can correlate with outcomes.

     The idea here is that nobody has been randomly assigned. In the real world randomness is a very rare thing. It’s a very artificial thing. In the real world there’s lots—everything’s correlated with everything else.

     Think about the example of socio-economic status. Correlated with everything, you know, your neighborhood, your number of books in the home, all of these things are associated in real life.

     But when you look at the pattern of associations, you can go in through statistical magic, that’s basically it, and you can artificially create a sort of comparison or control by sort of equating people on things. If you look at enough different combinations of people and enough different characteristics you can statistically attempt to control, to capture basically the logic of that gold standard, the randomized experimental trial. That’s always the logic, that’s always the goal.

     But here you attempt to do that by statistics. It’s not as good. It’s a lower level on a hierarchy of evidence, because there could always be something you are not controlling for that in fact is causing your outcome. That’s always possible.
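
     The limit of statistical control described above can be sketched in a few lines. Everything in this example is a hypothetical assumption made for illustration: a program with zero true effect, a measured characteristic (`high_ses`), an unmeasured one (`motivated`), the specific probabilities, and the function name `observational_estimates`. Students choose for themselves whether to join, and the analysis equates them only on the measured characteristic.

```python
import random
import statistics


def observational_estimates(n=40000, seed=0):
    """Return (raw_gap, adjusted_gap) for a program with NO true effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_ses = rng.random() < 0.5    # measured by the researcher
        motivated = rng.random() < 0.5   # NOT measured by the researcher
        # Self-selection: both traits make joining the program more likely.
        p_join = 0.2 + 0.3 * high_ses + 0.3 * motivated
        joined = rng.random() < p_join
        # Both traits raise scores; joining the program adds nothing.
        score = 50 + 10 * high_ses + 10 * motivated + rng.gauss(0, 5)
        rows.append((high_ses, joined, score))

    def gap(subset):
        joined_scores = [s for _, j, s in subset if j]
        other_scores = [s for _, j, s in subset if not j]
        return statistics.mean(joined_scores) - statistics.mean(other_scores)

    raw_gap = gap(rows)
    # "Equating" on the measured trait: compare within each SES level,
    # then average the two within-level gaps.
    adjusted_gap = statistics.mean(
        [gap([r for r in rows if r[0] == level]) for level in (True, False)]
    )
    return raw_gap, adjusted_gap


if __name__ == "__main__":
    raw, adjusted = observational_estimates()
    print(f"raw gap (no control):        {raw:.2f}")
    print(f"gap after equating on SES:   {adjusted:.2f}")
    print("true effect of the program:  0.00")
```

     In this sketch, controlling for the measured trait should shrink the apparent benefit but not remove it, because the unmeasured trait is still driving both who joins and how they score.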

     However, it is second best. It is not nothing. So, for example, you at least know that something is maybe probably true, that there’s a large number of what’s called in public health epidemiological studies, and there would be an analogy in education to those large studies, lots of attributes, the obvious things controlled for. You know, you could at least say, well, it’s probably true. That’s certainly better than we have no idea, much better than no evidence, well, what do you think? It’s not the top level of evidence, but at least it is evidence.

     Another thing that is a good source of extrapolation to practice is evidence based theory, and the evidence based theory is the crucial part. Theories whose predictions have been confirmed and disconfirmed—you know, there’s been an opportunity to disconfirm them as well, they’ve been tested—that are explanatory, that go into the mechanisms of how people learn, how they learn, what’s the process going on.

     If you know something about how people learn and how an intervention was effected, then you have some clue as to whether you can generalize it to your classroom, because you know the mechanism. You know what’s relevant and what’s irrelevant to the causal course of that intervention.

     Is the shoe size of the student relevant? Probably not. Why is that? Because we have an inclusive theory of how learning happens and it doesn’t have to include people’s shoe size. Right? So, if we have a tested theory, we can sometimes extrapolate beyond just the limited group that was originally studied. You know, sort of the boundary conditions for when an intervention is likely to be effective.

     Are there pitfalls of theory based extrapolation? Yes, because sometimes it can turn out that it doesn’t follow for that group for other reasons that weren’t studied. So, there are always pitfalls.

     A lot of people worry about the fact that science, in some people’s view, is a soulless, heartless enterprise. What about the student as a person? What about the interpersonal relationship between professionals, teachers, principals, so on and so forth and the student? Doesn’t science really take the heart out of things?

     I would argue: definitely not. When you give students the opportunity to learn and be successful that supports them as people too.

     Moreover, there is really no dichotomy between science and values, for example, or science and emotion. That is a false dichotomy. When we think about values, I think it is important to recognize that evidence does not determine our decision solely. It is not just the facts. It’s the facts plus values. But without the facts, we might make the wrong decision, even based on our values. Because we don’t know what’s true and what’s not true.

     The facts, the evidence is necessary to make decisions that affect students’ lives, but it’s not sufficient. But it is necessary. That is what we’re promulgating, that, at least, it be part of the discussion so that we can base practice on it. So, we’re talking about science with a human face, and that’s a person.

     This whole enterprise of translating scientific research into practice is very complicated. There is even research on how to do that. It’s called translational research. In medicine, for example, there’s a lot of that.

     That last bullet there is really an invitation for your help. I am at OERI, the Office of Educational Research and Improvement. We are thinking very hard about how to do this, how to most effectively be useful to you and to support you in what you are doing.

     So, I would be very, very interested in suggestions that you might have. I am going to stay for the whole day and practical suggestions about education and training, that sort of thing, would be enormously helpful for us. I think this symposium we have here today is a wonderful first step in that. But, it’s the kind of step we need to take and we need to take a lot more.

     Next slide. What is evidence based education? I am going to go through the next slides much more rapidly. I’m just going to sort of allude to points, and then if people want to talk to me more in depth, I’d be happy to do that. This is going to be pretty fast.

     We can’t get the slides up over there? Can you see and can you hear?

     So, what is evidence based education? The best available empirical evidence in making decisions about how to deliver instruction. But, again, we don’t have even the second level evidence about all the practices that currently occur in the classroom. Nor do we have even second or third tier level evidence about things that have to be accomplished in the classroom.

     So what is a professional to do? That’s when human judgment comes into play, to fill those gaps in evidence. That is inevitable. You have to apply your judgment. There are whole books written and research done just on the nature of human judgment. As you make decisions, you might want to dip into that literature. It’s actually quite helpful. Leaders of industry and business often get consultants to advise them about the nature of decision making and decision analysis.

     In a nutshell, what I would say is that there is a lot of wisdom in human judgment. That has been empirically demonstrated. There is also systematic bias in human judgment. That’s also been empirically demonstrated. It’s an inevitable thing that has to be an ingredient today and probably for many, many centuries more.

     We are just not going to know everything right now. That is the nature of science, and we are going to discover new things that make the old knowledge obsolete.

     But, at least, in science it is cumulative progress. It builds on the knowledge of the past, if it’s truly science. It doesn’t throw away things people have learned that in fact have been effective. That is not the nature of science. Science is by its essence cumulative.

     What is empirical evidence? Well, the most important aspect of what’s up on that slide, is that it’s objective evidence. It’s the kind of evidence that if two people watched something, they’d say yes, that’s what happened there. The interpretation of that evidence has to do with what I alluded to earlier having to do with causal theory. That’s a whole other level, but at least what happened at a surface level is agreed on. Then you make hypotheses about why it happened and you test those and you can be wrong in science. That’s the nature of empirical evidence.

     Scientific research really is evaluated primarily on two big dimensions. One of them is the quality, and that is primarily in terms of scientific merit, and that has to do with the method. When I was talking before about randomized experimental trials, and large correlational studies, that’s methods, methods of analyses. That has a lot to do with the quality of the evidence. So, if it’s high on the hierarchy, if it’s the gold standard, it’s top quality. If it’s one notch down, it’s second level quality and so on until you get to things that are really at the level of anecdote which are maybe slightly suggestive, but they’re not the highest quality of scientific evidence.

     Relevance and significance, obviously, is the other criterion. Scientific merit and good methods alone don’t make the best scientific research. It has to be relevant to your practice and it has to be significant. The more significant it is, the more people are affected by something, the more severe the issue is that’s being affected, obviously the more important the research.

     So, if you look at the National Science Foundation, for example, and you look at the way they evaluate grants that they receive in the sciences, it turns out to be exactly those two criteria: scientific merit, relevance and significance.

     Next slide. So, here’s a little bit more detail on what I talked about before about levels of evidence. What are the levels of evidence? Again, for those people who can’t see, we’ll make this available in some form or other.

     Again, the other speakers will be talking much more in detail about this. But, we have our randomized trial at the top, then our quasi experiment, then our simple correlational study, and so on down to the case studies.

     Go ahead. This is the logic once again in more detail about why randomized control studies are the gold standard, why they’re the highest level of evidence, why it’s what you should rely on with the greatest weight by far.

     Again, there’s self selection bias operating in the real world. What that means is people are assigning themselves to groups in the real world and it’s not random. People of a certain type tend to belong to certain groups to do certain things.

     People who smoke tend to drink more coffee. So, is it the coffee or is it the smoking? Well, you have to control for the drinking of coffee. It’s that sort of logic.
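
     The coffee and smoking example can be worked through with made-up numbers. In this sketch the rates, the population size, and the function name `disease_rate_gaps` are all hypothetical assumptions: smokers are more likely to drink coffee, only smoking raises the risk of illness, and coffee does nothing. Comparing coffee drinkers with non-drinkers overall, and then separately within smokers and within non-smokers, shows what controlling for one factor means.

```python
import random


def disease_rate_gaps(n=100000, seed=0):
    """Return (crude, among_smokers, among_nonsmokers) coffee-vs-no-coffee gaps."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smokes = rng.random() < 0.3
        # Self-selection: smokers tend to drink more coffee.
        drinks_coffee = rng.random() < (0.8 if smokes else 0.4)
        # Assumption of this sketch: only smoking raises risk; coffee has no effect.
        sick = rng.random() < (0.30 if smokes else 0.10)
        people.append((smokes, drinks_coffee, sick))

    def rate(group):
        return sum(sick for _, _, sick in group) / len(group)

    def coffee_gap(group):
        drinkers = [p for p in group if p[1]]
        abstainers = [p for p in group if not p[1]]
        return rate(drinkers) - rate(abstainers)

    crude = coffee_gap(people)
    among_smokers = coffee_gap([p for p in people if p[0]])
    among_nonsmokers = coffee_gap([p for p in people if not p[0]])
    return crude, among_smokers, among_nonsmokers


if __name__ == "__main__":
    crude, smokers, nonsmokers = disease_rate_gaps()
    print(f"coffee gap, everyone together: {crude:+.3f}")
    print(f"coffee gap, smokers only:      {smokers:+.3f}")
    print(f"coffee gap, non-smokers only:  {nonsmokers:+.3f}")
```

     Under these assumptions coffee should look harmful when everyone is lumped together, while the gap should be near zero once smokers are compared only with smokers and non-smokers only with non-smokers.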

     Next slide. Why is randomization critical? Because it equates groups on the ways in which people differ that are correlated with one another. That’s why it’s so powerful.

     Again, this is just more detail for a longer talk.

     That’s just an example. You can go ahead and skip that.

     Now, when you think about relevance, this is a very difficult thing. For scientific merit, you should use the hierarchy of evidence as your guide, and that’s fairly straightforward.

     Relevance, on the other hand, is a much more sticky issue and much more difficult. But, one of the key things you can look for is whether the study involves an intervention and outcome similar to those of interest. You’d be amazed at how many times people say there’s evidence for something, then you go look it up and some very obvious things are wrong, like they studied something else.

     They say one thing and it’s really something else. So, they say, okay, the effect of the graphing calculator on the ability to, you know, do certain kinds of mathematical computations without the calculator, you know, there’s some arguments about transfer. And they didn’t look at graphing calculators, they looked at non-graphing calculators. This is common sense.

     So, you’d be amazed at how many things you can screen out by asking some simple, common sense questions about relevance. You’ll screen out a lot of the junk by doing that.

     One of the things you can do is you can search the literature, obviously. Some of that requires, however, you know, folks that have advanced training. And how to do that and how to bridge that is something we should talk about.

     You can screen. Obviously, you should screen on the two dimensions we talked about, quality and relevance. Those should be your touchstones. You can search for evidence that has been interpreted. For example, I give an example of narrative reviews and meta-analyses.

     However, when people summarize the literature and they say they are summarizing the research in a field, the quality of those summaries varies a lot. Some of them are essentially an opinion piece. This is what I think. People’s opinions are interesting, but it is not something you want to necessarily base the lives of millions of children on with great confidence.

     Some reviews are much more formal, meta-analytic, and scientific, so that another person looking at the same literature would reach a similar conclusion. Those are the ones you want. So, a meta-analysis is superior to a narrative review.

     Go on to the last slide. This is the part where we talk about what we are trying to accomplish that we hope will support you.

     These are our goals and they are in our strategic plan and we really mean them. We’re trying very hard to achieve these goals.

     We want to provide information and tools. The goal we are ultimately looking for here, though, is that, as it is in medicine today, at some point, and I think this point is inevitable in the future, the use of scientific research as a basis for educational practice will become routine. It will become customary and people won’t be able to imagine a time when that wasn’t done as a matter of course. Thank you.

     MS. NEUMAN: I think she makes that more clear than anything I’ve heard for a long time.