Scientifically Based Research — U.S. Department of Education– Pg 5
MS. NEUMAN: It’s a special pleasure today to introduce Steve Raudenbush. He’s a colleague of mine at the University of Michigan, and he’s one of those methodologists that actually talk in human language.
He’s a wonderful translator of research evidence and what we should begin to look for as we become critical consumers of research.
MR. STEPHEN RAUDENBUSH: Thanks, Susan. Susan made me promise not to show any equations! I will have no slides, so if anything’s going on up there, pay no attention to it.
In May of 1999, I had the good fortune to attend a meeting at the American Academy of Arts and Sciences, not to be confused with the National Academy of Sciences. The topic of the meeting was how to improve the scientific quality of educational research.
The two main organizers were two venerable characters named Howard Hiatt and Fred Mosteller. For Mosteller and Hiatt it was a kind of déjà vu because they had been among the most influential people half a century earlier in advocating effectively that medicine should be based more on scientific research. They felt that the time was right to make the same argument now in education.
At the time they made the arguments with respect to medicine, they were met with considerable skepticism. There was a famous (at that time, at least), well-publicized debate between Hiatt and a heart surgeon. Hiatt was arguing that we should do experiments to see whether new surgical procedures are really effective as compared to, let’s say, medication. The heart surgeon asked him in a very poignant moment, “Sir, have you ever held the beating heart of a human being in your hand?” The surgeon argued that the cold logic of science did not replace the clinical judgment of the seasoned practitioner.
Hiatt and Mosteller, of course, argued in response that in a lot of cases the medical profession really doesn’t know what the best thing to do is, and that in that situation it is unethical not to find out. In fact, if we can find out what works best, then over the years many millions of people will perhaps benefit, and that would reveal the true ethical character of basing decisions more on science.
Over the last forty to fifty years, the argument of Mosteller and Hiatt has, I’d say, largely won out. We now accept and admire the commitment of medical professionals to base not all, certainly, but some of their key decisions on research from clinical trials.
One of the questions that comes up that’s interesting is what caused the sea change in medicine and is it likely that anything like that might happen in education. That’s way too big of a question for me to try to answer, but there is an interesting vignette, I guess, a part of the story that has to do with the Salk vaccine for polio.
In the early studies in the ’40s and early ’50s on the Salk vaccine, the studies seemed to show basically that the vaccine wasn’t effective. People who had the vaccine were almost as likely or it may have been in fact equally likely to get polio as those who did not. By the way, at that time the vaccine had not been perfected. It was certainly far from perfect.
But subsequent research showed that higher income families were more likely to get the vaccine, and higher income families in this case were also more likely, in fact, to get polio. It was transmitted in places like swimming pools, places where high-SES people, people of higher social class, actually had a higher risk.
Subsequently, in 1954, there was a very important, huge, national randomized clinical trial on the vaccine. This was a double-blind trial in which physicians didn’t know what treatment they were giving to people, whether it was actually the vaccine or just the placebo of sugar water. And the people who were getting it didn’t know what they were getting. Having grown up in that era, you have to realize when you got sick in those days and the doctor came to your house, remember when the doctor used to come to your house? (Laughter.) Your parents would stand by in mortal fear as the doctor exercised your legs and did various things to see whether it was polio.
So, here people were doing this double blind randomized clinical trial and the people didn’t know what they were getting and the doctors didn’t know what they were giving. It’s quite remarkable that this happened.
But the results showed definitively that the vaccine was far more effective than not having the vaccine which led to further perfection, further clinical trials and ultimately the wiping out of polio as a disease.
Now, we may not expect quite such dramatic success in saving lives in education, although the relationship between education and health is actually a very durable and interesting one, so maybe not being educated can cause a loss of lives.
But there are striking parallels in education. The first evaluation of the Head Start program showed roughly equal cognitive skills at the end of the study if you compare the Head Start and the non-Head Start kids. But subsequent research showed that the Head Start kids had higher levels of poverty than the non-Head Start kids. Some then argued that the results actually showed that the Head Start program must be effective because the kids were doing better than you would have expected them to do given their social background.
So, here’s a result of two groups basically being the same and one group of people saying this shows Head Start is no good and the other group of people saying this shows Head Start is really good. The same evidence, but the evidence is so weak that it can’t really decide the question. Unfortunately, there was no follow-up experiment to give us a better answer.
This leads to a crucial point that Valerie made. There are striking parallels, as I said, between medicine and educational research. In both the early non-experimental vaccine studies and the Head Start evaluation, there is something we call a “confounding variable.” In this case, it is family income.
As I said in the vaccine case, the higher income people were more likely to get the vaccine, but also more likely to get the disease and therefore the evaluation that didn’t use random assignment was biased against finding the effect of the vaccine.
In the Westinghouse study of Head Start, the evaluation was also biased against finding an effect of Head Start, because in this case the Head Start kids had higher levels of poverty, which was associated with lower achievement.
So the power of experimentation, of random assignment, and this is a point Valerie made very clearly, is to eliminate confounding variables. You see, we could match the kids. We could have done a better study than the first one, and we’ve done many better evaluations since the original Head Start evaluation. We could match people on the basis of family background, making sure that the people we’re comparing are the same with respect to income or other social indicators. But we can never be sure that we have matched on all of the relevant confounders. Variables that predict getting the treatment and that are also related to the outcome are confounders. With random assignment, we eliminate confounders, all confounders, even the ones we haven’t thought of, and that is the power of the experiment.
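The argument about confounding can be made concrete with a small simulation. What follows is a minimal sketch, not a reconstruction of the actual vaccine studies: the function name `risk_difference` and every number in it (the share of higher-income families, the vaccination rates, the disease risks, the size of the vaccine effect) are invented for illustration. It compares disease rates between vaccinated and unvaccinated people twice, once when higher-income families choose the vaccine more often and once when a coin flip decides.

```python
import random


def risk_difference(randomized, n=200_000, seed=1):
    """Simulate n people and return (disease rate among the vaccinated)
    minus (disease rate among the unvaccinated).

    All rates below are invented for illustration; they are not
    historical polio figures.
    """
    rng = random.Random(seed)
    true_effect = -0.018          # assumed: vaccine lowers risk by 1.8 points
    cases = {True: 0, False: 0}   # disease counts, keyed by "vaccinated?"
    totals = {True: 0, False: 0}  # group sizes, keyed by "vaccinated?"

    for _ in range(n):
        high_income = rng.random() < 0.5
        if randomized:
            # Random assignment: a coin flip, unrelated to income.
            vaccinated = rng.random() < 0.5
        else:
            # Self-selection: higher-income families vaccinate more often.
            vaccinated = rng.random() < (0.8 if high_income else 0.2)

        # Confounder: income also raises the baseline risk of disease.
        risk = 0.05 if high_income else 0.02
        if vaccinated:
            risk += true_effect

        totals[vaccinated] += 1
        cases[vaccinated] += rng.random() < risk

    return cases[True] / totals[True] - cases[False] / totals[False]


if __name__ == "__main__":
    print("non-randomized comparison:", round(risk_difference(False), 4))
    print("randomized comparison:    ", round(risk_difference(True), 4))
```

Under these assumed numbers, the non-randomized comparison should come out close to zero even though the vaccine truly lowers risk, because the vaccinated group is mostly higher-income and therefore starts at higher risk. The randomized comparison should land near the assumed true effect, since the coin flip balances income across the two groups.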
Now, this leads to a series of questions and answers that really form the basis of this paper, and I will go through them rather quickly here. I’ve got ten of them, actually.
The first one is: Am I then saying that only studies that use random assignment are scientific? The answer is no, I’m not saying that.
First, a randomized trial is relevant only when there’s a causal question on the table. There are many terrifically important questions for educational policy that are not causal.
For example, this seems so simple, but have high school graduation rates changed over the past ten years? Which kinds of kids in which kinds of cities and states and in which kinds of schools are at highest risk of dropping out? Tremendously important for policy to know the answer to that question. It is not a causal question. We need a carefully designed survey to answer that question.
So, not all questions are causal. But, secondly, even when a question is causal, it may be impossible to do a randomized study. Another analogy with medicine: researchers have come to a strong consensus that smoking causes lung cancer, but we never had a clinical trial where we randomly assigned people to smoke two packs a day. Yet we had a variety of scientific inquiries that led to a strong conclusion. We need to know how family conflict affects school achievement, but can you imagine the experiment that would test that causal hypothesis? (Laughter.)
Third, randomized experiments sometimes create artificial circumstances that limit the generalizability of their findings. I won’t go into detail, but sometimes you need corroborating evidence from studies in a natural setting that aren’t randomized. The randomized evidence might be crucial, but you need to supplement it to see whether a new program works in a less controlled setting.
The second question is: Suppose we do have a causal question, how do I then judge the scientific quality of a study that doesn’t use random assignment? I guess what I would say here is that at the heart of all science is an obligation of the researcher to systematically and painstakingly evaluate alternative explanations for any finding of interest.
So, if I see a study over here where these kids had a new writing program and these kids didn’t, and these kids, the kids in the writing program, are doing better than the ones who didn’t, I don’t just say, “That shows the writing program is good.” I think about other explanations for why that might have happened and I evaluate them. It’s harder to do when you don’t have a randomized experiment, but it is still essential.
So, a scientist is expected to search for disconfirming evidence, and that’s a crucial feature.
Even if we did a randomized experiment, let’s say we did the writing study, we randomly assigned kids to do the writing program or not, we’d still need to develop alternative explanations for why the program worked. The experiment might tell us that the program works. But we want to go further to know what are the crucial ingredients because that may be very helpful to practitioners and policy.
So, even in the randomized context we need to search for explanations, alternative explanations, disconfirming evidence.
Moreover, randomized experiments are never perfectly implemented. So, people who drop out of the study, you’ll have missing data in the two groups. We still have to worry about subtle or not so subtle biases.
So, what makes a causal comparative study scientific, then, is not simply whether there was random assignment, but whether the investigators have effectively, critically evaluated competing explanations for what was found.
That leads to my third question: Isn’t it a little bit Pollyanna-ish to expect this scientist, this investigator (me, let’s say) to police myself, when I’m a human being with biases and I’m supposed to evaluate all these things? Well, the key point here is that the burden of objectivity does not fall entirely or even primarily on the shoulders of the individual investigator.
The role of the scientific community is key. It’s a healthy scientific community who can—and this relates to democracy, being able to freely evaluate alternative points of view, not feel that there’s going to be some censorship.
The people who are committed to the principles I just mentioned who evaluate this, the process of objectivity really involves this group of people engaging in this ongoing debate. Scientists, as was mentioned, are trained to be skeptical and that process can really work. What’s really in the final analysis scientific is what the community of scientists says is scientific.
How am I doing for time?
MS. NEUMAN: You’re doing okay.
MR. RAUDENBUSH: So, now, so far: if we have a causal question, we’d like to do a randomized experiment. We may not be able to; if we can’t, we’ll do it as scientifically as we can. And then sometimes we don’t have causal questions.
This kind of takes us back to a prior question: Is it really possible to do randomized experiments in education? I would argue, yes.
Take the Tennessee class size study, which, by the way, Frederick Mosteller called the most important educational study in decades: an amazing statewide randomized experiment to evaluate the impact of large versus small classes. I’m sure you’re going to hear about some more of them today, actually, a little bit later in the next session, if I don’t talk too long.
Thomas Cook has done two randomized experimental evaluations of the James Comer whole school reform program. There have been many randomized experiments in schools on the effectiveness of drug prevention programs, not as many though on instruction which is interesting.
So certainly they can be done. The fifth question then is: How can we do them ethically? In the paper, I sketch some scenarios where we can very ethically, very practically, very feasibly do large scale experiments.
Often, what will be randomly assigned to treatments though will not be children. It may, in fact, be schools. Imagine a popular program, I mention “Success for All” simply because as an early literacy program, it’s a program that has—there are over a thousand schools already in it. Many schools want to get into it, but it’s expensive. So, a lot of people want to get it, but they don’t get it. And also the people who run that program can only implement it in so many schools in any given year.
We could run an experiment where we asked people to sign up who want to do it, perhaps give it to them free or at a reduced cost and just say there’s only one condition, we can’t give it to you all at the same time. We’re going to have a lottery that’s going to determine who gets it first which is a very fair way of deciding who gets it first.
So, during that interim period where one group of schools has started to do the program and the others are still waiting, you have a randomized experiment, and a very ethically organized one. That’s just one example. There are other ways.
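The lottery design described here is simple enough to sketch in a few lines. This is only an illustration under stated assumptions: the function name `lottery_assign`, the school names, and the fixed seed are all hypothetical, and a real study would need a documented, auditable randomization procedure.

```python
import random


def lottery_assign(schools, n_first, seed=None):
    """Randomly split applicant schools into a group that starts the
    program now and a group that waits (and serves as the comparison)."""
    rng = random.Random(seed)
    order = list(schools)   # copy, so the caller's list is left untouched
    rng.shuffle(order)      # every ordering is equally likely
    return order[:n_first], order[n_first:]


if __name__ == "__main__":
    # Hypothetical applicant schools that all signed up for the program.
    applicants = [f"School {i}" for i in range(1, 11)]
    start_now, wait_list = lottery_assign(applicants, n_first=5, seed=2002)
    print("Start this year:", start_now)
    print("Start later:    ", wait_list)
```

Because chance alone decides which schools start first, the two groups should be comparable on average during the interim period, including on characteristics nobody thought to measure.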
We need to learn how to do this. People didn’t think you could do it in medicine. Like I said, the Salk vaccine trial was incredible, the double blind experiment. We need to be able to make the argument and we need to learn how to do this stuff.
Number six. I mentioned that not all scientific questions in education are causal, and can I give you a few examples? I’m not going to give you too many. But I do want to mention that we may not have been doing such a good job in education of doing impact studies, causal comparative studies, what works. We need to do a lot more of that.
We’ve done a pretty good job of doing scientific surveys, though. Large scale, national longitudinal studies, tremendous amounts of learning have come out of those studies. And, I’m on the—I’m going to toot the horn of—the AERA National Science Foundation grants committee which has given out small amounts of money to large numbers of young investigators. We have a report that shows hundreds of terrific scientific contributions coming out of that, but generally not of the strong causal character because it’s really based in fact on survey research.
So, we have done pretty well there. I won’t go into the examples in the interest of time, but there are lots of them.
Number seven: How are the best non-causal studies judged? There is this class. We can’t just forget about the fact that a lot of the scientific research is not causal. So, we have a bunch of questions: How did we select the sample? Do they represent a population? How do we measure the key constructs? Is there an established reliability and validity to those constructs? Was the analysis done accurately? Were alternative explanations painstakingly assessed?
Those are some principles. But, once again, the key point is in the final analysis it’s scientific peer review that applies those principles in a case by case way to evaluate the credibility of the findings.
So, number eight. I’ve only mentioned quantitative research. Does qualitative research play a role? I would say, yes, without doubt. Because we need to not just test the impact of things out in the field, we need to do a lot more of that. We haven’t done enough. But we have to have good things to take into the field. We have to have good ideas about how to teach math, how to teach reading. Those ideas come from up close, careful study of expert practitioners in real settings and how kids learn. So, we need that up close kind of research but see we’ve got to do a better job of connecting that research with field trials of what works, and that’s what’s really been missing.
Number nine. I ask: How do you combine insights? If I’ve said you have to have experiments and you have to have surveys and you have to have qualitative research, how do you combine the insights from the different kinds of inquiry? I hate to go back to a medical example, but it’s a very telling one. It’s the causal relationship between smoking and lung cancer.
As I said, you couldn’t do an experiment to make people smoke two packs a day, but you could do a randomized experiment on animals. Strong causal inference, but generalizability to humans? Then, we do good non-experiments, or quasi-experiments, or at least comparisons between smokers and nonsmokers using the best possible survey methods, and qualitative research.
Here the analogy is looking at lung tissue and finding out that the lung tissue of smokers is damaged in ways that we might think would be linked to cancer. You put them all together and the weight of evidence, the experimental evidence on the animals, the survey evidence on people and the lung tissue—qualitative, put them together and you get a very compelling case.
We need to do that better, and that’s going to require a very effective and active scientific community.
My tenth and final question is: Is there any danger here that we are going to be overselling the role of science in education? I think there is.
I’ve got a quote here from E. L. Thorndike who wrote the lead article in the founding edition of the Journal of Educational Psychology in 1910. I won’t read the entire quote except to say that Thorndike felt that a scientific psychology was about to produce decisive evidence on virtually every practical question that arises in education. We know in retrospect that he was wrong. Unfortunately, by overselling what science can do, it led to a crisis of, you might say, rising expectations that couldn’t be met. For a long time thereafter science in education fell into disarray.
The same thing happened in the ’60s with scientific problem solving, the idea that we would have kind of a social engineering model. We’d try programs, we’d evaluate them, we’d get feedback, the programs would get better, and the Great Society was going to be born out of this sort of scientific and engineering model. That was an overselling. We couldn’t really pull that off.
So, let’s make sure that we have a balanced view this time. I am so excited that we have an opportunity to do it, to do it right without overselling it this time. I am delighted to have had the chance to be here because I think we’re at a point in history where there seems to be for some reason a confluence of factors and the determination of people who have some power here who organized this, to really improve the quality of research in education and the link between science and education and practice.
Thank you very much.
MS. NEUMAN: A wonderfully wise man.
I know that we went a little bit longer. What we’ll do is, I think, take a break and then come back at quarter to, and then what we’ll do is we’ll combine the two discussion sessions, since I really do want time for questions.
Our next set will be more practical implications in terms of our programs.
Have a good break. There’s coffee in the cafeteria, a good Starbucks across the street.
(Whereupon, the foregoing matter went off the record at 10:24 a.m. and went back on the record at 10:44 a.m.)
MS. NEUMAN: We’re going to get started again.
Let me tell you that all the talks are going to be on the web, as well as in print. I know I forced people to rush through their presentations. The more complete presentation of each will be available to you immediately on the web, and then, after a little bit longer period, in print.
First, before we begin our sessions, I’m just delighted to introduce Linda Wilson. She’s the deputy in OIIA, the Office of Intergovernmental and Interagency Affairs. Did I do that right?
MS. LINDA WILSON: Yes, exactly.
MS. NEUMAN: Good.
MS. WILSON: Hi, I just wanted to make a very quick notice. The department tomorrow is going to be releasing publicly a draft of the strategic plan. It will be on our website. It communicates the President’s and Secretary’s priorities for education over the next five years. It has very strong accountability, much like the No Child Left Behind Act, and it will guide our work here at the department.
It sets high expectations for us and it provides leadership to the nation’s educational system. It’s built on six strategic goals, which are: create a culture of achievement, improve student achievement, develop safe schools and strong character, transform education into an evidence-based field, enhance the quality of and access to postsecondary and adult education, and establish management excellence.
The plan will not be nor should it be a trophy to hang on the wall. It’s a living document that will guide the course of our work here through the next five years.
Secretary Paige is very committed to this. He has announced his intention to hold each Department of Education program, office and employee accountable for their responsibilities for implementing this plan.
The reason I am telling you this is because we would welcome your input to this process. As I said, it’s going to be available on the web tomorrow. Your comments we would need by 5:30 p.m. on Thursday, February 21st.
MS. NEUMAN: Thank you very much, Linda.
Now, we turn to implications: What are the implications of a scientifically based research approach for our programs, so many of our programs that are going out to children?
I’m asking each of these presentations to be real brief because I really want to give you opportunity to ask questions and make comments.