Simple Design of Experiments – Analysis of Variance (Anova)
So I have an upcoming engineering project I’m working on… I’m trying to optimize an unusual powered propulsion system. I’m still working on an iOS / Android app to take detailed response data, but that’s another story. Right now I’m wondering how I’m going to do the statistical analysis for the testing when I begin to accumulate results. In the old days, I used Minitab. I can remember doing a whole lot of Gage R&R (Repeatability and Reproducibility) studies using that tool. A Gage R&R study is really an ANOVA experiment which quantifies measurement error in a system. (Can you measure the thickness of a piece of paper using a wooden 12″ ruler? Of course not, etc…) Minitab is an awesome software program, but holy moly is it expensive. Ouch. I’m wondering if, in this day and age, there is an open source alternative. And that’s what this posting is all about. I thought I’d build an experiment, and then attempt to analyse it with open source (read that as “free”) software.
First off. I wanted an experiment that would be cheap and easy to do. Something that would be easy to understand. Something I (or anybody else) could test easily. And here’s what I came up with.
The Coin Drop Test
- Factors (each factor will have two levels):
  - Size of Coin (Dime or Quarter)
  - Height of drop (60″ or 30″)
  - Coin Release Orientation (horizontal or on edge)
  - Drop Hand Technique (finger/thumb or two fingers)
  - Target Surface (exercise mat vs deep pile carpet)
- Response:
  - Distance from dropped coin to target center (C to C, inches)
The target is a piece of masking tape with a simple ‘X’ marked on it. For the target surface I used two different floor surfaces. My initial thought was that the deep pile carpet would keep coins from bouncing too far away from the target, relative to the exercise mat. I used two different easily available coins: a quarter and a dime. I would have thought the heavier quarter might travel less than the dime. Coin release orientation? My initial thought was that a coin dropped “flat” would remain close to the target. Obviously I knew there would be a lot of variation in the test results. I wanted to see where an experiment with lots of variability might go. I designed a full factorial experiment (2^5, or 32 factor combinations). In fact I ran each combination four times for a total of 128 drops. I was very careful to randomize the tests. I used a simple random number generator, then re-sorted the test criteria in an XL sheet.
A couple of observations, notes: It would be best to decide before you start how accurate you want to be in your measurement. Round off to nearest inch? Nearest half inch? Or find a metric tape measure and measure in mm or cm? I started with nearest 1/4″ and that was probably not necessary. There was a lot of variability in the drop technique. Quite a few times, when using the two finger, coin on edge method with a dime, the dime would stick to my fingers, delaying the drop. It’s funny, but by paying close attention to the test, you can sometimes spot unexpected trends, something you want to test in the next round of analysis.
And if you want to repeat or modify this experiment, here is the raw data! Do note, there are two columns there to aid in the randomization of the experimental design. Check out the columns original_order and random_order. Sort by one column or the other as necessary. In original order it’s pretty easy to see how this full factorial experiment was set up.
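If you would rather build and shuffle the run sheet in R than in a spreadsheet, here is a minimal sketch of one way to do it. The factor column names match the ones used in the analysis later in this post, and original_order / random_order mirror the two helper columns in the raw data. The level labels, the replicate column, and the seed are my own assumptions for illustration, not copied from the data file.

```r
# Sketch: 2^5 full factorial, 4 replicates, randomized run order.
# Level labels and the seed are illustrative assumptions.
set.seed(42)  # arbitrary seed so the shuffle can be reproduced

design <- expand.grid(
  coin_size        = c("dime", "quarter"),
  drop_height      = c(30, 60),
  drop_orientation = c("horizontal", "edge"),
  hand_technique   = c("finger_thumb", "two_fingers"),
  target_surface   = c("mat", "carpet"),
  replicate        = 1:4
)

design$original_order <- seq_len(nrow(design))   # systematic (easy to read) order
design$random_order   <- sample(nrow(design))    # random permutation of 1..128

# Sort by random_order to get the sequence in which to actually drop coins
run_sheet <- design[order(design$random_order), ]
nrow(run_sheet)
```

Sorting run_sheet by original_order puts it back in the systematic layout, which is handy for checking that every combination appears four times.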
And what makes the design of experiment / ANOVA so awesome, is you are not making predictions about results… but instead merely observing what happens. You may believe something is true, but this is a way to prove it (or not!) This is an experiment, used to help identify possible further opportunities to improve desired response. You may not know WHY factor A is better than factor B, but in observation something is measurably different between those two factors.
Analysis of Variance (ANOVA) — What is it?
To determine whether a difference in results is due to random chance or to a statistically significant difference between factor levels, an ANOVA F-test is performed. The F-test is a tool used by statisticians to decide whether differences between test observations are plausibly just random variation, or reflect a true difference in output based on which input (factor level) is in use. For this experiment, the ANOVA F-tests use the following null hypotheses:
- H(0): Coin size will have no significant effect on distance to target after coin drop.
- H(0): Drop height will have no significant effect on distance to target after coin drop.
- H(0): Coin release orientation will have no significant effect on distance to target after coin drop.
- H(0): Drop hand/finger technique will have no significant effect on distance to target after coin drop.
- H(0): Target surface will have no significant effect on distance to target after coin drop.
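To make the F-test a little less of a black box, here is a small sketch that computes the F statistic by hand for a single two-level factor and cross-checks it against R's aov. The data are simulated (made-up means and spread), not the coin drop measurements, and the variable names are just for illustration.

```r
# Simulated data only: 64 drops at each of two heights (values are invented).
set.seed(1)
height   <- factor(rep(c("30", "60"), each = 64))
distance <- c(rnorm(64, mean = 6, sd = 3), rnorm(64, mean = 8, sd = 3))

grand_mean  <- mean(distance)
group_means <- tapply(distance, height, mean)
group_n     <- tapply(distance, height, length)

# Variation between the group means vs. variation within each group
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((distance - ave(distance, height))^2)

df_between <- nlevels(height) - 1
df_within  <- length(distance) - nlevels(height)

# F is the ratio of the two mean squares
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Cross-check against the built-in one-way ANOVA
fit <- summary(aov(distance ~ height))[[1]]
c(by_hand = f_stat, aov = fit[["F value"]][1])
```

A large F means the group means differ by more than the scatter within the groups would explain, and the p-value is the upper tail of the F distribution at that value.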
Analysis is performed on all the factors one at a time, and then in combination, in a decent software package. You could program this yourself (in XL?) but it’s pretty easy to make a mistake, and not generally recommended. Note: for a nice discussion of the details in how such a program would work, I found this analysis pretty helpful:
And that led me to start looking at open source software packages that might work. I looked at a whole lot of things. I discovered a whole lot of paid applications. Many of those included a 30 day free trial, but I’m really looking for a long term sustainable solution. I ended up looking very closely at two different packages, PSPP and R. A very common package in use is the paid program Statistical Package for the Social Sciences (SPSS). It’s a very nice program from the folks at IBM, but definitely not inexpensive. GNU PSPP is a program for statistical analysis of sampled data. It is intended as a free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions. Actually there are quite a few exceptions. You can run an F-test analysis on a single factor, but I was unable to run a complete analysis on a multifactor experiment. Perhaps I was just doing it wrong, but that just didn’t work for me.
Instead I discovered the R project. Yowza, we have a winner. It’s not as intuitive as my favorite tool, Minitab, but with a bit of effort you can get some decent results…
First, download R, then run it. I’m using the R-GUI. In the console paste the following:
datafilename="C:/Users/Username/DirectoryX/dfd_coin_drop_experiment.csv" #tell where the data come from
data.ex1=read.csv(datafilename,header=TRUE, sep = ",") #read the data into a table
aov.ex1 = aov(distance_from_target~coin_size*drop_height*drop_orientation*hand_technique*target_surface,data=data.ex1) #do the analysis of variance
summary(aov.ex1) #show the summary table
The magic happens here. That last column, Pr(>F), is the p-value: the probability of seeing an F value at least this large if the stated null hypothesis were true. It is not the probability that the null hypothesis itself is true. If that probability is low, then we reject the null hypothesis as stated. For this test, we are going to use a 95% confidence level (a significance level of 0.05). Large values of Pr(>F) mean we cannot reject the null hypothesis: even though there may be differences in the calculated means, the variability in the system is large enough that those differences could easily be chance. For example: H(0): Coin size will have no significant effect on distance to target after coin drop. Pr(>F) = 0.77268, so we do not reject it. It’s only when those probabilities are small that we have evidence the factor has a real effect on the response. If Pr(>F) is less than 0.05, the term is considered statistically significant. The summary table from R includes handy significance codes (asterisks) next to terms that may be significant.
As you scan the ANOVA results, you can see that the following terms are significant (Pr(>F) less than 0.05):
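With 31 rows in the summary table it is easy to miss one, so here is a hedged sketch of pulling out the rows with Pr(>F) below 0.05 programmatically. To keep it self-contained it uses a simulated three-factor stand-in with an invented drop height effect, not the real data; the same few lines should work on aov.ex1 from above.

```r
# Simulated stand-in for the coin drop data; the height effect is invented.
set.seed(7)
sim <- expand.grid(
  coin_size        = c("dime", "quarter"),
  drop_height      = c("30", "60"),
  drop_orientation = c("horizontal", "edge"),
  replicate        = 1:8
)
sim$distance_from_target <- rnorm(nrow(sim), mean = 6, sd = 2) +
  ifelse(sim$drop_height == "60", 3, 0)

fit <- aov(distance_from_target ~ coin_size * drop_height * drop_orientation,
           data = sim)
tab <- summary(fit)[[1]]            # the ANOVA table as a data frame

term_names <- trimws(rownames(tab)) # row names are padded with spaces
p          <- tab[["Pr(>F)"]]       # NA for the Residuals row
keep       <- !is.na(p) & p < 0.05

data.frame(term = term_names[keep], p_value = signif(p[keep], 3))
```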
- drop height
- an interaction between coin_size:drop_orientation
- an interaction between drop_height:drop_orientation
- an interaction between drop_orientation:hand_technique
- an interaction between drop_orientation:hand_technique:target_surface
Do note that in most experiments, significant two-factor interactions are pretty rare. Generally we are mostly concerned with single factor (main) effects. That the factor of drop height is significant seems pretty intuitive. Coins dropped from the lower height ended up closer to the target than coins dropped from the greater height. Frankly I was surprised at the interaction terms here. I suspect there is just a whole lot of variability in what’s going on. And I think some changes should be made to the tested factors, and the experiment re-run. I was also surprised by what did not show up. I was pretty sure we’d see coins closer to the target from the deep pile carpet than the mat, but that’s not what the results reveal. I wonder what would happen if I poured a half inch of beach sand on the carpet and reran the test? (Oh, I know the answer to that one, even without testing; my wife would kick me in the butt, and toss me out of the house.)
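One reason to be cautious about the surprise interactions: a full 2^5 model tests a lot of terms at once. The back-of-envelope sketch below counts them and estimates how many would cross the 0.05 line by chance alone. It treats the tests as independent, which is a simplifying assumption on my part, and it applies a Bonferroni cutoff to the five p-values reported in this post purely as an illustration.

```r
n_factors <- 5
n_terms   <- 2^n_factors - 1   # 5 main effects plus all interactions = 31
alpha     <- 0.05

# If no factor mattered at all, how many "significant" terms would we expect?
expected_false_positives <- n_terms * alpha            # 1.55
# Chance of at least one false alarm (assuming independent tests)
p_at_least_one <- 1 - (1 - alpha)^n_terms              # roughly 0.8

# p-values reported in this post for the five flagged terms
flagged <- c(
  "drop_height"                                    = 0.00226,
  "drop_height:drop_orientation"                   = 0.01241,
  "drop_orientation:hand_technique"                = 0.02917,
  "coin_size:drop_orientation"                     = 0.03362,
  "drop_orientation:hand_technique:target_surface" = 0.03734
)

bonferroni_cutoff <- alpha / n_terms
flagged < bonferroni_cutoff
```

None of the five clears the Bonferroni cutoff of about 0.0016, although drop height comes closest. Bonferroni is deliberately conservative, so this does not show the effects are absent; it just supports re-running the experiment before trusting the weaker interactions.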
And let’s quantify the means of each test factor and combinations:
print(model.tables(aov.ex1,"means"),digits=2) #report the means and the number of subjects/cell
Finally let’s plot our key factor results. Note that in the following plots, the thick black line represents the median value, and the colored box spans the 25th to 75th percentiles (the middle half of the data).
Plot — Drop Height
boxplot(distance_from_target~drop_height,main="Coin Drop Test", xlab="Drop Height (inches)", ylab="Distance from Target Center", col=rainbow(7),data=data.ex1)
[Pr(>F) = 0.00226] It’s pretty easy to see here that the accuracy to target is better with a low (30″) drop height. Additionally, variability is smaller with the 30 inch drop height compared to the 60 inch drop height.
Plot — Drop Height : Drop Orientation Interaction
[Pr(>F) = 0.01241]
boxplot(distance_from_target~drop_height:drop_orientation,main="Coin Drop Test", xlab="Drop Height (inches): Coin Orientation", ylab="Distance from Target Center", col=rainbow(7),data=data.ex1)
Look closely at this graph. You can see why coin orientation all by itself isn’t a significant factor. You can also see why the interaction works the way it does. Was this expected? No way. The data here is observed. But according to the test results, it’s a statistically significant effect, given those two factors defined in that way. Obviously if you want to minimize coin distance to target, you’d run your system with the coin held on edge, dropped from 30 inches for best results. I will say, when I start to see interactions like these my tendency is to really analyse the system, try to figure out what causes these results, and adjust (or add) factors to further improve performance.
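Boxplots of combined factor levels work, but an interaction is often easier to read as a table of cell means plus an interaction plot, where non-parallel lines are the giveaway. Here is a minimal sketch using base R's interaction.plot. The data are simulated with an invented crossing effect (on edge helps at 30 inches and hurts at 60), so the numbers are not the coin drop results; swap in data.ex1 to try it on the real thing.

```r
# Simulated data with an invented height-by-orientation interaction.
set.seed(3)
sim <- expand.grid(
  drop_height      = c("30", "60"),
  drop_orientation = c("horizontal", "edge"),
  replicate        = 1:16
)
shift <- with(sim, ifelse(drop_orientation == "edge",
                          ifelse(drop_height == "30", -3, 3), 0))
sim$distance_from_target <- rnorm(nrow(sim), mean = 8, sd = 1.5) + shift

# Cell means: rows are drop heights, columns are orientations
cell_means <- with(sim, tapply(distance_from_target,
                               list(drop_height, drop_orientation), mean))
print(round(cell_means, 2))

# Lines that cross or diverge indicate an interaction
with(sim, interaction.plot(drop_height, drop_orientation, distance_from_target,
                           xlab = "Drop Height (inches)",
                           ylab = "Mean Distance from Target Center",
                           trace.label = "Orientation"))
```

Note that this plots means, while the boxplots in this post show medians and quartiles, so the two views can differ a little when the data are skewed.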
Plot — Drop Orientation : Hand Technique Interaction
[Pr(>F) = 0.02917]
boxplot(distance_from_target~drop_orientation:hand_technique ,main="Coin Drop Test", xlab="Drop Orientation: Hand Technique", ylab="Distance from Target Center", col=rainbow(7),data=data.ex1)
Wait, er, what the heck? Look at this graph and the one above it. The results here sort of conflict with the results above. Above, the solution was to hold the coin on edge, but here the optimal solution is to hold the coin horizontal. Remember, I said interactions aren’t all that common. Again, you may have to adjust and/or add additional factors to improve performance. There may well be a better way to optimize the system’s design.
Plot — ETC…
Obviously you can continue to plot out all factors with Pr(>F) less than 0.05…
- coin_size:drop_orientation Pr(>F) = 0.03362
- drop_orientation:hand_technique:target_surface Pr(>F) = 0.03734
Plot — One factor that wasn’t significant…Target Surface
[Pr(>F) = 0.11383] This was kind of a surprise. I fully expected deep pile carpet to improve coin to target performance. It didn’t have a significant effect at the 95% confidence level. With the median values and variability in these results, no advantage here.
boxplot(distance_from_target~target_surface,main="Coin Drop Test", xlab="Target Surface", ylab="Distance from Target Center", col=rainbow(7),data=data.ex1)
And that concludes our exercise. I’m hoping this example makes sense. My goal was to pick a test example that intuitively gave the user a feel for what’s going on. Did the results meet your expectations? And as for open source software, it seems like R is a winner for our analysis.