/* JOIN-COUNT PERMUTATION PROGRAM written by Patrick Stevens. Please cite this program with the following paper: Patrick H. Stevens & David G. Jenkins. 2000. Analyzing species distributions among temporary ponds with a permutation test approach to the join-count statistic. Aquatic Ecology 34:91-99. */ /*The following program does 2 things: (a) it performs a test for random locations of species in a given set of n sites using an asymptotic normal distribution and (b) it performs a permutation test without assuming any distribution. The data already entered in the program are those of Stevens and Jenkins (2000). You should first run the program as shown, and the output should be similar to that shown in Stevens and Jenkins (2000) - though slight variation may occur as a result of random permutation outcomes. Then you can follow the Instructions to replace the existing data with your own.*/ /*THE DATA ENTRY PORTION*/ options ls=75; data sites; /* SEE INSTRUCTION #1 FOR THIS SECTION*/ /*In the following line, the site names and the presence/absence of the species should be entered. There are currently variables for 18 species. If you have less, remove the ones you don't need. If you have more, add variables ...spec19 spec20 spec21 spec22...etc for as many more species as you need to analyze*/ input site$ spec1 spec2 spec3 spec4 spec5 spec6 spec7 spec8 spec9 spec10 spec11 spec12 spec13 spec14 spec15 spec16 spec17 @@; /* END OF INSTRUCTION #1 SECTION*/ /* SEE INSTRUCTION #2 FOR THIS SECTION*/ /*This data represents event occurrence and non-occurrence at any time during the sampling period. A4, B1 etc represent site names. You can choose any arbitrary site names that you desire. 1=species present, 0=species not present.*/ cards; a4 1 1 o 1 1 0 0 0 1 0 0 0 0 0 0 1 1 b1 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 0 d4 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 d7 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 d9 1 1 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0 e6 0 1 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 h1a 1 1 1 1 0 0 1 1 1 1 0 0 0 0 1 0 1 h1b 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 1 0 h1c 1 1 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 i5 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 i12 1 1 0 1 1 0 1 0 1 0 1 0 1 0 0 1 1 j3 0 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 l4 0 1 0 1 0 0 1 1 1 0 0 0 0 0 1 0 0 l6 0 1 0 1 0 0 1 1 1 1 1 0 1 0 1 0 0 m6 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 ; /* END OF INSTRUCTION #2 SECTION*/ data geo; /*The following data set represents the site-to-site distances. This example uses Euclidian distance in meters. You may use any measurement method you would like. Some of the other included examples use simple binary connections (1=connected, 0=not connected). You should have a 'd' variable for each site analyzed. For example, if you have 20 sites, then you need d1 through d20. If you only have 6 sites, then you need to remove d7 through d15.*/ /* SEE INSTRUCTION # 3 FOR THIS SECTION*/ input d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15 @@; cards; 0 116 98 114 152 98 200 210 188 221 233 310 305 326 319 116 0 129 222 266 187 185 207 182 269 256 345 358 378 419 98 129 0 132 182 80 103 112 90 141 140 224 230 251 302 114 222 132 0 51 54 219 216 204 168 206 256 233 252 205 152 266 182 51 0 104 269 264 253 203 248 287 255 272 182 98 187 80 54 104 0 166 164 150 132 161 224 211 231 232 200 185 103 219 269 166 0 25 16 130 88 175 206 223 343 210 207 112 216 264 164 25 0 25 109 64 150 182 198 326 188 182 90 204 253 150 16 25 0 119 82 171 198 216 329 221 269 141 168 203 132 130 109 119 0 56 91 89 110 220 233 256 140 206 248 161 88 64 82 56 0 89 118 135 275 310 345 224 256 287 224 175 150 171 91 89 0 56 61 253 305 358 230 233 255 211 206 182 198 59 118 56 0 21 199 326 378 251 252 272 231 223 198 216 110 135 61 21 0 204 319 419 302 205 182 232 343 326 329 220 275 253 199 204 0 ; /* END OF INSTRUCTION #3 SECTION*/ proc iml; /*The rowname matrix below allows you to enter your species name or arbitrary names. To use your own, just place the desired names inside ' ' . There should be 1 name for each of the spec variables entered above. For example: if you have spec1 through spec24, then you need 24 names.*/ /* SEE INSTRUCTION #4 FOR THIS SECTION*/ rowname= ('Attheyella sp.')// ('Canthocamptus sp.')// ('Cryptocyclops bicolor')// ('Cyclops navus')// ('Cyclops nearcticus')// ('Cyclops haueri')// ('Cypridopsis sp. ('Cyprois sp.')// ('Daphnia obtusa')// ('Eubranchipus serratus')// ('Lynceus brachyurus')// ('Onychodiaptomus sanguineus')// ('Osphranticum labronectum')// ('Pleuroxus striatus')// ('Scapholebris mucronata')// ('Simocephalus exspinosus')// ('Simocephalus serrulatus')// ; /* END OF INSTRUCTION #4 SECTION*/ reset fuzz; use sites; /* sites refers to data set sites*/ /* SEE INSTRUCTION #5 FOR THIS SECTION*/ /*The next statement has to have as many spec variables as you entered previously. For example, if you have spec1 through spec25 (25 species) then you need to add spec19 through spec25 at the end of spec18. Remember to follow the format exactly.*/ read all var { spec1 spec2 spec3 spec4 spec5 spec6 spec7 spec8 spec9 spec10 spec11 spec12 spec13 spec14 spec15 spec16 spec17 spec18} into ymat; /* END OF INSTRUCITON #5 SECTION*/ nsites= nrow(ymat); /*number of sites*/ nspecies= ncol(ymat); /*number of species*/ nsub=nsites-1; /* SEE INSTRUCTION #6 FOR THIS SECTION*/ nrep = 100; /*nrep should be set as high as possible considering your computers capabilities. The higher the number the longer the program will take to run. Setting nrep equal to 2000 required approximately 2 minutes to run. This also depends on the number of species and number of sites that you have. The larger these numbers are, the longer the program will take to run.*/ /* END OF INSTRUCTION #6 SECTION*/ use geo; /* SEE INSTRUCTION #7 FOR THIS SECTION*/ /*There should be a 'd' variable for each site. So if you have 20 sites, then you need to add d13 to d20. Follow the format exactly.*/ read all var {d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12} into d; /* END OF INSTRUCTION #7 SECTION*/ /*THE STANDARD NORMAL TEST PORTION OF THE PROGRAM*/ s0 = sum(d); /*The following procedure calculates BW per Cliff & Ord (1981) pg. 13*/ jtotbw = j(nspecies,1,0); do k = 1 to nspecies; y = j(nsites,nsites,0); do j = 1 to nsites; m = ymat[j,k]; do i = 1 to nsites; if ymat[i,k] ^= m then y[i,j]=1; end; end; step2 = y#d; jtotbw[k] = .5*sum(step2); end; /*The following procedure calculates BB per Cliff & Ord (1981) pg. 13*/ jtotbb = j(nspecies,1,0); do k = 1 to nspecies; y = j(nsites,nsites,0); do j = 1 to nsites; do i = 1 to nsites; if ymat[i,k]+ymat[j,k] = 2 then y[i,j]=1; end; end; step2 = y#d; jtotbb[k] = .5*sum(step2); end; /*The following procedure calculates S1 per Cliff & Ord (1981) pg 19*/ s = 0; s1a = 0; do i = 1 to nsites; s1a = s1a + (d[i,j]+d[j,i])**2; end; s1 = .5*s1a; /*wid=w.i and wjd=wi. both used to calculate S2, Cliff & Ord (1981) pg 19*/ wid = j(nsites,1,0); do i = 1 to nsites; wi = 0; do j = 1 to nsites; end; wid[i]=wi; end; wjd = j(nsites,1,0); do i = 1 to nsites; wj = 0; do j = 1 to nsites; end; wjd[i] = wj; end; s2 = ssq(wjd + wid); /*usnonexp calculates the expected values for the join counts when using nonfree sampling (hypergeometric) for BB joins. osnon calculates the variance for the same.*/ usnonexp = j(nsub,3,0); n2 = nsites*(nsub); n3 = n2*(nsites-2); n4 = n3*(nsites-3); do nr = 1 to nsub; nr2 = nr*(nr-1); nr3 = nr2*(nr-2); nr4 = nr3*(nr-3); usnonexp[nr,1]=nr; usnon = (s0*nr2)/(2*n2); usnonexp[nr,2] = usnon; osnon = .25*(s1*nr2/n2 + (s2-2*s1)*nr3/n3 + (s0**2+s1-s2)*nr4/n4) - usnon**2; usnonexp[nr,3] = osnon; end; /*udnonexp calculates the expected values for the join counts when using nonfree sampling (hypergeometric) for BW joins. udnon calculates the variance for the same.*/ udnonexp = j(nsub,3,0); do nr = 1 to nsub; nr2 = nr*(nr-1); nr3 = nr2*(nr-2); nr4 = nr3*(nr-3); ns2 = (nsites-nr)*(nsites-nr-1); udnonexp[nr,1]=nr; udnon = (s0*nr*(nsites-nr))/(n2); udnonexp[nr,2]=udnon; odnon = .25*(2*s1*nr*(nsites-nr)/n2 + udnonexp[nr,3]=odnon; end; /*case calculates how many ponds each species was present in.*/ case = j(nspecies,1,0); do i = 1 to nspecies; case[i] = sum(ymat[,i]); end; print 'Total ponds With species present',, rowname case; /*exob gives the expected and observed join counts BB and BW for this system*/ exob = j(nspecies,4,0); do i = 1 to nspecies; j = case[i]; k = nsites-j; if j = nsites then exob[i,1]=0; else exob[i,1] = usnonexp[j,2]; if j = nsites then exob[i,2]=0; else if j = 1 then exob[i,2] = 0; if j=nsites then exob [i,3]=0; else exob[i,3] = udnonexp[j,2]; exob[i,4] = jtotbw[i]; end; /*bwz calculates the z-scores and p-values for testing the BW join counts as standard normal deviates.*/ bwz = j(nspecies,2,0); do i = 1 to nspecies; j = case[i]; if j=nsites then bwz[i,1] = 0; else if j=1 then bwz[i,1]=0; else bwz[i,1] = (jtotbw[i] - udnonexp[j,2])/sqrt(udnonexp[j,3]); bwz[i,2]=1-probnorm(abs(bwz[i,1])); end; /*bbz calculates the z-scores and p-values for testing the BB join counts as standard normal deviates.*/ bbz = j(nspecies,2,0); do i = 1 to nspecies; j = case[i]; if j = nsites then bbz[i,1] = 0; else if j=1 then bbz[i,1]=0; else bbz[i,1] = (jtotbb[i] - usnonexp[j,2])/sqrt(usnonexp[j,3]); bbz[i,2]=1-probnorm(abs(bbz[i,1])); end; /*finalbb and finalbw put the data for the standard normal tests into an easy to read format for printing.*/ finalbb = exob[,1]||exob[,2]||bbz; print 'Join Count Spatial Autocorrelation Statistics-Presence/Presence',, rowname finalbb[colname={'Exp BB' 'Obs BB' 'z-score' 'p-value'}format=f9.3]; finalbw = exob[,3]||exob[,4]||bwz; print 'Join Count Spatial Autocorrelation Statistics-Presence/Absence',, rowname finalbw[colname={'Exp BW' 'Obs BW' 'z-score' 'p-value'}format=f9.3]; /*THE PERMUTATION TEST PORTION OF THE PROGRAM*/ /*b is the number of sites species can be present to display patterns of presence or absence*/ b=nsites-2; /*percentilebb and percentilebw store the cut-off points for the permutation test*/ percentilebb=j(6,b-1,0); percentilebw=j(6,b-1,0); alpha3=.01//.05//.10//.90//.95//.99; do pr=2 to b; ymat=j(nsites,nrep,0); /*This part randomly generates species presence or absence*/ do k= 1 to nrep; test=j(nsites,1,0); do i = 1 to nsites; here: num=int(ranuni(1234567)*nsites)+1; do j = 1 to nsites; if num =test[j] then goto here; end; test[i]=num; end; ymat[,k]=test; end; free test; /*This part assigns values of 0=absent or 1=present to the randomly generated distribution of points*/ do i = 1 to nsites; do j = 1 to nrep; if ymat[i,j]<=pr then ymat[i,j]=1; else ymat[i,j]=0; end; end; jtotbb=j(nrep,1,0); do k=1 to nrep; y=j(nsites,nsites,0); do j = 1 to nsites; do i = 1 to nsites; if ymat[i,k]+ymat[j,k]=2 then y[i,j]=1; end; end; step2=y#d; free y; jtotbb[k]=.5*sum(step2); end; /*The next two lines order all the random bb join counts calculated from smallest to largest*/ s=jtotbb; jtotbb[rank(jtotbb),]=s; do i = 1 to 6; percentilebb[i,pr-1]=jtotbb[alpha3[i]*nrep]; end; end; do pr=2 to b; /*2nd do loop for BW statistic*/ ymat=j(nsites,nrep,0); do k= 1 to nrep; test=j(nsites,1,0); do i = 1 to nsites; therey: num=int(ranuni(1234567)*nsites)+1; do j = 1 to nsites; if num =test[j] then goto therey; end; test[i]=num; end; ymat[,k]=test; end; free test; /*This part assigns values of 0=absent or 1=present to the randomly generated distribution of points*/ do i = 1 to nsites; do j = 1 to nrep; if ymat[i,j]<=pr then ymat[i,j]=1; else ymat[i,j]=0; end; end; /*The following procedure calculates BW per Cliff & Ord (1981) pg. 13*/ jtotbw=j(nrep,1,0); do k=1 to nrep; y=j(nsites,nsites,0); do j = 1 to nsites; point=ymat[j,k]; do i = 1 to nsites; if ymat[i,k]^=point then y[i,j]=1; end; end; step2=y#d; free y; jtotbw[k]=.5*sum(step2); end; s=jtotbw; jtotbw[rank(jtotbw),]=s; do i = 1 to 6; percentilebw[i,pr-1]=jtotbw[alpha3[i]*nrep]; end; end; /* SEE INSTRUCTION #8 FOR THIS SECTION*/ /*This ends 2nd do loop of 'do pr=2 to b'*/ /*In the line beginning 'colname', you must have statements of the form 'Pr=2' through Pr= the number of sites minus 2. For example, if you have 20 sites, then you need 'Pr=2' through 'Pr=18'.*/ print 'Percentiles for BB Join-Count statistic', 'Pr is the number of ponds with the species present.', percentilebb[rowname={'1' 5' '10' '90' '95' '99'} colname={ 'Pr=2' 'Pr=3' 'Pr=4' 'Pr=5' 'Pr=6' 'Pr=7' 'Pr=8' 'Pr=9' 'Pr=10'} format=f9.2]; /* END OF INSTRUCTION #8 SECTION*/ /* SEE INSTRUCTION #9 FOR THIS SECTION*/ /*In the line beginning 'colname', you must have statements of the form 'Pr=2' through Pr= the number of sites minus 2. For example, if you have 20 sites, then you need 'Pr=2' through 'Pr=18'.*/ print 'Percentiles for BW Join-Count Statistic', 'Pr is the number of ponds with the species present.', percentilebw[rowname={'1' 5' '10' '90' '95' '99'} colname={ 'Pr=2' 'Pr=3' 'Pr=4' 'Pr=5' 'Pr=6' 'Pr=7' 'Pr=8' 'Pr=9' 'Pr=10'} format=f9.2]; /* END OF INSTRUCTION #9 SECTION*/ print 'How to interpret percentiles.'; print 'Percentiles are cut off scores for a certain statistic, for example, if the 1st percentile for a BB join-count statistic is 75 then that means that 1% of all the join-count statistics values are below 75. For another example, if the 10th percentile for a BW statistic is 2,345 then that means that 10% of all the BW statistic values is below 2,345. Keep in mind that the report given by this software is based on a permutation test and therefore is subject to random error. So use these data wisely and cautiously. The writers of the program are not responsible for any errors in interpretation.'; /* END OF PROGRAM*/