## FETA (Framework for Evolving Topology Analysis) software

## FETA in use examples

This transcript shows how to interact with FETA. For this example you will need Python, R and the FETA software. An example of FETA in use on real data will follow shortly.

## Artificial test network

Let us assume we are creating an artificial network to test first. This network is going to be specified using the netcreator software.
This says: do 30,000 iterations of statistical model 5 and put the output into a file called net5. This creates a test network to play with. This command may take some time to run, so feel free to create a smaller network if you are just testing. The file
This is the FETA model format. In brief, S specifies a “simple graph” (no repeated edges between a node pair). n and e specify the “outer model”: every new node is connected to either 1, 2, 3 or 4 nodes with given probabilities, and a new node is followed by 0, 1, 2 or 3 edges between existing nodes (also with given probabilities).

The lines beginning with N specify an “inner model” for nodes connecting to new nodes: 70% of the model is PFP with delta = 0.048 and 30% of the model is connection to singleton nodes. The lines beginning with E specify an “inner model” for edges between existing nodes: 50% of the model is totally random, 20% connects to doubleton nodes and 30% is proportional to a node’s “triangle count”.

OK – now run the analyser to produce the files node5 and edge5. In this case the precise details of the model do not matter. The file “simplegraphmodel” is ideal when you know only that you are dealing with a “simple” graph.
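To make the inner-model mixture described above concrete, here is a minimal Python sketch of how a 70% PFP / 30% singleton mixture could turn node degrees into choice probabilities. It is not FETA's code. The function names are hypothetical, and the PFP weight `k^(1 + delta*log10(k))` is an assumption based on the usual published form of the PFP model rather than something stated in this transcript.

```python
import math

def pfp_weights(degrees, delta):
    # Assumed PFP form: weight = k^(1 + delta * log10(k)); degree-0 nodes get weight 0.
    return [k ** (1 + delta * math.log10(k)) if k > 0 else 0.0 for k in degrees]

def singleton_weights(degrees):
    # "Singleton" component: only nodes of degree exactly 1 can be chosen.
    return [1.0 if k == 1 else 0.0 for k in degrees]

def normalise(weights):
    # Turn raw weights into probabilities; an all-zero component contributes nothing.
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

def mixture_probabilities(degrees, components):
    # components is a list of (share, raw_weights); shares are assumed to sum to 1.
    probs = [0.0] * len(degrees)
    for share, weights in components:
        for i, p in enumerate(normalise(weights)):
            probs[i] += share * p
    return probs

# Hypothetical five-node network, described only by its degree sequence.
degrees = [1, 1, 2, 3, 5]
probs = mixture_probabilities(
    degrees,
    [(0.7, pfp_weights(degrees, 0.048)), (0.3, singleton_weights(degrees))],
)
for k, p in zip(degrees, probs):
    print(f"degree {k}: probability {p:.4f}")
```

Each component is normalised separately before being mixed, so the shares (0.7 and 0.3) are the fraction of choices attributed to each component regardless of the scale of its raw weights.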
The -w flag skips the first 1000 edges as “warm up” – just in case too small a model biases the sample. You can also use -t to specify a start time if your file uses times. The -r 0.01 “thins” the data by only looking at 1 in every 100 choices. Let’s check how much data we have.
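As a rough picture of what warm-up and thinning do to the data, the sketch below skips the first 1000 records and then keeps each later record with probability 0.01. It only illustrates the idea. Whether the analyser thins by random sampling or by taking every 100th choice is not stated here, so the random version is an assumption, as are the function name and the one-record-per-edge simplification.

```python
import random

def thin_choices(choices, warmup=1000, rate=0.01, seed=0):
    # Skip the first `warmup` records, then keep each later record with probability `rate`.
    # Random sampling is an assumption; the real tool may thin differently.
    rng = random.Random(seed)
    kept = []
    for index, choice in enumerate(choices):
        if index < warmup:
            continue
        if rng.random() < rate:
            kept.append(choice)
    return kept

# Hypothetical stream of 100,000 recorded choices, labelled by position.
choices = list(range(100000))
sample = thin_choices(choices)
print(len(sample), "records kept out of", len(choices))
```

With these settings roughly 1% of the post-warm-up records survive, which is the kind of reduction that keeps the later model fitting in R manageable.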
With this much data R might run OK but might not. Rough guide for the R software – look for significant parameters and look for the models with the lowest “deviance”. Now start R and type
This loads the FETA software into R. Now make a first attempt to fit
The output should be
This tries to fit the FETA model to the data in “node5” (our data for connections to new nodes). It adds a connection to degrees and a random factor unless you tell it not to. Get a summary as follows:
The p values (and the helpful stars) tell us that the degrees and the single parts were a good guess, but the double and the random factor parts not so much. Ignore the “Null deviance” – it isn’t useful. The “Residual Deviance” and AIC should be as low as possible. However, later you will see a better way to get this. Let’s try a different model without the double part and the random part.
We might also suspect a PFP model. Let’s imagine we do (since it is right). It’s usually a bad idea to mix a degree based model and a PFP model so take the degrees out and drop the PFP model in. So this model is a mix of PFP and singleton (which is correct but pretend we don’t know that). Also pretend we don’t know delta so let’s put in a bad value.
    singlecol with 2 variables
    summary(l)

    Call:
    glm(formula = fmla, family = family, start = mystart)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -0.85330  -0.02325  -0.01769  -0.01416   4.55237

    Coefficients:
              Estimate Std. Error z value Pr(>|z|)
    pfpcol     0.68565    0.07300   9.393  < 2e-16 ***
    singlecol  0.30499    0.05373   5.677 1.37e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance:    Inf  on 553186  degrees of freedom
    Residual deviance: 2969.5  on 553184  degrees of freedom
    AIC: 2973.5

    Number of Fisher Scoring iterations: 4

OK – this has not worked out so badly. It’s a worse model than the degree-based model (because the AIC and deviance are higher) but that might be due to the wrong delta. There’s an automatic procedure for finding good deltas but it is SLOOOOOW.
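A quick arithmetic check on the summary above, before moving on to the delta search: the reported numbers are consistent with AIC = residual deviance + 2 × (number of fitted coefficients), which is what one expects from a binomial fit to 0/1 responses. The helper name below is hypothetical; the figures are the ones from the summary.

```python
def aic_from_deviance(residual_deviance, n_parameters):
    # For 0/1 responses the residual deviance equals -2 * log-likelihood,
    # so AIC = deviance + 2 * (number of fitted coefficients).
    return residual_deviance + 2 * n_parameters

# Values from the summary: residual deviance 2969.5, two coefficients (pfpcol, singlecol).
aic = aic_from_deviance(2969.5, 2)
print(aic)  # 2973.5, matching the AIC line in the summary
```

This is why, for models with the same number of coefficients, ranking by deviance and ranking by AIC give the same order.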
This will search the model, changing the PFP delta parameter from 0.02 to 0.06 in steps of 0.01. It prints the deviance, which we want to be low. (It is a little bit of a cheat that I already know the answer to be in this range.)
Our best value is 0.05 which is pretty good really (0.048 is correct).
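The search just described is a plain grid search: refit for each candidate delta and keep the one with the lowest deviance. The Python sketch below shows only that loop. The `toy_deviance` function is a made-up stand-in with its minimum at 0.048 (it is not FETA output and produces no real deviances), and `grid_search` is a hypothetical name; in practice each evaluation would be a full model refit, which is why the real procedure is slow.

```python
def grid_search(deviance_fn, start, stop, step):
    # Evaluate deviance_fn on an evenly spaced grid and return the best delta.
    n_steps = int(round((stop - start) / step))
    grid = [round(start + i * step, 10) for i in range(n_steps + 1)]
    scored = [(deviance_fn(delta), delta) for delta in grid]
    best_deviance, best_delta = min(scored)
    return best_delta, scored

def toy_deviance(delta):
    # Stand-in for "refit the model with this delta and read off the deviance".
    # Purely illustrative: a bowl-shaped curve with its minimum at 0.048.
    return (delta - 0.048) ** 2

best_delta, scored = grid_search(toy_deviance, 0.02, 0.06, 0.01)
for deviance, delta in scored:
    print(f"delta = {delta:.2f}  toy deviance = {deviance:.6f}")
print("best delta on the grid:", best_delta)
```

A grid with step 0.01 can only land on the nearest grid point, so a true value of 0.048 shows up as 0.05; a finer second pass around the winner would narrow it further at the cost of more refits.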
Not so bad but the deviance for the PFP and the degree models is similar. 2968.5 for the degree model and 2965.1 for PFP. This is the important part – now to use the netanalyser to test the likelihood. Create a model file for the two models
We want to race this against testmodel2. This is similar but uses the results from the degree modelling, not the PFP modelling.
Back to the netanalyser program – the new -S flag asks for likelihood statistics.
We are no longer interested in the node and edge files, so these are thrown away.
Note that the exact results depend on the exact network created, which came from a random process. They should be close to this, however.
The Deviance and null deviance should be as low as possible. (The null likelihood should be more or less the same – it is the likelihood of a random model). The mean prob rel random should be high.
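To give a feel for these statistics, here is a small Python sketch that computes them from per-choice probabilities. The definitions are assumptions, not taken from the FETA source: deviance is taken as -2 × log-likelihood, the null deviance is the same quantity under a uniformly random model, and "mean prob rel random" is taken as the geometric mean of the ratio between model and random probabilities. The function and key names are hypothetical.

```python
import math

def likelihood_stats(model_probs, random_probs):
    # model_probs[i]: probability the model gave to the i-th observed choice.
    # random_probs[i]: probability a uniformly random model gave to the same choice.
    n = len(model_probs)
    log_lik = sum(math.log(p) for p in model_probs)
    null_log_lik = sum(math.log(p) for p in random_probs)
    return {
        "deviance": -2.0 * log_lik,
        "null_deviance": -2.0 * null_log_lik,
        # Assumed definition: geometric mean of (model probability / random probability).
        "mean_prob_rel_random": math.exp((log_lik - null_log_lik) / n),
    }

# Hypothetical example: three observed choices among ten equally likely nodes.
model_probs = [0.2, 0.1, 0.4]
random_probs = [0.1, 0.1, 0.1]
stats = likelihood_stats(model_probs, random_probs)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions, a value above 1 for the last statistic means the model assigns the observed choices more probability than random choice would, which is why higher is better.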
The results are
Model 1 is therefore better than model 2 in this test. The final winning model is .73 PFP with delta = 0.05 and .27 singleton. The actual answer was .7 PFP with delta 0.048 and .3 singleton.

Contact: Richard G. Clegg (richard@richardclegg.org)