FETA (Framework for Evolving Topology Analysis) software

The format of this data is described here.

Download links are gathered here for convenience.

  1. arxiv coauthorship download link.

  2. routeviews AS download link.

  3. UCLA AS download link.

  4. gallery data download link.

  5. flickr data download link.

ArXiv data

A publication co-authorship network was obtained from the online academic publication network arXiv. The first paper was added in April 1989 and papers are still being added. To keep the size manageable, the network was built only from papers categorised as math. An edge is added when two authors first write a paper together, so multiple edges between the same two authors are not added. Because the network is required to remain connected, edges that are not connected to the largest connected component are ignored.

The processing of this network is far from perfect: only author names (rather than unique IDs) are matched. Inconsistent naming conventions mean some authors are recorded by first name and surname, and some by initial and surname. To avoid problems matching John Smith, J. Smith and John W. Smith, the match is on first initial and surname, though this clearly allows some collisions between distinct authors.

One paper was removed from the analysis. It has sixty authors, far more than the paper with the next largest number of authors. Since each author on a paper forms a clique with all the other co-authors of that paper, this paper added 1,732 links for which no arrival order significant to the evolution of the network could be found. As a clique of size 60 would distort most network statistics, it was rejected as an outlier.
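The matching and edge rules described above can be sketched in a few lines of Python. This is only an illustration, not the code used to prepare the data: the function names, the max_authors parameter and the assumption that names are written "First [Middle] Surname" are all hypothetical, and the restriction to the largest connected component is left out.

```python
from itertools import combinations


def author_key(name):
    """Reduce a name to first initial plus surname.

    Assumes (hypothetically) that names are written 'First [Middle] Surname'.
    """
    parts = name.replace(".", " ").split()
    return f"{parts[0][0].lower()} {parts[-1].lower()}"


def coauthor_edges(papers, max_authors=None):
    """Yield each co-author pair once, in the order papers arrive.

    papers: iterable of author-name lists, oldest paper first.
    max_authors: optional cut-off used to skip outlier papers (an assumption).
    """
    seen = set()
    for authors in papers:
        keys = sorted({author_key(a) for a in authors})
        if max_authors is not None and len(keys) > max_authors:
            continue  # treat very large author lists as outliers
        # Every pair of authors on one paper forms an edge (a clique).
        for pair in combinations(keys, 2):
            if pair not in seen:  # no multiple edges between two authors
                seen.add(pair)
                yield pair
```

With this key, "John Smith", "J. Smith" and "John W. Smith" all map to the same node, which is also why two different authors such as John Smith and Jane Smith would collide.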

arxiv download link.

UCLA AS data

The data set we refer to here as the UCLA AS data set is a view of the Internet AS topology seen between January 2004 and August 2008. It comes from the Internet topology collection maintained by Oliveira et al. These topologies are updated daily using data sources such as BGP routing tables and updates from RouteViews, RIPE, Abilene and LookingGlass servers. Each node and link is annotated with the times it was first and last observed during the measurement period. Thanks to Raul Landa (rlanda@ee.ucl.ac.uk) for preparing this data.

Here is the UCLA AS download link.

Routeviews AS data

For the present paper we define the RouteViews AS data set as the view of the Internet AS topology from the point of view of a single RouteViews data collector. The raw data comes from the University of Oregon Route Views Project. It was recovered by parsing the routing tables obtained by running show ip bgp on the command line of route-views3.routeviews.org and capturing the output. To construct the node and link arrival process to which we fit our evolution models, we process one such table dump per day between April 11th, 2007 and January 16th, 2009. It is well known that an AS map obtained in this way will not be representative of the true Internet AS topology. Thanks to Raul Landa (rlanda@ee.ucl.ac.uk) for preparing this data.
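As a rough illustration of how daily table dumps turn into a link arrival process, the Python sketch below takes AS paths that have already been extracted from the show ip bgp output and records the first day each AS link is seen. The function names and the input layout are assumptions made for this example; it is not the parser used to prepare the data, and extracting the path column from the raw dump is not shown.

```python
def as_links(as_path):
    """Return the undirected AS links implied by one AS path string.

    Repeated AS numbers (path prepending) are collapsed. Reading stops at
    the first non-numeric token, such as an origin code or an AS set.
    """
    hops = []
    for token in as_path.split():
        if not token.isdigit():
            break
        asn = int(token)
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return {tuple(sorted(pair)) for pair in zip(hops, hops[1:])}


def first_seen(daily_paths):
    """Map each AS link to the first day on which it appears.

    daily_paths: iterable of (day, list_of_path_strings), in date order.
    """
    seen = {}
    for day, paths in daily_paths:
        for path in paths:
            for link in as_links(path):
                seen.setdefault(link, day)  # keep the earliest day only
    return seen
```

Sorting the resulting dictionary by day gives one possible arrival order for links; links first seen on the same day have no order relative to each other.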

The routeviews AS download link.

Gallery data

The website known simply as gallery is a photo sharing website. To upload pictures and have some control over how they are displayed, users have to create an account and log in. From the webserver logs, the path that logged-in users take as they browse across the site can be followed. Images thus become nodes in the network, and a user browsing from one photo to another creates a link between the two nodes that represent them. These links are overlaid for all users to form our network. Thanks to Uli Harder (uh@doc.ic.ac.uk) for providing this data.
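The construction above can be sketched as follows in Python. The input layout, a sequence of (user, image) pairs in log order, is an assumption made for this example, as is the function name. Whether repeated or reversed links are merged is not stated in the description, so the sketch simply emits every transition.

```python
def browse_links(requests):
    """Turn logged image views into links between consecutively viewed images.

    requests: iterable of (user, image) pairs in log order (assumed layout).
    Returns a list of (previous_image, image) pairs in arrival order,
    overlaid across all users.
    """
    last_image = {}  # most recent image viewed by each user
    links = []
    for user, image in requests:
        previous = last_image.get(user)
        last_image[user] = image
        if previous is not None and previous != image:
            links.append((previous, image))
    return links
```

For example, if user u1 views images a, b and c while user u2 views x then y, the links a-b, x-y and b-c are produced in the order the second view of each pair appears in the log.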

Here is the gallery data download link.

Flickr data

The Flickr website allows users to associate themselves with other users by naming them as Contacts. In “Growth of the flickr social network”, Mislove et al. describe how they collected data for the graph formed as users connect to other users. Their original data and a full description are available here. The first 100,000 links of this network, arranged as a connected graph, are analysed here. The graph is generated by a web-crawling spider, so the order of arrival of edges is the order in which the spider moves between users rather than the order in which the users made the connections. Thus, the evolution dynamics of this network will be determined by the spidering code. Thanks to Uli Harder (uh@doc.ic.ac.uk) for putting this data in the correct form for FETA.
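How the first 100,000 links were arranged as a connected graph is not spelled out here. One possible reading, shown in the Python sketch below, is to walk the links in crawl order and keep a link only if it touches a node that is already in the graph. The function name and this selection rule are assumptions for illustration, not a description of how the data was actually prepared.

```python
def connected_prefix(edges, limit):
    """Keep up to `limit` edges, in order, so the graph is always connected.

    edges: iterable of (u, v) pairs in arrival order.
    An edge is kept if it is the first edge or shares a node with the
    edges already kept; other edges are dropped (an assumed rule).
    """
    nodes = set()
    kept = []
    for u, v in edges:
        if len(kept) >= limit:
            break
        if not kept or u in nodes or v in nodes:
            kept.append((u, v))
            nodes.update((u, v))
    return kept
```

Under this rule a link between two users the graph has not yet reached is discarded rather than deferred, which keeps the sketch short at the cost of losing some links.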

Contact: Richard G. Clegg (richard@richardclegg.org)