Exploring the AngelList Dataset / by Aleksandar Bradic

AngelList and Crunchbase datasets hold the keys to understanding the dynamics of modern startup ecosystems. In 2013, we created a simple experience that let users interactively browse these communities and discover the intricate relationships between people, ideas, and capital.

Two years later, it feels like time to revisit this problem and create a new, improved version of the experience.

But before we do, let's look at some fundamental aspects of this dataset.

(All results are based on a random sample of 1,189,116 users and 354,560 startups, last updated on July 7, 2015.)


NUMBER OF FOLLOWERS

    Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
    0.00    0.00      2.00     19.93    10.00     41580.00

As expected, the follower distribution is highly asymmetric. While most users have only a few followers (50% have 2 or fewer), 1% have more than 263, and 0.15% (1,371 users) have more than 1,000. The median is pulled down somewhat by the large number of profiles that are not active on the system, but the shape of the tail still suggests that the global entrepreneurial ecosystem may be dominated by a small group of about a thousand key "influencers."
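For readers who want to run the same kind of analysis on their own sample, here is a minimal sketch of how a summary like the one above and the tail shares could be computed. It is not the code used for this post: the function names `summarize` and `share_above` are made up for illustration, and the `followers` list is a toy sample, not AngelList data.

```python
import statistics


def summarize(counts):
    """Return min, quartiles, mean and max for a list of follower counts."""
    # method="inclusive" interpolates between observed values and treats
    # the data as the full population rather than a sample from it.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "min": min(counts),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(counts),
        "q3": q3,
        "max": max(counts),
    }


def share_above(counts, threshold):
    """Fraction of users with strictly more than `threshold` followers."""
    return sum(1 for c in counts if c > threshold) / len(counts)


# Toy sample for illustration only (assumption, not real data).
followers = [0, 0, 0, 1, 2, 2, 3, 10, 40, 1500]

summary = summarize(followers)
tail_share = share_above(followers, 1000)
```

Even in this tiny sample, a single heavily followed user pulls the mean far above the median, which is the same pattern seen in the real distribution.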

This sounds intriguing, so let's get a bit more qualitative and investigate the very tail of this distribution...

(in progress)