Going through a computational linguistics program will bring you into contact with Zipf’s Law sooner or later. Its core claim:
In a corpus, the frequency of any word is inversely proportional to its rank.
In plainer terms: a few words (events) occur very often, while most words occur only a few times, or just once.
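To make the rank/frequency relation concrete, here is a minimal sketch that counts words and lists them by rank. The function name `rank_frequency`, the crude tokenizer, and the sample sentence are all made up for illustration; a real corpus would need far more text and better tokenization before the pattern shows up.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Very crude tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy sentence, far too small to show a real Zipf curve.
sample = "the cat and the dog and the bird saw the fish"
table = rank_frequency(sample)
for rank, word, count in table:
    # If frequency were exactly proportional to 1/rank,
    # rank * count would be the same for every row.
    print(rank, word, count, rank * count)
```

On a large corpus, the product of rank and count should stay in the same ballpark for the upper ranks, which is just another way of stating the inverse proportionality.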
Zipf’s Law also holds for similar structures such as DNA, and the same distribution can be observed in the user reputation on Stack Overflow. The following three graphs plot the reputation (X) against the frequency of that particular reputation value (Y), that is, the number of users who have it, on log-scaled axes. With increasing normalization, the plot becomes more Zipf-like, with the typical long “tails” at the lower end.
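The points behind such a graph could be computed roughly as follows. This is only a sketch: `reputation_histogram` and `log_log_points` are hypothetical helper names, the reputation values are invented, and it is not the code used to produce the graphs here.

```python
import math
from collections import Counter


def reputation_histogram(reputations):
    """Return sorted (reputation, number_of_users) pairs."""
    return sorted(Counter(reputations).items())


def log_log_points(histogram):
    """Convert (reputation, frequency) pairs to log10 coordinates."""
    return [(math.log10(rep), math.log10(freq)) for rep, freq in histogram]


# Invented reputation values, for illustration only.
made_up = [1, 1, 1, 1, 11, 11, 101, 2500]
hist = reputation_histogram(made_up)
points = log_log_points(hist)
print(hist)
```

Feeding `points` to any plotting tool gives the log-log view; a roughly straight, falling line would indicate a power-law-like relation.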



If we plot the mass distribution of reputation, ordered by decreasing reputation, on log-log axes, we get something that looks like the cumulative distribution function of an exponential distribution:

On 2009-02-05, the total amount of reputation on Stack Overflow was 8,491,989. Around 15% of the users hold 85% of the reputation (not quite Pareto’s 80-20), and the top user (out of 41,082 users) owns 0.39% of the overall reputation.
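A figure like "15% of the users hold 85% of the reputation" can be computed along these lines. This is a sketch under assumptions: `top_share` is a hypothetical helper and the reputation list is invented; only the total and the 0.39% come from the numbers above.

```python
def top_share(reputations, user_fraction):
    """Fraction of all reputation held by the top `user_fraction` of users."""
    ordered = sorted(reputations, reverse=True)
    # Always include at least one user.
    n_top = max(1, round(len(ordered) * user_fraction))
    return sum(ordered[:n_top]) / sum(ordered)


# Invented reputation values, for illustration only.
made_up = [5000, 2000, 1000, 500, 200, 100, 50, 50, 50, 50]
share = top_share(made_up, 0.2)
print(f"top 20% of users hold {share:.1%} of the reputation")

# Reputation implied for the top user by the figures quoted above.
# 0.39% is a rounded value, so this is only approximate.
implied_top = 0.0039 * 8_491_989
print(round(implied_top))
```

With the rounded 0.39%, the top user comes out at roughly 33,000 reputation.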
For these graphs, I’ve scraped the user overview pages. Scraping every single user page would allow for more interesting statistics (and more accurate ones, since inactive users could be removed), but I’d rather wait for a proper API.

