Funding year 2018 / Scholarship Call #13 / Project ID: 3844 / Project: Essays on Communities
To test our theoretical predictions, we use data from GitHub, an online community for hosting open source software (OSS). GitHub is the largest public OSS platform and provides a coding environment and social tools to software developers.
We downloaded the GitHub data from the GHTorrent archive (http://ghtorrent.org/), which mirrors the data available through the GitHub API. The archive was released to the public on January 19, 2017. The data contain all activities performed by registered developers on the platform from GitHub's inception date (29 October 2007) until January 2017. To keep the analyses manageable, we constructed a dataset covering all founding, pull request, and issue activities performed on GitHub from the platform's inception through June 2011. Aggregating these activities for our final analysis yields 514,041 project founder-month observations of coders' activities across 737,309 GitHub repositories.
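The aggregation into founder-month observations can be illustrated with a minimal sketch. The event tuple layout, the activity labels, and the function name `founder_month_counts` are assumptions made for this example and are not taken from the project's actual pipeline or from the archive's schema.

```python
from collections import Counter
from datetime import date


def founder_month_counts(events):
    """Count activities per (founder, year, month, activity type).

    `events` is an iterable of (founder_id, activity_type, day) tuples,
    a hypothetical layout chosen for this illustration.
    """
    counts = Counter()
    for founder, activity, day in events:
        counts[(founder, day.year, day.month, activity)] += 1
    return counts


# Toy events, invented for the example.
events = [
    ("ann", "pull_request", date(2011, 5, 3)),
    ("ann", "pull_request", date(2011, 5, 20)),
    ("ann", "issue", date(2011, 5, 21)),
    ("ann", "issue", date(2011, 6, 1)),
    ("bob", "founding", date(2010, 12, 31)),
]
panel = founder_month_counts(events)
```

Each key of `panel` then corresponds to one activity type within one founder-month cell, which is the unit of observation described above.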
The data allow us to observe different types of helping behavior between community participants: code exchange (sending and receiving pull requests), advice exchange (opening issues), and project maintenance (merging or rejecting incoming pull requests), along with related communication. Our identification strategy uses a refined degree centrality measure for the code-exchange ties formed between OSS repository founders and other programmers. By estimating how the diversity of helping behaviors that a repository founder provides to and receives from other participants affects the success of his or her projects, we can track non-equivalent reciprocity among participants in general.
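As a rough illustration of degree centrality on code-exchange ties, the sketch below counts, for each user, the distinct partners to whom pull requests were sent (out-degree) and from whom they were received (in-degree). This is plain degree centrality under assumed inputs. The refinement applied in the study is not specified here, and the pair layout and the name `code_exchange_degrees` are hypothetical.

```python
from collections import defaultdict


def code_exchange_degrees(pull_requests):
    """Return {user: {"out": n, "in": m}} over distinct exchange partners.

    `pull_requests` is an iterable of (sender, receiver) pairs, where the
    receiver is assumed to be the founder of the target repository.
    """
    sent_to = defaultdict(set)
    received_from = defaultdict(set)
    for sender, receiver in pull_requests:
        if sender == receiver:
            continue  # a pull request to one's own repository is not a tie
        sent_to[sender].add(receiver)
        received_from[receiver].add(sender)
    users = set(sent_to) | set(received_from)
    return {
        u: {"out": len(sent_to.get(u, ())), "in": len(received_from.get(u, ()))}
        for u in users
    }


# Toy pull requests, invented for the example.
prs = [("ann", "bob"), ("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("bob", "bob")]
degrees = code_exchange_degrees(prs)
```

Repeated pull requests between the same pair count once, so the measure reflects the number of exchange partners rather than the volume of exchange.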
For our tests we use OLS log-log models with founder-level fixed effects to control for time-invariant founder characteristics that we do not observe (such as race, mother tongue, or formal education). The analysis is currently in progress.
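To show how founder fixed effects remove time-invariant characteristics in a log-log specification, here is a minimal sketch of the within (demeaning) estimator with a single regressor. It assumes strictly positive values, uses synthetic data, and the name `within_log_log_slope` is hypothetical. The actual models presumably include more covariates than this one-variable illustration.

```python
import math
from collections import defaultdict


def within_log_log_slope(rows):
    """Estimate beta in log(y) = alpha_founder + beta * log(x) + error.

    `rows` is an iterable of (founder_id, x, y) with x > 0 and y > 0.
    Demeaning log(x) and log(y) within each founder removes alpha_founder.
    """
    by_founder = defaultdict(list)
    for founder, x, y in rows:
        by_founder[founder].append((math.log(x), math.log(y)))

    sxy = 0.0
    sxx = 0.0
    for obs in by_founder.values():
        mean_x = sum(lx for lx, _ in obs) / len(obs)
        mean_y = sum(ly for _, ly in obs) / len(obs)
        for lx, ly in obs:
            sxy += (lx - mean_x) * (ly - mean_y)
            sxx += (lx - mean_x) ** 2
    if sxx == 0.0:
        raise ValueError("no within-founder variation in x")
    return sxy / sxx


# Synthetic panel: each founder has a different level, but the same elasticity of 0.5.
rows = []
for founder, level in (("a", 2.0), ("b", 10.0)):
    for x in (1.0, 4.0, 9.0):
        rows.append((founder, x, level * x ** 0.5))
beta = within_log_log_slope(rows)
```

In this toy panel the founder-specific levels differ by a factor of five, yet the estimated slope recovers the common elasticity because each founder's own mean is subtracted before the regression.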