Continuing the series of posts about my Project Evaluation subject, today I’m going to talk about the different data we have to retrieve from the SCM repository of a project we want to study.
The commits are the really important data here: with them we can check the overall activity in the project and the activity of each developer.
The data regarding commits we need to know are:
- Name of the developer who did the commit.
- Date of commit.
- Files involved in the commit.
- Message left by the committer or author (commit description).
- Lines involved in the changes.
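As a rough illustration of the list above, here is a minimal Python sketch that collects those fields for a Git repository. It assumes the log text was produced with something like `git log --numstat --pretty=format:'@@%an|%aI|%s'`; the `@@` marker, the `Commit` class, the `parse_log` function and the sample data are my own hypothetical choices, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    author: str   # name of the developer who did the commit
    date: str     # ISO 8601 date of the commit
    message: str  # commit description (subject line)
    files: dict = field(default_factory=dict)  # path -> (lines added, lines deleted)


def _to_int(value):
    # git prints "-" instead of a number for binary files
    return 0 if value == "-" else int(value)


def parse_log(text):
    """Parse the assumed '@@author|date|subject' + numstat format."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@@"):
            author, date, message = line[2:].split("|", 2)
            commits.append(Commit(author, date, message))
        elif line.strip() and commits:
            added, deleted, path = line.split("\t", 2)
            commits[-1].files[path] = (_to_int(added), _to_int(deleted))
    return commits


# Invented sample in the assumed format
SAMPLE = """@@alice|2012-01-10T09:30:00+01:00|Fix parser
3\t1\tsrc/parser.py
-\t-\tdocs/logo.png

@@bob|2012-01-11T18:02:00+01:00|Add tests
20\t0\ttests/test_parser.py
"""

for commit in parse_log(SAMPLE):
    print(commit.author, commit.date, len(commit.files), "files")
```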
With this data about each commit we can run a lot of different queries to get the information we want. Regarding the size of the project, for example, it is really interesting to know:
- Number of commits.
- Number of committers.
- Number of files involved in commits.
- Number of lines involved in commits.
- Number of commits per committer.
- % effort.
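To give an idea of how these numbers come out of the commit data, here is a small sketch using Python's built-in `sqlite3` module. The `commits` table, its columns and the sample rows are hypothetical (this is not the schema of any real tool), and I'm assuming that "% effort" means the share of commits done by each committer.

```python
import sqlite3

# Hypothetical, simplified table: one row per commit
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits (id INTEGER PRIMARY KEY, author TEXT, lines INTEGER)"
)
conn.executemany(
    "INSERT INTO commits (author, lines) VALUES (?, ?)",
    [("alice", 4), ("alice", 10), ("bob", 20), ("alice", 6)],
)


def size_metrics(conn):
    total, committers, lines = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT author), SUM(lines) FROM commits"
    ).fetchone()
    per_committer = dict(
        conn.execute("SELECT author, COUNT(*) FROM commits GROUP BY author")
    )
    # Assumption: effort = percentage of all commits done by each committer
    effort = {a: round(100.0 * n / total, 1) for a, n in per_committer.items()}
    return {
        "commits": total,
        "committers": committers,
        "lines": lines,
        "commits_per_committer": per_committer,
        "effort_percent": effort,
    }


print(size_metrics(conn))
```

Other definitions of effort are possible (for example, share of lines changed); only the last query would need to change.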
By studying the mails from the mailing lists we can learn about the size of the community, specifically:
- Number of unique posters in the mailing lists.
- Number of users posting (conventional users).
- Number of developers posting.
- Number of mailing lists.
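These counts can be sketched in a few lines of Python. Here I assume we already have each post as a (mailing list name, `From` header) pair and a set with the developers' addresses (for instance, taken from the committers); the `community_size` function and all the sample addresses are invented for the example.

```python
from email.utils import parseaddr


def community_size(posts, developer_addresses):
    """posts: iterable of (mailing_list_name, from_header) pairs."""
    # Normalise "Name <address>" headers to lowercase addresses
    posters = {parseaddr(sender)[1].lower() for _, sender in posts}
    developers = posters & {a.lower() for a in developer_addresses}
    return {
        "mailing_lists": len({name for name, _ in posts}),
        "unique_posters": len(posters),
        "developers_posting": len(developers),
        "users_posting": len(posters - developers),
    }


# Invented sample data
POSTS = [
    ("dev", "Alice <alice@example.org>"),
    ("users", "carol@example.org"),
    ("users", "ALICE@example.org"),
    ("dev", "Bob <bob@example.org>"),
]
DEVELOPERS = {"alice@example.org", "bob@example.org"}

print(community_size(POSTS, DEVELOPERS))
```

Matching people only by address is a simplification: the same person posting from two addresses would be counted twice.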
We can also study the bugs of a specific FLOSS project from its bug tracking system, which gives us really useful data like:
- Number of bugs.
- Number of open bugs.
- Number of closed bugs.
- Number of developers fixing bugs.
- Number of bugs reported.
- Number of bugs over different periods of time.
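As a sketch of how the bug numbers could be computed, suppose each bug has already been exported as a small record. The field names (`status`, `opened`, `fixed_by`), the `bug_metrics` function and the sample bugs are all hypothetical, and I group the bugs by the month in which they were opened as one possible way of looking at them over time.

```python
from collections import Counter

# Invented sample records; real bug trackers have many more fields
BUGS = [
    {"id": 1, "status": "closed", "opened": "2012-01-15", "fixed_by": "alice"},
    {"id": 2, "status": "open", "opened": "2012-01-20", "fixed_by": None},
    {"id": 3, "status": "closed", "opened": "2012-02-03", "fixed_by": "bob"},
    {"id": 4, "status": "closed", "opened": "2012-02-17", "fixed_by": "alice"},
]


def bug_metrics(bugs):
    status = Counter(bug["status"] for bug in bugs)
    fixers = {bug["fixed_by"] for bug in bugs if bug["fixed_by"]}
    # "YYYY-MM-DD"[:7] gives the month the bug was opened in
    per_month = Counter(bug["opened"][:7] for bug in bugs)
    return {
        "bugs": len(bugs),
        "open": status["open"],
        "closed": status["closed"],
        "developers_fixing": len(fixers),
        "opened_per_month": dict(per_month),
    }


print(bug_metrics(BUGS))
```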
To get this data we can go to the Ohloh website, which has a lot of information about it, but there the data is only presented in a simple, fixed way: you can check the number of commits, the number of lines of code, the number and names of the committers, etc., but you can’t get the data yourself and run your own queries.
You can if you use the same tools that Ohloh uses, which are available on GitHub under FLOSS licenses, so after running these tools you can get the data and use it as you want.
Maybe, at least for my MSWL classmates, it is easier to use the LibresoftTools, a set of tools created by the Libresoft research group at Universidad Rey Juan Carlos (the ones who make this M.Sc. possible), such as CVSAnaly, MLStats, Bicho, etc.
I’ve been using CVSAnaly for the last two years, and this tool is completely amazing: you can retrieve data from Subversion, CVS and Git repositories, and then run a lot of queries on the automatically created database to exploit the data.
If we also use R as our statistics application, in combination with the data CVSAnaly stores in the database, we can create graphs, boxplots, etc. to make the conclusions of each study more visible.
And that’s all, see u my friends!!