How different are different diff algorithms in Git?: Use --histogram for code changes
Is part of
Empirical software engineering : an international journal, 2020, Vol.25 (1), p.790-823
Place / Publisher
New York: Springer US
Year of publication
2020
Source
SpringerLink (Online service)
Descriptions/Notes
Automatically identifying the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility, and users can select among its diff algorithms, from the default Myers algorithm to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. Regarding the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% of commits depending on the diff algorithm. Regarding bug-introducing change identification, we found that 6.0% and 13.3% of the identified bug-fix commits in 10 Java projects had different bug-introducing changes. For patch application, we found from our manual analysis that Histogram is more suitable than Myers for providing the changes of code. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.
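The abstract's point about code churn can be checked on any repository by computing added and deleted line counts under each diff algorithm. The following is a minimal sketch, not the authors' tooling: it relies on Git's `--diff-algorithm` and `--numstat` options, while the function names (`numstat_command`, `parse_numstat`, `churn`) and the sample paths in the usage note are hypothetical and chosen only for illustration.

```python
import subprocess

# Algorithm names accepted by git's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def numstat_command(old_rev, new_rev, algorithm):
    """Build the git command line for a per-file added/deleted summary."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return [
        "git", "diff", "--numstat",
        f"--diff-algorithm={algorithm}",
        old_rev, new_rev,
    ]


def parse_numstat(output):
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>TAB<deleted>TAB<path>".
    Binary files are reported as "-TAB-TAB<path>" and are skipped.
    """
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn(repo_path, old_rev, new_rev, algorithm):
    """Run git in repo_path and return (added, deleted) for one algorithm."""
    result = subprocess.run(
        numstat_command(old_rev, new_rev, algorithm),
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Calling `churn(".", "HEAD~1", "HEAD", "myers")` and `churn(".", "HEAD~1", "HEAD", "histogram")` on the same commit pair and comparing the two tuples shows whether the choice of algorithm changes the churn values for that commit. To make Histogram the default for a repository instead of passing the option each time, Git also provides the `diff.algorithm` configuration variable.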