Skip to main content
Log in

Towards reusing hints from past fixes

An exploratory study on thousands of real samples

  • Published:
Empirical Software Engineering Aims and scope Submit manuscript

Abstract

With the usage of version control systems, many bug fixes have accumulated over the years. Researchers have proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes decides how APR approaches can construct new fixes with old ones. Based on this intuition, we systematically designed six overlap metrics, and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta graphs (i.e., program dependency graphs for code changes), and identified how bug fixes overlap with each other in terms of the content, code structures, and identifier names of fixes. Our results show that if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% fixes created from past fixes across projects. However, if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the code structures of past fixes, up to 41.3% and 29.7% new fixes can be created. By making the above observations and revealing other ten findings, we investigated the upper bound of reusable past fixes and composable new fixes, exploring the potential of existing and future APR approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. http://wala.sourceforge.net

  2. https://github.com/Sable/soot

  3. https://www.youtube.com/watch?v=oAiVsbXVP6k

  4. https://issues.apache.org/jira/browse/CASSANDRA-1157

  5. http://www.apache.org/

  6. http://db.apache.org/derby/derby_downloads.html

  7. https://issues.apache.org/jira/browse/LUCENE-1510. To save space, in the rest of the paper, we present only ids of fixes, but do not present their urls. All the fixes come from Apache, and their urls can be generated by replacing the id in the above url.

  8. http://wala.sourceforge.net/javadocs/trunk/

  9. PI is more restricted than PS and PN, but not the combination of the two. As a result, Column PI is not the sum of Columns PS and PN.

References

  • Barr ET, Brun Y, Devanbu P, Harman M, Sarro F (2014) The plastic surgery hypothesis. In: Proceedings of the ESEC/FSE, pp 306–317

  • Dagenais B, Hendren LJ (2008) Enabling static analysis for partial Java programs. In: Proceedings of the 23rd OOPSLA, pp 313–328

  • Fluri B, Wursch M, PInzger M, Gall HC (2007) Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans Softw Eng 33(11):725–743

    Article  Google Scholar 

  • Gabel M, Su Z (2010) A study of the uniqueness of source code. In: Proceedings of the FSE, pp 147–156

  • Gao Q, Zhang H, Wang J, Xiong Y, Zhang L, Mei H (2015) Fixing recurring crash bugs via analyzing Q&A sites. In: Proceedings of the 30th ASE, pp 307–318

  • Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2):215–223

    Article  MathSciNet  MATH  Google Scholar 

  • Higo Y, Ohtani A, Hayashi S, Hata H, Shinji K (2015) Toward reusing code changes. In: Proceedings of the MSR, pp 372–376

  • Jeffrey D, Feng M, Gupta N, Gupta R (2009) Bugfix: a learning-based tool to assist developers in fixing bugs. In: Proceedings of the 17th ICPC, pp 70–79

  • Jin G, Song L, Shi X, Scherpelz J, Lu S (2012) Understanding and detecting real-world performance bugs. In: Proceedings of the 33rd PLDI

  • Kaleeswaran S, Tulsian V, Kanade A, Orso A (2014) Minthint: automated synthesis of repair hints. In: Proceeding of the ICSE, pp 266–276

  • Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 654–670

  • Kim S, Pan K, Whitehead EEJ Jr (2006) Memories of bug fixes. In: Proceedings of the ESEC/FSE, pp 35–45

  • Kim S, Zimmermann T, Whitehead EJ Jr, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th ICSE, pp 489–498

  • Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 35th ICSE, pp 802–811

  • Kuhn HW (1955) The hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97

    Article  MathSciNet  MATH  Google Scholar 

  • Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th ICSE, pp 3–13

  • Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Softw Eng 41(12):1236–1256

    Article  Google Scholar 

  • Liang G, Wang Q, Xie T, Mei H (2013) Inferring project-specific bug patterns for detecting sibling bugs. In: Proceedings of the ESEC/FSE, pp 565–575

  • Liu C, Yang J, Tan L, Hafiz M (2013) R2Fix: automatically generating bug fixes from bug reports. In: Proceedings of the 6th ICST, pp 282–291

  • Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 10th ESEC/FSE, pp 166–178

  • Long F, Rinard M (2016) Automatic patch generation by learning correct code. In: Proceedings of the 43rd POPL, pp 298–312

  • Lu S, Park S, Seo E, Zhou Y (2008) Learning from mistakes—a comprehensive study on real world concurrency bug characteristics. In: Proceedings of the 13th ASPLOS

  • Martinez M, Monperrus M (2013) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205

    Article  Google Scholar 

  • Martinez M, Weimer W, Monperrus M (2014) Do the fix ingredients already exist? An empirical inquiry into the redundancy assumptions of program repair approaches. In: Proceedings of the 36th ICSE, pp 492–495

  • Meng N, Kim M, McKinley K S (2011) Sydit: creating and applying a program transformation from an example. In: Proceedings of the ESEC/FSE, pp 440–443

  • Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38

    Article  MathSciNet  MATH  Google Scholar 

  • Myers EW (1986) Ano(nd) difference algorithm and its variations. Algorithmica 1(1):251–266

    Article  MathSciNet  MATH  Google Scholar 

  • Negara S, Codoban M, Dig D, Johnson RE (2014) Mining fine-grained code changes to detect unknown change patterns. In: Proceedings of the 36th ICSE, pp 803–813

  • Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi J, Nguyen TN (2010) Recurring bug fixes in object-oriented programs. In: Proceedings of the 32nd ICSE, pp 315–324

  • Nguyen HA, Nguyen AT, Nguyen TT, Nguyen TN, Rajan H (2013) A study of repetitiveness of code changes in software evolution. In: Proceedings of the 28th ASE, pp 180–190

  • Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th ICSE, pp 254–265

  • Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the ISSTA, pp 24–36

  • Ray B, Kim M (2012) A case study of cross-system porting in forked projects. In: Proceedings of the ESEC/FSE

  • Rolim R, Soares G, D’Antoni L, Polozov O, Gulwani S, Gheyi R, Suzuki R, Hartmann B (2017) Learning syntactic program transformations from examples. In: Proceedings of the 39th ICSE, pp 404–415

  • Tan L, Liu C, Li Z, Wang X, Zhou Y, Zhai C (2014) Bug characteristics in open source software. Empir Softw Eng 19(6):1665–1705

    Article  Google Scholar 

  • Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the ICSE, software engineering in practice, pp 99–108

  • Taneja K, Dig D, Xie T (2007) Automated detection of API refactorings in libraries. In: Proceedings of the 22nd ASE, pp 377–380

  • Weimer W, Nguyen T, Le Goues C, Forrest S (2009) Automatically finding patches using genetic programming. In: Proceedings of the 31st ICSE, pp 364–374

  • Xiong Y, Wang J, Yan R, Zhang J, Han S, Huang G, Zhang L (2017) Precise condition synthesis for program repair. In: Proceedings of the ICSE, pp 416–426

  • Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011a) How do fixes become bugs? In: Proceedings of the ESEC/FSE, pp 26–36

  • Yin Z, Ma X, Zheng J, Zhou Y, Bairavasundaram L N, Pasupathy S (2011b) An empirical study on configuration errors in commercial and open source systems. In: Proceedings of the 23rd SOSP, pp 159–172

  • Yue R, Meng N, Wang Q (2017) A characterization study of repeated bug fixes. In: Proceedings of the ICSME, pp 422–432

  • Zhang J, Wang Z, Zhang L, Hao D, Zang L, Cheng S, Zhang L (2016) Predictive mutation testing. In: Proceedings of the ISSTA, pp 342–353

  • Zhong H, Meng N (2017) Poster: an empirical study on using hints from past fixes. In: Proceedings of the 39th ICSE, pp 144–145

  • Zhong H, Su Z (2015) An empirical study on real bug fixes. In: Proceedings of the 37th ICSE, pp 913–923

  • Zhong H, Wang X (2017) Boosting complete-code tool for partial program. In: Proceedings of the ASE, pp 671–681

  • Zhong H, Zhang L, Xie T, Mei H (2009) Inferring resource specifications from natural language API documentation. In: Proceedings of the 24th ASE, pp 307–318

Download references

Acknowledgements

Hao Zhong is sponsored by the 973 Program in China (No. 2015CB352203), the National Nature Science Foundation of China No. 61572313, and the grant of Science and Technology Commission of Shanghai Municipality No. 15DZ1100305. Na Meng is sponsored by the NSF CCF No. 1565827 and ONR-450487.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hao Zhong.

Additional information

Communicated by: Sunghun Kim

This paper is an extended version of a poster paper (Zhong and Meng 2017) that is presented in the 39th International Conference on Software Engineering, 2017.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhong, H., Meng, N. Towards reusing hints from past fixes. Empir Software Eng 23, 2521–2549 (2018). https://doi.org/10.1007/s10664-017-9584-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-017-9584-3

Keywords

Navigation