Piñeros Rodríguez, Sierra Martínez, Peluffo Ordóñez & Timana Peña / INGE CUC, vol. 19 no. 1, pp. 22–36. January - June, 2023

Effort Estimation in Agile Software Development: A Systematic Map Study

Estimación del Esfuerzo en el Desarrollo de Software Ágil: Mapeo Sistemático

DOI: http://doi.org/10.17981/ingecuc.19.1.2023.03

Scientific Research Article. Date received: 27/12/2021. Date aceptance: 22/12/2022.

Camilo Andrés Piñeros Rodríguez

Universidad del Cauca. Popayán (Colombia)

prcamilo@unicauca.edu.co

Luz Marina Sierra Martínez

Universidad del Cauca. Popayán (Colombia)

lsierra@unicauca.edu.co

Diego Hernán Peluffo Ordóñez

Universidad Mohammed VI Polytechnic Ben Guerir (Marruecos)

diego.peluffo@sdas-group.com

Jimena Adriana Timana Peña

Universidad del Cauca. Popayán (Colombia)

jtimana@unicauca.edu.co

To cite this paper

C. Piñeros Rodríguez, L. Sierra Martínez, D. Peluffo Ordóñez & J. Timana Peña, “Effort Estimation in Agile Software Development: A Systematic Map Study”, INGE CUC, vol. 19, no. 1, pp. 22–36, 2023. DOI: http://doi.org/10.17981/ingecuc.19.1.2023.03

Abstract

Context— Making effort estimation as accurate and suitable for software development projects becomes a fundamental stage to favor its success, which is a difficult task, since the application of these techniques in constant changing agile development projects raises the need to evaluate different methods frequently.

Objectives— The objective of this study is to provide a state of the art on techniques of effort estimation in Agile Software Development (ASD), performance evaluation and the drawbacks that arise in its application.

Methodology— A systematic mapping was developed involving the creation of research questions to provide a layout of this study, analysis of related words for the implementation of a search query to obtain related studies, application of exclusion, inclusion, and quality criteria to filter nonrelated studies and finally the organization and extraction of the necessary information from each study.

Results— 25 studies were selected; the main findings are: the most applied estimation techniques in agile contexts are: Estimation of Story Points (SP) followed by Planning Poker (PP) and Expert Judgment (EJ). The most frequent solutions supported in computational techniques such as: Naive Bayes, Regression Algorithms and Hybrid System; also, the performance evaluation measures Mean Magnitude of Relative Error (MMRE), Prediction Assessment (PRED) and Mean Absolute Error (MAE) have been found to be the most commonly used. Additionally, parameters such as feasibility, experience, and the delivery of expert knowledge, as well as the constant particularity and lack of data in the process of creating models to be applied to a limited number of environments are the challenges that arise the most when estimating software in Agile Software Development (ASD).

Conclusions— It has been found there is an increase in the number of articles that address effort estimation in agile development, however, it becomes evident the need to improve the accuracy of the estimation by using estimation techniques supported in machine learning that have been shown to facilitate and improve the performance of this.

Keywords— Effort estimation; agile software development; issues and challenges; automatic learning; performance metrics

Resumen

Contexto— Realizar una estimación de esfuerzo lo más precisa y adecuada para proyectos de desarrollo de software, se ha convertido en pieza fundamental para favorecer el éxito y desarrollo de estos, sin embargo, aplicar este tipo de estimación en proyectos de desarrollo ágil, en donde los cambios son constantes, la convierte en una tarea muy compleja de implementar.

Objetivo— El objetivo de este estudio es proveer un estado del arte sobre técnicas de estimación de esfuerzo en Desarrollo de Software Ágil (ASD), la evaluación de su desempeño y los inconvenientes que se presentan en su aplicación.

Metodología— Se desarrolló un mapeo sistemático que involucró la creación de preguntas de investigación con el fin de proveer una estructura a seguir, análisis de palabras relacionadas con el tema de investigación para la creación e implementación de una cadena de búsqueda para la identificación de estudios relacionados con el tema, aplicación de criterios de exclusión, inclusión y calidad a los artículos encontrados para poder descartar estudios no relevantes y finalmente la organización y extracción de la información necesaria de cada artículo.

Resultados— De los 25 estudios seleccionados; los principales hallazgos son: las técnicas de estimación más aplicadas en contextos ágiles son: Estimación por medio de Puntos de Historia (SP) seguidos de Planning Poker (PP) y Juicio de Expertos (EJ). Soluciones soportadas en técnicas computacionales como: Naive Bayes, Algoritmos de Regresión y Sistema Híbridos; también se ha encontrado que la Magnitud Media del Error Relativo (MMRE), la Evaluación de la Predicción (PRED) y Error Absoluto Medio (MAE) son las medidas de evaluación de desempeño más usadas. Adicionalmente, se ha encontrado que parámetros como la viabilidad, la experiencia y la entrega de conocimiento de expertos, así como la constante particularidad y falta de datos en el proceso de creación de modelos para aplicarse a un limitado número de entornos son los desafíos que más se presentan al momento de realizar estimación de software en el Desarrollo de Software Ágil (ASD).

Conclusiones— Se ha encontrado que existe un aumento en la cantidad de artículos que abordan la estimación de esfuerzo en el desarrollo ágil, sin embargo, se hace evidente la necesidad de mejorar la precisión de la estimación mediante el uso de técnicas de estimación soportadas en el aprendizaje de máquina que han demostrado que facilita y mejora el desempeño de este.

Palabras clave— Estimación del esfuerzo; desarrollo ágil de software; retos y desafíos; aprendizaje automático; métricas de desempeño

I. Introduction

Effort estimation is the process used to predict the required effort for a given task [1] , it is a complex and essential task for the management of software projects, specifically for planning and monitoring [2]; when they are executed correctly, the chances that the project achieves its cost and time objectives are raised considerably [3].

Today, a significant increase in the use of agile methodologies in different software development organizations around the world can be noticed [4], considering the characteristics of agile approaches such as iterative, incremental, short development cycles, active involvement of the customer with requirements that are changing frequently, demanding a quick, flexible and collaborative response by the development team [4], [5], therefore, effort estimation becomes a continuous challenge, requiring to be constantly adjusted [6].

Recent research has shown that high accuracy rate in predicting effort estimation has greatly increased the chance of obtaining a successful and quality product [1]. Consequently, inaccurate estimates negatively impact the development of a software project [7] , generating two results namely: underestimates, which can lead to the termination of projects since budgets and schedules are exceeded or overestimates, where resources can be wasted [8].

Taking into account the above, effort estimation plays an essential role evaluating the success or failure of software projects [8], it is necessary to be in consistent exploration of the current state of literature related to the effort estimation processes of Agile Software Development (ASD), such as those reported in Brazil [9], and Saudi Arabia [10], among others, which have focused on exploring techniques, datasets and cost directors, considering this, the aim for this research is also to explore the current state of the topic with the addition of an analysis on how the present problems in the estimation process have been addressed.

With the purpose of contributing to the field of computing, more specifically in the effort estimation subject, this article presents a systematic mapping on the current knowledge state of the different techniques of estimation of effort in ASD, the performance measures used and the problems and challenges that arise in the subject. For the development process the steps proposed by Swedish researchers (BTH) [11] were included: construction of 4 research questions, one of which involves a small bibliometric analysis, definition of keywords and the search string used in the databases ACM, IEEE Xplore, Springer, Scopus and Web of Science, where a total of 708 articles were found; application of inclusion, exclusion and quality criteria and finally the implementation of snowball sampling with the help of the tool Connected Papers that helps with the exploration of relevant documents, to finally select 25 articles, which were used to answer the implemented research questions.

The main contributions of this research are: An update on the most used techniques for effort estimation in ASD, and an exploration on the most common problems when performing effort estimation in ASD. The rest of the article is structured as follows: Section 2 presents related works. Section 3 describes the research protocol used. The results of the study are detailed in Section 4. Finally, some conclusions and future work are presented in Section 5.

II. Related Works

At present there is a growing interest in the estimation of effort in ASD projects, which is understood by the tendency to use agile approaches such as XP, SCRUM or others. This has motivated the generation of studies that group the current state of the literature on the subject, favoring new methods or techniques, metrics, cost predictors, and the addressing of challenges and issues encountered on a daily basis. Table 1 lists with relevant details some studies that were found in the area, below are their scopes.

Table 1.

Description of Related Works.

Ref	[10]	[12]	[9]	[14]	[15]
Authors	Alsaadi & Saeedi	Fernández-Diego et al.	Dantas et al.	Hacaloglu & Demirors	Altaleb & Gravell
Year	2022	2020	2018	2018	2018
Time period	2011-2019	2014-2020	2014-2018	Not	Not
Class	SLR	SLR	Review	SLR	SLR
Research Protocol	[16]	[17]	[18]	[16]	[19]
Questions	4	4	4	2	3
Databases	IEEE, SD, SPR L,WLY,Wos	ACM, IEEE, SCP,Wos	GS,SCP	SCP, IEEE, SD, ACM, WOS	IEEE, ACM, SCP, WOS, CPI
Num studies	11	73	24	40	21
Item type	REV,CFR	REV CFR	REV, CFR	REV,CFR,WSH	REV,CFR,WSH
WOS: Web of Science (WOS), Science Direct (SD), IEEE Explore (IEEE), SCOPUS (SCP), WILLEY (WLY), SPRINGER LINK (SPR L), Google Scholar GS), Journal (REV), Conference (CFR), Workshop (WSH), No specified (NE), Compendex and Inspec (CPI)

Source: Authors.

In 2022, KAU (Saudi Arabia) presented an SLR (Systematic Literature Review) on data-driven techniques for effort estimation with user stories, covering aspects of techniques, method performance evaluation, independent effort factors (personnel, product, process, and estimation) and datasets features [10]; as a result, it was found that it is crucial to include user stories. for the estimation of effort in ASD projects. Also in 2022, HIOF (Norway) develops a survey on effort estimation techniques in ASD [6], its benefits, reasons why the estimates are not precise and their repercussions; they obtained 53 responses from 7 countries; the techniques reported in the survey were Bucket System, Dot Voting, Expert Estimation, Planning Poker, Team Estimation Game, Swimlane Sizing, Use Case Point, Story Point used in development approaches such as SCRUM, Combinations devOps and Scrum (DS), DevOps, XP, Kanban and Scrum (DXKS); among the benefits are: guiding the team to successfully complete the project; identifying accurately resources and the scope of the project; helping with the identification of early important events: winning precision, among others. The reasons why the estimates are imprecise were also grouped into five categories: related to requirements (complexity, uncertainty, changing, losses, not having non-functional requirements in mind, poor user stories), project management (poor change control, deviant scope, no SCRUM master guidance, unstructured processes), team (distribution, dominant personalities, inexperience), excess optimism (consideration of the best scenario, underestimation of work) and others (hardware issues, ignored effort tests, lack of stakeholder involvement).

In 2020, UPV and IMF (Spain) conducted an SLR [12], that updates the SLR presented by BTH (Sweden) and UFJF (Brazil) [13] with studies from 2014 to 2022, the research questions focus on methods of effort estimation, predictors, dataset features in agile development. They analyze 73 articles and compare them with the results obtained [13], finding a significant number of articles both in the evolution of time and in SP, Cosmic FP, Lines of Code, data sets, application area, among others. They also identified a significant number of cost factors, which were grouped into five categories: Project (Complexity, Risk, Clarity of requirements, Novelty, Quality), Team (Experience, Developer skills, Familiarity, Speed, Communication, size, Work by hours or days and Availability of developers), Techniques (software and development tools, Impact of existing systems), User Stories (Priority, Sprint, Type of development) and Others (textual information, COCOMO cost drivers, process maturity and others not reported) [12].

In 2018, UFCG (Brazil) updated the SLR [9], created by BTH (Sweden) and UFJF (Brazil) [13], through a systematic review where they selected 24 articles, the research questions were based on estimation techniques, metrics and their accuracy, effort predictors, datasets features, for agile development projects, planning levels and development activities that have been investigated, finding that XP is little mentioned in new articles, solutions based on Artificial Intelligence and Machine Learning (Bayesian networks and optimization algorithms) mostly, and identified 10 cost managers (cost driver) for effort estimation (Quality requirements, task size, integration, priority, complexity, stakeholder delay, team composition, work environment experience and technical skill) [9]. Also in 2018, METU and IYTE (Turkey) developed an SLR [14], whose research questions focus on size estimation methods such as Function Points (Simplified, NESMA, IFPUG, Cosmic), story points, use case points, Web Objects and User points; and challenges for agile software development such as misinterpretation of size measurements, difficulties in the application, acceptance of the measurement and estimation process. Likewise, in 2018, Soton (UK) and IMSIU (KSA) elaborated an SLR [15], on the estimation techniques, predictors, their precision, and efficiency used in agile development processes and contrast them with those used in mobile applications finding estimation techniques such as COSMIC FSM, Function Points, FiSMA FSM, Expert Judgment, Analog Estimation, Regression; additionally, they reviewed the cost directors, the development environments, the types of datasets used, and the development activities carried out.

The present research contributes by presenting an update of the effort estimation techniques and the most used metrics in ASD, in addition to the identification of more relevant problems.

III. Methodology

As mentioned in the Introduction, the main objective of this study is to review the current state of the literature on software estimation in SDA. The design of this document is based on the steps proposed by BTH [11], and for its development 4 research questions presented in the following subsection were formulated:

A. Research Questions

In response to the objective of this document, a review was made of the current knowledge state of the techniques or methods and metrics used for effort estimation of software in agile environments, as well as the problems or challenges present in its use. Therefore, four research questions presented in Table 2 were posed, where the motivation of each one is indicated:

Table 2.

Research Questions.

ID	Research Opportunities
ID	Question	Motivation
RQ1	How has the literature on software estimation (effort) in agile development evolved in the period between 2018 - 2022?	Identify the most cited, the periods of time in which they have been most reported, the types of works and their participants.
RQ2	What techniques (methods) have been used in estimating software in agile development in the period between 2018 - 2022?	Reveal the techniques that have been most explored and developed.
RQ3	What metrics have been used to measure accuracy in estimating effort in software development within an agile environment?	Reveal the metrics that have been used the most.
RQ4	What are the most relevant problems and their causes in the estimation of effort in software development within an agile environment?	Synthesize the challenges and drawbacks that have arisen.

Source: Authors.

B. Search

For the identification of keywords related to the study, the PICO strategy (Population, Intervention, Comparison and Outcomes) presented by KUSU and UoD (UK) is applied [16]. For this study, the strategy is composed of: Population: Studies that involve agile methodologies for software development. Intervention: techniques/methods of estimation of effort such as Planning Poker, User History Points, among others. Comparison: Compare different software estimation techniques applied to effort estimation in agile environments, issues, and challenges. Results: techniques, metrics, challenges, problems in estimating effort in agile development projects.

The study began with an exhaustive literature search in different databases identifying words that are aligned with the interest of the research and thus cover the largest number of articles related to the topic of interest. In Table 3 the keywords are presented:

Table 3.

Description of Keywords and Related Words.

Main Word	Related Words
Effort Estimation	Cost Estimation
Machine Learning	Artificial Intelligence
Planning Poker	Agile Development, Agile
Challenge	Difficulties, Failures, Issues, Gaps

Source: Authors.

This provides the necessary information to build the search query as follows: (“Effort Estimation” or “Cost Estimation”) and (“Machine Learning” or “Artificial Intelligence”) and (“Agile Development” or “Agile”) and (“Challenge” or “Difficulties” or “Failures” or “Issues” or “Gaps”). The search query was modified to meet the requirements of each database and to find only articles relevant to the study, the search was limited to include only articles published in the last four years (2018 to 2022). Table 4 shows the databases selected for the search process, the search string, and the obtained studies. A total of 708 articles were found which were analyzed according to the criteria, described in the following subsection.

Table 4.

Results obtained from the implementation of the search string on each database.

Database	Search String	No.
IEEE Xplore	(“Effort Estimation” OR “Cost Estimation”) AND (“Machine Learning” OR “Artificial Intelligence”) AND (“Agile Development” OR “Agile” OR “Scrum”) AND (“Challenge” OR “Difficulties” OR “Failures” OR “Issues” OR “Gaps”)	2
Springer Link	(“Effort Estimation” OR “Cost Estimation”) AND (“Machine Learning” OR “Artificial Intelligence”) AND (“Planning Poker” OR “Agile Development” OR “Agile” OR “Scrum”) AND (“Challenge” OR “Difficulties” OR “Failures” OR “Issues” OR “Gaps”)/	142
Web of Science (WOS)	ALL=((“Effort Estimation” OR “Cost Estimation”) AND (“Machine Learning” OR “Artificial Intelligence”) AND (“Planning Poker” OR “Agile Development” OR “Agile” OR “Scrum”) AND (“Challenge” OR “Difficulties” OR “Adversities” OR “Failures” OR “Issues” OR “Gaps”)))	5
Scopus	ALL ( ( “Effort Estimation” OR “Cost Estimation” ) AND ( “Machine Learning” OR “Artificial Intelligence” ) AND ( “Planning Poker” OR “Agile Development” OR “Agile” OR “Scrum” ) AND (“Challenge” OR “Difficulties” OR “Adversities” OR “Failures” OR “Issues” OR “Gaps”)) AND PUBYEAR > 2017	527
ACM Digital Libray	[[All: “effort estimation”] OR [All: “cost estimation”]] AND [[All: “machine learning”] OR [All: “artificial intelligence”]] AND [[All: “planning poker”] OR [All: “agile development”] OR [All: “agile”] OR [All: “scrum”]] AND [[All: “challenge”] OR [All: “difficulties”] OR [All: “failures”] OR [All: “issues”] OR [All: “gaps”]] AND [Publication Date: (01/01/2018 TO 12/31/2022)]	20

Source: Authors.

C. Selection of Studies and Quality Analysis

The exclusion criteria defined for the selection of articles were: CE1: The title, abstract and content is not related to the search query used. CE2: The study was not written in English. CE3: Studies focused on the estimation of the effort for maintenance, testing and analysis of failures in the field of software. EC4: The complete paper was not available.

The inclusion criteria applied were: CI1: Studies related to effort estimation in agile development. CI2: Papers published in scientific journals and conferences related to effort estimation in SDA. CI3: Studies published between January 2018 and April 2022.

The quality criteria defined were: CC1: The Content Validity Index (CVI) [20], [21], with 4 experts and a Likert scale with 4 options for 3 defined criteria: Relevance (1. Not significant, 2. Not relevant, 3. Relevant, 4. Very relevant), Clarity (1. Not clear, 2. Presents ambiguity, 3. Clear, 4. Very Clear), Specificity (1. Doubt, 2. Very general, 3. Specific, 4. According to what is required). CC2: The citation index of studies CI which is made up of the number of citations of the study so far over the number of years of publication of the study, where NC is the number of citations of the study and A is the number of years of the study so far and is calculated according to (1) , which presents an egalitarian environment in which recent articles are not penalized in a restrictive way.

CC3: Relationship Index with Research Questions or IRPI [22], is calculated according to (2), where NP is the number of research questions that relate to the study and TPI is the total number of research questions.

Table 5 presents the results of the search and selection process once the criteria mentioned above were applied.

Table 5.

Selection of Studies.

Database	Applying criteria				Selected	Final participation
Database	Excl.	Incl.	Qty.	Snow	Selected	Final participation
IEEE Xplore	2	2	2	0	2	8%
Springer Link	26	14	6	0	6	24%
Wos	4	3	1	4	1	4%
Scopus	70	26	14	8	15	60%
ACM	6	2	1	0	1	4%
Total	108	47	24	12	25	100%

Source: Authors.

Considering that the snowball sampling is based on the selection of papers according to their citations and references this procedure was applied to all selected studies with the help of the website Connected Papers (http://connectedpapers.com), which brings articles with their citations and references helping to identify additional studies of interest for this work.

D. Information Extraction

Is important to consider that all studies were provided identified from the beginning of the search process, Table 6 presents the format used for the extraction of the selected studies with its corresponding reference.

Table 6.

Data Extraction.

ID	Title	Year	Publication type	Questions
1	An Intelligent Recommender and Decision Support System (IRDSS) for Effective Management of Software Projects [23]	2020	Journal	RQ1, RQ2, RQ3, RQ4
2	A deep learning model for estimating story points [24]	2019	Journal	RQ1, RQ2, RQ3,
5	A Comparative Analysis on Effort Estimation for Agile and Non-agile Software Projects Using DBN-ALO [25]	2020	Journal	RQ1, RQ2, RQ 3,
6	An ensemble-based model for predicting agile software development effort [26]	2019	Journal	RQ1, RQ2, RQ3
8	A predictive model to estimate effort in a sprint using machine learning techniques [2]	2021	Journal	RQ1, RQ2, RQ3
12	Effort estimation in agile software development using experimental validation of neural network models [27]	2019	Journal	RQ1, RQ2, RQ3, RQ4
68	Efficient Approaches to Agile Cost Estimation in Software Industries: A Project-Based Case Study [28]	2021	Conference	RQ1, RQ2, RQ3, RQ4
83	Quality Requirements Challenges in the Context of Large-Scale Distributed Agile: An Empirical Study [29]	2018	Conference	RQ1, RQ4
147	Story Point-Based Effort Estimation Model with Machine Learning Techniques [30]	2020	Journal	RQ1, RQ2, RQ3
179	Effort Estimation in Agile Software Development: A Exploratory Study of Practitioners’ Perspective [6]	2022	Conference	RQ1, RQ2, RQ3, RQ4
241	Playing planning poker in crowds: Human computation of software effort estimates [31]	2021	Conference	RQ1, RQ4
246	A State of the Art Regressor Model’s comparison for Effort Estimation of Agile software [32]	2021	Conference	RQ1, RQ2, RQ3
339	Linear Regression Model for Agile Software Development Effort Estimation [33]	2020	Conference	RQ1, RQ2, RQ3
363	Extended Planning Poker: A Proposed Model [34]	2020	Conference	RQ1, RQ4
424	DevOPs project management tools for sprint planning, estimation and execution maturity [35]	2020	Journal	RQ1, RQ2, RQ3
529	Enhancing User-Stories Prioritization Process in Agile Environment [36]	2019	Conference	RQ1, RQ4
551	Effort prediction in agile software development with Bayesian networks [37]	2019	Conference	RQ1, RQ2, RQ3
554	An effort estimation support tool for agile software development: An empirical evaluation [38]	2019	Conference	RQ1, RQ2, RQ3, RQ4
566	Effort estimation in agile software development using evolutionary cost- sensitive deep Belief Network [39]	2019	Journal	RQ1, RQ2, RQ3
600	A Novel Hybrid ABC-PSO Algorithm for Effort Estimation of Software Projects Using Agile Methodologies [40]	2018	Journal	RQ1, RQ2, RQ3
610	Using developers’ features to estimate story points [41]	2018	Conference	RQ1, RQ4
629	Software process measurement and related challenges in agile software development: A multiple case study [42]	2018	Conference	RQ1, RQ2, RQ3, RQ4
640	An agile effort estimation based on story points using machine learning techniques [43]	2018	Conference	RQ1, RQ2, RQ3
655	Empirical Study on Commonly Used Combinations of Estimation Techniques in Software Development Planning [44]	2020	Journal	RQ1, RQ4
698	An Empirical Investigation of Effort Estimation in Mobile Apps Using Agile Development Process [45]	2019	Journal	RQ1, RQ4

Source: Authors.

E. Analysis and Classification

The studies were organized and analyzed from two perspectives: 1) A small bibliometric analysis on the year of publication, type of publication, keywords, authors, and their countries, and 2) the studies were grouped and analyzed to answer the RQ2 to RQ4, as can be seen in the Results section.

F. Validity Assessment

For this research, three types of validations have been carried out: 1) Descriptive Validation: The defined research protocol was rigorously applied to avoid biases in the selection of studies, its application was carried out and supervised by all the authors of the study, so that the selection and classification process was transparent and traceable by all the authors (Table 5). 2) Theoretical Validation: To avoid biases in the selection of articles, cross-review by the evaluators was implemented in each of the steps of the protocol. Likewise, to avoid the loss of studies, the search was carried out in five (5) databases with high scientific impact and the search chain was built considering keywords of interest and its relationship with similar words. Additionally, snowball sampling was carried out with the selected studies to identify lost studies from references and citations. 3) Generality: The research protocol was rigorously followed, which is based on the steps proposed by BTH [11], study that has been applied in a variety of mappings in the research area. 4) Interpretative Validation: The results obtained in the study are clear and represent a contribution in the topic of interest, additionally, they were supervised by all the authors, who have experience in the development of this type of studies and extensive experience in the subject matter of this study. 5) Repeatability: Section 3 and 4, present a detailed description of the process followed by its results and description of the applied actions to make the study valid. The defined process based on BTH [11], favors the repeatability of the study for future updates or expansion of its scope.

IV. Results

In this section, the answers to the constituted research questions are presented:

A. RQ1 How has the literature on software estimation (effort) in agile development evolved over time?

Of the 25 selected studies, a bibliometric analysis of their characteristics has been carried out, which focuses on year of publication, type of publication, geographical distribution by author and publisher and keywords. The Fig. 1 presents the distribution of selected studies considering whether they are journal articles or conferences and the year of their publication. 52% were published in conferences and 48% in journals.

Fig. 1. Studies published in journals and conferences by year.

Source: Authors.

Regarding the geographical distribution of the articles (Fig. 2), 25.8% of the authors are originally from India from a total of 31 countries and the rest of the authors are distributed almost homogeneously.

Fig. 2. Country distribution by author.

Source: Authors.

In Fig. 3, of the 24 publishers that publish the selected studies, the majority are from the United States, followed by Germany.

Fig. 3. Country distribution by publisher.

Source: Authors.

Fig. 4 shows the frequency of citations of the selected studies, study number 2 has the most quantity of citations and a large difference is observed compared to studies presented in the same year such as 6 and 8. This shows that the relevance of the most cited studies has remained counted compared to new published studies. It can also be observed that the highest number of citations are presented in studies of the Journal type with 108 citations compared to Conferences with 43.

Fig. 4. Citations by article.

Source: Authors.

Fig. 5 shows a keyword cloud, where the words that stand out the most are Effort Estimation with 12 repetitions, Agile Software Development with 7, Machine Learning with 5, Planning Poker, Agile and Software Effort Estimation with 4, all these words are within the search chain used, this confirms the relevance of the search process and its relevance to the research topic of this study.

Fig. 5. Word cloud with keywords.

Source: Authors.

B. RQ2 What techniques (methods) have been using in estimating effort in agile development?

As can be seen in Table 7, there are many studios that use Story Points (SP) in ASD projects.

Table 7.

Effort Estimation Techniques.

No.	Technique	Quantity	ID/[Ref]
1	History Points - SP	17	2 [24], 5 [25], 6 [26], 8 [2], 12 [27], 83 [29], 147 [30], 246 [32], 339 [33], 529 [36], 551 [37], 554 [38], 566 [39], 600 [40] , 610 [41], 629 [42], 640 [43]
2	Planning Poker - PP	7	1 [23], 68 [28], 241 [31], 363 [34], 424 [35], 655 [45], 698 [45]
3	Expert Judgment - EJ	4	1 [23], 68 [28], 655 [44], 698 [45]

Source: Authors.

The use of SP is reflected in more than 50% of selected studies being the main technique used for the estimation of effort, followed by PP mentioned in 7 articles and EJ with a total of 4. Additionally, several articles found in the literature take an approach to a better estimation using computational solutions or machine learning techniques, taking this into account the computational solutions proposed in Table 8 are also presented:

Table 8.

Computational Solutions.

No.	Technique	Quantity	Compared to	Studies
1	NB	2	Estimated values	424 [35], 551 [37]
2	KNN-Kmeans	1	Estimated values	1 [23]
3	LSTM-RHN-DR	1	Estimated values	2 [24]
4	DBN-ALO	1	DBN, PSO, FLANN, FA, RBFN, IFCM	5 [25]
5	MLP	1	KNN, LR, DT, SVR,	8 [2]
6	FNN	1	ENN	12 [27]
7	GBA	1	RFR, GBR, MLP	147 [30]
8	CatBoost	1	DT, LR, RF, AdaBoost, XGB	246 [32]
9	LR	1	SGB, RF, DT	339 [33]
10	German	1	N/A	554 [38]
11	ECS-DBN	1	FFPB-ENN, SVM-GLM	566 [39]
12	ABC-PSO	1	ABC, PSO	600 [40]
13	SVM	1	N/A	610 [41]
14	ANM	1	ANFIS, GRNN, RBFNs	640 [43]
Naïve Bayes (NB), K Nearest Neighbour (KNN), Long Short-Term Memory (LSTM), Decision Tree (DT), Particle Swarm Optimization (PSO), Deep Belief Network (DBN), Multilayer Perceptron (MLP), Gradient Boosting Algorithm (GBA), Categorical boosting (CatBoost), Linear Regression (LR), Evolutionary Cost-Sensitive Deep Belief Network (ECS-DBN), Artificial Bee Colony (ABC), Support Vector Machines (SVM), Recurrent Highway Net (RHN), Antlion Optimization Algorithm (ALO), Differentiable Regression (DR), Feedforward back-propagation Neural Network (FNN)

Source: Authors.

As can be seen in Table 8, the studies that presented a computational solution to improve the estimation of effort used different computational techniques based on machine learning and artificial intelligence with the use of Naive Bayes in two articles, the use of hybrid methods in 5 articles and the use of different types of neural networks in 4 articles.

C. RQ3 What metrics have been used to measure accuracy in estimating software in agile development?

Fig. 6 shows the percentage distribution of the most used precision measures in ASD, with the Mean Magnitude of Relative Error (MMRE) and the Prediction Evaluation n% (PRED (n)) being the most used, followed by the Mean Absolute Error (MAE).

Fig. 6. Accuracy measures.

Source: Authors.

D. RQ4 What are the most relevant problems and their causes in the estimation of effort in agile development?

Among the most relevant challenges for effort estimation, it has been found:

Since effort estimation in agile environments is mostly done by experts, as can be seen in previous studies [23], [24], [29], the suitability, experience and domain knowledge of the experts, can affect the estimation process in software development projects as seen in works in agreement between Pakistan Nigeria and Colombia [28]. Additionally, communication between estimators greatly affects the estimation process since there is a big number of variables necessary for a correct estimation process [29], [35], [36], not only in with environments that estimate effort based on expert judgment but also estimation based on machine learning techniques for example [40].
Even with the implementation of machine learning techniques for effort estimation there is always limitations on the amount of information available and the generality of the created model. As a tuned model can be used for one project but can become very difficult to apply on another [25], [27], [32], [38], this affects the process of testing validity of the model in different applications [34], [37].
In Planning Poker there may be a case of domination meaning that users with less experience will decide to follow the steps of the users who have more experience affecting the overall estimation process [28].
There is a lack of information for a good effort estimation process, the main reason for this lack of information is due to incorrect consolidation of needed information, required for an accurate estimation [42], [44].

V. Conclusions

This paper presents an exploration to identify effort techniques in ASD, metrics and problems carried out through systematic mapping that included studies that show how the estimation of effort in ASD has evolved on both the number of methodologies used and the new implementation of machine learning techniques both to propose new models to estimate more accurately the effort applied and to automate the estimation process.

Most of the studies implemented validations and comparisons of their results with various techniques or by using empirical data calculated as a result of the experience and performance of projects.

Regarding precision measures, we found that more studies are starting to use a specific type of measure for example MMRE and PRED(n), however there is still a need for comparison with other studies, as there is still not a process for a correct estimation of these type of models.

Among the problems identified, there is a deficit and ambiguity of information for the application of cased based effort estimation methodologies and the implementation of computational models for an accurate estimation process, this generates limitations in the improvement of current estimates.

The above highlights the changing and evolutionary nature of the work on this topic, at the same time as the relevance and seriousness that is being given to the topic, therefore, there is an opportunity to make proposals both to support traditional methods and to propose new ways to improve the accuracy of estimates in ASD projects by the implementation of machine learning techniques.

As future work, we plan to deepen the analysis elaborated in this mapping through an SLR as well as to extend the search chain in order to cover most of the new articles that are published on the subject and thus address in more detail the different challenges and solutions presented by effort estimation in software development projects, especially in agile environments.

Acknowledgements

We thank Universidad del Cauca Popayán, Colombia and Universidad Mohammed VI Polytechnic, Ben Guerir, Morocco.

References

[1] E. Mendes, Cost estimation techniques for web projects. HYS, PA: IGI Pub, 2007. https://doi.org/10.4018/978-1-59904-135-3

[2] M. Ramessur & S. Nagowah, “A predictive model to estimate effort in a sprint using machine learning techniques,” Int J Comput Sci Inf Technol, vol. 13, no. 7, pp. 1101–1110, Apr. 2021. https://doi.org/10.1007/s41870-021-00669-z

[3] R. Britto, E. Mendes & J. Borstler, “An Empirical Investigation on Effort Estimation in Agile Global Software Development,” presented at 10th International Conference on Global Software Engineering Workshops, ICGSEW, CR, ES, 13-16 Jul. 2015. https://doi.org/10.1109/ICGSE.2015.10

[4] S. Bilgaiyan, S. Mishra & M. Das, “A Review of Software Cost Estimation in Agile Software Development Using SoftComputing Techniques,” presented at International Conference on Computational Intelligence and Networks, CINE, BBSR, IN, 11-11 Jan. 2016. https://doi.org/10.1109/CINE.2016.27

[5] IEOM, Annual IEEE Computer Conference, International Conferenceon Industrial Engineering and Operations Management, IEOM, DXB, UAE, 3-5 March 2015. Available: https://ieomsociety.org/ieom/

[6] S. Rc, M. Sánchez-Gordón, R. Colomo-Palacios & M. Kristiansen, “Effort Estimation in Agile Software Development: AExploratory Study of Practitioners’ Perspective,” in LASD 2022: Lean and Agile Software Development, Przybyłek, A.,Jarzębowicz, A., Luković, I., Ng, Y. (Eds)., Cham, CH: Springer, 2022, vol. 428, pp. 136–149. https://doi.org/10.1007/978-3-030-94238-0_8

[7] H. Rastogi, S. Dhankhar & M. Kakkar, “A Survey on Software Effort Estimation Techniques,” presented at 5th International Conference - Confluence The Next Generation Information Technology Summit, Confluence, NOI, IN, 25-26 Sep. 2014. https://doi.org/10.1109/CONFLUENCE.2014.6949367

[8] P. Salvetto, “Modelos automatizables de estimación muy temprana del tiempo y esfuerzo de desarrollo de sistemas de información,” Tesis doctoral, Fac Inform, UPM, MAD, ES, 2004. Recuperado de https://oa.upm.es/367/1/PEDRO_SALVETTO_LEON.pdf

[9] E. Dantas, M. Perkusich, E. Dilorenzo, D. Santos, H. Almeida & A. Perkusich, “Effort Estimation in Agile Software Development: An Updated Review,” Int J Softw Eng Knowl Eng, vol. 28, no. 11–12, pp. 1811–1831, Nov. 2018. https://doi.org/10.1142/S0218194018400302

[10] B. Alsaadi & K. Saeedi, “Data-driven effort estimation techniques of agile user stories: a systematic literature review,” Artif Intell Rev, vol. 55, no. 7, pp. 5485–5516, Jan. 2022. https://doi.org/10.1007/s10462-021-10132-x

[11] K. Petersen, S. Vakkalanka & L. Kuzniarz, “Guidelines for conducting systematic mapping studies in software engineering: An update,” Inf Softw Technol, vol. 64, pp. 1–18, Aug. 2015. https://doi.org/10.1016/j.infsof.2015.03.007

[12] M. Fernández-Diego, E. Méndez, F. González-Ladrón-De-Guevara, S. Abrahão & E. Insfran, “An update on effort estimation in agile software development: A systematic literature review,” IEEE Access, vol. 8, pp. 166768–166800, Sep. 2020. https://doi.org/10.1109/ACCESS.2020.3021664

[13] M. Usman, E. Mendes, F. Weidt, & R. Britto, “Effort estimation in Agile Software Development: A systematic literature review,” presented at 10th International Conference on Predictive Models in Software Engineering, PROMISE '14, TO, IT, 17 sep. 2014. https://doi.org/10.1145/2639490.2639503

[14] T. Hacaloglu & O. Demirors, “Challenges of Using Software Size in Agile Software Development: A Systematic Literature Review,” presented at the Academic Papers at IWSM Mensura, IWSM-Mensura, BJ, CN, 19-20 Sep. 2018. Available: https://hdl.handle.net/11147/7045

[15] A. Altaleb & A. Gravell, “Effort Estimation across Mobile App Platforms using Agile Processes: A Systematic Literature Review,” JSW, vol. 13, no. 4, pp. 242–259, Apr. 2018. https://doi.org/10.17706/jsw.13.4.242-259

[16] B. Kitchenham & S. Charters, “Guidelines for Performing Systematic Literature Reviews in Software Engineering Version 2.3,” KUSU and UoD, Staf, UK, EBSE 2007-001 Tech Rep, 2007. Available from https://userpages.uni-koblenz.de/~laemmel/esecourse/slides/slr.pdf

[17] B. Kitchenman & D. Budgen, Evidence-Based Software Engineering and Systematic Reviews. BC RTN, FL, USA: CRC Press Taylor & Francis Group, 2015.

[18] K. Felizardo, E. Mendes, M. Kalinowski, E. Souza & N. Vijaykumar, “Using Forward Snowballing to update Systematic Reviews in Software Engineering,” presented at 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM '16, BC RTN, FL, USA, 8-9 Sep. 2016. https://doi.org/10.1145/2961111.2962630

[19] B. Kitchenman, O. Brereton, D. Budgen, M. Turner, J. Bailey & S. Linkman, “Systematic literature reviews in software engineering - A systematic literature review,” Inf Softw Technol, vol. 51, no. 1, pp. 7–15, Jan. 2009. https://doi.org/10.1016/j.infsof.2008.09.009

[20] F. Yaghmalef, “Content validity and its estimation,” JME, vol. 3, no. 1, pp. 25–27, Mar. 2003. Available: https://brieflands.com/articles/jme-105015.pdf

[21] E. Almanasreh, R. Moles & T. Chen, “Evaluation of methods used for estimating content validity,” Res Social Adm Pharm, vol. 15, no. 2, pp. 214–221, Feb. 2019. https://doi.org/10.1016/j.sapharm.2018.03.066

[22] E. Milian, M. de Spinola & M. de Carvalho, “Fintechs: A literature review and research agenda,” Electron Commer Res Appl, vol. 34, Feb. 2019. https://doi.org/10.1016/j.elerap.2019.100833

[23] M. Hamid, F. Zeshan, A. Ahmad, F. Ahmad, M. Hamza, Z. Khan, S. Munawar & H. Aljuaid, “An Intelligent Recommender and Decision Support System (IRDSS) for Effective Management of Software Projects,” IEEE Access, vol. 8, pp. 140752–140766, Jul. 2020. https://doi.org/10.1109/ACCESS.2020.3010968

[24] M. Choetkiertikul, H. Dam, T. Tran, T. Pham, A. Ghose & T. Menzies, “A Deep Learning Model for Estimating Story Points,” ITSE, vol. 45, no. 7, pp. 637–656, Jan. 2018. https://doi.org/10.1109/TSE.2018.2792473

[25] A. Kaushik, D. Tayal & K. Yadav, “A Comparative Analysis on Effort Estimation for Agile and Non-agile Software Projects Using DBN-ALO,” Arab J Sci Eng, vol. 45, no. 4, pp. 2605–2618, Nov. 2019. https://doi.org/10.1007/s13369-019-04250-6

[26] O. Malgonde & K. Chari, “An ensemble-based model for predicting agile software development effort,” Empir Softw Eng, vol. 24, no. 2, pp. 1017–1055, Apr. 2019. https://doi.org/10.1007/s10664-018-9647-0

[27] S. Bilgaiyan, S. Mishra & M. Das, “Effort estimation in agile software development using experimental validation of neural network models,” Int J Inf Technol, vol. 11, no. 3, pp. 569–573, Abr. 2018. https://doi.org/10.1007/s41870-018-0131-2

[28] S. Butt, S. Misra, J. Diaz-Martinez & F. De la Hoz, “Efficient Approaches to Agile Cost Estimation in Software Industries: A Project-Based Case Study,” presented at Information and Communication Technology and Applications, ICTA 2020, Cham, CH, 24-27 Nov. 2021. https://doi.org/10.1007/978-3-030-69143-1_49

[29] W. Alsaqaf, M. Daneva & R. Wieringa, “Quality requirements challenges in the context of large-scale distributed agile: An empirical study,” Inf Softw Technol, vol. 110, pp. 39–55, Mar. 2018. https://doi.org/10.1016/j.infsof.2019.01.009

[30] M. Gultekin & O. Kalipsiz, “Story Point-Based Effort Estimation Model with Machine Learning Techniques,” IJSEKE, vol. 30, no. 1, pp. 43–66, Jan. 2020. https://doi.org/10.1142/S0218194020500035

[31] M. Alhamed & T. Storer, “Playing Planning Poker in Crowds: Human Computation of Software Effort Estimates,” presented at 43 International Conference on Software Engineering, ICSE, MAD, ES, 22-30 May. 2021. https://doi.org/10.1109/ICSE43902.2021.00014

[32] M. Arora, A. Sharma, S. Katoch, M. Malviya & S. Chopra, “A State of the Art Regressor Model’s comparison for Effort Estimation of Agile software,” presented at 2nd International Conference on Intelligent Engineering and Management, ICIEM, LDN, UK, 28-30 Apr. 2021. https://doi.org/10.1109/ICIEM51511.2021.9445345

[33] A. Sharma & N. Chaudhary, “Linear Regression Model for Agile Software Development Effort Estimation,” presented at 5th IEEE International Conference on Recent Advances and Innovations in Engineering, ICRAIE, JAIP, IN, 1-3 Dec. 2020. https://doi.org/10.1109/ICRAIE51050.2020.9358309

[34] P. Sudarmaningtyas & R. Mohamed, “Extended Planning Poker: A Proposed Model,” presented at 7th International Conference on Information Technology, Computer, and Electrical Engineering, ICITACEE, SRG, ID, 24-25 Sep. 2020. https://doi.org/10.1109/ICITACEE50144.2020.9239165

[35] J. Angara, S. Prasad & G. Sridevi, “DevOPs project management tools for sprint planning, estimation and execution maturity,” Cybern Inf Technol, vol. 20, no. 2, pp. 79–92, Mar 2020. https://doi.org/10.2478/cait-2020-0018

[36] H. Sheemar & G. Kour, “Enhancing User-Stories Prioritization Process in Agile Environment,” presented at International Conference on Innovations in Control, Communication and Information Systems, ICICCI, GRT NOI, IN, 12-13 Aug. 2017. https://doi.org/10.1109/ICICCIS.2017.8660760

[37] L. Radu, “Effort prediction in agile software development with Bayesian networks,” presented at 14th International Conference on Software Technologies, ICSOFT, STBL, PT, 26-28 Jul. 2019. https://doi.org/10.5220/0007842802380245

[38] E. Dantas, A. Costa, M. Vinicius, M. Perkusich, H. Almeida & A. Perkusich, “An effort estimation supporttool for agile software development: An empirical evaluation,” presented at 31th International Conference on SoftwareEngineering and Knowledge Engineering, SEKE, LX, PT, 10-12 Jul. 2019. https://doi.org/10.18293/SEKE2019-141

[39] H. Premalatha & C. Srikrishna, “Effort estimation in agile software development using evolutionary cost- sensitive deep Belief Network,” Int J Intell Eng Syst, vol. 12, no. 2, pp. 261–269, Dec. 2018. https://doi.org/10.22266/IJIES2019.0430.25

[40] T. Khuat & M. Le, “A Novel Hybrid ABC-PSO Algorithm for Effort Estimation of Software Projects UsingAgile Methodologies,” JISYST, vol. 27, no. 3, pp. 489–506, Mar. 2017. https://doi.org/10.1515/jisys-2016-0294

[41] E. Scott & D. Pfahl, “Using developers’ features to estimate story points,” presented at InternationalConference on the Software and Systems Process, ICSSP'18, GBG, SE, 26-27 May. 2018. https://doi.org/10.1145/3202710.3203160

[42] P. Ram, P. Rodriguez & M. Oivo, “Software Process Measurement and Related Challenges in Agile SoftwareDevelopment: A Multiple Case Study,” presented at Intetnational Conference Product-Focused Software Process Improvement, PROFES, WOB, DE, 28-30 Nov. 2018. https://doi.org/10.1007/978-3-030-03673-7_20

[43] C. Prasada Rao, P. Siva Kumar, S. Rama Sree & J. Devi, “An agile effort estimation based on story points usingmachine learning techniques,” presented at 2nd International Conference on Computational Intelligence and Informatics, ICAI, HYD, IN, 22-23 Dec. 2018. https://doi.org/10.1007/978-981-10-8228-3_20

[44] A. Kialbekov, “Empirical Study on Commonly Used Combinations of Estimation Techniques in Software Development Planning,” presented at European Symposium on Software Engineering, ESSE '20, ROM, IT, 6-8 Nov. 2020. https://doi.org/10.1145/3393822.3432328

[45] A. Altaleb and A. Gravell, “An Empirical Investigation of Effort Estimation in Mobile Apps Using Agile Development Process,” JSW, vol. 14, no. 8, pp. 356–369, Jul. 2019. https://doi.org/10.17706/jsw.14.8.356-369

Camilo Andrés Piñeros Rodríguez. Universidad del Cauca (Popayán, Colombia). https://orcid.org/0000-0001-5925-9736

Luz Marina Sierra Martínez. Universidad del Cauca (Popayán, Colombia). https://orcid.org/0000-0003-3847-3324

Diego Hernán Peluffo Ordóñez. Universidad Mohammed VI Polytechnic Ben Guerir (Marruecos). https://orcid.org/0000-0002-9045-6997

Jimena Adriana Timana Peña. Universidad del Cauca (Popayán, Colombia). https://orcid.org/0000-0002-1587-534X

Effort Estimation in Agile Software Development: A Systematic Map Study

Estimación del Esfuerzo en el Desarrollo de Software Ágil: Mapeo Sistemático

DOI: http://doi.org/10.17981/ingecuc.19.1.2023.03

Scientific Research Article. Date received: 27/12/2021. Date aceptance: 22/12/2022.

Description of Related Works.

Source: Authors.

Research Questions.

Source: Authors.

Description of Keywords and Related Words.

Source: Authors.

Results obtained from the implementation of the search string on each database.

Source: Authors.

Selection of Studies.

Source: Authors.

Data Extraction.

Source: Authors.

Fig. 1. Studies published in journals and conferences by year.

Source: Authors.

Fig. 2. Country distribution by author.

Source: Authors.

Fig. 3. Country distribution by publisher.

Source: Authors.

Fig. 4. Citations by article.

Source: Authors.

Fig. 5. Word cloud with keywords.

Source: Authors.

Effort Estimation Techniques.

Source: Authors.

Computational Solutions.

Source: Authors.

Fig. 6. Accuracy measures.

Source: Authors.

.

© The author; licensee Universidad de la Costa - CUC.

INGE CUC vol. 19 no. 1, pp. 22-36. January - June, 2023

Barranquilla. ISSN 0122-6517 Impreso, ISSN 2382-4700 Online

.