Converting Epics/Stories into Pseudocode using Transformers

7 Jul 2024

Authors:

(1) Gaurav Kolhatkar, SCTR’s Pune Institute of Computer Technology, Pune, India (gauravk403@gmail.com);

(2) Akshit Madan, SCTR’s Pune Institute of Computer Technology, Pune, India (akmadan17@gmail.com);

(3) Nidhi Kowtal, SCTR’s Pune Institute of Computer Technology, Pune, India (kowtalnidhi@gmail.com);

(4) Satyajit Roy, SCTR’s Pune Institute of Computer Technology, Pune, India (satyajit12.roy@gmail.com).

Table of Links

Abstract and Introduction

Literature Survey

Methodology

Performance Analysis

Conclusion and References

Abstract—Converting user epics or stories into pseudocode or code is a time-consuming task that can take up a large portion of the time in an industrial project. In this paper, we present a methodology for generating pseudocode from an agile user story that describes a small piece of functionality, so as to reduce the overall time spent on the project. Pseudocode is a programming-language-agnostic representation of the steps involved in a computer program, which can be easily converted into any programming language. By leveraging Natural Language Processing, we want to simplify the development process in organizations that use the Agile Model of Software Development. Our methodology takes a problem described in English and divides the Text to Pseudocode conversion task into two stages or subtasks, each of which is treated as an individual machine translation task. Stage 1 is Text to Code Conversion and Stage 2 is Code to Pseudocode Conversion. We find that the CodeT5 model gives the best results in terms of BLEU score when trained separately on the two subtasks mentioned above. The BLEU score is a metric that measures the similarity between a machine-translated text and a set of reference translations.

Index Terms—Text to code generation, Code to Pseudocode generation, Transformers
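
The two ideas in the abstract, chaining two translation stages and scoring the output with BLEU, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `text_to_pseudocode` and `bleu` are hypothetical, the stage callables merely stand in for trained models such as CodeT5, and the BLEU variant shown is a simplified sentence-level one that assumes whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing.

```python
import math
from collections import Counter
from typing import Callable, List


def text_to_pseudocode(story: str,
                       text_to_code: Callable[[str], str],
                       code_to_pseudocode: Callable[[str], str]) -> str:
    """Chain Stage 1 (text to code) and Stage 2 (code to pseudocode)."""
    return code_to_pseudocode(text_to_code(story))


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU in [0, 1] (assumes whitespace tokens)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    # Hypothetical stand-ins for the two trained models (not real model calls).
    stage1 = lambda story: "print(n * n)"
    stage2 = lambda code: "print n multiplied by n"
    output = text_to_pseudocode("Print the square of n.", stage1, stage2)
    print(output, bleu(output, ["print n multiplied by n"]))
```

Passing the stages in as callables keeps the sketch independent of any particular model library; in practice each stage would wrap a separately trained sequence-to-sequence model, and an established BLEU implementation would normally be preferred over this hand-rolled one.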

INTRODUCTION

Efficiency of work is of the highest importance in modern organizations and businesses. A majority of workplaces today use the Agile Model for software development. Agile is a software development approach based on iterative development, wherein tasks are divided into smaller iterations or sprints. In Agile, project management tools such as Jira are used to document the user requirements in the form of epics or user stories. Developers need to understand these requirements and write code that implements them. However, a significant amount of development time and effort can be saved by automating the process of code/pseudocode generation, especially for simple or repetitive problems that have been solved before. The motivation of our research paper is to simplify the work of developers so that they can focus on more complex tasks and, in the process, to optimize the software development lifecycle.

Jira is a software application used for issue tracking and project management. It is widely used by agile development teams to track bugs, stories, epics, and other tasks. Epics are large bodies of work that can be broken down into a number of smaller tasks (called stories). Stories, also called “user stories,” are short requirements or requests written from the perspective of an end user. Our aim is to convert epics/stories to pseudocode.

Despite the advantages of the Agile Model, developing software may still be a difficult and drawn-out process, especially when it comes to translating user requirements into functional code. Developers often translate user epics or stories manually into code as part of this process, which can take a lot of time and work.

Our study intends to investigate the potential of utilising machine learning methods and natural language processing to automate the process of generating code and pseudocode from user stories in order to address this difficulty. By doing this, we hope to streamline developers’ tasks, enhance the software development lifecycle, and boost the effectiveness of the entire industrial project.

Recent advancements in the field of natural language processing have made it possible to automate a variety of formerly manual operations. Recent developments in deep learning, in particular, have made it possible to create sophisticated natural language models that can extract context and meaning from text input. By utilising these models, we can quickly and efficiently create code or pseudocode from user stories, relieving the pressure on developers.

Our study will examine a variety of currently used methodologies, including rule-based systems, statistical models, and deep learning techniques, for producing code or pseudocode from user stories. We will assess the benefits and drawbacks of each methodology and suggest a novel strategy that makes the most of their advantages.

Overall, our research paper aims to make software development more efficient and effective by automating the process of code/pseudocode generation from user stories. By doing so, we hope to free up developers to focus on more complex tasks, reduce the risk of errors, and ultimately deliver better software products to users.

This paper is available on arXiv under a CC 4.0 license.