Predicting how long a bug will take to resolve is more than a curiosity. It can improve planning, help teams meet SLAs, and give engineering leaders early warning about bottlenecks.

In this post, we'll walk through building a prediction microservice with FastAPI and XGBoost that estimates bug resolution time from Jira-like issue data.


🔧 Technologies Used:

  • 🐍 Python 3.10+
  • 🚀 FastAPI
  • 📦 XGBoost
  • 📄 Pandas + scikit-learn
  • 🐳 Docker (optional for deployment)

🧠 Why Predict Resolution Time?

  • ⏰ Set expectations across teams and stakeholders
  • 📈 Improve sprint forecasting
  • 🚨 Detect risks early in QA pipelines
  • 💡 Feed into auto-prioritization models

According to Atlassian, inaccurate estimates are one of the top causes of missed sprint goals. ML can help you move from guesswork to insight.

📁 Step 1: Prepare Your Dataset

Start with a CSV export from your issue tracker (like Jira):

Required Columns:

  • summary
  • description
  • priority
  • created
  • resolved

Calculate Resolution Time:

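import pandas as pd
df = pd.read_csv("jira_issues.csv")
# Parse timestamps; unparseable values become NaT instead of raising
df['created'] = pd.to_datetime(df['created'], errors='coerce')
df['resolved'] = pd.to_datetime(df['resolved'], errors='coerce')
# Resolution time in hours (NaN for issues that are still open)
df['resolution_time_hours'] = (df['resolved'] - df['created']).dt.total_seconds() / 3600
# Keep resolved issues with a positive duration (NaN > 0 is False, so open issues drop out)
df = df[df['resolution_time_hours'] > 0]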
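Issue-tracker exports don't always use these exact column names (Jira's CSV headers, for example, may be capitalized). The sketch below normalizes headers and fails early if a required column is missing. It assumes the five columns above exist under some capitalization; `load_issues` and `REQUIRED_COLUMNS` are hypothetical names, not part of any library.

```python
import io

import pandas as pd

# Hypothetical constant: the columns Step 1 relies on
REQUIRED_COLUMNS = ["summary", "description", "priority", "created", "resolved"]


def load_issues(source) -> pd.DataFrame:
    """Load an issue export and compute resolution time in hours.

    `source` can be a file path or any file-like object accepted by pd.read_csv.
    """
    df = pd.read_csv(source)
    # Normalize headers such as "Summary " -> "summary"
    df.columns = [str(c).strip().lower() for c in df.columns]

    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Export is missing required columns: {missing}")

    df["created"] = pd.to_datetime(df["created"], errors="coerce")
    df["resolved"] = pd.to_datetime(df["resolved"], errors="coerce")
    df["resolution_time_hours"] = (
        (df["resolved"] - df["created"]).dt.total_seconds() / 3600
    )
    # Unresolved issues have NaN durations, which fail the > 0 test and are dropped
    return df[df["resolution_time_hours"] > 0].reset_index(drop=True)


# Usage with an in-memory CSV (one resolved issue, one still open)
sample = io.StringIO(
    "Summary,Description,Priority,Created,Resolved\n"
    "Login fails,500 on submit,High,2024-01-01 09:00,2024-01-01 21:00\n"
    "Typo in footer,,Low,2024-01-02 10:00,\n"
)
issues = load_issues(sample)
print(issues[["summary", "resolution_time_hours"]])
```

In practice you would call `load_issues("jira_issues.csv")` and continue with Step 2 on the result.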

🔍 Step 2: Feature Engineering

from sklearn.preprocessing import LabelEncoder
# Text length as a rough proxy for issue complexity
df['text'] = df['summary'].fillna('') + ' ' + df['description'].fillna('')
df['text_len'] = df['text'].str.len()
# Encode priority as integers (missing values become their own "Unknown" class)
df['priority'] = df['priority'].fillna('Unknown')
le = LabelEncoder()
df['priority_encoded'] = le.fit_transform(df['priority'])
# Final features and target
features = ['text_len', 'priority_encoded']
target = 'resolution_time_hours'
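`LabelEncoder` assigns codes in alphabetical order. For Jira's default priority names that gives High=0, Highest=1, Low=2, Lowest=3, Medium=4, so the numbers don't follow severity. Its `transform` also raises an error on any priority it didn't see during training. A tree model can still split on these codes, but an explicit ordinal map is easier to reason about. The sketch below assumes Jira's default five-level scheme; `PRIORITY_ORDER`, `UNKNOWN_PRIORITY`, and `encode_priority` are hypothetical names, and the `-1` fallback for unknown values is an arbitrary choice.

```python
import pandas as pd

# Assumed ordering for Jira's default priority scheme (adjust to your instance)
PRIORITY_ORDER = {"Lowest": 0, "Low": 1, "Medium": 2, "High": 3, "Highest": 4}
UNKNOWN_PRIORITY = -1  # arbitrary code for missing or custom priorities


def encode_priority(value) -> int:
    """Map a priority label to an ordinal code; unknown labels get UNKNOWN_PRIORITY."""
    if not isinstance(value, str):
        return UNKNOWN_PRIORITY  # e.g. NaN/None from an empty CSV cell
    return PRIORITY_ORDER.get(value.strip(), UNKNOWN_PRIORITY)


df_demo = pd.DataFrame({"priority": ["High", "Lowest", None, "Blocker"]})
df_demo["priority_encoded"] = df_demo["priority"].map(encode_priority)
print(df_demo["priority_encoded"].tolist())  # [3, 0, -1, -1]
```

Because `encode_priority` is a plain function, the API can call it directly, and training and serving then share one encoding without pickling an encoder.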

⚙️ Step 3: Train the XGBoost Model

from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
X = df[features]
y = df[target]
# Hold out 20% of issues to measure error on data the model hasn't seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBRegressor(n_estimators=100, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Mean absolute error, in hours
print("MAE (hours):", mean_absolute_error(y_test, y_pred))
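An MAE figure is easier to judge next to a naive baseline. This sketch scores a "model" that always predicts the training-set median; `median_baseline_mae` is a hypothetical helper, and you would pass it the same `y_train` and `y_test` produced by the split above.

```python
import numpy as np


def median_baseline_mae(y_train, y_test) -> float:
    """MAE of a constant predictor that always returns the training median."""
    constant = float(np.median(y_train))
    return float(np.mean(np.abs(np.asarray(y_test, dtype=float) - constant)))


# Toy example: resolution times are often skewed by a few long-running bugs
y_train_demo = [1.0, 2.0, 3.0, 100.0]
y_test_demo = [2.0, 4.0]
print(median_baseline_mae(y_train_demo, y_test_demo))  # 1.0
```

The median is used rather than the mean because it is the constant that minimizes absolute error. If the XGBoost MAE is not clearly lower than this baseline, the two features are not adding much signal yet.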

📦 Step 4: Package It as a FastAPI Microservice

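Install FastAPI and the libraries the API needs at runtime to load the model and build its input:

pip install fastapi uvicorn joblib xgboost scikit-learn pandas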

Save the trained model and the label encoder:

import joblib
joblib.dump(model, "xgb_model.pkl")
joblib.dump(le, "label_encoder.pkl")
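Saving the model and encoder as separate files works, but they can drift apart if one is retrained without the other. One option, sketched here under the assumption that you control both the training script and the API, is to store them together with the feature order in a single joblib file. `save_bundle`, `load_bundle`, and the `model_bundle.pkl` filename are hypothetical.

```python
import os
import tempfile

import joblib


def save_bundle(path, model, encoder, features):
    """Persist everything the API needs to rebuild features and predict."""
    joblib.dump({"model": model, "encoder": encoder, "features": list(features)}, path)


def load_bundle(path):
    """Load a bundle and fail fast if it isn't in the expected shape."""
    bundle = joblib.load(path)
    missing = {"model", "encoder", "features"} - set(bundle)
    if missing:
        raise ValueError(f"Bundle is missing keys: {sorted(missing)}")
    return bundle


# Round-trip demo with stand-in objects (use your trained model and encoder in practice)
path = os.path.join(tempfile.mkdtemp(), "model_bundle.pkl")
save_bundle(path, model={"kind": "stand-in"}, encoder=None,
            features=["text_len", "priority_encoded"])
bundle = load_bundle(path)
print(bundle["features"])  # ['text_len', 'priority_encoded']
```

With real objects you would call `save_bundle("model_bundle.pkl", model, le, features)` after training and `load_bundle` once at API startup.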

Create app.py:

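from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import pandas as pd
app = FastAPI()
# Load the trained model and priority encoder once, at startup
model = joblib.load("xgb_model.pkl")
le = joblib.load("label_encoder.pkl")
class BugInput(BaseModel):
    summary: str
    description: str = ""
    priority: str
@app.post("/predict-resolution-time")
def predict_resolution_time(bug: BugInput):
    # Rebuild the training features: combined text length + encoded priority
    text = bug.summary + " " + bug.description
    if bug.priority not in le.classes_:
        # LabelEncoder.transform raises on unseen labels; return 422 instead of a 500
        raise HTTPException(status_code=422, detail=f"Unknown priority: {bug.priority}")
    priority_encoded = le.transform([bug.priority])[0]
    # Use the same column names as the training DataFrame
    features = pd.DataFrame([[len(text), priority_encoded]], columns=["text_len", "priority_encoded"])
    # Cast the NumPy float to a Python float so FastAPI can serialize it
    pred_hours = float(model.predict(features)[0])
    return {
        "predicted_resolution_time_hours": round(pred_hours, 2)
    }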
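The API only returns sensible predictions if it rebuilds features exactly the way training did. One way to keep the two in sync is a single helper imported by both the training script and `app.py`. This sketch assumes a hypothetical shared module; `build_features` and `FEATURE_COLUMNS` are illustrative names that reuse the two features from Step 2.

```python
import pandas as pd

# Must match the column order used when the model was trained (Step 2)
FEATURE_COLUMNS = ["text_len", "priority_encoded"]


def build_features(summary, description, priority_code) -> pd.DataFrame:
    """Return a one-row frame with the same features and column names as training."""
    text = (summary or "") + " " + (description or "")
    row = {"text_len": len(text), "priority_encoded": priority_code}
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)


# In app.py this could replace the inline feature code, e.g.:
#   X = build_features(bug.summary, bug.description, le.transform([bug.priority])[0])
#   pred_hours = float(model.predict(X)[0])
X_demo = build_features("API failure when clicking Save", None, 3)
print(list(X_demo.columns), X_demo.iloc[0].tolist())  # ['text_len', 'priority_encoded'] [31, 3]
```

Treating a missing description as an empty string mirrors the `fillna('')` used during training, so both paths produce the same `text_len`.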

🧪 Step 5: Test It

Run the API:

uvicorn app:app --reload

Test using curl or Swagger UI:

curl -X POST "http://localhost:8000/predict-resolution-time" \
-H "Content-Type: application/json" \
-d '{"summary": "API failure when clicking Save", "description": "Internal server error with stack trace in logs", "priority": "High"}'
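The same request can be sent from Python with only the standard library, which is handy for scripted smoke tests. This sketch assumes the server from the previous step is listening on `localhost:8000`; `build_request` and `predict_resolution_time` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict-resolution-time"


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl command above."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_resolution_time(payload: dict, url: str = API_URL) -> float:
    """Send the request and return the predicted hours (requires the API to be running)."""
    with urllib.request.urlopen(build_request(payload, url), timeout=10) as resp:
        return json.load(resp)["predicted_resolution_time_hours"]


bug = {
    "summary": "API failure when clicking Save",
    "description": "Internal server error with stack trace in logs",
    "priority": "High",
}
req = build_request(bug)
print(req.get_method(), req.full_url)
# With the server running: print(predict_resolution_time(bug))
```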

✅ You'll get a JSON response like the one below (the exact value depends on your training data):

{
  "predicted_resolution_time_hours": 14.26
}

🧪 Optional: Dockerize It

Here’s a sample Dockerfile:

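FROM python:3.11
WORKDIR /app
# Copies app.py along with xgb_model.pkl and label_encoder.pkl
COPY . .
# Use the same library versions you trained with; pickled models may not load across versions
RUN pip install fastapi uvicorn scikit-learn xgboost joblib pandas
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]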

🧠 Real-World Use

  • GitHub Engineering experimented with ML to predict issue completion time.
  • Microsoft Research showed how SVM and gradient boosting models could reduce issue estimation error by 36%.
  • At Bugflows, we’ve trained domain-specific models that integrate priority, issue type, and historical bug complexity to forecast accurate timelines with <20% average deviation.

🔮 What’s Next?

  • Integrate sprint-level metadata (e.g., workload, velocity)
  • Use text embeddings (BERT) for better modeling
  • Push predictions to Slack or Jira using webhooks
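For the Slack idea, Slack's incoming webhooks accept a JSON body with a `text` field, posted to a webhook URL you generate in your workspace. A minimal sketch: the webhook URL is a placeholder, and the 24-hour threshold and the `format_alert`/`post_to_slack` helpers are assumptions, not part of the service above.

```python
import json
import urllib.request

# Placeholder: create a real incoming-webhook URL in your Slack workspace
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"
ALERT_THRESHOLD_HOURS = 24.0  # assumed cutoff for "slow" bugs


def format_alert(issue_key: str, predicted_hours: float):
    """Return a Slack message payload, or None if the prediction is under the threshold."""
    if predicted_hours < ALERT_THRESHOLD_HOURS:
        return None
    return {"text": f"{issue_key}: predicted resolution time {predicted_hours:.1f} h"}


def post_to_slack(payload: dict, url: str = SLACK_WEBHOOK_URL) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


print(format_alert("BUG-123", 14.26))  # None
print(format_alert("BUG-456", 40.0))  # {'text': 'BUG-456: predicted resolution time 40.0 h'}
```

Keeping the formatting separate from the HTTP call makes the alert logic testable without a live webhook.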

📬 TL;DR

  • ✅ Predict resolution time for bugs
  • ✅ Train on your own historical issue data
  • ✅ Deploy as a real-time microservice

🚀 Want to Skip the Setup?

Bugflows offers plug-and-play models like this out of the box: just connect your Jira or GitHub, and you're ready to go.

👉 Try Bugflows | Book a Demo