This guide is designed to help you troubleshoot issues in your CircleCI self-hosted server environment. Follow the step-by-step instructions to gather the necessary diagnostic information.
🚨 Important: Timely Log Collection 🚨
Collecting a support bundle is crucial and time-sensitive for all issues. CircleCI logs are retained for a limited time, and log rotation may cause critical information to be lost.
⚠️ Create support bundles within 10 minutes of the issue occurring. Otherwise relevant logs may be lost, which can make diagnosis extremely difficult.
Quick Reference
Issue Type | Essential Logs | Commands |
Docker Executor Issues (Job Delay, Infra Fail) | Support bundle + Nomad alloc logs | |
Machine Executor Issues (Job Delay, Infra Fail) | Support bundle + Journalctl logs | |
Permission Problems | Support bundle + AWS error messages | |
CircleCI API Connection Issues | Support bundle + API request logs with verbose output (-vvv) | |
Custom Integration Issues | Support bundle + Integration logs | |
Initial Diagnostics
Support Bundle Collection (REQUIRED)
For all issues, start by collecting a support bundle:
kubectl support-bundle https://raw.githubusercontent.com/CircleCI-Public/server-scripts/main/support/support-bundle.yaml -n circleci-server
Important Notes:
If you receive a timeout error with RabbitMQ, the bundle may still contain valuable information.
Run this command as soon as possible after observing an issue. If more than 10 minutes have already passed, rerun the job or reproduce the issue, then collect the bundle so that the relevant logs are captured.
Include the job ID and the timestamp of when the issue occurred when submitting to support.
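The timing guidance above can be turned into a quick check. The following is a minimal sketch, assuming timestamps are Unix epoch seconds; the function name within_bundle_window and the example timestamps are hypothetical and not part of any CircleCI tooling.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a support bundle can still be collected inside the
# 10-minute window this guide recommends.
# Assumption: timestamps are Unix epoch seconds; the helper name is hypothetical.

BUNDLE_WINDOW_SECONDS=600  # 10 minutes

# within_bundle_window INCIDENT_EPOCH [NOW_EPOCH]
# Returns 0 if NOW is at most 10 minutes after the incident, 1 otherwise.
within_bundle_window() {
  local incident="$1"
  local now="${2:-$(date +%s)}"
  local elapsed=$(( now - incident ))
  [ "$elapsed" -ge 0 ] && [ "$elapsed" -le "$BUNDLE_WINDOW_SECONDS" ]
}

# Example: incident at 1700000000, checked 5 minutes (300 seconds) later
if within_bundle_window 1700000000 1700000300; then
  echo "collect the bundle now"
else
  echo "window passed: reproduce the issue, then collect"
fi
```

If the check fails, the safer path is usually to reproduce the issue first and collect the bundle immediately afterwards.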
Retrieving Job Details (IMPORTANT)
To get complete job information for troubleshooting, collect the job details from the API:
# Replace the domain, organization, project, and job number with your own values
curl -H "Circle-Token:${CIRCLE_TOKEN}" -s "https://[REDACTED-COMPANY].net/api/v1.1/project/github/organization/project/[JOB_NUM]" | tee job-details.json
This will provide essential information like:
- Step timing details
- Build parameters
- Start and completion times
- Job history
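To avoid hand-editing the URL for each job, the request can be wrapped in small helpers. This is a sketch only: circleci.example.com, my-org, my-project, and 1234 are placeholder values, and job_details_url and fetch_job_details are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: build the v1.1 job-details URL from its parts.
# Assumptions: all host, organization, project, and job values are placeholders;
# the helper names are hypothetical.

# job_details_url HOST VCS ORG PROJECT JOB_NUM
job_details_url() {
  printf 'https://%s/api/v1.1/project/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# fetch_job_details HOST VCS ORG PROJECT JOB_NUM
# Requires CIRCLE_TOKEN in the environment; saves the response as job-<num>-details.json.
fetch_job_details() {
  : "${CIRCLE_TOKEN:?set CIRCLE_TOKEN to an API token first}"
  curl -sS -H "Circle-Token: ${CIRCLE_TOKEN}" "$(job_details_url "$@")" \
    | tee "job-${5}-details.json"
}

job_details_url circleci.example.com github my-org my-project 1234
# prints https://circleci.example.com/api/v1.1/project/github/my-org/my-project/1234
```

Naming the output file after the job number keeps details from several jobs apart when more than one is attached to a ticket.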
Docker Executor Issues
If experiencing delays between job steps or infrastructure failures:
Collect a support bundle immediately (see above).
Get the specific job ID from the CircleCI UI or API.
Check the Nomad allocation:
kubectl exec -it $(kubectl get pods -l app=nomad-server -n circleci-server -o name | head -1) -n circleci-server -- nomad status <job-id>
Critically important: Examine allocation logs for the specific job:
kubectl exec -it $(kubectl get pods -l app=nomad-server -n circleci-server -o name | head -1) -n circleci-server -- nomad alloc logs -stderr <allocation-id>
For comprehensive logging of all running jobs and containers, use this script to collect detailed information:
#!/bin/bash
mkdir -p ba-logs
nomad_server_pod_name=$(kubectl get pods -l app=nomad-server -n circleci-server -o jsonpath='{.items[0].metadata.name}')
while :; do
  # Walk every job currently known to Nomad (skip the header row)
  kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad status | tail -n +2 | awk '{ print $1 }' | while read -r job; do
    date=$(date +%s)
    mkdir -p "ba-logs/${date}/${job}"
    # shellcheck disable=SC2024
    kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad status "${job}" &> "ba-logs/${date}/${job}/status.txt"
    # shellcheck disable=SC2024
    kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad logs -stderr -job "${job}" &> "ba-logs/${date}/${job}/stderr.txt"
    # For each allocation of the job, record its containers and their logs
    kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad status "${job}" | tail -n +18 | awk "{ print \$1 }" | while read -r job_alloc; do
      kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad alloc exec "${job_alloc}" docker ps -a &> "ba-logs/${date}/${job}/docker-ps.txt"
      kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad alloc exec "${job_alloc}" docker ps -a | tail -n +2 | awk "{ print \$1 }" | while read -r containerid; do
        kubectl exec "$nomad_server_pod_name" -n circleci-server -- nomad alloc exec "${job_alloc}" docker logs "$containerid" &> "ba-logs/${date}/${job}/${containerid}.txt"
      done
    done
  done
  # Remove log files older than one day, then any directories left empty
  find ba-logs -type f -mtime +1 -exec rm {} \;
  find ba-logs -mindepth 1 -type d -exec bash -c 'rmdir "$1" &> /dev/null || true' shell {} \;
  echo "..."
  sleep 1
done
Note: Remember to change the namespace in the script from circleci-server to the namespace used in your environment.
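The collection script locates allocation rows with a fixed line offset (tail -n +18), which can break if the layout of the nomad status output shifts. As an alternative, the sketch below finds the allocation table by its section name. It assumes the output contains a line reading "Allocations", followed by a header row starting with "ID" and then one row per allocation; verify this against your Nomad version. The sample text is illustrative and abbreviated, not captured from a real cluster, and alloc_ids is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract allocation IDs from `nomad status <job-id>` output by section name.
# Assumption: an "Allocations" line, then a header row starting with "ID",
# then one row per allocation. Check this against your Nomad version.

# alloc_ids: reads status output on stdin, prints one allocation ID per line.
alloc_ids() {
  awk '
    $1 == "Allocations"     { in_allocs = 1; next }  # start of the section
    in_allocs && $1 == "ID" { next }                 # skip the column header
    in_allocs && NF == 0    { in_allocs = 0 }        # a blank line ends the section
    in_allocs               { print $1 }
  '
}

# Illustrative, abbreviated sample (not real cluster output)
sample_status='ID            = example-job
Status        = running

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
a1b2c3d4  e5f6a7b8  group       0        run      running  1m ago   1m ago
0f9e8d7c  e5f6a7b8  group       0        run      running  2m ago   2m ago'

printf '%s\n' "$sample_status" | alloc_ids
```

In practice the input would come from the kubectl exec ... nomad status <job-id> command shown earlier, piped into alloc_ids.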
Machine Executor Issues
For issues with machine executors:
Collect a support bundle immediately (within 10 minutes of the issue).
Add the following step to your CircleCI configuration to capture system logs during job execution:
jobs:
  your-job-name:
    machine: true
    steps:
      # Your regular job steps here
      # Add this step to capture system logs
      - run:
          name: Retrieve system logs
          command: journalctl --no-pager -f
          background: true
          when: always
This will ensure system logs are captured regardless of whether the job succeeds or fails.
Check machine provisioner logs:
kubectl logs -l app=machine-provisioner-provisioner -n circleci-server > machine-provisioner-logs.txt
Look for resource constraints or network connectivity issues in the logs:
Disk space errors: No space left on device
Network timeouts: Connection timed out
AWS permission errors: UnauthorizedOperation
Resource allocation issues: Cannot allocate memory
For EC2 instance issues, check AWS permission and authorization errors (see AWS Permission Issues).
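The error signatures listed in this section can be searched for in one pass once the logs are saved to a file. The following is a minimal sketch, assuming the logs were written to a plain-text file such as machine-provisioner-logs.txt; scan_provisioner_log is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count lines matching each error signature named in this guide.
# Assumption: the log is a plain-text file; the helper name is hypothetical.

# scan_provisioner_log FILE
# Prints "<count><TAB><pattern>" for each known signature.
scan_provisioner_log() {
  local file="$1" pattern count
  for pattern in \
    "No space left on device" \
    "Connection timed out" \
    "UnauthorizedOperation" \
    "Cannot allocate memory"; do
    # grep -c prints the number of matching lines and exits 1 when there are none
    count=$(grep -c -F -- "$pattern" "$file" || true)
    printf '%s\t%s\n' "$count" "$pattern"
  done
}

# Usage, after saving the logs as shown above:
#   scan_provisioner_log machine-provisioner-logs.txt
```

A non-zero count only points to where to look. The surrounding log lines still need to be read to understand the cause.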
AWS Permission Issues
When encountering AWS errors:
Collect a support bundle immediately (within 10 minutes of error occurrence).
Extract the encoded authorization message from logs.
Decode the message:
aws sts decode-authorization-message --encoded-message "<encoded_message>"
Check AWS CloudTrail logs for denied actions (highly recommended):
We recommend reviewing CloudTrail in the AWS console, but the following commands can also provide more information: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html
# Search CloudTrail logs for errors related to the IAM role
aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=<username> --max-items 100
# Filter CloudTrail for specific error events
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances --max-items 100
# Check for specific errors in recent CloudTrail logs
aws cloudtrail lookup-events --start-time $(date -u -d "1 hour ago" +"%Y-%m-%dT%H:%M:%SZ") --query "Events[?contains(CloudTrailEvent, 'errorCode') || contains(CloudTrailEvent, 'errorMessage')]"
Verify the relevant IAM policies for:
Resource access (check account IDs in ARNs)
Service control policies (look for explicit denies)
Cross-account access permissions
Correct region specification in resource ARNs
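Checking account IDs and regions in ARNs by eye is error-prone, so a small helper can split the ARN and compare the fields. This is a sketch under stated assumptions: the ARN, account ID, and instance ID below are made-up examples, and arn_region, arn_account, and check_arn are hypothetical helper names. It relies on the general ARN layout arn:partition:service:region:account-id:resource. Some services leave the region or account field empty, in which case the comparison reports a mismatch.

```shell
#!/usr/bin/env bash
# Sketch: compare the region and account ID in an ARN with expected values.
# ARN layout: arn:partition:service:region:account-id:resource
# Assumption: the example ARN and account ID are made up; helper names are hypothetical.

arn_region()  { printf '%s\n' "$1" | cut -d: -f4; }
arn_account() { printf '%s\n' "$1" | cut -d: -f5; }

# check_arn ARN EXPECTED_REGION EXPECTED_ACCOUNT
# Prints each mismatch and returns 1 if any field differs.
check_arn() {
  local arn="$1" region="$2" account="$3" status=0
  if [ "$(arn_region "$arn")" != "$region" ]; then
    echo "region mismatch: got '$(arn_region "$arn")', expected '$region'"
    status=1
  fi
  if [ "$(arn_account "$arn")" != "$account" ]; then
    echo "account mismatch: got '$(arn_account "$arn")', expected '$account'"
    status=1
  fi
  return "$status"
}

check_arn "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0" us-east-1 123456789012 \
  && echo "ARN matches expected region and account"
```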
API Connection Issues
If experiencing issues with API connections:
Collect a support bundle immediately (within 10 minutes of the issue).
Capture API response and request details with verbose output:
# For curl commands, add the -vvv flag to see detailed request/response information
curl -vvv -X POST "https://[REDACTED-COMPANY].net/api/v2/workflow/approve/[BUILD_NUM]"
Check Nginx logs for API-related errors:
kubectl logs -l app=nginx -n circleci-server --tail=500 > nginx-logs.txt
Look for specific HTTP response codes and timing:
404 responses might indicate the job is not yet ready.
403 responses might indicate permission issues.
Slow responses (>1s) might indicate backend processing delays.
For webhook or approval timing issues, capture timestamps of:
Job completion events in logs.
API call attempts.
Webhook delivery attempts.
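The response-code and timing hints above can be applied mechanically to a status code and total request time. The sketch below assumes those two values were captured with curl's --write-out variables %{http_code} and %{time_total}; classify_response is a hypothetical helper name, and its messages simply restate the hints from this section.

```shell
#!/usr/bin/env bash
# Sketch: map an HTTP status code and total request time to the hints in this guide.
# Assumption: the helper name is hypothetical; the 1-second threshold comes from the list above.

# classify_response HTTP_CODE TOTAL_SECONDS
classify_response() {
  local code="$1" seconds="$2"
  case "$code" in
    404) echo "404: job may not be ready yet" ;;
    403) echo "403: possible permission issue" ;;
    *)
      # awk handles the fractional comparison; exit status 0 means slower than 1s
      if awk -v t="$seconds" 'BEGIN { exit !(t + 0 > 1) }'; then
        echo "$code: slow response (${seconds}s), possible backend delay"
      else
        echo "$code: no issue flagged"
      fi
      ;;
  esac
}

# Capturing the inputs from a real request (URL and token are yours to supply):
#   read -r code seconds < <(curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
#     -H "Circle-Token: ${CIRCLE_TOKEN}" "$url")
#   classify_response "$code" "$seconds"

classify_response 404 0.12
classify_response 200 1.8
```

Recording the classification together with a timestamp for each attempt makes it easier to line up API calls with job completion events later.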
Custom Integration Issues
For issues with custom integrations (GitHub Enterprise, proxy setups, etc.):
Collect a support bundle immediately (within 10 minutes of the issue).
For GitHub Enterprise integration issues:
Capture GitHub webhook delivery logs (from GitHub Enterprise UI)
Check TLS certificate configuration
Verify network connectivity between CircleCI and GitHub Enterprise
For proxy integrations:
Collect complete request/response cycles including headers
Log both incoming and outgoing payloads (if possible)
Verify that signatures and headers are preserved through the proxy
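One way to check that a header survives the proxy is to capture the request headers on both sides and compare a single header's value. This is a minimal sketch, assuming each capture is a text file with one "Name: value" header per line; the file names are placeholders, header_value and header_preserved are hypothetical helper names, and X-Hub-Signature-256 is used as the example because it is the header GitHub uses to sign webhook payloads.

```shell
#!/usr/bin/env bash
# Sketch: confirm a header has the same value before and after a proxy.
# Assumptions: each capture file holds one "Name: value" header per line;
# file names and helper names are hypothetical.

# header_value FILE HEADER_NAME
# Prints the value of the first header whose name matches (case-insensitive).
header_value() {
  awk -v name="$2" '
    BEGIN { name = tolower(name) }
    {
      line = $0
      sub(/\r$/, "", line)                # tolerate CRLF line endings
      idx = index(line, ":")
      if (idx == 0) next                  # not a header line
      if (tolower(substr(line, 1, idx - 1)) == name) {
        value = substr(line, idx + 1)
        sub(/^[ \t]+/, "", value)         # trim leading whitespace
        print value
        exit
      }
    }
  ' "$1"
}

# header_preserved BEFORE_FILE AFTER_FILE HEADER_NAME
# Returns 0 only if the header exists upstream and is identical downstream.
header_preserved() {
  local before after
  before=$(header_value "$1" "$3")
  after=$(header_value "$2" "$3")
  [ -n "$before" ] && [ "$before" = "$after" ]
}

# Usage with two capture files:
#   header_preserved before-proxy.txt after-proxy.txt X-Hub-Signature-256 \
#     && echo "signature header preserved"
```

Header names are compared case-insensitively because proxies commonly change their capitalization, while the value must match exactly for a signature to remain valid.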
Contacting Support
When submitting a ticket, please include:
The support bundle (collected within 10 minutes of the issue).
Specific error messages.
Exact job ID and URL experiencing the issue.
Timestamps when the issue occurred.
Any relevant AWS error messages or decoded authorization messages.
For job timing issues: Complete Nomad allocation logs.
For machine executor issues: System journalctl logs.
For API issues: Request/response details with timestamps (using the -vvv flag).