Troubleshooting
Common issues and solutions for SRExpert.
Installation Issues
Helm Installation Fails
Symptom: helm install command fails
Solutions:
- Check Helm version (v3.x required)
helm version
- Verify repository is added
helm repo list
helm repo add srexpert-helm https://nexus.srexpert.io/repository/srexpert-helm/
helm repo update
- Ensure the namespace exists (create it if it is missing)
kubectl create namespace srexpert
Pods Not Starting
Symptom: Pods stuck in Pending or CrashLoopBackOff
Check resources:
kubectl describe pod -n srexpert <pod-name>
kubectl logs -n srexpert <pod-name>
Common causes:
| Issue | Solution |
|---|---|
| ImagePullBackOff | Check registry credentials |
| Insufficient resources | Increase node capacity |
| PVC pending | Check StorageClass |
| ConfigMap missing | Verify Helm values |
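The table above can be turned into a quick first-pass triage. The following is a minimal sketch, not part of SRExpert: the function names `suggest_fix` and `triage_pods` are hypothetical, and the mapping from pod STATUS values to the table's causes is an assumption (for example, `CreateContainerConfigError` is what Kubernetes typically reports when a referenced ConfigMap is missing).

```shell
# Map a pod STATUS value (as printed by `kubectl get pods`) to the first
# thing to check, following the table above. The status-to-cause mapping
# is an assumption made for illustration.
suggest_fix() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) echo "Check registry credentials" ;;
    Pending)                       echo "Increase node capacity or check StorageClass" ;;
    CreateContainerConfigError)    echo "Verify Helm values" ;;
    *)                             echo "Inspect with kubectl describe and kubectl logs" ;;
  esac
}

# Read `kubectl get pods --no-headers` output on stdin and print a hint
# for every pod that is not Running or Completed.
triage_pods() {
  while read -r name ready status rest; do
    [ -n "$name" ] || continue
    case "$status" in
      Running|Completed) continue ;;
    esac
    echo "$name: $status -> $(suggest_fix "$status")"
  done
}

# Usage:
# kubectl get pods -n srexpert --no-headers | triage_pods
```

The hint is only a starting point; `kubectl describe pod` remains the authoritative source for the actual failure reason.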
Image Pull Errors
Symptom: ErrImagePull or ImagePullBackOff
Solutions:
- Verify secret exists
kubectl get secret nexus-registry -n srexpert
- Check secret configuration
kubectl get secret nexus-registry -n srexpert -o yaml
- Recreate secret if needed
kubectl create secret docker-registry nexus-registry \
  --docker-server=registry.srexpert.io \
  --docker-username=YOUR_USER \
  --docker-password=YOUR_PASS \
  -n srexpert
Connection Issues
Cannot Connect to Cluster
Symptom: Cluster shows “Disconnected” status
Check:
- Kubeconfig is valid
kubectl --kubeconfig=your-kubeconfig get nodes
- Network connectivity
curl -k https://your-cluster-api:6443/healthz
- Certificate validity
kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d | openssl x509 -noout -dates
API Server Timeout
Symptom: Operations timeout
Solutions:
- Check network latency
- Increase timeout in settings
- Verify firewall rules allow port 6443
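Latency and port reachability can be measured together with curl. The sketch below is illustrative only: `api_latency` and `is_slow` are hypothetical helper names, `https://your-cluster-api:6443` is the placeholder used elsewhere on this page, and the 1-second threshold is an arbitrary assumption.

```shell
# Print the total time (in seconds) curl needs to reach the API server
# health endpoint, giving up after 10 seconds. $1 is the API server URL.
# A timeout or connection error here points at firewall or routing issues.
api_latency() {
  curl -k -s -o /dev/null --max-time 10 -w '%{time_total}' "$1/healthz"
}

# Succeed (exit 0) when latency $1 exceeds threshold $2, both in seconds.
is_slow() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t + 0 > max + 0) }'
}

# Usage (the threshold of 1 second is an arbitrary example):
# latency=$(api_latency https://your-cluster-api:6443)
# if is_slow "$latency" 1; then echo "API server slow: ${latency}s"; fi
```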
Authentication Failed
Symptom: 401 Unauthorized errors
Check:
- Token hasn’t expired
- ServiceAccount exists
- RBAC bindings are correct
kubectl auth can-i --list --as=system:serviceaccount:srexpert-system:srexpert
Database Issues
PostgreSQL Not Starting
Symptom: PostgreSQL pod fails to start
Check PVC:
kubectl get pvc -n srexpert
kubectl describe pvc -n srexpert data-srexpert-backend-postgresql-0
Check logs:
kubectl logs -n srexpert srexpert-backend-postgresql-0
Common fixes:
- Verify StorageClass exists
- Check storage quota
- Ensure PV permissions
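The StorageClass check above can be scripted. This is a rough sketch under one assumption: that the PostgreSQL PVC does not name a StorageClass explicitly and therefore relies on the cluster default. The helper name `has_default_storageclass` is hypothetical; it relies on `kubectl get storageclass` marking the default class with `(default)`.

```shell
# Read `kubectl get storageclass` output on stdin and succeed when one of
# the listed classes is marked "(default)".
has_default_storageclass() {
  grep -q '(default)'
}

# Usage:
# kubectl get storageclass | has_default_storageclass \
#   || echo "No default StorageClass: PVCs without storageClassName stay Pending"
# kubectl get resourcequota -n srexpert   # shows storage quotas, if any are defined
```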
Connection Refused
Symptom: Backend can’t connect to database
Verify service:
kubectl get svc -n srexpert | grep postgresql
Check password:
kubectl get secret -n srexpert srexpert-database -o jsonpath='{.data.postgres-password}' | base64 -d
UI Issues
Dashboard Not Loading
Symptom: Blank page or loading forever
Solutions:
- Clear browser cache
- Check browser console for errors
- Verify backend is running
kubectl get pods -n srexpert -l app.kubernetes.io/name=srexpert-backend
- Check ingress configuration
kubectl get ingress -n srexpert
Login Fails
Symptom: Cannot log in
Check:
- Backend logs for errors
kubectl logs -n srexpert -l app.kubernetes.io/name=srexpert-backend --tail=100
- Cookie settings match domain
- CORS configuration is correct
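One way to check the CORS configuration from a terminal is to send a request with an `Origin` header and inspect the response headers. The sketch below assumes a hypothetical UI origin `https://srexpert.example.com` and a hypothetical backend path `/api`; substitute the real values from your deployment. `cors_allows` is an illustrative helper, not an SRExpert command.

```shell
# Read HTTP response headers on stdin and succeed when the
# Access-Control-Allow-Origin header equals the origin in $1 or is "*".
cors_allows() {
  tr -d '\r' | awk -v origin="$1" '
    tolower($1) == "access-control-allow-origin:" && ($2 == origin || $2 == "*") { found = 1 }
    END { exit !found }
  '
}

# Usage (origin and URL are placeholders):
# curl -s -o /dev/null -D - -H "Origin: https://srexpert.example.com" \
#   https://srexpert.example.com/api | cors_allows https://srexpert.example.com \
#   || echo "Backend does not allow this origin"
```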
Performance Issues
Slow Response Times
Symptom: UI is slow
Solutions:
- Check resource usage
kubectl top pods -n srexpert
- Increase resource limits
resources:
  limits:
    cpu: 2000m
    memory: 4Gi
- Enable Redis caching
High Memory Usage
Symptom: Pods getting OOMKilled
Solutions:
- Increase memory limits
- Check for memory leaks in logs
- Reduce concurrent operations
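To spot pods approaching their memory limit before they are OOMKilled, the output of `kubectl top pods` can be filtered. This is a minimal sketch: `pods_over_memory` is a hypothetical helper, it assumes memory is reported in Mi (the usual unit in `kubectl top` output), and the 3276 Mi threshold is simply about 80% of the 4Gi example limit shown earlier on this page.

```shell
# Read `kubectl top pods` output on stdin and print the name and memory
# usage of every pod at or above the threshold given as $1 (in Mi).
# Lines whose third column is not "<number>Mi" (such as the header) are skipped.
pods_over_memory() {
  awk -v threshold="$1" '
    $3 ~ /^[0-9]+Mi$/ {
      mem = $3
      sub(/Mi$/, "", mem)
      if (mem + 0 >= threshold + 0) print $1, $3
    }
  '
}

# Usage (3276 Mi is roughly 80% of a 4Gi limit):
# kubectl top pods -n srexpert | pods_over_memory 3276
```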
Feature-Specific Issues
SRE CLI Not Working
Check:
- WebSocket connectivity
- AI service is enabled
- Kubeconfig has exec permissions
AI Assistant Not Responding
Check:
- License includes AI features
- AI service is enabled
- Network allows outbound connections
Metrics Not Showing
Check:
- Prometheus is configured
- Metrics server is running in cluster
- RBAC allows metrics access
Logs Collection
Backend Logs
kubectl logs -n srexpert -l app.kubernetes.io/name=srexpert-backend --tail=500 > backend.log
Frontend Logs
kubectl logs -n srexpert -l app.kubernetes.io/name=srexpert-frontend --tail=500 > frontend.log
All Events
kubectl get events -n srexpert --sort-by='.lastTimestamp' > events.log
Getting Help
Support Channels
- Documentation: https://docs.srexpert.io
- Support Portal: https://srexpert.atlassian.net/servicedesk/customer/portal/1
- Email: [email protected]
Information to Include
When contacting support, include:
- SRExpert version
- Kubernetes version
- Error messages
- Steps to reproduce
- Relevant logs
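Most of the items above can be gathered in one pass. The sketch below reuses the commands from the Logs Collection and Version Information sections of this page; the function name `collect_support_info`, the `KUBECTL` override variable, and the output file names for the version information are assumptions made for illustration.

```shell
# KUBECTL can be overridden (for example with a wrapper script); it is an
# illustrative variable, not something SRExpert defines.
KUBECTL="${KUBECTL:-kubectl}"

# Collect versions, logs, and events for the srexpert namespace into the
# directory given as $1 (created if missing).
collect_support_info() {
  out_dir="$1"
  mkdir -p "$out_dir" || return 1

  # SRExpert version (backend image) and Kubernetes version
  "$KUBECTL" get deployment -n srexpert srexpert-backend \
    -o jsonpath='{.spec.template.spec.containers[0].image}' > "$out_dir/srexpert-version.txt"
  "$KUBECTL" version > "$out_dir/kubernetes-version.txt"

  # Backend logs, frontend logs, and namespace events
  "$KUBECTL" logs -n srexpert -l app.kubernetes.io/name=srexpert-backend --tail=500 > "$out_dir/backend.log"
  "$KUBECTL" logs -n srexpert -l app.kubernetes.io/name=srexpert-frontend --tail=500 > "$out_dir/frontend.log"
  "$KUBECTL" get events -n srexpert --sort-by='.lastTimestamp' > "$out_dir/events.log"
}

# Usage:
# collect_support_info ./srexpert-support
# tar czf srexpert-support.tar.gz ./srexpert-support
```

Review the collected files before sending them, since logs can contain hostnames or other environment details.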
Version Information
# SRExpert version
kubectl get deployment -n srexpert srexpert-backend -o jsonpath='{.spec.template.spec.containers[0].image}'
# Kubernetes version
kubectl version
Common Error Messages
| Error | Meaning | Solution |
|---|---|---|
| ECONNREFUSED | Can’t reach service | Check service/network |
| 401 Unauthorized | Auth failed | Check credentials |
| 403 Forbidden | No permission | Check RBAC |
| 404 Not Found | Resource missing | Verify resource exists |
| 500 Internal Error | Server error | Check backend logs |
| 503 Service Unavailable | Service down | Check pod status |