Troubleshooting
This guide covers common issues you may encounter when running Nekzus and provides solutions to resolve them.
Quick Diagnostics
Before diving into specific issues, run these diagnostic commands:
# Check container status
docker ps -a | grep nekzus
# View recent logs
docker logs nekzus --tail 100
# Check health endpoint
curl -k https://localhost:8443/api/v1/healthz
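# Enable debug logging (the variable must be set when the container is created;
# exporting it inside a running container does not reach the Nekzus process.
# See Enabling Debug Logs.)
docker run -e NEKZUS_DEBUG=true nekzus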
Installation Issues
Container Fails to Start
Container exits immediately after starting
Symptoms:
- Container status shows Exited (1) or similar
- docker logs shows configuration or initialization errors
Common Causes:
- Invalid configuration file
- Missing required environment variables
- Database initialization failure
Solution:
Check the container logs for specific error messages:
docker logs nekzus 2>&1 | head -50
Common log messages and fixes:
| Log Message | Cause | Solution |
|---|---|---|
| JWT secret must be at least 32 characters | JWT secret too short | Set NEKZUS_JWT_SECRET to a 32+ character string |
| failed to create database directory | Permission denied | Check volume mount permissions |
| failed to load TLS certificate | Invalid certificate files | Verify certificate paths and format |
| JWT secret contains weak pattern | Insecure secret detected | Use a strong random secret in production |
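One way to produce a secret that satisfies the length rule in the table above is to base64-encode random bytes. The sketch below is only an illustration, not Nekzus tooling: the function names gen_jwt_secret and secret_is_long_enough are hypothetical, and it assumes head, base64, and tr are available on the host.

```shell
# Hypothetical helper: print a random secret.
# 48 random bytes encode to exactly 64 base64 characters.
gen_jwt_secret() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}

# Hypothetical helper: succeed only if the argument has at least 32 characters,
# mirroring the length rule from the log message above.
secret_is_long_enough() {
  [ "${#1}" -ge 32 ]
}
```

The result could then be passed to the container, for example with docker run -e NEKZUS_JWT_SECRET="$(gen_jwt_secret)" nekzus. Store the value somewhere persistent, since a secret that changes between restarts invalidates previously issued tokens.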
Container keeps restarting in a loop
Symptoms:
- Container shows Restarting status
- Health checks consistently fail
Solution:
- Check if the health check endpoint is accessible:
docker exec nekzus wget -q -O- http://localhost:8080/api/v1/healthz
- Verify resource limits are not too restrictive:
# docker-compose.yml
deploy:
  resources:
    limits:
      memory: 1G # Minimum recommended
    reservations:
      memory: 256M
- Check if the database is corrupted (see Database Issues)
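When a container is restarting, a single health probe can be misleading because the service may simply be slow to initialize. A small retry loop helps tell a slow start from a permanent failure. This is a minimal sketch; wait_until is a hypothetical helper name, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# after each failed attempt; succeed as soon as the command succeeds.
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_until 10 3 docker exec nekzus wget -q -O- http://localhost:8080/api/v1/healthz probes the endpoint for roughly 30 seconds before giving up.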
Port Conflicts
Error: bind: address already in use
Symptoms:
- Container fails to start
- Error message mentions port binding failure
Solution:
- Identify what's using the port:
# Check port 8443 (HTTPS)
sudo lsof -i :8443
# or
sudo netstat -tulpn | grep 8443
- Either stop the conflicting service or change Nekzus ports:
# docker-compose.yml
ports:
  - "9443:8443" # Use port 9443 instead
  - "9080:80"
- Update your NEKZUS_BASE_URL to match the new port:
NEKZUS_BASE_URL=https://your-server:9443
Permission Denied Errors
Permission denied when accessing files or Docker socket
Symptoms:
- Errors related to file permissions
- Docker discovery not working
- Database write failures
Solution:
- For database directory:
# Create data directory with correct permissions
mkdir -p ./data
chmod 755 ./data
# If running as non-root user
chown 1000:1000 ./data
- For Docker socket access:
# Add read-only mount with correct permissions
docker run -v /var/run/docker.sock:/var/run/docker.sock:ro nekzus
# On Linux, ensure user is in docker group
sudo usermod -aG docker $USER
- For certificate files:
# Ensure certificates are readable
chmod 644 ./certs/server.crt
chmod 600 ./certs/server.key
Docker Socket Access
Docker discovery shows 'Docker socket unavailable'
Symptoms:
- Log message: failed to create Docker client
- Discovery shows no containers
- WebSocket event: Docker Discovery - Docker socket unavailable
Solution:
- Verify Docker socket is mounted:
# docker-compose.yml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
- Check socket path (varies by platform):
| Platform | Socket Path |
|---|---|
| Linux | /var/run/docker.sock |
| macOS (Docker Desktop) | /var/run/docker.sock |
| Windows (WSL2) | /var/run/docker.sock |
| Podman | /run/podman/podman.sock |
| Rootless Docker | /run/user/1000/docker.sock |
- For custom socket paths, set in config:
# config.yaml
discovery:
  docker:
    enabled: true
    socket_path: "unix:///run/user/1000/docker.sock"
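To find out which of the socket paths listed above actually exists on a host, each candidate can be tested in turn. The sketch below assumes nothing about Nekzus itself; first_socket is a hypothetical helper name.

```shell
# Hypothetical helper: print the first argument that is an existing Unix socket.
# Returns non-zero if none of the candidates is a socket.
first_socket() {
  for candidate in "$@"; do
    if [ -S "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

A call such as first_socket /var/run/docker.sock /run/podman/podman.sock "/run/user/$(id -u)/docker.sock" covers the paths from the table. Substituting $(id -u) for the literal 1000 is an assumption that the rootless daemon runs under the current user.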
Discovery Issues
Docker Discovery Not Finding Containers
Containers are running but not appearing in discovery
Symptoms:
- Running containers not showing in proposals
- Log shows scanning containers but no proposals created
Possible Causes:
- Container on different network
- Container has no HTTP ports
- Container is explicitly disabled
- Container is a system container (filtered)
Solution:
- Check container labels:
docker inspect <container> --format '{{json .Config.Labels}}' | jq .
- Ensure the container has the nekzus.enable: "true" label or exposes HTTP ports:
# docker-compose.yml for your service
labels:
  - "nekzus.enable=true"
  - "nekzus.app.id=myapp"
  - "nekzus.app.name=My Application"
- Check network configuration:
# config.yaml
discovery:
  docker:
    enabled: true
    networks:
      - nekzus-network # Only scan specific networks
    exclude_networks:
      - host
      - none
- Enable debug logging to see why containers are skipped:
# NEKZUS_DEBUG must be set on the Nekzus container itself (see Enabling Debug Logs);
# prefixing the docker logs command with it has no effect
docker logs nekzus 2>&1 | grep -i "skipping"
Container discovered but HTTP probe fails
Symptoms:
- Log shows: skipping port - HTTP probe failed
- Container has exposed ports but none are discovered
Solution:
Nekzus probes ports to verify they serve HTTP. For non-standard setups:
- Force discovery of specific port:
labels:
  - "nekzus.primary_port=3000"
- Discover all TCP ports (skip probing):
labels:
  - "nekzus.discover.all_ports=true"
- Check if service is ready. The container might need time to initialize:
# Check if port responds
docker exec <container> wget -q --spider http://localhost:3000
mDNS Discovery Failures
mDNS discovery not finding any services
Symptoms:
- Log shows: worker started - not fully implemented
- No mDNS services discovered
Current Status:
mDNS discovery is not fully implemented in the current version. The worker starts but does not actively discover services.
Workaround:
- Use Docker discovery for containerized services
- Manually configure static routes for mDNS services:
# config.yaml
routes:
  - route_id: "homeassistant"
    app_id: "homeassistant"
    path_base: "/apps/homeassistant/"
    to: "http://homeassistant.local:8123"
apps:
  - id: "homeassistant"
    name: "Home Assistant"
    icon: "https://example.com/ha-icon.png"
Kubernetes Service Discovery Problems
Kubernetes discovery shows 'failed to create Kubernetes config'
Symptoms:
- Log message: failed to create Kubernetes config
- Kubernetes services not discovered
Solution:
- When running inside a Kubernetes cluster, ensure proper RBAC permissions:
# kubernetes/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nekzus
rules:
  - apiGroups: [""]
    resources: ["services", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
- When running outside the cluster, mount the kubeconfig file:
# docker-compose.yml
volumes:
  - ~/.kube/config:/app/.kube/config:ro
environment:
  KUBECONFIG: /app/.kube/config
- Configure in config.yaml:
discovery:
  kubernetes:
    enabled: true
    kubeconfig: "/app/.kube/config"
    namespaces:
      - default
      - production
Services discovered but cannot be accessed
Symptoms:
- Kubernetes services appear in catalog
- Proxy returns 502 Bad Gateway
Solution:
- Verify network connectivity between Nekzus and cluster:
# From Nekzus container
docker exec nekzus nslookup myservice.default.svc.cluster.local
docker exec nekzus curl http://myservice.default.svc.cluster.local:8080
- Ensure Nekzus can resolve Kubernetes DNS:
# docker-compose.yml
dns:
  - 10.96.0.10 # kube-dns service IP
Authentication Issues
JWT Token Errors
Error: 'TOKEN_EXPIRED' (Code 1001)
Symptoms:
- API returns 401 with error code TOKEN_EXPIRED
- Mobile app shows authentication expired
Solution:
- Mobile app: The app should automatically attempt to refresh the token
- If refresh fails: Re-pair the device by scanning a new QR code
- For long-running scripts: Use API keys instead of JWT tokens
Token Lifetime Configuration:
# config.yaml
auth:
  token_ttl: "24h" # Access token lifetime
  refresh_ttl: "720h" # Refresh token lifetime (30 days)
Error: 'TOKEN_INVALID' (Code 1002)
Symptoms:
- API returns 401 with error code TOKEN_INVALID
- Token rejected as malformed
Common Causes:
- JWT secret mismatch: Secret changed after token was issued
- Token corruption: Token was modified or truncated
- Wrong issuer/audience: Token from different Nekzus instance
Solution:
- Verify JWT secret consistency:
# JWT secret should be the same across restarts
docker exec nekzus printenv | grep JWT_SECRET
- Re-pair affected devices with a fresh bootstrap token
Error: 'DEVICE_REVOKED' (Code 1004)
Symptoms:
- Device cannot authenticate
- Previously working device suddenly rejected
Solution:
The device was explicitly revoked by an administrator.
- Check revocation in the web UI under Devices
- To restore access, delete and re-pair the device:
# Generate new bootstrap token
curl -X POST https://localhost:8443/api/v1/auth/bootstrap/generate
Mobile App Pairing Failures
QR code scanning works but pairing fails
Symptoms:
- Mobile app scans QR code successfully
- Pairing request returns error
- Log shows: failed pairing attempt
Possible Causes:
- Bootstrap token expired (5-minute default lifetime)
- Token already used (one-time use)
- Rate limiting triggered
Solution:
- Generate a fresh QR code (old ones expire after 5 minutes)
- Check for rate limiting:
docker logs nekzus 2>&1 | grep -i "rate"
- Wait 1 minute and retry if rate limited
Mobile app cannot reach Nekzus server
Symptoms:
- QR code contains correct URL
- Mobile app shows connection error
Solution:
- Verify network connectivity:
  - Mobile device must be on the same network
  - Check firewall rules allow port 8443
- Verify base URL configuration:
# Should return your server's LAN IP, not localhost
docker logs nekzus 2>&1 | grep "base_url"
- Fix base URL if incorrect:
NEKZUS_BASE_URL=https://192.168.1.100:8443
- Certificate issues: Mobile apps may reject self-signed certificates. Either:
  - Use a trusted certificate (Let's Encrypt)
  - Accept the certificate warning on first connection
API Key Problems
API key returns 401 Unauthorized
Symptoms:
- API key was working, now returns 401
- Header X-API-Key is set correctly
Common Causes:
- Key revoked or expired
- Insufficient scopes
- Key not found in database
Solution:
- Check key status in the web UI under Settings > API Keys
- Verify key has required scopes:
# Key needs appropriate scopes for the endpoint
# e.g., "write:*" for deployment operations
- Create a new key if the old one is compromised or expired
IP Allowlist Issues
Request rejected even from local network
Symptoms:
- Requests from LAN return 401
- Log shows: Failed to parse IP from RemoteAddr
Solution:
- Check if behind reverse proxy: When using Caddy/nginx, the real client IP may not be forwarded:
# Caddyfile - forward real IP (these lines go inside the reverse_proxy block)
header_up X-Real-IP {remote_host}
header_up X-Forwarded-For {remote_host}
- Docker network ranges: Ensure Docker bridge networks are recognized. The following ranges are automatically recognized as local:
  - 127.0.0.0/8 (loopback)
  - 10.0.0.0/8 (private)
  - 172.16.0.0/12 (private + Docker)
  - 192.168.0.0/16 (private)
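To check by hand whether a client address falls inside one of the ranges above, the first two octets are enough. The following is a rough sketch of that arithmetic, not Nekzus's actual matching code; is_local_ipv4 is a hypothetical name, and the function assumes a well-formed dotted-quad IPv4 address.

```shell
# Hypothetical helper: succeed if the IPv4 address is in 127.0.0.0/8,
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. No input validation is done.
is_local_ipv4() {
  first="${1%%.*}"       # first octet
  rest="${1#*.}"
  second="${rest%%.*}"   # second octet
  case "$first" in
    127|10) return 0 ;;
    # 172.16.0.0/12 spans second octets 16 through 31
    172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;
    192) [ "$second" -eq 168 ] ;;
    *) return 1 ;;
  esac
}
```

For instance, is_local_ipv4 172.17.0.2 succeeds (a typical Docker bridge address), while is_local_ipv4 172.32.0.1 fails because it lies just outside the /12 block.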
Proxy Issues
WebSocket Connection Failures
WebSocket upgrade fails with 'WebSocket hijacking not supported'
Symptoms:
- WebSocket connections return 500 error
- Log shows: WebSocket hijacking not supported
Solution:
This typically occurs when middleware interferes with the connection hijacking.
- Ensure the route has WebSocket enabled:
routes:
  - path_base: /apps/grafana/
    to: http://grafana:3000
    websocket: true # Required for WebSocket support
- Check if reverse proxy supports WebSocket upgrade:
# Caddyfile (a named matcher with several conditions must be a single block)
@websocket {
    header Connection *Upgrade*
    header Upgrade websocket
}
reverse_proxy @websocket {upstream}
WebSocket connects but data not flowing
Symptoms:
- WebSocket handshake succeeds (101 Switching Protocols)
- No messages received after connection
Possible Causes:
- Firewall blocking WebSocket frames
- Proxy timeout too short
- Target service not sending data
Solution:
- Increase timeouts if needed:
# Route-level timeout configuration
routes:
  - path_base: /apps/grafana/
    websocket: true
    # WebSocket connections have no default timeout
- Check upstream service is sending data:
# Test direct connection to upstream
websocat ws://grafana:3000/api/live/ws
Proxy Timeouts
Error: 'Gateway Timeout' (504)
Symptoms:
- Requests hang then return 504
- Log shows timeout errors
Common Causes:
- Upstream service slow to respond
- DNS resolution taking too long
- Network connectivity issues
Solution:
- Check upstream service health:
# Direct request to upstream
docker exec nekzus curl -v --max-time 5 http://upstream:8080/
- Verify DNS resolution:
docker exec nekzus nslookup upstream-service
- Server timeouts are configured in the application:
  - Read timeout: 15 seconds
  - Write timeout: 30 seconds
  - Idle timeout: 120 seconds
Error: 'Bad Gateway' (502)
Symptoms:
- Proxy returns 502
- Upstream service appears to be running
Common Causes and Solutions:
| Error Label | Cause | Solution |
|---|---|---|
| connection_refused | Upstream not listening | Check if service is running and port is correct |
| connection_reset | Upstream closed connection | Check upstream logs for errors |
| host_unreachable | Network issue | Verify container networking |
| dns_error | Cannot resolve hostname | Check DNS configuration |
Debug Steps:
# 1. Check if upstream container is running
docker ps | grep <upstream>
# 2. Test connectivity
docker exec nekzus ping <upstream-hostname>
# 3. Test HTTP connection
docker exec nekzus curl -v http://<upstream>:<port>/
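When 502 errors are intermittent, it can help to see which error label dominates before working through the debug steps. The sketch below counts occurrences of the four labels from the table in whatever text is piped into it. It assumes the labels appear verbatim in the log lines, which the source does not confirm; count_502_labels is a hypothetical name.

```shell
# Hypothetical helper: read log text on stdin and print a count per 502 error label.
count_502_labels() {
  grep -oE 'connection_refused|connection_reset|host_unreachable|dns_error' \
    | sort | uniq -c
}
```

Typical use would be docker logs nekzus 2>&1 | count_502_labels. An empty result means none of the labels were found in the captured output.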
SSL/TLS Certificate Errors
Error: 'x509: certificate signed by unknown authority'
Symptoms:
- Proxy to HTTPS upstream fails
- Log shows certificate validation error
Solution:
For self-signed upstream certificates, configure the route to skip verification:
routes:
  - path_base: /apps/myservice/
    to: https://myservice:8443
    tls_skip_verify: true # Only for trusted internal services
Only use tls_skip_verify for trusted internal services. For external services, install proper CA certificates.
Mobile app rejects self-signed certificate
Symptoms:
- Mobile app cannot connect
- Certificate pinning failure
Solution:
- Recommended: Use a trusted certificate (Let's Encrypt via Caddy)
- Alternative: Generate certificate with proper SANs:
# Certificate should include your server's IP and hostname
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout server.key -out server.crt -days 365 \
  -subj "/CN=nekzus" \
  -addext "subjectAltName=DNS:nekzus,IP:192.168.1.100"
- The QR code pairing process includes certificate SPKI for pinning
Path Rewriting Problems
Application returns 404 for assets or API calls
Symptoms:
- Main page loads but assets (CSS, JS) fail
- API calls to wrong path
Common Causes:
- Application expects to run at root path
- Asset paths are absolute, not relative
Solution:
- Configure strip_prefix based on application needs:
routes:
  # For apps that can handle base paths:
  - path_base: /apps/myapp/
    strip_prefix: true # /apps/myapp/api -> /api
  # For apps that expect full path:
  - path_base: /apps/legacy/
    strip_prefix: false # /apps/legacy/api -> /apps/legacy/api
- Enable HTML rewriting for apps with hardcoded paths:
routes:
  - path_base: /apps/myapp/
    rewrite_html: true # Rewrites absolute paths in HTML
- Some applications need environment configuration:
# For the upstream application
environment:
  BASE_URL: /apps/myapp
  PUBLIC_PATH: /apps/myapp/
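The effect of strip_prefix described above can be reproduced with plain string handling, which is useful for predicting the path an upstream will receive. This is a sketch of the mapping shown in the route comments only; upstream_path is a hypothetical name, and the real proxy may treat edge cases (such as a path_base without a trailing slash) differently.

```shell
# Hypothetical helper: print the path forwarded to the upstream.
# $1 = path_base, $2 = request path, $3 = strip_prefix (true or false)
upstream_path() {
  base="$1"; path="$2"; strip="$3"
  if [ "$strip" = "true" ]; then
    case "$path" in
      # Request is under path_base: drop the prefix, keep a leading slash
      "$base"*) printf '/%s\n' "${path#"$base"}" ;;
      # Request is outside path_base: leave it unchanged
      *)        printf '%s\n' "$path" ;;
    esac
  else
    printf '%s\n' "$path"
  fi
}
```

With these definitions, upstream_path /apps/myapp/ /apps/myapp/api true prints /api, and upstream_path /apps/legacy/ /apps/legacy/api false prints /apps/legacy/api, matching the two route comments.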
Database Issues
SQLite Lock Errors
Error: 'database is locked'
Symptoms:
- Intermittent errors about database locking
- Operations fail under load
Solution:
Nekzus uses WAL mode and connection pooling to handle concurrent access. If you still see lock errors:
- Check for external database access:
# Ensure no other processes are accessing the database
lsof +D /path/to/data/
- Verify WAL mode is enabled:
docker exec nekzus sqlite3 /data/nexus.db "PRAGMA journal_mode;"
# Should return: wal
- Busy timeout is already set to 5 seconds: the application sets PRAGMA busy_timeout=5000 by default.
- Check disk space:
df -h /path/to/data/
Database Corruption Recovery
Error: 'database disk image is malformed'
Symptoms:
- Database operations fail
- Application won't start
Solution:
Database corruption may result in data loss. Always maintain backups.
- Stop the container:
docker stop nekzus
- Attempt recovery:
# Backup corrupted database
cp /data/nexus.db /data/nexus.db.corrupt
# Attempt to recover
sqlite3 /data/nexus.db ".recover" | sqlite3 /data/nexus-recovered.db
# Verify recovered database
sqlite3 /data/nexus-recovered.db "PRAGMA integrity_check;"
# Replace if recovery succeeded (remove stale WAL files from the old database first)
rm -f /data/nexus.db-wal /data/nexus.db-shm
mv /data/nexus-recovered.db /data/nexus.db
- Restore from backup (if recovery fails):
# List available backups
ls -la /data/backups/
# Restore latest backup
cp /data/backups/nexus-backup-latest.db /data/nexus.db
- Start fresh (last resort):
rm -f /data/nexus.db /data/nexus.db-wal /data/nexus.db-shm
docker start nekzus
# Re-pair all devices
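Because several of the steps above overwrite or delete the database file, it is worth taking a timestamped copy first so repeated attempts do not clobber an earlier backup. The helper below is a minimal sketch; backup_file is a hypothetical name and the timestamp format is an arbitrary choice.

```shell
# Hypothetical helper: copy a file to <file>.<UTC timestamp>.bak
# and print the path of the copy.
backup_file() {
  src="$1"
  dest="$src.$(date -u +%Y%m%dT%H%M%SZ).bak"
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

With the container stopped, backup_file /data/nexus.db would be run before any recovery or reset step. In WAL mode, recent writes may still live in the -wal file, so copying /data/nexus.db-wal the same way (if it exists) keeps the backup complete.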
Migration Failures
Error: 'migration failed' on startup
Symptoms:
- Application fails to start
- Log shows migration error
Solution:
- Check the specific migration error:
docker logs nekzus 2>&1 | grep -i "migration"
- Common migration issues:
| Error | Cause | Solution |
|---|---|---|
| table already exists | Interrupted migration | Delete and let it recreate |
| no such column | Schema mismatch | Restore from backup |
| constraint failed | Data integrity issue | Check database contents |
- Manual migration reset (caution - data loss):
# Backup first
cp /data/nexus.db /data/nexus.db.backup
# Remove and restart
rm /data/nexus.db
docker restart nekzus
Performance Issues
High Memory Usage
Container using excessive memory
Symptoms:
- Container exceeds memory limits
- OOM kills observed
- Memory grows over time
Solution:
- Check current memory usage:
docker stats nekzus
- Health check includes memory monitoring: the application monitors memory and reports health as degraded above 512MB.
- Configure resource limits:
# docker-compose.yml
deploy:
  resources:
    limits:
      memory: 1G
    reservations:
      memory: 256M
- Check for connection leaks:
docker exec nekzus wget -qO- http://localhost:8080/metrics | grep connections
Slow Response Times
API requests taking longer than expected
Symptoms:
- High latency on API calls
- Proxied requests slow
Diagnostic Steps:
- Check Prometheus metrics:
curl -s http://localhost:8080/metrics | grep http_request_duration
- Check if it's the proxy or API:
# Direct API call
time curl https://localhost:8443/api/v1/apps
# Proxied request
time curl https://localhost:8443/apps/grafana/
- Check upstream service health: visit Dashboard > Service Health to see upstream response times.
- Enable request tracing:
# NEKZUS_DEBUG must be part of the container's environment; prefixing
# docker restart with it does not reach the Nekzus process.
# Recreate the container with the variable set (see Enabling Debug Logs):
docker run -e NEKZUS_DEBUG=true nekzus
Connection Pooling
Too many connections to upstream services
Symptoms:
- Upstream services rejecting connections
- "connection reset" errors under load
Solution:
Nekzus uses Go's http.Transport with default pooling:
- Max idle connections: 100
- Max connections per host: 100
- Idle connection timeout: 90 seconds
For high-traffic scenarios, ensure upstream services can handle the connection count.
Logging and Debugging
Enabling Debug Logs
How to enable verbose logging
Solution:
- Via environment variable:
docker run -e NEKZUS_DEBUG=true nekzus
# or
docker run -e NEKZUS_DEBUG=1 nekzus
- In docker-compose.yml:
environment:
  NEKZUS_DEBUG: "true"
- Debug output includes:
  - HTTP request details
  - WebSocket frame information
  - Discovery processing details
  - Authentication flow details
Reading Container Logs
How to effectively read and filter logs
Useful Log Commands:
# Last 100 lines
docker logs nekzus --tail 100
# Follow logs in real-time
docker logs nekzus -f
# Logs since specific time
docker logs nekzus --since 1h
# Filter for errors only
docker logs nekzus 2>&1 | grep -i "error"
# Filter by component
docker logs nekzus 2>&1 | grep "component=discovery"
docker logs nekzus 2>&1 | grep "component=proxy"
docker logs nekzus 2>&1 | grep "component=auth"
Common Log Messages
Understanding common log messages
Informational Messages:
| Message | Meaning |
|---|---|
| storage initialized | Database connected successfully |
| registered docker worker | Docker discovery active |
| scanning containers | Docker discovery running |
| new proposal | Service discovered, awaiting approval |
| config reload: completed successfully | Hot reload succeeded |
Warning Messages:
| Message | Meaning | Action |
|---|---|---|
| failed to create docker discovery worker | Docker unavailable | Check Docker socket mount |
| docker discovery will be disabled | Continuing without Docker | Mount Docker socket if needed |
| only docker network ip found | Host networking issue | Check NEKZUS_BASE_URL |
| invalid ack_timeout, using default | Config parse error | Check config syntax |
Error Messages:
| Message | Meaning | Action |
|---|---|---|
| migration failed | Database schema error | Check database permissions |
| failed to load TLS certificate | Certificate issue | Verify cert files exist and are valid |
| JWT secret must be at least 32 characters | Security requirement | Use longer secret |
Getting Help
If you cannot resolve your issue using this guide:
- Check existing issues: GitHub Issues
- Gather diagnostic information:
# System information
docker version
docker info
uname -a
# Container status
docker ps -a | grep nekzus
docker inspect nekzus
# Recent logs
docker logs nekzus --tail 200 > nekzus-logs.txt 2>&1
# Health check
curl -k https://localhost:8443/api/v1/healthz
- Create a new issue with:
  - Description of the problem
  - Steps to reproduce
  - Expected vs actual behavior
  - Diagnostic information gathered above
  - Configuration (redact secrets)