Troubleshooting

This guide covers common issues you may encounter when running Nekzus and explains how to resolve them.

Quick Diagnostics

Before diving into specific issues, run these diagnostic commands:

# Check container status
docker ps -a | grep nekzus

# View recent logs
docker logs nekzus --tail 100

# Check health endpoint
curl -k https://localhost:8443/api/v1/healthz

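# Enable debug logging: set NEKZUS_DEBUG=true in the container environment
# and recreate the container (see Enabling Debug Logs). An `export` run via
# `docker exec` only affects that one shell, not the running Nekzus process.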

Installation Issues

Container Fails to Start

Container exits immediately after starting

Symptoms:

Container status shows Exited (1) or similar
docker logs shows configuration or initialization errors

Common Causes:

Invalid configuration file
Missing required environment variables
Database initialization failure

Solution:

Check the container logs for specific error messages:

docker logs nekzus 2>&1 | head -50

Common log messages and fixes:

Log Message	Cause	Solution
`JWT secret must be at least 32 characters`	JWT secret too short	Set `NEKZUS_JWT_SECRET` to a 32+ character string
`failed to create database directory`	Permission denied	Check volume mount permissions
`failed to load TLS certificate`	Invalid certificate files	Verify certificate paths and format
`JWT secret contains weak pattern`	Insecure secret detected	Use a strong random secret in production
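If you need a fresh secret, the sketch below generates one from `/dev/urandom` and checks it against the 32-character minimum from the table. The function names are illustrative, not part of Nekzus, and the "weak pattern" check that Nekzus performs is not reproduced here.

```shell
# Generate a random secret for NEKZUS_JWT_SECRET: 48 random bytes,
# base64-encoded to 64 characters (well above the 32-character minimum).
gen_jwt_secret() {
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}

# Check a candidate secret against the documented 32-character minimum.
check_jwt_secret_length() {
  if [ "${#1}" -ge 32 ]; then
    echo "ok (${#1} characters)"
  else
    echo "too short (${#1} characters, need at least 32)" >&2
    return 1
  fi
}
```

Run `gen_jwt_secret` once, store the result as `NEKZUS_JWT_SECRET` (for example in your `.env` file), and keep it unchanged across restarts so previously issued tokens stay valid.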

Container keeps restarting in a loop

Symptoms:

Container shows Restarting status
Health checks consistently fail

Solution:

Check if the health check endpoint is accessible:

docker exec nekzus wget -q -O- http://localhost:8080/api/v1/healthz

Verify resource limits are not too restrictive:

# docker-compose.yml
deploy:
  resources:
    limits:
      memory: 1G  # Minimum recommended
    reservations:
      memory: 256M

Check if the database is corrupted (see Database Issues)

Port Conflicts

Error: bind: address already in use

Symptoms:

Container fails to start
Error message mentions port binding failure

Solution:

Identify what's using the port:

# Check port 8443 (HTTPS)
sudo lsof -i :8443
# or
sudo netstat -tulpn | grep 8443

Either stop the conflicting service or change Nekzus ports:

# docker-compose.yml
ports:
  - "9443:8443"  # Use port 9443 instead
  - "9080:80"

Update your NEKZUS_BASE_URL to match the new port:
```
NEKZUS_BASE_URL=https://your-server:9443
```

Permission Denied Errors

Permission denied when accessing files or Docker socket

Symptoms:

Errors related to file permissions
Docker discovery not working
Database write failures

Solution:

For database directory:

# Create data directory with correct permissions
mkdir -p ./data
chmod 755 ./data

# If the container runs as a non-root user (UID/GID 1000 assumed here)
chown 1000:1000 ./data

For Docker socket access:

# Mount the Docker socket read-only
docker run -v /var/run/docker.sock:/var/run/docker.sock:ro nekzus

# On Linux, add your user to the docker group (takes effect after you log in again)
sudo usermod -aG docker $USER

For certificate files:

# Ensure certificates are readable
chmod 644 ./certs/server.crt
chmod 600 ./certs/server.key

Docker Socket Access

Docker discovery shows 'Docker socket unavailable'

Symptoms:

Log message: failed to create Docker client
Discovery shows no containers
WebSocket event: Docker Discovery - Docker socket unavailable

Solution:

Verify Docker socket is mounted:

# docker-compose.yml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro

Check socket path (varies by platform):

Platform	Socket Path
Linux	`/var/run/docker.sock`
macOS (Docker Desktop)	`/var/run/docker.sock`
Windows (WSL2)	`/var/run/docker.sock`
Podman	`/run/podman/podman.sock`
Rootless Docker	`/run/user/<UID>/docker.sock` (e.g. `/run/user/1000/docker.sock`)

For custom socket paths, set in config:

# config.yaml
discovery:
  docker:
    enabled: true
    socket_path: "unix:///run/user/1000/docker.sock"
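To find which socket exists on a given host, a small helper like the one below checks candidate paths from the table and prints the first one in the `unix://` form used by `socket_path`. It is a hypothetical convenience function, not a Nekzus command. Run it on the host; the result only applies inside the container if you mount the socket at the same path.

```shell
# Print the first candidate path that is a Unix socket, in unix:// form.
# With no arguments, try the paths from the table above, substituting the
# current user's UID for the rootless Docker path.
find_docker_socket() {
  if [ "$#" -eq 0 ]; then
    set -- /var/run/docker.sock "/run/user/$(id -u)/docker.sock" /run/podman/podman.sock
  fi
  for candidate in "$@"; do
    if [ -S "$candidate" ]; then
      printf 'unix://%s\n' "$candidate"
      return 0
    fi
  done
  echo "no Docker-compatible socket found" >&2
  return 1
}
```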

Discovery Issues

Docker Discovery Not Finding Containers

Containers are running but not appearing in discovery

Symptoms:

Running containers not showing in proposals
Log shows scanning containers but no proposals created

Possible Causes:

Container on different network
Container has no HTTP ports
Container is explicitly disabled
Container is a system container (filtered)

Solution:

Check container labels:

docker inspect <container> --format '{{json .Config.Labels}}' | jq .

Ensure the container has the nekzus.enable=true label, or exposes HTTP ports:

# docker-compose.yml for your service
labels:
  - "nekzus.enable=true"
  - "nekzus.app.id=myapp"
  - "nekzus.app.name=My Application"

Check network configuration:

# config.yaml
discovery:
  docker:
    enabled: true
    networks:
      - nekzus-network  # Only scan specific networks
    exclude_networks:
      - host
      - none

With debug logging enabled (see Enabling Debug Logs), search the logs for skipped containers:

docker logs nekzus 2>&1 | grep -i "skipping"

Container discovered but HTTP probe fails

Symptoms:

Log shows: skipping port - HTTP probe failed
Container has exposed ports but none are discovered

Solution:

Nekzus probes ports to verify they serve HTTP. For non-standard setups:

Force discovery of a specific port:
```
labels:
  - "nekzus.primary_port=3000"
```

Discover all TCP ports (skip probing):

labels:
  - "nekzus.discover.all_ports=true"

Check that the service is ready; the container might need time to initialize:

# Check if port responds
docker exec <container> wget -q --spider http://localhost:3000

mDNS Discovery Failures

mDNS discovery not finding any services

Symptoms:

Log shows: worker started - not fully implemented
No mDNS services discovered

Current Status:

mDNS discovery is not fully implemented in the current version. The worker starts but does not actively discover services.

Workaround:

Use Docker discovery for containerized services

Manually configure static routes for mDNS services:

# config.yaml
routes:
  - route_id: "homeassistant"
    app_id: "homeassistant"
    path_base: "/apps/homeassistant/"
    to: "http://homeassistant.local:8123"

apps:
  - id: "homeassistant"
    name: "Home Assistant"
    icon: "https://example.com/ha-icon.png"

Kubernetes Service Discovery Problems

Kubernetes discovery shows 'failed to create Kubernetes config'

Symptoms:

Log message: failed to create Kubernetes config
Kubernetes services not discovered

Solution:

When running inside a Kubernetes cluster:

Ensure proper RBAC permissions:

# kubernetes/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nekzus
rules:
  - apiGroups: [""]
    resources: ["services", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]

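A ClusterRole grants nothing until it is bound. Bind it to the service account Nekzus runs as, for example with `kubectl create clusterrolebinding nekzus --clusterrole=nekzus --serviceaccount=<namespace>:<serviceaccount>`, and confirm access with `kubectl auth can-i list services --as=system:serviceaccount:<namespace>:<serviceaccount>`.

When running outside a cluster: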

Mount kubeconfig file:

# docker-compose.yml
volumes:
  - ~/.kube/config:/app/.kube/config:ro

environment:
  KUBECONFIG: /app/.kube/config

Configure in config.yaml:

discovery:
  kubernetes:
    enabled: true
    kubeconfig: "/app/.kube/config"
    namespaces:
      - default
      - production

Services discovered but cannot be accessed

Symptoms:

Kubernetes services appear in catalog
Proxy returns 502 Bad Gateway

Solution:

Verify network connectivity between Nekzus and cluster:

# From the Nekzus container
docker exec nekzus nslookup myservice.default.svc.cluster.local
docker exec nekzus curl http://myservice.default.svc.cluster.local:8080

Ensure Nekzus can resolve Kubernetes DNS:

# docker-compose.yml
dns:
  - 10.96.0.10  # kube-dns service IP (a common default; check with: kubectl -n kube-system get svc kube-dns)

Authentication Issues

JWT Token Errors

Error: 'TOKEN_EXPIRED' (Code 1001)

Symptoms:

API returns 401 with error code TOKEN_EXPIRED
Mobile app shows authentication expired

Solution:

Mobile app: The app should automatically attempt to refresh the token
If refresh fails: Re-pair the device by scanning a new QR code
For long-running scripts: Use API keys instead of JWT tokens

Token Lifetime Configuration:

# config.yaml
auth:
  token_ttl: "24h"    # Access token lifetime
  refresh_ttl: "720h"  # Refresh token lifetime (30 days)
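To check whether a particular token has actually expired, you can read its `exp` claim locally. The sketch below assumes Nekzus issues standard JWTs carrying the registered `exp` claim (Unix seconds). It only decodes the payload and does not verify the signature; the function names are hypothetical.

```shell
# Decode the payload (second segment) of a JWT and print its "exp" claim.
# Base64url uses - and _ instead of + and / and omits padding, so both
# are restored before decoding.
jwt_exp() {
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d | sed -n 's/.*"exp":[[:space:]]*\([0-9]*\).*/\1/p'
}

# Report whether a token has expired, compared with the current Unix time.
jwt_is_expired() {
  exp=$(jwt_exp "$1")
  [ -n "$exp" ] || { echo "no exp claim found" >&2; return 2; }
  now=$(date +%s)
  if [ "$now" -ge "$exp" ]; then
    echo "expired"
  else
    echo "valid for $(( exp - now ))s"
  fi
}
```

Usage: `jwt_is_expired "$TOKEN"`. If a token reports as valid but the API still returns TOKEN_EXPIRED, compare the clocks of the client and the server.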

Error: 'TOKEN_INVALID' (Code 1002)

Symptoms:

API returns 401 with error code TOKEN_INVALID
Token rejected as malformed

Common Causes:

JWT secret mismatch: Secret changed after token was issued
Token corruption: Token was modified or truncated
Wrong issuer/audience: Token from different Nekzus instance

Solution:

Verify JWT secret consistency:

# JWT secret should be the same across restarts
docker exec nekzus printenv | grep JWT_SECRET

Re-pair affected devices with a fresh bootstrap token

Error: 'DEVICE_REVOKED' (Code 1004)

Symptoms:

Device cannot authenticate
Previously working device suddenly rejected

Solution:

The device was explicitly revoked by an administrator.

Check revocation in the web UI under Devices

To restore access, delete and re-pair the device:

# Generate new bootstrap token
curl -k -X POST https://localhost:8443/api/v1/auth/bootstrap/generate

Mobile App Pairing Failures

QR code scanning works but pairing fails

Symptoms:

Mobile app scans QR code successfully
Pairing request returns error
Log shows: failed pairing attempt

Possible Causes:

Bootstrap token expired (5-minute default lifetime)
Token already used (one-time use)
Rate limiting triggered

Solution:

Generate a fresh QR code (old ones expire after 5 minutes)

Check for rate limiting:

docker logs nekzus 2>&1 | grep -i "rate"

Wait 1 minute and retry if rate limited

Mobile app cannot reach Nekzus server

Symptoms:

QR code contains correct URL
Mobile app shows connection error

Solution:

Verify network connectivity:
- Mobile device must be on the same network
- Check firewall rules allow port 8443

Verify base URL configuration:

# Should return your server's LAN IP, not localhost
docker logs nekzus 2>&1 | grep "base_url"

Fix base URL if incorrect:

NEKZUS_BASE_URL=https://192.168.1.100:8443

Certificate issues: Mobile apps may reject self-signed certificates. Either:
- Use a trusted certificate (Let's Encrypt)
- Accept the certificate warning on first connection

API Key Problems

API key returns 401 Unauthorized

Symptoms:

API key was working, now returns 401
Header X-API-Key is set correctly

Common Causes:

Key revoked or expired
Insufficient scopes
Key not found in database

Solution:

Check key status in the web UI under Settings > API Keys

Verify key has required scopes:

# Key needs appropriate scopes for the endpoint
# e.g., "write:*" for deployment operations

Create a new key if the old one is compromised or expired

IP Allowlist Issues

Request rejected even from local network

Symptoms:

Requests from LAN return 401
Log shows: Failed to parse IP from RemoteAddr

Solution:

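Check whether Nekzus is behind a reverse proxy: when using Caddy or nginx, the real client IP may not be forwarded:

# Caddyfile - forward the real client IP (header_up belongs inside reverse_proxy)
reverse_proxy <nekzus-upstream> {
    header_up X-Real-IP {remote_host}
    header_up X-Forwarded-For {remote_host}
}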

Docker network ranges: Ensure Docker bridge networks are recognized:

The following ranges are automatically recognized as local:
- 127.0.0.0/8 (loopback)
- 10.0.0.0/8 (private)
- 172.16.0.0/12 (private + Docker)
- 192.168.0.0/16 (private)
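To check whether a given client address falls inside these ranges, the sketch below performs the same CIDR comparison in plain shell arithmetic. It illustrates the listed ranges only (IPv4 only) and is not Nekzus's own implementation; the function names are hypothetical.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if the address is in one of the ranges listed above.
is_local_ip() {
  ip=$(ip_to_int "$1")
  for cidr in 127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    net=$(ip_to_int "${cidr%/*}")
    bits=${cidr#*/}
    # Build the network mask, e.g. /12 -> 0xFFF00000
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    if [ $(( ip & mask )) -eq $(( net & mask )) ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `is_local_ip 172.18.0.1 && echo local` prints `local`, because Docker's default bridge subnets fall inside 172.16.0.0/12.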

Proxy Issues

WebSocket Connection Failures

WebSocket upgrade fails with 'WebSocket hijacking not supported'

Symptoms:

WebSocket connections return 500 error
Log shows: WebSocket hijacking not supported

Solution:

This typically occurs when middleware interferes with the connection hijacking.

Ensure the route has WebSocket enabled:

routes:
  - path_base: /apps/grafana/
    to: http://grafana:3000
    websocket: true  # Required for WebSocket support

Check if reverse proxy supports WebSocket upgrade:

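# Caddyfile (Caddy v2's reverse_proxy passes WebSocket upgrades through
# automatically; a matcher like this is only needed to route them separately)
@websocket header Connection *Upgrade*
@websocket header Upgrade websocket
reverse_proxy @websocket <upstream>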

WebSocket connects but data not flowing

Symptoms:

WebSocket handshake succeeds (101 Switching Protocols)
No messages received after connection

Possible Causes:

Firewall blocking WebSocket frames
Proxy timeout too short
Target service not sending data

Solution:

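Check timeouts along the path. Nekzus applies no default timeout to WebSocket connections, so if connections drop while idle, look at any proxy or firewall in front of it: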

# Route-level timeout configuration
routes:
  - path_base: /apps/grafana/
    websocket: true
    # WebSocket connections have no default timeout

Check upstream service is sending data:

# Test direct connection to upstream
websocat ws://grafana:3000/api/live/ws

Proxy Timeouts

Error: 'Gateway Timeout' (504)

Symptoms:

Requests hang then return 504
Log shows timeout errors

Common Causes:

Upstream service slow to respond
DNS resolution taking too long
Network connectivity issues

Solution:

Check upstream service health:

# Direct request to upstream
docker exec nekzus curl -v --max-time 5 http://upstream:8080/

Verify DNS resolution:

docker exec nekzus nslookup upstream-service

Server timeouts are configured in the application:
- Read timeout: 15 seconds
- Write timeout: 30 seconds
- Idle timeout: 120 seconds

Error: 'Bad Gateway' (502)

Symptoms:

Proxy returns 502
Upstream service appears to be running

Common Causes and Solutions:

Error Label	Cause	Solution
`connection_refused`	Upstream not listening	Check if service is running and port is correct
`connection_reset`	Upstream closed connection	Check upstream logs for errors
`host_unreachable`	Network issue	Verify container networking
`dns_error`	Cannot resolve hostname	Check DNS configuration

Debug Steps:

# 1. Check if upstream container is running
docker ps | grep <upstream>

# 2. Test connectivity
docker exec nekzus ping <upstream-hostname>

# 3. Test HTTP connection
docker exec nekzus curl -v http://<upstream>:<port>/
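curl's exit code already distinguishes most of these failures, so step 3 can be reduced to a label. The mapping below from curl exit codes to the table's labels is an approximation written for this guide, not how Nekzus classifies errors; exit code 7 covers both a refused connection and an unreachable host.

```shell
# Map a curl exit code to the closest 502 error label from the table above.
curl_exit_to_label() {
  case "$1" in
    0)     echo "ok" ;;
    6)     echo "dns_error" ;;                            # could not resolve host
    7)     echo "connection_refused/host_unreachable" ;;  # could not connect
    28)    echo "timeout" ;;                              # operation timed out
    52|56) echo "connection_reset" ;;                     # empty reply / receive failure
    *)     echo "other (curl exit $1)" ;;
  esac
}

# Usage (docker exec returns the exit code of the command it ran):
#   docker exec nekzus curl -s -o /dev/null --max-time 5 http://<upstream>:<port>/
#   curl_exit_to_label $?
```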

SSL/TLS Certificate Errors

Error: 'x509: certificate signed by unknown authority'

Symptoms:

Proxy to HTTPS upstream fails
Log shows certificate validation error

Solution:

For self-signed upstream certificates, configure the route to skip verification:

routes:
  - path_base: /apps/myservice/
    to: https://myservice:8443
    tls_skip_verify: true  # Only for trusted internal services

Security Note

Only use tls_skip_verify for trusted internal services. For external services, install proper CA certificates.

Mobile app rejects self-signed certificate

Symptoms:

Mobile app cannot connect
Certificate pinning failure

Solution:

Recommended: Use a trusted certificate (Let's Encrypt via Caddy)

Alternative: Generate certificate with proper SANs:

# Certificate should include your server's IP and hostname
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout server.key -out server.crt -days 365 \
  -subj "/CN=nekzus" \
  -addext "subjectAltName=DNS:nekzus,IP:192.168.1.100"

The QR code used for pairing includes the certificate's SPKI for pinning, so re-pair devices after replacing the certificate with one that uses a new key.

Path Rewriting Problems

Application returns 404 for assets or API calls

Symptoms:

Main page loads but assets (CSS, JS) fail
API calls to wrong path

Common Causes:

Application expects to run at root path
Asset paths are absolute, not relative

Solution:

Configure strip_prefix based on application needs:

routes:
  # For apps that can handle base paths:
  - path_base: /apps/myapp/
    strip_prefix: true  # /apps/myapp/api -> /api

  # For apps that expect full path:
  - path_base: /apps/legacy/
    strip_prefix: false  # /apps/legacy/api -> /apps/legacy/api
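The function below mirrors the two comments in that example, so you can work out which path the upstream receives before changing a route. It illustrates the documented `strip_prefix` behaviour under the assumption that only the `path_base` prefix is removed; it is not Nekzus code.

```shell
# Return the path forwarded upstream for a request, given the route's
# path_base and its strip_prefix setting (true or false).
upstream_path() {
  path_base="${1%/}"   # route path_base without trailing slash, e.g. /apps/myapp
  request="$2"         # incoming request path, e.g. /apps/myapp/api
  strip="$3"           # value of strip_prefix
  if [ "$strip" != "true" ]; then
    printf '%s\n' "$request"
    return
  fi
  rest="${request#"$path_base"}"
  case "$rest" in
    "") printf '/\n' ;;
    /*) printf '%s\n' "$rest" ;;
    *)  printf '/%s\n' "$rest" ;;
  esac
}
```

For example, `upstream_path /apps/myapp/ /apps/myapp/api true` prints `/api`, while `upstream_path /apps/legacy/ /apps/legacy/api false` prints `/apps/legacy/api`.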

Enable HTML rewriting for apps with hardcoded paths:

routes:
  - path_base: /apps/myapp/
    rewrite_html: true  # Rewrites absolute paths in HTML

Some applications need environment configuration:

# For the upstream application
environment:
  BASE_URL: /apps/myapp
  PUBLIC_PATH: /apps/myapp/

Database Issues

SQLite Lock Errors

Error: 'database is locked'

Symptoms:

Intermittent errors about database locking
Operations fail under load

Solution:

Nekzus uses WAL mode and connection pooling to handle concurrent access. If you still see lock errors:

Check for external database access:

# Ensure no other processes are accessing the database
lsof +D /path/to/data/

Verify WAL mode is enabled:

docker exec nekzus sqlite3 /data/nexus.db "PRAGMA journal_mode;"
# Should return: wal

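Busy timeout: the application already sets PRAGMA busy_timeout=5000 by default, so SQLite retries for up to 5 seconds before reporting a lock. Persistent lock errors usually mean another process is holding the database for longer than that.

Check disk space: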
```
df -h /path/to/data/
```

Database Corruption Recovery

Error: 'database disk image is malformed'

Symptoms:

Database operations fail
Application won't start

Solution:

Data Loss Risk

Database corruption may result in data loss. Always maintain backups.

Stop the container:
```
docker stop nekzus
```

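Attempt recovery on the host (paths assume the data volume is at /data; adjust them to your volume path, e.g. ./data):

# Back up the corrupted database, plus its -wal file if present (it may hold committed data)
cp /data/nexus.db /data/nexus.db.corrupt
[ -f /data/nexus.db-wal ] && cp /data/nexus.db-wal /data/nexus.db-wal.corrupt

# Attempt to recover (.recover requires SQLite 3.29 or later)
sqlite3 /data/nexus.db ".recover" | sqlite3 /data/nexus-recovered.db

# Verify the recovered database; this should print: ok
sqlite3 /data/nexus-recovered.db "PRAGMA integrity_check;"

# Replace if recovery succeeded, removing stale WAL/SHM files first
rm -f /data/nexus.db-wal /data/nexus.db-shm
mv /data/nexus-recovered.db /data/nexus.db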

Restore from backup (if recovery fails):

# List available backups
ls -la /data/backups/

# Restore latest backup
cp /data/backups/nexus-backup-latest.db /data/nexus.db

Start fresh (last resort):

rm -f /data/nexus.db /data/nexus.db-wal /data/nexus.db-shm
docker start nekzus
# Re-pair all devices
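Recovery is far easier with a recent backup. A minimal sketch, assuming you run it on the host while the container is stopped: it copies the database plus any `-wal` and `-shm` files into a timestamped directory. `backup_nekzus_db` is a hypothetical helper, and its timestamped layout is separate from the backups listed under /data/backups/.

```shell
# Copy the SQLite database and any WAL/SHM sidecar files into a new
# timestamped directory under dest_root, then print that directory.
backup_nekzus_db() {
  db="$1"          # e.g. /data/nexus.db
  dest_root="$2"   # e.g. /data/backups
  [ -f "$db" ] || { echo "no database at $db" >&2; return 1; }
  dest="$dest_root/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  for f in "$db" "$db-wal" "$db-shm"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}
```

Usage: `docker stop nekzus && backup_nekzus_db /data/nexus.db /data/backups && docker start nekzus`.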

Migration Failures

Error: 'migration failed' on startup

Symptoms:

Application fails to start
Log shows migration error

Solution:

Check the specific migration error:

docker logs nekzus 2>&1 | grep -i "migration"

Common migration issues:

Error	Cause	Solution
`table already exists`	Interrupted migration	Delete and let it recreate
`no such column`	Schema mismatch	Restore from backup
`constraint failed`	Data integrity issue	Check database contents

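Manual migration reset (caution: data loss):

# Stop the container and back up first
docker stop nekzus
cp /data/nexus.db /data/nexus.db.backup

# Remove the database and its WAL/SHM files, then start again
rm -f /data/nexus.db /data/nexus.db-wal /data/nexus.db-shm
docker start nekzus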

Performance Issues

High Memory Usage

Container using excessive memory

Symptoms:

Container exceeds memory limits
OOM kills observed
Memory grows over time

Solution:

Check current memory usage:
```
docker stats nekzus
```
Health check includes memory monitoring:

The application monitors memory and reports health as degraded above 512MB.
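To compare current usage against that threshold from a script, you can parse the `MemUsage` column of `docker stats`. A sketch, assuming Docker's binary units (KiB, MiB, GiB) and treating the documented 512MB as 512 MiB; the function names are hypothetical.

```shell
# Convert the first value of a MemUsage string ("312.4MiB / 1GiB") to whole MiB.
mem_usage_mib() {
  printf '%s\n' "$1" | awk '{
    v = $1
    if (v ~ /GiB$/)      { sub(/GiB$/, "", v); m = v * 1024 }
    else if (v ~ /MiB$/) { sub(/MiB$/, "", v); m = v }
    else if (v ~ /KiB$/) { sub(/KiB$/, "", v); m = v / 1024 }
    else                 { sub(/B$/, "", v);   m = v / 1048576 }
    printf "%d\n", m
  }'
}

# Report against the 512 MiB "degraded" threshold (an assumption, see above).
check_nekzus_memory() {
  usage=$(mem_usage_mib "$1")
  if [ "$usage" -gt 512 ]; then
    echo "degraded: ${usage} MiB in use"
  else
    echo "ok: ${usage} MiB in use"
  fi
}
```

Usage: `check_nekzus_memory "$(docker stats --no-stream --format '{{.MemUsage}}' nekzus)"`.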

Configure resource limits:

# docker-compose.yml
deploy:
  resources:
    limits:
      memory: 1G
    reservations:
      memory: 256M

Check for connection leaks:

docker exec nekzus wget -qO- http://localhost:8080/metrics | grep connections

Slow Response Times

API requests taking longer than expected

Symptoms:

High latency on API calls
Proxied requests slow

Diagnostic Steps:

Check Prometheus metrics:

docker exec nekzus wget -qO- http://localhost:8080/metrics | grep http_request_duration

Check if it's the proxy or API:

# Direct API call
time curl -sk -o /dev/null https://localhost:8443/api/v1/apps

# Proxied request
time curl -sk -o /dev/null https://localhost:8443/apps/grafana/
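`time` reports only the total. To see which phase is slow, curl's `--write-out` timing variables split a request into DNS lookup, connect, TLS handshake, time to first byte, and total. The helper name is hypothetical; `-k` accepts a self-signed certificate like the other examples here.

```shell
# Print curl's timing phases (in seconds) for a single request.
timing_breakdown() {
  curl -sk -o /dev/null \
    -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    "$1"
}

# Usage:
#   timing_breakdown https://localhost:8443/api/v1/apps
#   timing_breakdown https://localhost:8443/apps/grafana/
```

If `ttfb` is much larger for the proxied request than for the direct API call while `connect` and `tls` are similar, the delay is most likely in the upstream service rather than in Nekzus.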

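Check upstream service health:

Visit Dashboard > Service Health to see upstream response times.

Enable request tracing by setting NEKZUS_DEBUG=true in the container environment and recreating the container; `docker restart` keeps the existing environment:
```
# After adding NEKZUS_DEBUG: "true" under environment in docker-compose.yml
docker compose up -d
```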

Connection Pooling

Too many connections to upstream services

Symptoms:

Upstream services rejecting connections
"connection reset" errors under load

Solution:

Nekzus proxies requests through Go's http.Transport with the following connection pooling settings:

Max idle connections: 100
Max connections per host: 100
Idle connection timeout: 90 seconds

For high-traffic scenarios, ensure upstream services can handle the connection count.

Logging and Debugging

Enabling Debug Logs

How to enable verbose logging

Solution:

Via environment variable:

docker run -e NEKZUS_DEBUG=true nekzus
# or
docker run -e NEKZUS_DEBUG=1 nekzus

In docker-compose.yml:
```
environment:
  NEKZUS_DEBUG: "true"
```
Debug output includes:
- HTTP request details
- WebSocket frame information
- Discovery processing details
- Authentication flow details

Reading Container Logs

How to effectively read and filter logs

Useful Log Commands:

# Last 100 lines
docker logs nekzus --tail 100

# Follow logs in real-time
docker logs nekzus -f

# Logs since specific time
docker logs nekzus --since 1h

# Filter for errors only
docker logs nekzus 2>&1 | grep -i "error"

# Filter by component
docker logs nekzus 2>&1 | grep "component=discovery"
docker logs nekzus 2>&1 | grep "component=proxy"
docker logs nekzus 2>&1 | grep "component=auth"
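For a quick overview of which component logs the most, count lines per `component=` value. A sketch, assuming the logfmt-style `component=<name>` fields that the filters above rely on; the function name is hypothetical.

```shell
# Count log lines per component, most frequent first.
log_components() {
  grep -o 'component=[A-Za-z0-9_-]*' | sort | uniq -c | sort -rn
}
```

Usage: `docker logs nekzus 2>&1 | log_components`, or `docker logs nekzus 2>&1 | grep -i "error" | log_components` to count errors per component.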

Common Log Messages

Understanding common log messages

Informational Messages:

Message	Meaning
`storage initialized`	Database connected successfully
`registered docker worker`	Docker discovery active
`scanning containers`	Docker discovery running
`new proposal`	Service discovered, awaiting approval
`config reload: completed successfully`	Hot reload succeeded

Warning Messages:

Message	Meaning	Action
`failed to create docker discovery worker`	Docker unavailable	Check Docker socket mount
`docker discovery will be disabled`	Continuing without Docker	Mount Docker socket if needed
`only docker network ip found`	Host networking issue	Check NEKZUS_BASE_URL
`invalid ack_timeout, using default`	Config parse error	Check config syntax

Error Messages:

Message	Meaning	Action
`migration failed`	Database schema error	Check database permissions
`failed to load TLS certificate`	Certificate issue	Verify cert files exist and are valid
`JWT secret must be at least 32 characters`	Security requirement	Use longer secret

Getting Help

If you cannot resolve your issue using this guide:

Check existing issues: GitHub Issues

Gather diagnostic information:

# System information
docker version
docker info
uname -a

# Container status
docker ps -a | grep nekzus
docker inspect nekzus

# Recent logs
docker logs nekzus --tail 200 > nekzus-logs.txt 2>&1

# Health check
curl -k https://localhost:8443/api/v1/healthz
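`docker inspect` output includes the container's environment variables, which can contain `NEKZUS_JWT_SECRET`. Before attaching it to an issue, you can mask values whose names look secret. The name pattern below (containing SECRET, TOKEN, PASSWORD or KEY) is an assumption, so review the output before sharing it.

```shell
# Replace the value of NAME=value pairs whose name looks secret with [REDACTED].
redact_secrets() {
  sed -E 's/([A-Za-z0-9_]*(SECRET|TOKEN|PASSWORD|KEY)[A-Za-z0-9_]*=)[^",[:space:]]*/\1[REDACTED]/g'
}
```

Usage: `docker inspect nekzus | redact_secrets > nekzus-inspect.txt`.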

Create a new issue with:
- Description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Diagnostic information gathered above
- Configuration (redact secrets)
