Troubleshooting

This guide covers common issues you may encounter when running Nekzus and provides solutions to resolve them.


Quick Diagnostics

Before diving into specific issues, run these diagnostic commands:

# Check container status
docker ps -a | grep nekzus

# View recent logs
docker logs nekzus --tail 100

# Check health endpoint
curl -k https://localhost:8443/api/v1/healthz

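# Enable debug logging: set NEKZUS_DEBUG=true in the container's environment and
# recreate the container (see Enabling Debug Logs). Exporting the variable through
# docker exec only affects that short-lived shell, not the running server.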

Installation Issues

Container Fails to Start

Container exits immediately after starting

Symptoms:

  • Container status shows Exited (1) or similar
  • docker logs shows configuration or initialization errors

Common Causes:

  1. Invalid configuration file
  2. Missing required environment variables
  3. Database initialization failure

Solution:

Check the container logs for specific error messages:

docker logs nekzus 2>&1 | head -50

Common log messages and fixes:

| Log Message | Cause | Solution |
|---|---|---|
| JWT secret must be at least 32 characters | JWT secret too short | Set NEKZUS_JWT_SECRET to a 32+ character string |
| failed to create database directory | Permission denied | Check volume mount permissions |
| failed to load TLS certificate | Invalid certificate files | Verify certificate paths and format |
| JWT secret contains weak pattern | Insecure secret detected | Use a strong random secret in production |
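
Both JWT secret messages come down to the value of NEKZUS_JWT_SECRET. The sketch below shows one way to produce a long random value. The helper names generate_secret and secret_long_enough are illustrative, and the check covers only the 32-character minimum quoted in the log message, not whatever weak-pattern rules Nekzus applies.

```shell
# Sketch: generate a random secret and check the 32-character minimum.
# generate_secret and secret_long_enough are hypothetical helper names.

generate_secret() {
  # 48 random bytes encode to exactly 64 base64 characters.
  head -c 48 /dev/urandom | base64 | tr -d '\n'
}

secret_long_enough() {
  # Succeeds when the argument is at least 32 characters long.
  [ "${#1}" -ge 32 ]
}

secret=$(generate_secret)
if secret_long_enough "$secret"; then
  printf 'NEKZUS_JWT_SECRET=%s\n' "$secret"
else
  echo "secret too short" >&2
fi
```

Store the printed value wherever you keep the container's environment so that it stays the same across restarts.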
Container keeps restarting in a loop

Symptoms:

  • Container shows Restarting status
  • Health checks consistently fail

Solution:

  1. Check if the health check endpoint is accessible:

    docker exec nekzus wget -q -O- http://localhost:8080/api/v1/healthz
  2. Verify resource limits are not too restrictive:

    # docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 1G # Minimum recommended
        reservations:
          memory: 256M
  3. Check if the database is corrupted (see Database Issues)
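
A container that is still initializing can fail a single health probe, so it can help to poll the endpoint a few times before concluding that it is unreachable. The retry helper below is a generic sketch (the function name is an assumption, not part of Nekzus); the commented usage line reuses the command from step 1.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: probe the health endpoint up to 10 times, 3 seconds apart.
# retry 10 3 docker exec nekzus wget -q -O- http://localhost:8080/api/v1/healthz
```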

Port Conflicts

Error: bind: address already in use

Symptoms:

  • Container fails to start
  • Error message mentions port binding failure

Solution:

  1. Identify what's using the port:

    # Check port 8443 (HTTPS)
    sudo lsof -i :8443
    # or
    sudo netstat -tulpn | grep 8443
  2. Either stop the conflicting service or change Nekzus ports:

    # docker-compose.yml
    ports:
      - "9443:8443" # Use port 9443 instead
      - "9080:80"
  3. Update your NEKZUS_BASE_URL to match the new port:

    NEKZUS_BASE_URL=https://your-server:9443
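
The base URL has to use the host side of the port mapping, not the container side. As a small illustration, the hypothetical helper below derives the URL from a mapping written in the HOST:CONTAINER form used above; it does not handle mappings with an IP prefix.

```shell
# base_url_for HOST MAPPING
# Prints https://HOST:HOST_PORT for a mapping of the form HOST_PORT:CONTAINER_PORT.
base_url_for() {
  host=$1
  mapping=$2
  host_port=${mapping%%:*}   # text before the first colon
  printf 'https://%s:%s\n' "$host" "$host_port"
}

base_url_for your-server "9443:8443"   # prints https://your-server:9443
```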

Permission Denied Errors

Permission denied when accessing files or Docker socket

Symptoms:

  • Errors related to file permissions
  • Docker discovery not working
  • Database write failures

Solution:

  1. For database directory:

    # Create data directory with correct permissions
    mkdir -p ./data
    chmod 755 ./data

    # If running as non-root user
    chown 1000:1000 ./data
  2. For Docker socket access:

    # Add read-only mount with correct permissions
    docker run -v /var/run/docker.sock:/var/run/docker.sock:ro nekzus

    # On Linux, ensure user is in docker group
    sudo usermod -aG docker $USER
  3. For certificate files:

    # Ensure certificates are readable
    chmod 644 ./certs/server.crt
    chmod 600 ./certs/server.key

Docker Socket Access

Docker discovery shows 'Docker socket unavailable'

Symptoms:

  • Log message: failed to create Docker client
  • Discovery shows no containers
  • WebSocket event: Docker Discovery - Docker socket unavailable

Solution:

  1. Verify Docker socket is mounted:

    # docker-compose.yml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  2. Check socket path (varies by platform):

    | Platform | Socket Path |
    |---|---|
    | Linux | /var/run/docker.sock |
    | macOS (Docker Desktop) | /var/run/docker.sock |
    | Windows (WSL2) | /var/run/docker.sock |
    | Podman | /run/podman/podman.sock |
    | Rootless Docker | /run/user/1000/docker.sock |
  3. For custom socket paths, set in config:

    # config.yaml
    discovery:
      docker:
        enabled: true
        socket_path: "unix:///run/user/1000/docker.sock"
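
The 1000 in the rootless path is the UID of the user running the Docker daemon, so it differs between systems. A minimal sketch for building the socket_path value, assuming the usual /run/user/UID runtime directory (rootless_socket_uri is an illustrative name):

```shell
# rootless_socket_uri [UID]
# Prints the socket URI for the given UID, defaulting to the current user.
rootless_socket_uri() {
  uid=${1:-$(id -u)}
  printf 'unix:///run/user/%s/docker.sock\n' "$uid"
}

rootless_socket_uri 1000   # prints unix:///run/user/1000/docker.sock
```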

Discovery Issues

Docker Discovery Not Finding Containers

Containers are running but not appearing in discovery

Symptoms:

  • Running containers not showing in proposals
  • Log shows scanning containers but no proposals created

Possible Causes:

  1. Container on different network
  2. Container has no HTTP ports
  3. Container is explicitly disabled
  4. Container is a system container (filtered)

Solution:

  1. Check container labels:

    docker inspect <container> --format '{{json .Config.Labels}}' | jq
  2. Ensure the container has the nekzus.enable=true label or exposes HTTP ports:

    # docker-compose.yml for your service
    labels:
      - "nekzus.enable=true"
      - "nekzus.app.id=myapp"
      - "nekzus.app.name=My Application"
  3. Check network configuration:

    # config.yaml
    discovery:
      docker:
        enabled: true
        networks:
          - nekzus-network # Only scan specific networks
        exclude_networks:
          - host
          - none
  4. Enable debug logging on the container (see Enabling Debug Logs), then check why containers are skipped:

    docker logs nekzus 2>&1 | grep -i "skipping"
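
To check many containers quickly, the label test from steps 1 and 2 can be scripted. This sketch relies on docker inspect printing the labels as compact JSON; labels_enable_nekzus is a hypothetical helper that only looks for the literal label value and does not parse JSON.

```shell
# labels_enable_nekzus LABELS_JSON
# Succeeds when the compact JSON label map contains "nekzus.enable":"true".
labels_enable_nekzus() {
  printf '%s' "$1" | grep -qF '"nekzus.enable":"true"'
}

# Example with a literal label map:
if labels_enable_nekzus '{"nekzus.app.id":"myapp","nekzus.enable":"true"}'; then
  echo "discovery enabled by label"
fi

# Against a real container:
# labels=$(docker inspect <container> --format '{{json .Config.Labels}}')
# labels_enable_nekzus "$labels" || echo "label missing"
```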
Container discovered but HTTP probe fails

Symptoms:

  • Log shows: skipping port - HTTP probe failed
  • Container has exposed ports but none are discovered

Solution:

Nekzus probes ports to verify they serve HTTP. For non-standard setups:

  1. Force discovery of specific port:

    labels:
      - "nekzus.primary_port=3000"
  2. Discover all TCP ports (skip probing):

    labels:
      - "nekzus.discover.all_ports=true"
  3. Check if service is ready: The container might need time to initialize:

    # Check if port responds
    docker exec <container> wget -q --spider http://localhost:3000

mDNS Discovery Failures

mDNS discovery not finding any services

Symptoms:

  • Log shows: worker started - not fully implemented
  • No mDNS services discovered

Current Status:

mDNS discovery is not fully implemented in the current version. The worker starts but does not actively discover services.

Workaround:

  1. Use Docker discovery for containerized services

  2. Manually configure static routes for mDNS services:

    # config.yaml
    routes:
      - route_id: "homeassistant"
        app_id: "homeassistant"
        path_base: "/apps/homeassistant/"
        to: "http://homeassistant.local:8123"

    apps:
      - id: "homeassistant"
        name: "Home Assistant"
        icon: "https://example.com/ha-icon.png"

Kubernetes Service Discovery Problems

Kubernetes discovery shows 'failed to create Kubernetes config'

Symptoms:

  • Log message: failed to create Kubernetes config
  • Kubernetes services not discovered

Solution:

  1. When running inside a Kubernetes cluster:

    Ensure proper RBAC permissions:

    # kubernetes/rbac.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: nekzus
    rules:
      - apiGroups: [""]
        resources: ["services", "namespaces"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["networking.k8s.io"]
        resources: ["ingresses"]
        verbs: ["get", "list", "watch"]
  2. When running outside the cluster:

    Mount the kubeconfig file:

    # docker-compose.yml
    volumes:
      - ~/.kube/config:/app/.kube/config:ro

    environment:
      KUBECONFIG: /app/.kube/config
  3. Configure in config.yaml:

    discovery:
      kubernetes:
        enabled: true
        kubeconfig: "/app/.kube/config"
        namespaces:
          - default
          - production

Services discovered but cannot be accessed

Symptoms:

  • Kubernetes services appear in catalog
  • Proxy returns 502 Bad Gateway

Solution:

  1. Verify network connectivity between Nekzus and cluster:

    # From the Nekzus container
    docker exec nekzus nslookup myservice.default.svc.cluster.local
    docker exec nekzus curl http://myservice.default.svc.cluster.local:8080
  2. Ensure Nekzus can resolve Kubernetes DNS:

    # docker-compose.yml
    dns:
      - 10.96.0.10 # kube-dns service IP
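
The hostnames tested in step 1 follow the standard in-cluster DNS pattern of service, namespace, svc, then the cluster domain. A small sketch for building them, assuming the default cluster.local domain (svc_fqdn is an illustrative helper name):

```shell
# svc_fqdn SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
# Prints the in-cluster DNS name for a service.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

svc_fqdn myservice default   # prints myservice.default.svc.cluster.local
```

The result can be passed to the nslookup and curl checks shown in step 1.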

Authentication Issues

JWT Token Errors

Error: 'TOKEN_EXPIRED' (Code 1001)

Symptoms:

  • API returns 401 with error code TOKEN_EXPIRED
  • Mobile app shows authentication expired

Solution:

  1. Mobile app: The app should automatically attempt to refresh the token
  2. If refresh fails: Re-pair the device by scanning a new QR code
  3. For long-running scripts: Use API keys instead of JWT tokens

Token Lifetime Configuration:

# config.yaml
auth:
  token_ttl: "24h" # Access token lifetime
  refresh_ttl: "720h" # Refresh token lifetime (30 days)
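
The lifetimes are written in hours, which makes longer values hard to read at a glance. The sketch below converts an hour-based value to whole days; it assumes the plain "Nh" form shown above and is not how Nekzus itself parses durations.

```shell
# ttl_hours_to_days TTL
# Converts a value such as "720h" to whole days.
ttl_hours_to_days() {
  hours=${1%h}          # strip the trailing "h"
  echo $((hours / 24))
}

ttl_hours_to_days 720h   # prints 30
ttl_hours_to_days 24h    # prints 1
```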
Error: 'TOKEN_INVALID' (Code 1002)

Symptoms:

  • API returns 401 with error code TOKEN_INVALID
  • Token rejected as malformed

Common Causes:

  1. JWT secret mismatch: Secret changed after token was issued
  2. Token corruption: Token was modified or truncated
  3. Wrong issuer/audience: Token from different Nekzus instance

Solution:

  1. Verify JWT secret consistency:

    # JWT secret should be the same across restarts
    docker exec nekzus printenv | grep JWT_SECRET
  2. Re-pair affected devices with a fresh bootstrap token

Error: 'DEVICE_REVOKED' (Code 1004)

Symptoms:

  • Device cannot authenticate
  • Previously working device suddenly rejected

Solution:

The device was explicitly revoked by an administrator.

  1. Check revocation in the web UI under Devices

  2. To restore access, delete and re-pair the device:

    # Generate new bootstrap token
    curl -X POST https://localhost:8443/api/v1/auth/bootstrap/generate

Mobile App Pairing Failures

QR code scanning works but pairing fails

Symptoms:

  • Mobile app scans QR code successfully
  • Pairing request returns error
  • Log shows: failed pairing attempt

Possible Causes:

  1. Bootstrap token expired (5-minute default lifetime)
  2. Token already used (one-time use)
  3. Rate limiting triggered

Solution:

  1. Generate a fresh QR code (old ones expire after 5 minutes)

  2. Check for rate limiting:

    docker logs nekzus 2>&1 | grep -i "rate"
  3. Wait 1 minute and retry if rate limited
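
When a pairing attempt fails, it is worth checking how long ago the QR code was generated before looking at rate limits. The sketch below models the 5-minute lifetime with Unix timestamps. token_expired is a hypothetical helper, and treating a token as expired at exactly 300 seconds is an assumption about the boundary.

```shell
BOOTSTRAP_TTL_SECONDS=300   # 5 minutes, the default lifetime stated above

# token_expired ISSUED_EPOCH [NOW_EPOCH]
# Succeeds when the token is at least BOOTSTRAP_TTL_SECONDS old.
token_expired() {
  issued=$1
  now=${2:-$(date +%s)}
  [ $((now - issued)) -ge "$BOOTSTRAP_TTL_SECONDS" ]
}

# Example: a code generated 6 minutes (360 seconds) ago.
if token_expired 1000 1360; then
  echo "QR code expired; generate a new one"
fi
```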

Mobile app cannot reach Nekzus server

Symptoms:

  • QR code contains correct URL
  • Mobile app shows connection error

Solution:

  1. Verify network connectivity:

    • Mobile device must be on the same network
    • Check firewall rules allow port 8443
  2. Verify base URL configuration:

    # Should return your server's LAN IP, not localhost
    docker logs nekzus 2>&1 | grep "base_url"
  3. Fix base URL if incorrect:

    NEKZUS_BASE_URL=https://192.168.1.100:8443
  4. Certificate issues: Mobile apps may reject self-signed certificates. Either:

    • Use a trusted certificate (Let's Encrypt)
    • Accept the certificate warning on first connection

API Key Problems

API key returns 401 Unauthorized

Symptoms:

  • API key was working, now returns 401
  • Header X-API-Key is set correctly

Common Causes:

  1. Key revoked or expired
  2. Insufficient scopes
  3. Key not found in database

Solution:

  1. Check key status in the web UI under Settings > API Keys

  2. Verify key has required scopes:

    # Key needs appropriate scopes for the endpoint
    # e.g., "write:*" for deployment operations
  3. Create a new key if the old one is compromised or expired

IP Allowlist Issues

Request rejected even from local network

Symptoms:

  • Requests from LAN return 401
  • Log shows: Failed to parse IP from RemoteAddr

Solution:

  1. Check if behind reverse proxy: When using Caddy/nginx, the real client IP may not be forwarded:

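    # Caddyfile - forward real IP (header_up belongs inside a reverse_proxy block;
    # <nekzus-upstream> is a placeholder for your Nekzus address)
    reverse_proxy <nekzus-upstream> {
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
    }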
  2. Docker network ranges: Ensure Docker bridge networks are recognized:

    The following ranges are automatically recognized as local:

    • 127.0.0.0/8 (loopback)
    • 10.0.0.0/8 (private)
    • 172.16.0.0/12 (private + Docker)
    • 192.168.0.0/16 (private)
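
To see whether a client address falls inside the ranges listed above, the check can be approximated in shell. This is a sketch of the range test only: is_local_ip is an illustrative name, it handles IPv4 text without validating it, and it is not the code Nekzus runs.

```shell
# is_local_ip ADDRESS
# Succeeds when ADDRESS is in 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_local_ip() {
  case $1 in
    127.*|10.*|192.168.*)
      return 0
      ;;
    172.*)
      second=${1#172.}        # drop the first octet
      second=${second%%.*}    # keep only the second octet
      if [ "$second" -ge 16 ] && [ "$second" -le 31 ]; then
        return 0
      fi
      return 1
      ;;
    *)
      return 1
      ;;
  esac
}

for ip in 172.17.0.2 192.168.1.100 8.8.8.8; do
  if is_local_ip "$ip"; then
    echo "$ip local"
  else
    echo "$ip not local"
  fi
done
```

If the address Nekzus logs for a LAN client is outside these ranges, the proxy in front of it is probably not forwarding the real client IP.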

Proxy Issues

WebSocket Connection Failures

WebSocket upgrade fails with 'WebSocket hijacking not supported'

Symptoms:

  • WebSocket connections return 500 error
  • Log shows: WebSocket hijacking not supported

Solution:

This typically occurs when middleware interferes with the connection hijacking.

  1. Ensure the route has WebSocket enabled:

    routes:
      - path_base: /apps/grafana/
        to: http://grafana:3000
        websocket: true # Required for WebSocket support
  2. Check if reverse proxy supports WebSocket upgrade:

    # Caddyfile
    @websocket {
        header Connection *Upgrade*
        header Upgrade websocket
    }
    reverse_proxy @websocket {upstream}

WebSocket connects but data not flowing

Symptoms:

  • WebSocket handshake succeeds (101 Switching Protocols)
  • No messages received after connection

Possible Causes:

  1. Firewall blocking WebSocket frames
  2. Proxy timeout too short
  3. Target service not sending data

Solution:

  1. Increase timeouts if needed:

    # Route-level timeout configuration
    routes:
      - path_base: /apps/grafana/
        websocket: true
        # WebSocket connections have no default timeout
  2. Check upstream service is sending data:

    # Test direct connection to upstream
    websocat ws://grafana:3000/api/live/ws

Proxy Timeouts

Error: 'Gateway Timeout' (504)

Symptoms:

  • Requests hang then return 504
  • Log shows timeout errors

Common Causes:

  1. Upstream service slow to respond
  2. DNS resolution taking too long
  3. Network connectivity issues

Solution:

  1. Check upstream service health:

    # Direct request to upstream
    docker exec nekzus curl -v --max-time 5 http://upstream:8080/
  2. Verify DNS resolution:

    docker exec nekzus nslookup upstream-service
  3. Server timeouts are configured in the application:

    • Read timeout: 15 seconds
    • Write timeout: 30 seconds
    • Idle timeout: 120 seconds

Error: 'Bad Gateway' (502)

Symptoms:

  • Proxy returns 502
  • Upstream service appears to be running

Common Causes and Solutions:

| Error Label | Cause | Solution |
|---|---|---|
| connection_refused | Upstream not listening | Check if service is running and port is correct |
| connection_reset | Upstream closed connection | Check upstream logs for errors |
| host_unreachable | Network issue | Verify container networking |
| dns_error | Cannot resolve hostname | Check DNS configuration |

Debug Steps:

# 1. Check if upstream container is running
docker ps | grep <upstream>

# 2. Test connectivity
docker exec nekzus ping <upstream-hostname>

# 3. Test HTTP connection
docker exec nekzus curl -v http://<upstream>:<port>/

SSL/TLS Certificate Errors

Error: 'x509: certificate signed by unknown authority'

Symptoms:

  • Proxy to HTTPS upstream fails
  • Log shows certificate validation error

Solution:

For self-signed upstream certificates, configure the route to skip verification:

routes:
  - path_base: /apps/myservice/
    to: https://myservice:8443
    tls_skip_verify: true # Only for trusted internal services

Security Note

Only use tls_skip_verify for trusted internal services. For external services, install proper CA certificates.

Mobile app rejects self-signed certificate

Symptoms:

  • Mobile app cannot connect
  • Certificate pinning failure

Solution:

  1. Recommended: Use a trusted certificate (Let's Encrypt via Caddy)

  2. Alternative: Generate certificate with proper SANs:

    # Certificate should include your server's IP and hostname
    openssl req -x509 -newkey rsa:4096 -nodes \
      -keyout server.key -out server.crt -days 365 \
      -subj "/CN=nekzus" \
      -addext "subjectAltName=DNS:nekzus,IP:192.168.1.100"
  3. The QR code pairing process includes certificate SPKI for pinning

Path Rewriting Problems

Application returns 404 for assets or API calls

Symptoms:

  • Main page loads but assets (CSS, JS) fail
  • API calls to wrong path

Common Causes:

  1. Application expects to run at root path
  2. Asset paths are absolute, not relative

Solution:

  1. Configure strip_prefix based on application needs:

    routes:
      # For apps that can handle base paths:
      - path_base: /apps/myapp/
        strip_prefix: true # /apps/myapp/api -> /api

      # For apps that expect full path:
      - path_base: /apps/legacy/
        strip_prefix: false # /apps/legacy/api -> /apps/legacy/api
  2. Enable HTML rewriting for apps with hardcoded paths:

    routes:
      - path_base: /apps/myapp/
        rewrite_html: true # Rewrites absolute paths in HTML
  3. Some applications need environment configuration:

    # For the upstream application
    environment:
      BASE_URL: /apps/myapp
      PUBLIC_PATH: /apps/myapp/
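
The effect of strip_prefix in step 1 can be modelled with a few lines of shell. This is an illustration of the mapping described in the config comments, not the Nekzus implementation, and rewrite_path is a hypothetical helper.

```shell
# rewrite_path PATH_BASE REQUEST_PATH STRIP_PREFIX
# Prints the path that would be sent upstream.
rewrite_path() {
  base=$1
  path=$2
  strip=$3
  case $path in
    "$base"*)
      if [ "$strip" = "true" ]; then
        rest=${path#"$base"}   # remove the leading path_base
        printf '/%s\n' "$rest"
        return 0
      fi
      ;;
  esac
  # No stripping requested, or the path is outside path_base: leave it unchanged.
  printf '%s\n' "$path"
}

rewrite_path /apps/myapp/ /apps/myapp/api true      # prints /api
rewrite_path /apps/legacy/ /apps/legacy/api false   # prints /apps/legacy/api
```

If an asset URL in the browser's network tab does not start with the route's path_base, it never reaches this route, which is the case rewrite_html and the upstream base-path settings are meant to address.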

Database Issues

SQLite Lock Errors

Error: 'database is locked'

Symptoms:

  • Intermittent errors about database locking
  • Operations fail under load

Solution:

Nekzus uses WAL mode and connection pooling to handle concurrent access. If you still see lock errors:

  1. Check for external database access:

    # Ensure no other processes are accessing the database
    lsof +D /path/to/data/
  2. Verify WAL mode is enabled:

    docker exec nekzus sqlite3 /data/nexus.db "PRAGMA journal_mode;"
    # Should return: wal
  3. Account for the busy timeout:

    The application already sets PRAGMA busy_timeout=5000 (5 seconds) by default, so lock errors that persist point to contention lasting longer than that.

  4. Check disk space:

    df -h /path/to/data/

Database Corruption Recovery

Error: 'database disk image is malformed'

Symptoms:

  • Database operations fail
  • Application won't start

Solution:

Data Loss Risk

Database corruption may result in data loss. Always maintain backups.

  1. Stop the container:

    docker stop nekzus
  2. Attempt recovery:

    # Backup corrupted database
    cp /data/nexus.db /data/nexus.db.corrupt

    # Attempt to recover
    sqlite3 /data/nexus.db ".recover" | sqlite3 /data/nexus-recovered.db

    # Verify recovered database
    sqlite3 /data/nexus-recovered.db "PRAGMA integrity_check;"

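    # Replace only if integrity_check printed "ok"; remove stale WAL/SHM files first
    # so they are not applied to the recovered database
    rm -f /data/nexus.db-wal /data/nexus.db-shm
    mv /data/nexus-recovered.db /data/nexus.db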
  3. Restore from backup (if recovery fails):

    # List available backups
    ls -la /data/backups/

    # Restore latest backup
    cp /data/backups/nexus-backup-latest.db /data/nexus.db
  4. Start fresh (last resort):

    rm /data/nexus.db /data/nexus.db-wal /data/nexus.db-shm
    docker start nekzus
    # Re-pair all devices
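
For step 2, the replacement should only happen when the integrity check passes. SQLite prints a single line reading ok for a healthy database, which makes the guard easy to script. The sketch below uses a hypothetical integrity_ok helper and leaves the file operations commented out so that nothing is moved by accident.

```shell
# integrity_ok RESULT
# Succeeds only when PRAGMA integrity_check returned exactly "ok".
integrity_ok() {
  [ "$1" = "ok" ]
}

# Usage with the paths from step 2:
# result=$(sqlite3 /data/nexus-recovered.db "PRAGMA integrity_check;")
# if integrity_ok "$result"; then
#   mv /data/nexus-recovered.db /data/nexus.db
# else
#   echo "recovery incomplete; restore from backup instead" >&2
# fi
```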

Migration Failures

Error: 'migration failed' on startup

Symptoms:

  • Application fails to start
  • Log shows migration error

Solution:

  1. Check the specific migration error:

    docker logs nekzus 2>&1 | grep -i "migration"
  2. Common migration issues:

    | Error | Cause | Solution |
    |---|---|---|
    | table already exists | Interrupted migration | Delete and let it recreate |
    | no such column | Schema mismatch | Restore from backup |
    | constraint failed | Data integrity issue | Check database contents |
  3. Manual migration reset (caution - data loss):

    # Backup first
    cp /data/nexus.db /data/nexus.db.backup

    # Remove and restart
    rm /data/nexus.db
    docker restart nekzus

Performance Issues

High Memory Usage

Container using excessive memory

Symptoms:

  • Container exceeds memory limits
  • OOM kills observed
  • Memory grows over time

Solution:

  1. Check current memory usage:

    docker stats nekzus
  2. Health check includes memory monitoring:

    The application monitors memory and reports health as degraded above 512MB.

  3. Configure resource limits:

    # docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 1G
        reservations:
          memory: 256M
  4. Check for connection leaks:

    docker exec nekzus wget -qO- http://localhost:8080/metrics | grep connections

Slow Response Times

API requests taking longer than expected

Symptoms:

  • High latency on API calls
  • Proxied requests slow

Diagnostic Steps:

  1. Check Prometheus metrics:

    curl -s http://localhost:8080/metrics | grep http_request_duration
  2. Check if it's the proxy or API:

    # Direct API call
    time curl https://localhost:8443/api/v1/apps

    # Proxied request
    time curl https://localhost:8443/apps/grafana/
  3. Check upstream service health:

    Visit Dashboard > Service Health to see upstream response times.

  4. Enable request tracing:

    # Set NEKZUS_DEBUG: "true" in docker-compose.yml, then recreate the container
    docker compose up -d

Connection Pooling

Too many connections to upstream services

Symptoms:

  • Upstream services rejecting connections
  • "connection reset" errors under load

Solution:

Nekzus uses Go's http.Transport with default pooling:

  • Max idle connections: 100
  • Max connections per host: 100
  • Idle connection timeout: 90 seconds

For high-traffic scenarios, ensure upstream services can handle the connection count.


Logging and Debugging

Enabling Debug Logs

How to enable verbose logging

Solution:

  1. Via environment variable:

    docker run -e NEKZUS_DEBUG=true nekzus
    # or
    docker run -e NEKZUS_DEBUG=1 nekzus
  2. In docker-compose.yml:

    environment:
      NEKZUS_DEBUG: "true"
  3. Debug output includes:

    • HTTP request details
    • WebSocket frame information
    • Discovery processing details
    • Authentication flow details

Reading Container Logs

How to effectively read and filter logs

Useful Log Commands:

# Last 100 lines
docker logs nekzus --tail 100

# Follow logs in real-time
docker logs nekzus -f

# Logs since specific time
docker logs nekzus --since 1h

# Filter for errors only
docker logs nekzus 2>&1 | grep -i "error"

# Filter by component
docker logs nekzus 2>&1 | grep "component=discovery"
docker logs nekzus 2>&1 | grep "component=proxy"
docker logs nekzus 2>&1 | grep "component=auth"
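
The filters above can be combined to get a quick error count per component. This sketch assumes log lines carry a component=NAME field, as the grep commands above do, and that error lines contain the word "error"; count_component_errors is an illustrative name.

```shell
# count_component_errors COMPONENT
# Reads log lines on stdin and prints how many lines for COMPONENT mention "error".
count_component_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches, so ignore its status.
  grep "component=$1" | grep -ci "error" || true
}

# Example with sample lines (the exact log layout here is an assumption):
printf '%s\n' \
  'level=error component=proxy msg="upstream timeout"' \
  'level=info component=proxy msg="request served"' \
  'level=error component=auth msg="token invalid"' |
  count_component_errors proxy

# Against the real container:
# docker logs nekzus 2>&1 | count_component_errors discovery
```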

Common Log Messages

Understanding common log messages

Informational Messages:

| Message | Meaning |
|---|---|
| storage initialized | Database connected successfully |
| registered docker worker | Docker discovery active |
| scanning containers | Docker discovery running |
| new proposal | Service discovered, awaiting approval |
| config reload: completed successfully | Hot reload succeeded |

Warning Messages:

| Message | Meaning | Action |
|---|---|---|
| failed to create docker discovery worker | Docker unavailable | Check Docker socket mount |
| docker discovery will be disabled | Continuing without Docker | Mount Docker socket if needed |
| only docker network ip found | Host networking issue | Check NEKZUS_BASE_URL |
| invalid ack_timeout, using default | Config parse error | Check config syntax |

Error Messages:

| Message | Meaning | Action |
|---|---|---|
| migration failed | Database schema error | Check database permissions |
| failed to load TLS certificate | Certificate issue | Verify cert files exist and are valid |
| JWT secret must be at least 32 characters | Security requirement | Use longer secret |

Getting Help

If you cannot resolve your issue using this guide:

  1. Check existing issues: GitHub Issues

  2. Gather diagnostic information:

    # System information
    docker version
    docker info
    uname -a

    # Container status
    docker ps -a | grep nekzus
    docker inspect nekzus

    # Recent logs
    docker logs nekzus --tail 200 > nekzus-logs.txt 2>&1

    # Health check
    curl -k https://localhost:8443/api/v1/healthz
  3. Create a new issue with:

    • Description of the problem
    • Steps to reproduce
    • Expected vs actual behavior
    • Diagnostic information gathered above
    • Configuration (redact secrets)
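
Output from docker inspect and copies of your configuration can contain the JWT secret and other credentials. The filter below is a rough sketch for masking them before posting. redact_secrets is a hypothetical helper that matches key names containing secret, token, or password, so it will also mask harmless settings such as token_ttl, and it is no substitute for reading the file before you share it.

```shell
# redact_secrets
# Reads text on stdin and masks the value of any KEY=value or key: value line
# whose key contains SECRET, TOKEN, or PASSWORD (upper or lower case).
redact_secrets() {
  sed -E 's/^([[:space:]]*"?[A-Za-z0-9_]*(SECRET|TOKEN|PASSWORD|secret|token|password)[A-Za-z0-9_]*"?[[:space:]]*[=:][[:space:]]*).*$/\1<redacted>/'
}

# Example with literal input:
printf '%s\n' 'NEKZUS_JWT_SECRET=abc123' 'NEKZUS_DEBUG=true' | redact_secrets

# With real diagnostics:
# docker inspect nekzus | redact_secrets > nekzus-inspect.txt
```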