Adding loki support

parent c8b7f13e09
commit ae97496b10

@@ -0,0 +1,102 @@

# Debugging Guide

This guide provides common commands and procedures for debugging the Kubernetes infrastructure, with a focus on Monitoring, Loki, and Grafana.

## Prerequisites

Ensure you have your environment activated:

```bash
source ~/bin/activate-stg  # or activate-prod
```

## Monitoring & Logging (Loki, Grafana, Prometheus)

### 1. Quick Status Check

Check if all pods are running in the relevant namespaces:

```bash
kubectl get pods -n monitoring
kubectl get pods -n loki
```

### 2. Verifying Loki Log Ingestion

**Check if Loki is receiving logs:**

Paradoxically, "duplicate entry" errors are a *good* sign that logs are reaching Loki (they just mean retries are happening).

```bash
kubectl -n loki logs -l app.kubernetes.io/name=loki --tail=50
```

**Check if the Grafana Agent is sending logs:**

The Agent runs as a DaemonSet. Check the logs of one of the agent pods:

```bash
kubectl -n loki logs -l app.kubernetes.io/name=grafana-agent -c grafana-agent --tail=50
```

Look for errors like `401 Unauthorized` or `403 Forbidden`.
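To scan for those auth failures directly, pipe the agent logs through `grep`. A minimal offline sketch with a fabricated log line (real agent log formats vary, so adjust the pattern to what you actually see):

```shell
# Fabricated log line standing in for real agent output.
logline='ts=2024-01-01T00:00:00Z level=error msg="error sending batch" status=401'

# Pipe real logs through the same filter, e.g.:
#   kubectl -n loki logs -l app.kubernetes.io/name=grafana-agent -c grafana-agent | grep -E 'status=(401|403)'
printf '%s\n' "$logline" | grep -E 'status=(401|403)'
```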
**Inspect Agent Configuration:**

Verify the Agent is actually configured to scrape what you expect:

```bash
kubectl -n loki get secret loki-logs-config -o jsonpath='{.data.agent\.yml}' | base64 -d
```
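The `\.` in the jsonpath escapes the dot in the key name (`agent.yml`), since `.` would otherwise be treated as a path separator, and `base64 -d` reverses the encoding Kubernetes applies to Secret data. An offline sketch of that decode step (the YAML content here is made up purely for illustration):

```shell
# Secret values are stored base64-encoded; simulate the jsonpath output
# by encoding a tiny YAML snippet, then decode it as the command above does.
encoded=$(printf 'server:\n  log_level: info\n' | base64)
printf '%s' "$encoded" | base64 -d
```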
### 3. Debugging Grafana

**Check Grafana Logs:**

Look for datasource provisioning errors or plugin issues:

```bash
kubectl -n monitoring logs deployment/monitoring-grafana --tail=100
```

**Verify Datasource Provisioning:**

Grafana uses a sidecar to watch secrets and provision datasources. Check its logs:

```bash
kubectl -n monitoring logs deployment/monitoring-grafana -c grafana-sc-datasources --tail=100
```

**Inspect Provisioned Datasource File:**

Check the actual file generated inside the Grafana pod to ensure `uid`, `url`, etc. are correct:

```bash
kubectl -n monitoring exec deployment/monitoring-grafana -c grafana -- cat /etc/grafana/provisioning/datasources/datasource.yaml
```
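For comparison, a correctly provisioned file should contain an entry matching the `additionalDataSources` values used elsewhere in this commit. The `apiVersion`/`datasources` wrapper below is the standard Grafana provisioning file format, sketched here rather than copied from a real pod:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    uid: loki
    url: http://loki.loki.svc:3100
    access: proxy
```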
**Restart Grafana:**

If you suspect configuration hasn't been picked up:

```bash
kubectl -n monitoring rollout restart deployment/monitoring-grafana
```

### 4. Connectivity Verification (The "Nuclear" Option)

If the dashboard is empty but you think everything is working, run a query **directly from the Grafana pod** to Loki. This bypasses the UI and confirms network connectivity and data availability.

**Test Connectivity:**

```bash
kubectl -n monitoring exec deployment/monitoring-grafana -- curl -s "http://loki.loki.svc:3100/loki/api/v1/labels"
```

**Query Actual Logs:**

This asks Loki for the last 10 log lines for *any* job. If this returns JSON data, Loki is working.

```bash
kubectl -n monitoring exec deployment/monitoring-grafana -- curl -G -s "http://loki.loki.svc:3100/loki/api/v1/query_range" --data-urlencode 'query={job=~".+"}' --data-urlencode 'limit=10'
```
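A successful `query_range` reply is JSON with `"status":"success"` and a non-empty `data.result` array of streams; an empty `result` means connectivity is fine but nothing matched the query. A quick offline sketch of that check (the response payload here is fabricated and trimmed for illustration):

```shell
# Fabricated example of a healthy query_range response.
response='{"status":"success","data":{"resultType":"streams","result":[{"stream":{"job":"loki/loki"},"values":[["1700000000000000000","hello"]]}]}}'

# The same sanity check works on real curl output piped in.
printf '%s' "$response" | grep -q '"status":"success"' && echo "loki responded"
```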
## Common Issues & Fixes

### "Datasource not found" in Dashboard

* **Cause**: The dashboard expects a specific Datasource UID (e.g., `uid: loki`), but Grafana generated a random one.
* **Fix**: Ensure `values.yaml` explicitly sets the UID:

```yaml
additionalDataSources:
  - name: Loki
    uid: loki # <--- Critical
```

### Logs not showing in Loki

* **Cause**: The `PodLogs` resource might be missing, or the Agent doesn't have permissions.
* **Check**:
  1. Ensure `ClusterRole` has `pods/log` permission.
  2. Ensure `LogsInstance` selector matches your `PodLogs` definition (or use `{}` to match all).
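Both checks map onto the chart values touched by this commit; as a rough sketch (the key nesting here is reconstructed, so verify it against your chart's values schema):

```yaml
monitoring:
  selfMonitoring:
    extraClusterRoleRules:            # check 1: let the Agent read pod logs
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
    grafanaAgent:
      logsInstance:
        podLogsSelector:
          matchLabels: {}             # check 2: empty selector matches all PodLogs
```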
### ArgoCD Out of Sync

* **Cause**: Sometimes ArgoCD doesn't auto-prune resources or fails to update Secrets.
* **Fix**: Sync manually with "Prune" enabled, or delete the conflicting resource manually if safe.
@@ -0,0 +1 @@

export KUBECONFIG=~/workspace/jamkazam/k8s/stg-video-cluster-kubeconfig.yaml
@@ -0,0 +1,46 @@

apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  name: all-logs-fixed
  labels:
    app.kubernetes.io/name: loki
    app.kubernetes.io/instance: loki
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels: {}
  relabelings:
    - action: replace
      sourceLabels:
        - __meta_kubernetes_pod_node_name
      targetLabel: __host__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - action: replace
      replacement: $1
      separator: '-'
      sourceLabels:
        - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - __meta_kubernetes_pod_label_app_kubernetes_io_component
      targetLabel: __service__
    - action: replace
      replacement: $1
      separator: /
      sourceLabels:
        - __meta_kubernetes_namespace
        - __service__
      targetLabel: job
    - action: replace
      sourceLabels:
        - __meta_kubernetes_pod_container_name
      targetLabel: container
    - action: replace
      sourceLabels:
        - __meta_kubernetes_namespace
      targetLabel: namespace
    - action: replace
      replacement: loki
      targetLabel: cluster
  pipelineStages:
    - cri: {}
@@ -39,6 +39,27 @@ loki:
    retention_deletes_enabled: true
    retention_period: 672h

  monitoring:
    selfMonitoring:
      # -- Grafana Agent annotations
      annotations: {}
      # -- Additional Grafana Agent labels
      labels: {}
      # -- Enable the config read api on port 8080 of the agent
      enableConfigReadAPI: false
      extraClusterRoleRules:
        - apiGroups: [""]
          resources: ["pods/log"]
          verbs: ["get", "list", "watch"]
      grafanaAgent:
        podLogs:
          enabled: false
          namespaceSelector:
            any: true
        logsInstance:
          podLogsSelector:
            matchLabels: {}

  singleBinary:
    replicas: 1
    persistence:
@@ -192,6 +192,7 @@ kube-prometheus-stack:
    additionalDataSources:
      - name: Loki
        type: loki
        uid: loki
        url: http://loki.loki.svc:3100
        access: proxy
@ -192,6 +192,7 @@ kube-prometheus-stack:
|
||||||
additionalDataSources:
|
additionalDataSources:
|
||||||
- name: Loki
|
- name: Loki
|
||||||
type: loki
|
type: loki
|
||||||
|
uid: loki
|
||||||
url: http://loki.loki.svc:3100
|
url: http://loki.loki.svc:3100
|
||||||
access: proxy
|
access: proxy
|
||||||
|
|
||||||
|
|
|
||||||