Introduction
Tailoring a resume for each job posting is one of those tasks that sounds straightforward until you are doing it at volume. Every role has a different emphasis — one wants Terraform and GCP, the next wants Ansible and on-prem Kubernetes. Swapping out bullet points, reordering sections, and adjusting the summary manually for each posting is repetitive, error-prone, and slow enough that it often does not happen at all.
The alternative most people reach for is a generic resume that fits no role especially well. Neither outcome is good.
This post describes a two-stage pipeline running on a homelab Kubernetes cluster that solves this with local and cloud LLMs: a Node.js-based job ingestion script that parses job postings on demand, and a daily CronJob that reads those queued jobs and produces tailored resume variants using Gemini 2.5 Flash — one per posting, every morning at 7am PDT, ready to use.
The Two Stages
Stage 1: Job Ingestion via OpenClaw
The first stage is on-demand. When a job posting looks interesting, a single chat command queues it. The heavy lifting is done by parse_job_links.js, a Node.js script running inside OpenClaw — a Chrome-based browser automation pod that already runs in the cluster.
```bash
node /scripts/parse_job_links.js <url1> [url2] ...
```
The script connects to a headless Chrome instance via the Chrome DevTools Protocol (CDP) on localhost:9222. This matters because modern job boards — Greenhouse, Lever, Workday — render their content with JavaScript. A plain HTTP fetch gets you a skeleton. CDP gets you the rendered DOM.
After each page loads, the script waits 2.5 seconds for JS-rendered content to settle before extracting the page text. That text is then sent to a local Llama 3.1 model running behind LiteLLM at:
```
litellm-service.llama.svc.cluster.local:4000
```
The LLM extracts three fields: company, title, and description. Because LLM output is not always clean JSON — models frequently append explanatory text after the closing brace — the script uses brace-counting to find the JSON boundary rather than a naive JSON.parse() on the full response.
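The brace-counting idea can be sketched in a few lines. The original script is Node.js; this is a Python illustration of the same technique, and extract_first_json is a hypothetical name, not the script's actual function:

```python
import json

def extract_first_json(text: str):
    """Parse the first balanced {...} object found in text.

    Tracks brace depth, skipping braces that appear inside JSON strings,
    so any commentary the model appends after the closing brace is ignored.
    """
    start = text.find('{')
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced JSON object")
```

A response like `{"company": "Acme", "title": "SRE"} Hope that helps!` parses cleanly, where a naive json.loads on the full string would raise.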
Extracted jobs are appended to /mnt/gigs/jobs/pending.json. Duplicate detection is ID-based, so re-submitting a URL you already queued is a no-op. A short summary is posted back to the OpenClaw chat channel when the run completes.
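The append-with-dedupe step might look like this — a minimal Python sketch (the real script is Node.js, and queue_jobs plus the exact record layout are assumptions):

```python
import json
from pathlib import Path

def queue_jobs(new_jobs, path="pending.json"):
    """Append jobs to the pending queue, skipping IDs already present."""
    p = Path(path)
    pending = json.loads(p.read_text()) if p.exists() else []
    seen = {job["id"] for job in pending}
    added = [job for job in new_jobs if job["id"] not in seen]
    p.write_text(json.dumps(pending + added, indent=2))
    return len(added)  # re-submitting a known ID adds nothing
```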
Why a local model for this step? Llama 3.1 is fast, costs nothing per call, and the extraction task is simple enough that the model handles it reliably. There is no reason to burn Gemini quota on structured field extraction from a page of text.
Stage 2: Daily Resume Tailoring (CronJob)
The second stage runs automatically every morning. A Kubernetes CronJob fires at 0 14 * * * UTC — 7am PDT / 6am PST — long enough after midnight that any late-night queuing from the previous day is captured.
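The UTC-to-local arithmetic is easy to get backwards across DST; a quick sanity check with Python's standard-library zoneinfo (not part of the pipeline):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")

# 14:00 UTC in July (daylight time, UTC-7) lands at 7am PDT
summer = datetime(2025, 7, 1, 14, 0, tzinfo=timezone.utc).astimezone(pacific)
# 14:00 UTC in January (standard time, UTC-8) lands at 6am PST
winter = datetime(2025, 1, 15, 14, 0, tzinfo=timezone.utc).astimezone(pacific)
print(summer.hour, winter.hour)  # 7 6
```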
CronJob spec (abbreviated):
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-brief
  namespace: gigs
spec:
  schedule: "0 14 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-availability: "24x7"
          containers:
          - name: morning-brief
            image: z2tone/morning-brief:v1.0.0
            volumeMounts:
            - name: gigs-nfs
              mountPath: /mnt/gigs
```
The node-availability: "24x7" nodeSelector ensures the job lands on a node that is always running, not one that spins down overnight.
What the job does
- Reads pending.json and filters out any IDs already in processed.json. This makes the run idempotent — re-running after a partial failure will not re-process completed jobs.
- Loads all three master resume PDFs once at startup, before the per-job loop: <your-initials>_DevOps.pdf, <your-initials>_CloudEngineer.pdf, and <your-initials>_SiteReliability.pdf. These live on an NFS volume at /mnt/gigs/resumes/. Storing them as PDFs on NFS means updating a resume is a file replacement — no image rebuild, no redeploy.
- For each new job, makes a single Gemini 2.5 Flash call that combines classification and tailoring.
The classify-and-tailor function
```python
def classify_and_tailor(master_texts, job_title, company, description):
    resumes_block = '\n\n'.join(
        f'[{k} RESUME]\n{v}' for k, v in master_texts.items()
    )
    prompt = (
        f"JOB TITLE: {job_title}\nCOMPANY: {company}\n\n"
        f"JOB DESCRIPTION:\n{description[:3000]}\n\n"
        f"MASTER RESUMES:\n{resumes_block}\n\n"
        f"1. Classify into: DevOps, CloudEngineer, SRE, Other\n"
        f"2. Select + tailor the corresponding resume\n\n"
        f"OUTPUT FORMAT:\nTYPE: <category>\n---\n<tailored resume>\n"
    )
    result = call_gemini(prompt)
    parts = result.split('---\n', 1)
    job_type = parts[0].replace('TYPE:', '').strip()
    tailored = parts[1].strip() if len(parts) > 1 else result
    return job_type, tailored
```
Combining classification and tailoring into one call halves Gemini API usage versus the obvious two-call approach — classify first, then tailor. The model is given all three master resumes in the same prompt and asked to both pick the right one and produce the tailored output. The --- delimiter makes the response easy to split deterministically.
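If the model omits the delimiter, the fallback keeps the raw text, and an unknown category can be coerced to Other — a guard the pipeline above does not show, added here as a hedged suggestion:

```python
ALLOWED = {"DevOps", "CloudEngineer", "SRE", "Other"}

def parse_response(result: str):
    """Split a 'TYPE: <category>\\n---\\n<resume>' response into its parts."""
    parts = result.split('---\n', 1)
    job_type = parts[0].replace('TYPE:', '').strip()
    tailored = parts[1].strip() if len(parts) > 1 else result
    if job_type not in ALLOWED:
        job_type = "Other"  # guard against free-form model output
    return job_type, tailored
```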
Rate limiting
Gemini 2.5 Flash on the free tier caps at 10 RPM. With one API call per job, a 7-second sleep between jobs keeps throughput safely under that limit:
```python
import time

for job in new_jobs:
    job_type, tailored = classify_and_tailor(
        master_texts,
        job["title"],
        job["company"],
        job["description"]
    )
    save_tailored_resume(job, job_type, tailored)
    update_processed(job["id"])
    time.sleep(7)
```
On 429, 500, or 503 responses, the job retries with exponential backoff: 10 * (2 ** attempt) seconds — 10s, 20s, 40s — before failing that job and moving on.
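That retry logic can be sketched as a small wrapper. ApiError with a .status attribute is a stand-in here; the real Gemini client raises library-specific exceptions:

```python
import time

RETRYABLE = {429, 500, 503}

class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def call_with_backoff(fn, max_attempts=3, sleep=time.sleep):
    """Retry fn on transient errors, backing off 10s, 20s, 40s."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as e:
            if e.status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-transient, or out of attempts: fail this job
            sleep(10 * (2 ** attempt))  # 10, 20, 40 seconds
```

Injecting sleep makes the backoff schedule testable without actually waiting.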
Output
Tailored resumes are written to:
```
/mnt/gigs/resumes/tailored/YYYY-MM/{Company}_{Title}.txt
```
After each job completes, processed.json is updated. At the end of the run, a Telegram message is sent to a dedicated gigs channel summarizing how many resumes were produced and flagging any failures.
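Company and title strings come from scraped pages, so they need sanitizing before use in a filename. A sketch under assumed naming conventions (safe_name and output_path are illustrative helpers, not the job's actual code):

```python
import re
from datetime import date

def safe_name(s: str) -> str:
    """Drop filesystem-hostile characters and squeeze out whitespace."""
    s = re.sub(r'[\\/:*?"<>|]', '', s)
    return re.sub(r'\s+', '', s.strip())

def output_path(company: str, title: str, when: date) -> str:
    month_dir = when.strftime('%Y-%m')
    return (f"/mnt/gigs/resumes/tailored/{month_dir}/"
            f"{safe_name(company)}_{safe_name(title)}.txt")
```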
Key Design Decisions
Local LLM for extraction, cloud LLM for tailoring. Llama 3.1 via LiteLLM handles structured field extraction cheaply and quickly. Gemini 2.5 Flash handles the quality-sensitive tailoring work. Using the right tool at each step keeps costs near zero without compromising output quality.
Single combined classify+tailor call. The naive implementation would classify the job first, then make a second call to tailor the selected resume. One prompt with all three resumes and a structured output format achieves both in one shot. At 10 RPM free-tier limits, this matters.
Idempotent processing with processed.json. The CronJob can be re-triggered manually or after a crash without duplicating work. Each job ID is recorded on success, so partial runs pick up where they left off.
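The idempotency check reduces to a set difference over IDs — a minimal sketch with assumed file layouts (pending.json as a list of job objects, processed.json as a list of IDs):

```python
import json
from pathlib import Path

def load_new_jobs(pending_path, processed_path):
    """Return pending jobs whose IDs are not yet in processed.json."""
    pending = json.loads(Path(pending_path).read_text())
    done_file = Path(processed_path)
    done = set(json.loads(done_file.read_text())) if done_file.exists() else set()
    return [job for job in pending if job["id"] not in done]
```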
PDFs on NFS for master resumes. Rebuilding a container image to update a resume would be absurd. The three master PDFs live on a shared NFS volume. Updating a resume is a file copy. The next CronJob run picks up the new version automatically.
Retry with exponential backoff. Free-tier rate limit errors are transient. A three-attempt backoff sequence (10s, 20s, 40s) handles brief quota exhaustion without manual intervention, while still failing fast enough to not stall the rest of the run.
Conclusion
This pipeline took a manual, inconsistent process and turned it into a reliable daily routine. Job postings get queued during the week through a single chat command, and every morning the cluster produces a set of tailored resume variants ready to attach to an application.
The architecture is deliberately simple — a Node.js script for ingestion, a Python CronJob for processing, two LLMs doing what each is actually good at. There are no message queues, no complex orchestration frameworks, no persistent services beyond what was already running. The Kubernetes primitives (CronJob, NFS PVC, node selectors) are sufficient.
If you are running a similar homelab setup and managing an active job search, this pattern is worth adapting. The main extension points are adding more master resume variants, routing the “Other” job type to a fallback model, and wiring in application tracking to close the loop after a resume is sent.