Kubernetes & AI Infrastructure

Kubernetes & AI Infrastructure,
Built from Metal Up.

Deep-dive content on Kubernetes at scale, GPU clusters, distributed training, and the real engineering behind production AI systems. No fluff. No slides-only talks.

500+ Microservices in Production
5x CNCF Kubestronaut
GPU Bare-Metal Cluster

What we build here

Engineering education that
ships real infrastructure.

The content here comes from running production systems and hands-on experiments, not textbooks.

☸️

Kubernetes at Scale

Running 500+ microservices across 30+ teams on bare metal. Cilium CNI, BGP routing, Envoy Gateway, etcd internals, and capacity planning from first principles.

GPU Infrastructure & Distributed Training

Multi-node DDP, NCCL collectives, vLLM deployment, tensor parallelism, and GPU operator internals — from real bare-metal clusters with RTX 4090s and A100s.

🔁

MLOps & AI Platform Engineering

MLflow, Argo Workflows, KServe, Kubeflow Pipelines, and the platform layer that keeps AI teams shipping without blocking on infra.

📡

Observability & GitOps

Prometheus, Grafana, Loki, ArgoCD — how to actually run GitOps at scale across 30+ teams without it becoming a mess.

🧠

Transformer Architecture & Inference

Attention mechanics, GQA/MQA, quantization (AWQ/GPTQ), and inference optimisation from a platform engineer's perspective.

🛠️

Career in AI Infrastructure

How to break into top-tier AI infra roles at companies like CoreWeave, Nebius, and Lambda Labs: interview prep, portfolio builds, and what actually matters.

Topics

Everything in the stack.

Kubernetes
NVIDIA GPU Operator
vLLM
Distributed Training
PyTorch DDP
NCCL
Cilium CNI
Envoy Gateway
ArgoCD
MLflow
Argo Workflows
Terraform
KServe
Kubeflow Pipelines
QLoRA Finetuning
Axolotl
Prometheus & Grafana
Rook-Ceph
BGP Routing
HPC Networking
Transformer Architecture
AWQ Quantization

Credentials

Built by someone
running production.

Not a course creator who read the docs. A Senior Platform Engineer who ships this daily.

🏆

CNCF Kubestronaut

All five CNCF Kubernetes certifications — CKA, CKAD, CKS, KCNA, KCSA

☁️

AWS Community Builder

Containers category — recognised contributor to the AWS ecosystem

⚙️

500+ Microservice Cluster

Managing bare-metal Kubernetes for 30+ engineering teams in production

🖥️

GPU Infrastructure

Operating GPU infrastructure for vLLM inference and distributed AI training in production

🔬

HashiCorp Terraform Associate

Infrastructure as Code practitioner across multi-cloud and bare-metal

The person behind it

Isreal Urephu

Founder & Senior Platform Engineer

Not just content.
Built from production.

I'm a Lead Platform Engineer with over 5 years of experience designing and operating large-scale Kubernetes platforms in production. I currently manage a bare-metal Kubernetes cluster hosting more than 500 microservices across 30+ engineering teams and operate GPU infrastructure for vLLM inference and distributed AI training workloads.

Over the years, I've worked closely with AI and machine learning engineers to understand the challenges of building, deploying, and operating AI infrastructure at scale. That experience has deepened my expertise in GPU infrastructure, Kubernetes, and the systems that power modern AI workloads.

I'm also an AWS Community Builder and a CNCF Kubestronaut, holding all five CNCF Kubernetes certifications (CKA, CKAD, CKS, KCNA, KCSA).

Barilon is where I document what I'm building, operating, and learning—no slides-only content or theory disconnected from practice. Most of what I share comes from real production systems I work on, while the rest is based on hands-on experiments, home lab projects, and deep dives into technologies I'm actively exploring.

CNCF Kubestronaut
AWS Community Builder
Lead Platform Engineer
Kubernetes at Scale

Newsletter

Stay sharp.

Weekly breakdowns on Kubernetes internals, GPU infrastructure, and AI platform engineering. Written by a practitioner, not a content farm.

No spam. Unsubscribe anytime.

Contact

Let's work together.

Whether it's a workshop, a technical consultation, a partnership, or just a conversation about Kubernetes and AI infrastructure — I'm open to it.
