Kubernetes & AI Infrastructure

Kubernetes & AI Infrastructure,
Built from Metal Up.

Deep-dive content on Kubernetes at scale, GPU clusters, distributed training, and the real engineering behind production AI systems. No fluff. No slides-only talks.

500+ Microservices in Production
5x CNCF Kubestronaut
GPU Bare-Metal K8s Cluster
AWS Community Builder

What we build here

Engineering education that
ships real infrastructure.

Content grounded in running production systems and hands-on experiments, not textbooks.

☸️

Kubernetes at Scale

Running 500+ microservices across 30+ teams on bare metal. Cilium CNI, BGP routing, Envoy Gateway, etcd internals, and capacity planning from first principles.

GPU Infrastructure & Distributed Training

Multi-node DDP, NCCL collectives, vLLM deployment, tensor parallelism, and GPU operator internals — from real bare-metal clusters with RTX 4090s and A100s.

🔁

MLOps & AI Platform Engineering

MLflow, Argo Workflows, KServe, Kubeflow Pipelines, and the platform layer that keeps AI teams shipping without blocking on infra.

📡

Observability & GitOps

Prometheus, Grafana, Loki, ArgoCD — how to actually run GitOps at scale across 30+ teams without it becoming a mess.

🧠

Transformer Architecture & Inference

Attention mechanics, GQA/MQA, quantization (AWQ/GPTQ), and inference optimisation from a platform engineer's perspective.

🛠️

Career in AI Infrastructure

How to break into top-tier AI infra roles at CoreWeave, Nebius, Lambda Labs — interview prep, portfolio builds, and what actually matters.

Topics

Everything in the stack.

Kubernetes
NVIDIA GPU Operator
vLLM
Distributed Training
PyTorch DDP
NCCL
Envoy Gateway
ArgoCD
MLflow
Argo Workflows
Terraform
KServe
Kubeflow Pipelines
Prometheus & Grafana
HPC Networking

Credentials

Built by someone
running production.

Not a course creator who read the docs. A Senior Platform Engineer who ships this daily.

🏆

CNCF Kubestronaut

All five CNCF Kubernetes certifications - CKA, CKAD, CKS, KCNA, KCSA

☁️

AWS Community Builder

Containers category - recognised contributor to the AWS ecosystem

⚙️

500+ Microservice Cluster

Managing bare-metal Kubernetes for 30+ engineering teams in production

🖥️

GPU Infrastructure

Running GPU infrastructure for vLLM inference and distributed training in production

🔬

HashiCorp Terraform Associate

Infrastructure as Code practitioner across multi-cloud and bare-metal

The person behind it

Isreal Urephu

Founder & Senior Platform Engineer

Not just content.
Built from production.

I'm a Senior Platform Engineer with over 5 years of experience designing and operating large-scale Kubernetes platforms in production. I currently manage a bare-metal Kubernetes cluster hosting more than 500 microservices across 30+ engineering teams and operate GPU infrastructure for vLLM inference and distributed AI training workloads.

Over the years, I've worked closely with AI and machine learning engineers to understand the challenges of building, deploying, and operating AI infrastructure at scale. That experience has deepened my expertise in GPU infrastructure, Kubernetes, and the systems that power modern AI workloads.

I'm also an AWS Community Builder and a CNCF Kubestronaut, holding all five required certifications (CKA, CKAD, CKS, KCNA, KCSA).

Barilon is where I document what I'm building, operating, and learning—no slides-only content or theory disconnected from practice. Most of what I share comes from real production systems I work on, while the rest is based on hands-on experiments, home lab projects, and deep dives into technologies I'm actively exploring.

CNCF Kubestronaut
AWS Community Builder
Lead Platform Engineer
Kubernetes at Scale

Latest videos

Watch & Learn.

Subscribe on YouTube →

Everything That Happens to Secure an App Before It Reaches Kubernetes

In this video, I walk through how a senior DevOps/Platform engineer would secure an application from a developer machine all the way to a Kubernetes cluster.

We cover:
✅ Developer workstation security
✅ Git and source control security
✅ CI/CD pipeline security
✅ Container image scanning
✅ Registry security
✅ Kubernetes RBAC
✅ Secrets management
✅ Network policies

This is a common senior DevOps and platform engineering interview question and a real-world approach used in modern cloud-native environments.

#DevOps #Kubernetes #DevSecOps #PlatformEngineering #CloudSecurity #Docker #AWS

Jun 16, 2026

Linux Process Isolation Explained | cgroups, Namespaces & Docker

In this video, you’ll learn what process isolation is in Linux and how it’s achieved using cgroups and namespaces. We’ll also explore how Docker uses these Linux features under the hood to run containers.

We’ll start by examining how processes look when there’s no isolation and everything runs on the same host. Then we’ll compare that to running each application on its own server, and finally to running multiple apps on the same host while keeping them isolated from each other.

At the end of the video, we’ll do some hands-on experiments to see what gets created when a Docker container spins up, giving you a clear picture of how containerisation really works.
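As a rough illustration of the namespace and cgroup ideas described above (this is not code from the video), the sketch below reads the namespace links and cgroup membership that Linux exposes under /proc. It assumes a Linux host with /proc mounted. The helper names `parse_ns_link`, `list_namespaces`, and `read_cgroups` are hypothetical and chosen only for this example.

```python
import os
import re
from pathlib import Path

# Namespace symlinks under /proc/<pid>/ns have targets like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def list_namespaces(pid: str = "self") -> dict[str, int]:
    """Return {entry name: namespace inode} for a process.

    Returns an empty dict when /proc/<pid>/ns does not exist
    (for example on a non-Linux host).
    """
    ns_dir = Path("/proc") / pid / "ns"
    namespaces: dict[str, int] = {}
    if not ns_dir.is_dir():
        return namespaces
    for entry in sorted(ns_dir.iterdir()):
        try:
            _, inode = parse_ns_link(os.readlink(entry))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        namespaces[entry.name] = inode
    return namespaces


def read_cgroups(pid: str = "self") -> list[str]:
    """Return the raw lines of /proc/<pid>/cgroup, or [] if unreadable."""
    try:
        return (Path("/proc") / pid / "cgroup").read_text().splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    # Two processes share a namespace when the inode numbers match.
    for name, inode in list_namespaces().items():
        print(f"{name:20s} {inode}")
    for line in read_cgroups():
        print("cgroup:", line)
```

Running this once on the host and once inside a container, then comparing the inode numbers, is one way to see which namespaces the container runtime created for the container.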

Sep 15, 2025

How to Build a Kubernetes Cluster with kubeadm and Cilium CNI (Beginner-Friendly Tutorial)

Learn how to create a Kubernetes cluster from scratch using kubeadm and install Cilium as your Container Network Interface (CNI), replacing kube-proxy and unlocking powerful eBPF-based networking. This step-by-step tutorial walks you through provisioning AWS infrastructure, installing Kubernetes components, configuring Cilium, and deploying a sample application to verify the setup.

What You’ll Learn in This Tutorial
- How to create a VPC in AWS to build an isolated Kubernetes environment
- Setting up Security Groups for master and worker nodes
- Provisioning 3 EC2 instances (1 Control Plane + 2 Worker Nodes)
- Installing kubeadm, kubelet, kubectl, and required components on each node
- Initializing the Kubernetes control plane using kubeadm
- Joining all worker nodes to the cluster
- Installing Cilium CNI as the networking layer (and kube-proxy replacement)
- Deploying a test application to confirm everything is working correctly

Links:
- Kubernetes docs for kubeadm installation: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
- Kubernetes docs for creating the cluster: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
- containerd installation guide: https://docs.docker.com/engine/install/ubuntu/
- Cilium docs and installation: https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/

Follow me on other social media:
TikTok: https://www.tiktok.com/@barilonofficial
Instagram: https://www.instagram.com/barilonofficial/
LinkedIn: https://www.linkedin.com/in/isrealurephu/
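One way to confirm that the control plane and both workers joined the cluster is to check each node's `Ready` condition. The sketch below is an illustration and is not taken from the tutorial. It parses the JSON that `kubectl get nodes -o json` returns. The function name `ready_nodes`, the node names, and the sample payload are all hypothetical, and the payload is trimmed to the fields the function reads.

```python
import json


def ready_nodes(nodes_json: str) -> dict[str, bool]:
    """Map each node name to whether its Ready condition is "True".

    Expects the JSON text produced by `kubectl get nodes -o json`.
    """
    nodes = json.loads(nodes_json)
    result: dict[str, bool] = {}
    for item in nodes.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        result[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return result


# Hypothetical, trimmed sample for a 1 control plane + 2 worker cluster.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "control-plane"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "worker-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "worker-2"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
})

if __name__ == "__main__":
    for node, is_ready in ready_nodes(SAMPLE).items():
        print(f"{node}: {'Ready' if is_ready else 'NotReady'}")
```

A node typically reports NotReady until a CNI plugin is running, so a check like this is most useful after the Cilium installation step.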

Dec 8, 2025

Newsletter

Stay sharp.

Weekly breakdowns on Kubernetes internals, GPU infrastructure, and AI platform engineering. Written by a practitioner, not a content farm.

No spam. Unsubscribe anytime.

Contact

Let's work together.

Whether it's a workshop, a technical consultation, a partnership, or just a conversation about Kubernetes and AI infrastructure, I'm open to it.
