DevOps · Site Reliability Engineering

Sai Chandu Machavarapu

Cloud & Platform Infrastructure — AWS · Azure · GCP

I design cloud-native platforms and reliability engineering practices that ship faster, cost less, and stay up.

I engineer reliable, self-healing cloud infrastructure — designing platforms that scale gracefully, deploy fearlessly, and stay up when it matters most.

99.9% uptime maintained

2× deployment frequency

40% reduction in MTTR

5 years across cloud domains

Professional Experience

Building reliable platforms at scale.

Five years operating cloud infrastructure across fintech, healthcare, and enterprise workloads — from Kubernetes and IaC to SRE, observability, and cost optimization.

Koch Industries
DevOps Engineer / Site Reliability Engineer
Mar 2026 – Present
New York
99.9% uptime · 2× deploy freq · -40% MTTR
- Designing AWS and GCP cloud-native infrastructure, including virtual networks and Kubernetes-based container orchestration for analytics platforms supporting 10+ microservices across 3 environments.
- Implementing zero-downtime deployment strategies for containerized microservices using Kubernetes and automated CI/CD pipelines, increasing deployment frequency ~2× while maintaining 99.9% uptime.
- Building predictive monitoring and alerting using Prometheus, Grafana, and CloudWatch, improving anomaly detection and cutting incident response time by 40%.
- Managing Kubernetes clusters, container registries, and Docker/EKS/ECS ecosystems across 5+ clusters.
- Administering secrets management and access controls through cloud key vault services, cutting unauthorized configuration exposure incidents by ~20%.
- Collaborating with engineering and data teams on AI/analytics workflows, building model-serving infrastructure on AWS Fargate and Amazon SageMaker for three ML pipelines powering internal reporting tools.
Cerner Corporation
Cloud Platform Engineer
Mar 2025 – Feb 2026
North Kansas City, MO
+20% release velocity · -25% MTTD · -30% vulns
- Built containerized infrastructure with Docker and Kubernetes (GKE) for distributed healthcare data-processing services across four production environments.
- Implemented CI/CD pipelines using Jenkins and Gradle, improving release velocity by ~20%.
- Designed observability dashboards using Prometheus, Grafana, and Splunk, cutting mean detection time by ~25%.
- Resolved distributed system issues across microservices, JVM services, and NoSQL databases, helping maintain 99.9% platform uptime.
- Partnered with security teams to harden infrastructure and container registry controls, reducing vulnerability findings by ~30%.
Citibank
Associate DevOps Engineer
Jun 2021 – Jul 2024
Bangalore
250K+ tx/day · 99.8% availability · 15h/wk saved
- Supported AWS cloud infrastructure (ECS, OpenShift) hosting distributed banking analytics services processing 250,000+ daily transactions.
- Deployed containerized workloads using Docker and configured CI/CD pipelines, reducing manual deployment steps by ~30%.
- Automated infrastructure management tasks using Python, saving 15+ hours of manual effort per week.
- Monitored production systems using CloudWatch, maintaining 99.8% availability across financial transaction infrastructure.
- Maintained operational documentation and runbooks supporting incident response and audit readiness across 5+ compliance reviews.

Core Technical Skills

The toolkit.

Hands-on across the full platform lifecycle — from cloud and containers to CI/CD, observability, and secure infrastructure at scale.

Cloud Platforms

AWS · Microsoft Azure · Google Cloud Platform

Containers & Orchestration

Docker · Kubernetes · Amazon EKS · ECS · AWS Fargate · OpenShift

Infrastructure as Code

Terraform · CloudFormation · ARM Templates

CI/CD & Release Automation

Jenkins · GitHub Actions · GitLab CI/CD · ArgoCD · Gradle · Git

Monitoring & Observability

Prometheus · Grafana · CloudWatch · Datadog · Splunk · ELK Stack

Secrets & Security

Azure Key Vault · AWS KMS · HashiCorp Vault · IAM

AI/ML & MLOps Infrastructure

SageMaker · Azure ML · Vertex AI · MLflow · Kubeflow · TensorFlow/PyTorch · Pinecone · Weaviate · FAISS · RAG pipelines

Scripting & Automation

Python · Bash

Distributed Systems

Microservices · JVM Services (Java, Scala) · NoSQL Databases

Key Projects

Things I've built.

Platform, reliability, and MLOps work — each project shipped with measurable impact.

Self-Service Internal Developer Platform

Backstage-based service catalog with scaffolding templates for 3 sample microservices and golden-path CI/CD templates (GitHub Actions + Terraform).

Bootstrap: 1 day → <20 min · ~70% less pipeline setup

Backstage · GitHub Actions · Terraform · Kubernetes

SRE with SLOs & Chaos Testing

Defined SLOs/SLIs, built an error-budget Grafana dashboard, and ran 10+ Chaos Mesh fault-injection experiments validating sub-30s recovery.

-40% incident diagnosis time

Chaos Mesh · Grafana · Prometheus · Kubernetes
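
As a rough illustration of the error-budget arithmetic that an SLO dashboard like this one tracks, here is a minimal Python sketch. The function name, the 99.9% target, and the request counts are hypothetical values chosen for the example; they are not taken from the project itself.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative once the budget is blown).

    slo_target is the availability objective as a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        # No traffic in the window means no budget has been spent.
        return 1.0
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")  # prints "Error budget remaining: 75%"
```

A request-based budget is only one option; a time-based budget (allowed minutes of downtime per window) follows the same shape with durations in place of request counts.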

Multi-Cloud IaC & Cost Optimization

Reusable Terraform modules provisioning equivalent AWS/Azure stacks in under 15 minutes with automated right-sizing and cross-region failover.

$150+/mo saved · <5 min RTO

Terraform · AWS · Azure · GitHub Actions

MLOps + RAG Retrieval Pipeline

CI/CD pipeline integrating MLflow registry with Kubernetes model-serving, plus a FAISS-based RAG retrieval layer for low-latency semantic queries.

Hours → <10 min deploys · <200ms RAG p95

MLflow · Kubernetes · FAISS · Python

Education

Academic background.

M.S. Computer and Information Science

University of Texas at Tyler

Aug 2024 – May 2026 · Tyler, TX

GPA: 3.6

Certifications

Verified across AI, cloud & data.

Continuously credentialing on cloud platforms, AI/ML systems, and modern data engineering.

DeepLearning.AI

Available for Senior DevOps, SRE, and Platform Engineering roles

Let's build reliable systems together.

Open to DevOps, SRE, and Platform Engineering opportunities. Reach out about cloud infrastructure, Kubernetes, or MLOps roles.

saichandumachavarapu7@gmail.com


Verified across AI, cloud & data.

Machine Learning Specialization

AI for Medicine

Generative AI with LLMs

AI Engineer for Developers Associate

Oracle Certified AI Foundations Associate

Oracle AI Vector Search Certified Professional

Oracle Database@AWS Certified Architect Professional

Crash Course on Python

Microsoft Certified: Azure Administrator Associate

GitHub Copilot

Microsoft Certified: Azure AI Fundamentals
