Senior Software Engineer with 10+ years building scalable data pipelines, cloud-native infrastructure, and enterprise backend systems. Currently working on indexing and searching petabytes of M365 data.
🌐 hiteshpattanayak.info · AWS Community Builder · CKAD Certified
Languages: Go · Python · TypeScript / Node.js · PySpark
Data Engineering: Databricks · Apache Spark · Delta Lake · Azure Event Hubs
Cloud & Infra: Kubernetes · Docker · Azure · AWS · Terraform · Pulumi
Databases: CosmosDB · PostgreSQL · Elasticsearch · TimescaleDB
AI / LLM: RAG pipelines · Azure OpenAI · Anthropic API · Vector Search
Protocols & APIs: gRPC · REST · GraphQL
- Ultimate CKAD Certification Guide — OrangeAva
- Modern API Design with gRPC — OrangeAva
- Flash talk — gRPC Load Balancing @ GopherCon 2023
- Virtual talk — Microservice Communication using gRPC @ AWS UG Bangalore
- Blog featured in kube-weekly
- Blog featured in LearnK8s LinkedIn pulse
- Semantic Search (RAG) — CosmosDB hybrid vector search + Azure OpenAI over petabytes of M365 backup data; natural language → metadata filters via few-shot Chat Completions
- Elastic Dashboard Changelog — Python + Anthropic API tool that diffs unreadable .ndjson Kibana files and generates human-readable changelogs
- Security Fix Automation — LLM-assisted local skill that ingests Cycode findings and applies targeted fixes with full code context
- Blog Generator — AI-powered workflow (Claude / OpenAI) to draft posts from structured idea files
- AI Chat Assistant — RAG conversational assistant on my blog site (TF-IDF + Netlify Functions + GPT-4o-mini)
My blog has a built-in AI chat assistant. Ask it about my posts, projects, or background — it retrieves relevant content and answers using GPT-4o-mini.
👉 Chat at hiteshpattanayak.info
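For the curious, here is a minimal sketch of how TF-IDF retrieval of this kind can work. It is an illustration under assumptions, not the code running on the site: the function names (`tokenize`, `build_index`, `retrieve`) and the sample documents are hypothetical, and the smoothed IDF formula is just one common choice.

```python
import math
import re
from collections import Counter

# Hypothetical sample corpus; a real index would be built from actual posts.
DOCS = [
    "gRPC load balancing strategies for Kubernetes services",
    "Preparing for the CKAD certification exam",
    "Building a RAG chat assistant with TF-IDF retrieval",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Build an L2-normalized TF-IDF vector, ignoring terms unknown to the index."""
    tf = Counter(tokens)
    vec = {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: terms that appear in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def retrieve(query, docs, idf, vectors, k=2):
    """Return up to k (document, score) pairs ranked by cosine similarity."""
    q = vectorize(tokenize(query), idf)
    scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    # Drop documents that share no terms with the query.
    return [(docs[i], scores[i]) for i in ranked[:k] if scores[i] > 0]


if __name__ == "__main__":
    idf, vectors = build_index(DOCS)
    for doc, score in retrieve("how does grpc load balancing work", DOCS, idf, vectors):
        print(f"{score:.3f}  {doc}")
```

In a setup like this, the top-ranked passages would be placed into the prompt sent to the chat model, so answers stay grounded in retrieved content.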
I'm working on faster, better-grounded retrieval-augmented generation (RAG): more efficient vector search and stronger grounding techniques so responses stay relevant to their context. My main focus is page-aware AI chat that uses per-page context to personalize answers while cutting retrieval latency from vector databases. I'm also evaluating knowledge-chunking strategies that balance response speed against accuracy.
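To make the chunking trade-off concrete, here is a small sketch of fixed-size word chunking with overlap. The function name `chunk_words` and the default `size` and `overlap` values are assumptions chosen for illustration, not settings from any of my projects. Smaller chunks tend to give more precise matches but leave more vectors to search; overlap reduces the chance that a relevant sentence is split across a chunk boundary.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```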
Powered by Claude via scheduled GitHub Actions · view workflow



