Open to Opportunities

Krishna Sathvik

Senior Data Engineer · AI/ML & RAG

I'm Krishna Sathvik Mantripragada, a Senior Data Engineer who loves turning messy data into useful products. I design and build data platforms, real-time pipelines, and AI-powered experiences using tools like Databricks, Spark, Kafka, dbt, and Azure. I also ship full-stack apps end-to-end, from backend APIs to frontend UX.

I build real, production-ready data and AI systems — from streaming pipelines and analytics to RAG chatbots and full-stack web apps.

Azure Databricks Python Apache Kafka Apache Spark Snowflake dbt Power BI Machine Learning SQL Apache Airflow Tableau AWS Delta Lake Apache Flink Scala LangChain OpenAI HuggingFace MLflow Vector Databases Feature Store Azure Data Factory PostgreSQL Docker Azure DevOps RAG LLM PySpark Java

Featured Projects

Production-Ready · AI-Powered Platform

TrailVerse: National Parks Explorer

Role: Founder · Full-Stack Developer · Data Engineer

TrailVerse is an AI-powered national parks exploration and trip planning platform for 470+ U.S. park units. It unifies NPS data, interactive maps, real-time weather, events, reviews, and dual LLM trip planning (OpenAI + Claude) into a single production-ready experience.

React 18.3 Node.js MongoDB OpenAI GPT-4
View Live Site
Job Tracking App

ApplyTrak - Enterprise Job Application Tracker

Role: Full-Stack Developer · Automation Engineer

ApplyTrak is a production-ready job application tracking platform that helps modern job seekers manage unlimited applications, goals, and analytics with real-time sync across devices. Built with React, TypeScript, Supabase, and Tailwind, it includes achievements, rich analytics, and a local-to-cloud migration system.

React 19 TypeScript Supabase
View Live App

Other Projects

LLM Engineer

RAG Chatbot - Advanced Interview Preparation Assistant

Role: LLM Engineer · RAG Developer

A dual-persona Retrieval-Augmented Generation (RAG) chatbot for interview preparation, combining 557+ curated knowledge chunks with FastAPI and a modern React frontend. It routes questions across AI/ML, Data Engineering, BI, and Analytics Engineering profiles to deliver structured, interview-ready answers.

React 19 FastAPI RAG
GitHub
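A minimal sketch of the persona-routing idea behind the RAG chatbot: score each profile by keyword overlap with the question, then pull that profile's knowledge chunks as context. Profile names, keywords, and chunks here are illustrative stand-ins, not the project's real data; a real system would use embeddings and vector search instead of word overlap.

```python
# Illustrative dual-persona router; profiles and chunks are made up.
PROFILES = {
    "data_engineering": {"spark", "kafka", "etl", "pipeline", "airflow"},
    "ai_ml": {"rag", "llm", "embedding", "model", "langchain"},
}

CHUNKS = {
    "data_engineering": ["Spark handles distributed batch and streaming jobs."],
    "ai_ml": ["RAG grounds LLM answers in retrieved documents."],
}

def route(question: str) -> str:
    """Pick the profile whose keywords best match the question."""
    words = set(question.lower().split())
    scores = {p: len(words & kw) for p, kw in PROFILES.items()}
    return max(scores, key=scores.get)

def answer(question: str) -> str:
    """Assemble a grounded answer from the routed profile's chunks."""
    profile = route(question)
    # A real system would embed the question and do vector search here.
    context = " ".join(CHUNKS[profile])
    return f"[{profile}] {context}"
```

The routing step is what lets one chatbot serve structured answers for several interview tracks from a single knowledge base.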
Data Engineer

Real-time Fraud Detection

Role: Data Engineer · ML Engineer (Real-time Streaming · Anomaly Detection)

An ML-powered fraud detection pipeline that processes millions of transactions with sub-second latency. Built with Kafka for real-time event streaming, Spark for distributed processing, and machine learning models for anomaly detection and fraud classification.

Python Kafka Spark
GitHub
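The anomaly-scoring logic at the heart of such a pipeline can be shown as a plain function: flag a transaction when its amount deviates sharply from that card's recent history. The threshold, window size, and field names are illustrative assumptions; in the project this logic runs over Kafka events inside Spark, not in-process.

```python
from collections import defaultdict
from statistics import mean, stdev

class FraudScorer:
    """Toy per-card z-score anomaly detector (thresholds are illustrative)."""

    def __init__(self, threshold: float = 3.0, window: int = 50):
        self.history = defaultdict(list)  # card_id -> recent amounts
        self.threshold = threshold
        self.window = window

    def score(self, card_id: str, amount: float) -> bool:
        past = self.history[card_id]
        flagged = False
        if len(past) >= 5:  # need some history before scoring
            mu, sigma = mean(past), stdev(past)
            # Flag amounts more than `threshold` std devs above the mean.
            flagged = sigma > 0 and (amount - mu) / sigma > self.threshold
        past.append(amount)
        del past[:-self.window]  # keep a bounded sliding window
        return flagged
```

A trained classifier replaces the z-score rule in production, but the stateful per-key windowing pattern is the same one Spark Structured Streaming provides at scale.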
Data Engineer

Finance Tracker Pipeline

Role: Data Engineer · ETL Developer (Python · Pandas · SQLite)

A personal finance tracking pipeline that ingests CSV transaction data, cleans and categorizes expenses, stores them in SQLite, and exposes interactive summaries through a Streamlit dashboard. It generates monthly breakdowns, category views, and savings trends from raw bank exports.

Python pandas SQLite
GitHub
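The core of such a pipeline fits in a few functions: keyword-based categorization, loading into SQLite, and a grouped monthly summary. The category rules and column names below are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# Illustrative keyword -> category rules; the real ruleset is larger.
RULES = {"grocery": "Food", "uber": "Transport", "rent": "Housing"}

def categorize(description: str) -> str:
    """Map a transaction description to a spending category."""
    desc = description.lower()
    for keyword, category in RULES.items():
        if keyword in desc:
            return category
    return "Other"

def load(rows):
    """Load (date, description, amount) tuples from a parsed CSV export."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tx (month TEXT, category TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO tx VALUES (?, ?, ?)",
        [(d[:7], categorize(desc), amt) for d, desc, amt in rows],
    )
    return con

def monthly_summary(con):
    """Spending per month and category, as a dashboard would display it."""
    return con.execute(
        "SELECT month, category, SUM(amount) FROM tx "
        "GROUP BY month, category ORDER BY month, category"
    ).fetchall()
```

The Streamlit dashboard then renders these grouped rows as monthly breakdowns and category views.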
ML Engineer

Stock Price Prediction Pipeline

Role: ML Engineer (LSTM · Time-Series Modeling)

An end-to-end machine learning project for forecasting stock prices using traditional ML (Linear Regression, XGBoost) and deep learning (LSTM), along with time-series forecasting via Facebook Prophet. An interactive Streamlit dashboard makes model outputs, metrics, and visualizations easy to explore.

Python LSTM Streamlit
GitHub
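The preprocessing step shared by all three model families is windowing: turning a price series into supervised (window, next value) pairs. A minimal sketch, with an illustrative lookback; the real project feeds these windows into the LSTM and XGBoost models.

```python
def make_windows(prices, lookback=3):
    """Slide a fixed-size window over a series to build training pairs."""
    X, y = [], []
    for i in range(len(prices) - lookback):
        X.append(prices[i:i + lookback])  # last `lookback` prices as features
        y.append(prices[i + lookback])    # the next price as the target
    return X, y
```

The same pairs work for linear regression and gradient boosting directly, and reshape into (samples, timesteps, features) for an LSTM.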
Data Engineer

Market Basket Analysis Pipeline

Role: Data Engineer · ML Engineer (FP-Growth · Association Rules)

An end-to-end Market Basket Analysis pipeline that ingests retail transactions, cleans and filters them, and uses the FP-Growth algorithm to mine frequent itemsets and association rules. A Streamlit dashboard lets users filter by confidence/lift, search by product, and explore top co-occurring items.

Python Streamlit FP-Growth
GitHub
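The support, confidence, and lift metrics the dashboard filters on can be shown with naive pair counting; this is the rule math only, not FP-Growth, which the project uses to make the mining scale to larger itemsets.

```python
from itertools import combinations
from collections import Counter

def pair_rules(baskets, min_support=0.3):
    """Mine item-pair rules (a -> b) with support, confidence, and lift."""
    n = len(baskets)
    item_counts = Counter(i for b in baskets for i in set(b))
    pair_counts = Counter(
        p for b in baskets for p in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n                           # P(a and b)
        if support < min_support:
            continue
        confidence = c / item_counts[a]           # P(b | a)
        lift = confidence / (item_counts[b] / n)  # vs. b's base rate
        rules.append((a, b, round(support, 2),
                      round(confidence, 2), round(lift, 2)))
    return rules
```

Filtering by confidence and lift, as the Streamlit dashboard does, separates genuinely associated products from items that merely co-occur because both are popular.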
Data Engineer

Real-Time Vehicle Telemetry Pipeline

Role: Data Engineer · IoT Streaming Developer

This project simulates and processes real-time vehicle telemetry (GPS, speed, fuel level, engine temperature) using Kafka, Spark Structured Streaming, Cassandra, and Streamlit. It detects anomalies like overspeeding, overheating, and low fuel, and visualizes live metrics and alerts on a real-time dashboard.

Kafka Spark Cassandra
GitHub
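The alerting rules can be sketched as a plain function over one telemetry event; the threshold values and field names here are illustrative assumptions. In the project, equivalent checks run inside Spark Structured Streaming over Kafka events, with results written to Cassandra for the dashboard.

```python
# Illustrative alert thresholds; the real values are configurable.
LIMITS = {"speed_kmh": 120, "engine_temp_c": 110}
MIN_FUEL_PCT = 10

def check_alerts(event: dict) -> list:
    """Return the list of alert names triggered by one telemetry event."""
    alerts = []
    if event["speed_kmh"] > LIMITS["speed_kmh"]:
        alerts.append("overspeeding")
    if event["engine_temp_c"] > LIMITS["engine_temp_c"]:
        alerts.append("overheating")
    if event["fuel_pct"] < MIN_FUEL_PCT:
        alerts.append("low_fuel")
    return alerts
```

Keeping each rule stateless makes it trivial to apply per-event in a streaming map step and to unit-test thresholds in isolation.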

AI/ML & GenAI

Personal Research & Projects

Beyond my professional data engineering work, I actively explore AI/ML, GenAI, and RAG technologies through hands-on projects. I build proof-of-concepts with LangChain, vector databases, and LLMs to understand how these tools work in practice and stay current with the AI landscape.

RAG System

RAG Chatbot - Advanced Interview Preparation Assistant

Role: LLM Engineer · RAG Developer

A dual-persona Retrieval-Augmented Generation (RAG) chatbot for interview preparation, combining 557+ curated knowledge chunks with FastAPI and a modern React frontend. It routes questions across AI/ML, Data Engineering, BI, and Analytics Engineering profiles to deliver structured, interview-ready answers.

RAG OpenAI GPT-4 Claude Vector DB LangChain
View Project
GenAI

Generative AI Applications

Building production applications with GPT-4, Claude, and other LLMs. Exploring prompt engineering, fine-tuning, and agent-based architectures for real-world use cases.

OpenAI API Anthropic Claude Prompt Engineering LLM Agents

Ongoing exploration and experimentation

Machine Learning

ML & Deep Learning Projects

Personal ML projects including stock prediction with LSTM networks, fraud detection systems, and time series forecasting. Focus on production-ready implementations and model optimization.

LSTM XGBoost TensorFlow PyTorch
View Projects
Research

AI Research & Continuous Learning

Staying current with latest AI/ML research, experimenting with new architectures, and contributing to open-source AI projects. Regularly building proof-of-concepts and sharing learnings.

Research Papers Open Source Experimentation Knowledge Sharing

Active learning and contribution

AI Powered

Ask my AI Assistant

Query my background, tech stack, or availability. It reads directly from my resume data.

krishna-bot — node — 80x24
~ System online. Try asking: "What is your experience with Azure?"

Career
Timeline

I'm a Senior Data Engineer passionate about building data infrastructure that empowers organizations to make data-driven decisions at scale. With expertise spanning cloud platforms (Azure, AWS), modern data stacks (Snowflake, Databricks, dbt), and real-time streaming (Kafka, Flink), I specialize in turning complex data challenges into elegant, scalable solutions.

Currently at Walgreens Boots Alliance, I design and operate enterprise data platforms processing terabytes of healthcare and retail data monthly, enabling analytics, machine learning, and operational insights for teams across finance, supply chain, and product.

Outside of work, I actively explore AI/ML, GenAI, and RAG technologies—building production applications, experimenting with cutting-edge techniques, and contributing to the AI community. This personal research complements my professional work and keeps me at the forefront of data and AI innovation.

What drives me: Building systems that don't just work today, but scale for tomorrow. I believe great data engineering is invisible to end users—it just works, reliably and efficiently.

Education

Master of Science in Computer Science

University of North Texas • 2021

Bachelor of Technology in IT

GITAM University • 2019

Senior Data Engineer

Walgreens Boots Alliance / Feb 2022 — Present

  • Design and operate enterprise data platform in Databricks, Snowflake, and Azure processing 10+ TB monthly from 15+ sources; built scalable ETL/ELT pipelines delivering trusted datasets with 99.9% availability
  • Build and maintain production dbt transformations and dimensional models; created governed semantic layers and data marts used by 500+ analysts and business users
  • Implement comprehensive data quality frameworks and monitoring with Python and dbt tests; reduced data incidents 45%, eliminated 200+ hours of monthly remediation
  • Partner daily with data scientists to support feature engineering and ML model development; created reusable feature stores that cut model deployment time 45%
  • Orchestrate complex workflows with Apache Airflow including lineage tracking, SLA monitoring, and alerting; improved pipeline success rate from 92% to 99.8%
  • Mentor 6 junior engineers on data engineering best practices, code reviews, and operational excellence; improved team velocity 40% and reduced production incidents 50%

Analytics Engineer

CVS Health / Oct 2020 — Dec 2021

  • Engineered modular dbt transformations and dimensional models in Snowflake processing 7M+ daily records; created reusable analytics-ready datasets and data marts for forecasting and planning
  • Developed Python and SQL-based ETL pipelines with Airflow orchestration; integrated production databases, claims systems, and third-party APIs reducing data refresh time from 8 hours to 2 hours
  • Implemented data quality frameworks and monitoring; established validation checks and automated remediation reducing manual fixes by 30%
  • Collaborated with data science teams to productionize ML models; delivered feature pipelines with automated validation that raised prediction accuracy 16%
  • Delivered certified semantic models and datasets consumed by Tableau and Power BI users; accelerated leadership reporting by 3 days

Data Science Intern

McKesson Corporation / Mar 2020 — Sep 2020

  • Built data pipelines processing 2M+ prescription and utilization records with Python, Spark, and SQL; implemented automated quality gates that improved data accuracy 22%
  • Created reusable, versioned feature tables consumed by XGBoost and forecasting models; reduced training time 40% and improved reliability 16%
  • Developed monitoring dashboards in Tableau providing real-time visibility into pipeline health; reduced incident detection from hours to minutes
  • Supported supply chain analytics by delivering clean, documented datasets; enabled accurate procurement decisions and improved inventory optimization reducing excess costs 12%

Software Developer

Inditek Pioneer Solutions / Jun 2018 — Dec 2019

  • Developed Python and SQL ETL integrations handling 100K+ daily healthcare transactions; delivered stable feeds with 99.8% availability supporting client analytics and reporting
  • Optimized SQL queries, stored procedures, and indexes across large datasets; reduced data access latency 28% and improved performance for downstream analytics consumers
  • Implemented SDLC best practices including code reviews, testing, and documentation; ensured reliability, maintainability, and scalability of data integrations

Core Competencies

Data Architecture & Design

Enterprise data platform design, Medallion/Lakehouse architecture, dimensional modeling, cloud-native data solutions

ETL/ELT Pipeline Engineering

Real-time streaming with Kafka/Flink, batch processing with Airflow/Spark, production-grade dbt transformations

Data Quality & Governance

Comprehensive validation frameworks, automated monitoring, metadata management, GDPR/CCPA compliance

ML/AI Data Infrastructure

Feature store development, MLOps pipeline automation, model training/deployment support

Cloud Data Platforms

Azure (Databricks, Synapse, Data Lake), AWS (S3, Glue, EMR), Snowflake, dbt, Delta Lake

DataOps & Automation

CI/CD for data pipelines, Infrastructure as Code, automated testing and deployment

Technical Skills

Data Engineering & Streaming

Databricks · Apache Spark · Delta Lake · Apache Kafka · Apache Flink · dbt · Airflow · Azure Data Factory

AI, ML & RAG

Python (ML) · scikit-learn · LSTM · RAG · LangChain · OpenAI API · Vector DBs (FAISS, Pinecone)

Cloud & Warehouses

Azure · AWS · Snowflake · Azure Synapse · Redshift · BigQuery

Programming & Scripting

Python · SQL · Scala · Java · PowerShell

BI & Analytics Tools

Power BI · Tableau · Looker · Streamlit

DevOps & Tooling

Git/GitHub · CI/CD · Docker · Kubernetes · Terraform · Azure DevOps

Certifications

Microsoft Azure Data Engineer Associate

Active • Credential ID: 2CA6D7588001CC9F

Designing and implementing data storage, data processing, and data security solutions using Azure services.

Microsoft Azure AI Engineer Associate

Active • Credential ID: 61B6FE700A01EC6

Designing and implementing AI solutions using Azure Cognitive Services, Azure Machine Learning, and Azure Bot Service.

SnowPro Core Certification

In Progress

Validates expertise in Snowflake data warehousing, administration, and analytics.

Databricks Certified Data Engineer

In Progress

Covers Databricks lakehouse architecture, Spark, and data engineering best practices.

dbt Analytics Engineering Certification

Planned

Production-grade dbt transformations, testing, documentation, and analytics engineering.

O'Reilly ChatGPT Data Analysis

Active • 2025

Advanced techniques for using ChatGPT and AI tools for data analysis and business intelligence.

Publications

AI for Electricity Market Design

Book Chapter — Handbook of Smart Energy Systems, Springer (2023)

Published chapter on artificial intelligence applications in electricity market design and optimization.

Status: Available

Let's Work
Together

I'm currently exploring roles as a Senior Data Engineer or AI/Data Engineer (DE + GenAI/RAG). If you'd like to chat about data platforms, streaming, or AI products, feel free to reach out.

Location

Jefferson City, MO

Remote (USA)

Stack

Azure / Databricks / Snowflake

Python / SQL / dbt