Project Showcase
Explore my innovative projects and technical achievements in detail.


RAG-Based Chatbot on GCP Using Vertex AI
Summary:
Designed and implemented a Retrieval-Augmented Generation (RAG) chatbot that intelligently answers enterprise-specific queries using internal knowledge bases and LLMs.
Tech Stack:
Vertex AI, LangChain, BigQuery, Cloud Functions, Pinecone/Weaviate, Google Cloud Storage, Cloud Run
Key Responsibilities:
Developed pipeline for vectorizing enterprise documents
Integrated document ingestion with LangChain's retriever
Used Vertex AI for prompt processing and response generation
Deployed on GCP with autoscaling using Cloud Run
Implemented feedback loop for query refinement
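A minimal sketch of the retrieval flow described above, assuming LangChain with Vertex AI models; the index path, model names, and sample query are illustrative, and FAISS stands in here for the Pinecone/Weaviate store:

```python
# Minimal RAG sketch: FAISS stands in for Pinecone/Weaviate; names are illustrative.
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")
store = FAISS.load_local("enterprise_docs_index", embeddings,
                         allow_dangerous_deserialization=True)

qa = RetrievalQA.from_chain_type(
    llm=ChatVertexAI(model_name="gemini-pro", temperature=0),
    retriever=store.as_retriever(search_kwargs={"k": 4}),  # top-4 chunks as context
)
print(qa.invoke({"query": "How do I reset my VPN certificate?"})["result"])
```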
Impact:
Enabled automation of 40–60% of L1 support queries
Reduced human intervention in document walkthroughs
Delivered response times under 2 seconds for 95% of queries
GenAI Agent for Ticket Classification and Routing
Summary:
Built a GenAI-powered agent that auto-classifies and routes support tickets based on contextual understanding, drastically improving triaging speed and accuracy.
Tech Stack:
Vertex AI, LangChain, Pub/Sub, Cloud Logging, Google Cloud Functions, BigQuery
Key Responsibilities:
Extracted ticket metadata from monitoring and alerting tools
Designed prompt templates for classifying issues by service/app
Built LangChain agent to suggest resolution paths or L2 ownership
Logged decisions to BigQuery for auditability and optimization
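An illustrative version of the classification prompt and chain, not the production template; the category set and ticket text are placeholders:

```python
# Illustrative triage prompt and chain; categories and ticket text are placeholders.
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import ChatVertexAI

prompt = ChatPromptTemplate.from_template(
    "You are a support triage assistant.\n"
    "Classify the ticket below into exactly one of: {categories}.\n"
    "Reply with the category name only.\n\nTicket:\n{ticket}"
)
chain = prompt | ChatVertexAI(model_name="gemini-pro", temperature=0)
label = chain.invoke({
    "categories": "network, database, application, access",
    "ticket": "Users report intermittent 502s from the billing API.",
}).content.strip()
print(label)  # published to Pub/Sub for routing in the real pipeline
```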
Impact:
Achieved 70% auto-classification accuracy in first rollout
Reduced triage turnaround time by 50%
Enabled fast handover of high-severity tickets to L2/L3


Cloud Migration Roadmap & Execution
Summary:
Led hybrid cloud migration engagements for clients in the pharma sector, assessing existing infrastructure and defining modernization blueprints.
Tech Stack:
GCP, AWS, Cloud SQL, BigQuery, ADLS Gen2, Terraform, Ansible, Google Migration Center
Key Responsibilities:
Assessed on-prem workloads and databases
Designed hybrid strategy for phased migration (GCP + AWS)
Created landing zones with secure IAM and network policies
Defined CI/CD and Infrastructure as Code practices
Migrated data lakes and BI workloads to GCP
Impact:
Enabled a 30% reduction in operational cost
Delivered phased migration roadmap within 90 days
Met all compliance requirements (HIPAA, GDPR)
Monitoring Setup with Cloud Monitoring & Vertex AI Anomaly Detection
Summary:
Established a proactive observability stack that uses AI for real-time anomaly detection and alerting across cloud infrastructure and data pipelines.
Tech Stack:
Google Cloud Monitoring, Vertex AI, Cloud Functions, Pub/Sub, Cloud Logging, BigQuery
Key Responsibilities:
Set up centralized logging & metrics publishing from distributed systems
Created anomaly detection models using Vertex AI
Configured automated alerting with email, SMS, and Slack integrations
Built dashboards for SRE/Ops teams to visualize system health
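A hedged sketch of the alert-forwarding piece, assuming a Cloud Monitoring alert policy that publishes incident JSON to a Pub/Sub topic; the Slack webhook environment variable is a placeholder:

```python
# Sketch of a Pub/Sub-triggered Cloud Function that forwards Monitoring
# incidents to Slack; SLACK_WEBHOOK_URL is a placeholder env var.
import base64
import json
import os

import functions_framework
import requests

@functions_framework.cloud_event
def forward_alert(cloud_event):
    # Cloud Monitoring alert policies can publish incident JSON to Pub/Sub.
    payload = base64.b64decode(cloud_event.data["message"]["data"])
    incident = json.loads(payload).get("incident", {})
    text = f"ALERT {incident.get('policy_name')}: {incident.get('summary')}"
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": text}, timeout=10)
```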
Impact:
Reduced MTTR (Mean Time to Resolve) by 35%
Prevented major incidents by detecting anomalies before failures
Trained Ops team to use AI-driven dashboarding effectively


Real-Time ELT Pipeline for Lakehouse using Composer, BigQuery, Pub/Sub, and Cloud Storage
Summary:
Developed a real-time, orchestrated ELT data pipeline to ingest, transform, and load structured/unstructured data into a unified Lakehouse architecture on GCP. The pipeline supports both batch and streaming ingestion models.
Tech Stack:
Cloud Composer (Airflow), BigQuery, Cloud Storage, Cloud Functions, Pub/Sub, Dataform
Responsibilities:
Designed event-driven data ingestion using Pub/Sub with schema enforcement
Orchestrated end-to-end workflows using Cloud Composer (Airflow)
Parsed, cleansed, and stored raw data in Cloud Storage (Bronze layer)
Applied transformations and quality checks with BigQuery SQL (Silver layer)
Managed curated views (Gold layer) for downstream analytics & ML
Triggered notification and error alerts via Cloud Functions
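A simplified Composer (Airflow) DAG mirroring the Bronze-to-Silver hand-off above; the bucket, dataset, and stored-procedure names are illustrative:

```python
# Simplified two-task DAG: land raw files in the Bronze table, then run the
# Silver transformation; bucket, dataset, and routine names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG("lakehouse_elt", start_date=datetime(2024, 1, 1),
         schedule_interval="@hourly", catchup=False) as dag:
    load_bronze = GCSToBigQueryOperator(
        task_id="load_bronze",
        bucket="raw-events-bucket",
        source_objects=["events/*.json"],
        destination_project_dataset_table="lake.bronze_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )
    to_silver = BigQueryInsertJobOperator(
        task_id="to_silver",
        configuration={"query": {
            "query": "CALL lake.refresh_silver_events();",  # cleansing + quality checks
            "useLegacySql": False,
        }},
    )
    load_bronze >> to_silver
```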
Impact:
Reduced batch processing time from 2 hours to 20 minutes
Achieved unified governance with Lakehouse pattern
Enabled seamless consumption by GenAI models (Gemini Pro)
Scalable Batch + Streaming Data Pipeline Using Dataflow, Dataproc, and BigQuery
Summary:
Architected a hybrid batch + stream pipeline to process high-volume clickstream and sales data, leveraging GCP-native services for scalable processing and warehousing.
Tech Stack:
BigQuery, Dataflow, Dataproc (Spark), Cloud Functions, Cloud Storage, Cloud Scheduler
Responsibilities:
Built Apache Beam pipelines on Dataflow for near-real-time stream processing
Offloaded heavy joins & transformations to Dataproc Spark clusters (scheduled with Cloud Scheduler)
Integrated external data into Cloud Storage and ingested to staging tables
Transformed and enriched data in BigQuery for reporting & ML
Set up auto-scaling, fault-tolerant architecture using native GCP triggers
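A condensed Apache Beam sketch of the streaming leg; the project, topic, and table names are placeholders, and the destination table is assumed to already exist:

```python
# Condensed streaming leg; project, topic, and table are placeholders and the
# destination table is assumed to already exist.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)  # --runner=DataflowRunner in production
with beam.Pipeline(options=opts) as p:
    (p
     | "ReadClicks" >> beam.io.ReadFromPubSub(topic="projects/demo/topics/clickstream")
     | "Parse" >> beam.Map(json.loads)
     | "KeepValid" >> beam.Filter(lambda e: "user_id" in e and "event_ts" in e)
     | "ToBigQuery" >> beam.io.WriteToBigQuery(
         "demo:analytics.clickstream_raw",
         create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```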
Impact:
Enabled unified analytics across batch and streaming data
Cut cloud compute costs by 25% through hybrid job design
Improved insight availability from 24 hours to 4 hours


Legacy .NET Monolith to Microservices on GKE with PostgreSQL Backend
Summary:
Led the end-to-end modernization of a legacy enterprise application originally built on .NET and IBM DB2, transforming it into a scalable, containerized microservices architecture hosted on Google Kubernetes Engine (GKE), with Python-based APIs and PostgreSQL as the new backend.
Tech Stack:
.NET (legacy), GKE, Docker, Python (FastAPI/Flask), PostgreSQL, IBM DB2, Cloud Build, GCP IAM, Cloud Logging, Cloud SQL, GitOps (Jenkins)
Responsibilities:
Monolith to Microservices Refactoring
Analyzed .NET legacy UI and business logic
Broke monolithic code into domain-driven microservices
Rewrote APIs using Python (FastAPI) to interact with the new database
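One extracted service in miniature, assuming asyncpg against the new PostgreSQL backend; the orders entity, table, and DSN are illustrative (credentials come from Secret Manager in the real deployment):

```python
# Minimal FastAPI sketch of one extracted service; entity, table, and DSN
# are illustrative placeholders.
import asyncpg
from fastapi import FastAPI, HTTPException

app = FastAPI(title="orders-service")

@app.get("/orders/{order_id}")
async def get_order(order_id: int):
    conn = await asyncpg.connect(dsn="postgresql://app@localhost/orders")
    try:
        row = await conn.fetchrow(
            "SELECT id, status, total FROM orders WHERE id = $1", order_id)
    finally:
        await conn.close()
    if row is None:
        raise HTTPException(status_code=404, detail="order not found")
    return dict(row)
```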
Database Migration
Reverse-engineered schema and data from IBM DB2
Migrated historical and operational data to PostgreSQL
Created compatibility layers for downstream reporting systems
Cloud-Native Deployment
Containerized Python services with Docker
Deployed all services to Google Kubernetes Engine (GKE)
Implemented horizontal auto-scaling, readiness/liveness probes, and rolling updates
Security and Networking
Configured GCP IAM roles, service-to-service authentication, and private access to Cloud SQL
Used internal load balancing and VPC-native clusters for secure microservice communication
Observability and CI/CD
Integrated Cloud Logging and Monitoring for each service
Set up Git-based CI/CD pipelines with Cloud Build and ArgoCD for continuous delivery
Impact:
Modernized legacy tech stack, improving scalability and maintainability
Reduced operational costs by moving from licensed DB2 to open-source PostgreSQL
Improved deployment speed with microservices delivering updates independently
Enhanced performance and fault isolation through containerized services on GKE
Enterprise MS SQL Server Migration to GCP Cloud SQL
Summary:
Successfully migrated a production-grade Microsoft SQL Server database from on-premise infrastructure to Google Cloud SQL for SQL Server, enabling better scalability, high availability, and managed backup with reduced operational overhead.
Tech Stack:
MS SQL Server (on-prem), Cloud SQL for SQL Server, Database Migration Service (DMS), Cloud Monitoring, VPC Peering, IAM, Cloud Scheduler, Terraform
Responsibilities:
Assessment & Planning
Conducted deep analysis of existing database structure, dependencies, and usage patterns
Planned zero-downtime cutover window and rollback strategy
Migration Execution
Set up Database Migration Service (DMS) with minimal downtime replication
Migrated schema, stored procedures, linked servers, SQL Jobs, and data
Tuned long-running queries and optimized indexes post-migration
Security & Networking
Enabled private IP access to Cloud SQL via VPC peering
Configured IAM roles, SSL enforcement, and automated backups
Integrated with Secret Manager for app credential handling
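A minimal sketch of the Secret Manager lookup mentioned above; the project and secret names are placeholders:

```python
# Fetching an app credential at startup; project and secret names are placeholders.
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/demo-project/secrets/sqlserver-app-password/versions/latest"
password = client.access_secret_version(request={"name": name}).payload.data.decode()
```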
Post-Migration Optimization
Configured Cloud Monitoring and Query Insights for performance tuning
Scheduled automated backups and maintenance windows via Cloud Scheduler
Used Terraform to version control infrastructure provisioning
Impact:
Reduced DB management overhead by 70% through managed Cloud SQL
Improved performance consistency and security posture
Enabled integration with other GCP services like BigQuery and Looker
Achieved seamless migration with <5 min downtime during cutover


Pre-Sales Project: MS SQL Server Migration Evaluation – Azure vs GCP
Summary:
Led a pre-sales engagement for a global manufacturing client to evaluate the migration of a business-critical MS SQL Server hosted on-premises. The engagement assessed the feasibility of three options: lift-and-shift, cloud-managed services, and enterprise server hosting on Azure and GCP. The goal was to provide a comprehensive architecture and operational model aligned with scalability, compliance, and cost optimization objectives.
Engagement Type:
Pre-Sales Architecture & PoC (Proof of Concept)
Client: Confidential (Manufacturing sector)
Status: Solution proposed and PoC completed, deal not closed
Key Objectives:
Evaluate whether to lift-and-shift the MS SQL Server VM or modernize to managed database offerings
Provide a comparative analysis between Azure SQL Managed Instance, GCP Cloud SQL for SQL Server, and self-hosted SQL Server on VM
Ensure support for linked servers, SSIS/SSRS workloads, Always-On availability, and Active Directory integration
Deliver a working PoC with sample workloads on both clouds
Solutioning Responsibilities:
Requirements Analysis
Worked with enterprise architects to gather inputs on current workloads, high availability, latency sensitivity, and DR expectations
Assessed dependencies on SQL Server Agent, Linked Servers, CLR objects, and stored procedures
Solution Architecture
Designed three migration paths:
Lift-and-Shift to IaaS VMs on Azure/GCP using Migrate for Compute Engine / Azure Migrate
Platform Migration to Azure SQL Managed Instance and GCP Cloud SQL (SQL Server)
High-Availability SQL Server 2022 on Azure VM (Enterprise Licensing) with DR and Always-On clustering
Created TCO comparison, networking diagrams, IAM mapping, backup/restore policies
Proof of Concept Execution
Set up Cloud SQL instance on GCP with VPC peering, private IP, and IAM integration
Created Azure SQL Managed Instance with AD Authentication and VNet
Migrated sample schema and datasets using SQL Server Migration Assistant (SSMA)
Validated workloads, performance, replication, and monitoring in both environments
Outcomes & Insights:
Azure SQL Managed Instance supported more enterprise features like linked servers and SSIS without external hacks
GCP Cloud SQL offered lower operational cost, simpler IAM, but had limitations around advanced SQL Server features (e.g., cross-database queries, SQL Server Agent scheduling)
Lift-and-shift to VM was feasible but didn't align with modernization and O&M reduction goals
Delivered a 40-page solution proposal with PoC performance benchmarks and risk assessment
Impact:
Gave client a clear, technically sound roadmap for future-state architecture
Demonstrated ability to navigate complex SQL workloads across clouds
Although the client paused the initiative due to budget review, the technical groundwork remains reusable for future cycles
Oracle Database Migration to AWS EC2 & RDS
Summary:
Led the migration of a mission-critical Oracle 11g/12c database from an on-premise data center to Amazon Web Services (AWS). The engagement focused on rehosting (lift-and-shift) for short-term continuity and replatforming select workloads onto Amazon RDS for Oracle to reduce operational overhead and licensing costs.
Key Objectives:
Migrate large Oracle transactional and analytical databases (~5TB) from aging on-premise infrastructure
Reduce hardware/maintenance costs, improve backup and recovery, and prepare for cloud-native modernization
Meet DR and HA expectations within a single-region setup
Responsibilities:
Discovery & Planning
Conducted infrastructure and application dependency analysis
Reviewed data access patterns, backup cycles, archive policies, and licensing model
Selected a hybrid migration strategy:
Lift-and-shift critical transactional DB to EC2 (Oracle EE on Linux)
Replatform reporting DBs to Amazon RDS for Oracle
Execution
Built EC2-based Oracle instance with custom filesystem layout (ASM to XFS conversion)
Migrated schema using Oracle Data Pump (expdp/impdp) and RMAN for full backups
Migrated reporting workloads to RDS for Oracle, tuning parameters for query throughput
Set up Database Links between EC2 Oracle and RDS Oracle for hybrid queries
Security & Monitoring
Implemented VPC peering, security groups, and KMS-encrypted backups
Integrated CloudWatch monitoring and custom scripts for performance tracking
Configured automated snapshots and PITR for RDS instances
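An illustrative boto3 snippet for the on-demand side of the snapshot scheduling above; the region and instance identifier are placeholders:

```python
# Taking an on-demand RDS snapshot; region and identifiers are placeholders.
from datetime import datetime, timezone

import boto3

rds = boto3.client("rds", region_name="us-east-1")
stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
rds.create_db_snapshot(
    DBSnapshotIdentifier=f"reporting-oracle-{stamp}",
    DBInstanceIdentifier="reporting-oracle",
)
```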
Impact:
Reduced overall TCO by 40% annually compared to on-premise licensing + infra
Improved RTO/RPO using automated backups and snapshot scheduling
Laid the foundation for future refactoring of data pipeline to AWS-native services
Trained the client's DBAs on managing hybrid EC2 + RDS deployments


Cross-Cloud Data Pipeline: Azure to GCP via ADLS, Databricks, Apigee & BigQuery
Summary:
Designed and implemented an enterprise-grade cross-cloud data platform where data from 20+ pipelines across multiple Azure regions was ingested, processed, and transferred securely to Google Cloud. The pipeline leveraged Azure Data Factory, ADLS Gen2, Databricks, Apigee, Cloud Composer, and BigQuery for a seamless, end-to-end data ingestion and analytics flow.
Architecture Overview
Azure Side – Ingestion & Cleansing
Ingested data from 20+ source systems using Azure Data Factory (ADF) pipelines across multiple geographies
Data saved to ADLS Gen2 (Raw Layer) in partitioned format with metadata tagging
Applied cleansing, validation, and formatting rules using Azure Databricks (PySpark)
Saved output in Cleansed Layer of ADLS Gen2 in Delta Lake format
Cross-Cloud Integration
Built REST APIs using Apigee (GCP) to securely pull data from ADLS Gen2
Streamed cleaned datasets via API gateway into GCP Cloud Storage (Staging)
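A hedged sketch of the cross-cloud pull step, assuming an Apigee-fronted REST endpoint and a GCS staging bucket; the URL path, bucket, and helper name are assumptions:

```python
# Pull one cleansed batch through the API gateway and stage it in GCS;
# endpoint path, bucket, and helper name are assumptions.
import requests
from google.cloud import storage

def pull_batch_to_gcs(api_base: str, token: str, object_name: str) -> None:
    resp = requests.get(f"{api_base}/cleansed/latest",
                        headers={"Authorization": f"Bearer {token}"},
                        timeout=300)
    resp.raise_for_status()
    storage.Client().bucket("gcp-staging-ingest") \
        .blob(object_name).upload_from_string(resp.content)
```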
GCP Side – DAG-Orchestrated Processing
Used Cloud Composer DAGs to:
Pull data from Cloud Storage (RAW)
Load into BigQuery RAW Layer
Apply schema enforcement, deduplication, anomaly detection via PySpark jobs & SQL
Curated, use-case-specific data was moved into the BigQuery Curated Layer
Consumption
Curated datasets were exposed via Looker, Data Studio, and Vertex AI notebooks
Enabled data scientists to pull data from curated or cleansed layers for ML training
Ensured data availability in near real-time across regions
Security & Operations
Applied IAM roles, private VNet peering, and OAuth2 tokens for inter-cloud API security
Set up audit logging across Azure & GCP environments
Monitored pipeline failures using Cloud Logging, Azure Monitor, and Slack alerts via Cloud Functions
Impact
Reduced ETL latency from 12 hours to under 2 hours for high-volume pipelines
Enabled cross-cloud compliance and governance across Azure and GCP
Empowered 10+ data science use cases using curated data from GCP
Demonstrated real-time multi-region ingestion and hybrid cloud orchestration
Get in Touch
Feel free to reach out for collaborations, inquiries, or just to connect. I'm here to help and share ideas!
LinkedIn: linkedin.com/in/gimshra8
Portfolio: gm01.in
© 2025. All rights reserved.