Enterprise AI Platforms & Architectures

Enterprise GenAI Operations & Knowledge Intelligence Platform

Overview:
Designed and led the architecture of a secure, governed enterprise Generative AI platform to enable knowledge retrieval, operational intelligence, and SLA governance across large-scale enterprise systems.

This platform was built to move Generative AI from experimentation to production, with a strong emphasis on governance, reliability, observability, and Responsible AI.

Business Problem:

Enterprises struggled with:

  • Fragmented operational knowledge spread across tickets, runbooks, and documents

  • High dependency on manual analysis for incident resolution

  • SLA breaches due to delayed insight and lack of contextual intelligence

  • Risk of uncontrolled GenAI adoption without security or governance

Platform Architecture & Capabilities

Knowledge & Data Layer

  • Ingests structured and unstructured operational data (tickets, runbooks, SOPs)

  • Applies cleaning, chunking, metadata tagging, and contextual enrichment

  • Stores enterprise knowledge in governed analytical and vector stores

GenAI & Retrieval Layer

  • Implements Retrieval-Augmented Generation (RAG) using embeddings and vector search

  • Ensures responses are grounded in enterprise-approved knowledge

  • Mitigates hallucinations through controlled retrieval and response constraints
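
A minimal sketch of the grounding flow, assuming the Vertex AI SDK with an in-memory index standing in for Vector Search; the project ID, model names, and knowledge chunks are illustrative:

```python
# Minimal RAG sketch: embed the query, retrieve approved chunks, and
# constrain the model to answer only from that context.
import numpy as np
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # placeholders
embedder = TextEmbeddingModel.from_pretrained("text-embedding-004")

# Enterprise-approved knowledge chunks (normally served from Vector Search).
KNOWLEDGE_CHUNKS = [
    "Runbook: restart the billing service via the ops console ...",
    "SOP: P1 incidents follow the SLA escalation path ...",
]
chunk_vecs = np.array([e.values for e in embedder.get_embeddings(KNOWLEDGE_CHUNKS)])

def answer(query: str, k: int = 3) -> str:
    q = np.array(embedder.get_embeddings([query])[0].values)
    # Cosine similarity against the approved corpus only.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(KNOWLEDGE_CHUNKS[i] for i in np.argsort(sims)[-k:])
    prompt = ("Answer strictly from the context below. If the context is "
              "insufficient, say so instead of guessing.\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return GenerativeModel("gemini-1.5-pro").generate_content(prompt).text
```

The "answer strictly from the context" instruction, combined with retrieving only governed chunks, is what keeps responses anchored to enterprise-approved knowledge.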

Agentic Orchestration Layer

  • Introduces AI agents to:

    • Interpret user intent

    • Route queries to the correct knowledge or operational context

    • Trigger downstream workflows where automation is required
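
A simplified illustration of the routing idea, with hypothetical handler names; production intent classification would use an LLM or trained classifier rather than keywords:

```python
# Illustrative intent router: the agent classifies a request, then either
# answers from the knowledge layer or hands off to an automation workflow.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    intent: str
    handler: Callable[[str], str]

def knowledge_lookup(q: str) -> str:
    return f"[RAG answer grounded in approved docs for: {q}]"

def trigger_workflow(q: str) -> str:
    return f"[Workflow triggered for: {q}]"  # e.g. publish to Pub/Sub

ROUTES = [
    Route("retrieve", knowledge_lookup),   # "how do I...", "what is..."
    Route("automate", trigger_workflow),   # "restart...", "escalate..."
]

def classify_intent(query: str) -> str:
    # Keyword matching keeps the sketch small; real systems use a model here.
    action_words = ("restart", "escalate", "reassign", "create ticket")
    return "automate" if any(w in query.lower() for w in action_words) else "retrieve"

def route(query: str) -> str:
    intent = classify_intent(query)
    handler = next(r.handler for r in ROUTES if r.intent == intent)
    return handler(query)

print(route("Escalate incident INC-1234"))       # -> workflow path
print(route("What is the SLA for P1 tickets?"))  # -> retrieval path
```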

Governance, Security & Responsible AI

  • Role-based access to enterprise knowledge

  • Data isolation and audit logging

  • Prompt safety controls and output validation

  • Designed to meet compliance and audit requirements

LLMOps & Observability

  • Prompt versioning and lifecycle management

  • Evaluation standards for relevance, accuracy, latency, and cost

  • Centralized logging, monitoring, and alerting for AI pipelines
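
A hedged sketch of how prompt versioning and evaluation records can fit together; the content-addressed version ID and the record schema are illustrative, not a specific product's format:

```python
# Versioned prompts plus a per-call evaluation record covering relevance,
# latency, and cost. Records would feed centralized logging and dashboards.
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    name: str
    template: str
    version: str = field(init=False)

    def __post_init__(self):
        # Hash of the template makes versions deterministic and auditable.
        self.version = hashlib.sha256(self.template.encode()).hexdigest()[:8]

REGISTRY: dict[tuple[str, str], PromptVersion] = {}

def register(p: PromptVersion) -> PromptVersion:
    REGISTRY[(p.name, p.version)] = p
    return p

def eval_record(p: PromptVersion, relevance: float, latency_ms: int, tokens: int) -> dict:
    return {"prompt": p.name, "version": p.version, "relevance": relevance,
            "latency_ms": latency_ms, "cost_tokens": tokens}

triage = register(PromptVersion("ticket-triage", "Classify severity of: {ticket}"))
print(eval_record(triage, relevance=0.92, latency_ms=420, tokens=180))
```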

Technology Stack

  • GCP: Vertex AI (Gemini, embeddings), BigQuery, Vector Search, Cloud Run

  • Multi-cloud ready with AWS Bedrock and OpenSearch compatibility

Outcome & Value
  • Enabled enterprise-wide access to operational knowledge through governed AI

  • Reduced dependency on manual triage and tribal knowledge

  • Improved SLA adherence through proactive intelligence

  • Established a reusable GenAI platform blueprint for future enterprise use cases

AI-Driven Ticket Intelligence & Automation Platform
Overview

Architected an AI-powered operational intelligence platform that analyzes enterprise support and incident tickets to identify SLA risks, prioritize workloads, and trigger automated workflows.

This platform blends classical ML and Generative AI, and is designed for production reliability rather than experimentation.

Business Problem

Operational teams faced:

  • Large volumes of unstructured tickets with inconsistent prioritization

  • Manual SLA tracking and delayed escalation

  • Reactive incident management instead of proactive intervention

Platform Architecture & Capabilities

Ticket Intelligence Layer

  • Parses and enriches incoming tickets using ML and contextual analysis

  • Classifies tickets based on severity, urgency, and historical patterns
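
As a simplified stand-in for the production models, a TF-IDF plus logistic-regression classifier illustrates the severity-classification step; the inline tickets and labels are toy data:

```python
# Illustrative ticket classifier: TF-IDF features + logistic regression
# trained on historical severity labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tickets = [
    "Payment service down for all users",
    "Request to update email signature",
    "Database latency spiking, orders failing",
    "How do I reset my password?",
]
severity = ["P1", "P4", "P1", "P3"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tickets, severity)

print(model.predict(["Checkout API throwing 500s for every order"]))  # likely P1
```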

Contextual AI & GenAI Layer

  • Uses AI to understand ticket context beyond keywords

  • Applies enterprise knowledge to suggest resolution paths

  • Provides explainable insights for operational teams

Agent-Based Automation

  • Detects SLA breach risks in real time

  • Triggers automated workflows:

    • Escalation

    • Reassignment

    • Notification to operations teams

  • Integrates with monitoring and alerting systems
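
A condensed sketch of the real-time SLA check, assuming a Pub/Sub topic for escalation events; the project, topic, and 30-minute risk window are placeholders:

```python
# Estimate time-to-breach from the ticket's deadline and publish an
# escalation event when the risk window is crossed.
import json
from datetime import datetime, timezone
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic = publisher.topic_path("my-project", "sla-escalations")  # illustrative

RISK_WINDOW_MIN = 30  # escalate when less than 30 minutes remain

def check_sla(ticket_id: str, sla_deadline: datetime) -> bool:
    remaining = (sla_deadline - datetime.now(timezone.utc)).total_seconds() / 60
    if remaining < RISK_WINDOW_MIN:
        event = {"ticket": ticket_id, "minutes_remaining": round(remaining, 1),
                 "action": "escalate"}
        # Downstream workers handle escalation, reassignment, and paging.
        publisher.publish(topic, json.dumps(event).encode("utf-8"))
        return True
    return False
```

Keeping the trigger as a published event, rather than a direct action, is what preserves the controlled automation boundaries and audit trail described below.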

Governance & Reliability

  • Full audit trail of AI-driven decisions

  • Controlled automation boundaries to avoid unintended actions

  • Designed to work within enterprise change-management processes

Outcome & Value
  • Improved SLA compliance through proactive detection

  • Reduced manual triage effort for operations teams

  • Enabled AI-assisted operations without compromising control or auditability

Enterprise Data & ML Platform
Overview

Designed and implemented large-scale enterprise data and machine learning platforms that served as the foundation for later GenAI adoption.

This work established strong ML lifecycle discipline, long before Generative AI models became mainstream.

Business Problem

Enterprises required:

  • Scalable analytics and ML platforms

  • Reliable feature engineering pipelines

  • Operational ML models with monitoring and governance

  • Compliance-ready data architectures

Platform Architecture & Capabilities

Data Platform

  • Centralized data lakes and lakehouses

  • Optimized analytical storage for large-scale reporting

  • Governed access aligned with enterprise policies

ML Platform

  • Feature engineering pipelines

  • ML model training and validation

  • Deployment into production systems

  • Monitoring for performance and drift
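
One common way to implement the drift check is the Population Stability Index; a self-contained sketch follows (the thresholds in the final comment are rules of thumb, not a fixed standard):

```python
# Population Stability Index between a training baseline and live traffic.
import numpy as np

def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    l = np.histogram(live, bins=edges)[0] / len(live)
    b, l = np.clip(b, 1e-6, None), np.clip(l, 1e-6, None)  # avoid log(0)
    return float(np.sum((l - b) * np.log(l / b)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)   # feature distribution at training time
live = rng.normal(0.3, 1.1, 10_000)   # shifted production traffic
print(f"PSI={psi(baseline, live):.3f}")  # < 0.1 stable, 0.1-0.25 watch, > 0.25 retrain
```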

ModelOps & Governance

  • Standardized ML pipelines

  • Versioned models and datasets

  • Monitoring and retraining strategies

  • Compliance alignment (HIPAA, GDPR)

Technology Stack

  • BigQuery, BigQuery ML

  • Cloud-native orchestration and monitoring

Outcome & Value
  • Enabled predictive analytics and anomaly detection at scale

  • Reduced model deployment timelines

  • Established ML governance practices later reused for GenAI platforms

Enterprise AI Platform Capabilities
  • End-to-end AI/ML & GenAI architecture ownership

  • Governed RAG and agent-based systems

  • LLMOps & MLOps standards (evaluation, observability, cost governance)

  • AI security, access control, and auditability

  • Multi-cloud AI strategy (GCP primary, AWS/Azure compatible)

  • Responsible AI and compliance-ready designs

Platform-Led Architecture Philosophy

I design platforms, not point solutions.

Each platform is built with governance, scalability, and evolution in mind, ensuring enterprises can adopt AI, data, and cloud capabilities without re-architecting for every new requirement.

Monitoring Setup with Cloud Monitoring & Vertex AI Anomaly Detection

Summary:
Established a proactive observability stack that uses AI for real-time anomaly detection and alerting across cloud infrastructure and data pipelines.

Tech Stack:
Google Cloud Monitoring, Vertex AI, Cloud Functions, Pub/Sub, Cloud Logging, BigQuery

Key Responsibilities:

  • Set up centralized logging & metrics publishing from distributed systems

  • Created anomaly detection models using Vertex AI

  • Configured automated alerting with email, SMS, and Slack integrations

  • Built dashboards for SRE/Ops teams to visualize system health
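
The detection logic, reduced to a self-contained rolling z-score sketch; the deployed setup used Vertex AI models and Cloud Monitoring notification channels, so treat this as an illustration of the approach rather than the production code:

```python
# Score incoming metric points against a rolling baseline and flag outliers.
from collections import deque
from statistics import mean, stdev

class AnomalyDetector:
    def __init__(self, window: int = 60, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.z = z_threshold

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.history), stdev(self.history)
            anomalous = sigma > 0 and abs(value - mu) / sigma > self.z
        self.history.append(value)
        return anomalous

detector = AnomalyDetector()
for latency_ms in [120, 115, 130, 118, 122] * 4 + [900]:
    if detector.observe(latency_ms):
        print(f"ALERT: latency {latency_ms}ms deviates from baseline")
        # here: fan out to email/SMS/Slack (e.g., via webhook or Pub/Sub)
```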

Impact:

  • Reduced MTTR (Mean Time to Resolve) by 35%

  • Prevented major incidents by detecting anomalies before failures

  • Trained Ops team to use AI-driven dashboarding effectively

Cloud Migration Roadmap & Execution

Summary:
Led hybrid cloud migration engagements for clients in the pharma sector, assessing existing infrastructure and defining modernization blueprints.

Tech Stack:
GCP, AWS, Cloud SQL, BigQuery, ADLS Gen2, Terraform, Ansible, Google Migration Center

Key Responsibilities:

  • Assessed on-prem workloads and databases

  • Designed hybrid strategy for phased migration (GCP + AWS)

  • Created landing zones with secure IAM and network policies

  • Defined CI/CD and Infrastructure as Code practices

  • Migrated data lakes and BI workloads to GCP

Impact:

  • Enabled a 30% reduction in operational cost

  • Delivered phased migration roadmap within 90 days

  • Met all compliance requirements (HIPAA, GDPR)

Real-Time ELT Pipeline for Lakehouse using Composer, BigQuery, Pub/Sub, and Cloud Storage

Summary:
Developed a real-time, orchestrated ELT data pipeline to ingest, transform, and load structured/unstructured data into a unified Lakehouse architecture on GCP. The pipeline supports both batch and streaming ingestion models.

Tech Stack:
Cloud Composer (Airflow), BigQuery, Cloud Storage, Cloud Functions, Pub/Sub, Dataform

Responsibilities:

  • Designed event-driven data ingestion using Pub/Sub with schema enforcement

  • Orchestrated end-to-end workflows using Cloud Composer (Airflow)

  • Parsed, cleansed, and stored raw data in Cloud Storage (Bronze layer)

  • Applied transformations and quality checks with BigQuery SQL (Silver layer)

  • Managed curated views (Gold layer) for downstream analytics & ML

  • Triggered notification and error alerts via Cloud Functions
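
A condensed version of the Composer DAG pattern (Bronze ingest, Silver transform, Gold publish); bucket, dataset, and table names are placeholders, and the real DAG also handled schema enforcement and Cloud Functions notifications:

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG("lakehouse_elt", start_date=datetime(2024, 1, 1),
         schedule_interval="@hourly", catchup=False) as dag:

    bronze_to_bq = GCSToBigQueryOperator(
        task_id="load_bronze",
        bucket="raw-landing-bucket",               # Bronze layer (placeholder)
        source_objects=["events/*.json"],
        destination_project_dataset_table="lake.bronze_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )

    silver = BigQueryInsertJobOperator(
        task_id="transform_silver",
        configuration={"query": {
            "query": """CREATE OR REPLACE TABLE lake.silver_events AS
                        SELECT DISTINCT * FROM lake.bronze_events
                        WHERE event_ts IS NOT NULL""",  # dedupe + quality gate
            "useLegacySql": False,
        }},
    )

    gold = BigQueryInsertJobOperator(
        task_id="publish_gold",
        configuration={"query": {
            "query": """CREATE OR REPLACE VIEW lake.gold_daily_summary AS
                        SELECT DATE(event_ts) AS d, COUNT(*) AS n
                        FROM lake.silver_events GROUP BY d""",
            "useLegacySql": False,
        }},
    )

    bronze_to_bq >> silver >> gold
```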

Impact:

  • Reduced batch processing time from 2 hours to 20 minutes

  • Achieved unified governance with Lakehouse pattern

  • Enabled seamless consumption by GenAI models (Gemini Pro)

Scalable Batch + Streaming Data Pipeline Using Dataflow, Dataproc, and BigQuery

Summary:
Architected a hybrid batch + stream pipeline to process high-volume clickstream and sales data, leveraging GCP-native services for scalable processing and warehousing.

Tech Stack:
BigQuery, Dataflow, Dataproc (Spark), Cloud Functions, Cloud Storage, Cloud Scheduler

Responsibilities:

  • Built Apache Beam pipelines on Dataflow for near-real-time stream processing

  • Offloaded heavy joins & transformations to Dataproc Spark clusters (scheduled with Cloud Scheduler)

  • Integrated external data into Cloud Storage and ingested to staging tables

  • Transformed and enriched data in BigQuery for reporting & ML

  • Set up auto-scaling, fault-tolerant architecture using native GCP triggers
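
A trimmed-down sketch of the streaming leg on Dataflow, assuming a Pub/Sub subscription carrying JSON clickstream events; subscription and table names are illustrative:

```python
# Read clickstream events from Pub/Sub, window them, and write per-page
# counts to BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)

with beam.Pipeline(options=opts) as p:
    (p
     | "ReadEvents" >> beam.io.ReadFromPubSub(
           subscription="projects/my-project/subscriptions/clickstream-sub")
     | "Parse" >> beam.Map(json.loads)
     | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))  # 1-min windows
     | "KeyByPage" >> beam.Map(lambda e: (e["page"], 1))
     | "CountPerPage" >> beam.CombinePerKey(sum)
     | "ToRow" >> beam.Map(lambda kv: {"page": kv[0], "views": kv[1]})
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "my-project:analytics.page_views_minutely",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))
```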

Impact:

  • Enabled analytics on Data

  • Cut cloud compute costs by 25% through hybrid job design

  • Improved insight availability from 24 hours to 4 hours

Legacy .NET Monolith to Microservices on GKE with PostgreSQL Backend

Summary:
Led the end-to-end modernization of a legacy enterprise application originally built on .NET and IBM DB2, transforming it into a scalable, containerized microservices architecture hosted on Google Kubernetes Engine (GKE), with Python-based APIs and PostgreSQL as the new backend.

Tech Stack:
.NET (legacy), GKE, Docker, Python (FastAPI/Flask), PostgreSQL, IBM DB2, Cloud Build, GCP IAM, Cloud Logging, Cloud SQL, GitOps (Jenkins)

Responsibilities:

  • πŸ”„ Monolith to Microservices Refactoring

    • Analyzed .NET legacy UI and business logic

    • Broke monolithic code into domain-driven microservices

    • Rewrote APIs using Python (FastAPI) to interact with the new database (see the sketch after this list)

  • πŸ—ƒοΈ Database Migration

    • Reverse-engineered schema and data from IBM DB2

    • Migrated historical and operational data to PostgreSQL

    • Created compatibility layers for downstream reporting systems

  • ☁️ Cloud-Native Deployment

    • Containerized Python services with Docker

    • Deployed all services to Google Kubernetes Engine (GKE)

    • Implemented horizontal auto-scaling, readiness/liveness probes, and rolling updates

  • πŸ” Security and Networking

    • Configured GCP IAM roles, service-to-service authentication, and private access to Cloud SQL

    • Used internal load balancing and VPC-native clusters for secure microservice communication

  • πŸ”§ Observability and CI/CD

    • Integrated Cloud Logging and Monitoring for each service

    • Set up Git-based CI/CD pipelines with Cloud Build and ArgoCD for continuous delivery
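
A minimal example of the resulting service pattern: a FastAPI microservice over PostgreSQL with the probe endpoints GKE uses for liveness and readiness. Connection string, table, and route names are placeholders:

```python
import os
import psycopg2
from fastapi import FastAPI, HTTPException

app = FastAPI(title="orders-service")
DSN = os.environ.get("DATABASE_URL", "postgresql://app@localhost/orders")

@app.get("/healthz")  # liveness probe target
def healthz():
    return {"status": "ok"}

@app.get("/readyz")   # readiness probe: verify DB connectivity
def readyz():
    try:
        with psycopg2.connect(DSN) as conn:
            conn.cursor().execute("SELECT 1")
        return {"status": "ready"}
    except psycopg2.OperationalError:
        raise HTTPException(status_code=503, detail="database unavailable")

@app.get("/orders/{order_id}")
def get_order(order_id: int):
    with psycopg2.connect(DSN) as conn:
        cur = conn.cursor()
        cur.execute("SELECT id, status FROM orders WHERE id = %s", (order_id,))
        row = cur.fetchone()
    if row is None:
        raise HTTPException(status_code=404, detail="order not found")
    return {"id": row[0], "status": row[1]}
```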

Impact:

  • Modernized legacy tech stack, improving scalability and maintainability

  • Reduced operational costs by moving from licensed DB2 to open-source PostgreSQL

  • Improved deployment speed with microservices delivering updates independently

  • Enhanced performance and fault isolation through containerized services on GKE

Enterprise MS SQL Server Migration to GCP Cloud SQL

Summary:
Successfully migrated a production-grade Microsoft SQL Server database from on-premise infrastructure to Google Cloud SQL for SQL Server, enabling better scalability, high availability, and managed backup with reduced operational overhead.

Tech Stack:
MS SQL Server (on-prem), Cloud SQL for SQL Server, Database Migration Service (DMS), Cloud Monitoring, VPC Peering, IAM, Cloud Scheduler, Terraform

Responsibilities:

  • 🧭 Assessment & Planning

    • Conducted deep analysis of existing database structure, dependencies, and usage patterns

    • Planned zero-downtime cutover window and rollback strategy

  • βš™οΈ Migration Execution

    • Set up Database Migration Service (DMS) with minimal downtime replication

    • Migrated schema, stored procedures, linked servers, SQL Jobs, and data

    • Tuned long-running queries and optimized indexes post-migration

  • πŸ” Security & Networking

    • Enabled private IP access to Cloud SQL via VPC peering

    • Configured IAM roles, SSL enforcement, and automated backups

    • Integrated with Secret Manager for app credential handling

  • 🧩 Post-Migration Optimization

    • Configured Cloud Monitoring and Query Insights for performance tuning

    • Scheduled automated backups and maintenance windows via Cloud Scheduler

    • Used Terraform to version control infrastructure provisioning

Impact:

  • Reduced DB management overhead by 70% through managed Cloud SQL

  • Improved performance consistency and security posture

  • Enabled integration with other GCP services like BigQuery and Looker

  • Achieved seamless migration with <5 min downtime during cutover

Pre-Sales Project: MS SQL Server Migration Evaluation – Azure vs GCP

Summary:
Led a pre-sales engagement for a global manufacturing client to evaluate the migration of a business-critical MS SQL Server hosted on-premises. The engagement assessed the feasibility of lift-and-shift, managed database services, and enterprise server hosting on Azure and GCP. The goal was to provide a comprehensive architecture and operational model aligned with scalability, compliance, and cost optimization objectives.

Engagement Type:
Pre-Sales Architecture & PoC (Proof of Concept)
Client: Confidential (Manufacturing sector)
Status: Solution proposed and PoC completed, deal not closed

Key Objectives:

  • Evaluate whether to lift-and-shift the MS SQL Server VM or modernize to managed database offerings

  • Provide a comparative analysis between Azure SQL Managed Instance, GCP Cloud SQL for SQL Server, and self-hosted SQL Server on VM

  • Ensure support for linked servers, SSIS/SSRS workloads, Always On availability groups, and Active Directory integration

  • Deliver a working PoC with sample workloads on both clouds

Solutioning Responsibilities:

  • 🧭 Requirements Analysis

    • Worked with enterprise architects to gather inputs on current workloads, high availability, latency sensitivity, and DR expectations

    • Assessed dependencies on SQL Server Agent, Linked Servers, CLR objects, and stored procedures

  • πŸ—οΈ Solution Architecture

    • Designed three migration paths:

      1. Lift-and-Shift to IaaS VMs on Azure/GCP using Migrate for Compute Engine / Azure Migrate

      2. Platform Migration to Azure SQL Managed Instance and GCP Cloud SQL (SQL Server)

      3. High-availability SQL Server 2022 on Azure VM (enterprise licensing) with DR and Always On clustering

    • Created TCO comparison, networking diagrams, IAM mapping, backup/restore policies

  • πŸ” Proof of Concept Execution

    • Set up Cloud SQL instance on GCP with VPC peering, private IP, and IAM integration

    • Created Azure SQL Managed Instance with AD Authentication and VNet

    • Migrated sample schema and datasets using SQL Server Migration Assistant (SSMA)

    • Validated workloads, performance, replication, and monitoring in both environments

Outcomes & Insights:

  • Azure SQL Managed Instance supported more enterprise features like linked servers and SSIS without external workarounds

  • GCP Cloud SQL offered lower operational cost, simpler IAM, but had limitations around advanced SQL Server features (e.g., cross-database queries, SQL Server Agent scheduling)

  • Lift-and-shift to VM was feasible but didn’t align with modernization and O&M reduction goals

  • Delivered a 40-page solution proposal with PoC performance benchmarks and risk assessment

Impact:

  • Gave client a clear, technically sound roadmap for future-state architecture

  • Demonstrated ability to navigate complex SQL workloads across clouds

  • Although the client paused the initiative due to budget review, the technical groundwork remains reusable for future cycles

Oracle Database Migration to AWS EC2 & RDS

Summary:
Led the migration of a mission-critical Oracle 11g/12c database from an on-premise data center to Amazon Web Services (AWS). The engagement focused on rehosting (lift-and-shift) for short-term continuity and replatforming select workloads onto Amazon RDS for Oracle to reduce operational overhead and licensing costs.

Key Objectives:

  • Migrate large Oracle transactional and analytical databases (~5TB) from aging on-premise infrastructure

  • Reduce hardware/maintenance costs, improve backup and recovery, and prepare for cloud-native modernization

  • Meet DR and HA expectations within a single-region setup

Responsibilities:

  • 🧭 Discovery & Planning

    • Conducted infrastructure and application dependency analysis

    • Reviewed data access patterns, backup cycles, archive policies, and licensing model

    • Selected a hybrid migration strategy:

      • Lift-and-shift critical transactional DB to EC2 (Oracle EE on Linux)

      • Replatform reporting DBs to Amazon RDS for Oracle

  • βš™οΈ Execution

    • Built EC2-based Oracle instance with custom filesystem layout (ASM to XFS conversion)

    • Migrated schema using Oracle Data Pump (expdp/impdp) and RMAN for full backups

    • Migrated reporting workloads to RDS for Oracle, tuning parameters for query throughput

    • Set up Database Links between EC2 Oracle and RDS Oracle for hybrid queries

  • πŸ” Security & Monitoring

    • Implemented VPC peering, security groups, and KMS-encrypted backups

    • Integrated CloudWatch monitoring and custom scripts for performance tracking

    • Configured automated snapshots and PITR for RDS instances

Impact:

  • Reduced overall TCO by 40% annually compared to on-premise licensing + infra

  • Improved RTO/RPO using automated backups and snapshot scheduling

  • Laid the foundation for future refactoring of data pipeline to AWS-native services

  • Trained client’s DBAs on managing hybrid EC2 + RDS deployments

Cross-Cloud Data Pipeline: Azure to GCP via ADLS, Databricks, Apigee & BigQuery

Summary:
Designed and implemented an enterprise-grade cross-cloud data platform where data from 20+ pipelines across multiple Azure regions was ingested, processed, and transferred securely to Google Cloud. The pipeline leveraged Azure Data Factory, ADLS Gen2, Databricks, Apigee, Cloud Composer, and BigQuery for a seamless, end-to-end data ingestion and analytics flow.

🧭 Architecture Overview
  1. Azure Side – Ingestion & Cleansing

    • Ingested data from 20+ source systems using Azure Data Factory (ADF) pipelines across multiple geographies

    • Data saved to ADLS Gen2 (Raw Layer) in partitioned format with metadata tagging

    • Applied cleansing, validation, and formatting rules using Azure Databricks (PySpark)

    • Saved output in Cleansed Layer of ADLS Gen2 in Delta Lake format

  2. Cross-Cloud Integration

    • Built REST APIs using Apigee (GCP) to securely pull data from ADLS Gen2 (see the sketch after this list)

    • Streamed cleaned datasets via API gateway into GCP Cloud Storage (Staging)

  3. GCP Side – DAG-Orchestrated Processing

    • Used Cloud Composer DAGs to:

      • Pull data from Cloud Storage (RAW)

      • Load into BigQuery RAW Layer

      • Apply schema enforcement, deduplication, anomaly detection via PySpark jobs & SQL

    • Curated, use-case-specific data was moved into the BigQuery Curated Layer

  4. Consumption

    • Curated datasets were exposed via Looker, Data Studio, and Vertex AI notebooks

    • Enabled data scientists to pull data from curated or cleansed layers for ML training

    • Ensured data availability in near real-time across regions
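
A hedged sketch of the inter-cloud hop referenced in step 2: call the Apigee-fronted API with an OAuth2 bearer token and land the payload in the GCP staging bucket. URLs, credentials, and bucket names are placeholders:

```python
import requests
from google.cloud import storage

TOKEN_URL = "https://login.example.com/oauth2/token"         # illustrative
API_URL = "https://apigee.example.com/v1/datasets/cleansed"  # illustrative

def fetch_token() -> str:
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": "pipeline-client", "client_secret": "***"})
    resp.raise_for_status()
    return resp.json()["access_token"]

def transfer(dataset: str, bucket_name: str = "gcp-staging-bucket"):
    data = requests.get(f"{API_URL}/{dataset}",
                        headers={"Authorization": f"Bearer {fetch_token()}"})
    data.raise_for_status()
    blob = storage.Client().bucket(bucket_name).blob(f"staging/{dataset}.parquet")
    blob.upload_from_string(data.content)  # Composer DAG picks it up from here

transfer("sales_cleansed_2024_06")
```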

πŸ” Security & Operations
  • Applied IAM roles, private VNet peering, and OAuth2 tokens for inter-cloud API security

  • Set up audit logging across Azure & GCP environments

  • Monitored pipeline failures using Cloud Logging, Azure Monitor, and Slack alerts via Cloud Functions

βœ… Impact
  • Reduced ETL latency from 12 hours to under 2 hours for high-volume pipelines

  • Enabled cross-cloud compliance and governance across Azure and GCP

  • Empowered 10+ data science use cases using curated data from GCP

  • Demonstrated real-time multi-region ingestion and hybrid cloud orchestration

Teradata to BigQuery Migration Using GCP Native Tools

Summary:
Successfully migrated a legacy, high-volume Teradata enterprise data warehouse from on-premise infrastructure to Google BigQuery, using Google's native BigQuery Assessment Tool and BigQuery Migration Service. The engagement aimed to reduce operational overhead, enable advanced analytics, and modernize the data platform architecture for scalability and self-service.

πŸ”§ Responsibilities
  • Assessment & Discovery

    • Conducted a detailed workload analysis using the BigQuery Assessment Tool

    • Identified compatibility gaps in Teradata SQL, data types, and stored procedures

    • Classified warehouse objects into fully automatable, partially manual, and deprecated categories

  • Migration Planning

    • Designed a phased migration strategy with zero/minimal downtime

    • Defined data staging layers (Raw, Cleansed, Curated) and incremental data ingestion plans

    • Mapped roles and permissions from Teradata to GCP IAM policies

  • Data Migration Execution

    • Utilized BigQuery Migration Service to extract and load data from Teradata into BigQuery

    • Leveraged SQL Translator to convert Teradata SQL to BigQuery-native syntax

    • Ingested large datasets via Cloud Storage and Data Transfer Service with parallel loads

  • Validation & Performance Tuning

    • Performed row-level reconciliation and query performance benchmarking

    • Applied partitioning and clustering strategies to optimize query speed and cost (see the sketch after this list)

    • Created materialized views and denormalized tables for BI and reporting teams

  • Security & Governance

    • Implemented data access policies with column-level and row-level security

    • Configured audit logging, backup schedules, and lifecycle policies for storage

    • Provided role-based dashboards for access control review and monitoring

  • Enablement & Handoff

    • Conducted knowledge transfer workshops for business analysts and data scientists

    • Documented the end-to-end architecture, migration artifacts, and rollback plans

    • Provided post-migration support and performance tuning recommendations
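
The partitioning and clustering step called out above, expressed as BigQuery DDL run via the Python client; dataset and table names are illustrative:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Rebuild a hot reporting table partitioned by date and clustered on the
# columns most queries filter by, reducing scan cost and latency.
ddl = """
CREATE OR REPLACE TABLE dw.sales_fact
PARTITION BY DATE(order_ts)
CLUSTER BY region, product_id AS
SELECT * FROM dw.sales_fact_staging
"""
client.query(ddl).result()  # blocks until the job completes
```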

βœ… Outcomes & Impact
  • Migrated terabytes of structured data and hundreds of stored procedures

  • Improved dashboard performance by ~40% after schema optimization

  • Enabled real-time data sharing with downstream consumers and ML/AI teams via Vertex AI + BigQuery integration

  • Reduced annual infrastructure and licensing costs by ~50% after the Teradata sunset


Get in Touch

Feel free to reach out for collaborations, inquiries, or just to connect. I'm here to help and share ideas!


πŸ”— LinkedIn: linkedin.com/in/gimshra8
🌐 Portfolio: gm01.in