Web Platform for AI Model Management

Work done during my time as Principal Data Scientist at Fidelity Investments.

Shyam Subramanian

Built and deployed an AI Model Management platform from scratch, replacing a legacy tool and becoming the single system of record for the AI portfolio across the enterprise
Designed and implemented role-based access control, risk and governance workflows, deployment tracking, and cost and performance monitoring
Powered by a configurable CMS-driven architecture allowing business and risk teams to update workflows and questionnaires without code changes
Scaled to 1000+ models, 100+ use cases, 15+ business groups, and 700+ users spanning model owners, risk managers, and business leadership
Built on a modern full-stack architecture with ReactJS, NodeJS, PostgreSQL, OpenSearch, Strapi CMS, Kubernetes, Jenkins, and Argo

Problem

Managing AI models at enterprise scale is a coordination problem as much as a technical one. Model owners, risk managers, and business leaders each need visibility into different aspects of the AI portfolio: registration, governance, deployment status, cost, and performance. However, these were fragmented across a legacy tool and manual processes that could not scale with the growing volume of models and use cases. Without a unified system, tracking the health and risk of an AI portfolio spanning multiple business groups became increasingly difficult, slowing down decision-making and creating blind spots in model oversight.

My Role

Designed and co-built the platform end-to-end, owning the design and implementation of role-based access control, risk and governance workflow integration, deployment tracking, and cost and performance monitoring.
Collaborated closely with model owners, risk managers, and business stakeholders across 15+ business groups to translate complex governance and compliance requirements into scalable platform features.
Contributed to the overall system architecture, infrastructure setup on Kubernetes, and CI/CD pipeline configuration through Jenkins.

Approach

The platform was designed to replace a legacy tool that could no longer scale with the growing volume of AI models and use cases across the enterprise. The architecture combined a ReactJS frontend with a NodeJS backend, PostgreSQL for structured data, OpenSearch for full-text model search, and Strapi CMS for content management, deployed on Kubernetes with a Jenkins CI/CD pipeline. At the core of the platform was a model registry that allowed model owners to register AI models with rich metadata capturing model type, use cases, business group ownership, deployment status, and associated risk and governance information. OpenSearch powered both full-text search and field-level filtering across the registry, enabling users to quickly discover and explore models across the enterprise portfolio by keyword, model type, business group, deployment status, and other attributes.
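To make the search-plus-filter idea concrete, here is a minimal sketch of how a registry search request could be assembled as an OpenSearch query DSL body: a full-text clause for the keyword and exact-match `term` filters for attributes. The function name `build_registry_query` and the field names (`name`, `description`, `use_case`, `business_group`, `deployment_status`) are assumptions for illustration, not the platform's actual index mapping.

```python
from typing import Any, Dict, Optional


def build_registry_query(keyword: Optional[str] = None, **filters: str) -> Dict[str, Any]:
    """Build an OpenSearch-style bool query: full-text match plus exact-value filters.

    Field names are hypothetical; a real index mapping may differ.
    """
    # Full-text part: match the keyword across descriptive text fields.
    if keyword:
        must = [{"multi_match": {"query": keyword,
                                 "fields": ["name", "description", "use_case"]}}]
    else:
        # No keyword given: match everything and rely on the filters alone.
        must = [{"match_all": {}}]

    # Filter part: exact matches on attribute fields. Filters narrow the result
    # set without affecting relevance scoring. Sorted for a stable clause order.
    filter_clauses = [{"term": {name: value}} for name, value in sorted(filters.items())]

    return {"query": {"bool": {"must": must, "filter": filter_clauses}}}


query = build_registry_query(
    "fraud detection",
    business_group="retail",
    deployment_status="production",
)
```

The resulting dictionary would be sent as the body of a search request. Keeping keyword matching in `must` and attributes in `filter` is one common way to separate relevance ranking from faceted narrowing.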

A central design challenge was role-based access control across a diverse user base spanning model owners, risk managers, administrators, and viewers across 15+ business groups. The RBAC system combined role hierarchy with fine-grained permissions, ensuring each user type had the right level of visibility and control without exposing sensitive governance and risk information beyond appropriate boundaries.
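As a rough illustration of combining a role hierarchy with fine-grained permissions, the sketch below lets each role inherit the permissions of its parent roles and then checks a single permission string. The role names follow the user types described above, but the hierarchy shape and the permission strings (such as `model:read` or `risk:review`) are hypothetical and not the platform's actual configuration.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy: each role inherits everything its parents can do.
ROLE_PARENTS: Dict[str, List[str]] = {
    "viewer": [],
    "model_owner": ["viewer"],
    "risk_manager": ["viewer"],
    "administrator": ["model_owner", "risk_manager"],
}

# Hypothetical fine-grained permissions granted directly to each role.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"model:read"},
    "model_owner": {"model:register", "model:update"},
    "risk_manager": {"risk:read", "risk:review"},
    "administrator": {"user:manage"},
}


def effective_permissions(role: str) -> Set[str]:
    """Return the role's own permissions plus those inherited from parent roles."""
    permissions = set(ROLE_PERMISSIONS.get(role, set()))
    for parent in ROLE_PARENTS.get(role, []):
        permissions |= effective_permissions(parent)
    return permissions


def can(role: str, permission: str) -> bool:
    """Check whether a role holds a permission, directly or through inheritance."""
    return permission in effective_permissions(role)


print(can("model_owner", "model:read"))    # inherited from viewer
print(can("model_owner", "risk:read"))     # risk data stays with risk roles
print(can("administrator", "risk:review")) # inherited from risk_manager
```

In this sketch a model owner can read models but not risk information, which mirrors the goal of keeping sensitive governance data within appropriate boundaries. An unknown role resolves to an empty permission set, so access is denied by default.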

Risk and governance workflows were designed as structured manual workflows with tracking, allowing risk managers to step through approval and review processes with full audit visibility. A key design decision was making the platform highly configurable without requiring code changes. Risk and governance questionnaires were versioned and fully editable through the Strapi admin dashboard, allowing business and risk teams to update questions and workflows as regulatory requirements evolved.
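One way to picture versioned questionnaires is as an append-only list of published versions, with each submitted response pinned to the version it answered. The sketch below is an in-memory illustration only: the class names, fields, and sample questions are assumptions, and it does not reflect how Strapi stores content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QuestionnaireVersion:
    """An immutable snapshot of the questions at publish time."""
    version: int
    questions: Tuple[str, ...]


@dataclass
class Questionnaire:
    """Append-only history of versions (hypothetical structure)."""
    name: str
    versions: List[QuestionnaireVersion] = field(default_factory=list)

    def publish(self, questions: List[str]) -> QuestionnaireVersion:
        # Publishing never edits an old version; it appends a new one.
        new_version = QuestionnaireVersion(len(self.versions) + 1, tuple(questions))
        self.versions.append(new_version)
        return new_version

    def get(self, version: int) -> QuestionnaireVersion:
        # Version numbers start at 1.
        return self.versions[version - 1]


@dataclass
class Response:
    """Answers pinned to the questionnaire version they were given against."""
    model_id: str
    questionnaire_version: int
    answers: Dict[str, str]


risk_form = Questionnaire("risk-assessment")
v1 = risk_form.publish(["Does the model use customer data?"])
response = Response("model-001", v1.version, {"Does the model use customer data?": "Yes"})

# A later edit by the risk team creates version 2 and leaves version 1 untouched.
v2 = risk_form.publish(["Does the model use customer data?", "Is the model customer-facing?"])
answered = risk_form.get(response.questionnaire_version)
```

Because earlier versions are never mutated, a reviewer can always see exactly which questions a past response addressed, which supports the audit visibility described above.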

Deployment tracking and cost and performance monitoring were integrated into the same platform, giving model owners and leadership a single unified view of the AI portfolio rather than requiring them to navigate multiple disconnected systems. The platform also exposed both REST and GraphQL APIs, supporting flexible integration with other internal tools and systems.
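The unified portfolio view can be thought of as a rollup over registry records. The following sketch groups records by business group and totals model counts, production deployments, and cost. The field names (`business_group`, `deployment_status`, `monthly_cost`) and the sample records are made up for illustration and are not real portfolio data.

```python
from collections import defaultdict
from typing import Any, Dict, List


def portfolio_summary(models: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-business-group counts and cost from registry records."""
    summary: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"models": 0, "in_production": 0, "monthly_cost": 0.0}
    )
    for model in models:
        row = summary[model["business_group"]]
        row["models"] += 1
        # Count only models whose status is exactly "production".
        row["in_production"] += int(model["deployment_status"] == "production")
        row["monthly_cost"] += model["monthly_cost"]
    return dict(summary)


# Illustrative records only; values are invented for the example.
sample_models = [
    {"business_group": "retail", "deployment_status": "production", "monthly_cost": 1200.0},
    {"business_group": "retail", "deployment_status": "development", "monthly_cost": 300.0},
    {"business_group": "wealth", "deployment_status": "production", "monthly_cost": 800.0},
]

summary = portfolio_summary(sample_models)
```

An aggregate like this is the kind of high-level view that can sit on a leadership dashboard, with model-level detail available on drill-down.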

Impact

Models: 1000+

Users: 700+

Business Groups: 15+

Learnings

Model risk and governance is an evolving process. Regulatory requirements, internal policies, and risk frameworks change over time. Building static workflows would have required constant developer involvement to keep up. Designing questionnaires and workflows to be versioned and editable without code changes was one of the most valuable decisions we made.

Different stakeholders need fundamentally different things. Model owners needed a streamlined registration and tracking experience, risk managers needed structured workflows and audit trails, and business leaders needed high-level portfolio visibility and reporting. Designing a single platform that served all three without overwhelming any of them required careful thinking about role-based access, navigation, and information hierarchy, making RBAC not just a security decision but a UX decision.

CMS-driven content management reduces engineering bottlenecks. Using Strapi collection types as the configuration layer meant business and risk teams could update workflows, questionnaires, and platform content directly through the admin dashboard. This shifted ownership of the platform's evolution from engineering to the business, which was the right call for a tool that needed to adapt continuously.

Business leaders use the tool differently from how you expect. Leadership gravitated toward high-level portfolio views, cost tracking, and use case prioritization rather than model-level details. Understanding this shaped how we designed dashboards and surfaced aggregate insights, and reinforced the importance of involving end users early in the design process.

Collecting clean data is hard even with the right tool in place. With 15+ business groups each operating differently, getting consistent and complete model registration data required ongoing effort beyond the technology. Adoption, data quality, and organizational alignment turned out to be as challenging as the engineering itself.

What I Would Do Next

Introduce AI-powered model discovery using semantic search and embeddings to surface similar models across the enterprise, reducing duplication and helping teams build on existing work.
Integrate real-time model monitoring by connecting deployment tracking to live performance and drift signals, moving from manual status updates to automated health tracking.
Implement automated data quality monitoring through nightly schedules that flag incomplete, inconsistent, or stale model records and alert owners to resolve them.
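As a sketch of what the data quality check in the last item might look like, the function below flags records that are missing required fields or have not been updated within a threshold. The required field list, the `last_updated` field, and the 180-day default are all assumptions chosen for illustration. A nightly scheduler would run this over every record and notify owners.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

# Hypothetical set of fields every registered model is expected to have.
REQUIRED_FIELDS = ("name", "owner", "business_group", "deployment_status")


def find_issues(record: Dict[str, Any], today: date, max_age_days: int = 180) -> List[str]:
    """Return a list of data quality issues for one model record."""
    # Incomplete: a required field is absent or empty.
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]

    # Stale: the record has not been touched within the allowed window.
    last_updated = record.get("last_updated")
    if last_updated is None:
        issues.append("missing last_updated")
    elif today - last_updated > timedelta(days=max_age_days):
        issues.append("stale record")
    return issues


# Illustrative record with an empty owner and an old update date.
example_record = {
    "name": "churn-model",
    "owner": "",
    "business_group": "retail",
    "deployment_status": "production",
    "last_updated": date(2023, 1, 1),
}

issues = find_issues(example_record, today=date(2024, 1, 1))
```

Passing `today` in explicitly rather than calling `date.today()` inside the function keeps the check deterministic and easy to test.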