The backdrop to this decision is rooted in the fact that we're in the wild west of LLMOps. During a recent panel discussion at Utah Tech Week that I took part in, Nate Sanders from Artifact AI made an intriguing observation: prompts should not be viewed as proprietary assets. Yet they're integral to our operations, so much so that we're compelled to develop testing protocols for them. If each company requires that level of ownership over its prompts, it's hard not to see them as proprietary assets. Despite leveraging Humanloop to manage our prompts, the existing evaluation tools fall short of our requirements, and we still need to build on top of them. We need a testing service that simulates pseudo-consumer interactions with our chatbots, each with a distinct personality, to gauge the impact of prompt modifications on user experience.

This brings us to the point of this article: the need for a dedicated testing framework that operates independently from our main application, to rigorously assess our prompts and agents. Is this the domain of a micro-service? Martin Fowler's piece, "Monolith First," resurfaced during my coursework in the MS-CS program at CU Boulder this week; talk about good timing. Fowler advocates starting with a monolithic architecture, suggesting it's more pragmatic to evolve into micro-services as the application and its requirements become more clearly defined. Drawing inspiration from this perspective, I'm leaning towards starting simple. The allure of micro-services, with their promise of scalability and modularity, is undeniable, but so is the simplicity and cohesiveness of a monolith, especially in the early stages of a domain as fluid as LLMOps. This decision, while seemingly mundane in the grand landscape of tech decisions, underscores a broader philosophy that permeates our approach to development: prioritize work that brings tangible value, and remain agile in new seas.
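
To make the testing idea a bit more concrete, here is a minimal sketch of what persona-driven simulation could look like. This is a hedged illustration, not our actual implementation: the `Persona` class, `PERSONAS` list, `simulate_conversation` function, and the model-calling callable are all hypothetical placeholders, and the real version would plug in whatever LLM client and prompt store you already use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical persona definitions: each simulated "consumer" gets a
# distinct personality that shapes how it talks to the chatbot under test.
@dataclass
class Persona:
    name: str
    system_prompt: str

PERSONAS = [
    Persona("impatient_shopper", "You are terse, easily frustrated, and want answers fast."),
    Persona("curious_researcher", "You ask detailed follow-up questions and probe edge cases."),
]

# `ModelFn` stands in for whatever LLM client you use. It takes a list of
# {"role": ..., "content": ...} messages and returns the next reply as a string.
ModelFn = Callable[[List[dict]], str]

def simulate_conversation(persona: Persona, chatbot_prompt: str,
                          call_model: ModelFn, turns: int = 3) -> List[dict]:
    """Run a short back-and-forth between a persona-driven pseudo-consumer and
    the chatbot under test, returning the transcript for later evaluation."""
    transcript: List[dict] = []
    # From the persona model's point of view, its own lines are "assistant"
    # turns and the chatbot's replies are "user" turns; the chatbot sees the reverse.
    user_context = [{"role": "system", "content": persona.system_prompt},
                    {"role": "user", "content": "Start a conversation with the support chatbot."}]
    bot_context = [{"role": "system", "content": chatbot_prompt}]

    for _ in range(turns):
        # The persona model plays the customer.
        user_msg = call_model(user_context)
        user_context.append({"role": "assistant", "content": user_msg})
        bot_context.append({"role": "user", "content": user_msg})
        transcript.append({"speaker": persona.name, "text": user_msg})

        # The chatbot under test responds using the prompt we're evaluating.
        bot_msg = call_model(bot_context)
        bot_context.append({"role": "assistant", "content": bot_msg})
        user_context.append({"role": "user", "content": bot_msg})
        transcript.append({"speaker": "chatbot", "text": bot_msg})

    return transcript

if __name__ == "__main__":
    # Dummy model function so the sketch runs end to end without an API key.
    def echo_model(messages: List[dict]) -> str:
        return f"(reply to: {messages[-1]['content'][:40]}...)"

    for persona in PERSONAS:
        chat = simulate_conversation(persona, "You are a helpful support agent.", echo_model)
        print(persona.name, "->", chat[-1]["text"])
```

The nice part is that whether this runs inside the monolith or eventually graduates into its own service, the interface stays the same: a prompt goes in, transcripts come out, and we can score them against whatever user-experience criteria we care about.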