At Man Group we strive to be at the forefront of product design and innovation. Investment in technology and processes is vital to support our growth. While we regularly develop greenfield software solutions for new challenges, we equally invest in the continuous evolution of our core systems. Under this programme our core trading platform, ROSA, has efficiently served as the foundation of our investment management systems for many years.
As such a platform matures, development teams face an increasingly complicated decision. Using the established system and processes they can quickly deliver incremental business solutions with predictable effort – a strategy naturally favoured by business unit delivery managers. Yet this approach incurs less tangible costs that intensify over time and must ultimately be dealt with. These include technical challenges such as unplanned obsolescence from technology end-of-life or incompatibilities with newer systems and standards. They also include social challenges, such as continuing to attract and retain the best talent in the industry, who want to work at the cutting edge.
Alternatively, the teams could deliver a completely new system using the latest frameworks, design patterns and industry standards. While the freedom and flexibility this affords is attractive to developers, along with the opportunity to improve their skills, the short-term business impact is significant and could include:
- New development tooling and upgrades to continuous delivery systems to build, test and package the new components;
- New infrastructure or investment in cloud services for deployment environments;
- Changes to runtime monitoring, alerting and support processes;
- Custom bridging services to communicate with the existing system.
Based purely on these considerations and the quantifiable costs involved, the need to rapidly deliver features to support product innovation invariably favours building on the existing system.
A platform strategy of continuous investment and evolution can help mitigate these issues. Planning for technology obsolescence, developing solutions for integrating new technologies and regularly upgrading existing components with minimal impact on business delivery is arguably more challenging than developing a new architecture in isolation. It requires ongoing research, investigation and trials of emerging standards and demands consideration of impact, compatibility, training, tooling, and rollout. Platform services are in constant flux, some using the latest techniques and others waiting for their turn to be upgraded. If implemented successfully though, the gains are significant and ongoing. Investment in business logic is retained, developers are engaged and challenged, costs are more predictable, delivery more efficient. This is how we maintain ROSA.
As a core system, ROSA interacts with a wide range of components across business units, each of which uses the technology most suitable for its needs. Technologies include .NET, Java, and Python, with deployments ranging from Windows Services to Linux Containers on Kubernetes. Conceptually this may be visualised as follows:
Source: Platform Engineering team at Man Group
Illustrative Example. For information only.
Different versions of the same technology may be running concurrently: a stable, mature service running on the older .NET Framework 3.5 may eventually be targeted for upgrade, but not necessarily before services that receive more frequent functional updates.
Such diversity introduces additional challenges, particularly when considering low-level enhancements that could require configuration and functional changes to the entire system to reap maximum benefit. Such enhancements could include:
- Automated Service Discovery: Each service dynamically reports its address and liveness/readiness status to a central registration system, allowing other services to locate and invoke healthy instances at runtime. It facilitates automated load balancing/disaster recovery and advanced traffic routing scenarios, such as blue/green/canary deployments. It can also eliminate the need for hardware load balancer appliances and static configurations;
- Distributed Telemetry: Detailed insights into a system topology, such as tracing call chains across services and systems, provide critical value when evolving a distributed system. They can help with impact planning for service restructuring or replacement, identifying key performance bottlenecks, and analysis of each release for potential performance degradation;
- Secure Communications: Authenticating and securing calls between trusted services using mTLS with rotating certificates and unique encrypted identities facilitates enhanced access controls and more finely grained permissions, and reduces the attack surface area. This is an important capability when deploying into environments with a variety of security controls, be they a firewalled intranet or a globally distributed cloud.
Introduction of new application-level features should also consider the broadest possible compatibility. The Microsoft Orleans virtual actor framework is popular amongst .NET developers for creating highly concurrent and scalable services, yet the Python and Java teams would be unable to take advantage of it to contribute actors of their own. Providing a secret store or state store often requires custom APIs (‘Application Programming Interfaces’) or SDKs (‘Software Development Kits’) that may not be compatible with older frameworks or may introduce a level of vendor lock-in that could be difficult to unwind.
Given all these considerations, the Dapr (‘Distributed Application Runtime’) platform provides an interesting value proposition. Its primary component is a compact executable written in Go that runs as a sidecar process alongside an application and communicates with it over HTTP or gRPC, making it compatible with “any language, any framework, anywhere”.
Dapr provides a range of building blocks that any application can use. By routing calls to other services through their respective Dapr sidecars, applications gain automated service discovery, distributed telemetry, and secure communications over mTLS with no other changes required to the application code. Dapr provides a consistent API and pluggable providers for secret stores and state stores, allowing them to be swapped out simply by changing a configuration setting. And Dapr provides a virtual actor framework, modelled on the pattern popularised by Orleans, that is compatible with many languages, including .NET, Python and Java. The list of features goes on!
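To make the sidecar model concrete, the following minimal Python sketch builds the HTTP requests an application might send to its local Dapr sidecar for service invocation, state storage and secret retrieval. It assumes the sidecar listens on Dapr's default HTTP port of 3500; the application id, method, store and key names used in the example are hypothetical. The functions only construct requests, so nothing is sent until the caller chooses to.

```python
import json
import urllib.request

# Assumption: the local sidecar listens on Dapr's default HTTP port, 3500.
DAPR_BASE = "http://localhost:3500/v1.0"


def _json_post(url, body):
    """Build a POST request carrying a JSON-encoded body."""
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def invoke_request(app_id, method, payload):
    """Call a method on another service by its Dapr app id, via the sidecar."""
    return _json_post(f"{DAPR_BASE}/invoke/{app_id}/method/{method}", payload)


def save_state_request(store, key, value):
    """Write one key/value pair to a named state store component."""
    return _json_post(f"{DAPR_BASE}/state/{store}", [{"key": key, "value": value}])


def get_secret_request(store, name):
    """Read one secret from a named secret store component."""
    url = f"{DAPR_BASE}/secrets/{store}/{name}"
    return urllib.request.Request(url, method="GET")
```

With a sidecar running, a request such as `invoke_request("pricing", "quote", {"id": 1})` could be sent with `urllib.request.urlopen`. The application addresses only `localhost` and a logical name; locating the target service, or choosing which store sits behind a store name, is left to the sidecar and its configuration.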
With Dapr we can now provide a consistent suite of platform and application features to all services across the estate with minimal effort.
Source: Platform Engineering team at Man Group
Illustrative Example. For information only.
Developing Solutions with the Dapr Community
Early in its development most Dapr use cases relied on Kubernetes deployments and its DNS system for service discovery. We needed Dapr to operate standalone in any hosting scenario, which meant integrating it with an external service discovery system. Fortunately, Dapr is a fully open-source system hosted on GitHub (under review for adoption as a CNCF incubation project) and the maintainers openly welcome contributions. We therefore developed a name resolution component for HashiCorp Consul and contributed it back to the Dapr project, making automated service discovery for any service on any platform available to all Dapr adopters.
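The lookup side of registry-based discovery can be sketched in a few lines of Python. This is an illustration of the general idea rather than the code of the contributed component: it parses the JSON that Consul's health endpoint (`GET /v1/health/service/<name>?passing`) returns and picks one healthy instance. The random selection policy and the function names are assumptions made for the example.

```python
import json
import random


def healthy_endpoints(health_response):
    """Extract (address, port) pairs from a Consul health-service response."""
    endpoints = []
    for entry in json.loads(health_response):
        service = entry["Service"]
        # Consul leaves Service.Address empty when the service shares the
        # address of the node it is registered on, so fall back to the node.
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


def pick_endpoint(endpoints, rng=random):
    """Choose one instance; random choice is an assumed, simplistic policy."""
    if not endpoints:
        raise LookupError("no healthy instances registered")
    return rng.choice(endpoints)
```

Because only instances with passing health checks are returned when the `passing` filter is used, a caller that resolves the address on each call naturally routes around failed instances without any static configuration.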
More challenging though was how to integrate Dapr into the .NET delivery process with minimal impact on developers or deployment mechanisms. As a sidecar process, Dapr must be distributed and configured alongside each application instance. The Dapr team provide basic command-line tooling to launch the sidecar and an application together, or configurations to inject it into a Kubernetes Pod. However, developers would expect to launch an application within Visual Studio with all dependent components seamlessly started and stopped as necessary. Hosting models such as Windows Services or IIS would ideally launch an application and Dapr would launch alongside it. In other words, the application should ideally own the Dapr sidecar and control its lifetime as if it were a DLL or any other dependent component.
To address this, we developed Dapr Sidekick for .NET – a lightweight library compatible with a wide range of .NET platforms that eliminates the need for the Dapr command-line by seamlessly managing the Dapr sidecar from within the application.
At runtime it configures, launches, and continually monitors Dapr and instantly attempts to restart it should it fail for any reason. Sidecar health is reflected in the application health checks, treating the sidecar as a core dependent component, and all log events are routed through the .NET hosting platform. In fact, so significantly can it simplify the use and adoption of Dapr in standalone hosting environments that we chose to open-source the project and contribute it back to the Dapr community. Using Sidekick, any .NET developer can now easily integrate Dapr into their process!
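The launch, monitor and restart cycle described above can be illustrated with a small Python sketch. It is not how Sidekick itself is implemented (Sidekick is a .NET library); it simply shows an application owning a child process and relaunching it whenever it exits. The `supervise` function, its parameters and the stand-in `DEMO_COMMAND` are all hypothetical.

```python
import subprocess
import sys
import time

# Stand-in child process that exits immediately, mimicking a sidecar that
# keeps failing. A real command line would name the sidecar executable and
# its arguments instead.
DEMO_COMMAND = [sys.executable, "-c", "pass"]


def supervise(command, max_restarts, backoff_seconds=0.0):
    """Launch command and relaunch it each time it exits.

    Gives up after max_restarts relaunches and returns the total number
    of launches performed (the initial launch plus the restarts).
    """
    launches = 0
    while True:
        process = subprocess.Popen(command)
        launches += 1
        process.wait()  # Block until the child process exits.
        if launches > max_restarts:
            return launches
        time.sleep(backoff_seconds)  # Pause before the next relaunch.
```

Calling `supervise(DEMO_COMMAND, max_restarts=2)` performs one initial launch and two restarts. A production supervisor would typically run this loop on a background thread, probe the sidecar's health endpoint rather than relying only on process exit, and surface the result through the application's own health checks.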
In summary, Dapr allows us to consistently add platform-wide features and new application capabilities to any service, on any framework, on any operating system or hosting model with minimal development effort. It facilitates adoption of cloud-native practices and service mesh features. The building block provider model allows different state/secret stores and other external components to be swapped out with almost zero impact on the application. By leveraging Dapr we can evolve and scale ROSA more effectively, ensuring business needs continue to be met long into the future.
The organisations and/or financial instruments mentioned are for reference purposes only. The content of this material should not be construed as a recommendation for their purchase or sale.