Open Source licensing is mainstream today, and for modern data science libraries it has become practically the default. Nowadays, it would be a real challenge to do any development work in this domain without using any Open Source software at all. This blog post is about how we use Open Source across the technology teams that drive our investment decision making (collectively known as Man Alpha Tech). We focus on where we find particular value and how we addressed the challenges we found along the way.
At Man Alpha Tech, the question “Is there an Open Source solution for this?” is usually the first thing we ask when we expand our infrastructure, grow our core platform, or look for building blocks for a new data analysis model. First, this prevents us from reinventing the wheel if an implemented solution already exists. The “free” aspect of Open Source eliminates the overhead of acquiring trial licenses or working with restricted versions. A package can be downloaded and tested in a sandboxed environment, which gives us a quick turnaround on the evaluation. In most cases, an existing off-the-shelf solution will require some degree of tailoring. The “open” aspect of Open Source lets us see very quickly whether a particular adaptation is straightforward or a major piece of work. With the standardisation of Open Source licenses, we know without having to read the fine print whether we can use a component in the given scope of a project. Open Source designs are also likely to use open standards, which allows components of different origins to be combined into a whole that is more than just the sum of its parts. This is the opposite of vendor lock-in, where the boundaries of your ecosystem constrain you to an externally defined application space and where making two systems of different origins work together is fiddly at best.
We try to give back to the community by releasing some of our libraries and tools under widely used Open Source licenses. These include:
- Arctic: Tick-data store
- PyBloqs: Report generation
- okcli: A command-line tool for OracleDB
- OpenStack load balancing
- Pynorama: NLP data visualisation
- Prometheus monitoring of Pure Storage FlashBlades

You can find all our projects on our GitHub page.
Releasing these components as Open Source benefits us through a larger user base, external code review and, ideally, external contributions to the code base. On top of that, having the opportunity to give back while being an active consumer of Open Source software is a very motivating experience for developers.
To be productive as a developer in this kind of environment, it is necessary to embrace the “you need it, you add it” attitude of Open Source software development. This includes wanting to take ownership of the code rather than delegating responsibility to a third-party vendor. Documentation of Open Source projects is often fragmented and can quickly become out of date (there are, of course, notable exceptions). Digging into the source code directly and reading it like a book is the key skill for overcoming this obstacle.
From a software management perspective, it is important that the whole business understands there is value in both using and releasing Open Source software. We profit from making a software component public, but the advantages are not self-evident and need to be communicated effectively to all decision makers. Man Group management are very progressive in this regard, and this buy-in allows us to extend the Open Source-friendly environment with events like our public Hackathon, which focuses on contributing to public Open Source projects. The characteristics of the various Open Source licenses need to be understood not only among developers but also in other teams such as legal, who have developed a company-wide Open Source policy for us.
The challenges we encountered with our approach of preferentially using Open Source software were mostly of a technical nature. Because the hurdle for anyone to onboard an Open Source component is low, a large number of packages end up in our software stack. Sometimes different packages solving the same problem are onboarded by different people at different times. Communication is the key to preventing this kind of duplication. Information about which packages are already in use must be quickly accessible to everyone, and it should be easy to make modifications and reach a consensus about which package should be the default. What works well for us is a technology radar inspired by the ThoughtWorks Radar (see Build your own radar). It groups packages into buy/sell/hold categories. Changes can be proposed and discussed through a git pull request.
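As a rough illustration of how such a radar could live in a repository and be reviewed through pull requests, here is a minimal sketch. It is not our actual tooling: the `RadarEntry` structure, the helper functions and all package names are hypothetical, and it assumes that every entry records the problem a package solves alongside its buy/hold/sell category.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three categories mentioned above.
RINGS = ("buy", "hold", "sell")


@dataclass(frozen=True)
class RadarEntry:
    package: str  # package name (the ones below are made up)
    purpose: str  # the problem the package solves
    ring: str     # one of RINGS

    def __post_init__(self):
        if self.ring not in RINGS:
            raise ValueError(f"unknown ring {self.ring!r} for {self.package}")


def find_duplicates(entries):
    """Return purposes served by more than one package not marked 'sell'."""
    by_purpose = defaultdict(list)
    for entry in entries:
        if entry.ring != "sell":
            by_purpose[entry.purpose].append(entry.package)
    return {p: sorted(pkgs) for p, pkgs in by_purpose.items() if len(pkgs) > 1}


def default_for(entries, purpose):
    """Return the single 'buy' package for a purpose, or None without consensus."""
    buys = [e.package for e in entries if e.purpose == purpose and e.ring == "buy"]
    return buys[0] if len(buys) == 1 else None


# Hypothetical radar contents; in practice this could be a file under version control.
RADAR = [
    RadarEntry("plotlib_a", "plotting", "buy"),
    RadarEntry("plotlib_b", "plotting", "sell"),
    RadarEntry("confparse_a", "config parsing", "hold"),
    RadarEntry("confparse_b", "config parsing", "buy"),
]

if __name__ == "__main__":
    print(find_duplicates(RADAR))
    print(default_for(RADAR, "plotting"))
```

A check like `find_duplicates` could run on every pull request that touches the radar, so that overlapping packages are flagged for discussion at the moment someone proposes a new one.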
Another challenge when using Open Source at large scale is the health of the underlying Open Source projects. Not every project is critical for us: we might, for example, be fine sticking to the last stable version for our existing projects and switching to a different library at the next major refactoring. Widely used libraries and tools have a large number of stakeholders, so we can be quite confident in their future stability. Our focus is therefore on projects that we rely on substantially and where the project community is comparatively small. These are usually in specialised areas where we have good expertise and a view on what a good path for future development would be. Systematically engaging with the project communities, for example at their annual gatherings, is one way to keep on top of recent developments and potentially contribute to the project roadmap.
Open Source has been the foundation of our work for a long time. Developer skills together with management buy-in allow us to make effective use of Open Source components. We aim to play an active role and contribute back to the community by releasing internal packages as Open Source, contributing upstream to Open Source projects and organising Open Source-focused events. Standing on the shoulders of giants, we try to add an element of our own creativity and gain further insights based on our tech development.