Introduction

Man Group has been at the forefront of data science, long before it was a commonly used phrase. We know the inherent value that is to be found in building a world-leading technology platform, enabling us to ingest, analyse and ultimately extract alpha from datasets of varying shapes and sizes.

Early on when developing our platform, we made an active choice to fully commit to the Python data science and analytics ecosystem. This meant that our data storage solution needed to be both highly performant and feature rich; capable of storing both large scale batch data as well as high volume streaming tick data, whilst seamlessly integrating with Python.

With this in mind, we built Arctic, our Python-centric data science database. Open-sourced in 2015, Arctic integrates with Python as a first-class feature, is intuitive for both researchers and engineers to use, and fits in seamlessly with the rest of our Python stack. Today, Arctic is the database engine of choice for all front-office analytics at Man Group.

In this article, we would like to demonstrate how you can start using Arctic for highly intuitive data storage and retrieval.

Use Cases

Before kicking off with the tutorial, we'd like to be clear regarding some of the use cases that we feel best suit Arctic.

Arctic is a great choice for the following use cases:

  • Bulk timeseries dataframe data: Arctic is primarily designed to store data that is timeseries indexed, a good example of which is demonstrated in the GDP example below. Furthermore, Arctic has been optimised for bulk data transfers. We find dataframes with a datetime index a good abstraction for the sort of data, and data sizes, that Arctic is designed for;
  • Tick data: The Arctic Tick Store is designed to store high volume tick data such as exchange price feeds. Please note that whilst this article does not delve into the Tick Store in much detail, more details can be found in the Arctic documentation;
  • Mixed timeseries & Python storage: Arctic also allows the storage of dictionary-nested timeseries data and can automatically pickle and un-pickle common Python data types. This means Arctic can function as a store for not only timeseries data, but also commonly used Python datastructures.

Arctic is not a great choice for the following use cases:

  • Transactional, relational data: If you require support for database transactions or are storing relational data, Arctic is probably not the optimal choice for your data.

Initial Setup

Arctic has two prerequisites:

  1. Python: Arctic requires a functional Python environment. We won't delve into how to setup up a Python environment as there are plentiful resources on how to do this on the wider web;
  2. Mongo: Arctic also requires a functional Mongo cluster. For the purposes of this guide, we'll use Docker to create a simple single-node MongoDB instance.

To begin creating the required MongoDB instance, we'll first pull the latest Mongo docker image:

We can now start the container:

And that's it for Mongo! We'll come back to using the cluster in a moment, but for now let's go ahead and install Arctic:

And that's it! We now have a functional single-node Mongo cluster and Arctic installed in our local Python environment. Why don't we take it for a spin and see what we can do?

Basic Usage

Arctic provides a wrapper for handling connections to Mongo. To start with, we need to open a Python shell and instantiate our Arctic class against our Mongo cluster:

Now we have an Arctic instance, we can use it to manage the available libraries and symbols. Symbols are the atomic unit of data storage within Arctic and are predominantly used to store DataFrames containing timeseries data. Libraries, on the other hand, contain like symbols and are accessible via an Arctic instance. Note that both libraries and symbols are referenced via string identifiers.

With these required definitions out the way, let's have a look at what libraries already exist:

Unsurprisingly, there are no pre-existing libraries stored in our brand new Mongo cluster, so let's create one:

You may see a message that the library has been created, but `couldn't enable sharding`. If you do, don't worry! It just means we'll be storing the data unsharded, which is not an issue for small datasets.

Once a library is initialized, you can access it:

We now have access to the library and can manage the contained symbols. Let's go ahead and create a symbol by writing a DataFrame containing timeseries data to it! To do this, we're going to fetch a small example of some realistic, real world data - the Open Data GDP dataset:

We now have a simple DataFrame containing a datetime index. Each row represents a year and each column contains an individual timeseries of the nations' GDP over time. This structure is typical for data written to Arctic.

Arctic adheres to a philosophy known as Pandas-In, Pandas-Out, which means that all primitives either take or return Pandas DataFrames. This philosophy ensures that the API is immediately intuitive to both researchers and engineers alike.

With that in mind, let's write out our DataFrame to Arctic:

The write primitive takes an input dataframe and writes it out as an indexed, structured timeseries.

To read the data back, we can use the read primitive:

This is a limited and simplistic example of a read operation. Arctic also supports retrieving data by data range and by column subset. Furthermore, as Arctic also efficiently versions data, read allows you to pass in a point in time to rewind your data back to the specified point in time. For more information, please see the complete docs.

And there we have it! We're written our first timeseries DataFrame to Arctic, and pulled it back out!

What Next?

We've seen how easy it is to install and use Arctic. With just one package install and a few lines of Python, we've stored and retrieved a simple datetime indexed Pandas DataFrame. In addition to the functionality covered in the introduction, Arctic also offers a host of more advanced features including:

  • Support for streaming tick data;
  • Versioning, allowing modifications to be tracked through time and reverted;
  • Bulk data transfers of dataframes containing hundreds of thousands of rows.

The Arctic story doesn't end there though. At Man Group, we're continuing to invest heavily in our data platform, and that includes Arctic. Following the success of Arctic both internally within Man Group and externally, we've been in the process of a ground-up rewrite designed to ensure that Arctic is ready for the next generation of data challenges that Man Group faces.

The next generation of Arctic offers the same, intuitive Python-centric API whilst also utilising a custom C++ storage engine and storage format, resulting in a product that offers:

  • Fast setup: The next iteration of Arctic only requires an available S3 instance and a Python client. MongoDB is not required;
  • Fast performance: World leading performance -10x-100x faster than the current Arctic version;
  • Powerful: Advanced Pandas-like analytics that intelligently utilises the Arctic storage format;
  • Enterprise ready: Enterprise functionality including ACL's, replication and incremental backups.

For more information on what the next generation of Arctic will offer, please contact us at [email protected].

 

I am interested in other Tech Articles.

To receive e-mail alerts whenever new Tech Articles or Events are posted on this site, please subscribe below.

Subscribe

 

Find out more about Technology at Man Group

Important information

This information is communicated and/or distributed by the relevant Man entity identified below (collectively the "Company") subject to the following conditions and restriction in their respective jurisdictions.

Opinions expressed are those of the author and may not be shared by all personnel of Man Group plc (‘Man’). These opinions are subject to change without notice, are for information purposes only and do not constitute an offer or invitation to make an investment in any financial instrument or in any product to which the Company and/or its affiliates provides investment advisory or any other financial services. Any organisations, financial instrument or products described in this material are mentioned for reference purposes only which should not be considered a recommendation for their purchase or sale. Neither the Company nor the authors shall be liable to any person for any action taken on the basis of the information provided. Some statements contained in this material concerning goals, strategies, outlook or other non-historical matters may be forward-looking statements and are based on current indicators and expectations. These forward-looking statements speak only as of the date on which they are made, and the Company undertakes no obligation to update or revise any forward-looking statements. These forward-looking statements are subject to risks and uncertainties that may cause actual results to differ materially from those contained in the statements. The Company and/or its affiliates may or may not have a position in any financial instrument mentioned and may or may not be actively trading in any such securities. Unless stated otherwise all information is provided by the Company. Past performance is not indicative of future results.

Unless stated otherwise this information is communicated by the relevant entity listed below.

Australia: To the extent this material is distributed in Australia it is communicated by Man Investments Australia Limited ABN 47 002 747 480 AFSL 240581, which is regulated by the Australian Securities & Investments Commission ('ASIC'). This information has been prepared without taking into account anyone’s objectives, financial situation or needs.

Austria/Germany/Liechtenstein: To the extent this material is distributed in Austria, Germany and/or Liechtenstein it is communicated by Man (Europe) AG, which is authorised and regulated by the Liechtenstein Financial Market Authority (FMA). Man (Europe) AG is registered in the Principality of Liechtenstein no. FL-0002.420.371-2. Man (Europe) AG is an associated participant in the investor compensation scheme, which is operated by the Deposit Guarantee and Investor Compensation Foundation PCC (FL-0002.039.614-1) and corresponds with EU law. Further information is available on the Foundation's website under www.eas-liechtenstein.li. This material is of a promotional nature.

European Economic Area: Unless indicated otherwise this material is communicated in the European Economic Area by Man Asset Management (Ireland) Limited (‘MAMIL’) which is registered in Ireland under company number 250493 and has its registered office at 70 Sir John Rogerson's Quay, Grand Canal Dock, Dublin 2, Ireland. MAMIL is authorised and regulated by the Central Bank of Ireland under number C22513.

Hong Kong SAR: To the extent this material is distributed in Hong Kong SAR, this material is communicated by Man Investments (Hong Kong) Limited and has not been reviewed by the Securities and Futures Commission in Hong Kong. This material can only be communicated to intermediaries, and professional clients who are within one of the professional investors exemptions contained in the Securities and Futures Ordinance and must not be relied upon by any other person(s).

Japan: To the extent this material is distributed in Japan it is communicated by Man Group Japan Limited, Financial Instruments Business Operator, Director of Kanto Local Finance Bureau (Financial instruments firms) No. 624 for the purpose of providing information on investment strategies, investment services, etc. provided by Man Group, and is not a disclosure document based on laws and regulations. This material can only be communicated only to professional investors (i.e. specific investors or institutional investors as defined under Financial Instruments Exchange Law) who may have sufficient knowledge and experience of related risks.

Switzerland: To the extent the material is distributed in Switzerland the communicating entity is Man Investments AG, Huobstrasse 3, 8808 Pfäffikon SZ, Switzerland. Man Investment AG is regulated by the Swiss Financial Market Supervisory Authority ('FINMA').

United Kingdom: Unless indicated otherwise this material is communicated in the United Kingdom by Man Solutions Limited (‘MSL’) which is an investment company as defined in section 833 of the Companies Act 2006. MSL is registered in England and Wales under number 3385362 and has its registered office at Riverbank House, 2 Swan Lane, London, EC4R 3AD, United Kingdom. MSL is authorised and regulated by the UK Financial Conduct Authority (the ‘FCA’) under number 185637.

United States: To the extent this material is distributed in the United States, it is communicated and distributed by Man Investments, Inc. (‘Man Investments’). Man Investments is registered as a broker-dealer with the SEC and is a member of the Financial Industry Regulatory Authority (‘FINRA’). Man Investments is also a member of the Securities Investor Protection Corporation (‘SIPC’). Man Investments is a wholly owned subsidiary of Man Group plc. The registration and memberships described above in no way imply a certain level of skill or expertise or that the SEC, FINRA or the SIPC have endorsed Man Investments. Man Investments, 452 Fifth Avenue, 27th fl., New York, NY 10018.

This material is proprietary information and may not be reproduced or otherwise disseminated in whole or in part without prior written consent. Any data services and information available from public sources used in the creation of this material are believed to be reliable. However accuracy is not warranted or guaranteed. © Man 2022

MKT003209/ST/GL/W

Please update your browser

Unfortunately we no longer support Internet Explorer 8, 7 and older for security reasons.

Please update your browser to a later version and try to access our site again.

Many thanks.