Check the best options for scaling migration performance
In the epic spaghetti western The Good, The Bad and The Ugly, Tuco (The Ugly) says, "If you work for a living, why do you kill yourself working?"
This quote applies to many tasks in IT, and it aptly sums up many of the variables that IT professionals encounter when planning or performing a data migration.
When estimating the timeframe of a migration, it is worthwhile to calculate how long a cut-over will take, but it is crucial to understand all the factors that are working for and against progress.
The good
Having an amazing network with terabytes of throughput is certainly a checkbox in favour, but bandwidth does not translate directly into migration performance. Network speed tests generate random bytes in memory and check how quickly that data can be transferred.
A speed test does not convert data, encrypt data, parse data, index data, read data from disk, write data to disk, or authenticate users. Migration time and speed will depend on an organisation's network speed, server load, throttling and number of folders.
These are some average speeds that IT service providers have reported:
- Low end: 250MB per hour, per user
- Mid end: 750MB per hour, per user
- High end: 1.25GB per hour, per user
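As a rough illustration of what these rates mean in practice, the sketch below converts a mailbox size into an estimated migration time for each tier. The function name `hours_to_migrate` and the 5,000MB example mailbox are hypothetical, and 1.25GB is taken as 1,250MB, which is an assumption about the units.

```python
# Reported average per-user rates from the list above, in MB per hour.
# Assumption: 1.25GB is treated as 1,250MB (decimal units).
RATES_MB_PER_HOUR = {"low": 250, "mid": 750, "high": 1250}


def hours_to_migrate(mailbox_mb: float, tier: str) -> float:
    """Return the estimated hours to migrate one mailbox at the given tier."""
    return mailbox_mb / RATES_MB_PER_HOUR[tier]


if __name__ == "__main__":
    # Hypothetical 5,000MB mailbox, used only for illustration.
    for tier in RATES_MB_PER_HOUR:
        print(f"{tier}: {hours_to_migrate(5000, tier):.1f} hours")
```

These are averages, so the result is a planning estimate rather than a prediction for any single mailbox.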
A tier 1 migration app will open only one connection per mailbox. This allows IT to scale to a large number of concurrent mailboxes. In this way, hundreds of thousands of mailboxes can be migrated simultaneously.
Migrating more mailboxes at the same time allows parallel processing, reducing the duration of a migration. If all mailboxes are migrated at the same time, the migration duration is the time it takes to migrate the largest mailbox.
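The effect of concurrency on duration can be sketched with a simple scheduling model. This is a simplified estimate and not how any particular migration tool schedules work: it assumes per-mailbox times are known in advance, that mailboxes are started longest-first, and that running mailboxes do not slow each other down. The function name `estimate_duration_hours` is hypothetical.

```python
import heapq


def estimate_duration_hours(mailbox_hours: list[float], concurrency: int) -> float:
    """Estimate wall-clock hours when at most `concurrency` mailboxes run at once.

    Assumption: mailboxes start longest-first and each freed slot picks up
    the next mailbox immediately.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if not mailbox_hours:
        return 0.0
    # Each heap entry is the time at which a migration slot becomes free.
    slots = [0.0] * min(concurrency, len(mailbox_hours))
    heapq.heapify(slots)
    for hours in sorted(mailbox_hours, reverse=True):
        free_at = heapq.heappop(slots)  # earliest available slot
        heapq.heappush(slots, free_at + hours)
    return max(slots)


if __name__ == "__main__":
    times = [8.0, 2.0, 2.0, 1.0, 1.0]  # hypothetical per-mailbox hours
    print(estimate_duration_hours(times, 1))           # one at a time: the sum
    print(estimate_duration_hours(times, len(times)))  # all at once: the largest
```

With every mailbox running at once, the estimate collapses to the time of the largest mailbox, which matches the point made above.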
The bad
Certain mailbox habits make efficient migrations more difficult. The two worst examples are very large file sizes and mailboxes with many items. Every item carries its own transaction overhead and time cost.
For example, in the per-item scenario, given two mailboxes of the same size but with a different number of items, the one with fewer items will migrate faster than the one with more items. And the larger the file size, the longer the migration. Four hundred users with 40MB of data each will transfer far faster than one user with 4GB of data.
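The per-item cost can be illustrated with a simple additive model: transfer time plus a fixed overhead for each item. The default rate of 750MB per hour and the overhead of 0.2 seconds per item are illustrative assumptions, not measured values, and `migration_hours` is a hypothetical helper.

```python
def migration_hours(
    size_mb: float,
    item_count: int,
    rate_mb_per_hour: float = 750.0,   # assumed transfer rate
    overhead_s_per_item: float = 0.2,  # assumed per-item transaction cost
) -> float:
    """Estimate hours as raw transfer time plus per-item overhead."""
    transfer_hours = size_mb / rate_mb_per_hour
    overhead_hours = item_count * overhead_s_per_item / 3600.0
    return transfer_hours + overhead_hours


if __name__ == "__main__":
    # Two hypothetical mailboxes of the same size with different item counts.
    print(f"{migration_hours(1500, 1_000):.2f} hours with 1,000 items")
    print(f"{migration_hours(1500, 50_000):.2f} hours with 50,000 items")
```

Under these assumptions, the mailbox with more items takes longer even though both hold the same amount of data.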
For these larger users, we recommend breaking the data into smaller files to speed up the migration. Essentially: many smaller users migrate faster than a few larger users. When migrating between two hosted providers, the size of the mailbox is unlikely to overload our bandwidth, but some destinations might have capacity issues. When mailboxes exceed a certain size, speed reduction may occur because of those scalability issues.
The ugly
Even accounting for all the known variables, both good and bad, there are still aspects of the migration that are unknown and out of our control.
Both Microsoft and Google have policies to throttle requests if they arrive too frequently. Even if IT is being very careful not to send too many requests, there are other factors that come into play.
It is important to keep in mind that during a migration an organisation is using a shared network. At any given moment, another migration or equally heavy traffic could trigger load balancing or other network behaviour that affects throughput. These factors are moving targets and are almost impossible to prepare for.
The solution
For all migrations, if IT has scoped out the project and knows there are some very large files or a large number of users, the best option is to perform a few test runs or a proof of concept (POC) with a broad data set.
The data files should range from large to small. Then, increase concurrency until failures appear. At that point, back off the concurrency to a manageable threshold and give yourself a 50% buffer.
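The ramp-up procedure can be sketched as a small loop. Here `run_test_batch` is a hypothetical callable, not part of any real migration tool, that runs a test migration at a given concurrency and returns True if it completed without failures. The 50% buffer is interpreted as running at half of the highest level that passed, which is one possible reading of the advice.

```python
from typing import Callable


def find_safe_concurrency(
    run_test_batch: Callable[[int], bool],  # hypothetical test-run hook
    start: int = 5,
    step: int = 5,
    ceiling: int = 200,
    buffer: float = 0.5,
) -> int:
    """Ramp up until a test batch fails, then back off and apply a buffer."""
    last_good = 0
    level = start
    while level <= ceiling:
        if not run_test_batch(level):
            break  # failures appeared, so stop increasing
        last_good = level
        level += step
    if last_good == 0:
        return 0  # even the starting level failed
    # Assumption: keep `buffer` (50% by default) of the last good level as headroom.
    return max(1, int(last_good * (1 - buffer)))


if __name__ == "__main__":
    # Simulated environment that starts failing above 40 concurrent mailboxes.
    print(find_safe_concurrency(lambda concurrency: concurrency <= 40))
```

Because throttling and shared-network load change over time, the result is a starting point to revisit rather than a fixed limit.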
This manner of proof of concept is the best way to hit the moving targets of migration performance.