
Google Cloud TPU machine learning accelerators now available in beta

13 Feb 18

Google has made its Cloud TPUs available in beta on Google Cloud Platform (GCP) to help machine learning experts train and run their ML models faster.

Google describes its Cloud TPUs (Tensor Processing Units) as hardware accelerators optimised to speed up and scale up specific ML workloads programmed with TensorFlow.

Each Cloud TPU is built with four custom ASICs and packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board.

The boards can be used alone or connected via an ultra-fast, dedicated network to form multi-petaflop ML supercomputers called “TPU pods”, Google explained in a blog post yesterday.

Google stated that it will offer these larger supercomputers on GCP later in the year.
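The board figures lend themselves to some back-of-envelope arithmetic. The sketch below is illustrative only and rests on two assumptions the announcement does not make: that the 180 teraflops and 64 GB are split evenly across the four ASICs, and that peak throughput adds up linearly when boards are networked into a pod (real-world efficiency will be lower). The helper names `per_asic` and `boards_for` are hypothetical.

```python
import math

# Per-board figures quoted in the announcement.
BOARD_TFLOPS = 180.0   # peak floating-point teraflops per Cloud TPU board
BOARD_HBM_GB = 64.0    # high-bandwidth memory per board, in GB
ASICS_PER_BOARD = 4    # custom ASICs per board

def per_asic() -> tuple[float, float]:
    """Peak teraflops and GB of memory per ASIC, ASSUMING an even split."""
    return BOARD_TFLOPS / ASICS_PER_BOARD, BOARD_HBM_GB / ASICS_PER_BOARD

def boards_for(target_pflops: float) -> int:
    """Minimum boards whose combined peak reaches target_pflops,
    ASSUMING peak throughput scales linearly across the pod network."""
    return math.ceil(target_pflops * 1000.0 / BOARD_TFLOPS)

if __name__ == "__main__":
    tflops, gb = per_asic()
    print(f"per ASIC: {tflops:.0f} TFLOPS, {gb:.0f} GB")
    print(f"boards for 2 PFLOPS: {boards_for(2.0)}")
```

Under those assumptions each ASIC contributes about 45 teraflops and 16 GB, and even a 2-petaflop configuration would need at least 12 boards at peak.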

“We designed Cloud TPUs to deliver differentiated performance per dollar for targeted TensorFlow workloads and to enable ML engineers and researchers to iterate more quickly,” Google said on its blog. The company elaborated on this with three examples:

  • Instead of waiting for a job to schedule on a shared compute cluster, you can have interactive, exclusive access to a network-attached Cloud TPU via a Google Compute Engine VM that you control and can customise
  • Rather than waiting days or weeks to train a business-critical ML model, you can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate trained model in production the next day
  • Using a single Cloud TPU and following Google's published tutorial, you can train ResNet-50 to the expected accuracy on the ImageNet benchmark challenge in less than a day, all for well under $200
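The third example is a simple cost claim: run time multiplied by hourly price. The article gives neither the exact run time nor the beta price, so the sketch below treats both as inputs. It assumes, for illustration, flat per-hour billing for one Cloud TPU and ignores the controlling Compute Engine VM and storage; the function names are hypothetical.

```python
BUDGET_USD = 200.0   # "well under $200", from the announcement
MAX_HOURS = 24.0     # "less than a day", from the announcement

def run_cost(hours: float, hourly_rate: float) -> float:
    """Cost of one training run, ASSUMING flat per-hour billing for one Cloud TPU."""
    return hours * hourly_rate

def break_even_rate(hours: float, budget: float = BUDGET_USD) -> float:
    """Highest hourly rate at which a run lasting `hours` stays within `budget`."""
    return budget / hours

def within_budget(hours: float, hourly_rate: float, budget: float = BUDGET_USD) -> bool:
    """True if the run costs strictly less than the budget."""
    return run_cost(hours, hourly_rate) < budget

if __name__ == "__main__":
    print(f"break-even rate for a full day: ${break_even_rate(MAX_HOURS):.2f}/hour")
```

If a run took the full 24 hours, the hourly rate would have to stay below roughly $8.33 for the total to remain under $200; a shorter run leaves more headroom.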

ML model training

Google’s Cloud TPUs can be programmed with high-level TensorFlow APIs, and the company has open-sourced a set of reference high-performance Cloud TPU model implementations.

Google plans to open-source additional model implementations over time.

“Adventurous ML experts may be able to optimise other TensorFlow models for Cloud TPUs on their own using the documentation and tools we provide,” Google added.

Google will introduce TPU pods later this year, which will further shorten the time-to-accuracy of Cloud TPUs.

“Both ResNet-50 and Transformer training times drop from the better part of a day to under 30 minutes on a full TPU pod, no code changes required,” the blog detailed.
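The quoted drop in training time can be turned into a rough speedup bound. Reading “the better part of a day” as at least 12 hours is an interpretation, not a published figure; combined with “under 30 minutes”, it gives a speedup of more than 24x. A minimal sketch of that arithmetic, with a hypothetical `speedup` helper:

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of wall-clock training times (before / after)."""
    if after_hours <= 0:
        raise ValueError("after_hours must be positive")
    return before_hours / after_hours

# ASSUMPTION: "the better part of a day" taken as a 12-hour floor.
BEFORE_FLOOR_HOURS = 12.0
# From the blog post: "under 30 minutes" on a full TPU pod.
AFTER_CEILING_HOURS = 0.5

if __name__ == "__main__":
    # A floor on "before" divided by a ceiling on "after" bounds the speedup from below.
    print(f"speedup > {speedup(BEFORE_FLOOR_HOURS, AFTER_CEILING_HOURS):.0f}x")
```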

Two Sigma chief technology officer and former senior Google engineer Alfred Spector comments, “We made a decision to focus our deep learning research on the cloud for many reasons, but mostly to gain access to the latest machine learning infrastructure.”

“Google Cloud TPUs are an example of innovative, rapidly evolving technology to support deep learning, and we found that moving TensorFlow workloads to TPUs has boosted our productivity by greatly reducing both the complexity of programming new models and the time required to train them.”

Spector concludes, “Using Cloud TPUs instead of clusters of other accelerators has allowed us to focus on building our models without being distracted by the need to manage the complexity of cluster communication patterns.”
