Google's Federated Learning tech gets privacy enhancements
Lots of organisations are starting to blend artificial intelligence into their websites and apps. Customers expect a smart interface and experience but are hesitant about privacy.
Google is a leading provider of the open-source technology that organisations' developers use to build artificial intelligence into their products.
First, they open-sourced TensorFlow, which helped those developers with machine learning projects. Then in August, they introduced Federated Learning. You can read our full explanation of Federated Learning in layman's terms here.
With Federated Learning, the training data for the machine learning model is kept on the individual device (for example, a smartphone). This means that only the insights learnt from that training data are submitted centrally, instead of the data itself.
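To make the idea concrete, here is a minimal sketch in Python of one common way to do this, known as federated averaging. It is an illustration only, not Google's implementation: the one-parameter model, the function names `local_update` and `federated_round`, and the sample data are all assumptions made for this example. Each device trains on its own data and sends back only an updated model weight, which the server averages.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, label y); hypothetical on-device data


def local_update(weight: float, data: List[Sample],
                 lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient steps for the toy model y = weight * x on one device."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight  # only this number leaves the device, never the raw data


def federated_round(global_weight: float, devices: List[List[Sample]]) -> float:
    """Average the locally trained weights, weighted by each device's sample count."""
    total = sum(len(d) for d in devices)
    return sum(local_update(global_weight, d) * len(d) for d in devices) / total


# Two simulated devices whose private data roughly follows y = 2x.
devices = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.2), (5.0, 9.8)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, devices)
print(round(w, 2))  # settles close to 2
```

The server in this sketch only ever sees the weights returned by `local_update`, which is the "insights, not data" property described above.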
This is a smart technique that lets artificial intelligence improve while avoiding many of the usual privacy issues.
One weakness in this approach is re-identification: an organisation or system could take the anonymised insights that were submitted centrally and extrapolate back to the full data. That would remove the anonymity, possibly without the user knowing.
The solution to this is an approach called Differential Privacy, which makes it difficult or impossible for an individual's private data to be re-identified.
"Differentially-private data analysis is a principled approach that enables organisations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual's data to be distinguished or re-identified. This type of analysis can be implemented in a wide variety of ways and for many different purposes. For example, if you are a health researcher, you may want to compare the average amount of time patients remain admitted across various hospitals in order to determine if there are differences in care. Differential privacy is a high-assurance, analytic means of ensuring that use cases like this are addressed in a privacy-preserving manner," says Miguel Guevara, Product Manager, Privacy and Data Protection Office, Google.
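The hospital example in the quote can be sketched with the Laplace mechanism, a standard building block of differential privacy. The Python below is a simplified illustration and not the API of Google's library: the names `laplace_noise` and `private_mean`, the 0 to 30 day bounds, and the epsilon value are assumptions chosen for the example. It clamps each stay to a known range, adds calibrated random noise to the sum and the count, and reports their ratio. A production implementation would also need to guard against floating-point weaknesses that this sketch ignores.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values: List[float], lower: float, upper: float,
                 epsilon: float, rng: random.Random) -> float:
    """Differentially private mean via a noisy sum and a noisy count.

    The privacy budget epsilon is split evenly between the two queries.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    half = epsilon / 2
    # Adding or removing one patient changes the shifted sum by at most
    # (upper - lower) and the count by at most 1.
    noisy_sum = sum(v - lower for v in clamped) + laplace_noise((upper - lower) / half, rng)
    noisy_count = len(clamped) + laplace_noise(1.0 / half, rng)
    mean = lower + noisy_sum / max(noisy_count, 1.0)
    return min(max(mean, lower), upper)  # keep the answer inside the known range


# Hypothetical data: 10,000 stays of 1 to 10 days (true mean 5.5).
stays = [float((i % 10) + 1) for i in range(10000)]
rng = random.Random(42)
estimate = private_mean(stays, lower=0.0, upper=30.0, epsilon=1.0, rng=rng)
print(f"private average stay: {estimate:.2f} days")
```

With many patients the noise barely moves the average, yet adding or removing any single patient's record changes the distribution of the output only slightly, which is what makes re-identification hard.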
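To make this easier for developers, Google has released an open-source differential privacy library. In Google's words, these are some of its key features: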
- Statistical functions: Most common data science operations are supported by this release. Developers can compute counts, sums, averages, medians, and percentiles using our library.
- Rigorous testing: Getting differential privacy right is challenging. Besides an extensive test suite, we've included an extensible 'Stochastic Differential Privacy Model Checker library' to help prevent mistakes.
- Ready to use: The real utility of an open-source release is in answering the question "Can I use this?" That's why we've included a PostgreSQL extension along with common recipes to get you started. We've described the details of our approach in a technical paper that we've just released today.
- Modular: We designed the library so that it can be extended to include other functionalities such as additional mechanisms, aggregation functions, or privacy budget management.
Google has been using differential privacy in its own products, including Google Maps and Chrome, since 2014.
The new differential privacy library is available on GitHub here.