Table of Contents
- Privacy upgrades for machine learning
- What is federated learning
- Application example
- 1. Healthcare
- 2. Self-driving cars
- Limitations and Challenges
- 1. Non-independent identically distributed data
- 2. Computing power of terminal
- 3. Data labeling
- 4. Data leak
- Additional Reading/Resources
Privacy upgrades for machine learning
Many machine learning applications require large amounts of data for their operation. But the problem is that user data is confidential and private.
Growing concerns about privacy and data rights are challenging the traditional method of training and developing machine learning models, where users must surrender sensitive data to cloud servers.
What is the solution? Federated learning.
What is federated learning
Federated learning was born by combining distributed optimization, privacy research, and machine learning.
Wikipedia defines it as follows.
Federated learning (also called collaborative learning) is a machine learning technique that trains algorithms across multiple distributed edge devices or servers that hold local data samples, without exchanging local data.
The key words here are decentralized and local data.
Federated learning was first introduced in a paper published in 2016 by AI researchers at Google, "Communication-Efficient Learning of Deep Networks from Decentralized Data".
The main idea of federated learning is to bring a centralized model to decentralized devices, which removes the need to collect user data on a central server.
Because user data never leaves the device and only model updates are shared, the data stays private and secure (more on that below).
These improvements in machine learning privacy are groundbreaking and bring new possibilities to ML applications that handle sensitive data.
But before diving into the use cases and benefits of federated learning, I want to explain how it works with an example.
Below, we’ll walk through the process of federated learning, using Google’s Gboard next word prediction as an example.
First, Google builds a base ML model trained on public data on its cloud servers.
Then, multiple user devices volunteer to train the model. These devices download the model only when they are connected to power and a Wi-Fi network (training a model is a power-intensive operation, so we don’t want to drain the battery of the user’s device).
On each device, local data such as keystroke logs and prediction feedback is used to train and improve the model.
Once local training finishes, each device summarizes its changes as a model update, encrypts it, and sends it to the cloud, where the updates from many devices are aggregated to improve the base model (on the server).
This download and update cycle is done on multiple devices and repeated over and over until good accuracy is achieved. Once completed, the model can be distributed to all other users for any use case.
Importantly, the learning data still remains on the user’s device, and only the learning results are encrypted and sent to the cloud.
The process above is illustrated as follows.
The mobile phone personalizes the model locally according to usage (A). Many users’ updates are aggregated (B) to form a consensus change (C) to the shared model, and then the process is repeated.
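The download, train, and aggregate cycle described above can be sketched in a few lines of Python. This is a minimal simulation, not Google's implementation: it assumes a toy linear model y = w*x + b, three simulated devices with synthetic data, and a sample-weighted average as the aggregation step. All function names (`local_update`, `federated_average`, `make_client_data`, `run_federated_training`) are hypothetical, and encryption of the updates is left out.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Train a copy of the global model y = w*x + b on one device's local data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x  # gradient step for the slope
            b -= lr * err      # gradient step for the intercept
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average the device models, weighted by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_client_data(rng, n, true_w=2.0, true_b=1.0):
    """Synthetic local data (an assumption of this sketch); it never leaves the device."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, true_w * x + true_b + rng.gauss(0, 0.05)))
    return data


def run_federated_training(rounds=30, seed=0):
    rng = random.Random(seed)
    clients = [make_client_data(rng, n) for n in (20, 50, 30)]
    global_weights = (0.0, 0.0)  # stands in for the base model on the server
    for _ in range(rounds):
        # Each device downloads the current model and trains it locally.
        updates = [local_update(global_weights, data) for data in clients]
        # Only the trained weights are sent back and aggregated.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    w, b = run_federated_training()
    print(f"learned w={w:.2f}, b={b:.2f}")
```

Note that the server function only ever receives weights and sample counts, never the `(x, y)` pairs themselves, which is the property the walkthrough emphasizes.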
Application example
This way of collaboratively training machine learning models is powerful and has real-world applications.
When data is siloed for legal, economic, or other reasons, federated learning lets individual participants train models on far more data than any one of them holds.
1. Healthcare
Digital health is a good example of a federated learning application. Medical institution data is siloed because of patient privacy and data governance concerns and cannot be used without patient consent. With traditional approaches, machine learning models can only learn from a limited set of available data sources, which can bias them toward a particular hospital's equipment, patient demographics, and practices.
With federated learning, the model can also learn from data held at other hospitals, so it sees a more representative range of gender, age, and other demographics. This allows it to make predictions that generalize better.
References → The Future of Digital Health with Federated Learning
(*5) Nicola Rieke, Solution Architect Manager, Healthcare & Life Sciences Division, NVIDIA, is the lead author of the paper "The Future of Digital Health with Federated Learning". The paper argues that introducing federated learning into medical practice will affect the following six stakeholders.
Impact on stakeholders of introducing federated learning to medical practice
2. Self-driving cars
Self-driving cars can also be treated as individual participants, with each car learning locally rather than sending its data back to a central server.
Real-world driving is dangerous and often unpredictable, so federated learning can speed up the learning process and reduce the need to transfer large amounts of data. Ultimately, it has the potential to accelerate the process towards fully autonomous self-driving.
Federated learning has more applications, mainly in the area of the Internet of Things (IoT). These examples share the same aim: using ML on IoT devices while reducing communication and storage overhead and maintaining data privacy.
Limitations and Challenges
Federated learning is still a new idea and has some well-known challenges that prevent it from realizing its full potential.
1. Non-independent identically distributed data
Devices around the world constantly generate data that is not independent and identically distributed (non-IID). In statistics, non-IID means that the samples are not independent of one another, are not drawn from the same distribution, or both. The IID assumption is central to many statistical methods and learning algorithms, so non-IID data adds complexity to modeling and can cause problems during training.
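To make the non-IID problem concrete, the sketch below splits one toy dataset across four simulated devices in two ways: a shuffled (roughly IID) split and a label-skewed split in which each device only sees one class. The dataset, the partitioning scheme, and all function names are assumptions chosen for illustration; real device data can be skewed in many other ways (features, quantity, time).

```python
import random
from collections import Counter


def make_samples(num_labels=4, per_label=25):
    """Build toy (sample_id, label) pairs; the ids stand in for real features."""
    return [(label * per_label + i, label)
            for label in range(num_labels)
            for i in range(per_label)]


def iid_partition(samples, num_devices, rng):
    """Shuffle, then deal samples round-robin so every device sees a similar mix."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::num_devices] for i in range(num_devices)]


def label_skew_partition(samples, num_devices):
    """Sort by label and cut into contiguous shards so each device sees few labels."""
    ordered = sorted(samples, key=lambda s: s[1])
    shard = len(ordered) // num_devices  # any remainder is dropped for simplicity
    return [ordered[i * shard:(i + 1) * shard] for i in range(num_devices)]


def label_histograms(partition):
    """Count how many samples of each label every device holds."""
    return [Counter(label for _, label in part) for part in partition]


if __name__ == "__main__":
    samples = make_samples()
    splits = [
        ("IID", iid_partition(samples, 4, random.Random(0))),
        ("label-skewed", label_skew_partition(samples, 4)),
    ]
    for name, parts in splits:
        print(name)
        for device, hist in enumerate(label_histograms(parts)):
            print(f"  device {device}: {dict(sorted(hist.items()))}")
```

In the label-skewed split, a model trained on any single device never sees three of the four classes, which is why local updates from such devices can pull the shared model in conflicting directions.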
2. Computing power of terminal
Each device participating in a federated network has different capabilities at the software and hardware level (network connectivity, RAM, power, etc.). Most smartphones today can handle computationally intensive tasks like model training, but the majority of edge devices cannot yet do so without degrading device performance. So there is a trade-off between maintaining device performance and model accuracy.
3. Data labeling
Many supervised ML techniques require clear and consistent labels for algorithm execution. To automatically label data coming from various devices, you need to implement a good data pipeline.
4. Data leak
It is still possible to identify and recover a specific user’s data by reverse engineering the shared model updates. Privacy techniques such as differential privacy can strengthen the privacy of federated learning, but at the cost of reduced model accuracy.
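The privacy and accuracy trade-off can be illustrated with the two steps commonly used when applying differential privacy to model updates: clipping each update to a maximum L2 norm, then adding Gaussian noise scaled to that bound. The sketch below is only an illustration under those assumptions. The function names and parameters (`max_norm`, `noise_multiplier`) are hypothetical, and it does not compute a privacy budget, which a real deployment would need.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatize_update(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise proportional to the clipping bound."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    raw_update = [3.0, 4.0]  # hypothetical model update with L2 norm 5
    for multiplier in (0.0, 0.5, 2.0):
        noisy = privatize_update(raw_update, 1.0, multiplier, rng)
        print(multiplier, [round(v, 3) for v in noisy])
```

A larger `noise_multiplier` hides any single user's contribution more thoroughly, but it also distorts the update more, which is where the loss in model accuracy comes from.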
For more on the challenges of federated learning, see here.
(*Translation Note 7) The article "Federated Learning: Challenges, Methods, and Future Directions", which is the source of the four challenges above, was published in November 2019 on the CMU ML blog by a machine learning researcher at Carnegie Mellon University in the United States. The article lists five unsolved problems in federated learning research:
Five unsolved problems in federated learning research
Interested in tinkering with federated learning?
In the following, we introduce several frameworks for implementing federated learning.
- TensorFlow Federated
- Flower: A friendly federated learning framework
Federated learning is a powerful idea in artificial intelligence. The technology enables distributed learning across multiple devices, with low latency and low power consumption, while helping to protect data privacy and security.
Research in this area is very active, although challenges remain before federated learning can be widely deployed in the real world.
I hope this article sparked your interest in federated learning and gave you a glimpse of what it is and what it can achieve.