Is ChatGPT’s use of our personal data even legal?
Tue, 25th Jul 2023

AI-powered tools and machine learning models can automate routine tasks and have the potential to revolutionise the way we interact with technology, powering applications such as chatbots, language translation and even coding. It is also important, however, to consider their potential drawbacks.

Privacy and ChatGPT

Privacy stands out as a primary concern surrounding these models, as it can be challenging to ascertain if individuals' data has been utilised to train a machine learning model. GPT-3, for instance, is a large language model (LLM) trained on extensive internet data, including personal websites and content from social media platforms.

This has led to concerns that the model has been using personal data without the owners' permission, and that it may now be difficult to control, delete or even claim ownership of that data once it has been used to train the model.

Another significant concern is the "right to be forgotten." As the use of GPT and other machine learning models becomes increasingly prevalent, individuals may want the ability to erase their information from a model's training data.

However, at this point, there is no universal method for individuals to demand the deletion of their data from a machine learning model. Some researchers and companies are exploring techniques to enable the removal or "forgetting" of specific data points, but these methods are still in the early stages of development, and their feasibility and effectiveness remain uncertain.
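One concrete line of work here is so-called machine unlearning. An exact-unlearning approach described in the research literature (for example, the SISA training scheme of Bourtoule et al.) partitions the training data into shards, trains a sub-model on each shard and aggregates their predictions, so that honouring a deletion request only requires retraining the single shard that held the record. The Python sketch below illustrates the idea with a simple scikit-learn classifier; the class and helper names are illustrative, not any vendor's actual API.

```python
# Illustrative sketch of shard-based exact unlearning (SISA-style).
# Deleting a record retrains only the shard that held it, not everything.
# Assumes each shard contains examples of at least two classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

class ShardedEnsemble:
    def __init__(self, n_shards=4):
        self.n_shards = n_shards
        self.shards = [[] for _ in range(n_shards)]  # (record_id, x, y) per shard
        self.models = [None] * n_shards

    def _shard_of(self, record_id):
        return record_id % self.n_shards  # simple deterministic placement

    def add(self, record_id, x, y):
        self.shards[self._shard_of(record_id)].append((record_id, x, y))

    def _fit_shard(self, s):
        X = np.array([x for _, x, _ in self.shards[s]])
        y = np.array([label for _, _, label in self.shards[s]])
        self.models[s] = LogisticRegression().fit(X, y)

    def fit_all(self):
        for s in range(self.n_shards):
            self._fit_shard(s)

    def forget(self, record_id):
        """Honour an erasure request: drop the record, retrain its shard only."""
        s = self._shard_of(record_id)
        self.shards[s] = [r for r in self.shards[s] if r[0] != record_id]
        self._fit_shard(s)

    def predict(self, x):
        votes = [m.predict([x])[0] for m in self.models]
        return max(set(votes), key=votes.count)  # majority vote across shards
```

The appeal of this design is that the cost of an erasure request is bounded by the size of one shard rather than the whole corpus. For models on the scale of GPT-3, though, even per-shard retraining is expensive, which is why approximate "forgetting" techniques remain an open research problem.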

So, is that legal?

The legality of employing personal data to train machine learning models like GPT-3 varies depending on the specific laws and regulations in a particular country or region. For example, in the European Union, the General Data Protection Regulation (GDPR) requires that data be collected and used solely for specific, lawful purposes.

In accordance with GDPR regulations, organisations must seek explicit consent from individuals prior to gathering and utilising their data. While there exists a legal foundation for processing personal data for scientific and historical research purposes, the controller must adhere to GDPR's core principles and uphold individuals' rights.

These rights encompass the right to be informed, the right to access, the right to rectification, the right to erasure, the right to object, and the right to data portability. It would appear that many language models may not entirely conform to GDPR requirements.

On the other side of the Atlantic, there is no federal law in the United States that specifically regulates the use of personal data to train machine learning models. However, organisations must generally adhere to laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Children's Online Privacy Protection Act (COPPA) when collecting and using sensitive categories of personal data, such as health information or data relating to children.

In California – where many prominent technology companies are located – compliance with the California Consumer Privacy Act (CCPA), which shares similarities with GDPR, is mandatory.

Ultimately, the development of AI models is constantly in flux, and the laws and regulations surrounding the use of personal data in AI are likely to evolve along with it.

Accuracy of ChatGPT

Another significant concern surrounding language models is the lack of verification and the resulting risk of misinformation. It has been widely reported that these models often present inaccurate information confidently, as though it were fact. This absence of fact-checking mechanisms could contribute to the spread of false information, particularly on sensitive topics such as news, politics and medicine.
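One mitigation that application builders experiment with is refusing to present a model's output as fact unless it can be grounded in trusted sources. The sketch below is a minimal illustration of that pattern; the generate() function is a hypothetical stand-in for any LLM API, and the substring check stands in for the retrieval and entailment pipelines real systems would use.

```python
# Minimal sketch of grounding an LLM answer before presenting it as fact.
# generate() is a hypothetical stand-in; a real system would call an LLM API
# and use retrieval plus an entailment model rather than a substring check.

def generate(prompt: str) -> str:
    # Canned answer so the sketch runs end to end.
    return "The Eiffel Tower is 330 metres tall."

def supported_by_corpus(claim: str, corpus: list[str]) -> bool:
    # Naive grounding check: does any trusted document contain the claim?
    needle = claim.lower().strip(" .")
    return any(needle in doc.lower() for doc in corpus)

def answer_with_guard(question: str, corpus: list[str]) -> str:
    draft = generate(question)
    if supported_by_corpus(draft, corpus):
        return draft
    # Flag unverified output instead of presenting it as fact.
    return f"Unverified by available sources: {draft}"

corpus = ["The Eiffel Tower is 330 metres tall and was completed in 1889."]
print(answer_with_guard("How tall is the Eiffel Tower?", corpus))
```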

Large language models possess transformative potential, revolutionising how we interact with technology and automating diverse tasks. Nevertheless, acknowledging and addressing their drawbacks is crucial: as their use grows, tackling privacy, verification and misinformation issues becomes imperative.