Generative AI's first data breach: OpenAI takes corrective action, bug patched

June 22, 2023

This News Covers

What data of ChatGPT was leaked?
Do employees use confidential company data in ChatGPT?
Aftermath of such leaks on a company and employees
When and which top password leak incidences happened in history? Which websites got affected?
How easy or difficult is it to crack and leak passwords?
When and how many ChatGPT credentials were leaked?
OpenAI's official statement on this leak
Which data should you not share with OpenAI?

Top targets of cyberattacks and why they are targeted

In today's digital era, large-scale data breaches have become an alarming reality, with cyber attackers targeting various platforms and databases to gain unauthorized access to sensitive information. Among the prime targets are platforms such as WhatsApp, Instagram, and Gmail, along with government databases. These platforms hold vast amounts of data, including login credentials, financial information, and government-issued identity records, making them enticing targets for hackers seeking to exploit valuable data for malicious purposes.

These data types serve as gateways to valuable resources, enabling unauthorized access, identity theft, and financial fraud. Breaches involving such data can lead to severe consequences, including financial loss, reputational damage, and compromised national security. Mitigating these risks requires strong cybersecurity measures, such as robust authentication protocols, encryption techniques, and regular security audits. Collaboration between government entities, private organizations, and cybersecurity experts is essential to combat evolving cyber threats effectively.

9 Incidents of Cybersecurity Attacks on WhatsApp, Instagram, Gmail, and Government Databases

WhatsApp Data Breach - May 2019: In May 2019, WhatsApp, a popular messaging platform owned by Facebook, suffered a major data breach that impacted approximately 1.5 million users. The breach involved the installation of spyware on users' devices, allowing hackers to gain access to personal data, including messages, contacts, and call records.

Instagram Data Leak - August 2020: In August 2020, Instagram, a widely used social media platform, experienced a significant data leak in which a bug in its application programming interface (API) allowed hackers to obtain users' personal information, including email addresses and phone numbers. The data of millions of Instagram users was compromised as a result.

Gmail Phishing Attack - March 2017: In March 2017, a widespread phishing attack targeted Gmail users. Hackers sent deceptive emails that appeared to be legitimate, tricking users into providing their login credentials. This attack affected a large number of Gmail users, exposing their personal emails and potentially compromising their accounts.

Government Database Breach - Office of Personnel Management (OPM) - June 2015: One of the most significant government database breaches occurred in June 2015, when the Office of Personnel Management (OPM) in the United States fell victim to a cyber attack. This breach exposed sensitive personal information, including security clearance details and background investigation records, of millions of current and former federal employees.

WhatsApp Privacy Policy Controversy - January 2021: WhatsApp faced public scrutiny and backlash in January 2021 when it announced updates to its privacy policy, allowing increased data sharing with its parent company, Facebook. Users raised concerns over the potential compromise of their personal data and privacy, leading to widespread debates and a migration of users to alternative messaging platforms.

Instagram User Data Scraping - September 2021: In September 2021, it was reported that a cybercriminal forum was selling scraped data from millions of Instagram users. This data, obtained through unauthorized means, included users' email addresses and phone numbers. Such incidents highlight the risks associated with the exposure of personal information on social media platforms.

Gmail Account Hijacking - October 2018: In October 2018, a sophisticated cyber attack targeted Gmail users, specifically those with high-profile roles, such as politicians, journalists, and activists. Hackers employed tactics like spear-phishing and social engineering to gain unauthorized access to accounts and monitor the victims' communications.

Government Database Breach - Aadhaar - March 2018: Aadhaar, India's biometric identity system, experienced a data breach in March 2018. The breach exposed the personal information of over one billion Aadhaar cardholders, including their names, addresses, and unique identification numbers. This incident raised concerns about the security and privacy of citizens' data in government databases.

WhatsApp Pegasus Spyware Attack - October 2019: In October 2019, WhatsApp revealed a significant security breach involving the Pegasus spyware developed by an Israeli cyber intelligence firm, NSO Group. The spyware exploited a vulnerability in WhatsApp's calling feature, enabling attackers to install malware on targeted devices.

Evolution of Cyberattacks Due to AI Advancements in the Next 2-3 Years

As technology continues to advance, including the rapid development of artificial intelligence (AI) and machine learning (ML) capabilities, it is expected that cyberattacks will also evolve to leverage these advancements for malicious purposes. Here are some potential ways in which cyberattacks may evolve in the next 2-3 years due to AI advancements:

AI-Enhanced Malware and Ransomware: Cybercriminals are likely to leverage AI algorithms to develop more sophisticated and evasive malware and ransomware. AI-powered malware can adapt and evolve in real-time, making it harder for traditional security systems to detect and mitigate. This could lead to an increase in targeted attacks and more effective exploitation of vulnerabilities.

AI-Powered Phishing Attacks: Phishing attacks, which aim to deceive users into revealing sensitive information, can be made more convincing and personalized using AI algorithms. AI can analyze and mimic patterns from genuine communications, making phishing emails and messages appear more authentic. This could result in an increase in successful phishing attempts and data breaches.

Automated Attacks and Botnets: AI can enable the automation of cyberattacks by utilizing bots and botnets. These AI-powered bots can autonomously scan for vulnerabilities, launch attacks, and propagate malware, amplifying the scale and speed of cyberattacks. Botnets powered by AI algorithms can learn and adapt their attack techniques, making them more challenging to detect and mitigate.

Deepfake Attacks: Deepfake technology, powered by AI, allows for the creation of highly realistic fake audio and video content. Cybercriminals may utilize deepfakes to impersonate individuals, including high-ranking officials or executives, to manipulate and deceive targeted individuals or organizations. This can lead to social engineering attacks, corporate espionage, and reputational damage.

AI-Driven Social Engineering: AI algorithms can analyze vast amounts of data from social media platforms, public databases, and online sources to create detailed profiles of individuals. Cybercriminals can leverage this information for sophisticated social engineering attacks, tailoring their approaches to exploit psychological vulnerabilities and manipulate victims into divulging sensitive information or performing malicious actions.

AI-Assisted Advanced Persistent Threats (APTs): APTs are long-term, stealthy attacks conducted by sophisticated threat actors. With the help of AI, APTs can become more intelligent and adaptive. AI algorithms can analyze network traffic, detect anomalies, and learn from the target environment to evade detection and maintain persistence within compromised systems, allowing attackers to carry out espionage or sabotage activities.

AI-Powered Insider Threats: Insider threats pose a significant risk to organizations. AI can be utilized to identify patterns of behavior, detect anomalies, and predict potential insider threats. However, adversaries can also exploit AI to conceal their activities, making it harder to detect malicious insider actions. AI-powered insider threats could bypass traditional security measures and cause significant damage.

Adversarial AI Attacks: AI algorithms themselves can become targets of cyberattacks. Adversarial attacks aim to manipulate or deceive AI systems by introducing specially crafted input data. For example, an AI-powered security system could be tricked into misclassifying malware as benign or misinterpreting sensor data, leading to false security assurances or system vulnerabilities.

As AI technology continues to advance, it is crucial for cybersecurity professionals and organizations to stay ahead of these evolving threats. Robust AI-based defense mechanisms, including intelligent anomaly detection, behavior analytics, and adversarial training, will be essential in mitigating the risks posed by AI-powered cyberattacks. Additionally, ongoing research, collaboration between the cybersecurity industry and AI developers, and regulatory frameworks are vital to ensure the responsible and ethical use of AI while addressing emerging cybersecurity challenges.

The recent data leak involving over 100,000 ChatGPT credentials has raised significant concerns about the implications of using AI-powered chatbots. The breach, with India being the most affected country, highlights the global popularity of ChatGPT and its integration into various business operations. Info stealers, which target passwords and sensitive information, are believed to be responsible for the leak. The incident has prompted discussions about the confidentiality of conversations stored in ChatGPT and the potential exposure of proprietary information, internal strategies, and personal communications. It also emphasizes the urgent need for improved password security practices and the implementation of two-factor authentication to mitigate the risk of account takeover attacks.
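Two-factor authentication is commonly implemented with time-based one-time passwords (TOTP, defined in RFC 6238). The following is a minimal Python sketch of how such a code can be computed and checked using only the standard library. The function names `totp` and `verify_totp`, the 30-second step, and the one-step tolerance window are illustrative assumptions; this is not a description of how OpenAI or any particular service implements 2FA.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 one-time password (HMAC-SHA1 variant)."""
    if at_time is None:
        at_time = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte is an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at_time: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or `window` steps either side (assumed policy)."""
    if at_time is None:
        at_time = time.time()
    for drift in range(-window, window + 1):
        expected = totp(secret, at_time + drift * step, step, len(submitted))
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

With a second factor like this in place, a stolen password alone is not enough to log in, because the attacker would also need the shared secret stored on the user's device.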

What data of ChatGPT was leaked?

The data exposed by ChatGPT, which OpenAI attributed to a bug in an open-source library, included sensitive user data.

Chat Histories: A bug in redis-py, the open-source Redis client library used by OpenAI, allowed some users to see the titles of other active users' chat histories. In some cases, the first message of a newly created conversation may also have been visible if both users were active around the same time.
Users' Personal and Payment Information: The incident also exposed personal and payment data of approximately 1.2% of active ChatGPT Plus subscribers on a specific date (March 20th). This included:
1. Names
2. Email addresses
3. Payment addresses
4. Credit card types
5. The last four digits of credit card numbers
6. Credit card expiration dates
Samsung's Confidential Data: Separate from the system vulnerability, Samsung employees reportedly shared confidential company information with ChatGPT. This included:
1. Source code from a faulty semiconductor database
2. Confidential code for a defective equipment issue
3. An entire meeting transcript for the chatbot to create meeting minutes

Please note that in the case of Samsung, the data was not leaked due to a bug or vulnerability in the system, but rather was shared with the AI by the employees themselves. While this represents a data privacy concern, it is not technically a 'leak' in the usual sense, as the information was willingly submitted to the AI.

Do employees use confidential company data in ChatGPT?

There have been instances where employees have used confidential company data with ChatGPT. At Samsung, employees reportedly used the chatbot to troubleshoot and optimize work tasks, which led them to input confidential information and source code. This poses a significant risk of data leaks and breaches. Specifically, these incidents involved:

An employee copying the source code from a faulty semiconductor database into ChatGPT to help find a fix.
An employee sharing confidential code to find a solution for defective equipment.
An employee submitting an entire meeting transcript to ChatGPT to create meeting minutes.

However, this is a misuse of the technology and represents a significant security and confidentiality concern. Companies and organizations generally have strict policies about the protection of confidential and proprietary data, and using such data with external systems, especially without proper safeguards, can lead to breaches and leaks, as happened with Samsung.

OpenAI itself warns against sharing sensitive data with ChatGPT, as all inputs can be retained and used to further train the AI models, and specific prompts cannot be deleted. Hence, employees should refrain from using confidential company data with AI systems like ChatGPT unless there are explicit, approved methods for doing so that comply with company policies and data privacy laws.

Aftermath of such leaks on a company and employees

Threat to Privacy and Personal Information: A data breach can lead to exposure of sensitive personal information such as names, email addresses, payment addresses, and partial credit card details. Such breaches pose a significant risk to the users' privacy and financial safety, potentially leading to issues like identity theft and financial fraud.

Corporate Espionage and Trade Secrets Exposure: When employees share confidential company data with AI systems like ChatGPT, they inadvertently risk exposing critical company information or trade secrets. In cases like those reported at Samsung, employees used ChatGPT for troubleshooting, unintentionally revealing sensitive codes and business processes that could be exploited by competitors or malicious actors.

Damaged Reputation and Loss of Trust: Data breaches and information leaks can significantly harm the reputation of the AI company and erode trust among its user base.

Regulatory Challenges and Legal Consequences: Such data leaks can potentially violate data protection and privacy laws, such as GDPR, attracting legal penalties. Organizations might also face lawsuits from users whose data was compromised. Furthermore, countries or regions might ban or restrict the use of such AI systems until data security measures are assured.

Increased Security Measures and Costs: In the aftermath of a data leak, companies typically need to invest significantly in cybersecurity measures to prevent future breaches. This includes fixing system vulnerabilities, employing additional security staff, or developing in-house solutions, as seen in Samsung's case. These actions invariably increase operational costs and can impact a company's financial health.

When and which top password leak incidences happened in history? Which websites got affected?

Here are some of the major password leak incidents of recent years:

Facebook (2019): In April 2019, Facebook admitted to a security incident that exposed around 600 million plaintext user passwords to its employees. The passwords were stored in readable format within its internal data storage systems. While there were no reported user account abuses related to this incident, it did raise significant concerns about Facebook's data handling practices.
Marriott International (2020): In March 2020, Marriott International disclosed a data breach that exposed the personal information of up to 5.2 million guests. The exposed information included names, addresses, emails, phone numbers, and loyalty account details; Marriott stated that it did not believe account passwords or PINs were involved.
Zoom (2020): In April 2020, it was reported that over 500,000 Zoom account credentials were being sold on the dark web and hacker forums. The data included email addresses, passwords, personal meeting URLs, and host keys.
Twitter (2020): In July 2020, Twitter experienced a high-profile breach where 130 accounts were compromised, including those of public figures like Barack Obama and Elon Musk. The attackers used these accounts to tweet a bitcoin scam. However, this incident involved a social engineering attack on Twitter employees rather than a direct password leak.
Nitro PDF (2020): In October 2020, Nitro PDF experienced a massive data breach that exposed the passwords, IP addresses, and account details of more than 70 million users. The data was put up for sale on a dark web marketplace.
COMB (Compilation of Many Breaches) Leak (2021): In February 2021, it was reported that the largest compilation of breached data, named "Compilation of Many Breaches" (COMB), had been leaked online. The database contained 3.2 billion unique pairs of cleartext emails and passwords, which were compiled from numerous individual data breaches.

Please keep in mind that the methods used to secure passwords vary by company and circumstance, with some storing passwords in plaintext (unencrypted) and others using various encryption or hashing methods. In all cases, affected users are advised to change their passwords and to be cautious of potential phishing attempts using the leaked data.
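To make the difference between plaintext and hashed storage concrete, here is a minimal Python sketch of salted password hashing with the standard library's `hashlib.pbkdf2_hmac`. The function names `hash_password` and `check_password` and the iteration count are assumptions chosen for illustration; none of the companies above is known to use this exact scheme.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Derive a slow, salted hash so the plaintext never needs to be stored."""
    if salt is None:
        # A fresh random salt per user means identical passwords hash differently.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes,
                   iterations: int = 600_000) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

A service that stores only the salt and the derived digest cannot expose readable passwords to its own staff, and an attacker who steals the database still has to guess each password individually.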

How easy or difficult is it to crack and leak passwords?

Basics of Password Cracking: Password cracking can be challenging or easy depending on a variety of factors. Essentially, it involves guessing or systematically attempting different combinations to find a user's password. Common methods include brute force attacks, dictionary attacks, and rainbow table attacks. The complexity and uniqueness of a password play a significant role in how difficult it is to crack.

Brute Force Attacks: A brute force attack involves an attacker trying all possible combinations of characters until the correct password is found. The time taken for this method depends on the length and complexity of the password. Simple and short passwords can be cracked relatively quickly, while long and complex passwords can take years, decades, or even longer with current computing power.

Dictionary Attacks: In a dictionary attack, the hacker uses a list (or dictionary) of common words, phrases, or previously leaked passwords, instead of trying all possible combinations. This method is quicker than a brute force attack but less comprehensive. If a user's password doesn't feature in the dictionary being used, the attack won't succeed.

Rainbow Table Attacks: A rainbow table attack uses precomputed tables (rainbow tables) for reversing cryptographic hash functions. It's an efficient way to crack password hashes, but it requires considerable storage. Additionally, this type of attack is less effective if the password system uses "salting" - a technique that adds random data to the password before it's hashed.

Factors Affecting the Difficulty of Cracking Passwords: Several factors determine the difficulty of cracking a password. These include the length and complexity of the password, the method of storage (e.g., if it's hashed or salted), the computational power of the hacker's system, and the security measures employed by the service provider, such as account lockouts after a certain number of failed attempts or two-factor authentication.
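The effect of length and character variety on brute force attacks can be estimated with simple arithmetic. The Python sketch below counts the possible passwords for a given alphabet and length and converts that into a worst-case search time. The guessing rate of one billion attempts per second is an assumption for illustration only; real rates vary widely with the hashing method and the attacker's hardware.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size: int, length: int,
                       guesses_per_second: float) -> float:
    """Time to try every candidate at the given (assumed) guessing rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


ASSUMED_RATE = 1e9  # hypothetical attacker speed, guesses per second

# Six lowercase letters versus twelve characters drawn from 94 printable symbols.
short_seconds = worst_case_seconds(26, 6, ASSUMED_RATE)
long_years = worst_case_seconds(94, 12, ASSUMED_RATE) / SECONDS_PER_YEAR

print(f"6 lowercase letters: {short_seconds:.2f} seconds")
print(f"12 printable characters: {long_years:.2e} years")
```

Under this assumed rate, the six-letter lowercase password is exhausted in under a second, while the twelve-character password would take on the order of millions of years, which is why length and variety matter far more than any single clever substitution.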

In conclusion, while it can be relatively easy to crack simple, short, and common passwords, cracking complex and unique ones can be a considerably difficult and time-consuming task. That's why it's recommended to use strong, unique passwords for each account and enable additional security measures when available.
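The following Python sketch illustrates why common passwords fall to dictionary attacks and why salting undermines precomputed tables. It uses a tiny made-up word list and unsalted SHA-256 purely for demonstration. The names `sha256_hex`, `dictionary_attack`, and `build_lookup` are hypothetical, and the lookup dictionary is a simplification: real rainbow tables use hash chains to save storage.

```python
import hashlib
from typing import Dict, Iterable, Optional

# Illustrative word list; real attacks use lists with millions of leaked passwords.
WORDLIST = ["123456", "password", "qwerty", "letmein"]


def sha256_hex(password: str, salt: bytes = b"") -> str:
    """Hash a password, optionally prefixed with a salt."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: Iterable[str],
                      salt: bytes = b"") -> Optional[str]:
    """Return the word whose hash matches the target, or None if none does."""
    for candidate in wordlist:
        if sha256_hex(candidate, salt) == target_hash:
            return candidate
    return None


def build_lookup(wordlist: Iterable[str]) -> Dict[str, str]:
    """Precompute hash -> password for unsalted hashes (simplified rainbow table)."""
    return {sha256_hex(word): word for word in wordlist}


lookup = build_lookup(WORDLIST)

# A common password is recovered immediately; an uncommon one is not in the list.
print(dictionary_attack(sha256_hex("qwerty"), WORDLIST))
print(dictionary_attack(sha256_hex("T7#vq9!LmZ2p"), WORDLIST))

# With a per-user salt, the precomputed table no longer contains the stored hash.
print(sha256_hex("qwerty", salt=b"\x01\x02") in lookup)
```

The same weak password is still guessable when the attacker knows the salt, but the attacker has to redo the work for every account instead of reusing one table, which is the practical benefit of salting.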

When and how many ChatGPT credentials were leaked?

Over 100,000 ChatGPT credentials were leaked within a span of one year, from June 2022 to May 2023. The exact timing of individual leaks within that period has not been reported. The incidents listed below are related ChatGPT data exposure events, most of which did not involve leaked credentials.

OpenAI ChatGPT Data Breach
Date: March 20, 2023
Company Name: OpenAI
Number of Leaked ChatGPT Credentials: Not specifically mentioned but potentially affected 1.2% of ChatGPT Plus subscribers
Details: OpenAI experienced a security issue with its popular AI, ChatGPT. During the incident, users of the platform reported seeing the titles from other users' chat histories. Upon further investigation, it was revealed that this vulnerability might have also exposed sensitive personal data from approximately 1.2 percent of ChatGPT Plus subscribers. The exposed data could potentially include first and last names, email addresses, payment addresses, and the last four digits of credit card numbers, along with their expiration dates.

Microsoft's ChatGPT Bing Integration Leak
Date: Not specified, but happened on a Friday
Company Name: Microsoft
Number of Leaked ChatGPT Credentials: No credential leaks reported
Details: Microsoft was discovered to be integrating OpenAI's ChatGPT into its Bing search engine. The update was prematurely released, spotted by some users, and quickly pulled down by Microsoft. The integration aimed at providing an "AI-powered answer engine" capable of delivering human-like conversation, sourcing answers from the web, and providing responses up-to-date with the most recent information. Despite the hasty removal, Bloomberg reported that Microsoft has been working on this integration for months and has invested billions into OpenAI.

OpenAI Addresses ChatGPT User Data Leak
Date: March 17, 2023
Company Name: OpenAI
Number of Leaked ChatGPT Credentials: Potentially exposed personal data from 1.2% of ChatGPT Plus subscribers
Details: Following a security incident where a bug caused the titles of other users' chat histories to be visible, OpenAI disclosed its preliminary findings. The incident may have led to the leak of personal data of about 1.2% of ChatGPT Plus subscribers. OpenAI stated that an active user's first and last name, email address, payment address, the last four digits of their credit card, and credit card expiration date may have been visible. The faulty library causing the issue was identified and patched, and OpenAI has implemented additional measures to prevent such issues in the future.

Samsung's ChatGPT Confidential Data Leak
Date: Not specified
Company Name: Samsung
Number of Leaked ChatGPT Credentials: No personal credentials were leaked
Details: Samsung employees reportedly leaked confidential company information via OpenAI's ChatGPT on at least three separate occasions. The information leaked ranged from source code for a faulty semiconductor database to an entire meeting recording. OpenAI retains data submitted to its services to improve its AI models, causing these leaks to raise serious security concerns. In response to this, Samsung limited the data input to ChatGPT and started developing its in-house AI.

Samsung Workers Unintentionally Leak Trade Secrets Via ChatGPT
Date: Not specified
Company Name: Samsung
Number of Leaked ChatGPT Credentials: No personal credentials were leaked
Details: Samsung experienced a significant security issue when employees inadvertently shared confidential company data with OpenAI's ChatGPT. The accidental leaks included confidential source code and meeting records, which the AI now retains as a part of its learning process. As a countermeasure, Samsung limited ChatGPT's data upload capacity per person and initiated an investigation into the matter. The company is also exploring the development of its proprietary AI chatbot to avoid similar incidents in the future.

OpenAI's official statement on this leak

OpenAI has released an official statement regarding the recent data breach involving ChatGPT. The company acknowledges the breach and states that it was caused by a bug in an open-source library, specifically the Redis client library called redis-py. The bug allowed some users to see titles from another active user's chat history, and in certain cases, the first message of a newly-created conversation was visible in someone else's chat history. OpenAI emphasizes that full credit card numbers were not exposed and that they have patched the bug. They have also reached out to notify affected users and assure them that there is no ongoing risk to their data. OpenAI is committed to taking corrective measures and rebuilding trust within the ChatGPT user community.

Which data should you not share with OpenAI?

When interacting with AI models like ChatGPT developed by OpenAI, there is certain information you should never share for privacy and security reasons:

Personally Identifiable Information (PII):
This includes details such as your full name, home address, phone number, and date of birth. Sharing such information could potentially lead to identity theft.
Financial Information:
Bank account numbers, credit card numbers, your CVV, and any other financial data should never be shared. These details can be used for fraudulent transactions.
Health Records:
Personal medical information and health records should remain private. This includes your medical history, current medical conditions, and any other health-related information.
Passwords and Security Questions:
Never share your passwords, PINs, or answers to security questions for any of your online accounts. These details can be used to gain unauthorized access to your accounts.
Social Security Number (or equivalent):
In the U.S., this is your Social Security Number. In other countries, it could be your National Insurance Number, Personal Number, etc. These numbers are unique to you and are often used for official identification purposes. Sharing them could lead to serious personal and financial implications.
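One practical way to act on the categories above is to mask obvious identifiers before a prompt ever leaves your machine. The Python sketch below does this with regular expressions for email addresses, card-like digit runs, and U.S. Social Security Number formats. The patterns and the `redact` function are illustrative assumptions: they will miss many real-world formats and should be treated as a safety net, not a replacement for company policy or careful review.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = "Mail jane.doe@example.com about card 4111 1111 1111 1111 and SSN 123-45-6789."
print(redact(prompt))
```

Running the redaction locally keeps the original values out of the chat history entirely, which matters because, as noted earlier, submitted prompts can be retained.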

The stolen credentials have appeared on dark web marketplaces, with India being the most heavily impacted country. The breach highlights the popularity of ChatGPT globally. Information stealers, which target passwords and sensitive information, are believed to be responsible for the breach. Experts emphasize the need for improved cybersecurity practices, including using unique passwords and enabling two-factor authentication. Companies like Samsung have banned the use of ChatGPT due to concerns about data leaks. OpenAI has confirmed a data breach and taken steps to address the issue.
