Regulon — CA-AB-2013 · CA AB 2013 (Training Data Transparency)

Summary

Requires developers of generative AI systems or services made available to Californians to publicly post documentation on their website describing the training data used, including dataset sources, data point counts, IP status, personal information presence, and processing methods. Applies to systems released on or after January 1, 2022, with initial compliance due by January 1, 2026, and updated documentation required before each subsequent release or substantial modification. Three narrow exemptions exist: systems solely for security and integrity, aircraft operations in national airspace, and national security/military/defense systems available only to federal entities. The statute contains no enforcement mechanism, no designated enforcer, and no penalties for noncompliance.

Enforcement & Penalties

Enforcement Authority

The statute designates no enforcement agency, creates no private right of action, and includes no penalty provisions. It imposes a disclosure obligation but is silent on enforcement mechanisms and remedies. In practice, enforcement would likely flow through existing California frameworks, most relevantly the Unfair Competition Law (UCL, Bus. & Prof. Code § 17200). Under the UCL, the Attorney General, district attorneys, and private plaintiffs who satisfy its standing requirement (injury in fact and lost money or property, Bus. & Prof. Code § 17204) can pursue a violation of another California law as an unlawful business practice.

Penalties

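The statute does not specify any penalties, damages, or remedies for noncompliance. If a violation is pursued under California's Unfair Competition Law (UCL, Bus. & Prof. Code § 17200), private plaintiffs are limited to injunctive relief and restitution; damages are not available. Public prosecutors may additionally seek civil penalties of up to $2,500 per violation under Bus. & Prof. Code § 17206.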

Who Is Covered

"Developer" means a person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence system or service for use by members of the public. For purposes of this subdivision, "members of the public" does not include an affiliate as defined in subparagraph (A) of paragraph (1) of subdivision (c) of Section 1799.1a, or a hospital's medical staff member.

What Is Covered

"Generative artificial intelligence" means artificial intelligence that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence's training data.

Compliance Obligations

T-03 Training Data Disclosure · T-03.2 · Developer · Foundation Model

Civ. Code § 3111(a)(1)-(12), (b)

Plain Language

Developers of generative AI systems or services available to Californians must publish training data documentation on their website. The documentation must include a high-level summary covering twelve enumerated items: dataset sources or owners; how the datasets further the system's intended purpose; data point counts (general ranges and estimates permitted); data point types; IP status (copyright, trademark, or patent protection, or entirely public domain); whether the data was purchased or licensed; presence of personal information (per the CCPA definition); presence of aggregate consumer information; any cleaning or processing performed; collection time periods; dates of first use in development; and whether synthetic data generation was used.

The obligation applies to any system released on or after January 1, 2022, with initial documentation due by January 1, 2026, and updated documentation required before each new release or substantial modification. Three exemptions apply: systems solely for security and integrity purposes, systems solely for national airspace aircraft operations, and national security/military/defense systems available only to federal entities.

Notably, the statute itself contains no enforcement mechanism or penalties, so absent action under another law, compliance is effectively self-enforced. This is one of the earliest U.S. state laws requiring public training data disclosure and is considerably more detailed in its enumerated requirements than the EU AI Act's Article 53 training data summary obligation, though weaker in enforcement.
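The applicability rules in the plain-language summary can be sketched as a small check. This is a minimal illustration of one simplified reading, not a statement of how the statute must be applied: the names `GenAISystem`, `Exemption`, `must_post_documentation`, and `documentation_due` are hypothetical, and questions such as what counts as a "substantial modification" or a "sole purpose" are reduced to caller-supplied inputs.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

# Dates taken from the summary above.
CUTOFF_RELEASE_DATE = date(2022, 1, 1)   # releases on or after this date are in scope
INITIAL_DEADLINE = date(2026, 1, 1)      # initial documentation due on or before this date


class Exemption(Enum):
    """The three exemptions in subdivision (b); labels are illustrative."""
    SECURITY_AND_INTEGRITY = "sole purpose is security and integrity"
    AIRCRAFT_OPERATION = "sole purpose is aircraft operation in the national airspace"
    FEDERAL_DEFENSE_ONLY = "national security/military/defense, federal entities only"


@dataclass(frozen=True)
class GenAISystem:
    """Hypothetical record for a system release or substantial modification."""
    name: str
    release_date: date
    available_to_californians: bool
    exemption: Optional[Exemption] = None  # assumed to be determined by the caller


def must_post_documentation(system: GenAISystem) -> bool:
    """Return True if, under this simplified reading, documentation is required."""
    if system.exemption is not None:
        return False
    if not system.available_to_californians:
        return False
    return system.release_date >= CUTOFF_RELEASE_DATE


def documentation_due(system: GenAISystem) -> Optional[date]:
    """Return the date by which documentation should be posted, or None if not required.

    Releases up to the initial deadline map to January 1, 2026; for later
    releases the documentation must be posted before the release itself,
    so the release date is returned as the outer bound.
    """
    if not must_post_documentation(system):
        return None
    return max(INITIAL_DEADLINE, system.release_date)
```

For example, a non-exempt system released to Californians in 2023 maps to the January 1, 2026 deadline, while a release in mid-2026 needs its documentation posted before that release.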

Statutory Text

On or before January 1, 2026, and before each time thereafter that a generative artificial intelligence system or service, or a substantial modification to a generative artificial intelligence system or service, released on or after January 1, 2022, is made publicly available to Californians for use, regardless of whether the terms of that use include compensation, the developer of the system or service shall post on the developer's internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service, including, but not be limited to, all of the following:

(a) A high-level summary of the datasets used in the development of the generative artificial intelligence system or service, including, but not limited to:
  (1) The sources or owners of the datasets.
  (2) A description of how the datasets further the intended purpose of the artificial intelligence system or service.
  (3) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets.
  (4) A description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply:
    (A) As applied to datasets that include labels, "types of data points" means the types of labels used.
    (B) As applied to datasets without labeling, "types of data points" refers to the general characteristics.
  (5) Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain.
  (6) Whether the datasets were purchased or licensed by the developer.
  (7) Whether the datasets include personal information, as defined in subdivision (v) of Section 1798.140.
  (8) Whether the datasets include aggregate consumer information, as defined in subdivision (b) of Section 1798.140.
  (9) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service.
  (10) The time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
  (11) The dates the datasets were first used during the development of the artificial intelligence system or service.
  (12) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.

(b) A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence system or service for any of the following:
  (1) A generative artificial intelligence system or service whose sole purpose is to help ensure security and integrity. For purposes of this paragraph, "security and integrity" has the same meaning as defined in subdivision (ac) of Section 1798.140, except as applied to any developer or user and not limited to businesses, as defined in subdivision (d) of that section.
  (2) A generative artificial intelligence system or service whose sole purpose is the operation of aircraft in the national airspace.
  (3) A generative artificial intelligence system or service developed for national security, military, or defense purposes that is made available only to a federal entity.
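
The twelve items in subdivision (a) lend themselves to a simple completeness check before a documentation page is published. The sketch below is one possible internal representation, not a format the statute prescribes: the class `DatasetSummary`, its field names, and the helper `missing_items` are all hypothetical, and free-text fields are used wherever the statute asks for a description.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetSummary:
    """Hypothetical record with one field per item in Civ. Code § 3111(a)(1)-(12)."""
    sources_or_owners: Optional[str] = None                         # (a)(1)
    purpose_description: Optional[str] = None                       # (a)(2)
    data_point_count: Optional[str] = None                          # (a)(3) ranges or estimates allowed
    data_point_types: Optional[str] = None                          # (a)(4)
    includes_ip_protected_data: Optional[bool] = None               # (a)(5)
    purchased_or_licensed: Optional[bool] = None                    # (a)(6)
    includes_personal_information: Optional[bool] = None            # (a)(7)
    includes_aggregate_consumer_information: Optional[bool] = None  # (a)(8)
    cleaning_or_processing: Optional[str] = None                    # (a)(9)
    collection_period: Optional[str] = None                         # (a)(10)
    first_used_dates: Optional[str] = None                          # (a)(11)
    uses_synthetic_data: Optional[bool] = None                      # (a)(12)


def missing_items(summary: DatasetSummary) -> List[str]:
    """Return the names of fields that are unanswered (None or blank text).

    A value of False counts as an answer, since several items are
    yes/no questions.
    """
    missing = []
    for f in fields(summary):
        value = getattr(summary, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing
```

Calling `missing_items(DatasetSummary())` lists all twelve field names, which makes it easy to see which parts of a draft summary still need an answer.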