Regulon — NY-S-06955 · NY S 6955 (AI Training Data Transparency Act)

2025-03-27 committee referral REFERRED TO INTERNET AND TECHNOLOGY

2026-01-07 committee referral REFERRED TO INTERNET AND TECHNOLOGY

2026-02-18 AMEND AND RECOMMIT TO INTERNET AND TECHNOLOGY

2026-02-18 PRINT NUMBER 6955A

2026-02-25 1ST REPORT CAL.433

2026-02-26 2ND REPORT CAL.

2026-03-04 chamber passage ADVANCED TO THIRD READING

Summary

Requires developers of generative AI models or services made publicly available to New Yorkers to post training data documentation on the developer's website. The documentation must include a high-level summary of the datasets used, covering sources, data types, volume, IP status, presence of personal information, data processing methods, collection timeframes, and synthetic data use. The obligation covers models and services released on or after January 1, 2022; documentation must be posted by January 1, 2027, and thereafter before each new release or substantial modification is made publicly available. Exemptions exist for aviation-only AI and for national security AI available only to federal entities. The bill contains no enforcement mechanism, penalties, or private right of action.

Enforcement & Penalties

Enforcement Authority

The bill specifies no enforcement mechanism: it designates no agency to enforce the training data disclosure obligation and creates no private right of action. The bill amends the General Business Law, which is generally subject to enforcement by the New York Attorney General under Executive Law § 63(12) for repeated or persistent fraud or illegality, but the bill itself establishes no specific enforcement authority.

Penalties

The bill specifies no penalties, damages, or remedies of any kind. It includes no provisions for civil penalties, statutory damages, injunctive relief, or attorney's fees.

Who Is Covered

"Developer" means a person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence model or service for use by members of the public.

What Is Covered

"Generative artificial intelligence" means a class of AI models that emulate the structure and characteristics of input data to generate derived synthetic content, including, but not limited to, images, videos, audio, text, and other digital content.

Compliance Obligations (1 obligation)

T-03 Training Data Disclosure · T-03.2 · Developer · Foundation Model · Content Generation

Gen. Bus. Law § 1432(1)(a)-(l), (2)

Plain Language

Developers of generative AI models or services that are made publicly available to New Yorkers, whether free or paid, must post training data documentation on the developer's website. The initial deadline is January 1, 2027; after that, documentation must be posted before each release or substantial modification is made publicly available. The obligation covers any generative AI model or service released on or after January 1, 2022. The documentation must include a high-level summary covering at least twelve enumerated categories: dataset sources or owners, how the data furthers the model's purpose, data point counts, data types, IP status (copyright, trademark, or patent protection, or entirely public domain), whether data was purchased or licensed, presence of personal information, presence of aggregate consumer information, data cleaning or processing applied, data collection timeframes, dates datasets were first used in development, and synthetic data use. Data point counts may be expressed in general ranges, and dynamic datasets may use estimated figures. Two narrow exemptions apply: AI whose sole purpose is operating aircraft in the national airspace, and national security, military, or defense AI made available only to a federal entity. The bill contains no enforcement mechanism or penalties, so the disclosure obligation carries no specified consequence for noncompliance.
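The applicability rules described above can be sketched as a small check. This is an illustrative reading, not the bill's text: the `ModelRelease` fields, the function names, and the treatment of a substantial modification as simply another release are all assumptions made for this example, and terms the bill leaves undefined (such as "substantial modification") are reduced to caller-supplied flags.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Dates taken from the bill summary above.
COVERAGE_START = date(2022, 1, 1)    # models released on or after this date are covered
INITIAL_DEADLINE = date(2027, 1, 1)  # first posting deadline


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record for one release or substantial modification."""
    release_date: date
    available_to_new_yorkers: bool
    aviation_only: bool = False                   # exemption 2(a), as judged by the caller
    national_security_federal_only: bool = False  # exemption 2(b), as judged by the caller


def documentation_required(release: ModelRelease) -> bool:
    """Return True if this sketch treats the release as covered by the disclosure duty."""
    if release.aviation_only or release.national_security_federal_only:
        return False
    return release.available_to_new_yorkers and release.release_date >= COVERAGE_START


def post_no_later_than(release: ModelRelease) -> Optional[date]:
    """Return the date by which documentation should be up, or None if not covered.

    Releases up to the initial deadline share that deadline; later releases
    need documentation in place before the release itself goes public.
    """
    if not documentation_required(release):
        return None
    return max(INITIAL_DEADLINE, release.release_date)
```

For example, a covered model released in 2023 maps to the January 1, 2027 deadline, while a covered model released in March 2028 maps to its own release date.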

Statutory Text

1. On or before January first, two thousand twenty-seven, and prior to each time thereafter that a generative artificial intelligence model or service, or a substantial modification to a generative artificial intelligence model or service, released on or after January first, two thousand twenty-two, is made publicly available to New Yorkers for use, regardless of whether the terms of such use include compensation, the developer of such model or service shall post on the developer's website documentation regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of the generative artificial intelligence model or service, including, but not limited to:
(a) the sources or owners of the datasets;
(b) a description of how the datasets further the intended purpose of the artificial intelligence model or service;
(c) the number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
(d) a description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply: (i) as applied to datasets that include labels, "types of data points" means the types of labels used; and (ii) as applied to datasets without labeling, "types of data points" refers to the general characteristics;
(e) whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
(f) whether the datasets were purchased or licensed by the developer;
(g) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
(h) whether the datasets include aggregate consumer information;
(i) whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence model or service;
(j) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing;
(k) the dates the datasets were first used during the development of the artificial intelligence model or service; and
(l) whether the generative artificial intelligence model or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the model or service.
2. A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence model or service for any of the following:
(a) A generative artificial intelligence model or service whose sole purpose is the operation of aircraft in the national airspace; or
(b) A generative artificial intelligence model or service developed for national security, military, or defense purposes that is made available only to a federal entity.
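
The twelve items in subdivision 1(a) through (l) lend themselves to a completeness checklist. The sketch below is one hypothetical way to track them; the `DatasetSummary` class, its field names, and the `missing_items` helper are invented for illustration and appear nowhere in the bill. Because the statutory list is "including, but not limited to," a complete checklist is a floor and not proof of full compliance.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetSummary:
    """Illustrative container; each field maps to one item in § 1432(1)."""
    sources_or_owners: Optional[str] = None               # (a)
    purpose_fit: Optional[str] = None                     # (b)
    data_point_count: Optional[str] = None                # (c) ranges or estimates allowed
    data_point_types: Optional[str] = None                # (d)
    ip_status: Optional[str] = None                       # (e)
    purchased_or_licensed: Optional[str] = None           # (f)
    personal_information: Optional[str] = None            # (g)
    aggregate_consumer_information: Optional[str] = None  # (h)
    cleaning_or_processing: Optional[str] = None          # (i)
    collection_period: Optional[str] = None               # (j)
    first_use_dates: Optional[str] = None                 # (k)
    synthetic_data_use: Optional[str] = None              # (l)


def missing_items(summary: DatasetSummary) -> List[str]:
    """Return the names of fields that are unset or contain only whitespace."""
    return [
        f.name
        for f in fields(summary)
        if not (getattr(summary, f.name) or "").strip()
    ]
```

Calling `missing_items(DatasetSummary(sources_or_owners="Licensed news archive"))` returns the eleven remaining field names, which a documentation team could use as a to-do list before posting.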