A-06578
NY · State · USA
Status: Pending
Proposed Effective Date
2027-01-01
New York Assembly Bill 6578-A — An Act to amend the general business law, in relation to establishing the artificial intelligence training data transparency act
Summary

Requires developers of generative AI models or services made available to New Yorkers to publicly post training data documentation, including a high-level summary of the datasets, on their websites by January 1, 2027, and before each subsequent release or substantial modification of models released on or after January 1, 2022. Required disclosures include dataset sources, data types, volume, IP status, personal information presence, processing methods, collection timeframes, and use of synthetic data. Separately requires entities that use employee or contractor data to train generative AI to disclose specified information to the affected employees, whether or not the model is made public. Exempts models whose sole purpose is operating aircraft in the national airspace, and models developed for national security, military, or defense purposes that are made available only to a federal entity. The bill contains no enforcement mechanism, designated enforcement authority, or penalty provisions.

Enforcement & Penalties
Enforcement Authority
No enforcement mechanism is specified in the bill. No agency is designated with enforcement authority, and no private right of action is created. The bill imposes disclosure obligations but is silent on penalties and enforcement procedures.
Penalties
The bill specifies no penalties, damages, or remedies of any kind. There is no statutory minimum, no civil penalty schedule, no injunctive relief provision, and no attorney fee provision.
Who Is Covered
"Developer" means a person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence model or service for use by members of the public.
What Is Covered
"Generative artificial intelligence" means a class of AI models that are self-supervised and emulate the structure and characteristics of input data to generate derived synthetic content, including, but not limited to, images, videos, audio, text, and other digital content.
Compliance Obligations (2 obligations)
T-03 Training Data Disclosure · T-03.2 · Developer · Foundation Model
Gen. Bus. Law § 1432(1)(a)-(l), (2)
Plain Language
Developers of generative AI models or services made publicly available to New Yorkers, whether free or paid, must post training data documentation on their website by January 1, 2027, and again before each subsequent public release or substantial modification of any model released on or after January 1, 2022. The required disclosure is a high-level summary of the datasets that must cover at least twelve categories (the statutory list is "including, but not limited to"): dataset sources or owners; how the datasets serve the model's intended purpose; data point counts (general ranges permitted, with estimates for dynamic datasets); data point types; IP status (copyright, trademark, or patent protection, or entirely public domain); whether the datasets were purchased or licensed; whether personal information is included; whether aggregate consumer information is included; any cleaning, processing, or other modification and its purpose; data collection timeframes, with notice if collection is ongoing; dates the datasets were first used; and whether synthetic data generation was or is used. Two narrow exemptions apply: models whose sole purpose is operating aircraft in the national airspace, and national security, military, or defense models made available only to a federal entity. Note that "training" is defined broadly to include testing, validating, and fine-tuning.
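The trigger and the twelve categories above can be tracked with a simple internal compliance checklist. The Python sketch below is illustrative only: the bill prescribes no schema, and the `TrainingDataDisclosure` class, its field names, and the `posting_required` helper are hypothetical. It also simplifies the trigger to a single release date and ignores the open-ended "including, but not limited to" language, so it is a reading aid, not a statement of what the law requires.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


# Hypothetical field names mirroring Gen. Bus. Law § 1432(1)(a)-(l).
# The bill requires a "high-level summary" and prescribes no particular schema.
@dataclass
class TrainingDataDisclosure:
    sources_or_owners: Optional[str] = None                   # (a)
    how_datasets_further_purpose: Optional[str] = None        # (b)
    data_point_count_range: Optional[str] = None              # (c) ranges permitted
    data_point_types: Optional[str] = None                    # (d)
    ip_status: Optional[str] = None                           # (e)
    purchased_or_licensed: Optional[bool] = None              # (f)
    includes_personal_info: Optional[bool] = None             # (g)
    includes_aggregate_consumer_info: Optional[bool] = None   # (h)
    cleaning_or_processing: Optional[str] = None              # (i)
    collection_period: Optional[str] = None                   # (j)
    first_used_dates: Optional[str] = None                    # (k)
    uses_synthetic_data: Optional[bool] = None                # (l)

    def missing_items(self) -> list[str]:
        """Return the names of categories that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


def posting_required(release_date: date,
                     publicly_available_to_ny: bool,
                     sole_purpose_aircraft_ops: bool = False,
                     defense_federal_only: bool = False) -> bool:
    """Simplified reading of the § 1432(1) trigger and § 1432(2) exemptions.

    Hypothetical helper: compensation is irrelevant under the bill, so only
    public availability, the 2022 release cutoff, and the exemptions are modeled.
    """
    if sole_purpose_aircraft_ops or defense_federal_only:
        return False
    return publicly_available_to_ny and release_date >= date(2022, 1, 1)
```

For example, a record with only the sources filled in reports eleven missing categories, and a model released in mid-2021 falls outside the trigger under this simplified reading. A checklist like this only tracks whether each category has been addressed; whether a given summary is adequate remains an open question, particularly since the bill designates no enforcement authority.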
Statutory Text
1. On or before January first, two thousand twenty-seven, and prior to each time thereafter that a generative artificial intelligence model or service, or a substantial modification to a generative artificial intelligence model or service, released on or after January first, two thousand twenty-two, is made publicly available to New Yorkers for use, regardless of whether the terms of such use include compensation, the developer of such model or service shall post on the developer's website documentation regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of the generative artificial intelligence model or service, including, but not limited to:
(a) the sources or owners of the datasets;
(b) a description of how the datasets further the intended purpose of the artificial intelligence model or service;
(c) the number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
(d) a description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply: (i) as applied to datasets that include labels, "types of data points" means the types of labels used; and (ii) as applied to datasets without labeling, "types of data points" refers to the general characteristics;
(e) whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
(f) whether the datasets were purchased or licensed by the developer;
(g) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
(h) whether the datasets include aggregate consumer information;
(i) whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence model or service;
(j) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing;
(k) the dates the datasets were first used during the development of the artificial intelligence model or service; and
(l) whether the generative artificial intelligence model or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the model or service.
2. A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence model or service for any of the following:
(a) A generative artificial intelligence model or service whose sole purpose is the operation of aircraft in the national airspace; or
(b) A generative artificial intelligence model or service developed for national security, military, or defense purposes that is made available only to a federal entity.
T-03 Training Data Disclosure · Developer · Foundation Model · Employment
Gen. Bus. Law § 1433(1)(a)-(f), (2)
Plain Language
Any entity, including a person, partnership, state or local government agency, or corporation, that designs, codes, produces, or substantially modifies a generative AI model or service using data a substantial part of which is derived from its own employees or contractors must ensure that specified information is disclosed to each employee whose data is used to train the model. This obligation applies regardless of whether the resulting model is made publicly available, so purely internal AI tools are covered. The required disclosures to affected employees are: the model's intended purpose, how the datasets serve that purpose, the types of data points, whether personal information is included, the dates the datasets were first used, and the data collection timeframe (including notice if collection is ongoing). The same two narrow exemptions apply as for the public disclosure obligation: models solely intended for operating aircraft in the national airspace, and national security, military, or defense models made available only to a federal entity. Notably, the obligated entity in this section is not limited to the defined term "developer". The section instead covers any entity that uses employee or contractor data, potentially capturing entities that would not qualify as developers under § 1431(2) because the model is not made available for use by members of the public. Although the triggering data may come from contractors, the text directs disclosure only "to each employee whose data is used", so whether contractors must also receive the notice is unclear.
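Because § 1433 turns on where the training data came from rather than on public availability, its trigger differs from that of § 1432. The sketch below is again hypothetical: the item keys and both helper functions are illustrative names, not terms from the bill, and "a substantial part" is reduced to a yes/no input because the bill does not quantify it.

```python
# Hypothetical keys for the six items § 1433(1)(a)-(f) requires in each
# employee notice; the bill does not prescribe a format.
EMPLOYEE_NOTICE_ITEMS = (
    "intended_purpose",              # (a)
    "how_datasets_further_purpose",  # (b)
    "data_point_types",              # (c)
    "includes_personal_info",        # (d)
    "first_used_dates",              # (e)
    "collection_period",             # (f)
)


def employee_notice_required(substantial_part_from_workforce: bool,
                             sole_purpose_aircraft_ops: bool = False,
                             defense_federal_only: bool = False) -> bool:
    """Simplified reading of the § 1433(1) trigger and § 1433(2) exemptions.

    Unlike § 1432, public availability is not an input: internal tools count.
    """
    if sole_purpose_aircraft_ops or defense_federal_only:
        return False
    return substantial_part_from_workforce


def missing_notice_items(notice: dict) -> list[str]:
    """List the required items that are absent (or None) in a draft notice."""
    return [key for key in EMPLOYEE_NOTICE_ITEMS if notice.get(key) is None]
```

Under this reading, an internal tool trained largely on workforce data still triggers the notice (`employee_notice_required(True)` returns `True`), and a draft notice stating only the intended purpose would leave five items missing.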
Statutory Text
1. Any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies a generative artificial intelligence model or service using data of which a substantial part is derived from individuals employed or contracted by the entity, regardless if whether the model is made publicly available, shall ensure that the following information is disclosed to each employee whose data is used to train the artificial intelligence model:
(a) the intended purpose of the artificial intelligence model or service;
(b) a description of how the collected datasets further the intended purpose of the artificial intelligence model or service;
(c) a description of the types of data points within the datasets;
(d) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
(e) the dates the datasets were first used during the development of the artificial intelligence model or service; and
(f) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
2. An entity that uses employee or contractor data to design, code, produce, or substantially modify a generative artificial intelligence model or service shall not be required to disclose the information required by this section if the model or service:
(a) is solely intended to be used in the operation of aircraft in the national airspace; or
(b) is developed for national security, military, or defense purposes and only made available to a federal entity.