New York · Assembly Bill · 2025–2026 Regular Sessions
AB6578
New York Assembly Bill 6578-A — An Act to amend the general business law, in relation to establishing the artificial intelligence training data transparency act

Status: Engrossed · Effective: N/A · Passage Likelihood: High

WHAT THIS BILL REGULATES · 1 REQUIREMENT TYPE

How Is This Bill Enforced

Enforcement Authority
No enforcement mechanism is specified in the bill. No agency enforcer is designated and no private right of action is created. Enforcement would depend on general state enforcement authority under the General Business Law.
Private Right of Action
None. The bill creates no private right of action and designates no enforcement authority.
Penalties
The bill does not specify any penalties, damages, or remedies for non-compliance.

What This Bill Requires

Each section below gives the verbatim statutory text first, followed by plain-language analysis and a per-section compliance checklist. Numbered markers in the statutory text cross-link to the matching checklist row.
Gen. Bus. Law § 1430
Short title

This act shall be known and may be cited as the "artificial intelligence training data transparency act".

Establishes the short title of the new article as the "Artificial Intelligence Training Data Transparency Act." This is a naming provision and creates no compliance obligation.

Gen. Bus. Law § 1431
Definitions

(1) "Artificial intelligence" or "artificial intelligence technology" means a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments, and that uses machine- and human-based inputs to perceive real and virtual environments, abstract such perceptions into models through analysis in an automated manner, and use model inference to formulate options for information or action.

(2) "Developer" means a person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence model or service for use by members of the public.

(3) "Generative artificial intelligence" means a class of AI models that are self-supervised and emulate the structure and characteristics of input data to generate derived synthetic content, including, but not limited to, images, videos, audio, text, and other digital content.

(4) "Substantially modifies" or "substantial modification" means a new version, new release, or other update to a generative artificial intelligence model or service that materially changes its functionality or performance, including the results of retraining or fine tuning.

(5) "Synthetic data generation" means a process in which seed data is used to create artificial data that have some of the statistical characteristics of the seed data.

(6) "Train a generative artificial intelligence model or service" includes testing, validating, or fine tuning by the developer of the artificial intelligence model or service.

(7) "Aggregate consumer information" means information that relates to a group of consumers, from which individual consumer identities have been removed, that is not linked or reasonably linkable to any consumer or household, including via a device. Aggregate consumer information does not mean one or more individual consumer records that have been de-identified.

(8) "AI model" means an information system or component of an information system that implements artificial intelligence technology and uses computational, statistical, or machine-learning techniques to produce outputs from a given set of inputs.

Defines eight key terms for the article: "artificial intelligence," "developer," "generative artificial intelligence," "substantially modifies," "synthetic data generation," "train a generative artificial intelligence model or service," "aggregate consumer information," and "AI model." The definition of "developer" is notably broad: it includes state and local government agencies alongside persons, partnerships, and corporations, and captures anyone who "designs, codes, produces, or substantially modifies" an AI model or service for use by members of the public.

The definition of "train" is also broadly drawn, encompassing testing, validating, and fine tuning, which means the disclosure obligations in § 1432 apply to data used for any of these activities, not just initial training runs.

Gen. Bus. Law § 1432
Data used to train generative artificial intelligence models or services
Developer

(1)(a)–(l) 1 On or before January first, two thousand twenty-seven, and prior to each time thereafter that a generative artificial intelligence model or service, or a substantial modification to a generative artificial intelligence model or service, released on or after January first, two thousand twenty-two, is made publicly available to New Yorkers for use, regardless of whether the terms of such use include compensation, the developer of such model or service shall post on the developer's website documentation regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of the generative artificial intelligence model or service, including, but not limited to:
(a) the sources or owners of the datasets;
(b) a description of how the datasets further the intended purpose of the artificial intelligence model or service;
(c) the number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
(d) a description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply: (i) as applied to datasets that include labels, "types of data points" means the types of labels used; and (ii) as applied to datasets without labeling, "types of data points" refers to the general characteristics;
(e) whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
(f) whether the datasets were purchased or licensed by the developer;
(g) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
(h) whether the datasets include aggregate consumer information;
(i) whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence model or service;
(j) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing;
(k) the dates the datasets were first used during the development of the artificial intelligence model or service; and
(l) whether the generative artificial intelligence model or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the model or service.

(2)(a)–(b) A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence model or service for any of the following: (a) A generative artificial intelligence model or service whose sole purpose is the operation of aircraft in the national airspace; or (b) A generative artificial intelligence model or service developed for national security, military, or defense purposes that is made available only to a federal entity.

Section 1432 is the bill's core public-facing obligation. It requires developers of generative AI models or services to post on their website detailed documentation about the data used to train the model, including a high-level summary of datasets. The disclosure is required on or before January 1, 2027, and prior to each subsequent release or substantial modification, for models released on or after January 1, 2022. The obligation applies regardless of whether the model is offered for compensation.

The required documentation is enumerated in twelve categories: dataset sources, purpose descriptions, data-point counts, data-point types (with separate definitions for labeled vs. unlabeled data), IP status (copyright, trademark, or patent), licensing status, presence of personal information, presence of aggregate consumer information, data processing or cleaning methods, collection timeframes, first-use dates, and use of synthetic data generation.

Subdivision 2 carves out models solely for national airspace aircraft operations and models developed for national security, military, or defense purposes available only to federal entities.
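The trigger and carve-outs described above can be read as a simple decision rule. The following is a minimal sketch of that reading, not legal advice: the `ModelRelease` record, its field names, and the function name are hypothetical, and the sketch assumes a developer already knows each release date and whether the model is available to New Yorkers.

```python
from dataclasses import dataclass
from datetime import date

# Cutoff stated in § 1432(1): releases on or after this date are covered.
RELEASE_CUTOFF = date(2022, 1, 1)


@dataclass
class ModelRelease:
    """Hypothetical record of one release or substantial modification."""
    release_date: date
    publicly_available_in_ny: bool
    sole_purpose_aircraft_operation: bool = False  # § 1432(2)(a) carve-out
    defense_and_federal_only: bool = False         # § 1432(2)(b) carve-out


def public_disclosure_required(release: ModelRelease) -> bool:
    """Rough reading of the § 1432 trigger under the assumptions above."""
    if release.sole_purpose_aircraft_operation or release.defense_and_federal_only:
        return False
    if not release.publicly_available_in_ny:
        return False
    # Compensation is irrelevant to the trigger, so it is not modeled here.
    return release.release_date >= RELEASE_CUTOFF
```

For example, `public_disclosure_required(ModelRelease(date(2023, 5, 1), True))` evaluates to `True` under this reading, while the same release flagged as `defense_and_federal_only=True` evaluates to `False`.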

Compliance actions 1 item
1
Developers must post on their website, on or before January 1, 2027, and before each subsequent public release or substantial modification of a generative AI model or service released on or after January 1, 2022, documentation regarding the data used to train the model or service, including a high-level summary of the datasets covering: (1) the sources or owners of the datasets; (2) a description of how the datasets further the intended purpose; (3) the number of data points, which may be in general ranges with estimates for dynamic datasets; (4) a description of the types of data points (for labeled datasets, the types of labels used; for unlabeled datasets, general characteristics); (5) whether the datasets include data protected by copyright, trademark, or patent or are entirely in the public domain; (6) whether the datasets were purchased or licensed; (7) whether the datasets include personal information or personal identifying information; (8) whether the datasets include aggregate consumer information; (9) whether there was any cleaning, processing, or other modification, including its intended purpose; (10) the time period during which data was collected, including notice if collection is ongoing; (11) the dates datasets were first used during development; and (12) whether the model uses or continuously uses synthetic data generation. This obligation does not apply to models solely for national airspace aircraft operations or models developed for national security, military, or defense purposes available only to a federal entity.
T-03.2
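One way a compliance team might track the twelve § 1432(1) items is a keyed checklist with a completeness check. This is only an illustrative sketch: the short item names and the `missing_items` helper are assumptions made for the example, and the bill prescribes no format for the documentation.

```python
from typing import Dict, List

# Keys mirror paragraphs (a) through (l) of § 1432(1). The short names are
# illustrative labels chosen for this sketch, not terms from the bill.
REQUIRED_ITEMS: Dict[str, str] = {
    "a": "sources_or_owners",
    "b": "purpose_description",
    "c": "data_point_count",
    "d": "data_point_types",
    "e": "ip_status",
    "f": "purchased_or_licensed",
    "g": "personal_information",
    "h": "aggregate_consumer_information",
    "i": "cleaning_or_processing",
    "j": "collection_period",
    "k": "first_use_dates",
    "l": "synthetic_data_generation",
}


def missing_items(documentation: Dict[str, str]) -> List[str]:
    """Return the paragraph letters whose entry is absent or blank."""
    return [
        letter
        for letter, name in REQUIRED_ITEMS.items()
        if not documentation.get(name, "").strip()
    ]


# A draft that so far covers only paragraphs (a) and (b).
draft = {
    "sources_or_owners": "Licensed news archive; public web crawl",
    "purpose_description": "General-purpose text generation",
}
print(missing_items(draft))
```

The check only confirms that each item has some text. Whether an entry is adequate as a "high-level summary" remains a legal judgment.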
Gen. Bus. Law § 1433
Employee data used to train generative artificial intelligence models or services
Developer

(1)(a)–(f) 2 Any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies a generative artificial intelligence model or service using data of which a substantial part is derived from individuals employed or contracted by the entity, regardless if whether the model is made publicly available, shall ensure that the following information is disclosed to each employee whose data is used to train the artificial intelligence model:
(a) the intended purpose of the artificial intelligence model or service;
(b) a description of how the collected datasets further the intended purpose of the artificial intelligence model or service;
(c) a description of the types of data points within the datasets;
(d) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
(e) the dates the datasets were first used during the development of the artificial intelligence model or service; and
(f) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.

(2)(a)–(b) An entity that uses employee or contractor data to design, code, produce, or substantially modify a generative artificial intelligence model or service shall not be required to disclose the information required by this section if the model or service: (a) is solely intended to be used in the operation of aircraft in the national airspace; or (b) is developed for national security, military, or defense purposes and only made available to a federal entity.

Section 1433 imposes a separate, employee-facing disclosure obligation. Any entity — person, partnership, state or local government agency, or corporation — that trains a generative AI model using data substantially derived from its employees or contractors must disclose specified information to each affected employee. This obligation applies regardless of whether the model is made publicly available, which is a broader reach than § 1432's public-availability trigger.

The required disclosures are narrower than § 1432's public documentation: intended purpose, how datasets further that purpose, types of data points, presence of personal or personally identifying information, first-use dates, and collection timeframes. Notably absent are the dataset-source, data-point-count, IP-status, licensing, aggregate-consumer-information, data-cleaning, and synthetic-data disclosures required for public posting. Conversely, § 1433 lists the model's intended purpose as a standalone item, which § 1432(1) does not.

The same two exemptions apply: models solely for national airspace aircraft operations and models for national security, military, or defense purposes available only to federal entities.
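The difference in scope between the two disclosure lists can be checked item by item with a set comparison. In this rough sketch the item labels are illustrative shorthand for the paragraphs of § 1432(1) and § 1433(1), not statutory terms.

```python
# Illustrative labels for § 1432(1)(a)-(l), the public documentation items.
PUBLIC_ITEMS = {
    "sources_or_owners", "purpose_description", "data_point_count",
    "data_point_types", "ip_status", "purchased_or_licensed",
    "personal_information", "aggregate_consumer_information",
    "cleaning_or_processing", "collection_period", "first_use_dates",
    "synthetic_data_generation",
}

# Illustrative labels for § 1433(1)(a)-(f), the employee-facing items.
EMPLOYEE_ITEMS = {
    "intended_purpose", "purpose_description", "data_point_types",
    "personal_information", "first_use_dates", "collection_period",
}

public_only = sorted(PUBLIC_ITEMS - EMPLOYEE_ITEMS)    # required only for public posting
employee_only = sorted(EMPLOYEE_ITEMS - PUBLIC_ITEMS)  # required only for employees
shared = sorted(PUBLIC_ITEMS & EMPLOYEE_ITEMS)         # required under both sections

print("Public posting only:", public_only)
print("Employee disclosure only:", employee_only)
print("Both:", shared)
```

Under this labeling, seven items appear only in the public list, five appear in both, and one, the standalone statement of intended purpose, appears only in the employee list.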

Compliance actions 1 item
2
Any person, partnership, state or local government agency, or corporation that trains a generative AI model or service using data of which a substantial part is derived from its employees or contractors must disclose to each affected employee: (1) the intended purpose of the AI model or service; (2) a description of how the collected datasets further that purpose; (3) a description of the types of data points within the datasets; (4) whether the datasets include personal information or personal identifying information; (5) the dates the datasets were first used during development; and (6) the time period during which data was collected, including notice if collection is ongoing. This obligation applies regardless of whether the model is made publicly available. It does not apply to models solely for national airspace aircraft operations or models developed for national security, military, or defense purposes available only to a federal entity.
T-03

Passage Likelihood

High
Status: Engrossed
Chamber: Passed origin
Committee: No action
Majority party: Yes
Bipartisan: No
Prior session: None

Legislative History

2025-03-06 referred to science and technology
2025-05-29 reported referred to rules
2025-06-10 reported
2025-06-10 rules report cal.571
2025-06-10 ordered to third reading rules cal.571
2025-06-10 passed assembly
2025-06-10 delivered to senate
2025-06-10 REFERRED TO RULES
2026-01-07 DIED IN SENATE
2026-01-07 RETURNED TO ASSEMBLY
2026-01-07 ordered to third reading cal.166
2026-01-12 amended on third reading 6578a
2026-02-24 amended on third reading 6578b
2026-05-05 passed assembly
2026-05-05 delivered to senate
2026-05-05 REFERRED TO INTERNET AND TECHNOLOGY

Entry Last Reviewed

2026-05-20
AI generated