Developers must disclose information about the data used to train AI models. Public disclosure obligations require posting documentation on the developer's website covering dataset sources, data types, volume, IP status, personal information presence, processing methods, collection timeframes, and use of synthetic data. Regulator disclosure obligations require submitting similar documentation to a designated authority, which may treat it as confidential.
On or before January 1, 2026, and before each time thereafter that a generative artificial intelligence system or service, or a substantial modification to a generative artificial intelligence system or service, released on or after January 1, 2022, is made publicly available to Californians for use, regardless of whether the terms of that use include compensation, the developer of the system or service shall post on the developer's internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service, including, but not limited to, all of the following:
(a) A high-level summary of the datasets used in the development of the generative artificial intelligence system or service, including, but not limited to:
  (1) The sources or owners of the datasets.
  (2) A description of how the datasets further the intended purpose of the artificial intelligence system or service.
  (3) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets.
  (4) A description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply:
    (A) As applied to datasets that include labels, "types of data points" means the types of labels used.
    (B) As applied to datasets without labeling, "types of data points" refers to the general characteristics.
  (5) Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain.
  (6) Whether the datasets were purchased or licensed by the developer.
  (7) Whether the datasets include personal information, as defined in subdivision (v) of Section 1798.140.
  (8) Whether the datasets include aggregate consumer information, as defined in subdivision (b) of Section 1798.140.
  (9) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service.
  (10) The time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
  (11) The dates the datasets were first used during the development of the artificial intelligence system or service.
  (12) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.
(b) A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence system or service for any of the following:
  (1) A generative artificial intelligence system or service whose sole purpose is to help ensure security and integrity. For purposes of this paragraph, "security and integrity" has the same meaning as defined in subdivision (ac) of Section 1798.140, except as applied to any developer or user and not limited to businesses, as defined in subdivision (d) of that section.
  (2) A generative artificial intelligence system or service whose sole purpose is the operation of aircraft in the national airspace.
  (3) A generative artificial intelligence system or service developed for national security, military, or defense purposes that is made available only to a federal entity.
(2) On and after June 30, 2026, and except as provided in subsection (6) of this section, a developer of a high-risk artificial intelligence system shall make available to the deployer or other developer of the high-risk artificial intelligence system:
(3) (a) Except as provided in subsection (6) of this section, a developer that offers, sells, leases, licenses, gives, or otherwise makes available to a deployer or other developer a high-risk artificial intelligence system on or after June 30, 2026, shall make available to the deployer or other developer, to the extent feasible, the documentation and information, through artifacts such as model cards, dataset cards, or other impact assessments, necessary for a deployer, or for a third party contracted by a deployer, to complete an impact assessment pursuant to section 6-1-1703 (3).
(1) Except as provided in subsection (f) of this Code section, a developer that offers, sells, leases, licenses, gives, or otherwise makes available to a deployer or other developer an automated decision system shall make available to the deployer or other developer, to the extent feasible, all of the information required to be provided to the Attorney General by subsection (b) of this Code section, as well as the documentation and information, through artifacts such as model cards, data set cards, or other impact assessments, necessary for a deployer or third party contracted by a deployer to complete an impact assessment pursuant to subsection (e) of Code Section 10-16-3.
(2) A developer that also serves as a deployer for an automated decision system is not required to generate the documentation required by this subsection unless the automated decision system is provided to an unaffiliated entity acting as a deployer.
Documentation describing: (ii) the data governance measures used to cover the training datasets and examine the suitability of data sources, possible biases, and appropriate mitigation;
1. On or before January first, two thousand twenty-seven, and prior to each time thereafter that a generative artificial intelligence model or service, or a substantial modification to a generative artificial intelligence model or service, released on or after January first, two thousand twenty-two, is made publicly available to New Yorkers for use, regardless of whether the terms of such use include compensation, the developer of such model or service shall post on the developer's website documentation regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of the generative artificial intelligence model or service, including, but not limited to:
  (a) the sources or owners of the datasets;
  (b) a description of how the datasets further the intended purpose of the artificial intelligence model or service;
  (c) the number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
  (d) a description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply:
    (i) as applied to datasets that include labels, "types of data points" means the types of labels used; and
    (ii) as applied to datasets without labeling, "types of data points" refers to the general characteristics;
  (e) whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
  (f) whether the datasets were purchased or licensed by the developer;
  (g) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
  (h) whether the datasets include aggregate consumer information;
  (i) whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence model or service;
  (j) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing;
  (k) the dates the datasets were first used during the development of the artificial intelligence model or service; and
  (l) whether the generative artificial intelligence model or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the model or service.
2. A developer shall not be required to post documentation regarding the data used to train a generative artificial intelligence model or service for any of the following:
  (a) A generative artificial intelligence model or service whose sole purpose is the operation of aircraft in the national airspace; or
  (b) A generative artificial intelligence model or service developed for national security, military, or defense purposes that is made available only to a federal entity.
1. Any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies a generative artificial intelligence model or service using data of which a substantial part is derived from individuals employed or contracted by the entity, regardless of whether the model is made publicly available, shall ensure that the following information is disclosed to each employee whose data is used to train the artificial intelligence model:
  (a) the intended purpose of the artificial intelligence model or service;
  (b) a description of how the collected datasets further the intended purpose of the artificial intelligence model or service;
  (c) a description of the types of data points within the datasets;
  (d) whether the datasets include personal information or personal identifying information, as defined in section eight hundred ninety-nine-aaa of this chapter;
  (e) the dates the datasets were first used during the development of the artificial intelligence model or service; and
  (f) the time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
2. An entity that uses employee or contractor data to design, code, produce, or substantially modify a generative artificial intelligence model or service shall not be required to disclose the information required by this section if the model or service:
  (a) is solely intended to be used in the operation of aircraft in the national airspace; or
  (b) is developed for national security, military, or defense purposes and only made available to a federal entity.
(a) A platform that collects user-generated content for the purpose of training artificial intelligence algorithms shall disclose to the user that the user-generated content may be used for the purpose of training artificial intelligence.
(b) The disclosure shall be presented to the user at the time the user signs up for the platform and shall be separate from the platform's terms of service agreement.
(c) Each user of a platform must acknowledge receipt of the disclosure before being allowed to post user-generated content on the platform.
(1) DBR/OHIC shall provide an initial report to the governor, the senate president and the speaker of the house on the use of artificial intelligence by health insurers within eighteen (18) months of the effective date of this chapter and annually thereafter.
(2) The annual report shall state how health insurers use artificial intelligence to manage claims and coverage. The report shall state, for each insurer:
  (i) The types of artificial intelligence models used;
  (ii) The role of artificial intelligence in the decision-making process to approve or deny healthcare claims or coverage whenever artificial intelligence is used to make, or is a substantial factor in making, a decision on healthcare claims or coverage;
  (iii) Information regarding training, testing, and risk management including data governance measures used to cover the training data sets and the measures used to examine the suitability of data sources, possible biases and appropriate mitigation; and
  (iv) Performance metrics including: number of claims; percentage of claims accepted and denied; the average time claim reviewers and medical professional reviewers spend on each claim and on denials of claims; percentage of claims appealed; and percentage of denials reversed.
(1) On or before January 1, 2026, and before each time thereafter that a generative artificial intelligence system or service, or a substantial modification to a generative artificial intelligence system or service, released on or after January 1, 2022, is made publicly available to Washingtonians for use, regardless of whether the terms of that use include compensation, the developer of the system or service shall post on the developer's internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service including, but not limited to:
  (a) A high-level summary of the datasets used in the development of the generative artificial intelligence system or service including, but not limited to:
    (i) The sources or owners of the datasets;
    (ii) A description of how the datasets further the intended purpose of the generative artificial intelligence system or service;
    (iii) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
    (iv) A description of the types of data points within the datasets;
    (v) Whether the datasets were purchased or licensed by the developer or if the datasets were publicly available;
    (vi) Whether the datasets include personal information, as defined in RCW 19.373.010;
    (vii) Whether the datasets include aggregate consumer information;
    (viii) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the generative artificial intelligence system or service;
    (ix) The dates the datasets were first trained or the date of the last significant update to the datasets during the development of the generative artificial intelligence system or service; and
    (x) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.
  (b) For purposes of this subsection, the following definitions apply:
    (i) As applied to datasets that include labels, "types of data points" means the types of labels used; and
    (ii) As applied to datasets without labeling, "types of data points" refers to the general characteristics.
(2) A developer is not required to post documentation regarding the data used to train a generative artificial intelligence system or service for any of the following:
  (a) A generative artificial intelligence system or service whose sole purpose is to help ensure security and integrity;
  (b) A generative artificial intelligence system or service whose sole purpose is the operation of aircraft in the national airspace; and
  (c) A generative artificial intelligence system or service developed for national security, military, or defense purposes that is made available only to a federal entity.