Sec. 2(1)(a)(i)-(x), (b), (2)
Plain Language
Developers of generative AI systems or services made publicly available to Washington residents, whether free or paid, must post training data documentation on their website before each release or substantial modification. The documentation must include, at minimum, a high-level summary covering ten specified categories: the sources or owners of the datasets; how the datasets further the system's intended purpose; the number of data points (general ranges are allowed, with estimates for dynamic datasets); the types of data points; whether the datasets were purchased, licensed, or publicly available; whether they include personal information; whether they include aggregate consumer information; any cleaning, processing, or other modification by the developer and its intended purpose; the dates the datasets were first trained or last significantly updated; and whether synthetic data generation was or is used. The obligation reaches back to systems released on or after January 1, 2022, with initial documentation due by January 1, 2026. Three categories of systems are exempt: those whose sole purpose is to help ensure security and integrity, those whose sole purpose is operating aircraft in the national airspace, and national security, military, or defense systems made available only to a federal entity. The scope of "training" is broad: it includes testing, validating, and fine-tuning by the developer.
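For developers tracking this obligation internally, the applicability rule and the ten summary items above can be sketched as a small checklist. This is only an illustrative sketch, not a format the statute prescribes: the names `DatasetSummary`, `Exemption`, `must_document`, and `missing_items` are hypothetical, and the sketch assumes that whether a system is "publicly available to Washingtonians" or falls under an exemption has already been decided. It does not model what counts as a "substantial modification", and because the statutory list is "including, but not limited to", filling in all ten fields may be necessary without being sufficient.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum
from typing import List, Optional

# Sec. 2(1): systems released on or after this date are covered.
COVERAGE_START = date(2022, 1, 1)
# Sec. 2(1): initial documentation is due on or before this date.
INITIAL_DEADLINE = date(2026, 1, 1)


class Exemption(Enum):
    """The three exemptions in Sec. 2(2). Member names are hypothetical."""
    SECURITY_AND_INTEGRITY = "2(2)(a)"
    AIRCRAFT_OPERATION = "2(2)(b)"
    FEDERAL_DEFENSE_ONLY = "2(2)(c)"


@dataclass
class DatasetSummary:
    """One field per item in Sec. 2(1)(a)(i)-(x). Field names are hypothetical."""
    sources_or_owners: Optional[str] = None                 # (i)
    purpose_description: Optional[str] = None               # (ii)
    data_point_count: Optional[str] = None                  # (iii) ranges allowed
    data_point_types: Optional[str] = None                  # (iv)
    acquisition: Optional[str] = None                       # (v) purchased/licensed/public
    includes_personal_information: Optional[bool] = None    # (vi)
    includes_aggregate_consumer_info: Optional[bool] = None # (vii)
    modifications: Optional[str] = None                     # (viii)
    training_or_update_dates: Optional[str] = None          # (ix)
    uses_synthetic_data: Optional[bool] = None              # (x)


def must_document(released: date,
                  public_in_washington: bool,
                  exemption: Optional[Exemption] = None) -> bool:
    """Simplified reading of Sec. 2(1) and 2(2): covered if publicly
    available, released on or after the coverage date, and not exempt."""
    return (public_in_washington
            and released >= COVERAGE_START
            and exemption is None)


def missing_items(summary: DatasetSummary) -> List[str]:
    """Return the names of summary fields that have not been filled in."""
    return [f.name for f in fields(summary) if getattr(summary, f.name) is None]
```

A record with only two of the ten fields filled in would be reported by `missing_items` as having eight outstanding items, and a system released on December 31, 2021 would fall outside `must_document` under this simplified reading.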
Statutory Text
(1) On or before January 1, 2026, and before each time thereafter that a generative artificial intelligence system or service, or a substantial modification to a generative artificial intelligence system or service, released on or after January 1, 2022, is made publicly available to Washingtonians for use, regardless of whether the terms of that use include compensation, the developer of the system or service shall post on the developer's internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service including, but not limited to:
(a) A high-level summary of the datasets used in the development of the generative artificial intelligence system or service including, but not limited to:
(i) The sources or owners of the datasets;
(ii) A description of how the datasets further the intended purpose of the generative artificial intelligence system or service;
(iii) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
(iv) A description of the types of data points within the datasets;
(v) Whether the datasets were purchased or licensed by the developer or if the datasets were publicly available;
(vi) Whether the datasets include personal information, as defined in RCW 19.373.010;
(vii) Whether the datasets include aggregate consumer information;
(viii) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the generative artificial intelligence system or service;
(ix) The dates the datasets were first trained or the date of the last significant update to the datasets during the development of the generative artificial intelligence system or service; and
(x) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.
(b) For purposes of this subsection, the following definitions apply:
(i) As applied to datasets that include labels, "types of data points" means the types of labels used; and
(ii) As applied to datasets without labeling, "types of data points" refers to the general characteristics.
(2) A developer is not required to post documentation regarding the data used to train a generative artificial intelligence system or service for any of the following:
(a) A generative artificial intelligence system or service whose sole purpose is to help ensure security and integrity;
(b) A generative artificial intelligence system or service whose sole purpose is the operation of aircraft in the national airspace; and
(c) A generative artificial intelligence system or service developed for national security, military, or defense purposes that is made available only to a federal entity.