UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

STEWART O’NAN, et al., Plaintiffs, v. DATABRICKS, INC., et al., Defendants.

Case No. 24-cv-01451-CRB (LJC)

ORDER REGARDING DISCOVERY DISPUTE AT ECF NO. 248

Re: ECF Nos. 246, 248

The parties disagree over (1) whether Defendants must answer Plaintiffs’ Interrogatory Nos. 42-50, (2) whether Defendants must produce licensing agreements for text data and related communications, and (3) whether Patrick Wendell may be designated as a document custodian. ECF Nos. 247-3, 248.¹ The close of fact discovery, originally set for November 21, 2025, was continued to January 5, 2026. ECF No. 238. Having considered the record in this case, the parties’ arguments, and the relevant legal authority, the undersigned rules as follows: Defendants shall answer Plaintiffs’ Interrogatory Nos. 42-50. Defendants shall produce licensing agreements for text data (but do not need to produce related communications). Defendants are not required to conduct a search of Patrick Wendell’s custodial file. The joint administrative motion at ECF No. 246 is granted.

I. INTERROGATORY NOS. 42-50

Federal Rule of Civil Procedure 33(a) provides that, “unless otherwise stipulated or ordered by the court, a party may serve on any other party no more than 25 written interrogatories.” The plain language of Rule 33(a) suggests that “each plaintiff may serve each defendant with 25 interrogatories.” Trevino v. ACB Am., Inc., 232 F.R.D. 612, 614 (N.D. Cal. 2006). However, courts often read Rule 33(a) “to include some reasonable limit” on the number of interrogatories that may be served in a multi-plaintiff action. Herroz v. CRST Van Expedited, Inc., No. ED CV 15-507, 2015 WL 13914976, at *3 (C.D. Cal. Nov. 2, 2015) (explaining that “[s]urely if twelve plaintiffs, all identically situated and acting in unison, brought a lawsuit, they would not be permitted 300 interrogatories”).
“District courts have applied the 25-interrogatory limit as a ‘per side’ rule when the parties to an action are nominally separate,” that is, “when represented by a single attorney, when there is a unity of action, or when there is a legal relationship between the parties.” Fate Therapeutics, Inc. v. Shoreline Biosciences, Inc., No. 22-cv-00676, 2023 WL 4142009, at *1 (S.D. Cal. June 22, 2023) (internal quotations omitted) (collecting cases). “‘[T]he decision to consider multiple parties as one for the purposes of Rule 33(a) is within the discretion of the court.’” Herroz, 2015 WL 13914976, at *3 (quoting Rahman v. Smith & Wollensky Rest. Grp., Inc., 2007 WL 1521117, at *8 (S.D.N.Y. May 24, 2007)).

There are five named Plaintiffs in this case: Stewart O’Nan, Abdi Nazemian, Brian Keene, Rebecca Makkai, and Jason Reynolds. See ECF No. 131. They are represented by the same counsel, are advancing the same claims, and, at least at this point in the litigation, are acting in concert with one another. See, e.g., id.; ECF No. 196 (motion by all Plaintiffs to modify scheduling order and for leave to file second amended complaint). Plaintiffs argue that under Rule 33’s twenty-five-interrogatories-per-party limit, having served fifty interrogatories total is warranted. See ECF No. 247-3 at 2-3. Defendants argue that Plaintiffs should be treated as one party – and, collectively, be permitted to serve 25 interrogatories – because they are similarly situated, acted in unison, and jointly served the same interrogatories. Id. at 4-5.

Given that Plaintiffs are represented by the same counsel and are jointly litigating this action, Defendants’ argument is reasonable. See Fate Therapeutics, 2023 WL 4142009, at *1. It is also somewhat beside the point. “Leave to serve additional interrogatories may be granted to the extent consistent with Rule 26(b)(1) and (2).” Fed. R. Civ. P. 33(a).
Given “the importance of the issues at stake in the action, the amount in controversy, the parties’ relative access to relevant […] Fed. R. Civ. P. 26(b)(1); see Herroz, 2015 WL 13914976, at *4 (“[P]laintiffs have served only 51 or so interrogatories, not 100. Given the complexity of the case, the court would grant plaintiffs leave to serve more than 25 interrogatories in any event.”). Defendants are accordingly ordered to answer Interrogatory Nos. 42-50.

II. THIRD-PARTY LICENSING AGREEMENTS

Plaintiffs’ Request for Production No. 3 requests “[a]ll agreements, licenses, partnerships, or collaborations, and any Documents or Communications regarding potential agreements, licenses, partnerships, or collaborations related to the acquisition and/or use of the Training Data.” ECF No. 207-1 at 10. Defendants objected to this request, but agreed to produce documents showing agreements “by which Mosaic acquired and/or used data from the RedPajama – Books Dataset and Books3 Dataset to train the [MPT] models identified in the Complaint.” Id. at 11. Plaintiffs now seek an order compelling Defendants to produce licensing agreements they entered into with third parties to obtain curated training data for LLM training. ECF No. 247-3 at 3. Plaintiffs point to the undersigned’s earlier Order, where she found that nonparty Microsoft’s licensing agreements with publishers to obtain books for LLM training were relevant. Defendants oppose Plaintiffs’ request, arguing that the agreements Plaintiffs seek “have no connection to this litigation,” as they are not “related to the MPT models nor are they agreements for [licensing] books.” ECF No. 247-3 at 5.

Defendants are correct that the instant request is different from Plaintiffs’ request for Microsoft’s licensing agreements.
There, the undersigned recognized that evidence that Microsoft had paid for what Defendants allegedly stole was arguably relevant to Defendants’ anticipated fair use argument in that it could show (1) the damage to Plaintiffs from Defendants’ alleged infringement and “unrestricted and widespread conduct of the sort engaged in by” Defendants and (2) that Defendants could have obtained an immense corpus of written work to train their LLMs through licensing with authors or publishers rather than alleged infringement. ECF No. 152 at 8; Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 (1994) (internal quotations omitted); see 17 U.S.C. § 107.

Key to the undersigned’s reasoning was that Microsoft’s licenses were with publishers for permission to use books for LLM development. How much Microsoft was willing to pay for those books […] infringement, and what Plaintiffs stand to lose.² Similarly, the feasibility of technology companies assembling “convenient, general-purpose librar[ies] of works” through licensing rather than copying could undercut arguments that any alleged copying was necessary for technological advancement. Bartz, 787 F. Supp. 3d at 1033; see, e.g., Kadrey v. Meta Platforms, Inc., 788 F. Supp. 3d 1026, 1050 (N.D. Cal. 2025) (quoting Google LLC v. Oracle America, Inc., 593 U.S. 1, 3 (2021)) (discussing “the public benefits the copying will likely produce”). In contrast, how much Defendants were willing to pay other technology companies for curated datasets designed to fine-tune LLMs to perform specific tasks says relatively little about whether – or how much – Defendants would have paid Plaintiffs for their written works or “the effect of” the alleged infringement “upon the potential market for or value of the copyrighted work.” Campbell, 510 U.S. at 590 (quoting 17 U.S.C. § 107).

But this case may very well rise or fall on whether Defendants’ alleged use of Plaintiffs’ works was fair.
Fair use is a “flexible” concept, “the application of which requires judicial balancing, depending upon relevant circumstances, including significant changes in technology.” Oracle, 593 U.S. at 19-20. Fair use in the context of generative AI training is a novel area of law, and the undersigned is hesitant to unduly limit the scope of discovery in a manner that could foreclose fair use arguments the parties may advance. Judge Breyer may consider Defendants’ willingness and ability to pay for licensing training data – even training data in a different form, acquired for a different express purpose – in evaluating whether Defendants’ alleged use of Plaintiffs’ works was fair. In general, non-privileged information “relevant to any party’s claim or defense” may be discoverable. Fed. R. Civ. P. 26(b)(1). “‘[R]elevancy’ is construed broadly.” Alves v. Riverside Cnty., 339 F.R.D. 556, 559 (C.D. Cal. 2021).

Furthermore, given that Defendants have not presented a record showing that the burden of this discovery outweighs its likely benefit, Plaintiffs’ request that Defendants produce the agreements is granted. Id. (After relevancy has been established, “[t]he party opposing discovery then has the burden of showing that discovery should be prohibited, and the burden of clarifying, explaining or supporting its objections.”). However, Defendants do not need to produce any “related communications” and are just required to produce executed agreements with third parties to obtain textual data for training purposes.

² Of course, Judge Breyer may determine that this is not a type of loss that “the Copyright Act […]

III. PATRICK WENDELL

Plaintiffs next request to add Patrick Wendell as a document custodian. Plaintiffs deposed Mr. Wendell on December 4, 2025, where he testified that he had a significant role in Databricks’ acquisition of MosaicML. See ECF No. 247-5 at 3 (explaining that he was “the primary member of the leadership team evaluating whether we should do the acquisition, as well as understanding what value Mosaic would bring to Databricks”). Plaintiffs then requested to add Mr. Wendell as a document custodian on December 22, 2025. They argue that he “is likely to possess unique documents related to Databricks’ motivation for the acquisition,” and request that Defendants search Mr. Wendell’s custodial file for the period from May 1 to July 19, 2023 using the search string “Mosaic* AND (MPT* OR Storywriter*) AND (revenue* OR profit* OR forecast* OR purchase OR acquisition OR acquire OR roadmap).” ECF No. 247-3 at 4. Defendants oppose the request to add Mr. Wendell, arguing that the request is untimely and that Plaintiffs’ apparent “regret” over “their choice of custodians” does not justify the late addition of a new custodian. Id. at 6.

The undersigned agrees with Defendants. Although discovery is an iterative process, and it is generally appropriate for parties to tailor requests based on what they learn earlier in discovery, Plaintiffs have not shown that adding a new custodian at this late stage is warranted. Plaintiffs may not have learned that Mr. Wendell was “the primary member of the leadership team evaluating” the acquisition until December, but they knew he was involved in the acquisition months before that and delayed pursuing this matter. ECF No. 247-5 at 3; see ECF No. 178-6 (identifying Wendell as a member of the “ring-fenced diligence group” evaluating the acquisition). The discovery cutoff has now passed, and adding a new document custodian now would unduly delay the case schedule. Timeliness and diligence issues aside, other high-level executives involved in the acquisition from both Databricks and MosaicML have been designated custodians and had their custodial files combed through. See, e.g., ECF No. 130 (granting Plaintiffs’ request to designate Databricks CEO Ali Ghodsi as a custodian), ECF No. 127. While Plaintiffs may be […] it is also quite plausible that most of these documents were included in these other custodians’ files. ECF No. 247-3 at 4.

Plaintiffs’ request to add Patrick Wendell as a custodian is accordingly denied.

IV. CONCLUSION

Defendants shall respond to Interrogatory Nos. 42-50 and produce the executed agreements for textual training data no later than January 30, 2026.

IT IS SO ORDERED.

Dated: January 16, 2026

LISA J. CISNEROS
United States Magistrate Judge