A great deal of the analysis of securities class action lawsuit settlements revolves around measures of aggregate, average and median settlement amounts. These data, while useful, are relatively unhelpful in trying to anticipate the outcome of any particular case, particularly at the outset. To try to develop a way to predict likely case outcome at the outset of a securities class action lawsuit, four academics conducted a detailed statistical analysis of securities class action settlements in order to identify factors that affect outcomes.

In their April 30, 2012 paper entitled “Predicting Securities Fraud Settlements and Amounts: A Hierarchical Bayesian Model of Federal Securities Class Action Lawsuits” (here), Northwestern University Business Professor Blakeley McShane, Juridigm Principal and Vice President Oliver Watson, U. Penn Law Professor Tom Baker and Fordham University Law Professor Sean Griffith set out to create a “predictive model to forecast case outcomes based exclusively on information available at the time the lawsuit is filed.”

Their model, described in their paper, “estimates (i) the probability of the settlement versus dismissal of a securities class action lawsuit and (ii) the amount for which the class action will settle conditional on the settlement.”

A great deal of the authors’ paper is devoted to a description of the methodology used to derive the data on which their analysis is based. Another significant part of the paper is devoted to a description of their analytic methodology, which, as their title suggests, employs high-level statistical approaches and techniques. A detailed description of the authors’ data derivation and statistical methodologies is beyond the scope of this blog (which is another way of saying that I know my limits).

For purposes of understanding the authors’ conclusions, it is useful to note that the authors derived a data set of nearly 1,200 securities class action lawsuits and associated case resolutions. Among other critical steps taken to derive their data set, the authors focused exclusively on post-PSLRA cases that were filed five years or more before the starting date of their analysis. (The five-year cutoff was used to make it likely that the cases in the data set had been finally resolved.) Essentially, the authors looked at cases filed between 1996 and 2005 that otherwise survived the authors’ filters and sorting criteria.

The authors also derived several of their own measures using a variety of data sources. For example, in order to determine the “notoriety” or “newsworthiness” of a particular company or case, the authors considered the number of Google News Archive hits associated with the company in the year prior to the lawsuit filing.

Using these and other data points and applying selected statistical methods to develop their model, the authors identified a number of variables predictive of whether a case is settled or dismissed, and variables predictive of the settlement amount if a case is settled.

The variables the authors identified that indicate that a case is more likely to settle include “a number of classes or types of securities associated with the case, a higher return on the S&P 500 during the class period, whether or not GAAP violations were alleged and having an individual plaintiff listed.” Factors that indicate that a case is less likely to settle (that is, more likely to be dismissed) include “longer filing times, higher market capitalization, a higher company return during the class period, having an institutional plaintiff listed, and greater public notoriety (as measured by the number of Google hits in the year prior to filing).”

The variables the authors found that positively impact the settlement amount include “the total number of securities, the length of the class period, market capitalization, the company return during the class period, whether or not earnings were restated, whether or not the case was a Securities Act Section 11 case, whether or not insider trading was alleged, the existence of an institutional plaintiff, and the number of Google hits.” Factors associated with lower settlement amounts include “longer filing times and not having an institutional investor listed (i.e., having only an individual plaintiff listed or having no plaintiff listed).”

The authors also found that though GAAP cases are more likely to settle, the GAAP cases that do settle do not have higher settlement amounts. The authors speculate that this is likely due to the fact that an allegation of a GAAP violation significantly bolsters the merits of the case, which increases the chances the case will survive a dismissal motion. The authors suggest that this makes it more appealing for plaintiffs to take on a GAAP violation case even if the potential damage award is relatively low.

At the same time, Rule 10b-5 cases are less likely to settle (that is, more likely to be dismissed) but those that do settle have higher settlement amounts. The authors attribute this to the greater damages available to Rule 10b-5 plaintiffs. The authors suggest that plaintiffs rationally might be willing to pursue cases with a lower survivability probability when the cases are likelier to have larger settlements, assuming the cases survive dismissal. Cases without institutional plaintiffs are more likely to survive motions to dismiss, which the authors interpret to suggest both that institutional investors select the high potential value cases and that plaintiffs’ lawyers exercise more care regarding the merits of cases with only an individual plaintiff.

The authors also noted a number of differences among the various circuits and industries. For example, the authors note that the Eleventh Circuit appears to have modestly lower settlement amounts whereas the Ninth and Tenth Circuits have modestly higher settlement amounts. Similarly, utilities have somewhat higher settlement amounts.

*Discussion*

I have necessarily summarized here the authors’ much more detailed analysis. The only way to fully understand and appreciate the authors’ predictive analysis, as well as the ways in which the authors conclude that the various factors are predictive, is to read their paper in full, which I recommend.

I do note that the ability to predict case outcomes at the outset is important for a number of process participants, including in particular the affected D&O insurers. Among other things, D&O insurers must have reliable means to assess and predict case outcomes at the outset in order to try to set case reserves appropriately. In addition, D&O insurers whose coverage attaches only in the excess layers will want to be able to assess cases at the outset in order to try to determine the likelihood that losses associated with any particular claim will penetrate their attachment point. For the involved D&O insurers, the authors’ predictive model could provide a useful tool.

The authors’ model could prove a useful tool for the defendant companies themselves as well as for their defense counsel. It is critically important for companies and their counsel in setting their litigation strategy to have an accurate understanding of the seriousness of the claim. The authors’ model may provide a useful way for companies and their counsel to make a realistic assessment of the seriousness of the case in order to try to set defense strategy appropriately.

If I were to make one suggestion to the authors in order to make their analysis more accessible, it would be to expand their summary description of the relevant factors so that the factors are not only identified but the nature of their relevance is also more apparent. It is of course important for the authors to state in the summary of their conclusions that, for example, “the length of the class period” is a relevant factor positively impacting settlement. It would be even more helpful for the non-mathematician reader for the authors to explain in the conclusion section how the variation of the length of the class period affects the settlement (that is, is it a shorter or a longer class period that positively affects the settlement?). A more detailed explanation in the paper’s discussion section of the authors’ specific conclusions with respect to each of the identified factors would make the authors’ otherwise somewhat intimidating paper more approachable to a wider variety of readers and would make the authors’ conclusions both clearer and more useful for those trying to understand the implications of the authors’ analysis.

I would like to thank Professor Tom Baker for providing me with a copy of this interesting paper.

**UPDATE:** *Following my publication of this post, and in particular in response to my comments about the paper, one of the paper’s authors, Blakeley McShane, contacted me with a supplement to the article, to provide further explanation of the paper and its conclusions. Because I think the supplement significantly aids an understanding of the paper, I have reproduced the supplement in full below. My thanks to McShane for taking the time to prepare a detailed supplement and for his willingness to allow me to publish it here. Here is the supplement: *

Thank you for your interest in our paper “Predicting Securities Fraud Settlements and Amounts: A Hierarchical Bayesian Model of Federal Securities Class Action Lawsuits” which is forthcoming in the Journal of Empirical Legal Studies. We really enjoyed reading your write-up of our results and wanted to follow up on your last paragraph where you requested a more friendly description of the effect of each variable. We will attempt to be as clear as possible and focus on our “best guess” of the effect of each variable (i.e., the dots in Figures 8a and 9a respectively). Of course our estimates are subject to uncertainty (indicated by the thick and thin lines of Figures 8a and 9a) but we will ignore that for the purpose of this discussion.

First, let’s begin by discussing the data. Our principal data source comes from the Riskmetrics Group’s Securities Class Action Services Division which tracks securities fraud class action lawsuits on a commercial basis. Nonetheless, as you mentioned, a substantial amount of processing as well as augmentation with data from other sources was required. This is detailed in Section II of our paper so we will just give a brief description of each of the variables that are “statistically significant” in either of the two stages of our model (i.e., the settlement versus dismissal stage and settlement amount conditional on settlement stage).

• Total Securities: The number of different securities (e.g. stocks, bonds, etc.) associated with the case.

• Filing Time: The length of time from the end of the class period until the filing date.

• Class Length: The length of the class period.

• Market Capitalization: The market capitalization of the plaintiff firm.

• Company Return: Roughly speaking, the percentage return on the plaintiff firm’s stock during the class period (see Section II.C for full details on how we constructed this variable).

• S&P 500 Return: The percentage return on the S&P 500 during the class period.

• GAAP: Whether or not GAAP violations were alleged in the case.

• Restated: Whether or not the allegation mentions that the company’s financial statements were restated.

• 10b5: Whether or not the case was a Rule 10b-5 case.

• Section 11: Whether or not the case was a Securities Act Section 11 case.

• Plaintiff: The plaintiff variable has three values. If one or more institutions are listed as the plaintiff, we set our Plaintiff variable equal to “Institutional”. If no institutions are listed but one or more individuals are, we set it equal to “Individual”. Finally, if nothing is listed in the database, we set it equal to “Unknown”. Of course, this does not mean there is no plaintiff in the case; rather, it means Riskmetrics has not obtained the information for this variable. This is potentially informative for whether or not a case settles and for how much it settles for if it does settle, but probably says more about Riskmetrics’ priorities in gathering data than anything else. In particular, given the nature of Riskmetrics’ business, they are most highly incented to collect complete data for cases which settle and especially those which settle for large amounts. Consequently, we would, for example, a priori expect Unknown plaintiff cases (i) to be less likely to settle and (ii) to settle for less when they do settle.

• Insider Trading: Whether or not insider trading was alleged in the case.

• Google Hits: A measure of the newsworthiness or notoriety of the case. In particular, the number of Google News Archive hits associated with the company name in the year prior to the filing date (see Section II.E for full details on how we constructed this variable).

With the variables defined, let’s begin with the factors that predict settlement amount conditional on a case settling. We identified eleven “statistically significant” predictors of the settlement amount:

• Total Securities: A 1% increase in total securities is associated with a 0.25% increase in the settlement amount.

• Filing Time: A 1% increase in the filing time is associated with a 0.1% decrease in the settlement amount.

• Class Length: A 1% increase in the length of the class period is associated with a 0.1% increase in the settlement amount.

• Market Capitalization: A 1% increase in market capitalization of the plaintiff firm is associated with a 0.4% increase in the settlement amount.

• Company Return: A 1% increase in plaintiff firm’s return over the class period is associated with a 0.2% increase in the settlement amount.

• Restated: Restated financial statements are associated with a 20% increase in the settlement amount.

• Section 11: Securities Act Section 11 cases are associated with a 45% increase in the settlement amount.

• Individual Plaintiff: Individual plaintiff cases are associated with a 35% decrease in the settlement amount relative to cases with an institutional plaintiff.

• Unknown Plaintiff: Unknown plaintiff cases are associated with a 40% decrease in the settlement amount relative to cases with an institutional plaintiff.

• Insider Trading: Insider trading cases are associated with a 30% increase in the settlement amount.

• Google Hits: A 1% increase in the number of Google News hits is associated with a 0.05% increase in the settlement amount.
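As an illustration only (this sketch is not taken from the paper or the supplement), the "1% increase is associated with an x% increase" statements above read like elasticities. Assuming a constant-elasticity relationship, which is an assumption about the functional form, a larger change in a variable can be converted into a rough multiplier on the settlement amount. The dictionary keys and the function name `settlement_multiplier` are hypothetical; the numbers are copied from the bullets. Company Return is left out because a return can be zero or negative, so a percentage change in it is ambiguous.

```python
# Elasticities copied from the bullets above. The constant-elasticity form
# (settlement proportional to variable ** elasticity) is an ASSUMPTION made
# for illustration, not a statement of the authors' exact specification.
ELASTICITIES = {
    "total_securities": 0.25,
    "filing_time": -0.10,
    "class_length": 0.10,
    "market_cap": 0.40,
    "google_hits": 0.05,
}


def settlement_multiplier(variable: str, pct_change: float) -> float:
    """Multiplier on the settlement amount when `variable` changes by
    `pct_change` percent, all other variables held fixed."""
    ratio = 1.0 + pct_change / 100.0
    if ratio <= 0.0:
        raise ValueError("pct_change must be greater than -100")
    return ratio ** ELASTICITIES[variable]


if __name__ == "__main__":
    # A 1% change reproduces the quoted elasticity to a close approximation.
    print(f"{settlement_multiplier('market_cap', 1.0):.4f}")
    # A doubling (100% increase) of market capitalization.
    print(f"{settlement_multiplier('market_cap', 100.0):.4f}")
    # A doubling of the filing time lowers the expected settlement.
    print(f"{settlement_multiplier('filing_time', 100.0):.4f}")
```

Under this reading, doubling market capitalization corresponds to a settlement roughly 32% larger (2 raised to the power 0.4), not 40% larger, because elasticities compound rather than add.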

Many of these variables make intuitive sense. For instance, the total number of securities, market capitalization, and number of Google hits are associated with the size of the firm and hence how much damage can be done and how large a settlement can be extracted. Similarly, the longer the class length, the greater the number of securities traded during the period and, hence, the larger the damages. The higher damages for Section 11 cases (all other things being equal) may reflect the fact that plaintiffs do not need to prove scienter to succeed. The filing time result has a plausible basis as there is often a rush to file the “best” cases. Interestingly, some “merits” variables such as Restated and Insider Trading, which in theory should only affect whether or not a case settles or is dismissed, also impact the settlement amounts, thus suggesting that decisions over whether or not there were damages versus how great those damages were may not be entirely independent. The Company Return finding is somewhat surprising, since a lower return during the class period would indicate larger damages from the alleged fraud. Our hypothesis is that this finding is picking up on the capacity of the defendant to pay. Other things being equal, a company that recently made money is going to be better able to pay a settlement. Finally, the result for the identity of plaintiff is consistent with Riskmetrics’ business model as outlined above.

The interpretation of the significant coefficients for the model which predicts whether a case settles or is dismissed is somewhat more complicated than that for the settlement amount model. This is because each case is associated with a “latent score” giving the probability of settlement: cases with high scores are very likely to settle and cases with low scores are very likely to be dismissed. The tricky part is that the relationship between the latent score and the probability is non-linear. Instead, it follows an S-shaped curve called a logistic curve:

With a logistic curve, the increase in the probability of settlement associated with a small change in the latent score depends on the original latent score: at the extremes, the change in probability is quite small whereas in the middle it is quite large. For example, an increase of 0.1 from -4.0 to -3.9 hardly changes the probability as can be seen in the above figure (the probability only goes from 1.8% to 2.0%). Similarly, an increase of 0.1 from 4.0 to 4.1 hardly changes the probability as can be seen in the above figure (the probability only goes from 98.2% to 98.4%). On the other hand, in the middle of the curve, small changes can have a substantial impact. For example, an increase of 0.1 from 0.0 to 0.1 changes the probability substantially from 50% to 52.5%. In the descriptions which follow, the increase in probability will be for the middle of the curve where the “action” is.
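For readers who want to check the arithmetic, the standard logistic function maps a latent score s to a probability 1 / (1 + e^(-s)). The short sketch below evaluates it at the three starting points discussed above; the function name `logistic` is simply a label chosen for this illustration, and the code is not taken from the paper.

```python
import math


def logistic(score: float) -> float:
    """Standard logistic curve: maps a latent score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # The same 0.1 step in the latent score, taken at the low end,
    # the middle, and the high end of the curve.
    for start in (-4.0, 0.0, 4.0):
        before = logistic(start)
        after = logistic(start + 0.1)
        print(f"{start:+.1f} -> {start + 0.1:+.1f}: {before:.1%} -> {after:.1%}")
```

The step near zero moves the probability by about two and a half percentage points, while the same step at either extreme moves it by a fraction of a point.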

We identified ten “statistically significant” predictors of the probability of settlement:

• Total Securities: A 1% increase in total securities is associated with a 0.4% increase in the probability of settlement.

• Filing Time: A 1% increase in the filing time is associated with a 0.02% decrease in the probability of settlement.

• Market Capitalization: A 1% increase in market capitalization of the plaintiff firm is associated with a 0.02% decrease in the probability of settlement.

• Company Return: A 1% increase in plaintiff firm’s return over the class period is associated with a 0.15% decrease in the probability of settlement.

• S&P 500 Return: A 1% increase in the S&P 500’s return over the class period is associated with a 0.3% increase in the probability of settlement.

• GAAP: Allegations of GAAP violations are associated with a 13% increase in the probability of settlement.

• 10b5: Rule 10b-5 cases are associated with a 25% decrease in the probability of settlement.

• Individual Plaintiff: Individual plaintiff cases are associated with an 8% increase in the probability of settlement relative to cases with an institutional plaintiff.

• Unknown Plaintiff: Unknown plaintiff cases are associated with a 60% decrease in the probability of settlement relative to cases with an institutional plaintiff.

• Google Hits: A 1% increase in the number of Google News hits is associated with a 0.02% decrease in the probability of settlement.
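Because the figures above are quoted for the middle of the curve, they overstate the change for a case that starts far from a 50% chance of settlement. The sketch below shows one way to carry a mid-curve figure to a different starting probability. It rests on a loudly stated assumption that is not confirmed by the supplement: that each quoted percentage is the slope of the logistic curve at its midpoint, where one unit of latent score is worth 25 percentage points, so the implied latent-score shift is four times the quoted change. The function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Inverse of the logistic curve: probability -> latent score."""
    return math.log(p / (1.0 - p))


def logistic(score: float) -> float:
    """Logistic curve: latent score -> probability."""
    return 1.0 / (1.0 + math.exp(-score))


def apply_mid_curve_effect(baseline: float, mid_curve_change: float) -> float:
    """Probability of settlement after applying an effect quoted at the
    middle of the curve to a case whose baseline probability is `baseline`.

    ASSUMPTION: the quoted change is the midpoint slope (0.25 per unit of
    latent score), so the implied latent-score shift is 4 * mid_curve_change.
    """
    score_shift = 4.0 * mid_curve_change
    return logistic(logit(baseline) + score_shift)


if __name__ == "__main__":
    # GAAP allegations (quoted as +13%) applied at three baselines.
    for baseline in (0.10, 0.50, 0.90):
        after = apply_mid_curve_effect(baseline, 0.13)
        print(f"baseline {baseline:.0%} -> {after:.1%}")
    # Unknown plaintiff (quoted as -60%) applied at a 50% baseline.
    print(f"{apply_mid_curve_effect(0.50, -0.60):.1%}")
```

Under this reading, a large quoted effect such as the Unknown Plaintiff figure cannot be subtracted directly from a 50% baseline (that would give a negative probability); the curve flattens, and the resulting probability stays between 0% and 100%.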

Again, many of these results have an intuitive basis. The result for filing time has the same intuition as for settlement amounts: many of the “best” cases are filed early. While we had no a priori expectation for the market capitalization result, perhaps bigger firms are better able to defend themselves; regardless, the effect is weak. Nonetheless, this as well as the result for Google Hits may be a result of the “plaintiff selection effect” whereby plaintiffs select cases which are more likely to be dismissed but will settle for a large amount conditional on surviving the motion to dismiss. This is consistent with the results presented above. Not surprisingly, as the company’s return during the class period goes down (potential evidence of fraud), the likelihood of settlement goes up; this is exacerbated when the market as a whole (as measured by the S&P 500) goes up during the same period. GAAP violations are a classic merits variable and therefore associated with increased likelihood of settlement. A combination of Riskmetrics’ incentive in gathering data and the plaintiff selection effect is likely at play in explaining the plaintiff results (i.e., Riskmetrics is more likely to gather plaintiff information for cases which settle and, further, institutional plaintiffs are more likely to become involved in cases with larger damages potential). Riskmetrics’ incentives are also likely to explain the number of securities result, as Riskmetrics will be more likely to gather all of the securities information for cases that actually do settle. Finally, the lower probability of settlement of 10b5 cases likely reflects the greater difficulty of proving the knowledge required (scienter) as compared to, for example, Section 11 cases.

We hope this is more interpretable and clarifies matters some. Thank you again for your interest in and coverage of our work.