Retrosheet


Negro League Data

Compiling Negro League Data

To the best of our knowledge, Retrosheet is the first source for game-level Negro League data on the Internet. The data presented here have been compiled by Retrosheet from original sources, primarily newspaper game stories printed at the time of these games. For each season, Retrosheet volunteers have searched newspapers (mostly those available online, but in some cases by visiting libraries in person) and identified a set of games played between two teams of major-league-caliber players. A game file was then created for each game found, recording as much statistical information as could be identified.

Retrosheet has tried to be fairly liberal in its standards for what constitutes a "major-league" team and for which games to include. For example, Retrosheet has included a few teams which were not members of a formal Negro League, such as the 1946 Cincinnati Crescents and the 1942 Cincinnati Clowns. Retrosheet has also sought to include all games played between "major-league" teams, including exhibition games. Because Retrosheet has sought to identify the "gametype" for every game it has collected, researchers are free to exclude exhibition games from their analysis if they desire. And, of course, researchers are also free to exclude games played by (and against) the Cincinnati Crescents or any other team. Basically, Retrosheet's standard is: were both teams "major-league," and do we know the score, the date, and the location? If the answer to all of those questions is yes, the game is included.
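Because each game carries a "gametype", excluding exhibition games or particular teams amounts to a simple filter over the downloaded files. The Python sketch below is illustrative only: apart from "gametype", which is named above, the column names (gid, visteam, hometeam), the "exhibition" value, and the team codes are hypothetical placeholders rather than Retrosheet's actual file layout, so check the downloaded files for the real names.

```python
import csv
import io

# ASSUMPTION: the column names "gid", "visteam", "hometeam" and the
# gametype value "exhibition" are hypothetical placeholders.
def filter_games(rows, exclude_gametypes=("exhibition",), exclude_teams=()):
    """Yield game rows whose gametype and teams are not excluded."""
    skip_types = {g.lower() for g in exclude_gametypes}
    skip_teams = set(exclude_teams)
    for row in rows:
        if row["gametype"].strip().lower() in skip_types:
            continue  # e.g. drop exhibition games
        if row["visteam"] in skip_teams or row["hometeam"] in skip_teams:
            continue  # drop games played by or against an excluded team
        yield row

# Illustrative rows with hypothetical team codes; not real Retrosheet data.
sample = """gid,visteam,hometeam,gametype
G1,HOM,NYC,regular
G2,CCR,HOM,regular
G3,NYC,HOM,exhibition
"""
rows = list(csv.DictReader(io.StringIO(sample)))
kept = [g["gid"] for g in filter_games(rows, exclude_teams={"CCR"})]
print(kept)  # ['G1']
```

Calling filter_games(rows) with no team exclusions keeps every non-exhibition game, so each researcher can decide separately whether to drop exhibitions, particular teams, or both.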

Presenting Negro League Data

Retrosheet's goal in presenting Negro League data is to provide our best estimate of what actually happened in the games on which we report. At the same time, we also want to convey the extent to which the data we have collected so far may be uncertain.

The level of detail at which Negro League data can be determined varies widely from game to game, and in many cases the data "known" are highly uncertain. For example, for many games we have no box score but may have a reference to the fact that a particular player had at least one hit in the game. To convey this uncertainty, each team and player may be given up to three statistical lines per game in the data files available for download. These lines are identified within the .csv files by the variable 'stattype'.


All teams and players have lines with stattype 'value', regardless of how little information may be known. Fields for which Retrosheet has no information are left blank. In most cases where we have some information, Retrosheet has made its best estimate of player statistics and assigned these totals to the stattype 'value'. Where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added.
As an example of 'lower' and 'upper' stattypes, suppose we know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bounds for the pitcher's innings pitched would be 4 and 4.2 (four and two-thirds innings), respectively, since he may have recorded anywhere from zero to two outs in the 5th before leaving. Likewise, the lower and upper bounds for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
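The bounds in this example follow from the usual baseball convention that the fractional part of an innings-pitched figure counts outs in thirds, so 4.2 means four innings plus two outs. A minimal Python sketch of that arithmetic, assuming the pitcher started the game and that the only uncertainty lies in the inning in which he was knocked out; the function names and the runs_before figure are hypothetical and not part of any Retrosheet tool:

```python
def ip_to_outs(ip: str) -> int:
    """Convert innings-pitched notation ("4.2" = 4 2/3 innings) to outs."""
    whole, _, thirds = ip.partition(".")
    extra = int(thirds) if thirds else 0
    if extra not in (0, 1, 2):
        raise ValueError(f"invalid innings-pitched value: {ip!r}")
    return 3 * int(whole) + extra


def outs_to_ip(outs: int) -> str:
    """Convert outs back to innings-pitched notation (14 -> "4.2")."""
    whole, extra = divmod(outs, 3)
    return f"{whole}.{extra}" if extra else str(whole)


def knockout_bounds(exit_inning: int, runs_in_exit_inning: int, runs_before: int):
    """Lower/upper bounds for a starter knocked out during `exit_inning`.

    ASSUMPTION: he pitched every out of innings 1..exit_inning-1 and allowed
    `runs_before` runs in them. In the exit inning he recorded 0-2 outs (a
    third out would have ended the inning) and may be charged with none or
    all of the runs scored in that inning.
    """
    completed_outs = 3 * (exit_inning - 1)
    lower = {"ip": outs_to_ip(completed_outs), "r": runs_before}
    upper = {"ip": outs_to_ip(completed_outs + 2),
             "r": runs_before + runs_in_exit_inning}
    return lower, upper


# The example above: knocked out in the 5th, 4 runs scored in that inning,
# plus (hypothetically) 2 runs allowed in his first four innings.
lower, upper = knockout_bounds(exit_inning=5, runs_in_exit_inning=4, runs_before=2)
print(lower, upper)             # {'ip': '4', 'r': 2} {'ip': '4.2', 'r': 6}
print(ip_to_outs(upper["ip"]))  # 14
```

Working in outs internally avoids treating "4.2" as a decimal, which would wrongly place it closer to 4 than to 5 innings.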

The game and player pages on our website report the 'value' figures where they exist, and blanks otherwise. This is done for aesthetic reasons as much as anything. Anyone who wishes to work with the Negro League statistics compiled by Retrosheet is strongly encouraged to download our data and make their own decisions about which 'stattype' ('value', 'lower', or 'upper'), and which games, are most appropriate for their analysis.
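As one way to act on this, the sketch below picks one line per player-game, using a chosen stattype where it exists and falling back to the 'value' line otherwise, and treats blank fields as missing rather than zero. Only 'stattype' and its three values come from the description above; the columns gid, id, and h and the sample rows are hypothetical placeholders, not Retrosheet's actual layout.

```python
import csv
import io

STATTYPES = ("value", "lower", "upper")

# ASSUMPTION: only the "stattype" column is described by Retrosheet; "gid"
# (game), "id" (player), and "h" (hits) are hypothetical column names.
def select_lines(rows, preferred="value"):
    """Return one line per (game, player): the `preferred` stattype line if
    present, otherwise the 'value' line."""
    if preferred not in STATTYPES:
        raise ValueError(f"unknown stattype: {preferred!r}")
    chosen = {}
    for row in rows:
        key = (row["gid"], row["id"])
        if row["stattype"] == preferred:
            chosen[key] = row            # the preferred line always wins
        elif row["stattype"] == "value":
            chosen.setdefault(key, row)  # fallback unless preferred already seen
    return chosen


def to_int(field):
    """Blank fields mean 'no information', so map them to None, not 0."""
    return int(field) if field.strip() else None


# Illustrative rows only; not real Retrosheet data.
sample = """gid,id,stattype,h
G1,p1,value,1
G1,p1,lower,1
G1,p1,upper,3
G1,p2,value,
G2,p1,value,2
"""
rows = list(csv.DictReader(io.StringIO(sample)))
hits = {k: to_int(r["h"]) for k, r in select_lines(rows, "upper").items()}
print(hits)  # {('G1', 'p1'): 3, ('G1', 'p2'): None, ('G2', 'p1'): 2}
```

Falling back to 'value' mirrors the description above, where every team and player has a 'value' line but 'lower' and 'upper' lines appear only where there is uncertainty. Keeping blanks as None rather than 0 avoids silently treating unknown statistics as known zeros.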

Deduced Play-by-Play Accounts

Retrosheet's primary data output has traditionally been play-by-play accounts of games. We have found some play-by-play accounts for Negro League games - mostly newspaper accounts but also a few scorecards - and have attempted to include as many such accounts as we can find.

Retrosheet has also decided to embrace a tool which we first introduced for AL and NL games in 2011: deduced games. A deduced game is a game for which full play-by-play data have not been found but for which we have one or more moderately detailed game stories and/or box scores. A plausible play-by-play account can be derived from these sources. Depending on the quality of the sources, the result may not be unique, but it should be plausible and conform as closely as possible to the available information about the game.

We think that including as many deduced games as possible is desirable for a couple of reasons. First, Retrosheet believes that game-level play-by-play data provide a deeper understanding and appreciation of baseball, not merely in terms of statistics, but also of the players and styles of play throughout baseball history, including the Negro Leagues. This is, of course, our mission. Second, deducing a play-by-play account is often the best way to resolve discrepancies between game stories and/or box scores: if two accounts of a game differ in some respect, deducing the game can provide valuable insight into which account is more plausible. This is a particularly important consideration for Negro League games which, unlike AL and NL games through most of their history, have no "official" contemporaneous record that one can easily fall back on as definitive (or, at least, "official").

Given these considerations, Retrosheet will attempt to deduce as many games as possible for each new Negro League season it releases, beginning with the 1939 and 1949 seasons, which were released on June 10, 2024. In addition, Retrosheet will be going back through seasons we have already released and deducing as many games as possible. As of June 10, 2024, Retrosheet has completed this work for the 1948 season. So far, Retrosheet has been able to deduce approximately 150 games per season. Obviously, we hope that these numbers will grow over time and that many of these deductions may someday be replaced with full play-by-play data from original sources.

One downside to our decision to deduce as many games as possible is that deducing games takes time. As a result, our pace of releasing new seasons will likely slow down somewhat. While we certainly want to release new seasons as quickly as possible, we also want the data we release to be as complete and as useful as possible, and we think the benefits of deducing as many games as possible outweigh this unfortunate downside.

We hope you enjoy our Negro League presentation. Please do not hesitate to let us know if you find any errors, additional sources for any of the games which we present, or additional games which we may be missing.

Tom Thress, Retrosheet President

Retrosheet website last updated December 3, 2024.
All data contained at this site is copyright 1996-2024 by Retrosheet. All Rights Reserved. See the Retrosheet website for information about the use of Retrosheet data.

Send comments and suggestions to Tom Thress: tthress-ATsign-retrosheet.org.
Join the Retrosheet Discussion group here: RetroList
Retrosheet is an all-volunteer organization and a 501(c)(3) charitable organization. To volunteer, please e-mail Tom Thress. To make a donation, visit the Retrosheet Donation Page.