Data Files Available for Download
To keep things simple, Negro League data can be downloaded as a single zip file which contains all data relevant to the Negro Leagues. This file is approximately 18 Mb and can be downloaded here.
The centerpiece of the Negro League data is a set of .csv files which summarize game-level data for all 5,255 Negro League games for which Retrosheet has compiled data. There are five such .csv files:
- gameinfo.csv - contains game-level information such as teams, attendance, umpires, etc.
- teamstats.csv - contains team-level statistics - line scores, lineups, and team statistics (batting, pitching, fielding)
- batting.csv - batting statistics by player by game
- pitching.csv - pitching statistics by player by game
- fielding.csv - fielding statistics by player by position by game
The columns are labeled and should be mostly self-explanatory. In case they are not, the columns are defined in the document context.txt, which is included in the zip file (and can be read here).
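For readers who want to work with the five files programmatically, here is a minimal sketch in Python that reads them straight out of the zip file using only the standard library. The five file names come from the list above. The function name, the text encoding, and the possibility that the files sit inside a subfolder of the archive are assumptions, not something this page specifies.

```python
import csv
import io
import zipfile

# File names taken from the list above.
GAME_LEVEL_FILES = ["gameinfo.csv", "teamstats.csv", "batting.csv",
                    "pitching.csv", "fielding.csv"]


def read_game_level_csvs(zip_source):
    """Return {file name: list of row dicts} for each of the five files found.

    zip_source may be a path or a binary file-like object. Members are
    matched by base name, so a subfolder inside the archive (an assumption)
    does not matter.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            base_name = member.rsplit("/", 1)[-1]
            if base_name in GAME_LEVEL_FILES:
                with archive.open(member) as raw:
                    # Encoding is an assumption; utf-8-sig also accepts
                    # plain UTF-8 without a byte-order mark.
                    text = io.TextIOWrapper(raw, encoding="utf-8-sig",
                                            newline="")
                    tables[base_name] = list(csv.DictReader(text))
    return tables
```

Each row comes back as a dictionary keyed by the column labels in the file's header line, so the definitions in context.txt can be applied directly to the keys.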
The level of detail at which Negro League data can be determined is highly variable across games and the data "known" is highly uncertain in many cases. For example, for many games, we have no box score but may have a reference to the fact that a particular player had at least one hit in the game. To attempt to convey this uncertainty in our data, teams and players may be given up to three sets of statistical lines for each game within the data files which are available for download. These are identified within the .csv files by the variable 'stattype'.
- stattype 'value' is Retrosheet's best estimate of the relevant statistical total
- stattype 'lower' is the lower bound on a player's total
- stattype 'upper' is the upper bound on a player's total
All teams and players will have lines with stattype 'value' regardless of how little information may be known. Data for which Retrosheet has no information will be blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added.
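The following is a rough sketch, in Python, of one way to gather the up to three lines for a player-game and read off a single statistic as a (lower, value, upper) triple. Only the column name 'stattype' and its three values come from the description above. The key columns and the statistic column used in the usage example ("gid", "id", "b_h") are hypothetical placeholders; the real names are defined in context.txt. The sketch also assumes the statistic is an integer count.

```python
from collections import defaultdict


def group_by_stattype(rows, key_columns):
    """Collect the 'value', 'lower' and 'upper' lines for each key.

    rows is an iterable of dicts (for example from csv.DictReader).
    key_columns names the columns that identify one player-game or
    team-game; the caller supplies them because they are not fixed here.
    """
    grouped = defaultdict(dict)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        grouped[key][row["stattype"]] = row
    return dict(grouped)


def parse_stat(cell):
    """Blank cells mean 'no information' and become None."""
    cell = (cell or "").strip()
    return int(cell) if cell else None


def stat_range(lines, stat):
    """Return (lower, value, upper) for one integer stat.

    An entry is None when the corresponding line is absent or its cell
    is blank; no bound is inferred from the 'value' line.
    """
    result = []
    for stattype in ("lower", "value", "upper"):
        line = lines.get(stattype)
        result.append(parse_stat(line.get(stat)) if line else None)
    return tuple(result)
```

For example, with hypothetical column names, `stat_range(group_by_stattype(rows, ["gid", "id"])[("G1", "p1")], "b_h")` would give the lower bound, best estimate, and upper bound for that player's hits in that game. Leaving a missing bound as None rather than copying the 'value' figure keeps "no bound recorded" distinct from "bound equals the estimate".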
As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bound for the pitcher's innings pitched would be 4 and 4.2, respectively, and the lower and upper bound for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
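The arithmetic in this example can be sketched in Python as follows. The sketch assumes the usual baseball notation, in which the digit after the decimal point counts outs, so 4.2 means four innings plus two outs. It also assumes a pitcher knocked out during an inning retired zero, one, or two batters in it. The function names are illustrative only, and how innings pitched are actually stored in pitching.csv is defined in context.txt, not here.

```python
def innings_to_outs(innings):
    """Convert innings-pitched notation such as '4.2' to a count of outs."""
    whole, _, partial = str(innings).partition(".")
    extra_outs = int(partial) if partial else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"not valid innings notation: {innings!r}")
    return int(whole) * 3 + extra_outs


def knocked_out_bounds(inning, runs_in_inning, runs_before):
    """Bounds for a pitcher who left during `inning`.

    Returns ((lower IP, upper IP), (lower runs, upper runs)), where
    runs_before is what the pitcher is known to have allowed in his
    earlier innings.
    """
    full_innings = inning - 1
    ip_bounds = (f"{full_innings}.0", f"{full_innings}.2")
    run_bounds = (runs_before, runs_before + runs_in_inning)
    return ip_bounds, run_bounds
```

For the pitcher described above, `knocked_out_bounds(5, 4, 0)` gives innings-pitched bounds of 4.0 and 4.2 (12 and 14 outs) and runs-allowed bounds of 0 and 4.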
In addition to these five files which aggregate all Negro League games, we also have compiled separate logs by team (subsets of teamstats.csv divided by team-season), by ballpark (subsets of gameinfo.csv) and by player (subsets of batting.csv, pitching.csv, and fielding.csv). For ballparks and players, these aggregate across all seasons.
In addition to these .csv files, Retrosheet has also compiled event files (.evx files) and box-score files (.ebx files) for games for which sufficient data is available. Games are compiled into a single file for each season for which we have compiled games of the relevant type. Event files are included both for games for which we have found play-by-play accounts and for games which have been deduced. Deduced games are identified within the files via a comment at the start of the play-by-play portion of the file.
Finally, the zip file here includes roster files for all teams for whom Retrosheet has compiled rosters as well as our master files for people (biofile.csv), ballparks (ballparks.csv), and teams (teams.csv). These files include data for all people, teams, and sites across all Retrosheet games, not just Negro League games.
Download All Negro League Data (14 Mb)
Download All Retrosheet Data (241 Mb)
NOTICES
Recipients of Retrosheet data are free to make any desired use of the information, including (but not limited to) selling it, giving it away, or producing a commercial product based upon the data. Retrosheet has one requirement for any such transfer of data or product development, which is that the following statement must appear prominently:
The information used here was obtained free of
charge from and is copyrighted by Retrosheet. Interested
parties may contact Retrosheet at 20 Sunset Rd.,
Newark, DE 19711.
Retrosheet makes no guarantees of accuracy for the information that is supplied. Much effort is expended to make our website as correct as possible, but Retrosheet shall not be held responsible for any consequences arising from the use of the material presented here. All information is subject to corrections as additional data are received. We are grateful to anyone who discovers discrepancies and we appreciate learning of the details.
Retrosheet website last updated September 18, 2024.
All data contained at this site is copyright 1996-2024 by Retrosheet. All Rights Reserved. Click here for information about the use of Retrosheet data.
Send comments and suggestions to Tom Thress: tthress-ATsign-retrosheet.org.
Join the Retrosheet Discussion group here: RetroList
Retrosheet is an all-volunteer organization and a 501(c)(3) charitable organization. To volunteer, please e-mail Tom Thress. To make a donation, you can visit here: Donation Page