A new set of files is tentatively being released for download as of July 28, 2024. These are a series of .csv files which summarize game-level data for all games that Retrosheet has compiled from 1901 through 2023. These files mostly mirror Retrosheet's Negro League downloads.
These files are contained within a single zip file which can be downloaded here. The size of the zip file is 1.01 GB. The total size of the unzipped files is 9.23 GB. It is my intention that these files be largely self-explanatory. Nevertheless, here is some explanation.
There are five master .csv files which can be found in the root folder of the zip file: gameinfo.csv, teamstats.csv, batting.csv, pitching.csv, and fielding.csv.
The columns are labeled and should be mostly self-explanatory. In case they are not, the columns are defined in the document contents.txt, which is included in the zip file (and can be read here).
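Since the unzipped files are large, it may be convenient to read a single master file straight out of the zip file without extracting everything. The following is only a minimal sketch using the Python standard library. It assumes the files are UTF-8 encoded, and the column names 'gid' and 'season' and the rows in the demo archive are made up for illustration; contents.txt has the real column definitions.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_source, member):
    """Yield each row of one .csv inside the zip as a dict keyed by column name."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # ZipFile.open returns bytes; wrap it so csv can decode text lazily.
            # Assumption: the file is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


def build_demo_zip():
    """Build a tiny in-memory stand-in for the real archive (made-up rows)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical columns; the real gameinfo.csv has many more.
        archive.writestr("gameinfo.csv", "gid,season\nDEMO001,1950\nDEMO002,1950\n")
    buffer.seek(0)
    return buffer


demo_rows = list(iter_csv_rows(build_demo_zip(), "gameinfo.csv"))
print(len(demo_rows), demo_rows[0]["gid"])  # 2 DEMO001
```

With the real download, the first argument would be the path to the zip file on disk instead of the in-memory demo archive. Because the function is a generator, rows are read one at a time rather than loading a whole file into memory.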
In some cases, a particular player or team may be represented by multiple statistical lines as a way to convey uncertainty. Every statistical line includes a column named 'stattype', which takes one of four possible values: 'value', 'lower', 'upper', or 'official'.
All teams and players will have lines with stattype 'value' regardless of how little information may be known. Data for which Retrosheet has no information will be blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added.
The stattypes 'upper' and 'lower' are applied to some Negro Leagues data for which there may be some uncertainty. As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bounds for the pitcher's innings pitched would be 4 and 4.2 (that is, 4 2/3), respectively, and the lower and upper bounds for the pitcher's runs allowed in that inning would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
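The innings arithmetic in this example can be checked with a short sketch. It relies only on the usual box-score convention that the digit after the decimal point counts outs (thirds of an inning), so 4.2 means four full innings plus two outs. The helper name innings_to_outs is hypothetical, and this is not a statement about how innings are stored in the files; contents.txt has the actual columns.

```python
def innings_to_outs(innings):
    """Convert box-score innings notation (e.g. "4.2") to a count of outs."""
    whole, _, fraction = str(innings).partition(".")
    extra_outs = int(fraction) if fraction else 0
    # The digit after the point counts outs, so only 0, 1, or 2 are valid.
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"not valid innings notation: {innings!r}")
    return int(whole) * 3 + extra_outs


lower_outs = innings_to_outs("4")    # knocked out before retiring anyone in the 5th
upper_outs = innings_to_outs("4.2")  # knocked out after two outs in the 5th
print(lower_outs, upper_outs)  # 12 14
```

The pitcher therefore recorded somewhere between 12 and 14 outs, which is exactly the range the 'lower' and 'upper' lines are meant to express.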
The stattype 'official' is used to identify statistics for which Retrosheet's totals differ from official league records - i.e., these identify what Retrosheet calls 'discrepancies'. Lines with stattype 'official' only exist for players and/or teams for which there is at least one relevant 'discrepancy'. Lines with stattype 'official' will only report statistics that were official statistics at the time they were compiled.
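One way to work with the four stattypes is to group the lines for each player-game and then read off the best estimate together with its bounds. The sketch below is only an illustration under stated assumptions: 'stattype' is the only column name taken from the description above, while 'gid', 'id', and 'p_r' (and the sample rows) are hypothetical stand-ins. It also assumes that when no 'lower' or 'upper' line exists, the 'value' line is treated as both bounds.

```python
from collections import defaultdict


def group_by_stattype(rows, key_columns):
    """Map each key (e.g. game + player) to a dict of {stattype: row}."""
    grouped = defaultdict(dict)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        grouped[key][row["stattype"]] = row
    return dict(grouped)


def to_number(text):
    """Blank fields mean 'no information', so map them to None."""
    return int(text) if text else None


def stat_range(lines, column):
    """Return (lower, best estimate, upper) for one statistic."""
    best = to_number(lines["value"][column])
    # Assumption: with no 'lower'/'upper' line, the estimate is not in doubt.
    lower = to_number(lines["lower"][column]) if "lower" in lines else best
    upper = to_number(lines["upper"][column]) if "upper" in lines else best
    return lower, best, upper


# Made-up rows; 'gid', 'id', and 'p_r' are hypothetical column names.
sample_rows = [
    {"gid": "DEMO001", "id": "pitcher01", "stattype": "value", "p_r": "2"},
    {"gid": "DEMO001", "id": "pitcher01", "stattype": "lower", "p_r": "0"},
    {"gid": "DEMO001", "id": "pitcher01", "stattype": "upper", "p_r": "4"},
    {"gid": "DEMO001", "id": "pitcher02", "stattype": "value", "p_r": "1"},
]

grouped = group_by_stattype(sample_rows, ["gid", "id"])
print(stat_range(grouped[("DEMO001", "pitcher01")], "p_r"))  # (0, 2, 4)
print(stat_range(grouped[("DEMO001", "pitcher02")], "p_r"))  # (1, 1, 1)
```

Anyone who only wants Retrosheet's best estimates can simply keep the lines whose stattype is 'value' and ignore the rest, since every team and player has such a line.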
The root folder of the zip file here also contains our master files for people (biofile0.csv), ballparks (ballparks.csv), and teams (teams.csv).
In addition to files which aggregate all games (since 1901), we also have compiled separate logs by team, by ballpark (subsets of gameinfo.csv) and by player (subsets of batting.csv, pitching.csv, and fielding.csv). For ballparks and players, these aggregate across all seasons. These can be found within separate folders within the zip file here. Hopefully, the names of these folders are self-explanatory: 'teamlogs', 'parklogs', 'playerlogs', respectively.
For teams, there are two sets of team-specific logs: {team}_log.csv is a subset of gameinfo.csv; {team}_stats.csv is a subset of teamstats.csv. These are compiled by team-season, so, for example, statistics are compiled for the 1936 New York Yankees within the files 1936NYA_log and 1936NYA_stats. Team files are not compiled across seasons (yet).
A set of game logs for umpires is also available within the folder 'umpirelogs'. Umpire logs may be incomplete prior to 1912 (Retrosheet's first season with fully-proofed event files).
In addition, the zip file contains a folder for each season between 1901 and 2023. Each of these folders contains versions of the five core .csv files - gameinfo, teamstats, batting, pitching, fielding - for all games within that season. The season-specific file names include the relevant season as a prefix to their file names - e.g., the 'gameinfo' file for the 1950 season is '1950gameinfo.csv'.
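The season-prefix naming convention is simple enough to capture in a small helper. This is just a sketch: the function name season_file_name is hypothetical, and it only builds the file name itself, since the exact folder names inside the zip file are not spelled out here.

```python
CORE_TABLES = ("gameinfo", "teamstats", "batting", "pitching", "fielding")


def season_file_name(season, table):
    """Build a season-specific file name, e.g. (1950, "gameinfo") -> "1950gameinfo.csv"."""
    if not 1901 <= season <= 2023:
        raise ValueError(f"season outside the 1901-2023 range: {season}")
    if table not in CORE_TABLES:
        raise ValueError(f"not one of the five core files: {table!r}")
    return f"{season}{table}.csv"


print(season_file_name(1950, "gameinfo"))  # 1950gameinfo.csv
```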
Each season folder contains two additional .csv files. The file 'allplayers.csv' identifies all players who played during that season. This file merges all of our roster files for the season.
For seasons for which we have released some event files (1912 - 2023), the file 'plays.csv' includes parsed play-by-play output for all of the games for which Retrosheet has released play-by-play data. (The files '1937plays.csv' and '1938plays.csv' include parsed play-by-play data for some deduced Negro League games which have not yet been formally released, mostly because it was easier to include them here than to try to exclude them.) As with the other .csv files included here, it is hoped that the column headings are self-explanatory. But the specific columns within these files are described here. The contents linked there are also included as a text file within the file which can be downloaded at the link below.
Download All Daily Logs (1.01 GB zip, 9.23 GB unzipped)
The data here should be thought of as a Beta release. It is certainly my intention that everything here is as correct as possible and these data include no known errors. I also understand that this is a massively large file. The files here include significant duplication. It is my eventual hope to incorporate the ability to download subsets of this data directly from relevant pages of the website (e.g., here). That work remains ongoing. In the meantime, please let me know if you find any errors or have any issues with any of the files.
Enjoy!
Tom Thress
Retrosheet President
Recipients of Retrosheet data are free to make any desired use of the information, including (but not limited to) selling it, giving it away, or producing a commercial product based upon the data. Retrosheet has one requirement for any such transfer of data or product development, which is that the following statement must appear prominently:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
Retrosheet website last updated September 18, 2024.
All data contained at this site is copyright 1996-2024 by Retrosheet. All Rights Reserved. Click here for information about the use of Retrosheet data