The columns are labeled and should be mostly self-explanatory. For reference, the columns are defined here.
These seven files can be downloaded here: Main CSV Download.
This download is a single .zip file containing the seven aforementioned .csv files, which cover every game from 1901 through 2024 for which Retrosheet has compiled at least some information. This includes Negro League games, all-star games, postseason games, etc.
In total, the files here cover 219,369 games; Retrosheet has box scores for 216,065 of them and event files for 199,830. The latter number includes 4,688 "deduced" games. Overall, the 199,830 play-by-play accounts which Retrosheet has compiled contain 15,852,865 plays. Needless to say, many of the files here are quite large.
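Because the files are large, it can help to read rows straight out of the .zip one at a time rather than extracting and loading everything at once. The following is a minimal Python sketch of that idea, not a Retrosheet tool. The function names are hypothetical, the member names are whatever the archive actually contains, and UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def list_csv_members(zip_source):
    """Return the names of the .csv files found inside the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def iter_csv_rows(zip_source, member_name, encoding="utf-8"):
    """Yield each row of one .csv member as a dict keyed by column label.

    zip_source may be a path or a file-like object for the downloaded .zip.
    The encoding default is an assumption, not something the page states.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # ZipFile.open returns a binary stream; wrap it so csv sees text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)
```

Rows are produced lazily, so a loop such as `for row in iter_csv_rows(path, name): ...` keeps only one row in memory at a time.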
Data Subsets: In addition to the general file described on this page, Retrosheet is pleased to also offer subsets of these data for download. Such offerings are described here.
'stattype'
In some cases, the statistics for a particular player or team may be spread across multiple statistical lines as a way to convey uncertainty. Every statistical line includes a 'stattype' column, which takes one of four possible values: 'value', 'lower', 'upper', or 'official'.
All teams and players will have lines with stattype 'value' regardless of how little information may be known. Data for which Retrosheet has no information will be blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added.
The stattypes 'upper' and 'lower' are applied to some Negro Leagues data for which there may be some uncertainty. As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bound for the pitcher's innings pitched would be 4 and 4.2, respectively, and the lower and upper bound for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
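To make the innings arithmetic in this example concrete, here is a small Python sketch. It assumes the conventional baseball notation in which the digit after the decimal point counts outs (so 4.2 means four innings plus two outs). The function name is hypothetical, and whether the files store innings in this form is an assumption.

```python
def innings_to_outs(innings: str) -> int:
    """Convert innings written as 'W' or 'W.T' (T = extra outs, 0-2) to outs."""
    whole, _, frac = innings.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"not a valid innings value: {innings!r}")
    return int(whole) * 3 + thirds


# The pitcher finished four innings and left at some point in the 5th,
# so he recorded between 0 and 2 outs in that inning.
lower_outs = innings_to_outs("4")    # 12 outs
upper_outs = innings_to_outs("4.2")  # 14 outs
```

Working in outs avoids treating 4.2 as an ordinary decimal number, which would misstate the gap between the two bounds.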
The stattype 'official' is used to identify statistics for which Retrosheet's totals differ from official league records - i.e., these identify what Retrosheet calls 'discrepancies'. Lines with stattype 'official' only exist for players and/or teams for which there is at least one relevant 'discrepancy'. Lines with stattype 'official' will only report statistics that were official statistics at the time they were compiled.
A simplified set of .csv files which only include 'stattype' = 'value' can be downloaded here: Simplified CSV Files.
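For anyone working from the full files instead, the same selection can be made by filtering on the 'stattype' column. The Python sketch below uses a small made-up sample. Only the 'stattype' column and its four values come from the description above; the 'id' and 'ip' columns and every number shown are hypothetical.

```python
import csv
import io

# Made-up rows for illustration only; 'id' and 'ip' are hypothetical columns.
SAMPLE = """id,stattype,ip
p1,value,4.1
p1,lower,4
p1,upper,4.2
p2,value,9
p2,official,8.2
"""


def rows_with_stattype(rows, stattype="value"):
    """Keep only the rows whose 'stattype' column equals the given value."""
    return [row for row in rows if row["stattype"] == stattype]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
best_estimates = rows_with_stattype(rows)           # one line per player
official_lines = rows_with_stattype(rows, "official")
```

Since every team and player has a 'value' line, filtering on it should leave exactly one line per player or team, while the 'lower', 'upper', and 'official' lines can be pulled separately when the uncertainty or the discrepancies are of interest.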
'biodata'
A set of accompanying biographical files can be downloaded here: Biographical Files.
This .zip file contains seven .csv files.
The biographical files associated with ballparks, coaches, managers, teams, and umpires include columns labeled 'first_g' and 'last_g', which indicate the first and last game (chronologically) in which the relevant person, place, or team appears within the full set of Retrosheet games (since 1901). Dates in all of these files take the form 'yyyymmdd'. For ballparks, managers, teams, and umpires, each person, place, or team has a single entry with 'first_g' and 'last_g' spanning multiple seasons as necessary.
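Dates in the 'yyyymmdd' form can be turned into proper date values with Python's standard library. This is a minimal sketch: the function name is hypothetical, the two dates are illustrative rather than taken from the files, and treating a blank field as "unknown" is an assumption.

```python
from datetime import date, datetime
from typing import Optional


def parse_game_date(value: str) -> Optional[date]:
    """Parse a 'yyyymmdd' string; return None for a blank field."""
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, "%Y%m%d").date()


# Illustrative 'first_g' and 'last_g' values, not real entries.
first_g = parse_game_date("19010424")
last_g = parse_game_date("20241030")
seasons_spanned = last_g.year - first_g.year + 1
```

Once parsed, the values compare and subtract like any other dates, which makes it straightforward to check whether a given game falls between an entry's 'first_g' and 'last_g'.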
It is always Retrosheet's intention that everything it offers for download (and displays on its website) is as correct as possible. The data offered for download here include no known errors. Nevertheless, this is a massive undertaking and the volume of data here is quite formidable. Please let me know if you find any errors or have any issues with any of the files.
Enjoy!
Tom Thress
Retrosheet President
Recipients of Retrosheet data are free to make any desired use of the information, including (but not limited to) selling it, giving it away, or producing a commercial product based upon the data. Retrosheet has one requirement for any such transfer of data or product development, which is that the following statement must appear prominently:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
Retrosheet website last updated January 8, 2025.
All data contained at this site is copyright 1996-2025 by Retrosheet. All Rights Reserved. Click here for information about the use of Retrosheet data