Today’s a discussion of baseball, stats, box scores, and three handy Kotlin functions.
tl;dr
We cover three built-in Kotlin functions today:
filter { ... }: Return a new Collection with all the items that cause the attached predicate block to return true.
flatMap { ... }: Take a List of List objects and turn it into a single List with all the items together.
groupBy { ... }: Turn a List into a Map keyed by the value returned from the key selector block.
Those three functions take the stats for every day of the season for all batters and turn them into a single set of totals for each batter.
Background
The code you’ll see today comes from the Android Baseball League (ABL) APIs, the data source for the second half of “Kotlin and Android Development featuring Jetpack”.
The quick summary of the ABL is that I wanted to use a baseball app for the advanced half of the book, and rather than worry about any legal issues with using real data from a site like Retrosheet, I “just” wrote my own baseball simulator. If you want to hear more about that whole process, I recorded a video on it:
Today’s focus is on calculating batting stats from the ABL and how we can convert data to get what we need.
The End Goal
The goal here is to get box scores for all batters for the season up to a given date, represented by a LocalDateTime parameter. The end result is a set of summaries which contain data like you see here. We can then use this data in the ABL app to show the leaders in home runs, RBI, or any of the other two-dozen+ stats using a BatterBoxScoreItem (the data class holding all the stats). For reference, that class partially looks like this:
data class BatterBoxScoreItem(
    val player: Player,
    val games: Int,
    val plateAppearances: Int,
    val atBats: Int,
    val runs: Int,
    // More stats live here
) : BoxScoreItem {
    // We also have calculated stats based on other stats
    val totalBases = (hits + doubles + triples * 2 + homeRuns * 3)
}
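The totalBases formula can look odd at first, since a double is worth two bases but only adds one here. Every hit already counts for one base through hits, so the formula only adds the extra bases. Here is a minimal sketch that checks the shortcut against the long form. HitLine is a hypothetical, trimmed-down class I made up for illustration, not part of the ABL code.

```kotlin
// Hypothetical stand-in for the hit-related fields of BatterBoxScoreItem.
data class HitLine(val hits: Int, val doubles: Int, val triples: Int, val homeRuns: Int) {
    // Shortcut form: every hit is one base, then add the extra bases.
    val totalBases = hits + doubles + triples * 2 + homeRuns * 3

    // Long form: count singles explicitly, then weight each hit type.
    val singles = hits - doubles - triples - homeRuns
    val totalBasesLongForm = singles + doubles * 2 + triples * 3 + homeRuns * 4
}

fun main() {
    val line = HitLine(hits = 10, doubles = 3, triples = 1, homeRuns = 2)
    println(line.totalBases)         // 10 + 3 + 2 + 6 = 21
    println(line.totalBasesLongForm) // 4 + 6 + 3 + 8 = 21
}
```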
As a BatterBoxScoreItem can be the stats for any number of games, we can convert the daily box scores into a single BatterBoxScoreItem per player for all the games played in our time frame. This is what calculateBatterBoxScores(), the function we’re focusing on, does: it takes a list of BatterBoxScoreItem objects for a player and combines them into a single BatterBoxScoreItem instance.
The Main Function
The batting stats calculation function looks like this:
fun calculateBattingStatsForDateTime(filterDateTime: LocalDateTime? = null) =
    battingStats
        .filter { (dayOfYear, _) ->
            filterDateTime == null || dayOfYear < filterDateTime.dayOfYear
        }
        .flatMap { (_, batterBoxScoreItems) ->
            batterBoxScoreItems
        }
        .groupBy { batterBoxScoreItem ->
            batterBoxScoreItem.player
        }
        .calculateBatterBoxScores()
The battingStats object is a Map<Int, List<BatterBoxScoreItem>> where the key is the day of the year and the List<BatterBoxScoreItem> contains all the stats for all batters on that day. That Map contains all the batting data for the entire season, but we only want the data from before the entered filterDateTime, meaning we need to filter out some items.
Filtering the Data
One of my favorite features of the ABL APIs is that everything can be time-shifted, meaning you can send in a date to any of the endpoints and get the stats for that day. This gives the sense of an ongoing season at any point and gives the ABL app something to update.
filterDateTime is an optional parameter that defaults to null. When it is omitted, the filterDateTime == null check is true regardless of the data, so every item passes through. Otherwise, we keep only the records where the day of the year is before the day of the year of the entered filterDateTime.
The filter { ... } function returns a new (read-only) Map in the same format as battingStats but now limited to the time frame we want. This is better, but it’ll be easier to get a player’s stats together if we’re working with a single List instead.
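To see the nullable check in isolation, here is a small sketch with the same predicate shape. The sampleStats map and the statsBefore function are made-up stand-ins (plain strings instead of box score items), so treat this as an illustration rather than the ABL code.

```kotlin
import java.time.LocalDateTime

// Hypothetical stand-in for battingStats: day of year -> that day's entries.
val sampleStats: Map<Int, List<String>> = mapOf(
    1 to listOf("day 1 stats"),
    2 to listOf("day 2 stats"),
    3 to listOf("day 3 stats"),
)

// Same predicate shape as the main function: a null cutoff lets everything through.
fun statsBefore(filterDateTime: LocalDateTime? = null): Map<Int, List<String>> =
    sampleStats.filter { (dayOfYear, _) ->
        filterDateTime == null || dayOfYear < filterDateTime.dayOfYear
    }

fun main() {
    // January 3rd is day 3 of the year, so only days 1 and 2 pass the check.
    println(statsBefore(LocalDateTime.of(2021, 1, 3, 0, 0)).keys) // [1, 2]
    // No cutoff: every day passes.
    println(statsBefore().keys) // [1, 2, 3]
    // The source map is untouched; filter built a new Map each time.
    println(sampleStats.size) // 3
}
```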
Flat-Mapping the Data
The flatMap { ... } function takes in a Collection containing other Collection objects and puts them all into a single List. In our case, we take the List of stats from each day and combine them into one new List<BatterBoxScoreItem>. All the batting stats for all players in the entered time frame are in this new list, which makes splitting them up per player that much smoother.
Note that the block for flatMap { ... } can also transform the data, so you can change the Collection objects as you merge them together. This is always done without modifying the original Collection.
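Here is a quick sketch of both uses: merging the per-day lists as-is, and transforming them on the way through. The byDay map and the merged and tagged functions are hypothetical names with strings standing in for box score items.

```kotlin
// Hypothetical stand-in for the filtered map: day of year -> batters with a box score that day.
val byDay: Map<Int, List<String>> = mapOf(
    1 to listOf("Ann", "Bo"),
    2 to listOf("Ann", "Cy"),
)

// Merge as-is: hand back each day's list unchanged.
fun merged(): List<String> = byDay.flatMap { (_, names) -> names }

// Transform while merging: tag every name with the day it came from.
fun tagged(): List<String> = byDay.flatMap { (day, names) -> names.map { "$it@$day" } }

fun main() {
    println(merged()) // [Ann, Bo, Ann, Cy]
    println(tagged()) // [Ann@1, Bo@1, Ann@2, Cy@2]
    println(byDay)    // {1=[Ann, Bo], 2=[Ann, Cy]}, the original is unchanged
}
```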
Grouping the Data
Since we’re looking for the stats for each player, we want to split up the data by each player. The groupBy function does just that by taking the items in the List and changing them into a Map. This Map has the value returned from the block as the key (in our case, the Player object) and a List of all the items that produced that key as the value.
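As a small illustration of the grouping step, here is a sketch with a made-up GameLine class (a player name plus hits in one game) in place of the real box score items. The names are hypothetical.

```kotlin
// Hypothetical simplified entry: who batted and how many hits they had in one game.
data class GameLine(val player: String, val hits: Int)

val gameLines = listOf(
    GameLine("Ann", 2),
    GameLine("Bo", 1),
    GameLine("Ann", 3),
)

// The key selector returns the player; entries with the same player land in the same list.
fun byPlayer(): Map<String, List<GameLine>> = gameLines.groupBy { it.player }

fun main() {
    val grouped = byPlayer()
    println(grouped.keys)                    // [Ann, Bo]
    println(grouped["Ann"]?.map { it.hits }) // [2, 3]
    println(grouped["Bo"]?.map { it.hits })  // [1]
}
```

Keys appear in the order they are first seen, and each player's entries keep their original order.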
Once we have the Map ready, we can use calculateBatterBoxScores() to convert each Map.Entry into a single BatterBoxScoreItem summarizing a batter’s stats. That function (in part) looks like this:
fun Map<Player, List<BatterBoxScoreItem>>.calculateBatterBoxScores() =
    this.map { (player, boxScoreItems) ->
        BatterBoxScoreItem(
            player,
            // sumOf replaces sumBy, which is deprecated as of Kotlin 1.5
            boxScoreItems.sumOf { it.games },
            boxScoreItems.sumOf { it.plateAppearances },
            boxScoreItems.sumOf { it.atBats },
            // All the other stats follow this same lovely pattern.
        )
    }
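If you want to run the whole chain yourself, here is a self-contained sketch of it under some simplifying assumptions. Player and BoxScore are trimmed-down hypothetical types, summarize and seasonTotals are names I made up, and the cutoff is a plain Int day instead of a LocalDateTime.

```kotlin
// Hypothetical, trimmed-down types; the real ABL classes carry many more stats.
data class Player(val name: String)
data class BoxScore(val player: Player, val games: Int, val atBats: Int, val hits: Int)

// Collapse each player's list into one summary by summing every counting stat.
fun Map<Player, List<BoxScore>>.summarize(): List<BoxScore> =
    map { (player, items) ->
        BoxScore(
            player,
            items.sumOf { it.games },
            items.sumOf { it.atBats },
            items.sumOf { it.hits },
        )
    }

// filter -> flatMap -> groupBy -> summarize, mirroring the main function's chain.
fun seasonTotals(stats: Map<Int, List<BoxScore>>, beforeDay: Int? = null): List<BoxScore> =
    stats
        .filter { (day, _) -> beforeDay == null || day < beforeDay }
        .flatMap { (_, items) -> items }
        .groupBy { it.player }
        .summarize()

fun main() {
    val ann = Player("Ann")
    val bo = Player("Bo")
    val stats = mapOf(
        1 to listOf(BoxScore(ann, 1, 4, 2), BoxScore(bo, 1, 3, 0)),
        2 to listOf(BoxScore(ann, 1, 5, 1)),
        3 to listOf(BoxScore(bo, 1, 4, 3)),
    )
    // Days 1 and 2 only: Ann has 2 games, 9 at-bats, 3 hits; Bo has 1 game, 3 at-bats, 0 hits.
    println(seasonTotals(stats, beforeDay = 3))
    // Whole season: Bo moves to 2 games, 7 at-bats, 3 hits; Ann is unchanged.
    println(seasonTotals(stats))
}
```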
Final Thoughts
By using these three functions, we went from stats split up per day to stats split up per player. We get a summarized view of a batter’s output that is then used to give us league leaders or individual player histories.
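All three functions work on any Iterable (List, Set, etc.), and filter { ... } and flatMap { ... } are also available directly on Map, as we used them here. Map is not itself a Collection and has no groupBy { ... } of its own, so to group a Map you call groupBy { ... } on its entries. All three can do even more than we see here via their transformation functions, which makes them a good fit whenever you need to narrow, merge, or regroup a set of data.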