
Joel Sokol is a Fouts Family Associate Professor in the Stewart School of Industrial & Systems Engineering and Director of Georgia Tech's interdisciplinary Master of Science in Analytics degree. Dr. Sokol's primary research interests are in sports analytics and applied operations research. He has worked with teams or leagues in three of the major American sports and has received Georgia Tech's highest awards for teaching.
Where are you from originally and how long have you been at Georgia Tech?
I'm from New Jersey and I went to college at Rutgers, which is a big state university there. I got my Ph.D. at MIT and then I came down here. This is basically my first real job, and I've been here since the fall of 1999.
What drew you to sports analytics?
When I was a kid I was a big sports fan. When I was in college I didn't know what I wanted to major in, so I tried a lot of different things, and I didn't really like them much - at least not enough to do them as a career. Then one of my roommates said "Hey, I'm taking this class that's very mathy, but it's more applied, and I think you'll like it." I took it with him - it was an optimization course in operations research - and I loved it. I decided that was what I wanted to do. It seemed like every week we were learning something where I could say "Hey, I can apply this to analyzing baseball" or "This would help with my fantasy football team."
So from the time I started learning this stuff I was thinking about how it could be applied to this sort of analysis. When I was in grad school I wrote a paper on optimizing baseball teams, particularly batting orders. So it all started from there.
Where has it led?
Now I'm leading the team of people that puts out the LRMC (Logistic Regression/Markov Chain) NCAA tournament rankings every year. I've done some consulting for sports teams in three different leagues - MLB, the NFL, and the NBA. It's a lot of fun doing it.
How much has your fantasy football team improved since you started applying your research to your hobbies?
It's interesting. At first I played fantasy baseball a lot more, and those teams got a lot better, even in college. But the more I started doing it as part of my research and consulting professionally, the less I found myself getting into it for fun. With a professional stake I watch in a very different way. When I watch the NCAA tournament, instead of thinking "I'd like this school to win" I start thinking about what would make the model look better. I've always been a bigger fan of baseball than of the other sports, and in this year's Major League Baseball playoffs four of the 10 teams were teams I'd done stuff with. I found I was rooting for them, even if they were playing teams I usually root for. I still enjoy it a lot, but the way I watch sports has changed.
What sports are doing the most to incorporate analytics into decision-making and which are playing catch-up?
I know the least about hockey, but my impression is that they're somewhat behind in analytics. Baseball is pretty far advanced, partly because it's such an individual-versus-individual matchup, so you don't have to worry about team effects and trying to decompose things like, if your team gained four yards on a running play, how much of the credit goes to the left guard and how much is interaction with other positions. Baseball is more straightforward. Basketball may have been behind for a while, but they seem to have caught up quite a bit, especially with the new SportVU technology that lets them track the ball and every player at 25 frames per second and really get a lot of insight from that.
How many teams have you worked with?
I've worked with five baseball teams, to varying degrees, and I've started working with a football team and a basketball team. I can do it all from here, so I don't need to travel for games. I'm not a scout - I don't need to see them play live.
Is your basketball model (the LRMC) a work in progress or do you think it's as refined as it's going to get?
I've had a handful of students looking at ways to improve the model, and they've come up with some very smart improvements over the years, but it turns out that as smart as they are and as correct as they are, there's really not much difference in the results. I have Ph.D. students working on this and we've discovered that there's sort of a limit. You can't predict better than 70-75% of the games, and the reason seems to be that there's such a big random component in college basketball relative to how precisely mathematical models and experts can pinpoint how good teams are. If you look at the Las Vegas spread, and they're the real experts, about a third of their games are off by 11 or more points. People's estimates of how good teams really are, relative to each other, are much more precise than plus or minus 11 points. So there's this big random component that's hard to get past.
I don't think that will be there forever, though. I think a lot of what we consider randomness is really human factors. For example, there's been some work done on travel and timing and how that impacts results. If you're a west coast team coming to play a game on the east coast at 1 pm, it feels like you're playing at 10 o'clock in the morning, and those teams don't do as well. Ten years ago we might have said that's randomness. Now we see it as a human factor, something the biggest bettors take into account, and something we can take into account in our models. Nate Silver, from fivethirtyeight.com, takes those factors into consideration in his basketball model. We don't - we rank the teams as if they're all playing at their peak. I think there's room to measure that day-to-day human variability effect, but right now we don't have access to that data in many cases. For now, I don't think there are going to be many improvements to any of the models, including ours.
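As a rough back-of-the-envelope illustration of that prediction ceiling (this is not part of the LRMC itself): if final margins scatter roughly normally around the Vegas spread, then the "off by 11 or more points about a third of the time" figure implies a standard deviation of roughly 11 points of game-to-game noise, which caps how often even a well-calibrated favorite can be picked correctly. The Python sketch below works through that arithmetic; the spreads in the loop are hypothetical examples, not data from the interview.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_z(tail_prob: float) -> float:
    """Find z > 0 with P(|Z| >= z) == tail_prob, by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 2.0 * (1.0 - normal_cdf(mid)) > tail_prob:
            lo = mid          # tail still too heavy: need a larger z
        else:
            hi = mid
    return (lo + hi) / 2.0

# Interview figure: about a third of games finish 11+ points away from the spread.
sigma = 11.0 / two_sided_z(1.0 / 3.0)   # implied game-to-game noise, roughly 11.4 points

# How often would a favorite of a given (hypothetical) size actually win?
for spread in (3, 5, 7, 10):
    win_prob = normal_cdf(spread / sigma)
    print(f"favored by {spread:>2} points -> wins about {win_prob:.0%} of the time")
```

Under those assumptions a 3-point favorite wins only about 60% of the time and even a 10-point favorite only about 80%, so picking favorites in mostly close tournament matchups naturally lands in the 70-75% range described above.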
How much could a bad performance from one player - maybe someone worried about an exam - wreck the model? That would seem to be very random.
I believe a lot of how a player will perform isn't really random - it's individual variation from day to day. If you could theoretically get data on class and exam schedules and things like that, someone might be able to crunch it and get a slightly more accurate model.
Do you have any success stories from your work with teams that you can point to that you felt validated your work?
There are some instances where I'd say yes, but I'm not actually allowed to talk about them.
But of my public work, I'd say this basketball model is probably one. We put it together after the 2002-03 season. What really spurred it was that Georgia Tech had played in one of those holiday tournaments against Tennessee and was ahead by a point with just a couple of seconds left. Then a Tennessee player hit a half-court shot and won the game. At the end of the regular season people normally say this team is on the bubble, or this team is in and this one is out. Georgia Tech was an out team, and a lot of the experts said if they had won one more game they would have put them on the bubble and they would have had a shot at the tournament. That made me think back to that game. If some guy hits a half-court shot, does it really say that Georgia Tech is a different team than if he'd missed the half-court shot, which is what usually happens?
After that year I started putting the model together. I knocked on the door of my colleague Paul Kvam, a statistics expert who is no longer at Tech. We put this model together and tested it on the fly for the first time the next year, in 2003-04. Before the NCAA tournament started that year, our model was basically the only one predicting Georgia Tech going to the Final Four. It was a little bit worrisome, because we were trying to make the case that we had this completely unbiased mathematical model, and here we were at Georgia Tech picking Georgia Tech to go to the Final Four, and nobody agreed with us. And of course, Tech made it to the Final Four that year. They made us look good. That, and some luck probably. Tech played lots of close games in that run, and every round we kept saying Tech was likely to win. That sort of put us on the map.
A few years after that our model correctly predicted the Final Four, the finalists, and the winner. Even the NIT winner. Again, that's probably more luck than anything else. I don't expect it to ever happen again because there's so much randomness, but it really helped in terms of getting us attention and getting people to pay attention to it.
Do you have many students pursuing careers in sports analytics? What sort of advice do you give them?
At the Ph.D. level, for a long time I was actively discouraging people from doing a sports analytics thesis. If they wanted to go into industry with it they would have been okay, but if they wanted to use it to go into academia I don't think they would have been taken seriously at the time. Lately I think that has changed. It's a more mainstream field to be working in, and I have a student doing a sports analytics thesis now, but he's not going to pursue a career in it. In our interdisciplinary Master of Science in Analytics degree, one of the students did go into sports analytics. He's working for a sports startup. I've also talked to some undergraduates and given them some advice along the way, and they're working in the field now.
Generally, the people that do this for a hobby, and do it well, are the ones getting cherry picked for jobs. They write about it on blogs and websites, get noticed, and get hired. It happened in baseball and football and basketball and it's been happening in hockey. That's the most sure-fire way to get noticed, but you have to have good stuff first. You have to stand out, because there are so many people blogging.

