Expected goals and Big Football Data: the statistics revolution that is here to stay

Paul MacInnes

The first time I came across the phrase “expected goals” was in November 2015. I had disappeared into a Twitter wormhole and when I emerged I was on a site called Statsbomb, reading an article entitled “Leicester City and their Trip to the Kamikaze Zone”.

At that point in their miracle season, Leicester were merely unlikely upstarts, third in the table with Jamie Vardy nine games into his goalscoring sequence. But the author of the article, Mohamed Mohamed, had identified something rather unusual about the Foxes. The rate at which they were scoring and conceding goals was incredibly high. It was a rate that, if continued until the end of the season, would make them only the fifth side to score more than 60 goals and concede more than 50 in the Premier League. Of the four other occasions such a feat had been achieved, one was by Brendan Rodgers’s title‑dropping Liverpool side and another two by Sir Bobby Robson’s Newcastle United. Clearly something unusual was happening but the kamikaze football was not backed up when you looked at the data – their expected goal difference ratio was only 0.5.

Expected what? Zero point, eh? I may not have realised it but at that moment I had stumbled on a new branch of football analytics, one created in public, often by the public, and one that seems likely to transform the way people watch and talk about the game. I was hardly the first to discover it – there was already a thriving digital community – but at the same time I had never heard it mentioned in any pub, football commentary or match report. I felt a bit like a naturalist might when they stumble across a silverback gorilla: scared perhaps, certainly wary, but compelled to keep looking.

What exactly are “expected goals” (or as the shorthand has it, xG)? Here is one of the men responsible for its development, Michael Caley, to define the metric in layman’s terms: “The idea is to quantify the likelihood of a goal being scored from a particular shot attempt (or other scoring chance). This is an idea that I think is quite intuitive. ‘We need to create better scoring chances’ is something managers have said forever, and xG is basically just a quantification of that notion. The broad concept has probably been around for a long time in football – Charles Reep’s notion that ‘one of every nine shots is scored’ is a sort of early version of xG.”

The key difference between the notion devised by the notorious post‑war football analyst and inventor of the long-ball game and Caley’s interpretation of it is that word “quantification”. Caley, who holds a doctorate in religious studies from Harvard, started dabbling in football data while he was a student. Now he writes about it full-time, his motto: “Bringing baseball stat nerdiness to football.” Like many of his fellow analyst-enthusiasts, Caley mines masses of football data to establish how likely any given chance might be to end up in a goal. It starts with the position of a chance – one shot in six inside the six-yard box might go in, for example – but it hardly ends there.

Here is Caley describing the variables that inform his own xG model. “Right now my model evaluates shot attempts across a variety of axes: where was the shot attempted from? What sort of pass assisted the shot? With what body part was the shot taken? Did the attacker dribble past his defender before trying the shot? How fast was the attacking move that led to the shot? Was the shot off a rebound or from a set play? All of these factors clearly influence the likelihood of scoring a goal. By aggregating this information into a model, I can estimate the likelihood of scoring different shooting chances in a match or over a season.”
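For the curious, the aggregation step can be sketched in a few lines of Python. This is a deliberately crude illustration and not Caley's model: it looks only at shot location, it borrows the two rough rates quoted above (one in six inside the six-yard box, Reep's one in nine elsewhere), and the names shot_probability, expected_goals and match_shots are invented for the example. A real model would fold in the other factors he lists, such as pass type, body part and speed of attack.

```python
from typing import Iterable

# Rough rates taken from the article, used purely for illustration.
SIX_YARD_BOX_RATE = 1 / 6  # "one shot in six inside the six-yard box"
FALLBACK_RATE = 1 / 9      # Reep's "one of every nine shots is scored"


def shot_probability(in_six_yard_box: bool) -> float:
    """Return an assumed chance of scoring for a single shot (location only)."""
    return SIX_YARD_BOX_RATE if in_six_yard_box else FALLBACK_RATE


def expected_goals(shots: Iterable[bool]) -> float:
    """Sum the per-shot probabilities to get a side's xG for a match."""
    return sum(shot_probability(in_box) for in_box in shots)


# Hypothetical match: nine shots, two of them from inside the six-yard box.
match_shots = [True, False, False, True, False, False, False, False, False]

print(round(expected_goals(match_shots), 2))  # 2 * 1/6 + 7 * 1/9 = 1.11
```

The point of the sum is that a side can "deserve" about one goal from nine modest chances without a single one of them being a sitter.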

Get it? Good. Because quantitative statistical analysis (let’s call it Big Football Data) is here to stay. Over the past decade we have become used to the analysis of football becoming ever more in-depth: from the revolutions around nutrition and fitness within the game to Gary Neville and his telestrator outside it. Big Football Data is another leap altogether, however, and while in one sense it simply mirrors developments in many other industries, from retail to logistics, there is some question as to whether the football world, both clubs and supporters, is ready for an approach that essentially says “I value your opinion, but these are the facts”. For an illustration of this one could do worse than Google a recent encounter on US TV between the journalist Gabriele Marcotti and ex-pro Craig Burley, one which began with Marcotti mentioning xG and ended with Burley bawling “enough of this nerd nonsense!”

Ted Knutson is the man who runs Statsbomb, the name for both the website and a proprietary set of analytic tools that Knutson hopes to lease to professional football clubs. Like Caley, Knutson is an American, born in Chicago, but he lives in the UK and became best known as the man crunching the numbers for Matthew Benham, the owner of Brentford FC and an evangelist for a data-based approach to running football clubs. Knutson’s Twitter feed can sometimes seem like a series of almost psychic prognostications (he called out my club, Norwich City, as being in a false position in the Championship approximately two days before they began their descent from the top of the table to outside the play-off places). But while Knutson, to paraphrase the Simpsons, might welcome our new data overlords, he insists that the rest of us also have nothing to fear.

“It’s certainly true that some people inside the game are reluctant to embrace data,” Knutson says. “There are plenty of holdouts and that is totally understandable. Some people feel that the use of data suggests that their last 30 years of knowledge are irrelevant. But: one, that’s not true; and two, you need to be open to new ideas to improve. The game has always been about opinions and plenty of people have valid opinions based on years of experience. Analytics is about enhancing that experience.”

The genesis of expected goals most likely lies with Opta, the data company that has been analysing football matches since 2001, recording all the information that for years has appeared in the small statistical summaries that round up each match on TV and in the papers. According to Caley, it was two of the company’s analysts, Sam Green and Devin Pleuler, who first began modelling xG in the late noughties. Such is the complicated and often cumulative nature of this research, however, that Sarah Rudd of StatDNA, also an American, was working on similar models at the same time.

To see how analytics now permeates the professional game it is worth knowing that StatDNA was bought by Arsenal in 2014, its research now incorporated wholesale into the club’s decision making. Green, meanwhile, went on to work for Aston Villa while Pleuler is head of analytics at the MLS side Toronto FC. Another example of the way the wind is blowing was the appointment of Michael Edwards, a data analyst, as Liverpool’s new sporting director. While Knutson’s point regarding resistance within the game is no doubt correct, it is surely the case that the influence of Big Football Data will only grow.

There are obvious advantages for professional clubs in getting analytics right, such are the tight margins of competition and the potential financial rewards. According to Knutson: “There’s so much money involved in player recruitment that if you stop one transfer mistake a season it pays for your analysis team for years.”

But enough about the clubs, what about the fans? The Dutch football data analyst who goes by the Twitter handle 11Tegen11 describes the service he offers as going “from watching a football match to seeing a football match …” After each weekend he posts composite images that show xG for both sides in any given game, each effort on goal mapped on to the pitch (the bigger the blob, the better the chance) and his equally fascinating pass maps that map aggregate positions of players on the pitch and the passing links with their team-mates (again, the bigger the blob, the more times they have touched the ball). Similar maps have started to pop up on TV highlights shows, while last week Major League Soccer in the US announced it will make xG data on all its fixtures available after every match. Big Football Data is starting to go mainstream.

For Caley, the application of analytics need not change the way people enjoy the game. “I’d want to think about ‘analytics in media’ not as some radical change”, he says, “but where some of the things people in media already try to do, like craft compelling stories about football that describe the game well, are enhanced by analytics.”

Knutson, however, believes we will start watching (or seeing) games differently, partly as a result of a changing audience. “I think people will start watching football differently”, he says. “Posting xG on Match of the Day will be a talking point, it will give them something to argue about and get a rise out of fans. The data will show you what happens over the course of a million shots and experts can tell you why.

“It will also bring a new kind of fan to the game, absolutely. This kind of data overlaps with people who play Football Manager or fantasy league. It’s exactly the same thing, you take information about players and you evaluate them. If you want to be better in fantasy league then you need data. In the US almost everyone grows up with fantasy sports as much as the real thing now.”

Leicester, you may have heard, went on to win the league in 2016. In the new year and with a lot of points under their belt, Claudio Ranieri adapted his side’s tactics and a more conservative approach resulted in them grinding out a series of 1-0 wins. By the end of the season, Leicester had scored 68 goals, but conceded only 36. That gave them a goal difference ratio of 0.52.
