Manchester United have decided to take a chance on Ruben Amorim to lead their 25/26 campaign, which is due to start in less than a month's time. With the clock ticking ever closer to their first game against Arsenal (featuring a striker quite familiar to the Iberian manager), is there still enough time to address a position that has been a point of concern for the past few seasons? With seemingly every striker target this summer signing for rivals, we take a stab at finding suitable alternatives for the sake of discussion.
In this article we take a data-driven approach to do just that. We first detail our methodology, then apply it to the target player, Hugo Ekitike, then examine candidate alternatives, and conclude with a summary of our findings and a recommendation.
This portion of the report details the methodology used to identify a suitable and attainable alternative for a target forward, Hugo Ekitike. It is highly technical; if you'd prefer to go straight to the results, skip to the "Target Player Analysis" section.
The methodology consists of two main phases:
Similarity Modelling: Using unsupervised machine learning, we identify a cohort of players with statistically similar playstyles to our target.
Performance Analysis: We conduct a deep-dive analysis of the target and the most promising alternatives, evaluating their performance through a series of visualizations which cover shot selection, finishing quality, and overall statistical output.
The objective is to move beyond simple goal and assist counts, xG and xA(G) comparison, and other previous attempts to utilize quantitative and qualitative models. Our aim is to find a player who can truly replicate the target's specific role and contribution to their team.
Our analysis is built on a multi-step quantitative framework.
To find players who are truly similar, we go beyond surface-level statistics, using some of the most comprehensive football datasets available for the 24/25 season of the big 5 leagues (EPL, La Liga, Serie A, Bundesliga and Ligue 1). The following steps are applied to this data to find players similar to the target:
Dimensionality Reduction (PCA): A player is defined by dozens of statistics (e.g., passes, shots, tackles, pressures). Principal Component Analysis (PCA) distils these stats into a smaller set of "Principal Components." Each component represents a distinct, abstract aspect of a player's style ("high-volume shooter," "creative passer", etc.). This allows us to compare players based on their fundamental style rather than noisy individual metrics selected subjectively (remember, I am an amateur football analyst doing this as a hobby).
Clustering (K-Means): Using these principal components (PCs), we apply K-Means clustering to group all players in the dataset into distinct clusters. Players within the same cluster share a similar overall playstyle based on these principal components of our player model. This serves as our first filter for identifying alternatives.
Euclidean Distance: Within the target player's cluster, we calculate the Euclidean distance between our target and every other player. This is a "straight-line" measure in the 2D component space (we only use the top 2 PCs for the cluster plot). Players with the smallest distance are the most direct stylistic matches, providing a ranked list of alternatives.
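The three steps above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic data, not the exact code behind this study: the player names, the feature matrix, the number of clusters, and the choice to standardize features before PCA are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def rank_similar_players(names, features, target, n_clusters=4, seed=0):
    """Return (cluster labels, [(name, distance), ...]) within the target's cluster."""
    # Assumption: standardize first so large-valued stats do not dominate the components.
    scaled = StandardScaler().fit_transform(features)
    # Keep the top two principal components, matching the 2D cluster plot.
    coords = PCA(n_components=2).fit_transform(scaled)
    # Group every player into one of n_clusters clusters in component space.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)

    t = names.index(target)
    same_cluster = [i for i in range(len(names)) if labels[i] == labels[t] and i != t]
    # Straight-line (Euclidean) distance to the target in the 2D component space.
    dists = np.linalg.norm(coords[same_cluster] - coords[t], axis=1)
    order = np.argsort(dists)
    ranked = [(names[same_cluster[i]], float(dists[i])) for i in order]
    return labels, ranked


# Synthetic stand-in data: 60 hypothetical players with 8 made-up features each.
rng = np.random.default_rng(42)
names = [f"player_{i}" for i in range(60)]
features = rng.normal(size=(60, 8))
labels, ranked = rank_similar_players(names, features, target="player_0")
print(ranked[:5])  # closest stylistic matches first
```

Ranking only within the target's cluster mirrors the "first filter" idea: the cluster narrows the pool, and the distance orders whoever is left.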
Once similar players are identified, we must profile them to understand the nuances of their performances for their respective clubs in the 24/25 season.
Feature Scaled Radar Chart: This provides a 360-degree view of a player's strengths and weaknesses. Crucially, it uses min-max normalization, showing how the player performs in key metrics relative to all their peers. This standardized scale (0-100) allows for fair and direct comparison between players, even on metrics with different native scales (e.g., raw counts like goals scored in a season vs. per-90 stats).
Shot Analysis: This answers two critical questions: "Where do they shoot from?" and "How well do they finish?"
Average Shot Distance & Top Shot Zones: These plots visualize a player's shooting habits. We can identify if they are a "poacher" who shoots from close range or someone who prefers shots from distance. We specifically highlight the top 5 most effective shot zones, ranked by average Post-Shot Expected Goals (PSxG), to see where they generate the most quality chances.
Shot Outcome & Quality (PSxG) Breakdowns: These plots detail a player's finishing ability. We analyse the frequency of goals, saves, and blocks for each body part (foot, head). More importantly, by looking at the average PSxG by body part, we can assess their finishing quality independent of luck (more on PSxG here).
Activity Heatmap (courtesy of Sofascore): To understand a player's positioning, pitch-side bias, and work rate, a heatmap of their on-field touches (sourced from Sofascore) is used as a qualitative supplement.
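The shot-profile numbers behind these plots boil down to a few grouped aggregations. The sketch below uses pandas on a tiny, made-up shot log; the column names, zone labels, and values are hypothetical and do not reflect the schema of any particular data source.

```python
import pandas as pd

# Hypothetical shot log; every column name and value here is illustrative only.
shots = pd.DataFrame({
    "distance_m": [8.0, 14.0, 22.0, 6.0, 11.0, 18.0],
    "body_part": ["foot", "foot", "foot", "head", "head", "foot"],
    "zone": ["six_yard", "penalty_area", "outside_box",
             "six_yard", "penalty_area", "outside_box"],
    "outcome": ["goal", "saved", "blocked", "goal", "saved", "saved"],
    "psxg": [0.62, 0.30, 0.00, 0.55, 0.21, 0.08],
})

# "Where do they shoot from?"
avg_distance = shots["distance_m"].mean()
# Top zones ranked by average PSxG (at most five are kept).
top_zones = shots.groupby("zone")["psxg"].mean().sort_values(ascending=False).head(5)

# "How well do they finish?"
# Frequency of each outcome per body part, with missing combinations filled as 0.
outcome_counts = shots.groupby(["body_part", "outcome"]).size().unstack(fill_value=0)
# Average shot quality per body part.
psxg_by_body_part = shots.groupby("body_part")["psxg"].mean()

print(top_zones)
print(outcome_counts)
print(psxg_by_body_part)
```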
Ekitike falls into cluster 2 in our findings. This cluster is characterized by players with a high goal threat who also play a significant role in their team's creative buildup.
K-Means Player Clusters:
TIP: right click images and select "Magnify Image" (if you're on Microsoft Edge) or "Open Image in New Tab" (and open the downloaded webp) to zoom in
I'm not a fan of webp either, I'm sorry...
For commentary on the features of the principal components and validation of our model, please refer to the Appendix section.
The Euclidean distance analysis reveals the following players as his closest stylistic matches:
Top 5 Most Similar Players:
Player | Distance | Cluster |
---|---|---|
Serhou Guirassy | 8.450887 | 2 |
Valentín Castellanos | 9.324525 | 2 |
Moise Kean | 9.513885 | 2 |
Ollie Watkins | 9.583903 | 2 |
Emanuel Emegha | 9.994472 | 2 |
See Appendix for a more complete list of players.
Performance Radar Chart:
The Radar chart captures how well the player performs as a forward. Min-Max Normalization not only scales each metric for easier interpretation but also implicitly compares the player to other forwards in our dataset. However, these values should serve as qualitative "scores" rather than absolute values (e.g., the chart below doesn't plot Ekitike's absolute xG; it's a score applied to him based on his and other forwards' xG).
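To make the scoring concrete, here is a small sketch of min-max normalization to a 0-100 scale. The function name, the three forwards, and their numbers are invented for illustration; the point is only that each score depends on the best and worst values among the peers in the table.

```python
import pandas as pd


def min_max_scores(stats: pd.DataFrame) -> pd.DataFrame:
    """Scale each metric column to 0-100 relative to all players in the frame."""
    lo, hi = stats.min(), stats.max()
    # Guard against division by zero when every player has the same value.
    span = (hi - lo).replace(0, 1)
    return (stats - lo) / span * 100


# Hypothetical forwards and metrics, on deliberately different native scales.
forwards = pd.DataFrame(
    {"xg": [4.0, 10.0, 16.0], "key_passes_per90": [0.5, 2.0, 1.25]},
    index=["forward_a", "forward_b", "forward_c"],
)
scores = min_max_scores(forwards)
print(scores)
```

Here `forward_b` scores 50 for xG because 10.0 sits halfway between the lowest (4.0) and highest (16.0) values in the group, not because of anything about the raw figure itself.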
Shot Profile:
Typical Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
Ekitike's heatmap reflects his work rate and typical on-pitch contributions. There is a bias in and around the box, but this heatmap also shows how prevalent his contributions are across the pitch.
Hugo Ekitike appears to be a brilliant player, and Liverpool must have done their due diligence when deciding to take a gamble on him. We believe the analysis presented so far shows why they likely felt comfortable taking this risk: he has a very high xG relative to his peers, contributes to many stages/aspects of play, and demonstrates high work rate and creative output.
When examining the cluster graph closely, there are not many forwards in the big 5 leagues who pose a bigger goal threat (PC1) while also showing moderate creativity for a center forward (who can surely slot into an AM role, pun unintended).
This poses a challenge when finding an alternative that has the potential to provide the goal-scoring ability and creativity that United desperately needs. Essentially, there are not many players like him in the top 5 leagues (further demonstrated by the cluster graph). We did our best to identify players we think are worth investigating.
One of the best players identified, he can be classified as a "pure striker." He contributes little to the buildup of attacks but provides a high goal threat for his team.
Performance Radar Chart:
Shot Profile:
Average Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
Similar to Guirassy, but contributes even less to the attack besides direct goal threat.
Performance Radar Chart:
Shot Profile:
Average Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
Weaker candidate for the CF role based on last season's performances, with a left-side bias in his heatmap.
Performance Radar Chart:
Shot Profile:
Average Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
A surprise in our analysis, this player displays very high creative output but lower than expected goal threat for a striker. He also seems to take corners for his team, which may contribute to the higher creative output.
Performance Radar Chart:
Shot Profile:
Average Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
Probably the second player I'd like to highlight on this list. He exhibits a surprisingly high goal threat relative to other forwards in this dataset. He also demonstrates high-quality shots with both feet and his head at relatively high volumes (scoring more goals on his weaker foot in the league). Creative contribution to his team is relatively low, but he could serve as an alternative to Guirassy, Kean, and Watkins (all of whom may be unattainable).
Performance Radar Chart:
Shot Profile:
Average Shot Distance | Top 5 Shot Zones by Quality |
---|---|
![]() | ![]() |
Finishing Analysis:
Shot Outcome Frequency | Average Shot Quality by Body Part |
---|---|
![]() | ![]() |
Activity Heatmap:
It has been a challenge to find a player who shows both the same level of goal threat and creative output as Hugo Ekitike. This rarity probably explains Liverpool's willingness to pay over €90m for him. Similar to Roberto Firmino and the late Diogo Jota, he appears to be a player who can drop deep and assist in stages of play beyond the final third. His notably high goal threat would surely have been an attractive trait for the Reds, who would no doubt like to retain their Premier League title by bolstering their squad further.
This, however, makes the task of finding an alternative rather difficult. While there are a few strikers who show a similar level of goal threat, and others showing promise to do so, no center forward shared Ekitike's high level of creative output.
Serhou Guirassy showed a similarly high level of goal threat but lacked Ekitike's creative output; the same can be said for Moise Kean. Ollie Watkins seemed underwhelming on all fronts compared to the other aforementioned players. Even ignoring these points, each of these alternatives is likely unattainable for United given their current budget: Guirassy recently signed for Dortmund, who are notorious for pricing their players aggressively. Kean's release clause recently expired, and he has four years left on his contract. Watkins has three years left on his, and his club has issued a "hands-off" message for this player.
Emanuel Emegha also has three years left on his deal with Ligue 1 side RC Strasbourg, but his club may not demand a high fee for him (Clearlake Capital ownership and its implications were considered). He also showed a surprisingly high level of goal threat last season (though still low creative output) and performed high-quality shots with both feet and his head at relatively high volumes. He may not be a 1:1 alternative, but he might be a good goalscoring option for United, and he is therefore my pick. Otherwise, they might be perfectly fine with waiting for better options to become available or looking at talent outside the top 5 leagues.
The principal components for our clustering model are thus:
Since our dataset was filtered for forwards only (as defined by FBref), the features used in our model reflect qualities typically sought in a forward. The first principal component seems to capture direct goal-scoring threat, whereas the second captures creative threat. For more on PCA, see here.
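One common way to arrive at this kind of interpretation is to inspect the component loadings, i.e. how strongly each input feature contributes to each principal component. The sketch below demonstrates the idea on synthetic data built from two hypothetical latent traits; the feature names and the data are assumptions for illustration, not the features or values used in this study.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Two made-up latent traits that drive the observable features.
goal_threat = rng.normal(size=n)
creativity = rng.normal(size=n)


def noisy(signal):
    """Return the latent signal plus a little independent noise."""
    return signal + 0.2 * rng.normal(size=n)


data = pd.DataFrame({
    "shots": noisy(goal_threat),
    "xg": noisy(goal_threat),
    "goals": noisy(goal_threat),
    "key_passes": noisy(creativity),
    "xa": noisy(creativity),
})

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(data))
# Rows are features, columns are components; large absolute values mean strong influence.
loadings = pd.DataFrame(pca.components_.T, index=data.columns, columns=["PC1", "PC2"])
print(loadings.round(2))
print(pca.explained_variance_ratio_.round(2))
```

In this toy setup the shooting features load mostly on PC1 and the passing features on PC2, which is the same kind of pattern that leads us to label the components "goal threat" and "creative threat". The sign of a loading is arbitrary, so only its magnitude should be read.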
Features from FBref and Understat were used for this analysis. To train the clustering model, we omitted per-90 metrics, discipline metrics, and redundant features as part of the cleaning process to improve variance in our player model (aiding in finding clusters). Data from WhoScored would also have been helpful, but there seems to be an issue with the corresponding SoccerData scraper class.
For more on the Davies-Bouldin and Calinski-Harabasz indices, see here and here.
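As a rough illustration of this cleaning step, the sketch below drops per-90 and discipline columns by name and then removes features that are almost perfectly correlated with an earlier one. The column names, the `_per90` naming convention, and the 0.95 correlation threshold are all assumptions for the example; "redundant" could reasonably be defined in other ways.

```python
import numpy as np
import pandas as pd


def clean_features(df: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    """Drop per-90, discipline, and highly correlated (redundant) columns."""
    # Hypothetical naming convention for per-90 and discipline metrics.
    discipline = {"yellow_cards", "red_cards"}
    drop = [c for c in df.columns if c.endswith("_per90") or c in discipline]
    kept = df.drop(columns=drop)

    # Look only at the upper triangle so each pair of features is checked once.
    corr = kept.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is redundant if it is highly correlated with any earlier column.
    redundant = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return kept.drop(columns=redundant)


# Made-up season totals for four hypothetical forwards.
raw = pd.DataFrame({
    "xg": [2.0, 4.0, 8.0, 3.0],
    "npxg": [1.8, 3.6, 7.2, 2.7],        # nearly a copy of xg, so redundant here
    "key_passes": [30, 10, 20, 40],
    "xg_per90": [0.10, 0.20, 0.40, 0.15],
    "yellow_cards": [3, 1, 5, 2],
})
cleaned = clean_features(raw)
print(list(cleaned.columns))
```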
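For readers who want to try these two indices themselves, scikit-learn exposes both as `davies_bouldin_score` and `calinski_harabasz_score`. The sketch below compares several values of k on synthetic blobs standing in for players in component space; the data and the range of k are assumptions for the example, not the validation run from this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

rng = np.random.default_rng(1)
# Three well-separated synthetic blobs standing in for players in PC space.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
coords = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
    results[k] = (
        davies_bouldin_score(coords, labels),     # lower is better
        calinski_harabasz_score(coords, labels),  # higher is better
    )

best_k_db = min(results, key=lambda k: results[k][0])
best_k_ch = max(results, key=lambda k: results[k][1])
print(best_k_db, best_k_ch)
```

Because the two indices reward slightly different things (compact, well-separated clusters versus a high ratio of between-cluster to within-cluster spread), checking that they agree on k is a reasonable sanity check before trusting the clusters.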
Here is a more complete list of players similar to Ekitike:
As mentioned previously, all data used for this study comes from FBref and Understat. The latter data source only covers the top five leagues, making it difficult to use the two together to find talent outside of these leagues.
One improvement would be to incorporate other data sources, drop Understat, and cover more leagues (such as the Eredivisie and Portugal's Primeira Liga). Some technical issues made me put this on hold, however:
the WhoScored datasource for SoccerData does not expose the same functions required for this study
this will likely require extra dev work on my end to extend SoccerData's features, something I'd like to avoid
there will need to be a restructuring of the codebase due to dropping Understat for WhoScored
One other improvement is to focus the Player Similarity Modelling portion of this study away from goal threat and creative output and more on "style of play". This will allow the analysis to possibly suggest players outside of their clusters (as they are right now) which have similar styles of play, possibly helping to identify players who could be alternatives given extra coaching or a change in environment.
SoccerData: the tool used to scrape the datasources used for this study.
FBref: the team that provides the football world with all the beautiful data used in this study.
Understat: the team that provides even more of the beautiful data used in this study.
unearthed.eth