Clustering Legends of Runeterra archetypes
Clustering Legends of Runeterra archetypes#
Goal#
Given a list of n
Legends of Runeterra decks, how to automatically group them into archetypes, ie groups of similar decks?
Clustering decks as archetypes allows for better calculation of metrics like win-rate and play-rate. This is a problem that comes back for every card game I have ever played, and is usually “simply” solved by humans sort the decks themselves.
But what if deck clustering could be automated?
Approaches#
Intuition#
I started to think about how to “simply” group decklists. Two ways came to mind:
Choosing decks that have a pool of common cards
Choosing decks that have few different cards
Whichever distance we choose, we can start from individual decks and add new decks one by one, selecting the “closest” one to the cluster at each step. We can do this process in descending popularity order, so ties are broken by the popularity of decks. We can then select the biggest non-overlapping clusters found as good archetypes and re-start the process.
It does sound pretty close to k-means as it also relies on a centroid vector for each cluster, but here we don’t have to select how many clusters we want from the start. We can also have clusters of very different sizes and it will not be an issue with our approach.
We’ll call that method incremental constrained cluster growth and it’s the one I’ll be implementing in this blog post.
Existing clustering algorithms#
Clustery analysis is a complex and heavily researched domain. Algorithms usually exhibit O(n**3)
complexity.
The most popular clustering algorithm, or at least the only one I remember from my M.Sc., is k-means. It’s unfortunately a pretty poor fit here as we don’t know the number of archetypes/clusters we are looking for, and it’s also poorly adapted for clusters of vastly different sizes.
Hierarchical clustering is a better fit for our use-case:
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.
Deck archetypes naturally have multiple levels of “granularity” we can look at. From big macro archetypes being defined only by a few cards to micro archetypes arising from very few card differences. Ordering them in a hierarchy sounds like a great fit even though it will be harder to use for visualisation and analysis.
I will try hierarchical clustering and visualisation at some point, but I’m keeping it for a future blog post!
Preparing data#
Games selection#
As Legends of Runeterra does not offer access to ranks through its API, I use TrueSkill to identify the best players on a given server and parse them in descending skill order. I only use ranked results to determine skill ratings.
I verified the algorithm by verifying that the best players identified by TrueSkill were indeed in Master rank. We can do that by checking account names, which is the only type of rank data available.
This allows me to parse games for the ~20 000 best players per server, which returns ~35 000 ranked games per day per server. As there are 3 servers in total, we have ~100 000 games per day of data. This likely covers rank until platinum, and maybe even gold. But we have no way to check as LoR’s API doesn’t let us query ranks for players ¯\(ツ)/¯
SQL query#
The data parsed from Riot’s API is susceptible to regular model changes and I’ve therefore saved it as a JSONB
column in Postgres
.
I have written that query in many different ways but in the end making a subquery to create a player
table was the easiest for me to maintain. I’m sure this query can be written in a smarter way but I’m still only starting to get used to JSONB
direct querying with Postgres:
SELECT
deck_code,
factions,
COUNT(*) as games,
COUNT(CASE WHEN win::boolean THEN 1 END) as wins
FROM (
SELECT
jsonb_array_elements(lor_game.data->'info'->'players')->>'win' as win,
jsonb_array_elements(lor_game.data->'info'->'players')->>'deck_code' as deck_code,
jsonb_array_elements(lor_game.data->'info'->'players')->>'factions' as factions
FROM lor_game
WHERE lor_game.data->'info'->>'game_version' = 'live-green-3-13-42'
) as player
GROUP BY deck_code, factions
HAVING COUNT(*) > 10
ORDER BY games DESC
Result:
Deck code |
Factions |
Games |
Wins |
---|---|---|---|
|
[“faction_Bilgewater_Name”, “faction_Noxus_Name”] |
39880 |
24067 |
|
[“faction_Demacia_Name”, “faction_Shurima_Name”] |
24118 |
12756 |
|
[“faction_Jhin_Name”, “faction_Noxus_Name”] |
16387 |
9315 |
|
[“faction_Demacia_Name”, “faction_Shurima_Name”] |
16153 |
9146 |
… |
… |
… |
… |
# Loading our env variables before we load our library
import dotenv
dotenv.load_dotenv(verbose=True, override=True)
# We checked for the latest patch value by looking at the most recent games
latest_version = "live-green-3-13-42"
# Connecting to the database
from neotokyo import db
session = db.connection.ghost_session_maker.session_maker()
# ORM
import sqlalchemy
from sqlalchemy.dialects import postgresql
# Defining our subquery, already filtered on the right game version
# Tbh the postgres JSONB syntax in SQLAlchemy is pretty disgusting
player_table = (
sqlalchemy.select(
sqlalchemy.func.jsonb_array_elements(
db.models.LorGame.data["info"]["players"],
type_=postgresql.JSONB,
)["deck_code"].astext.label("deck_code"),
sqlalchemy.func.jsonb_array_elements(
db.models.LorGame.data["info"]["players"],
type_=postgresql.JSONB,
)["factions"].astext.label("factions"),
sqlalchemy.func.jsonb_array_elements(
db.models.LorGame.data["info"]["players"],
type_=postgresql.JSONB,
)["win"].astext.label("win"),
)
# We put our patch limit here so the subquery return is smaller
.where(
db.models.LorGame.data["info"]["game_version"].astext == latest_version,
)
.subquery()
)
games_count = sqlalchemy.func.count().label("games_count")
wins = sqlalchemy.func.count(
sqlalchemy.case(
(player_table.c.win == "true", 1),
)
).label("wins")
# The disgusting player_table code lets us write a pretty clean and readable query at least!
query = (
sqlalchemy.select(
player_table.c.deck_code,
player_table.c.factions,
games_count,
wins,
)
.order_by(games_count.desc())
.group_by(
player_table.c.deck_code,
player_table.c.factions,
)
.having(games_count > 10)
)
# We get the results as a list
latest_patch_decks = session.execute(query).all()
print(f"Games found: {sum(dd.games_count for dd in latest_patch_decks):,}")
/home/tolki/.cache/pypoetry/virtualenvs/neotokyo-vP6WhsAw-py3.10/lib/python3.10/site-packages/aioredis/connection.py:68: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
hiredis_version = StrictVersion(hiredis.__version__)
/home/tolki/.cache/pypoetry/virtualenvs/neotokyo-vP6WhsAw-py3.10/lib/python3.10/site-packages/aioredis/connection.py:69: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if hiredis_version < StrictVersion("1.0.0"):
Games found: 2,639,173
Deck codes to deck lists#
Riot gives us deck codes, but we want a list of cards to easily compute the distance. Technically one could implement the algorithm in pure SQL, but it’s quite a pain.
We will instead use lor-deckcodes to transform deck codes into a list of card codes and counts, and will then transform card cards into card names using our own database.
To store decklists in an easy to compare format we will use a set of 40 strings with card names postfixed by their occurence number:
{
Veigar-1,
Veigar-2,
Veigar-3,
Senna-1,
...
}
# Defining constants
DECK_SIZE = 40
# Adding a small utility to compute card names
card_cache = {}
def get_card_name(code: str) -> str:
if code not in card_cache:
card_cache[code] = session.get(db.models.LorCard, code).data["name"]
return card_cache[code]
from typing import Set
# A small function to have more readable card lists
# We don't do it based on deck codes/decklists because we want to use it with archetypes later
def cards_table(cards: Set[str]) -> str:
"""Beautiful output of cards"""
import tabulate
# We could do all that with complex list comprehensions but the gain in performance is not worth the loss in readability
tabulate_input = []
current_row = []
added_cards = set()
for card in sorted(
cards,
# The last character is the # of copies and we want to go in descending order
key=lambda x: x[-1],
reverse=True,
):
card_name = card[:-2]
# If we already added the card we continue
if card_name in added_cards:
continue
added_cards.add(card_name)
current_row.append(f"{card[-1]}x {card_name}")
# We put only 4 cards per row
if len(current_row) == 4:
tabulate_input.append(current_row)
current_row = []
if current_row:
tabulate_input.append(current_row)
return tabulate.tabulate(tabulate_input, tablefmt="html")
# Making a dataclass to hold our information
from dataclasses import dataclass
# Utilities for outputting HTML
from IPython.display import HTML, display
# A dataclass automatically creates its own __init__ functions from type hints which is nice
@dataclass
class DeckData:
deck_code: str
factions: str
cards: Set[str]
games_count: int
wins: int
@property
def winrate(self):
return self.wins / self.games_count
def display(self):
display(
HTML(
f"""<h3>{self.deck_code}</b></h2>
<ul><li>Games: {self.games_count:,}</li>
<li>Winrate: {self.winrate*100:.2f}%</li>
<li>Factions: {self.factions}</li>
<li>Decklist: {cards_table(self.cards)}</li>"""
)
)
import lor_deckcodes
decks_data = {}
for row in latest_patch_decks:
deck = lor_deckcodes.LoRDeck.from_deckcode(row.deck_code)
# We store decklist as a set of 40 unique strings as it lets use the intersection operator
# I tried storing decklists as sets of unique integers but saw no significant performance improvement
cards = set()
for card in deck.cards:
for i in range(card.count):
cards.add(f"{get_card_name(card.card_code)}-{i+1}")
# Making sure we do have 40 unique card strings in our set
assert len(cards) == DECK_SIZE
# Saving all the data we got for this specific deck code
decks_data[row.deck_code] = DeckData(
deck_code=row.deck_code,
factions=row.factions,
games_count=row.games_count,
wins=row.wins,
cards=cards,
)
print(f"Decks found: {len(decks_data):,}")
first_deck_code = next(iter(decks_data))
decks_data[first_deck_code].display()
print("Raw cards data: ", decks_data[first_deck_code].cards)
Decks found: 46,092
CICQCAQDAMAQKBQBAEDAMFIEAIDBMGRGHICQCAYCBQHSKKACAEBAMLIBAYDB4AA
- Games: 80,571
- Winrate: 60.21%
- Factions: ["faction_Bilgewater_Name", "faction_Noxus_Name"]
- Decklist:
3x Zap Sprayfin 3x Marai Warden 3x Legion Grenadier 3x Miss Fortune
3x Decimate 3x Precious Pet 3x Legion Saboteur 3x Twisted Fate
3x Noxian Fervor 3x Riptide Sermon 3x Legion Rearguard 3x Jagged Butcher
2x Make it Rain 2x Eye of Nagakabouros
3x Zap Sprayfin | 3x Marai Warden | 3x Legion Grenadier | 3x Miss Fortune |
3x Decimate | 3x Precious Pet | 3x Legion Saboteur | 3x Twisted Fate |
3x Noxian Fervor | 3x Riptide Sermon | 3x Legion Rearguard | 3x Jagged Butcher |
2x Make it Rain | 2x Eye of Nagakabouros |
Raw cards data: {'Noxian Fervor-1', 'Zap Sprayfin-2', 'Zap Sprayfin-3', 'Legion Rearguard-1', 'Make it Rain-1', 'Decimate-2', 'Marai Warden-3', 'Legion Grenadier-3', 'Miss Fortune-2', 'Eye of Nagakabouros-1', 'Miss Fortune-3', 'Legion Grenadier-1', 'Decimate-3', 'Precious Pet-3', 'Jagged Butcher-1', 'Zap Sprayfin-1', 'Riptide Sermon-2', 'Legion Saboteur-3', 'Twisted Fate-2', 'Twisted Fate-3', 'Precious Pet-2', 'Noxian Fervor-3', 'Legion Rearguard-2', 'Miss Fortune-1', 'Legion Saboteur-2', 'Make it Rain-2', 'Eye of Nagakabouros-2', 'Noxian Fervor-2', 'Marai Warden-2', 'Marai Warden-1', 'Riptide Sermon-1', 'Jagged Butcher-2', 'Riptide Sermon-3', 'Legion Grenadier-2', 'Decimate-1', 'Legion Saboteur-1', 'Legion Rearguard-3', 'Precious Pet-1', 'Twisted Fate-1', 'Jagged Butcher-3'}
Implementing incremental constrained cluster growth#
Basic idea#
Define the distance from a deck to a cluster
Iterate on decks in descending popularity (# games)
For each deck
Create a cluster containing only this deck
Add decks one by one by, selecting the one with shortest distance to the cluster each step
Add non-overlapping clusters in descending size
Re-start iteration with any remaining decks
The archetype will be the aggregate decklist as defined by Frank Karsten’s in this ChannelFireball article.
Things that didn’t work out#
With a direct implementation, computational time was through the roof at ~24h with ~20,000 decks
Complexity
O(n**3)
does that
Even with multiple optimisations and shortcuts the method was still way too heavy and took multiple hours for each iteration
I tried an idea I called fast cutoff, which would directly select a cluster if it contained more than 1% of all decks
It did heavily speed up the process at the cost of clustering quality
I was able to not need it anymore once I fixed my code
Deck factions#
The most important thing to add was “pre-clustering”. O(n**3)
complexity means that if we’re able to split the data into 100 sets, complexity goes down by 1,000,000.
And there are 100 obvious sets for our data: the deck’s factions. Those are similar to deck colors in Magic or hero in Hearthstone.
Forcing all decks in an archetypes to be the same factions does lose some granularity for decks that just splash 3/6 cards in a different faction, but it’s actually something we want. The end goal is identifying success of different strategies, and changing a splash in a deck is a new strategy.
The simplification also allowed me to drop the fast cutoff idea as I was working with much smaller lists of decks to cluster at each step.
from collections import defaultdict
# Creating faction groups
# We use lists and not sets because we want to keep the decks ordered by count
factions_decks = defaultdict(list)
for deck_code in decks_data:
factions_decks[decks_data[deck_code].factions].append(deck_code)
print(f"Found {len(factions_decks):,} different factions")
max_faction = max(factions_decks, key=lambda x: len(factions_decks[x]))
print(f"The faction with the most decks is", max_faction)
Found 88 different factions
The faction with the most decks is ["faction_BandleCity_Name", "faction_ShadowIsles_Name"]
Distance and clusters#
Distance between two decks#
Distance is calculated through the intersection operator for sets applied on the set of cards in the deck: &
.
This works as each copy (1-2-3) is defined as a unique string in our decklists.
Distance from a deck to a cluster#
The distance from a deck to a cluster is 0 if adding the new deck does not change the intersection of existing decks in the cluster. If it requires removing a card from the intersection to add the new deck, it’s 1, and so on and so forth.
I initially implemented it wrongly, which cost me a lot of time.
The right way to calculate the distance from a deck to a cluster is to use the cluster’s intersection and compare it to the decklist.
Implementation#
What really matters to us is the distance of a deck to a cluster, so it’s even easier to implement all those as part of a Cluster
class.
While we’re at it we implement the aggregate decklist code in that class as well as some basic stats handling and a beautiful display!
from abc import abstractmethod
from collections import Counter
from typing import Optional
type_cache = {}
def get_card_type(name: str) -> str:
if name not in type_cache:
card = (
session.query(db.models.LorCard)
.filter(db.models.LorCard.data["name"].astext == name)
.first()
)
type_cache[name] = card.data["supertype"]
return type_cache[name]
# A class we'll use to compute cluster stats and display them
class ClusterStats:
def __init__(self, cluster) -> None:
self.cluster = cluster
cards_count = Counter()
self.wins = 0
self.games_count = 0
for deck_code in self.cluster.decks:
self.games_count += decks_data[deck_code].games_count
self.wins += decks_data[deck_code].wins
self.factions = decks_data[deck_code].factions
for card in decks_data[deck_code].cards:
# Instead of just using 1, we use the decklists' wins
# This makes it so not only more popular versions have more weight, but successful ones do too
cards_count[card] += decks_data[deck_code].wins
self.aggregated_decklist = [c for c, count in cards_count.most_common(40)]
champions = set(
c[:-2]
for c in self.aggregated_decklist
if get_card_type(c[:-2]) == "Champion"
)
self.title = f"""{" ".join(champions)} - {
" ".join(n[9:-6] for n in self.factions.replace(" ", "")[1:-1].split(","))
}"""
@property
def winrate(self):
return self.wins / self.games_count
def display(self):
display(
HTML(
f"""<h3>{self.title}</h3>
<ul>
<li><b>{self.winrate*100:.2f}% winrate</b></li>
<li>{self.games_count:,} games</li>
<li>{len(self.cluster.decks)} decklists</li>
<li>Aggregated decklist: {cards_table(self.aggregated_decklist)}</li></ul>"""
)
)
class Cluster:
def __init__(self, center: str) -> None:
self.decks = [center]
def __contains__(self, item) -> bool:
return True if item in self.decks else False
def __len__(self) -> int:
return len(self.decks)
@abstractmethod
def distance(self, deck_code: str) -> int:
...
@abstractmethod
def can_be_added_to_cluster(self, distance: int) -> Optional[bool]:
# TODO Check if we actually need that
# We will use a trilean here
# True -> can be added
# False -> cannot be added yet
# None -> will never be able to be added (useful for some distances)
# We use the distance as argument so we don't compute it twice but to allow for different rules
...
@abstractmethod
def add(self, deck_code: str) -> None:
...
def get_stats(self) -> ClusterStats:
return ClusterStats(self)
Centered clusters#
Given a center
deck and a list of candidates
, we want to find the “best” cluster built around center
.
To do that, we add decks one by one, taking the closest one to the cluster at each step.
There are a few possible optimisations:
If a deck has a distance 0 to the cluster, we should be able to add it instantly
Distance 0 should mean adding it doesn’t change our cluster
import copy
from typing import List
import time
def get_centered_cluster(
center: str,
candidates: List[str],
cluster_class: Cluster,
) -> Cluster:
# We start the cluster with the center
cluster = cluster_class(center)
# We remove it from the candidates
candidates.remove(center)
# We iterate until we aren't able to add an eligible deck or we've added them all
while len(candidates) > 0:
# This is the maximum value (fully disjointed decks)
minimum_distance = float("inf")
minimum_deck = None
# We iterate on possible members
# We copy the list because we want to be able to remove candidates during iteration
for candidate in list(candidates):
cluster_distance = cluster.distance(candidate)
can_be_added = cluster.can_be_added_to_cluster(cluster_distance)
# We check if the deck can be added to our cluster first
if can_be_added is None:
# We use a trilean to speed up the process with some algorithms
candidates.remove(candidate)
elif can_be_added is False:
pass
elif can_be_added is True:
# If distance = 0 we can add the deck directly (happens for common cards clustering)
if cluster_distance == 0:
cluster.add(candidate)
candidates.remove(candidate)
continue
# We check if we found a new closest candidate and save it if that's the case
if cluster_distance < minimum_distance:
minimum_distance = cluster_distance
minimum_deck = candidate
# One step of iterations on candidates is over
# If we didn't find a new minimum deck, we stop
if minimum_deck is None:
break
# Adding the deck we found to our cluster
cluster.add(minimum_deck)
# We remove the deck from the remaining decks and continue iterating
candidates.remove(minimum_deck)
return cluster
Getting all clusters#
Then, we make a function for getting all clusters for a list of deck codes. This one’s pretty simple.
from typing import List
def get_all_clusters(deck_codes: List[str], cluster_class: Cluster) -> List[Cluster]:
"""Gets all clusters centered on each deck code
Args:
deck_codes (List[str]): a list of deck codes
Returns:
List[Cluster]: all clusters found centered on each deck code
"""
clusters = []
for center in deck_codes:
# We copy the list of deck codes because we change it in our code above
candidates = list(deck_codes)
# We get the biggest cluster centered on deck
clusters.append(get_centered_cluster(center, candidates, cluster_class))
return clusters
Clustering a list of decks#
We now have all the necessary building blocks to make our last function. One that takes a list of deck codes and returns clusters containing all decks.
Pseudocode:
Call
get_all_clusters
to get all clusters centered on each remaining deck codeOrder them by size
Add them in descending size order as long as they don’t contain any deck that’s already in our result
The biggest one will always get added
Smaller ones can get added directly if they have no overlap with the big clusters found in that step
Restart the process until all decks have been assigned to a cluster
def get_best_clusters(deck_codes: List[str], cluster_class: Cluster) -> List[Cluster]:
"""Gets the best clusters for the list of deck codes
Args:
deck_codes (List[str]): a list of deck codes
Returns:
List[Cluster]: the best clusters found
"""
clusters = []
remaining_decks = deck_codes
def clusters_contains_deck(clusters: List[Cluster], deck) -> bool:
# Return True if the deck is in any of the clusters already found
return any(deck in c for c in clusters)
while len(remaining_decks) > 0:
possible_clusters = get_all_clusters(remaining_decks, cluster_class)
for possible_cluster in sorted(
possible_clusters, key=lambda x: len(x), reverse=True
):
# If no other cluster already contains any of the decks in the current candidate cluster, we add it
# This means we will always add the biggest cluster found at that step
if all(not clusters_contains_deck(clusters, d) for d in possible_cluster.decks):
clusters.append(possible_cluster)
# Else we pass to the next possible cluster
else:
continue
# We update our remaining decks
remaining_decks = [
d for d in deck_codes if not clusters_contains_deck(clusters, d)
]
return clusters
Clustering based on common cards#
All decks in an archetype must share
ARCHETYPE_SIZE
cardsAny deck whose addition would make the cluster intersection go below
ARCHETYPE_SIZE
can directly be eliminatedAdding more decks will make the cluster more stringent with future additions
ARCHETYPE_SIZE = DECK_SIZE - 10
class CommonCardsCluster(Cluster):
def __init__(self, center: str) -> None:
super().__init__(center)
# Saving the intersection of the decklists in our cluster will help speed up the process
self.intersection = decks_data[center].cards
self.archetype_size = ARCHETYPE_SIZE
@staticmethod
def deck_to_deck_distance(dc_1: str, dc_2: str):
# A small class method for validation
return DECK_SIZE - len(
# & is the intersection operator for sets
decks_data[dc_1].cards
& decks_data[dc_2].cards
)
def distance(self, deck_code: str) -> int:
# If the cluster_intersection is fully contained in the decks_data cards, cluster_intersection = cluster + deck intersetion and len = 0
# This will always be >= 0 because at most there's full overlap between a deck and a cluster's intersection
return len(self.intersection) - len(
# & is the intersection operator for sets
self.intersection
& decks_data[deck_code].cards
)
def can_be_added_to_cluster(self, distance: int) -> Optional[bool]:
# If adding the new deck would make the intersection go over the archetype size, it will never be able to be added
# For example if we currently have an intersection of 30 and an archetype size of 28, the max distance to add is 2
if distance > len(self.intersection) - self.archetype_size:
return None
else:
return True
def add(self, deck_code: str) -> None:
self.decks.append(deck_code)
# & is the intersection operator for sets
self.intersection = self.intersection & decks_data[deck_code].cards
for deck_code in decks_data:
# A deck code has 100% overlap with itself
assert CommonCardsCluster.deck_to_deck_distance(deck_code, deck_code) == 0
if deck_code != first_deck_code:
# deck-to-deck distance is always > 0 as they're distinct and <= 40 as that's the maximum difference
assert (
0
< CommonCardsCluster.deck_to_deck_distance(deck_code, first_deck_code)
<= 40
)
# In this specific case, the distance to the cluster is the same as the distance to the deck:
# it's the number of non-overlapping cards
cluster = CommonCardsCluster(first_deck_code)
assert cluster.distance(deck_code) == CommonCardsCluster.deck_to_deck_distance(
deck_code, first_deck_code
)
# Validation of the centered clusters code
for deck in factions_decks[max_faction][:200]:
# Only checking out of 200 decks for speed
cluster = get_centered_cluster(
deck,
factions_decks[max_faction][:200],
CommonCardsCluster,
)
# We check the number of cards common to all decks in the cluster
if len(cluster.decks) > 1:
# If have more than one deck, their intersection is at most 39 cards
assert 40 > len(cluster.intersection) >= ARCHETYPE_SIZE
else:
# A few decks simple have no good neigbors, we check the distance is > DECK_SIZE - ARCHETYPE_SIZE
assert min(cluster.distance(d) for d in factions_decks[max_faction][:200] if d != deck) > DECK_SIZE - ARCHETYPE_SIZE
# Validation of the best clusters code
clusters = get_best_clusters(factions_decks[max_faction][:200], cluster_class=CommonCardsCluster)
# We clustered 200 decks, the sum of clusters lengths should be 200
assert sum(len(c) for c in clusters) == 200
# Checking we have the right intersection sizes
for cluster in clusters:
assert 40 >= len(cluster.intersection) >= ARCHETYPE_SIZE
Putting it all together#
Parameter selection#
Selecting the right ARCHETYPE_SIZE
is crucial so I experimented with a few different values. Keep in mind we have ~30,000 decklists.
ARCHETYPE_SIZE |
ARCHETYPES |
AVG DECK/ARCHETYPE FOR TOP 100 |
---|---|---|
30 |
10132 |
53 |
28 |
8847 |
65 |
25 |
7032 |
80 |
ARCHETYPE_SIZE=28
looks like a good spot for archetype size and is coherent with the intuition of what defines an archetype. It allows for 12 cards to be different between decklists, which is 4 individual playsets of cards.
At the same time, the most popular archetype (Senna Veigar) contains over 500 different decks which is starting to obscure a lot of individual card choices which may be meaningful.
A dynamic ARCHETYPE_SIZE
might be what’s needed, in particular for decks which have a lot of “flex” slots. Smaller changes in those popular decks could create new archetypes, and fringe decks could get grouped more easily to allow for easier analysis. This might be something I explore in the future.
Analysing the results#
And finally we get to the fun part, running it and finding out what’s the best deck in Legends of Runeterra right now!
Or so I thought. Let’s take a look at the result.
# This will be a list of lists containing deck codes
common_cards_cluster = []
# Progress visualization
from tqdm.notebook import tqdm
# Running the clustering process for all regions takes a few minutes
# It could be heavily optimized but it's good enough for now
for faction in tqdm(factions_decks):
best_clusters = get_best_clusters(factions_decks[faction], CommonCardsCluster)
common_cards_cluster.extend(best_clusters)
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Input In [10], in <cell line: 9>()
7 # Running the clustering process for all regions takes a few minutes
8 # It could be heavily optimized but it's good enough for now
9 for faction in tqdm(factions_decks):
---> 10 best_clusters = get_best_clusters(factions_decks[faction], CommonCardsCluster)
11 common_cards_cluster.extend(best_clusters)
Input In [7], in get_best_clusters(deck_codes, cluster_class)
15 return any(deck in c for c in clusters)
17 while len(remaining_decks) > 0:
---> 18 possible_clusters = get_all_clusters(remaining_decks, cluster_class)
20 for possible_cluster in sorted(
21 possible_clusters, key=lambda x: len(x), reverse=True
22 ):
23 # If no other cluster already contains any of the decks in the current candidate cluster, we add it
24 # This means we will always add the biggest cluster found at that step
25 if all(not clusters_contains_deck(clusters, d) for d in possible_cluster.decks):
Input In [6], in get_all_clusters(deck_codes, cluster_class)
17 candidates = list(deck_codes)
19 # We get the biggest cluster centered on deck
---> 20 clusters.append(get_centered_cluster(center, candidates, cluster_class))
22 return clusters
Input In [5], in get_centered_cluster(center, candidates, cluster_class)
25 # We iterate on possible members
26 # We copy the list because we want to be able to remove candidates during iteration
27 for candidate in list(candidates):
---> 29 cluster_distance = cluster.distance(candidate)
30 can_be_added = cluster.can_be_added_to_cluster(cluster_distance)
32 # We check if the deck can be added to our cluster first
Input In [8], in CommonCardsCluster.distance(self, deck_code)
21 def distance(self, deck_code: str) -> int:
22 # If the cluster_intersection is fully contained in the decks_data cards, cluster_intersection = cluster + deck intersetion and len = 0
23 # This will always be >= 0 because at most there's full overlap between a deck and a cluster's intersection
---> 24 return len(self.intersection) - len(
25 # & is the intersection operator for sets
26 self.intersection
27 & decks_data[deck_code].cards
28 )
KeyboardInterrupt:
# We'll take a look at the 100 archetypes with the most games
for cluster in sorted(common_cards_cluster, key=lambda x: len(x), reverse=True)[:100]:
stats = cluster.get_stats()
if "Miss Fortune" in stats.title and "Twisted Fate" in stats.title:
stats.display()
Miss Fortune Twisted Fate - Bilgewater Noxus
- 59.54% winrate
- 127,094 games
- 257 decklists
- Aggregated decklist:
3x Precious Pet 3x Marai Warden 3x Legion Saboteur 3x Decimate 3x Noxian Fervor 3x Jagged Butcher 3x Legion Rearguard 3x Zap Sprayfin 3x Legion Grenadier 3x Twisted Fate 3x Miss Fortune 3x Riptide Sermon 2x Eye of Nagakabouros 2x Make it Rain
Miss Fortune Twisted Fate - Bilgewater Noxus
- 58.01% winrate
- 2,939 games
- 51 decklists
- Aggregated decklist:
3x Marai Warden 3x Legion Saboteur 3x Legion Grenadier 3x Noxian Fervor 3x Twisted Fate 3x Miss Fortune 3x Decimate 3x Legion Rearguard 3x Zap Sprayfin 3x Eye of Nagakabouros 3x Precious Pet 3x Riptide Sermon 2x Jagged Butcher 2x Make it Rain
Miss Fortune Twisted Fate - Bilgewater Noxus
- 58.58% winrate
- 1,137 games
- 32 decklists
- Aggregated decklist:
3x Marai Warden 3x Legion Saboteur 3x Zap Sprayfin 3x Decimate 3x Legion Grenadier 3x Noxian Fervor 3x Miss Fortune 3x Legion Rearguard 3x Jagged Butcher 3x Riptide Sermon 3x Twisted Fate 2x Eye of Nagakabouros 2x Make it Rain 2x Precious Pet 1x Tentacle Smash
Gangplank Miss Fortune Twisted Fate - Bilgewater Noxus
- 56.78% winrate
- 2,892 games
- 31 decklists
- Aggregated decklist:
3x Legion Saboteur 3x Zap Sprayfin 3x Decimate 3x Imperial Demolitionist 3x Noxian Fervor 3x Crackshot Corsair 3x Miss Fortune 3x Legion Rearguard 3x Marai Warden 3x Precious Pet 3x Legion Grenadier 2x Island Navigator 2x Twisted Fate 1x Gangplank 1x Jagged Butcher 1x Arachnoid Sentry
As we can see, we have 3 archetypes that are… Pretty much the same MF-TF aggro deck. The differences in their aggregate decklists are minimal, and they should not be 3 different archetypes, or split differently at least.
I think this is due to the fact that players will try pretty much any change to a decklist, which means each individual card will get cut from the decklist at some point. Archetypes aren’t really defined by cards they all have in common, but more by how many cards they have in common.
So it’s time to implement our second distance!
Clustering based on cards differences#
Let’s take the opposite approach:
The distance of a deck to a cluster is the maximum number of different cards it has with a deck in the cluster
We will add new decks by minimizing this maximum distance
We will set a limit on the maximum distance between two decks in an archetype
The goal is to add decks that differ slightly from eachother, whichever cards the players decide to change.
Even though the number seems high at MAX_DIFFERENCE=15
, it’s the maximum difference between two arbitrary decks in the cluster. As the clusters are formed iteratively, similar decks will get grouped together early and hopefuly it won’t catch too many “parallel” archetypes.
# This will be the maximum number of different cards 2 different decks can have while in the same cluster
MAX_DIFFERENCE = 15
# Global distance cache
decks_distance_cache = defaultdict(dict)
def deck_to_deck_distance(dc_1: str, dc_2: str):
# This time we will use this distance heavily and add a distance cache
# We sort the decks
d1, d2 = sorted((dc_1, dc_2))
if d2 not in decks_distance_cache[d1]:
decks_distance_cache[d1][d2] = DECK_SIZE - len(
# & is the intersection operator for sets
decks_data[d1].cards
& decks_data[d2].cards
)
return decks_distance_cache[d1][d2]
class DifferenceBasedCluster(Cluster):
def __init__(self, center: str) -> None:
super().__init__(center)
self.max_difference = MAX_DIFFERENCE
# Internal cluster distance cache to speed up max search
# Each deck will point to its maximum distance to the cluster
# Which means when iterating a deck, we only need to check what's biggest between the max distance in the cache and the one with the last deck added
self.cluster_distance_cache = defaultdict(lambda: 0)
def distance(self, deck_code: str) -> int:
# We could do max(self.deck_to_deck_distance(d, deck_code) for d in self.decks)
# But we want to speed things up and any deck with a distance > MAX_DIFFERENCE can be ruled out directly
max_distance = max(self.cluster_distance_cache[deck_code], deck_to_deck_distance(deck_code, self.decks[-1]))
# In this situation the deck we selected will be too far, we return the maximum value and we'll remove the deck
if max_distance > self.max_difference:
return float("inf")
# Else we save the new value
self.cluster_distance_cache[deck_code] = max_distance
return max_distance
def can_be_added_to_cluster(self, distance: int) -> Optional[bool]:
# By now I realize this function was superfluous, and we could have relied on distance() returning float("inf") as our way to prune decks
# Refactoring notebooks is a huge pain though so I won't change it :D
# If adding the new deck would make the intersection go over the archetype size, it will never be able to be added
# For example if we currently have an intersection of 30 and an archetype size of 28, the max distance to add is 2
if distance == float("inf"):
return None
else:
return distance < self.max_difference
def add(self, deck_code: str) -> None:
self.decks.append(deck_code)
for deck_code in decks_data:
# A deck code has 100% overlap with itself
assert deck_to_deck_distance(deck_code, deck_code) == 0
if deck_code != first_deck_code:
# deck-to-deck distance is always > 0 as they're distinct and <= 40 as that's the maximum difference
assert 0 < deck_to_deck_distance(deck_code, first_deck_code) <= 40
# In this specific case, the distance to the cluster is the same as the distance to the deck:
# it's the number of non-overlapping cards
cluster = DifferenceBasedCluster(first_deck_code)
# Our cluster distance returns float("inf") if deck is too far, so we need to check >= here
assert cluster.distance(deck_code) >= deck_to_deck_distance(
deck_code, first_deck_code
)
# Validation of the centered clusters code
for deck in factions_decks[max_faction][:200]:
# Only checking out of 200 decks for speed
cluster = get_centered_cluster(
deck,
factions_decks[max_faction][:200],
DifferenceBasedCluster,
)
assert len(cluster.decks)
# Validation of the best clusters code
clusters = get_best_clusters(
factions_decks[max_faction][:200],
cluster_class=DifferenceBasedCluster,
)
# We clustered 200 decks, the sum of clusters lengths should be 200
assert sum(len(c) for c in clusters) == 200
Checking results#
# This will be a list of lists containing deck codes
difference_based_clusters = []
# Progress visualization
from tqdm.notebook import tqdm
# Running the clustering process for all regions is ~1 hour with that algorithm
# It could be heavily optimized but let's say it's ok for now
for faction in tqdm(factions_decks):
best_clusters = get_best_clusters(factions_decks[faction], DifferenceBasedCluster)
difference_based_clusters.extend(best_clusters)
# We'll take a look at the 1000 archetypes with the most games
for cluster in sorted(difference_based_clusters, key=lambda x: len(x), reverse=True)[:100]:
stats = cluster.get_stats()
if stats.title.startswith("Miss Fortune Twisted Fate"):
stats.display()
Miss Fortune Twisted Fate - Bilgewater Noxus
- 59.46% winrate
- 136,131 games
- 506 decklists
- Aggregated decklist:
3x Legion Saboteur 3x Noxian Fervor 3x Decimate 3x Legion Rearguard 3x Marai Warden 3x Zap Sprayfin 3x Precious Pet 3x Jagged Butcher 3x Legion Grenadier 3x Twisted Fate 3x Miss Fortune 3x Riptide Sermon 2x Eye of Nagakabouros 2x Make it Rain
Looks better! We only have a single pirates aggro list that properly catches the decklists from the 3 previous ones we had.
So after all this… Let’s take a look at the best decks and call it a day :D
# Let's take a look at the 100 archetypes with the most games and print the 10 highest winrates amongst tho
for cluster in sorted(
sorted(difference_based_clusters, key=lambda x: len(x), reverse=True)[:100],
key=lambda x: x.get_stats().winrate,
reverse=True,
)[:10]:
stats = cluster.get_stats()
stats.display()
Miss Fortune Twisted Fate - Bilgewater Noxus
- 59.45% winrate
- 137,768 games
- 506 decklists
- Aggregated decklist:
3x Legion Saboteur 3x Noxian Fervor 3x Decimate 3x Legion Rearguard 3x Marai Warden 3x Zap Sprayfin 3x Precious Pet 3x Jagged Butcher 3x Legion Grenadier 3x Twisted Fate 3x Miss Fortune 3x Riptide Sermon 2x Eye of Nagakabouros 2x Make it Rain
Ziggs Katarina Annie - Noxus Shurima
- 58.10% winrate
- 4,881 games
- 40 decklists
- Aggregated decklist:
3x Legion Saboteur 3x Treasure Seeker 3x Imperial Demolitionist 3x Decimate 3x Legion Rearguard 3x Merciless Hunter 3x Legion Grenadier 3x Ruin Runner 3x Ruinous Path 3x Baccai Reaper 3x Annie 2x Noxian Fervor 2x Ziggs 2x Might 1x Katarina
Taric Poppy - Demacia MtTargon
- 57.50% winrate
- 25,699 games
- 296 decklists
- Aggregated decklist:
3x Golden Aegis 3x Petricite Broadwing 3x Poppy 3x Esmus, Breath of the World 3x Taric 3x Fleetfeather Tracker 3x Mountain Sojourners 3x Pale Cascade 3x Laurent Protege 2x Relentless Pursuit 2x Tyari the Traveler 2x Guiding Touch 2x Solari Soldier 1x Concerted Strike 1x Bastion 1x Ranger's Resolve 1x Brightsteel Protector 1x Sharpsight
Annie Jhin - Jhin Noxus
- 57.35% winrate
- 69,715 games
- 463 decklists
- Aggregated decklist:
3x Legion Saboteur 3x Boomcrew Rookie 3x Noxian Fervor 3x Jhin 3x Annie 3x Tusk Speaker 3x Decimate 3x Legion Rearguard 3x Imperial Demolitionist 3x Solari Sunhawk 3x The Stagehand 3x Doombeast 3x Crackshot Corsair 1x Arachnoid Sentry
Gnar Sejuani - BandleCity Freljord
- 57.34% winrate
- 3,122 games
- 57 decklists
- Aggregated decklist:
3x Teenydactyl 3x Pokey Stick 3x Inventive Chemist 3x Sejuani 3x Gnar 3x Troll Chant 3x Chief Nakotak 3x Tusk Speaker 3x Bitsy Lizard 3x Poison Dart 3x Ursine Spiritwalker 2x Three Sisters 2x Spotted Toad 2x Minitee 1x Elixir of Iron
Elise Gwen - Noxus ShadowIsles
- 57.14% winrate
- 27,051 games
- 287 decklists
- Aggregated decklist:
3x Decimate 3x Precious Pet 3x Noxian Fervor 3x Arachnoid Horror 3x Doombeast 3x Stygian Onlooker 3x Elise 3x House Spider 3x Frenzied Skitterer 3x Legion Saboteur 2x Imperial Demolitionist 2x Stalking Shadows 2x Gwen 2x Boisterous Host 2x Phantom Butler
Gangplank Miss Fortune Twisted Fate - Bilgewater Noxus
- 57.04% winrate
- 3,054 games
- 48 decklists
- Aggregated decklist:
3x Noxian Fervor 3x Miss Fortune 3x Legion Saboteur 3x Legion Rearguard 3x Decimate 3x Crackshot Corsair 3x Imperial Demolitionist 3x Zap Sprayfin 3x Marai Warden 3x Precious Pet 3x Legion Grenadier 2x Island Navigator 2x Twisted Fate 2x Arachnoid Sentry 1x Gangplank
Viego Evelynn - Evelynn ShadowIsles
- 56.95% winrate
- 83,699 games
- 610 decklists
- Aggregated decklist:
3x Camavoran Soldier 3x Viego 3x Evelynn 3x Sultur 3x Glimpse Beyond 3x Hate Spike 3x Domination 3x Vile Feast 3x Invasive Hydravine 3x Black Spear 2x Vengeance 2x Vora 2x Neverglade Collector 2x Atrocity 2x Barkbeast
Gwen Elise Katarina - Noxus ShadowIsles
- 56.83% winrate
- 39,115 games
- 249 decklists
- Aggregated decklist:
3x Boisterous Host 3x Phantom Butler 3x Gwen 3x Ravenous Flock 3x Fallen Reckoner 3x Ruined Reckoner 3x House Spider 3x Legion Rearguard 3x Vile Feast 2x Eternal Dancers 2x Arachnoid Sentry 2x The Harrowing 2x Hate Spike 2x Katarina 2x Glimpse Beyond 1x Elise
Shyvana Aurelion Sol - Demacia MtTargon
- 56.81% winrate
- 5,497 games
- 87 decklists
- Aggregated decklist:
3x Shyvana 3x Dragon Chow 3x Herald of Dragons 3x Dragon's Clutch 3x Single Combat 3x Ruined Dragonguard 3x Screeching Dragon 3x Sharpsight 3x Wounded Whiteflame 3x Dragonguard Lieutenant 3x Strafing Strike 2x Eclipse Dragon 2x Aurelion Sol 2x Concerted Strike 1x Kadregrin the Ruined
Conclusion#
Because clustering is excruciatingly slow I’m not really satisfied with this first approach. There has to be a fast algorithm that can handle this task.
But in the meanwhile, I’ll just rely on heuristics. For Legends of Runeterra, this can simply be looking at champions and regions.
If you have any other idea for how to implement clustering for card games in a more game-agnostic way, don’t hestitate! I’ll still be looking at the subject moving forward, even though this blogpost already took me much more time than expected as my first notebook-based blog post.