I've been working on this in my spare time for a couple days and thought I would share it with you all. With the recent changes to MTGO "5-0" data sharing to a smaller, non-random sample, there is no longer a good source for tier and general meta analysis data.
Being a software developer with the skills to pull it off in a relatively short amount of time, I decided to go ahead and take on this project to do a meta analysis. It will only take me a few minutes to update this when new data becomes available, and I plan to keep it up to date for as long as I'm following the Modern competitive scene. I may turn it into a website with more features later, but for now a spreadsheet is fine.
The data includes a tier list with a raw meta share, weighted score, and some other stats. My methodology is explained below the analysis link.
The paper meta analysis covers all major tournaments. All tournaments feature at a minimum:
Hundreds of players
9 Swiss rounds
Top-8 playoff
At least 16 placement results posted
The types of tournaments currently included are:
Grand Prix
SCG Open
SCG Invitational
SCG Classic
The online meta analysis covers weekly Modern Challenge events. Only decks with at least 5 wins in the tournament are included. 4-3 decks are not included. Usually anywhere from 17-25 decks will have 5 wins or more in the 32 results posted.
All software was written by a professional software developer who specializes in big data analytics. All code is written using industry best practices and includes rigorous unit tests.
Archetype Definitions
Most of the archetypes are obvious, but some concessions had to be made for the sake of clarity in the lists. This was the hardest part of the whole process: each deck had to at least be glanced at before being categorized, because the naming is often inconsistent or even flat-out wrong. I've captured the published name in addition to my curated name so it's available for future analysis.
"Scapeshift" - All variants, including the current favored deck featuring Valakut, Primeval Titan and Scapeshift in addition to the Into the Breach version.
"U/Bx DS" - Grixis Death's Shadow plus the odd Esper Death's Shadow, which differ only slightly (red or white splash) and have the same plan.
"G/Bx DS" - Jund, G/B and the odd four-color Death's Shadow variants with white splash (but not blue), and not Abzan.
"Jund" - Encompasses non-Death's Shadow Jund Midrange and similar G/B Midrange decks.
"Abzan" - All non-Death's Shadow variants with or without Traverse.
Jeskai - There are three distinctions:
"Jeskai Control" - All control-oriented variants, including Queller, Nahiri, etc., but not U/R variants.
"Jeskai Saheeli" - Semi-control variant running 4x Saheeli Rai and 4x Felidar Guardian for the "Oops, I win" combo.
"Four-Color Saheeli" - There is also a variant running green that's more all-in on the combo and is a distinct deck.
"Gx Tron" - G/B, G/R, and G/W variants of the traditional Tron deck.
"Death and Taxes" - The mono-white deck only, no B/W or G/W variants.
"B/W Eldrazi" - A distinct Eldrazi variant, sometimes called "Eldrazi and Taxes". Some similarity to the mono-white Death and Taxes but different enough to be separated.
Bant and G/W Company - There are several variants, and the naming in deck lists is often quite mixed up:
"Counters Company" - All Devoted Druid/Vizier of Remedies combo variants. These decks are named lots of different things in deck lists, but they are consolidated here under one name.
"Knightfall" - Knight of the Reliquary variants whether G/W or Bant w/Retreat to Coralhelm or counterspells.
"G/W Hate Bears" - A distinct deck from the above, also distinct from but similar to Death and Taxes.
"Turns" - All Ux color variants.
Meta Share Calculation
Meta share is calculated as a raw percentage of all placements in all tournaments analyzed for the last 90 days with no weighting. It is not analogous to overall meta share since it only considers top finishers, but it is certainly a reasonable proxy for meta share at the top of the competitive meta.
Tier Calculation
Tiers are determined using the community standard metric of overall meta share. The cutoffs are fixed as follows:
Tier 1: >= 4.0%
Tier 2: >= 2.0%
Tier 3: >= 1.0%
Tier 4: >= 0.5%
Untiered: < 0.5%
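The share and tier logic above is simple enough to sketch in a few lines of Python. The deck names and counts in the usage below are invented; `TIER_CUTOFFS` mirrors the percentages listed:

```python
from collections import Counter

# Fixed tier cutoffs, as percentages of all analyzed placements.
TIER_CUTOFFS = [(4.0, "Tier 1"), (2.0, "Tier 2"), (1.0, "Tier 3"), (0.5, "Tier 4")]

def meta_shares(placements):
    """Raw meta share: each top finish counts once, with no weighting."""
    counts = Counter(placements)
    total = len(placements)
    return {deck: 100.0 * n / total for deck, n in counts.items()}

def tier(share_pct):
    """Map a share percentage onto the fixed cutoffs above."""
    for cutoff, name in TIER_CUTOFFS:
        if share_pct >= cutoff:
            return name
    return "Untiered"

# Hypothetical: 5 of 10 placements are deck "A" -> 50% share, Tier 1.
shares = meta_shares(["A"] * 5 + ["B"] * 3 + ["C"] * 2)
```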
Weighted Tier List Calculation
Results are weighted using scaling functions on two dimensions. The first dimension is placement in the tournament results, with a higher placement scoring a little more and with bonus points for each Top-8 win. The second dimension is time, with more recent tournaments scoring highest and more distant tournaments scoring an increasingly reduced amount.
Archetypes are weighted not just by placing in the top of the tournament but also by where they place and whether they won additional Top-8 games. The scaling function is linear and relatively flat such that the 32nd place deck gets 1 point, the 8th place finisher gets 2 points, and the 1st place finisher gets 2.5 points. Decks that win Top-8 games (i.e. the first through 4th finishers) get an additional .5 points for each win (.5, 1 or 1.5 points).
For some tournaments, there are only 16 finishers given. In this case, the scaling still works from a base of 32, so the 16th place deck would get the same number of points (1.6) that the 16th place deck would get in a 32-finisher result set. The same applies to every placement up to 1.
The numbers given above apply only to the most recent tournament; more distant tournaments' scores are deducted a percentage amount depending on how distant they are. All tournaments being scored are in the last 90 days. Scoring deductions scale linearly from a base date of the most recent tournament date, with the most distant tournament placements (90 days older than the most recent) having their scores reduced by about 50%. Scores one week older than the most recent event would be reduced by only about 3.89%, and so on.
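A scoring function along these lines might look like the sketch below. The piecewise-linear interpolation between the three stated anchors (32nd = 1, 8th = 2, 1st = 2.5) is my assumption, as is applying the decay to the whole score including the Top-8 bonus; the author's actual curve may differ slightly (this version gives about 1.67 for 16th place, close to the 1.6 cited):

```python
def placement_points(place):
    """Base points, piecewise-linear between the stated anchors:
    32nd -> 1.0, 8th -> 2.0, 1st -> 2.5 (interpolation is an assumption)."""
    if place >= 8:
        return 1.0 + (32 - place) / 24.0
    return 2.0 + (8 - place) * (0.5 / 7.0)

def top8_win_bonus(place):
    """+0.5 points per Top-8 match won: the winner has 3 wins, the
    runner-up 2, semifinal losers (3rd/4th) 1, everyone else 0."""
    wins = {1: 3, 2: 2, 3: 1, 4: 1}.get(place, 0)
    return 0.5 * wins

def time_decay(days_old):
    """Linear decay from the most recent event: 0 days -> 1.0,
    90 days -> 0.5 (one week old -> ~0.9611, i.e. a ~3.89% reduction)."""
    return 1.0 - 0.5 * (days_old / 90.0)

def score(place, days_old):
    """Total weighted score for a single placement."""
    return (placement_points(place) + top8_win_bonus(place)) * time_decay(days_old)
```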
These are my current thoughts and are subject to change as data changes over time and more analysis is done.
Comments on specific decks:
There is a really obvious top end of Grixis DS and Affinity. They are miles away from the next best decks in the paper meta.
Counters Company, Jeskai Control and Abzan are doing a lot better than people seem to realize.
Rumors of Burn's demise are premature. It's firmly tier 1.
Jund is doing better than expected, especially if DS variants are considered. If combined, GBx DS/Midrange would be firmly in tier 1. It would seem Tarmogoyf is in a stronger position than perceived. In contrast, Non-DS Grixis variants don't show.
W/R Prison is also exceeding its perceived placement in the meta.
U/W Control is not doing as well as a lot of people seem to think. Jeskai Control is clearly the much more consistent deck in the paper meta.
General comments:
53 distinct decks have placed at the top of a tournament in the last three months.
Comparing the raw meta share to the weighted score, you can see that they actually line up pretty well.
And finally, based on this analysis, the data disagrees quite a bit with this forum's tier placements. I'm not sure if the moderators are basing it more on the random MTGO results or not, but FWIW the placements are really out of sync with the competitive paper meta.
First off, let me say thanks very much for your contribution to the modern community with this! People are hungry for this kind of information after the MTGO changes. I think this does have the potential for some very useful and interesting analysis, but I do have some reservations.
I think there should be more caveats concerning how literally we should take this performance data and how much stake we put into conclusions drawn from its analysis. Do you truly believe GDS and Affinity are tier 0 right now in Modern? What I see is a shifting sideboard war as players attack GDS with graveyard hate and get caught sleeping by Affinity, and vice versa. There is also the fact that the top 32 can be wildly skewed due to tiebreakers and the draft portions of some events, and with the low number of data points in a 90-day period this can really significantly impact tierings in your system.
So all in all, I really like what you've presented here but I think we should be looking to draw more qualitative conclusions about sideboard and prevalence trends rather than view it as a quantitative indicator of raw performance.
In terms of drawing conclusions from the data, the way I view it is the data is cold and emotionless. This is all the good data that we have for describing the paper meta, and it says what it says. Do I wish there were more? Yes. It would be great if they went 64 deep instead of 32 (or 16). We should bug SCG to post more. WotC too. I could go back farther in time to get more data, but more than three months ago the meta was quite different. Three months feels about right because that's WotC's release schedule, and going pre-Push or pre-Ballista wouldn't be helpful.
I think the main caveat to make, and I did mention it, is that this data describes the top of the competitive meta. It's not necessarily applicable to what people see at their LGS, but it is 100% representative of what decks are winning and doing well at the top of the format.
I think it's fine that the competitive paper meta turned out to not be what people imagined it to be. It's not what a random sampling of MTGO games led people to believe it is.
I was certainly surprised. I didn't expect there to be a clear Tier 0, but there is. It's not debatable. GDS and Affinity are placing more and higher than any other decks. A lot more, in fact twice as much as Tier 1 decks.
I didn't expect to see so many Tarmogoyf decks, especially Abzan. Goyf is a 4-of in more than 10% of the Tier 2+ decks. Goyf is performing. Again, it's not debatable. It's cold, hard data.
I have some ideas about trends, and will be working on them when I get back from vacation in a couple weeks. One thought is to show week by week how each tier 3+ deck is trending over time in a line chart. It will show how one deck's rise coincides with another deck's fall.
I also want to implement an archetype tagging mechanism to be able to aggregate data on different things, some examples could be "Combo", "Control", "Linear", "Interactive", "Death's Shadow", "Eldrazi", etc.
A far-reaching idea is to start ingesting deck lists and use them to train neural networks, then test new deck lists against the network to make inferences about similarity, note deck changes over time, automatically discover new decks, etc ... would be a fun project, but not now.
Edit: Looks like I missed labeling an "Auras" (aka G/W Hexproof) deck, will go back and fix that, probably tomorrow.
Edit2: So there were two other events within the last 3 months that I didn't include - Copenhagen and Kobe. That's a good chunk of data. Thought they were older for some reason. I'll also add those and see how it shifts things.
Thank you for your work, this is very interesting and will hopefully be of use to the community. The detailed methodology being laid out is highly appreciated as well.
To be frank, calling two decks Tier 0 with only 11% of the meta split between the two is a huge exaggeration, and anyone can tell you that in practice neither deck is so unbeatable as to be called tier 0. 11% is a pretty mild number, especially when we've had other decks take up as much as 18% (and way more than that during Eldrazi Winter).
Jund DS and Jund Midrange shouldn't be considered the same deck. Same for Abzan DS and Abzan Midrange. They just have too many differences in the cards they run and their overall pace to be considered the same deck. I know that you didn't combine their usage in the spreadsheet, but you brought it up in your second post, so I thought I should say something.
For your score calculations, are you taking into account how many rounds of Swiss are being played before the Top 8 cutoffs? I wasn't sure based on your OP, and it would be an important factor since fewer rounds before top 8 = more variance. I'm no statistics major, so I couldn't tell you how much of a factor it should be, but I don't think it should be ignored
Other than that, it seems like a good project to start early. Keep up the good work
For folks who may be looking at this, I was missing Kobe and Copenhagen, and those will alter the results quite a lot. I'll update it tomorrow morning and post here. Very sorry for any confusion or unintentional misrepresentation.
If data says the top tier is just two decks, then the top tier is just two decks. No need to apologize or manipulate anything with a tier 1.5. The top tier isn't a "tier 0" situation because the top tier decks don't make up a large enough part of the meta. I would just have your tier 1.5 labeled as tier 2 since that seems to be what the data says.
I don't know how other people did it in their meta reviews, but I always used standard deviations over the average. This is significantly less arbitrary than other methods. It's still arbitrary because all stats are secretly kind of arbitrary, but it's at the much more objective end of arbitrariness.
Again, it's a good analysis and awesome you did it. You just run into big trouble with your point scale, where you are assigning super arbitrary values to various finishes without really thinking through their meaning. For example, a 32nd place finish gets 1 point and a 1st place finish gets 2.5 points. Why not 1 point vs. 1.5 points? Or .5 vs. 3? In essence, you've made up values by assigning numbers there, which warps the entire scale. Another issue; a win in the T8 gets you a point. Does that mean winning one T8 match is really weighted the same as a 32nd place finish? That's a strange weighting I could never explain, and yet it's a weird effect of the system you decided on.
The meta share is objective. 11% is twice as much share as 5.5%. 5.5% is more than twice as much as 2%. Etc. But the other score you give doesn't really function like that, even though you set it up as if it does. I think this is because you are trying to make a scale to suggest performance, but a) your data is too limited to do that, and b) you've assigned arbitrary values to measure that performance. The result is data where one column makes sense (share) and would be the same no matter how many people crunched your numbers. The score column, however, is just your personal scoring method with personally selected numbers. It would change depending on who assigned those values, so it's basically useless for ranking decks.
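For what it's worth, the standard-deviation approach described above could be sketched like this. The one- and two-SD cutoffs are illustrative choices, not anything the post specifies:

```python
from statistics import mean, pstdev

def sd_tiers(shares):
    """Tier decks by how many standard deviations their meta share sits
    above the mean share. The cutoffs (>= 2 SD, >= 1 SD) are illustrative."""
    values = list(shares.values())
    mu, sd = mean(values), pstdev(values)
    tiers = {}
    for deck, share in shares.items():
        z = (share - mu) / sd if sd else 0.0
        tiers[deck] = "Tier 1" if z >= 2 else ("Tier 2" if z >= 1 else "Lower")
    return tiers
```

Unlike fixed percentage cutoffs, the tier boundaries here move with the shape of the whole field, which is the "less arbitrary" property the poster is after.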
The raw meta share percentage is there if that's what's important to anyone.
But the raw meta share percentage doesn't really tell the whole story. That's why you see some differences in the order of the share and weighted score. For example, you can see that Death and Taxes is outperforming its meta share. That's pretty interesting to me.
Of course any weighting scheme is going to be arbitrary. I chose the scaling function specifically so that 8th would be twice as many points as 32nd, so that's how it works. I also wanted to reward the player/deck considerably for winning games in the top-8 playoff. And winning the tournament. Those wins are a big deal. If you think placing 32nd is just as good as winning, fine, use meta share, but I disagree.
If you don't like the scaling function, that's fine too, but it feels reasonable to me and wasn't a ton of work to implement. The numbers can easily be tweaked; they are just constants in a program. I tried tweaking them a few different ways and the current configuration is working as intended. And this is really just an initial prototype; I'm open to other ideas.
Thanks for the comments.
Regarding Jund, yeah they are separate in the calculations. I just made the comment regarding Tarmogoyf and BGx in general, which is much better positioned in the paper meta than people give it credit for. And it looks like it's gonna get a boost from the missing GPs, but I'll see for sure tomorrow.
All the tournaments are at least 9 rounds. The ones with a day two such as Opens and GPs are more like 15 I think. And they all have a Top-8 single elimination playoff. But generally speaking I agree with you - I wouldn't want to include smaller tournaments like IQs that are 6 rounds and only ~50 people.
Labels ignored (Tier 0 has meaning within the community, but I'm not going to crucify you for it...) I think this is a more realistic approach than when we saw the shake up here, I really appreciate the effort!
If data says the top tier is just two decks, then the top tier is just two decks. No need to apologize or manipulate anything with a tier 1.5. The top tier isn't a "tier 0" situation because the top tier decks don't make up a large enough part of the meta. I would just have your tier 1.5 labeled as tier 2 since that seems to be what the data says.
Yeah, the 1.5 actually was kind of a joke.
I guess people have in their mind that "Tier 0" means something specific regarding meta share. That's fine, I don't care. It's just data. But it is a huge jump from GDS/Affinity to ET, etc.
I totally agree that share doesn't tell the whole story and it's important to look at other performance metrics as well. My issue is that this scale uses arbitrary values you've selected without any clear rationale. You've then used that scale in place of a "true" performance metric like MWP/GWP or conversion rate. That's especially problematic because you tell us multiple times your analysis is "cold and emotionless" without "smoke and mirrors," but in reality there's this major personal weighting decision behind your entire scoring system, and no justification for those values.
The metagame share is certainly cold, hard data analysis and it speaks for itself. Once you add the other GP, N will get bigger and those numbers will get even tighter. But your performance metric does not meet those same objective standards. I'd cut that metric entirely, redo it from the ground up so it's not so arbitrary, or rework your presentation of the analysis so you aren't making it out to be something it isn't.
Well, by "smoke and mirrors" I mean what Wizards is doing - showing 5 decks of their choice daily - and by extension what people who use that data are doing. There is a built-in bias. My analysis is based on all the available un-biased paper data.
And a pre-determined mathematical formula is objective. Numbers go in, numbers come out. The function could be something different, but the results wouldn't be dramatically different because of the nature of the data.
Also, I don't see how I could possibly be making it out to be something that it isn't when I've broken down the details of the methodology. It's completely transparent and like I said anyone can just ignore the weighted score. Not sure why you're so bothered by the score, seems like you are taking it really personally.
And I'm not removing the scoring, it wouldn't be nearly as interesting without it, but it could certainly change. I'm more likely to add more columns with different metrics plus different types of trend charts than I am to remove anything.
Edit: You mentioned standard deviations, and that is certainly one possible approach. I charted out an example, and the curve didn't have the shape I wanted, so I wrote my own function.
Cool, great info. Personally surprised that Eldrazi Iron isn't a little bit closer to the other 2 but either way useful to have.
I was wondering if you could just have a box on the spreadsheet showing the most recently added event(s), just in case people are looking at it during a double-header weekend and it hasn't been updated yet.
There are now 13 tournaments in the results. After this weekend, there will be 15, and the data will be maxed out for the 3 month window. I think that's enough data to get a decent picture of the meta.
I'm no longer naming tiers. I did some color coding where I think the "tier" break points are. I also added another level since having 53 archetypes supports it.
The results are interesting. Some things moved quite a lot despite the tournaments being about 7 weeks old now. Dredge was really helped a lot by the GPs. I think its current ranking is actually a lot more accurate; I had thought it looked too low before.
Counters Company moved above ET as it was really strong at the GPs. Counters Company is an interesting case because it seems to be underrepresented in tier lists. I've concluded that it's a product of the deck having lots of different names, and the engines some sites are using just take the name as given, so the results are fragmented. I look at each deck to make sure it matches an archetype without duplication. There are other offenders, but Counters Company is the worst.
Merfolk and Gx Tron moved down. Some recent showings bolstered their ranking, but with a more complete data set we see they actually belong a bit lower down the list, which seems about right to me. I'll leave the old tab up for a bit if anyone who already had looked at that one wants to compare.
Also, I made a slight modification to my scaling function. I wanted a little bump in the curve at the top of the tournament, but maybe it was a bit too much, so I'm now only awarding .5 points for Top-8 wins instead of 1. So decks are still rewarded for winning but will be less likely to jump up a tier just for one really good showing.
I was thinking a bit more about other metrics, and was thinking about what ktkenshinx wrote, and I probably didn't explain some things well enough.
When using functions like mean and standard deviation, you have to be careful. If you just throw data at them and accept the results, they can be skewed. For example, if I'm plotting the SD of a deck's finishes, and there are lots of placements (like for Affinity), it will work out OK. But if a deck only shows up once, and has a better-than-average finish, it will appear better than it actually is. I'm wondering if this type of calculation is what is causing decks like Humans to be perceived as tier 2 when it's clearly untiered in the competitive paper meta.
My scoring function avoids this problem because it is weighted on appearances. Meta share is kind of the dumb version of this method, where every placement is treated as equal. And any mean or average based calculation on the aggregate data also isn't any different from meta share. I'm going to look at ways to apply statistical models to the data to come up with something that makes sense. One option that could be interesting is using probabilities to predict how likely a given deck is to win the next tournament. Any of these things are pretty easy to do, it's just a matter of deciding what makes sense for the data.
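The single-appearance skew described above is easy to demonstrate with invented numbers. Both deck names and finishes here are hypothetical:

```python
from statistics import mean

# Hypothetical finishes (lower place = better). "Affinity" places often;
# "OneHit" placed exactly once, near the top.
finishes = {
    "Affinity": [2, 5, 9, 14, 20, 28, 31],
    "OneHit": [3],
}

# By average finish alone, the one-appearance deck looks "better"...
avg_place = {deck: mean(places) for deck, places in finishes.items()}

# ...but an appearance-weighted total (roughly one point per finish, a bit
# more for higher places) rewards decks that keep showing up.
total_points = {deck: sum(1 + (32 - p) / 24 for p in places)
                for deck, places in finishes.items()}
```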
Well done, thanks for the effort!
Thanks.
It's in a single column now with the date of the last calculated tournament at the top. I'll just add a new column every weekend, push the old stuff to the right.
Made a couple more tweaks. Now there are two tier tabs, one sorted by meta share and one by weighted score. Meta share is the default tab since that's what people seem to prefer.
Big improvements overall and appreciate the work. I'd just go ahead and name them Tier 1 through Tier 3. I know you want to avoid that terminology, but everyone is going to use it anyway and you aren't differentiating your project by omitting it.
I've tried to do similar performance scales in the past. For years I tried to get a scale to work in place of a real MWP/GWP, points, or conversion rate metric, and I've tried similar scales to yours where you're rating higher performance at events slightly more than lower performance. There are a few big issues with this approach. First, as you can see on your sheet, you end up with a score that cannibalizes the meta share ranking too heavily. The result is a list that is ordered almost identically to the meta share list, so it isn't giving a lot of new information.
The second problem is that the score undersells good decks. D&T makes a pretty sizable jump from the Tier 3 meta share bracket to the Tier 2 score bracket based on its score, but that gets lost in the noise of all the other decks that don't move at all.
The third problem is the scores don't compare well. Bant Eldrazi is not a good deck right now, and yet its score is the same as the D&T score, despite D&T being better positioned and having much better performances.
Finally, the score doesn't account for the so-called dartboard effect. If you throw a lot of darts at a board, some are bound to hit. If 30% of the field shows up on Grixis DS and only 15% make it to the top tables, that's still a huge share of the top tables, but a mediocre/bad conversion rate. By contrast, if 3 players played D&T and 2 ended up in the T8, that's a spectacular conversion rate. The score doesn't capture those nuances because, again, it's heavily based on share.
A better score would be to look at the points for all decks in all tournaments. SCG and Wizards publish points by player, and you could cross-reference those with decklists to figure out average scores for different decks. That would be a much more objective approach, but I acknowledge not all tournaments provide that information.
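The dartboard effect described above could be made concrete with a conversion-rate metric, if field counts were published. The event sizes and deck counts below are hypothetical, in the spirit of the post's example:

```python
def metrics(field_size, deck_entrants, top_size, deck_top):
    """Contrast a deck's share of the field, its share of the published
    top tables, and its conversion rate into those top tables."""
    return {
        "field_share": deck_entrants / field_size,
        "top_share": deck_top / top_size,
        "conversion": deck_top / deck_entrants,
    }

# A popular deck: 30% of a hypothetical 400-player field, a big slice of
# the Top 32, but only a 10% conversion rate.
popular = metrics(400, 120, 32, 12)

# A fringe deck: 3 pilots, 2 reach the Top 32 -- a spectacular conversion.
fringe = metrics(400, 3, 32, 2)
```

A share-based score ranks the popular deck far ahead; a conversion-based one tells nearly the opposite story, which is exactly the nuance the post says gets lost.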
Thanks for the comments.
I wouldn't expect all the decks to move, nor do I think anything is lost "in the noise". It's a very small data set. It's pretty easy to switch between the two and see what's going on. D&T is a good example of a deck that's outperforming its meta share, and the point system shows that. Same with Merfolk. And Elves are the opposite - it's placing frequently, but it's clearly not a threat to actually win. The scoring is working as intended.
Some of the metrics you are talking about are just impossible due to lack of data. My goal was to produce the closest approximation to meta truth with the data available, and I think I've done that. If there were better data, I'd be using it.
Edit: Also, I think the score is a better representation of the meta. Following the examples above, I'd have D&T as tier 2 and Elves as tier 3, not vice versa as meta share would have it. And tier 3 isn't bad - they are decks that are performing well.
The new organization is great and I like having both the direct measure as the primary sort metric, and then the weighted measure as the secondary sort metric. This gives the direct frequency of occurrence at the top of events first, and then the option of seeing it weighted by placement as well. Between both you get more information than just the first, so the second is welcome imo.
So you're doing data that's 3 months old and newer, for paper only? And to make sure I'm reading this correctly, the first tab is just the raw metagame field %? And the tiers have a hard divide based on colors?
Not a math guy, just trying to read this.
Its probably necessary now that wizards has fully killed any viability in reading meta % online.
Yes, first tab is sorted by raw meta share as %. Colors are approximate tiers, just divided where it seemed to make sense and went 4 deep.
Orange Tier 1
Yellow Tier 2
Green Tier 3
Blue Tier 4
Purple Untiered
I'll add a couple more stats when I do the next batch with Atlanta Open and Classic tomorrow morning.
And again, if someone doesn't like my scoring or doesn't think placement or winning top 8 matters then just ignore the weighted scoring tab.
Being a software developer with the skills to pull it off in a relatively short amount of time, I decided to go ahead and take on this project to do a meta analysis. It will only take me a few minutes to update this when new data comes available, and I plan to keep it up to date for as long as I'm following the Modern competitive scene. I may turn it into a website with more features later, but for now a spreadsheet is fine.
The data includes a tier list with a raw meta share, weighted score, and some other stats. My methodology is explained below the analysis link.
Without further ado, the results can be found here:
https://docs.google.com/spreadsheets/d/1ZRLhtRToCI2VkJ-xYVwwvYuYtFmCLyMyRH1pbjCdBg8
Meta Description
The paper meta analysis covers all major tournaments. All tournaments feature at a minimum:
Hundreds of players
9 Swiss rounds
Top-8 playoff
At least 16 placement results posted
The types of tournaments currently included are:
Grand Prix
SCG Open
SCG Invitational
SCG Classic
The online meta analysis covers weekly Modern Challenge events. Only decks with at least 5 wins in the tournament are included. 4-3 decks are not included. Usually anywhere from 17-25 decks will have 5 wins or more in the 32 results posted.
All software was written by a professional software developer who specializes in big data analytics. All code is written using industry best practices and includes rigorous unit tests.
Archetype Definitions
Most of the archetypes are obvious, but some concessions had to be made for the sake of clarity in the lists. This was the hardest part of the whole process because each deck had to at least be glanced at before being categorized because the naming is often inconsistent or even flat out wrong. I've captured the published name in addition to my curated name so it's available for future analysis.
"Scapeshift" - All variants, including the current favored deck featuring Valakut, Primeval Titan and Scapeshift in addition to the Into the Breach version.
"U/Bx DS" - Grixis Death's Shadow plus the odd Esper Death's Shadow, which differ only slightly (red or white splash) and have the same plan.
"G/Bx DS" - Jund, G/B and the odd four-color Death's Shadow variants with white splash (but not blue), and not Abzan.
"Jund" - Encompasses non-Death's Shadow Jund Midrange and similar G/B Midrange decks.
"Abzan" - All non-Death's Shadow variants with or without Traverse.
"Jeskai" - There are three distinctions:
"Death and Taxes" - The mono-white deck only, no B/W or G/W variants.
"B/W Eldrazi" - A distinct Eldrazi variant, sometimes called "Eldrazi and Taxes". Some similarity to the mono-white Death and Taxes but different enough to be separated.
"Bant and G/W Company" - There are several variants, and the naming in deck lists is often quite mixed up:
Meta Share Calculation
Meta share is calculated as a raw percentage of all placements in all tournaments analyzed for the last 90 days with no weighting. It is not analogous to overall meta share since it only considers top finishers, but it is certainly a reasonable proxy for meta share at the top of the competitive meta.
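As a concrete sketch of that calculation, here is a minimal version in Python. The deck names in the example are purely illustrative, not real tournament data:

```python
from collections import Counter

def meta_share(placements):
    """Raw meta share: each top finish counts once, with no weighting
    by placement or recency."""
    counts = Counter(placements)
    total = len(placements)
    return {deck: round(100.0 * n / total, 2) for deck, n in counts.items()}

# One entry per top finish across all analyzed tournaments (illustrative).
placements = ["Grixis DS", "Affinity", "Grixis DS", "Burn"]
shares = meta_share(placements)  # {'Grixis DS': 50.0, 'Affinity': 25.0, 'Burn': 25.0}
```

The shares always sum to 100%, since every recorded placement contributes exactly once.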
Tier Calculation
Tiers are determined using the community standard metric of overall meta share. The cutoffs are fixed as follows:
Tier 1: >= 4.0%
Tier 2: >= 2.0%
Tier 3: >= 1.0%
Tier 4: >= 0.5%
Untiered: < 0.5%
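Those cutoffs are simple threshold checks. A minimal version of the mapping:

```python
def tier(meta_share_pct):
    """Map a raw meta-share percentage to the fixed tier cutoffs above."""
    if meta_share_pct >= 4.0:
        return "Tier 1"
    if meta_share_pct >= 2.0:
        return "Tier 2"
    if meta_share_pct >= 1.0:
        return "Tier 3"
    if meta_share_pct >= 0.5:
        return "Tier 4"
    return "Untiered"
```

For example, a deck at 3.9% share lands just below the Tier 1 cutoff and is classified Tier 2.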
Weighted Tier List Calculation
Results are weighted using scaling functions on two dimensions. The first dimension is placement in the tournament results, with a higher placement scoring a little more and with bonus points for each Top-8 win. The second dimension is time, with more recent tournaments scoring highest and more distant tournaments scoring an increasingly reduced amount.
Archetypes are weighted not just by placing in the top of the tournament but also by where they place and whether they won additional Top-8 games. The scaling function is linear and relatively flat: the 32nd-place deck gets 1 point, the 8th-place finisher gets 2 points, and the 1st-place finisher gets 2.5 points. Decks that win Top-8 games (i.e., the 1st through 4th finishers) get an additional 0.5 points for each win (0.5, 1, or 1.5 points).
For some tournaments, there are only 16 finishers given. In this case, the scaling still works from a base of 32, so the 16th place deck would get the same number of points (1.6) that the 16th place deck would get in a 32-finisher result set. The same applies to every placement up to 1.
The numbers given above apply only to the most recent tournament; older tournaments' scores are reduced by a percentage depending on how distant they are. All tournaments being scored fall within the last 90 days. Deductions scale linearly from the date of the most recent tournament, with the most distant placements (90 days older than the most recent) having their scores reduced by about 50%. Scores one week older than the most recent event are reduced by only about 3.89%, and so on.
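The exact scaling function isn't published (and the 1.6-point figure quoted for 16th place suggests the real curve differs slightly from straight lines between the anchors), so the following is only a sketch that hits the stated anchor points (32nd → 1.0, 8th → 2.0, 1st → 2.5), adds the 0.5-point-per-Top-8-win bonus, and applies the linear time decay. All function names are my own:

```python
def placement_points(place):
    """Piecewise-linear placement score, always scaled from a base of 32
    (so a 16-finisher result set uses the same per-place values)."""
    if place > 8:
        return 1.0 + (32 - place) / 24.0      # 32nd -> 1.0 up to 9th -> ~1.96
    return 2.0 + (8 - place) * (0.5 / 7.0)    # 8th -> 2.0 up to 1st -> 2.5

def top8_bonus(place, per_win=0.5):
    """Winner took 3 Top-8 matches, runner-up 2, semifinal losers 1."""
    wins = {1: 3, 2: 2, 3: 1, 4: 1}.get(place, 0)
    return wins * per_win

def time_multiplier(days_old, horizon=90, max_reduction=0.5):
    """Linear decay: a 90-day-old result scores about 50% of a fresh one."""
    return 1.0 - max_reduction * (days_old / horizon)

def weighted_score(place, days_old):
    return (placement_points(place) + top8_bonus(place)) * time_multiplier(days_old)
```

With these values, `time_multiplier(7)` comes out to about 0.961, matching the roughly 3.89% one-week reduction described above, and a fresh tournament win scores 2.5 + 1.5 = 4 points.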
Comments on specific decks:
I think there should be more caveats concerning how literally we should take this performance data and how much stake we put into conclusions drawn from its analysis. Do you truly believe GDS and Affinity are Tier 0 right now in Modern? What I see is a shifting sideboard war as players attack GDS with graveyard hate and get caught sleeping by Affinity, and vice versa. There is also the fact that the top 32 can be wildly skewed by tiebreakers and the draft portions of some events, and with the low number of data points in a 90-day period this can significantly impact tierings in your system.
So all in all, I really like what you've presented here but I think we should be looking to draw more qualitative conclusions about sideboard and prevalence trends rather than view it as a quantitative indicator of raw performance.
In terms of drawing conclusions from the data, the way I view it is the data is cold and emotionless. This is all the good data that we have for describing the paper meta, and it says what it says. Do I wish there were more? Yes. It would be great if they went 64 deep instead of 32 (or 16). We should bug SCG to post more. WotC too. I could go back farther in time to get more data, but more than three months ago the meta was quite different. Three months feels about right because that's WotC's release schedule, and going pre-Push or pre-Ballista wouldn't be helpful.
I think the main caveat to make, and I did mention it, is that this data describes the top of the competitive meta. It's not necessarily applicable to what people see at their LGS, but it is 100% representative of what decks are winning and doing well at the top of the format.
I think it's fine that the competitive paper meta turned out to not be what people imagined it to be. It's not what a random sampling of MTGO games led people to believe it is.
I was certainly surprised. I didn't expect there to be a clear Tier 0, but there is. It's not debatable. GDS and Affinity are placing more and higher than any other decks. A lot more, in fact twice as much as Tier 1 decks.
I didn't expect to see so many Tarmogoyf decks, especially Abzan. Goyf is a 4-of in more than 10% of the Tier 2+ decks. Goyf is performing. Again, it's not debatable. It's cold, hard data.
I have some ideas about trends, will be working on them when I get back from vacation in a couple weeks. One thought is to show week by week how each tier 3+ deck is trending over time in a line chart. It will show how one deck's rise coincides with another deck's fall.
I also want to implement an archetype tagging mechanism to be able to aggregate data on different things, some examples could be "Combo", "Control", "Linear", "Interactive", "Death's Shadow", "Eldrazi", etc.
A far-reaching idea is to start ingesting deck lists and use them to train neural networks, then test new deck lists against the network to make inferences about similarity, note deck changes over time, automatically discover new decks, etc ... would be a fun project, but not now.
Edit: Looks like I missed labeling an "Auras" (aka G/W Hexproof) deck, will go back and fix that, probably tomorrow.
Edit2: So there were two other events within the last 3 months that I didn't include - Copenhagen and Kobe. That's a good chunk of data. Thought they were older for some reason. I'll also add those and see how it shifts things.
RIP Magic Duels. Wizards will regret what they did to you.
Other than that, it seems like a good project to start early. Keep up the good work
If data says the top tier is just two decks, then the top tier is just two decks. No need to apologize or manipulate anything with a tier 1.5. The top tier isn't a "tier 0" situation because the top tier decks don't make up a large enough part of the meta. I would just have your tier 1.5 labeled as tier 2 since that seems to be what the data says.
The raw meta share percentage is there if that's what's important to anyone.
But the raw meta share percentage doesn't really tell the whole story. That's why you see some differences in the order of the share and weighted score. For example, you can see that Death and Taxes is outperforming its meta share. That's pretty interesting to me.
Of course any weighting scheme is going to be arbitrary. I chose the scaling function specifically so that 8th would be twice as many points as 32nd, so that's how it works. I also wanted to reward the player/deck considerably for winning games in the top-8 playoff. And winning the tournament. Those wins are a big deal. If you think placing 32nd is just as good as winning, fine, use meta share, but I disagree.
If you don't like the scaling function, that's fine too, but it feels reasonable to me and wasn't a ton of work to implement. The numbers can easily be tweaked; they are just constants in a program. I tried tweaking them a few different ways and the current configuration is working as intended. And this is really just an initial prototype; I'm open to other ideas.
Thanks for the comments.
Regarding Jund, yeah they are separate in the calculations. I just made the comment regarding Tarmogoyf and BGx in general, which is much better positioned in the paper meta than people give it credit for. And it looks like it's gonna get a boost from the missing GPs, but I'll see for sure tomorrow.
All the tournaments are at least 9 rounds. The ones with a day two such as Opens and GPs are more like 15 I think. And they all have a Top-8 single elimination playoff. But generally speaking I agree with you - I wouldn't want to include smaller tournaments like IQs that are 6 rounds and only ~50 people.
Yeah, the 1.5 actually was kind of a joke.
I guess people have in their mind that "Tier 0" means something specific regarding meta share. That's fine, I don't care. It's just data. But it is a huge jump from GDS/Affinity to ET, etc.
I totally agree that share doesn't tell the whole story and it's important to look at other performance metrics as well. My issue is that this scale uses arbitrary values you've selected without any clear rationale. You've then used that scale in place of a "true" performance metric like MWP/GWP or conversion rate. That's especially problematic because you tell us multiple times your analysis is "cold and emotionless" without "smoke and mirrors," but in reality there's this major personal weighting decision behind your entire scoring system, and no justification for those values.
The metagame share is certainly cold, hard data analysis and it speaks for itself. Once you add the other GP, N will get bigger and those numbers will get even tighter. But your performance metric does not meet those same objective standards. I'd cut that metric entirely, redo it from the ground up so it's not so arbitrary, or rework your presentation of the analysis so you aren't making it out to be something it isn't.
And a pre-determined mathematical formula is objective. Numbers go in, numbers come out. The function could be something different, but the results wouldn't be dramatically different because of the nature of the data.
Also, I don't see how I could possibly be making it out to be something that it isn't when I've broken down the details of the methodology. It's completely transparent and like I said anyone can just ignore the weighted score. Not sure why you're so bothered by the score, seems like you are taking it really personally.
And I'm not removing the scoring, it wouldn't be nearly as interesting without it, but it could certainly change. I'm more likely to add more columns with different metrics plus different types of trend charts than I am to remove anything.
Edit: You mentioned standard deviations, and that is certainly one possible approach. I charted out an example, and the curve didn't have the shape I wanted, so I wrote my own function.
I was wondering if you could just have a box on the spreadsheet showing the most recently added event(s), just in case people are looking at it during a double-header weekend and it's not been updated yet.
Well done, thanks for the effort!
https://docs.google.com/spreadsheets/d/1ZRLhtRToCI2VkJ-xYVwwvYuYtFmCLyMyRH1pbjCdBg8
There are now 13 tournaments in the results. After this weekend, there will be 15, and the data will be maxed out for the 3 month window. I think that's enough data to get a decent picture of the meta.
I'm no longer naming tiers. I did some color coding where I think the "tier" break points are. I also added another level since having 53 archetypes supports it.
The results are interesting. Some things moved quite a lot despite the tournaments being about 7 weeks old now. Dredge was really helped by the GPs. I think its current ranking is actually a lot more accurate; I had thought it looked too low before.
Counters Company moved above ET as it was really strong at the GPs. Counters Company is an interesting case because it seems to be underrepresented in tier lists. I've concluded that it's a product of the deck having lots of different names, and the engines some sites are using just take the name as given, so the results are fragmented. I look at each deck to make sure it matches an archetype without duplication. There are other offenders, but Counters Company is the worst.
Merfolk and Gx Tron moved down. Some recent showings bolstered their ranking, but with a more complete data set we see they actually belong a bit lower down the list, which seems about right to me. I'll leave the old tab up for a bit if anyone who already had looked at that one wants to compare.
Also, I made a slight modification to my scaling function. I wanted a little bump in the curve at the top of the tournament, but maybe it was a bit too much, so I'm now only awarding .5 points for Top-8 wins instead of 1. So decks are still rewarded for winning but will be less likely to jump up a tier just for one really good showing.
When using functions like mean and standard deviation, you have to be careful. If you just throw data at them and accept the results, they can be skewed. For example, if I'm plotting the SD of a deck's finishes and there are lots of placements (like for Affinity), it will work out okay. But if a deck only shows up once and has a better-than-average finish, it will appear better than it actually is. I'm wondering if this type of calculation is what is causing decks like Humans to be perceived as tier 2 when it's clearly untiered in the competitive paper meta.
My scoring function avoids this problem because it is weighted on appearances. Meta share is kind of the dumb version of this method, where every placement is treated as equal. And any mean or average based calculation on the aggregate data also isn't any different from meta share. I'm going to look at ways to apply statistical models to the data to come up with something that makes sense. One option that could be interesting is using probabilities to predict how likely a given deck is to win the next tournament. Any of these things are pretty easy to do, it's just a matter of deciding what makes sense for the data.
Thanks.
It's in a single column now with the date of the last calculated tournament at the top. I'll just add a new column every weekend, push the old stuff to the right.
Big improvements overall and appreciate the work. I'd just go ahead and name them Tier 1 through Tier 3. I know you want to avoid that terminology, but everyone is going to use it anyway and you aren't differentiating your project by omitting it.
I've tried to do similar performance scales in the past. For years I tried to get a scale to work in place of a real MWP/GWP, points, or conversion rate metric, and I've tried similar scales to yours where you're rating higher performance at events slightly more than lower performance. There are a few big issues with this approach. First, as you can see on your sheet, you end up with a score that cannibalizes the meta share ranking too heavily. The result is a list that is ordered almost identically to the meta share list, so it isn't giving a lot of new information.
The second problem is that the score undersells good decks. D&T makes a pretty sizable jump from the Tier 3 meta share bracket to the Tier 2 score bracket based on its score, but that gets lost in the noise of all the other decks that don't move at all.
The third problem is the scores don't compare well. Bant Eldrazi is not a good deck right now, and yet its score is the same as the D&T score, despite D&T being better positioned and having much better performances.
Finally, the score doesn't account for the so-called dartboard effect. If you throw a lot of darts at a board, some are bound to hit. If 30% of the field shows up on Grixis DS and only 15% make it to the top tables, that's still a huge share of the top tables, but a mediocre/bad conversion rate. By contrast, if 3 players played D&T and 2 ended up in the T8, that's a spectacular conversion rate. The score doesn't capture those nuances because, again, it's heavily based on share.
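The dartboard effect described above is essentially a conversion rate. As a sketch (every number below is hypothetical, made up purely for illustration):

```python
def conversion_rate(deck_entries, deck_top_finishes):
    """Fraction of a deck's pilots that reached the top tables."""
    return deck_top_finishes / deck_entries

# Hypothetical: 180 Grixis DS pilots in the field, 10 in the Top 32,
# versus 3 D&T pilots with 2 in the Top 8.
grixis = conversion_rate(180, 10)  # small fraction despite a big top-table share
dnt = conversion_rate(3, 2)        # huge rate, but from a tiny sample
```

A share-based score sees only the numerators here, which is exactly the nuance the post says gets lost.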
A better score would be to look at the points for all decks in all tournaments. SCG and Wizards publish points by player, and you could cross-reference those with decklists to figure out average scores for different decks. That would be a much more objective approach, but I acknowledge not all tournaments provide that information.
Thanks for the comments.
I wouldn't expect all the decks to move, nor do I think anything is lost "in the noise". It's a very small data set. It's pretty easy to switch between the two and see what's going on. D&T is a good example of a deck that's outperforming its meta share, and the point system shows that. Same with Merfolk. And Elves are the opposite - it's placing frequently, but it's clearly not a threat to actually win. The scoring is working as intended.
Some of the metrics you are talking about are just impossible due to lack of data. My goal was to produce the closest approximation to meta truth with the data available, and I think I've done that. If there were better data, I'd be using it.
Edit: Also, I think the score is a better representation of the meta. Following the examples above, I'd have D&T as tier 2 and Elves as tier 3, not vice versa as meta share would have it. And tier 3 isn't bad - they are decks that are performing well.
So I don't want to jump into this argument deeply. But I would like to point out that you are a professional in the stats field. So to you, it may seem super obvious that D&T jumped a tier. But ktkenshinx is correct in saying that stuff gets lost in the noise. When you post online to a large group of people who aren't professionals in your field, you need to take care to present things in a clear and newb-friendly way (I wouldn't discuss music on here and just assume you all had a firm understanding of Fuxian counterpoint rules and part writing).
Your personal scoring metric may seem nice to you, but in presentation it looks like you are telling people it is the real truth because meta share by itself isn't accurate enough. It is your interpretation of the data, and for all I know it may be the most correct interpretation. But it is your interpretation, not raw data. And tiers on this site have long been based on meta share, not ability to win a tournament.
It seems to me that what ktkenshinx is trying to say is that your presentation and discussion make it seem like your interpretation of the data is the cold hard truth. You may not be saying that at all, but it looks to me like that is the faltering point of the discussion. Also, as a sidenote, I would put a lot of worth in what ktkenshinx says about MTG data analysis. He has been analyzing MTG tournament data for a long time and has probably come to the same stumbling blocks as you and would love to help you over them.