I was reading some articles from Frank Karsten a couple of days ago about building a simulator to goldfish decks, and I thought it was a cool enough project that I went about doing the same thing for myself. His code was released for free, but I figured it would take too many modifications to do what I wanted, so I wrote my own instead.
I finished the simulator last night and ran it. Sadly, a Windows update killed the process about 90% of the way through, so some decks were excluded from the results, but those were just some combinations of 28 and 30 land decks, so I'm not too worried about it.
I decided to build the decks out of a 7 card pool; in future updates I can probably expand that pool to around 10-11 cards to add more detail. The 7 cards I went with, and their constraints, were:
Land - A generic rainbow land, min 14, max 30 (simulation ended in the middle of doing 28 land decks)
SmallOne - Birds of Paradise, min 0, max 8
LargeOne - Wild Nacatl (always a 3/3), min 0, max 4
SmallTwo - Watchwolf, min 0, max 8
LargeTwo - Tarmogoyf or Jotun Grunt, min 0, max 8
Three - No specific card in mind, resilient 5 power creature (think Geist or KotR averaged with smaller guys like Smiter), min 0, max 12
Burn - Lightning Bolt, min 0, max 8
For speed reasons there is also currently a step size of 2, so for example Wild Nacatls can only be run at 0, 2, or 4 copies. The difference between a step size of 1 and a step size of 2 is 3,145,728 possible decks vs 14,175. The plan is to use the coarse pass to look for optimal card ranges, then run it again with a narrower focus.
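For the curious, enumerating the search space is the easy part; something like this sketch would do it (the dictionary layout and the 60-card floor are illustrative, not pulled from my actual script):

```python
from itertools import product

# Illustrative (min, max) ranges per card type; step of 2 keeps the space small.
STEP = 2
RANGES = {
    "Land":     (14, 30),
    "SmallOne": (0, 8),
    "LargeOne": (0, 4),
    "SmallTwo": (0, 8),
    "LargeTwo": (0, 8),
    "Three":    (0, 12),
    "Burn":     (0, 8),
}

def deck_configs(ranges, step):
    """Yield every card-count combination that makes a legal (60+ card) deck."""
    names = list(ranges)
    axes = [range(lo, hi + 1, step) for lo, hi in ranges.values()]
    for counts in product(*axes):
        if sum(counts) >= 60:  # my actual legality rule may differ
            yield dict(zip(names, counts))

print(sum(1 for _ in deck_configs(RANGES, STEP)))  # size of the search space
```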
My logic is fairly simple, as is usual when goldfishing. With 1 mana it prioritizes acceleration and then other one-drops; at 2 it prioritizes bigger creatures over smaller ones, with burn soaking up leftover mana; and at 3+ it does the same thing with three-drops as the highest priority. At all stages, if burn is lethal to the opponent post-combat, it burns the opponent first. Each deck configuration (of 60 cards) is run 12,500 times right now; that number can easily be increased in the future with tighter restrictions on deck construction. There is also a maximum of 13 turns per game, but so far nothing has gone beyond 12.
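As a self-contained sketch of that priority ordering (the greedy loop and cost table are a simplification; the real code also tracks the board and runs the lethal-burn check before this):

```python
COST = {"SmallOne": 1, "LargeOne": 1, "Burn": 1,
        "SmallTwo": 2, "LargeTwo": 2, "Three": 3}

def casting_priority(mana):
    """Card types from best to worst for the given amount of available mana."""
    if mana == 1:
        return ["SmallOne", "LargeOne", "Burn"]  # acceleration first
    if mana == 2:
        return ["LargeTwo", "SmallTwo", "LargeOne", "SmallOne", "Burn"]
    return ["Three", "LargeTwo", "SmallTwo", "LargeOne", "SmallOne", "Burn"]

def spend_mana(hand, mana):
    """Greedily cast the highest-priority affordable card until mana runs out."""
    casts = []
    while mana > 0:
        playable = [c for c in casting_priority(mana)
                    if c in hand and COST[c] <= mana]
        if not playable:
            break
        card = playable[0]
        hand.remove(card)
        mana -= COST[card]
        casts.append(card)
    return casts

# With 3 mana, the Three takes priority over SmallTwo + Burn:
# spend_mana(["Burn", "SmallTwo", "Three"], 3) == ["Three"]
```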
The mulligan strategy is different too. Rather than implement something like Frank Karsten's optimal mulligan strategy, I went with a different approach. An optimal mulligan strategy cannot be computed by a player quickly in the middle of a match, so I find it fairly useless. Instead I evaluate a hand by four criteria:
1. Do you have 2+ mana?
2. Do you have 3+ non land cards?
3. Does the hand contain at least one Wild Nacatl or any 2 drop?
If any of the above is a no, it then does one more check:
4. Is your hand size greater than 4?
If that is true, it mulligans.
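In code form, the whole keep/mull check is roughly the following (counting Birds toward the 2+ mana criterion is an assumption on my part):

```python
def should_mulligan(hand):
    """Apply the four criteria above to a hand given as a list of type names."""
    lands = hand.count("Land")
    mana = lands + hand.count("SmallOne")  # Birds counted as a mana source
    nonlands = len(hand) - lands
    early_play = any(c in hand for c in ("LargeOne", "SmallTwo", "LargeTwo"))
    keep = mana >= 2 and nonlands >= 3 and early_play
    return (not keep) and len(hand) > 4    # never go below 4 cards
```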
The results were fairly interesting, and I wonder just how much of that was due to a low sample size. I'm not sure where to upload a spreadsheet containing the best and worst performing decks, so you'll have to settle for a couple of images. What surprised me most were the land counts: the best performing decks had 26 lands plus mana dorks, with a relatively equal split among the other card types.
Oddly enough, the highest performing 22 land deck, which you would think would do well for aggro, came in at #44 on the list with the following configuration (note that it's also over 60 cards; my best/worst screenshots exclude decks over 60).
Lands 22
SmallOne 6
LargeOne 4
SmallTwo 8
LargeTwo 8
Three 6
Burn 8
WinTurn 4.73736
Also interesting was that all of the true best decks went past 60 cards. I wonder how much of this is margin of error from the sample size. Hopefully some other people find this worth discussing, or can propose some changes to my logic to improve things.
An optimal mulligan strategy cannot be computed by a player quickly in the middle of a match
I would argue that's only the case if you intend to hardcode the mulligan logic itself into the simulation.
But then again, I really don't want to suggest implementing a learning algorithm solely to learn which hands are worth keeping because they give a good initial start toward the fastest possible kill, because learning algorithms are a very special kind of black magic bull*****.
But on the other hand a learning algorithm would elegantly fix a bunch of the issues I have with how this whole thing was approached in the first place. See, what I'm thinking is that the real question is determining the best starting hands.
That is, in goldfish land, where optimal play is pretty much definable as winning as soon as possible, can we then go on to evaluate starting hands in terms of how good of a start they give us regardless of the actual goldfishing sequence?
Can we simulate the mulligan decision-making process, and then use that to determine a hierarchy of playable hands?
Come to think of it, if you can quantify hands that way, what about ranking deck builds based upon their optimal hands, narrowing down a 60 card pile that maximizes the odds of seeing certain starting hands... I'm sure I'm overcomplicating the problem at hand, but I've got way too much free time on my hands as is.
Well, there are a few things I can do with mulligans. The first is to implement the scry rule on a pretty simple basis: look at what the hand needs, and if the scried card isn't what it needs, bottom it. That's probably not too difficult, but it comes after the mulligan decision.
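A sketch of what that scry decision could look like (judging "what the hand needs" purely by land count is an assumption for illustration):

```python
def scry_decision(hand, top_card):
    """Return 'top' or 'bottom' for the scried card based on the kept hand."""
    lands = hand.count("Land")
    if lands < 2:                                  # mana-light: dig for lands
        return "top" if top_card == "Land" else "bottom"
    if lands > 4:                                  # flooded: dig for spells
        return "bottom" if top_card == "Land" else "top"
    return "top"                                   # balanced: take the free card
```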
Another thing I could do is change the criteria based on hand size. At 7 you're really looking for an optimal mix of cards, but at 5 you're really just looking for some lands and spells. Right now the logic looks for a perfect hand every time, so at 5 it's still looking for 2 mana and an early game play. Perfectly reproducing a player's logic is a little too complicated, because you ideally want to evaluate hands based on some kind of plan for how the match will play out, like "use both Bolts on their creatures, ride a Geist to victory" (a hand the current logic would consider a failing 5).
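That loosening could be as simple as this (the thresholds per hand size are illustrative guesses, not tuned values):

```python
def should_mulligan_by_size(hand):
    """Looser keep rules as the hand shrinks."""
    lands = hand.count("Land")
    nonlands = len(hand) - lands
    if len(hand) >= 7:                 # demand a real curve at 7
        early = any(c in hand for c in ("LargeOne", "SmallTwo", "LargeTwo"))
        return not (lands >= 2 and nonlands >= 3 and early)
    if len(hand) == 6:                 # lands plus a couple of spells
        return not (lands >= 2 and nonlands >= 2)
    if len(hand) == 5:                 # any mix of lands and spells
        return not (1 <= lands <= 4)
    return False                       # always keep at 4 or fewer
```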
I think the problem with only determining the best starting hands is that you also have to account for topdecks. Even on a turn 4 win after keeping 7, the 3 cards you draw after turn 1 are 30% of the 10 cards you see all game, and some cards are disproportionately affected by that. A Lightning Bolt is a fantastic topdeck, for example, but while the data clearly shows Birds of Paradise to be useful in the opening hand, you definitely don't want to be drawing it on turn 3 or 4.
Well again, in goldfish land we can make some assumptions.
You can modify goldfish land to have an opponent that always plays at least one creature a turn, to better represent the decision-making game states of "they have a blocker, what do", but for the most part you shouldn't have to care about what they're doing because, again, it's goldfish land. By definition they do nothing!
It's both a benefit (simplifies factors, great for quick simulations) and a drawback (makes the usefulness of the resulting data rather questionable in practice, defeating the whole point of the simulation).
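Something like this sketch would already cover it (the 2/2 stats and the chump-block rule are arbitrary choices on my end):

```python
def opponent_turn(blockers):
    """The 'not quite a goldfish' opponent: adds one vanilla 2/2 blocker a turn."""
    blockers.append((2, 2))
    return blockers

def unblocked_damage(attacker_powers, blockers):
    """Crude combat: each blocker chump-blocks one of the biggest attackers."""
    spared = sorted(attacker_powers)[:max(0, len(attacker_powers) - len(blockers))]
    return sum(spared)

# Two blockers eat the 4 and the 3; only the 2 connects:
# unblocked_damage([4, 3, 2], [(2, 2), (2, 2)]) == 2
```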
Anyway I might cobble up something of my own in my free time. Can't say it'll go anywhere but I've got a lot of free time as is and this is as good a use of my college education as I can think of right now. Don't take that to mean I'll actually deliver something, though; computer science is a discipline built upon laziness and theorycraft.
Well, I should probably state my goals with this project as I have three of them.
The first is that I'm interested in the Knightfall deck, which, if you've read the thread at all, you'll see has no consensus build. I figured my approach is just as valid as anyone else's, so I'm attempting to find an optimal skeleton for it, which should explain my card choices.
The second and third are related and are for Legacy, though I suppose they also apply to Modern. Burn is my favorite deck in that format, and I take a very formulaic approach to it. For years I've maintained a spreadsheet that evaluates cards and determines the optimal configuration. Rather than rely on formulas, I would like to try a simulation approach and see if it offers any new insights into deck construction. Ultimately, I would like to find an optimal or near-optimal configuration which can then be played against another deck.
Obviously, making two decks interact is extremely complex, but I think it's possible with certain decks, such as Legacy Burn vs any variant of Delver, or in Modern something like Zoo vs Esper Control. Provided you keep each card simple, that shouldn't be too computationally expensive (blocking seems to be the worst part, so that's what matchups have to avoid).
If you want my code I can provide it; it's in Python. I just have to touch up some comments first, since I'm really bad about writing comments while writing code.
Would there be a way to do this in such a way that you plug in the functions of the top cards played in a format and have it spit out the best "deck" for preventing an opponent from having any relevant interaction, or for maximum relevant interaction with the "top cards"?
Ran another simulation while at FNM tonight. I tightened the card constraints based on the prior best results and increased the number of iterations per game. This one used:
Land - 22 to 28
SmallOne - 2 to 4
LargeOne - 0 to 4
SmallTwo - 4 to 8
LargeTwo - 4 to 8
Three - 6 to 12
Burn - 6 to 8
With 25,000 games per deck. Results of the top 20 decks attached.
The highest performing deck overall was actually 66 cards, which probably says something about the variance involved in just 25k games.
Land - 28
SmallOne - 2
LargeOne - 4
SmallTwo - 8
LargeTwo - 8
Three - 8
Burn - 8
Would there be a way to do this in such a way that you plug in the functions of the top cards played in a format and have it spit out the best "deck" for preventing an opponent from having any relevant interaction, or for maximum relevant interaction with the "top cards"?
Not that I can think of, but my programming skills are at best average. A quick runtime is needed to generate samples large enough to overcome the huge amount of variance in the game (I'm shorting it at 25k games right now, but I'm trying to narrow the ranges down to where I can run 100k iterations per deck), which means things need to be relatively simple. The complex branching lines of play created when two decks actually interact with each other mean that simple is off the table. Going down that route can't produce a large enough sample size, so the variance remains, which makes the approach useless. You would need access to a supercomputer or a large distributed computing project to do it.
As promised, here's the code. The pastebin link will expire in a week; I'm still making updates and would prefer that older versions not sit around forever.
And... next round done. I used very wide parameters this time, with a higher number of iterations per game, because I've been noticing a couple of peculiar things that go completely against typical Magic wisdom, and I'm not sure if it's a legitimate insight, a flaw in my logic (posting my code helps rule that one out), or a case of setting up the simulation parameters to confirm what I've already seen. Most notably, in every simulation run so far the best performing decks have contained more than 60 cards, with the best almost all in the 63-64 range. 64 is a big difference from 61, so it really flies in the face of established Magic theory.
The parameters were:
length = 50,000
step size = 1
Land - 19-28
SmallOne - 0-4
LargeOne - 2-4
SmallTwo - 0-8
LargeTwo - 2-4
Three - 6-12
Burn - 6-14
I've attached three screenshots here. The first is the best results overall, with as many results as would fit on my screen, since I want to show the clear pattern with deck size. The second is the best decks limited to 60 cards. The third is the best after filtering. There is some clear variance taking place, because the best 60 card deck runs 8 Watchwolf and 2 Tarmogoyf (for this simulation Goyf is always 4 power), which is very clearly not correct since Goyf is strictly better in this test. So the final screenshot is the top 20 decks after removing every case where a Watchwolf is run over a Goyf.
I don't think it's worth trying to optimize a Skeleton deck; they're a pretty weak tribe compared to Zombies, and even Zombies aren't very good in Modern.
While I know nothing about coding, I think I can explain why your winning decks have 64+ cards.
First, there is no uniqueness to the cards. In your simulation, the decks contain up to 12 copies of Lightning Bolt. In real Magic you can only play four Bolts, and the rest of your "bolts" need to be Rift Bolt or Shard Volley or other cards, which are all obviously weaker. In real Magic, players are heavily incentivized to play exactly sixty cards so they can draw lots of Lightning Bolts and fewer Shard Volleys. Within the simulation's version of Magic, there's no real difference between a 60 card deck and a 600 card deck: the first might run 8 Lightning Bolts and the second might run 80, but they'd play almost exactly the same.
Further, with a smaller deck you're more likely to draw sideboard cards in games 2 and 3. The simulation doesn't account for that.
I think the simulation may favor larger decks because they can get closest to a "solved" ratio of cards. The ideal configuration might contain, for example, a 1:1 ratio of burn to Birds, a 3:2 ratio of Birds to Nacatls, a 17:13 ratio of Nacatls to Goyfs, and so on. Since your simulation doesn't allow much granularity, larger decks are better able to hit those ratios.
In short, more granularity will make your winning decks have 60 cards.
Also, in an ideal world, copies 5-8 of a card should be somehow worse than copies 1-4. I have to imagine that would be a coding nightmare, however.