Hugo Hacker News

How did so many Dungeon Crawl: Stone Soup players miss such an obvious bug?

dota_fanatic 2021-08-17 03:46:34 +0000 UTC [ - ]

Big fan of DCSS, but this doesn't surprise me. After getting a few all rune wins I stopped playing largely because it is so opaque. Even after spending a LOT of quality time with the (sometimes incorrect) wiki and in-game lookup system, I realized I would never have an intuitive sense of the effects of my choices wrt stats, skilling, weapon and armor choices, and so on. I watched some of the best crawl players in the world play to see how they approached these very common situations and even there I didn't see as much consensus as I hoped to find given common scenarios.

Part of the joy of being human is understanding systems and working with them efficiently to achieve goals. Good luck doing that with crawl. You can sort of achieve this by dumping a lot of time and building up intuition via experience, but you still won't be able to say why you chose this over that for all your choices.

I found myself getting into scenarios, wondering, which choice is better along one particular dimension? STR vs DEX? This armor vs that? What are the trade-offs? The only real way to find that out would be to dump the game at its current state and then simulate both choices against a suite of enemies to see how it plays out at least in 1v1 situations, never mind when handling many enemies. It's frustrating playing a game with so many numbers when the outputs of the systems they feed into are so opaque.
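A minimal sketch of that kind of comparison, assuming a made-up damage model (this is not Crawl's actual combat formula, and the build numbers are hypothetical). It estimates the average number of swings needed to kill one enemy for two stat allocations:

```python
import random
import statistics

def simulate_fight(attack_max, hit_chance, enemy_hp, rng):
    """Return the number of swings needed to kill one enemy (toy model)."""
    swings = 0
    while enemy_hp > 0:
        swings += 1
        # A swing lands with probability hit_chance and deals 1..attack_max damage.
        if rng.random() < hit_chance:
            enemy_hp -= rng.randint(1, attack_max)
    return swings

def mean_swings(attack_max, hit_chance, enemy_hp=40, trials=20_000, seed=0):
    """Average swings-to-kill over many simulated 1v1 fights."""
    rng = random.Random(seed)
    return statistics.mean(
        simulate_fight(attack_max, hit_chance, enemy_hp, rng) for _ in range(trials)
    )

# Hypothetical builds: STR raises max damage, DEX raises hit chance.
str_build = mean_swings(attack_max=14, hit_chance=0.70)
dex_build = mean_swings(attack_max=11, hit_chance=0.85)
print(f"STR build: {str_build:.2f} swings, DEX build: {dex_build:.2f} swings")
```

A real version would need the game's own formulas and a representative set of enemies, which is exactly the information that is hard to get at.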

It's really unfortunate because although it already feels like a gem of a game, it could be so much better if the interplay of the various combat / character systems weren't so ridiculously complected. I would love to see some of the sensibilities of the Factorio team applied to crawl's UI and underlying combat/character systems.

eeegnu 2021-08-17 04:48:26 +0000 UTC [ - ]

It's worth noting that at least in recent versions, the in-game formulas have been getting simplified. For instance, armor used to give you both AC and GDR (guaranteed damage resistance), whereas now GDR is a direct function of AC regardless of its source. Similarly, DEX now influences Evasion more directly: before, it was artificially capped to give very diminishing returns past ~25 DEX, whereas now it allows you to get very high Evasion.

dota_fanatic 2021-08-17 13:43:55 +0000 UTC [ - ]

That's good to hear. I hope they keep going down that path. To refer to another game, in any of the Soulsborne games (Dark Souls, Bloodborne, ...) by From Software, as you level up you get to contribute points towards stats, which affect your ability to wield weapons and how much damage they do. You don't even have to equip them: just by inspecting a weapon, the game gives you a diff of what your new damage will be along various dimensions (physical, magic, etc.) as well as how your "equip load" is affected. Although this doesn't necessarily mean you can understand the difference between a really heavy, highly damaging weapon vs a fast weapon, by reading a wiki or by playing through once or twice you'll have a very good base understanding of what different weapons bring to the table and how your character drives those strengths/weaknesses.

With DCSS? Well... how much damage you do is going to depend on whether or not you're wearing a shield (what size?), what kind of armor you're wearing if any, how much you've trained for that weapon class, how much STR you have, what bonuses it has, and what bonuses you have from gear. Then RNG will play a significant role on top of that. Maybe RNG applies twice, once for your swing, once for their defense? It's been too long, I can't say confidently.

I realize it's challenging for the interface design because everything is already tight but I'm sure there's a solution. If I equip this armor/weapon/jewelry, how fast will I be attacking compared to now? What will my damage distribution look like before enemy resistances? Same with allocating stats, please give feedback in the form of derived data that is closer to the thing I care about (surviving).

harpiaharpyja 2021-08-17 14:51:24 +0000 UTC [ - ]

I don't think the problem is complexity as much as it is opaqueness. I have played other games that are also very complex but yet still very enjoyable because the complexity is open to be examined.

kibwen 2021-08-17 17:10:24 +0000 UTC [ - ]

The opacity is a remnant from the game as it was designed twenty years ago, which took cues from roguelikes of the day such as NetHack (which is famous for its opacity). The developers of Stone Soup have spent a few decades improving the interface with the goal that you shouldn't need any external wikis or resources to win: all info should be in the game itself. For example, a relatively recent-ish change is that you can inspect enemies and see a list of all possible spells that they can cast, along with estimates of the damage that each spell will do. A much older feature is the in-game help menu, which is impressively thorough and will give you a high-level overview of every monster, spell, stat, ability, item, god, level, status effect, and so on (although it's not necessarily super-detailed when it comes to obscure things like calculating damage; simplifying these calculations is the only way to go).

tsywke44 2021-08-17 08:37:26 +0000 UTC [ - ]

That actually sounds nice, compared to most other "competitive" games, where there is an optimal meta way of playing and doing anything else is basically trolling and an instant loss.

dota_fanatic 2021-08-17 14:00:14 +0000 UTC [ - ]

In some sense, yes, but in another, not so much. Crawl is quite difficult and you're always a few steps away from death if you don't react to different situations appropriately. It can take players months to years to get their first win. Some people are never able to beat it before giving up on it. There's still an optimal way to play in most situations, but many couldn't tell you what that optimum is beyond some rules of thumb which aren't always right.

Don't get me wrong, there's a lot of gold in those hills and I had a lot of fun getting better and getting my first win. But after that I was very frustrated with how hard it was to just stop at a fork in the road and try to answer, which choice is objectively better? I like difficult games that are fair and require skill growth to succeed in, but I want to be able to understand them so I can make informed choices as I get better as well. Too often the answer is: because that's what the black box determined. Sorry, you died. Should have been more cautious.

gambiting 2021-08-17 12:28:51 +0000 UTC [ - ]

Isn't this what this is though? There are certain builds where if you follow them to a T, then the game is "easy" and you win almost always. If you just experiment then yeah you can win by some crazy stroke of luck, but most likely you will just have a bad time.

I had the same experience in Caves of Qud (which is great btw) - there are some builds that really really really work, and others which are unwinnable.

spywaregorilla 2021-08-17 15:27:03 +0000 UTC [ - ]

No, not really. Very few builds will let you win the game easily. Situational awareness is required. Relatively few enemies in the game are just melee brutes. Most of them occupy some niche that requires you to think. They can confuse your character, or lock all the doors, or summon, or irresistibly cut your health in half, etc. An exceptionally fortunate character, say, a kung fu octopode transmuted into statue form and wearing 8 magical rings of power can probably karate chop their way through 99.9% of the game without much of a worry, but may still lose in a 1:1 slugfest with certain lords of hell.

There are easier builds and there are harder builds, but if you aren't good, you will lose. Case in point, the bug described here effectively doubled damage output and win rates were still pretty low.

kibwen 2021-08-17 17:18:49 +0000 UTC [ - ]

Although it's worth noting that there are different degrees of "winning the game". Only three runes are required to win, but maximizing your score and prestige will require you to get all fifteen runes. If you're just going for a three-rune victory, then there are more-or-less "safe" builds. For example, a Minotaur of any melee class, or a Berserker of any species that isn't completely inept at melee can reliably win a three-rune (not a guaranteed victory obviously, you still need to know what you're doing, and most importantly when to run away), although these characters are going to have a very hard time in the extended endgame without some sort of pivot.

lmohseni 2021-08-17 13:16:40 +0000 UTC [ - ]

There’s a great YouTuber named Ultraviolent4 who plays lots of “unwinnable” or maybe just really odd combinations, and watching his games really improved my own.

slingnow 2021-08-17 15:58:46 +0000 UTC [ - ]

Sounds like the game isn't made for you then.

This would be like attempting to play Go, and complaining that it's nothing like Chess because the optimal move generally isn't obvious even after 10,000 hours of play. And then loosely tying it into "part of the joy of being human" instead of simply stating that "this is my preference".

dota_fanatic 2021-08-17 19:19:21 +0000 UTC [ - ]

Re your first paragraph, sure, and I'm fine with that. But the following analogy is quite off.

Go read their manifesto: https://github.com/crawl/crawl/blob/master/crawl-ref/docs/cr...

Specifically the section titled "Clarity".

> Things ought to work in an intuitive way.

Spoiler alert, they most certainly do not. There's nothing intuitive about being completely unable to know the practical effects of something as simple as equipping a piece of armor or a shield or allocating a point to STR vs DEX. I gave an example in another comment of how other games do provide information when comparing between options so you can intuit their effects. I have faith they'll get there, though. I believe they'll gain many more players if they can, too. :) I'm sure I'll be back in a year or two after they improve the gameplay more.

gnramires 2021-08-17 16:27:48 +0000 UTC [ - ]

Yes, there's an interesting distinction here. Although I'd say it's more like Chess than Go, in the sense the mechanics are really simple in Go, while Chess has specific rules for each piece and you should memorize openings and quirks to get an edge (Go is much more about generalization).

In reality, both Chess and Go and all board games are fairly simple, you can learn all the rules in a few sittings and then just focus on strategy.

In roguelikes, often learning the system is the point (or a point) of the game. The fun, at least to me, is in dealing with the unknown and discovering how it works and what works.

Nethack (a classic roguelike) takes it to the extreme -- a bit too much in fact, because an essential game mechanic is an obscure passage you'd have to deduce (Elbereth). But the fun is there: you grab a potion and have no idea what happens if you drink it -- maybe you will be killed instantly, maybe it will make you into a god -- it's not just "which of those effects am I going to get?". You have to rely on previous knowledge, as well as expectations from similar games and what you know about the game.

But isn't that quite common IRL? Like, okay, figuring out what you should be eating has a science to back it (albeit quite extensive): you read about the most effective diets and what risks they reduce, consider your health and your weight, and need to make a decision. Already not a very simple strategic decision. And then you need to take into account satisfying your tastes, how often diet fads reverse (is eating butter healthy? go figure), how to do it when sharing a meal with other people, and so on. The unknowns are as significant as skillful strategic planning. The same could be said of running a company: you need to manage uncertainties in a vast decision space, with opportunities like calling for partnerships, merging, buying other companies, spending on uncertain research, and much more. You slowly acquire domain-specific knowledge (with help from the vast literature, plus a few insights of yours) and use that to in turn reduce and manage uncertainties and plan ahead (here your model of other players and capability to find the 'best move' -- least risky, greatest payoff, more robust -- will shine).

hinkley 2021-08-17 19:27:47 +0000 UTC [ - ]

The term common in MMOs is theorycrafting, and it gained particular momentum in an era where some common games allowed for addons that could export telemetry data. Contrived or common scenarios could be mined (Monte Carlo simulations if you will) to determine how patterns lined up with theories.

Gunax 2021-08-17 08:06:48 +0000 UTC [ - ]

I don't think this is player stupidity at all. Actually, I suspect it's perfectly rational.

The game is so noisy that no one can really predict what will happen. If I hit the grue with my sword--it could reasonably do 10% of damage or 40%--and both seem plausible.

In a statistical sense, the greater the standard deviation is, the more difficult it is to detect a sample population that differs from the established distribution.

In a Bayesian sense, the players may have been perfectly rational--doubling or tripling your winrate is just not very unlikely when you're playing a game which is very random. Perhaps players can rationally conclude they were in the top 1% of luck at the moment, and that the chance of a major bug is less than that.

As evidenced by the referenced posts, it only becomes obvious when the sample size is huge (eg. the sample of all players reading the subreddit).
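The point about standard deviation can be made concrete with the textbook sample-size approximation for a one-sided z-test. This is only a sketch: the significance level, the power, and the damage numbers in the usage line are illustrative assumptions, not anything measured from the game.

```python
import math
from statistics import NormalDist

def samples_needed(sigma, delta, alpha=0.05, power=0.8):
    """Approximate observations needed to detect a mean shift of `delta`
    in noise with standard deviation `sigma` (one-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # false-alarm threshold
    z_beta = NormalDist().inv_cdf(power)       # chance of catching a real shift
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Same shift in mean damage, increasingly noisy hits (hypothetical values).
for sigma in (5, 10, 20):
    print(sigma, samples_needed(sigma=sigma, delta=5))
```

The required sample grows with the square of sigma / delta, so doubling the noise roughly quadruples how many hits you need to watch before a shift stands out.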

ajuc 2021-08-17 09:51:40 +0000 UTC [ - ]

Yes. It's very similar to when you encounter a compiler, operating system, or hardware bug as a programmer.

You are trained for years to NEVER BLAME THE LOWER LAYERS because that makes you a worse programmer. So when it is actually their fault - you might look stupid for exhausting every other possibility before checking these basic assumptions.

But this is still a good rule, because it breaks very rarely.

Retric 2021-08-17 10:33:56 +0000 UTC [ - ]

Reminds me of the utter hell it was to discover the test machine had bad RAM.

ajuc 2021-08-17 10:43:21 +0000 UTC [ - ]

I've had a bug in a Linux driver for a very shitty touchscreen we used in our embedded devices. It needed manual calibration and you had to save the calibration matrix in a config file read on system startup. They used a locale-aware input function in the driver so when the locale was German (or another with "," as a decimal separator) it broke and users got a kernel panic at startup :)

It bricked all our devices in Germany.

We worked around it, notified the device driver developers and forgot about it.

A year later we shipped with a new version of the kernel and it broke again - because the bug was fixed in the kernel and our workaround was now causing the bug :)

hinkley 2021-08-17 19:34:43 +0000 UTC [ - ]

This is part of what drove people to CI/CD. If your code is broken in the morning and the build is green, it’s probably you. If it’s red, maybe you should check other things sooner?

jerf 2021-08-17 13:33:58 +0000 UTC [ - ]

One of the most interesting insights I retained from my machine learning course from a couple of decades ago is that you can put a mathematical bound on how quickly you can learn from a given set of data, and how much you can learn. In the situations presented by the real world this is often not very interesting because we can only process such a vanishing fraction of the data available that we're so far below this theoretical optimum that it's uninteresting, but in more constrained situations it can be relevant.

In this case, while I agree that there was enough data to have a good guess that something had gone wrong after a certain period of time, it definitely is the sort of noisy data that would take some non-trivial aggregation to be sure, and even in hindsight I'm not so sure the signal was so strong that it's worth metaphorically beating anyone up about.

Moreover, only in hindsight is it obvious that the culprit was a doubling of melee damage. Prior to knowing that, the learning rate was constrained by the multiplicity of possible theories that could have explained this. It's not like it was simply a matter of "Either 1. the game is unchanged and people have just Got Gud or 2. the melee damage rate has been doubled", the span of possibilities was much larger than that. Had an oracle come down from wherever and offered that choice to the devs, sure, gathering enough data to resolve that one bit's worth of information would have been easy. But presented with the full, massive array of possibilities, it's a much harder task to even determine that there is a problem, let alone what it is.

Upshot, don't beat yourselves up too much. It may be obvious in hindsight but these are really quite hard problems to solve in the deliberate absence of information.

eutectic 2021-08-17 11:22:57 +0000 UTC [ - ]

Assuming your win rate goes from 2% -> 4%, you need around 200 samples to be 90% confident that something is up, 270 to be 95% confident, and 450 to be 99% confident, if you start with a Beta(2, 98) prior. Obviously results will change somewhat depending on the model.
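For anyone who wants to try their own numbers, here is a minimal sketch of one way to compute the posterior probability that the win rate exceeds 2% under a Beta(2, 98) prior. It is not necessarily the exact procedure behind the figures above; the function name and the 8-wins-in-200-games usage line are illustrative assumptions.

```python
from math import comb

def prob_rate_exceeds(threshold, wins, games, prior_a=2, prior_b=98):
    """Posterior P(p_win > threshold) after `wins` in `games`, Beta prior.

    For integer parameters, the Beta(a, b) upper tail at x equals
    P(Binomial(a + b - 1, x) <= a - 1), so no special functions are needed.
    Intended for small win counts; very large counts can overflow floats.
    """
    a = prior_a + wins               # posterior "wins" parameter
    b = prior_b + (games - wins)     # posterior "losses" parameter
    n = a + b - 1
    return sum(
        comb(n, i) * threshold**i * (1 - threshold) ** (n - i) for i in range(a)
    )

# Illustrative outcome: 8 wins in 200 games, i.e. a 4% observed rate.
print(prob_rate_exceeds(0.02, wins=8, games=200))
```

The "samples needed" figure then depends on how the random number of wins is averaged over, which is one reason results change with the model.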

gnramires 2021-08-17 12:28:21 +0000 UTC [ - ]

From the article:

> Parthenocarpy:

> Well that explains me going from a 2% winrate to 17.95% in the span of two weeks

In that case, as soon as you got more than 1 win you should notice an abnormality (according to a back-of-the-napkin estimate, about 8 trials are enough to perceive the difference, using frequentist statistics).
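A rough sketch of that frequentist check, assuming the old 2% winrate as the null hypothesis (the function name is just for illustration): how surprising is it to see at least a given number of wins in a given number of games?

```python
from math import comb

def p_value_at_least(wins, games, base_rate=0.02):
    """P(X >= wins) for X ~ Binomial(games, base_rate)."""
    # Subtract the probability of seeing fewer than `wins` wins.
    return 1.0 - sum(
        comb(games, i) * base_rate**i * (1 - base_rate) ** (games - i)
        for i in range(wins)
    )

# Two wins within eight games, if the true winrate were still 2%.
print(p_value_at_least(wins=2, games=8))
```

A small value means the old winrate is a poor explanation for the streak; it says nothing about whether the cause is a bug, luck, or getting better.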

Also, that assumes you only get information from win/loss. In reality, every interaction (monster damage) should provide information (too easy kills, abnormal damage).

eutectic 2021-08-17 15:11:05 +0000 UTC [ - ]

For that drastic a difference you need to play about 30 games on average to be 99% sure that p_win>0.02 in the Bayesian model.

Agreed that win rates are not the only data available, and that players probably should have noticed the difference. Attributing it to a bug is arguably a harder problem.

cridenour 2021-08-17 03:19:14 +0000 UTC [ - ]

> Computers are warm rocks we tricked into doing math and it’s a miracle they do anything.

This might be my favorite way to describe programming.

kibwen 2021-08-17 05:26:13 +0000 UTC [ - ]

Computers are literally (literally) magic. Take a rock, use light to inscribe it with arcane runes, then infuse it with lightning and recite the proper incantation to complete the spell.

notanzaiiswear 2021-08-17 09:14:18 +0000 UTC [ - ]

I often wonder about a fantasy world where magic works like technology in the real world (like there is long arduous research to create new spells, not simply "being born with magical talent"). Would it become a popular fantasy setting?

thaumasiotes 2021-08-17 10:06:51 +0000 UTC [ - ]

The Obsidian Trilogy (Mercedes Lackey and James Mallory) is set in such a world, though it also features "wild magic" (you petition the wild gods for an effect, and you probably get it, but you take on an open-ended obligation to do something for them at a later point) and demonic magic.

Tom4hawk 2021-08-17 11:31:00 +0000 UTC [ - ]

> you petition the wild gods for an effect, and you probably get it, but you take on an open-ended obligation to do something for them at a later point) and demonic magic

So.. technical debt and "hacks" ?:D

tygrak 2021-08-17 10:05:55 +0000 UTC [ - ]

Pretty much what you are describing is Ra by qntm [1]. I would definitely recommend it!

[1] https://qntm.org/ra

TremendousJudge 2021-08-17 15:56:27 +0000 UTC [ - ]

Was going to recommend this

gerdesj 2021-08-17 09:33:56 +0000 UTC [ - ]

Sir Terry Pratchett often explored something similar to that in his Discworld novels but perhaps in reverse - who knows!

"Anthill Inside" 8)

solarmist 2021-08-17 21:11:55 +0000 UTC [ - ]

A comedy series takes this literally. Magic 2.0 by Scott Meyer.

On some computers there's a mysterious file that has all of the (world/universe's)? object variables, such as a person/object's location or height, age, etc, and magic is performed by manipulating those via verbal macros.

New spells are researched by literally programming and debugging.

It's a fun series, but it feels like the author's done about as much as he can with it.

wccrawford 2021-08-17 15:53:41 +0000 UTC [ - ]

I read a litrpg book in which the main character got a power that was basically programming. He could modify objects to do certain things based on triggers.

It was quite entertaining. But most of what he did was either off-the-cuff, or the book hid the hours and hours of testing and research so that it wouldn't bore the reader.

Unfortunately, I couldn't find the name of the book in my reading history.

notanzaiiswear 2021-08-18 09:43:28 +0000 UTC [ - ]

Now that you mention it I seem to remember a book where a programmer beams himself into a world where he can then live like a wizard because he can affect it with his programming. Also can't remember the title, but it was a fun read.

wccrawford 2021-08-18 12:02:37 +0000 UTC [ - ]

I found it!

Apocalypse: Generic System (Systems of the Apocalypse Book 1)

I haven't read the sequel yet, though.

whatshisface 2021-08-17 15:09:11 +0000 UTC [ - ]

That's basically how it works in Harry Potter. That's why you have to go to school for magic.

notanzaiiswear 2021-08-17 16:52:52 +0000 UTC [ - ]

There is not a lot about discovering new spells, though? Possible exception Snape's remarks in his notebook on potion brewing.

And in HP people have the innate talent for magic.

whatshisface 2021-08-17 19:56:57 +0000 UTC [ - ]

People can have more or less talent at technology, although it's usually called "intelligence."

vidarh 2021-08-17 11:42:34 +0000 UTC [ - ]

Recent Dr Strange has him learn how to forge magic items in a somewhat structured way after running into an alien who approaches magic pretty much like technology.

eggsome 2021-08-17 09:18:56 +0000 UTC [ - ]

Wizard's Bane (1989) is a fun take on that.

canadianfella 2021-08-17 05:59:59 +0000 UTC [ - ]

Not literally

jholman 2021-08-17 06:45:48 +0000 UTC [ - ]

The traditional extension of this joke is:

"Hey. Don't oversimplify. First you gotta hammer them flat and trap lightning inside them."

meowster 2021-08-17 04:16:58 +0000 UTC [ - ]

The energy that made those rocks warm came from spinning other rocks really fast.

teruakohatu 2021-08-17 04:50:23 +0000 UTC [ - ]

I might be missing the joke, but it mostly comes from burning rocks (coal), shining sunlight on rocks, water flowing downhill or wind/gas/steam spinning metal.

tsimionescu 2021-08-17 09:39:05 +0000 UTC [ - ]

With the exception of photovoltaic cells, all the other ways we generate electricity ultimately generate the electricity by spinning a dynamo - directly for hydro and wind, and through boiling water to steam for fossil, most solar plants (which use mirrors to heat water in pipes), nuclear, geothermal. Not sure if fusion would generate the electricity directly or also heat up water to spin up some magnets.

kaibee 2021-08-17 14:05:09 +0000 UTC [ - ]

Fusion uses the extreme heat generated by the fusion reaction to... boil some water and run it through a turbine.

p1necone 2021-08-17 06:05:27 +0000 UTC [ - ]

All of those energy sources are then somehow used to spin rocks (magnets) really fast to generate electricity, though.

ben_w 2021-08-17 07:35:53 +0000 UTC [ - ]

All but one of them, though PV is quantum magic and rather more mind-bending for most people than non-Euclidean geometry was for Lovecraft.

meowster 2021-08-17 05:00:50 +0000 UTC [ - ]

Electricity comes from spinning lodestones (magnets).

mrob 2021-08-17 10:51:06 +0000 UTC [ - ]

Big generators in power stations typically use field coils (electromagnets), not permanent magnets.

fishtoaster 2021-08-17 06:27:21 +0000 UTC [ - ]

I’m not 100% sure, but I think the origin of that is https://twitter.com/daisyowl/status/841802094361235456?s=21

hermitdev 2021-08-17 04:51:49 +0000 UTC [ - ]

Love this gem, too.

kleinsch 2021-08-17 03:26:39 +0000 UTC [ - ]

I play online games that have variable winrates bc you’re playing with a match made team. If I go on a 5 game win streak, did I improve? Is the game bugged? Nah, I probably got lucky and got good teammates. If I go on a 5 game loss streak, that’s definitely what I’m telling myself.

There’s an element of luck in roguelikes too. It’s part of what makes both types of games compelling. The outcome is uncertain when you sit down to play.

So are there psychological traits that make players likely to miss this bug? Maybe. But on an individual basis, without aggregated winrate data, how would I know it’s not just a lucky winstreak? Especially when I’m playing a style of game where that’s the point?

thaumasiotes 2021-08-17 03:53:18 +0000 UTC [ - ]

> Since time travel is impossible even for Apple, the past should remain constant no matter how far ahead in time you’re looking at it. For example, you look the same in your 2010 yearbook whether you’re reading it in 2011 or 2021. The time you’re reading the book doesn’t have any bearing on what the book says.

A lot of historians and archaeologists would be ecstatic if this were actually true.

rozab 2021-08-17 11:42:55 +0000 UTC [ - ]

Am I missing something or has this comment slipped in from another thread?

thaumasiotes 2021-08-17 12:40:48 +0000 UTC [ - ]

Oh, you're right - it's from the essay previous to this one, https://desystemize.substack.com/p/desystemize-6 .

Jarwain 2021-08-17 05:27:01 +0000 UTC [ - ]

Technically wouldn't the main impact of time be not what the book says, but how it's interpreted? Barring issues from degradation ofc

slim 2021-08-17 05:45:17 +0000 UTC [ - ]

Communication = context + information. There's no communication without context. So there's no reading without interpretation directly tied to the context of the reader. You can't read the same book twice without it being different.

goodcanadian 2021-08-17 08:37:47 +0000 UTC [ - ]

Also, over time, the book wears out, so someone makes a copy. Mistakes and misunderstandings creep in. Or, even, the copyist changes a few things to better fit his world view. It is possible that the book isn't as static as you think.

tsimionescu 2021-08-17 09:49:47 +0000 UTC [ - ]

As others have pointed out, degradation and imperfect copying are serious issues, they can't really be ignored. Also, language itself changes over time, adding significant difficulty to understanding what the book was supposed to have said. And of course, metaphor and references to then-well-known people or events become unrecognizable very quickly.

thaumasiotes 2021-08-17 09:34:58 +0000 UTC [ - ]

Degradation is a very serious, and inherently time-based, phenomenon.

soared 2021-08-17 03:55:44 +0000 UTC [ - ]

In my day job I manage ~8 deep learning models that make decisions automatically. I adjust 3 inputs, and the models output a lot of data that I use to make adjustments the next day. The issue is.. it's incredibly muddy. Making the same change across models, according to documentation, will produce similar results.

Having only had this job for a couple months I honestly cannot tell if the input -> output is just very very muddy, or if the documentation is wrong and the models all behave differently. Similar to Crawl I can't see behind the curtain.. my inputs are generic "factor 1", "factor 2" and my outputs have more meaning but are subject to unknowable fluctuations from outside sources.

> holding the system constant enough that you can evaluate your changes over time

I do not know if I can learn from the system because I'm not sure if it's trustworthy. Good thing there are directions on how to test it included in the post :)

eeegnu 2021-08-17 05:07:20 +0000 UTC [ - ]

To give another example of a bug in DCSS that went unnoticed for a while: there's a spell called Ozocubu's Refrigeration that for 6 months [1] dealt 1/3 the damage that it should have. It was considered very weak with a steep drawback before this (for a short duration after casting, you couldn't drink any potions), so it was ignored, and no one really questioned why it was useless.

[1]: https://github.com/crawl/crawl/commit/ab847a317b82e2fb0316bf...

ballenf 2021-08-17 09:54:58 +0000 UTC [ - ]

> ... remember that the Systemic Stability Principle is completely false, and just happens to be a false belief that it’s often useful to hold. A system being stable in the past doesn’t mean it’s stable now, and evidence that the Systemic Stability Principle works is not the same thing as evidence that it’s right.

That is, the only way you can get better at a game like this is to assume that the system is stable (otherwise you'll attribute failure to bad luck instead of bad strategy).

> It’s also important to avoid refuting observations by appealing to system definitions. If someone says “This feels like it’s doing more damage than it says”, resist the impulse to say “Nope, it does exactly this value; it’s written right here.” Instead, try to design an experiment that would prove whether the value written in the system is correct. If designing the experiment is very hard, that should be interpreted as a risk factor - if no one can check that the system is doing what it ought to, then maybe it really is wrong! Play yes-and with systems skeptics, letting them invest their time into correspondence work if they think something is wrong. Or be the system skeptic yourself, if a certain observation sits wrong with you.

That skill is a superpower in software development when dealing with complex systems that behave unexpectedly.

throwawaygal7 2021-08-17 15:38:53 +0000 UTC [ - ]

I'm a former DCSS player. I actually love the opacity and lack of documentation in the game. One of the truly enchanting elements of older roguelikes was the randomness, undocumented features, and various other things along those lines. It has been very sad to see crawl lose a lot of these elements as the pace of development has increased... The new player base wants everything to be optimal and so has gradually militated for the removal of the more unique and strange elements of the game play.

Simplified combat and magic, removal of extraneous skills, crippled food system... soon no doubt they'll try to 'balance' the mutation system which is the last of these old school roguelike features to still maintain a large presence.

I think the dispute is generational... I love crawl and had some modest victories but never defined myself around it. to me it was a distraction and just a fun game to sink time into in undergrad - not something to be relentlessly optimized.

kibwen 2021-08-17 17:25:33 +0000 UTC [ - ]

Try any of Crawl's zillion forks that add all sorts of weird things. Or you can just play old versions; even the official servers still host versions from over a decade ago.

zijoud 2021-08-17 05:03:19 +0000 UTC [ - ]

Pretty funny.. I happened to have picked up DCSS during this bug and I'm just now learning about it. http://crawl.akrasiac.org/scoring/players/zijoud.html

I was only playing the game for a week before I had a "streak" of wins, which is something you shoot for. And it was all during a tournament. I felt cool then, but not now.

personjerry 2021-08-17 02:23:31 +0000 UTC [ - ]

Seems like having tests would've easily solved this issue.

Supermancho 2021-08-17 02:29:05 +0000 UTC [ - ]

The question is "why didn't the players notice" not why the developers didn't notice.

The article leads with a clickbait question.

The reason they didn't notice is that the stats needed to see the sharp uptick in wins aren't readily available and public to everyone, and the game is filled with opaque systems. No player playing for the first time would notice or have a chance to notice.

To put it another way, this is an article about a (secret and mistaken) massive change to a very difficult game that made it slightly less difficult. Same article turns around and asks why nobody noticed, like that's an interesting question.

kibwen 2021-08-17 05:44:21 +0000 UTC [ - ]

> the stats aren't readily available and public to everyone

The stats have always been public and instantly available, impressively so. Here's the bot command to query the stats in question:

    !lg * start>"2015-03-06 00:00:00" start<"2015-03-21 00:00:00" cv=0.16 / won

Output:

    1180/39437 games for * (start>'2015-03-06 00:00:00' start<'2015-03-21 00:00:00' cv=0.16): N=1180/39437 (2.99%)

So that's a 3% winrate for the major version in question during the dates the bug existed.

Using another query to determine the all-time winrate across all versions:

    !lg * / won

We see the historic winrate to be about 1%.

The listgame interface is documented at https://github.com/crawl/sequell/blob/master/docs/listgame.m... , and the bots can be found on IRC and Discord.

thaumasiotes 2021-08-17 03:41:59 +0000 UTC [ - ]

> The question is "why didn't the players notice" not why the developers didn't notice.

No, lots of players noticed. The question is phrased as "how did the players miss this?", but that's not what the author means - what he means is "why didn't the community come to a consensus on what had happened within two weeks of the introduction of the change?"

Which is a much stupider question.

mandmandam 2021-08-17 07:54:55 +0000 UTC [ - ]

> Which is a much stupider question.

I don't think it's a stupid question at all; not when the effect is so large. It was literally double what it should be, leading to a 3x overall win rate.

Coming from Dota 2, if there was a bug like this it would be caught within minutes, if not seconds, and patched within an hour or two.

- Yes, the Dota player base is orders of magnitude larger, and the damage numbers are easily available. Also I have never played Crawl.

tsimionescu 2021-08-17 10:00:41 +0000 UTC [ - ]

It's not a stupid question at all, it's actually a good pretext to discuss the difficulties inherent in reasoning about an opaque system you are interacting with.

The article also brings evidence that many players didn't notice or believe that anything changed, they specifically believed that nothing did change, which is a surprising belief for such a huge change - but understandable in the context of such an opaque system. I also believe that other players noticed something had made the game easier, but no one realized what specifically had changed, or most likely suspected how big of a change it was.

thaumasiotes 2021-08-17 10:11:39 +0000 UTC [ - ]

> The article also brings evidence that many players didn't notice or believe that anything changed, they specifically believed that nothing did change, which is a surprising belief for such a huge change

Not really. The article cites responses in online discussion to that effect. But there is no evidence that those responses came from someone who had ever played the altered version of the game. With certainty, many of them didn't. You don't have to do a playthrough on the latest version before leaving a new comment in a forum thread.

It's a safe bet that the majority of the player base never encountered the bug at all, since it was only available if you downloaded the game within a two-week window.

djmips 2021-08-17 02:56:22 +0000 UTC [ - ]

I thought it was an interesting question followed by an interesting take. There was a bit of ego salve in the conclusion but it seemed plausible.

hnxs 2021-08-17 02:33:57 +0000 UTC [ - ]

“Slightly less difficult” is a huge understatement. I don’t know specific stats, but I’d say the average player had their winrates at least double (eg 2% to 4%)

cortesoft 2021-08-17 03:12:28 +0000 UTC [ - ]

It is not easy to notice a change of 2% success to 4% success without a lot of samples.
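To put a rough number on "a lot of samples", here is a minimal sketch using the standard normal-approximation sample size formula for a one-sided test of a proportion. The 2% and 4% rates come from the comment above; the 5% significance level, 80% power, and the function name `games_needed` are my own assumptions for illustration.

```python
import math
from statistics import NormalDist


def games_needed(p_old, p_new, alpha=0.05, power=0.80):
    """Approximate number of games before a one-sided z-test would
    reliably detect that the win rate rose from p_old to p_new."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    spread = (z_alpha * math.sqrt(p_old * (1 - p_old))
              + z_power * math.sqrt(p_new * (1 - p_new)))
    return math.ceil((spread / (p_new - p_old)) ** 2)


if __name__ == "__main__":
    print(games_needed(0.02, 0.04))
```

Under those assumptions the answer lands in the hundreds of games, which is far more than most individual players would log inside a two-week window.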

muzani 2021-08-17 04:14:13 +0000 UTC [ - ]

This is an interesting idea. Let's say there's a new feature where electric damage and cold damage ignore metal armor. But suddenly there's a bug where if the weapon does both electric and cold damage, the physical damage also ignores armor.

How would someone design tests that work well for roguelikes? It's not simply Human strikes Orc for 10 damage. Damage range may be a bell curve of 5-15. There's misses and hits. There's little bonuses that increase the miss probability, probably to ridiculous levels. There's armor, calculated from things like race, constitution, magic, divine bonuses, class mastery bonus.

And then you have procedurally generated... stuff. Equipment is the easiest to control. Some games have procedurally generated monsters, some have procedurally generated deities. To be able to reduce this stuff to tests kind of defeats the purpose of doing it, which is to be unpredictable.

Sometimes you have a combo of things that add up to 40% damage resistance. Is 40% too much or too little?

After all this, how do you detect that the bugged spear is doing an unreasonable amount of damage?

personjerry 2021-08-17 04:28:55 +0000 UTC [ - ]

In cases like this we use mock functions to replace the rng dependent functions during testing.

https://en.m.wikipedia.org/wiki/Mock_object
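As a rough illustration of what that looks like, here is a sketch in Python using `unittest.mock`. The `attack_damage` formula is entirely made up for the example and is not Crawl's actual combat code; the point is only that once the dice are pinned, the arithmetic around them can be asserted exactly.

```python
import random
from unittest import mock


def attack_damage(weapon_base, armor_class):
    """Hypothetical formula: roll 1..weapon_base for damage, then
    subtract a 0..armor_class armor roll, never going below zero."""
    damage_roll = random.randint(1, weapon_base)
    armor_roll = random.randint(0, armor_class)
    return max(0, damage_roll - armor_roll)


def test_max_roll_against_min_armor():
    # Pin the two randint calls to 10 (damage) and 0 (armor), in that order.
    with mock.patch("random.randint", side_effect=[10, 0]):
        assert attack_damage(10, 5) == 10


def test_armor_never_makes_damage_negative():
    # Worst damage roll against best armor roll should clamp to zero.
    with mock.patch("random.randint", side_effect=[1, 5]):
        assert attack_damage(10, 5) == 0
```

A bug that doubled the damage roll would fail the first test immediately, regardless of how rare the in-game situation is.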

muzani 2021-08-18 07:30:21 +0000 UTC [ - ]

This won't catch the procedurally generated edge case though unless it mocks every possible combination. It could well run into millions of scenarios just to test damage.

camtarn 2021-08-17 13:39:06 +0000 UTC [ - ]

Monte Carlo simulation? Have your CI server run millions of games in fast forward, and collect as many statistics as you can. When you make a change, check that the stats that you expected to change did change, and that others stayed the same.

Wouldn't detect extremely improbable things, though, like having a spear with a super-rare effect + having armour with another super-rare effect, since those would show up so infrequently that they wouldn't skew the overall stats much.
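A toy version of that check might look like the sketch below. The duel model, the `damage_multiplier` knob standing in for "the change under test", and the 3-point tolerance are all assumptions for illustration, not anything from Crawl's codebase.

```python
import random


def simulate_fight(rng, damage_multiplier=1.0, player_hp=30, monster_hp=30):
    """Toy duel: the player and monster trade d6 hits until one drops.
    Returns True if the player wins."""
    while True:
        monster_hp -= rng.randint(1, 6) * damage_multiplier
        if monster_hp <= 0:
            return True
        player_hp -= rng.randint(1, 6)
        if player_hp <= 0:
            return False


def win_rate(damage_multiplier, games=20000, seed=0):
    """Fraction of simulated fights the player wins."""
    rng = random.Random(seed)
    wins = sum(simulate_fight(rng, damage_multiplier) for _ in range(games))
    return wins / games


def stats_drifted(baseline, candidate, tolerance=0.03):
    """Flag a build whose aggregate win rate moved more than the tolerance."""
    return abs(candidate - baseline) > tolerance


if __name__ == "__main__":
    baseline = win_rate(1.0, seed=1)
    print("same rules, new seed:", stats_drifted(baseline, win_rate(1.0, seed=2)))
    print("doubled damage:", stats_drifted(baseline, win_rate(2.0, seed=2)))
```

The tolerance has to sit comfortably above the sampling noise for the number of games run, otherwise the check flags builds that changed nothing.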

a_e_k 2021-08-17 08:00:04 +0000 UTC [ - ]

(a) Set a deterministic RNG seed before each test.

and/or

(b) Have each test loop enough to get a good estimate of the probability distribution.
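Both options can be sketched in a few lines of Python. The `roll_to_hit` rule here is a hypothetical stand-in, not Crawl's real to-hit formula, and the trial count and tolerance are arbitrary choices.

```python
import random


def roll_to_hit(rng, accuracy, evasion):
    """Hypothetical rule: the attack lands if a 0..accuracy roll
    beats a 0..evasion roll."""
    return rng.randint(0, accuracy) > rng.randint(0, evasion)


def sequence(seed, n=20):
    """The first n hit/miss results for a given seed."""
    rng = random.Random(seed)
    return [roll_to_hit(rng, 10, 10) for _ in range(n)]


def estimate_hit_chance(seed, accuracy, evasion, trials=50000):
    """Loop enough times to estimate the hit probability."""
    rng = random.Random(seed)
    hits = sum(roll_to_hit(rng, accuracy, evasion) for _ in range(trials))
    return hits / trials


def test_same_seed_reproduces_rolls():
    # (a) A fixed seed makes the test deterministic.
    assert sequence(42) == sequence(42)


def test_hit_chance_matches_design():
    # (b) Both rolls are uniform on 0..10, so 55 of the 121 equally
    # likely pairs have the attacker strictly ahead.
    expected = 55 / 121
    assert abs(estimate_hit_chance(7, 10, 10) - expected) < 0.01
```

Option (b) tests the design intent (the distribution) rather than one particular sequence of rolls, so it survives refactors that change the order in which numbers are drawn.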

muzani 2021-08-18 07:33:49 +0000 UTC [ - ]

B seems fine. But wouldn't an RNG seed give different results as you add more randomly generated features?
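One common way around that, sketched here under my own assumptions, is to give each subsystem its own generator derived from the master seed, so a new feature drawing random numbers does not shift the rolls anyone else sees. The subsystem names and the `make_streams` helper are hypothetical.

```python
import random


def make_streams(master_seed, names):
    """One independent generator per subsystem, each seeded from the
    master seed combined with the subsystem's name."""
    return {name: random.Random(f"{master_seed}:{name}") for name in names}


def combat_rolls(streams, n=5):
    """Draw n d6 rolls from the combat stream only."""
    return [streams["combat"].randint(1, 6) for _ in range(n)]


# Before the new feature exists.
before = make_streams(1234, ["combat", "loot"])
rolls_before = combat_rolls(before)

# After adding a "weather" subsystem that draws a number before combat does.
after = make_streams(1234, ["combat", "loot", "weather"])
after["weather"].random()
rolls_after = combat_rolls(after)

assert rolls_before == rolls_after  # combat rolls are unaffected
```

With a single shared generator the extra `weather` draw would have shifted every later combat roll, which is exactly the fragility the question points at.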

nimih 2021-08-17 04:56:01 +0000 UTC [ - ]

It sure would be inconvenient for your hypothesis if the crawl project had a `test` directory which predates the bug in question...

soared 2021-08-17 04:22:23 +0000 UTC [ - ]

> try to design an experiment that would prove whether the value written in the system is correct. If designing the experiment is very hard, that should be interpreted as a risk factor - if no one can check that the system is doing what it ought to, then maybe it really is wrong!