Personally, I've come to really dislike test names like "test_refresh_failure". They tell you what component is being tested, but not what kind of behavior is expected. Which can lead to a whole lot of unnecessary confusion (or bugs) when you're trying to maintain a test whose implementation is difficult to understand, or if you're not sure it's asserting the right things.
It also encourages tests that do too much. If the test is named "test_refresh", well, it says right there in the name that it's a test for any old generic refresh behavior. So why not just keep dumping assertions in there?
I'm much happier with names like "test_displays_error_message_when_refresh_times_out". Right there, you know exactly what's being verified, because it's been written in plain English. Which means you can recognize a buggy test implementation when you see it, and you know what behavior you're supposed to be restoring if the test breaks, and you are prepared to recognize an erroneously passing test, and all sorts of useful things like that.
Call me crazy, but I go even more verbose than that. I go full RSpec/BDD mode:
class TestAuthenticateMethod:
    class TestWhenUserExists:
        class TestWhenPasswordIsValid:
            def test_it_returns_200(self, app, user):
                assert ...

        class TestWhenPasswordIsInvalid:
            def test_it_returns_403(self, app, user):
                assert ...

    class TestWhenUserDoesNotExist:
        def test_it_returns_403(self, app, user):
            assert ...
It is so worth it. I can look back at my tests 3 months later and know exactly what was intended without duplicating explanations in comments. Yes, some methods can get a bit long and indented, but it's worth it for me. The other downside is that you have to start all classes/methods with "test", but I got used to that very quickly.
When you run pytest it's very clear what's being tested and what the expected outcome is since it gives you the full class hierarchy in the output.
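For example, running the suite above with `pytest -v` reports node IDs roughly like this (file name made up, exact formatting varies by pytest version):

tests/test_auth.py::TestAuthenticateMethod::TestWhenUserExists::TestWhenPasswordIsValid::test_it_returns_200 PASSED
tests/test_auth.py::TestAuthenticateMethod::TestWhenUserExists::TestWhenPasswordIsInvalid::test_it_returns_403 PASSED
tests/test_auth.py::TestAuthenticateMethod::TestWhenUserDoesNotExist::test_it_returns_403 PASSED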
I’ve been working on a codebase like this, and my major complaint is that I’ll see a deeply nested test and have to scroll waaaaay up to understand the context. I prefer ‘foo_bar_baz’, ‘foo_bar_qux’ to the deeply nested classes.
I've just never seen the "this belongs in comments" approach work out in practice. Maybe it's something about human psychology. Perhaps things might seem obvious when you've just written the test, so you don't think to comment them. Perhaps code reviewers feel less comfortable saying, "I don't understand this, could you please comment it?" Perhaps it's the simple fact that optional means optional, which means, "You don't have to do it if you aren't in the mood." Regardless of the reason, though, it's a thing I've seen play out so many times that I've become convinced that asking people to do otherwise is spitting into the wind.
I really hate the pylint rule that complains about missing docstrings, though I can see its virtue.
In tests, however, I prefer not to have them, as my favorite test runners replace the full name of the test method with its docstring, which makes it a lot harder to find the test in the code.
My only issue is that the rule doesn't give you a good way to annotate that the object is documented by its signature. Sometimes devs are lazy, but the rule doesn't make them not lazy so you get pointless docs like:
def matriculate_flange(via: Worble):
    "Matriculates flange via a Worble."
That's more or less why I hate this feature - it encourages lazy docstrings. Right after you've named something is probably the worst possible time to explain it.
I've been speculating that the right place for the "why" of tests to live is in documentation, where references should be hung like citations (which may or may not be included in any given rendering of the documentation in question) supporting claims of fact. Then we can surface that any time a test fails, and find tests that no longer check anything we care about.
All this is true, but doesn't it apply just as much to writing descriptive test function names?
Test function names are less sensitive than regular functions, since they're not explicitly called, but I still don't want to read a_sentence_with_spaces_replaced_by_underscores.
I don't want to either, but until Python gives us backtick symbols or we get something like Kotlin's kotest that lets the test name just be an actual string, that's sort of the choice we're left with. And I'm inclined to take the option that I've seen lead to more maintainable tests over the long run over the option that I've seen engender problems, even if it is harder on the eyes. Form after function.
As far as whether or not people do a better job with descriptive test function names, what I've seen is that they do? I of course can't share any data because this is all in private codebases, so I guess this could quickly devolve into a game of dueling anecdotes. But what I've observed is that people tend to take function names - and, by extension, function naming conventions - seriously, and they are more likely to think of comments as expendable clutter. (Probably because they usually are.) Which means that they'll think harder about function names in the first place, and also means that code reviewers are more likely to mention if they think a function name isn't quite up to snuff.
And I just don't like to cut those sorts of human behavior factors out of the picture, even when they're annoying or hard to understand. Because, at the end of the day, it's all about human factors.
Why would you want to keep test function names in your head? I would rather use that space for regular functions which are in use throughout my program. Test functions are usually single-use anyways so being greedy with characters makes much more sense elsewhere. The most important thing to remember about a test function is what property it tests (and how). Putting that in the function name as a reminder makes a lot of sense imo.
I think this is one of those conventions where your team agrees on one convention, and you just try to follow it consistently. If that's looking at the test in code to find descriptive comments, great. If that's long test names, cool. Just do the same thing consistently.
True - I think it depends on who uses your test outputs. If PMs and BAs care, having a more verbose output (descriptions, pretty graphs, etc) helps a lot. If it’s just for devs then they can more happily go through the code base.
I would've thought so but no, pytest will show the docstring in `--collect-only -v` (badly), but it doesn't show any sort of description when running the tests, even in verbose mode (see issue #7005 which doesn't seem to be very popular)
It's a choice of course, but I look at test functions like other functions. If you can name a function such that it doesn't need a comment (but also doesn't use 100 characters to do that), I'll gladly take that over a comment. Same as in all other code: if you need comments to explain what it does, it's probably not good enough and/or should be pulled into a function whose name says what it does. Comments on why it does things the way it does are of course where it's at.
The difference is in tooling. Docstrings are collected and displayed in many contexts. The intended purpose is writing “documentation” in the same place as code. Comments are only seen if you open the source file.
Python's builtins, the standard library, and many popular community libraries and tools use docstrings to generate tooling and documentation websites. Describing it as "useful for a certain workflow" is so understated it borders on false. The reverse is more true: docstrings are not useful for a certain workflow, but most Python programmers will find them quite helpful.
All Python programmers already benefit from the docstrings written in the libraries they use, which generate tooltips and documentation websites.
You can access it from object.__doc__, and there is tooling in IDEs like PyCharm to quick-view docstrings to see what a function does, plus auto-generated documentation.
Prior to mypy/type hints, it also let you document the types a function takes and returns.
Sometimes there are practical reasons to avoid this approach and have fewer test methods that each test multiple behaviors of a single thing. For example, a lot of our projects set up and tear down a database between test methods, so a few fatter test methods run a lot faster than a large number of small ones. We rely on good commenting within the methods to understand what exactly is being checked.
In that case, why not use some auxiliary function to load the resources (in your case, the database) and decorate it with functools.cache[0] so the function doesn't get executed multiple times? Sure, this means reusing resources between multiple tests (which is discouraged for good reasons), but your current tests effectively do the same thing, the only difference being that everything is tested inside one single test function.
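A rough sketch of what I mean, with an in-memory sqlite database standing in for whatever expensive setup you actually have:

import functools
import sqlite3

@functools.cache
def seeded_db():
    # built once on first use, then reused by every test that calls it
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    return conn

def test_lookup_by_id():
    conn = seeded_db()
    row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
    assert row == ("alice",)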
PS: How come your setup and teardown operations are so expensive in the first place? Why do you even need to set up an entire database? Can't you mock out the database, set up only a few tables or use a lighter database layer?
The difference is that it's very obvious in our current tests when the database is being reset. And to be clear it's not like we're testing the entire app in a single test function - our test functions just tend to cover more than one feature.
We've also found over the years that if you want to be sure everything works on version X of database system Y, you run your CI tests on that.
One of the most common test packages in R is called “testthat”, and you invoke a test by calling a function (rather than defining one) called “test_that”; the first argument is a string containing a description of what you want to test and the second argument is the code you want to test.
That way, all your unit tests reads: “test that error message is displayed when refresh times out” etc.
I think it’s a really nice way to lay things out and it avoids all the “magic” of some functions being executed by virtue of their name.
This. Like probably most programmers, I tend to read code more than I write code and when I read tests, I want to know immediately what a test does and checks. I don't want to read the test itself, unless I already know it's of interest to me.
In my experience, when a unit test has no clear name, this is usually a sign that the test does too much (meaning that it's actually not a unit test anymore) or even comprises multiple tests, leading to all the known bad consequences this has (leaking state between tests, unclear test assumptions etc.)
I think that behave (https://behave.readthedocs.io/en/stable/) is more useful for testing these more real-life usecases. I tend to have a `test_my_function` in pytest tests and the more integration and functionality-related testing in the behave tests.
Have you considered using "test_refresh__failure" instead? It makes clear that "refresh" is the component being tested and "failure" is a description of the behavior.
The author doesn't like pytest fixtures, but personally they're one of my favorite features of pytest.
Here's an example use case: I have a test suite that tests my application's interactions with the DB. In my experience, the most tedious part of these kinds of tests is setting up the initial DB state. The initial DB state will generally consist of a few populated rows in a few different tables, many linked together through foreign keys. The initial DB state varies in each test.
My approach is to create a pytest fixture for each row of data I want in a test. (I'm using SQLAlchemy, so a row is 1-1 with a populated SQLAlchemy model.) If the row requires another row to exist through a foreign key constraint, the fixture for the child row will depend on the fixture for the parent. This way, if you add the child test fixture to insert the child row, pytest will automatically insert the parent row first. The fixtures ultimately form a dependency tree.
Finally in a test, creating initial DB state is simple: you just add fixtures corresponding to the rows you want to exist in the test. All dependencies will be created automatically behind the scenes by pytest using the fixtures graph. (In the end I have about ~40 fixtures which are used in ~240 tests.)
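A stripped-down sketch of the pattern, with plain dicts standing in for the SQLAlchemy rows so it runs on its own:

import pytest

@pytest.fixture
def account_row():
    return {"id": 1, "name": "acme"}

@pytest.fixture
def user_row(account_row):
    # the child fixture depends on the parent, so pytest builds account_row first
    return {"id": 10, "account_id": account_row["id"], "name": "alice"}

def test_user_belongs_to_account(user_row, account_row):
    assert user_row["account_id"] == account_row["id"]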
One thing I've encountered with pytest fixtures is that they have a tendency to balloon in size.
We started out with like 50 fixtures, but now we have a conftest.py file that has `institution_1`, ..., `institution_10`.
My end conclusion is that fixtures are nice for some things, like managing mocks, and clearing the databases after tests, but for data it's better to write some functions to create stuff.
So instead of `def test_something(institution_with_some_flag_b)` you'd write in your test body:
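Something along these lines, with a hypothetical factory helper:

# made-up names; the point is an explicit call inside the test instead of a magic fixture
institution = create_institution(some_flag_b=True)
...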
I rewrote a bunch of our tests to this factory pattern last week, too (the factory is a fixture though - FactoryBoy is worth a look).
I’d argue that too many global fixtures in conftest run a high risk of becoming “Mystery Guests” or overly general fixtures. For a test reader, it’s impossible to know the semantics of “institution_10”.
I believe this to be rooted in DRY obsession leading to coupling of tests: “We need a second institution in two modules? Let’s lift it up to global!”
I'm the exact opposite, I absolutely hate pytest fixtures. They are effectively global state, so adding a fixture somewhere in your code base might affect the tests in a completely different location. This gets even worse with every fixture you add because, being global state, fixtures can interact with one another – often in unexpected ways. Finally, readers unfamiliar with your code won't know where the arguments for a given `test_xy()` function come from, i.e. the dependency injection is completely unclear and your IDE won't help you much.
There are so many other (better) ways to achieve the same goal, such as decorators or – as already mentioned by emptysea in their sibling comment – explicitly invoking some function from within the test to do the setup/teardown.
On the one hand, I've been impressed with how they compose and have let me do some great things. For example, I had system tests that needed hardware identifiers. I had a `conftest.py` that added CLI args for them. I then made fixtures to wrap the lookup of these. In the fixture, I marked it as skip if the arg was missing. This was then propagated to all of the tests, only running the ones the end user had the hardware for.
On the other hand, when I need to vary the data between tests and that data is an input to something that I'd like to abstract the creation of, fixtures break down and I have to instead use a function call.
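For the first case, a rough conftest.py sketch (option and fixture names are invented):

import pytest

def pytest_addoption(parser):
    parser.addoption("--board-serial", action="store", default=None)

@pytest.fixture
def board_serial(request):
    serial = request.config.getoption("--board-serial")
    if serial is None:
        # skipping here propagates to every test that requests the fixture
        pytest.skip("no hardware identifier supplied")
    return serial

# in a test module:
def test_board_boots(board_serial):
    ...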
I've been looking for something like this for ages. I'm excited to try some of this stuff out, like spec'd Mocks.
I'm curious if anyone else who has been drawn in by the allure of the Mock has some strategies to avoid the footguns associated with them? (Python specifically)
Just don't use them unless you have to. The two main reasons to mock are for network requests and because something is too slow. Other than that, test things for real. Do not isolate parts of your code from other parts of your code by using mocks. If your code does side effects on your own machine, like writes to the file system, let it write to a temporary directory.
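For the filesystem case, a minimal sketch using pytest's built-in tmp_path fixture (write_report stands in for your own code):

def write_report(path):
    path.write_text("ok")

def test_write_report(tmp_path):
    target = tmp_path / "report.txt"
    write_report(target)
    assert target.read_text() == "ok"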
Indeed. And if you're mocking eg. API calls in a client library, try to have tests for the real things as well. They don't have to be part of your normal test suite, they can run only if env vars are set with the API keys needed.
VCR.py (https://github.com/kevin1024/vcrpy) is a great utility for mocking APIs.
It will run each request once, save the responses to YAML files, and then replay the responses every time you re-run the tests. It's also very useful for caching API responses (e.g. you have a trial account with limited number of requests).
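Roughly how it looks in a test (cassette path and URL are arbitrary):

import vcr
import requests

@vcr.use_cassette("fixtures/vcr_cassettes/example.yaml")
def test_fetch_homepage():
    # first run hits the network and records; later runs replay the YAML
    response = requests.get("https://example.com/")
    assert response.status_code == 200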
Unfortunately, if used for testing, it will not cover the case when the original API changes its interface.
That Ian Cooper talk is just fantastic. It's perhaps the best contribution to the subject of TDD anyone has produced since Kent Beck popularized the idea in the first place.
I'd really recommend this video. I had seen it years ago and just came back to it. It's about changing your architecture, one of the effects of which is changing how you need/use mocks. https://www.youtube.com/watch?v=DJtef410XaM&t
assert on the call_count attribute of a mock instead of trying to use methods on it like .assert_called_once_with()
"a mock’s job is to say, “You got it, boss” whenever anyone calls it. It will do real work, like raising an exception, when one of its convenience methods is called, like assert_called_once_with. But it won’t do real work when you call a method that only resembles a convenience method, such as assert_called_once (no _with!)."
I'd add to that that tests should be readable.
Personally, I prefer to use GIVEN, WHEN, THEN as comments in the tests.
Also: it's OK not to be DRY while writing tests.
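A made-up example of the layout:

def test_discount_applies_to_large_orders():
    # GIVEN an order over the discount threshold
    order_total = 200
    # WHEN the discount is computed
    discounted = order_total - order_total // 10 if order_total > 100 else order_total
    # THEN ten percent is taken off
    assert discounted == 180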
Depending on context and implementation details, I'd say DRYing tests can be anywhere from indispensable to toxic.
I'm fine with creating libraries of shared functionality that tests can use, especially when it helps readability. If you've got several tests with the same precondition, having them all call a function named "givenTheUserHasLoggedIn()" in order to do the setup is a nice readability win. And, since it's a function call, it's not too difficult to pick apart if a test's preconditions diverge from the others' at a later date.
What I absolutely cannot stand is using inheritance to make tests DRY. If you've got an inheritance hierarchy for handling test setup, the cost of implementing a change to the test setup requirements is O(N) where N is the hierarchy depth, with constant factors on the order of, "Welp, there goes my afternoon."
I've gotten lured into the inheritance stuff and it's super nice at the very, very beginning and becomes a nightmare to maintain. Obviously a horrible tradeoff for software.
I've found that having a class/function as a parameter and explicitly listing the classes/functions that get tested is a small step back and way easier to maintain and read. It sets off some DRY alarms, cause usually that whole list is just "subclasses of X". And it seems like burden to update. "So if I make a new subclass, I have to add it everywhere?". Yes. Yes you do. Familiarity with the test suite is table stakes for development. You'll need to add your class name to like ten lists, and get 90% coverage for your work, then write a few tests about what's special about your class. When something breaks, you'll know exactly what's being tested. And you'll be able to opt out a class from that test with one keystroke.
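Something like this, with made-up class names:

import pytest

# the explicit list: new subclasses get added here by hand
WIDGET_CLASSES = [Button, Slider, Checkbox]

@pytest.mark.parametrize("widget_cls", WIDGET_CLASSES)
def test_widget_round_trips_through_dict(widget_cls):
    widget = widget_cls()
    assert widget_cls.from_dict(widget.to_dict()) == widget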
That being said… I still have a dream of writing a library for generating tests for things that inherit from collections.abc. Something like “oh, you made a MutableSequence? let’s test it works like a list except where you opt-out.”
The given, when, then breakdown is interesting, though I've never seen language test utilities actually enforce that structure. Maybe an interesting potential experiment (regardless of language)?
I feel like your last point is especially important. Sooooo many times have I seen over-abstracted unit tests that are unreadable and are impossible to reason about, because somebody decided that they needed to be concise (which they don't).
I'd much rather tests be excessively verbose and obvious/straightforward than over abstracted. It also avoids gigantic test helper functions that have a million flags depending on small variations in desired test behaviour...
They're already grouped by module which normally provides enough granularity (in my experience, I've only scaled this up to 50k LOC apps though so YMMV).
I just saw the github readme for this project. How is the describe variant different from just grouping the tests together into a module called test_describe_my_function.py and then having smaller named functions inside?
With rspec, you use the describe and context keywords.
At one level, yes, it's mainly syntactical sugar. As the test-writer, the two approaches may seem interchangeable.
Where I find it really helps is when I'm not the test-writer but rather I'm reviewing another developer's tests, say in PR. I find this syntax and hierarchy produces a much more coherent test suite and makes it easier for me to twig different use cases and test quality generally.
> With pytest, it's possible to organize tests in a similar way with classes. However, I think classes are awkward. I don't think the convention of using camel-case names for classes fit very well when testing functions in different cases. In addition, every test function must take a "self" argument that is never used.
So there's no reason to do this, aside from aesthetics.
I'd recommend against doing un-Pythonic stuff like this, it makes your code harder to pick up for new engineers.
My suggestion is that adding a mini-DSL for parsing nested functions as nested tests is less readable for the average Python programmer who has not seen that plugin before.
And I've never had a problem with reading class names vs. function names; we do that all the time when reading Python code.
I think this is clearly in the realm of subjective preference for what "looks nicer", which I'd call aesthetics.
On the other hand, this probably breaks your IDE's pytest integration, which would be an objective material downside.
Whatever floats your boat though, definitely not a hill I'd die on.
I'm curious how others test code that operates on large datasets—eg, transformations of a dataframe, parsing complicated responses, important implementations of analytics functions.
I've previously used serialized data—JSON, or joblib if there are complex types (eg, numpy)—but these seem pretty brittle...
This is a good talk by the core developer Raymond Hettinger [1]. He prefers pytest too. I don't do any crazy testing, but I really like property-based testing with Hypothesis, which is also mentioned. This video isn't Python but it's a great intro to property-based testing [2].
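For anyone who hasn't tried it, a minimal Hypothesis property test looks something like:

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_reversing_twice_returns_the_original(xs):
    # Hypothesis generates many random lists and shrinks any failing case
    assert list(reversed(list(reversed(xs)))) == xs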
I used to use unittest for this reason, but it's pretty silly. Having extra dependencies for the tests makes no difference for end users and these days it barely makes a difference to developers.
What do you mean, extra dependencies? The only difference between pytest and unittest in this regard is that tests using unittest declare their dependency explicitly, using an import[0]. Most pytest tests still implicitly require pytest as a dependency, though. (Think of fixtures etc. etc.)
I actually like unittest's approach here – in my book, explicit is better than implicit.