Much of this can be automated with c2rust [1][2]. They recently published an overview [3] of lessons learned and future directions for Rust and C integration.
That doesn't really translate C to Rust; it compiles it to unsafe Rust with all the weaknesses of the C code. It doesn't even use Rust arrays. You would not want to maintain what comes out. It turns
p[j] = p[j-1];
into
*p.offset(j as isize) = *p.offset((j - 1 as libc::c_int) as isize);
C has no way to say how big p is. (Biggest flaw in the design of C).
That info is essential to translating this properly. The translator needs either a hint from the user, or inference from examining the calls to the function.
Now, if you could stick something like
ASSERT(LENGTHOF(p) == n);
which would be ignored by C if ASSERT was defined to ignore it,
and the Rust translator understood that, it could use a Rust array, which has a size.
Now the output Rust code could use subscripts properly.
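For illustration, hinted output could plausibly look like this (a sketch, not actual c2rust output):

    // With the length hint, `p` can become a slice, and indexing is bounds-checked:
    fn shift(p: &mut [i32], j: usize) {
        p[j] = p[j - 1];
    }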
Generating constraints like that is a hard problem, but providing an interactive tool which prompts the user on where to insert them is not.
C2Rust contributor here. You are right that the transpiler preserves all the weaknesses of C in its output. (Thanks for taking a look :)
C2Rust includes a refactoring tool that (in some, currently very limited, circumstances) lets us lift raw C pointers to Rust references (to slices) as a step towards using safe Rust constructs. This is obviously not something that can be 100% automated, but our goal is to see how far we can reasonably go. Lots of work ahead, but we 100% agree that some level of automation is highly desirable here.
Not when the GP is pitching it as an alternative to a rewrite in which the rewritten code becomes the primary code. That particular scenario is one where an opaque blob is not at all the goal, and thus c2rust is not a valid strategy.
It seems that part of the goal of c2rust was to retain the C memory layout in memory. If that's not required, C arrays can be translated to Rust arrays. Except for some corner cases involving casts and pointer arithmetic, that should work, and generate much cleaner output code. You can detect those, at least. For pointer arithmetic, do the pointer arithmetic and convert it to a subscript before indexing. Then you have bounds checks.
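Roughly this idea, as a sketch (not output from any existing tool):

    // Sketch: do the pointer arithmetic on indices, then go through checked indexing.
    fn read_at(buf: &[u8], base: usize, offset: usize) -> u8 {
        let i = base + offset; // the "pointer arithmetic"
        buf[i]                 // bounds-checked; panics instead of reading out of range
    }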
This is a classic problem with computer language conversion - you lose the idioms of both the input language and the output language in translation.
> it makes it into essentially a blob of uneditable code.
No, it doesn't. It retains comments, the names of variables, and most of the structure of the code. You're meant to be able to factor the unsafe behaviours out incrementally from there (and there are a lot of bundled and associated tools meant for exactly that). I ran an example through c2rust the other day and rustc told me immediately that I had functions that didn't need to be marked unsafe. switch statements are probably the thing that suffers most with c2rust, though I think there's a tool for that.
My favorite translation so far is Duff's Device. Modernized from the old C on Wikipedia, it's roughly:
void send(volatile short *to, short *from, int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
                } while (--n > 0);
    }
}
While the result from c2rust is even less maintainable than the original interleaved loop-switch, I'm impressed that it looks like they have a robust way to handle bizarre control flow. (The output from c2rust is not reproduced here to save space; you can try it yourself at https://c2rust.com/)
Do you actually have those C commands in your preprocessor? Sorry if the question is pretty basic but I'm learning C. It's the one language you can't avoid! (e.g. the Erlang VM is written in C) Any info appreciated :)
Those aren’t builtin macros, no. There is “assert.h”, which has an ‘assert’ function/macro/thing. I’m less sure about LENGTHOF. There are ARRAY_LEN macros that I’ve seen — being sizeof(x)/sizeof(x[0]) — but that’s all compile time, and wouldn’t lend itself too well here.
For C, almost everything is written by you, especially when it comes to macros. There are some “basics” that you pick up here and there, which you can copy-paste more or less.
The C preprocessor is a beautiful thing, but can get a bit tedious. Sometimes a lot tedious. But shit is textual replacement powerful!
The verification ecosystem for C is one of the last remaining reasons that tool-augmented C can be safer than Rust. A Rust-to-C compiler might let Rust leverage that ecosystem. So, thanks for the tip.
I don't think mrustc currently speaks the C dialects that are friendly to tool-augmented verification, FWIW. The focus of that project is bootstrapping from source, and supporting architectures that don't have LLVM backends.
Reading the comments here it seems like I'm not the only one who thinks some Rust code is pretty syntax-heavy even compared to older systems languages like C and C++.
I wonder if this is in part to further separate application development from system development (slowly paving the way for more verbose but formally verifiable code in core components), and to push more people to write higher level code instead. I can imagine many reasons for why things would be steered this way.
Edit: Bad code example, sorry; thanks for the informative reply! I didn't mean to be unnecessarily negative, just a thought that ran through my head.
I can assure you that Rust's syntax is not designed to further separate application development from systems development; if anything, we encourage a far broader audience to learn Rust than just pure systems programmers. In general, Rust's syntax is intended to be vaguely similar to C/C++/Java, but with some influence from functional languages where appropriate, and with syntactic forms that work better with type inference. There are very few syntactic forms that are truly novel in Rust (even lifetimes are taken from OCaml's generic syntax, because lifetimes are generics!)
In addition to this excellent reply, if you were using the type core::panic::PanicInfo commonly (i.e. "twice") you would import PanicInfo at the top of the file
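Something like this (a sketch of the usual no_std panic handler):

    use core::panic::PanicInfo;

    #[panic_handler]
    fn panic(_info: &PanicInfo) -> ! {
        loop {}
    }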
Yes, and you can do something similar with C++ too; I debated if I should include it or not for this reason. (The semantics are a bit different...) Thanks for bringing it up!
In C++ unary prefix & is used in an expression to take the address of an object. To declare a parameter of reference type the & would be between the type name and the parameter name (i.e. after PanicInfo/before _info).
> "name: type" not "type name", which adds a single :
Sorry for asking a newbie question that's a quick search away, but is there an equivalent to "type name, name, name..."? (I'm biased towards C myself but contemplating a switch...)
Rust’s pattern syntax is richer than just introducing variables. I’m on my phone, so it’s a bit hard to show off all of the things that can be done. In this exact case, though, it usually looks like
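    // A sketch typed from memory: one tuple pattern introducing
    // several variables of the same type at once.
    let (a, b, c): (u16, u16, u16) = (1, 2, 3);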
One reason is that overriding "=" would allow you to violate the move semantics of Rust: a = b; is guaranteed to be equivalent to memcpy(&a, &b, sizeof(a)). In C++, you can overload the assignment operator, which lets you do crazy, unexpected things. In the end, it's mostly a tradeoff between convenience (writing a = b; instead of a.assign_from(b);) and explicitness (being able to tell what might and what might not happen under the hood).
It's the difference between the Copy and Clone traits. One thing that Rust still lacks in its standard library is a Clone-like trait for moving data, which is also something that C++ can express, and which could be useful in combination with "pinned" objects that can't be memcpy'd. There is a crate providing this (via a Transfer trait), but it's nowhere near being a standard.
If this were ever to happen, it would have to be similar to Clone in that it would be explicit, not implicit. This would also make it not as useful as the C++ version, but it means not giving up all of the advantages of our semantics.
Because `fn foo() u8 {}` doesn't read as nicely as `fn foo() -> u8 {}`?
Also because "no return type" is the most common return type, and this way representing "no return type" as omitting the `-> ()` reads nicely: `fn foo() {}`
I think that the advantage of leading with `fn` arrives when you have paragraphs of functions to scan through.
Leading with a consistent word (be it fn, func, or whatever) immediately and unambiguously identifies the line as defining a function (rather than a variable, or whatever), which is a useful landmark to anchor on or pivot from.
So, using grep or a similar tool, how would you find the definition of the function, separately from its uses? In a Rust codebase I could just run `grep -r "fn foo" .` and find the definition immediately.
Clearly this doesn’t matter if you’re in an IDE with a working “find definition” feature, but that’s often enough not the case that I find easily greppable definitions to be a gigantic advantage.
We could have, and discussed it, but IIRC there's some good reasons why we didn't. That was a very long time ago and I don't quite remember them now, though... I think that the idea is that this gets ambiguous with type ascription?
I'm not sure of the historical context, but it would have required either for the syntax to declare the return type of closures to be different from functions or for type ascription to have a different syntax.
To be honest, type ascription being : was a (small) mistake that affects the comprehensibility of errors because it is distance 1 from multiple common typos and is valid almost everywhere in the grammar, so it is very easy for the parser to go down the wrong route. Even worse, the feature is nightly only so we're not even getting the benefits of the simple syntax.
Advice for language designers: make your grammar redundant, with lots of sign posts for the compiler to recover when things are wrong. Things like mandatory semicolons are good for both humans when reading code and for the compiler when parsing it. You can also make your compiler perform semicolon autoinsertion, but still have an error, that way the type checker can still run.
What I don't understand about all of these languages born out of some inspiration to "do C++ right", which includes the inception of Java, Go, and now this, is the complete disregard for established systems and the failure to stand on their shoulders. C++ stood on the shoulders of C, and one doesn't need to cite the decades of success as proof; some syntax has proven itself quite optimal.
From what I understand compiler frontends can be augmented to solve all of the problems Rust is trying to solve. Instead Rust is reinventing the entire wheel which is already sharing an axle with Go and Java's entirely reinvented wheels -- not just with syntax, but entire ecosystems from the ground up.
I'd like to start seeing a little more reuse of both ideas and effort instead of this present state.
Existing languages that have evolved by adding features onto a decades old base have a large technical debt. Sure, it is possible to write relatively safe code in C++ using modern features like std::unique_ptr, but you have to consciously use those features and any time you actually want to make use of that base (say, by using an older library), you lose all those modern benefits, at least in a part of your code. You either have to intertwine old code with new or write bindings, just as you would with Rust.
No software can keep total backwards compatibility for decades and be truly modern and consistent, because old mistakes and now unwanted features have to stay. To even attempt having the best design possible you have to either break backwards compatibility in new versions (like Python) or make an entirely new language at some point (like Rust).
Then perhaps correct me on what problems Rust is specifically trying to solve? As far as I can tell, Rust stands on the shoulders of the LLVM backend; there are existing frontends, like C and C++, but Rust started from scratch.
My question is for what purpose? The problems we face today syntactically relate to intuitively expressing global superword-level parallelism to fully leverage AVX-512. No language is currently sufficient in this regard. Was Rust's syntax designed to maximally inform the autovectorizer in the backend to do this? That would be a worthy goal for this kind of investment. Otherwise it's wheel reinvention to me, and I feel Rust is being pushed really, really hard.
Writing vectorizable code is certainly important for some use cases, but it’s not relevant for the majority of programming tasks, even in systems/infra programming. Most things outside of numerical work just aren’t bound by the effective width of the core backend.
Rust doesn’t have to solve every problem in order to be useful. The major problem it does solve (relative to C++) is explicit compile-time tracking and enforcement of the scope of object validity, preventing (or at least making more unlikely) a class of bugs including use-after-free, uninitialized reads, buffer overflows, dangling pointers, and so on — without requiring a garbage collector.
This C++ code, a real mistake I’ve seen in the wild, would be totally impossible to write in Rust without explicitly using the “unsafe” keyword:
SomeType* foo()
{
    SomeType x;
    return &x;
}
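The closest safe-Rust rendering is rejected at compile time; a sketch (SomeType here is a stand-in, as above):

    struct SomeType;

    // Same shape as the C++ above; rustc refuses to compile it:
    fn foo<'a>() -> &'a SomeType {
        let x = SomeType;
        &x // error[E0515]: cannot return reference to local variable `x`
    }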
None of this has to do with vectorization, but it’s still important and useful for many tasks.
(As an aside: I would be curious to know if there are any languages that make it easier to write auto-vectorizing compilers. Personally I’ve never seen numeric tight loop kernels written without explicit use of architecture-specific intrinsics and/or raw assembly language. But I’ve only worked on that sort of code for at most a few months in my career, so I’d be happy to learn if there’s research in this area.)
> Explicit compile-time tracking and enforcement of the scope of object validity, preventing (or at least making more unlikely) a class of bugs including use-after-free, uninitialized reads, buffer overflows, dangling pointers.
Why does an entirely new syntax have to be invented from the ground up to solve this problem? Does C-like syntax not adequately inform a compiler that wants to implement something like Rust's borrowing? It appears to work for `std::unique_ptr` and `std::move` semantics. It doesn't appear that Rust's syntax conveys any order of magnitude more information. Minor augmentations to C/C++, taking it in some other direction that fixes whatever shortcomings block next-generation static analysis tools, could be the focus of effort instead. I don't see why Rust first has to invent the universe.
Here's an example of overhead effort required by Rust: https://github.com/rust-vmm/kvm-bindings. This has already been implemented in C and distributed with most operating systems. Why wrap it? Now when KVM gets new features Rust has to also update its wrapper interface too. Isn't that just a propagation delay?
> Why does an entirely new syntax have to be invented from the ground up to solve this problem?
I don’t know. I’m not on the core Rust team.
Whether it’s theoretically possible to write static analyzers that implement Rust-like guarantees in C-like languages seems a bit irrelevant to me, since in the real world, none actually exist. Rust does.
I will happily continue using Rust, and happily reevaluate that choice if these next-generation static analyzers ever materialize.
(Also: Personally, to me, understanding rust syntax is actually much easier than trying to remember what && means in various contexts, the difference between std::move and std::forward, the difference between decltype(foo) and decltype((foo)), and various other C++ gotchas. But YMMV.)
> Whether it’s theoretically possible to write static analyzers that implement Rust-like guarantees in C-like languages seems a bit irrelevant to me, since in the real world, none actually exist. Rust does.
Both of those are much newer than Rust. So it appears Rust was valuable after all, even if all it did was influence C++ static analysis developers!
I will also point out that these are still much less than what Rust offers — read through the list of caveats in the MSVC post. Or the fact that “analysis is only function-local” in Clang...
Oh, and for these to catch everything you would need to only depend, transitively, on things that also used them. Just like in Rust you need to only depend transitively on things that don’t abuse `unsafe`, but that seems less unreasonable to me.
> Or the fact that “analysis is only function-local” in Clang...
This is not inherently a problem; Rust's analysis is also function-local, it's actually a design goal.
That said, you're 100% right about the issues here; this does not guarantee memory safety, and it is far less than what Rust provides. Still, I'm happy it exists; anything that makes stuff better is a win.
As someone who wrote a precursor to that code (I work on crosvm), any language other than C has to write binding code for KVM (or any kernel interface for that matter). Are you suggesting that the only language that should get to use KVM is C (and other languages that can consume C headers)?
Additionally, when KVM gets new features, the hard part will not be adding more bindings to that crate; it will be utilizing that feature, which would have to happen in any language.
> Does C-like syntax not adequately inform a compiler that wants to implement something like Rust's borrowing? It appears to work for `std::unique_ptr` and `std::move` semantics.
It does not. Note that Rust's move semantics are different than C++'s, Rust has what C++ folks call "destructive move."
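A quick sketch of what "destructive" means in practice:

    fn main() {
        let a = String::from("hello");
        let b = a; // move: the bytes are copied and `a` is statically dead afterwards
        println!("{}", b);
        // println!("{}", a); // error[E0382]: borrow of moved value: `a`
    }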
There are a few different goals, but one of the largest ones is memory safety. C and C++ are not memory safe, but Rust is. Memory safety is impossible to retrofit onto these languages, and so, you need a new language.
> As far as I can tell Rust stands on the shoulders of the LLVM backend- there are existing frontends, like C and C++ but Rust started from scratch.
Implementations are irrelevant, the issue is semantics.
And that's why Rust started from scratch. I totally understand the problem and the justification to address it. Yet nobody has ever been able to justify why it's necessary to dispense with the entirety of C and C++ to achieve memory safety. The power of frameworks like LLVM give you freedom to implement whatever semantics you want. Rust wouldn't have gotten off the ground if it weren't for that kind of power. I feel this power has been abused. For example, you could create a fork of C++ that merely compile-errors when you use a pointer and not a reference unless you enclose it in some `unsafe` block that you invented -- anything you want like that is possible. One can simply start with C++ augmented so that everything is const by default; that is the ideal progress to me.
There's still no justification for Rust to be completely hausdorff to the last 50 years of established C systems and start over. If Rust doesn't really offer anything substantive for this amount of divergence I feel it will always be relegated to wrappers and a niche community.
> Yet nobody has ever been able to justify why it's necessary to dispense with the entirety of C and C++ to achieve memory safety.
As soon as you make a backwards incompatible change, you've already made a new language, effectively. Because:
> For example, you could create a fork of C++ that merely compile-errors when you use a pointer and not a reference unless you enclose it in some `unsafe` block that you invented
This means that effectively all C++ code would fail to compile under this implementation.
> One can simply start with C++ augmented so that everything is const by default; that is the ideal progress to me.
This is nowhere near enough to achieve the goal. Sure, it's a thing someone could do. But it's kind of irrelevant.
> This means that effectively all C++ code would fail to compile under this implementation.
You're implying that all C++ code uses pointers, and that's patently false. It's even false for C code. Pointers were the first thing C++ boxed up with references. Box them more. Forks of C++ even exist, like the GNU++ extensions. There's a whole spectrum of possibilities between that and where you are right now, and I don't see why everyone needs to cross this chasm and embrace Rust at this cost.
Breaking one part of the language doesn't break everything for everyone. That's definitely a fallacy if it underpins the justification I'm chasing here.
> For example, you could create a fork of C++ that merely compile-errors when you use a pointer and not a reference unless you enclose it in some `unsafe` block that you invented -- anything you want like that is possible. One can simply start with C++ augmented so that everything is const by default; that is the ideal progress to me.
But then it wouldn't be C++, it would be something else. You would very quickly end up with a language incompatible with C or C++ if you set off to implement the safety mechanisms of Rust on top of them.
I mean, look at how long it took to get modules into C++, and that's a feature that is opt-in and won't change the semantics of existing code. You can't "simply start with C++" for anything.
> There's still no justification for Rust to be completely hausdorff to the last 50 years of established C systems and start over.
C and C++ aren't static either. They're evolving too, and modern iterations of the languages aren't compatible with, or even recognizable from, the earlier incarnations.
(Is the big hangup here that Rust puts the type on the right side of the variable name?)
> If Rust doesn't really offer anything substantive for this amount of divergence I feel it will always be relegated to wrappers and a niche community.
If you're coming from a background of C++ development it's not that hard to pick up Rust. The borrow checker can be a bit of a pain but it isn't mysterious; it's got a well defined set of rules it plays with. There's a lot more to Rust than memory safety, it has a well laid out feature set that is refreshing and probably not implementable in C++ in my lifetime. The trait-based generics are great, the metaprogramming facilities are great, pattern matching is killer, that (nearly) everything is an expression is something C and C++ desperately need, and the out-of-the-box tooling is phenomenal.
I get that Rust isn't for everyone, but painting it as irrelevant and unnecessary is too extreme, and arguably wrong if it's inspiring linting tools that bring some of Rust's rules to the C/C++ world, or inspiring language features.
On a side note, what do you think of Apple moving away from Objective C toward Swift?
So my running conclusion is that Rust's founding decisions were made with a sort of "go big or go home" logic given this premise. To conclude that breaking even one part of C/C++ is grounds to throw up one's hands and throw the baby out with the bathwater makes sense through this prism. At that point one might as well pore through the literature and academic languages to pick out all the goodies and gimmicks that haven't been mainstreamed before too.
The problem here is that Rust is not a graceful replacement for C/C++. I'm being sold on Rust from a systems perspective, and the system is already written in C. C is a simple and incomplete language begging for extension, but it's done a good job of abstractly representing commodity computing hardware to programmers for 50 years. Rust doesn't even make an attempt to gracefully regrade C/C++ into a Safe-C++, and I can't come to any conclusion other than that this is for nothing more than a lack of creativity.
There is an ignorance of the real costs of reinventing the wheel when one doesn't replace the original wheel too. That's why I am starting to truly convince myself Rust is actually harmful. If Rust gracefully took C in another direction (like C++ did), I could add Rust compilation units to my projects and share C interfaces, which would basically give me access to the entire existing universe so long as I use something like `extern "Rust" {` or vice versa. If Rust broke ~10% of C++, I could put forth ~10% effort to port my projects to it and integrate it entirely. At a certain point down that vector, Rust can be the messianic "C++ done right" without looking back.
With the introduction of modules in C++20 there is now an intriguing possibility of implementing a new/cleaned-up language with compatible object model/ABI and using modules to interface with existing C++ and even C (via header units).
I don't know, to me the C version is more readable. I know for sure that in the end the C macros will only do text replacement, whereas the Rust macros could do anything.
I would reject any code review containing that character too, unless I worked on a project that used German internally, or was creating a product targeted towards the German market in some way.
> I know for sure that in the end the C macros will only do text replacement, whereas the Rust macros could do anything.
My impression is exactly the opposite: Rust macros are guaranteed to expand to syntactically valid code that still obeys the memory/type/lifetime-safety constraints, whereas C macros can expand to something that won't even compile, never mind have any guarantees about soundness. Not to mention all the hoops you have to jump through with extra parens and whatnot to avoid the footguns of C macro operator precedence.
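A small sketch of the contrast (the C macro here is the classic textbook footgun, not from any particular codebase):

    // C's #define DOUBLE(x) x * 2 turns DOUBLE(1 + 2) into 1 + 2 * 2 == 5.
    // A Rust macro captures the argument as a single expression, so no extra parens:
    macro_rules! double {
        ($e:expr) => { $e * 2 };
    }

    fn main() {
        assert_eq!(double!(1 + 2), 6); // expands as (1 + 2) * 2
    }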
That is only with procedural macros.
Normal macros operate just by token substitutions.
Procedural macros are also special in that they are the only macros that can be 'unhygienic': creating new identifiers in the current scope that were not passed as an argument to the macro.
(Although knowing if something is a procedural macro or not is not obvious when using one in code)
Yes, Rust is by far the most unreadable language that I've tried. I really wanted to like it but everything is just so ugly, unclear, and (seemingly) unnecessarily complicated.
u16 is marginally nicer than uint16_t, but why is there a colon? and why do we have to nest square brackets instead of [1][2][3] (or maybe even better [1,2,3] which you can get with suitable C++ libraries)?
Sure I can parse Rust. But it is definitely more complicated, and more "noisy".
Types going after identifiers avoids the need for the lexer hack, which causes all sorts of problems in C (such as "typename"). A colon nicely separates the two; I prefer something there as opposed to "x int" like in Go.
You have to nest square brackets to avoid ambiguity. Is &int[] an array of references or a reference to an array?
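In Rust the nesting keeps the two readings distinct; a quick sketch:

    let xs = [1, 2, 3];
    let a: &[i32; 3] = &xs;                      // a reference to an array
    let b: [&i32; 3] = [&xs[0], &xs[1], &xs[2]]; // an array of references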
The turbofish rule is much easier to learn: just use ::<> when explicitly providing types for a call. The C++ concept of a "dependent qualified name" is a lot harder to explain.
What's important isn't how often you need to help the compiler: it's how easy the rules are. The turbofish is unfortunate, but it's nowhere near as bad as typename.
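For what it's worth, the turbofish in practice is just this (a sketch):

    let n = "42".parse::<i32>().unwrap();  // ::<> supplies the target type
    let v = (1..4).collect::<Vec<i32>>();  // same rule at any generic call site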
I don't think that's true. Consider a modified version of [1]:
template<typename T> class X {
    void foo() {
        typename T::A* pa;
    }
};
The problem here is that C++ can't parse this without knowing whether T::A is a type or not. Otherwise it might be "T::A multiplied by pa". This is the lexer hack in action.
Rust, by contrast, has no such limitation [2]:
trait SomeTrait {
    type A;
}

struct X<T> {
    f: T,
}

impl<T> X<T> where T: SomeTrait {
    fn foo() {
        let pa: *mut T::A;
    }
}
This compiles and runs just fine with no need for a turbofish on T::A, because Rust has no lexer hack.
"The syntax is this way to make lexing it easier" is not a good argument for syntax. Ever. Lex it into tokens, parse it using semantic analysis, and be done. Plenty of compilers have been doing this for a long while now, and plenty of work has been done to make this a non-problem. Choosing syntax because it's slightly-easier to implement but slightly-harder to use is not a recipe for adoption.
The problems with the lexer hack are user-facing problems, not compiler-writer-facing problems. They include typename, order of declarations being significant, weird function pointer syntax, and the most vexing parse.
I'm not advocating for the lexer hack. There are non-hack-y alternatives, hiding this pain from users. The options of "the lexer hack" or "identifiers first" is a false dichotomy. There are many ways to lex and then semantically analyze programs, and I do not understand why you are arguing as if that is not true.
What’s regular for computers is also more regular for humans. You’re absolutely right that, taken to an extreme, doing things for computers isn’t great, but neither is making a super complex grammar.
Regular has a technical meaning here, that is, Chomsky’s grammar hierarchy. It’s where the “regular” in “regular expression” comes from. That said, I’m using it in an imprecise way here to mean “simpler to process.” (This is because regular languages are simpler to process than say, context-sensitive languages.)
Location of the type is about grammar complexity. Rust’s grammar plays into its type inference capabilities, and the pattern syntax. There’s an underlying uniformity with syntax elsewhere.
The example of typename shows that it’s a problem that can’t be overcome by the compiler, so it’s trading off one bad syntax for another, not trading off bad syntax for difficulty to implement.
I am sure there are good theoretical arguments. But they are hard on the humans. Ideally I would want something like
constdata keymaps[1,2,3, u16]
That is easy to read, gets rid of all the extra line noise and directly tells me everything I need to know about memory layout and performance.
1.) it is constant, known at compile time, and can be put into a read-only segment (or possibly flash ROM on an embedded system).
2.) it is named keymaps. The name is important and should come early
3.) it is an array. arrays and primitive datatype have many important differences and programming languages should not try to hide that.
4.) it has dimensions 1 by 2 by 3 (in that order). Listing the "3" first in Rust when the first dimension only has extent 1 might have good reasons, but is damn hard to read if you have more than 2 dimensions. Especially if you end up with things like 3 by 3 by 4 by 3. Which of the inner two is larger?
5.) Having the type of the element last makes sense, because in terms of memory layout that just means that we have 2 consecutive bytes. It also makes it easier to switch from "a 1 by 2 by 3 array of u16" to "a 1 by 2 array of (three vectors of u16)".
Now you will probably give me reasons why I can't have that. But when I am coding I don't care how hard it is on the compiler writers (as long as I can express things unambiguously); I want to have it as easy as possible so I have brain cycles to spare to think about data layout and algorithms.
On a more serious note, in general Rust favors explicit simple syntax: the only syntax related to arrays you need to learn is `[TYPE; LENGTH]` which is the way to write an array of type TYPE and length LENGTH, pretty straightforward. `[[[usize; 3]; 2]; 1]` is simply a composition of such arrays, as multidimensional arrays are just arrays of arrays.
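For example (a sketch; the name and values are made up):

    // A 1 by 2 by 3 array of u16: arrays of arrays, composed
    static KEYMAPS: [[[u16; 3]; 2]; 1] = [[[1, 2, 3], [4, 5, 6]]];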
C has a few more variants: the implicit length of `keymaps[]`, the `[0] = ...` initializer, the `keymaps[1][2][3]` declaration syntax. This is nice syntactic sugar, but you don't technically need it. Although if you really don't like the raw Rust syntax, you can always use macros like shown [1] or a library like multiarray [2].
In a way I would say this makes Rust easier to learn: there are only a few symbols and patterns you need to learn to recognize, and the rest is compositions.
[1] WARNING: macro definitions are very symbol heavy, and thus even more unreadable.
The consequences of the lexer hack are hard on humans (typename, most vexing parse, order of declarations being significant, weird function type syntax), not just compiler writers.
You can already define a custom type which will allow you to have a nice syntax for multidimensional arrays: `Matrix<1,2,3>`. It solves your issue of nesting brackets, and you can impl arbitrary indexing for it.
Unfortunately, Rust does not yet have numeric generic parameters outside the special case baked into arrays, so it cannot do that yet AFAIK. There is a ticket for it, but it needs work.
Also, normally the array in Rust would be a constant, with the keyword `const` instead of `static`; the reason it is static here, however, is so the C program can access it.
> Sure I can parse Rust. But it is definitely more complicated, and more "noisy".
I'm dabbling in some microcontroller stuff currently. The one thing I've noticed is that the Arduino (C++) environment seems to rely on magic. Lots of mysterious constants, registers, etc and it's not entirely clear what's what.
Meanwhile using Rust in this environment is very explicit. It's quite a bit more verbose than the C++ version. I'm also sure some of this is due to me having to write the implementation itself, but for me it's a lot easier to understand what's going on when things are nicely typed. It's the difference between magic constants plus bit shifts and named, typed register access.
Specifically I really like that I can access the parts of the register by name and that access is typed. You can't write to a read-only register, you can't modify a write-only register, and if you don't have a default value defined you call something else (e.g. write_with_zero) that makes it clear what you're doing.
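Roughly this shape, as a sketch (svd2rust-style API; the register and field names here are assumed, not checked against a real PAC):

    // Enable pin 25 of PIO controller C as an output via the Output Enable Register.
    // OER has no defined reset value, hence write_with_zero rather than write:
    pioc.oer.write_with_zero(|w| w.p25().set_bit());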
Edit: Another thing I really prefer about the Rust API over the Arduino/C++ one is that state is encoded in types. So you may have a GPIO pin PC25. But that type takes a type parameter indicating state, e.g. PC25<PeripheralB<Output<PushPull>>>. Yeah, that's verbose, but it's also very explicit. If you have something (e.g. a UART/USART driver) that needs a pin to be configured in a specific manner, you'll have to go through some non-trivial effort to pass an incorrectly configured pin in. As a result, if your program compiles you can be more confident that it will do what you expect.
I love the typed API svd2rust generates as well. Generally you can just let autocomplete do the driving; the only things needing a brain are the abbreviated register names manufacturers use and the order of operations needed.
I wonder if Rust would be better suited for Arduino/embedded beginners. Rust is quite painless when you just want to glue a few crates together. I'm sure everyone would rather debug a compiler error than some invalid memory issue happening on the microcontroller.
The "mysterious constants, registers, etc" are very readable once you get familiar with the agreed abbreviations and the MCU you're programming for (and how bit shifts work in the code example you gave). I really disagree that Rust is somehow more readable, especially not the example you brought. How can you say `pioc.oer` is somehow more understandable than any piece of that C code?
> How can you say `pioc.oer` is somehow more understandable than any piece of that C code?
Easily. pioc.oer tells you that you're using the PIOC peripheral, oer tells you you're manipulating the oer register specifically, and the mutable reference (&mut) indicates that you're modifying it (and the borrow checker ensures you're not going to be modifying it in two places at once).
Additionally the rust embedded folks have a practice of returning a "constrained" structure from an initialization function. Typically the configuration function will take ownership of the peripheral and then return a restricted wrapper around it. This means that if you're doing something that will result in immutable registers you'll get back a structure that doesn't allow you to modify those registers. So, for instance, trying to configure the watchdog timer twice on the MCU I'm using will not compile because you don't even have that original object around anymore. If the program compiles you're probably OK.
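In sketch form (all names hypothetical):

    struct Watchdog;           // raw peripheral; its config register is write-once
    struct ConfiguredWatchdog; // wrapper that exposes no config methods

    fn configure(wdt: Watchdog) -> ConfiguredWatchdog {
        // ... the single permitted write to the configuration register ...
        ConfiguredWatchdog
    }

    // let raw = Watchdog;
    // let wdt = configure(raw);  // `raw` is consumed (moved) here
    // configure(raw);            // error[E0382]: use of moved value: `raw`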
Nowhere in that C code is any of that referenced. There's no idea which peripheral is being manipulated. ulPin and ulPinConfiguration both expand to integers, so there's no guarantee you've even gotten the parameters in the right order. Likewise the shifting and masking is exposing unnecessary implementation details, and things that compile don't necessarily do what you think they may do (e.g. modifying immutable registers).
> There's no idea which peripheral is being manipulated.
The Arduino-y function usually maps internal pins to the pin numbers on the given PCB. It's quite clear if you know what you're compiling the code for. Not to mention that how a pin-remapping function works usually doesn't matter; the resulting abstraction is very nice to use.
> pioc.oer tells you that you're using the PIOC peripheral, oer tells you you're manipulating the oer register specifically
And why should I care about that information when I already have the abstraction written? The Rust code is much, much worse in terms of ease of use in this case: manually having to look up how pins on the board map to internal registers is cumbersome. Some of the confusion about the names might also stem from the fact that I expect consistent capitalization when dealing with registers; why would `pioc` be lowercase if it's in reality a register being modified? It's actually weird. I won't even get into how horrible-looking the "expanded" form is compared to the 2-line C equivalent.
One more thing I just now realized: the Rust team made an incredibly bad decision picking the symbols, for example for mutable references. I don't see anyone with any good amount of C experience ever wanting to use Rust if they have to re-learn what `&` really means: a useless waste of time for most. It's akin to designing a new, safer bike but switching the handlebar direction. And in the end, the number of symbols in Rust, combined with how annoying they are to type on non-US layouts, combined with the (wrongly) carried-over connotations from other languages, makes it a terrible replacement for what it's advertised for.
> So, for instance, trying to configure the watchdog timer twice on the MCU I'm using will not compile because you don't even have that original object around anymore.
That is very cumbersome and illogical. Reconfiguration is quite common.
> There's no idea which peripheral is being manipulated.
I think that's just your unfamiliarity with the platform and the example you chose. If you'd write the exact same code you brought as an example in C it'd be much clearer than the Rust code and just two lines. I'd love to see the asm of the Rust code.
Your criticism (e.g. OMG wrong case, OMG C uses & to mean something else) seems mostly centered around the fact that rust isn't C and less around the merits of rust itself. But I'll bite...
> Some of the confusion about the names might also stem from the fact that I expect consistent capitalization when dealing with registers; why would `pioc` be lowercase if it's in reality a register being modified?
Typically in rust screaming snake case is reserved for constants. In this case, pioc is not a register (so there you go). In the context of the embedded stuff the peripherals typically get screaming snake case names at the top level struct. In this example I've configured it and assigned it to a local variable named pioc.
> That is very cumbersome and illogical. Reconfiguration is quite common.
In the example I gave it's not possible. After an initial write to the watchdog's configuration register all subsequent writes are ignored by the MCU. That's the whole point of having compile time checks. If it compiles, it's probably OK. If it doesn't compile you're probably doing something that won't work or won't do what you expect.
If you were to take as the example something that can be modified, you'd still have the functions lying around to modify the register.
> I think that's just your unfamiliarity with the platform and the example you chose.
I think you'd probably want to guess again. Which peripheral is being manipulated? What happens if I get a magic number wrong and that function operates on the wrong peripheral?
> seems mostly centered around the fact that rust isn't C and less around the merits of rust itself.
If a language is advertised as a replacement for C/C++ then one can reasonably expect there to be little that works counter-intuitively coming from the to-be-replaced language.
> In this case, pioc is not a register (so there you go).
"It's in reality a register being modified" is not the same as "it's a register".
> After an initial write to the watchdog's configuration register all subsequent writes are ignored by the MCU. If it compiles, it's probably OK.
Hardware isn't perfect. You brought up the immutability of the watchdog timer as a benefit, it really isn't, that's all I wanted to say. I also doubt that just a compiling piece of Rust can usually handle a hardware failure or an error.
> Which peripheral is being manipulated? What happens if I get a magic number wrong and that function operates on the wrong peripheral?
If you already have an Arduino-y abstraction then it doesn't matter in which language it's written, the same opaqueness would happen if you can't look up the pin mapping from documentation. If you really need to see which peripheral is being manipulated then it's not difficult to write two very simple lines of C to do the same thing just as clearly. You're comparing two very different things, it's just not a very good comparison.
i find the rust version way more readable; it's clear that it's an array of 1 (array of 2 (array of 3 u16s)), which corresponds to the way "multidimensional" arrays are actually laid out in c and friends.
and why can't we write it like you just did, with outer to inner dimensions going from left to right, i.e. in the same order the indices go when we actually use elements?
Array filling shares syntax with array typing and array declaration:
[0; 10] // ten-element array filled with zeros
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] // equivalent array with ten zeroes typed out
[i32; 10] // type of ten-element i32 array
Because the "filling" syntax is optional, it makes sense to place it after the fill value. The array typing syntax follows. Essentially you specify the value and say "copy this X times" (it only works with types that implement Copy IIRC)
Given how often the C++ complexity argument is made, I am genuinely concerned that Rust will fully deliver on its promise of being simpler than C++. Only in retrospect will it be evident that it was the wrong goal to shoot for all along.
I don’t think Rust has ever promised to be simpler than C++. It is simpler, in my opinion, but while I think it’s probably a good goal to shoot for, I’ve never seen anything that says it is an explicit goal of the language.
It does go out of its way to be ergonomic, which could be interpreted as simpler.
What? Brainfuck is one of the easiest languages to read. It's almost entirely context-free; you can immediately infer what's going to happen at any given point of the code, except brackets, which require a bit of counting if you're not using a syntax highlighter...
The macros in the original code almost all seem to be constants, with very few functions, so I'm not sure what the power of Rust's macros buys here. I don't see a lot of difference between
LT(3, KC_TAB)
and
[TAB <{3}]
Both are "magical". Likewise, I can also understand the layout precisely by looking at the C code. To understand the Rust code, I have to assume what layer! and r! are doing, or I have to inspect them to make that determination.
I'm _sure_ Rust has advantages in certain situations, I'm not sure this is a great example to display that.
The macro system is stupidly powerful but I agree this isn't a decent example. C preprocessor macros are very cumbersome not only to read and write, but to use.
I alluded to this is another comment but Rust macros can call arbitrary Rust code. Including file i/o at compile time, such as loading a config file in your syntax of choice, deserializing it to a struct, and then using it to synthesize tokens. Rust's macros become an extremely powerful tool for code generation using Rust itself.
They come at the cost of readability and complexity, sometimes it's a good trade off but too much magic can make systems horrible to maintain. You often still have to know what a macro is doing by jumping to the code and it's a lot easier to mentally parse text replacement.
It's also the sort of complexity caused by developers that live in IDE's, outside of that it's rather trivial to add a build step with powerful language independent code generation tools.
Also, when someone inevitably takes macros way too far and you have to debug them, what is the Rust equivalent of gcc -E/-save-temps to see the intermediate code? I assume there is one, but the lack of such a feature can be a nightmare; cf. working with attribute/annotation-driven code in C#/Java.
I have no idea how to type emojis on Linux (with a barebones tiling WM), the OS I use almost exclusively for work, although I’m willing to believe there’s probably a way. But I would still immediately reject a code review inserting emojis into our codebase — why force every single person who modifies the code to figure out how to type them in their setup, use a particular font, etc. I’d also reject a code review naming things in Chinese or French, for similar reasons, unless I worked in a country that uses one of those languages.
Furthermore, Windows, Mac, and Linux are not the only operating systems, nor even the only ones Rust supports. Good luck figuring out how to type emojis on Solaris or the GUI-less FreeBSD base system.
> I have no idea how to type emojis on Linux (with a barebones tiling WM)
You can copy and paste. In the context of this project, the small, fixed set of symbols (not emojis, but whatever) makes a lot of sense, and the code comes with an example that you can copy from. I would guess that this works even on Solaris or in a decent editor on GUI-less FreeBSD.
As for using, I don't know, "thumbs up" and "thumbs down" symbols in place of "true" or "false", sure, that would be excessive.
The FreeBSD base system ships with two editors (IIRC): ed and nvi. Ed doesn’t let you copy and paste at all, except whole lines. I don’t know for sure, but I’d guess that nvi does work properly with Unicode. But can the base system’s font render it properly? I don’t know.
Anyway, that’s beside the point. It’s an unnecessary piece of additional complexity for, as far as I can tell, no real benefit over escape sequences.
I mean, I guess? But I’m guessing way over 99% of lines of code is ASCII. Of course then it’s also UTF-8, but it’s still a little disingenuous to say that “most” code is Unicode.
At this point, I'm not so much against Rust because it's a difficult or hard thing to use, but because there's such a learning curve that I can't, in good conscience, move team projects to it. "To continue to work on this with me, you must first learn Rust" is a hard sell.
Despite being a Rust fan, I do believe that it is in fact harder to use than nearly all languages, despite the valiant effort of the community to make it as easy to use as possible.
Things shouldn’t be ported to Rust willy-nilly, but in its niche (basically: a better C++, because the compiler enforces many of the things you need to keep track of in your head in C++, and because as many productivity features as possible without violating “pay-for-what-you-use” too flagrantly are baked in), it can be worth the cost.
I am sure some others will disagree with this, but I think using Rust in domains that Python or Java or Awk or Bash is well-suited to is quite silly. For me, Rust competes with C and C++, and little else.
IMO, this even applies when replacing most C/C++: a lot of that code doesn’t need most of the benefits of C++ or Rust. Often, it’s better to replace it with simpler programs doing less work in Go or JavaScript or Python.
I love Rust, but most use cases just don’t need it. The language needs a pretty substantial amount of experience to use, though, so you don’t want just a few Rust programs lying around; no one will have a reason to learn, so they will be even harder to support.
Between these, it becomes pretty difficult to find a critical mass of problems that need Rust enough to both justify the cost and to get a decent amount of traction in the company.
I'd argue it'd make more sense for the keymap macro to load a JSON/TOML/YAML config file and then generate the code, but that's just me. It would be a little more straightforward using serde, too: type out your schema in normal Rust, automatically derive the deserializer via serde, load the file with a function-like macro, and synthesize the C-compatible code from there.
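The runtime half of that idea is tiny with serde; a sketch (assumes serde and serde_json as dependencies, schema made up):

    use serde::Deserialize;

    #[derive(Deserialize)]
    struct KeymapConfig {
        layers: Vec<Vec<String>>, // key names, one Vec per layer
    }

    fn load(path: &str) -> KeymapConfig {
        let text = std::fs::read_to_string(path).expect("read config");
        serde_json::from_str(&text).expect("parse config")
    }

The function-like macro would do the same work at compile time and emit the static tables as tokens.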
Those are not actually the same. There are several differences, but the most obvious one is that the standard ABI on x86-64 Linux passes integer parameters in the following order: rdi, rsi, rdx, rcx, r8, r9.
I noticed this immediately the first time I tried reading some Windows disassembly, as “rdi rsi rdx” is very thoroughly burned into my brain...
The Rust example made me feel anxious. I never got macro systems. What's so hard to read about a basic multi dimensional array? It's like the most basic of data structures. Am I the only one that can easily picture this in my head ?
Isn't the advantage of the macro in the Rust example not the fact that it's a more convenient multi-dimensional array (is it even that? like you say...), but that we're using literals not for their literal value, but for some local symbolic value? For instance, `1` doesn't mean 1, it means whatever `KC_1` means (perhaps that's 63).
I surely find the Rust macro example easier to read in isolation.
Macro systems that are effectively simple find-and-replace are terrible. Modern macro systems that give you access to the actual syntax tree to manipulate directly - well, I feel more comfortable writing a little more code than using them, but at some point the differences mount up. I suspect the tradeoff here is better in the demonstration than the actual use, but other cases are valuable.
[1] https://c2rust.com
[2] https://github.com/immunant/c2rust
[3] https://immunant.com/blog/2019/11/rust2020/