Hacker News new | past | comments | ask | show | jobs | submit | more sebastianmestre's comments login

Is there a more direct or elegant solution than reducing it to a quadratic equation?

By the way r=6 is also a solution, if we treat the diagram a bit more abstractly.


Yes, couldn't visualise that!


-6


Width in a flame graph is directly proportional to runtime. Optimizing a block that covers x% of the graph will only speed up the program by x% or less, so probably dont bother with blocks less than 0.5% wide.

This by itself should already tell you what NOT to optimize.

But really, you should be looking for operations that take a long time but shouldn't (wide blocks that should be thin). To find it you need to have an intuitive idea of how fast things should be beforehand.

If you have no idea how fast things should be, no amount of graphs will help you with this.


Yeah, Sean Barrett has a good writeup about them and optimizing them for branch-prediction, etc

It's framed in the context of programming-language tokenization, but the principles are the same.

https://nothings.org/computer/lexing.html


They meant that different characters don't change the performance, unlike the Linux and MacOS versions, where different characters go through different code paths.

But yes, a bigger file does take longer to process.


In my experience, 3 passes of box blur looks good enough for small kernels (less than 50px). Is this faster than that?


Will be essentially the same performance. A box filter has to add one pixel entering the kernel and remove one pixel exiting the kernel. This one has one pixel entering and leaving each half of the kernel and then combining the two halfs. So three box blurs have six additions and once this filter is also six additions. Maybe the box filter could be somewhat slower because it has to increment two pointers three times while stack blur has to increment three pointers once and also incurs the loop overhead only once. For a fixed radius you could however use only one pointer and realize the offsets between the pointers with an appropriate addressing mode if available on the target architecture.

This is the box filter inner loop.

  sum += data[right++] - data[left++]
This is the stack blur inner loop.

  lsum += data[middle]  - data[left++]
  rsum += data[right++] - data[middle++]

  sum += rsum - lsum


Parsing and symbol resolution on simple languages is kinda trivial in my opinion, but I'll probably write some articles about that stuff later

Control flow and pointers are coming in the next part, and it's still surprisingly easy


Is memory / register allocation in your roadmap too ? (honest question, I'm not sure it's in the scope of your articles)


Register allocation is not on the roadmap, I think it's too hard for this series... But maybe I just haven't figured out an easy enough way to do it yet :^)

Memory allocation? You mean making a heap allocator? In the spirit of lowering the bar for creating new languages, I lean more towards calling into libc's malloc instead of building a custom solution haha


That's right!


The article describes AST -> Assembly transformation, no?


Author here. I wanted to get the bar for writing a compiler as low as possible, and emitting assembly code seemed better in that regard.

Also I don't have time to learn x86 instruction encoding right now :(


Thanks for the response! I was joking a bit (though not entirely - I really like dynamic code generators and am inspired by tiny, powerful systems) but as I mentioned I like the approach and hope that you get the chance to finish the series.


> There are push and pop instructions which are perfectly suited to this nested value use. In the past, lamenting how many compilers don't seem to realise that (and the fact that push/pop are specially optimised on x86 via hardware known as the stack engine)

That's fair, but in the article I also store local variables relative to %rsp, so I wouldn't want it changing at all

I could use %rbp for that instead, but then I'd have to explain it in the article :^)


That's fair, but in the article I also store local variables relative to %rsp, so I wouldn't want it changing at all

You only need to keep track of how much you've put on the stack (which can be a single counter) and adjust the offsets appropriately.

But then you'd have to explain that too...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: