> So no, there are plenty of easy ways to fool the optimizer by obfuscation.
If you mean fooling the compiler through source code obfuscation, it won't work – by the time the first optimisation pass runs, the source has already been transformed into an abstract syntax tree, and source-level obfuscation becomes irrelevant.
Multiple optimiser passes do take place, but they are bounded in time – the optimiser is not expected to spend a theoretically indefinite amount of time trying to arrive at the most perfect instruction sequence.
There was a GNU project a long time ago, the «superoptimiser» (GNU superopt), which, given a short sequence of instructions, would spend a very long time searching exhaustively for the cheapest equivalent sequence. The project was more of an academic exercise, and it has long since been abandoned.
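For flavour – a C sketch of the classic kind of result from the superoptimiser literature, the branch-free signum (the actual discovered sequences were machine-specific assembly; this is merely the C-level equivalent):

    #include <stdio.h>

    /* Branch-free signum - the staple example of the short,
       non-obvious sequences superoptimisers discover: the two
       comparisons compile to flag operations, so the function
       contains no branches at all. */
    static int sign(int x)
    {
        return (x > 0) - (x < 0);
    }

    int main(void)
    {
        printf("%d %d %d\n", sign(-5), sign(0), sign(42)); /* -1 0 1 */
        return 0;
    }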
It is, of course, well known – or at the very least frequently and enthusiastically repeated – that a suitably drunk Odersky devised Scala whilst watching a Reese’s Peanut Butter Cup ad[0].
Laminated glass does not prevent routine stone-chip damage. If a tiny fragment of the stone becomes wedged in the outer ply, or at the laminate interface at a tension point, then the temperature difference (inside the cabin vs ambient), cabin pressure and body flex – which often place higher tensile stress lower on the windscreen – can make the crack start propagating very quickly.
That was my experience earlier in the year: I was driving alongside a large fuel tanker on a city road when a tiny stone chip, probably thrown up from under the tanker's tyres, struck the front windscreen. It took about an hour and a half for the initially invisible crack to spread into an irreparable 30 cm one – effectively right in front of my eyes – and the windscreen had to be replaced. Lesson learned: do not drive anywhere near large trucks or fuel tankers, or at least maintain a much larger distance.
But laminated glass will prevent the structural collapse of the windshield, and it will also prevent the occupants from being showered with glass shards. It also makes it more likely that the windshield will withstand an impact from a large stone, leaving a localised, static crack that can be repaired with resin.
> ...and will also prevent the occupants from being showered with glass shards.
Hasn't it been the case for a long time now that glass in automobiles is coated so that it breaks into small, generally-square fragments, rather than shards?
I've never smashed a window myself, but every couple of months, I see the remains of a window smashing on the sidewalk... it's always a pile of small, generally-square fragments.
My memory tells me that this design was mandated long ago because folks would get shards embedded in them effectively forever. One of my parents related a story that one of the parents up the tree would irregularly have to extract migrating glass shards – embedded during an automobile accident many years prior – as they broke through the skin of his face. But perhaps that story is bullshit and completely fabricated, IDK.
That's tempered glass, which breaks into the safer fragments. Still not completely safe, obviously, especially if stuff is getting thrown around violently in an accident. The bigger safety case for laminated glass, though, is that since it sticks together, your body or limbs can't fly out through it in a rollover accident (which can happen on the sides even if belted). There are also some fringe benefits: noise isolation, UV protection, and supposedly it is more annoying for thieves.
By placing a statement upon the public internet, you both implicitly and explicitly consent to that content being consumed by anyone, and by any means. Such is the implicit covenant that access to the public internet imposes upon all participants.
Making the content queryable by a database engine is merely a technical optimisation of the efficiency with which that content may be consumed. The same outcome could have been accomplished by capturing a screenshot of every web page on the internet, or by an imaginary army of Mechanical Turks laboriously copying and pasting the said content.
A private network may, of course, operate under an entirely different access model and social contract.
Embeddings are encodings of shared abstract concepts, statistically inferred from many works or expressions of the thoughts possessed by all humans.
With text embeddings, we get a many-to-one, lossy map: many possible texts ↝ one vector that preserves some structure about meaning and some structure about style, but not enough to reconstruct the original in general, and there is no principled way to say «this vector is derived specifically from that paragraph authored by XYZ».
Does the encoded representation of the abstract concepts constitute a derivative work? If yes, then every statement ever made by a human being is a work derivative of someone else's, by virtue of having learnt how to speak in childhood – every speaker creates a derivative work of all prior speakers.
Technically, there is a strong argument against treating ordinary embedding vectors as derivative works, because:
- Embeddings are not uniquely reversible and, in general, it is not possible to reconstruct the original text from the embedding (see the toy sketch after this list);
- The embedding is one of an uncountable number of vectors in a space where nearby points correspond to many different possible sentences;
- Any individual vector is not meaningfully «the same» as the original work in the way that a translation or an adaptation is.
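To illustrate the non-reversibility point – a toy C sketch with a deliberately crude, made-up «embedding» (a bag-of-letters count, nothing like a real learned model) showing how a many-to-one map destroys the original:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* A toy "embedding": a 26-dimensional bag-of-letters vector.
       Real embeddings are learned, dense and float-valued, but they
       share the property shown here - the map is many-to-one and
       lossy, so the original text cannot be recovered from it. */
    static void embed(const char *text, int vec[26])
    {
        memset(vec, 0, 26 * sizeof vec[0]);
        for (; *text; ++text)
            if (isalpha((unsigned char)*text))
                vec[tolower((unsigned char)*text) - 'a']++;
    }

    int main(void)
    {
        int a[26], b[26];
        embed("the eyes", a);   /* two different texts (anagrams)... */
        embed("they see", b);
        /* ...land on exactly the same vector: inversion is impossible */
        printf("same vector: %s\n", memcmp(a, b, sizeof a) ? "no" : "yes");
        return 0;
    }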
Please do note that this is a philosophical take, and it glosses over the legally relevant differences between human and machine learning, as the legal question ultimately depends on statutes, case law and policy choices that are still evolving.
Where it gets more complicated.
If the embedding model has been trained on a large number of languages, it readily enables cross-lingual search: one can search for an abstract concept expressed in any language the model has been trained on. The quality of such search results across languages X, Y and Z will be directly proportional to the scale and quality of the training corpus in the said languages.
Therefore, I can search for «the meaning of life»[0] in English and arrive at a highly relevant cluster of search results written in different languages, by different people, at different times – and the question becomes: «what exactly has it been statistically[1] derived from?» (mechanically, see the sketch after the footnotes).
[0] Cross-lingual search is what my engineers and I tried last year – to our surprise and delight at how well it actually worked.
[1] If one can't trace a given vector uniquely back to a specific underlying copyrighted expression, and demonstrate substantial similarity of expression rather than of idea, the «derivative work» argument in the legal sense becomes strained.
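Mechanically, such a search is plain nearest-neighbour retrieval by cosine similarity in the shared vector space – a toy C sketch, with fabricated three-dimensional vectors standing in for a real multilingual model's high-dimensional output:

    #include <math.h>
    #include <stdio.h>

    #define DIM 3

    /* Cosine similarity between two vectors: the backbone of
       embedding search, cross-lingual or otherwise. */
    static double cosine(const double a[DIM], const double b[DIM])
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < DIM; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (sqrt(na) * sqrt(nb));
    }

    int main(void)
    {
        const double query[DIM]   = {0.9, 0.1, 0.2}; /* the query text */
        const double docs[2][DIM] = {
            {0.8, 0.2, 0.3},  /* a relevant document, any language */
            {0.1, 0.9, 0.1},  /* an unrelated document             */
        };
        for (int d = 0; d < 2; d++)
            printf("doc %d: similarity %.3f\n", d, cosine(query, docs[d]));
        return 0;
    }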
> Take Myanmar as an example: even if China occupied it […]
Historically, however, the record is rather unflattering for China in its engagements with Myanmar (formerly Burma) – China has waged four wars[0] against Myanmar and was defeated in each instance.
[0] Or one war with four invasions – depending on the point of view.
so i guess the Myanmar people shouldn't blame china now... they should do something like the Vietnamese people: we fought the chinese and we always won, let's be proud of it.
> The only reason why they have chosen "*" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "++p" and "*p++" to have the desired order of evaluation.
There has been no shortage of speculation, much of it needlessly elaborate. The reality, however, appears far simpler – the prefix pointer notation had already been present in B and its predecessor, BCPL[0]. It was not invented anew, merely borrowed – or, more accurately, inherited.
The common lore often attributes this syntactic feature to the influence of the PDP-11 ISA. That claim, whilst not entirely baseless, is at best a partial truth. The PDP-11 did support post-increment and pre-decrement indirect address manipulation – but notably lacked their symmetrical complements: the pre-increment and post-decrement addressing modes[1]. In other words, it exhibited an asymmetry – a gap that undermines the argument for direct PDP-11 ISA inheritance.
[1] The PDP-11 ISA allocates 3 bits for the addressing mode: register / Rn, register deferred / (Rn), auto post-increment / (Rn)+, auto post-increment deferred / @(Rn)+, auto pre-decrement / -(Rn), auto pre-decrement deferred / @-(Rn), index / idx(Rn), and index deferred / @idx(Rn). Whether it was actually «let's choose these eight modes» or «we also wanted pre-increment and post-decrement but ran out of bits» is a matter of historical debate.
The prefix "*" and the increment/decrement operators have been indeed introduced in the B language (in 1969, before the launch of PDP-11 in 1970, but earlier computers had some autoincrement/autodecrement facilities, though not as complete as in the B language), where "*" has been made prefix for the reason that I have already explained.
The prefix "*" WAS NOT inherited from BCPL, it was purely a B invention due to Ken Thompson.
In BCPL, "*" was actually a postfix operator that was used for array indexing. It was not the operator for indirection.
In CPL, the predecessor of BCPL, there was no indirection operator, because indirection through a pointer was implicit, based on the type of the variable. Instead of an indirection operator, there were different kinds of assignment operators, enabling assignment of a value to the pointer itself instead of to the variable pointed at by the pointer, which was the default meaning.
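Roughly, in C terms (the syntax and names below are C's, purely for illustration – CPL itself expressed the distinction through the kind of assignment, not through an operator):

    #include <stdio.h>

    int main(void)
    {
        int a = 1, b = 2;
        int *p = &a, *q = &b;

        p  = q;     /* assign to the pointer itself                 */
        *p = *q;    /* assign to the variable pointed at - in CPL   */
                    /* the default reading, selected by the kind of */
                    /* assignment rather than by an operator        */

        printf("a=%d b=%d\n", a, b); /* a=1 b=2: only the alias moved */
        return 0;
    }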
BCPL made many changes to the syntax of CPL, mainly out of the necessity of adapting the language to the impoverished character set available on American computers, which lacked many of the characters that had been available in Europe before IBM and a few other US vendors succeeded in displacing the local vendors, thus also imposing the EBCDIC and later the ASCII character sets.
Several of the changes made between BCPL and B had the same kind of reason, i.e. they were needed to transition the language from an older character set to the then-new ASCII character set. For instance, the use of braces as block delimiters was prompted by their addition to ASCII, as they had not been available in the previous character set.
The link that you have provided to a manual of the B language is not useful for historical discussions, as that manual describes a modernised version of B, which contains some features back-ported from C.
There is a manual of the B language dated 1972-01-07, which predates the C language and can be found on the Web. Even that version might already have included some changes from the original B language of 1969.
* was the usual infix multiplication operator in BCPL, and it was not used for pointer arithmetic.
The BCPL manual[0] explains the «monadic !» operator (section 2.11.3) as:
    2.11.3 MONADIC !

    The value of a monadic ! expression is the value of the storage
    cell whose address is the operand of the !. Thus @!E = !@E = E
    (providing E is an expression of the class described in 2.11.2).

    Examples.

      !X := Y   Stores the value of Y into the storage cell whose
                address is the value of X.

      P := !P   Stores the value of the cell whose address is the
                value of P, as the new value of P.
The array indexing used the «V ! idx» syntax (section 2.13, «Vector application»).
So, the ! was a prefix operator for pointers, and it was an infix operator for array indexing.
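For readers more fluent in C, a loose modern analogue of the two uses (variable names invented for the illustration; BCPL's untyped word cells make the correspondence approximate):

    #include <stdio.h>

    int main(void)
    {
        int y = 7, cell = 0, v[4] = {10, 11, 12, 13};
        int *x = &cell;

        *x = y;        /* !X := Y   - store via the address held in X */
        int r = v[2];  /* R := V!2  - infix ! as array indexing       */

        printf("cell=%d r=%d\n", cell, r);  /* cell=7 r=12 */
        return 0;
    }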
In Richards' account of BCPL's evolution, he noted that on early hardware the exclamation mark was not easily available, and, therefore, he used the composite *( (i.e. a digraph):
«The star in *( was chosen because it was available … and it seemed appropriate for subscription since it was used as the indirection operator in the FAP assembly language on CTSS. Later, when the exclamation mark became available, *( was replaced by !( and exclamation mark became both a dyadic and monadic indirection operator».
So, in all likelihood, !X := Y became *(X := Y, eventually becoming *X = Y (in B and C) whilst retaining the exact and original semantics of the !.
The BCPL manual linked by you is not useful, as it describes a recent version of the language, which is irrelevant to the evolution of the B and C languages. A manual of BCPL from July 1967, predating B, can be found on the Web.
The use of the character "!" in BCPL is much later than the development of the B language from BCPL, in 1969.
The asterisk had three uses in BCPL: as the multiplication operator; as a marker for the opening bracket in array indexing, to compensate for the lack of distinct brackets for function application and array indexing; and as the escape character in character strings. For the last use, the asterisk was replaced by the backslash in C.
There was indeed a prefix indirection operator in BCPL, but it did not use any special character, because the available character set did not have any unused characters.
The BCPL parser was separate from the lexer, and it was possible for the end users to modify the lexer, in order to assign any locally available characters to the syntactic tokens.
So if a user had appropriate characters, they could have been assigned to indirection and address-of, but otherwise they were just written RV and LV, for right-hand-side value and left-hand-side value.
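In rough modern-C terms (an approximate sketch; the variable names are invented here), RV and LV correspond to «*» and «&» respectively:

    #include <stdio.h>

    int main(void)
    {
        int n = 5;
        int *p = &n;   /* p := LV n  - take the address of n       */
        int m = *p;    /* m := RV p  - fetch the word p addresses  */

        printf("m=%d\n", m);  /* m=5 */
        return 0;
    }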
It is not known whether Ken Thompson had modified the BCPL lexer for his PDP computer, to use some special characters for operators like RV and LV.
In any case, he could not have used asterisk for indirection, because that would have conflicted with its other uses.
The use of the asterisk for indirection in B became possible only after Ken Thompson had made many other changes and simplifications in comparison with BCPL, removing any parsing conflicts.
You are right that BCPL already had prefix operators for indirection and address-of, which is different from how this had been handled in CPL, but Martin Richards did not seem to have any particular reason for this choice, and in BCPL it was a less obvious mistake, because BCPL did not have structures.
On the other hand, Ken Thompson did want "*" as a prefix after introducing his increment and decrement operators, in order to need no parentheses for pre- and post-increment or -decrement of pointers, in a grammar where postfix operators were defined as having higher precedence than prefix ones.
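Concretely, with the precedence he chose (a minimal C illustration):

    #include <stdio.h>

    int main(void)
    {
        int a[] = {1, 2, 3};
        int *p = a;

        int x = *p++;   /* parsed as *(p++): fetch *p, then advance p */
        int y = *++p;   /* parsed as *(++p): advance p, then fetch    */
        (*p)++;         /* parentheses needed to increment the datum  */

        printf("x=%d y=%d a[2]=%d\n", x, y, a[2]);  /* x=1 y=3 a[2]=4 */
        return 0;
    }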
Also, in his case this was not yet an obvious mistake, because he had no structures, and the programs written in B at that time did not use any complex data structures that would need correspondingly complex addressing expressions.
Only years later did it become apparent that this was a bad choice, while the earlier choice of N. Wirth in Euler (January 1966; the first high-level language that handled pointers explicitly, with indirection and address-of operators) had been the right one. The high-level languages that had «references» before 1966 (the term «pointer» was introduced in IBM PL/I, in July 1966), e.g. CPL and FORTRAN IV, handled them only implicitly.
Decades later, complex data structures became common, while manually optimising array accesses by explicitly incrementing and decrementing pointers became a way of writing inefficient programs, as it prevents the compiler from optimising the array accesses correctly for the target CPU.
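To make the contrast concrete – a minimal C sketch of the two styles (function names are mine; the claim about which form compilers optimise better is the parent comment's):

    #include <stddef.h>

    /* Hand-rolled pointer walking, the B-era idiom. */
    void copy_bump(int *dst, const int *src, size_t n)
    {
        while (n--)
            *dst++ = *src++;
    }

    /* Plain indexed accesses, which state the access pattern
       directly and are easier for the compiler to analyse. */
    void copy_indexed(int *dst, const int *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }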
So the choice of Ken Thompson can be justified in its context from 1969, but in hindsight it has definitely been a very bad choice.
I take no issue with the acknowledgment of being on the losing side of a technical argument – provided evidence compels.
However, to be entirely candid, I have submitted two references and a direct quotation throughout the discourse in support of the position – each of which has been summarily dismissed with an appeal to some ostensibly «older, truer origin», presented without citation, without substantiation, and, most tellingly, without the rigour such a claim demands.
It is important to recall that during the formative years of programming language development, there were no formal standards, no governing design committees. Each compiled copy of a language – often passed around on a tape and locally altered, sometimes severely – became its own dialect, occasionally diverging to the point of incompatibility with its progenitor.
Therefore, may I ask that you provide specific and credible sources – ones that not only support your historical assertion, but also clarify the particular lineage, or flavour, of the language in question? Intellectual honesty demands no less – and rhetorical flourish is no substitute for evidence.
What you say is right, and it would have been less lazy of me to provide links to the documents that I have quoted.
On the other hand, I have provided all the information that is needed for anyone to find those documents through a Web search, in a few seconds.
I have the quoted documents, but it is not helpful to know from where they were downloaded a long time ago, because, unfortunately, the Internet URLs are not stable. So for links, I just have to search them again, like anyone else.
These documents can be found in many places.
For instance, searching "b language manual 1972" finds as the first link:
There exists an earlier internal report about Euler, from April 1965 at Stanford, before the publication of the language in CACM, where both indirection and address-of were prefix, as later in BCPL. However, before the publication in January 1966, indirection was changed to be a postfix operator, a choice that was retained in the later languages of Wirth.
C's «static» and «auto» also come from PL/I. Even though «auto» has remained essentially unused in C – it merely spells out the default storage class – it has found a new role in C++.
C also had a reserved keyword, «entry», which had never actually been used, and which eventually lost its keyword status when the standardisation of C began.
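To make the «auto» remark concrete – a small C sketch (in C the keyword is pure redundancy, which is partly why C++ was later free to repurpose it for type deduction):

    #include <stdio.h>

    static int counter(void)
    {
        auto int a = 1;        /* identical to plain: int a = 1;     */
        static int calls = 0;  /* retains its value across the calls */
        calls += a;
        return calls;
    }

    int main(void)
    {
        printf("%d\n", counter());  /* 1 */
        printf("%d\n", counter());  /* 2 */
        printf("%d\n", counter());  /* 3 */
        return 0;
    }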
> I smell AI writing assistance. Which is a shame […]
I have met multiple brilliant, very bright and talented people (mathematicians, physicists, doctors) who excel at what they know and do, yet struggle immensely to spell, to write, or both. There are also people who simply do not like to write (whatever the reason).
GenAI has been a great boon for such people, as it dissolves their struggle – they convey the idea to the machine (however awful the scribing is) and GenAI handles the grammar and style.
Granted, it is different from «hey, GenAI pet, write me a blog post on XYZ».
Whilst I think that C has its place, my personal choice for an Algol 26 or 27 would be CLU – a highly influential, yet little-known and underrated Algol-inspired language. CLU is also very approachable and pretty compact.