Seems they need to compare against "dumb" code completion. Even when they are error-free, "large" AI code completions are just boilerplate that should be abstracted away in some functions rather than inserted into your code base.
On a related note, maybe they should measure the number of code characters that can be REMOVED by AI rather than inserted!
> boilerplate that should be abstracted away in some functions rather than inserted into your code base
Boilerplate is often tedious to write and just as often easy to read. Abstraction puts more cognitive load on the developer and sometimes this is not worth the impact on legibility.
Totally agree - LLMs remove the tedium of writing boilerplate code, which is often a better practice than abstracted code. But it takes years of experience to know when that's the case.
Say I'm faced with a choice right now: repeat the same line twice with 2 minor differences, which the IDE checks immediately, or create a code generator that produces all 3 lines but may not work well with the IDE and the build system, leading to more mistakes?
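To make that trade-off concrete, here's a hypothetical sketch; the `make_handler` helper and the `handlers` dict are invented for illustration, not taken from the thread.

```python
# Hypothetical example of the duplication-vs-generation trade-off.
def make_handler(name, validate):
    return lambda payload: (name, validate, payload)

# Option 1: repeat the line with minor differences. Verbose, but every call is
# plain code the IDE can check, autocomplete, and jump to as you type it.
handlers = {
    "create": make_handler("create", validate=True),
    "update": make_handler("update", validate=True),
    "delete": make_handler("delete", validate=False),
}

# Option 2: "generate" the same entries. Shorter, but the indirection is exactly
# the kind of abstraction that may not be worth the hit to legibility and tooling.
handlers = {name: make_handler(name, validate=v)
            for name, v in [("create", True), ("update", True), ("delete", False)]}
```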
Agree with the first sentence. I used to tell people I never type anything more than periods and two or three letters because we already had good autocomplete.
Yes. Although I’d like to see a deeper investigation. Of course, the quality of completions has improved. But there could be a confounding phenomenon where newer folks might just be accepting a lot of suggestions without scrutiny.
I accept a lot of wrong suggestions because they look close enough that it's quicker for me to fix than it is to write the whole thing out from scratch. Which, IIUC, their metric captures:
> Continued increase of the fraction of code created with AI assistance via code completion, defined as the number of accepted characters from AI-based suggestions divided by the sum of manually typed characters and accepted characters from AI-based suggestions. Notably, characters from copy-pastes are not included in the denominator.
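For reference, the quoted definition boils down to a simple ratio. Here's a minimal sketch assuming per-session character counts; the field names (`typed_chars`, `accepted_ai_chars`) are made up for illustration, not Google's actual telemetry schema.

```python
# Minimal sketch of the metric as quoted above.
def ai_completion_fraction(events):
    typed = sum(e["typed_chars"] for e in events)
    accepted = sum(e["accepted_ai_chars"] for e in events)
    # Copy-pasted characters are deliberately left out of the denominator.
    denom = typed + accepted
    return accepted / denom if denom else 0.0

# Example: 300 accepted AI characters and 700 manually typed -> 0.3
print(ai_completion_fraction([{"typed_chars": 700, "accepted_ai_chars": 300}]))
```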
As someone who is generally pro-AI coding tools, I do the same. However, sometimes it winds up taking me so much effort to fix the mistakes that it would have been faster to do it myself. I often wonder whether I'm seeing a net positive gain over time; I *think* so, but it's hard to measure.
Question for any Googlers in the thread - do folks speak up if they see flaws in the methodology or approach of this research, or is the pressure from the top on this initiative so strong that people hush up?
There were already teams building ML-based code completion, code suggestions, and code repair before LLMs blew up a couple years ago. So the principle of it isn't driven by AI hype.
Yes, there are oodles of people complaining about AI overuse and there is a massive diversity of opinion about these tools being used for coding, testing, LSCs, etc. I've seen every opinion from "this is absolute garbage" to "this is utter magic" and everything in between. I personally think that the AI suggestions in code review are pretty uniformly awful and a lot of people disable that feature. The team that owns the feature tracks metrics on disabling rates. I also have found the AI code completion while actually writing code to be pretty good.
I also think that the "% of characters written by AI" is a pretty bad metric to chase (and I'm stunned it is so high). Plenty of people, including fairly senior people, have expressed concern with this metric. I also know that relevant teams are tracking other stuff like rollback rates to establish metrics around quality.
There is definitely pressure to use AI as much as reasonably possible, and I think that at the VP and SVP level it is getting unreasonable, but at the director level and below I've found that people are largely reasonable about where to deploy AI, where to experiment with AI, and where to tell AI to fuck off.
The code completion is quite smart, and one of the bigger advantages Google has now is the monorepo and the know-how to put together a pipeline of continuous tuning to keep the models up to date.
The pressure, such as it is, is killing funding for the custom extension for IntelliJ that made it possible to use it with the internal repo.
Cider doesn't have the code manipulation featureset that IntelliJ has, but it's making up for that with deeper AI integration.
FWIW, AI code completion initiatives at Google well predate the current hype (i.e. the overall research path, etc. was started a long time ago, though obviously modified as capability has changed).
So this particular thing was a very well-established program, path, and methodology by the time the AI hype came.
I won't express an opinion on whether that is good or bad, but it might mean you get a different answer to your question for this particular thing.