No amount of fine-tuning can guarantee that a model won't do something. All it can do is reduce the likelihood of exploits, while also increasing the surprise factor when they inevitably happen. This is a fundamental limitation.
They're different. Most programs can in principle be proven "correct" -- that is, given some spec describing how the program is allowed to behave, you can either prove that the program will conform to the spec every time it is run, or produce a counterexample.
(In practice, it's extremely difficult both (a) to write a usefully precise and correct spec for a useful-size program, and (b) to check that the program conforms to it. But small, partial specs like "The program always terminates instead of running forever" can often be checked nowadays on many realistic-size programs.)
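To make "spec vs. program" concrete, here's a toy sketch in Python (not a real verifier; the function names are made up): over a small finite input domain, exhaustive checking either confirms conformance or produces a counterexample, whereas real tools do this symbolically rather than by enumeration.

```python
# Toy sketch of "prove conformance or produce a counterexample".
# Over a small finite domain, exhaustive checking plays the role of a proof;
# real verifiers reason symbolically instead of enumerating inputs.

def my_max(x: int, y: int) -> int:
    # Program under test (change '>' to '<' to see a counterexample appear).
    return x if x > y else y

def spec(x: int, y: int) -> bool:
    # Spec: the result is one of the inputs and is >= both of them.
    r = my_max(x, y)
    return r in (x, y) and r >= x and r >= y

def check(domain=range(-20, 21)):
    for x in domain:
        for y in domain:
            if not spec(x, y):
                return (x, y)   # counterexample
    return None                 # conforms on this whole domain

print(check())  # None -> conforms on this domain
```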
I don't know any way to make a similar guarantee about what comes out of an LLM as a function of its input (other than in trivial ways, by restricting its sample space -- e.g., you can make an LLM always use words of 4 letters or less simply by filtering out all the other words). That doesn't mean nobody knows -- but anybody who does know could make a trillion dollars quite quickly, though only if they ship before someone else figures it out, so if someone did know, we'd probably be looking at it already.
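A minimal sketch of the trivial kind of restriction I mean, assuming a hypothetical `model.next_token_distribution(context)` API that returns token probabilities; the guarantee holds purely because disallowed words are filtered out before sampling, not because of anything the model learned.

```python
import random

def sample_short_words_only(model, context, max_len=4):
    # Hypothetical API: returns {token_string: probability} for the next token.
    dist = model.next_token_distribution(context)
    # Keep only alphabetic words of max_len letters or fewer.
    allowed = {t: p for t, p in dist.items()
               if t.strip().isalpha() and len(t.strip()) <= max_len}
    if not allowed:
        return None  # nothing permissible to emit at this step
    tokens, weights = zip(*allowed.items())
    return random.choices(tokens, weights=weights)[0]
```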
AI can't distinguish between user prompts and malicious data. Until that fundamental issue is fixed, no amount of mysql_real_secure_prompt will get you anywhere; we had that exact issue with SQL injection attacks ages ago.
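For anyone who wasn't around for that era: the SQL fix worked because the query (code) and the value (data) travel on separate channels, and prompts have no equivalent of the parameterized query yet. A quick Python/sqlite3 illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

user_input = "x'); DROP TABLE users; --"

# Vulnerable pattern: user data spliced into the command channel.
#   conn.executescript(f"INSERT INTO users VALUES ('{user_input}')")

# Safe pattern: the driver keeps the query and the value strictly separated.
conn.execute("INSERT INTO users VALUES (?)", (user_input,))
```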
I have a background in program analysis, but I'm less familiar with the kind of kernels you are optimising.
- Can you give some more insight on why 12 ops suffice for representing your input program?
- With such a small number of ops, isn't your search space full of repeated patterns? I understand the desire to have no predefined heuristics, but it seems that learning some heuristics/patterns would massively help reduce the space.
we're just optimizing linear algebra, which is mostly made up of patterns of simple ops. for instance, matmul is just broadcasted multiply -> sum reduce.
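for example (numpy just for illustration, not this project's actual op set), matmul really is those two primitives composed:

```python
import numpy as np

a = np.random.rand(3, 4)
b = np.random.rand(4, 5)

# broadcasted multiply: (3, 4, 1) * (1, 4, 5) -> (3, 4, 5)
prod = a[:, :, None] * b[None, :, :]
# sum reduce over the shared axis -> (3, 5)
mm = prod.sum(axis=1)

assert np.allclose(mm, a @ b)
```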
the search does common subexpression elimination by default. if two patterns are unioned in the search space, that union is applied to every occurrence of the pattern at the same time, so using e-graphs helps keep the search space smaller.
ah i see the confusion. we do common subexpression elimination on the terms in the search space (which lets a single rewrite application cover many repeated patterns), but the search can choose to re-use patterns of terms when we extract dags after the search space is built. so various levels of recomputation are searched.
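a toy sketch of the sharing idea (plain python hash-consing, not the real e-graph machinery): structurally identical terms become one shared node, so a union recorded against that node reaches every occurrence of the pattern at once.

```python
interned = {}

def term(op, *args):
    # hash-consing: structurally identical terms share one object
    key = (op,) + args
    return interned.setdefault(key, key)

e1a = term("mul", term("load", "x"), term("const", 2))
e1b = term("mul", term("load", "x"), term("const", 2))
print(e1a is e1b)  # True: one node, however many times the pattern occurs

unions = {e1a: term("shl", term("load", "x"), term("const", 1))}
print(unions[e1b])  # the same union is visible from every occurrence
```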
right now, since we're profiling kernels and we have a reference output from the unoptimised version, we can directly measure the deviation of profiled outputs "for free", since we're already computing them for runtime. tbh this isn't what i want long term; i want to bake numerical stability natively into the search space so we only extract dags that would produce stable outputs. hopefully that'll be solved soon.
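a sketch of what that "deviation for free" check looks like (numpy just for illustration; the tolerance here is made up):

```python
import numpy as np

def max_relative_error(candidate, reference, eps=1e-12):
    # compare the profiled output of an optimised dag against the
    # unoptimised reference output we already have
    return float(np.max(np.abs(candidate - reference) / (np.abs(reference) + eps)))

def acceptable(candidate, reference, tol=1e-4):
    return max_relative_error(candidate, reference) <= tol
```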
This heavily depends on share classes and preferences. Surely the new investor wants better terms. The issue isn't so much dilution as the preferences: the added risk of never even getting a payout at all.
Feel free, my email is my HN username at gmail. As I mentioned in a sibling comment, I'm not in tech anymore (except for fun) but I'm happy to provide any insights or advice to any questions you may have. I last worked with ERP systems about 10 years ago, but the thing about these systems is that they don't really change, so hopefully my knowledge is still helpful!
As a quick summary, this is mostly all ETL (Extract, Transform, Load):

1. Learn the lowest practical-level language you can for each system so you can extract / query exactly the data you need (in SAP it's mostly SQL, but there are a bunch of proprietary "languages" and extensions that can help; in MS it's mostly SQL; in Infor it's worth learning some RPG/CL, but only enough to get the data you need).

2. Learn the notification paradigms each system offers (this can be as simple as scheduling jobs, it can be file-system watchers, it can be RPC-based, etc. Each system is different, but you need to know when to do stuff with the data, both on change from external influences and on completion of whatever job you ran).

3. Utilise a modern language for the "Transform" and "Load" phases of the job. I used to use everything from C, C#, Perl, Python, etc. These languages are much easier and much more powerful than something like RPG or pure SQL for a non-maestro. There are people who use complex stored procedures for everything, and that's cool, but it's also a steeper barrier to entry than using a modern, "batteries-included" language.
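To make the shape of this concrete, here's a minimal sketch in Python; the connection, table, and column names are hypothetical placeholders, and sqlite3 just stands in for whatever driver the actual ERP needs.

```python
import sqlite3

def extract(conn: sqlite3.Connection):
    # Extract: query exactly the rows you need from the source system.
    return conn.execute(
        "SELECT order_id, qty, unit_price FROM open_orders WHERE status = 'NEW'"
    ).fetchall()

def transform(rows):
    # Transform: do the business logic in a modern language, not in the ERP.
    return [(order_id, qty * unit_price) for order_id, qty, unit_price in rows]

def load(conn: sqlite3.Connection, rows):
    # Load: push the result into the target system (here just another table).
    conn.executemany("INSERT INTO order_totals (order_id, total) VALUES (?, ?)", rows)
    conn.commit()
```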
The reason I initially taught myself Go (13 years ago now) was the incredibly powerful and simple data manipulation packages built into its standard library. It's a damn good language for doing high-speed, low-latency transformation of lots of data, while being simple and easy to code and deploy. Having said this, the aforementioned C, C#, Perl and Python were all working fine before I switched to Go! Use what you know, and what's available in the system you're working on.
If you have any questions feel free to send me an email.
True, but I'm not actually in tech anymore. I still have a consultancy business, but I make money doing geo-technical rope-access work now. A bit of a sea-change.
I'm happy to impart my thoughts on this topic for free, but I'm not sure how useful they would be after almost 10 years!
You will always find the most efficient farm machinery to be the least human-like in its design principles. The more it looks like something out of Mad Max the better.
Unless we come up with a machine like the combine harvester for blackberries, no one is going to be interested.
There are several kinds of blueberry picking machines. There are air-blast pickers that blow the berries off the plant. There are ones with wheels of vibrating sticks. There are ones that get a comb around the plant and pull.
Some berries get damaged, yes. Some leaves and twigs get through. They're separated out by a very fast vision-based sorting machine before packing.[1] That's been standard technology for a decade or so.
Apple picking is still in the R&D stage.[2] Cost needs to come down to $0.02 per pick.
It's great to see startups in this area, but the thing has to work. There are too many failed ag robotics startups.[3] Ask "could you pressure-wash this thing"? If there are wires, electronics, and bearings exposed, it's still experimental.
Yes, power washing would be wanted. That's an IP69K rating, not too hard to hit with some basic mechanical protections.
Unless you need delicate sensors that require direct contact with samples to work.
Maybe it's not a complete necessity, but generally it's gonna be mixed in with big farm equipment that is power washed. The more you have to "coddle" the equipment the less cost effective it'll be for farmers.
Farm workers generally know how to wash themselves. Still, I'd wager good money that farm hands have used power washers on each other. Probably works well to clean off work coveralls!
Strictly it needn't be if it offers an even better solution, but, realistically, what startup trying to introduce a new technology (that isn't cleaning technology) has time to also develop a novel way to clean things? It is such an unlikely scenario that it isn't worth considering.
> You will always find the most efficient farm machinery to be the least human-like in its design principles [...] the combine harvester
Oh? I find my human-based process for separating grain to be of the very same principles as the combine. The specific mechanics aren't exactly the same. For example a combine has a fan, while I have lungs. But the principle — using airflow to aid in separation — is the same.
The sprayer is the only piece of equipment on my farm I can think of that employs a different principle to do the job as compared to how I would do the job by hand.
In most cases if you want to machine harvest you have to design your field around that. A vineyard, for example, that is designed to be machine harvested looks very different from one that is designed to be hand harvested. So if you want to machine harvest blackberries at scale you probably have to plant and manage your blackberry bushes in a specific way to allow for machine harvesting.
This is a classic example of university press releases; you learn to recognize the pattern. Someone whose skill set is PR gets a dumbed-down version of the science, and then converts that into a hype piece that ignores reality in favor of vague statements.
If you want the essence of this technique look at any university press release about fusion technology.
>Every time I see these headlines, the tech seems to be at least 10 years away from product.
There's no incentive for the capital class to massively invest in fruit-picking robotics when there are tens of millions of exploitable humans on the planet that you can use to do the same job for dirt cheap.
The economic balance needs to change for change to happen.
That's why the capital class is overinvesting in AI, because that can potentially replace the higher paid jobs where the labor has leverage and turn them into similarly exploitable workers.
The quote from the researcher is that one "could [hypothetically] design something that is better than the human hand for that one specific task," which gets turned into "some day this specific device could be better" in the prose, which becomes a suggestion that, hey, maybe it already is better in the headline. Everything published by a Uni PR department is a puff piece; frankly, I don't know why they're even allowed here.
This is very cool. I'm wondering if some of the templates and switch statements would be nicer if there was an intermediate representation and a compiler-like architecture.
I'm also curious about how this compares to something like Jax.
You are absolutely correct! I started working on a sort of compiler a while back but decided to get the basics down first. The templates and switches are not really the issue, but rather the going back and forth between C & Python. This is an experiment I did a few months ago: https://x.com/nirw4nna/status/1904114563672354822 -- as you can see, there is a ~20% perf gain just by generating a naive C++ kernel instead of calling 5 separate kernels in the case of softmax.
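For readers unfamiliar with the softmax example: computed step by step, each line below is conceptually its own kernel call and a full pass over the data, which is what a single generated C++ kernel avoids. (The exact five-way split is illustrative, not necessarily this project's actual kernels.)

```python
import numpy as np

def softmax_unfused(x):
    m = x.max(axis=-1, keepdims=True)   # 1. max reduce
    z = x - m                           # 2. subtract
    e = np.exp(z)                       # 3. exp
    s = e.sum(axis=-1, keepdims=True)   # 4. sum reduce
    return e / s                        # 5. divide

# A fused kernel does all five steps in one pass over each row,
# avoiding several extra trips through memory.
```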
I think this could perhaps be done in other ways that don't require interval arithmetic for autodiff, only that the gradient is computed conservatively -- in other words, carrying the numerical error from f into f'.
Two main aspects: 1. how to handle the data related to the target problem; 2. how to choose suitable charts to present that data.
#1. By leveraging the increasingly powerful coding capabilities of LLMs, we can appropriately process the raw data to obtain a dataset that closely aligns with our goals.
#2. We extended ECharts and made use of the rich set of chart types it already supports, along with the Univer SDK from the Univer team, to ultimately create the tables.
I'm building a tool to make ML on tabular data (forecasting, imputation, etc.) easier and more accessible. The goal is to go from zero to a basic working model in minutes, even if the initial model is not perfect, and then iteratively improve the model step by step, while continuously evaluating each step with metrics and comparisons to the previous model. So it's less ML foundation research and more about trying to package it in a user-friendly way with a nice workflow, but if that's interesting, feel free to reach out (email in profile).