Rabbit R1 AI box revealed to just be an Android app

AnActOfCreation@programming.dev · 2 months ago

MxM111@kbin.social · 2 months ago

The most convincing answer is the correct one. The correlation of AI answers with correct answers is fairly high; numerous tests show that. The models have also improved significantly (especially the paid versions) since their introduction just 2 years ago.
Of course, that does not mean they can be trusted as much as Wikipedia, but they are probably a better source than Facebook.

De_Narm@lemmy.world · 2 months ago

“Fairly high” is still useless (and doesn’t actually quantify anything; depending on context, both 1% and 99% could be ‘fairly high’). As long as these models just hallucinate things, I need to double-check, which is what I would have done without one of these things anyway.

AIhasUse@lemmy.world · 2 months ago

Hallucinations are largely dealt with if you use agents. It won’t be long until it gets packaged well enough that anyone can just use it. For now, it takes a little bit of effort to get a decent setup.

TrickDacy@lemmy.world · 2 months ago

1% correct is never “fairly high” wtf

Also if you want a computer that you don’t have to double check, you literally are expecting software to embody the concept of God. This is fucking stupid.

De_Narm@lemmy.world · edit-2 2 months ago

1% correct is never “fairly high” wtf

It’s all about context. If you asked a bunch of 4-year-olds questions about trigonometry, 1% of answers being correct would be fairly high. ‘Fairly high’ basically only means ‘as high as expected’ or ‘higher than expected’.

Also if you want a computer that you don’t have to double check, you literally are expecting software to embody the concept of God. This is fucking stupid.

Hence, it is useless. If I cannot expect it to be more or less always correct, I can skip using it and just look stuff up myself.

TrickDacy@lemmy.world · 2 months ago

Obviously the only contexts that would apply here are ones where you expect a correct answer. Why would we be evaluating software that claims to be helpful against a 4-year-old asked to do calculus? I have to question your ability to reason for insinuating this.

So confirmed. God or nothing. Why don’t you go back to quills? Computers cannot read your mind and write this message automatically, hence they are useless

De_Narm@lemmy.world · 2 months ago

Obviously the only contexts that would apply here are ones where you expect a correct answer.

That’s the whole point: I don’t expect correct answers. Neither from a 4-year-old nor from a probabilistic language model.

TrickDacy@lemmy.world · 2 months ago

And you don’t expect a correct answer because it isn’t correct 100% of the time. Some lemmings are basically just clones of Sheldon Cooper.

De_Narm@lemmy.world · 2 months ago

I don’t expect a correct answer because I used these models quite a lot last year. At least half the answers were hallucinated. And it’s still a common complaint about this product as well if you look at actual reviews (e.g., pretty sure Marques Brownlee mentions it).

FlorianSimon@sh.itjust.works · 2 months ago

Something seems to be going over your head: quality is not optional, and it’s good engineering practice to seek reliable methods of doing our work. As a mature software person, you look for tools that leave less room for failure and want to leave as little as possible for humans to fuck up, because you know humans are not reliable, despite being unavoidable. That’s the logic behind automated testing, Rust’s borrow checker, static typing…

If you’ve done code review, you know it’s not very efficient at catching bugs. It’s not efficient because you don’t pay as much attention to details when you’re not actually writing the code. With LLMs, you have to do code review to ensure you meet quality standards, because of the hallucinations, just like you’ve got to test your work before committing it.

I understand the actual software engineers who care about delivering working code and would rather write it themselves in order to be more confident in the quality of the output.

TrickDacy@lemmy.world · 2 months ago

Like most people, I have no interest in engaging in conversation with someone who gives me zero reason to.

Not that it’s any of your business, but quality matters to me more than anything else, which is why I like tools that help me deliver it.