• 0 Posts
  • 10 Comments
Joined 1 year ago
Cake day: June 16th, 2023


  • I am an LLM researcher at MIT, and hopefully this will help.

    As others have answered, LLMs have only learned the ability to autocomplete given some input, known as the prompt. Functionally, the model is strictly predicting the probability of the next word+ (more precisely, the next token), with some randomness injected so the output isn’t exactly the same every time for a given prompt.
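
    To make the "probability plus some randomness" idea concrete, here is a minimal sketch of one sampling step. The scores in `toy_logits` are made up for illustration, and `sample_next_token` and its `temperature` parameter are names I am assuming here, not any particular library's API.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token from a dict mapping token -> raw score (logit)."""
    # Divide scores by the temperature, then apply a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Draw one token at random, weighted by its probability.
    tokens = list(probs)
    choice = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return choice, probs

# Hypothetical scores for the word following "The cat sat on the".
toy_logits = {"mat": 2.0, "sofa": 1.0, "moon": -1.0}
token, probs = sample_next_token(toy_logits, temperature=0.8, rng=random.Random(0))
```

    A lower temperature concentrates probability on the top-scoring token, while a higher one spreads it out, which is one common way the randomness is tuned.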

    The probability of the next word comes from what was in the model’s training data, combined with a mathematical method called self-attention, which computes how strongly each previous word relates to every other previous word and to the newly predicted word. You can think of it as a computed relatedness factor.
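
    As a rough illustration of that relatedness factor, the sketch below computes scaled dot-product attention over a few toy vectors, where each token only looks at itself and earlier tokens. It is a simplification under my own assumptions: real models first apply learned query, key, and value projections and use many attention heads, all of which are omitted here.

```python
import math

def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(vectors):
    """vectors: one equal-length list of floats per token, in order."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for i, query in enumerate(vectors):
        # Score token i against itself and every earlier token.
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)  # the "relatedness" of token i to each visible token
        # The new representation is a weighted average of the visible vectors.
        mixed = [
            sum(w * vectors[j][d] for j, w in enumerate(weights))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(toy_vectors)
```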

    This relatedness factor is very computationally expensive: in standard self-attention the cost grows quadratically with the number of words, so models are limited in how many previous words can be used to compute relatedness. This limitation is called the Context Window. The recent breakthroughs in LLMs come from the use of very large context windows to learn the relationships of as many words as possible.
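
    For a sense of scale, standard self-attention compares every token in the window with every other token, so the number of comparisons is the square of the window length. The helper below is just arithmetic under that assumption (it ignores optimized attention variants), and `attention_pair_count` is a name I made up.

```python
def attention_pair_count(context_length):
    """Number of token-to-token comparisons in full self-attention."""
    return context_length * context_length

# Doubling the context window quadruples the number of comparisons.
for n in (1_000, 2_000, 4_000):
    print(f"{n} tokens: {attention_pair_count(n)} comparisons")
```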

    This process of predicting the next word is repeated iteratively until a special stop token is generated, which tells the model to stop generating more words. So, literally, the model builds entire responses one word at a time from left to right.
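
    The loop itself is simple enough to sketch. Below, a hypothetical lookup table (`TOY_MODEL`) stands in for the trained model and only looks at the last token, whereas a real LLM conditions on everything in its context window. The `<stop>` marker and the function names are also assumptions for illustration.

```python
STOP = "<stop>"

# Hypothetical stand-in for a trained model: maps the last token to the next one.
TOY_MODEL = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": STOP,
}

def predict_next(tokens):
    """Return the next token given everything generated so far."""
    return TOY_MODEL.get(tokens[-1], STOP)

def generate(prompt_tokens, max_new_tokens=20):
    """Append one token at a time until the stop token or the length cap."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == STOP:
            break  # the stop token ends generation and is not emitted
        tokens.append(next_token)
    return tokens

print(generate(["the"]))
```

    Note that each new token is fed back in as input for the next step, which is why earlier output constrains everything that follows.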

    Because all future words are predicated on the previously stated words in either the prompt or the subsequently generated words, it becomes impossible to apply even the most basic logical concepts unless all the required components are present in the prompt or have somehow serendipitously been stated by the model in its generated response.

    This is also why LLMs tend to work better when you ask them to work out all the steps of a problem instead of jumping to a conclusion, and why the best models tend to rely on extremely verbose answers to give you the simple piece of information you were looking for.

    From this fundamental understanding, hopefully you can now reason about the LLM’s limitations in factual understanding as well. For instance, if a given fact was never mentioned in the training data, or an answer simply doesn’t exist, the model will make it up, inferring the next most likely word to create a plausible-sounding statement. Essentially, the model has been faking language understanding so well that, even when it has no factual basis for an answer, it can easily trick an unwitting human into believing the answer to be correct.

    ---

    + More specifically, these words are tokens, which usually contain some smaller part of a word. For instance, ‘understand’ and ‘able’ would be represented as two tokens that, when put together, become the word ‘understandable’.
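
    A toy version of that splitting, assuming a tiny hand-written vocabulary and a greedy longest-match rule. Real tokenizers learn their vocabulary from data (for example with byte-pair encoding) and may split words differently, so treat `TOY_VOCAB` and `tokenize` as illustrative only.

```python
# Hypothetical vocabulary of known word pieces.
TOY_VOCAB = {"understand", "able", "un", "der", "stand"}

def tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest known pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                tokens.append(word[start:end])
                start = end
                break
        else:
            # No known piece matches: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens

print(tokenize("understandable"))
```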



  • I’m convinced that we should apply the same requirements to driving a car as we do to flying an airplane.

    As a pilot, there are several items I need to log at regular intervals to remain proficient so that I can continue to fly with passengers or fly under certain conditions. The biggest one is the Flight Review required every two years.

    If we did the bare minimum and implemented a Driving Review every two years, our roads would be a lot safer, and far fewer people would die. If people cared as much about driving deaths as they do about flying deaths, the world would be a much better place.







  • I don’t go around using that word because of how many people find it disrespectful. But, and I ask this out of honest curiosity, why is it offensive in the first place?

    I see it as synonymous with ‘idiot’ or ‘stupid’ when used colloquially. The argument that it’s a medical term doesn’t really hold, as ‘idiot’ and ‘moron’ are also medical terms that refer to a lack of intellectual acuity. In many ways ‘retarded’ has the same meaning both colloquially and medically. To be mentally retarded is to be mentally slowed, or to lack the same mental acuity whose absence ‘idiot’ or ‘moron’ conveys.

    Retarded just means slow, and it’s a perfectly apt description. Where I think people get confused is when retardation is linked with a specific attribute, like physical retardation or emotional retardation; those convey very different meanings.

    I’m not saying that we should start using it again, but I find it odd how society has latched onto a very specific word and labelled it as bad in the space of a decade. At the end of the day, any word that can be used to insult or demean is rude. It’s not the word being used, it’s what is meant by it. The term ‘cisgender’ is also being used in a highly exclusionary way and is oftentimes conveyed as an insult. However, its real meaning is not insulting in the least.