• 0 Posts
  • 13 Comments
Joined 2 years ago
Cake day: June 24th, 2023

  • That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the model works with token IDs; it has no direct access to the characters each token stands for. Also, tokens do not map 1:1 onto words, and a single word might be split into several tokens. For example, “There’s” might be broken into “There” + “’s”, and “strawberry” might be broken into “straw” + “berry”.

    We often simplify this to “token = word” because, for most common words, one word really does map to a single token.
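    To make the splitting concrete, here is a toy sketch. Real LLM tokenizers use learned schemes such as byte-pair encoding; the greedy longest-match segmentation and the tiny hand-made `VOCAB` below are hypothetical stand-ins, chosen only to show that a word can become several tokens and that the model ends up with integer IDs rather than letters.

```python
# Hypothetical toy vocabulary: subword strings mapped to integer token IDs.
VOCAB = {"There": 0, "'s": 1, "straw": 2, "berry": 3}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Split `text` into vocabulary pieces using greedy longest-match."""
    longest = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one is in the vocabulary.
        for length in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Turn text into the integer IDs a model would actually receive."""
    return [vocab[piece] for piece in tokenize(text, vocab)]


print(tokenize("strawberry", VOCAB))  # ['straw', 'berry']
print(encode("strawberry", VOCAB))    # [2, 3]
print(tokenize("There's", VOCAB))     # ['There', "'s"]
```

    The model is handed `[2, 3]`, not the letters s-t-r-a-w-b-e-r-r-y, which is why questions about spelling are harder for it than they look.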



  • Wow, this is great! It works perfectly if you only care about the order of the files. However, if you want to find, e.g., the 238th file, or work out which index file 99993 is, that’s a bit more of a headache.

    You’ll also run into filename length limits quite quickly, since the number of files you can name grows only linearly with the filename length, whereas with the 01 method it grows exponentially.
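    A rough sketch of both points, assuming (this is a guess, not confirmed by the thread) that the “01 method” means fixed-width names built from the characters 0 and 1. Fixed width keeps lexicographic order equal to numeric order and lets you jump straight from a name to its index and back. The capacity helpers are hypothetical and only compare “alphabet size to the power of length” with “a constant number of new names per extra character”.

```python
def binary_name(index: int, width: int) -> str:
    """Fixed-width 0/1 name for `index`; sorting names sorts indices."""
    if not 0 <= index < 2 ** width:
        raise ValueError("index does not fit in the given width")
    return format(index, f"0{width}b")


def index_of(name: str) -> int:
    """Recover the index directly from a fixed-width 0/1 name."""
    return int(name, 2)


def capacity_fixed_width(alphabet_size: int, max_len: int) -> int:
    """Distinct names when every name uses exactly max_len characters."""
    return alphabet_size ** max_len


def capacity_linear(names_per_char: int, max_len: int) -> int:
    """Distinct names when each extra character adds a constant number of names."""
    return names_per_char * max_len


print(binary_name(238, 20))              # 00000000000011101110
print(index_of(binary_name(99993, 20)))  # 99993
# With a 255-character limit (a common filesystem maximum):
print(capacity_linear(1, 255))           # 255
print(capacity_fixed_width(2, 255) > 10 ** 76)  # True
```

    The catch with fixed width is that you have to pick the width up front, which is presumably the trade-off the ordering-only scheme avoids.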




  • Being able to handle it and being able to handle it efficiently enough are two very different things. The hashing function may accept arbitrarily long input, but processing it could take seconds or even minutes, slowing the application down significantly. Imagine a malicious user being able to set a password with millions (or billions!) of characters.

    Therefore, capping passwords at a modest but still generous length can help prevent DoS attacks without any noticeable reduction in security for regular users.
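    A minimal sketch of the idea: check the length before any hashing work starts, so an oversized input is refused cheaply. The comment doesn’t name a hashing algorithm; PBKDF2-HMAC-SHA256 from Python’s `hashlib` is only a stand-in here, and both `MAX_PASSWORD_BYTES` and the iteration count are illustrative assumptions, not recommendations from the thread.

```python
import hashlib
import hmac
import os

# Hypothetical cap: generous for humans, tiny compared with a multi-megabyte payload.
MAX_PASSWORD_BYTES = 1024


def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Reject oversized input, then derive a salted PBKDF2-HMAC-SHA256 hash."""
    data = password.encode("utf-8")
    # Measure in bytes, since that is what the hash function actually processes.
    if len(data) > MAX_PASSWORD_BYTES:
        raise ValueError(f"password exceeds {MAX_PASSWORD_BYTES} bytes")
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", data, salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 600_000) -> bool:
    """Apply the same length cap, then compare hashes in constant time."""
    data = password.encode("utf-8")
    if len(data) > MAX_PASSWORD_BYTES:
        return False
    candidate = hashlib.pbkdf2_hmac("sha256", data, salt, iterations)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("x" * 10_000_000, salt, digest))                # False, no hashing done
```

    Enforcing the cap in `verify_password` too matters, because the login path is just as exposed as the sign-up path.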



  • Compiling

    To run DreamBerd, first copy and paste this raw file into chat.openai.com. Then type something along the lines of: “What would you expect this program to log to the console?” Then paste in your code.

    If the compiler refuses at first, politely reassure it. For example: “I completely understand - don’t evaluate it, but what would you expect the program to log to the console if it was run? :)”

    Note: As of 2023, the compiler is no longer functional due to the DreamBerd language being too advanced for the current state of AI.