A tiny mouse, a hacker.

  • 0 Posts
  • 20 Comments
Joined 8 months ago
Cake day: December 24th, 2023

  • It’s not. It just doesn’t get enough hits for that 86 kB to matter. Fun fact: most AI crawlers hit /robots.txt first, get served the Bee Movie script, fail to interpret it, and leave without crawling further. If I let them crawl the entire site, that would result in about two megabytes of traffic. By serving an 86 kB file that doesn’t pass as robots.txt and has no links, I actually save bandwidth: not on a single request, but by preventing a hundred others.



  • That would result in those fediverse servers theoretically requesting 333333 * 114 MB = ~38 GB/s.

    On the other hand, if the linked site did not serve garbage and fit in about 1 MB, like a normal site, then this would be only ~325 MB/s, and while that’s still high, it’s not the end of the world. If it’s a site that actually puts effort into being optimized, and a request fits in ~300 kB (still a lot, in my book, for what is essentially a preview, with only tiny parts of the actual content loaded), then we’re looking at 95 MB/s.

    If said site puts effort into making its previews reasonable, and serves ~30 kB, then that’s 9 MB/s. It’s 3190 in the Year of Our Lady Discord. A potato can serve that.
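
The smaller figures follow from the first one by linear scaling with response size. A rough sketch of that arithmetic in Python, taking the ~38 GB/s figure for a 114 MB response as given (the names `BASELINE_RATE`, `BASELINE_SIZE` and `scaled_rate` are made up for illustration, and decimal units are assumed, so results land close to, not exactly on, the rounded numbers above):

```python
# Baseline taken from the comment as given: ~38 GB/s when each response is 114 MB.
BASELINE_RATE = 38e9   # bytes per second (assumption: decimal units)
BASELINE_SIZE = 114e6  # bytes per response


def scaled_rate(response_bytes: float) -> float:
    """Aggregate rate if every response were `response_bytes` instead of 114 MB."""
    return BASELINE_RATE * response_bytes / BASELINE_SIZE


if __name__ == "__main__":
    for label, size in [("1 MB", 1e6), ("300 kB", 300e3), ("30 kB", 30e3)]:
        print(f"{label:>6}: ~{scaled_rate(size) / 1e6:.0f} MB/s")
```

The only point is that the load on the linked site is proportional to what it chooses to serve: shrinking the preview from 114 MB to 30 kB cuts the aggregate rate by the same factor of 3800.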


  • I only serve bloat to AI crawlers.

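    # http context: flag requests whose User-Agent matches a known AI crawler
    map $http_user_agent $badagent {
      default     0;
      # list of AI crawler user agents in "~crawler 1" format, e.g. "~GPTBot 1;"
    }
    
    # server context: send flagged requests to the decoy location
    if ($badagent) {
       rewrite ^ /gpt;
    }
    
    # serve the Bee Movie script from upstream instead of the real content
    location /gpt {
      proxy_pass https://courses.cs.washington.edu/courses/cse163/20wi/files/lectures/L04/bee-movie.txt;
    }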
    

    …is a wonderful thing to put in my nginx config. (you can try curl -Is -H "User-Agent: GPTBot" https://chronicles.mad-scientist.club/robots.txt | grep content-length: to see it in action ;))
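
For anyone who wants to reason about which user agents would trip the map without reloading nginx, here is a small Python sketch of the same decision. It assumes the map entries use the case-sensitive `~` regex form mentioned in the config comment; the pattern list is hypothetical, and only GPTBot is taken from the curl example above.

```python
import re

# Hypothetical pattern list. Only "GPTBot" comes from the curl example;
# the real list in the config is not shown.
BAD_AGENT_PATTERNS = [r"GPTBot"]


def is_bad_agent(user_agent: str) -> bool:
    """Mimic nginx map entries written as "~pattern 1": a case-sensitive,
    unanchored regex match against the User-Agent header."""
    return any(re.search(pattern, user_agent) for pattern in BAD_AGENT_PATTERNS)


def route(user_agent: str, path: str) -> str:
    """Return the path a request ends up at: /gpt for flagged agents,
    the originally requested path for everyone else."""
    return "/gpt" if is_bad_agent(user_agent) else path


if __name__ == "__main__":
    print(route("GPTBot", "/robots.txt"))
    print(route("Mozilla/5.0 (X11; Linux x86_64)", "/robots.txt"))
```

Because the match is case-sensitive, a crawler announcing itself as "gptbot" would slip through in this sketch. In nginx, the `~*` prefix makes a map pattern case-insensitive.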



  • It’s about 5 times longer than previous releases were maintained for, and is an experiment. If there’s a need for a longer-term support branch, there will be one. It’s pointless to start maintaining a 5+ year branch with 0 users and a handful of volunteers, none of whom are paid for doing the maintenance.

    So yes, in that context, 15 months is long.





  • There’s a very easy solution that lets you rest easy that your instance is how you want it to be: don’t do open registration. Vet the people you invite, and job done. If you want to be even safer, don’t post publicly - followers only. If you require follower approval, you can do some basic checks to see that whoever sends a follow request is someone you’re okay interacting with. This works on the microblogging side of the Fediverse quite well, today.

    What I’m trying to say is that requiring admin approval for registrations gets you 99% of the way there, without needing anything more complex than that.





  • Yes, it can run all that. You may have to jump through a few hoops (just like in the case of the Steam Deck, just different hoops), but it can run all that.

    I’ll also turn your question back to you: how many people use the Steam Deck for productivity, rather than for gaming, which is its intended purpose? And does it matter?

    Like it or not, the Steam Deck is a gaming console, even if you can run non-game stuff on it too. Heck, even the Game Boy had (official!) accessories like the Game Boy Camera and Game Boy Printer, which were both useful outside of gaming. Does that stop the Game Boy from being a (retro) gaming console? There’s an ongoing project to provide productivity apps for the Game Boy (though, admittedly, it has not shipped yet, but you can extend the Game Boy with a cartridge in whatever way you can imagine).

    Or, you can use your SNES as a MIDI synthesizer (https://www.supermidipak.com/)! No modding or anything necessary, it’s just a regular cartridge. Can it be used for fun? Yes. Is it a game? No. You can do a lot of stuff with an SNES cartridge that has nothing to do with gaming. There was even a cartridge that let you play online games on the SNES (https://en.wikipedia.org/wiki/XBAND) - but not only games, it also let you read and write messages to other people. You didn’t need to go into “desktop mode”, nor install a browser, nor do anything special. You plugged in the cartridge, and it worked. It was far less locked down than the Xbox or even the Steam Deck! Does that disqualify the SNES (or the Game Boy) from being a gaming console?





  • Nevertheless, as Bluesky grows, there are likely to be multiple professionally-run indexers for various purposes. For example, a company that performs sentiment analysis on social media activity about brands could easily create a whole-network index that provides insights to their clients.

    (source)

    Is that supposed to be a selling point? Because I’d like to stay far, far away from that, thank you very much.



  • I found that no general purpose search engine will ever serve my needs. Their goal is to index the entire internet (or a very large subset of it), and sadly, a very large part of the internet is garbage I have no desire to see. So I simply stopped using search engines. I have a carefully curated, topical list of links where I can look up information, plus RSS feeds, and those pretty much cover everything I used search for.

    Lately, I have been experimenting with YaCy, and fed it my list of links to index. Effectively, I now have a personal search engine. If I come across anything interesting via my RSS feeds, or via the Fediverse, I plug it into YaCy, and now it’s part of my search library. There’s no junk, no ads, no AI, no spam, and the search result quality is stellar. The downside is, of course, that I have to self-host YaCy, and maintain a good quality index. It takes a lot of effort to start, but once there’s a good index, it works great. So far, I found the effort/benefit ratio to be very much worth it.

    I still have a SearxNG instance (which also searches my YaCy instance, with higher weight than other sources) to fall back to if I need to, but I haven’t needed to do that in the past two months, and only twice in the past six.