Developer, 11 year reddit refugee

Zetaphor

  • 1 Post
  • 25 Comments
Joined 1 year ago
Cake day: March 12th, 2024










  • People should be attacking your idea, not their perception of you based on your choice in browser.

    My objection to Brave, Vivaldi, and any other browser that is just Chrome with a different coat of paint is that they are all signalling an acceptance of Google’s monopoly over the web standards ecosystem.

    Mozilla is a shit organization run by a shit CEO, but they’re the only alternative we have to the megalith that is the advertising company known as Google. It really shouldn’t be a hard argument to understand that putting an advertising company at the head of the web standards process is a really bad idea if you care about anything other than Google’s revenue streams, i.e. if you care about a free and open web.

    Chromium only exists as a way for Google to keep antitrust regulators from coming after them like they did to Microsoft when IE had a monopoly. It’s source-available, not open source, they don’t accept commits from non-Googlers. The moment they feel safe closing down the Chromium repos without having to lose too much money in fines or blowback, they absolutely will.

    We’re literally watching this happen right now with Android, another formally open source project from Google that is slowly having all of its open source components clawed back so that they can maintain their control over the ecosystem and protect the revenue stream that is their data collection and app store.

    When Google inevitably decides to pull the plug on Chromium, the collective of forked browser developers is not going to be able to keep up with the massive engineering effort required to keep a modern browser going. Especially when a corporation like Google can and will push forward complex, difficult-to-implement standards expressly for the purpose of making those forks obsolete. They have the manpower, capital, and control over massive web properties to effectively push out anyone they don’t want.

    All it takes is them making a change to YouTube that hinders alternative browsers, and that will be the death of that open source ecosystem. They’ve literally pulled this exact move before: pushing through the implementation of shadow DOM on YouTube hindered Firefox’s performance there.

    All of this has happened before and all of this will happen again. Trusting an advertising company with control of the open web is the nerd equivalent of “leopards ate my face”.







  • Quoting this comment from the HN thread:

    On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.

    While it strikes me as perfectly plausible that the Books2 dataset contains Silverman’s book, this quote from the complaint seems obviously false.

    First, even if the model never saw a single word of the book’s text during training, it could still learn to summarize it from reading other summaries which are publicly available. Such as the book’s Wikipedia page.

    Second, it’s not even clear to me that a model which only saw the text of a book, but not any descriptions or summaries of it, during training would even be particularly good at producing a summary.

    We can test this by asking for a summary of a book which is available through Project Gutenberg (which the complaint asserts is Books1 and therefore part of ChatGPT’s training data) but for which there is little discussion online. If the source of the ability to summarize is having the book itself during training, the model should be equally able to summarize the rare book as it is Silverman’s book.

    I chose “The Ruby of Kishmoor” at random. It was added to PG in 2003. ChatGPT with GPT-3.5 hallucinates a summary that doesn’t even identify the correct main characters. The GPT-4 model refuses to even try, saying it doesn’t know anything about the story and it isn’t part of its training data.

    If ChatGPT’s ability to summarize Silverman’s book comes from the book itself being part of the training data, why can it not do the same for other books?

    As the commenter points out, I could recreate this result using a smaller offline model and an excerpt from the Wikipedia page for the book.
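
    For anyone who wants to repeat that comparison, here is a minimal sketch of how it might be scripted. Everything in it is an assumption for illustration: `ask_model` is a hypothetical callable standing in for whatever offline model you run, `build_prompt`, `mentions_all`, and `compare_summaries` are made-up helper names, and the "does the summary mention the expected names" check is only a crude proxy for summary quality, not the method used in the HN thread.

```python
from typing import Callable, Dict, List, Optional


def build_prompt(title: str, excerpt: Optional[str] = None) -> str:
    """Build a summary request, optionally grounded in a reference excerpt."""
    if excerpt is None:
        return f"Summarize the book '{title}'."
    return (
        f"Using only the reference text below, summarize the book '{title}'.\n\n"
        f"Reference text:\n{excerpt}"
    )


def mentions_all(summary: str, expected_terms: List[str]) -> bool:
    """Crude check: does the summary mention every expected name or term?"""
    lowered = summary.lower()
    return all(term.lower() in lowered for term in expected_terms)


def compare_summaries(
    ask_model: Callable[[str], str],  # hypothetical: wraps your own model
    title: str,
    excerpt: str,
    expected_terms: List[str],
) -> Dict[str, bool]:
    """Ask for a summary with and without the excerpt and score both answers."""
    unaided = ask_model(build_prompt(title))
    grounded = ask_model(build_prompt(title, excerpt))
    return {
        "unaided_ok": mentions_all(unaided, expected_terms),
        "grounded_ok": mentions_all(grounded, expected_terms),
    }
```

    If the model only gets the book right when the excerpt is in the prompt, that is consistent with the point above: a usable summary can come from a secondary description such as the Wikipedia page, without the book's text being in the training data.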