Carl T. Holscher fights for the customers.


Is a part of you powering AI?

The internet is user-generated content. We’re all making things and contributing, whether publicly or “privately,” with our names or without.
“AI” is years of our data being scraped, packaged and sold back to us.

ChatGPT and the other Large Language Models (LLMs) are little more than someone who is incredibly well-read with perfect recall. They’ve taken the internet’s data, packaged it and put a neat little plain-language front-end on it for us to interact with.

The reason ChatGPT can write such good fanfiction is that it scraped 32 billion words from AO3. And that was in 2019, so there’s likely even more fanfic in the large language model today. If you look at this and say, “But it’s only fanfiction, who cares?”, would it be acceptable to other writers?

It’s abhorrent that a program which purports to support a community of writers has based at least 32 billion words of its program on the writing of a community that did not consent to have their work used.

Writing fic is not stealing, but taking fic and using it to develop a dataset, and then offering that dataset to the public without having gotten permission from literally anyone is ethically gross.

How Bots Like ChatGPT Have Stolen Fanfiction, and What It Means

What if your entire history of writing that you had publicly posted to the Internet was scooped up and used without your permission for another company to make money from?

Well, that is likely the case, as Kevin Schaul, Szu Yu Chen and Nitasha Tiku have researched and reported for The Washington Post.

To look inside this black box, we analyzed Google’s C4 data set, a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs, called large language models, including Google’s T5 and Facebook’s LLaMA. (OpenAI does not disclose what datasets it uses to train the models backing its popular chatbot, ChatGPT.)

See the websites that make AI bots like ChatGPT sound so smart

What about social media?

Social networks like Facebook and Twitter — the heart of the modern web — prohibit scraping, which means most data sets used to train AI cannot access them. Tech giants like Facebook and Google that are sitting on mammoth troves of conversational data have not been clear about how personal user information may be used to train AI models that are used internally or sold as products.

So while your social media posts may not be in ChatGPT, they’re certainly going to be included in Meta/Facebook’s own products. And given Facebook’s long history of scooping up any and all data, that use is likely far more extensive.

What if you have ever written a blog powered by WordPress, Tumblr, Blogspot or LiveJournal? Then you’re included too.

My own site is included in the data set at rank 1,953,276.

If you write on the web, you’re likely there too. You can search through the data by scrolling to the bottom of The Washington Post’s article: See the websites that make AI bots like ChatGPT sound so smart.

As with any good story about data, there’s a section at the end describing how the Post obtained this data and the 15.1 million unique domains it includes.

How do you feel about your writing being included in these gigantic data sets and being used to build products?

Blog when you disagree

When you disagree, that’s what you should write about, and you should post it to your blog. 140 characters thrown against wave after wave of mainstream opinion tweets will be drowned out. A blog post isn’t a cheap opinion; it’s a statement that what you think matters.

From Manton Reece’s post “Blog when you disagree”

On Self-Hosting

I predict 2014 is the year when we see more popular services go away, either because they’re unsustainable businesses or because they’re bought up and immediately integrated into larger companies. Either way, they go away and all we’re left with is a message saying how much we, the customers, mean to them.

Because of this I’ve started to bring some services in-house and run them on my server. The following tools are what I’ve chosen to use.

Author’s Note: I am not saying they are the best thing out there. Nor am I saying they are perfect for you. They’re just what I use. I use them. I like them. You may not.

Blogs

Despite the pile of services that will host your text and images, I still prefer to host my own blog. Congratulations! You’re here.

Tech in the Trenches is hosted on a WordPress installation I run off a Dreamhost shared server. I’m not fancy.

RSS

With the recent demise of Google Reader and Feedly’s questionable decisions, I’ve decided to host my own RSS reader. Sure, there are plenty of good ones out there. But it’s not something I care enough about to pay for.

If I didn’t have RSS, life would go on. I would go back to keeping folders of links just as I did before RSS. I would also use the various social media networks to let the good stuff bubble up from the muck of the Internet.

To that end, I found TinyTinyRSS and decided to install it.

It’s small, flexible and has a plugin community around it. But the reason I found it is that Dreamhost blogged about installing it. What’s easier than that?

So a few minutes after reading the post, I had TT-RSS set up.

Once it’s running, I would recommend finding a different theme, as I don’t care for the default. I’m using the Feedly theme out of habit. There’s also a Google Reader-style theme if you want to relive the glory.

While there is a native Android client, it didn’t help me on iOS.

To get it working on my iPhone, I am using the Fever plugin. This allows TT-RSS to authenticate as if it were Shaun Inman’s Fever. It works with Reeder, which I use. According to the developer, it also supports Mr. Reader and ReadKit.

To make this work, you have to enable API access in your TT-RSS account preferences (Preferences -> Enable external API) before using the client. I missed this step and couldn’t figure out why it wouldn’t work.

Analytics

I don’t keep a close eye on my analytics. But every now and again, when a post gets more than a few hits, I’m curious where the traffic comes from.

Piwik works well for me. It gives me what Google Analytics provides without the threat of it going away.

Photos

This is almost constantly in flux. For years I used Gallery. It was stable and robust. But then it grew bloated. I prefer smaller tools and went looking for an alternative.

I decided on Piwigo. It feels lighter to me. I don’t need a complex set of tools. I want a place to make albums and show them off. That’s it. It’s simple and it works for me. If you’re a Dreamhost user, both of these are available as one-click installs.

I’ve also been flirting with TroveBox (formerly OpenPhoto). They have a hosted option that will use your own storage but also charges a monthly fee.

They provide downloads and documentation to get the software set up on a variety of servers.

GIFs

Yes, I keep some animated GIFs at my disposal. To do this, I use Eat My GIF. It’s a ridiculously simple drop-in installation, and now I have a place to throw GIFs to deploy as needed. Yes, I realize this is very silly. But I like it and it’s developed by a friend.

So don’t hate.

What I’m not hosting

Email. I have no desire to run my own mail server. I use Gmail and am perfectly happy with it for now.

Social Media. I see the value of a distributed social media network. However, I am happy with Twitter/App.net/Facebook. I don’t need anything else.

I tried out Tent for a short time in the form of Tent.is (which now appears to be Cupcake.io), but I’m not enough of an ubernerd to hack it.

I had ownCloud running for a while, but I found I didn’t really use it. Dropbox is still fine for me. ownCloud is on my radar and I may use it again for something. But I just don’t have a need for it.

It’s easy to get carried away and start hosting things I don’t need to host. Every one of them is more work for me to support, keep updated and keep working. Sometimes the trade-off favors letting someone else do the heavy lifting.

Just because I can do it, doesn’t mean I should.

Are you all self-hosting anything interesting? Tell me about it over on Twitter or ADN.

Not everyone is a nerd like me

I wrestled with starting Tech in the Trenches. I’ve kept blogs as an HTML file (back in the days of Xoom.com), Xanga, LiveJournal, Blogger, Tumblr, TypePad, Textpattern and currently WordPress. I’ve got a pile of notebooks from school where I used to write obsessively. Song lyrics, (terrible) teenage poetry, musings and thoughts on whatever popped into my head. I wrote often and still write daily.

I hesitated starting this blog because I never had a single topic in mind where I could focus my writing. What I wanted to focus on was technology and recommendations. But I felt like everything I wanted to say had already been said by others.

I would say to myself, “What is the point in writing about this app or this website? Shawn Blanc or John Gruber have already covered it better than I ever could.”

In addition to being an obsessive writer, I am also a compulsive reader. I digest tech news and writing through hundreds of RSS feeds, blogs, podcasts and articles I come across every day. Johnny 5 and I share the cry: “Need more input!”

Because of my massive consumption of tech news and information, I tend to assume that if I know about a cool website or application, those around me do as well. However, this is clearly not the case. Not everyone reads like a man obsessed.

That is partially what pushed me over the edge into starting this blog. I wanted to share the interesting things I found and knew about. I wanted to help other people find interesting things to make their lives and work better.

I want to share what I find with friends, family and the strangers who somehow stumble across this space from Google searches or Tumblr links.

I finally realized I don’t need to be the first person ever to talk about something. Nor do I need to be the best person to talk about it. I need to write what I want to write and not let the self-doubt stop me.
