I tried to save $1,200 by vibe coding for free – and quickly regretted it

Follow ZDNET: Add us as a preferred source on Google.
ZDNET's key takeaways
- Free local AI is promising, but wasted time costs more than subscriptions.
- Random, unexplained edits made the code worse each iteration.
- Without screenshots, fixing Xcode errors became a slog.
Well, that's a bummer. After using the free and local (as in on my own computer) combination of Goose, Ollama, and Qwen3-coder to build a simple WordPress plugin, I had high hopes that I might be able to give up my expensive Claude Code subscription and use a free alternative. To be fair, back when I was working on the test plugin, it took Goose five tries to get it right (more than any other AI), but it got there eventually.
Also: I tried a Claude Code rival that's local, open source, and completely free - how it went
Paying OpenAI or Anthropic a few hundred bucks a month to get their cloud AIs to write code for me is a fairly big expense. So I've been exploring the combination of Goose, Ollama, and Qwen3-coder to see if, together, it might replace my Claude Code subscription.
Nope. Nopity-nope-nope.
The big frontier AI models ("frontier" means their investors want billion dollar valuations) use benchmarks like SWE-Bench Pro and GDPval-AA to support their claims that their offerings are the best ever. These benchmarks are certainly a valid approach to testing.
Also: I built an iOS app in just two days with just my voice - and it was electrifying
But I prefer a hands-on approach, so I always apply my DPQ benchmark as a top-tier test. What is DPQ, you ask? It's the David Patience Quotient benchmark, and it works this way. If, after spending a few days using a model or AI solution, I reach the "frak this" stage, then the model has failed the DPQ.
In previous months, both Claude Code and OpenAI Codex have passed the DPQ. Goose, combined with Ollama and Qwen3-coder, failed the DPQ miserably when faced with a larger scale project.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
The assignment
If you've been following my articles, you know I built a filament inventory management app using Claude Code. It uses NFC tags to help me track the spools of filament I'm using, and which machine each spool is currently assigned to.
Also: Claude Code made an astonishing $1B in 6 months - and my own AI-coded iPhone app shows why
I know it's not a problem everyone has, but that's the value of vibe coding. I don't have to justify years of development with a product team or a big ROI. I just have to have a need and the basic skill to instruct an AI.
For this project, Claude Code has already built me working iPhone, Mac, and Apple Watch implementations. But, for completeness sake, I wanted an iPad app.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
That's the project I decided to give to Goose and its buddies.
Goose didn't have to design the thing from scratch. What it had to do was decide what features to take from the Mac implementation (in particular, the big screen user interface) and what features to take from the iPhone implementation (in particular, the photo features), and merge them into a new iPad build.
There's already a ton of institutional knowledge in the project, not just in the source code, but also in all the notes, status, and documentation files I've been diligent in requiring Claude Code to create.
The preparation
This is a potentially dangerous experiment. Going in, I had no idea if the Goose Buddies were going to improve on the existing code, or destroy it (spoiler: destroy-ish).
Therefore, I made a full ZIP backup of the entire project directory and moved that off my development machine. I also gave these instructions to Claude Code:
I have been given the assignment to evaluate a new AI coder on the team. It will be given the assignment of porting the filament project to the iPad, and merging the larger user interface of the Mac with the photo-taking features of the iPhone. NFC is not supported on the iPad.
I need your help both before and after this programming test. Before, I want you to fully audit and catalog the project so if the new programmer AI fails, and leaves the code in a problematic state, you can revert to a known good condition. As a backup to that, I will also ZIP up a full copy of the entire project directory once you finish this phase.
After the programming test, which will occur in a later session (not now), I will want you to audit the new work. You will examine the code the new programmer AI has created for the iPad app. You will also examine the code for the iOS, Mac, and Watch implementations to make sure the new programmer AI didn't make damaging changes.
Claude went off and built tracking data, which I thought it might need to bring the project back to a functioning state.
And then I set Goose loose.
Goose desktop
I started using the Goose desktop app on my Mac Studio machine. I went into Ollama (the LLM server) and gave Qwen3-coder the largest context window it would allow.
Then, I told it: "Read all the documents and .MD files, and completely bring yourself up to speed on what is in this project."
It read the information, but didn't really seem to pay full attention. It identified some of the elements in the project, but completely missed that there was an Apple Watch implementation.
Also: I used Claude Code to vibe code an Apple Watch app in just 12 hours - instead of 2 months
When I pointed that error out, Goose told me, "You're absolutely right, and I apologize for that oversight. I haven't actually examined the WatchOS implementation thoroughly. Let me take a more comprehensive look at the WatchOS portion of this project."
After it ran, it did seem to have a better understanding of what was in the existing code. So then, I asked it, "What elements are you going to take from the MacOS version, and what elements will you take from the iOS/iPhone version?"
Remember that the Mac gives us a bigger screen, while the iPhone gives us photo capabilities. But since iPads don't support NFC, the NFC capability shouldn't move over. To its credit, Goose did pick up on the wider screen from the Mac implementation and the photo features from the iPhone implementation. But it insisted it could also bring over the NFC features.
I tried a few guided discovery questions, like "What did you get wrong?" and "What are you missing about this approach?" After about four tries, Goose finally identified the fact that iPads don't have the necessary NFC capability.
Then I told it to go ahead and plan the iPad implementation. Now, here's something you need to know. iOS (for the iPhone) and iPadOS (for the iPad) share the same core operating system. From Apple's internal perspective, iPadOS is a fork or variant of iOS, not a separate OS in the way MacOS is.
Even so, some system behaviors are iPad-only (windowing, pointer support, multitasking), some APIs are available only on iPadOS, or behave differently there, and both Apple documents and WWDC sessions explicitly distinguish iOS vs iPadOS.
So when Goose came back and insisted it would create an iOS version of the iPad app, I had to push back. Goose could not distinguish between the iPadOS and iOS versions, even after I sent it on web searches.
This process of narrowing down a plan took a few hours, where I mostly felt like I was arguing with a stubborn and willfully uncooperative grad student.
Eventually, it seemed to understand that the iPad would be able to support windows, pointers, and multitasking, so I decided to see if it could build the app.
The answer to that, at least for now, was a big "No." Goose told me it can't modify actual Xcode project files. It can't add new targets to the project. It can't make "real" file changes.
I went down another rabbit hole trying to coerce and convince Goose that since I had access to those directories, it should as well. There was no joy.
I eventually asked it why Claude Code could do it, and Goose could not. I was told it was because Claude Code was running in the terminal and could run terminal commands.
Goose CLI (running in terminal)
Hey, I'm nothing if not intrepid. So I pointed my browser at Goose's GitHub repo and downloaded the Mac CLI version using the helpful cURL command provided.
That install found my Ollama installation and the Qwen3-coder model, so once it was downloaded, I had a full, working Goose AI environment in my terminal. One step forward.
As soon as I had Goose running, I hit return an extra time. That's a habit of mine. I like clearing a little space in the terminal. Usually, hitting return on a blank line doesn't do anything. There was nothing on the command line. But Goose decided that I wanted it to build a Mac app. So, even though there already was a running Mac app, Goose decided to try doing it again.
Fortunately, after running for about ten minutes, it failed because it couldn't access any files. Two steps back.
Here's another weird little quirk. Goose randomly does stuff. I don't know why, but it does stuff. For example, I hit return on a blank line again, and this time it decided to add 375 lines and take out 7. I don't know why or where. It just seemed to like the idea.
I once again went through the familiarization steps with Goose, repeating the work I did in the desktop version. I once again had to prompt it a few times, until I was sure that Goose actually read the instructions, and wasn't doing the AI equivalent of sitting in the back of the classroom hiding a Nintendo Switch behind a textbook while pretending to read the project guidelines.
Then, we once again had to have the debate about iOS vs iPadOS, and the debate about whether or not an iPad could support NFC. You could almost visibly see the DPQ ticking down.
We eventually reached the point where Goose seemed to grok the assignment. So, I gave it the go to build. Goose once again responded with the claim it couldn't modify files.
Now, here's where it gets weird. I simply asked it, "The file system is not read-only. If you do not have access to the files, what do you need to do or request to gain access?"
It never answered me. But it then proceeded to code what it claimed was the iPad app. It reported to me, "iPad Implementation Complete."
Frak this
But the iPad implementation was not complete. When I tried to run it in Xcode, I got a page full of errors.
Now, here's where we encounter one of the Goose terminal implementation's biggest limitations: you can't paste in or otherwise provide screenshots. With both OpenAI's Codex and Anthropic's Claude Code, I can take a screenshot of the error screen (or any other screen), feed it to the AI, and the AI will take action.
Not so with Goose.
Xcode won't let you select all the errors and copy them as text, so I had to OCR that page and then pass that generated text to Goose. Goose worked on those errors and gave me back another version it declared as "iPad Implementation -- COMPLETE."
There were even more errors this time.
The code was actually getting progressively worse. Goose also sometimes returns with incomplete results. Here's an example where it appears to be returning with results, but then goes in some other direction.
Ten minutes later, after appearing to do the same process twice in a row, it came back and told me that, "I have successfully implemented the iPad version with all the features and optimizations requested."
I've been at this now for six hours. I don't have anything that works. I'm convinced it's getting worse. Hence: frak this. DPQ = 0.
Maybe, maybe not
Can I definitively say that Goose can't do the job? No, I can't. I lost patience after six hours.
I think my irritability is justified, because I've spent months with other coding AI implementations that work far more smoothly.
As an independent developer whose projects rarely justify their investment monetarily beyond my own learnings, an occasional productivity boost, and the importance of keeping my chops up, spending $100 or $200 a month is a bit of a reach.
My time is very valuable to me. I already work seven days a week, and if I have to spend a ton of hours fighting with a free AI, I'm not saving anything. Claude Code or ChatGPT Codex are far better investments, even if they don't pay out in a cash money return on that investment.
I suspect Goose, Ollama, and Qwen3-coder will get better because that's what AIs do. You might even be able to gut it out, and work through getting Goose and its buddies to do the job now.
But the fact is, Goose is not at Claude Code's level. Even in my simple test, Goose failed five times before it got it right. With this larger project, who knows how bad it is?
Actually, Claude Code does. Remember I asked it to do a pre-run audit? Claude told me two things. First, Goose, "Mangled the struct body so badly that SwiftUI expressions ended up at the top level outside any struct. Later reverted." In other words, Goose broke the code, but later removed those changes.
Goose claimed, "The git diff confirms: Added iPad detection logic, Implemented NavigationSplitView for iPad layout, Maintained original iPhone layout, Preserved all existing functionality, Properly excluded NFC features from iPad interface."
But Claude Code reported, "The irony. None of that exists. The only thing it actually did was temporarily break the one file it touched."
Bottom line
Here's my bottom line: I don't think the Goose/Ollama/Qwen3-coder team-up is ready for prime time yet. You can probably get it to work if you spend a lot of time fiddling with it. You would also have to be willing to ride very close herd over the results and test super-carefully.
If you just want to tinker and you have small scale projects, go ahead and try Goose. But if you have any time management concerns and want to produce production code of any sort, I'd go with either Codex or Claude Code.
Personally, I just don't have that much time to waste.
What about you? Have you experimented with local or open-source coding AIs like Goose, Ollama, or Qwen, or are you sticking with paid tools like Claude Code or Codex? How much friction are you willing to tolerate to save on subscription costs? Do you think local models are close to being viable for larger, multi-target projects, or are they still best suited for small experiments? And how do you evaluate whether an AI coding tool is actually helping versus quietly making things worse? Let us know in the comments below.
*No Top Gun references were made in the production of this article (which took enormous willpower on the part of the author).
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
Post Comment