Good post in general but some caveats:
1) His user numbers are off by an order of magnitude at least, as other comments have mentioned. Even a VM/VPS should handle more, and a modern bare-metal server will do way more than the quoted numbers.
2) Autoscaling is a solution to the self-inflicted problem of insanely high cloud prices, which cloud providers love because implementing it requires more reliance on proprietary vendor-specific APIs. The actual solution is a handful of modern bare-metal servers at strategic locations which allow you to cover your worst-case expected load while being cheaper than the lowest expected load on a cloud. Upside: lower prices & complexity. Downside: say goodbye to your AWS re:Invent invite.
3) Microservices. Apparently redeploying stateless appservers is a problem (despite the autoscaling part doing exactly this in response to load spikes which he's fine with), and his solution is to introduce 100x the management overhead and points of failure? The argument about scaling separate features differently doesn't make sense either - unless your code is literally so big it can't all fit in one server, there is no problem having every server be able to serve all types of requests, and as a bonus you no longer have to predict the expected load on a per-feature basis. A monolith's individual features can still talk to separate databases just fine.
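To make the last point of 3) concrete, here is a minimal sketch of one process whose features each own a separate database. Everything in it is an assumption for illustration: the feature names, the table layouts, and in-memory SQLite standing in for two separate database servers.

```python
import sqlite3

# One process, two features, two databases.
# Assumption: in-memory SQLite stands in for two separate database servers.
users_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)


def create_user(name: str) -> int:
    """Users feature (hypothetical): only ever touches users_db."""
    with users_db:  # commits on success, rolls back on error
        cur = users_db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid


def place_order(user_id: int, item: str) -> int:
    """Orders feature (hypothetical): only ever touches orders_db."""
    with orders_db:
        cur = orders_db.execute(
            "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, item)
        )
    return cur.lastrowid


def orders_for(user_id: int) -> list[str]:
    """Read path for the orders feature, in insertion order."""
    rows = orders_db.execute(
        "SELECT item FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [item for (item,) in rows]
```

Every app server can run both features, and the databases can still be sized and moved independently of each other.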
As is often stated, microservices is a solution for scaling an engineering org to 100s of developers, not for scaling a product to millions of users.
You can get the separation benefits of microservices in a compiled language with modules that only communicate over well-defined interfaces, constraining each team within their own module without having to introduce a network call between each operation.
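A rough sketch of what that can look like. Python is not compiled, so assume a type checker such as mypy plays the compiler's role here; the `Notifier` interface and both classes are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Notifier(Protocol):
    """The only surface other teams may depend on (hypothetical interface)."""

    def notify(self, user_id: int, message: str) -> None: ...


@dataclass
class InMemoryNotifier:
    """Notifications team's implementation; its internals stay in its module."""

    sent: list[tuple[int, str]] = field(default_factory=list)

    def notify(self, user_id: int, message: str) -> None:
        self.sent.append((user_id, message))


@dataclass
class BillingService:
    """Billing team's module: depends on the interface, not on the class."""

    notifier: Notifier

    def charge(self, user_id: int, cents: int) -> None:
        # Real charging logic would go here; the notification is a plain
        # in-process method call rather than a network request.
        self.notifier.notify(user_id, f"charged {cents / 100:.2f}")
```

The boundary between teams is the `Notifier` signature. Swapping the implementation does not touch billing code, and nothing crosses the network.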
Unfortunately that message was way way behind the bombast of "microservices everywhere now" that preceded it for years, to the detriment of many small orgs.
I've seen engineering orgs of 10-50 launch headlong into microservices to poor results. No exaggeration to say many places ended up with more repos & services than developers to manage them.
Do they all manage to have 4 different ways to do something like "notify user about x", all in use because they could never be bothered to complete the "upgrade"?
Exactly the problem yes. Once you have more services than developers, you are probably running into infrequent releases and orphaned projects.
So whenever an inevitable common utility improvement is made, the effort of pushing out 100 repo releases for services no one has touched since Jim left 3 years ago is terrifying.
When a breaking change is going to be made and you HAVE to do the 100 releases, it's terrifying. Everyone says it never happens, but work on a project/team for 5 years and it eventually does. Even once is enough to swear me off this "architecture".
That's often the case yes. In a monolith a developer disgruntled about the situation can clean up the mess in a weekend, test it and push it through. No chance of that happening in microservices - you'd run out of weekend just opening PRs in the dozens of repos and dealing with all the nitpicking and turf wars.
The best description of microservices comes from "The Grug Brained Developer" (https://grugbrain.dev/):
"grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too
seem very confusing to grug"
Grug actually covers this in his essay:
> note, this good engineering advice but bad career advice: "yes" is magic word for more shiney rock and put in charge of large tribe of developer
Microservices definitely contribute to having a "large tribe of developer" to manage.
And to add to this: virtually every programming language allows you to define multiple entry points. So you can have your workers in the exact same codebase as your api and even multiple api services. They can share code and data structures or whatever you need. So, if you do need this kind of complexity with multiple services, you don’t need separate repos and elaborate build systems and dependency hell.
Nice for traditional apps. I'm currently working with a client on an Elixir backend. Some aspects of the tier progressions transfer, but the BEAM diverges a bit (no external queues/redis, scaling direction). I am enjoying it.
Echoing what others have said about the numbers being off.
I ran a 10k user classic ASP service on a VPS from Fasthosts, with MySQL 5.6 and Redis, and it was awesome.
This post shows some signs of having its parts written by an LLM in my opinion. Or am I crazy? Please tell me that I am.
Author having this on his github makes me even more suspicious: https://github.com/ashishps1/learn-ai-engineering
It’s entirely written by an LLM.
LOL, I was having an online chat with a friend the other day and commented I sound like an LLM.
Are you sure?
Yes. The over-use of bold in the intro (hell, in the first sentence) is a good hint.
All of it aligns with https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Maybe the images were made by hand.
I really enjoyed reading this. Much like Instagram, which had thousands of users sign up on the first day, if you aren't able to scale because of your skill level, wouldn't that affect usage and lead to comments like: 'The app/site is so slow'?
Aren't comments like "the site is too slow" similar to "the city is too crowded"?
Twitter famously had a "fail whale" but it didn't stop the company from growing. If you have market demand (and I guess advertising) then you can get away with a sub-optimal product for a long time.
> Twitter famously had a "fail whale" but it didn't stop the company from growing. If you have market demand (and I guess advertising) then you can get away with a sub-optimal product for a long time.
Agreed, but there's still an element of survivorship bias there. Plenty of companies failed as they couldn't keep up with their scaling requirements and pushed the "getting away with a sub-optimal product" for too long a time.
Do you have some good examples?
This touches on the toupee fallacy: "I never saw a large company fail to grow large because of deferred scaling"
Friendster might fit though: https://highscalability.com/friendster-lost-lead-because-of-...
I agree. Go fast with a suboptimal architecture. If success arises, throw away version 1 and rebuild from scratch. That is often more effective.
Reddit is still around.
It depends on the adoption model.
If it’s just “sign up any time you want and go”, yes, it can go that way.
If it’s “join that waiting list” or “book a call” (for KYC purposes or whatever), you have a buffer.
If user count is more or less constant (most internal websites, for example), it’s probably not an issue.
And so on.
Not criticizing the core idea, which is sound (don't waste resources overengineering at the beginning, evolve your architecture to match your actual scale as you grow), but the “number of users” figures in this post are completely nonsensical. You ought to multiply them by 100 (if you're being conservative) or even 1000 (depending on the consumption pattern for the user).
Modern hardware is fast: if you cannot fit more than 100 users (not even 100 concurrent users) on a single $50/month server, you're doing something very, very wrong.
Even a repurposed 10-year-old Fairphone[1] can handle more than that.
[1]: https://far.computer
You and another person made this point _but_ I’d encourage you to look at what $50/mo gets you on AWS all in. In reality it will get you a t4g.small plus 200GB of (very slow) storage. Honestly they start to chug at 500 or so users in my experience.
Which is why you should not be going to AWS to begin with when there are plenty of providers who will give you orders of magnitude more performance for this price.
(of course, say goodbye to resume points and your cloud provider conference invite. Question is, what are you trying to do? Are you building a business, or a resume?)
For this you avoid AWS, Azure and GCP. Their pricing is simply not competitive. We operate root servers at Hetzner serving dynamic content to six-figure audiences.
PostgreSQL and Elasticsearch clusters can be operated at a fraction of the cost of comparable managed services offered by the major cloud providers.
The idea that this necessarily involves excessive maintenance effort is nonsense.
The skills needed to use hyperscalers properly are better invested in fundamental sysadmin know-how.
If you look at what $50 a month gets you at OVH or Hetzner then their post makes more sense.
It isn’t an apples to apples comparison. But, you trade some additional operational overhead for a whole lot more hardware.
Totally agree - I was just trying to give a perspective for the user scale figures
Counting in users is just nonsensical. Is it total registered users? Users per <time interval>? Sessions that need to go in the session store? Concurrent requests?
Then there's the implementation language category: interpreted vs. JITed vs. AOT-compiled.
And of course the workload matters a lot. Simple CRUD application vs. compute-heavy or serving lots of media, ...
Together those factors can make a difference of 6+ orders of magnitude (OOMs).
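A back-of-envelope sketch of why "users" alone says so little. Every input below is an assumed, illustrative number rather than a measurement, and the function name is made up.

```python
SECONDS_PER_DAY = 86_400


def peak_requests_per_second(
    registered_users: int,
    daily_active_fraction: float,
    requests_per_active_user_per_day: float,
    peak_to_average_ratio: float,
) -> float:
    """Turn a vague 'number of users' into a peak request rate."""
    daily_requests = (
        registered_users
        * daily_active_fraction
        * requests_per_active_user_per_day
    )
    average_rps = daily_requests / SECONDS_PER_DAY
    return average_rps * peak_to_average_ratio


# Same "100k users", two assumed usage patterns:
light_crud = peak_requests_per_second(100_000, 0.1, 50, 5.0)
heavy_media = peak_requests_per_second(100_000, 0.5, 2_000, 5.0)
```

With these made-up inputs, the light case comes out at roughly 29 requests/s at peak and the heavy one at roughly 5,800, a 200x gap before language runtime or per-request cost even enters the picture.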
Agreed, the numbers were shockingly low.
Amazing to see my little phone pop up randomly on hacker news :D
Thank you stranger.
Nice read