I have a deep distrust of hierarchies, because they keep you trapped into a single model that keeps extending its authority, usuall without anyone explicitly deciding that it should do so. For example, the file system: once it was deemed hierarchy is the main metaphor for navigation, the structure persisted and was reused for organisation, ownership, access control and governance. And it became infrastructure we cannot easily remove before we could even question if it was right or not. And once it dominated, non-hierarchical things were retrofitted as glue, e.g.: symlinks, aliases, shortcuts... also, when's the last time you've used a tag?
The webs are so much more malleable, but they're also not free. All the 'good enough's you were that a hierarchy that was taking care of implicitly are now your responsibility to model precisely and make sure they're performant as well. Look at ReBAC, for example. It gives you expressive power, but it also forces you to reason precisely about relationships, graph traversal, consistency, and cost. Strikingly similar is GraphQL.
Interestingly, source code is hierarchical, but compiles almost immediately to a graph IR and most analysis and optimisations happen there. But almost nobody looks at a CFG/SSA graph directly. You author in a hierarchical manner, yet the operational substrate is a malleable graph.
The problem with trees is that the are a dimensional reduction, an aggregation; taking a problem without directionality and applying a useful/functional hierarchy.
So a tree is a way to take a high dimensionality graph and make it usefully lower dimensionality, but, given the aforementioned proof, that reduction is going to go from being a lossless compression to a heuristic. So any interesting problem (at least, any problem interesting to me) is only going to be aided (read: not solved exhaustively) by that hierarchy.
I'm okay with this. Being okay with this has been one of the most freeing things over the last 20 years of my career. Accept inaccuracy, and find usefulness in your data structures.
Reminds me of my favorite math essay: "When is one thing equal to some other thing?"
It's a great question, much deeper and more interesting than it seems. The essay suggests thinking in terms of isomorphisms (relative to the structure you care about) rather than equality in some absolute sense, and I've found a fuzzy version of that to be a really useful perspective even in areas that can't be fully formalized.
I jumped to a similar conclusion right away and popped over here to comment only to find you have beaten me to the punch. I use to keep a work wiki page of common problems the team encounters over and over again.
Years ago, I stumbled upon the "idea" was already debated in other fields long before programming. Lumpers and Splitters.
I wonder whether the author deliberately avoided ontology? That's what comes to mind when I read this. The age-old debate between taxonomy and ontology.
The first chapter of this waves away the fact that hierarchical filesystems are now useless, but it is still a fact. There is no more reason to organize your files than there is to drive around in a chariot. It is hard to map one domain to the other, but it is also not necessary. With AI indexing and recall it's less necessary than it has ever been.
I have a deep distrust of hierarchies, because they keep you trapped into a single model that keeps extending its authority, usuall without anyone explicitly deciding that it should do so. For example, the file system: once it was deemed hierarchy is the main metaphor for navigation, the structure persisted and was reused for organisation, ownership, access control and governance. And it became infrastructure we cannot easily remove before we could even question if it was right or not. And once it dominated, non-hierarchical things were retrofitted as glue, e.g.: symlinks, aliases, shortcuts... also, when's the last time you've used a tag?
The webs are so much more malleable, but they're also not free. All the 'good enough's you were that a hierarchy that was taking care of implicitly are now your responsibility to model precisely and make sure they're performant as well. Look at ReBAC, for example. It gives you expressive power, but it also forces you to reason precisely about relationships, graph traversal, consistency, and cost. Strikingly similar is GraphQL.
Interestingly, source code is hierarchical, but compiles almost immediately to a graph IR and most analysis and optimisations happen there. But almost nobody looks at a CFG/SSA graph directly. You author in a hierarchical manner, yet the operational substrate is a malleable graph.
The problem with trees is that the are a dimensional reduction, an aggregation; taking a problem without directionality and applying a useful/functional hierarchy.
And that's a problem because Aggregability is NP-Hard: https://dl.acm.org/doi/abs/10.1145/1165555.1165556
So a tree is a way to take a high dimensionality graph and make it usefully lower dimensionality, but, given the aforementioned proof, that reduction is going to go from being a lossless compression to a heuristic. So any interesting problem (at least, any problem interesting to me) is only going to be aided (read: not solved exhaustively) by that hierarchy.
I'm okay with this. Being okay with this has been one of the most freeing things over the last 20 years of my career. Accept inaccuracy, and find usefulness in your data structures.
One nice tool for analyzing maps as a tree is as a dominator trees. I wrote a bit about it here: https://neugierig.org/software/blog/2023/07/dominator.html
I think all three problems are really one problem under the hood:
Are these two things actually the same thing, or they separate?
Reminds me of my favorite math essay: "When is one thing equal to some other thing?"
It's a great question, much deeper and more interesting than it seems. The essay suggests thinking in terms of isomorphisms (relative to the structure you care about) rather than equality in some absolute sense, and I've found a fuzzy version of that to be a really useful perspective even in areas that can't be fully formalized.
https://people.math.osu.edu/cogdell.1/6112-Mazur-www.pdf
Or non binary. How much are these the same and how.
I jumped to a similar conclusion right away and popped over here to comment only to find you have beaten me to the punch. I use to keep a work wiki page of common problems the team encounters over and over again.
Years ago, I stumbled upon the "idea" was already debated in other fields long before programming. Lumpers and Splitters.
https://en.wikipedia.org/wiki/Lumpers_and_splitters
Wow, thanks for that, TIL! I’m definitely a code lumper.
"Ambiguity is the enemy", as a rule of thumb, has helped me
Every few years I watch, with amusement, our management restructuring the organizational hierarchy, allegedly because the old one didn't work.
I thought the two hard problems were naming things, cache invalidation, and off-by-one errors?
Don't race forget conditions!
His message was submitted before the memory recall completed execution.
At least the title “The Third Hard Problem” is still appropriate regardless of whether you get the joke right.
I thought it was timezones.
There are only two hard problems in computer programming:
1. Naming things 2. Cache invalidation 3. off-by-one errors
there are actually 10 problems: naming things, cache invalidation, base conversions, off by one errors, and cache invalidation
Base conversions are easy, but which off by one is it? base 4 or base 6?
Putting object into trees is basically a caching problem.
I was thinking it's a naming problem haha, a file path can be seen as a global/fully-qualified name really.
I wonder whether the author deliberately avoided ontology? That's what comes to mind when I read this. The age-old debate between taxonomy and ontology.
The first chapter of this waves away the fact that hierarchical filesystems are now useless, but it is still a fact. There is no more reason to organize your files than there is to drive around in a chariot. It is hard to map one domain to the other, but it is also not necessary. With AI indexing and recall it's less necessary than it has ever been.
This seems optimistic.
Use multiple trees.
This is more true as stated than people want to give credit for, usually.