bitshiftin 2 years ago

The blog author founded Altinity. Altinity's main product offering is a hosted ClickHouse service. The top 10 committers to ClickHouse all seem to be ClickHouse Inc. employees. Looking at Altinity on GitHub, they contribute much less open source. If ClickHouse the company spends 40%+ of its money building the product while others, including Altinity, spend 5% on development and 80% on marketing, the others will get more customers. That isn't sustainable. How do you solve that, other than fencing off exclusive enterprise features?

  • Sphax 2 years ago

    Exactly. I have no problem with open source developers shifting to an open core model so they can get paid, even if it means I as a user don't get these new features for free.

    • dartos 2 years ago

      I like what Directus does. They release their source code under a restrictive license (you need to pay if your revenue exceeds a certain threshold) with a timer: three years after each release, the license changes to GPL.
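      If the timer works the way this comment describes (an assumption; the claim is unsourced here), the clock runs per release rather than per project. A minimal Python sketch of that rule; the three-year constant, the license labels, and the function names are hypothetical and are not taken from Directus's actual license text:

```python
from datetime import date

# Assumed values from the comment above; not taken from any real license text.
CONVERSION_YEARS = 3
OPEN_LICENSE = "GPL"
RESTRICTED_LICENSE = "restrictive (paid above a revenue threshold)"


def conversion_date(release: date, years: int = CONVERSION_YEARS) -> date:
    """Date on which a given release switches to the open license."""
    try:
        return release.replace(year=release.year + years)
    except ValueError:
        # A Feb 29 release whose target year is not a leap year: fall back to Feb 28.
        return release.replace(year=release.year + years, day=28)


def license_on(release: date, today: date) -> str:
    """License that applies to one specific release on a given day."""
    return OPEN_LICENSE if today >= conversion_date(release) else RESTRICTED_LICENSE


# Each release has its own clock: an older release can already be GPL
# while the newest one is still under the restrictive terms.
for released in (date(2020, 6, 1), date(2023, 6, 1)):
    print(released, license_on(released, today=date(2024, 1, 1)))
```

      The per-release clock is the point of the design: users always get old versions under GPL eventually, while the vendor keeps a paid window on the newest code.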

      • foxandmouse 2 years ago

        Do you have a source for the "after 3 years" clause? That's the first time I've heard of it!

  • sylvinus 2 years ago

    There's a kind of survivorship bias here! I'm sure that if Altinity (or others) weren't forced to do RFCs on GitHub like the one mentioned in this article that get low/no attention from the core team, they would be in the top committers :)

    • bitshiftin 2 years ago

      "forced to do RFCs on GitHub": the author has raised one RFC ever. Do you have evidence they attempted to contribute more in the past? Proposals to form a steering committee? Sponsorship? PRs?

      • unamedrus 2 years ago

        Are you part of the ClickHouse community? Have you ever used that DBMS or participated in any activity related to it?

        Just to make sure we are on the same page.

        • bitshiftin 2 years ago

          Yes I am part of the community. Yes I have used clickhouse for many years now. Yes I have recommended clickhouse to a number of others and helped them personally get started. I think it's a great piece of technology.

          • unamedrus 2 years ago

            In that case, you should know what Altinity is.

  • 3np 2 years ago

    Yeah, so, if maintainers started blocking external contributions for business reasons or changed the license of the upstream repo, that would be a legitimate concern.

    If this blog post raises any legitimate concerns, I missed them. All I see is entitlement to keep getting work for free. If you're concerned about gaps in ClickHouse functionality, maybe pick up the slack and contribute back?

    It takes a village to raise a database, as the saying goes.

    • sylvinus 2 years ago

      AFAIK, the author is indeed trying to contribute back, that is the whole point: https://github.com/ClickHouse/ClickHouse/issues/54644

      • bitshiftin 2 years ago

        In 4 years, the author has raised 6 issues and zero PRs. The latest issue was raised last week to inflame this same argument and is 4x longer than any previous issue.

    • ethbr1 2 years ago

      Ignoring any arguments about the author/company, the big question is: "What does ClickHouse do if there's a PR from the community reimplementing a cloud-only/closed feature?"

      The problem with open core is there's no great answer.

      Either it's merged, in which case there are now two codebases implementing the same feature (one open, one closed), and the company's revenue stream is imperiled.

      Or it's rejected (either explicitly or quietly ignored), in which case work is wasted and the project is less useful than it could be.

      How did open core companies historically handle this?

      As ugly as it is, a permissive OSS (e.g. MIT) core plus open-but-anti-SaaS, non-OSS licensing for the cloud-only/closed features feels like a more sustainable model that encourages development in the open.

      E.g. an MIT-like license for select features that says "free-as-in-beer up to X users; otherwise, talk to our sales team and get a commercial license."

      At the end of the day, I want OSS to succeed and be great, but especially nowadays that takes a large team, which takes funding, which requires a competitive revenue model.

  • remorses 2 years ago

    A license that doesn’t allow reselling the service is good enough.

    For example, I saw this with Tailwind UI.

  • mgachka 2 years ago

    "A license that doesn’t allow reselling the service is good enough For example, I saw this with Tailwind UI" FYI, we discussed it in this thread https://github.com/ClickHouse/ClickHouse/issues/44767. For the moment, the only way to get the new features is to use the cloud version (which is likely to be a no-go for most companies managing their own clickhouse infrastructure).

    • atwong 2 years ago

      MongoDB did something similar. It's open source for you to extend and host yourself but you can't build a cloud service for it.

  • hodgesrm 2 years ago

    CEO of Altinity here, also editor of the blog article. It would be more interesting to address the points that it raises. If you are an open source user of ClickHouse, do you really want basic features like object storage for tables or ability to delete data efficiently withheld?

    This question is important regardless of who raises it. Projects like Kafka, Spark, PostgreSQL, and Kubernetes (among others) have solved it while allowing good returns to those who contribute.

    p.s. We spent 7% of our budget on marketing last month. A sizable fraction of our budget is devoted to open source contributions to ClickHouse and ecosystem projects.

    • joshxyz 2 years ago

      i appreciate your work here man. altinity even wrote a lot of useful blog posts about clickhouse back then; loved that stuff.

      this whole thing reminds me of elastic and hashicorp, and it's hard to pick sides given that the core maintainers worked their asses off building it in public, and the community contributors also put their effort into it.

      i think this common theme is unlocking a new era of software where core maintainers productize the main product, slap a BSL license on it, and the community (incl. other businesses) maintains its own fork.

      it's great that discussions like this are being brought up and talked about.

      • hodgesrm 2 years ago

        Thanks. That's exactly why we wrote the article. Beyond any commercial considerations, we have worked with ClickHouse for many years and are personally invested in seeing it become the default analytic database worldwide. I believe the best path to that goal is robust support for open source development and distribution.

        • atwong 2 years ago

          Lots of other competitors like Apache Pinot, Apache Druid and StarRocks are fighting in that default analytics space.

          • ddorian43 2 years ago

            StarRocks, for example, has compute/storage separation in its open-source version.

            • unamedrus 2 years ago

              Not only that: transactions, UPDATEs, a cost-based optimizer (CBO), and better join optimizations.

              It seems that someone is stuck in 2016, when no good open-source alternatives to ClickHouse existed.

            • PeterCorless 2 years ago

              When datasets are small and can easily fit into a single node [a few terabytes], this isn't as much of an issue. Yet when datasets grow far larger, or when compute/QPS needs grow while the dataset grows slower — when either side of the equation does not scale in balanced proportion with each other — that's when this separation of compute & storage becomes vital. [Either that, or you need to find hardware servers or cloud instance types that also support this imbalance of compute & storage, which is sometimes harder to do; it also locks you into a hardware configuration that cannot dynamically scale as needs and workloads change.]
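              The imbalance argument above can be made concrete with a little arithmetic. A rough Python sketch, assuming a hypothetical node type with a fixed amount of disk and a fixed query throughput; the numbers are invented for illustration and are not benchmarks of Pinot, StarRocks, or ClickHouse:

```python
import math
from dataclasses import dataclass


@dataclass
class NodeSpec:
    tb_per_node: float   # local storage each node contributes (hypothetical)
    qps_per_node: float  # query throughput each node sustains (hypothetical)


def coupled_plan(data_tb: float, qps: float, spec: NodeSpec) -> dict:
    """Shared-nothing cluster: every node adds disk and CPU together,
    so whichever need is larger decides the node count."""
    nodes = max(math.ceil(data_tb / spec.tb_per_node),
                math.ceil(qps / spec.qps_per_node))
    return {
        "nodes": nodes,
        "idle_storage_tb": nodes * spec.tb_per_node - data_tb,
        "idle_qps": nodes * spec.qps_per_node - qps,
    }


def decoupled_plan(data_tb: float, qps: float, spec: NodeSpec) -> dict:
    """Compute nodes are sized for QPS only; the data lives in a separately
    scaled storage tier (e.g. object storage)."""
    nodes = math.ceil(qps / spec.qps_per_node)
    return {
        "nodes": nodes,
        "storage_tier_tb": data_tb,
        "idle_qps": nodes * spec.qps_per_node - qps,
    }


spec = NodeSpec(tb_per_node=4, qps_per_node=100)
print(coupled_plan(data_tb=8, qps=2000, spec=spec))
print(decoupled_plan(data_tb=8, qps=2000, spec=spec))
```

              With 8 TB of data and 2,000 QPS, the coupled cluster needs 20 nodes for throughput alone and so carries 72 TB of idle disk; the decoupled plan runs the same 20 compute nodes but stores only the 8 TB. Flip the ratio (200 TB, 100 QPS) and the coupled cluster needs 50 nodes whose CPU mostly sits idle, while the decoupled plan needs one compute node.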

              Apache Pinot also offers the same 2-tier compute/storage separation. And it also has nodes for minion [administrative] tasks. Again, these are more issues for larger scale analytical use cases.

              • ddorian43 2 years ago

                > Apache Pinot also offers the same 2-tier compute/storage separation.

                Based on looking at the docs, I don't think so. Maybe only with HDFS. Feel free to link to a page that says otherwise.

  • rjzzleep 2 years ago

    Altinity maintains the ClickHouse operator which got a lot of people to use ClickHouse to begin with. ClickHouse has had a lot of corner cases that were reported from those kinds of people, myself included.

    If you look at some of the discussions, while a lot of the fixes come from the ClickHouse team, it would be unjust to say that the corner-case discussions don't contribute to the fixes.

    I think part of the reason is that ClickHouse, being a somewhat unique offering, sometimes attracts quite a competent bunch of users who go beyond "I want this feature, please implement it".

    • hodgesrm 2 years ago

      To that point Altinity has contributed about 900 PRs to ClickHouse, and many more if you include ecosystem projects like the operator, clickhouse-backup (which we maintain), the community grafana plugin (over 11M downloads last I looked), ODBC driver, etc. All of this is open source.

      We've also been very active on diagnosing problems, logging issues, and contributing ideas for solutions. Alexey Milovidov has logged the most issues of anyone (2376) but the next two people (1012, 810) are from Altinity. The #6 and #9 contributors of issues are also from Altinity.

      • rjzzleep 2 years ago

        I totally forgot about clickhouse-backup. It was frequently referenced in ClickHouse issues, and helped me and probably a lot of others understand how ClickHouse stores and transfers data.

benjaminwootton 2 years ago

This would be a shame and also a mistake in my opinion.

Clickhouse is instantly differentiated from Snowflake, Databricks, BigQuery and Redshift by the open source offering that you can deploy yourself. There are lots of other options, but Clickhouse has the most mindshare and is the techies' choice.

I find myself rooting for them and recommending them for that before you even get into any technical comparison.

  • hodgesrm 2 years ago

    ClickHouse is also faster than any of them if you know how to use it properly. It helps if you have some distributed systems background and an intuitive feel for map/reduce.

    For example, ReplacingMergeTree uses a distributed algorithm to process changes without incurring excessive INSERT-time expense. It's quite elegant.

    • kgopalak 2 years ago

      Insert should never have been expensive in the first place. This was probably hard for ClickHouse because they started with Postgres as the base, which is optimized for OLTP. In Apache Pinot/Druid etc., insert is nothing more than a simple append, and I believe that's the case today with ClickHouse as well. In other words, these things are table stakes today and are not differentiators.

      • hodgesrm 2 years ago

        This is a different problem. Update is expensive in distributed columnar data. ReplacingMergeTree translates updates into inserts which are very fast and always have been. It then updates rows in a lazy fashion.
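        The update-as-insert idea can be illustrated with a toy in-memory model. This is a Python sketch, not ClickHouse's merge code: `Row`, `dedupe`, and `ReplacingTableSketch` are hypothetical names, and it assumes a table keyed by one sorting column with a version column where the highest version wins. In the real engine, deduplication only happens among rows in the same partition, at merge times the server chooses, so plain reads can still see old versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int      # sorting key (the ORDER BY columns) identifying a logical row
    version: int  # version column: the highest version wins
    value: str


def dedupe(rows):
    """Keep the highest-version row per key; on a version tie, the later row wins."""
    latest = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return sorted(latest.values(), key=lambda r: r.key)


class ReplacingTableSketch:
    """Toy model of insert-only updates with lazy, merge-time deduplication."""

    def __init__(self):
        self.parts = []  # each INSERT becomes a new immutable "part"

    def insert(self, rows):
        # An update is just another append: no lookup or rewrite of older rows.
        self.parts.append(list(rows))

    def select_all(self):
        # A plain SELECT before a merge may still see superseded versions.
        return [row for part in self.parts for row in part]

    def select_final(self):
        # Like SELECT ... FINAL: deduplicate at read time, paying the cost per query.
        return dedupe(self.select_all())

    def merge(self):
        # Background merge: collapse all parts into one deduplicated part.
        self.parts = [dedupe(self.select_all())]


table = ReplacingTableSketch()
table.insert([Row(1, 1, "pending"), Row(2, 1, "pending")])
table.insert([Row(1, 2, "shipped")])  # "update" row 1 by inserting a newer version
print(len(table.select_all()))        # 3 rows until a merge happens
table.merge()
print(table.select_all())
```

        The write path stays a cheap append; the cost of resolving versions moves either to background merges or, when a query needs exact results before a merge, to read time.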

  • atwong 2 years ago

    All the main players in Clickhouse's space, like Apache Pinot, Apache Druid, StarRocks, and PrestoDB, have mindshare and unicorns using their products. It sounds like you haven't seen what's happening in this space.

    • kuchenbecker 2 years ago

      Trino, not Presto.

      Presto, created by FB, was required to let any FB engineer merge without OWNERS (because Facebook doesn't have OWNERS files unless it would create a SEV1).

      Subsequently, the original creators of Presto forked it to PrestoSQL.

      So Facebook trademarked the name Presto.

      So creators renamed it Trino.

      https://trino.io/blog/2020/12/27/announcing-trino.html

mgachka 2 years ago

As a user of clickhouse since 2018 I'm fully aligned with the content of this article. This technology is one of the best I've been using in my career.

The choice of clickhouse for a new project in my company has always been a no-brainer, but the recent move by ClickHouse Inc. to a closed-source version has made this choice less straightforward.

snegussie 2 years ago

Is anyone familiar with Databend, StarRocks, or ByConity? They all focus on shared storage with separate compute. I'm currently checking out ByConity. I've been using Clickhouse for quite a while and these were on my radar.

up2isomorphism 2 years ago

The inherent advantage of being closed source in the first place is that you will not be accused of “moving away” from open source. We never see Microsoft Office or Apple macOS moving away from open source.

  • paulryanrogers 2 years ago

    MacOS (X) was once more open than it is now. It is a shame they have locked more away behind a closed license.

that_guy_iain 2 years ago

Open Source doesn’t pay. Companies need to make money. Any open source product owned and developed by a for profit company is at danger of it moving away from open source.

If you want open source go fund non profit organisations and/or charities. The fact we don’t see developers do that tells me a lot.

nnurmanov 2 years ago

Here is a similar Thin-Crust Open Core model: https://reactflow.dev/blog/asking-for-money-for-open-source/. But I hear CH and understand their move. I think OSS sponsorship has failed; it does not generate enough money to pay a team of top engineers. The best move would have been to implement a better paid support model and allow more users to pay for support. Currently CH charges an immense amount for support, so only large corporations can afford it.

  • jwatte 2 years ago

    But if they had a cheaper support model, the large corporations would also pay less, and they'd make even less money overall. Not to mention, support engineers are expensive, and researching tickets takes a lot of time.

    • nnurmanov 2 years ago

      Not sure: you can have more customers but a fixed number of issues. I am researching it. You could crowdsource issue solving and enable revenue sharing.

jabart 2 years ago

Clickhouse open source and cloud user here. My understanding is that the cloud version uses S3, which would mean they have a very specific tenant pattern and code to run that version. This may be why lightweight updates work a specific way in that environment, or they need a way to test them at scale that would be hard to do behind a feature flag in the open source product. Lightweight deletes were released to both, and previous roadmaps listed updates as upcoming.

You should be using ReplacingMergeTree if you are doing updates at the current moment.

  • unamedrus 2 years ago

    > Lightweight deletes were released to both

    Because they were contributed by a community member, not the ClickHouse Inc. core team.

    https://github.com/ClickHouse/ClickHouse/pull/37893

    • jabart 2 years ago

      Commits show both. Yes, a community member kicked things off with the initial PR, and it seems like it was a team effort to get that feature launched.

  • hodgesrm 2 years ago

    > You should be using ReplacingMergeTree if you are doing updates at the current moment.

    Indeed. Altinity and other community users like ContentSquare made numerous contributions to make it more usable. It's a promising approach to updates at scale and has improved markedly over the last few months.

    That said you can't currently use RMT very efficiently in S3 because of overall limitations in MergeTree S3 table storage. We need to think about whether the improvements we're proposing will also enhance RMT. Thanks for bringing that up.

    • jabart 2 years ago

      Yeah, it's the "don't treat S3 like a disk" issue. I'm looking at S3 only for cold storage, but I need AWS VPC Gateway Endpoint support for S3 access since we are on-premise.

      *Yes, you can use a VPC gateway, but you need public IPs to waste to set up the BGP/IP routes.

PeterZaitsev 2 years ago

I do not think ClickHouse Inc. will abandon open source, but following the current trend it may focus on restricting innovations to its proprietary, cloud-only product.

We see it with Oracle (MySQL), where most of the innovation is happening in the cloud-only "HeatWave", or MongoDB, where MongoDB Atlas is increasingly getting features not available in their Community (SSPL) version.

msarrel 2 years ago

How dare they go open core to make money from their own product.

brunoqc 2 years ago

Same thing with Timescale. I think their S3 storage layer is only in their cloud version.

lairv 2 years ago

How do you choose between ClickHouse and DuckDB? It feels like they solve the same problem.

  • devoxi 2 years ago

    They are both columnar data stores, and while they solve the same problem, I wouldn't use them in the same situation. DuckDB is often referred to as the SQLite of analytics, meaning that it's lightweight and you can embed it. On the other hand, ClickHouse is definitely the way to go if you need to distribute your queries over multiple servers. If your workload fits on a single server and you only need standard SQL functions, both will serve you well. If you have more specific needs, have a look at the documentation. For example, ClickHouse has very extensive support for nested arrays, which can prove quite useful.

    • hodgesrm 2 years ago

      DuckDB has also gained mindshare as an engine to read Parquet from data lakes. The fact that it's embeddable enables some very creative uses. It helped that for a time DuckDB was substantially quicker than ClickHouse at reading Parquet. That advantage has eroded with recent improvements to ClickHouse Parquet support. I expect the gap will close quickly.

  • benjaminwootton 2 years ago

    They solve the same problem in that they are OLAP data stores, but that's where the similarity ends. Clickhouse is a centralised OLAP store (like tens of others), whilst DuckDB is an embedded database that is usually run in-process.

    What is it about DuckDB and its strange, cult-like following? It's nice that it's in-process, but then it's an incremental improvement over Pandas. It's a nice tool and well implemented, but I don't see what is transformative about it.

    • aadant 2 years ago

      ClickHouse's power is having one binary that runs anywhere:

      - local
      - server
      - cloud (*)
      - serverless
      - in-process, similar to DuckDB: https://github.com/chdb-io/chdb

      (*) except for the forked cloud versions: ClickHouse Inc., Huawei, etc.

  • qxip 2 years ago

    Different beasts, but if by any chance you love ClickHouse already and just want to run OLAP queries in-process, there's chdb: https://github.com/chdb-io/chdb

  • atwong 2 years ago

    Scale. DuckDB chokes at a certain point (just as SQLite isn't the same as MySQL or PostgreSQL in terms of scalability). That's why they're building a better/bigger version.