I would claim that I'm quite familiar with the TLS and certificate ecosystem, and up until this news item I didn't know Google runs a CT search engine.
Everyone uses either crt.sh or, to a lesser extent, Censys. I think this space is covered well and there's just no need for a Google-operated CT search engine nobody knows about.
I am in the same boat. I built a CT Log scraping system back in 2016 and once storage got too expensive I migrated the solution to the Censys API.
Our use case is finding phishing sites targeting our customers. CT logs have been amazing for detecting campaigns early.
I think I would claim to be similarly familiar :) but I did know Google operated this search and I've even used it in the past when Rob's site was down.
As I think I expressed in a post last year, it's important that we could migrate off crt.sh, but it isn't a problem if we never do.
One thing I don't like, though I can't attest to it still being a problem since maintaining one is no longer my job, is that some logs are (were) unreliable for a crawler, which is a problem if you want to spin up an alternative to crt.sh. Google checks that things work for them as part of qualification, but that's at best a 90% solution.
Also note that crt.sh is operated by Sectigo (formerly Comodo). Previously they did not have a good reputation.
It would be nice to learn about a great Google tool some other way than through the news that it is being discontinued.
How many people feel they get positive value from CT services in general?
I have Cloudflare email me whenever my domain gets a cert, and that is a positive value for me. However, it is very clear to me that bots are watching these logs (in near real time) because obscure hostnames (sometimes UUIDs) that I set up for a quick test but need a real cert will start getting traffic almost immediately after the cert comes back from Let's Encrypt.
As an attacker, this log is a fantastic tool for service/host discovery, but as a defender I kind of wish I could opt out.
Note: a wildcard cert is a good way to avoid the problem, but I prefer to avoid wildcards if possible.
A lot do, but without knowing it. Chrome uses an 'Expect-CT' header, which means the site's TLS cert should chain to a root CA, with both stored in a CTL.
I don't think that's what your parent post meant, but Chrome actually mechanically requires CT logging in order to trust a site; the Expect-CT header doesn't come into it. Safari has the same policy, and Mozilla intends to some day do likewise in Firefox, but the politics of the exact rules are tricky for reasons we could get into if somebody cares.
When your Chrome connects to a TLS server, the server has to provide a certificate, and Chrome examines that certificate before any HTTP traffic happens (and thus before you could send an HTTP header like Expect-CT anywhere) unless you've got Group Policy or similar rules saying otherwise:
- The certificate must have proof it was logged, in the form of SCTs. For most sites the SCTs are baked into the certificate when they got it; they're all that incomprehensible gibberish near the end of your certificate if you've read it.
- The certificate must have been issued relatively recently (825 days or less previously), and it must expire within 825 days of issuance or, if it was issued since Apple's policy change some time in 2020, within 398 days.
- The certificate says it is for TLS Servers (in most cases certificates say they're also for TLS Clients, and sometimes other things too, but Chrome checks it says specifically TLS Servers here)
- The certificate has a Subject Alternative Name matching the DNS name of the server, or, if the URL we're resolving is for a numeric IP address, a SAN matches that IP Address. SANs are typed, and are not free human text, so a DNS SAN and an IP address SAN are distinguishable even if 10.20.30.40 was a valid DNS name, which it is not. Old-fashioned "Common" names are disregarded.
- The certificate must be signed by a trusted CA (or by an intermediate which in turn was trusted, and so on recursively).
I think I hit the big ticket items, there might be some others.
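To make the list above concrete, here is a toy model of those checks in Python. It is only a sketch, not how Chrome implements anything: the `Cert` record, its field names, and the exact cutoff date of 2020-09-01 for the 398-day rule are my assumptions, and the real policy is stricter about how many SCTs are needed and which logs they come from.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from ipaddress import ip_address
from typing import List

# Assumed cutoff for the "some time in 2020" policy change mentioned above.
POLICY_CHANGE = datetime(2020, 9, 1)


@dataclass
class Cert:
    """Hypothetical, already-parsed view of a certificate."""
    not_before: datetime
    not_after: datetime
    dns_sans: List[str] = field(default_factory=list)
    ip_sans: List[str] = field(default_factory=list)
    server_auth: bool = False            # EKU lists TLS server authentication
    sct_count: int = 0                   # number of SCTs presented
    chains_to_trusted_root: bool = False


def max_lifetime(cert: Cert) -> timedelta:
    """398 days for certs issued since the policy change, else 825 days."""
    return timedelta(days=398 if cert.not_before >= POLICY_CHANGE else 825)


def dns_match(pattern: str, host: str) -> bool:
    """Exact match, or a wildcard covering exactly one left-most label."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        label, _, rest = host.partition(".")
        return bool(label) and rest == pattern[2:]
    return pattern == host


def san_matches(cert: Cert, host: str) -> bool:
    """SANs are typed: IP targets only match IP SANs, names only DNS SANs."""
    try:
        target = ip_address(host)
    except ValueError:
        return any(dns_match(p, host) for p in cert.dns_sans)
    return any(ip_address(s) == target for s in cert.ip_sans)


def problems(cert: Cert, host: str, now: datetime) -> List[str]:
    """Return the reasons this toy model would reject the certificate."""
    found = []
    if cert.sct_count < 1:
        found.append("no SCTs")
    if not (cert.not_before <= now <= cert.not_after):
        found.append("outside validity period")
    if cert.not_after - cert.not_before > max_lifetime(cert):
        found.append("lifetime too long")
    if not cert.server_auth:
        found.append("not for TLS servers")
    if not san_matches(cert, host):
        found.append("no matching SAN")
    if not cert.chains_to_trusted_root:
        found.append("untrusted issuer")
    return found
```

Note that a certificate whose only SAN is the DNS-typed string 10.20.30.40 does not match a connection to the IP address 10.20.30.40 in this model, which is the typed-SAN point from the list.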
Everybody indirectly gets value from CT because it allows the public to exercise their oversight over the Web PKI more effectively, and thus it acts as a guarantee over the behaviour of the CAs.
As a researcher it lets you call "bullshit" easily. Many years ago I actually used the logs to show that CAs "missed" (did an inadequate job of identifying and revoking) some certificates they should have revoked months earlier. If you attempted that without logs you'd never be sure how much you missed, and you'd have no fixed scope for the work.
[[ As to getting hit by bots, keep in mind that passive DNS can cause that too, if anybody (testers, third party services, anybody) does a DNS lookup for name X on most well known public DNS servers, the server operators sell the DNS answer + timestamp (but not who asked so this is not personal information), which you can then buy as a bulk service. So e.g. some-uuid-on-a.server.example gets looked up on an iPhone over somebody's ISP, and within seconds a feed has added some-uuid-on-a.server.example A 10.20.30.40 to the list of known DNS answers. ]]
You can use these monitors (although not, for much longer, the one operated by Google), or you can build your own. If you build your own you can choose to specialise. Maybe you only care about certificates for names under .horse, or you only care about certificates from for-profit companies, or with RSA keys.
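As a rough illustration of a specialised monitor, the filter below keeps only entries with a name under a given suffix and, optionally, a given key type. The entry shape (a dict with `dns_names` and `key_type`) is hypothetical; a real monitor would first have to parse log entries into something like it.

```python
from typing import Iterable, Iterator, Optional


def under_suffix(name: str, suffix: str) -> bool:
    """True if name is the suffix itself or a subdomain of it."""
    name = name.lower().rstrip(".")
    suffix = suffix.lower().strip(".")
    if name.startswith("*."):
        name = name[2:]  # treat a wildcard as the domain it covers
    return name == suffix or name.endswith("." + suffix)


def interesting(entries: Iterable[dict], suffix: str,
                key_type: Optional[str] = None) -> Iterator[dict]:
    """Yield hypothetical parsed entries that match the monitor's interests."""
    for entry in entries:
        if key_type is not None and entry.get("key_type") != key_type:
            continue
        if any(under_suffix(n, suffix) for n in entry.get("dns_names", [])):
            yield entry
```

Matching on a label boundary matters here: `seahorse.example` should not count as being under `.horse`.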
Thanks, the passive DNS is a good point. I hadn't thought about that.
I know there are CT search services like crt.sh, but is it practical to download the raw data and search it locally? If the logs are append‐only, it feels like a perfect use case for rsync.
https://github.com/SSLMate/certspotter/ possibly?
Certspotter is basically just crawling for matches, it doesn't retain them.
It's impractical due to the size of the data. You're talking hundreds of millions, or billions of certificates.
Bit more than 6 billion.
And yes, it's quite challenging. It would be a nice service to the community if someone would host downloadable dumps of CT logs.
100 billion * 2048 bytes per cert is just 205 terabytes.
That’s not that impractical. It could easily be stored on GCS and S3 with requester pays for the API and any associated egress charges.
205 terabytes at 2 cents per GB/month is just $50k/year.
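For anyone who wants to redo the arithmetic with their own numbers, here it is as a small Python sketch. The certificate count, average size, and price are the figures quoted in this thread, not measurements, and decimal terabytes are assumed.

```python
# Back-of-the-envelope storage estimate using the figures from this thread.
CERT_COUNT = 100_000_000_000   # generous upper bound quoted above
BYTES_PER_CERT = 2048          # assumed average certificate size
BYTES_PER_TB = 10**12          # decimal terabytes (an assumption)
GB_PER_TB = 1000


def total_terabytes(cert_count: int, bytes_per_cert: int) -> float:
    """Raw size of the corpus in decimal terabytes."""
    return cert_count * bytes_per_cert / BYTES_PER_TB


def yearly_cost_usd(terabytes: float, usd_per_gb_month: float) -> float:
    """Storage-only cost per year; ignores egress and request charges."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12


size_tb = total_terabytes(CERT_COUNT, BYTES_PER_CERT)
print(f"{size_tb:.1f} TB")
print(f"${yearly_cost_usd(size_tb, 0.02):,.0f}/year at $0.02 per GB-month")
```

Plugging in the "bit more than 6 billion" figure instead gives roughly 12 TB at the same average size, so the 100 billion estimate leaves a lot of headroom.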
And FTP is a perfectly viable and straightforward alternative to Dropbox. You just need to...
Or purchase 50x4TB drives for $10k USD, or 100 if you want RAID-1. These can easily fit into 4-8U of rack space, at around $1,500 or less per rack U (don't buy new / expensive servers if all you need is high-IO NAS storage hosts for cephFS).
This will have minimal recurring fees, but it will cost you time to manage instead of paying $CLOUDVENDOR $50k/year. Please evaluate the tradeoffs for your scenario, but good God, don't blindly default to the cloud without first thinking carefully about it.
See also related: Today's HN top article is "Just say Yes to self-hosting"
https://news.ycombinator.com/item?id=30781536
It's $12,300/year at Backblaze.
https://www.backblaze.com/b2/cloud-storage-pricing.html
According to https://groups.google.com/g/certificate-transparency/c/iU6SH..., there is no easy direct download. But you can just use the get-entries endpoint repeatedly and/or in parallel to download however many you want.
Yes, you can use the certificate-transparency go code to pull down from the trillian API https://github.com/google/certificate-transparency-go/blob/m...
You would need to know the index, or you could just iterate over a range.
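A minimal sketch of the "iterate over a range" approach in Python, using the `get-entries` endpoint from RFC 6962. The log URL is whatever log you point it at (the one in the usage note is a placeholder), the batch size is an arbitrary guess, and there is no retry or rate limiting, which a real crawler would need.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Tuple


def batch_ranges(start: int, end: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) index pairs covering start..end.

    Useful for handing independent ranges to parallel workers.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    first = start
    while first <= end:
        last = min(first + batch_size - 1, end)
        yield first, last
        first = last + 1


def get_entries_url(log_url: str, first: int, last: int) -> str:
    """Build an RFC 6962 get-entries URL; start and end are both inclusive."""
    return f"{log_url.rstrip('/')}/ct/v1/get-entries?start={first}&end={last}"


def fetch_entries(log_url: str, first: int, last: int) -> List[dict]:
    """Fetch one batch; each entry carries base64 leaf_input and extra_data."""
    with urllib.request.urlopen(get_entries_url(log_url, first, last), timeout=30) as resp:
        return json.load(resp)["entries"]


def crawl(log_url: str, start: int, end: int, batch_size: int = 256,
          fetch: Callable[[str, int, int], List[dict]] = fetch_entries) -> Iterator[dict]:
    """Walk start..end sequentially.

    A log may return fewer entries than requested, so advance by the number
    actually received rather than by batch_size.
    """
    next_index = start
    while next_index <= end:
        last = min(next_index + batch_size - 1, end)
        entries = fetch(log_url, next_index, last)
        if not entries:
            break
        yield from entries
        next_index += len(entries)
```

Usage would look something like `for entry in crawl("https://ct.example/log", 0, 999): ...`, with the upper bound taken from the log's `get-sth` tree size minus one.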
Do you think that CT log data replication would be more secure and efficient if the CT logs were stored by a zero trust distributed application like a blockchain, instead of Merkle signatures in a database owned by one party (that's now discontinuing free indexing, at least)?
From "Oak, a Free and Open Certificate Transparency Log" (LetsEncrypt 2019) https://news.ycombinator.com/item?id=19920002 :
> Trillian is a centralized Merkle tree: it doesn't support native replication [...] According to the trillian README, trillian depends upon MySQL/MariaDB and thus internal/private replication is as good as the SQL replication model (which doesn't have a distributed consensus algorithm like e.g. paxos).
And what about indexing and search queries at volume, again without replication?
From "A future for SQL on the web" https://news.ycombinator.com/item?id=28158491 :
> https://thegraph.com/docs/indexing
>> Indexers are node operators in The Graph Network that stake Graph Tokens (GRT) in order to provide indexing and query processing services. Indexers earn query fees and indexing rewards for their services. They also earn from a Rebate Pool that is shared with all network contributors proportional to their work, following the Cobbs-Douglas Rebate Function.
>> GRT that is staked in the protocol is subject to a thawing period and can be slashed if Indexers are malicious and serve incorrect data to applications or if they index incorrectly. Indexers can also be delegated stake from Delegators, to contribute to the network.
>> Indexers select subgraphs to index based on the subgraph’s curation signal, where Curators stake GRT in order to indicate which subgraphs are high-quality and should be prioritized. Consumers (eg. applications) can also set parameters for which Indexers process queries for their subgraphs and set preferences for query fee pricing.
FWIW, here's how OSV affords search queries: https://github.com/google/osv#data-dumps
> For convenience, these sources are aggregated and continuously exported to a GCS bucket maintained by OSV: gs://osv-vulnerabilities
> This bucket contains individual entries of the format gs://osv-vulnerabilities/<ECOSYSTEM>/<ID>.json as well as a zip containing all vulnerabilities for each ecosystem at gs://osv-vulnerabilities/<ECOSYSTEM>/all.zip
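> E.g. for PyPI vulnerabilities:
# Or download over HTTP via https://osv-vulnerabilities.storage.googleapis.com/PyPI/all.zip
gsutil cp gs://osv-vulnerabilities/PyPI/all.zip .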
Hopefully, with an incentivized blockchain indexing service and/or e.g. GCS buckets that you just always `cp`, load locally, and then query locally, we can find a solution for queries of the growing Certificate Transparency (CT) logs.
How many people here do any sort of regular review of CT logs, and do you really think an attacker couldn't get away with whatever their attack was by the time you found out?
I did for a long time but I stopped. Real time is still super useful though. I have Cloudflare and Let's Encrypt email me now. Mostly it's been helpful for tracking down old boxes/services that are still running that have been lost/forgotten.
Review or get emailed when a cert is issued with your domain in it somewhere.
Secure-corpdomain.com etc?
This was not very satisfying. Why?
Not much more data in the Google Groups archive. https://groups.google.com/g/certificate-transparency/
Before the inevitable flood of the super insightful and valuable comments like "hurr durr another google product killed", let me ask:
Is there a good alternative frontend for these queries? As far as I understand, the underlying data is still public and available from many sources, but this was the only easy-to-use search I know of.
Cloudflare sells some services to notify you on changes to your domains, but what about just checking on it manually, or exploring history of other domains?
https://crt.sh/
I didn't even know Google had a CT search. The search results are better presented/more accessible on crt.sh from my one minute of testing.
Does Let's Encrypt offer similar functionality for CT queries? Could they if donations were directed to building said functionality?
Certificates generated via Let's Encrypt already end up in the Certificate Transparency log. You can check this yourself via https://crt.sh
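If you'd rather check from a script than the web page, crt.sh can return JSON when `output=json` is added to the query. The sketch below (Python) assumes each row has a `name_value` field holding newline-separated names, which is what I've seen it return; treat the field name and the lack of any rate limiting as assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Set


def crtsh_url(domain: str, include_subdomains: bool = True) -> str:
    """Build a crt.sh JSON query URL; '%' is crt.sh's wildcard character."""
    query = f"%.{domain}" if include_subdomains else domain
    return "https://crt.sh/?" + urllib.parse.urlencode({"q": query, "output": "json"})


def names_from_rows(rows: List[dict]) -> Set[str]:
    """Collect the distinct names listed across all returned rows."""
    names = set()
    for row in rows:
        # Assumed format: one certificate per row, names separated by newlines.
        for name in row.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return names


def search(domain: str, timeout: float = 60.0) -> Set[str]:
    """Query crt.sh and return every name seen on certificates for domain."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return names_from_rows(json.load(resp))
```

Calling `search("example.com")` makes a live request, so expect it to be slow or to fail with a 5xx when the site is overloaded.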
I am aware of that, but as a non-profit providing certificate services, it aligns with their mission (imho) to also provide CT log storage and search functionality (not to denigrate crt.sh, which is good tooling, but it's yet another for-profit org [Sectigo.com, formerly Comodo CA] benevolently providing the service for now).
Personally, I think it's good to move away from Google, Facebook, and similar providing these core internet services. Throw some money at Let's Encrypt, the Internet Archive, and whoever is going to run Elasticsearch for this corpus, and call it a day. The primitives (compute and storage) are cheap.
FTR, any certificate that wants to be recognized by Chrome ends up there.
There's a zillion of them, right? crt.sh is maybe the best known, but if you Google you'll find a bunch (for instance Facebook runs one).
crt.sh came up but I was getting 5xx errors. Looks like it might be overloaded now, or I picked "expensive" queries.
But thanks for the recommendation, I'll try it some time later.
Edit: Regarding the facebook one: "Log into Facebook to use this tool." Yeah no thanks.
Some alternatives:
* https://crt.sh/
* https://developers.facebook.com/tools/ct
* https://ui.ctsearch.entrust.com/ui/ctsearchui
> Log into Facebook to use this tool.
Not sure I'd even list that as a reasonable alternative.
Feel free to not use it.
https://ui.ctsearch.entrust.com/ui/ctsearchui is the only one I could get to work, but I uMatrix things so it could be my own fault.
I'd just like to chime in with my own alerting service:
https://ctadvisor.lolware.net/ https://github.com/technion/ct_advisor
Building this was exciting. I've been proud to run it, and it predated services from Facebook or Cloudflare. I'm not sure what the future should hold, however; it's hard to bring it up anywhere without being told people should just use Facebook.