Show HN: Hacker News user blogroll

dm.hn

937 points by deathbypenguin 2 years ago

I saw this [0] pretty cool thread by user revskill, and wanted a quicker way to search through it, but also to keep them all in one place so I can read them at my leisure whenever I get time.

Right now it's about 60 lines of Ruby using Nokogiri, but I will certainly look into it further down the line and improve the list.

There's a cronjob checking the thread every 12 hours but I will eventually shut that down and it will become static after that.

There are some really awesome blogs in there. I really recommend going through the list, it made my day.

[0] "Could you share your personal blog here". https://news.ycombinator.com/item?id=36575081

honzabe 2 years ago

Lately, I feel like I am overwhelmed by content, and yet it is increasingly rare to find something authentic, something that is not either made just as a vehicle for advertising or designed to attract attention and likes on social media.

I was re-reading The Royal Road to Romance by Richard Halliburton recently. I love that book and if he lived today, he would be a travel blogger... so I tried to find a blog that would feel like that. And google search gives me more travel blogs than I can absorb, but they all feel like products.

I haven't had the time to go through blogs by HNers carefully yet but I hope there might be some gems in that pile. HN attracts a certain kind of people and if blogs written by them differ from the rest of the internet the same way HN itself does, that would be great.

I like this idea very much. Thank you, the author of that original thread, and thank you, the creator of https://dm.hn

  • revskill 2 years ago

    My general idea on asking this question, is how to answer the questions:

    - How to add comments to my blog post ? => Just add a link to your blog post here.

    - How to upvote on a blog / blog article ? => Just use HN.

    - How to aggregate to facilitate search/categorization ? => There's a site here. Because Google Search sucks so hard now.

    - In case of LLM feeding, you own your own policies and privacy on your own data.

    Thank you for joining.

  • kmarc 2 years ago

    that's why I find it refreshing to read... Books. As opposed to blogs. My feed reader now has easily a thousand unread entries, because I'm also overwhelmed with their nature of "vehicle for advertising"* (sometimes advertising just themselves)

    Books are authored, proof-read, and since you already paid for them, chances are lower that you'll find this advertising feel to them.

    * I love this expression!

    • honzabe 2 years ago

      Absolutely - I've been rediscovering books lately. Although it is hard - the years of interneting did a number on my attention span and habits.

      I used to be a voracious reader when I was a kid. That book I mentioned - the Royal Road to Romance - I remember how smitten I was by that book when I was 14. Now it felt really slow (compared to YouTube shorts) and I had trouble getting into it. But once I did, the sparks of the old excitement appeared again. Completely different feeling than after 2 hours of YouTube shorts.

      Books are great but I am sure that there is a lot of content on the internet written because someone genuinely wanted to say something (and BTW, not that books written as products are that rare either). It's just harder and harder to find it.

      • gcanyon 2 years ago

        You might enjoy The Long Ride by Lloyd Sumner -- dude set off on a bicycle with a couple hundred dollars in the early '70s and bicycled around the world.

  • safety1st 2 years ago

    Just reviewing what I've got added to my feed reader:

    * tilde.news

    * Lobsters

    * Slashdot

    * lemmy.sdf.org

    * the linux sub on lemmy.ml

    * A selection of the less annoying subreddits, like r/askphilosophy

    * A selection of local news websites for where I live

    * A selection of blogs written by random people who I think are interesting

    * Hackaday

    * indieretronews.com

    * Hacker Public Radio

    * HN of course

    * And other random stuff. And dm.hn is probably going to be amazing when I have some time to comb through it

    In the event that none of it's interesting, I pop open a Gemini client and just start clicking around, I always find the most random long ramblings. The Lagrange client in particular is a very refreshing reading/browsing experience.

    Internet content has never been better and I don't feel overwhelmed by inauthentic stuff at all, I know that there's a lot of it out there, but it rarely reaches my eyeballs, mainly through the now-decaying morass that is Reddit sometimes.

    Mind you it took me years to come up with the list of feeds I like and it's very personal to my interests, but it's always just been a text file that I edit so it was easy.

amadeuspagel 2 years ago

The latest posts from these blogs: https://webloglist.com/hn

  • jakebasile 2 years ago

    That's cool! Did you pull RSS from all the sites you could and use that to aggregate it?

    • amadeuspagel 2 years ago

      Yes, webloglist uses RSS autodiscovery.

      • darekkay 2 years ago

        It seems the autodiscovery didn't work for my blog (link in profile). I posted something 2 days ago but it doesn't appear on your site. My feed is on the list from JSTucker, who also used some sort of autodiscovery.

        • amadeuspagel 2 years ago

          Atom isn't supported yet. Working on it.

          EDIT: Atom is supported now, but I haven't updated the list yet.

  • rambambram 2 years ago

    Nice list! I was almost going to ask you if you have an OPML file with all the feeds, but then I decided to check the list manually for interesting latest posts and grab only their feeds. Thanks for the list!

  • addandsubtract 2 years ago

    Now we just need ChatGPT to read them all and give us a daily update on the interesting ones.

  • freediver 2 years ago

    Any chance for an RSS of this?

    • tlavoie 2 years ago

      Sort of a meta-feed, for those with feeds of their own?

minebreaker 2 years ago

Just an idea. Wouldn't it be great to have a standard format for a user profile for automated discovery?

Something like:

  Any random string.
  [age]: xx
  [location]: xxx  # city or country or geohash or whatever
  [email]: foo@xxx.com  # can be obfuscated
  [blog]: https://xxx
  I use the format!  # magic tag to indicate you are following the standard
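
A minimal sketch of how a scraper might read a profile written this way, assuming the `[key]: value  # comment` layout proposed above. The function name `parse_profile`, the two-space comment separator, and the exact magic-tag string are illustrative choices, not an agreed standard.

```python
import re
from typing import Dict, Optional

# Hypothetical marker line, copied from the proposal above.
MAGIC_TAG = "I use the format!"

# Matches lines such as "[blog]: https://xxx".
FIELD_RE = re.compile(r"^\[(?P<key>[^\]]+)\]:\s*(?P<value>.*)$")


def parse_profile(about: str) -> Optional[Dict[str, str]]:
    """Return the tagged fields, or None if the magic tag is missing."""
    # Drop trailing "  # comment" parts and surrounding whitespace.
    lines = [line.split("  #", 1)[0].strip() for line in about.splitlines()]
    if MAGIC_TAG not in lines:
        return None
    fields = {}
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            fields[match["key"].lower()] = match["value"]
    return fields
```

Free text such as "Any random string." is ignored, so existing profile prose would not need to change.
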
  • flurdy 2 years ago

    I wouldn't put age in there. Not for PII reasons but I doubt I would remember to update it every year... My own website I changed to just say what decade my age is to give me some leeway.

    Good idea though will change mine now.

    The profile is used by other things as well such as keybase verification etc.

    • khimaros 2 years ago

      perhaps birth year?

  • tiim 2 years ago

    There is! It's called microformats[1] and is a very minimal format to embed machine readable data inside of html via standardized class names. The format for a person would be an h-card[2]. There are a bunch of parsing libraries for multiple programming languages, such as https://go.microformats.io/.

    For example if you enter my website url in there you get all the data as a nice json object: https://go.microformats.io/?url=https%3A%2F%2Ftiim.ch

    [1]: http://microformats.org/ [2]: http://microformats.org/wiki/h-card

LordDragonfang 2 years ago

Since hn karma probably correlates to how much hn readers would enjoy a blog, I'd love a column/columns that includes the user's:

   - hn profile karma
   - total karma of posts from that domain
   - as above, but Sum(log(post_karma[i]))

...or something similar.

Whatever is feasible. For a while I've wanted a list of "blogs/domains that hn likes" that isn't polluted by general-high-traffic domains.
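
A minimal sketch of the Sum(log(post_karma[i])) idea, assuming the input is a list of (url, karma) pairs already collected from somewhere such as a search API. The function name `domain_scores` and the rule of skipping posts with karma below 1 are assumptions of this sketch.

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit


def domain_scores(posts):
    """Sum log(karma) per domain for an iterable of (url, karma) pairs.

    Posts with karma below 1 are skipped, because log is undefined for
    non-positive values; that cutoff is an assumption of this sketch.
    """
    scores = defaultdict(float)
    for url, karma in posts:
        host = urlsplit(url).hostname
        if host and karma >= 1:
            scores[host] += math.log(karma)
    return dict(scores)
```

With this scoring, five posts of 10 points each outrank a single post of 1000 points, which is the dampening effect the log is meant to give.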

  • ploum 2 years ago

    That’s an awesome idea, I would be really curious to see the result.

    (hoping that this does not backfire, for example encouraging people to spam HN with their own posts to gain some karma on the blogroll)

    • LordDragonfang 2 years ago

      To address the exact situation in your parenthetical, I considered putting Sum(log(post_karma[i]-k)), for a k such that the expected log value is negative unless you get enough upvotes.

  • deathbypenguin 2 years ago

    I feel I'm going to open source this so people can add their own functionality. I will need to refactor first since I just hacked this together rather quickly.

  • deathbypenguin 2 years ago

    Karma is there for now to sort by... I'll see about the rest later on.

jefftk 2 years ago

Neat! It looks like something is broken with unicode handling? For example the "smart" apostrophe in https://news.ycombinator.com/item?id=36594375 (U+2019, RIGHT SINGLE QUOTATION MARK) is being rendered as "â€". Perhaps something is interpreting utf-8 as latin1?
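
A small sketch that reproduces this kind of breakage, assuming the bytes are decoded as Windows-1252 (`cp1252`) rather than strict Latin-1. That choice is an inference from the euro sign in the garbled output, not something confirmed about the site.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, as in the comment cited above.
original = "\u2019"

# UTF-8 encodes it as three bytes: E2 80 99.
raw = original.encode("utf-8")

# Decoding those bytes with a single-byte codec yields one character per byte.
garbled = raw.decode("cp1252")

# If no bytes were lost, reversing the wrong step recovers the original.
repaired = garbled.encode("cp1252").decode("utf-8")

print(raw.hex(), repr(garbled), repr(repaired))
```

The repair step only works while all three bytes survive, so the more durable fix is to decode the fetched page as UTF-8 in the first place.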

  • deathbypenguin 2 years ago

    Indeed. I'm having a fight with that at the moment and the line breaks as well.

    • sillysaurusx 2 years ago

      Actually, I recognize that specific breakage (a-box), as I’ve had to deal with it in my game engine. The problem is that something is interpreting each byte of a utf-8 encoded string as a separate character. That’s why some bytes show up as â and others are boxes: â is one of the non-English characters that is still a valid single-byte character in Latin-1 (though not in ASCII).

      The fix is to tell your framework to decode in utf-8 mode. I don’t use ruby, but in python it’s encoding='utf-8'. In C++ it’s to convert to wstring, then operate on wchar_t.

      Unicode problems are mysterious, but I find it quite gratifying to solve them. At least nowadays. I used to find them incredibly annoying. But it’s pretty cool seeing any language be rendered by your app.

zrkrlc 2 years ago

Probably better to sort by karma by default though, otherwise it's alphabetical privilege all over again lol https://eric.ed.gov/?id=EJ905588

  • honzabe 2 years ago

    I think you are right (although I now wish I had not abandoned my old account). However, I have always felt that karma systems are too biased towards old users who had more time to accumulate karma. If only the karma system had some "aging" of results built in, like the tennis ranking system.

    And I am only partially saying that because I have a new account here.

    • bredren 2 years ago

      It looks like the tennis ATP ranking mostly drops rank points 52 weeks after they are gained.

      HN karma is not publicly associated with dated posts, (I think only the account holder can see that)

      But account creation date and total karma are public, so some derivation is possible.

      Something like:

      RP (ranking periods) = number of 52-week spans in HN account lifetime

      avg_k (average karma per RP) = AK / RP, where AK = accumulated (total) karma

      Living Karma = AK - avg_k * RP

      Curious if this makes sense / there are better ways to do it w public info.

      If the private info were avail, it could drop karma from archive link comments, etc.
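
      A rough sketch of one reading of the formula above. Taken literally, AK - avg_k * RP cancels to zero, so this version subtracts every period except the most recent one. That reading, the constant-rate assumption, and the name `living_karma` are all assumptions rather than anything HN computes.

```python
def living_karma(total_karma: float, account_age_days: float) -> float:
    """Estimate the karma still 'alive' in the last 52 weeks from public data.

    Assumes karma was earned at a constant rate, which real accounts
    rarely do. Accounts younger than one period keep all their karma.
    """
    period_days = 52 * 7
    periods = max(account_age_days / period_days, 1.0)
    avg_per_period = total_karma / periods
    # Drop every period except the most recent one.
    return total_karma - avg_per_period * (periods - 1.0)
```

      Under these assumptions a ten-period-old account with 10000 karma and a new account with 1000 karma end up with the same score.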

  • flurdy 2 years ago

    "Alphabetical privilege", nice. As someone whose surname starts with AB, I would also call it "alphabetical curse". E.g. back when I was young, teachers would describe what to do and then often call each pupil up alphabetically to do it, and I was always first, and most of the time had not actually listened to the instructions...

    • c0wb0yc0d3r 2 years ago

      I was in the same boat. It took me a long time to realize that it was better to set "the bar" rather than meet or exceed "the bar."

jmmv 2 years ago

Nice!

Any way we can update the description? For my case, what I sent to the original thread doesn't necessarily describe the blog :)

Also, a suggestion: a raw list of usernames like this, sorted alphabetically, can lead to gamification where people choose names that rank first to ensure they show up on the first page. In the past, when showing similar lists, I've implemented randomization so that no one person has an advantage.
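
A minimal sketch of that kind of randomization, assuming the blogroll is an in-memory list of entries. The function name `first_load_order` and the optional `rng` parameter are illustrative.

```python
import random


def first_load_order(entries, rng=None):
    """Return a shuffled copy so no username is favored on first load."""
    if rng is None:
        rng = random.Random()
    shuffled = list(entries)  # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled
```

Passing a seeded `random.Random` makes the order reproducible, which can help when testing.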

  • sen 2 years ago

    Whenever I've done lists like this, I make the first-load randomise but give the user options to reorder the list in various ways (alphabetical asc/desc, latest activity, etc). I think that's the best way to stop too-blatant gamification.

surprisetalk 2 years ago

I just made something similar!

https://blogs.hn

  • yogsototh 2 years ago

    This is pretty nice, but for my blog I noticed the latest articles are wrong. Check yannesposito.com

    • surprisetalk 2 years ago

      It looks like the description I pulled from your site was "Most recent articles", which I think made things confusing haha

        <meta name="description" content="Most recent articles">
      

      If you look at fetch.js in the repo, it just pulls the top posts from Algolia search.

  • jdsalaro 2 years ago

    what criteria did you use to create it? If I'm not mistaken mine's missing :)

voigt 2 years ago

I’m missing the good old time of webrings. This is very close :)

Nadya 2 years ago

I'm in the middle of making it easier for me to write so that I'll actually write more. :) so there's only 1 really old post currently.

https://nadyanay.me/blog

The subject matter I have planned is more on retro/small web projects and a store for well researched posts where I'm sick of having to find studies over and over to cite as sources. Easier to quote myself than write the same post for the 50th time.

ghoomketu 2 years ago

Looks great and congrats on shipping. If it were up to me I'd still be deliberating the best framework and design to use for this, and how I can pipe the comments through chatgpt to extract the category, keywords and do things that make it the best blogroll ever.

And then I would have just thought it's too much work for nothing and that'd be the end of it :P

  • myth2018 2 years ago

    Dude I can definitely relate

  • deathbypenguin 2 years ago

    haha I had the exact same ideas, but then I was "bah! I'll put it out there and I'll add functionality over time"

    • swyx 2 years ago

      did you just have the .hn TLD standing by? where is that from? must have cost a pretty penny

      • deathbypenguin 2 years ago

        it was parked for a year... I'm supposed to renew next week. 100US/year :)

        • swyx 2 years ago

          how much did it take to buy? i just checked on namecheap for the one i want and they wanted 5k for it to start… :/

          • deathbypenguin 2 years ago

            I bought it for like 100US on iwantmyname... maybe check over there...

            • swyx 2 years ago

              thank you!!

  • willhackett 2 years ago

    Live updating would be awesome. This could probably be done with a Cloudflare Worker, D1 (in Alpha, but still cool) and a Cron.

    Remix.run is a brilliant framework for running React on Workers.

  • Hrundi 2 years ago

    Analysis paralysis is very cruel. Many of my side projects died because of this, or just got stuck in development hell, even while interest was high.

    Looking back, some would have made good money if I had just released them

  • wey-gu 2 years ago

    This is totally me :p

1attice 2 years ago

Naked self-promotion here, but I was late to the party on the original blogroll -- is there any way to add blogs post-ex-facto? Is there a submission mechanism?

  • eigenhombre 2 years ago

    Same, I think this is a great idea and would like to submit mine as well -- maybe an "add" feature on the page would make sense, or re-inquire here at intervals, maybe monthly/yearly?

  • TOGoS 2 years ago

    I commented, but mine got missed, somehow. Maybe because my phone auto-capitalized the "H" in "http" and the script didn't account for funny capitalization. Sad!
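
    A small sketch of URL extraction that tolerates an auto-capitalized scheme. How the original script matched URLs is not known, so the pattern and the name `extract_urls` are assumptions.

```python
import re

# Scheme matched case-insensitively so "Http://" and "HTTPS://" also count.
URL_RE = re.compile(r"https?://[^\s<>\"]+", re.IGNORECASE)


def extract_urls(comment_text):
    """Return URLs found in a comment, with the scheme lower-cased."""
    urls = []
    for match in URL_RE.finditer(comment_text):
        scheme, rest = match.group(0).split("://", 1)
        urls.append(scheme.lower() + "://" + rest)
    return urls
```

    Only the scheme is normalized here; the rest of the URL is left as typed because paths can be case-sensitive.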

  • scarface_74 2 years ago

    You could make it so you have to have your blog in your HN profile and have a karma of at least $x to reduce spam.

  • slushh 2 years ago

    >There's a cronjob checking the thread every 12 hours but I will eventually shut that down and it will become static after that.

    Which post-ex-facto? You should still be able to add your blog to the original submission.

deathbypenguin 2 years ago

Added feeds thanks to JSTucker. They are being fetched from the Gist. I think the cron ran, so there are more blogs now.

json: https://dm.hn/blogroll.json (I'll add the feed to each item in a minute)

smokel 2 years ago

Great work, thank you for sharing this.

I would prefer to see the entire list, so that I can easily search for keywords in the browser. Apparently, all data is available on the client side, but the table renderer seems to limit the table size to at most 100 entries.

  • ryan-duve 2 years ago

    A workaround while you're waiting for this to be supported by OP is to go to inspector and change the last dropdown option to

        <option value="10000">10000</option>
    

    then select it in the UI.

abathur 2 years ago

Hmm. Any idea why some wouldn't show up? I posted in https://news.ycombinator.com/item?id=36588940 but don't see it in the list.

  • toyg 2 years ago

    Same for me. Maybe the scraper choked on pagination, maybe they just took a snapshot before we posted.

  • abathur 2 years ago

    Ah. Mine is up, now, though the entry shows some sort of parse break, maybe around newlines.

    Sorry for being an edge case :)

do-me 2 years ago

That's awesome and so much more practical than scrolling through HN. It would also be possible to integrate semantic search so people don't necessarily need to know the keywords. If you're interested, feel free to ping me or take a look at https://github.com/do-me/SemanticFinder. In that case I could just create a pre-indexed version based on your data dump, which would be quite convenient to use.

jcnoel 2 years ago

Wow, mine made it. Now I really need to keep the sucker up to date. Thanks.

  • epiccoleman 2 years ago

    Heh, I'm having the same feeling.

stoyko 2 years ago

I saved that original link in the hopes of going through it but this is much better. Saving this instead.

ksec 2 years ago

Off Topic : I just checked the .HN domain and it is 100 EURO per year.

  • RomanHauksson 2 years ago

    I used to own roman.hn since my last initials are H-N, but I switched to roman.computer after I dropped my second last name (it's hyphenated).

    Technically you can't own a Honduran domain name if you're some rando American like me, but you can use a registrar like Njalla, which legally owns it for you but lets you control it.

re 2 years ago

This made me think of "planets", which I feel had a heyday back in the late 2000s before Reddit and social media took over everything. Anyone want to take all the blogs with RSS/Atom feeds and build an HN planet? :)

> In online media a planet is a feed aggregator application designed to collect posts from the weblogs of members of an internet community and display them on a single page

https://en.wikipedia.org/wiki/Planet_(software)

  • ploum 2 years ago

    Yeah, planets were awesome. I’m proud to say that my blog was both on planet.gnome (the original one) and planet.ubuntu.

    Now, I feel that the most interesting planet is planet.debian, which offers a lot of variety without being focused on Debian.

    The great feature I liked was that planets were not about a given project. They were about the people contributing to the project. Their life. Their interests.

    At some point, a lot of planets started to ask for only "on-topic" posts with a specific RSS feed. Those planets became boring, as it was mainly stuff you could find on forums or any tech-related website.

    • bthallplz 2 years ago

      Yes! I've loved Planet Python[0] because it really lets you see that the Python community is quite varied, fun, and human.

      [0]: https://planetpython.org

ghomem 2 years ago

Clap clap clap. This is excellent public service @deathbypenguin. Yesterday I was scrolling through that enormous thread and using control+F to look for keywords of interest on the posted blog descriptions. Now it will be much easier to follow fellow bloggers. Thanks for having my blog on your list too.

leejoramo 2 years ago

This is great. An OPML version of this would be great to bulk IMPORT the RSS/ATOM feeds into your favorite feed reading app.
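
A minimal sketch of such an export, assuming each blog is available as a (name, site URL, feed URL) tuple. The function name `build_opml` and the default title are illustrative, and this is not the script behind any OPML file shared in this thread.

```python
import xml.etree.ElementTree as ET


def build_opml(feeds, title="HN blogroll"):
    """Build an OPML 2.0 document from (name, site_url, feed_url) tuples."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, site_url, feed_url in feeds:
        # One <outline> per feed; ElementTree escapes attribute values.
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=name,
            htmlUrl=site_url,
            xmlUrl=feed_url,
        )
    return ET.tostring(opml, encoding="unicode")
```

Building the tree with a library instead of string concatenation avoids broken files when a blog name contains characters such as `&` or quotes.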

syx 2 years ago

I would add a shuffle button that opens a random blog so it’s nicer to discover something new compared to endless paginations.

  • deathbypenguin 2 years ago

    Noted. I will be correcting a few bits and adding new functionality over the next few days/weekend.

    • microtonal 2 years ago

      I missed the original topic. I’m not a very active blogger, but I am an active HNer. Any chance you could add my blog?

      https://danieldk.eu/blog/

      Nice work!

  • scastiel 2 years ago

    +1

    I would even add a “I’m feeling lucky” button, to redirect to a random blog ;)

  • deathbypenguin 2 years ago

    Random blog button up now.

    • alonsonic 2 years ago

      Love this, have been reading random blogs for the last 30 minutes already

JSTucker 2 years ago

Here's an OPML with all the feeds I could detect from the list! https://gist.github.com/Josh-Tucker/030b8cba6557927a27f1c7e6...

  • swyx 2 years ago

    if you share the code for OPML conversion maybe OP could incorporate it quickly

    • JSTucker 2 years ago

      The script I've written is a horrible hack and will never see the light of day unfortunately. (Hence all the errors when importing)

  • mjgs 2 years ago

    Thanks - I’m currently importing the whole list into Feedly, which I’ll probably regret. The user experience is so hilariously bad compared to social media, and I’m finding it funny that today Meta has released their new Threads app.

    Anyway currently over 500 new uncategorised feeds have appeared. I’ve seen some German, Vietnamese and Russian blog posts, it’s total mayhem.

    Lol

    Update: total feeds imported 692, loads and loads of errors

msteffen 2 years ago

I’m at https://prog.blog. I only have a few posts (sort of meta-software-engineering-focused. The posts are on, like, “why does programming always take longer than you think?” and “how do you make decisions while working on a project”). I have more at the draft stage that I’m hoping to publish soon!

skilled 2 years ago

Good job. I would honestly love this but with RSS feeds also, but I know it's a tough ask unfortunately. (Not for you, but in general)

  • xoranth 2 years ago

    Most blogs that have RSS also have a `<link rel="alternate" type="application/rss+xml">` tag that points to the RSS feed. If you pass the link to the homepage to a feed reader[^0], it will follow the link tag and find the RSS feed.

    [^0]: At least, Liferea on Linux, NetNewsWire and Vienna on Mac, do this. AFAIR NetNewsWire is even smarter than that, and can sometimes find the RSS feed even when there is no link tag.
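
    A minimal sketch of that autodiscovery step using only the Python standard library, assuming the homepage HTML has already been fetched. The names `FeedLinkParser` and `discover_feeds` are illustrative, and accepting the Atom type alongside RSS is a choice of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> tags with a feed type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs such as "/feed.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return every advertised feed URL, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

    The function returns all matches rather than picking one, since a page can advertise several feeds.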

    • marginalia_nu 2 years ago

      A bit of a snag is that many CMSes generate multiple feeds, and there is no way I'm aware of for identifying which is the "canonical" feed.

  • deathbypenguin 2 years ago

    I'll put the feed with the latest posts up over the weekend.

susam 2 years ago

Very interesting! Thanks for sharing your project here. Out of curiosity, I did some searches with some interesting strings. At the time of posting this comment, here is what the search results look like:

Vim: 8 entries

Emacs: 7 entries

Python: 24 entries

Rust: 24 entries

Lisp: 5 entries

Clojure: 3 entries

Haskell: 5 entries

Zig: 5 entries

Elixir: 4 entries

Scheme: 0 entries

Postgres: 4 entries

MySQL: 0 entries

SQLite: 3 entries

Jekyll: 9 entries

HTML: 40 entries

Markdown: 6 entries

LaTeX: 1 entry

Hugo: 12 entries

Next.js / Nextjs: 4 entries

Gatsby: 2 entries

Pelican: 0 entries

.com: 495 entries

.dev: 90 entries

.net: 84 entries

.io: 82 entries

.me: 53 entries

.org: 43 entries

.xyz: 15 entries

.page: 6 entries

github.io: 46 entries

medium.com: 18 entries

blogspot.com: 8 entries

wordpress.com: 4 entries

livejournal.com: 0 entries

tech: 178 entries

programming: 66 entries

random: 61 entries

thought: 49 entries

math: 16 entries

musing: 12 entries

blag: 1 entry

favorite: 28 entries

favourite: 9 entries

Now all of these results are string search results, so there is always going to be a little bit of noise when we try to draw conclusions out of these results. For example, the results for ".dev" also contains results that look like "*dev*.com".
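
A small sketch of how the TLD counts could avoid that noise, assuming each entry has a full URL: compare against the parsed hostname instead of searching the raw string. The function name `has_tld` is illustrative.

```python
from urllib.parse import urlsplit


def has_tld(url, tld):
    """True if the URL's hostname ends with the given TLD, e.g. ".dev"."""
    host = urlsplit(url).hostname or ""
    return host.endswith("." + tld.lstrip(".").lower())
```

A plain substring test for ".dev" would also match a host like blog.devnotes.com, while the hostname check does not.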

Despite the noise, I found these results interesting. I remember in the early days when the blogosphere was being constructed 20 km above the tag clouds, it was very fashionable to have blogs for random musings or random thoughts. So I am delighted to see that most blogs out here are tech blogs. Surprisingly there is only one blag. I expected at least a few more.

One of the Lisp entries is mine. Also, one of the Vim entries is mine. It is a bit ironic because I am actually an Emacs user. If I had known the comments we write on HN would become part of the search string in this blogroll, I might have chosen my words in my comment to the "Ask HN" post more judiciously! :)

  • boricj 2 years ago

    reverse engineering: 5 entries

    Ghidra: 1 entry (mine)

    On one hand it does bring some level of perspective on the popularity of a particular topic you're into. My first reaction was "Just 0.5% for reverse-engineering? I guess I'm down in a deep dark rabbit hole..."

    On the other hand, I haven't seen the blogs of Ken Shirriff, Alex Ionescu or Raymond Chen on that list, which I know are quite popular and regularly make it to the Hacker News front page.

    • saagarjha 2 years ago

      Presumably this would require them to show up on Hacker News and advertise their blog.

thomasahle 2 years ago

Has anyone done statistics on what generators / platforms people use? I currently use a mix / roll my own, but I'd love recommendations for a good setup.

In particular the features I'm after are: (1) Latex support for equations, (2) Support for code snippets, (3) Support for my own custom D3 or other javascript widgets.

  • steve_adams_86 2 years ago

    You and I want the same thing! I’m sad to say I haven’t found that yet. I’ve been considering rolling my own but I feel like so many similar things exist, there must be something out there already…

    One thing I did start building is sort of like a rudimentary code sandbox that’s geared towards running code inline to explain concepts. I tried using existing solutions, but none really do what I’m thinking of. What might be ideal is something like observable.hq with the code and results visible. I’d like to show the DOM, console, or even both, along with the supporting code.

    Anyway, that’s a while off because it’s not trivial. Sometimes I’m surprised there isn’t something obviously suitable for this and I must be missing something, but everything I’ve found so far really misses the mark.

    One thing that kills me is that I want these widgets to live as long as my writing does. So many 3rd party tools could be gone next week; I can’t waste time throwing examples in there if it’ll just wind up MIA without warning.

  • epiccoleman 2 years ago

    I rolled my own for my current blog (at epiccoleman.com). I wrote a post about it, which honestly isn't that interesting, since it basically just amounts to writing posts in regular old html.

    I did use Tailwind for styling, mostly because I was interested in learning more about it.

    I use PrismJS for styling code blocks, and it works very well. No complaints there.

    The thing I like about "just use HTML" is that it ultimately affords a ton of flexibility if I ever want to embed some interesting layout or little JavaScript demo. A good example of this is this explainer section from a post I wrote about SVG. I'm proud of how this turned out and it wouldn't have been possible to make it look as good as it does without just manually writing the markup (scroll to "Understanding SVG", I don't think I put an anchor on the heading unfortunately):

    https://epiccoleman.com/posts/2023-04-05-svg-circle-of-fifth...

    I have another post about the "tech stack" here if you're interested: https://epiccoleman.com/posts/2023-03-07-how-i-built-this-si...

  • komali2 2 years ago

    Honestly I've found that if all I want is text, images, latex, code snippets, and maybe a tiny bit of javascript, then Hugo or maybe Jekyll with static deploys to some normal ass webserver is the most consistently easy and maintainable. Beyond that just straight up HTML files.

    I've had too many blog services close on me, too many frameworks go stale and require inordinate amounts of time to update, too many deploy strategies deprecate some aspect I depended on, to want to go through all that for whatever bells and whistles I get for doing the extra effort.

    My blog is just hugo https://github.com/komali2/blog

    and my co-op's blog is just a folder of html files in our website directory lol https://github.com/508-dev/508.dev/tree/main/src/blog

akiselev 2 years ago

Our future AI overlords sincerely thank you for this pristine data set.

MichaelMoser123 2 years ago

I guess twitter and reddit are charging money for API access, so scraping of good old blogs will become more important for training those LLMs. I am not sure if I want to be part of this show.

Well, I guess that HN is also mined extensively for our utterances, resistance is futile.

TimCTRL 2 years ago

Saw the .hn domain and I was like What...HN has its own TLD. Then i searched google and saw it belongs to honduras...daft me i guess..

  • mcmcmc 2 years ago

    All two-letter TLDs are country codes.

  • airstrike 2 years ago

    hah, had to Cmd+F for this comment because I also :O'd

arthurcolle 2 years ago

Can you add a "show all" option? And a CSV download? This is a great dataset

  • deathbypenguin 2 years ago

    Done!

    • arthurcolle 2 years ago

      Can you re-sync dataset so my blog is included hehe. I posted on original thread...

oneeyedpigeon 2 years ago

It's great. Is there any real point in sorting on description or url? I guess url does group http and https, which might be useful, but description definitely seems like it would be nicer if the sort option were removed.

voigt 2 years ago

Having the latest post of each blog available is awesome, thank you for adding it.

Another killer feature on top would be to sort for latest post, so they can be ordered by date desc. This would make a great HN-meta news page :)

OliveMate 2 years ago

Thanks for this! It'll make finding everything posted in that thread much easier, and the random blog button has already sent me down a few rabbit holes (even if I don't understand half of them!).

version_five 2 years ago

Can you say what criteria you used to filter the thread into valid blogs?

kiruio 2 years ago

Cool, I forgot to add descriptions. Would be nice if I could fix it

kodah 2 years ago

Instead of making it static could you implement a submission process and liveliness checker? This seems like a really cool way to share content with each other.

ngshiheng 2 years ago

love the simplicity and UI, gonna add to my bookmark now

edit: spending more time on the site, i kinda wish the sites are tagged. from the original thread there was a site that someone wrote about <plants> which i find pretty cool. if i get "get a random blog" from the tags i like, that might be more relevant. i understand that this is going to be a difficult ask since you're largely dealing with unstructured data here

grozzle 2 years ago

Good project, but I've always hated the word "blogroll". It's a pun on "bog roll", like toilet paper, isn't it?

alfiedotwtf 2 years ago

Weird. I added to that original post, but I'm not on your list. Maybe your code didn't go to the "See more comments" page?

Aissen 2 years ago

It's very nice, thanks! It would be nice if descriptions had new lines; some aren't readable, while they work quite well on HN.

landgenoot 2 years ago

Cool! Now I have to add my RSS feed to the <HEAD>, just like we did in the Firefox 3.0 era with a dedicated RSS-button.

PennRobotics 2 years ago

Sort suggestions:

karma / account age

karma / number of submissions

-----

Edit: this probably needs a weighting or minimum denominator to avoid new users getting launched to the top
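
One way to read the edit above, as a rough sketch rather than anything the site implements: divide karma by account age, but clamp the denominator to a floor so brand-new accounts don't get an inflated score. The field names and the 365-day floor are assumptions picked for illustration.

```ruby
# Hypothetical scoring: karma per day of account age, with a minimum
# denominator. The 365-day default floor is an arbitrary assumption.
def karma_per_day(karma, age_days, min_age_days: 365.0)
  karma / [age_days.to_f, min_age_days].max
end

# Made-up sample data, not real HN accounts.
users = [
  { name: "veteran", karma: 20_000, age_days: 4_000 },
  { name: "newbie",  karma: 50,     age_days: 2 }
]

# Highest score first.
ranked = users.sort_by { |u| -karma_per_day(u[:karma], u[:age_days]) }
ranked.each do |u|
  puts format("%-8s %.3f", u[:name], karma_per_day(u[:karma], u[:age_days]))
end
```

Without the floor, the two-day-old account would score 25 karma per day and outrank the veteran at 5; with it, the new account's 50 karma is spread over 365 days instead.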

I-M-S 2 years ago

Can we do this for HN users' podcasts?

  • swyx 2 years ago

    shameless plug for my pod: https://www.latent.space/podcast

    • cloverich 2 years ago

      This is a great podcast; discovered it a few weeks back and have listened to a few now. I especially like that it doesn't devolve into just chatting, but actually covers technical topics and gets into some of the nitty gritty.

      • swyx 2 years ago

        thanks very much!

gxs 2 years ago

There's a lot to sort through there - anyone have any recommendations/anything worth highlighting?

generalizations 2 years ago

Looks like there's still a few blogs with RSS feeds that are missing that tag in the list.

b8 2 years ago

Hmm, my blog wasn't added. Maybe when the data was scraped I hadn't posted it yet?

levysoft 2 years ago

I can't believe you had the same idea and the same need, but you beat me to it. Good job!

kaetemi 2 years ago

Anyone building a search engine?

Moncefmd 2 years ago

This is great! Would've been cool to also be able to sort by votes though.

petercammeraat 2 years ago

Brilliant. Easy to use as a filter for subjects (if people described their blog)

bthallplz 2 years ago

These threads make me wish that I had a blog, not just a regular website. :(

  • nelsonfigueroa 2 years ago

    I took a look at your website and it seems like you could easily add blog posts to it!

    • bthallplz 2 years ago

      Hah, thanks. I've been hoping to do so, but still haven't gotten around to it. There are some quirks with the static site generator that I use[0] that lead me to keep postponing setting up blog-ish features, and I don't know enough Python to fix them.

      [0]: https://github.com/gordonbrander/lettersmith_py

      • nelsonfigueroa 2 years ago

        If you ever want to try a new static site generator, I use Hugo[0] to generate my site. There are a lot of pre-built themes[1] you can use. Most (if not all) have blogging functionality built in; all you need to do is drop in a Markdown file with your content. You may need to learn a little bit of Golang if you want to customize themes. Just throwing it out there as an option.

        [0]: https://gohugo.io/

        [1]: https://themes.gohugo.io/

        • bthallplz 2 years ago

          Thanks! In writing out my reply to you I realized that I should look into other generators (specifically looking into Hugo, as I think I've seen it used by people like myself who take notes in Obsidian). The key features I want are backlinks support and blogging features, along with Markdown support.

      • NiloCK 2 years ago

        Not having a convenient and current publishing path shouldn't stop you from writing. Start your drafts folder!

1270018080 2 years ago

Conspiracy: That post was only made to harvest data for someone's model

  • bachmeier 2 years ago

    Well, given that blogs are public and the whole point is for others to read them, I think that's okay.

nickstinemates 2 years ago

Thanks for making this! It is great to see what people are writing about.

xwdv 2 years ago

I’m going to train an LLM on all these blog posts, make a true HN AI.

guy98238710 2 years ago

Needs sorting by last post date and a way to add new blogs.

revskill 2 years ago

Thanks for the great work.

The next step could be supporting AI chat with HN blogs?

brentcetinich 2 years ago

The latest post seems to show the oldest post sometimes
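
One possible cause (a guess, not something the author has confirmed) is that feeds don't all list entries newest-first, so taking the first entry can return the oldest post. A minimal sketch of picking the latest entry by parsed date instead; the entry shape with `:title` and `:published` is hypothetical.

```ruby
require "time"

# Hypothetical entry shape: { title: String, published: String or nil }.
# Pick the entry with the most recent date rather than trusting feed order.
def latest_entry(entries)
  dated = entries.map do |entry|
    begin
      [Time.parse(entry[:published]), entry]
    rescue ArgumentError, TypeError
      nil # skip entries with missing or unparseable dates
    end
  end.compact
  newest = dated.max_by { |time, _entry| time }
  newest && newest.last
end

# Made-up feed listed oldest-first, mixing RSS-style and Atom-style dates.
entries = [
  { title: "Oldest",  published: "Mon, 02 Jan 2017 09:00:00 GMT" },
  { title: "Newest",  published: "2023-07-05T08:30:00Z" },
  { title: "Undated", published: nil }
]
puts latest_entry(entries)[:title] # => Newest
```

The same parsed date could also serve as the sort key for ordering the whole list by most recent post.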

jakebasile 2 years ago

Look ma, I'm in an HN link! This is pretty neat.

sublinear 2 years ago

Sounds a little too close to "bog roll".

joseferben 2 years ago

Thanks for putting this together, love the name!

zdwolfe 2 years ago

Looks cool, thanks for making this.

verse 2 years ago

Love this, thanks for building it!

hyperific 2 years ago

Thanks for doing this!