U+237C ⍼ Right Angle with Downwards Zigzag Arrow

659 points by cbzbc 4 years ago

Disclosure: I'm not directly from the fields of the Sciences Of Angles And Ambiguously Crossing Lines, nor have I ever seen or used this symbol before. However, to me it's pretty evidently supposed to be a "no right angle" symbol.

(A) It's in the math section, (B) it's with angles, (C) the thunderbolt ↯ is commonly used for "not" or more specifically for dis-proof in this area and

(D) at least by my 30 s internet search on a mobile phone I couldn't find any other "no-angle" or "no-right-angle" symbol.

Someone could argue that usually you use a simple strike-through, as in ≠ (unequal), ∉ (not-element-of) or ∅ (empty set), but I would say it was chosen to avoid confusion in this case. The angle itself (without the "no/not") consists of only two orthogonal lines, so it would be kinda complicated to "strike it through" in any direction without producing something ambiguous that resembles a triangle, a fork or whatnot.

■

mbauman 4 years ago

That doesn't jibe with the history in TFA: the Unicode name and location were inferred from the symbol itself without knowledge of its meaning.
- dundarious 4 years ago
  
  I don't see the contradiction. The only thing they used from the name is the "right angle" aspect. Given their argument is this is a composition of thunderbolt + X, for some X (and derived from their prior knowledge of thunderbolt's compositional meaning), deciphering the image as "thunderbolt + right angle" is trivial and consistent with the naming origin in TFA.
zeteo 4 years ago

> (C) the thunderbolt ↯ is commonly used for "not" or more specifically for dis-proof in this area
Any examples?
- firstcommentyo 4 years ago
  
  High school physics and math as a major. I could scan you my scripts and papers if you're interested.....no won't. ;-D
  But maybe "commonly used" was the wrong term. More appropriately: "sometimes" or "by some".
  
  IshKebab 4 years ago
  
  I have never once seen it used as "not" in maths or physics. "Extremely rarely", perhaps.
  
  mywittyname 4 years ago
  
  To be fair, there are a lot of math symbols out there.
  http://mirrors.dotsrc.org/ctan/info/symbols/comprehensive/sy...
  There are lots of examples of the lightning bolt in there. In fact, under ulsy Contradiction symbols, there are four variants.
  I also noticed the exact symbol being discussed is listed under "Angles".
  
  renewiltord 4 years ago
  
  Where in the world? I’ve never used it despite similar background. Perhaps regional?
- HuangYuSan 4 years ago
  
  I believe in German (possibly also other languages) the thunderbolt ↯ is commonly used to mean "this is a contradiction" in a mathematical proof, equivalent to what in English would be a kind of ⋕ rotated by 45° or the symbol ※. The symbol ⟂ on the other hand means "false" and is used in particular in formal logic.
  
  froh 4 years ago
  
  Yes and no.
  Yes, we indeed used and afaict still use the thunderbolt for contradiction in my German university.
  However "perpendicular" and "bottom/falsum" are two different Unicode codepoints with very similar glyphs.
  https://en.m.wikipedia.org/wiki/Up_tack
- contravariant 4 years ago
  
  I've seen it used for contradiction. Though that's not the same thing as 'not' and I can't think of why you'd combine this with orthogonality.
- hedora 4 years ago
  
  If the thunderbolt means not, and the right angle is displaying the x and y axes, then this symbol could be a pun for "not a function".
_haoa 4 years ago

Right angles have a small box near the vertex which denotes it is a right angle [0].
This symbol doesn't have that box, so I don't think it's a right angle.
[0] https://en.wikipedia.org/wiki/Right_angle#/media/File:Right_...
Edit: This merely adds to the confusion, since the name of the glyph contains the words "right angle."
¯\_(ツ)_/¯
- cgriswald 4 years ago
  
  https://en.wikipedia.org/wiki/Right_angle
  > In Unicode, the symbol for a right angle is U+221F ∟ RIGHT ANGLE (HTML ∟ · &angrt;). It should not be confused with the similarly shaped symbol U+231E ⌞ BOTTOM LEFT CORNER (HTML ⌞ · &dlcorn;, &llcorner;). Related symbols are U+22BE ⊾ RIGHT ANGLE WITH ARC (HTML ⊾ · &angrtvb;), U+299C ⦜ RIGHT ANGLE VARIANT WITH SQUARE (HTML ⦜ · &vangrt;), and U+299D ⦝ MEASURED RIGHT ANGLE WITH DOT (HTML ⦝ · &angrtvbd;).[5]
  > In diagrams, the fact that an angle is a right angle is usually expressed by adding a small right angle that forms a square with the angle in the diagram, as seen in the diagram of a right triangle (in British English, a right-angled triangle) to the right. The symbol for a measured angle, an arc, with a dot, is used in some European countries, including German-speaking countries and Poland, as an alternative symbol for a right angle.[6]
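
For anyone who wants to check the names quoted above, here is a minimal sketch, assuming Python's standard `unicodedata` module. The `describe` helper is a made-up name for illustration; it is not from the article or from Wikipedia.

```python
import unicodedata

# Code points mentioned in this thread; the names come from the Unicode
# Character Database bundled with Python, not from any guess at meaning.
CODE_POINTS = [0x221F, 0x231E, 0x22BE, 0x299C, 0x299D, 0x21AF, 0x237C]


def describe(code_point: int) -> str:
    """Format one code point as 'U+XXXX <char> <NAME>' (hypothetical helper)."""
    char = chr(code_point)
    return f"U+{code_point:04X} {char} {unicodedata.name(char)}"


for cp in CODE_POINTS:
    print(describe(cp))
```

The name only records what the glyph looks like, which is consistent with the article's point that U+237C was given a descriptive name rather than a meaning.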
- mikeryan 4 years ago
  
  > This merely adds to the confusion, since the name of the glyph contains the words "right angle."
  The article notes that sans a given meaning the glyph was given a “descriptive name”.
  So you’re not wrong? :-P
froh 4 years ago

Perpendicular + Unicode combining solidus = ⟂ + / = ⟂̸
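
A minimal sketch of that composition, assuming Python's standard `unicodedata` module and assuming the combining character meant here is U+0338 COMBINING LONG SOLIDUS OVERLAY (the comment does not name it). The `strike` helper is a hypothetical name. It shows that ≠ has a precomposed code point that NFC normalization produces, while the slashed perpendicular stays a two-code-point sequence.

```python
import unicodedata

# Assumed to be the "combining solidus" meant above.
COMBINING_LONG_SOLIDUS = "\u0338"


def strike(base: str) -> str:
    """Append the combining overlay and NFC-normalize (hypothetical helper)."""
    return unicodedata.normalize("NFC", base + COMBINING_LONG_SOLIDUS)


not_equal = strike("=")      # composes to the single code point U+2260
not_perp = strike("\u27c2")  # no precomposed form, so it stays two code points

print(not_equal, [f"U+{ord(c):04X}" for c in not_equal])
print(not_perp, [f"U+{ord(c):04X}" for c in not_perp])
```

How the two-code-point version renders depends on the font, which may be one reason a dedicated glyph would be preferred for a symbol like this.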
- r0uv3n 4 years ago
  
  I think perpendicular most commonly refers to lines/vectors/planes etc., while the right angle symbol refers to angles. Also, there are often multiple symbols expressing the same thing.
  
  froh 4 years ago
  
  Yes, a sibling comment meanwhile found the right angle symbol U+221F ∟
  https://news.ycombinator.com/item?id=31015104
esperent 4 years ago

> the thunderbolt ↯ is commonly used for "not" or more specifically for dis-proof in this area and
I don't think it's that common. At least, I don't recall seeing it ever. Maybe it's used in non-English mathematics?
Wikipedia mentions it's also used in electrolysis so maybe this new one is related to that somehow?
- maze-le 4 years ago
  
  It's used in German mathematics education (secondary level), either to mark a contradiction in a proof or more generally to mark an erroneous statement.
  
  ruuda 4 years ago
  
  Also in Dutch universities to mark a contradiction, especially in a proof by contradiction.
  
  dirkt 4 years ago
  
  But I have never seen it to mark negation of a condition, that's usually done with a slash (as in ≠ ≮ ≯ ≰ ≱ ≴ ≵ ⊄ ⊅ ⊈ ⊉ ⊊ ⊋ ∉ ∌ ∄ ∦, you get the idea).
  So for "not a right angle" I'd have expected a "right angle" symbol with a slash through it.
  
  r0uv3n 4 years ago
  
  But how would you position the slash to get a somewhat easy to decipher symbol? To me, the right angle symbol seems to lend itself more to this unorthodox negation through the contradiction symbol than to negation through the normal slash.
  
  maze-le 4 years ago
  
  Funny enough, I've only seen it at the Gymnasium (secondary level) and not in the University a few years later -- then indeed the usual symbols were the 'slashed' relations like you've described, or the bottom symbol: ⊥ in logic. Maybe it's an idiosyncrasy of a certain subset of math teachers.
- qiskit 4 years ago
  
  Same. Never seen that symbol in my life. I've seen ¬, ~, !, etc used for not/negation in computer science, math, logic, etc.
  And some commenters said they used it to mark proof by contradiction, but why is there a need to mark it when you are showing it via proof? A canonical example of proof by contradiction is proving sqrt(2) is not rational. Never have I seen it marked with that symbol. Where would you even mark it? At the beginning with the assumption? Or at the end like QED?
  
  AaronFriel 4 years ago
  
  Math degree holder from Iowa, yeah, I've seen and used it many times. The symbol is used when you reach the contradictory statement. Like "1 = 2".
  "By way of contradiction suppose P, then ..., thus ~P ↯. Therefore ..."
  
  phogster 4 years ago
  
  To me that reads: I throw your assumption into the GROUND!
  
  nerdponx 4 years ago
  
  I've actually always wanted a way to mark "the contradiction" once I've obtained it. Thanks!
  
  valtism 4 years ago
  
  I was taught it in extracurricular mathematics in Australia. We were taught that it goes at the end of a contradiction proof once the contradiction has been found. We used to write it extra large, like a lightning strike. I think of it like a proof mic-drop.
- ratmice 4 years ago
  
  I believe I have seen it used as a symbol which indicates the discharge of an assumption, but never for "not".
- ceh123 4 years ago
  
  It's the first symbol referenced for symbols used in proof by contradiction to show contradiction [0]. I know that's not exactly "not" or "disproof" but I think that might be what the poster was getting at.
  [0] https://en.wikipedia.org/wiki/Contradiction#Symbolic_represe...
danparsonson 4 years ago

I submit to you that it's clearly not a thunderbolt but an arrow indicating changing directions; that being overlaid on top of a pair of axes is obviously useful in the study of non-Euclidean geometry to indicate the use of wibbly-wobbly dimensions.
- etothepii 4 years ago
  
  Particularly useful for timey-wimey relativistic analyses.
  
  Shorn 4 years ago
  
  Related to The Whole Sort Of General Mish Mash.
kens 4 years ago

I've thought that it would be cool to have a Wiki with an entry for each character, describing what it is, and its history. Although that wouldn't help for mystery characters like this one, there are a lot of characters with stories behind them.
- sprayk 4 years ago
  
  I like this idea. It would serve as a place to put a well-sourced answer to the question about this character, and the talk section could be used to discuss further investigation into the topic, or when new uses inevitably arise.
- paledot 4 years ago
  
  I was just discussing :man-in-business-suit-levitating: with some friends earlier today. Also an interestingly cryptic background, albeit not an unsolved one.
  https://emojipedia.org/person-in-suit-levitating/
  (Edit: Apparently HN automatically removes emoji.)
  
  logbiscuitswave 4 years ago
  
  The story behind MIBSL is definitely fascinating and some great trivia there. There’s a longer article about it here: https://www.newsweek.com/2016/05/06/secret-ska-history-man-b... that covers not just the inspiration for the emoji itself, but a brief history behind the inspiration behind the inspiration. Lots of levels of metaness to unpack.
  
  alx__ 4 years ago
  
  I love this! I've always assumed it was a rude boy emoji. Was briefly in a high school ska band :D
- subroutine 4 years ago
  
  Wikipedia already does this for many symbols. See for example...
  https://en.wikipedia.org/wiki/Miscellaneous_Technical
  Aside from the table describing each symbol, if you scroll to the bottom of the page, it links out to full articles related to each. For a full list see...
  https://en.wikipedia.org/wiki/List_of_Unicode_characters
mizzao 4 years ago

Reminds me of https://xkcd.com/2606/
Explained: https://www.explainxkcd.com/wiki/index.php/2606

bamboozled 4 years ago

These unicode characters feel like they were given to us from an alien species or something.

How did we end up with so many characters of unknown origin?

> I had no idea what it meant or was used for, thus assigned it a “descriptive name” when collating the symbols for the STIX project. (I still have no idea, nor can supply an example of the symbol in use.) […] it is the case that ISO 9573-13 existed long before either AFII or the STIX project were formed. […] I once asked Charles Goldfarb what the source of these entities was, but remember that he didn’t have a definitive answer.

bryanrasmussen 4 years ago

>These unicode characters feel like they were given to us from an alien species or something.
I worked at a large media company that had lots of differing icon sets in play across different media.
These icons were in SVG and they had been optimized pretty intensely. In some cases, due to a bug in one of the optimizing tools, some types of bezier curves got weird, so instead of, say, the round headed person with their hand held up to say stop, it was the star headed monstrosity pointing to doom from the heavens. Because of how the icons were used and not used, these optimization errors sat around so long that nobody had examples of the original icons, although one could guess, because in some cases we had similar ones in other projects that had not been optimized.
So maybe a similar thing would be the source of these weird alien entities.
- imglorp 4 years ago
  
  Well, Klingon [edit, was proposed] for Unicode. Maybe someone imported some 70s scifi orthography, just because.
  
  lifthrasiir 4 years ago
  
  Not yet. Even the 2021 request [1] to remove Klingon from the Not The Roadmap list [2] is in hiatus.
  [1] https://www.unicode.org/L2/L2021/21155-klingon-req.pdf
  [2] https://www.unicode.org/roadmaps/not-the-roadmap/
  
  Freak_NL 4 years ago
  
  Did it? It was proposed a few times; did the last proposal actually land?
  
  masklinn 4 years ago
  
  > Klingon made it into Unicode.
  No it did not. Klingon was originally proposed in 1997 and rejected in 2001. A second proposal was made in 2016 with more optimistic noises. But AFAIK it has yet to be accepted.
  It is also, like Tengwar and Cirth (which AFAIK remain unincluded even though they are on the BMP roadmap), held back on IP grounds. To my knowledge, the IP issues remain fully unresolved.
  Klingon is included in the ConScript registry, but that is unrelated to Unicode itself; it performs ad hoc, non-standard allocations in the Private Use Areas.
  
  teddyh 4 years ago
  
  ConScript seems to have been semi-replaced by the Under-ConScript:
  https://www.kreativekorp.com/ucsur/
- bamboozled 4 years ago
  
  I would've thought they'd have a table of every icon and a description or something, maybe at the time it was never taken very seriously or likely to take off as it did, so people didn't bother. Like IPv4...
- tclancy 4 years ago
  
  >the star headed monstrosity pointing to doom from the heavens
  That sounds like a useful reaction/ response these days.
- buescher 4 years ago
  
  >star headed monstrosity pointing to doom from the heavens
  Could someone please feed that to DALL-E?
  
  kingcharles 4 years ago
  
  Every post and Tweet on the Net now includes this exact reply by someone.
  What monster hath we unleashed?
  
  CyanBird 4 years ago
  
  Listen, I would if I could, that invite list must be huuuge by now, that or go mighty slow
somedude895 4 years ago

> Notably, it appears that anyone could register a glyph with the AFII for a fee of 5$ to 50$ (about 8.60$ to 86$, accounting for inflation). Even if the International Glyph Register can be found, it likely merely contains another table with the glyph, the identifier, and the short description. To know its origins would require the original registration request that added the character, but it’s unlikely that such old documents from a now-defunct non-profit organization in the 90s would have been kept or digitized.
Could be any random kid who found out about this and wanted the cool symbol they made up registered.
- lifthrasiir 4 years ago
  
  In some sense, you still can! The Ideographic Variation Database [1] essentially allows a definition of new CJK ideograph [sic] as a glyphic subset of existing characters, with a possible processing fee.
  [1] https://unicode.org/ivd/
rcarmo 4 years ago

I suspect there's an entire alien alphabet (like Marain, for instance) in there someplace. There was a proposal to stuff Klingon into the Private Use Area, at least...
- speed_spread 4 years ago
  
  If you're willing to use a discontinuous subset you could probably find close enough glyphs to make a full Marain. Ordering would be messed up and require a lookup table though.
JKCalhoun 4 years ago

I assumed W.A.S.T.E. were behind them.
(Might need to add this: https://en.wikipedia.org/wiki/The_Crying_of_Lot_49)
dicytea 4 years ago

Something similar exists in JIS called 幽霊文字 (ghost characters), which refers to kanji of mysterious origin with no real-world usage that somehow made their way into the JIS character set. After some investigation, most of them turned out to be mistranscriptions of kanji from old historical materials.
- DavidVoid 4 years ago
  
  https://en.wikipedia.org/wiki/JIS_X_0208#Kanji_from_unknown_...
  > Due to this thorough investigation, the committee was able to pare down the number of kanji for which the source cannot be confidently explained to twelve, shown on the adjacent table. Of these, it is conjectured that several glyphs came about due to copying errors. In particular, 妛 was probably created when printers tried to create 𡚴 by cutting and pasting 山 and 女 together. A shadow from that process was misinterpreted as a line, resulting in 妛 (a picture of this can be found in the Jōyō kanji jiten).
dirtyid 4 years ago

I remember convincing a friend to build a Unicode pokedex extension that collected all the Unicode symbols he was exposed to via casual web browsing. I never followed up, but I think it'd be neat, or something along the lines of rare-Unicode browser bingo.

ghostoftiber 4 years ago

(Edited to upload the image to imgur and avoid spammy advertisements).

Here I'll date myself: I remember this as "diode with a gate". Back when we did circuit diagrams with stencils, you had the diode stencil which looks like a triangle with a line on top, and then with the electrical stencils you had "decorations".

The intention was to put down the original symbol on the paper, move the decorations stencil over top of it and then add the required decorations. It's why diode symbols look like this: https://imgur.com/a/0tSLV7O (notice "step recovery diode").

The "lightning bolt" isn't a lightning bolt, it's a hint that this diode is going to have a very sharp "snap off" in the waveform. See: https://www.electronics-notes.com/articles/electronic_compon...

OK so why do we have a separate decorator for a diode? Can't we just have a pocket full of stencils for diodes? Space was at a premium back then. It goes back to daisy wheels and typeballs: https://en.wikipedia.org/wiki/Printer_(computing)#Impact_pri... You would have one position for "diode" and one position for "decorator" and the printer would know when it got one ASCII char it would print the diode, then send whatever the thin space is to advance the print carriage a small step, then print the decorator.

Someone should be able to find a daisy wheel or typeball dedicated to circuits and bear this out.

esquivalience 4 years ago

That first link is a redirect spiral through multiple interstitial ads. Enjoyed the rest of the comment though!
- ghostoftiber 4 years ago
  
  Thanks for the heads up - I've edited the post to a copy of the image I uploaded to imgur.
  
  MzHN 4 years ago
  
  Ironically imgur is nowadays very, very user hostile.
  You can't view an image without JavaScript. Once you enable it you get "f*ck your privacy" popups, and ads if you don't have a blocker. On mobile I can no longer view anything on imgur at all, only the top bar renders for some reason. There seems to be no way around this.
  It was also recently, although after the downhill, bought by a company that specializes in buying dying social media platforms and milking them dry with questionable ethics. How questionable? Well, they got into a Darknet Diaries episode https://darknetdiaries.com/episode/93/
- themodelplumber 4 years ago
  
  > a redirect spiral through multiple interstitial ads
  For a second I was thinking you meant this as the correct definition of the symbol, and was very surprised :-)
  
  esquivalience 4 years ago
  
  That is horribly plausible!

Someone 4 years ago

I would think something like this:

      |
      |  \
      |   \
      |    \  /\
      |     \/  \
      |          \
      |          _\/
      |
      +———————————————

Could (more or less) fit that description and would make more sense as a symbol. Something like it even made it into Unicode (https://emojipedia.org/chart-decreasing/)

GavinMcG 4 years ago

That to me immediately communicates a decreasing chart. I would have no idea that the right-angle lines represent right angles generally and not chart axes.
Cthulhu_ 4 years ago

It's like the icon in question is a drunk / mirrored version of this one, from memory, drawn behind the back.
It's like o7 vs 7o; if you know you know: http://i.imgur.com/ZjhHU87.jpg
jandrese 4 years ago

The article makes a decent case for the symbol to be a chart symbol that means "no right angle". The zig zag arrow apparently being a shorthand for "no" in that particular circle.
It looks like a symbol that someone added for completeness but isn't particularly useful even in the field.
standeven 4 years ago

This was my first thought as well. Either a misdrawn version of this, or a corrupt SVG, that somehow made it to production.

slowmotiony 4 years ago

I remember back in the day we used to find publicly exposed Windows FTP servers, create new folders using some messed up Unicode characters and upload pirated games and movies there to share with each other. The only way to open those directories was to specifically type the exact path in Unicode; simply double clicking on the folder in FileZilla or Windows Explorer resulted in an error. Sometimes the admins themselves couldn't delete them and just left them there. Good times.

technothrasher 4 years ago

I remember the days of people beginning to abuse ftp sites, all us admins shutting down our writable ftp upload folders, and thinking, "this is why we can't have nice things." It was the beginning of the end of the early, friendly internet.
- vletal 4 years ago
  
  I do not get it. Did you have to shut it down? Does not make sense to complain that someone uploaded stuff to a public unprotected writable storage. Wouldn't securing it with a set of credentials suffice?
  
  p_l 4 years ago
  
  Some were open for uploads by design, in the spirit of sharing things - essentially using the free space left after the main purpose to provide friendly mirrors for things like new projects etc. I recall using Archie to find copies of open source software at the ending edge of that era.
  Some also were used as submissions for projects, long before sites like sourceforge started. Especially since plonking a bigger source dump on newsgroups wasn't exactly well received.
  
  jorvi 4 years ago
  
  Sometimes people should be able to do nice things without it getting abused, no?
  In The Netherlands, in the nicer neighborhoods we have something called a ‘buurtbieb’, aka a ‘neighborhood library’, which is a weatherproof cabinet where people can put surplus books that other people in the neighborhood can borrow.
  Of course you could take all the books or use the cabinet to store candy, but why would you?
  
  jumpkick 4 years ago
  
  We have these throughout many neighborhoods in my city in central Florida, USA. We’re a college town so I just assumed it was somehow connected to that. Neat that it’s an international thing!
  
  dividedbyzero 4 years ago
  
  Munich, Germany has them as well
  
  Liquid_Fire 4 years ago
  
  In the UK these are commonly set up inside old unused telephone boxes - you can find them in many villages/towns, e.g.: https://nothingintherulebook.com/2018/11/03/british-phone-bo...
  
  neutronicus 4 years ago
  
  Here in Baltimore, MD, too, although the focus is mostly on kids books
  
  evandrofisico 4 years ago
  
  Here in Brazil we have those on bus stops.
  
  yawz 4 years ago
  
  Great to hear these little neighborhood libraries are international. We have them here where I live in Colorado, US.
  
  Kon-Peki 4 years ago
  
  Indeed. The zoning code for my town specifically calls them out (as allowed, with no permits necessary).
  
  theandrewbailey 4 years ago
  
  Can confirm neighborhood libraries are a thing in Pittsburgh (USA).
  
  sodapopcan 4 years ago
  
  We have them in Toronto, ON. We call them LLLs or Little Lending Libraries. There are actually quite a lot of them.
  
  coldacid 4 years ago
  
  We have them out in the 905s too. I've seen quite a few of them here in Durham Region.
  
  username923409 4 years ago
  
  I've also seen many of these at bus stations near Victoria, BC.
  
  phyzome 4 years ago
  
  Usually called Little Free Libraries in the US.
  (The name is a little weird, because regular libraries are also free...)
  
  DocTomoe 4 years ago
  
  Obligatory "Free as in beer vs. free as in freedom" comment. I have pulled stuff out of small community bookshelves that would never have seen their chance in a "professional-run" public library, both bad and good.
  
  samatman 4 years ago
  
  The word "public" in "public library" is load bearing, you can't replace it with "regular", hence your confusion.
  Private libraries (mine for example) are not free, as in beer or otherwise.
  
  db48x 4 years ago
  
  True, though to be fair most people never get to use private libraries. Or they used a library at their University that was technically private, but that gave access to the public as well. Public libraries are ubiquitous and very normal, while private libraries are the exception.
  
  samatman 4 years ago
  
  It's a normal elision, yes, we all picture a public library when we say "library". But "free library" isn't redundant or weird, because "public" is a modifier of library, not a trait.
  People tend to call their personal library a "book collection" or the like, but it's a library, in just the same way that a Little Free Library is.
  So most people who read have at least a small private library, whether they think of it in those terms or not.
  
  robonerd 4 years ago
  
  In America, public schools all have private libraries, reserved for attending students. (Maybe some operate as public libraries, but I've never seen nor heard of it.)
  Furthermore, public libraries are not necessarily free. In America virtually all of them are, with fees only for late returns. But this is not globally true; in some parts of the world, libraries open to the public charge a fee for checking out books, or even require a fee for entry.
  
  elliekelly 4 years ago
  
  There are two libraries near me that aren’t free - they charge an annual “membership” fee. One even operates more like an old blockbuster when it comes to newly released books. They charge a daily rental fee! It’s 25¢ a day, I believe.
  
  notreallyserio 4 years ago
  
  FWIW "Little Free Library" is a trademark and its owners have been aggressive in its defense. I don't know what folks should use as a generic name.
  
  Beldin 4 years ago
  
  Buurtbieb - in English, roughly pronounced as b-eew-rt-beep.
  It's a literal translation of "neighbourhood library", it alliterates, and it sounds cute. (Keep the "beep" part short for that).
  
  hedora 4 years ago
  
  Just keep using it as a generic name. They've already lost the generification war. Are they seriously going to track down and sue neighborhood libraries?
  Good luck getting a jury to enforce the trademark.
  
  bee_rider 4 years ago
  
  It seems bizarre to me that someone could trademark such a straightforwardly descriptive name.
  
  frosted-flakes 4 years ago
  
  Yeah, but if you take a book from a LFL, you own it. With a public library, you merely borrowed it.
  
  dwighttk 4 years ago
  
  I always took it to mean “no really this is free, take a book!”
  
  Arubis 4 years ago
  
  There’s actually coordination around these things: https://littlefreelibrary.org/
  
  boredumb 4 years ago
  
  In Puerto Rico there are quite a few of these on the sidewalk and despite the rains they are generally always stocked with books. There are bars on everyone's windows and doors, but books piled up on the street.
  
  chasd00 4 years ago
  
  there's one down the street from me but instead of books it has canned food. It says "little free pantry" on it. It must have been around for a while because the neighborhood it's in has long since been gentrified and is populated with very well-off residents vs the working poor that used to live there.
  
  mbeex 4 years ago
  
  I think, you don't get the full grasp of "early, friendly internet". Very few people do today. In my bubble - programming, for example, young people can't even imagine that there were times when you could focus on _things_ instead of writing layers of security code around them.
  
  hardware2win 4 years ago
  
  I think you make it sound as if that was good, but it was straight naive or irresponsible
  
  0des 4 years ago
  
  It was a different time
  
  beowulfey 4 years ago
  
  Sure, in today’s world.
  That’s like saying it would be naive and irresponsible for me to go outside without a life preserver today despite an unforeseen catastrophic global flood drowning the lands 10 years from now. It was a different world, with different expectations and frameworks.
  
  ysavir 4 years ago
  
  The GP is saying "I miss the days where I could easily exploit people" and the response was "I miss the days where we respected each other enough to not exploit each other". It wasn't naive or irresponsible, but reflective of a time with more trust, cooperation, and good intentions.
  
  alex3305 4 years ago
  
  Reminds me of a few years ago, when I accidentally exposed my Domoticz install to the internet without authentication. I had missed something in my Nginx config with X-Forwarded-For headers. After about a week or so, a foreign visitor apparently came by my install and decided to have some good fun, turning my lights on/off at random times. It took me about 3 days to realize what had happened, but in the meantime he didn't destroy my install, he only messed with me. Which was really sweet, because nuking the system would have been far easier than opening the webpage every night.
  That was a good and fun security lesson though and now I always check outside security with a mobile hotspot.
  
  FabHK 4 years ago
  
  "There are villages on the countryside that are safe and friendly, everyone knows each other, people don't even lock their door."
  - "Man, those idiots are naive and irresponsible."
  
  hardware2win 4 years ago
  
  1 I didnt call any1 "idiot"
  2 it s not like other people cant go to those places, thus it is kinda irresponsible
  
  adrusi 4 years ago
  
  That's like saying it's naive and irresponsible to go outside without locking your front door when you live in a tiny remote village with 40 other people you've known for your whole life.
  
  hardware2win 4 years ago
  
  Not really, in your example theres no way any1 appears and even if he does, then your friends protect ur stuff
  Meanwhile internet aint remote village
  
  GavinMcG 4 years ago
  
  Point is it used to be
  
  stirfish 4 years ago
  
  Do you know of any tiny remote internet villages left? There has to be a few
  
  fasquoika 4 years ago
  
  https://tildeverse.org/
  
  grahameb 4 years ago
  
  It makes me sad to think of all those simple little services we used to run on *NIX machines, like `finger` and `whois`. You'd never want to disclose that information now, but at the time it was quite nice to be able to see if a friend or colleague was around with a simple network query.
  
  joquarky 4 years ago
  
  I remember when I could connect to nearly any server on the internet on port 25 and manually type the commands to send an email.
  .
  
  siriussidus 4 years ago
  
  You can still submit mail to virtually any mail server using telnet. I just tried it on Gmail for curiosity, and it did work!
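
For readers who never did this by hand, here is a rough sketch of what gets typed in such a session, assuming a plain, unauthenticated SMTP exchange. It only builds the command lines and opens no connection. The function name and the example.org addresses are placeholders, and many servers today will refuse or spam-filter mail submitted this way.

```python
def smtp_dialogue(sender: str, recipient: str, body: str) -> list[str]:
    """Return the client-side lines of a minimal SMTP session (hypothetical helper)."""
    lines = [
        "HELO client.example.org",  # greet the server (placeholder host name)
        f"MAIL FROM:<{sender}>",    # envelope sender
        f"RCPT TO:<{recipient}>",   # envelope recipient
        "DATA",                     # announces that the message text follows
    ]
    for line in body.splitlines():
        # Dot-stuffing: a leading "." is doubled so it is not read as end-of-data.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")               # a lone dot ends the message
    lines.append("QUIT")
    return lines


for line in smtp_dialogue("alice@example.org", "bob@example.org", "Subject: hi\n\nHello!"):
    print(line)
```

On the wire each of these lines ends with CRLF, and the server replies with a numeric status code after each command. The lone dot is the part people tend to remember.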
  
  brimble 4 years ago
  
  I dunno. Everyone fairly-publicly shares their entire friend network and what they had for lunch, now, usually under their real name.
  
  williamscales 4 years ago
  
  It’s like how when I was a kid, nobody in our neighborhood locked their doors at night. There was no need. Until there was.
- bfuller 4 years ago
  
  i was 13 when my public upload folder started getting messed with, sad day
- wanderer_ 4 years ago
  
  You guys should read The Cuckoo's Egg by Clifford Stoll. It's a classic.
- TameAntelope 4 years ago
  
  The fact you believed it would last is proof we still can have nice things. :)
- totetsu 4 years ago
  
  Wearz were very nice things.
  
  egfx 4 years ago
  
  It’s Warez.
  
  throwaway787544 4 years ago
  
  "wah-rez"
  
  jkhdigital 4 years ago
  
  Wait what? It’s pronounced like the city in Mexico?
  
  AdamH12113 4 years ago
  
  I've heard a lot of people pronounce it like that, but I'm pretty sure that's not correct. It's clearly the English word "wares"[1] with the S replaced with a Z, similar to "hackz" and "cheatz", which were also common in that era. I think the "wah-rez" pronunciation came from people seeing the l33tspeak and not recognizing the original word behind it.
  [1] A synonym for "goods" or "products". See https://www.merriam-webster.com/dictionary/wares
  
  jholman 4 years ago
  
  It's not a synonym for "goods", because only one type of thing was ever "wares"; software. It's just for dividing up the sections of your piracy BBS into, like "filez" (files, multi-kilobyte textfiles full of instructions on how to make bombs etc), "imagez", "warez", etc.
  Anyway, by 1990, in the piracy circles I distantly associated with, it was quite common to pronounce it like "juarez". Sort of semi-ironically, like, it's obviously the wrong pronunciation, but nonetheless everyone uses that pronunciation on purpose. So, what could be more correct than "the thing everyone does"?
  Of course, pronunciation only happens in meatspace (or at least it did back before MP3s and before YouTube and so on), and of course I'm talking about clusters of teenagers separated by thousands of km. We had "meetupz" or "meetz" in my city, which is how I know how "everyone" pronounced it... but it's certainly possible that in most cities/whatever there was some other pronunciation rule.
  
  blowski 4 years ago
  
  > It's not a synonym for "goods", because only one type of thing was ever "wares"; software. It's just for dividing up the sections of your piracy BBS into, like "filez" (files, multi-kilobyte textfiles full of instructions on how to make bombs etc), "imagez", "warez", etc.
  Citation needed there.
  I have always assumed it came from fleamarkets where people selling pirated VHS films and knock-off Rolexes would be described as “selling their wares”. Changing the s to a z was an obvious step in 90s internet culture.
  
  jholman 4 years ago
  
  Okay, so my citation is, I was there, I was a (fringe) participant in pre-internet piracy culture, starting in 1990.
  Pirate BBSes would have various "goods" (in the sense you and GP mean) available for download, including images (hint: some of them may have involved ladies), text files, and software. Sometimes there would also be sections for various art media created by users, such as .mods or ASCII art or poetry or whatever. Those various "goods" would never be all slopped together, they'd be divided into categories. And the category called "warez" would never, ever, have anything in it other than pirated software.
  I agree that the s-to-z thing is just classic hacker/leet culture, though it's not internet culture, because it predates the people in question having internet access. I'm saying that the "wares" that becomes "warez" is not "wares-as-in-goods", it's "wares-as-in-softwares". It's pluralized even though "software" is a non-count noun, because then it fits with "files", "images", and so on. And yes, ultimately the "-ware" in "software" is from the sense that you and GP are talking about; I'm saying that the etymology is not directly from there, because otherwise all the other kinds of pirated stuff would also be "warez", and it never, ever, was.
  
  AdamH12113 4 years ago
  
  I too have never seen "warez" used to refer to anything other than pirated software. You make a good point about the derivation; it probably is directly from "software". Adding a superfluous Z to the end of a plural mass noun was also a characteristic of l33tspeak, as I recall.
  
  Maursault 4 years ago
  
  I'm not sure how you are missing it, but hardware and software both etymologically have ware (as in a manufactured article, product, or merchandise) built into them. Without ware, there would be no hardware or software, or warez. The root of these words, also silverware, cookware, courseware, Tupperware, Corningware, etc., indeed is "ware." And wares is merely the plural of ware.
  
  mlyle 4 years ago
  
  > I think the "wah-rez" pronunciation came from people seeing the l33tspeak and not recognizing the original word behind it.
  I think it was explicitly luls a lot of the time. I saw "warez" spelled as "juar3z", etc, a lot.
  
  ykonstant 4 years ago
  
  To be fair, L-thirtythreet-speak pronunciations can be quite confusing!
  
  raydev 4 years ago
  
  This reminds me, my friend and I were the only people we knew who'd even used the internet in the late 90s so no one was around to correct us, and 3 of the apparently incorrect pronunciations we had agreed on were:
  - war-ehz
  - gee-aw-cit-eez
  - jif
  
  Doubtme 4 years ago
  
  oh my god rapidshare was hot garbage
sen 4 years ago

We did the same thing using the character for a non-breaking space, I think it was ALT+0160. It would sort last in the list, and just be an effectively-invisible entry unless you were really paying attention. Combined with an exploit we had to change users on the FTP servers behind most dialup ISPs' hosting (the free couple of MB of hosting you’d get with your dialup account that very few people cared about or used), this meant we had pretty much unlimited file hosting, filling random families' web hosting with hidden folders full of mp3s and warez.
paskozdilar 4 years ago

I remember making secret directories on my Windows desktop by using a transparent icon and ALT+255 as filename. Good times.
- ale42 4 years ago
  
  I was doing the same on MS-DOS, keeping "secret" files on a floppy disk with a directory having a name ending with an invisible Alt+255... it was even impossible to look inside it with the Windows 3.1 file manager.
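
For the curious, the two key combinations mentioned in this subthread plausibly produce the same character. A minimal sketch, assuming a US/Western setup where Alt+255 is looked up in the OEM code page (CP437) and Alt+0160 in the ANSI code page (Windows-1252); other locales use other code pages, so treat the mapping as an assumption:

```python
import unicodedata

# Assumption: Alt+255 goes through the OEM code page (CP437 on US systems),
# and Alt+0160 through the ANSI code page (Windows-1252 on Western systems).
alt_255 = bytes([255]).decode("cp437")
alt_0160 = bytes([160]).decode("cp1252")

for label, ch in (("Alt+255", alt_255), ("Alt+0160", alt_0160)):
    # Show the code point and its official Unicode name.
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Under those assumptions both byte values decode to the no-break space, which renders as blank in a directory listing.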
vishnugupta 4 years ago

That exact memory crossed my mind as soon as I saw that U + <number> in the title :-D. Fun times indeed!
kingcharles 4 years ago

You too, huh? This was my first foray into the "dark" side of the Internet as a kid, pre-Web, hanging out with pirates on IRC and getting "hired" to go around the early 'Net and fuck up people's upload folders by creating hidden directories we could load with our group's warez. ^H^H^H^H
moogly 4 years ago

_vti_cnf

ezoe 4 years ago

There are some kanji in the JIS character encoding that have no record of existing usage, and they were also incorporated into Unicode. They're called "ghost characters" in Japanese.

https://ja.wikipedia.org/wiki/%E5%B9%BD%E9%9C%8A%E6%96%87%E5...

kingcharles 4 years ago

I feel bad for the font designers who have to put all these inane characters in, have to draw them and hint them, and they have no purpose except they have to be there or someone will complain.
- lifthrasiir 4 years ago
  
  Fortunately there are only a handful of such cases. But unfortunately there are tons of commonly used CJKV ideographs; typical Chinese or Japanese fonts are of course not expected to have all Chinese characters (there are almost 100,000 of them while OpenType fonts can only have 65K glyphs), but they are expected to have thousands of commonly used characters.
  
  ezoe 4 years ago
  
  It must be really nice that even an amateur font designer can single-handedly create a quality font for English usage in his spare time.
  For Japanese, it requires a minimum of a few thousand characters and symbols, and it still doesn't cover all the commonly used characters today.

jeffnappi 4 years ago

The person who appears to have done the work of collecting this character (and others) for submission into the Unicode process back in 1997[0] (Barbara Beeton) has actually responded to the StackExchange question[1].

Unfortunately even she is not aware of what the symbol is actually for.

[0] https://www.ams.org/STIX/bnbranges.html [1] https://tex.stackexchange.com/a/640596

primer42 4 years ago

So Unicode has all these mysterious characters... but I would bet that it's still true that many people on the planet speaking common languages can't even type their name...

This post is from 2015, and I'd love to know if unicode has added better support for non-English languages since then.

https://modelviewculture.com/pieces/i-can-text-you-a-pile-of...

Based on https://en.wikipedia.org/wiki/Bengali_(Unicode_block), only 3 more Bengali characters have been added since 2015.

giraffe_lady 4 years ago

That publication was so good, I was really bummed when they shut down. Looks like they came back for a minute in 2020? I had no idea but I know what I'm doing tonight.
nograpes 4 years ago

I was very surprised by your comment and by the article you linked that the name Aditya cannot be represented in Unicode. I think it can be represented: আদিত্য.
I am not a Bengali-speaker, but I am familiar with the class of scripts to which the Bengali script belongs, abugidas. These scripts assume a vowel following every consonant. When two consonants occur one after the other in a word (a consonant cluster), this must be represented specially, because if you just wrote (consonant, consonant) it would be pronounced (consonant, inherent vowel, consonant).
The "ty" in Aditya is one such consonant cluster. The way this cluster is written is ত্য. This is represented as three code points (I think I am messing up the proper terms), one for the "t", one to "join", and one for "y".
Some people think of the special shape that the final "y" takes as a separate character on its own. In fact, it has its own name (ya-phalā). I can understand why it would be confusing to see that the ya-phalā can't be typed as its own single character (" ্য"), but it really has to do with a difference between how the input is implemented and how the person thinks about their own language.
In fact, on the unicode.org site, typing this very character is part of the FAQ for Bengali: https://unicode.org/faq/bengali.html#6
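
To make the three-code-point claim concrete, here is a small sketch that lists the code points of the name using Python's standard `unicodedata` module. The string is written with escapes so each code point is explicit; Unicode names the "join" code point VIRAMA (called hasanta in Bengali):

```python
import unicodedata

# "Aditya" in Bengali script, spelled out as escapes (six code points).
name = "\u0986\u09a6\u09bf\u09a4\u09cd\u09af"

for ch in name:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The "ty" cluster is the last three code points: consonant, joiner, consonant.
cluster = name[3:]
print([unicodedata.name(ch) for ch in cluster])
```

How the cluster is drawn (as a conjunct with the ya-phalā shape) is up to the font and the text shaping engine; the stored text is just this sequence.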
- andlarry 4 years ago
  
  There was a lot of discussion [0] of that point when the Model View Culture article was originally posted 7 years ago.
  It's complicated, but the author of the piece seems to take issue with how the character set was designed by the language authorities the UTC delegated to.
  The whole comment thread is an interesting read.
  [0] https://news.ycombinator.com/item?id=9220147
goto11 4 years ago

The article presents it as if it were purely due to Western-centrism that these characters do not have distinct code points in Unicode. In reality the issue is much more subtle - a discussion of whether a certain glyph is a ligature of two characters or its own distinct character.
kens 4 years ago

I read that "I Can’t Write My Name" article when it came out and it's remarkably misguided. First, there are solid linguistic reasons why Unicode handles that character the way it does. Second, the article completely misunderstands how the Unicode Consortium works. Finally, the Unicode Consortium is remarkably open to character proposals from random people. The author could have written a proposal and fixed the problem in half the time it took to write the article. Source: I am a random person who got multiple characters added to Unicode.
- yesenadam 4 years ago
  
  > I am a random person who got multiple characters added to Unicode.
  Tell us that story please!

cheschire 4 years ago

The name itself sounds like it should be a graph of a downward trend line on a graph.

I’m guessing the person who implemented it got this exact requirement wording in the Unicode definition and nothing else, didn’t make the logical connection, and just implemented it as close to literally as they could.

Jarmsy 4 years ago

There's already U+1F4C9 for that though.
- scbrg 4 years ago
  
  If by "already" you mean "eight years later" :)
  ⍼ (U+237C) is in Unicode 3.2 (from 2002), (U+1F4C9) is from Unicode 6.0 (from 2010).
  [edit]: HN ate my 1F4C9 glyph. Use your imagination :)
  
  mkl 4 years ago
  
  https://codepoints.net/U+1F4C9 CHART WITH DOWNWARDS TREND
donkeyd 4 years ago

The update under the article has an explanation of where the name probably came from:
> I had no idea what it meant or was used for, thus assigned it a “descriptive name” when collating the symbols for the STIX project.
If I understand this correctly in the context, this person named the glyph based on what it looked like. So it wasn't the other way around.
- mkl 4 years ago
  
  It's possible both events happened. The downward trend line character certainly seems like something people might have wanted.
throw0101a 4 years ago

> The name itself sounds like it should be a graph of a downward trend line on a graph.
Or a lightning bolt through a window (with only the bottom-left of the window frame being visible).
MauranKilom 4 years ago

But if I read the article correctly, this glyph comes from a set of math symbols. I don't think "stock goes down" was ever used in any mathematical script.

yreg 4 years ago

I generally (perhaps naively) think that going forward knowledge loss won't be much of an issue compared to our history.

Surely the archeologists of the future won't have to wonder what some tool from our times was used for or what some symbol we currently use means… They will have Wikipedia and archive.org and whatnot!

But that fantasy is not compatible with reality where we are already unable to find out what is the purpose of some characters in Unicode.

berkes 4 years ago

That presumes humans can access our (electronic) media and understand it, in some 8,000 years or further.
There's no saying that there'll be a society capable of reading bits and bytes by then. Not just collapsed society -they'll hardly be interested in reading a random discussion on an orange forum for a niche group that lived 8,000 years ago- but maybe even societies that are vastly technically superior to our own but cannot fathom what things meant 8 millennia back. I mean we have texts from some 600 years ago, that we can read, but cannot understand (e.g. Rohonc Codex). Even though our technology and knowledge is far superior to when it was written.
- mywittyname 4 years ago
  
  It will probably be even worse in the future, given that internet subgroups form their own language dialects as a kind of shibboleth.
  "Why do people in this group of wall drivers show off their wedding bands?"
chadlavi 4 years ago

On the contrary: books might survive total societal collapse, but electronics don't.
- yreg 4 years ago
  
  Sure, but my prediction is that the human civilization is likely to never have a total societal collapse.
mitchdoogle 4 years ago

Even digital storage is not permanent. Important things will be copied and preserved, but I imagine that at some point in the far future many of the relics of everyday life will be deleted or will deteriorate, such as this very comment.
tsol 4 years ago

Electronics become unusable quickly, though. We can find stone tablets and clay pottery, but 10k years from now will they be able to find hard drives and extract useful data? Seems like it can easily go in the opposite direction.

kortex 4 years ago

Wake up, first thing that pops into my head, "I should check HN" (normally it's imgur, yeah bad habits).

Number one post is the Linking Sigil. Neat.

If you know, you know.

As for how a chaos magick symbol concocted in the 21st century ended up in a 1994 font spec, clearly discordians used the power of fnord to retcon it.

lgl 4 years ago

Context: https://tme.miraheze.org/wiki/Ellis_(sigil)
firstcommentyo 4 years ago

I'm sorry to be a party pooper, but although the Linking Sigil is also mentioned in the article, that's not what the article is referring to or asking about.
- bckr 4 years ago
  
  Hmm, the article links to the Linking Sigil at the bottom, in the links section.
  But the rest of the article is concerned with how mysterious the symbol is, and how no one knows where it came from.
  A clue: anyone can register a symbol for a surprisingly small fee.
  A question: why would the sigil be mentioned in an addendum but not in the article proper?
  Anyway, it's pretty obvious that GP had a premonition this morning, with a pay off.
  
  CobrastanJorji 4 years ago
  
  > A clue: anyone can register a symbol for a surprisingly small fee.
  A unicode symbol? I want a symbol! How much are we talking about?
  
  kortex 4 years ago
  
  There's a process for it. I'm not sure it costs anything but it's a bunch of paperwork. You have to justify what it's used for, why existing solutions don't work, etc. The working group is probably pretty reasonable, but I'm sure it's an involved process.
  If you do, can you please tack on symbols for following external links, space bar symbol, and all the other miscellaneous internet adjacent characters I always have to reach to Fontawesome for?
  http://www.unicode.org/pending/proposals.html

rich_sasha 4 years ago

Might we run out of Unicode code points, like we (seem to) be running out of IPv4 addresses?

As another comment mentions, once you add all these snowmen, with/without snow, male female and gender-neutral, in a few skin colour options (plus neutral)... it adds up. Plus, exponential growth once you consider family of snowmen (different number/genders/races of "parents", different number/gender/races of "children" and so on...).

lifthrasiir 4 years ago

There is no reason to believe the current rate (about ~35,000 over the period 2010--2020) to change rapidly, so we are probably safe for this century. You should be aware that emoji gender and skin color is encoded in character sequences and modifiers rather than atomic characters, exactly in order to avoid that exponential growth.
And in the unlikely case that Unicode gets so many characters somehow, you can always extend it: http://ucsx.org/
- jancsika 4 years ago
  
  Ok but what about all the cryptocurrency symbols? Those will probably accelerate the rate.
  Perhaps not by a significant or even measurable amount. Nonetheless, it's a great reason to start investigating a blockchain alternative to Unicode
  
  lifthrasiir 4 years ago
  
  The successful bitcoin sign proposal [1] explicitly deals with such a criticism:
  > Will Unicode be flooded with symbols for many crypto-currencies?
  > Most other crypto-currencies have learned from the difficulty that a non-Unicode symbol causes for Bitcoin, and use a symbol already in Unicode. For instance, Dogecoin uses Đ, Ethereum uses Ξ, Litecoin uses Ł, Namecoin uses ℕ, Peercoin uses Ᵽ and Primecoin uses Ψ. Some, like Ripple, use Roman capital letters (XRP), mimicking ISO 4217 currency codes.
  > While it is possible another crypto-currency will have a non-Unicode symbol that is extensively used in text, this is unlikely.
  I think this section was crucial for the eventual acceptance, because Unicode people do care (a lot) about long-term consequences of proposals.
  [1] https://www.unicode.org/L2/L2015/15229-bitcoin-sign.pdf
  
  nybble41 4 years ago
  
  It seems to me that this is something best handled with tag characters, like ¤XBT + (U+E007F) = ₿ (where the letters are from the tag block, U+E00xx). This mirrors one of the two systems for rendering national flags[0], just with a different starting codepoint, and can easily accommodate all the ISO 4217 currency codes and common unofficial extensions. If a system doesn't know how to render a particular glyph it can just fall back to showing the Roman capital letters.
  The downside of this approach is size: each tag codepoint (including the end marker) requires four bytes in UTF-8, plus two for ¤, so the sequence above is 18 bytes long.
  [0] https://en.wikipedia.org/wiki/Tags_(Unicode_block)#Current_u...
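
The 18-byte figure can be checked with a short sketch. The encoding itself is the hypothetical scheme proposed in this comment, not anything Unicode defines; `currency_tag_sequence` is an illustrative helper name, and uppercase tag letters are assumed:

```python
# Hypothetical scheme from the comment: currency sign, three tag letters
# spelling the code, then CANCEL TAG (U+E007F) as the end marker.
TAG_BASE = 0xE0000  # tag characters mirror ASCII at U+E0000 + ASCII value

def currency_tag_sequence(code: str) -> str:
    """Build the proposed sequence for an ISO 4217-style code such as 'XBT'."""
    tags = "".join(chr(TAG_BASE + ord(c)) for c in code)
    return "\u00a4" + tags + "\U000E007F"

seq = currency_tag_sequence("XBT")
print(len(seq), "code points,", len(seq.encode("utf-8")), "UTF-8 bytes")
```

The currency sign takes two bytes and each of the four tag-block code points takes four, which is where the total comes from.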
  
  lifthrasiir 4 years ago
  
  That sounds interesting, but modern currency symbols are already fast-tracked anyway---they almost always get assigned in the next version of Unicode---and more than one currency symbols for given ISO 4217 code can exist so I don't think it would work.
  
  nybble41 4 years ago
  
  > modern currency symbols are already fast-tracked anyway
  For national currencies, perhaps. New national currencies aren't introduced all that often, and there is a lot of pressure to support them quickly as their use is often mandatory for anyone living in that jurisdiction. For new private currencies, including crypto-currencies, we don't see quite the same eagerness—the observation that new crypto-currencies were more likely to reuse existing Unicode symbols than invent new ones was a consideration in getting the Bitcoin symbol adopted, as they didn't want to open up the floodgates to large numbers of new currency symbols. The tag-based system offers a compromise.
  > and more than one currency symbols for given ISO 4217 code can exist so I don't think it would work
  That is a bit of a problem, but it could be handled with the variant selector codepoints, for example ¤MOP = MOP$, ¤MOP(VS1) = 圓, and ¤MOP(VS2) = 元, if the symbols have the same meaning. To save some space the VS could replace the end codepoint. For fractional units there could be a different prefix such as ¢ for 1/100 or ₥ for 1/1000 in place of the ¤, or incorporating one of the Unicode fraction codepoints for other ratios up to ⅞ (or ⅑ or ⅒). These would be rendered verbatim in the fallback version, like ¢USD.
- secret-noun 4 years ago
  
  > emoji gender and skin color is encoded in character sequences
  A good tool to see this broken down is https://unicode-x-ray.vercel.app/?t=%E2%9C%8C%F0%9F%8F%BC%F0... (edit: fixed url to use percent encoded emoji)
bayindirh 4 years ago

Some of the glyphs you mention are combinatorial code points. i.e. they are multibyte characters combined to a single character. So you add a gender modifier and skin color modifier to change the appearance. You don't add multiple code points.
It's your device rendering these 2-3 byte character sets as single icons/emojis.
- masklinn 4 years ago
  
  > So you add a gender modifier and skin color modifier to change the appearance. You don't add multiple code points.
  FWIW that's true for the skin colors (there are 5 Fitzpatrick scale modifiers, U+1F3FB to U+1F3FF), but it's not true for the gender: the basic gendered characters (e.g. U+1F468 "MAN", U+1F469 "WOMAN") were part of the original set "merged" from Japanese emoji, so the gender-neutral equivalents (e.g. U+1F9D1 "ADULT") were added as separate codepoints.
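
A quick way to look at the code points named here, as a sketch using Python's `unicodedata` (assuming a Python build whose Unicode database is version 10 or later, so that U+1F9D1 has a name). The waving hand base character is just an arbitrary example:

```python
import unicodedata

# The five skin tone modifiers occupy U+1F3FB..U+1F3FF.
modifiers = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]
for m in modifiers:
    print(f"U+{ord(m):04X}", unicodedata.name(m))

# A base emoji followed by a modifier is two code points; a font may draw one glyph.
waving = "\U0001F44B" + modifiers[2]
print(len(waving), [f"U+{ord(c):04X}" for c in waving])

# The gendered and gender-neutral people are separate code points.
for cp in (0x1F468, 0x1F469, 0x1F9D1):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```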
  
  bayindirh 4 years ago
  
  According to this document [0], there are "Gender Alternates", which change the gender of an Emoji. Relevant part is starting near the end of Page 2.
  [0]: https://www.unicode.org/L2/L2016/16181-gender-zwj-sequences....
knome 4 years ago

1) there's only ~150k unicode values defined. If we assume a signed int for available space, we have 2,147,333,647 of 2,147,483,647 remaining. moreso if the int is unsigned. We're fine. 2) they use values that combine like ligatures to create the variants of values. there isn't a combinatorial explosion because color is a modifier value, and sex, and then the underlying symbol. It's not a unique symbol for each combination.
IPv4 ran down because everything needs an IP to be on the net and there are more humans than available addresses, and more gear than humans.
We don't need different characters per human, only to document existing languages and to account for the slow growth of modern hieroglyphs.
- cygx 4 years ago
  
  > If we assume a signed int for available space
  Note that as it is currently defined, the Unicode codespace ranges from U+0000 to U+10FFFF, with some reserved codepoints (eg to encode surrogate pairs), yielding a total number of 1,112,064 assignable code points.
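
The arithmetic behind that figure, as a small sketch. The constant names are illustrative; the last line also subtracts the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes), which gives the slightly smaller count sometimes quoted:

```python
CODESPACE = 0x10FFFF + 1              # U+0000..U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1      # reserved for UTF-16 surrogate pairs
# Noncharacters: U+FDD0..U+FDEF plus the last two code points of each of 17 planes.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * 17

scalar_values = CODESPACE - SURROGATES
print(CODESPACE, SURROGATES, scalar_values, scalar_values - NONCHARACTERS)
```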
  
  throw0101a 4 years ago
  
  1,112,064 code points ought to be enough for anybody. — Bill Gates
  
  chrismorgan 4 years ago
  
  > as it is currently defined
  I find it completely implausible that this will ever change: the current size is baked in too heavily.
  • The abomination UTF-16, which is distressingly popular, cannot possibly support it. Replacing UTF-16 would be a massive upheaval in many ecosystems (e.g. JavaScript, Qt, Windows), and there’s no real prospect of most of those environments moving away from UTF-16, because it’s a massive breaking change for them by now. Rather, if the code space were running out, they’d devise something along the lines of second-level surrogate pairs. (And then we’d curse UTF-16 even more, because it’d have ruined Unicode for everyone again.)
  • All code that performs Unicode validation (which isn’t as much as it should be, but is still probably a majority) would need to be upgraded. Any systems not upgraded would either mangle or more commonly fail on new characters.
  • UTF-8 software would also need to be adjusted, since it’s artificially limited to the 21-bit space; and it wouldn’t be just a matter of flipping a few switches here and there to remove that limit—there will be lots of small places that bake in the assumption that representing a scalar value requires no more than four UTF-8 code units.
- mkl 4 years ago
  
  We can't assume a signed int, as character encodings limit the number of codepoints: "Excluding surrogates and noncharacters leaves 1,111,998 code points available for use." -- https://en.wikipedia.org/wiki/Unicode#:~:text=Excluding%20su...
  
  thaumasiotes 4 years ago
  
  But character encodings don't limit the number of codepoints. Unicode is just a big list of correspondences between an integer and a glyph. There's no limit to how many integers you can assign.
  Unicode encodings are separate standards that give correspondences between Unicode code points (integers) and byte sequences. If Unicode changes in a way that invalidates an encoding, that just calls for a new encoding.
  
  mkl 4 years ago
  
  Yes, it could technically be extended, but the transition would be a massive undertaking, so in practice the encodings do limit the number of codepoints. UTF-16, which creates the limitation, is very widely used and required by major programming language standards like ECMAScript. A lot of software still can't cope with codepoints outside the BMP, and they were established with UTF-16 in 1996.
  
  marcosdumay 4 years ago
  
  Besides the difference between the abstract and unlimited Unicode and the encodings, our current "modern" encodings, UTF-8 and the new UTF-16 are artificially restricted and can be trivially expanded into a huge number of codepoints just by removing those restrictions.
  
  mkl 4 years ago
  
  New UTF-16? I'm only aware of the original 1996 one, which uses all of its 20 surrogate-pair bits for the codepoint (unlike UTF-8 which can use bits to extend to more bytes). In my understanding, "just" removing that restriction would mean completely replacing the encoding, like UCS-2 being replaced with UTF-16. The new one may have some overlap, but transitioning to it would still be a huge undertaking, and far from trivial (quite a few programs today still use UCS-2, quarter of a century after UTF-16 was introduced to replace it).
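
To illustrate the 20 bits in question: a surrogate pair carries 10 bits in each 16-bit unit, so there is no spare room to extend the range. A minimal sketch (`to_surrogate_pair` is an illustrative helper, checked here against Python's built-in UTF-16 encoder):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000            # 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits go in the low surrogate
    return high, low

high, low = to_surrogate_pair(0x1F4C9)
print(f"{high:04X} {low:04X}")

# Cross-check against the standard library's encoder.
encoded = chr(0x1F4C9).encode("utf-16-be")
print(encoded.hex().upper())
```

The largest pair, DBFF DFFF, maps to U+10FFFF, which is why the codespace ends there.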
  
  kevin_thibedeau 4 years ago
  
  Unicode has been limited to 21-bits for a while so that UTF8 is guaranteed to encode no more than four bytes per code point. It can support the full 32-bit code space but changing now will break a lot of validation code.
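
A sketch of the length rule being described. `utf8_length` is an illustrative helper; the `legacy=True` branch follows the original 31-bit scheme with up to six bytes, which current UTF-8 no longer permits. It only computes lengths and does not reject surrogates, which a real encoder would:

```python
def utf8_length(cp: int, legacy: bool = False) -> int:
    """Bytes needed for a code point; legacy=True uses the original 31-bit scheme."""
    if cp < 0:
        raise ValueError("code point out of range")
    limits = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF]
    if legacy:
        limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for n, limit in enumerate(limits, start=1):
        if cp <= limit:
            return n
    raise ValueError("code point out of range")

for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF):
    # Compare the rule with what Python's encoder actually produces.
    print(f"U+{cp:04X}", utf8_length(cp), len(chr(cp).encode("utf-8")))
```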
- masklinn 4 years ago
  
  > If we assume a signed int for available space
  While UTF8 was originally defined as able to encode 31 bits, because of the limitations of UTF-16, RFC 3629 explicitly restricted the unicode code-space to 21 bits (or about 1.1 million codepoints).
- monsieurbanana 4 years ago
  
  > We don't need different characters per human
  Unicode NFTs here we come
moron4hire 4 years ago

Things like skin tone variations are not defined as individual code points. They are sequences of code points that combine to make the full, customized glyph. So you have one code point for "medical", one for "professional", one for "female", one for "brown skin", one for "blond hair", and from that you get a more specific picture of a doctor.
akvadrako 4 years ago

We are nowhere close to running out of code points. Unicode as currently defined has 1.1 million, but even that could be increased if there were a need. There isn't, since only about 144 thousand are defined.
There are not separate code points for all combinations of genders and skin colors; the characters are made as combinations.
masklinn 4 years ago

> Might we run out of Unicode code points, like we (seem to) be running out of IPv4 addresses?
No. There are currently 144697 codepoints allocated, out of a possible 1.1 million. And most updates allocate a few hundred. The large allocations (in the thousands at a time) overwhelmingly concern large additions of CJK unified ideographs (see: 13.0 with 4969 out of 5930 new codepoints, 10.0 with 7494 / 8518, 8.0 with 5771/7716).
There have been large additions of historical scripts (9.0 added the entire Tangut script, 7.0 added 23 different scripts) but those occurrences have slowed down a lot.
xg15 4 years ago

I think the current approach is to just invent yet another "meta layer" of characters and declare that this particular sequence of bytes/codepoints/surrogate pairs/grapheme clusters/extended grapheme clusters/zwj sequences/whatever else you can think of has a special meaning and does not behave like you think it does. See also Henri Sivonen's essay on unicode string length [1]
So in a way, Unicode is already long past the time where you invent NATs and other hacks to buy you time with the scarcity problem.
[1] https://hsivonen.fi/string-length/
goto11 4 years ago

The snowmen are in Unicode because they existed in a character set before the Unicode standard was created. Unicode was deliberately created as a superset of all existing character sets at the time.
masklinn 4 years ago

> it adds up.
It really, really doesn't.
According to UTS #51, as of unicode 14 (and its ~140000 allocated codepoints) there are under 3500 codepoints classified as emoji.
And do keep in mind that #, or ®, are classified as emoji.
And incidentally, U+2654 "white chess king" (♔) was in unicode 1.0. The moral panic around emoji is really tiring, it's absolute, utter nonsense, every single time.
mike_hearn 4 years ago

We already did! That's what happened when UTF-16 was exhausted, which was never the original plan. Just like how the IPv4 internet degraded into a mess of hacks once addresses ran short (like NAT), so too did Unicode start becoming wildly more complex.
Amongst other things, hitting the limit of 16 bits meant the introduction of:
- The concept of "planes"
- UTF-16 surrogate pairs
- UTF-32
- The newfound desire to encode emoji using combining characters, which means many apparently simple emoji are actually hacked together out of a mini programming language (e.g. black man = man emoji + skin tone modifier). Same thing for flags, which are actually two English letters mapped into a different part of the code space and then combined e.g. the British flag is G+B.
It's one reason why emoji broke so much software. Before emoji, most software simply ignored characters beyond the basic multilingual plane. Then emoji came along and broke everything that assumed a UTF-16 code unit == a character.
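
The flag example can be sketched in a few lines. `flag` and `utf16_units` are illustrative helper names; the mapping shifts each ASCII letter into the regional indicator range that starts at U+1F1E6, and whether a pair is actually drawn as a flag depends on the font and platform:

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(country_code: str) -> str:
    """Map a two-letter ASCII country code to a pair of regional indicators."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

def utf16_units(s: str) -> int:
    """Count 16-bit code units, which is what UTF-16-based string lengths report."""
    return len(s.encode("utf-16-le")) // 2

gb = flag("GB")
print([f"U+{ord(c):04X}" for c in gb], len(gb), utf16_units(gb))
```

One visible flag is therefore two code points and four UTF-16 code units, which is the kind of mismatch that trips up length and indexing assumptions.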

sj4nz 4 years ago

I'll propose that it could be the glyph to represent "cutting corners":

> To skip certain steps in order to do something as easily or cheaply as possible, usually to the detriment of the finished product or end result.

timonoko 4 years ago

It is a proofreader's mark for languages with long words. The L-shape is "Split the word here", and the same shape with the arrow-squiggle on top is "Do it at the next syllable or not at all". For example, the words "YÖ-KLUBI" and "YÖK-LUBI" have different meanings. Source: I have seen Finnish proofreaders' marks.

bombcar 4 years ago

This sounds plausible but I can't search Finnish enough to find examples.
- timonoko 4 years ago
  
  You can find German marks by searching for "Korrekturlesen Zeichen". The L-shape is described in DIN 16511, but I cannot find the opposite.
  
  bombcar 4 years ago
  
  Here's DIN 16511 https://www2.informatik.hu-berlin.de/sv/lehre/korrekturzeich... for anyone interested. Perhaps someone in Finland could dig further? It might be a bit strange to have proofreader marks for proofreading marks, but maybe something slipped in.
  "oikolukumerkit" found an image with more than just the DIN referenced marks, but not much more.

AnthonBerg 4 years ago

This symbol should be interpreted literally – it is of unknown meaning and origin. That’s what it means: “Of unknown meaning and origin”.

tarsinge 4 years ago

And still no external link character, ridiculous.

albrewer 4 years ago

Hm, now that you mention it, I always thought of the external link symbol as being a box with an arrow coming from inside it and protruding out of the upper right hand corner, but I don't see that symbol anywhere in Unicode, and I'm not sure why I have that association.
There is the U+1F517 link symbol but I'm not sure that's communicating the same thing.
- teddyh 4 years ago
  
  I often see a globe symbol used to indicate external links; i.e. U+1F310, U+1F30D, U+1F30E, or U+1F30F.
- layer8 4 years ago
  
  > I'm not sure why I have that association.
  Wikipedia uses it.

abakker 4 years ago

To me, it looks like a symbol you would use to denote electricity present. I'd say it was meant to say that an electrical box or some other piece of infrastructure had electricity present. It could even be a non-standard symbol for a ground.

edit: the right angle portion of it looks like the symbol for 3 wire 2 phase electricity used here - https://www.conceptdraw.com/How-To-Guide/qualifying-symbols Yes, it is just a right angle, but I could see the electricity symbol being overlaid to indicate that it was an electrical symbol.

jason0597 4 years ago

I still don't understand why Unicode has all these obscure symbols but they still haven't added all superscript/subscript numbers and letters

https://stackoverflow.com/questions/6638471/why-does-the-uni...

To quote a reply from the above StackOverflow thread: "So, they added a snowman with snow ☃ AND a snowman without snow ⛄, so that the weather forecaster of this world can avoid the dull snowflake ❄, but we will never get our missing superscript q‽"

blacklion 4 years ago

I don't understand why Unicode must (should?) contain superscript and subscript glyphs at all. The declared goal of Unicode is to encode all characters used by all languages, past and modern. Subscripts and superscripts are not used by any language as separate characters; they are a typesetting property. That should be solved by other means, not by character/glyph encoding. Should Unicode include ALL characters struck out? Underlined? Double-underlined? Small-caps variants of all letters for languages where small caps are part of the typographic tradition?
And, BTW, what do you mean by "all letters"? Should Unicode contain sub/superscript variants of Hangul or Devanagari or letters from hundreds of other non-Latin-alphabet languages? So, Unicode must be approximately tripled, bar the hieroglyphic part (and why should hieroglyphics not be sub/superscripted?)?
- cygx 4 years ago
  
  > Should Unicode contain sub/superscript variants of Hangul or Devanagari or letters from hundreds of other non-Latin-alphabet languages?
  Nope, you'd use markers similar to U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK) that already exist to indicate text direction (which is also a typesetting property).
  
  lifthrasiir 4 years ago
  
  They are relevant because Unicode had to define the bidirectional rendering and not every rendering can be automatically inferred from logical (abstract) characters. Unicode has no reason to define the general text rendering including subscripts and superscripts, so there is no reason for Unicode to define control characters for them.
  
  cygx 4 years ago
  
  > Unicode had to define the bidirectional rendering
  Why? They could have left this for a higher layer to handle.
  
  lifthrasiir 4 years ago
  
  Unicode defines characters, their semantics and (very flexible) guidelines for rendering them. Unlike, say, bold, italic or super/subscripts, bidirectionality is an intrinsic property of those characters and can't be easily refactored.
  
  cygx 4 years ago
  
  Should a universal text encoding provide a way to encode the names of mathematical and physical quantities?
  In my opinion, yes. If it can't, it's not fit for purpose, no matter what is or is not an intrinsic property of some characters...
  
  thaumasiotes 4 years ago
  
  > Unicode defines characters, their semantics
  Unicode specifically states that it doesn't define the semantics of characters. That would seriously interfere with its purpose of defining characters.
  There are some notable exceptions, and they are acknowledged to be mistakes.
  
  lifthrasiir 4 years ago
  
  > Unicode specifically states that it doesn't define the semantics of characters.
  The Unicode Standard explicitly says otherwise:
  > Characters have well-defined semantics. These semantics are defined by explicitly assigned character properties, rather than implied through the character name or the position of a character in the code tables (see Section 3.5, Properties). [1]
  > The Unicode Standard associates a rich set of semantics with characters and, in some instances, with code points. The support of character semantics is required for conformance; see Section 3.2, Conformance Requirements. [2]
  To be fair, it refers to "character" semantics, which are more or less abstracted by character properties. It is not as if, for example, △ U+25B3 WHITE UP-POINTING TRIANGLE can only ever be used for denoting triangles. But it has defined semantics in the sense that the character has the properties expected of such symbols.
  [1] https://www.unicode.org/versions/Unicode14.0.0/ch02.pdf#page...
  [2] https://www.unicode.org/versions/Unicode14.0.0/ch04.pdf#page...
- BaRRaKID 4 years ago
  
  This is probably an edge case, but I work in lab software that uses chemical symbols and having sub and super characters saves lots of headaches. I can just store "CO₂" in a database, query it, and display it back as a simple string, or display values in scientific notation like 1,3×10³, without having to use any formatting.
  But to be honest I'm not sure what the parent comment wants to see added because at the moment having all the letters from A-Z, numbers from 0-9, and plus minus and equals signs as both subscript and superscript seems to be enough.
  
  cygx 4 years ago
  
  Upper-case subscripts are missing, for one: I'm not allowed to talk about the normal force F_N in plain text email. Superscript and subscript Greek letters would also be nice to have, eg in context of relativity.
  
  blacklion 4 years ago
  
  Why not Devanagari then? This Europe-centric point of view bothers me.
  Also, I've seen a lot of different symbols as subscripts in mathematical and physical articles, like squares, triangles, arrows, etc.
  
  cygx 4 years ago
  
  > Why not Devanagari then? This Europe-centric point of view bothers me.
  Sure: As I mentioned in another comment, I'd add markers to enable arbitrary super and subscripting.
  However, the question I responded to was asking what specifically people were missing in practice, and the examples I gave are things I personally would have used if they had been available.
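
A rough Python sketch of the plain-string approach described in the lab-software comment above, covering digits only. The helper names (`to_subscript`, `to_superscript`, `plain`) are hypothetical; `str.maketrans`, `str.translate` and `unicodedata.normalize` are standard library calls.

```python
import unicodedata

# Hypothetical lookup tables: only the ten digits are mapped here.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def to_subscript(text):
    """Replace ASCII digits with their Unicode subscript forms."""
    return text.translate(SUBSCRIPT_DIGITS)

def to_superscript(text):
    """Replace ASCII digits with their Unicode superscript forms."""
    return text.translate(SUPERSCRIPT_DIGITS)

def plain(text):
    """Fold compatibility characters back to plain ones (NFKC)."""
    return unicodedata.normalize("NFKC", text)

formula = "CO" + to_subscript("2")
value = "1,3×10" + to_superscript("3")
print(formula, value, plain(formula), plain(value))
```

One caveat with this design: NFKC folding, which search and comparison code often applies, turns the exponent back into an ordinary digit, so the scientific-notation string no longer means the same thing after normalization.
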
lifthrasiir 4 years ago

Unicode superscript and subscript is not intended for mathematical usages [1].
[1] https://unicode.org/faq/ligature_digraph.html#Pf8
- IshKebab 4 years ago
  
  That's a cop out. You could equally say that new emojis shouldn't be added because you should use inline images for those. Or RTL markers shouldn't be added because you should use dedicated text styling for that.
  There are a ton of places that don't support superscript markup.
  
  tgv 4 years ago
  
  > You could equally say that new emojis shouldn't be added because you should use inline images for those.
  Well, that's really a better solution. Or a unicode character that allows you to set a pixel on a 256x256 grid and one to compose them. Strike that. Better not give anyone bad ideas.
  
  DiabloD3 4 years ago
  
  Almost sounds like you reinvented DEC Sixel.
  
  lifthrasiir 4 years ago
  
  > You could equally say that new emojis shouldn't be added because you should use inline images for those.
  If emoji hadn't been allocated out of compatibility concerns, this would have been exactly my opinion from day 1. To be honest I'm still not happy with the current emoji assignments and semantics. Unicode people aren't satisfied either; there are numerous proposals for replacing emoji with something else (example keyword: QID emoji).
  > RTL markers shouldn't be added because you should use dedicated text styling for that.
  > There are a ton of places that don't support superscript markup.
  Unlike most text attributes, bidirectionality is an intrinsic property of abstract characters and thus absolutely within Unicode's scope. Ideally you can't and shouldn't make an LTR character behave like an RTL character or vice versa. Bidi control characters exist only to correct automatic rendering, and can be presented out of band (the Bidi specification is explicitly designed with this use case in mind [1]).
  [1] https://www.unicode.org/reports/tr9/#Markup_And_Formatting
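
For anyone who wants to look at the character properties this subthread keeps referring to, here is a small Python sketch. The `properties` helper is made up for illustration; the three `unicodedata` functions it calls are part of the standard library, and the values come from whichever Unicode database ships with the interpreter.

```python
import unicodedata

def properties(ch):
    """Collect a few Unicode character properties for a single character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),    # general category, e.g. "Ll"
        "bidi": unicodedata.bidirectional(ch),   # bidirectional class, e.g. "L" or "R"
    }

samples = [
    "a",        # Latin letter
    "\u05d0",   # Hebrew letter alef
    "\u25b3",   # white up-pointing triangle
    "\u200f",   # RIGHT-TO-LEFT MARK
]
for ch in samples:
    print(f"U+{ord(ch):04X}", properties(ch))
```

The Latin letter reports bidi class `L`, while the Hebrew letter and the RIGHT-TO-LEFT MARK report `R`; the triangle is a neutral symbol. None of that depends on fonts or markup, which is the sense in which directionality is attached to the character itself.
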
vesinisa 4 years ago

Should we also have slanted, bold, semi-bold, light and underlined versions of every code point? Versions with/without serifs? For monospaced text? Those are all presentational matters. That we have super/subscripts in Unicode in the first place seems to have been just a hack to help terminal emulator software deal with obsolete encodings like ISO-8859-1: https://www.unicode.org/L2/L2000/00159-ucsterminal.txt
- account42 4 years ago
  
  𝐒𝐡𝐨𝐮𝐥𝐝 𝘄𝗲 𝗵𝗮𝘃𝗲 𝗯𝗼𝗹𝗱 𝙖𝙣𝙙/𝘰𝘳 𝑠𝑙𝑎𝑛𝑡𝑒𝑑 𝒄𝒉𝒂𝒓𝒂𝒄𝒕𝒆𝒓𝒔 𝗶𝗻 𝗨𝗻𝗶𝗰𝗼𝗱𝗲? 𝓘𝓽 𝓼𝓮𝓮𝓶𝓼 𝓈𝑜𝓂𝑒𝑜𝓃𝑒 𝔱𝔥𝔬𝔲𝔤𝔥𝔱 𝖘𝖔!
  
  mkl 4 years ago
  
  Those are intended for maths, not for formatted text. Variables in mathematics are usually a single character, so there is a great variety of ways to format the characters to create different symbols. Diacritical marks, underlines, etc. are also used for this.
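
A short Python sketch of what those "bold" characters are under the hood, assuming only ASCII letters as input. `to_math_bold` is a hypothetical helper; it relies on the Mathematical Bold capital and small letters forming two contiguous runs starting at U+1D400 and U+1D41A.

```python
import unicodedata

def to_math_bold(text):
    """Map ASCII letters to Mathematical Bold letters; leave the rest alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

bold = to_math_bold("Should")
print(bold, [unicodedata.name(ch) for ch in bold])
# Compatibility normalization folds the styled letters back to ASCII.
print(unicodedata.normalize("NFKC", bold))
```

The character names all start with MATHEMATICAL BOLD, and NFKC normalization maps them back to ordinary letters, which fits the point that they are separate symbols for mathematics and not a text-formatting mechanism.
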
goto11 4 years ago

> but they still haven't added all superscript/subscript numbers and letters
That would triple the size of Unicode.
- c22 4 years ago
  
  I've been told we'll never run out of space in Unicode.
- hiccuphippo 4 years ago
  
  They would just need to add one Unicode modifier for superscript and one for subscript like there is for gender and skin color.
  
  goto11 4 years ago
  
  Fair enough, but general formatting codes would overlap with what is already supported in rich-text formats like HTML or LaTeX. Unicode is a standard for encoding characters, it is not supposed to be a rich-text document format itself.
- IshKebab 4 years ago
  
  I mean they could at least add q.
mbauman 4 years ago

Sign onto the proposal: https://github.com/stevengj/subsuper-proposal

grandchild 4 years ago

While I absolutely enjoyed the historical research on such a minuscule mystery, I also liked how it took me two clicks from the front page of HN into an occult eBook about "khaos magick".

The things people write about...

kromem 4 years ago

It looks like someone asked for a glyph that would look like a chart with a downward trending zigzag, someone ended up getting the instructions and drew this thing, and the request proceeded bunched with other requests through the process with no one adequately challenging that the glyph really looks like what it's supposed to look like.

And yeah, actually a downward zig zag on a x/y plot glyph would be useful to have.

Like "chart with downwards trend" added to Unicode 6.0 in 2010, 25 years after "right angle with downward zig zag" was proposed and included.

lizardactivist 4 years ago

This is like the definition of legacy luggage. And somewhere there's probably someone who will argue that if the symbol is not present in a typeface, then said typeface is not "compliant".

aharris6 4 years ago

According to today's xkcd, this symbol means "Larry Potter"

https://xkcd.com/2606

paledot 4 years ago

Evidently Randall Munroe reads HN, to the surprise of no one.

bell-cot 4 years ago

Guess: right-handedness (as in chirality, polarized light, spiral motion, etc.)

quickthrower2 4 years ago

⍼

tlb 4 years ago

Since it looks like a caduceus on a graph, I propose it as a symbol for ethical statisticians.

russellbeattie 4 years ago

Eventually Unicode will think, "Hey, maybe bold, italic and underline aren't just decorative, but required formatting which conveys emphasis, and other information that needs to be contained within the text itself!"

Or, maybe not and we'll continue to lose formatting every time we copy and paste and be forced to use plain text for the rest of our lives. Also, we can color our emojis now, but that WARNING text can't be in red. Because colors don't matter?

Whichever person decided basic formatting shouldn't be in the spec was wrong and we lose important details every day because of it.

pwdisswordfish9 4 years ago

Someone show him U+29B0 REVERSED EMPTY SET.

progbits 4 years ago

Clearly useful for typesetting reflections of mathematical proofs. /s
- willis936 4 years ago
  
  It's how Leonardo Da Vinci would type up proofs.
  
  mkl 4 years ago
  
  I think he would have typed things left to right. He only wrote in mirror because it was more ergonomic for him, but there's no such issue with a keyboard.
leipert 4 years ago

Seems like Wikipedia has the answer:
> When writing in languages such as Danish and Norwegian, where the empty set character may be confused with the alphabetic letter Ø (as when using the symbol in linguistics), the Unicode character U+29B0 REVERSED EMPTY SET ⦰ may be used instead
yreg 4 years ago
Not long ago I found these
```
    ≤ U+2264 LESS-THAN OR EQUAL TO
    ⋜ U+22DC EQUAL TO OR LESS-THAN
    ≥ U+2265 GREATER-THAN OR EQUAL TO
    ⋝ U+22DD EQUAL TO OR GREATER-THAN
```
or even
```
    ⋚ U+22DA LESS-THAN EQUAL TO OR GREATER-THAN
    ⋛ U+22DB GREATER-THAN EQUAL TO OR LESS-THAN
```
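
A listing like the ones above can be regenerated from Python's bundled Unicode database. This is only a sketch: `code_chart` is a made-up helper name, and the names returned depend on the Unicode version of the interpreter (all of these characters are old enough to be present in any current Python 3).

```python
import unicodedata

def code_chart(symbols):
    """Format each character as 'glyph U+XXXX NAME'."""
    return [f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in symbols]

# The comparison operators listed above, plus the symbol this thread is about.
for line in code_chart("≤⋜≥⋝⋚⋛⍼"):
    print(line)
```
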
- lifthrasiir 4 years ago
  
  The former is probably for the same reason that both plus-minus and minus-plus exist. The latter is commonly used for the "unordered" relation in partially ordered sets.
  
  account42 4 years ago
  
  Wouldn't "less than, equal to, or greater than" imply anything EXCEPT unordered?
  
  lifthrasiir 4 years ago
  
  Ah, correct. The slashed variant would mean unordered, while the original character means ordered.
- alickz 4 years ago
  
  > ⋚ U+22DA LESS-THAN EQUAL TO OR GREATER-THAN
  > ⋛ U+22DB GREATER-THAN EQUAL TO OR LESS-THAN
  These are very interesting. What would be the use case for these?
  
  a_shovel 4 years ago
  
  Perhaps "has a comparison relation to"? So for any two numbers x and y, x ⋚ y is true, but "square ⋚ pentagon" is false.
  
  skykooler 4 years ago
  
  More concretely, it's true only for real numbers - so -4 ⋚ 7 is true, but 3+2i ⋚ 5 is false.
  
  vermarish 4 years ago
  
  When I was learning statistical hypothesis testing, I once wrote notes that looked like "H_0: mu ⋚ a <--> p-value: P(T(X) ⋛ T(a))", although I didn't include the equal-to bar.
  
  mkl 4 years ago
  
  That sounds like a different symbol: ≶ U+2276
  There are lots of similar symbols: ≶≷≸≹⋚⋛⪋⪌⪏⪐⪑⪒⪓⪔
  
  Nadya 4 years ago
  
  https://en.wiktionary.org/wiki/%E2%8B%9A
  > If the function f is differentiable and concave, then f′(x1)⋚f′(x2) as x1⋛x2. That is f′(x1) and f′(x2) have the opposite relation as x1 and x2.
  
  bialpio 4 years ago
  
  This blew my mind:
  "Related terms
  ⋛ (synonymous when used on its own, but antonymous when used jointly)"

teleforce 4 years ago

It could be a symbol for the polarization of electromagnetic (EM) waves, whose electric and magnetic fields oscillate orthogonally to each other [1].

Unlike sound waves in air, EM waves are transverse and therefore have a polarization component.

In wireless communication, for example, polarization can be used as another component for diversity to increase the performance of the communication channel.

[1] https://www.microwaves101.com/encyclopedias/polarization

jjtheblunt 4 years ago

AMS = American Mathematical Society last i subscribed. How the heck would someone surveying mathematicians not have found that?

ectopod 4 years ago

In a linked post Barbara Beeton says not. She collated these characters while working for the AMS so she should know.
https://tex.stackexchange.com/questions/640588/what-is-%E2%8...
- jjtheblunt 4 years ago
  
  that's even more wild! thank you for sharing / emphasizing that curious twist
  
  cold_fact 4 years ago
  
  I work at AMS currently, this is so interesting!

tgorgolione 4 years ago

This reminds me of the design used to denote a graph whose y axis does not start at 0:

https://tex.stackexchange.com/a/79272

herodotus 4 years ago

> And the inclusion of “AMS” in the names of the entity collections likewise remained mysterious.

Could this be The American Mathematical Society?

anentropic 4 years ago

Seems plausible, but from the linked stackexchange question: https://tex.stackexchange.com/questions/640588/what-is-%E2%8...
> It appeared in the entity set ISOAMSA, which, regardless of the name, had no connection with the American Mathematical Society.
- herodotus 4 years ago
  
  Thanks. I missed that.

yablak 4 years ago

Someone mentioned the lightning bolt means a contradiction so best guess is "use the left-hand rule"?

JulianMorrison 4 years ago

That is a chaos magick linking sigil.

cm2187 4 years ago

You will soon need a billion usd budget to implement a new font

dskloet 4 years ago

From the title I thought this was about some Uranium isotope.

skykooler 4 years ago

U-237 exists and has a half-life of about six days; I can't think of a valid modifier that would add that C on the end, though. (Unless you're talking about a very specific isotopic composition of uranium methanide, I guess.)

rackjack 4 years ago

That rabbit hole of esotericism was pretty cool.

baruchel 4 years ago

First thought was about Feynman diagrams :-(

SleekEagle 4 years ago

Looks like a break in a graph axis

pfalke 4 years ago

https://m.xkcd.com/2606/ — mystery solved!

8ytecoder 4 years ago

https://xkcd.com/2606/

Now we have an XKCD for this.

em-bee 4 years ago

i guess randall got inspired by this discussion: https://xkcd.com/2606/

reaperducer 4 years ago

> no one knows what ⍼ is meant to represent

Translation: Nothing came up on a Google search, and going to the library and looking in a book is hard.

I see this more and more often these days. Bloggers claiming that there is no known origin for something, or inventing their own histories based on nothing more than internet searches.

The internet is vast, but 99.9% of the world's history and information is not online for free.

adamrezich 4 years ago

on the contrary, dude seems to have done pretty extensive research—did you read the article?