_chendo_ 3 years ago

I've built an app that has the same goal (not operating a mouse) but approaches it completely differently.

Rather than trying to simulate moving the mouse itself, Shortcat [https://shortcat.app/] indexes the user interface (buttons, text fields, links, menus, etc.) and enables fast fuzzy search of the interface. Type a word, abbreviation, or hint and hit Enter to click or action the element. Works almost everywhere on macOS, including browsers, Electron apps, and even iOS apps!
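As a rough illustration of the "type an abbreviation, hit Enter" idea, here is a minimal Python sketch of subsequence-based fuzzy matching over element labels. This is not Shortcat's actual implementation (which is not public); the function names `fuzzy_score` and `search`, the scoring rule, and the sample labels are all hypothetical.

```python
from typing import List, Optional


def fuzzy_score(query: str, label: str) -> Optional[int]:
    """Score how well `query` matches `label` as an in-order subsequence.

    Returns None when some query character cannot be found in order.
    Lower scores are better: the score counts the characters skipped
    between consecutive matches (an assumed, deliberately simple rule).
    """
    text = label.lower()
    pos = 0
    score = 0
    for ch in query.lower():
        found = text.find(ch, pos)
        if found == -1:
            return None
        score += found - pos  # penalise characters skipped over
        pos = found + 1
    return score


def search(query: str, labels: List[str]) -> List[str]:
    """Return matching labels, best match first (ties: shorter label wins)."""
    scored = []
    for label in labels:
        s = fuzzy_score(query, label)
        if s is not None:
            scored.append((s, len(label), label))
    scored.sort()
    return [label for _, _, label in scored]


if __name__ == "__main__":
    # Hypothetical labels, standing in for what an accessibility index might expose.
    labels = ["Save As...", "Preferences", "Save", "Print"]
    print(search("prf", labels))
    print(search("sa", labels))
```

A real tool would presumably also weight word boundaries, element type, and recency, but the core loop of "filter by subsequence, then rank" stays the same.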

The goal is to minimise the cognitive overhead of achieving a particular intent, so being able to type a word to hit a button, or activate a deep menu item when you don't know the shortcut, is quick and easy.

I'm currently working on a modal option which enables staying within Shortcat to navigate an interface, as well as chords for simulating scrolling and arrow keys.

However, Shortcat relies on the Accessibility API to index UI elements, so it depends on how well an app or website has implemented it. One of the goals is to help improve accessibility implementations by exposing more people to them and pushing developers to fix broken or incorrectly implemented accessibility tagging.

Shortcat is macOS only for now as I haven't been able to investigate how viable doing this on Windows or Linux would be, especially on Linux considering all the different toolkits that exist.

  • LASR 3 years ago

    This is excellent. I will be trying this out.

  • kache_ 3 years ago

    Have you considered using ML/OCR to figure out the position of the text relative to the screen? Seems much simpler than relying on accessibility APIs

    Thank you for your hard work!

    • _chendo_ 3 years ago

      I have plans to use ML/OCR to augment results down the road but the AX APIs and ecosystem on most apps (that I encounter, at least) are generally decent. Also, OCR means it won’t understand buttons with just icons, whereas AX APIs can grab em just fine.

      Thanks! It’s easily my longest running project at a decade

  • hawski 3 years ago

    Very interesting thing. I wonder if the gap of apps not supporting a11y could be reduced by using Tesseract to OCR the text.

  • discodachshund 3 years ago

    Excited to try this out! Is it planned to open source? I would love to try integrating this into Raycast

    • punnerud 3 years ago

      Is Raycast open source? Could only find that the plugins are on GitHub.

    • _chendo_ 3 years ago

      It won't be open source, but I will be adding an API so it can be integrated with other apps and scripted

  • Luc 3 years ago

    This app is quality. I can tell you have been working on it for years. Why not charge for it?

    • _chendo_ 3 years ago

      Haha, thanks :)

      I did charge for it a couple of years ago, however I rebuilt the whole thing from scratch after a long hiatus and hadn't bothered to reimplement licensing because the existing options all kinda suck, and figured I'd focus the time on features and usability first. I think the modal mode in the next release will bring it much closer to a 1.0 release.

      • justusw 3 years ago

        If you bundle it and release a paid for application on the App Store, I would totally buy it and even roll it out to my staff. The magic of the App Store allows you to do company wide roll-outs quite easily.

        • _chendo_ 3 years ago

          I'm not sure if an app like Shortcat can be released on the App Store given it uses the Accessibility APIs (sandboxing etc), also the 15-30% cut they take is a bit ooooof, but I do have plans to support company/teams licensing!

      • tfsh 3 years ago

        +1 this is awesome! I'd like to donate if I can :)

        edit: nvm me, found the option in settings (on activation show shortcuts immediately).

        Quick question, I've been playing around with Shortcat for a while. When I press the activation hot-key it takes about 4 seconds for the yellow two-letter highlights to show up, despite the app's text stating "found n elements in ~0.20s". Is there a config option to instantly show the yellow highlights?

        • _chendo_ 3 years ago

          Thanks! I don't have a way to take tips yet, but you can support by pushing for developers to improve their accessibility implementations when you run into issues!

          I see you found the setting for that. It was a deliberate default initially as the intended way to use Shortcat is to activate Shortcat and type what you want without waiting to see hints, as this is generally faster and less mental overhead IMO, especially for fast typists and well-structured interfaces.

          However, some people prefer minimal keystrokes and I get that. I'm trying to figure out the right set of defaults to make it friendly to new users while nudging people to how Shortcat is designed to be used and will be tweaking it as I go.

  • yewenjie 3 years ago

    I have been meaning to build something like this for myself, albeit for Linux. Does anybody know if there are any existing efforts there?

    Given that Linux doesn't have anything like an accessibility API, I think the only option is training ML models.

  • pabs3 3 years ago

    Should be doable on Linux for most mainstream apps due to the toolkits having a11y support, but obviously not all apps use mainstream toolkits.

  • zvmaz 3 years ago

    Excellent! I'd love to have that on Linux.

  • smcleod 3 years ago

    I've been playing around with Shortcat recently - really cool app! Keep up the good work.

  • m1r5h1 3 years ago

    I never comment on HN however I just want to say I've downloaded your app and it's very impressive - I'm going to try and incorporate this into my workflow the best I can. Thanks!

  • Tepix 3 years ago

    Looks very cool, have you considered adding voice input or is it already possible?

    • least 3 years ago

      There’s already a very sophisticated system on MacOS for voice input so I feel like it’d probably be superfluous.

  • Daynil 3 years ago

    I love this new wave of tools coming out for mouseless computer use. Chronic mouse use has destroyed my wrist so I have to avoid using it as much as possible.

    I love Shortcat's approach in general, indexing the UI. However, the reliance on the Accessibility API is actually a significant downside in the real world in my experience since so many apps don't properly implement it. I feel like Warpd is a good complement to this, you could use Hint or Grid mode as a fallback when the indexing approach fails.

    I wish I could use shortcat or Warpd, but unfortunately I'm on windows. Curious if anyone has any good tool recommendations for windows? Currently, I'm using:

    1. Vimium for Chrome (so good, wish I could just use it across the OS).

    2. Hunt and Peck: https://github.com/zsims/hunt-and-peck has been my favorite for OS-level use, a simple version of shortcat for windows. But, it's not maintained and not as slick as some of these newer tools.

    • mjcohen 3 years ago

      I have used a trackball for many years since my wrist started bothering me, and I love it. I am right-handed, and I use Logitech's 575 and MX Ergo. I prefer the Ergo, even though it is more expensive. I keep it beside me on the couch where I sit. That way my elbow makes a 90 degree angle. Very comfortable. My keyboard is on my lap and my monitor at eye level.

      • Daynil 3 years ago

        Nice, I've actually tried a trackball myself, but with the way I use my desk (sit/stand) it caused more problems than it solved (shoulder issues). Ergonomics is an art I suppose.

    • _chendo_ 3 years ago

      Oooooh, Hunt and Peck indicates that it's possible to make a Shortcat for Windows!

      I would probably need to pay someone to build that particular version though cause the last time I built anything for Windows was like 15+ years ago

    • forgotmypw17 3 years ago

      If you're already using Vimium, I suggest trying qutebrowser, which takes keyboard accessibility to a whole new level, by making it a first-class feature for the entire browser.

      It does basically cut out the mouse, and had a several-days learning curve for me, but after that it's pretty great. Here are some cool features, off the top of my head:

      * Python-scriptable, though I haven't figured out how to use this yet.

      * Bind javascript bookmarklets to a keyboard shortcut (use :bind with the jseval command)

      * Toggle not only javascript, but image loading and a whole slew of other features, with a keyboard macro.

      * Vertical tabs.

      * All config is adjustable via commands.

      * Keyboard macros like "pop tab into a new window", "clone tab", "close all other tabs", etc.

      * Text selection using the keyboard.

      * Quite similar keyboard dynamics to vim.

      It has a built-in ad blocker, and you should run :adblock-update when you first use it.

      Another browser which is similar, but which I haven't gotten into as much, is Luakit.

  • joshspankit 3 years ago

    This is what CMD-Shift-? should be.

    I think this paradigm along with more app developers putting all the important functions in menus is a strong contender for Maximum Intuitive Productivity

  • madacol 3 years ago

    Looks really cool. Is it able to select text for copying?

    • _chendo_ 3 years ago

      I'm working on a version that allows sending arrow keys with modifiers to the targeted application, so soon!

  • bloopernova 3 years ago

    Shortcat is utterly amazing. I really hope I can work this into my entire MacOS usage. You should be really proud of what you've made because this is fantastic!!

  • abalaji 3 years ago

    Any plans to add scrolling functionality to shortcat? I'd be able to move over completely from vimac if that gets added.

    • _chendo_ 3 years ago

      Working on that right now :)

  • fcoury 3 years ago

    Oh my! I came to the comment section to ask about a Mac app that I've seen a long time ago that did this. Lo and behold, you, the author, have written the first comment. :-)

    Thank you for Shortcat, I used it a long time ago and loved it. Excited to give it another go!

    • _chendo_ 3 years ago

      No worries! Glad you love Shortcat :D

  • ta988 3 years ago

    I just tried that, this is excellent.

awestroke 3 years ago

Looking at the limitations, I hate how fragmented Linux is becoming for apps like this. Completely separate implementations for:

- X

- Wayland + Gnome Shell

- Wayland + wlroots-based compositor

- Wayland + sway or any other non-wlroots based compositor

Is Wayland really that much better that this is worth it? Why can't Wayland be aware of the compositor like X?

  • alrlroipsp 3 years ago

    There's XWayland for interop. Besides that, people have been moving away from X11 for years now.

    • ranguna 3 years ago

      I still question myself as to why

      • bogwog 3 years ago

        Have you actually used Wayland? Besides the obvious current issues (compatibility and/or missing features), it is way better. Even something as simple as moving and resizing windows feels noticeably faster and more responsive on Wayland compared to X.

        I don't use it at the moment because of compatibility issues with some software that I use, but that's not Wayland's fault.

        • NhanH 3 years ago

          > Besides the obvious current issues (compatibility and/or missing features)

          I mean, that's the whole biggest criticism of wayland, 15 years in and it still doesn't work (well?) with the biggest GPU brand. The criticism has always been about whether it is worth it to fragment linux land and throw away a few decades worth of work on X.

          • qu4z-2 3 years ago

            My criticism of Wayland has always been that they dropped a spec, forgot to make an actual server, and now every window manager needs to implement support for various extensions separately. It's a fine design if you assume everyone only uses Gnome, but...

            It's not just throwing away a decade's worth of work on X, it's making everyone redo that work per display manager (thankfully wlroots exists as a kind of Wayland shared library).

  • dottedmag 3 years ago

    This is growth pains.

    Once the need is understood well enough, a common extension will be written, and Mutter, KWin and sway/wlroots will implement it.

    This is better for security than X11 free-for-all.

    • eptcyka 3 years ago

      I agree that Wayland by design can be more secure - but judging from my personal threat model, if malicious code gets to attack X or Wayland, it's already all over.

    • asveikau 3 years ago

      Isn't the use case of apps injecting mouse and cursor events the "security free for all" that Wayland is trying to prevent?

      Full disclosure, I am a Wayland skeptic. I don't think your focus on X input security is as justified as you probably think.

      • yjftsjthsd-h 3 years ago

        I think Wayland's security model was more worried about reading inputs than writing them (i.e. preventing keyloggers). Of course https://www.x.org/archive/X11R7.5/doc/security/XACE-Spec.htm... also exists, so...

        • asveikau 3 years ago

          Writing events is certainly a potential security problem.

          I know in the Windows world, one of the UAC features was that a less privileged process can't send events to an elevated window.

          In X11, I think last I checked most distros disable the XTEST extension by default out of security concerns. Skimming the warpd code, they are using XTEST for the X backend.

          As I think of the keylogger problem, it's not really privilege escalation, is it? If you're running as the same user as all the other clients, you could ptrace(2) them and intercept their event loops. I guess there are some container-based app deployment solutions now where you could run stuff at different security levels, so maybe it's more of a legit issue now...

      • dottedmag 3 years ago

        If it is implemented it will most likely go through the xdg-desktop-portal, with a policy controlled by compositor.

        And most likely it won't be injection of mouse and cursor events, but something higher-level, like focus switching requests.

        • asveikau 3 years ago

          Are you aware that the use case here is simulating a mouse? Focus switching is not enough.

          • dottedmag 3 years ago

            Ah, I mixed it up with another tool, sorry.

            This one looks like a small feature in a compositor and not an external tool, really.

            I guess it would take 100-200 LoC to implement in GNOME Mutter.

            • asveikau 3 years ago

              Then you'd need to implement it in every compositor.

              Excuse me for being blunt. I don't know if you understand how shitty of a design you advocate. Solid designs do not require modifying core components to write application level features the original authors did not envision.

              • dottedmag 3 years ago

                Excuse me for being extra blunt. I don't know if you understand how shitty of a design you advocate. Solid designs do not open users to being attacked and their credentials stolen by malicious applications, including sandboxed ones.

                Moving cursor around is a compositor's domain, not some arbitrary application's that decided to fiddle with the user's input.

                • asveikau 3 years ago

                  To you it's an arbitrary program. To the user it's a program they want to work.

                  An API should not be so preachy about which programs can theoretically be written. It should provide broad mechanisms.

                  It is very frustrating to work with people who think like you do, that 3 or 4 unrelated projects have to carve up narrow exceptions to how the platform works for every single use case, nominally because of theoretical harm of this exploit no one will write, but actually more based on your ego perception that you know better than every other developer on the planet.

                  So Wayland has this long list of impossible applications which are doable everywhere else. It's a prima donna.

  • epse 3 years ago

    Wait is there a separate implementation for sway and other wlroots? Because sway is wlroots based, that's where wlroots originates?

    • awestroke 3 years ago

      I might be wrong about Sway not being wlroots based. But there are a lot of compositors that are not based on wlroots

      • yjftsjthsd-h 3 years ago

        Yeah, sway is wlroots. But GNOME and KDE are both doing their own thing, so the point is in general perfectly valid.

  • vidarh 3 years ago

    Personally I don't think Wayland will get to a good spot until one compositor "wins" and provides interfaces for modification/extension such as e.g. X-like ability to write window managers without replacing the whole compositor. At this point I see no compelling reason to switch to Wayland for my own part.

  • TheCycoONE 3 years ago

    Where were you seeing that? I only glanced through a handful of files, but it appears to be one implementation to me:

    https://github.com/rvaiya/warpd/tree/master/src/platform/way...

    If there is a distinction, and there might be, "completely separate" is exaggerating.

    Edit: I see that only wlroots-based implementations are supported so far, and of those there are some things broken in wayfire. Perhaps this is what you're referring to?

    • awestroke 3 years ago

      The wayland implementation does not support Gnome Shell on wayland (arguably the most mainstream combo since it's the default in ubuntu). To support Gnome Shell on Wayland, you need to include a Gnome Shell extension which has some headaches. X is a third implementation. And then there is one additional implementation per Wayland compositor that is not based on wlroots

      • TheCycoONE 3 years ago

        Not a full and completely separate implementation though. Just wrappers to mask differences in compositor extensions that haven't been stabilized yet. If there are none the implementation is simply impossible.

  • 3np 3 years ago

    You can reduce that list from 4 to 3 (sway and others use wlroots. GNOME is the odd one out in Wayland)

    • yjftsjthsd-h 3 years ago

      Eh. Sway uses wlroots, but KDE is a completely different option that they didn't list so it comes out in the wash.

bradrn 3 years ago

Looks very useful! I especially like the ‘grid mode’; I would never have thought of that idea myself. It’s just a pity it isn’t available on Windows, though I’ve previously had good experiences with Mouseable [https://github.com/wirekang/mouseable].

  • eichin 3 years ago

    It's been in https://kaleidoscope.readthedocs.io/en/stable/plugins/Kaleid... for a while (as in, it's a plugin you can include in your custom open source keyboard firmware, with both 2x2 and 3x3 modes.)

    • psytrx 3 years ago

      When I first tried warp on my new keyboard, I really did not understand what was happening and thought it was bugged. The bindings came by default. I removed them and went on.

      A few days later I wanted to add custom macros and had to read the docs. Skimming the warp section made me realize it's basically just recursive space positioning. I tried the mode again, and soon realized how incredibly useful this is.

      It's a bit difficult for me to keep the state in my mind when going down the tree because there's no visual indicators. It's a keyboard firmware after all.

  • EsportToys 3 years ago

    Saw this thread just now, I'm gonna try to implement this in AutoIt, let's see how long it takes me! [2022-10-16-09:58-Z]

    • lolive 3 years ago

      What is AutoIt? A fork of AutoHotKey?

      • EsportToys 3 years ago

        It's like AutoHotKey but designed to be more programming- rather than scripting-oriented. It is astonishingly easy to create GUIs in AutoIt in comparison, I use it to rapidly prototype UX ideas.

        In fact, historically AHK is actually a fork of AutoIt.

  • EsportToys 3 years ago

    (followup on previous comment) finished in three hours [2022-10-16-13:00-Z]:

    https://github.com/EsportToys/AutoWarpd

    • prasoon503 3 years ago

      How do I activate it? I can't seem to figure it out. The script is running, but how do I activate the grid mode?

      • EsportToys 3 years ago

        Ctrl+Win+G to activate, then uijk to move quadrants and m,. for left/middle/right clicks

        • philonoist 3 years ago

          please upload full documentation. I am testing this right now.

          • EsportToys 3 years ago

            Added rudimentary README to the repo

  • herrsimon 3 years ago

    Long time warpd user here: You should ditch grid mode for the much more efficient hint mode. Also, check out keyd by the same author. The combination of warpd/keyd easily saves me an hour of work every day.

forgotmypw17 3 years ago

This is great.

I have been using something called keynav[1], for getting a similar grid mode. I would never guess how intuitive it is.

It doesn't replace the mouse, but it's helpful for that occasional click in the middle of heavy keyboarding.

https://github.com/jordansissel/keynav

1letterunixname 3 years ago

vim-easymotion for the entire screen. I love it. Useful in numerous ways from accessibility to constrained devices to keyboard-centric navigation.

In the 1980's, there was a thread of animosity directed at GUIs and mice as productivity-killers and providing accessibility to novices that robbed power-users of expressivity and automation as features shifted towards UIs over text mode applications. I think we can agree that with necessary and sufficient software engineering and UX, CLI-UI-API parity is achievable offering an easier learning curve, varying levels of user astuteness, mental models, and expressivity to accomplish a task by having different MVC "views" or "presentations" to interact with software or systems of any sort.

hsbauauvhabzb 3 years ago

I’ve long thought eye tracking would be awesome in this style, this is the next best (and currently only technically viable) solution. Well done!

Multicomp 3 years ago

On Windows speech recognition or dragon naturally speaking, there is the mouse grid functionality. It divides the screen into a grid of nine tiles, then you type a number to select one of the tiles, then the tile gets divided into nine tiles, which recurses on down until you have a single coordinate selected.

I just wish I had an easy way to do that from the numpad. That way, to move the mouse to an arbitrary location I need it to be, I could type 19432 enter and know that corresponded to the coordinates to refresh the page I am reading, that way I could use the mouse less and less as I started to memorize the 80% case of where I need the mouse to go and just bang it out on the keyboard.
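The recursive subdivision described above is easy to sketch. The following minimal Python example maps a digit sequence such as 19432 to a screen coordinate. It assumes tiles are numbered 1 to 9 left to right, top to bottom (1 is top-left), and that the pointer lands on the centre of the final tile; the function name `mouse_grid_target` and the screen size are hypothetical, and this is not taken from any of the tools mentioned.

```python
from typing import Tuple


def mouse_grid_target(digits: str, width: int, height: int) -> Tuple[float, float]:
    """Resolve a sequence of 3x3 grid selections to an (x, y) point.

    Each digit picks one of nine tiles in the current region; the chosen
    tile becomes the region for the next digit. Numbering is assumed to
    run 1..9 left-to-right, top-to-bottom.
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for d in digits:
        if d not in "123456789":
            raise ValueError(f"expected digits 1-9, got {d!r}")
        row, col = divmod(int(d) - 1, 3)
        w /= 3  # the region shrinks to one ninth of its area
        h /= 3
        left += col * w
        top += row * h
    # Aim at the centre of the final tile.
    return (left + w / 2, top + h / 2)


if __name__ == "__main__":
    # Hypothetical 1920x1080 screen; five digits narrow the region to about 8x4 px.
    print(mouse_grid_target("19432", 1920, 1080))
```

Since each digit divides both dimensions by three, a memorised five-digit sequence always resolves to the same small region, which is what makes the "bang it out on the numpad" workflow repeatable.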

speps 3 years ago

For Android users, that's a feature of the Voice Access app.

yjftsjthsd-h 3 years ago

Oh, neat; I'm very attached to keynav for this use case, but this is more portable. I'll have to dig into the Wayland limitations and caveats, since I thought that this was literally impossible to implement usefully there. Maybe this is one less blocker for me being able to switch now.

necrotic_comp 3 years ago

I use a thinkpad-style keyboard and my mouse is on the homerow. It feels like that is much more efficient than this, as you get the precision of the mouse without having to move your hand.

I don't understand why more people don't adopt it. Is it because it's so different from a normal mouse?

  • rowanG077 3 years ago

    From what I remember isn't that a tiny thumbstick? That's much, much slower than something like this.

    • necrotic_comp 3 years ago

      No it's not. It's pressure sensitive and about as fast or faster than a trackpad with movement. I've never had a problem with speed, even across multiple, large, monitors.

      • rowanG077 3 years ago

        Is this not what you mean: https://www.youtube.com/watch?v=7H8o_-7bKIU? I really doubt it's as fast/accurate as a trackpad even if you master it. This tool looks to be as fast as a mouse if you master it in many situations. But you would need a direct comparison by skilled users for each to be sure. I just don't see how a mini stick will ever beat pressing two buttons.

        • kfajdsl 3 years ago

          It's definitely not as accurate as a trackpad or a mouse, but it's not too hard to get very close. The benefit of it is that it's right on home row. You don't need to move your hands to use the mouse.

          On my thinkpad, I use the trackpoint and trackpad equally.

          • qu4z-2 3 years ago

            My experience is that a mouse is most accurate, followed by a trackpoint, followed by a trackpad, but then again I rarely use trackpads. I inevitably move the mouse when I take my finger off or try to press buttons. Also if I leave it enabled, I inevitably teleport the mouse around the screen with my palms, "palm detection" or no.

            Sadly every non-IBM/Lenovo trackpoint I've ever used is awful (although significantly improved by putting a Thinkpad cover bit on the joystick, if you're stuck with one).

            That said, having played through the SC1 campaign with a trackpoint... even the best ones are not as good as a real mouse.

        • akho 3 years ago

          It’s faster, but less accurate than a trackpad, and certainly faster than keynav (and probably the thing in the post, which is a re-implementation of keynav).

  • _dain_ 3 years ago

    Those nipples tend to get stuck in one place for me, so I have to disable them.

philonoist 3 years ago

Windows has this in parts as KeyNavish, Fluent Search, Win-vind, Voice Finger, Windows accessibility's Voice Access, window managers, etc., and they still fall short.

hunogo 3 years ago

What's the point of this?

  • psytrx 3 years ago

    Some people prefer a keyboard-focussed workflow and try to avoid using the mouse as much as possible.

    • hunogo 3 years ago

      Weird.

      • alrlroipsp 3 years ago

        The only thing weird here is your lack of context and understanding, all while still commenting.

      • pomatic 3 years ago

        Some people need tools like this, as an assistive technology - think RSI, Parkinson's and other issues that affect dexterity or elbow movement. Not so weird.

        • xcambar 3 years ago

          Thank you for the examples of necessity over examples of preference.

        • hunogo 3 years ago

          Thank you for explaining - as a consequence of illness and disability, I can understand the need.

          But why someone would intentionally make things more difficult for themselves as a preference, I don't get. It would be like walking around on crutches when you have two perfectly healthy legs.

          • MontyCarloHall 3 years ago

            If implemented well (as [0] is), it can actually be much faster than using the mouse for certain tasks. For example, when browsing Google results, it’s a lot quicker to navigate to a result by pressing the first letter or two of its link text than dragging the mouse to click the link.

            As a more common example, I only launch applications by opening a prompt (e.g. Spotlight on Mac) and typing the first couple letters of the program I’m starting. This is much faster than navigating using the mouse to the applications folder/menu/dock/taskbar etc. and clicking an icon.

            I agree keyboard-based navigation is not faster for everything. Luckily, tools like this don’t prevent you from also using a mouse!

            [0] https://news.ycombinator.com/item?id=33222384

            • wruza 3 years ago

              Reminds me of the times when every clickable element had an unde&rlined key and could be activated by alt-r. Then some designheads decided that it is non-beautiful and killed it.

          • _dain_ 3 years ago

            I like to keep both hands on the keyboard. Every mouse movement incurs the cost of reaching the right hand for the mouse, then moving the right hand back and re-finding my place on the keyboard. I don't like that constant back-and-forth movement. It breaks my flow and it can make my arm ache.

          • vehemenz 3 years ago

            Why would you use a mouse when you have a perfectly good keyboard with 68+ keys and God knows how many viable input combinations?

          • linsomniac 3 years ago

            If taken to the logical conclusion, your question extends to "why do we have keyboard shortcuts when you can just mouse there?" Taken to the illogical conclusion: "Why even have a keyboard when you can just use a mouse?"

            There are times when a mouse is good, there are times when I don't want to take my hands off the keyboard to mouse for something.

          • alchemist1e9 3 years ago

            Seems unlikely you are a serious software developer, software engineer, or sysadmin. It’s well known mouse use slows you down and causes ergonomic issues.

          • qu4z-2 3 years ago

            > intentionally make things more difficult for themselves as a preference

            No-one would do that, that would be crazy. People intentionally make things easier for themselves as a preference, and different people find different things easy or hard.

          • forgotmypw17 3 years ago

            Because once you learn how to use it, the keyboard is much faster and more capable than a pointing device.

            So it's less like crutches and more like rollerblades.

jumperabg 3 years ago

Is this a vimium for everything?!?!? IT SEEMS LIKE IT IS!

rhokstar 3 years ago

Has anyone used this in video games? FPS? MMORPGs?