As the comment by timsworkaccount says, this is definitely Athena at JPMorgan.
For those interested, I gave a talk on Athena at PyData UK 2018 called "Python at Massive Scale". 4500 developers making 20k commits a week. Codebase with 35m LOC.
The video is here: https://www.youtube.com/watch?v=ZYD9yyMh9Hk
It covers Athena's origins, what it is used for, application architecture, infrastructure, dev tooling and culture.
I worked on that project 10 years ago as a consultant, and it was certainly strategic at that time. It was fairly well known that one of the main stakeholders was pushing for those systems and making a career out of it by jumping banks every few years.
I understand the value of it, but as an experienced scientific developer and Python dev, the culture shock was huge. I don't have fond memories of working on it. It is very different from a traditional programming environment, closer to a kind of reimplementation of a Smalltalk env w/ Python. I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.
One of the fancy things was integrating reactive programming, which worked through ugly hacks, at least at that time, by parsing code to detect dependencies. IIRC, it could manage list comprehension but not loops. They also had their own Python binary w/ both python2 and python3 in one process.
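For readers wondering what "parsing code to detect dependencies" might look like, here is a minimal sketch using Python's standard `ast` module. It is not Athena's mechanism (that code is proprietary); the function name `find_dependencies` and the node names `spot`, `strike` and `vol` are made up for illustration.

```python
import ast


def find_dependencies(source, known_nodes):
    """Return the names in known_nodes that the given source code reads.

    Hypothetical helper: it only looks at plain variable reads, which is
    far cruder than what a production system would need.
    """
    tree = ast.parse(source)
    deps = set()
    for node in ast.walk(tree):
        # ast.Load marks a name being read (as opposed to assigned).
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id in known_nodes:
                deps.add(node.id)
    return deps


# Hypothetical graph node names.
known = {"spot", "strike", "vol"}

payoff_deps = find_dependencies("payoff = max(spot - strike, 0)", known)
print(sorted(payoff_deps))  # ['spot', 'strike']
```

A static pass like this sees which names are read, but it cannot easily tell what a loop body will touch at run time, which may be one reason comprehensions were easier to support than loops.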
The culture shock you mention was very real. I joined in 2010 (when Athena was 3 years old) and left 8 years later in 2018.
I remember my first few months unlearning normal Python and figuring out how to build the 'pixie graph', a lazily-evaluated Python DAG suited for calculating financial instruments. It took a while to get your head around this, but when you did it was a very powerful and productive way to build trading and risk management applications.
To get some sense of how this worked, here are two public projects on GitHub with good introductions:
* https://github.com/timkpaine/tributary
* https://github.com/janushendersonassetallocation/loman
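To give a flavour of what a lazily-evaluated DAG means in general, here is a toy sketch. It is not pixie and not the API of either project linked above; the `Node` class and the `spot`/`quantity`/`position_value` names are assumptions for illustration. Values are computed only when asked for, cached, and invalidated when an upstream input changes.

```python
class Node:
    """A graph node: either an input (no func) or a computed value."""

    def __init__(self, name, func=None, deps=(), value=None):
        self.name = name
        self.func = func
        self.deps = list(deps)
        self.dependents = []
        self._value = value
        self._valid = func is None  # inputs are valid from the start
        for dep in self.deps:
            dep.dependents.append(self)

    def get(self):
        # Lazy: compute only on demand, and only if the cache is stale.
        if not self._valid:
            self._value = self.func(*(dep.get() for dep in self.deps))
            self._valid = True
        return self._value

    def set(self, value):
        if self.func is not None:
            raise ValueError("only input nodes can be set in this sketch")
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for node in self.dependents:
            if node._valid:
                node._valid = False
                node._invalidate_dependents()


evaluations = []  # records each real computation, to show the caching


def compute_position_value(spot_price, qty):
    evaluations.append("position_value")
    return spot_price * qty


spot = Node("spot", value=100.0)
quantity = Node("quantity", value=10.0)
position_value = Node("position_value", compute_position_value, [spot, quantity])

first = position_value.get()   # computed now
second = position_value.get()  # served from cache
spot.set(101.0)                # marks position_value stale, computes nothing
third = position_value.get()   # recomputed on demand
print(first, second, third, len(evaluations))  # 1000.0 1000.0 1010.0 2
```

Setting an input does no work by itself; the cost is paid only when someone asks for a stale value.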
Hi Steve, thanks for your talk and for the links. Are those repos pixie dependencies? Or are you saying that pixie works like those repos?
They are "toy" projects illustrative of the general concepts. ("toy" as in smaller in scope; one of the two is by a former colleague).
JPM's actual pixie code is proprietary, extremely performant after 12 years of pushing into bigger and bigger scale problems, and is definitely not on GitHub!
I really like this video for explaining the DAG - https://www.youtube.com/watch?v=lTOP_shhVBQ
Sounds like a descendant of Goldman Sachs‘s SecDB/Slang with automatic dependency graph building. Did it have purple children (nodes whose value influences the graph structure) and twiddle scopes (modified copies of subtrees)? :-)
Yes indeed. Quite a few of Athena's early developers were from GS.
I've not heard the term "twiddle scopes" before; in Athena it is "tweaking".
Come to think of it, it might have been a “diddle scope” (I’m not a native speaker).
For finance applications (risk) the whole concept works quite well, doesn’t it - combining the natural advantages of code and Excel.
It was indeed "diddle," not "twiddle." At BAML, it's a "tweak." Beacon Platform is another implementation by the same team (but with support for the public cloud, and many more advanced features, I believe, including tighter web integration). I think it uses the terminology "bind."
[Disclosure: I was part of the Quartz Core team in 2011/2012.]
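For anyone who has not seen tweaks/diddles, the rough idea is a temporary override of a node's value, scoped so that everything downstream sees the override and then reverts. A toy sketch follows, assuming a simple dict-based graph with no caching; the `Graph` class and its `tweak` method are hypothetical and are not the API of SecDB, Athena, Quartz or Beacon.

```python
from contextlib import contextmanager


class Graph:
    """Toy graph: inputs, formulas, and a stack-like set of overrides."""

    def __init__(self):
        self.inputs = {}
        self.formulas = {}  # name -> (function, dependency names)
        self.tweaks = {}    # name -> overriding value

    def value(self, name):
        # An override wins over both inputs and formulas.
        if name in self.tweaks:
            return self.tweaks[name]
        if name in self.inputs:
            return self.inputs[name]
        func, deps = self.formulas[name]
        return func(*(self.value(dep) for dep in deps))

    @contextmanager
    def tweak(self, **overrides):
        saved = dict(self.tweaks)
        self.tweaks.update(overrides)
        try:
            yield self
        finally:
            self.tweaks = saved  # leaving the scope restores the old state


g = Graph()
g.inputs["spot"] = 100.0
g.inputs["strike"] = 95.0
g.formulas["intrinsic"] = (lambda s, k: max(s - k, 0.0), ["spot", "strike"])

base = g.value("intrinsic")
with g.tweak(spot=90.0):
    tweaked = g.value("intrinsic")  # "what if spot were 90?"
after = g.value("intrinsic")
print(base, tweaked, after)  # 5.0 0.0 5.0
```

This is what makes scenario and risk calculations feel like Excel "what if" analysis: bump an input inside a scope, read off the result, and the real state is untouched.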
I was positively surprised that Slang was actually quite usable. I had expected much worse.
I suspect the Common Lisp influence was beneficial.
The weirdest thing, language-wise, that I noticed was scoping. It's neither dynamic nor static scoping, but weird scoping. (But there are workarounds to get something like static scoping.)
Outside of the language, the whole CVS-based version control and review process was weird. But understandable as a product of the late 1990s, when review-before-going-into-permanent-history must have been way ahead of its time.
> The weirdest thing, language-wise, that I noticed was scoping.
Not the spaces in variable names?
Those annoyed me immensely. I can feel my heart rate go up 10 BPM just thinking about it. I guess it was meant to make scripts more readable to minimally-techy people, but actually just made it harder to parse.
No, actually not. I found that quite refreshing and occasionally useful.
But I was used to different and exotic conventions from the obscure languages I played with over the years.
Slang was OK as a language. The SecDB/Slang ecosystem was years ahead of its time and all credit to its inventors and maintainers, but a monorepo full of decades of critical code from thousands of developers still gives me palpitations.
Just to add a bit of history, SecDB itself was an acquisition from J Aron, a commodity trading shop in the early 90s.
It was ubiquitous in GS by the mid-2000s, and was rather instrumental to GS navigating the crisis in 2008.
Being able to accurately and quickly compute risks across the entire firm's books, rather than manually merging across separate systems, was a key advantage, which encouraged JP, Citi and BofA to build a SecDB clone themselves.
https://www.goldmansachs.com/our-firm/history/moments/1993-s...
Man AHL also had a version that worked on top of nodes representing timeseries: https://github.com/man-group/mdf
Loman author here. Thank you very much for the mention. Amazed that I never heard of Athena or pixie graphs. Our intention with Loman was to create a library scoped for a single process - we looked at the possibility of creating a system responsible for executing much larger graphs on a real-time ongoing basis, but it felt like a larger project than we'd be able to execute well. It sounds like Athena was that, and it worked well, subject to being a culture shock for people coming into it?
A similar library from another asset manager - https://github.com/man-group/mdf. Although MDF seemed to work at the level of timeseries instead of scalar values.
> python2 and python3 in one process
This would have had me running screaming from the building
I stared at my screen with deer-headlight eyes for half a second while hearing screaming in my head too.
AAAAaaaaaa
> I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.
Imagine the benefit to the business had they just stuck with Smalltalk the whole time! It is remarkable how many businesses could have been quietly making money for years using a high-quality dynamic language such as Smalltalk or Lisp rather than going through the whole C/C++/Java/Python/Ruby/JavaScript/Go treadmill.
But I guess if they had done that then their programmers might not have gotten to type as many {}! Nor gotten to reinvent the wheel so often.
Kapital (the Smalltalk system) wasn't much of an influence on Athena. Athena was very much a SecDB/Slang derived effort.
And Kapital was an interesting experiment but rightfully died. Smalltalk just wasn't the right environment for something of that complexity and scale. I'd expect most lisps to suffer in the same way. And TBH it was more the image-based model than the dynamic language per se, though that was an issue too.
There's a point where dynamism becomes a liability and Kapital went way beyond that point. Python fares a bit better but still, a huge effort is needed to control its failings at scale. This isn't helped at all by sticking a DAG in the middle of it as was done with Athena and its ilk. This all can be made to work but in the "cos its Turing complete" sense rather than the language helping in any meaningful way.
Source: I suffered through it.
> This all can be made to work but in the "cos its Turing complete" sense rather than the language helping in any meaningful way.
I had to LOL. You have very nicely put into words an experience I bet many have had.
Kapital was also rewritten in Java and survived in a few successor places ('Derivatives Studio' at CSFB). There it was my introduction to 'thunks'.
> I'd expect most lisps to suffer in the same way.
Why? Eg Scheme or even the uglier Common Lisp are quite disciplined. And Clojure ain't too bad either.
As far as I can tell, Lispers don't usually go around monkey-patching things.
Elispers certainly do!
Oh, definitely. But no one in their right mind would use emacs lisp. Or any dynamically-by-default scoped Lisp.
And all the time they saved they could have used to train new devs, reinvent a(nother) new vcs, figure out a new way to deploy image-based software etc?
(I realize Smalltalk is great, but my point is it has its issues as well, otherwise I reckon sooner or later the advantages would be so clear that Amazon or Google or someone would be all over it since it would give them a competitive edge ...)
I don't know anything about Smalltalk, but one issue that was highlighted at that time was the inability to scale the number of "objects" to the required numbers (I can't remember the exact terminology, sorry). They used one of the Smalltalk providers at that time, and went way beyond the upper limit.
That was the official story anyway. A lot of things were claimed when I worked there, a lot of it BS.
The culture in banking was definitely not for me: generally fairly smart people, but a very perverse culture. This was the only environment I ever worked in where people would actually lie to my face to make sure I made mistakes when coding and failed. Everything was custom, and sometimes that meant Athena, sometimes that meant terrible systems such as a "distributed database" that was a single-threaded wrapper on top of a C++ STL hashmap running on NFS. The guy behind it was reworking his own protocol based on UDP instead of TCP because said system was too slow...
There was 0 abstraction, so if you wanted something such as the EUR/USD pair, you had to ask the one guy who knew which id it was, so that you could get the data from it in code. To this day I am convinced this was done on purpose for job security.
> I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.
That system you're thinking of was Kapital.
http://www.cincomsmalltalk.com/main/successes/financial-serv...
It's the only Smalltalk usage at JPM I believe, they could have doubled-down on it but instead went to Python and as you say, tried reinventing the wheel in it.
> tried reinventing the wheel in it.
That's not reinventing the wheel, that's building a new wheel you hope performs better/different with different materials and maybe some different techniques, and has a long and storied history of both successes and failures.
Reinventing the wheel would be if they published this system and some other bank recreated it without using what they published, thus "inventing" something that already existed and was available.
I never realised Kapital was written in smalltalk. Interesting
Apparently when it booted in the terminal, there was a nice ASCII portrait of Marx. I also have it on good authority that the program that launched the Smalltalk image from the terminal was called "das," so you literally would type "das kapital" to get the thing going.
> IIRC, it could manage list comprehension but not loops.
If it was like the predecessor system I worked with, it could handle either. But in a list comprehension, it could figure out the individual dependencies of the individual list items.
So, suppose you had a function that took in a list, and returned a list with some computations performed on each element. If done with a list comprehension, and one input element changed, only the corresponding output element would need to be recomputed, and only that would be recomputed.
If done with a loop, and one input element changed, the entire output array would be recomputed (even if all but one element remained unchanged - it would be hard to guarantee that property by code inspection at the time the dependency graph is built).
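The difference between the two cases can be sketched like this. It is only an illustration of the caching behaviour described above, not how the predecessor system or Athena implemented it; both class names are made up.

```python
class ElementwiseMap:
    """Comprehension-style: each output element depends on one input element."""

    def __init__(self, func, inputs):
        self.func = func
        self.inputs = list(inputs)
        self.cache = {}       # index -> cached output element
        self.evaluations = 0

    def set_input(self, index, value):
        self.inputs[index] = value
        self.cache.pop(index, None)  # invalidate only that element

    def outputs(self):
        result = []
        for i, x in enumerate(self.inputs):
            if i not in self.cache:
                self.cache[i] = self.func(x)
                self.evaluations += 1
            result.append(self.cache[i])
        return result


class WholeListMap:
    """Loop-style: the output list is one opaque value depending on the whole input."""

    def __init__(self, func, inputs):
        self.func = func
        self.inputs = list(inputs)
        self.cached = None
        self.evaluations = 0

    def set_input(self, index, value):
        self.inputs[index] = value
        self.cached = None  # any change invalidates everything

    def outputs(self):
        if self.cached is None:
            out = []
            for x in self.inputs:
                out.append(self.func(x))
                self.evaluations += 1
            self.cached = out
        return self.cached


def square(x):
    return x * x


fine = ElementwiseMap(square, [1, 2, 3])
coarse = WholeListMap(square, [1, 2, 3])
fine.outputs()
coarse.outputs()
fine.set_input(1, 5)
coarse.set_input(1, 5)
print(fine.outputs(), fine.evaluations)      # [1, 25, 9] 4
print(coarse.outputs(), coarse.evaluations)  # [1, 25, 9] 6
```

Same results either way; the element-wise version just does one extra evaluation after the change instead of three.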
Hi Steve, nice to see you on here and doing well! :)
My first thought was Athena as well, but it could be Quartz (it started in 08-09 at BaML, vs Athena starting in 2006).
Both Citi and Barcap also tried (albeit to less success than JPM/BaML) to do similar things.
With that said, for those unfamiliar - at least two major banks run insanely large production Python installs across thousands of developers. Core skills (nowadays) are Python and React - and yes, we’re all actively hiring :)
All of the large banks also have pretty good open source initiatives - just check out github repos for JPMorgan, BaML, or GS. We have also been active participants in Pycon (key sponsors and doing sessions, like Steve’s) since at least 2009.
This thread is a real trip down memory lane!
I joined JPMorgan in 2010, as the first bank in London to offer me a Python role. I ended up staying for 8 years, almost all of it working in Athena: Commodities and FX trading; a bit of Equities; then with the Athena Core team working on the machine learning environment.
I learned a ton, worked with many hugely impressive people - on both the trading and technology sides of the business - and left with lots of good memories.
I remember interacting with you at one point. Time flies!
A serious question, how do you feel about working at such a powerful bank that has been involved in so many controversies, to put it lightly?
It is not a far-fetched argument that institutes like that are only a burden to society, existing only to widen socio-economical gaps. The point of view of an insider would be very interesting I think.
Do you believe banks are a net negative to society? Or more about institutions that partake in trading and the like?
The word "bank" in the first sentence [0] links to the Wikipedia page for JP Morgan, so I'd wager it is Athena.
[0] "There was one remarkable legacy system when I was working in a bank sometime back."
This is really quite unlikely to be Athena, which isn't scheduled to complete python 3 migration until Q4 2020 according to:
https://www.techrepublic.com/article/jpmorgans-athena-has-35...
The update isn't a job for one person, and I don't think it is finished yet, unless it is ahead of this alleged schedule...
This is talking about "Athena Server Pages" which is a web framework within Athena.
touché
Ah! Not sure how you figured that out, but it would make sense and explain the "legacy" comments (Athena is not legacy as far as I know but a component like that could be).
As someone who has worked with these Python-based bank frameworks on the UI side, the UI frameworks leave a lot to be desired. Lots of reinventing the wheel (resulting in an inferior wheel).
Thankfully, both JPM and BAML seem to have started to ditch the UI frameworks, at least on the web side (in favor of React mostly). I don't know what the situation is on the desktop UI side. But I know what they had before was loathed by many.
Developers hated it because it was like working with one hand tied behind your back. Traders hated it because developers couldn't deliver the best product because they were working with one hand tied behind their back.
I presume the core teams responsible for building the framework hated it because trying to get support for the proprietary UI tools was painful.
I imagine hiring managers and HR hated it because they would blatantly lie about the job - that the role would be WinForms/WPF (desktop) or JavaScript (web) UI development, without mentioning that all of that was really wrapped up by the proprietary Python framework, and very rarely would you get to touch the underlying industry standard tech stacks.
I thought it sounded like Athena. It’s the best CI/CD I’ve ever used despite its quirks. I talk fondly of it at future jobs I’ve gone to
I've lost count of the number of internal projects I've seen named after mythological figures.
> I've lost count of the number of internal projects I've seen named after mythological figures.
The Athena team ran out of names and one major component of it is called Bob.
Is it the job scheduler?
From the discussion here, this sounds like a Python version of K's (built-in) trigger + dependency mechanism, which traces its origins to A+ and A; A and A+ were originally developed inside Morgan Stanley by Arthur Whitney, and K was developed for a bank client IIRC (UBS? or MS?).
This sounds really interesting; can you elaborate on it at all?
K4 uses : for assignment, and :: for dependency, so c::a+b
means c is defined to be the sum of a and b and dependent on both. If either a or b changes, c will be marked stale, and the next time you use it, it will be recomputed. It's essentially as simple as that, and is unified with database views - e.g. defining "total" with :: over the "accounts" table
means that any insert/update to the "accounts" table will mark "total" as stale, and the next use will recompute it, but otherwise it behaves like any other variable.
I'm not sure if the K4 engine is smart enough to avoid recompute on updates to other columns.
If you want more fine grained control, you can set up triggers - e.g. "execute this code on update to variable a" - where your code gets the changed indices of a list/vector (or changed tuples of a table) as arguments; this is useful if a plain complete recompute is too costly, or you need to do something at the time of change, rather than at the next evaluation.
K2 and A+ had a similar system, each with different syntax.
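For those who don't read K, the stale-marking and trigger behaviour described above can be mimicked in Python roughly as follows. This is only an analogy, not how K implements it; the `Workspace` class and its method names are invented for this sketch.

```python
class Workspace:
    """Variables, dependent definitions (views) and update triggers."""

    def __init__(self):
        self._values = {}
        self._views = {}     # name -> (function, dependency names)
        self._stale = set()
        self._triggers = {}  # name -> callbacks taking the changed indices

    def assign(self, name, value):
        """Plain assignment: replace the whole value."""
        self._values[name] = value
        self._mark_stale(name)
        for callback in self._triggers.get(name, []):
            callback(None)  # None means "everything changed"

    def amend(self, name, index, value):
        """Update one element of a list-valued variable."""
        self._values[name][index] = value
        self._mark_stale(name)
        for callback in self._triggers.get(name, []):
            callback([index])

    def define(self, name, func, deps):
        """Dependency definition: name is recomputed lazily from deps."""
        self._views[name] = (func, tuple(deps))
        self._stale.add(name)
        self._mark_stale(name)

    def on_update(self, name, callback):
        self._triggers.setdefault(name, []).append(callback)

    def is_stale(self, name):
        return name in self._stale

    def get(self, name):
        if name in self._views and name in self._stale:
            func, deps = self._views[name]
            self._values[name] = func(*(self.get(dep) for dep in deps))
            self._stale.discard(name)
        return self._values[name]

    def _mark_stale(self, changed):
        # Assumes the dependency graph is acyclic.
        for name, (_, deps) in self._views.items():
            if changed in deps:
                self._stale.add(name)
                self._mark_stale(name)


ws = Workspace()
ws.assign("a", 1)
ws.assign("b", 2)
ws.define("c", lambda a, b: a + b, ["a", "b"])
c_before = ws.get("c")           # 3
ws.assign("a", 10)
stale_after_assign = ws.is_stale("c")  # True: nothing recomputed yet
c_after = ws.get("c")            # 12, recomputed on use

changes = []
ws.assign("balances", [100, 200, 300])
ws.define("total", sum, ["balances"])
ws.on_update("balances", changes.append)
ws.amend("balances", 1, 250)
total = ws.get("total")          # 650
print(c_before, c_after, total, changes)  # 3 12 650 [[1]]
```

The dependency gives you "recompute everything on next use" for free, while the trigger hands you the changed indices so you can do something cheaper or more immediate.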
IIRC the folks who worked on SecDB at GS bounced around Wall St. recreating this abomination in Python.
And wasn't Athena JPMorgan's attempt at re-creating something like Slang and SecDB?
Yes. If I'm not mistaken it was from the same guy (and team?) that created Slang/SecDB at Goldman?
Same team subsequently moved onto Bank of America to create Quartz, which is BoA's version of the same concept.
Then subsequently moved on to found Washington Square Technologies, where I think they're doing something similar, but not attached to a specific bank? Not 100% sure.
I've heard the concept was shopped around to a few other banks as well, but did not take root.