Right. Having done this myself long ago, I can say that much of the performance improvement comes from just raw C code in the runtime. Making fewer string copies and the like is a big deal, especially on real programs. That's much of what you've seen in the first set of PHP 7 improvements.
Taking PHP from a bytecode interpreter to generated x86 means that wherever you don't end up "hopping out" of the generated code, you should see performance closer to that of C (the same is true for JavaScript, Python, etc.). Unlike a few years ago, though, PHP 7's scalar type hints and other slight tweaks (made on purpose!) should make this an even easier task than it was for, say, PHP 5.4. Type inference is incredibly effective, and in practice (see Paul Biggar's excellent thesis on this) there really isn't that much polymorphism in PHP programs. Parameters to functions usually have one, maybe two, types, or they're actually polymorphic and see 15+ variations. This is just the nature of what you're able to reason about ("Can I really handle this being an int, float, bool, string, array, function callback, or arbitrary class here? Of course not...").
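To make the "one, maybe two, types" point concrete, here is a minimal sketch in Python (not PHP, and not how any real PHP JIT is built). It assumes a hypothetical `ParamProfile` that records the argument types seen per parameter, plus a hypothetical `specialize_add` that hands back a guarded int-only fast path when the profile is monomorphic. The names and the monomorphic/bimorphic/polymorphic cutoffs are mine, purely for illustration.

```python
from collections import defaultdict


class ParamProfile:
    """Hypothetical profiler: records the argument types seen per parameter."""

    def __init__(self):
        self.seen = defaultdict(set)  # parameter index -> set of observed types

    def record(self, *args):
        for index, value in enumerate(args):
            self.seen[index].add(type(value))

    def classify(self, index):
        # Cutoffs are illustrative assumptions, not taken from any real runtime.
        count = len(self.seen.get(index, ()))
        if count == 0:
            return "unseen"
        if count == 1:
            return "monomorphic"
        if count == 2:
            return "bimorphic"
        return "polymorphic"


def generic_add(a, b):
    """Stand-in for the slow runtime path that handles any operand types."""
    return a + b


def specialize_add(profile):
    """Return an int-only fast path if the profile allows it, else the generic path."""
    if profile.seen.get(0) == {int} and profile.seen.get(1) == {int}:
        def add_int(a, b):
            # Guard: the specialized code is only valid for the types it was built for.
            if type(a) is int and type(b) is int:
                return a + b  # a real JIT would emit a plain machine add here
            return generic_add(a, b)  # guard failed: "hop out" to the runtime
        return add_int
    return generic_add


profile = ParamProfile()
for a, b in [(1, 2), (3, 4), (5, 6)]:
    profile.record(a, b)

add = specialize_add(profile)
print(profile.classify(0))  # monomorphic
print(add(2, 3))            # 5, via the guarded fast path
print(add("a", "b"))        # ab, via the generic fallback
```

The guard is the whole trick: as long as it holds you stay in the fast path, and each time it fails you pay for the trip back into the generic runtime code.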
That said, once you get the two of them working really well together (eliminating friction when calling from the generated machine code back into the runtime), you see additional benefits even on real programs. It's too bad they only report what the API endpoint did; that's relatively easy! (The full dashboard load was much more complex, with tons of crazy class loading and JSON serialization/deserialization, lots of Thrift RPCs, and plenty of memcache.)
Source: I've run the Tumblr dashboard in production briefly on a system with a JIT-based runtime ;)