llimllib 9 years ago

Here's a Google BigQuery query that lists the most common PDFs referenced in the GitHub sample dataset, along with the top 100 results: https://gist.github.com/llimllib/3f1877eab06208958060f491cf3...

It's possible to run this query against the full github dataset but I couldn't figure out how to pay for it, so if somebody wants to do that it would be excellent.

  • llimllib 9 years ago

    just a note: it's bizarre that I absolutely cannot find a way to determine a) how much it would cost to run or b) how I would pay for it if I wanted to run it

    • infogulch 9 years ago

      I changed it to query from [bigquery-public-data:github_repos.contents] instead, and before I execute the query it says "Valid: This query will process 1.68 TB when run.".

      Queries are $5/TB [0].

      So a bit less than 10 bucks. :)

      Edit: brb, that's totally worth it.

      [0]: https://cloud.google.com/bigquery/pricing
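
      The arithmetic above, as a minimal sketch. The $5/TB rate and the 1.68 TB figure are the ones quoted in this comment; the function name is made up for illustration, and current pricing may differ.

```python
# Rate quoted in the comment above; an assumption, not necessarily current pricing.
PRICE_PER_TB_USD = 5.00


def estimate_query_cost(tb_processed: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the on-demand cost in USD of a query that scans `tb_processed` terabytes."""
    return round(tb_processed * price_per_tb, 2)


# 1.68 TB is the size reported by the query validator in the comment above.
cost = estimate_query_cost(1.68)
print(f"Estimated cost: ${cost:.2f}")
```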

ape4 9 years ago

Since the Java source is open, it's all there to be peer-reviewed. If a paper it's based on isn't the best, you can make some noise about it. This is a good situation for Java.

lorenzhs 9 years ago

Some more found by a quick grep for "et al.", "Proceedings", "Proc. ", "Symposium", "Conference", "Conf. ", "PPoPP" (a conference with an easy-to-grep-for name), and "acm.org":

hotspot/src/cpu/ppc/vm/ppc.ad: See J.M.Tendler et al. "Power4 system microarchitecture", IBM J. Res. & Dev., No. 1, Jan. 2002.

hotspot/src/cpu/x86/vm/crc32c.h: V. Gopal et al. / Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction April 2011 8

hotspot/src/share/vm/gc/shared/taskqueue.hpp: Le, N. M., Pop, A., Cohen A., and Nardell, F. Z.: Correct and efficient work-stealing for weak memory models Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP 2013), 69-80

jdk/src/java.base/share/classes/java/util/Arrays.java: Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993

jdk/src/jdk.crypto.ec/share/native/libsunec/impl/mpmontg.c: "A Cryptographic Library for the Motorola DSP56000" by Stephen R. Dusse' and Burton S. Kaliski Jr. published in "Advances in Cryptology: Proceedings of EUROCRYPT '90", LNCS volume 473, 1991, pg 230-244

hotspot/src/share/vm/opto/superword.hpp: "Exploiting SuperWord Level Parallelism with Multimedia Instruction Sets" by Samuel Larsen and Saman Amarasinghe [...] published in ACM SIGPLAN Notices, Proceedings of ACM PLDI '00, Volume 35 Issue 5

jdk/src/java.base/share/classes/java/util/SplittableRandom.java: Leiserson, Schardl, and Sukha "Deterministic Parallel Random-Number Generation for Dynamic-Multithreading Platforms", PPoPP 2012

jdk/src/java.base/share/classes/java/util/SplittableRandom.java: "Parallel random numbers: as easy as 1, 2, 3" by Salmon, Moraes, Dror, and Shaw, SC 2011

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Dynamic Circular Work-Stealing Deque" by Chase and Lev, SPAA 2005

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Idempotent work stealing" by Michael, Saraswat, and Vechev, PPoPP 2009

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Leapfrogging: a portable technique for implementing efficient futures" by D.B. Wagner and B.G. Calder, PPoPP '93, http://dl.acm.org/citation.cfm?id=155354

jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: Using elimination to implement scalable and lock-free FIFO queues, Moir et al, http://portal.acm.org/citation.cfm?id=1074013

jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: "Bounding space usage of conservative garbage collectors", HJ Boehm, http://portal.acm.org/citation.cfm?doid=503272.503282 (this is the Boehm GC paper)

jdk/src/java.base/share/classes/java/util/concurrent/locks/StampedLock.java: Design, verification and applications of a new read-write lock algorithm, Shirako et al, SPAA 2012

hotspot/src/share/vm/opto/escape.hpp: Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, Sam Midkiff: "Escape Analysis for Java", Proceedings of ACM SIGPLAN OOPSLA Conference, November 1, 1999

hotspot/src/share/vm/runtime/os.cpp: Gilad Bracha and David Ungar: "Mirrors: Design Principles for Meta-level Facilities of Object-Oriented Programming Languages", in Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications, October 2004

jdk/src/jdk.crypto.ec/share/native/libsunec/impl/ec_naf.c: D. Hankerson, J. Hernandez and A. Menezes, "Software implementation of elliptic curve cryptography over binary fields", Proc. CHES 2000

jdk/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java: "Nonblocking Concurrent Objects with Condition Synchronization", by W. N. Scherer III and M. L. Scott. 18th Annual Conf. on Distributed Computing, Oct. 2004
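
For anyone who wants to repeat the search, here is a rough Python equivalent of that grep. It is only a sketch: the marker list is the one given above, while the function names and the set of file suffixes are assumptions chosen for illustration, not what was actually run.

```python
import re
from pathlib import Path

# Search terms listed at the top of this comment.
MARKERS = ["et al.", "Proceedings", "Proc. ", "Symposium",
           "Conference", "Conf. ", "PPoPP", "acm.org"]
# Escape each marker so "." is matched literally, as a fixed-string grep would.
PATTERN = re.compile("|".join(re.escape(m) for m in MARKERS))


def find_citation_lines(text):
    """Return (line_number, stripped_line) pairs for lines containing any marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_tree(root, suffixes=(".java", ".c", ".cpp", ".h", ".hpp", ".ad")):
    """Yield (path, line_number, line) for every matching line under `root`.

    The suffix list is a guess at the relevant source file types.
    """
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in find_citation_lines(text):
                yield path, number, line
```

Calling `scan_tree("hotspot/src")` from a source checkout and printing each tuple gives output in the same path-then-comment shape as the list above. Expect false positives, since words like "Conference" also appear in unrelated comments.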

  • AlexDenisov 9 years ago

    That's a lot :) Some of your findings are actually listed in the original article, but not all of them obviously.

    • lorenzhs 9 years ago

      Ah, sorry, I didn't really check for dupes---I just skipped the ones with a pdf link in the vicinity. I'm just glad that sometimes the clever things that academics churn out are actually used in practice. Far too rarely if you ask me, but I'm biased of course ;)

cagmz 9 years ago

I had to cite sources while implementing an artificial immune system (real valued negative selection and clonal selection algorithms). I read through a few papers for each algorithm and cited the clearest one as a source.

the8472 9 years ago

it would be great if it also mentioned which files the links were found in

  • hood_syntax 9 years ago

    Seconded! I really like this compilation. Very interesting to see the algorithms and data structures behind the implementation of a language, especially one of the more popular ones.

  • willismichael 9 years ago

    To be fair, sometimes code and comments get moved around, and any of us can use grep (or whatever other search tool you prefer) to find a specific link in the source.

  • AlexDenisov 9 years ago

    You can just grep by PDF name/url and find the code.

rawfael 9 years ago

Please, do it for the Linux source code.

  • bonzini 9 years ago

    About 99% of Linux (or even more) is drivers. But indeed there should be useful references in the scheduler, locking primitives, memory management and core networking code.

MrBuddyCasino 9 years ago

To be more precise, it's actually a list of scientific papers referenced in the OpenJDK source code.

  • qznc 9 years ago

    ... as direct pdf links found via grep.

    There might be more references without a pdf link.

    • nommm-nommm 9 years ago

      I'm surprised the author didn't search for DOI links.

      • verandaguy 9 years ago

        Asking as someone not as familiar with the research community as I'd like to be, what are DOI files, what advantages do they have over PDF/PostScript, and are they common?

        • MacsHeadroom 9 years ago

          DOI isn't a file format. It's an object identifier for papers like ISBN is for books.

        • nommm-nommm 9 years ago

          It's not a file format, it's a digital identifier. The APA can explain it better than I can:

          "A digital object identifier (DOI) is a unique alphanumeric string assigned by a registration agency (the International DOI Foundation) to identify content and provide a persistent link to its location on the Internet. The publisher assigns a DOI when your article is published and made available electronically."

          So you can access a journal article by going to http://dx.doi.org/DOI-GOES-HERE. doi.org doesn't host files; it just resolves each DOI to the current, correct location. For example, the DOI for the first paper in TFA is 10.1007/11427186_42, so it can be accessed at http://dx.doi.org/10.1007/11427186_42

          If you know the DOI, you know where to find the paper.
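
          The URL construction described above can be sketched in a few lines of Python. The resolver base is the one used in this comment; the function name and the "10." prefix check are illustrative assumptions, not part of any official client.

```python
from urllib.parse import quote

# Resolver base as written in the comment above.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI such as '10.1007/11427186_42'."""
    doi = doi.strip()
    # DOI prefixes start with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.1007/11427186_42"))
```

          The resolver then redirects to wherever the publisher currently hosts the article, so the link keeps working even if the paper moves.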

  • sctb 9 years ago

    Thanks, we've updated the title to clarify.

  • rosstex 9 years ago

    Are you saying you can't run a PDF? :)