Gensim is a very cool python2, numpy-based vector space modelling (information retrieval) framework. It does the job in a straightforward way, and it has been a great project for me to learn python with because it uses some nice tricks in real life scenarios (like Generators) and is AFAICT elegantly coded. Sometimes I find it hard to believe how much functionality can be crammed in so few lines of (readable) code.
But anyway we're having some issues in it with cPickle (it breaks when saving large matrices, it breaks with some objects). For now I worked around it by using jsonpickle but I wonder how viable this alternative really is.
To give at least a crude idea of performance characteristics of different pickle methods, I wrote a very simple benchmark program - picklebench - to compare pickle, cPickle and jsonpickle. The script fills a dictionary which gets bigger and bigger, and for certain sizes of dictionary it is saved to, and loaded from disk again. We measure some metrics of each step. We continue until memory is exhausted.
limitations of this benchmark:Testing JsonPickle == 1000 == Stored in 0.01s. file size 0.05 MB. Speed 6.56 MB/s. RSS taken 0.23 MB. (4.34 MB per MB in file) Loaded in 0.01s. Speed 9.30 MB/s. RSS taken 0.04 MB. (0.71 MB per MB in file) == 4000 == Stored in 0.03s. file size 0.21 MB. Speed 6.93 MB/s. RSS taken 0.45 MB. (2.16 MB per MB in file) Loaded in 0.02s. Speed 9.54 MB/s. RSS taken 0.47 MB. (2.23 MB per MB in file) == 16000 == Stored in 0.13s. file size 0.85 MB. Speed 6.80 MB/s. RSS taken 0.86 MB. (1.00 MB per MB in file) Loaded in 0.09s. Speed 9.55 MB/s. RSS taken 0.88 MB. (1.04 MB per MB in file) == 64000 == Stored in 0.50s. file size 3.44 MB. Speed 6.87 MB/s. RSS taken 4.02 MB. (1.17 MB per MB in file) Loaded in 0.36s. Speed 9.44 MB/s. RSS taken 2.52 MB. (0.73 MB per MB in file) == 256000 == Stored in 2.10s. file size 13.97 MB. Speed 6.64 MB/s. RSS taken 17.24 MB. (1.23 MB per MB in file) Loaded in 1.53s. Speed 9.15 MB/s. RSS taken 8.49 MB. (0.61 MB per MB in file) == 1024000 == Stored in 8.60s. file size 56.23 MB. Speed 6.54 MB/s. RSS taken 61.64 MB. (1.10 MB per MB in file) Loaded in 6.27s. Speed 8.97 MB/s. RSS taken 95.99 MB. (1.71 MB per MB in file) == 4096000 == Stored in 38.80s. file size 228.26 MB. Speed 5.88 MB/s. RSS taken 181.83 MB. (0.80 MB per MB in file) Loaded in 25.17s. Speed 9.07 MB/s. RSS taken 170.84 MB. (0.75 MB per MB in file) == 16384000 == Testing cPickle Protocol 0 == 1000 == Stored in 0.01s. file size 0.01 MB. Speed 0.83 MB/s. RSS taken 0.05 MB. (5.95 MB per MB in file) Loaded in 0.00s. Speed 9.02 MB/s. RSS taken 0.05 MB. (5.04 MB per MB in file) == 4000 == Stored in 0.00s. file size 0.04 MB. Speed 8.69 MB/s. RSS taken 0.10 MB. (2.52 MB per MB in file) Loaded in 0.00s. Speed 15.43 MB/s. RSS taken 0.13 MB. (3.26 MB per MB in file) == 16000 == Stored in 0.01s. file size 0.17 MB. Speed 11.06 MB/s. RSS taken 0.13 MB. (0.79 MB per MB in file) Loaded in 0.01s. Speed 16.82 MB/s. RSS taken 0.10 MB. (0.60 MB per MB in file) == 64000 == Stored in 0.06s. file size 0.69 MB. Speed 12.43 MB/s. RSS taken 0.12 MB. (0.17 MB per MB in file) Loaded in 0.04s. Speed 17.25 MB/s. RSS taken 0.77 MB. (1.11 MB per MB in file) == 256000 == Stored in 0.22s. file size 2.96 MB. Speed 13.65 MB/s. RSS taken 0.18 MB. (0.06 MB per MB in file) Loaded in 0.17s. Speed 17.49 MB/s. RSS taken 3.09 MB. (1.04 MB per MB in file) == 1024000 == Stored in 0.88s. file size 12.20 MB. Speed 13.93 MB/s. RSS taken 0.16 MB. (0.01 MB per MB in file) Loaded in 0.69s. Speed 17.72 MB/s. RSS taken 12.38 MB. (1.01 MB per MB in file) == 4096000 == Stored in 3.52s. file size 52.14 MB. Speed 14.80 MB/s. RSS taken 0.05 MB. (0.00 MB per MB in file) Loaded in 2.84s. Speed 18.37 MB/s. RSS taken 49.55 MB. (0.95 MB per MB in file) == 16384000 == Stored in 12.68s. file size 218.27 MB. Speed 17.22 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 11.59s. Speed 18.82 MB/s. RSS taken 198.20 MB. (0.91 MB per MB in file) Testing cPickle Protocol 1 == 1000 == Stored in 0.00s. file size 0.00 MB. Speed 9.65 MB/s. RSS taken 0.05 MB. (11.10 MB per MB in file) Loaded in 0.00s. Speed 9.31 MB/s. RSS taken 0.06 MB. (12.81 MB per MB in file) == 4000 == Stored in 0.00s. file size 0.02 MB. Speed 10.95 MB/s. RSS taken 0.10 MB. (4.95 MB per MB in file) Loaded in 0.00s. Speed 11.15 MB/s. RSS taken 0.12 MB. (6.19 MB per MB in file) == 16000 == Stored in 0.02s. file size 0.08 MB. Speed 5.26 MB/s. RSS taken 0.13 MB. (1.64 MB per MB in file) Loaded in 0.01s. Speed 11.77 MB/s. RSS taken 0.09 MB. (1.13 MB per MB in file) == 64000 == Stored in 0.03s. file size 0.32 MB. Speed 12.60 MB/s. RSS taken 0.12 MB. (0.37 MB per MB in file) Loaded in 0.03s. Speed 11.48 MB/s. RSS taken 0.78 MB. (2.43 MB per MB in file) == 256000 == Stored in 0.10s. file size 1.66 MB. Speed 17.00 MB/s. RSS taken 0.18 MB. (0.11 MB per MB in file) Loaded in 0.11s. Speed 14.50 MB/s. RSS taken 3.10 MB. (1.86 MB per MB in file) == 1024000 == Stored in 0.40s. file size 7.04 MB. Speed 17.76 MB/s. RSS taken 0.16 MB. (0.02 MB per MB in file) Loaded in 0.45s. Speed 15.54 MB/s. RSS taken 12.39 MB. (1.76 MB per MB in file) == 4096000 == Stored in 1.59s. file size 28.55 MB. Speed 17.95 MB/s. RSS taken 0.05 MB. (0.00 MB per MB in file) Loaded in 1.88s. Speed 15.20 MB/s. RSS taken 49.55 MB. (1.74 MB per MB in file) == 16384000 == Stored in 6.23s. file size 114.59 MB. Speed 18.40 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 7.18s. Speed 15.96 MB/s. RSS taken 198.21 MB. (1.73 MB per MB in file) Testing cPickle Protocol 2 == 1000 == Stored in 0.00s. file size 0.00 MB. Speed 9.54 MB/s. RSS taken 0.05 MB. (11.10 MB per MB in file) Loaded in 0.00s. Speed 8.92 MB/s. RSS taken 0.06 MB. (12.81 MB per MB in file) == 4000 == Stored in 0.00s. file size 0.02 MB. Speed 13.25 MB/s. RSS taken 0.10 MB. (4.95 MB per MB in file) Loaded in 0.00s. Speed 11.39 MB/s. RSS taken 0.12 MB. (6.19 MB per MB in file) == 16000 == Stored in 0.01s. file size 0.08 MB. Speed 13.85 MB/s. RSS taken 0.13 MB. (1.64 MB per MB in file) Loaded in 0.01s. Speed 10.77 MB/s. RSS taken 0.09 MB. (1.13 MB per MB in file) == 64000 == Stored in 0.02s. file size 0.32 MB. Speed 13.99 MB/s. RSS taken 0.12 MB. (0.37 MB per MB in file) Loaded in 0.03s. Speed 11.45 MB/s. RSS taken 0.78 MB. (2.43 MB per MB in file) == 256000 == Stored in 0.10s. file size 1.66 MB. Speed 16.81 MB/s. RSS taken 0.18 MB. (0.11 MB per MB in file) Loaded in 0.12s. Speed 14.31 MB/s. RSS taken 3.10 MB. (1.86 MB per MB in file) == 1024000 == Stored in 0.37s. file size 7.04 MB. Speed 19.15 MB/s. RSS taken 0.16 MB. (0.02 MB per MB in file) Loaded in 0.45s. Speed 15.55 MB/s. RSS taken 12.39 MB. (1.76 MB per MB in file) == 4096000 == Stored in 1.58s. file size 28.55 MB. Speed 18.09 MB/s. RSS taken 0.05 MB. (0.00 MB per MB in file) Loaded in 1.83s. Speed 15.61 MB/s. RSS taken 49.55 MB. (1.74 MB per MB in file) == 16384000 == Stored in 6.02s. file size 114.59 MB. Speed 19.04 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 7.32s. Speed 15.65 MB/s. RSS taken 198.21 MB. (1.73 MB per MB in file) Testing pickle Protocol 0 == 1000 == Stored in 0.01s. file size 0.01 MB. Speed 1.17 MB/s. RSS taken 0.03 MB. (3.21 MB per MB in file) Loaded in 0.04s. Speed 0.20 MB/s. RSS taken 0.06 MB. (6.41 MB per MB in file) == 4000 == Stored in 0.02s. file size 0.04 MB. Speed 1.91 MB/s. RSS taken 0.09 MB. (2.42 MB per MB in file) Loaded in 0.02s. Speed 2.12 MB/s. RSS taken 0.12 MB. (3.15 MB per MB in file) == 16000 == Stored in 0.08s. file size 0.17 MB. Speed 2.00 MB/s. RSS taken 0.13 MB. (0.79 MB per MB in file) Loaded in 0.07s. Speed 2.27 MB/s. RSS taken 0.09 MB. (0.55 MB per MB in file) == 64000 == Stored in 0.33s. file size 0.69 MB. Speed 2.08 MB/s. RSS taken 0.12 MB. (0.18 MB per MB in file) Loaded in 0.29s. Speed 2.40 MB/s. RSS taken 0.77 MB. (1.11 MB per MB in file) == 256000 == Stored in 1.34s. file size 2.96 MB. Speed 2.21 MB/s. RSS taken 0.18 MB. (0.06 MB per MB in file) Loaded in 1.17s. Speed 2.53 MB/s. RSS taken 3.09 MB. (1.04 MB per MB in file) == 1024000 == Stored in 5.33s. file size 12.20 MB. Speed 2.29 MB/s. RSS taken 0.16 MB. (0.01 MB per MB in file) Loaded in 4.73s. Speed 2.58 MB/s. RSS taken 12.38 MB. (1.01 MB per MB in file) == 4096000 == Stored in 21.23s. file size 52.14 MB. Speed 2.46 MB/s. RSS taken 0.05 MB. (0.00 MB per MB in file) Loaded in 18.63s. Speed 2.80 MB/s. RSS taken 49.55 MB. (0.95 MB per MB in file) == 16384000 == Stored in 85.09s. file size 218.27 MB. Speed 2.57 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 74.60s. Speed 2.93 MB/s. RSS taken 198.20 MB. (0.91 MB per MB in file) Testing pickle Protocol 1 == 1000 == Stored in 0.01s. file size 0.00 MB. Speed 0.80 MB/s. RSS taken 0.06 MB. (11.96 MB per MB in file) Loaded in 0.00s. Speed 1.56 MB/s. RSS taken 0.07 MB. (14.52 MB per MB in file) == 4000 == Stored in 0.02s. file size 0.02 MB. Speed 0.84 MB/s. RSS taken 0.10 MB. (4.95 MB per MB in file) Loaded in 0.01s. Speed 1.65 MB/s. RSS taken 0.13 MB. (6.61 MB per MB in file) == 16000 == Stored in 0.09s. file size 0.08 MB. Speed 0.85 MB/s. RSS taken 0.13 MB. (1.64 MB per MB in file) Loaded in 0.05s. Speed 1.68 MB/s. RSS taken 0.09 MB. (1.13 MB per MB in file) == 64000 == Stored in 0.38s. file size 0.32 MB. Speed 0.85 MB/s. RSS taken 0.10 MB. (0.32 MB per MB in file) Loaded in 0.19s. Speed 1.67 MB/s. RSS taken 0.80 MB. (2.51 MB per MB in file) == 256000 == Stored in 1.49s. file size 1.66 MB. Speed 1.11 MB/s. RSS taken 0.19 MB. (0.11 MB per MB in file) Loaded in 0.75s. Speed 2.21 MB/s. RSS taken 3.13 MB. (1.88 MB per MB in file) == 1024000 == Stored in 6.11s. file size 7.04 MB. Speed 1.15 MB/s. RSS taken 0.16 MB. (0.02 MB per MB in file) Loaded in 2.99s. Speed 2.35 MB/s. RSS taken 12.45 MB. (1.77 MB per MB in file) == 4096000 == Stored in 24.40s. file size 28.55 MB. Speed 1.17 MB/s. RSS taken 0.06 MB. (0.00 MB per MB in file) Loaded in 11.97s. Speed 2.39 MB/s. RSS taken 49.74 MB. (1.74 MB per MB in file) == 16384000 == Stored in 97.62s. file size 114.59 MB. Speed 1.17 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 48.04s. Speed 2.39 MB/s. RSS taken 198.88 MB. (1.74 MB per MB in file) Testing pickle Protocol 2 == 1000 == Stored in 0.01s. file size 0.00 MB. Speed 0.78 MB/s. RSS taken 0.06 MB. (11.96 MB per MB in file) Loaded in 0.00s. Speed 1.57 MB/s. RSS taken 0.07 MB. (14.52 MB per MB in file) == 4000 == Stored in 0.02s. file size 0.02 MB. Speed 0.85 MB/s. RSS taken 0.10 MB. (4.95 MB per MB in file) Loaded in 0.01s. Speed 1.52 MB/s. RSS taken 0.13 MB. (6.60 MB per MB in file) == 16000 == Stored in 0.09s. file size 0.08 MB. Speed 0.85 MB/s. RSS taken 0.13 MB. (1.64 MB per MB in file) Loaded in 0.05s. Speed 1.66 MB/s. RSS taken 0.09 MB. (1.18 MB per MB in file) == 64000 == Stored in 0.38s. file size 0.32 MB. Speed 0.85 MB/s. RSS taken 0.10 MB. (0.31 MB per MB in file) Loaded in 0.19s. Speed 1.65 MB/s. RSS taken 0.80 MB. (2.51 MB per MB in file) == 256000 == Stored in 1.52s. file size 1.66 MB. Speed 1.09 MB/s. RSS taken 0.19 MB. (0.11 MB per MB in file) Loaded in 0.76s. Speed 2.18 MB/s. RSS taken 3.13 MB. (1.88 MB per MB in file) == 1024000 == Stored in 6.19s. file size 7.04 MB. Speed 1.14 MB/s. RSS taken 0.16 MB. (0.02 MB per MB in file) Loaded in 3.01s. Speed 2.34 MB/s. RSS taken 12.45 MB. (1.77 MB per MB in file) == 4096000 == Stored in 24.60s. file size 28.55 MB. Speed 1.16 MB/s. RSS taken 0.06 MB. (0.00 MB per MB in file) Loaded in 12.06s. Speed 2.37 MB/s. RSS taken 49.74 MB. (1.74 MB per MB in file) == 16384000 == Stored in 98.38s. file size 114.59 MB. Speed 1.16 MB/s. RSS taken 0.19 MB. (0.00 MB per MB in file) Loaded in 47.89s. Speed 2.39 MB/s. RSS taken 198.88 MB. (1.74 MB per MB in file)Paraphrased:
posted on Thursday, 16 Jun 2011 22:32 - link - tags: foss, python - path: / - 0 comments