I have an extremely large list of numerical values in `numpy.float64` format, and I want to convert each `inf` value to 0.0 and parse the remaining elements to plain `float`.
This is my code, which works correctly:
```python
# Values in numpy.float64 format.
original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]
# Convert them
parsed_values = [0.0 if x == float("inf") else float(x) for x in original_values]
```
But this is slow. Is there any way to make this code faster, perhaps with some NumPy magic (I have no experience with these libraries)?
Since you're asking how to do this faster with NumPy, the quick answer is to turn the list into a NumPy array and do the replacement the NumPy way:
```python
import numpy as np

original_values = [np.float64("Inf"), ..., np.float64(0.2334)]
arr = np.array(original_values)
arr[arr == np.inf] = 0
```
`arr == np.inf` returns a boolean array that looks like `array([ True, ..., False])`, which can be used to select the matching indices in `arr` as shown above.
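One caveat worth hedging: comparing against `np.inf` only matches positive infinity. If your data could also contain `-inf`, `np.isinf` flags both signs. A minimal sketch (the sample values here are made up for illustration):

```python
import numpy as np

# np.isinf flags both +inf and -inf, whereas a plain
# `arr == np.inf` comparison only flags positive infinity.
arr = np.array([np.inf, 0.02345, -np.inf, 0.2334])
arr[np.isinf(arr)] = 0.0
# arr is now [0.0, 0.02345, 0.0, 0.2334]
```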
Hope it helps.
I tested a bit, and it should be fast enough:
```python
import timeit

import numpy as np

# Create a huge array and set 1,000,000 random entries to inf
arr = np.random.random(1000000000)
idx = np.random.randint(0, high=1000000000, size=1000000)
arr[idx] = np.inf

# Time the replacement
def replace_inf_with_0(arr=arr):
    arr[arr == np.inf] = 0

timeit.Timer(replace_inf_with_0).timeit(number=1)
```
The output says it takes about 1.5 seconds to turn all 1,000,000 `inf` values into 0s in a 1,000,000,000-element array.
You can call `arr.tolist()` at the end to convert the array back to a list for MongoDB, which should be the common way. I tried it with the billion-element array: the conversion took about 30 seconds, while creating the array itself took less than 10 seconds. So, feel free to recommend more efficient methods.
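As an alternative to mask assignment, `np.nan_to_num` can do the same replacement in one call; this is just a hedged sketch, not part of the timing test above, and it also replaces NaN with 0.0 by default:

```python
import numpy as np

# posinf/neginf set the substitution values for +/- infinity;
# tolist() then yields plain Python floats, ready for MongoDB.
values = np.array([np.inf, 0.02345, 0.2334])
cleaned = np.nan_to_num(values, posinf=0.0, neginf=0.0)
result = cleaned.tolist()  # [0.0, 0.02345, 0.2334]
```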
Answered By – Q. Yu