Extracting numbers with decimal points from text extracted from PDF files

Issue

I need to extract only the numbers that have a decimal point from the following string. I used the re module, but I ran into a problem with the number of commas (a number can have no commas, or more than one). Another problem is that decimal numbers can be immediately followed by words (e.g. 1,513,971.63Savings). Because I extracted the string from PDF files, I can’t change the format.

Sample string:

Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy

Expected output:

19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15
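
To make the comma problem concrete, here is what a naive pattern returns on a shortened piece of the sample. The question does not show the pattern that was actually tried, so `\d+\.\d+` is an assumption for illustration: because it knows nothing about thousands separators, it keeps only the part of each amount after the last comma.

```python
import re

sample = "DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving"

# Naive pattern (assumed for illustration): digits, a decimal point, digits.
# Commas break the digit run, so only the tail of each amount is captured.
naive = re.findall(r"\d+\.\d+", sample)
print(naive)  # ['700.86', '799.38']
```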

Can anyone help?

Solution

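Use the (\d+(?:[,.]\d+)+) pattern to get the expected result. It matches a run of digits followed by one or more groups of a comma or period plus more digits, so it accepts any number of commas and stops at the first letter after the number (e.g. the "S" in 1,513,971.63Savings).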
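
The same pattern can be written with `re.VERBOSE` so that each part carries a comment. This is only a restatement of the answer's regex for readability, not a different technique; the name `PATTERN` is hypothetical.

```python
import re

# The answer's pattern r"(\d+(?:[,.]\d+)+)", spread out with re.VERBOSE
# (whitespace and #-comments outside character classes are ignored).
PATTERN = re.compile(
    r"""
    (               # capture the whole number
      \d+           # a leading run of digits
      (?:           # then, one or more times:
        [,.]        #   a comma or a decimal point
        \d+         #   followed by more digits
      )+
    )
    """,
    re.VERBOSE,
)

print(PATTERN.findall("1,513,971.63Savings"))  # ['1,513,971.63']
```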

import re

string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""

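# Match a run of digits followed by one or more "comma or period + digits" groups,
# then print each match on its own line.
print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")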
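
Note that the pattern above does not actually require a decimal point: it also matches values such as 1,000 or dates like 01.03.2022. If only numbers with a decimal point are wanted, a stricter sketch is below. It assumes "," is always a thousands separator in groups of three and "." is the decimal point; the names `AMOUNT_RE`, `extract_amounts` and `to_decimal` are hypothetical helpers, not part of the original answer.

```python
import re
from decimal import Decimal

# Assumption: "," separates thousands (groups of exactly three digits) and a
# "." followed by at least one digit is mandatory. The lookbehind prevents a
# match from starting in the middle of a longer number.
AMOUNT_RE = re.compile(r"(?<![\d,.])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d+")


def extract_amounts(text):
    """Return every decimal amount found in text, as strings."""
    return AMOUNT_RE.findall(text)


def to_decimal(amount):
    """Convert an extracted amount such as '1,125,990.66' to a Decimal."""
    return Decimal(amount.replace(",", ""))


sample = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 350,745,799.38Saving "
    "Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"
)

amounts = extract_amounts(sample)
print(*amounts, sep="\n")
# Decimal avoids the rounding errors float would introduce when summing money.
print(sum(to_decimal(a) for a in amounts))
```

One design consequence: if a PDF line ever runs an account number straight into an amount with no space between them, this pattern skips that amount rather than returning a truncated value.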

Answered By – Artyom Vancyan

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
