Image Processing and Text Recognition: Python Pillow & Tesseract

Pillow

https://pillow.readthedocs.io/en/stable/installation.html


> pip install pillow

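from PIL import Image, ImageFilter

# Open the source image, apply a Gaussian blur (default radius 2),
# then save the result and display it.
kitten = Image.open("kitten.jpg")
blurryKitten = kitten.filter(ImageFilter.GaussianBlur())
blurryKitten.save("kitten_blurred.jpg")
blurryKitten.show()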
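GaussianBlur also takes an explicit radius. The following is a minimal sketch of how the radius changes the result. It assumes no kitten.jpg is on hand, so it draws a synthetic test image instead; the image contents and the radius values 1 and 5 are illustrative assumptions, not part of the original example.

```python
from PIL import Image, ImageDraw, ImageFilter

# Synthetic stand-in for kitten.jpg: a black square on a white background.
image = Image.new("L", (100, 100), color=255)
ImageDraw.Draw(image).rectangle([30, 30, 69, 69], fill=0)

# A larger radius spreads each pixel over a wider neighborhood.
slightly_blurred = image.filter(ImageFilter.GaussianBlur(radius=1))
heavily_blurred = image.filter(ImageFilter.GaussianBlur(radius=5))

# The sharp image holds only pure black and pure white pixels;
# blurring introduces intermediate gray levels along the edges.
print(len(image.getcolors()), "gray levels before blurring")
print(len(heavily_blurred.getcolors()), "gray levels after blurring")
```

Because blurring softens edges, it is usually something to avoid or undo before OCR rather than apply.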

Tesseract

An OCR (optical character recognition) library

https://tesseract-ocr.github.io/


https://github.com/tesseract-ocr/tessdoc


> pip install numpy

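Conditions for processing well-formatted text (some can be met through preprocessing)

The text must be written in a single standard font. Handwriting, cursive, and decorative fonts are excluded.

If the image is a photocopy or a photograph, the lines must be crisp. Copy degradation and heavily darkened areas are excluded.

The text must be horizontally aligned, with no slanted characters.

The text must not run off the image or be cut off at the image edges.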
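Some of these conditions can be approximated with Pillow before handing the image to Tesseract. The sketch below is one possible preprocessing pass, not a definitive recipe: `preprocess_for_ocr` and its parameters are hypothetical names introduced for illustration, the threshold of 143 is borrowed from the cleanFile example in this post, and the rotation angle is assumed to be known in advance (estimating skew is out of scope here).

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(image, threshold=143, rotate_degrees=0.0, border=10):
    """Return a grayscale, optionally rotated, binarized, padded copy of `image`.

    rotate_degrees is passed straight to Image.rotate (counterclockwise).
    """
    # Convert to grayscale so a single threshold applies to every pixel.
    gray = ImageOps.grayscale(image)

    # Straighten slanted text; expand=True keeps the corners in frame and
    # fillcolor=255 paints the newly exposed area white.
    if rotate_degrees:
        gray = gray.rotate(rotate_degrees, expand=True, fillcolor=255)

    # Binarize: darker than the threshold becomes black, the rest white.
    binary = gray.point(lambda value: 0 if value < threshold else 255)

    # Add a white margin so no text sits flush against the image edge.
    return ImageOps.expand(binary, border=border, fill=255)


# Synthetic sample: dark gray left half, light gray right half.
sample = Image.new("L", (50, 20), color=200)
sample.paste(100, (0, 0, 25, 20))

cleaned = preprocess_for_ocr(sample)
print(cleaned.size)  # (70, 40): the original 50x20 plus a 10-pixel border
```

Font problems such as handwriting or decorative typefaces cannot be fixed this way; preprocessing only helps with contrast, alignment, and margins.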

https://hanbit.co.kr/support/supplement_survey.html?pcode=B7159663510


(Windows)

> tesseract text.tif textoutput & type textoutput.txt

from PIL import Image
import subprocess

def cleanFile(filePath, newFilePath):
    image = Image.open(filePath)

    # Apply a gray threshold (values below 143 become black, the rest white) and save the image.
    image = image.point(lambda x: 0 if x < 143 else 255)
    image.save(newFilePath)

    # Run Tesseract on the newly created image; it writes its result to output.txt.
    subprocess.call(["tesseract", newFilePath, "output"])

    # Open the resulting text file and print its contents.
    with open("output.txt", "r", encoding="utf-8") as outputFile:
        print(outputFile.read())

cleanFile("text_2.png", "text_2_clean.png")
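The same idea can be written a little more defensively. The sketch below is a variation under stated assumptions, not the original author's method: `binarize` and `ocr_with_threshold` are hypothetical helper names, the `tesseract` executable is assumed to be on PATH, and its output file is assumed to be UTF-8 text. It works in a temporary directory so no stray output.txt is left behind, and it raises an error if Tesseract exits with a failure code.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

from PIL import Image


def binarize(image, threshold=143):
    """Return a grayscale copy where every pixel is either 0 or 255."""
    return image.convert("L").point(lambda value: 0 if value < threshold else 255)


def ocr_with_threshold(image_path, threshold=143):
    """Binarize the image, run the tesseract CLI on it, and return the text.

    Returns None when no tesseract executable is found on PATH.
    """
    if shutil.which("tesseract") is None:
        return None

    with tempfile.TemporaryDirectory() as work_dir:
        cleaned_path = Path(work_dir) / "cleaned.png"
        output_base = Path(work_dir) / "output"

        with Image.open(image_path) as image:
            binarize(image, threshold).save(cleaned_path)

        # tesseract appends ".txt" to the output base name it is given.
        subprocess.run(
            ["tesseract", str(cleaned_path), str(output_base)],
            check=True,           # raise CalledProcessError on failure
            capture_output=True,  # keep tesseract's own messages off the console
        )
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling `ocr_with_threshold("text_2.png")` would mirror the cleanFile call above, returning the recognized text instead of printing it.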

License: Attribution, NonCommercial, NoDerivatives
