Let's Hand It a Translation Job

Still, surely there is something this underpowered AI can do well? The first thing that came to mind was translation. It is a very simple task with a long history.

Translating a sentence

llm_trans.py

from langchain_community.llms.gpt4all import GPT4All
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

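# Load the local GGUF model through GPT4All, using 8 CPU threads
model = GPT4All(model="model/mistral-7b-openorca.Q4_0.gguf", n_threads=8)

# Prompt with a single {sentence} placeholder
template = """
Translate this Korean sentence to English: {sentence}

Print only the result sentence.
"""
prompt = PromptTemplate(template=template, input_variables=["sentence"])

# Note: LLMChain is marked deprecated in newer LangChain releases
llm_chain = LLMChain(prompt=prompt, llm=model, verbose=True)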


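def translate_ai(sentence: str) -> str:
    """Translate one Korean sentence to English and return the model's raw text."""
    result = llm_chain.invoke({
        "sentence": sentence
    })
    # LLMChain returns a dict; the generated text is stored under "text"
    return result["text"]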
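
To see exactly what text the model receives, the placeholder can be filled in by hand. This is a minimal sketch, not part of llm_trans.py: `render_prompt` is a hypothetical helper, and it assumes `PromptTemplate` is left on its default f-string style formatting, in which case plain `str.format` should produce the same prompt for this template.

```python
# Same template text as in llm_trans.py.
template = """
Translate this Korean sentence to English: {sentence}

Print only the result sentence.
"""


def render_prompt(sentence: str) -> str:
    """Hypothetical helper: fill the {sentence} placeholder with str.format."""
    return template.format(sentence=sentence)
```

Calling `print(render_prompt("안녕하세요."))` shows the full prompt, including the leading and trailing newlines that come from the triple-quoted string.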

I wrote llm_trans.py as shown above. Let's give it a run.

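The phrasing comes out a little odd in context, but it looks usable. I happened to have some Markdown documents that needed translating, so I decided to put it to work.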

Translating a file

file_trans.py

import os
import glob
from func.md_preprocess import md_preprocess
from func.llm_trans import translate_ai

root_path = os.getcwd()
os.makedirs("result", exist_ok=True)

for file_path in glob.glob("raw/*.md"):
    absolute_file_path = os.path.join(root_path, file_path)

    # Read the whole document; read-only mode is enough here
    with open(absolute_file_path, "r", encoding="utf8") as f:
        docs = f.read()

    # Pre-process file
    sentences = md_preprocess(absolute_file_path)
    sentence_count = len(sentences)
    
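    # Translate each sentence and keep the original next to it as an HTML comment
    for idx, sentence in enumerate(sentences):
        translated = translate_ai(sentence).strip()
        replace_sentence = f'{translated}\n<!--Original: {sentence}-->'
        # Note: str.replace swaps every occurrence of the sentence in the document
        docs = docs.replace(sentence, replace_sentence)
        print(translated, flush=True)
        print(f"{idx+1}/{sentence_count} processed", flush=True)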
    
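    # Save under result/ with the same file name; os.path keeps this portable across OSes
    new_file_path = os.path.join(root_path, "result", os.path.basename(file_path))
    with open(new_file_path, "w", encoding="utf8") as f:
        f.write(docs)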
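
One thing to watch in the loop above: `str.replace` touches every occurrence of a sentence, so a sentence that appears twice in the file gets replaced again inside the `<!--Original: ...-->` comment written on the first pass. Below is a rough sketch of an alternative that walks through the document once with a cursor. `annotate_translations` is a hypothetical helper, and it assumes the sentences come back from `md_preprocess` in document order, which the source does not confirm.

```python
from typing import Callable, List


def annotate_translations(
    docs: str,
    sentences: List[str],
    translate: Callable[[str], str],
) -> str:
    """Replace each sentence once, in order, keeping the original in an HTML comment."""
    pieces = []  # finished chunks of the output document
    cursor = 0   # everything in `docs` before this index is already handled

    for sentence in sentences:
        if not sentence:
            continue  # nothing to search for
        start = docs.find(sentence, cursor)
        if start == -1:
            continue  # not found after the cursor; leave the text as it is
        translated = translate(sentence).strip()
        pieces.append(docs[cursor:start])
        pieces.append(f"{translated}\n<!--Original: {sentence}-->")
        cursor = start + len(sentence)

    pieces.append(docs[cursor:])  # untouched tail of the document
    return "".join(pieces)
```

In file_trans.py this would be called as `docs = annotate_translations(docs, sentences, translate_ai)`. Since the search only ever moves forward, text that has already been written into a comment is never matched again.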