Posted on: September 10, 2023 at 10:47 AM

Installing mecab-ko and the Eunjeon (은전한닢) Project Dictionary

1. Introduction

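MeCab is a morphological analyzer developed in Japan, and it is also widely used for Korean morphological analysis. mecab-ko is a version of MeCab adapted for Korean. See the mecab-ko project page for details. Getting good results from MeCab depends heavily on the dictionary, and for Korean the mecab-ko-dic dictionary built by the Eunjeon (은전한닢) project is a useful choice.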

2. Install mecab-ko on Mac

The Korean MeCab (mecab-ko) has been added to Homebrew. This is convenient because you no longer have to build it from source as before.

brew info mecab-ko
==> mecab-ko: stable 0.996-ko-0.9.2 (bottled)
See mecab
https://bitbucket.org/eunjeon/mecab-ko
Conflicts with:
  mecab (because both install mecab binaries)
/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2 (20 files, 4.0MB) *
  Poured from bottle using the formulae.brew.sh API on 2023-09-10 at 16:19:09
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/m/mecab-ko.rb
==> Analytics
install: 9 (30 days), 28 (90 days), 50 (365 days)
install-on-request: 9 (30 days), 28 (90 days), 49 (365 days)
build-error: 0 (30 days)

Let's install it.

brew install mecab-ko
mecab -v
mecab of 0.996/ko-0.9.0

After installation, the executables and related files are located at:

/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2

The configuration file is located at:

/usr/local/etc/mecabrc

3. Install mecab on Ubuntu

To install the standard (Japanese) MeCab, use the package manager.

apt install mecab
mecab -v
mecab of 0.996

4. Build the Eunjeon dictionary (mecab-ko-dic) on Ubuntu

The dictionary has to be compiled, so let's build it on Ubuntu and then copy it over to the Mac if needed. Because of the build configuration, I built it as the root user.

Following the guide:

  1. First, install the required packages:
sudo su -
apt install automake
  1. Next, get the link to the latest version from the downloads page, then download and extract it:
ubuntu@vm:~$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
--2023-09-10 22:13:33--  https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
Resolving bitbucket.org (bitbucket.org)... 104.192.141.1, 2406:da00:ff00::22cd:e0db
Connecting to bitbucket.org (bitbucket.org)|104.192.141.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None&AWSAccessKeyId=ASIA6KOSE3BNEGJIUYOF&Signature=9EDXa%2FrG%2FvQwKG5Iqacwxp4qi3w%3D&x-amz-security-token=FwoGZXIvYXdzEBgaDBxktdtv%2BA6aIwHbhCK%2BAfqKDGTNiN%2Fzb0WOM5pOCFgA%2BdOAKdCwDcNKBtEiRigr0Ezl0wANlSLuVVBmCXulyjVIVXQYmlHEeDNU5MIIoHqmGPj2CuvWNk%2F5WFutK4Yg7ftwzKWt6l93%2B%2F704FmrsyMF8sAKbObROKSjaQ7DEWKjAWZeEAKHfZuH2LDRxibXtH4gYjbrLPcJnxNItLKET93udSm3D0U9GByoiW4hOF5zQYkFrlLrK%2Fz0uy%2F%2Bp1%2B%2F8xX3i1UTH%2FDaT1Q9CkYolvv4pwYyLbyOpCDBjNhLSMnVKqUh81jz7q5Mivqr0%2FEevJeG%2BU1sMkAGEUOw9q4Dj7jbUw%3D%3D&Expires=1694385310 [following]
--2023-09-10 22:13:33--  https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None&AWSAccessKeyId=ASIA6KOSE3BNEGJIUYOF&Signature=9EDXa%2FrG%2FvQwKG5Iqacwxp4qi3w%3D&x-amz-security-token=FwoGZXIvYXdzEBgaDBxktdtv%2BA6aIwHbhCK%2BAfqKDGTNiN%2Fzb0WOM5pOCFgA%2BdOAKdCwDcNKBtEiRigr0Ezl0wANlSLuVVBmCXulyjVIVXQYmlHEeDNU5MIIoHqmGPj2CuvWNk%2F5WFutK4Yg7ftwzKWt6l93%2B%2F704FmrsyMF8sAKbObROKSjaQ7DEWKjAWZeEAKHfZuH2LDRxibXtH4gYjbrLPcJnxNItLKET93udSm3D0U9GByoiW4hOF5zQYkFrlLrK%2Fz0uy%2F%2Bp1%2B%2F8xX3i1UTH%2FDaT1Q9CkYolvv4pwYyLbyOpCDBjNhLSMnVKqUh81jz7q5Mivqr0%2FEevJeG%2BU1sMkAGEUOw9q4Dj7jbUw%3D%3D&Expires=1694385310
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 52.217.122.169, 52.217.228.65, 3.5.11.178, ...
Connecting to bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)|52.217.122.169|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49775061 (47M) [application/x-tar]
Saving to: ‘mecab-ko-dic-2.1.1-20180720.tar.gz’

mecab-ko-dic-2.1.1-20180720.tar.gz                  100%[=================================================================================================================>]  47.47M  24.7MB/s    in 1.9s

2023-09-10 22:13:36 (24.7 MB/s) - ‘mecab-ko-dic-2.1.1-20180720.tar.gz’ saved [49775061/49775061]
ubuntu@vm:~$ tar xvfz mecab-ko-dic-2.1.1-20180720.tar.gz
mecab-ko-dic-2.1.1-20180720/
mecab-ko-dic-2.1.1-20180720/configure
mecab-ko-dic-2.1.1-20180720/COPYING
mecab-ko-dic-2.1.1-20180720/autogen.sh
mecab-ko-dic-2.1.1-20180720/Place-station.csv
mecab-ko-dic-2.1.1-20180720/NNG.csv
mecab-ko-dic-2.1.1-20180720/README
mecab-ko-dic-2.1.1-20180720/EF.csv
mecab-ko-dic-2.1.1-20180720/MAG.csv
mecab-ko-dic-2.1.1-20180720/Preanalysis.csv
mecab-ko-dic-2.1.1-20180720/NNB.csv
mecab-ko-dic-2.1.1-20180720/Person-actor.csv
mecab-ko-dic-2.1.1-20180720/VV.csv
mecab-ko-dic-2.1.1-20180720/Makefile.in
mecab-ko-dic-2.1.1-20180720/matrix.def
mecab-ko-dic-2.1.1-20180720/EC.csv
mecab-ko-dic-2.1.1-20180720/NNBC.csv
mecab-ko-dic-2.1.1-20180720/clean
mecab-ko-dic-2.1.1-20180720/ChangeLog
mecab-ko-dic-2.1.1-20180720/J.csv
mecab-ko-dic-2.1.1-20180720/.keep
mecab-ko-dic-2.1.1-20180720/feature.def
mecab-ko-dic-2.1.1-20180720/Foreign.csv
mecab-ko-dic-2.1.1-20180720/XPN.csv
mecab-ko-dic-2.1.1-20180720/EP.csv
mecab-ko-dic-2.1.1-20180720/NR.csv
mecab-ko-dic-2.1.1-20180720/left-id.def
mecab-ko-dic-2.1.1-20180720/Place.csv
mecab-ko-dic-2.1.1-20180720/Symbol.csv
mecab-ko-dic-2.1.1-20180720/dicrc
mecab-ko-dic-2.1.1-20180720/NP.csv
mecab-ko-dic-2.1.1-20180720/ETM.csv
mecab-ko-dic-2.1.1-20180720/IC.csv
mecab-ko-dic-2.1.1-20180720/Place-address.csv
mecab-ko-dic-2.1.1-20180720/Group.csv
mecab-ko-dic-2.1.1-20180720/model.def
mecab-ko-dic-2.1.1-20180720/XSN.csv
mecab-ko-dic-2.1.1-20180720/INSTALL
mecab-ko-dic-2.1.1-20180720/rewrite.def
mecab-ko-dic-2.1.1-20180720/Inflect.csv
mecab-ko-dic-2.1.1-20180720/configure.ac
mecab-ko-dic-2.1.1-20180720/NNP.csv
mecab-ko-dic-2.1.1-20180720/CoinedWord.csv
mecab-ko-dic-2.1.1-20180720/XSV.csv
mecab-ko-dic-2.1.1-20180720/pos-id.def
mecab-ko-dic-2.1.1-20180720/Makefile.am
mecab-ko-dic-2.1.1-20180720/unk.def
mecab-ko-dic-2.1.1-20180720/missing
mecab-ko-dic-2.1.1-20180720/VCP.csv
mecab-ko-dic-2.1.1-20180720/install-sh
mecab-ko-dic-2.1.1-20180720/Hanja.csv
mecab-ko-dic-2.1.1-20180720/MAJ.csv
mecab-ko-dic-2.1.1-20180720/XSA.csv
mecab-ko-dic-2.1.1-20180720/Wikipedia.csv
mecab-ko-dic-2.1.1-20180720/tools/
mecab-ko-dic-2.1.1-20180720/tools/add-userdic.sh
mecab-ko-dic-2.1.1-20180720/tools/mecab-bestn.sh
mecab-ko-dic-2.1.1-20180720/tools/convert_for_using_store.sh
mecab-ko-dic-2.1.1-20180720/user-dic/
mecab-ko-dic-2.1.1-20180720/user-dic/nnp.csv
mecab-ko-dic-2.1.1-20180720/user-dic/place.csv
mecab-ko-dic-2.1.1-20180720/user-dic/person.csv
mecab-ko-dic-2.1.1-20180720/user-dic/README.md
mecab-ko-dic-2.1.1-20180720/NorthKorea.csv
mecab-ko-dic-2.1.1-20180720/VX.csv
mecab-ko-dic-2.1.1-20180720/right-id.def
mecab-ko-dic-2.1.1-20180720/VA.csv
mecab-ko-dic-2.1.1-20180720/char.def
mecab-ko-dic-2.1.1-20180720/NEWS
mecab-ko-dic-2.1.1-20180720/MM.csv
mecab-ko-dic-2.1.1-20180720/ETN.csv
mecab-ko-dic-2.1.1-20180720/AUTHORS
mecab-ko-dic-2.1.1-20180720/Person.csv
mecab-ko-dic-2.1.1-20180720/XR.csv
mecab-ko-dic-2.1.1-20180720/VCN.csv

Run configure:

root@vm:~/mecab-ko-dic-2.1.1-20180720# ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for mecab-config... /usr/bin/mecab-config
configure: creating ./config.status
config.status: creating Makefile

Now build it.

root@vm:~/mecab-ko-dic-2.1.1-20180720# make
/usr/lib/mecab/mecab-dict-index -d . -o . -f UTF-8 -t UTF-8
reading ./unk.def ... 13
emitting double-array: 100% |###########################################|
reading ./MAJ.csv ... 240
reading ./VA.csv ... 2360
reading ./Place.csv ... 30303
reading ./Place-address.csv ... 19301
reading ./MAG.csv ... 14242
reading ./ETN.csv ... 14
reading ./MM.csv ... 453
reading ./NorthKorea.csv ... 3
reading ./XR.csv ... 3637
reading ./VX.csv ... 125
reading ./IC.csv ... 1305
reading ./Place-station.csv ... 1145
reading ./Foreign.csv ... 11690
reading ./Person.csv ... 196459
reading ./Symbol.csv ... 16
reading ./EP.csv ... 51
reading ./XSN.csv ... 124
reading ./ETM.csv ... 133
reading ./J.csv ... 416
reading ./Wikipedia.csv ... 36762
reading ./Group.csv ... 3176
reading ./Preanalysis.csv ... 5
reading ./XSV.csv ... 23
reading ./NNG.csv ... 208524
reading ./NNBC.csv ... 677
reading ./VCP.csv ... 9
reading ./EF.csv ... 1820
reading ./Inflect.csv ... 44820
reading ./VV.csv ... 7331
reading ./VCN.csv ... 7
reading ./Hanja.csv ... 125750
reading ./XPN.csv ... 83
reading ./XSA.csv ... 19
reading ./NNP.csv ... 2371
reading ./NR.csv ... 482
reading ./EC.csv ... 2547
reading ./NP.csv ... 342
reading ./NNB.csv ... 140
reading ./CoinedWord.csv ... 148
reading ./Person-actor.csv ... 99230
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 3822x2693
emitting matrix      : 100% |###########################################|

done!
echo To enable dictionary, rewrite /etc/mecabrc as \"dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic\"
To enable dictionary, rewrite /etc/mecabrc as "dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic"
  1. Once the build completes and the result is installed with make install, the dictionary files are created in:
/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic
root@vm:/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic# tree
.
├── char.bin
├── dicrc
├── left-id.def
├── matrix.bin
├── model.bin
├── pos-id.def
├── rewrite.def
├── right-id.def
├── sys.dic
└── unk.dic
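
As a quick sanity check before pointing MeCab at the directory, one could confirm that the compiled files from the listing above are all present. This is a minimal sketch: the function name check_dicdir is hypothetical, the file list is taken from the tree output above, and the demo runs against a throwaway directory rather than the real install path.

```shell
# Check that a directory contains the compiled dictionary files listed above.
# Usage: check_dicdir DIR
# Prints "ok: DIR" on success; names the first missing file and returns 1 otherwise.
check_dicdir() {
  for f in char.bin dicrc matrix.bin model.bin sys.dic unk.dic; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  echo "ok: $1"
}

# Demo on a temporary directory with empty placeholder files.
demo_dir=$(mktemp -d)
for f in char.bin dicrc matrix.bin model.bin sys.dic unk.dic; do
  : > "$demo_dir/$f"
done
check_dicdir "$demo_dir"
```

On the Ubuntu machine from this post the call would be check_dicdir /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic.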

Set the dictionary location in the mecabrc file so that MeCab uses this dictionary.

/etc/mecabrc
root@vm:~/mecab-ko-dic-2.1.1-20180720# cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
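
The edit above was made by hand. If it needs to be repeated, it could be scripted roughly as follows. This is only a sketch: the function name set_dicdir is hypothetical, and the demo works on a temporary copy with the Debian default line rather than on /etc/mecabrc itself. It comments out any active dicdir line with ';' and adds the new one, which mirrors the file shown above.

```shell
# Point a mecabrc-style file at a new dictionary directory.
# Usage: set_dicdir FILE NEW_DICDIR
# Existing active "dicdir = ..." lines are kept as ';' comments.
set_dicdir() {
  file=$1
  new=$2
  tmp=$(mktemp)
  awk -v new="$new" '
    /^[[:space:]]*dicdir[[:space:]]*=/ {
      print ";" $0
      if (!done) { print "dicdir = " new; done = 1 }
      next
    }
    { print }
    END { if (!done) print "dicdir = " new }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a throwaway file that imitates the default configuration.
rc=$(mktemp)
printf '%s\n' \
  '; Configuration file of MeCab' \
  'dicdir = /var/lib/mecab/dic/debian' \
  '; userdic = /home/foo/bar/user.dic' > "$rc"
set_dicdir "$rc" /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic
cat "$rc"
```

Working on a copy first and inspecting the result before replacing the real file keeps a typo from breaking the system-wide configuration.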

Test that it is installed correctly on Ubuntu.

root@vm:~/mecab-ko-dic-2.1.1-20180720# mecab
오늘은 날씨가 좋다
오늘	NNG,*,T,오늘,*,*,*,*
은	JX,*,T,은,*,*,*,*
날씨	NNG,*,F,날씨,*,*,*,*
가	JKS,*,F,가,*,*,*,*
좋	VA,*,T,좋,*,*,*,*
다	EC,*,F,다,*,*,*,*
EOS
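
Each token line in the output above is the surface form, a tab, and then comma-separated features whose first field is the part-of-speech tag. The sketch below shows one way to condense that into a single surface/POS line per sentence. The function name surface_pos is hypothetical, and the demo feeds in the first two tokens of the sample output instead of calling mecab.

```shell
# Read MeCab output on stdin and print "surface/POS" pairs, one sentence per line.
# Token lines look like "<surface><TAB><features>"; "EOS" closes a sentence.
surface_pos() {
  awk -F '\t' '
    $0 == "EOS" { print line; line = ""; next }
    NF >= 2 {
      split($2, feat, ",")
      line = line (line == "" ? "" : " ") $1 "/" feat[1]
    }
  '
}

# Demo with two tokens copied from the sample output above.
printf '오늘\tNNG,*,T,오늘,*,*,*,*\n은\tJX,*,T,은,*,*,*,*\nEOS\n' | surface_pos
```

With mecab installed, the same function can sit at the end of a pipeline such as echo "오늘은 날씨가 좋다" | mecab | surface_pos.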

5. Copy the dictionary to the Mac and connect it to mecab-ko

Download the dictionary built on Ubuntu and extract it to the following location:

/usr/local/lib/mecab/dic/mecab-ko-dic
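
The post does not say how the dictionary was moved between machines. One possible approach is to pack the directory with tar on Ubuntu and unpack it under the parent directory on the Mac. The sketch below assumes that approach: the function names pack_dic and unpack_dic are hypothetical, the transfer itself is left out, and the demo uses temporary directories with a placeholder file.

```shell
# Pack a dictionary directory into a gzip-compressed tarball.
# Usage: pack_dic DICDIR ARCHIVE
pack_dic() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Unpack that tarball under a parent directory (created if missing).
# Usage: unpack_dic ARCHIVE PARENT_DIR
unpack_dic() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Demo with throwaway directories standing in for both machines.
work=$(mktemp -d)
mkdir -p "$work/ubuntu/mecab-ko-dic"
echo placeholder > "$work/ubuntu/mecab-ko-dic/sys.dic"
pack_dic "$work/ubuntu/mecab-ko-dic" "$work/mecab-ko-dic.tar.gz"
unpack_dic "$work/mecab-ko-dic.tar.gz" "$work/mac/lib/mecab/dic"
ls "$work/mac/lib/mecab/dic/mecab-ko-dic"
```

On the real machines the first argument to pack_dic would be /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic and the second argument to unpack_dic would be /usr/local/lib/mecab/dic.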

Edit the mecabrc file.

cat /usr/local/etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir =  /usr/local/lib/mecab/dic/ipadic
dicdir =  /usr/local/lib/mecab/dic/mecab-ko-dic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
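
Since the file now contains both a commented-out and an active dicdir line, it can help to print only the active one. This is a minimal sketch with a hypothetical function name, active_dicdir. It relies only on the ';' comment convention visible in the file above, and the demo uses a temporary file containing the same two lines.

```shell
# Print the value of every active (uncommented) dicdir line in a mecabrc-style file.
# Lines beginning with ';' are comments, so ";dicdir = ..." is ignored.
active_dicdir() {
  sed -n 's/^[[:space:]]*dicdir[[:space:]]*=[[:space:]]*//p' "$1"
}

# Demo on a throwaway file with the two dicdir lines from above.
rc=$(mktemp)
printf '%s\n' \
  ';dicdir =  /usr/local/lib/mecab/dic/ipadic' \
  'dicdir =  /usr/local/lib/mecab/dic/mecab-ko-dic' > "$rc"
active_dicdir "$rc"
```

On the Mac from this post the call would be active_dicdir /usr/local/etc/mecabrc.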

Test that it is installed correctly on the Mac.

mecab
오늘은 날씨가 좋다
오늘	NNG,*,T,오늘,*,*,*,*
은	JX,*,T,은,*,*,*,*
날씨	NNG,*,F,날씨,*,*,*,*
가	JKS,*,F,가,*,*,*,*
좋	VA,*,T,좋,*,*,*,*
다	EC,*,F,다,*,*,*,*
EOS