1. Introduction
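mecab is a morphological analyzer developed in Japan that is also widely used for Korean. mecab-ko is a variant of mecab adapted for Korean; see the mecab-ko project page for details. Accurate morphological analysis with mecab depends heavily on the dictionary, and for Korean the mecab-ko-dic dictionary built by the 은전한닢 (eunjeon) project is a good choice.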
2. Install mecab-ko on Mac
The Korean version of mecab has been added to Homebrew, so you no longer have to build it yourself as before.
brew info mecab-ko
==> mecab-ko: stable 0.996-ko-0.9.2 (bottled)
See mecab
https://bitbucket.org/eunjeon/mecab-ko
Conflicts with:
mecab (because both install mecab binaries)
/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2 (20 files, 4.0MB) *
Poured from bottle using the formulae.brew.sh API on 2023-09-10 at 16:19:09
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/m/mecab-ko.rb
==> Analytics
install: 9 (30 days), 28 (90 days), 50 (365 days)
install-on-request: 9 (30 days), 28 (90 days), 49 (365 days)
build-error: 0 (30 days)
Let's install it.
brew install mecab-ko
mecab -v
mecab of 0.996/ko-0.9.0
After installation, the executables and related files are located in
/usr/local/Cellar/mecab-ko/0.996-ko-0.9.2
and the configuration file is at
/usr/local/etc/mecabrc
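These paths depend on the Homebrew prefix (Apple Silicon installs default to /opt/homebrew rather than /usr/local), so it can be safer to ask mecab itself where things live. The following is a minimal sketch, assuming that mecab-config was installed alongside mecab and is on PATH; show_mecab_paths is a hypothetical helper name used only for illustration.

```shell
# show_mecab_paths is a hypothetical helper, not part of mecab.
# It prints the directories reported by mecab-config, if that tool is available.
show_mecab_paths() {
    if command -v mecab-config >/dev/null 2>&1; then
        echo "mecabrc directory:    $(mecab-config --sysconfdir)"
        echo "dictionary directory: $(mecab-config --dicdir)"
        echo "helper tools:         $(mecab-config --libexecdir)"
    else
        echo "mecab-config not found on PATH"
    fi
}

show_mecab_paths
```

The same function should work on Ubuntu as long as mecab-config is installed there.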
3. Install mecab on Ubuntu
To install the standard (Japanese) mecab, use the package manager.
apt install mecab
mecab -v
mecab of 0.996
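4. Build the 은전한닢 dictionary on Ubuntu
The dictionary has to be compiled, so build it on Ubuntu and copy it to the Mac afterwards if needed. Because of the build configuration, I built it as the root user.
Following the guide,
- first install the required packages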
sudo su -
apt install automake
- then get the link to the latest version from the downloads page, and download and extract the archive
ubuntu@vm:~$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
--2023-09-10 22:13:33-- https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
Resolving bitbucket.org (bitbucket.org)... 104.192.141.1, 2406:da00:ff00::22cd:e0db
Connecting to bitbucket.org (bitbucket.org)|104.192.141.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None&AWSAccessKeyId=ASIA6KOSE3BNEGJIUYOF&Signature=9EDXa%2FrG%2FvQwKG5Iqacwxp4qi3w%3D&x-amz-security-token=FwoGZXIvYXdzEBgaDBxktdtv%2BA6aIwHbhCK%2BAfqKDGTNiN%2Fzb0WOM5pOCFgA%2BdOAKdCwDcNKBtEiRigr0Ezl0wANlSLuVVBmCXulyjVIVXQYmlHEeDNU5MIIoHqmGPj2CuvWNk%2F5WFutK4Yg7ftwzKWt6l93%2B%2F704FmrsyMF8sAKbObROKSjaQ7DEWKjAWZeEAKHfZuH2LDRxibXtH4gYjbrLPcJnxNItLKET93udSm3D0U9GByoiW4hOF5zQYkFrlLrK%2Fz0uy%2F%2Bp1%2B%2F8xX3i1UTH%2FDaT1Q9CkYolvv4pwYyLbyOpCDBjNhLSMnVKqUh81jz7q5Mivqr0%2FEevJeG%2BU1sMkAGEUOw9q4Dj7jbUw%3D%3D&Expires=1694385310 [following]
--2023-09-10 22:13:33-- https://bbuseruploads.s3.amazonaws.com/a4fcd83e-34f1-454e-a6ac-c242c7d434d3/downloads/b5a0c703-7b64-45ed-a2d7-180e962710b6/mecab-ko-dic-2.1.1-20180720.tar.gz?response-content-disposition=attachment%3B%20filename%3D%22mecab-ko-dic-2.1.1-20180720.tar.gz%22&response-content-encoding=None&AWSAccessKeyId=ASIA6KOSE3BNEGJIUYOF&Signature=9EDXa%2FrG%2FvQwKG5Iqacwxp4qi3w%3D&x-amz-security-token=FwoGZXIvYXdzEBgaDBxktdtv%2BA6aIwHbhCK%2BAfqKDGTNiN%2Fzb0WOM5pOCFgA%2BdOAKdCwDcNKBtEiRigr0Ezl0wANlSLuVVBmCXulyjVIVXQYmlHEeDNU5MIIoHqmGPj2CuvWNk%2F5WFutK4Yg7ftwzKWt6l93%2B%2F704FmrsyMF8sAKbObROKSjaQ7DEWKjAWZeEAKHfZuH2LDRxibXtH4gYjbrLPcJnxNItLKET93udSm3D0U9GByoiW4hOF5zQYkFrlLrK%2Fz0uy%2F%2Bp1%2B%2F8xX3i1UTH%2FDaT1Q9CkYolvv4pwYyLbyOpCDBjNhLSMnVKqUh81jz7q5Mivqr0%2FEevJeG%2BU1sMkAGEUOw9q4Dj7jbUw%3D%3D&Expires=1694385310
Resolving bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)... 52.217.122.169, 52.217.228.65, 3.5.11.178, ...
Connecting to bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)|52.217.122.169|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49775061 (47M) [application/x-tar]
Saving to: ‘mecab-ko-dic-2.1.1-20180720.tar.gz’
mecab-ko-dic-2.1.1-20180720.tar.gz 100%[=================================================================================================================>] 47.47M 24.7MB/s in 1.9s
2023-09-10 22:13:36 (24.7 MB/s) - ‘mecab-ko-dic-2.1.1-20180720.tar.gz’ saved [49775061/49775061]
ubuntu@vm:~$ tar xvfz mecab-ko-dic-2.1.1-20180720.tar.gz
mecab-ko-dic-2.1.1-20180720/
mecab-ko-dic-2.1.1-20180720/configure
mecab-ko-dic-2.1.1-20180720/COPYING
mecab-ko-dic-2.1.1-20180720/autogen.sh
mecab-ko-dic-2.1.1-20180720/Place-station.csv
mecab-ko-dic-2.1.1-20180720/NNG.csv
mecab-ko-dic-2.1.1-20180720/README
mecab-ko-dic-2.1.1-20180720/EF.csv
mecab-ko-dic-2.1.1-20180720/MAG.csv
mecab-ko-dic-2.1.1-20180720/Preanalysis.csv
mecab-ko-dic-2.1.1-20180720/NNB.csv
mecab-ko-dic-2.1.1-20180720/Person-actor.csv
mecab-ko-dic-2.1.1-20180720/VV.csv
mecab-ko-dic-2.1.1-20180720/Makefile.in
mecab-ko-dic-2.1.1-20180720/matrix.def
mecab-ko-dic-2.1.1-20180720/EC.csv
mecab-ko-dic-2.1.1-20180720/NNBC.csv
mecab-ko-dic-2.1.1-20180720/clean
mecab-ko-dic-2.1.1-20180720/ChangeLog
mecab-ko-dic-2.1.1-20180720/J.csv
mecab-ko-dic-2.1.1-20180720/.keep
mecab-ko-dic-2.1.1-20180720/feature.def
mecab-ko-dic-2.1.1-20180720/Foreign.csv
mecab-ko-dic-2.1.1-20180720/XPN.csv
mecab-ko-dic-2.1.1-20180720/EP.csv
mecab-ko-dic-2.1.1-20180720/NR.csv
mecab-ko-dic-2.1.1-20180720/left-id.def
mecab-ko-dic-2.1.1-20180720/Place.csv
mecab-ko-dic-2.1.1-20180720/Symbol.csv
mecab-ko-dic-2.1.1-20180720/dicrc
mecab-ko-dic-2.1.1-20180720/NP.csv
mecab-ko-dic-2.1.1-20180720/ETM.csv
mecab-ko-dic-2.1.1-20180720/IC.csv
mecab-ko-dic-2.1.1-20180720/Place-address.csv
mecab-ko-dic-2.1.1-20180720/Group.csv
mecab-ko-dic-2.1.1-20180720/model.def
mecab-ko-dic-2.1.1-20180720/XSN.csv
mecab-ko-dic-2.1.1-20180720/INSTALL
mecab-ko-dic-2.1.1-20180720/rewrite.def
mecab-ko-dic-2.1.1-20180720/Inflect.csv
mecab-ko-dic-2.1.1-20180720/configure.ac
mecab-ko-dic-2.1.1-20180720/NNP.csv
mecab-ko-dic-2.1.1-20180720/CoinedWord.csv
mecab-ko-dic-2.1.1-20180720/XSV.csv
mecab-ko-dic-2.1.1-20180720/pos-id.def
mecab-ko-dic-2.1.1-20180720/Makefile.am
mecab-ko-dic-2.1.1-20180720/unk.def
mecab-ko-dic-2.1.1-20180720/missing
mecab-ko-dic-2.1.1-20180720/VCP.csv
mecab-ko-dic-2.1.1-20180720/install-sh
mecab-ko-dic-2.1.1-20180720/Hanja.csv
mecab-ko-dic-2.1.1-20180720/MAJ.csv
mecab-ko-dic-2.1.1-20180720/XSA.csv
mecab-ko-dic-2.1.1-20180720/Wikipedia.csv
mecab-ko-dic-2.1.1-20180720/tools/
mecab-ko-dic-2.1.1-20180720/tools/add-userdic.sh
mecab-ko-dic-2.1.1-20180720/tools/mecab-bestn.sh
mecab-ko-dic-2.1.1-20180720/tools/convert_for_using_store.sh
mecab-ko-dic-2.1.1-20180720/user-dic/
mecab-ko-dic-2.1.1-20180720/user-dic/nnp.csv
mecab-ko-dic-2.1.1-20180720/user-dic/place.csv
mecab-ko-dic-2.1.1-20180720/user-dic/person.csv
mecab-ko-dic-2.1.1-20180720/user-dic/README.md
mecab-ko-dic-2.1.1-20180720/NorthKorea.csv
mecab-ko-dic-2.1.1-20180720/VX.csv
mecab-ko-dic-2.1.1-20180720/right-id.def
mecab-ko-dic-2.1.1-20180720/VA.csv
mecab-ko-dic-2.1.1-20180720/char.def
mecab-ko-dic-2.1.1-20180720/NEWS
mecab-ko-dic-2.1.1-20180720/MM.csv
mecab-ko-dic-2.1.1-20180720/ETN.csv
mecab-ko-dic-2.1.1-20180720/AUTHORS
mecab-ko-dic-2.1.1-20180720/Person.csv
mecab-ko-dic-2.1.1-20180720/XR.csv
mecab-ko-dic-2.1.1-20180720/VCN.csv
- run configure
root@vm:~/mecab-ko-dic-2.1.1-20180720# ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for mecab-config... /usr/bin/mecab-config
configure: creating ./config.status
config.status: creating Makefile
- and build it.
root@vm:~/mecab-ko-dic-2.1.1-20180720# make
/usr/lib/mecab/mecab-dict-index -d . -o . -f UTF-8 -t UTF-8
reading ./unk.def ... 13
emitting double-array: 100% |###########################################|
reading ./MAJ.csv ... 240
reading ./VA.csv ... 2360
reading ./Place.csv ... 30303
reading ./Place-address.csv ... 19301
reading ./MAG.csv ... 14242
reading ./ETN.csv ... 14
reading ./MM.csv ... 453
reading ./NorthKorea.csv ... 3
reading ./XR.csv ... 3637
reading ./VX.csv ... 125
reading ./IC.csv ... 1305
reading ./Place-station.csv ... 1145
reading ./Foreign.csv ... 11690
reading ./Person.csv ... 196459
reading ./Symbol.csv ... 16
reading ./EP.csv ... 51
reading ./XSN.csv ... 124
reading ./ETM.csv ... 133
reading ./J.csv ... 416
reading ./Wikipedia.csv ... 36762
reading ./Group.csv ... 3176
reading ./Preanalysis.csv ... 5
reading ./XSV.csv ... 23
reading ./NNG.csv ... 208524
reading ./NNBC.csv ... 677
reading ./VCP.csv ... 9
reading ./EF.csv ... 1820
reading ./Inflect.csv ... 44820
reading ./VV.csv ... 7331
reading ./VCN.csv ... 7
reading ./Hanja.csv ... 125750
reading ./XPN.csv ... 83
reading ./XSA.csv ... 19
reading ./NNP.csv ... 2371
reading ./NR.csv ... 482
reading ./EC.csv ... 2547
reading ./NP.csv ... 342
reading ./NNB.csv ... 140
reading ./CoinedWord.csv ... 148
reading ./Person-actor.csv ... 99230
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 3822x2693
emitting matrix : 100% |###########################################|
done!
echo To enable dictionary, rewrite /etc/mecabrc as \"dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic\"
To enable dictionary, rewrite /etc/mecabrc as "dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic"
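- When the build is complete and the result has been installed (presumably with make install, which is not shown in the transcript above), the dictionary files are in
/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic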
root@vm:/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic# tree
.
├── char.bin
├── dicrc
├── left-id.def
├── matrix.bin
├── model.bin
├── pos-id.def
├── rewrite.def
├── right-id.def
├── sys.dic
└── unk.dic
Set the dictionary location in the mecabrc file so that mecab uses this dictionary.
/etc/mecabrc
root@vm:~/mecab-ko-dic-2.1.1-20180720# cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic
; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
; input-buffer-size = 8192
; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
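The edit shown above (comment out the old dicdir line and add a new one) can also be scripted. This is only a sketch: set_dicdir is a hypothetical helper, it appends the new setting at the end of the file rather than in place, and the demo runs on a throwaway sample file instead of the real /etc/mecabrc (which would need root permissions to modify).

```shell
# set_dicdir is a hypothetical helper, not part of mecab.
# Usage: set_dicdir <mecabrc file> <dictionary directory>
set_dicdir() {
    rc_file=$1
    dic_dir=$2
    tmp_file=$(mktemp "${TMPDIR:-/tmp}/mecabrc.XXXXXX") || return 1
    # Comment out every active dicdir line by prefixing it with ';'.
    sed 's/^[[:space:]]*dicdir[[:space:]]*=/;&/' "$rc_file" > "$tmp_file" || return 1
    # Append the new dictionary location.
    printf 'dicdir = %s\n' "$dic_dir" >> "$tmp_file"
    # Write back through the original path so owner and permissions are kept.
    cat "$tmp_file" > "$rc_file" && rm -f "$tmp_file"
}

# Demo on a sample file that mimics the relevant lines of /etc/mecabrc.
demo_rc=$(mktemp "${TMPDIR:-/tmp}/mecabrc-demo.XXXXXX")
printf '%s\n' \
    '; Configuration file of MeCab' \
    'dicdir = /var/lib/mecab/dic/debian' \
    '; userdic = /home/foo/bar/user.dic' > "$demo_rc"

set_dicdir "$demo_rc" /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ko-dic
cat "$demo_rc"
```

Keeping the old line as a comment, as the original file does, makes it easy to switch back to the previous dictionary.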
Test that the installation works on Ubuntu.
root@vm:~/mecab-ko-dic-2.1.1-20180720# mecab
오늘은 날씨가 좋다
오늘 NNG,*,T,오늘,*,*,*,*
은 JX,*,T,은,*,*,*,*
날씨 NNG,*,F,날씨,*,*,*,*
가 JKS,*,F,가,*,*,*,*
좋 VA,*,T,좋,*,*,*,*
다 EC,*,F,다,*,*,*,*
EOS
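Each output line is a surface form followed by a comma-separated feature list whose first field is the part-of-speech tag. As a rough illustration of how this output can be post-processed, the sketch below keeps only tokens whose tag starts with NN (NNG, NNP, NNB and NNBC all appear as dictionary file names in the archive). extract_nouns is a hypothetical helper, and the sample text is copied from the output above instead of calling mecab.

```shell
# Sample copied from the output above. Real mecab output separates the
# surface form and the feature list with a tab; awk's default field
# splitting handles both tabs and spaces.
mecab_output='오늘 NNG,*,T,오늘,*,*,*,*
은 JX,*,T,은,*,*,*,*
날씨 NNG,*,F,날씨,*,*,*,*
가 JKS,*,F,가,*,*,*,*
좋 VA,*,T,좋,*,*,*,*
다 EC,*,F,다,*,*,*,*
EOS'

# extract_nouns is a hypothetical helper: it reads mecab output on stdin
# and prints the surface form of every token whose first feature starts with NN.
extract_nouns() {
    awk '$1 != "EOS" && NF >= 2 {
        split($2, feature, ",")
        if (feature[1] ~ /^NN/) print $1
    }'
}

nouns=$(printf '%s\n' "$mecab_output" | extract_nouns)
printf '%s\n' "$nouns"
```

With mecab installed, the same function could be fed from a pipe such as `echo "오늘은 날씨가 좋다" | mecab | extract_nouns`.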
5. Bring the dictionary over to the Mac and connect it to mecab-ko
Download the dictionary built on Ubuntu and extract it to the following location.
/usr/local/lib/mecab/dic/mecab-ko-dic
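Before pointing mecabrc at the copied directory, it may be worth checking that nothing was lost in transit. The sketch below looks for the ten files from the Ubuntu tree listing; check_dic_dir is a hypothetical helper, and treating every file in that listing as required is an assumption, since mecab may not strictly need all of them.

```shell
# check_dic_dir is a hypothetical helper, not part of mecab.
# It reports which files from the Ubuntu listing are missing in a directory
# and returns 0 only when all of them are present.
check_dic_dir() {
    dic_dir=$1
    missing=0
    for name in char.bin dicrc left-id.def matrix.bin model.bin \
                pos-id.def rewrite.def right-id.def sys.dic unk.dic; do
        if [ ! -f "$dic_dir/$name" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_dic_dir /usr/local/lib/mecab/dic/mecab-ko-dic; then
    echo "dictionary directory looks complete"
else
    echo "dictionary directory is incomplete or not found"
fi
```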
Edit the mecabrc file.
cat /usr/local/etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /usr/local/lib/mecab/dic/ipadic
dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic
; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
; input-buffer-size = 8192
; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
Test that the installation works on the Mac.
mecab
오늘은 날씨가 좋다
오늘 NNG,*,T,오늘,*,*,*,*
은 JX,*,T,은,*,*,*,*
날씨 NNG,*,F,날씨,*,*,*,*
가 JKS,*,F,가,*,*,*,*
좋 VA,*,T,좋,*,*,*,*
다 EC,*,F,다,*,*,*,*
EOS