Skip to content
Snippets Groups Projects
README.md 1.93 KiB
Newer Older
Tiphaine Viard's avatar
Tiphaine Viard committed
# mapaie

Nicolas Allègre's avatar
Nicolas Allègre committed
## Getting started
Nicolas Allègre's avatar
Nicolas Allègre committed
### Rappel : Creating a virtual environment
Nicolas Allègre's avatar
Nicolas Allègre committed
```sh
Tiphaine Viard's avatar
Tiphaine Viard committed
pip install virtualenv
virtualenv venv
```

Nicolas Allègre's avatar
Nicolas Allègre committed
### Rappel : Creating a SSH key for GIT
Tiphaine Viard's avatar
Tiphaine Viard committed

Nicolas Allègre's avatar
Nicolas Allègre committed
1. Création de la clé SSH
```sh
ssh-keygen -t ed25519 -C "MS IA <prenom.nom> GITLAB_Telecom-Paris" -f GITLAB_ENST_SSHKey
```

2. Dépôt sur le GITLAB
Tiphaine Viard's avatar
Tiphaine Viard committed

Nicolas Allègre's avatar
Nicolas Allègre committed
3. Test de la connexion
```sh
ssh -i ..\..\..\GITLAB_ENST_SSHKey -T git@gitlab.enst.fr
Tiphaine Viard's avatar
Tiphaine Viard committed
```
Nicolas Allègre's avatar
Nicolas Allègre committed

### Getting project environment

1. Create virtual Python environment

```sh
virtualenv venv
Tiphaine Viard's avatar
Tiphaine Viard committed
source venv/bin/activate
Nicolas Allègre's avatar
Nicolas Allègre committed

# on Windows
.\venv\Scripts\activate
```

2. Clone the repository
```sh
git clone <URL>
```

3. Configuring GIT
```sh
cd mapaie
git config --local user.name "Prénom Nom"
git config --local user.email "prenom.nom@telecom-paris.fr"
git config --local core.sshCommand "ssh -i C:\\<Path_to_SSHey>\\GITLAB_ENST_SSHKey"
```
 - `--local` pour une configuration local d'un dépôt GIT (mapaie/.git/config)
 - `--global` pour une configuration user de GIT ($HOME/.config/git/config)
 - `--system` pour une configuration PC de GIT (<Path_to_GIT>/etc/gitconfig)

4. Install all requirement
```sh
python -m pip install -r mapaie\requirements.txt
python -m nltk.downloader stopwords
python -m nltk.downloader punkt_tab
Tiphaine Viard's avatar
Tiphaine Viard committed
```

Nicolas Allègre's avatar
Nicolas Allègre committed
5. Installation des dépendances externes (voir la doc sur [python-magic](https://pypi.org/project/python-magic/))
- Windows & Mac
```sh
python -m pip install python-magic-bin
```
- Linux (Debian/Ubuntu)
```sh
sudo apt-get install libmagic1
```

## Using

Tiphaine Viard's avatar
Tiphaine Viard committed
Snakemake should be installed on the side.

Tiphaine Viard's avatar
Tiphaine Viard committed
You can then run `snakemake -c4` to download PDF files and extract their contents. PDF files are stored in `./pdfs`, and textual contents in ` ./txts/`.
Nicolas Allègre's avatar
Nicolas Allègre committed

### Manually

Dans l'ordre : 
```sh
python dl_docs.py
python parse_docs.py
python preproccess.py
python create_corpus_before_lang.py
python create_corpus.py -t themes.json -d data/preprocessed/ -m iramuteq
python create_corpus.py -t themes.json -d data/preprocessed/ -m cortext
Nicolas Allègre's avatar
Nicolas Allègre committed
```