You can actually make a whole path of directories with one single mkdir command:
mkdir -p a/b/c/d/e
This creates a folder a, inside it a folder b, and so on down to folder e. If part of the path already exists, mkdir does not raise an error; it simply creates the parts that do not exist yet.
To see how it proceeds, use:
mkdir -pv a/b/c/d/e
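To try this without touching a real folder, here is a small sketch that runs in a temporary directory. The variable name demo_dir is only for illustration. The exact wording of the -v messages depends on your mkdir version, so none is shown here.

```shell
# Work in a throwaway directory so nothing in the current folder is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Pre-create part of the path to show that existing directories are not an error.
mkdir -p a/b

# Create the rest; -v prints one message per directory that is actually created.
mkdir -pv a/b/c/d/e

# Running it again is harmless: the whole path exists, so nothing is created.
mkdir -p a/b/c/d/e
echo "exit status: $?"
```

The last line prints an exit status of 0, because -p treats already existing directories as success.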
Tuesday, July 22, 2008
Thursday, July 10, 2008
Mallet command to generate topic models
To generate a topic model:
First, split the data into individual files, one line per file:
split -l 1 -d -a 6 ../data.txt data-
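To check what the split command above produces, here is a small sketch on a made-up three-line file. It assumes GNU split (the -d option for numeric suffixes is a GNU extension) and assumes the command is run from inside a folder named data, which is a guess based on the ../data.txt path and the --input data argument used later. The sample lines and the work_dir variable are invented for illustration.

```shell
# Build a tiny stand-in for data.txt in a throwaway directory.
work_dir=$(mktemp -d)
cd "$work_dir"
printf 'first document\nsecond document\nthird document\n' > data.txt

# Assumption: the split files live in a folder called data, next to data.txt.
mkdir data
cd data

# -l 1  : one input line per output file
# -d    : numeric suffixes instead of letters (GNU split)
# -a 6  : suffixes are six digits long
split -l 1 -d -a 6 ../data.txt data-

# List the result: data-000000, data-000001, data-000002
ls
```

With a six-digit suffix, this naming scheme has room for up to one million lines of input.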
Then convert the split data to mallet format:
text2vectors --input data --remove-stopwords --output data-mallet.txt --keep-sequence TRUE --keep-sequence-bigrams TRUE
Next generate the topics:
vectors2topics --input data-mallet.txt --num-topics 250 --num-top-words 100 --output-doc-topics doc-topics.txt --num-iterations 100 --show-topics-interval 1000 > topic-words.txt
To generate at phrase level, use the N-gram option:
vectors2topics --input data-mallet.txt --num-topics 250 --num-top-words 100 --output-doc-topics doc-topics.txt --num-iterations 100 --show-topics-interval 1000 --use-ngrams true > topic-phrases.txt
If any source file is modified, run "make clean", "make", then "make jar". If "make jar" is skipped, the change will not be seen.