TextGrocery

A simple, efficient short-text classification tool based on LibLinear

Ships with jieba as the default tokenizer to support Chinese tokenization

Performance

Train set: 48k news titles with 32 labels
Test set: 16k news titles with 32 labels
Compared with the SVM and naive Bayes classifiers of scikit-learn

Classifier	Accuracy	Time cost (s)
scikit-learn(nb)	76.8%	134
scikit-learn(svm)	76.9%	121
TextGrocery	79.6%	49
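
The README does not state how the accuracy column was computed. Assuming the usual definition (the fraction of test titles whose predicted label matches the true label), a minimal sketch looks like this. The function name `accuracy` and the sample labels are illustrative only and are not part of tgrocery.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label.

    Assumption: this is the standard definition of accuracy; the README
    does not confirm that the benchmark used exactly this formula.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Illustrative labels: three of the four predictions are correct.
truth = ["education", "sports", "sports", "education"]
guess = ["education", "sports", "education", "education"]
print(accuracy(truth, guess))  # 0.75
```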

Sample Code

>>> from tgrocery import Grocery
# Create a grocery (don't forget to give it a name)
>>> grocery = Grocery('sample')
# Train from list
>>> train_src = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('education', 'Chinese education for TV experiment'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar')
]
>>> grocery.train(train_src)
# Or train from file
# Format: Label\tText
>>> grocery.train('train_ch.txt')
# Save model
>>> grocery.save()
# Load the model (use the same name as when it was saved)
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
# Predict
>>> new_grocery.predict('Abbott government spends $8 million on higher education media blitz')
education
# Test from list
>>> test_src = [
    ('education', 'Abbott government spends $8 million on higher education media blitz'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
]
>>> new_grocery.test(test_src)
# Returns the accuracy
1.0
# Or test from file
>>> new_grocery.test('test_ch.txt')
# Use a custom tokenizer (list splits the text into single characters)
>>> custom_grocery = Grocery('custom', custom_tokenize=list)
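
The `custom_tokenize=list` call above passes a callable that turns a string into a list of tokens. As a rough sketch of what another tokenizer might look like, the hypothetical helper below (`simple_tokenize` is not part of tgrocery) lowercases the text and keeps runs of letters and digits. It assumes the tokenizer receives one string and returns a list of strings, as `list` does.

```python
import re

# Hypothetical helper, not part of tgrocery: matches runs of ASCII
# lowercase letters and digits.
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def simple_tokenize(text):
    """Return the lowercase alphanumeric tokens found in text, in order."""
    return _TOKEN_RE.findall(text.lower())


print(simple_tokenize("Chinese education for TV experiment"))
# ['chinese', 'education', 'for', 'tv', 'experiment']
```

By analogy with the call above, such a function could presumably be passed as `Grocery('custom', custom_tokenize=simple_tokenize)`.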

More examples: sample/
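
The sample code trains from a file in the `Label\tText` format, one sample per line. The sketch below shows one way to produce and re-read such a file with plain Python 3, independent of tgrocery. The helper names `write_train_file` and `read_train_file` and the file name `train_sample.txt` are made up for illustration, and UTF-8 encoding is an assumption.

```python
import os
import tempfile


def write_train_file(path, samples):
    """Write (label, text) pairs as one 'Label<TAB>Text' line each."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in samples:
            # Collapse every whitespace run (including tabs and newlines)
            # into a single space so the text cannot break the line format.
            clean = " ".join(text.split())
            f.write(f"{label}\t{clean}\n")


def read_train_file(path):
    """Read the file back into a list of (label, text) tuples."""
    with open(path, encoding="utf-8") as f:
        # Split on the first tab only; blank lines are skipped.
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in f if line.strip()]


train_src = [
    ("education", "Student debt to cost Britain billions within decades"),
    ("sports", "Middle East and Asia boost investment in top level sports"),
]
path = os.path.join(tempfile.mkdtemp(), "train_sample.txt")
write_train_file(path, train_src)
print(read_train_file(path) == train_src)  # True
```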

Install

$ pip install tgrocery

Only tested on Unix-based systems
