clj-ml

A machine learning library for Clojure built on top of Weka and friends

Installation

In order to install the library you must first install Leiningen. You should also download the Weka 3.6.2 jar from the official weka homepage. If maven complains about not finding weka, follow its instructions to install the jar manually.

To install from source

git clone the project
$ lein deps
$ lein compile
$ lein compile-java
$ lein uberjar

Installing from Clojars

[clj-ml "0.0.3-SNAPSHOT"]

Installing from Maven

(add Clojars repository)

<dependency>
   <groupId>clj-ml</groupId>
   <artifactId>clj-ml</artifactId>
   <version>0.0.3-SNAPSHOT</version>
 </dependency>

Supported algorithms

Filters

supervised discretize
unsupervised discretize
supervised nominal to binary
unsupervised nominal to binary

Classifiers

C4.5 (J4.8)
naive Bayes
multilayer perceptron

Clusterers

k-means

Usage

I/O of data

    REPL> (use 'clj-ml.io)

    REPL> ; Loading data from an ARFF file, XRFF and CSV are also supported
    REPL> (def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

    REPL> ; Saving data in a different format
    REPL> (save-instances :csv "file:///Users/antonio.garrote/Desktop/iris.csv"  ds)

Working with datasets

    REPL>(use 'clj-ml.data)

    REPL>; Defining a dataset
    REPL>(def ds (make-dataset "name" [:length :width {:kind [:good :bad]}] [ [12 34 :good] [24 53 :bad] ]))
    REPL>ds

    #<ClojureInstances @relation name

    @attribute length numeric
    @attribute width numeric
    @attribute kind {good,bad}

    @data
    12,34,good
    24,53,bad>

    REPL>; Using datasets like sequences
    REPL>(dataset-seq ds)

    (#<Instance 12,34,good> #<Instance 24,53,bad>)

    REPL>; Transforming instances  into maps or vectors
    REPL>(instance-to-map (first (dataset-seq ds)))

    {:kind :good, :width 34.0, :length 12.0}

    REPL>(instance-to-vector (dataset-at ds 0))
    [12.0 34.0 :good]

Filtering datasets

    REPL>(us 'clj-ml.filters)

    REPL>(def ds (load-instances :arff "file:///Applications/weka-3-6-2/data/iris.arff"))

    REPL>; Discretizing a numeric attribute using an unsupervised filter
    REPL>(def  discretize (make-filter :unsupervised-discretize {:dataset *ds* :attributes [0 2]}))

    REPL>(def filtered-ds (filter-process discretize ds))

Using classifiers

    REPL>(use 'clj-ml.classifiers)

    REPL>; Building a classifier using a  C4.5 decission tree
    REPL>(def classifier (make-classifier :decission-tree :c45))

    REPL>; We set the class attribute for the loaded dataset
    REPL>(dataset-set-class ds 4)

    REPL>; Training the classifier
    REPL>(classifier-train classifier ds)


     #<J48 J48 pruned tree
     ------------------

     petalwidth <= 0.6: Iris-setosa (50.0)
     petalwidth > 0.6
     |	petalwidth <= 1.7
     |	|   petallength <= 4.9: Iris-versicolor (48.0/1.0)
     |	|   petallength > 4.9
     |	|   |	petalwidth <= 1.5: Iris-virginica (3.0)
     |	|   |	petalwidth > 1.5: Iris-versicolor (3.0/1.0)
     |	petalwidth > 1.7: Iris-virginica (46.0/1.0)

     Number of Leaves  :		5

     Size of the tree :	9


    REPL>; We evaluate the classifier using a test dataset
    REPL>; last parameter should be a different test dataset, here we are using the same
    REPL>(def evaluation   (classifier-evaluate classifier  :dataset ds ds))

     === Confusion Matrix ===

       a	 b  c	<-- classified as
      50	 0  0 |	 a = Iris-setosa
       0 49  1 |	 b = Iris-versicolor
       0	 2 48 |	 c = Iris-virginica

     === Summary ===

     Correctly Classified Instances	   147		     98	     %
     Incorrectly Classified Instances	     3		      2	     %
     Kappa statistic			     0.97
     Mean absolute error			     0.0233
     Root mean squared error		     0.108
     Relative absolute error		     5.2482 %
     Root relative squared error		    22.9089 %
     Total Number of Instances		   150

    REPL>(:kappa evaluation)

     0.97

    REPL>(:root-mean-squared-error e)

     0.10799370769526968

    REPL>(:precision e)

     {:Iris-setosa 1.0, :Iris-versicolor 0.9607843137254902, :Iris-virginica
      0.9795918367346939}

    REPL>; The classifier can also be evaluated using cross-validation
    REPL>(classifier-evaluate classifier :cross-validation ds 10)

     === Confusion Matrix ===

       a	 b  c	<-- classified as
      49	 1  0 |	 a = Iris-setosa
       0 47  3 |	 b = Iris-versicolor
       0	 4 46 |	 c = Iris-virginica

     === Summary ===

     Correctly Classified Instances	   142		     94.6667 %
     Incorrectly Classified Instances	     8		      5.3333 %
     Kappa statistic			     0.92
     Mean absolute error			     0.0452
     Root mean squared error		     0.1892
     Relative absolute error		    10.1707 %
     Root relative squared error		    40.1278 %
     Total Number of Instances		   150

    REPL>; A trained classifier can be used to classify new instances
    REPL>(def to-classify (make-instance ds
                                                      {:class :Iris-versicolor,
                                                      :petalwidth 0.2,
                                                      :petallength 1.4,
                                                      :sepalwidth 3.5,
                                                      :sepallength 5.1}))
    REPL>(classifier-classify classifier to-classify)

     0.0

    REPL>(classifier-label to-classify)

     #<Instance 5.1,3.5,1.4,0.2,Iris-setosa>


    REPL>; The classifiers can be saved and restored later
    REPL>(use 'clj-ml.utils)

    REPL>(serialize-to-file classifier
    REPL> "/Users/antonio.garrote/Desktop/classifier.bin")

Using clusterers

    REPL>(use 'clj-ml.clusterers)

    REPL> ; we build a clusterer using k-means and three clusters
    REPL> (def kmeans (make-clusterer :k-means {:number-clusters 3}))

    REPL> ; we need to remove the class from the dataset to
    REPL> ; use this clustering algorithm
    REPL> (dataset-remove-class ds)

    REPL> ; we build the clusters
    REPL> (clusterer-build kmeans ds)
    REPL> kmeans

      #<SimpleKMeans
      kMeans
      ======

      Number of iterations: 3
      Within cluster sum of squared errors: 7.817456892309574
      Missing values globally replaced with mean/mode

      Cluster centroids:
                                                Cluster#
      Attribute                Full Data               0               1               2
                                   (150)            (50)            (50)            (50)
      ==================================================================================
      sepallength                 5.8433           5.936           5.006           6.588
      sepalwidth                   3.054            2.77           3.418           2.974
      petallength                 3.7587            4.26           1.464           5.552
      petalwidth                  1.1987           1.326           0.244           2.026
      class                  Iris-setosa Iris-versicolor     Iris-setosa  Iris-virginica

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
src		src
test/clj_ml		test/clj_ml
README.markdown		README.markdown
project.clj		project.clj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

clj-ml

Installation

To install from source

Installing from Clojars

Installing from Maven

Supported algorithms

Usage

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Languages

antoniogarrote/clj-ml

Folders and files

Latest commit

History

Repository files navigation

clj-ml

Installation

To install from source

Installing from Clojars

Installing from Maven

Supported algorithms

Usage

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Languages

Packages