nasa-jpl-memex/GeoParser

GeoParser

GeoParser is a software tool that processes information from any type of file, extracts geographic coordinates, and visualizes locations on a map. Users who want a geographical representation of their data can search for locations through a search index or upload files from their computer. GeoParser parses the files and plots cities or latitude-longitude points on the map. Once the points are plotted, users can filter the results by density, or by searching for a keyword and applying a "facet" to the parsed information. Clicking a location point on the map reveals more information about the location and how it relates to the search.

Installation (Docker)

  1. docker build -t nasajplmemex/geo-parser --no-cache -f Dockerfile .
  2. docker-compose up -d
  3. Visit http://localhost:8000 in your browser

Try it out to help fight COVID!

GeoParser has been updated with a new, easy-to-use Docker install, along with an example that downloads the COVID-19 literature data and displays its locations. Use that example to explore and test GeoParser on a real dataset.

Installation (manually)

Requirements

  1. Python 2.7
  2. pip
  3. Django
  4. Tika Python

Install Requirements

  1. Install the Python requirements:

    pip install -r requirements.txt

How to Run the Application

  1. Run Solr. Change directory to where you cloned the project, then start Solr:

    cd Solr/solr-5.3.1/
    ./bin/solr start

  2. Clone lucene-geo-gazetteer repo

    git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
    cd lucene-geo-gazetteer
    mvn install assembly:assembly

    Add lucene-geo-gazetteer/src/main/bin to your PATH environment variable.

    Make sure it is working:

    lucene-geo-gazetteer --help
    usage: lucene-geo-gazetteer
     -b,--build <gazetteer file>           The Path to the Geonames
                                           allCountries.txt
     -h,--help                             Print this message.
     -i,--index <directoryPath>            The path to the Lucene index
                                           directory to either create or read
     -s,--search <set of location names>   Location names to search the
                                           Gazetteer for
    
  3. Build a gazetteer using the Geonames.org dataset (1.2 GB):

    cd lucene-geo-gazetteer
    curl -O http://download.geonames.org/export/dump/allCountries.zip
    unzip allCountries.zip
    lucene-geo-gazetteer -i geoIndex -b allCountries.txt
    

    Make sure it is working:

    lucene-geo-gazetteer -s Pasadena Texas
    [
    {"Texas" : [
    "Texas",
    "-91.92139",
    "18.05333"
    ]},
    {"Pasadena" : [
    "Pasadena",
    "-74.06446",
    "4.6964"
    ]}
    ]
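
The search output above is plain JSON, so it can be consumed from a script. The following is a minimal sketch, not part of GeoParser itself: the function name `parse_gazetteer_output` is hypothetical, and the value order (name, longitude, latitude) is inferred from the sample, where -91.92139 falls outside the valid latitude range. It is written to run on both Python 2.7 and Python 3.

```python
import json


def parse_gazetteer_output(raw):
    """Map each searched name to (matched name, longitude, latitude)."""
    results = {}
    for entry in json.loads(raw):
        # Each entry is a one-key object: {query: [matched name, lon, lat]}.
        # The lon/lat order is an assumption inferred from the sample output.
        for query, (name, lon, lat) in entry.items():
            results[query] = (name, float(lon), float(lat))
    return results


# Sample copied from the `lucene-geo-gazetteer -s Pasadena Texas` output above.
sample = """[
{"Texas" : ["Texas", "-91.92139", "18.05333"]},
{"Pasadena" : ["Pasadena", "-74.06446", "4.6964"]}
]"""

locations = parse_gazetteer_output(sample)
print(locations["Pasadena"])
```

Converting the coordinates to floats up front keeps later steps, such as plotting or distance checks, free of string handling.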
    

Now start the lucene-geo-gazetteer server:

lucene-geo-gazetteer -server
  4. Run the Tika server on port 8001, as described in https://cwiki.apache.org/confluence/display/TIKA/GeoTopicParser. The port can be configured via config.txt.

  5. Make sure you can extract locations from the Tika server:

curl -T /path/to/polar.geot -H "Content-Disposition: attachment; filename=polar.geot" http://localhost:8001/rmeta

You can obtain the file here: https://raw.githubusercontent.com/chrismattmann/geotopicparser-utils/master/geotopics/polar.geot

The output should look like this:

[
   {
      "Content-Type":"application/geotopic",
      "Geographic_LATITUDE":"39.76",
      "Geographic_LONGITUDE":"-98.5",
      "Geographic_NAME":"United States",
      "Optional_LATITUDE1":"27.33931",
      "Optional_LONGITUDE1":"-108.60288",
      "Optional_NAME1":"China",
      "X-Parsed-By":[
         "org.apache.tika.parser.DefaultParser",
         "org.apache.tika.parser.geo.topic.GeoParser"
      ],
      "X-TIKA:parse_time_millis":"1634",
      "resourceName":"polar.geot"
   }
]
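
To work with this metadata programmatically, the location fields can be pulled out of the JSON. The sketch below is illustrative only: `extract_locations` is a hypothetical helper, and it assumes that any further alternates follow the same numbered pattern (`Optional_NAME2`, `Optional_LATITUDE2`, and so on), which the sample above shows only for suffix 1. It runs on both Python 2.7 and Python 3.

```python
import json
import re


def extract_locations(rmeta_json):
    """Collect (name, latitude, longitude) tuples from Tika /rmeta output."""
    locations = []
    for doc in json.loads(rmeta_json):
        # Primary location: Geographic_NAME / _LATITUDE / _LONGITUDE.
        if "Geographic_NAME" in doc:
            locations.append((doc["Geographic_NAME"],
                              float(doc["Geographic_LATITUDE"]),
                              float(doc["Geographic_LONGITUDE"])))
        # Alternates: Optional_NAME<n> with matching numbered coordinates
        # (assumed pattern; only n = 1 appears in the sample output).
        matches = (re.match(r"Optional_NAME(\d+)$", key) for key in doc)
        for n in sorted(int(m.group(1)) for m in matches if m):
            locations.append((doc["Optional_NAME%d" % n],
                              float(doc["Optional_LATITUDE%d" % n]),
                              float(doc["Optional_LONGITUDE%d" % n])))
    return locations


# Trimmed copy of the expected output shown above.
sample = """[{
  "Content-Type": "application/geotopic",
  "Geographic_LATITUDE": "39.76",
  "Geographic_LONGITUDE": "-98.5",
  "Geographic_NAME": "United States",
  "Optional_LATITUDE1": "27.33931",
  "Optional_LONGITUDE1": "-108.60288",
  "Optional_NAME1": "China",
  "resourceName": "polar.geot"
}]"""

for name, lat, lon in extract_locations(sample):
    print("%s: %s, %s" % (name, lat, lon))
```

The primary location is listed first and the alternates follow in numeric order, so callers can take the first tuple when only the best match is needed.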
  6. Run the Django server:

    python manage.py runserver

  7. Open http://localhost:8000/ in your browser.

Note: Please refer to the wiki page of this GitHub repository for a guide on how to use GeoParser.
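
Because the application depends on several servers running at once, a quick port check can help when something does not load. This is a rough sketch under stated assumptions: the helper names are hypothetical, the Tika and Django ports come from the steps above, and Solr's port 8983 is its usual default rather than something this README specifies. It runs on both Python 2.7 and Python 3.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
        conn.close()
        return True
    except (socket.error, socket.timeout):
        return False


# Tika (8001) and Django (8000) are the ports used in the steps above.
# Solr's 8983 is an assumption (its common default); adjust as needed.
SERVICES = {"Tika server": 8001, "Django (GeoParser)": 8000, "Solr": 8983}


def check_services(host="localhost"):
    """Report which of the expected services accept connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in sorted(check_services().items()):
        print("%-20s %s" % (name, "up" if up else "not reachable"))
```

A successful connection only shows that something is listening on the port; the curl check against the Tika server remains the better test that location extraction actually works.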

Technologies we Use