A Vagrant environment for GeoTrellis development.
This repository can be used to set up a virtual machine environment to develop on GeoTrellis using Vagrant. The virtual machine will include GeoTrellis, Spark, HDFS, ZooKeeper, and Accumulo.
In order to get started with this virtual machine, some software must be installed on the host machine and the host machine must support virtualization.
Vagrant (version >= 1.7.2) is required on the host to manage the virtual machine. Binaries are available for most operating systems.
Ansible (version >= 1.8.2) is required to handle configuration of the virtual machine. Ansible is officially supported on Mac OS X and Linux hosts, though it can be used with a Windows host machine. There are multiple ways to install Ansible; choose the one most appropriate for your operating system.
VirtualBox is an open source virtualization package used to run the virtual machine. Binaries are available for most operating systems.
Note: It is also possible to run the virtual machine on Linux with KVM (Kernel-based Virtual Machine) by using the Vagrant Libvirt provider. Vagrant Libvirt is still under active development, and using KVM has additional requirements.
Git is used for version control. It is necessary to use Git to download the GeoTrellis code and submit patches for development.
Your host machine should have at least 6 GB of memory and a modern x86-64 processor, and virtualization support must be enabled for that processor (this is often a BIOS/UEFI setting).
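The minimum versions above can be checked from the host shell. The following is a minimal sketch, assuming a POSIX shell and a `sort` that supports `-V` (GNU coreutils does; recent BSD and macOS versions of `sort` should too). The helper names `version_ge` and `vagrant_version` are hypothetical, and the parsing assumes `vagrant --version` prints a line such as `Vagrant 1.7.2`.

```sh
#!/bin/sh
# Sketch: compare dotted version strings on the host (hypothetical helpers).

# version_ge A B succeeds when version A >= version B.
# Under version ordering, the smaller of the two must be B for A >= B to hold.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version number from `vagrant --version` output such as "Vagrant 1.7.2".
# The output format is an assumption; adjust the awk field if yours differs.
vagrant_version() {
  vagrant --version 2>/dev/null | awk '{print $2}'
}
```

For example, `version_ge "$(vagrant_version)" 1.7.2 && echo "Vagrant is new enough"`. The same comparison works for Ansible, although the first line of `ansible --version` is formatted differently across releases, so extract the number accordingly.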
1. Clone this repository:

```sh
git clone https://github.com/geotrellis/vagrant.geotrellis.git
```

Note: If you wish to submit patches to this repository, you should consider forking it.
2. Install the required software listed above.
3. Fork GeoTrellis.
4. Navigate into the `vagrant.geotrellis` directory created by cloning:

```sh
cd vagrant.geotrellis
```
5. Clone GeoTrellis from your fork into a directory named `geotrellis`. At this point, the `vagrant.geotrellis` directory should look like this:

```
vagrant.geotrellis
├── ansible
├── geotrellis
├── README.md
└── Vagrantfile
```
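Before bringing the machine up, the layout can be verified from the host. A minimal sketch, assuming a POSIX shell; `check_layout` is a hypothetical helper that only tests whether each expected entry exists, not whether `geotrellis` is a valid clone.

```sh
#!/bin/sh
# Sketch: confirm the vagrant.geotrellis checkout has the expected layout
# before running `vagrant up`. Prints each missing entry to stderr and
# returns non-zero if any are absent. (Hypothetical helper.)
check_layout() {
  dir="${1:-.}"
  missing=0
  for entry in ansible geotrellis README.md Vagrantfile; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $dir/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_layout vagrant.geotrellis` from the directory that contains your clone, or `check_layout` with no argument from inside it.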
6. Choose a folder syncing option by setting the `VAGRANT_GEOTRELLIS_SYNC` environment variable:
   - For Linux and Mac OS X, NFS is likely the best option.
   - For Windows, consider using rsync.

   Rsync requires an extra process to run while you develop, but it performs much better than the other options. Compiling and running GeoTrellis is considerably faster because build products do not need to be synced back and forth between the guest and host machines.
| Value | Sync Folder Type |
|---|---|
| `nfs` | NFS |
| `rsync` | rsync |
| (unset) | OS default |
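The variable must be exported in the same shell you will run `vagrant up` from, for example `export VAGRANT_GEOTRELLIS_SYNC=nfs`. The sketch below picks a value automatically from `uname -s`. The mapping (`Linux` and `Darwin` to `nfs`, anything else, such as a Git Bash or Cygwin shell on Windows, to `rsync`) is an assumption that simply mirrors the recommendations above, and `suggest_sync` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: choose VAGRANT_GEOTRELLIS_SYNC from the host OS name.
# Linux and Mac OS X (Darwin) get NFS; any other host gets rsync.
# The uname -s patterns are assumptions about common host environments.
suggest_sync() {
  case "$1" in
    Linux|Darwin) echo nfs ;;
    *) echo rsync ;;
  esac
}

VAGRANT_GEOTRELLIS_SYNC="$(suggest_sync "$(uname -s)")"
export VAGRANT_GEOTRELLIS_SYNC
echo "VAGRANT_GEOTRELLIS_SYNC=$VAGRANT_GEOTRELLIS_SYNC"
```

Source the script (for example `. ./sync-env.sh`, a hypothetical file name) rather than executing it, so the export persists in your current shell.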
7. From the top-level directory containing the `Vagrantfile`, bring up the virtual machine:

```sh
vagrant up
```

Vagrant will then start the virtual machine and provision it with Ansible. Depending on your internet connection speed, downloading and installing all dependencies may take some time.
8. Once the machine finishes provisioning, verify that Accumulo and HDFS are running by visiting their web UIs:
- Accumulo: http://localhost:50095
- HDFS: http://localhost:50070
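To check both UIs from the command line instead of a browser, a minimal sketch that polls a URL with `curl` until it responds, assuming `curl` is installed on the host and that the ports above are reachable from it. `wait_for_ui`, the default of 30 attempts, and the 2-second delay are hypothetical, arbitrary choices.

```sh
#!/bin/sh
# Sketch: poll a web UI until it answers or the attempts run out.
# wait_for_ui URL [ATTEMPTS]  (hypothetical helper)
wait_for_ui() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: fail on HTTP errors, -s: no progress output, --max-time: bound each attempt.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep 2
    fi
  done
  echo "not reachable: $url" >&2
  return 1
}
```

For example, `wait_for_ui http://localhost:50095 && wait_for_ui http://localhost:50070`.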
9. If you chose the rsync sync option, start the rsync process so that your changes to the GeoTrellis code are synced to the virtual machine:

```sh
vagrant rsync-auto
```
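Because this step only matters when rsync syncing was selected, a small guard can make it conditional. A sketch, assuming `VAGRANT_GEOTRELLIS_SYNC` is still set in the current shell and that you run it from the directory containing the `Vagrantfile`; `needs_rsync_auto` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: start the rsync watcher only when rsync syncing was chosen.
# (Hypothetical helper; relies on VAGRANT_GEOTRELLIS_SYNC from the earlier step.)
needs_rsync_auto() {
  [ "${VAGRANT_GEOTRELLIS_SYNC:-}" = "rsync" ]
}

if needs_rsync_auto; then
  # vagrant rsync-auto runs in the foreground and keeps syncing host changes
  # until interrupted, so use a terminal you can leave open while developing.
  vagrant rsync-auto
fi
```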
10. Once finished, SSH into the machine, navigate to the GeoTrellis directory, and start hacking on GeoTrellis:

```sh
vagrant ssh
cd /home/vagrant/geotrellis/
```
To run a GeoTrellis program on Spark, you first need to build an assembly (fat JAR) of the project:

```sh
cd /home/vagrant/geotrellis
sbt "project spark" assembly
```
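The assembly JAR's file name includes the Scala and GeoTrellis versions (`scala-2.10` and `0.10.0-SNAPSHOT` in the `spark-submit` example), which change over time. A minimal sketch that locates the JAR with a glob instead of hard-coding the path, assuming the output stays under `spark/target/scala-*/` with a `geotrellis-spark-assembly-*.jar` name; `find_spark_assembly` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: print the path of the first geotrellis-spark assembly JAR found
# under ROOT/spark/target/scala-*/  (hypothetical helper).
find_spark_assembly() {
  root="${1:-.}"
  for jar in "$root"/spark/target/scala-*/geotrellis-spark-assembly-*.jar; do
    # If the glob matches nothing it stays unexpanded, so test that the file exists.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no assembly JAR found under $root/spark/target" >&2
  return 1
}
```

For example, `JAR="$(find_spark_assembly /vagrant/geotrellis)"`, then pass `"$JAR"` to `spark-submit` in place of the full path. If several matching JARs exist, the first in glob order is returned, so clean out old builds first.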
You can then use `spark-submit` on the virtual machine to start a Spark job. For example, the following ingests the data under `geotrellis-test/nlcd-geotiff` on S3 into the Accumulo table `tiles` as a layer named `NLCD`:

```sh
spark-submit \
  --class geotrellis.spark.ingest.AccumuloIngestCommand \
  --master "local[4]" \
  --driver-memory 1G \
  --driver-library-path /usr/local/lib \
  /vagrant/geotrellis/spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar \
  --input "s3a://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@geotrellis-test/nlcd-geotiff" \
  --instance geotrellis-accumulo-cluster --user root --password secret --zookeeper localhost \
  --table tiles --layerName NLCD
```
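The `--input` URI embeds `$AWS_ACCESS_KEY` and `$AWS_SECRET_KEY`; if either is unset, the shell silently expands it to an empty string and the job will most likely fail only once it tries to read from S3. A minimal sketch of a pre-flight check, assuming those two variable names as used in the command above; `check_aws_credentials` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: fail early if the credentials used in the s3a:// URI are missing.
# (Hypothetical helper; reports every missing variable before returning.)
check_aws_credentials() {
  status=0
  if [ -z "${AWS_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY is not set" >&2
    status=1
  fi
  if [ -z "${AWS_SECRET_KEY:-}" ]; then
    echo "AWS_SECRET_KEY is not set" >&2
    status=1
  fi
  return "$status"
}
```

Run it before the job, for example `check_aws_credentials && spark-submit ...`.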
Note: You may need to specify `fs.s3a.access.key` and `fs.s3a.secret.key` in `hdfs-site.xml` if your secret key includes `/`.
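If your secret key does contain `/`, the two properties can be generated as XML `<property>` entries to place inside the `<configuration>` element of `hdfs-site.xml`. A minimal sketch, assuming a POSIX shell; `secret_has_slash` and `s3a_properties` are hypothetical helpers. The output is not XML-escaped (AWS secret keys normally contain only letters, digits, `+` and `/`), and it prints the secret in plain text, so avoid leaving it in shell history or logs.

```sh
#!/bin/sh
# Sketch: detect a "/" in the secret key and print the S3A properties
# as hdfs-site.xml <property> entries. (Hypothetical helpers; no XML escaping.)
secret_has_slash() {
  case "$1" in
    */*) return 0 ;;
    *) return 1 ;;
  esac
}

# s3a_properties ACCESS_KEY SECRET_KEY
s3a_properties() {
  printf '<property>\n  <name>fs.s3a.access.key</name>\n  <value>%s</value>\n</property>\n' "$1"
  printf '<property>\n  <name>fs.s3a.secret.key</name>\n  <value>%s</value>\n</property>\n' "$2"
}
```

For example, `secret_has_slash "$AWS_SECRET_KEY" && s3a_properties "$AWS_ACCESS_KEY" "$AWS_SECRET_KEY"`. With the keys in the configuration, the credentials can presumably be dropped from the `--input` URI (`s3a://geotrellis-test/nlcd-geotiff`).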