In our initial feature tutorial we provided a code snippet to read in a comma-separated value (CSV) file and produce a feature collection.
In this tutorial we will build a CSV DataStore, and in the process explore several aspects of how DataStores work and how best to make use of them.
If you would like to follow along with this workshop, start a new Java project in your favourite IDE, and ensure GeoTools is on your CLASSPATH (using Maven or by downloading the jars).
Note
Terminology
DataStore borrows most of its concepts (and some of its syntax) from the OpenGIS Consortium (OGC) Web Feature Server Specification.
Here is the sample locations.csv file:
LAT, LON, CITY, NUMBER, YEAR
46.066667, 11.116667, Trento, 140, 2002
44.9441, -93.0852, St Paul, 125, 2003
13.752222, 100.493889, Bangkok, 150, 2004
45.420833, -75.69, Ottawa, 200, 2004
44.9801, -93.251867, Minneapolis, 350, 2005
46.519833, 6.6335, Lausanne, 560, 2006
48.428611, -123.365556, Victoria, 721, 2007
-33.925278, 18.423889, Cape Town, 550, 2008
-33.859972, 151.211111, Sydney, 436, 2009
41.383333, 2.183333, Barcelona, 914, 2010
39.739167, -104.984722, Denver, 869, 2011
52.95, -1.133333, Nottingham, 800, 2013
45.52, -122.681944, Portland, 840, 2014
The first line of our CSV file is a header that provides the column names:
LAT, LON, CITY, NUMBER, YEAR
Each column name is treated as a simple String. More complicated formats have the option of isolating names into different name spaces.
Each subsequent line is used to capture a single feature of information suitable for mapping.
46.066667, 11.116667, Trento, 140, 2002
In our example the LAT and LON values represent a location, POINT(46.066667, 11.116667), while the CITY Trento, the NUMBER 140 and the YEAR 2002 capture details of the GRASS users conference (one of the earliest Free and Open Source Software for Geomatics (FOSS4G) events).
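To make the relationship between the header and a row concrete, here is a minimal sketch using only the Java standard library (not the CSV library adopted later in this tutorial). The names RowSketch and parseRow are made up for illustration, and the naive split on commas assumes the file has no quoted fields or embedded commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowSketch {

    /** Pairs each header name with the matching value from one row, trimming padding. */
    public static Map<String, String> parseRow(String header, String row) {
        String[] names = header.split(",");
        String[] values = row.split(",");
        Map<String, String> record = new LinkedHashMap<>();
        // Stop at the shorter of the two arrays so a short row cannot overrun
        for (int i = 0; i < names.length && i < values.length; i++) {
            record.put(names[i].trim(), values[i].trim());
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> record = parseRow(
                "LAT, LON, CITY, NUMBER, YEAR",
                "46.066667, 11.116667, Trento, 140, 2002");
        // Every value starts out as a String; numeric columns are converted on demand
        double lat = Double.parseDouble(record.get("LAT"));
        double lon = Double.parseDouble(record.get("LON"));
        int number = Integer.parseInt(record.get("NUMBER"));
        System.out.println(record.get("CITY") + " at (" + lat + ", " + lon + "), NUMBER=" + number);
    }
}
```

Trimming matters here because the sample file pads each value with a space after the comma.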
Here is our strategy for representing GeoTools concepts with a CSV file.
FeatureID or FID - uniquely defines a Feature.
We will use the row number in our CSV file.
FeatureType Name
Same as the name of the .csv file (i.e. “locations” for locations.csv).
DataStore
We will create a CSVDataStore to access all the FeatureTypes (.csv files) in a directory
FeatureType or Schema
We will represent the names of the columns in our CSV (and if possible their types).
Geometry
Initially we will try to recognise several columns and map them into Point x and y ordinates. This technique is used to handle content from websites such as geonames.
We can also look at parsing a column using the Well-Known-Text representation of a Geometry.
CoordinateReferenceSystem
Look for a prj sidecar file (i.e. locations.prj for locations.csv).
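The naming conventions in this strategy could be sketched roughly as follows. NamingSketch and its methods are hypothetical helpers, not GeoTools API, and the "typeName.row" format used for the FID is an assumption chosen for illustration; the strategy above only says the row number will be used.

```java
import java.io.File;

public class NamingSketch {

    /** FeatureType name: the file name without its extension. */
    public static String typeName(File csv) {
        String name = csv.getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    /** Hypothetical FID format: type name, a dot, then the row number. */
    public static String featureId(String typeName, int row) {
        return typeName + "." + row;
    }

    /** Sidecar file next to the CSV, for example locations.prj for locations.csv. */
    public static File sidecar(File csv, String extension) {
        return new File(csv.getParentFile(), typeName(csv) + "." + extension);
    }

    public static void main(String[] args) {
        File csv = new File("data", "locations.csv");
        System.out.println(typeName(csv));
        System.out.println(featureId(typeName(csv), 1));
        System.out.println(sidecar(csv, "prj").getName());
    }
}
```

Prefixing the row number with the type name is one way to keep identifiers distinct when a directory holds several .csv files.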
Rather than go through the joy of parsing a CSV file by hand, we are going to make use of a library to read CSV files.
The JavaCSV project looks nice and simple and is available in maven:
For our purposes a key benefit of this implementation is streaming - it will read one line at a time and avoid loading the entire file into memory.
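The streaming idea does not depend on any particular library. As a rough illustration using only java.io (this is not JavaCSV, and StreamingSketch is a hypothetical name), the sketch below walks the rows one line at a time, so memory use stays flat however large the file is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingSketch {

    /** Counts data rows after the header, holding only one line in memory at a time. */
    public static int countRows(Reader source) throws IOException {
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // the header line
            if (line == null) {
                return 0; // empty source: no header, no rows
            }
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // ignore blank lines
                    rows++;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "LAT, LON, CITY, NUMBER, YEAR\n"
                + "46.066667, 11.116667, Trento, 140, 2002\n"
                + "44.9441, -93.0852, St Paul, 125, 2003\n";
        System.out.println(countRows(new StringReader(csv)));
    }
}
```

A StringReader stands in for the file here; passing a FileReader would stream from disk in the same way.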
Time to create a new project making use of this library:
Create a new project:
Fill in project details, paying careful attention to the geotools.version property you wish to use. You can choose a stable release (recommended) or use 14-SNAPSHOT for access to the latest nightly build.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.geotools.tutorial</groupId>
  <artifactId>csv</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>CSV DataStore</name>
  <description>CSV DataStore tutorial</description>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <geotools.version>13.2</geotools.version>
  </properties>
</project>
Add the following dependencies:
<dependencies>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-api</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-data</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-cql</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-epsg-hsql</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>net.sourceforge.javacsv</groupId>
    <artifactId>javacsv</artifactId>
    <version>2.0</version>
  </dependency>
</dependencies>
Available from these repositories:
<repositories>
  <repository>
    <id>maven2-repository.dev.java.net</id>
    <name>Java.net repository</name>
    <url>http://download.java.net/maven/2</url>
  </repository>
  <repository>
    <id>osgeo</id>
    <name>Open Source Geospatial Foundation Repository</name>
    <url>http://download.osgeo.org/webdav/geotools/</url>
  </repository>
  <repository>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <id>boundless</id>
    <name>Boundless Maven Repository</name>
    <url>http://repo.boundlessgeo.com/main</url>
  </repository>
</repositories>
Finally, configure the compiler for Java 7:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>
You can check against the completed pom.xml
Create a directory src/test/resources and in there create package org.geotools.tutorial.csv. Then add locations.csv to this package.
LAT, LON, CITY, NUMBER, YEAR
46.066667, 11.116667, Trento, 140, 2002
44.9441, -93.0852, St Paul, 125, 2003
13.752222, 100.493889, Bangkok, 150, 2004
45.420833, -75.69, Ottawa, 200, 2004
44.9801, -93.251867, Minneapolis, 350, 2005
46.519833, 6.6335, Lausanne, 560, 2006
48.428611, -123.365556, Victoria, 721, 2007
-33.925278, 18.423889, Cape Town, 550, 2008
-33.859972, 151.211111, Sydney, 436, 2009
41.383333, 2.183333, Barcelona, 914, 2010
39.739167, -104.984722, Denver, 869, 2011
52.95, -1.133333, Nottingham, 800, 2013
45.52, -122.681944, Portland, 840, 2014
Download locations.csv.
Below is a JUnit 4 test case to confirm JavaCSV is available and can read our file (JUnit 4 needs to be on your test classpath). Create a directory src/test/java and in there create package org.geotools.tutorial.csv. Then add CSVTest.java to the package:
/* GeoTools - The Open Source Java GIS Toolkit
* http://geotools.org
*
* (C) 2010-2014, Open Source Geospatial Foundation (OSGeo)
*
* This file is hereby placed into the Public Domain. This means anyone is
* free to do whatever they wish with this file. Use it well and enjoy!
*/
package org.geotools.tutorial.csv;
import static org.junit.Assert.assertTrue;
import java.io.File;
import java.io.FileReader;
import java.io.Serializable;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.DataUtilities;
import org.geotools.data.FeatureReader;
import org.geotools.data.Query;
import org.geotools.data.Transaction;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.data.simple.SimpleFeatureSource;
import org.geotools.factory.CommonFactoryFinder;
import org.geotools.feature.DefaultFeatureCollection;
import org.geotools.filter.text.cql2.CQL;
import org.geotools.referencing.CRS;
import org.junit.Test;
import org.opengis.feature.Property;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;
import org.opengis.feature.type.AttributeDescriptor;
import org.opengis.feature.type.GeometryDescriptor;
import org.opengis.filter.Filter;
import org.opengis.filter.FilterFactory;
import org.opengis.filter.identity.FeatureId;
import com.csvreader.CsvReader;
import com.vividsolutions.jts.geom.Geometry;
public class CSVTest {
    @Test
    public void test() throws Exception {
        List<String> cities = new ArrayList<String>();
        // locations.csv is looked up in the same package on the test classpath
        URL url = CSVTest.class.getResource("locations.csv");
        File file = new File(url.toURI());
        try (FileReader reader = new FileReader(file)) {
            CsvReader locations = new CsvReader(reader);
            // The first line holds the column names
            locations.readHeaders();
            // Read one record at a time, collecting the CITY column
            while (locations.readRecord()) {
                cities.add(locations.get("CITY"));
            }
        }
        assertTrue(cities.contains("Victoria"));
    }
}