Setting the Stage

You need geospatial data to do anything with GIS. Data is the fuel for all GIS-based projects. Not long ago, obtaining data for a GIS-based project was an arduous task. Challenges included the lack of data; monetary, licensing, and other restrictions on data; difficulty in obtaining and sharing large data sets given the state of computer technology; and other societal and technical challenges. Because great time and effort were involved in either creating your own data or obtaining data that someone else had created, the bulk (sometimes even 90%) of the typical GIS-based project’s lifespan was focused on obtaining and processing data, leaving precious little time for analysis and communicating the results of the analysis. I remember a project in which I was studying the impact of freeways on urban neighborhoods in Kansas City, Missouri. Back in 1993, I spent a month obtaining and processing census data for a single county. Now, 24 years later, I can gather and start analyzing most of that same data in one minute using a variety of web-based platforms.

While it can still be cumbersome to obtain data at specific scales for specific areas, cloud-based data services, crowdsourced maps and databases, and real-time streaming make it much easier for anyone to obtain vast amounts of data in a short amount of time. Because finding data has become much easier than it was even a few years ago, is an article on searching for geospatial data even relevant in 2017? I submit that yes, such an article is important, for the following four key reasons.

First, we have been in a hybrid environment these past few years, which I predict will remain with us for years to come. In this hybrid environment, some data is best downloaded to your local device, and some is best streamed from cloud-based services. This hybrid environment has been a great leap forward, but the additional choices bring some additional complexity to data search strategies.

Second, the needs of the geospatial data community have always outpaced the supply. The community continues to seek data in different study areas, at finer scales of resolution, often as close to real-time as possible, and covering a wider variety of themes than ever before. As such, its members are not content with using data that they obtained 20 years ago.

Third, embedding GIS in organizational and IT infrastructure and workflows continues to expand the audience that consumes geospatial data, which impacts the type of data that GIS analysts seek, and the ways in which they communicate the results of their work.

Fourth, the societal forces of which geospatial data is a part are continually evolving, including crowdsourcing, copyright, location privacy, fee vs. free provision, national and international data standards, data portals, and more. These forces continue to change how data can and should be accessed, used, and shared.

The strategies that follow do not encompass all situations, scales, and needs, so as always, the Directions Magazine editors and I look forward to your reactions and additional suggestions.

Search Strategies

Before searching for data, I advise you to first closely examine the scope of your project. Just because you can access every water well in Texas, do you really need every well for your study? Think back to your master’s thesis or other large project where your advisors were always telling you to “narrow and focus” your problem statement and your goals; similar strategies may apply here as well. I also advise that you become familiar with the types of geospatial data, including vector data (shapefiles, geodatabases, feature services, vector tiles, and other vector formats), raster data (ArcGrids, GeoTiffs, other images, tiled image services, and so on), tabular data (Excel tables, CSVs, text files, other databases, and other formats), and other types of data that you could use in a geospatial environment (including ground images, webcam feeds, monitors from the rapidly expanding Internet of Things, and others). In conjunction with this, get familiar with national and international data standards and metadata formats. It is also important to understand the platform and tools you are using, and which formats your platform can read. This will enable you to determine which formats to aim for in your search and whether you need to process the data once you have obtained it. For example, some tools require the data to be unzipped; others require the data to remain zipped.
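As a rough illustration of sorting candidate files into the vector, raster, and tabular categories described above, here is a minimal Python sketch. The extension list is an assumption chosen for this example and is far from exhaustive, and the function name classify_dataset is hypothetical; adjust both to match the formats your own platform reads.

```python
from pathlib import Path

# Illustrative, non-exhaustive mapping from file extension to data type.
# The categories follow the text; the extension list is an assumption.
DATA_TYPES = {
    ".shp": "vector",
    ".geojson": "vector",
    ".kml": "vector",
    ".tif": "raster",
    ".tiff": "raster",
    ".csv": "tabular",
    ".xlsx": "tabular",
    ".txt": "tabular",
}


def classify_dataset(filename: str) -> str:
    """Return a coarse data type for a file name, based only on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".zip":
        # A zip could hold anything; some tools want it unzipped, others zipped.
        return "zipped archive (inspect contents)"
    return DATA_TYPES.get(suffix, "unknown")
```

A check like this only looks at the file name, so it is a first-pass triage step, not a substitute for reading the metadata.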

Once you have defined the scope of your project, learned the capabilities of your GIS tools, and understood data standards, formats, and metadata, I advise you to examine the new streaming paradigm of data access and, after covering that base, move to the “traditional” paradigm. Determine the types of data that you can read in directly from web-based services (the streaming paradigm) and then determine the types of data that you need to download. Your project’s needs may be best met through a combination of the streaming and traditional paradigms.

I advise you to first look for data already in the cloud. To find data in the cloud, start with your platform’s data services. For example, if you are using Esri technology, start with the data library in ArcGIS Online, beginning with the Living Atlas of the World, a curated set of thousands of data layers. Use smart search strategies for your platform, filtering by date, file type, layer owner name, and other terms, just as when you conduct a Google search (which I also advise when searching for spatial data); doing so can save you a great deal of time by narrowing your search. Understand whether your search is yielding maps, layers, groups, or other types of data.

Once you have searched and mined your own tool’s platform, move to local, state, national, and international data repositories and portals. These often feature user-friendly interfaces, a variety of formats, and the ability to stream data as well as download it, but equally often, they may behave erratically, their user interface may not have changed since the late 1990s, and they may present challenges in datums, projections, formats, and other issues. My colleague Jill Clark and I regularly review data portals (both the good and those that “need improvement”) on our Spatial Reserves data blog; for example, IndianaMap, the LandViewer Tool, the Los Angeles GeoHub, and the Texas Natural Resources Information System. Need help getting started with data portals? Try our list of the Top 10 most useful geospatial data portals, Dr. Karen Payne’s list of sites, and Robin Wilson’s list of free data. In your examination of portals, government agencies will be prominent, but don’t neglect academic institutions, including their geospatial libraries, nonprofit organizations, and even private companies.

Final Considerations

Just as important as successfully searching for, and finding, geospatial data is being able to critically evaluate its quality. In another article in Directions Magazine, I argue that because it is so easy to obtain data nowadays, and given the advent of crowdsourcing and cloud-based GIS, data quality considerations matter now more than ever.

For further information on these topics of finding and evaluating data, along with associated issues such as crowdsourcing and location privacy, see the book that Jill Clark and I wrote entitled The GIS Guide to Public Domain Data and the blog that we have updated weekly for over five years, Spatial Reserves.

Short Exercises to Practice Data Searching

Try these hands-on exercises, designed to give you practice searching for data and evaluating what you’ve found.

1) Let’s say that you need data on plate tectonics. You need a world map with plate boundaries. One way to start is with ArcGIS Online. In the upper right, search for map: plate boundaries. Observe the thousands of hits. Then search for map: tags: plate boundaries. Then, search for map: plates 4 types. Finally, search for map: plates 4 types owner:jjkerski. Note how each search strategy netted fewer results. Practice these sorts of smart strategies when searching for your own data. For more search strategies for this platform, read these guidelines.
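If you prefer to script searches like these, the narrowing steps can be sketched in Python. This is a minimal sketch under stated assumptions: the helper names build_query and search_url are hypothetical, quoting a multi-word tag is my assumption about the search syntax, and the endpoint shown is what I understand to be the ArcGIS Online REST search URL, which you should verify against Esri's documentation before relying on it. The code only builds the request; it does not send it.

```python
from urllib.parse import urlencode


def build_query(keywords, tags=None, owner=None):
    """Compose a search string, adding field qualifiers to narrow results."""
    parts = [keywords]
    if tags:
        # Assumption: multi-word tag values are wrapped in double quotes.
        parts.append(f'tags:"{tags}"')
    if owner:
        parts.append(f"owner:{owner}")
    return " ".join(parts)


def search_url(query, num=10):
    """Build (but do not send) a search request URL.

    Assumption: this is the ArcGIS Online REST search endpoint and it
    accepts the q, f, and num parameters.
    """
    base = "https://www.arcgis.com/sharing/rest/search"
    return f"{base}?{urlencode({'q': query, 'f': 'json', 'num': num})}"


# Each step adds a qualifier, mirroring the exercise above.
broad = build_query("plate boundaries")
narrower = build_query("plates 4 types")
narrowest = build_query("plates 4 types", owner="jjkerski")
```

Passing search_url(narrowest) to any HTTP client should return a JSON list of matching items, assuming the endpoint behaves as described.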

2) Let’s say you need data about flooding, specifically floods in mountain-front communities. Let’s explore two methods of gathering the data. First, the traditional paradigm: go to the Boulder County open data site, http://gis.bouldercounty.opendata.arcgis.com/, and search on floodplain. Click on floodplain and note the attributes. Choose Download dataset, then download Floodplain.zip to your computer. Note where you downloaded it, and unzip it; it is now a shapefile that you can bring into your GIS for analysis.
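The unzip step can also be scripted. Below is a minimal sketch using only Python's standard library; the function names extract_shapefiles and missing_sidecars are hypothetical, and the folder you pass in is whatever location you chose for the download. The second helper reflects the fact that a shapefile is really a set of files: the .shp needs its .shx and .dbf companions beside it to be usable.

```python
import zipfile
from pathlib import Path


def extract_shapefiles(zip_path, dest_dir):
    """Unzip an archive and return the paths of any .shp files inside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Note: the pattern is case-sensitive on most non-Windows systems.
    return sorted(dest.rglob("*.shp"))


def missing_sidecars(shp_path):
    """List required companion files (.shx, .dbf) absent next to a .shp."""
    shp = Path(shp_path)
    return [ext for ext in (".shx", ".dbf") if not shp.with_suffix(ext).exists()]
```

For example, calling extract_shapefiles on your downloaded Floodplain.zip with a destination folder of your choice returns the .shp paths to load into your GIS, and missing_sidecars flags an incomplete download before you try to open it.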

Now, let’s use the new streaming paradigm. Go to the same data repository: http://gis.bouldercounty.opendata.arcgis.com/. Search on floodplain, and click on it as before. This time, select “Open in ArcGIS.” You will see that these layers are also available as feature services and will display in an ArcGIS Online map. You can do some analysis on these layers inside ArcGIS Online or another platform that can consume a feature layer, without downloading it to your device. For more, go to http://opendata.arcgis.com and search for your topic or region of interest.
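To see what "consuming a feature layer" can look like outside a map viewer, here is a minimal Python sketch that builds a query request against a feature service layer. Several things here are assumptions: the function name feature_query_url is hypothetical, the layer URL in the example is a placeholder and not the real Boulder County service address, and the where, outFields, resultRecordCount, and f=geojson parameters reflect my understanding of the ArcGIS REST query operation, which not every service may support. The code only constructs the URL; it does not contact any server.

```python
from urllib.parse import urlencode


def feature_query_url(layer_url, where="1=1", out_fields="*", max_records=5):
    """Build (but do not send) a query URL for one feature service layer.

    Assumption: the service exposes a /query operation accepting these
    parameters and can return GeoJSON.
    """
    params = {
        "where": where,            # attribute filter; 1=1 matches every feature
        "outFields": out_fields,   # which attribute columns to return
        "resultRecordCount": max_records,  # keep the first request small
        "f": "geojson",
    }
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


# Placeholder layer address for illustration only.
example_layer = "https://example.com/arcgis/rest/services/Floodplain/FeatureServer/0"
example_url = feature_query_url(example_layer)
```

Requesting a handful of records first is a quick way to inspect the attributes of a streamed layer before deciding whether you need the full download at all.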

Continuing Your Learning: Finding and Evaluating Data and Solving Problems