Time is arguably our most valuable resource, and preparing datasets is often the most time-consuming part of a GIS project. Ingesting data from various sources, cleaning large datasets, joining datasets together, and calculating new fields are common data preparation workflows. These are traditionally not carried out in ArcGIS Online, even though the final output is frequently a web map or web application. Consequently, a lot of time can be spent migrating data and swapping between programs. With ArcGIS Data Pipelines, this could now be a thing of the past. Read on to find out more and learn how it could help make your ArcGIS workflows more efficient.
What is Data Pipelines?
Data Pipelines is a native data integration capability of ArcGIS Online: it allows you to connect to and process data from various sources. Similar capabilities have been available through developer technologies like Notebooks and Python, but the learning curve to become proficient in these can be daunting. Data Pipelines has been positioned to address this and make it easier than ever to perform data preparation workflows within ArcGIS Online. It comes with a user-friendly drag-and-drop interface that lets ArcGIS Online users create pipelines with no coding required.
So, what can you do with Data Pipelines? The list below is by no means complete but highlights include:
- Ingest hosted data or any publicly shared data that is accessible via URL.
- Connect to datasets in external data stores, such as cloud storage and data warehouses (for example Amazon S3, Microsoft Azure or Snowflake).
- Enrich your datasets by joining them with layers from the ArcGIS Living Atlas.
- Filter and clean your data using a variety of data processing tools.
- Calculate new fields and make use of the native Arcade Editor. Arcade is our flexible expression language that allows you to work with data throughout ArcGIS. Learn more about Arcade in this previous webinar.
- Output to a new layer, overwrite an existing layer, or append to an existing layer.
- Schedule a data pipeline task to run once it has been built.
What are the costs and requirements?
Data Pipelines usage is charged in credits, calculated per minute, for both interactive and scheduled pipelines:
- Interactive pipelines consume 50 credits per hour, charged whenever the connection status is Connected.
- Scheduled pipelines consume 70 credits per hour, charged for the time it takes to complete a task.
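To get a feel for what these rates mean in practice, here is a rough estimate in Python. It is only a sketch: the function name is made up for illustration, and it assumes the charge is a simple pro rata share of the hourly rate for each minute, with no rounding or minimum charge. Check your organisation's actual credit usage for real figures.

```python
# Hourly rates quoted above, in credits per hour.
INTERACTIVE_RATE = 50
SCHEDULED_RATE = 70


def estimate_credits(minutes, credits_per_hour):
    """Estimate credits used, assuming simple per-minute proration.

    Assumption: no rounding and no minimum charge are applied.
    """
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    return credits_per_hour * minutes / 60


# 30 minutes with the editor connected (interactive).
interactive_estimate = estimate_credits(30, INTERACTIVE_RATE)

# A scheduled run that takes 12 minutes to complete.
scheduled_estimate = estimate_credits(12, SCHEDULED_RATE)

print(interactive_estimate)  # 25.0
print(scheduled_estimate)    # 14.0
```

Under these assumptions, half an hour of interactive editing costs about 25 credits, and a 12-minute scheduled run costs about 14 credits each time it runs.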
The one requirement for ArcGIS Online users to start building data pipelines is a Creator user type with the following General Content privilege enabled: Create and run data pipelines.
How do you get started with Data Pipelines?
Users can access Data Pipelines through the icon found in the App Launcher. Once Data Pipelines opens, choose Create data pipeline.
There are three elements of a data pipeline: inputs, tools and outputs.
Inputs are your data sources, and you can include one or multiple inputs in your pipeline. To add an input, select Inputs from the left-hand menu and then select the relevant data source. A great feature of Data Pipelines is that you can preview your data at any stage in tabular format, and on a map if the data is spatial. In the following example, a layer from the Living Atlas is added as an input and then previewed.
Moving on to the second element: once inputs have been added, Tools can be used to process the data. These include filtering, joining, calculating new fields and many more. A full list of the available tools can be found here. In the first example below, a filter is used to return only the records that match an SQL query, in this case local authorities in Scotland.
In this second example, a join is used to combine two inputs based on a shared field.
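For readers who like to see the logic written out, the filter and join steps can be sketched in a few lines of Python using the built-in sqlite3 module. This is not how Data Pipelines works internally and you do not need to write any of it; it is only an analogy for what the two tools do. The table names, field names (code, name, country, sites) and all values are invented for illustration.

```python
import sqlite3

# Hypothetical sample inputs: codes, field names and values are invented.
authorities = [
    ("LA1", "Fife", "Scotland"),
    ("LA2", "Highland", "Scotland"),
    ("LA3", "Leeds", "England"),
]
site_counts = [("LA1", 12), ("LA2", 7), ("LA3", 20)]


def filter_and_join(authorities, site_counts):
    """Keep only Scottish authorities, then join on the shared 'code' field."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE authorities (code TEXT, name TEXT, country TEXT)")
        conn.execute("CREATE TABLE site_counts (code TEXT, sites INTEGER)")
        conn.executemany("INSERT INTO authorities VALUES (?, ?, ?)", authorities)
        conn.executemany("INSERT INTO site_counts VALUES (?, ?)", site_counts)
        rows = conn.execute(
            "SELECT a.code, a.name, s.sites "
            "FROM authorities AS a "
            "JOIN site_counts AS s ON a.code = s.code "  # join on the shared field
            "WHERE a.country = 'Scotland' "              # filter condition
            "ORDER BY a.code"
        ).fetchall()
    finally:
        conn.close()
    return rows


result = filter_and_join(authorities, site_counts)
print(result)  # [('LA1', 'Fife', 12), ('LA2', 'Highland', 7)]
```

The WHERE clause plays the role of the filter tool and the JOIN ... ON clause plays the role of the join tool. In Data Pipelines you configure the same two ideas through the drag-and-drop interface instead.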
The final Data Pipelines element is an Output. In the example below, a new feature layer is being created.
Additional Information
At the moment Data Pipelines is only available in ArcGIS Online. Data Pipelines may be available in ArcGIS Enterprise in a future release.
Please test out Data Pipelines for yourself and if you have suggestions for tools that will improve your workflows, leave a comment in the Data Pipelines Esri Community forums. If you would like to find out more then check out this talk at our Scottish Conference 2023 or the Product FAQs here.
I believe Data Pipelines is a fantastic tool for data preparation and integration within ArcGIS Online and I am looking forward to seeing how it will be used in the community.