Skip to content

Creating a DataPortal

In an ETL pipeline, the next step after processing data from an external source is to put it in a data store (load). A DataPortal makes setting up a database from the data model very simple.

In this tutorial, we will continue from the Reading BlueBikes data tutorial to use it as our dataset. We will set up a DataPortal, then use the DataPortalSync to load the data into the database. This is so we now have a permanent store for our data, and an access point for other applications as well.

The DataPortal Excel File

The DataPortal will have a similar structure to the "Simple Survey" example excel file that is available on the Create New DataPortal Page, since we are essentially creating one table in the database.

The excel file used in this Tutorial is available here. Download BlueBikes DataPortal Excel

Master Sheet

In the master sheet, we name the database, and use the Link ControlType to point towards the entry page of the DataPortal. As Type, enter Form.BlueBikes which points towards another sheet in the file.

DataPortal Master Sheet

Master.settings (optional step)

Settings pages are optional. The heading fields are Option, Value. Here, we disable the AutoSave feature. With AutoSave enabled, when you start entering data on a DataPortal page, it is automatically saved, even if your entry is not complete. When it is turned off, you need to click the Save Button for the data to be saved.

DataPortal Master Settings

TripData Container Page

We're going to skip over the BlueBikes sheet that we linked from the master page, and instead first create the container where we will be storing the trip data that we processed in the previous DataFlow. This is where we define the schema of the table, defining the names and datatypes. In our DataPortal, we also pick a ControlType for how to display these fields to a user entering in data.

Now go through each of the fields in the dataset, and list out their properties. In the Name column, we want these to match our dataset exactly. Recall that in the DataFlow, we renamed these columns to format them and remove spaces. In the DisplayName field, we make them more readable. In the Type column, we are mostly using C# System Types: integers for IDs, Doubles for decimals like latitude, strings for text and datetime for time fields. For UserType field of our data, we instead use a Category control type to limit the input values. We define the category values in the UserType sheet of our excel file. The other ControlTypes are selected based on the datatype of the field.

Name DisplayName Description Type ControlType Required
TripDuration Trip Duration System.Int32 Spin
StartTime Start Time System.DateTimeOffset DateTime
StopTime Stop Time System.DateTimeOffset DateTime
StartStationID Start Station ID System.Int32 Spin
StartStationName Start Station Name System.String Text
StartStationLatitude Start Station Latitude System.Double NumberFormatting
StartStationLongitude Start Station Longtitude System.Double NumberFormatting
EndStationID End Station ID System.Int32 Spin
EndStationName End Station Name System.String Text
EndStationLatitude End Station Latitude System.Double NumberFormatting
EndStationLongitude End Station Longitude System.Double NumberFormatting
BikeID Bike ID System.Int32 Spin
UserType User Type Form.UserType Category
ZipCode Zip Code System.String ZIP

Now let's go to the Categories sheet. so we can define the UserTypes we referenced as Form.UserType. Here, "UserType" is the header of a column of the Categories sheet. We add in the two values Customer and Subscriber.

BlueBikes DataPortal Categories

BlueBikes Container Page

Now let's go back to the BlueBikes sheet we referenced in the master sheet. Here we describe what to show as the main page of the DataPortal. We need to reference our TripData container, and list what columns we want to display.

BlueBikes DataPortal Container

For Type, enter [Form.TripData].

Under Columns enter: [Form.TripData.TripDuration, Form.TripData.StartTime, Form.TripData.StopTime, Form.TripData.StartStationID, Form.TripData.StartStationName, Form.TripData.StartStationLatitude, Form.TripData.StartStationLongitude, Form.TripData.EndStationID, Form.TripData.EndStationName, Form.TripData.EndStationLatitude, Form.TripData.EndStationLongitude, Form.TripData.BikeID, Form.TripData.UserType, Form.TripData.ZipCode]

Optionally, the column SearchBoxes set to TRUE will allow us to search on a column level, such as to only view data for a specific bike id or station.

Upload DataPortal

On the New DataPortal page, either click the Choose File button, or drag your file over to the upload box, and in the background Composable creates your database. Once it's finished processing, click on the Open DataPortal button and you'll be brought to the homepage of your DataPortal, which will look empty, since we haven't added any data.

BlueBikes DataPortal Home

Inserting Data with the DataPortal Sync Module

In the DataPortal, we can manually enter in entries one by one. In this case, we want to add a batch of data to our database, which can be done with the DataPortalSync module. The DataPortalSync module takes care of auditing changes to the database table and well as transforming category fields to an integer lookup.

To start, go back to the data ingestion DataFlow, and add the ExternalTableOutput to the last module. This allows us to run the DataFlow and access the table from another DataFlow. We are separating out the two functions of extracting our data and loading it to the database, so other processes could also read in the data and run transformations without the DataPortalSync module running every time.

BlueBikes DataPortal Home

Next, create a new DataFlow, which will have two modules. The first is our previous module. In the module sidebar, go to My DataFlows or Search All DataFlows and enter in the DataFlow name (exactly). You can also search without any search keyword to bring up all applicable DataFlows. After adding the module to the designer, you can see it has one output node, which was added from the External Table Output module. Running this module runs the DataFlow that it references. You can also click on the module name, which will open up the referenced dataflow in a new browser.

Add in a DataPortal Sync module and connect the output of the previous module to the DataModule. In the FormID selector, select the DataPortal that was created earlier. ContainerName is TripData (the sheet name in the DataPortal excel file). And ParentInstanceId is 1. Click on the question mark for the module details, that explains more about how the DataPortal Sync module runs.

BlueBikes DataPortal Home

Run this DataFlow and you will see updates in the Trace Log at the bottom with how many rows have been inserted. (It'll take a while, it's a large dataset.) And then we're done.

With an excel file, and a 2 module DataFlow, we've created a database and inserted data, without any SQL code written.

Note

If you have a key with credentials to write to your DataPortal, the ODBC Insert module can work, but does not include all of the features that are included with a DataPortal so it is not recommended. The DataPortalSync module transforms category data into the lookup key, and also provides auditing on data changes. For instance, your container database table also includes the following columns AclId, CreatedById, CreatedByUserName, CreatedDateTime.

Next Steps

DataPortals can be referenced in other Composable tools. A DataRepository can connect directly to a DataPortal id and show you a database diagram with tables connected by primary and foreign keys. With database credentials, create a Key for the connection, and then you can create QueryViews to query the data and share the results.