Code Reuse and Modularity in Composable
Composable provides special mechanisms for managing complexity of DataFlows. Specifically, Composable allows you to reuse parts of your DataFlow by encapsulating them as stand-alone custom modules.
When developing your DataFlow applications in Composable, users have access to hundreds of “first-class” modules that are available immediately. Users may take a subset of these modules, expose the inputs and outputs, and save it as a custom module. For those familiar with traditional text-based programming languages, this is essentially the Composable equivalent of functions, and allows for code reuse.
Code resuse is the use of existing code across multiple applications. Software engineers employ systematic software reuse as a strategy for increasing productivity and improving quality. This is captured in the general programming philosophy of Don’t Repeat Yourself (DRY) that states “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” This strategy holds true for flow-based programming, as well as traditional text-based programming, and is a key feature when authoring complex DataFlows in Composable.
As an example of code reuse, let us assume that we need to perform an identical task many times, either within a single DataFlow or by multiple distinct DataFlows. For this example, the repeated task takes in a specific column of a table, performs an aggregation (e.g., SUM, AVG, or COUNT), and outputs the result. We can compose a DataFlow for this task, utilizing first-class modules wrapped with External Input and External Output modules, and save it as its own DataFlow.
The External Input and Output modules work to expose the inputs and outputs of a DataFlow within a parent DataFlow. Use the correct External Input and Output modules based on the data type required (e.g., string, boolean, table, etc.).
Here is what this DataFlow will look like.
What we have are three inputs (the data table, the column name to be aggregated and the aggregate function) and one output (the resulting aggregate value). The inputs and outputs are exposed by use of the External Inputs and External Outputs modules, which will become module connections within the parent DataFlow where this is used. Note that the value in the description box will become helper text for these connections, so be sure to make good use of these!
With the inputs and outputs exposed, our resulting DataFlow can now be encapsulated into a single custom module.
Now, a parent DataFlow may use this just like any other module. For example, we can reuse the above DataFlow three times as follows:
Here, we upload a single CSV file, and flow the resulting data table into three instances of our custom module to perform a SUM, AVG and COUNT on a specific column.
While code (DataFlow) reuse is an excellent way to organize and condense complex workflows, you can use these concepts of reuse and modularity in conjunction with the Code module to generate your own custom modules. This can be helpful when no combination of existing first-class modules can achieve a required function.
As an example, assume that a repetitive data engineering task requires you to replace all occurrences of a given substring (in this case DATE_HOLDER) with a new string (the current date). This can be achieved as follows:
Note that here, we used the Python Code module, but you may use any number of languages (from Python to R to SAS…). The code module allows you to insert your own algorithm (in any number of languages).
The above DataFlow is encapsulated as a single custom module.
And, with Composable being a collaborative platform, you may share this custom module with other users within your enterprise.
For more complex workflows, you can logically group a set of modules and encapsulate them into their own child DataFlow, and have a parent DataFlow application reference and call the child DataFlow. This reference DataFlow application looks and feels like any other module, but its functionality is to call the referenced DataFlow. To provide inputs into the DataFlow, and retrieve certain outputs, the referenced DataFlow must have certain module inputs and outputs marked as externalized with additional data (name and description). The DataFlow author does this by adding special modules that externalize inputs and outputs. These external modules then become the inputs and outputs in the ‘application reference module’ in the calling DataFlow application.
In conjuction with a code module, for languages ranging from R to Python to SAS, custom modules can be developed rapidly, reused across multiple DataFlows, and shared with other users.