Writing a new transformation step

Some day I may take the time to turn this into a full-fledged how-to, but until then, this page is a collection of tips and things I have learned.

  • To write a new component (i.e. transformation step), you need to implement 4 classes, create a plugin.xml, and create an icon:
    • The Plugin class (extends BaseStep implements StepInterface)
      • This is the class that controls the data flowing in and out. It is responsible for getting a row, processing it, and writing it to the output streams.
    • The Data class (extends BaseStepData implements StepDataInterface)
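      • This class holds the step's runtime data, such as the output row metadata (data.outputRowMeta) and any objects your step needs while processing rows. You can put special methods in here to work with your data objects if you want; the default handling has worked fine for me so far.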
    • The Meta class (extends BaseStepMeta implements StepMetaInterface)
      • This class encapsulates the metadata. You'll do a lot of work in here: this is where you define new fields and where parameters are handled (saved to and read from the XML file or repository), and it is the interface between the Dialog class and the component.
    • The Dialog class (extends BaseStepDialog implements StepDialogInterface)
      • This class provides the GUI dialog the user works with to set parameters for your component. It calls methods on your Meta class.
    • plugin.xml
      • This defines some metadata that the Kettle GUI uses, such as the category for your component, the jar files used by your component, the icon, and some names and descriptions.
      • Example:
      • <?xml version="1.0" encoding="UTF-8"?>
        <plugin
           id="ESRIGeocoderPlugin"
           iconfile="globe32.png"
           description="ESRI Online Geocoder Plugin"
           tooltip="Use this step to geocode addresses using ESRI's online geocoder"
           category="Transform"
           classname="net.timbert.geocoder.kettle.ESRIGeocoderPluginMeta">
           <libraries>
            <library name="ESRIGeocoderPlugin.jar"/> <!-- this is the jar that contains the 4 classes -->
            <library name="ESRIGeocoder.jar"/> <!-- the rest are supporting libraries -->
            <library name="axis.jar"/>
            <library name="commons-discovery-0.2.jar"/>
           </libraries>
            
           <localized_category>
             <category locale="en_US">Transform</category>
           </localized_category>
           <localized_description>
             <description locale="en_US">ESRI Online Geocoder Plugin</description>
           </localized_description>
           <localized_tooltip>
             <tooltip locale="en_US">Use this step to geocode addresses using ESRI's online geocoder</tooltip>
           </localized_tooltip>
        </plugin>
        
        
    • An icon
      • You'll need to create a 32×32 icon in PNG format.
    • Now, compile your four classes into a jar file, then put that jar, plugin.xml, the icon file, and all supporting jar files into a new directory under ${kettle-home}/plugins/steps. Name the directory whatever you want. When you restart Kettle, your component should show up. If you did something wrong, Kettle may not start at all, so remove your directory until you figure out what went wrong. Common causes are missing dependent jars and jar files compiled for a newer Java release than Kettle supports (they must target Java 1.5).
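One of the failure causes above, jars compiled for a newer Java release, can be checked before you copy anything into the plugins directory. Every .class file starts with the magic number 0xCAFEBABE followed by a minor and a major version, and Java 5 class files have major version 49. The sketch below (ClassVersionCheck is my own name, not part of Kettle) uses only the JDK to list the classes in a jar whose major version is higher than that.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch: report classes in a jar that were compiled for a release newer
// than Java 5 (class-file major version 49).
class ClassVersionCheck {
    static final int JAVA_5_MAJOR = 49;

    // Reads the header of a .class file: magic (4 bytes), minor (2), major (2).
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor version, ignored
        return data.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassVersionCheck <jar file>");
            return;
        }
        JarFile jar = new JarFile(args[0]);
        try {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) {
                    continue;
                }
                InputStream in = jar.getInputStream(entry);
                try {
                    int major = majorVersion(in);
                    if (major > JAVA_5_MAJOR) {
                        System.out.println(entry.getName() + ": major version " + major);
                    }
                } finally {
                    in.close();
                }
            }
        } finally {
            jar.close();
        }
    }
}
```

Run it as java ClassVersionCheck YourPlugin.jar (and on each supporting jar); any line it prints points at a class that a Java 1.5 runtime would refuse to load.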

Some Tips

Init method

  • Be sure to return true or false from your plugin's init method: false if initialization fails, true otherwise. If you don't return false on failure, your transformation will not be shut down properly (even when you do, you may still get the message “The transformation is running, don't start it twice!” 1) after an init fails in one of your steps). Here is my recommended pattern for init:
    • public boolean init(StepMetaInterface smi, StepDataInterface sdi)
      {
          if (super.init(smi, sdi)) {
              logBasic("Initializing my plugin");
              meta = (MyPluginMeta) smi;
              data = (MyPluginData) sdi;

              try {
                  // do your init stuff here

              } catch (Exception e) {
                  logError("Error initializing the plugin", e);
                  return false;
              }
              return true;
          }
          return false;
      }
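To see why the boolean matters, here is a small hypothetical model of a runner that refuses to start unless every step's init returns true. None of these types (ToyStep, ToyRunner, ConnectingStep) are Kettle classes, and the real engine does more than this; the sketch only shows the contract the pattern above relies on, where an exception inside init is caught, logged, and turned into false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Kettle code: a step that reports whether init worked.
interface ToyStep {
    boolean init();
    void dispose();
}

// Hypothetical runner: starts only if every step's init() returned true.
class ToyRunner {
    static boolean initAll(List<ToyStep> steps) {
        List<ToyStep> ready = new ArrayList<ToyStep>();
        for (ToyStep step : steps) {
            if (!step.init()) {
                // Clean up the steps that did initialize, then refuse to start.
                for (ToyStep s : ready) {
                    s.dispose();
                }
                return false;
            }
            ready.add(step);
        }
        return true;
    }
}

// Follows the recommended pattern: any exception in init becomes "false".
class ConnectingStep implements ToyStep {
    private final boolean failOnInit;
    boolean disposed = false;

    ConnectingStep(boolean failOnInit) {
        this.failOnInit = failOnInit;
    }

    public boolean init() {
        try {
            if (failOnInit) {
                // Simulated failure, e.g. a service that cannot be reached.
                throw new IllegalStateException("could not open connection");
            }
            return true;
        } catch (Exception e) {
            System.err.println("Error initializing the step: " + e.getMessage());
            return false;
        }
    }

    public void dispose() {
        disposed = true;
    }
}
```

If ConnectingStep let the exception escape or returned true regardless, the runner in this model would have no way to know it should stop.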
      
      

Dispose method

  • Along with a good init method, you should also have a good dispose method to clean things up. It gets called when the transformation finishes, whether by error or by successful completion. Here is my recommended pattern:
    • public void dispose(StepMetaInterface smi, StepDataInterface sdi)
      {
          logBasic("Disposing my plugin");
          meta = (MyPluginMeta) smi;
          data = (MyPluginData) sdi;

          try {
              // do your disposing here
          } catch (Exception e) {
              logError("Error disposing my plugin", e);
          } finally {
              super.dispose(smi, sdi);
          }
      }
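The finally block matters more than it looks. A missing dependent jar, one of the common problems mentioned earlier, typically surfaces at run time as NoClassDefFoundError, which is an Error rather than an Exception, so catch (Exception e) does not catch it. This standalone sketch uses a stand-in base class (ToyBaseStep, hypothetical, not Kettle's BaseStep) to show that super.dispose() still runs in that case.

```java
// Stand-in for the base class; records whether its dispose() ran.
class ToyBaseStep {
    boolean baseDisposed = false;

    void dispose() {
        baseDisposed = true; // the base class's own cleanup would go here
    }
}

class FlakyStep extends ToyBaseStep {
    @Override
    void dispose() {
        try {
            // Simulates a missing dependent jar: NoClassDefFoundError is an
            // Error, so the catch (Exception e) clause below does not see it.
            throw new NoClassDefFoundError("com/example/MissingClass");
        } catch (Exception e) {
            System.err.println("Error disposing my step: " + e.getMessage());
        } finally {
            // Runs even though the Error escapes the catch clause.
            super.dispose();
        }
    }
}
```

Without the finally, the base class cleanup would be skipped whenever something other than an Exception is thrown.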
      
      

Referencing outputRowMeta

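  • You cannot reference the outputRowMeta object of your StepDataInterface before calling getRow() in your plugin's processRow method. outputRowMeta is not filled in for you: you build it yourself from getInputRowMeta(), which only returns the input row layout once getRow() has been called.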
    • Example of what will NOT work:
    • public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
      {
        data = (YourPluginClassData)sdi;
        data.outputRowMeta.size(); // NullPointerException: nothing has set outputRowMeta yet
        Object[] r=getRow();
        // ...
        return true;
      }
      
    • You must call Object[] r=getRow(); (and build data.outputRowMeta from getInputRowMeta()) before calling data.outputRowMeta.size(), or you will get a NullPointerException. That's why you see this crazy if statement in the processRow method of the bundled plugins:
      • if (first)
        {
            logRowlevel("Processing first row");
            first = false;

            data.outputRowMeta = (RowMetaInterface) getInputRowMeta().clone();
            meta.getFields(data.outputRowMeta, getStepname(), null, null, this);
        }
        
    • The problem with this is that the if statement is evaluated for every row processed, even though the code inside it runs only once. A single boolean check is cheap, but multiply it by every row processed, and then by every step in your transformation, and it adds up. Surely there is a better way.
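The ordering constraint can be reproduced without Kettle on the classpath. In this toy model, RowSourceStep (a hypothetical class, not BaseStep) only learns its input layout when getRow() is first called, mirroring how getInputRowMeta() is described above, and processRow() uses the same first-row pattern to build outputRowMeta. Layouts are plain lists of field names, and the extra "length" output field is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model, not Kettle code: the input layout (inputRowMeta) is unknown
// until getRow() has been called.
class RowSourceStep {
    private final List<String> incomingLayout;
    private final Iterator<Object[]> rows;
    private List<String> inputRowMeta = null; // null until the first getRow()
    private boolean first = true;
    List<String> outputRowMeta = null;        // stands in for data.outputRowMeta
    final List<Object[]> written = new ArrayList<Object[]>();

    RowSourceStep(List<String> incomingLayout, List<Object[]> input) {
        this.incomingLayout = incomingLayout;
        this.rows = input.iterator();
    }

    Object[] getRow() {
        inputRowMeta = incomingLayout;          // layout becomes known here
        return rows.hasNext() ? rows.next() : null;
    }

    List<String> getInputRowMeta() {
        return inputRowMeta;
    }

    // Same contract as processRow(): return false when there are no more rows.
    boolean processRow() {
        Object[] r = getRow();                  // must come before any use of the layout
        if (r == null) {
            return false;
        }
        if (first) {
            first = false;
            // Safe now: getRow() has populated the input layout.
            outputRowMeta = new ArrayList<String>(getInputRowMeta());
            outputRowMeta.add("length");        // hypothetical extra output field
        }
        Object[] out = new Object[outputRowMeta.size()];
        System.arraycopy(r, 0, out, 0, r.length);
        out[r.length] = Integer.valueOf(String.valueOf(r[0]).length());
        written.add(out);
        return true;
    }
}
```

Calling outputRowMeta.size() before the first processRow() in this model throws the same NullPointerException described above.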

Returning dynamic fields from your component

  • It is possible, and quite easy, to do this. Just know that the getFields() method of your Meta class is called whenever the set of fields is needed, and this happens after any values have been read from the XML file or repository, so you can use those values to determine your field list. It is also not necessary to clear out fields you added in a prior call to getFields() before adding new ones; this is taken care of for you. Here is an example:
    • public void getFields(RowMetaInterface r, String origin, RowMetaInterface[] info, StepMeta nextStep, VariableSpace space) throws KettleStepException
      {
          // Add dynamic fields. useNumericOutput is a hypothetical setting of this
          // Meta class, already read from the XML file or repository at this point.
          if (useNumericOutput) {
              addOutputField(r, origin, "fieldA", ValueMetaInterface.TYPE_NUMBER, 18, 15, new Double(0), "###.###############");
          } else {
              addOutputField(r, origin, "fieldB", ValueMetaInterface.TYPE_STRING, 25, 0, "", null);
          }
      }

      private void addOutputField(RowMetaInterface r, String origin, String fieldName, int fieldType, int length, int precision, Object defaultValue, String conversionMask) {
          // Only the field's metadata is added to the row layout; defaultValue is
          // held by the ValueMetaAndData wrapper but is not written into r.
          ValueMetaAndData newField = new ValueMetaAndData(new ValueMeta(fieldName, fieldType), defaultValue);
          ValueMetaInterface v = newField.getValueMeta();
          v.setLength(length);
          v.setPrecision(precision);
          if (conversionMask != null) {
              v.setConversionMask(conversionMask);
          }
          v.setOrigin(origin);
          r.addValueMeta(v);
      }
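The same idea in self-contained form: in this toy model a List<String> of "name:type" entries stands in for RowMetaInterface, and useNumericOutput is a hypothetical setting that would already have been loaded before getFields() runs. The helper outputLayout() assumes, as the text above says Kettle does for you, that each call starts from a fresh copy of the input layout, which is why getFields() never has to remove fields added by an earlier call.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Kettle code: "name:type" strings stand in for RowMetaInterface.
class DynamicFieldsMeta {
    // Hypothetical setting, already loaded from XML or the repository by the
    // time getFields() is called.
    boolean useNumericOutput;

    // Appends this step's output fields to the layout it is given.
    void getFields(List<String> rowLayout) {
        if (useNumericOutput) {
            rowLayout.add("fieldA:Number");
        } else {
            rowLayout.add("fieldB:String");
        }
    }

    // Assumed caller behavior: start from a fresh copy of the input layout on
    // every call, so fields from an earlier call never accumulate.
    static List<String> outputLayout(DynamicFieldsMeta meta, List<String> inputLayout) {
        List<String> layout = new ArrayList<String>(inputLayout);
        meta.getFields(layout);
        return layout;
    }
}
```

Flipping useNumericOutput between calls changes the output layout without any cleanup code in getFields().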
              
      

How to make the color of hops red/green

If you have a component that lets the user select which steps to send data to (true/false, filter style), you can make the hops appear in green and red. Do the following:

  • Override public String[] getTargetSteps(). This method should return the array of step names that the user chose. The step name at index 0 is the true step (colored green), and index 1 is the false step (colored red).
  • Optional: Override public boolean chosesTargetSteps(). Yes, I know it's misspelled, but that's how it is in the base class, so we're stuck with it. This should return true if the component lets the user choose the target steps.
  • Example:
    • public boolean chosesTargetSteps()
      {
          if (getTrueStep() == null || "".equals(getTrueStep())) {
              return false;
          }
          return true;
      }

      public String[] getTargetSteps()
      {
          if (chosesTargetSteps())
          {
              return new String[] { getTrueStep(), getFalseStep() };
          }
          return null;
      }
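Because the colors are assigned by array position, swapping the two names swaps the colors. The toy function below (HopColors is hypothetical; it is not how Spoon is implemented) just encodes the rule as described here: index 0 is drawn green, index 1 red, and a null array means the step does not choose its targets.

```java
// Toy model of the coloring rule described above, not Spoon's actual code.
class HopColors {
    // Decides how a hop to targetStep would be drawn, given the array
    // returned by getTargetSteps().
    static String colorFor(String[] targetSteps, String targetStep) {
        if (targetSteps == null) {
            return "default"; // the step does not choose its targets
        }
        if (targetStep.equals(targetSteps[0])) {
            return "green";   // index 0: the "true" step
        }
        if (targetSteps.length > 1 && targetStep.equals(targetSteps[1])) {
            return "red";     // index 1: the "false" step
        }
        return "default";
    }
}
```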
      
      
1) As a side note, it is bad user interface design to use exclamation points in error messages. They induce panic in the end user, which serves no useful purpose.