EAI Steps
The following job steps can be visually linked together to extract, transform, and load (ETL) data from a wide variety of sources, reducing development time through pre-built adapters and actions.
Name | Category | Description |
Abort | Flow | Abort a transformation |
Add a checksum | Transform | Add a checksum column for each input row |
Add constants | Transform | Add one or more constants to the input rows |
Add sequence | Transform | Get the next value from a sequence |
Add value fields changing sequence | Transform | Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence. |
Add XML | Transform | Encode several fields into an XML fragment |
Aggregate Rows | Deprecated | |
Analytic Query | Statistics | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST) |
Append streams | Flow | Append 2 streams in an ordered way |
ARFF Output | Data Mining | Writes data in ARFF format to a file |
Automatic Documentation Output | Output | This step automatically generates documentation based on input in the form of a list of transformations and jobs |
Avro input | Input | Decode binary or JSON Avro data from a file or a field |
Block this step until steps finish | Flow | Block this step until selected steps finish. |
Blocking Step | Flow | This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step. |
Calculator | Transform | Create new fields by performing simple calculations |
Call DB Procedure | Lookup | Get back information by calling a database procedure. |
Change file encoding | Utility | Change file encoding and create a new file |
Cassandra input | Big Data | Read from a Cassandra column family |
Cassandra output | Big Data | Write to a Cassandra column family |
Check if a column exists | Lookup | Check if a column exists in a table on a specified connection. |
Check if file is locked | Lookup | Check if a file is locked by another process |
Check if webservice is available | Lookup | Check if a webservice is available |
Clone row | Utility | Clone a row as many times as needed |
Closure Generator | Transform | This step allows you to generate a closure table using parent-child relationships. |
Combination lookup/update | Data Warehouse | Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all its fields. |
Concat Fields | Transform | The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step. |
Copy rows to result | Job | Use this step to write rows to the executing job. The information will then be passed to the next entry in this job. |
CouchDB Input | Big Data | Retrieves all documents from a given view in a given design document from a given database |
Credit card validator | Validation | The Credit card validator step tells you: (1) whether a credit card number is valid (using the LUHN10 (MOD-10) algorithm; a sketch of this check appears after this table) and (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX), ...) |
CSV file input | Input | Simple CSV file input |
Data Grid | Input | Enter rows of static data in a grid, usually for testing, reference or demo purpose |
Data Validator | Validation | Validates passing data based on a set of rules |
Database join | Lookup | Execute a database query using stream values as parameters |
Database lookup | Lookup | Look up values in a database using field values |
De-serialize from file | Input | Read rows of data from a data cube. |
Delay row | Utility | Output each input row after a delay |
Delete | Output | Delete data in a database table based upon keys |
Detect empty stream | Flow | This step outputs one empty row if the input stream is empty (i.e. when the input stream does not contain any rows) |
Dimension lookup/update | Data Warehouse | Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension. |
Dummy (do nothing) | Flow | This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams. |
Dynamic SQL row | Lookup | Execute a dynamic SQL statement built in a previous field |
Edi to XML | Utility | Converts an EDIFACT message to XML to simplify data extraction (available in PDI 4.4, already present in CI trunk builds) |
ElasticSearch Bulk Insert | Bulk loading | Performs bulk inserts into ElasticSearch |
Email messages input | Input | Read POP3/IMAP server and retrieve messages |
ESRI Shapefile Reader | Input | Reads shape file data from an ESRI shape file and linked DBF file |
ETL Metadata Injection | Flow | This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. |
Example plugin | Transform | This is an example for a plugin test step |
Execute a process | Utility | Execute a process and return the result |
Execute row SQL script | Scripting | Execute SQL script extracted from a field created in a previous step. |
Execute SQL script | Scripting | Execute an SQL script, optionally parameterized using input rows |
File exists | Lookup | Check if a file exists |
Filter Rows | Flow | Filter rows using simple equations |
Fixed file input | Input | Fixed file input |
Formula | Scripting | Calculate a formula using libformula |
Fuzzy match | Lookup | Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs the approximate matching value from the lookup stream. |
Generate random credit card numbers | Input | Generate random valid (Luhn check) credit card numbers |
Generate random value | Input | Generate random value |
Generate Rows | Input | Generate a number of empty or equal rows. |
Get data from XML | Input | Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field. |
Get File Names | Input | Get file names from the operating system and send them to the next step. |
Get files from result | Job | This step allows you to read filenames used or generated in a previous entry in a job. |
Get Files Rows Count | Input | Get Files Rows Count |
Get ID from slave server | Transform | Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. |
Get previous row fields | Deprecated | |
Get repository names | Input | Lists detailed information about transformations and/or jobs in a repository |
Get rows from result | Job | This allows you to read rows from a previous entry in a job. |
Get SubFolder names | Input | Read a parent folder and return all subfolders |
Get System Info | Input | Get information from the system like system date, arguments, etc. |
Get table names | Input | Get table names from database connection and send them to the next step |
Get Variables | Job | Determine the values of certain (environment or Kettle) variables and put them in field values. |
Google Analytics | Input | Fetches data from a Google Analytics account |
Google Docs Input | Input | |
Greenplum Bulk Loader | Bulk loading | Greenplum Bulk Loader |
Greenplum Load | Bulk loading | Greenplum Load |
Group by | Statistics | Builds aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive rows with identical grouping values are grouped correctly. |
GZIP CSV Input | Input | Parallel GZIP CSV file input reader |
Hadoop File Input | Big Data | Read data from a variety of different text-file types stored on a Hadoop cluster |
Hadoop File Output | Big Data | Write data to a variety of different text-file types stored on a Hadoop cluster |
HBase input | Big Data | Read from an HBase column family |
HBase output | Big Data | Write to an HBase column family |
HBase Row Decoder | Big Data | Decodes an incoming key and HBase result object according to a mapping |
HL7 Input | Input | Read data from HL7 data streams. |
HTTP client | Lookup | Call a web service over HTTP, supplying a base URL and allowing parameters to be set dynamically |
HTTP Post | Lookup | Call a web service over HTTP POST, supplying a base URL and allowing parameters to be set dynamically |
IBM Websphere MQ Consumer | Input | Receive messages from any IBM Websphere MQ Server |
IBM Websphere MQ Producer | Output | Send messages to any IBM Websphere MQ Server |
Identify last row in a stream | Flow | The last row will be marked |
If field value is null | Utility | Sets a field value to a constant if it is null. |
Infobright Loader | Bulk loading | Load data to an Infobright database table |
Ingres VectorWise Bulk Loader | Bulk loading | This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command. |
Injector | Inline | Injector step that allows rows to be injected into the transformation through the Java API |
Insert / Update | Output | Update or insert rows in a database based upon keys. |
Java Filter | Flow | Filter rows using Java code |
JMS Consumer | Input | Receive messages from a JMS server |
JMS Producer | Output | Send messages to a JMS server |
Job Executor | Flow | This step executes a Data Integration job, passing parameters and rows. |
Join Rows (cartesian product) | Joins | The output of this step is the cartesian product of the input streams. The number of output rows is the product of the numbers of rows in the input streams. |
JSON Input | Input | Extract relevant portions out of JSON structures (file or incoming field) and output rows |
JSON output | Output | Create a JSON block and output it to a field or a file. |
Knowledge Flow | Data Mining | Executes a Knowledge Flow data mining process |
LDAP Input | Input | Read data from LDAP host |
LDAP Output | Output | Perform Insert, upsert, update, add or delete operations on records based on their DN (Distinguished Name). |
LDIF Input | Input | Read data from LDIF files |
Load file content in memory | Input | Load file content in memory |
LucidDB Bulk Loader | Deprecated | |
LucidDB Streaming Loader | Bulk loading | Load data into LucidDB by using Remote Rows UDX. |
Mail | Utility | Send email. |
Mail Validator | Validation | Check if an email address is valid. |
Mapping (sub-transformation) | Mapping | Run a mapping (sub-transformation), use MappingInput and MappingOutput to specify the fields interface |
Mapping input specification | Mapping | Specify the input interface of a mapping |
Mapping output specification | Mapping | Specify the output interface of a mapping |
MapReduce Input | Big Data | Key Value pairs enter here from Hadoop MapReduce |
MapReduce Output | Big Data | Key Value pairs exit here and are pushed into Hadoop MapReduce |
MaxMind GeoIP Lookup | Lookup | Lookup an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization. |
Memory Group by | Statistics | Builds aggregates in a group by fashion. This step doesn't require sorted input. |
Merge Join | Joins | Joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key |
Merge Rows (diff) | Joins | Merge two streams of rows, sorted on a certain key. The two streams are compared and the equals, changed, deleted and new rows are flagged. |
Metadata structure of stream | Utility | This is a step to read the metadata of the incoming stream. |
Microsoft Access Input | Input | Read data from a Microsoft Access file |
Microsoft Access Output | Output | Stores records into an MS-Access database table. |
Microsoft Excel Input | Input | Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS). |
Microsoft Excel Output | Output | Stores records into an Excel (XLS) document with formatting information. |
Microsoft Excel Writer | Output | Writes or appends data to an Excel file |
Modified Java Script Value | Scripting | This step allows the execution of JavaScript programs (and much more) |
Mondrian Input | Input | Execute and retrieve data using an MDX query against an OLAP server (Mondrian) |
MonetDB Agile Mart | Agile | |
MonetDB Bulk Loader | Bulk loading | Load data into MonetDB by using their bulk load command in streaming mode. |
MongoDB Input | Big Data | Reads all entries from a MongoDB collection in the specified database. |
MongoDB Output | Big Data | Write to a MongoDB collection. |
Multiway Merge Join | Joins | Multiway Merge Join |
MySQL Bulk Loader | Bulk loading | MySQL bulk loader step, loading data over a named pipe (not available on MS Windows) |
Null if... | Utility | Sets a field value to null if it is equal to a constant value |
Number range | Transform | Create ranges based on numeric field |
OLAP Input | Input | Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j |
OpenERP Object Delete | Delete | Deletes data from the OpenERP server using the XMLRPC interface with the 'unlink' function. |
OpenERP Object Input | Input | Retrieves data from the OpenERP server using the XMLRPC interface with the 'read' function. |
OpenERP Object Output | Output | Updates data on the OpenERP server using the XMLRPC interface and the 'import' function |
Oracle Bulk Loader | Bulk loading | Use Oracle Bulk Loader to load data |
Output steps metrics | Statistics | Return metrics for one or several steps |
Palo Cell Input | Input | Retrieves all cell data from a Palo cube |
Palo Cell Output | Output | Updates cell data in a Palo cube |
Palo Dimension Input | Input | Returns elements from a dimension in a Palo database |
Palo Dimension Output | Output | Creates/updates dimension elements and element consolidations in a Palo database |
Reporting Output | Output | Executes an existing report (PRPT) |
PostgreSQL Bulk Loader | Bulk loading | PostgreSQL Bulk Loader |
Prioritize streams | Flow | Prioritize streams in an ordered way. |
Process files | Utility | Process one file per row (copy, move, or delete). This step only accepts filenames as input. |
Properties Output | Output | Write data to properties file |
Property Input | Input | Read data (key, value) from properties files. |
R script executor | Statistics | Executes an R script within a PDI transformation |
Regex Evaluation | Scripting | Regular expression evaluation. This step uses a regular expression to evaluate a field. It can also extract new fields from an existing field with capturing groups. |
Replace in string | Transform | Replace all occurrences of a word in a string with another word. |
Reservoir Sampling | Statistics | Samples a fixed number of rows from the incoming stream (see the reservoir sampling sketch after this table) |
REST Client | Lookup | Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which web services are viewed as resources that can be identified by their URLs |
Row denormaliser | Transform | Denormalises rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields |
Row flattener | Transform | Flattens consecutive rows based on the order in which they appear in the input stream |
Row Normaliser | Transform | De-normalised information can be normalised using this step type. |
RSS Input | Input | Read RSS feeds |
RSS Output | Output | Write data to an RSS stream. |
Rule Executor | Scripting | Execute a rule against each row (using Drools) |
Rule Accumulator | Scripting | Execute a rule against a set of rows (using Drools) |
Run SSH commands | Utility | Run SSH commands and return the result. |
S3 CSV Input | Input | S3 CSV Input |
S3 File Output | Output | Exports data to a text file on an Amazon Simple Storage Service (S3) |
Salesforce Delete | Output | Delete records in a Salesforce module. |
Salesforce Input | Input | Reads information from Salesforce |
Salesforce Insert | Output | Insert records into a Salesforce module. |
Salesforce Update | Output | Update records in a Salesforce module. |
Salesforce Upsert | Output | Insert or update records in a Salesforce module. |
Sample rows | Statistics | Filter rows based on the line number. |
SAP Input | Input | Read data from SAP ERP, optionally with parameters |
SAS Input | Input | This step reads files in sas7bdat (SAS) native format |
Script | Experimental | |
Secret key generator | Experimental | Generate a secret key for algorithms such as DES, AES, and TripleDES. |
Select values | Transform | Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision. |
Send message to Syslog | Utility | Send message to Syslog server |
Serialize to file | Output | Write rows of data to a data cube |
Set field value | Transform | Replace the value of a field with the value of another field |
Set field value to a constant | Transform | Replace the value of a field with a constant |
Set files in result | Job | This step allows you to set filenames in the result of this transformation. Subsequent job entries can then use this information. |
Set Variables | Job | Set environment variables based on a single input row. |
SFTP Put | Experimental | |
Simple Mapping | Mapping | Turn a repetitive, re-usable part of a transformation (a sequence of steps) into a mapping (sub-transformation). |
Single Threader | Flow | Executes a transformation snippet in a single thread. You need a standard mapping or a transformation with an Injector step where data from the parent transformation will arrive in blocks. |
Socket reader | Inline | Socket reader. A socket client that connects to a server (Socket Writer step). |
Socket writer | Inline | Socket writer. A socket server that can send rows of data to a socket reader. |
Sort rows | Transform | Sort rows based upon field values (ascending or descending) |
Sorted Merge | Joins | Sorted Merge |
Split field to rows | Transform | Splits a single string field by delimiter and creates a new row for each split term |
Split Fields | Transform | When you want to split a single field into more than one, use this step type. |
Splunk Input | Transform | Reads data from Splunk. |
Splunk Output | Transform | Writes data to Splunk. |
SQL File Output | Output | Output SQL INSERT statements to file |
Stream lookup | Lookup | Look up values coming from another stream in the transformation. |
SSTable Output | Big Data | Writes to a filesystem directory as a Cassandra SSTable |
Streaming XML Input | Deprecated | |
String operations | Transform | Apply operations such as trimming and padding to a string value. |
Strings cut | Transform | Strings cut (substring). |
Switch / Case | Flow | Switch a row to a certain target step based on the case value in a field. |
Symmetric Cryptography | Experimental | Encrypt or decrypt a string using symmetric encryption. Available algorithms are DES, AES, and TripleDES. |
Synchronize after merge | Output | This step performs insert/update/delete in one go based on the value of a field. |
Table Agile Mart | Agile | |
Table Compare | Utility | This step compares the data from two tables (provided they have the same layout). It finds differences between the data in the two tables and logs them. |
Table exists | Lookup | Check if a table exists on a specified connection |
Table input | Input | Read information from a database table. |
Table output | Output | Write information to a database table |
Teradata Fastload Bulk Loader | Bulk loading | The Teradata Fastload Bulk loader |
Teradata TPT Insert Upsert Bulk Loader | Bulk loading | Bulk loading via TPT using the tbuild command. |
Text file input | Input | Read data from a text file in several formats. This data can then be passed on to the next step(s)... |
Text file output | Output | Write rows to a text file. |
Transformation Executor | Flow | |
Unique rows | Transform | Remove duplicate rows and leave only unique occurrences. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are removed correctly. |
Unique rows (HashSet) | Transform | Remove duplicate rows and leave only unique occurrences by using a HashSet. |
Univariate Statistics | Statistics | This step computes some simple stats based on a single input field |
Update | Output | Update data in a database table based upon keys |
User Defined Java Class | Scripting | This step allows you to program a step using Java code |
User Defined Java Expression | Scripting | Calculate the result of a Java Expression using Janino |
Value Mapper | Transform | Maps values of a certain field from one value to another |
Vertica Bulk Loader | Bulk loading | Bulk loads data into a Vertica table using their high performance COPY feature |
Web services lookup | Lookup | Look up information using web services (WSDL) |
Write to log | Utility | Write data to log |
XBase input | Input | Reads records from an XBase type of database file (DBF) |
XML Input Stream (StAX) | Input | This step is capable of processing very large and complex XML files very fast. |
XML Input | Deprecated | |
XML Join | Joins | Joins a stream of XML-Tags into a target XML string |
XML Output | Output | Write data to an XML file |
XSD Validator | Validation | Validate XML source (files or streams) against XML Schema Definition. |
XSL Transformation | Transform | Transform XML stream using XSL (eXtensible Stylesheet Language). |
Yaml Input | Input | Read YAML sources (files or streams), parse them, convert them to rows, and write these to one or more outputs. |
Zip File | Utility | Creates a standard ZIP archive from the data stream fields |
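
For reference, the LUHN10 (MOD-10) check mentioned in the Credit card validator entry can be sketched in a few lines of Java. This is a minimal, hypothetical illustration of the checksum itself (the class and method names are invented for the example), not the step's actual implementation.

```java
/** Minimal sketch of the LUHN10 (MOD-10) checksum; hypothetical example code. */
public class LuhnCheck {

    /** Returns true if the digit string passes the MOD-10 checksum. */
    static boolean isValidLuhn(String number) {
        int sum = 0;
        boolean doubleIt = false;
        // Walk the digits right to left, doubling every second digit.
        for (int i = number.length() - 1; i >= 0; i--) {
            int digit = Character.getNumericValue(number.charAt(i));
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) digit -= 9; // same as adding the two digits of the product
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidLuhn("4111111111111111")); // true  (well-known test number)
        System.out.println(isValidLuhn("4111111111111112")); // false (last digit altered)
    }
}
```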
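
Similarly, the Reservoir Sampling step is based on the classic reservoir sampling technique (Algorithm R): keep a uniform random sample of k rows from a stream of unknown length in a single pass. The sketch below is a hypothetical standalone illustration of that technique with invented names, not the step's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal sketch of reservoir sampling (Algorithm R); hypothetical example code. */
public class ReservoirSample {

    /** Returns a uniform random sample of k rows from the stream in one pass. */
    static <T> List<T> sample(Iterable<T> stream, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T row : stream) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(row);                 // fill the reservoir first
            } else {
                int j = rng.nextInt(seen);          // uniform in 0 .. seen-1
                if (j < k) reservoir.set(j, row);   // keep the row with probability k/seen
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) rows.add(i);
        System.out.println(sample(rows, 5, new Random(42))); // five rows drawn uniformly
    }
}
```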