Record selector
The record selector translates an HTTP response into a list of Airbyte records: it extracts records from the response and can optionally filter and shape them based on a heuristic. Schema:
HttpSelector:
  type: object
  anyOf:
    - "$ref": "#/definitions/RecordSelector"
RecordSelector:
  type: object
  required:
    - extractor
  properties:
    "$parameters":
      "$ref": "#/definitions/$parameters"
    extractor:
      "$ref": "#/definitions/RecordExtractor"
    record_filter:
      "$ref": "#/definitions/RecordFilter"
The current record extraction implementation uses dpath to select records from the JSON-decoded HTTP response.
For nested structures, the wildcard * can be used to iterate over array elements.
Schema:
DpathExtractor:
  type: object
  additionalProperties: true
  required:
    - field_path
  properties:
    "$parameters":
      "$ref": "#/definitions/$parameters"
    field_path:
      type: array
      items:
        type: string
Common recipes
The following patterns cover the most common response shapes:
Selecting the whole response
If the root of the response is an array containing the records, the records can be extracted using the following definition:
selector:
  extractor:
    field_path: []
If the root of the response is a JSON object representing a single record, the record can be extracted and wrapped in an array. For example, given a response body of the form
{
  "id": 1
}
and a selector
selector:
  extractor:
    field_path: []
The selected records will be
[
  {
    "id": 1
  }
]
Selecting a field
Given a response body of the form
{
  "data": [{"id": 0}, {"id": 1}],
  "metadata": {"api-version": "1.0.0"}
}
and a selector
selector:
  extractor:
    field_path: ["data"]
The selected records will be
[
  {
    "id": 0
  },
  {
    "id": 1
  }
]
Selecting an inner field
Given a response body of the form
{
  "data": {
    "records": [
      {
        "id": 1
      },
      {
        "id": 2
      }
    ]
  }
}
and a selector
selector:
  extractor:
    field_path: ["data", "records"]
The selected records will be
[
  {
    "id": 1
  },
  {
    "id": 2
  }
]
Selecting fields nested in arrays
Given a response body of the form
{
  "data": [
    {
      "record": {
        "id": "1"
      }
    },
    {
      "record": {
        "id": "2"
      }
    }
  ]
}
and a selector
selector:
  extractor:
    field_path: ["data", "*", "record"]
The selected records will be
[
  {
    "id": "1"
  },
  {
    "id": "2"
  }
]
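The recipes above can be reproduced with a small path walker. The following is a minimal sketch in plain Python, not the dpath library and not the connector framework's actual implementation; the function name select is hypothetical, and the handling of missing keys (returning no records) is an assumption made for illustration.

```python
from typing import Any, List


def select(body: Any, field_path: List[str]) -> List[Any]:
    """Walk field_path through a decoded JSON body; "*" fans out over children."""
    current = [body]  # nodes reached so far
    for key in field_path:
        next_nodes: List[Any] = []
        for node in current:
            if key == "*" and isinstance(node, list):
                next_nodes.extend(node)  # iterate over array elements
            elif key == "*" and isinstance(node, dict):
                next_nodes.extend(node.values())
            elif isinstance(node, dict) and key in node:
                next_nodes.append(node[key])
            # Assumption: anything else (missing key, scalar) yields nothing.
        current = next_nodes

    # A list node contributes its items; any other node is wrapped as one record.
    records: List[Any] = []
    for node in current:
        if isinstance(node, list):
            records.extend(node)
        else:
            records.append(node)
    return records


body = {"data": [{"record": {"id": "1"}}, {"record": {"id": "2"}}]}
print(select(body, ["data", "*", "record"]))  # [{'id': '1'}, {'id': '2'}]
```

An empty field_path leaves the walker at the root, so an array root is returned as-is and a single-object root is wrapped in a list, matching the first recipe.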
Filtering records
Records can be filtered by adding a record_filter to the selector. The filter's condition is evaluated to a boolean for each record, and the record is included only if the result is true.
In this example, only records whose created_at is earlier than the stream slice's start_time are kept; all records with a created_at greater than or equal to start_time are filtered out:
selector:
  extractor:
    field_path: []
  record_filter:
    condition: "{{ record['created_at'] < stream_slice['start_time'] }}"
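The condition acts as a keep-predicate: a record survives only when the expression is true. A minimal sketch of that behaviour in plain Python follows. The helper names filter_records and created_before_start and the sample records are hypothetical, and the real filter evaluates a templated string rather than a Python function.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def filter_records(
    records: List[Record],
    stream_slice: Dict[str, Any],
    condition: Callable[[Record, Dict[str, Any]], bool],
) -> List[Record]:
    """Keep only the records for which condition returns True."""
    return [record for record in records if condition(record, stream_slice)]


# Python equivalent of "{{ record['created_at'] < stream_slice['start_time'] }}"
def created_before_start(record: Record, stream_slice: Dict[str, Any]) -> bool:
    return record["created_at"] < stream_slice["start_time"]


# Hypothetical sample data; ISO dates compare correctly as strings.
records = [
    {"id": 1, "created_at": "2023-01-01"},
    {"id": 2, "created_at": "2023-03-01"},
]
kept = filter_records(records, {"start_time": "2023-02-01"}, created_before_start)
print(kept)  # [{'id': 1, 'created_at': '2023-01-01'}]
```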
Transformations
Fields can be added to or removed from records by adding transformations to a stream's definition.
Schema:
RecordTransformation:
  type: object
  anyOf:
    - "$ref": "#/definitions/AddFields"
    - "$ref": "#/definitions/RemoveFields"
Adding fields
Fields can be added with the AddFields transformation.
The first example after the schema adds a top-level field "field1" with the value "static_value".
Schema:
AddFields:
  type: object
  required:
    - fields
  additionalProperties: true
  properties:
    "$parameters":
      "$ref": "#/definitions/$parameters"
    fields:
      type: array
      items:
        "$ref": "#/definitions/AddedFieldDefinition"
AddedFieldDefinition:
  type: object
  required:
    - path
    - value
  additionalProperties: true
  properties:
    "$parameters":
      "$ref": "#/definitions/$parameters"
    path:
      "$ref": "#/definitions/FieldPointer"
    value:
      type: string
FieldPointer:
  type: array
  items:
    type: string
Example:
stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "field1" ]
            value: "static_value"
This example adds a top-level field "start_date", whose value is evaluated from the stream slice:
stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "start_date" ]
            value: "{{ stream_slice['start_date'] }}"
Fields can also be added in a nested object by writing the field's path as a list.
Given a record of the following shape:
{
  "id": 0,
  "data":
  {
    "field0": "some_data"
  }
}
this definition will add a field in the "data" nested object:
stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "data", "field1" ]
            value: "static_value"
resulting in the following record:
{
  "id": 0,
  "data":
  {
    "field0": "some_data",
    "field1": "static_value"
  }
}
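The effect of a path-based AddFields entry can be sketched in plain Python. This is an illustration only: the function name add_field is hypothetical, and creating missing intermediate objects is an assumption rather than documented behaviour.

```python
import copy
from typing import Any, Dict, List


def add_field(record: Dict[str, Any], path: List[str], value: Any) -> Dict[str, Any]:
    """Return a copy of record with value set at the nested location given by path."""
    result = copy.deepcopy(record)  # leave the input record untouched
    node = result
    for key in path[:-1]:
        # Assumption: missing intermediate objects are created on the way down.
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return result


record = {"id": 0, "data": {"field0": "some_data"}}
print(add_field(record, ["data", "field1"], "static_value"))
# {'id': 0, 'data': {'field0': 'some_data', 'field1': 'static_value'}}
```

A single-element path such as ["field1"] writes at the top level, which corresponds to the first AddFields example.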
Removing fields
Fields can be removed from records with the RemoveFields transformation.
Schema:
RemoveFields:
  type: object
  required:
    - field_pointers
  additionalProperties: true
  properties:
    "$parameters":
      "$ref": "#/definitions/$parameters"
    field_pointers:
      type: array
      items:
        "$ref": "#/definitions/FieldPointer"
Given a record of the following shape:
{
  "path":
  {
    "to":
    {
      "field1": "data_to_remove",
      "field2": "data_to_keep"
    }
  },
  "path2": "data_to_remove",
  "path3": "data_to_keep"
}
this definition will remove the two instances of "data_to_remove", which are found at "path2" and "path.to.field1":
stream:
  <...>
  transformations:
      - type: RemoveFields
        field_pointers:
          - [ "path", "to", "field1" ]
          - [ "path2" ]
resulting in the following record:
{
  "path":
  {
    "to":
    {
      "field2": "data_to_keep"
    }
  },
  "path3": "data_to_keep"
}
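The removal shown above can be sketched in plain Python as follows. The function name remove_fields is hypothetical, and silently skipping pointers that do not resolve is an assumption made for this illustration.

```python
import copy
from typing import Any, Dict, List


def remove_fields(record: Dict[str, Any], field_pointers: List[List[str]]) -> Dict[str, Any]:
    """Return a copy of record without the fields addressed by field_pointers."""
    result = copy.deepcopy(record)  # leave the input record untouched
    for pointer in field_pointers:
        if not pointer:
            continue  # nothing to address
        node: Any = result
        for key in pointer[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # assumption: unresolved pointers are ignored
        if isinstance(node, dict):
            node.pop(pointer[-1], None)
    return result


record = {
    "path": {"to": {"field1": "data_to_remove", "field2": "data_to_keep"}},
    "path2": "data_to_remove",
    "path3": "data_to_keep",
}
print(remove_fields(record, [["path", "to", "field1"], ["path2"]]))
# {'path': {'to': {'field2': 'data_to_keep'}}, 'path3': 'data_to_keep'}
```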