Formatting Rules

User data is uploaded by sending a multipart/form-data POST request, very similar to the way Taxonomy is uploaded. The request must follow these rules:

  1. Form “metadata”, with Content-Type “application/json”, is the JSON-formatted metadata discussed in the previous section on Metadata.

  2. Form “data”, with Content-Type “application/octet-stream”, is the bz2-compressed stream of a set of rows separated by newline characters. The representation format of a row varies by the API being called.

  3. Each bz2-compressed user/audience file must be no larger than 5GB. If a compressed user/audience file exceeds 5GB, split it into multiple files, each smaller than 5GB.
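The three rules above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the function name build_multipart_body, the sample metadata dictionary, and the reading of "5GB" as 5 * 1024**3 bytes are all assumptions, and the sketch only builds the request body rather than sending it to any endpoint.

```python
import bz2
import json
import uuid

# Assumption: "5GB" is interpreted as 5 GiB; the documentation does not say which.
MAX_COMPRESSED_BYTES = 5 * 1024**3


def build_multipart_body(metadata, rows):
    """Return (body, content_type) for a two-part multipart/form-data upload.

    metadata: dict serialized as the "metadata" form (application/json).
    rows: list of strings, each already collapsed to a single line.
    """
    # Rows are separated by newline characters, then compressed as one bz2 stream.
    compressed = bz2.compress("\n".join(rows).encode("utf-8"))
    if len(compressed) > MAX_COMPRESSED_BYTES:
        raise ValueError("compressed data exceeds 5GB; split the rows into several uploads")

    boundary = uuid.uuid4().hex
    parts = [
        ("metadata", "application/json", json.dumps(metadata).encode("utf-8")),
        ("data", "application/octet-stream", compressed),
    ]
    body = b""
    for name, content_type, payload in parts:
        # Each part carries its own name and Content-Type header.
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("ascii")
        body += payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return body, f"multipart/form-data; boundary={boundary}"


# Hypothetical metadata; the real fields are defined in the Metadata section.
body, content_type = build_multipart_body({"description": "example"}, ['{"urn":"user-1"}'])
```

The returned content_type string would be sent as the request's Content-Type header so that the server can locate the boundary between the two forms.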

The sections on the /audience, /audience/segment, /audience/attribute and /audience/opt-out APIs describe how each API represents a unit of data, which may be either a JSON object or simple text. Irrespective of the representation format, DataX requires that each unit of data be collapsed into a single line of input in the payload prior to compression. This enables parallel distributed processing of the data within DataX.

For example, suppose the following JSON object represents the attributes of a user (ignore the exact JSON format here; the representation format for a unit of data is discussed in detail in the next sub-sections):

{
        "urn" : "user-1",
        "att" : [
                {
                        "id" : "YaScore",
                        "val" : 28
                },
                {
                        "id" : "Age",
                        "val" : 32
                }
        ]
}

Each unit of data must be represented as a single line of input in the final payload. An attribute upload with YaScore and Age values for three users must look like the following prior to compression:

{"urn":"user-1","att":[{"id":"YaScore","val":56},{"id":"Age","val":32}]}
{"urn":"user-2","att":[{"id":"YaScore","val":34},{"id":"Age","val":44}]}
{"urn":"user-3","att":[{"id":"YaScore","val":78},{"id":"Age","val":21}]}
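The single-line requirement can be met mechanically when the units of data are JSON objects. The following Python sketch is one possible way to do it, assuming the units are held in memory as dictionaries; the helper name to_payload is hypothetical, and the attribute layout simply mirrors the illustrative example above rather than the exact format required by any one API.

```python
import bz2
import json


def to_payload(units):
    """Serialize each unit on its own line, then bz2-compress the result."""
    # json.dumps without an indent never emits raw newlines; any newline
    # inside a string value is escaped, so each unit stays on one line.
    lines = [json.dumps(unit, separators=(",", ":")) for unit in units]
    return bz2.compress("\n".join(lines).encode("utf-8"))


users = [
    {"urn": "user-1", "att": [{"id": "YaScore", "val": 56}, {"id": "Age", "val": 32}]},
    {"urn": "user-2", "att": [{"id": "YaScore", "val": 34}, {"id": "Age", "val": 44}]},
    {"urn": "user-3", "att": [{"id": "YaScore", "val": 78}, {"id": "Age", "val": 21}]},
]
payload = to_payload(users)

# Round-trip check: one line per user after decompression.
lines = bz2.decompress(payload).decode("utf-8").split("\n")
```

Decompressing the payload and splitting on newline characters yields exactly one line per user, which is the property that lets the data be partitioned for parallel processing.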