Sitecore CDP Audience Export

Sitecore CDP Audience Export: Data Challenges and Strategy

As organizations transition from Sitecore Experience Platform 10.1 to modern solutions like Sitecore CDP (part of the Sitecore AI ecosystem), data migration and continuity become critical concerns.

This blog outlines a real-world use case, key challenges, and the strategy used to ensure a seamless transition—especially when maintaining compatibility with existing downstream systems.


Use Case Overview

The current implementation is built on Sitecore Experience Platform 10.1, where:

  • All contact data and user interactions are captured
  • Data is sent to a third-party system
  • The external system processes the data and generates analytical reports

During migration to Sitecore CDP, a critical requirement is:

👉 The third-party system must continue receiving data in the exact same JSON format

This ensures:

  • No disruption in reporting
  • No changes required in downstream systems

Migration Challenges

1. Incomplete Data Availability

Not all constructs from XP exist in CDP:

  • No direct equivalents for:
    • Goals
    • Events (as defined in XP)
    • Page Event Definition items
  • Data model differences:
    • XP → Contact
    • CDP → Guest

👉 Result: Gaps in behavioral and historical data representation


2. Maintaining Data Format Consistency

The third-party system expects a strict JSON schema based on XP.

Any deviation can:

  • Break integrations
  • Require costly refactoring

Solution Approach

The implementation is divided into two parts:

Part 1: Data Creation Using Audience Export (This Blog)

Goals:

  • Recreate missing data constructs
  • Map CDP Guest data to XP Contact structure
  • Output data in the legacy JSON format

Recreating Missing Data

In XP, the Interaction object contains Goals and Events.

To replicate this in CDP:

Step 1: Create Custom Templates

In Sitecore AI CMS:

  • Create templates for:
    • Goals
    • Events
  • Organize them in dedicated folders:
    • /Goals
    • /Events

Step 2: Capture Custom Data via Events

Using the Cloud SDK (Headstarter Kit):

You can send custom data through:

  • Page View Events
  • Custom Events (e.g., Goal events)

Each event includes a data extension object.




Example: Fetch the goal and event data created with the custom event and goal templates, and populate the extension object as needed, as shown in the diagram above.
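To make the extension object concrete, here is a minimal sketch of how it could be assembled before the event is sent. The helper name buildGoalExtension is hypothetical, and the keys (Data, DateTime, DefinitionId) are the ones this post reads back from arbitraryData.ext later on. The Cloud SDK call that actually sends the event is left out on purpose, because its exact signature depends on the SDK version you use.

```javascript
// Hypothetical helper: builds the extension object attached to a custom event.
// The keys mirror the ones read back from arbitraryData.ext in the Guest profile.
function buildGoalExtension(definitionId, description, now) {
    // Use the supplied Date when given, otherwise the current time.
    var timestamp = (now instanceof Date ? now : new Date()).toISOString();
    return {
        Data: description,
        DateTime: timestamp,
        DefinitionId: definitionId
    };
}

var goalExtension = buildGoalExtension(
    "88f351c3-be40-45ea-864a-54c2b830d62f",
    "User Logged in Successfully",
    new Date("2026-04-08T07:36:06.512Z")
);
console.log(JSON.stringify(goalExtension));
```

The DefinitionId would be the ID of the Goal or Event item created from the custom templates in Step 1.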




Step 3: Access Data in Guest Profile

Once events are triggered:

  • Data is stored under:
    • Guest → Session → Events → arbitraryData.ext

This enables:

  • Reconstruction of Goals
  • Event tracking per session

The data sent with the custom event above is available in the event object of the session:

"arbitraryData": {
    "ext": {
        "Data": "User Logged in Successfully",
        "DateTime": "2026-04-08T07:36:06.512Z",
        "DefinitionId": "88f351c3-be40-45ea-864a-54c2b830d62f"
    }
}
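Given event data shaped like the snippet above, goals could be reconstructed per session roughly as follows. This is only a sketch: collectGoals and the mock guest are hypothetical, and it assumes every goal event carries a DefinitionId inside arbitraryData.ext.

```javascript
// Sketch: collect XP-style goal entries from every session of a guest.
// Assumes goal events carry arbitraryData.ext with Data, DateTime and DefinitionId.
function collectGoals(guest) {
    var goals = [];
    var sessions = (guest && Array.isArray(guest.sessions)) ? guest.sessions : [];
    for (var i = 0; i < sessions.length; i++) {
        var session = sessions[i] || {};
        var events = Array.isArray(session.events) ? session.events : [];
        for (var j = 0; j < events.length; j++) {
            var evt = events[j] || {};
            var ext = evt.arbitraryData && evt.arbitraryData.ext;
            // Only events with a DefinitionId are treated as goals.
            if (ext && ext.DefinitionId) {
                goals.push({
                    SessionRef: session.ref || "",
                    DefinitionId: ext.DefinitionId,
                    Data: ext.Data || "",
                    DateTime: ext.DateTime || ""
                });
            }
        }
    }
    return goals;
}

// Mock guest, for illustration only.
var sampleGuest = {
    sessions: [{
        ref: "session-1",
        events: [
            { type: "VIEW", arbitraryData: { page: "/home" } },
            { type: "GOAL", arbitraryData: { ext: {
                Data: "User Logged in Successfully",
                DateTime: "2026-04-08T07:36:06.512Z",
                DefinitionId: "88f351c3-be40-45ea-864a-54c2b830d62f"
            } } }
        ]
    }]
};
console.log(JSON.stringify(collectGoals(sampleGuest)));
```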

👉 Key Point:
This allows you to inject missing XP-like data into CDP events.

⚠️ Limitation:

  • Session extension objects cannot be modified
  • Only event-level data is flexible

Exporting Data via Audience Export

There are multiple export options:

  • Custom API implementation
  • REST API extraction
  • Audience Export (chosen approach)

Step-by-Step: Audience Export

1. Create a Segment

Define a segment to filter relevant Guests.


2. Create Full Audience Export

Navigate to Audience Export and configure:

  • Select Full Export
  • Choose your segment

3. Define Output Format

This is the most critical step.

Two modes are available:

Basic Mode

  • Flat key-value JSON
  • Limited flexibility

Advanced Mode (Recommended)

  • Supports nested JSON
  • Allows reconstruction of XP-style structure

The code below iterates over the guest's sessions array and builds the custom JSON that becomes part of the audience export:


(function() {
    var result = [];

    if (guest && typeof guest === 'object' && Array.isArray(guest.sessions) && guest.sessions.length > 0) {
        for (var i = 0; i < guest.sessions.length; i++) {
            var currentSession = guest.sessions[i];
            var sessionData = extractSessionDetails(currentSession);

            if (sessionData && Object.keys(sessionData).length > 0) {
                result.push(sessionData);
            }
        }
    } else {
        console.log("No sessions present in guest object.");
    }

    console.log("Session Array created successfully.");
    console.log("Session detail extracted:", JSON.stringify(result));

    return { Payload: { sessions: result } };
})();

function extractSessionDetails(session) {
    var events = [];
    if (session.events && Array.isArray(session.events)) {
        for (var j = 0; j < session.events.length; j++) {
            events.push(extractEventDetails(session.events[j]));
        }
    }

    var city = "";
    var country = "";
    var region = "";
    // Geo details come from the "bxt" session data extension, when present.
    var exts = session.dataExtensions;
    if (Array.isArray(exts)) {
        for (var i = 0; i < exts.length; i++) {
            var ext = exts[i];
            if (ext && ((ext.key || "") === "bxt" || (ext.name || "") === "bxt")) {
                var v = ext.values || {};
                city = v.geoLocationCity || "";
                // Mapping kept as in the original: country reads the region key and
                // region reads the continent key. Verify these against your tenant's data.
                country = v.geoLocationRegion || "";
                region = v.geoLocationContinent || "";
            }
        }
    }

    // Dynamic values (fallback to defaults if missing)
    var browserMajor = session.userAgent || "Unknown Browser";
    var browserMinor = "0.0"; // You can parse userAgent string if needed
    var browserVersion = "0.0"; // You can parse userAgent string if needed
    var screenWidth = session.screenWidth || "480";
    var screenHeight = session.screenHeight || "480";
    var engagementValue = session.engagementValue || 30;
    
    
    var newId = generateGUID();

    return {
        ChannelId: session.channel || "",
        EngagementValue: "",
        StartDateTime: session.createdAt || "",
        EndDateTime: session.endedAt || "",
        Duration: session.duration || "0",
        Events: events,
        UserAgent: session.userAgent || "",
        ContactId: session.guestRef || "",
        ConcurrencyToken: newId || "",
        LastModified: session.endedAt || "",
        Id: session.ref || "",
        WebVisit: {
            FacetKey: "WebVisit",
            Browser: {
                BrowserMajorName: browserMajor,
                BrowserMinorName: browserMinor,
                BrowserVersion: browserVersion
            },
            Language: session.language || "",
            OperatingSystem: {
                MajorVersion: session.osMajorVersion || "",
                MinorVersion: session.osMinorVersion || "",
                Name: session.operatingSystem || ""
            },
            Referrer: session.referer || "",
            Screen: {
                ScreenWidth: screenWidth,
                ScreenHeight: screenHeight
            },
            SearchKeywords: session.searchKeywords || "",
            SiteName: session.siteName || "website",
            _odata_type: "System.Collections.Generic.KeyValuePair`2[System.String,Sitecore.XConnect.Facet]"
        },
        IPInfo: {
            FacetKey: "IpInfo",
            AreaCode: session.areaCode || "N/A",
            BusinessName: session.businessName || "N/A",
            City: city,
            Country: country,
            IpAddress: session.ipAddress || "0.0.0.0",
            Isp: session.isp || "N/A",
            Latitude: session.latitude || "",
            Longitude: session.longitude || "",
            LocationId: session.locationId || "",
            MetroCode: session.metroCode || "N/A",
            PostalCode: session.postalCode || "N/A",
            Region: region,
            Url: session.referer || "",
            Dns: "",
            _odata_type: "Sitecore.XConnect.Collection.Model.IpInfo"
        },
        UserAgentInfo: {
            FacetKey: "UserAgentInfo",
            DeviceType: session.deviceType || "Computer",
            DeviceVendor: session.deviceVendor || "Unknown",
            DeviceVendorHardwareModel: session.deviceModel || "Desktop, Emulator",
            _odata_type: "Sitecore.XConnect.Collection.Model.UserAgentInfo"
        },
        CustomInteractionDetail: {
            FacetKey: "GoogleUser",
            GoogleSessionID: session.googleSessionID || "N/A",
            VisitorID: session.guestRef || "N/A"
        }
    };
}

function extractEventDetails(event) {
    // Guard against events that have no arbitraryData or no ext object.
    var data = event.arbitraryData || {};
    var ext = data.ext || {};

    return {
        id: event.ref || "",
        Data: ext.Data || "",
        Url: data.page || "",
        DefinitionId: ext.DefinitionId || "",
        ItemId: "00000000-0000-0000-0000-000000000000",
        EngagementValue: event.engagementValue || "0",
        Timestamp: event.timestamp || new Date().toISOString(),
        Duration: event.duration || "00:00:00",
        EventType: event.type || "",
        // The sample payload uses the key "Data" (capital D), so Text and DataKey read that key.
        Text: ext.Data || "",
        DataKey: ext.Data || "",
        EventItemId: event.ref || ""
    };
}

function generateGUID() {
    function s4() {
        return Math.floor((1 + Math.random()) * 0x10000)
            .toString(16)
            .substring(1);
    }
    return s4() + s4() + '-' + s4() + '-' + s4() + '-' +
           s4() + '-' + s4() + s4() + s4();
}

👉 Use this to:

  • Map Guest data
  • Include custom event data
  • Rebuild legacy JSON schema
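The mapping code leaves BrowserMinorName and BrowserVersion at "0.0" and notes that the user agent string could be parsed if needed. A minimal sketch of such a parser is shown here. The helper parseBrowser is hypothetical, covers only a handful of common browsers, and is not a complete user agent parser.

```javascript
// Hypothetical helper: derive browser name and version from a user agent string.
function parseBrowser(userAgent) {
    var fallback = { name: "Unknown Browser", major: "0", minor: "0", version: "0.0" };
    if (typeof userAgent !== "string") {
        return fallback;
    }
    // Order matters: Edge and Opera user agents also contain "Chrome".
    var patterns = [
        { name: "Edge", regex: /Edg\/(\d+)\.(\d+)/ },
        { name: "Opera", regex: /OPR\/(\d+)\.(\d+)/ },
        { name: "Chrome", regex: /Chrome\/(\d+)\.(\d+)/ },
        { name: "Firefox", regex: /Firefox\/(\d+)\.(\d+)/ },
        { name: "Safari", regex: /Version\/(\d+)\.(\d+).*Safari/ }
    ];
    for (var i = 0; i < patterns.length; i++) {
        var match = patterns[i].regex.exec(userAgent);
        if (match) {
            return {
                name: patterns[i].name,
                major: match[1],
                minor: match[2],
                version: match[1] + "." + match[2]
            };
        }
    }
    return fallback;
}

var chromeUa = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
console.log(JSON.stringify(parseBrowser(chromeUa)));
```

The result could then feed the Browser block of the WebVisit facet, for example name into BrowserMajorName and version into BrowserVersion, depending on what the downstream system expects.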

4. Run Test

Validate the structure before full execution.
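One way to make this validation repeatable is to check a test record against the list of keys the downstream system expects. The sketch below is illustrative only: findMissingKeys is a hypothetical helper, and the key list is taken from the session mapping in this post rather than from the real legacy schema, so adjust it to match your contract.

```javascript
// Hypothetical check: list the expected top-level keys missing from an exported session.
function findMissingKeys(record, requiredKeys) {
    var missing = [];
    for (var i = 0; i < requiredKeys.length; i++) {
        if (!record || !Object.prototype.hasOwnProperty.call(record, requiredKeys[i])) {
            missing.push(requiredKeys[i]);
        }
    }
    return missing;
}

// Keys taken from the session mapping in this post; adjust to the real legacy schema.
var REQUIRED_SESSION_KEYS = [
    "ChannelId", "StartDateTime", "EndDateTime", "Duration", "Events",
    "ContactId", "Id", "WebVisit", "IPInfo", "UserAgentInfo"
];

// Deliberately incomplete sample record, for illustration only.
var sampleSession = { ChannelId: "WEB", StartDateTime: "2026-04-08T07:30:00.000Z", Events: [] };
console.log(findMissingKeys(sampleSession, REQUIRED_SESSION_KEYS).join(", "));
```

An empty result means every expected key is present. It does not check value types or nested structure.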







5. Execute Export

Click Run Now to generate the export.


6. Download Data

  • Go to the Activity tab
  • Enable the download option (disabled by default) in the CDP settings


  • Download the file

7. Extract the Data

The export is in .gz format:

  • The default unzip utility may show the archive as empty, so use a tool that handles gzip files


After extraction:

  • You’ll get a single JSON file
  • Contains all Guest records for the segment
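If the default unzip tool shows empty data, the file can also be decompressed with a few lines of Node.js using the built-in zlib module. This is a minimal sketch, not part of the original setup: the function names are hypothetical and the file paths are whatever you downloaded from the Activity tab.

```javascript
var fs = require("fs");
var zlib = require("zlib");

// Decompress a gzip buffer and return its contents as UTF-8 text.
function gunzipToText(buffer) {
    return zlib.gunzipSync(buffer).toString("utf8");
}

// Hypothetical helper: read a downloaded .gz export and write the extracted JSON next to it.
// Returns the number of characters written.
function extractExport(inputPath, outputPath) {
    var text = gunzipToText(fs.readFileSync(inputPath));
    fs.writeFileSync(outputPath, text, "utf8");
    return text.length;
}
```

For example, extractExport("audience-export.json.gz", "audience-export.json") would write the extracted file, where both file names are placeholders.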



Key Insight

Even though CDP lacks native XP constructs:

👉 You can reconstruct them using custom event data + advanced export formatting


Part 2: Data Delivery via Sitecore Connect (Coming Next)

In the next blog, we will cover:

  • Accessing exported data using Sitecore Connect
  • Sending transformed data to the third-party system
  • Ensuring reliability and compatibility

Conclusion

Migrating from Sitecore Experience Platform 10.1 to Sitecore AI is more than a platform upgrade—it’s a data transformation challenge.

By leveraging:

  • Custom event extensions
  • Audience Export (Advanced Mode)
  • Structured JSON mapping

👉 You can successfully maintain backward compatibility with existing systems.
