Databricks Lakehouse Federation


Databricks Lakehouse Federation enables data virtualization: you can query external data without importing it into Databricks. CData Connect Cloud already provides data virtualization, but exposing that virtualized data through Databricks lets you use Databricks features such as data lineage and fine-grained access control. For more information on Databricks Lakehouse Federation, see the Databricks documentation.

Databricks Lakehouse Federation can pull data from sources that you have connected to your CData Connect Cloud account. This page outlines the steps to connect Databricks Lakehouse Federation to the CData Connect Cloud Virtual SQL Server API.

Prerequisites

Before you connect, you must first do the following:

  • Connect a data source to your CData Connect Cloud account. See Connections for more information.
  • Generate a Personal Access Token (PAT) on the CData Connect Cloud Settings page. Copy it and store it securely, because it acts as your password during authentication.

Connecting to CData Connect Cloud

To establish a connection from Databricks Lakehouse Federation to the CData Connect Cloud Virtual SQL Server API, follow these steps.

  1. Log in to Databricks.

  2. In the navigation pane, select Catalog. Click + and select Add a connection.

  3. In Create Connection, enter the following:

    • Connection name—the user-defined connection name.

    • Connection type—select SQL Server from the drop-down list.

    • Auth type—select Username and password.

    • Host—enter tds.cdata.com.

    • Port—enter 14333.

    • User—enter your CData Connect Cloud username. This is displayed in the top-right corner of the CData Connect Cloud interface. For example, test@cdata.com.

    • Password—enter the PAT you generated on the Settings page.

  4. Click Test connection. If the connection succeeds, a confirmation dialog appears.

  5. Next, add the target database as a catalog. In the navigation pane, select the connection you just created and click Create catalog.

  6. In the Create a new catalog dialog, enter the following information:

    • Catalog name—enter a user-defined catalog name.

    • Connection—the Databricks connection you created in step 3 and selected in step 5.

    • Database—enter the Connection Name of the CData Connect Cloud data source you want to connect to (for example, Salesforce1).

  7. Click Create. If the catalog creation succeeds, a confirmation dialog appears. Click View catalog to view your new catalog.

  8. You can also return to Catalog in the navigation pane and find the catalog you just added. Expand it to view the schemas and tables it contains.

  9. Select a table and click the Overview tab to view table metadata.

  10. Click the Sample Data tab to view the real-time data in the table.

  11. Go to Dashboards in the navigation pane to visualize the data.
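The connection and catalog steps above can also be expressed as Databricks SQL statements. The following Python sketch assembles them as strings. It assumes the `CREATE CONNECTION ... TYPE sqlserver` and `CREATE FOREIGN CATALOG ... USING CONNECTION ... OPTIONS (database ...)` syntax that Databricks documents for SQL Server federation, and Spark SQL's default backslash escaping for string literals. Instead of embedding the PAT, it references a Databricks secret through `secret(scope, key)`. The secret scope and key, along with the connection, catalog, schema, and table names in the usage block, are hypothetical placeholders. Check the statements against the current Databricks documentation before you run them.

```python
"""Build Databricks SQL statements equivalent to the UI steps (sketch only).

Assumes the Lakehouse Federation SQL Server syntax; all names used in the
__main__ block are hypothetical placeholders.
"""

CDATA_HOST = "tds.cdata.com"  # CData Connect Cloud Virtual SQL Server API host
CDATA_PORT = 14333            # CData Connect Cloud Virtual SQL Server API port


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier; a literal backtick is written as two."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal using backslash escapes (assumed Spark SQL default)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def create_connection_sql(connection_name: str, user: str,
                          secret_scope: str, secret_key: str) -> str:
    """Steps 3-4: a SQL Server-type connection pointing at CData Connect Cloud.

    The PAT is read from a Databricks secret (hypothetical scope/key names)
    rather than written into the statement as plain text.
    """
    return (
        f"CREATE CONNECTION {quote_ident(connection_name)} TYPE sqlserver\n"
        "OPTIONS (\n"
        f"  host {quote_literal(CDATA_HOST)},\n"
        f"  port {quote_literal(str(CDATA_PORT))},\n"
        f"  user {quote_literal(user)},\n"
        f"  password secret({quote_literal(secret_scope)}, {quote_literal(secret_key)})\n"
        ")"
    )


def create_foreign_catalog_sql(catalog_name: str, connection_name: str,
                               database: str) -> str:
    """Steps 5-7: a foreign catalog whose database is the CData connection name."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {quote_ident(catalog_name)}\n"
        f"USING CONNECTION {quote_ident(connection_name)}\n"
        f"OPTIONS (database {quote_literal(database)})"
    )


def sample_query_sql(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Steps 9-10: read a few rows using the three-level catalog.schema.table name."""
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {full_name} LIMIT {int(limit)}"


if __name__ == "__main__":
    # Hypothetical names; replace them with your own.
    for statement in (
        create_connection_sql("cdata_connect", "test@cdata.com", "cdata", "pat"),
        create_foreign_catalog_sql("salesforce_catalog", "cdata_connect", "Salesforce1"),
        sample_query_sql("salesforce_catalog", "Salesforce", "Account"),
    ):
        print(statement, end="\n\n")
```

In a Databricks notebook, you could pass each generated string to `spark.sql(...)`, provided you have the Unity Catalog privileges needed to create connections and catalogs. Referencing a secret keeps the PAT out of the notebook source.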