Databricks


This page outlines the steps to configure Databricks to access CData Connect Cloud. Databricks can pull data from sources that you have connected to your CData Connect Cloud account.

Prerequisites

Before you can configure and use Databricks with CData Connect Cloud, you must first connect a data source to your CData Connect Cloud account. See Connections for more information.

You must also generate a Personal Access Token (PAT) on the Settings page. Copy the PAT and store it securely, because it acts as your password during authentication.

Connecting to CData Connect Cloud

To establish a connection from Databricks to CData Connect Cloud, follow these steps.

  1. Download and install the CData Connect Cloud JDBC driver.

    1. Open the Client Tools page of CData Connect Cloud.
    2. Search for JDBC or Databricks.
    3. Click Download and select your operating system.
    4. When the download is complete, run the setup file.
    5. When the installation is complete, the JAR file can be found in the installation directory.
  2. Log in to Databricks.

  3. In the navigation pane, select Compute. Start any compute or create a new one.

  4. Once the compute has started, click its name and then select the Libraries tab.

  5. Click Install new. The Install library dialog appears.

  6. Select DBFS. Then drag and drop the JDBC JAR file, named cdata.jdbc.connect.jar, into the indicated area. Click Install.

  7. You must now run three notebook scripts, one by one.

  8. The first script is below. Change the following:

    • Update User with your CData Connect Cloud username.

    • Update Password with the PAT you generated in the prerequisites.

    • Update Your_Connection_Name with the name of the data source you created in the prerequisites.

    # JDBC driver class and connection string for CData Connect Cloud
    driver = "cdata.jdbc.connect.ConnectDriver"
    url = "jdbc:connect:AuthScheme=Basic;User=user@cdata.com;Password=***********;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"

  9. Run the first script.

  10. From the menu on the right side, select Add cell below to add a second script. The second script is below. Change the following:

    • Update User with your CData Connect Cloud username.

    • Update Password with the PAT you generated in the prerequisites.

    • Update Your_Connection_Name with the name of the data source you created in the prerequisites.

    • Update YOUR_SCHEMA.YOUR_TABLE with your schema and table, for example, PUBLIC.CUSTOMERS.

    # Load the remote table into a Spark DataFrame through the JDBC driver
    remote_table = spark.read.format("jdbc") \
        .option("driver", "cdata.jdbc.connect.ConnectDriver") \
        .option("url", "jdbc:connect:AuthScheme=Basic;User=user@cdata.com;Password=*******;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;") \
        .option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
        .load()

  11. Run the second script.

  12. Add a cell for the third script. The third script is below. Replace ColumnName1 and ColumnName2 with the columns you want to display.

    display(remote_table.select("ColumnName1", "ColumnName2"))

  13. Run the third script.

  14. You can preview your data in Databricks.
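
The three notebook scripts above can also be condensed into two small helpers. The following is a minimal sketch, not part of the CData documentation: the function names build_connect_url and read_connect_table are hypothetical, and it assumes that none of the values contain a semicolon, because that character separates the properties in the connection string shown above.

```python
def build_connect_url(user, pat, connection_name,
                      base_url="https://cloud.cdata.com/api/"):
    """Assemble the JDBC connection string from its parts (hypothetical helper)."""
    parts = {
        "AuthScheme": "Basic",
        "User": user,
        "Password": pat,                    # the PAT from the prerequisites
        "URL": base_url,
        "DefaultCatalog": connection_name,  # name of your Connect Cloud connection
    }
    for key, value in parts.items():
        # Assumption: a semicolon inside a value would break the property list.
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return "jdbc:connect:" + "".join(f"{k}={v};" for k, v in parts.items())


def read_connect_table(spark, url, table):
    """Load one remote table as a DataFrame (hypothetical helper).

    `spark` is the SparkSession that a Databricks notebook provides.
    """
    return (
        spark.read.format("jdbc")
        .option("driver", "cdata.jdbc.connect.ConnectDriver")
        .option("url", url)
        .option("dbtable", table)
        .load()
    )
```

In a notebook cell, this might be used as remote_table = read_connect_table(spark, build_connect_url("user@cdata.com", pat, "Your_Connection_Name"), "PUBLIC.CUSTOMERS"), where pat holds your PAT. Building the URL in one place keeps the first and second scripts from drifting apart when the credentials or connection name change.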